位置:首页 > 九章学会Hive - Hive的函数及使用 >

九章学会Hive - Hive的函数及使用

作者:小牛君|发布时间:2017-06-16

小牛学堂的课程大纲最近进行了再一次升级,并且同时推出Java大数据平台开发班、Python爬虫与数据挖掘班、Spark项目班、Spark大神班、机器学习算法实战班、BI数据分析实战班, 目前这类人群凤毛麟角,导致这个行业的平均薪资极高,为此小牛学堂集合了行业的诸多大牛开设对应班级,为想学习的同学提供机会!
如果想了解详细情况,请联系 今日值班讲师 或者直接加入千人QQ群进行咨询:210992946

以下是本文正文:


1.  hive函数

所有的内置函数都注册在

org.apache.hadoop.hive.ql.exec.FunctionRegistry类中

以静态代码快的形式进行注册

 

show functions;

desc function functionName;

desc function extended functionName;

 

例:

desc function year;

year(param) - Returns the year component of the date/timestamp/interval

 

desc function extended when;

CASE WHEN a THEN b [WHEN c THEN d]* [ELSE e] END - When a = true, returns b; when c = true, return d; else return e

Example:

 SELECT

 CASE

   WHEN deptno=1 THEN Engineering

   WHEN deptno=2 THEN Finance

   ELSE admin

 END,

 CASE

   WHEN zone=7 THEN Americas

   ELSE Asia-Pac

 END

 FROM emp_details

 

desc function extended concat;

concat(str1, str2, ... strN) - returns the concatenation of str1, str2, ... strN or concat(bin1, bin2, ... binN) - returns the concatenation of bytes in binary data  bin1, bin2, ... binN

Returns NULL if any argument is NULL.

Example:

  > SELECT concat('abc', 'def') FROM src LIMIT 1;

  'abcdef'

 

desc function extended concat_ws;

concat_ws(separator, [string | array(string)]+) - returns the concatenation of the strings separated by the separator.

Example:

  > SELECT concat_ws('.', 'www', array('facebook', 'com')) FROM src LIMIT 1;

  'www.facebook.com'

 

desc function extended collect_list;

collect_list(x) - Returns a list of objects with duplicates

 

desc function extended collect_set;

collect_set(x) - Returns a set of objects with duplicate elements eliminated

 

desc function extended split;

split(str, regex) - Splits str around occurances that match regex

Example:

  > SELECT split('oneAtwoBthreeC', '[ABC]') FROM src LIMIT 1;

  ["one", "two", "three"]

 

desc function extended explode;

explode(a) - separates the elements of array a into multiple rows, or the elements of a map into multiple rows and columns

 

desc function extended case;

CASE a WHEN b THEN c [WHEN d THEN e]* [ELSE f] END - When a = b, returns c; when a = d, return e; else return f

Example:

 SELECT

 CASE deptno

   WHEN 1 THEN Engineering

   WHEN 2 THEN Finance

   ELSE admin

 END,

 CASE zone

   WHEN 7 THEN Americas

   ELSE Asia-Pac

 END

 FROM emp_details

 

desc function extended get_json_object;

get_json_object(json_txt, path) - Extract a json object from path

Extract json object from a json string based on json path specified, and return json string of the extracted json object. It will return null if the input json string is invalid.

A limited version of JSONPath supported:

  $   : Root object

  .   : Child operator

  []  : Subscript operator for array

  *   : Wildcard for []

Syntax not supported that's worth noticing:

  ''  : Zero length string as key

  ..  : Recursive descent

  @   : Current object/element

  ()  : Script expression

  ?() : Filter (script) expression.

  [,] : Union operator

  [start:end:step] : array slice operator

 

desc function extended json_tuple;

json_tuple(jsonStr, p1, p2, ..., pn) - like get_json_object, but it takes multiple names and return a tuple. All the input parameters and output column types are string.

 

时间函数

timestamp yyyy-MM-dd HH:mm:ss.SSS 格式的字符串

datetime yyyy-MM-dd HH:mm:ss 格式的字符串

date yyyy-MM-dd 格式的字符串

 

select current_date  from dual;

2017-04-08

select current_timestamp  from dual;

2017-04-08 10:10:25.783

 

from_unixtime(bigint unixtime[, string format]) return: string

例:

select from_unixtime(1323308943,'yyyyMMdd') from dual;

20111208

select from_unixtime(1323308943) from dual;

2011-12-08 09:49:03

 

unix_timestamp([string date[,string pattern]]) return:bigint

说明:

不带任何参数,则返回当前时间戳

不带参数pattern,则"yyyy-MM-dd HH:mm:ss"格式的字符串到UNIX时间戳。

如果转化失败,则返回0

:

select unix_timestamp() from dual;

1491615665

select unix_timestamp('2011-12-07 13:01:03') from dual;

1323234063

select unix_timestamp('20111207 13:01:03','yyyyMMdd HH:mm:ss') from dual;

1323234063

 

to_date(string/date/datetime timestamp) return:string

说明: 返回日期时间字段中的日期部分。

例:

select to_date('2011-12-08 10:03:01') from dual;

2011-12-08

 

year(string date):int

例:

select year('2011-12-08 10:03:01') from dual;

2011

select year('2012-12-08') from dual;

2012

select month('2011-12-08 10:03:01') from dual;

12

select month('2011-08-08') from dual;

8

select day('2011-12-08 10:03:01') from dual;

8

select day('2011-12-24') from dual;

24

select hour('2011-12-08 10:03:01') from dual;

10

select minute('2011-12-08 10:03:01') from dual;

3

select second('2011-12-08 10:03:01') from dual;

1

 

说明: 返回日期在当年的周数。

select weekofyear('2011-12-08 10:03:01') from dual;

49

 

datediff(string enddate, string startdate) : int天数

例:

select datediff('2012-12-08','2012-05-09') from dual;

 

date_add(string startdate, int days) : string

例:

select date_add('2012-12-08',10) from dual;

2012-12-18

 

date_sub (string startdate, int days) : string

例:

select date_sub('2012-12-08',10) from dual;

2012-11-28

8.1流程控制

select id,

  case id

    when 1 then 'a'

    when 2 then 'b'

    when 3 then 'c'

    when 4 then 'd'

    else 'other'

  end

from t1;

 

 

8.2单词计数

select wc.word,count(1) count from (

  select explode(split(line,' ')) word from dula

) wc

group by wc.word

order by count desc;

 

 

8.3行转列

select

  id,

  name,

  concat_ws(",",collect_set(address)) addr

from address group by id,name;

 

8.4列转行与lateral view

select id, name, tmpTable.address addr

from addr2

lateral view

  explode(split(addr,",")) tmpTable

as address;

 

lateral view 会产生一个支持别名表的虚拟表。

 

8.5分组topN

create table emp

as

select e.id empid,e.deptid,s.salary from t_employee e

join t_salary s on e.id=s.empid;

 

select e.empid id,e.deptid deptid,e.salary salary from(

  select *,

    row_number() over(partition by deptid order by salary) rank

  from emp) e

where e.rank<=3;

 

8.6json函数

select * from json;

json.line

{"movie":"1193","rate":"5","timeStamp":"978300760","uid":"1"}

{"movie":"661","rate":"3","timeStamp":"978302109","uid":"1"}

{"movie":"914","rate":"3","timeStamp":"978301968","uid":"1"}

{"movie":"3408","rate":"4","timeStamp":"978300275","uid":"1"}

 

 

select a.* from json lateral view json_tuple(line,'movie','rate','timeStamp','uid') a as movie,rate,time,uid;

a.movie    a.rate   a.time        a.uid

1193      5       978300760       1

661       3       978302109       1

914       3       978301968       1

3408      4       978300275       1

 

select

   movie,

   rate,

   from_unixtime(cast(time as bigint), 'yyyy-MM-dd HH:mm:ss') time,

   uid

from json

lateral view json_tuple(line,'movie','rate','timeStamp','uid') a

as movie,rate,time,uid;

 

select

   json_tuple(line,'movie','rate','timeStamp','uid') as (movie,rate,time,uid)

from dula;

 

movie     rate    time    uid

1193      5       2001-01-01 06:12:40     1

661       3       2001-01-01 06:35:09     1

914       3       2001-01-01 06:32:48     1

3408      4       2001-01-01 06:04:35     1

 

select

  get_json_object(line,'$.movie') movie,

  get_json_object(line,'$.rate') rate,

  from_unixtime(cast(get_json_object(line,'$.timeStamp') as bigint), 'yyyy-MM-dd HH:mm:ss') time,

  get_json_object(line,'$.uid') uid

from json;

movie     rate    time    uid

1193      5       2001-01-01 06:12:40     1

661       3       2001-01-01 06:35:09     1

914       3       2001-01-01 06:32:48     1

3408      4       2001-01-01 06:04:35     1

 

8.7窗口函数over partition by

 

 

 

例一:sum窗口

需求:

create table t_salary(username   string,month string,salary int)

row format delimited fields terminated by   ',';

load data local inpath   '/root/t_salary.dat' into table t_salary;

 

A,2015-01,5

A,2015-01,15

B,2015-01,5

A,2015-01,8

B,2015-01,25

A,2015-01,5

A,2015-02,4

A,2015-02,6

B,2015-02,10

B,2015-02,5

A,2015-03,14

A,2015-03,16

B,2015-03,20

B,2015-03,5

 

需要得到如下统计结果:

用户   月份   月总额 累计总额

A      2015-01   33   33

A      2015-02   10   43

A      2015-03   30   73

B      2015-01   30   30

B      2015-02   15   45

......   

 

create view sumsal as

select

  username,

  month,

  sum(salary) salary

from t_salary

group by username,month;

 

create table salt as

select

  username,

  month,

  sum(salary) salary

from t_salary

group by username,month;

 

select

  username,

  month,

  salary,

  sum(salary) over(partition by username order by month) accumulativeSalary

from salt;

 

select

  username,

  month,

  salary,

  sum(salary) over(partition by username order by month) accumulativeSalary

from (

  select

    username,

    month,

    sum(salary) salary

  from t_salary

  group by username,month;

);

 

 

 

 

overorder by salary range between 5 preceding and 5 following):窗口范围为当前行数据幅度减55后的范围内的。

create table t2(name string,class   string,s int)

row format delimited fields terminated by   ',';

 

adf,3,45

asdf,3,55

cfe,2,74

3dd,3,78

fda,1,80

gds,2,92

ffd,1,95

dss,1,95

ddd,3,99

gf,3,99

 

load data local inpath   '/root/hive-demo/t2.dat' into table t2;

select * from t2

t2.name t2.class        t2.s

adf       3       45

asdf      3       55

cfe       2       74

3dd       3       78

fda       1       80

gds       2       92

ffd       1       95

dss       1       95

ddd       3       99

gf        3       99

select name,class,s, sum(s)over(order by s range between 2 preceding and 2 following) mm from t2;

name      class   s       mm

adf     3         45      45

asdf      3       55      55

cfe       2       74      74

3dd       3       78      158

fda       1       80      158

gds       2       92      92

dss       1       95      190

ffd       1       95      190

gf        3       99      198

ddd       3       99        198

select name,class,s, sum(s)over(order by s rows between 2 preceding and 2 following) mm from t2;

name      class   s       mm

adf       3       45      174

asdf      3       55      252

cfe       2       74      332

3dd       3       78      379

fda       1       80      419

gds       2       92      440

dss       1       95      461

ffd       1       95      480

gf        3       99      388

ddd       3       99      293

select name,class,s,rank()over(partition by class order by s desc) mm from t2;

name      class   s       mm

dss       1       95      1

ffd       1       95      1

fda       1       80      3

gds       2       92      1

cfe       2       74      2

gf        3       99      1

ddd       3       99      1

3dd       3       78      3

asdf      3       55      4

adf       3       45      5

select name,class,s,dense_rank()over(partition by class order by s desc) mm from t2;

name      class   s       mm

dss       1       95      1

ffd       1       95      1

fda       1       80      2

gds       2       92      1

cfe       2       74      2

gf        3       99      1

ddd       3       99      1

3dd       3       78      2

asdf      3       55      3

adf       3       45      4

 

 

 

 

between 语法:

-- 默认为从起点到当前行

sum(pv) over(partition by cookieid order   by createtime ASC)

--从起点到当前行

sum(pv) over(partition by cookieid order   by createtime rows between unbounded preceding and current row)

--分组内所有行

sum(pv) over(partition by cookieid)

--当前行+往前3

sum(pv) over(partition by cookieid order   by createtime rows between 3 preceding and current row)

--当前行+往前3+往后1

sum(pv) over(partition by cookieid count
 

order by createtime rows between 3   preceding and 1 following)

---当前行+往后所有行

sum(pv) over(partition by cookieid order   by createtime rows between current row and unbounded following) 

 

rows between A and B

current row 当前行

unbounded preceding 起始行

3 preceding 前三行

1 following 后一行

unbounded following 末尾行

 

 

 

 

 

 



了解更多详情请联系 今日值班讲师 或者直接加入千人QQ群进行咨询:210992946

分享到: