
Learn Hive in Nine Chapters - Hive Development

Author: 小牛君 | Published: 2017-06-16



1. Hive Development

Debug mode

To print DEBUG-level logs to the console, start the CLI with:

hive -hiveconf hive.root.logger=DEBUG,console
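Other log4j settings can be overridden the same way. For example (assuming the stock hive-log4j configuration, where hive.log.dir controls where log files are written):

hive -hiveconf hive.root.logger=DEBUG,console -hiveconf hive.log.dir=/tmp/hive-logs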

 

Remote debugging Hive

First start Hive in remote-debug mode:

hive --debug

The CLI will block and print:

Listening for transport dt_socket at address: 8000

Then create a Remote JVM Debug run configuration in IntelliJ IDEA, pointing it at the machine running Hive and port 8000, and launch that configuration in debug mode. Once the debugger attaches, the Hive CLI resumes and breakpoints can be hit.
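Under the hood, --debug starts the Hive JVM with a JDWP agent, roughly equivalent to adding JVM options like the following (the exact option string varies across Hive versions, so treat this as an illustration):

-agentlib:jdwp=transport=dt_socket,server=y,suspend=y,address=8000

The port and suspend behaviour can be tuned with variants such as hive --debug:port=8000,mainSuspend=y,childSuspend=n.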

 

Custom hooks

Hive has no built-in notion of a super administrator: by default, any user can run grant/revoke and similar statements. A pre-analysis (semantic analyzer) hook can enforce that only a designated user may do so.

 

Write a class that extends Hive's AbstractSemanticAnalyzerHook, package it as a jar, and copy it into $HIVE_HOME/lib/.

Then register it in hive-site.xml (the value is the hook's fully qualified class name; the property accepts a comma-separated list of hooks, and the CLI must be restarted for changes to take effect):

<property>
  <name>hive.semantic.analyzer.hook</name>
  <value>hive.AuthHook</value>
</property>

The hook class:

package hive;

import org.apache.hadoop.hive.ql.parse.*;
import org.apache.hadoop.hive.ql.session.SessionState;

public class AuthHook extends AbstractSemanticAnalyzerHook {

    // Default administrator account
    private static String admin = "root";

    /**
     * Acting as a super administrator really means intercepting
     * statements such as CREATE, DROP, GRANT, etc. before they run,
     * which is why the check is done in this pre-analyze hook.
     */
    @Override
    public ASTNode preAnalyze(HiveSemanticAnalyzerHookContext context, ASTNode ast)
            throws SemanticException {
        switch (ast.getToken().getType()) {
            case HiveParser.TOK_CREATEDATABASE:
            case HiveParser.TOK_DROPDATABASE:
            case HiveParser.TOK_CREATEROLE:
            case HiveParser.TOK_DROPROLE:
            case HiveParser.TOK_GRANT:
            case HiveParser.TOK_REVOKE:
            case HiveParser.TOK_GRANT_ROLE:
            case HiveParser.TOK_REVOKE_ROLE:
                String userName = null;
                if (SessionState.get() != null && SessionState.get().getAuthenticator() != null)
                    userName = SessionState.get().getAuthenticator().getUserName();
                if (!admin.equalsIgnoreCase(userName))
                    throw new SemanticException(userName + " can't use ADMIN options, except " + admin + ".");
                break;
            default:
                break;
        }
        return ast;
    }
}
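With the hook registered, admin-style statements from any user other than root are rejected during semantic analysis. A sketch of the effect, assuming a session authenticated as a hypothetical non-admin user test:

hive> grant select on table t1 to user test;
FAILED: SemanticException test can't use ADMIN options, except root.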

Custom UDFs

1. Write a class that extends org.apache.hadoop.hive.ql.exec.UDF.

2. Implement an evaluate method; evaluate can be overloaded.

3. Package the code into a jar and copy it to the target machine.

4. In the Hive client, add the jar: hive> add jar /path/to/jar

An HDFS path also works:

add jar hdfs://mini1:9000/jars/hive.jar;

5. Create a temporary function (a permanent alternative is shown right after this list):

CREATE TEMPORARY FUNCTION function_name AS 'fully.qualified.UdfClassName'

6. Run your HQL statements.

7. Drop the temporary function when finished:

DROP TEMPORARY FUNCTION function_name

Note: a plain UDF is strictly one-in, one-out: one value in, one value out per row.
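On Hive 0.13 and later a function can also be registered permanently, so it survives across sessions; a sketch reusing the HDFS jar path from step 4 and the MyUpper class from the example below:

CREATE FUNCTION myup AS 'lidacui.MyUpper' USING JAR 'hdfs://mini1:9000/jars/hive.jar';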

A function description can be attached by putting the @Description annotation on the class:

 

@Description(
    name = "MyUpper",
    value = " _FUNC_(str) - Returns str with all characters changed to uppercase",
    extended = "Example:\n > SELECT _FUNC_(name) FROM src;")

 

Worked example:

Write the code:

package lidacui;

import org.apache.hadoop.hive.ql.exec.Description;
import org.apache.hadoop.hive.ql.exec.UDF;

@Description(name = "MyUpper",
        value = " _FUNC_(str) - Returns str with all characters changed to uppercase",
        extended = "Example:\n > SELECT _FUNC_(name) FROM src;")
public class MyUpper extends UDF {

    // One in, one out: null stays null, anything else is upper-cased.
    public String evaluate(String text) {
        return text == null ? null : text.toUpperCase();
    }
}
 

Build the jar, upload it, then register and call the function:

 

add jar /root/hive.jar;

create temporary function myUp as 'lidacui.MyUpper';

select myup(line) from dula;
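The @Description metadata is what DESCRIBE FUNCTION prints, so it is an easy way to check that the function registered correctly:

describe function extended myup;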

A second example: a UDF that converts a date into either a Chinese zodiac sign or a constellation.

package lidacui;

import java.sql.Date;
import java.util.Calendar;

import org.apache.hadoop.hive.ql.exec.Description;
import org.apache.hadoop.hive.ql.exec.UDF;

@Description(name = "ZodiacOrConste",
        value = " _FUNC_(date, int) - Converts the given date to a Chinese zodiac sign or a constellation;"
                + " 0 returns the zodiac sign, 1 the constellation",
        extended = "Example:\n"
                + " > SELECT _FUNC_(name, 0) FROM src;\n"
                + " > SELECT _FUNC_(name, 1) FROM src;")
public class ZodiacOrConstellationUDF extends UDF {

    // Index 0 is the monkey because year % 12 == 0 for monkey years (e.g. 2016).
    public final String[] zodiacArr = {"猴", "鸡", "狗", "猪", "鼠", "牛", "虎", "兔", "龙", "蛇", "马", "羊"};
    public final String[] constellationArr = {"水瓶座", "双鱼座", "白羊座", "金牛座", "双子座", "巨蟹座",
            "狮子座", "处女座", "天秤座", "天蝎座", "射手座", "魔羯座"};
    // Day of month on which each constellation starts, January first.
    public final int[] constellationEdgeDay = {20, 19, 21, 21, 21, 22, 23, 23, 23, 23, 22, 22};

    /**
     * @param birthday the date to convert
     * @param type     0 for the zodiac sign, 1 for the constellation
     */
    public String evaluate(Date birthday, Integer type) {
        if (birthday == null)
            return null;
        if (type == 0)
            return getZodica(new java.util.Date(birthday.getTime()));
        else if (type == 1)
            return getConstellation(new java.util.Date(birthday.getTime()));
        return null;
    }

    /** Zodiac sign for the given date. */
    public String getZodica(java.util.Date date) {
        Calendar cal = Calendar.getInstance();
        cal.setTime(date);
        return zodiacArr[cal.get(Calendar.YEAR) % 12];
    }

    /** Constellation for the given date. */
    public String getConstellation(java.util.Date date) {
        Calendar cal = Calendar.getInstance();
        cal.setTime(date);
        int month = cal.get(Calendar.MONTH);
        int day = cal.get(Calendar.DAY_OF_MONTH);
        // Before the edge day the date still belongs to the previous sign.
        if (day < constellationEdgeDay[month])
            month = month - 1;
        if (month == -1)
            month = 11;
        return constellationArr[month];
    }
}
 

add jar /root/hive.jar;

create temporary function zoc as 'lidacui.ZodiacOrConstellationUDF';

select name, time, zoc(time, 0) zodiac, zoc(time, 1) constellation from t1;


Custom GenericUDFs

Worked example:

package com.wangzhe.hivefunc;

import org.apache.hadoop.hive.ql.exec.Description;
import org.apache.hadoop.hive.ql.exec.UDFArgumentException;
import org.apache.hadoop.hive.ql.exec.UDFArgumentLengthException;
import org.apache.hadoop.hive.ql.exec.UDFArgumentTypeException;
import org.apache.hadoop.hive.ql.metadata.HiveException;
import org.apache.hadoop.hive.ql.udf.generic.GenericUDF;
import org.apache.hadoop.hive.ql.udf.generic.GenericUDFUtils;
import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector;

@Description(name = "nvl",
        value = "_FUNC_(expr1, expr2) - Returns expr2 if expr1 is null",
        extended = "Example:\n"
                + " > SELECT _FUNC_(dep, 'Not Applicable') FROM src;\n 'Not Applicable' if dep is null")
public class GenericUDFNVL extends GenericUDF {

    private ObjectInspector[] argumentOIs;
    private GenericUDFUtils.ReturnObjectInspectorResolver returnOIResolver;

    @Override
    public ObjectInspector initialize(ObjectInspector[] arguments) throws UDFArgumentException {
        if (arguments.length != 2)
            throw new UDFArgumentLengthException("The function nvl(expr1, expr2) needs exactly two arguments.");
        for (int i = 0; i < arguments.length; i++)
            if (arguments[i].getCategory() != ObjectInspector.Category.PRIMITIVE)
                throw new UDFArgumentTypeException(i, "Only primitive type arguments are accepted but "
                        + arguments[i].getTypeName() + " is passed.");
        // Keep the argument types for evaluate().
        argumentOIs = arguments;
        // The resolver finds a common return type for both arguments;
        // update() must be called once per argument.
        returnOIResolver = new GenericUDFUtils.ReturnObjectInspectorResolver(true);
        if (!(returnOIResolver.update(arguments[0]) && returnOIResolver.update(arguments[1])))
            throw new UDFArgumentTypeException(1,
                    "The first and the second arguments of function NVL should have the same type, "
                            + "but they are different: \"" + arguments[0].getTypeName() + "\" and \""
                            + arguments[1].getTypeName() + "\"");
        return returnOIResolver.get();
    }

    @Override
    public Object evaluate(DeferredObject[] arguments) throws HiveException {
        // If the first argument is null, fall back to the second.
        Object retVal = returnOIResolver.convertIfNecessary(arguments[0].get(), argumentOIs[0]);
        if (retVal == null)
            retVal = returnOIResolver.convertIfNecessary(arguments[1].get(), argumentOIs[1]);
        return retVal;
    }

    @Override
    public String getDisplayString(String[] children) {
        // Shown in EXPLAIN output.
        StringBuilder sb = new StringBuilder();
        sb.append("if ").append(children[0]).append(" is null returns ").append(children[1]);
        return sb.toString();
    }
}
 
add jar /root/hive.jar;

create temporary function nvl as 'com.wangzhe.hivefunc.GenericUDFNVL';

select nvl(1,2), nvl(null,5), nvl(null,"hello") from t1;
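Every row of t1 should then produce 1, 5, hello: the first argument is returned unless it is null, in which case the second one is used.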

 

The three UDFType attributes

1. deterministic

A UDF is deterministic if, given the same argument values, it returns the same result no matter how many times it is called. A row-number UDF, like the sketch after this list, is an example of a non-deterministic one.

 

2. stateful

A stateful UDF keeps mutable state inside the UDF instance, and each call to its evaluate method may change that state; the row-number counter is the classic case.

 

3. distinctLike

This flag matters mainly for aggregate functions: if feeding the aggregation two identical values is guaranteed to give the same result as feeding it just one of them, distinctLike is true; otherwise it is false.
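The flags are declared with the @UDFType annotation (org.apache.hadoop.hive.ql.udf.UDFType) on the UDF class. A minimal sketch of the row-number UDF referenced above (RowNumberUDF is a hypothetical class written for illustration):

package lidacui;

import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.hive.ql.udf.UDFType;

// Not deterministic: the result depends on how many rows this instance has
// already processed, not only on the arguments. Stateful: the counter below
// is mutated on every call to evaluate().
@UDFType(deterministic = false, stateful = true)
public class RowNumberUDF extends UDF {

    private long counter = 0;

    // Returns 1, 2, 3, ... for successive rows seen by this UDF instance.
    public long evaluate() {
        return ++counter;
    }
}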

 



