Winse Blog


HiveServer2 UI and Upgrading to Hive 2.0.0

The standard steps for upgrading Hive:

  • Update the metastore metadata, i.e. run the upgrade SQL scripts. Back up the existing database before upgrading!! (A schematool-based alternative is sketched right after this list.)
  • Adjust dependencies; in my case that means upgrading Spark. For building it, see spark-without-hive.
  • Modify parameters (hive/spark/hadoop) to suit the new versions.
  • HiveServer2 UI: start the hiveserver2 service and browse to port 10002. See the UI configuration.
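
As an alternative to sourcing the SQL scripts by hand (which is what the detailed steps below do), Hive ships a schematool wrapper that runs the same upgrade scripts against the metastore. A minimal sketch, assuming the metastore connection settings are already in hive-site.xml and the database has been backed up:

# let schematool apply the 1.2.0 -> 2.0.0 metastore upgrade scripts
[hadoop@file1 apache-hive-2.0.0-bin]$ bin/schematool -dbType mysql -upgradeSchemaFrom 1.2.0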

Environment:

  • CentOS 5
  • hadoop-2.6.3
  • spark-1.6.0-without-hive
  • hive-2.0.0

Detailed steps

# Backup
[hadoop@file1 tools]$ mysqldump -uroot -p hive >hive1.2.1-20160413.backup.sql

# Directory layout once the new release is in place
[hadoop@file1 ~]$ ll
total 20
drwxrwxr-x 3 hadoop hadoop 4096 04-13 11:59 collect
drwx------ 3 hadoop hadoop 4096 04-07 16:43 dfs
lrwxrwxrwx 1 hadoop hadoop   18 04-11 10:09 hadoop -> tools/hadoop-2.6.3
lrwxrwxrwx 1 hadoop hadoop   40 04-13 10:26 hive -> /home/hadoop/tools/apache-hive-2.0.0-bin
lrwxrwxrwx 1 hadoop hadoop   42 04-13 10:52 spark -> tools/spark-1.6.0-bin-hadoop2-without-hive
drwxrwxr-x 6 hadoop hadoop 4096 04-13 12:10 tmp
drwxrwxr-x 9 hadoop hadoop 4096 04-13 11:48 tools
[hadoop@file1 tools]$ ll
total 84
drwxrwxr-x  8 hadoop hadoop  4096 04-08 09:25 apache-hive-1.2.1-bin
drwxrwxr-x  8 hadoop hadoop  4096 04-13 10:16 apache-hive-2.0.0-bin
drwxr-xr-x 11 hadoop hadoop  4096 04-07 16:34 hadoop-2.6.3
-rw-rw-r--  1 hadoop hadoop 46879 04-13 10:11 hive1.2.1-20160413.backup.sql
drwxrwxr-x  2 hadoop hadoop  4096 03-31 15:28 mysql
lrwxrwxrwx  1 hadoop hadoop    36 04-13 10:17 spark -> spark-1.6.0-bin-hadoop2-without-hive
drwxrwxr-x 11 hadoop hadoop  4096 04-07 18:23 spark-1.3.1-bin-hadoop2.6.3-without-hive
drwxrwxr-x 11 hadoop hadoop  4096 03-28 11:15 spark-1.6.0-bin-hadoop2-without-hive
drwxr-xr-x 11 hadoop hadoop  4096 03-31 16:14 zookeeper-3.4.6

# My environment variables point at the symlinks, so switching versions is just a matter of repointing the links. Adjust to your own setup.
# Create a spark symlink next to apache-hive-2.0.0-bin, or set SPARK_HOME in hive-env.sh.
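# (Assumed commands, matching the symlinks shown in the listings above) repoint the links to the new release:
[hadoop@file1 ~]$ ln -sfn /home/hadoop/tools/apache-hive-2.0.0-bin ~/hive
[hadoop@file1 tools]$ ln -sfn spark-1.6.0-bin-hadoop2-without-hive spark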

# hive-1.2.1 does not have the transaction (txn) tables, so run hive-txn-schema-2.0.0.mysql.sql separately first,
# then run the upgrade script (the Duplicate column errors further down are harmless).
[hadoop@file1 tools]$ cd apache-hive-2.0.0-bin/scripts/metastore/upgrade/mysql/
[hadoop@file1 mysql]$ mysql -uroot -p
Enter password: 
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 10765
Server version: 5.5.48 MySQL Community Server (GPL)

Copyright (c) 2000, 2016, Oracle and/or its affiliates. All rights reserved.

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql> use hive;
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A

Database changed
mysql> source hive-txn-schema-2.0.0.mysql.sql
Query OK, 0 rows affected (0.01 sec)

Query OK, 0 rows affected (0.00 sec)

Query OK, 0 rows affected (0.04 sec)

Query OK, 0 rows affected (0.03 sec)

Query OK, 1 row affected (0.04 sec)

Query OK, 0 rows affected (0.00 sec)

Query OK, 0 rows affected (0.03 sec)
Records: 0  Duplicates: 0  Warnings: 0

Query OK, 0 rows affected (0.01 sec)

Query OK, 1 row affected (0.00 sec)

Query OK, 0 rows affected (0.01 sec)

Query OK, 0 rows affected (0.01 sec)

Query OK, 0 rows affected (0.00 sec)

Query OK, 1 row affected (0.00 sec)

Query OK, 0 rows affected (0.01 sec)

mysql> source upgrade-1.2.0-to-2.0.0.mysql.sql
+------------------------------------------------+
|                                                |
+------------------------------------------------+
| Upgrading MetaStore schema from 1.2.0 to 2.0.0 |
+------------------------------------------------+
1 row in set, 1 warning (0.00 sec)

+---------------------------------------------------------------------------------------------------------------+
|                                                                                                               |
+---------------------------------------------------------------------------------------------------------------+
| < HIVE-7018 Remove Table and Partition tables column LINK_TARGET_ID from Mysql for other DBs do not have it > |
+---------------------------------------------------------------------------------------------------------------+
1 row in set, 1 warning (0.00 sec)

Query OK, 0 rows affected, 1 warning (0.03 sec)

Query OK, 0 rows affected, 1 warning (0.00 sec)

Query OK, 0 rows affected, 1 warning (0.00 sec)

Query OK, 0 rows affected (0.00 sec)

Query OK, 0 rows affected (0.00 sec)

Query OK, 0 rows affected (0.00 sec)

+---------------------------------+
| Completed remove LINK_TARGET_ID |
+---------------------------------+
| Completed remove LINK_TARGET_ID |
+---------------------------------+
1 row in set (0.02 sec)

Query OK, 0 rows affected (0.02 sec)

Query OK, 31 rows affected (0.01 sec)
Records: 31  Duplicates: 0  Warnings: 0

Query OK, 0 rows affected (0.05 sec)
Records: 0  Duplicates: 0  Warnings: 0

Query OK, 0 rows affected (0.02 sec)
Records: 0  Duplicates: 0  Warnings: 0

Query OK, 0 rows affected (0.00 sec)
Records: 0  Duplicates: 0  Warnings: 0

Query OK, 0 rows affected (0.03 sec)
Records: 0  Duplicates: 0  Warnings: 0

Query OK, 0 rows affected (0.00 sec)
Records: 0  Duplicates: 0  Warnings: 0

ERROR 1060 (42S21): Duplicate column name 'CQ_HIGHEST_TXN_ID'
ERROR 1060 (42S21): Duplicate column name 'CQ_META_INFO'
ERROR 1060 (42S21): Duplicate column name 'CQ_HADOOP_JOB_ID'
ERROR 1050 (42S01): Table 'COMPLETED_COMPACTIONS' already exists
ERROR 1060 (42S21): Duplicate column name 'TXN_AGENT_INFO'
ERROR 1060 (42S21): Duplicate column name 'TXN_HEARTBEAT_COUNT'
ERROR 1060 (42S21): Duplicate column name 'HL_HEARTBEAT_COUNT'
ERROR 1060 (42S21): Duplicate column name 'TXN_META_INFO'
ERROR 1060 (42S21): Duplicate column name 'HL_AGENT_INFO'
ERROR 1060 (42S21): Duplicate column name 'HL_BLOCKEDBY_EXT_ID'
ERROR 1060 (42S21): Duplicate column name 'HL_BLOCKEDBY_INT_ID'
ERROR 1050 (42S01): Table 'AUX_TABLE' already exists
Query OK, 1 row affected (0.01 sec)
Rows matched: 1  Changed: 1  Warnings: 0

+---------------------------------------------------------+
|                                                         |
+---------------------------------------------------------+
| Finished upgrading MetaStore schema from 1.2.0 to 2.0.0 |
+---------------------------------------------------------+
1 row in set, 1 warning (0.00 sec)

# Copy the old Hive configuration and the dependency jars

[hadoop@file1 mysql]$ cd ~/tools/apache-hive-2.0.0-bin/conf/
[hadoop@file1 conf]$ cp ~/tools/apache-hive-1.2.1-bin/conf/hive-site.xml ./
[hadoop@file1 conf]$ cp ~/tools/apache-hive-1.2.1-bin/conf/spark-defaults.conf ./
[hadoop@file1 conf]$ cp ~/tools/apache-hive-1.2.1-bin/conf/hive-env.sh ./

# Spark needs a larger PermSize
[hadoop@file1 hive]$ vi conf/hive-env.sh
export HADOOP_USER_CLASSPATH_FIRST=true
export HADOOP_OPTS="$HADOOP_OPTS -XX:MaxPermSize=256m"

[hadoop@file1 conf]$ cd ../lib/
[hadoop@file1 lib]$ cp ~/tools/apache-hive-1.2.1-bin/lib/mysql-connector-java-5.1.34.jar ./

# On CentOS 5 the following two jars have to be removed; on CentOS 6 this is unnecessary
[hadoop@file1 apache-hive-2.0.0-bin]$ rm lib/hive-jdbc-2.0.0-standalone.jar 
[hadoop@file1 apache-hive-2.0.0-bin]$ rm lib/snappy-java-1.0.5.jar 

# Update Spark to 1.6.0

# http://spark.apache.org/docs/latest/hadoop-provided.html
# http://stackoverflow.com/questions/30906412/noclassdeffounderror-com-apache-hadoop-fs-fsdatainputstream-when-execute-spark-s
[hadoop@file1 apache-hive-2.0.0-bin]$ cd ~/tools/spark-1.6.0-bin-hadoop2-without-hive/conf/
[hadoop@file1 conf]$ cp spark-env.sh.template spark-env.sh
[hadoop@file1 conf]$ vi spark-env.sh
HADOOP_HOME=/home/hadoop/hadoop
SPARK_DIST_CLASSPATH=`$HADOOP_HOME/bin/hadoop classpath`

[hadoop@file1 ~]$ cp ~/tools/spark-1.6.0-bin-hadoop2-without-hive/lib/spark-1.6.0-yarn-shuffle.jar ~/tools/hadoop-2.6.3/share/hadoop/yarn/
[hadoop@file1 ~]$ rm ~/tools/hadoop-2.6.3/share/hadoop/yarn/spark-1.3.1-yarn-shuffle.jar 

[hadoop@file1 ~]$ rsync -vaz --delete ~/tools/hadoop-2.6.3/share file2:~/tools/hadoop-2.6.3/ 
[hadoop@file1 ~]$ rsync -vaz --delete ~/tools/hadoop-2.6.3/share file3:~/tools/hadoop-2.6.3/ 

[hadoop@file1 ~]$ hdfs dfs -put ~/tools/spark-1.6.0-bin-hadoop2-without-hive/lib/spark-assembly-1.6.0-hadoop2.6.3.jar /spark/

[hadoop@file1 apache-hive-2.0.0-bin]$ vi conf/spark-defaults.conf 
spark.yarn.jar    hdfs:///spark/spark-assembly-1.6.0-hadoop2.6.3.jar

# Restart YARN (if you use hiveserver2, read on first: the configuration is changed and the cluster restarted again later)

[hadoop@file1 apache-hive-2.0.0-bin]$ cd ~/tools/hadoop-2.6.3/
[hadoop@file1 hadoop-2.6.3]$ sbin/stop-yarn.sh 
[hadoop@file1 hadoop-2.6.3]$ sbin/start-yarn.sh 

At this point the plain hive CLI works fine, but hiveserver2 still has problems.
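
A quick way to confirm this (my own check, reusing the table that is queried later in this post) is to run a count through the hive CLI with the Spark engine:

[hadoop@file1 hive]$ bin/hive -e "set hive.execution.engine=spark; select count(*) from t_info where edate=20160413;"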

# Start hiveserver2
[hadoop@file1 hive]$ nohup bin/hiveserver2 &

# Start the Spark history server
[hadoop@file1 spark]$ cat start-historyserver.sh 
source $HADOOP_HOME/libexec/hadoop-config.sh
sbin/start-history-server.sh hdfs:///spark-eventlogs

[hadoop@file1 hive]$ bin/beeline -u jdbc:hive2://file1:10000/ -n hadoop -p hadoop
which: no hbase in (/home/hadoop/hadoop/bin:/home/hadoop/hive/bin:/opt/jdk1.7.0_60/bin:/usr/kerberos/bin:/usr/local/bin:/bin:/usr/bin:/home/hadoop/tools/hadoop-2.6.3/bin:/home/hadoop/tools/hadoop-2.6.3:/home/hadoop/tools/apache-hive-1.2.1-bin:/home/hadoop/bin)
ls: /home/hadoop/hive/lib/hive-jdbc-*-standalone.jar: No such file or directory
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/home/hadoop/tools/apache-hive-2.0.0-bin/lib/log4j-slf4j-impl-2.4.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/home/hadoop/tools/hadoop-2.6.3/share/hadoop/common/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
Connecting to jdbc:hive2://file1:10000/
Error: Failed to open new session: java.lang.RuntimeException: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.authorize.AuthorizationException): User: hadoop is not allowed to impersonate hadoop (state=,code=0)
Beeline version 2.0.0 by Apache Hive
beeline> 

Beeline fails to connect to hiveserver2: the impersonated hadoop user is not authorized. The Hadoop configuration has to be changed.
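
The fix applied below adds proxyuser entries to core-site.xml. As an alternative (not what this post does), impersonation can be disabled in hive-site.xml so queries run as the hiveserver2 process user instead:

<property>
<name>hive.server2.enable.doAs</name><value>false</value>
</property>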

# https://community.hortonworks.com/questions/4905/error-while-running-hive-queries-from-zeppelin.html
# http://stackoverflow.com/questions/25073792/error-e0902-exception-occured-user-root-is-not-allowed-to-impersonate-root
# Add the following to core-site.xml and restart the cluster's HDFS & YARN
<property>
<name>hadoop.proxyuser.hadoop.hosts</name><value>*</value>
</property>
<property>
<name>hadoop.proxyuser.hadoop.groups</name><value>*</value>
</property>
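
# (Assumed step, following the rsync pattern used earlier) push the updated core-site.xml to the other nodes before restarting:
[hadoop@file1 ~]$ rsync -vaz ~/tools/hadoop-2.6.3/etc/hadoop/core-site.xml file2:~/tools/hadoop-2.6.3/etc/hadoop/
[hadoop@file1 ~]$ rsync -vaz ~/tools/hadoop-2.6.3/etc/hadoop/core-site.xml file3:~/tools/hadoop-2.6.3/etc/hadoop/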

[hadoop@file1 hadoop-2.6.3]$ sbin/stop-all.sh
[hadoop@file1 hadoop-2.6.3]$ sbin/start-all.sh 

[hadoop@file1 hive]$ bin/beeline -u jdbc:hive2://file1:10000 -n hadoop -p hadoop
...
0: jdbc:hive2://file1:10000/> set hive.execution.engine=spark;
No rows affected (0.019 seconds)
0: jdbc:hive2://file1:10000/> select count(*) from t_info where edate=20160413;
INFO  : Compiling command(queryId=hadoop_20160413114039_f930d3e7-af83-4b12-a536-404a4e20eeea): select count(*) from t_info where edate=20160413
INFO  : Semantic Analysis Completed
INFO  : Returning Hive schema: Schema(fieldSchemas:[FieldSchema(name:c0, type:bigint, comment:null)], properties:null)
INFO  : Completed compiling command(queryId=hadoop_20160413114039_f930d3e7-af83-4b12-a536-404a4e20eeea); Time taken: 0.523 seconds
INFO  : Executing command(queryId=hadoop_20160413114039_f930d3e7-af83-4b12-a536-404a4e20eeea): select count(*) from t_info where edate=20160413
INFO  : Query ID = hadoop_20160413114039_f930d3e7-af83-4b12-a536-404a4e20eeea
INFO  : Total jobs = 1
INFO  : Launching Job 1 out of 1
INFO  : Starting task [Stage-1:MAPRED] in serial mode

INFO  : 
Query Hive on Spark job[0] stages:
INFO  : 0
INFO  : 1
INFO  : 
Status: Running (Hive on Spark job[0])
INFO  : Job Progress Format
CurrentTime StageId_StageAttemptId: SucceededTasksCount(+RunningTasksCount-FailedTasksCount)/TotalTasksCount [StageCost]
INFO  : 2016-04-13 11:41:20,519 Stage-0_0: 0(+8)/10     Stage-1_0: 0/1
INFO  : 2016-04-13 11:41:23,577 Stage-0_0: 0(+8)/10     Stage-1_0: 0/1
INFO  : 2016-04-13 11:41:26,817 Stage-0_0: 0(+8)/10     Stage-1_0: 0/1
INFO  : 2016-04-13 11:41:29,858 Stage-0_0: 0(+8)/10     Stage-1_0: 0/1
INFO  : 2016-04-13 11:41:32,903 Stage-0_0: 0(+8)/10     Stage-1_0: 0/1
INFO  : 2016-04-13 11:41:35,942 Stage-0_0: 0(+8)/10     Stage-1_0: 0/1
INFO  : 2016-04-13 11:41:37,969 Stage-0_0: 0(+9)/10     Stage-1_0: 0/1
INFO  : 2016-04-13 11:41:38,981 Stage-0_0: 1(+8)/10     Stage-1_0: 0/1
INFO  : 2016-04-13 11:41:39,994 Stage-0_0: 3(+7)/10     Stage-1_0: 0/1
INFO  : 2016-04-13 11:41:43,030 Stage-0_0: 3(+7)/10     Stage-1_0: 0/1
INFO  : 2016-04-13 11:41:45,056 Stage-0_0: 5(+5)/10     Stage-1_0: 0/1
INFO  : 2016-04-13 11:41:46,072 Stage-0_0: 6(+4)/10     Stage-1_0: 0/1
INFO  : 2016-04-13 11:41:47,085 Stage-0_0: 8(+2)/10     Stage-1_0: 0/1
INFO  : 2016-04-13 11:41:48,096 Stage-0_0: 9(+1)/10     Stage-1_0: 0/1
INFO  : 2016-04-13 11:41:51,125 Stage-0_0: 9(+1)/10     Stage-1_0: 0/1
INFO  : 2016-04-13 11:41:52,134 Stage-0_0: 10/10 Finished       Stage-1_0: 1/1 Finished
INFO  : Status: Finished successfully in 64.78 seconds
INFO  : Completed executing command(queryId=hadoop_20160413114039_f930d3e7-af83-4b12-a536-404a4e20eeea); Time taken: 71.767 seconds
INFO  : OK
+-----------+--+
|    c0     |
+-----------+--+
| 89867722  |
+-----------+--+
1 row selected (72.45 seconds)

Part of the reason for upgrading was to see what the new UI looks like, and it is a bit of a letdown: the functionality is very limited. It only shows the currently running SQL and the active sessions; there is no way to view history. Hopefully future releases ship a more capable UI.

After the upgrade the up/down arrow keys no longer cycle through history in beeline, and hive-2.0.0 does not offer much else that is compelling (Hive 2 is getting ready to drop MapReduce). If you are unhappy with it, you can simply switch the symlinks back to hive-1.2.1 + spark-1.3.1 (tested, it works; remember to change spark.yarn.jar as well). A rollback sketch follows.
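
A minimal rollback sketch, assuming the symlink layout from the listings above (exact 1.3.1 jar names would need to be checked):

[hadoop@file1 ~]$ ln -sfn /home/hadoop/tools/apache-hive-1.2.1-bin ~/hive
[hadoop@file1 tools]$ ln -sfn spark-1.3.1-bin-hadoop2.6.3-without-hive spark
# restore the spark-1.3.1 yarn-shuffle jar under hadoop's share/hadoop/yarn/ if it was deleted,
# and point spark.yarn.jar in conf/spark-defaults.conf back at the 1.3.1 assembly on HDFS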

–END
