Article
hiveserver2 ui and upgrade hive2.0.0
升级hive的标准动作:
- 更新metadata,就是执行sql语句。更新前先备份原来的库!!
- 调整依赖,我这里是升级spark,编译参考spark-without-hive
- 修改参数(hive/spark/hadoop)来适应新版本
- hiveserver2 ui:启动hiveserver2服务,访问10002端口即可。UI配置
环境说明:
- centos5
- hadoop-2.6.3
- spark-1.6.0-without-hive
- hive-2.0.0
# 操作详情
# 备份
[hadoop@file1 tools]$ mysqldump -uroot -p hive >hive1.2.1-20160413.backup.sql
# 准备好程序后的目录结构
[hadoop@file1 ~]$ ll
总计 20
drwxrwxr-x 3 hadoop hadoop 4096 04-13 11:59 collect
drwx------ 3 hadoop hadoop 4096 04-07 16:43 dfs
lrwxrwxrwx 1 hadoop hadoop 18 04-11 10:09 hadoop -> tools/hadoop-2.6.3
lrwxrwxrwx 1 hadoop hadoop 40 04-13 10:26 hive -> /home/hadoop/tools/apache-hive-2.0.0-bin
lrwxrwxrwx 1 hadoop hadoop 42 04-13 10:52 spark -> tools/spark-1.6.0-bin-hadoop2-without-hive
drwxrwxr-x 6 hadoop hadoop 4096 04-13 12:10 tmp
drwxrwxr-x 9 hadoop hadoop 4096 04-13 11:48 tools
[hadoop@file1 tools]$ ll
总计 84
drwxrwxr-x 8 hadoop hadoop 4096 04-08 09:25 apache-hive-1.2.1-bin
drwxrwxr-x 8 hadoop hadoop 4096 04-13 10:16 apache-hive-2.0.0-bin
drwxr-xr-x 11 hadoop hadoop 4096 04-07 16:34 hadoop-2.6.3
-rw-rw-r-- 1 hadoop hadoop 46879 04-13 10:11 hive1.2.1-20160413.backup.sql
drwxrwxr-x 2 hadoop hadoop 4096 03-31 15:28 mysql
lrwxrwxrwx 1 hadoop hadoop 36 04-13 10:17 spark -> spark-1.6.0-bin-hadoop2-without-hive
drwxrwxr-x 11 hadoop hadoop 4096 04-07 18:23 spark-1.3.1-bin-hadoop2.6.3-without-hive
drwxrwxr-x 11 hadoop hadoop 4096 03-28 11:15 spark-1.6.0-bin-hadoop2-without-hive
drwxr-xr-x 11 hadoop hadoop 4096 03-31 16:14 zookeeper-3.4.6
# 环境变量我直接加载的是link软链接的,我这直接修改软链就行了。根据情况调整。
# apache-hive-2.0.0-bin同级目录建立spark软链接,或者再hive-env.sh中指定SPARK_HOME的位置
# hive-1.2.1并没有txn的表,所有要单独执行下hive-txn-schema-2.0.0.mysql.sql,
# 然后再更新(后面的Duplicate column的错没问题的)
[hadoop@file1 tools]$ cd apache-hive-2.0.0-bin/scripts/metastore/upgrade/mysql/
[hadoop@file1 mysql]$ mysql -uroot -p
Enter password:
Welcome to the MySQL monitor. Commands end with ; or \g.
Your MySQL connection id is 10765
Server version: 5.5.48 MySQL Community Server (GPL)
Copyright (c) 2000, 2016, Oracle and/or its affiliates. All rights reserved.
Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.
Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
mysql> use hive;
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A
Database changed
mysql> source hive-txn-schema-2.0.0.mysql.sql
Query OK, 0 rows affected (0.01 sec)
Query OK, 0 rows affected (0.00 sec)
Query OK, 0 rows affected (0.04 sec)
Query OK, 0 rows affected (0.03 sec)
Query OK, 1 row affected (0.04 sec)
Query OK, 0 rows affected (0.00 sec)
Query OK, 0 rows affected (0.03 sec)
Records: 0 Duplicates: 0 Warnings: 0
Query OK, 0 rows affected (0.01 sec)
Query OK, 1 row affected (0.00 sec)
Query OK, 0 rows affected (0.01 sec)
Query OK, 0 rows affected (0.01 sec)
Query OK, 0 rows affected (0.00 sec)
Query OK, 1 row affected (0.00 sec)
Query OK, 0 rows affected (0.01 sec)
mysql> source upgrade-1.2.0-to-2.0.0.mysql.sql
+------------------------------------------------+
| |
+------------------------------------------------+
| Upgrading MetaStore schema from 1.2.0 to 2.0.0 |
+------------------------------------------------+
1 row in set, 1 warning (0.00 sec)
+---------------------------------------------------------------------------------------------------------------+
| |
+---------------------------------------------------------------------------------------------------------------+
| < HIVE-7018 Remove Table and Partition tables column LINK_TARGET_ID from Mysql for other DBs do not have it > |
+---------------------------------------------------------------------------------------------------------------+
1 row in set, 1 warning (0.00 sec)
Query OK, 0 rows affected, 1 warning (0.03 sec)
Query OK, 0 rows affected, 1 warning (0.00 sec)
Query OK, 0 rows affected, 1 warning (0.00 sec)
Query OK, 0 rows affected (0.00 sec)
Query OK, 0 rows affected (0.00 sec)
Query OK, 0 rows affected (0.00 sec)
+---------------------------------+
| Completed remove LINK_TARGET_ID |
+---------------------------------+
| Completed remove LINK_TARGET_ID |
+---------------------------------+
1 row in set (0.02 sec)
Query OK, 0 rows affected (0.02 sec)
Query OK, 31 rows affected (0.01 sec)
Records: 31 Duplicates: 0 Warnings: 0
Query OK, 0 rows affected (0.05 sec)
Records: 0 Duplicates: 0 Warnings: 0
Query OK, 0 rows affected (0.02 sec)
Records: 0 Duplicates: 0 Warnings: 0
Query OK, 0 rows affected (0.00 sec)
Records: 0 Duplicates: 0 Warnings: 0
Query OK, 0 rows affected (0.03 sec)
Records: 0 Duplicates: 0 Warnings: 0
Query OK, 0 rows affected (0.00 sec)
Records: 0 Duplicates: 0 Warnings: 0
ERROR 1060 (42S21): Duplicate column name 'CQ_HIGHEST_TXN_ID'
ERROR 1060 (42S21): Duplicate column name 'CQ_META_INFO'
ERROR 1060 (42S21): Duplicate column name 'CQ_HADOOP_JOB_ID'
ERROR 1050 (42S01): Table 'COMPLETED_COMPACTIONS' already exists
ERROR 1060 (42S21): Duplicate column name 'TXN_AGENT_INFO'
ERROR 1060 (42S21): Duplicate column name 'TXN_HEARTBEAT_COUNT'
ERROR 1060 (42S21): Duplicate column name 'HL_HEARTBEAT_COUNT'
ERROR 1060 (42S21): Duplicate column name 'TXN_META_INFO'
ERROR 1060 (42S21): Duplicate column name 'HL_AGENT_INFO'
ERROR 1060 (42S21): Duplicate column name 'HL_BLOCKEDBY_EXT_ID'
ERROR 1060 (42S21): Duplicate column name 'HL_BLOCKEDBY_INT_ID'
ERROR 1050 (42S01): Table 'AUX_TABLE' already exists
Query OK, 1 row affected (0.01 sec)
Rows matched: 1 Changed: 1 Warnings: 0
+---------------------------------------------------------+
| |
+---------------------------------------------------------+
| Finished upgrading MetaStore schema from 1.2.0 to 2.0.0 |
+---------------------------------------------------------+
1 row in set, 1 warning (0.00 sec)
# 拷贝hive原来的配置和依赖jar
[hadoop@file1 mysql]$ cd ~/tools/apache-hive-2.0.0-bin/conf/
[hadoop@file1 conf]$ cp ~/tools/apache-hive-1.2.1-bin/conf/hive-site.xml ./
[hadoop@file1 conf]$ cp ~/tools/apache-hive-1.2.1-bin/conf/spark-defaults.conf ./
[hadoop@file1 conf]$ cp ~/tools/apache-hive-1.2.1-bin/conf/hive-env.sh ./
# 用到spark需要加大PermSize
[hadoop@file1 hive]$ vi conf/hive-env.sh
export HADOOP_USER_CLASSPATH_FIRST=true
export HADOOP_OPTS="$HADOOP_OPTS -XX:MaxPermSize=256m"
[hadoop@file1 conf]$ cd ../lib/
[hadoop@file1 lib]$ cp ~/tools/apache-hive-1.2.1-bin/lib/mysql-connector-java-5.1.34.jar ./
# centos5需要删除下面两个jar,centos6没必要删
[hadoop@file1 apache-hive-2.0.0-bin]$ rm lib/hive-jdbc-2.0.0-standalone.jar
[hadoop@file1 apache-hive-2.0.0-bin]$ rm lib/snappy-java-1.0.5.jar
# spark-1.6.0更新
# http://spark.apache.org/docs/latest/hadoop-provided.html
# http://stackoverflow.com/questions/30906412/noclassdeffounderror-com-apache-hadoop-fs-fsdatainputstream-when-execute-spark-s
[hadoop@file1 apache-hive-2.0.0-bin]$ cd ~/tools/spark-1.6.0-bin-hadoop2-without-hive/conf/
[hadoop@file1 conf]$ cp spark-env.sh.template spark-env.sh
[hadoop@file1 conf]$ vi spark-env.sh
HADOOP_HOME=/home/hadoop/hadoop
SPARK_DIST_CLASSPATH=`$HADOOP_HOME/bin/hadoop classpath`
[hadoop@file1 ~]$ cp ~/tools/spark-1.6.0-bin-hadoop2-without-hive/lib/spark-1.6.0-yarn-shuffle.jar ~/tools/hadoop-2.6.3/share/hadoop/yarn/
[hadoop@file1 ~]$ rm ~/tools/hadoop-2.6.3/share/hadoop/yarn/spark-1.3.1-yarn-shuffle.jar
[hadoop@file1 ~]$ rsync -vaz --delete ~/tools/hadoop-2.6.3/share file2:~/tools/hadoop-2.6.3/
[hadoop@file1 ~]$ rsync -vaz --delete ~/tools/hadoop-2.6.3/share file3:~/tools/hadoop-2.6.3/
[hadoop@file1 ~]$ hdfs dfs -put ~/tools/spark-1.6.0-bin-hadoop2-without-hive/lib/spark-assembly-1.6.0-hadoop2.6.3.jar /spark/
[hadoop@file1 apache-hive-2.0.0-bin]$ vi conf/spark-defaults.conf
spark.yarn.jar hdfs:///spark/spark-assembly-1.6.0-hadoop2.6.3.jar
# 重启yarn(如果你用hiveserver2,先往下看,后面还会修改配置重启的)
[hadoop@file1 apache-hive-2.0.0-bin]$ cd ~/tools/hadoop-2.6.3/
[hadoop@file1 hadoop-2.6.3]$ sbin/stop-yarn.sh
[hadoop@file1 hadoop-2.6.3]$ sbin/start-yarn.sh
更新到这里,执行hive命令是ok了的。但是hiveserver还有问题。
# 启动hiveserver2
[hadoop@file1 hive]$ nohup bin/hiveserver2 &
# 启动spark historyserver
[hadoop@file1 spark]$ cat start-historyserver.sh
source $HADOOP_HOME/libexec/hadoop-config.sh
sbin/start-history-server.sh hdfs:///spark-eventlogs
[hadoop@file1 hive]$ bin/beeline -u jdbc:hive2://file1:10000/ -n hadoop -p hadoop
which: no hbase in (/home/hadoop/hadoop/bin:/home/hadoop/hive/bin:/opt/jdk1.7.0_60/bin:/usr/kerberos/bin:/usr/local/bin:/bin:/usr/bin:/home/hadoop/tools/hadoop-2.6.3/bin:/home/hadoop/tools/hadoop-2.6.3:/home/hadoop/tools/apache-hive-1.2.1-bin:/home/hadoop/bin)
ls: /home/hadoop/hive/lib/hive-jdbc-*-standalone.jar: 没有那个文件或目录
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/home/hadoop/tools/apache-hive-2.0.0-bin/lib/log4j-slf4j-impl-2.4.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/home/hadoop/tools/hadoop-2.6.3/share/hadoop/common/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
Connecting to jdbc:hive2://file1:10000/
Error: Failed to open new session: java.lang.RuntimeException: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.authorize.AuthorizationException): User: hadoop is not allowed to impersonate hadoop (state=,code=0)
Beeline version 2.0.0 by Apache Hive
beeline>
Beeline连接hiveserver2失败,模拟的hadoop用户授权失败。需要修改hadoop的参数。
# https://community.hortonworks.com/questions/4905/error-while-running-hive-queries-from-zeppelin.html
# http://stackoverflow.com/questions/25073792/error-e0902-exception-occured-user-root-is-not-allowed-to-impersonate-root
# core-site.xml添加,并重启集群hdfs & yarn
<property>
<name>hadoop.proxyuser.hadoop.hosts</name><value>*</value>
</property>
<property>
<name>hadoop.proxyuser.hadoop.groups</name><value>*</value>
</property>
[hadoop@file1 hadoop-2.6.3]$ sbin/stop-all.sh
[hadoop@file1 hadoop-2.6.3]$ sbin/start-all.sh
[hadoop@file1 hive]$ bin/beeline -u jdbc:hive2://file1:10000 -n hadoop -p hadoop
...
0: jdbc:hive2://file1:10000/> set hive.execution.engine=spark;
No rows affected (0.019 seconds)
0: jdbc:hive2://file1:10000/> select count(*) from t_info where edate=20160413;
INFO : Compiling command(queryId=hadoop_20160413114039_f930d3e7-af83-4b12-a536-404a4e20eeea): select count(*) from t_info where edate=20160413
INFO : Semantic Analysis Completed
INFO : Returning Hive schema: Schema(fieldSchemas:[FieldSchema(name:c0, type:bigint, comment:null)], properties:null)
INFO : Completed compiling command(queryId=hadoop_20160413114039_f930d3e7-af83-4b12-a536-404a4e20eeea); Time taken: 0.523 seconds
INFO : Executing command(queryId=hadoop_20160413114039_f930d3e7-af83-4b12-a536-404a4e20eeea): select count(*) from t_info where edate=20160413
INFO : Query ID = hadoop_20160413114039_f930d3e7-af83-4b12-a536-404a4e20eeea
INFO : Total jobs = 1
INFO : Launching Job 1 out of 1
INFO : Starting task [Stage-1:MAPRED] in serial mode
INFO :
Query Hive on Spark job[0] stages:
INFO : 0
INFO : 1
INFO :
Status: Running (Hive on Spark job[0])
INFO : Job Progress Format
CurrentTime StageId_StageAttemptId: SucceededTasksCount(+RunningTasksCount-FailedTasksCount)/TotalTasksCount [StageCost]
INFO : 2016-04-13 11:41:20,519 Stage-0_0: 0(+8)/10 Stage-1_0: 0/1
INFO : 2016-04-13 11:41:23,577 Stage-0_0: 0(+8)/10 Stage-1_0: 0/1
INFO : 2016-04-13 11:41:26,817 Stage-0_0: 0(+8)/10 Stage-1_0: 0/1
INFO : 2016-04-13 11:41:29,858 Stage-0_0: 0(+8)/10 Stage-1_0: 0/1
INFO : 2016-04-13 11:41:32,903 Stage-0_0: 0(+8)/10 Stage-1_0: 0/1
INFO : 2016-04-13 11:41:35,942 Stage-0_0: 0(+8)/10 Stage-1_0: 0/1
INFO : 2016-04-13 11:41:37,969 Stage-0_0: 0(+9)/10 Stage-1_0: 0/1
INFO : 2016-04-13 11:41:38,981 Stage-0_0: 1(+8)/10 Stage-1_0: 0/1
INFO : 2016-04-13 11:41:39,994 Stage-0_0: 3(+7)/10 Stage-1_0: 0/1
INFO : 2016-04-13 11:41:43,030 Stage-0_0: 3(+7)/10 Stage-1_0: 0/1
INFO : 2016-04-13 11:41:45,056 Stage-0_0: 5(+5)/10 Stage-1_0: 0/1
INFO : 2016-04-13 11:41:46,072 Stage-0_0: 6(+4)/10 Stage-1_0: 0/1
INFO : 2016-04-13 11:41:47,085 Stage-0_0: 8(+2)/10 Stage-1_0: 0/1
INFO : 2016-04-13 11:41:48,096 Stage-0_0: 9(+1)/10 Stage-1_0: 0/1
INFO : 2016-04-13 11:41:51,125 Stage-0_0: 9(+1)/10 Stage-1_0: 0/1
INFO : 2016-04-13 11:41:52,134 Stage-0_0: 10/10 Finished Stage-1_0: 1/1 Finished
INFO : Status: Finished successfully in 64.78 seconds
INFO : Completed executing command(queryId=hadoop_20160413114039_f930d3e7-af83-4b12-a536-404a4e20eeea); Time taken: 71.767 seconds
INFO : OK
+-----------+--+
| c0 |
+-----------+--+
| 89867722 |
+-----------+--+
1 row selected (72.45 seconds)
本来升级是想看看UI长什么样子,有点失望,功能太少了。只能看当前执行的SQL和session,历史记录不能查看。期待新版本UI更强大。
升级后beeline上下键切换历史的也不起作用了,hive-2.0.0没也啥吸引的功能(hive2准备淘汰mr了),觉得不爽可以直接替换 软链 退回hive1.2.1-spark1.3.1(实践后没问题,spark.yarn.jar记得改)
–END
Related
Related posts
-
杀鸡焉用牛刀:DuckDB 正取代部分 Spark 场景
2026-02-16
-
基于对象存储的 Spark 数据读写实战:从末尾追加到任意更新
2025-10-28
-
认真的博客
2021-12-08
-
视频自动翻译
2018-08-25