Winse Blog


HiveServer2 UI and Upgrading to Hive 2.0.0

The standard steps for upgrading Hive:

  • Update the metastore metadata, i.e. run the upgrade SQL scripts. Back up the existing database before upgrading!! (A schematool-based alternative is sketched right after this list.)
  • Adjust dependencies; in my case that means upgrading Spark. For building it, see spark-without-hive.
  • Modify parameters (hive/spark/hadoop) to suit the new versions.
  • HiveServer2 UI: start the hiveserver2 service and browse to port 10002. See the UI configuration.
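
As an alternative to sourcing the SQL scripts by hand (which is what the detailed steps below do), Hive ships a schematool wrapper that runs the same upgrade scripts against the metastore. A minimal sketch, assuming the metastore connection settings are already in hive-site.xml and the database has been backed up:

# let schematool apply the 1.2.0 -> 2.0.0 metastore upgrade scripts
[hadoop@file1 apache-hive-2.0.0-bin]$ bin/schematool -dbType mysql -upgradeSchemaFrom 1.2.0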

Environment:

  • CentOS 5
  • hadoop-2.6.3
  • spark-1.6.0-without-hive
  • hive-2.0.0

Detailed steps

# Backup
[hadoop@file1 tools]$ mysqldump -uroot -p hive >hive1.2.1-20160413.backup.sql

# Directory layout once the new release is in place
[hadoop@file1 ~]$ ll
total 20
drwxrwxr-x 3 hadoop hadoop 4096 04-13 11:59 collect
drwx------ 3 hadoop hadoop 4096 04-07 16:43 dfs
lrwxrwxrwx 1 hadoop hadoop   18 04-11 10:09 hadoop -> tools/hadoop-2.6.3
lrwxrwxrwx 1 hadoop hadoop   40 04-13 10:26 hive -> /home/hadoop/tools/apache-hive-2.0.0-bin
lrwxrwxrwx 1 hadoop hadoop   42 04-13 10:52 spark -> tools/spark-1.6.0-bin-hadoop2-without-hive
drwxrwxr-x 6 hadoop hadoop 4096 04-13 12:10 tmp
drwxrwxr-x 9 hadoop hadoop 4096 04-13 11:48 tools
[hadoop@file1 tools]$ ll
total 84
drwxrwxr-x  8 hadoop hadoop  4096 04-08 09:25 apache-hive-1.2.1-bin
drwxrwxr-x  8 hadoop hadoop  4096 04-13 10:16 apache-hive-2.0.0-bin
drwxr-xr-x 11 hadoop hadoop  4096 04-07 16:34 hadoop-2.6.3
-rw-rw-r--  1 hadoop hadoop 46879 04-13 10:11 hive1.2.1-20160413.backup.sql
drwxrwxr-x  2 hadoop hadoop  4096 03-31 15:28 mysql
lrwxrwxrwx  1 hadoop hadoop    36 04-13 10:17 spark -> spark-1.6.0-bin-hadoop2-without-hive
drwxrwxr-x 11 hadoop hadoop  4096 04-07 18:23 spark-1.3.1-bin-hadoop2.6.3-without-hive
drwxrwxr-x 11 hadoop hadoop  4096 03-28 11:15 spark-1.6.0-bin-hadoop2-without-hive
drwxr-xr-x 11 hadoop hadoop  4096 03-31 16:14 zookeeper-3.4.6

# My environment variables point at the symlinks, so switching versions is just a matter of repointing the links. Adjust to your own setup.
# Create a spark symlink next to apache-hive-2.0.0-bin, or set SPARK_HOME in hive-env.sh.
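# (Assumed commands, matching the symlinks shown in the listings above) repoint the links to the new release:
[hadoop@file1 ~]$ ln -sfn /home/hadoop/tools/apache-hive-2.0.0-bin ~/hive
[hadoop@file1 tools]$ ln -sfn spark-1.6.0-bin-hadoop2-without-hive spark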

# hive-1.2.1 does not have the transaction (txn) tables, so run hive-txn-schema-2.0.0.mysql.sql separately first,
# then run the upgrade script (the Duplicate column errors further down are harmless).
[hadoop@file1 tools]$ cd apache-hive-2.0.0-bin/scripts/metastore/upgrade/mysql/
[hadoop@file1 mysql]$ mysql -uroot -p
Enter password: 
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 10765
Server version: 5.5.48 MySQL Community Server (GPL)

Copyright (c) 2000, 2016, Oracle and/or its affiliates. All rights reserved.

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql> use hive;
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A

Database changed
mysql> source hive-txn-schema-2.0.0.mysql.sql
Query OK, 0 rows affected (0.01 sec)

Query OK, 0 rows affected (0.00 sec)

Query OK, 0 rows affected (0.04 sec)

Query OK, 0 rows affected (0.03 sec)

Query OK, 1 row affected (0.04 sec)

Query OK, 0 rows affected (0.00 sec)

Query OK, 0 rows affected (0.03 sec)
Records: 0  Duplicates: 0  Warnings: 0

Query OK, 0 rows affected (0.01 sec)

Query OK, 1 row affected (0.00 sec)

Query OK, 0 rows affected (0.01 sec)

Query OK, 0 rows affected (0.01 sec)

Query OK, 0 rows affected (0.00 sec)

Query OK, 1 row affected (0.00 sec)

Query OK, 0 rows affected (0.01 sec)

mysql> source upgrade-1.2.0-to-2.0.0.mysql.sql
+------------------------------------------------+
|                                                |
+------------------------------------------------+
| Upgrading MetaStore schema from 1.2.0 to 2.0.0 |
+------------------------------------------------+
1 row in set, 1 warning (0.00 sec)

+---------------------------------------------------------------------------------------------------------------+
|                                                                                                               |
+---------------------------------------------------------------------------------------------------------------+
| < HIVE-7018 Remove Table and Partition tables column LINK_TARGET_ID from Mysql for other DBs do not have it > |
+---------------------------------------------------------------------------------------------------------------+
1 row in set, 1 warning (0.00 sec)

Query OK, 0 rows affected, 1 warning (0.03 sec)

Query OK, 0 rows affected, 1 warning (0.00 sec)

Query OK, 0 rows affected, 1 warning (0.00 sec)

Query OK, 0 rows affected (0.00 sec)

Query OK, 0 rows affected (0.00 sec)

Query OK, 0 rows affected (0.00 sec)

+---------------------------------+
| Completed remove LINK_TARGET_ID |
+---------------------------------+
| Completed remove LINK_TARGET_ID |
+---------------------------------+
1 row in set (0.02 sec)

Query OK, 0 rows affected (0.02 sec)

Query OK, 31 rows affected (0.01 sec)
Records: 31  Duplicates: 0  Warnings: 0

Query OK, 0 rows affected (0.05 sec)
Records: 0  Duplicates: 0  Warnings: 0

Query OK, 0 rows affected (0.02 sec)
Records: 0  Duplicates: 0  Warnings: 0

Query OK, 0 rows affected (0.00 sec)
Records: 0  Duplicates: 0  Warnings: 0

Query OK, 0 rows affected (0.03 sec)
Records: 0  Duplicates: 0  Warnings: 0

Query OK, 0 rows affected (0.00 sec)
Records: 0  Duplicates: 0  Warnings: 0

ERROR 1060 (42S21): Duplicate column name 'CQ_HIGHEST_TXN_ID'
ERROR 1060 (42S21): Duplicate column name 'CQ_META_INFO'
ERROR 1060 (42S21): Duplicate column name 'CQ_HADOOP_JOB_ID'
ERROR 1050 (42S01): Table 'COMPLETED_COMPACTIONS' already exists
ERROR 1060 (42S21): Duplicate column name 'TXN_AGENT_INFO'
ERROR 1060 (42S21): Duplicate column name 'TXN_HEARTBEAT_COUNT'
ERROR 1060 (42S21): Duplicate column name 'HL_HEARTBEAT_COUNT'
ERROR 1060 (42S21): Duplicate column name 'TXN_META_INFO'
ERROR 1060 (42S21): Duplicate column name 'HL_AGENT_INFO'
ERROR 1060 (42S21): Duplicate column name 'HL_BLOCKEDBY_EXT_ID'
ERROR 1060 (42S21): Duplicate column name 'HL_BLOCKEDBY_INT_ID'
ERROR 1050 (42S01): Table 'AUX_TABLE' already exists
Query OK, 1 row affected (0.01 sec)
Rows matched: 1  Changed: 1  Warnings: 0

+---------------------------------------------------------+
|                                                         |
+---------------------------------------------------------+
| Finished upgrading MetaStore schema from 1.2.0 to 2.0.0 |
+---------------------------------------------------------+
1 row in set, 1 warning (0.00 sec)

# Copy the old Hive configuration and the dependency jars

[hadoop@file1 mysql]$ cd ~/tools/apache-hive-2.0.0-bin/conf/
[hadoop@file1 conf]$ cp ~/tools/apache-hive-1.2.1-bin/conf/hive-site.xml ./
[hadoop@file1 conf]$ cp ~/tools/apache-hive-1.2.1-bin/conf/spark-defaults.conf ./
[hadoop@file1 conf]$ cp ~/tools/apache-hive-1.2.1-bin/conf/hive-env.sh ./

# Spark needs a larger PermSize
[hadoop@file1 hive]$ vi conf/hive-env.sh
export HADOOP_USER_CLASSPATH_FIRST=true
export HADOOP_OPTS="$HADOOP_OPTS -XX:MaxPermSize=256m"

[hadoop@file1 conf]$ cd ../lib/
[hadoop@file1 lib]$ cp ~/tools/apache-hive-1.2.1-bin/lib/mysql-connector-java-5.1.34.jar ./

# On CentOS 5 the following two jars have to be removed; on CentOS 6 this is unnecessary
[hadoop@file1 apache-hive-2.0.0-bin]$ rm lib/hive-jdbc-2.0.0-standalone.jar 
[hadoop@file1 apache-hive-2.0.0-bin]$ rm lib/snappy-java-1.0.5.jar 

# Update Spark to 1.6.0

# http://spark.apache.org/docs/latest/hadoop-provided.html
# http://stackoverflow.com/questions/30906412/noclassdeffounderror-com-apache-hadoop-fs-fsdatainputstream-when-execute-spark-s
[hadoop@file1 apache-hive-2.0.0-bin]$ cd ~/tools/spark-1.6.0-bin-hadoop2-without-hive/conf/
[hadoop@file1 conf]$ cp spark-env.sh.template spark-env.sh
[hadoop@file1 conf]$ vi spark-env.sh
HADOOP_HOME=/home/hadoop/hadoop
SPARK_DIST_CLASSPATH=`$HADOOP_HOME/bin/hadoop classpath`

[hadoop@file1 ~]$ cp ~/tools/spark-1.6.0-bin-hadoop2-without-hive/lib/spark-1.6.0-yarn-shuffle.jar ~/tools/hadoop-2.6.3/share/hadoop/yarn/
[hadoop@file1 ~]$ rm ~/tools/hadoop-2.6.3/share/hadoop/yarn/spark-1.3.1-yarn-shuffle.jar 

[hadoop@file1 ~]$ rsync -vaz --delete ~/tools/hadoop-2.6.3/share file2:~/tools/hadoop-2.6.3/ 
[hadoop@file1 ~]$ rsync -vaz --delete ~/tools/hadoop-2.6.3/share file3:~/tools/hadoop-2.6.3/ 

[hadoop@file1 ~]$ hdfs dfs -put ~/tools/spark-1.6.0-bin-hadoop2-without-hive/lib/spark-assembly-1.6.0-hadoop2.6.3.jar /spark/

[hadoop@file1 apache-hive-2.0.0-bin]$ vi conf/spark-defaults.conf 
spark.yarn.jar    hdfs:///spark/spark-assembly-1.6.0-hadoop2.6.3.jar

# Restart YARN (if you use hiveserver2, read on first: the configuration is changed and the cluster restarted again later)

[hadoop@file1 apache-hive-2.0.0-bin]$ cd ~/tools/hadoop-2.6.3/
[hadoop@file1 hadoop-2.6.3]$ sbin/stop-yarn.sh 
[hadoop@file1 hadoop-2.6.3]$ sbin/start-yarn.sh 

At this point the plain hive CLI works fine, but hiveserver2 still has problems.
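
A quick way to confirm this (my own check, reusing the table that is queried later in this post) is to run a count through the hive CLI with the Spark engine:

[hadoop@file1 hive]$ bin/hive -e "set hive.execution.engine=spark; select count(*) from t_info where edate=20160413;"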

# Start hiveserver2
[hadoop@file1 hive]$ nohup bin/hiveserver2 &

# Start the Spark history server
[hadoop@file1 spark]$ cat start-historyserver.sh 
source $HADOOP_HOME/libexec/hadoop-config.sh
sbin/start-history-server.sh hdfs:///spark-eventlogs

[hadoop@file1 hive]$ bin/beeline -u jdbc:hive2://file1:10000/ -n hadoop -p hadoop
which: no hbase in (/home/hadoop/hadoop/bin:/home/hadoop/hive/bin:/opt/jdk1.7.0_60/bin:/usr/kerberos/bin:/usr/local/bin:/bin:/usr/bin:/home/hadoop/tools/hadoop-2.6.3/bin:/home/hadoop/tools/hadoop-2.6.3:/home/hadoop/tools/apache-hive-1.2.1-bin:/home/hadoop/bin)
ls: /home/hadoop/hive/lib/hive-jdbc-*-standalone.jar: No such file or directory
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/home/hadoop/tools/apache-hive-2.0.0-bin/lib/log4j-slf4j-impl-2.4.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/home/hadoop/tools/hadoop-2.6.3/share/hadoop/common/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
Connecting to jdbc:hive2://file1:10000/
Error: Failed to open new session: java.lang.RuntimeException: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.authorize.AuthorizationException): User: hadoop is not allowed to impersonate hadoop (state=,code=0)
Beeline version 2.0.0 by Apache Hive
beeline> 

Beeline fails to connect to hiveserver2: the impersonated hadoop user is not authorized. The Hadoop configuration has to be changed.
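
The fix applied below adds proxyuser entries to core-site.xml. As an alternative (not what this post does), impersonation can be disabled in hive-site.xml so queries run as the hiveserver2 process user instead:

<property>
<name>hive.server2.enable.doAs</name><value>false</value>
</property>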

# https://community.hortonworks.com/questions/4905/error-while-running-hive-queries-from-zeppelin.html
# http://stackoverflow.com/questions/25073792/error-e0902-exception-occured-user-root-is-not-allowed-to-impersonate-root
# Add the following to core-site.xml and restart the cluster's HDFS & YARN
<property>
<name>hadoop.proxyuser.hadoop.hosts</name><value>*</value>
</property>
<property>
<name>hadoop.proxyuser.hadoop.groups</name><value>*</value>
</property>
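
# (Assumed step, following the rsync pattern used earlier) push the updated core-site.xml to the other nodes before restarting:
[hadoop@file1 ~]$ rsync -vaz ~/tools/hadoop-2.6.3/etc/hadoop/core-site.xml file2:~/tools/hadoop-2.6.3/etc/hadoop/
[hadoop@file1 ~]$ rsync -vaz ~/tools/hadoop-2.6.3/etc/hadoop/core-site.xml file3:~/tools/hadoop-2.6.3/etc/hadoop/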

[hadoop@file1 hadoop-2.6.3]$ sbin/stop-all.sh
[hadoop@file1 hadoop-2.6.3]$ sbin/start-all.sh 

[hadoop@file1 hive]$ bin/beeline -u jdbc:hive2://file1:10000 -n hadoop -p hadoop
...
0: jdbc:hive2://file1:10000/> set hive.execution.engine=spark;
No rows affected (0.019 seconds)
0: jdbc:hive2://file1:10000/> select count(*) from t_info where edate=20160413;
INFO  : Compiling command(queryId=hadoop_20160413114039_f930d3e7-af83-4b12-a536-404a4e20eeea): select count(*) from t_info where edate=20160413
INFO  : Semantic Analysis Completed
INFO  : Returning Hive schema: Schema(fieldSchemas:[FieldSchema(name:c0, type:bigint, comment:null)], properties:null)
INFO  : Completed compiling command(queryId=hadoop_20160413114039_f930d3e7-af83-4b12-a536-404a4e20eeea); Time taken: 0.523 seconds
INFO  : Executing command(queryId=hadoop_20160413114039_f930d3e7-af83-4b12-a536-404a4e20eeea): select count(*) from t_info where edate=20160413
INFO  : Query ID = hadoop_20160413114039_f930d3e7-af83-4b12-a536-404a4e20eeea
INFO  : Total jobs = 1
INFO  : Launching Job 1 out of 1
INFO  : Starting task [Stage-1:MAPRED] in serial mode

INFO  : 
Query Hive on Spark job[0] stages:
INFO  : 0
INFO  : 1
INFO  : 
Status: Running (Hive on Spark job[0])
INFO  : Job Progress Format
CurrentTime StageId_StageAttemptId: SucceededTasksCount(+RunningTasksCount-FailedTasksCount)/TotalTasksCount [StageCost]
INFO  : 2016-04-13 11:41:20,519 Stage-0_0: 0(+8)/10     Stage-1_0: 0/1
INFO  : 2016-04-13 11:41:23,577 Stage-0_0: 0(+8)/10     Stage-1_0: 0/1
INFO  : 2016-04-13 11:41:26,817 Stage-0_0: 0(+8)/10     Stage-1_0: 0/1
INFO  : 2016-04-13 11:41:29,858 Stage-0_0: 0(+8)/10     Stage-1_0: 0/1
INFO  : 2016-04-13 11:41:32,903 Stage-0_0: 0(+8)/10     Stage-1_0: 0/1
INFO  : 2016-04-13 11:41:35,942 Stage-0_0: 0(+8)/10     Stage-1_0: 0/1
INFO  : 2016-04-13 11:41:37,969 Stage-0_0: 0(+9)/10     Stage-1_0: 0/1
INFO  : 2016-04-13 11:41:38,981 Stage-0_0: 1(+8)/10     Stage-1_0: 0/1
INFO  : 2016-04-13 11:41:39,994 Stage-0_0: 3(+7)/10     Stage-1_0: 0/1
INFO  : 2016-04-13 11:41:43,030 Stage-0_0: 3(+7)/10     Stage-1_0: 0/1
INFO  : 2016-04-13 11:41:45,056 Stage-0_0: 5(+5)/10     Stage-1_0: 0/1
INFO  : 2016-04-13 11:41:46,072 Stage-0_0: 6(+4)/10     Stage-1_0: 0/1
INFO  : 2016-04-13 11:41:47,085 Stage-0_0: 8(+2)/10     Stage-1_0: 0/1
INFO  : 2016-04-13 11:41:48,096 Stage-0_0: 9(+1)/10     Stage-1_0: 0/1
INFO  : 2016-04-13 11:41:51,125 Stage-0_0: 9(+1)/10     Stage-1_0: 0/1
INFO  : 2016-04-13 11:41:52,134 Stage-0_0: 10/10 Finished       Stage-1_0: 1/1 Finished
INFO  : Status: Finished successfully in 64.78 seconds
INFO  : Completed executing command(queryId=hadoop_20160413114039_f930d3e7-af83-4b12-a536-404a4e20eeea); Time taken: 71.767 seconds
INFO  : OK
+-----------+--+
|    c0     |
+-----------+--+
| 89867722  |
+-----------+--+
1 row selected (72.45 seconds)

Part of the reason for upgrading was to see what the new UI looks like, and it is a bit of a letdown: the functionality is very limited. It only shows the currently running SQL and the active sessions; there is no way to view history. Hopefully future releases ship a more capable UI.

After the upgrade the up/down arrow keys no longer cycle through history in beeline, and hive-2.0.0 does not offer much else that is compelling (Hive 2 is getting ready to drop MapReduce). If you are unhappy with it, you can simply switch the symlinks back to hive-1.2.1 + spark-1.3.1 (tested, it works; remember to change spark.yarn.jar as well). A rollback sketch follows.
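
A minimal rollback sketch, assuming the symlink layout from the listings above (exact 1.3.1 jar names would need to be checked):

[hadoop@file1 ~]$ ln -sfn /home/hadoop/tools/apache-hive-1.2.1-bin ~/hive
[hadoop@file1 tools]$ ln -sfn spark-1.3.1-bin-hadoop2.6.3-without-hive spark
# restore the spark-1.3.1 yarn-shuffle jar under hadoop's share/hadoop/yarn/ if it was deleted,
# and point spark.yarn.jar in conf/spark-defaults.conf back at the 1.3.1 assembly on HDFS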

–END
