
Hiveserver2 UI and Upgrade to Hive 2.0.0

Standard steps for upgrading Hive:

  • Update the metadata, i.e. run the upgrade SQL scripts. Back up the existing database before updating!!
  • Adjust dependencies; in my case that means upgrading Spark (for the build, see spark-without-hive).
  • Adjust configuration (hive/spark/hadoop) for the new version.
  • HiveServer2 UI: start the hiveserver2 service and visit port 10002. See the UI configuration.

Environment:

  • centos5
  • hadoop-2.6.3
  • spark-1.6.0-without-hive
  • hive-2.0.0

Operation details

# Back up the metastore database
[hadoop@file1 tools]$ mysqldump -uroot -p hive >hive1.2.1-20160413.backup.sql

# Directory layout after the new packages are in place
[hadoop@file1 ~]$ ll
总计 20
drwxrwxr-x 3 hadoop hadoop 4096 04-13 11:59 collect
drwx------ 3 hadoop hadoop 4096 04-07 16:43 dfs
lrwxrwxrwx 1 hadoop hadoop   18 04-11 10:09 hadoop -> tools/hadoop-2.6.3
lrwxrwxrwx 1 hadoop hadoop   40 04-13 10:26 hive -> /home/hadoop/tools/apache-hive-2.0.0-bin
lrwxrwxrwx 1 hadoop hadoop   42 04-13 10:52 spark -> tools/spark-1.6.0-bin-hadoop2-without-hive
drwxrwxr-x 6 hadoop hadoop 4096 04-13 12:10 tmp
drwxrwxr-x 9 hadoop hadoop 4096 04-13 11:48 tools
[hadoop@file1 tools]$ ll
总计 84
drwxrwxr-x  8 hadoop hadoop  4096 04-08 09:25 apache-hive-1.2.1-bin
drwxrwxr-x  8 hadoop hadoop  4096 04-13 10:16 apache-hive-2.0.0-bin
drwxr-xr-x 11 hadoop hadoop  4096 04-07 16:34 hadoop-2.6.3
-rw-rw-r--  1 hadoop hadoop 46879 04-13 10:11 hive1.2.1-20160413.backup.sql
drwxrwxr-x  2 hadoop hadoop  4096 03-31 15:28 mysql
lrwxrwxrwx  1 hadoop hadoop    36 04-13 10:17 spark -> spark-1.6.0-bin-hadoop2-without-hive
drwxrwxr-x 11 hadoop hadoop  4096 04-07 18:23 spark-1.3.1-bin-hadoop2.6.3-without-hive
drwxrwxr-x 11 hadoop hadoop  4096 03-28 11:15 spark-1.6.0-bin-hadoop2-without-hive
drwxr-xr-x 11 hadoop hadoop  4096 03-31 16:14 zookeeper-3.4.6

# My environment variables point at the symlinks, so here I only need to update the symlinks. Adjust to your setup.
# Create a spark symlink next to apache-hive-2.0.0-bin, or set SPARK_HOME in hive-env.sh.

# hive-1.2.1 does not have the txn tables, so run hive-txn-schema-2.0.0.mysql.sql separately first,
# then run the upgrade script (the later Duplicate column errors are harmless)
[hadoop@file1 tools]$ cd apache-hive-2.0.0-bin/scripts/metastore/upgrade/mysql/
[hadoop@file1 mysql]$ mysql -uroot -p
Enter password: 
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 10765
Server version: 5.5.48 MySQL Community Server (GPL)

Copyright (c) 2000, 2016, Oracle and/or its affiliates. All rights reserved.

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql> use hive;
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A

Database changed
mysql> source hive-txn-schema-2.0.0.mysql.sql
Query OK, 0 rows affected (0.01 sec)

Query OK, 0 rows affected (0.00 sec)

Query OK, 0 rows affected (0.04 sec)

Query OK, 0 rows affected (0.03 sec)

Query OK, 1 row affected (0.04 sec)

Query OK, 0 rows affected (0.00 sec)

Query OK, 0 rows affected (0.03 sec)
Records: 0  Duplicates: 0  Warnings: 0

Query OK, 0 rows affected (0.01 sec)

Query OK, 1 row affected (0.00 sec)

Query OK, 0 rows affected (0.01 sec)

Query OK, 0 rows affected (0.01 sec)

Query OK, 0 rows affected (0.00 sec)

Query OK, 1 row affected (0.00 sec)

Query OK, 0 rows affected (0.01 sec)

mysql> source upgrade-1.2.0-to-2.0.0.mysql.sql
+------------------------------------------------+
|                                                |
+------------------------------------------------+
| Upgrading MetaStore schema from 1.2.0 to 2.0.0 |
+------------------------------------------------+
1 row in set, 1 warning (0.00 sec)

+---------------------------------------------------------------------------------------------------------------+
|                                                                                                               |
+---------------------------------------------------------------------------------------------------------------+
| < HIVE-7018 Remove Table and Partition tables column LINK_TARGET_ID from Mysql for other DBs do not have it > |
+---------------------------------------------------------------------------------------------------------------+
1 row in set, 1 warning (0.00 sec)

Query OK, 0 rows affected, 1 warning (0.03 sec)

Query OK, 0 rows affected, 1 warning (0.00 sec)

Query OK, 0 rows affected, 1 warning (0.00 sec)

Query OK, 0 rows affected (0.00 sec)

Query OK, 0 rows affected (0.00 sec)

Query OK, 0 rows affected (0.00 sec)

+---------------------------------+
| Completed remove LINK_TARGET_ID |
+---------------------------------+
| Completed remove LINK_TARGET_ID |
+---------------------------------+
1 row in set (0.02 sec)

Query OK, 0 rows affected (0.02 sec)

Query OK, 31 rows affected (0.01 sec)
Records: 31  Duplicates: 0  Warnings: 0

Query OK, 0 rows affected (0.05 sec)
Records: 0  Duplicates: 0  Warnings: 0

Query OK, 0 rows affected (0.02 sec)
Records: 0  Duplicates: 0  Warnings: 0

Query OK, 0 rows affected (0.00 sec)
Records: 0  Duplicates: 0  Warnings: 0

Query OK, 0 rows affected (0.03 sec)
Records: 0  Duplicates: 0  Warnings: 0

Query OK, 0 rows affected (0.00 sec)
Records: 0  Duplicates: 0  Warnings: 0

ERROR 1060 (42S21): Duplicate column name 'CQ_HIGHEST_TXN_ID'
ERROR 1060 (42S21): Duplicate column name 'CQ_META_INFO'
ERROR 1060 (42S21): Duplicate column name 'CQ_HADOOP_JOB_ID'
ERROR 1050 (42S01): Table 'COMPLETED_COMPACTIONS' already exists
ERROR 1060 (42S21): Duplicate column name 'TXN_AGENT_INFO'
ERROR 1060 (42S21): Duplicate column name 'TXN_HEARTBEAT_COUNT'
ERROR 1060 (42S21): Duplicate column name 'HL_HEARTBEAT_COUNT'
ERROR 1060 (42S21): Duplicate column name 'TXN_META_INFO'
ERROR 1060 (42S21): Duplicate column name 'HL_AGENT_INFO'
ERROR 1060 (42S21): Duplicate column name 'HL_BLOCKEDBY_EXT_ID'
ERROR 1060 (42S21): Duplicate column name 'HL_BLOCKEDBY_INT_ID'
ERROR 1050 (42S01): Table 'AUX_TABLE' already exists
Query OK, 1 row affected (0.01 sec)
Rows matched: 1  Changed: 1  Warnings: 0

+---------------------------------------------------------+
|                                                         |
+---------------------------------------------------------+
| Finished upgrading MetaStore schema from 1.2.0 to 2.0.0 |
+---------------------------------------------------------+
1 row in set, 1 warning (0.00 sec)

# Copy the old Hive configuration and dependency jars

[hadoop@file1 mysql]$ cd ~/tools/apache-hive-2.0.0-bin/conf/
[hadoop@file1 conf]$ cp ~/tools/apache-hive-1.2.1-bin/conf/hive-site.xml ./
[hadoop@file1 conf]$ cp ~/tools/apache-hive-1.2.1-bin/conf/spark-defaults.conf ./
[hadoop@file1 conf]$ cp ~/tools/apache-hive-1.2.1-bin/conf/hive-env.sh ./

# Spark needs a larger PermSize
[hadoop@file1 hive]$ vi conf/hive-env.sh
export HADOOP_USER_CLASSPATH_FIRST=true
export HADOOP_OPTS="$HADOOP_OPTS -XX:MaxPermSize=256m"

[hadoop@file1 conf]$ cd ../lib/
[hadoop@file1 lib]$ cp ~/tools/apache-hive-1.2.1-bin/lib/mysql-connector-java-5.1.34.jar ./

# On CentOS 5 the following two jars must be removed; on CentOS 6 this is not necessary
[hadoop@file1 apache-hive-2.0.0-bin]$ rm lib/hive-jdbc-2.0.0-standalone.jar 
[hadoop@file1 apache-hive-2.0.0-bin]$ rm lib/snappy-java-1.0.5.jar 

# Update to spark-1.6.0

# http://spark.apache.org/docs/latest/hadoop-provided.html
# http://stackoverflow.com/questions/30906412/noclassdeffounderror-com-apache-hadoop-fs-fsdatainputstream-when-execute-spark-s
[hadoop@file1 apache-hive-2.0.0-bin]$ cd ~/tools/spark-1.6.0-bin-hadoop2-without-hive/conf/
[hadoop@file1 conf]$ cp spark-env.sh.template spark-env.sh
[hadoop@file1 conf]$ vi spark-env.sh
HADOOP_HOME=/home/hadoop/hadoop
SPARK_DIST_CLASSPATH=`$HADOOP_HOME/bin/hadoop classpath`

[hadoop@file1 ~]$ cp ~/tools/spark-1.6.0-bin-hadoop2-without-hive/lib/spark-1.6.0-yarn-shuffle.jar ~/tools/hadoop-2.6.3/share/hadoop/yarn/
[hadoop@file1 ~]$ rm ~/tools/hadoop-2.6.3/share/hadoop/yarn/spark-1.3.1-yarn-shuffle.jar 

[hadoop@file1 ~]$ rsync -vaz --delete ~/tools/hadoop-2.6.3/share file2:~/tools/hadoop-2.6.3/ 
[hadoop@file1 ~]$ rsync -vaz --delete ~/tools/hadoop-2.6.3/share file3:~/tools/hadoop-2.6.3/ 

[hadoop@file1 ~]$ hdfs dfs -put ~/tools/spark-1.6.0-bin-hadoop2-without-hive/lib/spark-assembly-1.6.0-hadoop2.6.3.jar /spark/

[hadoop@file1 apache-hive-2.0.0-bin]$ vi conf/spark-defaults.conf 
spark.yarn.jar    hdfs:///spark/spark-assembly-1.6.0-hadoop2.6.3.jar

# Restart YARN (if you use hiveserver2, read on first: the configuration will be changed and restarted again later)

[hadoop@file1 apache-hive-2.0.0-bin]$ cd ~/tools/hadoop-2.6.3/
[hadoop@file1 hadoop-2.6.3]$ sbin/stop-yarn.sh 
[hadoop@file1 hadoop-2.6.3]$ sbin/start-yarn.sh 

At this point the hive CLI works, but hiveserver2 still has problems.

# Start hiveserver2
[hadoop@file1 hive]$ nohup bin/hiveserver2 &

# Start the Spark history server
[hadoop@file1 spark]$ cat start-historyserver.sh 
source $HADOOP_HOME/libexec/hadoop-config.sh
sbin/start-history-server.sh hdfs:///spark-eventlogs

[hadoop@file1 hive]$ bin/beeline -u jdbc:hive2://file1:10000/ -n hadoop -p hadoop
which: no hbase in (/home/hadoop/hadoop/bin:/home/hadoop/hive/bin:/opt/jdk1.7.0_60/bin:/usr/kerberos/bin:/usr/local/bin:/bin:/usr/bin:/home/hadoop/tools/hadoop-2.6.3/bin:/home/hadoop/tools/hadoop-2.6.3:/home/hadoop/tools/apache-hive-1.2.1-bin:/home/hadoop/bin)
ls: /home/hadoop/hive/lib/hive-jdbc-*-standalone.jar: 没有那个文件或目录
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/home/hadoop/tools/apache-hive-2.0.0-bin/lib/log4j-slf4j-impl-2.4.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/home/hadoop/tools/hadoop-2.6.3/share/hadoop/common/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
Connecting to jdbc:hive2://file1:10000/
Error: Failed to open new session: java.lang.RuntimeException: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.authorize.AuthorizationException): User: hadoop is not allowed to impersonate hadoop (state=,code=0)
Beeline version 2.0.0 by Apache Hive
beeline> 

Beeline fails to connect to hiveserver2: impersonation of the hadoop user is not authorized. The Hadoop configuration needs to be changed.

# https://community.hortonworks.com/questions/4905/error-while-running-hive-queries-from-zeppelin.html
# http://stackoverflow.com/questions/25073792/error-e0902-exception-occured-user-root-is-not-allowed-to-impersonate-root
# Add to core-site.xml, then restart HDFS & YARN on the cluster
<property>
<name>hadoop.proxyuser.hadoop.hosts</name><value>*</value>
</property>
<property>
<name>hadoop.proxyuser.hadoop.groups</name><value>*</value>
</property>

[hadoop@file1 hadoop-2.6.3]$ sbin/stop-all.sh
[hadoop@file1 hadoop-2.6.3]$ sbin/start-all.sh 

[hadoop@file1 hive]$ bin/beeline -u jdbc:hive2://file1:10000 -n hadoop -p hadoop
...
0: jdbc:hive2://file1:10000/> set hive.execution.engine=spark;
No rows affected (0.019 seconds)
0: jdbc:hive2://file1:10000/> select count(*) from t_info where edate=20160413;
INFO  : Compiling command(queryId=hadoop_20160413114039_f930d3e7-af83-4b12-a536-404a4e20eeea): select count(*) from t_info where edate=20160413
INFO  : Semantic Analysis Completed
INFO  : Returning Hive schema: Schema(fieldSchemas:[FieldSchema(name:c0, type:bigint, comment:null)], properties:null)
INFO  : Completed compiling command(queryId=hadoop_20160413114039_f930d3e7-af83-4b12-a536-404a4e20eeea); Time taken: 0.523 seconds
INFO  : Executing command(queryId=hadoop_20160413114039_f930d3e7-af83-4b12-a536-404a4e20eeea): select count(*) from t_info where edate=20160413
INFO  : Query ID = hadoop_20160413114039_f930d3e7-af83-4b12-a536-404a4e20eeea
INFO  : Total jobs = 1
INFO  : Launching Job 1 out of 1
INFO  : Starting task [Stage-1:MAPRED] in serial mode

INFO  : 
Query Hive on Spark job[0] stages:
INFO  : 0
INFO  : 1
INFO  : 
Status: Running (Hive on Spark job[0])
INFO  : Job Progress Format
CurrentTime StageId_StageAttemptId: SucceededTasksCount(+RunningTasksCount-FailedTasksCount)/TotalTasksCount [StageCost]
INFO  : 2016-04-13 11:41:20,519 Stage-0_0: 0(+8)/10     Stage-1_0: 0/1
INFO  : 2016-04-13 11:41:23,577 Stage-0_0: 0(+8)/10     Stage-1_0: 0/1
INFO  : 2016-04-13 11:41:26,817 Stage-0_0: 0(+8)/10     Stage-1_0: 0/1
INFO  : 2016-04-13 11:41:29,858 Stage-0_0: 0(+8)/10     Stage-1_0: 0/1
INFO  : 2016-04-13 11:41:32,903 Stage-0_0: 0(+8)/10     Stage-1_0: 0/1
INFO  : 2016-04-13 11:41:35,942 Stage-0_0: 0(+8)/10     Stage-1_0: 0/1
INFO  : 2016-04-13 11:41:37,969 Stage-0_0: 0(+9)/10     Stage-1_0: 0/1
INFO  : 2016-04-13 11:41:38,981 Stage-0_0: 1(+8)/10     Stage-1_0: 0/1
INFO  : 2016-04-13 11:41:39,994 Stage-0_0: 3(+7)/10     Stage-1_0: 0/1
INFO  : 2016-04-13 11:41:43,030 Stage-0_0: 3(+7)/10     Stage-1_0: 0/1
INFO  : 2016-04-13 11:41:45,056 Stage-0_0: 5(+5)/10     Stage-1_0: 0/1
INFO  : 2016-04-13 11:41:46,072 Stage-0_0: 6(+4)/10     Stage-1_0: 0/1
INFO  : 2016-04-13 11:41:47,085 Stage-0_0: 8(+2)/10     Stage-1_0: 0/1
INFO  : 2016-04-13 11:41:48,096 Stage-0_0: 9(+1)/10     Stage-1_0: 0/1
INFO  : 2016-04-13 11:41:51,125 Stage-0_0: 9(+1)/10     Stage-1_0: 0/1
INFO  : 2016-04-13 11:41:52,134 Stage-0_0: 10/10 Finished       Stage-1_0: 1/1 Finished
INFO  : Status: Finished successfully in 64.78 seconds
INFO  : Completed executing command(queryId=hadoop_20160413114039_f930d3e7-af83-4b12-a536-404a4e20eeea); Time taken: 71.767 seconds
INFO  : OK
+-----------+--+
|    c0     |
+-----------+--+
| 89867722  |
+-----------+--+
1 row selected (72.45 seconds)

I mainly upgraded to see what the UI looks like, and it is a bit disappointing: too few features. It only shows the currently running SQL and the sessions; history cannot be viewed. Hopefully future versions will ship a more powerful UI.

After the upgrade the up/down arrow keys in beeline no longer cycle through history, and hive-2.0.0 has no particularly attractive features (Hive 2 is preparing to deprecate MR). If that bothers you, just switch the symlinks back to hive-1.2.1 + spark-1.3.1 (tested, works fine; remember to change spark.yarn.jar).

–END

Spark-on-YARN Memory Allocation

Last time I wrote a post about how configuration parameters affect the actual scheduling of MapReduce jobs. For reference:

  • The opts settings (yarn.app.mapreduce.am.command-opts, mapreduce.map.java.opts, mapreduce.reduce.java.opts) are the JVM memory options of the actually running processes.
  • The memory settings (yarn.app.mapreduce.am.resource.mb, mapreduce.map.memory.mb, mapreduce.reduce.memory.mb) are what the ResourceManager uses to account for cluster resource usage and scheduling.

Once I understood the difference between these parameters, I stopped digging further into task memory.

New problem: memory allocation

This time I ran into another memory problem: when Spark runs in yarn-client mode it has a memoryOverhead setting, but once this extra memory is added and the cluster scheduler does its rounding, a lot of memory is wasted, which is completely unacceptable for a cluster that is short on memory to begin with.

  • The AM defaults to 512 plus 384 of overhead, i.e. 896 MB, but after scheduling the AM is allocated 1024 MB.
  • An executor defaults to 1024 plus 384, i.e. 1408 MB, but after scheduling each executor is allocated 2048 MB.

The AppMaster log shows that the requested memory size is 1408:

A single executor wastes some 500 MB; where 4 executors could have run before, now only 3 fit!

For the exact meaning of the memory parameters see the official docs: spark-on-yarn and yarn-default.xml

Parameters (defaults):
spark.yarn.am.memory 512m
spark.driver.memory 1g
spark.yarn.executor.memoryOverhead executorMemory * 0.10, with minimum of 384
spark.yarn.driver.memoryOverhead driverMemory * 0.10, with minimum of 384
spark.yarn.am.memoryOverhead AM memory * 0.10, with minimum of 384
yarn.nodemanager.resource.memory-mb 8192
yarn.scheduler.minimum-allocation-mb 1024
yarn.scheduler.maximum-allocation-mb 8192

The allocated memory looks like an integer multiple of the minimum allocation. After changing yarn.scheduler.minimum-allocation-mb to 512 and restarting YARN, the memory allocated per executor indeed dropped to 1536 (512*3); the sketch below reproduces the arithmetic.
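To make the rounding concrete, here is a minimal standalone Java sketch (my own simplification of the DefaultResourceCalculator logic quoted further below, not code taken from YARN itself) that reproduces the numbers above:

// Simplified re-implementation of DefaultResourceCalculator.normalize/roundUp, memory only.
public class NormalizeDemo {
    // roundUp(a, b) = divideAndCeil(a, b) * b, i.e. round a up to the next multiple of b.
    static int roundUp(int a, int b) {
        return ((a + b - 1) / b) * b;
    }

    // Lift the request to the minimum, round up to a multiple of the step (= minimum), cap at the maximum.
    static int normalizeMemory(int requestedMb, int minAllocMb, int maxAllocMb) {
        return Math.min(roundUp(Math.max(requestedMb, minAllocMb), minAllocMb), maxAllocMb);
    }

    public static void main(String[] args) {
        System.out.println(normalizeMemory(1408, 1024, 8192)); // executor 1024+384 -> 2048
        System.out.println(normalizeMemory(1408,  512, 8192)); // same request, 512 minimum -> 1536
        System.out.println(normalizeMemory( 896, 1024, 8192)); // AM 512+384 -> 1024
    }
}

With a 1024 MB minimum, every 1408 MB request is padded up to 2048 MB, which is exactly the waste described above.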

The article http://blog.javachen.com/2015/06/09/memory-in-spark-on-yarn.html also says that in YARN the memory requested by a container must be an integer multiple of yarn.scheduler.minimum-allocation-mb. Rather than guessing, let's debug the scheduler code and see what actually happens.

[hadoop@cu2 hadoop-2.6.3]$ sbin/yarn-daemon.sh stop resourcemanager 

[hadoop@cu2 hadoop]$ grep "minimum-allocation-mb" -1 yarn-site.xml 
<property>
<name>yarn.scheduler.minimum-allocation-mb</name><value>512</value>
</property>

[hadoop@cu2 hadoop-2.6.3]$ export YARN_RESOURCEMANAGER_OPTS="-agentlib:jdwp=transport=dt_socket,server=y,suspend=n,address=8000"
[hadoop@cu2 hadoop-2.6.3]$ sbin/yarn-daemon.sh start resourcemanager 

Set a breakpoint at CapacityScheduler#allocate in a local Eclipse, then run a job:

hive> set hive.execution.engine=spark;
hive> select count(*) from t_ods_access_log2 where month=201512;

AppMaster memory allocation:

Executor memory allocation:

After the request enters allocate, DefaultResourceCalculator.normalize is eventually called to recompute the resources the request needs, adjusting the memory. The default DefaultResourceCalculator can be changed via yarn.scheduler.capacity.resource-calculator in capacity-scheduler.xml.

The relevant scheduling code path is as follows:

  public Allocation allocate(ApplicationAttemptId applicationAttemptId,
      List<ResourceRequest> ask, List<ContainerId> release, 
      List<String> blacklistAdditions, List<String> blacklistRemovals) {
    ...
    // Sanity check
    SchedulerUtils.normalizeRequests(
        ask, getResourceCalculator(), getClusterResource(),
        getMinimumResourceCapability(), maximumAllocation);
...

  public static void normalizeRequest(
      ResourceRequest ask, 
      ResourceCalculator resourceCalculator, 
      Resource clusterResource,
      Resource minimumResource,
      Resource maximumResource,
      Resource incrementResource) {
    Resource normalized = 
        Resources.normalize(
            resourceCalculator, ask.getCapability(), minimumResource,
            maximumResource, incrementResource);
    ask.setCapability(normalized);
  }   
...

  public static Resource normalize(
      ResourceCalculator calculator, Resource lhs, Resource min,
      Resource max, Resource increment) {
    return calculator.normalize(lhs, min, max, increment);
  }
...

  public Resource normalize(Resource r, Resource minimumResource,
      Resource maximumResource, Resource stepFactor) {
    int normalizedMemory = Math.min(
        roundUp(
            Math.max(r.getMemory(), minimumResource.getMemory()),
            stepFactor.getMemory()),
            maximumResource.getMemory());
    return Resources.createResource(normalizedMemory);
  }
...

  public static int roundUp(int a, int b) {
    return divideAndCeil(a, b) * b;
  }
  

Summary

Today I got to know the YARN parameter yarn.scheduler.minimum-allocation-mb all over again: it is not just the minimum allocation, allocated resources are also rounded up to integer multiples of it. It also tells us that yarn.nodemanager.resource.memory-mb should preferably be an integer multiple of minimum-allocation-mb.

Indirectly I also learned a new parameter: yarn.scheduler.capacity.resource-calculator, which changes the resource calculator class used by the CapacityScheduler.

–END

Hive-on-Spark Snappy on CentOS 5

Hive's assembly jar is a trap! If it is a standalone runnable jar, why put it under lib at all!! This post is purely a record of what I went through at work; if you are looking for distilled takeaways, feel free to skip it!!


Last week I was supporting another project's Hadoop deployment in our department. Since Hive on MR was slow, they wanted to try Spark to see whether it could be faster. Their systems run CentOS 5, while our project uses CentOS 6, so everything had to be rebuilt step by step. Prerequisites for enabling Snappy with hive-on-spark:

  • Using snappy in Hadoop requires native support, so the first step is of course building Hadoop on CentOS 5. (In hindsight this may not be necessary, but having every hdfs command warn me about the native library was annoying.)
  • Add Spark to Hive.

Software versions:

  • hadoop-2.6.3
  • hive-1.2.1
  • spark-1.3.1
  • centos5.4

Building Hadoop with snappy support

  • CentOS 5, building by hand
[root@localhost snappy-1.1.3]# ./autogen.sh 
Remember to add `AC_PROG_LIBTOOL' to `configure.ac'.
You should update your `aclocal.m4' by running aclocal.
libtoolize: `config.guess' exists: use `--force' to overwrite
libtoolize: `config.sub' exists: use `--force' to overwrite
libtoolize: `ltmain.sh' exists: use `--force' to overwrite
Makefile.am:4: Libtool library used but `LIBTOOL' is undefined
Makefile.am:4: 
Makefile.am:4: The usual way to define `LIBTOOL' is to add `AC_PROG_LIBTOOL'
Makefile.am:4: to `configure.ac' and run `aclocal' and `autoconf' again.
Makefile.am:20: `dist_doc_DATA' is used but `docdir' is undefined

I could not get the manual build to work on CentOS 5; I am not a professional C developer, and these errors read like gibberish to me (I looked up a lot of material and tried many approaches, none of which worked)!! Snappy can be built on CentOS 6, the result also works on CentOS 5, and building Hadoop with snappy that way is fine too.

  • CentOS 5, via rpm

Here snappy is installed directly via rpm. If creating a virtual machine feels like too much trouble, docker works too. Different CentOS versions for docker can be downloaded from https://github.com/CentOS/sig-cloud-instance-images/ . Then share the host's files with the container: docker run -ti -v /home/hadoop:/home/hadoop -v /opt:/opt -v /data:/data centos:centos5 /bin/bash

[root@8fb11f6b3ced ~]# cat /etc/redhat-release 
CentOS release 5.11 (Final)

https://www.rpmfind.net/linux/rpm2html/search.php?query=snappy
https://www.rpmfind.net/linux/rpm2html/search.php?query=snappy-devel

[root@8fb11f6b3ced hadoop-2.6.3-src]# rpm -ivh snappy-1.0.5-1.el5.x86_64.rpm 
[root@8fb11f6b3ced hadoop-2.6.3-src]# rpm -ivh snappy-devel-1.0.5-1.el5.x86_64.rpm                                                                                  

[root@8fb11f6b3ced hadoop-2.6.3-src]# rpm -ql snappy-devel snappy
/usr/include/snappy-c.h
/usr/include/snappy-sinksource.h
/usr/include/snappy-stubs-public.h
/usr/include/snappy.h
/usr/lib64/libsnappy.so
/usr/share/doc/snappy-devel-1.0.5
/usr/share/doc/snappy-devel-1.0.5/format_description.txt
/usr/lib64/libsnappy.so.1
/usr/lib64/libsnappy.so.1.1.3
/usr/share/doc/snappy-1.0.5
/usr/share/doc/snappy-1.0.5/AUTHORS
/usr/share/doc/snappy-1.0.5/COPYING
/usr/share/doc/snappy-1.0.5/ChangeLog
/usr/share/doc/snappy-1.0.5/NEWS
/usr/share/doc/snappy-1.0.5/README

[root@8fb11f6b3ced hadoop-2.6.3-src]# export JAVA_HOME=/opt/jdk1.7.0_17
[root@8fb11f6b3ced hadoop-2.6.3-src]# export MAVEN_HOME=/opt/apache-maven-3.3.9
[root@8fb11f6b3ced hadoop-2.6.3-src]# export PATH=$JAVA_HOME/bin:$MAVEN_HOME/bin:$PATH
[root@8fb11f6b3ced hadoop-2.6.3-src]#  
[root@8fb11f6b3ced hadoop-2.6.3-src]# yum install which gcc gcc-c++ zlib-devel make -y
[root@8fb11f6b3ced hadoop-2.6.3-src]# 
[root@8fb11f6b3ced hadoop-2.6.3-src]# cd protobuf-2.5.0
[root@8fb11f6b3ced hadoop-2.6.3-src]# ./configure 
[root@8fb11f6b3ced hadoop-2.6.3-src]# make && make install
[root@8fb11f6b3ced hadoop-2.6.3-src]# 
[root@8fb11f6b3ced hadoop-2.6.3-src]# which protoc
[root@8fb11f6b3ced hadoop-2.6.3-src]# 
[root@8fb11f6b3ced hadoop-2.6.3-src]# yum install cmake openssl openssl-devel -y
[root@8fb11f6b3ced hadoop-2.6.3-src]# cd hadoop-2.6.3-src/
# bundle.snappy is used together with snappy.lib: it copies the system snappy .so files into lib/native (convenient for distribution)
# <http://grepcode.com/file/repo1.maven.org/maven2/org.apache.hadoop/hadoop-project-dist/2.6.0/META-INF/maven/org.apache.hadoop/hadoop-project-dist/pom.xml>
[root@8fb11f6b3ced hadoop-2.6.3-src]# mvn clean package -Dmaven.javadoc.skip=true -DskipTests -Drequire.snappy=true -Dbundle.snappy=true -Dsnappy.lib=/usr/lib64 -Pdist,native

[root@8fb11f6b3ced hadoop-2.6.3-src]# ll hadoop-dist/target/hadoop-2.6.3/lib/native/
total 3808
-rw-r--r-- 1 root root 1036552 Apr 12 09:35 libhadoop.a
-rw-r--r-- 1 root root 1212600 Apr 12 09:36 libhadooppipes.a
lrwxrwxrwx 1 root root      18 Apr 12 09:35 libhadoop.so -> libhadoop.so.1.0.0
-rwxr-xr-x 1 root root  613267 Apr 12 09:35 libhadoop.so.1.0.0
-rw-r--r-- 1 root root  401836 Apr 12 09:36 libhadooputils.a
-rw-r--r-- 1 root root  364026 Apr 12 09:35 libhdfs.a
lrwxrwxrwx 1 root root      16 Apr 12 09:35 libhdfs.so -> libhdfs.so.0.0.0
-rwxr-xr-x 1 root root  229672 Apr 12 09:35 libhdfs.so.0.0.0
lrwxrwxrwx 1 root root      18 Apr 12 09:35 libsnappy.so -> libsnappy.so.1.1.3
lrwxrwxrwx 1 root root      18 Apr 12 09:35 libsnappy.so.1 -> libsnappy.so.1.1.3
-rwxr-xr-x 1 root root   21568 Apr 12 09:35 libsnappy.so.1.1.3

[root@8fb11f6b3ced hadoop-2.6.3-src]# cd hadoop-dist/target/hadoop-2.6.3/
[root@8fb11f6b3ced hadoop-2.6.3]# bin/hadoop checknative -a
16/04/12 09:38:29 WARN bzip2.Bzip2Factory: Failed to load/initialize native-bzip2 library system-native, will use pure-Java version
16/04/12 09:38:29 INFO zlib.ZlibFactory: Successfully loaded & initialized native-zlib library
Native library checking:
hadoop:  true /data/bigdata/sources/hadoop-2.6.3-src/hadoop-dist/target/hadoop-2.6.3/lib/native/libhadoop.so.1.0.0
zlib:    true /lib64/libz.so.1
snappy:  true /data/bigdata/sources/hadoop-2.6.3-src/hadoop-dist/target/hadoop-2.6.3/lib/native/libsnappy.so.1
lz4:     true revision:99
bzip2:   false 
openssl: false org.apache.hadoop.crypto.OpensslCipher.initIDs()V
16/04/12 09:38:29 INFO util.ExitUtil: Exiting with status 1

Tar up the native directory and use it to replace the one in production; everything works fine. The next pitfall is spark + snappy, or to be precise, the hive assembly jar!!

hive-on-spark snappy

The Spark docs do not mention any extra configuration needed for snappy (spark.io.compression.codec defaults to snappy). After deployment, setting hive.execution.engine=spark and running a Spark query immediately fails with: Caused by: java.lang.UnsatisfiedLinkError: /tmp/snappy-1.0.5-libsnappyjava.so: /usr/lib64/libstdc++.so.6: version `GLIBCXX_3.4.9' not found (required by /tmp/snappy-1.0.5-libsnappyjava.so). From the stack trace this has nothing to do with hadoop's native snappy; it comes from the snappy-java package.

[hadoop@file1 ~]$ strings /usr/lib64/libstdc++.so.6 | grep GLIBCXX
GLIBCXX_3.4
GLIBCXX_3.4.1
GLIBCXX_3.4.2
GLIBCXX_3.4.3
GLIBCXX_3.4.4
GLIBCXX_3.4.5
GLIBCXX_3.4.6
GLIBCXX_3.4.7
GLIBCXX_3.4.8
GLIBCXX_FORCE_NEW

GLIBCXX_3.4.9 is indeed missing, and the latest CentOS 5.11 gives the same output.

The Spark configuration is:

spark.yarn.jar    hdfs:///spark/spark-assembly-1.3.1-hadoop2.6.3.jar

spark.master  yarn-client

spark.dynamicAllocation.enabled    true
spark.shuffle.service.enabled      true
spark.dynamicAllocation.minExecutors    2 
spark.dynamicAllocation.maxExecutors    18

spark.driver.maxResultSize   0
spark.master=yarn-client
spark.driver.memory=5g
spark.eventLog.enabled  true
spark.eventLog.compress  true
spark.eventLog.dir    hdfs:///spark-eventlogs
spark.yarn.historyServer.address file1:18080

spark.serializer        org.apache.spark.serializer.KryoSerializer
spark.kryoserializer.buffer.max    512m

The full error:

- 16/04/12 20:20:08 INFO storage.BlockManagerMaster: Registered BlockManager
- java.lang.reflect.InvocationTargetException
-        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
-        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
-        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
-        at java.lang.reflect.Method.invoke(Method.java:606)
-        at org.xerial.snappy.SnappyLoader.loadNativeLibrary(SnappyLoader.java:322)
-        at org.xerial.snappy.SnappyLoader.load(SnappyLoader.java:229)
-        at org.xerial.snappy.Snappy.<clinit>(Snappy.java:48)
-        at org.apache.spark.io.SnappyCompressionCodec.<init>(CompressionCodec.scala:150)
-        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
-        at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
-        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
-        at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
-        at org.apache.spark.io.CompressionCodec$.createCodec(CompressionCodec.scala:68)
-        at org.apache.spark.io.CompressionCodec$.createCodec(CompressionCodec.scala:60)
-        at org.apache.spark.scheduler.EventLoggingListener.<init>(EventLoggingListener.scala:67)
-        at org.apache.spark.SparkContext.<init>(SparkContext.scala:400)
-        at org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:61)
-        at org.apache.hive.spark.client.RemoteDriver.<init>(RemoteDriver.java:169)
-        at org.apache.hive.spark.client.RemoteDriver.main(RemoteDriver.java:556)
-        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
-        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
-        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
-        at java.lang.reflect.Method.invoke(Method.java:606)
-        at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:569)
-        at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:166)
-        at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:189)
-        at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:110)
-        at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
- Caused by: java.lang.UnsatisfiedLinkError: /tmp/snappy-1.0.5-libsnappyjava.so: /usr/lib64/libstdc++.so.6: version `GLIBCXX_3.4.9' not found (required by /tmp/snappy-1.0.5-libs
-        at java.lang.ClassLoader$NativeLibrary.load(Native Method)
-        at java.lang.ClassLoader.loadLibrary1(ClassLoader.java:1965)
-        at java.lang.ClassLoader.loadLibrary0(ClassLoader.java:1890)
-        at java.lang.ClassLoader.loadLibrary(ClassLoader.java:1851)
-        at java.lang.Runtime.load0(Runtime.java:795)
-        at java.lang.System.load(System.java:1062)
-        at org.xerial.snappy.SnappyNativeLoader.load(SnappyNativeLoader.java:39)
-        ... 28 more

Spark uses snappy-java for snappy compression and decompression. I used jinfo to get the classpath of the SparkSubmit process; running a hello-world program with that classpath does reproduce the error, but running it with only the snappy-java-1.0.4.1.jar from hadoop-common works fine.

[hadoop@file1 snappy-java-test]$ cat Hello.java 
import org.xerial.snappy.Snappy;

public class Hello { 
public static void main(String[] args) throws Exception {
String input = "Hello snappy-java!";

byte[] compressed = Snappy.compress(input.getBytes("utf-8"));
byte[] uncompressed = Snappy.uncompress(compressed);

String result = new String(uncompressed, "utf-8");
System.out.println(result);
}
}

[hadoop@file1 snappy-java-test]$ java -cp .:/home/hadoop/tools/hadoop-2.6.3/share/hadoop/common/lib/snappy-java-1.0.4.1.jar Hello
Hello snappy-java!

But, but, but: the only snappy-java jars on the classpath are the ones under hadoop-common and hadoop-mapreduce, and both are 1.0.4.1. So which damned jar does the SparkSubmit classpath load Snappy from?

The adjusted hello-world:

[hadoop@file1 snappy-java-test]$ cat Hello.java 
import org.xerial.snappy.Snappy;

public class Hello { 
public static void main(String[] args) throws Exception {
String input = "Hello snappy-java!";

System.out.println(Snappy.class.getProtectionDomain());
byte[] compressed = Snappy.compress(input.getBytes("utf-8"));
byte[] uncompressed = Snappy.uncompress(compressed);


String result = new String(uncompressed, "utf-8");
System.out.println(result);
}
}

getProtectionDomain is added to show which jar the class was loaded from. Compile and run again, and this time the real culprit is finally found: the hive assembly jar! Shipping an assembly jar under lib is just a trap!! hive-exec's bundled guava has already bitten plenty of people, and this time it is hive-jdbc!! (My environment here is CentOS 5; CentOS 6 does not have this problem!!)

If I point snappy-java at the libsnappy.so.1.1.3 shared library that Hadoop was built against, a version incompatibility shows up instead. So just get rid of hive-jdbc-standalone... sigh.

# See the source of SnappyLoader#loadSnappySystemProperties: a property can tell snappy-java to use the system shared library
[hadoop@file1 snappy-java-test]$ cat org-xerial-snappy.properties 
org.xerial.snappy.use.systemlib=true
[hadoop@file1 snappy-java-test]$ ln -s /home/hadoop/tools/hadoop-2.6.3/lib/native/libsnappy.so libsnappyjava.so
[hadoop@file1 snappy-java-test]$ ll
总计 1240
-rw-rw-r-- 1 hadoop hadoop     854 04-08 10:11 Hello.class
-rw-rw-r-- 1 hadoop hadoop     408 04-08 10:11 Hello.java
lrwxrwxrwx 1 hadoop hadoop      55 04-12 19:37 libsnappyjava.so -> /home/hadoop/tools/hadoop-2.6.3/lib/native/libsnappy.so
-rw-rw-r-- 1 hadoop hadoop      37 04-12 19:15 org-xerial-snappy.properties
-rw-r--r-- 1 hadoop hadoop 1251514 2014-04-29 snappy-java-1.0.5.jar
[hadoop@file1 snappy-java-test]$ java -cp .:snappy-java-1.0.5.jar -Djava.library.path=. Hello
ProtectionDomain  (file:/home/hadoop/snappy-java-test/snappy-java-1.0.5.jar <no signer certificates>)
 sun.misc.Launcher$AppClassLoader@333cb1eb
 <no principals>
 java.security.Permissions@7377711 (
 ("java.io.FilePermission" "/home/hadoop/snappy-java-test/snappy-java-1.0.5.jar" "read")
 ("java.lang.RuntimePermission" "exitVM")
)


Exception in thread "main" java.lang.UnsatisfiedLinkError: org.xerial.snappy.SnappyNative.maxCompressedLength(I)I
        at org.xerial.snappy.SnappyNative.maxCompressedLength(Native Method)
        at org.xerial.snappy.Snappy.maxCompressedLength(Snappy.java:320)
        at org.xerial.snappy.Snappy.rawCompress(Snappy.java:333)
        at org.xerial.snappy.Snappy.compress(Snappy.java:92)
        at Hello.main(Hello.java:8)
      

After deleting the jdbc standalone jar, hive-on-spark works. If you cannot bring yourself to delete hive-jdbc-1.2.1-standalone.jar, changing spark.io.compression.codec to another codec such as lz4 also works.

[hadoop@file1 ~]$ hive

Logging initialized using configuration in file:/home/hadoop/tools/apache-hive-1.2.1-bin/conf/hive-log4j.properties
hive> set hive.execution.engine=spark;
hive> select count(*) from t_info where edate=20160411;
Query ID = hadoop_20160412205338_2c95c5fd-af50-42ba-8681-e154e4b74cb1
Total jobs = 1
Launching Job 1 out of 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapreduce.job.reduces=<number>
Starting Spark Job = 69afc030-fa1f-4fdf-81ef-12bdca411a4f

Query Hive on Spark job[0] stages:
0
1

Status: Running (Hive on Spark job[0])
Job Progress Format
CurrentTime StageId_StageAttemptId: SucceededTasksCount(+RunningTasksCount-FailedTasksCount)/TotalTasksCount [StageCost]
2016-04-12 20:54:11,367 Stage-0_0: 0(+2)/234    Stage-1_0: 0/1
2016-04-12 20:54:14,421 Stage-0_0: 0(+2)/234    Stage-1_0: 0/1
2016-04-12 20:54:17,457 Stage-0_0: 0(+2)/234    Stage-1_0: 0/1
2016-04-12 20:54:19,486 Stage-0_0: 2(+2)/234    Stage-1_0: 0/1
2016-04-12 20:54:20,497 Stage-0_0: 3(+2)/234    Stage-1_0: 0/1
2016-04-12 20:54:21,509 Stage-0_0: 5(+2)/234    Stage-1_0: 0/1
2016-04-12 20:54:22,520 Stage-0_0: 6(+2)/234    Stage-1_0: 0/1
2016-04-12 20:54:23,532 Stage-0_0: 7(+2)/234    Stage-1_0: 0/1

Summary

First, Hive's assembly jar is a huge trap. Second, from now on, to find out which jar a Java class was actually loaded from, class.getProtectionDomain does the job. Third, I got to deploy Hadoop in yet another environment. Heh.
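As a small follow-up to the second point, here is a hypothetical standalone helper (class name and structure are mine, not from this post) that prints both the jar a class was loaded from and every copy of that class visible on the classpath:

import java.net.URL;
import java.util.Enumeration;
import org.xerial.snappy.Snappy;

public class WhichJar {
    public static void main(String[] args) throws Exception {
        // The jar (or directory) this particular Snappy class was loaded from.
        System.out.println(Snappy.class.getProtectionDomain().getCodeSource().getLocation());

        // Every copy of the class on the classpath, handy when several jars bundle the same package.
        Enumeration<URL> copies = WhichJar.class.getClassLoader()
                .getResources("org/xerial/snappy/Snappy.class");
        while (copies.hasMoreElements()) {
            System.out.println(copies.nextElement());
        }
    }
}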

–END

Getting Started with Puppet 4.4.1 Installation

There is a lot of material online but much of it is outdated; the basic steps are still worth borrowing. The simplest way to install Puppet is via yum (on CentOS 6), and given the network conditions here it is best to set up a local repository. This post records my own installation process: first how to create the local repository, then how to set up the Puppet environment.

Operating system:

[root@hadoop-master2 ~]# cat /etc/redhat-release 
CentOS release 6.5 (Final)

Updates

2016-4-28 15:42:32 - Force-install puppetserver via rpm. The jdk8 dependency is a bit of a hassle; installing jdk7 yourself is enough. 2016-5-3 09:39:40 - Updated the section on puppetserver performance: it runs on Jetty, so there is no need to bother with passenger any more. See the end of the article.

Setting up the local repository

Puppet 4 bundles all of its dependencies into a single package, so it can actually be installed directly via rpm. But to make things a bit more polished, and because the Puppet components do depend on each other, I first build a local repository with createrepo.

createrepo simply builds the index data (repodata) for the rpm files in a directory.

[root@hadoop-master2 ~]# yum install createrepo

# Download the puppet PC1 packages for your system from https://yum.puppetlabs.com/el/6/PC1/x86_64/ , all latest versions
[root@hadoop-master2 repo]# ls -1
puppet-agent-1.4.1-1.el6.x86_64.rpm
puppet-dashboard-1.2.23-0.1rc3.el6.noarch.rpm
puppetdb-4.0.0-1.el6.noarch.rpm
puppetdb-termini-3.2.4-1.el6.noarch.rpm
puppetdb-terminus-3-1.el6.noarch.rpm
puppetserver-2.3.1-1.el6.noarch.rpm

[root@hadoop-master2 repo]# createrepo .
Spawning worker 0 with 6 pkgs
Workers Finished
Gathering worker results

Saving Primary metadata
Saving file lists metadata
Saving other metadata
Generating sqlite DBs
Sqlite DBs complete

[root@hadoop-master2 puppetlabs]# cat /etc/yum.repos.d/puppet-local.repo 
[puppet-local]
name=Puppet Local
baseurl=file:///opt/puppetlabs/repo
failovermethod=priority
enabled=1
gpgcheck=0

Check the rpm packages in the local repo:

[root@hadoop-master2 repo]# yum clean all
Loaded plugins: fastestmirror, security
Cleaning repos: base epel extras pgdg94 puppet-local updates
Cleaning up Everything

[root@hadoop-master2 repo]# yum list all | grep "puppet-local"
puppet-agent.x86_64                         1.4.1-1.el6                  @puppet-local
puppet-dashboard.noarch                     1.2.23-0.1rc3.el6            @puppet-local
puppetdb.noarch                             4.0.0-1.el6                  @puppet-local
puppetdb-termini.noarch                     3.2.4-1.el6                  @puppet-local
puppetserver.noarch                         2.3.1-1.el6                  @puppet-local
puppetdb-terminus.noarch                    3-1.el6                      puppet-local

[root@hadoop-master2 repo]# yum search puppet

Some articles also install yum-priorities to set repository priorities. I have no package conflicts here, so I did not install it.

Single-node installation

Before installing, skim the official docs: https://docs.puppet.com/puppetserver/latest/install_from_packages.html

# First look at the dependencies of puppet-agent and puppetserver
[root@hadoop-master2 repo]# yum deplist puppet-agent
Loaded plugins: fastestmirror, security
Loading mirror speeds from cached hostfile
 * base: mirrors.aliyun.com
 * epel: ftp.cuhk.edu.hk
 * extras: mirrors.aliyun.com
 * updates: mirrors.aliyun.com
Finding dependencies: 
package: puppet-agent.x86_64 1.4.1-1.el6
  dependency: tar
   provider: tar.x86_64 2:1.23-13.el6
  dependency: /bin/sh
   provider: bash.x86_64 4.1.2-33.el6
   provider: bash.x86_64 4.1.2-33.el6_7.1
  dependency: readline
   provider: readline.i686 6.0-4.el6
   provider: readline.x86_64 6.0-4.el6
  dependency: util-linux
   provider: util-linux-ng.i686 2.17.2-12.18.el6
   provider: util-linux-ng.x86_64 2.17.2-12.18.el6
  dependency: chkconfig
   provider: chkconfig.x86_64 1.3.49.3-5.el6
   provider: chkconfig.x86_64 1.3.49.3-5.el6_7.2

[root@hadoop-master2 repo]# yum deplist puppetserver
Loaded plugins: fastestmirror, security
Loading mirror speeds from cached hostfile
 * base: mirrors.aliyun.com
 * epel: ftp.cuhk.edu.hk
 * extras: mirrors.aliyun.com
 * updates: mirrors.aliyun.com
Finding dependencies: 
package: puppetserver.noarch 2.3.1-1.el6
  dependency: /bin/bash
   provider: bash.x86_64 4.1.2-33.el6
   provider: bash.x86_64 4.1.2-33.el6_7.1
  dependency: java-1.8.0-openjdk-headless
   provider: java-1.8.0-openjdk-headless.x86_64 1:1.8.0.45-35.b13.el6
   provider: java-1.8.0-openjdk-headless.x86_64 1:1.8.0.51-0.b16.el6_6
   provider: java-1.8.0-openjdk-headless.x86_64 1:1.8.0.51-1.b16.el6_7
   provider: java-1.8.0-openjdk-headless.x86_64 1:1.8.0.51-3.b16.el6_7
   provider: java-1.8.0-openjdk-headless.x86_64 1:1.8.0.65-0.b17.el6_7
   provider: java-1.8.0-openjdk-headless.x86_64 1:1.8.0.71-1.b15.el6_7
   provider: java-1.8.0-openjdk-headless.x86_64 1:1.8.0.77-0.b03.el6_7
  dependency: puppet-agent >= 1.4.0
   provider: puppet-agent.x86_64 1.4.1-1.el6
  dependency: net-tools
   provider: net-tools.x86_64 1.60-110.el6_2
  dependency: /usr/bin/env
   provider: coreutils.x86_64 8.4-37.el6
   provider: coreutils.x86_64 8.4-37.el6_7.3
  dependency: /bin/sh
   provider: bash.x86_64 4.1.2-33.el6
   provider: bash.x86_64 4.1.2-33.el6_7.1
  dependency: chkconfig
   provider: chkconfig.x86_64 1.3.49.3-5.el6
   provider: chkconfig.x86_64 1.3.49.3-5.el6_7.2

# Install
[root@hadoop-master2 repo]# yum install puppetserver

# Check the process with jps, then check the port
[root@hadoop-master2 repo]# netstat -anp | grep 4526
tcp        0      0 :::8140                     :::*                        LISTEN      4526/java           

# After installation, check the version of each component
[root@hadoop-master2 repo]# puppet -V
4.4.1
[root@hadoop-master2 repo]# facter -v
3.1.5 (commit b5c2cf9b2ac290cb17fcadea19b467a39e17c1fd)
[root@hadoop-master2 repo]# puppetserver -v
puppetserver version: 2.3.1

puppetserver depends on puppet-agent, and puppet-agent is an all-in-one assembly package. So on the server it is enough to install puppetserver; on clients just install puppet-agent.

Puppet 4 reorganized its directory layout significantly: the programs live under /opt/puppetlabs and the configuration under /etc/puppetlabs. If you are reading Puppet 3 material, consult the official page Where Did Everything Go in Puppet 4.x? to find where each component now lives.

If you have installed a JDK separately (the hard dependency on jdk8 is annoying), you can also force-install puppetserver with rpm and then point it at your java binary:

bash-4.1# yum deplist puppetserver
Loaded plugins: fastestmirror, priorities
Loading mirror speeds from cached hostfile
 * centos-local: 172.17.42.1:8888
Finding dependencies: 
package: puppetserver.noarch 2.3.1-1.el6
  dependency: /bin/bash
   provider: bash.x86_64 4.1.2-29.el6
  dependency: java-1.8.0-openjdk-headless
   provider: java-1.8.0-openjdk-headless.x86_64 1.8.0.20-3.b26.el6
  dependency: puppet-agent >= 1.4.0
   provider: puppet-agent.x86_64 1.4.1-1.el6
  dependency: net-tools
   provider: net-tools.x86_64 1.60-110.el6_2
  dependency: /usr/bin/env
   provider: coreutils.x86_64 8.4-37.el6
  dependency: /bin/sh
   provider: bash.x86_64 4.1.2-29.el6
  dependency: chkconfig
   provider: chkconfig.x86_64 1.3.49.3-2.el6_4.1

bash-4.1# rpm -ivh http://172.17.42.1:8888/centos6/puppet/puppetserver-2.3.1-1.el6.noarch.rpm --nodeps --force
Retrieving http://172.17.42.1:8888/centos6/puppet/puppetserver-2.3.1-1.el6.noarch.rpm
warning: /var/tmp/rpm-tmp.7CAtn8: Header V4 RSA/SHA1 Signature, key ID 4bd6ec30: NOKEY
Preparing...                ########################################### [100%]
usermod: no changes
   1:puppetserver           ########################################### [100%]
usermod: no changes
bash-4.1# chkconfig --list | grep puppetserver
puppetserver    0:off   1:off   2:on    3:on    4:on    5:on    6:off

bash-4.1# cat /etc/sysconfig/puppetserver 
...
JAVA_BIN="/opt/jdk1.7.0_60/bin/java"
...

bash-4.1# netstat -a
Active Internet connections (servers and established)
Proto Recv-Q Send-Q Local Address               Foreign Address             State      
tcp        0      0 *:8140                      *:*                         LISTEN      
...

Single-node HelloWorld

Standalone mode needs no certificates, which makes it a nice environment for learning and debugging: simple and convenient.

[root@hadoop-master2 manifests]# vi helloworld.pp
notify { 'greeting':
  message => 'Hello, world!'
}

[root@hadoop-master2 manifests]# puppet apply helloworld.pp 
Notice: Compiled catalog for hadoop-master2.localdomain in environment production in 0.03 seconds
Notice: Hello, world!
Notice: /Stage[main]/Main/Notify[greeting]/message: defined 'message' as 'Hello, world!'
Notice: Applied catalog in 0.04 seconds

# resource can generate configuration from the current system state
[root@hadoop-master2 manifests]# puppet resource user hadoop
user { 'hadoop':
  ensure           => 'present',
  gid              => '500',
  home             => '/home/hadoop',
  password         => 'XXXXXX',
  password_max_age => '99999',
  password_min_age => '0',
  shell            => '/bin/bash',
  uid              => '500',
}

# Change state
[root@hadoop-master2 puppetlabs]# bin/puppet resource service puppet ensure=running enable=false
Notice: /Service[puppet]/enable: enable changed 'true' to 'false'
service { 'puppet':
  ensure => 'running',
  enable => 'false',
}
[root@hadoop-master2 puppetlabs]# chkconfig --list | grep puppet
puppet          0:off   1:off   2:off   3:off   4:off   5:off   6:off
puppetserver    0:off   1:off   2:on    3:on    4:on    5:on    6:off

Client/server mode configuration

Here I fully simulate a production (intranet) environment. First set up two local repositories: centos and puppet. Download the RPMs that puppet depends on as needed; I am using CentOS 6.5.

Build the private repositories:

Add java-1.8.0-openjdk-headless and tzdata-java-2014g (the 2013g shipped with the ISO is not compatible)
[root@hadoop-master2 repo]# ll
total 142344
-rw-r--r-- 1 root root 33135156 Apr  9 21:47 java-1.8.0-openjdk-headless-1.8.0.51-3.b16.el6_7.x86_64.rpm
-rw-r--r-- 1 root root 26740012 Apr  9 11:29 puppet-agent-1.4.1-1.el6.x86_64.rpm
-rw-r--r-- 1 root root  4509000 Apr  9 11:29 puppet-dashboard-1.2.23-0.1rc3.el6.noarch.rpm
-rw-r--r-- 1 root root 21866876 Apr  9 11:29 puppetdb-4.0.0-1.el6.noarch.rpm
-rw-r--r-- 1 root root    25516 Apr  9 11:29 puppetdb-termini-3.2.4-1.el6.noarch.rpm
-rw-r--r-- 1 root root     3676 Apr  9 11:29 puppetdb-terminus-3-1.el6.noarch.rpm
-rw-r--r-- 1 root root 33412844 Apr  9 11:29 puppetserver-2.3.1-1.el6.noarch.rpm
drwxr-xr-x 2 root root     4096 Apr  9 22:56 repodata
-rw-r--r-- 1 root root   181196 Sep 17  2014 tzdata-java-2014g-1.el6.noarch.rpm

[root@hadoop-master2 ~]# mount -t iso9660 -o loop CentOS-6.5-x86_64-bin-DVD1.iso /mnt/cdrom
# httpd is already installed on my system
[root@hadoop-master2 ~]# cd /var/www/html/
[root@hadoop-master2 html]# ll
total 820
lrwxrwxrwx  1 root root     10 Apr  9 21:54 centos6_5 -> /mnt/cdrom
lrwxrwxrwx  1 root root     20 Mar 30 17:11 puppet -> /opt/puppetlabs/repo

Start a docker instance (see the docker installation post). Because there are package conflicts between the centos and puppet repositories, yum-priorities needs to be installed.

[root@hadoop-master2 repo]# docker run -i -t centos:centos6 /bin/bash
bash-4.1# cat /etc/redhat-release 
CentOS release 6.5 (Final)

bash-4.1# yum install yum-plugin-priorities-1.1.30-30.el6.noarch.rpm 

# Remove the default repos, then add puppet and centos
bash-4.1# cat /etc/yum.repos.d/puppet-local.repo 
[puppet-local]
name=Puppet Local
baseurl=http://172.17.42.1/puppet
failovermethod=priority
enabled=1
gpgcheck=0
priority=1
bash-4.1# cat /etc/yum.repos.d/centos-local.repo 
[centos-local]
name=Centos Local
baseurl=http://172.17.42.1/centos6_5
failovermethod=priority
enabled=1
gpgcheck=0
priority=2

bash-4.1# yum install puppetserver

# Load the environment variables
bash-4.1# source /etc/profile.d/puppet-agent.sh
# Check the versions of the puppet components
bash-4.1# puppet -V
4.4.1
bash-4.1# puppetserver -v
puppetserver version: 2.3.1
bash-4.1# facter -v
3.1.5 (commit b5c2cf9b2ac290cb17fcadea19b467a39e17c1fd)

Agent installation:

bash-4.1# cat /etc/yum.repos.d/puppet-local.repo 
[puppet-local]
name=Puppet Local
baseurl=http://172.17.42.1/puppet
failovermethod=priority
enabled=1
gpgcheck=0

[centos-local]
name=Centos Local
baseurl=http://172.17.42.1/centos6_5
failovermethod=priority
enabled=1
gpgcheck=0

bash-4.1# yum install puppet-agent -y

Configuration:

  • Add hosts entries
bash-4.1# cat /etc/hosts
172.17.0.4 puppet
172.17.0.5 agent1
172.17.0.6 agent2
  • Self-test on the master
bash-4.1# puppet agent -t
Info: Using configured environment 'production'
Info: Retrieving pluginfacts
Info: Retrieving plugin
Info: Caching catalog for 3e4b2ba27563.localdomain
Info: Applying configuration version '1460222292'
Info: Creating state file /opt/puppetlabs/puppet/cache/state/state.yaml
Notice: Applied catalog in 0.01 seconds
  • Agent connects to the server
bash-4.1# puppet agent -t
Info: Creating a new SSL key for 5a56be361905.localdomain
Info: Caching certificate for ca
Info: csr_attributes file loading from /etc/puppetlabs/puppet/csr_attributes.yaml
Info: Creating a new SSL certificate request for 5a56be361905.localdomain
Info: Certificate Request fingerprint (SHA256): 58:1A:2E:28:D3:D7:C5:7B:E3:1A:C2:0F:70:D0:46:C0:34:39:7F:EC:98:65:B1:09:96:D3:4B:A7:4B:32:A6:C6
Info: Caching certificate for ca
Exiting; no certificate found and waitforcert is disabled

# On the master: list and sign the certificates
bash-4.1# puppet cert list
  "5a56be361905.localdomain" (SHA256) 58:1A:2E:28:D3:D7:C5:7B:E3:1A:C2:0F:70:D0:46:C0:34:39:7F:EC:98:65:B1:09:96:D3:4B:A7:4B:32:A6:C6
  "6516b8d0538b.localdomain" (SHA256) F7:49:CC:93:EA:5D:D9:A2:90:33:01:A9:74:86:97:0C:20:0C:EB:24:3A:13:85:64:5C:32:A8:D7:36:91:3C:77
bash-4.1# puppet cert sign --all 
Notice: Signed certificate request for 6516b8d0538b.localdomain
Notice: Removing file Puppet::SSL::CertificateRequest 6516b8d0538b.localdomain at '/etc/puppetlabs/puppet/ssl/ca/requests/6516b8d0538b.localdomain.pem'
Notice: Signed certificate request for 5a56be361905.localdomain
Notice: Removing file Puppet::SSL::CertificateRequest 5a56be361905.localdomain at '/etc/puppetlabs/puppet/ssl/ca/requests/5a56be361905.localdomain.pem'

# The agent connects again
bash-4.1# puppet agent -t
Info: Caching certificate for 5a56be361905.localdomain
Info: Caching certificate_revocation_list for ca
Info: Caching certificate for 5a56be361905.localdomain
Info: Using configured environment 'production'
Info: Retrieving pluginfacts
Info: Retrieving plugin
Info: Caching catalog for 5a56be361905.localdomain
Info: Applying configuration version '1460222614'
Info: Creating state file /opt/puppetlabs/puppet/cache/state/state.yaml
Notice: Applied catalog in 0.02 seconds

Compared with Puppet's many configuration options, installation is relatively simple. That is about it for installation; next I will look into monitoring and Puppet configuration.

I ran into some problems during installation, mostly caused by DNS. At the start, configuring hosts entries directly is the simplest approach: map the server's IP to the puppet hostname.

One more Hello:

# master
bash-4.1# cd /etc/puppetlabs/code/environments/production/
bash-4.1# ls
environment.conf  hieradata  manifests  modules
bash-4.1# cd manifests/
bash-4.1# cat helloworld.pp 
notify { 'Hello World' : 
}

# agent
bash-4.1# puppet agent -t
Info: Using configured environment 'production'
Info: Retrieving pluginfacts
Info: Retrieving plugin
Info: Caching catalog for 5a56be361905.localdomain
Info: Applying configuration version '1460223248'
Notice: Hello World
Notice: /Stage[main]/Main/Notify[Hello World]/message: defined 'message' as 'Hello World'
Notice: Applied catalog in 0.02 seconds
bash-4.1# 

Finally, a word on PuppetServer performance

Much of the material online is outdated and generally describes puppetmaster + apache/nginx + passenger. With the new puppetserver, the service runs on the JVM (Puppet Server is hosted by a Jetty web server), and performance is better than the old Ruby setup (at least that is what the official docs say). So there is no need to mess with the Ruby stack any more.

As an aside: getting on the JVM (Java) bandwagon is good for everyone ^_^. Big data Hadoop is Java-based, and Spark's Scala also runs on the JVM.

Because Puppet Server runs on the JVM, it takes a bit longer than the Apache/Passenger stack to start and get ready to accept HTTP connections.

Overall, Puppet Server performance is significantly better than a Puppet master running on the Apache/Passenger stack, but the initial startup is definitely slower.


–END

DBCP Parameters in Practice with Hive JDBC

The query service initially used DBCP only as a simple way to limit connections. Various problems came up in practice; this post records two rounds of DBCP parameter tuning that made the service more robust.

The initial DBCP configuration:

<bean id="hiveDataSource" class="org.apache.commons.dbcp.BasicDataSource"
  destroy-method="close" 
  p:driverClassName="${hiveDriverClassName}"
  p:url="${hiveUrl}" 
  p:username="${hiveUsername}" 
  p:password="${hivePassword}"
  p:maxIdle="${hiveMaxIdle}" 
  p:maxWait="${hiveMaxWait}" 
  p:maxActive="${hiveMaxActive}" />

<bean id="hiveTemplate" class="org.springframework.jdbc.core.JdbcTemplate">
  <property name="dataSource">
      <ref bean="hiveDataSource" />
  </property>
</bean>

The first problem: every time hiveserver2 restarted, the query service had to be restarted as well. In day-to-day use this is extremely annoying!!

The restart problem (no reconnect after connections are dropped)

First, a link worth studying: http://elf8848.iteye.com/blog/1931778 . It is extremely detailed, and the problem scenario is exactly the same as mine!!

Three parameters are added:

  • testOnBorrow = "true": whether to validate connections as they are borrowed from the pool (the reference advises against it for performance reasons; if you do enable it, make the validation statement as cheap as possible).
  • testWhileIdle = "true": whether connections are validated by the idle-connection evictor (if any); if validation fails, the connection is removed from the pool.
  • validationQuery = "show databases": the SQL statement used to verify that a connection is still usable.

Explanation:

testWhileIdle = "true" means that every {timeBetweenEvictionRunsMillis} milliseconds (default -1, i.e. never), {numTestsPerEvictionRun} connections (default 3) are taken and tested with {validationQuery}; connections that fail the test are destroyed. After destroying connections the pool shrinks, and if it drops below minIdle, new connections are created.

testOnBorrow defaults to true; if the test fails the connection is dropped and another one is borrowed. false means the {validationQuery} SQL is not run each time a connection is taken from the pool. When set to true it has a very large performance impact, with performance dropping by a factor of 7-10, which is why it is usually recommended to be set to false.

With these parameters, after hiveserver2 restarts a query first reports an error and then reconnects. Each time a connection is borrowed it is tested with show databases; if the test fails the connection is removed from the pool and another one is fetched, which gives the effect of reconnecting. select 1 is not used here because it is very slow in Hive. Note that testWhileIdle did not actually take effect, because timeBetweenEvictionRunsMillis was not configured.

After the adjustment:

<bean id="hiveDataSource" class="org.apache.commons.dbcp.BasicDataSource"
destroy-method="close" 
p:driverClassName="${hiveDriverClassName}"
p:url="${hiveUrl}" 
p:username="${hiveUsername}" 
p:password="${hivePassword}"
p:testOnBorrow="${hiveTestOnBorrow}"
p:testWhileIdle="${hiveTestWhileIdle}" 
p:validationQuery="${hiveValidationQuery}"
p:maxIdle="${hiveMaxIdle}" 
p:maxWait="${hiveMaxWait}" 
p:maxActive="${hiveMaxActive}" 
/>

Then the next problem appeared. The reconnection settings above were added while testing switching between tez and spark. But after switching to spark, the Spark application started for a connection stays alive (the session created by the connection is never actively closed) until the hiveserver2 session times out (by default checked every 6h, closed after 7h idle).

Note, there is a hidden concern: with hive-on-spark every connection creates its own session, which degenerates to MR-style behavior and cannot fully exploit Spark's strengths!! For example, in our workload a request both queries a count and fetches a page of data; those are two separate Spark applications!! N sessions means N hive-on-spark applications!!

The second problem: sessions forcibly closed by the server

This is exactly the same as the MySQL 8-hour issue in the reference: the MySQL server's default wait_timeout is 8 hours, i.e. a connection idle for more than 8 hours is automatically closed by MySQL. So I additionally set minEvictableIdleTimeMillis and timeBetweenEvictionRunsMillis to control when connections are checked and reclaimed.

  • timeBetweenEvictionRunsMillis = "1800000": run the idle-connection evictor every 30 minutes; no need to run it more often.
  • minEvictableIdleTimeMillis = "3600000": connections idle in the pool for 1 hour are reclaimed, so a session with no activity for about an hour and a half is closed from the client side. This can be observed on the YARN scheduler page at port 8088.

The final configuration:

<bean id="hiveDataSource" class="org.apache.commons.dbcp.BasicDataSource"
destroy-method="close" 
p:driverClassName="${hiveDriverClassName}"
p:url="${hiveUrl}" 
p:username="${hiveUsername}" 
p:password="${hivePassword}"
p:testOnBorrow="${hiveTestOnBorrow}"
p:validationQuery="${hiveValidationQuery}"
p:maxWait="${hiveMaxWait}" 
p:maxIdle="${hiveMaxIdle}" 
p:maxActive="${hiveMaxActive}" 
p:testWhileIdle="${hiveTestWhileIdle}" 
p:timeBetweenEvictionRunsMillis="${hiveTimeBetweenEvictionRunsMillis}" 
p:minEvictableIdleTimeMillis="${hiveMinEvictableIdleTimeMillis}" 
p:removeAbandoned="true"
p:logAbandoned="true"
/>
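
For readers not using Spring, here is a rough programmatic equivalent of this bean definition using commons-dbcp directly (a sketch only: the URL, credentials and pool sizes are made-up examples standing in for the ${...} placeholders above):

import org.apache.commons.dbcp.BasicDataSource;

public class HiveDataSourceFactory {
    public static BasicDataSource createHiveDataSource() {
        BasicDataSource ds = new BasicDataSource();
        ds.setDriverClassName("org.apache.hive.jdbc.HiveDriver");
        ds.setUrl("jdbc:hive2://file1:10000/default");      // example URL
        ds.setUsername("hadoop");                            // example credentials
        ds.setPassword("hadoop");
        // Validate on borrow so that a connection broken by a hiveserver2 restart
        // is dropped from the pool and transparently replaced.
        ds.setTestOnBorrow(true);
        ds.setValidationQuery("show databases");
        // Evictor: run every 30 minutes and reclaim connections idle for 1 hour,
        // so server-side sessions (and their Spark applications) are not held forever.
        ds.setTestWhileIdle(true);
        ds.setTimeBetweenEvictionRunsMillis(30 * 60 * 1000L);
        ds.setMinEvictableIdleTimeMillis(60 * 60 * 1000L);
        ds.setMaxActive(10);                                 // example pool sizing
        ds.setMaxIdle(5);
        ds.setMaxWait(60 * 1000L);
        ds.setRemoveAbandoned(true);
        ds.setLogAbandoned(true);
        return ds;
    }
}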

Many programs have a lot of parameters. Most can be understood from the documentation, but for some you only truly grasp their meaning in practice. I had read the referenced article before both rounds of tuning, yet the first time I did not add the other parameters at all, because they were of no use to me then; they were not needed to solve the problem at hand.

Hadoop has even more parameters; only by working a lot with core/hdfs/mapred/yarn do you discover what they do and how useful they can be. Knowledge from books always feels shallow; to truly understand something you have to do it yourself.

–END