Winse Blog

Stops and starts along the way are all scenery; all the hustle aims at the best; all the daily grind is for tomorrow, so what is there to fear.

Install Ganglia on Redhat5+

Installing complex programs written in C has always made me a little uneasy, and with internet access a yum install would save a lot of trouble. But there is no way around it: this environment is what it is, the production machines have no network access, and I was not sure a locally built yum repository would work.

Last time I did manage to install Ganglia in a VM on my own machine, but the apache and rrdtool dependencies came from yum and the process was nerve-racking, so rolling Ganglia out to production quietly fizzled out. Last week production had an incident where user queries took forever to return, mostly because the query program was badly written. Still, it was a wake-up call: having no idea how the cluster machines are doing is a serious hidden risk. With Ganglia installed we can monitor the cluster, and it will also make future tuning work easier.

To sum up: previously I installed Ganglia by following online steps one by one, with component versions that might not even match, so every step was a white-knuckle affair; without a clear focus, a lot of time was wasted. Working through it in well-defined stages is far more reassuring. Installing Ganglia mainly involves the following core parts (installation package download (extraction code: 0ec4)):

This is a fully manual Ganglia installation (rrdtool is built by hand too); everything here is installed from source tarballs, and a few libraries end up being compiled more than once.

  • rrdtool
  • gmetad / gmond
  • apache / web
  • deploy gmond to the other cluster nodes
  • configure hadoop metrics to monitor the Hadoop cluster

Install them one by one in this order and there is no need to fret over mismatched dependency versions. With versions taken out of the picture, it is also easier to follow the online write-ups for each individual component.

Install rrdtool

I recommend following the official tutorial step by step; it is very detailed. (If you have internet access, just use yum: simple and convenient. After all, we are not C programmers and are not trying to become rrdtool developers; being able to use it is enough!) In the tutorial, the environment variables must be set. That is the key point!

Below is the software used while building rrdtool; the order listed is the order of installation:

[hadoop@umcc97-44 rrdbuild]$ ll -tr | grep -v 'tar'
总计 24132
drwxrwxrwx  6   1000          1000    4096 07-17 12:12 pkg-config-0.23
drwxr-xr-x 11 hadoop            80    4096 07-17 12:28 zlib-1.2.3
drwxr-xr-x  7   1004 avahi-autoipd    4096 07-17 12:29 libpng-1.2.18
drwxr-xr-x  8   1000 users            4096 07-17 12:31 freetype-2.3.5
drwxrwxrwx 15  50138 vcsa            12288 07-17 16:37 libxml2-2.6.32
drwxrwxrwx 15   1488 users            4096 07-17 16:53 fontconfig-2.4.2
drwxrwxrwx  4 sjyw   sjyw             4096 07-17 16:56 pixman-0.10.0
drwxrwsrwx  8   1000 ftp              4096 07-17 16:59 cairo-1.6.4
drwxrwxrwx 12 sjyw   sjyw             4096 07-17 17:01 glib-2.15.4
drwxrwxrwx  9 sjyw   sjyw             4096 07-17 17:16 pango-1.21.1
drwxr-xr-x 11   1003          1001    4096 07-17 17:36 rrdtool-1.4.8

The concrete steps (the original write-up listed every single operation and felt too long-winded, so it has been trimmed):

# The following environment variables are the foundation of everything below!
BUILD_DIR=/home/ganglia/rrdbuild
INSTALL_DIR=/opt/rrdtool-1.4.8

export PKG_CONFIG_PATH=${INSTALL_DIR}/lib/pkgconfig
export PATH=$INSTALL_DIR/bin:$PATH

export LDFLAGS="-Wl,--rpath -Wl,${INSTALL_DIR}/lib" 

[root@umcc97-44 rrdbuild]# tar zxvf pkg-config-0.23.tar.gz 
[root@umcc97-44 rrdbuild]# cd pkg-config-0.23
[root@umcc97-44 pkg-config-0.23]# ./configure --prefix=$INSTALL_DIR CFLAGS="-O3 -fPIC"
[root@umcc97-44 pkg-config-0.23]# make && make install

# This environment variable is also important
[root@umcc97-44 pkg-config-0.23]# export PKG_CONFIG=$INSTALL_DIR/bin/pkg-config
[root@umcc97-44 pkg-config-0.23]# cd ..

[root@umcc97-44 rrdbuild]# tar zxvf zlib-1.2.3.tar.gz 
[root@umcc97-44 rrdbuild]# cd zlib-1.2.3
# Tweaked the command from the official docs; fixes the 64-bit "recompile with -fPIC" problem
[root@umcc97-44 zlib-1.2.3]# CFLAGS="-O3 -fPIC" ./configure
[root@umcc97-44 zlib-1.2.3]# make && make install

[root@umcc97-44 rrdbuild]# tar zxvf libpng-1.2.18.tar.gz 
[root@umcc97-44 rrdbuild]# cd libpng-1.2.18
[root@umcc97-44 libpng-1.2.18]# env CFLAGS="-O3 -fPIC" ./configure --prefix=$INSTALL_DIR
[root@umcc97-44 libpng-1.2.18]# make && make install

[root@umcc97-44 libpng-1.2.18]# cd ..
[root@umcc97-44 rrdbuild]# tar zxvf freetype-2.3.5.tar.gz 
[root@umcc97-44 rrdbuild]# cd freetype-2.3.5
[root@umcc97-44 freetype-2.3.5]# ./configure --prefix=$INSTALL_DIR CFLAGS="-O3 -fPIC"
[root@umcc97-44 freetype-2.3.5]# make && make install

[root@umcc97-44 rrdbuild]# tar zxvf libxml2-2.6.32.tar.gz 
[root@umcc97-44 rrdbuild]# cd libxml2-2.6.32
[root@umcc97-44 libxml2-2.6.32]#  ./configure --prefix=$INSTALL_DIR CFLAGS="-O3 -fPIC"
[root@umcc97-44 libxml2-2.6.32]# make && make install

[root@umcc97-44 libxml2-2.6.32]# cd ..
[root@umcc97-44 rrdbuild]# tar zxvf fontconfig-2.4.2.tar.gz 
[root@umcc97-44 rrdbuild]# cd fontconfig-2.4.2
[root@umcc97-44 fontconfig-2.4.2]# ./configure --prefix=$INSTALL_DIR CFLAGS="-O3 -fPIC" --with-freetype-config=$INSTALL_DIR/bin/freetype-config
[root@umcc97-44 fontconfig-2.4.2]# make && make install

[root@umcc97-44 fontconfig-2.4.2]# cd ..
[root@umcc97-44 rrdbuild]# cd pixman-0.10.0
[root@umcc97-44 pixman-0.10.0]# ./configure --prefix=$INSTALL_DIR CFLAGS="-O3 -fPIC"
[root@umcc97-44 pixman-0.10.0]# make && make install

[root@umcc97-44 pixman-0.10.0]# cd ../cairo-1.6.4
[root@umcc97-44 cairo-1.6.4]# ./configure --prefix=$INSTALL_DIR \
>     --enable-xlib=no \
>     --enable-xlib-render=no \
>     --enable-win32=no \
>     CFLAGS="-O3 -fPIC"
[root@umcc97-44 cairo-1.6.4]# make && make install

[root@umcc97-44 cairo-1.6.4]# cd ..
[root@umcc97-44 rrdbuild]# tar zxvf glib-2.15.4.tar.gz 
[root@umcc97-44 rrdbuild]# cd glib-2.15.4
[root@umcc97-44 glib-2.15.4]# ./configure --prefix=$INSTALL_DIR CFLAGS="-O3 -fPIC"
[root@umcc97-44 glib-2.15.4]# make && make install

[root@umcc97-44 rrdbuild]# bunzip2 -c pango-1.21.1.tar.bz2 | tar xf -
[root@umcc97-44 rrdbuild]# ll
[root@umcc97-44 rrdbuild]# cd pango-1.21.1
[root@umcc97-44 pango-1.21.1]# ./configure --prefix=$INSTALL_DIR CFLAGS="-O3 -fPIC" --without-x
[root@umcc97-44 pango-1.21.1]# export PATH=$INSTALL_DIR/bin:$PATH
[root@umcc97-44 pango-1.21.1]# make && make install

[root@umcc97-44 rrdbuild]# cd rrdtool-1.4.8/
[root@umcc97-44 rrdtool-1.4.8]#  ./configure --prefix=$INSTALL_DIR --disable-tcl --disable-python
[root@umcc97-44 rrdtool-1.4.8]# make clean
[root@umcc97-44 rrdtool-1.4.8]# make 
[root@umcc97-44 rrdtool-1.4.8]# make install
   
## Once installed, try one of the bundled examples
[root@umcc97-44 rrdtool-1.4.8]# cd /opt/rrdtool-1.4.8/share/rrdtool/examples/
[root@umcc97-44 examples]# ll
[root@umcc97-44 examples]# ./4charts.pl 
This script has created 4charts.png in the current directory
This demonstrates the use of the TIME and % RPN operators
# After it runs, PNG charts are generated in the current directory
 
[hadoop@umcc97-44 ~]$ /opt/rrdtool-1.4.8/bin/rrdtool -v
RRDtool 1.4.8  Copyright 1997-2013 by Tobias Oetiker <tobi@oetiker.ch>
               Compiled Jul 17 2014 17:37:58

Usage: rrdtool [options] command command_options
Valid commands: create, update, updatev, graph, graphv,  dump, restore,
      last, lastupdate, first, info, fetch, tune,
      resize, xport, flushcached

RRDtool is distributed under the Terms of the GNU General
Public License Version 2. (www.gnu.org/copyleft/gpl.html)

For more information read the RRD manpages

At this point rrdtool is installed. Along the way there were two notable problems: the CFLAGS setting for zlib, and having to re-export all the environment variables after a terminal session dropped. Everything else went smoothly as long as the official order was followed.
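To avoid retyping the exports after a dropped session, one option (my own habit rather than part of the official tutorial; the file name rrd-env.sh is made up) is to collect them in a small file and source it at the start of every session:

# rrd-env.sh: the exports used above, gathered in one place
BUILD_DIR=/home/ganglia/rrdbuild
INSTALL_DIR=/opt/rrdtool-1.4.8
export PKG_CONFIG_PATH=${INSTALL_DIR}/lib/pkgconfig
export PKG_CONFIG=$INSTALL_DIR/bin/pkg-config
export PATH=$INSTALL_DIR/bin:$PATH
export LDFLAGS="-Wl,--rpath -Wl,${INSTALL_DIR}/lib"

# then, in each new shell:
. /home/ganglia/rrdbuild/rrd-env.sh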

This also gave me an appreciation for pkg-config: these packages are a bit like Java jars, in that dependencies do not have to live in the system default locations, and managing them yourself is a simple, workable approach. The gmetad/gmond installation below uses the same approach, which makes deploying gmond later much more convenient: all the dependencies sit under one directory. Next up, the Ganglia programs themselves.
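As a quick sanity check of this setup, pkg-config can be asked what flags a self-managed library exposes; as long as PKG_CONFIG_PATH points at ${INSTALL_DIR}/lib/pkgconfig, the -I/-L paths it prints should sit under /opt/rrdtool-1.4.8 (libpng is only used as an example module here):

# list the modules pkg-config can see, then query one of them
pkg-config --list-all | grep -i png
pkg-config --cflags --libs libpng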

Install gmetad

Packages needed:

./gangliabuild/ganglia-web-3.5.12
./gangliabuild/apr-1.5.1
./gangliabuild/apr-util-1.5.3
./gangliabuild/confuse-2.7
./gangliabuild/expat-2.0.1
./gangliabuild/ganglia-3.6.0

The whole build went smoothly, except that make could not find the rrd library (solved with LD_LIBRARY_PATH).

# Unpack all the downloaded tarballs
[root@umcc97-44 gangliabuild]# find . -name "*.tar.gz" -exec tar zxvf {} \;

[root@umcc97-44 gangliabuild]# cd expat-2.0.1
[root@umcc97-44 expat-2.0.1]# INSTALL_DIR=/opt/ganglia
[root@umcc97-44 expat-2.0.1]# ./configure --prefix=$INSTALL_DIR 
[root@umcc97-44 expat-2.0.1]# make && make install

[root@umcc97-44 expat-2.0.1]# cd ../apr-1.5.1
[root@umcc97-44 apr-1.5.1]# ./configure --prefix=$INSTALL_DIR 
[root@umcc97-44 apr-1.5.1]# make && make install

[root@umcc97-44 apr-1.5.1]# cd ../apr-util-1.5.3
[root@umcc97-44 apr-util-1.5.3]# ./configure --with-apr=/opt/ganglia --with-expat=/opt/ganglia --prefix=$INSTALL_DIR 
[root@umcc97-44 apr-util-1.5.3]# make && make install

[root@umcc97-44 apr-util-1.5.3]# cd ../confuse-2.7
[root@umcc97-44 confuse-2.7]# ./configure CFLAGS=-fPIC --disable-nls --prefix=$INSTALL_DIR 
[root@umcc97-44 confuse-2.7]# make && make install

[root@umcc97-44 confuse-2.7]# cd ../ganglia-3.6.0
[root@umcc97-44 ganglia-3.6.0]# export LDFLAGS="-Wl,--rpath -Wl,${INSTALL_DIR}/lib" 
[root@umcc97-44 ganglia-3.6.0]# export PKG_CONFIG_PATH=${INSTALL_DIR}/lib/pkgconfig
# Note sysconfdir: the directory where the runtime configuration will live
[root@umcc97-44 ganglia-3.6.0]# ./configure --prefix=$INSTALL_DIR --with-librrd=/opt/rrdtool-1.4.8 --with-libexpat=/opt/ganglia --with-libconfuse=/opt/ganglia --with-libpcre=no  --with-gmetad --enable-gexec --enable-status -sysconfdir=/etc/ganglia
...
Welcome to..
     ______                  ___
    / ____/___ _____  ____ _/ (_)___ _
   / / __/ __ `/ __ \/ __ `/ / / __ `/
  / /_/ / /_/ / / / / /_/ / / / /_/ /
  \____/\__,_/_/ /_/\__, /_/_/\__,_/
                   /____/

Copyright (c) 2005 University of California, Berkeley

Version: 3.6.0
Library: Release 3.6.0 0:0:0

Type "make" to compile.

[root@umcc97-44 ganglia-3.6.0]# 
# Point LD_LIBRARY_PATH at the rrd libraries
[root@umcc97-44 ganglia-3.6.0]# export LD_LIBRARY_PATH=/opt/rrdtool-1.4.8/lib
[root@umcc97-44 ganglia-3.6.0]# make
[root@umcc97-44 ganglia-3.6.0]# make install

Next, configure gmetad

[root@umcc97-44 ganglia-3.6.0]#  cd gmetad
[root@umcc97-44 gmetad]# cp gmetad.init /etc/init.d/gmetad
[root@umcc97-44 gmetad]# chkconfig gmetad on

[root@umcc97-44 gmetad]# chkconfig --list gmetad
gmetad            0:off   1:off   2:on    3:on    4:on    5:on    6:off

[root@umcc97-44 gmetad]# mkdir -p /var/lib/ganglia/rrds
[root@umcc97-44 gmetad]# chown nobody:nobody /var/lib/ganglia/rrds
[root@umcc97-44 gmetad]# 
# It did not actually start: the path to the binary is wrong
[root@umcc97-44 gmetad]# service gmetad start
Starting GANGLIA gmetad: 
[root@umcc97-44 gmetad]# 
[root@umcc97-44 gmetad]# ln -s /opt/ganglia/sbin/gmetad /usr/sbin/gmetad
[root@umcc97-44 gmetad]# service gmetad start
Starting GANGLIA gmetad: [  OK  ]

# Configuration
[root@umcc97-44 gmetad]# cp gmetad.conf /etc/ganglia/gmetad.conf
[root@umcc97-44 gmetad]# vi /etc/ganglia/gmetad.conf 
 data_source "hadoop" localhost
 rrd_rootdir "/var/lib/ganglia/rrds"

[root@umcc97-44 gmetad]# service gmetad restart
Shutting down GANGLIA gmetad: [  OK  ]
Starting GANGLIA gmetad: [  OK  ]

# Quick test
[root@umcc97-44 gmetad]# telnet localhost 8651
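If everything is wired up, gmetad dumps the whole cluster state as one XML document on port 8651 and then closes the connection. A non-interactive spot check (assuming nc is available on the box; the output should start with the GANGLIA_XML document):

nc localhost 8651 | head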

Install gmond (Update 2016-1-23 17:42:07: the steps above have in fact already installed gmond)

[root@umcc97-44 gmetad]# pwd
/home/ganglia/gangliabuild/ganglia-3.6.0/gmetad
[root@umcc97-44 gmetad]# cd ..
[root@umcc97-44 ganglia-3.6.0]# ./configure --prefix=$INSTALL_DIR  --with-libpcre=no
...
Welcome to..
     ______                  ___
    / ____/___ _____  ____ _/ (_)___ _
   / / __/ __ `/ __ \/ __ `/ / / __ `/
  / /_/ / /_/ / / / / /_/ / / / /_/ /
  \____/\__,_/_/ /_/\__, /_/_/\__,_/
                   /____/

Copyright (c) 2005 University of California, Berkeley

Version: 3.6.0
Library: Release 3.6.0 0:0:0

Type "make" to compile.

# Even though the checks pass, make will fail
# The library locations need to be specified explicitly
[root@umcc97-44 ganglia-3.6.0]# ./configure --prefix=$INSTALL_DIR  --with-libpcre=no  --with-libexpat=/opt/ganglia --with-libconfuse=/opt/ganglia -sysconfdir=/etc/ganglia
[root@umcc97-44 ganglia-3.6.0]# make && make install

[root@umcc97-44 ganglia-3.6.0]# cd gmond/
[root@umcc97-44 gmond]# ./gmond -t > /etc/ganglia/gmond.conf

# As with gmetad, symlink the binary to the path the init script expects
[root@umcc97-44 gmond]# cat gmond.init
  #!/bin/sh
  #
  # chkconfig: 2345 70 40
  # description: gmond startup script
  #
  GMOND=/usr/sbin/gmond

...
[root@umcc97-44 gmond]# ln -s /opt/ganglia/sbin/gmond /usr/sbin/gmond

[root@umcc97-44 gmond]# cp gmond.init /etc/init.d/gmond
[root@umcc97-44 gmond]# chkconfig --add gmond
[root@umcc97-44 gmond]# chkconfig --list gmond
gmond             0:off   1:off   2:on    3:on    4:on    5:on    6:off

[root@umcc97-44 ganglia-3.6.0]# vi /etc/ganglia/gmond.conf 
 cluster { name = "..." }   # set the cluster name

[root@umcc97-44 ganglia-3.6.0]# service gmond start
Starting GANGLIA gmond: [  OK  ]

# Quick test
[root@umcc97-44 ganglia-3.6.0]# telnet localhost 8649

Check that it is running:

[root@umcc97-44 ganglia-3.6.0]# ldconfig -v
[root@umcc97-44 ganglia-3.6.0]# /opt/ganglia/bin/gstat -a

Install Apache and PHP

[root@umcc97-44 webbuild]# tar zxvf httpd-2.4.9.tar.gz 
[root@umcc97-44 webbuild]# cd httpd-2.4.9
[root@umcc97-44 httpd-2.4.9]# ./configure -with-enable-so -sysconfdir=/etc/httpd
...
checking for APR... no
configure: error: APR not found.  Please read the documentation.

# APR was already built during the ganglia install, but into a custom prefix, and mixing the two is not ideal.
# Per the official httpd 2.4 install docs, apr can simply be dropped into srclib/ and will be built together with httpd
[root@umcc97-44 httpd-2.4.9]# cd srclib/
[root@umcc97-44 srclib]# cp -r /home/ganglia/gangliabuild/apr-1.5.1 ./
[root@umcc97-44 srclib]# cp -r /home/ganglia/gangliabuild/apr-util-1.5.3 ./
[root@umcc97-44 srclib]# mv apr-1.5.1 apr
[root@umcc97-44 srclib]# mv apr-util-1.5.3 apr-util
[root@umcc97-44 srclib]# ll
[root@umcc97-44 srclib]# cd ..
[root@umcc97-44 httpd-2.4.9]#  cd ../
[root@umcc97-44 webbuild]# tar zxvf pcre-8.35.tar.gz 
# The regular-expression library; installed to the default location here
[root@umcc97-44 webbuild]# cd pcre-8.35
[root@umcc97-44 pcre-8.35]# ./configure 
[root@umcc97-44 pcre-8.35]# make && make install

[root@umcc97-44 pcre-8.35]# cd ../httpd-2.4.9
[root@umcc97-44 httpd-2.4.9]# ./configure --with-included-apr -with-enable-so -sysconfdir=/etc/httpd
[root@umcc97-44 httpd-2.4.9]# make && make install

[root@umcc97-44 httpd-2.4.9]# cd /usr/local/apache2/
[root@umcc97-44 apache2]# cd /etc/httpd

[root@umcc97-44 httpd]# cd /home/ganglia/webbuild/
[root@umcc97-44 webbuild]# tar zxvf php-5.5.14\ \(2\).tar.gz 
[root@umcc97-44 webbuild]# cd php-5.5.14
# Reuse the libxml built during the rrdtool install
[root@umcc97-44 php-5.5.14]# ./configure -with-apxs2=/usr/local/apache2/bin/apxs --with-libxml-dir=/opt/rrdtool-1.4.8/ -sysconfdir=/etc -with-config-file-path=/etc -with-config-file-scan-dir=/usr/etc/php.d -with-zlib
[root@umcc97-44 php-5.5.14]# make && make install

[root@umcc97-44 php-5.5.14]#  
[root@umcc97-44 php-5.5.14]#  vi /etc/httpd/httpd.conf

  LoadModule php5_module        modules/libphp5.so # added automatically by the PHP install

  DocumentRoot "/var/www/html"
  <Directory "/var/www/html">

  AddType application/x-httpd-php .php

[root@umcc97-44 php-5.5.14]# /usr/local/apache2/bin/apachectl start
AH00526: Syntax error on line 215 of /etc/httpd/httpd.conf:
DocumentRoot must be a directory

[root@umcc97-44 php-5.5.14]# mkdir -p /var/www/html
[root@umcc97-44 php-5.5.14]# /usr/local/apache2/bin/apachectl start
AH00558: httpd: Could not reliably determine the server's fully qualified domain name, using 10.18.97.44. Set the 'ServerName' directive globally to suppress this message

[root@umcc97-44 php-5.5.14]#  vi /etc/httpd/httpd.conf
  ServerName
[root@umcc97-44 php-5.5.14]#/usr/local/apache2/bin/apachectl start
httpd (pid 31416) already running

[root@umcc97-44 php-5.5.14]# cp /usr/local/apache2/bin/apachectl /etc/init.d/httpd
[root@umcc97-44 php-5.5.14]# chkconfig --add httpd
service httpd does not support chkconfig

[root@umcc97-44 ~]# cat /etc/init.d/httpd 
 #chkconfig: 2345 10 90
 #description: Activates/Deactivates Apache Web Server
 
[root@umcc97-44 ~]# service httpd start

[root@umcc97-44 ~]# cd /var/www/html/
[root@umcc97-44 ~]# vi index.php
# open http://umcc97-44 in a browser to check the result

# /usr/local/apache2/bin/apachectl -k stop
[root@umcc97-44 ganglia-web]# service httpd -k stop   
# wait for apache to finish shutting down
[root@umcc97-44 ganglia-web]# tail -f /usr/local/apache2/logs/error_log 

Deploy ganglia-web:

[root@umcc97-44 ~]# cd /home/ganglia/gangliabuild/ganglia-web-3.5.12
[root@umcc97-44 ganglia-web-3.5.12]# ls
[root@umcc97-44 ganglia-web-3.5.12]# make install
rsync --exclude "rpmbuild" --exclude "*.gz" --exclude "Makefile" --exclude "*debian*" --exclude "ganglia-web-3.5.12" --exclude ".git*" --exclude "*.in" --exclude "*~" --exclude "#*#" --exclude "ganglia-web.spec" --exclude "apache.conf" -a . ganglia-web-3.5.12
mkdir -p //var/lib/ganglia-web/dwoo/compiled && \
  mkdir -p //var/lib/ganglia-web/dwoo/cache && \
  mkdir -p //var/lib/ganglia-web && \
  rsync -a ganglia-web-3.5.12/conf //var/lib/ganglia-web && \
  mkdir -p //usr/share/ganglia-webfrontend && \
  rsync --exclude "conf" -a ganglia-web-3.5.12/* //usr/share/ganglia-webfrontend && \
  chown -R root:root //var/lib/ganglia-web

[root@umcc97-44 ganglia-web-3.5.12]# mv /usr/share/ganglia-webfrontend /var/www/html/ganglia
[root@umcc97-44 ganglia-web-3.5.12]# cd /var/www/html/ganglia/    

# Adjust the configuration; the /var/lib/ganglia/rrds directory created after installing gmetad already matches the default in conf
[root@umcc97-44 ganglia]# cp conf_default.php conf.php    

[root@umcc97-44 ganglia]# cd /var/lib/ganglia-web/
[root@umcc97-44 ganglia-web]# cd dwoo/
[root@umcc97-44 dwoo]# ll
total 8
drwxr-xr-x 2 root root 4096 Jul 17 21:34 cache
drwxr-xr-x 2 root root 4096 Jul 17 21:34 compiled
[root@umcc97-44 dwoo]# chmod 777 *    
# http://umcc97-44/ganglia
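The web frontend reads the rrd location from conf.php (falling back to conf_default.php). A quick way to double-check that it points at the /var/lib/ganglia/rrds directory created for gmetad earlier (key names assumed from ganglia-web 3.5.x):

# confirm where the frontend expects the rrd files
grep -nE "gmetad_root|rrds" conf_default.php conf.php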

Deploy gmond to the other cluster nodes


[root@umcc97-44 opt]# cat /etc/init.d/gmond 
  #!/bin/sh
  #
  # chkconfig: 2345 70 40
  # description: gmond startup script
  #
  GMOND=/usr/sbin/gmond   

[root@umcc97-44 opt]# vi /etc/ganglia/gmetad.conf     
 data_source
 # restart gmetad afterwards

# Deploy to the other nodes
[root@umcc97-44 opt]# ssh-copy-id -i ~/.ssh/id_rsa.pub umcc97-144
[root@umcc97-44 opt]# scp /etc/init.d/gmond umcc97-144:/etc/init.d/
[root@umcc97-44 opt]# ssh umcc97-144 'mkdir /etc/ganglia' 
[root@umcc97-44 opt]# scp /etc/ganglia/gmond.conf  umcc97-144:/etc/ganglia/
[root@umcc97-44 opt]# rsync -vaz ganglia umcc97-144:/opt/
[root@umcc97-44 opt]# ssh umcc97-144
Last login: Tue Jun 10 12:08:47 2014

[root@umcc97-144 ~]# ln -s /opt/ganglia/sbin/gmond /usr/sbin/gmond
[root@umcc97-144 ~]# chkconfig --add gmond
[root@umcc97-144 ~]# service gmond start
Starting GANGLIA gmond: [  OK  ]

Hadoop/HBase Metrics Configuration

[hadoop@umcc97-44 ~]$ cat hadoop-2.2.0/etc/hadoop/hadoop-metrics*
#
#   Licensed to the Apache Software Foundation (ASF) under one or more
#   contributor license agreements.  See the NOTICE file distributed with
#   this work for additional information regarding copyright ownership.
#   The ASF licenses this file to You under the Apache License, Version 2.0
#   (the "License"); you may not use this file except in compliance with
#   the License.  You may obtain a copy of the License at
#
#       http://www.apache.org/licenses/LICENSE-2.0
#
#   Unless required by applicable law or agreed to in writing, software
#   distributed under the License is distributed on an "AS IS" BASIS,
#   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
#   See the License for the specific language governing permissions and
#   limitations under the License.
#

# syntax: [prefix].[source|sink].[instance].[options]
# See javadoc of package-info.java for org.apache.hadoop.metrics2 for details

# @changed
#*.sink.file.class=org.apache.hadoop.metrics2.sink.FileSink
# default sampling period, in seconds
#*.period=10

# The namenode-metrics.out will contain metrics from all context
#namenode.sink.file.filename=namenode-metrics.out
# Specifying a special sampling period for namenode:
#namenode.sink.*.period=8

#datanode.sink.file.filename=datanode-metrics.out

# the following example split metrics of different
# context to different sinks (in this case files)
#jobtracker.sink.file_jvm.context=jvm
#jobtracker.sink.file_jvm.filename=jobtracker-jvm-metrics.out
#jobtracker.sink.file_mapred.context=mapred
#jobtracker.sink.file_mapred.filename=jobtracker-mapred-metrics.out

#tasktracker.sink.file.filename=tasktracker-metrics.out

#maptask.sink.file.filename=maptask-metrics.out

#reducetask.sink.file.filename=reducetask-metrics.out



*.sink.ganglia.class=org.apache.hadoop.metrics2.sink.ganglia.GangliaSink31
*.sink.ganglia.period=10

*.sink.ganglia.slope=jvm.metrics.gcCount=zero,jvm.metrics.memHeapUsedM=both
*.sink.ganglia.dmax=jvm.metrics.threadsBlocked=70,jvm.metrics.memHeapUsedM=40

namenode.sink.ganglia.servers=umcc97-44:8649
resourcemanager.sink.ganglia.servers=umcc97-44:8649

datanode.sink.ganglia.servers=umcc97-44:8649
nodemanager.sink.ganglia.servers=umcc97-44:8649

maptask.sink.ganglia.servers=umcc97-44:8649
reducetask.sink.ganglia.servers=umcc97-44:8649



# Configuration of the "dfs" context for null
dfs.class=org.apache.hadoop.metrics.spi.NullContext

# Configuration of the "dfs" context for file
#dfs.class=org.apache.hadoop.metrics.file.FileContext
#dfs.period=10
#dfs.fileName=/tmp/dfsmetrics.log

# Configuration of the "dfs" context for ganglia
# Pick one: Ganglia 3.0 (former) or Ganglia 3.1 (latter)
# dfs.class=org.apache.hadoop.metrics.ganglia.GangliaContext
# dfs.class=org.apache.hadoop.metrics.ganglia.GangliaContext31
# dfs.period=10
# dfs.servers=localhost:8649


# Configuration of the "mapred" context for null
mapred.class=org.apache.hadoop.metrics.spi.NullContext

# Configuration of the "mapred" context for file
#mapred.class=org.apache.hadoop.metrics.file.FileContext
#mapred.period=10
#mapred.fileName=/tmp/mrmetrics.log

# Configuration of the "mapred" context for ganglia
# Pick one: Ganglia 3.0 (former) or Ganglia 3.1 (latter)
# mapred.class=org.apache.hadoop.metrics.ganglia.GangliaContext
# mapred.class=org.apache.hadoop.metrics.ganglia.GangliaContext31
# mapred.period=10
# mapred.servers=localhost:8649


# Configuration of the "jvm" context for null
#jvm.class=org.apache.hadoop.metrics.spi.NullContext

# Configuration of the "jvm" context for file
#jvm.class=org.apache.hadoop.metrics.file.FileContext
#jvm.period=10
#jvm.fileName=/tmp/jvmmetrics.log

# Configuration of the "jvm" context for ganglia
# jvm.class=org.apache.hadoop.metrics.ganglia.GangliaContext
# jvm.class=org.apache.hadoop.metrics.ganglia.GangliaContext31
# jvm.period=10
# jvm.servers=localhost:8649

# Configuration of the "rpc" context for null
rpc.class=org.apache.hadoop.metrics.spi.NullContext

# Configuration of the "rpc" context for file
#rpc.class=org.apache.hadoop.metrics.file.FileContext
#rpc.period=10
#rpc.fileName=/tmp/rpcmetrics.log

# Configuration of the "rpc" context for ganglia
# rpc.class=org.apache.hadoop.metrics.ganglia.GangliaContext
# rpc.class=org.apache.hadoop.metrics.ganglia.GangliaContext31
# rpc.period=10
# rpc.servers=localhost:8649


# Configuration of the "ugi" context for null
ugi.class=org.apache.hadoop.metrics.spi.NullContext

# Configuration of the "ugi" context for file
#ugi.class=org.apache.hadoop.metrics.file.FileContext
#ugi.period=10
#ugi.fileName=/tmp/ugimetrics.log

# Configuration of the "ugi" context for ganglia
# ugi.class=org.apache.hadoop.metrics.ganglia.GangliaContext
# ugi.class=org.apache.hadoop.metrics.ganglia.GangliaContext31
# ugi.period=10
# ugi.servers=localhost:8649

[hadoop@umcc97-44 ~]$ cat hbase-0.98.3-hadoop2/conf/hadoop-metrics2-hbase.properties 
# syntax: [prefix].[source|sink].[instance].[options]
# See javadoc of package-info.java for org.apache.hadoop.metrics2 for details

#*.sink.file*.class=org.apache.hadoop.metrics2.sink.FileSink
# default sampling period
#*.period=10

# Below are some examples of sinks that could be used
# to monitor different hbase daemons.

# hbase.sink.file-all.class=org.apache.hadoop.metrics2.sink.FileSink
# hbase.sink.file-all.filename=all.metrics

# hbase.sink.file0.class=org.apache.hadoop.metrics2.sink.FileSink
# hbase.sink.file0.context=hmaster
# hbase.sink.file0.filename=master.metrics

# hbase.sink.file1.class=org.apache.hadoop.metrics2.sink.FileSink
# hbase.sink.file1.context=thrift-one
# hbase.sink.file1.filename=thrift-one.metrics

# hbase.sink.file2.class=org.apache.hadoop.metrics2.sink.FileSink
# hbase.sink.file2.context=thrift-two
# hbase.sink.file2.filename=thrift-one.metrics

# hbase.sink.file3.class=org.apache.hadoop.metrics2.sink.FileSink
# hbase.sink.file3.context=rest
# hbase.sink.file3.filename=rest.metrics


*.sink.ganglia.class=org.apache.hadoop.metrics2.sink.ganglia.GangliaSink31
*.sink.ganglia.period=10

hbase.sink.ganglia.period=10
hbase.sink.ganglia.servers=umcc97-44:8649

Then sync these properties files to the cluster slaves (datanode/regionserver) and restart the cluster. After a short while a whole crowd of new metrics shows up in the ganglia-web UI.
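A minimal sketch of that sync, driven by the slaves file (the home-directory layout is assumed to match this cluster, so adjust paths as needed):

for h in `cat hadoop-2.2.0/etc/hadoop/slaves` ; do
  scp hadoop-2.2.0/etc/hadoop/hadoop-metrics2.properties $h:~/hadoop-2.2.0/etc/hadoop/
  scp hbase-0.98.3-hadoop2/conf/hadoop-metrics2-hbase.properties $h:~/hbase-0.98.3-hadoop2/conf/
done
# then restart the HDFS/YARN/HBase daemons so they pick up the ganglia sinks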

References

ganglia

apache web

–END

Upgrade Hive: 0.12.0 to 0.13.1

Improper FileSystem usage in hive-0.12.0 led to out-of-memory problems, so in the end we decided to upgrade Hive. The upgrade was not as scary as imagined and the steps are simple: run the upgrade script against the metastore database, copy the configuration and jars over from the old hive-0.12.0, then add the jars and restart hiveserver2. This post records the upgrade to 0.13, adding Tez, and debugging Hive.

Update the environment variables

HIVE_HOME=/home/hadoop/apache-hive-0.13.1-bin
PATH=$JAVA_HOME/bin:$HIVE_HOME/bin:$PATH

If you want to use hwi, you need to download the source and build the war yourself (it is not included in the default bin.tar.gz).

winse@Lenovo-PC ~/git/hive/hwi
$ mvn package war:war

When configuring, note that hive.hwi.war.file is a path relative to HIVE_HOME, i.e. lib/hive-hwi-0.13.1.war. You also need to add $JDK/lib/tools.jar to the classpath.
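For reference, the corresponding hive-site.xml entry would look something like this (a sketch; the value is simply the HIVE_HOME-relative path mentioned above):

<property>
  <name>hive.hwi.war.file</name>
  <value>lib/hive-hwi-0.13.1.war</value>
</property>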

export HADOOP_CLASSPATH=$HADOOP_CLASSPATH:/home/esw/jdk1.7.0_60/lib/tools.jar

$CD/bin/hive --service hwi

Upgrade the metadata

[hadoop@ismp0 ~]$ cd apache-hive-0.13.1-bin/scripts/metastore/upgrade/mysql/

[hadoop@ismp0 mysql]$ mysql -uXXX -hXXX -pXXX
mysql> use hive
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A

Database changed
mysql> source upgrade-0.12.0-to-0.13.0.mysql.sql
+--------------------------------------------------+
|                                                  |
+--------------------------------------------------+
| Upgrading MetaStore schema from 0.12.0 to 0.13.0 |
+--------------------------------------------------+
1 row in set, 1 warning (0.00 sec)

+-----------------------------------------------------------------------+
|                                                                       |
+-----------------------------------------------------------------------+
| < HIVE-5700 enforce single date format for partition column storage > |
+-----------------------------------------------------------------------+
1 row in set, 1 warning (0.00 sec)

Query OK, 0 rows affected (0.22 sec)
Rows matched: 0  Changed: 0  Warnings: 0

+--------------------------------------------+
|                                            |
+--------------------------------------------+
| < HIVE-6386: Add owner filed to database > |
+--------------------------------------------+
1 row in set, 1 warning (0.00 sec)

Query OK, 1 row affected (0.33 sec)
Records: 1  Duplicates: 0  Warnings: 0

Query OK, 1 row affected (0.16 sec)
Records: 1  Duplicates: 0  Warnings: 0

+---------------------------------------------------------------------------------------------+
|                                                                                             |
+---------------------------------------------------------------------------------------------+
| <HIVE-6458 Add schema upgrade scripts for metastore changes related to permanent functions> |
+---------------------------------------------------------------------------------------------+
1 row in set, 1 warning (0.00 sec)

Query OK, 0 rows affected (0.06 sec)

Query OK, 0 rows affected (0.06 sec)

+----------------------------------------------------------------------------------+
|                                                                                  |
+----------------------------------------------------------------------------------+
| <HIVE-6757 Remove deprecated parquet classes from outside of org.apache package> |
+----------------------------------------------------------------------------------+
1 row in set, 1 warning (0.00 sec)

Query OK, 0 rows affected (0.04 sec)
Rows matched: 0  Changed: 0  Warnings: 0

Query OK, 0 rows affected (0.01 sec)
Rows matched: 0  Changed: 0  Warnings: 0

Query OK, 0 rows affected (0.01 sec)
Rows matched: 0  Changed: 0  Warnings: 0

Query OK, 0 rows affected (0.07 sec)

Query OK, 0 rows affected (0.12 sec)

Query OK, 0 rows affected (0.07 sec)

Query OK, 0 rows affected (0.06 sec)

Query OK, 1 row affected (0.05 sec)

Query OK, 0 rows affected (0.06 sec)

Query OK, 0 rows affected (0.15 sec)
Records: 0  Duplicates: 0  Warnings: 0

Query OK, 0 rows affected (0.06 sec)

Query OK, 1 row affected (0.05 sec)

Query OK, 0 rows affected (0.07 sec)

Query OK, 0 rows affected (0.06 sec)

Query OK, 1 row affected (0.05 sec)

Query OK, 1 row affected (0.07 sec)
Rows matched: 1  Changed: 1  Warnings: 0

+-----------------------------------------------------------+
|                                                           |
+-----------------------------------------------------------+
| Finished upgrading MetaStore schema from 0.12.0 to 0.13.0 |
+-----------------------------------------------------------+
1 row in set, 1 warning (0.00 sec)

mysql> 
mysql> 
mysql> exit
Bye

[hadoop@ismp0 ~]$ vi .bash_profile
[hadoop@ismp0 ~]$ source .bash_profile
[hadoop@ismp0 ~]$ cd apache-hive-0.13.1-bin
[hadoop@ismp0 apache-hive-0.13.1-bin]$ cd conf/
[hadoop@ismp0 conf]$ cp ~/hive-0.12.0/conf/hive-site.xml ./
[hadoop@ismp0 conf]$ cd ..
[hadoop@ismp0 apache-hive-0.13.1-bin]$ cp ~/hive-0.12.0/lib/mysql-connector-java-5.1.21-bin.jar lib/
[hadoop@ismp0 apache-hive-0.13.1-bin]$ hive
[hadoop@ismp0 apache-hive-0.13.1-bin]$ hive

hive>  select count(*) from t_ods_idc_isp_log2 where day=20140624;
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapreduce.job.reduces=<number>
Starting Job = job_1403006477300_3403, Tracking URL = http://umcc97-79:8088/proxy/application_1403006477300_3403/
Kill Command = /home/hadoop/hadoop-2.2.0/bin/hadoop job  -kill job_1403006477300_3403
Hadoop job information for Stage-1: number of mappers: 2; number of reducers: 1
2014-06-24 17:19:07,618 Stage-1 map = 0%,  reduce = 0%
2014-06-24 17:19:15,283 Stage-1 map = 50%,  reduce = 0%, Cumulative CPU 2.37 sec
2014-06-24 17:19:16,360 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 5.49 sec
2014-06-24 17:19:22,749 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 7.99 sec
MapReduce Total cumulative CPU time: 7 seconds 990 msec
Ended Job = job_1403006477300_3403
MapReduce Jobs Launched: 
Job 0: Map: 2  Reduce: 1   Cumulative CPU: 7.99 sec   HDFS Read: 19785618 HDFS Write: 6 SUCCESS
Total MapReduce CPU Time Spent: 7 seconds 990 msec
OK
77625
Time taken: 36.387 seconds, Fetched: 1 row(s)
hive> 

[hadoop@ismp0 apache-hive-0.13.1-bin]$ nohup bin/hiveserver2 &

$# Test hive-jdbc
[hadoop@ismp0 apache-hive-0.13.1-bin]$ bin/beeline 
Beeline version 0.13.1 by Apache Hive
beeline> !connect jdbc:hive2://10.18.97.22:10000/
scan complete in 7ms
Connecting to jdbc:hive2://10.18.97.22:10000/
Enter username for jdbc:hive2://10.18.97.22:10000/: hadoop
Enter password for jdbc:hive2://10.18.97.22:10000/: 
Connected to: Apache Hive (version 0.13.1)
Driver: Hive JDBC (version 0.13.1)
Transaction isolation: TRANSACTION_REPEATABLE_READ
0: jdbc:hive2://10.18.97.22:10000/> show tables;
+-------------------------+
|        tab_name         |
+-------------------------+
...
| test_123                |
+-------------------------+
10 rows selected (2.547 seconds)
0: jdbc:hive2://10.18.97.22:10000/>  select count(*) from t_ods_idc_isp_log2 where day=20140624;
+--------+
|  _c0   |
+--------+
| 77625  |
+--------+
1 row selected (37.463 seconds)
0: jdbc:hive2://10.18.97.22:10000/> 

In the earlier post on installing and using Tez, the change had to be rolled back because of Hive. Now that we are on hive-0.13, let's try the Tez functionality from Hive:

  • add the tez dependencies locally and set the environment variables
  • add the tez dependencies for MR and add tez-site.xml
  • switch the execution engine to tez
$# Upload to HDFS
$ hadoop fs -mkdir /apps
$ hadoop fs -put tez-0.4.0-incubating /apps/
$ hadoop fs -ls /apps
Found 1 items
drwxr-xr-x   - hadoop supergroup          0 2014-09-09 16:19 /apps/tez-0.4.0-incubating

$ cat etc/hadoop/tez-site.xml 
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!-- Put site-specific property overrides in this file. -->

<configuration>
  <property>
    <name>tez.lib.uris</name>
    <value>${fs.default.name}/apps/tez-0.4.0-incubating,${fs.default.name}/apps/tez-0.4.0-incubating/lib/</value>
  </property>
</configuration>

$ export HADOOP_CLASSPATH=${TEZ_HOME}/*:${TEZ_HOME}/lib/*:$HADOOP_CLASSPATH
$ apache-hive-0.13.1-bin/bin/hive
hive> set hive.execution.engine=tez;
hive> select count(*) from t_ods_idc_isp_log2 ;
Time taken: 24.926 seconds, Fetched: 1 row(s)

hive> set hive.execution.engine=mr;                              
hive> select count(*) from t_ods_idc_isp_log2 where day=20140720;
Time taken: 40.585 seconds, Fetched: 1 row(s)

// Add the TEZ jars to the CLASSPATH
$# @hive-env.sh
 # export TEZ_HOME=/home/hadoop/tez-0.4.0-incubating
 # export HADOOP_CLASSPATH=$HADOOP_CLASSPATH:$TEZ_HOME/*:$TEZ_HOME/lib/*
$ last_hour=2014090915
$ hive --hiveconf hive.execution.engine=tez -e "select houseId, count(*) 
from 
(
select houseId
from t_house_monitor2
where hour=$last_hour
group by from_unixtime(cast(accesstime as bigint), 'yyyyMMdd'),houseId,IP,port,domain,serviceType,illegalType,currentState,usr,icpError,regerror,regDomain,use_type,real_useType
) hs
group by houseId"

Judging purely by the timings, it does make a difference.

Debugging Hive

This is also simple: the hive script already has the feature built in; just set the DEBUG environment variable.

[hadoop@master1 ~]$ less apache-hive-0.13.1-bin/bin/ext/debug.sh
[hadoop@master1 bin]$ less hive

$# The script ends up adding the debug options `-agentlib:jdwp=transport=dt_socket,server=y,address=8000,suspend=y` to HADOOP_CLIENT_OPTS, which is merged into HADOOP_OPTS and passed to the java process.

[hadoop@master1 bin]$ DEBUG=true hive
Listening for transport dt_socket at address: 8000

Then, with Eclipse remote debugging attached, you can step through the whole flow; in my case the breakpoint sat on the record-parsing code.

Build the source and import it into Eclipse

$ git clone https://github.com/apache/hive.git

winse@Lenovo-PC /cygdrive/e/git/hive
$ git checkout branch-0.13

E:\git\hive>mvn clean package eclipse:eclipse -DskipTests -Dmaven.test.skip=true -Phadoop-2

Gotchas

  • Apart from partition directories, the data path of a Hive table must not contain any other folders
hive> create database test location '/user/hive/warehouse_temp/' ;

hive> create table t_ods_ddos as select * from default.t_ods_ddos limit 0;

hive> select * from t_ods_ddos;
OK
Time taken: 0.176 seconds

[hadoop@umcc97-44 ~]$ hadoop fs -mkdir /user/hive/warehouse_temp/t_ods_ddos/abc

hive> select * from t_ods_ddos;
OK
Failed with exception java.io.IOException:java.io.IOException: Not a file: hdfs://umcc97-44:9000/user/hive/warehouse_temp/t_ods_ddos/abc
Time taken: 0.167 seconds

–END

Building and Using Tez

Overview

With the MapReduce that ships with hadoop2, data is passed along only once in the middle of a job, i.e. a job can aggregate only once (and must then write to disk). The Tez project is an extension of the existing YARN architecture that uses a DAG (directed acyclic graph) to implement an MRR-style job framework.

In the usual comparison, a plain MR pipeline has to persist its output after one job before a second job can run the next reduce, whereas Tez can run a reduce directly after a reduce, cutting out the intermediate I/O as well as the MapReduce job startup time.

Environment

  • Install/Deploy
  • hadoop-2.2.0(umcc97-44:hdfs, umcc97-79:yarn)
  • compiled under Windows with Cygwin

Download and build Tez

First download tez-0.4.0-incubating.tar.gz; you also need protoc available (see the post on building the Hadoop source). After unpacking, build it with mvn.

Administrator@winseliu /cygdrive/e/local/libs/big
$ tar zxvf tez-0.4.0-incubating.tar.gz

Administrator@winseliu /cygdrive/e/local/libs/big
$ cd tez-0.4.0-incubating/

Administrator@winseliu /cygdrive/e/local/libs/big/tez-0.4.0-incubating
$ mvn install -DskipTests -Dmaven.javadoc.skip
...
[INFO] Reactor Summary:
[INFO]
[INFO] tez ............................................... SUCCESS [1.518s]
[INFO] tez-api ........................................... SUCCESS [8.890s]
[INFO] tez-common ........................................ SUCCESS [0.725s]
[INFO] tez-runtime-internals ............................. SUCCESS [2.529s]
[INFO] tez-runtime-library ............................... SUCCESS [5.100s]
[INFO] tez-mapreduce ..................................... SUCCESS [3.666s]
[INFO] tez-mapreduce-examples ............................ SUCCESS [2.692s]
[INFO] tez-dag ........................................... SUCCESS [13.943s]
[INFO] tez-tests ......................................... SUCCESS [1.691s]
[INFO] tez-dist .......................................... SUCCESS [14.370s]
[INFO] Tez ............................................... SUCCESS [0.245s]
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 55.791s
[INFO] Finished at: Tue Jun 17 17:33:45 CST 2014
[INFO] Final Memory: 35M/151M
[INFO] ------------------------------------------------------------------------

Upload the Tez jars to HDFS

To keep things simple I uploaded the tez jars straight to the development cluster for testing. Doing the same on a local cluster should work much the same way.

Administrator@winseliu /cygdrive/e/local/libs/big/tez-0.4.0-incubating
$ cd tez-dist/

Administrator@winseliu /cygdrive/e/local/libs/big/tez-0.4.0-incubating/tez-dist
$ cd target/

Administrator@winseliu /cygdrive/e/local/libs/big/tez-0.4.0-incubating/tez-dist/target
$ export HADOOP_USER_NAME=hadoop

Administrator@winseliu /cygdrive/e/local/libs/big/tez-0.4.0-incubating/tez-dist/target
$ hadoop dfs -put tez-0.4.0-incubating/tez-0.4.0-incubating/ hdfs://umcc97-44:9000/apps/ 

Configure the cluster

First look at the cluster's existing classpath: it already includes the etc/hadoop directory, so I simply put tez-site.xml there. I also copy the tez libraries into the share/hadoop/tez directory and add it to the HADOOP_CLASSPATH environment variable.

[hadoop@umcc97-79 hadoop]$ hadoop classpath
/home/hadoop/hadoop-2.2.0/etc/hadoop:/home/hadoop/hadoop-2.2.0/share/hadoop/common/lib/*:/home/hadoop/hadoop-2.2.0/share/hadoop/common/*:/home/hadoop/hadoop-2.2.0/share/hadoop/hdfs:/home/hadoop/hadoop-2.2.0/share/hadoop/hdfs/lib/*:/home/hadoop/hadoop-2.2.0/share/hadoop/hdfs/*:/home/hadoop/hadoop-2.2.0/share/hadoop/yarn/lib/*:/home/hadoop/hadoop-2.2.0/share/hadoop/yarn/*:/home/hadoop/hadoop-2.2.0/share/hadoop/mapreduce/lib/*:/home/hadoop/hadoop-2.2.0/share/hadoop/mapreduce/*:/home/hadoop/hadoop-2.2.0/contrib/capacity-scheduler/*.jar

# used by map/reduce
[hadoop@umcc97-79 hadoop]$ cat tez-site.xml 
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<configuration>
<property>
  <name>tez.lib.uris</name>
  <value>${fs.default.name}/apps/tez-0.4.0-incubating,${fs.default.name}/apps/tez-0.4.0-incubating/lib/</value>
</property>
</configuration>

[hadoop@umcc97-79 hadoop]$ cd ~/hadoop-2.2.0/share/hadoop/tez/
[hadoop@umcc97-79 tez]$ ll
total 9616
-rw-r--r-- 1 hadoop hadoop  303139 Jun 17 17:33 avro-1.7.4.jar
-rw-r--r-- 1 hadoop hadoop   41123 Jun 17 17:33 commons-cli-1.2.jar
-rw-r--r-- 1 hadoop hadoop  610259 Jun 17 17:33 commons-collections4-4.0.jar
-rw-r--r-- 1 hadoop hadoop 1648200 Jun 17 17:33 guava-11.0.2.jar
-rw-r--r-- 1 hadoop hadoop  710492 Jun 17 17:33 guice-3.0.jar
-rw-r--r-- 1 hadoop hadoop  656365 Jun 17 17:33 hadoop-mapreduce-client-common-2.2.0.jar
-rw-r--r-- 1 hadoop hadoop 1455001 Jun 17 17:33 hadoop-mapreduce-client-core-2.2.0.jar
-rw-r--r-- 1 hadoop hadoop   21537 Jun 17 17:33 hadoop-mapreduce-client-shuffle-2.2.0.jar
-rw-r--r-- 1 hadoop hadoop   81743 Jun 17 17:33 jettison-1.3.4.jar
-rw-r--r-- 1 hadoop hadoop  533455 Jun 17 17:33 protobuf-java-2.5.0.jar
-rw-r--r-- 1 hadoop hadoop  995968 Jun 17 17:33 snappy-java-1.0.4.1.jar
-rw-r--r-- 1 hadoop hadoop  749917 Jun 17 17:33 tez-api-0.4.0-incubating.jar
-rw-r--r-- 1 hadoop hadoop   34049 Jun 17 17:33 tez-common-0.4.0-incubating.jar
-rw-r--r-- 1 hadoop hadoop  970987 Jun 17 17:33 tez-dag-0.4.0-incubating.jar
-rw-r--r-- 1 hadoop hadoop  246409 Jun 17 17:33 tez-mapreduce-0.4.0-incubating.jar
-rw-r--r-- 1 hadoop hadoop  199934 Jun 17 17:33 tez-mapreduce-examples-0.4.0-incubating.jar
-rw-r--r-- 1 hadoop hadoop  114692 Jun 17 17:33 tez-runtime-internals-0.4.0-incubating.jar
-rw-r--r-- 1 hadoop hadoop  352177 Jun 17 17:33 tez-runtime-library-0.4.0-incubating.jar
-rw-r--r-- 1 hadoop hadoop    6845 Jun 17 17:33 tez-tests-0.4.0-incubating.jar

# MR configuration, used for client-side job submission
[hadoop@umcc97-79 hadoop]$ grep HADOOP_CLASSPATH hadoop-env.sh
export HADOOP_CLASSPATH=${HADOOP_HOME}/share/hadoop/tez/*:${HADOOP_HOME}/share/hadoop/tez/lib/*:$HADOOP_CLASSPATH

[hadoop@umcc97-79 hadoop]$ sed -n 19,23p mapred-site.xml
<configuration>
<property>
  <name>mapreduce.framework.name</name>
  <value>yarn-tez</value>
</property>

Sync and restart YARN

for h in `cat hadoop-2.2.0/etc/hadoop/slaves ` ; do 
  rsync -vaz --exclude=logs --exclude=pid --exclude=tmp  hadoop-2.2.0 $h:~/ ; 
done

# also sync to the secondary namenode
rsync -vaz --exclude=logs --exclude=pid --exclude=tmp  hadoop-2.2.0 umcc97-44:~/
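The restart itself is not shown above; on the YARN master (umcc97-79) it is just the standard scripts (a sketch, assuming the stock sbin layout of hadoop-2.2.0):

cd ~/hadoop-2.2.0
sbin/stop-yarn.sh
sbin/start-yarn.sh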

Test

[hadoop@umcc97-79 ~]$ hadoop classpath
/home/hadoop/hadoop-2.2.0/etc/hadoop:/home/hadoop/hadoop-2.2.0/share/hadoop/common/lib/*:/home/hadoop/hadoop-2.2.0/share/hadoop/common/*:/home/hadoop/hadoop-2.2.0/share/hadoop/hdfs:/home/hadoop/hadoop-2.2.0/share/hadoop/hdfs/lib/*:/home/hadoop/hadoop-2.2.0/share/hadoop/hdfs/*:/home/hadoop/hadoop-2.2.0/share/hadoop/yarn/lib/*:/home/hadoop/hadoop-2.2.0/share/hadoop/yarn/*:/home/hadoop/hadoop-2.2.0/share/hadoop/mapreduce/lib/*:/home/hadoop/hadoop-2.2.0/share/hadoop/mapreduce/*:/home/hadoop/hadoop-2.2.0/share/hadoop/tez/*:/home/hadoop/hadoop-2.2.0/share/hadoop/tez/lib/*:/home/hadoop/hadoop-2.2.0/contrib/capacity-scheduler/*.jar

[hadoop@umcc97-79 ~]$ cd hadoop-2.2.0/share/hadoop/mapreduce/
[hadoop@umcc97-79 mapreduce]$ hadoop jar hadoop-mapreduce-client-jobclient-2.2.0-tests.jar sleep -mt 1 -rt 1 -m 1 -r 1

cd hadoop-2.2.0/share/hadoop/tez/

hadoop fs -put ~/hadoop-2.2.0/logs/yarn-hadoop-resourcemanager-umcc97-79.* /hello/in
hadoop fs -rmr /hello/out
hadoop jar tez-mapreduce-examples-0.4.0-incubating.jar orderedwordcount  /hello/in /hello/out

Rollback: to use Tez afterwards, just override the setting on the fly

After switching to tez, hive-0.12.0 stopped working. Since other colleagues needed Hive, all the configuration had to be reverted. [For the Hive upgrade, see the section above on using tez with hive-0.13.]

Keep yarn as the framework in the configuration files; to use tez, just pass the parameter when submitting the job.

export HADOOP_CLASSPATH=${HADOOP_HOME}/share/hadoop/tez/*:${HADOOP_HOME}/share/hadoop/tez/lib/*:$HADOOP_CLASSPATH
hadoop jar hadoop-2.2.0/share/hadoop/tez/tez-mapreduce-examples-0.4.0-incubating.jar orderedwordcount \
  -Dmapreduce.framework.name=yarn-tez  /hello/in /hello/out

org.apache.tez.mapreduce.examples.OrderedWordCount not only computes the counts but also sorts the output by count.

Open question: I have not figured out how to get history for tez jobs yet; starting the historyserver does not seem to help.

Version 0.6 already ships with a UI.

Ongoing updates

I had planned to build tez-0.6 and drop it straight onto hive-0.13, but ran into a snag: hive-0.13 does not support it!

Before building tez to integrate with hive, first download the hive source and check which tez version its pom.xml actually uses; build tez only after that!

apache-hive-1.1.0-src.tar.gz/pom.xml
    <tez.version>0.5.2</tez.version>

Building tez-0.6 against hadoop-2.2:

E:\local\opt\bigdata\apache-tez-0.6.0-src>mvn  package -Dhadoop.version=2.2.0 -DskipTests -Dmaven.javadoc.skip=true -DskipATS

vi tez-dist/pom.xml
<profile>
      <id>hadoop26</id>
      <activation>
        <activeByDefault>false</activeByDefault>
      </activation>

–END

Remote Debugging Hadoop2 and Troubleshooting Techniques

To understand how a program runs, rather than sweeping through the source line by line, a quicker way is to run it under a debugger and let F6/F7 walk us through it step by step. For a specific detail or a specific piece of data, setting a breakpoint is then the natural thing to do.

Java remote debugging

 * JDK 1.3 or earlier -Xnoagent -Djava.compiler=NONE -Xdebug -Xrunjdwp:transport=dt_socket,server=y,suspend=n,address=6006
 * JDK 1.4(linux ok) -Xdebug -Xrunjdwp:transport=dt_socket,server=y,suspend=n,address=6006
 * newer JDK(win7 & jdk7) -agentlib:jdwp=transport=dt_socket,server=y,suspend=n,address=6006

Submitting jobs within the same OS

Windows to Windows, or Linux to Linux, you can debug the wordcount job simply by adding parameters on the command line:

E:\local\dotfile>hdfs dfs -rmr /out # the native libs live outside PATH; the cmd scripts take care of that

E:\local\dotfile>hadoop org.apache.hadoop.examples.WordCount  "-Dmapreduce.map.java.opts=-agentlib:jdwp=transport=dt_socket,server=y,suspend=y,address=8090 -Djava.library.path=E:\local\libs\big\hadoop-2.2.0\lib\native -Dmapreduce.reduce.java.opts=-Djava.library.path=E:\local\libs\big\hadoop-2.2.0\lib\native"  /in /out

With suspend set to y, the JVM waits for a client to attach before running. Set a breakpoint in WordCount$TokenizerMapper#map in Eclipse, then attach with a Remote Java Application launch configuration and you can debug the job.

Debugging jobs on a Hadoop cluster

Hadoop is made up of many programs, and each has a corresponding environment-variable option for this!

  • Main program: debugging job submission
    • set HADOOP_OPTS="-agentlib:jdwp=transport=dt_socket,server=y,suspend=y,address=8090"
    • This can also be set in the configuration files; be aware it may override a value already set for that option.
  • NodeManager debugging
    • set HADOOP_NODEMANAGER_OPTS="-agentlib:jdwp=transport=dt_socket,server=y,suspend=n,address=8092"
    • (on Linux this must be defined in the env script file) YARN_NODEMANAGER_OPTS="-Xdebug -Xrunjdwp:transport=dt_socket,server=y,suspend=n,address=8092"
  • ResourceManager debugging
    • HADOOP_RESOURCEMANAGER_OPTS
    • export YARN_RESOURCEMANAGER_OPTS="-Xdebug -Xrunjdwp:transport=dt_socket,server=y,suspend=n,address=8091"

The setup on Linux differs slightly: for processes launched via SSH (such as the NodeManager), the OPTS must be written into the startup script files. To remote-debug the NodeManager on Linux, put the setting into etc/hadoop/yarn-env.sh; otherwise it has no effect on the NodeManager (which is started over ssh).
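A minimal sketch of that yarn-env.sh edit, reusing port 8092 from the list above (append the line, then restart the NodeManager for it to take effect):

# etc/hadoop/yarn-env.sh
export YARN_NODEMANAGER_OPTS="-Xdebug -Xrunjdwp:transport=dt_socket,server=y,suspend=n,address=8092 $YARN_NODEMANAGER_OPTS"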

Other debugging tricks

Debugging on a test cluster is a bit more involved than on the local Windows development setup, which after all has just one master and one slave. Once the job runs on a distributed cluster, say when debugging the distributed cache, you need a few small tricks to find out which machine the task is actually running on. The concrete commands are in the steps below.

Job configuration and execution

For the patch that lets Eclipse on Windows submit jobs to Linux, see [MAPREDUCE-5655]

# Configuration
  <property>
      <name>mapred.remote.os</name>
      <value>Linux</value>
  </property>
  <property>
      <name>mapreduce.job.jar</name>
      <value>dta-analyser-all.jar</value>
  </property>

  <property>
      <name>mapreduce.map.java.opts</name>
      <value>-Xmx1024m -Xdebug -Xrunjdwp:transport=dt_socket,server=y,suspend=y,address=18090</value>
  </property>

  <property>
      <name>mapred.task.timeout</name>
      <value>1800000</value>
  </property>

# In code, set both the map and reduce counts to 1
job.setNumReduceTasks(1);
job.getConfiguration().setInt(MRJobConfig.NUM_MAPS, 1);

  • Set the task timeout to something long while debugging, otherwise:
 Got exception: java.net.SocketTimeoutException: Call From winseliu/127.0.0.1 to winse.com:2850 failed on socket timeout exception: java.net.SocketTimeoutException: 60000 millis timeout while waiting for channel to be ready for read. ch :
  • Parameter setup for debugging the main method

To debug main (which is over in a flash, so set suspend to true!); the map-side debug options are written in the configuration file as shown above

export HADOOP_OPTS="-Xdebug -Xrunjdwp:transport=dt_socket,server=y,suspend=y,address=8073"

Administrator@winseliu ~/hadoop
$ sh -x bin/hadoop org.apache.hadoop.examples.WordCount /in /out 

Walk all the slave nodes to find out where the map task is running

The map debug port was configured as 18090, so we look for that option to find the machine running the task.

[hadoop@umcc97-44 ~]$ for h in `cat hadoop-2.2.0/etc/hadoop/slaves` ; do ssh $h 'ps aux|grep java | grep 18090'; echo $h;  done
hadoop    8667  0.0  0.0  63888  1268 ?        Ss   18:21   0:00 bash -c ps aux|grep java | grep 18090
umcc97-142
hadoop   12686  0.0  0.0  63868  1260 ?        Ss   18:21   0:00 bash -c ps aux|grep java | grep 18090
umcc97-143
hadoop   23516  0.0  0.0  63856  1108 ?        Ss   18:11   0:00 /bin/bash -c /home/java/jdk1.7.0_45/bin/java -Djava.net.preferIPv4Stack=true -Dhadoop.metrics.log.level=WARN  -Xmx256m -Xdebug -Xrunjdwp:transport=dt_socket,server=y,suspend=y,address=18090 -Djava.io.tmpdir=/home/hadoop/hadoop-2.2.0/tmp/nm-local-dir/usercache/hadoop/appcache/application_1397006359464_1605/container_1397006359464_1605_01_000002/tmp -Dlog4j.configuration=container-log4j.properties -Dyarn.app.container.log.dir=/home/hadoop/hadoop-2.2.0/logs/userlogs/application_1397006359464_1605/container_1397006359464_1605_01_000002 -Dyarn.app.container.log.filesize=0 -Dhadoop.root.logger=INFO,CLA org.apache.hadoop.mapred.YarnChild 10.18.97.143 57576 attempt_1397006359464_1605_m_000000_0 2 1>/home/hadoop/hadoop-2.2.0/logs/userlogs/application_1397006359464_1605/container_1397006359464_1605_01_000002/stdout 2>/home/hadoop/hadoop-2.2.0/logs/userlogs/application_1397006359464_1605/container_1397006359464_1605_01_000002/stderr 
hadoop   23522  0.0  0.0 605136 15728 ?        Sl   18:11   0:00 /home/java/jdk1.7.0_45/bin/java -Djava.net.preferIPv4Stack=true -Dhadoop.metrics.log.level=WARN -Xmx256m -Xdebug -Xrunjdwp:transport=dt_socket,server=y,suspend=y,address=18090 -Djava.io.tmpdir=/home/hadoop/hadoop-2.2.0/tmp/nm-local-dir/usercache/hadoop/appcache/application_1397006359464_1605/container_1397006359464_1605_01_000002/tmp -Dlog4j.configuration=container-log4j.properties -Dyarn.app.container.log.dir=/home/hadoop/hadoop-2.2.0/logs/userlogs/application_1397006359464_1605/container_1397006359464_1605_01_000002 -Dyarn.app.container.log.filesize=0 -Dhadoop.root.logger=INFO,CLA org.apache.hadoop.mapred.YarnChild 10.18.97.143 57576 attempt_1397006359464_1605_m_000000_0 2
hadoop   23665  0.0  0.0  63856  1264 ?        Ss   18:21   0:00 bash -c ps aux|grep java | grep 18090
umcc97-144

Print only the name of the node running the map

[hadoop@umcc97-44 ~]$ for h in `cat hadoop-2.2.0/etc/hadoop/slaves` ; do ssh $h 'if ps aux|grep -v grep | grep java | grep 18090 | grep -v bash 2>&1 1>/dev/null ; then echo `hostname`; fi'; done
umcc97-142
[hadoop@umcc97-44 ~]$ 

From here on the procedure is the same as debugging any ordinary Java program, so I will not repeat it.

Data produced while the job is running

The two auxiliary bash wrapper processes

The first container (000001) is the AppMaster; the second (000002) is the map task of the job we submitted.

[hadoop@umcc97-143 ~]$ cd hadoop-2.2.0/tmp/nm-local-dir/nmPrivate
[hadoop@umcc97-143 nmPrivate]$ ls -Rl
.:
total 12
drwxrwxr-x 4 hadoop hadoop 4096 Apr 21 18:34 application_1397006359464_1606
-rw-rw-r-- 1 hadoop hadoop    6 Apr 21 18:34 container_1397006359464_1606_01_000001.pid
-rw-rw-r-- 1 hadoop hadoop    6 Apr 21 18:34 container_1397006359464_1606_01_000002.pid

./application_1397006359464_1606:
total 8
drwxrwxr-x 2 hadoop hadoop 4096 Apr 21 18:34 container_1397006359464_1606_01_000001
drwxrwxr-x 2 hadoop hadoop 4096 Apr 21 18:34 container_1397006359464_1606_01_000002

./application_1397006359464_1606/container_1397006359464_1606_01_000001:
total 8
-rw-r--r-- 1 hadoop hadoop   95 Apr 21 18:34 container_1397006359464_1606_01_000001.tokens
-rw-r--r-- 1 hadoop hadoop 3121 Apr 21 18:34 launch_container.sh

./application_1397006359464_1606/container_1397006359464_1606_01_000002:
total 8
-rw-r--r-- 1 hadoop hadoop  129 Apr 21 18:34 container_1397006359464_1606_01_000002.tokens
-rw-r--r-- 1 hadoop hadoop 3532 Apr 21 18:34 launch_container.sh
[hadoop@umcc97-143 nmPrivate]$ 
[hadoop@umcc97-143 nmPrivate]$ jps
4692 NodeManager
4173 DataNode
13497 YarnChild
7538 HRegionServer
13376 MRAppMaster
13574 Jps
[hadoop@umcc97-143 nmPrivate]$ cat *.pid
13366
13491
[hadoop@umcc97-143 nmPrivate]$ ps aux | grep 13366
hadoop   13366  0.0  0.0  63868  1088 ?        Ss   18:34   0:00 /bin/bash -c /home/java/jdk1.7.0_45/bin/java -Dlog4j.configuration=container-log4j.properties -Dyarn.app.container.log.dir=/home/hadoop/hadoop-2.2.0/logs/userlogs/application_1397006359464_1606/container_1397006359464_1606_01_000001 -Dyarn.app.container.log.filesize=0 -Dhadoop.root.logger=INFO,CLA  -Xmx1024m org.apache.hadoop.mapreduce.v2.app.MRAppMaster 1>/home/hadoop/hadoop-2.2.0/logs/userlogs/application_1397006359464_1606/container_1397006359464_1606_01_000001/stdout 2>/home/hadoop/hadoop-2.2.0/logs/userlogs/application_1397006359464_1606/container_1397006359464_1606_01_000001/stderr 
hadoop   13594  0.0  0.0  61204   760 pts/2    S+   18:36   0:00 grep 13366
[hadoop@umcc97-143 nmPrivate]$ ps aux | grep 13491
hadoop   13491  0.0  0.0  63868  1100 ?        Ss   18:34   0:00 /bin/bash -c /home/java/jdk1.7.0_45/bin/java -Djava.net.preferIPv4Stack=true -Dhadoop.metrics.log.level=WARN  -Xmx256m -Xdebug -Xrunjdwp:transport=dt_socket,server=y,suspend=y,address=18090 -Djava.io.tmpdir=/home/hadoop/hadoop-2.2.0/tmp/nm-local-dir/usercache/hadoop/appcache/application_1397006359464_1606/container_1397006359464_1606_01_000002/tmp -Dlog4j.configuration=container-log4j.properties -Dyarn.app.container.log.dir=/home/hadoop/hadoop-2.2.0/logs/userlogs/application_1397006359464_1606/container_1397006359464_1606_01_000002 -Dyarn.app.container.log.filesize=0 -Dhadoop.root.logger=INFO,CLA org.apache.hadoop.mapred.YarnChild 10.18.97.143 52046 attempt_1397006359464_1606_m_000000_0 2 1>/home/hadoop/hadoop-2.2.0/logs/userlogs/application_1397006359464_1606/container_1397006359464_1606_01_000002/stdout 2>/home/hadoop/hadoop-2.2.0/logs/userlogs/application_1397006359464_1606/container_1397006359464_1606_01_000002/stderr 
hadoop   13599  0.0  0.0  61204   760 pts/2    S+   18:37   0:00 grep 13491
[hadoop@umcc97-143 nmPrivate]$ 

Locally cached data for the running task

[hadoop@umcc97-143 container_1397006359464_1606_01_000002]$ ls -l
total 28
-rw-r--r-- 1 hadoop hadoop  129 Apr 21 18:34 container_tokens
-rwx------ 1 hadoop hadoop  516 Apr 21 18:34 default_container_executor.sh
lrwxrwxrwx 1 hadoop hadoop   65 Apr 21 18:34 filter.io -> /home/hadoop/hadoop-2.2.0/tmp/nm-local-dir/filecache/10/filter.io
lrwxrwxrwx 1 hadoop hadoop  120 Apr 21 18:34 job.jar -> /home/hadoop/hadoop-2.2.0/tmp/nm-local-dir/usercache/hadoop/appcache/application_1397006359464_1606/filecache/10/job.jar
lrwxrwxrwx 1 hadoop hadoop  120 Apr 21 18:34 job.xml -> /home/hadoop/hadoop-2.2.0/tmp/nm-local-dir/usercache/hadoop/appcache/application_1397006359464_1606/filecache/13/job.xml
-rwx------ 1 hadoop hadoop 3532 Apr 21 18:34 launch_container.sh
drwx--x--- 2 hadoop hadoop 4096 Apr 21 18:34 tmp
[hadoop@umcc97-143 container_1397006359464_1606_01_000002]$ 

Troubleshooting methods

  • Print DEBUG logs: export HADOOP_ROOT_LOGGER=DEBUG,console (see the sketch after this list)
    • The log files end up under logs/userlogs on the nodemanager nodes.
  • When DEBUG logs still are not enough, you can sprinkle System.out statements in the source and drop the recompiled class in to override the original, which helps pin down configuration problems.
  • If you are unsure what a shell script is doing, run it with sh -x [CMD], or add set -x before the relevant commands in the script; it is the equivalent of echo on in Windows batch.
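A tiny end-to-end sketch of the first tip, using the same WordCount example as earlier (any client command works):

export HADOOP_ROOT_LOGGER=DEBUG,console
hadoop org.apache.hadoop.examples.WordCount /in /out
# container-side logs then appear under logs/userlogs/<application_id>/<container_id>/ on the nodemanager nodes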

References

–END

Hadoop2 Learning Journey and Resources

My first contact with clusters came from my graduation thesis: building yet another SSH (Struts/Spring/Hibernate) content-management system felt dull, so I chose a Hadoop-related topic instead. The result was pretty rough, but getting something working was still satisfying. Then there was a gap of nearly a year, though deploying systems and digging through logs while doing game work at Dingxiang made me much more comfortable with Linux. At Keyun I moved on to plugin development, where everything required reading source code. After that I finally did a real Hadoop project, and compared with two years earlier, both my programming and my way of learning had improved. Hadoop and Kettle are powerful, popular pieces of software, but Chinese-language material is relatively scarce and they change fast. Gradually I found I could get by reading code and English resources.

After another rather uneventful year I am back working with Hadoop. Times have changed, but the fundamentals are the same. Moving from hadoop1 to hadoop2 brought big changes, and while getting familiar with it I again hit plenty of problems and felt powerless, as if the earlier days had been wasted. Writing things down is a virtue: it is both a record of the road travelled and a tree planted so that those who come later can rest in its shade.

To work with Hadoop, first put aside any prejudice against English. If you still turn to Baidu for every problem you hit, you will probably take a lot of unnecessary detours.

Books

To start, a couple of Chinese-language books worth recommending:

  • [Hadoop权威指南 / Hadoop: The Definitive Guide] written by a master, well worth careful study
  • [Hadoop实战 / Hadoop in Action] also a reasonably good one

Web resources

  1. Deploying Hadoop on Linux
  2. Deploying Hadoop on Windows
  3. Accessing HDFS / submitting jobs directly from Eclipse
  4. Deploying hadoop under Cygwin
  5. The HDFS filesystem shell scripts
  6. Building the source
  7. Remote debugging
  8. Production-environment topics
  9. Development references / accessing the cluster from code

Tools

  • lrzsz
  • SecureCRT
  • WinSCP
  • w3m # eventually replaced by access through an SSH proxy!

Mindset

  • Break the whole into small pieces

Tips

  • List a jar's contents: jar tvf JAR
  • ssh-copy-id
  • rsync
  • find
  • ls -Sl
  • while/for
  • cat unknown.txt | cut -b 62- | sort | uniq

  • To be continuously updated -

–END