Hadoop Installation and Upgrade: Installing in Docker (1)

Deploying a Hadoop cluster is actually not hard. Just work through the steps one by one: passwordless SSH login, firewall (and SELinux), JDK, configuration, and startup (including formatting the NameNode).
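
For reference, on a plain CentOS 6 host the firewall/SELinux step would look roughly like the sketch below (the containers used in this post don't need it, which is why it never shows up in the transcript):

# a sketch for a physical/VM CentOS 6 host; the Docker containers below skip this
service iptables stop && chkconfig iptables off
setenforce 0
sed -i 's/^SELINUX=enforcing/SELINUX=disabled/' /etc/selinux/config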

Preparing the Cluster Machines

[root@cu2 ~]# docker -v
Docker version 1.6.2, build 7c8fca2/1.6.2

[root@cu2 ~]# docker images
REPOSITORY          TAG                 IMAGE ID            CREATED             VIRTUAL SIZE
centos              centos6             62068de82c82        4 months ago        250.7 MB

[root@cu2 ~]# docker run -d --name hadoop-master1 -h hadoop-master1 centos:centos6 /usr/sbin/sshd -D
c975b0e41429a3c214e86552f2a9f599ba8ee7487e8fbdc25fd59d29adacca4f
[root@cu2 ~]# docker run -d --name hadoop-master2 -h hadoop-master2 centos:centos6 /usr/sbin/sshd -D
fac1d2ee4a05ab8457f4bd6756622ac8236f64423544150d355f9e3091764d8f
[root@cu2 ~]# docker run -d --name hadoop-slaver1 -h hadoop-slaver1 centos:centos6 /usr/sbin/sshd -D
cc8734f2a0963a030b994f69be697308a13e511557eaefc7d4aca7e300950ded
[root@cu2 ~]# docker run -d --name hadoop-slaver2 -h hadoop-slaver2 centos:centos6 /usr/sbin/sshd -D
7e4b5410a7cb8585436775f15609708b309a5b83930da74d6571533251c26355
[root@cu2 ~]# docker run -d --name hadoop-slaver3 -h hadoop-slaver3 centos:centos6 /usr/sbin/sshd -D
26018b256403d956b4272b6bda09a58d1fc6938591d18f9892ba72782c41880b

[root@cu2 ~]# docker ps -a
CONTAINER ID        IMAGE               COMMAND               CREATED              STATUS              PORTS               NAMES
26018b256403        centos:centos6      "/usr/sbin/sshd -D"   About a minute ago   Up About a minute                       hadoop-slaver3      
7e4b5410a7cb        centos:centos6      "/usr/sbin/sshd -D"   About a minute ago   Up About a minute                       hadoop-slaver2      
cc8734f2a096        centos:centos6      "/usr/sbin/sshd -D"   About a minute ago   Up About a minute                       hadoop-slaver1      
fac1d2ee4a05        centos:centos6      "/usr/sbin/sshd -D"   About a minute ago   Up About a minute                       hadoop-master2      
c975b0e41429        centos:centos6      "/usr/sbin/sshd -D"   8 minutes ago        Up 8 minutes                            hadoop-master1      

[root@cu2 ~]# docker ps | grep hadoop | awk '{print $1}' | xargs -I{} docker inspect -f '{{.NetworkSettings.IPAddress}} {{.Config.Hostname}}' {}
172.17.0.6 hadoop-slaver3
172.17.0.5 hadoop-slaver2
172.17.0.4 hadoop-slaver1
172.17.0.3 hadoop-master2
172.17.0.2 hadoop-master1

After restarting Docker, the containers can simply be started again by name:

[root@cu2 ~]# service docker start
Starting docker:                                           [  OK  ]
[root@cu2 ~]# docker start hadoop-master1 hadoop-master2 hadoop-slaver1 hadoop-slaver2 hadoop-slaver3
hadoop-master1
hadoop-master2
hadoop-slaver1
hadoop-slaver2
hadoop-slaver3

After a restart, the containers' /etc/hosts files are reset! It is best not to restart Docker until testing is finished! (If the Docker cluster is going to be used long-term, set up DNS instead!!)
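
One possible workaround (a sketch, reusing the docker inspect format string and the root SSH access shown above): after each restart, regenerate the IP-to-hostname list and append it to /etc/hosts inside every container:

# collect the current IP/hostname pairs, then push them into each container's /etc/hosts
docker ps | grep hadoop | awk '{print $1}' \
  | xargs -I{} docker inspect -f '{{.NetworkSettings.IPAddress}} {{.Config.Hostname}}' {} > /tmp/hadoop-hosts
while read ip name ; do
  ssh root@$ip 'cat >> /etc/hosts' < /tmp/hadoop-hosts
done < /tmp/hadoop-hosts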

Machine Configuration

[root@cu2 ~]# ssh root@172.17.0.2
root@172.17.0.2's password: 
Last login: Thu Jan  7 06:17:11 2016 from 172.17.42.1
[root@hadoop-master1 ~]# 
[root@hadoop-master1 ~]# vi /etc/hosts
127.0.0.1       localhost
::1     localhost ip6-localhost ip6-loopback
fe00::0 ip6-localnet
ff00::0 ip6-mcastprefix
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters

172.17.0.6 hadoop-slaver3
172.17.0.5 hadoop-slaver2
172.17.0.4 hadoop-slaver1
172.17.0.3 hadoop-master2
172.17.0.2 hadoop-master1

[root@hadoop-master1 ~]# ssh-keygen
[root@hadoop-master1 ~]# 
[root@hadoop-master1 ~]# ssh-copy-id hadoop-master1
[root@hadoop-master1 ~]# ssh-copy-id hadoop-master2
[root@hadoop-master1 ~]# ssh-copy-id hadoop-slaver1
[root@hadoop-master1 ~]# ssh-copy-id hadoop-slaver2
[root@hadoop-master1 ~]# ssh-copy-id hadoop-slaver3

# Copy the hosts file to the other machines
[root@hadoop-master1 ~]# for h in hadoop-master2 hadoop-slaver1 hadoop-slaver2 hadoop-slaver3 ; do scp /etc/hosts $h:/etc/ ; done

# Install the required packages
[root@hadoop-master1 ~]# for h in hadoop-master1 hadoop-master2 hadoop-slaver1 hadoop-slaver2 hadoop-slaver3 ; do ssh $h "yum install man rsync curl wget tar" ; done

# Create the hadoop user
[root@hadoop-master1 ~]# for h in hadoop-master1 hadoop-master2 hadoop-slaver1 hadoop-slaver2 hadoop-slaver3 ; do ssh $h useradd hadoop ; done

#// Copy the desired password to the clipboard, then just right-click paste (in SecureCRT) five times. For dozens or hundreds of machines, use expect instead.
[root@hadoop-master1 ~]# for h in hadoop-master1 hadoop-master2 hadoop-slaver1 hadoop-slaver2 hadoop-slaver3 ; do ssh $h passwd hadoop ; done
New password: hadoop
BAD PASSWORD: it is based on a dictionary word
BAD PASSWORD: is too simple
Retype new password: hadoop
Changing password for user hadoop.
passwd: all authentication tokens updated successfully.
...
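
# A non-interactive alternative (a sketch, assuming the plain password "hadoop" is acceptable):
# chpasswd sets the password in one pass, so nothing has to be pasted
for h in hadoop-master1 hadoop-master2 hadoop-slaver1 hadoop-slaver2 hadoop-slaver3 ; do ssh $h 'echo hadoop:hadoop | chpasswd' ; done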

# Create the data directory and give ownership to the hadoop user
[root@hadoop-master1 ~]# for h in hadoop-master1 hadoop-master2 hadoop-slaver1 hadoop-slaver2 hadoop-slaver3 ; do ssh $h "mkdir /data; chown hadoop:hadoop /data" ; done

[root@hadoop-master1 ~]# su - hadoop
[hadoop@hadoop-master1 ~]$ ssh-keygen 
[hadoop@hadoop-master1 ~]$ ssh-copy-id hadoop-master1
[hadoop@hadoop-master1 ~]$ ssh-copy-id hadoop-master2
[hadoop@hadoop-master1 ~]$ ssh-copy-id hadoop-slaver1
[hadoop@hadoop-master1 ~]$ ssh-copy-id hadoop-slaver2
[hadoop@hadoop-master1 ~]$ ssh-copy-id hadoop-slaver3

[hadoop@hadoop-master1 ~]$ ll
total 139036
drwxr-xr-x 9 hadoop hadoop      4096 Oct  7  2013 hadoop-2.2.0
-rw-r--r-- 1 hadoop hadoop 142362384 Jan  7 07:14 jdk-7u60-linux-x64.gz
drwxr-xr-x 8 hadoop hadoop      4096 Jan  7 07:11 zookeeper-3.4.6
[hadoop@hadoop-master1 ~]$ tar zxvf jdk-7u60-linux-x64.gz 
[hadoop@hadoop-master1 ~]$ tar zxvf hadoop-2.2.0.tar.gz 
[hadoop@hadoop-master1 ~]$ tar zxvf zookeeper-3.4.6.tar.gz 

# Remove files that are not needed in production
[hadoop@hadoop-master1 ~]$ rm hadoop-2.2.0.tar.gz zookeeper-3.4.6.tar.gz jdk-7u60-linux-x64.gz 

[hadoop@hadoop-master1 ~]$ cd zookeeper-3.4.6/
[hadoop@hadoop-master1 zookeeper-3.4.6]$ rm -rf docs/ src/

[hadoop@hadoop-master1 zookeeper-3.4.6]$ cd ../hadoop-2.2.0/
[hadoop@hadoop-master1 hadoop-2.2.0]$ cd share/
[hadoop@hadoop-master1 share]$ rm -rf doc/

Program Configuration and Startup

  • java
[hadoop@hadoop-master1 ~]$ cd
[hadoop@hadoop-master1 ~]$ vi .bashrc 
...
JAVA_HOME=~/jdk1.7.0_60
PATH=$JAVA_HOME/bin:$PATH

export JAVA_HOME PATH

Log out of the shell and log back in, or run source .bashrc!
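
A quick sanity check (not part of the original transcript) that the new shell picks the JDK up:

[hadoop@hadoop-master1 ~]$ source ~/.bashrc
[hadoop@hadoop-master1 ~]$ which java     # should resolve to ~/jdk1.7.0_60/bin/java
[hadoop@hadoop-master1 ~]$ java -version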

  • zookeeper
[hadoop@hadoop-master1 ~]$ cd zookeeper-3.4.6/conf
[hadoop@hadoop-master1 conf]$ cp zoo_sample.cfg zoo.cfg
[hadoop@hadoop-master1 conf]$ vi zoo.cfg 
...
dataDir=/data/zookeeper

[hadoop@hadoop-master1 ~]$ mkdir /data/zookeeper

[hadoop@hadoop-master1 ~]$ cd ~/zookeeper-3.4.6/
[hadoop@hadoop-master1 zookeeper-3.4.6]$ bin/zkServer.sh start
JMX enabled by default
Using config: /home/hadoop/zookeeper-3.4.6/bin/../conf/zoo.cfg
Starting zookeeper ... STARTED
[hadoop@hadoop-master1 zookeeper-3.4.6]$ 
[hadoop@hadoop-master1 zookeeper-3.4.6]$ jps
244 QuorumPeerMain
265 Jps

[hadoop@hadoop-master1 zookeeper-3.4.6]$ less zookeeper.out 
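
The steps above run ZooKeeper in standalone mode on hadoop-master1 only. For a replicated setup, zoo.cfg would also list the quorum members and every node would need a matching myid file under dataDir. A minimal sketch (the choice of these three hosts is an assumption):

# appended to zoo.cfg on every ZooKeeper host
server.1=hadoop-master1:2888:3888
server.2=hadoop-master2:2888:3888
server.3=hadoop-slaver1:2888:3888

# each host writes its own id into dataDir (2 on hadoop-master2, 3 on hadoop-slaver1)
echo 1 > /data/zookeeper/myid
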
  • hadoop
[hadoop@hadoop-master1 ~]$ cd ~/hadoop-2.2.0/etc/hadoop/
[hadoop@hadoop-master1 hadoop]$ rm *.cmd
[hadoop@hadoop-master1 hadoop]$ vi hadoop-env.sh 
# Set JAVA_HOME, HADOOP_PID_DIR, and YARN_PID_DIR
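# For example (the paths below are assumptions based on the layout used in this post):
#   export JAVA_HOME=/home/hadoop/jdk1.7.0_60
#   export HADOOP_PID_DIR=/data/pids
# (YARN_PID_DIR is set the same way, in yarn-env.sh.)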

[hadoop@hadoop-master1 hadoop]$ vi core-site.xml 

<property>
<name>fs.defaultFS</name>
<value>hdfs://hadoop-master1:9000</value>
</property>

<property>
<name>hadoop.tmp.dir</name>
<value>/data/tmp</value>
</property>


[hadoop@hadoop-master1 hadoop]$ vi hdfs-site.xml 

<property>
<name>dfs.replication</name>
<value>1</value>
</property>

<property>
<name>dfs.namenode.secondary.http-address</name>
<value> </value>
</property>


[hadoop@hadoop-master1 hadoop]$ vi mapred-site.xml

<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>

<property>
<name>mapreduce.jobhistory.address</name>
<value>hadoop-master1:10020</value>
</property>

<property>
<name>mapreduce.jobhistory.webapp.address</name>
<value>hadoop-master1:19888</value>
</property>


[hadoop@hadoop-master1 hadoop]$ vi yarn-site.xml 

<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>

<property>
<name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>

<property>
<name>yarn.resourcemanager.address</name>
<value>hadoop-master1:8032</value>
</property>

<property>
<name>yarn.resourcemanager.scheduler.address</name>
<value>hadoop-master1:8030</value>
</property>

<property>
<name>yarn.resourcemanager.resource-tracker.address</name>
<value>hadoop-master1:8031</value>
</property>

<property>
<name>yarn.resourcemanager.admin.address</name>
<value>hadoop-master1:8033</value>
</property>

<property>
<name>yarn.resourcemanager.webapp.address</name>
<value>hadoop-master1:8080</value>
</property>
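
One file this walkthrough does not show: start-dfs.sh and start-yarn.sh read etc/hadoop/slaves to decide where to start the DataNodes and NodeManagers. A sketch, assuming the three slaver containers are the worker nodes:

[hadoop@hadoop-master1 hadoop]$ vi slaves
hadoop-slaver1
hadoop-slaver2
hadoop-slaver3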

Starting Hadoop

[hadoop@hadoop-master1 hadoop-2.2.0]$ bin/hadoop version
Hadoop 2.2.0
Subversion https://svn.apache.org/repos/asf/hadoop/common -r 1529768
Compiled by hortonmu on 2013-10-07T06:28Z
Compiled with protoc 2.5.0
From source with checksum 79e53ce7994d1628b240f09af91e1af4
This command was run using /home/hadoop/hadoop-2.2.0/share/hadoop/common/hadoop-common-2.2.0.jar

[hadoop@hadoop-master1 hadoop-2.2.0]$ bin/hadoop namenode -format
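# (bin/hdfs namenode -format is the non-deprecated form of the same command)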

# The bundled native libhadoop is problematic: start-dfs.sh parses the output of 'hdfs getconf -namenodes', and the extra warning output makes the script fail
[hadoop@hadoop-master1 hadoop-2.2.0]$ rm lib/native/libh*
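# (The 2.2.0 binary tarball ships 32-bit native libraries; on a 64-bit host they only trigger
#  warnings, but that extra output is what confuses the host parsing in start-dfs.sh. Deleting
#  them as above works; so does replacing them with natives rebuilt for 64-bit.)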

[hadoop@hadoop-master1 ~]$ cd 
[hadoop@hadoop-master1 ~]$ for h in hadoop-master2 hadoop-slaver1 hadoop-slaver2 hadoop-slaver3 ; do scp -r jdk1.7.0_60 $h:~/ ; done
[hadoop@hadoop-master1 ~]$ for h in hadoop-master2 hadoop-slaver1 hadoop-slaver2 hadoop-slaver3 ; do scp -r hadoop-2.2.0 $h:~/ ; done
[hadoop@hadoop-master1 ~]$ for h in hadoop-master2 hadoop-slaver1 hadoop-slaver2 hadoop-slaver3 ; do scp -r .bashrc $h:~/ ; done

[hadoop@hadoop-master1 ~]$ cd hadoop-2.2.0/
[hadoop@hadoop-master1 hadoop-2.2.0]$ sbin/start-dfs.sh

[hadoop@hadoop-master1 hadoop-2.2.0]$ sbin/stop-dfs.sh
[hadoop@hadoop-master1 hadoop-2.2.0]$ sbin/start-dfs.sh
[hadoop@hadoop-master1 hadoop-2.2.0]$ jps
244 QuorumPeerMain
3995 NameNode
4187 Jps
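
A quick check (not in the original run) that the DataNodes also came up on the slaver containers:

[hadoop@hadoop-master1 hadoop-2.2.0]$ for h in hadoop-slaver1 hadoop-slaver2 hadoop-slaver3 ; do ssh $h ~/jdk1.7.0_60/bin/jps ; done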

With SecureCRT's dynamic SOCKS5 port forwarding, and the browser configured to use that SOCKS5 proxy, the HDFS web UI on port 50070 shows the state of the cluster.
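
Plain OpenSSH can open the same kind of dynamic SOCKS proxy if SecureCRT is not at hand (a sketch; the local port 1080 is arbitrary):

ssh -D 1080 root@cu2
# then point the browser's SOCKS5 proxy at localhost:1080 and open http://172.17.0.2:50070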

–END
