Winse Blog

Every stop along the way is scenery; amid the hustle we all strive for the best; all the busyness is for tomorrow — so what is there to fear?

Using Dnsmasq to Fix Hostname Resolution Between Docker Cluster Nodes

Last week I spent some time learning Docker and wrote a Dockerfile for a pseudo-distributed Hadoop cluster.

With --link, the master can reach the slavers, since the slavers' addresses are written into the master's hosts file. Naturally I assumed that copying the master's hosts file to every slaver node would solve the whole problem.

When I actually tried it, though, that is not how it works: a Docker container will not let you modify its hosts file!! (Note, 2016-1-7 14:18:11: Docker 1.6.2 does allow editing /etc/hosts, but the changes are lost after a restart, which is awkward.)

The Wrong Approach

First, the approach that does not work:

# Note: no registry image here; this uses the image built from the Dockerfile
[root@docker hadoop]# docker run -d --name slaver1 -h slaver1 hadoop
[root@docker hadoop]# docker run -d --name slaver2 -h slaver2 hadoop
[root@docker hadoop]# docker run -d --name master -h master --link slaver1:slaver1 --link slaver2:slaver2 hadoop

[root@docker ~]# docker ps
CONTAINER ID        IMAGE               COMMAND                CREATED             STATUS              PORTS               NAMES
dafc82678811        hadoop:latest       /bin/sh -c '/usr/sbi   40 seconds ago      Up 40 seconds       22/tcp              master
86d2da5209c5        hadoop:latest       /bin/sh -c '/usr/sbi   49 seconds ago      Up 48 seconds       22/tcp              master/slaver2,slaver2
7b9761fb05a8        hadoop:latest       /bin/sh -c '/usr/sbi   56 seconds ago      Up 55 seconds       22/tcp              master/slaver1,slaver1

At this point, thanks to --link, the master's hosts file contains entries for slaver1 and slaver2. The obvious next step is to log into master and copy its hosts file to the slaver nodes, and everything should be fine. Reality is harsher:

-bash-4.1# scp /etc/hosts slaver1:/etc/
scp: /etc//hosts: Read-only file system

DNS Solves the Problem Cleanly

First, install a DNS server on the host machine. BIND is rather heavyweight, so following an approach I found online, I use dnsmasq to set up the DNS server instead.

[root@docker ~]# yum install dnsmasq -y

[root@docker ~]# cp /etc/resolv.conf /etc/resolv.dnsmasq.conf 
[root@docker ~]# touch /etc/dnsmasq.hosts

[root@docker ~]# vi /etc/resolv.conf
[root@docker ~]# cat /etc/resolv.conf
; generated by /sbin/dhclient-script
nameserver 127.0.0.1 

[root@docker ~]# vi /etc/dnsmasq.conf
[root@docker ~]# cat /etc/dnsmasq.conf
...
resolv-file=/etc/resolv.dnsmasq.conf
...
addn-hosts=/etc/dnsmasq.hosts

[root@docker ~]# service dnsmasq restart

[root@docker ~]# dig www.baidu.com
...
;; SERVER: 127.0.0.1#53(127.0.0.1)
...

dig confirms that the DNS server in use is now localhost. Next, start the Docker containers to build the environment.

# Note: no registry image here; this uses the image built from the Dockerfile

[root@docker hadoop]# docker run -d  --dns 172.17.42.1 --name slaver1 -h slaver1 hadoop
[root@docker hadoop]# docker run -d  --dns 172.17.42.1 --name slaver2 -h slaver2 hadoop
[root@docker hadoop]# docker run -d  --dns 172.17.42.1 --name master -h master hadoop

[root@docker ~]# docker ps
CONTAINER ID        IMAGE               COMMAND                CREATED             STATUS              PORTS               NAMES
f6e63b311e60        hadoop:latest       /bin/sh -c '/usr/sbi   6 seconds ago       Up 5 seconds        22/tcp              master
454ae2c3e435        hadoop:latest       /bin/sh -c '/usr/sbi   13 seconds ago      Up 12 seconds       22/tcp              slaver2
7698230a03fb        hadoop:latest       /bin/sh -c '/usr/sbi   21 seconds ago      Up 20 seconds       22/tcp              slaver1

[root@docker ~]# docker ps | grep hadoop | awk '{print $1}' | xargs -I{} docker inspect -f '{{.NetworkSettings.IPAddress}} {{.Config.Hostname}}' {} > /etc/dnsmasq.hosts
[root@docker ~]# service dnsmasq restart

[root@docker ~]# ssh hadoop@master
hadoop@master's password: 
[hadoop@master ~]$ ping slaver1
PING slaver1 (172.17.0.9) 56(84) bytes of data.
64 bytes from slaver1 (172.17.0.9): icmp_seq=1 ttl=64 time=1.79 ms
...
[hadoop@master ~]$ ping slaver2
PING slaver2 (172.17.0.10) 56(84) bytes of data.
64 bytes from slaver2 (172.17.0.10): icmp_seq=1 ttl=64 time=1.96 ms
...

Once the nodes can reach each other, the remaining steps are the usual ones: passwordless SSH, formatting the namenode, starting the daemons, and so on.
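
Since container IPs change every time the containers are recreated, the two host-side steps above (regenerating /etc/dnsmasq.hosts and restarting dnsmasq) have to be repeated each time. A small wrapper is handy; the sketch below only bundles the commands already shown, and the script name is made up:

#!/bin/bash
# refresh-dnsmasq-hosts.sh (hypothetical name): rebuild the dnsmasq host list
# from all running containers, then reload dnsmasq so the cluster names resolve.
docker ps -q \
  | xargs -I{} docker inspect -f '{{.NetworkSettings.IPAddress}} {{.Config.Hostname}}' {} \
  > /etc/dnsmasq.hosts
service dnsmasq restart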

Problems Encountered

  • At first I put the configuration files under /root and dnsmasq never worked; moving them to /etc fixed it, though I am not sure why.
  • After starting containers with --dns, if resolution does not work, check /etc/resolv.conf inside the container. If the containers still cannot ping each other, remove the search line from resolv.conf and try again.

A configuration where DNS works correctly:

-bash-4.1# ping slaver
PING slaver (172.17.0.7) 56(84) bytes of data.
64 bytes from slaver (172.17.0.7): icmp_seq=1 ttl=64 time=0.095 ms

-bash-4.1# cat /etc/resolv.conf 
nameserver 172.17.42.1
search localdomain

-bash-4.1# cat /etc/resolv.conf 
nameserver 172.17.42.1

If it still does not work, stop the firewall and restart the Docker service: service iptables stop; service docker restart

To resolve external names as well, you can add other DNS servers:

-bash-4.1# vi /etc/resolv.conf 
nameserver 172.17.42.1
nameserver 8.8.8.8
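
Alternatively, since dnsmasq forwards names it does not know to the servers listed in /etc/resolv.dnsmasq.conf (the resolv-file configured earlier), the public resolver can be added once on the host instead of inside every container. A sketch:

# host side: add an upstream resolver for dnsmasq, then reload it
cat >> /etc/resolv.dnsmasq.conf <<EOF
nameserver 8.8.8.8
EOF
service dnsmasq restart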

Handy Commands

~]# docker run -d --dns 172.17.42.1 --name puppet -h puppet winse/hadoop:2.6.0 /usr/sbin/sshd -D
~]# docker inspect `docker ps -a | grep centos | awk '{print $1}'` | grep IPAddress
~]# docker stop `docker ps -a | grep centos | awk '{print $1}'`


–END

Building and Setting Up a Spark Environment

These notes cover compiling Spark and packaging it into a tarball: building the various versions, packaging with the make-distribution script, and setting up local, standalone, and YARN Spark environments.

  • 2016-01: spark-1.6.0
  • 2015-04: Spark-1.3.0, appended separately at the end, adding spark-sql usage and the Spark-HA configuration

Compiling and Packaging

  • spark-1.6.0
// java version "1.7.0_17" & Apache Maven 3.3.9 & CentOS release 6.6 (Final)
[hadoop@cu2 spark-1.6.0]$ export MAVEN_OPTS="-Xmx3g -XX:MaxPermSize=512M -XX:ReservedCodeCacheSize=512m"
[hadoop@cu2 spark-1.6.0]$ mvn package eclipse:eclipse -Phadoop-2.6 -Dhadoop.version=2.6.3 -Pyarn -Phive -Phive-thriftserver -Dmaven.test.skip=true -Dmaven.javadoc.skip=true -DskipTests

[hadoop@cu2 spark-1.6.0]$ vi make-distribution.sh 
BUILD_COMMAND=("$MVN" package -DskipTests $@)

[hadoop@cu2 spark-1.6.0]$ ./make-distribution.sh --tgz --mvn "$(which mvn)"  -Dhadoop-2.6 -Dhadoop.version=2.6.3 -Pyarn -Phive -Phive-thriftserver -Dmaven.test.skip=true -Dmaven.javadoc.skip=true -DskipTests 
[hadoop@cu2 spark-1.6.0]$ ll spark-1.6.0-bin-2.6.3.tgz 

// examples
[hadoop@cu2 spark-1.6.0-bin-2.6.3]$ export HADOOP_CONF_DIR=~/hadoop-2.6.3/etc/hadoop
[hadoop@cu2 spark-1.6.0-bin-2.6.3]$ bin/spark-submit --class org.apache.spark.examples.SparkPi --master yarn-client lib/spark-examples-1.6.0-hadoop2.6.3.jar 10

[hadoop@cu2 spark-1.6.0-bin-2.6.3]$ export HADOOP_CONF_DIR=~/hadoop-2.6.3/etc/hadoop
[hadoop@cu2 spark-1.6.0-bin-2.6.3]$ export SPARK_PRINT_LAUNCH_COMMAND=true
// export HADOOP_ROOT_LOGGER=DEBUG,console  (the Spark scripts do not honor this variable)
[hadoop@cu2 spark-1.6.0-bin-2.6.3]$ bin/spark-submit --master yarn --deploy-mode client --class org.apache.spark.examples.streaming.HdfsWordCount lib/spark-examples-1.6.0-hadoop2.6.3.jar /data

// --driver-java-options "-Dhadoop.root.logger=WARN,console" 
// --driver-java-options "-Dhadoop.root.logger=WARN,console -agentlib:jdwp=transport=dt_socket,server=y,suspend=y,address=8090"

// org.apache.spark.deploy.yarn.Client#copyFileToRemote
// --conf "spark.yarn.jar=hdfs://hadoop-master2:9000/spark-assembly-1.6.0-hadoop2.6.3.jar"

// http://spark.apache.org/docs/latest/running-on-yarn.html
  • spark-1.5
-- jdk8-x64 & spark-1.5.2 & maven-3.3.9
set or export MAVEN_OPTS=-Xmx2g
mvn package eclipse:eclipse -Phadoop-2.6 -Pyarn -Phive -Phive-thriftserver -Dmaven.test.skip=true -Dmaven.javadoc.skip=true -DskipTests
-- comment out <useZincServer>true</useZincServer> in pom.xml @see http://stackoverflow.com/questions/31844848/building-spark-with-maven-error-finding-javac-but-path-is-correct
-- the office network is flaky; if a Maven download fails, just retry a few times!!
  • spark-1.4.1
[hadoop@cu2 spark-1.4.1]$ export MAVEN_OPTS="-Xmx3g -XX:MaxPermSize=512M -XX:ReservedCodeCacheSize=512m"

[hadoop@cu2 spark-1.4.1]$ mvn package -Phadoop-2.6 -Dhadoop.version=2.7.1 -Pyarn -Phive -Phive-thriftserver -Dmaven.test.skip=true -Dmaven.javadoc.skip=true -DskipTests

-- Packaging:
-- // change the BUILD_COMMAND variable
[hadoop@cu2 spark-1.4.1]$ vi make-distribution.sh 
BUILD_COMMAND=("$MVN"  package -DskipTests $@)

[hadoop@cu2 spark-1.4.1]$ ./make-distribution.sh --mvn `which mvn` --tgz  --skip-java-test   -Phadoop-2.6 -Dhadoop.version=2.7.1 -Pyarn -Phive -Phive-thriftserver -Dmaven.test.skip=true -Dmaven.javadoc.skip=true -DskipTests
  • spark-1.1.0

The official site does not provide a prebuilt package for Hadoop 2.5, so I download the source and compile it myself. Download spark-1.1.0.tgz, unpack it, and run:

export MAVEN_OPTS="-Xmx2g -XX:MaxPermSize=512M -XX:ReservedCodeCacheSize=512m"
mvn -Pyarn -Phadoop-2.4 -Dhadoop.version=2.5.1 -Phive -X -DskipTests clean package

-- mvn package eclipse:eclipse -Phadoop-2.2 -Pyarn -Dmaven.test.skip=true -Dmaven.javadoc.skip=true -DskipTests

Notes: use a 64-bit JDK and set the Maven options above, otherwise OOM errors (or all sorts of strange failures) are likely. The build also takes quite a while, long enough to go have a meal, or you can drop some features (such as Hive) from the build.
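
A quick pre-flight check along those lines (just a sketch; the MAVEN_OPTS values mirror the ones used above):

# confirm the JVM is 64-bit and give Maven enough memory before the long build
java -version 2>&1 | grep -q '64-Bit' || echo "WARNING: not a 64-bit JVM"
mvn -version
export MAVEN_OPTS="-Xmx2g -XX:MaxPermSize=512M -XX:ReservedCodeCacheSize=512m"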

After compiling, the assembly module's target directory contains a single jar bundling Spark and all of its dependencies.

[root@docker scala-2.10]# cd spark-1.1.0/assembly/target/scala-2.10/
[root@docker scala-2.10]# ll -h
total 135M
-rw-r--r--. 1 root root 135M Oct 15 21:18 spark-assembly-1.1.0-hadoop2.5.1.jar

Packaging:

With Spark compiled, the next step is to bundle it into a single archive, using the make-distribution.sh script that ships with the source.

To avoid another very long recompilation, change the Maven options in make-distribution.sh and drop the clean phase (simplest is to comment out the original mvn line and add a new one). The result:

#BUILD_COMMAND="mvn clean package -DskipTests $@"
BUILD_COMMAND="mvn package -DskipTests $@"

Then run:

[root@docker spark-1.1.0]# sh -x make-distribution.sh --tgz  --skip-java-test -Pyarn -Phadoop-2.4 -Dhadoop.version=2.5.1 -Phive 
[root@docker spark-1.1.0]# ll -h
total 185M
...
-rw-r--r--. 1 root root 185M Oct 16 00:09 spark-1.1.0-bin-2.5.1.tgz

The tgz archive ends up in the top-level directory.

Running Locally

Write the machine's IP and hostname into hosts so that logs can be viewed conveniently from the Windows host later:

[root@docker spark-1.1.0-bin-2.5.1]# echo 192.168.154.128 docker >> /etc/hosts
[root@docker spark-1.1.0-bin-2.5.1]# cat /etc/hosts
...
192.168.154.128 docker
  • Run the hello-world example:
[root@docker spark-1.1.0-bin-2.5.1]# bin/run-example SparkPi 10
Spark assembly has been built with Hive, including Datanucleus jars on classpath
...
14/10/16 00:22:36 INFO SparkContext: Job finished: reduce at SparkPi.scala:35, took 2.848632007 s
Pi is roughly 3.139344
14/10/16 00:22:36 INFO SparkUI: Stopped Spark web UI at http://docker:4040
...
  • Interactive shell:
[root@docker spark-1.1.0-bin-2.5.1]# bin/spark-shell --master local[2]
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 1.1.0
      /_/

Using Scala version 2.10.4 (Java HotSpot(TM) 64-Bit Server VM, Java 1.7.0_60)
...
14/10/16 00:25:57 INFO SparkUI: Started SparkUI at http://docker:4040
14/10/16 00:25:57 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
14/10/16 00:25:58 INFO Executor: Using REPL class URI: http://192.168.154.128:39385
14/10/16 00:25:58 INFO AkkaUtils: Connecting to HeartbeatReceiver: akka.tcp://sparkDriver@docker:57417/user/HeartbeatReceiver
14/10/16 00:25:58 INFO SparkILoop: Created spark context..
Spark context available as sc.

scala> 

A note on my environment: I develop on Windows and test on a Linux VM. An SSH tunnel gives Windows seamless access to the Linux guest (web pages can be viewed through a SOCKS5 proxy in the browser).
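
For reference, the tunnel itself is a one-liner (a sketch, using the VM address from above; the local port 1080 is an arbitrary choice):

# open a SOCKS5 tunnel from Windows to the VM, then point the browser's
# SOCKS proxy at localhost:1080 and open http://docker:4040
ssh -D 1080 root@192.168.154.128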

With the interactive shell running, the browser can open port 4040 to watch the Spark application's status.

With the application up, we can start working with data:

scala> val textFile=sc.textFile("README.md")
textFile: org.apache.spark.rdd.RDD[String] = README.md MappedRDD[1] at textFile at <console>:12

scala> textFile.count()
res0: Long = 141

scala> textFile.first()
res1: String = # Apache Spark

scala> val linesWithSpark = textFile.filter(line=>line.contains("Spark"))
linesWithSpark: org.apache.spark.rdd.RDD[String] = FilteredRDD[2] at filter at <console>:14

scala> textFile.filter(line=>line.contains("Spark")).count()
res2: Long = 21

scala> textFile.map(_.split(" ").size).reduce((a,b) => if(a>b) a else b)
res3: Int = 15

scala> import java.lang.Math
import java.lang.Math

scala> textFile.map(_.split(" ").size).reduce((a,b)=>Math.max(a,b))
res4: Int = 15

scala> val wordCounts = textFile.flatMap(_.split(" ")).map((_, 1)).reduceByKey(_+_)
wordCounts: org.apache.spark.rdd.RDD[(String, Int)] = ShuffledRDD[8] at reduceByKey at <console>:15

scala> wordCounts.collect()
res5: Array[(String, Int)] = Array((means,1), (under,2), (this,4), (Because,1), (Python,2), (agree,1), (cluster.,1), (its,1), (follows.,1), (general,2), (have,2), (YARN,,3), (pre-built,1), (locally.,1), (locally,2), (changed,1), (MRv1,,1), (several,1), (only,1), (sc.parallelize(1,1), (This,2), (learning,,1), (basic,1), (requests,1), (first,1), (Configuration,1), (MapReduce,2), (CLI,1), (graph,1), (without,1), (documentation,1), ("yarn-client",1), ([params]`.,1), (any,2), (setting,2), (application,1), (prefer,1), (SparkPi,2), (engine,1), (version,3), (file,1), (documentation,,1), (<http://spark.apache.org/>,1), (MASTER,1), (entry,1), (example,3), (are,2), (systems.,1), (params,1), (scala>,1), (provides,1), (refer,1), (MLLib,1), (Interactive,2), (artifact,1), (configure,1), (can,8), (<art...

After running the operations above, the state changes show up in the web UI.

Spark Standalone Cluster

Deploying a cluster needs several servers, so I use Docker here.

This post should have been finished long ago, but setting up the docker-hadoop cluster took a lot of time. Using dnsmasq to handle hostnames in the cluster is covered in the next post; the final setup is in docker-hadoop.

[root@docker docker-hadoop]# docker run -d  --dns 172.17.42.1 --name slaver1 -h slaver1 spark-yarn
[root@docker docker-hadoop]# docker run -d  --dns 172.17.42.1 --name slaver2 -h slaver2 spark-yarn
[root@docker docker-hadoop]# docker run -d  --dns 172.17.42.1 --name master -h master spark-yarn

[root@docker docker-hadoop]# docker ps | grep spark | awk '{print $1}' | xargs -I{} docker inspect -f '{{.NetworkSettings.IPAddress}} {{.Config.Hostname}}' {} > /etc/dnsmasq.hosts
[root@docker docker-hadoop]# cat /etc/dnsmasq.hosts 
172.17.0.29 master
172.17.0.28 slaver2
172.17.0.27 slaver1
[root@docker docker-hadoop]# service dnsmasq restart
[root@docker docker-hadoop]# ssh hadoop@master

[hadoop@master ~]$ ssh-copy-id master
[hadoop@master ~]$ ssh-copy-id localhost
[hadoop@master ~]$ ssh-copy-id slaver1
[hadoop@master ~]$ ssh-copy-id slaver2
[hadoop@master spark-1.1.0-bin-2.5.1]$ sbin/start-all.sh 
[hadoop@master spark-1.1.0-bin-2.5.1]$ /opt/jdk1.7.0_67/bin/jps  -m
266 Jps -m
132 Master --ip master --port 7077 --webui-port 8080

The web UI shows the cluster state:

Run a job that connects to the master:

[hadoop@master spark-1.1.0-bin-2.5.1]$ bin/spark-shell --master spark://master:7077
...
14/10/17 11:31:08 INFO BlockManagerMasterActor: Registering block manager slaver2:55473 with 265.4 MB RAM
14/10/17 11:31:09 INFO BlockManagerMasterActor: Registering block manager slaver1:33441 with 265.4 MB RAM

scala> 

As the output shows, the shell has connected to the Spark cluster: master acts as the driver, and slaver1 and slaver2 are the worker nodes. Run a small job and watch its progress in the web UI.

scala> val textFile=sc.textFile("README.md")
scala> textFile.count()
scala> textFile.map(_.split(" ").size).reduce((a,b) => if(a>b) a else b)

Once the system is installed, starting a Spark standalone cluster is much like hadoop-yarn: set up SSH and Java, start the daemons, and follow the job metrics in real time on the 8080/4040 web UIs.
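
For completeness, the standalone configuration amounts to very little (a sketch, using the hostnames and JDK path from above):

# conf/slaves lists the worker hosts; sbin/start-all.sh starts the Master here
# and a Worker on each listed host over SSH
cat > conf/slaves <<EOF
slaver1
slaver2
EOF
echo 'JAVA_HOME=/opt/jdk1.7.0_67' >> conf/spark-env.sh
sbin/start-all.sh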

YARN Cluster

Note: if you followed the steps above, stop the Spark standalone cluster first. Its port 8080 clashes with the YARN web UI port and YARN will fail to start.

Edit spark-env.sh and add HADOOP_CONF_DIR, then simply submit jobs to YARN.

[hadoop@master spark-1.1.0-bin-2.5.1]$ cat conf/spark-env.sh
#!/usr/bin/env bash

JAVA_HOME=/opt/jdk1.7.0_67 

HADOOP_CONF_DIR=/opt/hadoop-2.5.1/etc/hadoop

[hadoop@master spark-1.1.0-bin-2.5.1]$ bin/spark-submit --class org.apache.spark.examples.SparkPi --master yarn-cluster lib/spark-examples-1.1.0-hadoop2.5.1.jar  10

In yarn-cluster mode the result is printed on the driver node (slaver2 here), which is not very convenient to read. Spark on YARN offers another mode, yarn-client, where the driver runs locally:

[hadoop@master spark-1.1.0-bin-2.5.1]$ bin/spark-submit --class org.apache.spark.examples.SparkPi --master yarn-client lib/spark-examples-1.1.0-hadoop2.5.1.jar  10
...
14/10/17 13:31:02 INFO scheduler.TaskSetManager: Finished task 0.0 in stage 0.0 (TID 0) in 8248 ms on slaver1 (1/10)
14/10/17 13:31:02 INFO scheduler.TaskSetManager: Starting task 2.0 in stage 0.0 (TID 2, slaver1, PROCESS_LOCAL, 1228 bytes)
14/10/17 13:31:02 INFO scheduler.TaskSetManager: Finished task 1.0 in stage 0.0 (TID 1) in 231 ms on slaver1 (2/10)
14/10/17 13:31:02 INFO scheduler.TaskSetManager: Starting task 3.0 in stage 0.0 (TID 3, slaver1, PROCESS_LOCAL, 1228 bytes)
14/10/17 13:31:02 INFO scheduler.TaskSetManager: Finished task 2.0 in stage 0.0 (TID 2) in 158 ms on slaver1 (3/10)
14/10/17 13:31:02 INFO scheduler.TaskSetManager: Starting task 4.0 in stage 0.0 (TID 4, slaver1, PROCESS_LOCAL, 1228 bytes)
14/10/17 13:31:03 INFO scheduler.TaskSetManager: Finished task 3.0 in stage 0.0 (TID 3) in 284 ms on slaver1 (4/10)
14/10/17 13:31:03 INFO scheduler.TaskSetManager: Starting task 5.0 in stage 0.0 (TID 5, slaver1, PROCESS_LOCAL, 1228 bytes)
14/10/17 13:31:03 INFO scheduler.TaskSetManager: Finished task 4.0 in stage 0.0 (TID 4) in 175 ms on slaver1 (5/10)
14/10/17 13:31:03 INFO scheduler.TaskSetManager: Starting task 6.0 in stage 0.0 (TID 6, slaver1, PROCESS_LOCAL, 1228 bytes)
14/10/17 13:31:03 INFO scheduler.TaskSetManager: Finished task 5.0 in stage 0.0 (TID 5) in 301 ms on slaver1 (6/10)
14/10/17 13:31:03 INFO scheduler.TaskSetManager: Starting task 7.0 in stage 0.0 (TID 7, slaver1, PROCESS_LOCAL, 1228 bytes)
14/10/17 13:31:03 INFO scheduler.TaskSetManager: Finished task 6.0 in stage 0.0 (TID 6) in 175 ms on slaver1 (7/10)
14/10/17 13:31:03 INFO scheduler.TaskSetManager: Starting task 8.0 in stage 0.0 (TID 8, slaver1, PROCESS_LOCAL, 1228 bytes)
14/10/17 13:31:03 INFO scheduler.TaskSetManager: Finished task 7.0 in stage 0.0 (TID 7) in 143 ms on slaver1 (8/10)
14/10/17 13:31:03 INFO scheduler.TaskSetManager: Finished task 8.0 in stage 0.0 (TID 8) in 164 ms on slaver1 (9/10)
14/10/17 13:31:03 INFO scheduler.TaskSetManager: Starting task 9.0 in stage 0.0 (TID 9, slaver1, PROCESS_LOCAL, 1228 bytes)
14/10/17 13:31:03 INFO cluster.YarnClientSchedulerBackend: Registered executor: Actor[akka.tcp://sparkExecutor@slaver2:51923/user/Executor#1132577949] with ID 1
14/10/17 13:31:04 INFO util.RackResolver: Resolved slaver2 to /default-rack
14/10/17 13:31:04 INFO scheduler.TaskSetManager: Finished task 9.0 in stage 0.0 (TID 9) in 397 ms on slaver1 (10/10)
14/10/17 13:31:04 INFO cluster.YarnClientClusterScheduler: Removed TaskSet 0.0, whose tasks have all completed, from pool 
14/10/17 13:31:04 INFO scheduler.DAGScheduler: Stage 0 (reduce at SparkPi.scala:35) finished in 26.084 s
14/10/17 13:31:04 INFO spark.SparkContext: Job finished: reduce at SparkPi.scala:35, took 28.31400558 s
Pi is roughly 3.140248

When the Thrift server runs on YARN, the executor memory is capped by the container maximum, so yarn-site.xml needs to be adjusted:

cat yarn-site.xml 
<property>
  <name>yarn.nodemanager.resource.memory-mb</name>
  <value>32000</value>
</property>

<property>
  <name>yarn.nodemanager.resource.memory-mb</name>
  <value>32768</value>
</property>

<property>
  <name>yarn.scheduler.minimum-allocation-mb</name>
  <value>2048</value>
</property>

<property>
  <name>yarn.scheduler.maximum-allocation-mb</name>
  <value>32768</value>
</property>

./sbin/start-thriftserver.sh --executor-memory 29g --master yarn-client

Do not set executor-memory equal to that maximum, or you get an error:

Exception in thread "main" java.lang.IllegalArgumentException: Required executor memory (30720+2150 MB) is above the max threshold (32768 MB) of this cluster!
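
The arithmetic in the message is easy to check: this Spark version adds an executor memory overhead of roughly 7% (an assumption inferred from the figures themselves), so the 30720 MB requested plus about 0.07 × 30720 ≈ 2150 MB of overhead comes to 32870 MB, just above the 32768 MB container maximum. Leaving a couple of GB of headroom below yarn.scheduler.maximum-allocation-mb avoids the error.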

Summary

This post covered setting up Spark in three ways: running locally, a Spark standalone cluster in Docker, and a YARN cluster. Local mode is the simplest but does not simulate a cluster environment; Spark also runs on the YARN framework, where you just submit jobs to YARN; the standalone cluster is relatively simple and convenient, and the remote-debugging work that follows will mainly use the pseudo-distributed standalone cluster.


Postscript: Spark-1.3.0

Building 1.3.0 (Cygwin)

The production environment runs hadoop-2.2 and is not a development machine, so it has no Maven or similar tools. I build locally first and then copy the result over. (Since this only adds a compute tool, we can afford to be a little casual about it.)

export MAVEN_OPTS="-Xmx3g -XX:MaxPermSize=512M -XX:ReservedCodeCacheSize=512m"
mvn package eclipse:eclipse -Phadoop-2.2 -Pyarn -Phive -Phive-thriftserver -Dmaven.test.skip=true -Dmaven.javadoc.skip=true -DskipTests

-- // strip the including attribute from the generated Eclipse .classpath files
find . -name ".classpath" | xargs -I{} sed -i 's/ including="\*\*\/\*\.java"//' {}

dos2unix make-distribution.sh
./make-distribution.sh --mvn `which mvn` --tgz  --skip-java-test -Phadoop-2.2 -Pyarn -Dmaven.test.skip=true -Dmaven.javadoc.skip=true -DskipTests

-- deployment on Linux
-- // in this version, shell scripts produced by the Windows/Cygwin build end up with **Windows line endings**!! Watch out for this!
[esw@bigdatamgr1 spark-1.3.0-bin-2.2.0]$ find bin/* -perm /u+x | xargs -I{} sed -i 's/^M//g' {} 
[esw@bigdatamgr1 spark-1.3.0-bin-2.2.0]$ find sbin/* -perm /u+x | xargs -I{} sed -i 's/^M//g' {} 

Running spark-sql on Spark 1.3.0

  1. Connect to the Hive engine

  2. Depend on Tez

With Hive's hive.execution.engine set to tez, add the Tez jars and hive-site.xml to the CLASSPATH.

Importing the jars and configuration (with the metastore service this is much less trouble):

[esw@bigdatamgr1 spark-1.3.0-bin-2.2.0]$ vi conf/spark-env.sh 
...
JAVA_HOME=/home/esw/jdk1.7.0_60

# log4j

__add_to_classpath() {

  root=$1

  if [ -d "$root" ] ; then
    for f in `ls $root/*.jar | grep -v -E '/hive.*.jar'`  ; do
      if [ -n "$SPARK_DIST_CLASSPATH" ] ; then
        export SPARK_DIST_CLASSPATH=$SPARK_DIST_CLASSPATH:$f
      else
        export SPARK_DIST_CLASSPATH=$f
      fi
    done
  fi

}

__add_to_classpath "/home/esw/tez-0.4.0-incubating"
__add_to_classpath "/home/esw/tez-0.4.0-incubating/lib"
__add_to_classpath "/home/esw/apache-hive-0.13.1/lib"

export HADOOP_CONF_DIR=/data/opt/ibm/biginsights/hadoop-2.2.0/etc/hadoop
export SPARK_CLASSPATH=/home/esw/spark-1.3.0-bin-2.2.0/conf:$HADOOP_CONF_DIR

Do not add all of the Hive jars wholesale: some jars differ between hive-0.13.1a and hive-0.13.1!!

  java.lang.NoSuchMethodException: org.apache.hadoop.hive.ql.exec.Utilities.deserializeObjectByKryo(com.esotericsoftware.kryo.Kryo, java.io.InputStream, java.lang.Class)

  private static java.lang.Object org.apache.hadoop.hive.ql.exec.Utilities.deserializeObjectByKryo(org.apache.hive.com.esotericsoftware.kryo.Kryo,java.io.InputStream,java.lang.Class)

* If you do not depend on Tez, just copy the three datanucleus jars into the lib directory.

[esw@bigdatamgr1 spark-1.3.0-bin-2.2.0]$ ll lib
total 262364
-rw-rw-r-- 1 hadoop hadoop    339666 Mar 25 19:35 datanucleus-api-jdo-3.2.6.jar
-rw-rw-r-- 1 hadoop hadoop   1890075 Mar 25 19:35 datanucleus-core-3.2.10.jar
-rw-rw-r-- 1 hadoop hadoop   1809447 Mar 25 19:35 datanucleus-rdbms-3.2.9.jar
-rwxr-xr-x 1 hadoop hadoop   4136686 Mar 31 13:05 spark-1.3.0-yarn-shuffle.jar
-rwxr-xr-x 1 hadoop hadoop 154198768 Mar 31 13:05 spark-assembly-1.3.0-hadoop2.2.0.jar
-rwxr-xr-x 1 hadoop hadoop 106275583 Mar 31 13:05 spark-examples-1.3.0-hadoop2.2.0.jar
  
[esw@bigdatamgr1 conf]$ ll
...
lrwxrwxrwx 1 esw biadmin   50 Mar 31 13:26 hive-site.xml -> /home/esw/apache-hive-0.13.1/conf/hive-site.xml
-rw-r--r-- 1 esw biadmin  632 Mar 31 15:12 log4j.properties
lrwxrwxrwx 1 esw biadmin   44 Mar 31 10:20 slaves -> /data/opt/ibm/biginsights/hadoop-conf/slaves
-rwxr-xr-x 1 esw biadmin 3380 Mar 31 16:17 spark-env.sh
lrwxrwxrwx 1 esw biadmin   62 Mar 31 16:17 tez-site.xml -> /data/opt/ibm/biginsights/hadoop-2.2.0/etc/hadoop/tez-site.xml

The setup above uses hive-site.xml to connect to the metastore database directly. Alternatively, start a hive metastore server and have Spark connect to it:

# start the metastore service
nohup bin/hive --service metastore > metastore.log 2>&1 &

# hive client configuration
vi hive-site.xml
<property>
  <name>hive.metastore.uris</name>
  <value>thrift://DataNode2:9083</value>
  <description>Thrift uri for the remote metastore. Used by metastore client to connect to remote metastore.</description>
</property>
  1. Run it:
[esw@bigdatamgr1 spark-1.3.0-bin-2.2.0]$  bin/spark-sql 2>sql.log
SET spark.sql.hive.version=0.13.1
spark-sql> show databases;
default
neva2dta
spark-sql> show tables;
pokes   false
t_neva2_dps_xdr false
t_neva2_ipdr_xdr        false
spark-sql> select count(*) from pokes;
500
spark-sql> 

[esw@bigdatamgr1 conf]$ vi spark-env.sh 
#!/usr/bin/env bash

JAVA_HOME=/home/esw/jdk1.7.0_60
SPARK_CLASSPATH='/home/esw/apache-hive-0.13.1/lib/*:/home/esw/tez-0.4.0-incubating/*:/home/esw/tez-0.4.0-incubating/lib/*'

# sync to the other nodes
[esw@bigdatamgr1 ~]$ for h in `cat ~/spark-1.3.0-bin-2.2.0/conf/slaves` ; do rsync -vaz /data/opt/ibm/biginsights/hadoop-2.2.0 $h:/data/opt/ibm/biginsights/  ; done

Running the HiveServer (Thrift) Service

[esw@bigdatamgr1 spark-1.3.0-bin-2.2.0]$ cat start_thrift.sh 
#!/bin/bash
# the hive classpath has already been added in spark-env.sh

./sbin/start-thriftserver.sh --master spark://bigdatamgr1:7077 --executor-memory 16g
[esw@bigdatamgr1 spark-1.3.0-bin-2.2.0]$ ./start_thrift.sh 

[esw@bigdatamgr1 spark-1.3.0-bin-2.2.0]$ bin/beeline -u jdbc:hive2://bigdatamgr1:10001 -n esw -p '' 

When no external jars are involved, Spark's startup scripts work fine; but once we add a lot of dependency jars this way, things break. The Thrift server itself starts correctly, yet the shell keeps printing an error:

failed to launch org.apache.spark.sql.hive.thriftserver.HiveThriftServer2:
  ========================================
  
full log in /home/esw/spark-1.3.0-bin-2.2.0/sbin/../logs/spark-esw-org.apache.spark.sql.hive.thriftserver.HiveThriftServer2-1-bigdatamgr1.out

The cause is subtle. In sbin/spark-daemon.sh, after launching the process the script runs if [[ ! $(ps -p "$newpid" -o args=) =~ $command ]]; then (where =~ is a regex match; spark-class ultimately invokes java with the classpath on its command line). With our very long classpath, that match fails!!

[hadoop@bigdatamgr1 spark-1.3.0-bin-2.2.0]$ vi bin/spark-class
...
  exec "$RUNNER" -cp "$CLASSPATH" $JAVA_OPTS "$@"
fi

-- the value when the match fails
[esw@bigdatamgr1 spark-1.3.0-bin-2.2.0]$ ps -p 1925344 -o args=
/home/esw/jdk1.7.0_60/bin/java -cp :/home/esw/spark-1.3.0-bin-2.2.0/sbin/../conf:/home/esw/spark-1.3.0-bin-2.2.0/lib/spark-assembly-1.3.0-hadoop2.2.0.jar:/home/esw/spark

The Fix

First, a quick experiment:

[dpi@dacs tmp]$ java -cp ~/kettle/data-integration/lib/mysql-connector-java-5.1.31-bin.jar:. JDBCConnTest

[dpi@dacs tmp]$ echo $CLASSPATH
.
[dpi@dacs tmp]$ export CLASSPATH=~/kettle/data-integration/lib/mysql-connector-java-5.1.31-bin.jar
[dpi@dacs tmp]$ java JDBCConnTest
Error: Could not find or load main class JDBCConnTest
[dpi@dacs tmp]$ java -cp . JDBCConnTest
java.lang.ClassNotFoundException: com.mysql.jdbc.Driver

[dpi@dacs tmp]$ echo $CLASSPATH
/home/dpi/kettle/data-integration/lib/mysql-connector-java-5.1.31-bin.jar
[dpi@dacs tmp]$ export CLASSPATH=~/kettle/data-integration/lib/mysql-connector-java-5.1.31-bin.jar:.
[dpi@dacs tmp]$ java JDBCConnTest

Passing -cp overrides the CLASSPATH environment variable. So the fix is simply to drop the -cp argument and export CLASSPATH instead; the java launcher picks the environment variable up on its own.

  export CLASSPATH
  exec "$RUNNER" $JAVA_OPTS "$@"

The result:

++ ps -p 1932338 -o args=
+ [[ ! /home/esw/jdk1.7.0_60/bin/java -XX:MaxPermSize=128m -Xms512m -Xmx512m org.apache.spark.deploy.SparkSubmit --class org.apache.spark.sql.hive.thriftserver.HiveThriftServer2 --executor-memory 48g spark-internal =~ org.apache.spark.sql.hive.thriftserver.HiveThriftServer2 ]]

Spark-HA

Only configuration changes are needed; then restart the Spark cluster.

[esw@bigdata8 spark-1.3.0-bin-2.2.0]$ cat conf/spark-env.sh
...
SPARK_DAEMON_JAVA_OPTS="-Dspark.deploy.recoveryMode=ZOOKEEPER -Dspark.deploy.zookeeper.url=bi-00-01.bi.domain.com:2181 -Dspark.deploy.zookeeper.dir=/spark"

[esw@bigdatamgr1 conf]$ vi spark-defaults.conf 
spark.master                     spark://bigdatamgr1:7077,bigdata8:7077
...

Each master has to be started separately:

[esw@bigdatamgr1 spark-1.3.0-bin-2.2.0]$ sbin/start-all.sh 
[esw@bigdata8 spark-1.3.0-bin-2.2.0]$ sbin/start-master.sh 

Looking at http://bigdata8:8080/, its state is STANDBY and the Workers list is empty.

[esw@bigdatamgr1 spark-1.3.0-bin-2.2.0]$ sbin/stop-master.sh 

After stopping bigdatamgr1, refresh bigdata8:8080; within about a minute it becomes ALIVE, and all the other nodes reconnect to bigdata8.
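
A quick way to see which master is currently active, assuming the master web UIs print the state as ALIVE/STANDBY text as described above:

# poll both masters' web UIs and print the first state keyword on each page
for m in bigdatamgr1 bigdata8; do
  echo -n "$m: "
  curl -s http://$m:8080/ | grep -oE 'ALIVE|STANDBY' | head -1
done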

–END

[Reading the Code] Spark 1.1.0, Part 0: Counting the Code and Importing into Eclipse

After watching the Asia-Pacific Research Institute's online Spark lectures, which claim the Spark 1.0 source is only 30K+ lines, I got the itch. First let's actually measure the source and get an estimate; then set up an Eclipse environment for reading the code.

Counting Source Lines

winse@Lenovo-PC ~/git/spark
$ git branch -v
* (detached from v1.1.0) 2f9b2bd [maven-release-plugin] prepare release v1.1.0-rc4
  master                 4d8ae70 [behind 1246] Cleanup on Connection and ConnectionManager

winse@Lenovo-PC ~/git/spark
$ find . -name "*.scala" | grep 'src/main' | xargs sed  -e 's:\/\*.*\*\/::' -e  '/\/\*/, /\*\//{
/\/\*/{
 s:\/\*.*::p
}
/\*\//{
 s:.*\*\/::p
}
d
}' | sed -e '/^\s*$/d' -e '/^\s*\/\//d' | grep -v '^import' | grep -v '^package' | wc -l
72967

winse@Lenovo-PC ~/git/spark
$ ^scala^java
1749

winse@Lenovo-PC ~/git/spark
$ ^src/main^core/src/main
877

winse@Lenovo-PC ~/git/spark
$ ^java^scala
38526

Excluding tests, the whole source tree comes to roughly 70K lines, and the core module alone is about 40K. Volume-wise that is fairly encouraging: learn Scala, then read the Spark source.

Spark 1.0.0's core was around 30K lines; 1.1 added roughly another 10K!!
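
If cloc happens to be available, it gives a quick cross-check of the hand-rolled sed pipeline above (cloc is not part of the original setup, so treat this as an aside):

# counts blank, comment and code lines separately, Scala only
cloc --include-lang=Scala core/src/main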

Docker

While looking through the directory layout I was surprised to find a docker directory in the Spark tree. Its Dockerfile just installs Scala, maps the host's Spark into the container, and runs a Spark master/worker cluster.

Importing into Eclipse

Spark is written mostly in Scala. First download scala-ide, the 2.10 build (it is Eclipse-based, so most operations are familiar); then check out the v1.1.0 tag of the Spark source; then generate the Eclipse project files with Maven.

(Not recommended) Generating the project files with sbt: some dependency jars end up missing and it is a pain to sort out; I still have not figured out which ones.

$ cd sbt/
$ sed -i 's/^M//g' *
$ cd ..
$ sbt/sbt eclipse -mem 512

(Recommended) Generate them with Maven; see the official Building Spark with Maven documentation.

winse@Lenovo-PC ~/git/spark
$ git clean -x -fd # remove everything not tracked by the repository

$ echo $SCALA_HOME # make sure SCALA_HOME points at the Scala installation
/cygdrive/d/scala

# I edit the default value directly; in theory adding -Phadoop-2.2 should work too
$ vi pom.xml # hadoop.version 2.2.0
$ mvn eclipse:eclipse

$ find . -name ".classpath" | xargs sed -i -e 's/including="\*\*\/\*.java"//' -e 's/excluding="\*\*\/\*.java"//'

# the add-Scala-nature / add-source-folder steps could also be batched with a script

Then import into Eclipse and fix the remaining errors case by case:

  • First add the Scala nature to every project.
  • Remove the Python source folders (just delete the corresponding classpathentry from .classpath by hand).
  • Check that src/test/scala is added as a source folder.

Note: since there are many Scala source files and compilation is slow, turn off Project -> Build Automatically before doing the above, fix all the problems in one pass, and only then build manually.

  • Import yarn/stable manually via Existing Maven Projects, bring yarn/common in as a linked folder, and add it as a source folder.

There is also a value q is not a member of StringContext error related to quasiquotes: some classes only compile on Scala 2.10 with an extra compiler plugin, so adjust the Scala compiler preferences.

With that compiler plugin added, the whole project compiles cleanly, and we can start debugging and reading the source.

Note: after a clean, no classes were regenerated under target until the -Xshow-phases flag was removed.

-Xshow-phases Print a synopsis of compiler phases.

Building Spark with Maven

If the official site has no prebuilt assembly for your Hadoop version, you can build one yourself with Maven. Packaging is covered in the next post.

$ export MAVEN_OPTS="-Xmx2g -XX:MaxPermSize=512M -XX:ReservedCodeCacheSize=512m"
$ mvn -Pyarn -Phadoop-2.2 -Dhadoop.version=2.2.0 -DskipTests clean package

The yarn profile produces an executable jar containing Spark and all of its dependencies; details in the next post.

Wrap-up

This was written on and off over two days. The line counting alone took most of a day, mainly because of handling multi-line comments; most of the rest went into sbt and Maven generating (and fixing) the Eclipse project files. Once the amount of Scala grows, compilation really is painfully slow, whether through Maven or Eclipse.

The next post sets up a Spark environment with Docker and attaches a remote debugger to a hello-world program.


–END

Reflections

As I get older, many questions I never used to think about have slowly begun to close in around me, forcing me to reflect and to change.

I am rather extreme in temperament and very introverted, so toward things I do not care about or that have no direct connection to me I rarely show any initiative, always acting the conservative. At the same time I feel good about my ability to learn, I am never satisfied with the status quo, and I lack the stamina to stick with any one thing for long (it is not a matter of being unable to endure hardship)!

From birth to graduation there were always family and friends to lean on, and clear targets worth challenging and surpassing (my overall level was average, with crowds of people ahead of me). Since starting work I have been lost, not knowing what I am capable of or what I could do; I studied at a normal university and did not even get a teaching certificate (not that I regret it, I just feel it should not have happened)!! Looking back, I was simply too stubborn, exactly like the bull that will not turn even after running into the wall!!

As age catches up with my strength, I think about getting some exercise, but it always dies out amid excuses: my body is fine, I will do it later... As my work experience grows, the senior colleague who used to guide me directly is no longer around, and I thrash about aimlessly, a hammer blow here and a stick swing there, in the end picking up sesame seeds while dropping the watermelon, telling myself I learn fast and there will always be time later...
But on the court, the classmates who kept at their sport can play badminton for three or four hours without getting winded, and only then does the regret set in. And when former teammates start making their mark in the field, jealousy and envy begin to gnaw at me.

Week after week begins and ends, yet I have not gotten the tempering or made the progress I should have; with the tide moving on, I am still dawdling! Reading a single class of a thousand lines of source somehow took me, on and off, nearly two months! The plan from before the new year to read the Tomcat source has sunk without a trace!

I am never hard enough on myself; and after the one time I am, the excuses pile up and I cannot keep it going!

–END

Configuring SSH Login to a Docker CentOS Container

The previous post covered Docker basics without any hands-on practice. This one records how to log into a CentOS container over SSH.

The blog post I referenced earlier walks through SSHing into the tutorial container (Ubuntu), installing Tomcat, and accessing it from the client machine through a port mapping.

First Attempt

docker pull learn/tutorial
docker run -i -t learn/tutorial /bin/bash
  apt-get update
  apt-get install openssh-server
  which sshd
  /usr/sbin/sshd
  mkdir /var/run/sshd
  passwd # set the user password; I use 123456 here so the SSH client can log in easily
  exit # leave the container
docker ps -l
docker commit 51774a81beb3 learn/tutorial # after committing, later runs start from the modified container state
docker run -d -p 49154:22 -p 80:8080 learn/tutorial /usr/sbin/sshd -D
ssh root@127.0.0.1 -p 49154
  # install Oracle JDK 7 on Ubuntu 12.04
  apt-get install python-software-properties
  add-apt-repository ppa:webupd8team/java
  apt-get update
  apt-get install -y wget
  apt-get install oracle-java7-installer
  java -version
  # download Tomcat 7.0.47
  wget http://mirror.bit.edu.cn/apache/tomcat/tomcat-7/v7.0.47/bin/apache-tomcat-7.0.47.tar.gz
  # unpack and run
  tar xvf apache-tomcat-7.0.47.tar.gz
  cd apache-tomcat-7.0.47
  bin/startup.sh

On CentOS, however, these steps do not work. Here is what does:

[root@docker ~]# docker pull centos:centos6
[root@docker ~]# docker run -i -t  centos:centos6 /bin/bash
  yum install which openssh-server openssh-clients

  /usr/sbin/sshd # this fails at first; the host keys must be generated manually
  ssh-keygen -f /etc/ssh/ssh_host_rsa_key
  ssh-keygen -t dsa -f /etc/ssh/ssh_host_dsa_key

  vi /etc/pam.d/sshd  # change pam_loginuid.so to optional
  # /bin/sed -i 's/.*session.*required.*pam_loginuid.so.*/session optional pam_loginuid.so/g' /etc/pam.d/sshd
  
  passwd # set the root password
  
  rm -rf /etc/localtime
  ln -s /usr/share/zoneinfo/Asia/Shanghai /etc/localtime
  cat > /etc/sysconfig/clock  <<EOF
  ZONE="Asia/Shanghai"
  UTC=True
  EOF
  • Commit and save the result
[root@docker ~]# docker ps -l
[root@docker ~]# docker commit 3a7b6994bb2a winse/hadoop # 保存为自己使用的版本

[root@docker ~]# docker run -d winse/hadoop /usr/sbin/sshd
f5cb57f6ec22dd9d257bf610322e2bd547ea0064262fcad63308b932c0490670
[root@docker ~]# docker ps -l
CONTAINER ID        IMAGE                 COMMAND             CREATED             STATUS                     PORTS               NAMES
f5cb57f6ec22        winse/hadoop:latest   /usr/sbin/sshd      2 seconds ago       Exited (0) 2 seconds ago                       sharp_rosalind      

[root@docker ~]# docker run -d -p 8888:22 winse/hadoop /usr/sbin/sshd -D
f9814253159373e8a8df3261904200a733b41c63f55708db3cb56a7ebf650cef
[root@docker ~]# docker ps -l
CONTAINER ID        IMAGE                 COMMAND             CREATED             STATUS              PORTS                  NAMES
f98142531593        winse/hadoop:latest   /usr/sbin/sshd -D   5 seconds ago       Up 4 seconds        0.0.0.0:8888->22/tcp   boring_bell         
[root@docker ~]# ssh localhost -p 8888
The authenticity of host '[localhost]:8888 ([::1]:8888)' can't be established.
RSA key fingerprint is f5:5e:be:ae:ea:b1:ed:e8:49:43:28:9e:80:87:0d:86.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added '[localhost]:8888' (RSA) to the list of known hosts.
root@localhost's password: 
Last login: Mon Sep 29 14:48:23 2014 from localhost
-bash-4.1# 

The -D flag keeps sshd in the foreground, so the container always has a running process and does not shut down as soon as the given command finishes.

Configuring SSH login on CentOS needs these extra tweaks, which is rather fiddly. As for changing pam_loginuid.so to optional in /etc/pam.d/sshd, the answer on Stack Overflow explains it well.

Once connected over SSH, working with the container is just like operating a remote server. In fact, every container Docker starts gets its own IP, and you can connect to that IP directly.

[root@docker ~]# docker run -t -i winse/hadoop /bin/bash
bash-4.1# ssh localhost
ssh: connect to host localhost port 22: Connection refused
bash-4.1# service sshd start
Starting sshd:                                             [  OK  ]
bash-4.1# ifconfig
eth0      Link encap:Ethernet  HWaddr 1E:2B:23:16:98:7E  
          inet addr:172.17.0.31  Bcast:0.0.0.0  Mask:255.255.0.0
          inet6 addr: fe80::1c2b:23ff:fe16:987e/64 Scope:Link

# 新开一个终端
[root@docker ~]# ssh 172.17.0.31
The authenticity of host '172.17.0.31 (172.17.0.31)' can't be established.
RSA key fingerprint is f5:5e:be:ae:ea:b1:ed:e8:49:43:28:9e:80:87:0d:86.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added '172.17.0.31' (RSA) to the list of known hosts.
root@172.17.0.31's password: 
Last login: Mon Sep 29 14:48:23 2014 from localhost
-bash-4.1#           
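
Instead of running ifconfig inside the container, the host can also ask the Docker daemon for the address directly (the same inspect format used in the dnsmasq post at the top of this blog):

# print the IP of the most recently started container
docker inspect -f '{{.NetworkSettings.IPAddress}}' $(docker ps -lq)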

Installing via a Dockerfile

[root@docker ~]# mkdir hadoop
[root@docker ~]# cd hadoop/
[root@docker hadoop]# touch Dockerfile
[root@docker hadoop]# vi Dockerfile
  # hadoop2 on docker-centos
  FROM centos:centos6
  MAINTAINER Winse <fuqiuliu2006@qq.com>
  RUN yum install -y which openssh-clients openssh-server # -y answers yes to all prompts

  RUN ssh-keygen -f /etc/ssh/ssh_host_rsa_key
  RUN ssh-keygen -t dsa -f /etc/ssh/ssh_host_dsa_key

  RUN echo 'root:hadoop' |chpasswd # echo password | passwd --stdin root

  RUN sed -i '/pam_loginuid.so/c session    optional     pam_loginuid.so'  /etc/pam.d/sshd

  EXPOSE 22
  CMD /usr/sbin/sshd -D
  
[root@docker hadoop]# docker build -t="winse/hadoop" .

[root@docker hadoop]# docker images
REPOSITORY          TAG                 IMAGE ID            CREATED             VIRTUAL SIZE
winse/hadoop        latest              9d7f115ef0ec        5 minutes ago       289.1 MB
...

[root@docker hadoop]# docker run -d --name slaver1 winse/hadoop
[root@docker hadoop]# docker run -d --name slaver2 winse/hadoop
[root@docker hadoop]# docker run -d --name master1 -P --link slaver1:slaver1 --link slaver2:slaver2  winse/hadoop

[root@docker hadoop]# docker restart slaver1 slaver2 master1
slaver1
slaver2
master1

[root@docker hadoop]# docker port master1 22
0.0.0.0:49159
[root@docker hadoop]# ssh localhost -p 49159
... 
-bash-4.1# cat /etc/hosts
172.17.0.31     7ef63f98e2d1
127.0.0.1       localhost
::1     localhost ip6-localhost ip6-loopback
fe00::0 ip6-localnet
ff00::0 ip6-mcastprefix
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters
172.17.0.29     slaver1
172.17.0.30     slaver2


–END