
K8s Hadoop Deploy

This tormented me for more than a week, but in the end I got it working. The torment came from not knowing what I didn't know, from working alone, and from overconfidence; it was eventually solved by widening my search, reading up, and asking colleagues. For a simple deployment, see README.md.

yum install docker-engine-1.12.6 docker-engine-selinux-1.12.6 -y

cd kube-deploy
vi hosts
vi k8s.profile
# sync kube-deploy to the other physical machines, and map k8s.profile into /etc/profile.d
./rsync-deploy.sh

cd docker-multinode/
./master.sh or ./worker.sh

docker save gcr.io/google_containers/etcd-amd64:3.0.4 | docker-bs load
docker save quay.io/coreos/flannel:v0.6.1-amd64 | docker-bs load

cd kube-deploy/hadoop/kubenetes/
./prepare.sh
kubectl create -f hadoop-master2.yaml
kubectl create -f hadoop-slaver.yaml 

Tip: a single set of configs can actually start multiple clusters; just append -n <namespace> to kubectl create.

For example:

[root@cu2 kubenetes]# kubectl create namespace hd1
[root@cu2 kubenetes]# kubectl create namespace hd2

[root@cu2 kubenetes]# ./prepare.sh hd1
[root@cu2 kubenetes]# kubectl create -f hadoop-master2.yaml -n hd1
[root@cu2 kubenetes]# kubectl create -f hadoop-slaver.yaml -n hd1
[root@cu2 kubenetes]# ./prepare.sh hd2
[root@cu2 kubenetes]# kubectl create -f hadoop-master2.yaml -n hd2
[root@cu2 kubenetes]# kubectl create -f hadoop-slaver.yaml -n hd2

[root@cu2 kubenetes]# kubectl get pods --all-namespaces
NAMESPACE     NAME                                    READY     STATUS    RESTARTS   AGE
hd1           hadoop-master2                          1/1       Running   0          28s
hd1           slaver-rc-fdcsw                         1/1       Running   0          18s
hd1           slaver-rc-qv964                         1/1       Running   0          18s
hd2           hadoop-master2                          1/1       Running   0          26s
hd2           slaver-rc-0vdfk                         1/1       Running   0          17s
hd2           slaver-rc-r7g84                         1/1       Running   0          17s
...

Looking back, it really came down to dockerd --ip-masq=false (every dockerd involved needs the flag). Also, even for containers on a single machine talking to each other, the wrong source IPs were caused by having openvpn installed, which added a MASQUERADE rule for everything leaving eth0.

The root cause is that the request's source address gets replaced, i.e. the iptables forwarding rules apply SNAT. This article explains iptables forwarding very clearly: IPtables之四:NAT原理和配置.
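To see which rules do the rewriting, dump the nat POSTROUTING chain. The two rules in the comments below are illustrative: the docker one also appears in the full dump further down, and the eth0 one is the kind of rule openvpn sets up:

iptables -t nat -S POSTROUTING
# rule added by dockerd (gone once --ip-masq=false is set):
#   -A POSTROUTING -s 10.1.34.0/24 ! -o docker0 -j MASQUERADE
# an openvpn-style rule that masquerades everything leaving eth0 has the same effect:
#   -A POSTROUTING -o eth0 -j MASQUERADE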

Problems encountered

Before adding --ip-masq=false, when the namenode received a request from a datanode, the source address was the IP of flannel.0: 10.1.98.0.

The corresponding namenode log:

2017-04-09 07:22:06,920 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* registerDatanode: from DatanodeRegistration(10.1.98.0, datanodeUuid=5086c549-f3bb-4ef6-8f56-05b1f7adb7d3, infoPort=50075, ipcPort=50020, storageInfo=lv=-56;cid=CID-522174fa-6e7b-4c3f-ae99-23c3018e35d7;nsid=1613705851;c=0) storage 5086c549-f3bb-4ef6-8f56-05b1f7adb7d3
2017-04-09 07:22:06,920 INFO org.apache.hadoop.net.NetworkTopology: Removing a node: /default-rack/10.1.98.0:50010
2017-04-09 07:22:06,921 INFO org.apache.hadoop.net.NetworkTopology: Adding a new node: /default-rack/10.1.98.0:50010
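Another way to confirm the rewrite on the wire is to capture on the namenode host and compare the source with the datanode pod IP; a sketch, assuming the namenode RPC port is 9000 (use whatever fs.defaultFS points to):

# run on the host carrying the namenode container; with --ip-masq=false the
# source should be the datanode pod IP instead of the flannel.0 address 10.1.98.0
tcpdump -nn -i any 'tcp dst port 9000'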

At first I thought it was a flannel problem; I switched to the yum-installed flannel and changed its backend to vxlan, but the problem stayed exactly the same.

Finally I asked a colleague who does networking; his take was that the request's source address was being replaced, which pointed straight at iptables. Going back to the docs, I had in fact seen the relevant articles before, but hadn't understood them well enough to see why.

The relevant parts of the iptables rules:

[root@cu2 ~]# iptables -S -t nat
...
-A PREROUTING -m comment --comment "kubernetes service portals" -j KUBE-SERVICES
-A PREROUTING -j PREROUTING_direct
-A PREROUTING -j PREROUTING_ZONES_SOURCE
-A PREROUTING -j PREROUTING_ZONES
-A PREROUTING -m addrtype --dst-type LOCAL -j DOCKER
-A OUTPUT -m comment --comment "kubernetes service portals" -j KUBE-SERVICES
-A OUTPUT -j OUTPUT_direct
-A OUTPUT ! -d 127.0.0.0/8 -m addrtype --dst-type LOCAL -j DOCKER
-A POSTROUTING -s 10.1.34.0/24 ! -o docker0 -j MASQUERADE
-A POSTROUTING -m comment --comment "kubernetes postrouting rules" -j KUBE-POSTROUTING
-A POSTROUTING -j POSTROUTING_direct
-A POSTROUTING -j POSTROUTING_ZONES_SOURCE
-A POSTROUTING -j POSTROUTING_ZONES
-A KUBE-MARK-DROP -j MARK --set-xmark 0x8000/0x8000
-A KUBE-MARK-MASQ -j MARK --set-xmark 0x4000/0x4000
-A KUBE-POSTROUTING -m comment --comment "kubernetes service traffic requiring SNAT" -m mark --mark 0x4000/0x4000 -j MASQUERADE
-A KUBE-SEP-75CPIAPDB4MAVFWI -s 10.1.40.3/32 -m comment --comment "kube-system/kube-dns:dns-tcp" -j KUBE-MARK-MASQ
-A KUBE-SEP-75CPIAPDB4MAVFWI -p tcp -m comment --comment "kube-system/kube-dns:dns-tcp" -m tcp -j DNAT --to-destination 10.1.40.3:53
-A KUBE-SEP-IWNPEB4T46P6VG5J -s 192.168.0.148/32 -m comment --comment "default/kubernetes:https" -j KUBE-MARK-MASQ
-A KUBE-SEP-IWNPEB4T46P6VG5J -p tcp -m comment --comment "default/kubernetes:https" -m recent --set --name KUBE-SEP-IWNPEB4T46P6VG5J --mask 255.255.255.255 --rsource -m tcp -j DNAT --to-destination 192.168.0.148:6443
-A KUBE-SEP-UYUINV25NDNSKNUW -s 10.1.40.3/32 -m comment --comment "kube-system/kube-dns:dns" -j KUBE-MARK-MASQ
-A KUBE-SEP-UYUINV25NDNSKNUW -p udp -m comment --comment "kube-system/kube-dns:dns" -m udp -j DNAT --to-destination 10.1.40.3:53
-A KUBE-SEP-XDHL2OHX2ICPQHKI -s 10.1.40.2/32 -m comment --comment "kube-system/kubernetes-dashboard:" -j KUBE-MARK-MASQ
-A KUBE-SEP-XDHL2OHX2ICPQHKI -p tcp -m comment --comment "kube-system/kubernetes-dashboard:" -m tcp -j DNAT --to-destination 10.1.40.2:9090
-A KUBE-SERVICES -d 10.0.0.1/32 -p tcp -m comment --comment "default/kubernetes:https cluster IP" -m tcp --dport 443 -j KUBE-SVC-NPX46M4PTMTKRN6Y
-A KUBE-SERVICES -d 10.0.0.95/32 -p tcp -m comment --comment "kube-system/kubernetes-dashboard: cluster IP" -m tcp --dport 80 -j KUBE-SVC-XGLOHA7QRQ3V22RZ
-A KUBE-SERVICES -d 10.0.0.10/32 -p udp -m comment --comment "kube-system/kube-dns:dns cluster IP" -m udp --dport 53 -j KUBE-SVC-TCOU7JCQXEZGVUNU
-A KUBE-SERVICES -d 10.0.0.10/32 -p tcp -m comment --comment "kube-system/kube-dns:dns-tcp cluster IP" -m tcp --dport 53 -j KUBE-SVC-ERIFXISQEP7F7OF4
-A KUBE-SERVICES -m comment --comment "kubernetes service nodeports; NOTE: this must be the last rule in this chain" -m addrtype --dst-type LOCAL -j KUBE-NODEPORTS
-A KUBE-SVC-ERIFXISQEP7F7OF4 -m comment --comment "kube-system/kube-dns:dns-tcp" -j KUBE-SEP-75CPIAPDB4MAVFWI
-A KUBE-SVC-NPX46M4PTMTKRN6Y -m comment --comment "default/kubernetes:https" -m recent --rcheck --seconds 10800 --reap --name KUBE-SEP-IWNPEB4T46P6VG5J --mask 255.255.255.255 --rsource -j KUBE-SEP-IWNPEB4T46P6VG5J
-A KUBE-SVC-NPX46M4PTMTKRN6Y -m comment --comment "default/kubernetes:https" -j KUBE-SEP-IWNPEB4T46P6VG5J
-A KUBE-SVC-TCOU7JCQXEZGVUNU -m comment --comment "kube-system/kube-dns:dns" -j KUBE-SEP-UYUINV25NDNSKNUW
-A KUBE-SVC-XGLOHA7QRQ3V22RZ -m comment --comment "kube-system/kubernetes-dashboard:" -j KUBE-SEP-XDHL2OHX2ICPQHKI

After adding --ip-masq=false to the dockerd service script, the rule -A POSTROUTING -s 10.1.34.0/24 ! -o docker0 -j MASQUERADE is gone, so the source address is no longer rewritten and the request arrives at the namenode with the datanode container's IP. Problem solved; the cause was so simple it makes you want to cry.
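For the record, one way to wire the flag in on the systemd-managed docker-engine 1.12 used here is a drop-in override; the file name below is my own choice, and any other flags you already pass to dockerd have to stay on the ExecStart line:

mkdir -p /etc/systemd/system/docker.service.d
cat > /etc/systemd/system/docker.service.d/ip-masq.conf <<'EOF'
[Service]
ExecStart=
ExecStart=/usr/bin/dockerd --ip-masq=false
EOF
systemctl daemon-reload && systemctl restart docker

# verify the docker0 MASQUERADE rule is gone
iptables -t nat -S POSTROUTING | grep docker0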

Some other problems hit while writing the yaml:

Of course there were plenty of other problems; that's all for this post. I'll write about the optimization work once it is done.

Notes from the intermediate steps

This mostly records the journey, so that if I hit the same problem again I can recall it quickly. If you only want to deploy, skip this part and go straight to the common commands at the end.

Here is a record of installing etcd and flanneld via yum along the way. Installing flanneld on the physical machine hooks the docker environment-variable file (/run/flannel/subnet.env) into docker's startup script (a sketch of what those files contain follows the block below).

Install docker-v1.12
https://docs.docker.com/v1.12/
https://docs.docker.com/v1.12/engine/installation/linux/centos/

# remove the old ones
yum-config-manager --disable docker-ce*
yum remove -y docker-ce*

sudo tee /etc/yum.repos.d/docker.repo <<-'EOF'
[dockerrepo]
name=Docker Repository
baseurl=https://yum.dockerproject.org/repo/main/centos/7/
enabled=1
gpgcheck=1
gpgkey=https://yum.dockerproject.org/gpg
EOF

https://yum.dockerproject.org/repo/main/centos/7/Packages/
[root@cu3 ~]# yum --showduplicates list docker-engine | expand
docker-engine.x86_64             1.12.6-1.el7.centos                  dockerrepo

[root@cu3 yum.repos.d]# yum install docker-engine-1.12.6 docker-engine-selinux-1.12.6


https://kubernetes.io/docs/getting-started-guides/centos/centos_manual_config/

cat > /etc/yum.repos.d/virt7-docker-common-release.repo <<EOF
[virt7-docker-common-release]
name=virt7-docker-common-release
baseurl=http://cbs.centos.org/repos/virt7-docker-common-release/x86_64/os/
gpgcheck=0
EOF

yum -y install --enablerepo=virt7-docker-common-release etcd flannel
yum -y install --enablerepo=virt7-docker-common-release flannel

- ETCD configuration
[root@cu3 docker-multinode]# 
etcdctl mkdir /kube-centos/network
etcdctl set /kube-centos/network/config "{ \"Network\": \"10.1.0.0/16\", \"SubnetLen\": 24, \"Backend\": { \"Type\": \"vxlan\" } }"

- FLANNEL
[root@cu3 ~]# cat /etc/sysconfig/flanneld
# Flanneld configuration options  

# etcd url location.  Point this to the server where etcd runs
FLANNEL_ETCD_ENDPOINTS="http://cu3:2379"

# etcd config key.  This is the configuration key that flannel queries
# For address range assignment
FLANNEL_ETCD_PREFIX="/kube-centos/network"

# Any additional options that you want to pass
#FLANNEL_OPTIONS=""

[root@cu2 yum.repos.d]# systemctl daemon-reload

[root@cu2 yum.repos.d]# cat /run/flannel/subnet.env

[root@cu2 ~]# systemctl cat docker
...
# /usr/lib/systemd/system/docker.service.d/flannel.conf
[Service]
EnvironmentFile=-/run/flannel/docker 
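For reference, the files flannel hands to docker typically look like the sketch below (the values are placeholders, not my real subnet layout). Note DOCKER_OPT_IPMASQ: whether these variables actually reach dockerd depends on the ExecStart line referencing them (e.g. $DOCKER_NETWORK_OPTIONS), but it is the same --ip-masq flag that the fix above forces to false.

# /run/flannel/subnet.env (written by flanneld; example values)
FLANNEL_NETWORK=10.1.0.0/16
FLANNEL_SUBNET=10.1.34.1/24
FLANNEL_MTU=1450
FLANNEL_IPMASQ=false

# /run/flannel/docker (generated for docker, consumed via the EnvironmentFile above)
DOCKER_OPT_BIP="--bip=10.1.34.1/24"
DOCKER_OPT_IPMASQ="--ip-masq=true"
DOCKER_OPT_MTU="--mtu=1450"
DOCKER_NETWORK_OPTIONS=" --bip=10.1.34.1/24 --ip-masq=true --mtu=1450"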

During testing, the yaml only started sshd; once the containers were up, I tested by starting the namenode and datanode by hand:

cd hadoop-2.6.5
gosu hadoop mkdir /data/bigdata
gosu hadoop sbin/hadoop-daemon.sh start datanode 

cd hadoop-2.6.5/
gosu hadoop  bin/hadoop namenode -format 
gosu hadoop sbin/hadoop-daemon.sh start namenode
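Once both daemons are up, a quick check that the datanode has registered (and with which IP) can be run from the namenode container; a sketch:

cd hadoop-2.6.5
# lists registered datanodes; before the iptables fix the reported IP was 10.1.98.0
gosu hadoop bin/hdfs dfsadmin -report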

After finding that the problem was in iptables, I went back to the original docker-bootstrap startup, which requires deleting the flannel.1 network:

# stop the yum-installed flanneld first  https://kubernetes.io/docs/getting-started-guides/scratch/
ip link set flannel.1 down
ip link delete flannel.1
route -n

rm /usr/lib/systemd/system/docker.service.d/flannel.conf 

If the firewall is enabled, add the container network ranges to the trusted zone:

systemctl enable firewalld && systemctl start firewalld

firewall-cmd --zone=trusted --add-source=10.0.0.0/8 --permanent 
firewall-cmd --zone=trusted --add-source=192.168.0.0/16 --permanent 
firewall-cmd --reload

Some handy commands

See which images are in use

[root@cu2 /]# kubectl get pods --all-namespaces -o jsonpath="{..image}" |\
 tr -s '[[:space:]]' '\n' |\
 sort |\
 uniq -c
      2 gcr.io/google_containers/dnsmasq-metrics-amd64:1.0
      2 gcr.io/google_containers/exechealthz-amd64:1.2
     12 gcr.io/google_containers/hyperkube-amd64:v1.5.5
      2 gcr.io/google_containers/kube-addon-manager-amd64:v6.1
      2 gcr.io/google_containers/kubedns-amd64:1.9
      2 gcr.io/google_containers/kube-dnsmasq-amd64:1.4
      2 gcr.io/google_containers/kubernetes-dashboard-amd64:v1.5.0
    
      
Modify the default kubectl config

[root@cu2 ~]# vi $KUBECONFIG 
apiVersion: v1
kind: Config
preferences: {}
current-context: default
clusters:
- cluster:
    server: http://localhost:8080
  name: default
contexts:
- context:
    cluster: default
    user: ""
    namespace: kube-system
  name: default
users: {}

If kubectl hasn't been downloaded, you can grab it from a container started from the image

[root@cu2 docker-multinode]# docker exec -ti 0c0360bcc2c3 bash
root@cu2:/# cp kubectl /var/run/

[root@cu2 run]# mv kubectl /data/kubernetes/kube-deploy/docker-multinode/

Get container IPs

https://kubernetes.io/docs/user-guide/jsonpath/
[root@cu2 ~]# kubectl get pods -o wide -l run=redis -o jsonpath={..podIP}
10.1.75.2 10.1.75.3 10.1.58.3 10.1.58.2 10.1.33.3

Share a network namespace: --net

docker run -ti --entrypoint=sh --net=container:8e9f21956469f4ef7e5b9d91798788ab83f380795d2825cdacae0ed28f5ba03b gcr.io/google_containers/skydns-amd64:1.0


Formatted output

kubectl get pods --all-namespaces -o jsonpath="{.items[*].spec.containers[*].image}"  

[root@cu2 ~]# export POD_COL="custom-columns=NAME:.metadata.name,RESTARTS:.status.containerStatuses[*].restartCount,CONTAINERS:.spec.containers[*].name,IP:.status.podIP,HOST:.spec.nodeName"
[root@cu2 ~]# kubectl get pods -o $POD_COL 

kubectl get po -l k8s-app=kube-dns -o=custom-columns=NAME:.metadata.name,CONTAINERS:.spec.containers[*].name

[root@cu2 kubernetes]# kubectl get po --all-namespaces -o=custom-columns=NAME:.metadata.name,CONTAINERS:.spec.containers[*].name

kubectl get po --all-namespaces -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{end}'

Back up images (a restore sketch follows)

echo "$(docker ps  | grep -v IMAGE | awk '{print $2}' )
$(docker-bs ps | grep -v IMAGE | awk '{print $2}' )" | sort -u | while read image ; do docker save $image>$(echo $image | tr '[/:]' _).tar ; done
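To restore the saved tarballs on another machine, something like:

for f in *.tar ; do docker load -i "$f" ; done
# and for the bootstrap docker:
for f in *.tar ; do docker-bs load -i "$f" ; done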

Add labels

cat /etc/hosts | grep -E "\scu[0-9]\s" | awk '{print "kubectl label nodes "$1" hostname="$2}' | while read line ; do sh -c "$line" ; done

Scale out

[root@cu2 kubernetes]# kubectl run redis --image=redis:3.2.8 
[root@cu2 kubernetes]# kubectl scale --replicas=9 deployment/redis

 echo " $( kubectl describe pods hadoop-master2 | grep -E "Node|Container ID" | awk -F/ '{print $NF}' | tr '\n' ' ' | awk '{print "ssh "$1" \rdocker exec -ti "$2" bash"}' ) "
 

Test whether DNS works:

[root@cu2 kube-deploy]# vi busybox.yaml
apiVersion: v1
kind: Pod
metadata:
  name: busybox
  namespace: default
spec:
  containers:
  - image: busybox
    command:
      - sleep
      - "3600"
    imagePullPolicy: IfNotPresent
    name: busybox
  restartPolicy: Always

[root@cu3 kube-deploy]# kubectl create -f busybox.yaml 
pod "busybox" created
[root@cu3 kube-deploy]# kubectl get pods 
NAME      READY     STATUS              RESTARTS   AGE
busybox   0/1       ContainerCreating   0          11s
[root@cu3 kube-deploy]# kubectl get pods 
NAME      READY     STATUS    RESTARTS   AGE
busybox   1/1       Running   0          1m
[root@cu3 kube-deploy]# kubectl exec -ti busybox -- nslookup kubernetes.default
Server:    10.0.0.10
Address 1: 10.0.0.10 kube-dns.kube-system.svc.cluster.local

Name:      kubernetes.default
Address 1: 10.0.0.1 kubernetes.default.svc.cluster.local

Use the MySQL container image as a client

kubectl run -it --rm --image=mysql:5.6 mysql-client -- mysql -h mysql -ppassword

One small takeaway: the importance of logs!

[root@cu2 kubenetes]# docker ps -a | grep kubelet
[root@cu2 kubenetes]# docker logs --tail=200 7432da457558

E0417 11:39:40.194844   22528 configmap.go:174] Couldn't get configMap hadoop/dta-hadoop-config: configmaps "dta-hadoop-config" not found
E0417 11:39:40.194910   22528 configmap.go:174] Couldn't get configMap hadoop/dta-bin-config: configmaps "dta-bin-config" not found
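These configmaps are presumably what prepare.sh creates; if they are missing in a namespace, creating them by hand would look roughly like this (the source directories here are assumptions, not the real layout):

kubectl create configmap dta-hadoop-config --from-file=hadoop-conf/ -n hadoop
kubectl create configmap dta-bin-config --from-file=bin/ -n hadoop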

Some errors from heapster monitoring; not sorted out yet

[root@cu2 ~]# kubectl exec -ti heapster-564189836-shn2q -n kube-system -- sh
/ # 
/ # 
no pod data
/ # /heapster --source=https://kubernetes.default --sink=log --heapster-port=8083 -v 10

E0329 10:11:53.823641       1 reflector.go:203] k8s.io/heapster/metrics/processors/node_autoscaling_enricher.go:100: Failed to list *api.Node: Get https://kubernetes.default/api/v1/nodes?resourceVersion=0: dial tcp 10.0.0.1:443: i/o timeout


$heapster/metrics
$heapster/api/v1/model/debug/allkeys

Some other configuration


other_args=" --registry-mirror=https://docker.mirrors.ustc.edu.cn "

--insecure-registry gcr.io 

iptables -S -t nat

Some other resources

statefulset
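Not used in this post yet, but a StatefulSet (plus a headless Service for stable DNS names) looks like a natural fit for the datanodes; a minimal sketch against the k8s 1.5 API used here, with made-up names and a placeholder image:

cat <<'EOF' | kubectl create -f -
apiVersion: apps/v1beta1
kind: StatefulSet
metadata:
  name: datanode
spec:
  serviceName: datanode        # a headless Service with this name must exist
  replicas: 2
  template:
    metadata:
      labels:
        app: datanode
    spec:
      containers:
      - name: datanode
        image: my-hadoop:2.6.5   # placeholder image
EOF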

--END
