Winse Blog

Strolling or pausing, it is all scenery; amid the hustle we all aim for the best; all the busy days are for tomorrow — nothing to fear.

Getting Started with Docker

Docker has been hot for about a year now, and it looks well suited to ops-style system deployment. Using it to tinker with Hadoop cluster deployment also feels like a good fit. Let's learn Docker together.

Docker provides installation docs for Windows, but they are rather a trap: although an exe installer is provided, it ultimately installs VirtualBox and boots a Linux system (ISO) that bundles Docker.

If you already have VMware but no Linux installed, you can simply download the ISO and boot from it.

Installation

If you have VMware and a Linux guest already installed, the commands used to install and configure Docker are listed below. Docker requires a 64-bit Linux; I am using CentOS 6 here. For detailed steps see the official installation guide.

[root@docker ~]# yum install epel-release

[root@docker ~]# yum install docker-io
[root@docker ~]# service docker start

[root@docker ~]# docker run learn/tutorial /bin/echo hello world
Unable to find image 'learn/tutorial' locally
Pulling repository learn/tutorial
8dbd9e392a96: Pulling fs layer 
8dbd9e392a96: Download complete 
hello world

[root@docker ~]# docker images
REPOSITORY          TAG                 IMAGE ID            CREATED             VIRTUAL SIZE
learn/tutorial      latest              8dbd9e392a96        17 months ago       128 MB
[root@docker ~]# docker images learn/tutorial 
REPOSITORY          TAG                 IMAGE ID            CREATED             VIRTUAL SIZE
learn/tutorial      latest              8dbd9e392a96        17 months ago       128 MB

When you execute docker run and the specified image does not exist locally, it is pulled from the Hub. You can also pull the image from the server first and then run it.

docker pull centos

Note: if startup fails, try the following. 1) Reinstall Docker. 2) If it still fails with "docker: relocation error: docker: symbol dm_task_get_info_with_deferred_remove, version Base not defined in file libdevmapper.so.1.02 with link time reference", run yum upgrade device-mapper-libs and then service docker start (details at the end of this post).

Quick Start

Hello World tutorial

One-off execution

[root@docker ~]# docker run learn/tutorial /bin/echo 'hello world'
hello world

Once the command finishes, the container exits.

Interactive mode

[root@docker ~]# docker run -t -i learn/tutorial /bin/bash
root@274ede23baad:/# uptime
 12:36:02 up  5:59,  0 users,  load average: 0.00, 0.00, 0.00
root@9db219d2e98b:/# cat /etc/issue
Ubuntu 12.04 LTS \n \l
root@274ede23baad:/# pwd
/
root@274ede23baad:/# ls
bin  boot  dev  etc  home  lib  lib64  media  mnt  opt  proc  root  run  sbin  selinux  srv  sys  tmp  usr  var
root@274ede23baad:/# exit
exit
  • -t flag assigns a pseudo-tty or terminal inside our new container.
  • -i flag allows us to make an interactive connection by grabbing the standard in (STDIN) of the container.

Background (daemonized) tasks

[root@docker ~]# docker run -d learn/tutorial /bin/sh -c "while true; do echo hello world; sleep 1; done" 
17e28b56e0cc4ddb5522736e2bcfd752d849a5b1d0b598478ee66b255801aa7c

[root@docker ~]# docker ps
CONTAINER ID        IMAGE                   COMMAND                CREATED             STATUS              PORTS               NAMES
17e28b56e0cc        learn/tutorial:latest   /bin/sh -c 'while tr   2 minutes ago       Up 2 minutes                            trusting_wozniak    
  • -d flag tells Docker to run the container and put it in the background, to daemonize it.

The command returns the container ID (a unique ID). docker ps lists the containers currently running in the background; the container ID lets you look up the corresponding details, and the last column is a randomly assigned name (you can also choose one yourself, covered later).

[root@docker ~]# docker logs trusting_wozniak
hello world
hello world
...

[root@docker ~]# docker stop trusting_wozniak
trusting_wozniak
[root@docker ~]# docker ps
CONTAINER ID        IMAGE               COMMAND             CREATED             STATUS              PORTS               NAMES

Use docker logs to view a container's standard output and docker stop to stop it.

$ sudo docker exec -i -t 665b4a1e17b6 /bin/bash #by ID
or
$ sudo docker exec -i -t loving_heisenberg /bin/bash #by Name
$ root@665b4a1e17b6:/#

---

$ sudo docker attach 665b4a1e17b6 #by ID
or
$ sudo docker attach loving_heisenberg #by Name
$ root@665b4a1e17b6:/# 

Digging into containers

Working with Containers

A container can be run either interactively or as a background task.

The docker command line:

# Usage:  [sudo] docker [flags] [command] [arguments] ..
# Example:
$ sudo docker run -i -t ubuntu /bin/bash

Each command can be given a series of flags and arguments.

Common flags

$ docker version

$ docker run -d -P training/webapp python app.py

$ docker ps -l
CONTAINER ID  IMAGE                   COMMAND       CREATED        STATUS        PORTS                    NAMES
bc533791f3f5  training/webapp:latest  python app.py 5 seconds ago  Up 2 seconds  0.0.0.0:49155->5000/tcp  nostalgic_morse

# docker run -d -p 6379 -v /home/hadoop/redis-2.8.13:/opt/redis-2.8.13 learn/tutorial /opt/redis-2.8.13/src/redis-server 
be0b410f3601ea36070b3e519d9cc7cbe259caa2392f468c2dd2baebef42c4a8

# docker ps -l
CONTAINER ID        IMAGE                   COMMAND                CREATED             STATUS              PORTS                     NAMES
be0b410f3601        learn/tutorial:latest   /opt/redis-2.8.13/sr   10 seconds ago      Up 10 seconds       0.0.0.0:49153->6379/tcp   sad_colden          

# /home/hadoop/redis-2.8.13/src/redis-cli -p 49153
127.0.0.1:49153> keys *
(empty list or set)
127.0.0.1:49153> 
  • -P flag is new and tells Docker to map any required network ports inside our container to our host. This lets us view our web application.
  • -l tells the docker ps command to return the details of the last container started.
  • -a the docker ps command only shows information about running containers. If you want to see stopped containers too use the -a flag.
  • -p Network port bindings are very configurable in Docker. In our last example the -P flag is a shortcut for -p 5000 that maps port 5000 inside the container to a high port (from the range 49153 to 65535) on the local Docker host. We can also bind Docker containers to specific ports using the -p flag.
  • -v flag you can also mount a directory from your own host into a container.
[root@docker redis-2.8.13]# docker run -d -p 6379:6379 -v /home/hadoop/redis-2.8.13:/opt/redis-2.8.13 learn/tutorial /opt/redis-2.8.13/src/redis-server 
2c50850c9437698769e54281a9f4154dc4120da2e113802454f1a23c83ab91fe

[root@docker redis-2.8.13]# docker ps
CONTAINER ID        IMAGE                   COMMAND                CREATED             STATUS              PORTS                    NAMES
2c50850c9437        learn/tutorial:latest   /opt/redis-2.8.13/sr   29 seconds ago      Up 28 seconds       0.0.0.0:6379->6379/tcp   naughty_yonath  

[root@docker redis-2.8.13]# docker port naughty_yonath 6379
0.0.0.0:6379

[root@docker redis-2.8.13]# docker logs -f naughty_yonath
...
[1] 27 Sep 13:48:12.192 * The server is now ready to accept connections on port 6379
[1] 27 Sep 13:50:33.228 * DB saved on disk
[1] 27 Sep 13:50:43.730 * DB saved on disk
  • -f This time though we’ve added a new flag, -f. This causes the docker logs command to act like the tail -f command and watch the container’s standard out. We can see here the logs from Flask showing the application running on port 5000 and the access log entries for it.
[root@docker redis-2.8.13]# docker top naughty_yonath
UID                 PID                 PPID                C                   STIME               TTY                 TIME                CMD
root                5015                1433                0                   21:48               ?                   00:00:00            /opt/redis-2.8.13/src/redis-server *:6379
[root@docker redis-2.8.13]# docker inspect naughty_yonath
...
    "Volumes": {
        "/opt/redis-2.8.13": "/home/hadoop/redis-2.8.13"
    },
    "VolumesRW": {
        "/opt/redis-2.8.13": true
    }
}

[root@docker redis-2.8.13]# docker inspect -f '' naughty_yonath
map[/opt/redis-2.8.13:/home/hadoop/redis-2.8.13]

Restart

[root@docker redis-2.8.13]# docker stop naughty_yonath
naughty_yonath
[root@docker redis-2.8.13]# docker ps -l
CONTAINER ID        IMAGE                   COMMAND                CREATED             STATUS                     PORTS               NAMES
2c50850c9437        learn/tutorial:latest   /opt/redis-2.8.13/sr   8 minutes ago       Exited (0) 5 seconds ago                       naughty_yonath      
[root@docker redis-2.8.13]# docker start naughty_yonath
naughty_yonath
[root@docker redis-2.8.13]# docker ps -l
CONTAINER ID        IMAGE                   COMMAND                CREATED             STATUS              PORTS                    NAMES
2c50850c9437        learn/tutorial:latest   /opt/redis-2.8.13/sr   8 minutes ago       Up 1 seconds        0.0.0.0:6379->6379/tcp   naughty_yonath

Remove

docker stop naughty_yonath
docker rm naughty_yonath
or
docker rm -f naughty_yonath

Images

Working with Docker Images

List local images

docker images
# REPO[:TAG]
docker run -t -i ubuntu:14.04 /bin/bash
docker run -t -i ubuntu:latest /bin/bash

Pull an image from the Hub

docker pull centos
docker run -t -i centos /bin/bash
docker search sinatra 
docker pull training/sinatra

Create your own image

Update an image directly (commit)

$ docker run -t -i training/sinatra /bin/bash
root@0b2616b0e5a8:/# gem install json
$ sudo docker commit -m="Added json gem" -a="Kate Smith" \
  0b2616b0e5a8 ouruser/sinatra:v2
$ docker images
$ docker run -t -i ouruser/sinatra:v2 /bin/bash
root@78e82f680994:/#

Or add functionality and update the image through a Dockerfile.

$ mkdir sinatra
$ cd sinatra
$ touch Dockerfile
  # This is a comment
  FROM ubuntu:14.04
  MAINTAINER Kate Smith <ksmith@example.com>
  RUN apt-get update && apt-get install -y ruby ruby-dev
  RUN gem install sinatra

$ docker build -t="ouruser/sinatra:v2" .
$ docker run -t -i ouruser/sinatra:v2 /bin/bash

For the meaning and usage of each Dockerfile instruction, see Building an image from a Dockerfile, Best Practices for Writing Dockerfiles, and the Dockerfile Reference; docker-perl is a concrete example.

Add a new tag

$ docker tag 5db5f8471261 ouruser/sinatra:devel
$ docker images ouruser/sinatra
REPOSITORY          TAG     IMAGE ID      CREATED        VIRTUAL SIZE
ouruser/sinatra     latest  5db5f8471261  11 hours ago   446.7 MB
ouruser/sinatra     devel   5db5f8471261  11 hours ago   446.7 MB

Push and share to the Hub

docker push ouruser/sinatra

Delete from the local machine

docker rmi training/sinatra

Using multiple containers together

Linking Containers Together

Port mapping

docker run -d -P training/webapp python app.py

docker ps nostalgic_morse
CONTAINER ID  IMAGE                   COMMAND       CREATED        STATUS        PORTS                    NAMES
bc533791f3f5  training/webapp:latest  python app.py 5 seconds ago  Up 2 seconds  0.0.0.0:49155->5000/tcp  nostalgic_morse

docker run -d -p 5000:5000 training/webapp python app.py

docker run -d -p 127.0.0.1:5000:5000 training/webapp python app.py

docker run -d -p 127.0.0.1::5000 training/webapp python app.py

# The -p flag can be used multiple times to configure multiple ports.
docker run -d -p 127.0.0.1:5000:5000/udp training/webapp python app.py

docker port nostalgic_morse 5000
127.0.0.1:49155

Container Linking

Docker has thought this through nicely. With two containers that need to talk to each other, say a db and a web container, how does web get at the data in db?

Give a container a name:

$ docker run -d -P --name web training/webapp python app.py

$ docker ps -l
CONTAINER ID  IMAGE                  COMMAND        CREATED       STATUS       PORTS                    NAMES
aed84ee21bde  training/webapp:latest python app.py  12 hours ago  Up 2 seconds 0.0.0.0:49154->5000/tcp  web

$ docker inspect -f "" aed84ee21bde
/web

Linking containers:

$ docker run -d --name db training/postgres

$ docker rm -f web
$ docker run -d -P --name web --link db:db training/webapp python app.py

$ docker ps
CONTAINER ID  IMAGE                     COMMAND               CREATED             STATUS             PORTS                    NAMES
349169744e49  training/postgres:latest  su postgres -c '/usr  About a minute ago  Up About a minute  5432/tcp                 db, web/db
aed84ee21bde  training/webapp:latest    python app.py         16 hours ago        Up 2 minutes       0.0.0.0:49154->5000/tcp  web

After linking, the web container gets DB_* environment variables, and the db container's IP is added to its /etc/hosts.


$ docker run --rm --name web2 --link db:db training/webapp env
    . . .
    DB_NAME=/web2/db
    DB_PORT=tcp://172.17.0.5:5432
    DB_PORT_5432_TCP=tcp://172.17.0.5:5432
    DB_PORT_5432_TCP_PROTO=tcp
    DB_PORT_5432_TCP_PORT=5432
    DB_PORT_5432_TCP_ADDR=172.17.0.5

$ docker run -t -i --rm --link db:db training/webapp /bin/bash
root@aed84ee21bde:/opt/webapp# cat /etc/hosts
172.17.0.7  aed84ee21bde
. . .
172.17.0.5  db    

You can see that Docker has created a series of environment variables with useful information about the source db container. Each variable is prefixed with DB_, which is populated from the alias you specified above. If the alias were db1, the variables would be prefixed with DB1_.

Storage

Managing Data in Containers

# Adding a data volume
docker run -d -P --name web -v /webapp training/webapp python app.py

# Mount a Host Directory as a Data Volume
docker run -d -P --name web -v /src/webapp:/opt/webapp training/webapp python app.py
# read-only
docker run -d -P --name web -v /src/webapp:/opt/webapp:ro training/webapp python app.py

# Mount a Host File as a Data Volume
docker run --rm -it -v ~/.bash_history:/.bash_history ubuntu /bin/bash

# Creating and mounting a Data Volume Container
docker run -d -v /dbdata --name dbdata training/postgres echo Data-only container for postgres
docker run -d --volumes-from dbdata --name db1 training/postgres
docker run -d --volumes-from dbdata --name db2 training/postgres
docker run -d --name db3 --volumes-from db1 training/postgres

# Backup, restore, or migrate data volumes
docker run --volumes-from dbdata -v $(pwd):/backup ubuntu tar cvf /backup/backup.tar /dbdata
docker run -v /dbdata --name dbdata2 ubuntu /bin/bash
docker run --volumes-from dbdata2 -v $(pwd):/backup busybox tar xvf /backup/backup.tar

Recap

Docker is managed mainly through the commands and flags it provides.

  • Local image management: docker images / docker rmi [image id]
  • Container management: docker ps -a|-l / docker start|stop|rm|restart [container id]
  • Run a container: docker run [image] [command]
    • -d run in the background
    • -ti run interactively with a tty
    • -P map the ports exposed by the container to host ports; use docker port [container-name] to view the mapping
    • -p [host-machine-port:container-machine-port] specify the port mapping manually
    • -h [hostname] the hostname of the container's OS
    • --name [name] the container's name
    • -v [path] create a volume at the given directory
    • -v [host-machine-path:container-machine-path] mount a host path at the given directory inside the container
    • --link [container-name:alias] let containers access each other

There are also many auxiliary commands such as top, logs, port, inspect, plus pull, push, commit, tag and so on for version management.

Updates

  • March 3, 2015 00:29:44

The Docker site cannot be reached — the Great Firewall strikes again. Export the image from an existing Docker host instead (or set http_proxy and go through a proxy).

[root@docker ~]# docker ps -a
CONTAINER ID        IMAGE                   COMMAND                CREATED             STATUS                      PORTS               NAMES
4a1ba5605868        learn/tutorial:latest   /bin/bash              15 seconds ago      Exited (0) 11 seconds ago                       loving_wilson        
6e8a77ff8c26        centos:centos6          /bin/bash              10 minutes ago      Exited (0) 10 minutes ago                       determined_almeida  
[root@docker ~]# docker export loving_wilson > learn_tutorial.tar

#===

[root@localhost ~]# cat centos6.tar | docker import - centos:centos6
876f82e7032a2ed567421298c6dd12a74ac7b37fc28ef4fd062ebb4678bd6821
[root@localhost ~]# cat learn_tutorial.tar | docker import - learn/tutorial
dc574b587de3479ecc3622c7b4f12227d894aa1461737612130122092a72bdb4
[root@localhost ~]# docker images
REPOSITORY          TAG                 IMAGE ID            CREATED              VIRTUAL SIZE
learn/tutorial      latest              dc574b587de3        23 seconds ago       128.2 MB
centos              centos6             876f82e7032a        About a minute ago   212.7 MB
  • August 5, 2015 11:04:14

1. Check whether a domestic mirror site has the image: http://dockerpool.com/downloads

2. If https://registry.hub.docker.com/_/centos/ is unreachable, download the corresponding Dockerfile directly from GitHub.

[root@localhost ~]# git clone -b CentOS-6  https://github.com/CentOS/sig-cloud-instance-images.git
[root@localhost docker]# docker build . 

[root@localhost docker]# docker images
REPOSITORY          TAG                 IMAGE ID            CREATED             VIRTUAL SIZE
<none>              <none>              437c8a32e0c6        27 seconds ago      203.1 MB

# Start and log into the container, install sshd
[root@localhost docker]# docker run -ti 437c8a32e0c6 /bin/bash
[root@077cd71ff08f /]# yum install which openssh-server openssh-clients
[root@077cd71ff08f /]# chkconfig --list
iptables        0:off   1:off   2:on    3:on    4:on    5:on    6:off
netconsole      0:off   1:off   2:off   3:off   4:off   5:off   6:off
netfs           0:off   1:off   2:off   3:on    4:on    5:on    6:off
network         0:off   1:off   2:on    3:on    4:on    5:on    6:off
rdisc           0:off   1:off   2:off   3:off   4:off   5:off   6:off
restorecond     0:off   1:off   2:off   3:off   4:off   5:off   6:off
sshd            0:off   1:off   2:on    3:on    4:on    5:on    6:off
udev-post       0:off   1:on    2:on    3:on    4:on    5:on    6:off
[root@077cd71ff08f /]# service sshd status
openssh-daemon is stopped
[root@077cd71ff08f /]# service sshd start
Generating SSH2 RSA host key:                              [  OK  ]
Generating SSH1 RSA host key:                              [  OK  ]
Generating SSH2 DSA host key:                              [  OK  ]
Starting sshd:                                             [  OK  ]
[root@077cd71ff08f /]# vi /etc/ssh/sshd_config 
#UsePAM no
# or: sed -i '/pam_loginuid.so/c session    optional     pam_loginuid.so'  /etc/pam.d/sshd
[root@077cd71ff08f /]# which sshd
/usr/sbin/sshd
[root@077cd71ff08f /]# passwd    # remember to set a password

# Commit the changes as a new image
[root@localhost ~]# docker ps -a
CONTAINER ID        IMAGE                 COMMAND             CREATED             STATUS                      PORTS               NAMES
077cd71ff08f        bigdata:latest        "/bin/bash"         4 minutes ago       Exited (0) 11 seconds ago                       desperate_bell       
7195847a0166        437c8a32e0c6:latest   "/bin/bash"         3 hours ago         Up 5 minutes                                    determined_feynman  
[root@localhost ~]# docker diff 077cd71ff08f    # remember to clean up data before committing
[root@localhost ~]# docker stop 077cd71ff08f
[root@localhost ~]# docker commit 077cd71ff08f bigdata
[root@localhost ~]# docker images
REPOSITORY          TAG                 IMAGE ID            CREATED             VIRTUAL SIZE
bigdata             latest              c2a336f22ff8        4 minutes ago       261.3 MB

# Start new containers and log in remotely over ssh
[root@localhost ~]# docker run -d --dns 172.17.42.1 --name master -h master bigdata /usr/sbin/sshd -D
[root@localhost ~]# docker run -d  --dns 172.17.42.1 --name slaver1 -h slaver1 bigdata /usr/sbin/sshd -D
[root@localhost ~]# docker inspect master
[root@localhost ~]# vi /etc/hosts
[root@localhost ~]# service dnsmasq restart

3. Build it yourself

[root@localhost docker]# wget --no-check-certificate  https://raw.githubusercontent.com/docker/docker/master/contrib/mkimage-yum.sh
[root@localhost docker]# chmod +x mkimage-yum.sh
[root@localhost docker]# ./mkimage-yum.sh centos6
  • March 2, 2015 16:13:12

Installing the latest Docker on CentOS 6.5 again, it fails on startup with:

[root@localhost ~]# docker -d
INFO[0000] +job serveapi(unix:///var/run/docker.sock)   
INFO[0000] WARNING: You are running linux kernel version 2.6.32-431.el6.x86_64, which might be unstable running docker. Please upgrade your kernel to 3.8.0. 
INFO[0000] Listening for HTTP on unix (/var/run/docker.sock) 
docker: relocation error: docker: symbol dm_task_get_info_with_deferred_remove, version Base not defined in file libdevmapper.so.1.02 with link time reference

Another dependency has to be installed (ugh — even a yum install leaves you installing extra dependencies yourself!)

[root@localhost ~]#  yum install device-mapper-event-libs
  • Error 2: cgroup.procs: invalid argument [August 6, 2015 11:11:49]
[root@localhost ~]# docker start 5ed45ce5ad3d
Error response from daemon: Cannot start container 5ed45ce5ad3d: [8] System error: write /cgroup/freezer/docker/5ed45ce5ad3d085fe3c004f90eef7c774a722e84cf0c9d18c197cc5900bbc8ae/cgroup.procs: invalid argument
FATA[0000] Error: failed to start one or more containers 

Fix it by changing the configuration: http://blog.csdn.net/jollypigclub/article/details/40428095

[root@localhost ~]# vi /etc/sysconfig/docker
...
other_args="--exec-driver=lxc"
#other_args=""
...
  • Error 3: 2017-03-01 14:27:23, installing docker-io again (now 1.7.1-2.el6) produces yet another error:
[root@bigdata1 ~]# cat /etc/redhat-release 
CentOS release 6.5 (Final)
[root@bigdata1 ~]# yum install epel-release
[root@bigdata1 ~]# yum install docker-io
[root@bigdata1 ~]# docker -d
INFO[0000] Listening for HTTP on unix (/var/run/docker.sock) 
WARN[0000] You are running linux kernel version 2.6.32-431.el6.x86_64, which might be unstable running docker. Please upgrade your kernel to 3.10.0. 
docker: relocation error: docker: symbol dm_task_get_info_with_deferred_remove, version Base not defined in file libdevmapper.so.1.02 with link time reference

https://github.com/docker/docker/issues/12108
[root@bigdata1 ~]# yum install device-mapper-devel

[root@bigdata1 ~]# service docker start
Starting docker:                                           [确定]
[root@bigdata1 ~]# docker version
Client version: 1.7.1
Client API version: 1.19
Go version (client): go1.4.2
Git commit (client): 786b29d/1.7.1
OS/Arch (client): linux/amd64
Server version: 1.7.1
Server API version: 1.19
Go version (server): go1.4.2
Git commit (server): 786b29d/1.7.1
OS/Arch (server): linux/amd64
  • Docker's local storage path [@ August 5, 2015 11:19:17]
[root@localhost docker]# cd /var/lib/docker/
[root@localhost docker]# ls
containers  devicemapper  graph  init  linkgraph.db  repositories-devicemapper  tmp  trust  volumes
[root@localhost docker]# cd graph/
[root@localhost graph]# ll
总用量 16
drwx------ 2 root root 4096 8月   5 10:39 d5d33a6a321ae20a3ae4805b5643560ce9c16a49d2f1d32541b39e04ad083983
drwx------ 2 root root 4096 8月   5 10:39 d8ed1be0a39bcc741aa1e95e59b844140d9294afc75082697184cdfbf2bc6a2d
drwx------ 2 root root 4096 8月   5 09:48 f1b10cd842498c23d206ee0cbeaa9de8d2ae09ff3c7af2723a9e337a6965d639
drwx------ 2 root root 4096 8月   5 10:39 _tmp

[root@localhost docker]# cd devicemapper/devicemapper/
[root@localhost devicemapper]# ll
总用量 976024
-rw------- 1 root root 107374182400 8月   5 09:38 data
-rw------- 1 root root   2147483648 8月   5 09:38 metadata

References

–END

Several Ways to Develop and Test MapReduce on Windows

Note: the Maven packaging and the launcher shell script at the end of this post are particularly worth keeping…

Hadoop provides two major components: HDFS and MapReduce. HDFS offers a rich API and, most importantly, shell-like commands for working with it. Writing MapReduce programs that are easy to debug and test, however, is fairly tedious and fiddly.

You usually start with method-level unit tests for the individual pieces, then test a complete map/reduce pass, then the whole MR job; all of that is done in the IDE, and finally you verify everything in the test environment.

  • Individual methods need no special treatment: the JUnit support built into Eclipse is enough.
  • MRUnit covers map/reduce tests, and even the whole MR flow, but its input is meant for small amounts of data.
  • Run the program in local mode to simulate the real environment, with the data read directly from HDFS.
  • Remote debugging in the test environment: even after the previous steps you may still hit problems, and remote debug helps pin them down.

Testing map/reduce with MRUnit

First download it from the official site and add the jar to your project's dependencies — or, if you don't want to download it by hand, just use Maven.

<dependency>
  <groupId>org.apache.mrunit</groupId>
  <artifactId>mrunit</artifactId>
  <version>1.1.0</version>
  <classifier>hadoop2</classifier>
  <scope>test</scope>
</dependency>

It lets you run simple tests over the various MapReduce cases (map / reduce / map-reduce / map-combine-reduce) to check the logic. The examples in the official documentation are already quite detailed.

First create and initialize a driver (MapDriver/ReduceDriver/MapReduceDriver), then add the configuration, then supply the input records with withInput and the expected output with withOutput. Calling runTest simulates the whole MR mechanism over those individual records. Because everything runs in a single JVM, debugging is very convenient.

private MapReduceDriver<LongWritable, Text, KeyWrapper, ValueWrapper, Text, Text> mrDriver;

@Before
public void setUp() {
  AccessLogMapper mapper = new AccessLogMapper();
  AccessLogReducer reducer = new AccessLogReducer();
  // AccessLogCombiner combiner = new AccessLogCombiner();

  mrDriver = MapReduceDriver.newMapReduceDriver(mapper, reducer);

  // mDriver = MapDriver.newMapDriver(mapper);
  // mcrDriver = MapReduceDriver.newMapReduceDriver(mapper, reducer, combiner);
}

private String[] datas;

@After
public void run() throws IOException {
  if (datas != null) {
      // configuration
      ...
      mrDriver.setConfiguration(config);
      // mrDriver.getConfiguration().addResource("job_1399189058775_0627_conf.xml");

    // input and output
      Text input = new Text();
      int i = 0;
      for (String data : datas) {
          input.set(data);
          mrDriver.withInput(new LongWritable(++i), new Text(data));
      }
      mrDriver.withOutputFormat(MultipleFileOutputFormat.class, TextInputFormat.class);
      mrDriver.runTest();
  }
}

// / datas

private String[] datas() {
  return ...;
}

@Test
public void testOne() throws IOException {
  datas = new String[] { datas()[0] };
}

Local testing with the local framework

MapReduce ships with two job frameworks: local and yarn. The YARN environment requires shipping the program to the NodeManagers to run it, which is too cumbersome for development and testing.

With local mode you skip packaging and keep the convenience of IDE debugging, while the data still comes straight from HDFS. In other words, apart from the job framework everything is the same: the program's input and output, and the business logic of the task code. This makes it an extremely useful way to develop, debug, and test.

You only need to set the framework to local and supply the input and output paths. If your local user differs from the HDFS user, set the HADOOP_USER_NAME environment variable. Map and reduce are simulated with threads, all in the same JVM, so breakpoint debugging is easy too.

public class WordCountTest {
  
  static {
      System.setProperty("HADOOP_USER_NAME", "hadoop");
  }
  
  private static final String HDFS_SERVER = "hdfs://umcc97-44:9000";

  @Test
  public void test() throws Exception {
      WordCount.main(new String[]{
              "-Dmapreduce.framework.name=local", 
              "-Dfs.defaultFS=" + HDFS_SERVER, 
              HDFS_SERVER + "/user/hadoop/dta/001.tar.gz", 
              HDFS_SERVER + "/user/hadoop/output/"});
  }

}

Packaging and testing in the test environment

In the test environment the AppMaster, map, and reduce all run in separate JVMs. You also have to package the program, which is verbose and annoying; with many dependencies the package gets large, and shipping such a big file for every job is wasteful.

Two packaging options are provided: a directly runnable jar, and a tar.gz bundle of all the jars. Combined with the distributed cache this reduces the amount of data that has to be shipped to the NodeManagers on each run, and together with the MapReduce runtime options you can attach a remote debugger.

# debug the AppMaster
-Dyarn.app.mapreduce.am.command-opts="-Xdebug -Xrunjdwp:transport=dt_socket,server=y,suspend=y,address=18090" 
# debug the map tasks (takes the same kind of -Xdebug/-Xrunjdwp value)
-Dmapreduce.map.java.opts
# debug the reduce tasks
-Dmapreduce.reduce.java.opts

Summary

The three approaches above handle most of the problems encountered at work. For most functionality MRUnit is enough, and being able to test just the map or just the reduce is quite handy.

Appendix: Maven packaging

<profile>
  <id>jar</id>
  <build>
      <plugins>
          <plugin>
              <groupId>org.apache.maven.plugins</groupId>
              <artifactId>maven-assembly-plugin</artifactId>
              <executions>
                  <execution>
                      <id>make-assembly</id>
                      <phase>package</phase>
                      <goals>
                          <goal>single</goal>
                      </goals>
                  </execution>
              </executions>
              <configuration>
                  <descriptorRefs>
                      <descriptorRef>
                          jar-with-dependencies
                      </descriptorRef>
                  </descriptorRefs>
              </configuration>
          </plugin>

      </plugins>
  </build>
</profile>

<profile>
  <id>tar</id>
  <build>
      <plugins>
          <plugin>
              <groupId>org.apache.maven.plugins</groupId>
              <artifactId>maven-assembly-plugin</artifactId>
              <executions>
                  <execution>
                      <id>make-assembly</id>
                      <phase>package</phase>
                      <goals>
                          <goal>single</goal>
                      </goals>
                  </execution>
              </executions>
              <configuration>
                  <appendAssemblyId>true</appendAssemblyId>
                  <descriptors>
                      <descriptor>${basedir}/../assemblies/application.xml</descriptor>
                  </descriptors>
              </configuration>
          </plugin>
      </plugins>
  </build>
</profile>

The assembly descriptor for the tar.gz package:

<assembly>
  <id>dist-${env}</id>
  <formats>
      <format>tar.gz</format>
  </formats>
  <includeBaseDirectory>true</includeBaseDirectory>
  <fileSets>
      <fileSet>
          <directory>${basedir}/src/main/scripts</directory>
          <outputDirectory>/bin</outputDirectory>
          <includes>
              <include>*.sh</include>
          </includes>
          <fileMode>0755</fileMode>
          <lineEnding>unix</lineEnding>
      </fileSet>
      <fileSet>
          <directory>${basedir}/target/classes</directory>
          <outputDirectory>/conf</outputDirectory>
          <includes>
              <include>*.xml</include>
              <include>*.properties</include>
          </includes>
      </fileSet>
      <fileSet>
          <directory>${basedir}/target</directory>
          <outputDirectory>/lib/core</outputDirectory>
          <includes>
              <include>${project.artifactId}-${project.version}.jar
              </include>
          </includes>
      </fileSet>
  </fileSets>
  <dependencySets>
      <dependencySet>
          <useProjectArtifact>false</useProjectArtifact>
          <outputDirectory>/lib/common</outputDirectory>
          <scope>runtime</scope>
      </dependencySet>
  </dependencySets>
</assembly>

The shell script that launches the whole program:

#!/bin/sh

bin=`which $0`
bin=`dirname ${bin}`
bin=`cd "$bin"; pwd`

export ANAYSER_HOME=`dirname "$bin"`

export ANAYSER_LOG_DIR=$ANAYSER_HOME/logs

export ANAYSER_OPTS="-Dproc_dta_analyser -server -Xms1024M -Xmx2048M -Danalyser.log.dir=${ANAYSER_LOG_DIR}"

export HADOOP_HOME=${HADOOP_HOME:-/home/hadoop/hadoop-2.2.0}
export ANAYSER_CLASSPATH=$ANAYSER_HOME/conf
export ANAYSER_CLASSPATH=$ANAYSER_CLASSPATH:$HADOOP_HOME/etc/hadoop

for f in $ANAYSER_HOME/lib/core/*.jar ; do
  export ANAYSER_CLASSPATH+=:$f
done

for f in $ANAYSER_HOME/lib/common/*.jar ; do
  export ANAYSER_CLASSPATH+=:$f
done

if [ ! -d $ANAYSER_LOG_DIR ] ; then
  mkdir -p $ANAYSER_LOG_DIR
fi

[ -w "$ANAYSER_PID_DIR" ] ||  mkdir -p "$ANAYSER_PID_DIR"

nohup ${JAVA_HOME}/bin/java $ANAYSER_OPTS -cp $ANAYSER_CLASSPATH com.analyser.AnalyserStarter >$ANAYSER_LOG_DIR/stdout 2>$ANAYSER_LOG_DIR/stderr &

–END

Scala Wordcount on Hadoop2

Having first come across Scala, and then met it again through Spark, I decided to learn the language properly. Functional programming is the way things are heading: concise syntax and more abstract, easier-to-use collection operations. Born and raised on the JVM, and with full Java interoperability, its prospects look bright, with plenty of room to shine in cloud computing and mobile (Android) development.

Most of my work time is spent writing MapReduce, and during a gap between projects I tried moving Scala onto Hadoop. Overall, writing a hello-world in Scala is fairly simple; only some details are fiddly. Even after several years with Eclipse, the Scala IDE still takes some getting used to, and the IntelliJ Scala plugin is not as good as people say — nowhere near WebStorm.

Main reasons for using Scala (a small illustration follows this list):

  • Writing a JavaBean is much simpler and more convenient
  • Multiple return values without having to define a Result class
  • The more abstract collection methods are genuinely pleasant to use
  • Traits make it easy to compose behaviour: operations can be factored out and combined to build new functionality. That is basically the decorator pattern — and decorators in Java are such a hassle; adding the smallest thing is painful!
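A tiny illustration of the first three points (my own snippet, not from the original post; the class and field names are made up): a one-line "JavaBean", a multi-value return as a tuple, and a couple of collection operations.

case class Access(ip: String, url: String, bytes: Long)      // getters, equals, hashCode, toString for free

def minMax(xs: Seq[Long]): (Long, Long) = (xs.min, xs.max)   // several return values, no Result class needed

val logs = Seq(Access("10.0.0.1", "/a", 120), Access("10.0.0.2", "/b", 80))
val (lo, hi) = minMax(logs.map(_.bytes))                     // lo = 80, hi = 120
val byUrl = logs.groupBy(_.url)                              // the "more abstract" collection methods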

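The original post embeds the Scala WordCount source at this point, which is not reproduced in this copy. As a stand-in, here is a minimal sketch of what such a job might look like on the Hadoop 2 MapReduce API; the class names are placeholders and this is not the author's actual code.

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.Path
import org.apache.hadoop.io.{IntWritable, LongWritable, Text}
import org.apache.hadoop.mapreduce.{Job, Mapper, Reducer}
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat
import scala.collection.JavaConverters._

// emit (word, 1) for every token in the line
class TokenMapper extends Mapper[LongWritable, Text, Text, IntWritable] {
  private val one = new IntWritable(1)
  private val word = new Text()
  override def map(key: LongWritable, value: Text,
                   context: Mapper[LongWritable, Text, Text, IntWritable]#Context): Unit =
    value.toString.split("\\s+").filter(_.nonEmpty).foreach { w =>
      word.set(w)
      context.write(word, one)
    }
}

// sum the counts for each word
class SumReducer extends Reducer[Text, IntWritable, Text, IntWritable] {
  override def reduce(key: Text, values: java.lang.Iterable[IntWritable],
                      context: Reducer[Text, IntWritable, Text, IntWritable]#Context): Unit =
    context.write(key, new IntWritable(values.asScala.map(_.get).sum))
}

object HelloScalaMapRed {
  def main(args: Array[String]): Unit = {
    val job = Job.getInstance(new Configuration(), "scala wordcount")
    job.setJarByClass(classOf[TokenMapper])
    job.setMapperClass(classOf[TokenMapper])
    job.setReducerClass(classOf[SumReducer])
    job.setOutputKeyClass(classOf[Text])
    job.setOutputValueClass(classOf[IntWritable])
    FileInputFormat.addInputPath(job, new Path(args(0)))
    FileOutputFormat.setOutputPath(job, new Path(args(1)))
    System.exit(if (job.waitForCompletion(true)) 0 else 1)
  }
}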
The Scala code above is quite similar to the Java version; the main differences are in the collection operations, and variable definitions are simpler.

With the code written, the next step is to run and debug it.

As mentioned in earlier posts, mapreduce.framework.name defaults to local, so you can run it directly just like an ordinary local Java program; no more on that here. This post is mainly about how to package the code and run it on a real cluster, and which extra steps are needed compared with the Java version.

Looking at the project's Maven POM, the only new dependency is scala-lang; everything else is the common jars that ship with Hadoop.

So to run the program you only need to add the scala-library jar to the runtime classpath. The project layout after packaging with Maven looks like this:

[hadoop@master1 scalamapred-1.0.5]$ cd lib/
[hadoop@master1 lib]$ ls -l
total 8
drwxrwxr-x. 2 hadoop hadoop 4096 Sep 11 23:10 common
drwxrwxr-x. 2 hadoop hadoop 4096 Sep 11 23:56 core
[hadoop@master1 lib]$ ll core/
total 12
-rw-r--r--. 1 hadoop hadoop 11903 Sep 11 23:55 scalamapred-1.0.5.jar
[hadoop@master1 lib]$ ls common/
activation-1.1.jar                commons-lang-2.6.jar            hadoop-hdfs-2.2.0.jar                     jaxb-api-2.2.2.jar                      log4j-1.2.17.jar
aopalliance-1.0.jar               commons-logging-1.1.1.jar       hadoop-mapreduce-client-common-2.2.0.jar  jaxb-impl-2.2.3-1.jar                   management-api-3.0.0-b012.jar
asm-3.1.jar                       commons-math-2.1.jar            hadoop-mapreduce-client-core-2.2.0.jar    jersey-client-1.9.jar                   netty-3.6.2.Final.jar
avro-1.7.4.jar                    commons-net-3.1.jar             hadoop-yarn-api-2.2.0.jar                 jersey-core-1.9.jar                     paranamer-2.3.jar
commons-beanutils-1.7.0.jar       gmbal-api-only-3.0.0-b023.jar   hadoop-yarn-client-2.2.0.jar              jersey-grizzly2-1.9.jar                 protobuf-java-2.5.0.jar
commons-beanutils-core-1.8.0.jar  grizzly-framework-2.1.2.jar     hadoop-yarn-common-2.2.0.jar              jersey-guice-1.9.jar                    scala-library-2.10.4.jar
commons-cli-1.2.jar               grizzly-http-2.1.2.jar          hadoop-yarn-server-common-2.2.0.jar       jersey-json-1.9.jar                     servlet-api-2.5.jar
commons-codec-1.4.jar             grizzly-http-server-2.1.2.jar   jackson-core-asl-1.8.8.jar                jersey-server-1.9.jar                   slf4j-api-1.7.1.jar
commons-collections-3.2.1.jar     grizzly-http-servlet-2.1.2.jar  jackson-jaxrs-1.8.3.jar                   jersey-test-framework-core-1.9.jar      slf4j-log4j12-1.7.1.jar
commons-compress-1.4.1.jar        grizzly-rcm-2.1.2.jar           jackson-mapper-asl-1.8.8.jar              jersey-test-framework-grizzly2-1.9.jar  snappy-java-1.0.4.1.jar
commons-configuration-1.6.jar     guava-17.0.jar                  jackson-xc-1.8.3.jar                      jets3t-0.6.1.jar                        stax-api-1.0.1.jar
commons-daemon-1.0.13.jar         guice-3.0.jar                   jasper-compiler-5.5.23.jar                jettison-1.1.jar                        xmlenc-0.52.jar
commons-digester-1.8.jar          guice-servlet-3.0.jar           jasper-runtime-5.5.23.jar                 jetty-6.1.26.jar                        xz-1.0.jar
commons-el-1.0.jar                hadoop-annotations-2.2.0.jar    javax.inject-1.jar                        jetty-util-6.1.26.jar                   zookeeper-3.4.5.jar
commons-httpclient-3.1.jar        hadoop-auth-2.2.0.jar           javax.servlet-3.1.jar                     jsch-0.1.42.jar
commons-io-2.1.jar                hadoop-common-2.2.0.jar         javax.servlet-api-3.0.1.jar               jsp-api-2.1.jar
[hadoop@master1 lib]$ 

The complete pom.xml:

<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
  <modelVersion>4.0.0</modelVersion>

  <groupId>com.winse</groupId>
  <version>1.0</version>

  <artifactId>scalamapred</artifactId>

  <build>
      <plugins>
          <plugin>
              <groupId>org.scala-tools</groupId>
              <artifactId>maven-scala-plugin</artifactId>
              <version>2.15.2</version>
              <executions>
                  <execution>
                      <id>scala-compile-first</id>
                      <phase>process-resources</phase>
                      <goals>
                          <goal>add-source</goal>
                          <goal>compile</goal>
                      </goals>
                  </execution>
                  <execution>
                      <id>scala-test-compile</id>
                      <phase>process-test-resources</phase>
                      <goals>
                          <goal>testCompile</goal>
                      </goals>
                  </execution>
              </executions>
              <configuration>
                  <scalaVersion>${scala.version}</scalaVersion>
              </configuration>
          </plugin>

          <plugin>
              <groupId>org.codehaus.mojo</groupId>
              <artifactId>build-helper-maven-plugin</artifactId>
              <version>1.8</version>
              <executions>
                  <execution>
                      <id>add-scala-sources</id>
                      <phase>generate-sources</phase>
                      <goals>
                          <goal>add-source</goal>
                      </goals>
                      <configuration>
                          <sources>
                              <source>${basedir}/src/main/scala</source>
                          </sources>
                      </configuration>
                  </execution>
                  <execution>
                      <id>add-scala-test-sources</id>
                      <phase>generate-test-sources</phase>
                      <goals>
                          <goal>add-test-source</goal>
                      </goals>
                      <configuration>
                          <sources>
                              <source>${basedir}/src/test/scala</source>
                          </sources>
                      </configuration>
                  </execution>
              </executions>
          </plugin>
      </plugins>
  </build>

  <dependencies>
      <dependency>
          <groupId>org.apache.hadoop</groupId>
          <artifactId>hadoop-mapreduce-client-common</artifactId>
          <version>${hadoop.version}</version>
      </dependency>
      <dependency>
          <groupId>org.apache.hadoop</groupId>
          <artifactId>hadoop-hdfs</artifactId>
          <version>${hadoop.version}</version>
      </dependency>
      <dependency>
          <groupId>org.apache.hadoop</groupId>
          <artifactId>hadoop-mapreduce-client-core</artifactId>
          <version>${hadoop.version}</version>
      </dependency>
      <dependency>
          <groupId>org.apache.hadoop</groupId>
          <artifactId>hadoop-common</artifactId>
          <version>${hadoop.version}</version>
      </dependency>
      <dependency>
          <groupId>org.scala-lang</groupId>
          <artifactId>scala-library</artifactId>
          <version>${scala.version}</version>
      </dependency>
  </dependencies>
  <properties>
      <scala.version>2.10.4</scala.version>
      <hadoop.version>2.2.0</hadoop.version>
  </properties>

  <profiles>
      <profile>
          <id>tar</id>
          <build>
              <plugins>
                  <plugin>
                      <groupId>org.apache.maven.plugins</groupId>
                      <artifactId>maven-assembly-plugin</artifactId>
                      <executions>
                          <execution>
                              <id>make-assembly</id>
                              <phase>package</phase>
                              <goals>
                                  <goal>single</goal>
                              </goals>
                          </execution>
                      </executions>
                  </plugin>

              </plugins>
          </build>
      </profile>
  </profiles>

  <repositories>
      <repository>
          <id>scala-tools.org</id>
          <name>Scala-tools Maven2 Repository</name>
          <url>http://scala-tools.org/repo-releases</url>
      </repository>
  </repositories>
  <pluginRepositories>
      <pluginRepository>
          <id>scala-tools.org</id>
          <name>Scala-tools Maven2 Repository</name>
          <url>http://scala-tools.org/repo-releases</url>
      </pluginRepository>
  </pluginRepositories>

</project>

The lib folder contains two jar directories, common and core: common holds the project's dependencies, and core holds the jar built from the project's own sources.

Next, run the program, adding the scala-library jar to the MapReduce runtime classpath via -libjars. You could also add scala-library to mapreduce.application.classpath (default: $HADOOP_MAPRED_HOME/share/hadoop/mapreduce/*:$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/lib/*).

[hadoop@master1 scalamapred-1.0.5]$ for j in `find . -name "*.jar"` ; do export HADOOP_CLASSPATH=$HADOOP_CLASSPATH:$j ; done
or
[hadoop@master1 scalamapred-1.0.5]$ export HADOOP_CLASSPATH=
[hadoop@master1 scalamapred-1.0.5]$ export HADOOP_CLASSPATH=/home/hadoop/scalamapred-1.0.5/lib/core/*:/home/hadoop/scalamapred-1.0.5/lib/common/*

[hadoop@master1 scalamapred-1.0.5]$ hadoop com.github.winse.hadoop.HelloScalaMapRed -libjars lib/common/scala-library-2.10.4.jar 

Troubleshooting

Without -libjars, an exception is thrown in code running on the NodeManager! I had assumed that leaving out the dependency would only break the code inside my own map/reduce. So where does the problem really come from?

Add the remote-debugging options to the job and track the problem down step by step (if one run is not enough, debug it a few more times).

[hadoop@master1 scalamapred-1.0.5]$ hadoop com.github.winse.hadoop.HelloScalaMapRed  -Dyarn.app.mapreduce.am.command-opts="-Xdebug -Xrunjdwp:transport=dt_socket,server=y,suspend=y,address=18090"

// There is only one slave here; go to that machine and look at the running processes

[hadoop@slaver1 nmPrivate]$ ps axu|grep java
hadoop    1427  0.6 10.5 1562760 106344 ?      Sl   Sep11   0:45 /opt/jdk1.7.0_60//bin/java -Dproc_datanode -Xmx1000m -Djava.net.preferIPv4Stack=true -Dhadoop.log.dir=/home/hadoop/hadoop-2.2.0/logs -Dhadoop.log.file=hadoop.log -Dhadoop.home.dir=/home/hadoop/hadoop-2.2.0 -Dhadoop.id.str=hadoop -Dhadoop.root.logger=INFO,console -Djava.library.path=/home/hadoop/hadoop-2.2.0/lib/native -Dhadoop.policy.file=hadoop-policy.xml -Djava.net.preferIPv4Stack=true -Djava.net.preferIPv4Stack=true -Djava.net.preferIPv4Stack=true -Dhadoop.log.dir=/home/hadoop/hadoop-2.2.0/logs -Dhadoop.log.file=hadoop-hadoop-datanode-slaver1.log -Dhadoop.home.dir=/home/hadoop/hadoop-2.2.0 -Dhadoop.id.str=hadoop -Dhadoop.root.logger=INFO,RFA -Djava.library.path=/home/hadoop/hadoop-2.2.0/lib/native -Dhadoop.policy.file=hadoop-policy.xml -Djava.net.preferIPv4Stack=true -server -Dhadoop.security.logger=ERROR,RFAS -Dhadoop.security.logger=ERROR,RFAS -Dhadoop.security.logger=ERROR,RFAS -Dhadoop.security.logger=INFO,RFAS org.apache.hadoop.hdfs.server.datanode.DataNode
hadoop    2874  2.5 11.7 1599312 118980 ?      Sl   00:08   0:57 /opt/jdk1.7.0_60//bin/java -Dproc_nodemanager -Xmx1000m -Dhadoop.log.dir=/home/hadoop/hadoop-2.2.0/logs -Dyarn.log.dir=/home/hadoop/hadoop-2.2.0/logs -Dhadoop.log.file=yarn-hadoop-nodemanager-slaver1.log -Dyarn.log.file=yarn-hadoop-nodemanager-slaver1.log -Dyarn.home.dir= -Dyarn.id.str=hadoop -Dhadoop.root.logger=INFO,RFA -Dyarn.root.logger=INFO,RFA -Djava.library.path=/home/hadoop/hadoop-2.2.0/lib/native -Dyarn.policy.file=hadoop-policy.xml -server -Dhadoop.log.dir=/home/hadoop/hadoop-2.2.0/logs -Dyarn.log.dir=/home/hadoop/hadoop-2.2.0/logs -Dhadoop.log.file=yarn-hadoop-nodemanager-slaver1.log -Dyarn.log.file=yarn-hadoop-nodemanager-slaver1.log -Dyarn.home.dir=/home/hadoop/hadoop-2.2.0 -Dhadoop.home.dir=/home/hadoop/hadoop-2.2.0 -Dhadoop.root.logger=INFO,RFA -Dyarn.root.logger=INFO,RFA -Djava.library.path=/home/hadoop/hadoop-2.2.0/lib/native -classpath /home/hadoop/hadoop-2.2.0/etc/hadoop:/home/hadoop/hadoop-2.2.0/etc/hadoop:/home/hadoop/hadoop-2.2.0/etc/hadoop:/home/hadoop/hadoop-2.2.0/share/hadoop/common/lib/*:/home/hadoop/hadoop-2.2.0/share/hadoop/common/*:/home/hadoop/hadoop-2.2.0/share/hadoop/hdfs:/home/hadoop/hadoop-2.2.0/share/hadoop/hdfs/lib/*:/home/hadoop/hadoop-2.2.0/share/hadoop/hdfs/*:/home/hadoop/hadoop-2.2.0/share/hadoop/yarn/lib/*:/home/hadoop/hadoop-2.2.0/share/hadoop/yarn/*:/home/hadoop/hadoop-2.2.0/share/hadoop/mapreduce/lib/*:/home/hadoop/hadoop-2.2.0/share/hadoop/mapreduce/*:/contrib/capacity-scheduler/*.jar:/contrib/capacity-scheduler/*.jar:/home/hadoop/hadoop-2.2.0/share/hadoop/yarn/*:/home/hadoop/hadoop-2.2.0/share/hadoop/yarn/lib/*:/home/hadoop/hadoop-2.2.0/etc/hadoop/nm-config/log4j.properties org.apache.hadoop.yarn.server.nodemanager.NodeManager
hadoop    3750  0.0  0.1 106104  1200 ?        Ss   00:43   0:00 /bin/bash -c /opt/jdk1.7.0_60//bin/java -Dlog4j.configuration=container-log4j.properties -Dyarn.app.container.log.dir=/home/hadoop/hadoop-2.2.0/logs/userlogs/application_1410453720744_0007/container_1410453720744_0007_01_000001 -Dyarn.app.container.log.filesize=0 -Dhadoop.root.logger=INFO,CLA  -Xdebug -Xrunjdwp:transport=dt_socket,server=y,suspend=y,address=18090 org.apache.hadoop.mapreduce.v2.app.MRAppMaster 1>/home/hadoop/hadoop-2.2.0/logs/userlogs/application_1410453720744_0007/container_1410453720744_0007_01_000001/stdout 2>/home/hadoop/hadoop-2.2.0/logs/userlogs/application_1410453720744_0007/container_1410453720744_0007_01_000001/stderr 
hadoop    3759  0.1  1.8 737648 18232 ?        Sl   00:43   0:00 /opt/jdk1.7.0_60//bin/java -Dlog4j.configuration=container-log4j.properties -Dyarn.app.container.log.dir=/home/hadoop/hadoop-2.2.0/logs/userlogs/application_1410453720744_0007/container_1410453720744_0007_01_000001 -Dyarn.app.container.log.filesize=0 -Dhadoop.root.logger=INFO,CLA -Xdebug -Xrunjdwp:transport=dt_socket,server=y,suspend=y,address=18090 org.apache.hadoop.mapreduce.v2.app.MRAppMaster
hadoop    3778  0.0  0.0 103256   832 pts/0    S+   00:45   0:00 grep java

// Look at the launch_container.sh script in the corresponding directory
// appmaster launcher

[hadoop@slaver1 nm-local-dir]$ cd nmPrivate/application_1410453720744_0007/
[hadoop@slaver1 application_1410453720744_0007]$ ll
total 4
drwxrwxr-x. 2 hadoop hadoop 4096 Sep 12 00:43 container_1410453720744_0007_01_000001
[hadoop@slaver1 application_1410453720744_0007]$ less container_1410453720744_0007_01_000001/
container_1410453720744_0007_01_000001.tokens       launch_container.sh                                 
.container_1410453720744_0007_01_000001.tokens.crc  .launch_container.sh.crc                            
[hadoop@slaver1 application_1410453720744_0007]$ less container_1410453720744_0007_01_000001/launch_container.sh 
#!/bin/bash

export NM_HTTP_PORT="8042"
export LOCAL_DIRS="/home/hadoop/data/nm-local-dir/usercache/hadoop/appcache/application_1410453720744_0007"
export HADOOP_COMMON_HOME="/home/hadoop/hadoop-2.2.0"
export JAVA_HOME="/opt/jdk1.7.0_60/"
export NM_AUX_SERVICE_mapreduce_shuffle="AAA0+gAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA=
"
export HADOOP_YARN_HOME="/home/hadoop/hadoop-2.2.0"
export CLASSPATH="$PWD:$HADOOP_CONF_DIR:$HADOOP_COMMON_HOME/share/hadoop/common/*:$HADOOP_COMMON_HOME/share/hadoop/common/lib/*:$HADOOP_HDFS_HOME/share/hadoop/hdfs/*:$HADOOP_HDFS_HOME/share/hadoop/hdfs/lib/*:$HADOOP_YARN_HOME/share/hadoop/yarn/*:$HADOOP_YARN_HOME/share/hadoop/yarn/lib/*:$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/*:$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/lib/*:job.jar/job.jar:job.jar/classes/:job.jar/lib/*:$PWD/*"
export HADOOP_TOKEN_FILE_LOCATION="/home/hadoop/data/nm-local-dir/usercache/hadoop/appcache/application_1410453720744_0007/container_1410453720744_0007_01_000001/container_tokens"
export NM_HOST="slaver1"
export APPLICATION_WEB_PROXY_BASE="/proxy/application_1410453720744_0007"
export JVM_PID="$$"
export USER="hadoop"
export HADOOP_HDFS_HOME="/home/hadoop/hadoop-2.2.0"
export PWD="/home/hadoop/data/nm-local-dir/usercache/hadoop/appcache/application_1410453720744_0007/container_1410453720744_0007_01_000001"
export CONTAINER_ID="container_1410453720744_0007_01_000001"
export HOME="/home/"
export NM_PORT="40888"
export LOGNAME="hadoop"
export APP_SUBMIT_TIME_ENV="1410455811401"
export MAX_APP_ATTEMPTS="2"
export HADOOP_CONF_DIR="/home/hadoop/hadoop-2.2.0/etc/hadoop"
export MALLOC_ARENA_MAX="4"
export LOG_DIRS="/home/hadoop/hadoop-2.2.0/logs/userlogs/application_1410453720744_0007/container_1410453720744_0007_01_000001"
ln -sf "/home/hadoop/data/nm-local-dir/usercache/hadoop/appcache/application_1410453720744_0007/filecache/10/job.jar" "job.jar"
ln -sf "/home/hadoop/data/nm-local-dir/usercache/hadoop/appcache/application_1410453720744_0007/filecache/13/job.xml" "job.xml"
mkdir -p jobSubmitDir
ln -sf "/home/hadoop/data/nm-local-dir/usercache/hadoop/appcache/application_1410453720744_0007/filecache/11/job.splitmetainfo" "jobSubmitDir/job.splitmetainfo"
mkdir -p jobSubmitDir
ln -sf "/home/hadoop/data/nm-local-dir/usercache/hadoop/appcache/application_1410453720744_0007/filecache/12/job.split" "jobSubmitDir/job.split"
exec /bin/bash -c "$JAVA_HOME/bin/java -Dlog4j.configuration=container-log4j.properties -Dyarn.app.container.log.dir=/home/hadoop/hadoop-2.2.0/logs/userlogs/application_1410453720744_0007/container_1410453720744_0007_01_000001 -Dyarn.app.container.log.filesize=0 -Dhadoop.root.logger=INFO,CLA  -Xdebug -Xrunjdwp:transport=dt_socket,server=y,suspend=y,address=18090 org.apache.hadoop.mapreduce.v2.app.MRAppMaster 1>/home/hadoop/hadoop-2.2.0/logs/userlogs/application_1410453720744_0007/container_1410453720744_0007_01_000001/stdout 2>/home/hadoop/hadoop-2.2.0/logs/userlogs/application_1410453720744_0007/container_1410453720744_0007_01_000001/stderr "

// Go to the corresponding tmp directory and inspect the MRAppMaster working directory

[hadoop@slaver1 ~]$ cd /home/hadoop/data/nm-local-dir/usercache/hadoop/appcache/application_1410453720744_0007/container_1410453720744_0007_01_000001
[hadoop@slaver1 container_1410453720744_0007_01_000001]$ ll
total 28
-rw-r--r--. 1 hadoop hadoop   95 Sep 12 00:43 container_tokens
-rwx------. 1 hadoop hadoop  468 Sep 12 00:43 default_container_executor.sh
lrwxrwxrwx. 1 hadoop hadoop  108 Sep 12 00:43 job.jar -> /home/hadoop/data/nm-local-dir/usercache/hadoop/appcache/application_1410453720744_0007/filecache/10/job.jar
drwxrwxr-x. 2 hadoop hadoop 4096 Sep 12 00:43 jobSubmitDir
lrwxrwxrwx. 1 hadoop hadoop  108 Sep 12 00:43 job.xml -> /home/hadoop/data/nm-local-dir/usercache/hadoop/appcache/application_1410453720744_0007/filecache/13/job.xml
-rwx------. 1 hadoop hadoop 3005 Sep 12 00:43 launch_container.sh
drwx--x---. 2 hadoop hadoop 4096 Sep 12 00:43 tmp
[hadoop@slaver1 container_1410453720744_0007_01_000001]$ 

For comparison, here is the listing of the container directory when -libjars is added:

[hadoop@master1 scalamapred-1.0.5]$ hadoop com.github.winse.hadoop.HelloScalaMapRed  -Dyarn.app.mapreduce.am.command-opts="-Xdebug -Xrunjdwp:transport=dt_socket,server=y,suspend=y,address=18090" -libjars lib/common/scala-library-2.10.4.jar 

[hadoop@slaver1 container_1410453720744_0007_01_000001]$ cd /home/hadoop/data/nm-local-dir/usercache/hadoop/appcache/application_1410453720744_0008/container_1410453720744_0008_01_000001
[hadoop@slaver1 container_1410453720744_0008_01_000001]$ ll
total 32
-rw-r--r--. 1 hadoop hadoop   95 Sep 12 00:49 container_tokens
-rwx------. 1 hadoop hadoop  468 Sep 12 00:49 default_container_executor.sh
lrwxrwxrwx. 1 hadoop hadoop  108 Sep 12 00:49 job.jar -> /home/hadoop/data/nm-local-dir/usercache/hadoop/appcache/application_1410453720744_0008/filecache/10/job.jar
drwxrwxr-x. 2 hadoop hadoop 4096 Sep 12 00:49 jobSubmitDir
lrwxrwxrwx. 1 hadoop hadoop  108 Sep 12 00:49 job.xml -> /home/hadoop/data/nm-local-dir/usercache/hadoop/appcache/application_1410453720744_0008/filecache/13/job.xml
-rwx------. 1 hadoop hadoop 3127 Sep 12 00:49 launch_container.sh
lrwxrwxrwx. 1 hadoop hadoop   85 Sep 12 00:49 scala-library-2.10.4.jar -> /home/hadoop/data/nm-local-dir/usercache/hadoop/filecache/10/scala-library-2.10.4.jar
drwx--x---. 2 hadoop hadoop 4096 Sep 12 00:49 tmp
[hadoop@slaver1 container_1410453720744_0008_01_000001]$ 

Use Eclipse on the local Windows machine to attach and step through the code.

Meanwhile you can watch the status on the web UI on port 8088: one MRAppMaster is running, and if the first attempt fails a second attempt is started.

After several debugging runs the cause became clear. The AppMaster checks whether the job is a chained MR job, and loading the mapper class at that point also requires loading its parent classes — the Scala classes — so this is where the exception is thrown.

Looking at the job's logs, you can see the NoClassDefFoundError it throws:

[hadoop@slaver1 ~]$ less /home/hadoop/hadoop-2.2.0/logs/userlogs/application_1410448728371_0003/*/syslog
2014-09-11 22:55:12,616 INFO [main] org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Created MRAppMaster for application appattempt_1410448728371_0003_000001
...
2014-09-11 22:55:18,677 INFO [main] org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: Adding job token for job_1410448728371_0003 to jobTokenSecretManager
2014-09-11 22:55:19,119 FATAL [main] org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Error starting MRAppMaster
java.lang.NoClassDefFoundError: scala/Function1
        at java.lang.Class.forName0(Native Method)
        at java.lang.Class.forName(Class.java:190)
        at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl.isChainJob(JobImpl.java:1277)
        at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl.makeUberDecision(JobImpl.java:1217)
        at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl.access$3700(JobImpl.java:135)
        at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl$InitTransition.transition(JobImpl.java:1420)
        at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl$InitTransition.transition(JobImpl.java:1358)
        at org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:385)
        at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
        at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
        at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
        at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl.handle(JobImpl.java:972)
        at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl.handle(JobImpl.java:134)
        at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$JobEventDispatcher.handle(MRAppMaster.java:1227)
        at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.serviceStart(MRAppMaster.java:1035)
        at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
        at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$1.run(MRAppMaster.java:1445)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:415)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1491)
        at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.initAndStartAppMaster(MRAppMaster.java:1441)
        at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.main(MRAppMaster.java:1374)
Caused by: java.lang.ClassNotFoundException: scala.Function1
        at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
        at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
        at java.security.AccessController.doPrivileged(Native Method)
        at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
        at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
        ... 22 more
2014-09-11 22:55:19,130 INFO [Thread-1] org.apache.hadoop.mapreduce.v2.app.MRAppMaster: MRAppMaster received a signal. Signaling RMCommunicator and JobHistoryEventHandler.

Bonus findings

  • The speculative-execution initialization code

  • The code where OutputFormat obtains its Committer

References

–END

[Notes] Beginning Scala (1)

Scala borrows from languages such as Python and Ruby. Coming from Java there is still an adjustment period, and it seems quite a bit harder than picking up Groovy. A year ago I looked at it out of curiosity, went through some of the official tutorials, decided it was just an oddball, and put it down.

Then Hadoop brought me back to it through Spark. After a transition period I found that Scala really does take care of some of Java's tedium, takes a load off your hands, and lets you write more concise and elegant — or simply more readable — code.

These are my notes on Chapter 1 (About Scala and How to Install It) and Chapter 2 (Scala Syntax, Scripts, and Your First Scala Programs).

From the author:

My Path was hard, and I hope yours will be easier.

History and installation

With HotSpot's improvements to the JVM, JDK 1.3 programs ran as fast as programs written in C++. Java programs could run for weeks, months, even a year without a restart.

Good Java code is as fast as C/C++ code, sometimes faster. For the same functionality a deeply tuned C/C++ program will be more efficient, and Java needs more memory than C/C++, but for a moderately complex project (not kernel-level work) a JVM program will do better than C/C++.

For many years, though, Java stayed immature at the language level. Its syntax stood still, web frameworks on Java grew heavier and heavier, and handling XML — or implementing even simple ideas such as generating an HTML form from fields — took more and more code. Disappointment with Java grew. Java 5 added enums and generics, welcome news for the language, but in practice we ended up relying on the IDE to get Java code written.

Martin Odersky, who went on to create Scala, had previously written the Java compiler and the generics feature. Scala (started in 2001, first released in 2003) is as expressive as Ruby while keeping Java's strong typing and high performance.

Scala is both fast and concise, and type-safe at the same time. It also runs efficiently: it compiles down to Java bytecode on the JVM and can call into, and be called from, Java code.

But most importantly, Scala taught me to program and reason about programming differently. I stopped thinking in terms of allocating buffers, structs, and objects, and of changing those pieces of memory. Instead, I learned to think about most of my programs as transforming input to output. This change in thinking has lead to lower defect rates, more modular code, and more testable code. Scala has also given me the tools to write smaller, more modular units of code and asse mble them together into a whole that is maintainable, yet far more complex than anything that I could write in Java or Ruby for that matter.

Install JDK 6+ and set PATH; for Scala 2.10+ just download the zip distribution and unzip it.

winse@Lenovo-PC /cygdrive/d/scala/bin
$ ls -1
fsc
fsc.bat
scala
scala.bat
scalac
scalac.bat
scaladoc
scaladoc.bat
scalap
scalap.bat

winse@Lenovo-PC /cygdrive/d/scala/bin
$ scala
Welcome to Scala version 2.10.4 (Java HotSpot(TM) Client VM, Java 1.7.0_02).
Type in expressions to have them evaluated.
Type :help for more information.

scala> def fact(n:Int)=1 to n reduceLeft(_*_) // n!
fact: (n: Int)Int

scala> fact(5)
res0: Int = 120

Syntax, and your first Scala programs

Three ways to run a program:

  • The interactive command-line REPL (read-eval-print loop)
  • A shell/cmd script
  • Compile and package into a jar, then run it just like Java

REPL

Go into Scala's bin directory and double-click scala.bat to open it.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
scala> 1+1
res0: Int = 2

scala> res0*8
res1: Int = 16

scala> val x="hello world"
x: String = hello world

scala> var xl=x.length
xl: Int = 11

scala> import java.util._
import java.util._

scala> val d = new Date
d: java.util.Date = Mon Sep 08 09:17:08 CST 2014

脚本

脚本中无需显式地定义main方法。当你运行脚本时，Scala会把整个文件的内容包装到一个类的main方法中，编译代码，然后运行生成的main方法。你只需在脚本文件中编写scala代码即可。

1
2
3
4
5
6
7
winse@Lenovo-PC ~
$ scala hello.scala
hello world

winse@Lenovo-PC ~
$ cat hello.scala
println("hello world")

编译后运行

编译方式和javac类似，会生成对应类的字节码class文件。

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
winse@Lenovo-PC ~/scala-hello
$ scalac hello.scala

winse@Lenovo-PC ~/scala-hello
$ ll
total 13
-rwxr-xr-x  1 winse None 2067 Sep  8 09:27 hello$.class
-rwxr-xr-x  1 winse None  704 Sep  8 09:27 hello$delayedInit$body.class
-rwxr-xr-x  1 winse None  921 Sep  8 09:27 hello.class
-rw-r--r--+ 1 winse None   58 Sep  8 09:26 hello.scala

winse@Lenovo-PC ~/scala-hello
$ cat hello.scala
object hello extends App {

  println("hello world")

}

winse@Lenovo-PC ~/scala-hello
$ scala hello
hello world

编译器的启动是很耗时的操作，你可以使用fsc（fast Scala compiler），它是一个常驻后台的编译进程，可以省去每次启动编译器的开销。

如果你原有的项目中使用Ant或Maven,scala有对应的插件,可以很容易把Scala集成到项目中。

First Scala Programs

在Scala中，你可以编写像ruby和python那样的脚本式代码。如输出“hello world”的println方法，封装了System.out.println()。因为太常用了，println被定义在Scala的Predef（预定义成员）中，每个程序都会自动加载，就像java.lang会自动引入到每个java程序一样。

1
2
3
4
5
6
7
8
println("hello world")

for {i<- 1 to 10}
  println(i)

for {i<- 1 to 10
     j<- 1 to 10}
  println(i*j)

99乘法表:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
scala> for(i<- 1 to 9){
     | for(j<- 1 to i)
     | printf("%s*%s=%2s\t",j,i,i*j);
     |
     | println()
     | }
1*1= 1
1*2= 2  2*2= 4
1*3= 3  2*3= 6  3*3= 9
1*4= 4  2*4= 8  3*4=12  4*4=16
1*5= 5  2*5=10  3*5=15  4*5=20  5*5=25
1*6= 6  2*6=12  3*6=18  4*6=24  5*6=30  6*6=36
1*7= 7  2*7=14  3*7=21  4*7=28  5*7=35  6*7=42  7*7=49
1*8= 8  2*8=16  3*8=24  4*8=32  5*8=40  6*8=48  7*8=56  8*8=64
1*9= 9  2*9=18  3*9=27  4*9=36  5*9=45  6*9=54  7*9=63  8*9=72  9*9=81

编写复杂一点的程序时，可以使用Scala-IDE。

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
import scala.io._   // like java import scala.io.*

def toInt(in: String): Option[Int] =
  try {
    Some(Integer.parseInt(in.trim))
  } catch {
    case e: NumberFormatException => None
  }

def sum(in: Seq[String]) = {
  val ints = in.flatMap(s => toInt(s))
  ints.foldLeft(0)((a, b) => a + b)
}

println("Enter some numbers and press CTRL+C")

val input = Source.fromInputStream(System.in)
val lines = input.getLines.toSeq

println("Sum " + sum(lines))

Option是包含一个或零个元素的容器。如果不包含元素，它就是单例的None；如果包含一个元素，它就是一个Some(theElement)实例。Option是Scala中避免空指针异常（null pointer）和显式null检查的一种处理方式：是None时业务逻辑作用在0个元素上，是Some时就作用在那一个元素上。
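补充一个Option常见用法的小示例（示意代码，非书中原文，可在REPL中验证）：

val some: Option[Int] = Some(42)
val none: Option[Int] = None

some.getOrElse(0)   // 42
none.getOrElse(0)   // 0，没有值时返回给定的默认值
some.map(_ * 2)     // Some(84)
none.map(_ * 2)     // None，传入的函数不会被执行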

方法无需显式的return语句，返回值默认就是方法中“最后”（逻辑上最后执行的）一个表达式的值。

sum方法的参数类型Seq是一个trait（类似java interface），是Array、List以及其他顺序集合的父trait。trait拥有java interface的所有特性，同时trait可以包含方法的实现。你可以把多个trait混入（mixin）到一个类中。除了不能定义带参数的构造函数外，trait和类基本一样。trait使得“多重继承”简单化，无需担忧 the diamond problem（有点类似近亲结婚 ^ v ^）。

如：当B、C都实现了M方法，而D同时继承B和C时，D不知道该用谁的M，就会产生歧义！！
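下面是一个示意性的小例子（非书中代码），展示带实现的trait被混入类中，以及两个trait覆写同一个方法时，Scala按线性化（linearization）顺序决定调用哪一个，从而避免歧义：

trait Speaker {
  def hello = "hello"                 // trait中可以带方法实现
}

trait Loud extends Speaker {
  override def hello = super.hello.toUpperCase
}

trait Polite extends Speaker {
  override def hello = super.hello + ", please"
}

// 线性化顺序为 Polite -> Loud -> Speaker，super依次向左查找
class Greeter extends Loud with Polite

(new Greeter).hello   // "HELLO, please"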

在Scala中，定义变量分为val和var：val类似于java的final变量，var则类似于普通的java变量。对于不需要变化的量，定义为val可以减少出错的几率，是一种防御性的编程方式。

接下来运行程序, 输入一些数字后按CTRL+C结束,就会输出计算的和。

1
2
3
4
5
6
7
8
E:\local\home\Administrator\scala-hello>scala Sum.scala
Enter some numbers and press CTRL+C
12
23
34
45
Sum 114
终止批处理操作吗(Y/N)?

基本的语法Basic Syntax

Scala的全部语法和语言的定义可以查看Scala Language Specification

数字、字符串和XML常量

  • ; 行结束符可以忽略

  • 和Java一样的常量定义

    Integer: 1882, -1
    Boolean: true, false
    Double: 1.0, 1d, 1e3
    Long: 42L
    Float: 78.9f
    Characters: '4', '?', 'z'
    Strings: "Hello World"

  • Scala支持多行的字符串

    """Hello Multiline World"""

  • Scala支持XML常量,包括内嵌的Scala代码

    <foll>
      {(1 to 3).map(i => <li>{i}</li>)}
    </foll>

包package和import

package定义在源代码非注释的第一行。和java一样。

import则比java的更加灵活。基本的用法使用:

1
import scala.xml._

scala中的import可以基于前面已有的import语句继续导入。如在上面的基础上再导入scala.xml.transform：

1
import transform._

也可以导入一个具体的class或object：

1
import scala.collection.mutable.HashMap

一次性导入一个package下的几个class或object：

1
import scala.collection.immutable.{TreeMap, TreeSet}

甚至可以给原有的class或object定义一个别名。

1
import scala.util.parsing.json.{JSON => JsonParser}

import可以定义在任何代码块中,并且只会在当前作用域内有效。还可以引入objects的method,相当于java的import static。

1
2
3
4
5
6
7
8
9
10
11
class Frog {
  import scala.xml._
  def n: NodeSeq = NodeSeq.Empty
}

object Moose {
  def bark = "woof"
}

import Moose._
bark

Class, Trait和Object

Scala的对象语法和规则比Java的更加复杂。

Scala去掉了一个文件中只能定义一个public类的限制。你想在一个文件里面放n个类都可以,同时文件的名称也没有限制(Java文件名需要和public的类同名)。

Scala中默认访问级别是public的。

1
2
3
4
5
6
// scala
class Foo

// java
public class Foo {
}

如果构造函数、方法没有参数,可以省略参数列表(即不需要输入括号)。

1
2
3
4
5
6
7
8
9
10
11
12
new Foo

new Foo()

class Bar(name: String)

new Bar("Working...")

class Baz(name: String) {
  // constructor code is inline
  if(name == null) throw new Exception("Name is null")
}

Scala的trait和java中的interface类似。同时trait可以包含具体实现的方法，这是一个非常方便的特性：你不必再定义复杂的类继承关系来实现代码重用，在Scala中把代码写在trait中即可。Scala的traits类似于Ruby的mixins。

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
trait Dog

class Fizz2(name: String) extends Bar(name) with Dog

trait Cat {
  def meow(): String
}

trait FuzzyCat extends Cat {
  override def meow(): String = "Meeeeeeeeeow"
}

trait OtherThing {
  def hello() = 4
}

class Yep extends FuzzyCat with OtherThing

(new Yep).meow()
(new Yep).hello()

Scala中不支持static关键字，可以使用object单例对象来实现类似的功能。object在第一次被访问时才会初始化，在对应的作用域内仅有一个实例。Scala object还有一个优势：由于它本身就是类的实例，所以可以作为方法参数进行传递。

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
object Simple

object OneMethod {
  def myMethod() = "Only One"
}

object Dude extends Yep

object Dude2 extends Yep {
  override def meow() = "Dude looks like a cat"
}

object OtherDude extends Yep {
  def twoMeows(otherparam: Yep) = meow + ", " + otherparam.meow
}

OtherDude.meow // Meeeeeeeeeow
OtherDude.twoMeows(Dude) // Meeeeeeeeeow, Meeeeeeeeeow
OtherDude.twoMeows(Dude2) // Meeeeeeeeeow, Dude looks like a cat

如果object嵌套定义在class、trait、object内部，那么外层类型的每个实例都会拥有自己的一个object单例。

1
2
3
4
5
6
7
class HasYep {
  object myYep extends Yep {
    override def meow = "Moof"
  }
}

(new HasYep).myYep.meow // 每个HasYep实例会有一个单独的myYep

同样，classes、objects、traits也可以嵌套定义在classes、objects、traits内部。

1
2
3
4
class HasClass {
  private class MyDude extends FuzzyCat
  def makeOne(): FuzzyCat = new MyDude
}

类继承Class Hierarchy

除了方法（method），其他一切都是对象（an instance of a class）。Java的primitives类型在Scala中也被当做对象，如int对应Int。当两个Int相加时，Scala编译器会优化生成的字节码，使其和java中两个int相加完全一样。只有当调用Int的hashCode、toString等方法，或者primitive值被用在需要引用类型（expects an Any）的地方时，Scala编译器才会对其进行装箱，例如把Int值放入HashMap。

为了保持命名规范（所有类名的首字母都大写），Scala中与原始类型对应的是Int、Long、Double、Float、Boolean、Char、Short、Byte，它们都是AnyVal的子类。java的void对应Unit，同样是AnyVal的子类。你也可以使用()来显式地得到Unit类型的实例。

1
2
3
val v = ()

List(v) // List[Unit] = List(())

Nothing很酷：任何声明返回Nothing的方法都不会正常返回，它一定是抛出了异常。None是Option[Nothing]的实例，它的get方法返回类型是Nothing，也就是说调用None.get只会抛出异常，而不会返回什么底层的值。
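一个关于Nothing的小示例（示意代码，非书中原文）：

// 自己定义一个返回Nothing的方法：它永远不会正常返回，只会抛异常
def fail(msg: String): Nothing = throw new RuntimeException(msg)

// Nothing是所有类型的子类型，因此fail可以出现在任何需要Int的位置
def half(i: Int): Int = if (i % 2 == 0) i / 2 else fail("not even: " + i)

half(4)   // 2
// half(3) 会抛出RuntimeException，而不是返回null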

Any是Scala中所有类型的基类，地位类似Java中的Object。但由于Scala还要把Nothing、primitives等类型统一进来，无法直接拿Java的Object作为根，所以Scala定义了自己的一套根类层次。

AnyVal是Scala中primitives对应的包装类的基类；AnyRef则与Java中的Object类似。eq、ne、==、!=这些方法的含义和Java不同：==编译后最终调用java的equals方法比较内容，如果需要比较对象引用是否为同一个，应使用eq。
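用String验证一下==与eq的区别（示意代码）：

val a = new String("Moof")
val b = new String("Moof")

a == b   // true：==最终调用equals，比较的是内容
a eq b   // false：eq比较的是引用，a和b是两个不同的对象
a ne b   // true：ne是eq的取反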

方法声明

类型推断很强大也很有用，但是需要小心使用：当方法的返回类型不明确时，需要显式地声明。

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
def myMethod(): String = "Moof"

def myOtherMethod() = "Moof" // no need to explicitly declare the return type

def foo(a: Int): String = a.toString

def f2(a: Int, b: Boolean): String = if(b) b.toString else "false"

def list[T](p: T): List[T] = p :: Nil

list(1)
list("Hello")

// 可变参数, Seq[Int]
def largest(as: Int*): Int = as.reduceLeft((a,b) => a max b)

largest(1)
largest(2, 3, 99)
largest(33, 22, 33, 22)

def mkString[T](as: T*): String = as.foldLeft("")(_ + _.toString)

def sum[T <: Number](as: T*): Double = as.foldLeft(0d)(_ + _.doubleValue)

方法可以定义在任何代码块中（包括其他方法内部），但不能直接定义在classes、traits、objects之外的最外层。嵌套的方法可以使用当前作用域内所有可见的成员。

1
2
3
4
5
6
7
8
9
10
11
def readLines(br: BufferedReader) = {
  var ret: List[String] = Nil

  def readAll(): Unit = br.readLine match { 
      case null =>
      case s => ret ::= s; readAll()
  }

  readAll()
  ret.reverse
}

方法重写和java的不一样,被重写的方法必须带上override的修饰符。重写抽象的方法可以不带override的修饰符。

1
2
3
4
5
6
7
abstract class Base {
  def thing: String
}

class One extends Base {
  def thing = "Moof"
}

不带参数的方法和变量可以使用相同的方式访问,重写父类方法时可以使用val代替def。

1
2
3
4
5
6
7
class Two extends One {
  override val thing = (new java.util.Date).toString
}

class Three extends One {
  override lazy val thing = super.thing + (new java.util.Date).toString
}

变量声明

和声明方法类似，只是关键字换成val、var、lazy val。var在赋值以后还可以再次修改，类似于java中的普通变量；val在执行到定义处时就初始化，之后不可修改；lazy val只在第一次被访问时计算一次。

1
2
3
var y: String = "Moof"
val x: String = "Moof"
lazy val lz: String = someLongQuery()

编程时，除非确实需要可变状态，否则不推荐使用var。Given that mutability leads to unexpected defects, minimizing mutability in code minimizes mutability-related defects.

Scala的类型推断对变量同样有效，在类型明确的情况下，定义变量时可以不写类型。

1
2
var y2 = "Moof"
val x2 = "Moof"

Scala支持一次给多个变量赋值。If a code block or method returns a Tuple, the Tuple can be assigned to a val variable.

1
2
val (i1: Int, s1: String) = Pair(33, "Moof")
val (i2, s2) = Pair(43, "Moof")

运行的效果如下:

1
2
3
4
5
6
scala> val (i2,s2)=Pair(43,"W")
i2: Int = 43
s2: String = W

scala> i2
res0: Int = 43

代码块

方法和变量都可以定义在单行上。

1
def meth9 = "hello world"

或者定义在大括号包围的代码块中。代码块可以嵌套，代码块的返回值是最后一行的运行结果。

1
2
3
4
5
def meth3(): String = {"Moof"}
def meth4(): String = {
  val d = new java.util.Date()
  d.toString()
}

变量定义同样可以使用代码块，适合需要少量计算的赋值操作。

1
2
3
4
val x3: String = {
  val d = new java.util.Date()
  d.toString()
}

Call-by-Name

在java中，所有方法调用都是call-by-reference或者call-by-value（原始类型）。也就是说，参数的值或者引用（AnyRef）会通过调用栈传递给被调用的方法。

Scala提供另一种向方法（函数）传递参数的方式：call-by-name，相当于把一段代码块传给被调用者。Each time the callee accesses the parameter, the code block is executed and the value is calculated.

Call-by-name允许我们把耗时的操作（但可能不会用到的）当做参数传入。For example, in a call to the logger you can use call-by-name, and the expression to print is only calculated if it’s going to be logged. Call-by-name同样允许我们创建自定义的控制结构（如while/doWhile），后面会给出一个小例子。

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
def nano() ={
  println("Getting nano")
  System.nanoTime
}

def delayed(t: => Long) = {
  println("In delayed method")
  println("Param: " + t)
  t
}

scala> delayed(nano())
In delayed method
Getting nano
Param: 198642874346225
Getting nano
res1: Long = 198642875202814

def notDelayed(t: Long) = {
  println("In not delayed method")
  println("Param: " + t)
  t
}

scala> notDelayed(nano)
Getting nano
In not delayed method
Param: 199944029171474
res5: Long = 199944029171474

注意println输出的位置和次数。
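前面提到call-by-name可以用来自定义控制结构，下面是一个用call-by-name实现的简易while（示意代码，非书中原文）：

// cond和body都是by-name参数，每次循环时都会重新求值
def myWhile(cond: => Boolean)(body: => Unit): Unit =
  if (cond) { body; myWhile(cond)(body) }

var i = 0
myWhile(i < 3) {
  println("i = " + i)
  i += 1
}
// 依次输出 i = 0、i = 1、i = 2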

方法调用

1
2
3
4
5
instance.method()
instance.method

instance.method(param)
instance method param

方法没有参数时可以省略括号。当方法只有一个参数时，可以省去点和括号，用中缀的方式书写。

实际运行效果:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
scala> "abc" toUpperCase
warning: there were 1 feature warning(s); re-run with -feature for details
res0: String = ABC

scala> "abc".toUpperCase
res1: String = ABC

scala> "abc".charAt 1
<console>:1: error: ';' expected but integer literal found.
       "abc".charAt 1
                    ^

scala> "abc" charAt 1
res2: Char = b

scala> "abc" concat "efg"
res3: String = abcefg

Scala允许方法名中包含+、-、*、?等符号。Scala’s dotless method notation creates a syntactically neutral way of invoking methods that are hard-coded operators in Java.

1
2
3
4
5
scala> 2.1.*(4.3)
res4: Double = 9.03

scala> 2.1 * 4.3
res5: Double = 9.03

多参数的方法调用和java一样。

1
instance.method(p1, p2)

对于Scala中的泛型方法，编译器可以进行类型推断。当然你也可以显式地指定类型参数。

1
instance.method[TypeParam](p1, p2)

Functions, apply, update, and Compiler Magic

Scala是一门函数式语言，这意味着你可以把函数当做参数传递，也可以把函数作为返回值从函数和方法中返回。

函数是一个带有参数和返回值的代码块。 在JVM中是不容许传递代码块的。Scala中使用特定接口的匿名内部类作为函数内部实现。当传递一个函数时,其实就是传递一个特定接口(trait)的对象。

带一个参数和一个返回值的函数，其对应的trait是：

1
Function1[A, B]

其中A是参数类型,B是返回值类型。

所有的函数接口都有一个apply的方法,用于函数的调用。

1
Function1.apply(p: A): B

Thus, you can define a method that takes a function and invokes the function with the parameter 42:

1
def answer(f: Function1[Int, String]) = f.apply(42)

只要对象包含apply方法，就可以省略apply，直接把参数跟在对象名后面，写成函数调用的形式。

1
def answer(f: Function1[Int, String]) = f(42)

Scala提供的语法糖,在编译时f(42)会编译成f.apply(42)。这样使用可以让代码更简洁漂亮,同时看起来更像函数调用的写法。

更多的语法糖:

1
2
3
4
Function1[Int, String]
Int => String

def answer(f: Int => String) = f(42)

这种语法糖适用于所有包含apply方法的对象。

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
scala> class Ap {
     | def apply(in: Int) = in.toString
     | }
defined class Ap

scala> new Ap()(44)
res0: String = 44

scala> new Ap(44)
<console>:9: error: too many arguments for constructor Ap: ()Ap
              new Ap(44)
              ^

scala> val a = new Ap
a: Ap = Ap@18258b2

scala> a(44)
res2: String = 44

如果类定义了update方法，编译器在解析u(k) = v这类赋值操作时，会把它翻译成对应的update方法调用。

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
scala> class Up {
     | def update(k: Int, v: String) = println("Hey: " + k + " " + v)
     | }
defined class Up

scala> val u = new Up
u: Up = Up@7bfd80

scala> u(33) = "hello"
Hey: 33 hello

scala> class Update {
     | def update(what: String) = println("Singler: " + what)
     | def update(a: Int, b: Int, what: String) = println("2d update")
     | }
defined class Update

scala> val u = new Update
u: Update = Update@4bd4d2

scala> u() = "Foo"
Singler: Foo

scala> u(3,4) = "Howdy"
2d update

Scala中Array和HashMap就是通过update的方式进行设值的。利用这种方式，我们可以编写出用法和Scala内置集合类似的库。

Scala的这些特性可以让我们编写更易理解的代码；同时理解这些语法糖，也能更好地与java类库协作。
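下面是一个同时定义apply和update的小容器（示意代码，非书中原文），用法上和Array/HashMap的取值、设值方式一致：

import scala.collection.mutable

class Registry {
  private val data = mutable.Map[Int, String]()

  def apply(k: Int): String = data.getOrElse(k, "<none>")   // r(k) 取值
  def update(k: Int, v: String): Unit = data(k) = v         // r(k) = v 设值
}

val r = new Registry
r(1) = "one"      // 编译成 r.update(1, "one")
println(r(1))     // 编译成 r.apply(1)，输出 one
println(r(2))     // 输出 <none>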

Case Classes

Scala has a mechanism for creating classes that have the common stuff filled in. Most of the time, when I define a class, I have to write the toString, hashCode, and equals methods. These methods are boilerplate. Scala provides the case class mechanism for filling in these blanks as well as support for pattern matching.

A case class provides the same facilities as a normal class, but the compiler generates toString, hashCode, and equals methods (which you can override).

Case classes can be instantiated without the use of the new statement. By default, all the parameters in the case class’s constructor become properties on the case class. Here’s how to create a case class:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
scala> case class Stuff(name: String, age: Int)
defined class Stuff

scala> val s = Stuff("David", 45)
s: Stuff = Stuff(David,45)

scala> s.toString
res0: String = Stuff(David,45)

scala> s == Stuff("David", 45) // == 相当于java中的equals
res1: Boolean = true

scala> s == Stuff("David", 42)
res2: Boolean = false

scala> s.name
res4: String = David

scala> s.age
res5: Int = 45

手写一个具备case class同等功能的类：

1
2
3
4
5
6
7
8
9
10
11
12
13
class Stuff(val name: String, val age: Int) {
  override def toString = "Stuff(" + name + "," + age + ")"
  override def hashCode = name.hashCode + age
  override def equals(other: Any) = other match {
      case s: Stuff => this.name == s.name && this.age == s.age
      case _ => false
  }
}

object Stuff {
  def apply(name: String, age: Int) = new Stuff(name, age)
  def unapply(s: Stuff) = Some((s.name, s.age))
}

Basic Pattern Matching

模式匹配（Pattern matching）可以用很少的代码表达非常复杂的判断。Scala的pattern matching和Java的switch语句类似, but you can test against almost anything, and you can even assign pieces of the matched value to variables. Like everything in Scala, pattern matching is an expression, so it results in a value that may be assigned or returned. The most basic pattern matching is like Java’s switch, except there is no break in each case as the cases do not fall through to each other.

1
2
3
4
44 match {
  case 44 => true
  case _ => false
}

可以对String进行match操作,类似于C#。

1
2
3
4
5
"David" match {
  case "David" => 45
  case "Elwood" => 77
  case _ => 0
}

可以对case classes进行模式匹配（pattern match）操作。Case classes提供了非常适合pattern-matching的语法。下面的例子，用于匹配Stuff中name=="David"且age==45的对象。

1
2
3
4
Stuff("David", 45) match {
  case Stuff("David", 45) => true
  case _ => false
}

仅匹配名字:

1
2
3
4
Stuff("David", 45) match {
  case Stuff("David", _) => "David"
  case _ => "Other"
}

还可以把值提取出来,如把age的值赋给howOld变量:

1
2
3
4
Stuff("David", 45) match {
  case Stuff("David", howOld) => "David, age: " + howOld
  case _ => "Other"
}

还可以在pattern和=>之间添加条件。如年龄小于30的返回young David,其他的结果为old David。

1
2
3
4
5
Stuff("David", 45) match {
  case Stuff("David", age) if age < 30 => "young David"
  case Stuff("David", _) => "old David"
  case _ => "Other"
}

Pattern matching还可以根据类型进行匹配:

1
2
3
4
5
6
x match {
  case d: java.util.Date => "The date in milliseconds is " + d.getTime
  case u: java.net.URL => "The URL path: " + u.getPath
  case s: String => "String: " + s
  case _ => "Something else"
}

如果用Java代码来实现的话，就需要很多instanceof判断和强制类型转换！！

1
2
3
4
if(x instanceof Date) return "The date in milliseconds is " + ((Date)x).getTime();
if(x instanceof URL) return "The URL path: " + ((URL)x).getPath();
if(x instanceof String) return "String: " + ((String)x);
return "Something else"

if/else and while

while在Scala中比较少用。if/else的使用频率高一些，比java的三目赋值操作符（?:）使用频率更高。while以及不带else的if表达式总是返回Unit（相当于Java的void）；带else的if/else则是有返回值的表达式，其类型由各个分支表达式的类型共同确定。

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
if(exp) println("yes")

// multiline
if(exp) {
  println("Line one")
  println("Line two")
}

val i: Int = if(exp) 1 else 3

val i: Int = if(exp) 1 
else {
  val j = System.currentTimeMillis
  (j % 100L).toInt
}

while executes its code block as long as its expression evaluates to true, just like Java. In practice, using recursion, a method calling itself, provides more readable code and enforces the concept of transforming input to output rather than changing, mutating, variables. Recursive methods can be as efficient as a while loop. 下面的while示例之后给出一个用递归改写的小草稿。

1
2
3
4
while (exp) println("Working...")
while (exp) {
  println("Working...")
}
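照上面“用递归代替while”的说法，把一个简单的计数循环改写成尾递归（示意代码，非书中原文）：

import scala.annotation.tailrec

// 与 while 计数循环等价的递归写法，不需要可变变量
@tailrec
def loop(i: Int): Unit =
  if (i < 3) {
    println(i)
    loop(i + 1)
  }

loop(0)   // 依次输出 0、1、2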

for

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
for { i <- 1 to 3} println(i)

for { i <- 1 to 3
      j <- 1 to 3
  } println(i*j)

def isOdd(in: Int) = in % 2 == 1
for {i <- 1 to 5 if isOdd(i)} println(i)

for {i <- 1 to 5
      j <- 1 to 5 if isOdd(i*j)} println(i*j)

val lst = (1 to 18 by 3).toList
for {i <- lst if isOdd(i)} yield i

for {i <- lst; j <- lst if isOdd(i*j)} yield i*j

将在第三章（集合）中更详细地讲解for的使用方法。

throw, try/catch/finally, and synchronized

try/finally的写法和java类似:

1
2
3
4
5
6
7
throw new Exception("Working...")

try{
  throw new Exception("Working...")
} finally {
  println("This will always be printed")
}

try/catch的语法和Java不大一样：整个try/catch是一个表达式，会返回一个值；catch块里使用case（模式匹配）来匹配异常类型。

1
2
3
4
5
6
7
8
9
try {
  file.write(stuff)
} catch {
  case e: java.io.IOException => // handle IO Exception
  case n: NullPointerException => // handle null Exception
}

try { Integer.parseInt("dog") } catch { case _ => 0 } //0
try { Integer.parseInt("44") } catch { case _ => 0 } //44

基于对象的同步操作：每个对象都自带synchronized方法。

1
2
3
obj.synchronized {
  // do something that needs to be serialized
}

Scala没有java的synchronized方法修饰符，同步方法可以这样定义：

1
2
3
def foo(): Int = synchronized {
  42
}

Comments

注释和其他类C语言基本一样：单行用//，多行用/* ... */。

在Scala中，多行注释还可以嵌套。

1
2
3
4
5
6
7
/*
  This is an outer comment
  /* And this comment
     is nested
  */
  Outer comment
*/

Scala vs Java vs Ruby

类和实例

java有原始类型，而Scala中所有东西都是对象，所有操作都是方法调用，无需为原始类型做额外的特殊处理。

1
2
1.hashCode
2.toString

我们可以定义一个方法，接受一个函数作为参数（把一个Int转换成另一个Int的操作）。

1
2
def with42(in: Int => Int) = in(42)
with42(33 + _)

在语言级别,如果所有东西都是统一的,在进行编程设计时就会很方便和简单。同时Scala编译时会针对JVM原始类型进行优化,使得scala的代码在效率上非常接近Java。

Traits, Interfaces, and Mixins

在java中，除了Object外，每个类都有且只有一个父类。Java类可以实现一个或者多个接口（接口约定了实现类必须实现的方法）。这是依赖注入、测试mock以及其他抽象模式的基础。

Scala使用traits。Traits提供了Java接口拥有的所有特性，同时traits可以包含方法的实现以及字段的定义。方法只需在trait中实现一次，所有混入该trait的子类都会获得这些方法。

Object, Static, and Singletons

在Java中，可以定义类的（静态）方法和属性，不需要实例化对象就可以访问，类（静态）属性也提供了在JVM中全局共享数据的方式。Scala提供了类似的机制：object。object是单例模式的实现，在第一次被访问时实例化，同样可以用来共享全局状态。而且，object也是Scala完全面向对象的一种体现：object本身是一个类的实例，而不是某种类级别的常量（some class-level constant），因此可以作为参数来传递。

Functions, Anonymous Inner Class, and Lambdas/Procs

The Java construct to pass units of computation as parameters to methods is the anonymous inner class. 匿名内部类在Swing UI库中非常常见：许多UI事件处理的接口只定义1-2个方法，编写程序时，实现事件接口的匿名内部类能访问外部类的私有成员数据。

Scala的函数（function）对应的就是匿名内部类。Scala函数实现了统一的接口，调用函数时执行接口的apply方法。和Java匿名内部类相比，Scala创建函数的语法更加简洁和优雅，访问局部变量的规则也更加灵活：Java匿名内部类只能访问final的局部变量，而Scala函数能访问并修改外部的var变量。
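一个闭包读写外部var的小示例（示意代码，非书中原文）：

var counter = 0

// 这个函数捕获了外部的var counter，并在每次调用时修改它
val inc = () => { counter += 1; counter }

inc()              // 1
inc()              // 2
println(counter)   // 2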

Scala和Ruby的面向对象模型和函数式编程很相似。同时Scala在访问类库和静态类型方面和Java很类似。Scala博采众长,把Java和Ruby的优点都囊括了。

总结

这一章首先讲了Scala的安装和程序运行方式，然后围绕Scala的语法结构展开。下一章讲解Scala的数据类型，用很少的代码编写功能健壮的程序，同时编码量的减少也能有效地控制bug的数量。

–END

Expect-批量实现SSH无密钥登录

安装部署Hadoop集群的首要步骤就是配置SSH的无密钥登录。

1
2
3
4
ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
cp ~/.ssh/id_rsa.pub ~/.ssh/authorized_keys

ssh-copy-id -i ~/.ssh/id_rsa.pub root@$ip

然后,可以通过ssh命令来进行批量的操作。

1
2
ssh root@$ip 'cp ~/.ssh/id_rsa.pub ~/.ssh/authorized_keys'
scp -o StrictHostKeyChecking=no /etc/hosts root@${ip}:/etc/

但是，遇到需要输入密码的交互式（dialogue）操作时，部署N台datanode就得输入N遍密码！新建用户、设置用户密码同样也是需要交互输入的操作！！

Linux下的Expect就是用来自动化处理这类需求的：Expect能根据输出的提示信息自动完成相应的输入。

1
2
3
4
5
6
7
8
9
10
11
[hadoop@master1 hadoop-deploy-0.0.1]$ ssh-copy-id localhost
The authenticity of host 'localhost (::1)' can't be established.
RSA key fingerprint is 4e:fe:7a:0a:98:6e:9a:ab:af:e4:65:51:9b:3d:e0:99.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'localhost' (RSA) to the list of known hosts.
hadoop@localhost's password: 
Now try logging into the machine, with "ssh 'localhost'", and check in:

  .ssh/authorized_keys

to make sure we haven't added extra keys that you weren't expecting.

根据上面的提示信息，以及需要输入的内容，可以编写对应的expect脚本来实现自动化。

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
[hadoop@master1 hadoop-deploy-0.0.1]$ cat bin/ssh-copy-id.expect 
#!/usr/bin/expect  

## Usage $0 [user@]host password

set host [lrange $argv 0 0];
set password [lrange $argv 1 1] ;

set timeout 30;

spawn ssh-copy-id $host ;

expect {
  "(yes/no)?" { send yes\n; exp_continue; }
  "password:" { send $password\n; exp_continue; }
}

exec sleep 1;

同样，新建用户、初始化密码的操作也可以用expect来完成：

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
[hadoop@master1 hadoop-deploy-0.0.1]$ cat bin/passwd.expect
#!/usr/bin/expect  

## Usage $0 host username password

set host [lrange $argv 0 0];
set username [lrange $argv 1 1];
set password [lrange $argv 2 2] ;

set timeout 30;

##

spawn ssh $host useradd $username ;

exec sleep 1;

##

spawn ssh $host passwd $username ;

## password and repasswd all use this
expect {
  "password:" { send $password\n; exp_continue; }
}

exec sleep 1;

有了上面的脚本，再预先定义好每台机器的root密码：先用ssh-copy-id.expect建立到各台datanode机器的无密钥登录；再把passwd.expect脚本分发到各台机器，通过ssh远程执行脚本来创建用户并初始化密码。

Expect只需要在master机器上安装即可，安装方式如下：

1
yum install expect

or

1
2
rpm -ivh tcl-8.5.7-6.el6.x86_64.rpm
rpm -ivh expect-5.44.1.15-5.el6_4.x86_64.rpm

–END