Winse Blog

走走停停都是风景, 熙熙攘攘都向最好, 忙忙碌碌都为明朝, 何畏之.

Oozie Start Guide

步骤记录

说明:cu2就是hadoop-master2

  1. 打包
1
2
3
4
[hadoop@cu2 oozie-4.2.0]$ vi bin/mkdistro.sh 
MVN_OPTS="-Dbuild.time=${DATETIME} -Dvc.revision=${VC_REV} -Dvc.url=${VC_URL} "

[hadoop@cu2 oozie-4.2.0]$ bin/mkdistro.sh -DskipTests -Dmaven.javadoc.skip=true
  1. 依赖
1
2
3
4
5
6
7
8
9
10
11
12
13
打包后,文件的位置
[hadoop@cu2 ~]$ tar zxvf sources/oozie-4.2.0/distro/target/oozie-4.2.0-distro.tar.gz

下载 <http://dev.sencha.com/deploy/ext-2.2.zip>

yum install zip

[hadoop@cu2 oozie-4.2.0]$ mkdir libext
[hadoop@cu2 oozie-4.2.0]$ cd libext/
[hadoop@cu2 libext]$ ll
total 7584
-rw-rw-r-- 1 hadoop hadoop 6800612 Sep  7 16:00 ext-2.2.zip
-rw-rw-r-- 1 hadoop hadoop  960372 Feb 28  2015 mysql-connector-java-5.1.34.jar
  1. 安装
1
2
3
[hadoop@cu2 oozie-4.2.0]$ bin/oozie-setup.sh prepare-war

setup后,生成的war的位置:/home/hadoop/oozie-4.2.0/oozie-server/webapps/oozie.war
  1. 初始化数据库
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
创建数据库用户

CREATE DATABASE oozie;
GRANT ALL ON oozie.* TO 'oozie'@'%' IDENTIFIED BY 'oozie';
FLUSH PRIVILEGES;
GRANT ALL ON oozie.* TO 'oozie'@'localhost'  IDENTIFIED BY 'oozie';
FLUSH PRIVILEGES;

show grants for oozie;

[hadoop@cu2 oozie-4.2.0]$ vi conf/oozie-site.xml 

<property>
<name>oozie.service.JPAService.jdbc.driver</name><value>com.mysql.jdbc.Driver</value>
</property>

<property>
<name>oozie.service.JPAService.jdbc.url</name><value>jdbc:mysql://localhost:3306/oozie</value>
</property>

<property>
<name>oozie.service.JPAService.jdbc.username</name><value>oozie</value>
</property>

<property>
<name>oozie.service.JPAService.jdbc.password</name><value>oozie</value>
</property>

这里直接把hadoop的jar添加到脚本中,不拷贝到libext下面
[hadoop@cu2 oozie-4.2.0]$ vi bin/ooziedb.sh
OOZIECPPATH=""
if [ ! -z ${HADOOP_HOME} ] ; then
  OOZIECPPATH="${OOZIECPPATH}:$($HADOOP_HOME/bin/hadoop classpath)"
fi

照着写就行了,不必考虑sql文件的存在与否
[hadoop@cu2 oozie-4.2.0]$ bin/ooziedb.sh create -sqlfile oozie.sql -run
  setting CATALINA_OPTS="$CATALINA_OPTS -Xmx1024m"

Validate DB Connection
DONE
DB schema does not exist
Check OOZIE_SYS table does not exist
DONE
Create SQL schema
DONE
Create OOZIE_SYS table
DONE

Oozie DB has been created for Oozie version '4.2.0'


The SQL commands have been written to: oozie.sql
  1. 启动服务
1
2
3
4
5
6
7
8
9
10
11
12
13
由于war中没有hadoop的jar,所以这里也需要把它们添加到tomcat
[hadoop@cu2 oozie-4.2.0]$ $HADOOP_HOME/bin/hadoop classpath | sed 's/:/,/g'
/home/hadoop/hadoop-2.7.1/etc/hadoop,/home/hadoop/hadoop-2.7.1/share/hadoop/common/lib/*,/home/hadoop/hadoop-2.7.1/share/hadoop/common/*,/home/hadoop/hadoop-2.7.1/share/hadoop/hdfs,/home/hadoop/hadoop-2.7.1/share/hadoop/hdfs/lib/*,/home/hadoop/hadoop-2.7.1/share/hadoop/hdfs/*,/home/hadoop/hadoop-2.7.1/share/hadoop/yarn/lib/*,/home/hadoop/hadoop-2.7.1/share/hadoop/yarn/*,/home/hadoop/hadoop-2.7.1/share/hadoop/mapreduce/lib/*,/home/hadoop/hadoop-2.7.1/share/hadoop/mapreduce/*,/home/hadoop/hadoop-2.7.1/contrib/capacity-scheduler/*.jar

处理下把*改成*.jar

[hadoop@cu2 oozie-4.2.0]$ vi oozie-server/conf/catalina.properties 
common.loader=${catalina.base}/lib,${catalina.base}/lib/*.jar,${catalina.home}/lib,${catalina.home}/lib/*.jar,/home/hadoop/hadoop-2.7.1/etc/hadoop,/home/hadoop/hadoop-2.7.1/share/hadoop/common/lib/*.jar,/home/hadoop/hadoop-2.7.1/share/hadoop/common/*.jar,/home/hadoop/hadoop-2.7.1/share/hadoop/hdfs,/home/hadoop/hadoop-2.7.1/share/hadoop/hdfs/lib/*.jar,/home/hadoop/hadoop-2.7.1/share/hadoop/hdfs/*.jar,/home/hadoop/hadoop-2.7.1/share/hadoop/yarn/lib/*.jar,/home/hadoop/hadoop-2.7.1/share/hadoop/yarn/*.jar,/home/hadoop/hadoop-2.7.1/share/hadoop/mapreduce/lib/*.jar,/home/hadoop/hadoop-2.7.1/share/hadoop/mapreduce/*.jar,/home/hadoop/hadoop-2.7.1/contrib/capacity-scheduler/*.jar

# 前台运行 bin/oozied.sh run
[hadoop@cu2 oozie-4.2.0]$ bin/oozied.sh start

http://localhost:11000/
  1. 测试
1
2
3
4
5
6
7
8
[hadoop@cu2 oozie-4.2.0]$ vi bin/oozie
OOZIECPPATH=""
if [ ! -z ${HADOOP_HOME} ] ; then
  OOZIECPPATH="${OOZIECPPATH}:$($HADOOP_HOME/bin/hadoop classpath)"
fi

[hadoop@cu2 oozie-4.2.0]$ bin/oozie admin -oozie http://localhost:11000/oozie -status
System mode: NORMAL
  1. 跑个helloworld
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
[hadoop@cu2 oozie-4.2.0]$ tar zxvf oozie-sharelib-4.2.0.tar.gz 
[hadoop@cu2 oozie-4.2.0]$ ~/hadoop-2.7.1/bin/hadoop fs -rmr share
[hadoop@cu2 oozie-4.2.0]$ ~/hadoop-2.7.1/bin/hadoop fs -put share share
[hadoop@cu2 oozie-4.2.0]$ tar zxvf oozie-examples.tar.gz 
[hadoop@cu2 oozie-4.2.0]$ ~/hadoop-2.7.1/bin/hadoop fs -put examples examples

修改share后重启下oozie,sharelib在应用中会缓冲,中间上传程序不能识别,会报`Could not locate Oozie sharelib`的错。

[hadoop@cu2 oozie-4.2.0]$ vi examples/apps/map-reduce/job.properties 
nameNode=hdfs://hadoop-master2:9000
jobTracker=hadoop-master2:8032
queueName=default
examplesRoot=examples

oozie.wf.application.path=${nameNode}/user/${user.name}/${examplesRoot}/apps/map-reduce/workflow.xml
outputDir=map-reduce

[hadoop@cu2 oozie-4.2.0]$ bin/oozie job -oozie http://localhost:11000/oozie -config examples/apps/map-reduce/job.properties -run
Error: E0501 : E0501: Could not perform authorization operation, User: hadoop is not allowed to impersonate hadoop

[hadoop@cu2 hadoop-2.7.1]$ vi etc/hadoop/core-site.xml 
<property>
<name>hadoop.proxyuser.hadoop.hosts</name><value>*</value>
</property>

<property>
<name>hadoop.proxyuser.hadoop.groups</name><value>*</value>
</property>

[hadoop@cu2 ~]$ for h in `cat /etc/hosts | grep slaver | awk '{print $2}' ` ; do rsync -vaz hadoop-2.7.1 $h:~/ --exclude=logs ; done

同步重启集群

注:增加以上配置后,无需重启集群,可以直接用hadoop管理员账号重新加载这两个属性值,命令为:
    hdfs dfsadmin -refreshSuperUserGroupsConfiguration
    yarn rmadmin -refreshSuperUserGroupsConfiguration

[hadoop@cu2 oozie-4.2.0]$ bin/oozie job -oozie http://localhost:11000/oozie -config examples/apps/map-reduce/job.properties -run
job: 0000000-150908082015741-oozie-hado-W

[hadoop@cu2 hadoop-2.7.1]$ bin/hadoop fs -cat /user/hadoop/examples/output-data/map-reduce/part-00000

尽管能看到结果了,但是不算任务执行成功。任务是有报错的`JA006: Call From cu2/192.168.0.214 to hadoop-master2:10020 failed on connection exception`

[hadoop@cu2 hadoop-2.7.1]$ sbin/mr-jobhistory-daemon.sh start historyserver

在运行一次就ok了。

参考

–END

Comments