Starting 2015

Thu 2015-02-19 01:42

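Red envelopes turned the whole 2015 Spring Festival upside down...

By the small hours the Spring Festival Gala had ended, the cold was setting in, everyone was drifting off to bed, and the New Year's Eve red-envelope battle came to a close.

Only once things quiet down can I reflect and take stock properly.

2014 went fairly smoothly. The work was more than tedious box-ticking: it left enough room to do things I enjoy, and after tinkering for a while there was still time to think.

After four years of work I am no longer a newcomer, even in a new environment. Time and experience make us more mature and give us more to draw on, but they also bring more responsibility, and in a project team I may be expected to set an example. I also find this the most awkward stage: not capable enough to stand above the crowd, yet not a hopeless case who has given up, with pursuers behind and wolves ahead. I really feel the reality of the post-85 generation being the awkward one!

The longer I work, the more easily I get distracted by what goes on around me, and my train of thought is interrupted by one thing or another! Outside work I spend less and less time studying; TV series and games have taken over, which really should not be the case! For someone as competitive as I am, the way forward is to focus wholeheartedly on one thing. That is how I will improve noticeably and stop feeling that I am achieving nothing!

In 2014 I also helped a senior schoolmate with some work, met different people and situations, and changed some of my earlier views (or rather, was beaten by reality). I used to think that if someone had already built something similar, I should just use it; I looked down on reimplementing such things, and reinventing the wheel seemed both unwelcome and a waste of time and energy.

In fact, something that already exists only becomes truly yours once you have implemented it yourself!! Do not think of it as wasted effort: when you improve it or add new features, you find that what you wrote and practiced yourself is what you really own and can handle with ease!

My close friends are all getting married or pairing off. I know my lot but will not resign myself to it; when will I finally catch the beat!?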

–END

Learning BTrace

Fri 2015-02-06 20:38

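Download BTrace from the official site; the btrace-bin.zip archive includes usersguide.html, a getting-started guide. The source code is managed with hg (Mercurial) and can be downloaded from there.

A BTrace program is an ordinary Java class made up of public static void methods annotated with BTrace annotations. In code it looks much like Spring AOP. The annotations specify where to probe (methods, classes).

BTrace programs are read-only and may perform only a restricted set of operations. The general restrictions:

Cannot create new objects
Cannot create new arrays
Cannot throw exceptions
Cannot catch exceptions
Cannot call arbitrary instance or static methods; only the static methods of com.sun.btrace.BTraceUtils may be called.
Cannot assign to static or instance fields of the traced program, although a BTrace class can assign to its own static fields.
Cannot have instance fields or methods; a BTrace class may contain only static fields and public static void methods.
Cannot have outer, inner, nested or local classes
Cannot have synchronized blocks or synchronized methods
Cannot contain loops (for, while, do...while)
Cannot extend another class (the superclass must be the default java.lang.Object)
Cannot implement interfaces
Cannot contain assert statements
Cannot use class literals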

helloworld

// import all BTrace annotations
import com.sun.btrace.annotations.*;
// import statics from BTraceUtils class
import static com.sun.btrace.BTraceUtils.*;

// @BTrace annotation tells that this is a BTrace program
@BTrace
public class HelloWorld {
 
    // @OnMethod annotation tells where to probe.
    // In this example, we are interested in entry 
    // into the Thread.start() method. 
    @OnMethod(
        clazz="java.lang.Thread",
        method="start"
    )
    public static void func() {
        // println is defined in BTraceUtils
        // you can only call the static methods of BTraceUtils
        println("about to start a thread!");
    }
}

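Run it from the command line with btrace <PID> <btrace-script>. The script can be a Java source file or an already compiled .class file.

btracec works like javac and additionally substitutes the variables defined in included files. If your BTrace class is just a plain Java class, compiling it with javac is enough.

Write a test class, then monitor thread starts in that Java program: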

import org.junit.Test;

public class HelloTest {

  @Test
  public void test() throws Exception {
      testNewThread();
  }
  
  public void testNewThread() throws InterruptedException {
      Thread.sleep(20 * 1000); // A better way is a Scanner: block on scanner.nextLine() until a line is typed, then continue.

      for (int i = 0; i < 100; i++) {
          final int index = i;
          new Thread(//
                  new Runnable() {
                      public void run() {
                          System.out.println("my order: " + index);
                      }
                  } //
          ).start();
      }
  }

}
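The comment in testNewThread suggests replacing the fixed 20-second sleep with a Scanner prompt. A minimal sketch of that idea follows. The class name WaitThenStartThreads and its helper methods are made up for illustration, and reading the PID from RuntimeMXBean.getName() relies on the HotSpot convention of returning "pid@hostname", which the specification does not guarantee.

```java
import java.io.InputStream;
import java.lang.management.ManagementFactory;
import java.util.Scanner;

// Hypothetical stand-in for HelloTest: prints its PID, waits for Enter, then starts threads.
final class WaitThenStartThreads {

    // On HotSpot the runtime name looks like "12345@hostname"; this is a convention, not a guarantee.
    static String currentPid() {
        String name = ManagementFactory.getRuntimeMXBean().getName();
        int at = name.indexOf('@');
        return at > 0 ? name.substring(0, at) : name;
    }

    // Blocks until a line is available on the stream; returns false if the stream ends first.
    static boolean awaitLine(InputStream in) {
        Scanner scanner = new Scanner(in);
        return scanner.hasNextLine();
    }

    // Starts `count` threads (each start() call is what the HelloWorld probe reports) and joins them.
    static int startThreads(int count) {
        Thread[] threads = new Thread[count];
        for (int i = 0; i < count; i++) {
            final int index = i;
            threads[i] = new Thread(new Runnable() {
                public void run() {
                    System.out.println("my order: " + index);
                }
            });
            threads[i].start();
        }
        for (Thread t : threads) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // keep the interrupt flag and stop waiting
                break;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println("PID " + currentPid() + ": attach btrace, then press Enter");
        awaitLine(System.in);
        startThreads(100);
    }
}
```

With this variant the PID is printed by the program itself, so btrace can be attached at leisure before the threads are started.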

Then start the BTrace program:

cd src\test\script

# The lines below are the contents of a batch file
set PATH=%PATH%;C:\cygwin\bin;C:\cygwin\usr\local\bin

set BTRACE_HOME=E:\local\opt\btrace-bin
set CUR_ROOT=%cd%\..\..\..
set SCRIPT=%CUR_ROOT%\src\main\java\com\github\winse\btrace\HelloWorld.java
set SCRIPT=%CUR_ROOT%\target\classes\com\github\winse\btrace\HelloWorld.class

jps -m  | findstr HelloTest | gawk '{print $1}' | xargs -I {} %BTRACE_HOME%\bin\btrace.bat {} %SCRIPT%

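The main program above sleeps for 20 s after starting so that the BTrace program has time to attach. If monitoring has to begin the moment the program starts, add a javaagent to the VM arguments: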

-javaagent:E:\local\opt\btrace-bin\build\btrace-agent.jar=noServer=true,scriptOutputFile=C:\Temp\test.txt,script=F:\workspaces\cms_hadoop\btrace\target\classes\com\github\winse\btrace\HelloWorld.class

Put it in the VM arguments box on the Arguments tab of the Eclipse run configuration (Debug Configurations).

Start the main program, and the output of the BTrace program appears in C:\Temp\test.txt.

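Annotations

Parameters

@Self gets the this object
@Return gets the method's return value
@TargetInstance and @TargetMethodOrField show which instance methods are called inside the probed method
@ProbeClassName and @ProbeMethodName give the class and method currently being probed (with wildcards in @OnMethod, they show which methods are actually called)

Methods

@OnMethod @OnTimer @OnError @OnExit @OnLowMemory @OnEvent @OnProbe

Source code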

https://github.com/winse/helloJ/tree/hello/btrace

–END

Windows Gif

Wed 2015-02-04 15:18

I came across various GIF-recording tools on Linux:

yum install byzanz

byzanz-record -d 10 -x 0 -y 0 -w 1363 -h 758 byzanz-demo.gif

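There are also wrapper tools such as mkcast.

I first wanted to install byzanz under Cygwin, but the build needs so many libraries that I gave up in the end.

In fact Windows also has a very good GIF recorder: LICEcap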

–END

Build redis-2.8

Thu 2015-01-22 09:59

jemalloc

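When Redis is linked against libc malloc, memory tends to fragment more. jemalloc can manage allocation instead; it is normally the default allocator when building on Linux, and make MALLOC=jemalloc selects it explicitly.

If the build reports make cc Command not found, install gcc first.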

tar zxvf redis-2.8.13.bin.tar.gz 
cd redis-2.8.13
cd deps/jemalloc/
# generates the .h header files
./configure 

# back to the redis-2.8.13 top-level directory
cd ../..
make MALLOC=jemalloc
src/redis-server 

The contents of jemalloc's include directory:

[hadoop@localhost jemalloc]$ cd include/jemalloc/
internal/              jemalloc_defs.h.in     jemalloc_macros.h      jemalloc_mangle.h      jemalloc_mangle.sh     jemalloc_protos.h.in   jemalloc_rename.h      jemalloc.sh
jemalloc_defs.h        jemalloc.h             jemalloc_macros.h.in   jemalloc_mangle_jet.h  jemalloc_protos.h      jemalloc_protos_jet.h  jemalloc_rename.sh  

Check memory usage:

[hadoop@localhost redis-2.8.13]$ src/redis-cli info
...
# Memory
used_memory:503576
used_memory_human:491.77K
used_memory_rss:2158592
used_memory_peak:503576
used_memory_peak_human:491.77K
used_memory_lua:33792
mem_fragmentation_ratio:4.29
mem_allocator:jemalloc-3.6.0
...

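Redis fragments memory as it runs. The following compares a long-running instance, the same data after a restart, and libc against jemalloc:

# Running instance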
# Memory
used_memory:4623527744
used_memory_human:4.31G
used_memory_rss:48304705536
used_memory_peak:38217543280
used_memory_peak_human:35.59G
used_memory_lua:33792
mem_fragmentation_ratio:10.45
mem_allocator:libc

51616 hadoop    20   0 45.1g  44g 1136 S  0.0 35.7   3410:42 /home/hadoop/redis-2.8.13/src/redis-server *:6371

# Size of the serialized RDB file
[hadoop@hadoop-master1 18111]$ ll
total 1183116
-rw-rw-r--. 1 hadoop hadoop 1210319541 Jan 14 11:28 dump.rdb

# Instance after restart
[hadoop@hadoop-master1 18111]$  ~/redis-2.8.13/src/redis-server --port 18111
[77484] 14 Jan 14:33:17.910 * DB loaded from disk: 218.337 seconds

# Memory
used_memory:4763158704
used_memory_human:4.44G
used_memory_rss:6217580544
used_memory_peak:4763158704
used_memory_peak_human:4.44G
used_memory_lua:33792
mem_fragmentation_ratio:1.31
mem_allocator:libc

77484 hadoop    20   0 6052m 5.8g 1200 S  0.0  4.6   3:38.39 /home/hadoop/redis-2.8.13/src/redis-server *:18111

# Instance using jemalloc instead of libc
[hadoop@hadoop-master1 18111]$ ~/redis-jemalloc/redis-2.8.13/src/redis-server --port 18888
[14793] 14 Jan 14:50:11.250 * DB loaded from disk: 209.839 seconds

# Memory
used_memory:4527760088
used_memory_human:4.22G
used_memory_rss:4625887232
used_memory_peak:4527760088
used_memory_peak_human:4.22G
used_memory_lua:33792
mem_fragmentation_ratio:1.02
mem_allocator:jemalloc-3.6.0

14793 hadoop    20   0 4538m 4.3g 1360 S  0.0  3.4   3:28.10 /home/hadoop/redis-jemalloc/redis-2.8.13/src/redis-server *:18888                                                                                                                       
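mem_fragmentation_ratio is used_memory_rss divided by used_memory, so the three results can be checked by hand. The sketch below redoes that arithmetic with the figures from the three INFO outputs; the class and method names are illustrative only.

```java
import java.util.Locale;

// Recomputes mem_fragmentation_ratio = used_memory_rss / used_memory from the INFO figures.
final class FragmentationRatio {

    static double ratio(long usedMemoryRss, long usedMemory) {
        return (double) usedMemoryRss / usedMemory;
    }

    // Two decimal places, the precision shown by redis-cli info.
    static String format(long usedMemoryRss, long usedMemory) {
        return String.format(Locale.ROOT, "%.2f", ratio(usedMemoryRss, usedMemory));
    }

    public static void main(String[] args) {
        System.out.println("running, libc:       " + format(48304705536L, 4623527744L)); // 10.45
        System.out.println("restarted, libc:     " + format(6217580544L, 4763158704L));  // 1.31
        System.out.println("restarted, jemalloc: " + format(4625887232L, 4527760088L));  // 1.02
    }
}
```

The restart alone brings the ratio down from 10.45 to 1.31, and loading the same data with jemalloc lowers it further to 1.02.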

tcmalloc

Installing as root

With root access this is straightforward. Download gperftools and libunwind-0.99-beta, then build both:

cd libunwind-0.99-beta
./configure 
make && make install
cd /home/hadoop/gperftools-2.4
./configure 
make && make install

cd redis-2.8.13
make MALLOC=tcmalloc

If you hit the error ./libtool: line 1125: g++: command not found, the compiler toolchain is missing:

[root@localhost gperftools-2.4]# yum -y install gcc gcc-c++

After building, running the server fails with src/redis-server: error while loading shared libraries: libtcmalloc.so.4: cannot open shared object file: No such file or directory, so the library path has to be set:

[hadoop@localhost redis-2.8.13]$ export LD_LIBRARY_PATH=/usr/local/lib
[hadoop@localhost redis-2.8.13]$ src/redis-server 

Or, as commonly suggested online:

echo "/usr/local/lib" > /etc/ld.so.conf.d/usr_local_lib.conf  
/sbin/ldconfig  

To check that tcmalloc is in effect, run lsof -n | grep tcmalloc; output like the following means it is:

redis-ser 1716    hadoop  mem       REG  253,0  2201976  936349 /usr/local/lib/libtcmalloc.so.4.2.6

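In the configuration file, find daemonize and change no to yes so that the server can run as a background service.

Installing as a regular user

Building into a chosen prefix directory is more convenient here, because the result can then be copied to each machine.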

cd libunwind-0.99-beta
CFLAGS=-fPIC ./configure --prefix=/home/hadoop/redis
make && make install

cd gperftools-2.4
./configure -h
export LDFLAGS="-L/home/hadoop/redis/lib"
export CPPFLAGS="-I/home/hadoop/redis/include"
./configure --prefix=/home/hadoop/redis
make && make install

After the build, move the contents of the redis directory (the --prefix used above) into redis-2.8.13/src. Then edit src/Makefile:

[hadoop@master1 redis-2.8.13]$ vi src/Makefile
# Include paths to dependencies
FINAL_CFLAGS+= -I../deps/hiredis -I../deps/linenoise -I../deps/lua/src
    
ifeq ($(MALLOC),tcmalloc)
        #FINAL_CFLAGS+= -DUSE_TCMALLOC
        #FINAL_LIBS+= -ltcmalloc
        FINAL_CFLAGS+= -DUSE_TCMALLOC -I./include
        FINAL_LIBS+= -L./lib  -ltcmalloc -ldl

endif

ifeq ($(MALLOC),tcmalloc_minimal)
        FINAL_CFLAGS+= -DUSE_TCMALLOC
        FINAL_LIBS+= -ltcmalloc_minimal
endif

Then build:

[hadoop@master1 redis-2.8.13]$ export LD_LIBRARY_PATH=/home/hadoop/redis-2.8.13/src/lib
[hadoop@master1 redis-2.8.13]$ make MALLOC=tcmalloc
cd src && make all
make[1]: Entering directory `/home/hadoop/redis-2.8.13/src'
    LINK redis-server
    INSTALL redis-sentinel
    CC redis-cli.o
In file included from zmalloc.h:40,
                 from redis-cli.c:50:
./include/google/tcmalloc.h:35:2: warning: #warning is a GCC extension
./include/google/tcmalloc.h:35:2: warning: #warning "google/tcmalloc.h is deprecated. Use gperftools/tcmalloc.h instead"
    LINK redis-cli
    CC redis-benchmark.o
In file included from zmalloc.h:40,
                 from redis-benchmark.c:47:
./include/google/tcmalloc.h:35:2: warning: #warning is a GCC extension
./include/google/tcmalloc.h:35:2: warning: #warning "google/tcmalloc.h is deprecated. Use gperftools/tcmalloc.h instead"
    LINK redis-benchmark
    CC redis-check-dump.o
    LINK redis-check-dump
    CC redis-check-aof.o
    LINK redis-check-aof

Hint: To run 'make test' is a good idea ;)

make[1]: Leaving directory `/home/hadoop/redis-2.8.13/src'
[hadoop@master1 redis-2.8.13]$ 

Redis 3 cluster installation

Building and installing is the same as for 2.8: the usual configuration/make/make install steps.

[hadoop@umcc97-44 cluster-test]$ cat cluster.conf 
port .
cluster-enabled yes
cluster-config-file nodes.conf
cluster-node-timeout 5000
appendonly yes

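The painful part is that Ruby is required and the servers have no internet access! In fact Ruby only needs to be installed on a machine that can reach the servers: the cluster initialization script is simply a client that connects to the servers and sets up the cluster. Also, pass -c when invoking the client command so that it runs in cluster mode; otherwise it talks to a single node only, and reads or writes that belong to other cluster nodes fail with an error!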
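The reason -c matters is that a Redis Cluster spreads keys over 16384 hash slots owned by different nodes, and a node answers with a MOVED redirection for keys it does not own; -c makes redis-cli follow that redirection. As a rough sketch of how a key maps to a slot, the following computes CRC16 (the XMODEM variant) modulo 16384. The class name is made up, and hash tags ({...}) are deliberately left out.

```java
import java.nio.charset.StandardCharsets;

// Sketch of the Redis Cluster key-to-slot mapping: CRC16 (XMODEM) of the key, modulo 16384.
// Simplification: hash tags such as {user1}.name are not handled here.
final class ClusterSlot {

    static final int SLOT_COUNT = 16384;

    // Bitwise CRC16: polynomial 0x1021, initial value 0, no reflection, no final XOR.
    static int crc16(byte[] data) {
        int crc = 0;
        for (byte b : data) {
            crc ^= (b & 0xFF) << 8;
            for (int bit = 0; bit < 8; bit++) {
                if ((crc & 0x8000) != 0) {
                    crc = (crc << 1) ^ 0x1021;
                } else {
                    crc <<= 1;
                }
                crc &= 0xFFFF; // keep the register at 16 bits
            }
        }
        return crc;
    }

    static int slotOf(String key) {
        return crc16(key.getBytes(StandardCharsets.UTF_8)) % SLOT_COUNT;
    }

    public static void main(String[] args) {
        for (String key : new String[] {"foo", "hello"}) {
            System.out.println(key + " -> slot " + slotOf(key));
        }
    }
}
```

A client connected without -c to a node that does not own the slot of the requested key gets the redirection error back instead of the value.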

Cygwin

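My development machines all run Windows, and stepping through the source in a debugger means building Redis there. Cygwin emulates Linux but leaves some constants undefined, so a few changes are needed. The changes are as follows: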

$ git log -1
commit 0c211a1953afeda3d0d45126653e2d4c38bd88cb
Author: antirez <antirez@gmail.com>
Date:   Fri Dec 5 10:51:09 2014 +010

$ git branch
* 2.8

$ git diff
diff --git a/deps/hiredis/net.c b/deps/hiredis/net.c
index bdb84ce..6e95f22 100644
--- a/deps/hiredis/net.c
+++ b/deps/hiredis/net.c
@@ -51,6 +51,13 @@
 #include "net.h"
 #include "sds.h"

+/* Cygwin Fix */
+#ifdef __CYGWIN__
+#define TCP_KEEPCNT 8
+#define TCP_KEEPINTVL 150
+#define TCP_KEEPIDLE 14400
+#endif
+
 /* Defined in hiredis.c */
 void __redisSetError(redisContext *c, int type, const char *str);

diff --git a/src/Makefile b/src/Makefile
index 8b3e959..a72b2f2 100644
--- a/src/Makefile
+++ b/src/Makefile
@@ -63,6 +63,9 @@ else
 ifeq ($(uname_S),Darwin)
        # Darwin (nothing to do)
 else
+ifeq ($(uname_S),CYGWIN_NT-6.3-WOW64)
+       # cygwin (nothing to do)
+else
 ifeq ($(uname_S),AIX)
         # AIX
         FINAL_LDFLAGS+= -Wl,-bexpall
@@ -75,6 +78,7 @@ else
 endif
 endif
 endif
+endif
 # Include paths to dependencies
 FINAL_CFLAGS+= -I../deps/hiredis -I../deps/linenoise -I../deps/lua/src

Then build:

cd deps/
make lua hiredis linenoise

cd ..
make

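After a successful build, import the program into Eclipse CDT to run and debug it. Rebuild it once after importing; otherwise the debugger looks for the sources under the /cygwin path.

Choose Import, then [Existing Code as Makefile project] under the C/C++ folder
In [Existing Code Location] enter the Redis source directory, and under [Toolchain for Indexer Settings] select Cygwin GCC
When the import finishes, right-click and choose [Build Configuration]->[Build All]
Choose Run and pick redis-server to execute it.

Remote debugging also seems to be possible: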

[root@Frankzfz]$gdbserver 10.27.10.48:9000 ./test_seg_fault

–END

Kafka Quick Start

Thu 2015-01-08 22:02

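I had heard of Kafka well before the new year but never had the chance to try it myself. Writing data to HDFS and then processing it with MapReduce adds many intermediate steps and wastes system resources, so I want to find out in practice whether analysis with Kafka plus Spark is more efficient. First, the basic Kafka operations.

The documentation opens with a brief introduction. Kafka is a distributed, partitioned, replicated log service that offers the functionality of a messaging system. The main concepts are Topic, Producers, Consumers, Partition and Distribution (replicated). Producers send messages to the Kafka cluster over TCP, and consumers then fetch messages from the cluster.

Kafka guarantees:

Messages from the same producer (to a given topic partition) are kept in the order they were sent.
A consumer sees messages in the order in which they are stored.
A topic with replication factor N tolerates up to N-1 server failures without losing any committed messages.

Download Kafka; the current stable release is kafka_2.10-0.8.1.1. Once unpacked, it is ready to run.

Starting a single instance

The Windows scripts live under bin\windows, and the kafka-run-class.bat batch file needs a small change: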

rem set BASE_DIR=%CD%\..
set BASE_DIR=%CD%\..\..

rem for %%i in (%BASE_DIR%\core\lib\*.jar) do (
for %%i in (%BASE_DIR%\libs\*.jar) do (

Run the programs:

bin\windows>zookeeper-server-start.bat ..\..\config\zookeeper.properties

rem open another cmd window and run
bin\windows>kafka-server-start.bat ..\..\config\server.properties

Combine both into one script, start-all.bat, for later use:

start zookeeper-server-start.bat ..\..\config\zookeeper.properties
timeout 5
start kafka-server-start.bat ..\..\config\server.properties
exit

Topic

bin\windows>kafka-run-class.bat kafka.admin.TopicCommand --create --zookeeper localhost:2181 --replication 1 --partitions 1 --topic hello
Created topic "hello".

bin\windows>kafka-run-class.bat kafka.admin.TopicCommand --list --zookeeper localhost:2181
hello

bin\windows>kafka-run-class.bat kafka.admin.TopicCommand  --describe --zookeeper localhost:2181 --topic hello
Topic:hello     PartitionCount:1        ReplicationFactor:1     Configs:
        Topic: hello    Partition: 0    Leader: 0       Replicas: 0     Isr: 0
      
bin\windows>kafka-consumer-offset-checker.bat --zookeeper localhost:2181 --topic foo --group test      

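On Linux, kafka-topics.sh can be used to create and query topics. If the log output is too noisy, edit log4j.properties in the config directory (the scripts point the log4j.configuration system property at that file).

Sending and receiving messages

rem producer
bin\windows>kafka-console-producer.bat --broker-list localhost:9092 --topic hello

rem consumer (in a new window)
bin\windows>kafka-console-consumer.bat --zookeeper localhost:2181 --topic hello --from-beginning

Once both are running, type messages into the producer window; the consumer prints them almost immediately.

Both commands take many options. Running either one without arguments prints its help, which explains each option and how to use it.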

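Kafka cluster

Configuring a cluster is much like configuring a ZooKeeper cluster: only broker.id and the data directory need to change (plus the port when several brokers share one host).

Copy server.properties, then change the following three properties: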

broker.id=1
port=9192
log.dir=/tmp/kafka-logs-1
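The same three edits can be derived per broker with a small helper. This is only a sketch: the class name is hypothetical, and the port pattern (9092 + 100 * id) is an assumption extrapolated from the single port=9192 example for broker 1.

```java
import java.util.Properties;

// Derives the three per-broker overrides for a copy of server.properties.
// Assumption: ports follow 9092 + 100 * brokerId, extrapolated from port=9192 for broker 1.
final class BrokerOverrides {

    static Properties forBroker(int brokerId) {
        Properties overrides = new Properties();
        overrides.setProperty("broker.id", Integer.toString(brokerId));
        overrides.setProperty("port", Integer.toString(9092 + 100 * brokerId));
        overrides.setProperty("log.dir", "/tmp/kafka-logs-" + brokerId);
        return overrides;
    }

    public static void main(String[] args) {
        for (int id = 1; id <= 3; id++) {
            Properties p = forBroker(id);
            System.out.println("server-" + id + ".properties: broker.id=" + p.getProperty("broker.id")
                    + " port=" + p.getProperty("port")
                    + " log.dir=" + p.getProperty("log.dir"));
        }
    }
}
```

Whatever values are chosen, each broker on the same host needs its own id, port and log directory.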

Then start them:

set JMX_PORT=19999
start kafka-server-start.bat ..\..\config\server-1.properties
set JMX_PORT=29999
start kafka-server-start.bat ..\..\config\server-2.properties
set JMX_PORT=39999
start kafka-server-start.bat ..\..\config\server-3.properties

Creating a topic

bin\windows>kafka-run-class.bat kafka.admin.TopicCommand --create --zookeeper localhost:2181 --replication-factor 3 --partitions 1 --topic mhello
Created topic "mhello".

bin\windows>kafka-run-class.bat kafka.admin.TopicCommand --describe --zookeeper localhost:2181 --topic mhello
Topic:mhello    PartitionCount:1        ReplicationFactor:3     Configs:
        Topic: mhello   Partition: 0    Leader: 0       Replicas: 0,3,1 Isr: 0,3,1

bin\windows>kafka-console-producer.bat --broker-list localhost:9092 --topic mhello
bin\windows>kafka-console-consumer.bat --zookeeper localhost:2181 --topic mhello --from-beginning
      

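The first line of the describe output summarizes all partitions; each following line describes one partition. The number after Leader is the broker id of the node currently leading that partition. Replicas lists the brokers assigned a copy of the partition (whether or not they are alive), and Isr lists the replicas that are currently alive and in sync.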
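To make the three fields concrete, here is a small sketch that parses one partition line of the describe output and works out which assigned replicas are missing from the ISR. The class name and the regular expression are illustrative and not part of Kafka.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Parses one partition line of the describe output into leader, replicas and ISR.
// Illustrative only: the class name and the regular expression are not part of Kafka.
final class PartitionLine {

    private static final Pattern LINE = Pattern.compile(
            "Topic:\\s*(\\S+)\\s+Partition:\\s*(\\d+)\\s+Leader:\\s*(-?\\d+)"
            + "\\s+Replicas:\\s*(\\S*)\\s+Isr:\\s*(\\S*)");

    final String topic;
    final int partition;
    final int leader;
    final List<Integer> replicas;
    final List<Integer> isr;

    private PartitionLine(Matcher m) {
        topic = m.group(1);
        partition = Integer.parseInt(m.group(2));
        leader = Integer.parseInt(m.group(3));
        replicas = ids(m.group(4));
        isr = ids(m.group(5));
    }

    static PartitionLine parse(String line) {
        Matcher m = LINE.matcher(line);
        if (!m.find()) {
            throw new IllegalArgumentException("not a partition line: " + line);
        }
        return new PartitionLine(m);
    }

    // Turns "0,3,1" into [0, 3, 1]; an empty string gives an empty list.
    private static List<Integer> ids(String csv) {
        List<Integer> out = new ArrayList<>();
        for (String part : csv.split(",")) {
            if (!part.isEmpty()) {
                out.add(Integer.parseInt(part));
            }
        }
        return out;
    }

    // Brokers that are assigned a replica but are not currently in sync.
    List<Integer> outOfSync() {
        List<Integer> missing = new ArrayList<>(replicas);
        missing.removeAll(isr);
        return missing;
    }

    public static void main(String[] args) {
        PartitionLine p = parse("Topic: mhello   Partition: 0    Leader: 0       Replicas: 0,3,1 Isr: 0,3,1");
        System.out.println("leader=" + p.leader + " replicas=" + p.replicas
                + " isr=" + p.isr + " outOfSync=" + p.outOfSync());
    }
}
```

For a line whose Isr field holds only 0 while Replicas is 0,3,1, outOfSync() returns brokers 3 and 1.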

Stop nodes 1, 2 and 3 that were just started, then describe the topic again.

bin\windows>kafka-run-class.bat kafka.admin.TopicCommand --describe --zookeeper localhost:2181 --topic mhello
Topic:mhello    PartitionCount:1        ReplicationFactor:3     Configs:
        Topic: mhello   Partition: 0    Leader: 0       Replicas: 0,3,1 Isr: 0

bin\windows>kafka-console-consumer.bat --zookeeper localhost:2181 --topic mhello --from-beginning
hello1
hello2
hello3        

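As long as one node is alive, reading data works fine. If all of them are stopped, no service is available, yet describe still shows 0. Awkward!!

After nodes 1, 2 and 3 are started again, the state of the mhello partition is: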

Topic:mhello    PartitionCount:1        ReplicationFactor:3     Configs:
        Topic: mhello   Partition: 0    Leader: 3       Replicas: 0,3,1 Isr: 3,1

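Problem: after a broker id is changed, the existing data does not carry over transparently. I changed the node with broker id 0 to 1000 and restarted it, and the mhello data still could not be found. Only after changing it back to 0 did all the live nodes return.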

    Topic: mhello   Partition: 0    Leader: 3       Replicas: 0,3,1 Isr: 3,1,0

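Summary

I went through the basic features, all from the command line. The next step is to combine Kafka with Hadoop and drive it through the Java API.

References

Kafka Getting Started

Actual scripts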

@@
cd E:\local\opt\bigdata\zookeeper-3.4.5\bin
zkServer.cmd

@@
cd E:\local\opt\bigdata\kafka_2.11-0.10.1.0\bin\windows
kafka-server-start.bat ..\..\config\server.properties

kafka-topics.bat --zookeeper localhost:2181 --list 

After restarting ZooKeeper, running the following command fails with: NoNodeException: KeeperErrorCode = NoNode for /consumers/test/offsets/foo/0.
kafka-consumer-offset-checker.bat --zookeeper localhost:2181 --topic foo --group test

kafka-console-producer.bat --broker-list localhost:9092 --topic foo

–END

Winse Blog
