We recently made two optimizations to our production Redis: scaling it out across multiple instances, and optimizing the storage of simple key-value pairs (converting strings to hashes).
Scaling Redis
The previous post covered installing Codis. But with long-running Pipeline operations and a high connection count, we frequently saw connection resets. That did not inspire confidence, and since we don't know Go, fixing it ourselves wasn't realistic in the short term.
So we looked for another way. Earlier on we had split data across Redis instances by business domain. Within a single domain, though, keys have to be distributed across instances by hash, and rolling that by hand means wrapping up a pile of plumbing.
The Jedis client ships with sharding support (Sharded), which maps each key to a Redis instance by hashing it onto a ring. The core of Sharded, excerpted:
public class Sharded<R, S extends ShardInfo<R>> {
    ...
    private void initialize(List<S> shards) {
        nodes = new TreeMap<Long, S>();
        for (int i = 0; i != shards.size(); ++i) {
            final S shardInfo = shards.get(i);
            if (shardInfo.getName() == null)
                for (int n = 0; n < 160 * shardInfo.getWeight(); n++) {
                    nodes.put(this.algo.hash("SHARD-" + i + "-NODE-" + n),
                            shardInfo);
                }
            else
                for (int n = 0; n < 160 * shardInfo.getWeight(); n++) {
                    nodes.put(
                            this.algo.hash(shardInfo.getName() + "*"
                                    + shardInfo.getWeight() + n), shardInfo);
                }
            resources.put(shardInfo, shardInfo.createResource());
        }
    }
    ...
    public S getShardInfo(byte[] key) {
        SortedMap<Long, S> tail = nodes.tailMap(algo.hash(key));
        if (tail.isEmpty()) {
            return nodes.get(nodes.firstKey());
        }
        return tail.get(tail.firstKey());
    }

    public S getShardInfo(String key) {
        return getShardInfo(SafeEncoder.encode(getKeyTag(key)));
    }
    ...
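To make the ring logic concrete, here is a minimal, dependency-free sketch of the same lookup: each shard gets 160 virtual nodes on a TreeMap, and a key is routed to the first node at or clockwise past its hash. The hash function below is a simple stand-in for the MurmurHash that Jedis actually uses, and the shard names are invented for illustration.

```java
import java.util.List;
import java.util.SortedMap;
import java.util.TreeMap;

public class RingSketch {
    private final TreeMap<Long, String> nodes = new TreeMap<>();

    public RingSketch(List<String> shards) {
        for (int i = 0; i < shards.size(); i++) {
            // 160 virtual nodes per shard (weight 1), mirroring Sharded above
            for (int n = 0; n < 160; n++) {
                nodes.put(hash("SHARD-" + i + "-NODE-" + n), shards.get(i));
            }
        }
    }

    public String getShard(String key) {
        SortedMap<Long, String> tail = nodes.tailMap(hash(key));
        // wrap around to the ring's first node when we fall off the end
        return tail.isEmpty() ? nodes.firstEntry().getValue() : tail.get(tail.firstKey());
    }

    // simple 64-bit polynomial hash; a stand-in for Jedis's MurmurHash
    private long hash(String s) {
        long h = 1125899906842597L;
        for (int i = 0; i < s.length(); i++) {
            h = 31 * h + s.charAt(i);
        }
        return h;
    }
}
```

Because each physical shard owns many scattered virtual nodes, removing or adding one shard only remaps the keys that landed on its nodes rather than reshuffling everything.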
Usage is straightforward: read and write through ShardedJedis, whose API largely mirrors Jedis. The exception is cluster-wide operations such as keys/scan, which are unavailable:
public List<JedisShardInfo> getShards(String sValue) {
    String[] servers = sValue.split(",");
    List<JedisShardInfo> shards = new ArrayList<>();
    for (String server : servers) {
        Pair<String, Integer> hp = parseServer(server);
        shards.add(new JedisShardInfo(hp.getLeft(), hp.getRight(), Integer.MAX_VALUE));
    }
    return shards;
}

private ShardedJedisPool createRedisPool(String server) {
    return new ShardedJedisPool(new GenericObjectPoolConfig(), getShards(server));
}
If you do need keys, you can obtain every underlying Jedis instance via getAllShards and process each instance's keys yourself:
public Double zscore(String key, String member) {
    try (ShardedJedis redis = getRedis()) {
        return redis.zscore(key, member);
    }
}

public void expires(List<String> patterns, int seconds) {
    try (ShardedJedis shardedJedis = getRedis()) {
        Set<String> keys = new HashSet<>();
        for (Jedis redis : shardedJedis.getAllShards()) {
            for (String p : patterns) {
                keys.addAll(redis.keys(p)); // run KEYS on each individual instance to collect matches
            }
        }
        ShardedJedisPipeline pipeline = shardedJedis.pipelined();
        for (String key : keys) {
            pipeline.expire(key, seconds);
        }
        pipeline.sync();
    }
}
After splitting across multiple instances, the effect was significant: writes at peak hours are spread out, load is shared evenly, total usable memory scales with the instance count, and keys end up distributed roughly evenly (with --maxmemory-policy volatile-lru). Actual production figures:
[hadoop@hadoop-master1 redis]$ sh stat_cluster.sh
* [ ============================================================> ] 4 / 4
hadoop-master1:
# Memory
used_memory:44287785776
used_memory_human:41.25G
used_memory_rss:67458658304
used_memory_peak:67981990576
used_memory_peak_human:63.31G
used_memory_lua:33792
mem_fragmentation_ratio:1.52
mem_allocator:jemalloc-3.6.0
# Keyspace
db0:keys=72729777,expires=11967,avg_ttl=63510023
hadoop-master2:
# Memory
used_memory:50667945344
used_memory_human:47.19G
used_memory_rss:66036752384
used_memory_peak:64424543672
used_memory_peak_human:60.00G
used_memory_lua:33792
mem_fragmentation_ratio:1.30
mem_allocator:jemalloc-3.6.0
# Keyspace
db0:keys=100697581,expires=13426,avg_ttl=63509903
hadoop-master3:
# Memory
used_memory:56763389184
used_memory_human:52.87G
used_memory_rss:66324045824
used_memory_peak:64424546136
used_memory_peak_human:60.00G
used_memory_lua:33792
mem_fragmentation_ratio:1.17
mem_allocator:jemalloc-3.6.0
# Keyspace
db0:keys=94363547,expires=13544,avg_ttl=63505693
hadoop-master4:
# Memory
used_memory:54513952832
used_memory_human:50.77G
used_memory_rss:67257393152
used_memory_peak:64820124928
used_memory_peak_human:60.37G
used_memory_lua:33792
mem_fragmentation_ratio:1.23
mem_allocator:jemalloc-3.6.0
# Keyspace
db0:keys=83297543,expires=12418,avg_ttl=63507046
Finished processing 4 / 4 hosts in 298.89 ms
Storage Optimization
In practice we store a huge number of simple string key-value pairs, which is quite memory-hungry. Hashes (when internally encoded as a ziplist) use memory far more efficiently.
Note that only ziplist-encoded hashes save memory! A hash that has grown into the hashtable encoding wastes it.
The official docs compare plain key-value pairs with hashes as follows (setting aside per-key features in Redis); small hashes get a dedicated optimization:
a few keys use a lot more memory than a single key containing a hash with a few fields.
We use a trick.
But many times hashes contain just a few fields. When hashes are small we can instead just encode them in an O(N) data structure, like a linear array with length-prefixed key value pairs. Since we do this only when N is small, the amortized time for HGET and HSET commands is still O(1).
This does not only work well from the point of view of time complexity, but also from the point of view of constant times, since a linear array of key value pairs happens to play very well with the CPU cache (it has a better cache locality than a hash table).
The optimization hinges on two ziplist parameters, which trade CPU for memory. The default for entries is fine as-is; value is best kept below 254 bytes (once a ziplist entry reaches 254 bytes, the next entry's prevlen field, which records the previous entry's length, grows from 1 byte to 5 bytes).
hash-max-zipmap-entries 512 (hash-max-ziplist-entries for Redis >= 2.6)
hash-max-zipmap-value 64 (hash-max-ziplist-value for Redis >= 2.6)
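The prevlen rule behind the 254-byte advice can be written out directly. This is a sketch of the bookkeeping, not Redis's actual source: each ziplist entry stores the length of the previous entry so the list can be walked backwards, and that field is 1 byte for lengths below 254 but 5 bytes (a marker byte plus a 4-byte length) from 254 up.

```java
public class PrevLen {
    // Bytes needed by a ziplist entry's prevlen field, given the length in
    // bytes of the entry that precedes it.
    public static int prevLenBytes(int prevEntryLength) {
        return prevEntryLength < 254 ? 1 : 5;
    }
}
```

So one oversized value does not just cost its own bytes: it also forces the following entry to spend 4 extra bytes of bookkeeping.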
A few sample entries:
3:0dc46077dfaa4970a1ec9f38cfc29277fa9e1012.ime.galileo.baidu.com -> 1469584847
3:co4hk52ia0b1.5buzd.com -> 1468859527
1:119.84.110.82_39502 -> 1469666877
The original key content is not needed afterwards, and since keys containing full domain names are long, we simply MD5 each key. Estimating with 100 million key-value pairs: take the first 5 hex chars of the MD5 as the hash key, and the remaining 27 chars as the field inside that hash.
We scan the original Redis instance and write the converted key-value pairs into a new instance. The conversion code in Scala:
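The split described above can be sketched as follows; the class and method names are made up for illustration, but the digest layout (5-char hash key, 27-char field) matches the scheme:

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class KeySplit {
    // Returns { hashKey, field }: the first 5 hex chars of the MD5 pick one
    // of 16^5 ≈ 1M small hashes; the remaining 27 chars become the field.
    public static String[] split(String key) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            byte[] d = md.digest(key.getBytes(StandardCharsets.UTF_8));
            String hex = String.format("%032x", new BigInteger(1, d)); // zero-padded 32-char digest
            return new String[] { hex.substring(0, 5), hex.substring(5) };
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e); // MD5 is always available on the JVM
        }
    }
}
```

With ~100M keys over ~1M hashes, each hash holds on the order of 100 fields, keeping it within the ziplist thresholds above.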
import java.util.{List => JList}
import org.apache.commons.codec.digest.DigestUtils
import redis.clients.jedis._
import scala.collection.JavaConversions._

trait RedisUtils {
  def md5(data: String): String = {
    DigestUtils.md5Hex(data)
  }

  def Type(redis: Jedis, key: String) = redis.`type`(key)

  def scan(redis: Jedis)(action: JList[String] => Unit): Unit = {
    import scala.util.control.Breaks._
    var cursor = "0"
    breakable {
      while (true) {
        val res = redis.scan(cursor)
        action(res.getResult())
        cursor = res.getStringCursor
        if (cursor.equals("0")) {
          break
        }
      }
    }
  }

  def printInfo(redis: Jedis): Unit = {
    println(redis.info())
  }

  // Verification:
  // Print the *total* number of key-value pairs:
  // eval "local aks=redis.call('keys', '*'); local res=0; for i,r in ipairs(aks) do res=res+redis.call('hlen', r) end; return res" 0
  // Print the number of key-value pairs in *each* hash:
  // eval "local aks=redis.call('keys', '*'); local res={}; for i,r in ipairs(aks) do res[i]=redis.call('hlen', r) end; return res" 0
}

object RedisTransfer extends RedisUtils {
  def handle(key: String, value: String, tp: Pipeline): Unit = {
    val m5 = md5(key)
    tp.hset(m5.substring(0, 5), m5.substring(5), value)
  }

  def main(args: Array[String]) {
    val Array(sHost, sPort, tHost, tPort) = args
    val timeout = 60 * 1000
    val source = new Jedis(sHost, sPort.toInt, timeout)
    val sp = source.pipelined()
    val target = new Jedis(tHost, tPort.toInt, timeout)
    val tp = target.pipelined()
    scan(source) { keys =>
      // only handles records of type string
      val requests = for (key <- keys) yield Some((key, sp.get(key)))
      sp.sync()
      for (
        request <- requests;
        (key, resp) <- request
      ) {
        try {
          handle(key, resp.get(), tp)
        } catch {
          case e: Exception => println(s"fetch $key with exception, ${e.getMessage}")
        }
      }
    }
    tp.sync()
    printInfo(target)
    target.close()
    source.close()
  }
}
Since the data was transformed along the way, a clean before/after comparison is not possible, so I cannot claim an exact saving. But with this conversion in place, an instance that used to take 30G (roughly 300+ million keys) shrank to 15G.
Another Case
We also ran a test against the domain-name instance: 6.4 million key-value pairs occupying 707.29M of memory.
The first 4 chars of the MD5 serve as the hash key, producing 65536 hashes of roughly 100 kv pairs each.
With the original key kept as the field inside the hash:
Without raising ziplist_value, the converted hashes fall into the hashtable encoding: 939.6M.
With ziplist_value raised to 1024, they stay ziplist-encoded: 513.78M.
Using the full MD5 as the new field instead of the original key: 344.7M.
Using the last 28 chars of the MD5 as the new field: 259.09M.
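A quick sanity check of the bucketing arithmetic above (helper names are made up):

```java
public class BucketMath {
    // Number of buckets addressable by a hex prefix: each hex char is 4 bits.
    public static long buckets(int hexChars) {
        return 1L << (4 * hexChars);
    }

    // Average fields per hash when totalKeys are spread over those buckets.
    public static long avgFields(long totalKeys, int hexChars) {
        return totalKeys / buckets(hexChars);
    }
}
```

A 4-char prefix gives 16^4 = 65536 buckets, so 6.4M keys average just under 100 fields per hash, comfortably under the 512-entry ziplist limit; the 5-char prefix used for the 100M-key estimate gives 16^5 = 1048576 buckets at a similar ~100 fields each.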
For example:
MD5:
3:0dc46077dfaa4970a1ec9f38cfc29277fa9e1012.ime.galileo.baidu.com
1356de078028ddf266c962533760b27c
1356 -> hash( 3:0dc46077dfaa4970a1ec9f38cfc29277fa9e1012.ime.galileo.baidu.com -> 1469584847 )
1356 -> hash( 1356de078028ddf266c962533760b27c -> 1469584847 )
1356 -> hash( de078028ddf266c962533760b27c -> 1469584847 )
–END