Winse Blog

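Every stop along the way is scenery, all the bustle heads toward something better, all the busyness is for tomorrow, so what is there to fear.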

Rsync vs. scp: Advantages

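Today, while using flume to write data to kafka, I copied the data over from another directory with cp, and the flume collector reported an error: the file changed while the program was reading it.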

07 Mar 2016 16:46:05,535 ERROR [pool-3-thread-1] (org.apache.flume.source.SpoolDirectorySource$SpoolDirectoryRunnable.run:256)  - FATAL: Spool Directory source s1: { spoolDir: /home/hadoop/flume/data/ }: Uncaught exception in SpoolDirectorySource thread. Restart or reconfigure Flume to continue processing.
java.lang.IllegalStateException: File has changed size since being read: /home/hadoop/flume/data/hbase-hadoop-master-cu2.log
        at org.apache.flume.client.avro.ReliableSpoolingFileEventReader.retireCurrentFile(ReliableSpoolingFileEventReader.java:326)
        at org.apache.flume.client.avro.ReliableSpoolingFileEventReader.readEvents(ReliableSpoolingFileEventReader.java:259)

This reminded me of scp and rsync: rsync appears to include a rename step. There is also plenty of material online comparing the two tools.

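Here I focus only on the last point, which is critical for programs that collect files by name! Below, inotify monitors the directory to show which operations occur during scp and rsync: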

[hadoop@cu2 test]$ scp -r source target/
[hadoop@cu2 test]$ rm target/source/1234
[hadoop@cu2 test]$ rsync -vaz source target/
sending incremental file list
source/
source/1234

sent 141 bytes  received 35 bytes  352.00 bytes/sec
total size is 34  speedup is 0.19

The corresponding inotify output is:

[hadoop@cu2 test]$ inotifywait -m target/source/ # yum install -y inotify*
Setting up watches.
Watches established.
target/source/ CREATE 1234
target/source/ OPEN 1234
target/source/ MODIFY 1234
target/source/ CLOSE_WRITE,CLOSE 1234

target/source/ DELETE 1234

target/source/ ATTRIB,ISDIR 
target/source/ CREATE .1234.ARUg56
target/source/ OPEN .1234.ARUg56
target/source/ ATTRIB .1234.ARUg56
target/source/ MODIFY .1234.ARUg56
target/source/ CLOSE_WRITE,CLOSE .1234.ARUg56
target/source/ ATTRIB .1234.ARUg56
target/source/ MOVED_FROM .1234.ARUg56
target/source/ MOVED_TO 1234

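rsync first writes the content to a temporary file and, once the copy is complete, renames it to the final name. scp, by contrast, creates the file under its final name and writes into it directly, so a collector can pick it up while it is still growing.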
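When rsync is not available, the same write-then-rename pattern can be approximated by hand. The following is only a minimal sketch, not rsync's actual implementation: `atomic_cp` is a hypothetical helper name, and it assumes the temporary file and the destination live in the same directory (so that `mv` is a plain rename on one filesystem).

```shell
#!/bin/sh
# atomic_cp SRC DEST  (hypothetical helper, for illustration only)
# Copy SRC into a hidden temporary file next to DEST, then rename it to DEST,
# so the final name only ever appears with complete content.
atomic_cp() {
    src=$1
    dest=$2
    dest_dir=$(dirname "$dest")
    # The temp file is created in the destination directory so that the final
    # mv is a rename within one filesystem rather than a second copy.
    tmp=$(mktemp "$dest_dir/.$(basename "$dest").XXXXXX") || return 1
    # -p carries over the source's mode and timestamps (mktemp creates 0600 files).
    if cp -p "$src" "$tmp"; then
        mv -f "$tmp" "$dest"
    else
        rm -f "$tmp"
        return 1
    fi
}
```

For example, `atomic_cp source/1234 target/source/1234` should produce an event sequence similar to the rsync one above: the writes happen on a dot-prefixed temporary name, and `1234` shows up only through the final rename.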

In production, prefer rsync for copying or syncing files and directories: it is both fast and safe.

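There is also the quirky trick of quickly deleting a directory holding a huge number of files, which likewise uses rsync, by syncing an empty directory over it: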

rsync --delete-before -d /data/blank/ /var/spool/clientmqueue/ 

rsync --delete-before -a -H -v --progress --stats /tmp/test/ log/

–END
