Winse Blog

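Every stop along the way is scenery; all the hustle is headed toward something better; all the busyness is for tomorrow. What is there to fear?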

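Using a Proxy in Java, Based on Shadowsocks

During development we occasionally need a proxy to reach the resources we need, for example to debug an application running on a production cluster, or to get past the firewall from Java. Once a global proxy gives us access, new requirements follow, for example routing only specific resources through the proxy.

The key points are listed briefly below for reference:

The official JDK documentation covers essentially everything; the other links are supplementary reading that shows other people's real requirements and how they solved them.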

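Java application-wide proxy (global)

  • Via HTTP

Converting Shadowsocks to HTTP: I set this up earlier when installing Kubernetes with Docker from behind the firewall; see: Privoxy

Alternatively, use the Shadowsocks client's own Enable System Proxy -> System Proxy Mode -> Global Mode to do the conversion and turn on its HTTP proxy. (In global mode, the client exposes the SOCKS proxy locally as an HTTP proxy.)
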
-Dhttp.proxyHost=127.0.0.1
-Dhttp.proxyPort=7070
-Dhttps.proxyHost=127.0.0.1
-Dhttps.proxyPort=7070
-Dhttp.nonProxyHosts="localhost|127.0.0.1|192.168.*"
  • http.proxyHost: the host name of the proxy server
  • http.proxyPort: the port number, the default value being 80.
  • http.nonProxyHosts: a list of hosts that should be reached directly, bypassing the proxy. This is a list of patterns separated by ‘|’. The patterns may start or end with a ‘*’ for wildcards. Any host matching one of these patterns will be reached through a direct connection instead of through a proxy.
  • Via SOCKS
-DsocksProxyHost=127.0.0.1 -DsocksProxyPort=7070
  • Use the system proxy
-Djava.net.useSystemProxies=true
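
One way to see what the http.* properties do, without sending any traffic, is to ask the JDK's default ProxySelector which proxy it would pick for a URL. The following is a minimal sketch and not part of the original setup: the class name GlobalProxyCheck and the test URLs are made up for illustration, and the properties are set in code only to keep the example self-contained (normally they come from the -D options above).

```java
import java.net.Proxy;
import java.net.ProxySelector;
import java.net.URI;
import java.util.List;

// Sketch: query the default ProxySelector after setting the global http.* properties.
// Nothing here opens a network connection.
final class GlobalProxyCheck {

    // Returns the proxies the default selector proposes for the given URL.
    static List<Proxy> proxiesFor(String url) {
        return ProxySelector.getDefault().select(URI.create(url));
    }

    public static void main(String[] args) {
        // Same values as the -D options above; 7070 is the local port assumed there.
        System.setProperty("http.proxyHost", "127.0.0.1");
        System.setProperty("http.proxyPort", "7070");
        System.setProperty("http.nonProxyHosts", "localhost|127.0.0.1|192.168.*");

        // An ordinary host should be routed through the HTTP proxy.
        System.out.println("example.com  -> " + proxiesFor("http://example.com/"));
        // A host matching the 192.168.* pattern should be reached directly.
        System.out.println("192.168.1.10 -> " + proxiesFor("http://192.168.1.10/"));
    }
}
```

The same check is handy after changing http.nonProxyHosts, since a typo in the pattern list silently sends traffic the wrong way.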

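Partial / per-use configuration (automatic switching)

  • Set the proxy temporarily inside the application with setProperty (flawed: while the properties are set, the proxy is effectively global; not recommended)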
System.setProperty("http.proxyHost", proxyHost);
System.setProperty("http.proxyPort", proxyPort);
System.setProperty("https.proxyHost", proxyHost);
System.setProperty("https.proxyPort", proxyPort);

When done, clear the settings:

System.clearProperty("http.proxyHost");
...
  • Specify the proxy per request:
SocketAddress addr = new InetSocketAddress("webcache.example.com", 8080);
Proxy proxy = new Proxy(Proxy.Type.HTTP, addr);

URL url = new URL("http://java.example.org/");
URLConnection conn = url.openConnection(proxy);
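
Since this post is built around Shadowsocks, the same per-request approach can point at the local SOCKS port directly instead of an HTTP proxy. This is a minimal sketch under assumptions: the class name SocksPerRequest and the helper localSocks are hypothetical, and port 7070 is the local Shadowsocks port used in the -D examples earlier.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Proxy;
import java.net.URI;
import java.net.URLConnection;

// Sketch: use a local SOCKS proxy for one connection only,
// leaving every other connection in the JVM untouched.
final class SocksPerRequest {

    // Describes a SOCKS proxy on 127.0.0.1; nothing is connected yet.
    static Proxy localSocks(int port) {
        return new Proxy(Proxy.Type.SOCKS, new InetSocketAddress("127.0.0.1", port));
    }

    public static void main(String[] args) throws IOException {
        Proxy proxy = localSocks(7070);
        // openConnection() only creates the connection object;
        // the socket is opened later, on connect() or the first read.
        URLConnection conn = URI.create("http://java.example.org/").toURL().openConnection(proxy);
        conn.setConnectTimeout(5000);
        conn.setReadTimeout(5000);
        System.out.println("Requests on this connection will go through " + proxy.type());
    }
}
```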
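  • (Selectively) configure which requests go through the proxy: ProxySelector

Before any request reaches the network, it is intercepted by the ProxySelector, which returns a matching Proxy (or Proxy.NO_PROXY) according to its rules; the connection is then made through that proxy.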

As you can see, with Java SE 5.0, the developer gains quite a bit of control and flexibility when it comes to proxies. Still, there are situations where one would like to decide which proxy to use dynamically, for instance to do some load balancing between proxies, or depending on the destination, in which case the API described so far would be quite cumbersome. That’s where the ProxySelector comes into play.

The best thing about the ProxySelector is that it is plugable! Which means that if you have needs that are not covered by the default one, you can write a replacement for it and plug it in!

The official JDK documentation is essentially all you need; it is very thorough. See also: URLs and URIs, Proxies and Passwords

Register the custom selector:

public static void main(String[] args) {
    MyProxySelector ps = new MyProxySelector(ProxySelector.getDefault());
    ProxySelector.setDefault(ps);
    // rest of the application
}

Selector implementation:

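import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Proxy;
import java.net.ProxySelector;
import java.net.SocketAddress;
import java.net.URI;
import java.util.ArrayList;
import java.util.HashMap;

public class MyProxySelector extends ProxySelector {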
    // Keep a reference on the previous default
    ProxySelector defsel = null;
    
    /*
     * Inner class representing a Proxy and a few extra data
     */
    class InnerProxy {
        Proxy proxy;
        SocketAddress addr;
        // How many times did we fail to reach this proxy?
        int failedCount = 0;
        
        InnerProxy(InetSocketAddress a) {
            this(a, Proxy.Type.HTTP);
        }
        
        InnerProxy(InetSocketAddress a, Proxy.Type type) {
            addr = a;
            proxy = new Proxy(type, a);
        }
        
        SocketAddress address() {
            return addr;
        }
        
        Proxy toProxy() {
            return proxy;
        }
        
        int failed() {
            return ++failedCount;
        }
    }
    
    /*
     * A list of proxies, indexed by their address.
     */
    HashMap<SocketAddress, InnerProxy> proxies = new HashMap<SocketAddress, InnerProxy>();

    MyProxySelector(ProxySelector def) {
        // Save the previous default
        defsel = def;
        
        // Populate the HashMap (List of proxies)
        InnerProxy i = new InnerProxy(new InetSocketAddress("webcache1.example.com", 8080));
        proxies.put(i.address(), i);
        i = new InnerProxy(new InetSocketAddress("webcache2.example.com", 8080));
        proxies.put(i.address(), i);
        i = new InnerProxy(new InetSocketAddress("webcache3.example.com", 8080));
        proxies.put(i.address(), i);
    }
        
    /*
     * This is the method that the handlers will call.
     * Returns a List of proxy.
     */
    public java.util.List<Proxy> select(URI uri) {
        // Let's stick to the specs. 
        if (uri == null) {
            throw new IllegalArgumentException("URI can't be null.");
        }
        
        /* You can plug in your own rules/configuration here.
         * If it's a http (or https) URL, then we use our own list.
         */
        String protocol = uri.getScheme();
        if ("http".equalsIgnoreCase(protocol) ||
                "https".equalsIgnoreCase(protocol)) {
            ArrayList<Proxy> l = new ArrayList<Proxy>();
            for (InnerProxy p : proxies.values()) {
                l.add(p.toProxy());
            }
            return l;
        }
        
        /*
         * Not HTTP or HTTPS (could be SOCKS or FTP)
         * defer to the default selector.
         */
        if (defsel != null) {
            return defsel.select(uri);
        } else {
            ArrayList<Proxy> l = new ArrayList<Proxy>();
            l.add(Proxy.NO_PROXY);
            return l;
        }
    }
    
    /*
     * Method called by the handlers when it failed to connect
     * to one of the proxies returned by select().
     */
    public void connectFailed(URI uri, SocketAddress sa, IOException ioe) {
        // Let's stick to the specs again.
        if (uri == null || sa == null || ioe == null) {
            throw new IllegalArgumentException("Arguments can't be null.");
        }
        
        /*
         * Let's lookup for the proxy 
         */
        InnerProxy p = proxies.get(sa); 
        if (p != null) {
            /*
             * It's one of ours, if it failed more than 3 times
             * let's remove it from the list.
             */
            if (p.failed() >= 3)
                    proxies.remove(sa);
        } else {
            /*
             * Not one of ours, let's delegate to the default.
             */
            if (defsel != null)
              defsel.connectFailed(uri, sa, ioe);
        }
    }
}
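
The requirement from the introduction, sending only specific resources through the proxy, can be expressed with a much smaller selector than the one above. The following is a sketch under assumptions, not the JDK's or the author's implementation: the class name HostRuleProxySelector, the suffix-matching rule, and the host example.org are invented for illustration, and List.of / Set.of require Java 9 or later.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Proxy;
import java.net.ProxySelector;
import java.net.SocketAddress;
import java.net.URI;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical selector: hosts equal to, or ending in, one of the configured
// suffixes use the proxy; everything else connects directly.
final class HostRuleProxySelector extends ProxySelector {
    private final Proxy proxy;
    private final Set<String> suffixes; // lower-case host suffixes, e.g. "example.org"

    HostRuleProxySelector(Proxy proxy, Set<String> suffixes) {
        this.proxy = proxy;
        this.suffixes = suffixes;
    }

    @Override
    public List<Proxy> select(URI uri) {
        if (uri == null) {
            throw new IllegalArgumentException("URI can't be null.");
        }
        String host = uri.getHost();
        if (host != null) {
            String h = host.toLowerCase(Locale.ROOT);
            for (String s : suffixes) {
                if (h.equals(s) || h.endsWith("." + s)) {
                    return List.of(proxy);
                }
            }
        }
        return List.of(Proxy.NO_PROXY);
    }

    @Override
    public void connectFailed(URI uri, SocketAddress sa, IOException ioe) {
        // This sketch keeps no failure statistics; it only reports the problem.
        System.err.println("Proxy " + sa + " failed for " + uri + ": " + ioe);
    }

    public static void main(String[] args) {
        // 127.0.0.1:7070 is the local Shadowsocks port assumed throughout this post.
        Proxy socks = new Proxy(Proxy.Type.SOCKS, new InetSocketAddress("127.0.0.1", 7070));
        ProxySelector selector = new HostRuleProxySelector(socks, Set.of("example.org"));
        ProxySelector.setDefault(selector);

        System.out.println(selector.select(URI.create("http://java.example.org/")));
        System.out.println(selector.select(URI.create("http://localhost/")));
    }
}
```

Keeping the rule in one place (the suffix set) makes it easy to load the list from a configuration file later, which is the kind of "own rules/configuration" the comment in select() above refers to.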

–END

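Collecting Website Access Logs with Logstash

I recently picked up elasticsearch, logstash, and kibana again; together they make a very handy logging stack.

Meanwhile, I have not updated the site or paid much attention to it for a long time. It does have the cnzz (umeng) traffic statistics, but those only cover a short window. If I could export all the logs, I could use ELK to analyze how the site's articles were visited over the past year.

I bought an Alibaba Cloud VPN server a while ago, which can be put to good use here. The page sends each access record to logstash over HTTP, logstash stores it, and after some time we can come back and analyze the logs. ^^

Start the Logstash collector service
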
~/logstash-6.1.2/bin/logstash -e '
input { 
  http { 
    port => 20000 
    response_headers => {
      "Access-Control-Allow-Origin" => "*"
      "Content-Type" => "application/json"
      "Access-Control-Allow-Headers" => "Origin, X-Requested-With, Content-Type, Accept"
    }
  } 
} 
filter {
  if [message] =~ /^\s*$/ {
    drop { }
  }
  
  json {
    source => "message"
  }
  json {
    source => "location"
    target => "location"
  }
  mutate {
    remove_field => [ "headers" ]
  }
}
output { 
  file { 
    path => "winse-accesslog-%{+YYYY-MM-dd}.log"
    codec => json_lines 
  } 
} 
'

Send the access record from the page

$.ajax({
  type: "POST",
  url: "http://SERVER:PORT",
  data: JSON.stringify({
    title: document.title,
    location: JSON.stringify(location),
    referrer: document.referrer,
    userAgent: navigator.userAgent
  }),
  contentType: "application/json; charset=utf-8",
  dataType: "json"
});

–END

Gitalk on Octopress

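I added Duoshuo (多说) to the blog before, and the steps are much the same. It comes down to calling a third-party service and storing the comment data with that third party.

Start with the gitalk documentation. There are four steps:

  • Register a GitHub OAuth App
  • Add a div container
  • Include the CSS and JS dependencies
  • Call the JavaScript to render the comments

Configuration

Register a GitHub application

In _layouts/post.html, add a gitalk-container node under Comments:

(After pasting, remove the spaces between the braces and the percent signs.)
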
{ % if site.disqus_short_name and page.comments == true % }
  <section>
    <h1>Comments</h1>
<!-- gitalk评论 start -->
    <div id="gitalk-container"></div> 
<!-- gitalk评论 end -->
  </section>
{ % endif % }

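Add a gitalk.html page under the _includes directory, containing the dependencies and the initialization code:

Here clientID and clientSecret are the id and secret of the application registered in the first step.

I adjusted the example from the official documentation in three places: id, body, and createIssueManually. The code finds the matching issue by querying on labels + id: see the issue-query source.

(After pasting, remove the spaces between the braces and the percent signs.)
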
{ % if site.disqus_short_name and page.comments != false % }

<link rel="stylesheet" href="https://cdn.jsdelivr.net/npm/gitalk@1/dist/gitalk.css">
<script src="https://cdn.jsdelivr.net/npm/gitalk@1/dist/gitalk.min.js"></script>

<script>

var gitalk = new Gitalk({
  clientID: 'c14f68eac6330d15d984',
  clientSecret: '73b7c1fffa98e299ff0cdd332821201933858e6e',
  repo: 'winse.github.com',
  owner: 'winse',
  admin: ['winse'],
  id: location.pathname,
  labels: ['Gitalk'],
  body: "http://winse.github.io" + location.pathname,
  createIssueManually: true,
  
  // facebook-like distraction free mode
  distractionFreeMode: false
})

gitalk.render('gitalk-container')

</script>

{ % endif % }

Then, in after_footer.html in the same directory, add a reference to this new page (after pasting, remove the spaces between the braces and the percent signs):

{ % include gitalk.html % }

Initialization

This simply means creating an issue in the corresponding repo; just pay attention to the labels rule:

username="winse" # GitHub user name

# https://github.com/settings/tokens
new_token=""  # GitHub token
repo_name="winse.github.com" # repository that stores the issues

sitemap_url="sitemap.xml" # sitemap
kind="Gitalk"

# This can be combined with git status: for each added post, run the command to create an issue

# Instead of a token, you can also type the password by hand
curl -H "Accept: application/json" -X POST -d '{"body": "http://winseliu.com/blog/2017/11/20/sed-debug-sedsed/", "labels": ["Gitalk", "/blog/2017/11/20/sed-debug-sedsed/"], "title": "gitalk 测试" }' -u $username https://api.github.com/repos/$username/$repo_name/issues
Enter host password for user 'winse':

OR

# https://developer.github.com/v3/auth/#basic-authentication
curl -u username:token https://api.github.com/user

References

–END

Sed Debug: Sedsed

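The previous post converted HTML to rst, but the links between pages were all broken. Each heading needs a tag (label) in front of it; the final result looks like this:
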

.. _Creating Objects in New Mappings:

Creating Objects in New Mappings
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

+ :ref:`Creating Objects in New Mappings`

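I wanted to do this with sed, which requires some of sed's advanced features, but sed cannot be debugged out of the box. Here I use sedsed to show the pattern space and the hold space after every command, a bit like print debugging. It helps a lot in understanding sed, especially how data moves between the hold space and the pattern space.

Install sedsed
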
cd /opt/
git clone https://github.com/aureliojargas/sedsed

Here is some actual debug output:

[root@ansible sedsed]# echo -e "one\ntwo\nthree\nfour" | ./sedsed.py -d -f test/scripts/sort.gnu.sed 
PATT:one$
HOLD:$
COMM:H
PATT:one$
HOLD:\none$
COMM:$ !d
PATT:two$
HOLD:\none$
COMM:H
PATT:two$
HOLD:\none\ntwo$
COMM:$ !d
PATT:three$
HOLD:\none\ntwo$
COMM:H
PATT:three$
HOLD:\none\ntwo\nthree$
COMM:$ !d
PATT:four$
HOLD:\none\ntwo\nthree$
COMM:H
PATT:four$
HOLD:\none\ntwo\nthree\nfour$
COMM:$ !d
PATT:four$
HOLD:\none\ntwo\nthree\nfour$
COMM:g
PATT:\none\ntwo\nthree\nfour$
HOLD:\none\ntwo\nthree\nfour$
COMM:s/.//
PATT:one\ntwo\nthree\nfour$
HOLD:\none\ntwo\nthree\nfour$
COMM:s/\n/&L&l/g
PATT:one\nL\nltwo\nL\nlthree\nL\nlfour$
HOLD:\none\ntwo\nthree\nfour$
COMM:s/^/\\Na/
PATT:\naone\nL\nltwo\nL\nlthree\nL\nlfour$
HOLD:\none\ntwo\nthree\nfour$
COMM:s/\nL/\\NA/
PATT:\naone\nA\nltwo\nL\nlthree\nL\nlfour$
HOLD:\none\ntwo\nthree\nfour$
COMM:s/$/\\NL/
PATT:\naone\nA\nltwo\nL\nlthree\nL\nlfour\nL$
HOLD:\none\ntwo\nthree\nfour$
COMM:b start
COMM:/\nA$/ b exit
COMM:s/\nb/\\Nl/
PATT:\naone\nA\nltwo\nL\nlthree\nL\nlfour\nL$
HOLD:\none\ntwo\nthree\nfour$
COMM:s/\nB/\\NL/
PATT:\naone\nA\nltwo\nL\nlthree\nL\nlfour\nL$
HOLD:\none\ntwo\nthree\nfour$
COMM:s/\(\na.*\nA\)\nl\([^\n]*\)\nL/\1\\Nb\2\\NB/
PATT:\naone\nA\nbtwo\nB\nlthree\nL\nlfour\nL$
HOLD:\none\ntwo\nthree\nfour$
COMM::sort
COMM:h
......

[root@ansible sedsed]# (date +'%w %d' ; date +'%-m %Y') | ./sedsed.py -d -f test/scripts/cal.sed
......

An example from the web

I came across a forum post that uses sed to delete the two lines before and the three lines after a match. I could not quite follow it: the post mainly walks through the flow and never shows what the data actually looks like at each step. With sedsed, a single run makes everything clear:

sedsed.py has some trouble with the + address form, so here I only handle the two lines before the match, to see exactly how the data flows:

[root@ansible sedsed]# echo -e "1\n2\n3\n4\n5\n6\n7\n8\n9\n10" | ./sedsed.py -d '/5/d;:go;1,2!{P;N;D};N;bgo' 
PATT:1$
HOLD:$
COMM:/5/ d
PATT:1$
HOLD:$
COMM::go
COMM:1,2 !{
COMM:N
PATT:1\n2$
HOLD:$
COMM:b go
COMM:1,2 !{
COMM:N
PATT:1\n2\n3$
HOLD:$
COMM:b go
COMM:1,2 !{
COMM:P
1
PATT:1\n2\n3$
HOLD:$
COMM:N
PATT:1\n2\n3\n4$
HOLD:$
COMM:D
PATT:2\n3\n4$
HOLD:$
COMM:/5/ d
PATT:2\n3\n4$
HOLD:$
COMM::go
COMM:1,2 !{
COMM:P
2
PATT:2\n3\n4$
HOLD:$
COMM:N
PATT:2\n3\n4\n5$
HOLD:$
COMM:D
PATT:3\n4\n5$
HOLD:$
COMM:/5/ d
PATT:6$
HOLD:$
COMM:/5/ d
PATT:6$
HOLD:$
COMM::go
COMM:1,2 !{
COMM:P
6
PATT:6$
HOLD:$
COMM:N
PATT:6\n7$
HOLD:$
COMM:D
PATT:7$
HOLD:$
COMM:/5/ d
PATT:7$
HOLD:$
COMM::go
COMM:1,2 !{
COMM:P
7
PATT:7$
HOLD:$
COMM:N
PATT:7\n8$
HOLD:$
COMM:D
PATT:8$
HOLD:$
COMM:/5/ d
PATT:8$
HOLD:$
COMM::go
COMM:1,2 !{
COMM:P
8
PATT:8$
HOLD:$
COMM:N
PATT:8\n9$
HOLD:$
COMM:D
PATT:9$
HOLD:$
COMM:/5/ d
PATT:9$
HOLD:$
COMM::go
COMM:1,2 !{
COMM:P
9
PATT:9$
HOLD:$
COMM:N
PATT:9\n10$
HOLD:$
COMM:D
PATT:10$
HOLD:$
COMM:/5/ d
PATT:10$
HOLD:$
COMM::go
COMM:1,2 !{
COMM:P
10
PATT:10$
HOLD:$
COMM:N
10

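You can see that the PATT pattern space keeps the two preceding lines joined to the current one; by the time 5 is matched, the pattern space actually contains 3\n4\n5, so executing d deletes the two preceding lines as well.

The command prints the last line twice: on the last line, N tries to read one more line, hits end of input, and sed exits right there (GNU sed prints the pattern space before exiting) without ever running D. The sed documentation describes it as follows:
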
  `D'
      If pattern space contains no newline, start a normal new cycle as
      if the `d' command was issued.  Otherwise, delete text in the
      pattern space up to the first newline, and restart cycle with the
      resultant pattern space, without reading a new line of input.
  
  `N'
      Add a newline to the pattern space, then append the next line of
      input to the pattern space.  If there is no more input then `sed'
      exits without processing any more commands.

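The fix: once the last line has been reached, do not read another line ($!N). Run with plain sed, the full version that also deletes the three lines after the match is:
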
[root@ansible sedsed]# echo -e "1\n2\n3\n4\n5\n6\n7\n8\n9\n10" | sed '/5/,+3d;:go;1,2!{P;$!N;D};N;bgo' 

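Adding tags

Sphinx can reference labels defined anywhere in the documentation through ref. So it is enough to put a tag in front of every heading and then rewrite the link references into ref form.
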
# Add tags to the documents:
sed -i ' h;N; /\n=\+$/{ x;s/.*/.. _&:\n/;p; x };  P;D ' $(find . -name '*.rst')
sed -i ' h;N; /\n-\+$/{ x;s/.*/.. _&:\n/;p; x };  P;D ' $(find . -name '*.rst')


# Rewrite the link references:
sed 's/\(`[[:alnum:] ]*`\)_/:ref:\1/ ' $(find . -name '*.rst')

–END

Gitlab on Docker

./docker-download-mirror.sh sameersbn/redis sameersbn/gitlab:10.1.4 sameersbn/postgresql:9.6-2

# If you have other compose projects, it is best to put each docker-compose.yml in a directory with a *different name*!!
mkdir gitlab
cd !*

wget https://raw.githubusercontent.com/sameersbn/docker-gitlab/master/docker-compose.yml
sed -i '/GITLAB_HOST/s/.*/    - GITLAB_HOST=192.168.193.8/' docker-compose.yml 

docker-compose up -d

firewall-cmd --zone=public --add-port=80/tcp --permanent
firewall-cmd --reload

browser http://localhost:10080

UPDATE: 2018-10-10

[root@redmine gitlab]# wget https://raw.githubusercontent.com/sameersbn/docker-gitlab/master/docker-compose.yml

Change the postgres version to image: sameersbn/postgresql:9.6-2 (the same version redmine uses)

[root@redmine ~]# ./docker-download-mirror.sh sameersbn/redis:4.0.9-1 sameersbn/gitlab:11.3.4 

Edit GITLAB_HOST and GITLAB_PORT
[root@redmine gitlab]# docker-compose up -d

Then put it behind nginx for unified access: git.winseliu.com ...

–END