Sed Debug: Sedsed

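The previous post converted HTML to RST, but the links between pages broke in the process. Each heading needs a label (TAG) placed in front of it; the final result looks like this: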

.. _Creating Objects in New Mappings:

Creating Objects in New Mappings
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

+ :ref:`Creating Objects in New Mappings`

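Doing this with sed requires some of its more advanced features, but sed has no built-in way to debug a script. Here sedsed is used to show the pattern space and the hold space at every command, much like print debugging. It helps a lot in understanding sed, especially how data moves between the hold space and the pattern space.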
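For readers without sedsed at hand, the same print-debugging idea can be roughly approximated in plain sed with the `l` command, which prints the pattern space unambiguously (embedded newlines as `\n`, end of buffer as `$`). The sketch below is only an illustration, not how sedsed works internally: it appends each line to the hold space with `H`, then swaps buffers with `x` to dump both.

```shell
# Print-style debugging with plain sed.
#   H  append the current line to the hold space
#   l  dump the pattern space in unambiguous form
#   x  swap pattern space and hold space (so l can show the hold space)
out=$(printf 'one\ntwo\n' | sed -n 'H;l;x;l;x')
printf '%s\n' "$out"
# one$
# \none$
# two$
# \none\ntwo$
```

The second and fourth lines are the hold space, in the same notation sedsed uses for its HOLD: lines.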

Installing sedsed

Official documentation

cd /opt/
git clone https://github.com/aureliojargas/sedsed

Here is what the actual debug output looks like:

[root@ansible sedsed]# echo -e "one\ntwo\nthree\nfour" | ./sedsed.py -d -f test/scripts/sort.gnu.sed 
PATT:one$
HOLD:$
COMM:H
PATT:one$
HOLD:\none$
COMM:$ !d
PATT:two$
HOLD:\none$
COMM:H
PATT:two$
HOLD:\none\ntwo$
COMM:$ !d
PATT:three$
HOLD:\none\ntwo$
COMM:H
PATT:three$
HOLD:\none\ntwo\nthree$
COMM:$ !d
PATT:four$
HOLD:\none\ntwo\nthree$
COMM:H
PATT:four$
HOLD:\none\ntwo\nthree\nfour$
COMM:$ !d
PATT:four$
HOLD:\none\ntwo\nthree\nfour$
COMM:g
PATT:\none\ntwo\nthree\nfour$
HOLD:\none\ntwo\nthree\nfour$
COMM:s/.//
PATT:one\ntwo\nthree\nfour$
HOLD:\none\ntwo\nthree\nfour$
COMM:s/\n/&L&l/g
PATT:one\nL\nltwo\nL\nlthree\nL\nlfour$
HOLD:\none\ntwo\nthree\nfour$
COMM:s/^/\\Na/
PATT:\naone\nL\nltwo\nL\nlthree\nL\nlfour$
HOLD:\none\ntwo\nthree\nfour$
COMM:s/\nL/\\NA/
PATT:\naone\nA\nltwo\nL\nlthree\nL\nlfour$
HOLD:\none\ntwo\nthree\nfour$
COMM:s/$/\\NL/
PATT:\naone\nA\nltwo\nL\nlthree\nL\nlfour\nL$
HOLD:\none\ntwo\nthree\nfour$
COMM:b start
COMM:/\nA$/ b exit
COMM:s/\nb/\\Nl/
PATT:\naone\nA\nltwo\nL\nlthree\nL\nlfour\nL$
HOLD:\none\ntwo\nthree\nfour$
COMM:s/\nB/\\NL/
PATT:\naone\nA\nltwo\nL\nlthree\nL\nlfour\nL$
HOLD:\none\ntwo\nthree\nfour$
COMM:s/\(\na.*\nA\)\nl\([^\n]*\)\nL/\1\\Nb\2\\NB/
PATT:\naone\nA\nbtwo\nB\nlthree\nL\nlfour\nL$
HOLD:\none\ntwo\nthree\nfour$
COMM::sort
COMM:h
......

[root@ansible sedsed]# (date +'%w %d' ; date +'%-m %Y') | ./sedsed.py -d -f test/scripts/cal.sed
......

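An example from the web

I came across a forum post that uses sed to delete the two lines before and the three lines after a match. I could not quite follow it: the post only walks through the flow of commands and never shows what the data looks like at each step. With a tool like sedsed, a single run makes everything clear:

sedsed.py has some trouble with the + (plus sign), so only the "two lines before the match" part is handled here, to see how the data actually flows: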

[root@ansible sedsed]# echo -e "1\n2\n3\n4\n5\n6\n7\n8\n9\n10" | ./sedsed.py -d '/5/d;:go;1,2!{P;N;D};N;bgo' 
PATT:1$
HOLD:$
COMM:/5/ d
PATT:1$
HOLD:$
COMM::go
COMM:1,2 !{
COMM:N
PATT:1\n2$
HOLD:$
COMM:b go
COMM:1,2 !{
COMM:N
PATT:1\n2\n3$
HOLD:$
COMM:b go
COMM:1,2 !{
COMM:P
1
PATT:1\n2\n3$
HOLD:$
COMM:N
PATT:1\n2\n3\n4$
HOLD:$
COMM:D
PATT:2\n3\n4$
HOLD:$
COMM:/5/ d
PATT:2\n3\n4$
HOLD:$
COMM::go
COMM:1,2 !{
COMM:P
2
PATT:2\n3\n4$
HOLD:$
COMM:N
PATT:2\n3\n4\n5$
HOLD:$
COMM:D
PATT:3\n4\n5$
HOLD:$
COMM:/5/ d
PATT:6$
HOLD:$
COMM:/5/ d
PATT:6$
HOLD:$
COMM::go
COMM:1,2 !{
COMM:P
6
PATT:6$
HOLD:$
COMM:N
PATT:6\n7$
HOLD:$
COMM:D
PATT:7$
HOLD:$
COMM:/5/ d
PATT:7$
HOLD:$
COMM::go
COMM:1,2 !{
COMM:P
7
PATT:7$
HOLD:$
COMM:N
PATT:7\n8$
HOLD:$
COMM:D
PATT:8$
HOLD:$
COMM:/5/ d
PATT:8$
HOLD:$
COMM::go
COMM:1,2 !{
COMM:P
8
PATT:8$
HOLD:$
COMM:N
PATT:8\n9$
HOLD:$
COMM:D
PATT:9$
HOLD:$
COMM:/5/ d
PATT:9$
HOLD:$
COMM::go
COMM:1,2 !{
COMM:P
9
PATT:9$
HOLD:$
COMM:N
PATT:9\n10$
HOLD:$
COMM:D
PATT:10$
HOLD:$
COMM:/5/ d
PATT:10$
HOLD:$
COMM::go
COMM:1,2 !{
COMM:P
10
PATT:10$
HOLD:$
COMM:N
10

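As the trace shows, the PATT pattern space keeps the two preceding lines joined to the current one. At the moment 5 is matched, the pattern space actually contains 3\n4\n5, so executing d also deletes the two lines before the match.

This command prints the last line one extra time: on the last line, N tries to read one more line, hits end of input, and sed exits right there without ever running D. GNU sed prints the pattern space before exiting, which is the extra 10. The sed documentation describes the two commands as follows: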

  `D'
      If pattern space contains no newline, start a normal new cycle as
      if the `d' command was issued.  Otherwise, delete text in the
      pattern space up to the first newline, and restart cycle with the
      resultant pattern space, without reading a new line of input.
  
  `N'
      Add a newline to the pattern space, then append the next line of
      input to the pattern space.  If there is no more input then `sed'
      exits without processing any more commands.
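The end-of-input behaviour of N can be seen in isolation with a one-line input. This is a small sketch that assumes GNU sed with POSIXLY_CORRECT unset (other implementations may discard the pattern space instead of printing it); the variable names are only for illustration.

```shell
# Assumes GNU sed with POSIXLY_CORRECT unset.
# N on the only (and therefore last) line: sed prints the pattern space
# and exits, so the following s/^/X/ never runs.
minimal=$(printf 'a\n' | sed 'N;s/^/X/')
printf '%s\n' "$minimal"
# a

# Same effect in the command traced above: P prints 10, then the failing N
# prints the pattern space (10) once more on exit.
tail2=$(seq 1 10 | sed '/5/d;:go;1,2!{P;N;D};N;bgo' | tail -n 2)
printf '%s\n' "$tail2"
# 10
# 10
```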

The fix is to stop reading the next line once the last line has been reached:

[root@ansible sedsed]# echo -e "1\n2\n3\n4\n5\n6\n7\n8\n9\n10" | sed '/5/,+3d;:go;1,2!{P;$!N;D};N;bgo' 
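A quick way to check the fixed command is to capture its output and compare it with what "delete 5, the two lines before it and the three lines after it" should leave behind. The sketch assumes GNU sed, since the `addr,+N` range form is a GNU extension; `seq` stands in for the `echo -e` input.

```shell
# Assumes GNU sed (the /5/,+3 address range is a GNU extension).
# $!N only reads the next line when not on the last line, so D always runs
# and the final line is printed exactly once.
fixed=$(seq 1 10 | sed '/5/,+3d;:go;1,2!{P;$!N;D};N;bgo')
printf '%s\n' "$fixed"
# 1
# 2
# 9
# 10
```

Lines 3 and 4 disappear because they are still sitting in the pattern space when 5 arrives; lines 6 to 8 are removed by the `+3` part of the range.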

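Adding labels

Sphinx can reference a label defined anywhere in the documentation through ref. So all that is needed is to put a TAG in front of every heading and then change the link references to the ref form.

# Add TAGs to the documents: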
sed -i ' h;N; /\n=\+$/{ x;s/.*/.. _&:\n/;p; x };  P;D ' $(find . -name '*.rst')
sed -i ' h;N; /\n-\+$/{ x;s/.*/.. _&:\n/;p; x };  P;D ' $(find . -name '*.rst')
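To see what the labeling script does before running it in place on real files, it can be fed a tiny sample on stdin without `-i`. This is a sketch under the assumption of GNU sed (it relies on `\+` and on `\n` in the replacement); the sample heading "Title One" is made up for illustration.

```shell
# Assumes GNU sed. Same script as above, minus -i, applied to sample input.
#   h;N        save the current line in the hold space, then append the next line
#   /\n=\+$/   the appended line is an '=' underline, so the saved line is a heading
#   x;s;p;x    turn the saved heading into ".. _Heading:" plus a blank line, print it
#   P;D        print the first line and restart with the remainder
tagged=$(printf 'Title One\n=========\n\nbody text\n' |
  sed ' h;N; /\n=\+$/{ x;s/.*/.. _&:\n/;p; x };  P;D ')
printf '%s\n' "$tagged"
# .. _Title One:
#
# Title One
# =========
#
# body text
```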


# Rewrite link references (prints to stdout; add -i to edit in place):
sed 's/\(`[[:alnum:] ]*`\)_/:ref:\1/ ' $(find . -name '*.rst')
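The link rewrite can be tried on a single line first. The sketch below uses the heading from the example at the top of the post as a sample RST link.

```shell
# Turn an RST hyperlink reference `Text`_ into a Sphinx :ref:`Text` role.
linked=$(printf '%s\n' '+ `Creating Objects in New Mappings`_' |
  sed 's/\(`[[:alnum:] ]*`\)_/:ref:\1/ ')
printf '%s\n' "$linked"
# + :ref:`Creating Objects in New Mappings`
```

Because the bracket expression only allows letters, digits and spaces, links whose text contains other characters (punctuation, or an embedded `<url>`) are left untouched, and without the g flag only the first link on each line is rewritten.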

–END
