Merge branch 'master' into 'mengxia'

Master See merge request !831

35e52982 · 孟霞 · b354a1a2 · 9fcf1074 · 35e52982 · 35e52982
Commit 35e52982 authored Jan 09, 2021 by 孟霞
--- a/Chapter13/Figures/figure-bpe.tex
+++ b/Chapter13/Figures/figure-bpe.tex
 \begin{tikzpicture}
-\tikzstyle{tnode} = [rectangle,inner sep=0em,minimum width=8em,minimum height=6.6em,rounded corners=5pt,fill=ugreen!20]
-\tikzstyle{pnode} = [rectangle,inner sep=0em,minimum width=8em,minimum height=6.6em,rounded corners=5pt,fill=yellow!20]
+\tikzstyle{tnode} = [rectangle,inner sep=0em,minimum width=8em,minimum height=6.6em,rounded corners=5pt,fill=green!20]
+\tikzstyle{pnode} = [rectangle,inner sep=0em,minimum width=8em,minimum height=6.6em,rounded corners=5pt,fill=yellow!30]
 \tikzstyle{mnode} = [rectangle,inner sep=0em,minimum width=8em,minimum height=6.6em,rounded corners=5pt,fill=red!20]
 \tikzstyle{wnode} = [inner sep=0em,minimum height=1.5em]

@@ -19,7 +19,7 @@


 \begin{pgfonlayer}{background}
-\node [rectangle,inner sep=0.7em,draw,ugreen!40,dashed,very thick,rounded corners=7pt] [fit = (n1) (n4)] (box1) {};
+\node [rectangle,inner sep=0.7em,draw,ugreen!60,dashed,very thick,rounded corners=7pt] [fit = (n1) (n4)] (box1) {};
 \end{pgfonlayer}

 \node [anchor=west,align=left,font=\footnotesize] (nt1) at ([xshift=0.1em,yshift=0em]n2.east) {统计词表和\\[0.5ex]词频};
@@ -75,7 +75,7 @@
 \node [anchor=east,ublue,align=left,font=\footnotesize] (l3) at ([xshift=-0.5em,yshift=0em]cd.west) {直至达到设定的符号合\\并表大小或无法合并};

 \begin{pgfonlayer}{background}
-\node [rectangle,inner sep=0.7em,draw,yellow!40,dashed,very thick,rounded corners=7pt] [fit = (n5) (n8) (l3) (cd)] (box2) {};
+\node [rectangle,inner sep=0.7em,draw,orange!40,dashed,very thick,rounded corners=7pt] [fit = (n5) (n8) (l3) (cd)] (box2) {};
 \end{pgfonlayer}

 %第五排

--- a/Chapter13/Figures/figure-framework-of-Adversarial-Neural-machine-translation.tex
+++ b/Chapter13/Figures/figure-framework-of-Adversarial-Neural-machine-translation.tex
@@ -4,7 +4,7 @@

 \begin{tikzpicture}

-\tikzstyle{rnnnode} = [draw,inner sep=4pt,minimum width=2em,minimum height=2em,rounded corners=1pt,fill=yellow!20]
+\tikzstyle{rnnnode} = [draw,inner sep=4pt,minimum width=2em,minimum height=2em,rounded corners=1pt,fill=green!20]
 \tikzstyle{snode} = [draw,inner sep=4pt,minimum width=2em,minimum height=2em,rounded corners=1pt,fill=red!20]
 \tikzstyle{wode} = [inner sep=0pt,minimum width=2em,minimum height=2em,rounded corners=0pt]


--- a/Chapter14/Figures/figure-different-integration-model.tex
+++ b/Chapter14/Figures/figure-different-integration-model.tex
@@ -10,12 +10,12 @@
            \tikzstyle{output} = [rectangle,thick,rounded corners=3pt,minimum width=1.2cm,align=center,font=\scriptsize];

            \begin{scope}
-                \node [system,fill=orange!20,draw] (model3) at (0,0) {模型 $3$};
-                \node [system,fill=ugreen!20,draw,anchor=south] (model2) at ([yshift=0.5cm]model3.north) {模型 $2$};
+                \node [system,fill=yellow!30,draw] (model3) at (0,0) {模型 $3$};
+                \node [system,fill=green!20,draw,anchor=south] (model2) at ([yshift=0.5cm]model3.north) {模型 $2$};
                \node [system,fill=red!20,draw,anchor=south] (model1) at ([yshift=0.5cm]model2.north) {模型 $1$};

-                \node [output,fill=orange!20,draw,anchor=west] (output3) at ([xshift=0.8cm]model3.east) {输出 $3$};
-                \node [output,fill=ugreen!20,draw,anchor=west] (output2) at ([xshift=0.8cm]model2.east) {输出 $2$};
+                \node [output,fill=yellow!30,draw,anchor=west] (output3) at ([xshift=0.8cm]model3.east) {输出 $3$};
+                \node [output,fill=green!20,draw,anchor=west] (output2) at ([xshift=0.8cm]model2.east) {输出 $2$};
                \node [output,fill=red!20,draw,anchor=west] (output1) at ([xshift=0.8cm]model1.east) {输出 $1$};

                \begin{pgfonlayer}{background}
@@ -40,15 +40,15 @@
            \tikzstyle{output} = [rectangle,thick,rounded corners=3pt,minimum width=1.2cm,align=center,font=\scriptsize];

            \begin{scope}
-                \node [system,fill=orange!20,draw] (model3) at (0,0) {模型 $3$};
-                \node [system,fill=ugreen!20,draw,anchor=south] (model2) at ([yshift=0.5cm]model3.north) {模型 $2$};
+                \node [system,fill=yellow!30,draw] (model3) at (0,0) {模型 $3$};
+                \node [system,fill=green!20,draw,anchor=south] (model2) at ([yshift=0.5cm]model3.north) {模型 $2$};
                \node [system,fill=red!20,draw,anchor=south] (model1) at ([yshift=0.5cm]model2.north) {模型 $1$};

                \begin{pgfonlayer}{background}
                    \node [draw,thick,dashed,inner sep=3pt,fit=(model3) (model2) (model1)] (ensemble) {};
                \end{pgfonlayer}

-                \node [system,fill=ugreen!20,draw,right=1cm of ensemble] (model) {模型};
+                \node [system,fill=green!20,draw,right=1cm of ensemble] (model) {模型};

                \node [output,fill=cocoabrown!20,draw,minimum width=1.2cm,anchor=west] (final) at ([xshift=0.8cm]model.east) {最终\\输出};

@@ -68,12 +68,12 @@
            \tikzstyle{dot} = [circle,fill=blue!40!white,minimum size=5pt,inner sep=0pt];

            \begin{scope}
-                \node [system,fill=orange!20,draw] (model3) at (0,0) {模型 $3$};
-                \node [system,fill=ugreen!20,draw,anchor=south] (model2) at ([yshift=0.5cm]model3.north) {模型 $2$};
+                \node [system,fill=yellow!30,draw] (model3) at (0,0) {模型 $3$};
+                \node [system,fill=green!20,draw,anchor=south] (model2) at ([yshift=0.5cm]model3.north) {模型 $2$};
                \node [system,fill=red!20,draw,anchor=south] (model1) at ([yshift=0.5cm]model2.north) {模型 $1$};

-                \node [output,fill=orange!20,draw,anchor=west] (output3) at ([xshift=0.8cm]model3.east) {输出 $3$};
-                \node [output,fill=ugreen!20,draw,anchor=west] (output2) at ([xshift=0.8cm]model2.east) {输出 $2$};
+                \node [output,fill=yellow!30,draw,anchor=west] (output3) at ([xshift=0.8cm]model3.east) {输出 $3$};
+                \node [output,fill=green!20,draw,anchor=west] (output2) at ([xshift=0.8cm]model2.east) {输出 $2$};
                \node [output,fill=red!20,draw,anchor=west] (output1) at ([xshift=0.8cm]model1.east) {输出 $1$};

                \draw [->,very thick] (model1) to (output1);

--- a/Chapter14/Figures/figure-hypothesis-generation.tex
+++ b/Chapter14/Figures/figure-hypothesis-generation.tex
@@ -5,12 +5,12 @@
 \tikzstyle{output} = [rectangle,thick,rounded corners=3pt,minimum width=1.2cm,align=center,font=\scriptsize];

 \begin{scope}[local bounding box=MULTIPLE]
-    \node [system,fill=orange!20,draw] (engine3) at (0,0) {系统 $n$};
-    \node [system,fill=ugreen!20,draw,anchor=south] (engine2) at ([yshift=0.6cm]engine3.north) {系统 $2$};
+    \node [system,fill=yellow!30,draw] (engine3) at (0,0) {系统 $n$};
+    \node [system,fill=green!20,draw,anchor=south] (engine2) at ([yshift=0.6cm]engine3.north) {系统 $2$};
    \node [system,fill=red!20,draw,anchor=south] (engine1) at ([yshift=0.3cm]engine2.north) {系统 $1$};

-    \node [output,fill=orange!20,draw,anchor=west] (output3) at ([xshift=0.5cm]engine3.east) {输出 $n$};
-    \node [output,fill=ugreen!20,draw,anchor=west] (output2) at ([xshift=0.5cm]engine2.east) {输出 $2$};
+    \node [output,fill=yellow!30,draw,anchor=west] (output3) at ([xshift=0.5cm]engine3.east) {输出 $n$};
+    \node [output,fill=green!20,draw,anchor=west] (output2) at ([xshift=0.5cm]engine2.east) {输出 $2$};
    \node [output,fill=red!20,draw,anchor=west] (output1) at ([xshift=0.5cm]engine1.east) {输出 $1$};

    \draw [very thick,decorate,decoration={brace}] ([xshift=3pt]output1.north east) to node [midway,name=final] {} ([xshift=3pt]output3.south east);
@@ -25,11 +25,11 @@
 \end{scope}

 \begin{scope}[local bounding box=SINGLE]
-    \node [output,fill=ugreen!20,draw,anchor=west] (output3) at ([xshift=4cm]output3.east) {输出 $n$};
-    \node [output,fill=ugreen!20,draw,anchor=west] (output2) at ([xshift=4cm]output2.east) {输出 $2$};
-    \node [output,fill=ugreen!20,draw,anchor=west] (output1) at ([xshift=4cm]output1.east) {输出 $1$};
+    \node [output,fill=green!20,draw,anchor=west] (output3) at ([xshift=4cm]output3.east) {输出 $n$};
+    \node [output,fill=green!20,draw,anchor=west] (output2) at ([xshift=4cm]output2.east) {输出 $2$};
+    \node [output,fill=green!20,draw,anchor=west] (output1) at ([xshift=4cm]output1.east) {输出 $1$};

-    \node [system,fill=ugreen!20,draw,anchor=east,align=center,inner sep=1.9pt] (engine) at ([xshift=-0.5cm]output2.west) {单系统};
+    \node [system,fill=green!20,draw,anchor=east,align=center,inner sep=1.9pt] (engine) at ([xshift=-0.5cm]output2.west) {单系统};

    \draw [very thick,decorate,decoration={brace}] ([xshift=3pt]output1.north east) to node [midway,name=final] {} ([xshift=3pt]output3.south east);


--- a/Chapter14/Figures/figure-main-module.tex
+++ b/Chapter14/Figures/figure-main-module.tex

 \begin{tikzpicture}
 %左
-\node [anchor=west,draw=black!70,rounded corners,drop shadow,very thick,minimum width=6em,minimum height=3.5em,fill=blue!15,align=center,text=black] (part1) at (0,0) {\scriptsize{预测模块}};
+\node [anchor=west,draw=black!70,rounded corners,drop shadow,very thick,minimum width=6em,minimum height=3.5em,fill=red!15,align=center,text=black] (part1) at (0,0) {\small{预测模块}};
 \node [anchor=south] (text) at ([xshift=0.5em,yshift=-3.5em]part1.south) {\scriptsize{源语言句子（编码器输出）}};
-\node [anchor=east,draw=black!70,rounded corners,drop shadow,very thick,minimum width=6em,minimum height=3.5em,fill=blue!15,align=center,text=black] (part2) at ([xshift=10em]part1.east) {\scriptsize{搜索模块}};
+\node [anchor=east,draw=black!70,rounded corners,drop shadow,very thick,minimum width=6em,minimum height=3.5em,fill=green!15,align=center,text=black] (part2) at ([xshift=10em]part1.east) {\small{搜索模块}};

 \node [anchor=south] (text1) at ([xshift=0.1em,yshift=2.2em]part1.north) {\scriptsize{译文中已经生成的单词}};
 \node [anchor=south] (text2) at ([xshift=0.5em,yshift=2.2em]part2.north) {\scriptsize{预测当前位置的单词概率分布}};

--- a/Chapter14/Figures/figure-multi-modality.tex
+++ b/Chapter14/Figures/figure-multi-modality.tex
@@ -8,10 +8,10 @@
 	\tikzstyle{po} = [font=\scriptsize,rounded corners=1pt, fill=gray!20, minimum width=1.8em,minimum height=1.5em,draw]
 	\tikzstyle{tgt} = [minimum height=1.6em,minimum width=5.2em,fill=black!10!yellow!30,font=\footnotesize,drop shadow={shadow xshift=0.15em,shadow yshift=-0.15em,}]
 	\tikzstyle{p} = [fill=ugreen!15,minimum width=0.4em,inner sep=0pt]
-\node[ rounded corners=3pt, fill=red!20, drop shadow, minimum width=12em,minimum height=4em,draw]  (encoder) at (0,0) {编码器};
-\node[anchor=north,rounded corners=3pt, fill=yellow!20, drop shadow, minimum width=12em,minimum height=2em,draw] (lenpre) at([yshift=3em]encoder.north){长度预测器};
+\node[ rounded corners=3pt, thick,fill=red!20, drop shadow, minimum width=12em,minimum height=4em,draw]  (encoder) at (0,0) {编码器};
+\node[anchor=north,rounded corners=3pt, thick,fill=yellow!20, drop shadow, minimum width=12em,minimum height=2em,draw] (lenpre) at([yshift=3em]encoder.north){长度预测器};
 \node[anchor=north] (lable) at([xshift=3.5em,yshift=2.5em]lenpre.north){译文长度：3};
-\node[anchor=west, rounded corners=3pt, fill=blue!20, drop shadow, minimum width=13em,minimum height=4em,draw] (decoder) at ([xshift=1cm]encoder.east) {解码器};
+\node[anchor=west, rounded corners=3pt, thick,fill=blue!20, drop shadow, minimum width=13em,minimum height=4em,draw] (decoder) at ([xshift=1cm]encoder.east) {解码器};

 \node[anchor=north,emb] (en1) at ([yshift=-1.3em,xshift=-4.5em]encoder.south) {${\mathbi e}$(干)};
 \node[anchor=north,emb] (en2) at ([yshift=-1.3em,xshift=-1.5em]encoder.south) {${\mathbi e}$(得)};

--- a/Chapter14/Figures/figure-non-autoregressive.tex
+++ b/Chapter14/Figures/figure-non-autoregressive.tex
@@ -7,10 +7,10 @@
 	\tikzstyle{emb} = [font=\scriptsize,rounded corners=1pt, fill=orange!20, minimum width=1.8em,minimum height=1.5em,draw]
 	\tikzstyle{po} = [font=\scriptsize,rounded corners=1pt, fill=gray!20, minimum width=1.8em,minimum height=1.5em,draw]
 \begin{scope} 
-\node[rounded corners=3pt, fill=red!20, drop shadow, minimum width=10em,minimum height=4em,draw]  (encoder) at (0,0) {编码器};
-\node[anchor=north,rounded corners=3pt, fill=yellow!20, drop shadow, minimum width=10em,minimum height=2em,draw] (lenpre) at([yshift=3em]encoder.north){长度预测器};
+\node[rounded corners=3pt, thick,fill=red!20, drop shadow, minimum width=10em,minimum height=4em,draw]  (encoder) at (0,0) {编码器};
+\node[anchor=north,rounded corners=3pt, thick,fill=yellow!20, drop shadow, minimum width=10em,minimum height=2em,draw] (lenpre) at([yshift=3em]encoder.north){长度预测器};
 \node[anchor=north] (lable) at([xshift=3.5em,yshift=2.5em]lenpre.north){译文长度：4};
-\node[anchor=west, rounded corners=3pt, fill=blue!20, drop shadow, minimum width=16em,minimum height=4em,draw] (decoder) at ([xshift=1.8cm]encoder.east) {解码器};
+\node[anchor=west, rounded corners=3pt, thick,fill=blue!20, drop shadow, minimum width=16em,minimum height=4em,draw] (decoder) at ([xshift=1.8cm]encoder.east) {解码器};

 \node[anchor=north,emb] (en2) at ([yshift=-1.3em]encoder.south) {${\mathbi e}(x_2)$};
 \node[anchor=north,emb] (en1) at ([yshift=-1.3em,xshift=-3em]encoder.south) {${\mathbi e}(x_1)$};
@@ -61,7 +61,7 @@
 \end{scope} 

 \begin{scope}[yshift=2.8in]
-\node[rounded corners=3pt, fill=red!20, drop shadow, minimum width=10em,minimum height=4em,draw]  (encoder) at (0,0) {编码器};
+\node[rounded corners=3pt, thick,fill=red!20, drop shadow, minimum width=10em,minimum height=4em,draw]  (encoder) at (0,0) {编码器};
 \node[anchor=west,minimum width=16em,minimum height=4em] (decoder) at ([xshift=1.8cm]encoder.east) {};

 \node[anchor=north,emb] (en2) at ([yshift=-1.3em]encoder.south) {${\mathbi e}(x_2)$};
@@ -122,7 +122,7 @@
 \draw [->,very thick,dotted] ([xshift=-0.3em]out2.east) .. controls +(east:0.5) and +(west:0.5) ..([xshift=0em]de3.west);
 \draw [->,very thick,dotted] ([xshift=-0.3em]out3.east) .. controls +(east:0.5) and +(west:0.5) ..([xshift=0em]de4.west);
 \draw [->,very thick,dotted] ([xshift=-0.3em]out4.east) .. controls +(east:0.5) and +(west:0.5) ..([xshift=0em]de5.west);
-\node[anchor=west, rounded corners=3pt, fill=blue!20, drop shadow, minimum width=16em,minimum height=4em,draw] (decoder2) at ([xshift=1.8cm]encoder.east) {解码器};
+\node[anchor=west, rounded corners=3pt, thick,fill=blue!20, drop shadow, minimum width=16em,minimum height=4em,draw] (decoder2) at ([xshift=1.8cm]encoder.east) {解码器};

 \draw[->,line width=1pt] (encoder.east) -- (decoder.west);
 \end{scope}

--- a/Chapter15/Figures/figure-encoder-of-bidirectional-tree-structure.tex
+++ b/Chapter15/Figures/figure-encoder-of-bidirectional-tree-structure.tex
@@ -2,10 +2,10 @@
 \begin{tikzpicture}
 \begin{scope}

-\tikzstyle{hnode}=[rectangle,inner sep=0mm,minimum height=2em,minimum width=3em,rounded corners=5pt,fill=ugreen!20]
+\tikzstyle{hnode}=[rectangle,inner sep=0mm,minimum height=2em,minimum width=3em,rounded corners=5pt,fill=green!20]
 \tikzstyle{tnode}=[rectangle,inner sep=0mm,minimum height=2em,minimum width=3em,rounded corners=5pt,fill=red!20]
 \tikzstyle{fnoder}=[rectangle,inner sep=0mm,minimum height=2.4em,minimum width=6.8em,draw,dashed,very thick,rounded corners=5pt,red!40]
-\tikzstyle{fnodeg}=[rectangle,inner sep=0mm,minimum height=2.4em,minimum width=6.8em,draw,dashed,very thick,rounded corners=5pt,ugreen!40]
+\tikzstyle{fnodeg}=[rectangle,inner sep=0mm,minimum height=2.4em,minimum width=6.8em,draw,dashed,very thick,rounded corners=5pt,green!40]

 \node [anchor=south west,fnodeg] (f1) at (0,0) {};
 \node [anchor=west,hnode] (n1) at ([xshift=0.2em,yshift=0em]f1.west) {$\mathbi{h}_1^{\textrm{up}}$};
@@ -24,24 +24,24 @@
 \node [anchor=east,hnode] (n8) at ([xshift=-0.2em,yshift=0em]f4.east) {$\cdots$};

 \node [anchor=west,fnodeg] (f5) at ([xshift=0.6em,yshift=0em]f4.east) {};
-\node [anchor=west,hnode] (n9) at ([xshift=0.2em,yshift=0em]f5.west) {$\mathbi{h}_n^{\textrm{up}}$};
-\node [anchor=east,hnode] (n10) at ([xshift=-0.2em,yshift=0em]f5.east) {$\mathbi{h}_n^{\textrm{down}}$};
+\node [anchor=west,hnode] (n9) at ([xshift=0.2em,yshift=0em]f5.west) {$\mathbi{h}_m^{\textrm{up}}$};
+\node [anchor=east,hnode] (n10) at ([xshift=-0.2em,yshift=0em]f5.east) {$\mathbi{h}_m^{\textrm{down}}$};

 \node [anchor=south,fnoder] (f6) at ([xshift=3.7em,yshift=1em]f1.north) {};
-\node [anchor=west,tnode] (n11) at ([xshift=0.2em,yshift=0em]f6.west) {$\mathbi{h}_{n+1}^{\textrm{up}}$};
-\node [anchor=east,tnode] (n12) at ([xshift=-0.2em,yshift=0em]f6.east) {$\mathbi{h}_{n+1}^{\textrm{down}}$};
+\node [anchor=west,tnode] (n11) at ([xshift=0.2em,yshift=0em]f6.west) {$\mathbi{h}_{m+1}^{\textrm{up}}$};
+\node [anchor=east,tnode] (n12) at ([xshift=-0.2em,yshift=0em]f6.east) {$\mathbi{h}_{m+1}^{\textrm{down}}$};

 \node [anchor=south,fnoder] (f7) at ([xshift=3.7em,yshift=1em]f6.north) {};
-\node [anchor=west,tnode] (n13) at ([xshift=0.2em,yshift=0em]f7.west) {$\mathbi{h}_{n+2}^{\textrm{up}}$};
-\node [anchor=east,tnode] (n14) at ([xshift=-0.2em,yshift=0em]f7.east) {$\mathbi{h}_{n+2}^{\textrm{down}}$};
+\node [anchor=west,tnode] (n13) at ([xshift=0.2em,yshift=0em]f7.west) {$\mathbi{h}_{m+2}^{\textrm{up}}$};
+\node [anchor=east,tnode] (n14) at ([xshift=-0.2em,yshift=0em]f7.east) {$\mathbi{h}_{m+2}^{\textrm{down}}$};

 \node [anchor=south,fnoder] (f8) at ([xshift=3.7em,yshift=1em]f7.north) {};
 \node [anchor=west,tnode] (n15) at ([xshift=0.2em,yshift=0em]f8.west) {$\cdots$};
 \node [anchor=east,tnode] (n16) at ([xshift=-0.2em,yshift=0em]f8.east) {$\cdots$};

 \node [anchor=south,fnoder] (f9) at ([xshift=3.7em,yshift=1em]f8.north) {};
-\node [anchor=west,tnode] (n17) at ([xshift=0.2em,yshift=0em]f9.west) {$\mathbi{h}_{2n-1}^{\textrm{up}}$};
-\node [anchor=east,tnode] (n18) at ([xshift=-0.2em,yshift=0em]f9.east) {$\mathbi{h}_{2n-1}^{\textrm{down}}$};
+\node [anchor=west,tnode] (n17) at ([xshift=0.2em,yshift=0em]f9.west) {$\mathbi{h}_{2m-1}^{\textrm{up}}$};
+\node [anchor=east,tnode] (n18) at ([xshift=-0.2em,yshift=0em]f9.east) {$\mathbi{h}_{2m-1}^{\textrm{down}}$};


 \draw [->,thick] ([xshift=0em,yshift=0em]n11.east) -- ([xshift=0em,yshift=0em]n12.west);

--- a/Chapter15/Figures/figure-encoder-tree-structure-modeling.tex
+++ b/Chapter15/Figures/figure-encoder-tree-structure-modeling.tex
@@ -2,7 +2,7 @@
 \begin{tikzpicture}
 \begin{scope}

-\tikzstyle{hnode}=[rectangle,inner sep=0mm,minimum height=2em,minimum width=4.5em,rounded corners=5pt,fill=ugreen!30]
+\tikzstyle{hnode}=[rectangle,inner sep=0mm,minimum height=2em,minimum width=4.5em,rounded corners=5pt,fill=green!30]
 \tikzstyle{tnode}=[rectangle,inner sep=0mm,minimum height=2em,minimum width=4.5em,rounded corners=5pt,fill=red!30]
 \tikzstyle{wnode}=[inner sep=0mm,minimum height=1.4em,minimum width=4.4em]

@@ -10,12 +10,12 @@
 \node [anchor=west,hnode] (n2) at ([xshift=1em,yshift=0em]n1.east) {$\mathbi{h}_2$};
 \node [anchor=west,hnode] (n3) at ([xshift=1em,yshift=0em]n2.east) {$\mathbi{h}_3$};
 \node [anchor=west,hnode] (n4) at ([xshift=1em,yshift=0em]n3.east) {$\cdots$};
-\node [anchor=west,hnode] (n5) at ([xshift=1em,yshift=0em]n4.east) {$\mathbi{h}_n$};
+\node [anchor=west,hnode] (n5) at ([xshift=1em,yshift=0em]n4.east) {$\mathbi{h}_m$};

-\node [anchor=south,tnode] (t1) at ([xshift=2.8em,yshift=1em]n1.north) {$\mathbi{h}_{n+1}$};
-\node [anchor=south,tnode] (t2) at ([xshift=2.8em,yshift=1em]t1.north) {$\mathbi{h}_{n+2}$};
+\node [anchor=south,tnode] (t1) at ([xshift=2.8em,yshift=1em]n1.north) {$\mathbi{h}_{m+1}$};
+\node [anchor=south,tnode] (t2) at ([xshift=2.8em,yshift=1em]t1.north) {$\mathbi{h}_{m+2}$};
 \node [anchor=south,tnode] (t3) at ([xshift=2.8em,yshift=1em]t2.north) {$\cdots$};
-\node [anchor=south,tnode] (t4) at ([xshift=2.8em,yshift=1em]t3.north) {$\mathbi{h}_{2n-1}$};
+\node [anchor=south,tnode] (t4) at ([xshift=2.8em,yshift=1em]t3.north) {$\mathbi{h}_{2m-1}$};

 \draw [->,thick] ([xshift=0em,yshift=0em]n1.east) -- ([xshift=0em,yshift=0em]n2.west);
 \draw [->,thick] ([xshift=0em,yshift=0em]n2.east) -- ([xshift=0em,yshift=0em]n3.west);

--- a/Chapter15/Figures/figure-layer-fusion-method.tex
+++ b/Chapter15/Figures/figure-layer-fusion-method.tex
@@ -27,15 +27,15 @@

 \node [anchor=north,rectangle,minimum height=1.5em,minimum width=2.5em,rounded corners=5pt] (n10) at ([xshift=0em,yshift=-0.2em]n9.south) {$\mathbi{y}_{<j}$};

-\node [anchor=west,decnode,draw=ublue,fill=blue!10] (n11) at ([xshift=1.5em,yshift=0em]n10.east) {$\mathbi{s}_j^0$};
+\node [anchor=west,decnode,draw=ublue,fill=blue!10] (n11) at ([xshift=1.5em,yshift=0em]n10.east) {$\mathbi{s}_{0,j}$};

-\node [anchor=west,decnode,draw=ublue,fill=blue!10] (n12) at ([xshift=1.5em,yshift=0em]n11.east) {$\mathbi{s}_j^1$};
+\node [anchor=west,decnode,draw=ublue,fill=blue!10] (n12) at ([xshift=1.5em,yshift=0em]n11.east) {$\mathbi{s}_{1,j}$};

-\node [anchor=west,decnode,draw=ublue,fill=blue!10] (n13) at ([xshift=1.5em,yshift=0em]n12.east) {$\mathbi{s}_j^2$};
+\node [anchor=west,decnode,draw=ublue,fill=blue!10] (n13) at ([xshift=1.5em,yshift=0em]n12.east) {$\mathbi{s}_{2,j}$};

 \node [anchor=west,rectangle,minimum height=1.5em,minimum width=2.5em,rounded corners=5pt] (n14) at ([xshift=1em,yshift=0em]n13.east) {$\ldots$};

-\node [anchor=west,decnode,draw=ublue,fill=blue!10] (n15) at ([xshift=1em,yshift=0em]n14.east) {$\mathbi{s}_j^{M-1}$};
+\node [anchor=west,decnode,draw=ublue,fill=blue!10] (n15) at ([xshift=1em,yshift=0em]n14.east) {$\mathbi{s}_{M-1,j}$};

 \node [anchor=west,rectangle,minimum height=1.5em,minimum width=2.5em,rounded corners=5pt] (n16) at ([xshift=1.5em,yshift=0em]n15.east) {$\mathbi{y}_{j}$};


--- a/Chapter15/Figures/figure-main-flow-of-neural-network-structure-search.tex
+++ b/Chapter15/Figures/figure-main-flow-of-neural-network-structure-search.tex
@@ -5,9 +5,9 @@
 \begin{scope}[scale=0.36]
 \tikzstyle{every node}=[scale=0.36]

-\node[draw=ublue,very thick,drop shadow,fill=white,minimum width=40em,minimum height=25em] (rec3) at (2.25,0){};
-\node[draw=ublue,very thick,drop shadow,fill=white,minimum width=22em,minimum height=25em] (rec2) at (-12.4,0){};
-\node[draw=ublue,very thick,drop shadow,fill=white,minimum width=24em,minimum height=25em] (rec1) at (-24,0){};
+\node[draw=ublue,very thick,rounded corners=3pt,drop shadow,fill=white,minimum width=40em,minimum height=25em] (rec3) at (2.25,0){};
+\node[draw=ublue,very thick,rounded corners=3pt,drop shadow,fill=white,minimum width=22em,minimum height=25em] (rec2) at (-12.4,0){};
+\node[draw=ublue,very thick,rounded corners=3pt,drop shadow,fill=white,minimum width=24em,minimum height=25em] (rec1) at (-24,0){};

 %left
 \node[text=ublue] (label1) at (-26.4,4){\Huge\bfnew{结构空间}};

--- a/Chapter15/Figures/figure-multi-task-structure.tex
+++ b/Chapter15/Figures/figure-multi-task-structure.tex
@@ -2,7 +2,7 @@
 \begin{tikzpicture}
 \begin{scope}

-\tikzstyle{enode}=[rectangle,inner sep=0mm,minimum height=5em,minimum width=5em,rounded corners=7pt,fill=ugreen!30]
+\tikzstyle{enode}=[rectangle,inner sep=0mm,minimum height=5em,minimum width=5em,rounded corners=7pt,fill=green!30]
 \tikzstyle{dnode}=[rectangle,inner sep=0mm,minimum height=2em,minimum width=6.5em,rounded corners=5pt,fill=red!30]
 \tikzstyle{wnode}=[inner sep=0mm,minimum height=2em,minimum width=4em]


--- a/Chapter15/Figures/figure-structure-search-based-on-gradient-method.tex
+++ b/Chapter15/Figures/figure-structure-search-based-on-gradient-method.tex
@@ -6,7 +6,7 @@

 \node[node,fill=red!20] (n1) at (0,0){\scriptsize\bfnew{超网络}： \\ [1ex] 模型结构参数 \\[0.4ex] 网络参数};
 \node[anchor=west,node,fill=yellow!20] (n2) at ([xshift=4em]n1.east){\scriptsize\bfnew{优化后的超网络}： \\ [1ex]模型{\color{red}结构参数}（已优化） \\ [0.4ex]网络参数（已优化）};
-\node[anchor=west,node,fill=blue!20] (n3) at ([xshift=6em]n2.east){\scriptsize\bfnew{找到的模型结构}};
+\node[anchor=west,node,fill=green!20] (n3) at ([xshift=6em]n2.east){\scriptsize\bfnew{找到的模型结构}};

 \draw[-latex,thick] (n1.0) -- node[above,align=center,font=\scriptsize]{优化后的\\超网络}(n2.180);
 \draw[-latex,thick] (n2.0) -- node[above,align=center,font=\scriptsize]{根据结构参数\\离散化结构}(n3.180);

--- a/Chapter15/Figures/figure-structure-search-based-on-reinforcement-learning.tex
+++ b/Chapter15/Figures/figure-structure-search-based-on-reinforcement-learning.tex
@@ -5,7 +5,7 @@
 \tikzstyle{node}=[minimum height=2em,minimum width=5em,draw,rounded corners=2pt,thick,drop shadow]

 \node[node,fill=red!20] (n1) at (0,0){\small\bfnew{环境}};
-\node[anchor=south,node,fill=blue!20] (n2) at ([yshift=5em]n1.north){\small\bfnew{智能体}};
+\node[anchor=south,node,fill=green!20] (n2) at ([yshift=5em]n1.north){\small\bfnew{智能体}};
 \node[anchor=north,font=\footnotesize] at ([yshift=-0.2em]n1.south){（结构所应用于的任务）};
 \node[anchor=south,font=\footnotesize] at ([yshift=0.2em]n2.north){（结构生成器）};


--- a/Chapter15/Figures/figure-three-fusion-methods-of-tree-structure-information-1.tex
+++ b/Chapter15/Figures/figure-three-fusion-methods-of-tree-structure-information-1.tex
@@ -4,7 +4,7 @@
 \begin{tikzpicture}

 \tikzstyle{wrnode}=[rectangle,inner sep=0mm,minimum height=1.8em,minimum width=3em,rounded corners=5pt,fill=blue!30]
-\tikzstyle{srnode}=[rectangle,inner sep=0mm,minimum height=1.8em,minimum width=3em,rounded corners=5pt,fill=yellow!30]
+\tikzstyle{srnode}=[rectangle,inner sep=0mm,minimum height=1.8em,minimum width=3em,rounded corners=5pt,fill=orange!30]
 \tikzstyle{dotnode}=[inner sep=0mm,minimum height=0.5em,minimum width=1.5em]
 \tikzstyle{wnode}=[inner sep=0mm,minimum height=1.8em]
 {\small
@@ -55,7 +55,7 @@

 \begin{pgfonlayer}{background}
 \node [rectangle,inner sep=0.5em,draw=blue!80,dashed,very thick,rounded corners=10pt] [fit = (wr1) (wr3) (w1) (w3)] (box1) {};
-\node [rectangle,inner sep=0.5em,draw=yellow!80,dashed,very thick,rounded corners=10pt] [fit = (sr1) (sr4) (w4) (w7)] (box2) {};
+\node [rectangle,inner sep=0.5em,draw=orange!80,dashed,very thick,rounded corners=10pt] [fit = (sr1) (sr4) (w4) (w7)] (box2) {};
 \node [rectangle,minimum height=5em,inner sep=0.6em,fill=gray!20,draw=black,dashed,very thick,rounded corners=8pt] [fit = (m1) (m2)] (box3) {};
 \node [rectangle,minimum height=5em,inner sep=0.6em,fill=gray!20,draw=black,dashed,very thick,rounded corners=8pt] [fit = (m3) (m4)] (box4) {};
 \node [rectangle,minimum height=5em,inner sep=0.6em,fill=gray!20,draw=black,dashed,very thick,rounded corners=8pt] [fit = (m5) (m6)] (box5) {};

--- a/Chapter15/Figures/figure-three-fusion-methods-of-tree-structure-information-2.tex
+++ b/Chapter15/Figures/figure-three-fusion-methods-of-tree-structure-information-2.tex
@@ -4,7 +4,7 @@
 \begin{tikzpicture}

 \tikzstyle{wrnode}=[rectangle,inner sep=0mm,minimum height=1.8em,minimum width=3em,rounded corners=5pt,fill=blue!30]
-\tikzstyle{srnode}=[rectangle,inner sep=0mm,minimum height=1.8em,minimum width=3em,rounded corners=5pt,fill=yellow!30]
+\tikzstyle{srnode}=[rectangle,inner sep=0mm,minimum height=1.8em,minimum width=3em,rounded corners=5pt,fill=orange!30]
 \tikzstyle{dotnode}=[inner sep=0mm,minimum height=0.5em,minimum width=1.5em]
 \tikzstyle{wnode}=[inner sep=0mm,minimum height=1.8em]

@@ -48,9 +48,9 @@
 \node [anchor=south,wnode] (w10) at ([xshift=0em,yshift=0.5em]c3.north) {$\mathbi{e}_{w_2}$};

 \begin{pgfonlayer}{background}
-\node [rectangle,minimum height=5em,inner sep=0.6em,fill=ugreen!20,rounded corners=8pt] [fit = (c1) (w8)] (box6) {};
-\node [rectangle,minimum height=5em,inner sep=0.6em,fill=ugreen!20,rounded corners=8pt] [fit = (c2) (w9)] (box7) {};
-\node [rectangle,minimum height=5em,inner sep=0.6em,fill=ugreen!20,rounded corners=8pt] [fit = (c3) (w10)] (box8) {};
+\node [rectangle,minimum height=5em,inner sep=0.6em,fill=green!20,rounded corners=8pt] [fit = (c1) (w8)] (box6) {};
+\node [rectangle,minimum height=5em,inner sep=0.6em,fill=green!20,rounded corners=8pt] [fit = (c2) (w9)] (box7) {};
+\node [rectangle,minimum height=5em,inner sep=0.6em,fill=green!20,rounded corners=8pt] [fit = (c3) (w10)] (box8) {};
 \end{pgfonlayer}

 \node [anchor=south,wrnode] (wr1) at ([xshift=0em,yshift=1em]box6.north) {$\mathbi{h}_{w_1}$};
@@ -63,7 +63,7 @@

 \begin{pgfonlayer}{background}
 \node [rectangle,minimum width=20em,minimum height=13em,inner sep=0.5em,draw=blue!80,dashed,very thick,rounded corners=10pt] [fit = (h1) (w1) (h3) (c3)] (box1) {};
-\node [rectangle,inner sep=0.5em,draw=yellow!80,dashed,very thick,rounded corners=10pt] [fit = (sr1) (sr4) (w4) (w7)] (box2) {};
+\node [rectangle,inner sep=0.5em,draw=orange!80,dashed,very thick,rounded corners=10pt] [fit = (sr1) (sr4) (w4) (w7)] (box2) {};
 \node [rectangle,inner sep=0.4em,fill=gray!20,draw=black,dashed,very thick,rounded corners=8pt] [fit = (wr1)] (box3) {};
 \node [rectangle,inner sep=0.4em,fill=gray!20,draw=black,dashed,very thick,rounded corners=8pt] [fit = (wr2)] (box4) {};
 \node [rectangle,inner sep=0.4em,fill=gray!20,draw=black,dashed,very thick,rounded corners=8pt] [fit = (wr3)] (box5) {};

--- a/Chapter15/chapter15.tex
+++ b/Chapter15/chapter15.tex
--- a/Chapter16/Figures/figure-shared-space-inductive-bilingual-dictionary.tex
+++ b/Chapter16/Figures/figure-shared-space-inductive-bilingual-dictionary.tex
@@ -54,11 +54,11 @@
 \node[rec,anchor=center,rotate=60,fill=red!20](c1x5) at ([xshift=-2em,yshift=1.0em]circle1.east){\tiny{5}};

 %circle2
-\node[cir,anchor=center,rotate=-30,fill=blue!20] (c2a) at ([xshift=-5.3em,yshift=2.15em]circle2.east){\tiny{a}};
-\node[cir,anchor=east,rotate=-30,fill=blue!20] (c2b) at ([xshift=2.0em,yshift=-1.25em]c2a.east){\tiny{b}};
-\node[cir,anchor=east,rotate=-30,fill=blue!20] (c2c) at ([xshift=0.8em,yshift=-3.9em]c2a.south){\tiny{c}};
-\node[cir,anchor=east,rotate=-30,fill=blue!20] (c2x) at ([xshift=-0.3em,yshift=-1.9em]c2a.south){\tiny{x}};
-\node[cir,anchor=west,rotate=-30,fill=blue!20] (c2y) at ([xshift=1.15em,yshift=-2.85em]c2a.east){\tiny{y}};
+\node[cir,anchor=center,rotate=-30,fill=blue!20] (c2a) at ([xshift=-5.3em,yshift=2.15em]circle2.east){\tiny{$a$}};
+\node[cir,anchor=east,rotate=-30,fill=blue!20] (c2b) at ([xshift=2.0em,yshift=-1.25em]c2a.east){\tiny{$b$}};
+\node[cir,anchor=east,rotate=-30,fill=blue!20] (c2c) at ([xshift=0.8em,yshift=-3.9em]c2a.south){\tiny{$c$}};
+\node[cir,anchor=east,rotate=-30,fill=blue!20] (c2x) at ([xshift=-0.3em,yshift=-1.9em]c2a.south){\tiny{$x$}};
+\node[cir,anchor=west,rotate=-30,fill=blue!20] (c2y) at ([xshift=1.15em,yshift=-2.85em]c2a.east){\tiny{$y$}};

 %circle3
 \node[rec,anchor=center,rotate=-30,fill=red!20] (c3x1) at ([xshift=-6.7em,yshift=1.75em]circle3.east){\tiny{1}};
@@ -74,11 +74,11 @@
 \node[rec,anchor=east,rotate=-30,fill=red!20] (c4x4) at ([xshift=0.35em,yshift=-2.7em]c4x1.south){\tiny{4}};
 \node[rec,anchor=west,rotate=-30,fill=red!20] (c4x5) at ([xshift=2.35em,yshift=-3.85em]c4x1.east){\tiny{5}};

-\node[cir,anchor=center,rotate=-30,fill=blue!20] (c4a) at ([xshift=-5.3em,yshift=2.15em]circle4.east){\tiny{a}};
-\node[cir,anchor=east,rotate=-30,fill=blue!20] (c4b) at ([xshift=2.0em,yshift=-1.25em]c4a.east){\tiny{b}};
-\node[cir,anchor=east,rotate=-30,fill=blue!20] (c4c) at ([xshift=0.8em,yshift=-3.9em]c4a.south){\tiny{c}};
-\node[cir,anchor=east,rotate=-30,fill=blue!20] (c4x) at ([xshift=-0.3em,yshift=-1.9em]c4a.south){\tiny{x}};
-\node[cir,anchor=west,rotate=-30,fill=blue!20] (c4y) at ([xshift=1.15em,yshift=-2.85em]c4a.east){\tiny{y}};
+\node[cir,anchor=center,rotate=-30,fill=blue!20] (c4a) at ([xshift=-5.3em,yshift=2.15em]circle4.east){\tiny{$a$}};
+\node[cir,anchor=east,rotate=-30,fill=blue!20] (c4b) at ([xshift=2.0em,yshift=-1.25em]c4a.east){\tiny{$b$}};
+\node[cir,anchor=east,rotate=-30,fill=blue!20] (c4c) at ([xshift=0.8em,yshift=-3.9em]c4a.south){\tiny{$c$}};
+\node[cir,anchor=east,rotate=-30,fill=blue!20] (c4x) at ([xshift=-0.3em,yshift=-1.9em]c4a.south){\tiny{$x$}};
+\node[cir,anchor=west,rotate=-30,fill=blue!20] (c4y) at ([xshift=1.15em,yshift=-2.85em]c4a.east){\tiny{$y$}};

 \draw [color=red,line width=0.7pt,rotate=18] ([xshift=-5.1em,yshift=3.7em]circle4.east) ellipse (1.6em and 0.9em); 
 \draw [color=red,line width=0.7pt,rotate=-5] ([xshift=-2.8em,yshift=0.6em]circle4.east) ellipse (1.6em and 0.9em);

--- a/Chapter16/chapter16.tex
+++ b/Chapter16/chapter16.tex
@@ -22,9 +22,9 @@
 %----------------------------------------------------------------------------------------
 \chapter{低资源神经机器翻译}

-\parinterval 神经机器翻译带来的性能提升是显著的，但随之而来的问题是对海量双语训练数据的依赖。但是，不同语言可使用的数据规模是不同的。比如汉语、英语这种使用范围广泛的语言，存在着大量的双语平行句对，这些语言被称为{\small\bfnew{富资源语言}}\index{富资源语言}（High-resource Language\index{High-resource Language}）。而对于其它一些使用范围稍小的语言，如斐济语、古吉拉特语等，相关的数据非常稀少，这些语言被称为{\small\bfnew{低资源语言}}\index{低资源语言}（Low-resource Language\index{Low-resource Language}）。世界上现存语言超过5000种，仅有很少一部分为富资源语言，绝大多数均为低资源语言。即使在富资源语言中，对于一些特定的领域，双语平行语料也是十分稀缺的。有时，一些特殊的语种或者领域甚至会面临“零资源”的问题。因此，{\small\bfnew{低资源机器翻译}}\index{低资源机器翻译}（Low-resource Machine Translation）是当下急需解决且颇具挑战的问题。
+\parinterval 神经机器翻译带来的性能提升是显著的，但随之而来的问题是对海量双语训练数据的依赖。不同语言可使用的数据规模是不同的。比如汉语、英语这种使用范围广泛的语言，存在着大量的双语平行句对，这些语言被称为{\small\bfnew{富资源语言}}\index{富资源语言}（High-resource Language\index{High-resource Language}）。而对于其它一些使用范围稍小的语言，如斐济语、古吉拉特语等，相关的数据非常稀少，这些语言被称为{\small\bfnew{低资源语言}}\index{低资源语言}（Low-resource Language\index{Low-resource Language}）。世界上现存语言超过5000种，仅有很少一部分为富资源语言，绝大多数均为低资源语言。即使在富资源语言中，对于一些特定的领域，双语平行语料也是十分稀缺的。有时，一些特殊的语种或者领域甚至会面临“零资源”的问题。因此，{\small\bfnew{低资源机器翻译}}\index{低资源机器翻译}（Low-resource Machine Translation）是当下急需解决且颇具挑战的问题。

-\parinterval 本章将对低资源神经机器翻译的相关问题、模型和方法展开介绍，内容涉及数据的有效使用、双向翻译模型、多语言翻译建模、无监督机器翻译、领域适应五个方面。
+\parinterval 本章将对低资源神经机器翻译的相关问题、模型和方法展开介绍，内容涉及数据的有效使用、双向翻译模型、多语言翻译模型、无监督机器翻译、领域适应五个方面。

 %----------------------------------------------------------------------------------------
 %    NEW SECTION 16.1
@@ -55,7 +55,7 @@
 \begin{figure}[htp]
 \centering
 \input{./Chapter16/Figures/figure-application-process-of-back-translation}
-\caption{回译方法的流程}
+\caption{回译方法的简要流程}
 \label{fig:16-1}
 \end{figure}
 %----------------------------------------------
@@ -88,11 +88,11 @@
 %----------------------------------------------
 \begin{itemize}
    \vspace{0.5em}
-    \item 丢掉单词：句子中的每个词均有$\funp{P}_{\rm{Drop}}$的概率被丢弃。
+    \item {\small\bfnew{丢掉单词}}：句子中的每个词均有$\funp{P}_{\rm{Drop}}$的概率被丢弃。
    \vspace{0.5em}
-    \item 掩码单词：句子中的每个词均有$\funp{P}_{\rm{Mask}}$的概率被替换为一个额外的<Mask>词。<Mask>的作用类似于占位符，可以理解为一个句子中的部分词被屏蔽掉，无法得知该位置词的准确含义。
+    \item {\small\bfnew{掩码单词}}：句子中的每个词均有$\funp{P}_{\rm{Mask}}$的概率被替换为一个额外的<Mask>词。<Mask>的作用类似于占位符，可以理解为一个句子中的部分词被屏蔽掉，无法得知该位置词的准确含义。
    \vspace{0.5em}
-    \item 打乱顺序：将句子中距离较近的某些词的位置进行随机交换。
+    \item {\small\bfnew{打乱顺序}}：将句子中距离较近的某些词的位置进行随机交换。
    \vspace{0.5em}
 \end{itemize}
 %----------------------------------------------
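
The three noise types listed in this hunk (drop, mask, local shuffle) can be sketched in a few lines of Python. This is only an illustration, not code from the book: the function names, the `max_dist` parameter, and the noisy-key trick used for the local shuffle are assumptions. Only the probabilities $\funp{P}_{\rm{Drop}}$, $\funp{P}_{\rm{Mask}}$ and the `<Mask>` token come from the text.

```python
import random

MASK = "<Mask>"  # placeholder token named in the text


def drop_words(tokens, p_drop, rng):
    """Drop each token independently with probability p_drop."""
    return [tok for tok in tokens if rng.random() >= p_drop]


def mask_words(tokens, p_mask, rng):
    """Replace each token with MASK independently with probability p_mask."""
    return [MASK if rng.random() < p_mask else tok for tok in tokens]


def shuffle_nearby(tokens, max_dist, rng):
    """Locally shuffle tokens (assumed scheme, not taken from the text).

    Each position i gets the key i + U(0, max_dist + 1); sorting by key
    moves every token at most max_dist positions away from where it was.
    """
    keys = [i + rng.random() * (max_dist + 1) for i in range(len(tokens))]
    order = sorted(range(len(tokens)), key=lambda i: keys[i])
    return [tokens[i] for i in order]


if __name__ == "__main__":
    rng = random.Random(0)
    sentence = "the cat likes to eat fish".split()
    print(drop_words(sentence, 0.1, rng))
    print(mask_words(sentence, 0.1, rng))
    print(shuffle_nearby(sentence, 2, rng))
```

Passing an explicit `random.Random` instance keeps the noise reproducible, which helps when comparing runs with different noise rates.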
@@ -112,11 +112,11 @@
 %----------------------------------------------
 \begin{itemize}
    \vspace{0.5em}
-    \item 对单语数据加噪。通过一个端到端模型预测源语言句子的调序结果，该模型和神经机器翻译模型的编码器共享参数，从而增强编码器的特征提取能力\upcite{DBLP:conf/emnlp/ZhangZ16}；
+    \item {\small\bfnew{对单语数据加噪}}。通过一个端到端模型预测源语言句子的调序结果，该模型和神经机器翻译模型的编码器共享参数，从而增强编码器的特征提取能力\upcite{DBLP:conf/emnlp/ZhangZ16}；
    \vspace{0.5em}
-    \item 训练降噪自编码器。将加噪后的句子作为输入，原始句子作为输出，用来训练降噪自编码器，这一思想在无监督机器翻译中得到了广泛应用，详细方法可以参考\ref{unsupervised-NMT}节；
+    \item {\small\bfnew{训练降噪自编码器}}。将加噪后的句子作为输入，原始句子作为输出，用来训练降噪自编码器，这一思想在无监督机器翻译中得到了广泛应用，详细方法可以参考\ref{unsupervised-NMT}节；
    \vspace{0.5em}
-    \item 对伪数据进行加噪。比如在上文中提到的对伪数据加入噪声的方法中，通常也使用上述这三种加噪方法来提高伪数据的多样性；
+    \item {\small\bfnew{对伪数据进行加噪}}。比如在上文中提到的对伪数据加入噪声的方法中，通常也使用上述这三种加噪方法来提高伪数据的多样性；
    \vspace{0.5em}
 \end{itemize}
 %----------------------------------------------
@@ -512,9 +512,9 @@

 \begin{itemize}
 \vspace{0.5em}
-\item 基于无监督的分布匹配。该步骤利用一些无监督的方法来得到一个包含噪声的初始化词典$D$。
+\item {\small\bfnew{基于无监督的分布匹配}}。该步骤利用一些无监督的方法来得到一个包含噪声的初始化词典$D$。
 \vspace{0.5em}
-\item 基于有监督的微调。利用两个单语词嵌入和第一步中学习到的种子字典执行一些对齐算法来迭代微调，例如，{\small\bfnew{普氏分析}}\index{普氏分析}（Procrustes Analysis\index{Procrustes Analysis}）\upcite{1966ASchnemann}。
+\item {\small\bfnew{基于有监督的微调}}。利用两个单语词嵌入和第一步中学习到的种子字典执行一些对齐算法来迭代微调，例如，{\small\bfnew{普氏分析}}\index{普氏分析}（Procrustes Analysis\index{Procrustes Analysis}）\upcite{1966ASchnemann}。
 \vspace{0.5em}
 \end{itemize}

@@ -542,9 +542,9 @@

 \begin{itemize}
 \vspace{0.5em}
-\item 基于生成对抗网络的方法\upcite{DBLP:conf/iclr/LampleCRDJ18,DBLP:conf/acl/ZhangLLS17,DBLP:conf/emnlp/XuYOW18,DBLP:conf/naacl/MohiuddinJ19}。在这个方法中，通过生成器来产生映射$\mathbi{W}$，鉴别器负责区分随机抽样的元素$\mathbi{W} \mathbi{X}$ 和$\mathbi{Y}$，两者共同优化收敛后即可得到映射$\mathbi{W}$。
+\item {\small\bfnew{基于生成对抗网络的方法}}\upcite{DBLP:conf/iclr/LampleCRDJ18,DBLP:conf/acl/ZhangLLS17,DBLP:conf/emnlp/XuYOW18,DBLP:conf/naacl/MohiuddinJ19}。在这个方法中，通过生成器来产生映射$\mathbi{W}$，鉴别器负责区分随机抽样的元素$\mathbi{W} \mathbi{X}$ 和$\mathbi{Y}$，两者共同优化收敛后即可得到映射$\mathbi{W}$。
 \vspace{0.5em}
-\item 基于Gromov-wasserstein 的方法\upcite{DBLP:conf/emnlp/Alvarez-MelisJ18,DBLP:conf/lrec/GarneauGBDL20,DBLP:journals/corr/abs-1811-01124,DBLP:conf/emnlp/XuYOW18}。Wasserstein距离是度量空间中定义两个概率分布之间距离的函数。在这个任务中，它用来衡量不同语言中单词对之间的相似性，利用空间近似同构的信息可以定义出一些目标函数，之后通过优化该目标函数也可以得到映射$\mathbi{W}$。
+\item {\small\bfnew{基于Gromov-Wasserstein 的方法}}\upcite{DBLP:conf/emnlp/Alvarez-MelisJ18,DBLP:conf/lrec/GarneauGBDL20,DBLP:journals/corr/abs-1811-01124,DBLP:conf/emnlp/XuYOW18}。Wasserstein距离是度量空间中定义两个概率分布之间距离的函数。在这个任务中，它用来衡量不同语言中单词对之间的相似性，利用空间近似同构的信息可以定义出一些目标函数，之后通过优化该目标函数也可以得到映射$\mathbi{W}$。
 \vspace{0.5em}
 \end{itemize}

@@ -553,7 +553,7 @@
 \parinterval 微调的原理普遍基于普氏分析\upcite{DBLP:journals/corr/MikolovLS13}。假设现在有一个种子词典$D=\left\{x_{i}, y_{i}\right\}$其中${i \in\{1, n\}}$，和两个单语词嵌入$\mathbi{X}$和$\mathbi{Y}$，那么就可以将$D$作为{\small\bfnew{映射锚点}}\index{映射锚点}（Anchor\index{Anchor}）学习一个转移矩阵$\mathbi{W}$，使得$\mathbi{W} \mathbi{X}$与$\mathbi{Y}$这两个空间尽可能相近，此外通过对$\mathbi{W}$施加正交约束可以显著提高性能\upcite{DBLP:conf/naacl/XingWLL15}，于是这个优化问题就转变成了{\small\bfnew{普鲁克问题}}\index{普鲁克问题}（Procrustes Problem\index{Procrustes Problem}）\upcite{DBLP:conf/iclr/SmithTHH17}，可以通过{\small\bfnew{奇异值分解}}\index{奇异值分解}（Singular Value Decomposition，SVD\index{Singular Value Decomposition}）来获得近似解。这里用$\mathbi{X}'$和$\mathbi{Y}'$表示$D$中源语言单词和目标语言单词的词嵌入矩阵，优化$\mathbi{W}$的过程可以被描述为：

 \begin{eqnarray}
-\widehat{\mathbi{W}} & = &\underset{\mathbi{W} \in O_{d}(\mathbb{R})}{\operatorname{argmin}}\|\mathbi{W} \mathbi{X}'- \mathbi{Y}' \|_{\mathrm{F}} \nonumber \\
+\widehat{\mathbi{W}} & = & \argmin_{\mathbi{W} \in O_{d}(\mathbb{R})}{\|\mathbi{W} \mathbi{X}'- \mathbi{Y}' \|_{\mathrm{F}}} \nonumber \\
                              & = & \mathbi{U} \mathbi{V}^{\rm{T}} \\ \label{eq:16-9}
 \textrm{s.t.\ \ \ \ } \mathbi{U} \Sigma \mathbi{V}^{\rm{T}} &= &\operatorname{SVD}\left(\mathbi{Y}' \mathbi{X}'^{\rm{T}}\right)
 \label{eq:16-10}
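
A short derivation, added here only as a reading aid, of why the orthogonally constrained problem in this hunk has the solution $\mathbi{U}\mathbi{V}^{\rm{T}}$. It uses the hunk's own symbols; the auxiliary matrix $\mathbi{Z}$ and the singular values $\sigma_i$ (the diagonal of $\Sigma$) are notation introduced for this sketch.

```latex
\begin{eqnarray}
\|\mathbi{W}\mathbi{X}'-\mathbi{Y}'\|_{\mathrm{F}}^{2}
 & = & \|\mathbi{X}'\|_{\mathrm{F}}^{2}+\|\mathbi{Y}'\|_{\mathrm{F}}^{2}
       -2\operatorname{tr}\big(\mathbi{W}\mathbi{X}'\mathbi{Y}'^{\rm{T}}\big) \nonumber \\
\operatorname{tr}\big(\mathbi{W}\mathbi{X}'\mathbi{Y}'^{\rm{T}}\big)
 & = & \operatorname{tr}\big(\mathbi{W}\mathbi{V}\Sigma\mathbi{U}^{\rm{T}}\big)
   =   \operatorname{tr}\big(\mathbi{Z}\Sigma\big),
       \quad \mathbi{Z}=\mathbi{U}^{\rm{T}}\mathbi{W}\mathbi{V} \nonumber \\
 & = & \sum_{i} z_{ii}\sigma_{i} \;\le\; \sum_{i}\sigma_{i} \nonumber
\end{eqnarray}
```

The first line uses $\mathbi{W}^{\rm{T}}\mathbi{W}=\mathbi{I}$, so minimizing the norm is the same as maximizing the trace term. Since $\mathbi{Z}$ is orthogonal, $|z_{ii}| \le 1$, and the bound is reached at $\mathbi{Z}=\mathbi{I}$, that is, at $\mathbi{W}=\mathbi{U}\mathbi{V}^{\rm{T}}$.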
@@ -675,10 +675,10 @@
 \parinterval 无监督神经机器翻译还有两个关键的技巧：
 \begin{itemize}
 \vspace{0.5em}
-\item 词表共享：对于源语言和目标语言里都一样的词使用同一个词嵌入，而不是源语言和目标语言各自对应一个词嵌入，比如，阿拉伯数字或者一些实体名字。这样相当于告诉模型这个词在源语言和目标语言里面表达同一个意思，隐式地引入了单词翻译的监督信号。在无监督神经机器翻译里词表共享搭配子词切分会更加有效，因为子词的覆盖范围广，比如，多个不同的词可以包含同一个子词。
+\item {\small\bfnew{词表共享}}：对于源语言和目标语言里都一样的词使用同一个词嵌入，而不是源语言和目标语言各自对应一个词嵌入，比如，阿拉伯数字或者一些实体名字。这样相当于告诉模型这个词在源语言和目标语言里面表达同一个意思，隐式地引入了单词翻译的监督信号。在无监督神经机器翻译里词表共享搭配子词切分会更加有效，因为子词的覆盖范围广，比如，多个不同的词可以包含同一个子词。

 \vspace{0.5em}
-\item 模型共享：与多语言翻译系统类似，使用同一个翻译模型来进行正向翻译（源语言$\to$目标语言）和反向翻译（目标语言$\to$源语言）。这样做降低了模型的参数量。而且，两个翻译方向可以互相为对方起到正则化的作用，减小了过拟合的风险。
+\item {\small\bfnew{模型共享}}：与多语言翻译系统类似，使用同一个翻译模型来进行正向翻译（源语言$\to$目标语言）和反向翻译（目标语言$\to$源语言）。这样做降低了模型的参数量。而且，两个翻译方向可以互相为对方起到正则化的作用，减小了过拟合的风险。
 \vspace{0.5em}
 \end{itemize}

@@ -752,9 +752,9 @@

 \begin{itemize}
 \vspace{0.5em}
-\item 基于数据的方法。利用源领域的双语数据或目标领域单语数据进行数据选择或数据增强，来增加模型训练的数据量。
+\item {\small\bfnew{基于数据的方法}}。利用源领域的双语数据或目标领域单语数据进行数据选择或数据增强，来增加模型训练的数据量。
 \vspace{0.5em}
-\item 基于模型的方法。针对领域适应开发特定的模型结构、训练策略和推断方法。
+\item {\small\bfnew{基于模型的方法}}。针对领域适应开发特定的模型结构、训练策略和推断方法。
 \vspace{0.5em}
 \end{itemize}


--- a/Chapter17/Figures/figure-cache.tex
+++ b/Chapter17/Figures/figure-cache.tex
@@ -17,10 +17,10 @@
 \node[anchor=south,font=\footnotesize,inner sep=0pt] at ([yshift=0.2em]value.north){value};
 \node[anchor=south,font=\footnotesize,inner sep=0pt] (cache)at ([yshift=2em,xshift=1.5em]key.north){\small\bfnew{缓存}};

-\node[draw,anchor=east,minimum size=1.8em,fill=orange!15] (dt) at ([yshift=2.1em,xshift=-4em]key.west){${\mathbi{d}}_{t}$};
+\node[draw,anchor=east,thick,minimum size=1.8em,fill=orange!15] (dt) at ([yshift=2.1em,xshift=-4em]key.west){${\mathbi{d}}_{t}$};
 \node[anchor=north,font=\footnotesize] (readlab) at ([xshift=2.8em,yshift=0.3em]dt.north){\red{读取}};
-\node[draw,anchor=east,minimum size=1.8em,fill=ugreen!15] (st) at ([xshift=-3.7em]dt.west){${\mathbi{s}}_{t}$};
-\node[draw,anchor=east,minimum size=1.8em,fill=red!15] (st2) at ([xshift=-0.85em,yshift=3.5em]dt.west){$ \widetilde{\mathbi{s}}_{t}$};
+\node[draw,anchor=east,thick,minimum size=1.8em,fill=ugreen!15] (st) at ([xshift=-3.7em]dt.west){${\mathbi{s}}_{t}$};
+\node[draw,anchor=east,thick,minimum size=1.8em,fill=red!15] (st2) at ([xshift=-0.85em,yshift=3.5em]dt.west){$ \widetilde{\mathbi{s}}_{t}$};

 %\node[draw,anchor=north,circle,inner sep=0pt, minimum size=1.2em,fill=yellow] (add) at ([yshift=-1em]st2.south){+};
 \node[draw,thick,inner sep=0pt, minimum size=1.1em, circle] (add) at ([yshift=-1.5em]st2.south){};
@@ -29,7 +29,7 @@

 \node[anchor=north,inner sep=0pt,font=\footnotesize,text=red] at ([xshift=-0em,yshift=-0.5em]add.south){融合};

-\node[draw,anchor=east,minimum size=1.8em,fill=yellow!15] (ct) at ([xshift=-2em,yshift=-3.5em]st.west){$ {\mathbi{C}}_{t}$};
+\node[draw,anchor=east,thick,minimum size=1.8em,fill=yellow!15] (ct) at ([xshift=-2em,yshift=-3.5em]st.west){$ {\mathbi{C}}_{t}$};
 \node[anchor=north,font=\footnotesize] (matchlab) at ([xshift=6.7em,yshift=-0.1em]ct.north){\red{匹配}};

 \node[anchor=east] (y) at ([xshift=-6em,yshift=1em]st.west){$\mathbi{y}_{t-1}$};
@@ -53,12 +53,12 @@
 %node[above,font=\footnotesize,text=red,rotate=25]{reading}
 \draw[-latex,dashed,very thick,out=-5,in=-170] (ct.0) to ([yshift=-2.5em]box.180);
 %node[above,font=\footnotesize,text=red,pos=0.7,rotate=8]{matching}
-\draw[-,very thick,out=0,in=-135](st.0) to (add.-135);
-\draw[-,very thick,out=180,in=-45](dt.180) to (add.-45);
-\draw[-latex,very thick] (add.90) -- (st2.-90);
-\draw[-latex,very thick,out=100,in=-100] (ct.90) to (output.-90);
-\draw[-latex,very thick,out=180,in=-100] (st2.180) to (output.-90);
-\draw[-latex,very thick,out=80,in=-100] (y.90) to (output.-90);
-\draw[-latex,very thick] (output.90) -- ([yshift=1em]output.90);
-\draw[-latex,very thick] ([yshift=-1.2em]yt.-90) -- (yt.-90);
+\draw[-,thick,out=0,in=-135](st.0) to (add.-135);
+\draw[-,thick,out=180,in=-45](dt.180) to (add.-45);
+\draw[-latex,thick] (add.90) -- (st2.-90);
+\draw[-latex,thick,out=100,in=-100] (ct.90) to (output.-90);
+\draw[-latex,thick,out=180,in=-100] (st2.180) to (output.-90);
+\draw[-latex,thick,out=80,in=-100] (y.90) to (output.-90);
+\draw[-latex,thick] (output.90) -- ([yshift=1em]output.90);
+\draw[-latex,thick] ([yshift=-1.2em]yt.-90) -- (yt.-90);
 \end{tikzpicture}
\ No newline at end of file
--- a/Chapter17/chapter17.tex
+++ b/Chapter17/chapter17.tex
@@ -160,11 +160,11 @@
 %----------------------------------------------------------------------------------------------------
 \begin{itemize}
    \vspace{0.5em}
-    \item 错误传播问题。级联模型导致的一个很严重的问题在于，语音识别模型得到的文本如果存在错误，这些错误很可能在翻译过程中被放大，从而使最后翻译结果出现比较大的偏差。比如识别时在句尾少生成了个“吗”，会导致翻译模型将疑问句翻译为陈述句。
+    \item {\small\bfnew{错误传播问题}}。级联模型导致的一个很严重的问题在于，语音识别模型得到的文本如果存在错误，这些错误很可能在翻译过程中被放大，从而使最后翻译结果出现比较大的偏差。比如识别时在句尾少生成了个“吗”，会导致翻译模型将疑问句翻译为陈述句。
    \vspace{0.5em}
-    \item 翻译效率问题。由于需要语音识别模型和文本标注模型只能串行地计算，翻译效率相对较低，而实际很多场景中都需要达到低延时的翻译。
+    \item {\small\bfnew{翻译效率问题}}。由于语音识别模型和文本标注模型只能串行地计算，翻译效率相对较低，而实际很多场景中都需要达到低延时的翻译。
    \vspace{0.5em}
-    \item 语音中的副语言信息丢失。将语音识别为文本的过程中，语音中包含的语气、情感、音调等信息会丢失，而同一句话在不同的语气中表达的意思很可能是不同的。尤其是在实际应用中，由于语音识别结果通常并不包含标点，还需要额外的后处理模型将标点还原，也会带来额外的计算代价。
+    \item {\small\bfnew{语音中的副语言信息丢失}}。将语音识别为文本的过程中，语音中包含的语气、情感、音调等信息会丢失，而同一句话在不同的语气中表达的意思很可能是不同的。尤其是在实际应用中，由于语音识别结果通常并不包含标点，还需要额外的后处理模型将标点还原，也会带来额外的计算代价。
    \vspace{0.5em}
 \end{itemize}
 %----------------------------------------------------------------------------------------------------
@@ -199,9 +199,9 @@
 %----------------------------------------------------------------------------------------------------
 \begin{itemize}
    \vspace{0.5em}
-    \item 训练数据稀缺。虽然语音识别和文本翻译的训练数据都很多，但是直接由源语言语音到目标语言文本的平行数据十分有限，因此端到端语音翻译天然地就是一种低资源翻译任务。
+    \item {\small\bfnew{训练数据稀缺}}。虽然语音识别和文本翻译的训练数据都很多，但是直接由源语言语音到目标语言文本的平行数据十分有限，因此端到端语音翻译天然地就是一种低资源翻译任务。
    \vspace{0.5em}
-    \item 建模复杂度更高。在语音识别中，模型是学习如何生成语音对应的文字序列，输入和输出的对齐比较简单，不涉及到调序的问题。在文本翻译中，模型要学习如何生成源语言序列对应的目标语言序列，仅需要学习不同语言之间的映射，不涉及到模态的转换。而语音翻译模型需要学习从语音到目标语言文本的生成，任务更加复杂。
+    \item {\small\bfnew{建模复杂度更高}}。在语音识别中，模型是学习如何生成语音对应的文字序列，输入和输出的对齐比较简单，不涉及到调序的问题。在文本翻译中，模型要学习如何生成源语言序列对应的目标语言序列，仅需要学习不同语言之间的映射，不涉及到模态的转换。而语音翻译模型需要学习从语音到目标语言文本的生成，任务更加复杂。
    \vspace{0.5em}
 \end{itemize}
 %----------------------------------------------------------------------------------------------------
@@ -231,9 +231,9 @@
 %----------------------------------------------------------------------------------------------------
 \begin{itemize}
    \vspace{0.5em}
-    \item 输入和输出之间的对齐是单调的。也就是后面的输入只会预测与前面的序列相同或后面的输出内容。比如对于图\ref{fig:17-8}中的例子，如果输入的位置t已经预测了字符l，那么t之后的位置不会再预测前面的字符h和e。
+    \item {\small\bfnew{输入和输出之间的对齐是单调的}}。也就是后面的输入只会预测与前面的序列相同或后面的输出内容。比如对于图\ref{fig:17-8}中的例子，如果输入的位置$t$已经预测了字符l，那么$t$之后的位置不会再预测前面的字符h和e。
    \vspace{0.5em}
-    \item 输入和输出之间是多对一的关系。也就是多个输入会对应到同一个输出上。这对于语音序列来说是非常自然的一件事情，由于输入的每个位置只包含非常短的语音特征，因此多个输入才可以对应到一个输出字符。
+    \item {\small\bfnew{输入和输出之间是多对一的关系}}。也就是多个输入会对应到同一个输出上。这对于语音序列来说是非常自然的一件事情，由于输入的每个位置只包含非常短的语音特征，因此多个输入才可以对应到一个输出字符。
    \vspace{0.5em}
 \end{itemize}
 %----------------------------------------------------------------------------------------------------
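
The two properties in this hunk (monotonic alignment, many-to-one mapping) can be illustrated with a collapse function that turns a frame-level prediction sequence into an output string. This is a sketch under the assumption that the alignment is CTC-style. The blank symbol `-` and the function name are hypothetical and do not appear in the hunk.

```python
BLANK = "-"  # assumed blank symbol separating repeated characters


def collapse_alignment(frames, blank=BLANK):
    """Map per-frame predictions to an output string.

    Consecutive identical predictions merge into one character
    (many-to-one), and frames are read strictly left to right
    (monotonic), so an earlier character can never reappear out of order.
    """
    output = []
    previous = None
    for symbol in frames:
        if symbol != previous and symbol != blank:
            output.append(symbol)
        previous = symbol
    return "".join(output)


if __name__ == "__main__":
    # Ten input frames collapse to the five characters of "hello".
    print(collapse_alignment("hh-e-ll-lo"))
```

The blank between the two runs of `l` is what lets a doubled letter survive the merge. Without it, `ll` would collapse to a single `l`.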

--- a/Chapter8/chapter8.tex
+++ b/Chapter8/chapter8.tex
@@ -532,9 +532,9 @@ span\textrm{[0,4]}&=&\textrm{“猫} \quad \textrm{喜欢} \quad \textrm{吃} \q

 \begin{itemize}
 \vspace{0.5em}
-\item 剪枝：在CKY中，每个跨度都可以生成非常多的推导（局部翻译假设）。理论上，这些推导的数量会和跨度大小成指数关系。显然不可能保存如此大量的翻译推导。对于这个问题，常用的办法是只保留top-$k$个推导。也就是每个局部结果只保留最好的$k$个，即束剪枝。在极端情况下，当$k$=1时，这个方法就变成了贪婪的方法；
+\item {\small\bfnew{剪枝}}：在CKY中，每个跨度都可以生成非常多的推导（局部翻译假设）。理论上，这些推导的数量会和跨度大小成指数关系。显然不可能保存如此大量的翻译推导。对于这个问题，常用的办法是只保留top-$k$个推导。也就是每个局部结果只保留最好的$k$个，即束剪枝。在极端情况下，当$k$=1时，这个方法就变成了贪婪的方法；
 \vspace{0.5em}
-\item $n$-best结果的生成：$n$-best推导（译文）的生成是统计机器翻译必要的功能。比如，最小错误率训练中就需要最好的$n$个结果用于特征权重调优。在基于CKY的方法中，整个句子的翻译结果会被保存在最大跨度所对应的结构中。因此一种简单的$n$-best生成方法是从这个结构中取出排名最靠前的$n$个结果。另外，也可以考虑自上而下遍历CKY生成的推导空间，得到更好的$n$-best结果\upcite{huang2005better}。
+\item {\small\bfnew{$n$-best结果的生成}}：$n$-best推导（译文）的生成是统计机器翻译必要的功能。比如，最小错误率训练中就需要最好的$n$个结果用于特征权重调优。在基于CKY的方法中，整个句子的翻译结果会被保存在最大跨度所对应的结构中。因此一种简单的$n$-best生成方法是从这个结构中取出排名最靠前的$n$个结果。另外，也可以考虑自上而下遍历CKY生成的推导空间，得到更好的$n$-best结果\upcite{huang2005better}。
 \end{itemize}
 %----------------------------------------------------------------------------------------
 %    NEW SUB-SECTION
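
Beam pruning as described in this hunk amounts to keeping the $k$ best derivations of each span. A minimal sketch follows, assuming each derivation is stored as a `(score, derivation)` pair where a higher score is better. The function name and data layout are illustrative and not taken from the book.

```python
import heapq


def beam_prune(derivations, k):
    """Keep the k highest-scoring derivations of one span.

    derivations: list of (score, derivation) pairs, higher score is better.
    Returns the survivors sorted from best to worst.
    """
    return heapq.nlargest(k, derivations, key=lambda item: item[0])


if __name__ == "__main__":
    cell = [(-1.2, "derivation A"), (-0.3, "derivation B"), (-2.5, "derivation C")]
    print(beam_prune(cell, 2))  # beam of size 2
    print(beam_prune(cell, 1))  # k = 1 reduces to the greedy choice
```

Under the same assumptions, the simple $n$-best strategy in the second item is the same call applied to the cell of the largest span.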

--- a/ChapterAppend/chapterappend.tex
+++ b/ChapterAppend/chapterappend.tex
@@ -234,30 +234,34 @@

 \section{IBM模型2训练方法}

-IBM模型2与模型1的训练过程完全一样，本质上都是EM方法，因此可以直接复用{\chapterfive}中训练模型1的流程。对于句对$(\mathbf{s},\mathbf{t})$，$m=|\mathbf{s}|$，$l=|\mathbf{t}|$，E-Step的计算公式如下，其中参数$f(s_j|t_i)$与IBM模型1 一样：
+\parinterval IBM模型2与模型1的训练过程完全一样，本质上都是EM方法，因此可以直接复用{\chapterfive}中训练模型1的流程。对于源语言句子$\mathbi{s}=\{s_1,\dots,s_m\}$和目标语言句子$\mathbi{t}=\{t_1,\dots,t_l\}$，E-Step的计算公式如下：

 \begin{eqnarray}
-c(s_u|t_v;\mathbf{s},\mathbf{t}) &=&\sum\limits_{j=1}^{m} \sum\limits_{i=0}^{l} \frac{f(s_u|t_v)a(i|j,m,l) \delta(s_j,s_u)\delta (t_i,t_v) }   {\sum_{k=0}^{l} f(s_u|t_k)a(k|j,m,l)} \\
-c(i|j,m,l;\mathbf{s},\mathbf{t}) &=&\frac{f(s_j|t_i)a(i|j,m,l)}   {\sum_{k=0}^{l} f(s_j|t_k)a(k,j,m,l)}
+c(s_u|t_v;\mathbi{s},\mathbi{t}) &=&\sum\limits_{j=1}^{m} \sum\limits_{i=0}^{l} \frac{f(s_u|t_v)a(i|j,m,l) \delta(s_j,s_u)\delta (t_i,t_v) }   {\sum_{k=0}^{l} f(s_u|t_k)a(k|j,m,l)} \\
+c(i|j,m,l;\mathbi{s},\mathbi{t}) &=&\frac{f(s_j|t_i)a(i|j,m,l)}   {\sum_{k=0}^{l} f(s_j|t_k)a(k|j,m,l)}
 \label{eq:append-1}
 \end{eqnarray}

-\parinterval M-Step的计算公式如下，其中参数$a(i|j,m,l)$表示调序概率：
+\noindent M-Step的计算公式如下：

 \begin{eqnarray}
-f(s_u|t_v) &=&\frac{c(s_u|t_v;\mathbf{s},\mathbf{t}) }    {\sum_{s_u} c(s_u|t_v;\mathbf{s},\mathbf{t})} \\
-a(i|j,m,l) &=&\frac{c(i|j;\mathbf{s},\mathbf{t})}  {\sum_{i}c(i|j;\mathbf{s},\mathbf{t})}
+f(s_u|t_v) &=&\frac{c(s_u|t_v;\mathbi{s},\mathbi{t}) }    {\sum_{s_u} c(s_u|t_v;\mathbi{s},\mathbi{t})} \\
+a(i|j,m,l) &=&\frac{c(i|j,m,l;\mathbi{s},\mathbi{t})}  {\sum_{i}c(i|j,m,l;\mathbi{s},\mathbi{t})}
 \label{eq:append-2}
 \end{eqnarray}

-对于由$K$个样本组成的训练集$\{(\mathbf{s}^{[1]},\mathbf{t}^{[1]}),...,(\mathbf{s}^{[K]},\mathbf{t}^{[K]})\}$，可以将M-Step的计算调整为：
+\noindent 其中，$f(s_u|t_v)$与IBM模型1 一样表示目标语言单词$t_v$到源语言单词$s_u$的翻译概率，$a(i|j,m,l)$表示调序概率。
+
+\parinterval 对于由$K$个样本组成的训练集$\{(\mathbi{s}^{[1]},\mathbi{t}^{[1]}),...,(\mathbi{s}^{[K]},\mathbi{t}^{[K]})\}$，可以将M-Step的计算调整为：

 \begin{eqnarray}
-f(s_u|t_v) &=&\frac{\sum_{k=1}^{K}c_{\mathbb{E}}(s_u|t_v;\mathbf{s}^{[k]},\mathbf{t}^{[k]}) }    {\sum_{s_u} \sum_{k=1}^{K} c_{\mathbb{E}}(s_u|t_v;\mathbf{s}^{[k]},\mathbf{t}^{[k]})} \\
-a(i|j,m,l) &=&\frac{\sum_{k=1}^{K}c_{\mathbb{E}}(i|j;\mathbf{s}^{[k]},\mathbf{t}^{[k]})}  {\sum_{i}\sum_{k=1}^{K}c_{\mathbb{E}}(i|j;\mathbf{s}^{[k]},\mathbf{t}^{[k]})}
+f(s_u|t_v) &=&\frac{\sum_{k=1}^{K}c(s_u|t_v;\mathbi{s}^{[k]},\mathbi{t}^{[k]}) }    {\sum_{s_u} \sum_{k=1}^{K} c(s_u|t_v;\mathbi{s}^{[k]},\mathbi{t}^{[k]})} \\
+a(i|j,m,l) &=&\frac{\sum_{k=1}^{K}c(i|j,m^{[k]},l^{[k]};\mathbi{s}^{[k]},\mathbi{t}^{[k]})}  {\sum_{i}\sum_{k=1}^{K}c(i|j,m^{[k]},l^{[k]};\mathbi{s}^{[k]},\mathbi{t}^{[k]})}
 \label{eq:append-3}
 \end{eqnarray}

+\noindent 其中，$m^{[k]}=|\mathbi{s}^{[k]}|$，$l^{[k]}=|\mathbi{t}^{[k]}|$。
+
 %----------------------------------------------------------------------------------------
 %    NEW SECTION
 %----------------------------------------------------------------------------------------
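
The E-step and M-step for IBM Model 2 in this hunk can be written as one small EM iteration over a toy corpus. This is a sketch only: the dictionary layout, the `<null>` token standing for $t_0$, the uniform initialization, and the toy sentences are assumptions made for illustration. The update rules follow the expected counts $c(s_u|t_v)$ and $c(i|j,m,l)$ and their normalization.

```python
from collections import defaultdict

NULL = "<null>"  # assumed token for the empty target word t_0


def uniform_init(corpus):
    """Uniform f(s|t) over co-occurring pairs and uniform a(i|j,m,l)."""
    f, a = {}, {}
    for s, t in corpus:
        m, l = len(s), len(t)
        for j in range(1, m + 1):
            for i, t_word in enumerate([NULL] + t):
                f[(s[j - 1], t_word)] = 1.0
                a[(i, j, m, l)] = 1.0 / (l + 1)
    return f, a


def em_step(corpus, f, a):
    """One EM iteration: collect expected counts, then renormalize."""
    count_f, total_f = defaultdict(float), defaultdict(float)
    count_a, total_a = defaultdict(float), defaultdict(float)
    for s, t in corpus:
        t0 = [NULL] + t
        m, l = len(s), len(t)
        for j in range(1, m + 1):
            s_word = s[j - 1]
            # Denominator of the E-step: sum over all target positions k.
            z = sum(f[(s_word, t0[k])] * a[(k, j, m, l)] for k in range(l + 1))
            for i in range(l + 1):
                posterior = f[(s_word, t0[i])] * a[(i, j, m, l)] / z
                count_f[(s_word, t0[i])] += posterior
                total_f[t0[i]] += posterior
                count_a[(i, j, m, l)] += posterior
                total_a[(j, m, l)] += posterior
    # M-step: normalize counts over s_u and over i respectively.
    new_f = {key: c / total_f[key[1]] for key, c in count_f.items()}
    new_a = {key: c / total_a[key[1:]] for key, c in count_a.items()}
    return new_f, new_a


if __name__ == "__main__":
    corpus = [
        (["das", "haus"], ["the", "house"]),
        (["das", "buch"], ["the", "book"]),
        (["ein", "buch"], ["a", "book"]),
    ]
    f, a = uniform_init(corpus)
    for _ in range(5):
        f, a = em_step(corpus, f, a)
    print(sorted(f.items(), key=lambda kv: -kv[1])[:5])
```

Counts are accumulated across all sentence pairs before normalizing, which corresponds to the $K$-sample form of the M-step. Alignment counts are keyed by $(i,j,m,l)$, so sentence pairs with different lengths do not share alignment parameters.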
@@ -265,7 +269,7 @@ a(i|j,m,l) &=&\frac{\sum_{k=1}^{K}c_{\mathbb{E}}(i|j;\mathbf{s}^{[k]},\mathbf{t}
 \section{IBM模型3训练方法}
 \parinterval IBM模型3的参数估计与模型1和模型2采用相同的方法。这里直接给出辅助函数。
 \begin{eqnarray}
-h(t,d,n,p, \lambda,\mu, \nu, \zeta) & = &  \funp{P}_{\theta}(\mathbf{s}|\mathbf{t})-\sum_{t}\lambda_{t}\big(\sum_{s}t(s|t)-1\big)  \nonumber \\
+h(t,d,n,p, \lambda,\mu, \nu, \zeta) & = &  \funp{P}_{\theta}(\mathbi{s}|\mathbi{t})-\sum_{t}\lambda_{t}\big(\sum_{s}t(s|t)-1\big)  \nonumber \\
 & & -\sum_{i}\mu_{iml}\big(\sum_{j}d(j|i,m,l)-1\big) \nonumber \\
 & & -\sum_{t}\nu_{t}\big(\sum_{\varphi}n(\varphi|t)-1\big)-\zeta(p^0+p^1-1)
 \label{eq:1.1}
@@ -273,49 +277,49 @@ h(t,d,n,p, \lambda,\mu, \nu, \zeta) & = &  \funp{P}_{\theta}(\mathbf{s}|\mathbf{

 \parinterval 由于篇幅所限这里略去了推导步骤直接给出具体公式。
 \begin{eqnarray}
-c(s|t,\mathbf{s},\mathbf{t}) & = & \sum_{\mathbf{a}}\big[\funp{P}_{\theta}(\mathbf{s},\mathbf{a}|\mathbf{t}) \times \sum_{j=1}^{m} (\delta(s_j,s) \cdot \delta(t_{a_{j}},t))\big] \label{eq:1.2} \\
-c(j|i,m,l;\mathbf{s},\mathbf{t}) & = & \sum_{\mathbf{a}}\big[\funp{P}_{\theta}(\mathbf{s},\mathbf{a}|\mathbf{t}) \times \delta(i,a_j)\big] \label{eq:1.3} \\
-c(\varphi|t;\mathbf{s},\mathbf{t}) & = & \sum_{\mathbf{a}}\big[\funp{P}_{\theta}(\mathbf{s},\mathbf{a}|\mathbf{t}) \times \sum_{i=1}^{l}\delta(\varphi,\varphi_{i})\delta(t,t_i)\big]
+c(s|t,\mathbi{s},\mathbi{t}) & = & \sum_{\mathbi{a}}\big[\funp{P}_{\theta}(\mathbi{s},\mathbi{a}|\mathbi{t}) \times \sum_{j=1}^{m} (\delta(s_j,s) \cdot \delta(t_{a_{j}},t))\big] \label{eq:1.2} \\
+c(j|i,m,l;\mathbi{s},\mathbi{t}) & = & \sum_{\mathbi{a}}\big[\funp{P}_{\theta}(\mathbi{s},\mathbi{a}|\mathbi{t}) \times \delta(i,a_j)\big] \label{eq:1.3} \\
+c(\varphi|t;\mathbi{s},\mathbi{t}) & = & \sum_{\mathbi{a}}\big[\funp{P}_{\theta}(\mathbi{s},\mathbi{a}|\mathbi{t}) \times \sum_{i=1}^{l}\delta(\varphi,\varphi_{i})\delta(t,t_i)\big]
 \label{eq:1.4}
 \end{eqnarray}

 \begin{eqnarray}
-c(0|\mathbf{s},\mathbf{t}) & = & \sum_{\mathbf{a}}\big[\funp{P}_{\theta}(\mathbf{s},\mathbf{a}|\mathbf{t})  \times (m-2\varphi_0) \big] \label{eq:1.5} \\
-c(1|\mathbf{s},\mathbf{t}) & = & \sum_{\mathbf{a}}\big[\funp{P}_{\theta}(\mathbf{s},\mathbf{a}|\mathbf{t}) \times \varphi_0 \big] \label{eq:1.6}
+c(0|\mathbi{s},\mathbi{t}) & = & \sum_{\mathbi{a}}\big[\funp{P}_{\theta}(\mathbi{s},\mathbi{a}|\mathbi{t})  \times (m-2\varphi_0) \big] \label{eq:1.5} \\
+c(1|\mathbi{s},\mathbi{t}) & = & \sum_{\mathbi{a}}\big[\funp{P}_{\theta}(\mathbi{s},\mathbi{a}|\mathbi{t}) \times \varphi_0 \big] \label{eq:1.6}
 \end{eqnarray}

 \parinterval 进一步，对于由$K$个样本组成的训练集，有：
 \begin{eqnarray}
-t(s|t) & = & \lambda_{t}^{-1} \times \sum_{k=1}^{K}c(s|t;\mathbf{s}^{[k]},\mathbf{t}^{[k]}) \label{eq:1.7} \\
-d(j|i,m,l) & = & \mu_{iml}^{-1} \times \sum_{k=1}^{K}c(j|i,m,l;\mathbf{s}^{[k]},\mathbf{t}^{[k]}) \label{eq:1.8} \\
-n(\varphi|t) & = & \nu_{t}^{-1} \times \sum_{k=1}^{K}c(\varphi |t;\mathbf{s}^{[k]},\mathbf{t}^{[k]}) \label{eq:1.9} \\
-p_x & = & \zeta^{-1} \sum_{k=1}^{K}c(x;\mathbf{s}^{[k]},\mathbf{t}^{[k]}) \label{eq:1.10}
+t(s|t) & = & \lambda_{t}^{-1} \times \sum_{k=1}^{K}c(s|t;\mathbi{s}^{[k]},\mathbi{t}^{[k]}) \label{eq:1.7} \\
+d(j|i,m,l) & = & \mu_{iml}^{-1} \times \sum_{k=1}^{K}c(j|i,m,l;\mathbi{s}^{[k]},\mathbi{t}^{[k]}) \label{eq:1.8} \\
+n(\varphi|t) & = & \nu_{t}^{-1} \times \sum_{k=1}^{K}c(\varphi |t;\mathbi{s}^{[k]},\mathbi{t}^{[k]}) \label{eq:1.9} \\
+p_x & = & \zeta^{-1} \sum_{k=1}^{K}c(x;\mathbi{s}^{[k]},\mathbi{t}^{[k]}) \label{eq:1.10}
 \end{eqnarray}

-\parinterval 在模型3中，因为繁衍率的引入，并不能像模型1和模型2那样，在保证正确性的情况下加速参数估计的过程。这就使得每次迭代过程中，都不得不面对大小为$(l+1)^m$的词对齐空间。遍历所有$(l+1)^m$个词对齐所带来的高时间复杂度显然是不能被接受的。因此就要考虑能否仅利用词对齐空间中的部分词对齐对这些参数进行估计。比较简单的方法是仅使用Viterbi对齐来进行参数估计，这里Viterbi 词对齐可以被简单的看作搜索到的最好词对齐。遗憾的是，在模型3中并没有方法直接获得Viterbi对齐。这样只能采用一种折中的策略，即仅考虑那些使得$\funp{P}_{\theta}(\mathbf{s},\mathbf{a}|\mathbf{t})$ 达到较高值的词对齐。这里把这部分词对齐组成的集合记为$S$。式\ref{eq:1.2}可以被修改为：
+\parinterval 在模型3中，因为繁衍率的引入，并不能像模型1和模型2那样，在保证正确性的情况下加速参数估计的过程。这就使得每次迭代过程中，都不得不面对大小为$(l+1)^m$的词对齐空间。遍历所有$(l+1)^m$个词对齐所带来的高时间复杂度显然是不能被接受的。因此就要考虑能否仅利用词对齐空间中的部分词对齐对这些参数进行估计。比较简单的方法是仅使用Viterbi对齐来进行参数估计，这里Viterbi 词对齐可以被简单的看作搜索到的最好词对齐。遗憾的是，在模型3中并没有方法直接获得Viterbi对齐。这样只能采用一种折中的策略，即仅考虑那些使得$\funp{P}_{\theta}(\mathbi{s},\mathbi{a}|\mathbi{t})$ 达到较高值的词对齐。这里把这部分词对齐组成的集合记为$S$。式\ref{eq:1.2}可以被修改为：
 \begin{eqnarray}
-c(s|t,\mathbf{s},\mathbf{t}) &\approx & \sum_{\mathbf{a} \in S}\big[\funp{P}_{\theta}(\mathbf{s},\mathbf{a}|\mathbf{t}) \times \sum_{j=1}^{m}(\delta(s_j,\mathbf{s}) \cdot \delta(t_{a_{j}},\mathbf{t})) \big]
+c(s|t,\mathbi{s},\mathbi{t}) &\approx & \sum_{\mathbi{a} \in S}\big[\funp{P}_{\theta}(\mathbi{s},\mathbi{a}|\mathbi{t}) \times \sum_{j=1}^{m}(\delta(s_j,s) \cdot \delta(t_{a_{j}},t)) \big]
 \label{eq:1.11}
 \end{eqnarray}

 \parinterval 同理可以获得式\ref{eq:1.3}-\ref{eq:1.6}的修改结果。进一步，在IBM模型3中，可以定义$S$如下：
 \begin{eqnarray}
-S &=& N(b^{\infty}(V(\mathbf{s}|\mathbf{t};2))) \cup (\mathop{\cup}\limits_{ij} N(b_{i \leftrightarrow j}^{\infty}(V_{i \leftrightarrow j}(\mathbf{s}|\mathbf{t},2))))
+S &=& N(b^{\infty}(V(\mathbi{s}|\mathbi{t};2))) \cup (\mathop{\cup}\limits_{ij} N(b_{i \leftrightarrow j}^{\infty}(V_{i \leftrightarrow j}(\mathbi{s}|\mathbi{t},2))))
 \label{eq:1.12}
 \end{eqnarray}

 \parinterval 为了理解这个公式，先介绍几个概念。
 \begin{itemize}
-\item $V(\mathbf{s}|\mathbf{t})$表示Viterbi词对齐，$V(\mathbf{s}|\mathbf{t},1)$、$V(\mathbf{s}|\mathbf{t},2)$和$V(\mathbf{s}|\mathbf{t},3)$就分别对应了模型1、2 和3 的Viterbi 词对齐；
-\item 把那些满足第$j$个源语言单词对应第$i$个目标语言单词（$a_j=i$）的词对齐构成的集合记为$\mathbf{A}_{i \leftrightarrow j}(\mathbf{s},\mathbf{t})$。通常称这些对齐中$j$和$i$被``钉''在了一起。在$\mathbf{A}_{i \leftrightarrow j}(\mathbf{s},\mathbf{t})$中使$\funp{P}(\mathbf{a}|\mathbf{s},\mathbf{t})$达到最大的那个词对齐被记为$V_{i \leftrightarrow j}(\mathbf{s},\mathbf{t})$；
-\item 如果两个词对齐，通过交换两个词对齐连接就能互相转化，则称它们为邻居。一个词对齐$\mathbf{a}$的所有邻居记为$N(\mathbf{a})$。
+\item $V(\mathbi{s}|\mathbi{t})$表示Viterbi词对齐，$V(\mathbi{s}|\mathbi{t},1)$、$V(\mathbi{s}|\mathbi{t},2)$和$V(\mathbi{s}|\mathbi{t},3)$就分别对应了模型1、2 和3 的Viterbi 词对齐；
+\item 把那些满足第$j$个源语言单词对应第$i$个目标语言单词（$a_j=i$）的词对齐构成的集合记为$\mathbi{A}_{i \leftrightarrow j}(\mathbi{s},\mathbi{t})$。通常称这些对齐中$j$和$i$被``钉''在了一起。在$\mathbi{A}_{i \leftrightarrow j}(\mathbi{s},\mathbi{t})$中使$\funp{P}(\mathbi{a}|\mathbi{s},\mathbi{t})$达到最大的那个词对齐被记为$V_{i \leftrightarrow j}(\mathbi{s},\mathbi{t})$；
+\item 如果两个词对齐，通过交换两个词对齐连接就能互相转化，则称它们为邻居。一个词对齐$\mathbi{a}$的所有邻居记为$N(\mathbi{a})$。
 \end{itemize}

 \vspace{0.5em}
-\parinterval 公式\ref{eq:1.12}中，$b^{\infty}(V(\mathbf{s}|\mathbf{t};2))$ 和 $b_{i \leftrightarrow j}^{\infty}(V_{i \leftrightarrow j}(\mathbf{s}|\mathbf{t},2))$ 分别是对 $V(\mathbf{s}|\mathbf{t};3)$ 和 $V_{i \leftrightarrow j}(\mathbf{s}|\mathbf{t},3)$ 的估计。在计算$S$的过程中，需要知道一个对齐$\bf{a}$的邻居$\bf{a}^{'}$的概率，即通过$\funp{P}_{\theta}(\mathbf{a},\mathbf{s}|\mathbf{t})$计算$\funp{P}_{\theta}(\mathbf{a}',\mathbf{s}|\mathbf{t})$。在模型3中，如果$\bf{a}$和$\bf{a}'$仅区别于某个源语单词对齐到的目标位置上（$a_j \neq a_{j}'$），那么
+\parinterval 公式\ref{eq:1.12}中，$b^{\infty}(V(\mathbi{s}|\mathbi{t};2))$ 和 $b_{i \leftrightarrow j}^{\infty}(V_{i \leftrightarrow j}(\mathbi{s}|\mathbi{t},2))$ 分别是对 $V(\mathbi{s}|\mathbi{t};3)$ 和 $V_{i \leftrightarrow j}(\mathbi{s}|\mathbi{t},3)$ 的估计。在计算$S$的过程中，需要知道一个对齐$\mathbi{a}$的邻居$\mathbi{a}'$的概率，即通过$\funp{P}_{\theta}(\mathbi{a},\mathbi{s}|\mathbi{t})$计算$\funp{P}_{\theta}(\mathbi{a}',\mathbi{s}|\mathbi{t})$。在模型3中，如果$\mathbi{a}$和$\mathbi{a}'$仅区别于某个源语单词对齐到的目标位置上（$a_j \neq a_{j}'$），那么

 \begin{eqnarray}
-\funp{P}_{\theta}(\mathbf{a}',\mathbf{s}|\mathbf{t}) & = & \funp{P}_{\theta}(\mathbf{a},\mathbf{s}|\mathbf{t}) \cdot  \nonumber \\
+\funp{P}_{\theta}(\mathbi{a}',\mathbi{s}|\mathbi{t}) & = & \funp{P}_{\theta}(\mathbi{a},\mathbi{s}|\mathbi{t}) \cdot  \nonumber \\
                                                                                   &     & \frac{\varphi_{i'}+1}{\varphi_i} \cdot \frac{n(\varphi_{i'}+1|t_{i'})}{n(\varphi_{i'}|t_{i'})} \cdot \frac{n(\varphi_{i}-1|t_{i})}{n(\varphi_{i}|t_{i})} \cdot \nonumber \\
                                                                                   &     & \frac{t(s_j|t_{i'})}{t(s_{j}|t_{i})} \cdot \frac{d(j|i',m,l)}{d(j|i,m,l)}
 \label{eq:1.13}
@@ -323,7 +327,7 @@ S &=& N(b^{\infty}(V(\mathbf{s}|\mathbf{t};2))) \cup (\mathop{\cup}\limits_{ij} 

 \parinterval 如果$\bf{a}$和$\bf{a}'$区别于两个位置$j_1$和$j_2$的对齐上，$a_{j_{1}}=a_{j_{2}^{'}}$且$a_{j_{2}}=a_{j_{1}^{'}}$，那么
 \begin{eqnarray}
-\funp{P}_{\theta}(\mathbf{a'},\mathbf{s}|\mathbf{t}) &=& \funp{P}_{\theta}(\mathbf{a},\mathbf{s}|\mathbf{t}) \cdot \frac{t(s_{j_{2}}|t_{a_{j_{2}}})}{t(s_{j_{1}}|t_{a_{j_{1}}})} \cdot \frac{d(j_{2}|a_{j_{2}},m,l)}{d(j_{1}|a_{j_{1}},m,l)}
+\funp{P}_{\theta}(\mathbi{a}',\mathbi{s}|\mathbi{t}) &=& \funp{P}_{\theta}(\mathbi{a},\mathbi{s}|\mathbi{t}) \cdot \frac{t(s_{j_{2}}|t_{a_{j_{2}}})}{t(s_{j_{1}}|t_{a_{j_{1}}})} \cdot \frac{d(j_{2}|a_{j_{2}},m,l)}{d(j_{1}|a_{j_{1}},m,l)}
 \label{eq:1.14}
 \end{eqnarray}
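
The neighborhood $N(\mathbi{a})$ used to build $S$ can be enumerated directly. The sketch below assumes an alignment is a list where entry $j-1$ holds $a_j \in \{0,\dots,l\}$, and it treats both single-link moves (the case of Eq. \ref{eq:1.13}) and two-link swaps (the case of Eq. \ref{eq:1.14}) as neighbors. That reading, the function name, and the data layout are assumptions.

```python
def neighbors(alignment, l):
    """Enumerate alignments reachable by one move or one swap.

    alignment: list where alignment[j-1] is the target position a_j (0 = empty word).
    l: target sentence length.
    """
    m = len(alignment)
    result = []
    # Move: re-link one source word j to a different target position.
    for j in range(m):
        for i_new in range(l + 1):
            if i_new != alignment[j]:
                result.append(alignment[:j] + [i_new] + alignment[j + 1:])
    # Swap: exchange the target positions of two source words.
    for j1 in range(m):
        for j2 in range(j1 + 1, m):
            if alignment[j1] != alignment[j2]:
                swapped = list(alignment)
                swapped[j1], swapped[j2] = swapped[j2], swapped[j1]
                result.append(swapped)
    return result


if __name__ == "__main__":
    for neighbor in neighbors([1, 2], l=2):
        print(neighbor)
```

With $m$ source words there are $m \cdot l$ moves and at most $m(m-1)/2$ swaps, so the neighborhood is small next to the full space of $(l+1)^m$ alignments. Rescoring each neighbor from $\funp{P}_{\theta}(\mathbi{a},\mathbi{s}|\mathbi{t})$ through the ratio formulas avoids recomputing the probability from scratch.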

@@ -337,15 +341,15 @@ S &=& N(b^{\infty}(V(\mathbf{s}|\mathbf{t};2))) \cup (\mathop{\cup}\limits_{ij} 

 \parinterval 模型4的参数估计基本与模型3一致。需要修改的是扭曲度的估计公式，对于目标语第$i$个cept.生成的第一单词，可以得到（假设有$K$个训练样本）：
 \begin{eqnarray}
-d_1(\Delta_j|ca,cb) &=& \mu_{1cacb}^{-1} \times \sum_{k=1}^{K}c_1(\Delta_j|ca,cb;\mathbf{s}^{[k]},\mathbf{t}^{[k]})
+d_1(\Delta_j|ca,cb) &=& \mu_{1cacb}^{-1} \times \sum_{k=1}^{K}c_1(\Delta_j|ca,cb;\mathbi{s}^{[k]},\mathbi{t}^{[k]})
 \label{eq:1.15}
 \end{eqnarray}

 其中，

 \begin{eqnarray}
-c_1(\Delta_j|ca,cb;\mathbf{s},\mathbf{t})           & = & \sum_{\mathbf{a}}\big[\funp{P}_{\theta}(\mathbf{s},\mathbf{a}|\mathbf{t}) \times s_1(\Delta_j|ca,cb;\mathbf{a},\mathbf{s},\mathbf{t})\big] \label{eq:1.16} \\
-s_1(\Delta_j|ca,cb;\rm{a},\mathbf{s},\mathbf{t}) & = & \sum_{i=1}^l \big[\varepsilon(\varphi_i) \cdot \delta(\pi_{i1}-\odot _{i},\Delta_j) \cdot \nonumber \\
+c_1(\Delta_j|ca,cb;\mathbi{s},\mathbi{t})           & = & \sum_{\mathbi{a}}\big[\funp{P}_{\theta}(\mathbi{s},\mathbi{a}|\mathbi{t}) \times s_1(\Delta_j|ca,cb;\mathbi{a},\mathbi{s},\mathbi{t})\big] \label{eq:1.16} \\
+s_1(\Delta_j|ca,cb;\mathbi{a},\mathbi{s},\mathbi{t}) & = & \sum_{i=1}^l \big[\varepsilon(\varphi_i) \cdot \delta(\pi_{i1}-\odot _{i},\Delta_j) \cdot \nonumber \\
                                                                           &     & \delta(A(t_{i-1}),ca) \cdot \delta(B(\tau_{i1}),cb) \big] \label{eq:1.17}
 \end{eqnarray}

@@ -362,26 +366,26 @@ s_1(\Delta_j|ca,cb;\rm{a},\mathbf{s},\mathbf{t}) & = & \sum_{i=1}^l \big[\vareps
 对于目标语第$i$个cept.生成的其他单词（非第一个单词），可以得到：

 \begin{eqnarray}
-d_{>1}(\Delta_j|cb) &=& \mu_{>1cb}^{-1} \times \sum_{k=1}^{K}c_{>1}(\Delta_j|cb;\mathbf{s}^{[k]},\mathbf{t}^{[k]})
+d_{>1}(\Delta_j|cb) &=& \mu_{>1cb}^{-1} \times \sum_{k=1}^{K}c_{>1}(\Delta_j|cb;\mathbi{s}^{[k]},\mathbi{t}^{[k]})
 \label{eq:1.18}
 \end{eqnarray}

 其中，

 \begin{eqnarray}
-c_{>1}(\Delta_j|cb;\mathbf{s},\mathbf{t})                  & = & \sum_{\mathbf{a}}\big[\textrm{p}_{\theta}(\mathbf{s},\mathbf{a}|\mathbf{t}) \times s_{>1}(\Delta_j|cb;\mathbf{a},\mathbf{s},\mathbf{t}) \big] \label{eq:1.19} \\
-s_{>1}(\Delta_j|cb;\mathbf{a},\mathbf{s},\mathbf{t}) & = & \sum_{i=1}^l \big[\varepsilon(\varphi_i-1)\sum_{k=2}^{\varphi_i}\delta(\pi_{[i]k}-\pi_{[i]k-1},\Delta_j) \cdot \nonumber ß\\
+c_{>1}(\Delta_j|cb;\mathbi{s},\mathbi{t})                  & = & \sum_{\mathbi{a}}\big[\funp{P}_{\theta}(\mathbi{s},\mathbi{a}|\mathbi{t}) \times s_{>1}(\Delta_j|cb;\mathbi{a},\mathbi{s},\mathbi{t}) \big] \label{eq:1.19} \\
+s_{>1}(\Delta_j|cb;\mathbi{a},\mathbi{s},\mathbi{t}) & = & \sum_{i=1}^l \big[\varepsilon(\varphi_i-1)\sum_{k=2}^{\varphi_i}\delta(\pi_{[i]k}-\pi_{[i]k-1},\Delta_j) \cdot \nonumber \\
                                                                                  &    & \delta(B(\tau_{[i]k}),cb) \big] \label{eq:1.20}
 \end{eqnarray}

 \noindent 这里，$ca$和$cb$分别表示目标语言和源语言的某个词类。模型4需要像模型3一样，通过定义一个词对齐集合$S$，使得每次迭代都在$S$上进行，进而降低运算量。模型4中$S$的定义为：

 \begin{eqnarray}
-\textrm{S} &=& N(\tilde{b}^{\infty}(V(\mathbf{s}|\mathbf{t};2))) \cup (\mathop{\cup}\limits_{ij} N(\tilde{b}_{i \leftrightarrow j}^{\infty}(V_{i \leftrightarrow j}(\mathbf{s}|\mathbf{t},2))))
+S &=& N(\tilde{b}^{\infty}(V(\mathbi{s}|\mathbi{t};2))) \cup (\mathop{\cup}\limits_{ij} N(\tilde{b}_{i \leftrightarrow j}^{\infty}(V_{i \leftrightarrow j}(\mathbi{s}|\mathbi{t},2))))
 \label{eq:1.22}
 \end{eqnarray}

-\parinterval 对于一个对齐$\mathbf{a}$，可用模型3对它的邻居进行排名，即按$\funp{P}_{\theta}(b(\mathbf{a})|\mathbf{s},\mathbf{t};3)$排序，其中$b(\mathbf{a})$表示$\mathbf{a}$的邻居。$\tilde{b}(\mathbf{a})$ 表示这个排名表中满足$\funp{P}_{\theta}(\mathbf{a}'|\mathbf{s},\mathbf{t};4) > \funp{P}_{\theta}⁡(\mathbf{a}|\mathbf{s},\mathbf{t};4)$的最高排名的$\mathbf{a}'$。 同理可知$\tilde{b}_{i \leftrightarrow j}^{\infty}(\mathbf{a})$ 的意义。这里之所以不用模型3中采用的方法直接利用$b^{\infty}(\mathbf{a})$得到模型4中高概率的对齐，是因为模型4中要想获得某个对齐$\mathbf{a}$的邻居$\mathbf{a}'$必须做很大调整，比如：调整$\tau_{[i]1}$和$\odot_{i}$等等。这个过程要比模型3的相应过程复杂得多。因此在模型4中只能借助于模型3的中间步骤来进行参数估计。
+\parinterval 对于一个对齐$\mathbi{a}$，可用模型3对它的邻居进行排名，即按$\funp{P}_{\theta}(b(\mathbi{a})|\mathbi{s},\mathbi{t};3)$排序，其中$b(\mathbi{a})$表示$\mathbi{a}$的邻居。$\tilde{b}(\mathbi{a})$ 表示这个排名表中满足$\funp{P}_{\theta}(\mathbi{a}'|\mathbi{s},\mathbi{t};4) > \funp{P}_{\theta}(\mathbi{a}|\mathbi{s},\mathbi{t};4)$的最高排名的$\mathbi{a}'$。 同理可知$\tilde{b}_{i \leftrightarrow j}^{\infty}(\mathbi{a})$ 的意义。这里之所以不用模型3中采用的方法直接利用$b^{\infty}(\mathbi{a})$得到模型4中高概率的对齐，是因为模型4中要想获得某个对齐$\mathbi{a}$的邻居$\mathbi{a}'$必须做很大调整，比如：调整$\tau_{[i]1}$和$\odot_{i}$等等。这个过程要比模型3的相应过程复杂得多。因此在模型4中只能借助于模型3的中间步骤来进行参数估计。
 \setlength{\belowdisplayskip}{3pt}%调整空白大小

 %----------------------------------------------------------------------------------------
@@ -392,15 +396,15 @@ s_{>1}(\Delta_j|cb;\mathbf{a},\mathbf{s},\mathbf{t}) & = & \sum_{i=1}^l \big[\va
 \parinterval 模型5的参数估计过程也模型4的过程基本一致，二者的区别在于扭曲度的估计公式。在模型5中，对于目标语第$i$个cept.生成的第一单词，可以得到（假设有$K$个训练样本）：

 \begin{eqnarray}
-d_1(\Delta_j|cb) &=& \mu_{1cb}^{-1} \times \sum_{k=1}^{K}c_1(\Delta_j|cb;\mathbf{s}^{[k]},\mathbf{t}^{[k]})
+d_1(\Delta_j|cb) &=& \mu_{1cb}^{-1} \times \sum_{k=1}^{K}c_1(\Delta_j|cb;\mathbi{s}^{[k]},\mathbi{t}^{[k]})
 \label{eq:1.23}
 \end{eqnarray}

 其中，

 \begin{eqnarray}
-c_1(\Delta_j|cb,v_x,v_y;\mathbf{s},\mathbf{t})                   & = & \sum_{\mathbf{a}}\Big[ \funp{P}(\mathbf{s},\mathbf{a}|\mathbf{t}) \times s_1(\Delta_j|cb,v_x,v_y;\mathbf{a},\mathbf{s},\mathbf{t}) \Big] \label{eq:1.24} \\
-s_1(\Delta_j|cb,v_x,v_y;\mathbf{a},\mathbf{s},\mathbf{t}) & = & \sum_{i=1}^l \Big [ \varepsilon(\varphi_i) \cdot \delta(v_{\pi_{i1}},\Delta_j) \cdot \delta(v_{\odot _{i-1}},v_x) \nonumber \\
+c_1(\Delta_j|cb,v_x,v_y;\mathbi{s},\mathbi{t})                   & = & \sum_{\mathbi{a}}\Big[ \funp{P}(\mathbi{s},\mathbi{a}|\mathbi{t}) \times s_1(\Delta_j|cb,v_x,v_y;\mathbi{a},\mathbi{s},\mathbi{t}) \Big] \label{eq:1.24} \\
+s_1(\Delta_j|cb,v_x,v_y;\mathbi{a},\mathbi{s},\mathbi{t}) & = & \sum_{i=1}^l \Big [ \varepsilon(\varphi_i) \cdot \delta(v_{\pi_{i1}},\Delta_j) \cdot \delta(v_{\odot _{i-1}},v_x) \nonumber \\
                                                                                          &    & \cdot \delta(v_m-\varphi_i+1,v_y) \cdot \delta(v_{\pi_{i1}},v_{\pi_{i1}-1} )\Big] \label{eq:1.25}
 \end{eqnarray}

@@ -408,35 +412,35 @@ s_1(\Delta_j|cb,v_x,v_y;\mathbf{a},\mathbf{s},\mathbf{t}) & = & \sum_{i=1}^l \Bi
 For the other words generated by the $i$-th target-language cept. (those after the first), we obtain:

 \begin{eqnarray}
-d_{>1}(\Delta_j|cb,v) &=& \mu_{>1cb}^{-1} \times \sum_{k=1}^{K}c_{>1}(\Delta_j|cb,v;\mathbf{s}^{[k]},\mathbf{t}^{[k]})
+d_{>1}(\Delta_j|cb,v) &=& \mu_{>1cb}^{-1} \times \sum_{k=1}^{K}c_{>1}(\Delta_j|cb,v;\mathbi{s}^{[k]},\mathbi{t}^{[k]})
 \label{eq:1.26}
 \end{eqnarray}

 where

 \begin{eqnarray}
-c_{>1}(\Delta_j|cb,v;\mathbf{s},\mathbf{t})                   & =  & \sum_{\mathbf{a}}\Big[\funp{P}(\mathbf{s},\mathbf{a}|\mathbf{t}) \times s_{>1}(\Delta_j|cb,v;\mathbf{a},\mathbf{s},\mathbf{t}) \Big] \label{eq:1.27} \\
-s_{>1}(\Delta_j|cb,v;\mathbf{a},\mathbf{s},\mathbf{t}) & = & \sum_{i=1}^l\Big[\varepsilon(\varphi_i-1)\sum_{k=2}^{\varphi_i} \big[\delta(v_{\pi_{ik}}-v_{\pi_{[i]k}-1},\Delta_j)  \nonumber \\
+c_{>1}(\Delta_j|cb,v;\mathbi{s},\mathbi{t})                   & =  & \sum_{\mathbi{a}}\Big[\funp{P}(\mathbi{s},\mathbi{a}|\mathbi{t}) \times s_{>1}(\Delta_j|cb,v;\mathbi{a},\mathbi{s},\mathbi{t}) \Big] \label{eq:1.27} \\
+s_{>1}(\Delta_j|cb,v;\mathbi{a},\mathbi{s},\mathbi{t}) & = & \sum_{i=1}^l\Big[\varepsilon(\varphi_i-1)\sum_{k=2}^{\varphi_i} \big[\delta(v_{\pi_{ik}}-v_{\pi_{[i]k}-1},\Delta_j)  \nonumber \\
                                                                                    &     & \cdot \delta(B(\tau_{[i]k}) ,cb) \cdot \delta(v_m-v_{\pi_{i(k-1)}}-\varphi_i+k,v) \nonumber \\
                                                                                    &     & \cdot \delta(v_{\pi_{i1}},v_{\pi_{i1}-1}) \big] \Big] \label{eq:1.28}
 \end{eqnarray}

 \vspace{0.5em}

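-\parinterval Equations \ref{eq:1.24} and \ref{eq:1.25} show that the factor $\delta(v_{\pi_{i1}},v_{\pi_{i1}-1})$ guarantees that, even when an alignment $\mathbf{a}$ is invalid (one source-language position corresponds to multiple target-language positions), no result is computed on that invalid alignment. Note that the factor $\delta(v_{\pi_{p1}},v_{\pi_{p1-1}})$ only keeps the invalid parts of $\mathbf{a}$ from having a harmful effect; the remaining valid parts of $\mathbf{a}$ still take part in the iteration.
+\parinterval Equations \ref{eq:1.24} and \ref{eq:1.25} show that the factor $\delta(v_{\pi_{i1}},v_{\pi_{i1}-1})$ guarantees that, even when an alignment $\mathbi{a}$ is invalid (one source-language position corresponds to multiple target-language positions), no result is computed on that invalid alignment. Note that the factor $\delta(v_{\pi_{p1}},v_{\pi_{p1-1}})$ only keeps the invalid parts of $\mathbi{a}$ from having a harmful effect; the remaining valid parts of $\mathbi{a}$ still take part in the iteration.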

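-\parinterval However, the parameter estimation procedure above is not exactly the same as that of the first four IBM models. In each iteration of the first four models, the parameters can be computed and updated directly given $\mathbf{s}$, $\mathbf{t}$ and an alignment $\mathbf{a}$. In the parameter estimation of Model 5 (see Equation \ref{eq:1.24}), by contrast, the process by which $\mathbf{t}$ generates $\mathbf{s}$ must be simulated, because the correct result cannot be obtained directly from $\mathbf{t}$, $\mathbf{s}$ and $\mathbf{a}$. Specifically, starting from the first word of the target-language sentence and ending at the last, the source-language words corresponding to each target-language word are generated in turn. Generation pauses after each target-language word is processed, and only then can the term inside the summation in Equation \ref{eq:1.24} be computed. In other words, even when $\mathbf{s}$, $\mathbf{t}$ and an alignment $\mathbf{a}$ are given, the computation cannot be carried out on them directly; the generation of $\mathbf{s}$ from $\mathbf{t}$ must be simulated again.
+\parinterval However, the parameter estimation procedure above is not exactly the same as that of the first four IBM models. In each iteration of the first four models, the parameters can be computed and updated directly given $\mathbi{s}$, $\mathbi{t}$ and an alignment $\mathbi{a}$. In the parameter estimation of Model 5 (see Equation \ref{eq:1.24}), by contrast, the process by which $\mathbi{t}$ generates $\mathbi{s}$ must be simulated, because the correct result cannot be obtained directly from $\mathbi{t}$, $\mathbi{s}$ and $\mathbi{a}$. Specifically, starting from the first word of the target-language sentence and ending at the last, the source-language words corresponding to each target-language word are generated in turn. Generation pauses after each target-language word is processed, and only then can the term inside the summation in Equation \ref{eq:1.24} be computed. In other words, even when $\mathbi{s}$, $\mathbi{t}$ and an alignment $\mathbi{a}$ are given, the computation cannot be carried out on them directly; the generation of $\mathbi{s}$ from $\mathbi{t}$ must be simulated again.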

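-\parinterval The analysis above shows that although Model 5 is more accurate than Model 4, its complexity increases the computational cost of parameter estimation: for every triple of $\mathbf{t}$, $\mathbf{s}$ and $\mathbf{a}$, the translation process by which $\mathbf{t}$ generates $\mathbf{s}$ has to be simulated. Implementing Model 5 in a system is therefore a challenge.
+\parinterval The analysis above shows that although Model 5 is more accurate than Model 4, its complexity increases the computational cost of parameter estimation: for every triple of $\mathbi{t}$, $\mathbi{s}$ and $\mathbi{a}$, the translation process by which $\mathbi{t}$ generates $\mathbi{s}$ has to be simulated. Implementing Model 5 in a system is therefore a challenge.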

 \parinterval Model 5 likewise needs a set of word alignments $S$ on which every iteration is carried out. $S$ can be defined as follows:
 \begin{eqnarray}
-\textrm{S} &=& N(\tilde{\tilde{b}}^{\infty}(V(\mathbf{s}|\mathbf{t};2))) \cup (\mathop{\cup}\limits_{ij} N(\tilde{\tilde{b}}_{i \leftrightarrow j}^{\infty}(V_{i \leftrightarrow j}(\mathbf{s}|\mathbf{t};2))))
+S &=& N(\tilde{\tilde{b}}^{\infty}(V(\mathbi{s}|\mathbi{t};2))) \cup (\mathop{\cup}\limits_{ij} N(\tilde{\tilde{b}}_{i \leftrightarrow j}^{\infty}(V_{i \leftrightarrow j}(\mathbi{s}|\mathbi{t};2))))
 \label{eq:1.29}
 \end{eqnarray}
 \vspace{0.5em}

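-\noindent where $\tilde{\tilde{b}}(\mathbf{a})$ borrows the notion of $\tilde{b}(\mathbf{a})$ from Model 4. Here, however, $\tilde{\tilde{b}}(\mathbf{a})$ denotes the highest-ranked word alignment $\mathbf{a}'$ in the list ranked by Model 3 that satisfies $\funp{P}_{\theta}(\mathbf{a}'|\mathbf{s},\mathbf{t};5) > \funp{P}_{\theta}(\mathbf{a}|\mathbf{s},\mathbf{t};5)$, where $\mathbf{a}'$ is a neighbor of $\mathbf{a}$.
+\noindent where $\tilde{\tilde{b}}(\mathbi{a})$ borrows the notion of $\tilde{b}(\mathbi{a})$ from Model 4. Here, however, $\tilde{\tilde{b}}(\mathbi{a})$ denotes the highest-ranked word alignment $\mathbi{a}'$ in the list ranked by Model 3 that satisfies $\funp{P}_{\theta}(\mathbi{a}'|\mathbi{s},\mathbi{t};5) > \funp{P}_{\theta}(\mathbi{a}|\mathbi{s},\mathbi{t};5)$, where $\mathbi{a}'$ is a neighbor of $\mathbi{a}$.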
 \end{appendices}