Merge branch 'master' into 'mengxia'

Master. See merge request !802

Commit 5cef847e authored Jan 07, 2021 by 孟霞
--- a/Chapter10/chapter10.tex
+++ b/Chapter10/chapter10.tex
--- a/Chapter11/chapter11.tex
+++ b/Chapter11/chapter11.tex
@@ -266,9 +266,9 @@
 \subsection{位置编码}
 \label{sec:11.2.1}

-\parinterval 与基于循环神经网络的翻译模型类似，基于卷积神经网络的翻译模型同样用词嵌入序列来表示输入序列，记为$\seq{w}=\{\mathbi{w}_1,\mathbi{w}_2,...,\mathbi{w}_m\}$。序列$\seq{w}$ 是维度大小为$m \times d$的矩阵，第$i$个单词$\mathbi{w}_i$是维度为$d$的向量，其中$m$为序列长度，$d$为词嵌入向量维度。和循环神经网络不同的是，基于卷积神经网络的模型需要对每个输入单词位置进行表示。这是由于，在卷积神经网络中，受限于卷积核的大小，单层的卷积神经网络只能捕捉序列局部的相对位置信息。虽然多层的卷积神经网络可以扩大感受野，但是对全局的位置表示并不充分。而相较于基于卷积神经网络的模型，基于循环神经网络的模型按时间步对输入的序列进行建模，这样间接的对位置信息进行了建模。而词序又是自然语言处理任务中重要信息，因此这里需要单独考虑。
+\parinterval 与基于循环神经网络的翻译模型类似，基于卷积神经网络的翻译模型同样用词嵌入序列来表示输入序列，记为$\seq{w}=\{\mathbi{w}_1,...,\mathbi{w}_m\}$。序列$\seq{w}$ 是维度大小为$m \times d$的矩阵，第$i$个单词$\mathbi{w}_i$是维度为$d$的向量，其中$m$为序列长度，$d$为词嵌入向量维度。和循环神经网络不同的是，基于卷积神经网络的模型需要对每个输入单词位置进行表示。这是由于，在卷积神经网络中，受限于卷积核的大小，单层的卷积神经网络只能捕捉序列局部的相对位置信息。虽然多层的卷积神经网络可以扩大感受野，但是对全局的位置表示并不充分。而相较于基于卷积神经网络的模型，基于循环神经网络的模型按时间步对输入的序列进行建模，这样间接的对位置信息进行了建模。而词序又是自然语言处理任务中重要信息，因此这里需要单独考虑。

-\parinterval 为了更好地引入序列的词序信息，该模型引入了位置编码$\seq{p}=\{\mathbi{p}_1,\mathbi{p}_2,...,\mathbi{p}_m\}$，其中$\mathbi{p}_i$的维度大小为$d$，一般和词嵌入维度相等，其中具体数值作为网络可学习的参数。简单来说，$\mathbi{p}_i$是一个可学习的参数向量，对应位置$i$的编码。这种编码的作用就是对位置信息进行表示，不同序列中的相同位置都对应一个唯一的位置编码向量。之后将词嵌入矩阵和位置编码进行相加，得到模型的输入序列$\seq{e}=\{\mathbi{w}_1+\mathbi{p}_1,\mathbi{w}_2+\mathbi{p}_2,...,\mathbi{w}_m+\mathbi{p}_m\}$。 也有研究人员发现卷积神经网络本身具备一定的编码位置信息的能力\upcite{Islam2020HowMP}，而这里额外的位置编码模块可以被看作是对卷积神经网络位置编码能力的一种补充。
+\parinterval 为了更好地引入序列的词序信息，该模型引入了位置编码$\seq{p}=\{\mathbi{p}_1,...,\mathbi{p}_m\}$，其中$\mathbi{p}_i$的维度大小为$d$，一般和词嵌入维度相等，其中具体数值作为网络可学习的参数。简单来说，$\mathbi{p}_i$是一个可学习的参数向量，对应位置$i$的编码。这种编码的作用就是对位置信息进行表示，不同序列中的相同位置都对应一个唯一的位置编码向量。之后将词嵌入矩阵和位置编码进行相加，得到模型的输入序列$\seq{e}=\{\mathbi{w}_1+\mathbi{p}_1,...,\mathbi{w}_m+\mathbi{p}_m\}$。 也有研究人员发现卷积神经网络本身具备一定的编码位置信息的能力\upcite{Islam2020HowMP}，而这里额外的位置编码模块可以被看作是对卷积神经网络位置编码能力的一种补充。

 %----------------------------------------------------------------------------------------
 %    NEW SUB-SECTION
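
The hunk above describes learnable position encodings that are added to the word embeddings, $\seq{e}=\{\mathbi{w}_1+\mathbi{p}_1,...,\mathbi{w}_m+\mathbi{p}_m\}$. The sketch below illustrates only that element-wise sum, under stated assumptions: the function name `add_position_encoding` and the plain nested-list representation are hypothetical and not taken from the book's code, and the position table is passed in as fixed numbers, whereas the text treats it as trainable parameters.

```python
def add_position_encoding(word_emb, pos_table):
    """Return e_i = w_i + p_i for every position i.

    word_emb  : m x d nested list, one d-dimensional embedding per word.
    pos_table : at least m rows of d values; row i stands for p_i.
                (Hypothetical stand-in for the learnable parameters.)
    """
    m = len(word_emb)
    if m > len(pos_table):
        raise ValueError("sequence is longer than the position table")
    encoded = []
    for i in range(m):
        if len(word_emb[i]) != len(pos_table[i]):
            raise ValueError("embedding and position dimensions differ")
        # The same row p_i is used for position i in every sequence.
        encoded.append([w + p for w, p in zip(word_emb[i], pos_table[i])])
    return encoded


if __name__ == "__main__":
    words = [[1.0, 2.0], [3.0, 4.0]]                  # m = 2, d = 2
    positions = [[0.5, 0.25], [1.0, 2.0], [4.0, 8.0]]  # table for 3 positions
    print(add_position_encoding(words, positions))
```

Because row $i$ of the table is shared by all sequences, two different sentences receive the same vector $\mathbi{p}_i$ at the same position, which matches the description in the paragraph.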
@@ -461,7 +461,7 @@
 \subsection{深度可分离卷积}
 \label{sec:11.3.1}

-\parinterval 根据前面的介绍，可以看到卷积神经网络容易用于局部检测和处理位置不变的特征。对于特定的表达，比如地点、情绪等，使用卷积神经网络能达到不错的识别效果，因此它常被用在文本分类中\upcite{Kalchbrenner2014ACN,Kim2014ConvolutionalNN,DBLP:conf/naacl/Johnson015,DBLP:conf/acl/JohnsonZ17}。不过机器翻译所面临的情况更复杂，除了局部句子片段信息，我们还希望模型能够捕获句子结构、语义等信息。虽然单层卷积神经网络在文本分类中已经取得了很好的效果\upcite{Kim2014ConvolutionalNN}，但是神经机器翻译等任务仍然需要有效的卷积神经网络。随着深度可分离卷积在机器翻译中的探索\upcite{Kaiser2018DepthwiseSC}，更高效的网络结构被设计出来，获得了比ConvS2S模型更好的性能。
+\parinterval 根据前面的介绍，可以看到卷积神经网络容易用于局部检测和处理位置不变的特征。对于特定的表达，比如地点、情绪等，使用卷积神经网络能达到不错的识别效果，因此它常被用在文本分类中\upcite{Kalchbrenner2014ACN,Kim2014ConvolutionalNN,DBLP:conf/naacl/Johnson015,DBLP:conf/acl/JohnsonZ17}。不过机器翻译所面临的情况更复杂，除了局部句子片段信息，研究人员还希望模型能够捕获句子结构、语义等信息。虽然单层卷积神经网络在文本分类中已经取得了很好的效果\upcite{Kim2014ConvolutionalNN}，但是神经机器翻译等任务仍然需要有效的卷积神经网络。随着深度可分离卷积在机器翻译中的探索\upcite{Kaiser2018DepthwiseSC}，更高效的网络结构被设计出来，获得了比ConvS2S模型更好的性能。

 %----------------------------------------------
 % 图17.
@@ -475,7 +475,7 @@

 \parinterval 深度可分离卷积由深度卷积和逐点卷积两部分结合而成\upcite{sifre2014rigid}。图\ref{fig:11-17}对比了标准卷积、深度卷积和逐点卷积，为了方便显示，图中只画出了部分连接。

-\parinterval 给定输入序列表示$\seq{x} = \{ \mathbi{x}_1,\mathbi{x}_2,...,\mathbi{x}_m \}$，其中$m$为序列长度，$\mathbi{x}_i \in \mathbb{R}^{O} $ ，$O$ 即输入序列的通道数。为了获得与输入序列长度相同的卷积输出结果，首先需要进行填充。为了方便描述，这里在输入序列尾部填充 $K-1$ 个元素（$K$为卷积核窗口的长度），其对应的卷积结果为$\seq{z} = \{ \mathbi{z}_1,\mathbi{z}_2,...,\mathbi{z}_m \}$。
+\parinterval 给定输入序列表示$\seq{x} = \{ \mathbi{x}_1,...,\mathbi{x}_m \}$，其中$m$为序列长度，$\mathbi{x}_i \in \mathbb{R}^{O} $ ，$O$ 即输入序列的通道数。为了获得与输入序列长度相同的卷积输出结果，首先需要进行填充。为了方便描述，这里在输入序列尾部填充 $K-1$ 个元素（$K$为卷积核窗口的长度），其对应的卷积结果为$\seq{z} = \{ \mathbi{z}_1,...,\mathbi{z}_m \}$。
 在标准卷积中，若使用N表示卷积核的个数，也就是标准卷积输出序列的通道数，那么对于第$i$个位置的第$n$个通道$ \mathbi{z}_{i,n}^\textrm{\,std}$，其标准卷积具体计算如下：
 \begin{eqnarray}
 \mathbi{z}_{i,n}^\textrm{\,std} &=& \sum_{o=1}^{O} \sum_{k=0}^{K-1} \mathbi{W}_{k,o,n}^\textrm{\,std} \mathbi{x}_{i+k,o}
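
The standard convolution shown in this hunk, $\mathbi{z}_{i,n}^\textrm{\,std} = \sum_{o=1}^{O} \sum_{k=0}^{K-1} \mathbi{W}_{k,o,n}^\textrm{\,std} \mathbi{x}_{i+k,o}$, can be sketched as follows. This is a minimal illustration, not the book's implementation: the function name `standard_conv` is hypothetical, indices are 0-based, and the $K-1$ padded tail elements are assumed to be zero vectors (the text only states that $K-1$ elements are appended).

```python
def standard_conv(x, W):
    """Standard 1D convolution over a sequence.

    x : m x O nested list (m positions, O input channels).
    W : K x O x N nested list (window length K, N output channels).
    Returns z as an m x N nested list, so the output keeps the input length.
    """
    m, O = len(x), len(x[0])
    K, N = len(W), len(W[0][0])
    # Assumption: pad the tail with K-1 zero vectors.
    padded = x + [[0.0] * O for _ in range(K - 1)]
    z = [[0.0] * N for _ in range(m)]
    for i in range(m):
        for n in range(N):
            # Sum over all input channels o and window offsets k.
            z[i][n] = sum(
                W[k][o][n] * padded[i + k][o]
                for o in range(O)
                for k in range(K)
            )
    return z


if __name__ == "__main__":
    x = [[1.0], [2.0], [3.0]]   # m = 3, O = 1
    W = [[[1.0]], [[1.0]]]      # K = 2, O = 1, N = 1
    print(standard_conv(x, W))
```

The triple loop makes the cost visible: each output value touches $K \times O$ weights, and there are $m \times N$ output values.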

--- a/Chapter12/chapter12.tex
+++ b/Chapter12/chapter12.tex
@@ -319,7 +319,7 @@

 \subsection{多头注意力机制}

-\parinterval Transformer中使用的另一项重要技术是{\small\sffamily\bfseries{多头注意力机制}}\index{多头注意力机制}（Multi-head Attention）\index{Multi-head Attention}。“多头”可以理解成将原来的$\mathbi{Q}$、$\mathbi{K}$、$\mathbi{V}$按照隐层维度平均切分成多份。假设切分$h$份，那么最终会得到$\mathbi{Q} = \{ \mathbi{Q}_1, \mathbi{Q}_2,...,\mathbi{Q}_h \}$，$\mathbi{K}=\{ \mathbi{K}_1,\mathbi{K}_2,...,\mathbi{K}_h \}$，$\mathbi{V}=\{ \mathbi{V}_1, \mathbi{V}_2,...,\mathbi{V}_h \}$。多头注意力就是用每一个切分得到的$\mathbi{Q}$，$\mathbi{K}$，$\mathbi{V}$独立的进行注意力计算，即第$i$个头的注意力计算结果$\mathbi{head}_i = \textrm{Attention}(\mathbi{Q}_i,\mathbi{K}_i, \mathbi{V}_i)$。
+\parinterval Transformer中使用的另一项重要技术是{\small\sffamily\bfseries{多头注意力机制}}\index{多头注意力机制}（Multi-head Attention）\index{Multi-head Attention}。“多头”可以理解成将原来的$\mathbi{Q}$、$\mathbi{K}$、$\mathbi{V}$按照隐层维度平均切分成多份。假设切分$h$份，那么最终会得到$\mathbi{Q} = \{ \mathbi{Q}_1,...,\mathbi{Q}_h \}$，$\mathbi{K}=\{ \mathbi{K}_1,...,\mathbi{K}_h \}$，$\mathbi{V}=\{ \mathbi{V}_1,...,\mathbi{V}_h \}$。多头注意力就是用每一个切分得到的$\mathbi{Q}$，$\mathbi{K}$，$\mathbi{V}$独立的进行注意力计算，即第$i$个头的注意力计算结果$\mathbi{head}_i = \textrm{Attention}(\mathbi{Q}_i,\mathbi{K}_i, \mathbi{V}_i)$。

 \parinterval 下面根据图\ref{fig:12-12}详细介绍多头注意力的计算过程：
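
The splitting described in the hunk above can be sketched as follows: $\mathbi{Q}$, $\mathbi{K}$ and $\mathbi{V}$ are cut into $h$ equal slices along the hidden dimension, and $\mathbi{head}_i = \textrm{Attention}(\mathbi{Q}_i,\mathbi{K}_i, \mathbi{V}_i)$ is computed independently per slice. Several points are assumptions rather than content of this diff: $\textrm{Attention}$ is taken to be scaled dot-product attention, the heads are simply concatenated at the end, the linear projections applied before and after the heads in the Transformer are omitted, and all function names are hypothetical.

```python
import math


def split_heads(X, h):
    """Cut an n x d matrix into h matrices of shape n x (d / h)."""
    d = len(X[0])
    if d % h != 0:
        raise ValueError("hidden size must be divisible by the number of heads")
    d_k = d // h
    return [[row[j * d_k:(j + 1) * d_k] for row in X] for j in range(h)]


def attention(Q, K, V):
    """Scaled dot-product attention (assumed form): softmax(QK^T / sqrt(d_k)) V."""
    d_k = len(Q[0])
    out = []
    for q in Q:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d_k) for k in K]
        top = max(scores)  # subtract the maximum for numerical stability
        exps = [math.exp(s - top) for s in scores]
        total = sum(exps)
        weights = [e / total for e in exps]
        out.append([
            sum(w * v[c] for w, v in zip(weights, V))
            for c in range(len(V[0]))
        ])
    return out


def multi_head_attention(Q, K, V, h):
    """Run attention on each of the h slices and concatenate the results."""
    heads = [
        attention(q, k, v)
        for q, k, v in zip(split_heads(Q, h), split_heads(K, h), split_heads(V, h))
    ]
    # Concatenate the per-head outputs back along the hidden dimension.
    return [sum((head[i] for head in heads), []) for i in range(len(Q))]


if __name__ == "__main__":
    X = [[1.0, 0.0, 0.0, 1.0], [0.0, 1.0, 1.0, 0.0]]
    print(multi_head_attention(X, X, X, h=2))
```

Each head only sees $d/h$ of the hidden dimensions, so the total amount of computation stays close to that of a single full-width attention.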


--- a/Chapter13/Figures/figure-framework-of-Adversarial-Neural-machine-translation.tex
+++ b/Chapter13/Figures/figure-framework-of-Adversarial-Neural-machine-translation.tex
@@ -4,25 +4,25 @@

 \begin{tikzpicture}

-\tikzstyle{rnnnode} = [draw,inner sep=2pt,minimum width=4em,minimum height=2em,rounded corners=1pt,fill=yellow!20]
-\tikzstyle{snode} = [draw,inner sep=2pt,minimum width=4em,minimum height=2em,rounded corners=1pt,fill=red!20]
-\tikzstyle{wode} = [inner sep=0pt,minimum width=4em,minimum height=2em,rounded corners=0pt]
-
-\node [anchor=west,wode] (n1) at (0,0) {${y}_1,{y}_2,\ldots,{y}_n$};
-\node [anchor=north west,wode] (n2) at ([xshift=1em,yshift=0.5em]n1.south east) {${x}_1,{x}_2,\ldots,{x}_m$};
-\node [anchor=south west,rnnnode] (n3) at ([xshift=8em,yshift=0.5em]n2.north east) {生成模型G};
-\node [anchor=south east,wode] (n4) at ([xshift=-2em,yshift=0em]n3.north west) {$\tilde{{y}}_{1},\tilde{{y}}_{2},...,\tilde{{y}}_{J}$};
-\node [anchor=south,snode] (n5) at ([xshift=0em,yshift=6em]n2.north) {判别网络D};
+\tikzstyle{rnnnode} = [draw,inner sep=4pt,minimum width=2em,minimum height=2em,rounded corners=1pt,fill=yellow!20]
+\tikzstyle{snode} = [draw,inner sep=4pt,minimum width=2em,minimum height=2em,rounded corners=1pt,fill=red!20]
+\tikzstyle{wode} = [inner sep=0pt,minimum width=2em,minimum height=2em,rounded corners=0pt]
+
+\node [anchor=west,wode] (n1) at (0,0) {$y$};
+\node [anchor=north west,wode] (n2) at ([xshift=3em,yshift=-2.5em]n1.south east) {$x$};
+\node [anchor=south west,rnnnode] (n3) at ([xshift=8em,yshift=0.5em]n2.north east) {生成模型$G$};
+\node [anchor=south east,wode] (n4) at ([xshift=-2em,yshift=0em]n3.north west) {$\tilde{y}$};
+\node [anchor=south,snode] (n5) at ([xshift=0em,yshift=6em]n2.north) {判别网络$D$};
 \node [anchor=west,align=left,font=\small] (n6) at ([xshift=15em,yshift=-3em]n5.east) {根据$(\seq{x},\seq{\tilde{y}})$生\\成奖励信号};


-\draw [->,thick] ([xshift=0em,yshift=0em]n1.north)--([xshift=0em,yshift=0em]n5.south);
-\draw [->,thick] ([xshift=0em,yshift=0em]n2.north)--([xshift=0em,yshift=0em]n5.south);
-\draw [->,thick] ([xshift=0em,yshift=0em]n4.west)--([xshift=0em,yshift=0em]n5.south);
+\draw [->,thick] ([xshift=0em,yshift=-0.3em]n1.north)--([xshift=-0.3em,yshift=-0.1em]n5.south);
+\draw [->,thick] ([xshift=0em,yshift=0em]n2.north)--([xshift=0em,yshift=-0.1em]n5.south);
+\draw [->,thick] ([xshift=0em,yshift=-0.5em]n4.north west)--([xshift=0.3em,yshift=-0.1em]n5.south);
 \draw [->,thick] ([xshift=0em,yshift=0em]n3.north)--([xshift=0em,yshift=1em]n3.north)--([xshift=0em,yshift=0em]n4.east);


-\draw [->,thick] ([xshift=0em,yshift=0em]n5.east) --  ([xshift=14.5em,yshift=0em]n5.east) --  ([xshift=1em,yshift=0em]n3.east) --  ([xshift=0em,yshift=0em]n3.east);
+\draw [->,thick] ([xshift=0em,yshift=0em]n5.east) --  ([xshift=12.9em,yshift=0em]n5.east) --  ([xshift=1em,yshift=0em]n3.east) --  ([xshift=0em,yshift=0em]n3.east);

 \draw [->,thick] ([xshift=0em,yshift=0em]n2.east) --  ([xshift=0em,yshift=-1.5em]n3.south) --  ([xshift=0em,yshift=0em]n3.south);


--- a/Chapter13/Figures/figure-of-scheduling-sampling-method.tex
+++ b/Chapter13/Figures/figure-of-scheduling-sampling-method.tex
@@ -31,9 +31,9 @@
 \node [anchor=south,inner sep=2pt] (st2) at (n8.north) {\scriptsize{\textbf{[step $j$]}}};
 \node [anchor=south,inner sep=2pt] (st3) at (n14.north) {\scriptsize{\textbf{[step $1$]}}};

-\node [anchor=north,font=\tiny,rotate=90] (e1) at ([xshift=-2.7em,yshift=-1.1em]n3.south) {${(1-\epsilon_i)}^2$};
+\node [anchor=north,font=\tiny,rotate=90] (e1) at ([xshift=-2.7em,yshift=-1.1em]n3.south) {${1-\epsilon_i}$};
 %\node [anchor=north,font=\scriptsize] (e2) at ([xshift=2em,yshift=-0.1em]n3.south) {$\funp{P}=\epsilon_i$};
-%\node [anchor=north,font=\scriptsize] (e3) at ([xshift=-2em,yshift=-1em]n4.south) {$\funp{P}={(1-\epsilon_i)}^2$};
+%\node [anchor=north,font=\scriptsize] (e3) at ([xshift=-2em,yshift=-1em]n4.south) {$\funp{P}={1-\epsilon_i}$};
 \node [anchor=north,font=\tiny,rotate=90] (e4) at ([xshift=1.5em,yshift=-1.2em]n4.south) {$\epsilon_i$};

 %\node [anchor=south east,font=\small] (l1) at ([xshift=-1em,yshift=0.5em]n5.north west) {Loss};

--- a/Chapter13/Figures/figure-reinforcement-learning-method-based-on-actor-critic.tex
+++ b/Chapter13/Figures/figure-reinforcement-learning-method-based-on-actor-critic.tex

 \begin{tikzpicture}
 	
-	\node[anchor=west,inner sep=0mm,minimum height=4em,minimum width=5.5em,rounded corners=15pt,align=left,draw] (n1) at (0,0) {Decoder\\Encoder};
+	\node[anchor=west,inner sep=0mm,minimum height=4em,minimum width=5.5em,rounded corners=15pt,align=left,draw,fill=red!20] (n1) at (0,0) {Decoder\\Encoder};

-	\node[anchor=west,inner sep=0mm,minimum height=4em,minimum width=5.5em,rounded corners=15pt,align=left,draw] (n2) at ([xshift=10em,yshift=0em]n1.east) {Decoder\\Encoder};
+	\node[anchor=west,inner sep=0mm,minimum height=4em,minimum width=5.5em,rounded corners=15pt,align=left,draw,fill=green!20] (n2) at ([xshift=10em,yshift=0em]n1.east) {Decoder\\Encoder};

-	\node[anchor=south,inner sep=0mm,font=\small] (a1) at ([xshift=0em,yshift=1em]n1.north) {演员$p_{\theta}$};
+	\node[anchor=south,inner sep=0mm,font=\small] (a1) at ([xshift=0em,yshift=1em]n1.north) {演员$p$};

-	\node[anchor=north,inner sep=0mm] (a2) at ([xshift=0em,yshift=-1em]n1.south) {${x}_1,{x}_2,\ldots,{x}_m$};
+	\node[anchor=north,inner sep=0mm] (a2) at ([xshift=0em,yshift=-1em]n1.south) {$x$};

 	\node[anchor=south,inner sep=0mm,font=\small] (c1) at ([xshift=0em,yshift=1em]n2.north) {评论家$Q$};
-	\node[anchor=north,inner sep=0mm] (c2) at ([xshift=0em,yshift=-1em]n2.south) {${y}_1,{y}_2,\ldots,{y}_J$};
+	\node[anchor=north,inner sep=0mm] (c2) at ([xshift=0em,yshift=-1em]n2.south) {$y$};

 %	\node[anchor=west,inner sep=0mm] (n3) at ([xshift=2.1em,yshift=2em]n1.east) {$Q_1,Q_2,\ldots,Q_J$};
 %	\node[anchor=west,inner sep=0mm] (n4) at ([xshift=2.9em,yshift=-0.4em]n1.east) {$\hat{\mathbi{y}}_1,\hat{\mathbi{y}}_2,\ldots,\hat{\mathbi{y}}_J$};
@@ -27,8 +27,8 @@
 	\node[anchor=west,inner sep=0mm] (n3) at ([xshift=2.1em,yshift=1em]n1.east) {$Q_1,Q_2,\ldots,Q_J$};
 	\node[anchor=west,inner sep=0mm] (n4) at ([xshift=2.9em,yshift=-1em]n1.east) {$\tilde{{y}}_1,\tilde{{y}}_2,\ldots,\tilde{{y}}_J$};

-\draw [->,thick] ([xshift=0em,yshift=0.2em]n2.west) -- ([xshift=0em,yshift=0.2em]n1.east);
-\draw [->,thick] ([xshift=0em,yshift=-0.2em]n1.east) -- ([xshift=0em,yshift=-0.2em]n2.west);
+\draw [->,thick] ([xshift=-0.1em,yshift=0.6em]n2.west) -- ([xshift=0.1em,yshift=0.6em]n1.east);
+\draw [->,thick] ([xshift=0.1em,yshift=-0.6em]n1.east) -- ([xshift=-0.1em,yshift=-0.6em]n2.west);


 \end{tikzpicture}
\ No newline at end of file
--- a/Chapter13/chapter13.tex
+++ b/Chapter13/chapter13.tex
--- a/Chapter14/Figures/figure-beamsize-bleu.tex
+++ b/Chapter14/Figures/figure-beamsize-bleu.tex
@@ -6,7 +6,7 @@ width=8cm,
 height=5cm,
 yticklabel style={/pgf/number format/.cd,fixed,precision=2},
 xticklabel style={/pgf/number format/.cd,fixed,precision=2},
-xlabel={\footnotesize{$\log$\;(束大小)}},ylabel={\footnotesize{BLEU\ (\%)}},
+xlabel={\footnotesize{搜索束大小（取log）}},ylabel={\footnotesize{BLEU\ (\%)}},
 ymin=28.8,ymax=30.4,
 xmin=0,xmax=7,
 xtick={0,1,2,3,4,5,6,7},

--- a/Chapter14/Figures/figure-different-integration-model.tex
+++ b/Chapter14/Figures/figure-different-integration-model.tex
@@ -98,7 +98,7 @@
                \draw [-latex,blue] (lattice5) to [out=-60,in=-90] (lattice3);

                \begin{pgfonlayer}{background}
-                    \node [draw=blue,fill=white,drop shadow,thick,rounded corners=3pt,inner sep=5pt,fit=(lattice1) (lattice2) (lattice3) (lattice4) (lattice5),label={[font=\tiny,label distance=0pt]90:Lattice}] (lattice) {};
+                    \node [draw=blue,fill=white,drop shadow,thick,rounded corners=3pt,inner sep=5pt,fit=(lattice1) (lattice2) (lattice3) (lattice4) (lattice5),label={[font=\tiny,label distance=0pt]90:词格}] (lattice) {};
                \end{pgfonlayer}

                \draw [->,very thick] (output) to (lattice);

--- a/Chapter14/Figures/figure-different-softmax.tex
+++ b/Chapter14/Figures/figure-different-softmax.tex
@@ -57,7 +57,7 @@
    \node [font=\small] (label2) at ([yshift=0.6cm]out2.north) {Softmax};

    \node [anchor=west,layer,fill=orange!15!white] (net3) at ([xshift=2cm]net2.east) {};
-    \node [anchor=north,font=\scriptsize] (input3) at ([yshift=-0.5cm]net3.south) {源语};
+    \node [anchor=north,font=\scriptsize] (input3) at ([yshift=-0.5cm]net3.south) {源语言};
    \node [anchor=south,layer,align=center,font=\scriptsize,fill=yellow!10!white] (out3) at ([yshift=0.9cm]net3.north) {候选\\列表};

    \draw [->,line width=1pt] (input3) to (net3);

--- a/Chapter14/Figures/figure-iteration.tex
+++ b/Chapter14/Figures/figure-iteration.tex
@@ -11,29 +11,29 @@
 \node (point)[right of=decoder_2,xshift=2.5cm,]{\LARGE{...}};
 \node (decoder_3)[er,thick,draw,right of=point,xshift=2.5cm,fill=red!20]{\Large{解码器}};
 \draw [->,very thick,draw=black!70]([xshift=0.2cm]encoder.east) --  ([xshift=-0.2cm]decoder_1.west);
-\draw [->,very thick,draw=black!70]([xshift=0.2cm]decoder_1.east) --  ([xshift=-0.2cm]decoder_2.west);
-\draw [->,very thick,draw=black!70]([xshift=0.2cm]decoder_2.east) --  ([xshift=-0.1cm]point.west);
-\draw [->,very thick,draw=black!70]([xshift=0.1cm]point.east) --  ([xshift=-0.2cm]decoder_3.west);
+%\draw [->,very thick,draw=black!70]([xshift=0.2cm]decoder_1.east) --  ([xshift=-0.2cm]decoder_2.west);
+%\draw [->,very thick,draw=black!70]([xshift=0.2cm]decoder_2.east) --  ([xshift=-0.1cm]point.west);
+%\draw [->,very thick,draw=black!70]([xshift=0.1cm]point.east) --  ([xshift=-0.2cm]decoder_3.west);

 \draw [->,very thick,draw=black!70]([xshift=0,yshift=-1cm]encoder.south) --  ([xshift=0,yshift=-0.2cm]encoder.south);
 \draw [->,very thick,draw=black!70]([xshift=0,yshift=0.2cm]encoder.north) --  ([xshift=0,yshift=1cm]encoder.north);
 \node [below of = encoder,xshift=0cm,yshift=2.2cm]{预测目标长度};
-\node [below of = encoder,xshift=0cm,yshift=-2.2cm]{\Large$x$};
+\node [below of = encoder,xshift=0cm,yshift=-2.2cm]{\Large$\seq{x}$};

 \draw [->,very thick,draw=black!70]([xshift=0,yshift=-1cm]decoder_1.south) --  ([xshift=0,yshift=-0.2cm]decoder_1.south);
 \draw [->,very thick,draw=black!70]([xshift=0,yshift=0.2cm]decoder_1.north) --  ([xshift=0,yshift=1cm]decoder_1.north);
-\node [below of = decoder_1,xshift=0cm,yshift=-2.2cm]{\Large$x'$};
-\node (line1_1)[below of = decoder_1,xshift=0cm,yshift=2.2cm]{\Large$y'$};
+\node [below of = decoder_1,xshift=0cm,yshift=-2.2cm]{\Large$\seq{x'}$};
+\node (line1_1)[below of = decoder_1,xshift=0cm,yshift=2.2cm]{\Large$\seq{y}^{[1]}$};

 \draw [->,thick,]([xshift=0,yshift=-1cm]decoder_2.south) --  ([xshift=0,yshift=-0.2cm]decoder_2.south);
 \draw [->,very thick,draw=black!70]([xshift=0,yshift=0.2cm]decoder_2.north) --  ([xshift=0,yshift=1cm]decoder_2.north);
-\node (line1_2)[below of = decoder_2,xshift=0cm,yshift=-2.2cm]{\Large$y'$};
-\node [below of = decoder_2,xshift=0cm,yshift=2.2cm]{\Large$y''$};
+\node (line1_2)[below of = decoder_2,xshift=0cm,yshift=-2.2cm]{\Large$\seq{y}^{[1]}$};
+\node [below of = decoder_2,xshift=0cm,yshift=2.2cm]{\Large$\seq{y}^{[2]}$};

 \draw [->,very thick,draw=black!70]([xshift=0,yshift=-1cm]decoder_3.south) --  ([xshift=0,yshift=-0.2cm]decoder_3.south);
 \draw [->,very thick,draw=black!70]([xshift=0,yshift=0.2cm]decoder_3.north) --  ([xshift=0,yshift=1cm]decoder_3.north);
-\node (line3_2)[below of = decoder_3,xshift=0cm,yshift=-2.2cm]{\Large$y^{N-1}$};
-\node [below of = decoder_3,xshift=0cm,yshift=2.2cm]{\Large$y^N$};
+\node (line3_2)[below of = decoder_3,xshift=0cm,yshift=-2.2cm]{\Large$\seq{y}^{[N-1]}$};
+\node [below of = decoder_3,xshift=0cm,yshift=2.2cm]{\Large$\seq{y}^{[N]}$};

 \draw[->,very thick,draw=black!70, out=0, in=180,dotted] (line1_1.east) to (line1_2.west);
 \draw[->,very thick,draw=black!70, out=0, in=180,dotted] ([xshift=4cm]line1_1.east) to ([xshift=3cm]line1_2.west);

--- a/Chapter14/Figures/figure-mask-predict.tex
+++ b/Chapter14/Figures/figure-mask-predict.tex
@@ -5,8 +5,11 @@
 \node (encoder)[er,thick,minimum width=5.5cm,fill=ugreen!20]{\huge{编码器}};
 \node (decoder)[er,thick,right of=encoder,xshift=7.75cm,fill=red!20]{\huge{解码器}};
 \node (decoder_1)[er,thick,right of=decoder,xshift=8.75cm,fill=red!20]{\huge{解码器}};
-\draw [->,very thick,draw=black!70]([xshift=0.2cm]encoder.east) --  ([xshift=-0.2cm]decoder.west);
-\draw [->,very thick,draw=black!70]([xshift=0.2cm]decoder.east) --  ([xshift=-0.2cm]decoder_1.west);
+\draw [->,very thick,draw=blue!70]([xshift=0.2cm]encoder.east) --  ([xshift=-0.2cm]decoder.west);
+
+\begin{pgfonlayer}{background}
+\draw [->,very thick,draw=blue!70]([xshift=0.2cm,yshift=-0.8em]encoder.east) --  ([xshift=-0.2cm,yshift=-0.8em]decoder_1.west);
+\end{pgfonlayer}

 \foreach \x in {-2.2cm,-1.1cm,...,2.2cm}
 \draw [->,very thick,draw=black!70]([xshift=\x,yshift=-1cm]encoder.south) --  ([xshift=\x,yshift=-0.2cm]encoder.south);

--- a/Chapter14/Figures/figure-reranking.tex
+++ b/Chapter14/Figures/figure-reranking.tex
@@ -34,7 +34,7 @@
 	\node[anchor=south,font=\scriptsize,align=center] (w5) at ([yshift=1.6em]box3.north){\tiny\bfnew{对 \ 这个 \ 问题 \ 存在 \ 不同的 \ 看法}};
 	\node[font=\tiny] at ([xshift=-0.8em,yshift=-0.6em]encoder.east) {$N\times$};
 	\node[font=\tiny] at ([xshift=-0.8em,yshift=-0.6em]decoder.east) {$1\times$};
-	\node[font=\tiny] at ([xshift=-1em,yshift=-0.6em]decoder2.east) {$N$-1$\times$};
+	\node[font=\tiny] at ([xshift=-1em,yshift=-0.6em]decoder2.east) {$N-1\times$};
 	
 	\draw[line] (w1.north) -- (box1.south);
 	\draw[line] (w2.north) -- (box2.south);

--- a/Chapter14/Figures/figure-word-string-representation.tex
+++ b/Chapter14/Figures/figure-word-string-representation.tex
@@ -29,7 +29,7 @@
 \node [anchor= west] (word3) at ([xshift=1.4em,yshift=-3em]pos4.east){She};
 \node [anchor= west] (word4) at ([xshift=1.1em,yshift=2.8em]pos5.east){Have};
 \node [anchor= west] (word5) at ([xshift=1.3em,yshift=-2.8em]pos5.east){Has};
-\node [anchor= south] (labelb) at ([xshift=3em,yshift=-3em]word3.south){\small{(b)Lattice词串表示}};
+\node [anchor= south] (labelb) at ([xshift=3em,yshift=-3em]word3.south){\small{(b) 基于词格的词串表示}};
 \begin{pgfonlayer}{background}
 {
 % I
@@ -56,7 +56,7 @@
 \node [anchor= west] (pos3) at ([xshift=3.0em]pos2.east){$\circ$};
 \node [anchor= west] (pos2-2) at ([xshift=0.1em,yshift=1.0em]pos2.east){has};
 \draw[->,thick](pos2.east)--(pos3.west);
-\node [anchor= south] (labela) at ([xshift=2em,yshift=-3em]pos1-2.south){\small{(a)$n$-best词串表示}};
+\node [anchor= south] (labela) at ([xshift=2em,yshift=-3em]pos1-2.south){\small{(a) $n$-best词串表示}};
 \end{scope}

 \end{tikzpicture}

--- a/Chapter14/chapter14.tex
+++ b/Chapter14/chapter14.tex
--- a/Chapter15/Figures/figure-encoder-of-bidirectional-tree-structure.png
+++ b/Chapter15/Figures/figure-encoder-of-bidirectional-tree-structure.png
--- a/Chapter15/Figures/figure-encoder-tree-structure-modeling.png
+++ b/Chapter15/Figures/figure-encoder-tree-structure-modeling.png
--- a/Chapter15/Figures/figure-evolution-and-change-of-ml-methods.jpg
+++ b/Chapter15/Figures/figure-evolution-and-change-of-ml-methods.jpg
--- a/Chapter15/Figures/figure-evolution-and-change-of-ml-methods.tex
+++ b/Chapter15/Figures/figure-evolution-and-change-of-ml-methods.tex
--- a/Chapter15/Figures/figure-layer-fusion-method-2d.png
+++ b/Chapter15/Figures/figure-layer-fusion-method-2d.png
--- a/Chapter15/Figures/figure-layer-fusion-method-2d.tex
+++ b/Chapter15/Figures/figure-layer-fusion-method-2d.tex
+
+\begin{tikzpicture}
+\begin{scope}
+
+\tikzstyle{lnode}=[rectangle,inner sep=0mm,minimum height=1.5em,minimum width=3.5em,rounded corners=2pt,draw]
+\tikzstyle{snode}=[rectangle,inner sep=0mm,minimum height=1.5em,minimum width=0.8em,rounded corners=2pt,draw]
+\tikzstyle{vlnode}=[rectangle,inner sep=0mm,minimum height=1em,minimum width=5em,rounded corners=2pt,draw]
+
+
+\node [anchor=west,lnode] (n1) at (0, 0) {$\mathbi{g}^3$};
+\node [anchor=north west,lnode] (n2) at ([xshift=0em,yshift=-0.5em]n1.south west) {$\mathbi{g}^2$};
+\node [anchor=north west,lnode] (n3) at ([xshift=0em,yshift=-0.5em]n2.south west) {$\mathbi{g}^1$};
+
+\node [anchor=south] (d1) at ([xshift=0em,yshift=0.2em]n1.north) {1D};
+
+\node [anchor=west,lnode] (n4) at ([xshift=1.2em,yshift=0em]n1.east) {};
+\node [anchor=west,lnode] (n5) at ([xshift=1.2em,yshift=0em]n2.east) {};
+\node [anchor=west,lnode] (n6) at ([xshift=1.2em,yshift=0em]n3.east) {};
+
+\node [anchor=south,lnode] (n7) at ([xshift=0em,yshift=1em]n4.north) {$\mathbi{W}_1$};
+
+\node [anchor=west] (sig) at ([xshift=0em,yshift=0.4em]n5.east) {$\sigma$};
+
+\node [anchor=west,snode,fill=purple!30] (nc11) at ([xshift=1.2em,yshift=0em]n4.east) {};
+\node [anchor=west,snode,fill=yellow!30] (nc12) at ([xshift=0em,yshift=0em]nc11.east) {};
+\node [anchor=west,snode,fill=red!30] (nc13) at ([xshift=0em,yshift=0em]nc12.east) {};
+\node [anchor=west,snode,fill=blue!30] (nc14) at ([xshift=0em,yshift=0em]nc13.east) {};
+\node [anchor=west,snode,font=\footnotesize,fill=ugreen!30] (nc15) at ([xshift=0em,yshift=0em]nc14.east) {$\mathbi{o}_5^3$};
+
+\node [anchor=west,snode,fill=purple!30] (nc21) at ([xshift=1.2em,yshift=0em]n5.east) {};
+\node [anchor=west,snode,fill=yellow!30] (nc22) at ([xshift=0em,yshift=0em]nc21.east) {};
+\node [anchor=west,snode,fill=red!30] (nc23) at ([xshift=0em,yshift=0em]nc22.east) {};
+\node [anchor=west,snode,fill=blue!30] (nc24) at ([xshift=0em,yshift=0em]nc23.east) {};
+\node [anchor=west,snode,font=\footnotesize,fill=ugreen!30] (nc25) at ([xshift=0em,yshift=0em]nc24.east) {$\mathbi{o}_5^2$};
+
+\node [anchor=west,snode,fill=purple!30] (nc31) at ([xshift=1.2em,yshift=0em]n6.east) {};
+\node [anchor=west,snode,fill=yellow!30] (nc32) at ([xshift=0em,yshift=0em]nc31.east) {};
+\node [anchor=west,snode,fill=red!30] (nc33) at ([xshift=0em,yshift=0em]nc32.east) {};
+\node [anchor=west,snode,fill=blue!30] (nc34) at ([xshift=0em,yshift=0em]nc33.east) {};
+\node [anchor=west,snode,font=\footnotesize,fill=ugreen!30] (nc35) at ([xshift=0em,yshift=0em]nc34.east) {$\mathbi{o}_5^1$};
+
+\node [anchor=south,lnode] (n8) at ([xshift=0em,yshift=1em]nc13.north) {$\mathbi{W}_2$};
+
+\node [anchor=west,font=\footnotesize] (n9) at ([xshift=0.1em,yshift=0.5em]nc25.east) {Softmax};
+
+\node [anchor=west,snode,fill=purple!30] (ns11) at ([xshift=3.5em,yshift=0em]nc15.east) {};
+\node [anchor=west,snode,fill=yellow!30] (ns12) at ([xshift=0em,yshift=0em]ns11.east) {};
+\node [anchor=west,snode,fill=red!30] (ns13) at ([xshift=0em,yshift=0em]ns12.east) {};
+\node [anchor=west,snode,fill=blue!30] (ns14) at ([xshift=0em,yshift=0em]ns13.east) {};
+\node [anchor=west,snode,font=\tiny,fill=ugreen!30] (ns15) at ([xshift=0em,yshift=0em]ns14.east) {0.3};
+
+\node [anchor=west,snode,fill=purple!30] (ns21) at ([xshift=3.5em,yshift=0em]nc25.east) {};
+\node [anchor=west,snode,fill=yellow!30] (ns22) at ([xshift=0em,yshift=0em]ns21.east) {};
+\node [anchor=west,snode,fill=red!30] (ns23) at ([xshift=0em,yshift=0em]ns22.east) {};
+\node [anchor=west,snode,fill=blue!30] (ns24) at ([xshift=0em,yshift=0em]ns23.east) {};
+\node [anchor=west,snode,font=\tiny,fill=ugreen!30] (ns25) at ([xshift=0em,yshift=0em]ns24.east) {0.2};
+
+\node [anchor=west,snode,fill=purple!30] (ns31) at ([xshift=3.5em,yshift=0em]nc35.east) {};
+\node [anchor=west,snode,fill=yellow!30] (ns32) at ([xshift=0em,yshift=0em]ns31.east) {};
+\node [anchor=west,snode,fill=red!30] (ns33) at ([xshift=0em,yshift=0em]ns32.east) {};
+\node [anchor=west,snode,fill=blue!30] (ns34) at ([xshift=0em,yshift=0em]ns33.east) {};
+\node [anchor=west,snode,font=\tiny,fill=ugreen!30] (ns35) at ([xshift=0em,yshift=0em]ns34.east) {0.5};
+
+\node [anchor=west,vlnode,fill=purple!30] (ln1) at ([xshift=3.5em,yshift=-1.5em]ns15.east) {};
+\node [anchor=north west,vlnode,fill=yellow!30] (ln2) at ([xshift=-0.4em,yshift=-0.4em]ln1.north west) {};
+\node [anchor=north west,vlnode,fill=red!30] (ln3) at ([xshift=-0.4em,yshift=-0.4em]ln2.north west) {};
+\node [anchor=north west,vlnode,fill=blue!30] (ln4) at ([xshift=-0.4em,yshift=-0.4em]ln3.north west) {};
+\node [anchor=north west,vlnode,fill=ugreen!30] (ln5) at ([xshift=-0.4em,yshift=-0.4em]ln4.north west) {};
+
+\node [anchor=south] (d2) at ([xshift=0em,yshift=0.2em]ln1.north) {2D};
+
+\node [anchor=south,vlnode,rotate=-90] (ffn) at ([xshift=2em,yshift=0em]ln3.east) {FFN};
+
+\node [anchor=west,rectangle,inner sep=0mm,minimum height=3.5em,minimum width=0.8em,rounded corners=2pt,draw] (fn) at ([xshift=1.5em,yshift=0em]ffn.north) {$\mathbi{g}$};
+
+
+\draw [->,thick] ([xshift=0em,yshift=0em]n1.east) -- ([xshift=0em,yshift=0em]n4.west);
+\draw [->,thick] ([xshift=0em,yshift=0em]n2.east) -- ([xshift=0em,yshift=0em]n5.west);
+\draw [->,thick] ([xshift=0em,yshift=0em]n3.east) -- ([xshift=0em,yshift=0em]n6.west);
+
+\draw [->,thick] ([xshift=0em,yshift=0em]n4.east) -- ([xshift=0em,yshift=0em]nc11.west);
+\draw [->,thick] ([xshift=0em,yshift=0em]n5.east) -- ([xshift=0em,yshift=0em]nc21.west);
+\draw [->,thick] ([xshift=0em,yshift=0em]n6.east) -- ([xshift=0em,yshift=0em]nc31.west);
+
+\draw [->,thick] ([xshift=0em,yshift=0em]n7.south) -- ([xshift=0em,yshift=0em]n4.north);
+\draw [->,thick] ([xshift=0em,yshift=0em]n8.south) -- ([xshift=0em,yshift=0em]nc13.north);
+
+\draw [->,thick] ([xshift=0em,yshift=0em]nc25.east) -- ([xshift=0em,yshift=0em]ns21.west);
+
+\draw[->,thick,dotted] ([xshift=0em,yshift=-0em]ns15.east)..controls +(east:1.5em) and +(west:1.5em)..([xshift=-0em,yshift=-0em]ln5.west) ;
+\draw[->,thick,dotted] ([xshift=0em,yshift=-0em]ns25.east)..controls +(east:1em) and +(west:1em)..([xshift=-0em,yshift=-0em]ln5.west) ;
+\draw[->,thick,dotted] ([xshift=0em,yshift=-0em]ns35.east)..controls +(east:1.5em) and +(west:1.5em)..([xshift=-0em,yshift=-0em]ln5.west) ;
+
+\draw [->,thick] ([xshift=0.8em,yshift=0em]ln3.east) -- ([xshift=0em,yshift=0em]ffn.south);
+
+\draw [->,thick] ([xshift=0em,yshift=0em]ffn.north) -- ([xshift=0em,yshift=0em]fn.west);
+
+
+\draw [decorate,decoration={brace,mirror}] ([xshift=0em]n3.south west) to node [midway,font=\small,align=center,xshift=0em,yshift=-0.8em] {$d$} ([xshift=0em]n3.south east);
+\draw [decorate,decoration={brace,mirror}] ([xshift=0em]n6.south west) to node [midway,font=\small,align=center,xshift=0em,yshift=-0.8em] {$d_a$} ([xshift=0em]n6.south east);
+\draw [decorate,decoration={brace,mirror}] ([xshift=0em]n7.north west) to node [midway,font=\small,align=center,xshift=-0.7em,yshift=-0em] {$d$} ([xshift=0em]n7.south west);
+\draw [decorate,decoration={brace}] ([xshift=0em]n7.north west) to node [midway,font=\small,align=center,xshift=0em,yshift=0.7em] {$d$} ([xshift=0em]n7.north east);
+\draw [decorate,decoration={brace,mirror}] ([xshift=0em]n8.north west) to node [midway,font=\small,align=center,xshift=-0.8em,yshift=-0em] {$d_a$} ([xshift=0em]n8.south west);
+\draw [decorate,decoration={brace}] ([xshift=0em]n8.north west) to node [midway,font=\small,align=center,xshift=0em,yshift=0.8em] {$n_{hop}$} ([xshift=0em]n8.north east);
+\draw [decorate,decoration={brace,mirror}] ([xshift=0em]nc31.south west) to node [midway,font=\small,align=center,xshift=0em,yshift=-0.8em] {$n_{hop}$} ([xshift=0em]nc35.south east);
+\draw [decorate,decoration={brace,mirror}] ([xshift=0em]ln5.south west) to node [midway,font=\small,align=center,xshift=0em,yshift=-0.8em] {$d$} ([xshift=0em]ln5.south east);
+\draw [decorate] ([xshift=0em]ln5.south east) to node [midway,font=\footnotesize,align=center,xshift=1em,yshift=-0.5em] {$n_{hop}$} ([xshift=0em]ln1.south east);
+\draw [decorate,decoration={brace,mirror}] ([xshift=0em]fn.south east) to node [midway,font=\small,align=center,xshift=0.7em,yshift=-0em] {$d$} ([xshift=0em]fn.north east);
+
+
+
+\end{scope}
+\end{tikzpicture}
\ No newline at end of file
--- a/Chapter15/Figures/figure-light-weight-transformer-module.png
+++ b/Chapter15/Figures/figure-light-weight-transformer-module.png
--- a/Chapter15/Figures/figure-light-weight-transformer-module.tex
+++ b/Chapter15/Figures/figure-light-weight-transformer-module.tex
+%%%------------------------------------------------------------------------------------------------------------
+%%% 调序模型1：基于距离的调序
+\begin{center}
+\begin{tikzpicture}
+
+\tikzstyle{manode}=[rectangle,inner sep=0mm,minimum height=4em,minimum width=4em,rounded corners=5pt,thick,draw,fill=blue!20]
+\tikzstyle{ffnnode}=[rectangle,inner sep=0mm,minimum height=1.8em,minimum width=6em,rounded corners=5pt,thick,fill=red!20,draw]
+\tikzstyle{ebnode}=[rectangle,inner sep=0mm,minimum height=1.8em,minimum width=10em,rounded corners=5pt,thick,fill=green!20,draw]
+
+\begin{scope}[]
+
+\node [anchor=west,ffnnode] (f1) at (0, 0){FFN};
+\node [anchor=south,ebnode] (e1) at ([xshift=0em,yshift=1em]f1.north){Embedding};
+\node [anchor=south west,manode] (a1) at ([xshift=0em,yshift=1em]e1.north west){Attention};
+\node [anchor=south east,manode] (c1) at ([xshift=0em,yshift=1em]e1.north east){Conv};
+\node [anchor=south west,ebnode] (e2) at ([xshift=0em,yshift=1em]a1.north west){Embedding};
+\node [anchor=south,draw,circle,inner sep=4pt] (add1) at ([xshift=0em,yshift=0.5em]e2.north){};
+\node [anchor=south,ffnnode] (f2) at ([xshift=0em,yshift=0.5em]add1.north){FFN};
+
+
+\draw[->,thick] ([xshift=0em,yshift=0em]f1.north)--([xshift=0em,yshift=0em]e1.south);
+\draw[->,thick] ([xshift=0em,yshift=-1em]a1.south)--([xshift=0em,yshift=0em]a1.south);
+\draw[->,thick] ([xshift=0em,yshift=-1em]c1.south)--([xshift=0em,yshift=0em]c1.south);
+\draw[->,thick] ([xshift=0em,yshift=0em]a1.north)--([xshift=0em,yshift=1em]a1.north);
+\draw[->,thick] ([xshift=0em,yshift=0em]c1.north)--([xshift=0em,yshift=1em]c1.north);
+\draw[-,thick] ([xshift=0em,yshift=0em]e2.north)--([xshift=0em,yshift=0em]add1.south);
+\draw[->,thick] ([xshift=0em,yshift=0em]add1.north)--([xshift=0em,yshift=0em]f2.south);
+
+\draw[-] ([xshift=0em,yshift=0em]add1.west)--([xshift=-0em,yshift=0em]add1.east);
+\draw[-] ([xshift=0em,yshift=0em]add1.south)--([xshift=-0em,yshift=-0em]add1.north);
+
+
+\draw[->,thick,rectangle,rounded corners=5pt] ([xshift=0em,yshift=0.5em]f1.north)--([xshift=-6em,yshift=0.5em]f1.north)--([xshift=-5.45em,yshift=0em]add1.west)--([xshift=0em,yshift=0em]add1.west);
+
+
+\end{scope}
+\end{tikzpicture}
+\end{center}
\ No newline at end of file
--- a/Chapter15/Figures/figure-linear-layer-aggregation-network.png
+++ b/Chapter15/Figures/figure-linear-layer-aggregation-network.png
--- a/Chapter15/Figures/figure-multi-branch-attention-model.png
+++ b/Chapter15/Figures/figure-multi-branch-attention-model.png
--- a/Chapter15/Figures/figure-multi-branch-attention-model.tex
+++ b/Chapter15/Figures/figure-multi-branch-attention-model.tex
+%%%------------------------------------------------------------------------------------------------------------
+%%% Multi-branch attention model
+\begin{center}
+\begin{tikzpicture}
+
+\tikzstyle{manode}=[rectangle,inner sep=0mm,minimum height=1.8em,minimum width=10em,rounded corners=5pt,thick,draw,fill=teal!20]
+\tikzstyle{ffnnode}=[rectangle,inner sep=0mm,minimum height=1.8em,minimum width=3em,rounded corners=5pt,thick,fill=red!20,draw]
+\tikzstyle{lnnode}=[rectangle,inner sep=0mm,minimum height=2em,minimum width=2.5em,rounded corners=5pt,thick,fill=green!20,draw]
+
+\begin{scope}[]
+
+\node [anchor=east,circle,fill=black,inner sep = 2pt] (n1) at (-0, 0) {};
+\node [anchor=west,draw,circle,inner sep=5pt] (n2) at ([xshift=13em,yshift=0em]n1.east){};
+\node [anchor=west,lnnode] (n3) at ([xshift=1.5em,yshift=0em]n2.east){LN};
+\node [anchor=west,circle,fill=black,inner sep=2pt] (n4) at ([xshift=1.5em,yshift=0em]n3.east){};
+\node [anchor=west,draw,circle,inner sep=5pt] (n5) at ([xshift=5em,yshift=0em]n4.east){};
+\node [anchor=west,lnnode] (n6) at ([xshift=1.5em,yshift=0em]n5.east){LN};
+
+\node [anchor=west,manode] (a1) at ([xshift=1.5em,yshift=2em]n1.east){Multi-Head Attention};
+\node [anchor=south] (a2) at ([xshift=0em,yshift=0.2em]a1.north){$\cdots$};
+\node [anchor=south,manode] (a3) at ([xshift=0em,yshift=0.2em]a2.north){Multi-Head Attention};
+
+\node [anchor=west,ffnnode] (f1) at ([xshift=1em,yshift=2em]n4.east){FFN};
+
+
+\draw[->,thick] ([xshift=-1em,yshift=0em]n1.west)--([xshift=0em,yshift=0em]n1.west);
+\draw[->,thick] ([xshift=0em,yshift=0em]n1.east)--([xshift=0em,yshift=0em]n2.west);
+\draw[->,thick] ([xshift=0em,yshift=0em]n2.east)--([xshift=0em,yshift=0em]n3.west);
+\draw[->,thick] ([xshift=0em,yshift=0em]n3.east)--([xshift=0em,yshift=0em]n4.west);
+\draw[->,thick] ([xshift=0em,yshift=0em]n4.east)--([xshift=0em,yshift=0em]n5.west);
+\draw[->,thick] ([xshift=0em,yshift=0em]n5.east)--([xshift=0em,yshift=0em]n6.west);
+\draw[->,thick] ([xshift=0em,yshift=0em]n6.east)--([xshift=1em,yshift=0em]n6.east);
+
+\draw[->,thick] ([xshift=0em,yshift=0em]n1.east)--([xshift=0em,yshift=0em]a1.west);
+\draw[->,thick] ([xshift=0em,yshift=0em]n1.east)--([xshift=0em,yshift=0em]a3.west);
+\draw[->,thick] ([xshift=0em,yshift=0em]n4.east)--([xshift=0em,yshift=0em]f1.west);
+
+\draw[->,thick,ublue,dashed] ([xshift=0em,yshift=0em]a1.east)--([xshift=0em,yshift=0em]n2.west);
+\draw[->,thick,ublue,dashed] ([xshift=0em,yshift=0em]a3.east)--([xshift=0em,yshift=0em]n2.west);
+\draw[->,thick,ublue,dashed] ([xshift=0em,yshift=0em]f1.east)--([xshift=0em,yshift=0em]n5.west);
+
+\node [anchor=west,ublue,font=\footnotesize,align=left] (w1) at ([xshift=5em,yshift=-0.5em]a2.east){drop with\\prob.\ $p$};
+\node [anchor=west,ublue,font=\footnotesize,align=left] (w2) at ([xshift=0.5em,yshift=0em]f1.east){drop with\\prob.\ $p$};
+
+\draw[-] ([xshift=0em,yshift=0em]n2.west)--([xshift=-0em,yshift=0em]n2.east);
+\draw[-] ([xshift=0em,yshift=0em]n2.south)--([xshift=-0em,yshift=-0em]n2.north);
+
+\draw[-] ([xshift=0em,yshift=0em]n5.west)--([xshift=-0em,yshift=0em]n5.east);
+\draw[-] ([xshift=0em,yshift=0em]n5.south)--([xshift=-0em,yshift=-0em]n5.north);
+
+\end{scope}
+\end{tikzpicture}
+\end{center}
\ No newline at end of file
--- a/Chapter15/Figures/figure-multi-cell-transformer.png
+++ b/Chapter15/Figures/figure-multi-cell-transformer.png
--- a/Chapter15/Figures/figure-multi-task-structure.png
+++ b/Chapter15/Figures/figure-multi-task-structure.png
--- a/Chapter15/Figures/figure-parallel-RNN-structure.png
+++ b/Chapter15/Figures/figure-parallel-RNN-structure.png
--- a/Chapter15/Figures/figure-parallel-RNN-structure.tex
+++ b/Chapter15/Figures/figure-parallel-RNN-structure.tex
+%%%------------------------------------------------------------------------------------------------------------
+
+\begin{center}
+\begin{tikzpicture}
+
+\tikzstyle{wrnode}=[rectangle,inner sep=0mm,minimum height=1.8em,minimum width=4em,rounded corners=5pt,fill=blue!30]
+\tikzstyle{arnode}=[rectangle,inner sep=0mm,minimum height=1.8em,minimum width=4em,rounded corners=5pt,fill=red!30]
+\tikzstyle{dotnode}=[inner sep=0mm,minimum height=0.5em,minimum width=1.5em]
+\tikzstyle{wnode}=[inner sep=0mm,minimum height=1.8em]
+{\small
+\begin{scope}[]
+
+\node [anchor=north west,wnode] (w1) at (0,0) {Word prediction model};
+\node [anchor=west,wrnode] (w2) at ([xshift=1.5em,yshift=0em]w1.east) {$\mathbi{h}_{1}^{\textrm{word}}$};
+\node [anchor=west,wrnode] (w3) at ([xshift=1.5em,yshift=0em]w2.east) {$\mathbi{h}_{2}^{\textrm{word}}$};
+\node [anchor=west,wrnode] (w4) at ([xshift=7em,yshift=0em]w3.east) {$\mathbi{h}_{4}^{\textrm{word}}$};
+\node [anchor=west,dotnode] (dot1) at ([xshift=1.5em,yshift=0em]w4.east) {$\cdots$};
+
+
+\node [anchor=north east,wnode] (a1) at ([xshift=0em,yshift=-6.6em]w1.south east) {Action model};
+\node [anchor=west,arnode] (a2) at ([xshift=1.5em,yshift=0em]a1.east) {$\mathbi{h}_{1}^{\textrm{action}}$};
+\node [anchor=west,arnode] (a3) at ([xshift=1.5em,yshift=0em]a2.east) {$\mathbi{h}_{2}^{\textrm{action}}$};
+\node [anchor=west,arnode] (a4) at ([xshift=1.5em,yshift=0em]a3.east) {$\mathbi{h}_{3}^{\textrm{action}}$};
+\node [anchor=west,arnode] (a5) at ([xshift=1.5em,yshift=0em]a4.east) {$\mathbi{h}_{4}^{\textrm{action}}$};
+\node [anchor=west,arnode] (a6) at ([xshift=1.5em,yshift=0em]a5.east) {$\mathbi{h}_{5}^{\textrm{action}}$};
+
+\node [anchor=south,wnode] (word1) at ([xshift=0em,yshift=1em]w2.north) {你};
+\node [anchor=south,wnode] (word2) at ([xshift=0em,yshift=1em]w3.north) {是};
+\node [anchor=south,wnode] (word3) at ([xshift=0em,yshift=1em]w4.north) {谁};
+\node [anchor=north,wnode] (word4) at ([xshift=0em,yshift=-1em]w2.south) {$\langle$sos$\rangle$};
+\node [anchor=north,wnode] (word5) at ([xshift=0em,yshift=-1em]w3.south) {你};
+\node [anchor=north,wnode] (word6) at ([xshift=0em,yshift=-1em]w4.south) {是};
+
+\node [anchor=south,wnode] (word7) at ([xshift=0em,yshift=1em]a2.north) {shift};
+\node [anchor=south,wnode] (word8) at ([xshift=0em,yshift=1em]a3.north) {shift};
+\node [anchor=south,wnode] (word9) at ([xshift=0em,yshift=1em]a4.north) {left-reduce};
+\node [anchor=south,wnode] (word10) at ([xshift=0em,yshift=1em]a5.north) {shift};
+\node [anchor=south,wnode] (word11) at ([xshift=0em,yshift=1em]a6.north) {right-reduce};
+\node [anchor=north,wnode] (word12) at ([xshift=0em,yshift=-1em]a2.south) {$\langle$sos$\rangle$};
+\node [anchor=north,wnode] (word13) at ([xshift=0em,yshift=-1em]a3.south) {shift};
+\node [anchor=north,wnode] (word14) at ([xshift=0em,yshift=-1em]a4.south) {shift};
+\node [anchor=north,wnode] (word15) at ([xshift=0em,yshift=-1em]a5.south) {left-reduce};
+\node [anchor=north,wnode] (word16) at ([xshift=0em,yshift=-1em]a6.south) {shift};
+
+
+\node [anchor=south,wnode] (wl1) at ([xshift=6em,yshift=-1em]dot1.north) {是};
+\node [anchor=north,wnode] (wl2) at ([xshift=-2em,yshift=-2em]wl1.south) {你};
+\node [anchor=north,wnode] (wl3) at ([xshift=2em,yshift=-2em]wl1.south) {谁};
+
+\node [anchor=north,font=\tiny,rotate=45] (e1) at ([xshift=-2.2em,yshift=-0.4em]wl1.south) {via left-reduce};
+\node [anchor=north,font=\tiny,rotate=-45] (e2) at ([xshift=2.2em,yshift=-0.4em]wl1.south) {via right-reduce};
+
+\draw [->,thick] ([xshift=0em,yshift=0em]wl1.south) -- ([xshift=0em,yshift=0em]wl2.north);
+\draw [->,thick] ([xshift=0em,yshift=0em]wl1.south) -- ([xshift=0em,yshift=0em]wl3.north);
+
+\draw [->,thick] ([xshift=0em,yshift=0em]w1.east) -- ([xshift=0em,yshift=0em]w2.west);
+\draw [->,thick] ([xshift=0em,yshift=0em]w2.east) -- ([xshift=0em,yshift=0em]w3.west);
+\draw [->,thick] ([xshift=0em,yshift=0em]w3.east) -- ([xshift=0em,yshift=0em]w4.west);
+\draw [->,thick] ([xshift=0em,yshift=0em]w4.east) -- ([xshift=0em,yshift=0em]dot1.west);
+
+\draw [->,thick] ([xshift=0em,yshift=0em]a1.east) -- ([xshift=0em,yshift=0em]a2.west);
+\draw [->,thick] ([xshift=0em,yshift=0em]a2.east) -- ([xshift=0em,yshift=0em]a3.west);
+\draw [->,thick] ([xshift=0em,yshift=0em]a3.east) -- ([xshift=0em,yshift=0em]a4.west);
+\draw [->,thick] ([xshift=0em,yshift=0em]a4.east) -- ([xshift=0em,yshift=0em]a5.west);
+\draw [->,thick] ([xshift=0em,yshift=0em]a5.east) -- ([xshift=0em,yshift=0em]a6.west);
+
+\draw [->,thick] ([xshift=0em,yshift=0em]w2.north) -- ([xshift=0em,yshift=0em]word1.south);
+\draw [->,thick] ([xshift=0em,yshift=0em]w3.north) -- ([xshift=0em,yshift=0em]word2.south);
+\draw [->,thick] ([xshift=0em,yshift=0em]w4.north) -- ([xshift=0em,yshift=0em]word3.south);
+\draw [->,thick] ([xshift=0em,yshift=0em]word4.north) -- ([xshift=0em,yshift=0em]w2.south);
+\draw [->,thick] ([xshift=0em,yshift=0em]word5.north) -- ([xshift=0em,yshift=0em]w3.south);
+\draw [->,thick] ([xshift=0em,yshift=0em]word6.north) -- ([xshift=0em,yshift=0em]w4.south);
+\draw [->,thick] ([xshift=0em,yshift=0em]word7.north) -- ([xshift=0em,yshift=0em]word4.south);
+\draw [->,thick] ([xshift=0em,yshift=0em]word8.north) -- ([xshift=0em,yshift=0em]word5.south);
+\draw [->,thick] ([xshift=0em,yshift=0em]word10.north) -- ([xshift=0em,yshift=0em]word6.south);
+\draw [->,thick] ([xshift=0em,yshift=0em]a2.north) -- ([xshift=0em,yshift=0em]word7.south);
+\draw [->,thick] ([xshift=0em,yshift=0em]a3.north) -- ([xshift=0em,yshift=0em]word8.south);
+\draw [->,thick] ([xshift=0em,yshift=0em]a4.north) -- ([xshift=0em,yshift=0em]word9.south);
+\draw [->,thick] ([xshift=0em,yshift=0em]a5.north) -- ([xshift=0em,yshift=0em]word10.south);
+\draw [->,thick] ([xshift=0em,yshift=0em]a6.north) -- ([xshift=0em,yshift=0em]word11.south);
+\draw [->,thick] ([xshift=0em,yshift=0em]word12.north) -- ([xshift=0em,yshift=0em]a2.south);
+\draw [->,thick] ([xshift=0em,yshift=0em]word13.north) -- ([xshift=0em,yshift=0em]a3.south);
+\draw [->,thick] ([xshift=0em,yshift=0em]word14.north) -- ([xshift=0em,yshift=0em]a4.south);
+\draw [->,thick] ([xshift=0em,yshift=0em]word15.north) -- ([xshift=0em,yshift=0em]a5.south);
+\draw [->,thick] ([xshift=0em,yshift=0em]word16.north) -- ([xshift=0em,yshift=0em]a6.south);
+
+\draw[->,thick,dashed] ([xshift=0em,yshift=-0em]word1.east)..controls +(east:4em) and +(west:3em)..([xshift=-0em,yshift=-0em]a3.west) ;
+\draw[->,thick,dashed] ([xshift=0em,yshift=-0em]word2.east)..controls +(east:4em) and +(west:3em)..([xshift=-0em,yshift=-0em]a4.west) ;
+\draw[->,thick,dashed] ([xshift=0em,yshift=-0em]word2.east)..controls +(east:10em) and +(west:4em)..([xshift=-0em,yshift=-0em]a5.west) ;
+\draw[->,thick,dashed] ([xshift=0em,yshift=-0em]word3.east)..controls +(east:4em) and +(west:3em)..([xshift=-0em,yshift=-0em]a6.west) ;
+
+
+
+\end{scope}
+}
+\end{tikzpicture}
+\end{center}
\ No newline at end of file
--- a/Chapter15/Figures/figure-parsing-tree-of-a-sentence.png
+++ b/Chapter15/Figures/figure-parsing-tree-of-a-sentence.png
--- a/Chapter15/Figures/figure-parsing-tree-of-a-sentence.tex
+++ b/Chapter15/Figures/figure-parsing-tree-of-a-sentence.tex
+%%%------------------------------------------------------------------------------------------------------------
+
+\begin{center}
+\begin{tikzpicture}
+
+\tikzstyle{wnode}=[inner sep=0mm,minimum height=1.5em,minimum width=2em]
+
+\begin{scope}[sibling distance=15pt, level distance = 30pt]
+
+\Tree[.\node(n1){{S}};
+        [.\node(n2){{NP}};
+            [.\node(n3){{PRN}}; \node(w1){{I}};]
+        ]
+        [.\node(n4){{VP}};
+            [. \node(n5){VBP}; \node(w2){{love}};]
+            [. \node(cw4){NP};
+                [. \node(n6){NNS}; \node(w3){{dogs}};]
+            ]
+        ]
+     ]
+
+\node [anchor=north] (label1) at ([xshift=0em,yshift=-4em]w2.south) {(a) Syntax tree};
+
+\end{scope}
+
+
+\begin{scope}[xshift=1.8in,yshift=0em]
+
+\node [anchor=west,wnode] (w1) at (0,0) {I};
+\node [anchor=west,wnode] (w2) at ([xshift=3em,yshift=0em]w1.east) {love};
+\node [anchor=west,wnode] (w3) at ([xshift=3em,yshift=0em]w2.east) {dogs};
+
+\node [anchor=north,wnode] (w4) at ([xshift=0em,yshift=-2em]w1.south) {$w_1$};
+\node [anchor=north,wnode] (w5) at ([xshift=0em,yshift=-2em]w2.south) {$w_2$};
+\node [anchor=north,wnode] (w6) at ([xshift=0em,yshift=-2em]w3.south) {$w_3$};
+
+\node [anchor=north] (label2) at ([xshift=0em,yshift=-1.5em]w5.south) {(b) Word sequence};
+
+\end{scope}
+
+\begin{scope}[xshift=1.2in,yshift=-1.5in]
+
+\node [anchor=west,wnode] (l1) at (0,0) {S};
+\node [anchor=west,wnode] (l2) at ([xshift=1em,yshift=0em]l1.east) {NP};
+\node [anchor=west,wnode] (l3) at ([xshift=1em,yshift=0em]l2.east) {PRN};
+\node [anchor=west,wnode] (l4) at ([xshift=1em,yshift=0em]l3.east) {VP};
+\node [anchor=west,wnode] (l5) at ([xshift=1em,yshift=0em]l4.east) {VBP};
+\node [anchor=west,wnode] (l6) at ([xshift=1em,yshift=0em]l5.east) {NP};
+\node [anchor=west,wnode] (l7) at ([xshift=1em,yshift=0em]l6.east) {NNS};
+
+\node [anchor=north,wnode] (l8) at ([xshift=0em,yshift=-1em]l1.south) {$l_1$};
+\node [anchor=north,wnode] (l9) at ([xshift=0em,yshift=-1em]l2.south) {$l_2$};
+\node [anchor=north,wnode] (l10) at ([xshift=0em,yshift=-1em]l3.south) {$l_3$};
+\node [anchor=north,wnode] (l11) at ([xshift=0em,yshift=-1em]l4.south) {$l_4$};
+\node [anchor=north,wnode] (l12) at ([xshift=0em,yshift=-1em]l5.south) {$l_5$};
+\node [anchor=north,wnode] (l13) at ([xshift=0em,yshift=-1em]l6.south) {$l_6$};
+\node [anchor=north,wnode] (l14) at ([xshift=0em,yshift=-1em]l7.south) {$l_7$};
+
+\node [anchor=north] (label3) at ([xshift=0em,yshift=-1.5em]l11.south) {(c) Syntactic label sequence};
+
+\end{scope}
+
+
+\end{tikzpicture}
+\end{center}
\ No newline at end of file
--- a/Chapter15/Figures/figure-syntax-tree-linearization-example.png
+++ b/Chapter15/Figures/figure-syntax-tree-linearization-example.png
--- a/Chapter15/Figures/figure-syntax-tree-linearization-example.tex
+++ b/Chapter15/Figures/figure-syntax-tree-linearization-example.tex
+%%%------------------------------------------------------------------------------------------------------------
+
+\begin{center}
+\begin{tikzpicture}
+
+\tikzstyle{wnode}=[inner sep=0mm,minimum height=1.5em,minimum width=2em]
+{\small
+\begin{scope}[sibling distance=15pt, level distance = 30pt]
+
+\Tree[.\node(n1){{S}};
+        [.\node(n2){{NP}}; \node(w1){{Jane}};]
+        [.\node(n3){{VP}};
+            [. \node(w2){had};]
+            [. \node(n4){NP}; \node(w3){{a cat}};]
+            ]
+        [. \node(w4){.};]
+     ]
+
+\end{scope}
+}
+
+{\small
+\begin{scope}[xshift=1in,yshift=-0.7in]
+
+\node [anchor=west] (n1) at (0.5em,0em) {(Root(S(NP Jane)NP(VP had(NP a cat)NP)VP .)S)Root};
+
+\draw [->,very thick] ([xshift=-2.3em,yshift=0em]n1.west) -- ([xshift=-0.5em,yshift=0em]n1.west);
+
+\end{scope}
+}
+
+\end{tikzpicture}
+\end{center}
\ No newline at end of file
--- a/Chapter15/Figures/figure-three-fusion-methods-of-tree-structure-information-1.png
+++ b/Chapter15/Figures/figure-three-fusion-methods-of-tree-structure-information-1.png
--- a/Chapter15/Figures/figure-three-fusion-methods-of-tree-structure-information-1.tex
+++ b/Chapter15/Figures/figure-three-fusion-methods-of-tree-structure-information-1.tex
+%%%------------------------------------------------------------------------------------------------------------
+
+\begin{center}
+\begin{tikzpicture}
+
+\tikzstyle{wrnode}=[rectangle,inner sep=0mm,minimum height=1.8em,minimum width=3em,rounded corners=5pt,fill=blue!30]
+\tikzstyle{srnode}=[rectangle,inner sep=0mm,minimum height=1.8em,minimum width=3em,rounded corners=5pt,fill=yellow!30]
+\tikzstyle{dotnode}=[inner sep=0mm,minimum height=0.5em,minimum width=1.5em]
+\tikzstyle{wnode}=[inner sep=0mm,minimum height=1.8em]
+{\small
+\begin{scope}[]
+
+\node [anchor=west,wrnode] (wr1) at (0,0) {$\mathbi{h}_{w_1}$};
+\node [anchor=west,wrnode] (wr2) at ([xshift=1em,yshift=0em]wr1.east) {$\mathbi{h}_{w_2}$};
+\node [anchor=west,wrnode] (wr3) at ([xshift=1em,yshift=0em]wr2.east) {$\mathbi{h}_{w_3}$};
+
+\node [anchor=west,srnode] (sr1) at ([xshift=2em,yshift=0em]wr3.east) {$\mathbi{h}_{l_1}$};
+\node [anchor=west,dotnode] (dot1) at ([xshift=0.8em,yshift=0em]sr1.east) {$\cdots$};
+\node [anchor=west,srnode] (sr2) at ([xshift=0.8em,yshift=0em]dot1.east) {$\mathbi{h}_{l_3}$};
+\node [anchor=west,dotnode] (dot2) at ([xshift=0.8em,yshift=0em]sr2.east) {$\cdots$};
+\node [anchor=west,srnode] (sr3) at ([xshift=0.8em,yshift=0em]dot2.east) {$\mathbi{h}_{l_5}$};
+\node [anchor=west,dotnode] (dot3) at ([xshift=0.8em,yshift=0em]sr3.east) {$\cdots$};
+\node [anchor=west,srnode] (sr4) at ([xshift=0.8em,yshift=0em]dot3.east) {$\mathbi{h}_{l_7}$};
+
+\node [anchor=north,wnode,font=\footnotesize] (w1) at ([xshift=0em,yshift=-1em]wr1.south) {$w_1$\ :\ I};
+\node [anchor=north,wnode,font=\footnotesize] (w2) at ([xshift=0em,yshift=-1em]wr2.south) {$w_2$\ :\ love};
+\node [anchor=north,wnode,font=\footnotesize] (w3) at ([xshift=0em,yshift=-1em]wr3.south) {$w_3$\ :\ dogs};
+
+\node [anchor=north,wnode,font=\footnotesize] (w4) at ([xshift=0em,yshift=-1em]sr1.south) {$l_1$\ :\ S};
+\node [anchor=north,dotnode] (dot4) at ([xshift=0em,yshift=-2.4em]dot1.south) {$\cdots$};
+\node [anchor=north,wnode,font=\footnotesize] (w5) at ([xshift=0em,yshift=-1em]sr2.south) {$l_3$\ :\ PRN};
+\node [anchor=north,dotnode] (dot5) at ([xshift=0em,yshift=-2.2em]dot2.south) {$\cdots$};
+\node [anchor=north,wnode,font=\footnotesize] (w6) at ([xshift=0em,yshift=-1em]sr3.south) {$l_5$\ :\ VBP};
+\node [anchor=north,dotnode] (dot6) at ([xshift=0em,yshift=-2.3em]dot3.south) {$\cdots$};
+\node [anchor=north,wnode,font=\footnotesize] (w7) at ([xshift=0em,yshift=-1em]sr4.south) {$l_7$\ :\ NNS};
+
+\node [anchor=south,circle,draw,minimum size=1.2em] (c1) at ([xshift=2.5em,yshift=2em]wr2.north){};
+\node [anchor=west,circle,draw,minimum size=1.2em] (c2) at ([xshift=8em,yshift=0em]c1.east){};
+\node [anchor=west,circle,draw,minimum size=1.2em] (c3) at ([xshift=8em,yshift=0em]c2.east){};
+
+\node [anchor=south,srnode] (m1) at ([xshift=0em,yshift=2em]c1.north) {$\mathbi{h}_{l_1}$};
+\node [anchor=south,wrnode] (m2) at ([xshift=0em,yshift=0em]m1.north) {$\mathbi{h}_{w_1}$};
+\node [anchor=south,srnode] (m3) at ([xshift=0em,yshift=2em]c2.north) {$\mathbi{h}_{l_5}$};
+\node [anchor=south,wrnode] (m4) at ([xshift=0em,yshift=0em]m3.north) {$\mathbi{h}_{w_2}$};
+\node [anchor=south,srnode] (m5) at ([xshift=0em,yshift=2em]c3.north) {$\mathbi{h}_{l_7}$};
+\node [anchor=south,wrnode] (m6) at ([xshift=0em,yshift=0em]m5.north) {$\mathbi{h}_{w_3}$};
+
+
+\draw[-] (c1.west)--(c1.east);
+\draw[-] (c1.north)--(c1.south);
+\draw[-] (c2.west)--(c2.east);
+\draw[-] (c2.north)--(c2.south);
+\draw[-] (c3.west)--(c3.east);
+\draw[-] (c3.north)--(c3.south);
+
+\begin{pgfonlayer}{background}
+\node [rectangle,inner sep=0.5em,draw=blue!80,dashed,very thick,rounded corners=10pt] [fit = (wr1) (wr3) (w1) (w3)] (box1) {};
+\node [rectangle,inner sep=0.5em,draw=yellow!80,dashed,very thick,rounded corners=10pt] [fit = (sr1) (sr4) (w4) (w7)] (box2) {};
+\node [rectangle,minimum height=5em,inner sep=0.6em,fill=gray!20,draw=black,dashed,very thick,rounded corners=8pt] [fit = (m1) (m2)] (box3) {};
+\node [rectangle,minimum height=5em,inner sep=0.6em,fill=gray!20,draw=black,dashed,very thick,rounded corners=8pt] [fit = (m3) (m4)] (box4) {};
+\node [rectangle,minimum height=5em,inner sep=0.6em,fill=gray!20,draw=black,dashed,very thick,rounded corners=8pt] [fit = (m5) (m6)] (box5) {};
+\end{pgfonlayer}
+
+\node [anchor=south,wnode] (h1) at ([xshift=0em,yshift=0.1em]box3.north) {${\mathbi{h}'}_1$\ :\ };
+\node [anchor=south,wnode] (h2) at ([xshift=0em,yshift=0.1em]box4.north) {${\mathbi{h}'}_2$\ :\ };
+\node [anchor=south,wnode] (h3) at ([xshift=0em,yshift=0.1em]box5.north) {${\mathbi{h}'}_3$\ :\ };
+
+\draw [->,thick] ([xshift=0em,yshift=0em]w1.north) -- ([xshift=0em,yshift=0em]wr1.south);
+\draw [->,thick] ([xshift=0em,yshift=0em]w2.north) -- ([xshift=0em,yshift=0em]wr2.south);
+\draw [->,thick] ([xshift=0em,yshift=0em]w3.north) -- ([xshift=0em,yshift=0em]wr3.south);
+\draw [->,thick] ([xshift=0em,yshift=0em]w4.north) -- ([xshift=0em,yshift=0em]sr1.south);
+\draw [->,thick] ([xshift=0em,yshift=0em]w5.north) -- ([xshift=0em,yshift=0em]sr2.south);
+\draw [->,thick] ([xshift=0em,yshift=0em]w6.north) -- ([xshift=0em,yshift=0em]sr3.south);
+\draw [->,thick] ([xshift=0em,yshift=0em]w7.north) -- ([xshift=0em,yshift=0em]sr4.south);
+
+\draw [->,thick] ([xshift=0em,yshift=0.7em]dot4.north) -- ([xshift=0em,yshift=-0.7em]dot1.south);
+\draw [->,thick] ([xshift=0em,yshift=0.7em]dot5.north) -- ([xshift=0em,yshift=-0.7em]dot2.south);
+\draw [->,thick] ([xshift=0em,yshift=0.7em]dot6.north) -- ([xshift=0em,yshift=-0.7em]dot3.south);
+
+\draw [<->,thick] ([xshift=0em,yshift=0em]wr1.east) -- ([xshift=0em,yshift=0em]wr2.west);
+\draw [<->,thick] ([xshift=0em,yshift=0em]wr2.east) -- ([xshift=0em,yshift=0em]wr3.west);
+
+\draw [<->,thick] ([xshift=0em,yshift=0em]sr1.east) -- ([xshift=0em,yshift=0em]dot1.west);
+\draw [<->,thick] ([xshift=0em,yshift=0em]dot1.east) -- ([xshift=0em,yshift=0em]sr2.west);
+\draw [<->,thick] ([xshift=0em,yshift=0em]sr2.east) -- ([xshift=0em,yshift=0em]dot2.west);
+\draw [<->,thick] ([xshift=0em,yshift=0em]dot2.east) -- ([xshift=0em,yshift=0em]sr3.west);
+\draw [<->,thick] ([xshift=0em,yshift=0em]sr3.east) -- ([xshift=0em,yshift=0em]dot3.west);
+\draw [<->,thick] ([xshift=0em,yshift=0em]dot3.east) -- ([xshift=0em,yshift=0em]sr4.west);
+
+\draw[->,thick] ([xshift=0em,yshift=-0em]wr1.north)..controls +(north:2em) and +(west:0em)..([xshift=-0em,yshift=-0em]c1.west) ;
+\draw[->,thick] ([xshift=0em,yshift=-0em]sr2.north)..controls +(north:2em) and +(south:1em)..([xshift=-0em,yshift=-0em]c1.south east) ;
+
+\draw[->,thick] ([xshift=0em,yshift=-0em]wr2.north)..controls +(north:2em) and +(west:0em)..([xshift=-0em,yshift=-0em]c2.west) ;
+\draw[->,thick] ([xshift=0em,yshift=-0em]sr3.north)..controls +(north:2em) and +(east:0em)..([xshift=-0em,yshift=-0em]c2.east) ;
+
+\draw[->,thick] ([xshift=0em,yshift=-0em]wr3.north)..controls +(north:2em) and +(south:1em)..([xshift=-0em,yshift=-0em]c3.south west) ;
+\draw[->,thick] ([xshift=0em,yshift=-0em]sr4.north)..controls +(north:2em) and +(east:0em)..([xshift=-0em,yshift=-0em]c3.east) ;
+
+\draw [->,thick] ([xshift=0em,yshift=0em]c1.north) -- ([xshift=0em,yshift=0em]box3.south);
+\draw [->,thick] ([xshift=0em,yshift=0em]c2.north) -- ([xshift=0em,yshift=0em]box4.south);
+\draw [->,thick] ([xshift=0em,yshift=0em]c3.north) -- ([xshift=0em,yshift=0em]box5.south);
+
+\node [anchor=north] (r1) at ([xshift=0em,yshift=-1em]w2.south) {Word RNN};
+\node [anchor=north] (r2) at ([xshift=3em,yshift=-1em]w5.south) {Syntax RNN};
+\node [anchor=north] (label1) at ([xshift=0em,yshift=-4em]dot4.south) {(a) Parallel structure};
+
+\end{scope}
+}
+\end{tikzpicture}
+\end{center}
\ No newline at end of file
--- a/Chapter15/Figures/figure-three-fusion-methods-of-tree-structure-information-2.png
+++ b/Chapter15/Figures/figure-three-fusion-methods-of-tree-structure-information-2.png
--- a/Chapter15/Figures/figure-three-fusion-methods-of-tree-structure-information-2.tex
+++ b/Chapter15/Figures/figure-three-fusion-methods-of-tree-structure-information-2.tex
+%%%------------------------------------------------------------------------------------------------------------
+
+\begin{center}
+\begin{tikzpicture}
+
+\tikzstyle{wrnode}=[rectangle,inner sep=0mm,minimum height=1.8em,minimum width=3em,rounded corners=5pt,fill=blue!30]
+\tikzstyle{srnode}=[rectangle,inner sep=0mm,minimum height=1.8em,minimum width=3em,rounded corners=5pt,fill=yellow!30]
+\tikzstyle{dotnode}=[inner sep=0mm,minimum height=0.5em,minimum width=1.5em]
+\tikzstyle{wnode}=[inner sep=0mm,minimum height=1.8em]
+
+
+{\small
+\begin{scope}[]
+
+\node [anchor=west,srnode] (sr1) at (0,0) {$\mathbi{h}_{l_1}$};
+\node [anchor=west,dotnode] (dot1) at ([xshift=0.8em,yshift=0em]sr1.east) {$\cdots$};
+\node [anchor=west,srnode] (sr2) at ([xshift=0.8em,yshift=0em]dot1.east) {$\mathbi{h}_{l_3}$};
+\node [anchor=west,dotnode] (dot2) at ([xshift=0.8em,yshift=0em]sr2.east) {$\cdots$};
+\node [anchor=west,srnode] (sr3) at ([xshift=0.8em,yshift=0em]dot2.east) {$\mathbi{h}_{l_5}$};
+\node [anchor=west,dotnode] (dot3) at ([xshift=0.8em,yshift=0em]sr3.east) {$\cdots$};
+\node [anchor=west,srnode] (sr4) at ([xshift=0.8em,yshift=0em]dot3.east) {$\mathbi{h}_{l_7}$};
+
+\node [anchor=north,wnode,font=\footnotesize] (w4) at ([xshift=0em,yshift=-1em]sr1.south) {$l_1$\ :\ S};
+\node [anchor=north,dotnode] (dot4) at ([xshift=0em,yshift=-2.4em]dot1.south) {$\cdots$};
+\node [anchor=north,wnode,font=\footnotesize] (w5) at ([xshift=0em,yshift=-1em]sr2.south) {$l_3$\ :\ PRN};
+\node [anchor=north,dotnode] (dot5) at ([xshift=0em,yshift=-2.2em]dot2.south) {$\cdots$};
+\node [anchor=north,wnode,font=\footnotesize] (w6) at ([xshift=0em,yshift=-1em]sr3.south) {$l_5$\ :\ VBP};
+\node [anchor=north,dotnode] (dot6) at ([xshift=0em,yshift=-2.3em]dot3.south) {$\cdots$};
+\node [anchor=north,wnode,font=\footnotesize] (w7) at ([xshift=0em,yshift=-1em]sr4.south) {$l_7$\ :\ NNS};
+
+\node [anchor=south,circle,draw,minimum size=1.2em] (c1) at ([xshift=0em,yshift=4.5em]sr2.north){};
+\node [anchor=south,circle,draw,minimum size=1.2em] (c2) at ([xshift=0em,yshift=4.5em]sr3.north){};
+\node [anchor=south,circle,draw,minimum size=1.2em] (c3) at ([xshift=0em,yshift=4.5em]sr4.north){};
+
+\draw[-] (c1.west)--(c1.east);
+\draw[-] (c1.north)--(c1.south);
+\draw[-] (c2.west)--(c2.east);
+\draw[-] (c2.north)--(c2.south);
+\draw[-] (c3.west)--(c3.east);
+\draw[-] (c3.north)--(c3.south);
+
+\node [anchor=north east,wnode,font=\footnotesize] (w1) at ([xshift=-1em,yshift=-1em]c1.south west) {$w_1$\ :\ I};
+\node [anchor=north east,wnode,font=\footnotesize] (w2) at ([xshift=-1em,yshift=-1em]c2.south west) {$w_2$\ :\ love};
+\node [anchor=north east,wnode,font=\footnotesize] (w3) at ([xshift=-1em,yshift=-1em]c3.south west) {$w_3$\ :\ dogs};
+
+\node [anchor=south,wnode] (w8) at ([xshift=0em,yshift=0.5em]c1.north) {$\mathbi{e}_{w_1}$};
+\node [anchor=south,wnode] (w9) at ([xshift=0em,yshift=0.5em]c2.north) {$\mathbi{e}_{w_2}$};
+\node [anchor=south,wnode] (w10) at ([xshift=0em,yshift=0.5em]c3.north) {$\mathbi{e}_{w_3}$};
+
+\begin{pgfonlayer}{background}
+\node [rectangle,minimum height=5em,inner sep=0.6em,fill=ugreen!20,rounded corners=8pt] [fit = (c1) (w8)] (box6) {};
+\node [rectangle,minimum height=5em,inner sep=0.6em,fill=ugreen!20,rounded corners=8pt] [fit = (c2) (w9)] (box7) {};
+\node [rectangle,minimum height=5em,inner sep=0.6em,fill=ugreen!20,rounded corners=8pt] [fit = (c3) (w10)] (box8) {};
+\end{pgfonlayer}
+
+\node [anchor=south,wrnode] (wr1) at ([xshift=0em,yshift=1em]box6.north) {$\mathbi{h}_{w_1}$};
+\node [anchor=south,wrnode] (wr2) at ([xshift=0em,yshift=1em]box7.north) {$\mathbi{h}_{w_2}$};
+\node [anchor=south,wrnode] (wr3) at ([xshift=0em,yshift=1em]box8.north) {$\mathbi{h}_{w_3}$};
+
+\node [anchor=south,wnode] (h1) at ([xshift=0em,yshift=0.3em]wr1.north) {${\mathbi{h}'}_1$\ :\ };
+\node [anchor=south,wnode] (h2) at ([xshift=0em,yshift=0.3em]wr2.north) {${\mathbi{h}'}_2$\ :\ };
+\node [anchor=south,wnode] (h3) at ([xshift=0em,yshift=0.3em]wr3.north) {${\mathbi{h}'}_3$\ :\ };
+
+\begin{pgfonlayer}{background}
+\node [rectangle,minimum width=20em,minimum height=13em,inner sep=0.5em,draw=blue!80,dashed,very thick,rounded corners=10pt] [fit = (h1) (w1) (h3) (c3)] (box1) {};
+\node [rectangle,inner sep=0.5em,draw=yellow!80,dashed,very thick,rounded corners=10pt] [fit = (sr1) (sr4) (w4) (w7)] (box2) {};
+\node [rectangle,inner sep=0.4em,fill=gray!20,draw=black,dashed,very thick,rounded corners=8pt] [fit = (wr1)] (box3) {};
+\node [rectangle,inner sep=0.4em,fill=gray!20,draw=black,dashed,very thick,rounded corners=8pt] [fit = (wr2)] (box4) {};
+\node [rectangle,inner sep=0.4em,fill=gray!20,draw=black,dashed,very thick,rounded corners=8pt] [fit = (wr3)] (box5) {};
+\end{pgfonlayer}
+
+\draw [->,thick] ([xshift=0em,yshift=0em]w4.north) -- ([xshift=0em,yshift=0em]sr1.south);
+\draw [->,thick] ([xshift=0em,yshift=0em]w5.north) -- ([xshift=0em,yshift=0em]sr2.south);
+\draw [->,thick] ([xshift=0em,yshift=0em]w6.north) -- ([xshift=0em,yshift=0em]sr3.south);
+\draw [->,thick] ([xshift=0em,yshift=0em]w7.north) -- ([xshift=0em,yshift=0em]sr4.south);
+
+\draw [->,thick] ([xshift=0em,yshift=0.7em]dot4.north) -- ([xshift=0em,yshift=-0.7em]dot1.south);
+\draw [->,thick] ([xshift=0em,yshift=0.7em]dot5.north) -- ([xshift=0em,yshift=-0.7em]dot2.south);
+\draw [->,thick] ([xshift=0em,yshift=0.7em]dot6.north) -- ([xshift=0em,yshift=-0.7em]dot3.south);
+
+\draw [<->,thick] ([xshift=0em,yshift=0em]wr1.east) -- ([xshift=0em,yshift=0em]wr2.west);
+\draw [<->,thick] ([xshift=0em,yshift=0em]wr2.east) -- ([xshift=0em,yshift=0em]wr3.west);
+
+\draw [<->,thick] ([xshift=0em,yshift=0em]sr1.east) -- ([xshift=0em,yshift=0em]dot1.west);
+\draw [<->,thick] ([xshift=0em,yshift=0em]dot1.east) -- ([xshift=0em,yshift=0em]sr2.west);
+\draw [<->,thick] ([xshift=0em,yshift=0em]sr2.east) -- ([xshift=0em,yshift=0em]dot2.west);
+\draw [<->,thick] ([xshift=0em,yshift=0em]dot2.east) -- ([xshift=0em,yshift=0em]sr3.west);
+\draw [<->,thick] ([xshift=0em,yshift=0em]sr3.east) -- ([xshift=0em,yshift=0em]dot3.west);
+\draw [<->,thick] ([xshift=0em,yshift=0em]dot3.east) -- ([xshift=0em,yshift=0em]sr4.west);
+
+\draw [->,thick] ([xshift=0em,yshift=0em]sr2.north) -- ([xshift=0em,yshift=0em]c1.south);
+\draw [->,thick] ([xshift=0em,yshift=0em]sr3.north) -- ([xshift=0em,yshift=0em]c2.south);
+\draw [->,thick] ([xshift=0em,yshift=0em]sr4.north) -- ([xshift=0em,yshift=0em]c3.south);
+
+\draw [->,thick] ([xshift=0em,yshift=0em]box6.north) -- ([xshift=0em,yshift=0em]wr1.south);
+\draw [->,thick] ([xshift=0em,yshift=0em]box7.north) -- ([xshift=0em,yshift=0em]wr2.south);
+\draw [->,thick] ([xshift=0em,yshift=0em]box8.north) -- ([xshift=0em,yshift=0em]wr3.south);
+
+\node [anchor=east] (r2) at ([xshift=-2em,yshift=0em]box2.west) {Syntax RNN};
+\node [anchor=south] (r1) at ([xshift=0em,yshift=8em]r2.north) {Word RNN};
+\node [anchor=north] (label2) at ([xshift=0em,yshift=-2em]w5.south) {(b) Hierarchical structure};
+
+\end{scope}
+}
+\end{tikzpicture}
+\end{center}
\ No newline at end of file
--- a/Chapter15/Figures/figure-three-fusion-methods-of-tree-structure-information-3.png
+++ b/Chapter15/Figures/figure-three-fusion-methods-of-tree-structure-information-3.png
--- a/Chapter15/Figures/figure-three-fusion-methods-of-tree-structure-information-3.tex
+++ b/Chapter15/Figures/figure-three-fusion-methods-of-tree-structure-information-3.tex
+%%%------------------------------------------------------------------------------------------------------------
+
+\begin{center}
+\begin{tikzpicture}
+
+\tikzstyle{hnode}=[rectangle,inner sep=0mm,minimum height=1.8em,minimum width=3em,rounded corners=5pt,fill=red!30]
+\tikzstyle{dotnode}=[inner sep=0mm,minimum height=0.5em,minimum width=1.5em]
+\tikzstyle{wnode}=[inner sep=0mm,minimum height=1.8em]
+{\small
+\begin{scope}[]
+
+\node [anchor=west,hnode] (n1) at (0,0) {$\mathbi{h}_{1}$};
+\node [anchor=west,hnode] (n2) at ([xshift=1em,yshift=0em]n1.east) {$\mathbi{h}_{2}$};
+\node [anchor=west,dotnode] (dot1) at ([xshift=1em,yshift=0em]n2.east) {$\cdots$};
+\node [anchor=west,hnode] (n3) at ([xshift=1em,yshift=0em]dot1.east) {$\mathbi{h}_{4}$};
+\node [anchor=west,dotnode] (dot2) at ([xshift=1em,yshift=0em]n3.east) {$\cdots$};
+\node [anchor=west,hnode] (n4) at ([xshift=1em,yshift=0em]dot2.east) {$\mathbi{h}_{7}$};
+\node [anchor=west,dotnode] (dot3) at ([xshift=1em,yshift=0em]n4.east) {$\cdots$};
+\node [anchor=west,hnode] (n5) at ([xshift=1em,yshift=0em]dot3.east) {$\mathbi{h}_{10}$};
+
+\node [anchor=north,wnode,font=\footnotesize] (w1) at ([xshift=0em,yshift=-1em]n1.south) {$l_1$\ :\ S};
+\node [anchor=north,wnode,font=\footnotesize] (w2) at ([xshift=0em,yshift=-1em]n2.south) {$l_2$\ :\ NP};
+\node [anchor=north,dotnode] (dot4) at ([xshift=0em,yshift=-2.4em]dot1.south) {$\cdots$};
+\node [anchor=north,wnode,font=\footnotesize] (w3) at ([xshift=0em,yshift=-1em]n3.south) {$w_1$\ :\ I};
+\node [anchor=north,dotnode] (dot5) at ([xshift=0em,yshift=-2.2em]dot2.south) {$\cdots$};
+\node [anchor=north,wnode,font=\footnotesize] (w4) at ([xshift=0em,yshift=-1em]n4.south) {$w_2$\ :\ love};
+\node [anchor=north,dotnode] (dot6) at ([xshift=0em,yshift=-2.3em]dot3.south) {$\cdots$};
+\node [anchor=north,wnode,font=\footnotesize] (w5) at ([xshift=0em,yshift=-1em]n5.south) {$w_3$\ :\ dogs};
+
+
+\node [anchor=south,wnode] (h1) at ([xshift=0em,yshift=0.3em]n3.north) {${\mathbi{h}'}_1$\ :\ };
+\node [anchor=south,wnode] (h2) at ([xshift=0em,yshift=0.3em]n4.north) {${\mathbi{h}'}_2$\ :\ };
+\node [anchor=south,wnode] (h3) at ([xshift=0em,yshift=0.3em]n5.north) {${\mathbi{h}'}_3$\ :\ };
+
+
+\begin{pgfonlayer}{background}
+\node [rectangle,inner sep=0.5em,draw=red!80,dashed,very thick,rounded corners=10pt] [fit = (w1) (w5) (n1) (h3)] (box1) {};
+\node [rectangle,inner sep=0.4em,fill=gray!20,draw=black,dashed,very thick,rounded corners=8pt] [fit = (n3)] (box3) {};
+\node [rectangle,inner sep=0.4em,fill=gray!20,draw=black,dashed,very thick,rounded corners=8pt] [fit = (n4)] (box4) {};
+\node [rectangle,inner sep=0.4em,fill=gray!20,draw=black,dashed,very thick,rounded corners=8pt] [fit = (n5)] (box5) {};
+\end{pgfonlayer}
+
+
+\node [anchor=east] (r1) at ([xshift=-2em,yshift=0em]box1.west) {词语RNN};
+
+
+\node [anchor=south west,wnode] (l1) at ([xshift=1em,yshift=6em]r1.north west) {先序遍历句法树，得到序列：};
+\node [anchor=north west,wnode,align=center] (l2) at ([xshift=0.5em,yshift=-0.6em]l1.north east) {S\\[0.5em]$l_1$};
+\node [anchor=north west,wnode,align=center] (l3) at ([xshift=0.5em,yshift=0em]l2.north east) {NP\\[0.5em]$l_2$};
+\node [anchor=north west,wnode,align=center] (l4) at ([xshift=0.5em,yshift=0em]l3.north east) {PRN\\[0.5em]$l_3$};
+\node [anchor=north west,wnode,align=center] (l5) at ([xshift=0.5em,yshift=0em]l4.north east) {I\\[0.5em]$w_1$};
+\node [anchor=north west,wnode,align=center] (l6) at ([xshift=0.5em,yshift=0em]l5.north east) {VP\\[0.5em]$l_4$};
+\node [anchor=north west,wnode,align=center] (l7) at ([xshift=0.5em,yshift=0em]l6.north east) {VBP\\[0.5em]$l_5$};
+\node [anchor=north west,wnode,align=center] (l8) at ([xshift=0.5em,yshift=0em]l7.north east) {love\\[0.5em]$w_2$};
+\node [anchor=north west,wnode,align=center] (l9) at ([xshift=0.5em,yshift=0em]l8.north east) {NP\\[0.5em]$l_6$};
+\node [anchor=north west,wnode,align=center] (l10) at ([xshift=0.5em,yshift=0em]l9.north east) {NNS\\[0.5em]$l_7$};
+\node [anchor=north west,wnode,align=center] (l11) at ([xshift=0.5em,yshift=0em]l10.north east) {dogs\\[0.5em]$w_3$};
+
+
+
+\draw [->,thick] ([xshift=0em,yshift=0em]w1.north) -- ([xshift=0em,yshift=0em]n1.south);
+\draw [->,thick] ([xshift=0em,yshift=0em]w2.north) -- ([xshift=0em,yshift=0em]n2.south);
+\draw [->,thick] ([xshift=0em,yshift=0em]w3.north) -- ([xshift=0em,yshift=0em]n3.south);
+\draw [->,thick] ([xshift=0em,yshift=0em]w4.north) -- ([xshift=0em,yshift=0em]n4.south);
+\draw [->,thick] ([xshift=0em,yshift=0em]w5.north) -- ([xshift=0em,yshift=0em]n5.south);
+
+
+\draw [->,thick] ([xshift=0em,yshift=0.7em]dot4.north) -- ([xshift=0em,yshift=-0.7em]dot1.south);
+\draw [->,thick] ([xshift=0em,yshift=0.7em]dot5.north) -- ([xshift=0em,yshift=-0.7em]dot2.south);
+\draw [->,thick] ([xshift=0em,yshift=0.7em]dot6.north) -- ([xshift=0em,yshift=-0.7em]dot3.south);
+
+
+\draw [<->,thick] ([xshift=0em,yshift=0em]n1.east) -- ([xshift=0em,yshift=0em]n2.west);
+\draw [<->,thick] ([xshift=0em,yshift=0em]n2.east) -- ([xshift=0em,yshift=0em]dot1.west);
+\draw [<->,thick] ([xshift=0em,yshift=0em]dot1.east) -- ([xshift=0em,yshift=0em]n3.west);
+\draw [<->,thick] ([xshift=0em,yshift=0em]n3.east) -- ([xshift=0em,yshift=0em]dot2.west);
+\draw [<->,thick] ([xshift=0em,yshift=0em]dot2.east) -- ([xshift=0em,yshift=0em]n4.west);
+\draw [<->,thick] ([xshift=0em,yshift=0em]n4.east) -- ([xshift=0em,yshift=0em]dot3.west);
+\draw [<->,thick] ([xshift=0em,yshift=0em]dot3.east) -- ([xshift=0em,yshift=0em]n5.west);
+
+
+\node [anchor=north] (label2) at ([xshift=-2em,yshift=-2em]w3.south) {(c)混合结构};
+
+\end{scope}
+}
+\end{tikzpicture}
+\end{center}
\ No newline at end of file
--- a/Chapter15/Figures/figure-weighted-transformer-network-structure.png
+++ b/Chapter15/Figures/figure-weighted-transformer-network-structure.png
--- a/Chapter15/chapter15.tex
+++ b/Chapter15/chapter15.tex
--- a/Chapter16/Figures/figure-application-process-of-back-translation.tex
+++ b/Chapter16/Figures/figure-application-process-of-back-translation.tex
@@ -58,9 +58,9 @@

 \draw [->,thick]([xshift=-3.2em]remark3.west)--(remark3.west) node [pos=0.5,above] (pos3) {\small{训练}};

-\node [anchor=south](d1) at ([xshift=-1.5em,yshift=1em]remark1.north){\small{真实数据：}};
+\node [anchor=south](d1) at ([xshift=-1.5em,yshift=1em]remark1.north){\small{真实双语数据：}};
 \node [anchor=west](d2) at ([xshift=2.0em]d1.east){\small{伪数据：}};
-\node [anchor=west](d3) at ([xshift=2.0em]d2.east){\small{额外数据：}};
+\node [anchor=west](d3) at ([xshift=2.0em]d2.east){\small{额外单语数据：}};
 \node [anchor=west,fill=green!20,minimum width=1.5em](d1-1) at ([xshift=-0.0em]d1.east){};
 \node [anchor=west,fill=red!20,minimum width=1.5em](d2-1) at ([xshift=-0.0em]d2.east){};
 \node [anchor=west,fill=yellow!20,minimum width=1.5em](d3-1) at ([xshift=-0.0em]d3.east){};

--- a/Chapter16/Figures/figure-comparison-of-structure-between-gpt-and-bert-model.tex
+++ b/Chapter16/Figures/figure-comparison-of-structure-between-gpt-and-bert-model.tex
@@ -103,7 +103,7 @@
 \node [anchor=north] (pos1) at ([xshift=1.5em,yshift=-1.0em]node0-2.south) {\small{(a) GPT模型结构}};
 \node [anchor=north] (pos2) at ([xshift=1.5em,yshift=-1.0em]node0-6.south) {\small{(b) BERT模型结构}};

-\node [anchor=south] (ex) at ([xshift=2.1em,yshift=0.5em]node3-1.north) {\small{TRM：Transformer}};
+\node [anchor=south] (ex) at ([xshift=2.1em,yshift=0.5em]node3-1.north) {\small{TRM：标准Transformer模块}};




--- a/Chapter16/Figures/figure-data-based-domain-adaptation-approach.tex
+++ b/Chapter16/Figures/figure-data-based-domain-adaptation-approach.tex
+
+\begin{tikzpicture}[scale=0.8]
+\begin{scope}
+\tikzstyle{data} = [draw,black,very thick,inner sep=2pt,rounded corners=0pt,minimum width=2.5em,minimum height=1.5em,anchor=west]
+\tikzstyle{model} = [draw,black,very thick,inner sep=3.5pt,rounded corners=0pt,fill=red!20,minimum width=3em,minimum height=1.5em,font=\footnotesize]
+\tikzstyle{word} = [inner sep=3.5pt,align=left,font=\scriptsize]
+\tikzstyle{more} = [inner sep=2pt,rounded corners=0pt,minimum width=2.5em,minimum height=1.5em,anchor=west]
+
+\node[data,fill=blue!20] (one) at (0,0) {};
+\node[data,fill=green!20] (two) at ([xshift=-0.2em]one.east) {};
+\node[data,fill=yellow!20] (three) at ([xshift=-0.2em]two.east) {};
+
+\node[data,fill=blue!20,minimum width=1em] (one_) at ([yshift=-6em]one.south west) {};
+\node[data,fill=green!20,minimum width=4.5em] (two_) at ([xshift=-0.2em]one_.east) {};
+\node[data,fill=yellow!20,minimum width=2em] (three_) at ([xshift=-0.2em]two_.east) {};
+
+\node[model] (mo) at ([xshift=0.5em,yshift=-5em]two_.south){模型};
+
+\node[word] at ([xshift=-1.5em]one.west) {原始 \\ 数据};
+\node[word] at ([xshift=-1.5em]one_.west) {加权};
+
+\node[word,font=\tiny] at ([yshift=1em]one.north) {$(x_1,y_1)$};
+\node[word,font=\tiny] at ([yshift=1em]two.north) {$(x_2,y_2)$};
+\node[word,font=\tiny] at ([yshift=1em]three.north) {$(x_3,y_3)$};
+
+\node[word,font=\tiny] at ([yshift=1em]one_.north) {$(x_1,y_1)$};
+\node[word,font=\tiny] at ([yshift=1em]two_.north) {$(x_2,y_2)$};
+\node[word,font=\tiny] at ([yshift=1em]three_.north) {$(x_3,y_3)$};
+
+\draw [->,thick] ([yshift=-0.2em]one.south) .. controls +(south:2.5em) and +(north:2.5em) .. ([yshift=1.5em]one_.north);
+\draw [->,thick] ([yshift=-0.2em]two.south) .. controls +(south:2.5em) and +(north:2.5em) .. ([yshift=1.5em]two_.north);
+\draw [->,thick] ([yshift=-0.2em]three.south) .. controls +(south:2.5em) and +(north:2.5em) .. ([yshift=1.5em]three_.north);
+
+\draw [->,thick] ([xshift=0.5em,yshift=-0.2em]two_.south) -- ([yshift=0.2em]mo.north) node[pos=0.5,left,align=center,font=\footnotesize]{训练};
+
+\node[font=\small] at ([yshift=-4em]mo.south){(a)数据加权};
+\end{scope}
+
+\begin{scope}
+\tikzstyle{data} = [draw,black,very thick,inner sep=2pt,rounded corners=0pt,minimum width=2.5em,minimum height=1.5em,anchor=west]
+\tikzstyle{model} = [draw,black,very thick,inner sep=3.5pt,rounded corners=0pt,fill=red!20,minimum width=3em,minimum height=1.5em,font=\footnotesize]
+\tikzstyle{word} = [inner sep=3.5pt,align=left,font=\scriptsize]
+\tikzstyle{more} = [inner sep=2pt,rounded corners=0pt,minimum width=2.5em,minimum height=1.5em,anchor=west]
+
+\node[data,fill=blue!20] (one-2) at ([xshift=10.0em]one.east) {};
+\node[data,fill=green!20] (two-2) at ([xshift=-0.2em]one-2.east) {};
+\node[data,fill=yellow!20] (three-2) at ([xshift=-0.2em]two-2.east) {};
+
+\node[data,fill=blue!20] (one_-2) at ([yshift=-6em]one-2.south west) {};
+\node[data,fill=yellow!20] (three_-2) at ([xshift=-0.2em]one_-2.east) {};
+
+\node[model] (mo-2) at ([xshift=1.7em,yshift=-5em]one_-2.south){模型};
+
+\node[word] at ([xshift=-1.5em]one-2.west) {原始 \\ 数据};
+\node[word] at ([xshift=-1.5em]one_-2.west) {选择};
+
+\node[word,font=\tiny] at ([yshift=1em]one-2.north) {$(x_1,y_1)$};
+\node[word,font=\tiny] at ([yshift=1em]two-2.north) {$(x_2,y_2)$};
+\node[word,font=\tiny] at ([yshift=1em]three-2.north) {$(x_3,y_3)$};
+
+\node[word,font=\tiny] at ([yshift=1em]one_-2.north) {$(x_1,y_1)$};
+\node[word,font=\tiny] at ([yshift=1em]three_-2.north) {$(x_3,y_3)$};
+
+\draw [->,thick] ([yshift=-0.2em]one-2.south) .. controls +(south:2.5em) and +(north:2.5em) .. ([yshift=1.5em]one_-2.north);
+\draw [->,thick] ([yshift=-0.2em]three-2.south) .. controls +(south:2.5em) and +(north:2.5em) .. ([yshift=1.5em]three_-2.north);
+
+\draw [->,thick] ([xshift=1.7em,yshift=-0.2em]one_-2.south) -- ([yshift=0.2em]mo-2.north) node[pos=0.5,left,align=center,font=\footnotesize]{训练};
+\node[font=\small] at ([yshift=-4em]mo-2.south){(b)数据选择};
+\end{scope}
+
+\begin{scope}
+\tikzstyle{data} = [draw,black,very thick,inner sep=2pt,rounded corners=0pt,minimum width=2.5em,minimum height=1.5em,anchor=west]
+\tikzstyle{model} = [draw,black,very thick,inner sep=3.5pt,rounded corners=0pt,fill=red!20,minimum width=3em,minimum height=1.5em,font=\footnotesize]
+\tikzstyle{word} = [inner sep=3.5pt,align=left,font=\scriptsize]
+\tikzstyle{more} = [inner sep=2pt,rounded corners=0pt,minimum width=2.5em,minimum height=1.5em,anchor=west]
+
+\node[data,fill=blue!20] (one-3) at ([xshift=10.0em]one-2.east) {};
+\node[data,fill=green!20] (two-3) at ([xshift=-0.2em]one-3.east) {};
+\node[data,fill=yellow!20] (three-3) at ([xshift=-0.2em]two-3.east) {};
+
+\node[data,fill=blue!20] (one_-3) at ([yshift=-6em]one-3.south west) {};
+\node[data,fill=green!20] (two_-3) at ([xshift=-0.2em]one_-3.east) {};
+\node[data,fill=yellow!20] (three_-3) at ([xshift=-0.2em]two_-3.east) {};
+\node[data,fill=black!10] (new_-3) at ([xshift=1.2em]three_-3.east) {};
+
+\node[model] (mo-3) at ([xshift=1.7em,yshift=-5em]two_-3.south){模型};
+
+\node[word] at ([xshift=-1.5em]one-3.west) {原始 \\ 数据};
+
+\node[word,font=\tiny] at ([yshift=1em]one-3.north) {$(x_1,y_1)$};
+\node[word,font=\tiny] at ([yshift=1em]two-3.north) {$(x_2,y_2)$};
+\node[word,font=\tiny] at ([yshift=1em]three-3.north) {$(x_3,y_3)$};
+\node[word,font=\scriptsize] (monolingual-3)  at ([xshift=6em]three-3.south) {$\tilde{x}_4$};
+
+\node[word,font=\tiny] (w-3) at ([yshift=1em]one_-3.north) {$(x_1,y_1)$};
+\node[word,font=\tiny] at ([yshift=1em]two_-3.north) {$(x_2,y_2)$};
+\node[word,font=\tiny] at ([yshift=1em]three_-3.north) {$(x_3,y_3)$};
+\node[word,font=\tiny] (w4-3) at ([yshift=1em]new_-3.north) {{\red $(\tilde{x}_4,y_4^*)$}};
+
+\draw [->,thick] ([yshift=-0.2em]one-3.south) .. controls +(south:2.5em) and +(north:2.5em) .. ([yshift=1.5em]one_-3.north);
+\draw [->,thick] ([yshift=-0.2em]two-3.south) .. controls +(south:2.5em) and +(north:2.5em) .. ([yshift=1.5em]two_-3.north);
+\draw [->,thick] ([yshift=-0.2em]three-3.south) .. controls +(south:2.5em) and +(north:2.5em) .. ([yshift=1.5em]three_-3.north);
+\draw [->,thick] ([yshift=-0.0em]monolingual-3.south) .. controls +(south:2.5em) and +(north:2.5em) .. ([xshift=0.8em,yshift=1.7em]new_-3.north) node[pos=0.5,left,align=center,font=\tiny]{解码};
+
+\draw [->,thick] ([xshift=1.7em,yshift=-0.2em]two_-3.south) -- ([yshift=0.2em]mo-3.north) node[pos=0.5,left,align=center,font=\footnotesize]{训练};
+\node[font=\small] at ([yshift=-4em]mo-3.south){(c)伪数据};
+
+\begin{pgfonlayer}{background}
+\node [rectangle,rounded corners=1pt,fill=orange!10] [fit = (w-3) (one_-3)(three_-3)] (box1) {};
+\node [rectangle,rounded corners=1pt,fill=cyan!10] [fit = (w4-3) (new_-3)] (box2) {};
+\end{pgfonlayer}
+
+\node[word,draw=orange!50,dotted,very thick,inner sep=2.5pt] (realdata-3) at ([xshift=-4.5em,yshift=-2em]box1.south) {真实数据};
+\node[word,draw=cyan!50,dotted,very thick,inner sep=2.5pt] (fake-3) at ([xshift=1em,yshift=-2em]box2.south) {伪数据};
+\node[word,draw,dotted,very thick,inner sep=2.5pt] (monodata-3) at ([xshift=-0.5em,yshift=2em]monolingual-3.north) {单语数据};
+
+\draw[->,dotted,very thick] ([yshift=0.0em]monolingual-3.north)-- ([yshift=-0.2em,xshift=0.45em]monodata-3.south);
+\draw[->,dotted,very thick,cyan] (box2.south) -- ([xshift=-1em,yshift=0.2em]fake-3.north);
+\draw[->,dotted,very thick,orange] ([xshift=-3.5em]box1.south) -- ([xshift=1em,yshift=0.2em]realdata-3.north);
+
+\end{scope}
+\end{tikzpicture}
--- a/Chapter16/Figures/figure-example-of-iterative-back-translation.tex
+++ b/Chapter16/Figures/figure-example-of-iterative-back-translation.tex
@@ -53,9 +53,9 @@
 \draw [->,thick]([yshift=-0.75em]node5-1.east)--(remark3.north west);
 \draw [->,thick]([yshift=-0.75em]node6-1.east)--(remark3.south west);

-\node [anchor=south](d1) at ([xshift=-0.7em,yshift=5.5em]remark1.north){\small{真实数据：}};
+\node [anchor=south](d1) at ([xshift=-0.7em,yshift=5.5em]remark1.north){\small{真实双语数据：}};
 \node [anchor=west](d2) at ([xshift=2.0em]d1.east){\small{伪数据：}};
-\node [anchor=west](d3) at ([xshift=2.0em]d2.east){\small{额外数据：}};
+\node [anchor=west](d3) at ([xshift=2.0em]d2.east){\small{额外单语数据：}};
 \node [anchor=west,fill=green!20,minimum width=1.5em](d1-1) at ([xshift=-0.0em]d1.east){};
 \node [anchor=west,fill=red!20,minimum width=1.5em](d2-1) at ([xshift=-0.0em]d2.east){};
 \node [anchor=west,fill=yellow!20,minimum width=1.5em](d3-1) at ([xshift=-0.0em]d3.east){};

--- a/Chapter16/Figures/figure-mass.tex
+++ b/Chapter16/Figures/figure-mass.tex
@@ -7,12 +7,12 @@
 \node [anchor=center,model,fill=blue!20] (decoder) at ([xshift=7.5em]ate.east) {\small{解码器}};
 \node [anchor=north,word] (w1) at ([yshift=-1.5em,xshift=0em]decoder.south) {\small{$x_3$}};
 \node [anchor=west,word] (w2) at ([xshift=0em]w1.east) {\small{$x_4$}};
-\node [anchor=west,word] (w3) at ([xshift=0em]w2.east) {[M]};
+\node [anchor=west,word] (w3) at ([xshift=0em]w2.east) {<M>};

-\node [anchor=east,word] (w4) at ([xshift=0em]w1.west) {[M]};
-\node [anchor=east,word] (w5) at ([xshift=0em]w4.west) {[M]};
-\node [anchor=east,word] (w6) at ([xshift=0em]w5.west) {[M]};
-\node [anchor=west,word] (w7) at ([xshift=0em]w3.east) {[M]};
+\node [anchor=east,word] (w4) at ([xshift=0em]w1.west) {<M>};
+\node [anchor=east,word] (w5) at ([xshift=0em]w4.west) {<M>};
+\node [anchor=east,word] (w6) at ([xshift=0em]w5.west) {<M>};
+\node [anchor=west,word] (w7) at ([xshift=0em]w3.east) {<M>};

 \node [anchor=south,word] (w8) at ([yshift=1.5em,xshift=0em]decoder.north) {\small{$x_4$}};
 \node [anchor=east,word] (w9) at (w8.west) {\small{$x_3$}};
@@ -33,11 +33,11 @@
 %encoder
 \node [model] (encoder) at ([xshift=-7.5em]ate.west) {\small{编码器}};

-\node [anchor=north,word] (we1) at ([yshift=-1.5em,xshift=0em]encoder.south) {[M]};
-\node [anchor=west,word] (we2) at ([xshift=0em]we1.east) {[M]};
+\node [anchor=north,word] (we1) at ([yshift=-1.5em,xshift=0em]encoder.south) {<M>};
+\node [anchor=west,word] (we2) at ([xshift=0em]we1.east) {<M>};
 \node [anchor=west,word] (we3) at ([xshift=0em]we2.east) {\small{$x_6$}};

-\node [anchor=east,word] (we4) at ([xshift=0em]we1.west) {[M]};
+\node [anchor=east,word] (we4) at ([xshift=0em]we1.west) {<M>};
 \node [anchor=east,word] (we5) at ([xshift=0em]we4.west) {\small{$x_2$}};
 \node [anchor=east,word] (we6) at ([xshift=0em]we5.west) {\small{$x_1$}};
 \node [anchor=west,word] (we7) at ([xshift=0em]we3.east) {\small{$x_7$}};
@@ -51,5 +51,5 @@
 \draw [->,thick] (we7.north) -- ([yshift=1.35em]we7.north);

 \draw [->,very thick] ([xshift=0.5em]encoder)--([xshift=-0.3em]decoder);
-\node [anchor=south] (ex) at ([xshift=-4.0em,yshift=1.0em]encoder.north) {\small{[M]：Mask}};
+\node [anchor=south] (ex) at ([xshift=-4.0em,yshift=1.0em]encoder.north) {\small{<M>：<Mask>}};
 \end{tikzpicture}
\ No newline at end of file
--- a/Chapter16/Figures/figure-multi-language-single-model-system-diagram.tex
+++ b/Chapter16/Figures/figure-multi-language-single-model-system-diagram.tex
@@ -3,13 +3,13 @@
 %-------------------------------------------------------------------------
 \begin{tikzpicture}
 \tikzstyle{lan}=[font=\footnotesize,inner ysep=2pt,minimum height=1em]
-\node[minimum height=3em,minimum width=8em,fill=orange!20,draw,rounded corners=2pt,align=center,line width=0.6pt] (sys) at (0,0){多语言 \\ 单模型系统};
-\node[draw,font=\footnotesize,minimum width=4em,fill=red!20,rounded corners=1pt,line width=0.6pt] (en) at (-3em,4em){英语};
-\node[draw,font=\footnotesize,minimum width=4em,fill=red!20,rounded corners=1pt,line width=0.6pt] (fr) at (3em,4em){法语};
-\node[minimum width=4em]  at (6.6em,4em){$\dots$};
-\node[draw,font=\footnotesize,minimum width=4em,fill=blue!20,rounded corners=1pt,line width=0.6pt] (de) at (-3em,-4em){德语};
-\node[draw,font=\footnotesize,minimum width=4em,fill=blue!20,rounded corners=1pt,line width=0.6pt] (sp) at (3em,-4em){西班牙语};
-\node[minimum width=4em]  at (6.6em,-4em){$\dots$};
+\node[minimum height=4em,minimum width=8em,fill=orange!20,draw,rounded corners=2pt,align=center,line width=0.6pt,font=\small] (sys) at (0,0){多语言 \\ 单模型系统};
+\node[draw,font=\footnotesize,minimum width=4em,fill=red!20,rounded corners=1pt,line width=0.6pt] (en) at (-3em,5em){英语};
+\node[draw,font=\footnotesize,minimum width=4em,fill=red!20,rounded corners=1pt,line width=0.6pt] (fr) at (3em,5em){法语};
+\node[minimum width=4em]  at (6.6em,5em){$\dots$};
+\node[draw,font=\footnotesize,minimum width=4em,fill=blue!20,rounded corners=1pt,line width=0.6pt] (de) at (-3em,-5em){德语};
+\node[draw,font=\footnotesize,minimum width=4em,fill=blue!20,rounded corners=1pt,line width=0.6pt] (sp) at (3em,-5em){西班牙语};
+\node[minimum width=4em]  at (6.6em,-5em){$\dots$};

 \draw[->,thick] (en.-90) -- ([xshift=-1em]sys.90);
 \draw[->,thick] (fr.-90) -- ([xshift=1em]sys.90);

--- a/Chapter16/Figures/figure-parameter-initialization-method-diagram.tex
+++ b/Chapter16/Figures/figure-parameter-initialization-method-diagram.tex
@@ -12,10 +12,10 @@
 \node[node,anchor=west,minimum width=6em,minimum height=2.4em,fill=blue!20,line width=0.6pt] (decoder2) at ([xshift=4em,yshift=0em]decoder1.east){\small 解码器};
 \node[node,anchor=west,minimum width=6em,minimum height=2.4em,fill=blue!30,line width=0.6pt] (decoder3) at ([xshift=3em]decoder2.east){\small 解码器};

-\node[anchor=north,font=\scriptsize,fill=yellow!20] (w1) at ([yshift=-1.6em]decoder1.south){知识 \ 就是 \ 力量 \ 。 \ <eos>};
-\node[anchor=north,font=\scriptsize,fill=green!20] (w3) at ([yshift=-1.6em]decoder3.south){Wissen  \ ist \ Machit \ . \ <eos>};
-\node[anchor=south,font=\scriptsize,fill=orange!20] (w2) at ([yshift=1.6em]encoder1.north){Knowledge \ is \ power \ . };
-\node[anchor=south,font=\scriptsize,fill=orange!20] (w4) at ([yshift=1.6em]encoder3.north){Knowledge \ is \ power \ . };
+\node[anchor=north,font=\scriptsize,fill=yellow!20,drop shadow,draw] (w1) at ([yshift=-1.6em]decoder1.south){知识 \ 就是 \ 力量 \ 。 \ <eos>};
+\node[anchor=north,font=\scriptsize,fill=green!20,drop shadow,draw] (w3) at ([yshift=-1.6em]decoder3.south){El conocimiento es poder . <eos>};
+\node[anchor=south,font=\scriptsize,fill=orange!20,drop shadow,draw] (w2) at ([yshift=1.6em]encoder1.north){Knowledge \ is \ power \ . };
+\node[anchor=south,font=\scriptsize,fill=orange!20,drop shadow,draw] (w4) at ([yshift=1.6em]encoder3.north){Knowledge \ is \ power \ . };


 \draw[->,thick] (decoder1.-90) -- (w1.north);

--- a/Chapter16/Figures/figure-the-iterative-process-of-bidirectional-training.tex
+++ b/Chapter16/Figures/figure-the-iterative-process-of-bidirectional-training.tex
@@ -14,14 +14,14 @@

 \node(process_1_1)[process, right of = monolingual_X, xshift=2.5cm, yshift=-1.5cm]{\textbf{$M^0_{x\to y}$}};
 \node(process_1_2)[process, right of = process_1_1, xshift=5cm, fill=red!25]{$M^0_{y\to x}$};
-\node(process_2_1)[process, below of = process_1_1, yshift=-1.2cm]{解码过程};
-\node(process_2_2)[process, below of = process_1_2, yshift=-1.2cm, fill=red!25]{解码过程};
+\node(process_2_1)[process, below of = process_1_1, yshift=-1.2cm]{翻译过程};
+\node(process_2_2)[process, below of = process_1_2, yshift=-1.2cm, fill=red!25]{翻译过程};
 \node(process_3_1)[state, below of = process_2_1, yshift=-1.2cm, fill=color1!25]{\{$x_i,\hat{y}^0_i$\}};
 \node(process_3_2)[state, below of = process_2_2, yshift=-1.2cm, fill=blue!25]{\{$\hat{x}^0_i,{y_i}$\}};
 \node(process_4_1)[process, below of = process_3_1, yshift=-1.2cm]{\textbf{$M^1_{x\to y}$}};
 \node(process_4_2)[process, below of = process_3_2, yshift=-1.2cm, fill=red!25]{$M^1_{y\to x}$};
-\node(process_5_1)[process, below of = process_4_1, yshift=-1.2cm]{解码过程};
-\node(process_5_2)[process, below of = process_4_2, yshift=-1.2cm, fill=red!25]{解码过程};
+\node(process_5_1)[process, below of = process_4_1, yshift=-1.2cm]{翻译过程};
+\node(process_5_2)[process, below of = process_4_2, yshift=-1.2cm, fill=red!25]{翻译过程};
 \node(process_6_1)[state, below of = process_5_1, yshift=-1.2cm, fill=color1!25]{\{$x_i,\hat{y}^1_i$\}};
 \node(process_6_2)[state, below of = process_5_2, yshift=-1.2cm, fill=blue!25]{\{$\hat{x}^1_i,{y_i}$\}};
 \node(process_7_1)[process, below of = process_6_1, yshift=-1.2cm]{\textbf{$M^2_{x\to y}$}};

--- a/Chapter16/chapter16.tex
+++ b/Chapter16/chapter16.tex
--- a/Chapter17/chapter17.tex
+++ b/Chapter17/chapter17.tex
@@ -134,7 +134,7 @@

 \subsubsection{2. 语音识别结果的表示}

-\parinterval 级联语音翻译模型利用翻译模型将语音识别结果翻译为目标语言文本，但存在的一个问题是语音识别模型只输出One-best，其中可能存在一些识别错误，这些错误在翻译过程中会被放大，导致最终翻译结果偏离原本意思，也就是错误传播问题。传统级联语音模型的一个主要方向是丰富语音识别模型的预测结果，为翻译模型提供更多的信息，具体做法是在语音识别模型中，声学模型解码得到{\small\bfnew{词格}}\index{词格}（Word Lattice）\index{Word Lattice}来取代One-best识别结果。词格是一种有向无环图，包含单个起点和终点，图中的每条边记录了每个词和对应的转移概率信息，如图\ref{fig:17-6}所示。
+\parinterval 级联语音翻译模型利用翻译模型将语音识别结果翻译为目标语言文本，但存在的一个问题是语音识别模型只输出One-best，其中可能存在一些识别错误，这些错误在翻译过程中会被放大，导致最终翻译结果偏离原本意思，也就是错误传播问题。传统级联语音模型的一个主要方向是丰富语音识别模型的预测结果，为翻译模型提供更多的信息，具体做法是在语音识别模型中，声学模型解码得到词格来取代One-best识别结果。词格是一种有向无环图，包含单个起点和终点，图中的每条边记录了每个词和对应的转移概率信息，如图\ref{fig:17-6}所示。

 %----------------------------------------------------------------------------------------------------
 \begin{figure}[htp]

--- a/Chapter5/chapter5.tex
+++ b/Chapter5/chapter5.tex
@@ -153,7 +153,7 @@ IBM模型由Peter F. Brown等人于上世纪九十年代初提出\upcite{DBLP:jo
 %    NEW SUBSUB-SECTION
 %----------------------------------------------------------------------------------------

-\subsubsection{3. 人工翻译 vs. 机器翻译}
+\subsubsection{3. 人工翻译 vs 机器翻译}
 \parinterval 人在翻译时的决策是非常确定并且快速的，但计算机处理这个问题时却充满了概率化的思想。当然它们也有类似的地方。首先，计算机使用统计模型的目的是把翻译知识变得可计算，并把这些“知识”储存在模型参数中，这个模型和人类大脑的作用是类似的\footnote{这里并不是要把统计模型等同于生物学或者认知科学上的人脑，这里是指它们处理翻译问题时发挥的作用类似。}；其次，计算机对统计模型进行训练相当于人类对知识的学习，二者都可以被看作是理解、加工知识的过程；再有，计算机使用学习到的模型对新句子进行翻译的过程相当于人运用知识的过程。在统计机器翻译中，模型学习的过程被称为训练，目的是从双语平行数据中自动学习翻译“知识”；而使用模型处理新句子的过程是一个典型的预测过程，也被称为解码或推断。图\ref{fig:5-4}的右侧标注在翻译过程中训练和解码的作用。最终，统计机器翻译的核心由三部分构成\ \dash \ 建模、训练和解码。本章后续内容会围绕这三个问题展开讨论。

 %----------------------------------------------------------------------------------------
@@ -401,7 +401,7 @@ g(\seq{s},\seq{t}) &= &\prod_{(j,i)\in \widehat{A}}\funp{P}(s_j,t_i)
 \label{eq:5-10}
 \end{eqnarray}

-\noindent  其中，$\seq{t}=t_1...t_l$表示由$l$个单词组成的句子，$\funp{P}_{\textrm{lm}}(\seq{t})$表示语言模型给句子$\seq{t}$的打分。具体而言，$\funp{P}_{\textrm{lm}}(\seq{t})$被定义为$\funp{P}(t_i|t_{i-1})(i=1,2,...,l)$的连乘\footnote{为了确保数学表达的准确性，本书中定义$\funp{P}(t_1|t_0) \equiv \funp{P}(t_1)$}，其中$\funp{P}(t_i|t_{i-1})(i=1,2,...,l)$表示前面一个单词为$t_{i-1}$时，当前单词为$t_i$的概率。语言模型的训练方法可以参看{\chaptertwo}相关内容。
+\noindent  其中，$\seq{t}=\{ t_1...t_l \}$表示由$l$个单词组成的句子，$\funp{P}_{\textrm{lm}}(\seq{t})$表示语言模型给句子$\seq{t}$的打分。具体而言，$\funp{P}_{\textrm{lm}}(\seq{t})$被定义为$\funp{P}(t_i|t_{i-1})(i=1,2,...,l)$的连乘\footnote{为了确保数学表达的准确性，本书中定义$\funp{P}(t_1|t_0) \equiv \funp{P}(t_1)$}，其中$\funp{P}(t_i|t_{i-1})(i=1,2,...,l)$表示前面一个单词为$t_{i-1}$时，当前单词为$t_i$的概率。语言模型的训练方法可以参看{\chaptertwo}相关内容。

 \parinterval 回到建模问题上来。既然语言模型可以帮助系统度量每个译文的流畅度，那么可以使用它对翻译进行打分。一种简单的方法是把语言模型$\funp{P}_{\textrm{lm}}{(\seq{t})}$ 和公式\eqref{eq:5-8}中的$g(\seq{s},\seq{t})$相乘，这样就得到了一个新的$g(\seq{s},\seq{t})$，它同时考虑了翻译准确性（$\prod_{j,i \in \widehat{A}}{\funp{P}(s_j,t_i)}$）和流畅度（$\funp{P}_{\textrm{lm}}(\seq{t})$）:
 \begin{eqnarray}
@@ -605,7 +605,7 @@ g(\seq{s},\seq{t}) & \equiv & \prod_{j,i \in \widehat{A}}{\funp{P}(s_j,t_i)} \ti

 \subsection{词对齐}

-\parinterval IBM模型的一个基本的假设是词对齐假设。词对齐描述了源语言句子和目标语句子之间单词级别的对应。具体来说，给定源语句子$\seq{s}=s_1...s_m$和目标语译文$\seq{t}=t_1...t_l$，IBM模型假设词对齐具有如下两个性质。
+\parinterval IBM模型的一个基本的假设是词对齐假设。词对齐描述了源语言句子和目标语句子之间单词级别的对应。具体来说，给定源语句子$\seq{s}=\{ s_1...s_m \}$和目标语译文$\seq{t}=\{ t_1...t_l \}$，IBM模型假设词对齐具有如下两个性质。

 \begin{itemize}
 \vspace{0.5em}
@@ -634,7 +634,7 @@ g(\seq{s},\seq{t}) & \equiv & \prod_{j,i \in \widehat{A}}{\funp{P}(s_j,t_i)} \ti
 %----------------------------------------------
 \end{itemize}

-\parinterval 通常，把词对齐记为$\seq{a}$，它由$a_1$到$a_m$共$m$个词对齐连接组成，即$\seq{a}=a_1...a_m$。$a_j$表示第$j$个源语单词$s_j$对应的目标语单词的位置。在图\ref{fig:5-16}的例子中，词对齐关系可以记为$a_1=0, a_2=3, a_3=1$，即第1个源语单词“在”对应到目标语译文的第0个位置，第2个源语单词“桌子”对应到目标语译文的第3个位置，第3个源语单词“上”对应到目标语译文的第1个位置。
+\parinterval 通常，把词对齐记为$\seq{a}$，它由$a_1$到$a_m$共$m$个词对齐连接组成，即$\seq{a}=\{ a_1...a_m \}$。$a_j$表示第$j$个源语单词$s_j$对应的目标语单词的位置。在图\ref{fig:5-16}的例子中，词对齐关系可以记为$a_1=0, a_2=3, a_3=1$，即第1个源语单词“在”对应到目标语译文的第0个位置，第2个源语单词“桌子”对应到目标语译文的第3个位置，第3个源语单词“上”对应到目标语译文的第1个位置。

 %----------------------------------------------------------------------------------------
 %    NEW SUB-SECTION
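
Note on the alignment notation in the hunk above: the vector $\seq{a}=\{a_1...a_m\}$ can be read as a plain list with one target position per source word. The following is a minimal sketch, not code from the book. The helper name `aligned_pairs`, the `<null>` placeholder for position 0 (assumed here to be the empty word $t_0$), and the English target sentence are all illustrative assumptions, since the figure itself is not part of this diff.

```python
# Hypothetical helper: a[j-1] stores a_j, the target position
# that the j-th source word is aligned to.
def aligned_pairs(source, target, a):
    """Return (source word, target word) pairs for an alignment vector."""
    if len(a) != len(source):
        raise ValueError("need exactly one alignment link per source word")
    # Position 0 is assumed to be the empty word t_0; real target words start at 1.
    padded = ["<null>"] + list(target)
    return [(s, padded[i]) for s, i in zip(source, a)]


source = ["在", "桌子", "上"]
target = ["on", "the", "table"]  # assumed target sentence, for illustration only
pairs = aligned_pairs(source, target, [0, 3, 1])  # a_1=0, a_2=3, a_3=1
```

Under these assumptions the first source word links to position 0, the second to position 3, and the third to position 1, matching the reading given in the paragraph.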
@@ -668,7 +668,7 @@ g(\seq{s},\seq{t}) & \equiv & \prod_{j,i \in \widehat{A}}{\funp{P}(s_j,t_i)} \ti
 \label{eq:5-19}
 \end{eqnarray}

-\noindent  其中$s_j$和$a_j$分别表示第$j$个源语言单词及第$j$个源语言单词对齐到的目标位置，\seq{s}${{}_1^{j-1}}$表示前$j-1$个源语言单词（即\seq{s}${}_1^{j-1}=s_1...s_{j-1}$），\seq{a}${}_1^{j-1}$表示前$j-1$个源语言的词对齐（即\seq{a}${}_1^{j-1}=a_1...a_{j-1}$），$m$表示源语句子的长度。公式\eqref{eq:5-19}将$\funp{P}(\seq{s},\seq{a}|\seq{t})$分解为四个部分，具体含义如下：
+\noindent  其中$s_j$和$a_j$分别表示第$j$个源语言单词及第$j$个源语言单词对齐到的目标位置，\seq{s}${{}_1^{j-1}}$表示前$j-1$个源语言单词（即\seq{s}${}_1^{j-1}=\{ s_1...s_{j-1} \}$），\seq{a}${}_1^{j-1}$表示前$j-1$个源语言的词对齐（即\seq{a}${}_1^{j-1}=\{ a_1...a_{j-1} \}$），$m$表示源语句子的长度。公式\eqref{eq:5-19}将$\funp{P}(\seq{s},\seq{a}|\seq{t})$分解为四个部分，具体含义如下：

 \begin{itemize}
 \vspace{0.5em}

--- a/Chapter6/chapter6.tex
+++ b/Chapter6/chapter6.tex
@@ -219,7 +219,7 @@
 \parinterval 不过$<\seq{s},\seq{a}>$中有多少组$<\tau,\pi>$呢？通过图\ref{fig:6-5}中的例子，可以推出$<\seq{s},\seq{a}>$应该包含$\prod_{i=0}^{l}{\varphi_i !}$个不同的二元组$<\tau,\pi>$。 这是因为在给定源语言句子和词对齐时，对于每一个$\tau_i$都有$\varphi_{i}!$种排列。


-\parinterval 进一步，$\funp{P}(\tau,\pi|\seq{t})$可以被表示如图\ref{fig:6-7}的形式。其中$\tau_{i1}^{k-1}$表示$\tau_{i1}\tau_{i2}\cdots \tau_{i(k-1)}$，$\pi_{i1}^{ k-1}$表示$\pi_{i1}\pi_{i2}\cdots \pi_{i(k-1)}$。可以把图\ref{fig:6-7}中的公式分为5个部分，并用不同的序号和颜色进行标注。每部分的具体含义是：
+\parinterval 进一步，$\funp{P}(\tau,\pi|\seq{t})$可以被表示如图\ref{fig:6-7}的形式。其中$\tau_{i1}^{k-1}$表示$\tau_{i1}\cdots \tau_{i(k-1)}$，$\pi_{i1}^{ k-1}$表示$\pi_{i1}\cdots \pi_{i(k-1)}$。可以把图\ref{fig:6-7}中的公式分为5个部分，并用不同的序号和颜色进行标注。每部分的具体含义是：

 %----------------------------------------------
 \begin{figure}[htp]
@@ -281,13 +281,13 @@
 \label{eq:6-15}
 \end{eqnarray}
 }
-\parinterval 而上面提到的$t_0$所对应的这些空位置是如何生成的呢？即如何确定哪些位置是要放置空对齐的源语言单词。在IBM模型3中，假设在所有的非空对齐源语言单词都被生成出来后（共$\varphi_1+\varphi_2+\cdots {\varphi}_l$个非空对源语单词），这些单词后面都以$p_1$概率随机地产生一个“槽”用来放置空对齐单词。这样，${\varphi}_0$就服从了一个二项分布。于是得到
+\parinterval 而上面提到的$t_0$所对应的这些空位置是如何生成的呢？即如何确定哪些位置是要放置空对齐的源语言单词。在IBM模型3中，假设在所有的非空对齐源语言单词都被生成出来后（共$\varphi_1+\cdots+{\varphi}_l$个非空对齐源语单词），这些单词后面都以$p_1$概率随机地产生一个“槽”用来放置空对齐单词。这样，${\varphi}_0$就服从了一个二项分布。于是得到
 {
 \begin{eqnarray}
 \funp{P}(\varphi_0|\seq{t}) & = & \big(\begin{array}{c}
-\varphi_1+\varphi_2+\cdots \varphi_l\\
+\varphi_1+\cdots+\varphi_l\\
 \varphi_0\\
-\end{array}\big)p_0^{\varphi_1+\varphi_2+\cdots \varphi_l-\varphi_0}p_1^{\varphi_0}
+\end{array}\big)p_0^{\varphi_1+\cdots+\varphi_l-\varphi_0}p_1^{\varphi_0}
 \label{eq:6-16}
 \end{eqnarray}
 }
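
Note on the binomial in the hunk above: the probability of $\varphi_0$ empty-aligned words can be checked numerically. This is a minimal sketch rather than the book's implementation. The function name `null_fertility_prob` is hypothetical, and it assumes $p_0 = 1 - p_1$, as in the standard formulation of IBM Model 3.

```python
from math import comb


def null_fertility_prob(fertilities, phi0, p1):
    """P(phi_0 | t): binomial over n = phi_1 + ... + phi_l slots.

    fertilities: [phi_1, ..., phi_l], fertilities of the non-empty target words.
    phi0: number of source words aligned to the empty word t_0.
    p1: probability that a slot for an empty-aligned word is produced.
    """
    n = sum(fertilities)
    if not 0 <= phi0 <= n:
        return 0.0  # cannot open more slots than there are generated words
    p0 = 1.0 - p1  # assumption: p0 and p1 sum to one
    return comb(n, phi0) * p0 ** (n - phi0) * p1 ** phi0


# With four non-empty-aligned words and p1 = 0.1, the values over
# phi0 = 0..4 form a proper distribution.
probs = [null_fertility_prob([1, 2, 1], k, 0.1) for k in range(5)]
```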

--- a/Chapter7/chapter7.tex
+++ b/Chapter7/chapter7.tex
@@ -142,7 +142,7 @@
 \begin{definition} 短语

 {\small
-对于一个句子$\seq{w} = w_1...w_n$，任意子串$w_i...w_j$($i\leq j$且$0\leq i,j\leq n$)都是句子$\seq{w}$的一个{\small\bfnew{短语}}。
+对于一个句子$\seq{w} = \{ w_1...w_n \} $，任意子串$\{ w_i...w_j\}$($i\leq j$且$0\leq i,j\leq n$)都是句子$\seq{w}$的一个{\small\bfnew{短语}}。
 }
 \end{definition}
 %-------------------------------------------
@@ -154,7 +154,7 @@
 \begin{definition} 句子的短语切分

 {\small
-如果一个句子$\seq{w} = w_1...w_n$可以被切分为$m$个子串，则称$\seq{w}$由$m$个短语组成，记为$\seq{w} = p_1...p_m$，其中$p_i$是$\seq{w}$的一个短语，$\{p_1,...,p_m\}$也被称作句子$\seq{w}$的一个{\small\bfnew{短语切分}}。
+如果一个句子$\seq{w} = \{ w_1...w_n\}$可以被切分为$m$个子串，则称$\seq{w}$由$m$个短语组成，记为$\seq{w} =\{ p_1...p_m \} $，其中$p_i$是$\seq{w}$的一个短语，$\{p_1,...,p_m\}$也被称作句子$\seq{w}$的一个{\small\bfnew{短语切分}}。
 }
 \end{definition}
 %-------------------------------------------

--- a/Chapter8/chapter8.tex
+++ b/Chapter8/chapter8.tex
@@ -340,11 +340,11 @@ d & = & {r_1} \circ {r_2} \circ {r_3} \circ {r_4}
 \item 	对于$(x,y)\in \varPhi$，存在$m$个双语短语$(x_i,y_j)\in \varPhi$，同时存在(1,$...$,$m$)上面的一个排序$\sim = \{\pi_1 , ... ,\pi_m\}$，且：
 \vspace{-1.5em}
 \begin{eqnarray}
-x&=&\alpha_0 x_1 \alpha_1 x_2 ... \alpha_{m-1} x_m \alpha_m \label{eq:8-2}\\
-y&=&\beta_0 y_{\pi_1} \beta_1 y_{\pi_2} ... \beta_{m-1} y_{\pi_m} \beta_m
+x&=&\alpha_0 x_1 ... \alpha_{m-1} x_m \alpha_m \label{eq:8-2}\\
+y&=&\beta_0 y_{\pi_1}  ... \beta_{m-1} y_{\pi_m} \beta_m
 \label{eq:8-3}
 \end{eqnarray}
-其中，${\alpha_0, ... ,\alpha_m}$和${\beta_0, ... ,\beta_m}$表示源语言和目标语言的若干个词串（包含空串）。则$\funp{X} \to \langle x,y,\sim \rangle$是与词对齐相兼容的层次短语规则。这条规则包含$m$个变量，变量的对齐信息是$\sim$。
+其中，$\{\alpha_0, ... ,\alpha_m \}$和$\{\beta_0, ... ,\beta_m \}$表示源语言和目标语言的若干个词串（包含空串）。则$\funp{X} \to \langle x,y,\sim \rangle$是与词对齐相兼容的层次短语规则。这条规则包含$m$个变量，变量的对齐信息是$\sim$。
 \end{enumerate}
 }
 \end{definition}

--- a/ChapterAppend/chapterappend.tex
+++ b/ChapterAppend/chapterappend.tex
@@ -245,16 +245,16 @@ c(i|j,m,l;\mathbf{s},\mathbf{t}) &=&\frac{f(s_j|t_i)a(i|j,m,l)}   {\sum_{k=0}^{l
 \parinterval M-Step的计算公式如下，其中参数$a(i|j,m,l)$表示调序概率：

 \begin{eqnarray}
-f(s_u|t_v) &=\frac{c(s_u|t_v;\mathbf{s},\mathbf{t}) }    {\sum_{s_u} c(s_u|t_v;\mathbf{s},\mathbf{t})} \\
-a(i|j,m,l) &=\frac{c(i|j;\mathbf{s},\mathbf{t})}  {\sum_{i}c(i|j;\mathbf{s},\mathbf{t})}
+f(s_u|t_v) &=&\frac{c(s_u|t_v;\mathbf{s},\mathbf{t}) }    {\sum_{s_u} c(s_u|t_v;\mathbf{s},\mathbf{t})} \\
+a(i|j,m,l) &=&\frac{c(i|j;\mathbf{s},\mathbf{t})}  {\sum_{i}c(i|j;\mathbf{s},\mathbf{t})}
 \label{eq:append-2}
 \end{eqnarray}

 对于由$K$个样本组成的训练集$\{(\mathbf{s}^{[1]},\mathbf{t}^{[1]}),...,(\mathbf{s}^{[K]},\mathbf{t}^{[K]})\}$，可以将M-Step的计算调整为：

 \begin{eqnarray}
-f(s_u|t_v) &=\frac{\sum_{k=1}^{K}c_{\mathbb{E}}(s_u|t_v;\mathbf{s}^{[k]},\mathbf{t}^{[k]}) }    {\sum_{s_u} \sum_{k=1}^{K} c_{\mathbb{E}}(s_u|t_v;\mathbf{s}^{[k]},\mathbf{t}^{[k]})} \\
-a(i|j,m,l) &=\frac{\sum_{k=1}^{K}c_{\mathbb{E}}(i|j;\mathbf{s}^{[k]},\mathbf{t}^{[k]})}  {\sum_{i}\sum_{k=1}^{K}c_{\mathbb{E}}(i|j;\mathbf{s}^{[k]},\mathbf{t}^{[k]})}
+f(s_u|t_v) &=&\frac{\sum_{k=1}^{K}c_{\mathbb{E}}(s_u|t_v;\mathbf{s}^{[k]},\mathbf{t}^{[k]}) }    {\sum_{s_u} \sum_{k=1}^{K} c_{\mathbb{E}}(s_u|t_v;\mathbf{s}^{[k]},\mathbf{t}^{[k]})} \\
+a(i|j,m,l) &=&\frac{\sum_{k=1}^{K}c_{\mathbb{E}}(i|j;\mathbf{s}^{[k]},\mathbf{t}^{[k]})}  {\sum_{i}\sum_{k=1}^{K}c_{\mathbb{E}}(i|j;\mathbf{s}^{[k]},\mathbf{t}^{[k]})}
 \label{eq:append-3}
 \end{eqnarray}
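
Note on the M-Step over $K$ samples in the hunk above: the update for $f(s_u|t_v)$ sums the expected counts over all sentence pairs and then normalizes over source words for each target word. This is a minimal sketch under assumptions. The function name `m_step_translation` and the dictionary layout of the expected counts are hypothetical, and the E-Step that would produce those counts is not shown.

```python
from collections import defaultdict


def m_step_translation(expected_counts):
    """Re-estimate f(s_u | t_v) from per-sample expected counts.

    expected_counts: one dict per sentence pair, mapping (s_u, t_v)
    to the expected count c_E(s_u | t_v; s^[k], t^[k]).
    """
    # Numerator: sum the expected counts over all K samples.
    totals = defaultdict(float)
    for counts in expected_counts:
        for pair, c in counts.items():
            totals[pair] += c
    # Denominator: for each target word, sum over all source words.
    norm = defaultdict(float)
    for (_, t), c in totals.items():
        norm[t] += c
    return {(s, t): c / norm[t] for (s, t), c in totals.items() if norm[t] > 0}


# Two toy samples with made-up expected counts.
f = m_step_translation([
    {("a", "x"): 1.0, ("b", "x"): 1.0},
    {("a", "x"): 2.0},
])
```

The update for $a(i|j,m,l)$ has the same shape, with normalization over $i$ instead of over source words.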

@@ -294,13 +294,13 @@ p_x & = & \zeta^{-1} \sum_{k=1}^{K}c(x;\mathbf{s}^{[k]},\mathbf{t}^{[k]}) \label

 \parinterval 在模型3中，因为繁衍率的引入，并不能像模型1和模型2那样，在保证正确性的情况下加速参数估计的过程。这就使得每次迭代过程中，都不得不面对大小为$(l+1)^m$的词对齐空间。遍历所有$(l+1)^m$个词对齐所带来的高时间复杂度显然是不能被接受的。因此就要考虑能否仅利用词对齐空间中的部分词对齐对这些参数进行估计。比较简单的方法是仅使用Viterbi对齐来进行参数估计，这里Viterbi 词对齐可以被简单的看作搜索到的最好词对齐。遗憾的是，在模型3中并没有方法直接获得Viterbi对齐。这样只能采用一种折中的策略，即仅考虑那些使得$\funp{P}_{\theta}(\mathbf{s},\mathbf{a}|\mathbf{t})$ 达到较高值的词对齐。这里把这部分词对齐组成的集合记为$S$。式\ref{eq:1.2}可以被修改为：
 \begin{eqnarray}
-c(s|t;\mathbf{s},\mathbf{t}) \approx \sum_{\mathbf{a} \in S}\big[\funp{P}_{\theta}(\mathbf{s},\mathbf{a}|\mathbf{t}) \times \sum_{j=1}^{m}(\delta(s_j,s) \cdot \delta(t_{a_{j}},t)) \big]
+c(s|t;\mathbf{s},\mathbf{t}) &\approx & \sum_{\mathbf{a} \in S}\big[\funp{P}_{\theta}(\mathbf{s},\mathbf{a}|\mathbf{t}) \times \sum_{j=1}^{m}(\delta(s_j,s) \cdot \delta(t_{a_{j}},t)) \big]
 \label{eq:1.11}
 \end{eqnarray}
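
The restricted count can be sketched in a few lines of Python. This is an illustration under assumptions, not the book's implementation: the function name `restricted_count` is hypothetical, the probabilities $\funp{P}_{\theta}(\mathbf{s},\mathbf{a}|\mathbf{t})$ are assumed to be supplied by the caller, and the two $\delta$ terms are read as comparing $s_j$ and $t_{a_j}$ against the single words $s$ and $t$ being counted.

```python
def restricted_count(s_word, t_word, src, tgt, alignments):
    """Approximate c(s|t; src, tgt) using only the alignments in S.

    src: source words s_1..s_m as a list (index 0 holds s_1).
    tgt: target words, with tgt[0] standing for the empty word t_0.
    alignments: iterable of (a, prob) pairs, where a[j] is the target
        index aligned to src[j] and prob is P_theta(src, a | tgt).
    """
    total = 0.0
    for a, prob in alignments:
        # Inner sum over j: count positions where s_j == s and t_{a_j} == t.
        hits = sum(
            1
            for j, s_j in enumerate(src)
            if s_j == s_word and tgt[a[j]] == t_word
        )
        total += prob * hits
    return total


# Toy example: S holds two alignments with made-up probabilities.
src = ["x", "y"]
tgt = ["NULL", "u", "v"]
S = [([1, 2], 0.5), ([1, 1], 0.25)]
c_x_u = restricted_count("x", "u", src, tgt, S)
```

Because only the alignments in $S$ are visited, the cost is proportional to $|S| \cdot m$ rather than $(l+1)^m$.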

 \parinterval The modified versions of Equations \ref{eq:1.3}-\ref{eq:1.6} are obtained in the same way. Further, in IBM Model 3, $S$ can be defined as follows:
 \begin{eqnarray}
-S = N(b^{\infty}(V(\mathbf{s}|\mathbf{t};2))) \cup (\mathop{\cup}\limits_{ij} N(b_{i \leftrightarrow j}^{\infty}(V_{i \leftrightarrow j}(\mathbf{s}|\mathbf{t},2))))
+S &=& N(b^{\infty}(V(\mathbf{s}|\mathbf{t};2))) \cup (\mathop{\cup}\limits_{ij} N(b_{i \leftrightarrow j}^{\infty}(V_{i \leftrightarrow j}(\mathbf{s}|\mathbf{t},2))))
 \label{eq:1.12}
 \end{eqnarray}
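
The definitions of $N(\cdot)$ and $b^{\infty}(\cdot)$ are not part of this excerpt. The Python sketch below assumes the usual IBM-model reading: the neighborhood of an alignment contains every alignment that differs from it by one move (re-linking a single source position) or one swap (exchanging the links of two source positions), and $b^{\infty}$ is hill climbing that repeatedly steps to the best-scoring neighbor. The function names and the `score` callback are hypothetical, and the pegged variants $V_{i \leftrightarrow j}$ and $b_{i \leftrightarrow j}^{\infty}$ are not covered.

```python
def neighbors(a, l):
    """All alignments one move or one swap away from a.

    a: tuple of length m, with a[j] in 0..l (0 is the empty word).
    """
    result = set()
    m = len(a)
    # Moves: change the link of one source position j.
    for j in range(m):
        for i in range(l + 1):
            if i != a[j]:
                result.add(a[:j] + (i,) + a[j + 1:])
    # Swaps: exchange the links of two source positions.
    for j1 in range(m):
        for j2 in range(j1 + 1, m):
            if a[j1] != a[j2]:
                b = list(a)
                b[j1], b[j2] = b[j2], b[j1]
                result.add(tuple(b))
    return result


def hill_climb(a, l, score):
    """Step to the best neighbor until no neighbor scores higher."""
    while True:
        best = max(neighbors(a, l), key=score, default=a)
        if score(best) <= score(a):
            return a
        a = best


# Toy score that prefers every source word to link to target position 1.
toy_score = lambda a: -sum(abs(i - 1) for i in a)
peak = hill_climb((0, 2), 2, toy_score)
S = neighbors(peak, 2) | {peak}
```

In the setting above, `score` would be $\funp{P}_{\theta}(\mathbf{s},\mathbf{a}|\mathbf{t})$ under the current model and the starting alignment would come from Model 2.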

@@ -323,7 +323,7 @@ S = N(b^{\infty}(V(\mathbf{s}|\mathbf{t};2))) \cup (\mathop{\cup}\limits_{ij} N(

 \parinterval If $\bf{a}$ and $\bf{a}'$ differ in the alignments at two positions $j_1$ and $j_2$, with $a_{j_{1}}=a_{j_{2}^{'}}$ and $a_{j_{2}}=a_{j_{1}^{'}}$, then
 \begin{eqnarray}
-\funp{P}_{\theta}(\mathbf{a'},\mathbf{s}|\mathbf{t}) = \funp{P}_{\theta}(\mathbf{a},\mathbf{s}|\mathbf{t}) \cdot \frac{t(s_{j_{2}}|t_{a_{j_{2}}})}{t(s_{j_{1}}|t_{a_{j_{1}}})} \cdot \frac{d(j_{2}|a_{j_{2}},m,l)}{d(j_{1}|a_{j_{1}},m,l)}
+\funp{P}_{\theta}(\mathbf{a'},\mathbf{s}|\mathbf{t}) &=& \funp{P}_{\theta}(\mathbf{a},\mathbf{s}|\mathbf{t}) \cdot \frac{t(s_{j_{2}}|t_{a_{j_{2}}})}{t(s_{j_{1}}|t_{a_{j_{1}}})} \cdot \frac{d(j_{2}|a_{j_{2}},m,l)}{d(j_{1}|a_{j_{1}},m,l)}
 \label{eq:1.14}
 \end{eqnarray}

@@ -337,7 +337,7 @@ S = N(b^{\infty}(V(\mathbf{s}|\mathbf{t};2))) \cup (\mathop{\cup}\limits_{ij} N(

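 \parinterval Parameter estimation for Model 4 is largely the same as for Model 3. What changes is the estimation formula for distortion. For the first word generated by the $i$-th cept. of the target language, we obtain (assuming $K$ training samples):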
 \begin{eqnarray}
-d_1(\Delta_j|ca,cb) = \mu_{1cacb}^{-1} \times \sum_{k=1}^{K}c_1(\Delta_j|ca,cb;\mathbf{s}^{[k]},\mathbf{t}^{[k]})
+d_1(\Delta_j|ca,cb) &=& \mu_{1cacb}^{-1} \times \sum_{k=1}^{K}c_1(\Delta_j|ca,cb;\mathbf{s}^{[k]},\mathbf{t}^{[k]})
 \label{eq:1.15}
 \end{eqnarray}

@@ -352,7 +352,7 @@ s_1(\Delta_j|ca,cb;\rm{a},\mathbf{s},\mathbf{t}) & = & \sum_{i=1}^l \big[\vareps
 and

 \begin{eqnarray}
-\varepsilon(x) = \begin{cases}
+\varepsilon(x) &=& \begin{cases}
 0 & x \leq 0 \\
 1 & x > 0
 \end{cases}
@@ -362,7 +362,7 @@ s_1(\Delta_j|ca,cb;\rm{a},\mathbf{s},\mathbf{t}) & = & \sum_{i=1}^l \big[\vareps
 For the other words (those after the first) generated by the $i$-th cept. of the target language, we obtain:

 \begin{eqnarray}
-d_{>1}(\Delta_j|cb) = \mu_{>1cb}^{-1} \times \sum_{k=1}^{K}c_{>1}(\Delta_j|cb;\mathbf{s}^{[k]},\mathbf{t}^{[k]})
+d_{>1}(\Delta_j|cb) &=& \mu_{>1cb}^{-1} \times \sum_{k=1}^{K}c_{>1}(\Delta_j|cb;\mathbf{s}^{[k]},\mathbf{t}^{[k]})
 \label{eq:1.18}
 \end{eqnarray}

@@ -377,7 +377,7 @@ s_{>1}(\Delta_j|cb;\mathbf{a},\mathbf{s},\mathbf{t}) & = & \sum_{i=1}^l \big[\va
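 \noindent Here, $ca$ and $cb$ denote a word class of the target language and of the source language, respectively. Like Model 3, Model 4 defines a set of word alignments $S$ so that every iteration runs over $S$, which reduces the amount of computation. In Model 4, $S$ is defined as: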

 \begin{eqnarray}
-\textrm{S} = N(\tilde{b}^{\infty}(V(\mathbf{s}|\mathbf{t};2))) \cup (\mathop{\cup}\limits_{ij} N(\tilde{b}_{i \leftrightarrow j}^{\infty}(V_{i \leftrightarrow j}(\mathbf{s}|\mathbf{t},2))))
+\textrm{S} &=& N(\tilde{b}^{\infty}(V(\mathbf{s}|\mathbf{t};2))) \cup (\mathop{\cup}\limits_{ij} N(\tilde{b}_{i \leftrightarrow j}^{\infty}(V_{i \leftrightarrow j}(\mathbf{s}|\mathbf{t},2))))
 \label{eq:1.22}
 \end{eqnarray}

@@ -392,7 +392,7 @@ s_{>1}(\Delta_j|cb;\mathbf{a},\mathbf{s},\mathbf{t}) & = & \sum_{i=1}^l \big[\va
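 \parinterval Parameter estimation for Model 5 is largely the same as for Model 4; the two differ in the estimation formula for distortion. In Model 5, for the first word generated by the $i$-th cept. of the target language, we obtain (assuming $K$ training samples):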

 \begin{eqnarray}
-d_1(\Delta_j|cb) = \mu_{1cb}^{-1} \times \sum_{k=1}^{K}c_1(\Delta_j|cb;\mathbf{s}^{[k]},\mathbf{t}^{[k]})
+d_1(\Delta_j|cb) &=& \mu_{1cb}^{-1} \times \sum_{k=1}^{K}c_1(\Delta_j|cb;\mathbf{s}^{[k]},\mathbf{t}^{[k]})
 \label{eq:1.23}
 \end{eqnarray}

@@ -408,7 +408,7 @@ s_1(\Delta_j|cb,v_x,v_y;\mathbf{a},\mathbf{s},\mathbf{t}) & = & \sum_{i=1}^l \Bi
 For the other words (those after the first) generated by the $i$-th cept. of the target language, we obtain:

 \begin{eqnarray}
-d_{>1}(\Delta_j|cb,v) = \mu_{>1cb}^{-1} \times \sum_{k=1}^{K}c_{>1}(\Delta_j|cb,v;\mathbf{s}^{[k]},\mathbf{t}^{[k]})
+d_{>1}(\Delta_j|cb,v) &=& \mu_{>1cb}^{-1} \times \sum_{k=1}^{K}c_{>1}(\Delta_j|cb,v;\mathbf{s}^{[k]},\mathbf{t}^{[k]})
 \label{eq:1.26}
 \end{eqnarray}

@@ -431,7 +431,7 @@ s_{>1}(\Delta_j|cb,v;\mathbf{a},\mathbf{s},\mathbf{t}) & = & \sum_{i=1}^l\Big[\v

 \parinterval Model 5 likewise requires a set of word alignments $S$ so that every iteration runs over $S$. $S$ can be defined as follows:
 \begin{eqnarray}
-\textrm{S} = N(\tilde{\tilde{b}}^{\infty}(V(\mathbf{s}|\mathbf{t};2))) \cup (\mathop{\cup}\limits_{ij} N(\tilde{\tilde{b}}_{i \leftrightarrow j}^{\infty}(V_{i \leftrightarrow j}(\mathbf{s}|\mathbf{t},2))))
+\textrm{S} &=& N(\tilde{\tilde{b}}^{\infty}(V(\mathbf{s}|\mathbf{t};2))) \cup (\mathop{\cup}\limits_{ij} N(\tilde{\tilde{b}}_{i \leftrightarrow j}^{\infty}(V_{i \leftrightarrow j}(\mathbf{s}|\mathbf{t},2))))
 \label{eq:1.29}
 \end{eqnarray}
 \vspace{0.5em}

--- a/ChapterPostscript/postscript.tex
+++ b/ChapterPostscript/postscript.tex
@@ -36,7 +36,7 @@

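 \parinterval Ever since the computer was invented, machine translation, that is, using computer software to translate automatically between languages, has been one of the first major applications of computers that people thought of. Many say that we now live in the age of artificial intelligence, in which whoever masters language prevails, so machine translation is also one of the ultimate dreams of cognitive intelligence. This book has discussed the models, methods, and implementation techniques of machine translation. Here we share some of our thoughts on its applications and future; some of these ideas may not be correct, and it may take ten years before they can be verified.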

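-\parinterval Put simply, applications of machine translation technology can satisfy at least three user needs. The first is assisting the reading of foreign-language materials and helping people with different native languages communicate without barriers. The second is computer-aided translation, which helps human translators cut costs and work more efficiently. The third is, in the application area of big-data analysis and processing, processing multilingual text materials (which may also be image and speech materials); translating massive volumes of data is beyond what human translation can accomplish, and machine translation is the only effective solution for big-data translation. These three needs show that machine translation and human translation do not fundamentally conflict: they run on two parallel tracks and can coexist and help each other. There are at least two application scenarios that machine translation cannot handle. The first is where high-quality translation is required, such as the published translation of poetry and novels. The second is, for example, speeches by important leaders, where low-level translation errors can hardly be tolerated because they may lead to serious consequences or even international disputes. Strictly speaking, application scenarios that demand highly accurate translations cannot simply rely on machine translation and must be completed with the involvement of highly skilled human translators.
+\parinterval Put simply, applications of machine translation technology can satisfy at least three user needs. The first is assisting the reading of foreign-language materials and helping people with different native languages communicate without barriers. The second is computer-aided translation, which helps human translators cut costs and work more efficiently. The third is processing multilingual text materials (which may also be image materials or speech materials) through big-data analysis and processing; translating massive volumes of data is beyond what human translation can accomplish, and machine translation is the only effective solution for big-data translation. These three needs show that machine translation and human translation do not fundamentally conflict: they run on two parallel tracks and can coexist and help each other. There are at least two application scenarios that machine translation cannot handle. The first is where high-quality translation is required, such as the published translation of poetry and novels. The second is, for example, speeches by important leaders, where low-level translation errors can hardly be tolerated because they may lead to serious consequences or even international disputes. Strictly speaking, application scenarios that demand highly accurate translations cannot simply rely on machine translation and must be completed with the involvement of highly skilled human translators.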

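 \parinterval How can a good machine translation system be built? Suppose we need to provide users with a machine translation system of decent translation quality. At least three aspects must be considered: a sufficiently large collection of bilingual sentence pairs for training, strong machine translation technology, and an error-driven refinement process. From the perspective of technology application and industrialization, simply proposing a new machine translation technique is only a necessary condition for building a good system, not a necessary and sufficient one; all three are indispensable.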


--- a/bibliography.bib
+++ b/bibliography.bib
@@ -6618,6 +6618,108 @@ author    = {Yoshua Bengio and
  publisher = {{IEEE} International Conference on Computer Vision},
  year      = {2017}
 }
+@inproceedings{DBLP:journals/corr/SuGMRUVWY16a,
+  author    = {Pei{-}Hao Su and
+               Milica Gasic and
+               Nikola Mrksic and
+               Lina Maria Rojas{-}Barahona and
+               Stefan Ultes and
+               David Vandyke and
+               Tsung{-}Hsien Wen and
+               Steve J. Young},
+  title     = {Continuously Learning Neural Dialogue Management},
+  publisher   = {CoRR},
+  volume    = {abs/1606.02689},
+  year      = {2016}
+}
+@inproceedings{DBLP:journals/corr/abs-1709-02349,
+  author    = {Iulian Vlad Serban and
+               Chinnadhurai Sankar and
+               Mathieu Germain and
+               Saizheng Zhang and
+               Zhouhan Lin and
+               Sandeep Subramanian and
+               Taesup Kim and
+               Michael Pieper and
+               Sarath Chandar and
+               Nan Rosemary Ke and
+               Sai Mudumba and
+               Alexandre de Br{\'{e}}bisson and
+               Jose Sotelo and
+               Dendi Suhubdy and
+               Vincent Michalski and
+               Alexandre Nguyen and
+               Joelle Pineau and
+               Yoshua Bengio},
+  title     = {A Deep Reinforcement Learning Chatbot},
+  publisher   = {CoRR},
+  volume    = {abs/1709.02349},
+  year      = {2017}
+}
+@inproceedings{DBLP:conf/emnlp/WuTQLL18,
+  author    = {Lijun Wu and
+               Fei Tian and
+               Tao Qin and
+               Jianhuang Lai and
+               Tie{-}Yan Liu},
+  title     = {A Study of Reinforcement Learning for Neural Machine Translation},
+  pages     = {3612--3621},
+  publisher = {Conference on Empirical Methods in Natural Language Processing},
+  year      = {2018}
+}
+@inproceedings{DBLP:journals/jmlr/RossGB11,
+  author    = {St{\'{e}}phane Ross and
+               Geoffrey J. Gordon and
+               Drew Bagnell},
+  title     = {A Reduction of Imitation Learning and Structured Prediction to No-Regret
+               Online Learning},
+  publisher = {International Conference on Artificial Intelligence and Statistics},
+  series    = {{JMLR} Proceedings},
+  volume    = {15},
+  pages     = {627--635},
+  year      = {2011}
+}
+@inproceedings{DBLP:conf/aaai/VenkatramanHB15,
+  author    = {Arun Venkatraman and
+               Martial Hebert and
+               J. Andrew Bagnell},
+  title     = {Improving Multi-Step Prediction of Learned Time Series Models},
+  publisher = {AAAI Conference on Artificial Intelligence},
+  pages     = {3024--3030},
+  year      = {2015}
+}
+@inproceedings{DBLP:conf/iclr/LiuCLS17,
+  author    = {Yanpei Liu and
+               Xinyun Chen and
+               Chang Liu and
+               Dawn Song},
+  title     = {Delving into Transferable Adversarial Examples and Black-box Attacks},
+  publisher = {International Conference on Learning Representations},
+  year      = {2017}
+}
+@inproceedings{DBLP:journals/tnn/YuanHZL19,
+  author    = {Xiaoyong Yuan and
+               Pan He and
+               Qile Zhu and
+               Xiaolin Li},
+  title     = {Adversarial Examples: Attacks and Defenses for Deep Learning},
+  publisher   = {IEEE Transactions on Neural Networks and Learning Systems},
+  volume    = {30},
+  number    = {9},
+  pages     = {2805--2824},
+  year      = {2019}
+}
+@inproceedings{DBLP:conf/infocom/YuanHL020,
+  author    = {Xiaoyong Yuan and
+               Pan He and
+               Xiaolin Li and
+               Dapeng Wu},
+  title     = {Adaptive Adversarial Attack on Scene Text Recognition},
+  pages     = {358--363},
+  publisher = {IEEE Conference on Computer Communications},
+  year      = {2020}
+}
 %%%%% chapter 13------------------------------------------------------
 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%