Commit cccbe612 by 孟霞

Merge branch 'caorunzhe' into 'mengxia'

Caorunzhe

See merge request !588
parents 30f904ef dba1ccef
......@@ -108,13 +108,13 @@
\parinterval Both of these inference strategies are used in neural machine translation. For a source sentence $\seq{x}=\{x_1,x_2,\dots,x_m\}$ and a target sentence $\seq{y}=\{y_1,y_2,\dots,y_n\}$, the left-to-right strategy expresses the translation probability $\funp{P}(\seq{y}\vert\seq{x})$ as Equation~\eqref{eq:14-1}:
\begin{eqnarray}
\funp{P}(\seq{y}\vert\seq{x}) &=& \prod_{j=1}^n \funp{P}(y_j\vert\seq{y}_{<j},\seq{x})
\label{eq:14-1}
\end{eqnarray}
\parinterval The right-to-left strategy instead yields Equation~\eqref{eq:14-2}:
\begin{eqnarray}
\funp{P}(\seq{y}\vert\seq{x}) &=& \prod_{j=1}^n \funp{P}(y_{n+1-j}\vert\seq{y}_{>j},\seq{x})
\label{eq:14-2}
\end{eqnarray}
\parinterval where $\seq{y}_{<j}=\{y_1,y_2,\dots,y_{j-1}\}$ and $\seq{y}_{>j}=\{y_{j+1},y_{j+2},\dots,y_n\}$.
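\parinterval As a minimal illustration (a sketch, not code from this book), the two factorizations can be scored as follows; \texttt{cond\_prob} is a hypothetical stand-in for a trained model's conditional probability:
\begin{verbatim}
import math

# Hypothetical stand-in for a trained NMT model's P(token | context, src).
def cond_prob(token, context, src):
    return 0.5  # placeholder value, for illustration only

def log_prob_l2r(y, x):
    # Eq. (14-1): sum of log P(y_j | y_<j, x) for j = 1..n
    return sum(math.log(cond_prob(y[j], y[:j], x)) for j in range(len(y)))

def log_prob_r2l(y, x):
    # Eq. (14-2): step j generates y_{n+1-j}, conditioned on the
    # already-generated suffix to its right
    n = len(y)
    return sum(math.log(cond_prob(y[n - j], y[n - j + 1:], x))
               for j in range(1, n + 1))
\end{verbatim}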
......@@ -148,7 +148,7 @@
\item Length penalty. Normalizing the translation probability by the translation's length is the most common approach: for a source sentence $\seq{x}$ and a translation $\seq{y}$, the model score $\textrm{score}(\seq{x},\seq{y})$ shrinks as $\seq{y}$ grows longer. To avoid this, a length penalty function $\textrm{lp}(\seq{y})$ can be introduced and the model score defined as in Equation~\eqref{eq:14-12} (a small sketch follows the equation):
\begin{eqnarray}
\textrm{score}(\seq{x},\seq{y}) &=& \frac{\log \funp{P}(\seq{y}\vert\seq{x})}{\textrm{lp}(\seq{y})}
\label{eq:14-12}
\end{eqnarray}
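\parinterval A sketch of the length-normalized score, assuming the GNMT-style penalty $\textrm{lp}(\seq{y})=((5+|\seq{y}|)/6)^{\alpha}$ (one common choice; the function names here are illustrative):
\begin{verbatim}
def length_penalty(length, alpha=0.6):
    # GNMT-style lp(y); alpha is tuned on a development set
    return ((5.0 + length) / 6.0) ** alpha

def normalized_score(log_prob, hyp_len, alpha=0.6):
    # Eq. (14-12): divide the hypothesis log-probability by lp(y)
    return log_prob / length_penalty(hyp_len, alpha)
\end{verbatim}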
......@@ -188,7 +188,7 @@ b &=& \omega_{\textrm{high}}\cdot |\seq{x}| \label{eq:14-4}
\noindent where $\textrm{cp}(\seq{x},\seq{y})$ is the coverage model, which measures how well the translation covers each source-language word. In its definition, $a_{ij}$ is the attention weight between source position $i$ and target position $j$, so $\sum\limits_{j=1}^{|\seq{y}|} a_{ij}$ measures ``how much'' the $i$-th source word has been translated: a value above 1 indicates over-translation, a value below 1 under-translation. Equation~\eqref{eq:14-6} penalizes hypotheses that under-translate. An improved form of the coverage model is \upcite{li-etal-2018-simple}:
\begin{eqnarray}
\textrm{cp}(\seq{x},\seq{y}) &=& \sum_{i=1}^{|\seq{x}|} \log( \textrm{max} ( \sum_{j=1}^{|\seq{y}|} a_{ij},\beta))
\label{eq:14-7}
\end{eqnarray}
\noindent Equation~\eqref{eq:14-7} replaces the downward truncation of Equation~\eqref{eq:14-6} with upward truncation, which lets the model better handle over-translation (repeated translation). The cost is that $\beta$ must be carefully tuned on a development set, which adds extra work. Alternatively, coverage can be modeled and parameterized separately and trained jointly with the translation model \upcite{Mi2016CoverageEM,TuModeling,Kazimi2017CoverageFC}, yielding a finer-grained coverage model.
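\parinterval A sketch of Equation~\eqref{eq:14-7} given an attention matrix (illustrative code; \texttt{beta} would be tuned on a development set as noted above):
\begin{verbatim}
import numpy as np

def coverage_penalty(attn, beta=0.5):
    # attn[i][j] holds a_ij, the attention weight between source
    # position i and target position j.
    coverage = attn.sum(axis=1)      # sum_j a_ij for each source word i
    # Eq. (14-7): upward truncation, flooring accumulated attention at beta
    return float(np.log(np.maximum(coverage, beta)).sum())
\end{verbatim}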
......@@ -416,7 +416,7 @@ b &=& \omega_{\textrm{high}}\cdot |\seq{x}| \label{eq:14-4}
\parinterval Mainstream neural machine translation inference is currently an {\small\sffamily\bfseries{autoregressive translation}}\index{自回归翻译}(Autoregressive Translation)\index{Autoregressive Translation} process. Autoregression describes how a time series is generated: for a target sequence $\seq{y}=\{y_1,\dots,y_n\}$, an autoregressive model assumes that the state $y_j$ at time $j$ depends on the previous states $\{y_1,\dots,y_{j-1}\}$, and that $y_j$ is a linear function of $\{y_1,\dots,y_{j-1}\}$. Neural machine translation borrows this concept but drops the linearity requirement. For a source sequence $\seq{x}=\{x_1,\dots,x_m\}$, the probability of generating the translation $\seq{y}=\{y_1,\dots,y_n\}$ with an autoregressive model is defined as:
\begin{eqnarray}
\funp{P}(\seq{y}|\seq{x}) &=& \prod_{j=1}^n {\funp{P}(y_j|y_{<j},\seq{x})}
\label{eq:14-8}
\end{eqnarray}
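\parinterval The sequential nature of autoregressive inference is visible in a greedy decoding loop (a sketch; \texttt{model.step} is a hypothetical interface returning the distribution $\funp{P}(y_j|\seq{y}_{<j},\seq{x})$):
\begin{verbatim}
def greedy_decode(model, x, max_len=100, eos_id=2):
    # Token j can only be chosen after tokens 1..j-1 exist,
    # so the n positions must be generated one by one.
    y = []
    for _ in range(max_len):
        probs = model.step(y, x)   # hypothetical API
        token = max(range(len(probs)), key=probs.__getitem__)
        y.append(token)
        if token == eos_id:
            break
    return y
\end{verbatim}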
......@@ -425,7 +425,7 @@ b &=& \omega_{\textrm{high}}\cdot |\seq{x}| \label{eq:14-4}
\parinterval To address this problem, researchers have also considered removing the autoregressive property of translation, i.e., {\small\sffamily\bfseries{non-autoregressive translation}}\index{非自回归翻译}(Non-Autoregressive Translation, NAT)\index{Non-Autoregressive Translation}\upcite{Gu2017NonAutoregressiveNM}. A simple non-autoregressive translation model formulates the problem as:
\begin{eqnarray}
\funp{P}(\seq{y}|\seq{x}) &=& \prod_{j=1}^n {\funp{P}(y_j|\seq{x})}
\label{eq:14-9}
\end{eqnarray}
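\parinterval By contrast, under Equation~\eqref{eq:14-9} every position depends only on $\seq{x}$, so all positions can be predicted in parallel (a sketch; \texttt{model.predict\_all} and the fixed target length are illustrative assumptions):
\begin{verbatim}
import numpy as np

def nat_decode(model, x, tgt_len):
    # One parallel step: a tgt_len x |V| matrix of independent
    # distributions P(y_j | x), then a position-wise argmax.
    probs = model.predict_all(x, tgt_len)   # hypothetical API
    return np.argmax(probs, axis=-1).tolist()
\end{verbatim}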
......@@ -485,7 +485,7 @@ b &=& \omega_{\textrm{high}}\cdot |\seq{x}| \label{eq:14-4}
\parinterval In addition, each decoder layer contains an extra positional attention module, which uses the same multi-head attention mechanism as the rest of the Transformer model:
\begin{eqnarray}
\textrm{Attention}(\mathbi{Q},\mathbi{K},\mathbi{V}) &=& \textrm{Softmax}(\frac{\mathbi{Q}{\mathbi{K}}^{T}}{\sqrt{d_k}})\cdot \mathbi{V}
\label{eq:14-10}
\end{eqnarray}
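\parinterval Equation~\eqref{eq:14-10} maps directly to a few lines of array code (a sketch for single-head, unbatched inputs):
\begin{verbatim}
import numpy as np

def attention(Q, K, V):
    # Softmax(Q K^T / sqrt(d_k)) V, with the softmax taken row-wise
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)
    scores -= scores.max(axis=-1, keepdims=True)   # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V
\end{verbatim}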
......@@ -651,7 +651,7 @@ b &=& \omega_{\textrm{high}}\cdot |\seq{x}| \label{eq:14-4}
\parinterval A neural machine translation model predicts a distribution over target words at every target position $j$, i.e., it computes $\funp{P}(y_j | \seq{y}_{<j},\seq{x})$ for each word $y_j$ in the target vocabulary. Given $K$ neural machine translation systems, each system $k$ can compute this probability independently, denoted $\funp{P}_{k} (y_j | \seq{y}_{<j},\seq{x})$. The $K$ predictions can then be fused with Equation~\eqref{eq:14-11}:
\begin{eqnarray}
\funp{P}(y_{j} | \seq{y}_{<j},\seq{x}) &=& \sum_{k=1}^K \gamma_{k} \cdot \funp{P}_{k} (y_j | \seq{y}_{<j},\seq{x})
\label{eq:14-11}
\end{eqnarray}
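\parinterval A sketch of this linear interpolation for one target position (illustrative; \texttt{dists} stacks the $K$ member distributions):
\begin{verbatim}
import numpy as np

def ensemble_predict(dists, gammas=None):
    # Eq. (14-11): sum_k gamma_k * P_k(y_j | y_<j, x).
    # dists: K x |V| array; gammas: mixture weights summing to 1
    # (uniform 1/K if omitted).
    dists = np.asarray(dists)
    if gammas is None:
        gammas = np.full(dists.shape[0], 1.0 / dists.shape[0])
    return np.asarray(gammas) @ dists
\end{verbatim}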
......
%%% outline
%-------------------------------------------------------------------------
\begin{tikzpicture}
\tikzstyle{opt}=[draw,minimum height=2em,minimum width=4em,rounded corners=2pt,thick]
\node[circle,minimum size=2em,draw,fill=red!20] (x1) at (0,0) {\small\bfnew{X}};
\node[circle,minimum size=2em,draw,fill=red!20] (x2) at (0,4em) {\small\bfnew{X}};
\node[circle,minimum size=2em,draw,fill=red!20] (x3) at (0,-5em) {\small\bfnew{X}};
\node[anchor=west,opt,fill=yellow!20] (unary1) at ([xshift=3em]x1.east){\small\bfnew{Unary}};
\node[anchor=west,opt,fill=yellow!20] (unary2) at ([xshift=3em]x2.east){\small\bfnew{Unary}};
\node[opt,fill=blue!20] (binary1) at (12em,2em){\small\bfnew{Binary}};
\node[opt,fill=blue!20] (binary2) at (25em,-1.5em){\small\bfnew{Binary}};
\node[anchor=west,opt,fill=yellow!20] (unary3) at ([xshift=3em]binary1.east){\small\bfnew{Unary}};
\node[anchor=west,opt,fill=yellow!20] (unary4) at ([xshift=16em]x3.east){\small\bfnew{Unary}};
\draw[-latex,very thick] (x1.0) -- (unary1.180);
\draw[-latex,very thick] (x2.0) -- (unary2.180);
\draw[-latex,very thick] (x3.0) -- (unary4.180);
\draw[-latex,very thick] (unary1.0) -- ([yshift=-0.2em]binary1.180);
\draw[-latex,very thick] (unary2.0) -- ([yshift=0.2em]binary1.180);
\draw[-latex,very thick] (binary1.0) -- (unary3.180);
\draw[-latex,very thick] (unary3.0) -- ([yshift=0.2em]binary2.180);
\draw[-latex,very thick] (unary4.0) -- ([yshift=-0.2em]binary2.180);
\begin{pgfonlayer}{background}
\node[draw=lightgray,fill=lightgray!50,rounded corners=2pt,inner sep=8pt][fit=(x2)(x1)(binary1)]{};
\end{pgfonlayer}
\node[anchor=south] at ([yshift=1em]binary1.north){\small\bfnew{Core Unit}};
\end{tikzpicture}
\begin{tikzpicture}
\begin{scope}
\tikzstyle{encnode}=[rectangle,inner sep=0mm,minimum height=2em,minimum width=4.5em,rounded corners=5pt,thick]
\tikzstyle{decnode}=[rectangle,inner sep=0mm,minimum height=2em,minimum width=4.5em,rounded corners=5pt,thick]
\node [anchor=north,encnode] (n1) at (0, 0) {编码器};
\node [anchor=north,rectangle,minimum height=1.5em,minimum width=2.5em,rounded corners=5pt] (n2) at ([xshift=0em,yshift=-0.2em]n1.south) {$\mathbi{X}$};
\node [anchor=west,encnode,draw=red!60!black!80,fill=red!20] (n3) at ([xshift=1.5em,yshift=0em]n2.east) {$\mathbi{h}_0$};
\node [anchor=west,encnode,draw=red!60!black!80,fill=red!20] (n4) at ([xshift=1.5em,yshift=0em]n3.east) {$\mathbi{h}_1$};
\node [anchor=west,encnode,draw=red!60!black!80,fill=red!20] (n5) at ([xshift=1.5em,yshift=0em]n4.east) {$\mathbi{h}_2$};
\node [anchor=west,rectangle,minimum height=1.5em,minimum width=2.5em,rounded corners=5pt] (n6) at ([xshift=1em,yshift=0em]n5.east) {$\ldots$};
\node [anchor=west,encnode,draw=red!60!black!80,fill=red!20] (n7) at ([xshift=1em,yshift=0em]n6.east) {$\mathbi{h}_{N-1}$};
\node [anchor=north,rectangle,draw=teal!80, inner sep=0mm,minimum height=2em,minimum width=8em,fill=teal!17,rounded corners=5pt,thick] (n8) at ([xshift=3em,yshift=-1.2em]n4.south) {权重聚合$\mathbi{g}$};
\node [anchor=west,decnode] (n9) at ([xshift=0em,yshift=-7.2em]n1.west) {解码器};
\node [anchor=north,rectangle,minimum height=1.5em,minimum width=2.5em,rounded corners=5pt] (n10) at ([xshift=0em,yshift=-0.2em]n9.south) {$\mathbi{y}_{<j}$};
\node [anchor=west,decnode,draw=ublue,fill=blue!10] (n11) at ([xshift=1.5em,yshift=0em]n10.east) {$\mathbi{s}_j^0$};
\node [anchor=west,decnode,draw=ublue,fill=blue!10] (n12) at ([xshift=1.5em,yshift=0em]n11.east) {$\mathbi{s}_j^1$};
\node [anchor=west,decnode,draw=ublue,fill=blue!10] (n13) at ([xshift=1.5em,yshift=0em]n12.east) {$\mathbi{s}_j^2$};
\node [anchor=west,rectangle,minimum height=1.5em,minimum width=2.5em,rounded corners=5pt] (n14) at ([xshift=1em,yshift=0em]n13.east) {$\ldots$};
\node [anchor=west,decnode,draw=ublue,fill=blue!10] (n15) at ([xshift=1em,yshift=0em]n14.east) {$\mathbi{s}_j^{M-1}$};
\node [anchor=west,rectangle,minimum height=1.5em,minimum width=2.5em,rounded corners=5pt] (n16) at ([xshift=1.5em,yshift=0em]n15.east) {$\mathbi{y}_{j}$};
\node [anchor=south,minimum height=1.5em,minimum width=2.5em] (n17) at ([xshift=0em,yshift=6em]n16.north) {};
\node [anchor=north,minimum height=0.5em,minimum width=4em] (n18) at ([xshift=0em,yshift=-0.7em]n4.south) {};
\node [anchor=north,minimum height=0.5em,minimum width=4em] (n19) at ([xshift=0em,yshift=-0.7em]n13.south) {};
\node [anchor=west,minimum height=0.5em,minimum width=4em] (n20) at ([xshift=0em,yshift=0.7em]n8.east) {};
\node [anchor=west,minimum height=0.5em,minimum width=4em] (n21) at ([xshift=0em,yshift=-0.7em]n8.east) {};
\begin{pgfonlayer}{background}
{
\node[rectangle,inner sep=2pt,fill=blue!7] [fit = (n1) (n7) (n17) (n18) (n20)] (bg1) {};
\node[rectangle,inner sep=2pt,fill=red!7] [fit = (n9) (n16) (n19) (n21)] (bg2) {};
}
\end{pgfonlayer}
\draw [->,thick] ([xshift=0em,yshift=0em]n2.east) -- ([xshift=0em,yshift=0em]n3.west);
\draw [->,thick] ([xshift=0em,yshift=0em]n3.east) -- ([xshift=0em,yshift=0em]n4.west);
\draw [->,thick] ([xshift=0em,yshift=0em]n4.east) -- ([xshift=0em,yshift=0em]n5.west);
\draw [->,thick] ([xshift=0em,yshift=0em]n5.east) -- ([xshift=0em,yshift=0em]n6.west);
\draw [->,thick] ([xshift=0em,yshift=0em]n6.east) -- ([xshift=0em,yshift=0em]n7.west);
\draw [->,thick] ([xshift=0em,yshift=0em]n2.south) -- ([xshift=0em,yshift=0em]n8.north);
\draw [->,thick] ([xshift=0em,yshift=0em]n3.south) -- ([xshift=0em,yshift=0em]n8.north);
\draw [->,thick] ([xshift=0em,yshift=0em]n4.south) -- ([xshift=0em,yshift=0em]n8.north);
\draw [->,thick] ([xshift=0em,yshift=0em]n5.south) -- ([xshift=0em,yshift=0em]n8.north);
\draw [->,thick] ([xshift=0em,yshift=0em]n7.south) -- ([xshift=0em,yshift=0em]n8.north);
\draw [->,thick] ([xshift=0em,yshift=0em]n10.east) -- ([xshift=0em,yshift=0em]n11.west);
\draw [->,thick] ([xshift=0em,yshift=0em]n11.east) -- ([xshift=0em,yshift=0em]n12.west);
\draw [->,thick] ([xshift=0em,yshift=0em]n12.east) -- ([xshift=0em,yshift=0em]n13.west);
\draw [->,thick] ([xshift=0em,yshift=0em]n13.east) -- ([xshift=0em,yshift=0em]n14.west);
\draw [->,thick] ([xshift=0em,yshift=0em]n14.east) -- ([xshift=0em,yshift=0em]n15.west);
\draw [->,thick] ([xshift=0em,yshift=0em]n15.east) -- ([xshift=0em,yshift=0em]n16.west);
\draw [->,thick] ([xshift=0em,yshift=0em]n8.south) -- ([xshift=0em,yshift=0em]n11.north);
\draw [->,thick] ([xshift=0em,yshift=0em]n8.south) -- ([xshift=0em,yshift=0em]n12.north);
\draw [->,thick] ([xshift=0em,yshift=0em]n8.south) -- ([xshift=0em,yshift=0em]n13.north);
\draw [->,thick] ([xshift=0em,yshift=0em]n8.south) -- ([xshift=0em,yshift=0em]n15.north);
\end{scope}
\end{tikzpicture}
\ No newline at end of file
%%% outline
%-------------------------------------------------------------------------
\begin{tikzpicture}
\tikzstyle{opt}=[minimum height=1em,minimum width=5em,rounded corners=2pt,thick]
\node[opt] (opt1_0) at (0,0) {固定操作};
\node[draw,anchor=south,opt,fill=green!20] (opt1_1) at ([yshift=0.8em]opt1_0.north) {操作3};
\node[draw,anchor=south,opt,fill=cyan!20] (opt1_2) at ([yshift=0.8em]opt1_1.north) {操作2};
\node[anchor=south,opt] (opt1_3) at ([yshift=0.8em]opt1_2.north) {固定操作};
\node[draw,anchor=south,opt,fill=yellow!20] (opt1_4) at ([yshift=0.8em]opt1_3.north) {操作4};
\node[anchor=south,opt] (opt1_5) at ([yshift=0.8em]opt1_4.north) {$\cdots$};
\node[draw,anchor=south,opt,fill=blue!10] (opt1_6) at ([yshift=0.8em]opt1_5.north) {操作1};
\node[anchor=south,opt] (opt1_7) at ([yshift=0.8em]opt1_6.north) {};
\node[opt] (opt2_0) at (3,0) {固定操作};
\node[draw,anchor=south,opt,fill=blue!10] (opt2_1) at ([yshift=0.8em]opt2_0.north) {操作1};
\node[draw,anchor=south,opt,fill=green!20] (opt2_2) at ([yshift=0.8em]opt2_1.north) {操作3};
\node[anchor=south,opt] (opt2_3) at ([yshift=0.8em]opt2_2.north) {固定操作};
\node[draw,anchor=south,opt,fill=red!20] (opt2_4) at ([yshift=0.8em]opt2_3.north) {操作5};
\node[anchor=south,opt] (opt2_5) at ([yshift=0.8em]opt2_4.north) {$\cdots$};
\node[draw,anchor=south,opt,fill=cyan!20] (opt2_6) at ([yshift=0.8em]opt2_5.north) {操作2};
\node[anchor=south,opt] (opt2_7) at ([yshift=0.8em]opt2_6.north) {};
\node[opt] (opt3_0) at (6,0) {固定操作};
\node[draw,anchor=south,opt,fill=yellow!20] (opt3_1) at ([yshift=0.8em]opt3_0.north) {操作4};
\node[draw,anchor=south,opt,fill=cyan!20] (opt3_2) at ([yshift=0.8em]opt3_1.north) {操作2};
\node[anchor=south,opt] (opt3_3) at ([yshift=0.8em]opt3_2.north) {固定操作};
\node[draw,anchor=south,opt,fill=yellow!20] (opt3_4) at ([yshift=0.8em]opt3_3.north) {操作4};
\node[anchor=south,opt] (opt3_5) at ([yshift=0.8em]opt3_4.north) {$\cdots$};
\node[draw,anchor=south,opt,fill=red!20] (opt3_6) at ([yshift=0.8em]opt3_5.north) {操作5};
\node[anchor=south,opt] (opt3_7) at ([yshift=0.8em]opt3_6.north) {};
\begin{pgfonlayer}{background}
\node[draw,fill=yellow!20,rounded corners=6pt,inner ysep=2.6em,inner xsep=2.6em] [fit=(opt1_0) (opt3_7)](box4){};
\node[draw,fill=gray!10,rounded corners=2pt,inner sep=8pt] [fit=(opt1_0) (opt1_7)](box1){};
\node[draw,fill=gray!10,rounded corners=2pt,inner sep=8pt] [fit=(opt2_0) (opt2_7)](box2){};
\node[draw,fill=gray!10,rounded corners=2pt,inner sep=8pt] [fit=(opt3_0) (opt3_7)](box3){};
\end{pgfonlayer}
\draw[->,thick] (opt1_0) -- (opt1_1);
\draw[->,thick] (opt1_1) -- (opt1_2);
\draw[->,thick] (opt1_2) -- (opt1_3);
\draw[->,thick] (opt1_3) -- (opt1_4);
\draw[->,thick] (opt1_4) -- (opt1_5);
\draw[->,thick] (opt1_5) -- (opt1_6);
\draw[->,thick] (opt2_0) -- (opt2_1);
\draw[->,thick] (opt2_1) -- (opt2_2);
\draw[->,thick] (opt2_2) -- (opt2_3);
\draw[->,thick] (opt2_3) -- (opt2_4);
\draw[->,thick] (opt2_4) -- (opt2_5);
\draw[->,thick] (opt2_5) -- (opt2_6);
\draw[->,thick] (opt3_0) -- (opt3_1);
\draw[->,thick] (opt3_1) -- (opt3_2);
\draw[->,thick] (opt3_2) -- (opt3_3);
\draw[->,thick] (opt3_3) -- (opt3_4);
\draw[->,thick] (opt3_4) -- (opt3_5);
\draw[->,thick] (opt3_5) -- (opt3_6);
\node[] at ([xshift=-1.2em,yshift=0.2em]opt1_7){\small\bfnew{分支1}};
\node[] at ([xshift=-1.2em,yshift=0.2em]opt2_7){\small\bfnew{分支2}};
\node[] at ([xshift=-1.2em,yshift=0.2em]opt3_7){\small\bfnew{分支3}};
\node[] (input) at ([yshift=-5em]opt2_0){\small\bfnew{输入}};
\node[] (output) at ([yshift=5em]opt2_7){\small\bfnew{输出}};
\draw[->,thick,out=140,in=-30] (box4.-90) to (box1.-90);
\draw[->,thick,out=40,in=-150] (box4.-90) to (box3.-90);
\draw[->,thick] (box4.-90) -- (box2.-90);
\draw[->,thick,out=50,in=-130] (box1.90) to (box4.90);
\draw[->,thick,out=130,in=-50] (box3.90) to (box4.90);
\draw[->,thick] (box2.90) -- (box4.90);
\draw[->,thick] (input.90) -- (box4.-90);
\draw[->,thick] (box4.90) -- (output.-90);
\node[] at ([xshift=-2.8em,yshift=1.1em]box1.90){\small\bfnew{模型结构}};
\node[] at ([xshift=-0.8em]box4.0){$\cdots$};
\end{tikzpicture}
%%% outline
%-------------------------------------------------------------------------
\begin{tikzpicture}
\tikzstyle{node}=[minimum height=2em,minimum width=4.6em,rounded corners=2pt,thick,font=\footnotesize]
\node[node,draw,fill=green!20] (n11) at (0,0) {模型结构1};
\node[anchor=west,draw,node,fill=red!20] (n12) at ([xshift=0.6em]n11.east){模型结构6};
\node[anchor=north,draw,node,fill=orange!20] (n13) at ([yshift=-1em]n11.south){模型结构2};
\node[anchor=west,draw,node,fill=blue!15] (n14) at ([xshift=0.6em]n13.east){模型结构5};
\node[anchor=north,draw,node,fill=cyan!30] (n15) at ([yshift=-1em]n13.south){模型结构1};
\node[anchor=west,draw,node,fill=yellow!20] (n16) at ([xshift=0.6em]n15.east){模型结构4};
\node[inner sep=0pt] (kind) at (-0.7,0.8){\small\bfnew{种群}};
\node[node,draw,fill=green!20] (n21) at (5,0){模型结构1};
\node[anchor=west,draw,node,fill=red!20] (n22) at ([xshift=0.6em]n21.east){模型结构4};
\node[anchor=north,node] (n23) at ([yshift=-1em]n21.south){};
\node[anchor=west,node] (n24) at ([xshift=0.6em]n23.east){};
\node[anchor=north,node] (n25) at ([yshift=-1em]n23.south){};
\node[anchor=west,node] (n26) at ([xshift=0.6em]n25.east){};
\node[inner sep=0pt] (choice) at (4.8,0.8){\small\bfnew{选中的亲本}};
\node[node,draw,fill=green!20,dotted] (n31) at (10,0){模型结构1};
\node[anchor=west,draw,node,fill=red!20,dotted] (n32) at ([xshift=0.6em]n31.east){模型结构4};
\node[anchor=north,node] (n33) at ([yshift=-1em]n31.south){};
\node[anchor=west,node] (n34) at ([xshift=0.6em]n33.east){};
\node[anchor=north,draw,node,fill=green!40] (n35) at ([yshift=-1em]n33.south){模型结构1};
\node[anchor=west,draw,node,fill=red!40] (n36) at ([xshift=0.6em]n35.east){模型结构1};
\node[inner sep=0pt] (change) at (9.7,0.8){\small\bfnew{亲本变异}};
\begin{pgfonlayer}{background}
\node[rounded corners=4pt,draw,thick,fill=yellow!10,inner sep=4pt,drop shadow][fit=(kind)(n16)](box1){};
\node[rounded corners=4pt,draw,thick,fill=yellow!10,inner sep=4pt,drop shadow][fit=(choice)(n26)](box2){};
\node[rounded corners=4pt,draw,thick,fill=yellow!10,inner sep=4pt,drop shadow][fit=(change)(n36)](box3){};
\end{pgfonlayer}
\draw[->,very thick] (box1.0) -- (box2.180);
\draw[->,very thick] (box2.0) -- (box3.180);
\draw[->,very thick] (n31.-90) -- (n35.90);
\draw[->,very thick] (n32.-90) -- (n36.90);
\draw[->,very thick] (box3.-90) .. controls ([yshift=-2em,xshift=-1em]box3.-90) and ([yshift=-2em,xshift=1em]box1.-90) .. node[font=\scriptsize,below]{对变异后的结构进行性能评估,选择优秀的结构加入原始种群} (box1.-90);
\end{tikzpicture}
%%% outline
%-------------------------------------------------------------------------
\begin{tikzpicture}
\tikzstyle{node}=[minimum height=6em,inner sep=4pt,align=left,draw,font=\footnotesize,rounded corners=4pt,thick,drop shadow]
\node[node,fill=red!20] (n1) at (0,0){\scriptsize\bfnew{超网络}\\ [1ex] 模型结构参数 \\[0.4ex] 网络参数};
\node[anchor=west,node,fill=yellow!20] (n2) at ([xshift=4em]n1.east){\scriptsize\bfnew{优化后的超网络}\\ [1ex]模型{\color{red}结构参数}(已优化) \\ [0.4ex]网络参数(已优化)};
\node[anchor=west,node,fill=blue!20] (n3) at ([xshift=6em]n2.east){\scriptsize\bfnew{找到的模型结构}};
\draw[-latex,thick] (n1.0) -- node[above,align=center,font=\scriptsize]{优化后的\\超网络}(n2.180);
\draw[-latex,thick] (n2.0) -- node[above,align=center,font=\scriptsize]{根据结构参数\\离散化结构}(n3.180);
\draw[-latex,out=90,in=100,thick] ([xshift=-2em]n1.90) to node[above,font=\scriptsize]{参数优化}([xshift=2em]n1.90);
\end{tikzpicture}
%%% outline
%-------------------------------------------------------------------------
\begin{tikzpicture}
\tikzstyle{node}=[minimum height=2em,minimum width=5em,draw,rounded corners=2pt,thick,drop shadow]
\node[node,fill=red!20] (n1) at (0,0){\small\bfnew{环境}};
\node[anchor=south,node,fill=blue!20] (n2) at ([yshift=5em]n1.north){\small\bfnew{主体}};
\node[anchor=north,font=\footnotesize] at ([yshift=-0.2em]n1.south){(结构所应用于的任务)};
\node[anchor=south,font=\footnotesize] at ([yshift=0.2em]n2.north){(结构生成器)};
\draw[-latex,thick] ([yshift=.4em]n1.180) .. controls ([xshift=-3.4em,yshift=.4em]n1.180) and ([xshift=-3.4em,yshift=-.4em]n2.180) .. node[right,font=\scriptsize,align=left]{\scriptsize\bfnew{奖励} \\ (对输出结果的评价)}([yshift=-.4em]n2.180);
\draw[-latex,thick] ([yshift=-.4em]n1.180) .. controls ([xshift=-4.4em,yshift=-.4em]n1.180) and ([xshift=-4.4em,yshift=.4em]n2.180) .. node[left,font=\scriptsize,align=right]{\scriptsize\bfnew{状态} \\ (这个结构在任务中应用 \\ 后得到的输出结果)}([yshift=.4em]n2.180);
\draw[-latex,thick] (n2.0) .. controls ([xshift=4em]n2.0) and ([xshift=4em]n1.0) .. node[right,font=\scriptsize,align=left]{\scriptsize\bfnew{动作} \\ (生成一个结构)}(n1.0);
\end{tikzpicture}
%%% outline
%-------------------------------------------------------------------------
\begin{tikzpicture}
\begin{axis}[
width=10cm,height=6cm,
legend style={legend pos=north west,font=\footnotesize,align=center},
xtick={-5,-4,-3,-2,-1,0,1,2,3},
ytick={-2,-1,0,1,2,3},
x tick label style={font=\scriptsize},
y tick label style={font=\scriptsize},
xmin=-5,xmax=3,
ymin=-2,ymax=3,
y tick style={opacity=0},
x tick style={opacity=0},
grid=major,
title={\small\bfnew{Swish}},
]
\addplot[cyan,smooth,thick]{x*(1+exp(-10*x))^-1};
\addplot[red,smooth,thick]{x*(1+exp(-1*x))^-1};
\addplot[brown,smooth,thick]{x*(1+exp(-0.1*x))^-1};
\addplot[gray!60, very thick] coordinates {(0,-2) (0,3)};
\addplot[gray!60, very thick] coordinates {(0,0) (3,0)};
\legend{$\beta=10.0$,$\beta=1.0$,$ \ \ \beta=0.1$}
\end{axis}
\end{tikzpicture}
%%% outline
%-------------------------------------------------------------------------
\begin{tikzpicture}[scale=0.8]
\tikzstyle{every node}=[scale=0.8]
\tikzstyle{node}=[draw,minimum height=1.4em,minimum width=2em,rounded corners=3pt,thick]
\node[node] (n1) at (0,0){};
\node[node] (n2) at (1.5,0){};
\node[node] (n3) at (3,0){};
\node[node] (n4) at (4.5,0){};
\node[node] (n5) at (1.5,-1.3){};
\node[node] (n6) at (3,-1.3){};
\node[node] (n7) at (2.25,-2.4){};
\node[node] (n8) at (3,1.3){};
\draw[->,thick] (n1.0) -- (n2.180);
\draw[->,thick] (n2.0) -- (n3.180);
\draw[->,thick] (n3.0) -- (n4.180);
\draw[->,thick,out=60,in=180] (n1.90) to (n8.180);
\draw[->,thick,out=-10,in=90] (n8.0) to (n4.90);
\draw[->,thick,out=90,in=-90] (n5.90) to (n3.-90);
\draw[->,thick,out=90,in=-90] (n6.90) to (n4.-90);
\draw[->,thick,out=90,in=-90] (n7.90) to (n5.-90);
\draw[->,thick,out=90,in=-90] (n7.90) to (n6.-90);
\node[font=\huge] (ht) at (-0.2,2.2){$\mathbi h_t$};
\node[draw,font=\huge,inner sep=0pt,minimum width=4em,minimum height=4em,very thick,rounded corners=2pt] (ht-1) at (-3,0) {$\mathbi h_{t-1}$};
\node[draw,font=\huge,inner sep=0pt,minimum width=4em,minimum height=4em,very thick,rounded corners=2pt] (ht+1) at (7.5,0) {$\mathbi h_{t+1}$};
\node[font=\huge] (xt) at (2.25,-4.2){$x_t$};
\node[font=\Huge] at (9,0){$\cdots$};
\node[font=\Huge] at (-4.5,0){$\cdots$};
\node[minimum width=3em,minimum height=2em,fill=red!20,rounded corners=6pt] (b1) at (6,-3.8){};
\node[] (w1) at (7.8,-3.8){\Large 整体框架};
\node[minimum width=3em,minimum height=2em,fill=yellow!30,rounded corners=6pt] (b2) at (6,-4.8){};
\node[] (w2) at (7.8,-4.8){\Large 内部结构};
\begin{pgfonlayer}{background}
\node[draw,rounded corners=6pt,very thick,fill=yellow!30,minimum width=16em,minimum height=15em] (box1) at (2.25,0){};
\node[draw=ublue,very thick,drop shadow,inner sep=1.2em,fill=white,xshift=-0.1em] [fit=(b1)(w2)]{};
\draw[fill=red!20,red!20,rounded corners=6pt] ([yshift=2.4em,xshift=1em]ht-1.east) -- ([yshift=2.4em,xshift=-0.1em]box1.west) -- ([xshift=-8em,xshift=-0.1em]box1.south) -- ([xshift=2em]box1.south) -- ([xshift=2em,yshift=-5em]box1.south) -- ([xshift=0em,yshift=-5em]box1.south) .. controls ([xshift=-12em,yshift=-5em]box1.south) and ([yshift=-2em]ht-1.east) ..([yshift=2.4em]ht-1.east) -- ([yshift=2.4em,xshift=1em]ht-1.east) ;
\end{pgfonlayer}
\draw[->,very thick] (ht-1.0) -- (box1.180);
\draw[->,very thick] (box1.0) -- (ht+1.180);
\draw[->,very thick] (ht-1.90) -- ([yshift=2em]ht-1.90);
\draw[->,very thick] (ht+1.90) -- ([yshift=2em]ht+1.90);
\draw[->,very thick] (box1.90) -- ([yshift=2em]box1.90);
\draw[->,very thick] ([yshift=-2em]ht-1.-90) -- (ht-1.-90);
\draw[->,very thick] ([yshift=-2em]ht+1.-90) -- (ht+1.-90);
\draw[->,very thick] ([yshift=-2em]box1.-90) -- (box1.-90);
\end{tikzpicture}
......@@ -4,9 +4,9 @@
\tikzstyle{bignode} = [,inner sep=0.3em,draw=black,line width=0.6pt,rounded corners=2pt,minimum width=3.0em]
\node [anchor=center] (node1-1) at (0,0) {\small{汉语}};
\node [anchor=west] (node1-2) at ([xshift=0.8em]node1-1.east) {\small{英语}};
\node [anchor=north] (node1-3) at ([xshift=1.75em]node1-1.south) {\small{反向翻译模型}};
\draw [->,thick](node1-1.east)--(node1-2.west);
\begin{pgfonlayer}{background}
......@@ -16,16 +16,16 @@
\end{pgfonlayer}
\node [anchor=north,fill=green!20,bignode](node2-1) at ([yshift=-3em]node1-3.south){\small{汉语}};
\node [anchor=north,fill=green!20,bignode](node2-2) at (node2-1.south){\small{英语}};
\draw [->,thick](node2-1.north)--(remark1.south) node [pos=0.5,right] (pos1) {\small{训练}};
\node [anchor=west,fill=yellow!20,bignode](node3-1) at ([xshift=6.5em,yshift=0.0em]node1-2.east){\small{汉语}};
\node [anchor=north,fill=red!20,bignode](node3-2) at ([yshift=-2.5em]node3-1.south){\small{英语}};
\node [anchor=center](node3-3) at ([xshift=0.4em]node3-2.east){};
\draw [->,thick](node3-1.south)--(node3-2.north) node [pos=0.5,right] (pos2) {\small{翻译}};
\begin{pgfonlayer}{background}
{
......@@ -33,21 +33,21 @@
}
\end{pgfonlayer}
\draw [->,thick](remark1.east)--([xshift=5.5em]remark1.east) node [pos=0.5,above] (pos2) {\small{模型翻译}};
\node [anchor=south](pos2-2) at ([yshift=-0.5em]pos2.north){\small{使用反向}};
\draw[decorate,thick,decoration={brace,amplitude=5pt}] ([yshift=1.5em,xshift=1.5em]node3-1.east) -- ([yshift=-8.6em,xshift=1.5em]node3-1.east) node [pos=0.1,right,xshift=0.0em,yshift=0.0em] (label1) {\small{{混合}}};
\node [anchor=west,fill=red!20,bignode](node4-1) at ([xshift=2.5em,yshift=1.3em]node3-2.east){\small{英语}};
\node [anchor=north,fill=yellow!20,bignode](node4-2) at (node4-1.south){\small{汉语}};
\node [anchor=west,fill=green!20,bignode](node4-3) at (node4-1.east){\small{英语}};
\node [anchor=north,fill=green!20,bignode](node4-4) at (node4-3.south){\small{汉语}};
\node [anchor=center] (node5-1) at ([xshift=5em,yshift=0.02em]node4-3.east) {\small{英语}};
\node [anchor=west] (node5-2) at ([xshift=0.8em]node5-1.east) {\small{汉语}};
\node [anchor=north] (node5-3) at ([xshift=1.65em]node5-1.south) {\small{正向翻译模型}};
\draw [->,thick](node5-1.east)--(node5-2.west);
\begin{pgfonlayer}{background}
......@@ -56,11 +56,11 @@
}
\end{pgfonlayer}
\draw [->,thick]([xshift=-3.2em]remark3.west)--(remark3.west) node [pos=0.5,above] (pos3) {\small{训练}};
\node [anchor=south](d1) at ([xshift=-1.5em,yshift=1em]remark1.north){\small{真实数据:}};
\node [anchor=west](d2) at ([xshift=2.0em]d1.east){\small{伪数据:}};
\node [anchor=west](d3) at ([xshift=2.0em]d2.east){\small{额外数据:}};
\node [anchor=west,fill=green!20,minimum width=1.5em](d1-1) at ([xshift=-0.0em]d1.east){};
\node [anchor=west,fill=red!20,minimum width=1.5em](d2-1) at ([xshift=-0.0em]d2.east){};
\node [anchor=west,fill=yellow!20,minimum width=1.5em](d3-1) at ([xshift=-0.0em]d3.east){};
......
\begin{tikzpicture}
\tikzstyle{embedding} = [line width=0.6pt,draw=black,minimum width=2.5em,minimum height=1.6em,fill=green!20]
\tikzstyle{model} = [line width=0.6pt,draw=black,minimum width=3.0em,minimum height=1.6em,fill=blue!20,rounded corners=2pt]
\node [anchor=center,model] (node1-1) at (0,0) {\footnotesize{TRM}};
\node [anchor=west,model] (node1-2) at ([xshift=1.8em]node1-1.east) {\footnotesize{TRM}};
\node [anchor=west,scale=1.8] (node1-3) at ([xshift=1.0em]node1-2.east) {...};
\node [anchor=west,model] (node1-4) at ([xshift=1.0em]node1-3.east) {\footnotesize{TRM}};
\node [anchor=west,model] (node1-5) at ([xshift=2.0em]node1-4.east) {\footnotesize{TRM}};
\node [anchor=west,model] (node1-6) at ([xshift=1.8em]node1-5.east) {\footnotesize{TRM}};
\node [anchor=west,scale=1.8] (node1-7) at ([xshift=1.0em]node1-6.east) {...};
\node [anchor=west,model] (node1-8) at ([xshift=1.0em]node1-7.east) {\footnotesize{TRM}};
\node [anchor=north,embedding] (node0-1) at ([yshift=-2em]node1-1.south){\footnotesize{$\mathbi{e}_1$}};
\node [anchor=north,embedding] (node0-2) at ([yshift=-2em]node1-2.south){\footnotesize{$\mathbi{e}_2$}};
\node [anchor=west,scale=1.8] (node0-3) at ([xshift=1.25em]node0-2.east){...};
\node [anchor=north,embedding] (node0-4) at ([yshift=-2em]node1-4.south){\footnotesize{$\mathbi{e}_n$}};
\node [anchor=south,model](node2-1) at ([yshift=1.8em]node1-1.north){\footnotesize{TRM}};
\node [anchor=south,model](node2-2) at ([yshift=1.8em]node1-2.north){\footnotesize{TRM}};
\node [anchor=west,scale=1.8](node2-3) at ([xshift=1.0em]node2-2.east){...};
\node [anchor=south,model](node2-4) at ([yshift=1.8em]node1-4.north){\footnotesize{TRM}};
\node [anchor=south,model](node2-5) at ([yshift=1.8em]node1-5.north){\footnotesize{TRM}};
\node [anchor=south,model](node2-6) at ([yshift=1.8em]node1-6.north){\footnotesize{TRM}};
\node [anchor=west,scale=1.8](node2-7) at ([xshift=1.0em]node2-6.east){...};
\node [anchor=south,model](node2-8) at ([yshift=1.8em]node1-8.north){\footnotesize{TRM}};
\begin{pgfonlayer}{background}
{
\node[fill=white,inner sep=0.5em,draw=black,line width=0.6pt,minimum width=6.0em,rounded corners=2pt,dashed] [fit =(node1-1)(node1-2)(node1-3)(node1-4)(node2-1)] (remark1) {};
}
\end{pgfonlayer}
\begin{pgfonlayer}{background}
{
\node[fill=white,inner sep=0.5em,draw=black,line width=0.6pt,minimum width=6.0em,rounded corners=2pt,dashed] [fit =(node1-5)(node1-6)(node1-7)(node1-8)(node2-8)] (remark2) {};
}
\end{pgfonlayer}
\draw [->,thick](node0-1.north)--(node1-1.south);
\draw [->,thick](node0-1.north)--(node1-2.south);
\draw [->,thick](node0-1.north)--(node1-4.south);
\draw [->,thick](node0-2.north)--(node1-2.south);
\draw [->,thick](node0-2.north)--(node1-4.south);
\draw [->,thick](node0-4.north)--(node1-4.south);
\draw [->,thick](node1-1.north)--(node2-1.south);
\draw [->,thick](node1-1.north)--(node2-2.south);
\draw [->,thick](node1-1.north)--(node2-4.south);
\draw [->,thick](node1-2.north)--(node2-2.south);
\draw [->,thick](node1-2.north)--(node2-4.south);
\draw [->,thick](node1-4.north)--(node2-4.south);
\node [anchor=south,embedding,fill=yellow!20](node3-1) at ([yshift=2em]node2-1.north){\footnotesize{$\seq{P}_1$}};
\node [anchor=south,embedding,fill=yellow!20] (node3-2) at ([yshift=2em]node2-2.north){\footnotesize{$\seq{P}_2$}};
\node [anchor=west,scale=1.8] (node3-3) at ([xshift=1.25em]node3-2.east){...};
\node [anchor=south,embedding,fill=yellow!20](node3-4) at ([yshift=2em]node2-4.north){\footnotesize{$\seq{P}_n$}};
\draw [<-,thick](node3-1.south)--(node2-1.north);
\draw [<-,thick](node3-2.south)--(node2-2.north);
\draw [<-,thick](node3-4.south)--(node2-4.north);
%%%%%%%%%bert
\node [anchor=north,embedding] (node0-5) at ([yshift=-2em]node1-5.south){\footnotesize{$\mathbi{e}_1$}};
\node [anchor=north,embedding] (node0-6) at ([yshift=-2em]node1-6.south){\footnotesize{$\mathbi{e}_2$}};
\node [anchor=west,scale=1.8] (node0-7) at ([xshift=1.25em]node0-6.east){...};
\node [anchor=north,embedding] (node0-8) at ([yshift=-2em]node1-8.south){\footnotesize{$\mathbi{e}_n$}};
\node [anchor=south,embedding,fill=yellow!20](node3-5) at ([yshift=2em]node2-5.north){\footnotesize{$\seq{P}_1$}};
\node [anchor=south,embedding,fill=yellow!20] (node3-6) at ([yshift=2em]node2-6.north){\footnotesize{$\seq{P}_2$}};
\node [anchor=west,scale=1.8] (node3-7) at ([xshift=1.25em]node3-6.east){...};
\node [anchor=south,embedding,fill=yellow!20](node3-8) at ([yshift=2em]node2-8.north){\footnotesize{$\seq{P}_n$}};
\draw [->,thick](node0-5.north)--(node1-5.south);
\draw [->,thick](node0-5.north)--(node1-6.south);
\draw [->,thick](node0-5.north)--(node1-8.south);
\draw [->,thick](node0-6.north)--(node1-5.south);
\draw [->,thick](node0-6.north)--(node1-6.south);
\draw [->,thick](node0-6.north)--(node1-8.south);
\draw [->,thick](node0-8.north)--(node1-5.south);
\draw [->,thick](node0-8.north)--(node1-6.south);
\draw [->,thick](node0-8.north)--(node1-8.south);
\draw [->,thick](node1-5.north)--(node2-5.south);
\draw [->,thick](node1-5.north)--(node2-6.south);
\draw [->,thick](node1-5.north)--(node2-8.south);
\draw [->,thick](node1-6.north)--(node2-5.south);
\draw [->,thick](node1-6.north)--(node2-6.south);
\draw [->,thick](node1-6.north)--(node2-8.south);
\draw [->,thick](node1-8.north)--(node2-5.south);
\draw [->,thick](node1-8.north)--(node2-6.south);
\draw [->,thick](node1-8.north)--(node2-8.south);
\draw [<-,thick](node3-5.south)--(node2-5.north);
\draw [<-,thick](node3-6.south)--(node2-6.north);
\draw [<-,thick](node3-8.south)--(node2-8.north);
\node [anchor=north] (pos1) at ([xshift=1.5em,yshift=-1.0em]node0-2.south) {\small{(a) GPT模型结构}};
\node [anchor=north] (pos2) at ([xshift=1.5em,yshift=-1.0em]node0-6.south) {\small{(b) BERT模型结构}};
\node [anchor=south] (ex) at ([xshift=2.1em,yshift=0.5em]node3-1.north) {\small{TRM: Transformer}};
\end{tikzpicture}
\ No newline at end of file
......@@ -37,6 +37,10 @@
\draw[->,thick] ([yshift=-0.2em]n5.-90) -- ([yshift=0.2em]sys.90);
\draw[->,thick] ([yshift=0.3em,xshift=0.2em]kd.0) -- ([yshift=-0.2em,xshift=-0.2em]sys.180);
\node [anchor=north] (re1) at ([yshift=-1em]nmt2.south) {\small{(a) 传统机器学习}};
\node [anchor=west] (re2) at ([xshift=11.0em]re1.east) {\small{(b) 迁移学习}};
\end{tikzpicture}
......
\begin{tikzpicture}
\tikzstyle{embedding} = [line width=0.6pt,draw=black,minimum width=2.5em,minimum height=1.6em,fill=green!20]
\tikzstyle{model} = [line width=0.6pt,draw=black,minimum width=3.0em,minimum height=1.6em,fill=blue!20,rounded corners=2pt]
\node [anchor=center,model] (node1-1) at (0,0) {\footnotesize{LSTM}};
\node [anchor=west,model] (node1-2) at ([xshift=1.8em]node1-1.east) {\footnotesize{LSTM}};
\node [anchor=west,scale=1.8] (node1-3) at ([xshift=1.0em]node1-2.east) {...};
\node [anchor=west,model] (node1-4) at ([xshift=1.0em]node1-3.east) {\footnotesize{LSTM}};
\node [anchor=west,model] (node1-5) at ([xshift=2.0em]node1-4.east) {\footnotesize{LSTM}};
\node [anchor=west,model] (node1-6) at ([xshift=1.8em]node1-5.east) {\footnotesize{LSTM}};
\node [anchor=west,scale=1.8] (node1-7) at ([xshift=1.0em]node1-6.east) {...};
\node [anchor=west,model] (node1-8) at ([xshift=1.0em]node1-7.east) {\footnotesize{LSTM}};
\node [anchor=south,model](node2-1) at ([yshift=1.8em]node1-1.north){\footnotesize{LSTM}};
\node [anchor=south,model](node2-2) at ([yshift=1.8em]node1-2.north){\footnotesize{LSTM}};
\node [anchor=west,scale=1.8](node2-3) at ([xshift=1.0em]node2-2.east){...};
\node [anchor=south,model](node2-4) at ([yshift=1.8em]node1-4.north){\footnotesize{LSTM}};
\node [anchor=south,model](node2-5) at ([yshift=1.8em]node1-5.north){\footnotesize{LSTM}};
\node [anchor=south,model](node2-6) at ([yshift=1.8em]node1-6.north){\footnotesize{LSTM}};
\node [anchor=west,scale=1.8](node2-7) at ([xshift=1.0em]node2-6.east){...};
\node [anchor=south,model](node2-8) at ([yshift=1.8em]node1-8.north){\footnotesize{LSTM}};
\draw [->,thick](node1-1.east)--(node1-2.west);
\draw [->,thick](node1-2.east)--([xshift=0.5em]node1-3.west);
\draw [->,thick]([xshift=-0.5em]node1-3.east)--(node1-4.west);
\draw [<-,thick](node1-5.east)--(node1-6.west);
\draw [<-,thick](node1-6.east)--([xshift=0.5em]node1-7.west);
\draw [<-,thick]([xshift=-0.5em]node1-7.east)--(node1-8.west);
\draw [->,thick](node1-1.north)--(node2-1.south);
\draw [->,thick](node1-2.north)--(node2-2.south);
\draw [->,thick](node1-4.north)--(node2-4.south);
\draw [->,thick](node1-5.north)--(node2-5.south);
\draw [->,thick](node1-6.north)--(node2-6.south);
\draw [->,thick](node1-8.north)--(node2-8.south);
\draw [->,thick](node2-1.east)--(node2-2.west);
\draw [->,thick](node2-2.east)--([xshift=0.5em]node2-3.west);
\draw [->,thick]([xshift=-0.5em]node2-3.east)--(node2-4.west);
\draw [<-,thick](node2-5.east)--(node2-6.west);
\draw [<-,thick](node2-6.east)--([xshift=0.5em]node2-7.west);
\draw [<-,thick]([xshift=-0.5em]node2-7.east)--(node2-8.west);
\begin{pgfonlayer}{background}
{
\node[fill=white,inner sep=0.5em,draw=black,line width=0.6pt,minimum width=6.0em,rounded corners=2pt,dashed] [fit =(node1-1)(node1-2)(node1-3)(node1-4)(node2-1)] (remark1) {};
}
\end{pgfonlayer}
\begin{pgfonlayer}{background}
{
\node[fill=white,inner sep=0.5em,draw=black,line width=0.6pt,minimum width=6.0em,rounded corners=2pt,dashed] [fit =(node1-5)(node1-6)(node1-7)(node1-8)(node2-8)] (remark2) {};
}
\end{pgfonlayer}
\node [anchor=north,embedding] (node0-2) at ([yshift=-2em]node1-4.south){\footnotesize{$\mathbi{e}_2$}};
\node [anchor=east,embedding] (node0-1) at ([xshift=-1.4em]node0-2.west){\footnotesize{$\mathbi{e}_1$}};
\node [anchor=north,scale=1.8] (node0-3) at ([yshift=-2em]node1-5.south){...};
\node [anchor=north,embedding] (node0-4) at ([yshift=-2em]node1-6.south){\footnotesize{$\mathbi{e}_n$}};
\draw [->,thick](node0-1.north)--(node1-1.south);
\draw [->,thick](node0-1.north)--(node1-5.south);
\draw [->,thick](node0-2.north)--(node1-2.south);
\draw [->,thick](node0-2.north)--(node1-6.south);
\draw [->,thick](node0-4.north)--(node1-4.south);
\draw [->,thick](node0-4.north)--(node1-8.south);
\node [anchor=south,embedding,fill=yellow!20](node3-2) at ([yshift=2em]node2-4.north){\footnotesize{$\seq{P}_2$}};
\node [anchor=east,embedding,fill=yellow!20] (node3-1) at ([xshift=-1.4em]node3-2.west){\footnotesize{$\seq{P}_1$}};
\node [anchor=south,scale=1.8] (node3-3) at ([yshift=2em]node2-5.north){...};
\node [anchor=south,embedding,fill=yellow!20](node3-4) at ([yshift=2em]node2-6.north){\footnotesize{$\seq{P}_n$}};
\draw [<-,thick](node3-1.south)--(node2-1.north);
\draw [<-,thick](node3-1.south)--(node2-5.north);
\draw [<-,thick](node3-2.south)--(node2-2.north);
\draw [<-,thick](node3-2.south)--(node2-6.north);
\draw [<-,thick](node3-4.south)--(node2-4.north);
\draw [<-,thick](node3-4.south)--(node2-8.north);
\end{tikzpicture}
\begin{tikzpicture}
\tikzstyle{rec} = [inner sep=0.3em,minimum width=4em,draw=black,line width=0.6pt,rounded corners=2pt]
\node [anchor=north,fill=green!20,rec](node1-1) at (0,0){\small{汉语}};
\node [anchor=north,fill=green!20,rec](node1-2) at (node1-1.south){\small{英语}};
\node [anchor=north,fill=yellow!20,rec](node2-1) at ([yshift=-5.0em]node1-1.south){\small{汉语}};
\node [anchor=north,fill=red!20,rec](node2-2) at (node2-1.south){\small{英语}};
\node [anchor=east] (node3-1) at ([xshift=-4.0em,yshift=-3.5em]node1-1.west) {\small{正向}};
\node [anchor=north] (node3-2) at ([yshift=0.5em]node3-1.south) {\small{翻译模型}};
\begin{pgfonlayer}{background}
{
\node[fill=blue!20,inner sep=0.3em,draw=black,line width=0.6pt,minimum width=3.0em,drop shadow,rounded corners=2pt] [fit =(node3-1)(node3-2)] (remark1) {};
......@@ -14,8 +14,8 @@
\draw [->,thick]([yshift=-0.75em]node1-1.west)--(remark1.north east);
\draw [->,thick,dashed](remark1.south east)--([yshift=-0.75em]node2-1.west);
\node [anchor=west] (node4-1) at ([xshift=4.0em,yshift=-3.5em]node1-1.east) {\small{反向}};
\node [anchor=north] (node4-2) at ([yshift=0.5em]node4-1.south) {\small{翻译模型}};
\begin{pgfonlayer}{background}
{
\node[fill=blue!20,inner sep=0.3em,draw=black,line width=0.6pt,minimum width=3.0em,drop shadow,rounded corners=2pt] [fit =(node4-1)(node4-2)] (remark2) {};
......@@ -24,15 +24,15 @@
\draw [->,thick]([yshift=-0.75em]node1-1.east)--(remark2.north west);
\draw [->,thick]([yshift=-0.75em]node2-1.east)--(remark2.south west);
\node [anchor=west,fill=green!20,rec](node5-1) at ([xshift=4.0em,yshift=3.48em]node4-1.east){\small{英语}};
\node [anchor=north,fill=green!20,rec](node5-2) at (node5-1.south){\small{汉语}};
\node [anchor=north,fill=yellow!20,rec](node6-1) at ([yshift=-5.0em]node5-1.south){\small{英语}};
\node [anchor=north,fill=red!20,rec](node6-2) at (node6-1.south){\small{汉语}};
\draw [->,thick,dashed](remark2.south east)--([yshift=-0.75em]node6-1.west);
\node [anchor=west] (node7-1) at ([xshift=4.0em,yshift=-3.5em]node5-1.east) {\small{正向}};
\node [anchor=north] (node7-2) at ([yshift=0.5em]node7-1.south) {\small{翻译模型}};
\begin{pgfonlayer}{background}
{
\node[fill=blue!20,inner sep=0.3em,draw=black,line width=0.6pt,minimum width=3.0em,drop shadow,rounded corners=2pt] [fit =(node7-1)(node7-2)] (remark3) {};
......@@ -42,14 +42,14 @@
\draw [->,thick]([yshift=-0.75em]node5-1.east)--(remark3.north west);
\draw [->,thick]([yshift=-0.75em]node6-1.east)--(remark3.south west);
\node [anchor=south](d1) at ([xshift=-0.7em,yshift=5.5em]remark1.north){\small{真实数据:}};
\node [anchor=west](d2) at ([xshift=2.0em]d1.east){\small{伪数据:}};
\node [anchor=west](d3) at ([xshift=2.0em]d2.east){\small{额外数据:}};
\node [anchor=west,fill=green!20,minimum width=1.5em](d1-1) at ([xshift=-0.0em]d1.east){};
\node [anchor=west,fill=red!20,minimum width=1.5em](d2-1) at ([xshift=-0.0em]d2.east){};
\node [anchor=west,fill=yellow!20,minimum width=1.5em](d3-1) at ([xshift=-0.0em]d3.east){};
\node [anchor=north] (d4) at ([xshift=1em]d1.south) {\small{训练:}};
\node [anchor=north] (d5) at ([xshift=0.5em]d2.south) {\small{推理:}};
\draw [->,thick] ([xshift=0em]d4.east)--([xshift=1.5em]d4.east);
\draw [->,thick,dashed] ([xshift=0em]d5.east)--([xshift=1.5em]d5.east);
......
\begin{tikzpicture}
\begin{scope}
\tikzstyle{word} = [font=\scriptsize,minimum height=1.4em]
\tikzstyle{model} = [rectangle,line width=0.7pt,draw,minimum height=3em,minimum width=13em,rounded corners=4pt,fill=red!20]
\node [anchor=center] (ate) at (0,0) {};
%decoder
\node [model,minimum width=10.5em,line width=0.7pt] (decoder) at ([xshift=6em]ate.east) {Decoder};
\node [word] (w1) at ([yshift=-2em,xshift=1em]decoder.south) {\small{$x_3$}};
\node [word] (w2) at ([xshift=-1em]w1.west) {\#};
\node [word] (w3) at ([xshift=-1em]w2.west) {\#};
\node [word] (w4) at ([xshift=-1em]w3.west) {\#};
\node [word] (w5) at ([xshift=1em]w1.east) {\small{$x_4$}};
\node [word] (w6) at ([xshift=1em]w5.east) {\#};
\node [word] (w7) at ([yshift=2em,xshift=1em]decoder.north) {\small{$x_4$}};
\node [word] (w8) at ([yshift=0em,xshift=-1em]w7.west) {\small{$x_3$}};
\node [word] (w9) at ([yshift=0em,xshift=1em]w7.east) {\small{$x_5$}};
\draw [->,thick] (w1.north) -- ([yshift=1.35em]w1.north);
\draw [->,thick] (w2.north) -- ([yshift=1.35em]w2.north);
......@@ -23,19 +24,23 @@
\draw [->,thick] (w4.north) -- ([yshift=1.35em]w4.north);
\draw [->,thick] (w5.north) -- ([yshift=1.35em]w5.north);
\draw [->,thick] (w6.north) -- ([yshift=1.35em]w6.north);
\draw [->,thick] ([yshift=-1.4em]w7.south) -- (w7.south);
\draw [->,thick] ([yshift=-1.4em]w8.south) -- (w8.south);
\draw [->,thick] ([yshift=-1.4em]w9.south) -- (w9.south);
%encoder
\node [model,minimum width=10.5em,line width=0.7pt] (encoder) at ([xshift=-6em]ate.west) {Encoder};
\node [word] (we1) at ([yshift=-2em,xshift=1em]encoder.south) {\#};
\node [word] (we2) at ([xshift=-1em]we1.west) {\#};
\node [word] (we3) at ([xshift=-1em]we2.west) {\small{$x_2$}};
\node [word] (we4) at ([xshift=-1em]we3.west) {\small{$x_3$}};
\node [word] (we5) at ([xshift=1em]we1.east) {\#};
\node [word] (we6) at ([xshift=1em]we5.east) {\small{$x_6$}};
\draw [->,thick] (we1.north) -- ([yshift=1.35em]we1.north);
\draw [->,thick] (we2.north) -- ([yshift=1.35em]we2.north);
......@@ -43,7 +48,8 @@
\draw [->,thick] (we4.north) -- ([yshift=1.35em]we4.north);
\draw [->,thick] (we5.north) -- ([yshift=1.35em]we5.north);
\draw [->,thick] (we6.north) -- ([yshift=1.35em]we6.north);
\end{scope}
\draw [->,very thick] ([xshift=0.5em]encoder)--([xshift=-0.3em]decoder);
\end{tikzpicture}
\ No newline at end of file
......@@ -6,15 +6,15 @@
\tikzstyle{circle} = [draw,black,line width=0.6pt,inner sep=3.5pt,rounded corners=4pt,minimum width=2em]
\tikzstyle{word} = [inner sep=3.5pt]
\node[circle,fill=red!20](data) at (0,0) {\small{数据}};
\node[circle,fill=blue!20](model) at ([xshift=5em]data.east) {\small{模型}};
\node[word] (init) at ([xshift=-5em]data.west){\small{初始化}};
\draw[->,thick] (init.east) -- ([xshift=-0.2em]data.west);
\draw [->,thick] ([yshift=1pt]data.north) .. controls +(90:2em) and +(90:2em) .. ([yshift=1pt]model.north) node[above,midway] {\small{参数优化}};
\draw [->,thick] ([yshift=1pt]model.south) .. controls +(-90:2em) and +(-90:2em) .. ([yshift=1pt]data.south) node[below,midway] {\small{数据优化}};
\node[word] at ([xshift=-0.5em,yshift=-4em]data.south){\small{(a) 思路1}};
\end{scope}
\end{tikzpicture}
......@@ -25,15 +25,15 @@
\tikzstyle{circle} = [draw,black,line width=0.6pt,inner sep=3.5pt,rounded corners=4pt,minimum width=2em]
\tikzstyle{word} = [inner sep=3.5pt]
\node[circle,fill=red!20](data) at (0,0) {\small{数据}};
\node[circle,fill=blue!20](model) at ([xshift=5em]data.east) {\small{模型}};
\node[word] (init) at ([xshift=5em]model.east){\small{初始化}};
\draw[->,thick] (init.west) -- ([xshift=0.2em]model.east);
\draw [->,thick] ([yshift=1pt]data.north) .. controls +(90:2em) and +(90:2em) .. ([yshift=1pt]model.north) node[above,midway] {\small{参数优化}};
\draw [->,thick] ([yshift=1pt]model.south) .. controls +(-90:2em) and +(-90:2em) .. ([yshift=1pt]data.south) node[below,midway] {\small{数据优化}};
\node[word] at ([xshift=-0.5em,yshift=-4em]model.south){\small{(b) 思路2}};
\end{scope}
\end{tikzpicture}
......
......@@ -6,11 +6,11 @@
\node[node,minimum width=6em,minimum height=2.4em,fill=red!20,line width=0.6pt] (encoder1) at (0,0){\small 编码器};
\node[node,anchor=west,minimum width=6em,minimum height=2.4em,fill=red!20,line width=0.6pt] (encoder2) at ([xshift=4em,yshift=0em]encoder1.east){\small 编码器};
\node[node,anchor=west,minimum width=6em,minimum height=2.4em,fill=red!30,line width=0.6pt] (encoder3) at ([xshift=3em]encoder2.east){\small 编码器};
\node[node,anchor=north,minimum width=6em,minimum height=2.4em,fill=blue!20,line width=0.6pt] (decoder1) at ([yshift=-3em]encoder1.south){\small 解码器};
\node[node,anchor=west,minimum width=6em,minimum height=2.4em,fill=blue!20,line width=0.6pt] (decoder2) at ([xshift=4em,yshift=0em]decoder1.east){\small 解码器};
\node[node,anchor=west,minimum width=6em,minimum height=2.4em,fill=blue!30,line width=0.6pt] (decoder3) at ([xshift=3em]decoder2.east){\small 解码器};
\node[anchor=north,font=\scriptsize,fill=yellow!20] (w1) at ([yshift=-1.6em]decoder1.south){知识 \ 就是 \ 力量 \ \ <EOS>};
\node[anchor=north,font=\scriptsize,fill=green!20] (w3) at ([yshift=-1.6em]decoder3.south){Wissen \ ist \ Macht \ . \ <EOS>};
......
\begin{tikzpicture}
\tikzstyle{rec} = [,line width=0.6pt,draw,rounded corners,minimum height=2.2em,minimum width=4.3em,fill=blue!20]
\node [anchor=center](node1) at (0,0) {\small{源语言}};
\node [anchor=west,rec,fill=red!20](node2) at ([xshift=2.0em]node1.east){\small{编码器}};
\node [anchor=west,rec](node3) at ([xshift=3.0em,yshift=2.0em]node2.east){\small{解码器}};
\node [anchor=west,rec,fill=yellow!20](node4) at ([xshift=3.0em,yshift=-2.0em]node2.east){\small{判别器}};
\draw [->,thick](node1.east)--(node2.west);
\draw [->,thick](node2.east)--([xshift=1.5em]node2.east)--([xshift=1.5em,yshift=2.0em]node2.east)--(node3.west);
\draw [->,thick](node2.east)--([xshift=1.5em]node2.east)--([xshift=1.5em,yshift=-2.0em]node2.east)--(node4.west);
\node [anchor=west,minimum width=5.0em](node5) at ([xshift=2.0em]node3.east) {\small{目标语言}};
\node [anchor=west,minimum width=5.0em](node6) at ([xshift=2.0em]node4.east) {\small{< 领域 >}};
\draw [->,thick](node3.east)--(node5.west);
\draw [->,thick](node4.east)--(node6.west);
\end{tikzpicture}
\ No newline at end of file
......@@ -9,7 +9,7 @@
\node(bilingual_D_shadow)[data_shadow, right of = monolingual_X_shadow, xshift=5cm]{};
\node(monolingual_Y_shadow)[data_shadow, right of = bilingual_D_shadow, xshift=5cm]{};
\node(monolingual_X)[data,right of = monolingual_X_shadow,xshift=-0.08cm,yshift=0.08cm]{单语语料X};
\node(bilingual_D)[data, right of = monolingual_X, xshift=5cm, fill=green!30]{双语语料D};
\node(monolingual_Y)[data, right of = bilingual_D, xshift=5cm, fill=blue!25]{单语语料Y};
\node(process_1_1)[process, right of = monolingual_X, xshift=2.5cm, yshift=-1.5cm]{\textbf{$M^0_{x\to y}$}};
......@@ -35,7 +35,7 @@
\node(text_2)[left of = process_5_1, xshift=-4cm,scale=0.8]{第1轮迭代};
\node(text_3)[left of = ellipsis_2, xshift=-4cm, scale=0.8]{第2轮迭代};
\draw[->, very thick, color=color1!40](monolingual_X.south)--(ellipsis_1.north);
\draw[->, very thick, color=green!30](bilingual_D.south)--(ellipsis_3.north);
\draw[->, very thick, color=blue!55](monolingual_Y.south)--(ellipsis_5.north);
\draw[->, very thick, color=color1!40]([xshift=-1.5cm]process_2_1.west)--(process_2_1.west);
\draw[->, very thick, color=color1!40]([xshift=-1.5cm]process_5_1.west)--(process_5_1.west);
......@@ -55,13 +55,13 @@
\draw[->, very thick, color=color1!40](process_3_2.west)--([yshift=0.35cm]process_4_1.east);
\draw[->, very thick, color=color1!40](process_6_1.east)--([yshift=0.35cm]process_7_2.west);
\draw[->, very thick, color=color1!40](process_6_2.west)--([yshift=0.35cm]process_7_1.east);
\draw[->, very thick, color=green!30,in=0,out=270]([xshift=-0.3cm]bilingual_D.south)to(process_1_1.east);
\draw[->, very thick, color=green!30,in=180,out=270]([xshift=0.3cm]bilingual_D.south)to(process_1_2.west);
\draw[->, very thick, color=green!30,in=0,out=270]([yshift=-3.7cm]bilingual_D.south)to(process_4_1.east);
\draw[->, very thick, color=green!30,in=180,out=270]([yshift=-3.7cm]bilingual_D.south)to(process_4_2.west);
\draw[->, very thick, color=green!30,in=0,out=270]([yshift=-7.3cm]bilingual_D.south)to(process_7_1.east);
\draw[->, very thick, color=green!30,in=180,out=270]([yshift=-7.3cm]bilingual_D.south)to(process_7_2.west);
\draw[-, very thick, dashed, color=blue!55]([xshift=-1cm,yshift=-0.35cm]text_1.south)--([xshift=12.7cm,yshift=-0.35cm]text_1.south);
\draw[-, very thick, dashed, color=blue!55]([xshift=-1cm,yshift=-0.35cm]text_2.south)--([xshift=12.7cm,yshift=-0.35cm]text_2.south);
......
......@@ -7,24 +7,24 @@
\node[circle](center) at (0,0) {
\begin{tabular}{c | c}
$x\rightarrow y$ & $y\rightarrow x$ \\
模型 & 模型
\small{模型} & \small{模型}
\end{tabular}
};
\node[circle,fill=red!20] (left) at ([xshift=-9em]center.west) {$x\rightarrow y$ \\ 数据};
\node[circle,fill=red!20] (right) at ([xshift=9em]center.east) {$y\rightarrow x$ \\ 数据};
\node[circle,fill=red!20] (left) at ([xshift=-9em]center.west) {$x\rightarrow y$ \\ \small{数据}};
\node[circle,fill=red!20] (right) at ([xshift=9em]center.east) {$y\rightarrow x$ \\ \small{数据}};
\node[word] (init) at ([yshift=6em]center.north){初始化};
\node[word] (init) at ([yshift=6em]center.north){\small{初始化}};
\node[circle,fill=red!20] (down) at ([yshift=-8em]center.south) {$x,y$ \\ 数据};
\node[circle,fill=red!20] (down) at ([yshift=-8em]center.south) {$x,y$ \\ \small{数据}};
\draw[->,thick] (init.south) -- ([yshift=0.2em]center.north);
\draw[->,thick] ([yshift=0.2em]down.north) -- ([yshift=-0.2em]center.south) node[pos=0.6,midway,align=left,xshift=-2.5em,yshift=0.5em] {语言模型\\目标函数};
\node [anchor=center] at ([yshift=2.0em,xshift=-2.5em]down.north){(模型优化)};
\draw[->,thick] ([yshift=1pt]left.north) .. controls +(90:2em) and +(90:2em) .. ([yshift=1pt,xshift=-2.2em]center.north) node[above,midway,align=center] {翻译模型目标函数\\(模型优化)};
\draw[->,thick] ([yshift=1pt,xshift=-1.8em]center.north) .. controls +(90:2em) and +(90:2em) .. ([yshift=1pt]right.north) node[above,pos=0.6,align=center] {回译\\(数据优化)};
\draw[->,thick] ([yshift=0.2em]down.north) -- ([yshift=-0.2em]center.south) node[pos=0.6,midway,align=left,xshift=-2.5em,yshift=0.5em] {\small{语言模型}\\\small{目标函数}};
\node [anchor=center] at ([yshift=2.0em,xshift=-2.5em]down.north){\small{(模型优化)}};
\draw[->,thick] ([yshift=1pt]left.north) .. controls +(90:2em) and +(90:2em) .. ([yshift=1pt,xshift=-2.2em]center.north) node[above,midway,align=center] {\small{翻译模型目标函数}\\\small{(模型优化)}};
\draw[->,thick] ([yshift=1pt,xshift=-1.8em]center.north) .. controls +(90:2em) and +(90:2em) .. ([yshift=1pt]right.north) node[above,pos=0.6,align=center] {\small{回译}\\\small{(数据优化)}};
\draw [->,thick] ([yshift=1pt]right.south) .. controls +(-90:2em) and +(-90:2em) .. ([yshift=1pt,xshift=2.2em]center.south) node[below,midway,align=center] {翻译模型目标函数\\(模型优化)};
\draw [->,thick] ([yshift=1pt,xshift=1.8em]center.south) .. controls +(-90:2em) and +(-90:2em) .. ([yshift=1pt]left.south) node[below,pos=0.6,align=center] {回译\\(数据优化)};
\draw [->,thick] ([yshift=1pt]right.south) .. controls +(-90:2em) and +(-90:2em) .. ([yshift=1pt,xshift=2.2em]center.south) node[below,midway,align=center] {\small{翻译模型目标函数}\\\small{(模型优化)}};
\draw [->,thick] ([yshift=1pt,xshift=1.8em]center.south) .. controls +(-90:2em) and +(-90:2em) .. ([yshift=1pt]left.south) node[below,pos=0.6,align=center] {\small{回译}\\\small{(数据优化)}};
\end{scope}
\end{tikzpicture}
\addtolength{\tabcolsep}{-4pt}
\begin{tabular}{c c c}
\begin{tikzpicture}
\begin{scope}
% ,minimum height =1em,minimum width=2em
\tikzstyle{model} = [draw,black,very thick,inner sep=3.5pt,rounded corners=4pt,fill=blue!20,minimum width=4em,minimum height=1.5em,font=\footnotesize]
\tikzstyle{data} = [draw,black,very thick,inner sep=3.5pt,rounded corners=4pt,fill=green!20,minimum width=4em,minimum height=1.5em,font=\footnotesize]
\tikzstyle{word} = [inner sep=3.5pt,font=\footnotesize]
\node[data] (old) at (0,0) {旧数据};
\node[data] (new) at ([xshift=3em]old.east) {新数据};
\node[data] (all) at ([xshift=2.55em,yshift=-4em]old.south) {最终数据};
\node[model] (final_model) at ([xshift=0em,yshift=-4em]all.south) {最终模型};
\draw [->,thick] ([yshift=-0.2em]old.south) .. controls +(south:2.5em) and +(north:2.5em) .. ([xshift=-0.2em,yshift=0.2em]all.north);
\draw [->,thick] ([yshift=-0.2em]new.south) .. controls +(south:2.5em) and +(north:2.5em) .. ([xshift=0.2em,yshift=0.2em]all.north);
\draw [->,thick] ([yshift=-0.2em]all.south) -- ([yshift=0.2em]final_model.north)node[pos=0.5,right,align=center,font=\footnotesize] {训练};
\node[word] at ([yshift=-2em]final_model.south){(a)数据混合};
\end{scope}
\end{tikzpicture}
&
\begin{tikzpicture}
\begin{scope}
\tikzstyle{model} = [draw,black,very thick,inner sep=3.5pt,rounded corners=4pt,fill=blue!20,minimum width=4em,minimum height=1.5em,font=\footnotesize]
\tikzstyle{data} = [draw,black,very thick,inner sep=3.5pt,rounded corners=4pt,fill=green!20,minimum width=4em,minimum height=1.5em,font=\footnotesize]
\tikzstyle{word} = [inner sep=3.5pt,font=\footnotesize]
\node[data] (old) at (0,0) {旧数据};
\node[data] (new) at ([xshift=3em]old.east) {新数据};
\node[model] (old_model) at ([yshift=-4em]old.south) {旧模型};
\node[model] (new_model) at ([yshift=-4em]new.south) {新模型};
\node[model] (final_model) at ([xshift=2.55em,yshift=-4em]old_model.south) {最终模型};
\draw [->,thick] ([yshift=-0.2em]old.south) -- ([yshift=0.2em]old_model.north) node[pos=0.5,left,align=center,font=\footnotesize] {训练};
\draw [->,thick] ([yshift=-0.2em]new.south) -- ([yshift=0.2em]new_model.north) node[pos=0.5,right,align=center,font=\footnotesize] {训练};
\draw [->,thick] ([yshift=-0.2em]old_model.south) .. controls +(south:2.5em) and +(north:2.5em) .. ([xshift=-0.2em,yshift=0.2em]final_model.north);
\draw [->,thick] ([yshift=-0.2em]new_model.south) .. controls +(south:2.5em) and +(north:2.5em) .. ([xshift=0.2em,yshift=0.2em]final_model.north);
\node[word] at ([yshift=2em]final_model.north) {插值};
\node[word] at ([yshift=-2em]final_model.south){(b)模型插值};
%空白占位
\node[word] at ([xshift=-3em]old.west) {};
\node[word] at ([xshift=3em]new.east) {};
\end{scope}
\end{tikzpicture}
&
\begin{tikzpicture}
\begin{scope}
\tikzstyle{model} = [draw,black,very thick,inner sep=3.5pt,rounded corners=4pt,fill=blue!20,minimum width=4em,minimum height=1.5em,font=\footnotesize]
\tikzstyle{data} = [draw,black,very thick,inner sep=3.5pt,rounded corners=4pt,fill=green!20,minimum width=4em,minimum height=1.5em,font=\footnotesize]
\tikzstyle{word} = [inner sep=3.5pt,font=\footnotesize]
\node[data] (old) at (0,0) {旧数据};
\node[data] (new) at ([xshift=3em]old.east) {新数据};
\node[model] (final_model) at ([yshift=-8.8em]new.south) {最终模型};
\draw [->,thick] ([yshift=-0.2em]new.south) -- ([xshift=0.2em,yshift=0.2em]final_model.north) node[pos=0.5,right,align=center,font=\footnotesize] {目标\\函数1};
\draw [->,thick,dashed] ([yshift=-0.2em]old.south) .. controls +(south:4.5em) and +(north:4.5em) .. ([xshift=-0.2em,yshift=0.2em]final_model.north) node[align=center,font=\footnotesize] at ([xshift=-0.1em,yshift=-4em]old.south) {目标\\函数2};
\node[word] at ([yshift=-2em,xshift=-2.55em]final_model.south){(c)多目标训练};
\end{scope}
\end{tikzpicture}
\end{tabular}
\addtolength{\tabcolsep}{4pt}
\ No newline at end of file
\begin{tabular}{c c}
\begin{tikzpicture}
\begin{scope}
% ,minimum height =1em,minimum width=2em
\tikzstyle{memory} = [draw,black,very thick,inner sep=2pt,rounded corners=0pt,fill=blue!20,minimum width=2em,minimum height=1.5em,anchor=west]
\tikzstyle{thread} = [very thick,inner sep=3.5pt,rounded corners=0pt,minimum width=3em,minimum height=1.5em]
\tikzstyle{word} = [inner sep=3.5pt,font=\scriptsize]
\node[thread] (one) at (0,0) {};
\node [word] at (one.north) {\scriptsize 数据1};
\draw[|-|,very thick] (-1.5em,0em) -- (1.6em,0em);
\node[thread,minimum width=5em] (two) at ([yshift=-1em,xshift=2.6em]one.south east) {};
\node [word] at (two.north) {\scriptsize 数据2};
\draw[|-|,very thick] (1.8em,-1.8em) -- (6.5em,-1.8em);
\node[thread,minimum width=4em] (three) at ([yshift=-1em,xshift=0.3em]two.south east) {};
\node [word] at (three.north) {\scriptsize 数据3};
\draw[|-|,very thick] (5em,-1.8*2em) -- (9em,-1.8*2em);
\node[thread,minimum width=2em] (four) at ([yshift=-1em,xshift=1.2em]three.south east) {};
\node [word] at (four.north) {\scriptsize 数据4};
\draw[|-|,very thick] (9.3em,-1.8*3em) -- (11em,-1.8*3em);
\node [memory] (mone) at ([yshift=4em,xshift=1em]one.north) {};
\node [memory] (mtwo) at ([xshift=0em]mone.east) {};
\node [memory] (mthree) at ([xshift=0em]mtwo.east) {};
\node [memory] (mfour) at ([xshift=0em]mthree.east) {};
\draw[->,very thick] (-0.8,-2.5) -- (4.7,-2.5);
\draw[->,very thick] (-0.8,-2.5) -- (-0.8,1);
\node [word] (time) at ([yshift=-1.5em,xshift=0.3em]four.south) {\scriptsize 时间线};
\node [word] (data) at ([yshift=1.5em,xshift=-2.2em]one.west) {\scriptsize 数据};
\draw [->,dashed,line width=0.7pt] ([yshift=0.5em]one.north) .. controls +(north:1.5em) and +(south:1.5em) .. ([yshift=-0.2em]mone.south);
\draw [->,dashed,line width=0.7pt] ([yshift=0.5em]two.north) -- ([yshift=-0.2em]mtwo.south);
\draw [->,dashed,line width=0.7pt] ([yshift=0.5em,xshift=0.5em]three.north) .. controls +(north:3.5em) and +(south:4.5em) .. ([yshift=-0.2em]mthree.south);
\draw [->,dashed,line width=0.7pt] ([yshift=0.5em]four.north) .. controls +(north:4.5em) and +(south:4.5em) .. ([yshift=-0.2em]mfour.south);
\node [word] at ([yshift=-6em]two.south) {(a)显存不复用};
%占位
\node[word] at ([xshift=1em]four.east) {};
\node [word] at ([xshift=1.5em,yshift=5.6em]one.north) {\scriptsize 显存};
\begin{pgfonlayer}{background}
\node [rectangle,inner sep=0.5em,rounded corners=1pt,minimum width=10em,minimum height=3.6em,fill=gray!10,drop shadow] at ([yshift=6.6em,xshift=1em]two.north) {};
\end{pgfonlayer}
\end{scope}
\end{tikzpicture}
&
\begin{tikzpicture}
\begin{scope}
%\tikzstyle{memory} = [draw,black,very thick,inner sep=2pt,rounded corners=0pt,fill=blue!20,minimum width=2em,minimum height=1.5em,anchor=west]
%\tikzstyle{thread} = [draw,black,very thick,inner sep=3.5pt,rounded corners=0pt,fill=green!20,minimum width=3em,minimum height=1.5em]
\tikzstyle{memory} = [draw,black,very thick,inner sep=2pt,rounded corners=0pt,fill=blue!20,minimum width=2em,minimum height=1.5em,anchor=west]
\tikzstyle{thread} = [very thick,inner sep=3.5pt,rounded corners=0pt,minimum width=3em,minimum height=1.5em]
\tikzstyle{word} = [inner sep=3.5pt,font=\scriptsize]
\node[thread] (one) at (0,0) {};
\node [word] at (one.north) {\scriptsize 数据1};
\draw[|-|,very thick] (-1.5em,0em) -- (1.6em,0em);
\node[thread,minimum width=5em] (two) at ([yshift=-1em,xshift=2.6em]one.south east) {};
\node [word] at (two.north) {\scriptsize 数据2};
\draw[|-|,very thick] (1.8em,-1.8em) -- (6.5em,-1.8em);
\node[thread,minimum width=4em] (three) at ([yshift=-1em,xshift=0.3em]two.south east) {};
\node [word] at (three.north) {\scriptsize 数据3};
\draw[|-|,very thick] (5em,-1.8*2em) -- (9em,-1.8*2em);
\node[thread,minimum width=2em] (four) at ([yshift=-1em,xshift=1.2em]three.south east) {};
\node [word] at (four.north) {\scriptsize 数据4};
\draw[|-|,very thick] (9.3em,-1.8*3em) -- (11em,-1.8*3em);
\node [memory] (mone) at ([yshift=4em,xshift=1em]one.north) {};
\node [memory] (mtwo) at ([xshift=0em]mone.east) {};
\node [memory,fill=white,minimum width=4em] (mthree) at ([xshift=0em]mtwo.east) {};
%\node [memory,fill=white] (mfour) at ([xshift=0em]mthree.east) {};
\draw[->,very thick] (-0.8,-2.5) -- (4.7,-2.5);
\draw[->,very thick] (-0.8,-2.5) -- (-0.8,1);
\node [word] (time) at ([yshift=-1.5em,xshift=0.3em]four.south) {\scriptsize 时间线};
\node [word] (data) at ([yshift=1.5em,xshift=-2.2em]one.west) {\scriptsize 数据};
\draw [->,dashed,line width=0.7pt] ([yshift=0.5em]one.north) .. controls +(north:1.5em) and +(south:1.5em) .. ([yshift=-0.2em,xshift=-0.4em]mone.south);
\draw [->,dashed,line width=0.7pt] ([yshift=0.5em]two.north) .. controls +(north:3.5em) and +(south:3.5em) .. ([yshift=-0.2em,xshift=0.4em]mone.south);
\draw [->,dashed,line width=0.7pt] ([yshift=0.5em,xshift=0.5em]three.north) .. controls +(north:3.5em) and +(south:3.5em) .. ([yshift=-0.2em,xshift=-0.4em]mtwo.south);
\draw [->,dashed,line width=0.7pt] ([yshift=0.5em]four.north) .. controls +(north:4.5em) and +(south:3.5em) .. ([yshift=-0.2em,xshift=0.4em]mtwo.south);
\node [word] at ([xshift=1.5em,yshift=5.6em]one.north) {\scriptsize 显存};
\node [word] at ([yshift=-6em]two.south) {(b)显存复用};
\begin{pgfonlayer}{background}
\node [rectangle,inner sep=0.5em,rounded corners=1pt,minimum width=10em,minimum height=3.6em,fill=gray!10,drop shadow] at ([yshift=6.6em,xshift=1em]two.north) {};
\end{pgfonlayer}
\end{scope}
\end{tikzpicture}
\end{tabular}
\ No newline at end of file
%%% outline
%-------------------------------------------------------------------------
\begin{tikzpicture}[scale=0.8]
\tikzstyle{every node}=[scale=0.8]
\tikzstyle{node}=[rounded corners=4pt, draw,minimum width=3em, minimum height=2em, drop shadow={shadow xshift=0.14em, shadow yshift=-0.14em}]
\begin{scope}
%\draw[fill=yellow!20] (-5em, 0) -- (-6em, 1em) -- (5em, 1em) -- (6em, 0em) -- (5em, -1em) -- (-6em, -1em) -- (-5em, 0em);
%\draw[fill=yellow!20] (-5em, 10em) -- (-6em, 11.2em) -- (5em, 11.2em) -- (6em, 10em) -- (5em,8.8em) -- (-6em, 8.8em) -- (-5em, 10em);
\node[] (n1) at (0,0){小牛翻译的总部在哪里?};
\node[node,fill=blue!20] (c1) at (0, 5em){\scriptsize\bfnew{机器翻译}};
\node[align=left] (n2) at (0,10em){Where is the headquarters \\ of {\color{red} Mavericks Translation}?};
\node [draw,single arrow,inner ysep=0.3em, minimum height=2.4em, rotate=90,fill=cyan!40,very thin] (arrow1) at (0, 2.4em) {};
\node [draw,single arrow,inner ysep=0.3em, minimum height=2em, rotate=90,fill=cyan!40,very thin] (arrow2) at (0, 7.2em) {};
\node[font=\Large,text=red] at (0, -2em){\ding{56}};
\end{scope}
\begin{scope}[xshift=14em]
%\draw[fill=yellow!20] (-5em, 0) -- (-6em, 1em) -- (5em, 1em) -- (6em, 0em) -- (5em, -1em) -- (-6em, -1em) -- (-5em, 0em);
%\draw[fill=yellow!20] (-5em, 10em) -- (-6em, 11.2em) -- (5em, 11.2em) -- (6em, 10em) -- (5em,8.8em) -- (-6em, 8.8em) -- (-5em, 10em);
\node[] (n3) at (0,0){小牛翻译的总部在哪里?};
\node[node,fill=blue!20] (c2) at (-3em, 5em){\scriptsize\bfnew{机器翻译}};
\node[node,fill=red!20] (c3) at (3em, 5em){\scriptsize\bfnew{术语词典}};
\node[font=\scriptsize,draw,inner sep=3pt,fill=red!20,minimum height=1em] (w1) at (9em, 6.5em){小牛翻译};
\node[font=\scriptsize,draw,inner sep=3pt,fill=red!20,minimum height=1em] (w2) at (9em, 3.5em){NiuTrans};
\node[font=\Large] (add) at (0em, 5em){+};
\node[align=left] (n4) at (0,10em){Where is the headquarters \\ of {\color{red} NiuTrans}?};
\node [draw,single arrow,inner ysep=0.3em, minimum height=2.4em, rotate=90,fill=cyan!40,very thin] (arrow1) at (0, 2.4em) {};
\node [draw,single arrow,inner ysep=0.3em, minimum height=2em, rotate=90,fill=cyan!40,very thin] (arrow2) at (0, 7.2em) {};
\draw[dash pattern=on 1pt off 0.5pt,black,line width=1.2pt,->, out=180, in=45] ([xshift=-0.2em]w1.180) to ([xshift=0.2em]c3.20);
\draw[dash pattern=on 1pt off 0.5pt,black,line width=1.2pt,->,out=180,in=-45] ([xshift=-0.2em]w2.180) to ([xshift=0.2em]c3.-20);
\node[font=\Large,text=ugreen] at (0, -2em){\ding{52}};
\end{scope}
\end{tikzpicture}
......@@ -33,17 +33,17 @@
\section{机器翻译的应用并不简单}
\parinterval 机器翻译一直是自然语言处理的热点,无论从评测比赛的结果,还是论文发表数量上看,机器翻译的研究可谓火热。但是,客观的说,我们离机器翻译完美的应用还有相当的距离。这主要是因为,成熟的系统需要很多技术的融合。因此,机器翻译系统研发也是一项复杂的系统工程。而机器翻译研究大多是对局部模型和方法的调整,这也会造成一个现象很多论文里报道的技术方法可能无法直接应用于真实场景的系统。因此,需要关注如何对具体的机器翻译应用问题进行求解,使机器翻译技术能够落地。有几方面挑战:
\parinterval 机器翻译一直是自然语言处理的热点,无论从评测比赛的结果,还是论文发表数量上看,机器翻译的研究可谓火热。但是,客观地说,我们离机器翻译完美的应用还有相当的距离。这主要是因为,成熟的系统需要很多技术的融合。因此,机器翻译系统研发也是一项复杂的系统工程。而机器翻译研究大多是对局部模型和方法的调整,这也会造成一个现象:很多论文里报道的技术方法可能无法直接应用于真实场景的系统。因此,需要关注如何对具体的机器翻译应用问题进行求解,使机器翻译技术能够落地。这里面有几方面挑战:
\begin{itemize}
\vspace{0.5em}
\item {\small\bfnew{机器翻译模型很脆弱}}。实验环境下,给定翻译任务,甚至给定训练和测试数据,机器翻译模型可以表现的很好。但是,应用场景是不断变化的。经常会出现训练数据缺乏、应用领域与训练数据不匹配、用户的测试方法与开发者不同等等一系列问题。特别是,对于不同的任务,神经机器翻译等模型需要进行非常细致的调整,理想中一套``包打天下''的模型和设置是不存在的。这些都导致一个结果:直接使用既有机器翻译模型很难满足真实应用的需求
\item {\small\bfnew{机器翻译模型很脆弱}}。实验环境下,给定翻译任务,甚至给定训练和测试数据,机器翻译模型可以表现得很好。但是,应用场景是不断变化的,经常会出现训练数据缺乏、应用领域与训练数据不匹配、用户的测试方法与开发者不同等一系列问题。特别是,对于不同的任务,神经机器翻译模型需要进行非常细致的调整,理想中“一套包打天下”的模型和设置是不存在的。这些都导致一个结果:直接使用既有机器翻译模型很难满足不断变化的应用场景的需求。
\vspace{0.5em}
\item {\small\bfnew{机器翻译缺少针对场景的应用技术}}。目前为止,机器翻译的研究进展已经为我们提供了很好的机器翻译基础模型。但是,用户并不是简单地与这些模型“打交道”,他们更加关注如何解决自身的业务需求,例如,机器翻译应用的交互方式、系统是否可以自己预估翻译可信度等等。甚至,在某些场景中,用户对翻译模型的体积和速度都有非常严格的要求。
\vspace{0.5em}
\item {\small\bfnew{优秀系统的研发需要长时间的打磨}}。工程打磨也是研发优秀机器翻译系统的必备条件,有些时候甚至是决定性的。从科学研究的角度看,我们需要对更本质的科学问题进行探索,而非简单的工程开发与调试。但是,对一个初级的系统进行研究往往会掩盖掉``真正的问题'',因为很多问题在更优秀的系统中并不存在。
\item {\small\bfnew{优秀系统的研发需要长时间的打磨}}。工程打磨也是研发优秀机器翻译系统的必备条件,有些时候甚至是决定性的。从科学研究的角度看,我们需要对更本质的科学问题进行探索,而非简单的工程开发与调试。但是,对一个初级的系统进行研究往往会掩盖掉“真正的问题”,因为很多问题在更优秀的系统中并不存在。
\vspace{0.5em}
\end{itemize}
......@@ -69,22 +69,32 @@
\parinterval 增量训练就是满足上述需求的一种方法。本质上来说,神经机器翻译中使用的随机梯度下降方法就是典型的增量训练方法,其基本思想是:每次选择一个样本对模型进行更新,这个过程反复执行,每次模型更新都是一次增量训练。当多个样本构成了一个新数据集时,可以把这些新样本作为训练数据,把当前的模型作为初始模型,之后正常执行机器翻译的训练过程即可。如果新增加的数据量不大(比如,几万句对),训练的代价非常低。
\parinterval 这里面的一个问题是,新的数据虽然能代表一部分的翻译现象,但是如果仅仅依赖新数据进行更新,会使模型对新数据过分拟合,进而造成无法很好地处理新数据之外的样本。这也可以被看做是一种灾难性遗忘的问题{\color{red} 参考文献!},即:模型过分注重对新样本的拟合,丧失了旧模型的一部分能力。解决这个问题,有几种思路:
\parinterval 这里面的一个问题是,新的数据虽然能代表一部分的翻译现象,但是如果仅仅依赖新数据进行更新,会使模型对新数据过分拟合,进而无法很好地处理新数据之外的样本。这也可以被看作是一种灾难性遗忘的问题\upcite{DBLP:conf/coling/GuF20},即:模型过分注重对新样本的拟合,丧失了旧模型的一部分能力。解决这个问题有以下几种思路:
\begin{itemize}
\vspace{0.5em}
\item 数据混合。在增量训练时,除了使用新的数据,再混合一定量的旧数据,混合的比例可以根据训练的代价进行调整。这样,模型相当于在全量数据的一个采样结果上进行更新。
\item 数据混合\upcite{DBLP:journals/corr/ChuDK17}。在增量训练时,除了使用新的数据,再混合一定量的旧数据,混合的比例可以根据训练的代价进行调整。这样,模型相当于在全量数据的一个采样结果上进行更新。
\vspace{0.5em}
\item 模型插值{\color{red} 参考文献!}。在增量训练之后,将新模型与旧模型进行插值。
\item 模型插值\upcite{DBLP:conf/emnlp/WangULCS17}。在增量训练之后,将新模型与旧模型进行插值。
\vspace{0.5em}
\item 多目标训练{\color{red} 参考文献!}。在增量训练时,除了在新数据上定义损失函数之外,可以再定义一个在旧数据上的损失函数,这样确保模型可以在两个数据上都有较好的表现。另一种方案是引入正则化项,使新模型的参数不会偏离旧模型的参数太远。
\item 多目标训练\upcite{barone2017regularization,DBLP:conf/aclnmt/KhayrallahTDK18,DBLP:conf/naacl/ThompsonGKDK19}。在增量训练时,除了在新数据上定义损失函数之外,可以再定义一个在旧数据上的损失函数,这样确保模型可以在两个数据上都有较好的表现。另一种方案是引入正则化项,使新模型的参数不会偏离旧模型的参数太远。
\vspace{0.5em}
\end{itemize}
\parinterval {\color{red} 图XXX}给出了上述方法的对比。在实际应用中,还有很多细节会影响增量训练的效果,比如,学习率大小的选择等。另外,新的数据积累到何种规模可以进行增量训练也是实践中需要解决问题。一般来说,增量训练使用的数据量越大,训练的效果越稳定。但是,这并不是说数据量少就不可以进行增量训练,而是如果数据量过少时,需要考虑训练代价和效果之间的平衡。而且,过于频繁的增量训练也会带来更多的灾难性遗忘的风险,因此合理进行增量训练也是应用中需要实践的。
%----------------------------------------------
\begin{figure}[htp]
\centering
\input{./Chapter18/Figures/figure-comparison-of-incremental-model-optimization-methods}
%\setlength{\abovecaptionskip}{-0.2cm}
\caption{增量式模型优化方法}
\label{fig:18-1}
\end{figure}
%----------------------------------------------
\parinterval\ref{fig:18-1}给出了上述方法的对比。在实际应用中,还有很多细节会影响增量训练的效果,比如,学习率大小的选择等。另外,新的数据积累到何种规模可以进行增量训练也是实践中需要解决问题。一般来说,增量训练使用的数据量越大,训练的效果越稳定。但是,这并不是说数据量少就不可以进行增量训练,而是如果数据量过少时,需要考虑训练代价和效果之间的平衡。而且,过于频繁的增量训练也会带来更多的灾难性遗忘的风险,因此合理进行增量训练也是应用中需要实践的。
\parinterval 需要注意的是,理想状态下,系统使用者会希望系统看到少量句子就可以很好地解决一类翻译问题,即:进行真正的小样本学习。但是,现实的情况是,现在的机器翻译系统还无法很好地做到“举一反三”。增量训练也需要专业人士完成才能得到相对较好的效果。
......@@ -102,6 +112,38 @@
\section{翻译结果可干预性}
\parinterval 交互式机器翻译体现了一种用户的行为“干预”机器翻译结果的思想。实际上,在机器翻译出现错误时,人们总是希望用一种直接有效的方式“改变”译文,达到改善翻译质量的目的。比如,如果机器翻译系统可以输出多个候选译文,用户可以在其中挑选最好的译文进行输出。也就是说,人干预了译文候选的排序过程。另一个例子是使用{\small\bfnew{翻译记忆}}\index{翻译记忆}(Translation Memory\index{Translation Memory})改善机器翻译系统的性能。翻译记忆记录了高质量的源语言-目标语言句对,有时也可以被看作是一种先验知识或“记忆”。因此,当进行机器翻译(包括统计机器翻译和神经机器翻译)时,使用翻译记忆指导翻译过程也可以被看作是一种干预手段({\color{red} 参考文献!SMT和NMT都有,SMT中CL上有个长文,自动化所的,NMT的我记得腾讯应该有,找到后和我确认一下!})。
\parinterval 虽然干预机器翻译系统的方式很多,最常用的还是对源语言特定片段翻译的干预,以期望最终句子的译文中满足某些对片段翻译的约束。这个问题也被称作{\small\bfnew{基于约束的翻译}}\index{基于约束的翻译} (Constraint-based Translation\index{Constraint-based Translation})。比如,在翻译网页时,需要保持译文中的网页标签与源文一致。另一个典型例子是术语翻译。在实际应用中,经常会遇到公司名称、品牌名称、产品名称等专有名词和行业术语,以及不同含义的缩写,比如,对于“小牛翻译”这个专有术语,不同的机器翻译系统给出的结果不一样:“Maverick translation”、“Calf translation”、“The mavericks translation”…… 而它正确的翻译应该为“NiuTrans”。对于这些类似的特殊词汇,大多数机器翻译引擎很难翻译得准确。一方面,因为模型大多是在通用数据集上训练出来的,并不能保证数据集能涵盖所有的语言现象;另一方面,即使这些术语在训练数据中出现,它们通常也是低频的,模型比较难学到。为了保证翻译的准确性,对术语翻译进行干预是十分有必要的,这对领域适应等问题的求解也是非常有意义的。
\parinterval{\small\bfnew 术语翻译}\index{术语翻译}(Lexically Constrained Translation)\index{Lexically Constrained Translation}而言,在不干预的情况下让模型直接翻译出正确术语是很难的,因为目标术语翻译词很可能是未登录词,因此必须人为提供额外的术语词典,那么我们的目标就是让模型的翻译输出遵守用户提供的术语约束。这个过程如图\ref{fig:18-2}所示。
%----------------------------------------------
\begin{figure}[htp]
\centering
\input{./Chapter18/Figures/figure-translation-interfered}
%\setlength{\abovecaptionskip}{-0.2cm}
\caption{翻译结果可干预性({\color{red} 这个图需要修改!有些乱,等回沈阳找我讨论!})}
\label{fig:18-2}
\end{figure}
%----------------------------------------------
\parinterval 在统计机器翻译中,翻译本质上是由短语和规则构成的推导,因此修改译文比较容易,比如,在一个源语言片段所对应的翻译候选集中添加希望得到的译文即可。而神经机器翻译是一个端到端模型,内部基于连续空间的实数向量表示,翻译过程本质上是连续空间中元素的一系列映射、组合和代数运算,因此无法像修改符号系统那样直接修改模型并加入离散化的约束来影响译文生成。目前主要有两种解决思路:
\begin{itemize}
\vspace{0.5em}
\item 强制生成。这种方法并不改变模型,而是在解码过程中按照一定的策略来实施约束,一般是修改束搜索算法以确保输出必须包含指定的词或者短语\upcite{DBLP:conf/acl/HokampL17,DBLP:conf/naacl/PostV18,DBLP:conf/wmt/ChatterjeeNTFSB17,DBLP:conf/naacl/HaslerGIB18},例如,在获得译文输出后,利用注意力机制获取词对齐,之后通过词对齐对指定部分译文进行强制替换(这一过程可参考此列表后的代码示意)。或者,对包含正确术语翻译的翻译候选进行额外的加分,以确保解码时这样的翻译候选的排名足够靠前。
\vspace{0.5em}
\item 数据增强。这类方法通过修改机器翻译模型的数据和训练过程来实现约束。通常是根据术语词典对源语言句子进行一定的修改,例如,将术语的译文编辑到源语言句子中,之后将原始语料库和合成语料库进行混合训练,期望模型能够自动利用术语信息来指导解码;或者利用占位符来替换源语言中的术语,待翻译完成后再进行还原(这一替换与还原过程可参考后文的代码示意)\upcite{DBLP:conf/naacl/SongZYLWZ19,DBLP:conf/acl/DinuMFA19,DBLP:journals/corr/abs-1912-00567,DBLP:conf/ijcai/ChenCWL20}。
\vspace{0.5em}
\end{itemize}
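\parinterval 上述“强制生成”中基于词对齐的强制替换,可以用如下示意代码来说明(假设性示例:其中的注意力权重矩阵为手工构造,函数接口也仅为说明用途而设,并非实际系统的实现):

\begin{verbatim}
# 假设性示例:利用注意力权重近似词对齐,
# 将术语词典中指定的译文强制替换到输出中
def force_replace(src_tokens, hyp_tokens, attn, term_dict):
    # attn[i][j]:源语言位置 i 与目标语言位置 j 的注意力权重
    hyp = list(hyp_tokens)
    for i, src_word in enumerate(src_tokens):
        if src_word in term_dict:
            # 取对该源语言词注意力最大的目标位置作为对齐点
            j = max(range(len(hyp)), key=lambda t: attn[i][t])
            hyp[j] = term_dict[src_word]
    return hyp

src = ["小牛翻译", "的", "总部", "在", "哪里", "?"]
hyp = ["Where", "is", "the", "headquarters", "of",
       "Mavericks", "?"]
attn = [[0.02] * 7 for _ in src]   # 手工构造的示意权重
attn[0][5] = 0.9                   # “小牛翻译”主要对齐到 "Mavericks"
print(force_replace(src, hyp, attn, {"小牛翻译": "NiuTrans"}))
# 输出:['Where', 'is', 'the', 'headquarters', 'of', 'NiuTrans', '?']
\end{verbatim}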
\parinterval 强制生成的方法是在搜索策略上进行限制,与模型无关,这类方法能保证输出满足约束,但是会影响翻译速度。数据增强的方法是通过构造特定格式的数据让模型训练,从而让模型具有一定的泛化能力,这类方法需要进行译前译后编辑,通常不会影响翻译速度,但并不能保证输出能满足约束。
\parinterval 此外,机器翻译在应用时通常还需要进行译前译后的处理。译前处理指的是在翻译前对源语言句子进行修改和规范,从而生成比较顺畅的译文,提高译文的可读性和准确率。在实际应用时,由于用户输入的形式多样,可能会包含术语、缩写、数学公式等,有些甚至可能还包含网页标签,因此对源文进行预处理是很有必要的。常见的处理工作包括格式转换、标点符号检查、术语编辑、标签识别等。待翻译完成后,则需要对机器译文进行进一步的编辑和修正,从而使其符合使用规范,比如进行标点、格式检查,术语、标签还原等,这些过程通常都是按照设定的处理策略自动完成的。另外,译文长度的控制、译文多样性的控制等也可以丰富机器翻译系统干预的手段(见{\chapterfourteen})。
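\parinterval 以数据增强中的占位符方法为例,译前替换与译后还原的过程可以用如下示意代码来说明(假设性示例,占位符格式与术语词典均为虚构):

\begin{verbatim}
# 假设性示例:用占位符实现术语的译前替换与译后还原
TERMS = {"小牛翻译": "NiuTrans"}

def preprocess(src):
    # 译前处理:将术语替换为占位符,并记录占位符对应的译文
    mapping = {}
    for k, (term, trans) in enumerate(TERMS.items()):
        placeholder = "<term{}>".format(k)
        if term in src:
            src = src.replace(term, placeholder)
            mapping[placeholder] = trans
    return src, mapping

def postprocess(hyp, mapping):
    # 译后处理:将占位符还原为指定的术语译文
    for placeholder, trans in mapping.items():
        hyp = hyp.replace(placeholder, trans)
    return hyp

src, mapping = preprocess("小牛翻译的总部在哪里?")
# 假设翻译系统把占位符当作普通单词,在译文中原样保留
hyp = "Where is the headquarters of <term0> ?"
print(postprocess(hyp, mapping))
# 输出:Where is the headquarters of NiuTrans ?
\end{verbatim}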
%----------------------------------------------------------------------------------------
% NEW SECTION
%----------------------------------------------------------------------------------------
......@@ -110,7 +152,7 @@
\parinterval 在机器翻译研究中,一般会假设计算资源是充足的。但是,在很多应用场景中,机器翻译使用的计算资源非常有限,比如,一些离线设备上没有GPU处理器,而且CPU的处理能力也很弱,甚至内存也非常有限。这时,让模型变得更小、系统变得更快就成为了一个重要的需求。
\parinterval 本书中已经讨论了大量的技术方法,可用于小设备上的机器翻译,例如:
\parinterval 本书已经讨论了大量可用于小型设备的机器翻译技术,例如:
\begin{itemize}
\vspace{0.5em}
......@@ -126,7 +168,7 @@
\item 面向设备的结构学习({\chapterfifteen})。可以把设备的存储及延时作为目标函数的一部分,自动搜索高效的翻译模型结构。
\vspace{0.5em}
\item 动态适应性模型(引用:王强emnlp findings,还有Adaptive neural networks for fast test-time prediction,Multi-scale dense networks for resource efficient image classification)。模型可以动态调整大小或者计算规模,以达到在不同设备上平衡延时和精度的目的。比如,可以根据延时的要求,动态生成合适深度的神经网络模型进行翻译。
\item 动态适应性模型\upcite{DBLP:conf/emnlp/WangXZ20,DBLP:journals/corr/BolukbasiWDS17,DBLP:conf/iclr/HuangCLWMW18}。模型可以动态调整大小或者计算规模,以达到在不同设备上平衡延时和精度的目的。比如,可以根据延时的要求,动态生成合适深度的神经网络模型进行翻译。
\vspace{0.5em}
\end{itemize}
......@@ -156,10 +198,19 @@
\item GPU部署中,由于GPU成本较高,因此可以考虑在单GPU设备上部署多套不同的系统。如果这些系统之间的并发不频繁,翻译延时不会有明显增加。这种多个模型共享一个设备的方法比较适合翻译请求相对低频但是翻译任务又很多样的情况。
\vspace{0.5em}
\item 机器翻译大规模GPU部署对显存的使用也很严格。由于GPU显存较为有限,因此模型运行的显存消耗也是需要考虑的。一般来说,除了模型压缩和结构优化之外({\chapterfourteen}{\chapterfifteen}),也需要对模型的显存分配和使用进行单独的优化。例如,使用显存池来缓解频繁申请和释放显存空间造成的延时。另外,也可以尽可能让同一个显存块保存生命期不重叠的数据,避免重复开辟新的存储空间。图XXX 展示了一个显存复用的示例。
\item 机器翻译的大规模GPU部署对显存的使用也有严格的要求。由于GPU显存较为有限,因此模型运行的显存消耗也是需要考虑的。一般来说,除了模型压缩和结构优化之外({\chapterfourteen}和{\chapterfifteen}),也需要对模型的显存分配和使用进行单独的优化。例如,使用显存池来缓解频繁申请和释放显存空间造成的延时。另外,也可以尽可能让同一个显存块保存生命期不重叠的数据,避免重复开辟新的存储空间。图\ref{fig:18-3}展示了一个显存复用的示例(此列表之后还给出了一段简化的代码示意)。
%----------------------------------------------
\begin{figure}[htp]
\centering
\input{./Chapter18/Figures/figure-memory-multi-use}
%\setlength{\abovecaptionskip}{-0.2cm}
\caption{显存复用示例}
\label{fig:18-3}
\end{figure}
%----------------------------------------------
\vspace{0.5em}
\item 在翻译请求高并发的场景中,使用批量翻译也是有效利用GPU设备的方式。不过,机器翻译是一个处理不定长序列的任务,输入的句子长度差异较大。而且,由于译文长度无法预知,进一步增加了不同长度的句子所消耗计算资源的不确定性。这时,可以让长度相近的句子在一个批次里处理,减小由于句子长度不统一造成的补全过多、设备利用率低的问题。例如,可以按输入句子长度范围分组,如图XXX。 也可以设计更加细致的方法对句子进行分组,以最大化批量翻译中设备的利用率({\color{red} 参考文献:TurboTransformers: An Efficient GPU Serving System For Transformer Models}
\item 在翻译请求高并发的场景中,使用批量翻译也是有效利用GPU设备的方式。不过,机器翻译是一个处理不定长序列的任务,输入的句子长度差异较大。而且,由于译文长度无法预知,进一步增加了不同长度的句子所消耗计算资源的不确定性。这时,可以让长度相近的句子在一个批次里处理,减小由于句子长度不统一造成的补全过多、设备利用率低的问题。例如,可以按输入句子长度范围分组,也可以设计更加细致的方法对句子进行分组,以最大化批量翻译中设备的利用率\upcite{DBLP:journals/corr/abs-2010-05680}(按长度分组组批的做法可参考此列表后的代码示意)。
\vspace{0.5em}
\end{itemize}
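\parinterval 下面用一段示意代码说明显存池的基本思想(假设性示例:为了便于运行,这里用主机内存上的bytearray代替真实的设备显存,真实系统中对应的是设备端的存储块):

\begin{verbatim}
from collections import defaultdict

# 假设性示例:一个极简的“显存池”,按块大小维护空闲块列表,
# 申请时优先复用已释放的块,避免频繁向设备申请/释放存储
class MemoryPool:
    def __init__(self):
        self.free_blocks = defaultdict(list)     # 块大小 -> 空闲块列表

    def allocate(self, size):
        if self.free_blocks[size]:
            return self.free_blocks[size].pop()  # 复用生命期已结束的块
        return bytearray(size)                   # 示意:实际为设备显存

    def release(self, block):
        self.free_blocks[len(block)].append(block)

pool = MemoryPool()
a = pool.allocate(1024)
pool.release(a)              # “数据1”的生命期结束
b = pool.allocate(1024)      # “数据2”复用同一块,显存占用不再增长
assert a is b
\end{verbatim}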
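\parinterval 按长度分组组批的做法也可以用一段示意代码来说明(假设性示例,桶的宽度与批次大小均为任意设定):

\begin{verbatim}
# 假设性示例:按输入句子长度分桶(bucketing)后再组批,
# 减少同一批次内因句长不一造成的补全(padding)浪费
def make_batches(sentences, batch_size=3, bucket_width=4):
    buckets = {}
    for sent in sentences:
        key = len(sent) // bucket_width    # 长度相近的句子进同一个桶
        buckets.setdefault(key, []).append(sent)
    batches = []
    for key in sorted(buckets):
        group = buckets[key]
        for i in range(0, len(group), batch_size):
            batches.append(group[i:i + batch_size])
    return batches

sents = [s.split() for s in [
    "how are you", "fine thank you", "hello",
    "where is the headquarters of NiuTrans",
    "this is a long sentence with many words"]]
for batch in make_batches(sents):
    print([len(s) for s in batch])   # 同批内句长接近,补全开销小
\end{verbatim}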
......
......@@ -4086,7 +4086,7 @@ year = {2012}
Joris Pelemans and
Hugo Van Hamme and
Patrick Wambacq},
publisher={European Association of Computational Linguistics},
publisher={Conference of the European Chapter of the Association for Computational Linguistics},
year={2017}
}
......@@ -4569,7 +4569,7 @@ author = {Yoshua Bengio and
Jozef Mokry and
Maria Nadejde},
title = {Nematus: a Toolkit for Neural Machine Translation},
publisher = {European Association of Computational Linguistics},
publisher = {Conference of the European Chapter of the Association for Computational Linguistics},
pages = {65--68},
year = {2017}
}
......@@ -9067,6 +9067,376 @@ author = {Zhuang Liu and
year = {2016},
}
@inproceedings{DBLP:conf/emnlp/WangTWS19a,
author = {Xing Wang and
Zhaopeng Tu and
Longyue Wang and
Shuming Shi},
title = {Self-Attention with Structural Position Representations},
pages = {1403--1409},
publisher = {Conference on Empirical Methods in Natural Language Processing},
year = {2019}
}
@article{Liu2020LearningTE,
title={Learning to Encode Position for Transformer with Continuous Dynamical Model},
author={Xuanqing Liu and Hsiang-Fu Yu and Inderjit Dhillon and Cho-Jui Hsieh},
journal={ArXiv},
year={2020},
volume={abs/2003.09229}
}
@inproceedings{DBLP:conf/nips/ChenRBD18,
author = {Tian Qi Chen and
Yulia Rubanova and
Jesse Bettencourt and
David Duvenaud},
title = {Neural Ordinary Differential Equations},
publisher = {Conference and Workshop on Neural Information Processing Systems},
pages = {6572--6583},
year = {2018}
}
@inproceedings{DBLP:journals/corr/LuongPM15,
author = {Thang Luong and
Hieu Pham and
Christopher D. Manning},
title = {Effective Approaches to Attention-based Neural Machine Translation},
publisher = {Conference on Empirical Methods in Natural Language Processing},
pages = {1412--1421},
year = {2015}
}
@inproceedings{Yang2018ModelingLF,
author = {Baosong Yang and
Zhaopeng Tu and
Derek F. Wong and
Fandong Meng and
Lidia S. Chao and
Tong Zhang},
title = {Modeling Localness for Self-Attention Networks},
pages = {4449--4458},
publisher = {Annual Meeting of the Association for Computational Linguistics},
year = {2018}
}
@inproceedings{DBLP:conf/aaai/GuoQLXZ20,
author = {Qipeng Guo and
Xipeng Qiu and
Pengfei Liu and
Xiangyang Xue and
Zheng Zhang},
title = {Multi-Scale Self-Attention for Text Classification},
pages = {7847--7854},
publisher = {AAAI Conference on Artificial Intelligence},
year = {2020}
}
@inproceedings{Wu2019PayLA,
author = {Felix Wu and
Angela Fan and
Alexei Baevski and
Yann N. Dauphin and
Michael Auli},
title = {Pay Less Attention with Lightweight and Dynamic Convolutions},
publisher = {International Conference on Learning Representations},
year = {2019},
}
@inproceedings{DBLP:conf/interspeech/GulatiQCPZYHWZW20,
author = {Anmol Gulati and
James Qin and
Chung-Cheng Chiu and
Niki Parmar and
Yu Zhang and
Jiahui Yu and
Wei Han and
Shibo Wang and
Zhengdong Zhang and
Yonghui Wu and
Ruoming Pang},
title = {Conformer: Convolution-augmented Transformer for Speech Recognition},
pages = {5036--5040},
publisher = {International Speech Communication Association},
year = {2020}
}
@inproceedings{DBLP:conf/cvpr/XieGDTH17,
author = {Saining Xie and
Ross B. Girshick and
Piotr Doll{\'{a}}r and
Zhuowen Tu and
Kaiming He},
title = {Aggregated Residual Transformations for Deep Neural Networks},
pages = {5987--5995},
publisher = {IEEE Conference on Computer Vision and Pattern Recognition},
year = {2017}
}
@article{DBLP:journals/corr/abs-1711-02132,
author = {Karim Ahmed and
Nitish Shirish Keskar and
Richard Socher},
title = {Weighted Transformer Network for Machine Translation},
journal = {CoRR},
volume = {abs/1711.02132},
year = {2017}
}
@article{DBLP:journals/corr/abs-2006-10270,
author = {Yang Fan and
Shufang Xie and
Yingce Xia and
Lijun Wu and
Tao Qin and
Xiang-Yang Li and
Tie-Yan Liu},
title = {Multi-branch Attentive Transformer},
journal = {CoRR},
volume = {abs/2006.10270},
year = {2020}
}
@inproceedings{DBLP:conf/emnlp/YanMZ20,
author = {Jianhao Yan and
Fandong Meng and
Jie Zhou},
title = {Multi-Unit Transformers for Neural Machine Translation},
pages = {1047--1059},
publisher = {Conference on Empirical Methods in Natural Language Processing},
year = {2020}
}
@article{李北2019面向神经机器翻译的集成学习方法分析,
title={面向神经机器翻译的集成学习方法分析},
author={李北 and 王强 and 肖桐 and 姜雨帆 and 张哲旸 and 刘继强 and 张俐 and 于清},
journal={中文信息学报},
volume={33},
number={3},
year={2019},
}
@inproceedings{DBLP:conf/iclr/WuLLLH20,
author = {Zhanghao Wu and
Zhijian Liu and
Ji Lin and
Yujun Lin and
Song Han},
title = {Lite Transformer with Long-Short Range Attention},
publisher = {International Conference on Learning Representations},
year = {2020}
}
@inproceedings{DBLP:conf/iclr/DehghaniGVUK19,
author = {Mostafa Dehghani and
Stephan Gouws and
Oriol Vinyals and
Jakob Uszkoreit and
Lukasz Kaiser},
title = {Universal Transformers},
publisher = {International Conference on Learning Representations},
year = {2019}
}
@article{Lan2020ALBERTAL,
title={ALBERT: A Lite BERT for Self-supervised Learning of Language Representations},
author={Zhenzhong Lan and Mingda Chen and Sebastian Goodman and Kevin Gimpel and Piyush Sharma and Radu Soricut},
publisher={International Conference on Learning Representations},
year={2020}
}
@inproceedings{DBLP:conf/naacl/HaoWYWZT19,
author = {Jie Hao and
Xing Wang and
Baosong Yang and
Longyue Wang and
Jinfeng Zhang and
Zhaopeng Tu},
title = {Modeling Recurrence for Transformer},
pages = {1198--1207},
publisher = {Annual Conference of the North American Chapter of the Association for Computational Linguistics},
year = {2019}
}
@inproceedings{DBLP:conf/emnlp/QiuMLYW020,
author = {Jiezhong Qiu and
Hao Ma and
Omer Levy and
Wen-tau Yih and
Sinong Wang and
Jie Tang},
title = {Blockwise Self-Attention for Long Document Understanding},
pages = {2555--2565},
publisher = {Conference on Empirical Methods in Natural Language Processing},
year = {2020}
}
@inproceedings{DBLP:conf/iclr/LiuSPGSKS18,
author = {Peter J. Liu and
Mohammad Saleh and
Etienne Pot and
Ben Goodrich and
Ryan Sepassi and
Lukasz Kaiser and
Noam Shazeer},
title = {Generating Wikipedia by Summarizing Long Sequences},
publisher = {International Conference on Learning Representations},
year = {2018}
}
@article{DBLP:journals/corr/abs-2004-05150,
author = {Iz Beltagy and
Matthew E. Peters and
Arman Cohan},
title = {Longformer: The Long-Document Transformer},
journal = {CoRR},
volume = {abs/2004.05150},
year = {2020}
}
@inproceedings{Kitaev2020ReformerTE,
  author = {Nikita Kitaev and
            Lukasz Kaiser and
            Anselm Levskaya},
  title = {Reformer: The Efficient Transformer},
  publisher = {International Conference on Learning Representations},
  year = {2020}
}
@article{DBLP:journals/corr/abs-2003-05997,
author = {Aurko Roy and
Mohammad Saffar and
Ashish Vaswani and
David Grangier},
title = {Efficient Content-Based Sparse Attention with Routing Transformers},
journal = {CoRR},
volume = {abs/2003.05997},
year = {2020}
}
@article{Katharopoulos2020TransformersAR,
title={Transformers are RNNs: Fast Autoregressive Transformers with Linear Attention},
author={Angelos Katharopoulos and Apoorv Vyas and Nikolaos Pappas and Franccois Fleuret},
journal={CoRR},
year={2020},
volume={abs/2006.16236}
}
@article{DBLP:journals/corr/abs-2009-14794,
author = {Krzysztof Choromanski and
Valerii Likhosherstov and
David Dohan and
Xingyou Song and
Andreea Gane and
Tam{\'{a}}s Sarl{\'{o}}s and
Peter Hawkins and
Jared Davis and
Afroz Mohiuddin and
Lukasz Kaiser and
David Belanger and
Lucy Colwell and
Adrian Weller},
title = {Rethinking Attention with Performers},
journal = {CoRR},
volume = {abs/2009.14794},
year = {2020}
}
@inproceedings{DBLP:conf/emnlp/HaoWSZT19,
author = {Jie Hao and
Xing Wang and
Shuming Shi and
Jinfeng Zhang and
Zhaopeng Tu},
title = {Multi-Granularity Self-Attention for Neural Machine Translation},
pages = {887--897},
publisher = {Conference on Empirical Methods in Natural Language Processing},
year = {2019}
}
@inproceedings{DBLP:conf/emnlp/Lin0RLS18,
author = {Junyang Lin and
Xu Sun and
Xuancheng Ren and
Muyu Li and
Qi Su},
title = {Learning When to Concentrate or Divert Attention: Self-Adaptive Attention
Temperature for Neural Machine Translation},
pages = {2985--2990},
publisher = {Conference on Empirical Methods in Natural Language Processing},
year = {2018}
}
@article{DBLP:journals/corr/abs-2006-04768,
author = {Sinong Wang and
Belinda Z. Li and
Madian Khabsa and
Han Fang and
Hao Ma},
title = {Linformer: Self-Attention with Linear Complexity},
journal = {CoRR},
volume = {abs/2006.04768},
year = {2020}
}
@inproceedings{DBLP:conf/nips/BergstraBBK11,
author = {James Bergstra and
R{\'{e}}mi Bardenet and
Yoshua Bengio and
Bal{\'{a}}zs K{\'{e}}gl},
title = {Algorithms for Hyper-Parameter Optimization},
publisher = {Advances in Neural Information Processing Systems},
pages = {2546--2554},
year = {2011}
}
@inproceedings{DBLP:conf/lion/HutterHL11,
author = {Frank Hutter and
Holger H. Hoos and
Kevin Leyton-Brown},
title = {Sequential Model-Based Optimization for General Algorithm Configuration},
series = {Lecture Notes in Computer Science},
volume = {6683},
pages = {507--523},
publisher = {Learning and Intelligent Optimization},
year = {2011}
}
@inproceedings{DBLP:conf/icml/BergstraYC13,
author = {James Bergstra and
Daniel Yamins and
David D. Cox},
title = {Making a Science of Model Search: Hyperparameter Optimization in Hundreds
of Dimensions for Vision Architectures},
series = {{JMLR} Workshop and Conference Proceedings},
volume = {28},
pages = {115--123},
publisher = {International Conference on Machine Learning},
year = {2013}
}
@inproceedings{DBLP:conf/iccv/ChenXW019,
author = {Xin Chen and
Lingxi Xie and
Jun Wu and
Qi Tian},
title = {Progressive Differentiable Architecture Search: Bridging the Depth
Gap Between Search and Evaluation},
pages = {1294--1303},
publisher = {IEEE International Conference on Computer Vision},
year = {2019}
}
@inproceedings{DBLP:conf/icml/ChenH20,
author = {Xiangning Chen and
Cho-Jui Hsieh},
title = {Stabilizing Differentiable Architecture Search via Perturbation-based
Regularization},
series = {Proceedings of Machine Learning Research},
volume = {119},
pages = {1554--1565},
publisher = {International Conference on Machine Learning},
year = {2020}
}
%%%%% chapter 15------------------------------------------------------
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
......@@ -9274,7 +9644,7 @@ author = {Zhuang Liu and
@inproceedings{finding2006adafre,
author = {S. F. Adafre and Maarten de Rijke},
title = {Finding Similar Sentences across Multiple Languages in Wikipedia },
publisher = {European Association of Computational Linguistics},
publisher = {Annual Conference of the European Association for Machine Translation},
year = {2006}
}
@inproceedings{method2008keiji,
......@@ -10044,7 +10414,7 @@ author = {Zhuang Liu and
author = {Sinno Jialin Pan and
Qiang Yang},
title = {A Survey on Transfer Learning},
journal = {{IEEE} Trans. Knowl. Data Eng.},
journal = {IEEE Transactions on Knowledge and Data Engineering},
volume = {22},
number = {10},
pages = {1345--1359},
......@@ -10428,7 +10798,7 @@ author = {Zhuang Liu and
Mirella Lapata},
title = {Paraphrasing Revisited with Neural Machine Translation},
pages = {881--893},
publisher = {European Association of Computational Linguistics},
publisher = {Annual Conference of the European Association for Machine Translation},
year = {2017}
}
@article{2005Improving,
......@@ -11324,7 +11694,7 @@ author = {Zhuang Liu and
Marcello Federico},
title = {Neural vs. Phrase-Based Machine Translation in a Multi-Domain Scenario},
pages = {280--284},
publisher = {European Association of Computational Linguistics},
publisher = {Annual Conference of the European Association for Machine Translation},
year = {2017}
}
@inproceedings{DBLP:conf/aaai/Zhang0LZC18,
......@@ -11553,12 +11923,771 @@ author = {Zhuang Liu and
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%% chapter 17------------------------------------------------------
%%%%% chapter 17------------------------------------------------------
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%% chapter 18------------------------------------------------------
@article{DBLP:journals/ac/Bar-Hillel60,
author = {Yehoshua Bar-Hillel},
title = {The Present Status of Automatic Translation of Languages},
journal = {Advances in computers},
volume = {1},
pages = {91--163},
year = {1960}
}
@article{DBLP:journals/corr/abs-1901-09115,
author = {Andrei Popescu-Belis},
title = {Context in Neural Machine Translation: {A} Review of Models and Evaluations},
journal = {CoRR},
volume = {abs/1901.09115},
year = {2019}
}
@book{jurafsky2000speech,
title={Speech \& Language Processing},
author={Jurafsky, Daniel and Martin, James H.},
year={2000},
publisher={Pearson Education India}
}
@inproceedings{DBLP:conf/anlp/MarcuCW00,
author = {Daniel Marcu and
Lynn Carlson and
Maki Watanabe},
title = {The Automatic Translation of Discourse Structures},
pages = {9--17},
publisher = {Applied Natural Language Processing Conference},
year = {2000}
}
@inproceedings{foster2010translating,
title={Translating structured documents},
author={Foster, George and Isabelle, Pierre and Kuhn, Roland},
booktitle={Proceedings of AMTA},
year={2010}
}
@inproceedings{DBLP:conf/eacl/LouisW14,
author = {Annie Louis and
Bonnie L. Webber},
title = {Structured and Unstructured Cache Models for {SMT} Domain Adaptation},
pages = {155--163},
publisher = {Conference of the European Chapter of the Association for Computational Linguistics},
year = {2014}
}
@inproceedings{DBLP:conf/iwslt/HardmeierF10,
author = {Christian Hardmeier and
Marcello Federico},
title = {Modelling pronominal anaphora in statistical machine translation},
pages = {283--289},
publisher = {International Workshop on Spoken Language Translation},
year = {2010}
}
@inproceedings{DBLP:conf/wmt/NagardK10,
author = {Ronan Le Nagard and
Philipp Koehn},
title = {Aiding Pronoun Translation with Co-Reference Resolution},
pages = {252--261},
publisher = {Annual Meeting of the Association for Computational Linguistics},
year = {2010}
}
@inproceedings{DBLP:conf/eamt/LuongP16,
author = {Ngoc-Quang Luong and
Andrei Popescu-Belis},
title = {A Contextual Language Model to Improve Machine Translation of Pronouns
by Re-ranking Translation Hypotheses},
pages = {292--304},
publisher = {European Association for Machine Translation},
year = {2016}
}
@inproceedings{tiedemann2010context,
title={Context adaptation in statistical machine translation using models with exponentially decaying cache},
author={Tiedemann, J{\"o}rg},
publisher={Domain Adaptation for Natural Language Processing},
pages={8--15},
year={2010}
}
@inproceedings{DBLP:conf/emnlp/GongZZ11,
author = {Zhengxian Gong and
Min Zhang and
Guodong Zhou},
title = {Cache-based Document-level Statistical Machine Translation},
pages = {909--919},
publisher = {Conference on Empirical Methods in Natural Language Processing},
year = {2011}
}
@inproceedings{DBLP:conf/ijcai/XiongBZLL13,
author = {Deyi Xiong and
Guosheng Ben and
Min Zhang and
Yajuan Lv and
Qun Liu},
title = {Modeling Lexical Cohesion for Document-Level Machine Translation},
pages = {2183--2189},
publisher = {International Joint Conference on Artificial Intelligence},
year = {2013}
}
@inproceedings{xiao2011document,
title={Document-level consistency verification in machine translation},
author={Xiao, Tong and Zhu, Jingbo and Yao, Shujie and Zhang, Hao},
booktitle={Machine Translation Summit},
volume={13},
pages={131--138},
year={2011}
}
@inproceedings{DBLP:conf/sigdial/MeyerPZC11,
author = {Thomas Meyer and
Andrei Popescu-Belis and
Sandrine Zufferey and
Bruno Cartoni},
title = {Multilingual Annotation and Disambiguation of Discourse Connectives
for Machine Translation},
pages = {194--203},
publisher = {Annual Meeting of the Association for Computational Linguistics},
year = {2011}
}
@inproceedings{DBLP:conf/hytra/MeyerP12,
author = {Thomas Meyer and
Andrei Popescu-Belis},
title = {Using Sense-labeled Discourse Connectives for Statistical Machine
Translation},
pages = {129--138},
publisher = {Annual Meeting of the Association for Computational Linguistics},
year = {2012}
}
@inproceedings{DBLP:conf/nips/SutskeverVL14,
author = {Ilya Sutskever and
Oriol Vinyals and
Quoc V. Le},
title = {Sequence to Sequence Learning with Neural Networks},
pages = {3104--3112},
year = {2014},
publisher = {Conference and Workshop on Neural Information Processing Systems}
}
@inproceedings{DBLP:conf/emnlp/LaubliS018,
author = {Samuel L{\"{a}}ubli and
Rico Sennrich and
Martin Volk},
title = {Has Machine Translation Achieved Human Parity? {A} Case for Document-level
Evaluation},
pages = {4791--4796},
publisher = {Conference on Empirical Methods in Natural Language Processing},
year = {2018}
}
@article{DBLP:journals/corr/abs-1912-08494,
author = {Sameen Maruf and
Fahimeh Saleh and
Gholamreza Haffari},
title = {A Survey on Document-level Machine Translation: Methods and Evaluation},
journal = {CoRR},
volume = {abs/1912.08494},
year = {2019}
}
@inproceedings{DBLP:conf/discomt/TiedemannS17,
author = {J{\"{o}}rg Tiedemann and
Yves Scherrer},
title = {Neural Machine Translation with Extended Context},
pages = {82--92},
publisher = {Association for Computational Linguistics},
year = {2017}
}
@article{DBLP:journals/corr/abs-1910-07481,
author = {Valentin Mac{\'{e}} and
Christophe Servan},
title = {Using Whole Document Context in Neural Machine Translation},
journal = {CoRR},
volume = {abs/1910.07481},
year = {2019}
}
@article{DBLP:journals/corr/JeanLFC17,
author = {S{\'{e}}bastien Jean and
Stanislas Lauly and
Orhan Firat and
Kyunghyun Cho},
title = {Does Neural Machine Translation Benefit from Larger Context?},
journal = {CoRR},
volume = {abs/1704.05135},
year = {2017}
}
@inproceedings{DBLP:conf/acl/TitovSSV18,
author = {Elena Voita and
Pavel Serdyukov and
Rico Sennrich and
Ivan Titov},
title = {Context-Aware Neural Machine Translation Learns Anaphora Resolution},
pages = {1264--1274},
publisher = {Annual Meeting of the Association for Computational Linguistics},
year = {2018}
}
@inproceedings{DBLP:conf/acl/HaffariM18,
author = {Sameen Maruf and
Gholamreza Haffari},
title = {Document Context Neural Machine Translation with Memory Networks},
pages = {1275--1284},
publisher = {Annual Meeting of the Association for Computational Linguistics},
year = {2018}
}
@inproceedings{DBLP:conf/coling/KuangXLZ18,
author = {Shaohui Kuang and
Deyi Xiong and
Weihua Luo and
Guodong Zhou},
title = {Modeling Coherence for Neural Machine Translation with Dynamic and
Topic Caches},
pages = {596--606},
publisher = {International Conference on Computational Linguistics},
year = {2018}
}
@inproceedings{DBLP:conf/discomt/GarciaCE19,
author = {Eva Mart{\'{\i}}nez Garcia and
Carles Creus and
Cristina Espa{\~{n}}a-Bonet},
title = {Context-Aware Neural Machine Translation Decoding},
pages = {13--23},
publisher = {Annual Meeting of the Association for Computational Linguistics},
year = {2019}
}
@article{DBLP:journals/corr/abs-2010-12827,
author = {Amane Sugiyama and
Naoki Yoshinaga},
title = {Context-aware Decoder for Neural Machine Translation using a Target-side
Document-Level Language Model},
journal = {CoRR},
volume = {abs/2010.12827},
year = {2020}
}
@inproceedings{DBLP:conf/acl/VoitaST19,
author = {Elena Voita and
Rico Sennrich and
Ivan Titov},
title = {When a Good Translation is Wrong in Context: Context-Aware Machine
Translation Improves on Deixis, Ellipsis, and Lexical Cohesion},
pages = {1198--1212},
publisher = {Annual Meeting of the Association for Computational Linguistics},
year = {2019}
}
@inproceedings{DBLP:conf/emnlp/VoitaST19,
author = {Elena Voita and
Rico Sennrich and
Ivan Titov},
title = {Context-Aware Monolingual Repair for Neural Machine Translation},
pages = {877--886},
publisher = {Conference on Empirical Methods in Natural Language Processing},
year = {2019}
}
@inproceedings{DBLP:conf/discomt/WerlenP17,
author = {Lesly Miculicich Werlen and
Andrei Popescu-Belis},
title = {Validation of an Automatic Metric for the Accuracy of Pronoun Translation
{(APT)}},
pages = {17--25},
publisher = {Annual Meeting of the Association for Computational Linguistics},
year = {2017}
}
@inproceedings{DBLP:conf/emnlp/WongK12,
author = {Billy Tak-Ming Wong and
Chunyu Kit},
title = {Extending Machine Translation Evaluation Metrics with Lexical Cohesion
to Document Level},
pages = {1060--1068},
publisher = {Conference on Empirical Methods in Natural Language Processing},
year = {2012}
}
@inproceedings{DBLP:conf/discomt/GongZZ15,
author = {Zhengxian Gong and
Min Zhang and
Guodong Zhou},
title = {Document-Level Machine Translation Evaluation with Gist Consistency
and Text Cohesion},
pages = {33--40},
publisher = {Annual Meeting of the Association for Computational Linguistics},
year = {2015}
}
@inproceedings{DBLP:conf/cicling/HajlaouiP13,
author = {Najeh Hajlaoui and
Andrei Popescu-Belis},
title = {Assessing the Accuracy of Discourse Connective Translations: Validation
of an Automatic Metric},
volume = {7817},
pages = {236--247},
publisher = {Springer},
year = {2013}
}
@inproceedings{DBLP:conf/wmt/RiosMS18,
author = {Annette Rios and
Mathias M{\"{u}}ller and
Rico Sennrich},
title = {The Word Sense Disambiguation Test Suite at {WMT18}},
pages = {588--596},
publisher = {Annual Meeting of the Association for Computational Linguistics},
year = {2018}
}
@inproceedings{DBLP:conf/naacl/BawdenSBH18,
author = {Rachel Bawden and
Rico Sennrich and
Alexandra Birch and
Barry Haddow},
title = {Evaluating Discourse Phenomena in Neural Machine Translation},
pages = {1304--1313},
publisher = {Annual Conference of the North American Chapter of the Association for Computational Linguistics},
year = {2018}
}
@inproceedings{DBLP:conf/wmt/MullerRVS18,
author = {Mathias M{\"{u}}ller and
Annette Rios and
Elena Voita and
Rico Sennrich},
title = {A Large-Scale Test Set for the Evaluation of Context-Aware Pronoun
Translation in Neural Machine Translation},
pages = {61--72},
publisher = {Annual Meeting of the Association for Computational Linguistics},
year = {2018}
}
@inproceedings{DBLP:conf/iclr/KitaevKL20,
author = {Nikita Kitaev and
Lukasz Kaiser and
Anselm Levskaya},
title = {Reformer: The Efficient Transformer},
publisher = {International Conference on Learning Representations},
year = {2020}
}
@inproceedings{agrawal2018contextual,
title={Contextual handling in neural machine translation: Look behind, ahead and on both sides},
author={Agrawal, Ruchit Rajeshkumar and Turchi, Marco and Negri, Matteo},
booktitle={Annual Conference of the European Association for Machine Translation},
pages={11--20},
year={2018}
}
@inproceedings{DBLP:conf/emnlp/WerlenRPH18,
author = {Lesly Miculicich Werlen and
Dhananjay Ram and
Nikolaos Pappas and
James Henderson},
title = {Document-Level Neural Machine Translation with Hierarchical Attention
Networks},
pages = {2947--2954},
publisher = {Conference on Empirical Methods in Natural Language Processing},
year = {2018}
}
@inproceedings{DBLP:conf/naacl/MarufMH19,
author = {Sameen Maruf and
Andr{\'{e}} F. T. Martins and
Gholamreza Haffari},
title = {Selective Attention for Context-aware Neural Machine Translation},
pages = {3092--3102},
publisher = {Annual Conference of the North American Chapter of the Association for Computational Linguistics},
year = {2019}
}
@inproceedings{DBLP:conf/emnlp/TanZXZ19,
author = {Xin Tan and
Longyin Zhang and
Deyi Xiong and
Guodong Zhou},
title = {Hierarchical Modeling of Global Context for Document-Level Neural
Machine Translation},
pages = {1576--1585},
publisher = {Conference on Empirical Methods in Natural Language Processing},
year = {2019}
}
@inproceedings{DBLP:conf/emnlp/YangZMGFZ19,
author = {Zhengxin Yang and
Jinchao Zhang and
Fandong Meng and
Shuhao Gu and
Yang Feng and
Jie Zhou},
title = {Enhancing Context Modeling with a Query-Guided Capsule Network for
Document-level Translation},
pages = {1527--1537},
publisher = {Conference on Empirical Methods in Natural Language Processing},
year = {2019}
}
@inproceedings{DBLP:conf/ijcai/ZhengYHCB20,
author = {Zaixiang Zheng and
Xiang Yue and
Shujian Huang and
Jiajun Chen and
Alexandra Birch},
title = {Towards Making the Most of Context in Neural Machine Translation},
pages = {3983--3989},
publisher = {International Joint Conference on Artificial Intelligence},
year = {2020}
}
@article{DBLP:journals/tacl/TuLSZ18,
author = {Zhaopeng Tu and
Yang Liu and
Shuming Shi and
Tong Zhang},
title = {Learning to Remember Translation History with a Continuous Cache},
journal = {Transactions of the Association for Computational Linguistics},
volume = {6},
pages = {407--420},
year = {2018}
}
@inproceedings{DBLP:conf/discomt/ScherrerTL19,
author = {Yves Scherrer and
J{\"{o}}rg Tiedemann and
Sharid Lo{\'{a}}iciga},
title = {Analysing concatenation approaches to document-level {NMT} in two
different domains},
pages = {51--61},
publisher = {Annual Meeting of the Association for Computational Linguistics},
year = {2019}
}
@inproceedings{DBLP:conf/wmt/GonzalesMS17,
author = {Annette Rios Gonzales and
Laura Mascarell and
Rico Sennrich},
title = {Improving Word Sense Disambiguation in Neural Machine Translation
with Sense Embeddings},
pages = {11--19},
publisher = {Annual Meeting of the Association for Computational Linguistics},
year = {2017}
}
@inproceedings{DBLP:conf/acl/LiLWJXZLL20,
author = {Bei Li and
Hui Liu and
Ziyang Wang and
Yufan Jiang and
Tong Xiao and
Jingbo Zhu and
Tongran Liu and
Changliang Li},
title = {Does Multi-Encoder Help? {A} Case Study on Context-Aware Neural Machine
Translation},
pages = {3512--3518},
publisher = {Annual Meeting of the Association for Computational Linguistics},
year = {2020}
}
@inproceedings{DBLP:conf/discomt/KimTN19,
author = {Yunsu Kim and
Duc Thanh Tran and
Hermann Ney},
title = {When and Why is Document-level Context Useful in Neural Machine Translation?},
pages = {24--34},
publisher = {Annual Meeting of the Association for Computational Linguistics},
year = {2019}
}
@inproceedings{DBLP:conf/discomt/SugiyamaY19,
author = {Amane Sugiyama and
Naoki Yoshinaga},
title = {Data augmentation using back-translation for context-aware neural
machine translation},
pages = {35--44},
publisher = {Annual Meeting of the Association for Computational Linguistics},
year = {2019}
}
@inproceedings{DBLP:conf/pacling/YamagishiK19,
author = {Hayahide Yamagishi and
Mamoru Komachi},
title = {Improving Context-Aware Neural Machine Translation with Target-Side
Context},
volume = {1215},
pages = {112--122},
publisher = {Springer},
year = {2019}
}
@inproceedings{DBLP:conf/emnlp/ZhangLSZXZL18,
author = {Jiacheng Zhang and
Huanbo Luan and
Maosong Sun and
Feifei Zhai and
Jingfang Xu and
Min Zhang and
Yang Liu},
title = {Improving the Transformer Translation Model with Document-Level Context},
pages = {533--542},
publisher = {Conference on Empirical Methods in Natural Language Processing},
year = {2018}
}
@inproceedings{DBLP:conf/coling/KuangX18,
author = {Shaohui Kuang and
Deyi Xiong},
title = {Fusing Recency into Neural Machine Translation with an Inter-Sentence
Gate Model},
pages = {607--617},
publisher = {International Conference on Computational Linguistics},
year = {2018}
}
@inproceedings{DBLP:conf/emnlp/WangTWL17,
author = {Longyue Wang and
Zhaopeng Tu and
Andy Way and
Qun Liu},
title = {Exploiting Cross-Sentence Context for Neural Machine Translation},
pages = {2826--2831},
publisher = {Conference on Empirical Methods in Natural Language Processing},
year = {2017}
}
@inproceedings{DBLP:conf/aaai/XiongH0W19,
author = {Hao Xiong and
Zhongjun He and
Hua Wu and
Haifeng Wang},
title = {Modeling Coherence for Discourse Neural Machine Translation},
pages = {7338--7345},
publisher = {AAAI Conference on Artificial Intelligence},
year = {2019}
}
@article{DBLP:journals/tacl/YuSSLKBD20,
author = {Lei Yu and
Laurent Sartran and
Wojciech Stokowiec and
Wang Ling and
Lingpeng Kong and
Phil Blunsom and
Chris Dyer},
title = {Better Document-Level Machine Translation with Bayes' Rule},
journal = {Transactions of the Association for Computational Linguistics},
volume = {8},
pages = {346--360},
year = {2020}
}
@article{DBLP:journals/corr/abs-1903-04715,
author = {S{\'{e}}bastien Jean and
Kyunghyun Cho},
title = {Context-Aware Learning for Neural Machine Translation},
journal = {CoRR},
volume = {abs/1903.04715},
year = {2019}
}
@inproceedings{DBLP:conf/acl/SaundersSB20,
author = {Danielle Saunders and
Felix Stahlberg and
Bill Byrne},
title = {Using Context in Neural Machine Translation Training Objectives},
pages = {7764--7770},
publisher = {Annual Meeting of the Association for Computational Linguistics},
year = {2020}
}
@inproceedings{DBLP:conf/mtsummit/StojanovskiF19,
author = {Dario Stojanovski and
Alexander M. Fraser},
title = {Improving Anaphora Resolution in Neural Machine Translation Using
Curriculum Learning},
pages = {140--150},
publisher = {Annual Conference of the European Association for Machine Translation},
year = {2019}
}
@article{DBLP:journals/corr/abs-1911-03110,
author = {Liangyou Li and
Xin Jiang and
Qun Liu},
title = {Pretrained Language Models for Document-Level Neural Machine Translation},
journal = {CoRR},
volume = {abs/1911.03110},
year = {2019}
}
@article{DBLP:journals/tacl/LiuGGLEGLZ20,
author = {Yinhan Liu and
Jiatao Gu and
Naman Goyal and
Xian Li and
Sergey Edunov and
Marjan Ghazvininejad and
Mike Lewis and
Luke Zettlemoyer},
title = {Multilingual Denoising Pre-training for Neural Machine Translation},
journal = {Transactions of the Association for Computational Linguistics},
volume = {8},
pages = {726--742},
year = {2020}
}
@inproceedings{DBLP:conf/wmt/MarufMH18,
author = {Sameen Maruf and
Andr{\'{e}} F. T. Martins and
Gholamreza Haffari},
title = {Contextual Neural Model for Translating Bilingual Multi-Speaker Conversations},
pages = {101--112},
publisher = {Annual Meeting of the Association for Computational Linguistics},
year = {2018}
}
%%%%% chapter 17------------------------------------------------------
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%% chapter 18------------------------------------------------------
@article{DBLP:journals/corr/abs-2010-05680,
author = {Jiarui Fang and
Yang Yu and
Chengduo Zhao and
Jie Zhou},
title = {TurboTransformers: An Efficient {GPU} Serving System For Transformer
Models},
journal = {CoRR},
volume = {abs/2010.05680},
year = {2020}
}
@inproceedings{DBLP:conf/iclr/HuangCLWMW18,
author = {Gao Huang and
Danlu Chen and
Tianhong Li and
Felix Wu and
Laurens van der Maaten and
Kilian Q. Weinberger},
title = {Multi-Scale Dense Networks for Resource Efficient Image Classification},
publisher = {International Conference on Learning Representations},
year = {2018}
}
@article{DBLP:journals/corr/BolukbasiWDS17,
author = {Tolga Bolukbasi and
Joseph Wang and
Ofer Dekel and
Venkatesh Saligrama},
title = {Adaptive Neural Networks for Fast Test-Time Prediction},
journal = {CoRR},
volume = {abs/1702.07811},
year = {2017}
}
@inproceedings{DBLP:conf/emnlp/WangXZ20,
author = {Qiang Wang and
Tong Xiao and
Jingbo Zhu},
title = {Training Flexible Depth Model by Multi-Task Learning for Neural Machine
Translation},
pages = {4307--4312},
publisher = {Conference on Empirical Methods in Natural Language Processing},
year = {2020}
}
@inproceedings{DBLP:conf/ijcai/ChenCWL20,
author = {Guanhua Chen and
Yun Chen and
Yong Wang and
Victor O. K. Li},
title = {Lexical-Constraint-Aware Neural Machine Translation via Data Augmentation},
pages = {3587--3593},
publisher = {International Joint Conference on Artificial Intelligence},
year = {2020}
}
@article{DBLP:journals/corr/abs-1912-00567,
author = {Tao Wang and
Shaohui Kuang and
Deyi Xiong and
Ant{\'{o}}nio Branco},
title = {Merging External Bilingual Pairs into Neural Machine Translation},
journal = {CoRR},
volume = {abs/1912.00567},
year = {2019}
}
@inproceedings{DBLP:conf/acl/DinuMFA19,
author = {Georgiana Dinu and
Prashant Mathur and
Marcello Federico and
Yaser Al-Onaizan},
title = {Training Neural Machine Translation to Apply Terminology Constraints},
pages = {3063--3068},
publisher = {Annual Meeting of the Association for Computational Linguistics},
year = {2019}
}
@inproceedings{DBLP:conf/naacl/SongZYLWZ19,
author = {Kai Song and
Yue Zhang and
Heng Yu and
Weihua Luo and
Kun Wang and
Min Zhang},
title = {Code-Switching for Enhancing {NMT} with Pre-Specified Translation},
pages = {449--459},
publisher = {Annual Meeting of the Association for Computational Linguistics},
year = {2019}
}
@inproceedings{DBLP:conf/naacl/HaslerGIB18,
author = {Eva Hasler and
Adri{\`{a}} de Gispert and
Gonzalo Iglesias and
Bill Byrne},
title = {Neural Machine Translation Decoding with Terminology Constraints},
pages = {506--512},
publisher = {Annual Meeting of the Association for Computational Linguistics},
year = {2018}
}
@inproceedings{DBLP:conf/wmt/ChatterjeeNTFSB17,
author = {Rajen Chatterjee and
Matteo Negri and
Marco Turchi and
Marcello Federico and
Lucia Specia and
Fr{\'{e}}d{\'{e}}ric Blain},
title = {Guiding Neural Machine Translation Decoding with External Knowledge},
pages = {157--168},
publisher = {Annual Meeting of the Association for Computational Linguistics},
year = {2017}
}
@inproceedings{DBLP:conf/naacl/PostV18,
author = {Matt Post and
David Vilar},
title = {Fast Lexically Constrained Decoding with Dynamic Beam Allocation for
Neural Machine Translation},
pages = {1314--1324},
publisher = {Annual Meeting of the Association for Computational Linguistics},
year = {2018}
}
@inproceedings{DBLP:conf/acl/HokampL17,
author = {Chris Hokamp and
Qun Liu},
title = {Lexically Constrained Decoding for Sequence Generation Using Grid
Beam Search},
pages = {1535--1546},
publisher = {Annual Meeting of the Association for Computational Linguistics},
year = {2017}
}
@inproceedings{DBLP:conf/naacl/ThompsonGKDK19,
author = {Brian Thompson and
Jeremy Gwinnup and
Huda Khayrallah and
Kevin Duh and
Philipp Koehn},
title = {Overcoming Catastrophic Forgetting During Domain Adaptation of Neural
Machine Translation},
pages = {2062--2068},
publisher = {Annual Meeting of the Association for Computational Linguistics},
year = {2019}
}
@inproceedings{DBLP:conf/aclnmt/KhayrallahTDK18,
author = {Huda Khayrallah and
Brian Thompson and
Kevin Duh and
Philipp Koehn},
title = {Regularized Training Objective for Continued Training for Domain Adaptation
in Neural Machine Translation},
pages = {36--44},
publisher = {Annual Meeting of the Association for Computational Linguistics},
year = {2018}
}
@article{barone2017regularization,
title={Regularization techniques for fine-tuning in neural machine translation},
author={Barone, Antonio Valerio Miceli and Haddow, Barry and Germann, Ulrich and Sennrich, Rico},
journal={CoRR},
volume={abs/1707.09920},
year={2017}
}
@article{DBLP:journals/corr/ChuDK17,
author = {Chenhui Chu and
Raj Dabre and
Sadao Kurohashi},
title = {An Empirical Comparison of Simple Domain Adaptation Methods for Neural
Machine Translation},
journal = {CoRR},
volume = {abs/1701.03214},
year = {2017}
}
@inproceedings{DBLP:conf/coling/GuF20,
author = {Shuhao Gu and
Yang Feng},
title = {Investigating Catastrophic Forgetting During Continual Training for
Neural Machine Translation},
pages = {4315--4326},
publisher = {International Committee on Computational Linguistics},
year = {2020}
}
%%%%% chapter 18------------------------------------------------------
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
......@@ -11785,7 +12914,7 @@ author = {Zhuang Liu and
Jozef Mokry and
Maria Nadejde},
title = {Nematus: a Toolkit for Neural Machine Translation},
publisher = {European Association of Computational Linguistics},
publisher = {Annual Conference of the European Association for Machine Translation},
pages = {65--68},
year = {2017}
}
......