Commit 4c3a48d1 by 孟霞

Merge branch 'caorunzhe' into 'mengxia'

Caorunzhe

See merge request !554
parents 29861056 d4c2adbd
......@@ -376,7 +376,7 @@ NMT & 21.7 & 18.7 & -13.7 \\
% NEW SECTION 10.3
%----------------------------------------------------------------------------------------
\sectionnewpage
\section{基于循环神经网络的翻译模型}
\section{基于循环神经网络的翻译建模}
\parinterval 早期神经机器翻译的进展主要来自两个方面:1)使用循环神经网络对单词序列进行建模;2)注意力机制的使用。表\ref{tab:10-6}列出了2013-2015年间有代表性的部分研究工作。从这些工作的内容上看,当时的研究重点还是如何有效地使用循环神经网络进行翻译建模以及使用注意力机制捕捉双语单词序列间的对应关系。
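为了更直观地理解“用循环神经网络对单词序列建模、并用注意力机制捕捉双语单词间对应关系”这一思路,这里给出一段高度简化的示意代码(基于PyTorch,模型结构、变量名与维度设置均为示例性假设,并非本书或某一具体系统的实现):

import torch
import torch.nn as nn

class TinyRNNAttentionMT(nn.Module):
    """极简的“RNN编码器 + 加性注意力 + RNN解码器”示意,并非完整的NMT系统"""
    def __init__(self, src_vocab, tgt_vocab, dim=64):
        super().__init__()
        self.src_emb = nn.Embedding(src_vocab, dim)
        self.tgt_emb = nn.Embedding(tgt_vocab, dim)
        self.encoder = nn.GRU(dim, dim, batch_first=True)      # 对源语言单词序列建模
        self.decoder_cell = nn.GRUCell(dim * 2, dim)           # 解码端循环单元
        self.att_W = nn.Linear(dim * 2, dim)
        self.att_v = nn.Linear(dim, 1, bias=False)
        self.out = nn.Linear(dim, tgt_vocab)

    def attention(self, dec_h, enc_h):
        # dec_h: [batch, dim],enc_h: [batch, src_len, dim]
        q = dec_h.unsqueeze(1).expand_as(enc_h)
        score = self.att_v(torch.tanh(self.att_W(torch.cat([q, enc_h], dim=-1))))
        alpha = torch.softmax(score, dim=1)                    # 源语言各位置的注意力权重
        return (alpha * enc_h).sum(dim=1)                      # 上下文向量

    def forward(self, src, tgt):
        enc_h, _ = self.encoder(self.src_emb(src))
        h = enc_h[:, -1]                                       # 用编码器末状态初始化解码器
        logits = []
        for t in range(tgt.size(1)):                           # 按目标语位置逐步解码
            ctx = self.attention(h, enc_h)
            h = self.decoder_cell(torch.cat([self.tgt_emb(tgt[:, t]), ctx], dim=-1), h)
            logits.append(self.out(h))
        return torch.stack(logits, dim=1)                      # 每个目标语位置上的词表打分

实际的神经机器翻译系统还需要考虑双向编码、集束搜索解码、子词切分等诸多细节,这段代码仅用于说明循环建模与注意力机制的基本配合方式。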
......
......@@ -231,7 +231,7 @@
% NEW SECTION
%----------------------------------------------------------------------------------------
\section{基于卷积神经网络的机器翻译模型}
\section{基于卷积神经网络的翻译建模}
\parinterval 正如之前所讲,卷积神经网络可以用于序列建模,同时具有并行性高和易于学习的特点,一个很自然的想法就是将其用作神经机器翻译模型中的特征提取器。因此,在神经机器翻译被提出之初,研究人员就已经开始利用卷积神经网络对句子进行特征提取。比较经典的模型是使用卷积神经网络作为源语言句子的编码器,使用循环神经网络作为目标语译文生成的解码器\upcite{kalchbrenner-blunsom-2013-recurrent,Gehring2017ACE}。之后也有研究人员提出完全基于卷积神经网络的翻译模型(ConvS2S)\upcite{DBLP:journals/corr/GehringAGYD17},或者针对卷积层进行改进,提出效率更高、性能更好的模型\upcite{Kaiser2018DepthwiseSC,Wu2019PayLA}。本节将基于ConvS2S模型,阐述如何使用卷积神经网络搭建端到端神经机器翻译模型。
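在进入ConvS2S的具体细节之前,下面用一小段示意代码说明“一维卷积 + 门控线性单元(GLU)+ 残差连接”这一基本的卷积特征提取模块(基于PyTorch,超参数与变量名均为示例性假设):

import torch
import torch.nn as nn
import torch.nn.functional as F

class ConvGLULayer(nn.Module):
    """ConvS2S风格的一层卷积特征提取:一维卷积 + GLU门控 + 残差连接(示意)"""
    def __init__(self, dim=256, kernel_size=3):
        super().__init__()
        # 输出通道为2*dim:一半作为内容,一半经过Sigmoid后作为门控
        self.conv = nn.Conv1d(dim, 2 * dim, kernel_size, padding=kernel_size // 2)

    def forward(self, x):
        # x: [batch, src_len, dim],卷积在序列长度维上并行计算,这正是其并行性高的来源
        y = self.conv(x.transpose(1, 2))
        y = F.glu(y, dim=1).transpose(1, 2)
        return (x + y) * (0.5 ** 0.5)       # 残差连接并缩放,缓解深层堆叠时的数值增长

将若干这样的层堆叠,并在输入端加上词嵌入与位置嵌入,就得到了一个可以并行计算的源语言编码器;ConvS2S的完整结构将在本节后面展开。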
......
......@@ -23,6 +23,10 @@
\chapter{神经机器翻译结构优化}
模型结构的设计是机器翻译系统研发中最重要的部分。在神经机器翻译中,虽然系统研发人员脱离了繁琐的特征工程,但是神经网络结构的设计仍然非常重要。无论是像循环神经网络、Transformer这样的整体架构的设计,还是注意力机制等局部结构的设计,都对机器翻译性能有着很大的影响。
本章主要讨论神经机器翻译中若干结构优化的方向,包括:注意力机制的改进、网络连接优化及深层网络建模、基于树结构的模型、神经网络结构自动搜索等。这些内容可以指导神经机器翻译系统的深入优化,其中涉及的一些模型和方法也可以应用于其他自然语言处理任务。
%----------------------------------------------------------------------------------------
% NEW SECTION
%----------------------------------------------------------------------------------------
......@@ -454,7 +458,7 @@ $\mathbi{g}_l$会作为输入的一部分送入第$l+1$层。其网络的结构
\end{figure}
%-------------------------------------------
\parinterval 针对该问题的一个解决方案是修改学习率曲线的衰减策略。图中蓝色的曲线是修改后的学习率曲线。首先在训练的初期让网络快速地达到学习率的峰值(线性递增);之后每一次从$p$层网络变为$2p$层网络时,都将当前的学习率值重置到峰值点,再根据训练的步数对其进行相应的衰减。具体的步骤如下:
\begin{itemize}
\vspace{0.5em}
\item 在训练的初期,模型先经历一个学习率预热的过程:
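作为上述调度策略的一个补充说明,下面给出一段简化的代码示意(峰值学习率、预热步数与衰减函数均为示例性假设,并非书中给出的具体设置):

def shallow_to_deep_lr(step, reset_steps, peak=2e-3, warmup=4000):
    """返回第step步的学习率:训练初期线性预热到峰值;每次网络由p层扩展到2p层
    (扩展发生的步数记录在reset_steps中)时,将学习率重置回峰值,之后按步数衰减。"""
    if step <= warmup:
        return peak * step / warmup                       # 初期线性预热
    last_reset = max([warmup] + [r for r in reset_steps if r <= step])
    decay_step = step - last_reset + 1
    return peak / (decay_step ** 0.5)                     # 按相对步数的平方根倒数衰减(示例)

# 例如:在第8000步和第16000步分别把网络层数翻倍
lrs = [shallow_to_deep_lr(s, reset_steps=[8000, 16000]) for s in range(1, 24001)]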
......@@ -584,7 +588,7 @@ a = \funp{P}(\cdot|\mathbi{x};a)
\vspace{0.5em}
\item 设计搜索空间:理论上来说网络结构搜索应在所有潜在的模型结构所组成的空间中进行搜索(图\ref{fig:15-16})。在这种情况下如果不对候选模型结构进行限制的话,搜索空间会十分巨大。因此,在实际的结构搜索过程中往往会针对特定任务设计一个搜索空间,这个搜索空间是全体结构空间的一个子集,之后的搜索过程将在这个子空间中进行。如图\ref{fig:15-16}例子中的搜索空间所示,该空间由循环神经网络构成,其中候选的模型包括人工设计的LSTM、GRU等模型结构,也包括其他潜在的循环神经网络结构。
\vspace{0.5em}
\item 选择搜索策略:在设计好搜索空间之后,结构搜索的过程将选择一种合适的策略对搜索空间进行探索,找到最适用于当前任务的模型结构。不同于模型参数的学习,模型结构之间本身不存在直接可计算的关联,所以很难通过传统的最优化算法对其进行学习。因此,搜索策略往往选择采用遗传算法或强化学习等方法间接对模型结构进行设计或优化\upcite{DBLP:conf/icml/SoLL19,DBLP:conf/aaai/RealAHL19,DBLP:conf/icml/RealMSSSTLK17,DBLP:conf/iclr/ElskenMH19,DBLP:conf/iclr/ZophL17,DBLP:conf/cvpr/ZophVSL18,DBLP:conf/icml/PhamGZLD18,DBLP:conf/iclr/BakerGNR17,DBLP:conf/cvpr/TanCPVSHL19,DBLP:conf/iclr/LiuSVFK18}。不过近些年来也有研究人员开始尝试将模型结构建模为超网络中的参数,这样即可使用基于梯度的方式直接对最优结构进行搜索\upcite{DBLP:conf/nips/LuoTQCL18,DBLP:conf/iclr/LiuSY19,DBLP:conf/iclr/CaiZH19,DBLP:conf/cvpr/LiuCSAHY019,DBLP:conf/cvpr/WuDZWSWTVJK19,DBLP:conf/iclr/XieZLL19,DBLP:conf/uai/LiT19,DBLP:conf/cvpr/DongY19,DBLP:conf/iclr/XuX0CQ0X20,DBLP:conf/iclr/ZelaESMBH20,DBLP:conf/iclr/MeiLLJYYY20}
\item 选择搜索策略:在设计好搜索空间之后,结构搜索的过程将选择一种合适的策略对搜索空间进行探索,找到最适用于当前任务的模型结构。不同于模型参数的学习,模型结构之间本身不存在直接可计算的关联,所以很难通过传统的最优化算法对其进行学习。因此,搜索策略往往选择采用遗传算法或强化学习等方法间接对模型结构进行设计或优化\upcite{DBLP:conf/icml/SoLL19,DBLP:conf/aaai/RealAHL19,DBLP:conf/icml/RealMSSSTLK17,DBLP:conf/iclr/ElskenMH19,DBLP:conf/iclr/ZophL17,DBLP:conf/cvpr/ZophVSL18,DBLP:conf/icml/PhamGZLD18,DBLP:conf/iclr/BakerGNR17,DBLP:conf/cvpr/TanCPVSHL19,DBLP:conf/iclr/LiuSVFK18} 不过近些年来也有研究人员开始尝试将模型结构建模为超网络中的参数,这样即可使用基于梯度的方式直接对最优结构进行搜索\upcite{DBLP:conf/nips/LuoTQCL18,DBLP:conf/iclr/LiuSY19,DBLP:conf/iclr/CaiZH19,DBLP:conf/cvpr/LiuCSAHY019,DBLP:conf/cvpr/WuDZWSWTVJK19,DBLP:conf/iclr/XieZLL19,DBLP:conf/uai/LiT19,DBLP:conf/cvpr/DongY19,DBLP:conf/iclr/XuX0CQ0X20,DBLP:conf/iclr/ZelaESMBH20,DBLP:conf/iclr/MeiLLJYYY20}
\vspace{0.5em}
\item 进行性能评估:在搜索到模型结构之后,需要对这种模型结构的性能进行验证,确定当前找到的模型结构的性能优劣。但是对于结构搜索任务来说,搜索的过程中会产生大量中间模型结构,如果直接对所有可能的结构进行评价,其时间代价是难以接受的。因此在结构搜索任务中,也有很多研究人员研究如何快速地估计模型性能(绝对性能或相对性能)\upcite{DBLP:conf/nips/LuoTQCL18,DBLP:journals/jmlr/LiJDRT17,DBLP:conf/eccv/LiuZNSHLFYHM18}。下文给出了一个将这三个环节串联起来的简化流程示意。
\vspace{0.5em}
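综合上面的三个环节(设计搜索空间、选择搜索策略、进行性能评估),下面给出一个极简的结构搜索流程示意。这里用随机采样充当搜索策略、用随机数充当性能评估,仅用于说明三者在流程中的位置;搜索空间的内容同样只是假设的例子,实际系统需要替换为真实的策略与评估方法:

import random

# 1)搜索空间:这里用“整体框架从候选结构中选一种 + 层数从候选集合中选一个”来示意
SEARCH_SPACE = {"framework": ["lstm", "gru", "conv", "self_attention"],
                "num_layers": [2, 4, 6]}

def sample_architecture():
    """2)搜索策略:此处用最简单的随机采样占位,实际中可替换为进化算法、强化学习或基于梯度的方法"""
    return {key: random.choice(values) for key, values in SEARCH_SPACE.items()}

def estimate_performance(arch):
    """3)性能评估:实际中需要(近似地)训练该结构并在校验集上测评,这里用随机数占位"""
    return random.random()

def naive_structure_search(num_trials=10):
    best_arch, best_score = None, float("-inf")
    for _ in range(num_trials):
        arch = sample_architecture()
        score = estimate_performance(arch)        # 例如校验集上的BLEU
        if score > best_score:
            best_arch, best_score = arch, score
    return best_arch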
......@@ -637,7 +641,7 @@ a = \funp{P}(\cdot|\mathbi{x};a)
\parinterval 其中函数$\pi(\cdot)$即为结构表示中的内部结构,而循环单元之间的组织方式(即整体框架)则决定了循环单元的输入信息,也就是上式中的循环单元表示$\hat{\mathbi{h}}_{t-1}$和输入表示$\hat{\mathbi{x}}_{t}$。理论上,二者均能获得对应时刻之前所有可以获得的表示信息,因此可表示为:
\begin{eqnarray}
\hat{\mathbi{h}}_{t-1} &=& f(\mathbi{h}_{[0,t-1]};\mathbi{x}_{[1,t-1]}) \\
\hat{\mathbi{x}}_{t} &=& g(\mathbi{x}_{[1,t]};\mathbi{h}_{[0,t-1]})
\label{eq:15-33}
\end{eqnarray}
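结合上面的公式,下面用一小段代码示意“内部结构$\pi(\cdot)$由若干可搜索的离散选择构成,整体框架仍是标准的循环展开”这一思想(候选激活函数与组合方式均为示例性假设):

import torch
import torch.nn as nn

CANDIDATE_ACT = {"tanh": torch.tanh, "relu": torch.relu, "sigmoid": torch.sigmoid}

class SearchableRNNCell(nn.Module):
    """内部结构pi(·)由“组合方式 + 激活函数”等离散选择构成,整体框架仍是标准的循环展开"""
    def __init__(self, dim, act="tanh", combine="add"):
        super().__init__()
        self.Wx = nn.Linear(dim, dim)
        self.Wh = nn.Linear(dim, dim)
        self.act = CANDIDATE_ACT[act]        # 可搜索:激活函数
        self.combine = combine               # 可搜索:输入表示与历史状态的组合方式

    def forward(self, x_t, h_prev):
        a, b = self.Wx(x_t), self.Wh(h_prev)
        s = a + b if self.combine == "add" else a * b
        return self.act(s)                   # 对应 h_t = pi(h_{t-1}, x_t)

在实际的结构搜索中,每个时刻的输入表示与历史表示由整体框架给出,搜索只需在类似act、combine这样的离散选择上进行。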
......@@ -648,7 +652,7 @@ a = \funp{P}(\cdot|\mathbi{x};a)
\begin{itemize}
\vspace{0.5em}
\item 整体框架:如图\ref{fig:15-17}所示,不同任务下不同结构往往会表现出不同的建模能力,而类似的结构在结构空间中又相对集中,因此在搜索空间的设计中,整体框架部分一般根据不同任务特点选择已经得到验证的经验性结构,通过这种方式能够快速定位到更有潜力的搜索空间。如对于图像任务来说,一般会将卷积神经网络设计为候选搜索空间\upcite{DBLP:conf/iclr/ElskenMH19,DBLP:conf/icml/PhamGZLD18,DBLP:conf/iclr/LiuSY19,DBLP:conf/eccv/LiuZNSHLFYHM18,DBLP:conf/icml/CaiYZHY18},而对于包括机器翻译在内的自然语言处理任务而言,则会更倾向于使用循环神经网络或基于自注意力机制的Transformer模型附近的结构空间作为搜索空间\upcite{DBLP:conf/icml/SoLL19,DBLP:conf/iclr/ZophL17,DBLP:conf/icml/PhamGZLD18,DBLP:conf/iclr/LiuSY19,DBLP:journals/taslp/FanTXQLL20,DBLP:conf/ijcai/ChenLQWLDDHLZ20,DBLP:conf/acl/WangWLCZGH20}。此外,也可以拓展搜索空间以覆盖更多网络结构\upcite{DBLP:conf/acl/LiHZXJXZLL20}
\item 整体框架:如图\ref{fig:15-17}所示,不同任务下不同结构往往会表现出不同的建模能力,而类似的结构在结构空间中又相对集中,因此在搜索空间的设计中,整体框架部分一般根据不同任务特点选择已经得到验证的经验性结构,通过这种方式能够快速定位到更有潜力的搜索空间。如对于图像任务来说,一般会将卷积神经网络设计为候选搜索空间\upcite{DBLP:conf/iclr/ElskenMH19,DBLP:conf/icml/PhamGZLD18,DBLP:conf/iclr/LiuSY19,DBLP:conf/eccv/LiuZNSHLFYHM18,DBLP:conf/icml/CaiYZHY18},而对于包括机器翻译在内的自然语言处理任务而言,则会更倾向于使用循环神经网络或基于自注意力机制的Transformer模型附近的结构空间作为搜索空间\upcite{DBLP:conf/icml/SoLL19,DBLP:conf/iclr/ZophL17,DBLP:conf/icml/PhamGZLD18,DBLP:conf/iclr/LiuSY19,DBLP:journals/taslp/FanTXQLL20,DBLP:conf/ijcai/ChenLQWLDDHLZ20,DBLP:conf/acl/WangWLCZGH20} 此外,也可以拓展搜索空间以覆盖更多网络结构\upcite{DBLP:conf/acl/LiHZXJXZLL20}
\vspace{0.5em}
\item 内部结构:由于算力限制,网络结构搜索的任务通常使用经验性的架构作为模型的整体框架,之后通过对搜索到的内部结构进行堆叠得到完整的模型结构。内部结构的设计需要考虑搜索过程中的最小搜索单元以及搜索单元之间的连接方式。最小搜索单元指的是在结构搜索过程中可被选择的最小独立计算单元(或被称为搜索算子、操作)。在不同搜索空间的设计中,最小搜索单元的颗粒度各有不同:相对较小的搜索粒度主要包括诸如矩阵乘法、张量缩放等基本数学运算\upcite{DBLP:journals/corr/abs-2003-03384};中等粒度的搜索单元包括常见的激活函数,如ReLU、Tanh等\upcite{DBLP:conf/iclr/LiuSY19,DBLP:conf/acl/LiHZXJXZLL20,Chollet2017XceptionDL};也有研究人员在搜索空间的设计上倾向于选择较大颗粒度的局部结构作为搜索单元,如注意力机制、层标准化等人工设计的经验性结构\upcite{DBLP:conf/icml/SoLL19,DBLP:conf/nips/LuoTQCL18,DBLP:journals/taslp/FanTXQLL20}。不过,对于搜索颗粒度的问题,目前还缺乏有效的方法针对不同任务进行自动优化。下文给出了一个不同颗粒度候选单元的简单示意。
\vspace{0.5em}
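下面给出一个不同颗粒度候选搜索单元的简单示意(候选集的具体内容均为举例,并非某一具体工作的设定):

import random

# 不同颗粒度的最小搜索单元(候选集内容仅为举例)
SEARCH_UNITS = {
    "fine":   ["matmul", "add", "scale"],                # 基本数学运算
    "medium": ["relu", "tanh", "sigmoid"],               # 常见激活函数
    "coarse": ["self_attention", "ffn", "layer_norm"],   # 注意力、层标准化等经验性局部结构
}

def sample_internal_structure(granularity="medium", num_nodes=4):
    """把内部结构看作由若干最小搜索单元组成的小型计算图:每个节点选择一个算子并连接到此前的某个节点"""
    ops = SEARCH_UNITS[granularity]
    return [(i, random.randrange(i) if i > 0 else None, random.choice(ops))
            for i in range(num_nodes)]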
......@@ -666,7 +670,7 @@ a = \funp{P}(\cdot|\mathbi{x};a)
\begin{itemize}
\vspace{0.5em}
\item 进化算法{\red 检查这些词是不是第一次提到}:最初主要通过进化算法对神经网络中的模型结构以及权重参数进行优化\upcite{DBLP:conf/icga/MillerTH89,DBLP:journals/tnn/AngelineSP94,stanley2002evolving,DBLP:journals/alife/StanleyDG09}。而随着最优化算法的发展,近年来对于网络参数的学习更多地采用梯度下降法的方式,不过使用进化算法对模型结构进行优化却依旧被沿用至今\upcite{DBLP:conf/aaai/RealAHL19,DBLP:conf/icml/RealMSSSTLK17,DBLP:conf/iclr/ElskenMH19,DBLP:conf/ijcai/SuganumaSN18,Real2019AgingEF,DBLP:conf/iclr/LiuSVFK18,DBLP:conf/iccv/XieY17}。目前主流的方式主要是将模型结构看做是遗传算法中种群的个体,通过使用轮盘赌或锦标赛等抽取方式对种群中的结构进行取样作为亲本,之后通过亲本模型的突变产生新的模型结构,最终对这些新的模型结构进行适应度评估{\red (见XXX节)},根据模型结构在校验集上性能表现确定是否能够将其加入种群,整个过程如图\ref{fig:15-19}所示。对于进化算法中结构的突变主要指的是对模型中局部结构的改变,如增加跨层连接、替换局部操作等。
\item 进化算法{\red 检查这些词是不是第一次提到}:最初主要通过进化算法对神经网络中的模型结构以及权重参数进行优化\upcite{DBLP:conf/icga/MillerTH89,DBLP:journals/tnn/AngelineSP94,stanley2002evolving,DBLP:journals/alife/StanleyDG09}。而随着最优化算法的发展,近年来对于网络参数的学习更多地采用梯度下降法的方式,不过使用进化算法对模型结构进行优化却依旧被沿用至今\upcite{DBLP:conf/aaai/RealAHL19,DBLP:conf/icml/RealMSSSTLK17,DBLP:conf/iclr/ElskenMH19,DBLP:conf/ijcai/SuganumaSN18,Real2019AgingEF,DBLP:conf/iclr/LiuSVFK18,DBLP:conf/iccv/XieY17} 目前主流的方式主要是将模型结构看做是遗传算法中种群的个体,通过使用轮盘赌或锦标赛等抽取方式对种群中的结构进行取样作为亲本,之后通过亲本模型的突变产生新的模型结构,最终对这些新的模型结构进行适应度评估{\red (见XXX节)},根据模型结构在校验集上性能表现确定是否能够将其加入种群,整个过程如图\ref{fig:15-19}所示。对于进化算法中结构的突变主要指的是对模型中局部结构的改变,如增加跨层连接、替换局部操作等。
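下面给出一个与上述过程对应的极简进化式结构搜索框架示意(个体的表示方式、突变操作与淘汰策略均为示例性假设,适应度评估函数eval_fn需要接入真实的训练与校验流程):

import random

def tournament_select(population, k=3):
    """锦标赛选择:随机取k个个体,返回其中适应度最高者作为亲本"""
    candidates = random.sample(population, min(k, len(population)))
    return max(candidates, key=lambda ind: ind["fitness"])

def mutate(arch, candidate_ops=("lstm", "gru", "conv", "self_attention")):
    """突变:对局部结构做少量改变,这里仅以“随机替换某个位置上的算子”作示意"""
    child = {"ops": list(arch["ops"])}
    pos = random.randrange(len(child["ops"]))
    child["ops"][pos] = random.choice(candidate_ops)
    return child

def evolve(population, eval_fn, iterations=100, max_size=50):
    """进化式结构搜索主循环:取亲本 -> 突变产生新结构 -> 评估适应度 -> 决定去留"""
    for _ in range(iterations):
        parent = tournament_select(population)
        child = mutate(parent)
        child["fitness"] = eval_fn(child)          # 适应度:例如校验集上的BLEU
        population.append(child)
        if len(population) > max_size:             # 淘汰适应度最低的个体(也可按“年龄”淘汰)
            population.remove(min(population, key=lambda ind: ind["fitness"]))
    return max(population, key=lambda ind: ind["fitness"])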
%----------------------------------------------
\begin{figure}[htp]
......
......@@ -41,7 +41,7 @@
\node [anchor=west,fill=red!20,inner sep=0.1em,minimum width=3em,draw=black,line width=0.6pt,rounded corners=2pt](node4-1) at ([xshift=2.0em,yshift=1.6em]node3-2.east){\scriptsize{英语}};
\node [anchor=north,fill=green!20,inner sep=0.1em,minimum width=3em,draw=black,line width=0.6pt,rounded corners=2pt](node4-2) at (node4-1.south){\scriptsize{英语}};
\node [anchor=west,fill=green!20,inner sep=0.1em,minimum width=3em,draw=black,line width=0.6pt,rounded corners=2pt](node4-3) at (node4-1.east){\scriptsize{汉语}};
\node [anchor=west,fill=yellow!20,inner sep=0.1em,minimum width=3em,draw=black,line width=0.6pt,rounded corners=2pt](node4-3) at (node4-1.east){\scriptsize{汉语}};
\node [anchor=north,fill=green!20,inner sep=0.1em,minimum width=3em,draw=black,line width=0.6pt,rounded corners=2pt](node4-4) at (node4-3.south){\scriptsize{汉语}};
......
\begin{tikzpicture}
%%%%%%%%词典推断------------------------------------------------------------
\begin{scope}
\draw [-,ublue,line width=0.5pt] (0,0)..controls (0.3,0.2) and (0.5,0)..(0.7,-0.2)..controls (0.8,-0.3) and (0.9,-0.4)..(1.1,-0.4)..controls (1.3,-0.4) and (1.3,-0.1)..(1.28,0)..controls (1.26,0.1) and (1.25,0.2)..(1.2,0.3)..controls (1.15,0.4)and (1.2,0.5)..(1.6,0.55)..controls (1.7,0.56) and (1.78,0.5)..(1.85,0.35)..controls (2.0,0.0) and (2.05,-0.1)..(2.05,-0.5)..controls (2.04,-1.1) and (1.5,-1.1)..(0.6,-0.78)..controls (0.5,-0.74) and (0.4,-0.7)..(0.2,-0.5)..controls(0.1,-0.4) and (-0.15,-0.1)..(0,0) ;
\draw [-,red!70,line width=0.5pt] (0.04,-0.5) .. controls (0,-0.4) and (0.4,-0.1)..(0.7,-0.3)..controls (0.9,-0.45) and (1.1,-0.4)..(1.2,-0.3)..controls (1.3,-0.2) and (1.2,0.1).. (1.0,0.3)..controls (0.8,0.5) and (1.0,0.6)..(1.2,0.67)..controls (1.5,0.78) and (1.8,0.5)..(1.9,0.2)..controls(2.1,-0.3) and (2,-0.5)..(1.8,-0.75)..controls (1.5,-1.1) and (1.2,-1.0)..(0.4,-0.8)..controls (0.3,-0.77) and (0.14,-0.755)..(0.04,-0.5);
\draw [-,thick] (-0.7,1.0)--(-0.7,-1.0);
\node [anchor=center](c1) at (-0.1,0){\tiny{$\mathbi{Y}$}};
\node [anchor=center](c2) at (-0.3,-0.7){\tiny{$\mathbi{W}\cdot \mathbi{X}$}};
\node [anchor=center,red!70](cr1) at (0.65,-0.65){\scriptsize{$\bullet$}};
\node [anchor=center,ublue](cb1) at (0.6,-0.5){\scriptsize{$\bullet$}};
\node [anchor=center,red!70](cr2) at (1.65,-0.65){\scriptsize{$\bullet$}};
\node [anchor=center,ublue](cb2) at (1.55,-0.8){\scriptsize{$\bullet$}};
\node [anchor=center,red!70](cr3) at (1.5,0.1){\scriptsize{$\bullet$}};
\node [anchor=center,ublue](cb3) at (1.6,-0.05){\scriptsize{$\bullet$}};
\draw [-,red](0.65,-0.65)--(0.60,-0.62)--(0.66,-0.58)--(0.6,-0.55)--(0.63,-0.52)--(0.6,-0.5);
\draw [-,red](1.65,-0.65)--(1.60,-0.68)--(1.64,-0.72)--(1.56,-0.72)--(1.60,-0.76)--(1.55,-0.8);
\draw [-,red](1.5,0.1)--(1.53,0.08)--(1.49,0.04)--(1.58,0.03)--(1.54,-0.01)--(1.6,-0.05);
\end{scope}
%%%%%%%%X映射到Y空间------------------------------------------------------------
\begin{scope}[xshift=-8.0em]
\draw [-,ublue,line width=0.5pt] (0,0)..controls (0.3,0.2) and (0.5,0)..(0.7,-0.2)..controls (0.8,-0.3) and (0.9,-0.4)..(1.1,-0.4)..controls (1.3,-0.4) and (1.3,-0.1)..(1.28,0)..controls (1.26,0.1) and (1.25,0.2)..(1.2,0.3)..controls (1.15,0.4)and (1.2,0.5)..(1.6,0.55)..controls (1.7,0.56) and (1.78,0.5)..(1.85,0.35)..controls (2.0,0.0) and (2.05,-0.1)..(2.05,-0.5)..controls (2.04,-1.1) and (1.5,-1.1)..(0.6,-0.78)..controls (0.5,-0.74) and (0.4,-0.7)..(0.2,-0.5)..controls(0.1,-0.4) and (-0.15,-0.1)..(0,0) ;
\draw [-,red!70,line width=0.5pt] (0.04,-0.5) .. controls (0,-0.4) and (0.4,-0.1)..(0.7,-0.3)..controls (0.9,-0.45) and (1.1,-0.4)..(1.2,-0.3)..controls (1.3,-0.2) and (1.2,0.1).. (1.0,0.3)..controls (0.8,0.5) and (1.0,0.6)..(1.2,0.67)..controls (1.5,0.78) and (1.8,0.5)..(1.9,0.2)..controls(2.1,-0.3) and (2,-0.5)..(1.8,-0.75)..controls (1.5,-1.1) and (1.2,-1.0)..(0.4,-0.8)..controls (0.3,-0.77) and (0.14,-0.755)..(0.04,-0.5);
\draw [-,thick] (-0.7,1.0)--(-0.7,-1.0);
\node [anchor=center](c1) at (-0.1,0){\tiny{$\mathbi{Y}$}};
\node [anchor=center](c2) at (-0.3,-0.7){\tiny{$\mathbi{W}\cdot \mathbi{X}$}};
\node [anchor=center,red!70](cr1) at (0.65,-0.65){\scriptsize{$\bullet$}};
\node [anchor=center,ublue](cb1) at (0.6,-0.5){\scriptsize{$\bullet$}};
\node [anchor=center,red!70](cr2) at (1.65,-0.65){\scriptsize{$\bullet$}};
\node [anchor=center,ublue](cb2) at (1.55,-0.8){\scriptsize{$\bullet$}};
\node [anchor=center,red!70](cr3) at (1.5,0.1){\scriptsize{$\bullet$}};
\node [anchor=center,ublue](cb3) at (1.6,-0.05){\scriptsize{$\bullet$}};
%%%%%%一堆红色的球
\node [anchor=center,red!70](cr4) at (0.15,-0.6){\Large{$\cdot$}};
\node [anchor=center,red!70](cr5) at (0.3,-0.6){\Large{$\cdot$}};
\node [anchor=center,red!70](cr6) at (0.5,-0.55){\Large{$\cdot$}};
\node [anchor=center,red!70](cr7) at (0.35,-0.4){\Large{$\cdot$}};
\node [anchor=center,red!70](cr8) at (0.4,-0.7){\Large{$\cdot$}};
\node [anchor=center,red!70](cr16) at (0.55,-0.8){\Large{$\cdot$}};
\node [anchor=center,red!70](cr9) at (0.9,-0.8){\Large{$\cdot$}};
\node [anchor=center,red!70](cr10) at (0.9,-0.5){\Large{$\cdot$}};
\node [anchor=center,red!70](cr11) at (1.4,-0.8){\Large{$\cdot$}};
\node [anchor=center,red!70](cr12) at (1.45,-0.3){\Large{$\cdot$}};
\node [anchor=center,red!70](cr13) at (1.35,0.3){\Large{$\cdot$}};
\node [anchor=center,red!70](cr14) at (1.2,0.4){\Large{$\cdot$}};
\node [anchor=center,red!70](cr15) at (1.6,0.45){\Large{$\cdot$}};
%%%%%%一堆蓝色的球
\node [anchor=center,ublue](cb4) at (0.1,-0.2){\Large{$\cdot$}};
\node [anchor=center,ublue](cb5) at (0.3,-0.2){\Large{$\cdot$}};
\node [anchor=center,ublue](cb6) at (0.5,-0.25){\Large{$\cdot$}};
\node [anchor=center,ublue](cb7) at (0.4,-0.1){\Large{$\cdot$}};
\node [anchor=center,ublue](cb8) at (0.35,-0.45){\Large{$\cdot$}};
\node [anchor=center,ublue](cb9) at (0.45,-0.6){\Large{$\cdot$}};
\node [anchor=center,ublue](cb10) at (0.85,-0.45){\Large{$\cdot$}};
\node [anchor=center,ublue](cb11) at (1.45,-0.45){\Large{$\cdot$}};
\node [anchor=center,ublue](cb12) at (1.3,-0.85){\Large{$\cdot$}};
\node [anchor=center,ublue](cb13) at (1.8,-0.5){\Large{$\cdot$}};
\node [anchor=center,ublue](cb14) at (1.75,0.2){\Large{$\cdot$}};
\node [anchor=center,ublue](cb15) at (1.6,0.2){\Large{$\cdot$}};
\end{scope}
%%%%%%%%X、Y词嵌入空间------------------------------------------------------------
\begin{scope}[xshift=-16em]
\draw [-,ublue,line width=0.5pt] (0,0)..controls (0.3,0.2) and (0.5,0)..(0.7,-0.2)..controls (0.8,-0.3) and (0.9,-0.4)..(1.1,-0.4)..controls (1.3,-0.4) and (1.3,-0.1)..(1.28,0)..controls (1.26,0.1) and (1.25,0.2)..(1.2,0.3)..controls (1.15,0.4)and (1.2,0.5)..(1.6,0.55)..controls (1.7,0.56) and (1.78,0.5)..(1.85,0.35)..controls (2.0,0.0) and (2.05,-0.1)..(2.05,-0.5)..controls (2.04,-1.1) and (1.5,-1.1)..(0.6,-0.78)..controls (0.5,-0.74) and (0.4,-0.7)..(0.2,-0.5)..controls(0.1,-0.4) and (-0.15,-0.1)..(0,0) ;
\node [anchor=center](x1) at (-1.45,0.2){\tiny{$\mathbi{X}$}};
\node [anchor=center](y1) at (1.1,0.1){\tiny{$\mathbi{Y}$}};
\node [anchor=center,ublue](cb1) at (0.6,-0.5){\scriptsize{$\bullet$}};
\node [anchor=center,ublue](cb2) at (1.55,-0.8){\scriptsize{$\bullet$}};
\node [anchor=center,ublue](cb3) at (1.6,-0.05){\scriptsize{$\bullet$}};
%%%%%%一堆蓝色的球
\node [anchor=center,ublue](cb4) at (0.1,-0.2){\Large{$\cdot$}};
\node [anchor=center,ublue](cb5) at (0.3,-0.2){\Large{$\cdot$}};
\node [anchor=center,ublue](cb6) at (0.5,-0.25){\Large{$\cdot$}};
\node [anchor=center,ublue](cb7) at (0.4,-0.1){\Large{$\cdot$}};
\node [anchor=center,ublue](cb8) at (0.35,-0.45){\Large{$\cdot$}};
\node [anchor=center,ublue](cb9) at (0.45,-0.6){\Large{$\cdot$}};
\node [anchor=center,ublue](cb10) at (0.85,-0.45){\Large{$\cdot$}};
\node [anchor=center,ublue](cb11) at (1.45,-0.45){\Large{$\cdot$}};
\node [anchor=center,ublue](cb12) at (1.3,-0.85){\Large{$\cdot$}};
\node [anchor=center,ublue](cb13) at (1.8,-0.5){\Large{$\cdot$}};
\node [anchor=center,ublue](cb14) at (1.75,0.2){\Large{$\cdot$}};
\node [anchor=center,ublue](cb15) at (1.6,0.2){\Large{$\cdot$}};
\node [anchor=center](rw1) at (-0.5,0.45){\tiny{cat}};
\node [anchor=center](rw2) at (0.05,0.4){\tiny{feline}};
\node [anchor=center](rw3) at (-1.17,-0.07){\tiny{car}};
\node [anchor=center](rw4) at (-0.7,-0.65){\tiny{deep}};
\node [anchor=center](bw1) at (0.2,-0.1){\tiny{felin}};
\node [anchor=center](bw2) at (0.75,-0.65){\tiny{katze}};
\node [anchor=center](bw3) at (1.55,-0.65){\tiny{auto}};
\node [anchor=center](bw4) at (1.6,-0.2){\tiny{tief}};
\node [anchor=center](de1) at (0.3,-1.5) {\small{(a) $\mathbi{X}$$\mathbi{Y}$词嵌入空间}};
\node [anchor=center](de2) at (3.9,-1.5) {\small{(b) $\mathbi{X}$映射到$\mathbi{Y}$空间}};
\node [anchor=center](de3) at (7,-1.5) {\small{(c) 词典推断}};
\node [anchor=center](de4) at (10.1,-1.5) {\small{(d) 微调结果}};
\end{scope}
\begin{scope}[xshift=-14.5em,yshift=0.8em,rotate=-150]
\draw [-,red!70,line width=0.5pt] (0.04,-0.5) .. controls (0,-0.4) and (0.4,-0.1)..(0.7,-0.3)..controls (0.9,-0.45) and (1.1,-0.4)..(1.2,-0.3)..controls (1.3,-0.2) and (1.2,0.1).. (1.0,0.3)..controls (0.8,0.5) and (1.0,0.6)..(1.2,0.67)..controls (1.5,0.78) and (1.8,0.5)..(1.9,0.2)..controls(2.1,-0.3) and (2,-0.5)..(1.8,-0.75)..controls (1.5,-1.1) and (1.2,-1.0)..(0.4,-0.8)..controls (0.3,-0.77) and (0.14,-0.755)..(0.04,-0.5);
\node [anchor=center,red!70](cr1) at (0.65,-0.65){\scriptsize{$\bullet$}};
\node [anchor=center,red!70](cr2) at (1.65,-0.65){\scriptsize{$\bullet$}};
\node [anchor=center,red!70](cr3) at (1.5,0.1){\scriptsize{$\bullet$}};
%%%%%%一堆红色的球
\node [anchor=center,red!70](cr4) at (0.15,-0.6){\Large{$\cdot$}};
\node [anchor=center,red!70](cr5) at (0.3,-0.6){\Large{$\cdot$}};
\node [anchor=center,red!70](cr6) at (0.5,-0.55){\Large{$\cdot$}};
\node [anchor=center,red!70](cr7) at (0.35,-0.4){\Large{$\cdot$}};
\node [anchor=center,red!70](cr8) at (0.4,-0.7){\Large{$\cdot$}};
\node [anchor=center,red!70](cr16) at (0.55,-0.8){\Large{$\cdot$}};
\node [anchor=center,red!70](cr9) at (0.9,-0.8){\Large{$\cdot$}};
\node [anchor=center,red!70](cr10) at (0.9,-0.5){\Large{$\cdot$}};
\node [anchor=center,red!70](cr11) at (1.4,-0.8){\Large{$\cdot$}};
\node [anchor=center,red!70](cr12) at (1.45,-0.3){\Large{$\cdot$}};
\node [anchor=center,red!70](cr13) at (1.35,0.3){\Large{$\cdot$}};
\node [anchor=center,red!70](cr14) at (1.2,0.4){\Large{$\cdot$}};
\node [anchor=center,red!70](cr15) at (1.6,0.45){\Large{$\cdot$}};
\end{scope}
%%%%%%%%%%%微调结果------------------------------------------------------------
\begin{scope}[xshift=8.2em]
\draw [-,red!70,line width=0.5pt] (0,0.4688)..controls (0.3,0.45) and (0.5,0.2)..(0.7,-0.25)..controls (0.8,-0.45) and (0.9,-0.4)..(1.1,-0.4)..controls (1.3,-0.42) and (1.3,-0.12)..(1.28,0)..controls (1.26,0.1) and (1.25,0.2)..(1.2,0.3)..controls (1.13,0.4) and (1.18,0.5)..(1.6,0.55)..controls (1.7,0.56) and (1.78,0.5)..(1.85,0.35)..controls (2.03,0.0) and (2.08,-0.1)..(2.07,-0.5)..controls (2.04,-1.1) and (1.5,-1.16)..(0.6,-0.91)..controls (0.05,-0.71) and (-0.2,-0.53)..(-0.25,-0.45)..controls (-0.55,0.0) and (-0.5,0.501)..(0,0.4688);
\draw [-,ublue,line width=0.5pt] (0,0.5)..controls (0.3,0.5) and (0.5,0.2)..(0.7,-0.25)..controls (0.8,-0.45) and (0.9,-0.4)..(1.1,-0.4)..controls (1.3,-0.40) and (1.3,-0.1)..(1.28,0)..controls (1.26,0.1) and (1.25,0.2)..(1.2,0.3)..controls (1.15,0.4)and (1.2,0.5)..(1.6,0.55)..controls (1.7,0.56) and (1.78,0.5)..(1.85,0.35)..controls (2.0,0.0) and (2.05,-0.1)..(2.05,-0.5)..controls (2.04,-1.1) and (1.5,-1.1)..(0.6,-0.91)..controls (0.0,-0.75) and (-0.2,-0.53)..(-0.25,-0.45)..controls (-0.5,0.0) and (-0.5,0.501)..(0,0.5);
\draw [-,thick] (-0.8,1.0)--(-0.8,-1.0);
\node [anchor=center](c1) at (0.1,0.6){\tiny{$\mathbi{Y}$}};
\node [anchor=center](c2) at (-0.45,-0.7){\tiny{$\mathbi{W}\cdot \mathbi{X}$}};
\node [anchor=center,red!70](cr1) at (0.2,-0.35){\scriptsize{$\bullet$}};
\node [anchor=center,red!70](cr2) at (1.58,-0.78){\scriptsize{$\bullet$}};
\node [anchor=center,red!70](cr3) at (1.6,0){\scriptsize{$\bullet$}};
\node [anchor=center,ublue](cb1) at (0.2,-0.3){\scriptsize{$\bullet$}};
\node [anchor=center,ublue](cb2) at (1.55,-0.8){\scriptsize{$\bullet$}};
\node [anchor=center,ublue](cb3) at (1.6,-0.05){\scriptsize{$\bullet$}};
%%%%%%一堆红色的球
\node [anchor=center,red!70](cb4) at (-0.35,0.16){\Large{$\cdot$}};
\node [anchor=center,red!70](cb5) at (-0.03,0.37){\Large{$\cdot$}};
\node [anchor=center,red!70](cb6) at (-0.03,0.12){\Large{$\cdot$}};
\node [anchor=center,red!70](cb7) at (0.37,0.02){\Large{$\cdot$}};
\node [anchor=center,red!70](cb8) at (-0.18,-0.18){\Large{$\cdot$}};
\node [anchor=center,red!70](cb9) at (0.65,-0.43){\Large{$\cdot$}};
\node [anchor=center,red!70](cb10) at (0.32,-0.68){\Large{$\cdot$}};
\node [anchor=center,red!70](cb11) at (0.82,-0.73){\Large{$\cdot$}};
\node [anchor=center,red!70](cb12) at (1.23,-0.85){\Large{$\cdot$}};
\node [anchor=center,red!70](cb13) at (1.8,-0.47){\Large{$\cdot$}};
\node [anchor=center,red!70](cb14) at (1.75,0.23){\Large{$\cdot$}};
\node [anchor=center,red!70](cb15) at (1.38,-0.44){\Large{$\cdot$}};
\node [anchor=center,red!70](cb16) at (1.42,0.26){\Large{$\cdot$}};
%%%%%%一堆蓝色的球
\node [anchor=center,ublue](cb4) at (-0.35,0.2){\Large{$\cdot$}};
\node [anchor=center,ublue](cb5) at (0,0.4){\Large{$\cdot$}};
\node [anchor=center,ublue](cb6) at (0,0.15){\Large{$\cdot$}};
\node [anchor=center,ublue](cb7) at (0.4,0.05){\Large{$\cdot$}};
\node [anchor=center,ublue](cb8) at (-0.15,-0.15){\Large{$\cdot$}};
\node [anchor=center,ublue](cb9) at (0.65,-0.4){\Large{$\cdot$}};
\node [anchor=center,ublue](cb10) at (0.3,-0.65){\Large{$\cdot$}};
\node [anchor=center,ublue](cb11) at (0.8,-0.7){\Large{$\cdot$}};
\node [anchor=center,ublue](cb12) at (1.2,-0.85){\Large{$\cdot$}};
\node [anchor=center,ublue](cb13) at (1.8,-0.5){\Large{$\cdot$}};
\node [anchor=center,ublue](cb14) at (1.75,0.2){\Large{$\cdot$}};
\node [anchor=center,ublue](cb15) at (1.4,-0.45){\Large{$\cdot$}};
\node [anchor=center,ublue](cb16) at (1.45,0.3){\Large{$\cdot$}};
\node [anchor=center](rw1) at (0.22,-0.45){\tiny{cat}};
\node [anchor=center](rw2) at (0.20,-0.15){\tiny{katze}};
\end{scope}
\end{tikzpicture}
\ No newline at end of file
......@@ -4,26 +4,26 @@
\begin{tikzpicture}
\tikzstyle{node}=[rounded corners=2pt,draw,minimum width=5em,minimum height=2em,drop shadow,font=\footnotesize]
\node[node,fill=blue!20] (nmt1) at (0,0){NMT系统1};
\node[node,anchor=west,fill=yellow!20] (nmt2) at ([xshift=1em]nmt1.east){NMT系统2};
\node[node,anchor=west,fill=red!20] (nmt3) at ([xshift=1em]nmt2.east){NMT系统3};
\node[node,fill=blue!20,line width=0.6pt] (nmt1) at (0,0){NMT系统1};
\node[node,anchor=west,fill=yellow!20,line width=0.6pt] (nmt2) at ([xshift=1em]nmt1.east){NMT系统2};
\node[node,anchor=west,fill=red!20,line width=0.6pt] (nmt3) at ([xshift=1em]nmt2.east){NMT系统3};
\node[node,anchor=south,fill=blue!20] (n1) at ([yshift=2.4em]nmt1.north){我不悦};
\node[node,anchor=west,fill=yellow!20] (n2) at ([xshift=1em]n1.east){我不开心};
\node[node,anchor=west,fill=red!20] (n3) at ([xshift=1em]n2.east){吾怀忳忳};
\node[node,anchor=south,fill=blue!20,line width=0.6pt] (n1) at ([yshift=2.4em]nmt1.north){我不悦};
\node[node,anchor=west,fill=yellow!20,line width=0.6pt] (n2) at ([xshift=1em]n1.east){我不开心};
\node[node,anchor=west,fill=red!20,line width=0.6pt] (n3) at ([xshift=1em]n2.east){吾怀忳忳};
\node[node,anchor=south,fill=green!20,minimum height=1.6em] (task1) at ([yshift=2.6em]n2.north){不同任务};
\node[node,anchor=south,fill=green!20,minimum height=1.6em,line width=0.6pt] (task1) at ([yshift=2.6em]n2.north){不同任务};
\node[node,anchor=west,fill=green!20,minimum height=1.6em] (task2) at ([xshift=8em]task1.east){源任务};
\node[node,anchor=north,minimum height=3.2em,fill=orange!20] (n4) at ([yshift=-2em]task2.south){};
\node[draw,anchor=north,cylinder,shape border rotate=90,minimum width=3em,aspect=0.4,fill=orange!20] (kd) at ([yshift=-1.7em]n4.south){\footnotesize 知识};
\node[node,anchor=west,fill=green!20,minimum height=1.6em,line width=0.6pt] (task2) at ([xshift=8em]task1.east){源任务};
\node[node,anchor=north,minimum height=3.2em,fill=orange!20,line width=0.6pt] (n4) at ([yshift=-2em]task2.south){};
\node[draw,anchor=north,cylinder,shape border rotate=90,minimum width=3em,aspect=0.4,fill=orange!20,line width=0.6pt] (kd) at ([yshift=-1.7em]n4.south){\footnotesize 知识};
\node[draw,minimum width=4em,font=\scriptsize,anchor=north,inner ysep=2pt,fill=blue!20] at ([yshift=-2.35em]task2.south){我不悦};
\node[draw,minimum width=4em,font=\scriptsize,anchor=north,inner ysep=2pt,fill=yellow!20] at ([yshift=-3.75em]task2.south){我不开心};
\node[draw,minimum width=4em,font=\scriptsize,anchor=north,inner ysep=2pt,fill=blue!20,line width=0.6pt] at ([yshift=-2.35em]task2.south){我不悦};
\node[draw,minimum width=4em,font=\scriptsize,anchor=north,inner ysep=2pt,fill=yellow!20,line width=0.6pt] at ([yshift=-3.75em]task2.south){我不开心};
\node[node,anchor=west,fill=green!20,minimum height=1.6em] (task3) at ([xshift=3em]task2.east){目标任务};
\node[node,anchor=north,fill=red!20] (n5) at ([yshift=-2.5em]task3.south){吾怀忳忳};
\node[node,anchor=north,fill=red!20] (sys) at ([yshift=-2.5em]n5.south){学习系统};
\node[node,anchor=west,fill=green!20,minimum height=1.6em,line width=0.6pt] (task3) at ([xshift=3em]task2.east){目标任务};
\node[node,anchor=north,fill=red!20,line width=0.6pt] (n5) at ([yshift=-2.5em]task3.south){吾怀忳忳};
\node[node,anchor=north,fill=red!20,line width=0.6pt] (sys) at ([yshift=-2.5em]n5.south){学习系统};
\draw[->,thick] ([yshift=-0.2em,xshift=-0.7em]task1.-145) -- node[left,font=\scriptsize,yshift=0.2em]{书面语}([yshift=0.2em]n1.90);
\draw[->,thick] ([yshift=-0.2em]task1.-90) -- node[right,font=\scriptsize,yshift=0.2em,xshift=-0.2em]{口语}([yshift=0.2em]n2.90);
......
......@@ -3,11 +3,11 @@
%-------------------------------------------------------------------------
\begin{tikzpicture}
\node[draw,circle,inner sep=2pt,minimum size=2em,fill=blue!20] (x) at (0,0) {$\seq{x}$};
\node[draw,circle,inner sep=2pt,minimum size=2em,fill=blue!20,line width=0.6pt] (x) at (0,0) {$\seq{x}$};
\node[draw,circle,inner sep=2pt,minimum size=2em,fill=red!15] (p) at (0,-2.4) {$\seq{p}$};
\node[draw,circle,inner sep=2pt,minimum size=2em,fill=red!20,line width=0.6pt] (p) at (0,-2.4) {$\seq{p}$};
\node[draw,circle,inner sep=2pt,minimum size=2em,fill=blue!20] (y) at (2.4,-1.2) {$\seq{y}$};
\node[draw,circle,inner sep=2pt,minimum size=2em,fill=blue!20,line width=0.6pt] (y) at (2.4,-1.2) {$\seq{y}$};
\draw[-,dashed,thick,black!50] (x.-90) -- (p.90);
\draw[-,dashed,thick,black!50] (p.0) -- (y.-135);
......
......@@ -3,12 +3,12 @@
%-------------------------------------------------------------------------
\begin{tikzpicture}
\tikzstyle{lan}=[font=\footnotesize,inner ysep=2pt,minimum height=1em]
\node[minimum height=3em,minimum width=8em,fill=orange!20,draw,rounded corners=2pt,align=center] (sys) at (0,0){多语言 \\ 单模型系统};
\node[draw,font=\footnotesize,minimum width=4em,fill=blue!20,rounded corners=1pt] (en) at (-3em,4em){英语};
\node[draw,font=\footnotesize,minimum width=4em,fill=blue!20,rounded corners=1pt] (fr) at (3em,4em){法语};
\node[minimum height=3em,minimum width=8em,fill=orange!20,draw,rounded corners=2pt,align=center,line width=0.6pt] (sys) at (0,0){多语言 \\ 单模型系统};
\node[draw,font=\footnotesize,minimum width=4em,fill=red!20,rounded corners=1pt,line width=0.6pt] (en) at (-3em,4em){英语};
\node[draw,font=\footnotesize,minimum width=4em,fill=red!20,rounded corners=1pt,line width=0.6pt] (fr) at (3em,4em){法语};
\node[minimum width=4em] at (6.6em,4em){$\dots$};
\node[draw,font=\footnotesize,minimum width=4em,fill=yellow!20,rounded corners=1pt] (de) at (-3em,-4em){德语};
\node[draw,font=\footnotesize,minimum width=4em,fill=yellow!20,rounded corners=1pt] (sp) at (3em,-4em){西班牙语};
\node[draw,font=\footnotesize,minimum width=4em,fill=blue!20,rounded corners=1pt,line width=0.6pt] (de) at (-3em,-4em){德语};
\node[draw,font=\footnotesize,minimum width=4em,fill=blue!20,rounded corners=1pt,line width=0.6pt] (sp) at (3em,-4em){西班牙语};
\node[minimum width=4em] at (6.6em,-4em){$\dots$};
\draw[->,thick] (en.-90) -- ([xshift=-1em]sys.90);
......@@ -18,27 +18,21 @@
\node[font=\footnotesize] (train) at (11em,7em) {\small\bfnew{训练阶段:}};
\node[anchor=north,font=\footnotesize] (pair1) at ([yshift=-1em,xshift=1em]train.south) {双语句对1:};
\node[anchor=west,draw=blue!40,lan,minimum width=9.8em,fill=blue!20] (box1) at ([yshift=.7em,xshift=0.4em]pair1.east) {};
\node[anchor=west,lan] at ([yshift=.7em,xshift=0.4em]pair1.east) {英语:{\color{red}<spanish>} \ hello};
\node[anchor=west,draw=yellow!40,lan,minimum width=9.8em,fill=yellow!20] (box2) at ([yshift=-.7em,xshift=0.4em]pair1.east) {};
\node[anchor=west,lan] at ([yshift=-.7em,xshift=0.4em]pair1.east) {西班牙语:hola};
\node[anchor=west,lan](train1) at ([yshift=.7em,xshift=0.4em]pair1.east) {英语:{\color{red}<spanish>} \ hello};
\node[anchor=west,lan](train2) at ([yshift=-.7em,xshift=0.4em]pair1.east) {西班牙语:hola};
\node[anchor=north,font=\footnotesize] (pair2) at ([yshift=-4.5em,xshift=1em]train.south) {双语句对2:};
\node[anchor=west,draw=blue!40,lan,minimum width=9.8em,fill=blue!20] (box3) at ([yshift=.7em,xshift=0.4em]pair2.east) {};
\node[anchor=west,lan] at ([yshift=.7em,xshift=0.4em]pair2.east) {法语:{\color{red}<german>} \ Bonjour};
\node[anchor=west,draw=yellow!40,lan,minimum width=9.8em,fill=yellow!20] (box4) at ([yshift=-.7em,xshift=0.4em]pair2.east) {};
\node[anchor=west,lan] at ([yshift=-.7em,xshift=0.4em]pair2.east) {德语:Hallo};
\node[anchor=west,lan](train3) at ([yshift=.7em,xshift=0.4em]pair2.east) {法语:{\color{red}<german>} \ Bonjour};
\node[anchor=west,lan](train4) at ([yshift=-.7em,xshift=0.4em]pair2.east) {德语:Hallo};
\node[anchor=north,font=\footnotesize] (decode) at ([yshift=-8em]train.south) {\small\bfnew{解码阶段:}};
\node[anchor=north,font=\footnotesize] (input) at ([yshift=-0.6em]decode.south) {输入:};
\node[anchor=west,draw=blue!40,lan,minimum width=9.8em,fill=blue!20] (box5) at ([xshift=0.4em]input.east) {};
\node[anchor=west,lan] at ([xshift=0.4em]input.east) {英语:{\color{red}<german>} \ hello};
\node[anchor=north,font=\footnotesize] (output) at ([yshift=-2.6em]decode.south) {输出:};
\node[anchor=west,draw=yellow!40,lan,minimum width=9.8em,fill=yellow!20] (box6) at ([xshift=0.4em]output.east) {};
\node[anchor=west,lan] at ([xshift=0.4em]output.east) {德语:Hallo};
\node[anchor=north,lan,minimum width=9.8em] (box7) at ([yshift=-2em]box4.south) {};
\node[anchor=north,font=\footnotesize] (input) at ([xshift=2.13em,yshift=-0.6em]decode.south) {输入:};
\node[anchor=west,lan](decode2) at ([xshift=0.4em]input.east) {英语:{\color{red}<german>} \ hello};
\node[anchor=north,font=\footnotesize] (output) at ([xshift=2.13em,yshift=-2.6em]decode.south) {输出:};
\node[anchor=west,lan](decode3) at ([xshift=0.4em]output.east) {德语:Hallo};
\node[anchor=north,lan,minimum width=9.8em] (box7) at ([yshift=-4em]train3.south) {};
\begin{pgfonlayer}{background}
\node[fill=red!15,draw=red!30,rounded corners=2pt,inner ysep=6pt,line width=1pt][fit=(train)(box4)]{};
\node[fill=green!20,,draw=green!40,rounded corners=2pt,inner ysep=6pt,line width=1pt][fit=(decode)(box7)(box6)]{};
\node[fill=red!20,draw=black,rounded corners=2pt,inner ysep=6pt,line width=1pt][fit=(train)(train4)(train1)(train2)(train3)]{};
\node[fill=blue!20,draw=black,rounded corners=2pt,inner ysep=6pt,line width=1pt][fit=(decode)(output)(decode2)(decode3)(box7)]{};
\end{pgfonlayer}
\end{tikzpicture}
......
%%% outline
%-------------------------------------------------------------------------
\begin{tikzpicture}
\tikzstyle{rec} = [line width=0.6pt,draw,rounded corners,minimum height=2.2em,minimum width=4.3em]
\node [anchor=center] (node1-1) at (0,0) {\small{$y'$}};
\node[anchor=north,rec,fill=blue!20](node1-2) at ([yshift=-2.0em]node1-1.south) {\small{解码器}};
\node[anchor=north,rec,fill=red!20](node1-3) at ([yshift=-2em]node1-2.south) {\small{编码器}};
\node[anchor=east](node1-5) at ([xshift=-2em]node1-2.west) {\small{$y$}};
\node[anchor=north](node1-4) at ([yshift=-2em]node1-3.south) {\small{$x$}};
\draw [->,thick](node1-4.north)--(node1-3.south);
\draw [->,thick](node1-5.east)--(node1-2.west);
\draw [->,thick](node1-3.north)--(node1-2.south);
\draw [->,thick](node1-2.north)--(node1-1.south);
\node [anchor=center] (node2-1) at ([xshift=12.0em]node1-1.east) {\small{$y'$}};
\node[anchor=north,rec,fill=blue!20](node2-2) at ([yshift=-2.0em]node2-1.south) {\small{解码器}};
\node[anchor=north,rec,fill=red!20](node2-3) at ([yshift=-2em]node2-2.south) {\small{编码器}};
\node[anchor=east](node2-5) at ([xshift=-2em]node2-2.west) {\small{$y$}};
\node[anchor=north](node2-4) at ([yshift=-2em]node2-3.south) {\small{$x$}};
\node[anchor=west,rec,fill=yellow!20](node2-6) at ([xshift=3.0em]node2-3.east) {\small{解码器}};
\node[anchor=south](node2-7) at ([yshift=2em]node2-6.north) {\small{$x'$}};
\draw [->,thick](node2-4.north)--(node2-3.south);
\draw [->,thick](node2-5.east)--(node2-2.west);
\draw [->,thick](node2-3.north)--(node2-2.south)node[pos=0.5,left,font=\scriptsize]{翻译};
\draw [->,thick](node2-2.north)--(node2-1.south);
\draw [->,thick](node2-3.east)--(node2-6.west)node[pos=0.5,above,font=\scriptsize]{重排序};
\draw [->,thick](node2-6.north)--(node2-7.south);
\node [anchor=north](pos1) at ([yshift=0em]node1-4.south) {\small{(a)单任务学习}};
\node [anchor=west](pos2) at ([xshift=10.0em]pos1.east) {\small{(b)多任务学习}};
\end{tikzpicture}
\ No newline at end of file
\begin{tikzpicture}
\begin{scope}
\node [anchor=center] (node1-1) at (0,0) {\small{$y'$}};
\node[anchor=south,line width=0.6pt,draw,rounded corners,minimum height=1.5em,minimum width=4em,fill=blue!20](node1-2) at ([yshift=-3em]node1-1.south) {\small{softmax}};
\node[anchor=south,line width=0.6pt,draw,rounded corners,minimum height=1.5em,minimum width=4.3em,fill=blue!20](node1-2) at ([yshift=-3em]node1-1.south) {\small{Softmax}};
\node[anchor=north,line width=0.6pt,draw,rounded corners,minimum height=2.2em,minimum width=4em,fill=red!20](node1-3) at ([yshift=-2.0em]node1-2.south) {\small{Decoder}};
\node[anchor=north,line width=0.6pt,draw,rounded corners,minimum height=2.2em,minimum width=4em,fill=yellow!20](node3-3) at ([yshift=-2.0em]node1-3.south) {\small{LM}};
\node[anchor=north,line width=0.6pt,draw,rounded corners,minimum height=2.2em,minimum width=4.3em,fill=blue!20](node1-3) at ([yshift=-2.0em]node1-2.south) {\small{解码器}};
\node[anchor=north,line width=0.6pt,draw,rounded corners,minimum height=2.2em,minimum width=4.3em,fill=yellow!20](node3-3) at ([yshift=-2.0em]node1-3.south) {\small{语言模型}};
\node[anchor=west,line width=0.6pt,draw,rounded corners,minimum height=1.5em,minimum width=4em,fill=blue!20](node3-2) at ([xshift=2em]node3-3.east) {\small{softmax}};
\node[anchor=west,line width=0.6pt,draw,rounded corners,minimum height=1.5em,minimum width=4.3em,fill=blue!20](node3-2) at ([xshift=2em]node3-3.east) {\small{Softmax}};
\node [anchor=north] (node3-1) at ([yshift=3.0em]node3-2.north) {\small{$z'$}};
\node[anchor=north](node3-41) at ([xshift=-0.6em,yshift=-2em]node3-3.south) {\small{$y$}};
\node[anchor=north](node3-42) at ([xshift=0.6em,yshift=-2em]node3-3.south) {\small{$z$}};
\node[anchor=east,line width=0.6pt,draw,rounded corners,minimum height=2.2em,minimum width=4em,fill=red!20](node2-1) at ([xshift=-2em]node1-3.west) {\small{Encoder}};
\node[anchor=east,line width=0.6pt,draw,rounded corners,minimum height=2.2em,minimum width=4.3em,fill=red!20](node2-1) at ([xshift=-2em]node1-3.west) {\small{编码器}};
\node[anchor=north](node2-2) at ([yshift=-2em]node2-1.south) {\small{$x$}};
......@@ -34,9 +34,9 @@
\node [anchor=east] (node2-1-1) at ([xshift=-12.0em,yshift=-4.25em]node1-1.west) {\small{$y'$}};
\node[anchor=south,line width=0.6pt,draw,rounded corners,minimum height=1.5em,minimum width=4em,fill=blue!20](node2-1-2) at ([yshift=-3em]node2-1-1.south) {\small{softmax}};
\node[anchor=north,line width=0.6pt,draw,rounded corners,minimum height=2.2em,minimum width=4em,fill=red!20](node2-1-3) at ([yshift=-2.0em]node2-1-2.south) {\small{Decoder}};
\node[anchor=east,line width=0.6pt,draw,rounded corners,minimum height=2.2em,minimum width=4em,fill=red!20](node2-2-1) at ([xshift=-2em]node2-1-3.west) {\small{Encoder}};
\node[anchor=south,line width=0.6pt,draw,rounded corners,minimum height=1.5em,minimum width=4.3em,fill=blue!20](node2-1-2) at ([yshift=-3em]node2-1-1.south) {\small{Softmax}};
\node[anchor=north,line width=0.6pt,draw,rounded corners,minimum height=2.2em,minimum width=4.3em,fill=blue!20](node2-1-3) at ([yshift=-2.0em]node2-1-2.south) {\small{解码器}};
\node[anchor=east,line width=0.6pt,draw,rounded corners,minimum height=2.2em,minimum width=4.3em,fill=red!20](node2-2-1) at ([xshift=-2em]node2-1-3.west) {\small{编码器}};
\node[anchor=north](node2-2-2) at ([yshift=-2em]node2-2-1.south) {\small{$x$}};
\node[anchor=north](node2-2-3) at ([yshift=-2em]node2-1-3.south) {\small{$y$}};
......
\begin{tabular}{c c}
\begin{tikzpicture}
\begin{scope}
% ,minimum height =1em,minimum width=2em
\tikzstyle{circle} = [draw,black,line width=0.6pt,inner sep=3.5pt,rounded corners=4pt,minimum width=2em]
\tikzstyle{word} = [inner sep=3.5pt]
\node[circle,fill=red!20](data) at (0,0) {数据};
\node[circle,fill=blue!20](model) at ([xshift=5em]data.east) {模型};
\node[word] (init) at ([xshift=-5em]data.west){初始化};
\draw[->,thick] (init.east) -- ([xshift=-0.2em]data.west);
\draw [->,thick] ([yshift=1pt]data.north) .. controls +(90:2em) and +(90:2em) .. ([yshift=1pt]model.north) node[above,midway] {参数优化};
\draw [->,thick] ([yshift=1pt]model.south) .. controls +(-90:2em) and +(-90:2em) .. ([yshift=1pt]data.south) node[below,midway] {数据优化};
\node[word] at ([xshift=-0.5em,yshift=-5em]data.south){(a)思路1};
\end{scope}
\end{tikzpicture}
&
\begin{tikzpicture}
\begin{scope}
% ,minimum height =1em,minimum width=2em
\tikzstyle{circle} = [draw,black,line width=0.6pt,inner sep=3.5pt,rounded corners=4pt,minimum width=2em]
\tikzstyle{word} = [inner sep=3.5pt]
\node[circle,fill=red!20](data) at (0,0) {数据};
\node[circle,fill=blue!20](model) at ([xshift=5em]data.east) {模型};
\node[word] (init) at ([xshift=5em]model.east){初始化};
\draw[->,thick] (init.west) -- ([xshift=0.2em]model.east);
\draw [->,thick] ([yshift=1pt]data.north) .. controls +(90:2em) and +(90:2em) .. ([yshift=1pt]model.north) node[above,midway] {参数优化};
\draw [->,thick] ([yshift=1pt]model.south) .. controls +(-90:2em) and +(-90:2em) .. ([yshift=1pt]data.south) node[below,midway] {数据优化};
\node[word] at ([xshift=-0.5em,yshift=-5em]model.south){(b)思路2};
\end{scope}
\end{tikzpicture}
\end{tabular}
\ No newline at end of file
......@@ -4,13 +4,13 @@
\begin{tikzpicture}
\tikzstyle{node}=[rounded corners=4pt,draw,minimum height=3em,drop shadow,font=\footnotesize]
\node[node,minimum width=6em,minimum height=2.4em,fill=blue!20] (encoder1) at (0,0){\small 编码器};
\node[node,anchor=west,minimum width=6em,minimum height=2.4em,fill=blue!20] (encoder2) at ([xshift=4em,yshift=0em]encoder1.east){\small 编码器};
\node[node,anchor=west,minimum width=6em,minimum height=2.4em,fill=red!20] (encoder3) at ([xshift=3em]encoder2.east){\small 编码器};
\node[node,minimum width=6em,minimum height=2.4em,fill=red!20,line width=0.6pt] (encoder1) at (0,0){\small 编码器};
\node[node,anchor=west,minimum width=6em,minimum height=2.4em,fill=red!20,line width=0.6pt] (encoder2) at ([xshift=4em,yshift=0em]encoder1.east){\small 编码器};
\node[node,anchor=west,minimum width=6em,minimum height=2.4em,fill=red!40,line width=0.6pt] (encoder3) at ([xshift=3em]encoder2.east){\small 编码器};
\node[node,anchor=north,minimum width=6em,minimum height=2.4em,fill=blue!20] (decoder1) at ([yshift=-3em]encoder1.south){\small 解码器};
\node[node,anchor=west,minimum width=6em,minimum height=2.4em,fill=blue!20] (decoder2) at ([xshift=4em,yshift=0em]decoder1.east){\small 解码器};
\node[node,anchor=west,minimum width=6em,minimum height=2.4em,fill=red!20] (decoder3) at ([xshift=3em]decoder2.east){\small 解码器};
\node[node,anchor=north,minimum width=6em,minimum height=2.4em,fill=blue!20,line width=0.6pt] (decoder1) at ([yshift=-3em]encoder1.south){\small 解码器};
\node[node,anchor=west,minimum width=6em,minimum height=2.4em,fill=blue!20,line width=0.6pt] (decoder2) at ([xshift=4em,yshift=0em]decoder1.east){\small 解码器};
\node[node,anchor=west,minimum width=6em,minimum height=2.4em,fill=blue!40,line width=0.6pt] (decoder3) at ([xshift=3em]decoder2.east){\small 解码器};
\node[anchor=north,font=\scriptsize,fill=yellow!20] (w1) at ([yshift=-1.6em]decoder1.south){知识 \ 就是 \ 力量 \ \ <EOS>};
\node[anchor=north,font=\scriptsize,fill=green!20] (w3) at ([yshift=-1.6em]decoder3.south){Wissen \ ist \ Machit \ . \ <EOS>};
......@@ -24,7 +24,7 @@
\draw[->,thick] (w4.-90) -- (encoder3.90);
\node [anchor=north,single arrow,minimum height=2.2em,fill=blue!20,rotate=-90] (arrow1) at ([yshift=-1.4em,xshift=0.4em]encoder1.south) {};
\node [anchor=north,single arrow,minimum height=2.2em,fill=blue!20,rotate=-90] (arrow2) at ([yshift=-1.4em,xshift=0.4em]encoder2.south) {};
\node [anchor=north,single arrow,minimum height=2.2em,fill=red!20,rotate=-90] (arrow2) at ([yshift=-1.4em,xshift=0.4em]encoder2.south) {};
\node [anchor=north,single arrow,minimum height=2.2em,fill=red!20,rotate=-90] (arrow3) at ([yshift=-1.4em,xshift=0.4em]encoder3.south) {};
\node[anchor=south,yshift=3.4em] at (encoder1.north){\small\bfnew{父模型}};
......
......@@ -3,11 +3,11 @@
%-------------------------------------------------------------------------
\begin{tikzpicture}
\node[draw,circle,inner sep=2pt,minimum size=2em,fill=blue!20] (x) at (0,0) {$\seq{x}$};
\node[draw,circle,inner sep=2pt,minimum size=2em,fill=blue!20,line width=0.6pt] (x) at (0,0) {$\seq{x}$};
\node[draw,circle,inner sep=2pt,minimum size=2em,fill=red!15] (p) at (2,0) {$\seq{p}$};
\node[draw,circle,inner sep=2pt,minimum size=2em,fill=red!20,line width=0.6pt] (p) at (2,0) {$\seq{p}$};
\node[draw,circle,inner sep=2pt,minimum size=2em,fill=blue!20] (y) at (4,0) {$\seq{y}$};
\node[draw,circle,inner sep=2pt,minimum size=2em,fill=blue!20,line width=0.6pt] (y) at (4,0) {$\seq{y}$};
\draw[-,dashed,thick,black!50] (x.0) -- (p.180);
\draw[-,dashed,thick,black!50] (p.0) -- (y.180);
......
\begin{tikzpicture}
\tikzstyle{rec} = [,line width=0.6pt,draw,rounded corners,minimum height=2.2em,minimum width=4.3em,fill=blue!20]
\node [anchor=center](node1) at (0,0) {源语言};
\node [anchor=west,rec,fill=red!20](node2) at ([xshift=2.0em]node1.east){编码器};
\node [anchor=west,rec](node3) at ([xshift=3.0em,yshift=2.0em]node2.east){解码器};
\node [anchor=west,rec,fill=yellow!20](node4) at ([xshift=3.0em,yshift=-2.0em]node2.east){鉴别器};
\draw [->,thick](node1.east)--(node2.west);
\draw [->,thick](node2.east)--([xshift=1.5em]node2.east)--([xshift=1.5em,yshift=2.0em]node2.east)--(node3.west);
\draw [->,thick](node2.east)--([xshift=1.5em]node2.east)--([xshift=1.5em,yshift=-2.0em]node2.east)--(node4.west);
\node [anchor=west,minimum width=5.0em](node5) at ([xshift=2.0em]node3.east) {目标语言};
\node [anchor=west,minimum width=5.0em](node6) at ([xshift=2.0em]node4.east) {< 领域 >};
\draw [->,thick](node3.east)--(node5.west);
\draw [->,thick](node4.east)--(node6.west);
\end{tikzpicture}
\ No newline at end of file
......@@ -47,38 +47,38 @@
\node [anchor=south](pos2-2) at ([yshift=-0.5em]pos2.north){\scriptsize{词典}};
%circle1
\node[rec,anchor=center,rotate=60,fill=green!30](c1x1) at ([xshift=-7em,yshift=-1.4em]circle1.east){\tiny{1}};
\node[rec,anchor=center,rotate=60,fill=green!30](c1x2) at ([xshift=-4.5em,yshift=1.8em]circle1.east){\tiny{2}};
\node[rec,anchor=center,rotate=60,fill=green!30](c1x3) at ([xshift=-4em,yshift=-0.5em]circle1.east){\tiny{3}};
\node[rec,anchor=center,rotate=60,fill=green!30](c1x4) at ([xshift=-3.5em,yshift=-2.5em]circle1.east){\tiny{4}};
\node[rec,anchor=center,rotate=60,fill=green!30](c1x5) at ([xshift=-2em,yshift=1.0em]circle1.east){\tiny{5}};
\node[rec,anchor=center,rotate=60,fill=green!40](c1x1) at ([xshift=-7em,yshift=-1.4em]circle1.east){\tiny{1}};
\node[rec,anchor=center,rotate=60,fill=green!40](c1x2) at ([xshift=-4.5em,yshift=1.8em]circle1.east){\tiny{2}};
\node[rec,anchor=center,rotate=60,fill=green!40](c1x3) at ([xshift=-4em,yshift=-0.5em]circle1.east){\tiny{3}};
\node[rec,anchor=center,rotate=60,fill=green!40](c1x4) at ([xshift=-3.5em,yshift=-2.5em]circle1.east){\tiny{4}};
\node[rec,anchor=center,rotate=60,fill=green!40](c1x5) at ([xshift=-2em,yshift=1.0em]circle1.east){\tiny{5}};
%circle2
\node[cir,anchor=center,rotate=-30,fill=red!30] (c2a) at ([xshift=-5.3em,yshift=2.15em]circle2.east){\tiny{a}};
\node[cir,anchor=east,rotate=-30,fill=red!30] (c2b) at ([xshift=2.0em,yshift=-1.25em]c2a.east){\tiny{b}};
\node[cir,anchor=east,rotate=-30,fill=red!30] (c2c) at ([xshift=0.8em,yshift=-3.9em]c2a.south){\tiny{c}};
\node[cir,anchor=east,rotate=-30,fill=red!30] (c2x) at ([xshift=-0.3em,yshift=-1.9em]c2a.south){\tiny{x}};
\node[cir,anchor=west,rotate=-30,fill=red!30] (c2y) at ([xshift=1.15em,yshift=-2.85em]c2a.east){\tiny{y}};
\node[cir,anchor=center,rotate=-30,fill=red!40] (c2a) at ([xshift=-5.3em,yshift=2.15em]circle2.east){\tiny{a}};
\node[cir,anchor=east,rotate=-30,fill=red!40] (c2b) at ([xshift=2.0em,yshift=-1.25em]c2a.east){\tiny{b}};
\node[cir,anchor=east,rotate=-30,fill=red!40] (c2c) at ([xshift=0.8em,yshift=-3.9em]c2a.south){\tiny{c}};
\node[cir,anchor=east,rotate=-30,fill=red!40] (c2x) at ([xshift=-0.3em,yshift=-1.9em]c2a.south){\tiny{x}};
\node[cir,anchor=west,rotate=-30,fill=red!40] (c2y) at ([xshift=1.15em,yshift=-2.85em]c2a.east){\tiny{y}};
%circle3
\node[rec,anchor=center,rotate=-30,fill=green!30] (c3x1) at ([xshift=-6.7em,yshift=1.75em]circle3.east){\tiny{1}};
\node[rec,anchor=east,rotate=-30,fill=green!30] (c3x2) at ([xshift=4.7em,yshift=-0.95em]c3x1.east){\tiny{2}};
\node[rec,anchor=east,rotate=-30,fill=green!30] (c3x3) at ([xshift=2.6em,yshift=-2.4em]c3x1.south){\tiny{3}};
\node[rec,anchor=east,rotate=-30,fill=green!30] (c3x4) at ([xshift=0.35em,yshift=-2.7em]c3x1.south){\tiny{4}};
\node[rec,anchor=west,rotate=-30,fill=green!30] (c3x5) at ([xshift=2.35em,yshift=-3.85em]c3x1.east){\tiny{5}};
\node[rec,anchor=center,rotate=-30,fill=green!40] (c3x1) at ([xshift=-6.7em,yshift=1.75em]circle3.east){\tiny{1}};
\node[rec,anchor=east,rotate=-30,fill=green!40] (c3x2) at ([xshift=4.7em,yshift=-0.95em]c3x1.east){\tiny{2}};
\node[rec,anchor=east,rotate=-30,fill=green!40] (c3x3) at ([xshift=2.6em,yshift=-2.4em]c3x1.south){\tiny{3}};
\node[rec,anchor=east,rotate=-30,fill=green!40] (c3x4) at ([xshift=0.35em,yshift=-2.7em]c3x1.south){\tiny{4}};
\node[rec,anchor=west,rotate=-30,fill=green!40] (c3x5) at ([xshift=2.35em,yshift=-3.85em]c3x1.east){\tiny{5}};
%circle4
\node[rec,anchor=center,rotate=-30,fill=green!30] (c4x1) at ([xshift=-6.7em,yshift=1.75em]circle4.east){\tiny{1}};
\node[rec,anchor=east,rotate=-30,fill=green!30] (c4x2) at ([xshift=4.7em,yshift=-0.95em]c4x1.east){\tiny{2}};
\node[rec,anchor=east,rotate=-30,fill=green!30] (c4x3) at ([xshift=2.6em,yshift=-2.4em]c4x1.south){\tiny{3}};
\node[rec,anchor=east,rotate=-30,fill=green!30] (c4x4) at ([xshift=0.35em,yshift=-2.7em]c4x1.south){\tiny{4}};
\node[rec,anchor=west,rotate=-30,fill=green!30] (c4x5) at ([xshift=2.35em,yshift=-3.85em]c4x1.east){\tiny{5}};
\node[rec,anchor=center,rotate=-30,fill=green!40] (c4x1) at ([xshift=-6.7em,yshift=1.75em]circle4.east){\tiny{1}};
\node[rec,anchor=east,rotate=-30,fill=green!40] (c4x2) at ([xshift=4.7em,yshift=-0.95em]c4x1.east){\tiny{2}};
\node[rec,anchor=east,rotate=-30,fill=green!40] (c4x3) at ([xshift=2.6em,yshift=-2.4em]c4x1.south){\tiny{3}};
\node[rec,anchor=east,rotate=-30,fill=green!40] (c4x4) at ([xshift=0.35em,yshift=-2.7em]c4x1.south){\tiny{4}};
\node[rec,anchor=west,rotate=-30,fill=green!40] (c4x5) at ([xshift=2.35em,yshift=-3.85em]c4x1.east){\tiny{5}};
\node[cir,anchor=center,rotate=-30,fill=red!30] (c4a) at ([xshift=-5.3em,yshift=2.15em]circle4.east){\tiny{a}};
\node[cir,anchor=east,rotate=-30,fill=red!30] (c4b) at ([xshift=2.0em,yshift=-1.25em]c4a.east){\tiny{b}};
\node[cir,anchor=east,rotate=-30,fill=red!30] (c4c) at ([xshift=0.8em,yshift=-3.9em]c4a.south){\tiny{c}};
\node[cir,anchor=east,rotate=-30,fill=red!30] (c4x) at ([xshift=-0.3em,yshift=-1.9em]c4a.south){\tiny{x}};
\node[cir,anchor=west,rotate=-30,fill=red!30] (c4y) at ([xshift=1.15em,yshift=-2.85em]c4a.east){\tiny{y}};
\node[cir,anchor=center,rotate=-30,fill=red!40] (c4a) at ([xshift=-5.3em,yshift=2.15em]circle4.east){\tiny{a}};
\node[cir,anchor=east,rotate=-30,fill=red!40] (c4b) at ([xshift=2.0em,yshift=-1.25em]c4a.east){\tiny{b}};
\node[cir,anchor=east,rotate=-30,fill=red!40] (c4c) at ([xshift=0.8em,yshift=-3.9em]c4a.south){\tiny{c}};
\node[cir,anchor=east,rotate=-30,fill=red!40] (c4x) at ([xshift=-0.3em,yshift=-1.9em]c4a.south){\tiny{x}};
\node[cir,anchor=west,rotate=-30,fill=red!40] (c4y) at ([xshift=1.15em,yshift=-2.85em]c4a.east){\tiny{y}};
\draw [color=red,line width=0.7pt,rotate=18] ([xshift=-5.1em,yshift=3.7em]circle4.east) ellipse (1.6em and 0.9em);
\draw [color=red,line width=0.7pt,rotate=-5] ([xshift=-2.8em,yshift=0.6em]circle4.east) ellipse (1.6em and 0.9em);
......
\begin{tikzpicture}
\begin{scope}
\tikzstyle{circle} = [draw,black,line width=0.6pt,inner sep=3.5pt,rounded corners=4pt,minimum width=2em,align=center,fill=blue!20]
\tikzstyle{word} = [inner sep=3.5pt]
\node[circle](center) at (0,0) {
\begin{tabular}{c | c}
$x\rightarrow y$ & $y\rightarrow x$ \\
模型 & 模型
\end{tabular}
};
\node[circle,fill=red!20] (left) at ([xshift=-9em]center.west) {$x\rightarrow y$ \\ 数据};
\node[circle,fill=red!20] (right) at ([xshift=9em]center.east) {$y\rightarrow x$ \\ 数据};
\node[word] (init) at ([yshift=6em]center.north){初始化};
\node[circle,fill=red!20] (down) at ([yshift=-8em]center.south) {$x,y$ \\ 数据};
\draw[->,thick] (init.south) -- ([yshift=0.2em]center.north);
\draw[->,thick] ([yshift=0.2em]down.north) -- ([yshift=-0.2em]center.south) node[pos=0.6,midway,align=left,xshift=-2.5em,yshift=0.5em] {语言模型\\目标函数};
\node [anchor=center] at ([yshift=2.0em,xshift=-2.5em]down.north){(模型优化)};
\draw[->,thick] ([yshift=1pt]left.north) .. controls +(90:2em) and +(90:2em) .. ([yshift=1pt,xshift=-2.2em]center.north) node[above,midway,align=center] {翻译模型目标函数\\(模型优化)};
\draw[->,thick] ([yshift=1pt,xshift=-1.8em]center.north) .. controls +(90:2em) and +(90:2em) .. ([yshift=1pt]right.north) node[above,pos=0.6,align=center] {回译\\(数据优化)};
\draw [->,thick] ([yshift=1pt]right.south) .. controls +(-90:2em) and +(-90:2em) .. ([yshift=1pt,xshift=2.2em]center.south) node[below,midway,align=center] {翻译模型目标函数\\(模型优化)};
\draw [->,thick] ([yshift=1pt,xshift=1.8em]center.south) .. controls +(-90:2em) and +(-90:2em) .. ([yshift=1pt]left.south) node[below,pos=0.6,align=center] {回译\\(数据优化)};
\end{scope}
\end{tikzpicture}
\begin{tikzpicture}
\tikzstyle{circle} = [draw,black,line width=0.6pt,inner sep=3.5pt,rounded corners=4pt,minimum width=2em]
\tikzstyle{word} = [inner sep=3.5pt]
\node [anchor=center] (node1-1) at (0,0) {\small{\seq{x}}};
\node [anchor=west] (node1-2) at ([xshift=0.8em]node1-1.east) {\small{\seq{y}}};
\node [anchor=north] (node1-3) at ([xshift=1.0em]node1-1.south) {\small{翻译模型f}};
\draw [->,line width=0.6pt](node1-1.east)--(node1-2.west);
\begin{pgfonlayer}{background}
{
\node[fill=blue!20,inner sep=0.1em,draw=black,line width=0.6pt,minimum width=5em,drop shadow,rounded corners=2pt] [fit =(node1-1)(node1-2)(node1-3)] (remark1) {};
}
\end{pgfonlayer}
\node[anchor=north,circle,fill=red!20,minimum width=6.8em](node2) at ([xshift=-6.0em,yshift=-2.0em]remark1.south) {源语言句子$\seq{x}$};
\node[anchor=north,circle,fill=red!20,minimum width=6.8em](node2-2) at ([yshift=-0.2em]node2.south) {新生成句子$\seq{x'}$};
\draw [->,thick]([yshift=0.2em]node2.north).. controls (-1.93,-1.5) and (-2.0,-0.2)..([xshift=-0.2em]remark1.west);
\node[anchor=north,circle,fill=red!20](node3) at ([xshift=6.5em,yshift=-2.0em]remark1.south) {目标语言句子$\seq{x}$};
\draw [->,thick]([xshift=0.2em]remark1.east).. controls (2.9,-0.25) and (2.9,-0.7) ..([yshift=0.2em]node3.north);
\node [anchor=north] (node4-1) at ([xshift=-1.0em,yshift=-7.0em]remark1.south) {\small{\seq{y}}};
\node [anchor=west] (node4-2) at ([xshift=0.8em]node4-1.east) {\small{\seq{x}}};
\node [anchor=north] (node4-3) at ([xshift=1.0em]node4-1.south) {\small{翻译模型g}};
\draw [->,line width=0.6pt](node4-1.east)--(node4-2.west);
\begin{pgfonlayer}{background}
{
\node[fill=yellow!20,inner sep=0.1em,draw=black,line width=0.6pt,minimum width=5em,drop shadow,rounded corners=2pt] [fit =(node4-1)(node4-2)(node4-3)] (remark2) {};
}
\end{pgfonlayer}
\draw [->,thick]([xshift=-0.2em]remark2.west).. controls (-0.8,-4.12) and (-1.95,-4.12)..([yshift=-0.2em]node2-2.south);
\draw [->,thick]([yshift=-0.2em]node3.south).. controls (2.9,-3) and (2.9,-4.1)..([xshift=0.2em]remark2.east);
\end{tikzpicture}
\ No newline at end of file
......@@ -32,7 +32,7 @@
%----------------------------------------------------------------------------------------
\sectionnewpage
\section{基于扭曲度的翻译模型}
\section{基于扭曲度的模型}
下面将介绍扭曲度在机器翻译中的定义及使用方法。这也带来了两个新的翻译模型\ \dash\ IBM模型2\upcite{DBLP:journals/coling/BrownPPM94}和HMM翻译模型\upcite{vogel1996hmm}
......@@ -161,7 +161,7 @@
%----------------------------------------------------------------------------------------
\sectionnewpage
\section{基于繁衍率的翻译模型}
\section{基于繁衍率的模型}
下面介绍翻译中的一对多问题,以及这个问题所带来的句子长度预测问题。
......
......@@ -8836,7 +8836,6 @@ author = {Zhuang Liu and
publisher = {Annual Meeting of the Association for Computational Linguistics},
year = {2017}
}
@inproceedings{DBLP:conf/emnlp/EdunovOAG18,
author = {Sergey Edunov and
Myle Ott and
......@@ -8959,15 +8958,6 @@ author = {Zhuang Liu and
volume = {abs/1706.05098},
year = {2017}
}
@inproceedings{DBLP:conf/emnlp/DomhanH17,
author = {Tobias Domhan and
Felix Hieber},
title = {Using Target-side Monolingual Data for Neural Machine Translation
through Multi-task Learning},
pages = {1500--1505},
publisher = {Annual Meeting of the Association for Computational Linguistics},
year = {2017}
}
@inproceedings{DBLP:conf/icml/XiaQCBYL17,
author = {Yingce Xia and
Tao Qin and
......@@ -9014,13 +9004,6 @@ author = {Zhuang Liu and
publisher = {The {MIT} Press},
year = {1999}
}
@inproceedings{lample2019cross,
author = {Alexis Conneau and
Guillaume Lample},
title = {Cross-lingual Language Model Pretraining},
pages = {7057--7067},
year = {2019}
}
@inproceedings{DBLP:conf/aclnmt/HoangKHC18,
author = {Cong Duy Vu Hoang and
Philipp Koehn and
......@@ -9042,15 +9025,6 @@ author = {Zhuang Liu and
publisher = {{PMLR}},
year = {2018}
}
@inproceedings{DBLP:conf/acl/FadaeeBM17a,
author = {Marzieh Fadaee and
Arianna Bisazza and
Christof Monz},
title = {Data Augmentation for Low-Resource Neural Machine Translation},
pages = {567--573},
publisher = {Annual Meeting of the Association for Computational Linguistics},
year = {2017}
}
@inproceedings{finding2006adafre,
author = {S. F. Adafre and Maarten de Rijke},
title = {Finding Similar Sentences across Multiple Languages in Wikipedia },
......@@ -9074,24 +9048,6 @@ author = {Zhuang Liu and
pages = {477--504},
year = {2005}
}
@inproceedings{DBLP:conf/naacl/SmithQT10,
author = {Jason R. Smith and
Chris Quirk and
Kristina Toutanova},
title = {Extracting Parallel Sentences from Comparable Corpora using Document
Level Alignment},
pages = {403--411},
publisher = {Annual Meeting of the Association for Computational Linguistics},
year = {2010}
}
@inproceedings{DBLP:conf/emnlp/ZhangZ16,
author = {Jiajun Zhang and
Chengqing Zong},
title = {Exploiting Source-side Monolingual Data in Neural Machine Translation},
pages = {1535--1545},
publisher = {Annual Meeting of the Association for Computational Linguistics},
year = {2016}
}
@inproceedings{DBLP:conf/acl/XiaKAN19,
author = {Mengzhou Xia and
Xiang Kong and
......@@ -9102,17 +9058,6 @@ author = {Zhuang Liu and
publisher = {Annual Meeting of the Association for Computational Linguistics},
year = {2019}
}
@inproceedings{DBLP:conf/emnlp/WangPDN18,
author = {Xinyi Wang and
Hieu Pham and
Zihang Dai and
Graham Neubig},
title = {SwitchOut: an Efficient Data Augmentation Algorithm for Neural Machine
Translation},
pages = {856--861},
publisher = {Annual Meeting of the Association for Computational Linguistics},
year = {2018}
}
@inproceedings{DBLP:conf/acl/GaoZWXQCZL19,
author = {Fei Gao and
Jinhua Zhu and
......@@ -9127,17 +9072,6 @@ author = {Zhuang Liu and
publisher = {Annual Meeting of the Association for Computational Linguistics},
year = {2019}
}
@inproceedings{DBLP:conf/emnlp/WangLWLS19,
author = {Shuo Wang and
Yang Liu and
Chao Wang and
Huanbo Luan and
Maosong Sun},
title = {Improving Back-Translation with Uncertainty-based Confidence Estimation},
pages = {791--802},
publisher = {Annual Meeting of the Association for Computational Linguistics},
year = {2019}
}
@inproceedings{DBLP:conf/emnlp/WuWXQLL19,
author = {Lijun Wu and
Yiren Wang and
......@@ -9176,7 +9110,6 @@ author = {Zhuang Liu and
journal = {Computer Science},
year = {2015},
}
@phdthesis{黄书剑0统计机器翻译中的词对齐研究,
title={统计机器翻译中的词对齐研究},
author={黄书剑},
......@@ -9199,16 +9132,6 @@ author = {Zhuang Liu and
publisher = {Annual Meeting of the Association for Computational Linguistics},
year = {2016}
}
@inproceedings{DBLP:conf/iclr/SmithTHH17,
author = {Samuel L. Smith and
David H. P. Turban and
Steven Hamblin and
Nils Y. Hammerla},
title = {Offline bilingual word vectors, orthogonal transformations and the
inverted softmax},
publisher = {International Conference on Learning Representations},
year = {2017}
}
@inproceedings{DBLP:conf/acl/ArtetxeLA17,
author = {Mikel Artetxe and
Gorka Labaka and
......@@ -9227,7 +9150,6 @@ author = {Zhuang Liu and
pages={1-10},
year={1966},
}
@inproceedings{DBLP:conf/iclr/LampleCRDJ18,
author = {Guillaume Lample and
Alexis Conneau and
......@@ -9248,16 +9170,6 @@ author = {Zhuang Liu and
publisher = {Annual Meeting of the Association for Computational Linguistics},
year = {2017}
}
@inproceedings{DBLP:conf/emnlp/XuYOW18,
author = {Ruochen Xu and
Yiming Yang and
Naoki Otani and
Yuexin Wu},
title = {Unsupervised Cross-lingual Transfer of Word Embedding Spaces},
pages = {2465--2474},
publisher = {Annual Meeting of the Association for Computational Linguistics},
year = {2018}
}
@inproceedings{DBLP:conf/emnlp/Alvarez-MelisJ18,
author = {David Alvarez-Melis and
Tommi S. Jaakkola},
......@@ -9310,15 +9222,6 @@ author = {Zhuang Liu and
publisher = {Annual Meeting of the Association for Computational Linguistics},
year = {2019}
}
@inproceedings{DBLP:conf/acl/SogaardVR18,
author = {Anders S{\o}gaard and
Sebastian Ruder and
Ivan Vulic},
title = {On the Limitations of Unsupervised Bilingual Dictionary Induction},
pages = {778--788},
publisher = {Annual Meeting of the Association for Computational Linguistics},
year = {2018}
}
@article{DBLP:journals/talip/MarieF20,
author = {Benjamin Marie and
Atsushi Fujita},
......@@ -9351,15 +9254,6 @@ author = {Zhuang Liu and
publisher = {Annual Meeting of the Association for Computational Linguistics},
year = {2019}
}
@inproceedings{DBLP:conf/iclr/LampleCDR18,
author = {Guillaume Lample and
Alexis Conneau and
Ludovic Denoyer and
Marc'Aurelio Ranzato},
title = {Unsupervised Machine Translation Using Monolingual Corpora Only},
publisher = {International Conference on Learning Representations},
year = {2018}
}
@inproceedings{DBLP:conf/nips/ConneauL19,
author = {Alexis Conneau and
Guillaume Lample},
......@@ -9388,7 +9282,6 @@ author = {Zhuang Liu and
publisher={International Conference on Computational Linguistics},
year={2020}
}
@inproceedings{2018When,
title={When and Why are Pre-trained Word Embeddings Useful for Neural Machine Translation?},
author={ Qi, Ye and Sachan, Devendra Singh and Felix, Matthieu and Padmanabhan, Sarguna Janani and Neubig, Graham },
......@@ -9404,16 +9297,6 @@ author = {Zhuang Liu and
publisher = {Annual Meeting of the Association for Computational Linguistics},
year = {2019}
}
@inproceedings{DBLP:conf/emnlp/ImamuraS19,
author = {Kenji Imamura and
Eiichiro Sumita},
title = {Recycling a Pre-trained {BERT} Encoder for Neural Machine Translation},
booktitle = {Proceedings of the 3rd Workshop on Neural Generation and Translation@EMNLP-IJCNLP
2019, Hong Kong, November 4, 2019},
pages = {23--31},
publisher = {Annual Meeting of the Association for Computational Linguistics},
year = {2019}
}
@inproceedings{DBLP:conf/aaai/YangW0Z00020,
author = {Jiacheng Yang and
Mingxuan Wang and
......@@ -9538,7 +9421,6 @@ author = {Zhuang Liu and
publisher = {Annual Meeting of the Association for Computational Linguistics},
year = {2019}
}
@article{DBLP:journals/corr/abs-1811-01124,
author = {Jean Alaux and
Edouard Grave and
......@@ -9549,16 +9431,6 @@ author = {Zhuang Liu and
volume = {abs/1811.01124},
year = {2018}
}
@inproceedings{DBLP:conf/emnlp/XuYOW18,
author = {Ruochen Xu and
Yiming Yang and
Naoki Otani and
Yuexin Wu},
title = {Unsupervised Cross-lingual Transfer of Word Embedding Spaces},
pages = {2465--2474},
publisher = {Annual Meeting of the Association for Computational Linguistics},
year = {2018}
}
@inproceedings{DBLP:conf/emnlp/DouZH18,
author = {Zi-Yi Dou and
Zhi-Hao Zhou and
......@@ -9595,18 +9467,6 @@ author = {Zhuang Liu and
publisher = {Annual Meeting of the Association for Computational Linguistics},
year = {2018}
}
@inproceedings{DBLP:conf/emnlp/JoulinBMJG18,
author = {Armand Joulin and
Piotr Bojanowski and
Tomas Mikolov and
Herv{\'{e}} J{\'{e}}gou and
Edouard Grave},
title = {Loss in Translation: Learning Bilingual Word Mapping with a Retrieval
Criterion},
pages = {2979--2984},
publisher = {Annual Meeting of the Association for Computational Linguistics},
year = {2018}
}
@inproceedings{DBLP:conf/emnlp/ChenC18,
author = {Xilun Chen and
Claire Cardie},
......@@ -9615,15 +9475,6 @@ author = {Zhuang Liu and
publisher = {Annual Meeting of the Association for Computational Linguistics},
year = {2018}
}
@inproceedings{DBLP:conf/naacl/MohiuddinJ19,
author = {Tasnim Mohiuddin and
Shafiq R. Joty},
title = {Revisiting Adversarial Autoencoder for Unsupervised Word Translation
with Cycle Consistency and Improved Training},
pages = {3857--3867},
publisher = {Annual Meeting of the Association for Computational Linguistics},
year = {2019}
}
@inproceedings{DBLP:conf/emnlp/TaitelbaumCG19,
author = {Hagai Taitelbaum and
Gal Chechik and
......@@ -9675,7 +9526,6 @@ author = {Zhuang Liu and
publisher = {Annual Meeting of the Association for Computational Linguistics},
year = {2020}
}
@article{hartmann2018empirical,
title={Empirical observations on the instability of aligning word vector spaces with GANs},
author={Hartmann, Mareike and Kementchedjhieva, Yova and S{\o}gaard, Anders},
......@@ -9699,7 +9549,6 @@ author = {Zhuang Liu and
pages = {6031--6041},
year = {2019}
}
@inproceedings{DBLP:conf/emnlp/HartmannKS18,
author = {Mareike Hartmann and
Yova Kementchedjhieva and
......@@ -9710,17 +9559,6 @@ author = {Zhuang Liu and
publisher = {Annual Meeting of the Association for Computational Linguistics},
year = {2018}
}
@inproceedings{DBLP:conf/emnlp/VulicGRK19,
author = {Ivan Vulic and
Goran Glavas and
Roi Reichart and
Anna Korhonen},
title = {Do We Really Need Fully Unsupervised Cross-Lingual Embeddings?},
pages = {4406--4417},
publisher = {Annual Meeting of the Association for Computational Linguistics},
year = {2019}
}
@inproceedings{DBLP:conf/emnlp/JoulinBMJG18,
author = {Armand Joulin and
Piotr Bojanowski and
......@@ -9766,36 +9604,6 @@ author = {Zhuang Liu and
publisher = {Annual Conference of the North American Chapter of the Association for Computational Linguistics},
year = {2016}
}
@inproceedings{DBLP:conf/naacl/FiratCB16,
author = {Orhan Firat and
Kyunghyun Cho and
Yoshua Bengio},
title = {Multi-Way, Multilingual Neural Machine Translation with a Shared Attention
Mechanism},
pages = {866--875},
publisher = {Annual Meeting of the Association for Computational Linguistics},
year = {2016}
}
@article{DBLP:journals/tacl/JohnsonSLKWCTVW17,
author = {Melvin Johnson and
Mike Schuster and
Quoc V. Le and
Maxim Krikun and
Yonghui Wu and
Zhifeng Chen and
Nikhil Thorat and
Fernanda B. Vi{\'{e}}gas and
Martin Wattenberg and
Greg Corrado and
Macduff Hughes and
Jeffrey Dean},
title = {Google's Multilingual Neural Machine Translation System: Enabling
Zero-Shot Translation},
journal = {Trans. Assoc. Comput. Linguistics},
volume = {5},
pages = {339--351},
year = {2017}
}
@inproceedings{DBLP:conf/emnlp/KimPPKN19,
author = {Yunsu Kim and
Petre Petrov and
......@@ -9877,16 +9685,6 @@ author = {Zhuang Liu and
publisher = {Annual Meeting of the Association for Computational Linguistics},
year = {2007}
}
@article{DBLP:journals/mt/WuW07,
author = {Hua Wu and
Haifeng Wang},
title = {Pivot language approach for phrase-based statistical machine translation},
journal = {Mach. Transl.},
volume = {21},
number = {3},
pages = {165--181},
year = {2007}
}
@inproceedings{DBLP:conf/acl/WuW09,
author = {Hua Wu and
Haifeng Wang},
......@@ -9987,17 +9785,6 @@ author = {Zhuang Liu and
publisher = {Annual Meeting of the Association for Computational Linguistics},
year = {2015}
}
@article{DBLP:journals/tacl/LeeCH17,
author = {Jason Lee and
Kyunghyun Cho and
Thomas Hofmann},
title = {Fully Character-Level Neural Machine Translation without Explicit
Segmentation},
journal = {Trans. Assoc. Comput. Linguistics},
volume = {5},
pages = {365--378},
year = {2017}
}
@inproceedings{DBLP:conf/lrec/RiktersPK18,
author = {Matiss Rikters and
Marcis Pinnis and
......@@ -10017,26 +9804,6 @@ author = {Zhuang Liu and
pages = {1345--1359},
year = {2010}
}
@article{DBLP:journals/tacl/JohnsonSLKWCTVW17,
author = {Melvin Johnson and
Mike Schuster and
Quoc V. Le and
Maxim Krikun and
Yonghui Wu and
Zhifeng Chen and
Nikhil Thorat and
Fernanda B. Vi{\'{e}}gas and
Martin Wattenberg and
Greg Corrado and
Macduff Hughes and
Jeffrey Dean},
title = {Google's Multilingual Neural Machine Translation System: Enabling
Zero-Shot Translation},
journal = {Trans. Assoc. Comput. Linguistics},
volume = {5},
pages = {339--351},
year = {2017}
}
@book{2009Handbook,
title={Handbook Of Research On Machine Learning Applications and Trends: Algorithms, Methods and Techniques - 2 Volumes},
author={ Olivas, Emilio Soria and Guerrero, Jose David Martin and Sober, Marcelino Martinez and Benedito, Jose Rafael Magdalena and Lopez, Antonio Jose Serrano },
......@@ -10122,35 +9889,6 @@ author = {Zhuang Liu and
pages={1--38},
year={2020}
}
@inproceedings{DBLP:conf/emnlp/VulicGRK19,
author = {Ivan Vulic and
Goran Glavas and
Roi Reichart and
Anna Korhonen},
title = {Do We Really Need Fully Unsupervised Cross-Lingual Embeddings?},
pages = {4406--4417},
publisher = {Annual Meeting of the Association for Computational Linguistics},
year = {2019}
}
@article{DBLP:journals/corr/MikolovLS13,
author = {Tomas Mikolov and
Quoc V. Le and
Ilya Sutskever},
title = {Exploiting Similarities among Languages for Machine Translation},
journal = {CoRR},
volume = {abs/1309.4168},
year = {2013}
}
@inproceedings{DBLP:conf/emnlp/XuYOW18,
author = {Ruochen Xu and
Yiming Yang and
......@@ -10161,17 +9899,6 @@ author = {Zhuang Liu and
publisher = {Annual Meeting of the Association for Computational Linguistics},
year = {2018}
}
@inproceedings{DBLP:conf/iclr/LampleCRDJ18,
author = {Guillaume Lample and
Alexis Conneau and
Marc'Aurelio Ranzato and
Ludovic Denoyer and
Herv{\'{e}} J{\'{e}}gou},
title = {Word translation without parallel data},
publisher = {International Conference on Learning Representations},
year = {2018}
}
@inproceedings{DBLP:conf/emnlp/ZhangLLS17,
author = {Meng Zhang and
Yang Liu and
......@@ -10183,17 +9910,6 @@ author = {Zhuang Liu and
publisher = {Conference on Empirical Methods in Natural Language Processing},
year = {2017}
}
@inproceedings{DBLP:conf/naacl/MohiuddinJ19,
author = {Tasnim Mohiuddin and
Shafiq R. Joty},
title = {Revisiting Adversarial Autoencoder for Unsupervised Word Translation
with Cycle Consistency and Improved Training},
pages = {3857--3867},
publisher = {Annual Meeting of the Association for Computational Linguistics},
year = {2019}
}
@inproceedings{DBLP:conf/emnlp/ArtetxeLA18,
author = {Mikel Artetxe and
Gorka Labaka and
......@@ -10203,7 +9919,6 @@ author = {Zhuang Liu and
publisher = {Conference on Empirical Methods in Natural Language Processing},
year = {2018}
}
@article{DBLP:journals/tacl/LeeCH17,
author = {Jason Lee and
Kyunghyun Cho and
......@@ -10231,29 +9946,9 @@ author = {Zhuang Liu and
Alexander H. Waibel},
title = {Toward Multilingual Neural Machine Translation with Universal Encoder
and Decoder},
journal = {CoRR},
volume = {abs/1611.04798},
year = {2016}
}
@article{DBLP:journals/tacl/JohnsonSLKWCTVW17,
author = {Melvin Johnson and
Mike Schuster and
Quoc V. Le and
Maxim Krikun and
Yonghui Wu and
Zhifeng Chen and
Nikhil Thorat and
Fernanda B. Vi{\'{e}}gas and
Martin Wattenberg and
Greg Corrado and
Macduff Hughes and
Jeffrey Dean},
title = {Google's Multilingual Neural Machine Translation System: Enabling
Zero-Shot Translation},
journal = {Transactions of the Association for Computational Linguistics},
volume = {5},
pages = {339--351},
year = {2017}
journal = {CoRR},
volume = {abs/1611.04798},
year = {2016}
}
@inproceedings{DBLP:conf/coling/BlackwoodBW18,
author = {Graeme W. Blackwood and
......@@ -10318,13 +10013,6 @@ author = {Zhuang Liu and
publisher = {Conference on Empirical Methods in Natural Language Processing},
year = {2019}
}
@inproceedings{2019Consistency,
title={Consistency by Agreement in Zero-Shot Neural Machine Translation},
author={Al-Shedivat, Maruan and Parikh, Ankur},
publisher={Annual Conference of the North American Chapter of the Association for Computational Linguistics},
year={2019},
}
@article{DBLP:journals/corr/abs-1903-07091,
author = {Naveen Arivazhagan and
Ankur Bapna and
......@@ -10421,15 +10109,6 @@ author = {Zhuang Liu and
publisher = {Annual Meeting of the Association for Computational Linguistics},
year = {2009}
}
@inproceedings{DBLP:conf/eacl/LapataSM17,
author = {Jonathan Mallinson and
Rico Sennrich and
Mirella Lapata},
title = {Paraphrasing Revisited with Neural Machine Translation},
pages = {881--893},
publisher = {European Association of Computational Linguistics},
year = {2017}
}
@inproceedings{DBLP:conf/aclnmt/ImamuraFS18,
author = {Kenji Imamura and
Atsushi Fujita and
......@@ -10451,21 +10130,6 @@ author = {Zhuang Liu and
pages = {1096--1103},
publisher = {International Conference on Machine Learning}
}
@article{DBLP:journals/ipm/FarhanTAJATT20,
author = {Wael Farhan and
Bashar Talafha and
Analle Abuammar and
Ruba Jaikat and
Mahmoud Al-Ayyoub and
Ahmad Bisher Tarakji and
Anas Toma},
title = {Unsupervised dialectal neural machine translation},
journal = {Inform Process Manag},
volume = {57},
number = {3},
pages = {102181},
year = {2020}
}
@inproceedings{DBLP:conf/iclr/LampleCDR18,
author = {Guillaume Lample and
Alexis Conneau and
......@@ -10521,13 +10185,6 @@ author = {Zhuang Liu and
publisher = {European Association of Computational Linguistics},
year = {2017}
}
@inproceedings{yasuda2008method,
title={Method for building sentence-aligned corpus from wikipedia},
author={Yasuda, Keiji and Sumita, Eiichiro},
publisher={2008 AAAI Workshop on Wikipedia and Artificial Intelligence},
pages={263--268},
year={2008}
}
@article{2005Improving,
title={Improving Machine Translation Performance by Exploiting Non-Parallel Corpora},
author={Munteanu, Dragos Stefan and Marcu, Daniel},
......@@ -10698,54 +10355,6 @@ author = {Zhuang Liu and
publisher = {Annual Meeting of the Association for Computational Linguistics},
year = {2017}
}
@inproceedings{DBLP:conf/naacl/PetersNIGCLZ18,
author = {Matthew E. Peters and
Mark Neumann and
Mohit Iyyer and
Matt Gardner and
Christopher Clark and
Kenton Lee and
Luke Zettlemoyer},
title = {Deep Contextualized Word Representations},
pages = {2227--2237},
publisher = {Annual Conference of the North American Chapter of the Association for Computational Linguistics},
year = {2018}
}
@inproceedings{DBLP:conf/emnlp/ClinchantJN19,
author = {St{\'{e}}phane Clinchant and
Kweon Woo Jung and
Vassilina Nikoulina},
title = {On the use of {BERT} for Neural Machine Translation},
pages = {108--117},
publisher = {Annual Meeting of the Association for Computational Linguistics},
year = {2019}
}
@inproceedings{DBLP:conf/emnlp/ImamuraS19,
author = {Kenji Imamura and
Eiichiro Sumita},
......@@ -10773,34 +10382,6 @@ author = {Zhuang Liu and
volume = {abs/1908.06259},
year = {2019}
}
@inproceedings{DBLP:conf/aaai/YangW0Z00020,
author = {Jiacheng Yang and
Mingxuan Wang and
Hao Zhou and
Chengqi Zhao and
Weinan Zhang and
Yong Yu and
Lei Li},
title = {Towards Making the Most of {BERT} in Neural Machine Translation},
pages = {9378--9385},
publisher = {AAAI Conference on Artificial Intelligence},
year = {2020}
}
@inproceedings{DBLP:conf/acl/LewisLGGMLSZ20,
author = {Mike Lewis and
Yinhan Liu and
Naman Goyal and
Marjan Ghazvininejad and
Abdelrahman Mohamed and
Omer Levy and
Veselin Stoyanov and
Luke Zettlemoyer},
title = {{BART:} Denoising Sequence-to-Sequence Pre-training for Natural Language
Generation, Translation, and Comprehension},
pages = {7871--7880},
publisher = {Annual Meeting of the Association for Computational Linguistics},
year = {2020}
}
@inproceedings{DBLP:conf/emnlp/QiYGLDCZ020,
author = {Weizhen Qi and
Yu Yan and
......@@ -10941,13 +10522,6 @@ author = {Zhuang Liu and
publisher = {Annual Meeting of the Association for Computational Linguistics},
year = {2013}
}
@article{joty2015using,
title={Using joint models for domain adaptation in statistical machine translation},
author={Durrani, Nadir and Sajjad, Hassan and Joty, Shafiq and Abdelali, Ahmed and Vogel, Stephan},
journal={Proceedings of MT Summit XV},
pages={117},
year={2015}
}
@article{imamura2016multi,
title={Multi-domain adaptation for statistical machine translation based on feature augmentation},
author={Imamura, Kenji and Sumita, Eiichiro},
......@@ -11025,17 +10599,6 @@ author = {Zhuang Liu and
publisher = {Annual Meeting of the Association for Computational Linguistics},
year = {2010}
}
@inproceedings{DBLP:conf/acl/DuhNST13,
author = {Kevin Duh and
Graham Neubig and
Katsuhito Sudoh and
Hajime Tsukada},
title = {Adaptation Data Selection using Neural Language Models: Experiments
in Machine Translation},
pages = {678--683},
publisher = {Annual Meeting of the Association for Computational Linguistics},
year = {2013}
}
@inproceedings{DBLP:conf/coling/HoangS14,
author = {Cuong Hoang and
Khalil Sima'an},
......@@ -11110,33 +10673,6 @@ author = {Zhuang Liu and
publisher = {Annual Meeting of the Association for Computational Linguistics},
year = {2012}
}
@inproceedings{DBLP:conf/wmt/FosterK07,
author = {George F. Foster and
Roland Kuhn},
title = {Mixture-Model Adaptation for {SMT}},
pages = {128--135},
publisher = {Annual Meeting of the Association for Computational Linguistics},
year = {2007}
}
@inproceedings{DBLP:conf/emnlp/MatsoukasRZ09,
author = {Spyros Matsoukas and
Antti-Veikko I. Rosti and
Bing Zhang},
title = {Discriminative Corpus Weight Estimation for Machine Translation},
pages = {708--717},
publisher = {Conference on Empirical Methods in Natural Language Processing},
year = {2009}
}
@inproceedings{DBLP:conf/emnlp/FosterGK10,
author = {George F. Foster and
Cyril Goutte and
Roland Kuhn},
title = {Discriminative Instance Weighting for Domain Adaptation in Statistical
Machine Translation},
pages = {451--459},
publisher = {Annual Meeting of the Association for Computational Linguistics},
year = {2010}
}
@inproceedings{DBLP:conf/wmt/ShahBS10,
author = {Kashif Shah and
Lo{\"{\i}}c Barrault and
......@@ -11152,24 +10688,6 @@ author = {Zhuang Liu and
publisher={International Workshop on Spoken Language Translation},
year={2011}
}
@inproceedings{DBLP:conf/lrec/EckVW04,
author = {Matthias Eck and
Stephan Vogel and
Alex Waibel},
title = {Language Model Adaptation for Statistical Machine Translation Based
on Information Retrieval},
publisher = {European Language Resources Association},
year = {2004}
}
@inproceedings{DBLP:conf/coling/ZhaoEV04,
author = {Bing Zhao and
Matthias Eck and
Stephan Vogel},
title = {Language Model Adaptation for Statistical Machine Translation via
Structured Query Models},
publisher = {International Conference on Computational Linguistics},
year = {2004}
}
@article{moore2010intelligent,
title = {Intelligent selection of language model training data},
author = {Moore, Robert C and Lewis, Will},
......@@ -11311,12 +10829,6 @@ author = {Zhuang Liu and
publisher = {Annual Meeting of the Association for Computational Linguistics},
year = {2019}
}
@inproceedings{2019Non,
title={Non-Parametric Adaptation for Neural Machine Translation},
author={Bapna, Ankur and Firat, Orhan},
booktitle={Annual Conference of the North American Chapter of the Association for Computational Linguistics},
year={2019},
}
@inproceedings{britz2017effective,
title={Effective domain mixing for neural machine translation},
author={Britz, Denny and Le, Quoc and Pryzant, Reid},
......@@ -11472,17 +10984,6 @@ author = {Zhuang Liu and
publisher = {Annual Conference of the North American Chapter of the Association for Computational Linguistics},
year = {2019}
}
@article{DBLP:journals/corr/abs-1906-03129,
author = {Shen Yan and
Leonard Dahlmann and
Pavel Petrushkov and
Sanjika Hewavitharana and
Shahram Khadivi},
title = {Word-based Domain Adaptation for Neural Machine Translation},
journal = {CoRR},
volume = {abs/1906.03129},
year = {2019}
}
@inproceedings{DBLP:conf/emnlp/WeesBM17,
author = {Marlies van der Wees and
Arianna Bisazza and
......@@ -11514,15 +11015,6 @@ author = {Zhuang Liu and
publisher = {Annual Meeting of the Association for Computational Linguistics},
year = {2017}
}
@inproceedings{DBLP:conf/emnlp/DomhanH17,
author = {Tobias Domhan and
Felix Hieber},
title = {Using Target-side Monolingual Data for Neural Machine Translation
through Multi-task Learning},
pages = {1500--1505},
publisher = {Conference on Empirical Methods in Natural Language Processing},
year = {2017}
}
@inproceedings{DBLP:conf/naacl/BapnaF19,
author = {Ankur Bapna and
Orhan Firat},
......@@ -11531,8 +11023,6 @@ author = {Zhuang Liu and
publisher = {Annual Conference of the North American Chapter of the Association for Computational Linguistics},
year = {2019}
}
@article{DBLP:journals/corr/abs-2010-11125,
author = {Angela Fan and
Shruti Bhosale and
......@@ -11570,7 +11060,6 @@ author = {Zhuang Liu and
publisher = {Conference on Empirical Methods in Natural Language Processing},
year = {2020}
}
@inproceedings{DBLP:conf/emnlp/ZhuH07,
author = {Jingbo Zhu and
Eduard H. Hovy},
......@@ -11604,8 +11093,6 @@ author = {Zhuang Liu and
publisher = {AAAI Conference on Artificial Intelligence},
year = {2018}
}
@inproceedings{DBLP:conf/wmt/SunJXHWW19,
author = {Meng Sun and
Bojian Jiang and
......@@ -11618,8 +11105,6 @@ author = {Zhuang Liu and
publisher = {Annual Meeting of the Association for Computational Linguistics},
year = {2019}
}
@inproceedings{DBLP:conf/acl/SuHC19,
author = {Shang-Yu Su and
Chao-Wei Huang and
......@@ -11629,8 +11114,6 @@ author = {Zhuang Liu and
publisher = {Annual Meeting of the Association for Computational Linguistics},
year = {2019}
}
@article{DBLP:journals/ejasmp/RadzikowskiNWY19,
author = {Kacper Radzikowski and
Robert Nowak and
......@@ -11670,6 +11153,155 @@ author = {Zhuang Liu and
pages = {170248--170260},
year = {2020}
}
@inproceedings{DBLP:conf/acl/MarieRF20,
author = {Benjamin Marie and
Raphael Rubino and
Atsushi Fujita},
title = {Tagged Back-translation Revisited: Why Does It Really Work?},
pages = {5990--5997},
publisher = {Annual Meeting of the Association for Computational Linguistics},
year = {2020}
}
@inproceedings{DBLP:conf/nips/YangDYCSL19,
author = {Zhilin Yang and
Zihang Dai and
Yiming Yang and
Jaime G. Carbonell and
Ruslan Salakhutdinov and
Quoc V. Le},
title = {XLNet: Generalized Autoregressive Pretraining for Language Understanding},
pages = {5754--5764},
year = {2019}
}
@article{lewis2019bart,
title={{BART:} Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension},
author={Lewis, Mike and Liu, Yinhan and Goyal, Naman and Ghazvininejad, Marjan and Mohamed, Abdelrahman and Levy, Omer and Stoyanov, Ves and Zettlemoyer, Luke},
journal={arXiv preprint arXiv:1910.13461},
year={2019}
}
@inproceedings{DBLP:conf/iclr/LanCGGSS20,
author = {Zhenzhong Lan and
Mingda Chen and
Sebastian Goodman and
Kevin Gimpel and
Piyush Sharma and
Radu Soricut},
title = {{ALBERT:} {A} Lite {BERT} for Self-supervised Learning of Language
Representations},
publisher = {International Conference on Learning Representations},
year = {2020}
}
@inproceedings{DBLP:conf/acl/ZhangHLJSL19,
author = {Zhengyan Zhang and
Xu Han and
Zhiyuan Liu and
Xin Jiang and
Maosong Sun and
Qun Liu},
title = {{ERNIE:} Enhanced Language Representation with Informative Entities},
pages = {1441--1451},
publisher = {Annual Meeting of the Association for Computational Linguistics},
year = {2019}
}
@inproceedings{DBLP:conf/emnlp/HuangLDGSJZ19,
author = {Haoyang Huang and
Yaobo Liang and
Nan Duan and
Ming Gong and
Linjun Shou and
Daxin Jiang and
Ming Zhou},
title = {Unicoder: {A} Universal Language Encoder by Pre-training with Multiple
Cross-lingual Tasks},
pages = {2485--2494},
publisher = {Conference on Empirical Methods in Natural Language Processing},
year = {2019}
}
@inproceedings{DBLP:conf/iccv/SunMV0S19,
author = {Chen Sun and
Austin Myers and
Carl Vondrick and
Kevin Murphy and
Cordelia Schmid},
title = {VideoBERT: {A} Joint Model for Video and Language Representation Learning},
pages = {7463--7472},
publisher = {International Conference on Computer Vision},
year = {2019}
}
@article{DBLP:journals/corr/abs-2010-12831,
author = {Liunian Harold Li and
Haoxuan You and
Zhecan Wang and
Alireza Zareian and
Shih-Fu Chang and
Kai-Wei Chang},
title = {Weakly-supervised VisualBERT: Pre-training without Parallel Images
and Captions},
journal = {CoRR},
volume = {abs/2010.12831},
year = {2020}
}
@inproceedings{DBLP:conf/nips/LuBPL19,
author = {Jiasen Lu and
Dhruv Batra and
Devi Parikh and
Stefan Lee},
title = {ViLBERT: Pretraining Task-Agnostic Visiolinguistic Representations
for Vision-and-Language Tasks},
publisher = {Annual Conference and Workshop on Neural Information Processing Systems},
pages = {13--23},
year = {2019}
}
@inproceedings{DBLP:conf/interspeech/ChuangLLL20,
author = {Yung-Sung Chuang and
Chi-Liang Liu and
Hung-yi Lee and
Lin-Shan Lee},
title = {SpeechBERT: An Audio-and-Text Jointly Learned Language Model for End-to-End
Spoken Question Answering},
pages = {4168--4172},
publisher = {Annual Conference of the International Speech Communication Association},
year = {2020}
}
@inproceedings{DBLP:conf/rep4nlp/PetersRS19,
author = {Matthew E. Peters and
Sebastian Ruder and
Noah A. Smith},
title = {To Tune or Not to Tune? Adapting Pretrained Representations to Diverse
Tasks},
pages = {7--14},
publisher = {Annual Meeting of the Association for Computational Linguistics},
year = {2019}
}
@inproceedings{DBLP:conf/cncl/SunQXH19,
author = {Chi Sun and
Xipeng Qiu and
Yige Xu and
Xuanjing Huang},
title = {How to Fine-Tune {BERT} for Text Classification?},
volume = {11856},
pages = {194--206},
publisher = {Springer},
year = {2019}
}
@inproceedings{shen2020q,
title={Q-BERT: Hessian Based Ultra Low Precision Quantization of BERT},
author={Shen, Sheng and Dong, Zhen and Ye, Jiayu and Ma, Linjian and Yao, Zhewei and Gholami, Amir and Mahoney, Michael W and Keutzer, Kurt},
booktitle={AAAI Conference on Artificial Intelligence},
pages={8815--8821},
year={2020}
}
@article{DBLP:journals/corr/abs-1910-01108,
author = {Victor Sanh and
Lysandre Debut and
Julien Chaumond and
Thomas Wolf},
title = {DistilBERT, a distilled version of {BERT:} smaller, faster, cheaper
and lighter},
journal = {CoRR},
volume = {abs/1910.01108},
year = {2019}
}
%%%%% chapter 16------------------------------------------------------
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
......