Commit 68afcd15 by zengxin

Merge branch 'caorunzhe' into 'zengxin'

Caorunzhe

View merge request !585
parents 35befc90 0bdb74d8
......@@ -108,13 +108,13 @@
\parinterval Both of the above inference strategies are used in neural machine translation. For a source-language sentence $\seq{x}=\{x_1,x_2,\dots,x_m\}$ and a target-language sentence $\seq{y}=\{y_1,y_2,\dots,y_n\}$, the left-to-right formulation factorizes the translation probability $\funp{P}(\seq{y}\vert\seq{x})$ as in Equation \eqref{eq:14-1}:
\begin{eqnarray}
\funp{P}(\seq{y}\vert\seq{x})=\prod_{j=1}^n \funp{P}(y_j\vert\seq{y}_{<j},\seq{x})
\funp{P}(\seq{y}\vert\seq{x}) &=& \prod_{j=1}^n \funp{P}(y_j\vert\seq{y}_{<j},\seq{x})
\label{eq:14-1}
\end{eqnarray}
\parinterval The right-to-left formulation instead gives Equation \eqref{eq:14-2}:
\begin{eqnarray}
\funp{P}(\seq{y}\vert\seq{x})=\prod_{j=1}^n \funp{P}(y_{n+1-j}\vert\seq{y}_{>n+1-j},\seq{x})
\funp{P}(\seq{y}\vert\seq{x}) &=& \prod_{j=1}^n \funp{P}(y_{n+1-j}\vert\seq{y}_{>n+1-j},\seq{x})
\label{eq:14-2}
\end{eqnarray}
\parinterval where $\seq{y}_{<j}=\{y_1,y_2,\dots,y_{j-1}\}$ and $\seq{y}_{>j}=\{y_{j+1},y_{j+2},\dots,y_n\}$.
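\parinterval The two factorizations are easy to make concrete. The following sketch (a minimal illustration in Python; \texttt{cond\_prob} is a hypothetical stand-in for a trained model's conditional distribution) accumulates the sentence-level log-probability in both directions:

\begin{verbatim}
import math

def log_prob_l2r(cond_prob, x, y):
    # Equation (14-1): P(y|x) = prod_j P(y_j | y_<j, x).
    return sum(math.log(cond_prob(y[j], y[:j], x))
               for j in range(len(y)))

def log_prob_r2l(cond_prob, x, y):
    # Equation (14-2): generate from the right end; each word is
    # conditioned on the suffix generated so far.
    n = len(y)
    return sum(math.log(cond_prob(y[n - 1 - j], y[n - j:], x))
               for j in range(n))
\end{verbatim}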
......@@ -148,7 +148,7 @@
\item Length penalty factor. Normalizing the translation probability by the length of the translation is the most common remedy: for a source sentence $\seq{x}$ and a translation $\seq{y}$, the model score $\textrm{score}(\seq{x},\seq{y})$ shrinks as $\seq{y}$ grows longer. To avoid this, a length penalty function $\textrm{lp}(\seq{y})$ can be introduced and the model score defined as in Equation \eqref{eq:14-12}:
\begin{eqnarray}
\textrm{score}(\seq{x},\seq{y})=\frac{\log \funp{P}(\seq{y}\vert\seq{x})}{\textrm{lp}(\seq{y})}
\textrm{score}(\seq{x},\seq{y}) &=& \frac{\log \funp{P}(\seq{y}\vert\seq{x})}{\textrm{lp}(\seq{y})}
\label{eq:14-12}
\end{eqnarray}
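\parinterval As a concrete illustration, the sketch below instantiates Equation \eqref{eq:14-12} with the length penalty used by GNMT, $\textrm{lp}(\seq{y})=\frac{(5+|\seq{y}|)^{\alpha}}{(5+1)^{\alpha}}$; this is only one common choice, and other definitions of $\textrm{lp}(\seq{y})$ plug in the same way:

\begin{verbatim}
def length_penalty(length, alpha=0.6):
    # GNMT-style penalty; alpha = 0 turns normalization off.
    return ((5.0 + length) ** alpha) / ((5.0 + 1.0) ** alpha)

def score(log_prob, length, alpha=0.6):
    # Equation (14-12): length-normalized score used to rank hypotheses.
    return log_prob / length_penalty(length, alpha)
\end{verbatim}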
......@@ -188,7 +188,7 @@ b &=& \omega_{\textrm{high}}\cdot |\seq{x}| \label{eq:14-4}
\noindent where $\textrm{cp}(\seq{x},\seq{y})$ denotes the coverage model, which measures how fully the translation covers each source-language word. In the definition of $\textrm{cp}(\seq{x},\seq{y})$, $a_{ij}$ is the attention weight between source position $i$ and target position $j$, so $\sum \limits_{j=1}^{|\seq{y}|} a_{ij}$ measures ``how much'' the $i$-th source word has been translated: a value above 1 indicates over-translation, and a value below 1 indicates under-translation. Equation \eqref{eq:14-6} penalizes hypotheses that under-translate. An improved form of the coverage model is\upcite{li-etal-2018-simple}:
\begin{eqnarray}
\textrm{cp}(\seq{x},\seq{y}) = \sum_{i=1}^{|\seq{x}|} \log( \textrm{max} ( \sum_{j=1}^{|\seq{y}|} a_{ij},\beta))
\textrm{cp}(\seq{x},\seq{y}) &=& \sum_{i=1}^{|\seq{x}|} \log( \textrm{max} ( \sum_{j=1}^{|\seq{y}|} a_{ij},\beta))
\label{eq:14-7}
\end{eqnarray}
\noindent Equation \eqref{eq:14-7} replaces the downward truncation of Equation \eqref{eq:14-6} with an upward truncation, which gives the model a better handle on over-translation (or repeated translation). However, it requires careful tuning of $\beta$ on a development set, which adds some extra work. Alternatively, coverage can be modeled and parameterized separately and trained jointly with the translation model\upcite{Mi2016CoverageEM,TuModeling,Kazimi2017CoverageFC}, yielding a finer-grained coverage model.
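\parinterval Both truncation variants are cheap to compute from the attention matrix. In the sketch below, \texttt{a} is assumed to be a $|\seq{x}| \times |\seq{y}|$ NumPy array of the weights $a_{ij}$, and the down-truncated form is assumed to follow the description of Equation \eqref{eq:14-6} given above:

\begin{verbatim}
import numpy as np

def cp_down(a):
    # Down-truncation: min(sum_j a_ij, 1) penalizes source words whose
    # accumulated attention falls short of 1 (under-translation).
    return np.sum(np.log(np.minimum(np.sum(a, axis=1), 1.0)))

def cp_up(a, beta=0.5):
    # Equation (14-7): up-truncation with a floor beta, which targets
    # over-translation; beta must be tuned on a development set.
    return np.sum(np.log(np.maximum(np.sum(a, axis=1), beta)))
\end{verbatim}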
......@@ -416,7 +416,7 @@ b &=& \omega_{\textrm{high}}\cdot |\seq{x}| \label{eq:14-4}
\parinterval Inference in today's mainstream neural machine translation is an {\small\sffamily\bfseries{autoregressive translation}}\index{自回归翻译}(Autoregressive Translation)\index{Autoregressive Translation} process. Autoregression is a way of describing how a time series is generated: for a target sequence $\seq{y}=\{y_1,\dots,y_n\}$, an autoregressive model assumes that generating the state $y_j$ at time $j$ depends on the previous states $\{y_1,\dots,y_{j-1}\}$, and that $y_j$ is a linear function of $\{y_1,\dots,y_{j-1}\}$; generating $y_j$ in this way is an autoregressive sequence generation process. Neural machine translation borrows this concept but does not require the model to be linear. For an input source-language sequence $\seq{x}=\{x_1,\dots,x_m\}$, the probability of generating the translation $\seq{y}=\{y_1,\dots,y_n\}$ with an autoregressive translation model is defined as:
\begin{eqnarray}
\funp{P}(\seq{y}|\seq{x})=\prod_{j=1}^n {\funp{P}(y_j|y_{<j},\seq{x})}
\funp{P}(\seq{y}|\seq{x}) &=& \prod_{j=1}^n {\funp{P}(y_j|y_{<j},\seq{x})}
\label{eq:14-8}
\end{eqnarray}
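\parinterval The data dependency in Equation \eqref{eq:14-8} is what makes inference sequential: step $j$ cannot start before word $j-1$ exists. A minimal greedy-decoding sketch makes this explicit (\texttt{next\_word\_dist} is a hypothetical model call returning a word-to-probability dictionary for $\funp{P}(\cdot|y_{<j},\seq{x})$):

\begin{verbatim}
def greedy_decode(next_word_dist, x, eos="</s>", max_len=100):
    y = []
    for _ in range(max_len):
        # Each step consumes all previously generated words, so the
        # loop iterations cannot run in parallel.
        dist = next_word_dist(y, x)       # P(. | y_<j, x)
        word = max(dist, key=dist.get)    # most likely next word
        if word == eos:
            break
        y.append(word)
    return y
\end{verbatim}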
......@@ -425,7 +425,7 @@ b &=& \omega_{\textrm{high}}\cdot |\seq{x}| \label{eq:14-4}
\parinterval To address this problem, researchers have also considered removing the autoregressive nature of translation, i.e., {\small\sffamily\bfseries{non-autoregressive translation}}\index{非自回归翻译}(Non-Autoregressive Translation, NAT)\index{Non-Autoregressive Translation}\upcite{Gu2017NonAutoregressiveNM}. A simple non-autoregressive translation model formulates the problem as:
\begin{eqnarray}
\funp{P}(\seq{y}|\seq{x})=\prod_{j=1}^n {\funp{P}(y_j|\seq{x})}
\funp{P}(\seq{y}|\seq{x}) &=& \prod_{j=1}^n {\funp{P}(y_j|\seq{x})}
\label{eq:14-9}
\end{eqnarray}
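\parinterval Because the positions in Equation \eqref{eq:14-9} are conditionally independent given $\seq{x}$, all target words can be predicted at once. A sketch (\texttt{pos\_word\_dists} is a hypothetical model call that returns one distribution per position, for a target length $n$ fixed in advance):

\begin{verbatim}
def nat_decode(pos_word_dists, x, n):
    # One forward pass yields all n distributions; each argmax is
    # independent of the others, so they can be taken in parallel.
    dists = pos_word_dists(x, n)          # [P(. | x) for each j]
    return [max(d, key=d.get) for d in dists]
\end{verbatim}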
......@@ -485,7 +485,7 @@ b &=& \omega_{\textrm{high}}\cdot |\seq{x}| \label{eq:14-4}
\parinterval In addition, each decoder layer contains an extra positional attention module, which uses the same multi-head attention mechanism as the rest of the Transformer model, as follows:
\begin{eqnarray}
\textrm{Attention}(\mathbi{Q},\mathbi{K},\mathbi{V})&=&\textrm{Softmax}(\frac{\mathbi{Q}{\mathbi{K}}^{T}}{\sqrt{d_k}})\cdot \mathbi{V}
\textrm{Attention}(\mathbi{Q},\mathbi{K},\mathbi{V}) &=& \textrm{Softmax}(\frac{\mathbi{Q}{\mathbi{K}}^{T}}{\sqrt{d_k}})\cdot \mathbi{V}
\label{eq:14-10}
\end{eqnarray}
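\parinterval Equation \eqref{eq:14-10} is the standard scaled dot-product attention, and a direct NumPy transcription is straightforward (single head, two-dimensional inputs):

\begin{verbatim}
import numpy as np

def attention(Q, K, V):
    # Softmax(Q K^T / sqrt(d_k)) V, Equation (14-10).
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)
    scores -= scores.max(axis=-1, keepdims=True)   # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V
\end{verbatim}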
......@@ -651,7 +651,7 @@ b &=& \omega_{\textrm{high}}\cdot |\seq{x}| \label{eq:14-4}
\parinterval A neural machine translation model predicts a word distribution at every target position $j$; that is, for each word $y_j$ in the target vocabulary it computes $\funp{P}(y_j | \seq{y}_{<j},\seq{x})$. Suppose there are $K$ neural machine translation systems; each system $k$ can compute this probability independently, written $\funp{P}_{k} (y_j | \seq{y}_{<j},\seq{x})$. Equation \eqref{eq:14-11} then fuses the predictions of the $K$ systems:
\begin{eqnarray}
\funp{P}(y_{j} | \seq{y}_{<j},\seq{x}) = \sum_{k=1}^K \gamma_{k} \cdot \funp{P}_{k} (y_j | \seq{y}_{<j},\seq{x})
\funp{P}(y_{j} | \seq{y}_{<j},\seq{x}) &=& \sum_{k=1}^K \gamma_{k} \cdot \funp{P}_{k} (y_j | \seq{y}_{<j},\seq{x})
\label{eq:14-11}
\end{eqnarray}
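\parinterval Equation \eqref{eq:14-11} amounts to interpolating the per-system word distributions position by position. A sketch (each element of \texttt{dists} is assumed to map words to probabilities, with the weights $\gamma_k$ summing to 1):

\begin{verbatim}
def ensemble_word_dist(dists, gammas):
    # Equation (14-11): fuse K distributions P_k(y_j | y_<j, x) with
    # interpolation weights gamma_k.
    fused = {}
    for dist, gamma in zip(dists, gammas):
        for word, p in dist.items():
            fused[word] = fused.get(word, 0.0) + gamma * p
    return fused
\end{verbatim}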
......
......@@ -42,14 +42,14 @@
\draw [->,thick]([yshift=-0.75em]node5-1.east)--(remark3.north west);
\draw [->,thick]([yshift=-0.75em]node6-1.east)--(remark3.south west);
\node [anchor=south](d1) at ([xshift=-0.7em,yshift=4em]remark1.north){\small{Real data:}};
\node [anchor=south](d1) at ([xshift=-0.7em,yshift=5.5em]remark1.north){\small{Real data:}};
\node [anchor=west](d2) at ([xshift=2.0em]d1.east){\small{Pseudo data:}};
\node [anchor=west](d3) at ([xshift=2.0em]d2.east){\small{Additional data:}};
\node [anchor=west,fill=green!20,minimum width=1.5em](d1-1) at ([xshift=-0.0em]d1.east){};
\node [anchor=west,fill=red!20,minimum width=1.5em](d2-1) at ([xshift=-0.0em]d2.east){};
\node [anchor=west,fill=yellow!20,minimum width=1.5em](d3-1) at ([xshift=-0.0em]d3.east){};
\node [anchor=south] (d4) at ([xshift=1em]d1.north) {\small{Training:}};
\node [anchor=south] (d5) at ([xshift=0.5em]d2.north) {\small{Inference:}};
\node [anchor=north] (d4) at ([xshift=1em]d1.south) {\small{Training:}};
\node [anchor=north] (d5) at ([xshift=0.5em]d2.south) {\small{Inference:}};
\draw [->,thick] ([xshift=0em]d4.east)--([xshift=1.5em]d4.east);
\draw [->,thick,dashed] ([xshift=0em]d5.east)--([xshift=1.5em]d5.east);
......
......@@ -4,7 +4,7 @@
\node [anchor=west,rec,fill=red!20](node2) at ([xshift=2.0em]node1.east){\small{Encoder}};
\node [anchor=west,rec](node3) at ([xshift=3.0em,yshift=2.0em]node2.east){\small{Decoder}};
\node [anchor=west,rec,fill=yellow!20](node4) at ([xshift=3.0em,yshift=-2.0em]node2.east){\small{Discriminator}};
\draw [->,thick](node1.east)--(node2.west);
\draw [->,thick](node2.east)--([xshift=1.5em]node2.east)--([xshift=1.5em,yshift=2.0em]node2.east)--(node3.west);
......
\addtolength{\tabcolsep}{-4pt}
\begin{tabular}{c c c}
\begin{tikzpicture}
......@@ -69,4 +71,6 @@
\end{scope}
\end{tikzpicture}
\end{tabular}
\ No newline at end of file
\end{tabular}
\addtolength{\tabcolsep}{4pt}
\ No newline at end of file
......@@ -111,38 +111,38 @@
%----------------------------------------------------------------------------------------
\section{Intervention in Translation Results}
\parinterval Although the quality of neural machine translation is already high, linguistic phenomena are complex and diverse, and models still fail in certain scenarios; the most typical case is the translation of terminology within a sentence. In practice, one frequently encounters proper nouns and domain terms such as company, brand, and product names, as well as abbreviations with several meanings. For example, for the term ``小牛翻译'', different machine translation systems give different results: ``Maverick translation'', ``Calf translation'', ``The mavericks translation''... whereas the correct translation is ``NiuTrans''. Most machine translation engines struggle to translate such special vocabulary accurately: on the one hand, models are mostly trained on general-domain data, which cannot be guaranteed to cover all linguistic phenomena; on the other hand, even when these terms appear in the training data, they are usually low-frequency and hard for the model to learn. To guarantee translation accuracy, intervening in the model's output is necessary, and such interventions are also useful in settings such as interactive machine translation and domain adaptation.
\parinterval For {\small\bfnew terminology translation}\index{术语翻译}(Lexically Constrained Translation)\index{Lexically Constrained Translation}, it is hard for a model to produce the correct term without intervention, because the target term is very likely an out-of-vocabulary word; an additional terminology dictionary must therefore be supplied manually, and the goal becomes making the model's output respect the term constraints provided by the user. An example is shown in the figure below:
\parinterval Interactive machine translation embodies the idea of user actions ``intervening'' in machine translation results. In fact, when a machine translation system makes a mistake, people always hope to ``change'' the translation in a direct and effective way so as to improve its quality. For example, if the system can output several candidate translations, the user can pick the best one; the user thereby intervenes in the ranking of the candidates. Another example is using a {\small\bfnew{translation memory}}\index{翻译记忆}(Translation Memory\index{Translation Memory}) to improve a machine translation system. A translation memory records high-quality source-target sentence pairs and can sometimes be viewed as a kind of prior knowledge or ``memory''; therefore, when performing machine translation (statistical or neural), using a translation memory to guide the translation process can likewise be seen as a form of intervention ({\color{red} References! Both SMT and NMT have relevant work; for SMT there is a long CL paper from the Institute of Automation, and for NMT I recall Tencent has one; check with me once you find them!}).
\parinterval Although there are many ways to intervene in a machine translation system, the most common is to intervene in the translation of specific source-language segments, so that the final translation satisfies certain constraints on how those segments are rendered. This problem is also known as {\small\bfnew{constraint-based translation}}\index{基于约束的翻译} (Constraint-based Translation\index{Constraint-based Translation}). For example, when translating web pages, the tags in the translation must stay consistent with the source. Another typical example is terminology translation. In practice, one frequently encounters proper nouns and domain terms such as company, brand, and product names, as well as abbreviations with several meanings; for instance, for the term ``小牛翻译'', different machine translation systems give different results: ``Maverick translation'', ``Calf translation'', ``The mavericks translation''... whereas the correct translation is ``NiuTrans''. Most machine translation engines struggle to translate such special vocabulary accurately. On the one hand, models are mostly trained on general-domain data, which cannot be guaranteed to cover all linguistic phenomena. On the other hand, even when these terms appear in the training data, they are usually low-frequency and hard for the model to learn. To guarantee translation accuracy, intervening in terminology translation is necessary, and it is also very valuable for problems such as domain adaptation.
\parinterval For {\small\bfnew terminology translation}\index{术语翻译}(Lexically Constrained Translation)\index{Lexically Constrained Translation}, it is hard for a model to produce the correct term without intervention, because the target term is very likely an out-of-vocabulary word; an additional terminology dictionary must therefore be supplied manually, and the goal becomes making the model's output respect the term constraints provided by the user. This process is illustrated in Figure \ref{fig:18-2}.
%----------------------------------------------
\begin{figure}[htp]
\centering
\input{./Chapter18/Figures/figure-translation-interfered}
%\setlength{\abovecaptionskip}{-0.2cm}
\caption{Intervention in translation results}
\caption{Intervention in translation results {\color{red} This figure needs revision! It is a bit cluttered; let's discuss it when I am back in Shenyang!}}
\label{fig:18-2}
\end{figure}
%----------------------------------------------
\parinterval In statistical machine translation, the translation process consists of probability computation and derivation based on symbol matching, so it is relatively easy to force certain words into the output. Neural machine translation, however, is an end-to-end trained model whose internals are real-valued vector representations in a continuous space; translation is essentially a series of mappings, combinations, and computations over elements of that space, which makes such intervention difficult. There are currently two main lines of attack:
\parinterval In statistical machine translation, a translation is essentially a derivation built from phrases and rules, so modifying the output is easy; for example, the desired translation can simply be added to the candidate set of the corresponding source-language segment. Neural machine translation, however, is an end-to-end model whose internals are real-valued vector representations in a continuous space, and translation is essentially a series of mappings, combinations, and algebraic operations over elements of that space; one therefore cannot, as in a symbolic system, directly modify the model and inject discrete constraints to influence generation. There are currently two main lines of attack:
\begin{itemize}
\item {\small\bfnew Hard constraints}. Constraints are enforced during decoding according to some strategy; most of these methods modify the beam search algorithm to force the output to contain the specified words or phrases\upcite{DBLP:conf/acl/HokampL17,DBLP:conf/naacl/PostV18,DBLP:conf/wmt/ChatterjeeNTFSB17,DBLP:conf/naacl/HaslerGIB18}.
\item {\small\bfnew Soft constraints}. These methods are essentially a form of data augmentation: the constraint is realized by modifying the model's data and training process. Typically the source sentence is edited according to a terminology dictionary, e.g., the target term is spliced into the source sentence and the original and synthetic corpora are mixed for training, in the hope that the model learns to exploit the term information to guide decoding; alternatively, placeholders replace the terms in the source and are restored once translation is complete\upcite{DBLP:conf/naacl/SongZYLWZ19,DBLP:conf/acl/DinuMFA19,DBLP:journals/corr/abs-1912-00567,DBLP:conf/ijcai/ChenCWL20}.
\end{itemize}
\parinterval Hard-constraint approaches restrict the search strategy and are model-agnostic; they guarantee that the output satisfies the constraints but affect decoding speed. Soft-constraint approaches train the model on data constructed in a specific format so that it acquires some generalization ability; they require pre- and post-editing, usually do not affect decoding speed, but cannot guarantee that the output satisfies the constraints.
\parinterval In addition, deployed neural machine translation usually also involves pre- and post-processing. Pre-processing modifies and normalizes the source text before translation to suit the characteristics of machine translation, so that a relatively fluent translation can be generated, improving readability and accuracy. In practice, user input takes many forms and may contain terminology, abbreviations, and mathematical formulas, and sometimes even web page tags, so pre-processing the source text is necessary; common steps include format conversion, punctuation checking, terminology editing, and tag identification. After translation, the machine output needs further editing and correction to conform to usage conventions, e.g., punctuation and format checks and restoration of terms and tags; these steps are usually carried out automatically according to predefined strategies.
\vspace{0.5em}
\item Forced generation. This approach leaves the model unchanged and enforces constraints during decoding according to some strategy, generally by modifying beam search so that the output must contain the specified words or phrases\upcite{DBLP:conf/acl/HokampL17,DBLP:conf/naacl/PostV18,DBLP:conf/wmt/ChatterjeeNTFSB17,DBLP:conf/naacl/HaslerGIB18}. For example, after the translation is produced, word alignments obtained from the attention mechanism can be used to forcibly replace the specified parts of the output; alternatively, translation candidates containing the correct term translation can receive an extra bonus so that they rank high enough during decoding.
\parinterval In addition, machine translation offers several other common forms of intervention (see {\chapterfourteen} for details), for example:
\begin{itemize}
\item Controlling translation length. Because a neural machine translation model expresses the probability of a whole sentence as a product of word probabilities, it naturally favors short translations. The remedy is to introduce a length-control mechanism during inference, which in essence modifies the model's scoring function so that it is aware of length information and thus forms a constraint, e.g., through a length penalty factor or a coverage model;
\vspace{0.5em}
\item Data augmentation. These methods realize the constraint by modifying the machine translation model's data and training process. Typically the source sentence is edited according to a terminology dictionary: for example, the term's translation is spliced into the source sentence and the original and synthetic corpora are mixed for training, in the hope that the model automatically exploits the term information to guide decoding; alternatively, placeholders replace the terms in the source and are restored once translation is complete (a sketch of this placeholder variant follows below)\upcite{DBLP:conf/naacl/SongZYLWZ19,DBLP:conf/acl/DinuMFA19,DBLP:journals/corr/abs-1912-00567,DBLP:conf/ijcai/ChenCWL20}.
\item Translation diversity. Neural machine translation often suffers from $n$-best outputs that are all very similar, i.e., a lack of diversity, which makes reranking inaccurate; moreover, from the perspective of human translation, the translations of one source sentence should be diverse, and overly similar outputs cannot reflect enough translation phenomena. Solutions can start from modeling or from decoding, e.g., introducing latent variables into the model, or introducing an extra model during inference to penalize similar translations.
\vspace{0.5em}
\end{itemize}
\parinterval Forced generation restricts the search strategy and is model-agnostic; such methods guarantee that the output satisfies the constraints but slow down translation. Data augmentation trains the model on data constructed in a specific format so that it acquires some generalization ability; such methods require pre- and post-editing, usually do not affect translation speed, but cannot guarantee that the output satisfies the constraints.
\parinterval Furthermore, deployed machine translation usually also involves pre- and post-processing. Pre-processing modifies and normalizes the source sentence before translation so that a relatively fluent translation can be generated, improving readability and accuracy. In practice, user input takes many forms and may contain terminology, abbreviations, and mathematical formulas, and sometimes even web page tags, so pre-processing the source text is necessary; common steps include format conversion, punctuation checking, terminology editing, and tag identification. After translation, the machine output needs further editing and correction to conform to usage conventions, e.g., punctuation and format checks and restoration of terms and tags; these steps are usually carried out automatically according to predefined strategies. In addition, controlling translation length and translation diversity can further enrich a system's intervention toolbox (see {\chapterfourteen}).
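\parinterval To make the data-augmentation idea concrete, the sketch below illustrates the placeholder variant described above: known terms are replaced by indexed placeholder tokens before translation and restored afterwards. This is a minimal sketch; the term dictionary, the placeholder scheme, and \texttt{translate} are hypothetical, and a real system must additionally be trained so that placeholder tokens survive translation unchanged:

\begin{verbatim}
def translate_with_terms(src, term_dict, translate):
    mapping = {}
    for i, (term, target) in enumerate(term_dict.items()):
        placeholder = "<term%d>" % i
        if term in src:
            # Hide the term from the model behind a placeholder.
            src = src.replace(term, placeholder)
            mapping[placeholder] = target
    hyp = translate(src)
    for placeholder, target in mapping.items():
        # Restore the user-specified target term.
        hyp = hyp.replace(placeholder, target)
    return hyp
\end{verbatim}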
%----------------------------------------------------------------------------------------
% NEW SECTION
......@@ -190,13 +190,13 @@
\begin{itemize}
\vspace{0.5em}
\item For multilingual translation scenarios, a single-model multilingual translation system is a good choice ({\chaptersixteen}). When data for many languages is limited and usage is infrequent, this approach effectively serves the long tail of translation demand. For example, some online machine translation services already support more than 100 languages, and translation between most of these language pairs is relatively low-frequency, so using one model for all of them greatly reduces deployment and maintenance costs.
\vspace{0.5em}
\item Pivot-based translation is another effective solution to the multilingual translation problem ({\chaptersixteen}). It suits both statistical and neural machine translation, and has therefore long been used in large-scale machine translation deployments.
\vspace{0.5em}
\item In GPU deployment, since GPUs are expensive, multiple different systems can be deployed on a single GPU device. If concurrency among these systems is infrequent, translation latency does not increase noticeably. Sharing one device among several models suits situations where translation requests are relatively infrequent but translation tasks are diverse.
\vspace{0.5em}
\item Large-scale GPU deployment of machine translation is also strict about GPU memory usage. Because GPU memory is limited, the memory footprint of a running model must be taken into account. In general, besides model compression and architectural optimization ({\chapterfourteen} and {\chapterfifteen}), memory allocation and usage also need dedicated optimization. For example, a memory pool can mitigate the latency caused by frequently allocating and freeing memory; in addition, a given memory block can be reused for data whose lifetimes do not overlap, avoiding repeated allocation of new storage. Figure \ref{fig:18-3} shows an example of memory reuse.
......
......@@ -4086,7 +4086,7 @@ year = {2012}
Joris Pelemans and
Hugo Van Hamme and
Patrick Wambacq},
publisher={European Association of Computational Linguistics},
publisher={Annual Conference of the European Association for Machine Translation},
year={2017}
}
......@@ -4569,7 +4569,7 @@ author = {Yoshua Bengio and
Jozef Mokry and
Maria Nadejde},
title = {Nematus: a Toolkit for Neural Machine Translation},
publisher = {European Association of Computational Linguistics},
publisher = {Annual Conference of the European Association for Machine Translation},
pages = {65--68},
year = {2017}
}
......@@ -9644,7 +9644,7 @@ author = {Zhuang Liu and
@inproceedings{finding2006adafre,
author = {S. F. Adafre and Maarten de Rijke},
title = {Finding Similar Sentences across Multiple Languages in Wikipedia },
publisher = {European Association of Computational Linguistics},
publisher = {Annual Conference of the European Association for Machine Translation},
year = {2006}
}
@inproceedings{method2008keiji,
......@@ -10798,7 +10798,7 @@ author = {Zhuang Liu and
Mirella Lapata},
title = {Paraphrasing Revisited with Neural Machine Translation},
pages = {881--893},
publisher = {European Association of Computational Linguistics},
publisher = {Annual Conference of the European Association for Machine Translation},
year = {2017}
}
@article{2005Improving,
......@@ -11694,7 +11694,7 @@ author = {Zhuang Liu and
Marcello Federico},
title = {Neural vs. Phrase-Based Machine Translation in a Multi-Domain Scenario},
pages = {280--284},
publisher = {European Association of Computational Linguistics},
publisher = {Annual Conference of the European Association for Machine Translation},
year = {2017}
}
@inproceedings{DBLP:conf/aaai/Zhang0LZC18,
......@@ -11923,7 +11923,577 @@ author = {Zhuang Liu and
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%% chapter 17------------------------------------------------------
@article{DBLP:journals/ac/Bar-Hillel60,
author = {Yehoshua Bar-Hillel},
title = {The Present Status of Automatic Translation of Languages},
journal = {Advances in Computers},
volume = {1},
pages = {91--163},
year = {1960}
}
@article{DBLP:journals/corr/abs-1901-09115,
author = {Andrei Popescu-Belis},
title = {Context in Neural Machine Translation: {A} Review of Models and Evaluations},
journal = {CoRR},
volume = {abs/1901.09115},
year = {2019}
}
@book{jurafsky2000speech,
title={Speech \& Language Processing},
author={Jurafsky, Daniel and Martin, James H.},
year={2000},
publisher={Pearson Education India}
}
@inproceedings{DBLP:conf/anlp/MarcuCW00,
author = {Daniel Marcu and
Lynn Carlson and
Maki Watanabe},
title = {The Automatic Translation of Discourse Structures},
pages = {9--17},
publisher = {Applied Natural Language Processing Conference},
year = {2000}
}
@inproceedings{foster2010translating,
title={Translating structured documents},
author={Foster, George and Isabelle, Pierre and Kuhn, Roland},
booktitle={Proceedings of AMTA},
year={2010}
}
@inproceedings{DBLP:conf/eacl/LouisW14,
author = {Annie Louis and
Bonnie L. Webber},
title = {Structured and Unstructured Cache Models for {SMT} Domain Adaptation},
pages = {155--163},
publisher = {Annual Conference of the European Association for Machine Translation},
year = {2014}
}
@inproceedings{DBLP:conf/iwslt/HardmeierF10,
author = {Christian Hardmeier and
Marcello Federico},
title = {Modelling pronominal anaphora in statistical machine translation},
pages = {283--289},
publisher = {International Workshop on Spoken Language Translation},
year = {2010}
}
@inproceedings{DBLP:conf/wmt/NagardK10,
author = {Ronan Le Nagard and
Philipp Koehn},
title = {Aiding Pronoun Translation with Co-Reference Resolution},
pages = {252--261},
publisher = {Annual Meeting of the Association for Computational Linguistics},
year = {2010}
}
@inproceedings{DBLP:conf/eamt/LuongP16,
author = {Ngoc-Quang Luong and
Andrei Popescu-Belis},
title = {A Contextual Language Model to Improve Machine Translation of Pronouns
by Re-ranking Translation Hypotheses},
pages = {292--304},
publisher = {European Association for Machine Translation},
year = {2016}
}
@inproceedings{tiedemann2010context,
title={Context adaptation in statistical machine translation using models with exponentially decaying cache},
author={Tiedemann, J{\"o}rg},
publisher={Domain Adaptation for Natural Language Processing},
pages={8--15},
year={2010}
}
@inproceedings{DBLP:conf/emnlp/GongZZ11,
author = {Zhengxian Gong and
Min Zhang and
Guodong Zhou},
title = {Cache-based Document-level Statistical Machine Translation},
pages = {909--919},
publisher = {Conference on Empirical Methods in Natural Language Processing},
year = {2011}
}
@inproceedings{DBLP:conf/ijcai/XiongBZLL13,
author = {Deyi Xiong and
Guosheng Ben and
Min Zhang and
Yajuan Lv and
Qun Liu},
title = {Modeling Lexical Cohesion for Document-Level Machine Translation},
pages = {2183--2189},
publisher = { International Joint Conference on Artificial Intelligence},
year = {2013}
}
@inproceedings{xiao2011document,
title={Document-level consistency verification in machine translation},
author={Xiao, Tong and Zhu, Jingbo and Yao, Shujie and Zhang, Hao},
booktitle={Machine Translation Summit},
volume={13},
pages={131--138},
year={2011}
}
@inproceedings{DBLP:conf/sigdial/MeyerPZC11,
author = {Thomas Meyer and
Andrei Popescu-Belis and
Sandrine Zufferey and
Bruno Cartoni},
title = {Multilingual Annotation and Disambiguation of Discourse Connectives
for Machine Translation},
pages = {194--203},
publisher = {Annual Meeting of the Association for Computational Linguistics},
year = {2011}
}
@inproceedings{DBLP:conf/hytra/MeyerP12,
author = {Thomas Meyer and
Andrei Popescu-Belis},
title = {Using Sense-labeled Discourse Connectives for Statistical Machine
Translation},
pages = {129--138},
publisher = {Annual Meeting of the Association for Computational Linguistics},
year = {2012}
}
@inproceedings{DBLP:conf/nips/SutskeverVL14,
author = {Ilya Sutskever and
Oriol Vinyals and
Quoc V. Le},
title = {Sequence to Sequence Learning with Neural Networks},
pages = {3104--3112},
year = {2014},
publisher = {Conference and Workshop on Neural Information Processing Systems}
}
@inproceedings{DBLP:conf/emnlp/LaubliS018,
author = {Samuel L{\"{a}}ubli and
Rico Sennrich and
Martin Volk},
title = {Has Machine Translation Achieved Human Parity? {A} Case for Document-level
Evaluation},
pages = {4791--4796},
publisher = {Conference on Empirical Methods in Natural Language Processing},
year = {2018}
}
@article{DBLP:journals/corr/abs-1912-08494,
author = {Sameen Maruf and
Fahimeh Saleh and
Gholamreza Haffari},
title = {A Survey on Document-level Machine Translation: Methods and Evaluation},
journal = {CoRR},
volume = {abs/1912.08494},
year = {2019}
}
@inproceedings{DBLP:conf/discomt/TiedemannS17,
author = {J{\"{o}}rg Tiedemann and
Yves Scherrer},
title = {Neural Machine Translation with Extended Context},
pages = {82--92},
publisher = {Association for Computational Linguistics},
year = {2017}
}
@article{DBLP:journals/corr/abs-1910-07481,
author = {Valentin Mac{\'{e}} and
Christophe Servan},
title = {Using Whole Document Context in Neural Machine Translation},
journal = {CoRR},
volume = {abs/1910.07481},
year = {2019}
}
@article{DBLP:journals/corr/JeanLFC17,
author = {S{\'{e}}bastien Jean and
Stanislas Lauly and
Orhan Firat and
Kyunghyun Cho},
title = {Does Neural Machine Translation Benefit from Larger Context?},
journal = {CoRR},
volume = {abs/1704.05135},
year = {2017}
}
@inproceedings{DBLP:conf/acl/TitovSSV18,
author = {Elena Voita and
Pavel Serdyukov and
Rico Sennrich and
Ivan Titov},
title = {Context-Aware Neural Machine Translation Learns Anaphora Resolution},
pages = {1264--1274},
publisher = {Annual Meeting of the Association for Computational Linguistics},
year = {2018}
}
@inproceedings{DBLP:conf/acl/HaffariM18,
author = {Sameen Maruf and
Gholamreza Haffari},
title = {Document Context Neural Machine Translation with Memory Networks},
pages = {1275--1284},
publisher = {Annual Meeting of the Association for Computational Linguistics},
year = {2018}
}
@inproceedings{DBLP:conf/coling/KuangXLZ18,
author = {Shaohui Kuang and
Deyi Xiong and
Weihua Luo and
Guodong Zhou},
title = {Modeling Coherence for Neural Machine Translation with Dynamic and
Topic Caches},
pages = {596--606},
publisher = {International Conference on Computational Linguistics},
year = {2018}
}
@inproceedings{DBLP:conf/discomt/GarciaCE19,
author = {Eva Mart{\'{\i}}nez Garcia and
Carles Creus and
Cristina Espa{\~{n}}a-Bonet},
title = {Context-Aware Neural Machine Translation Decoding},
pages = {13--23},
publisher = {Annual Meeting of the Association for Computational Linguistics},
year = {2019}
}
@article{DBLP:journals/corr/abs-2010-12827,
author = {Amane Sugiyama and
Naoki Yoshinaga},
title = {Context-aware Decoder for Neural Machine Translation using a Target-side
Document-Level Language Model},
journal = {CoRR},
volume = {abs/2010.12827},
year = {2020}
}
@inproceedings{DBLP:conf/acl/VoitaST19,
author = {Elena Voita and
Rico Sennrich and
Ivan Titov},
title = {When a Good Translation is Wrong in Context: Context-Aware Machine
Translation Improves on Deixis, Ellipsis, and Lexical Cohesion},
pages = {1198--1212},
publisher = {Annual Meeting of the Association for Computational Linguistics},
year = {2019}
}
@inproceedings{DBLP:conf/emnlp/VoitaST19,
author = {Elena Voita and
Rico Sennrich and
Ivan Titov},
title = {Context-Aware Monolingual Repair for Neural Machine Translation},
pages = {877--886},
publisher = {Conference on Empirical Methods in Natural Language Processing},
year = {2019}
}
@inproceedings{DBLP:conf/discomt/WerlenP17,
author = {Lesly Miculicich Werlen and
Andrei Popescu-Belis},
title = {Validation of an Automatic Metric for the Accuracy of Pronoun Translation
{(APT)}},
pages = {17--25},
publisher = {Annual Meeting of the Association for Computational Linguistics},
year = {2017}
}
@inproceedings{DBLP:conf/emnlp/WongK12,
author = {Billy Tak-Ming Wong and
Chunyu Kit},
title = {Extending Machine Translation Evaluation Metrics with Lexical Cohesion
to Document Level},
pages = {1060--1068},
publisher = {Conference on Empirical Methods in Natural Language Processing},
year = {2012}
}
@inproceedings{DBLP:conf/discomt/GongZZ15,
author = {Zhengxian Gong and
Min Zhang and
Guodong Zhou},
title = {Document-Level Machine Translation Evaluation with Gist Consistency
and Text Cohesion},
pages = {33--40},
publisher = {Annual Meeting of the Association for Computational Linguistics},
year = {2015}
}
@inproceedings{DBLP:conf/cicling/HajlaouiP13,
author = {Najeh Hajlaoui and
Andrei Popescu-Belis},
title = {Assessing the Accuracy of Discourse Connective Translations: Validation
of an Automatic Metric},
volume = {7817},
pages = {236--247},
publisher = {Springer},
year = {2013}
}
@inproceedings{DBLP:conf/wmt/RiosMS18,
author = {Annette Rios and
Mathias M{\"{u}}ller and
Rico Sennrich},
title = {The Word Sense Disambiguation Test Suite at {WMT18}},
pages = {588--596},
publisher = {Annual Meeting of the Association for Computational Linguistics},
year = {2018}
}
@inproceedings{DBLP:conf/naacl/BawdenSBH18,
author = {Rachel Bawden and
Rico Sennrich and
Alexandra Birch and
Barry Haddow},
title = {Evaluating Discourse Phenomena in Neural Machine Translation},
pages = {1304--1313},
publisher = {Annual Conference of the North American Chapter of the Association for Computational Linguistics},
year = {2018}
}
@inproceedings{DBLP:conf/wmt/MullerRVS18,
author = {Mathias M{\"{u}}ller and
Annette Rios and
Elena Voita and
Rico Sennrich},
title = {A Large-Scale Test Set for the Evaluation of Context-Aware Pronoun
Translation in Neural Machine Translation},
pages = {61--72},
publisher = {Annual Meeting of the Association for Computational Linguistics},
year = {2018}
}
@inproceedings{DBLP:conf/iclr/KitaevKL20,
author = {Nikita Kitaev and
Lukasz Kaiser and
Anselm Levskaya},
title = {Reformer: The Efficient Transformer},
publisher = {International Conference on Learning Representations},
year = {2020}
}
@inproceedings{agrawal2018contextual,
title={Contextual handling in neural machine translation: Look behind, ahead and on both sides},
author={Agrawal, Ruchit Rajeshkumar and Turchi, Marco and Negri, Matteo},
booktitle={Annual Conference of the European Association for Machine Translation},
pages={11--20},
year={2018}
}
@inproceedings{DBLP:conf/emnlp/WerlenRPH18,
author = {Lesly Miculicich Werlen and
Dhananjay Ram and
Nikolaos Pappas and
James Henderson},
title = {Document-Level Neural Machine Translation with Hierarchical Attention
Networks},
pages = {2947--2954},
publisher = {Conference on Empirical Methods in Natural Language Processing},
year = {2018}
}
@inproceedings{DBLP:conf/naacl/MarufMH19,
author = {Sameen Maruf and
Andr{\'{e}} F. T. Martins and
Gholamreza Haffari},
title = {Selective Attention for Context-aware Neural Machine Translation},
pages = {3092--3102},
publisher = {Annual Conference of the North American Chapter of the Association for Computational Linguistics},
year = {2019}
}
@inproceedings{DBLP:conf/emnlp/TanZXZ19,
author = {Xin Tan and
Longyin Zhang and
Deyi Xiong and
Guodong Zhou},
title = {Hierarchical Modeling of Global Context for Document-Level Neural
Machine Translation},
pages = {1576--1585},
publisher = {Conference on Empirical Methods in Natural Language Processing},
year = {2019}
}
@inproceedings{DBLP:conf/emnlp/YangZMGFZ19,
author = {Zhengxin Yang and
Jinchao Zhang and
Fandong Meng and
Shuhao Gu and
Yang Feng and
Jie Zhou},
title = {Enhancing Context Modeling with a Query-Guided Capsule Network for
Document-level Translation},
pages = {1527--1537},
publisher = {Conference on Empirical Methods in Natural Language Processing},
year = {2019}
}
@inproceedings{DBLP:conf/ijcai/ZhengYHCB20,
author = {Zaixiang Zheng and
Xiang Yue and
Shujian Huang and
Jiajun Chen and
Alexandra Birch},
title = {Towards Making the Most of Context in Neural Machine Translation},
pages = {3983--3989},
publisher = {International Joint Conference on Artificial Intelligence},
year = {2020}
}
@article{DBLP:journals/tacl/TuLSZ18,
author = {Zhaopeng Tu and
Yang Liu and
Shuming Shi and
Tong Zhang},
title = {Learning to Remember Translation History with a Continuous Cache},
journal = {Transactions of the Association for Computational Linguistics},
volume = {6},
pages = {407--420},
year = {2018}
}
@inproceedings{DBLP:conf/discomt/ScherrerTL19,
author = {Yves Scherrer and
J{\"{o}}rg Tiedemann and
Sharid Lo{\'{a}}iciga},
title = {Analysing concatenation approaches to document-level {NMT} in two
different domains},
pages = {51--61},
publisher = {Annual Meeting of the Association for Computational Linguistics},
year = {2019}
}
@inproceedings{DBLP:conf/wmt/GonzalesMS17,
author = {Annette Rios Gonzales and
Laura Mascarell and
Rico Sennrich},
title = {Improving Word Sense Disambiguation in Neural Machine Translation
with Sense Embeddings},
pages = {11--19},
publisher = {Annual Meeting of the Association for Computational Linguistics},
year = {2017}
}
@inproceedings{DBLP:conf/acl/LiLWJXZLL20,
author = {Bei Li and
Hui Liu and
Ziyang Wang and
Yufan Jiang and
Tong Xiao and
Jingbo Zhu and
Tongran Liu and
Changliang Li},
title = {Does Multi-Encoder Help? {A} Case Study on Context-Aware Neural Machine
Translation},
pages = {3512--3518},
publisher = {Annual Meeting of the Association for Computational Linguistics},
year = {2020}
}
@inproceedings{DBLP:conf/discomt/KimTN19,
author = {Yunsu Kim and
Duc Thanh Tran and
Hermann Ney},
title = {When and Why is Document-level Context Useful in Neural Machine Translation?},
pages = {24--34},
publisher = {Annual Meeting of the Association for Computational Linguistics},
year = {2019}
}
@inproceedings{DBLP:conf/discomt/SugiyamaY19,
author = {Amane Sugiyama and
Naoki Yoshinaga},
title = {Data augmentation using back-translation for context-aware neural
machine translation},
pages = {35--44},
publisher = {Annual Meeting of the Association for Computational Linguistics},
year = {2019}
}
@inproceedings{DBLP:conf/pacling/YamagishiK19,
author = {Hayahide Yamagishi and
Mamoru Komachi},
title = {Improving Context-Aware Neural Machine Translation with Target-Side
Context},
volume = {1215},
pages = {112--122},
publisher = {Springer},
year = {2019}
}
@inproceedings{DBLP:conf/emnlp/ZhangLSZXZL18,
author = {Jiacheng Zhang and
Huanbo Luan and
Maosong Sun and
Feifei Zhai and
Jingfang Xu and
Min Zhang and
Yang Liu},
title = {Improving the Transformer Translation Model with Document-Level Context},
pages = {533--542},
publisher = {Conference on Empirical Methods in Natural Language Processing},
year = {2018}
}
@inproceedings{DBLP:conf/coling/KuangX18,
author = {Shaohui Kuang and
Deyi Xiong},
title = {Fusing Recency into Neural Machine Translation with an Inter-Sentence
Gate Model},
pages = {607--617},
publisher = {International Conference on Computational Linguistics},
year = {2018}
}
@inproceedings{DBLP:conf/emnlp/WangTWL17,
author = {Longyue Wang and
Zhaopeng Tu and
Andy Way and
Qun Liu},
title = {Exploiting Cross-Sentence Context for Neural Machine Translation},
pages = {2826--2831},
publisher = {Conference on Empirical Methods in Natural Language Processing},
year = {2017}
}
@inproceedings{DBLP:conf/aaai/XiongH0W19,
author = {Hao Xiong and
Zhongjun He and
Hua Wu and
Haifeng Wang},
title = {Modeling Coherence for Discourse Neural Machine Translation},
pages = {7338--7345},
publisher = {{AAAI} Press},
year = {2019}
}
@article{DBLP:journals/tacl/YuSSLKBD20,
author = {Lei Yu and
Laurent Sartran and
Wojciech Stokowiec and
Wang Ling and
Lingpeng Kong and
Phil Blunsom and
Chris Dyer},
title = {Better Document-Level Machine Translation with Bayes' Rule},
journal = {Transactions of the Association for Computational Linguistics},
volume = {8},
pages = {346--360},
year = {2020}
}
@article{DBLP:journals/corr/abs-1903-04715,
author = {S{\'{e}}bastien Jean and
Kyunghyun Cho},
title = {Context-Aware Learning for Neural Machine Translation},
journal = {CoRR},
volume = {abs/1903.04715},
year = {2019}
}
@inproceedings{DBLP:conf/acl/SaundersSB20,
author = {Danielle Saunders and
Felix Stahlberg and
Bill Byrne},
title = {Using Context in Neural Machine Translation Training Objectives},
pages = {7764--7770},
publisher = {Annual Meeting of the Association for Computational Linguistics},
year = {2020}
}
@inproceedings{DBLP:conf/mtsummit/StojanovskiF19,
author = {Dario Stojanovski and
Alexander M. Fraser},
title = {Improving Anaphora Resolution in Neural Machine Translation Using
Curriculum Learning},
pages = {140--150},
publisher = {Annual Conference of the European Association for Machine Translation},
year = {2019}
}
@article{DBLP:journals/corr/abs-1911-03110,
author = {Liangyou Li and
Xin Jiang and
Qun Liu},
title = {Pretrained Language Models for Document-Level Neural Machine Translation},
journal = {CoRR},
volume = {abs/1911.03110},
year = {2019}
}
@article{DBLP:journals/tacl/LiuGGLEGLZ20,
author = {Yinhan Liu and
Jiatao Gu and
Naman Goyal and
Xian Li and
Sergey Edunov and
Marjan Ghazvininejad and
Mike Lewis and
Luke Zettlemoyer},
title = {Multilingual Denoising Pre-training for Neural Machine Translation},
journal = {Transactions of the Association for Computational Linguistics},
volume = {8},
pages = {726--742},
year = {2020}
}
@inproceedings{DBLP:conf/wmt/MarufMH18,
author = {Sameen Maruf and
Andr{\'{e}} F. T. Martins and
Gholamreza Haffari},
title = {Contextual Neural Model for Translating Bilingual Multi-Speaker Conversations},
pages = {101--112},
publisher = {Annual Meeting of the Association for Computational Linguistics},
year = {2018}
}
%%%%% chapter 17------------------------------------------------------
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
......@@ -12344,7 +12914,7 @@ author = {Zhuang Liu and
Jozef Mokry and
Maria Nadejde},
title = {Nematus: a Toolkit for Neural Machine Translation},
publisher = {European Association of Computational Linguistics},
publisher = {Annual Conference of the European Association for Machine Translation},
pages = {65--68},
year = {2017}
}
......