Commit 34080625 by zengxin

10 11 appendix

parent 88515850
......@@ -925,14 +925,14 @@ a (\mathbi{s},\mathbi{h}) = \left\{ \begin{array}{ll}
%----------------------------------------------------------------------------------------
\subsection{训练}
\parinterval 在基于梯度的方法中,模型参数可以通过损失函数$L$对参数的梯度进行不断更新。对于第$\textrm{step}$步参数更新,首先进行神经网络的前向计算,之后进行反向计算,并得到所有参数的梯度信息,再使用下面的规则进行参数更新:
\begin{eqnarray}
\mathbi{w}_{\textrm{step}+1} = \mathbi{w}_{\textrm{step}} - \alpha \cdot \frac{ \partial L(\mathbi{w}_{\textrm{step}})} {\partial \mathbi{w}_{\textrm{step}} }
\label{eq:10-30}
\end{eqnarray}
\noindent 其中,$\mathbi{w}_{\textrm{step}}$表示更新前的模型参数,$\mathbi{w}_{\textrm{step}+1}$表示更新后的模型参数,$L(\mathbi{w}_{\textrm{step}})$表示模型相对于$\mathbi{w}_{\textrm{step}}$ 的损失,$\frac{\partial L(\mathbi{w}_{\textrm{step}})} {\partial \mathbi{w}_{\textrm{step}} }$表示损失函数的梯度,$\alpha$是更新的步长。也就是说,给定一定量的训练数据,不断执行公式\eqref{eq:10-30}的过程。反复使用训练数据,直至模型参数达到收敛或者损失函数不再变化。通常,把公式的一次执行称为“一步”更新/训练,把访问完所有样本的训练称为“一轮”训练。
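作为示意,公式\eqref{eq:10-30}的参数更新过程可以用如下Python代码模拟(其中的损失函数$L(w)=(w-3)^2$和各数值均为假设的示例,并非书中系统的实际实现):

```python
import numpy as np

def sgd_step(w, grad_fn, alpha=0.1):
    """执行一步梯度下降更新: w_{step+1} = w_step - alpha * dL/dw"""
    return w - alpha * grad_fn(w)

# 以假设的损失函数 L(w) = (w - 3)^2 为例, 其梯度为 2(w - 3)
grad = lambda w: 2.0 * (w - 3.0)

w = np.array([0.0])            # 初始参数 w_0
for step in range(100):        # 反复执行更新规则
    w = sgd_step(w, grad)
# 迭代后 w 收敛到损失函数的最小值点 3 附近
```

可以看到,只要步长$\alpha$选择合理,反复执行更新规则就会使参数逐步逼近损失函数的极小值点。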
\parinterval 将公式\eqref{eq:10-30}应用于神经机器翻译有几个基本问题需要考虑:1)损失函数的选择;2)参数初始化的策略,也就是如何设置$\mathbi{w}_0$;3)优化策略和学习率调整策略;4)训练加速。下面对这些问题进行讨论。
......@@ -1190,7 +1190,7 @@ L(\mathbi{Y},\widehat{\mathbi{Y}}) = \sum_{j=1}^n L_{\textrm{ce}}(\mathbi{y}_j,\
\subsubsection{2. 束搜索}
\vspace{0.5em}
\parinterval 束搜索是一种启发式图搜索算法。相比于全搜索,它可以减少搜索所占用的空间和时间,在每一步扩展的时候,剪掉一些质量比较差的结点,保留下一些质量较高的结点。具体到机器翻译任务,对于每一个目标语言位置,束搜索选择了概率最大的前$k$个单词进行扩展(其中$k$叫做束宽度,或简称为束宽)。如图\ref{fig:10-34}所示,假设\{$y_1, y_2,..., y_n$\}表示生成的目标语言序列,且$k=3$,则束搜索的具体过程为:在预测第一个位置时,可以通过模型得到$y_1$的概率分布,选取概率最大的前3个单词作为候选结果(假设分别为“have”, “has”, “it”)。在预测第二个位置的单词时,模型针对已经得到的三个候选结果(“have”, “has”, “it”)计算第二个单词的概率分布。因为$y_2$对应$|V|$种可能,总共可以得到$3 \times |V|$种结果。然后从中选取使序列概率$\funp{P}(y_2,y_1| \seq{{x}})$最大的前三个$y_2$作为新的输出结果,这样便得到了前两个位置的top-3译文。在预测其他位置时也是如此,不断重复此过程直到推断结束。可以看到,束搜索的搜索空间大小与束宽度有关,也就是:束宽度越大,搜索空间越大,更有可能搜索到质量更高的译文,但同时搜索会更慢。束宽度等于3,意味着每次只考虑三个最有可能的结果,贪婪搜索实际上便是束宽度为1的情况。在神经机器翻译系统实现中,一般束宽度设置在4~8之间。
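下面用一段简化的Python代码示意束搜索在每一步只保留前$k$个候选的剪枝过程(其中的词表与概率均为虚构示例,且为简化起见假设每一步的分布与已生成的前缀无关,实际系统中第二个位置的分布依赖于已得到的候选):

```python
import math

def beam_search(step_probs, k=3):
    """step_probs[t] 是第 t 步的 {单词: 概率} 分布 (简化假设: 与前缀无关)。
    每一步将所有候选各扩展 |V| 种可能, 再按序列对数概率只保留前 k 个。"""
    beams = [([], 0.0)]  # (前缀, 对数概率)
    for probs in step_probs:
        candidates = []
        for prefix, score in beams:
            for word, p in probs.items():
                candidates.append((prefix + [word], score + math.log(p)))
        # 剪枝: 只保留得分最高的 k 个候选
        beams = sorted(candidates, key=lambda x: x[1], reverse=True)[:k]
    return beams

# 两个目标语言位置上的 (虚构) 概率分布
step_probs = [
    {"have": 0.5, "has": 0.3, "it": 0.1, "is": 0.1},
    {"you": 0.6, "a": 0.3, "an": 0.1},
]
top = beam_search(step_probs, k=3)
# top[0][0] 为得分最高的序列 ["have", "you"]
```

当`k=1`时,该过程退化为贪婪搜索;增大`k`会使每一步的候选数变为$k \times |V|$,搜索更慢但更可能找到高质量译文。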
%----------------------------------------------
\begin{figure}[htp]
......@@ -1245,7 +1245,7 @@ L(\mathbi{Y},\widehat{\mathbi{Y}}) = \sum_{j=1}^n L_{\textrm{ce}}(\mathbi{y}_j,\
% NEW SECTION
%----------------------------------------------------------------------------------------
\sectionnewpage
\section{小结及拓展阅读}
\parinterval 神经机器翻译是近几年的热门方向。无论是前沿性的技术探索,还是面向应用落地的系统研发,神经机器翻译已经成为当下最好的选择之一。研究人员对神经机器翻译的热情使得这个领域得到了快速的发展。本章作为神经机器翻译的入门章节,对神经机器翻译的建模思想和基础框架进行了描述。同时,对常用的神经机器翻译架构\ \dash \ 循环神经网络进行了讨论与分析。
......
......@@ -64,8 +64,8 @@ $\otimes$: & 按位乘运算 \\
\draw[-latex,thick] (c2.east) -- ([xshift=0.4cm]c2.east);
\node[inner sep=0pt, font=\tiny] at (0.75cm, -0.4cm) {$\mathbi{x}$};
\node[inner sep=0pt, font=\tiny] at ([yshift=-0.8cm]a.south) {$\mathbi{A}=\mathbi{x} * \mathbi{W} + \mathbi{b}_{\mathbi{W}}$};
\node[inner sep=0pt, font=\tiny] at ([yshift=-0.8cm]b.south) {$\mathbi{B}=\mathbi{x} * \mathbi{V} + \mathbi{b}_{\mathbi{V}}$};
\node[inner sep=0pt, font=\tiny] at (8.2cm, -0.4cm) {$\mathbi{y}=\mathbi{A} \otimes \sigma(\mathbi{B})$};
\end{tikzpicture}
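上图中的门控结构计算$\mathbi{y}=\mathbi{A} \otimes \sigma(\mathbi{B})$,其中$\otimes$为按位乘运算。可以用如下NumPy代码示意这一门控机制(这里用矩阵乘法代替卷积运算,矩阵维度均为假设值,仅用于说明计算流程):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gated_unit(x, W, b_W, V, b_V):
    """A = xW + b_W, B = xV + b_V, y = A ⊙ σ(B) (⊙ 为按位乘)"""
    A = x @ W + b_W
    B = x @ V + b_V
    return A * sigmoid(B)

rng = np.random.default_rng(0)
x = rng.standard_normal((2, 4))                       # 2 个位置, 输入维度 4 (假设)
W, V = rng.standard_normal((4, 3)), rng.standard_normal((4, 3))
b_W, b_V = np.zeros(3), np.zeros(3)
y = gated_unit(x, W, b_W, V, b_V)                     # 输出形状 (2, 3)
```

$\sigma(\mathbi{B})$的取值在0到1之间,起到“门”的作用:它逐元素地控制$\mathbi{A}$中有多少信息可以通过。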
\ No newline at end of file
......@@ -579,7 +579,7 @@
% NEW SUB-SECTION
%----------------------------------------------------------------------------------------
\section{小结及拓展阅读}
\parinterval 卷积是一种高效的神经网络结构,在图像、语音处理等领域取得了令人瞩目的成绩。本章介绍了卷积的概念及其特性,并对池化、填充等操作进行了讨论。本章介绍了具有高并行计算能力的机器翻译范式,即基于卷积神经网络的编码器-解码器框架。其在机器翻译任务上表现出色,并大幅度缩短了模型的训练周期。除了基础部分,本章还针对卷积计算进行了延伸,内容涉及逐通道卷积、逐点卷积、轻量卷积和动态卷积等。除了上述提及的内容,卷积神经网络及其变种在文本分类、命名实体识别、关系分类、事件抽取等其他自然语言处理任务上也有许多应用\upcite{Kim2014ConvolutionalNN,2011Natural,DBLP:conf/cncl/ZhouZXQBX17,DBLP:conf/acl/ChenXLZ015,DBLP:conf/coling/ZengLLZZ14}
......
......@@ -26,6 +26,110 @@
\begin{appendices}
\chapter{附录A}
\label{appendix-A}
\parinterval 从实践的角度,机器翻译的发展主要可以归功于两方面的推动作用:开源系统和评测。开源系统通过代码共享的方式使得最新的研究成果可以快速传播,同时实验结果可以复现。而评测比赛,使得各个研究组织的成果可以进行科学的对比,共同推动机器翻译的发展与进步。此外,开源项目也促进了不同团队之间的协作,让研究人员在同一个平台上集中力量攻关。
%----------------------------------------------------------------------------------------
% NEW SECTION
%----------------------------------------------------------------------------------------
\section{统计机器翻译开源系统}
\begin{itemize}
\vspace{0.5em}
\item NiuTrans.SMT。NiuTrans\upcite{Tong2012NiuTrans}是由东北大学自然语言处理实验室自主研发的统计机器翻译系统,该系统可支持基于短语的模型、基于层次短语的模型以及基于句法的模型。由于使用C++语言开发,所以该系统运行速度快,所占存储空间少。系统中内嵌有$n$-gram语言模型,故无需使用其他系统即可完成语言建模。网址:\url{http://opensource.niutrans.com/smt/index.html}
\vspace{0.5em}
\item Moses。Moses\upcite{Koehn2007Moses}是统计机器翻译时代最著名的系统之一,(主要)由爱丁堡大学的机器翻译团队开发。最新的Moses系统支持很多的功能,例如,它既支持基于短语的模型,也支持基于句法的模型。Moses 提供因子化翻译模型(Factored Translation Model),因此该模型可以很容易地对不同层次的信息进行建模。此外,它允许将混淆网络和字格作为输入,可缓解系统的1-best输出中的错误。Moses还提供了很多有用的脚本和工具,被机器翻译研究者广泛使用。网址:\url{http://www.statmt.org/moses/}
\vspace{0.5em}
\item Joshua。Joshua\upcite{Li2010Joshua}是由约翰霍普金斯大学的语言和语音处理中心开发的层次短语翻译系统。由于Joshua是由Java语言开发,所以它在不同的平台上运行或开发时具有良好的可扩展性和可移植性。Joshua也是使用非常广泛的开源机器翻译系统之一。网址:\url{https://cwiki.apache.org/confluence/display/JOSHUA/}
\vspace{0.5em}
\item SilkRoad。SilkRoad是由五个国内机构(中科院计算所、中科院软件所、中科院自动化所、厦门大学和哈尔滨工业大学)联合开发的基于短语的统计机器翻译系统。该系统是中国乃至亚洲地区第一个开源的统计机器翻译系统。SilkRoad支持多种解码器和规则提取模块,这样可以组合成不同的系统,提供多样的选择。网址:\url{http://www.nlp.org.cn/project/project.php?projid=14}
\vspace{0.5em}
\item SAMT。SAMT\upcite{zollmann2007the}是由卡内基梅隆大学机器翻译团队开发的语法增强的统计机器翻译系统。SAMT在解码的时候使用目标树来生成翻译规则,而不严格遵守目标语言的语法。SAMT 的一个亮点是它提供了简单但高效的方式在机器翻译中使用句法信息。由于SAMT在Hadoop中实现,它可受益于大数据集的分布式处理。网址:\url{http://www.cs.cmu.edu/zollmann/samt/}
\vspace{0.5em}
\item HiFST。HiFST\upcite{iglesias2009hierarchical}是剑桥大学开发的统计机器翻译系统。该系统完全基于有限状态自动机实现,因此非常适合对搜索空间进行有效的表示。网址:\url{http://ucam-smt.github.io/}
\vspace{0.5em}
\item cdec。cdec\upcite{dyer2010cdec}是一个强大的解码器,由Chris Dyer和他的合作者们开发。cdec的一个主要特点是使用了翻译模型的统一内部表示,并为结构预测问题的各种模型和算法提供了实现框架。因此,cdec也可以被用作对齐系统或者更通用的学习框架。此外,由于使用C++语言编写,cdec的运行速度较快。网址:\url{http://cdec-decoder.org/index.php?title=MainPage}
\vspace{0.5em}
\item Phrasal。Phrasal\upcite{Cer2010Phrasal}是由斯坦福大学自然语言处理小组开发的系统。除了传统的基于短语的模型,Phrasal还支持基于非层次短语的模型,这种模型将基于短语的翻译延伸到非连续的短语翻译,增加了模型的泛化能力。网址:\url{http://nlp.stanford.edu/phrasal/}
\vspace{0.5em}
\item Jane。Jane\upcite{vilar2012jane}是一个基于短语和基于层次短语的机器翻译系统,由亚琛工业大学的人类语言技术与模式识别小组开发。Jane提供了系统融合模块,因此可以非常方便地对多个系统进行融合。网址:\url{https://www-i6.informatik.rwth-aachen.de/jane/}
\vspace{0.5em}
\item GIZA++。GIZA++\upcite{och2003systematic}是Franz Och研发的用于训练IBM模型1-5和HMM单词对齐模型的工具包。在早期,GIZA++是所有统计机器翻译系统中词对齐的标配工具。网址:\url{https://github.com/moses-smt/giza-pp}
\vspace{0.5em}
\item FastAlign。FastAlign\upcite{DBLP:conf/naacl/DyerCS13}是一个快速,无监督的词对齐工具,由卡内基梅隆大学开发。网址:\url{https://github.com/clab/fast\_align}
\vspace{0.5em}
\end{itemize}
%----------------------------------------------------------------------------------------
% NEW SECTION
%----------------------------------------------------------------------------------------
\section{神经机器翻译开源系统}
\begin{itemize}
\vspace{0.5em}
\item GroundHog。GroundHog\upcite{bahdanau2014neural}基于Theano\upcite{al2016theano}框架,是由蒙特利尔大学LISA实验室使用Python语言编写的一个框架,旨在提供灵活而高效的方式来实现复杂的循环神经网络模型。它提供了包括LSTM在内的多种模型。Bahdanau等人在此框架上又编写了GroundHog神经机器翻译系统。该系统也被用作很多论文的基线系统。网址:\url{https://github.com/lisa-groundhog/GroundHog}
\vspace{0.5em}
\item Nematus。Nematus\upcite{DBLP:journals/corr/SennrichFCBHHJL17}是英国爱丁堡大学开发的,基于Theano框架的神经机器翻译系统。该系统使用GRU作为隐层单元,支持多层网络。Nematus 编码端有正向和反向的编码方式,可以同时提取源语句子中的上下文信息。该系统的一个优点是,它可以支持输入端有多个特征的输入(例如词的词性等)。网址:\url{https://github.com/EdinburghNLP/nematus}
\vspace{0.5em}
\item ZophRNN。ZophRNN\upcite{zoph2016simple}是由南加州大学的Barret Zoph 等人使用C++语言开发的系统。ZophRNN既可以训练序列表示模型(如语言模型),也可以训练序列到序列的模型(如神经机器翻译模型)。当训练神经机器翻译系统时,ZophRNN也支持多源输入。网址:\url{https://github.com/isi-nlp/Zoph\_RNN}
\vspace{0.5em}
\item Fairseq。Fairseq\upcite{Ottfairseq}是由Facebook开发的,基于PyTorch框架的用以解决序列到序列问题的工具包,其中包括基于卷积神经网络、基于循环神经网络、基于Transformer的模型等。Fairseq是当今使用最广泛的神经机器翻译开源系统之一。网址:\url{https://github.com/facebookresearch/fairseq}
\vspace{0.5em}
\item Tensor2Tensor。Tensor2Tensor\upcite{Vaswani2018Tensor2TensorFN}是由谷歌推出的,基于TensorFlow框架的开源系统。该系统基于Transformer模型,因此可以支持大多数序列到序列任务。得益于Transformer 的网络结构,系统的训练速度较快。现在,Tensor2Tensor也是机器翻译领域广泛使用的开源系统之一。网址:\url{https://github.com/tensorflow/tensor2tensor}
\vspace{0.5em}
\item OpenNMT。OpenNMT\upcite{KleinOpenNMT}系统是由哈佛大学自然语言处理研究组开源的,基于Torch框架的神经机器翻译系统。OpenNMT系统的早期版本使用Lua 语言编写,现在也扩展到了TensorFlow和PyTorch,设计简单易用,易于扩展,同时保持效率和翻译精度。网址:\url{https://github.com/OpenNMT/OpenNMT}
\vspace{0.5em}
\item 斯坦福神经机器翻译开源代码库。斯坦福大学自然语言处理组(Stanford NLP)发布了一篇教程,介绍了该研究组在神经机器翻译上的研究信息,同时实现了多种翻译模型\upcite{luong2016acl_hybrid}。 网址:\url{https://nlp.stanford.edu/projects/nmt/}
\vspace{0.5em}
\item THUMT。清华大学NLP团队实现的神经机器翻译系统,支持Transformer等模型\upcite{ZhangTHUMT}。该系统主要基于TensorFlow和Theano实现,其中Theano版本包含了RNNsearch模型,训练方式包括MLE(Maximum Likelihood Estimation)、MRT(Minimum Risk Training)、SST(Semi-Supervised Training)。TensorFlow 版本实现了Seq2Seq、RNNsearch、Transformer三种基本模型。网址:\url{https://github.com/THUNLP-MT/THUMT}
\vspace{0.5em}
\item NiuTrans.NMT。由小牛翻译团队基于NiuTensor实现的神经机器翻译系统。支持循环神经网络、Transformer等结构,并支持语言建模、序列标注、机器翻译等任务。支持机器翻译的GPU与CPU训练及解码。其小巧易用,为开发人员提供了快速二次开发的基础。此外,NiuTrans.NMT已经得到了大规模应用,形成了支持304种语言翻译的小牛翻译系统。网址:\url{http://opensource.niutrans.com/niutensor/index.html}
\vspace{0.5em}
\item MARIAN。主要由微软翻译团队搭建\upcite{JunczysMarian},是使用C++实现的用于GPU/CPU训练和解码的引擎,支持多GPU训练和批量解码,最小限度依赖第三方库,静态编译一次之后,复制其二进制文件就能在其他平台使用。网址:\url{https://marian-nmt.github.io/}
\vspace{0.5em}
\item Sockeye。由Awslabs开发的神经机器翻译框架\upcite{hieber2017sockeye}。其中支持RNNsearch、Transformer、CNN等翻译模型,同时提供了从图片翻译到文字的模块以及WMT 德英新闻翻译、领域适应任务、多语言零资源翻译任务的教程。网址:\url{https://awslabs.github.io/sockeye/}
\vspace{0.5em}
\item CytonMT。由NICT开发的一种用C++实现的神经机器翻译开源工具包\upcite{WangCytonMT}。主要支持Transformer模型,并支持一些常用的训练方法以及解码方法。网址:\url{https://github.com/arthurxlw/cytonMt}
\vspace{0.5em}
\item OpenSeq2Seq。由NVIDIA团队开发的\upcite{DBLP:journals/corr/abs-1805-10387}基于TensorFlow的模块化架构,用于序列到序列的模型,允许从可用组件中组装新模型,支持混合精度训练,利用NVIDIA Volta Turing GPU中的Tensor核心,基于Horovod的快速分布式训练,支持多GPU,多节点多模式。网址:\url{https://nvidia.github.io/OpenSeq2Seq/html/index.html}
\vspace{0.5em}
\item NMTPyTorch。由勒芒大学语言实验室发布的基于序列到序列框架的神经网络翻译系统\upcite{nmtpy2017},NMTPyTorch的核心部分依赖于Numpy,PyTorch和tqdm。其允许训练各种端到端神经体系结构,包括但不限于神经机器翻译、图像字幕和自动语音识别系统。网址:\url{https://github.com/lium-lst/nmtpytorch}
\vspace{0.5em}
\end{itemize}
%----------------------------------------------------------------------------------------
% NEW SECTION
%----------------------------------------------------------------------------------------
\section{公开评测任务}
\parinterval 机器翻译相关评测主要有两种组织形式,一种是由政府及国家相关机构组织,权威性强。如由美国国家标准技术研究所组织的NIST评测、日本国家科学咨询系统中心主办的NACSIS Test Collections for IR(NTCIR)PatentMT、日本科学振兴机构(Japan Science and Technology Agency,简称JST)等组织联合举办的Workshop on Asian Translation(WAT)以及国内由中文信息学会主办的全国机器翻译大会(China Conference on Machine Translation,简称CCMT);另一种是由相关学术机构组织,具有领域针对性的特点,如倾向新闻领域的Conference on Machine Translation(WMT)以及面向口语的International Workshop on Spoken Language Translation(IWSLT)。下面将针对上述评测进行简要介绍。
\begin{itemize}
\vspace{0.5em}
\item CCMT(全国机器翻译大会),前身为CWMT(全国机器翻译研讨会)是国内机器翻译领域的旗舰会议,自2005年起已经组织多次机器翻译评测,对国内机器翻译相关技术的发展产生了深远影响。该评测主要针对汉语、英语以及国内的少数民族语言(蒙古语、藏语、维吾尔语等)进行评测,领域包括新闻、口语、政府文件等,不同语言方向对应的领域也有所不同。评价方式不同届略有不同,主要采用自动评价的方式,自CWMT\ 2013起则针对某些领域增设人工评价。自动评价的指标一般包括BLEU-SBP、BLEU-NIST、TER、METEOR、NIST、GTM、mWER、mPER 以及ICT 等,其中以BLEU-SBP 为主,汉语为目标语的翻译采用基于字符的评价方式,面向英语的翻译采用基于词的评价方式。每年该评测吸引国内外近数十家企业及科研机构参赛,业内认可度极高。关于CCMT的更多信息可参考中文信息学会机器翻译专业委员会相关页面:\url{http://sc.cipsc.org.cn/mt/index.php/CWMT.html}
\vspace{0.5em}
\item WMT由Special Interest Group for Machine Translation(SIGMT)主办,会议自2006年起每年召开一次,是一个涉及机器翻译多种任务的综合性会议,包括多领域翻译评测任务、质量评价任务以及其他与机器翻译相关的任务(如文档对齐评测等)。现在WMT已经成为机器翻译领域的旗舰评测会议,很多研究工作都以WMT评测结果作为基准。WMT评测涉及的语言范围较广,包括英语、德语、芬兰语、捷克语、罗马尼亚语等十多种语言,翻译方向一般以英语为核心,探索英语与其他语言之间的翻译性能,领域包括新闻、信息技术、生物医学。最近,也增加了无指导机器翻译等热门问题。WMT在评价方面类似于CCMT,也采用人工评价与自动评价相结合的方式,自动评价的指标一般为BLEU、TER 等。此外,WMT公开了所有评测数据,因此也经常被机器翻译相关人员所使用。更多WMT的机器翻译评测相关信息可参考SIGMT官网:\url{http://www.sigmt.org/}
\vspace{0.5em}
\item NIST机器翻译评测开始于2001年,是早期机器翻译公开评测中颇具代表性的任务,现在WMT和CCMT很多任务的设置也大量参考了当年NIST评测的内容。NIST评测由美国国家标准技术研究所主办,作为美国国防高级研究计划局(DARPA)中TIDES计划的重要组成部分。早期,NIST评测主要评价阿拉伯语和汉语等语言到英语的翻译效果,评价方法一般采用人工评价与自动评价相结合的方式。人工评价采用5分制评价。自动评价使用多种方式,包括BLEU、METEOR、TER以及HyTER。此外,NIST从2016年起开始对稀缺语言资源技术进行评估,其中机器翻译作为其重要组成部分共同参与评测,评测指标主要为BLEU。除对机器翻译系统进行评测之外,NIST在2008和2010年对于机器翻译的自动评价方法(MetricsMaTr)也进行了评估,以鼓励更多研究人员对现有评价方法进行改进或提出更加贴合人工评价的方法。同时,NIST评测所提供的数据集由于数据质量较高受到众多科研人员喜爱,如MT04、MT06等(汉英)平行语料经常被科研人员在实验中使用。不过,近几年NIST评测已经停止。更多NIST的机器翻译评测相关信息可参考官网:\url{https://www.nist.gov/programs-projects/machine-translation}
\vspace{0.5em}
\item 从2004年开始举办的IWSLT也是颇具特色的机器翻译评测,它主要关注口语相关的机器翻译任务,测试数据包括TED talks的多语言字幕以及QED 教育讲座影片字幕等,语言涉及英语、法语、德语、捷克语、汉语、阿拉伯语等众多语言。此外在IWSLT 2016 中还加入了对于日常对话的翻译评测,尝试将微软Skype中一种语言的对话翻译成其他语言。评价方式采用自动评价的模式,评价标准和WMT类似,一般为BLEU 等指标。另外,IWSLT除了对文本到文本的翻译评测外,还有自动语音识别以及语音转另一种语言的文本的评测。更多IWSLT的机器翻译评测相关信息可参考IWSLT\ 2019官网:\url{https://workshop2019.iwslt.org/}
\vspace{0.5em}
\item 日本举办的机器翻译评测WAT是亚洲范围内的重要评测之一,由日本科学振兴机构(JST)、情报通信研究机构(NICT)等多家机构共同组织,旨在为亚洲各国之间的交流融合提供便利。语言方向主要包括亚洲主流语言(汉语、韩语、印地语等)以及英语对日语的翻译,领域丰富多样,包括学术论文、专利、新闻、食谱等。评价方式包括自动评价(BLEU、RIBES以及AMFM 等)以及人工评价,其特点在于对于测试语料以段落为单位进行评价,考察其上下文关联的翻译效果。更多WAT的机器翻译评测相关信息可参考官网:\url{http://lotus.kuee.kyoto-u.ac.jp/WAT/}
\vspace{0.5em}
\item NTCIR计划是由日本国家科学咨询系统中心策划主办的,旨在建立一个用在自然语言处理以及信息检索相关任务上的日文标准测试集。在NTCIR-9和NTCIR-10中开设的Patent Machine Translation(PatentMT)任务主要针对专利领域进行翻译测试,其目的在于促进机器翻译在专利领域的发展和应用。在NTCIR-9中,评测方式采取人工评价与自动评价相结合,以人工评价为主导。人工评价主要根据准确度和流畅度进行评估,自动评价采用BLEU、NIST等方式进行。NTCIR-10评价方式在此基础上增加了专利审查评估、时间评估以及多语种评估,分别考察机器翻译系统在专利领域翻译的实用性、耗时情况以及不同语种的翻译效果等。更多NTCIR评测相关信息可参考官网:\url{http://research.nii.ac.jp/ntcir/index-en.html}
\vspace{0.5em}
\end{itemize}
\parinterval 以上评测数据大多可以从评测网站上下载,此外部分数据也可以从LDC(Linguistic Data Consortium)上申请,网址为\url{https://www.ldc.upenn.edu/}。ELRA(European Language Resources Association)上也有一些免费的语料库供研究使用,其官网为\url{http://www.elra.info/}。从机器翻译发展的角度看,这些评测任务给相关研究提供了基准数据集,使得不同的系统都可以在同一个环境下进行比较和分析,进而建立了机器翻译研究所需的实验基础。此外,公开评测也使得研究者可以第一时间了解机器翻译研究的最新成果,比如,有多篇ACL会议最佳论文的灵感就来自当年参加机器翻译评测任务的系统。
\end{appendices}
%----------------------------------------------------------------------------------------
% CHAPTER APPENDIX B
%----------------------------------------------------------------------------------------
\begin{appendices}
\chapter{附录B}
\label{appendix-B}
\parinterval 在构建机器翻译系统的过程中,数据是必不可少的,尤其是现在主流的神经机器翻译系统,系统的性能往往受限于语料库规模和质量。所幸的是,随着语料库语言学的发展,一些主流语种的相关语料资源已经十分丰富。
\parinterval 为了方便读者进行相关研究,我们汇总了几个常用的基准数据集,这些数据集已经在机器翻译领域中被广泛使用,有很多之前的相关工作可以进行复现和对比。同时,我们收集了一些常用的平行语料,方便读者进行一些探索。
......@@ -161,12 +265,12 @@
\end{appendices}
%----------------------------------------------------------------------------------------
% CHAPTER APPENDIX B
% CHAPTER APPENDIX C
%----------------------------------------------------------------------------------------
\begin{appendices}
\chapter{附录B}
\label{appendix-B}
\chapter{附录C}
\label{appendix-C}
%----------------------------------------------------------------------------------------
% NEW SECTION
......
......@@ -4,11 +4,12 @@
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%% chapter 1------------------------------------------------------
@book{慧立2000大慈恩寺三藏法師傳,
title={大慈恩寺三藏法師傳},
author={慧立 and 彦悰 and 道宣},
volume={2},
year={2000},
publisher={中华书局}
}
@book{2019cns,
......@@ -3853,16 +3854,11 @@ year = {2012}
%%%%% chapter 9------------------------------------------------------
@article{brown1992class,
title={Class-based n-gram models of natural language},
author={Peter F. Brown and
Vincent J. Della Pietra and
Peter V. De Souza and
Jennifer C. Lai and
Robert L. Mercer},
journal={Computational linguistics},
volume={18},
number={4},
......@@ -3872,10 +3868,8 @@ year = {2012}
@inproceedings{mikolov2012context,
title={Context dependent recurrent neural network language model},
author={Tomas Mikolov and
Geoffrey Zweig},
publisher={IEEE Spoken Language Technology Workshop},
pages={234--239},
year={2012}
......@@ -3883,38 +3877,28 @@ year = {2012}
@article{zaremba2014recurrent,
title={Recurrent Neural Network Regularization},
author={Wojciech Zaremba and
Ilya Sutskever and
Oriol Vinyals},
journal={arXiv: Neural and Evolutionary Computing},
year={2014}
}
@article{zilly2016recurrent,
title={Recurrent Highway Networks},
author={Julian G. Zilly and
Rupesh Kumar Srivastava and
Jan Koutn{\'{\i}}k and
J{\"{u}}rgen Schmidhuber},
journal={International Conference on Machine Learning},
year={2016}
}
@article{merity2017regularizing,
title={Regularizing and optimizing LSTM language models},
author={Stephen Merity and
Nitish Shirish Keskar and
Richard Socher},
journal={International Conference on Learning Representations},
year={2017}
}
......@@ -3992,7 +3976,7 @@ year = {2012}
@article{Ba2016LayerN,
author = {Lei Jimmy Ba and
Jamie Ryan Kiros and
Geoffrey Hinton},
title = {Layer Normalization},
journal = {CoRR},
volume = {abs/1607.06450},
......@@ -4017,7 +4001,7 @@ year = {2012}
Satoshi Nakamura},
title = {Incorporating Discrete Translation Lexicons into Neural Machine Translation},
pages = {1557--1567},
publisher = {Conference on Empirical Methods in Natural Language Processing},
year = {2016}
}
......@@ -4061,7 +4045,7 @@ year = {2012}
year={2011}
}
@inproceedings{mccann2017learned,
author = {Bryan Mccann and
James Bradbury and
Caiming Xiong and
Richard Socher},
......@@ -4080,15 +4064,15 @@ year = {2012}
Matt Gardner and
Christopher Clark and
Kenton Lee and
Luke Zettlemoyer},
publisher={Proceedings of the Human Language Technology Conference of the North American Chapter of the Association for Computational Linguistics},
year={2018}
}
@article{Graves2013HybridSR,
title={Hybrid speech recognition with Deep Bidirectional LSTM},
author={Alex Graves and
Navdeep Jaitly and
Abdel-rahman Mohamed},
publisher={IEEE Workshop on Automatic Speech Recognition and Understanding},
......@@ -4100,7 +4084,7 @@ year = {2012}
title={Character-Word LSTM Language Models},
author={Lyan Verwimp and
Joris Pelemans and
Hugo Van Hamme and
Patrick Wambacq},
publisher={European Association of Computational Linguistics},
year={2017}
......@@ -4111,7 +4095,7 @@ year = {2012}
Kyunghyun Cho},
title = {Gated Word-Character Recurrent Language Model},
pages = {1992--1997},
publisher = {Conference on Empirical Methods in Natural Language Processing},
year = {2016}
}
@inproceedings{Hwang2017CharacterlevelLM,
......@@ -4145,7 +4129,7 @@ year = {2012}
title={Larger-Context Language Modelling},
author={Tian Wang and
Kyunghyun Cho},
journal={Annual Meeting of the Association for Computational Linguistics},
year={2015}
}
@article{Adel2015SyntacticAS,
......@@ -4173,7 +4157,7 @@ year = {2012}
}
@inproceedings{Pham2016ConvolutionalNN,
title={Convolutional Neural Network Language Models},
author={Ngoc-quan Pham and
German Kruszewski and
Gemma Boleda},
publisher={Conference on Empirical Methods in Natural Language Processing},
......@@ -4267,9 +4251,9 @@ year = {2012}
@inproceedings{Bastings2017GraphCE,
title={Graph Convolutional Encoders for Syntax-aware Neural Machine Translation},
author={Jasmijn Bastings and
Ivan Titov and Wilker Aziz and
Diego Marcheggiani and
Khalil Sima'an},
publisher={Conference on Empirical Methods in Natural Language Processing},
year={2017}
}
......@@ -4726,8 +4710,8 @@ author = {Yoshua Bengio and
Quoc V. Le and
Ruslan Salakhutdinov},
title = {Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context},
journal = {Annual Meeting of the Association for Computational Linguistics},
pages = {2978--2988},
year = {2019}
}
@inproceedings{li-etal-2019-word,
......@@ -4809,7 +4793,7 @@ author = {Yoshua Bengio and
year = {2017}
}
@article{Hinton2015Distilling,
author = {Geoffrey Hinton and
Oriol Vinyals and
Jeffrey Dean},
title = {Distilling the Knowledge in a Neural Network},
......@@ -4820,7 +4804,7 @@ author = {Yoshua Bengio and
@inproceedings{Ott2018ScalingNM,
title={Scaling Neural Machine Translation},
author={Myle Ott and Sergey Edunov and David Grangier and Michael Auli},
publisher={Annual Meeting of the Association for Computational Linguistics},
year={2018}
}
......@@ -4841,7 +4825,7 @@ author = {Yoshua Bengio and
Alexander M. Rush},
title = {Sequence-Level Knowledge Distillation},
pages = {1317--1327},
publisher = {Conference on Empirical Methods in Natural Language Processing},
year = {2016}
}
@article{Akaike1969autoregressive,
......@@ -4877,7 +4861,7 @@ author = {Yoshua Bengio and
}
@inproceedings{He2018LayerWiseCB,
title={Layer-Wise Coordination between Encoder and Decoder for Neural Machine Translation},
author={Tianyu He and Xu Tan and Yingce Xia and Di He and Tao Qin and Zhibo Chen and Tie-Yan Liu},
publisher={Conference on Neural Information Processing Systems},
year={2018}
}
......@@ -4955,7 +4939,7 @@ author = {Yoshua Bengio and
Deyi Xiong},
title = {Encoding Gated Translation Memory into Neural Machine Translation},
pages = {3042--3047},
publisher = {Conference on Empirical Methods in Natural Language Processing},
year = {2018}
}
@inproceedings{yang-etal-2016-hierarchical,
......@@ -4967,7 +4951,7 @@ author = {Yoshua Bengio and
Eduard H. Hovy},
title = {Hierarchical Attention Networks for Document Classification},
pages = {1480--1489},
publisher = {Proceedings of the Human Language Technology Conference of the North American Chapter of the Association for Computational Linguistics},
year = {2016}
}
%%%%% chapter 10------------------------------------------------------
......@@ -4982,7 +4966,7 @@ author = {Yoshua Bengio and
Jian Sun},
title = {Faster {R-CNN:} Towards Real-Time Object Detection with Region Proposal
Networks},
journal = {{IEEE} Transactions on Pattern Analysis and Machine Intelligence},
volume = {39},
number = {6},
pages = {1137--1149},
......@@ -4998,10 +4982,9 @@ author = {Yoshua Bengio and
Cheng-Yang Fu and
Alexander C. Berg},
title = {{SSD:} Single Shot MultiBox Detector},
publisher = {European Conference on Computer Vision},
volume = {9905},
pages = {21--37},
year = {2016}
}
......@@ -5027,7 +5010,7 @@ author = {Yoshua Bengio and
Qun Liu},
title = {genCNN: {A} Convolutional Architecture for Word Sequence Prediction},
pages = {1567--1576},
publisher = {Annual Meeting of the Association for Computational Linguistics},
year = {2015}
}
......@@ -5037,7 +5020,7 @@ author = {Yoshua Bengio and
Navdeep Jaitly},
title = {Very deep convolutional networks for end-to-end speech recognition},
pages = {4845--4849},
publisher = {International Conference on Acoustics, Speech and Signal Processing},
year = {2017}
}
......@@ -5048,7 +5031,7 @@ author = {Yoshua Bengio and
title = {A deep convolutional neural network using heterogeneous pooling for
trading acoustic invariance with phonetic confusion},
pages = {6669--6673},
publisher = {International Conference on Acoustics, Speech and Signal Processing},
year = {2013}
}
......@@ -5057,8 +5040,7 @@ author = {Yoshua Bengio and
Hieu Pham and
Christopher D. Manning},
title = {Effective Approaches to Attention-based Neural Machine Translation},
publisher = {Conference on Empirical Methods in Natural Language Processing},
pages = {1412--1421},
year = {2015}
}
......@@ -5082,7 +5064,7 @@ author = {Yoshua Bengio and
title = {Leveraging Linguistic Structures for Named Entity Recognition with
Bidirectional Recursive Neural Networks},
pages = {2664--2669},
publisher = {Conference on Empirical Methods in Natural Language Processing},
year = {2017}
}
......@@ -5098,10 +5080,10 @@ author = {Yoshua Bengio and
author = {Emma Strubell and
Patrick Verga and
David Belanger and
Andrew Mccallum},
title = {Fast and Accurate Entity Recognition with Iterated Dilated Convolutions},
pages = {2670--2680},
publisher = {Conference on Empirical Methods in Natural Language Processing},
year = {2017}
}
......@@ -5167,7 +5149,7 @@ author = {Yoshua Bengio and
Tommi S. Jaakkola},
title = {Molding CNNs for text: non-linear, non-consecutive convolutions},
pages = {1565--1575},
publisher = {Conference on Empirical Methods in Natural Language Processing},
year = {2015}
}
......@@ -5177,7 +5159,7 @@ author = {Yoshua Bengio and
title = {Effective Use of Word Order for Text Categorization with Convolutional
Neural Networks},
pages = {103--112},
publisher = {Proceedings of the Human Language Technology Conference of the North American Chapter of the Association for Computational Linguistics},
year = {2015}
}
......@@ -5186,7 +5168,7 @@ author = {Yoshua Bengio and
Ralph Grishman},
title = {Relation Extraction: Perspective from Convolutional Neural Networks},
pages = {39--48},
publisher = {Proceedings of the Human Language Technology Conference of the North American Chapter of the Association for Computational Linguistics},
year = {2015}
}
......@@ -5204,7 +5186,7 @@ author = {Yoshua Bengio and
Barry Haddow and
Alexandra Birch},
title = {Improving Neural Machine Translation Models with Monolingual Data},
publisher = {Annual Meeting of the Association for Computational Linguistics},
year = {2016}
}
......@@ -5219,7 +5201,7 @@ author = {Yoshua Bengio and
@article{Waibel1989PhonemeRU,
title={Phoneme recognition using time-delay neural networks},
author={Alexander Waibel and Toshiyuki Hanazawa and Geoffrey Hinton and Kiyohiro Shikano and Kevin J. Lang},
journal={IEEE Transactions on Acoustics, Speech, and Signal Processing},
year={1989},
volume={37},
......@@ -5228,7 +5210,7 @@ author = {Yoshua Bengio and
@article{LeCun1989BackpropagationAT,
title={Backpropagation Applied to Handwritten Zip Code Recognition},
author={Yann Lecun and Bernhard Boser and John Denker and Don Henderson and Richard E. Howard and Wayne E. Hubbard and Larry Jackel},
journal={Neural Computation},
year={1989},
volume={1},
......@@ -5242,7 +5224,7 @@ author = {Yoshua Bengio and
year={1998},
volume={86},
number={11},
pages={2278-2324}
}
@inproceedings{DBLP:journals/corr/HeZRS15,
......@@ -5253,7 +5235,7 @@ author = {Yoshua Bengio and
title = {Deep Residual Learning for Image Recognition},
publisher = {{IEEE} Conference on Computer Vision and Pattern Recognition},
pages = {770--778},
year = {2016}
}
@inproceedings{DBLP:conf/cvpr/HuangLMW17,
......@@ -5278,10 +5260,9 @@ author = {Yoshua Bengio and
@article{He2020MaskR,
title={Mask R-CNN},
author={Kaiming He and Georgia Gkioxari and Piotr Doll{\'a}r and Ross B. Girshick},
journal={International Conference on Computer Vision},
pages={2961--2969},
year={2017}
}
@inproceedings{Kalchbrenner2014ACN,
......@@ -5316,7 +5297,7 @@ author = {Yoshua Bengio and
author = {C{\'{\i}}cero Nogueira dos Santos and
Maira Gatti},
pages = {69--78},
publisher = {International Conference on Computational Linguistics},
year={2014}
}
......@@ -5373,7 +5354,7 @@ author = {Yoshua Bengio and
Michael Auli},
title = {Pay Less Attention with Lightweight and Dynamic Convolutions},
publisher = {International Conference on Learning Representations},
year = {2019}
}
@inproceedings{kalchbrenner-blunsom-2013-recurrent,
......@@ -5381,8 +5362,8 @@ author = {Yoshua Bengio and
Phil Blunsom},
title = {Recurrent Continuous Translation Models},
pages = {1700--1709},
publisher = {Conference on Empirical Methods in Natural Language Processing},
year = {2013}
}
@article{Wu2016GooglesNM,
......@@ -5458,7 +5439,7 @@ author = {Yoshua Bengio and
author = {Ilya Sutskever and
James Martens and
George E. Dahl and
Geoffrey Hinton},
publisher = {International Conference on Machine Learning},
pages = {1139--1147},
year={2013}
......@@ -5473,7 +5454,7 @@ author = {Yoshua Bengio and
}
@article{JMLR:v15:srivastava14a,
author = {Nitish Srivastava and Geoffrey Hinton and Alex Krizhevsky and Ilya Sutskever and Ruslan Salakhutdinov},
title = {Dropout: A Simple Way to Prevent Neural Networks from Overfitting},
journal = {Journal of Machine Learning Research},
year = {2014},
......@@ -5507,7 +5488,7 @@ author = {Yoshua Bengio and
title={Rigid-motion scattering for image classification},
author={Sifre, Laurent and Mallat, St{\'e}phane},
year={2014},
journal={Citeseer}
}
@article{Taigman2014DeepFaceCT,
......@@ -5566,7 +5547,7 @@ author = {Yoshua Bengio and
Tong Zhang},
title = {Deep Pyramid Convolutional Neural Networks for Text Categorization},
pages = {562--570},
publisher = {Annual Meeting of the Association for Computational Linguistics},
year = {2017}
}
......@@ -5595,7 +5576,7 @@ author = {Yoshua Bengio and
title = {Speech-Transformer: {A} No-Recurrence Sequence-to-Sequence Model for
Speech Recognition},
pages = {5884--5888},
publisher = {International Conference on Acoustics, Speech and Signal Processing},
year = {2018}
}
......@@ -5773,14 +5754,14 @@ author = {Yoshua Bengio and
}
@article{Liu2020LearningTE,
title={Learning to Encode Position for Transformer with Continuous Dynamical Model},
author={Xuanqing Liu and Hsiang-Fu Yu and Inderjit Dhillon and Cho-Jui Hsieh},
journal={ArXiv},
year={2020},
volume={abs/2003.09229}
}
@inproceedings{Jawahar2019WhatDB,
title={What Does BERT Learn about the Structure of Language?},
author={Ganesh Jawahar and Beno{\^{\i}}t Sagot and Djam{\'e} Seddah},
publisher={Annual Meeting of the Association for Computational Linguistics},
year={2019}
}
......@@ -5943,7 +5924,7 @@ author = {Yoshua Bengio and
Translation Models},
volume = {3265},
pages = {115--124},
publisher = {Association for Machine Translation in the Americas},
year = {2004}
}
......@@ -5954,19 +5935,20 @@ author = {Yoshua Bengio and
Bill Byrne},
title = {SGNMT - A Flexible NMT Decoding Platform for Quick Prototyping
of New Models and Search Strategies},
pages = {25--30},
publisher = {Conference on Empirical Methods in Natural Language Processing},
year = {2017}
}
@inproceedings{Liu2016AgreementOT,
title={Agreement on Target-bidirectional Neural Machine Translation},
author={Lemao Liu and
Masao Utiyama and
Andrew M. Finch and
Eiichiro Sumita},
pages = {411--416},
publisher = {Annual Conference of the North American Chapter of the Association for Computational Linguistics},
year = {2016}
}
@inproceedings{DBLP:conf/wmt/LiLXLLLWZXWFCLL19,
......@@ -5988,11 +5970,8 @@ author = {Yoshua Bengio and
Tong Xiao and
Jingbo Zhu},
title = {The NiuTrans Machine Translation Systems for {WMT19}},
pages = {257--266},
publisher = {Annual Meeting of the Association for Computational Linguistics},
year = {2019}
}
......@@ -6001,19 +5980,19 @@ author = {Yoshua Bengio and
Barry Haddow and
Alexandra Birch},
title = {Edinburgh Neural Machine Translation Systems for {WMT} 16},
pages = {371--376},
publisher = {Annual Meeting of the Association for Computational Linguistics},
year = {2016}
}
@article{Stahlberg2018TheUO,
title={The University of Cambridge's Machine Translation Systems for WMT18},
author={Felix Stahlberg and
Adri{\`{a}} de Gispert and
Bill Byrne},
pages = {504--512},
publisher = {Annual Meeting of the Association for Computational Linguistics},
year = {2018}
}
@inproceedings{DBLP:conf/aaai/ZhangSQLJW18,
......@@ -6024,20 +6003,18 @@ author = {Yoshua Bengio and
Rongrong Ji and
Hongji Wang},
title = {Asynchronous Bidirectional Decoding for Neural Machine Translation},
pages = {5698--5705},
publisher = {AAAI Conference on Artificial Intelligence},
year = {2018}
}
@article{Li2017EnhancedNM,
title={Enhanced neural machine translation by learning from draft},
author={Aodong Li and
Shiyue Zhang and
Dong Wang and
Thomas Fang Zheng},
publisher={Asia-Pacific Signal and Information Processing Association Annual Summit and Conference},
year={2017},
pages={1583-1587}
}
......@@ -6045,120 +6022,141 @@ author = {Yoshua Bengio and
@inproceedings{ElMaghraby2018EnhancingTF,
title={Enhancing Translation from English to Arabic Using Two-Phase Decoder Translation},
author={Ayah ElMaghraby and Ahmed Rafea},
pages = {539--549},
publisher = {Intelligent Systems and Applications},
year = {2018}
}
@inproceedings{Geng2018AdaptiveMD,
title={Adaptive Multi-pass Decoder for Neural Machine Translation},
author={Xinwei Geng and
Xiaocheng Feng and
Bing Qin and
Ting Liu},
publisher ={Conference on Empirical Methods in Natural Language Processing},
pages={523--532},
year={2018}
}
@article{Lee2018DeterministicNN,
title={Deterministic Non-Autoregressive Neural Sequence Modeling by Iterative Refinement},
author={Jason Lee and Elman Mansimov and Kyunghyun Cho},
pages = {1173--1182},
publisher = {Conference on Empirical Methods in Natural Language Processing},
year = {2018}
}
@inproceedings{Gu2019LevenshteinT,
title={Levenshtein Transformer},
author={Jiatao Gu and Changhan Wang and Jake Zhao},
publisher = {Conference and Workshop on Neural Information Processing Systems},
pages = {11179--11189},
year = {2019},
}
@inproceedings{Guo2020JointlyMS,
title={Jointly Masked Sequence-to-Sequence Model for Non-Autoregressive Neural Machine Translation},
author={Junliang Guo and Linli Xu and Enhong Chen},
pages = {376--385},
publisher = {Annual Meeting of the Association for Computational Linguistics},
year = {2020}
}
@article{Stahlberg2018AnOS,
title={An Operation Sequence Model for Explainable Neural Machine Translation},
author={Felix Stahlberg and Danielle Saunders and Bill Byrne},
pages = {175--186},
publisher = {Conference on Empirical Methods in Natural Language Processing},
year = {2018}
}
@inproceedings{Stern2019InsertionTF,
title={Insertion Transformer: Flexible Sequence Generation via Insertion Operations},
author={Mitchell Stern and William Chan and Jamie Kiros and Jakob Uszkoreit},
publisher={International Conference on Machine Learning},
pages={5976--5985},
year={2019}
}
@article{stling2017NeuralMT,
title={Neural machine translation for low-resource languages},
author={Robert {\"O}stling and J{\"{o}}rg Tiedemann},
journal={CoRR},
year={2017},
volume={abs/1708.05729}
}
@article{Kikuchi2016ControllingOL,
title={Controlling Output Length in Neural Encoder-Decoders},
author={Yuta Kikuchi and
Graham Neubig and
Ryohei Sasano and
Hiroya Takamura and
Manabu Okumura},
pages = {1328--1338},
publisher = {Conference on Empirical Methods in Natural Language Processing},
year = {2016}
}
@inproceedings{Takase2019PositionalET,
title={Positional Encoding to Control Output Sequence Length},
author={Sho Takase and
Naoaki Okazaki},
publisher={Annual Conference of the North American Chapter of the Association for Computational Linguistics},
pages={3999--4004},
year={2019}
}
@inproceedings{Murray2018CorrectingLB,
title={Correcting Length Bias in Neural Machine Translation},
author={Kenton Murray and David Chiang},
pages = {212--223},
publisher = {Annual Meeting of the Association for Computational Linguistics},
year = {2018}
}
@article{Sountsov2016LengthBI,
title={Length bias in Encoder Decoder Models and a Case for Global Conditioning},
author={Pavel Sountsov and Sunita Sarawagi},
pages = {1516--1525},
publisher = {Conference on Empirical Methods in Natural Language Processing},
year = {2016}
}
@inproceedings{Jean2015MontrealNM,
title={Montreal Neural Machine Translation Systems for WMT'15},
author={S{\'{e}}bastien Jean and
Orhan Firat and
Kyunghyun Cho and
Roland Memisevic and
Yoshua Bengio},
publisher={Conference on Empirical Methods in Natural Language Processing},
pages={134--140},
year={2015}
}
@inproceedings{Yang2018OtemUtemOA,
title={Otem{\&}Utem: Over- and Under-Translation Evaluation Metric for NMT},
author={Jing Yang and
Biao Zhang and
Yue Qin and
Xiangwen Zhang and
Qian Lin and
Jinsong Su},
publisher={CCF International Conference on Natural Language Processing and Chinese Computing},
pages={291--302},
year={2018}
}
@inproceedings{Mi2016CoverageEM,
title={Coverage Embedding Models for Neural Machine Translation},
author={Haitao Mi and
Baskaran Sankaran and
Zhiguo Wang and
Abe Ittycheriah},
pages = {955--960},
publisher = {Conference on Empirical Methods in Natural Language Processing},
year = {2016}
}
@inproceedings{DBLP:conf/emnlp/HuangZM17,
......@@ -6175,7 +6173,8 @@ author = {Yoshua Bengio and
@inproceedings{Wiseman2016SequencetoSequenceLA,
title={Sequence-to-Sequence Learning as Beam-Search Optimization},
author={Sam Wiseman and Alexander M. Rush},
publisher={Conference on Empirical Methods in Natural Language Processing},
pages={1296--1306},
year={2016}
}
......@@ -6192,10 +6191,12 @@ author = {Yoshua Bengio and
@article{Ma2019LearningTS,
title={Learning to Stop in Structured Prediction for Neural Machine Translation},
author={Mingbo Ma and
Renjie Zheng and
Liang Huang},
pages = {1884--1889},
publisher = {Annual Conference of the North American Chapter of the Association for Computational Linguistics},
year = {2019}
}
@inproceedings{KleinOpenNMT,
......@@ -6219,119 +6220,153 @@ author = {Yoshua Bengio and
year = {2015}
}
@inproceedings{Jiang2012LearnedPF,
title={Learned Prioritization for Trading Off Accuracy and Speed},
author={Jiarong Jiang and Adam R. Teichert and Hal Daum{\'e} and Jason Eisner},
publisher={Conference and Workshop on Neural Information Processing Systems},
pages={1340--1348},
year= {2012}
}
@inproceedings{Zheng2020OpportunisticDW,
title={Opportunistic Decoding with Timely Correction for Simultaneous Translation},
author={Renjie Zheng and
Mingbo Ma and
Baigong Zheng and
Kaibo Liu and
Liang Huang},
publisher={Annual Meeting of the Association for Computational Linguistics},
pages={437--442},
year={2020}
}
@inproceedings{Ma2019STACLST,
title={STACL: Simultaneous Translation with Implicit Anticipation and Controllable Latency using Prefix-to-Prefix Framework},
author={Mingbo Ma and
Liang Huang and
Hao Xiong and
Renjie Zheng and
Kaibo Liu and
Baigong Zheng and
Chuanqiang Zhang and
Zhongjun He and
Hairong Liu and
Xing Li and
Hua Wu and
Haifeng Wang},
publisher={Annual Meeting of the Association for Computational Linguistics},
pages={3025--3036},
year={2019}
}
@inproceedings{Gimpel2013ASE,
title={A Systematic Exploration of Diversity in Machine Translation},
author={Kevin Gimpel and Dhruv Batra and Chris Dyer and Gregory Shakhnarovich},
publisher={Conference on Empirical Methods in Natural Language Processing},
pages={1100--1111},
year={2013}
}
@article{Li2016MutualIA,
title={Mutual Information and Diverse Decoding Improve Neural Machine Translation},
author={Jiwei Li and Dan Jurafsky},
journal={CoRR},
year={2016},
volume={abs/1601.00372}
}
@inproceedings{Li2016ADO,
title={A Diversity-Promoting Objective Function for Neural Conversation Models},
author={Jiwei Li and
Michel Galley and
Chris Brockett and
Jianfeng Gao and
Bill Dolan},
publisher={Annual Conference of the North American Chapter of the Association for Computational Linguistics},
pages={110--119},
year={2016}
}
@inproceedings{He2018SequenceTS,
title={Sequence to Sequence Mixture Model for Diverse Machine Translation},
author={Xuanli He and Gholamreza Haffari and Mohammad Norouzi},
pages = {583--592},
publisher = {Conference on Computational Natural Language Learning},
year = {2018}
}
@article{Shen2019MixtureMF,
title={Mixture Models for Diverse Machine Translation: Tricks of the Trade},
author={Tianxiao Shen and Myle Ott and Michael Auli and Marc'Aurelio Ranzato},
pages = {5719--5728},
publisher = {International Conference on Machine Learning},
year = {2019},
}
@article{Wu2020GeneratingDT,
title={Generating Diverse Translation from Model Distribution with Dropout},
author={Xuanfu Wu and Yang Feng and Chenze Shao},
pages={1088--1097},
publisher={Annual Meeting of the Association for Computational Linguistics},
year={2020}
}
@inproceedings{Sun2020GeneratingDT,
title={Generating Diverse Translation by Manipulating Multi-Head Attention},
author={Zewei Sun and Shujian Huang and Hao-Ran Wei and Xin-Yu Dai and Jiajun Chen},
publisher={AAAI Conference on Artificial Intelligence},
pages={8976--8983},
year={2020}
}
@article{Vijayakumar2016DiverseBS,
title={Diverse Beam Search: Decoding Diverse Solutions from Neural Sequence Models},
author={Ashwin K. Vijayakumar and
Michael Cogswell and
Ramprasaath R. Selvaraju and
Qing Sun and
Stefan Lee and
David J. Crandall and
Dhruv Batra},
journal={CoRR},
year={2016},
volume={abs/1610.02424}
}
@inproceedings{Liu2014SearchAwareTF,
title={Search-Aware Tuning for Machine Translation},
author={Lemao Liu and
Liang Huang},
publisher={Conference on Empirical Methods in Natural Language Processing},
pages={1942--1952},
year={2014}
}
@inproceedings{Yu2013MaxViolationPA,
title={Max-Violation Perceptron and Forced Decoding for Scalable MT Training},
author={Heng Yu and Liang Huang and Haitao Mi and Kai Zhao},
publisher={Conference on Empirical Methods in Natural Language Processing},
pages={1112--1123},
year={2013}
}
@inproceedings{Stahlberg2019OnNS,
title={On NMT Search Errors and Model Errors: Cat Got Your Tongue?},
author={Felix Stahlberg and
Bill Byrne},
publisher={Conference on Empirical Methods in Natural Language Processing},
pages={3354--3360},
year={2019}
}
@inproceedings{Niehues2017AnalyzingNM,
title={Analyzing Neural MT Search and Model Performance},
author={Jan Niehues and
Eunah Cho and
Thanh-Le Ha and
Alex Waibel},
pages={11--17},
publisher={Annual Meeting of the Association for Computational Linguistics},
year={2017}
}
......@@ -6346,26 +6381,31 @@ author = {Yoshua Bengio and
@article{Ranzato2016SequenceLT,
title={Sequence Level Training with Recurrent Neural Networks},
author={Marc'Aurelio Ranzato and
Sumit Chopra and
Michael Auli and
Wojciech Zaremba},
publisher={International Conference on Learning Representations},
year={2016}
}
@article{Bengio2015ScheduledSF,
title={Scheduled Sampling for Sequence Prediction with Recurrent Neural Networks},
author={Samy Bengio and
Oriol Vinyals and
Navdeep Jaitly and
Noam Shazeer},
publisher = {Conference and Workshop on Neural Information Processing Systems},
pages = {1171--1179},
year = {2015}
}
@article{Zhang2019BridgingTG,
title={Bridging the Gap between Training and Inference for Neural Machine Translation},
author={Wen Zhang and Yang Feng and Fandong Meng and Di You and Qun Liu},
pages = {4334--4343},
publisher = {Annual Meeting of the Association for Computational Linguistics},
year = {2019}
}
@inproceedings{DBLP:conf/acl/ShenCHHWSL16,
......@@ -6381,15 +6421,6 @@ author = {Yoshua Bengio and
year = {2016},
}
@inproceedings{DBLP:conf/acl/SennrichHB16a,
author = {Rico Sennrich and
Barry Haddow and
......@@ -6433,26 +6464,31 @@ author = {Yoshua Bengio and
@article{Narang2017BlockSparseRN,
title={Block-Sparse Recurrent Neural Networks},
author={Sharan Narang and Eric Undersander and Gregory Diamos},
journal={CoRR},
year={2017},
volume={abs/1711.02782}
}
@article{Gale2019TheSO,
title={The State of Sparsity in Deep Neural Networks},
author={Trevor Gale and
Erich Elsen and
Sara Hooker},
journal={CoRR},
year={2019},
volume={abs/1902.09574}
}
@article{Michel2019AreSH,
author = {Paul Michel and
Omer Levy and
Graham Neubig},
title = {Are Sixteen Heads Really Better than One?},
publisher = {Conference and Workshop on Neural Information Processing Systems},
pages = {14014--14024},
year = {2019}
}
@inproceedings{DBLP:journals/corr/abs-1905-09418,
......@@ -6480,17 +6516,11 @@ author = {Yoshua Bengio and
@article{Katharopoulos2020TransformersAR,
title={Transformers are RNNs: Fast Autoregressive Transformers with Linear Attention},
author={Angelos Katharopoulos and Apoorv Vyas and Nikolaos Pappas and Fran{\c{c}}ois Fleuret},
journal={CoRR},
year={2020},
volume={abs/2006.16236}
}
@article{xiao2011language,
title ={Language Modeling for Syntax-Based Machine Translation Using Tree Substitution Grammars: A Case Study on Chinese-English Translation},
author ={Xiao, Tong and Zhu, Jingbo and Zhu, Muhua},
......@@ -6503,33 +6533,40 @@ author = {Yoshua Bengio and
@inproceedings{Li2009VariationalDF,
title={Variational Decoding for Statistical Machine Translation},
author={Zhifei Li and
Jason Eisner and
Sanjeev Khudanpur},
publisher={Annual Meeting of the Association for Computational Linguistics},
pages={593--601},
year={2009}
}
@article{Bastings2019ModelingLS,
title={Modeling Latent Sentence Structure in Neural Machine Translation},
author={Jasmijn Bastings and
Wilker Aziz and
Ivan Titov and
Khalil Sima'an},
journal = {CoRR},
volume = {abs/1901.06436},
year = {2019}
}
@article{Shah2018GenerativeNM,
title={Generative Neural Machine Translation},
author={Harshil Shah and
David Barber},
publisher={Conference and Workshop on Neural Information Processing Systems},
pages={1353--1362},
year={2018}
}
@article{Su2018VariationalRN,
title={Variational Recurrent Neural Machine Translation},
author={Jinsong Su and Shan Wu and Deyi Xiong and Yaojie Lu and Xianpei Han and Biao Zhang},
publisher={AAAI Conference on Artificial Intelligence},
pages={5488--5495},
year={2018}
}
@inproceedings{DBLP:journals/corr/GehringAGYD17,
......@@ -6548,127 +6585,161 @@ author = {Yoshua Bengio and
@inproceedings{Wei2019ImitationLF,
title={Imitation Learning for Non-Autoregressive Neural Machine Translation},
author={Bingzhen Wei and Mingxuan Wang and Hao Zhou and Junyang Lin and Xu Sun},
publisher={Annual Meeting of the Association for Computational Linguistics},
pages = {1304--1312},
year={2019}
}
@inproceedings{Shao2019RetrievingSI,
title={Retrieving Sequential Information for Non-Autoregressive Neural Machine Translation},
author={Chenze Shao and
Yang Feng and
Jinchao Zhang and
Fandong Meng and
Xilin Chen and
Jie Zhou},
publisher={Annual Meeting of the Association for Computational Linguistics},
pages={3013--3024},
year={2019}
}
@article{Akoury2019SyntacticallyST,
title={Syntactically Supervised Transformers for Faster Neural Machine Translation},
author={Nader Akoury and Kalpesh Krishna and Mohit Iyyer},
pages = {1269--1281},
publisher = {Annual Meeting of the Association for Computational Linguistics},
year = {2019},
}
@article{Guo2020FineTuningBC,
title={Fine-Tuning by Curriculum Learning for Non-Autoregressive Neural Machine Translation},
author={Junliang Guo and
Xu Tan and
Linli Xu and
Tao Qin and
Enhong Chen and
Tie-Yan Liu},
pages = {7839--7846},
publisher = {AAAI Conference on Artificial Intelligence},
year = {2020}
}
@inproceedings{Ran2020LearningTR,
title={Learning to Recover from Multi-Modality Errors for Non-Autoregressive Neural Machine Translation},
author={Qiu Ran and Yankai Lin and Peng Li and Jie Zhou},
publisher={Annual Meeting of the Association for Computational Linguistics},
pages={3059--3069},
year={2020}
}
@article{Liu2020FastBERTAS,
title={FastBERT: a Self-distilling BERT with Adaptive Inference Time},
author={Weijie Liu and
Peng Zhou and
Zhiruo Wang and
Zhe Zhao and
Haotang Deng and
Qi Ju},
pages = {6035--6044},
publisher = {Annual Meeting of the Association for Computational Linguistics},
year = {2020}
}
@article{Elbayad2020DepthAdaptiveT,
title={Depth-Adaptive Transformer},
author={Maha Elbayad and
Jiatao Gu and
Edouard Grave and
Michael Auli},
publisher={International Conference on Learning Representations},
year={2020}
}
@article{Lan2020ALBERTAL,
title={ALBERT: A Lite BERT for Self-supervised Learning of Language Representations},
author={Zhenzhong Lan and Mingda Chen and Sebastian Goodman and Kevin Gimpel and Piyush Sharma and Radu Soricut},
publisher={International Conference on Learning Representations},
year={2020}
}
@inproceedings{Han2015LearningBW,
title={Learning both Weights and Connections for Efficient Neural Network},
author={Song Han and
Jeff Pool and
John Tran and
William J. Dally},
publisher={Conference and Workshop on Neural Information Processing Systems},
pages={1135--1143},
year={2015}
}
@article{Lee2019SNIPSN,
title={SNIP: Single-shot Network Pruning based on Connection Sensitivity},
author = {Namhoon Lee and
Thalaiyasingam Ajanthan and
Philip H. S. Torr},
publisher = {International Conference on Learning Representations},
year = {2019},
}
@article{Frankle2019TheLT,
title={The Lottery Ticket Hypothesis: Finding Sparse, Trainable Neural Networks},
author={Jonathan Frankle and Michael Carbin},
publisher={International Conference on Learning Representations},
year={2019}
}
@article{Brix2020SuccessfullyAT,
title={Successfully Applying the Stabilized Lottery Ticket Hypothesis to the Transformer Architecture},
author = {Christopher Brix and
Parnia Bahar and
Hermann Ney},
title = {Successfully Applying the Stabilized Lottery Ticket Hypothesis to
the Transformer Architecture},
pages = {3909--3915},
publisher = {Annual Meeting of the Association for Computational Linguistics},
year = {2020},
}
@article{Liu2019RethinkingTV,
title={Rethinking the Value of Network Pruning},
author={Zhuang Liu and
Mingjie Sun and
Tinghui Zhou and
Gao Huang and
Trevor Darrell},
journal={ArXiv},
year={2019},
volume={abs/1810.05270}
}
@article{Liu2017LearningEC,
title={Learning Efficient Convolutional Networks through Network Slimming},
author = {Zhuang Liu and
Jianguo Li and
Zhiqiang Shen and
Gao Huang and
Shoumeng Yan and
Changshui Zhang},
title = {Learning Efficient Convolutional Networks through Network Slimming},
pages = {2755--2763},
publisher = {{IEEE} International Conference on Computer Vision},
year = {2017}
}
@inproceedings{Banner2018ScalableMF,
title={Scalable Methods for 8-bit Training of Neural Networks},
author={Ron Banner and
Itay Hubara and
Elad Hoffer and
Daniel Soudry},
publisher={Conference and Workshop on Neural Information Processing Systems},
pages={5151--5159},
year={2018}
}
@article{Hubara2017QuantizedNN,
title={Quantized Neural Networks: Training Neural Networks with Low Precision Weights and Activations},
author={Itay Hubara and Matthieu Courbariaux and Daniel Soudry and Ran El-Yaniv and Yoshua Bengio},
journal={Journal of Machine Learning Research},
year={2017},
volume={18},
pages={187:1-187:30}
......@@ -6686,62 +6757,100 @@ author = {Yoshua Bengio and
@article{Munim2019SequencelevelKD,
title={Sequence-level Knowledge Distillation for Model Compression of Attention-based Sequence-to-sequence Speech Recognition},
author={Raden Mu'az Mun'im and Nakamasa Inoue and Koichi Shinoda},
publisher={{IEEE} International Conference on Acoustics, Speech and Signal Processing},
year={2019},
pages={6151-6155}
}
@article{Tang2019DistillingTK,
title={Distilling Task-Specific Knowledge from BERT into Simple Neural Networks},
author = {Raphael Tang and
Yao Lu and
Linqing Liu and
Lili Mou and
Olga Vechtomova and
Jimmy Lin},
title = {Distilling Task-Specific Knowledge from {BERT} into Simple Neural
Networks},
journal = {CoRR},
volume = {abs/1903.12136},
year = {2019}
}
@inproceedings{Jiao2020TinyBERTDB,
title={TinyBERT: Distilling BERT for Natural Language Understanding},
author = {Xiaoqi Jiao and
Yichun Yin and
Lifeng Shang and
Xin Jiang and
Xiao Chen and
Linlin Li and
Fang Wang and
Qun Liu},
title = {TinyBERT: Distilling {BERT} for Natural Language Understanding},
pages = {4163--4174},
publisher={Conference on Empirical Methods in Natural Language Processing},
year={2020}
}
@article{Ghazvininejad2020AlignedCE,
title={Aligned Cross Entropy for Non-Autoregressive Machine Translation},
author = {Marjan Ghazvininejad and
Vladimir Karpukhin and
Luke Zettlemoyer and
Omer Levy},
title = {Aligned Cross Entropy for Non-Autoregressive Machine Translation},
journal = {CoRR},
volume = {abs/2004.01655},
year = {2020},
}
@inproceedings{Shao2020MinimizingTB,
title={Minimizing the Bag-of-Ngrams Difference for Non-Autoregressive Neural Machine Translation},
author = {Chenze Shao and
Jinchao Zhang and
Yang Feng and
Fandong Meng and
Jie Zhou},
title = {Minimizing the Bag-of-Ngrams Difference for Non-Autoregressive Neural
Machine Translation},
pages = {198--205},
publisher = {AAAI Conference on Artificial Intelligence},
year = {2020},
}
@inproceedings{Ma2019FlowSeqNC,
title={FlowSeq: Non-Autoregressive Conditional Sequence Generation with Generative Flow},
author={Xuezhe Ma and
Chunting Zhou and
Xian Li and
Graham Neubig and
Eduard H. Hovy},
publisher={Conference on Empirical Methods in Natural Language Processing},
pages={4281--4291},
year={2019}
}
@inproceedings{Guo2019NonAutoregressiveNM,
title={Non-Autoregressive Neural Machine Translation with Enhanced Decoder Input},
author={Junliang Guo and
Xu Tan and
Di He and
Tao Qin and
Linli Xu and
Tie-Yan Liu},
pages={3723--3730},
publisher={AAAI Conference on Artificial Intelligence},
year={2019}
}
@article{Ran2019GuidingNN,
author = {Qiu Ran and
Yankai Lin and
Peng Li and
Jie Zhou},
title = {Guiding Non-Autoregressive Neural Machine Translation Decoding with
Reordering Information},
journal = {CoRR},
volume = {abs/1911.02215},
year = {2019}
}
@inproceedings{vaswani2017attention,
......@@ -6773,73 +6882,96 @@ author = {Yoshua Bengio and
@inproceedings{Wang2019NonAutoregressiveMT,
title={Non-Autoregressive Machine Translation with Auxiliary Regularization},
author={Yiren Wang and
Fei Tian and
Di He and
Tao Qin and
ChengXiang Zhai and
Tie-Yan Liu},
publisher={AAAI Conference on Artificial Intelligence},
pages={5377--5384},
year={2019}
}
@inproceedings{Kaiser2018FastDI,
title={Fast Decoding in Sequence Models using Discrete Latent Variables},
author={Łukasz Kaiser and Aurko Roy and Ashish Vaswani and Niki Parmar and Samy Bengio and Jakob Uszkoreit and Noam Shazeer},
publisher={International Conference on Machine Learning},
pages={2395--2404},
year={2018}
}
@inproceedings{Tu2020ENGINEEI,
title={ENGINE: Energy-Based Inference Networks for Non-Autoregressive Machine Translation},
author={Lifu Tu and Richard Yuanzhe Pang and Sam Wiseman and Kevin Gimpel},
pages={2819--2826},
publisher={Annual Meeting of the Association for Computational Linguistics},
year={2020}
}
@inproceedings{Shu2020LatentVariableNN,
title={Latent-Variable Non-Autoregressive Neural Machine Translation with Deterministic Inference using a Delta Posterior},
author={Raphael Shu and Jason Lee and Hideki Nakayama and Kyunghyun Cho},
publisher={AAAI Conference on Artificial Intelligence},
pages={8846--8853},
year={2020}
}
@inproceedings{Li2019HintBasedTF,
title={Hint-Based Training for Non-Autoregressive Machine Translation},
author={Zhuohan Li and
Zi Lin and
Di He and
Fei Tian and
Tao Qin and
Liwei Wang and
Tie-Yan Liu},
publisher={Conference on Empirical Methods in Natural Language Processing},
pages={5707--5712},
year={2019}
}
@inproceedings{Ho2016ModelFreeIL,
title={Model-Free Imitation Learning with Policy Optimization},
author={Jonathan Ho and
Jayesh K. Gupta and
Stefano Ermon},
publisher={International Conference on Machine Learning},
pages={2760--2769},
year={2016}
}
@inproceedings{Ho2016GenerativeAI,
title={Generative Adversarial Imitation Learning},
author={Jonathan Ho and Stefano Ermon},
publisher={Conference and Workshop on Neural Information Processing Systems},
pages={4565--4573},
year={2016}
}
@article{Duan2017OneShotIL,
title={One-Shot Imitation Learning},
author={Yan Duan and Marcin Andrychowicz and Bradly C. Stadie and Jonathan Ho and Jonas Schneider and Ilya Sutskever and Pieter Abbeel and Wojciech Zaremba},
journal={CoRR},
year={2017},
volume={abs/1703.07326}
}
@inproceedings{Wang2018SemiAutoregressiveNM,
title={Semi-Autoregressive Neural Machine Translation},
author={Chunqi Wang and
Ji Zhang and
Haiqing Chen},
booktitle={Conference on Empirical Methods in Natural Language Processing},
pages={479--488},
year={2018}
}
@inproceedings{Ghazvininejad2019MaskPredictPD,
title={Mask-Predict: Parallel Decoding of Conditional Masked Language Models},
author={Marjan Ghazvininejad and Omer Levy and Yinhan Liu and Luke Zettlemoyer},
publisher={Conference on Empirical Methods in Natural Language Processing},
pages={6111--6120},
year={2019}
}
......@@ -6852,7 +6984,9 @@ author = {Yoshua Bengio and
@article{Zhou2019SynchronousBN,
title={Synchronous Bidirectional Neural Machine Translation},
author={Long Zhou and
Jiajun Zhang and
Chengqing Zong},
journal={Transactions of the Association for Computational Linguistics},
year={2019},
volume={7},
......@@ -6869,8 +7003,9 @@ author = {Yoshua Bengio and
@inproceedings{Feng2016ImprovingAM,
title={Improving Attention Modeling with Implicit Distortion and Fertility for Machine Translation},
author={Shi Feng and Shujie Liu and Nan Yang and Mu Li and Ming Zhou and Kenny Q. Zhu},
booktitle={International Conference on Computational Linguistics},
pages={3082--3092},
year={2016}
}
......@@ -6939,7 +7074,7 @@ author = {Yoshua Bengio and
@article{Peris2017InteractiveNM,
title={Interactive neural machine translation},
author={{\'A}lvaro Peris and Miguel Domingo and F. Casacuberta},
journal={Computer Speech and Language},
year={2017},
volume={45},
pages={201--220}
......@@ -6947,8 +7082,9 @@ author = {Yoshua Bengio and
@inproceedings{Peris2018ActiveLF,
title={Active Learning for Interactive Neural Machine Translation of Data Streams},
author={{\'A}lvaro Peris and Francisco Casacuberta},
publisher={The SIGNLL Conference on Computational Natural Language Learning},
pages={151--160},
year={2018}
}
......@@ -6973,7 +7109,7 @@ author = {Yoshua Bengio and
}
@article{61115,
author={Jianhua Lin},
journal={IEEE Transactions on Information Theory},
title={Divergence measures based on the Shannon entropy},
year={1991},
......@@ -6987,13 +7123,8 @@ author = {Yoshua Bengio and
Atsushi Fujita},
title = {Recurrent Stacking of Layers for Compact Neural Machine Translation
Models},
pages = {6292--6299},
publisher = {AAAI Conference on Artificial Intelligence},
year = {2019}
}
......@@ -7081,10 +7212,8 @@ author = {Yoshua Bengio and
Dmitry Kalenichenko},
title = {Quantization and Training of Neural Networks for Efficient Integer-Arithmetic-Only
Inference},
publisher = {{IEEE} Conference on Computer Vision and Pattern Recognition},
pages = {2704--2713},
year = {2018}
}
......@@ -7105,9 +7234,7 @@ author = {Yoshua Bengio and
Ran El-Yaniv and
Yoshua Bengio},
title = {Binarized Neural Networks},
publisher = {Conference and Workshop on Neural Information Processing Systems},
pages = {4107--4115},
year = {2016}
}
......@@ -7130,10 +7257,8 @@ author = {Yoshua Bengio and
Muhua Zhu and
Huizhen Wang},
title = {Boosting-Based System Combination for Machine Translation},
pages = {739--748},
publisher = {Annual Meeting of the Association for Computational Linguistics},
year = {2010}
}
......@@ -7145,11 +7270,9 @@ author = {Yoshua Bengio and
Philip C. Woodland},
title = {Consensus Network Decoding for Statistical Machine Translation System
Combination},
publisher = {IEEE International Conference on Acoustics, Speech and Signal Processing},
pages = {105--108},
year = {2007}
}
......@@ -7158,9 +7281,7 @@ author = {Yoshua Bengio and
Spyridon Matsoukas and
Richard M. Schwartz},
title = {Improved Word-Level System Combination for Machine Translation},
publisher = {Annual Meeting of the Association for Computational Linguistics},
year = {2007}
}
......@@ -7171,10 +7292,8 @@ author = {Yoshua Bengio and
Richard M. Schwartz},
title = {Incremental Hypothesis Alignment for Building Confusion Networks with
Application to Machine Translation System Combination},
publisher = {Proceedings of the Third Workshop on Statistical Machine Translation},
pages = {183--186},
year = {2008}
}
......@@ -7184,11 +7303,8 @@ author = {Yoshua Bengio and
Tong Xiao and
Ming Zhou},
title = {The Feature Subspace Method for SMT System Combination},
publisher = {Conference on Empirical Methods in Natural Language Processing},
pages = {1096--1104},
year = {2009}
}
......@@ -7217,12 +7333,8 @@ author = {Yoshua Bengio and
Franz Josef Och and
Wolfgang Macherey},
title = {Lattice Minimum Bayes-Risk Decoding for Statistical Machine Translation},
publisher = {Conference on Empirical Methods in Natural Language Processing},
pages = {620--629},
year = {2008}
}
......@@ -7235,10 +7347,8 @@ author = {Yoshua Bengio and
Yang Liu},
title = {Lattice-Based Recurrent Neural Network Encoders for Neural Machine
Translation},
publisher = {AAAI Conference on Artificial Intelligence},
pages = {3302--3308},
year = {2017}
}
......@@ -7250,7 +7360,7 @@ author = {Yoshua Bengio and
publisher = {Proceedings of the Human Language Technology Conference of
the North American Chapter of the Association for Computational Linguistics},
pages = {464--468},
year = {2018}
}
@inproceedings{WangLearning,
......@@ -7272,9 +7382,7 @@ author = {Yoshua Bengio and
Edouard Grave and
Armand Joulin},
title = {Reducing Transformer Depth on Demand with Structured Dropout},
publisher = {International Conference on Learning Representations},
year = {2020}
}
......@@ -7282,16 +7390,10 @@ author = {Yoshua Bengio and
author = {Qiang Wang and
Tong Xiao and
Jingbo Zhu},
title = {Training Flexible Depth Model by Multi-Task Learning for Neural Machine
Translation},
pages = {4307--4312},
publisher = {Conference on Empirical Methods in Natural Language Processing},
year = {2020}
}
......@@ -7302,8 +7404,7 @@ author = {Yoshua Bengio and
Furu Wei and
Ming Zhou},
title = {BERT-of-Theseus: Compressing {BERT} by Progressive Module Replacing},
publisher = {Conference on Empirical Methods in Natural Language Processing},
year = {2020}
}
......@@ -7311,9 +7412,7 @@ author = {Yoshua Bengio and
author = {Alexei Baevski and
Michael Auli},
title = {Adaptive Input Representations for Neural Language Modeling},
journal = {arXiv preprint arXiv:1809.10853},
year = {2019}
}
......@@ -7361,9 +7460,7 @@ author = {Yoshua Bengio and
Ruslan Salakhutdinov and
Quoc V. Le},
title = {Mixtape: Breaking the Softmax Bottleneck Efficiently},
booktitle = {Conference on Neural Information Processing Systems},
pages = {15922--15930},
year = {2019}
}
......@@ -7390,11 +7487,9 @@ author = {Yoshua Bengio and
Chenglong Wang and
Tong Xiao and
Jingbo Zhu},
title = {The NiuTrans System for WNGT 2020 Efficiency Task},
pages = {204--210},
publisher = {Annual Meeting of the Association for Computational Linguistics},
year = {2020}
}
......@@ -7431,37 +7526,38 @@ author = {Yoshua Bengio and
@inproceedings{Sun2019BaiduNM,
title={Baidu Neural Machine Translation Systems for WMT19},
author = {Meng Sun and
Bojian Jiang and
Hao Xiong and
Zhongjun He and
Hua Wu and
Haifeng Wang},
publisher={Annual Meeting of the Association for Computational Linguistics},
pages = {374--381},
year={2019}
}
@inproceedings{Wang2018TencentNM,
title={Tencent Neural Machine Translation Systems for WMT18},
author={Mingxuan Wang and
Li Gong and
Wenhuan Zhu and
Jun Xie and
Chao Bian},
publisher={Annual Meeting of the Association for Computational Linguistics},
pages={522--527},
year={2018}
}
@article{Bi2019MultiagentLF,
title={Multi-agent Learning for Neural Machine Translation},
author={Tianchi Bi and
Hao Xiong and
Zhongjun He and
Hua Wu and
Haifeng Wang},
journal={arXiv preprint arXiv:1909.01101},
year={2019}
}
@inproceedings{DBLP:conf/aclnmt/KoehnK17,
......@@ -7475,23 +7571,73 @@ author = {Yoshua Bengio and
@book{Held2013AppliedSI,
title={Applied Statistical Inference: Likelihood and Bayes},
author={Leonhard Held and Daniel Saban{\'e}s Bov{\'e}},
publisher={Springer},
year={2014}
}
@inproceedings{Zhang2016VariationalNM,
title={Variational Neural Machine Translation},
author = {Biao Zhang and
Deyi Xiong and
Jinsong Su and
Hong Duan and
Min Zhang},
pages = {521--530},
publisher = {Annual Meeting of the Association for Computational Linguistics},
year = {2016}
}
@inproceedings{Silvey2018StatisticalI,
title={Statistical Inference},
author={S. D. Silvey},
publisher={Encyclopedia of Social Network Analysis and Mining},
year={2018}
}
@inproceedings{Cheong2019transformersZ,
title={transformers.zip : Compressing Transformers with Pruning and Quantization},
author={Robin Cheong and Robel Daniel},
publisher={Stanford University},
year={2019}
}
@inproceedings{Beal2003VariationalAF,
title={Variational algorithms for approximate Bayesian inference},
author={Matthew J. Beal},
publisher={University College London},
year={2003}
}
@article{Gage1994ANA,
title={A new algorithm for data compression},
author={Philip Gage},
journal={The C Users Journal},
year={1994},
volume={12},
pages={23--38}
}
@inproceedings{Eisner2011LearningST,
title={Learning Speed-Accuracy Tradeoffs in Nondeterministic Inference Algorithms},
author={Jason Eisner and Hal Daum{\'e} III},
publisher={Conference and Workshop on Neural Information Processing Systems},
year={2011}
}
@article{Kazimi2017CoverageFC,
title={Coverage for Character Based Neural Machine Translation},
author={M. Bashir Kazimi and Marta R. Costa-juss{\`a}},
journal={Procesamiento del Lenguaje Natural},
year={2017},
volume={59},
pages={99--106}
}
%%%%% chapter 14------------------------------------------------------
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
......@@ -7970,7 +8116,7 @@ author = {Yoshua Bengio and
author = {Ivan Vulic and
Anna Korhonen},
title = {On the Role of Seed Lexicons in Learning Bilingual Word Embeddings},
publisher = {Annual Meeting of the Association for Computational Linguistics},
year = {2016}
}
@inproceedings{DBLP:conf/iclr/SmithTHH17,
......@@ -8616,7 +8762,7 @@ author = {Yoshua Bengio and
title = {Using Context Vectors in Improving a Machine Translation System with
Bridge Language},
pages = {318--322},
publisher = {Annual Meeting of the Association for Computational Linguistics},
year = {2013}
}
@inproceedings{DBLP:conf/emnlp/ZhuHWZWZ14,
......@@ -8640,7 +8786,7 @@ author = {Yoshua Bengio and
Satoshi Nakamura},
title = {Improving Pivot Translation by Remembering the Pivot},
pages = {573--577},
publisher = {Annual Meeting of the Association for Computational Linguistics},
year = {2015}
}
@inproceedings{DBLP:conf/acl/CohnL07,
......@@ -8666,7 +8812,7 @@ author = {Yoshua Bengio and
Haifeng Wang},
title = {Revisiting Pivot Language Approach for Machine Translation},
pages = {154--162},
publisher = {Annual Meeting of the Association for Computational Linguistics},
year = {2009}
}
@article{DBLP:journals/corr/ChengLYSX16,
......@@ -8711,7 +8857,7 @@ author = {Yoshua Bengio and
Rafael E. Banchs},
title = {Enhancing scarce-resource language translation through pivot combinations},
pages = {1361--1365},
publisher = {Annual Meeting of the Association for Computational Linguistics},
year = {2011}
}
@article{DBLP:journals/corr/HintonVD15,
......@@ -8758,7 +8904,7 @@ author = {Yoshua Bengio and
Haifeng Wang},
title = {Multi-Task Learning for Multiple Language Translation},
pages = {1723--1732},
publisher = {Annual Meeting of the Association for Computational Linguistics},
year = {2015}
}
@article{DBLP:journals/tacl/LeeCH17,
......@@ -9338,7 +9484,7 @@ author = {Yoshua Bengio and
Xiaohua Liu and
Hang Li},
title = {Modeling Coverage for Neural Machine Translation},
publisher = {Annual Meeting of the Association for Computational Linguistics},
year = {2016}
}
@article{DBLP:journals/tacl/TuLLLL17,
......@@ -9676,3 +9822,429 @@ author = {Yoshua Bengio and
%%%%% chapter 18------------------------------------------------------
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%% chapter appendix-A------------------------------------------------------
@inproceedings{Tong2012NiuTrans,
author = {Tong Xiao and
Jingbo Zhu and
Hao Zhang and
Qiang Li},
title = {NiuTrans: An Open Source Toolkit for Phrase-based and Syntax-based
Machine Translation},
pages = {19--24},
publisher = {Annual Meeting of the Association for Computational Linguistics},
year = {2012}
}
@inproceedings{Li2010Joshua,
author = {Zhifei Li and
Chris Callison-Burch and
Chris Dyer and
Sanjeev Khudanpur and
Lane Schwartz and
Wren N. G. Thornton and
Jonathan Weese and
Omar Zaidan},
title = {Joshua: An Open Source Toolkit for Parsing-Based Machine Translation},
pages = {135--139},
publisher = {Association for Computational Linguistics},
year = {2009}
}
@inproceedings{iglesias2009hierarchical,
author = {Gonzalo Iglesias and
Adri{\`{a}} de Gispert and
Eduardo Rodr{\'{\i}}guez Banga and
William J. Byrne},
title = {Hierarchical Phrase-Based Translation with Weighted Finite State Transducers},
pages = {433--441},
publisher = {The Association for Computational Linguistics},
year = {2009}
}
@inproceedings{dyer2010cdec,
author = {Chris Dyer and
Adam Lopez and
Juri Ganitkevitch and
Jonathan Weese and
Ferhan T{\"{u}}re and
Phil Blunsom and
Hendra Setiawan and
Vladimir Eidelman and
Philip Resnik},
title = {cdec: {A} Decoder, Alignment, and Learning Framework for Finite-State
and Context-Free Translation Models},
pages = {7--12},
publisher = {Annual Meeting of the Association for Computational Linguistics},
year = {2010}
}
@inproceedings{Cer2010Phrasal,
author = {Daniel M. Cer and
Michel Galley and
Daniel Jurafsky and
Christopher D. Manning},
title = {Phrasal: {A} Statistical Machine Translation Toolkit for Exploring
New Model Features},
pages = {9--12},
publisher = {The Association for Computational Linguistics},
year = {2010}
}
@article{vilar2012jane,
title={Jane: an advanced freely available hierarchical machine translation toolkit},
author={Vilar, David and Stein, Daniel and Huck, Matthias and Ney, Hermann},
journal={Machine Translation},
volume={26},
number={3},
pages={197--216},
year={2012}
}
@inproceedings{DBLP:conf/naacl/DyerCS13,
author = {Chris Dyer and
Victor Chahuneau and
Noah A. Smith},
title = {A Simple, Fast, and Effective Reparameterization of {IBM} Model 2},
pages = {644--648},
publisher = {Annual Meeting of the Association for Computational Linguistics},
year = {2013}
}
@article{al2016theano,
author = {Rami Al-Rfou and
Guillaume Alain and
Amjad Almahairi and
Christof Angerm{\"{u}}ller and
Dzmitry Bahdanau and
Nicolas Ballas and
Fr{\'{e}}d{\'{e}}ric Bastien and
Justin Bayer and
Anatoly Belikov and
Alexander Belopolsky and
Yoshua Bengio and
Arnaud Bergeron and
James Bergstra and
Valentin Bisson and
Josh Bleecher Snyder and
Nicolas Bouchard and
Nicolas Boulanger-Lewandowski and
Xavier Bouthillier and
Alexandre de Br{\'{e}}bisson and
Olivier Breuleux and
Pierre Luc Carrier and
Kyunghyun Cho and
Jan Chorowski and
Paul F. Christiano and
Tim Cooijmans and
Marc-Alexandre C{\^{o}}t{\'{e}} and
Myriam C{\^{o}}t{\'{e}} and
Aaron C. Courville and
Yann N. Dauphin and
Olivier Delalleau and
Julien Demouth and
Guillaume Desjardins and
Sander Dieleman and
Laurent Dinh and
Melanie Ducoffe and
Vincent Dumoulin and
Samira Ebrahimi Kahou and
Dumitru Erhan and
Ziye Fan and
Orhan Firat and
Mathieu Germain and
Xavier Glorot and
Ian J. Goodfellow and
Matthew Graham and
{\c{C}}aglar G{\"{u}}l{\c{c}}ehre and
Philippe Hamel and
Iban Harlouchet and
Jean-Philippe Heng and
Bal{\'{a}}zs Hidasi and
Sina Honari and
Arjun Jain and
S{\'{e}}bastien Jean and
Kai Jia and
Mikhail Korobov and
Vivek Kulkarni and
Alex Lamb and
Pascal Lamblin and
Eric Larsen and
C{\'{e}}sar Laurent and
Sean Lee and
Simon Lefran{\c{c}}ois and
Simon Lemieux and
Nicholas L{\'{e}}onard and
Zhouhan Lin and
Jesse A. Livezey and
Cory Lorenz and
Jeremiah Lowin and
Qianli Ma and
Pierre-Antoine Manzagol and
Olivier Mastropietro and
Robert McGibbon and
Roland Memisevic and
Bart van Merri{\"{e}}nboer and
Vincent Michalski and
Mehdi Mirza and
Alberto Orlandi and
Christopher Joseph Pal and
Razvan Pascanu and
Mohammad Pezeshki and
Colin Raffel and
Daniel Renshaw and
Matthew Rocklin and
Adriana Romero and
Markus Roth and
Peter Sadowski and
John Salvatier and
Fran{\c{c}}ois Savard and
Jan Schl{\"{u}}ter and
John Schulman and
Gabriel Schwartz and
Iulian Vlad Serban and
Dmitriy Serdyuk and
Samira Shabanian and
{\'{E}}tienne Simon and
Sigurd Spieckermann and
S. Ramana Subramanyam and
Jakub Sygnowski and
J{\'{e}}r{\'{e}}mie Tanguay and
Gijs van Tulder and
Joseph P. Turian and
Sebastian Urban and
Pascal Vincent and
Francesco Visin and
Harm de Vries and
David Warde-Farley and
Dustin J. Webb and
Matthew Willson and
Kelvin Xu and
Lijun Xue and
Li Yao and
Saizheng Zhang and
Ying Zhang},
title = {Theano: {A} Python framework for fast computation of mathematical
expressions},
journal = {CoRR},
volume = {abs/1605.02688},
year = {2016}
}
@inproceedings{DBLP:journals/corr/SennrichFCBHHJL17,
author = {Rico Sennrich and
Orhan Firat and
Kyunghyun Cho and
Barry Haddow and
Alexandra Birch and
Julian Hitschler and
Marcin Junczys-Dowmunt and
Samuel L{\"{a}}ubli and
Antonio Valerio Miceli Barone and
Jozef Mokry and
Maria Nadejde},
title = {Nematus: a Toolkit for Neural Machine Translation},
publisher = {European Chapter of the Association for Computational Linguistics},
pages = {65--68},
year = {2017}
}
@inproceedings{Koehn2007Moses,
author = {Philipp Koehn and
Hieu Hoang and
Alexandra Birch and
Chris Callison-Burch and
Marcello Federico and
Nicola Bertoldi and
Brooke Cowan and
Wade Shen and
Christine Moran and
Richard Zens and
Chris Dyer and
Ondrej Bojar and
Alexandra Constantin and
Evan Herbst},
title = {Moses: Open Source Toolkit for Statistical Machine Translation},
publisher = {Annual Meeting of the Association for Computational Linguistics},
year = {2007}
}
@inproceedings{zollmann2007the,
author = {Andreas Zollmann and
Ashish Venugopal and
Matthias Paulik and
Stephan Vogel},
title = {The Syntax Augmented {MT} {(SAMT)} System at the Shared Task for the
2007 {ACL} Workshop on Statistical Machine Translation},
pages = {216--219},
publisher = {Annual Meeting of the Association for Computational Linguistics},
year = {2007}
}
@article{och2003systematic,
author = {Franz Josef Och and
Hermann Ney},
title = {A Systematic Comparison of Various Statistical Alignment Models},
journal = {Computational Linguistics},
volume = {29},
number = {1},
pages = {19--51},
year = {2003}
}
@inproceedings{zoph2016simple,
author = {Barret Zoph and
Ashish Vaswani and
Jonathan May and
Kevin Knight},
title = {Simple, Fast Noise-Contrastive Estimation for Large {RNN} Vocabularies},
pages = {1217--1222},
publisher = {The Association for Computational Linguistics},
year = {2016}
}
@inproceedings{Ottfairseq,
author = {Myle Ott and
Sergey Edunov and
Alexei Baevski and
Angela Fan and
Sam Gross and
Nathan Ng and
David Grangier and
Michael Auli},
title = {fairseq: {A} Fast, Extensible Toolkit for Sequence Modeling},
pages = {48--53},
publisher = {Annual Meeting of the Association for Computational Linguistics},
year = {2019}
}
@inproceedings{Vaswani2018Tensor2TensorFN,
author = {Ashish Vaswani and
Samy Bengio and
Eugene Brevdo and
Fran{\c{c}}ois Chollet and
Aidan N. Gomez and
Stephan Gouws and
Llion Jones and
Lukasz Kaiser and
Nal Kalchbrenner and
Niki Parmar and
Ryan Sepassi and
Noam Shazeer and
Jakob Uszkoreit},
title = {Tensor2Tensor for Neural Machine Translation},
pages = {193--199},
publisher = {Association for Machine Translation in the Americas},
year = {2018}
}
@inproceedings{KleinOpenNMT,
author = {Guillaume Klein and
Yoon Kim and
Yuntian Deng and
Jean Senellart and
Alexander M. Rush},
title = {OpenNMT: Open-Source Toolkit for Neural Machine Translation},
pages = {67--72},
publisher = {Annual Meeting of the Association for Computational Linguistics},
year = {2017}
}
@inproceedings{luong2016acl_hybrid,
author = {Minh-Thang Luong and
Christopher D. Manning},
title = {Achieving Open Vocabulary Neural Machine Translation with Hybrid Word-Character
Models},
publisher = {Annual Meeting of the Association for Computational Linguistics},
year = {2016}
}
@article{ZhangTHUMT,
author = {Jiacheng Zhang and
Yanzhuo Ding and
Shiqi Shen and
Yong Cheng and
Maosong Sun and
Huan-Bo Luan and
Yang Liu},
title = {{THUMT:} An Open Source Toolkit for Neural Machine Translation},
journal = {CoRR},
volume = {abs/1706.06415},
year = {2017}
}
@inproceedings{JunczysMarian,
author = {Marcin Junczys-Dowmunt and
Roman Grundkiewicz and
Tomasz Dwojak and
Hieu Hoang and
Kenneth Heafield and
Tom Neckermann and
Frank Seide and
Ulrich Germann and
Alham Fikri Aji and
Nikolay Bogoychev and
Andr{\'{e}} F. T. Martins and
Alexandra Birch},
title = {Marian: Fast Neural Machine Translation in {C++}},
pages = {116--121},
publisher = {Annual Meeting of the Association for Computational Linguistics},
year = {2018}
}
@article{hieber2017sockeye,
author = {Felix Hieber and
Tobias Domhan and
Michael Denkowski and
David Vilar and
Artem Sokolov and
Ann Clifton and
Matt Post},
title = {Sockeye: {A} Toolkit for Neural Machine Translation},
journal = {CoRR},
volume = {abs/1712.05690},
year = {2017}
}
@inproceedings{WangCytonMT,
author = {Xiaolin Wang and
Masao Utiyama and
Eiichiro Sumita},
title = {CytonMT: an Efficient Neural Machine Translation Open-source Toolkit
Implemented in {C++}},
pages = {133--138},
publisher = {Annual Meeting of the Association for Computational Linguistics},
year = {2018}
}
@article{DBLP:journals/corr/abs-1805-10387,
author = {Oleksii Kuchaiev and
Boris Ginsburg and
Igor Gitman and
Vitaly Lavrukhin and
Carl Case and
Paulius Micikevicius},
title = {OpenSeq2Seq: extensible toolkit for distributed and mixed precision
training of sequence-to-sequence models},
journal = {CoRR},
volume = {abs/1805.10387},
year = {2018}
}
@article{nmtpy2017,
author = {Ozan Caglayan and
Mercedes Garc{\'{\i}}a-Mart{\'{\i}}nez and
Adrien Bardet and
Walid Aransa and
Fethi Bougares and
Lo{\"{\i}}c Barrault},
title = {{NMTPY:} {A} Flexible Toolkit for Advanced Neural Machine Translation
Systems},
journal = {The Prague Bulletin of Mathematical Linguistics},
volume = {109},
pages = {15--28},
year = {2017}
}
%%%%% chapter appendix-A------------------------------------------------------
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%