Commit 34080625 by zengxin

10 11 appendix

parent 88515850
......@@ -925,14 +925,14 @@ a (\mathbi{s},\mathbi{h}) = \left\{ \begin{array}{ll}
%----------------------------------------------------------------------------------------
\subsection{训练}
\parinterval 在基于梯度的方法中,模型参数可以通过损失函数$L$对参数的梯度进行不断更新。对于第$\textrm{step}$步参数更新,首先进行神经网络的前向计算,之后进行反向计算,并得到所有参数的梯度信息,再使用下面的规则进行参数更新:
\begin{eqnarray}
\mathbi{w}_{\textrm{step}+1} = \mathbi{w}_{\textrm{step}} - \alpha \cdot \frac{ \partial L(\mathbi{w}_{\textrm{step}})} {\partial \mathbi{w}_{\textrm{step}} }
\label{eq:10-30}
\end{eqnarray}
\noindent 其中,$\mathbi{w}_{\textrm{step}}$表示更新前的模型参数,$\mathbi{w}_{\textrm{step}+1}$表示更新后的模型参数,$L(\mathbi{w}_{\textrm{step}})$表示模型相对于$\mathbi{w}_{\textrm{step}}$ 的损失,$\frac{\partial L(\mathbi{w}_{\textrm{step}})} {\partial \mathbi{w}_{\textrm{step}} }$表示损失函数的梯度,$\alpha$是更新的步长。也就是说,给定一定量的训练数据,不断执行公式\eqref{eq:10-30}的过程。反复使用训练数据,直至模型参数达到收敛或者损失函数不再变化。通常,把公式的一次执行称为“一步”更新/训练,把访问完所有样本的训练称为“一轮”训练。
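作为示意,公式\eqref{eq:10-30}的参数更新过程可以用如下Python代码模拟(其中的损失函数$L(w)=(w-3)^2$和各数值均为假设的示例,并非书中系统的实际实现):

```python
import numpy as np

def sgd_step(w, grad_fn, alpha=0.1):
    """执行一步梯度下降更新: w_{step+1} = w_step - alpha * dL/dw"""
    return w - alpha * grad_fn(w)

# 以假设的损失函数 L(w) = (w - 3)^2 为例, 其梯度为 2(w - 3)
grad = lambda w: 2.0 * (w - 3.0)

w = np.array([0.0])            # 初始参数 w_0
for step in range(100):        # 反复执行更新规则
    w = sgd_step(w, grad)
# 迭代后 w 收敛到损失函数的最小值点 3 附近
```

可以看到,只要步长$\alpha$选择合理,反复执行更新规则就会使参数逐步逼近损失函数的极小值点。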
\parinterval 将公式\eqref{eq:10-30}应用于神经机器翻译有几个基本问题需要考虑:1)损失函数的选择;2)参数初始化的策略,也就是如何设置$\mathbi{w}_0$;3)优化策略和学习率调整策略;4)训练加速。下面对这些问题进行讨论。
......@@ -1190,7 +1190,7 @@ L(\mathbi{Y},\widehat{\mathbi{Y}}) = \sum_{j=1}^n L_{\textrm{ce}}(\mathbi{y}_j,\
\subsubsection{2. 束搜索}
\vspace{0.5em}
\parinterval 束搜索是一种启发式图搜索算法。相比于全搜索,它可以减少搜索所占用的空间和时间,在每一步扩展的时候,剪掉一些质量比较差的结点,保留下一些质量较高的结点。具体到机器翻译任务,对于每一个目标语言位置,束搜索选择了概率最大的前$k$个单词进行扩展(其中$k$叫做束宽度,或简称为束宽)。如图\ref{fig:10-34}所示,假设\{$y_1, y_2,..., y_n$\}表示生成的目标语言序列,且$k=3$,则束搜索的具体过程为:在预测第一个位置时,可以通过模型得到$y_1$的概率分布,选取概率最大的前3个单词作为候选结果(假设分别为“have”, “has”, “it”)。在预测第二个位置的单词时,模型针对已经得到的三个候选结果(“have”, “has”, “it”)计算第二个单词的概率分布。因为$y_2$对应$|V|$种可能,总共可以得到$3 \times |V|$种结果。然后从中选取使序列概率$\funp{P}(y_2,y_1| \seq{{x}})$最大的前三个$y_2$作为新的输出结果,这样便得到了前两个位置的top-3译文。在预测其他位置时也是如此,不断重复此过程直到推断结束。可以看到,束搜索的搜索空间大小与束宽度有关,也就是:束宽度越大,搜索空间越大,更有可能搜索到质量更高的译文,但同时搜索会更慢。束宽度等于3,意味着每次只考虑三个最有可能的结果,贪婪搜索实际上便是束宽度为1的情况。在神经机器翻译系统实现中,一般束宽度设置在4~8之间。
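下面用一段简化的Python代码示意束搜索在每一步只保留前$k$个候选的剪枝过程(其中的词表与概率均为虚构示例,且为简化起见假设每一步的分布与已生成的前缀无关,实际系统中第二个位置的分布依赖于已得到的候选):

```python
import math

def beam_search(step_probs, k=3):
    """step_probs[t] 是第 t 步的 {单词: 概率} 分布 (简化假设: 与前缀无关)。
    每一步将所有候选各扩展 |V| 种可能, 再按序列对数概率只保留前 k 个。"""
    beams = [([], 0.0)]  # (前缀, 对数概率)
    for probs in step_probs:
        candidates = []
        for prefix, score in beams:
            for word, p in probs.items():
                candidates.append((prefix + [word], score + math.log(p)))
        # 剪枝: 只保留得分最高的 k 个候选
        beams = sorted(candidates, key=lambda x: x[1], reverse=True)[:k]
    return beams

# 两个目标语言位置上的 (虚构) 概率分布
step_probs = [
    {"have": 0.5, "has": 0.3, "it": 0.1, "is": 0.1},
    {"you": 0.6, "a": 0.3, "an": 0.1},
]
top = beam_search(step_probs, k=3)
# top[0][0] 为得分最高的序列 ["have", "you"]
```

当`k=1`时,该过程退化为贪婪搜索;增大`k`会使每一步的候选数变为$k \times |V|$,搜索更慢但更可能找到高质量译文。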
%----------------------------------------------
\begin{figure}[htp]
......@@ -1245,7 +1245,7 @@ L(\mathbi{Y},\widehat{\mathbi{Y}}) = \sum_{j=1}^n L_{\textrm{ce}}(\mathbi{y}_j,\
% NEW SECTION
%----------------------------------------------------------------------------------------
\sectionnewpage
\section{小结及拓展阅读}
\parinterval 神经机器翻译是近几年的热门方向。无论是前沿性的技术探索,还是面向应用落地的系统研发,神经机器翻译已经成为当下最好的选择之一。研究人员对神经机器翻译的热情使得这个领域得到了快速的发展。本章作为神经机器翻译的入门章节,对神经机器翻译的建模思想和基础框架进行了描述。同时,对常用的神经机器翻译架构\ \dash \ 循环神经网络进行了讨论与分析。
......
......@@ -64,8 +64,8 @@ $\otimes$: & 按位乘运算 \\
\draw[-latex,thick] (c2.east) -- ([xshift=0.4cm]c2.east);
\node[inner sep=0pt, font=\tiny] at (0.75cm, -0.4cm) {$\mathbi{x}$};
\node[inner sep=0pt, font=\tiny] at ([yshift=-0.8cm]a.south) {$\mathbi{A}=\mathbi{x} * \mathbi{W} + \mathbi{b}_{\mathbi{W}}$};
\node[inner sep=0pt, font=\tiny] at ([yshift=-0.8cm]b.south) {$\mathbi{B}=\mathbi{x} * \mathbi{V} + \mathbi{b}_{\mathbi{V}}$};
\node[inner sep=0pt, font=\tiny] at (8.2cm, -0.4cm) {$\mathbi{y}=\mathbi{A} \otimes \sigma(\mathbi{B})$};
\end{tikzpicture}
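上图中的门控结构计算$\mathbi{y}=\mathbi{A} \otimes \sigma(\mathbi{B})$,其中$\otimes$为按位乘运算。可以用如下NumPy代码示意这一门控机制(这里用矩阵乘法代替卷积运算,矩阵维度均为假设值,仅用于说明计算流程):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gated_unit(x, W, b_W, V, b_V):
    """A = xW + b_W, B = xV + b_V, y = A ⊙ σ(B) (⊙ 为按位乘)"""
    A = x @ W + b_W
    B = x @ V + b_V
    return A * sigmoid(B)

rng = np.random.default_rng(0)
x = rng.standard_normal((2, 4))                       # 2 个位置, 输入维度 4 (假设)
W, V = rng.standard_normal((4, 3)), rng.standard_normal((4, 3))
b_W, b_V = np.zeros(3), np.zeros(3)
y = gated_unit(x, W, b_W, V, b_V)                     # 输出形状 (2, 3)
```

$\sigma(\mathbi{B})$的取值在0到1之间,起到“门”的作用:它逐元素地控制$\mathbi{A}$中有多少信息可以通过。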
\ No newline at end of file
......@@ -579,7 +579,7 @@
% NEW SUB-SECTION
%----------------------------------------------------------------------------------------
\section{小结及拓展阅读}
\parinterval 卷积是一种高效的神经网络结构,在图像、语音处理等领域取得了令人瞩目的成绩。本章介绍了卷积的概念及其特性,并对池化、填充等操作进行了讨论。本章介绍了具有高并行计算能力的机器翻译范式,即基于卷积神经网络的编码器-解码器框架。其在机器翻译任务上表现出色,并大幅度缩短了模型的训练周期。除了基础部分,本章还针对卷积计算进行了延伸,内容涉及逐通道卷积、逐点卷积、轻量卷积和动态卷积等。除了上述提及的内容,卷积神经网络及其变种在文本分类、命名实体识别、关系分类、事件抽取等其他自然语言处理任务上也有许多应用\upcite{Kim2014ConvolutionalNN,2011Natural,DBLP:conf/cncl/ZhouZXQBX17,DBLP:conf/acl/ChenXLZ015,DBLP:conf/coling/ZengLLZZ14}
......
......@@ -26,6 +26,110 @@
\begin{appendices}
\chapter{附录A}
\label{appendix-A}
\parinterval 从实践的角度,机器翻译的发展主要可以归功于两方面的推动作用:开源系统和评测。开源系统通过代码共享的方式使得最新的研究成果可以快速传播,同时实验结果可以复现。而评测比赛,使得各个研究组织的成果可以进行科学的对比,共同推动机器翻译的发展与进步。此外,开源项目也促进了不同团队之间的协作,让研究人员在同一个平台上集中力量攻关。
%----------------------------------------------------------------------------------------
% NEW SECTION
%----------------------------------------------------------------------------------------
\section{统计机器翻译开源系统}
\begin{itemize}
\vspace{0.5em}
\item NiuTrans.SMT。NiuTrans\upcite{Tong2012NiuTrans}是由东北大学自然语言处理实验室自主研发的统计机器翻译系统,该系统可支持基于短语的模型、基于层次短语的模型以及基于句法的模型。由于使用C++语言开发,所以该系统运行速度快,所占存储空间少。系统中内嵌有$n$-gram语言模型,故无需使用其他系统即可完成语言建模。网址:\url{http://opensource.niutrans.com/smt/index.html}
\vspace{0.5em}
\item Moses。Moses\upcite{Koehn2007Moses}是统计机器翻译时代最著名的系统之一,(主要)由爱丁堡大学的机器翻译团队开发。最新的Moses系统支持很多的功能,例如,它既支持基于短语的模型,也支持基于句法的模型。Moses 提供因子化翻译模型(Factored Translation Model),因此该模型可以很容易地对不同层次的信息进行建模。此外,它允许将混淆网络和字格作为输入,可缓解系统的1-best输出中的错误。Moses还提供了很多有用的脚本和工具,被机器翻译研究者广泛使用。网址:\url{http://www.statmt.org/moses/}
\vspace{0.5em}
\item Joshua。Joshua\upcite{Li2010Joshua}是由约翰霍普金斯大学的语言和语音处理中心开发的层次短语翻译系统。由于Joshua是由Java语言开发,所以它在不同的平台上运行或开发时具有良好的可扩展性和可移植性。Joshua也是使用非常广泛的开源机器翻译系统之一。网址:\url{https://cwiki.apache.org/confluence/display/JOSHUA/}
\vspace{0.5em}
\item SilkRoad。SilkRoad是由五个国内机构(中科院计算所、中科院软件所、中科院自动化所、厦门大学和哈尔滨工业大学)联合开发的基于短语的统计机器翻译系统。该系统是中国乃至亚洲地区第一个开源的统计机器翻译系统。SilkRoad支持多种解码器和规则提取模块,这样可以组合成不同的系统,提供多样的选择。网址:\url{http://www.nlp.org.cn/project/project.php?projid=14}
\vspace{0.5em}
\item SAMT。SAMT\upcite{zollmann2007the}是由卡内基梅隆大学机器翻译团队开发的语法增强的统计机器翻译系统。SAMT在解码的时候使用目标树来生成翻译规则,而不严格遵守目标语言的语法。SAMT 的一个亮点是它提供了简单但高效的方式在机器翻译中使用句法信息。由于SAMT在Hadoop中实现,它可受益于大数据集的分布式处理。网址:\url{http://www.cs.cmu.edu/zollmann/samt/}
\vspace{0.5em}
\item HiFST。HiFST\upcite{iglesias2009hierarchical}是剑桥大学开发的统计机器翻译系统。该系统完全基于有限状态自动机实现,因此非常适合对搜索空间进行有效的表示。网址:\url{http://ucam-smt.github.io/}
\vspace{0.5em}
\item cdec。cdec\upcite{dyer2010cdec}是一个强大的解码器,由Chris Dyer和他的合作者们开发。cdec的一个主要特点是使用了翻译模型的统一内部表示,并为结构预测问题的各种模型和算法提供了实现框架。因此,cdec也可以被用作对齐系统或者更通用的学习框架。此外,由于使用C++语言编写,cdec的运行速度较快。网址:\url{http://cdec-decoder.org/index.php?title=MainPage}
\vspace{0.5em}
\item Phrasal。Phrasal\upcite{Cer2010Phrasal}是由斯坦福大学自然语言处理小组开发的系统。除了传统的基于短语的模型,Phrasal还支持基于非层次短语的模型,这种模型将基于短语的翻译延伸到非连续的短语翻译,增加了模型的泛化能力。网址:\url{http://nlp.stanford.edu/phrasal/}
\vspace{0.5em}
\item Jane。Jane\upcite{vilar2012jane}是一个基于短语和基于层次短语的机器翻译系统,由亚琛工业大学的人类语言技术与模式识别小组开发。Jane提供了系统融合模块,因此可以非常方便地对多个系统进行融合。网址:\url{https://www-i6.informatik.rwth-aachen.de/jane/}
\vspace{0.5em}
\item GIZA++。GIZA++\upcite{och2003systematic}是Franz Och研发的用于训练IBM模型1-5和HMM单词对齐模型的工具包。在早期,GIZA++是所有统计机器翻译系统中词对齐的标配工具。网址:\url{https://github.com/moses-smt/giza-pp}
\vspace{0.5em}
\item FastAlign。FastAlign\upcite{DBLP:conf/naacl/DyerCS13}是一个快速,无监督的词对齐工具,由卡内基梅隆大学开发。网址:\url{https://github.com/clab/fast\_align}
\vspace{0.5em}
\end{itemize}
%----------------------------------------------------------------------------------------
% NEW SECTION
%----------------------------------------------------------------------------------------
\section{神经机器翻译开源系统}
\begin{itemize}
\vspace{0.5em}
\item GroundHog。GroundHog\upcite{bahdanau2014neural}基于Theano\upcite{al2016theano}框架,是由蒙特利尔大学LISA实验室使用Python语言编写的一个框架,旨在提供灵活而高效的方式来实现复杂的循环神经网络模型。它提供了包括LSTM在内的多种模型。Bahdanau等人在此框架上又编写了GroundHog神经机器翻译系统。该系统也被用作很多论文的基线系统。网址:\url{https://github.com/lisa-groundhog/GroundHog}
\vspace{0.5em}
\item Nematus。Nematus\upcite{DBLP:journals/corr/SennrichFCBHHJL17}是英国爱丁堡大学开发的,基于Theano框架的神经机器翻译系统。该系统使用GRU作为隐层单元,支持多层网络。Nematus 编码端有正向和反向的编码方式,可以同时提取源语句子中的上下文信息。该系统的一个优点是,它可以支持输入端有多个特征的输入(例如词的词性等)。网址:\url{https://github.com/EdinburghNLP/nematus}
\vspace{0.5em}
\item ZophRNN。ZophRNN\upcite{zoph2016simple}是由南加州大学的Barret Zoph 等人使用C++语言开发的系统。ZophRNN既可以训练序列表示模型(如语言模型),也可以训练序列到序列的模型(如神经机器翻译模型)。当训练神经机器翻译系统时,ZophRNN也支持多源输入。网址:\url{https://github.com/isi-nlp/Zoph\_RNN}
\vspace{0.5em}
\item Fairseq。Fairseq\upcite{Ottfairseq}是由Facebook开发的,基于PyTorch框架的用以解决序列到序列问题的工具包,其中包括基于卷积神经网络、基于循环神经网络、基于Transformer的模型等。Fairseq是当今使用最广泛的神经机器翻译开源系统之一。网址:\url{https://github.com/facebookresearch/fairseq}
\vspace{0.5em}
\item Tensor2Tensor。Tensor2Tensor\upcite{Vaswani2018Tensor2TensorFN}是由谷歌推出的,基于TensorFlow框架的开源系统。该系统基于Transformer模型,因此可以支持大多数序列到序列任务。得益于Transformer 的网络结构,系统的训练速度较快。现在,Tensor2Tensor也是机器翻译领域广泛使用的开源系统之一。网址:\url{https://github.com/tensorflow/tensor2tensor}
\vspace{0.5em}
\item OpenNMT。OpenNMT\upcite{KleinOpenNMT}系统是由哈佛大学自然语言处理研究组开源的,基于Torch框架的神经机器翻译系统。OpenNMT系统的早期版本使用Lua 语言编写,现在也扩展到了TensorFlow和PyTorch,设计简单易用,易于扩展,同时保持效率和翻译精度。网址:\url{https://github.com/OpenNMT/OpenNMT}
\vspace{0.5em}
\item 斯坦福神经机器翻译开源代码库。斯坦福大学自然语言处理组(Stanford NLP)发布了一篇教程,介绍了该研究组在神经机器翻译上的研究信息,同时实现了多种翻译模型\upcite{luong2016acl_hybrid}。 网址:\url{https://nlp.stanford.edu/projects/nmt/}
\vspace{0.5em}
\item THUMT。清华大学NLP团队实现的神经机器翻译系统,支持Transformer等模型\upcite{ZhangTHUMT}。该系统主要基于TensorFlow和Theano实现,其中Theano版本包含了RNNsearch模型,训练方式包括MLE(Maximum Likelihood Estimation)、MRT(Minimum Risk Training)、SST(Semi-Supervised Training)。TensorFlow 版本实现了Seq2Seq、RNNsearch、Transformer三种基本模型。网址:\url{https://github.com/THUNLP-MT/THUMT}
\vspace{0.5em}
\item NiuTrans.NMT。由小牛翻译团队基于NiuTensor实现的神经机器翻译系统。支持循环神经网络、Transformer等结构,并支持语言建模、序列标注、机器翻译等任务。支持机器翻译的GPU与CPU训练及解码。其小巧易用,为开发人员提供了快速二次开发的基础。此外,NiuTrans.NMT已经得到了大规模应用,形成了支持304种语言翻译的小牛翻译系统。网址:\url{http://opensource.niutrans.com/niutensor/index.html}
\vspace{0.5em}
\item MARIAN。主要由微软翻译团队搭建\upcite{JunczysMarian},是使用C++实现的用于GPU/CPU训练和解码的引擎,支持多GPU训练和批量解码,最小限度依赖第三方库,静态编译一次之后,复制其二进制文件就能在其他平台使用。网址:\url{https://marian-nmt.github.io/}
\vspace{0.5em}
\item Sockeye。由Awslabs开发的神经机器翻译框架\upcite{hieber2017sockeye}。其中支持RNNsearch、Transformer、CNN等翻译模型,同时提供了从图片翻译到文字的模块以及WMT 德英新闻翻译、领域适应任务、多语言零资源翻译任务的教程。网址:\url{https://awslabs.github.io/sockeye/}
\vspace{0.5em}
\item CytonMT。由NICT开发的一种用C++实现的神经机器翻译开源工具包\upcite{WangCytonMT}。主要支持Transformer模型,并支持一些常用的训练方法以及解码方法。网址:\url{https://github.com/arthurxlw/cytonMt}
\vspace{0.5em}
\item OpenSeq2Seq。由NVIDIA团队开发的\upcite{DBLP:journals/corr/abs-1805-10387}基于TensorFlow的模块化架构,用于序列到序列的模型,允许从可用组件中组装新模型,支持混合精度训练,利用NVIDIA Volta Turing GPU中的Tensor核心,基于Horovod的快速分布式训练,支持多GPU,多节点多模式。网址:\url{https://nvidia.github.io/OpenSeq2Seq/html/index.html}
\vspace{0.5em}
\item NMTPyTorch。由勒芒大学语言实验室发布的基于序列到序列框架的神经网络翻译系统\upcite{nmtpy2017},NMTPyTorch的核心部分依赖于Numpy,PyTorch和tqdm。其允许训练各种端到端神经体系结构,包括但不限于神经机器翻译、图像字幕和自动语音识别系统。网址:\url{https://github.com/lium-lst/nmtpytorch}
\vspace{0.5em}
\end{itemize}
%----------------------------------------------------------------------------------------
% NEW SECTION
%----------------------------------------------------------------------------------------
\section{公开评测任务}
\parinterval 机器翻译相关评测主要有两种组织形式,一种是由政府及国家相关机构组织,权威性强。如由美国国家标准技术研究所组织的NIST评测、日本国家科学咨询系统中心主办的NACSIS Test Collections for IR(NTCIR)PatentMT、日本科学振兴机构(Japan Science and Technology Agency,简称JST)等组织联合举办的Workshop on Asian Translation(WAT)以及国内由中文信息学会主办的全国机器翻译大会(China Conference on Machine Translation,简称CCMT);另一种是由相关学术机构组织,具有领域针对性的特点,如倾向新闻领域的Conference on Machine Translation(WMT)以及面向口语的International Workshop on Spoken Language Translation(IWSLT)。下面将针对上述评测进行简要介绍。
\begin{itemize}
\vspace{0.5em}
\item CCMT(全国机器翻译大会),前身为CWMT(全国机器翻译研讨会)是国内机器翻译领域的旗舰会议,自2005年起已经组织多次机器翻译评测,对国内机器翻译相关技术的发展产生了深远影响。该评测主要针对汉语、英语以及国内的少数民族语言(蒙古语、藏语、维吾尔语等)进行评测,领域包括新闻、口语、政府文件等,不同语言方向对应的领域也有所不同。评价方式不同届略有不同,主要采用自动评价的方式,自CWMT\ 2013起则针对某些领域增设人工评价。自动评价的指标一般包括BLEU-SBP、BLEU-NIST、TER、METEOR、NIST、GTM、mWER、mPER 以及ICT 等,其中以BLEU-SBP 为主,汉语为目标语的翻译采用基于字符的评价方式,面向英语的翻译采用基于词的评价方式。每年该评测吸引国内外近数十家企业及科研机构参赛,业内认可度极高。关于CCMT的更多信息可参考中文信息学会机器翻译专业委员会相关页面:\url{http://sc.cipsc.org.cn/mt/index.php/CWMT.html}
\vspace{0.5em}
\item WMT由Special Interest Group for Machine Translation(SIGMT)主办,会议自2006年起每年召开一次,是一个涉及机器翻译多种任务的综合性会议,包括多领域翻译评测任务、质量评价任务以及其他与机器翻译相关的任务(如文档对齐评测等)。现在WMT已经成为机器翻译领域的旗舰评测会议,很多研究工作都以WMT评测结果作为基准。WMT评测涉及的语言范围较广,包括英语、德语、芬兰语、捷克语、罗马尼亚语等十多种语言,翻译方向一般以英语为核心,探索英语与其他语言之间的翻译性能,领域包括新闻、信息技术、生物医学。最近,也增加了无指导机器翻译等热门问题。WMT在评价方面类似于CCMT,也采用人工评价与自动评价相结合的方式,自动评价的指标一般为BLEU、TER 等。此外,WMT公开了所有评测数据,因此也经常被机器翻译相关人员所使用。更多WMT的机器翻译评测相关信息可参考SIGMT官网:\url{http://www.sigmt.org/}
\vspace{0.5em}
\item NIST机器翻译评测开始于2001年,是早期机器翻译公开评测中颇具代表性的任务,现在WMT和CCMT很多任务的设置也大量参考了当年NIST评测的内容。NIST评测由美国国家标准技术研究所主办,作为美国国防高级研究计划局(DARPA)中TIDES计划的重要组成部分。早期,NIST评测主要评价阿拉伯语和汉语等语言到英语的翻译效果,评价方法一般采用人工评价与自动评价相结合的方式。人工评价采用5分制评价。自动评价使用多种方式,包括BLEU、METEOR、TER以及HyTER。此外,NIST从2016年起开始对稀缺语言资源技术进行评估,其中机器翻译作为其重要组成部分共同参与评测,评测指标主要为BLEU。除对机器翻译系统进行评测之外,NIST在2008和2010年对于机器翻译的自动评价方法(MetricsMaTr)也进行了评估,以鼓励更多研究人员对现有评价方法进行改进或提出更加贴合人工评价的方法。同时,NIST评测所提供的数据集由于数据质量较高受到众多科研人员喜爱,如MT04、MT06等(汉英)平行语料经常被科研人员在实验中使用。不过,近几年NIST评测已经停止。更多NIST的机器翻译评测相关信息可参考官网:\url{https://www.nist.gov/programs-projects/machine-translation}
\vspace{0.5em}
\item 从2004年开始举办的IWSLT也是颇具特色的机器翻译评测,它主要关注口语相关的机器翻译任务,测试数据包括TED talks的多语言字幕以及QED 教育讲座影片字幕等,语言涉及英语、法语、德语、捷克语、汉语、阿拉伯语等众多语言。此外在IWSLT 2016 中还加入了对于日常对话的翻译评测,尝试将微软Skype中一种语言的对话翻译成其他语言。评价方式采用自动评价的模式,评价标准和WMT类似,一般为BLEU 等指标。另外,IWSLT除了对文本到文本的翻译评测外,还有自动语音识别以及语音转另一种语言的文本的评测。更多IWSLT的机器翻译评测相关信息可参考IWSLT\ 2019官网:\url{https://workshop2019.iwslt.org/}
\vspace{0.5em}
\item 日本举办的机器翻译评测WAT是亚洲范围内的重要评测之一,由日本科学振兴机构(JST)、情报通信研究机构(NICT)等多家机构共同组织,旨在为亚洲各国之间的交流融合提供便利。语言方向主要包括亚洲主流语言(汉语、韩语、印地语等)以及英语对日语的翻译,领域丰富多样,包括学术论文、专利、新闻、食谱等。评价方式包括自动评价(BLEU、RIBES以及AMFM 等)以及人工评价,其特点在于对于测试语料以段落为单位进行评价,考察其上下文关联的翻译效果。更多WAT的机器翻译评测相关信息可参考官网:\url{http://lotus.kuee.kyoto-u.ac.jp/WAT/}
\vspace{0.5em}
\item NTCIR计划是由日本国家科学咨询系统中心策划主办的,旨在建立一个用在自然语言处理以及信息检索相关任务上的日文标准测试集。在NTCIR-9和NTCIR-10中开设的Patent Machine Translation(PatentMT)任务主要针对专利领域进行翻译测试,其目的在于促进机器翻译在专利领域的发展和应用。在NTCIR-9中,评测方式采取人工评价与自动评价相结合,以人工评价为主导。人工评价主要根据准确度和流畅度进行评估,自动评价采用BLEU、NIST等方式进行。NTCIR-10评价方式在此基础上增加了专利审查评估、时间评估以及多语种评估,分别考察机器翻译系统在专利领域翻译的实用性、耗时情况以及不同语种的翻译效果等。更多NTCIR评测相关信息可参考官网:\url{http://research.nii.ac.jp/ntcir/index-en.html}
\vspace{0.5em}
\end{itemize}
\parinterval 以上评测数据大多可以从评测网站上下载,此外部分数据也可以从LDC(Linguistic Data Consortium)上申请,网址为\url{https://www.ldc.upenn.edu/}。ELRA(European Language Resources Association)上也有一些免费的语料库供研究使用,其官网为\url{http://www.elra.info/}。从机器翻译发展的角度看,这些评测任务给相关研究提供了基准数据集,使得不同的系统都可以在同一个环境下进行比较和分析,进而建立了机器翻译研究所需的实验基础。此外,公开评测也使得研究者可以第一时间了解机器翻译研究的最新成果,比如,有多篇ACL会议最佳论文的灵感就来自当年参加机器翻译评测任务的系统。
\end{appendices}
%----------------------------------------------------------------------------------------
% CHAPTER APPENDIX B
%----------------------------------------------------------------------------------------
\begin{appendices}
\chapter{附录B}
\label{appendix-B}
\parinterval 在构建机器翻译系统的过程中,数据是必不可少的,尤其是现在主流的神经机器翻译系统,系统的性能往往受限于语料库规模和质量。所幸的是,随着语料库语言学的发展,一些主流语种的相关语料资源已经十分丰富。
\parinterval 为了方便读者进行相关研究,我们汇总了几个常用的基准数据集,这些数据集已经在机器翻译领域中被广泛使用,有很多之前的相关工作可以进行复现和对比。同时,我们收集了一些常用的平行语料,方便读者进行一些探索。
......@@ -161,12 +265,12 @@
\end{appendices}
%----------------------------------------------------------------------------------------
% CHAPTER APPENDIX B
% CHAPTER APPENDIX C
%----------------------------------------------------------------------------------------
\begin{appendices}
\chapter{附录B}
\label{appendix-B}
\chapter{附录C}
\label{appendix-C}
%----------------------------------------------------------------------------------------
% NEW SECTION
......
......@@ -4,11 +4,12 @@
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%% chapter 1------------------------------------------------------
@book{慧立2000大慈恩寺三藏法師傳,
title={大慈恩寺三藏法師傳},
author={慧立 and 彦悰 and 道宣},
volume={2},
year={2000},
publisher={中华书局}
}
@book{2019cns,
......@@ -3853,16 +3854,11 @@ year = {2012}
%%%%% chapter 9------------------------------------------------------
@article{brown1992class,
title={Class-based n-gram models of natural language},
author={Peter F. Brown and
Vincent J. Della Pietra and
Peter V. De Souza and
Jennifer C. Lai and
Robert L. Mercer},
journal={Computational linguistics},
volume={18},
number={4},
......@@ -3872,10 +3868,8 @@ year = {2012}
@inproceedings{mikolov2012context,
title={Context dependent recurrent neural network language model},
author={Tomas Mikolov and
Geoffrey Zweig},
publisher={IEEE Spoken Language Technology Workshop},
pages={234--239},
year={2012}
......@@ -3883,38 +3877,28 @@ year = {2012}
@article{zaremba2014recurrent,
title={Recurrent Neural Network Regularization},
author={Wojciech Zaremba and
Ilya Sutskever and
Oriol Vinyals},
journal={arXiv: Neural and Evolutionary Computing},
year={2014}
}
@article{zilly2016recurrent,
title={Recurrent Highway Networks},
author={Julian G. Zilly and
Rupesh Kumar Srivastava and
Jan Koutn{\'{\i}}k and
J{\"{u}}rgen Schmidhuber},
journal={International Conference on Machine Learning},
year={2016}
}
@article{merity2017regularizing,
title={Regularizing and optimizing LSTM language models},
author={Stephen Merity and
Nitish Shirish Keskar and
Richard Socher},
journal={International Conference on Learning Representations},
year={2017}
}
......@@ -3992,7 +3976,7 @@ year = {2012}
@article{Ba2016LayerN,
author = {Lei Jimmy Ba and
Jamie Ryan Kiros and
Geoffrey Hinton},
title = {Layer Normalization},
journal = {CoRR},
volume = {abs/1607.06450},
......@@ -4017,7 +4001,7 @@ year = {2012}
Satoshi Nakamura},
title = {Incorporating Discrete Translation Lexicons into Neural Machine Translation},
pages = {1557--1567},
publisher = {Conference on Empirical Methods in Natural Language Processing},
year = {2016}
}
......@@ -4061,7 +4045,7 @@ year = {2012}
year={2011}
}
@inproceedings{mccann2017learned,
author = {Bryan Mccann and
James Bradbury and
Caiming Xiong and
Richard Socher},
......@@ -4080,15 +4064,15 @@ year = {2012}
Matt Gardner and
Christopher Clark and
Kenton Lee and
Luke Zettlemoyer},
publisher={Proceedings of the Human Language Technology Conference of the North American Chapter of the Association for Computational Linguistics},
year={2018}
}
@article{Graves2013HybridSR,
title={Hybrid speech recognition with Deep Bidirectional LSTM},
author={Alex Graves and
Navdeep Jaitly and
Abdel-rahman Mohamed},
publisher={IEEE Workshop on Automatic Speech Recognition and Understanding},
......@@ -4100,7 +4084,7 @@ year = {2012}
title={Character-Word LSTM Language Models},
author={Lyan Verwimp and
Joris Pelemans and
Hugo Van Hamme and
Patrick Wambacq},
publisher={European Association of Computational Linguistics},
year={2017}
......@@ -4111,7 +4095,7 @@ year = {2012}
Kyunghyun Cho},
title = {Gated Word-Character Recurrent Language Model},
pages = {1992--1997},
publisher = {Conference on Empirical Methods in Natural Language Processing},
year = {2016}
}
@inproceedings{Hwang2017CharacterlevelLM,
......@@ -4145,7 +4129,7 @@ year = {2012}
title={Larger-Context Language Modelling},
author={Tian Wang and
Kyunghyun Cho},
journal={Annual Meeting of the Association for Computational Linguistics},
year={2015}
}
@article{Adel2015SyntacticAS,
......@@ -4173,7 +4157,7 @@ year = {2012}
}
@inproceedings{Pham2016ConvolutionalNN,
title={Convolutional Neural Network Language Models},
author={Ngoc-quan Pham and
German Kruszewski and
Gemma Boleda},
publisher={Conference on Empirical Methods in Natural Language Processing},
......@@ -4267,9 +4251,9 @@ year = {2012}
@inproceedings{Bastings2017GraphCE,
title={Graph Convolutional Encoders for Syntax-aware Neural Machine Translation},
author={Jasmijn Bastings and
Ivan Titov and Wilker Aziz and
Diego Marcheggiani and
Khalil Sima'an},
publisher={Conference on Empirical Methods in Natural Language Processing},
year={2017}
}
......@@ -4726,8 +4710,8 @@ author = {Yoshua Bengio and
Quoc V. Le and
Ruslan Salakhutdinov},
title = {Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context},
journal = {Annual Meeting of the Association for Computational Linguistics},
pages = {2978--2988},
year = {2019}
}
@inproceedings{li-etal-2019-word,
......@@ -4809,7 +4793,7 @@ author = {Yoshua Bengio and
year = {2017}
}
@article{Hinton2015Distilling,
author = {Geoffrey Hinton and
Oriol Vinyals and
Jeffrey Dean},
title = {Distilling the Knowledge in a Neural Network},
......@@ -4820,7 +4804,7 @@ author = {Yoshua Bengio and
@inproceedings{Ott2018ScalingNM,
title={Scaling Neural Machine Translation},
author={Myle Ott and Sergey Edunov and David Grangier and Michael Auli},
publisher={Annual Meeting of the Association for Computational Linguistics},
year={2018}
}
......@@ -4841,7 +4825,7 @@ author = {Yoshua Bengio and
Alexander M. Rush},
title = {Sequence-Level Knowledge Distillation},
pages = {1317--1327},
publisher = {Conference on Empirical Methods in Natural Language Processing},
year = {2016}
}
@article{Akaike1969autoregressive,
......@@ -4877,7 +4861,7 @@ author = {Yoshua Bengio and
}
@inproceedings{He2018LayerWiseCB,
title={Layer-Wise Coordination between Encoder and Decoder for Neural Machine Translation},
author={Tianyu He and Xu Tan and Yingce Xia and Di He and Tao Qin and Zhibo Chen and Tie-Yan Liu},
publisher={Conference on Neural Information Processing Systems},
year={2018}
}
......@@ -4955,7 +4939,7 @@ author = {Yoshua Bengio and
Deyi Xiong},
title = {Encoding Gated Translation Memory into Neural Machine Translation},
pages = {3042--3047},
publisher = {Conference on Empirical Methods in Natural Language Processing},
year = {2018}
}
@inproceedings{yang-etal-2016-hierarchical,
......@@ -4967,7 +4951,7 @@ author = {Yoshua Bengio and
Eduard H. Hovy},
title = {Hierarchical Attention Networks for Document Classification},
pages = {1480--1489},
publisher = {Proceedings of the Human Language Technology Conference of the North American Chapter of the Association for Computational Linguistics},
year = {2016}
}
%%%%% chapter 10------------------------------------------------------
......@@ -4982,7 +4966,7 @@ author = {Yoshua Bengio and
Jian Sun},
title = {Faster {R-CNN:} Towards Real-Time Object Detection with Region Proposal
Networks},
journal = {{IEEE} Transactions on Pattern Analysis and Machine Intelligence},
volume = {39},
number = {6},
pages = {1137--1149},
......@@ -4998,10 +4982,9 @@ author = {Yoshua Bengio and
Cheng-Yang Fu and
Alexander C. Berg},
title = {{SSD:} Single Shot MultiBox Detector},
publisher = {European Conference on Computer Vision},
volume = {9905},
pages = {21--37},
year = {2016}
}
......@@ -5027,7 +5010,7 @@ author = {Yoshua Bengio and
Qun Liu},
title = {genCNN: {A} Convolutional Architecture for Word Sequence Prediction},
pages = {1567--1576},
publisher = {Annual Meeting of the Association for Computational Linguistics},
year = {2015}
}
......@@ -5037,7 +5020,7 @@ author = {Yoshua Bengio and
Navdeep Jaitly},
title = {Very deep convolutional networks for end-to-end speech recognition},
pages = {4845--4849},
publisher = {International Conference on Acoustics, Speech and Signal Processing},
year = {2017}
}
......@@ -5048,7 +5031,7 @@ author = {Yoshua Bengio and
title = {A deep convolutional neural network using heterogeneous pooling for
trading acoustic invariance with phonetic confusion},
pages = {6669--6673},
publisher = {International Conference on Acoustics, Speech and Signal Processing},
year = {2013}
}
......@@ -5057,8 +5040,7 @@ author = {Yoshua Bengio and
Hieu Pham and
Christopher D. Manning},
title = {Effective Approaches to Attention-based Neural Machine Translation},
publisher = {Conference on Empirical Methods in Natural Language Processing},
pages = {1412--1421},
year = {2015}
}
......@@ -5082,7 +5064,7 @@ author = {Yoshua Bengio and
title = {Leveraging Linguistic Structures for Named Entity Recognition with
Bidirectional Recursive Neural Networks},
pages = {2664--2669},
publisher = {Conference on Empirical Methods in Natural Language Processing},
year = {2017}
}
......@@ -5098,10 +5080,10 @@ author = {Yoshua Bengio and
author = {Emma Strubell and
Patrick Verga and
David Belanger and
Andrew Mccallum},
title = {Fast and Accurate Entity Recognition with Iterated Dilated Convolutions},
pages = {2670--2680},
publisher = {Conference on Empirical Methods in Natural Language Processing},
year = {2017}
}
......@@ -5167,7 +5149,7 @@ author = {Yoshua Bengio and
Tommi S. Jaakkola},
title = {Molding CNNs for text: non-linear, non-consecutive convolutions},
pages = {1565--1575},
publisher = {Conference on Empirical Methods in Natural Language Processing},
year = {2015}
}
......@@ -5177,7 +5159,7 @@ author = {Yoshua Bengio and
title = {Effective Use of Word Order for Text Categorization with Convolutional
Neural Networks},
pages = {103--112},
publisher = {Proceedings of the Human Language Technology Conference of the North American Chapter of the Association for Computational Linguistics},
year = {2015}
}
......@@ -5186,7 +5168,7 @@ author = {Yoshua Bengio and
Ralph Grishman},
title = {Relation Extraction: Perspective from Convolutional Neural Networks},
pages = {39--48},
publisher = {Proceedings of the Human Language Technology Conference of the North American Chapter of the Association for Computational Linguistics},
year = {2015}
}
......@@ -5204,7 +5186,7 @@ author = {Yoshua Bengio and
Barry Haddow and
Alexandra Birch},
title = {Improving Neural Machine Translation Models with Monolingual Data},
publisher = {Annual Meeting of the Association for Computational Linguistics},
year = {2016}
}
......@@ -5219,7 +5201,7 @@ author = {Yoshua Bengio and
@article{Waibel1989PhonemeRU,
title={Phoneme recognition using time-delay neural networks},
author={Alexander Waibel and Toshiyuki Hanazawa and Geoffrey Hinton and Kiyohiro Shikano and Kevin J. Lang},
journal={IEEE Transactions on Acoustics, Speech, and Signal Processing},
year={1989},
volume={37},
......@@ -5228,7 +5210,7 @@ author = {Yoshua Bengio and
@article{LeCun1989BackpropagationAT,
title={Backpropagation Applied to Handwritten Zip Code Recognition},
author={Yann Lecun and Bernhard Boser and John Denker and Don Henderson and Richard E. Howard and Wayne E. Hubbard and Larry Jackel},
journal={Neural Computation},
year={1989},
volume={1},
......@@ -5242,7 +5224,7 @@ author = {Yoshua Bengio and
year={1998},
volume={86},
number={11},
pages={2278-2324}
}
@inproceedings{DBLP:journals/corr/HeZRS15,
......@@ -5253,7 +5235,7 @@ author = {Yoshua Bengio and
title = {Deep Residual Learning for Image Recognition},
publisher = {{IEEE} Conference on Computer Vision and Pattern Recognition},
pages = {770--778},
year = {2016}
}
@inproceedings{DBLP:conf/cvpr/HuangLMW17,
......@@ -5278,10 +5260,9 @@ author = {Yoshua Bengio and
@article{He2020MaskR,
title={Mask R-CNN},
author={Kaiming He and Georgia Gkioxari and Piotr Doll{\'a}r and Ross B. Girshick},
journal={International Conference on Computer Vision},
pages={2961--2969},
year={2017}
}
@inproceedings{Kalchbrenner2014ACN,
......@@ -5316,7 +5297,7 @@ author = {Yoshua Bengio and
author = {C{\'{\i}}cero Nogueira dos Santos and
Maira Gatti},
pages = {69--78},
publisher = {International Conference on Computational Linguistics},
year={2014}
}
......@@ -5373,7 +5354,7 @@ author = {Yoshua Bengio and
Michael Auli},
title = {Pay Less Attention with Lightweight and Dynamic Convolutions},
publisher = {International Conference on Learning Representations},
year = {2019}
}
@inproceedings{kalchbrenner-blunsom-2013-recurrent,
......@@ -5381,8 +5362,8 @@ author = {Yoshua Bengio and
Phil Blunsom},
title = {Recurrent Continuous Translation Models},
pages = {1700--1709},
publisher = {Conference on Empirical Methods in Natural Language Processing},
year = {2013}
}
@article{Wu2016GooglesNM,
......@@ -5458,7 +5439,7 @@ author = {Yoshua Bengio and
author = {Ilya Sutskever and
James Martens and
George E. Dahl and
Geoffrey Hinton},
publisher = {International Conference on Machine Learning},
pages = {1139--1147},
year={2013}
......@@ -5473,7 +5454,7 @@ author = {Yoshua Bengio and
}
@article{JMLR:v15:srivastava14a,
author = {Nitish Srivastava and Geoffrey Hinton and Alex Krizhevsky and Ilya Sutskever and Ruslan Salakhutdinov},
title = {Dropout: A Simple Way to Prevent Neural Networks from Overfitting},
journal = {Journal of Machine Learning Research},
year = {2014},
......@@ -5507,7 +5488,7 @@ author = {Yoshua Bengio and
title={Rigid-motion scattering for image classification},
author={Sifre, Laurent and Mallat, St{\'e}phane},
year={2014},
journal={Citeseer}
}
@article{Taigman2014DeepFaceCT,
......@@ -5566,7 +5547,7 @@ author = {Yoshua Bengio and
Tong Zhang},
title = {Deep Pyramid Convolutional Neural Networks for Text Categorization},
pages = {562--570},
publisher = {Annual Meeting of the Association for Computational Linguistics},
year = {2017}
}
......@@ -5595,7 +5576,7 @@ author = {Yoshua Bengio and
title = {Speech-Transformer: {A} No-Recurrence Sequence-to-Sequence Model for
Speech Recognition},
pages = {5884--5888},
publisher = {International Conference on Acoustics, Speech and Signal Processing},
year = {2018}
}
......@@ -5773,14 +5754,14 @@ author = {Yoshua Bengio and
}
@article{Liu2020LearningTE,
title={Learning to Encode Position for Transformer with Continuous Dynamical Model},
author={Xuanqing Liu and Hsiang-Fu Yu and Inderjit Dhillon and Cho-Jui Hsieh},
journal={ArXiv},
year={2020},
volume={abs/2003.09229}
}
@inproceedings{Jawahar2019WhatDB,
title={What Does BERT Learn about the Structure of Language?},
author={Ganesh Jawahar and Beno{\^{\i}}t Sagot and Djam{\'e} Seddah},
publisher={Annual Meeting of the Association for Computational Linguistics},
year={2019}
}
......@@ -5943,7 +5924,7 @@ author = {Yoshua Bengio and
Translation Models},
volume = {3265},
pages = {115--124},
publisher = {Association for Machine Translation in the Americas},
year = {2004}
}
......@@ -5954,19 +5935,20 @@ author = {Yoshua Bengio and
Bill Byrne},
title = {SGNMT - A Flexible NMT Decoding Platform for Quick Prototyping
of New Models and Search Strategies},
pages = {25--30},
publisher = {Conference on Empirical Methods in Natural Language Processing},
year = {2017}
}
@inproceedings{Liu2016AgreementOT,
title={Agreement on Target-bidirectional Neural Machine Translation},
author={Lemao Liu and
Masao Utiyama and
Andrew M. Finch and
Eiichiro Sumita},
pages = {411--416},
publisher = {Annual Conference of the North American Chapter of the Association for Computational Linguistics},
year = {2016}
}
@inproceedings{DBLP:conf/wmt/LiLXLLLWZXWFCLL19,
......@@ -5988,11 +5970,8 @@ author = {Yoshua Bengio and
Tong Xiao and
Jingbo Zhu},
title = {The NiuTrans Machine Translation Systems for {WMT19}},
pages = {257--266},
publisher = {Annual Meeting of the Association for Computational Linguistics},
year = {2019}
}
......@@ -6001,19 +5980,19 @@ author = {Yoshua Bengio and
Barry Haddow and
Alexandra Birch},
title = {Edinburgh Neural Machine Translation Systems for {WMT} 16},
pages = {371--376},
publisher = {Annual Meeting of the Association for Computational Linguistics},
year = {2016}
}
@article{Stahlberg2018TheUO,
title={The University of Cambridge's Machine Translation Systems for WMT18},
author={Felix Stahlberg and
Adri{\`{a}} de Gispert and
Bill Byrne},
pages = {504--512},
publisher = {Annual Meeting of the Association for Computational Linguistics},
year = {2018}
}
@inproceedings{DBLP:conf/aaai/ZhangSQLJW18,
......@@ -6024,20 +6003,18 @@ author = {Yoshua Bengio and
Rongrong Ji and
Hongji Wang},
title = {Asynchronous Bidirectional Decoding for Neural Machine Translation},
pages = {5698--5705},
publisher = {AAAI Conference on Artificial Intelligence},
year = {2018}
}
@article{Li2017EnhancedNM,
title={Enhanced neural machine translation by learning from draft},
author={Aodong Li and
Shiyue Zhang and
Dong Wang and
Thomas Fang Zheng},
publisher={Asia-Pacific Signal and Information Processing Association Annual Summit and Conference},
year={2017},
pages={1583-1587}
}
......@@ -6045,120 +6022,141 @@ author = {Yoshua Bengio and
@inproceedings{ElMaghraby2018EnhancingTF,
title={Enhancing Translation from English to Arabic Using Two-Phase Decoder Translation},
author={Ayah ElMaghraby and Ahmed Rafea},
pages = {539--549},
publisher = {Intelligent Systems and Applications},
year = {2018}
}
@inproceedings{Geng2018AdaptiveMD,
title={Adaptive Multi-pass Decoder for Neural Machine Translation},
author={Xinwei Geng and
Xiaocheng Feng and
Bing Qin and
Ting Liu},
publisher ={Conference on Empirical Methods in Natural Language Processing},
pages={523--532},
year={2018}
}
@article{Lee2018DeterministicNN,
title={Deterministic Non-Autoregressive Neural Sequence Modeling by Iterative Refinement},
author={Jason Lee and Elman Mansimov and Kyunghyun Cho},
pages = {1173--1182},
publisher = {Conference on Empirical Methods in Natural Language Processing},
year = {2018}
}
@inproceedings{Gu2019LevenshteinT,
title={Levenshtein Transformer},
author={Jiatao Gu and Changhan Wang and Jake Zhao},
publisher = {Conference and Workshop on Neural Information Processing Systems},
pages = {11179--11189},
year = {2019},
}
@inproceedings{Guo2020JointlyMS,
title={Jointly Masked Sequence-to-Sequence Model for Non-Autoregressive Neural Machine Translation},
author={Junliang Guo and Linli Xu and Enhong Chen},
pages = {376--385},
publisher = {Annual Meeting of the Association for Computational Linguistics},
year = {2020}
}
@article{Stahlberg2018AnOS,
title={An Operation Sequence Model for Explainable Neural Machine Translation},
author={Felix Stahlberg and Danielle Saunders and Bill Byrne},
pages = {175--186},
publisher = {Conference on Empirical Methods in Natural Language Processing},
year = {2018}
}
@inproceedings{Stern2019InsertionTF,
title={Insertion Transformer: Flexible Sequence Generation via Insertion Operations},
author={Mitchell Stern and William Chan and Jamie Kiros and Jakob Uszkoreit},
publisher={International Conference on Machine Learning},
pages={5976--5985},
year={2019}
}
@article{stling2017NeuralMT,
title={Neural machine translation for low-resource languages},
author={Robert {\"O}stling and J{\"{o}}rg Tiedemann},
journal={CoRR},
year={2017},
volume={abs/1708.05729}
}
@article{Kikuchi2016ControllingOL,
title={Controlling Output Length in Neural Encoder-Decoders},
author={Yuta Kikuchi and
Graham Neubig and
Ryohei Sasano and
Hiroya Takamura and
Manabu Okumura},
pages = {1328--1338},
publisher = {Conference on Empirical Methods in Natural Language Processing},
year = {2016}
}
@inproceedings{Takase2019PositionalET,
title={Positional Encoding to Control Output Sequence Length},
author={Sho Takase and
Naoaki Okazaki},
publisher={Annual Conference of the North American Chapter of the Association for Computational Linguistics},
pages={3999--4004},
year={2019}
}
@inproceedings{Murray2018CorrectingLB,
title={Correcting Length Bias in Neural Machine Translation},
author={Kenton Murray and David Chiang},
pages = {212--223},
publisher = {Annual Meeting of the Association for Computational Linguistics},
year = {2018}
}
@article{Sountsov2016LengthBI,
title={Length bias in Encoder Decoder Models and a Case for Global Conditioning},
author={Pavel Sountsov and Sunita Sarawagi},
pages = {1516--1525},
publisher = {Conference on Empirical Methods in Natural Language Processing},
year = {2016}
}
@inproceedings{Jean2015MontrealNM,
title={Montreal Neural Machine Translation Systems for WMT'15},
author={S{\'{e}}bastien Jean and
Orhan Firat and
Kyunghyun Cho and
Roland Memisevic and
Yoshua Bengio},
publisher={Conference on Empirical Methods in Natural Language Processing},
pages={134--140},
year={2015}
}
@inproceedings{Yang2018OtemUtemOA,
title={Otem{\&}Utem: Over- and Under-Translation Evaluation Metric for NMT},
author={Jing Yang and
Biao Zhang and
Yue Qin and
Xiangwen Zhang and
Qian Lin and
Jinsong Su},
publisher={CCF International Conference on Natural Language Processing and Chinese Computing},
pages={291--302},
year={2018}
}
@inproceedings{Mi2016CoverageEM,
title={Coverage Embedding Models for Neural Machine Translation},
author={Haitao Mi and
Baskaran Sankaran and
Zhiguo Wang and
Abe Ittycheriah},
pages = {955--960},
publisher = {Conference on Empirical Methods in Natural Language Processing},
year = {2016}
}
@inproceedings{DBLP:conf/emnlp/HuangZM17,
......@@ -6175,7 +6173,8 @@ author = {Yoshua Bengio and
@inproceedings{Wiseman2016SequencetoSequenceLA,
title={Sequence-to-Sequence Learning as Beam-Search Optimization},
author={Sam Wiseman and Alexander M. Rush},
publisher={Conference on Empirical Methods in Natural Language Processing},
pages={1296--1306},
year={2016}
}
......@@ -6192,10 +6191,12 @@ author = {Yoshua Bengio and
@article{Ma2019LearningTS,
title={Learning to Stop in Structured Prediction for Neural Machine Translation},
author={Mingbo Ma and
Renjie Zheng and
Liang Huang},
pages = {1884--1889},
publisher = {Annual Conference of the North American Chapter of the Association for Computational Linguistics},
year = {2019}
}
@inproceedings{KleinOpenNMT,
......@@ -6219,119 +6220,153 @@ author = {Yoshua Bengio and
year = {2015}
}
@inproceedings{Jiang2012LearnedPF,
title={Learned Prioritization for Trading Off Accuracy and Speed},
author={Jiarong Jiang and Adam R. Teichert and Hal Daum{\'e} and Jason Eisner},
publisher={Conference and Workshop on Neural Information Processing Systems},
pages={1340--1348},
year= {2012}
}
@inproceedings{Zheng2020OpportunisticDW,
title={Opportunistic Decoding with Timely Correction for Simultaneous Translation},
author={Renjie Zheng and
Mingbo Ma and
Baigong Zheng and
Kaibo Liu and
Liang Huang},
publisher={Annual Meeting of the Association for Computational Linguistics},
pages={437--442},
year={2020}
}
@inproceedings{Ma2019STACLST,
title={STACL: Simultaneous Translation with Implicit Anticipation and Controllable Latency using Prefix-to-Prefix Framework},
author={Mingbo Ma and
Liang Huang and
Hao Xiong and
Renjie Zheng and
Kaibo Liu and
Baigong Zheng and
Chuanqiang Zhang and
Zhongjun He and
Hairong Liu and
Xing Li and
Hua Wu and
Haifeng Wang},
publisher={Annual Meeting of the Association for Computational Linguistics},
pages={3025--3036},
year={2019}
}
@inproceedings{Gimpel2013ASE,
title={A Systematic Exploration of Diversity in Machine Translation},
author={Kevin Gimpel and Dhruv Batra and Chris Dyer and Gregory Shakhnarovich},
publisher={Conference on Empirical Methods in Natural Language Processing},
pages={1100--1111},
year={2013}
}
@article{Li2016MutualIA,
title={Mutual Information and Diverse Decoding Improve Neural Machine Translation},
author={Jiwei Li and Dan Jurafsky},
journal={CoRR},
year={2016},
volume={abs/1601.00372}
}
@inproceedings{Li2016ADO,
title={A Diversity-Promoting Objective Function for Neural Conversation Models},
author={Jiwei Li and
Michel Galley and
Chris Brockett and
Jianfeng Gao and
Bill Dolan},
publisher={Annual Conference of the North American Chapter of the Association for Computational Linguistics},
pages={110--119},
year={2016}
}
@inproceedings{He2018SequenceTS,
title={Sequence to Sequence Mixture Model for Diverse Machine Translation},
author={Xuanli He and Gholamreza Haffari and Mohammad Norouzi},
pages = {583--592},
publisher = {Conference on Computational Natural Language Learning},
year = {2018}
}
@article{Shen2019MixtureMF,
title={Mixture Models for Diverse Machine Translation: Tricks of the Trade},
author={Tianxiao Shen and Myle Ott and Michael Auli and Marc'Aurelio Ranzato},
pages = {5719--5728},
publisher = {International Conference on Machine Learning},
year = {2019},
}
@article{Wu2020GeneratingDT,
title={Generating Diverse Translation from Model Distribution with Dropout},
author={Xuanfu Wu and Yang Feng and Chenze Shao},
pages={1088--1097},
publisher={Annual Meeting of the Association for Computational Linguistics},
year={2020}
}
@inproceedings{Sun2020GeneratingDT,
title={Generating Diverse Translation by Manipulating Multi-Head Attention},
author={Zewei Sun and Shujian Huang and Hao-Ran Wei and Xin-Yu Dai and Jiajun Chen},
publisher={AAAI Conference on Artificial Intelligence},
pages={8976--8983},
year={2020}
}
@article{Vijayakumar2016DiverseBS,
title={Diverse Beam Search: Decoding Diverse Solutions from Neural Sequence Models},
author={Ashwin K. Vijayakumar and
Michael Cogswell and
Ramprasaath R. Selvaraju and
Qing Sun and
Stefan Lee and
David J. Crandall and
Dhruv Batra},
journal={CoRR},
year={2016},
volume={abs/1610.02424}
}
@inproceedings{Liu2014SearchAwareTF,
title={Search-Aware Tuning for Machine Translation},
author={Lemao Liu and
Liang Huang},
publisher={Conference on Empirical Methods in Natural Language Processing},
pages={1942--1952},
year={2014}
}
@inproceedings{Yu2013MaxViolationPA,
title={Max-Violation Perceptron and Forced Decoding for Scalable MT Training},
author={Heng Yu and Liang Huang and Haitao Mi and Kai Zhao},
publisher={Conference on Empirical Methods in Natural Language Processing},
pages={1112--1123},
year={2013}
}
@inproceedings{Stahlberg2019OnNS,
title={On NMT Search Errors and Model Errors: Cat Got Your Tongue?},
author={Felix Stahlberg and
Bill Byrne},
publisher={Conference on Empirical Methods in Natural Language Processing},
pages={3354--3360},
year={2019}
}
@inproceedings{Niehues2017AnalyzingNM,
title={Analyzing Neural MT Search and Model Performance},
author={Jan Niehues and
Eunah Cho and
Thanh-Le Ha and
Alex Waibel},
pages={11--17},
publisher={Annual Meeting of the Association for Computational Linguistics},
year={2017}
}
......@@ -6346,26 +6381,31 @@ author = {Yoshua Bengio and
@article{Ranzato2016SequenceLT,
title={Sequence Level Training with Recurrent Neural Networks},
author={Marc'Aurelio Ranzato and
Sumit Chopra and
Michael Auli and
Wojciech Zaremba},
publisher={International Conference on Learning Representations},
year={2016}
}
@article{Bengio2015ScheduledSF,
title={Scheduled Sampling for Sequence Prediction with Recurrent Neural Networks},
author={Samy Bengio and
Oriol Vinyals and
Navdeep Jaitly and
Noam Shazeer},
publisher = {Conference and Workshop on Neural Information Processing Systems},
pages = {1171--1179},
year = {2015}
}
@article{Zhang2019BridgingTG,
title={Bridging the Gap between Training and Inference for Neural Machine Translation},
author={Wen Zhang and Yang Feng and Fandong Meng and Di You and Qun Liu},
pages = {4334--4343},
publisher = {Annual Meeting of the Association for Computational Linguistics},
year = {2019}
}
@inproceedings{DBLP:conf/acl/ShenCHHWSL16,
......@@ -6381,15 +6421,6 @@ author = {Yoshua Bengio and
year = {2016},
}
@inproceedings{DBLP:conf/acl/SennrichHB16a,
author = {Rico Sennrich and
Barry Haddow and
......@@ -6433,26 +6464,31 @@ author = {Yoshua Bengio and
@article{Narang2017BlockSparseRN,
title={Block-Sparse Recurrent Neural Networks},
author={Sharan Narang and Eric Undersander and Gregory Diamos},
journal={CoRR},
year={2017},
volume={abs/1711.02782}
}
@article{Gale2019TheSO,
title={The State of Sparsity in Deep Neural Networks},
author={Trevor Gale and
Erich Elsen and
Sara Hooker},
journal={CoRR},
year={2019},
volume={abs/1902.09574}
}
@article{Michel2019AreSH,
author = {Paul Michel and
Omer Levy and
Graham Neubig},
title = {Are Sixteen Heads Really Better than One?},
publisher = {Conference and Workshop on Neural Information Processing Systems},
pages = {14014--14024},
year = {2019}
}
@inproceedings{DBLP:journals/corr/abs-1905-09418,
......@@ -6480,17 +6516,11 @@ author = {Yoshua Bengio and
@article{Katharopoulos2020TransformersAR,
title={Transformers are RNNs: Fast Autoregressive Transformers with Linear Attention},
author={Angelos Katharopoulos and Apoorv Vyas and Nikolaos Pappas and Fran{\c{c}}ois Fleuret},
journal={CoRR},
year={2020},
volume={abs/2006.16236}
}
@article{xiao2011language,
title ={Language Modeling for Syntax-Based Machine Translation Using Tree Substitution Grammars: A Case Study on Chinese-English Translation},
author ={Xiao, Tong and Zhu, Jingbo and Zhu, Muhua},
......@@ -6503,33 +6533,40 @@ author = {Yoshua Bengio and
@inproceedings{Li2009VariationalDF,
title={Variational Decoding for Statistical Machine Translation},
author={Zhifei Li and
Jason Eisner and
Sanjeev Khudanpur},
publisher={Annual Meeting of the Association for Computational Linguistics},
pages={593--601},
year={2009}
}
@article{Bastings2019ModelingLS,
title={Modeling Latent Sentence Structure in Neural Machine Translation},
author={Jasmijn Bastings and
Wilker Aziz and
Ivan Titov and
Khalil Sima'an},
journal = {CoRR},
volume = {abs/1901.06436},
year = {2019}
}
@article{Shah2018GenerativeNM,
title={Generative Neural Machine Translation},
author={Harshil Shah and
David Barber},
publisher={Conference and Workshop on Neural Information Processing Systems},
pages={1353--1362},
year={2018}
}
@article{Su2018VariationalRN,
title={Variational Recurrent Neural Machine Translation},
author={Jinsong Su and Shan Wu and Deyi Xiong and Yaojie Lu and Xianpei Han and Biao Zhang},
publisher={AAAI Conference on Artificial Intelligence},
pages={5488--5495},
year={2018}
}
@inproceedings{DBLP:journals/corr/GehringAGYD17,
......@@ -6548,127 +6585,161 @@ author = {Yoshua Bengio and
@inproceedings{Wei2019ImitationLF,
title={Imitation Learning for Non-Autoregressive Neural Machine Translation},
author={Bingzhen Wei and Mingxuan Wang and Hao Zhou and Junyang Lin and Xu Sun},
publisher={Annual Meeting of the Association for Computational Linguistics},
pages = {1304--1312},
year={2019}
}
@inproceedings{Shao2019RetrievingSI,
title={Retrieving Sequential Information for Non-Autoregressive Neural Machine Translation},
author={Chenze Shao and
Yang Feng and
Jinchao Zhang and
Fandong Meng and
Xilin Chen and
Jie Zhou},
publisher={Annual Meeting of the Association for Computational Linguistics},
pages={3013--3024},
year={2019}
}
@article{Akoury2019SyntacticallyST,
title={Syntactically Supervised Transformers for Faster Neural Machine Translation},
author={Nader Akoury and Kalpesh Krishna and Mohit Iyyer},
pages = {1269--1281},
publisher = {Annual Meeting of the Association for Computational Linguistics},
year = {2019},
}
@article{Guo2020FineTuningBC,
title={Fine-Tuning by Curriculum Learning for Non-Autoregressive Neural Machine Translation},
author={Junliang Guo and
Xu Tan and
Linli Xu and
Tao Qin and
Enhong Chen and
Tie-Yan Liu},
pages = {7839--7846},
publisher = {AAAI Conference on Artificial Intelligence},
year = {2020}
}
@inproceedings{Ran2020LearningTR,
title={Learning to Recover from Multi-Modality Errors for Non-Autoregressive Neural Machine Translation},
author={Qiu Ran and Yankai Lin and Peng Li and Jie Zhou},
publisher={Annual Meeting of the Association for Computational Linguistics},
pages={3059--3069},
year={2020}
}
@article{Liu2020FastBERTAS,
title={FastBERT: a Self-distilling BERT with Adaptive Inference Time},
author={Weijie Liu and
Peng Zhou and
Zhiruo Wang and
Zhe Zhao and
Haotang Deng and
Qi Ju},
pages = {6035--6044},
publisher = {Annual Meeting of the Association for Computational Linguistics},
year = {2020}
}
@article{Elbayad2020DepthAdaptiveT,
title={Depth-Adaptive Transformer},
author={Maha Elbayad and
Jiatao Gu and
Edouard Grave and
Michael Auli},
publisher={International Conference on Learning Representations},
year={2020}
}
@article{Lan2020ALBERTAL,
title={ALBERT: A Lite BERT for Self-supervised Learning of Language Representations},
author={Zhenzhong Lan and Mingda Chen and Sebastian Goodman and Kevin Gimpel and Piyush Sharma and Radu Soricut},
publisher={International Conference on Learning Representations},
year={2020}
}
@inproceedings{Han2015LearningBW,
title={Learning both Weights and Connections for Efficient Neural Network},
author={Song Han and
Jeff Pool and
John Tran and
William J. Dally},
publisher={Conference and Workshop on Neural Information Processing Systems},
pages={1135--1143},
year={2015}
}
@article{Lee2019SNIPSN,
title={SNIP: Single-shot Network Pruning based on Connection Sensitivity},
author = {Namhoon Lee and
Thalaiyasingam Ajanthan and
Philip H. S. Torr},
publisher = {International Conference on Learning Representations},
year = {2019},
}
@article{Frankle2019TheLT,
title={The Lottery Ticket Hypothesis: Finding Sparse, Trainable Neural Networks},
author={Jonathan Frankle and Michael Carbin},
publisher={International Conference on Learning Representations},
year={2019}
}
@article{Brix2020SuccessfullyAT,
title={Successfully Applying the Stabilized Lottery Ticket Hypothesis to the Transformer Architecture},
author = {Christopher Brix and
Parnia Bahar and
Hermann Ney},
title = {Successfully Applying the Stabilized Lottery Ticket Hypothesis to
the Transformer Architecture},
pages = {3909--3915},
publisher = {Annual Meeting of the Association for Computational Linguistics},
year = {2020},
}
@article{Liu2019RethinkingTV,
title={Rethinking the Value of Network Pruning},
author={Zhuang Liu and
Mingjie Sun and
Tinghui Zhou and
Gao Huang and
Trevor Darrell},
journal={ArXiv},
year={2019},
volume={abs/1810.05270}
}
@article{Liu2017LearningEC,
title={Learning Efficient Convolutional Networks through Network Slimming},
author = {Zhuang Liu and
Jianguo Li and
Zhiqiang Shen and
Gao Huang and
Shoumeng Yan and
Changshui Zhang},
title = {Learning Efficient Convolutional Networks through Network Slimming},
pages = {2755--2763},
publisher = {{IEEE} International Conference on Computer Vision},
year = {2017}
}
@inproceedings{Banner2018ScalableMF,
title={Scalable Methods for 8-bit Training of Neural Networks},
author={Ron Banner and
Itay Hubara and
Elad Hoffer and
Daniel Soudry},
publisher={Conference and Workshop on Neural Information Processing Systems},
pages={5151--5159},
year={2018}
}
@article{Hubara2017QuantizedNN,
title={Quantized Neural Networks: Training Neural Networks with Low Precision Weights and Activations},
author={Itay Hubara and Matthieu Courbariaux and Daniel Soudry and Ran El-Yaniv and Yoshua Bengio},
journal={Journal of Machine Learning Research},
year={2017},
volume={18},
pages={187:1-187:30}
......@@ -6686,62 +6757,100 @@ author = {Yoshua Bengio and
@article{Munim2019SequencelevelKD,
title={Sequence-level Knowledge Distillation for Model Compression of Attention-based Sequence-to-sequence Speech Recognition},
author={Raden Mu'az Mun'im and Nakamasa Inoue and Koichi Shinoda},
publisher={{IEEE} International Conference on Acoustics, Speech and Signal Processing},
year={2019},
pages={6151-6155}
}
@article{Tang2019DistillingTK,
title={Distilling Task-Specific Knowledge from BERT into Simple Neural Networks},
author = {Raphael Tang and
Yao Lu and
Linqing Liu and
Lili Mou and
Olga Vechtomova and
Jimmy Lin},
title = {Distilling Task-Specific Knowledge from {BERT} into Simple Neural
Networks},
journal = {CoRR},
volume = {abs/1903.12136},
year = {2019}
}
@inproceedings{Jiao2020TinyBERTDB,
title={TinyBERT: Distilling BERT for Natural Language Understanding},
author = {Xiaoqi Jiao and
Yichun Yin and
Lifeng Shang and
Xin Jiang and
Xiao Chen and
Linlin Li and
Fang Wang and
Qun Liu},
title = {TinyBERT: Distilling {BERT} for Natural Language Understanding},
pages = {4163--4174},
publisher={Conference on Empirical Methods in Natural Language Processing},
year={2020}
}
@article{Ghazvininejad2020AlignedCE,
title={Aligned Cross Entropy for Non-Autoregressive Machine Translation},
author = {Marjan Ghazvininejad and
Vladimir Karpukhin and
Luke Zettlemoyer and
Omer Levy},
title = {Aligned Cross Entropy for Non-Autoregressive Machine Translation},
journal = {CoRR},
volume = {abs/2004.01655},
year = {2020},
}
@inproceedings{Shao2020MinimizingTB,
title={Minimizing the Bag-of-Ngrams Difference for Non-Autoregressive Neural Machine Translation},
author = {Chenze Shao and
Jinchao Zhang and
Yang Feng and
Fandong Meng and
Jie Zhou},
title = {Minimizing the Bag-of-Ngrams Difference for Non-Autoregressive Neural
Machine Translation},
pages = {198--205},
publisher = {AAAI Conference on Artificial Intelligence},
year = {2020},
}
@inproceedings{Ma2019FlowSeqNC,
title={FlowSeq: Non-Autoregressive Conditional Sequence Generation with Generative Flow},
author={Xuezhe Ma and
Chunting Zhou and
Xian Li and
Graham Neubig and
Eduard H. Hovy},
publisher={Conference on Empirical Methods in Natural Language Processing},
pages={4281--4291},
year={2019}
}
@inproceedings{Guo2019NonAutoregressiveNM,
title={Non-Autoregressive Neural Machine Translation with Enhanced Decoder Input},
author={Junliang Guo and
Xu Tan and
Di He and
Tao Qin and
Linli Xu and
Tie-Yan Liu},
pages={3723--3730},
publisher={AAAI Conference on Artificial Intelligence},
year={2019}
}
@article{Ran2019GuidingNN,
author = {Qiu Ran and
Yankai Lin and
Peng Li and
Jie Zhou},
title = {Guiding Non-Autoregressive Neural Machine Translation Decoding with
Reordering Information},
journal = {CoRR},
volume = {abs/1911.02215},
year = {2019}
}
@inproceedings{vaswani2017attention,
......@@ -6773,73 +6882,96 @@ author = {Yoshua Bengio and
@inproceedings{Wang2019NonAutoregressiveMT,
title={Non-Autoregressive Machine Translation with Auxiliary Regularization},
author={Yiren Wang and
Fei Tian and
Di He and
Tao Qin and
ChengXiang Zhai and
Tie-Yan Liu},
publisher={AAAI Conference on Artificial Intelligence},
pages={5377--5384},
year={2019}
}
@inproceedings{Kaiser2018FastDI,
title={Fast Decoding in Sequence Models using Discrete Latent Variables},
author={Łukasz Kaiser and Aurko Roy and Ashish Vaswani and Niki Parmar and Samy Bengio and Jakob Uszkoreit and Noam Shazeer},
publisher={International Conference on Machine Learning},
pages={2395--2404},
year={2018}
}
@inproceedings{Tu2020ENGINEEI,
title={ENGINE: Energy-Based Inference Networks for Non-Autoregressive Machine Translation},
author={Lifu Tu and Richard Yuanzhe Pang and Sam Wiseman and Kevin Gimpel},
pages={2819--2826},
publisher={Annual Meeting of the Association for Computational Linguistics},
year={2020}
}
@inproceedings{Shu2020LatentVariableNN,
title={Latent-Variable Non-Autoregressive Neural Machine Translation with Deterministic Inference using a Delta Posterior},
author={Raphael Shu and Jason Lee and Hideki Nakayama and Kyunghyun Cho},
publisher={AAAI Conference on Artificial Intelligence},
pages={8846--8853},
year={2020}
}
@inproceedings{Li2019HintBasedTF,
title={Hint-Based Training for Non-Autoregressive Machine Translation},
author={Zhuohan Li and
Zi Lin and
Di He and
Fei Tian and
Tao Qin and
Liwei Wang and
Tie-Yan Liu},
publisher={Conference on Empirical Methods in Natural Language Processing},
pages={5707--5712},
year={2019}
}
@inproceedings{Ho2016ModelFreeIL,
title={Model-Free Imitation Learning with Policy Optimization},
author={Jonathan Ho and
Jayesh K. Gupta and
Stefano Ermon},
publisher={International Conference on Machine Learning},
pages={2760--2769},
year={2016}
}
@inproceedings{Ho2016GenerativeAI,
title={Generative Adversarial Imitation Learning},
author={Jonathan Ho and Stefano Ermon},
publisher={Conference and Workshop on Neural Information Processing Systems},
pages={4565--4573},
year={2016}
}
@article{Duan2017OneShotIL,
title={One-Shot Imitation Learning},
author={Yan Duan and Marcin Andrychowicz and Bradly C. Stadie and Jonathan Ho and Jonas Schneider and Ilya Sutskever and Pieter Abbeel and Wojciech Zaremba},
journal={CoRR},
year={2017},
volume={abs/1703.07326}
}
@inproceedings{Wang2018SemiAutoregressiveNM,
title={Semi-Autoregressive Neural Machine Translation},
author={Chunqi Wang and
Ji Zhang and
Haiqing Chen},
booktitle={Conference on Empirical Methods in Natural Language Processing},
pages={479--488},
year={2018}
}
@inproceedings{Ghazvininejad2019MaskPredictPD,
title={Mask-Predict: Parallel Decoding of Conditional Masked Language Models},
author={Marjan Ghazvininejad and Omer Levy and Yinhan Liu and Luke Zettlemoyer},
publisher={Conference on Empirical Methods in Natural Language Processing},
pages={6111--6120},
year={2019}
}
......@@ -6852,7 +6984,9 @@ author = {Yoshua Bengio and
@article{Zhou2019SynchronousBN,
title={Synchronous Bidirectional Neural Machine Translation},
author={Long Zhou and
Jiajun Zhang and
Chengqing Zong},
journal={Transactions of the Association for Computational Linguistics},
year={2019},
volume={7},
......@@ -6869,8 +7003,9 @@ author = {Yoshua Bengio and
@inproceedings{Feng2016ImprovingAM,
title={Improving Attention Modeling with Implicit Distortion and Fertility for Machine Translation},
author={Shi Feng and Shujie Liu and Nan Yang and Mu Li and Ming Zhou and Kenny Q. Zhu},
booktitle={International Conference on Computational Linguistics},
pages={3082--3092},
year={2016}
}
......@@ -6939,7 +7074,7 @@ author = {Yoshua Bengio and
@article{Peris2017InteractiveNM,
title={Interactive neural machine translation},
author={{\'A}lvaro Peris and Miguel Domingo and F. Casacuberta},
journal={Computer Speech and Language},
year={2017},
volume={45},
pages={201--220}
......@@ -6947,8 +7082,9 @@ author = {Yoshua Bengio and
@inproceedings{Peris2018ActiveLF,
title={Active Learning for Interactive Neural Machine Translation of Data Streams},
author={{\'A}lvaro Peris and Francisco Casacuberta},
publisher={The SIGNLL Conference on Computational Natural Language Learning},
pages={151--160},
year={2018}
}
......@@ -6973,7 +7109,7 @@ author = {Yoshua Bengio and
}
@article{61115,
author={Jianhua Lin},
journal={IEEE Transactions on Information Theory},
title={Divergence measures based on the Shannon entropy},
year={1991},
......@@ -6987,13 +7123,8 @@ author = {Yoshua Bengio and
Atsushi Fujita},
title = {Recurrent Stacking of Layers for Compact Neural Machine Translation
Models},
pages = {6292--6299},
publisher = {AAAI Conference on Artificial Intelligence},
year = {2019}
}
......@@ -7081,10 +7212,8 @@ author = {Yoshua Bengio and
Dmitry Kalenichenko},
title = {Quantization and Training of Neural Networks for Efficient Integer-Arithmetic-Only
Inference},
publisher = {{IEEE} Conference on Computer Vision and Pattern Recognition},
pages = {2704--2713},
year = {2018}
}
......@@ -7105,9 +7234,7 @@ author = {Yoshua Bengio and
Ran El-Yaniv and
Yoshua Bengio},
title = {Binarized Neural Networks},
publisher = {Conference and Workshop on Neural Information Processing Systems},
pages = {4107--4115},
year = {2016}
}
......@@ -7130,10 +7257,8 @@ author = {Yoshua Bengio and
Muhua Zhu and
Huizhen Wang},
title = {Boosting-Based System Combination for Machine Translation},
pages = {739--748},
publisher = {Annual Meeting of the Association for Computational Linguistics},
year = {2010}
}
......@@ -7145,11 +7270,9 @@ author = {Yoshua Bengio and
Philip C. Woodland},
title = {Consensus Network Decoding for Statistical Machine Translation System
Combination},
publisher = {IEEE International Conference on Acoustics, Speech and Signal Processing},
pages = {105--108},
year = {2007}
}
......@@ -7158,9 +7281,7 @@ author = {Yoshua Bengio and
Spyridon Matsoukas and
Richard M. Schwartz},
title = {Improved Word-Level System Combination for Machine Translation},
publisher = {Annual Meeting of the Association for Computational Linguistics},
year = {2007}
}
......@@ -7171,10 +7292,8 @@ author = {Yoshua Bengio and
Richard M. Schwartz},
title = {Incremental Hypothesis Alignment for Building Confusion Networks with
Application to Machine Translation System Combination},
publisher = {Proceedings of the Third Workshop on Statistical Machine Translation},
pages = {183--186},
year = {2008}
}
......@@ -7184,11 +7303,8 @@ author = {Yoshua Bengio and
Tong Xiao and
Ming Zhou},
title = {The Feature Subspace Method for SMT System Combination},
publisher = {Conference on Empirical Methods in Natural Language Processing},
pages = {1096--1104},
year = {2009}
}
......@@ -7217,12 +7333,8 @@ author = {Yoshua Bengio and
Franz Josef Och and
Wolfgang Macherey},
title = {Lattice Minimum Bayes-Risk Decoding for Statistical Machine Translation},
publisher = {Conference on Empirical Methods in Natural Language Processing},
pages = {620--629},
year = {2008}
}
......@@ -7235,10 +7347,8 @@ author = {Yoshua Bengio and
Yang Liu},
title = {Lattice-Based Recurrent Neural Network Encoders for Neural Machine
Translation},
publisher = {AAAI Conference on Artificial Intelligence},
pages = {3302--3308},
year = {2017}
}
......@@ -7250,7 +7360,7 @@ author = {Yoshua Bengio and
publisher = {Proceedings of the Human Language Technology Conference of
the North American Chapter of the Association for Computational Linguistics},
pages = {464--468},
year = {2018}
}
@inproceedings{WangLearning,
......@@ -7272,9 +7382,7 @@ author = {Yoshua Bengio and
Edouard Grave and
Armand Joulin},
title = {Reducing Transformer Depth on Demand with Structured Dropout},
publisher = {International Conference on Learning Representations},
year = {2020}
}
......@@ -7282,16 +7390,10 @@ author = {Yoshua Bengio and
author = {Qiang Wang and
Tong Xiao and
Jingbo Zhu},
title = {Training Flexible Depth Model by Multi-Task Learning for Neural Machine
Translation},
pages = {4307--4312},
publisher = {Conference on Empirical Methods in Natural Language Processing},
year = {2020}
}
......@@ -7302,8 +7404,7 @@ author = {Yoshua Bengio and
Furu Wei and
Ming Zhou},
title = {BERT-of-Theseus: Compressing {BERT} by Progressive Module Replacing},
publisher = {Conference on Empirical Methods in Natural Language Processing},
year = {2020}
}
......@@ -7311,9 +7412,7 @@ author = {Yoshua Bengio and
author = {Alexei Baevski and
Michael Auli},
title = {Adaptive Input Representations for Neural Language Modeling},
journal = {arXiv preprint arXiv:1809.10853},
year = {2019}
}
......@@ -7361,9 +7460,7 @@ author = {Yoshua Bengio and
Ruslan Salakhutdinov and
Quoc V. Le},
title = {Mixtape: Breaking the Softmax Bottleneck Efficiently},
booktitle = {Conference on Neural Information Processing Systems},
pages = {15922--15930},
year = {2019}
}
......@@ -7390,11 +7487,9 @@ author = {Yoshua Bengio and
Chenglong Wang and
Tong Xiao and
Jingbo Zhu},
title = {The NiuTrans System for WNGT 2020 Efficiency Task},
pages = {204--210},
publisher = {Annual Meeting of the Association for Computational Linguistics},
year = {2020}
}
......@@ -7431,37 +7526,38 @@ author = {Yoshua Bengio and
@inproceedings{Sun2019BaiduNM,
title={Baidu Neural Machine Translation Systems for WMT19},
author = {Meng Sun and
Bojian Jiang and
Hao Xiong and
Zhongjun He and
Hua Wu and
Haifeng Wang},
publisher={Annual Meeting of the Association for Computational Linguistics},
pages = {374--381},
year={2019}
}
@inproceedings{Wang2018TencentNM,
title={Tencent Neural Machine Translation Systems for WMT18},
author={Mingxuan Wang and
Li Gong and
Wenhuan Zhu and
Jun Xie and
Chao Bian},
publisher={Annual Meeting of the Association for Computational Linguistics},
pages={522--527},
year={2018}
}
@article{Bi2019MultiagentLF,
title={Multi-agent Learning for Neural Machine Translation},
author={Tianchi Bi and
Hao Xiong and
Zhongjun He and
Hua Wu and
Haifeng Wang},
journal={arXiv preprint arXiv:1909.01101},
year={2019}
}
@inproceedings{DBLP:conf/aclnmt/KoehnK17,
......@@ -7475,23 +7571,73 @@ author = {Yoshua Bengio and
@book{Held2013AppliedSI,
title={Applied Statistical Inference: Likelihood and Bayes},
author={Leonhard Held and Daniel Saban{\'e}s Bov{\'e}},
publisher={Springer},
year={2014}
}
@inproceedings{Zhang2016VariationalNM,
title={Variational Neural Machine Translation},
author = {Biao Zhang and
Deyi Xiong and
Jinsong Su and
Hong Duan and
Min Zhang},
pages = {521--530},
publisher = {Annual Meeting of the Association for Computational Linguistics},
year = {2016}
}
@inproceedings{Silvey2018StatisticalI,
title={Statistical Inference},
author={S. D. Silvey},
publisher={Encyclopedia of Social Network Analysis and Mining},
year={2018}
}
@inproceedings{Cheong2019transformersZ,
title={transformers.zip : Compressing Transformers with Pruning and Quantization},
author={Robin Cheong and Robel Daniel},
publisher={Stanford University},
year={2019}
}
@inproceedings{Beal2003VariationalAF,
title={Variational algorithms for approximate Bayesian inference},
author={Matthew J. Beal},
publisher={University College London},
year={2003}
}
@article{Gage1994ANA,
title={A new algorithm for data compression},
author={Philip Gage},
journal={The C Users Journal},
year={1994},
volume={12},
pages={23--38}
}
@inproceedings{Eisner2011LearningST,
title={Learning Speed-Accuracy Tradeoffs in Nondeterministic Inference Algorithms},
author={Jason Eisner and Hal Daum{\'e} III},
publisher={Conference and Workshop on Neural Information Processing Systems},
year={2011}
}
@article{Kazimi2017CoverageFC,
title={Coverage for Character Based Neural Machine Translation},
author={M. Bashir Kazimi and Marta R. Costa-juss{\`a}},
journal={Procesamiento del Lenguaje Natural},
year={2017},
volume={59},
pages={99--106}
}
%%%%% chapter 14------------------------------------------------------
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
......@@ -7970,7 +8116,7 @@ author = {Yoshua Bengio and
author = {Ivan Vulic and
Anna Korhonen},
title = {On the Role of Seed Lexicons in Learning Bilingual Word Embeddings},
publisher = {Annual Meeting of the Association for Computational Linguistics},
year = {2016}
}
@inproceedings{DBLP:conf/iclr/SmithTHH17,
......@@ -8616,7 +8762,7 @@ author = {Yoshua Bengio and
title = {Using Context Vectors in Improving a Machine Translation System with
Bridge Language},
pages = {318--322},
publisher = {Annual Meeting of the Association for Computational Linguistics},
year = {2013}
}
@inproceedings{DBLP:conf/emnlp/ZhuHWZWZ14,
......@@ -8640,7 +8786,7 @@ author = {Yoshua Bengio and
Satoshi Nakamura},
title = {Improving Pivot Translation by Remembering the Pivot},
pages = {573--577},
publisher = {Annual Meeting of the Association for Computational Linguistics},
year = {2015}
}
@inproceedings{DBLP:conf/acl/CohnL07,
......@@ -8666,7 +8812,7 @@ author = {Yoshua Bengio and
Haifeng Wang},
title = {Revisiting Pivot Language Approach for Machine Translation},
pages = {154--162},
publisher = {Annual Meeting of the Association for Computational Linguistics},
year = {2009}
}
@article{DBLP:journals/corr/ChengLYSX16,
......@@ -8711,7 +8857,7 @@ author = {Yoshua Bengio and
Rafael E. Banchs},
title = {Enhancing scarce-resource language translation through pivot combinations},
pages = {1361--1365},
publisher = {Annual Meeting of the Association for Computational Linguistics},
year = {2011}
}
@article{DBLP:journals/corr/HintonVD15,
......@@ -8758,7 +8904,7 @@ author = {Yoshua Bengio and
Haifeng Wang},
title = {Multi-Task Learning for Multiple Language Translation},
pages = {1723--1732},
publisher = {Annual Meeting of the Association for Computational Linguistics},
year = {2015}
}
@article{DBLP:journals/tacl/LeeCH17,
......@@ -9338,7 +9484,7 @@ author = {Yoshua Bengio and
Xiaohua Liu and
Hang Li},
title = {Modeling Coverage for Neural Machine Translation},
publisher = {Annual Meeting of the Association for Computational Linguistics},
year = {2016}
}
@article{DBLP:journals/tacl/TuLLLL17,
......@@ -9676,3 +9822,429 @@ author = {Yoshua Bengio and
%%%%% chapter 18------------------------------------------------------
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%% chapter appendix-A------------------------------------------------------
@inproceedings{Tong2012NiuTrans,
author = {Tong Xiao and
Jingbo Zhu and
Hao Zhang and
Qiang Li},
title = {NiuTrans: An Open Source Toolkit for Phrase-based and Syntax-based
Machine Translation},
pages = {19--24},
publisher = {Annual Meeting of the Association for Computational Linguistics},
year = {2012}
}
@inproceedings{Li2010Joshua,
author = {Zhifei Li and
Chris Callison-Burch and
Chris Dyer and
Sanjeev Khudanpur and
Lane Schwartz and
Wren N. G. Thornton and
Jonathan Weese and
Omar Zaidan},
title = {Joshua: An Open Source Toolkit for Parsing-Based Machine Translation},
pages = {135--139},
publisher = {Association for Computational Linguistics},
year = {2009}
}
@inproceedings{iglesias2009hierarchical,
author = {Gonzalo Iglesias and
Adri{\`{a}} de Gispert and
Eduardo Rodr{\'{\i}}guez Banga and
William J. Byrne},
title = {Hierarchical Phrase-Based Translation with Weighted Finite State Transducers},
pages = {433--441},
publisher = {The Association for Computational Linguistics},
year = {2009}
}
@inproceedings{dyer2010cdec,
author = {Chris Dyer and
Adam Lopez and
Juri Ganitkevitch and
Jonathan Weese and
Ferhan T{\"{u}}re and
Phil Blunsom and
Hendra Setiawan and
Vladimir Eidelman and
Philip Resnik},
title = {cdec: {A} Decoder, Alignment, and Learning Framework for Finite-State
and Context-Free Translation Models},
pages = {7--12},
publisher = {Annual Meeting of the Association for Computational Linguistics},
year = {2010}
}
@inproceedings{Cer2010Phrasal,
author = {Daniel M. Cer and
Michel Galley and
Daniel Jurafsky and
Christopher D. Manning},
title = {Phrasal: {A} Statistical Machine Translation Toolkit for Exploring
New Model Features},
pages = {9--12},
publisher = {The Association for Computational Linguistics},
year = {2010}
}
@article{vilar2012jane,
title={Jane: an advanced freely available hierarchical machine translation toolkit},
author={Vilar, David and Stein, Daniel and Huck, Matthias and Ney, Hermann},
journal={Machine Translation},
volume={26},
number={3},
pages={197--216},
year={2012}
}
@inproceedings{DBLP:conf/naacl/DyerCS13,
author = {Chris Dyer and
Victor Chahuneau and
Noah A. Smith},
title = {A Simple, Fast, and Effective Reparameterization of {IBM} Model 2},
pages = {644--648},
publisher = {Annual Meeting of the Association for Computational Linguistics},
year = {2013}
}
@article{al2016theano,
author = {Rami Al-Rfou and
Guillaume Alain and
Amjad Almahairi and
Christof Angerm{\"{u}}ller and
Dzmitry Bahdanau and
Nicolas Ballas and
Fr{\'{e}}d{\'{e}}ric Bastien and
Justin Bayer and
Anatoly Belikov and
Alexander Belopolsky and
Yoshua Bengio and
Arnaud Bergeron and
James Bergstra and
Valentin Bisson and
Josh Bleecher Snyder and
Nicolas Bouchard and
Nicolas Boulanger-Lewandowski and
Xavier Bouthillier and
Alexandre de Br{\'{e}}bisson and
Olivier Breuleux and
Pierre Luc Carrier and
Kyunghyun Cho and
Jan Chorowski and
Paul F. Christiano and
Tim Cooijmans and
Marc-Alexandre C{\^{o}}t{\'{e}} and
Myriam C{\^{o}}t{\'{e}} and
Aaron C. Courville and
Yann N. Dauphin and
Olivier Delalleau and
Julien Demouth and
Guillaume Desjardins and
Sander Dieleman and
Laurent Dinh and
Melanie Ducoffe and
Vincent Dumoulin and
Samira Ebrahimi Kahou and
Dumitru Erhan and
Ziye Fan and
Orhan Firat and
Mathieu Germain and
Xavier Glorot and
Ian J. Goodfellow and
Matthew Graham and
{\c{C}}aglar G{\"{u}}l{\c{c}}ehre and
Philippe Hamel and
Iban Harlouchet and
Jean-Philippe Heng and
Bal{\'{a}}zs Hidasi and
Sina Honari and
Arjun Jain and
S{\'{e}}bastien Jean and
Kai Jia and
Mikhail Korobov and
Vivek Kulkarni and
Alex Lamb and
Pascal Lamblin and
Eric Larsen and
C{\'{e}}sar Laurent and
Sean Lee and
Simon Lefran{\c{c}}ois and
Simon Lemieux and
Nicholas L{\'{e}}onard and
Zhouhan Lin and
Jesse A. Livezey and
Cory Lorenz and
Jeremiah Lowin and
Qianli Ma and
Pierre-Antoine Manzagol and
Olivier Mastropietro and
Robert McGibbon and
Roland Memisevic and
Bart van Merri{\"{e}}nboer and
Vincent Michalski and
Mehdi Mirza and
Alberto Orlandi and
Christopher Joseph Pal and
Razvan Pascanu and
Mohammad Pezeshki and
Colin Raffel and
Daniel Renshaw and
Matthew Rocklin and
Adriana Romero and
Markus Roth and
Peter Sadowski and
John Salvatier and
Fran{\c{c}}ois Savard and
Jan Schl{\"{u}}ter and
John Schulman and
Gabriel Schwartz and
Iulian Vlad Serban and
Dmitriy Serdyuk and
Samira Shabanian and
{\'{E}}tienne Simon and
Sigurd Spieckermann and
S. Ramana Subramanyam and
Jakub Sygnowski and
J{\'{e}}r{\'{e}}mie Tanguay and
Gijs van Tulder and
Joseph P. Turian and
Sebastian Urban and
Pascal Vincent and
Francesco Visin and
Harm de Vries and
David Warde-Farley and
Dustin J. Webb and
Matthew Willson and
Kelvin Xu and
Lijun Xue and
Li Yao and
Saizheng Zhang and
Ying Zhang},
title = {Theano: {A} Python framework for fast computation of mathematical
expressions},
journal = {CoRR},
volume = {abs/1605.02688},
year = {2016}
}
@inproceedings{DBLP:journals/corr/SennrichFCBHHJL17,
author = {Rico Sennrich and
Orhan Firat and
Kyunghyun Cho and
Barry Haddow and
Alexandra Birch and
Julian Hitschler and
Marcin Junczys-Dowmunt and
Samuel L{\"{a}}ubli and
Antonio Valerio Miceli Barone and
Jozef Mokry and
Maria Nadejde},
title = {Nematus: a Toolkit for Neural Machine Translation},
publisher = {European Chapter of the Association for Computational Linguistics},
pages = {65--68},
year = {2017}
}
@inproceedings{Koehn2007Moses,
author = {Philipp Koehn and
Hieu Hoang and
Alexandra Birch and
Chris Callison-Burch and
Marcello Federico and
Nicola Bertoldi and
Brooke Cowan and
Wade Shen and
Christine Moran and
Richard Zens and
Chris Dyer and
Ondrej Bojar and
Alexandra Constantin and
Evan Herbst},
title = {Moses: Open Source Toolkit for Statistical Machine Translation},
publisher = {Annual Meeting of the Association for Computational Linguistics},
year = {2007}
}
@inproceedings{zollmann2007the,
author = {Andreas Zollmann and
Ashish Venugopal and
Matthias Paulik and
Stephan Vogel},
title = {The Syntax Augmented {MT} {(SAMT)} System at the Shared Task for the
2007 {ACL} Workshop on Statistical Machine Translation},
pages = {216--219},
publisher = {Annual Meeting of the Association for Computational Linguistics},
year = {2007}
}
@article{och2003systematic,
author = {Franz Josef Och and
Hermann Ney},
title = {A Systematic Comparison of Various Statistical Alignment Models},
journal = {Computational Linguistics},
volume = {29},
number = {1},
pages = {19--51},
year = {2003}
}
@inproceedings{zoph2016simple,
author = {Barret Zoph and
Ashish Vaswani and
Jonathan May and
Kevin Knight},
title = {Simple, Fast Noise-Contrastive Estimation for Large {RNN} Vocabularies},
pages = {1217--1222},
publisher = {The Association for Computational Linguistics},
year = {2016}
}
@inproceedings{Ottfairseq,
author = {Myle Ott and
Sergey Edunov and
Alexei Baevski and
Angela Fan and
Sam Gross and
Nathan Ng and
David Grangier and
Michael Auli},
title = {fairseq: {A} Fast, Extensible Toolkit for Sequence Modeling},
pages = {48--53},
publisher = {Annual Meeting of the Association for Computational Linguistics},
year = {2019}
}
@inproceedings{Vaswani2018Tensor2TensorFN,
author = {Ashish Vaswani and
Samy Bengio and
Eugene Brevdo and
Fran{\c{c}}ois Chollet and
Aidan N. Gomez and
Stephan Gouws and
Llion Jones and
Lukasz Kaiser and
Nal Kalchbrenner and
Niki Parmar and
Ryan Sepassi and
Noam Shazeer and
Jakob Uszkoreit},
title = {Tensor2Tensor for Neural Machine Translation},
pages = {193--199},
publisher = {Association for Machine Translation in the Americas},
year = {2018}
}
@inproceedings{KleinOpenNMT,
author = {Guillaume Klein and
Yoon Kim and
Yuntian Deng and
Jean Senellart and
Alexander M. Rush},
title = {OpenNMT: Open-Source Toolkit for Neural Machine Translation},
pages = {67--72},
publisher = {Annual Meeting of the Association for Computational Linguistics},
year = {2017}
}
@inproceedings{luong2016acl_hybrid,
author = {Minh-Thang Luong and
Christopher D. Manning},
title = {Achieving Open Vocabulary Neural Machine Translation with Hybrid Word-Character
Models},
publisher = {Annual Meeting of the Association for Computational Linguistics},
year = {2016}
}
@article{ZhangTHUMT,
author = {Jiacheng Zhang and
Yanzhuo Ding and
Shiqi Shen and
Yong Cheng and
Maosong Sun and
Huan-Bo Luan and
Yang Liu},
title = {{THUMT:} An Open Source Toolkit for Neural Machine Translation},
journal = {CoRR},
volume = {abs/1706.06415},
year = {2017}
}
@inproceedings{JunczysMarian,
author = {Marcin Junczys-Dowmunt and
Roman Grundkiewicz and
Tomasz Dwojak and
Hieu Hoang and
Kenneth Heafield and
Tom Neckermann and
Frank Seide and
Ulrich Germann and
Alham Fikri Aji and
Nikolay Bogoychev and
Andr{\'{e}} F. T. Martins and
Alexandra Birch},
title = {Marian: Fast Neural Machine Translation in {C++}},
pages = {116--121},
publisher = {Annual Meeting of the Association for Computational Linguistics},
year = {2018}
}
@article{hieber2017sockeye,
author = {Felix Hieber and
Tobias Domhan and
Michael Denkowski and
David Vilar and
Artem Sokolov and
Ann Clifton and
Matt Post},
title = {Sockeye: {A} Toolkit for Neural Machine Translation},
journal = {CoRR},
volume = {abs/1712.05690},
year = {2017}
}
@inproceedings{WangCytonMT,
author = {Xiaolin Wang and
Masao Utiyama and
Eiichiro Sumita},
title = {CytonMT: an Efficient Neural Machine Translation Open-source Toolkit
Implemented in {C++}},
pages = {133--138},
publisher = {Annual Meeting of the Association for Computational Linguistics},
year = {2018}
}
@article{DBLP:journals/corr/abs-1805-10387,
author = {Oleksii Kuchaiev and
Boris Ginsburg and
Igor Gitman and
Vitaly Lavrukhin and
Carl Case and
Paulius Micikevicius},
title = {OpenSeq2Seq: extensible toolkit for distributed and mixed precision
training of sequence-to-sequence models},
journal = {CoRR},
volume = {abs/1805.10387},
year = {2018}
}
@article{nmtpy2017,
author = {Ozan Caglayan and
Mercedes Garc{\'{\i}}a-Mart{\'{\i}}nez and
Adrien Bardet and
Walid Aransa and
Fethi Bougares and
Lo{\"{\i}}c Barrault},
title = {{NMTPY:} {A} Flexible Toolkit for Advanced Neural Machine Translation
Systems},
journal = {The Prague Bulletin of Mathematical Linguistics},
volume = {109},
pages = {15--28},
year = {2017}
}
%%%%% chapter appendix-A------------------------------------------------------
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%