Commit ecec245e by 曹润柘

Merge branch 'caorunzhe' into 'master'

Caorunzhe

See merge request !208
parents daa4b8f3 9f0f1449
......@@ -364,7 +364,7 @@ $ to compute the probability of this segmentation.
& = & \prod_{i=1}^{m} \funp{P}(x_i|y_i) \funp{P}(y_i | y_{i-1}) \label{eq:joint-prob-xy}
\end{eqnarray}
\noindent Here, $y_{0}$ denotes a virtual hidden state, so that $\funp{P}(y_1|y_{0}) \equiv \funp{P}(y_1)$ can be defined as the probability of the initial hidden state. The assumptions of the hidden Markov model greatly simplify the problem, so the probability of a hidden state sequence occurring together with a visible state sequence is easy to compute with Eq.~\eqref{eq:joint-prob-xy}. Note that both the emission probabilities and the transition probabilities can be regarded as ``features'' that describe how the sequence is generated. These ``features'' are not defined arbitrarily, however; they follow the probabilistic interpretation of the problem. A probabilistic model defined by the logic of how events are generated in this way is usually regarded as a {\small\bfnew{generative model}}\index{Generative Model}.
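The product in Eq.~\eqref{eq:joint-prob-xy} can be evaluated directly. Below is a minimal Python sketch, assuming the model parameters are stored in plain dictionaries: START holds $\funp{P}(y_1|y_0) \equiv \funp{P}(y_1)$, TRANS is keyed by (previous state, next state) and EMIT by (state, visible symbol). The function name, the dictionary layout and the toy numbers are illustrative assumptions, not taken from the text.

```python
from typing import Dict, Optional, Sequence, Tuple


def hmm_joint_prob(xs: Sequence[str], ys: Sequence[str],
                   start: Dict[str, float],
                   trans: Dict[Tuple[str, str], float],
                   emit: Dict[Tuple[str, str], float]) -> float:
    """P(X, Y) = prod_i P(x_i | y_i) * P(y_i | y_{i-1}), with P(y_1 | y_0) = P(y_1)."""
    if len(xs) != len(ys):
        raise ValueError("X and Y must have the same length")
    prob = 1.0
    prev: Optional[str] = None  # None plays the role of the virtual state y_0
    for x, y in zip(xs, ys):
        p_trans = start[y] if prev is None else trans[(prev, y)]
        prob *= emit[(y, x)] * p_trans
        prev = y
    return prob


# Toy parameters (made up for illustration): hidden states A/B, visible symbols T/F.
START = {"A": 0.6, "B": 0.4}
TRANS = {("A", "A"): 0.7, ("A", "B"): 0.3, ("B", "A"): 0.4, ("B", "B"): 0.6}
EMIT = {("A", "T"): 0.9, ("A", "F"): 0.1, ("B", "T"): 0.2, ("B", "F"): 0.8}

print(hmm_joint_prob(["T", "F"], ["A", "B"], START, TRANS, EMIT))  # approximately 0.1296
```

Because every factor is a locally normalized probability, summing the joint probability over all hidden sequences of a given length yields the marginal probability of the visible sequence.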
%----------------------------------------------
\begin{figure}[htp]
......@@ -421,7 +421,7 @@ $ to compute the probability of this segmentation.
\label{eq:3.3-4}
\end{eqnarray}
\parinterval Substituting Eq.~\eqref{eq:joint-prob-xy} into Eq.~\eqref{eq:markov-sequence-argmax} gives the final formula:
\begin{eqnarray}
\hat{\seq{Y}} = \arg\max_{\seq{Y}}\prod_{i=1}^{m}\funp{P}(x_i|y_i)\funp{P}(y_i|y_{i-1})
......@@ -452,7 +452,7 @@ $ to compute the probability of this segmentation.
\funp{P}(A|B)+\funp{P}(B|B)+\funp{P}(C|B)+\funp{P}(D|B) & = & 1 \label{eq:3.3-7}
\end{eqnarray}
\noindent Here, $\funp{P}(b|a)$ denotes the probability of transitioning from state $a$ to state $b$. Both sums equal 1, but Eq.~\eqref{eq:3.3-6} has fewer terms than Eq.~\eqref{eq:3.3-7}, so the probability mass leaving $A$ is shared by fewer transitions. As a result, the estimated values of $\funp{P}(A|A)$ and $\funp{P}(B|A)$ are likely to be larger than those of $\funp{P}(A|B)$, $\funp{P}(B|B)$, $\funp{P}(C|B)$ and $\funp{P}(D|B)$.
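The imbalance described above follows directly from normalizing the outgoing transitions of each state separately. The sketch below estimates $\funp{P}(b|a)$ by relative frequency from hypothetical transition counts in which $A$ has two possible successors and $B$ has four; with equal counts, each transition out of $A$ receives 0.5 while each transition out of $B$ receives only 0.25. The counts and the function name are illustrative assumptions.

```python
from collections import defaultdict
from typing import Dict, Tuple


def estimate_transitions(counts: Dict[Tuple[str, str], int]) -> Dict[Tuple[str, str], float]:
    """Locally normalized MLE: P(b | a) = count(a -> b) / sum_b' count(a -> b')."""
    totals: Dict[str, int] = defaultdict(int)
    for (a, _b), c in counts.items():
        totals[a] += c
    return {(a, b): c / totals[a] for (a, b), c in counts.items()}


# Hypothetical counts keyed by (from, to): every observed transition occurs 10 times.
COUNTS = {("A", "A"): 10, ("A", "B"): 10,
          ("B", "A"): 10, ("B", "B"): 10, ("B", "C"): 10, ("B", "D"): 10}
PROBS = estimate_transitions(COUNTS)

print(PROBS[("A", "A")], PROBS[("B", "A")])  # 0.5 0.25
```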
\parinterval\ref{fig:3.3-5}展示了一个具体的例子,有一个可见状态序列$T F F T$,假设初始隐含状态是$A$,图中线上的概率值是对应的转移概率与发射概率的乘积,比如图中隐含状态$A$开始,下一个隐含状态是$A$且可见状态是$F$的概率是0.45,下一个隐含状态是$B$且可见状态是$F$的概率是0.55。图中可以看出,由于有较大的值,当可见状态序列为$T F F T$时,隐马尔可夫计算出的最有可能的隐含状态序列为$A A A A$。但是如果对训练集进行统计可能会发现,当可见序列为$T F F T$ 时,对应的隐含状态是$A A A A$的概率可能是比较大的,但也可能是比较小的。这个例子中出现预测偏差的主要原因是:由于比其他状态转移概率要大得多,隐含状态的预测一直停留在状态$A$
......@@ -480,14 +480,14 @@ F(y_{i-1},y_i,\seq{X}) & = & t(y_{i-1},y_i,\seq{X},i)+s(y_i,\seq{X},i)
\label{eq:3.3-9}
\end{eqnarray}
\parinterval $Z(\seq{X})$ in Eq.~\eqref{eq:3.3-9} is the normalization factor mentioned above, which performs global normalization. It is computed as:
\begin{eqnarray}
Z(\seq{X})=\sum_{\seq{Y}}\exp\Big(\sum_{i=1}^m\sum_{j=1}^k\lambda_{j}F_{j}(y_{i-1},y_i,\seq{X},i)\Big)
\label{eq:3.3-10}
\end{eqnarray}
\parinterval Eq.~\eqref{eq:3.3-10} shows that computing the normalization factor depends on the entire visible state sequence and on the hidden state at every position, so normalization in a conditional random field is global. Figure~\ref{fig:3.3-6} illustrates how a conditional random field handles a sequence problem.
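Eq.~\eqref{eq:3.3-10} can be checked by brute force on a tiny input. The sketch below assumes the weighted feature sum $\sum_j \lambda_j F_j(y_{i-1},y_i,\seq{X},i)$ has already been collapsed into two hypothetical tables, one for the transition part and one for the per-position part, with the input $\seq{X}$ folded into them. It enumerates all label sequences, which is exponential in $m$, so it only serves to show that a single normalizer is shared by the whole sequence and that the resulting probabilities of all label sequences sum to 1. All names and numbers are made up.

```python
import itertools
import math
from typing import Dict, List, Sequence, Tuple

START = "<s>"  # hypothetical label standing for the virtual state y_0
TransScore = Dict[Tuple[str, str], float]
StateScore = List[Dict[str, float]]


def sequence_score(ys: Sequence[str], trans_score: TransScore,
                   state_score: StateScore) -> float:
    """sum_i [t(y_{i-1}, y_i, X, i) + s(y_i, X, i)], with X folded into the tables."""
    total, prev = 0.0, START
    for i, y in enumerate(ys):
        total += trans_score[(prev, y)] + state_score[i][y]
        prev = y
    return total


def partition_brute_force(labels: Sequence[str], trans_score: TransScore,
                          state_score: StateScore) -> float:
    """Z(X): sum of exp(score) over every label sequence of length m."""
    return sum(math.exp(sequence_score(ys, trans_score, state_score))
               for ys in itertools.product(labels, repeat=len(state_score)))


def crf_prob(ys: Sequence[str], labels: Sequence[str], trans_score: TransScore,
             state_score: StateScore) -> float:
    """P(Y | X) = exp(score(Y)) / Z(X): one normalizer for the whole sequence."""
    z = partition_brute_force(labels, trans_score, state_score)
    return math.exp(sequence_score(ys, trans_score, state_score)) / z


# Made-up scores for two labels over a three-position input.
LABELS = ["A", "B"]
TRANS_SCORE = {(START, "A"): 0.5, (START, "B"): 0.0, ("A", "A"): 1.0,
               ("A", "B"): -1.0, ("B", "A"): 0.2, ("B", "B"): 0.3}
STATE_SCORE = [{"A": 1.0, "B": 0.0}, {"A": 0.0, "B": 2.0}, {"A": 0.5, "B": 0.5}]

total = sum(crf_prob(ys, LABELS, TRANS_SCORE, STATE_SCORE)
            for ys in itertools.product(LABELS, repeat=len(STATE_SCORE)))
print(round(total, 6))  # 1.0
```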
%----------------------------------------------
\begin{figure}[htp]
......@@ -498,7 +498,7 @@ Z(\seq{X})=\sum_{\seq{Y}}\exp(\sum_{i=1}^m\sum_{j=1}^k\lambda_{j}F_{j}(y_{i-1},y
\end{figure}
%-------------------------------------------
\parinterval Although Eq.~\eqref{eq:3.3-9} and Eq.~\eqref{eq:3.3-10} look more complex than the hidden Markov model, they can be computed very efficiently. For example, the whole conditional random field computation can be carried out with dynamic programming; see \cite{lafferty2001conditional} for details.
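The dynamic programming mentioned above can be illustrated with the forward algorithm: $\alpha_i(y)$ accumulates the exponentiated scores of all label prefixes that end in label $y$ at position $i$, so $Z(\seq{X}) = \sum_y \alpha_m(y)$ is obtained in $O(m\,|\textrm{labels}|^2)$ time instead of by enumeration. The sketch works in log space for numerical stability and compares the result with explicit enumeration. The score tables are hypothetical stand-ins for $\sum_j \lambda_j F_j$, and the numbers are made up.

```python
import itertools
import math
from typing import Dict, List, Sequence, Tuple

START = "<s>"  # hypothetical label standing for the virtual state y_0
TransScore = Dict[Tuple[str, str], float]
StateScore = List[Dict[str, float]]


def logsumexp(values: Sequence[float]) -> float:
    """Numerically stable log(sum(exp(v) for v in values))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def log_partition_forward(labels: Sequence[str], trans_score: TransScore,
                          state_score: StateScore) -> float:
    """log Z(X) by the forward recursion, O(m * |labels|^2)."""
    # log_alpha[y]: log of the summed exp-scores of all label prefixes ending in y
    log_alpha = {y: trans_score[(START, y)] + state_score[0][y] for y in labels}
    for i in range(1, len(state_score)):
        log_alpha = {
            y: logsumexp([log_alpha[p] + trans_score[(p, y)] for p in labels])
            + state_score[i][y]
            for y in labels
        }
    return logsumexp(list(log_alpha.values()))


def log_partition_brute_force(labels: Sequence[str], trans_score: TransScore,
                              state_score: StateScore) -> float:
    """Reference: enumerate all |labels|^m label sequences explicitly."""
    scores = []
    for ys in itertools.product(labels, repeat=len(state_score)):
        total, prev = 0.0, START
        for i, y in enumerate(ys):
            total += trans_score[(prev, y)] + state_score[i][y]
            prev = y
        scores.append(total)
    return logsumexp(scores)


# Made-up scores for two labels over a three-position input.
LABELS = ["A", "B"]
TRANS_SCORE = {(START, "A"): 0.5, (START, "B"): 0.0, ("A", "A"): 1.0,
               ("A", "B"): -1.0, ("B", "A"): 0.2, ("B", "B"): 0.3}
STATE_SCORE = [{"A": 1.0, "B": 0.0}, {"A": 0.0, "B": 2.0}, {"A": 0.5, "B": 0.5}]

print(log_partition_forward(LABELS, TRANS_SCORE, STATE_SCORE),
      log_partition_brute_force(LABELS, TRANS_SCORE, STATE_SCORE))
```

The same recursion, with max in place of logsumexp, gives the Viterbi decoder for the conditional random field.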
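\parinterval When a conditional random field is used for named entity recognition, the visible state sequence corresponds to the text and the hidden state sequence corresponds to the labels to be predicted. Named entity recognition requires a number of feature functions designed specifically for the task. For example, when named entities are annotated with the BIOES scheme, the label ``B-ORG''\footnote{ORG denotes an organization entity.} must be followed by ``I-ORG'' or ``E-ORG'' and can never be followed by ``O''; a corresponding feature function can be designed for this rule.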
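A minimal sketch of such a rule, assuming every label is either O or TAG-TYPE with TAG in B, I, E, S (for example B-ORG). The transition check, the indicator feature that fires on a forbidden transition and the sequence validator are hypothetical helpers for illustration only. In a trained model such a feature could be expected to receive a large negative weight; a decoder could also simply mask the forbidden transitions.

```python
from typing import Sequence


def bioes_transition_allowed(prev: str, cur: str) -> bool:
    """Return True if label `cur` may follow label `prev` under BIOES."""
    prev_tag, _, prev_type = prev.partition("-")
    cur_tag, _, cur_type = cur.partition("-")
    if prev_tag in ("B", "I"):
        # Inside an entity: it must continue (I) or end (E) with the same type.
        return cur_tag in ("I", "E") and cur_type == prev_type
    # After O, E-* or S-*: only O or the start of a new entity (B/S) is allowed.
    return cur_tag in ("O", "B", "S")


def f_invalid_transition(prev: str, cur: str) -> float:
    """Hypothetical CRF transition feature: 1.0 on a forbidden transition, else 0.0."""
    return 0.0 if bioes_transition_allowed(prev, cur) else 1.0


def is_valid_bioes(labels: Sequence[str]) -> bool:
    prev = "O"  # treat the sentence start like an O label
    for cur in labels:
        if not bioes_transition_allowed(prev, cur):
            return False
        prev = cur
    return prev[0] in ("O", "E", "S")  # an entity must not be left open


print(is_valid_bioes(["B-ORG", "O"]))  # False
```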
......
......@@ -1094,7 +1094,7 @@ c_{\mathbb{E}}(s_u|t_v)=\sum\limits_{i=1}^{N} c_{\mathbb{E}}(s_u|t_v;s^{[i]},t^
\begin{itemize}
\vspace{0.5em}
\item Many improvements have been built on top of the IBM models. Examples include extra handling of null alignments and low-frequency words\upcite{DBLP:conf/acl/Moore04}; better symmetrization that considers both source-to-target and target-to-source word alignments\upcite{肖桐1991面向统计机器翻译的重对齐方法研究}; improving the models with additional information such as dictionaries and named entities\upcite{2005Improvin}; enhancing the IBM models with phrases\upcite{1998Grammar}; and increasing robustness by modeling the dependencies between the alignments of adjacent words\upcite{DBLP:conf/acl-vlc/DaganCG93}. The forward and backward results of the IBM models can also be symmetrized to obtain more accurate word alignments\upcite{och2003systematic}.
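\item As research on word alignment deepened, much work has also moved away from the IBM models. For example, word alignment can be solved directly with a discriminative model, i.e., a classifier\upcite{ittycheriah2005maximum}; dynamic programming with tunable parameters can improve alignment accuracy\upcite{DBLP:conf/naacl/GaleC91}; the idea of alignment can even be extended to bilingual correspondences between phrases and syntactic structures\upcite{xiao2013unsupervised}; and an unsupervised symmetric alignment method trains the forward and backward models jointly so that they agree with each other\upcite{DBLP:conf/naacl/LiangTK06}. Besides GIZA++, researchers have developed many other strong automatic alignment tools, such as FastAlign\upcite{DBLP:conf/naacl/DyerCS13} and the Berkeley Word Aligner\upcite{taskar2005a}, which are also widely used today.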
......
......@@ -478,7 +478,7 @@ p_0+p_1 & = & 1 \label{eq:6-21}
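\item Distortion is a classic concept in machine translation. Broadly speaking, any change in the position of things can be described by distortion; in physical imaging systems, for example, distortion models help with lens correction\upcite{1966Decentering,ClausF05}. In machine translation, distortion essentially describes the difference between the word order of the source language and that of the target language, and this difference can be used to model reordering. The use of distortion can therefore be regarded as one way of describing the reordering problem, which is also one of the main things that set machine translation apart from tasks such as speech recognition. Early statistical machine translation systems, such as Pharaoh\upcite{DBLP:conf/amta/Koehn04}, made heavy use of distortion. Although more sophisticated reordering models were proposed as machine translation developed\upcite{Gros2008MSD,xiong2006maximum,och2004alignment,DBLP:conf/naacl/KumarB05,li-etal-2014-neural,vaswani2017attention}, the way distortion framed the reordering problem was profound, and this is one of the most important contributions of the IBM models.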
\vspace{0.5em}
\item Another contribution of the IBM models is the introduction of fertility into machine translation. In essence, fertility models translation length: in the IBM models, the length of the whole sentence follows from the fertilities of its words. Note that translation length has a crucial effect on translation quality. Although many machine translation models do not use fertility directly, almost all modern systems contain a module that controls translation length. For example, both statistical and neural machine translation systems use the number of target words as a feature for generating translations of reasonable length\upcite{Koehn2007Moses,ChiangLMMRS05,bahdanau2014neural}. In addition, non-autoregressive decoding in neural machine translation also uses a fertility model to predict the translation length\upcite{Gu2017NonAutoregressiveNM}.
\vspace{0.5em}
\end{itemize}
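The relation between fertility and length described above can be made concrete: the length implied by a set of fertilities is simply their sum (in the IBM models this includes the fertility of the empty word). In the fertility-based non-autoregressive decoder cited above, source tokens are copied according to their predicted fertilities to form the decoder input, so its length equals the predicted translation length. The sketch assumes the fertilities are already given; the function names and values are hypothetical.

```python
from typing import List, Sequence


def length_from_fertility(fertilities: Sequence[int]) -> int:
    """The output length implied by per-word fertilities is their sum."""
    if any(f < 0 for f in fertilities):
        raise ValueError("fertilities must be non-negative")
    return sum(fertilities)


def copy_by_fertility(tokens: Sequence[str], fertilities: Sequence[int]) -> List[str]:
    """Repeat each token fertility-many times; a token with fertility 0 is dropped."""
    if len(tokens) != len(fertilities):
        raise ValueError("one fertility per token is required")
    out: List[str] = []
    for tok, f in zip(tokens, fertilities):
        out.extend([tok] * f)
    return out


print(copy_by_fertility(["a", "b", "c"], [1, 0, 2]))  # ['a', 'c', 'c']
```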
......
......@@ -464,7 +464,7 @@ d = {(\bar{s}_{\bar{a}_1},\bar{t}_1)} \circ {(\bar{s}_{\bar{a}_2},\bar{t}_2)} \c
\end{figure}
%-------------------------------------------
\parinterval In addition, word alignments can be obtained with external tools such as FastAlign\upcite{DBLP:conf/naacl/DyerCS13} and the Berkeley Word Aligner\upcite{taskar2005a}. Alignment quality is usually measured by the alignment error rate (AER)\upcite{DBLP:conf/coling/OchN00}. However, word alignment is not a standalone system; it usually serves other tasks, so it can also be evaluated through a downstream task, for example by observing how the performance of a machine translation system changes after the word alignments are improved.
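The alignment error rate compares a predicted alignment $A$ with manually annotated sure links $S$ and possible links $P$ ($S \subseteq P$) via $\textrm{AER} = 1 - \frac{|A\cap S| + |A\cap P|}{|A| + |S|}$. A minimal sketch of this standard definition, assuming links are (source position, target position) pairs; the function name and the tiny example are illustrative.

```python
from typing import Set, Tuple

Link = Tuple[int, int]  # (source position, target position)


def alignment_error_rate(predicted: Set[Link], sure: Set[Link],
                         possible: Set[Link]) -> float:
    """AER = 1 - (|A & S| + |A & P|) / (|A| + |S|), assuming S is a subset of P."""
    possible = possible | sure  # every sure link also counts as possible
    denom = len(predicted) + len(sure)
    if denom == 0:
        raise ValueError("AER is undefined when A and S are both empty")
    return 1.0 - (len(predicted & sure) + len(predicted & possible)) / denom


SURE = {(0, 0), (1, 1)}
POSSIBLE = {(2, 1)}  # extra "possible" links; sure links are added automatically
print(alignment_error_rate({(0, 0), (2, 1)}, SURE, POSSIBLE))  # 0.25
```

A lower AER is better, and a prediction that matches exactly the sure links scores 0.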
%----------------------------------------------------------------------------------------
% NEW SUB-SECTION
......@@ -651,7 +651,7 @@ dr = start_i-end_{i-1}-1
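\parinterval The simplest way to obtain the best feature weights is to enumerate all possible weight values, evaluate the translation quality for each set of weights, and choose the best set as the tuning result. Feature weights are real-valued, however, so the weights can be quantized, i.e., restricted to values on a fixed grid, for example in steps of 0.01. Even so, enumerating the weights of several features at once is very time-consuming: the number of combinations grows exponentially with the number of features, so this approach remains very inefficient as features are added.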
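To see why exhaustive enumeration does not scale, note that a grid with step 0.01 on $[0,1]$ has 101 values per weight, so $n$ features require $101^n$ evaluations. The sketch below counts the grid and runs the exhaustive search. The evaluate argument is a hypothetical stand-in for decoding the tuning set with a given weight vector and scoring the output; the range, the step and the toy objective are assumptions.

```python
import itertools
from typing import Callable, List, Tuple

Weights = Tuple[float, ...]


def grid_values(low: float, high: float, step: float) -> List[float]:
    """Quantized values low, low + step, ..., high for one feature weight."""
    n = round((high - low) / step) + 1
    return [low + k * step for k in range(n)]


def grid_size(num_features: int, low: float = 0.0, high: float = 1.0,
              step: float = 0.01) -> int:
    """Number of weight vectors an exhaustive search has to evaluate."""
    return len(grid_values(low, high, step)) ** num_features


def grid_search(num_features: int, evaluate: Callable[[Weights], float],
                low: float = 0.0, high: float = 1.0,
                step: float = 0.01) -> Tuple[Weights, float]:
    """Try every weight vector on the grid and keep the best-scoring one."""
    values = grid_values(low, high, step)
    best = max(itertools.product(values, repeat=num_features), key=evaluate)
    return best, evaluate(best)


print(grid_size(1), grid_size(2), grid_size(5))  # 101 10201 10510100501
```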
\parinterval This section introduces a more efficient method for tuning feature weights: {\small\bfnew{minimum error rate training}}\index{Minimum Error Rate Training} (MERT). Minimum error rate training is a representative piece of work in the development of statistical machine translation and one of the important techniques that originated in the machine translation field\upcite{DBLP:conf/acl/Och03}. It assumes that the errors of a translation with respect to the reference can be measured, so the best feature weights can be found by reducing the number of errors. Suppose there is a sample set $S = \{(s_1,\seq{r}_1),...,(s_N,\seq{r}_N)\}$, where $s_i$ is the $i$-th source sentence in the sample and $\seq{r}_i$ is its reference translation. Note that $\seq{r}_i$ may contain more than one reference translation. $S$ is usually called the {\small\bfnew{tuning set}}\index{Tuning Set}. For each source sentence $s_i$ in $S$, the machine translation model decodes the $n$-best derivations $\hat{\seq{d}}_{i} = \{\hat{d}_{ij}\}$, where $\hat{d}_{ij}$ is the $j$-th best derivation for the source sentence $s_i$. $\{\hat{d}_{ij}\}$ can be defined as follows:
\begin{eqnarray}
\{\hat{d}_{ij}\} = \arg\max_{\{d_{ij}\}} \sum_{i=1}^{M} \lambda_i \cdot h_i (d,\seq{t},\seq{s})
......@@ -912,7 +912,7 @@ dr = start_i-end_{i-1}-1
\vspace{0.5em}
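\item The stack decoding used in statistical machine translation originates from the work of Tillmann et al.\upcite{tillmann1997a}. It has been applied successfully in open-source systems such as Pharaoh\upcite{DBLP:conf/amta/Koehn04} and Moses\upcite{Koehn2007Moses} and has had a great influence on the field. In particular, because this decoding method is very efficient, it is also widely used in industrial systems. There are also many improvements to stack decoding: early work used pruning or restricted the reordering range to speed up decoding\upcite{DBLP:conf/acl/WangW97,DBLP:conf/coling/TillmannN00,DBLP:conf/iwslt/ShenDA06a,robert2007faster}, and later work improved the method in terms of the decoding algorithm and the way the language model is integrated\upcite{DBLP:conf/acl/HeafieldKM14,DBLP:conf/acl/WuebkerNZ12,DBLP:conf/iwslt/ZensN08}.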
\vspace{0.5em}
\item Much of the success of statistical machine translation comes from the ability of discriminative models to incorporate arbitrary features. Consequently, a great deal of work in the statistical era focused on designing new features. For example, translation features can be designed from various statistics and prior knowledge\upcite{och2004smorgasbord,Chiang200911,gildea2003loosely}, and large-scale sparse features can be designed in the style of classification tasks\upcite{DBLP:conf/emnlp/ChiangMR08}. On the other hand, model training and feature weight tuning are also important problems in statistical machine translation. Besides minimum error rate training there are many other methods, such as maximum likelihood estimation\upcite{koehn2003statistical,DBLP:journals/coling/BrownPPM94}, discriminative methods\upcite{Blunsom2008A}, Bayesian methods\upcite{Blunsom2009A,Cohn2009A}, minimum risk training\upcite{smith2006minimum,li2009first}, margin-based methods\upcite{watanabe2007online,Chiang200911}, and ranking-based methods (PRO)\upcite{Hopkins2011Tuning,dreyer2015apro}. In fact, training and decoding in statistical machine translation are also inconsistent: feature values are obtained by maximum likelihood estimation on bilingual data (without pruning), whereas decoding uses beam pruning, and the ultimate goal of the model is to maximize a machine translation evaluation metric. This problem can be alleviated by adjusting the training objective\upcite{XiaoA,marcu2006practical}.
\vspace{0.5em}
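\item The phrase table is a key component of phrase-based systems. However, translation probabilities estimated simply from frequency counts cannot handle low-frequency phrases well, so the phrase table needs to be smoothed\upcite{DBLP:conf/iwslt/ZensN08,DBLP:conf/emnlp/SchwenkCF07,boxing2011unpacking,DBLP:conf/coling/DuanSZ10}. On the other hand, as the amount of data grows and longer phrases are extracted, the phrase table grows dramatically, which greatly increases storage costs; an overly large phrase table also slows down phrase lookup. To address this, much work has tried to compress the phrase table. One approach is to limit phrase length\upcite{DBLP:conf/naacl/QuirkM06,DBLP:journals/coling/MarinoBCGLFC06}. Another widely used approach is to prune phrases with metrics or classifiers; the core idea is to judge the quality of each phrase\upcite{DBLP:conf/emnlp/ZensSX12} and filter out low-quality ones. Representative methods include pruning based on significance testing\upcite{DBLP:conf/emnlp/JohnsonMFK07}, entropy-based pruning\upcite{DBLP:conf/emnlp/LingGTB12}, two-stage phrase extraction\upcite{DBLP:conf/naacl/ZettlemoyerM07}, and pruning based on how often phrases are used during decoding\upcite{DBLP:conf/naacl/EckVW07}. In addition, how the phrase table is stored is a practical concern, so researchers have also tried more compact and efficient data structures. The most representative one is the suffix array, which exploits the overlap between phrases and greatly reduces redundant storage\upcite{DBLP:conf/acl/Callison-BurchBS05,DBLP:conf/naacl/ZensN07,2014Dynamic}.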
\vspace{0.5em}
......
......@@ -1574,7 +1574,7 @@ d_1 = {d'} \circ {r_5}
\textrm{VP}_1\ \ \textrm{NP}_2 &\rightarrow& \textrm{V103(}\ \ \textrm{VP}_1\ \ \textrm{NP}_2 ) \nonumber
\end{eqnarray}
\noindent As can be seen, the source side of each of these two new rules has only two parts, corresponding to two branches. V103 is a new label without any syntactic meaning. However, to keep the target side of the binarized rules contiguous, the binarization of the source and target sides must be synchronized\upcite{DBLP:conf/naacl/ZhangHGK06,Tong2009Better}. Such rules are then used together with the CKY algorithm for decoding; see Section~\ref{section-8.2.4} for details.
\vspace{0.5em}
\end{itemize}
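Whether a rule can be binarized synchronously depends on how its nonterminals are reordered between the two sides: the reordering must be buildable by repeatedly merging two neighbouring blocks that are also contiguous on the target side. The sketch below checks this with a shift-reduce pass in the spirit of synchronous binarization, assuming perm[k] gives the target-side position of the $k$-th source-side nonterminal. Each merged pair corresponds to one new virtual label such as V103 above; the function name and the nested-tuple tree encoding are hypothetical, and no rules or labels are actually created.

```python
from typing import List, Optional, Sequence, Tuple, Union

Tree = Union[int, Tuple["Tree", "Tree"]]  # leaves are source-side positions


def synchronous_binarize(perm: Sequence[int]) -> Optional[Tree]:
    """Return a binary bracketing of source positions, or None if impossible."""
    # Stack items: (min target position, max target position, subtree)
    stack: List[Tuple[int, int, Tree]] = []
    for src_pos, tgt_pos in enumerate(perm):
        stack.append((tgt_pos, tgt_pos, src_pos))
        # Reduce while the two topmost blocks are contiguous on the target side.
        while len(stack) >= 2:
            lo2, hi2, t2 = stack[-1]
            lo1, hi1, t1 = stack[-2]
            if hi1 + 1 == lo2 or hi2 + 1 == lo1:
                stack[-2:] = [(min(lo1, lo2), max(hi1, hi2), (t1, t2))]
            else:
                break
    return stack[0][2] if len(stack) == 1 else None


print(synchronous_binarize([2, 0, 1]))  # (0, (1, 2))
print(synchronous_binarize([1, 3, 0, 2]))  # None
```

The second permutation is the classic case that no synchronous binarization can handle, which is one reason tree-to-tree systems need additional techniques to stay robust.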
......@@ -1592,9 +1592,9 @@ d_1 = {d'} \circ {r_5}
\begin{itemize}
\vspace{0.5em}
\item From a modeling perspective, early statistical machine translation models already dealt with representing tree structures\upcite{DBLP:conf/acl/AlshawiBX97,DBLP:conf/acl/WangW98}. The real rise of syntax-based translation models, however, came with the introduction of synchronous grammars. Early work focused mostly on inversion transduction grammars and bracketing transduction grammars\upcite{DBLP:conf/acl-vlc/Wu95,wu1997stochastic,DBLP:conf/acl/WuW98}, and these methods were also used to acquire phrases\upcite{ja2006obtaining,DBLP:conf/acl/ZhangQMG08}. Researchers then proposed more general hierarchical models to describe the translation process\upcite{chiang2005a,DBLP:conf/coling/ZollmannVOP08,DBLP:conf/acl/WatanabeTI06}, of which the hierarchical phrase-based model introduced in this chapter is a typical example. Afterwards, models that use linguistic syntax gradually emerged. The most representative ones use linguistic syntax on one side only\upcite{DBLP:conf/naacl/GalleyHKM04,galley2006scalable,marcu2006spmt,DBLP:conf/naacl/HuangK06,DBLP:conf/emnlp/DeNeefeKWM07,DBLP:conf/wmt/LiuG08,liu2006tree}, namely tree-to-string and string-to-tree models. Notably, besides defining translation rules directly with syntactic information, some researchers used syntax as soft constraints to improve the hierarchical phrase-based model\upcite{DBLP:conf/wmt/ZollmannV06,DBLP:conf/acl/MartonR08}. Such methods are very flexible: they retain the robustness of the hierarchical phrase-based model while still letting linguistic syntax guide translation. In the same period, other researchers proposed modeling translation with linguistic syntax trees on both sides; representative work models tree-to-tree translation with synchronous tree-insertion grammars and synchronous tree-substitution grammars\upcite{Nesson06inductionof,Zhang07atree-to-tree,liu2009improving}. However, tree-to-tree translation assumes that the syntactic structures of the two languages can be converted into each other, which does not always hold, so tree-to-tree systems usually rely on techniques such as tree binarization to improve robustness.
\vspace{0.5em}
\item Syntax-based models usually rely on a syntactic parser to produce parse trees. Because parsers make errors, these errors affect the machine translation system. One solution is to consider more parse trees at once, which increases the chance that the correct parse is used. A typical approach is based on parse forests\upcite{DBLP:conf/acl/MiHL08,DBLP:conf/emnlp/MiH08}, which use a forest instead of a single parse tree during rule extraction or decoding. Another idea is to relax the syntactic structure, i.e., not to follow it strictly during translation\upcite{zhu2011improving,DBLP:conf/emnlp/ZhangZZ11}. The soft-syntactic-constraint models mentioned above are also an instance of this idea\upcite{DBLP:conf/wmt/ZollmannV06,DBLP:conf/acl/MartonR08}. In fact, a long-standing question in machine translation is which syntactic structure is best suited to it. Some researchers have therefore compared the effect of different parsing results on machine translation systems\upcite{DBLP:conf/wmt/PopelMGZ11,DBLP:conf/coling/XiaoZZZ10}. Others induce syntactic structures automatically for the translation task\upcite{DBLP:journals/tacl/ZhaiZZZ13} instead of directly using a parser trained on a small monolingual treebank, which improves system robustness.
\vspace{0.5em}
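\item Most models discussed in this chapter are based on phrase structure trees. Another important direction is to model translation with dependency trees\upcite{DBLP:journals/mt/QuirkM06,DBLP:conf/wmt/XiongLL07,DBLP:conf/coling/Lin04}. Dependency trees have a simpler structure than phrase structure trees, and dependency relations themselves represent ``semantics'', so they can capture information that phrase structure trees cannot. Like other syntax-based models, most dependency-based models also involve rule extraction and decoding, so research in this direction mostly concerns extracting translation rules and decoding with dependency trees\upcite{DBLP:conf/acl/DingP05,DBLP:conf/coling/ChenXMJL14,DBLP:conf/coling/SuLMZLL10,DBLP:conf/coling/XieXL14,DBLP:conf/emnlp/LiWL15}. Moreover, dependency-based models can be combined with forest structures to further improve system performance\upcite{DBLP:conf/acl/MiL10,DBLP:conf/coling/TuLHLL10}.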
\vspace{0.5em}
......
......@@ -186,7 +186,7 @@
Keith Stevens and
George Kurian and
Nishant Patil and
Wei Wang and
Cliff Young and
Jason Smith and
Jason Riesa and
......@@ -221,7 +221,6 @@
Yann N. Dauphin},
title = {Convolutional Sequence to Sequence Learning},
publisher = {International Conference on Machine Learning},
volume = {70},
pages = {1243--1252},
year = {2017}
......@@ -273,7 +272,6 @@
Yoshua Bengio and
Aaron C. Courville},
title = {Deep Learning},
publisher = {{MIT} Press},
year = {2016}
}
......@@ -386,7 +384,6 @@
publisher={International Conference on Acoustics, Speech, and Signal Processing},
pages={825--828},
year={1991},
}
@inproceedings{stolcke2002srilm,
......@@ -425,7 +422,6 @@
title = {Speech and language processing: an introduction to natural language
processing, computational linguistics, and speech recognition, 2nd
Edition},
publisher = {Prentice Hall, Pearson Education International},
year = {2009}
}
......@@ -743,7 +739,7 @@
@inproceedings{Koehn2007Moses,
author = {Philipp Koehn and
Hieu Hoang and
Alexandra Birch and
Chris Callison-Burch and
Marcello Federico and
Nicola Bertoldi and
......@@ -824,7 +820,7 @@
Kevin Knight and
Daniel Marcu and
Steve DeNeefe and
Wei Wang and
Ignacio Thayer},
title = {Scalable Inference and Training of Context-Rich Syntactic Translation
Models},
......@@ -1647,7 +1643,7 @@
@inproceedings{DBLP:conf/wmt/Callison-BurchF07,
author = {Chris Callison-Burch and
Cameron S. Fordyce and
Philipp Koehn and
Christof Monz and
Josh Schroeder},
title = {(Meta-) Evaluation of Machine Translation},
......@@ -1681,7 +1677,7 @@
Barry Haddow and
Matthias Huck and
Chris Hokamp and
Philipp Koehn and
Varvara Logacheva and
Christof Monz and
Matteo Negri and
......@@ -2056,16 +2052,6 @@
publisher = {Annual Meeting of the Association for Computational Linguistics},
year = {2013}
}
@article{DBLP:journals/coling/FraserM07,
author = {Alexander M. Fraser and
Daniel Marcu},
title = {Measuring Word Alignment Quality for Statistical Machine Translation},
journal = {Computational Linguistics},
volume = {33},
number = {3},
pages = {293--303},
year = {2007}
}
@inproceedings{DBLP:conf/acl/DeNeroK07,
author = {John DeNero and
Dan Klein},
......@@ -2252,17 +2238,6 @@
publisher = {Annual Meeting of the Association for Computational Linguistics},
year = {2005},
}
@inproceedings{2018Non,
author = {Jiatao Gu and
James Bradbury and
Caiming Xiong and
Victor O. K. Li and
Richard Socher},
title = {Non-Autoregressive Neural Machine Translation},
publisher = {International Conference on Learning Representations},
year = {2018}
}
%%%%% chapter 6------------------------------------------------------
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
......@@ -2297,15 +2272,6 @@
publisher = {AAAI Press},
year = {2000}
}
@inproceedings{dyer2013a,
author = {Chris Dyer and
Victor Chahuneau and
Noah A. Smith},
title = {A Simple, Fast, and Effective Reparameterization of {IBM} Model 2},
pages = {644--648},
publisher = {Annual Conference of the North American Chapter of the Association for Computational Linguistics},
year = {2013}
}
@inproceedings{taskar2005a,
author = {Benjamin Taskar and
Simon Lacoste-Julien and
......@@ -2366,13 +2332,7 @@
publisher = {Annual Meeting of the Association for Computational Linguistics},
year = {2014}
}
@inproceedings{och2003minimum,
author = {Franz Josef Och},
title = {Minimum Error Rate Training in Statistical Machine Translation},
pages = {160--167},
publisher = {Annual Meeting of the Association for Computational Linguistics},
year = {2003}
}
@article{powell1964an,
author = {M. J. D. Powell},
title = {An efficient method for finding the minimum of a function of several
......@@ -2721,16 +2681,6 @@
publisher = {Annual Meeting of the Association for Computational Linguistics},
year = {2003}
}
@inproceedings{chiang2008online,
author = {David Chiang and
Yuval Marton and
Philip Resnik},
title = {Online Large-Margin Training of Syntactic and Structural Translation
Features},
pages = {224--233},
publisher = {Conference on Empirical Methods in Natural Language Processing},
year = {2008}
}
@inproceedings{Blunsom2008A,
author = {Phil Blunsom and
Trevor Cohn and
......@@ -2811,15 +2761,6 @@
publisher={University of Southern California},
year={2006},
}
@inproceedings{DBLP:conf/iwslt/ZensN08,
author = {Richard Zens and
Hermann Ney},
title = {Improvements in dynamic programming beam search for phrase-based statistical
machine translation},
pages = {198--205},
publisher = {International Workshop on Spoken Language Translation},
year = {2008}
}
@inproceedings{DBLP:conf/emnlp/SchwenkCF07,
author = {Holger Schwenk and
Marta R. Costa-juss{\`{a}} and
......@@ -2968,13 +2909,6 @@
pages = {1159--1187},
year = {2012}
}
@inproceedings{chiang2005a,
author = {David Chiang},
title = {A Hierarchical Phrase-Based Model for Statistical Machine Translation},
pages = {263--270},
publisher = {Annual Meeting of the Association for Computational Linguistics},
year = {2005}
}
@article{chiang2007hierarchical,
title={Hierarchical Phrase-Based Translation},
author ={David Chiang},
......@@ -3031,20 +2965,6 @@
year ={2006},
publisher ={Computationally Hard Problems \& Joint Inference in Speech \& Language Processing}
}
@inproceedings{galley2006scalable,
author = {Michel Galley and
Jonathan Graehl and
Kevin Knight and
Daniel Marcu and
Steve DeNeefe and
Wei Wang and
Ignacio Thayer},
title = {Scalable Inference and Training of Context-Rich Syntactic Translation
Models},
publisher = {Annual Meeting of the Association for Computational Linguistics},
year = {2006}
}
@inproceedings{galley2004s,
title ={What’s in a translation rule?},
author ={Galley, Michel and Hopkins, Mark and Knight, Kevin and Marcu, Daniel},
......@@ -3216,15 +3136,6 @@
publisher = {Annual Meeting of the Association for Computational Linguistics},
year = {2011}
}
@inproceedings{zhang2006synchronous,
author = {Hao Zhang and
Liang Huang and
Daniel Gildea and
Kevin Knight},
title = {Synchronous Binarization for Machine Translation},
publisher = {Annual Conference of the North American Chapter of the Association for Computational Linguistics},
year = {2006}
}
@inproceedings{DBLP:conf/acl/AlshawiBX97,
author = {Hiyan Alshawi and
Adam L. Buchsbaum and
......@@ -3304,7 +3215,7 @@
@inproceedings{DBLP:conf/emnlp/DeNeefeKWM07,
author = {Steve DeNeefe and
Kevin Knight and
Wei Wang and
Daniel Marcu},
title = {What Can Syntax-Based {MT} Learn from Phrase-Based MT?},
pages = {755--763},
......@@ -3319,30 +3230,6 @@
publisher = {Annual Meeting of the Association for Computational Linguistics},
year = {2008}
}
@inproceedings{DBLP:conf/acl/LiuLL06,
author = {Yang Liu and
Qun Liu and
Shouxun Lin},
title = {Tree-to-String Alignment Template for Statistical Machine Translation},
publisher = {Annual Meeting of the Association for Computational Linguistics},
year = {2006}
}
@inproceedings{DBLP:conf/acl/MartonR08,
author = {Yuval Marton and
Philip Resnik},
title = {Soft Syntactic Constraints for Hierarchical Phrased-Based Translation},
pages = {1003--1011},
publisher = {Annual Meeting of the Association for Computational Linguistics},
year = {2008}
}
@INPROCEEDINGS{Nesson06inductionof,
author = {Rebecca Nesson and Stuart M. Shieber and Alexander Rush},
title = {Induction of probabilistic synchronous tree-insertion grammars for machine translation},
......@@ -3355,15 +3242,6 @@
year = {2007},
publisher = {Machine Translation Summit}
}
@inproceedings{DBLP:conf/acl/LiuLL09,
author = {Yang Liu and
Yajuan L{\"{u}} and
Qun Liu},
title = {Improving Tree-to-Tree Translation with Packed Forests},
pages = {558--566},
publisher = {Annual Meeting of the Association for Computational Linguistics},
year = {2009}
}
@inproceedings{DBLP:conf/emnlp/WangKM07,
author = {Wei Wang and
Kevin Knight and
......@@ -3391,14 +3269,6 @@
publisher = {Annual Meeting of the Association for Computational Linguistics},
year = {2008}
}
@inproceedings{DBLP:conf/acl/ZhuX11,
author = {Jingbo Zhu and
Tong Xiao},
title = {Improving Decoding Generalization for Tree-to-String Translation},
pages = {418--423},
publisher = {Annual Meeting of the Association for Computational Linguistics},
year = {2011}
}
@inproceedings{DBLP:conf/emnlp/ZhangZZ11,
author = {Jiajun Zhang and
Feifei Zhai and
......@@ -3783,11 +3653,6 @@
number = {1},
pages = {145--151},
year = {1999},
}
@article{duchi2011adaptive,
......@@ -3799,10 +3664,6 @@
volume = {12},
pages = {2121--2159},
year = {2011},
}
@article{tieleman2012rmsprop,
......@@ -3818,23 +3679,15 @@
@inproceedings{kingma2014adam,
author = {Diederik P. Kingma and
Jimmy Ba},
title = {Adam: {A} Method for Stochastic Optimization},
booktitle = {3rd International Conference on Learning Representations, {ICLR} 2015,
San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings},
year = {2015},
}
@inproceedings{ioffe2015batch,
author = {Sergey Ioffe and
Christian Szegedy},
title = {Batch Normalization: Accelerating Deep Network Training by Reducing
Internal Covariate Shift},
booktitle = {Proceedings of the 32nd International Conference on Machine Learning,
......@@ -3844,10 +3697,6 @@
pages = {448--456},
publisher = {JMLR.org},
year = {2015},
}
@article{Ba2016LayerN,
......@@ -3858,30 +3707,6 @@
journal = {CoRR},
volume = {abs/1607.06450},
year = {2016},
}
@inproceedings{DBLP:journals/corr/HeZRS15,
author = {Kaiming He and
Xiangyu Zhang and
Shaoqing Ren and
Jian Sun},
title = {Deep Residual Learning for Image Recognition},
booktitle = {2016 {IEEE} Conference on Computer Vision and Pattern Recognition,
{CVPR} 2016, Las Vegas, NV, USA, June 27-30, 2016},
pages = {770--778},
publisher = {{IEEE} Computer Society},
year = {2016},
}
@inproceedings{mikolov2013distributed,
......@@ -3890,20 +3715,12 @@
Kai Chen and
Gregory S. Corrado and
Jeffrey Dean},
title = {Distributed Representations of Words and Phrases and their Compositionality},
booktitle = {Advances in Neural Information Processing Systems 26: 27th Annual
Conference on Neural Information Processing Systems 2013. Proceedings
of a meeting held December 5-8, 2013, Lake Tahoe, Nevada, United States},
pages = {3111--3119},
year = {2013},
}
@article{guidotti2018survey,
......@@ -3919,39 +3736,24 @@
number = {5},
pages = {93:1--93:42},
year = {2019},
}
@inproceedings{koh2017understanding,
author = {Pang Wei Koh and
Percy Liang},
title = {Understanding Black-box Predictions via Influence Functions},
booktitle = {Proceedings of the 34th International Conference on Machine Learning,
{ICML} 2017, Sydney, NSW, Australia, 6-11 August 2017},
series = {Proceedings of Machine Learning Research},
volume = {70},
pages = {1885--1894},
publisher = {{PMLR}},
year = {2017},
}
@inproceedings{arthur2016incorporating,
author = {Philip Arthur and
Graham Neubig and
Satoshi Nakamura},
title = {Incorporating Discrete Translation Lexicons into Neural Machine Translation},
booktitle = {Proceedings of the 2016 Conference on Empirical Methods in Natural
Language Processing, {EMNLP} 2016, Austin, Texas, USA, November 1-4,
......@@ -3959,28 +3761,6 @@
pages = {1557--1567},
publisher = {The Association for Computational Linguistics},
year = {2016},
}
@inproceedings{zollmann2006syntax,
author = {Andreas Zollmann and
Ashish Venugopal},
title = {Syntax Augmented Machine Translation via Chart Parsing},
booktitle = {Proceedings on the Workshop on Statistical Machine Translation, WMT@HLT-NAACL
2006, New York City, NY, USA, June 8-9, 2006},
pages = {138--141},
publisher = {Association for Computational Linguistics},
year = {2006},
}
@INPROCEEDINGS{charniak2003syntax,
......@@ -4001,11 +3781,6 @@
2: Short Papers},
publisher = {The Association for Computational Linguistics},
year = {2016},
}
@inproceedings{plank2013embedding,
......@@ -4019,21 +3794,12 @@
pages = {1498--1507},
publisher = {The Association for Computational Linguistics},
year = {2013},
}
@inproceedings{perozzi2014deepwalk,
author = {Bryan Perozzi and
Rami Al-Rfou and
Steven Skiena},
title = {DeepWalk: online learning of social representations},
booktitle = {The 20th {ACM} {SIGKDD} International Conference on Knowledge Discovery
and Data Mining, {KDD} '14, New York, NY, {USA} - August 24 - 27,
......@@ -4041,17 +3807,12 @@
pages = {701--710},
publisher = {{ACM}},
year = {2014},
//url = {https://doi.org/10.1145/2623330.2623732},
//doi = {10.1145/2623330.2623732},
//timestamp = {Sun, 02 Jun 2019 21:11:52 +0200},
//biburl = {https://dblp.org/rec/conf/kdd/PerozziAS14.bib},
//bibsource = {dblp computer science bibliography, https://dblp.org}
}
@article{collobert2011natural,
author = {Ronan Collobert and
Jason Weston and
L{\'{e}}on Bottou and
Michael Karlen and
Koray Kavukcuoglu and
Pavel P. Kuksa},
......@@ -4060,10 +3821,6 @@
volume = {12},
pages = {2493--2537},
year = {2011},
//url = {http://dl.acm.org/citation.cfm?id=2078186},
//timestamp = {Wed, 10 Jul 2019 15:28:44 +0200},
//biburl = {https://dblp.org/rec/journals/jmlr/CollobertWBKKK11.bib},
//bibsource = {dblp computer science bibliography, https://dblp.org}
}
@inproceedings{mccann2017learned,
......@@ -4071,23 +3828,12 @@
James Bradbury and
Caiming Xiong and
Richard Socher},
//editor = {Isabelle Guyon and
Ulrike von Luxburg and
Samy Bengio and
Hanna M. Wallach and
Rob Fergus and
S. V. N. Vishwanathan and
Roman Garnett},
title = {Learned in Translation: Contextualized Word Vectors},
booktitle = {Advances in Neural Information Processing Systems 30: Annual Conference
on Neural Information Processing Systems 2017, 4-9 December 2017,
Long Beach, CA, {USA}},
pages = {6294--6305},
year = {2017},
//url = {http://papers.nips.cc/paper/7209-learned-in-translation-contextualized-word-vectors},
//timestamp = {Fri, 06 Mar 2020 16:57:53 +0100},
//biburl = {https://dblp.org/rec/conf/nips/McCannBXS17.bib},
//bibsource = {dblp computer science bibliography, https://dblp.org}
}
%%%%% chapter 9------------------------------------------------------
......@@ -4110,16 +3856,9 @@
pages = {1370--1380},
//publisher = {The Association for Computer Linguistics},
year = {2014},
//url = {https://doi.org/10.3115/v1/p14-1129},
//doi = {10.3115/v1/p14-1129},
//timestamp = {Tue, 28 Jan 2020 10:27:56 +0100},
//biburl = {https://dblp.org/rec/conf/acl/DevlinZHLSM14.bib},
//bibsource = {dblp computer science bibliography, https://dblp.org}
}
@inproceedings{Schwenk_continuousspace,
author = {Holger Schwenk},
//editor = {Martin Kay and
Christian Boitet},
title = {Continuous Space Translation Models for Phrase-Based Statistical Machine
Translation},
publisher = {{COLING} 2012, 24th International Conference on Computational Linguistics,
......@@ -4128,10 +3867,6 @@
pages = {1071--1080},
//publisher = {Indian Institute of Technology Bombay},
year = {2012},
//url = {https://www.aclweb.org/anthology/C12-2104/},
//timestamp = {Wed, 18 Sep 2019 12:15:53 +0200},
//biburl = {https://dblp.org/rec/conf/coling/Schwenk12.bib},
//bibsource = {dblp computer science bibliography, https://dblp.org}
}
@inproceedings{kalchbrenner-blunsom-2013-recurrent,
author = {Nal Kalchbrenner and
......@@ -4144,10 +3879,6 @@
pages = {1700--1709},
//publisher = {{ACL}},
year = {2013},
//url = {https://www.aclweb.org/anthology/D13-1176/},
//timestamp = {Fri, 13 Sep 2019 13:08:45 +0200},
//biburl = {https://dblp.org/rec/conf/emnlp/KalchbrennerB13.bib},
//bibsource = {dblp computer science bibliography, https://dblp.org}
}
@article{HochreiterThe,
author = {Sepp Hochreiter},
......@@ -4159,11 +3890,6 @@
number = {2},
pages = {107--116},
year = {1998},
//url = {https://doi.org/10.1142/S0218488598000094},
//doi = {10.1142/S0218488598000094},
//timestamp = {Wed, 14 Nov 2018 10:41:42 +0100},
//biburl = {https://dblp.org/rec/journals/ijufks/Hochreiter98.bib},
//bibsource = {dblp computer science bibliography, https://dblp.org}
}
@article{BENGIO1994Learning,
author ={Y. {Bengio} and P. {Simard} and P. {Frasconi}},
......@@ -4183,23 +3909,12 @@ pages ={157-166},
Aidan N. Gomez and
Lukasz Kaiser and
Illia Polosukhin},
//editor = {Isabelle Guyon and
Ulrike von Luxburg and
Samy Bengio and
Hanna M. Wallach and
Rob Fergus and
S. V. N. Vishwanathan and
Roman Garnett},
title = {Attention is All you Need},
publisher = {Advances in Neural Information Processing Systems 30: Annual Conference
on Neural Information Processing Systems 2017, 4-9 December 2017,
Long Beach, CA, {USA}},
pages = {5998--6008},
year = {2017},
//url = {http://papers.nips.cc/paper/7181-attention-is-all-you-need},
//timestamp = {Fri, 06 Mar 2020 17:00:11 +0100},
//biburl = {https://dblp.org/rec/conf/nips/VaswaniSPUJGKP17.bib},
//bibsource = {dblp computer science bibliography, https://dblp.org}
}
@article{StahlbergNeural,
author = {Felix Stahlberg},
......@@ -4207,21 +3922,12 @@ pages ={157-166},
journal = {CoRR},
volume = {abs/1912.02047},
year = {2019},
//url = {http://arxiv.org/abs/1912.02047},
//archivePrefix = {arXiv},
//eprint = {1912.02047},
//timestamp = {Thu, 02 Jan 2020 18:08:18 +0100},
//biburl = {https://dblp.org/rec/journals/corr/abs-1912-02047.bib},
//bibsource = {dblp computer science bibliography, https://dblp.org}
}
@inproceedings{Bentivogli2016NeuralVP,
author = {Luisa Bentivogli and
Arianna Bisazza and
Mauro Cettolo and
Marcello Federico},
//editor = {Jian Su and
Xavier Carreras and
Kevin Duh},
title = {Neural versus Phrase-Based Machine Translation Quality: a Case Study},
publisher = {Proceedings of the 2016 Conference on Empirical Methods in Natural
Language Processing, {EMNLP} 2016, Austin, Texas, USA, November 1-4,
......@@ -4229,11 +3935,6 @@ pages ={157-166},
pages = {257--267},
//publisher = {The Association for Computational Linguistics},
year = {2016},
//url = {https://doi.org/10.18653/v1/d16-1025},
//doi = {10.18653/v1/d16-1025},
//timestamp = {Tue, 28 Jan 2020 10:28:39 +0100},
//biburl = {https://dblp.org/rec/conf/emnlp/BentivogliBCF16.bib},
//bibsource = {dblp computer science bibliography, https://dblp.org}
}
@article{Hassan2018AchievingHP,
author = {Hany Hassan and
......@@ -4264,12 +3965,6 @@ pages ={157-166},
journal = {CoRR},
volume = {abs/1803.05567},
year = {2018},
//url = {http://arxiv.org/abs/1803.05567},
//archivePrefix = {arXiv},
//eprint = {1803.05567},
//timestamp = {Mon, 13 Aug 2018 16:47:23 +0200},
//biburl = {https://dblp.org/rec/journals/corr/abs-1803-05567.bib},
//bibsource = {dblp computer science bibliography, https://dblp.org}
}
@inproceedings{WangLearning,
author = {Qiang Wang and
......@@ -4279,9 +3974,6 @@ pages ={157-166},
Changliang Li and
Derek F. Wong and
Lidia S. Chao},
//editor = {Anna Korhonen and
David R. Traum and
Llu{\'{\i}}s M{\`{a}}rquez},
title = {Learning Deep Transformer Models for Machine Translation},
publisher = {Proceedings of the 57th Conference of the Association for Computational
Linguistics, {ACL} 2019, Florence, Italy, July 28- August 2, 2019,
......@@ -4289,11 +3981,6 @@ pages ={157-166},
pages = {1810--1822},
//publisher = {Association for Computational Linguistics},
year = {2019},
//url = {https://doi.org/10.18653/v1/p19-1176},
//doi = {10.18653/v1/p19-1176},
//timestamp = {Tue, 28 Jan 2020 10:27:53 +0100},
//biburl = {https://dblp.org/rec/conf/acl/WangLXZLWC19.bib},
//bibsource = {dblp computer science bibliography, https://dblp.org}
}
@article{Li2020NeuralMT,
author = {Yanyang Li and
......@@ -4305,12 +3992,6 @@ pages ={157-166},
journal = {CoRR},
volume = {abs/2002.06546},
year = {2020},
//url = {https://arxiv.org/abs/2002.06546},
//archivePrefix = {arXiv},
//eprint = {2002.06546},
//timestamp = {Mon, 02 Mar 2020 16:46:06 +0100},
//biburl = {https://dblp.org/rec/journals/corr/abs-2002-06546.bib},
//bibsource = {dblp computer science bibliography, https://dblp.org}
}
@article{HochreiterLong,
author = {Hochreiter, Sepp and Schmidhuber, Jürgen},
......@@ -4320,7 +4001,6 @@ pages ={157-166},
title = {Long Short-term Memory},
volume = {9},
journal = {Neural Computation},
//doi = {10.1162/neco.1997.9.8.1735}
}
@inproceedings{Cho2014Learning,
author = {Kyunghyun Cho and
......@@ -4330,9 +4010,6 @@ pages ={157-166},
Fethi Bougares and
Holger Schwenk and
Yoshua Bengio},
//editor = {Alessandro Moschitti and
Bo Pang and
Walter Daelemans},
title = {Learning Phrase Representations using {RNN} Encoder-Decoder for Statistical
Machine Translation},
publisher = {Proceedings of the 2014 Conference on Empirical Methods in Natural
......@@ -4341,37 +4018,24 @@ pages ={157-166},
pages = {1724--1734},
//publisher = {{ACL}},
year = {2014},
//url = {https://doi.org/10.3115/v1/d14-1179},
//doi = {10.3115/v1/d14-1179},
//timestamp = {Tue, 28 Jan 2020 10:28:17 +0100},
//biburl = {https://dblp.org/rec/conf/emnlp/ChoMGBBSB14.bib},
//bibsource = {dblp computer science bibliography, https://dblp.org}
}
@inproceedings{pmlr-v9-glorot10a,
author = {Xavier Glorot and
Yoshua Bengio},
//editor = {Yee Whye Teh and
D. Mike Titterington},
title = {Understanding the difficulty of training deep feedforward neural networks},
publisher = {Proceedings of the Thirteenth International Conference on Artificial
Intelligence and Statistics, {AISTATS} 2010, Chia Laguna Resort, Sardinia,
Italy, May 13-15, 2010},
//series = {{JMLR} Proceedings},
volume = {9},
pages = {249--256},
//publisher = {JMLR.org},
year = {2010},
//url = {http://proceedings.mlr.press/v9/glorot10a.html},
//timestamp = {Wed, 29 May 2019 08:41:47 +0200},
//biburl = {https://dblp.org/rec/journals/jmlr/GlorotB10.bib},
//bibsource = {dblp computer science bibliography, https://dblp.org}
}
@inproceedings{xiao2017fast,
author = {Tong Xiao and
Jingbo Zhu and
Tongran Liu and
Chunliang Zhang},
//editor = {Carles Sierra},
title = {Fast Parallel Training of Neural Language Models},
publisher = {Proceedings of the Twenty-Sixth International Joint Conference on
Artificial Intelligence, {IJCAI} 2017, Melbourne, Australia, August
......@@ -4379,11 +4043,6 @@ pages ={157-166},
pages = {4193--4199},
//publisher = {ijcai.org},
year = {2017},
//url = {https://doi.org/10.24963/ijcai.2017/586},
//doi = {10.24963/ijcai.2017/586},
//timestamp = {Tue, 20 Aug 2019 16:17:12 +0200},
//biburl = {https://dblp.org/rec/conf/ijcai/XiaoZLZ17.bib},
//bibsource = {dblp computer science bibliography, https://dblp.org}
}
@inproceedings{Gu2017NonAutoregressiveNM,
author = {Jiatao Gu and
......@@ -4392,14 +4051,8 @@ pages ={157-166},
Victor O. K. Li and
Richard Socher},
title = {Non-Autoregressive Neural Machine Translation},
publisher = {6th International Conference on Learning Representations, {ICLR} 2018,
Vancouver, BC, Canada, April 30 - May 3, 2018, Conference Track Proceedings},
//publisher = {OpenReview.net},
publisher = {International Conference on Learning Representations},
year = {2018},
//url = {https://openreview.net/forum?id=B1l8BtlCb},
//timestamp = {Thu, 25 Jul 2019 14:25:57 +0200},
//biburl = {https://dblp.org/rec/conf/iclr/Gu0XLS18.bib},
//bibsource = {dblp computer science bibliography, https://dblp.org}
}
@inproceedings{li-etal-2018-simple,
author = {Yanyang Li and
......@@ -4408,8 +4061,6 @@ pages ={157-166},
Qiang Wang and
Changming Xu and
Jingbo Zhu},
//editor = {Iryna Gurevych and
Yusuke Miyao},
title = {A Simple and Effective Approach to Coverage-Aware Neural Machine Translation},
publisher = {Proceedings of the 56th Annual Meeting of the Association for Computational
Linguistics, {ACL} 2018, Melbourne, Australia, July 15-20, 2018, Volume
......@@ -4417,11 +4068,6 @@ pages ={157-166},
pages = {292--297},
//publisher = {Association for Computational Linguistics},
year = {2018},
//url = {https://www.aclweb.org/anthology/P18-2047/},
//doi = {10.18653/v1/P18-2047},
//timestamp = {Mon, 16 Sep 2019 13:46:41 +0200},
//biburl = {https://dblp.org/rec/conf/acl/LiXLWXZ18.bib},
//bibsource = {dblp computer science bibliography, https://dblp.org}
}
@inproceedings{TuModeling,
author = {Zhaopeng Tu and
......@@ -4435,26 +4081,19 @@ pages ={157-166},
1: Long Papers},
//publisher = {The Association for Computer Linguistics},
year = {2016},
//url = {https://doi.org/10.18653/v1/p16-1008},
//doi = {10.18653/v1/p16-1008},
//timestamp = {Tue, 28 Jan 2020 10:27:13 +0100},
//biburl = {https://dblp.org/rec/conf/acl/TuLLLL16.bib},
//bibsource = {dblp computer science bibliography, https://dblp.org}
}
@inproceedings{DBLP:journals/corr/SennrichFCBHHJL17,
author = {Rico Sennrich and
Orhan Firat and
Kyunghyun Cho and
Alexandra Birch and
Barry Haddow and
Julian Hitschler and
Marcin Junczys-Dowmunt and
Samuel L{\"{a}}ubli and
Antonio Valerio Miceli Barone and
Jozef Mokry and
Maria Nadejde},
//editor = {Andre Martins and
Anselmo Pe{\~{n}}as},
title = {Nematus: a Toolkit for Neural Machine Translation},
publisher = {Proceedings of the 15th Conference of the European Chapter of the
Association for Computational Linguistics, {EACL} 2017, Valencia,
......@@ -4462,18 +4101,10 @@ pages ={157-166},
pages = {65--68},
//publisher = {Association for Computational Linguistics},
year = {2017},
//url = {https://doi.org/10.18653/v1/e17-3017},
//doi = {10.18653/v1/e17-3017},
//timestamp = {Tue, 28 Jan 2020 10:31:12 +0100},
//biburl = {https://dblp.org/rec/conf/eacl/SennrichFCBHHJL17.bib},
//bibsource = {dblp computer science bibliography, https://dblp.org}
}
@inproceedings{DBLP:journals/corr/abs-1905-13324,
author = {Biao Zhang and
Rico Sennrich},
//editor = {Anna Korhonen and
David R. Traum and
Llu{\'{\i}}s M{\`{a}}rquez},
title = {A Lightweight Recurrent Network for Sequence Modeling},
publisher = {Proceedings of the 57th Conference of the Association for Computational
Linguistics, {ACL} 2019, Florence, Italy, July 28- August 2, 2019,
......@@ -4481,11 +4112,6 @@ pages ={157-166},
pages = {1538--1548},
//publisher = {Association for Computational Linguistics},
year = {2019},
//url = {https://doi.org/10.18653/v1/p19-1149},
//doi = {10.18653/v1/p19-1149},
//timestamp = {Tue, 28 Jan 2020 10:28:03 +0100},
//biburl = {https://dblp.org/rec/conf/acl/ZhangS19.bib},
//bibsource = {dblp computer science bibliography, https://dblp.org}
}
@article{Lei2017TrainingRA,
author = {Tao Lei and
......@@ -4495,12 +4121,6 @@ pages ={157-166},
journal = {CoRR},
volume = {abs/1709.02755},
year = {2017},
//url = {http://arxiv.org/abs/1709.02755},
//archivePrefix = {arXiv},
//eprint = {1709.02755},
//timestamp = {Mon, 13 Aug 2018 16:46:29 +0200},
//biburl = {https://dblp.org/rec/journals/corr/abs-1709-02755.bib},
//bibsource = {dblp computer science bibliography, https://dblp.org}
}
@inproceedings{Zhang2018SimplifyingNM,
author = {Biao Zhang and
......@@ -4508,10 +4128,6 @@ pages ={157-166},
Jinsong Su and
Qian Lin and
Huiji Zhang},
//editor = {Ellen Riloff and
David Chiang and
Julia Hockenmaier and
Jun'ichi Tsujii},
title = {Simplifying Neural Machine Translation with Addition-Subtraction Twin-Gated
Recurrent Networks},
publisher = {Proceedings of the 2018 Conference on Empirical Methods in Natural
......@@ -4519,10 +4135,6 @@ pages ={157-166},
pages = {4273--4283},
//publisher = {Association for Computational Linguistics},
year = {2018},
//url = {https://www.aclweb.org/anthology/D18-1459/},
//timestamp = {Fri, 13 Sep 2019 13:08:45 +0200},
//biburl = {https://dblp.org/rec/conf/emnlp/ZhangXSLZ18.bib},
//bibsource = {dblp computer science bibliography, https://dblp.org}
}
@inproceedings{Liu_2019_CVPR,
author = {Shikun Liu and
......@@ -4534,50 +4146,23 @@ pages ={157-166},
pages = {1871--1880},
//publisher = {Computer Vision Foundation / {IEEE}},
year = {2019},
//url = {http://openaccess.thecvf.com/content\_CVPR\_2019/html/Liu\_End-To-End\_Multi-Task\_Learning\_With\_Attention\_CVPR\_2019\_paper.html},
//doi = {10.1109/CVPR.2019.00197},
//timestamp = {Mon, 20 Jan 2020 15:36:04 +0100},
//biburl = {https://dblp.org/rec/conf/cvpr/LiuJD19.bib},
//bibsource = {dblp computer science bibliography, https://dblp.org}
}
@inproceedings{DBLP:journals/corr/abs-1811-00498,
author = {Ra{\'{u}}l V{\'{a}}zquez and
Alessandro Raganato and
J{\"{o}}rg Tiedemann and
Mathias Creutz},
//editor = {Isabelle Augenstein and
Spandana Gella and
Sebastian Ruder and
Katharina Kann and
Burcu Can and
Johannes Welbl and
Alexis Conneau and
Xiang Ren and
Marek Rei},
title = {Multilingual {NMT} with a Language-Independent Attention Bridge},
publisher = {Proceedings of the 4th Workshop on Representation Learning for NLP,
RepL4NLP@ACL 2019, Florence, Italy, August 2, 2019},
pages = {33--39},
//publisher = {Association for Computational Linguistics},
year = {2019},
//url = {https://doi.org/10.18653/v1/w19-4305},
//doi = {10.18653/v1/w19-4305},
//timestamp = {Fri, 27 Mar 2020 08:52:29 +0100},
//biburl = {https://dblp.org/rec/conf/rep4nlp/VazquezRTC19.bib},
//bibsource = {dblp computer science bibliography, https://dblp.org}
}
@inproceedings{MoradiInterrogating,
author = {Pooya Moradi and
Nishant Kambhatla and
Anoop Sarkar},
//editor = {Alexandra Birch and
Andrew M. Finch and
Hiroaki Hayashi and
Ioannis Konstas and
Thang Luong and
Graham Neubig and
Yusuke Oda and
Katsuhito Sudoh},
title = {Interrogating the Explanatory Power of Attention in Neural Machine
Translation},
publisher = {Proceedings of the 3rd Workshop on Neural Generation and Translation@EMNLP-IJCNLP
......@@ -4585,11 +4170,6 @@ pages ={157-166},
pages = {221--230},
//publisher = {Association for Computational Linguistics},
year = {2019},
//url = {https://doi.org/10.18653/v1/D19-5624},
//doi = {10.18653/v1/D19-5624},
//timestamp = {Tue, 24 Mar 2020 15:04:09 +0100},
//biburl = {https://dblp.org/rec/conf/emnlp/MoradiKS19.bib},
//bibsource = {dblp computer science bibliography, https://dblp.org}
}
@inproceedings{WangNeural,
author = {Xing Wang and
......@@ -4598,18 +4178,12 @@ pages ={157-166},
Hang Li and
Deyi Xiong and
Min Zhang},
//editor = {Satinder P. Singh and
Shaul Markovitch},
title = {Neural Machine Translation Advised by Statistical Machine Translation},
publisher = {Proceedings of the Thirty-First {AAAI} Conference on Artificial Intelligence,
February 4-9, 2017, San Francisco, California, {USA}},
pages = {3330--3336},
//publisher = {{AAAI} Press},
year = {2017},
//url = {http://aaai.org/ocs/index.php/AAAI/AAAI17/paper/view/14451},
//timestamp = {Tue, 15 Jan 2019 11:48:13 +0100},
//biburl = {https://dblp.org/rec/conf/aaai/WangLTLXZ17.bib},
//bibsource = {dblp computer science bibliography, https://dblp.org}
}
@inproceedings{DBLP:journals/corr/abs-1905-09418,
author = {Elena Voita and
......@@ -4617,9 +4191,6 @@ pages ={157-166},
Fedor Moiseev and
Rico Sennrich and
Ivan Titov},
//editor = {Anna Korhonen and
David R. Traum and
Llu{\'{\i}}s M{\`{a}}rquez},
title = {Analyzing Multi-Head Self-Attention: Specialized Heads Do the Heavy
Lifting, the Rest Can Be Pruned},
publisher = {Proceedings of the 57th Conference of the Association for Computational
......@@ -4628,11 +4199,6 @@ pages ={157-166},
pages = {5797--5808},
//publisher = {Association for Computational Linguistics},
year = {2019},
//url = {https://doi.org/10.18653/v1/p19-1580},
//doi = {10.18653/v1/p19-1580},
//timestamp = {Tue, 28 Jan 2020 10:27:29 +0100},
//biburl = {https://dblp.org/rec/conf/acl/VoitaTMST19.bib},
//bibsource = {dblp computer science bibliography, https://dblp.org}
}
@inproceedings{Xiao2019SharingAW,
author = {Tong Xiao and
......@@ -4640,7 +4206,6 @@ pages ={157-166},
Jingbo Zhu and
Zhengtao Yu and
Tongran Liu},
//editor = {Sarit Kraus},
title = {Sharing Attention Weights for Fast Transformer},
publisher = {Proceedings of the Twenty-Eighth International Joint Conference on
Artificial Intelligence, {IJCAI} 2019, Macao, China, August 10-16,
......@@ -4648,11 +4213,6 @@ pages ={157-166},
pages = {5292--5298},
//publisher = {ijcai.org},
year = {2019},
//url = {https://doi.org/10.24963/ijcai.2019/735},
//doi = {10.24963/ijcai.2019/735},
//timestamp = {Tue, 20 Aug 2019 16:18:18 +0200},
//biburl = {https://dblp.org/rec/conf/ijcai/XiaoLZ0L19.bib},
//bibsource = {dblp computer science bibliography, https://dblp.org}
}
@inproceedings{Yang2017TowardsBH,
author = {Baosong Yang and
......@@ -4660,9 +4220,6 @@ pages ={157-166},
Tong Xiao and
Lidia S. Chao and
Jingbo Zhu},
//editor = {Martha Palmer and
Rebecca Hwa and
Sebastian Riedel},
title = {Towards Bidirectional Hierarchical Representations for Attention-based
Neural Machine Translation},
publisher = {Proceedings of the 2017 Conference on Empirical Methods in Natural
......@@ -4671,20 +4228,11 @@ pages ={157-166},
pages = {1432--1441},
//publisher = {Association for Computational Linguistics},
year = {2017},
//url = {https://doi.org/10.18653/v1/d17-1150},
//doi = {10.18653/v1/d17-1150},
//timestamp = {Tue, 28 Jan 2020 10:28:08 +0100},
//biburl = {https://dblp.org/rec/conf/emnlp/YangWXCZ17.bib},
//bibsource = {dblp computer science bibliography, https://dblp.org}
}
@inproceedings{Wang2019TreeTI,
author = {Yau-Shian Wang and
Hung-yi Lee and
Yun-Nung Chen},
//editor = {Kentaro Inui and
Jing Jiang and
Vincent Ng and
Xiaojun Wan},
title = {Tree Transformer: Integrating Tree Structures into Self-Attention},
publisher = {Proceedings of the 2019 Conference on Empirical Methods in Natural
Language Processing and the 9th International Joint Conference on
......@@ -4693,51 +4241,29 @@ pages ={157-166},
//publisher = {Association for Computational Linguistics},
pages = {1061--1070},
year = {2019},
//url = {https://doi.org/10.18653/v1/D19-1098},
//doi = {10.18653/v1/D19-1098},
//timestamp = {Thu, 12 Dec 2019 13:23:46 +0100},
//biburl = {https://dblp.org/rec/conf/emnlp/WangLC19.bib},
//bibsource = {dblp computer science bibliography, https://dblp.org}
}
@inproceedings{DBLP:journals/corr/abs-1809-01854,
author = {Jetic Gu and
Hassan S. Shavarani and
Anoop Sarkar},
//editor = {Ellen Riloff and
David Chiang and
Julia Hockenmaier and
Jun'ichi Tsujii},
title = {Top-down Tree Structured Decoding with Syntactic Connections for Neural Machine Translation and Parsing},
publisher = {Proceedings of the 2018 Conference on Empirical Methods in Natural
Language Processing, Brussels, Belgium, October 31 - November 4, 2018},
pages = {401--413},
//publisher = {Association for Computational Linguistics},
year = {2018},
//url = {https://doi.org/10.18653/v1/d18-1037},
//doi = {10.18653/v1/d18-1037},
//timestamp = {Tue, 28 Jan 2020 10:28:48 +0100},
//biburl = {https://dblp.org/rec/conf/emnlp/GuSS18.bib},
//bibsource = {dblp computer science bibliography, https://dblp.org}
}
@inproceedings{DBLP:journals/corr/abs-1808-09374,
author = {Xinyi Wang and
Hieu Pham and
Pengcheng Yin and
Graham Neubig},
//editor = {Ellen Riloff and
David Chiang and
Julia Hockenmaier and
Jun'ichi Tsujii},
title = {A Tree-based Decoder for Neural Machine Translation},
publisher = {Proceedings of the 2018 Conference on Empirical Methods in Natural
Language Processing, Brussels, Belgium, October 31 - November 4, 2018},
pages = {4772--4777},
//publisher = {Association for Computational Linguistics},
year = {2018},
//url = {https://www.aclweb.org/anthology/D18-1509/},
//timestamp = {Fri, 13 Sep 2019 13:08:45 +0200},
//biburl = {https://dblp.org/rec/conf/emnlp/WangPYN18.bib},
//bibsource = {dblp computer science bibliography, https://dblp.org}
}
@article{DBLP:journals/corr/ZhangZ16c,
author = {Jiajun Zhang and
......@@ -4746,12 +4272,6 @@ pages ={157-166},
journal = {CoRR},
volume = {abs/1610.07272},
year = {2016},
//url = {http://arxiv.org/abs/1610.07272},
//archivePrefix = {arXiv},
//eprint = {1610.07272},
//timestamp = {Mon, 13 Aug 2018 16:47:14 +0200},
//biburl = {https://dblp.org/rec/journals/corr/ZhangZ16c.bib},
//bibsource = {dblp computer science bibliography, https://dblp.org}
}
@article{Dai2019TransformerXLAL,
author = {Zihang Dai and
......@@ -4764,12 +4284,6 @@ pages ={157-166},
journal = {CoRR},
volume = {abs/1901.02860},
year = {2019},
//url = {http://arxiv.org/abs/1901.02860},
//archivePrefix = {arXiv},
//eprint = {1901.02860},
//timestamp = {Fri, 01 Feb 2019 13:39:59 +0100},
//biburl = {https://dblp.org/rec/journals/corr/abs-1901-02860.bib},
//bibsource = {dblp computer science bibliography, https://dblp.org}
}
@inproceedings{li-etal-2019-word,
author = {Xintong Li and
......@@ -4777,9 +4291,6 @@ pages ={157-166},
Lemao Liu and
Max Meng and
Shuming Shi},
//editor = {Anna Korhonen and
David R. Traum and
Llu{\'{\i}}s M{\`{a}}rquez},
title = {On the Word Alignment from Neural Machine Translation},
publisher = {Proceedings of the 57th Conference of the Association for Computational
Linguistics, {ACL} 2019, Florence, Italy, July 28- August 2, 2019,
......@@ -4787,11 +4298,6 @@ pages ={157-166},
pages = {1293--1303},
//publisher = {Association for Computational Linguistics},
year = {2019},
//url = {https://doi.org/10.18653/v1/p19-1124},
//doi = {10.18653/v1/p19-1124},
//timestamp = {Tue, 28 Jan 2020 10:27:51 +0100},
//biburl = {https://dblp.org/rec/conf/acl/LiLLMS19.bib},
//bibsource = {dblp computer science bibliography, https://dblp.org}
}
@inproceedings{Werlen2018DocumentLevelNM,
......@@ -4799,10 +4305,6 @@ pages ={157-166},
Dhananjay Ram and
Nikolaos Pappas and
James Henderson},
//editor = {Ellen Riloff and
David Chiang and
Julia Hockenmaier and
Jun'ichi Tsujii},
title = {Document-Level Neural Machine Translation with Hierarchical Attention
Networks},
publisher = {Proceedings of the 2018 Conference on Empirical Methods in Natural
......@@ -4810,19 +4312,12 @@ pages ={157-166},
pages = {2947--2954},
//publisher = {Association for Computational Linguistics},
year = {2018},
//url = {https://doi.org/10.18653/v1/d18-1325},
//doi = {10.18653/v1/d18-1325},
//timestamp = {Fri, 27 Mar 2020 08:46:30 +0100},
//biburl = {https://dblp.org/rec/conf/emnlp/WerlenRPH18.bib},
//bibsource = {dblp computer science bibliography, https://dblp.org}
}
@inproceedings{DBLP:journals/corr/abs-1805-10163,
author = {Elena Voita and
Pavel Serdyukov and
Rico Sennrich and
Ivan Titov},
//editor = {Iryna Gurevych and
Yusuke Miyao},
title = {Context-Aware Neural Machine Translation Learns Anaphora Resolution},
publisher = {Proceedings of the 56th Annual Meeting of the Association for Computational
Linguistics, {ACL} 2018, Melbourne, Australia, July 15-20, 2018, Volume
......@@ -4830,11 +4325,6 @@ pages ={157-166},
pages = {1264--1274},
//publisher = {Association for Computational Linguistics},
year = {2018},
//url = {https://www.aclweb.org/anthology/P18-1117/},
//doi = {10.18653/v1/P18-1117},
//timestamp = {Mon, 16 Sep 2019 13:46:41 +0200},
//biburl = {https://dblp.org/rec/conf/acl/TitovSSV18.bib},
//bibsource = {dblp computer science bibliography, https://dblp.org}
}
@article{DBLP:journals/corr/abs-1906-00532,
author = {Aishwarya Bhandare and
......@@ -4849,12 +4339,6 @@ pages ={157-166},
journal = {CoRR},
volume = {abs/1906.00532},
year = {2019},
//url = {http://arxiv.org/abs/1906.00532},
//archivePrefix = {arXiv},
//eprint = {1906.00532},
//timestamp = {Thu, 13 Jun 2019 13:36:00 +0200},
//biburl = {https://dblp.org/rec/journals/corr/abs-1906-00532.bib},
//bibsource = {dblp computer science bibliography, https://dblp.org}
}
@inproceedings{Zhang2018SpeedingUN,
......@@ -4863,46 +4347,29 @@ pages ={157-166},
Yang Feng and
Lei Shen and
Qun Liu},
//editor = {Ellen Riloff and
David Chiang and
Julia Hockenmaier and
Jun'ichi Tsujii},
title = {Speeding Up Neural Machine Translation Decoding by Cube Pruning},
publisher = {Proceedings of the 2018 Conference on Empirical Methods in Natural
Language Processing, Brussels, Belgium, October 31 - November 4, 2018},
pages = {4284--4294},
//publisher = {Association for Computational Linguistics},
year = {2018},
//url = {https://www.aclweb.org/anthology/D18-1460/},
//timestamp = {Fri, 29 Nov 2019 14:00:46 +0100},
//biburl = {https://dblp.org/rec/conf/emnlp/Zhang0FSL18.bib},
//bibsource = {dblp computer science bibliography, https://dblp.org}
}
@inproceedings{DBLP:journals/corr/SeeLM16,
author = {Abigail See and
Minh-Thang Luong and
Christopher D. Manning},
//editor = {Yoav Goldberg and
Stefan Riezler},
title = {Compression of Neural Machine Translation Models via Pruning},
publisher = {Proceedings of the 20th {SIGNLL} Conference on Computational Natural
Language Learning, CoNLL 2016, Berlin, Germany, August 11-12, 2016},
pages = {291--301},
//publisher = {{ACL}},
year = {2016},
//url = {https://doi.org/10.18653/v1/k16-1029},
//doi = {10.18653/v1/k16-1029},
//timestamp = {Tue, 28 Jan 2020 10:29:27 +0100},
//biburl = {https://dblp.org/rec/conf/conll/SeeLM16.bib},
//bibsource = {dblp computer science bibliography, https://dblp.org}
}
@inproceedings{DBLP:journals/corr/ChenLCL17,
author = {Yun Chen and
Yang Liu and
Yong Cheng and
Victor O. K. Li},
//editor = {Regina Barzilay and
Min-Yen Kan},
title = {A Teacher-Student Framework for Zero-Resource Neural Machine Translation},
publisher = {Proceedings of the 55th Annual Meeting of the Association for Computational
Linguistics, {ACL} 2017, Vancouver, Canada, July 30 - August 4, Volume
......@@ -4910,11 +4377,6 @@ pages ={157-166},
pages = {1925--1935},
//publisher = {Association for Computational Linguistics},
year = {2017},
//url = {https://doi.org/10.18653/v1/P17-1176},
//doi = {10.18653/v1/P17-1176},
//timestamp = {Tue, 20 Aug 2019 11:59:05 +0200},
//biburl = {https://dblp.org/rec/conf/acl/ChenLCL17.bib},
//bibsource = {dblp computer science bibliography, https://dblp.org}
}
@article{Hinton2015Distilling,
author = {Geoffrey E. Hinton and
......@@ -4924,12 +4386,6 @@ pages ={157-166},
journal = {CoRR},
volume = {abs/1503.02531},
year = {2015},
//url = {http://arxiv.org/abs/1503.02531},
//archivePrefix = {arXiv},
//eprint = {1503.02531},
//timestamp = {Mon, 13 Aug 2018 16:48:36 +0200},
//biburl = {https://dblp.org/rec/journals/corr/HintonVD15.bib},
//bibsource = {dblp computer science bibliography, https://dblp.org}
}
@inproceedings{Ott2018ScalingNM,
......@@ -4953,8 +4409,6 @@ pages ={157-166},
year = "2016",
//address = "Austin, Texas",
//publisher = "Association for Computational Linguistics",
//url = "https://www.aclweb.org/anthology/D16-1139",
//doi = "10.18653/v1/D16-1139",
pages = "1317--1327",
}
......@@ -4982,18 +4436,11 @@ pages ={157-166},
Toulon, France, April 24-26, 2017, Conference Track Proceedings},
//publisher = {OpenReview.net},
year = {2017},
//url = {https://openreview.net/forum?id=BJC\_jUqxe},
//timestamp = {Thu, 25 Jul 2019 14:25:44 +0200},
//biburl = {https://dblp.org/rec/conf/iclr/LinFSYXZB17.bib},
//bibsource = {dblp computer science bibliography, https://dblp.org}
}
@inproceedings{Shaw2018SelfAttentionWR,
author = {Peter Shaw and
Jakob Uszkoreit and
Ashish Vaswani},
//editor = {Marilyn A. Walker and
Heng Ji and
Amanda Stent},
title = {Self-Attention with Relative Position Representations},
publisher = {Proceedings of the 2018 Conference of the North American Chapter of
the Association for Computational Linguistics: Human Language Technologies,
......@@ -5002,11 +4449,6 @@ pages ={157-166},
pages = {464--468},
//publisher = {Association for Computational Linguistics},
year = {2018},
//url = {https://doi.org/10.18653/v1/n18-2074},
//doi = {10.18653/v1/n18-2074},
//timestamp = {Tue, 28 Jan 2020 10:30:17 +0100},
//biburl = {https://dblp.org/rec/conf/naacl/ShawUV18.bib},
//bibsource = {dblp computer science bibliography, https://dblp.org}
}
@inproceedings{DBLP:journals/corr/HeZRS15,
author = {Kaiming He and
......@@ -5019,26 +4461,6 @@ pages ={157-166},
pages = {770--778},
//publisher = {{IEEE} Computer Society},
year = {2016},
//url = {https://doi.org/10.1109/CVPR.2016.90},
//doi = {10.1109/CVPR.2016.90},
//timestamp = {Wed, 16 Oct 2019 14:14:50 +0200},
//biburl = {https://dblp.org/rec/conf/cvpr/HeZRS16.bib},
//bibsource = {dblp computer science bibliography, https://dblp.org}
}
@article{Ba2016LayerN,
author = {Lei Jimmy Ba and
Jamie Ryan Kiros and
Geoffrey E. Hinton},
title = {Layer Normalization},
journal = {CoRR},
volume = {abs/1607.06450},
year = {2016},
//url = {http://arxiv.org/abs/1607.06450},
//archivePrefix = {arXiv},
//eprint = {1607.06450},
//timestamp = {Tue, 23 Jul 2019 17:33:23 +0200},
//biburl = {https://dblp.org/rec/journals/corr/BaKH16.bib},
//bibsource = {dblp computer science bibliography, https://dblp.org}
}
@article{JMLR:v15:srivastava14a,
author = {Nitish Srivastava and Geoffrey Hinton and Alex Krizhevsky and Ilya Sutskever and Ruslan Salakhutdinov},
......@@ -5047,7 +4469,6 @@ pages ={157-166},
year = {2014},
volume = {15},
pages = {1929--1958},
//url = {http://jmlr.org/papers/v15/srivastava14a.html}
}
@inproceedings{Szegedy_2016_CVPR,
author = {Christian Szegedy and
......@@ -5061,18 +4482,11 @@ pages ={157-166},
pages = {2818--2826},
//publisher = {{IEEE} Computer Society},
year = {2016},
//url = {https://doi.org/10.1109/CVPR.2016.308},
//doi = {10.1109/CVPR.2016.308},
//timestamp = {Wed, 16 Oct 2019 14:14:50 +0200},
//biburl = {https://dblp.org/rec/conf/cvpr/SzegedyVISW16.bib},
//bibsource = {dblp computer science bibliography, https://dblp.org}
}
@inproceedings{DBLP:journals/corr/abs-1805-00631,
author = {Biao Zhang and
Deyi Xiong and
Jinsong Su},
//editor = {Iryna Gurevych and
Yusuke Miyao},
title = {Accelerating Neural Transformer via an Average Attention Network},
publisher = {Proceedings of the 56th Annual Meeting of the Association for Computational
Linguistics, {ACL} 2018, Melbourne, Australia, July 15-20, 2018, Volume
......@@ -5080,11 +4494,6 @@ pages ={157-166},
pages = {1789--1798},
//publisher = {Association for Computational Linguistics},
year = {2018},
//url = {https://www.aclweb.org/anthology/P18-1166/},
//doi = {10.18653/v1/P18-1166},
//timestamp = {Mon, 16 Sep 2019 13:46:41 +0200},
//biburl = {https://dblp.org/rec/conf/acl/XiongZS18.bib},
//bibsource = {dblp computer science bibliography, https://dblp.org}
}
@article{DBLP:journals/corr/CourbariauxB16,
author = {Matthieu Courbariaux and
......@@ -5094,12 +4503,6 @@ pages ={157-166},
journal = {CoRR},
volume = {abs/1602.02830},
year = {2016},
//url = {http://arxiv.org/abs/1602.02830},
//archivePrefix = {arXiv},
//eprint = {1602.02830},
//timestamp = {Mon, 13 Aug 2018 16:46:57 +0200},
//biburl = {https://dblp.org/rec/journals/corr/CourbariauxB16.bib},
//bibsource = {dblp computer science bibliography, https://dblp.org}
}
%%%%% chapter 12------------------------------------------------------