NiuTrans / mtbookv2 / Commits
Commit 68afcd15 authored Dec 14, 2020 by zengxin
Merge branch 'caorunzhe' into 'zengxin'

Caorunzhe

See merge request !585

parents 35befc90 0bdb74d8
Showing 7 changed files with 613 additions and 39 deletions
Chapter14/chapter14.tex (+8 -8)
Chapter15/chapter15.tex (+0 -0)
Chapter16/Figures/figure-example-of-iterative-back-translation.tex (+3 -3)
Chapter16/Figures/figure-schematic-of-the-domain-discriminator.tex (+1 -1)
Chapter18/Figures/figure-comparison-of-incremental-model-optimization-methods.tex (+6 -2)
Chapter18/chapter18.tex (+18 -18)
bibliography.bib (+577 -7)
Chapter14/chapter14.tex — view file @ 68afcd15
@@ -108,13 +108,13 @@
\parinterval Both of these inference strategies are used in neural machine translation. For a source-language sentence $\seq{x}=\{x_1,x_2,\dots,x_m\}$ and a target-language sentence $\seq{y}=\{y_1,y_2,\dots,y_n\}$, left-to-right inference describes the translation probability $\funp{P}(\seq{y}\vert\seq{x})$ as Equation \eqref{eq:14-1}:
\begin{eqnarray}
-\funp{P}(\seq{y}\vert\seq{x}) = \prod_{j=1}^n \funp{P}(y_j\vert\seq{y}_{<j},\seq{x})
+\funp{P}(\seq{y}\vert\seq{x}) &=& \prod_{j=1}^n \funp{P}(y_j\vert\seq{y}_{<j},\seq{x})
\label{eq:14-1}
\end{eqnarray}

\parinterval Using the right-to-left order instead yields Equation \eqref{eq:14-2}:
\begin{eqnarray}
-\funp{P}(\seq{y}\vert\seq{x}) = \prod_{j=1}^n \funp{P}(y_{n+1-j}\vert\seq{y}_{>j},\seq{x})
+\funp{P}(\seq{y}\vert\seq{x}) &=& \prod_{j=1}^n \funp{P}(y_{n+1-j}\vert\seq{y}_{>j},\seq{x})
\label{eq:14-2}
\end{eqnarray}

\parinterval where $\seq{y}_{<j}=\{y_1,y_2,\dots,y_{j-1}\}$ and $\seq{y}_{>j}=\{y_{j+1},y_{j+2},\dots,y_n\}$.
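To make the two factorizations in eq. (14-1) and (14-2) concrete, here is a minimal Python sketch (not part of the repository; `step_prob` is a hypothetical stand-in for the model's conditional distribution P(token | context, x)):

import math

def score_left_to_right(step_prob, y):
    # eq. 14-1: sum_j log P(y_j | y_<j, x)
    return sum(math.log(step_prob(tuple(y[:j]), y[j])) for j in range(len(y)))

def score_right_to_left(step_prob, y):
    # eq. 14-2: sum_j log P(y_{n+1-j} | y_>j, x); iterate from the last token.
    return sum(math.log(step_prob(tuple(y[j + 1:]), y[j]))
               for j in range(len(y) - 1, -1, -1))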
...
...
@@ -148,7 +148,7 @@
\item Length penalty factor. Normalizing the translation probability by translation length is the most common approach: for a source sentence $\seq{x}$ and a translation $\seq{y}$, the model score $\textrm{score}(\seq{x},\seq{y})$ decreases as the translation $\seq{y}$ gets longer. To avoid this, a length penalty function $\textrm{lp}(\seq{y})$ can be introduced, and the model score is defined as in Equation \eqref{eq:14-12}:
\begin{eqnarray}
-\textrm{score}(\seq{x},\seq{y}) = \frac{\log \funp{P}(\seq{y}\vert\seq{x})}{\textrm{lp}(\seq{y})}
+\textrm{score}(\seq{x},\seq{y}) &=& \frac{\log \funp{P}(\seq{y}\vert\seq{x})}{\textrm{lp}(\seq{y})}
\label{eq:14-12}
\end{eqnarray}
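For illustration only, a tiny Python sketch of the normalized score in eq. (14-12). The concrete form of lp(y) below (a GNMT-style penalty) is an assumption, not something this file prescribes:

def length_penalty(length, alpha=0.6):
    # One common choice of lp(y); alpha controls the strength (assumed form).
    return ((5.0 + length) / 6.0) ** alpha

def normalized_score(log_prob, length, alpha=0.6):
    # eq. 14-12: score(x, y) = log P(y|x) / lp(y)
    return log_prob / length_penalty(length, alpha)

# A longer hypothesis with a lower raw log-prob can still win after normalization.
print(normalized_score(-6.0, 5), normalized_score(-7.0, 9))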
...
...
@@ -188,7 +188,7 @@ b &=& \omega_{\textrm{high}}\cdot |\seq{x}| \label{eq:14-4}
\noindent where $\textrm{cp}(\seq{x},\seq{y})$ is the coverage model, which measures how well the translation covers each source-language word. In the definition of $\textrm{cp}(\seq{x},\seq{y})$, $a_{ij}$ is the attention weight between source position $i$ and target position $j$, so $\sum\limits_{j}^{|\seq{y}|} a_{ij}$ measures ``how much'' the $i$-th source word has been translated: a value greater than 1 indicates over-translation, and a value smaller than 1 indicates under-translation. Equation \eqref{eq:14-6} penalizes hypotheses that under-translate. An improved form of the coverage model is \upcite{li-etal-2018-simple}:
\begin{eqnarray}
-\textrm{cp}(\seq{x},\seq{y}) = \sum_{i=1}^{|\seq{x}|} \log(\textrm{max}(\sum_{j}^{|\seq{y}|} a_{ij},\beta))
+\textrm{cp}(\seq{x},\seq{y}) &=& \sum_{i=1}^{|\seq{x}|} \log(\textrm{max}(\sum_{j}^{|\seq{y}|} a_{ij},\beta))
\label{eq:14-7}
\end{eqnarray}

\noindent Equation \eqref{eq:14-7} replaces the downward truncation of Equation \eqref{eq:14-6} with an upward truncation, which gives the model a better handle on over-translation (repeated translation). However, $\beta$ has to be tuned carefully on a development set, which adds some extra work. The coverage can also be modeled and parameterized separately and trained together with the translation model \upcite{Mi2016CoverageEM,TuModeling,Kazimi2017CoverageFC}, which yields a more fine-grained coverage model.
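For illustration, a small Python sketch of the coverage penalty in eq. (14-7); the attention matrix `a` (one row per source position) and the threshold `beta` are toy inputs:

import math

def coverage_penalty(a, beta):
    # eq. 14-7: cp(x, y) = sum_i log(max(sum_j a[i][j], beta))
    return sum(math.log(max(sum(row), beta)) for row in a)

# Source word 0 is well covered; word 1 is barely attended to and gets clipped at beta.
print(coverage_penalty([[0.6, 0.5], [0.05, 0.05]], beta=0.2))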
...
...
@@ -416,7 +416,7 @@ b &=& \omega_{\textrm{high}}\cdot |\seq{x}| \label{eq:14-4}
\parinterval Inference in mainstream neural machine translation is currently an {\small\sffamily\bfseries{autoregressive translation}}\index{自回归翻译} (Autoregressive Translation)\index{Autoregressive Translation} process. Autoregression is a way of describing how a time series is generated. For a target sequence $\seq{y}=\{y_1,\dots,y_n\}$, an autoregressive model assumes that the state $y_j$ at time $j$ depends on the previous states $\{y_1,\dots,y_{j-1}\}$ and that $y_j$ is a linear function of $\{y_1,\dots,y_{j-1}\}$; generating $y_j$ in this way is an autoregressive sequence-generation process. Neural machine translation borrows this concept but does not require a linear model. For an input source-language sequence $\seq{x}=\{x_1,\dots,x_m\}$, the probability of generating the translation $\seq{y}=\{y_1,\dots,y_n\}$ with an autoregressive translation model can be defined as:
\begin{eqnarray}
-\funp{P}(\seq{y}|\seq{x}) = \prod_{j=1}^n {\funp{P}(y_j|y_{<j},\seq{x})}
+\funp{P}(\seq{y}|\seq{x}) &=& \prod_{j=1}^n {\funp{P}(y_j|y_{<j},\seq{x})}
\label{eq:14-8}
\end{eqnarray}
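A minimal sketch of what the autoregressive factorization in eq. (14-8) implies for decoding; `next_token_dist` is a hypothetical stand-in for one decoder step and returns a dict mapping tokens to probabilities:

def greedy_autoregressive_decode(next_token_dist, src, eos="</s>", max_len=50):
    prefix = []
    for _ in range(max_len):
        dist = next_token_dist(prefix, src)   # P(y_j | y_<j, x)
        token = max(dist, key=dist.get)       # greedy choice at step j
        prefix.append(token)
        if token == eos:
            break
    return prefix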
...
...
@@ -425,7 +425,7 @@ b &=& \omega_{\textrm{high}}\cdot |\seq{x}| \label{eq:14-4}
\parinterval To address this problem, researchers have also considered removing the autoregressive property of translation and performing {\small\sffamily\bfseries{non-autoregressive translation}}\index{非自回归翻译} (Non-Autoregressive Translation, NAT)\index{Non-Autoregressive Translation}\upcite{Gu2017NonAutoregressiveNM}. A simple non-autoregressive translation model formulates the problem as:
\begin{eqnarray}
-\funp{P}(\seq{y}|\seq{x}) = \prod_{j=1}^n {\funp{P}(y_j|\seq{x})}
+\funp{P}(\seq{y}|\seq{x}) &=& \prod_{j=1}^n {\funp{P}(y_j|\seq{x})}
\label{eq:14-9}
\end{eqnarray}
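By contrast, the non-autoregressive factorization in eq. (14-9) predicts every position independently, so all positions can be filled in parallel; a minimal sketch (`position_dist` is a hypothetical per-position output of a NAT decoder):

def non_autoregressive_decode(position_dist, src, tgt_len):
    output = []
    for j in range(tgt_len):
        dist = position_dist(j, src)          # P(y_j | x), independent of other outputs
        output.append(max(dist, key=dist.get))
    return output                             # every position could be computed in parallel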
...
...
@@ -485,7 +485,7 @@ b &=& \omega_{\textrm{high}}\cdot |\seq{x}| \label{eq:14-4}
\parinterval In addition, each decoder layer contains an extra positional attention module, which uses the same multi-head attention mechanism as the rest of the Transformer model, as follows:
\begin{eqnarray}
\textrm{Attention}(\mathbi{Q},\mathbi{K},\mathbi{V}) &=& \textrm{Softmax}(\frac{\mathbi{Q}{\mathbi{K}}^{T}}{\sqrt{d_k}})\cdot \mathbi{V}
\label{eq:14-10}
\end{eqnarray}
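A small numpy sketch of the scaled dot-product attention in eq. (14-10), for illustration only:

import numpy as np

def scaled_dot_product_attention(Q, K, V):
    # Attention(Q, K, V) = Softmax(Q K^T / sqrt(d_k)) V
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # row-wise softmax
    return weights @ V

Q, K, V = np.random.rand(2, 4), np.random.rand(3, 4), np.random.rand(3, 4)
print(scaled_dot_product_attention(Q, K, V).shape)   # (2, 4)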
...
...
@@ -651,7 +651,7 @@ b &=& \omega_{\textrm{high}}\cdot |\seq{x}| \label{eq:14-4}
\parinterval A neural machine translation model predicts a word distribution for every target position $j$, i.e., for each word $y_j$ in the target-language vocabulary it computes $\funp{P}(y_j | \seq{y}_{<j}, \seq{x})$. Suppose there are $K$ neural machine translation systems; each system $k$ can compute this probability independently, denoted $\funp{P}_{k}(y_j | \seq{y}_{<j}, \seq{x})$. The predictions of the $K$ systems can then be fused with Equation \eqref{eq:14-11}:
\begin{eqnarray}
-\funp{P}(y_{j} | \seq{y}_{<j},\seq{x}) = \sum_{k=1}^K \gamma_{k} \cdot \funp{P}_{k}(y_j | \seq{y}_{<j},\seq{x})
+\funp{P}(y_{j} | \seq{y}_{<j},\seq{x}) &=& \sum_{k=1}^K \gamma_{k} \cdot \funp{P}_{k}(y_j | \seq{y}_{<j},\seq{x})
\label{eq:14-11}
\end{eqnarray}
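For illustration, the per-step fusion of eq. (14-11) as a small Python sketch, with toy distributions over the target vocabulary:

def fuse_predictions(dists, gammas):
    # eq. 14-11: P(y_j | y_<j, x) = sum_k gamma_k * P_k(y_j | y_<j, x)
    fused = {}
    for dist, gamma in zip(dists, gammas):
        for word, p in dist.items():
            fused[word] = fused.get(word, 0.0) + gamma * p
    return fused

# Two toy systems, equally weighted.
print(fuse_predictions([{"a": 0.7, "b": 0.3}, {"a": 0.5, "b": 0.5}], [0.5, 0.5]))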
...
...
Chapter15/chapter15.tex — view file @ 68afcd15

This source diff could not be displayed because it is too large. You can view the blob instead.
Chapter16/Figures/figure-example-of-iterative-back-translation.tex — view file @ 68afcd15
...
...
@@ -42,14 +42,14 @@
 \draw [->,thick]([yshift=-0.75em]node5-1.east)--(remark3.north west);
 \draw [->,thick]([yshift=-0.75em]node6-1.east)--(remark3.south west);
-\node [anchor=south](d1) at ([xshift=-0.7em,yshift=4em]remark1.north) {\small{真实数据:}};
+\node [anchor=south](d1) at ([xshift=-0.7em,yshift=5.5em]remark1.north) {\small{真实数据:}};
 \node [anchor=west](d2) at ([xshift=2.0em]d1.east) {\small{伪数据:}};
 \node [anchor=west](d3) at ([xshift=2.0em]d2.east) {\small{额外数据:}};
 \node [anchor=west,fill=green!20,minimum width=1.5em](d1-1) at ([xshift=-0.0em]d1.east) {};
 \node [anchor=west,fill=red!20,minimum width=1.5em](d2-1) at ([xshift=-0.0em]d2.east) {};
 \node [anchor=west,fill=yellow!20,minimum width=1.5em](d3-1) at ([xshift=-0.0em]d3.east) {};
-\node [anchor=south] (d4) at ([xshift=1em]d1.north) {\small{训练:}};
-\node [anchor=south] (d5) at ([xshift=0.5em]d2.north) {\small{推理:}};
+\node [anchor=north] (d4) at ([xshift=1em]d1.south) {\small{训练:}};
+\node [anchor=north] (d5) at ([xshift=0.5em]d2.south) {\small{推理:}};
 \draw [->,thick] ([xshift=0em]d4.east)--([xshift=1.5em]d4.east);
 \draw [->,thick,dashed] ([xshift=0em]d5.east)--([xshift=1.5em]d5.east);
...
...
Chapter16/Figures/figure-schematic-of-the-domain-discriminator.tex — view file @ 68afcd15
...
...
@@ -4,7 +4,7 @@
 \node [anchor=west,rec,fill=red!20](node2) at ([xshift=2.0em]node1.east) {\small{编码器}};
 \node [anchor=west,rec](node3) at ([xshift=3.0em,yshift=2.0em]node2.east) {\small{解码器}};
-\node [anchor=west,rec,fill=yellow!20](node4) at ([xshift=3.0em,yshift=-2.0em]node2.east) {\small{鉴别器}};
+\node [anchor=west,rec,fill=yellow!20](node4) at ([xshift=3.0em,yshift=-2.0em]node2.east) {\small{判别器}};
 \draw [->,thick](node1.east)--(node2.west);
 \draw [->,thick](node2.east)--([xshift=1.5em]node2.east)--([xshift=1.5em,yshift=2.0em]node2.east)--(node3.west);
...
...
Chapter18/Figures/figure-comparison-of-incremental-model-optimization-methods.tex — view file @ 68afcd15
+\addtolength{\tabcolsep}{-4pt}
 \begin{tabular}{c c c}
 \begin{tikzpicture}
...
...
@@ -69,4 +71,6 @@
 \end{scope}
 \end{tikzpicture}
-\end{tabular}
\ No newline at end of file
+\end{tabular}
+\addtolength{\tabcolsep}{4pt}
\ No newline at end of file
Chapter18/chapter18.tex — view file @ 68afcd15
...
...
@@ -111,38 +111,38 @@
%----------------------------------------------------------------------------------------
\section{Intervention in Translation Results}

\parinterval Although the quality of neural machine translation is already quite high, linguistic phenomena are complex and varied, and models still have problems in certain scenarios; the most typical one is the translation of terms within a sentence. In practical applications one frequently encounters proper nouns and industry terms such as company names, brand names, and product names, as well as abbreviations with different meanings. For example, for the proper term ``小牛翻译'', different machine translation systems give different results: ``Maverick translation'', ``Calf translation'', ``The mavericks translation''... while its correct translation should be ``NiuTrans''. For special vocabulary of this kind, most machine translation engines have difficulty translating accurately. On the one hand, the models are mostly trained on general-domain data, which cannot be guaranteed to cover all linguistic phenomena; on the other hand, even if these terms appear in the training data, they are usually low-frequency and hard for the model to learn. To guarantee translation accuracy, intervening in the model's translation results is very necessary, and such interventions are also useful in settings such as interactive machine translation and domain adaptation.

\parinterval For {\small\bfnew terminology translation}\index{术语翻译} (Lexically Constrained Translation)\index{Lexically Constrained Translation}, it is hard for a model to produce the correct term without intervention, because the target-side term is very likely an out-of-vocabulary word; an additional terminology dictionary therefore has to be provided manually, and the goal becomes making the model's output obey the term constraints supplied by the user. An example is shown in the figure below:

\parinterval Interactive machine translation embodies the idea of letting user behavior ``intervene'' in machine translation results. In fact, when machine translation makes an error, people always want a direct and effective way to ``change'' the output so as to improve translation quality. For example, if the system can output multiple candidate translations, the user can pick the best one for output; in other words, a human has intervened in the ranking of the candidates. Another example is using a {\small\bfnew{translation memory}}\index{翻译记忆} (Translation Memory\index{Translation Memory}) to improve the performance of a machine translation system. A translation memory stores high-quality source-target sentence pairs and can sometimes be viewed as a kind of prior knowledge or ``memory''. Therefore, when performing machine translation (both statistical and neural), using a translation memory to guide the translation process can also be viewed as a form of intervention ({\color{red} References needed! Both SMT and NMT have them; for SMT there is a long CL paper from the Institute of Automation, and for NMT I recall Tencent should have one; check with me once you find them!}).

\parinterval Although there are many ways to intervene in a machine translation system, the most common is to intervene in the translation of specific source-language segments, with the expectation that the final translation satisfies certain constraints on how those segments are translated. This problem is also called {\small\bfnew{constraint-based translation}}\index{基于约束的翻译} (Constraint-based Translation\index{Constraint-based Translation}). For example, when translating web pages, the tags in the translation need to stay consistent with the source. Another typical example is terminology translation. In practical applications one frequently encounters proper nouns and industry terms such as company names, brand names, and product names, as well as abbreviations with different meanings. For example, for the proper term ``小牛翻译'', different machine translation systems give different results: ``Maverick translation'', ``Calf translation'', ``The mavericks translation''... while its correct translation should be ``NiuTrans''. For special vocabulary of this kind, most machine translation engines have difficulty translating accurately. On the one hand, the models are mostly trained on general-domain data, which cannot be guaranteed to cover all linguistic phenomena. On the other hand, even if these terms appear in the training data, they are usually low-frequency and hard for the model to learn. To guarantee translation accuracy, intervening in terminology translation is very necessary, and it is also very meaningful for problems such as domain adaptation.

\parinterval For {\small\bfnew terminology translation}\index{术语翻译} (Lexically Constrained Translation)\index{Lexically Constrained Translation}, it is hard for a model to produce the correct term without intervention, because the target-side term is very likely an out-of-vocabulary word; an additional terminology dictionary therefore has to be provided manually, and the goal becomes making the model's output obey the term constraints supplied by the user. This process is shown in Figure \ref{fig:18-2}.

%----------------------------------------------
\begin{figure}[htp]
\centering
\input{./Chapter18/Figures/figure-translation-interfered}
%\setlength{\abovecaptionskip}{-0.2cm}
\caption{Intervention in translation results}
\caption{Intervention in translation results ({\color{red} This figure needs revision! It is a bit messy; let's discuss when I am back in Shenyang!})}
\label{fig:18-2}
\end{figure}
%----------------------------------------------

\parinterval In statistical machine translation, translation is a process of probability computation and derivation based on symbol matching, so forcing the output translation of certain words is relatively easy. Neural machine translation, by contrast, is an end-to-end trained model whose internals are real-valued vector representations in a continuous space; translation is essentially a series of mappings, compositions, and computations over elements of that space, so this kind of intervention is somewhat difficult. There are currently two main approaches:

\parinterval In statistical machine translation, a translation is essentially a derivation built from phrases and rules, so modifying the output is relatively easy; for example, one can simply add the desired translation to the candidate set of the corresponding source-language segment. Neural machine translation, however, is an end-to-end model whose internals are real-valued vector representations in a continuous space; translation is essentially a series of mappings, compositions, and algebraic operations over elements of that space, so one cannot, as in a symbolic system, directly modify the model and inject discrete constraints to influence the generated translation. There are currently two main approaches:

\begin{itemize}
\item {\small\bfnew Hard constraints}. Constraints are enforced during decoding according to some strategy; most of these methods modify the beam search algorithm to force the output to contain the specified words or phrases \upcite{DBLP:conf/acl/HokampL17,DBLP:conf/naacl/PostV18,DBLP:conf/wmt/ChatterjeeNTFSB17,DBLP:conf/naacl/HaslerGIB18}.
\item {\small\bfnew Soft constraints}. These methods are essentially a form of data augmentation: the constraint is realized by modifying the data and the training process of the neural machine translation model. Typically the source sentence is edited according to a terminology dictionary, e.g., the target-side term is spliced into the source sentence and the original and synthetic corpora are then mixed for training, in the hope that the model learns to use the term information to guide decoding; alternatively, terms in the source are replaced by placeholders and restored after translation \upcite{DBLP:conf/naacl/SongZYLWZ19,DBLP:conf/acl/DinuMFA19,DBLP:journals/corr/abs-1912-00567,DBLP:conf/ijcai/ChenCWL20}.
\end{itemize}

\parinterval Hard-constraint methods impose restrictions on the search strategy and are model-agnostic; they guarantee that the output satisfies the constraints but affect decoding speed. Soft-constraint methods train the model on specially constructed data so that it acquires some generalization ability; they require pre- and post-editing, usually do not affect decoding speed, but cannot guarantee that the output satisfies the constraints.

\parinterval In addition, neural machine translation in deployment usually also requires pre- and post-translation processing. Pre-processing means modifying and normalizing the source text before translation so that it suits the characteristics of machine translation, which leads to more fluent output and improves readability and accuracy. In practice, user input comes in many forms and may contain terms, abbreviations, mathematical formulas, and sometimes even web-page tags, so pre-processing the source text is necessary. Common steps include format conversion, punctuation checking, term editing, and tag recognition; after translation, the machine output needs further editing and correction so that it conforms to usage conventions, e.g., punctuation and format checks and the restoration of terms and tags. These steps are usually carried out automatically according to predefined processing strategies.

\vspace{0.5em}
\item Forced generation. This approach does not change the model; instead, constraints are enforced during decoding according to some strategy, usually by modifying the beam search algorithm to ensure that the output contains the specified words or phrases \upcite{DBLP:conf/acl/HokampL17,DBLP:conf/naacl/PostV18,DBLP:conf/wmt/ChatterjeeNTFSB17,DBLP:conf/naacl/HaslerGIB18}. For example, after the translation is produced, word alignments can be obtained via the attention mechanism and used to forcibly replace the specified parts of the output; alternatively, translation candidates that contain the correct term translation can be given an extra bonus score so that such candidates are ranked high enough during decoding.

\parinterval In addition, there are some other common interventions in machine translation (details of the above can be found in {\chapterfourteen}), for example:

\begin{itemize}
\item Controlling translation length. Because a neural machine translation model represents the probability of the whole sentence as a product of word probabilities, it is naturally biased toward short translations. The remedy is to introduce a length-control mechanism during inference, essentially modifying the scoring function so that it is aware of length and thus constrained, e.g., by introducing a length penalty factor or coverage;
\vspace{0.5em}
\item Data augmentation. These methods realize the constraint by modifying the data and the training process of the machine translation model. Typically the source sentence is edited according to a terminology dictionary, e.g., the translation of a term is spliced into the source sentence and the original and synthetic corpora are then mixed for training, in the hope that the model learns to use the term information to guide decoding; alternatively, terms in the source are replaced by placeholders and restored after translation \upcite{DBLP:conf/naacl/SongZYLWZ19,DBLP:conf/acl/DinuMFA19,DBLP:journals/corr/abs-1912-00567,DBLP:conf/ijcai/ChenCWL20}.
\item Translation diversity. Neural machine translation often suffers from $n$-best outputs that are very similar, i.e., a lack of diversity, which makes reranking inaccurate; moreover, from the perspective of human translation, translations of the same source should be diverse, and overly similar outputs cannot reflect enough translation phenomena. Solutions can start from modeling or decoding, e.g., introducing latent variables into the model, or introducing an extra model during inference to penalize similar translations.
\vspace{0.5em}
\end{itemize}

\parinterval Forced-generation methods impose restrictions on the search strategy and are model-agnostic; they guarantee that the output satisfies the constraints but affect translation speed. Data-augmentation methods train the model on specially constructed data so that it acquires some generalization ability; they require pre- and post-editing, usually do not affect translation speed, but cannot guarantee that the output satisfies the constraints.

\parinterval In addition, machine translation in deployment usually also requires pre- and post-translation processing. Pre-processing means modifying and normalizing the source sentence before translation, which leads to more fluent output and improves readability and accuracy. In practice, user input comes in many forms and may contain terms, abbreviations, mathematical formulas, and sometimes even web-page tags, so pre-processing the source text is necessary. Common steps include format conversion, punctuation checking, term editing, and tag recognition; after translation, the machine output needs further editing and correction so that it conforms to usage conventions, e.g., punctuation and format checks and the restoration of terms and tags. These steps are usually carried out automatically according to predefined processing strategies. Furthermore, controlling translation length and translation diversity can also enrich the means of intervening in a machine translation system (see {\chapterfourteen}).
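To make the placeholder-style soft constraint described above concrete, a minimal Python sketch follows (not from the book's code; `translate`, the term dictionary, and the `<termN>` placeholder format are hypothetical, and a real system would have to be trained to copy such placeholders through to its output):

def translate_with_term_dict(src, term_dict, translate):
    restore = {}
    for i, (src_term, tgt_term) in enumerate(term_dict.items()):
        if src_term in src:
            placeholder = f"<term{i}>"
            src = src.replace(src_term, placeholder)   # mask the term before translation
            restore[placeholder] = tgt_term
    hyp = translate(src)                               # placeholders assumed to pass through
    for placeholder, tgt_term in restore.items():
        hyp = hyp.replace(placeholder, tgt_term)       # restore the required term translations
    return hyp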
%----------------------------------------------------------------------------------------
% NEW SECTION
...
...
@@ -190,13 +190,13 @@
\begin{itemize}
\vspace{0.5em}
\item For multilingual translation scenarios, a single-model multilingual translation system is a very good choice ({\chaptersixteen}). When the data for several languages is limited and usage is infrequent, this approach can effectively cover the long tail of translation demand. For example, some online machine translation services already support more than 100 languages, and translation between most of these language pairs is relatively infrequent, so serving them with one model greatly reduces deployment and maintenance costs.
\vspace{0.5em}
\item Pivot-language-based translation is also an effective way to handle multilingual translation ({\chaptersixteen}). This approach suits both statistical and neural machine translation and has therefore long been used in large-scale machine translation deployments.
\vspace{0.5em}
\item In GPU deployment, since GPUs are costly, multiple different systems can be deployed on a single GPU device. If these systems are rarely active concurrently, translation latency does not increase noticeably. Having several models share one device is well suited to settings where translation requests are relatively infrequent but the translation tasks are diverse.
\vspace{0.5em}
\item Large-scale GPU deployment of machine translation also places strict demands on GPU memory. Since GPU memory is rather limited, the memory consumption of a running model has to be taken into account. In general, besides model compression and architectural optimization ({\chapterfourteen} and {\chapterfifteen}), memory allocation and usage also need dedicated optimization. For example, a memory pool can mitigate the latency caused by frequently allocating and freeing memory; in addition, a single memory block can, as far as possible, be reused for data whose lifetimes do not overlap, avoiding repeatedly allocating new storage. Figure \ref{fig:18-3} shows an example of memory reuse.
...
...
bibliography.bib — view file @ 68afcd15
...
...
@@ -4086,7 +4086,7 @@ year = {2012}
   Joris Pelemans and
   Hugo Van Hamme and
   Patrick Wambacq},
-  publisher = {European Association of Computational Linguistics},
+  publisher = {Annual Conference of the European Association for Machine Translation},
   year = {2017}
 }
...
...
@@ -4569,7 +4569,7 @@ author = {Yoshua Bengio and
   Jozef Mokry and
   Maria Nadejde},
   title = {Nematus: a Toolkit for Neural Machine Translation},
-  publisher = {European Association of Computational Linguistics},
+  publisher = {Annual Conference of the European Association for Machine Translation},
   pages = {65--68},
   year = {2017}
 }
...
...
@@ -9644,7 +9644,7 @@ author = {Zhuang Liu and
 @inproceedings{finding2006adafre,
   author = {S. F. Adafre and Maarten de Rijke},
   title = {Finding Similar Sentences across Multiple Languages in Wikipedia},
-  publisher = {European Association of Computational Linguistics},
+  publisher = {Annual Conference of the European Association for Machine Translation},
   year = {2006}
 }
 @inproceedings{method2008keiji,
...
...
@@ -10798,7 +10798,7 @@ author = {Zhuang Liu and
   Mirella Lapata},
   title = {Paraphrasing Revisited with Neural Machine Translation},
   pages = {881--893},
-  publisher = {European Association of Computational Linguistics},
+  publisher = {Annual Conference of the European Association for Machine Translation},
   year = {2017}
 }
 @article{2005Improving,
...
...
@@ -11694,7 +11694,7 @@ author = {Zhuang Liu and
   Marcello Federico},
   title = {Neural vs. Phrase-Based Machine Translation in a Multi-Domain Scenario},
   pages = {280--284},
-  publisher = {European Association of Computational Linguistics},
+  publisher = {Annual Conference of the European Association for Machine Translation},
   year = {2017}
 }
 @inproceedings{DBLP:conf/aaai/Zhang0LZC18,
...
...
@@ -11923,7 +11923,577 @@ author = {Zhuang Liu and
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%% chapter 17------------------------------------------------------
@article{DBLP:journals/ac/Bar-Hillel60,
author = {Yehoshua Bar-Hillel},
title = {The Present Status of Automatic Translation of Languages},
journal = {Advances in computers},
volume = {1},
pages = {91--163},
year = {1960}
}
@article{DBLP:journals/corr/abs-1901-09115,
author = {Andrei Popescu-Belis},
title = {Context in Neural Machine Translation: {A} Review of Models and Evaluations},
journal = {CoRR},
volume = {abs/1901.09115},
year = {2019}
}
@book{jurafsky2000speech,
title={Speech \& language processing},
author={Jurafsky, Dan},
year={2000},
publisher={Pearson Education India}
}
@inproceedings{DBLP:conf/anlp/MarcuCW00,
author = {Daniel Marcu and
Lynn Carlson and
Maki Watanabe},
title = {The Automatic Translation of Discourse Structures},
pages = {9--17},
publisher = {Applied Natural Language Processing Conference},
year = {2000}
}
@inproceedings{foster2010translating,
title={Translating structured documents},
author={Foster, George and Isabelle, Pierre and Kuhn, Roland},
booktitle={Proceedings of AMTA},
year={2010}
}
@inproceedings{DBLP:conf/eacl/LouisW14,
author = {Annie Louis and
Bonnie L. Webber},
title = {Structured and Unstructured Cache Models for {SMT} Domain Adaptation},
pages = {155--163},
publisher = {Annual Conference of the European Association for Machine Translation},
year = {2014}
}
@inproceedings{DBLP:conf/iwslt/HardmeierF10,
author = {Christian Hardmeier and
Marcello Federico},
title = {Modelling pronominal anaphora in statistical machine translation},
pages = {283--289},
publisher = {International Workshop on Spoken Language Translation},
year = {2010}
}
@inproceedings{DBLP:conf/wmt/NagardK10,
author = {Ronan Le Nagard and
Philipp Koehn},
title = {Aiding Pronoun Translation with Co-Reference Resolution},
pages = {252--261},
publisher = {Annual Meeting of the Association for Computational Linguistics},
year = {2010}
}
@inproceedings{DBLP:conf/eamt/LuongP16,
author = {Ngoc-Quang Luong and
Andrei Popescu-Belis},
title = {A Contextual Language Model to Improve Machine Translation of Pronouns
by Re-ranking Translation Hypotheses},
pages = {292--304},
publisher = {European Association for Machine Translation},
year = {2016}
}
@inproceedings{tiedemann2010context,
title={Context adaptation in statistical machine translation using models with exponentially decaying cache},
author={Tiedemann, J{\"o}rg},
publisher={Domain Adaptation for Natural Language Processing},
pages={8--15},
year={2010}
}
@inproceedings{DBLP:conf/emnlp/GongZZ11,
author = {Zhengxian Gong and
Min Zhang and
Guodong Zhou},
title = {Cache-based Document-level Statistical Machine Translation},
pages = {909--919},
publisher = {Conference on Empirical Methods in Natural Language Processing},
year = {2011}
}
@inproceedings{DBLP:conf/ijcai/XiongBZLL13,
author = {Deyi Xiong and
Guosheng Ben and
Min Zhang and
Yajuan Lv and
Qun Liu},
title = {Modeling Lexical Cohesion for Document-Level Machine Translation},
pages = {2183--2189},
publisher = { International Joint Conference on Artificial Intelligence},
year = {2013}
}
@inproceedings{xiao2011document,
title={Document-level consistency verification in machine translation},
author={Xiao, Tong and Zhu, Jingbo and Yao, Shujie and Zhang, Hao},
booktitle={Machine Translation Summit},
volume={13},
pages={131--138},
year={2011}
}
@inproceedings{DBLP:conf/sigdial/MeyerPZC11,
author = {Thomas Meyer and
Andrei Popescu-Belis and
Sandrine Zufferey and
Bruno Cartoni},
title = {Multilingual Annotation and Disambiguation of Discourse Connectives
for Machine Translation},
pages = {194--203},
publisher = {Annual Meeting of the Association for Computational Linguistics},
year = {2011}
}
@inproceedings{DBLP:conf/hytra/MeyerP12,
author = {Thomas Meyer and
Andrei Popescu-Belis},
title = {Using Sense-labeled Discourse Connectives for Statistical Machine
Translation},
pages = {129--138},
publisher = {Annual Meeting of the Association for Computational Linguistics},
year = {2012}
}
@inproceedings{DBLP:conf/nips/SutskeverVL14,
author = {Ilya Sutskever and
Oriol Vinyals and
Quoc V. Le},
title = {Sequence to Sequence Learning with Neural Networks},
pages = {3104--3112},
year = {2014},
publisher = {Conference and Workshop on Neural Information Processing Systems}
}
@inproceedings{DBLP:conf/emnlp/LaubliS018,
author = {Samuel L{\"{a}}ubli and
Rico Sennrich and
Martin Volk},
title = {Has Machine Translation Achieved Human Parity? {A} Case for Document-level
Evaluation},
pages = {4791--4796},
publisher = {Conference on Empirical Methods in Natural Language Processing},
year = {2018}
}
@article{DBLP:journals/corr/abs-1912-08494,
author = {Sameen Maruf and
Fahimeh Saleh and
Gholamreza Haffari},
title = {A Survey on Document-level Machine Translation: Methods and Evaluation},
journal = {CoRR},
volume = {abs/1912.08494},
year = {2019}
}
@inproceedings{DBLP:conf/discomt/TiedemannS17,
author = {J{\"{o}}rg Tiedemann and
Yves Scherrer},
title = {Neural Machine Translation with Extended Context},
pages = {82--92},
publisher = {Association for Computational Linguistics},
year = {2017}
}
@article{DBLP:journals/corr/abs-1910-07481,
author = {Valentin Mac{\'{e}} and
Christophe Servan},
title = {Using Whole Document Context in Neural Machine Translation},
journal = {CoRR},
volume = {abs/1910.07481},
year = {2019}
}
@article{DBLP:journals/corr/JeanLFC17,
author = {S{\'{e}}bastien Jean and
Stanislas Lauly and
Orhan Firat and
Kyunghyun Cho},
title = {Does Neural Machine Translation Benefit from Larger Context?},
journal = {CoRR},
volume = {abs/1704.05135},
year = {2017}
}
@inproceedings{DBLP:conf/acl/TitovSSV18,
author = {Elena Voita and
Pavel Serdyukov and
Rico Sennrich and
Ivan Titov},
title = {Context-Aware Neural Machine Translation Learns Anaphora Resolution},
pages = {1264--1274},
publisher = {Annual Meeting of the Association for Computational Linguistics},
year = {2018}
}
@inproceedings{DBLP:conf/acl/HaffariM18,
author = {Sameen Maruf and
Gholamreza Haffari},
title = {Document Context Neural Machine Translation with Memory Networks},
pages = {1275--1284},
publisher = {Annual Meeting of the Association for Computational Linguistics},
year = {2018}
}
@inproceedings{DBLP:conf/coling/KuangXLZ18,
author = {Shaohui Kuang and
Deyi Xiong and
Weihua Luo and
Guodong Zhou},
title = {Modeling Coherence for Neural Machine Translation with Dynamic and
Topic Caches},
pages = {596--606},
publisher = {International Conference on Computational Linguistics},
year = {2018}
}
@inproceedings{DBLP:conf/discomt/GarciaCE19,
author = {Eva Mart{\'{\i}}nez Garcia and
Carles Creus and
Cristina Espa{\~{n}}a-Bonet},
title = {Context-Aware Neural Machine Translation Decoding},
pages = {13--23},
publisher = {Annual Meeting of the Association for Computational Linguistics},
year = {2019}
}
@article{DBLP:journals/corr/abs-2010-12827,
author = {Amane Sugiyama and
Naoki Yoshinaga},
title = {Context-aware Decoder for Neural Machine Translation using a Target-side
Document-Level Language Model},
journal = {CoRR},
volume = {abs/2010.12827},
year = {2020}
}
@inproceedings{DBLP:conf/acl/VoitaST19,
author = {Elena Voita and
Rico Sennrich and
Ivan Titov},
title = {When a Good Translation is Wrong in Context: Context-Aware Machine
Translation Improves on Deixis, Ellipsis, and Lexical Cohesion},
pages = {1198--1212},
publisher = {Annual Meeting of the Association for Computational Linguistics},
year = {2019}
}
@inproceedings{DBLP:conf/emnlp/VoitaST19,
author = {Elena Voita and
Rico Sennrich and
Ivan Titov},
title = {Context-Aware Monolingual Repair for Neural Machine Translation},
pages = {877--886},
publisher = {Conference on Empirical Methods in Natural Language Processing},
year = {2019}
}
@inproceedings{DBLP:conf/discomt/WerlenP17,
author = {Lesly Miculicich Werlen and
Andrei Popescu-Belis},
title = {Validation of an Automatic Metric for the Accuracy of Pronoun Translation
{(APT)}},
pages = {17--25},
publisher = {Annual Meeting of the Association for Computational Linguistics},
year = {2017}
}
@inproceedings{DBLP:conf/emnlp/WongK12,
author = {Billy Tak-Ming Wong and
Chunyu Kit},
title = {Extending Machine Translation Evaluation Metrics with Lexical Cohesion
to Document Level},
pages = {1060--1068},
publisher = {Conference on Empirical Methods in Natural Language Processing},
year = {2012}
}
@inproceedings{DBLP:conf/discomt/GongZZ15,
author = {Zhengxian Gong and
Min Zhang and
Guodong Zhou},
title = {Document-Level Machine Translation Evaluation with Gist Consistency
and Text Cohesion},
pages = {33--40},
publisher = {Annual Meeting of the Association for Computational Linguistics},
year = {2015}
}
@inproceedings{DBLP:conf/cicling/HajlaouiP13,
author = {Najeh Hajlaoui and
Andrei Popescu-Belis},
title = {Assessing the Accuracy of Discourse Connective Translations: Validation
of an Automatic Metric},
volume = {7817},
pages = {236--247},
publisher = {Springer},
year = {2013}
}
@inproceedings{DBLP:conf/wmt/RiosMS18,
author = {Annette Rios and
Mathias M{\"{u}}ller and
Rico Sennrich},
title = {The Word Sense Disambiguation Test Suite at {WMT18}},
pages = {588--596},
publisher = {Annual Meeting of the Association for Computational Linguistics},
year = {2018}
}
@inproceedings{DBLP:conf/naacl/BawdenSBH18,
author = {Rachel Bawden and
Rico Sennrich and
Alexandra Birch and
Barry Haddow},
title = {Evaluating Discourse Phenomena in Neural Machine Translation},
pages = {1304--1313},
publisher = {Annual Conference of the North American Chapter of the Association for Computational Linguistics},
year = {2018}
}
@inproceedings{DBLP:conf/wmt/MullerRVS18,
author = {Mathias M{\"{u}}ller and
Annette Rios and
Elena Voita and
Rico Sennrich},
title = {A Large-Scale Test Set for the Evaluation of Context-Aware Pronoun
Translation in Neural Machine Translation},
pages = {61--72},
publisher = {Annual Meeting of the Association for Computational Linguistics},
year = {2018}
}
@inproceedings{DBLP:conf/iclr/KitaevKL20,
author = {Nikita Kitaev and
Lukasz Kaiser and
Anselm Levskaya},
title = {Reformer: The Efficient Transformer},
publisher = {International Conference on Learning Representations},
year = {2020}
}
@inproceedings{agrawal2018contextual,
title={Contextual handling in neural machine translation: Look behind, ahead and on both sides},
author={Agrawal, Ruchit Rajeshkumar and Turchi, Marco and Negri, Matteo},
booktitle={Annual Conference of the European Association for Machine Translation},
pages={11--20},
year={2018}
}
@inproceedings{DBLP:conf/emnlp/WerlenRPH18,
author = {Lesly Miculicich Werlen and
Dhananjay Ram and
Nikolaos Pappas and
James Henderson},
title = {Document-Level Neural Machine Translation with Hierarchical Attention
Networks},
pages = {2947--2954},
publisher = {Conference on Empirical Methods in Natural Language Processing},
year = {2018}
}
@inproceedings{DBLP:conf/naacl/MarufMH19,
author = {Sameen Maruf and
Andr{\'{e}} F. T. Martins and
Gholamreza Haffari},
title = {Selective Attention for Context-aware Neural Machine Translation},
pages = {3092--3102},
publisher = {Annual Conference of the North American Chapter of the Association for Computational Linguistics},
year = {2019}
}
@inproceedings{DBLP:conf/emnlp/TanZXZ19,
author = {Xin Tan and
Longyin Zhang and
Deyi Xiong and
Guodong Zhou},
title = {Hierarchical Modeling of Global Context for Document-Level Neural
Machine Translation},
pages = {1576--1585},
publisher = {Conference on Empirical Methods in Natural Language Processing},
year = {2019}
}
@inproceedings{DBLP:conf/emnlp/YangZMGFZ19,
author = {Zhengxin Yang and
Jinchao Zhang and
Fandong Meng and
Shuhao Gu and
Yang Feng and
Jie Zhou},
title = {Enhancing Context Modeling with a Query-Guided Capsule Network for
Document-level Translation},
pages = {1527--1537},
publisher = {Conference on Empirical Methods in Natural Language Processing},
year = {2019}
}
@inproceedings{DBLP:conf/ijcai/ZhengYHCB20,
author = {Zaixiang Zheng and
Xiang Yue and
Shujian Huang and
Jiajun Chen and
Alexandra Birch},
title = {Towards Making the Most of Context in Neural Machine Translation},
pages = {3983--3989},
publisher = {International Joint Conference on Artificial Intelligence},
year = {2020}
}
@article{DBLP:journals/tacl/TuLSZ18,
author = {Zhaopeng Tu and
Yang Liu and
Shuming Shi and
Tong Zhang},
title = {Learning to Remember Translation History with a Continuous Cache},
journal = {Transactions of the Association for Computational Linguistics},
volume = {6},
pages = {407--420},
year = {2018}
}
@inproceedings{DBLP:conf/discomt/ScherrerTL19,
author = {Yves Scherrer and
J{\"{o}}rg Tiedemann and
Sharid Lo{\'{a}}iciga},
title = {Analysing concatenation approaches to document-level {NMT} in two
different domains},
pages = {51--61},
publisher = {Annual Meeting of the Association for Computational Linguistics},
year = {2019}
}
@inproceedings{DBLP:conf/wmt/GonzalesMS17,
author = {Annette Rios Gonzales and
Laura Mascarell and
Rico Sennrich},
title = {Improving Word Sense Disambiguation in Neural Machine Translation
with Sense Embeddings},
pages = {11--19},
publisher = {Annual Meeting of the Association for Computational Linguistics},
year = {2017}
}
@inproceedings{DBLP:conf/acl/LiLWJXZLL20,
author = {Bei Li and
Hui Liu and
Ziyang Wang and
Yufan Jiang and
Tong Xiao and
Jingbo Zhu and
Tongran Liu and
Changliang Li},
title = {Does Multi-Encoder Help? {A} Case Study on Context-Aware Neural Machine
Translation},
pages = {3512--3518},
publisher = {Annual Meeting of the Association for Computational Linguistics},
year = {2020}
}
@inproceedings{DBLP:conf/discomt/KimTN19,
author = {Yunsu Kim and
Duc Thanh Tran and
Hermann Ney},
title = {When and Why is Document-level Context Useful in Neural Machine Translation?},
pages = {24--34},
publisher = {Annual Meeting of the Association for Computational Linguistics},
year = {2019}
}
@inproceedings{DBLP:conf/discomt/SugiyamaY19,
author = {Amane Sugiyama and
Naoki Yoshinaga},
title = {Data augmentation using back-translation for context-aware neural
machine translation},
pages = {35--44},
publisher = {Annual Meeting of the Association for Computational Linguistics},
year = {2019}
}
@inproceedings{DBLP:conf/pacling/YamagishiK19,
author = {Hayahide Yamagishi and
Mamoru Komachi},
title = {Improving Context-Aware Neural Machine Translation with Target-Side
Context},
volume = {1215},
pages = {112--122},
publisher = {Springer},
year = {2019}
}
@inproceedings{DBLP:conf/emnlp/ZhangLSZXZL18,
author = {Jiacheng Zhang and
Huanbo Luan and
Maosong Sun and
Feifei Zhai and
Jingfang Xu and
Min Zhang and
Yang Liu},
title = {Improving the Transformer Translation Model with Document-Level Context},
pages = {533--542},
publisher = {Conference on Empirical Methods in Natural Language Processing},
year = {2018}
}
@inproceedings{DBLP:conf/coling/KuangX18,
author = {Shaohui Kuang and
Deyi Xiong},
title = {Fusing Recency into Neural Machine Translation with an Inter-Sentence
Gate Model},
pages = {607--617},
publisher = {International Conference on Computational Linguistics},
year = {2018}
}
@inproceedings{DBLP:conf/emnlp/WangTWL17,
author = {Longyue Wang and
Zhaopeng Tu and
Andy Way and
Qun Liu},
title = {Exploiting Cross-Sentence Context for Neural Machine Translation},
pages = {2826--2831},
publisher = {Conference on Empirical Methods in Natural Language Processing},
year = {2017}
}
@inproceedings{DBLP:conf/aaai/XiongH0W19,
author = {Hao Xiong and
Zhongjun He and
Hua Wu and
Haifeng Wang},
title = {Modeling Coherence for Discourse Neural Machine Translation},
pages = {7338--7345},
publisher = {{AAAI} Press},
year = {2019}
}
@article{DBLP:journals/tacl/YuSSLKBD20,
author = {Lei Yu and
Laurent Sartran and
Wojciech Stokowiec and
Wang Ling and
Lingpeng Kong and
Phil Blunsom and
Chris Dyer},
title = {Better Document-Level Machine Translation with Bayes' Rule},
journal = {Transactions of the Association for Computational Linguistics},
volume = {8},
pages = {346--360},
year = {2020}
}
@article{DBLP:journals/corr/abs-1903-04715,
author = {S{\'{e}}bastien Jean and
Kyunghyun Cho},
title = {Context-Aware Learning for Neural Machine Translation},
journal = {CoRR},
volume = {abs/1903.04715},
year = {2019}
}
@inproceedings{DBLP:conf/acl/SaundersSB20,
author = {Danielle Saunders and
Felix Stahlberg and
Bill Byrne},
title = {Using Context in Neural Machine Translation Training Objectives},
pages = {7764--7770},
publisher = {Annual Meeting of the Association for Computational Linguistics},
year = {2020}
}
@inproceedings{DBLP:conf/mtsummit/StojanovskiF19,
author = {Dario Stojanovski and
Alexander M. Fraser},
title = {Improving Anaphora Resolution in Neural Machine Translation Using
Curriculum Learning},
pages = {140--150},
publisher = {Annual Conference of the European Association for Machine Translation},
year = {2019}
}
@article{DBLP:journals/corr/abs-1911-03110,
author = {Liangyou Li and
Xin Jiang and
Qun Liu},
title = {Pretrained Language Models for Document-Level Neural Machine Translation},
journal = {CoRR},
volume = {abs/1911.03110},
year = {2019}
}
@article{DBLP:journals/tacl/LiuGGLEGLZ20,
author = {Yinhan Liu and
Jiatao Gu and
Naman Goyal and
Xian Li and
Sergey Edunov and
Marjan Ghazvininejad and
Mike Lewis and
Luke Zettlemoyer},
title = {Multilingual Denoising Pre-training for Neural Machine Translation},
journal = {Transactions of the Association for Computational Linguistics},
volume = {8},
pages = {726--742},
year = {2020}
}
@inproceedings{DBLP:conf/wmt/MarufMH18,
author = {Sameen Maruf and
Andr{\'{e}} F. T. Martins and
Gholamreza Haffari},
title = {Contextual Neural Model for Translating Bilingual Multi-Speaker Conversations},
pages = {101--112},
publisher = {Annual Meeting of the Association for Computational Linguistics},
year = {2018}
}
%%%%% chapter 17------------------------------------------------------
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
...
...
@@ -12344,7 +12914,7 @@ author = {Zhuang Liu and
   Jozef Mokry and
   Maria Nadejde},
   title = {Nematus: a Toolkit for Neural Machine Translation},
-  publisher = {European Association of Computational Linguistics},
+  publisher = {Annual Conference of the European Association for Machine Translation},
   pages = {65--68},
   year = {2017}
 }
...
...