NiuTrans / mtbookv2
Commit 40f9e93d, authored Jan 07, 2021 by 曹润柘
16
Parent: 0500424d
Showing 5 changed files with 47 additions and 56 deletions
Chapter16/Figures/figure-data-based-domain-adaptation-approach.tex (+6 -6)
Chapter16/Figures/figure-multitask-learning-in-machine-translation-1.tex (+10 -9)
Chapter16/Figures/figure-multitask-learning-in-machine-translation-2.tex (+14 -22)
Chapter16/Figures/figure-parameter-initialization-method-diagram.tex (+15 -17)
Chapter16/chapter16.tex (+2 -2)
Chapter16/Figures/figure-data-based-domain-adaptation-approach.tex
...
...
@@ -110,13 +110,13 @@
 \node [rectangle,rounded corners=1pt,fill=cyan!10] [fit = (w4-3) (new_-3)] (box2) {};
 \end{pgfonlayer}
-\node [word,draw=orange!50,dotted,very thick,inner sep=2.5pt] (realdata-3) at ([xshift=-4.5em,yshift=-2em]box1.south) {real data};
-\node [word,draw=cyan!50,dotted,very thick,inner sep=2.5pt] (fake-3) at ([xshift=1em,yshift=-2em]box2.south) {pseudo data};
-\node [word,draw,dotted,very thick,inner sep=2.5pt] (monodata-3) at ([xshift=-0.5em,yshift=2em]monolingual-3.north) {monolingual data};
+\node [word,draw=orange!50,dotted,very thick,inner sep=2.5pt] (realdata-3) at ([xshift=-3.5em,yshift=-2em]box1.south) {real data};
+\node [word,draw=cyan!50,dotted,very thick,inner sep=2.5pt] (fake-3) at ([xshift=0em,yshift=-2em]box2.south) {pseudo data};
+\node [word,draw,dotted,very thick,inner sep=2.5pt] (monodata-3) at ([xshift=0em,yshift=2em]monolingual-3.north) {monolingual data};
-\draw [->,dotted,very thick] ([yshift=0.0em]monolingual-3.north)-- ([yshift=-0.2em,xshift=0.45em]monodata-3.south);
-\draw [->,dotted,very thick,cyan] (box2.south) -- ([xshift=-1em,yshift=0.2em]fake-3.north);
-\draw [->,dotted,very thick,orange] ([xshift=-3.5em]box1.south) -- ([xshift=1em,yshift=0.2em]realdata-3.north);
+\draw [->,dotted,very thick] ([yshift=0.0em]monolingual-3.north)-- ([yshift=-0.2em,xshift=0.0em]monodata-3.south);
+\draw [->,dotted,very thick,cyan] (box2.south) -- ([xshift=-0em,yshift=0.2em]fake-3.north);
+\draw [->,dotted,very thick,orange] ([xshift=-3.5em]box1.south) -- ([xshift=0em,yshift=0.2em]realdata-3.north);
 \end{scope}
 \end{tikzpicture}
Chapter16/Figures/figure-multitask-learning-in-machine-translation-1.tex
...
...
@@ -6,32 +6,33 @@
-\node [anchor=center] (node1-1) at (0,0) {\small{$y'$}};
+\node [anchor=center] (node1-1) at (0,0) {\small{$y$}};
 \node [anchor=north,rec,fill=blue!20] (node1-2) at ([yshift=-2.0em]node1-1.south) {\small{decoder}};
 \node [anchor=north,rec,fill=red!20] (node1-3) at ([yshift=-2em]node1-2.south) {\small{encoder}};
-\node [anchor=east] (node1-5) at ([xshift=-2em]node1-2.west) {\small{$y$}};
+\node [anchor=east] (node1-5) at ([xshift=-2em]node1-2.west) {\small{$y_{<}$}};
 \node [anchor=north] (node1-4) at ([yshift=-2em]node1-3.south) {\small{$x$}};
 \draw [->,thick](node1-4.north)--(node1-3.south);
 \draw [->,thick](node1-5.east)--(node1-2.west);
 \draw [->,thick](node1-3.north)--(node1-2.south);
 \draw [->,thick](node1-2.north)--(node1-1.south);
-\node [anchor=center] (node2-1) at ([xshift=12.0em]node1-1.east) {\small{$y'$}};
+\node [anchor=center] (node2-1) at ([xshift=12.0em]node1-1.east) {\small{$y$}};
 \node [anchor=north,rec,fill=blue!20] (node2-2) at ([yshift=-2.0em]node2-1.south) {\small{decoder}};
 \node [anchor=north,rec,fill=red!20] (node2-3) at ([yshift=-2em]node2-2.south) {\small{encoder}};
-\node [anchor=east] (node2-5) at ([xshift=-2em]node2-2.west) {\small{$y$}};
+\node [anchor=east] (node2-5) at ([xshift=-2em]node2-2.west) {\small{$y_{<}$}};
 \node [anchor=north] (node2-4) at ([yshift=-2em]node2-3.south) {\small{$x$}};
 \node [anchor=west,rec,fill=yellow!20] (node2-6) at ([xshift=3.0em]node2-3.east) {\small{decoder}};
-\node [anchor=south] (node2-7) at ([yshift=2em]node2-6.north) {\small{$x'$}};
+\node [anchor=south] (node2-7) at ([yshift=2em]node2-6.north) {\small{$\hat{x}$}};
 \draw [->,thick](node2-4.north)--(node2-3.south);
 \draw [->,thick](node2-5.east)--(node2-2.west);
-\draw [->,thick](node2-3.north)--(node2-2.south)node[pos=0.5,left,font=\scriptsize]{translation};
-\draw [->,thick](node2-2.north)--(node2-1.south);
-\draw [->,thick](node2-3.east)--(node2-6.west)node[pos=0.5,above,font=\scriptsize]{reordering};
-\draw [->,thick](node2-6.north)--(node2-7.south);
+\draw [->,thick](node2-3.north)--(node2-2.south);
+\draw [->,thick](node2-2.north)--(node2-1.south)node[pos=0.5,left,font=\scriptsize]{translation};
+\draw [->,thick](node2-3.east)--(node2-6.west);
+\draw [->,thick](node2-6.north)--(node2-7.south)node[pos=0.5,left,font=\scriptsize]{word-order adjustment};
 \node [anchor=east] (node1) at ([xshift=-2.0em]node1-1.west) {\small{$x,y$: bilingual data}};
+\node [anchor=south] (node2) at ([xshift=1.96em]node1.north) {\small{$y_{<}$: target-language text data}};
 \node [anchor=north](pos1) at ([yshift=0em]node1-4.south) {\small{(a) Single-task learning}};
 \node [anchor=west](pos2) at ([xshift=10.0em]pos1.east) {\small{(b) Multi-task learning}};
...
...
Chapter16/Figures/figure-multitask-learning-in-machine-translation-2.tex
 \begin{tikzpicture}
 \begin{scope}
-\node [anchor=center] (node1-1) at (0,0) {\small{$y'$}};
-\node [anchor=south,line width=0.6pt,draw,rounded corners,minimum height=1.5em,minimum width=4.3em,fill=blue!20] (node1-2) at ([yshift=-3em]node1-1.south) {\small{Softmax}};
+\node [anchor=center] (node1-1) at (0,0) {\small{$y$}};
-\node [anchor=north,line width=0.6pt,draw,rounded corners,minimum height=2.2em,minimum width=4.3em,fill=blue!20] (node1-3) at ([yshift=-2.0em]node1-2.south) {\small{decoder}};
+\node [anchor=north,line width=0.6pt,draw,rounded corners,minimum height=2.2em,minimum width=4.3em,fill=blue!20] (node1-3) at ([yshift=-2.0em]node1-1.south) {\small{decoder}};
 \node [anchor=north,line width=0.6pt,draw,rounded corners,minimum height=2.2em,minimum width=4.3em,fill=yellow!20] (node3-3) at ([yshift=-2.0em]node1-3.south) {\small{language model}};
+\node [anchor=west] (node3-1) at ([xshift=4.0em]node3-3.east) {\small{$z$}};
-\node [anchor=west,line width=0.6pt,draw,rounded corners,minimum height=1.5em,minimum width=4.3em,fill=blue!20] (node3-2) at ([xshift=2em]node3-3.east) {\small{Softmax}};
-\node [anchor=north] (node3-1) at ([yshift=3.0em]node3-2.north) {\small{$z'$}};
-\node [anchor=north] (node3-41) at ([xshift=-0.6em,yshift=-2em]node3-3.south) {\small{$y$}};
-\node [anchor=north] (node3-42) at ([xshift=0.6em,yshift=-2em]node3-3.south) {\small{$z$}};
+\node [anchor=north] (node3-41) at ([yshift=-2em]node3-3.south) {\small{$y_{<}+z_{<}$}};
 \node [anchor=east,line width=0.6pt,draw,rounded corners,minimum height=2.2em,minimum width=4.3em,fill=red!20] (node2-1) at ([xshift=-2em]node1-3.west) {\small{encoder}};
 \node [anchor=north] (node2-2) at ([yshift=-2em]node2-1.south) {\small{$x$}};
-\node [rectangle,rounded corners,draw=red,line width=0.2mm,densely dashed,inner sep=0.4em] [fit = (node3-2) (node3-3)] (inputshadow) {};
-\draw [->,thick](node1-3.north)--(node1-2);
-\draw [->,thick](node1-2.north)--(node1-1);
+\node [rectangle,rounded corners,draw=red,line width=0.2mm,densely dashed,inner sep=0.4em] [fit = (node3-1) (node3-3)] (inputshadow) {};
+\draw [->,thick](node1-3.north)--(node1-1)node[pos=0.5,left,font=\scriptsize]{Softmax};
 \draw [->,thick](node2-2.north)--(node2-1);
 \draw [->,thick] (node2-1.east)--(node1-3.west);
-\draw [->,thick](node3-41.north)--([xshift=-0.6em]node3-3.south);
-\draw [->,thick](node3-42.north)--([xshift=0.6em]node3-3.south);
+\draw [->,thick](node3-41.north)--(node3-3.south);
 \draw [->,thick](node3-3.north)--(node1-3.south);
-\draw [->,thick](node3-2.north)--(node3-1);
-\draw [->,thick] (node3-3.east)--(node3-2.west);
+\draw [->,thick] (node3-3.east)--(node3-1.west)node[pos=0.5,above,font=\scriptsize]{Softmax};
-\node [anchor=east] (node2-1-1) at ([xshift=-12.0em,yshift=-4.25em]node1-1.west) {\small{$y'$}};
-\node [anchor=south,line width=0.6pt,draw,rounded corners,minimum height=1.5em,minimum width=4.3em,fill=blue!20] (node2-1-2) at ([yshift=-3em]node2-1-1.south) {\small{Softmax}};
-\node [anchor=north,line width=0.6pt,draw,rounded corners,minimum height=2.2em,minimum width=4.3em,fill=blue!20] (node2-1-3) at ([yshift=-2.0em]node2-1-2.south) {\small{decoder}};
+\node [anchor=east] (node2-1-1) at ([xshift=-12.0em,yshift=-4.25em]node1-1.west) {\small{$y$}};
+\node [anchor=north,line width=0.6pt,draw,rounded corners,minimum height=2.2em,minimum width=4.3em,fill=blue!20] (node2-1-3) at ([yshift=-2.0em]node2-1-1.south) {\small{decoder}};
 \node [anchor=east,line width=0.6pt,draw,rounded corners,minimum height=2.2em,minimum width=4.3em,fill=red!20] (node2-2-1) at ([xshift=-2em]node2-1-3.west) {\small{encoder}};
 \node [anchor=north] (node2-2-2) at ([yshift=-2em]node2-2-1.south) {\small{$x$}};
-\node [anchor=north] (node2-2-3) at ([yshift=-2em]node2-1-3.south) {\small{$y$}};
+\node [anchor=north] (node2-2-3) at ([yshift=-2em]node2-1-3.south) {\small{$y_{<}$}};
-\draw [->,thick](node2-1-2.north)--(node2-1-1);
 \draw [->,thick](node2-2-2.north)--(node2-2-1);
 \draw [->,thick] (node2-2-1.east)--(node2-1-3.west);
-\draw [->,thick](node2-1-3.north)--(node2-1-2.south);
+\draw [->,thick](node2-1-3.north)--(node2-1-1)node[pos=0.5,left,font=\scriptsize]{Softmax};
 \draw [->,thick](node2-2-3.north)--(node2-1-3);
-\node [anchor=east] (node1) at ([xshift=-2.0em,yshift=4em]node2-1-1.west) {\small{$x,y$: bilingual data}};
+\node [anchor=east] (node1) at ([xshift=-2.0em,yshift=3em]node2-1-1.west) {\small{$x,y$: bilingual data}};
+\node [anchor=south] (node3) at ([xshift=1.96em]node1.north) {\small{$y_{<}$: target-language text data}};
 \node [anchor=north] (node2) at ([xshift=0.45em]node1.south) {\small{$z$}: monolingual data};
 \node [anchor=north](pos1) at ([yshift=-3.5em]node3-3.south) {\small{(b) Multi-task learning}};
...
...
Chapter16/Figures/figure-parameter-initialization-method-diagram.tex
...
...
@@ -5,35 +5,33 @@
 \tikzstyle{node}=[rounded corners=4pt,draw,minimum height=3em,drop shadow,font=\footnotesize]
 \node [node,minimum width=6em,minimum height=2.4em,fill=red!20,line width=0.6pt] (encoder1) at (0,0) {\small encoder};
-\node [node,anchor=west,minimum width=6em,minimum height=2.4em,fill=red!20,line width=0.6pt] (encoder2) at ([xshift=4em,yshift=0em]encoder1.east) {\small encoder};
-\node [node,anchor=west,minimum width=6em,minimum height=2.4em,fill=red!30,line width=0.6pt] (encoder3) at ([xshift=3em]encoder2.east) {\small encoder};
+\node [node,anchor=west,minimum width=6em,minimum height=2.4em,fill=red!30,line width=0.6pt] (encoder2) at ([xshift=7em,yshift=0em]encoder1.east) {\small encoder};
+\node [node,anchor=north,minimum width=6em,minimum height=2.4em,fill=blue!20,line width=0.6pt] (decoder1) at ([yshift=-2em]encoder1.south) {\small decoder};
+\node [node,anchor=west,minimum width=6em,minimum height=2.4em,fill=blue!30,line width=0.6pt] (decoder2) at ([xshift=7em,yshift=0em]decoder1.east) {\small decoder};
-\node [node,anchor=north,minimum width=6em,minimum height=2.4em,fill=blue!20,line width=0.6pt] (decoder1) at ([yshift=-3em]encoder1.south) {\small decoder};
-\node [node,anchor=west,minimum width=6em,minimum height=2.4em,fill=blue!20,line width=0.6pt] (decoder2) at ([xshift=4em,yshift=0em]decoder1.east) {\small decoder};
-\node [node,anchor=west,minimum width=6em,minimum height=2.4em,fill=blue!30,line width=0.6pt] (decoder3) at ([xshift=3em]decoder2.east) {\small decoder};
 \node [anchor=north,font=\scriptsize,fill=yellow!20,drop shadow,draw] (w1) at ([yshift=-1.6em]decoder1.south) {知识\ 就是\ 力量\ 。\ <eos>};
-\node [anchor=north,font=\scriptsize,fill=green!20,drop shadow,draw] (w3) at ([yshift=-1.6em]decoder3.south) {El conocimiento es poder . <eos>};
+\node [anchor=north,font=\scriptsize,fill=green!20,drop shadow,draw] (w3) at ([yshift=-1.6em]decoder2.south) {El conocimiento es poder . <eos>};
 \node [anchor=south,font=\scriptsize,fill=orange!20,drop shadow,draw] (w2) at ([yshift=1.6em]encoder1.north) {Knowledge\ is\ power\ .};
-\node [anchor=south,font=\scriptsize,fill=orange!20,drop shadow,draw] (w4) at ([yshift=1.6em]encoder3.north) {Knowledge\ is\ power\ .};
+\node [anchor=south,font=\scriptsize,fill=orange!20,drop shadow,draw] (w4) at ([yshift=1.6em]encoder2.north) {Knowledge\ is\ power\ .};
 \draw [->,thick] (decoder1.-90) -- (w1.north);
-\draw [->,thick] (decoder3.-90) -- (w3.north);
+\draw [->,thick] (decoder2.-90) -- (w3.north);
 \draw [->,thick] (w2.-90) -- (encoder1.90);
-\draw [->,thick] (w4.-90) -- (encoder3.90);
+\draw [->,thick] (w4.-90) -- (encoder2.90);
+\draw [->,thick] (encoder1.south)--(decoder1.north);
+\draw [->,thick] (encoder2.south)--(decoder2.north);
-\node [anchor=north,single arrow,minimum height=2.2em,fill=blue!20,rotate=-90] (arrow1) at ([yshift=-1.4em,xshift=0.4em]encoder1.south) {};
-\node [anchor=north,single arrow,minimum height=2.2em,fill=red!20,rotate=-90] (arrow2) at ([yshift=-1.4em,xshift=0.4em]encoder2.south) {};
-\node [anchor=north,single arrow,minimum height=2.2em,fill=red!20,rotate=-90] (arrow3) at ([yshift=-1.4em,xshift=0.4em]encoder3.south) {};
 \node [anchor=south,yshift=3.4em] at (encoder1.north) {\small\bfnew{parent model}};
-\node [anchor=south,yshift=3.4em] at (encoder3.north) {\small\bfnew{child model}};
+\node [anchor=south,yshift=3.4em] at (encoder2.north) {\small\bfnew{child model}};
-\draw [->,dash pattern=on 3pt off 2pt,thick] ([yshift=0em]encoder1.0) -- node[above,font=\scriptsize]{parameter reuse} (encoder2.180);
-\draw [->,dash pattern=on 3pt off 2pt,thick] (encoder2.0) -- node[above,font=\scriptsize]{fine-tuning} (encoder3.180);
-\draw [->,dash pattern=on 3pt off 2pt,thick] ([yshift=0em]decoder1.0) -- node[above,font=\scriptsize]{parameter reuse} (decoder2.180);
-\draw [->,dash pattern=on 3pt off 2pt,thick] (decoder2.0) -- node[above,font=\scriptsize]{fine-tuning} (decoder3.180);
+\draw [->,dash pattern=on 3pt off 2pt,thick] ([yshift=0em]encoder1.0) -- node[above,font=\scriptsize]{parameter reuse \& fine-tuning} (encoder2.180);
+\draw [->,dash pattern=on 3pt off 2pt,thick] ([yshift=0em]decoder1.0) -- node[above,font=\scriptsize]{parameter reuse \& fine-tuning} (decoder2.180);
 \end{tikzpicture}
...
...
Chapter16/chapter16.tex
...
...
@@ -235,7 +235,7 @@
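 \parinterval When training a neural network, focusing too heavily on a single training objective may cause the model to ignore other potentially helpful information, which may come from other related tasks\upcite{DBLP:journals/corr/Ruder17a}. Learning several independent but related tasks jointly, so that the tasks ``reinforce'' one another, is multi-task learning\upcite{DBLP:journals/corr/Ruder17a,DBLP:books/sp/98/Caruana98,liu2019multi}. A common practice in multi-task learning is, for several related tasks, to share part of the model parameters in order to learn the features that the tasks have in common, and to use task-specific modules to learn the features unique to each task (see \chapterfifteen). A common strategy is to share the lower-layer parameters of the model, while the upper-layer parameters learn each task independently.
-\parinterval In neural machine translation, the main strategy for applying multi-task learning is to treat translation as the main task and to set up auxiliary tasks that use only monolingual data, through which the linguistic knowledge in monolingual data is captured\upcite{DBLP:conf/emnlp/DomhanH17,DBLP:conf/emnlp/ZhangZ16,DBLP:journals/corr/LuongLSVK15}. One multi-task learning approach uses source-language monolingual data: a single encoder models the source-language data, and two separate decoders learn the source-language reordering task and the translation task. The source-language reordering task adjusts the order of the words in a source-language sentence using pre-ordering rules\upcite{DBLP:conf/emnlp/WangCK07}; its training data can be constructed from monolingual data, so that the encoder is trained more thoroughly\upcite{DBLP:conf/emnlp/ZhangZ16}, as shown in Figure \ref{fig:16-7}.
+\parinterval In neural machine translation, the main strategy for applying multi-task learning is to treat translation as the main task and to set up auxiliary tasks that use only monolingual data, through which the linguistic knowledge in monolingual data is captured\upcite{DBLP:conf/emnlp/DomhanH17,DBLP:conf/emnlp/ZhangZ16,DBLP:journals/corr/LuongLSVK15}. One multi-task learning approach uses source-language monolingual data: a single encoder models the source-language data, and two separate decoders learn the source-language reordering task and the translation task. The source-language reordering task adjusts the order of the words in a source-language sentence using pre-ordering rules\upcite{DBLP:conf/emnlp/WangCK07}; its training data can be constructed from monolingual data, so that the encoder is trained more thoroughly\upcite{DBLP:conf/emnlp/ZhangZ16}, as shown in Figure \ref{fig:16-7}, where $y_{<}$ denotes the translation generated before the current time step and $x_{<}$ denotes the source-language sentence after its words have been reordered.
 %----------------------------------------------
 \begin{figure}[htp]
 \centering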
...
...
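The shared-encoder setup described in the hunk above (one encoder, one translation decoder, one reordering decoder) can be written as a joint objective. This is only a sketch under stated assumptions: the weight $\lambda$ and the parameter names $\theta_{\mathrm{enc}}$, $\theta_{\mathrm{trans}}$ and $\theta_{\mathrm{reord}}$ are introduced here for illustration and do not appear in the commit; $\hat{x}$ follows the label that the updated figure uses for the reordered source sentence.

```latex
% Sketch only: \lambda and the \theta names are illustrative assumptions,
% not notation taken from the commit.
\[
\mathcal{L}(\theta_{\mathrm{enc}},\theta_{\mathrm{trans}},\theta_{\mathrm{reord}})
  = -\log P(y \mid x;\,\theta_{\mathrm{enc}},\theta_{\mathrm{trans}})
    \;-\; \lambda \log P(\hat{x} \mid x;\,\theta_{\mathrm{enc}},\theta_{\mathrm{reord}})
\]
```

Both terms depend on $\theta_{\mathrm{enc}}$, so the reordering data also contributes gradients to the encoder, which is the sense in which the encoder is trained more thoroughly.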
@@ -245,7 +245,7 @@
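 \end{figure}
 %----------------------------------------------
-\parinterval Although a neural machine translation model can be viewed as a language generation model, its generation process depends on source-language information, so target-language monolingual data cannot be used directly for multi-task learning. To address this, the structure of the translation model can be modified by adding a language-model sublayer at the bottom of the decoder; this sublayer learns the language modeling task and is completely independent of the encoder, as shown in Figure \ref{fig:16-8}\upcite{DBLP:conf/emnlp/DomhanH17}. During training, bilingual data and monolingual data are fed into the translation model and the language model respectively; the gradients from bilingual data update the parameters of the whole model, while the gradients from monolingual data update only the language-model sublayer.
+\parinterval Although a neural machine translation model can be viewed as a language generation model, its generation process depends on source-language information, so target-language monolingual data cannot be used directly for multi-task learning. To address this, the structure of the translation model can be modified by adding a language-model sublayer at the bottom of the decoder; this sublayer learns the language modeling task and is completely independent of the encoder, as shown in Figure \ref{fig:16-8}\upcite{DBLP:conf/emnlp/DomhanH17}, where $y_{<}$ denotes the translation generated before the current time step and $z_{<}$ denotes the monolingual data before the current time step. During training, bilingual data and monolingual data are fed into the translation model and the language model respectively; the gradients from bilingual data update the parameters of the whole model, while the gradients from monolingual data update only the language-model sublayer.
 %----------------------------------------------
 \begin{figure}[htp]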
...
...
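The training scheme described in the second chapter16.tex hunk can be sketched as two update rules. The sketch assumes plain gradient descent with a learning rate $\eta$, an explicit time index $t$ on $y_{<}$ and $z_{<}$, and a split of the parameters into $\theta_{\mathrm{LM}}$ (the language-model sublayer, including whatever output projection it uses) and $\theta_{\mathrm{rest}}$ (everything else). None of these symbols come from the commit.

```latex
% Sketch only: \eta, t, \theta_{LM} and \theta_{rest} are illustrative assumptions.
% Translation loss on a bilingual pair (x, y) and language-model loss on monolingual z.
\[
\mathcal{L}_{\mathrm{MT}} = -\sum_{t} \log P(y_t \mid y_{<t}, x;\, \theta_{\mathrm{LM}}, \theta_{\mathrm{rest}}),
\qquad
\mathcal{L}_{\mathrm{LM}} = -\sum_{t} \log P(z_t \mid z_{<t};\, \theta_{\mathrm{LM}})
\]
% Bilingual batch: every parameter is updated.
\[
\theta_{\mathrm{LM}} \leftarrow \theta_{\mathrm{LM}} - \eta\, \nabla_{\theta_{\mathrm{LM}}} \mathcal{L}_{\mathrm{MT}},
\qquad
\theta_{\mathrm{rest}} \leftarrow \theta_{\mathrm{rest}} - \eta\, \nabla_{\theta_{\mathrm{rest}}} \mathcal{L}_{\mathrm{MT}}
\]
% Monolingual batch: only the language-model sublayer is updated.
\[
\theta_{\mathrm{LM}} \leftarrow \theta_{\mathrm{LM}} - \eta\, \nabla_{\theta_{\mathrm{LM}}} \mathcal{L}_{\mathrm{LM}}
\]
```

Because $\mathcal{L}_{\mathrm{LM}}$ does not involve the encoder, a monolingual batch leaves $\theta_{\mathrm{rest}}$ unchanged, which matches the statement that the sublayer is independent of the encoder side.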