NiuTrans / Toy-MT-Introduction
Commit a3a7c1da, authored May 09, 2020 by zengxin
chapter7 fig
parent 46043306
Showing 4 changed files with 61 additions and 61 deletions (+61 -61)
Book/Chapter7/Chapter7.tex +4 -4
Book/Chapter7/Figures/figure-batch-generation-method.tex +37 -36
Book/Chapter7/Figures/figure-randomly-generation-vs-generate-by-sentence-length.tex +20 -19
Book/Chapter7/Figures/figure-word-change.tex +0 -2
Book/Chapter7/Chapter7.tex
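@@ -270,7 +270,7 @@
 \subsubsection{The Large-Vocabulary and OOV Problems}
-\parinterval Let us first take a closer look at the large-vocabulary problem in neural machine translation. Both training and decoding of a neural machine translation model depend on the source-language and target-language vocabularies. During modeling, every word in the vocabulary is converted into a distributed (vector) representation, i.e., a word embedding. These vectors serve as the input to the model (see Chapter 6). If every word corresponds to one vector, then the various inflections of a word (tense, voice, etc.) all increase the size of the vocabulary and the number of corresponding vectors.
+\parinterval Let us first take a closer look at the large-vocabulary problem in neural machine translation. Both training and decoding of a neural machine translation model depend on the source-language and target-language vocabularies. During modeling, every word in the vocabulary is converted into a distributed (vector) representation, i.e., a word embedding. These vectors serve as the input to the model (see Chapter 6). If every word corresponds to one vector, then the various inflections of a word (tense, voice, etc.) all increase the size of the vocabulary and the number of corresponding vectors. Figure \ref{fig:7-7} shows the tense and voice variations of some English words.
 %----------------------------------------------
 \begin{figure}[htp]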
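@@ -1180,9 +1180,7 @@ b &=& \omega_{\textrm{high}}\cdot |\mathbf{x}|
 \label{eq:7-15}
 \end{eqnarray}
-\noindent where $\gamma_{k}$ denotes the weight of the $k$-th system, subject to $\sum_{k=1}^{K}\gamma_{k}=1$. Equation \ref{eq:7-15} is a linear model. The weights $\{\gamma_{k}\}$ can be tuned automatically on the development set, for example by using minimum error rate training to obtain the optimal weights (see Chapter 4). In practice, however, if these $K$ models are all derived from one base model, the weights $\{\gamma_{k}\}$ have little effect on the final result. The weights are therefore sometimes simply set to $\gamma_{k}=\frac{1}{K}$.
-\parinterval Equation \ref{eq:7-15} is a typical linear interpolation model; models of this kind have been applied successfully in tasks such as language modeling. From the perspective of statistical learning, interpolating multiple models can effectively reduce the empirical error rate. However, model ensembling relies on one assumption: the models must be complementary to some degree. This complementarity is sometimes also reflected in the upper bound of the models' predictions, called the Oracle. For example, the output with the highest BLEU among the outputs of these $K$ models can be taken as the Oracle; alternatively, one can select from each prediction the target words that maximize BLEU and take the sentence assembled this way as the Oracle. Of course, a higher Oracle does not guarantee a better ensemble result, because the Oracle is the result under ideal conditions, and actual predictions often differ greatly from it. How to use the Oracle for model optimization is also a question that many researchers are exploring.
+\noindent where $\gamma_{k}$ denotes the weight of the $k$-th system, subject to $\sum_{k=1}^{K}\gamma_{k}=1$. Equation \ref{eq:7-15} is a linear model. The weights $\{\gamma_{k}\}$ can be tuned automatically on the development set, for example by using minimum error rate training to obtain the optimal weights (see Chapter 4). In practice, however, if these $K$ models are all derived from one base model, the weights $\{\gamma_{k}\}$ have little effect on the final result. The weights are therefore sometimes simply set to $\gamma_{k}=\frac{1}{K}$. Figure \ref{fig:7-25} shows the ensembling of the predictions of three models.
 %----------------------------------------------
 \begin{figure}[htp]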
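@@ -1193,6 +1191,8 @@ b &=& \omega_{\textrm{high}}\cdot |\mathbf{x}|
 \end{figure}
 %----------------------------------------------
+\parinterval Equation \ref{eq:7-15} is a typical linear interpolation model; models of this kind have been applied successfully in tasks such as language modeling. From the perspective of statistical learning, interpolating multiple models can effectively reduce the empirical error rate. However, model ensembling relies on one assumption: the models must be complementary to some degree. This complementarity is sometimes also reflected in the upper bound of the models' predictions, called the Oracle. For example, the output with the highest BLEU among the outputs of these $K$ models can be taken as the Oracle; alternatively, one can select from each prediction the target words that maximize BLEU and take the sentence assembled this way as the Oracle. Of course, a higher Oracle does not guarantee a better ensemble result, because the Oracle is the result under ideal conditions, and actual predictions often differ greatly from it. How to use the Oracle for model optimization is also a question that many researchers are exploring.
 \parinterval In addition, how to build the models used for ensembling is also very important; this part of the work may even be the hardest part of the model ensemble approach. Most of the time there is no fixed recipe for generating the models, and system developers mostly rely on ``each showing their own special skill''. Some commonly used methods are:
 \begin{itemize}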
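The linear interpolation discussed around Equation 7-15 can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the book's implementation: the equation itself is not shown in this diff, so the sketch assumes that each of the K models outputs a probability distribution over the same target vocabulary at the current decoding step. The function name `interpolate`, its arguments, and the toy numbers are all hypothetical.

```python
from typing import List, Optional, Sequence


def interpolate(distributions: Sequence[Sequence[float]],
                weights: Optional[Sequence[float]] = None) -> List[float]:
    """Return sum_k gamma_k * P_k for K distributions over one vocabulary."""
    k = len(distributions)
    if k == 0:
        raise ValueError("at least one model distribution is required")
    if weights is None:
        # Uniform weights gamma_k = 1/K, the simple setting mentioned in the text.
        weights = [1.0 / k] * k
    if len(weights) != k:
        raise ValueError("one weight per model is required")
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    vocab_size = len(distributions[0])
    if any(len(d) != vocab_size for d in distributions):
        raise ValueError("all models must share the same vocabulary")
    return [sum(w * d[i] for w, d in zip(weights, distributions))
            for i in range(vocab_size)]


# Three hypothetical models scoring a three-word vocabulary.
model_outputs = [[0.7, 0.2, 0.1], [0.1, 0.8, 0.1], [0.4, 0.4, 0.2]]
ensemble = interpolate(model_outputs)
best_word = max(range(len(ensemble)), key=ensemble.__getitem__)
```

Because the weights sum to 1, the interpolated scores again form a probability distribution. Passing an explicit `weights` list corresponds to weights tuned on a development set.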
Book/Chapter7/Figures/figure-batch-generation-method.tex
 \begin{tikzpicture}
-\tikzstyle{node} = [minimum height=1.0em,draw=teal,fill=teal!10]
-\tikzstyle{legend} = [minimum height=1.0em,minimum width=1.0em,draw]
-\tikzstyle{node2} = [minimum width=1.0em,minimum height=4.1em,draw=blue,fill=blue!10]
-\node [node,minimum width=2.8em] (node1) at (0,0) {};
-\node [node,minimum width=4.0em,anchor=north west] (node2) at (node1.south west) {};
-\node [node,minimum width=3.2em,anchor=north west] (node3) at (node2.south west) {};
-\node [node,minimum width=3.0em,anchor=north west] (node4) at (node3.south west) {};
+\tikzstyle{node} = [minimum height=1.0*1.2em,draw=teal,fill=teal!10]
+\tikzstyle{legend} = [minimum height=1.0*1.2em,minimum width=1.0*1.2em,draw]
+\tikzstyle{node2} = [minimum width=1.0*1.2em,minimum height=4.1*1.2em,draw=blue,fill=blue!10]
+\node [node,minimum width=2.8*1.2em] (node1) at (0,0) {};
+\node [node,minimum width=4.0*1.2em,anchor=north west] (node2) at (node1.south west) {};
+\node [node,minimum width=3.2*1.2em,anchor=north west] (node3) at (node2.south west) {};
+\node [node,minimum width=3.0*1.2em,anchor=north west] (node4) at (node3.south west) {};
 \node [node2,anchor = north west] (grad1) at ([xshift=1.2em]node1.north east) {};
-\node [node,minimum width=3.7em,anchor=north west] (node5) at (grad1.north east) {};
-\node [node,minimum width=2.8em,anchor=north west] (node6) at (node5.south west) {};
-\node [node,minimum width=3.2em,anchor=north west] (node7) at (node6.south west) {};
-\node [node,minimum width=4.0em,anchor=north west] (node8) at (node7.south west) {};
-\node [font=\scriptsize,anchor=east] (line1) at (node1.west) {gpu1};
-\node [font=\scriptsize,anchor=east] (line2) at (node2.west) {gpu2};
-\node [font=\scriptsize,anchor=east] (line3) at (node3.west) {gpu3};
-\node [font=\scriptsize,anchor=east] (line4) at (node4.west) {gpu4};
+\node [node,minimum width=3.7*1.2em,anchor=north west] (node5) at (grad1.north east) {};
+\node [node,minimum width=2.8*1.2em,anchor=north west] (node6) at (node5.south west) {};
+\node [node,minimum width=3.2*1.2em,anchor=north west] (node7) at (node6.south west) {};
+\node [node,minimum width=4.0*1.2em,anchor=north west] (node8) at (node7.south west) {};
+\node [font=\footnotesize,anchor=east] (line1) at (node1.west) {gpu1};
+\node [font=\footnotesize,anchor=east] (line2) at (node2.west) {gpu2};
+\node [font=\footnotesize,anchor=east] (line3) at (node3.west) {gpu3};
+\node [font=\footnotesize,anchor=east] (line4) at (node4.west) {gpu4};
 \node [node2,anchor = north west] (grad2) at ([xshift=0.3em]node5.north east) {};
-\draw [->] (-1.4em,-3.62em) -- (9.5em,-3.62em);
+\draw [->] (-1.4em*1.2,-3.62*1.2em) -- (9em*1.2,-3.62*1.2em);
-\node [node,minimum width=2.8em] (node9) at (15em,0) {};
-\node [node,minimum width=4.0em,anchor=north west] (node10) at (node9.south west) {};
-\node [node,minimum width=3.2em,anchor=north west] (node11) at (node10.south west) {};
-\node [node,minimum width=3.0em,anchor=north west] (node12) at (node11.south west) {};
+\node [node,minimum width=2.8*1.2em] (node9) at (16em,0) {};
+\node [node,minimum width=4.0*1.2em,anchor=north west] (node10) at (node9.south west) {};
+\node [node,minimum width=3.2*1.2em,anchor=north west] (node11) at (node10.south west) {};
+\node [node,minimum width=3.0*1.2em,anchor=north west] (node12) at (node11.south west) {};
-\node [node,minimum width=3.7em,anchor=north west] (node13) at (node9.north east) {};
-\node [node,minimum width=2.8em,anchor=north west] (node14) at (node10.north east) {};
-\node [node,minimum width=3.2em,anchor=north west] (node15) at (node11.north east) {};
-\node [node,minimum width=4.0em,anchor=north west] (node16) at (node12.north east) {};
+\node [node,minimum width=3.7*1.2em,anchor=north west] (node13) at (node9.north east) {};
+\node [node,minimum width=2.8*1.2em,anchor=north west] (node14) at (node10.north east) {};
+\node [node,minimum width=3.2*1.2em,anchor=north west] (node15) at (node11.north east) {};
+\node [node,minimum width=4.0*1.2em,anchor=north west] (node16) at (node12.north east) {};
 \node [node2,anchor = north west] (grad3) at ([xshift=0.5em]node13.north east) {};
-\node [font=\scriptsize,anchor=east] (line1) at (node9.west) {gpu1};
-\node [font=\scriptsize,anchor=east] (line2) at (node10.west) {gpu2};
-\node [font=\scriptsize,anchor=east] (line3) at (node11.west) {gpu3};
-\node [font=\scriptsize,anchor=east] (line4) at (node12.west) {gpu4};
-\draw [->] (13.6em,-3.62em) -- (22.2em,-3.62em);
+\node [font=\footnotesize,anchor=east] (line1) at (node9.west) {gpu1};
+\node [font=\footnotesize,anchor=east] (line2) at (node10.west) {gpu2};
+\node [font=\footnotesize,anchor=east] (line3) at (node11.west) {gpu3};
+\node [font=\footnotesize,anchor=east] (line4) at (node12.west) {gpu4};
+\draw [->] (13.6*1.2em,-3.62*1.2em) -- (20.5*1.2em,-3.62*1.2em);
 \begin{pgfonlayer}{background}
 \node [rectangle,inner sep=-0.0em,draw] [fit = (node1) (node2) (node3) (node4)] (box1) {};
 \node [rectangle,inner sep=-0.0em,draw] [fit = (node5) (node6) (node7) (node8)] (box2) {};
 \node [rectangle,inner sep=-0.0em,draw] [fit = (node9) (node13) (node12) (node16)] (box2) {};
 \end{pgfonlayer}
-\node [font=\scriptsize,anchor=north] (legend1) at ([xshift=3em]node4.south) {Update at every step};
-\node [font=\scriptsize,anchor=north] (legend2) at ([xshift=2.5em]node12.south) {Update after accumulating two steps};
-\node [font=\scriptsize,anchor=north] (time1) at (grad2.south) {time};
-\node [font=\scriptsize,anchor=north] (time1) at (grad3.south) {time};
+\node [font=\footnotesize,anchor=north] (legend1) at ([xshift=3em]node4.south) {Update at every step};
+\node [font=\footnotesize,anchor=north] (legend2) at ([xshift=2.5em]node12.south) {Update after accumulating two steps};
+\node [font=\footnotesize,anchor=north] (time1) at (grad2.south) {time};
+\node [font=\footnotesize,anchor=north] (time1) at (grad3.south) {time};
 \node [legend] (legend3) at (2em,2em) {};
-\node [font=\scriptsize,anchor=west] (idle) at (legend3.east) {: idle};
+\node [font=\footnotesize,anchor=west] (idle) at (legend3.east) {: idle};
 \node [legend,anchor=west,draw=teal,fill=teal!10] (legend4) at ([xshift = 2em]idle.east) {};
-\node [font=\scriptsize,anchor=west] (FB) at (legend4.east) {: forward/backward};
+\node [font=\footnotesize,anchor=west] (FB) at (legend4.east) {: forward/backward};
 \node [legend,anchor=west,draw=blue,fill=blue!10] (legend5) at ([xshift = 2em]FB.east) {};
-\node [font=\scriptsize,anchor=west] (grad_sync) at (legend5.east) {: gradient update};
+\node [font=\footnotesize,anchor=west] (grad_sync) at (legend5.east) {: gradient update};
 \end{tikzpicture}
\ No newline at end of file
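The figure above contrasts updating the parameters at every step with updating after accumulating two steps. The idea behind the second panel, gradient accumulation, can be sketched as follows. This is a toy illustration under assumptions that are not in the commit: a single scalar parameter, a squared-error loss, and no multi-GPU synchronization. The names `grad` and `train` are hypothetical.

```python
from typing import List, Sequence, Tuple

Batch = List[Tuple[float, float]]  # (x, y) pairs for the toy model y = w * x


def grad(w: float, batch: Batch) -> float:
    """Mean gradient of 0.5 * (w * x - y)^2 over one mini-batch."""
    return sum((w * x - y) * x for x, y in batch) / len(batch)


def train(w: float, batches: Sequence[Batch], lr: float, accum_steps: int) -> float:
    """Run forward/backward on every batch, but update w only every accum_steps batches."""
    accumulated, pending = 0.0, 0
    for batch in batches:
        accumulated += grad(w, batch)  # forward/backward only, no update yet
        pending += 1
        if pending == accum_steps:
            w -= lr * accumulated / accum_steps  # one (less frequent) update
            accumulated, pending = 0.0, 0
    # Gradients left over when len(batches) is not a multiple of accum_steps
    # are dropped in this sketch.
    return w


b1 = [(1.0, 2.0), (2.0, 4.0)]
b2 = [(3.0, 6.0), (4.0, 8.0)]
w_accumulated = train(0.0, [b1, b2], lr=0.01, accum_steps=2)
w_large_batch = train(0.0, [b1 + b2], lr=0.01, accum_steps=1)
```

With equally sized mini-batches, accumulating two steps yields the same update as a single step on the combined batch, while the update stage runs half as often.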
Book/Chapter7/Figures/figure-randomly-generation-vs-generate-by-sentence-length.tex
 \begin{tikzpicture}
-\tikzstyle{node} = [minimum height=1.0em,draw=teal,fill=teal!10]
-\node [node,minimum width=2.0em] (sent1) at (0,0) {};
-\node [node,minimum width=5.0em,anchor=north west] (sent2) at (sent1.south west) {};
-\node [node,minimum width=1.0em,anchor=north west] (sent3) at (sent2.south west) {};
-\node [node,minimum width=3.0em,anchor=north west] (sent4) at (sent3.south west) {};
+\tikzstyle{node} = [minimum height=1.0*1.2em,draw=teal,fill=teal!10]
+\node [node,minimum width=2.0*1.2em] (sent1) at (0,0) {};
+\node [node,minimum width=5.0*1.2em,anchor=north west] (sent2) at (sent1.south west) {};
+\node [node,minimum width=1.0*1.2em,anchor=north west] (sent3) at (sent2.south west) {};
+\node [node,minimum width=3.0*1.2em,anchor=north west] (sent4) at (sent3.south west) {};
-\node [node,minimum width=4.0em] (sent5) at (12em,0) {};
-\node [node,minimum width=4.5em,anchor=north west] (sent6) at (sent5.south west) {};
-\node [node,minimum width=4.5em,anchor=north west] (sent7) at (sent6.south west) {};
-\node [node,minimum width=5em,anchor=north west] (sent8) at (sent7.south west) {};
+\node [node,minimum width=4.0*1.2em] (sent5) at (14em,0) {};
+\node [node,minimum width=4.5*1.2em,anchor=north west] (sent6) at (sent5.south west) {};
+\node [node,minimum width=4.5*1.2em,anchor=north west] (sent7) at (sent6.south west) {};
+\node [node,minimum width=5*1.2em,anchor=north west] (sent8) at (sent7.south west) {};
-\node [font=\scriptsize,anchor=east] (line1) at (sent1.west) {sent1};
-\node [font=\scriptsize,anchor=east] (line2) at (sent2.west) {sent2};
-\node [font=\scriptsize,anchor=east] (line3) at (sent3.west) {sent3};
-\node [font=\scriptsize,anchor=east] (line4) at (sent4.west) {sent4};
+\node [font=\footnotesize,anchor=east] (line1) at (sent1.west) {sent1};
+\node [font=\footnotesize,anchor=east] (line2) at (sent2.west) {sent2};
+\node [font=\footnotesize,anchor=east] (line3) at (sent3.west) {sent3};
+\node [font=\footnotesize,anchor=east] (line4) at (sent4.west) {sent4};
-\node [font=\scriptsize,anchor=east] (line5) at (sent5.west) {sent1};
-\node [font=\scriptsize,anchor=east] (line6) at (sent6.west) {sent2};
-\node [font=\scriptsize,anchor=east] (line7) at (sent7.west) {sent3};
-\node [font=\scriptsize,anchor=east] (line8) at (sent8.west) {sent4};
+\node [font=\footnotesize,anchor=east] (line5) at (sent5.west) {sent1};
+\node [font=\footnotesize,anchor=east] (line6) at (sent6.west) {sent2};
+\node [font=\footnotesize,anchor=east] (line7) at (sent7.west) {sent3};
+\node [font=\footnotesize,anchor=east] (line8) at (sent8.west) {sent4};
 \begin{pgfonlayer}{background}
 \node [rectangle,inner sep=-0.0em,draw] [fit = (sent1) (sent2) (sent3) (sent4)] (box1) {};
 \node [rectangle,inner sep=-0.0em,draw] [fit = (sent5) (sent6) (sent7) (sent8)] (box2) {};
 \end{pgfonlayer}
-\node [font=\scriptsize] (node1) at ([yshift=-3em]sent2.south) {Random generation};
-\node [font=\scriptsize] (node2) at ([yshift=-1em]sent8.south) {Sorted generation};
+\node [font=\footnotesize] (node1) at ([yshift=-3.4em]sent2.south) {Random generation};
+\node [font=\footnotesize] (node2) at ([yshift=-1em]sent8.south) {Sorted generation};
 \end{tikzpicture}
\ No newline at end of file
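The figure above compares batches built from randomly chosen sentences with batches built after sorting by sentence length. A rough way to see why the sorted variant wastes less computation is to count padding tokens. The sketch below is illustrative only: the helper names `make_batches` and `padding_tokens` and the sentence lengths are hypothetical and do not come from the book.

```python
import random
from typing import List, Sequence


def make_batches(lengths: Sequence[int], batch_size: int,
                 sort_by_length: bool, seed: int = 0) -> List[List[int]]:
    """Group sentence indices into batches, either randomly or by length."""
    order = list(range(len(lengths)))
    if sort_by_length:
        order.sort(key=lambda i: lengths[i])  # similar lengths end up together
    else:
        random.Random(seed).shuffle(order)    # random generation
    return [order[i:i + batch_size] for i in range(0, len(order), batch_size)]


def padding_tokens(lengths: Sequence[int], batches: List[List[int]]) -> int:
    """Count pad tokens needed to extend every sentence to its batch maximum."""
    total = 0
    for batch in batches:
        longest = max(lengths[i] for i in batch)
        total += sum(longest - lengths[i] for i in batch)
    return total


sentence_lengths = [2, 5, 1, 3, 4, 5, 4, 5]
random_pad = padding_tokens(sentence_lengths,
                            make_batches(sentence_lengths, 4, sort_by_length=False))
sorted_pad = padding_tokens(sentence_lengths,
                            make_batches(sentence_lengths, 4, sort_by_length=True))
```

In practice, length-sorted batches are often shuffled at the batch level so that training does not always see short sentences first. That step is omitted here.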
Book/Chapter7/Figures/figure-word-change.tex
 \begin{center}
-\centerline{Taking English as an example:}
-\vspace{0.5em}
 \begin{tikzpicture}
 \node [rounded corners=3pt,minimum width=10.0em,minimum height=2.0em,draw,thick,fill=green!5,font=\scriptsize,drop shadow,inner sep=0.5em] (left) at (0,0) {
 \begin{tabular}{c}
...