NiuTrans / Toy-MT-Introduction
Commit 2a3a3fe1, authored Dec 21, 2019 by 姜雨帆
Update Transformer & RNN
parent
89ecf9be
Showing 1 changed file with 255 additions and 156 deletions
Section06-Neural-Machine-Translation/section06.tex
...
...
@@ -3243,12 +3243,6 @@ $\textrm{``you''} = \argmax_{y} \textrm{P}(y|\textbf{s}_1, \alert{\textbf{C}})$
\end{frame}
%%%------------------------------------------------------------------------------------------------------------
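%%% Experimental results
\begin{frame}{Results}
%% Improvements brought by using the attention mechanism
%% No system in the major evaluation campaigns goes without attention; it has become standard
\end{frame}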
%%%------------------------------------------------------------------------------------------------------------
%%% GNMT
...
...
@@ -3494,6 +3488,46 @@ $\textrm{``you''} = \argmax_{y} \textrm{P}(y|\textbf{s}_1, \alert{\textbf{C}})$
\end{frame}
%%%------------------------------------------------------------------------------------------------------------
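%%% Experimental results
\begin{frame}{Results}
%% Improvements brought by using the attention mechanism
%% No system in the major evaluation campaigns goes without attention; it has become standard
\begin{itemize}
\item Early neural machine translation systems (e.g. RNNSearch) performed worse than statistical machine translation (PBMT)
\item Combining the attention mechanism with deep networks improved neural machine translation substantially
\item Although the networks became deeper, suitable architecture design and decoding strategies kept decoding fast
\end{itemize}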
{
\footnotesize
\begin{center}
\setlength{\tabcolsep}{3pt}
\renewcommand\arraystretch{1}
\begin{tabular}{l}
\begin{tabular}{lcccl}
\specialrule{1pt}{1pt}{1pt}
\multirow{2}{*}{System} & \multicolumn{2}{c}{BLEU} & \multicolumn{2}{c}{\multirow{2}{*}{CPU decoding time}} \\
\cline{2-3}
 & EN-DE & EN-FR & \\
\specialrule{0.6pt}{1pt}{1pt}
PBMT & 20.7 & 37.0 & \multicolumn{2}{c}{-} \\
RNNSearch & 16.5 & - & \multicolumn{2}{c}{-} \\
LSTM(6 layers) & - & 31.5 & \multicolumn{2}{c}{-} \\
Deep-Att & 20.6 & 37.7 & \multicolumn{2}{c}{-} \\
\specialrule{0.6pt}{1pt}{1pt}
GNMT & 24.6 & 39.0 & \multicolumn{2}{c}{0.2s per sentence} \\
\specialrule{1pt}{1pt}{1pt}
\end{tabular} \\
\addlinespace[-0.3ex]
\footnote *GNMT versus previous state-of-the-art models \\
\end{tabular}
\end{center}
}
\end{frame}
%%%------------------------------------------------------------------------------------------------------------
\section{Transformer}
%%%------------------------------------------------------------------------------------------------------------
...
...
@@ -4863,19 +4897,14 @@ x_{l+1} = x_l+\mathcal{F}(x_l)
\begin{frame}{Optimizer}
% Adam
% Learning rate scheduling
\end{frame}
%%%------------------------------------------------------------------------------------------------------------
\begin{frame}{Training configuration}
\begin{itemize}
\item \textbf{Optimizer}: Adam with $\beta_1=0.9$, $\beta_2=0.98$, $\epsilon=10^{-9}$
\item \textbf{Learning rate}: a warmup strategy is used. Early in training the learning rate grows gradually from a small initial value; once a set number of steps is reached, it gradually decreases
\begin{displaymath}
lrate=d_{\mathrm{model}}^{-0.5}\cdot \min (step^{-0.5},step\cdot \mathrm{warmup\_steps}^{-1.5})
\end{displaymath}
This mitigates instability early in training and keeps the distribution stable; warmup\_steps is typically set to 4000
\vspace{0.5em}
\only<1>{
...
...
@@ -4902,7 +4931,45 @@ x_{l+1} = x_l+\mathcal{F}(x_l)
\caption{}
\label{}
\end{figure}
}
\end{itemize}
\end{frame}
%%%------------------------------------------------------------------------------------------------------------
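%%% Worked example for the warmup schedule (illustrative only; d_model = 512 is an assumption, not stated on the slides)
\begin{frame}{Training configuration: a worked learning-rate example}
The warmup schedule on the previous slide can be checked with a small calculation. This is only a sketch: $d_{\mathrm{model}}=512$ is an assumed value that the slides do not specify, and $\mathrm{warmup\_steps}=4000$ is the typical setting quoted there.
\begin{itemize}
\item The two arguments of $\min$ coincide at $step=\mathrm{warmup\_steps}$, so the rate peaks there:
\begin{displaymath}
lrate_{\max}=512^{-0.5}\cdot 4000^{-0.5}\approx 0.0442\times 0.0158\approx 7.0\times 10^{-4}
\end{displaymath}
\item During warmup the rate grows linearly in $step$; at $step=1000$ it is a quarter of the peak:
\begin{displaymath}
lrate=512^{-0.5}\cdot 1000\cdot 4000^{-1.5}\approx 1.7\times 10^{-4}
\end{displaymath}
\item After warmup the rate decays as $step^{-0.5}$; at $step=16000$ it is half of the peak:
\begin{displaymath}
lrate=512^{-0.5}\cdot 16000^{-0.5}\approx 3.5\times 10^{-4}
\end{displaymath}
\end{itemize}
\end{frame}
%%%------------------------------------------------------------------------------------------------------------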
\begin{frame}{Training configuration}
\begin{itemize}
\item \textbf{Mini-batch training}: the loss is computed on a batch of training samples at a time, which improves training efficiency
\begin{itemize}
\item The mini-batch size is typically set to 2048 or 4096 (counted in tokens)
\item Sentences are usually sorted by length, and sentences of similar length are grouped into one batch; this reduces the amount of padding and improves training efficiency
\end{itemize}
\vspace{0.5em}
\begin{center}
\begin{tikzpicture}
\tikzstyle{snode} = [draw,inner sep=1pt,minimum width=3em,minimum height=0.5em,rounded corners=1pt,fill=green!30!white]
\tikzstyle{pnode} = [draw,inner sep=1pt,minimum width=1em,minimum height=0.5em,rounded corners=1pt]
\node [anchor=west,snode] (s1) at (0,0) {\tiny{}};
\node [anchor=north west,snode,minimum width=6.3em] (s2) at ([yshift=-0.3em]s1.south west) {\tiny{}};
\node [anchor=north west,snode,minimum width=2em] (s3) at ([yshift=-0.3em]s2.south west) {\tiny{}};
\node [anchor=east] (label1) at ([xshift=-0.8em,yshift=0.6em]s1.west) {\scriptsize{Shuffle:}};
\node [anchor=west,pnode,minimum width=3em] (p1) at ([xshift=0.3em]s1.east) {\tiny{}};
\node [anchor=west,pnode,minimum width=4em] (p3) at ([xshift=0.3em]s3.east) {\tiny{}};
\node [anchor=west,snode,minimum width=5em] (s4) at ([xshift=6em]p1.east) {\tiny{}};
\node [anchor=north west,snode,minimum width=5em] (s5) at ([yshift=-0.3em]s4.south west) {\tiny{}};
\node [anchor=north west,snode,minimum width=6.3em] (s6) at ([yshift=-0.3em]s5.south west) {\tiny{}};
\node [anchor=east] (label2) at ([xshift=-0.8em,yshift=0.6em]s4.west) {\scriptsize{Sorted:}};
\node [anchor=west,pnode,minimum width=1em] (p4) at ([xshift=0.3em]s4.east) {\tiny{}};
\node [anchor=west,pnode,minimum width=1em] (p5) at ([xshift=0.3em]s5.east) {\tiny{}};
\node [rectangle,inner sep=0.5em,rounded corners=2pt,very thick,dotted,draw=ugreen!80] [fit = (s1) (s3) (p1) (p3)] (box0) {};
\node [rectangle,inner sep=0.5em,rounded corners=2pt,very thick,dotted,draw=ugreen!80] [fit = (s4) (s6) (p4) (p5)] (box0) {};
\end{tikzpicture}
\end{center}
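% Worked padding example (illustrative only; the sentence lengths are hypothetical and merely mirror the figure above)
{\scriptsize
As a rough illustration of the figure above, with hypothetical sentence lengths: a shuffled batch with lengths $3, 6, 2$ is padded to length $6$, so $3+0+4=7$ of its $3\times 6=18$ positions (about $39\%$) are padding. A sorted batch with lengths $5, 5, 6$ needs only $1+1+0=2$ padded positions out of $18$ (about $11\%$).
}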
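\item<2-> \textbf{Dropout}: Dropout is added to prevent overfitting during training. It is applied in four places: the sum of word embeddings and positional encodings, the residual connections, the attention operation, and the feed-forward network. The dropout rate is typically set to 0.1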
...
...
@@ -4964,173 +5031,180 @@ x_{l+1} = x_l+\mathcal{F}(x_l)
\visible<3->{
%\vspace{-0.5em}
\begin{center}
\begin{tikzpicture}
\begin{scope}
\tikzstyle{rnnnode} = [minimum height=1.1em,minimum width=2.1em,inner sep=2pt,rounded corners=1pt,draw,fill=red!20];
\node [rnnnode,anchor=west] (h1) at (0,0) {\tiny{$\textbf{h}_1$}};
\node [rnnnode,anchor=west] (h2) at ([xshift=1em]h1.east) {\tiny{$\textbf{h}_2$}};
\node [rnnnode,anchor=west] (h3) at ([xshift=1em]h2.east) {\tiny{$\textbf{h}_3$}};
\node [rnnnode,anchor=north,fill=green!20] (e1) at ([yshift=-1em]h1.south) {\tiny{$e_x()$}};
\node [rnnnode,anchor=west,fill=green!20] (e2) at ([xshift=1em]e1.east) {\tiny{$e_x()$}};
\node [rnnnode,anchor=west,fill=green!20] (e3) at ([xshift=1em]e2.east) {\tiny{$e_x()$}};
\node [anchor=north,inner sep=2pt] (w1) at ([yshift=-0.6em]e1.south) {\tiny{你}};
\node [anchor=north,inner sep=2pt] (w2) at ([yshift=-0.6em]e2.south) {\tiny{好}};
\node [anchor=north,inner sep=2pt] (w3) at ([yshift=-0.6em]e3.south) {\tiny{EOS}};
%\node [anchor=south] (dot1) at ([xshift=0.4em,yshift=-0.7em]h1.south) {\tiny{...}};
%\node [anchor=south] (dot2) at ([xshift=-0.4em,yshift=-0.7em]h3.south) {\tiny{...}};
\draw [->] (w1.north) -- ([yshift=-0.1em]e1.south);
\draw [->] (w2.north) -- ([yshift=-0.1em]e2.south);
\draw [->] (w3.north) -- ([yshift=-0.1em]e3.south);
\draw [->] ([yshift=0.1em]e1.north) -- ([yshift=-0.1em]h1.south);
\draw [->] ([yshift=0.1em]e2.north) -- ([yshift=-0.1em]h2.south);
\draw [->] ([yshift=0.1em]e3.north) -- ([yshift=-0.1em]h3.south);
\draw [->] ([xshift=0.2em,yshift=0.1em]e1.north) .. controls +(north:0.3) and +(south:0.4) .. ([xshift=-0.3em,yshift=-0.1em]h2.south);
\draw [->] ([xshift=-0.2em,yshift=0.1em]e3.north) .. controls +(north:0.3) and +(south:0.4) .. ([xshift=0.3em,yshift=-0.1em]h2.south);
\draw [->] ([xshift=0.4em,yshift=-0.4em]h1.south) -- ([xshift=0.3em,yshift=-0.1em]h1.south);
\draw [->] ([xshift=0.8em,yshift=-0.4em]h1.south) -- ([xshift=0.6em,yshift=-0.1em]h1.south);
\draw [->] ([xshift=-0.4em,yshift=-0.4em]h3.south) -- ([xshift=-0.3em,yshift=-0.1em]h3.south);
\draw [->] ([xshift=-0.8em,yshift=-0.4em]h3.south) -- ([xshift=-0.6em,yshift=-0.1em]h3.south);
\node [anchor=south] (encoder) at ([xshift=-0.2em]h1.north west) {\scriptsize{\textbf{Encoder}}};
\visible<5->{
\node [rnnnode,anchor=west,fill=green!20] (t1) at ([xshift=3em]e3.east) {\tiny{$e_y()$}};
}
\visible<7->{
\node [rnnnode,anchor=west,fill=green!20] (t2) at ([xshift=1.5em]t1.east) {\tiny{$e_y()$}};
}
\visible<8->{
\node [rnnnode,anchor=west,fill=green!20] (t3) at ([xshift=1.5em]t2.east) {\tiny{$e_y()$}};
\node [rnnnode,anchor=west,fill=green!20] (t4) at ([xshift=1.5em]t3.east) {\tiny{$e_y()$}};
%\node [anchor=west,inner sep=2pt] (t5) at ([xshift=0.3em]t4.east) {\tiny{...}};
}
\visible<5->{
\node [rnnnode,anchor=south] (s1) at ([yshift=1em]t1.north) {\tiny{$\textbf{s}_1$}};
\node [rnnnode,anchor=south] (f1) at ([yshift=1em]s1.north) {\tiny{$\textbf{f}_1$}};
}
\visible<7->{
\node [rnnnode,anchor=south] (s2) at ([yshift=1em]t2.north) {\tiny{$\textbf{s}_2$}};
\node [rnnnode,anchor=south] (f2) at ([yshift=1em]s2.north) {\tiny{$\textbf{f}_2$}};
}
\visible<8->{
\node [rnnnode,anchor=south] (s3) at ([yshift=1em]t3.north) {\tiny{$\textbf{s}_3$}};
\node [rnnnode,anchor=south] (f3) at ([yshift=1em]s3.north) {\tiny{$\textbf{f}_3$}};
\node [rnnnode,anchor=south] (s4) at ([yshift=1em]t4.north) {\tiny{$\textbf{s}_4$}};
\node [rnnnode,anchor=south] (f4) at ([yshift=1em]s4.north) {\tiny{$\textbf{f}_4$}};
%\node [anchor=west,inner sep=2pt] (s5) at ([xshift=0.3em]s4.east) {\tiny{...}};
%\node [anchor=south] (dot3) at ([xshift=-0.4em,yshift=-0.7em]s3.south) {\tiny{...}};
\node [anchor=south] (dot4) at ([xshift=-0.4em,yshift=-0.7em]s4.south) {\tiny{...}};
}
\visible<5->{
\node [rnnnode,anchor=south,fill=blue!20] (o1) at ([yshift=1em]f1.north) {\tiny{softmax}};
\node [anchor=east] (decoder) at ([xshift=-0.3em,yshift=0.5em]o1.north west) {\scriptsize{\textbf{Decoder}}};
}
\visible<7->{
\node [rnnnode,anchor=south,fill=blue!20] (o2) at ([yshift=1em]f2.north) {\tiny{softmax}};
}
\visible<8->{
\node [rnnnode,anchor=south,fill=blue!20] (o3) at ([yshift=1em]f3.north) {\tiny{softmax}};
\node [rnnnode,anchor=south,fill=blue!20] (o4) at ([yshift=1em]f4.north) {\tiny{softmax}};
%\node [anchor=west,inner sep=2pt] (o5) at ([xshift=0.3em]o4.east) {\tiny{...}};
}
\visible<4->{
\node [anchor=north,inner sep=2pt] (wt1) at ([yshift=-0.6em]t1.south) {\tiny{EOS}};
}
\visible<6->{
\node [anchor=north,inner sep=2pt] (wt2) at ([yshift=-0.6em]t2.south) {\tiny{How}};
}
\visible<8->{
\node [anchor=north,inner sep=2pt] (wt3) at ([yshift=-0.8em]t3.south) {\tiny{are}};
\node [anchor=north,inner sep=2pt] (wt4) at ([yshift=-0.8em]t4.south) {\tiny{you}};
}
\visible<5->{
\node [anchor=center,inner sep=2pt] (wo1) at ([yshift=1.2em]o1.north) {\tiny{How}};
}
\visible<4->{
\node [anchor=south,inner sep=2pt] (wos1) at (wo1.north) {\tiny{\textbf{[step 1]}}};
}
\visible<7->{
\node [anchor=center,inner sep=2pt] (wo2) at ([yshift=1.2em]o2.north) {\tiny{are}};
}
\visible<6->{
\node [anchor=south,inner sep=2pt] (wos2) at (wo2.north) {\tiny{\textbf{[step 2]}}};
}
\visible<8->{
\node [anchor=center,inner sep=2pt] (wo3) at ([yshift=1.2em]o3.north) {\tiny{you}};
\node [anchor=south,inner sep=2pt] (wos3) at (wo3.north) {\tiny{\textbf{[step 3]}}};
\node [anchor=center,inner sep=2pt] (wo4) at ([yshift=1.2em]o4.north) {\tiny{EOS}};
\node [anchor=south,inner sep=2pt] (wos4) at (wo4.north) {\tiny{\textbf{[step 4]}}};
}
\visible<5->{
\foreach \x in {1}{
\draw [->] ([yshift=-0.7em]t\x.south) -- ([yshift=-0.1em]t\x.south);
\draw [->] ([yshift=0.1em]t\x.north) -- ([yshift=-0.1em]s\x.south);
\draw [->] ([yshift=0.1em]s\x.north) -- ([yshift=-0.1em]f\x.south);
\draw [->] ([yshift=0.1em]f\x.north) -- ([yshift=-0.1em]o\x.south);
\draw [->] ([yshift=0.1em]o\x.north) -- ([yshift=0.8em]o\x.north) node [pos=0.5,right] {\tiny{top1}};
}
}
\visible<7->{
\foreach \x in {2}{
\draw [->] ([yshift=-0.7em]t\x.south) -- ([yshift=-0.1em]t\x.south);
\draw [->] ([yshift=0.1em]t\x.north) -- ([yshift=-0.1em]s\x.south);
\draw [->] ([yshift=0.1em]s\x.north) -- ([yshift=-0.1em]f\x.south);
\draw [->] ([yshift=0.1em]f\x.north) -- ([yshift=-0.1em]o\x.south);
\draw [->] ([yshift=0.1em]o\x.north) -- ([yshift=0.8em]o\x.north) node [pos=0.5,right] {\tiny{top1}};
\draw [->] ([xshift=0.2em,yshift=0.1em]t1.north) .. controls +(north:0.3) and +(south:0.3) .. ([xshift=-0.3em,yshift=-0.1em]s2.south);
}
}
\visible<8->{
\foreach \x in {3,4}{
\draw [->] ([yshift=-0.7em]t\x.south) -- ([yshift=-0.1em]t\x.south);
\draw [->] ([yshift=0.1em]t\x.north) -- ([yshift=-0.1em]s\x.south);
\draw [->] ([yshift=0.1em]s\x.north) -- ([yshift=-0.1em]f\x.south);
\draw [->] ([yshift=0.1em]f\x.north) -- ([yshift=-0.1em]o\x.south);
\draw [->] ([yshift=0.1em]o\x.north) -- ([yshift=0.8em]o\x.north) node [pos=0.5,right] {\tiny{top1}};
%\draw [->] ([xshift=0.4em,yshift=0.1em]t1.north) .. controls +(north:0.25) and +(south:0.3) .. ([xshift=-0.6em,yshift=-0.1em]s3.south);
%\draw [->] ([xshift=0.2em,yshift=0.1em]t2.north) .. controls +(north:0.2) and +(south:0.4) .. ([xshift=-0.3em,yshift=-0.1em]s3.south);
\draw [->] ([xshift=-0.6em,yshift=-0.5em]s3.south) .. controls +(north:0) and +(south:0.2) .. ([xshift=-0.3em,yshift=-0.1em]s3.south);
\draw [->] ([xshift=-1.5em,yshift=-0.5em]s3.south) .. controls +(north:0) and +(south:0.15) .. ([xshift=-0.6em,yshift=-0.1em]s3.south);
}
}
\visible<6->{
\draw [->,thick,dotted] (wo1.east) .. controls +(east:1.0) and +(west:1.0) ..(wt2.west);
}
\visible<8->{
\draw [->,thick,dotted] (wo2.east) .. controls +(east:1.3) and +(west:1.1) ..(wt3.west);
\draw [->,thick,dotted] (wo3.east) .. controls +(east:1.1) and +(west:0.9) ..(wt4.west);
}
\visible<5->{
\node [circle,draw,anchor=south,inner sep=3pt,fill=orange!20] (c1) at ([yshift=2em]h2.north) {\tiny{$\textbf{C}_1$}};
\node [anchor=south] (c1label) at (c1.north) {\tiny{\textbf{Encoder-decoder attention: context}}};
\draw [->] (h1.north) .. controls +(north:0.6) and +(250:0.9) .. (c1.250);
\draw [->] (h2.north) .. controls +(north:0.6) and +(270:0.9) .. (c1.270);
\draw [->] (h3.north) .. controls +(north:0.6) and +(290:0.9) .. (c1.290);
\draw [->] ([yshift=0.3em]s1.west) .. controls +(west:1) and +(east:1) .. (c1.-30);
\draw [->] (c1.0) .. controls +(east:1) and +(west:1) .. ([yshift=0em]f1.west);
}
\visible<7->{
\node [circle,draw,anchor=north,inner sep=3pt,fill=orange!20] (c2) at ([yshift=-2em]t1.south) {\tiny{$\textbf{C}_2$}};
\draw [->] ([xshift=-0.7em]c2.west) -- ([xshift=-0.1em]c2.west);
\draw [->] ([xshift=0.1em]c2.east) .. controls +(east:0.6) and +(west:0.8) ..([yshift=-0.3em,xshift=-0.1em]f2.west);
}
\visible<8->{
\node [circle,draw,anchor=north,inner sep=3pt,fill=orange!20] (c3) at ([yshift=-2em]t2.south) {\tiny{$\textbf{C}_3$}};
\draw [->] ([xshift=-0.7em]c3.west) -- ([xshift=-0.1em]c3.west);
\draw [->] ([xshift=0.1em]c3.east) .. controls +(east:0.6) and +(west:0.8) ..([yshift=-0.3em,xshift=-0.1em]f3.west);
}
\visible<8->{
\node [circle,draw,anchor=north,inner sep=3pt,fill=orange!20] (c4) at ([yshift=-2em]t3.south) {\tiny{$\textbf{C}_4$}};
\draw [->] ([xshift=-0.7em]c4.west) -- ([xshift=-0.1em]c4.west);
\draw [->] ([xshift=0.1em]c4.east) .. controls +(east:0.6) and +(west:0.8) ..([yshift=-0.3em,xshift=-0.1em]f4.west);
}
\end{scope}
\end{tikzpicture}
\end{center}
}
\end{frame}
...
...
@@ -5458,7 +5532,32 @@ x_{l+1} = x_l+\mathcal{F}(x_l)
%%%------------------------------------------------------------------------------------------------------------
%%% open source NMT
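\begin{frame}{Some open-source NMT systems}
\begin{itemize}
\item Tensor2Tensor
\begin{itemize}
\item Developed by Google Brain; built on static computation graphs
\item Define first, then run: fast and amenable to optimization, but errors in the code are hard to track down
\item https://github.com/tensorflow/tensor2tensor
\end{itemize}
\vspace{0.2em}
\item Fairseq
\begin{itemize}
\item Developed by Facebook; built on dynamic computation graphs
\item Flexible and easy to debug; well suited to natural language processing
\item https://github.com/pytorch/fairseq
\end{itemize}
\vspace{0.2em}
\item NiuTrans.NMT
\begin{itemize}
\item Developed by NiuTrans; built on dynamic computation graphs
\item Small and simple, easy to modify; written in C, with highly optimized code
\item https://github.com/NiuTrans/NiuTensor
\end{itemize}
\vspace{0.2em}
\item Other notable open-source NMT systems: OpenNMT, THUMT
\end{itemize}
\end{frame}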
%%%------------------------------------------------------------------------------------------------------------
%%% last slide
...
...