Toy-MT-Introduction · Commit d5ac0ef2
authored Jan 04, 2020 by xiaotong
new updates
parent 90ed014e
Showing 3 changed files with 80 additions and 407 deletions
Section04-Phrasal-and-Syntactic-Models/section04-test.tex  +14 -363
Section04-Phrasal-and-Syntactic-Models/section04.tex  +20 -2
Section06-Neural-Machine-Translation/section06.tex  +46 -42
Section04-Phrasal-and-Syntactic-Models/section04-test.tex
...
...
@@ -149,373 +149,24 @@
\subsection{Introducing Bilingual Syntactic Information}

%%%------------------------------------------------------------------------------------------------------------
%%% tree-to-tree rule extraction
\begin{frame}{Introducing Bilingual Syntactic Information}

%%% translation features
\begin{frame}{Features}
\begin{itemize}
\item For tree-to-tree models, both the source and the target side carry syntax trees, so the translation process must be described by mappings between tree fragments; such mappings are called tree-to-tree translation rules. Here, we write \\
\vspace{-1.3em}
\begin{eqnarray}
\langle\ \textrm{VP},\textrm{VP}\ \rangle & \to & \langle\ \textrm{VP(PP}_{1}\ \textrm{VP(VV(表示) NN}_{2})), \nonumber \\
& & \ \ \textrm{VP(VBZ(was) VP(VBZ}_{2}\ \textrm{PP}_{1}))\ \rangle \nonumber
\end{eqnarray}
as a \alert{tree-fragment-to-tree-fragment} mapping \\
\vspace{-1.3em}
\begin{eqnarray}
& & \textrm{VP(PP}_{1}\ \textrm{VP(VV(表示) NN}_{2})) \nonumber \\
& \to & \textrm{VP(VBZ(was) VP(VBZ}_{2}\ \textrm{PP}_{1})) \nonumber
\end{eqnarray}
\item<2-> Tree-to-tree rules can be extracted by extending the GHKM method
\item As with phrase-based and hierarchical phrase-based models, syntactic models are built with a discriminative model - $\textrm{P}(d,\textbf{t}|\textbf{s}) = \frac{\exp(\sum_{i=1}^{M} \lambda_i \cdot h_i(d,\textbf{s},\textbf{t}))}{\sum_{d',t'} \exp(\sum_{i=1}^{M} \lambda_i \cdot h_i(d',\textbf{s},\textbf{t}'))}$, where the feature weights $\{\lambda_i\}$ can be tuned with minimum error rate training and the feature functions $\{h_i\}$ are user-defined.
\item<2-> Here, every rule takes the form $\langle\ \alpha_h, \beta_h\ \rangle \to \langle\ \alpha_r, \beta_r, \sim\ \rangle$
\begin{itemize}
\item Identify admissible nodes on both sides, then find the correspondences between them
\item Derive corresponding tree fragments from the aligned nodes, i.e., extract tree-to-tree rules
\item Rule composition, SPMT, and similar methods still apply
\end{itemize}
\end{itemize}
\end{frame}
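The discriminative model on this slide is a log-linear model over derivations. A minimal numeric sketch, with feature values, weights, and the helper name invented for illustration:

```python
import math

def loglinear_prob(h_d, h_all, weights):
    """P(d,t|s) = exp(sum_i lambda_i * h_i(d,s,t)) / sum over competing
    derivations d',t' of the same exponentiated score.

    h_d: feature vector of the derivation being scored;
    h_all: feature vectors of all competing derivations (including h_d);
    weights: the lambda_i, tuned by e.g. minimum error rate training."""
    score = lambda h: math.exp(sum(l * x for l, x in zip(weights, h)))
    return score(h_d) / sum(score(h) for h in h_all)

# Two competing derivations with two made-up features each
# (say, a rule log-probability and a rule count).
h1, h2 = [-1.0, 2.0], [-2.0, 3.0]
weights = [1.0, 0.5]
p1 = loglinear_prob(h1, [h1, h2], weights)
p2 = loglinear_prob(h2, [h1, h2], weights)
assert abs(p1 + p2 - 1.0) < 1e-9  # probabilities over derivations sum to 1
```

The denominator plays the role of the normalizer over all derivations $d'$ and translations $t'$; in a real decoder it is approximated over the search space rather than enumerated.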
%%%------------------------------------------------------------------------------------------------------------
%%% Method 1: inducing syntactic mappings from word alignments
\begin{frame}{Method 1: Inducing Tree-to-Tree Rules from Word Alignments}
\begin{itemize}
\item A simple, direct approach is to extend the GHKM method to the bilingual case and induce tree-to-tree mappings from word alignments
\begin{itemize}
\item<3-> However, word alignment errors often leave many rules unextractable
\end{itemize}
\end{itemize}
\begin{minipage}[c][5cm][t]{0.47\textwidth}
\begin{center}
\begin{tikzpicture}
\begin{scope}
\begin{scope}[scale=0.65, level distance=27pt]
\Tree [.S [.NP [.DT \node(ew1){the}; ] [.NNS \node(ew2){imports}; ] ] [.VP [.VBZ \node(ew3){have}; ] [.ADVP [.RB \node(ew4){drastically}; ] [.VBN \node(ew5){fallen}; ] ] ] ]
\end{scope}
\begin{scope}[scale=0.65, level distance=27pt, grow'=up, xshift=-13pt, yshift=-3.5in, sibling distance=22pt]
\Tree [.IP [.NN \node(cw1){进口}; ] [.VP [.AD \node(cw2){大幅度}; ] [.VP [.VV \node(cw3){下降}; ] [.AS \node(cw4){了}; ] ] ] ]
\end{scope}
\visible<2->{
\draw[-, dashed] (cw1) -- (ew2);
\draw[-, dashed] (cw2) -- (ew4);
\draw[-, dashed] (cw3) -- (ew5);
\draw[-, dashed] (cw4) .. controls +(north:1.0) and +(south:1.6) .. (ew1);
}
\visible<3->{
\draw[-, red, dashed, thick] (cw4) .. controls +(north:1.0) and +(south:1.6) .. (ew1);
}
\end{scope}
\end{tikzpicture}
\end{center}
\end{minipage}
\begin{minipage}[c][5cm][t]{0.50\textwidth}
\visible<2->{
\begin{tabular}{l l}
\multicolumn{2}{l}{\textbf{\scriptsize{Extracted rules}}} \\
\hline
\scriptsize{$r_1$} & \scriptsize{AS(了) $\rightarrow$ DT(the)} \\
\scriptsize{$r_2$} & \scriptsize{NN(进口) $\rightarrow$ NNS(imports)} \\
\scriptsize{$r_3$} & \scriptsize{AD(大幅度) $\rightarrow$ RB(drastically)} \\
\scriptsize{$r_4$} & \scriptsize{VV(下降) $\rightarrow$ VBN(fallen)} \\
\scriptsize{$r_5$} & \scriptsize{IP(NN$_1$ VP(AD$_2$ VP(VV$_3$ AS$_4$)) $\rightarrow$} \\
\multicolumn{2}{l}{\tiny{S(NP(DT$_4$ NNS$_1$) VP(VBZ(have) ADVP(RB$_2$ VBN$_3$))}} \\
\end{tabular}
}
\visible<3->{
\vspace{0.5em}
\begin{tabular}{l l}
\multicolumn{2}{l}{\textbf{\scriptsize{Rules that cannot be extracted}}} \\
\hline
\scriptsize{$r_{?}$} & \scriptsize{AS(了) $\rightarrow$ VBZ(have)} \\
\scriptsize{$r_{?}$} & \scriptsize{NN(进口) $\rightarrow$} \\
& \scriptsize{NP(DT(the) NNS(imports))} \\
\scriptsize{$r_{?}$} & \scriptsize{IP(NN$_1$ VP$_2$) $\rightarrow$ S(NP$_1$ VP$_2$)} \\
\end{tabular}
}
\end{minipage}
\end{frame}
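Why the erroneous 了-the link blocks rules on this slide can be made concrete: a (source node, target node) pair is only usable when no word-alignment link crosses its boundary. A simplified sketch of that consistency check over the slide's toy sentence pair; the word indices and helper name are our own:

```python
def consistent(src_span, tgt_span, links):
    """A (source node, target node) pair is consistent with the word
    alignment iff no link crosses the pair's boundary: every link that
    touches one span must land inside the other."""
    return all((s in src_span) == (t in tgt_span) for s, t in links)

# 进口(0) 大幅度(1) 下降(2) 了(3)  /  the(0) imports(1) have(2) drastically(3) fallen(4)
bad_links = {(0, 1), (1, 3), (2, 4), (3, 0)}   # includes the wrong 了-the link
good_links = {(0, 1), (1, 3), (2, 4), (3, 2)}  # 了 aligned to "have" instead

# NN(进口) vs NP(the imports): blocked by the bad link, fine without it.
print(consistent({0}, {0, 1}, bad_links))   # False - rule cannot be extracted
print(consistent({0}, {0, 1}, good_links))  # True
```

This mirrors the slide: with the wrong link, "the" is dragged to 了, so NN(进口) $\rightarrow$ NP(DT(the) NNS(imports)) is unextractable.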
%%%------------------------------------------------------------------------------------------------------------
%%% Method 2: aligning nodes directly, then inducing syntactic mappings
\begin{frame}{Method 2: Extracting Tree-to-Tree Rules from Node Alignments}
\begin{itemize}
\item Another idea is to obtain correspondences between source-tree and target-tree nodes directly and extract rules from them, which avoids word alignment errors
\begin{itemize}
\item Node alignment can capture bilingual structural correspondence more accurately
\end{itemize}
\end{itemize}
\begin{minipage}[c][5cm][t]{0.47\textwidth}
\begin{center}
\begin{tikzpicture}
\only<1>{
\begin{scope}
\begin{scope}[scale=0.65, level distance=27pt]
\Tree [.S [.NP [.DT \node(ew1){the}; ] [.NNS \node(ew2){imports}; ] ] [.VP [.VBZ \node(ew3){have}; ] [.ADVP [.RB \node(ew4){drastically}; ] [.VBN \node(ew5){fallen}; ] ] ] ]
\end{scope}
\begin{scope}[scale=0.65, level distance=27pt, grow'=up, xshift=-13pt, yshift=-3.5in, sibling distance=22pt]
\Tree [.IP [.NN \node(cw1){进口}; ] [.VP [.AD \node(cw2){大幅度}; ] [.VP [.VV \node(cw3){下降}; ] [.AS \node(cw4){了}; ] ] ] ]
\end{scope}
\draw[-, dashed] (cw1) -- (ew2);
\draw[-, dashed] (cw2) -- (ew4);
\draw[-, dashed] (cw3) -- (ew5);
\draw[-, dashed] (cw4) .. controls +(north:1.0) and +(south:1.6) .. (ew1);
\end{scope}
}
\begin{scope}
\visible<2->{
\begin{scope}[scale=0.65, level distance=27pt]
\Tree [.\node[draw](en1){S}; [.\node[draw](en2){NP}; [.DT the ] [.NNS imports ] ] [.\node[draw](en3){VP}; [.\node[draw](en4){VBZ}; have ] [.ADVP [.\node[draw](en5){RB}; drastically ] [.\node[draw](en6){VBN}; fallen ] ] ] ]
\end{scope}
\begin{scope}[scale=0.65, level distance=27pt, grow'=up, xshift=-13pt, yshift=-3.5in, sibling distance=22pt]
\Tree [.\node[draw](cn1){\ \ IP\ \ }; [.\node[draw](cn2){NN}; 进口 ] [.\node[draw](cn3){VP}; [.\node[draw](cn4){AD}; 大幅度 ] [.VP [.\node[draw](cn5){VV}; 下降 ] [.\node[draw](cn6){AS}; 了 ] ] ] ]
\end{scope}
}
\visible<3->{
\draw[latex-latex, dotted, thick, red] (cn4.east) .. controls +(east:0.5) and +(west:0.5) .. (en5.west);
\draw[latex-latex, dotted, thick, red] (cn5.east) .. controls +(east:0.5) and +(south:0.5) .. (en6.south west);
\draw[latex-latex, dotted, thick, red] (cn6.north west) .. controls +(north:1.5) and +(south:2.5) .. (en4.south west);
\draw[latex-latex, dotted, thick, red] (cn3.north west) -- (en3.south west);
\draw[latex-latex, dotted, thick, red] (cn2.west) .. controls +(west:0.6) and +(west:0.6) .. (en2.west);
\draw[latex-latex, dotted, thick, red] (cn1.north west) .. controls +(north:4) and +(south:5.5) .. (en1.south west);
}
\end{scope}
\end{tikzpicture}
\end{center}
\end{minipage}
\begin{minipage}[c][5cm][t]{0.50\textwidth}
\only<1>{
\begin{tabular}{l l}
\multicolumn{2}{l}{\textbf{\scriptsize{Extracted rules (word alignment)}}} \\
\hline
\scriptsize{$r_1$} & \scriptsize{AS(了) $\rightarrow$ DT(the)} \\
\scriptsize{$r_2$} & \scriptsize{NN(进口) $\rightarrow$ NNS(imports)} \\
\scriptsize{$r_3$} & \scriptsize{AD(大幅度) $\rightarrow$ RB(drastically)} \\
\scriptsize{$r_4$} & \scriptsize{VV(下降) $\rightarrow$ VBN(fallen)} \\
\scriptsize{$r_5$} & \scriptsize{IP(NN$_1$ VP(AD$_2$ VP(VV$_3$ AS$_4$)) $\rightarrow$} \\
\multicolumn{2}{l}{\tiny{S(NP(DT$_4$ NNS$_1$) VP(VBZ(have) ADVP(RB$_2$ VBN$_3$))}} \\
\end{tabular}
}
\visible<4->{
\begin{tabular}{l l}
\multicolumn{2}{l}{\textbf{\scriptsize{Extracted rules (subtree alignment)}}} \\
\hline
{\color{gray!70}\scriptsize{$r_1$}} & {\color{gray!70}\scriptsize{AS(了) $\rightarrow$ DT(the)}} \\
{\color{gray!70}\scriptsize{$r_2$}} & {\color{gray!70}\scriptsize{NN(进口) $\rightarrow$ NNS(imports)}} \\
\scriptsize{$r_3$} & \scriptsize{AD(大幅度) $\rightarrow$ RB(drastically)} \\
\scriptsize{$r_4$} & \scriptsize{VV(下降) $\rightarrow$ VBN(fallen)} \\
{\color{gray!70}\scriptsize{$r_5$}} & {\color{gray!70}\scriptsize{IP(NN$_1$ VP(AD$_2$ VP(VV$_3$ AS$_4$)) $\rightarrow$}} \\
\multicolumn{2}{l}{{\color{gray!70}\tiny{S(NP(DT$_4$ NNS$_1$) VP(VBZ(have) ADVP(RB$_2$ VBN$_3$))}}} \\
\alert{\scriptsize{$r_6$}} & \alert{\scriptsize{AS(了) $\rightarrow$ VBZ(have)}} \\
\alert{\scriptsize{$r_7$}} & \alert{\scriptsize{NN(进口) $\rightarrow$}} \\
& \alert{\scriptsize{NP(DT(the) NNS(imports))}} \\
\alert{\scriptsize{$r_8$}} & \alert{\scriptsize{VP(AD$_1$ VP(VV$_2$ AS$_3$)) $\rightarrow$}} \\
& \alert{\scriptsize{VP(VBZ$_3$ ADVP(RB$_1$ VBN$_2$)}} \\
\alert{\scriptsize{$r_9$}} & \alert{\scriptsize{IP(NN$_1$ VP$_2$) $\rightarrow$ S(NP$_1$ VP$_2$)}} \\
\end{tabular}
}
\end{minipage}
\end{frame}
%%%------------------------------------------------------------------------------------------------------------
%%% extracting more rules: the node alignment matrix
\begin{frame}{The Node Alignment Matrix}
\begin{itemize}
\item Node alignments can be obtained automatically: 1) with classification-based models; 2) with unsupervised node alignment methods
\item Another benefit of node alignment is that rules can be extracted directly from the node alignment matrix rather than from a single alignment
\begin{itemize}
\item The alignment matrix helps extract more diverse rules
\item $\alpha_h$ and $\beta_h$ are the source and target parts of the rule's left-hand side, corresponding to the root nodes of the tree structures
\item $\alpha_r$ and $\beta_r$ are the source and target parts of the rule's right-hand side, corresponding to the tree structures themselves
\item $\sim$ denotes the correspondence between the leaf nonterminals of $\alpha_r$ and $\beta_r$
\item In addition, define $r(\alpha_r)$ and $r(\beta_r)$ as the leaf node sequences of the source and target tree structures. For example, for the rule $\langle\ \textrm{VP},\textrm{VP}\ \rangle \to \langle\ \textrm{VP(PP}_{1}\ \textrm{VP(VV(表示) NN}_{2})), \textrm{VP(VBZ(was) VP(VBZ}_{2}\ \textrm{PP}_{1}))$, we have \\
\vspace{-1.5em}
\begin{eqnarray}
r(\alpha_r) & = & \textrm{PP}_1\ \textrm{表示 NN}_2 \nonumber \\
r(\beta_r) & = & \textrm{was}\ \textrm{VBZ}_2\ \textrm{PP}_1 \nonumber
\end{eqnarray}
\end{itemize}
\end{itemize}
\vspace{-0.2em}
\centering
\begin{tikzpicture}
\begin{scope}[scale=0.7]
\begin{scope}[sibling distance=17pt, level distance=25pt]
\Tree [.\node(en1){VP$^{[1]}$}; [.\node(en2){VBZ$^{[2]}$}; have ] [.\node(en3){ADVP$^{[3]}$}; [.\node(en4){RB$^{[4]}$}; drastically ] [.\node(en5){VBN$^{[5]}$}; fallen ] ] ]
\end{scope}
\begin{scope}[grow'=up, yshift=-2.7in, sibling distance=32pt, level distance=25pt]
\Tree [.\node(cn1){VP$^{[1]}$}; [.\node(cn2){AD$^{[2]}$}; 大幅度 ] [.\node(cn3){VP$^{[3]}$}; [.\node(cn4){VV$^{[4]}$}; 下降 ] [.\node(cn5){AS$^{[5]}$}; 了 ] ] ]
\end{scope}
\begin{scope}[xshift=1.7in, yshift=-0.4in]
\node[anchor=west, rotate=60] at (0.8,-0.6) {VP$^{[1]}$};
\node[anchor=west, rotate=60] at (1.8,-0.6) {VBZ$^{[2]}$};
\node[anchor=west, rotate=60] at (2.8,-0.6) {ADVP$^{[3]}$};
\node[anchor=west, rotate=60] at (3.8,-0.6) {RB$^{[4]}$};
\node[anchor=west, rotate=60] at (4.8,-0.6) {VBN$^{[5]}$};
\node[] at (6.2,-1) {VP$^{[1]}$};
\node[] at (6.2,-2) {AD$^{[2]}$};
\node[] at (6.2,-3) {VP$^{[3]}$};
\node[] at (6.2,-4) {VV$^{[4]}$};
\node[] at (6.2,-5) {AS$^{[5]}$};
\foreach \i in {1,...,5}{
\foreach \j in {-5,...,-1}{
\node[fill=blue, scale=0.2] at (\i,\j) {};
}
}
\visible<2-3>{
\node[fill=blue, scale=1.2] at (1,-1) {};
\node[fill=blue, scale=1.2] at (4,-2) {};
\node[fill=blue, scale=1.2] at (2,-5) {};
}
\visible<2>{
\node[fill=blue, scale=1.2] at (5,-4) {};
}
\visible<3>{
\node[fill=red, scale=1.2] at (5,-4) {};
}
\visible<4-5>{
\node[fill=blue, scale=1.1] at (1,-1) {};
\node[fill=blue, scale=0.5] at (1,-3) {};
\node[fill=blue, scale=0.6] at (2,-2) {};
\node[fill=blue, scale=0.7] at (2,-3) {};
\node[fill=blue, scale=0.7] at (2,-5) {};
\node[fill=blue, scale=0.4] at (3,-1) {};
\node[fill=blue, scale=0.6] at (3,-2) {};
\node[fill=blue, scale=0.5] at (3,-3) {};
\node[fill=blue, scale=0.9] at (4,-2) {};
\node[fill=blue, scale=0.7] at (5,-3) {};
\node[fill=blue, scale=0.4] at (5,-5) {};
}
\visible<4>{
\node[fill=blue, scale=0.6] at (3,-4) {};
\node[fill=blue, scale=0.8] at (5,-4) {};
}
\visible<5>{
\node[fill=red, scale=0.6] at (3,-4) {};
\node[fill=red, scale=0.8] at (5,-4) {};
}
\visible<2-3>{
\node[] at (4,-5.8) {\footnotesize{{\color{blue}$\blacksquare$} = extractable node-pair}};
}
\visible<4-5>{
\node[] at (4,-5.8) {\footnotesize{{\color{blue}$\blacksquare$} = possible alignment}};
}
\end{scope}
\visible<3>{
\draw[<->, red, thick] (cn4.east) .. controls +(east:0.9) and +(west:0.9) .. (en5.west);
}
\visible<5>{
\draw[<->, red, dotted, very thick] (cn4.east) .. controls +(east:0.9) and +(west:0.9) .. (en5.west);
}
\visible<5>{
\draw[<->, red, dotted, very thick] (cn4.west) .. controls +(west:1.0) and +(west:2) .. (en3.west);
}
\end{scope}
\end{tikzpicture}
\end{frame}
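Extracting from the node alignment matrix instead of a single 1-best alignment can be sketched as a simple threshold over soft scores. The scores below are invented for illustration; they are not the values behind the slide's figure:

```python
# Soft node-alignment scores between the 5 target-tree nodes (rows)
# and 5 source-tree nodes (columns); all numbers are made up.
matrix = [
    [0.9, 0.0, 0.3, 0.0, 0.0],   # VP[1]
    [0.0, 0.5, 0.6, 0.0, 0.0],   # AD[2]
    [0.4, 0.6, 0.5, 0.0, 0.7],   # VP[3]
    [0.0, 0.0, 0.6, 0.0, 0.8],   # VV[4]
    [0.0, 0.7, 0.0, 0.0, 0.4],   # AS[5]
]

def alignable_pairs(matrix, threshold=0.5):
    """Keep every node pair whose score clears the threshold, instead of
    committing to a single 1-best node alignment."""
    return [(i, j)
            for i, row in enumerate(matrix)
            for j, score in enumerate(row)
            if score > threshold]

pairs = alignable_pairs(matrix)
# 7 pairs survive here, versus the 5 links a 1-best alignment would give,
# so more tree-to-tree rules can be anchored on them.
```

A real system would weight the extracted rules by these scores rather than treat all surviving pairs equally; the hard threshold is only the simplest variant.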
%%%------------------------------------------------------------------------------------------------------------
...
...
Section04-Phrasal-and-Syntactic-Models/section04.tex
...
...
@@ -4287,6 +4287,8 @@ NP-BAR(NN$_1$ NP-BAR$_2$) $\to$ NN$_1$ NP-BAR$_2$
\end{scope}
\begin{scope}[xshift=1.7in, yshift=-0.4in]
{
\footnotesize
\node[anchor=west, rotate=60] at (0.8,-0.6) {VP$^{[1]}$};
\node[anchor=west, rotate=60] at (1.8,-0.6) {VBZ$^{[2]}$};
\node[anchor=west, rotate=60] at (2.8,-0.6) {ADVP$^{[3]}$};
...
...
@@ -4298,6 +4300,7 @@ NP-BAR(NN$_1$ NP-BAR$_2$) $\to$ NN$_1$ NP-BAR$_2$
\node[] at (6.2,-3) {VP$^{[3]}$};
\node[] at (6.2,-4) {VV$^{[4]}$};
\node[] at (6.2,-5) {AS$^{[5]}$};
}
\foreach \i in {1,...,5}{
\foreach \j in {-5,...,-1}{
...
...
@@ -4370,8 +4373,23 @@ NP-BAR(NN$_1$ NP-BAR$_2$) $\to$ NN$_1$ NP-BAR$_2$
%%%------------------------------------------------------------------------------------------------------------
%%% translation features
\begin{frame}{Translation Features}

% NiuTrans Manual
\begin{frame}{Features}
\begin{itemize}
\item As with phrase-based and hierarchical phrase-based models, syntactic models are built with a discriminative model - $\textrm{P}(d,\textbf{t}|\textbf{s}) = \frac{\exp(\sum_{i=1}^{M} \lambda_i \cdot h_i(d,\textbf{s},\textbf{t}))}{\sum_{d',t'} \exp(\sum_{i=1}^{M} \lambda_i \cdot h_i(d',\textbf{s},\textbf{t}'))}$, where the feature weights $\{\lambda_i\}$ can be tuned with minimum error rate training and the feature functions $\{h_i\}$ are user-defined.
\item<2-> Here, every rule takes the form $\langle\ \alpha_h, \beta_h\ \rangle \to \langle\ \alpha_r, \beta_r, \sim\ \rangle$
\begin{itemize}
\item $\alpha_h$ and $\beta_h$ are the source and target parts of the rule's left-hand side, corresponding to the root nodes of the tree structures
\item $\alpha_r$ and $\beta_r$ are the source and target parts of the rule's right-hand side, corresponding to the tree structures themselves
\item $\sim$ denotes the correspondence between the leaf nonterminals of $\alpha_r$ and $\beta_r$
\item In addition, define $r(\alpha_r)$ and $r(\beta_r)$ as the leaf node sequences of the source and target tree structures. For example, for the rule $\langle\ \textrm{VP},\textrm{VP}\ \rangle \to \langle\ \textrm{VP(PP}_{1}\ \textrm{VP(VV(表示) NN}_{2})), \textrm{VP(VBZ(was) VP(VBZ}_{2}\ \textrm{PP}_{1}))$, we have \\
\vspace{-1.5em}
\begin{eqnarray}
r(\alpha_r) & = & \textrm{PP}_1\ \textrm{表示 NN}_2 \nonumber \\
r(\beta_r) & = & \textrm{was}\ \textrm{VBZ}_2\ \textrm{PP}_1 \nonumber
\end{eqnarray}
\end{itemize}
\end{itemize}
\end{frame}
%%%------------------------------------------------------------------------------------------------------------
...
...
Section06-Neural-Machine-Translation/section06.tex
...
...
@@ -1148,10 +1148,10 @@ NLP问题的隐含结构假设 & 无隐含结构假设,端到端学习 \\
\hspace*{-0.6cm}
\begin{tikzpicture}
\setlength{\base}{0.9cm}
\tikzstyle{rnnnode} = [rounded corners=1pt,minimum height=0.5\base,minimum width=1\base,draw,inner sep=0pt,outer sep=0pt]
\tikzstyle{wordnode} = [font=\tiny]
% RNN translation model
\begin{scope}[local bounding box=RNNMT]
% RNN Encoder
...
...
@@ -1165,11 +1165,11 @@ NLP问题的隐含结构假设 & 无隐含结构假设,端到端学习 \\
\node[rnnnode,fill=purple!30!white] (enclabel3) at (enc3) {\tiny{$\textbf{h}_{m}$}};
\node[wordnode,left=0.4\base of enc1] (init1) {$\cdots$};
\node[wordnode,left=0.4\base of eemb1] (init2) {$\cdots$};
\node[wordnode,below=0pt of eemb1] () {走};
\node[wordnode,below=0pt of eemb2] () {吗};
\node[wordnode,below=0pt of eemb3] () {$\langle$ eos $\rangle$};
% RNN Decoder
\foreach \x in {1,2,...,3}
\node[rnnnode,minimum height=0.5\base,fill=green!30!white,anchor=south] (demb\x) at ([yshift=\base]enc\x.north) {\tiny{$e_y()$}};
...
...
@@ -1180,7 +1180,7 @@ NLP问题的隐含结构假设 & 无隐含结构假设,端到端学习 \\
\node[wordnode,right=0.4\base of demb3] (end1) {$\cdots$};
\node[wordnode,right=0.4\base of dec3] (end2) {$\cdots$};
\node[wordnode,right=0.4\base of softmax3] (end3) {$\cdots$};
% Decoder input words
\node[wordnode,below=0pt of demb1] (decwordin) {$\langle$ sos $\rangle$};
\ExtractX{$(demb2.south)$}
...
...
@@ -1189,7 +1189,7 @@ NLP问题的隐含结构假设 & 无隐含结构假设,端到端学习 \\
\ExtractX{$(demb3.south)$}
\ExtractY{$(decwordin.base)$}
\node[wordnode,anchor=base] () at (\XCoord,\YCoord) {you};
% Decoder output words
\node[wordnode,above=0pt of softmax1] (decwordout) {Do};
\ExtractX{$(softmax2.north)$}
...
...
@@ -1198,7 +1198,7 @@ NLP问题的隐含结构假设 & 无隐含结构假设,端到端学习 \\
\ExtractX{$(softmax3.north)$}
\ExtractY{$(decwordout.base)$}
\node[wordnode,anchor=base] () at (\XCoord,\YCoord) {know};
% Connections
\draw[-latex'] (init1.east) to (enc1.west);
\draw[-latex'] (dec3.east) to (end2.west);
...
...
@@ -1213,11 +1213,11 @@ NLP问题的隐含结构假设 & 无隐含结构假设,端到端学习 \\
\draw[-latex'] (enc\x.east) to (enc\y.west);
\draw[-latex'] (dec\x.east) to (dec\y.west);
}
\coordinate (bridge) at ([yshift=0.4\base]enc2.north west);
\draw[-latex'] (enc3.north) .. controls +(north:0.3\base) and +(east:\base) .. (bridge) .. controls +(west:2.7\base) and +(west:0.3\base) .. (dec1.west);
\end{scope}
\begin{scope}
\coordinate (start) at (5.8\base,0.3\base);
\visible<2->{
...
...
@@ -1240,9 +1240,9 @@ NLP问题的隐含结构假设 & 无隐含结构假设,端到端学习 \\
-2 & .3 & \cdots & .1 \end{bmatrix}$}};
\draw[decorate,decoration={brace,mirror}] ([shift={(6pt,2pt)}]mat.south west) to node [auto,swap,font=\scriptsize] {word embedding matrix} ([shift={(-6pt,2pt)}]mat.south east);
\visible<3->{
\draw[-latex'] ([xshift=-2pt,yshift=-0.65cm]one.east) to ([yshift=-0.65cm]words.west);
}
...
...
@@ -1251,10 +1251,10 @@ NLP问题的隐含结构假设 & 无隐含结构假设,端到端学习 \\
}
\draw[-latex'] ([yshift=-0.4cm]w.south) to ([yshift=2pt]w.south);
\node[anchor=north] (wlabel) at ([yshift=-0.6em]w.south) {\scriptsize{input word}};
\node[draw=ugreen,densely dashed,thick,rounded corners=3pt,fit=(one) (words) (mat) (w)] (input) {};
\end{scope}
\draw[->,thick,densely dashed,ugreen] ([yshift=-0.2em]demb3.east) to [out=0,in=180] ([yshift=-1cm]input.west);
\end{tikzpicture}
\end{center}
...
...
@@ -1275,10 +1275,10 @@ NLP问题的隐含结构假设 & 无隐含结构假设,端到端学习 \\
\hspace*{-0.6cm}
\begin{tikzpicture}
\setlength{\base}{0.9cm}
\tikzstyle{rnnnode} = [rounded corners=1pt,minimum height=0.5\base,minimum width=1\base,draw,inner sep=0pt,outer sep=0pt]
\tikzstyle{wordnode} = [font=\tiny]
% RNN translation model
\begin{scope}[local bounding box=RNNMT]
% RNN Encoder
...
...
@@ -1292,11 +1292,11 @@ NLP问题的隐含结构假设 & 无隐含结构假设,端到端学习 \\
\node[rnnnode,fill=purple!30!white] (enclabel3) at (enc3) {\tiny{$\textbf{h}_{m}$}};
\node[wordnode,left=0.4\base of enc1] (init1) {$\cdots$};
\node[wordnode,left=0.4\base of eemb1] (init2) {$\cdots$};
\node[wordnode,below=0pt of eemb1] () {走};
\node[wordnode,below=0pt of eemb2] () {吗};
\node[wordnode,below=0pt of eemb3] () {$\langle$ eos $\rangle$};
% RNN Decoder
\foreach \x in {1,2,...,3}
\node[rnnnode,minimum height=0.5\base,fill=green!30!white,anchor=south] (demb\x) at ([yshift=\base]enc\x.north) {\tiny{$e_y()$}};
...
...
@@ -1307,7 +1307,7 @@ NLP问题的隐含结构假设 & 无隐含结构假设,端到端学习 \\
\node[wordnode,right=0.4\base of demb3] (end1) {$\cdots$};
\node[wordnode,right=0.4\base of dec3] (end2) {$\cdots$};
\node[wordnode,right=0.4\base of softmax3] (end3) {$\cdots$};
% Decoder input words
\node[wordnode,below=0pt of demb1] (decwordin) {$\langle$ sos $\rangle$};
\ExtractX{$(demb2.south)$}
...
...
@@ -1316,7 +1316,7 @@ NLP问题的隐含结构假设 & 无隐含结构假设,端到端学习 \\
\ExtractX{$(demb3.south)$}
\ExtractY{$(decwordin.base)$}
\node[wordnode,anchor=base] () at (\XCoord,\YCoord) {you};
% Decoder output words
\node[wordnode,above=0pt of softmax1] (decwordout) {Do};
\ExtractX{$(softmax2.north)$}
...
...
@@ -1325,7 +1325,7 @@ NLP问题的隐含结构假设 & 无隐含结构假设,端到端学习 \\
\ExtractX{$(softmax3.north)$}
\ExtractY{$(decwordout.base)$}
\node[wordnode,anchor=base] () at (\XCoord,\YCoord) {know};
% Connections
\draw[-latex'] (init1.east) to (enc1.west);
\draw[-latex'] (dec3.east) to (end2.west);
...
...
@@ -1340,20 +1340,20 @@ NLP问题的隐含结构假设 & 无隐含结构假设,端到端学习 \\
\draw[-latex'] (enc\x.east) to (enc\y.west);
\draw[-latex'] (dec\x.east) to (dec\y.west);
}
\coordinate (bridge) at ([yshift=0.4\base]enc2.north west);
\draw[-latex'] (enc3.north) .. controls +(north:0.3\base) and +(east:\base) .. (bridge) .. controls +(west:2.7\base) and +(west:0.3\base) .. (dec1.west);
\end{scope}
\begin{scope}
\coordinate (start) at (8.5\base,0.1\base);
\node[anchor=center,minimum width=5.7em,minimum height=1.3em,draw,rounded corners=0.3em] (hidden) at (start) {};
\node[anchor=west,minimum width=1em,minimum size=1em,fill=ugreen!20] (cell01) at ([xshift=0.2em]hidden.west) {\scriptsize{.2}};
\node[anchor=west,minimum width=1em,minimum size=1em,fill=ugreen!10] (cell02) at (cell01.east) {\scriptsize{-1}};
\node[anchor=west,minimum width=1em,minimum size=1em,fill=white] (cell03) at (cell02.east) {\scriptsize{$\cdots$}};
\node[anchor=west,minimum width=1em,minimum size=1em,fill=ugreen!50] (cell04) at (cell03.east) {\scriptsize{5}};
\visible<2->{
\node[anchor=south,minimum width=10.9em,minimum height=1.3em,draw,rounded corners=0.3em] (target) at ([yshift=1.5em]hidden.north) {};
\node[anchor=west,minimum width=1em,minimum size=1em,fill=ugreen!10] (cell11) at ([xshift=0.2em]target.west) {\scriptsize{-2}};
...
...
@@ -1365,7 +1365,7 @@ NLP问题的隐含结构假设 & 无隐含结构假设,端到端学习 \\
\node[anchor=west,minimum width=1em,minimum size=1em,fill=ugreen!10] (cell17) at (cell16.east) {\scriptsize{-1}};
\node[anchor=west,minimum width=1em,minimum size=1em,fill=ugreen!20] (cell18) at (cell17.east) {\scriptsize{.2}};
}
\visible<3->{
\node[anchor=south,minimum width=1em,minimum height=0.2em,fill=ublue!80,inner sep=0pt] (label1) at ([yshift=2.5em]cell11.north) {};
\node[anchor=west,rotate=90,font=\tiny] (w1) at (label1.north) {$\langle$ eos $\rangle$};
...
...
@@ -1389,15 +1389,15 @@ NLP问题的隐含结构假设 & 无隐含结构假设,端到端学习 \\
\node[anchor=south,minimum width=1em,minimum height=0.4em,fill=ublue!80,inner sep=0pt] (label8) at ([yshift=2.5em]cell18.north) {};
\node[anchor=west,rotate=90,font=\tiny] (w8) at (label8.north) {have};
}
\visible<2->{
\filldraw[fill=red!20,draw=white] (target.south west) -- (target.south east) -- ([xshift=-0.2em,yshift=0.1em]hidden.north east) -- ([xshift=0.2em,yshift=0.1em]hidden.north west);
\draw[->,thick] ([xshift=0.2em,yshift=0.1em]hidden.north west) -- (target.south west);
\draw[->,thick] ([xshift=-0.2em,yshift=0.1em]hidden.north east) -- (target.south east);
\node[anchor=south] () at ([yshift=0.3em]hidden.north) {\scriptsize{$\hat{s} = Ws$}};
}
\visible<3->{
\node[rounded corners=0.3em] (softmax) at ([yshift=1.25em]target.north) {\scriptsize{$p(\hat{s}_i)=\frac{e^{\hat{s}_i}}{\sum_j e^{\hat{s}_j}}$}};
\filldraw[fill=blue!20,draw=white] ([yshift=0.1em]cell11.north west) {[rounded corners=0.3em] -- (softmax.west)} -- (label1.south west) -- (label8.south east) {[rounded corners=0.3em] -- (softmax.east)} -- ([yshift=0.1em]cell18.north east) -- ([yshift=0.1em]cell11.north west);
...
...
@@ -1407,11 +1407,11 @@ NLP问题的隐含结构假设 & 无隐含结构假设,端到端学习 \\
\visible<4->{
\draw[-latex'] (w5.east) to ([yshift=0.3cm]w5.east);
}
\coordinate (tmp) at ([yshift=-3pt]w5.east);
\node[draw=red,thick,densely dashed,rounded corners=3pt,inner sep=5pt,fit=(cell01) (cell11) (label1) (label8) (target) (hidden) (tmp)] (output) {};
\end{scope}
\draw[->,thick,densely dashed,red] ([yshift=-0.2em]softmax3.east) .. controls +(east:2\base) and +(west:\base) .. (output.west);
\end{tikzpicture}
\end{center}
...
...
@@ -1600,7 +1600,7 @@ NLP问题的隐含结构假设 & 无隐含结构假设,端到端学习 \\
\end{center}
{
\scriptsize
\begin{tabular}{l}
* $x_t$: output of the layer above, $h_t$: hidden state of the same layer at the previous time step \\
* $x_t$: output of the preceding layer, $h_t$: hidden state of the same layer at the previous time step \\
* $c_t$: memory of the same layer at the previous time step
\end{tabular}
}
\end{frame}
...
...
@@ -2546,7 +2546,7 @@ $\textrm{``you''} = \argmax_{y} \textrm{P}(y|\textbf{s}_1, \alert{\textbf{C}})$
\item When translating ``world'', ``世界'' receives a large weight
\end{enumerate}
\item Mutually translated words usually receive large attention weights
\item The attention weights contain word alignment information
\item The attention weights reflect, to some extent, the correspondence between words
\end{itemize}
\begin{center}
\hspace*{\fill}
...
...
@@ -2823,8 +2823,8 @@ $\textrm{``you''} = \argmax_{y} \textrm{P}(y|\textbf{s}_1, \alert{\textbf{C}})$
\begin{itemize}
\item The LSTM forget-gate bias is initialized to 1, i.e., always choosing to forget the memory $c$, which effectively prevents wrong signals contained in the initial $c$ from propagating to all later time steps
\item The other biases in the network are generally initialized to 0, which effectively prevents an overly large or small bias from pushing the activation function's output into the ``saturation zone'', i.e., the region where the gradient is close to 0, leaving training stuck in a local minimum from the very start
\item The weight matrices $W$ of the network are generally initialized with the Xavier method, which effectively stabilizes training, especially for relatively ``deep'' networks
$$ W \sim \mathcal{U}(-\sqrt{\frac{6}{d_{\mathrm{in}}+d_{\mathrm{out}}}}, \sqrt{\frac{6}{d_{\mathrm{in}}+d_{\mathrm{out}}}}) $$
\item $d_{\mathrm{in}}$ and $d_{\mathrm{out}}$ are the input and output dimensions of $W$; see the classic paper \\
\item<2-> The weight matrices $W$ of the network are generally initialized with the Xavier method, which effectively stabilizes training, especially for relatively ``deep'' networks
$$ W \sim \mathcal{U}(-\sqrt{\frac{6}{d_{\mathrm{in}}+d_{\mathrm{out}}}}, \sqrt{\frac{6}{d_{\mathrm{in}}+d_{\mathrm{out}}}}) $$
$d_{\mathrm{in}}$ and $d_{\mathrm{out}}$ are the input and output dimensions of $W$; see the paper \\
\textbf{Understanding the difficulty of training deep feedforward neural networks} \\
\textbf{Glorot, X., \& Bengio, Y., 2010, In Proc of AISTATS}
\end{itemize}
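The Xavier formula on this slide can be sketched directly; the function name is ours, and real toolkits provide equivalent initializers:

```python
import random

def xavier_uniform(d_in, d_out, rng=random):
    """W ~ U(-sqrt(6/(d_in+d_out)), +sqrt(6/(d_in+d_out))),
    the uniform Xavier/Glorot initialization from the slide."""
    bound = (6.0 / (d_in + d_out)) ** 0.5
    return [[rng.uniform(-bound, bound) for _ in range(d_out)]
            for _ in range(d_in)]

W = xavier_uniform(512, 256)
bound = (6.0 / (512 + 256)) ** 0.5
# every entry lies inside the Xavier interval
assert all(-bound <= w <= bound for row in W for w in row)
```

The bound shrinks as the layer gets wider, which keeps the variance of activations roughly constant from layer to layer, the property the method is designed around.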
...
...
@@ -2844,7 +2844,7 @@ $\textrm{``you''} = \argmax_{y} \textrm{P}(y|\textbf{s}_1, \alert{\textbf{C}})$
\end{tabular}
\end{center}
\item So if you need a quick first look at a model's rough performance, choose Adam
\item If you need the best possible result on a task, choose SGD
\item<2-> If you need the best possible result on a task, choose SGD
\begin{itemize}
\item Note that when training RNNs we often run into exploding gradients, where the gradient suddenly becomes very large; in that case ``gradient clipping'' is needed to keep the gradient $\pi$ from exceeding a threshold
$$ \pi' = \pi \cdot \frac{\mathrm{threshold}}{\max(\mathrm{threshold}, \parallel \pi \parallel_2)} $$
\item where $\mathrm{threshold}$ is a manually set threshold on the gradient magnitude, and $\parallel \cdot \parallel_2$ is the L2 norm
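The clipping formula above can be sketched in a few lines; the helper name is ours:

```python
def clip_gradient(grad, threshold):
    """pi' = pi * threshold / max(threshold, ||pi||_2), as on the slide."""
    norm = sum(g * g for g in grad) ** 0.5
    scale = threshold / max(threshold, norm)
    return [g * scale for g in grad]

g = [3.0, 4.0]                         # ||g||_2 = 5
clipped = clip_gradient(g, threshold=1.0)
# the norm is rescaled down to the threshold; the direction is preserved
assert abs(sum(x * x for x in clipped) ** 0.5 - 1.0) < 1e-9
# gradients already under the threshold pass through unchanged
assert clip_gradient([0.3, 0.4], 1.0) == [0.3, 0.4]
```

Because the `max` in the denominator equals the norm only when the norm exceeds the threshold, small gradients are left untouched, which is what makes this a clip rather than a blanket rescaling.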
...
...
@@ -2858,9 +2858,11 @@ $\textrm{``you''} = \argmax_{y} \textrm{P}(y|\textbf{s}_1, \alert{\textbf{C}})$
\item Different optimizers need different learning rates: Adam typically uses $0.001$ or $0.0001$, while SGD picks a value between $0.1 \sim 1$
\item But whichever optimizer is used, to train both fast and well we usually need to adjust the learning rate according to the current number of updates
\begin{itemize}
\item Learning rate warmup: early in training, gradients are usually large, and starting with a large learning rate easily throws the model off course, so the learning rate should grow from small to large
\item Learning rate decay: as training approaches convergence, a large learning rate easily makes the model skip past the local minimum, so the learning rate should shrink gradually to home in on it
\item<2-> Learning rate warmup: early in training, gradients are usually large, and starting with a large learning rate easily throws the model off course, so the learning rate should grow from small to large
\item<2-> Learning rate decay: as training approaches convergence, a large learning rate easily makes the model skip past the local minimum, so the learning rate should shrink gradually to home in on it
\end{itemize}
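Warmup followed by decay can be sketched as a single schedule function; all constants here are illustrative, not the ones used in the slides:

```python
def learning_rate(step, base_lr=0.001, warmup=4000, decay=0.5, decay_every=20000):
    """Linear warmup to base_lr, then stepwise decay: halve the rate
    every decay_every updates after warmup finishes."""
    if step < warmup:                     # warmup: grow linearly from ~0
        return base_lr * step / warmup
    return base_lr * decay ** ((step - warmup) // decay_every)

assert abs(learning_rate(2000) - 0.0005) < 1e-12    # halfway through warmup
assert abs(learning_rate(4000) - 0.001) < 1e-12     # warmup finished
assert abs(learning_rate(44000) - 0.00025) < 1e-12  # two decay steps later
```

Smoother variants (e.g. inverse-square-root decay) follow the same shape: rise during warmup, then fall monotonically.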
\visible<2->{
\begin{center}
\begin{tikzpicture}
\footnotesize
{
...
...
@@ -2880,14 +2882,15 @@ $\textrm{``you''} = \argmax_{y} \textrm{P}(y|\textbf{s}_1, \alert{\textbf{C}})$
}
\end{tikzpicture}
\end{center}
}
\end{itemize}
\end{frame}
\begin{frame}
{
训练 - 加速
}
\begin{itemize}
\item
万事俱备,只是为什么训练这么慢?
\visible
<2>
{
\alert
{
- RNN需要等前面所有时刻都完成计算以后才能开始计算当前时刻的输出
}}
\item
我有钱,是不是多买几台设备会更快?
\visible
<2>
{
\alert
{
- 可以,但是需要技巧,而且也不是无限增长的
}}
\item
<
2
> 使用多个设备并行计算进行加速的两种方法
\item
万事俱备,只是为什么训练这么慢?
\visible
<2
-
>
{
\alert
{
- RNN需要等前面所有时刻都完成计算以后才能开始计算当前时刻的输出
}}
\item
我有钱,是不是多买几台设备会更快?
\visible
<2
-
>
{
\alert
{
- 可以,但是需要技巧,而且也不是无限增长的
}}
\item
<
3
> 使用多个设备并行计算进行加速的两种方法
\begin{itemize}
\item
数据并行:把``输入''分到不同设备上并行计算
\item
模型并行:把``模型''分到不同设备上并行计算
...
...
@@ -2901,7 +2904,8 @@ $\textrm{``you''} = \argmax_{y} \textrm{P}(y|\textbf{s}_1, \alert{\textbf{C}})$
Model parallelism & \specialcell{l}{can run very large \\ models} & \specialcell{l}{parallelism is limited, e.g., \\ one device per layer} \\
\end{tabular}
\end{center}
\item<2> The two methods can be used together!!!
\vspace{0.5em}
\item<3> The two methods can be used together!!!
\end{itemize}
\end{frame}
...
...
@@ -4578,7 +4582,7 @@ PE_{(pos,2i+1)} = cos(pos/10000^{2i/d_{model}})
}
\visible<3->{
\filldraw[fill=blue!20,draw,thick,fill opacity=0.85] ([xshift=-0.9em,yshift=0.5em]a15.north west) -- ([xshift=0.5em,yshift=-0.9em]a51.south east) -- ([xshift=0.5em,yshift=0.5em]a55.north east) -- ([xshift=-0.9em,yshift=0.5em]a15.north west);
\node[anchor=west] (labelmask) at ([xshift=0.3em,yshift=0.5em]a23.north east) {Mask};
\node[anchor=west] (labelmask) at ([xshift=0.3em,yshift=0.5em]a23.north east) {Masked};
\node[rounded corners=0.3em,anchor=west,fill=blue!20] (mask) at ([xshift=0.1em]add.east) {\large{$Mask$}};
}
...
...