NiuTrans / Toy-MT-Introduction

Commit 03157a0c, authored Apr 23, 2020 by 曹润柘 (parent c40d80e0)

update 5 & 6

Showing 13 changed files with 56 additions and 62 deletions.
Book/Chapter5/Figures/fig-back-propagation-hid.tex       +6  -6
Book/Chapter5/Figures/fig-back-propagation-output1.tex   +2  -2
Book/Chapter5/Figures/fig-back-propagation-output2.tex   +3  -3
Book/Chapter5/Figures/fig-bert.tex                       +1  -1
Book/Chapter5/Figures/fig-code-back-propagation-1.tex    +9  -9
Book/Chapter5/Figures/fig-code-back-propagation-2.tex    +9  -9
Book/Chapter5/Figures/fig-code-niutensor-rnn.tex         +5  -5
Book/Chapter5/Figures/fig-forward-propagation-hid.tex    +2  -2
Book/Chapter5/Figures/fig-gpt.tex                        +1  -1
Book/Chapter5/Figures/fig-residual-structure.tex         +1  -1
Book/Chapter5/chapter5.tex                               +0  -0
Book/Chapter6/Chapter6.tex                               +10 -16
Book/mt-book-xelatex.tex                                 +7  -7
Book/Chapter5/Figures/fig-back-propagation-hid.tex

@@ -7,20 +7,20 @@
 \node [anchor=east] (prev) at ([xshift=-2em]h.west) {...};
 \node [anchor=west] (next) at ([xshift=2em]h2.east) {...};
 \draw [->,thick] ([xshift=0.1em]prev.east) -- ([xshift=-0.1em]h.west);
-\draw [->,thick] ([xshift=0.1em]h.east) -- ([xshift=-0.1em]s.west) node [pos=0.5,below] {\tiny{$\textbf{s}^k = \textbf{h}^{k-1}\textbf{w}^k$}};
-\draw [->,thick] ([xshift=0.1em]s.east) -- ([xshift=-0.1em]h2.west) node [pos=0.5,below] {\tiny{$\textbf{h}^k = f^k(\textbf{s}^{k})$}};
+\draw [->,thick] ([xshift=0.1em]h.east) -- ([xshift=-0.1em]s.west) node [pos=0.5,below] {\scriptsize{$\textbf{s}^k = \textbf{h}^{k-1}\textbf{w}^k$}};
+\draw [->,thick] ([xshift=0.1em]s.east) -- ([xshift=-0.1em]h2.west) node [pos=0.5,below] {\scriptsize{$\textbf{h}^k = f^k(\textbf{s}^{k})$}};
 \draw [->,thick] ([xshift=0.1em]h2.east) -- ([xshift=-0.1em]next.west);
 {
-\draw [<-,thick,red] ([xshift=0.1em,yshift=0.4em]h2.east) -- ([xshift=-0.1em,yshift=0.4em]next.west) node [pos=0.8,above] {\tiny{反向传播}};
+\draw [<-,thick,red] ([xshift=0.1em,yshift=0.4em]h2.east) -- ([xshift=-0.1em,yshift=0.4em]next.west) node [pos=0.8,above] {\scriptsize{反向传播}};
 }
 {
-\draw [<-,thick,red] ([xshift=0.1em,yshift=0.4em]s.east) -- ([xshift=-0.1em,yshift=0.4em]h2.west) node [pos=0.5,above] {\tiny{反向传播}};
+\draw [<-,thick,red] ([xshift=0.1em,yshift=0.4em]s.east) -- ([xshift=-0.1em,yshift=0.4em]h2.west) node [pos=0.5,above] {\scriptsize{反向传播}};
 }
 {
-\draw [<-,thick,red] ([xshift=0.1em,yshift=0.4em]h.east) -- ([xshift=-0.1em,yshift=0.4em]s.west) node [pos=0.5,above] {\tiny{反向传播}};
+\draw [<-,thick,red] ([xshift=0.1em,yshift=0.4em]h.east) -- ([xshift=-0.1em,yshift=0.4em]s.west) node [pos=0.5,above] {\scriptsize{反向传播}};
 }
 {
@@ -33,7 +33,7 @@
 }
 {
-\node [anchor=south] (slabel) at (s.north) {$\pi^k = \frac{\partial L}{\partial \textbf{s}^{k}}$};
+\node [anchor=south] (slabel) at (s.north) {$\frac{\partial L}{\partial \textbf{s}^{k}}$};
 }
 {
Book/Chapter5/Figures/fig-back-propagation-output1.tex

@@ -7,8 +7,8 @@
 \draw [->] (s.east) -- (h2.west);
 \draw [->] (h2.east) -- (l.west);
-\draw [->,very thick,red] ([yshift=1em,xshift=-0.1em]l.north) -- ([yshift=1em,xshift=0.1em]h2.north) node [pos=0.5,above] {\tiny{求梯度{$\frac{\partial L}{\partial \textbf{h}^K} = ?$}}};
-\draw [->,very thick,red] ([yshift=1em,xshift=-0.1em]h2.north) -- ([yshift=1em,xshift=0.1em]s.north) node [pos=0.5,above] {\tiny{求梯度{$\frac{\partial f^K(\textbf{s}^K)}{\partial \textbf{s}^K} = ?$}}};
+\draw [->,very thick,red] ([yshift=1em,xshift=-0.1em]l.north) -- ([yshift=1em,xshift=0.1em]h2.north) node [pos=0.5,above] {\scriptsize{求梯度{$\frac{\partial L}{\partial \textbf{h}^K} = ?$}}};
+\draw [->,very thick,red] ([yshift=1em,xshift=-0.1em]h2.north) -- ([yshift=1em,xshift=0.1em]s.north) node [pos=0.5,above] {\scriptsize{求梯度{$\frac{\partial f^K(\textbf{s}^K)}{\partial \textbf{s}^K} = ?$}}};
 \draw [-,very thick,red] ([yshift=0.5em]l.north) -- ([yshift=1.5em]l.north);
 \draw [-,very thick,red] ([yshift=0.5em]h2.north) -- ([yshift=1.5em]h2.north);
 \draw [-,very thick,red] ([yshift=0.5em]s.north) -- ([yshift=1.5em]s.north);
Book/Chapter5/Figures/fig-back-propagation-output2.tex

@@ -2,19 +2,19 @@
 \begin{tikzpicture}
 \begin{scope}
 \node [anchor=center,minimum height=1.7em,fill=yellow!20,draw] (h) at (0,0) {$\textbf{h}^{K-1}$};
-\node [anchor=west,minimum height=1.7em,fill=blue!20,draw] (s) at ([xshift=5.5em]h.east) {$\textbf{s}^{K}$};
+\node [anchor=west,minimum height=1.7em,fill=blue!20,draw] (s) at ([xshift=6.0em]h.east) {$\textbf{s}^{K}$};
 \draw [->] (h.east) -- (s.west);
-\node [anchor=south west,inner sep=2pt] (step100) at ([xshift=0.5em,yshift=-0.8em]h.north east) {\tiny{$\textbf{s}^K = \textbf{h}^{K-1}\textbf{w}^K$}};
+\node [anchor=south west,inner sep=2pt] (step100) at ([xshift=0.5em,yshift=-0.8em]h.north east) {\scriptsize{$\textbf{s}^K = \textbf{h}^{K-1}\textbf{w}^K$}};
 \node [anchor=south west] (slabel) at ([yshift=1em,xshift=0.3em]s.north) {\scriptsize{\red{\textbf{{已经得到:$\pi^K = \frac{\partial L}{\partial \textbf{s}^K}$}}}}};
 \draw [->,red] ([yshift=0.3em]slabel.south) .. controls +(south:0.5) and +(north:0.5) .. ([xshift=0.5em]s.north);
 {
-\draw [->,very thick,red] ([yshift=1em,xshift=-0.1em]s.north) -- ([yshift=1em,xshift=0.1em]h.north) node [pos=0.5,above] {\tiny{{$\frac{\partial L}{\partial \textbf{w}^K} = ?$, $\frac{\partial L}{\partial \textbf{h}^{K-1}} = ?$}}};
+\draw [->,very thick,red] ([yshift=1em,xshift=-0.1em]s.north) -- ([yshift=1em,xshift=0.1em]h.north) node [pos=0.5,above] {\scriptsize{{$\frac{\partial L}{\partial \textbf{w}^K} = ?$, $\frac{\partial L}{\partial \textbf{h}^{K-1}} = ?$}}};
 \draw [-,very thick,red] ([yshift=0.5em]h.north) -- ([yshift=1.5em]h.north);
 \draw [-,very thick,red] ([yshift=0.5em]s.north) -- ([yshift=1.5em]s.north);
 }
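The three back-propagation figures above all annotate the same single-layer relations: the forward step $\textbf{s}^K = \textbf{h}^{K-1}\textbf{w}^K$, $\textbf{h}^K = f^K(\textbf{s}^K)$, and the backward step that turns the already-computed $\pi^K = \frac{\partial L}{\partial \textbf{s}^K}$ into $\frac{\partial L}{\partial \textbf{w}^K}$ and $\frac{\partial L}{\partial \textbf{h}^{K-1}}$. As a reading aid only (not part of this commit), here is a minimal NumPy sketch of that step; the array shapes and the choice of ReLU for $f^K$ are assumptions.

import numpy as np

# Forward: s^K = h^{K-1} w^K, h^K = f^K(s^K); ReLU stands in for f^K.
h_prev = np.random.randn(4, 8)     # h^{K-1}, a batch of 4 row vectors
w = np.random.randn(8, 5)          # w^K
s = h_prev @ w                     # s^K
h = np.maximum(s, 0.0)             # h^K

# Backward: pi = dL/ds^K is assumed given (the quantity the figure marks as 已经得到).
pi = np.random.randn(*s.shape)
dw = h_prev.T @ pi                 # dL/dw^K     = (h^{K-1})^T pi
dh_prev = pi @ w.T                 # dL/dh^{K-1} = pi (w^K)^T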
Book/Chapter5/Figures/fig-bert.tex

@@ -33,7 +33,7 @@
 \node [anchor=south,draw,inner sep=4pt,fill=yellow!30,minimum width=2em] (t5) at ([yshift=1em]Trm9.north) {\scriptsize{$\textbf{h}_m$}};
 \node [anchor=west,draw,inner sep=3pt,fill=blue!20!white,minimum width=1em] (Lt1) at ([yshift=1.5em]t1.west) {\tiny{TRM}};
-\node [anchor=west] (Lt2) at ([xshift=-0.1em]Lt1.east) {\tiny{: Transformer}};
+\node [anchor=west] (Lt2) at ([xshift=-0.1em]Lt1.east) {\scriptsize{: Transformer Block}};
 \draw [->] ([yshift=0.1em]e1.north) -- ([yshift=-0.1em]Trm0.south);
 \draw [->] ([yshift=0.1em]e1.north) -- ([yshift=-0.1em]Trm1.south);
Book/Chapter5/Figures/fig-code-back-propagation-1.tex

 %%%------------------------------------------------------------------------------------------------------------
 \begin{tcolorbox}
-[bicolor,sidebyside,width=12cm,righthand width=4cm,size=title,frame engine=empty,
+[bicolor,sidebyside,width=13cm,righthand width=4cm,size=title,frame engine=empty,
 colback=blue!10!white,colbacklower=black!5!white]
 {\scriptsize
 \begin{tabbing}
@@ -14,16 +14,16 @@
 \texttt{} \\
 \texttt{CrossEntropyBackward(dh[4], y, gold);} \\
 \texttt{SoftmaxBackward(y, s[4], dh[4], ds[4]);} \\
-\texttt{MMul(h[3], {\tiny X\_TRANS}, ds[4], {\tiny X\_NOTRANS}, dw[4]);} \\
-\texttt{MMul(ds[4], {\tiny X\_NOTRANS}, w[4], {\tiny X\_RANS}, dh[3]);} \\
+\texttt{MMul(h[3], {\scriptsize X\_TRANS}, ds[4], {\scriptsize X\_NOTRANS}, dw[4]);} \\
+\texttt{MMul(ds[4], {\scriptsize X\_NOTRANS}, w[4], {\scriptsize X\_RANS}, dh[3]);} \\
 \texttt{} \\
 \texttt{dh[2] = dh[3];} \\
 \texttt{ReluBackward(h[2], s[2], dh[2], ds[2]);} \\
-\texttt{MMul(h[1], {\tiny X\_TRANS}, ds[2], {\tiny X\_NOTRANS}, dw[2]);} \\
-\texttt{MMul(ds[2], {\tiny X\_NOTRANS}, w[2], {\tiny X\_TRANS}, dh[2]);} \\
+\texttt{MMul(h[1], {\scriptsize X\_TRANS}, ds[2], {\scriptsize X\_NOTRANS}, dw[2]);} \\
+\texttt{MMul(ds[2], {\scriptsize X\_NOTRANS}, w[2], {\scriptsize X\_TRANS}, dh[2]);} \\
@@ -46,10 +46,10 @@
 \begin{tikzpicture}
-\node [anchor=south,draw,rounded corners,inner sep=2pt,minimum width=8em,minimum height=1.2em,fill=red!30!white,blur shadow={shadow xshift=1pt,shadow yshift=-1pt}] (h1) at (0,0) {\tiny{x (input)}};
-\node [anchor=south,draw,rounded corners,inner sep=2pt,minimum width=8em,minimum height=1.2em,fill=green!30!white,blur shadow={shadow xshift=1pt,shadow yshift=-1pt}] (h2) at ([yshift=1.5em]h1.north) {\tiny{h1 = Relu(x * w1)}};
-\node [anchor=south,draw,rounded corners,inner sep=2pt,minimum width=8em,minimum height=1.2em,fill=green!30!white,blur shadow={shadow xshift=1pt,shadow yshift=-1pt}] (h3) at ([yshift=1.5em]h2.north) {\tiny{h2 = Relu(h1 * w2)}};
-\node [anchor=south,draw,rounded corners,inner sep=2pt,minimum width=8em,minimum height=1.2em,fill=green!30!white,blur shadow={shadow xshift=1pt,shadow yshift=-1pt}] (h4) at ([yshift=1.5em]h3.north) {\tiny{h3 = h2 + h1}};
+\node [anchor=south,draw,rounded corners,inner sep=2pt,minimum width=8em,minimum height=1.2em,fill=red!30!white,blur shadow={shadow xshift=1pt,shadow yshift=-1pt}] (h1) at (0,0) {\scriptsize{x (input)}};
+\node [anchor=south,draw,rounded corners,inner sep=2pt,minimum width=8em,minimum height=1.2em,fill=green!30!white,blur shadow={shadow xshift=1pt,shadow yshift=-1pt}] (h2) at ([yshift=1.5em]h1.north) {\scriptsize{h1 = Relu(x * w1)}};
+\node [anchor=south,draw,rounded corners,inner sep=2pt,minimum width=8em,minimum height=1.2em,fill=green!30!white,blur shadow={shadow xshift=1pt,shadow yshift=-1pt}] (h3) at ([yshift=1.5em]h2.north) {\scriptsize{h2 = Relu(h1 * w2)}};
+\node [anchor=south,draw,rounded corners,inner sep=2pt,minimum width=8em,minimum height=1.2em,fill=green!30!white,blur shadow={shadow xshift=1pt,shadow yshift=-1pt}] (h4) at ([yshift=1.5em]h3.north) {\scriptsize{h3 = h2 + h1}};
 {\draw [->,thick] (h1.north) -- (h2.south);}
 {\draw [->,thick] (h2.north) -- (h3.south);}
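The listing embedded in this figure is NiuTensor-style pseudo-code for the backward pass of the small network drawn beside it. As an illustration only (not part of this commit, and only one reading of what the calls compute), the output-layer portion could be written in plain NumPy roughly as follows; all shapes and values are made up.

import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

h3 = np.random.randn(16, 32)                        # input to the output layer
w4 = np.random.randn(32, 10)                        # output-layer weights
gold = np.eye(10)[np.random.randint(10, size=16)]   # one-hot references

s4 = h3 @ w4
y = softmax(s4)

# CrossEntropyBackward + SoftmaxBackward combined: dL/ds[4] = y - gold (batch-averaged).
ds4 = (y - gold) / len(y)

# MMul(h[3], X_TRANS, ds[4], X_NOTRANS, dw[4])  ~  dw4 = h3^T ds4
dw4 = h3.T @ ds4
# MMul(ds[4], X_NOTRANS, w[4], X_TRANS, dh[3])  ~  dh3 = ds4 w4^T
dh3 = ds4 @ w4.T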
Book/Chapter5/Figures/fig-code-back-propagation-2.tex

 %%%------------------------------------------------------------------------------------------------------------
 \begin{tcolorbox}
-[bicolor,sidebyside,width=12cm,righthand width=4cm,size=title,frame engine=empty,
+[bicolor,sidebyside,width=13cm,righthand width=4cm,size=title,frame engine=empty,
 colback=blue!10!white,colbacklower=black!5!white]
 {\scriptsize
 \begin{tabbing}
@@ -14,16 +14,16 @@
 \texttt{} \\
 \texttt{CrossEntropyBackward(dh[4], y, gold);} \\
 \texttt{SoftmaxBackward(y, s[4], dh[4], ds[4]);} \\
-\texttt{MMul(h[3], {\tiny X\_TRANS}, ds[4], {\tiny X\_NOTRANS}, dw[4]);} \\
-\texttt{MMul(ds[4], {\tiny X\_NOTRANS}, w[4], {\tiny X\_RANS}, dh[3]);} \\
+\texttt{MMul(h[3], {\scriptsize X\_TRANS}, ds[4], {\scriptsize X\_NOTRANS}, dw[4]);} \\
+\texttt{MMul(ds[4], {\scriptsize X\_NOTRANS}, w[4], {\scriptsize X\_RANS}, dh[3]);} \\
 \texttt{} \\
 \texttt{dh[2] = dh[3];} \\
 \texttt{ReluBackward(h[2], s[2], dh[2], ds[2]);} \\
-\texttt{MMul(h[1], {\tiny X\_TRANS}, ds[2], {\tiny X\_NOTRANS}, dw[2]);} \\
-\texttt{MMul(ds[2], {\tiny X\_NOTRANS}, w[2], {\tiny X\_TRANS}, dh[2]);} \\
+\texttt{MMul(h[1], {\scriptsize X\_TRANS}, ds[2], {\scriptsize X\_NOTRANS}, dw[2]);} \\
+\texttt{MMul(ds[2], {\scriptsize X\_NOTRANS}, w[2], {\scriptsize X\_TRANS}, dh[2]);} \\
@@ -46,10 +46,10 @@
 \begin{tikzpicture}
-\node [anchor=south,draw,rounded corners,inner sep=2pt,minimum width=8em,minimum height=1.2em,fill=red!30!white,blur shadow={shadow xshift=1pt,shadow yshift=-1pt}] (h1) at (0,0) {\tiny{x (input)}};
-\node [anchor=south,draw,rounded corners,inner sep=2pt,minimum width=8em,minimum height=1.2em,fill=green!30!white,blur shadow={shadow xshift=1pt,shadow yshift=-1pt}] (h2) at ([yshift=1.5em]h1.north) {\tiny{h1 = Relu(x * w1)}};
-\node [anchor=south,draw,rounded corners,inner sep=2pt,minimum width=8em,minimum height=1.2em,fill=green!30!white,blur shadow={shadow xshift=1pt,shadow yshift=-1pt}] (h3) at ([yshift=1.5em]h2.north) {\tiny{h2 = Relu(h1 * w2)}};
-\node [anchor=south,draw,rounded corners,inner sep=2pt,minimum width=8em,minimum height=1.2em,fill=green!30!white,blur shadow={shadow xshift=1pt,shadow yshift=-1pt}] (h4) at ([yshift=1.5em]h3.north) {\tiny{h3 = h2 + h1}};
+\node [anchor=south,draw,rounded corners,inner sep=2pt,minimum width=8em,minimum height=1.2em,fill=red!30!white,blur shadow={shadow xshift=1pt,shadow yshift=-1pt}] (h1) at (0,0) {\scriptsize{x (input)}};
+\node [anchor=south,draw,rounded corners,inner sep=2pt,minimum width=8em,minimum height=1.2em,fill=green!30!white,blur shadow={shadow xshift=1pt,shadow yshift=-1pt}] (h2) at ([yshift=1.5em]h1.north) {\scriptsize{h1 = Relu(x * w1)}};
+\node [anchor=south,draw,rounded corners,inner sep=2pt,minimum width=8em,minimum height=1.2em,fill=green!30!white,blur shadow={shadow xshift=1pt,shadow yshift=-1pt}] (h3) at ([yshift=1.5em]h2.north) {\scriptsize{h2 = Relu(h1 * w2)}};
+\node [anchor=south,draw,rounded corners,inner sep=2pt,minimum width=8em,minimum height=1.2em,fill=green!30!white,blur shadow={shadow xshift=1pt,shadow yshift=-1pt}] (h4) at ([yshift=1.5em]h3.north) {\scriptsize{h3 = h2 + h1}};
 {\draw [->,thick] (h1.north) -- (h2.south);}
 {\draw [->,thick] (h2.north) -- (h3.south);}
Book/Chapter5/Figures/fig-code-niutensor-rnn.tex

@@ -49,10 +49,10 @@
 \draw [->,thick] (rlayer3.north) -- ([yshift=1em]rlayer3.north);
 {
-\node [anchor=south,draw,rounded corners,inner sep=2pt,minimum width=9.4em,minimum height=1.0em,fill=green!30!white,blur shadow={shadow xshift=1pt,shadow yshift=-1pt}] (h1) at ([yshift=1em]rlayer2.north) {\tiny{h1 = Merge($\cdot$)}};
-\node [anchor=south,draw,rounded corners,inner sep=2pt,minimum width=9.4em,minimum height=1.0em,fill=green!30!white,blur shadow={shadow xshift=1pt,shadow yshift=-1pt}] (h2) at ([yshift=1em]h1.north) {\tiny{h2 = Relu($\cdot$)}};
-\node [anchor=south,draw,rounded corners,inner sep=2pt,minimum width=9.4em,minimum height=1.0em,fill=green!30!white,blur shadow={shadow xshift=1pt,shadow yshift=-1pt}] (h3) at ([yshift=1em]h2.north) {\tiny{h3 = Sum($\cdot$)}};
-\node [anchor=south,draw,rounded corners,inner sep=2pt,minimum width=9.4em,minimum height=1.0em,fill=green!30!white,blur shadow={shadow xshift=1pt,shadow yshift=-1pt}] (h4) at ([yshift=1em]h3.north) {\tiny{h4 = Softmax($\cdot$)}};
+\node [anchor=south,draw,rounded corners,inner sep=2pt,minimum width=9.4em,minimum height=1.0em,fill=green!30!white,blur shadow={shadow xshift=1pt,shadow yshift=-1pt}] (h1) at ([yshift=1em]rlayer2.north) {\scriptsize{h1 = Merge($\cdot$)}};
+\node [anchor=south,draw,rounded corners,inner sep=2pt,minimum width=9.4em,minimum height=1.0em,fill=green!30!white,blur shadow={shadow xshift=1pt,shadow yshift=-1pt}] (h2) at ([yshift=1em]h1.north) {\scriptsize{h2 = Relu($\cdot$)}};
+\node [anchor=south,draw,rounded corners,inner sep=2pt,minimum width=9.4em,minimum height=1.0em,fill=green!30!white,blur shadow={shadow xshift=1pt,shadow yshift=-1pt}] (h3) at ([yshift=1em]h2.north) {\scriptsize{h3 = Sum($\cdot$)}};
+\node [anchor=south,draw,rounded corners,inner sep=2pt,minimum width=9.4em,minimum height=1.0em,fill=green!30!white,blur shadow={shadow xshift=1pt,shadow yshift=-1pt}] (h4) at ([yshift=1em]h3.north) {\scriptsize{h4 = Softmax($\cdot$)}};
 \draw [->,thick] (h1.north) -- (h2.south);
 \draw [->,thick] (h2.north) -- (h3.south);
 \draw [->,thick] (h3.north) -- (h4.south);
@@ -60,7 +60,7 @@
 }
 {
-\node [anchor=south,draw,rounded corners,inner sep=2pt,minimum width=9.4em,minimum height=1.0em,fill=green!30!white,blur shadow={shadow xshift=1pt,shadow yshift=-1pt}] (slayer) at ([yshift=1em]h4.north) {\tiny{Split($\cdot$)}};
+\node [anchor=south,draw,rounded corners,inner sep=2pt,minimum width=9.4em,minimum height=1.0em,fill=green!30!white,blur shadow={shadow xshift=1pt,shadow yshift=-1pt}] (slayer) at ([yshift=1em]h4.north) {\scriptsize{Split($\cdot$)}};
 \node [anchor=south,draw,circle,inner sep=1pt,fill=red!30!white,blur shadow={shadow xshift=1pt,shadow yshift=-1pt}] (y2) at ([yshift=1em]slayer.north) {\footnotesize{$\textrm{y}_2$}};
 \node [anchor=east,draw,circle,inner sep=1pt,fill=red!30!white,blur shadow={shadow xshift=1pt,shadow yshift=-1pt}] (y1) at ([xshift=-2em]y2.west) {\footnotesize{$\textrm{y}_1$}};
 \node [anchor=west,draw,circle,inner sep=1pt,fill=red!30!white,blur shadow={shadow xshift=1pt,shadow yshift=-1pt}] (y3) at ([xshift=2em]y2.east) {\footnotesize{$\textrm{y}_3$}};
Book/Chapter5/Figures/fig-forward-propagation-hid.tex

@@ -7,8 +7,8 @@
 \node [anchor=east] (prev) at ([xshift=-2em]h.west) {...};
 \node [anchor=west] (next) at ([xshift=2em]h2.east) {...};
 \draw [->,thick] ([xshift=0.1em]prev.east) -- ([xshift=-0.1em]h.west);
-\draw [->,thick] ([xshift=0.1em]h.east) -- ([xshift=-0.1em]s.west) node [pos=0.5,below] {\tiny{$\textbf{s}^k = \textbf{h}^{k-1}\textbf{w}^k$}};
-\draw [->,thick] ([xshift=0.1em]s.east) -- ([xshift=-0.1em]h2.west) node [pos=0.5,below] {\tiny{$\textbf{h}^k = f^k(\textbf{s}^{k})$}};
+\draw [->,thick] ([xshift=0.1em]h.east) -- ([xshift=-0.1em]s.west) node [pos=0.5,below] {\scriptsize{$\textbf{s}^k = \textbf{h}^{k-1}\textbf{w}^k$}};
+\draw [->,thick] ([xshift=0.1em]s.east) -- ([xshift=-0.1em]h2.west) node [pos=0.5,below] {\scriptsize{$\textbf{h}^k = f^k(\textbf{s}^{k})$}};
 \draw [->,thick] ([xshift=0.1em]h2.east) -- ([xshift=-0.1em]next.west);
Book/Chapter5/Figures/fig-gpt.tex

@@ -33,7 +33,7 @@
 \node [anchor=south,draw,inner sep=4pt,fill=yellow!30,minimum width=2em] (t5) at ([yshift=1em]Trm9.north) {\scriptsize{$\textbf{h}_m$}};
 \node [anchor=west,draw,inner sep=3pt,fill=blue!20!white,minimum width=1em] (Lt1) at ([yshift=1.5em]t1.west) {\tiny{TRM}};
-\node [anchor=west] (Lt2) at ([xshift=-0.1em]Lt1.east) {\tiny{: Transformer}};
+\node [anchor=west] (Lt2) at ([xshift=-0.1em]Lt1.east) {\scriptsize{: Transformer Block}};
 \draw [->] ([yshift=0.1em]e1.north) -- ([yshift=-0.1em]Trm0.south);
 \draw [->] ([yshift=0.1em]e1.north) -- ([yshift=-0.1em]Trm1.south);
Book/Chapter5/Figures/fig-residual-structure.tex

@@ -4,7 +4,7 @@
 \node [anchor=center] (node1) at (0,0) {};
-\node [anchor=north,draw,thick](node2)at ([yshift=-1.5em]node1.south) {\small{weight layer}};
+\node [anchor=north,draw,thick](node2)at ([yshift=-1.5em]node1.south) {\small{\ \ layer\ \ }};
 \draw [->,thick] (node1.south)--(node2.north);
 \node [anchor=north](node3)at ([yshift=-1.2em]node2.south) {$\bigoplus$};
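Only the box label changes in this figure, but the structure it draws is the usual residual connection: an input passes through a layer and is then added back to itself at the $\bigoplus$ node. A one-function sketch of that pattern, as an illustration only (not from the commit; the tanh "layer" is an arbitrary stand-in):

import numpy as np

def residual_block(x, layer):
    # output = x + layer(x): the layer box followed by the ⊕ node in the figure
    return x + layer(x)

x = np.random.randn(3)
print(residual_block(x, np.tanh))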
Book/Chapter5/chapter5.tex

This source diff could not be displayed because it is too large. You can view the blob instead.
Book/Chapter6/Chapter6.tex

@@ -264,13 +264,12 @@ NMT & $ 21.7^{\ast}$ & $18.7^{\ast}$ & -1
 \subsection{简单的运行实例}\index{Chapter6.2.3}\label{chapter6.2.3}
 \parinterval 为了对编码器-解码器框架和神经机器翻译的运行过程有一个直观的认识,这里演示一个简单的翻译实例。这里采用标准的循环神经网络作为编码器和解码器的结构。假设系统的输入和输出为:
-\begin{example}
-\quad 源语(中文)输入:``我''、``很''、``好''、``<eos>''
-目标语(英文)输出:``I''、``am''、``fine''、``<eos>''
-\end{example}
+\vspace{0.5em}
+\parinterval \hspace{5em} 源语(中文)输入:\{``我'',\ ``很'',\ ``好'',\ ``<eos>''\}
+\vspace{0.3em}
+\parinterval \hspace{5em} 目标语(英文)输出:\{``I'',\ ``am'',\ ``fine'',\ ``<eos>''\}
+\vspace{0.5em}
 %figure-a simple example for tl
 %----------------------------------------------
@@ -369,14 +368,9 @@ NMT & $ 21.7^{\ast}$ & $18.7^{\ast}$ & -1
 \parinterval 同大多数自然语言处理任务一样,神经机器翻译要解决的一个基本问题是如何描述文字序列,称为序列表示问题。例如,处理语音数据、文本数据都可以被看作是典型的序列表示问题。如果把一个序列看作一个时序上的一系列变量,不同时刻的变量之间往往是存在相关性的。也就是说,一个时序中某个时刻变量的状态会依赖其他时刻变量的状态,即上下文的语境信息。下面是一个简单的例子,假设有一个句子,但是最后两个单词被擦掉了,如何猜测被擦掉的单词是什么?
-\begin{example}
-\quad 中午没吃饭,又刚打了一下午篮球,我现在很饿,我想\underline{\quad \quad \quad}。
-\end{example}
-%\vspace{0.5em}
-%\centerline{中午没吃饭,又刚打了一下午篮球,我现在很饿,我想\underline{\quad \quad \quad} 。}
-%\vspace{0.5em}
+\vspace{0.5em}
+\centerline{中午没吃饭,又刚打了一下午篮球,我现在很饿,我想\underline{\quad \quad \quad} 。}
+\vspace{0.5em}
 \parinterval 显然,根据上下文中提到的``没吃饭''、``很饿'',最佳的答案是``吃 饭''或者``吃 东西''。也就是,对序列中某个位置的答案进行预测时需要记忆当前时刻之前的序列信息,因此,循环神经网络(Recurrent Neural Network, RNN)应运而生。实际上循环神经网络有着极为广泛的应用,例如语音识别、语言建模以及即将要介绍的神经机器翻译。
@@ -938,7 +932,7 @@ $\textrm{a}(\cdot)$可以被看作是目标语表示和源语言表示的一种`
 \parinterval 将公式\ref{eqC6.29}应用于神经机器翻译有几个基本问题需要考虑:1)损失函数的选择;2)参数初始化的策略,也就是如何设置$\mathbf{w}_0$;3)优化策略和学习率调整策略;4)训练加速。下面对这些问题进行讨论。
 %%%%%%%%%%%%%%%%%%
 \subsubsection{损失函数}\index{Chapter6.3.5.1}
-\parinterval 因为神经机器翻译在每个目标语位置都会输出一个概率分布,表示这个位置上不同单词出现的可能性,因此需要知道当前位置输出的分布相比于标准答案的``损失''。对于这个问题,常用的是交叉熵损失函数\footnote{\ \ 百度百科:\url{https://baike.baidu.com/item/\%E4\%BA\%A4\%E5\%8F\%89\%E7\%86\%B5/8983241?fr=aladdin}}。令$\mathbf{y}$表示机器翻译模型输出的分布,$\hat{\mathbf{y}}$表示标准答案,则交叉熵损失可以被定义为$L_{\textrm{ce}}(\mathbf{y},\hat{\mathbf{y}}) = -\sum_{k=1}^{|V|}\mathbf{y}[k]\textrm{log}(\hat{\mathbf{y}}[k])$,其中$\mathbf{y}[k]$和$\hat{\mathbf{y}}[k]$分别表示向量$\mathbf{y}$和$\hat{\mathbf{y}}$的第$k$维,$|V|$表示输出向量得维度(等于词表大小)。对于一个模型输出的概率分布$\mathbf{Y} = \{\mathbf{y}_1,\mathbf{y}_2,...,\mathbf{y}_n\}$和标准答案分布$\hat{\mathbf{Y}} = \{\hat{\mathbf{y}}_1,\hat{\mathbf{y}}_2,...,\hat{\mathbf{y}}_n\}$,损失函数可以被定义为
+\parinterval 因为神经机器翻译在每个目标语位置都会输出一个概率分布,表示这个位置上不同单词出现的可能性,因此需要知道当前位置输出的分布相比于标准答案的``损失''。对于这个问题,常用的是交叉熵损失函数。令$\mathbf{y}$表示机器翻译模型输出的分布,$\hat{\mathbf{y}}$表示标准答案,则交叉熵损失可以被定义为$L_{\textrm{ce}}(\mathbf{y},\hat{\mathbf{y}}) = -\sum_{k=1}^{|V|}\mathbf{y}[k]\textrm{log}(\hat{\mathbf{y}}[k])$,其中$\mathbf{y}[k]$和$\hat{\mathbf{y}}[k]$分别表示向量$\mathbf{y}$和$\hat{\mathbf{y}}$的第$k$维,$|V|$表示输出向量得维度(等于词表大小)。对于一个模型输出的概率分布$\mathbf{Y} = \{\mathbf{y}_1,\mathbf{y}_2,...,\mathbf{y}_n\}$和标准答案分布$\hat{\mathbf{Y}} = \{\hat{\mathbf{y}}_1,\hat{\mathbf{y}}_2,...,\hat{\mathbf{y}}_n\}$,损失函数可以被定义为
 %-------------
 \begin{eqnarray}
 L(\mathbf{Y},\hat{\mathbf{Y}}) = \sum_{j=1}^n L_{\textrm{ce}}(\mathbf{y}_j,\hat{\mathbf{y}}_j)
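The paragraph touched by this hunk (only the footnote changes) defines the per-position cross-entropy $L_{\textrm{ce}}(\mathbf{y},\hat{\mathbf{y}}) = -\sum_{k=1}^{|V|}\mathbf{y}[k]\textrm{log}(\hat{\mathbf{y}}[k])$ and sums it over the $n$ target positions. A small NumPy sketch of exactly that formula, as an illustration only (not part of the commit; the toy distributions are made up):

import numpy as np

def cross_entropy(y, y_hat):
    # L_ce(y, y_hat) = -sum_k y[k] * log(y_hat[k]), mirroring the formula as written above
    return -np.sum(y * np.log(y_hat))

def sequence_loss(Y, Y_hat):
    # L(Y, Y_hat) = sum_{j=1}^{n} L_ce(y_j, y_hat_j)
    return sum(cross_entropy(y_j, y_hat_j) for y_j, y_hat_j in zip(Y, Y_hat))

# Toy check with |V| = 3 and n = 2 target positions.
Y     = np.array([[0.7, 0.2, 0.1], [0.1, 0.8, 0.1]])
Y_hat = np.array([[0.9, 0.05, 0.05], [0.05, 0.9, 0.05]])
print(sequence_loss(Y, Y_hat))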
@@ -1024,7 +1018,7 @@ L(\mathbf{Y},\hat{\mathbf{Y}}) = \sum_{j=1}^n L_{\textrm{ce}}(\mathbf{y}_j,\hat{
 \end{figure}
 %----------------------------------------------
-\parinterval 图\ref{fig:6-28}展示了一种常用的学习率调整策略。它分为两个阶段:预热阶段和衰减阶段。模型训练初期梯度通常很大,如果直接使用较大的学习率很容易让模型陷入局部最优。学习率的预热阶段便是通过在训练初期使学习率从小到大逐渐增加来减缓在初始阶段模型``跑偏''的现象。一般来说,初始学习率太高会使得模型进入一种损失函数曲面非常不平滑的区域,进而使得模型进入一种混乱状态,后续的优化过程很难取得很好的效果。一个常用的学习率预热方法是逐渐预热(Gradual Warmup),如果令预热的更新次数为$T'$,初始学习率为$\alpha_0$,则预热阶段第$t$次更新的学习率为:
+\parinterval 图\ref{fig:6-28}展示了一种常用的学习率调整策略。它分为两个阶段:预热阶段和衰减阶段。模型训练初期梯度通常很大,如果直接使用较大的学习率很容易让模型陷入局部最优。学习率的预热阶段便是通过在训练初期使学习率从小到大逐渐增加来减缓在初始阶段模型``跑偏''的现象。一般来说,初始学习率太高会使得模型进入一种损失函数曲面非常不平滑的区域,进而使得模型进入一种混乱状态,后续的优化过程很难取得很好的效果。一个常用的学习率预热方法是{\small\bfnew{逐渐预热}}(Gradual Warmup),如果令预热的更新次数为$T'$,初始学习率为$\alpha_0$,则预热阶段第$t$次更新的学习率为:
 %-------------------------------
 \begin{eqnarray}
 \alpha_t = \frac{t}{T'} \alpha_0 \quad,\quad 1 \leq t \leq T'
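This last Chapter6 hunk only re-styles the term 逐渐预热 (gradual warmup); the schedule itself, $\alpha_t = \frac{t}{T'}\alpha_0$ for $1 \leq t \leq T'$, is unchanged. A minimal sketch of that warmup rule, as an illustration only (the constants are arbitrary assumptions, and the decay phase mentioned in the text is left out):

def warmup_lr(t, warmup_steps, alpha_0):
    # alpha_t = (t / T') * alpha_0 while 1 <= t <= T'; after warmup, simply hold alpha_0 here
    return alpha_0 * min(t / warmup_steps, 1.0)

# Example with assumed T' = 4000 and alpha_0 = 5e-4.
print([warmup_lr(t, 4000, 5e-4) for t in (1, 1000, 4000, 8000)])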
Book/mt-book-xelatex.tex

@@ -92,7 +92,7 @@
 {\large
 \noindent {\color{red} 在此感谢所有为本书做出贡献的人} \\
-\noindent 曹润柘、曾信、孟霞、单韦乔、姜雨帆、王子扬、刘辉、许诺、李北、刘继强、张哲旸、周书涵、周涛、张裕浩、李炎洋,刘晓倩、牛蕊 \\
+\noindent 曹润柘、曾信、孟霞、单韦乔、姜雨帆、王子扬、刘辉、许诺、李北、刘继强、张哲旸、周书涵、周涛、张裕浩、李炎洋、林野、刘晓倩、牛蕊 \\
 }
@@ -112,13 +112,13 @@
 % CHAPTERS
 %----------------------------------------------------------------------------------------
-\include{Chapter1/chapter1}
-\include{Chapter2/chapter2}
-\include{Chapter3/chapter3}
-\include{Chapter4/chapter4}
-\include{Chapter5/chapter5}
+%\include{Chapter1/chapter1}
+%\include{Chapter2/chapter2}
+%\include{Chapter3/chapter3}
+%\include{Chapter4/chapter4}
+%\include{Chapter5/chapter5}
 \include{Chapter6/chapter6}
-\include{ChapterAppend/chapterappend}
+%\include{ChapterAppend/chapterappend}