Commit 9d2fb3e9 by zengxin

合并分支 'caorunzhe' 到 'zengxin'

Caorunzhe

查看合并请求 !488
parents 67801f47 6be774cb
\tikzstyle{yy} = [circle,minimum height=1cm,text centered,draw=black]
\begin{tikzpicture}[node distance = 0,scale = 1]
\begin{scope}[xshift=0.2in]
\tikzstyle{every node}=[scale=1]
\node (y1)[yy]{\large$y_1$};
\node (y2)[yy,right of = y1,xshift=1.5cm]{\large$y_2$};
\node (y3)[yy,right of = y2,xshift=1.5cm]{\large$y_3$};
\node (y4)[yy,right of = y3,xshift=1.5cm]{\large$y_4$};
\node (y5)[yy,right of = y4,xshift=1.5cm]{\large$y_5$};
\node (y6)[yy,right of = y5,xshift=1.5cm]{\large$y_6$};
\node [anchor=north,font=\scriptsize] (labela) at ([xshift=1.8em,yshift=-3em]y3.south) {(a) 自回归模型};
\draw[->,thick] (y1.north) .. controls ([yshift=2em]y1.north) and ([yshift=2em]y3.north).. (y3.north);
\draw[->,thick] (y1.north) .. controls ([yshift=3em]y1.north) and ([yshift=3em]y4.north).. (y4.north);
\draw[->,thick] (y1.south) .. controls ([yshift=-2em]y1.south) and ([yshift=-2em]y5.south).. (y5.south);
\draw[->,thick] (y1.south) .. controls ([yshift=-3em]y1.south) and ([yshift=-3em]y6.south).. (y6.south);
\draw[->,thick] (y2.north) .. controls ([yshift=2em]y2.north) and ([yshift=2em]y4.north).. (y4.north);
\draw[->,thick] (y2.south) .. controls ([yshift=-2em]y2.south) and ([yshift=-2em]y5.south).. (y5.south);
\draw[->,thick] (y2.south) .. controls ([yshift=-3em]y2.south) and ([yshift=-3em]y6.south).. (y6.south);
\draw[->,thick] (y3.south) .. controls ([yshift=-2em]y3.south) and ([yshift=-2em]y5.south).. (y5.south);
\draw[->,thick] (y3.south) .. controls ([yshift=-3em]y3.south) and ([yshift=-3em]y6.south).. (y6.south);
\draw[->,thick] (y4.south) .. controls ([yshift=-2em]y4.south) and ([yshift=-2em]y6.south).. (y6.south);
\draw[->,red,very thick](y1.east)to(y2.west);
\draw[->,red,very thick](y2.east)to(y3.west);
\draw[->,red,very thick](y3.east)to(y4.west);
\draw[->,red,very thick](y4.east)to(y5.west);
\draw[->,red,very thick](y5.east)to(y6.west);
\end{scope}
\begin{scope}[yshift=-1.6in]
\tikzstyle{rec} = [rectangle,minimum width=2.8cm,minimum height=1.5cm,text centered,draw=black,dashed]
\tikzstyle{every node}=[scale=1]
\node (y1)[yy]{\large$y_1$};
\node (y2)[yy,right of = y1,xshift=1.5cm]{\large$y_2$};
\node (y3)[yy,right of = y2,xshift=2cm]{\large$y_3$};
\node (y4)[yy,right of = y3,xshift=1.5cm]{\large$y_4$};
\node (y5)[yy,right of = y4,xshift=2cm]{\large$y_5$};
\node (y6)[yy,right of = y5,xshift=1.5cm]{\large$y_6$};
\node (rec1)[rec,right of = y1,xshift=0.75cm]{};
\node (rec2)[rec,right of = y3,xshift=0.75cm]{};
\node (rec3)[rec,right of = y5,xshift=0.75cm]{};
\node [anchor=north,font=\scriptsize] (labelb) at ([xshift=2.2em,yshift=-1em]y3.south) {(b) 半自回归模型};
\draw[->,red,very thick](rec1.east)to(rec2.west);
\draw[->,red,very thick](rec2.east)to(rec3.west);
\draw[->,thick] (rec1.north) .. controls ([yshift=3em]rec1.north) and ([yshift=3em]rec3.north).. (rec3.north);
\end{scope}
\begin{scope}[xshift=0.3in,yshift=-2.5in]
\tikzstyle{every node}=[scale=1]
\node (y1)[yy]{\large$y_1$};
\node (y2)[yy,right of = y1,xshift=1.5cm]{\large$y_2$};
\node (y3)[yy,right of = y2,xshift=1.5cm]{\large$y_3$};
\node (y4)[yy,right of = y3,xshift=1.5cm]{\large$y_4$};
\node (y5)[yy,right of = y4,xshift=1.5cm]{\large$y_5$};
\node (y6)[yy,right of = y5,xshift=1.5cm]{\large$y_6$};
\node [anchor=north,font=\scriptsize] (labelc) at ([xshift=1.5em,yshift=-0.5em]y3.south) {(c) 非自回归模型};
\end{scope}
\end{tikzpicture}
\ No newline at end of file
\begin{tikzpicture}
\tikzstyle{snode} = [draw,inner sep=1pt,minimum width=3em,minimum height=0.5em,rounded corners=1pt,fill=green!30!white]
\tikzstyle{pnode} = [draw,inner sep=1pt,minimum width=1em,minimum height=0.5em,rounded corners=1pt]
\node [anchor=west,snode] (s1) at (0,0) {\tiny{}};
\node [anchor=north west,snode,minimum width=6.3em] (s2) at ([yshift=-0.3em]s1.south west) {\tiny{}};
\node [anchor=north west,snode,minimum width=2em] (s3) at ([yshift=-0.3em]s2.south west) {\tiny{}};
\node [anchor=north west,snode,minimum width=5.5em] (s4) at ([yshift=-0.3em]s3.south west) {\tiny{}};
\node [anchor=north west,snode,minimum width=5.8em] (s5) at ([yshift=-0.3em]s4.south west) {\tiny{}};
\node [anchor=north west,snode,minimum width=3em] (s6) at ([yshift=-0.3em]s5.south west) {\tiny{}};
\node [anchor=east] (label1) at ([xshift=-0.8em,yshift=0.6em]s1.west) {{句子:}};
\node [anchor=west,pnode,minimum width=3em] (p1) at ([xshift=0.3em]s1.east) {\tiny{}};
\node [anchor=west,pnode,minimum width=4em] (p3) at ([xshift=0.3em]s3.east) {\tiny{}};
\node [anchor=west,pnode,minimum width=0.5em] (p4) at ([xshift=0.3em]s4.east) {\tiny{}};
\node [anchor=west,pnode,minimum width=0.2em] (p5) at ([xshift=0.3em]s5.east) {\tiny{}};
\node [anchor=west,pnode,minimum width=3em] (p6) at ([xshift=0.3em]s6.east) {\tiny{}};
\node [rectangle,inner sep=0.5em,rounded corners=2pt,very thick,dotted,draw=ugreen!80] [fit = (s1) (s6) (p1) (p6)] (box0) {};
\node[rectangle,inner sep=0.5em,rounded corners=1pt,draw,fill=blue!15] (model) at ([xshift=4em]box0.east){{模型}};
% big batch
\node [anchor=west,snode] (sbi1) at ([xshift=3em,yshift=6em]model.east) {\tiny{}};
\node [anchor=north west,snode,minimum width=6.3em] (sbi2) at ([yshift=-0.3em]sbi1.south west) {\tiny{}};
\node [anchor=north west,snode,minimum width=2em] (sbi3) at ([yshift=-0.3em]sbi2.south west) {\tiny{}};
\node [anchor=north west,snode,minimum width=5.5em] (sbi4) at ([yshift=-0.3em]sbi3.south west) {\tiny{}};
\node [anchor=north west,snode,minimum width=5.8em] (sbi5) at ([yshift=-0.3em]sbi4.south west) {\tiny{}};
\node [anchor=north west,snode,minimum width=3em] (sbi6) at ([yshift=-0.3em]sbi5.south west) {\tiny{}};
\node [anchor=east] (label1) at ([xshift=-0.8em,yshift=-1em]sbi1.west) {{大batch}};
\node [anchor=west,pnode,minimum width=3em] (pbi1) at ([xshift=0.3em]sbi1.east) {\tiny{}};
\node [anchor=west,pnode,minimum width=4em] (pbi3) at ([xshift=0.3em]sbi3.east) {\tiny{}};
\node [anchor=west,pnode,minimum width=0.5em] (pbi4) at ([xshift=0.3em]sbi4.east) {\tiny{}};
\node [anchor=west,pnode,minimum width=0.2em] (pbi5) at ([xshift=0.3em]sbi5.east) {\tiny{}};
\node [anchor=west,pnode,minimum width=3em] (pbi6) at ([xshift=0.3em]sbi6.east) {\tiny{}};
\node [rectangle,inner sep=0.5em,rounded corners=2pt,very thick,dotted,draw=ugreen!80] [fit = (sbi1) (sbi6) (pbi1) (pbi6)] (box1) {};
% small batch
\node [anchor=west,snode,minimum width=5.5em] (sma1) at ([xshift=3em,yshift=-3em]model.east) {\tiny{}};
\node [anchor=north west,snode,minimum width=5.8em] (sma2) at ([yshift=-0.3em]sma1.south west) {\tiny{}};
\node [anchor=north west,snode,minimum width=6.3em] (sma3) at ([yshift=-0.3em]sma2.south west) {\tiny{}};
\node [anchor=east] (label1) at ([xshift=-0.8em,yshift=-2em]sma1.west) {{小batch}};
\node [anchor=west,pnode,minimum width=0.5em] (pma1) at ([xshift=0.3em]sma1.east) {\tiny{}};
\node [anchor=west,pnode,minimum width=0.2em] (pma2) at ([xshift=0.3em]sma2.east) {\tiny{}};
\node [rectangle,inner sep=0.5em,rounded corners=2pt,very thick,dotted,draw=ugreen!80] [fit = (sma1) (sma3) (pma1) (pma2)] (box2) {};
% small batch
\node [anchor=west,snode,minimum width=2em] (sma4) at ([xshift=4em,yshift=0em]sma1.east) {\tiny{}};
\node [anchor=north west,snode,minimum width=3em] (sma5) at ([yshift=-0.3em]sma4.south west) {\tiny{}};
\node [anchor=north west,snode,minimum width=3em] (sma6) at ([yshift=-0.3em]sma5.south west) {\tiny{}};
\node [anchor=west,pnode,minimum width=0.7em] (pma4) at ([xshift=0.3em]sma4.east) {\tiny{}};
\node [rectangle,inner sep=0.5em,rounded corners=2pt,very thick,dotted,draw=ugreen!80] [fit = (sma4) (sma6) (pma4)] (box3) {};
\draw [->,very thick] (box0.east) -- (model.west);
\draw [->,thick] (model.east) .. controls +(east:0.5) and +(west:0.5) .. ([xshift=-1em]box1.west);
\draw [->,thick] (model.east) .. controls +(east:0.5) and +(west:0.5) .. ([xshift=-1em]box2.west);
\draw [->,very thick] (box2.east) -- (box3.west);
%%%%%
\node [] (t10) at ([yshift=1.5em]box1.north) {$t_1$};
\node [] (t11) at ([yshift=1.5em]box2.north) {$t_1$};
\node [] (t2) at ([yshift=1.5em]box3.north) {$t_2$};
\draw [very thick,decorate,decoration={brace}] ([xshift=0em,yshift=0.3em]box1.north west) to node [midway,name=final] {} ([xshift=0em,yshift=0.3em]box1.north east);
\draw [very thick,decorate,decoration={brace}] ([xshift=0em,yshift=0.3em]box2.north west) to node [midway,name=final] {} ([xshift=0em,yshift=0.3em]box2.north east);
\draw [very thick,decorate,decoration={brace}] ([xshift=0em,yshift=0.3em]box3.north west) to node [midway,name=final] {} ([xshift=0em,yshift=0.3em]box3.north east);
\node [] (m1) at ([xshift=1.5em]box1.east) {$m_1$};
\node [] (m2) at ([xshift=1.5em]box3.east) {$m_2$};
\draw [very thick,decorate,decoration={brace}] ([xshift=3pt]box1.north east) to node [midway,name=final] {} ([xshift=3pt]box1.south east);
\draw [very thick,decorate,decoration={brace}] ([xshift=3pt]box3.north east) to node [midway,name=final] {} ([xshift=3pt]box3.south east);
\node [rectangle,inner sep=0.5em,rounded corners=2pt,draw,fill=red!5,font=\scriptsize] at ([yshift=-2em,xshift=10em]sbi1.east) {
\begin{tabular}{l}
$m$: 显存 \\
$t$: 时间 \\
$m_1>m_2$ \\
$t_1>t_2$
\end{tabular}
};
\end{tikzpicture}
\ No newline at end of file
%%%------------------------------------------------------------------------------------------------------------
\begin{tikzpicture}
\scriptsize{
\begin{axis}[
width=8cm,
height=5cm,
yticklabel style={/pgf/number format/.cd,fixed,precision=2},
xticklabel style={color=white},
xlabel={\footnotesize{Beam Size}},ylabel={\footnotesize{BLEU}},
ymin=28.8,ymax=30.4,
xmin=0,xmax=10,
xtick={0,4,6,7,8,9,10},
ytick={28.8,29.0,29.2,29.4,29.6,29.8,30.0,30.2,30.4},
legend style={yshift=-5em,xshift=0em,legend cell align=left,legend plot pos=right}
]
\addplot[purple,mark=square,mark=star,very thick] coordinates {(0,29.3) (4,29.7) (6,30.05) (7,30.1) (7.6,30.2) (8,30.3) (8.3,30.2) (8.7,30.08) (9,29.98) (9.6,29.6)(10,28.8) };
\end{axis}
}
\node[inner sep=0pt] at (0,-1em) {1};
\node[inner sep=0pt] at (9.15em,-1em) {2};
\node[inner sep=0pt] at (13.75em,-1em) {3};
\node[inner sep=0pt] at (16.05em,-1em) {5};
\node[inner sep=0pt] at (18.35em,-1em) {10};
\node[inner sep=0pt] at (20.65em,-1em) {30};
\node[inner sep=0pt] at (22.9em,-1em) {100};
\end{tikzpicture}
%%%--------------------------------------------------------------------------------------------
\ No newline at end of file
%\definecolor{dblue}{cmyk}{0.99998,1,0,0 }
\definecolor{dblue}{cmyk}{100,0,90,0 }
\begin{tikzpicture}[decoration=brace]
\begin{scope}
\setlength{\wseg}{1.5cm}
\setlength{\hseg}{0.6cm}
\setlength{\wnode}{2cm}
\setlength{\hnode}{1.2cm}
\tikzstyle{layernode} = [rectangle,draw,thick,densely dotted,inner sep=3pt,rounded corners,minimum width=2.1\wnode,minimum height=2.7\hnode]
\tikzstyle{attnnode} = [rectangle,draw,inner sep=3pt,rounded corners, minimum width=2\wnode,minimum height=2.2\hnode]
\tikzstyle{thinnode} = [rectangle,inner sep=1pt,rounded corners=1pt,minimum size=0.3\hnode,font=\scriptsize]
\tikzstyle{fatnode} = [rectangle,inner sep=1pt,rounded corners=1pt,minimum height=0.3\hnode,minimum width=\wnode,font=\small]
% 0.3\wseg here can be used to determine the distance between two adjacent blocks
\coordinate (layer00) at (0,0);
\foreach \i / \j in {1/0,2/1,3/2,4/3,5/4}
\coordinate (layer0\i) at ([xshift=2.05\wnode+0.3\wseg]layer0\j);
\node[layernode,anchor=north] (layer11) at ([yshift=-\hseg]layer01.south) {};
\node[attnnode,anchor=south] (attn11) at ([yshift=0.1\hnode]layer11.south) {};
\node[anchor=north west,inner sep=4pt,font=\small] () at (attn11.north west) {Attention};
\node[anchor=south,inner sep=0pt] (out11) at ([yshift=0.3\hseg]attn11.north) {$\cdots$};
\node[thinnode,anchor=south west,thick,draw=dblue,text=black] (q11) at ([xshift=0.1\wseg,yshift=0.2\hseg]attn11.south west) {$Q^n$};
\node[thinnode,anchor=south,thick,draw=orange,text=black] (k11) at ([yshift=0.2\hseg]attn11.south) {$K^n$};
\node[thinnode,anchor=south east,thick,draw=purple,text=black] (v11) at ([xshift=-0.1\wseg,yshift=0.2\hseg]attn11.south east) {$V^n$};
\node[fatnode,anchor=south,thick,draw] (s11) at ([xshift=0.5\wseg,yshift=0.8\hseg]q11.north east) {$S^n\!=\!S(Q^n\!\cdot\!K^n)$};
\node[fatnode,anchor=south,thick,draw] (a11) at ([xshift=0.45\wseg,yshift=1.3\hseg+0.6\hnode]k11.north east) {$A^n\!=\!S^n\!\cdot\!V$};
\begin{scope}[fill=black!100]
\draw[-latex',thick,draw=black!100] (q11.north) .. controls +(north:0.5\hseg) and +(south:0.8\hseg) .. (s11.south);
\draw[-latex',thick,draw=black!100] (k11.north) .. controls +(north:0.5\hseg) and +(south:0.8\hseg) .. (s11.south);
\end{scope}
\begin{scope}[fill=black!100]
\draw[-latex',thick,draw=black!100] (s11.north) .. controls +(north:0.7\hseg) and +(south:0.8\hseg) ..(a11.south);
\draw[-latex',thick,draw=black!100] (v11.north) .. controls +(north:2.7\hseg) and +(south:0.9\hseg) .. (a11.south);
\end{scope}
\draw[-latex',thick] (a11.north).. controls +(north:0.3\hseg) and +(south:0.7\hseg) ..(out11.south);
\node[layernode,anchor=north] (layer12) at ([yshift=-\hseg]layer02.south) {};
\node[attnnode,anchor=south] (attn12) at ([yshift=0.1\hnode]layer12.south) {};
\node[anchor=north west,inner sep=4pt,font=\small] () at (attn12.north west) {Attention};
\node[anchor=south,inner sep=0pt] (out12) at ([yshift=0.3\hseg]attn12.north) {$\cdots$};
\node[thinnode,anchor=south west,thick,draw=dblue!40,text=black!40] (q12) at ([xshift=0.1\wseg,yshift=0.2\hseg]attn12.south west) {$Q^n$};
\node[thinnode,anchor=south,thick,draw=orange!40,text=black!40] (k12) at ([yshift=0.2\hseg]attn12.south) {$K^n$};
\node[thinnode,anchor=south east,thick,draw=purple,text=black] (v12) at ([xshift=-0.1\wseg,yshift=0.2\hseg]attn12.south east) {$V^n$};
\node[fatnode,anchor=south,thick,densely dashed,draw] (s12) at ([xshift=0.5\wseg,yshift=0.8\hseg]q12.north east) {$S^n\!=\!S^m$};
\node[fatnode,anchor=south,thick,draw] (a12) at ([xshift=0.45\wseg,yshift=1.3\hseg+0.6\hnode]k12.north east) {$A^n\!=\!S^n\!\cdot\!V$};
\begin{scope}[fill=black!40]
\draw[-latex',thick,draw=black!40] (q12.north) .. controls +(north:0.5\hseg) and +(south:0.8\hseg) .. (s12.south);
\draw[-latex',thick,draw=black!40] (k12.north) .. controls +(north:0.5\hseg) and +(south:0.8\hseg) .. (s12.south);
\end{scope}
\begin{scope}[fill=black!100]
\draw[-latex',thick,draw=black!100] (s12.north).. controls +(north:0.7\hseg) and +(south:0.8\hseg) .. (a12.south);
\draw[-latex',thick,draw=black!100] (v12.north).. controls +(north:2.7\hseg) and +(south:0.9\hseg) .. (a12.south);
\end{scope}
\draw[-latex',thick] (a12.north).. controls +(north:0.3\hseg) and +(south:0.7\hseg) ..(out12.south);
\node[layernode,anchor=north] (layer13) at ([yshift=-\hseg]layer03.south) {};
\node[attnnode,anchor=south] (attn13) at ([yshift=0.1\hnode]layer13.south) {};
\node[anchor=north west,inner sep=4pt,font=\small] () at (attn13.north west) {Attention};
\node[anchor=south,inner sep=0pt] (out13) at ([yshift=0.3\hseg]attn13.north) {$\cdots$};
\node[thinnode,anchor=south west,thick,draw=dblue!40,text=black!40] (q13) at ([xshift=0.1\wseg,yshift=0.2\hseg]attn13.south west) {$Q^n$};
\node[thinnode,anchor=south,thick,draw=orange!40,text=black!40] (k13) at ([yshift=0.2\hseg]attn13.south) {$K^n$};
\node[thinnode,anchor=south east,thick,draw=purple!40,text=black!40] (v13) at ([xshift=-0.1\wseg,yshift=0.2\hseg]attn13.south east) {$V^n$};
\node[fatnode,anchor=south,thick,draw=black!40,text=black!40] (s13) at ([xshift=0.5\wseg,yshift=0.8\hseg]q13.north east) {$S^n$};
\node[fatnode,anchor=south,thick,densely dashed,draw] (a13) at ([xshift=0.45\wseg,yshift=1.3\hseg+0.6\hnode]k13.north east) {$A^n\!=\!A^m$};
\begin{scope}[fill=black!40]
\draw[-latex',thick,draw=black!40] (q13.north) .. controls +(north:0.5\hseg) and +(south:0.8\hseg) .. (s13.south);
\draw[-latex',thick,draw=black!40] (k13.north) .. controls +(north:0.5\hseg) and +(south:0.8\hseg) .. (s13.south);
\end{scope}
\begin{scope}[fill=black!40]
\draw[-latex',thick,draw=black!40] (s13.north) .. controls +(north:0.7\hseg) and +(south:0.8\hseg) .. (a13.south);
\draw[-latex',thick,draw=black!40] (v13.north) .. controls +(north:2.7\hseg) and +(south:0.9\hseg) .. (a13.south);
\end{scope}
\draw[-latex',thick] (a13.north).. controls +(north:0.3\hseg) and +(south:0.7\hseg) ..(out13.south);
\foreach \i / \j / \k / \q / \s / \t / \v in
{2/1/1/100/100/100/100, 2/2/1/100/100/100/100, 2/3/1/100/100/100/100}
{
\node[layernode,anchor=north] (layer\i\j) at ([yshift=-0.8\hseg]layer\k\j.south) {};
\node[attnnode,anchor=south] (attn\i\j) at ([yshift=0.1\hnode]layer\i\j.south) {};
\node[anchor=north west,inner sep=4pt,font=\small] () at (attn\i\j.north west) {Attention};
\node[anchor=south,inner sep=0pt] (out\i\j) at ([yshift=0.3\hseg]attn\i\j.north) {$\cdots$};
\node[thinnode,anchor=south west,thick,draw=dblue!\q,text=black] (q\i\j) at ([xshift=0.1\wseg,yshift=0.2\hseg]attn\i\j.south west) {$Q^m$};
\node[thinnode,anchor=south,thick,draw=orange!\q,text=black] (k\i\j) at ([yshift=0.2\hseg]attn\i\j.south) {$K^m$};
\node[thinnode,anchor=south east,thick,draw=purple!\s,text=black] (v\i\j) at ([xshift=-0.1\wseg,yshift=0.2\hseg]attn\i\j.south east) {$V^m$};
\node[fatnode,anchor=south,thick,draw=black!\s] (s\i\j) at ([xshift=0.45\wseg,yshift=0.8\hseg]q\i\j.north east) {$S^m\!=\!S(Q^m\!\cdot\!K^m)$};
\node[fatnode,anchor=south,thick,draw=black!80] (a\i\j) at ([xshift=0.45\wseg,yshift=1.3\hseg+0.6\hnode]k\i\j.north east) {$A^m\!=\!S^m\!\cdot\!V$};
\begin{scope}[fill=black!\q]
\draw[-latex',thick,draw=black!\t] (q\i\j.north) .. controls +(north:0.5\hseg) and +(south:0.8\hseg) .. (s\i\j.south);
\draw[-latex',thick,draw=black!\t] (k\i\j.north) .. controls +(north:0.5\hseg) and +(south:0.8\hseg) .. (s\i\j.south);
\end{scope}
\begin{scope}[fill=black!\s]
\draw[-latex',thick,draw=black!\v] (s\i\j.north).. controls +(north:0.7\hseg) and +(south:0.8\hseg) ..(a\i\j.south);
\draw[-latex',thick,draw=black!\v] (v\i\j.north).. controls +(north:2.7\hseg) and +(south:0.9\hseg) ..(a\i\j.south);
\end{scope}
\draw[-latex',thick] (a\i\j.north).. controls +(north:0.3\hseg) and +(south:0.7\hseg) ..(out\i\j.south);
}
\draw[-latex',densely dashed,very thick] (s22.west) to [out=120,in=-120] (s12.west);
\draw[-latex',densely dashed,very thick] (a23.east) to [out=60,in=-60] (a13.east);
\foreach \i in {1,2,3}
{
\node[anchor=north west,inner sep=3pt,font=\tiny] () at ([yshift=-0.2em]layer1\i.north west) {Layer $n\!=\!m\!+\!i$};
\node[anchor=north west,inner sep=3pt,font=\tiny] () at ([yshift=-0.2em]layer2\i.north west) {Layer $m$};
\node[anchor=center,inner sep=1pt] (dot1\i) at ([yshift=0.5\hseg]layer1\i.north) {$\cdots$};
\draw[->,thick] (out1\i.north) -- ([yshift=0.1em]dot1\i.south);
\node[anchor=center,inner sep=1pt] (dot2\i) at ([yshift=-0.4\hseg]layer1\i.south) {$\cdots$};
\draw[->,thick] ([yshift=-0.15em]dot2\i.north) -- ([yshift=-0.3em]attn1\i.south);
\draw[->,thick] (out2\i.north) -- ([yshift=0.1em]dot2\i.south);
\node[anchor=center,inner sep=1pt] (dot3\i) at ([yshift=-0.4\hseg]layer2\i.south) {$\cdots$};
\draw[->,thick] ([yshift=-0.15em]dot3\i.north) -- ([yshift=-0.3em]attn2\i.south);
}
\node[anchor=north,align=left,inner sep=1pt,font=\footnotesize] () at (dot31.south) {(a) Standard Transformer Attention};
\node[anchor=north,align=left,inner sep=1pt,font=\footnotesize] () at (dot32.south) {(b) \textsc{San} Self-Attention};
\node[anchor=north,align=left,inner sep=1pt,font=\footnotesize] () at (dot33.south) {(c) \textsc{San} Encoder-Decoder Attention};
\end{scope}
\end{tikzpicture}
\ No newline at end of file
\centering
\hspace*{\fill}
\subfigure[假设选择]
{
\begin{tikzpicture}[scale=0.5]
\tikzstyle{system} = [rectangle,very thick,minimum width=1cm,font=\tiny];
\tikzstyle{output} = [rectangle,very thick,rounded corners=3pt,minimum width=1cm,align=center,font=\tiny];
\begin{scope}
\node [system,draw=orange,text=orange] (model3) at (0,0) {模型 $3$};
\node [system,draw=ugreen,text=ugreen,anchor=south] (model2) at ([yshift=0.3cm]model3.north) {模型 $2$};
\node [system,draw=red,text=red,anchor=south] (model1) at ([yshift=0.3cm]model2.north) {模型 $1$};
\node [output,draw=orange,text=orange,anchor=west] (output3) at ([xshift=0.5cm]model3.east) {输出 $3$};
\node [output,draw=ugreen,text=ugreen,anchor=west] (output2) at ([xshift=0.5cm]model2.east) {输出 $2$};
\node [output,draw=red,text=red,anchor=west] (output1) at ([xshift=0.5cm]model1.east) {输出 $1$};
\begin{pgfonlayer}{background}
\node [draw,thick,dashed,rounded corners=3pt,inner sep=2pt,fit=(output1) (output2) (output3)] (output) {};
\end{pgfonlayer}
\node [output,draw=ublue,text=ublue,minimum width=1cm,right=1cm of output] (final) {最终\\输出};
\draw [->,very thick] (model1) to (output1);
\draw [->,very thick] (model2) to (output2);
\draw [->,very thick] (model3) to (output3);
\draw [->,very thick] (output) to node [above,pos=0.5,font=\tiny] {选择} (final);
\end{scope}
\end{tikzpicture}
}
\hfill
\subfigure[预测融合]
{
\begin{tikzpicture}[scale=0.5]
\tikzstyle{system} = [rectangle,very thick,minimum width=1cm,font=\tiny];
\tikzstyle{output} = [rectangle,very thick,rounded corners=3pt,minimum width=1cm,align=center,font=\tiny];
\begin{scope}
\node [system,draw=orange,text=orange] (model3) at (0,0) {模型 $3$};
\node [system,draw=ugreen,text=ugreen,anchor=south] (model2) at ([yshift=0.3cm]model3.north) {模型 $2$};
\node [system,draw=red,text=red,anchor=south] (model1) at ([yshift=0.3cm]model2.north) {模型 $1$};
\begin{pgfonlayer}{background}
\node [draw,thick,dashed,inner sep=2pt,fit=(model3) (model2) (model1)] (ensemble) {};
\end{pgfonlayer}
\node [system,draw=ugreen,text=ugreen,right=1cm of ensemble] (model) {模型};
\node [output,draw=ublue,text=ublue,minimum width=1cm,anchor=west] (final) at ([xshift=0.5cm]model.east) {最终\\输出};
\draw [->,very thick] (ensemble) to node [above,pos=0.5,font=\tiny] {融合} (model);
\draw [->,very thick] (model) to (final);
\end{scope}
\end{tikzpicture}
}
\hspace*{\fill}
\\
\subfigure[译文重组]
{
\begin{tikzpicture}[scale=0.5]
\tikzstyle{system} = [rectangle,very thick,minimum width=1cm,font=\tiny];
\tikzstyle{output} = [rectangle,very thick,rounded corners=3pt,minimum width=1cm,align=center,font=\tiny];
\tikzstyle{dot} = [circle,fill=blue!40!white,minimum size=5pt,inner sep=0pt];
\begin{scope}
\node [system,draw=orange,text=orange] (model3) at (0,0) {模型 $3$};
\node [system,draw=ugreen,text=ugreen,anchor=south] (model2) at ([yshift=0.3cm]model3.north) {模型 $2$};
\node [system,draw=red,text=red,anchor=south] (model1) at ([yshift=0.3cm]model2.north) {模型 $1$};
\node [output,draw=orange,text=orange,anchor=west] (output3) at ([xshift=0.5cm]model3.east) {输出 $3$};
\node [output,draw=ugreen,text=ugreen,anchor=west] (output2) at ([xshift=0.5cm]model2.east) {输出 $2$};
\node [output,draw=red,text=red,anchor=west] (output1) at ([xshift=0.5cm]model1.east) {输出 $1$};
\draw [->,very thick] (model1) to (output1);
\draw [->,very thick] (model2) to (output2);
\draw [->,very thick] (model3) to (output3);
\begin{pgfonlayer}{background}
\node [draw,thick,dashed,rounded corners=3pt,inner sep=2pt,fit=(output1) (output2) (output3)] (output) {};
\end{pgfonlayer}
\node [dot,anchor=west] (lattice1) at ([shift={(1.5cm,0.5cm)}]output2.east) {};
\node [dot,anchor=west] (lattice2) at ([shift={(1cm,0)}]lattice1.east) {};
\node [dot,anchor=west] (lattice3) at ([shift={(1cm,0)}]lattice2.east) {};
\node [dot,anchor=west] (lattice4) at ([shift={(1.5cm,-0.5cm)}]output2.east) {};
\node [dot,anchor=west] (lattice5) at ([shift={(1cm,0)}]lattice4.east) {};
\draw [-latex,blue] (lattice1) to [out=30,in=150] (lattice2);
\draw [-latex,blue] (lattice2) to [out=30,in=150] (lattice3);
\draw [-latex,blue] (lattice4) to [out=15,in=-120] (lattice2);
\draw [-latex,blue] (lattice4) to [out=-30,in=-150] (lattice5);
\draw [-latex,blue] (lattice5) to [out=15,in=-120] (lattice3);
\draw [-latex,blue] (lattice5) to [out=-60,in=-90] (lattice3);
\begin{pgfonlayer}{background}
\node [draw=blue,fill=white,drop shadow,thick,rounded corners=3pt,inner sep=5pt,fit=(lattice1) (lattice2) (lattice3) (lattice4) (lattice5),label={[font=\tiny,label distance=0pt]90:Lattice}] (lattice) {};
\end{pgfonlayer}
\draw [->,very thick] (output) to (lattice);
\node [system,draw=purple,text=purple,anchor=west] (model) at ([xshift=5.3cm]output1.east) {模型};
\node [output,draw=ublue,text=ublue,minimum width=1cm,right=1.3cm of lattice] (final) {最终输出};
\draw [->,very thick] (model) |- (final);
\draw [->,very thick] (lattice) -- (final);
\end{scope}
\end{tikzpicture}
}
\begin{tikzpicture}
\tikzstyle{layer} = [rectangle,draw,rounded corners=3pt,minimum width=1cm,minimum height=0.5cm,line width=1pt];
\tikzstyle{prob} = [minimum width=0.3cm,rectangle,fill=ugreen!20!white,inner sep=0pt];
\begin{scope}[local bounding box=STANDARD]
\node [] (input1) at (0,0) {$\cdots$};
\node [anchor=south,layer,fill=orange!15!white] (net1) at ([yshift=0.5cm]input1.north) {};
\node [anchor=south,layer,fill=orange!15!white] (out1) at ([yshift=0.5cm]net1.north) {};
\node [anchor=south,prob,minimum height=0.9cm] (prob5) at ([yshift=1.2cm]out1.north) {};
\node [anchor=south east,prob,minimum height=0.1cm] (prob4) at ([xshift=-1pt]prob5.south west) {};
\node [anchor=south east,prob,minimum height=0.2cm] (prob3) at ([xshift=-1pt]prob4.south west) {};
\node [anchor=south east,prob,minimum height=0.5cm] (prob2) at ([xshift=-1pt]prob3.south west) {};
\node [anchor=south east,prob,minimum height=0.4cm] (prob1) at ([xshift=-1pt]prob2.south west) {};
\node [anchor=south west,prob,minimum height=0.6cm] (prob6) at ([xshift=1pt]prob5.south east) {};
\node [anchor=south west,prob,minimum height=0.3cm] (prob7) at ([xshift=1pt]prob6.south east) {};
\node [anchor=south west,prob,minimum height=0.2cm] (prob8) at ([xshift=1pt]prob7.south east) {};
\node [anchor=south west,prob,minimum height=0.1cm] (prob9) at ([xshift=1pt]prob8.south east) {};
\path [fill=blue!20!white,draw=white] (out1.north west) -- (prob1.south west) -- (prob9.south east) -- (out1.north east) -- (out1.north west);
\draw [->,line width=1pt] (input1) to (net1);
\draw [->,line width=1pt] (net1) to (out1);
\node [font=\small] (label1) at ([yshift=0.6cm]out1.north) {Softmax};
\end{scope}
\begin{scope}[local bounding box=SELECTION]
\node [] (input2) at (4.5cm,0) {$\cdots$};
\node [anchor=south,layer,fill=orange!15!white] (net2) at ([yshift=0.5cm]input2.north) {};
\node [anchor=south,layer,fill=orange!15!white] (out2) at ([yshift=0.5cm]net2.north) {};
\node [anchor=south,prob,minimum height=0.9cm] (prob5) at ([yshift=1.2cm]out2.north) {};
\node [anchor=south east,prob,minimum height=0.1cm,opacity=0] (prob4) at ([xshift=-1pt]prob5.south west) {};
\node [text=red,anchor=south,inner sep=1pt] () at (prob4.south) {$\times$};
\node [anchor=south east,prob,minimum height=0.2cm,opacity=0] (prob3) at ([xshift=-1pt]prob4.south west) {};
\node [text=red,anchor=south,inner sep=1pt] () at (prob3.south) {$\times$};
\node [anchor=south east,prob,minimum height=0.5cm] (prob2) at ([xshift=-1pt]prob3.south west) {};
\node [anchor=south east,prob,minimum height=0.4cm] (prob1) at ([xshift=-1pt]prob2.south west) {};
\node [anchor=south west,prob,minimum height=0.6cm,opacity=0] (prob6) at ([xshift=1pt]prob5.south east) {};
\node [text=red,anchor=south,inner sep=1pt] () at (prob6.south) {$\times$};
\node [anchor=south west,prob,minimum height=0.3cm,opacity=0] (prob7) at ([xshift=1pt]prob6.south east) {};
\node [text=red,anchor=south,inner sep=1pt] () at (prob7.south) {$\times$};
\node [anchor=south west,prob,minimum height=0.2cm] (prob8) at ([xshift=1pt]prob7.south east) {};
\node [anchor=south west,prob,minimum height=0.1cm,opacity=0] (prob9) at ([xshift=1pt]prob8.south east) {};
\node [text=red,anchor=south,inner sep=1pt] (plabel9) at (prob9.south) {$\times$};
\path [fill=blue!20!white,draw=white] (out2.north west) -- (prob1.south west) -- (prob9.south east) -- (out2.north east) -- (out2.north west);
\draw [->,line width=1pt] (input2) to (net2);
\draw [->,line width=1pt] (net2) to (out2);
\node [font=\small] (label2) at ([yshift=0.6cm]out2.north) {Softmax};
\node [anchor=west,layer,fill=orange!15!white] (net3) at ([xshift=2cm]net2.east) {};
\node [anchor=north,font=\scriptsize] (input3) at ([yshift=-0.5cm]net3.south) {源语};
\node [anchor=south,layer,align=center,font=\scriptsize,fill=yellow!10!white] (out3) at ([yshift=0.9cm]net3.north) {候选\\列表};
\draw [->,line width=1pt] (input3) to (net3);
\draw [->,line width=1pt] (net3) to (out3);
\draw [->,line width=1pt] (out3) |- (plabel9.east);
\end{scope}
\node [anchor=north,font=\scriptsize] () at ([yshift=-0.2em]STANDARD.south) {(a) 标准方法};
\node [anchor=north,font=\scriptsize] () at ([xshift=1.2em]SELECTION.south) {(b) 词汇选择};
\end{tikzpicture}
\ No newline at end of file
\begin{tikzpicture}
\node [anchor=north west] (part1) at (0,0) {\small{$\begin{bmatrix} \textrm{Have} \; 0.5 \\ \textrm{Has} \ \ \; 0.1 \\ . \\ . \\ . \\ . \\ . \end{bmatrix}$}};
\node [anchor=north](p1) at ([yshift=-0.3em]part1.south) {$P_1$};
\node [anchor=west](part2) at ([xshift=0.5em]part1.east){\small{$\begin{bmatrix} \textrm{Have} \; 0.2 \\ \textrm{Has} \ \ \; 0.3 \\ . \\ . \\ . \\ . \\ . \end{bmatrix}$}};
\node [anchor=north](p2) at ([yshift=-0.3em]part2.south) {$P_2$};
\node [anchor=west](part3) at ([xshift=0.5em]part2.east){\small{$\begin{bmatrix} \textrm{Have} \; 0.4 \\ \textrm{Has} \ \ \; 0.3 \\ . \\ . \\ . \\ . \\ . \end{bmatrix}$}};
\node [anchor=north](p3) at ([yshift=-0.3em]part3.south) {$P_3$};
\node [anchor=west](part4) at ([xshift=0.5em]part3.east){\huge{$\Rightarrow$}};
\node [anchor=west](part5) at ([xshift=0.5em]part4.east){\small{$\begin{bmatrix} \textrm{Have} \; 0.37 \\ \textrm{Has} \ \ \; 0.23 \\ . \\ . \\ . \\ . \\ . \end{bmatrix}$}};
\node [anchor=north](p5) at (part5.south) {$P=\sum\limits_{i=1}^{3}{\frac{1}{3}P_{i}}$};
\end{tikzpicture}
\begin{tikzpicture}
\tikzstyle{system} = [rectangle,very thick,minimum width=1.5cm,font=\scriptsize];
\tikzstyle{output} = [rectangle,very thick,rounded corners=3pt,minimum width=1.5cm,align=center,font=\scriptsize];
\begin{scope}[local bounding box=MULTIPLE]
\node [system,draw=orange,text=orange] (engine3) at (0,0) {系统 $n$};
\node [system,draw=ugreen,text=ugreen,anchor=south] (engine2) at ([yshift=0.6cm]engine3.north) {系统 $2$};
\node [system,draw=red,text=red,anchor=south] (engine1) at ([yshift=0.3cm]engine2.north) {系统 $1$};
\node [output,draw=orange,text=orange,anchor=west] (output3) at ([xshift=0.5cm]engine3.east) {输出 $n$};
\node [output,draw=ugreen,text=ugreen,anchor=west] (output2) at ([xshift=0.5cm]engine2.east) {输出 $2$};
\node [output,draw=red,text=red,anchor=west] (output1) at ([xshift=0.5cm]engine1.east) {输出 $1$};
\draw [very thick,decorate,decoration={brace}] ([xshift=3pt]output1.north east) to node [midway,name=final] {} ([xshift=3pt]output3.south east);
\node [output,draw=ublue,text=ublue,minimum width=1cm,right=0pt of final,minimum height=2.5em] () {最终\\输出};
\draw [->,very thick] (engine1) to (output1);
\draw [->,very thick] (engine2) to (output2);
\draw [->,very thick] (engine3) to (output3);
\node [] () at ([yshift=0.4cm]output3.north) {$\vdots$};
\end{scope}
\begin{scope}[local bounding box=SINGLE]
\node [output,draw=ugreen,text=ugreen,anchor=west] (output3) at ([xshift=4cm]output3.east) {输出 $n$};
\node [output,draw=ugreen,text=ugreen,anchor=west] (output2) at ([xshift=4cm]output2.east) {输出 $2$};
\node [output,draw=ugreen,text=ugreen,anchor=west] (output1) at ([xshift=4cm]output1.east) {输出 $1$};
\node [system,draw=ugreen,text=ugreen,anchor=east,align=center,inner sep=1.9pt] (engine) at ([xshift=-0.5cm]output2.west) {单系统};
\draw [very thick,decorate,decoration={brace}] ([xshift=3pt]output1.north east) to node [midway,name=final] {} ([xshift=3pt]output3.south east);
\node [output,draw=ublue,text=ublue,minimum width=1cm,right=0pt of final,minimum height=2.5em] () {最终\\输出};
\draw [->,very thick] (engine.east) to (output1.west);
\draw [->,very thick] (engine.east) to (output2.west);
\draw [->,very thick] (engine.east) to (output3.west);
\node [] () at ([yshift=0.4cm]output3.north) {$\vdots$};
\end{scope}
\node [align=center,anchor=north,font=\small] () at ([yshift=-0.3cm]MULTIPLE.south) {\footnotesize{(a) 多系统输出结构融合}};
\node [align=center,anchor=north,font=\small] () at ([yshift=-0.3cm]SINGLE.south) {\footnotesize{(b) 单系统多输出结果融合}};
\end{tikzpicture}
\ No newline at end of file
\tikzstyle{er} = [rectangle,minimum width=2.5cm,minimum height=1.5cm,text centered,draw=black]
\begin{tikzpicture}[node distance = 0,scale = 0.75]
\tikzstyle{every node}=[scale=0.75]
\node (encoder)[er,very thick,draw=black!70,fill=ugreen!20]{\Large{编码器}};
\node (decoder_1)[er,very thick,draw=black!70,right of=encoder,xshift=4cm,fill=red!20]{\Large{解码器1}};
\node (decoder_2)[er,very thick,draw=black!70,right of=decoder_1,xshift=4cm,fill=red!20]{\Large{解码器2}};
\node (point)[right of=decoder_2,xshift=2.5cm,]{\LARGE{...}};
\node (decoder_3)[er,very thick,draw=black!70,right of=point,xshift=2.5cm,fill=red!20]{\Large{解码器3}};
\draw [->,very thick,draw=black!70]([xshift=0.2cm]encoder.east) -- ([xshift=-0.2cm]decoder_1.west);
\draw [->,very thick,draw=black!70]([xshift=0.2cm]decoder_1.east) -- ([xshift=-0.2cm]decoder_2.west);
\draw [->,very thick,draw=black!70]([xshift=0.2cm]decoder_2.east) -- ([xshift=-0.1cm]point.west);
\draw [->,very thick,draw=black!70]([xshift=0.1cm]point.east) -- ([xshift=-0.2cm]decoder_3.west);
\draw [->,very thick,draw=black!70]([xshift=0,yshift=-1cm]encoder.south) -- ([xshift=0,yshift=-0.2cm]encoder.south);
\draw [->,very thick,draw=black!70]([xshift=0,yshift=0.2cm]encoder.north) -- ([xshift=0,yshift=1cm]encoder.north);
\node [below of = encoder,xshift=0cm,yshift=2.2cm]{预测目标长度};
\node [below of = encoder,xshift=0cm,yshift=-2.2cm]{\Large$x$};
\draw [->,very thick,draw=black!70]([xshift=0,yshift=-1cm]decoder_1.south) -- ([xshift=0,yshift=-0.2cm]decoder_1.south);
\draw [->,very thick,draw=black!70]([xshift=0,yshift=0.2cm]decoder_1.north) -- ([xshift=0,yshift=1cm]decoder_1.north);
\node [below of = decoder_1,xshift=0cm,yshift=-2.2cm]{\Large$x'$};
\node (line1_1)[below of = decoder_1,xshift=0cm,yshift=2.2cm]{\Large$y_1$};
\draw [->,thick,]([xshift=0,yshift=-1cm]decoder_2.south) -- ([xshift=0,yshift=-0.2cm]decoder_2.south);
\draw [->,very thick,draw=black!70]([xshift=0,yshift=0.2cm]decoder_2.north) -- ([xshift=0,yshift=1cm]decoder_2.north);
\node (line1_2)[below of = decoder_2,xshift=0cm,yshift=-2.2cm]{\Large$y_1$};
\node [below of = decoder_2,xshift=0cm,yshift=2.2cm]{\Large$y_2$};
\draw [->,very thick,draw=black!70]([xshift=0,yshift=-1cm]decoder_3.south) -- ([xshift=0,yshift=-0.2cm]decoder_3.south);
\draw [->,very thick,draw=black!70]([xshift=0,yshift=0.2cm]decoder_3.north) -- ([xshift=0,yshift=1cm]decoder_3.north);
\node (line3_2)[below of = decoder_3,xshift=0cm,yshift=-2.2cm]{\Large$y_{N-1}$};
\node [below of = decoder_3,xshift=0cm,yshift=2.2cm]{\Large$y_N$};
\draw[->,very thick,draw=black!70, out=0, in=180,dotted] (line1_1.east) to (line1_2.west);
\draw[->,very thick,draw=black!70, out=0, in=180,dotted] ([xshift=4cm]line1_1.east) to ([xshift=3cm]line1_2.west);
\draw[->,very thick,draw=black!70, out=0, in=180,dotted] ([xshift=6cm]line1_1.east) to (line3_2.west);
\end{tikzpicture}
\ No newline at end of file
\begin{tikzpicture}
%左
\node [anchor=west,draw=black,very thick,minimum width=6em,minimum height=3.5em,fill=blue!15,align=center,text=black] (part1) at (0,0) {\scriptsize{预测模块} \\ \tiny{(RNN/Transsformer)}};
\node [anchor=south] (text) at ([xshift=0.5em,yshift=-3.5em]part1.south) {\scriptsize{源语言句子(编码)}};
\node [anchor=east,draw=black,very thick,minimum width=6em,minimum height=3.5em,fill=blue!15,align=center,text=black] (part2) at ([xshift=10em]part1.east) {\scriptsize{搜索模块}};
\node [anchor=south] (text1) at ([xshift=0.5em,yshift=2.2em]part1.north) {\scriptsize{已经生成的目标语单词}};
\node [anchor=south] (text2) at ([xshift=0.5em,yshift=2.2em]part2.north) {\scriptsize{预测当前位置的单词分布}};
\draw [->,draw=black, thick] ([yshift=2em]part1.north) -- ([yshift=0.1em]part1.north);
\draw [->,draw=black, thick] ([yshift=-2em]part1.south) -- ([yshift=-0.1em]part1.south);
\draw [->,draw=black, thick] ([yshift=0em]part2.north) -- ([yshift=2em]part2.north);
\draw [->,draw=black,very thick] ([yshift=-0.7em]part1.east) -- ([xshift=-0.05em,yshift=-0.7em]part2.west);
\draw [->,draw=black,very thick,dashed] ([yshift=0.7em]part2.west) -- ([xshift=0.05em,yshift=0.7em]part1.east);
\end{tikzpicture}
\tikzstyle{er} = [rectangle,minimum width=7cm,minimum height=2.5cm,text centered,draw=black]
\begin{tikzpicture}[node distance = 0,scale = 0.55]
\tikzstyle{every node}=[scale=0.55]
\node (encoder)[er,very thick,minimum width=5cm,draw=black!70,fill=ugreen!20]{\huge{编码器}};
\node (decoder)[er,very thick,right of=encoder,xshift=7.75cm,draw=black!70,fill=red!20]{\huge{解码器}};
\node (decoder_1)[er,very thick,right of=decoder,xshift=8.75cm,draw=black!70,fill=red!20]{\huge{解码器}};
\draw [->,very thick,draw=black!70]([xshift=0.2cm]encoder.east) -- ([xshift=-0.2cm]decoder.west);
\draw [->,very thick,draw=black!70]([xshift=0.2cm]decoder.east) -- ([xshift=-0.2cm]decoder_1.west);
\foreach \x in {-1.8cm,-0.9cm,...,1.9cm}
\draw [->,very thick,draw=black!70]([xshift=\x,yshift=-1cm]encoder.south) -- ([xshift=\x,yshift=-0.2cm]encoder.south);
\node [below of = encoder,xshift=-1.8cm,yshift=-2.95cm,scale=1.2]{[cls]};
\node [below of = encoder,xshift=-0.7cm,yshift=-2.9cm,scale=1.2]{hello};
\node [below of = encoder,xshift=0cm,yshift=-3.05cm,scale=1.2]{,};
\node [below of = encoder,xshift=0.9cm,yshift=-2.9cm,scale=1.2]{world};
\node [below of = encoder,xshift=1.8cm,yshift=-2.9cm,scale=1.2]{!};
\draw [->,very thick,draw=black!70]([xshift=-1.8cm,yshift=0.2cm]encoder.north) -- ([xshift=-1.8cm,yshift=1cm]encoder.north);
\node [below of = encoder,xshift=-1.8cm,yshift=2.9cm,scale=1.5]{length:6};
\foreach \x in {-2.8cm,-1.7cm,...,2.9cm}
{\draw [->,very thick,draw=black!70]([xshift=\x,yshift=-1cm]decoder.south) -- ([xshift=\x,yshift=-0.2cm]decoder.south);
\draw [->,very thick,draw=black!70]([xshift=\x,yshift=0.2cm]decoder.north) -- ([xshift=\x,yshift=1cm]decoder.north);}
\node [below of = decoder,xshift=0cm,yshift=-2.9cm,scale=1.2]{{Mask\ Mask\ Mask\ Mask\ Mask\ Mask}};
\node [below of = decoder,xshift=-2.8cm,yshift=2.9cm,scale=1.6]{};
\node [below of = decoder,xshift=-1.7cm,yshift=2.9cm,scale=1.6]{};
\node [below of = decoder,xshift=-0.6cm,yshift=2.7cm,scale=1.6]{};
\node [below of = decoder,xshift=0.5cm,yshift=2.7cm,scale=1.6]{};
\node [below of = decoder,xshift=1.6cm,yshift=2.9cm,scale=1.6]{};
\node [below of = decoder,xshift=2.7cm,yshift=2.75cm,scale=1.6]{};
\foreach \x in {-2.8cm,-1.7cm,...,2.9cm}
{\draw [->,very thick,draw=black!70]([xshift=\x,yshift=-1cm]decoder_1.south) -- ([xshift=\x,yshift=-0.2cm]decoder_1.south);
\draw [->,very thick,draw=black!70]([xshift=\x,yshift=0.2cm]decoder_1.north) -- ([xshift=\x,yshift=1cm]decoder_1.north);}
\node [below of = decoder_1,xshift=-2.8cm,yshift=2.9cm,scale=1.6]{};
\node [below of = decoder_1,xshift=-1.7cm,yshift=2.9cm,scale=1.6]{};
\node [below of = decoder_1,xshift=-0.6cm,yshift=2.75cm,scale=1.6]{};
\node [below of = decoder_1,xshift=0.5cm,yshift=2.9cm,scale=1.6]{};
\node [below of = decoder_1,xshift=1.6cm,yshift=2.9cm,scale=1.6]{};
\node [below of = decoder_1,xshift=2.7cm,yshift=2.75cm,scale=1.6]{};
\node [below of = decoder_1,xshift=-2.8cm,yshift=-2.9cm,scale=1.2]{Mask};
\node [below of = decoder_1,xshift=-1.7cm,yshift=-2.9cm,scale=1.3]{};
\node [below of = decoder_1,xshift=-0.6cm,yshift=-3cm,scale=1.3]{};
\node [below of = decoder_1,xshift=0.5cm,yshift=-2.9cm,scale=1.2]{Mask};
\node [below of = decoder_1,xshift=1.6cm,yshift=-2.9cm,scale=1.2]{};
\node [below of = decoder_1,xshift=2.7cm,yshift=-3cm,scale=1.3]{};
\end{tikzpicture}
\ No newline at end of file
\definecolor{ublue}{rgb}{0.152,0.250,0.545}
\definecolor{ugreen}{rgb}{0,0.5,0}
%%% outline
%-------------------------------------------------------------------------
\begin{tikzpicture}
\tikzstyle{word} = [draw=ugreen!20,minimum size=1.8em, fill=ugreen!40, font=\scriptsize, rounded corners=1pt]
\tikzstyle{tgt} = [minimum height=1.6em,minimum width=5.2em,fill=black!10!yellow!30,font=\footnotesize,drop shadow={shadow xshift=0.15em,shadow yshift=-0.15em,}]
\tikzstyle{p} = [fill=blue!15,minimum width=0.4em,inner sep=0pt]
\node[ rounded corners=3pt, fill=red!20, drop shadow, minimum width=10em,minimum height=4em] (encoder) at (0,0) {Transformer 编码器 };
\node[anchor=west, rounded corners=3pt, fill=red!20, drop shadow, minimum width=14em,minimum height=4em] (decoder) at ([xshift=1cm]encoder.east) {Transformer 解码器};
\node[anchor=north,word] (en1) at ([yshift=-1.3em,xshift=-3em]encoder.south) {};
\node[anchor=north,word] (en2) at ([yshift=-1.3em]encoder.south) {};
\node[anchor=north,word] (en3) at ([yshift=-1.3em,xshift=3em]encoder.south) {};
\node[anchor=north,word] (de1) at ([yshift=-1.3em,xshift=-4em]decoder.south) {1};
\node[anchor=north,word] (de2) at ([yshift=-1.3em]decoder.south) {2};
\node[anchor=north,word] (de3) at ([yshift=-1.3em,xshift=4em]decoder.south) {3};
\node[p,anchor=south, minimum height=0.5em] (w1_1) at ([xshift=-7em,yshift=1.5em]decoder.north){};
\node[p,anchor=south,minimum height=2em] (w1_2) at ([xshift=0.3em]w1_1.south east){};
\node[p,anchor=south,minimum height=0.7em] (w1_3) at ([xshift=0.3em]w1_2.south east){};
\node[p,anchor=south,minimum height=0.6em] (w1_4) at ([xshift=0.3em]w1_3.south east){};
\node[p,anchor=south,minimum height=0.2em] (w1_5) at ([xshift=0.3em]w1_4.south east){};
\node[p,anchor=south,minimum height=0.3em] (w1_6) at ([xshift=0.3em]w1_5.south east){};
\node[p,anchor=south,minimum height=0.6em] (w1_7) at ([xshift=0.3em]w1_6.south east){};
\node[p,anchor=south,minimum height=0.8em] (w1_8) at ([xshift=0.3em]w1_7.south east){};
\node[p,anchor=south, minimum height=0.5em] (w2_1) at ([xshift=-1.8em,yshift=1.5em]decoder.north){};
\node[p,anchor=south,minimum height=2em] (w2_2) at ([xshift=0.3em]w2_1.south east){};
\node[p,anchor=south,minimum height=0.7em] (w2_3) at ([xshift=0.3em]w2_2.south east){};
\node[p,anchor=south,minimum height=0.6em] (w2_4) at ([xshift=0.3em]w2_3.south east){};
\node[p,anchor=south,minimum height=1.9em] (w2_5) at ([xshift=0.3em]w2_4.south east){};
\node[p,anchor=south,minimum height=0.3em] (w2_6) at ([xshift=0.3em]w2_5.south east){};
\node[p,anchor=south,minimum height=0.6em] (w2_7) at ([xshift=0.3em]w2_6.south east){};
\node[p,anchor=south,minimum height=0.8em] (w2_8) at ([xshift=0.3em]w2_7.south east){};
\node[p,anchor=south, minimum height=0.5em] (w3_1) at ([xshift=3.2em,yshift=1.5em]decoder.north){};
\node[p,anchor=south,minimum height=2em] (w3_2) at ([xshift=0.3em]w3_1.south east){};
\node[p,anchor=south,minimum height=0.7em] (w3_3) at ([xshift=0.3em]w3_2.south east){};
\node[p,anchor=south,minimum height=0.6em] (w3_4) at ([xshift=0.3em]w3_3.south east){};
\node[p,anchor=south,minimum height=1.9em] (w3_5) at ([xshift=0.3em]w3_4.south east){};
\node[p,anchor=south,minimum height=0.3em] (w3_6) at ([xshift=0.3em]w3_5.south east){};
\node[p,anchor=south,minimum height=0.6em] (w3_7) at ([xshift=0.3em]w3_6.south east){};
\node[p,anchor=south,minimum height=0.8em] (w3_8) at ([xshift=0.3em]w3_7.south east){};
\node[inner sep=0pt,font=\scriptsize] at ([yshift=0.3em]w1_2.north){thanks};
\node[inner sep=0pt,font=\scriptsize] at ([yshift=0.3em]w2_2.north){a};
\node[inner sep=0pt,font=\scriptsize] at ([yshift=0.3em]w2_5.north){to};
\node[inner sep=0pt,font=\scriptsize] at ([yshift=0.3em]w3_2.north){lot};
\node[inner sep=0pt,font=\scriptsize] at ([yshift=0.3em]w3_5.north){you};
\draw[-latex, very thick, ublue] ([yshift=0.1em]en1.north) -- ([xshift=-3em,yshift=-0.1em]encoder.south);
\draw[-latex, very thick, ublue] ([yshift=0.1em]en2.north) -- ([xshift=0em,yshift=-0.1em]encoder.south);
\draw[-latex, very thick, ublue] ([yshift=0.1em]en3.north) -- ([xshift=3em,yshift=-0.1em]encoder.south);
\draw[-latex, very thick, ublue] ([yshift=0.1em]de1.north) -- ([xshift=-4em,yshift=-0.1em]decoder.south);
\draw[-latex, very thick, ublue] ([yshift=0.1em]de2.north) -- ([xshift=0em,yshift=-0.1em]decoder.south);
\draw[-latex, very thick, ublue] ([yshift=0.1em]de3.north) -- ([xshift=4em,yshift=-0.1em]decoder.south);
\draw[-latex, very thick, ublue] (encoder.east) -- (decoder.west);
\begin{pgfonlayer}{background}
{
\node[inner sep=2pt] [fit =(w1_1)(w1_2)(w1_8)](box1) {};
\node[inner sep=2pt] [fit =(w2_1)(w2_2)(w2_8)] (box2){};
\node[inner sep=2pt] [fit =(w3_1)(w3_2)(w3_8)] (box3){};
}
\end{pgfonlayer}
\draw[-latex, very thick, ublue] ([yshift=-1.2em]box1.south) -- (box1.south);
\draw[-latex, very thick, ublue] ([yshift=-1.2em]box2.south) -- (box2.south);
\draw[-latex, very thick, ublue] ([yshift=-1.2em]box3.south) -- (box3.south);
\node[tgt,anchor=west] (tgt1) at ([xshift=2em]box3.east) {thanks a lot};
\node[tgt,,anchor=north](tgt2) at ([yshift=-1em]tgt1.south) {thanks to you};
\node[tgt,,anchor=north] (tgt3) at ([yshift=-1em]tgt2.south) {thanks a you};
\node[tgt,,anchor=north] (tgt4) at ([yshift=-1em]tgt3.south) {thanks to lot};
\node[text=ugreen] at ([xshift=1em]tgt1.east){\ding{51}};
\node[text=ugreen] at ([xshift=1em]tgt2.east){\ding{51}};
\node[text=red] at ([xshift=1em]tgt3.east){\ding{55}};
\node[text=red] at ([xshift=1em]tgt4.east){\ding{55}};
\end{tikzpicture}
\definecolor{ublue}{rgb}{0.152,0.250,0.545}
\definecolor{ugreen}{rgb}{0,0.5,0}
%%% outline
%-------------------------------------------------------------------------
\begin{tikzpicture}
\tikzstyle{word} = [draw=ugreen!20,minimum size=1.8em, fill=ugreen!40, font=\scriptsize, rounded corners=1pt]
\node[rounded corners=3pt, fill=red!20, drop shadow, minimum width=12em,minimum height=4em] (encoder) at (0,0) {Transformer 编码器 };
\node[draw=blue!10,anchor=west, rounded corners=2pt, fill=blue!20,minimum width=2.5cm,minimum height=2em] (attention) at ([xshift=0.8cm]encoder.east) {注意力模块};
\node[anchor=west, rounded corners=3pt, fill=red!20, drop shadow, minimum width=14em,minimum height=4em] (decoder) at ([xshift=0.8cm]attention.east) {Transformer 解码器};
\node[anchor=north,word] (en1) at ([yshift=-1.6em,xshift=-4.8em]encoder.south) {hello};
\node[anchor=north,word] (en2) at ([yshift=-1.6em,xshift=-1.6em]encoder.south) {,};
\node[anchor=north,word] (en3) at ([yshift=-1.6em,xshift=1.6em]encoder.south) {world};
\node[anchor=north,word] (en4) at ([yshift=-1.6em,xshift=4.8em]encoder.south) {!};
\node[anchor=north,word] (de1) at ([yshift=-1.6em,xshift=-5.75em]decoder.south) {1};
\node[anchor=north,word] (de2) at ([yshift=-1.6em,xshift=-3.45em]decoder.south) {2};
\node[anchor=north,word] (de3) at ([yshift=-1.6em,xshift=-1.15em]decoder.south) {3};
\node[anchor=north,word] (de4) at ([yshift=-1.6em,xshift=1.15em]decoder.south) {4};
\node[anchor=north,word] (de5) at ([yshift=-1.6em,xshift=3.45em]decoder.south) {5};
\node[anchor=north,word] (de6) at ([yshift=-1.6em,xshift=5.75em]decoder.south) {6};
\node[anchor=south,word] (out1) at ([yshift=1.6em,xshift=-5.75em]decoder.north) {};
\node[anchor=south,word] (out2) at ([yshift=1.6em,xshift=-3.45em]decoder.north) {};
\node[anchor=south,word] (out3) at ([yshift=1.6em,xshift=-1.15em]decoder.north) {};
\node[anchor=south,word] (out4) at ([yshift=1.6em,xshift=1.15em]decoder.north) {};
\node[anchor=south,word] (out5) at ([yshift=1.6em,xshift=3.45em]decoder.north) {};
\node[anchor=south,word] (out6) at ([yshift=1.6em,xshift=5.75em]decoder.north) {};
\draw[-latex, very thick,ublue] ([yshift=0.1em]en1.north) -- ([xshift=-4.8em,yshift=-0.1em]encoder.south);
\draw[-latex, very thick,ublue] ([yshift=0.1em]en2.north) -- ([xshift=-1.6em,yshift=-0.1em]encoder.south);
\draw[-latex, very thick,ublue] ([yshift=0.1em]en3.north) -- ([xshift=1.6em,yshift=-0.1em]encoder.south);
\draw[-latex, very thick,ublue] ([yshift=0.1em]en4.north) -- ([xshift=4.8em,yshift=-0.1em]encoder.south);
\draw[-latex, very thick,ublue] ([yshift=0.1em]de1.north) -- ([xshift=-5.75em]decoder.south);
\draw[-latex, very thick,ublue] ([yshift=0.1em]de2.north) -- ([xshift=-3.45em]decoder.south);
\draw[-latex, very thick,ublue] ([yshift=0.1em]de3.north) -- ([xshift=-1.15em]decoder.south);
\draw[-latex, very thick,ublue] ([yshift=0.1em]de4.north) -- ([xshift=1.15em]decoder.south);
\draw[-latex, very thick,ublue] ([yshift=0.1em]de5.north) -- ([xshift=3.45em]decoder.south);
\draw[-latex, very thick,ublue] ([yshift=0.1em]de6.north) -- ([xshift=5.75em]decoder.south);
\draw[-latex, very thick,ublue] ([xshift=-5.75em,yshift=0.1em]decoder.north) -- ([yshift=-0.1em]out1.south);
\draw[-latex, very thick,ublue] ([xshift=-3.45em,yshift=0.1em]decoder.north) -- ([yshift=-0.1em]out2.south);
\draw[-latex, very thick,ublue] ([xshift=-1.15em,yshift=0.1em]decoder.north) -- ([yshift=-0.1em]out3.south);
\draw[-latex, very thick,ublue] ([xshift=1.15em,yshift=0.1em]decoder.north) -- ([yshift=-0.1em]out4.south);
\draw[-latex, very thick,ublue] ([xshift=3.45em,yshift=0.1em]decoder.north) -- ([yshift=-0.1em]out5.south);
\draw[-latex, very thick,ublue] ([xshift=5.75em,yshift=0.1em]decoder.north) -- ([yshift=-0.1em]out6.south);
\draw[-latex, very thick, ublue] (encoder.east) -- (attention.west);
\draw[-latex, very thick, ublue] (attention.east) -- (decoder.west);
\draw[decorate,decoration={brace, mirror},ublue, very thick] ([yshift=-0.4em]de1.-135) -- node[font=\scriptsize,text=black,yshift=-1em]{predict traget length \& get position embedding}([yshift=-0.4em]de6.-45);
%\begin{pgfonlayer}{background}
%{
%\node[rectangle,draw=ublue, inner sep=0mm] [fit =(original1)(ht1)(mt1)(ht1-4)] {};
%}
%\end{pgfonlayer}
\end{tikzpicture}
\definecolor{ublue}{rgb}{0.152,0.250,0.545}
\definecolor{ugreen}{rgb}{0,0.5,0}
%%% outline
%-------------------------------------------------------------------------
\begin{tikzpicture}[scale=0.7]
\tikzstyle{every node}=[scale=0.7]
\tikzstyle{emb} = [draw,very thick, minimum height=2.4em,minimum width=3.6em,rounded corners=2pt,font=\footnotesize]
\tikzstyle{attention} = [fill=blue!15,draw,very thick, minimum width=19.6em,minimum height=2.4em, rounded corners=2pt,font=\footnotesize]
\tikzstyle{line} = [line width=1.6pt,->]
\foreach \x in {1,2,3,4,5}
\node[emb,fill=gray!30] (enemb_\x) at (0+4em*\x,0){\small\bfnew{Emb}};
\foreach \x in {1,2,3,4,5}
\node[emb,fill=gray!30] (deemb_\x) at (24em+4em*\x,0){\small\bfnew{Emb}};
\node[attention] (a1) at (12em, 5em) {\small\bfnew{Multi-Head Self-Attention}};
\node[attention] (a2) at (36em, 5em) {\small\bfnew{Multi-Head Self-Attention}};
\node[attention] (a3) at (36em, 9em) {\small\bfnew{Multi-Head Self-Attention}};
\node[attention] (a4) at (36em, 13em) {\small\bfnew{Multi-Head Self-Attention}};
\foreach \x in {1,2,3,4,5}
\node[emb,fill=cyan!35] (enmlp_\x) at (0+4em*\x,9em){\small\bfnew{MLP}};
\foreach \x in {1,2,3,4,5}
\node[emb,fill=cyan!35] (demlp_\x) at (24em+4em*\x,17em){\small\bfnew{MLP}};
\foreach \x in {1,2,3,4,5}
\node[emb,fill=red!25,inner sep=1pt] (ensf_\x) at (0+4em*\x,17em){\small\bfnew{SoftMax}};
\foreach \x in {1,2,3,4,5}
\node[emb,fill=ugreen!25!white,inner sep=1pt, font=\scriptsize] (desf_\x) at (24em+4em*\x,22em){\small\bfnew{SoftMax}};
\node[minimum height=2.4em,minimum width=3.6em] (enout_1) at (0+4em*1,21em){\small\bfnew{1}};
\node[minimum height=2.4em,minimum width=3.6em] (enout_2) at (0+4em*2,21em){\small\bfnew{1}};
\node[minimum height=2.4em,minimum width=3.6em] (enout_3) at (0+4em*3,21em){\small\bfnew{2}};
\node[minimum height=2.4em,minimum width=3.6em] (enout_4) at (0+4em*4,21em){\small\bfnew{0}};
\node[minimum height=2.4em,minimum width=3.6em] (enout_5) at (0+4em*5,21em){\small\bfnew{1}};
\node[minimum height=2.4em,minimum width=3.6em,inner sep=0pt,font=\footnotesize] (deout_1) at (24em+4em*1,26em){\small\bfnew{I}};
\node[minimum height=2.4em,minimum width=3.6em,inner sep=0pt,font=\footnotesize] (deout_2) at (24em+4em*2,26em){\small\bfnew{love}};
\node[minimum height=2.4em,minimum width=3.6em,inner sep=0pt,font=\footnotesize] (deout_3) at (24em+4em*3,26em){\small\bfnew{my}};
\node[minimum height=2.4em,minimum width=3.6em,inner sep=0pt,font=\footnotesize] (deout_4) at (24em+4em*4,26em){\small\bfnew{dog}};
\node[minimum height=2.4em,minimum width=3.6em,inner sep=0pt,font=\footnotesize] (deout_5) at (24em+4em*5,26em){\small\bfnew{.}};
\foreach \x in {1,2,3,4,5}{
\draw[line] (enemb_\x.north) -- ([yshift=2.6em]enemb_\x.north);
\draw[line] ([yshift=4.9em]enemb_\x.north) -- (enmlp_\x.south);
\draw[line] (enmlp_\x.north) -- (ensf_\x.south);
\draw[line,dotted] (ensf_\x.north) -- (enout_\x.south);
\draw[line] (deemb_\x.north) -- ([yshift=2.6em]deemb_\x.north);
\draw[line] ([yshift=4.9em]deemb_\x.north) -- ([yshift=6.5em]deemb_\x.north);
\draw[line] ([yshift=8.8em]deemb_\x.north) -- ([yshift=10.4em]deemb_\x.north);
\draw[line] ([yshift=12.9em]deemb_\x.north) -- (demlp_\x.south);
\draw[line] (demlp_\x.north) -- (desf_\x.south);
\draw[line,dotted] (desf_\x.north) -- (deout_\x.south);
}
\begin{pgfonlayer}{background}
{
\node[draw=blue!25,very thick,fill=blue!5,rounded corners=6pt,minimum height=12em,minimum width=21em,yshift=-1em] at (12em,9em) (box1){};
\node[draw=red!25,very thick,fill=red!10,rounded corners=6pt,minimum height=7.4em,minimum width=21em,yshift=-1em] at (12em,19.2em) (box2){};
\node[draw=cyan!25,very thick,fill=cyan!15,rounded corners=6pt,minimum height=17.5em,minimum width=22.6em,yshift=-1em] at (36em,11.8em) (box3){};
\node[draw=ugreen!25,very thick,fill=green!15!white,rounded corners=6pt,minimum height=7em,minimum width=22.6em,yshift=-1em] at (36em,24.5em) (box4){};
}
\draw[line] (enmlp_1.north) .. controls ([yshift=3.6em]enmlp_1.north) and ([xshift=-2em]a4.west) .. (a4.west);
\draw[line] (enmlp_2.north) .. controls ([yshift=3.6em]enmlp_2.north) and ([xshift=-2em]a4.west) .. (a4.west);
\draw[line] (enmlp_3.north) .. controls ([yshift=3.6em]enmlp_3.north) and ([xshift=-2em]a4.west) .. (a4.west);
\draw[line] (enmlp_4.north) .. controls ([yshift=3.6em]enmlp_4.north) and ([xshift=-2em]a4.west) .. (a4.west);
\draw[line] (enmlp_5.north) .. controls ([yshift=3.6em]enmlp_5.north) and ([xshift=-2em]a4.west) .. (a4.west);
\end{pgfonlayer}
\node[anchor=east,draw,circle,minimum size=2em,thick,inner sep=0pt,fill=gray!40] at ([xshift=-2em]enemb_1.west) (pos1){\Large$\sim$};
\node[anchor=west,draw,circle,minimum size=2em,thick,inner sep=0pt,fill=gray!40] at ([xshift=2.4em]deemb_5.east) (pos2){\Large$\sim$};
\node[anchor=west,draw,circle,minimum size=2em,thick,inner sep=0pt,fill=gray!40] at ([xshift=2.4em]a3.east) (pos3){\Large$\sim$};
\draw[line] (pos1.0) -- (enemb_1.180);
\draw[line] (pos2.180) -- (deemb_5.0);
\draw[line] (pos3.180) -- (a3.0);
\draw[violet,dotted,rounded corners=4pt,very thick,->] (box2.north) -- ([yshift=2em]box2.north) -- ([yshift=2em,xshift=11.5em]box2.north) -- ([xshift=1.9em]enemb_5.east) -- (deemb_1.west);
\draw[violet,dotted,very thick,->] (enemb_5.east) -- node[below,violet,font=\footnotesize]{\emph{Copy}}(deemb_1.west);
\node[] at ([xshift=-1em,yshift=-4em]box1.west){{$N \times$}};
\node[] at ([xshift=1em,yshift=-7em]box3.east){{$\times N$}};
\node[font=\footnotesize,align=left] at ([xshift=-2em,yshift=2em]box1.west) {\small\bfnew{Encoder} \\ \small\bfnew{Stack}};
\node[font=\footnotesize,align=left] at ([xshift=-2em]box2.west) {\small\bfnew{Fertility} \\ \small\bfnew{Predictor}};
\node[font=\footnotesize,align=left] at ([xshift=0.4em,yshift=-1.2em]pos3.south) {\small\bfnew{Positional} \\ \small\bfnew{Encoding}};
\node[font=\footnotesize,align=left] at ([xshift=2em,yshift=2em]box3.east) {\small\bfnew{Decoder} \\ \small\bfnew{Stack}};
\node[font=\footnotesize,align=left] at ([xshift=2.5em]box4.east) {\small\bfnew{Translation} \\ \small\bfnew{Predictor}};
\node[font=\footnotesize] at ([yshift=-1em]enemb_1.south) {\small\bfnew{}};
\node[font=\footnotesize] at ([yshift=-1em]enemb_2.south) {\small\bfnew{}};
\node[font=\footnotesize] at ([yshift=-1em]enemb_3.south) {\small\bfnew{我的}};
\node[font=\footnotesize] at ([yshift=-1em]enemb_4.south) {\small\bfnew{}};
\node[font=\footnotesize] at ([yshift=-1em]enemb_5.south) {\small\bfnew{}};
\end{tikzpicture}
% set table width
\newcommand{\PreserveBackslash}[1]{\let\temp=\\#1\let\\=\temp}
\newcolumntype{C}[1]{>{\PreserveBackslash\centering}p{#1}}
% used for heatmap
\newcommand*{\MinNumber}{0}%
\newcommand*{\MaxNumber}{1}%
\newcommand{\ApplyGradient}[1]{%
\pgfmathsetmacro{\PercentColor}{100.0*(#1-\MinNumber)/(\MaxNumber-\MinNumber)}
\hspace{-0.33em}\colorbox{white!\PercentColor!myblack}{}
}
\newcolumntype{Q}{>{\collectcell\ApplyGradient}c<{\endcollectcell}}
\begin{center}
\renewcommand{\arraystretch}{0}
\setlength{\tabcolsep}{5mm}
\setlength{\fboxsep}{2.2mm} % box size
\begin{tabular}{C{.20\textwidth}C{.20\textwidth}C{.20\textwidth}C{.20\textwidth}}
\setlength{\tabcolsep}{0pt}
\subfigure [\footnotesize{自注意力}] {
\begin{tabular}{cc}
\setlength{\tabcolsep}{0pt}
~
&
\begin{tikzpicture}
\begin{scope}
\node [inner sep=1.5pt] (w1) at (0,0) {\small{$1$} };
\foreach \x/\y/\z in {2/1/$2$, 3/2/$3$, 4/3/$4$, 5/4/$5$, 6/5/$6$}
{
\node [inner sep=1.5pt,anchor=south west] (w\x) at ([xshift=1.15em]w\y.south west) {\small{\z} };
}
\end{scope}
\end{tikzpicture}
\\
\renewcommand\arraystretch{1}
\begin{tabular}{c}
\setlength{\tabcolsep}{0pt}
\small{$1\ \ $} \\
\small{$2\ $} \\
\small{$3\ $} \\
\small{$4\ $} \\
\small{$5\ $} \\
\small{$6\ $} \\
\end{tabular}
&
%\setlength{\tabcolsep}{0pt}
\begin{tabular}{*{6}{Q}}
0.0000 & 0.5429 & 0.5138 & 0.4650 & 0.5005 & 0.5531 \\
0.5429 & 0.0000 & 0.0606 & 0.0630 & 0.0703 & 0.0332 \\
0.5138 & 0.0606 & 0.0000 & 0.0671 & 0.0472 & 0.0296 \\
0.4650 & 0.0630 & 0.0671 & 0.0000 & 0.0176 & 0.0552 \\
0.5005 & 0.0703 & 0.0472 & 0.0176 & 0.0000 & 0.0389 \\
0.5531 & 0.0332 & 0.0296 & 0.0552 & 0.0389 & 0.0000 \\
\end{tabular}
\end{tabular}
}
&
\subfigure [\footnotesize{编码-解码注意力}] {
\setlength{\tabcolsep}{0pt}
\begin{tabular}{cc}
\setlength{\tabcolsep}{0pt}
~
&
\begin{tikzpicture}
\begin{scope}
\node [inner sep=1.5pt] (w1) at (0,0) {\small{$1$} };
\foreach \x/\y/\z in {2/1/$2$, 3/2/$3$, 4/3/$4$, 5/4/$5$, 6/5/$6$}
{
\node [inner sep=1.5pt,anchor=south west] (w\x) at ([xshift=1.15em]w\y.south west) {\small{\z} };
}
\end{scope}
\end{tikzpicture}
\\
\renewcommand\arraystretch{1}
\begin{tabular}{c}
\setlength{\tabcolsep}{0pt}
\small{$1\ \ $} \\
\small{$2\ $} \\
\small{$3\ $} \\
\small{$4\ $} \\
\small{$5\ $} \\
\small{$6\ $} \\
\end{tabular}
&
%\setlength{\tabcolsep}{0pt}
\begin{tabular}{*{6}{Q}}
0.0000 & 0.0175 & 0.2239 & 0.3933 & 0.7986 & 0.3603 \\
0.0175 & 0.0000 & 0.1442 & 0.3029 & 0.7295 & 0.3324 \\
0.2239 & 0.1442 & 0.0000 & 0.0971 & 0.6270 & 0.4163 \\
0.3933 & 0.3029 & 0.0971 & 0.0000 & 0.2385 & 0.2022 \\
0.7986 & 0.7295 & 0.6270 & 0.2385 & 0.0000 & 0.0658 \\
0.3603 & 0.3324 & 0.4163 & 0.2022 & 0.0658 & 0.0000 \\
\end{tabular}
\end{tabular}
}
\end{tabular}
\end{center}
\begin{tikzpicture}
\begin{scope}
\node [anchor=north west] (pos1) at (0,0) {$\circ$};
\node [anchor= west] (pos2) at ([xshift=3.0em]pos1.east){$\circ$};
\node [anchor= west] (pos1-2) at ([xshift=1.0em,yshift=1.0em]pos1.east){I};
\draw[->,thick](pos1.east)--(pos2.west);
\node [anchor= west] (pos3) at ([xshift=3.0em]pos2.east){$\circ$};
\node [anchor= west] (pos2-2) at ([xshift=0.1em,yshift=1.0em]pos2.east){have};
\draw[->,thick](pos2.east)--(pos3.west);
\end{scope}
\begin{scope}[yshift=-4.0em]
\node [anchor=north west] (pos1) at (0,0) {$\circ$};
\node [anchor= west] (pos2) at ([xshift=3.0em]pos1.east){$\circ$};
\node [anchor= west] (pos1-2) at ([xshift=0.5em,yshift=1.0em]pos1.east){He};
\draw[->,thick](pos1.east)--(pos2.west);
\node [anchor= west] (pos3) at ([xshift=3.0em]pos2.east){$\circ$};
\node [anchor= west] (pos2-2) at ([xshift=0.1em,yshift=1.0em]pos2.east){has};
\draw[->,thick](pos2.east)--(pos3.west);
\node [anchor= west] (pos4) at ([xshift=5.0em]pos3.east){$\circ$};
\node [anchor= west] (pos5) at ([xshift=5.0em]pos4.east){$\circ$};
\node [anchor= west] (pos6) at ([xshift=5.0em]pos5.east){$\circ$};
\node [anchor= west] (word1) at ([xshift=2.0em,yshift=2.7em]pos4.east){I};
\node [anchor= west] (word2) at ([xshift=1.5em,yshift=-1.6em]pos4.east){He};
\node [anchor= west] (word3) at ([xshift=1.4em,yshift=-3em]pos4.east){She};
\node [anchor= west] (word4) at ([xshift=1.1em,yshift=2.8em]pos5.east){Have};
\node [anchor= west] (word5) at ([xshift=1.3em,yshift=-2.8em]pos5.east){Has};
\begin{pgfonlayer}{background}
{
% I
\draw [->,thick] (pos4.north) .. controls +(north:0.8) and +(north:0.8) .. (pos5.north);
% He
\draw [->,thick] (pos4.south) .. controls +(south:0.8) and +(south:0.8) .. (pos5.south);
% She
\draw [->,thick] (pos4.south) .. controls +(south:1.5) and +(south:1.5) .. (pos5.south);
% Have
\draw [->,thick] (pos5.north) .. controls +(north:0.8) and +(north:0.8) .. (pos6.north);
% Has
\draw [->,thick] (pos5.south) .. controls +(south:0.8) and +(south:0.8) .. (pos6.south);
}
\end{pgfonlayer}
\end{scope}
\begin{scope}[yshift=-8.0em]
\node [anchor=north west] (pos1) at (0,0) {$\circ$};
\node [anchor= west] (pos2) at ([xshift=3.0em]pos1.east){$\circ$};
\node [anchor= west] (pos1-2) at ([xshift=0.4em,yshift=1.0em]pos1.east){She};
\draw[->,thick](pos1.east)--(pos2.west);
\node [anchor= west] (pos3) at ([xshift=3.0em]pos2.east){$\circ$};
\node [anchor= west] (pos2-2) at ([xshift=0.1em,yshift=1.0em]pos2.east){has};
\draw[->,thick](pos2.east)--(pos3.west);
\end{scope}
\end{tikzpicture}
%---------------------------------------------------------------------
\begin{tikzpicture}
\tikzstyle{bignode} = [line width=0.6pt,draw=black,minimum width=6.3em,minimum height=2.2em,fill=blue!20,rounded corners=2pt]
\tikzstyle{middlenode} = [line width=0.6pt,draw=black,minimum width=5.6em,minimum height=2.2em,fill=blue!20,rounded corners=2pt]
\tikzstyle{bignode} = [line width=0.6pt,draw=black,minimum width=6.3em,minimum height=2.2em,fill=white]
\tikzstyle{middlenode} = [line width=0.6pt,draw=black,minimum width=5.6em,minimum height=2.2em,fill=white]
\node [anchor=center] (node1-1) at (0,0) {\scriptsize{汉语}};
\node [anchor=west] (node1-2) at ([xshift=1.0em]node1-1.east) {\scriptsize{英语}};
\node [anchor=north] (node1-3) at ([xshift=1.65em]node1-1.south) {\scriptsize{反向翻译模型}};
\node [anchor=west] (node1-2) at ([xshift=0.8em]node1-1.east) {\scriptsize{英语}};
\node [anchor=north] (node1-3) at ([xshift=1.45em]node1-1.south) {\scriptsize{反向翻译模型}};
\draw [->,line width=0.6pt](node1-1.east)--(node1-2.west);
\begin{pgfonlayer}{background}
{
\node[fill=red!20,rounded corners=2pt,inner sep=0.2em,draw=black,line width=0.6pt,minimum width=6.0em] [fit =(node1-1)(node1-2)(node1-3)] (remark1) {};
\node[fill=blue!20,inner sep=0.1em,draw=black,line width=0.6pt,minimum width=6.0em,drop shadow,rounded corners=2pt] [fit =(node1-1)(node1-2)(node1-3)] (remark1) {};
}
\end{pgfonlayer}
\node [anchor=north](node2-1) at ([xshift=-1.93em,yshift=-1.95em]remark1.south){\scriptsize{汉语}};
\node [anchor=north](node2-1-2) at (node2-1.south){\scriptsize{真实数据}};
\begin{pgfonlayer}{background}
{
\node[fill=blue!20,rounded corners=2pt,inner sep=0.1em,draw=black,line width=0.6pt,minimum width=3.85em] [fit =(node2-1)(node2-1-2)] (remark2-1) {};
}
\end{pgfonlayer}
\node [anchor=west](node2-2) at ([xshift=0.82em,yshift=0.68em]remark2-1.east){\scriptsize{英语}};
\node [anchor=north](node2-2-2) at (node2-2.south){\scriptsize{真实数据}};
\begin{pgfonlayer}{background}
{
\node[fill=green!20,rounded corners=2pt,inner sep=0.1em,draw=black,line width=0.6pt,minimum width=3.85em] [fit =(node2-2)(node2-2-2)] (remark2-2) {};
}
\end{pgfonlayer}
\node [anchor=north,fill=green!20,inner sep=0.1em,minimum width=3em,draw=black,line width=0.6pt,rounded corners=2pt](node2-1) at ([xshift=-1.5em,yshift=-1.95em]remark1.south){\scriptsize{汉语}};
\node [anchor=west,fill=green!20,inner sep=0.1em,minimum width=3em,draw=black,line width=0.6pt,rounded corners=2pt](node2-2) at (node2-1.east){\scriptsize{英语}};
\draw [->,line width=0.6pt]([yshift=-2.0em]remark1.south)--(remark1.south) node [pos=0.5,right] (pos1) {\scriptsize{训练}};
\node [anchor=west](node3-1) at ([xshift=5.0em,yshift=0.1em]node1-2.east){\scriptsize{汉语}};
\node [anchor=north](node3-1-2) at (node3-1.south){\scriptsize{真实数据}};
\begin{pgfonlayer}{background}
{
\node[fill=blue!20,rounded corners=2pt,inner sep=0.1em,draw=black,line width=0.6pt,minimum width=3.85em] [fit =(node3-1)(node3-1-2)] (remark3-1) {};
}
\end{pgfonlayer}
\node [anchor=north](node3-2) at ([yshift=-2.15em]remark3-1.south){\scriptsize{英语}};
\node [anchor=north](node3-2-2) at (node3-2.south){\scriptsize{伪数据}};
\begin{pgfonlayer}{background}
{
\node[fill=yellow!20,rounded corners=2pt,inner sep=0.1em,draw=black,line width=0.6pt,minimum width=3.85em] [fit =(node3-2)(node3-2-2)] (remark3-2) {};
}
\end{pgfonlayer}
\node [anchor=west,fill=yellow!20,inner sep=0.1em,minimum width=3em,draw=black,line width=0.6pt,rounded corners=2pt](node3-1) at ([xshift=5.0em,yshift=0.0em]node1-2.east){\scriptsize{汉语}};
\node [anchor=north,fill=red!20,inner sep=0.1em,minimum width=3em,draw=black,line width=0.6pt,rounded corners=2pt](node3-2) at ([yshift=-2.15em]node3-1.south){\scriptsize{英语}};
\draw [->,line width=0.6pt](remark3-1.south)--(remark3-2.north) node [pos=0.5,right] (pos2) {\scriptsize{翻译}};
\draw [->,line width=0.6pt](node3-1.south)--(node3-2.north) node [pos=0.5,right] (pos2) {\scriptsize{翻译}};
\begin{pgfonlayer}{background}
{
\node[rounded corners=2pt,inner sep=0.3em,draw=black,line width=0.6pt,dotted] [fit =(remark3-1)(remark3-2)] (remark2) {};
\node[rounded corners=2pt,inner sep=0.3em,draw=black,line width=0.6pt,dotted] [fit =(node3-1)(node3-2)] (remark2) {};
}
\end{pgfonlayer}
\draw [->,line width=0.6pt](remark1.east)--([yshift=2.40em]remark2.west) node [pos=0.5,above] (pos2) {\scriptsize{模型翻译}};
\draw [->,line width=0.6pt](remark1.east)--([yshift=0.85em]remark2.west) node [pos=0.5,above] (pos2) {\scriptsize{模型翻译}};
\node [anchor=south](pos2-2) at ([yshift=-0.5em]pos2.north){\scriptsize{使用反向}};
\draw[decorate,thick,decoration={brace,amplitude=5pt}] ([yshift=1.3em,xshift=1.5em]node3-1.east) -- ([yshift=-7.7em,xshift=1.5em]node3-1.east) node [pos=0.1,right,xshift=0.0em,yshift=0.0em] (label1) {\scriptsize{{混合}}};
\draw[decorate,thick,decoration={brace,amplitude=5pt}] ([yshift=1.3em,xshift=1.0em]node3-1.east) -- ([yshift=-5.2em,xshift=1.0em]node3-1.east) node [pos=0.1,right,xshift=0.0em,yshift=0.0em] (label1) {\scriptsize{{混合}}};
\node [anchor=west](node4-1) at ([xshift=3.5em,yshift=3.94em]node3-2.east){\scriptsize{英语}};
\node [anchor=north](node4-1-2) at (node4-1.south){\scriptsize{伪数据}};
\begin{pgfonlayer}{background}
{
\node[fill=yellow!20,rounded corners=2pt,inner sep=0.1em,draw=black,line width=0.6pt,minimum width=3.85em] [fit =(node4-1)(node4-1-2)] (remark4-1) {};
}
\end{pgfonlayer}
\node [anchor=north](node4-2) at ([yshift=-1.59em]node4-1.south){\scriptsize{英语}};
\node [anchor=north](node4-2-2) at (node4-2.south){\scriptsize{真实数据}};
\begin{pgfonlayer}{background}
{
\node[fill=green!20,rounded corners=2pt,inner sep=0.1em,draw=black,line width=0.6pt,minimum width=3.85em] [fit =(node4-2)(node4-2-2)] (remark4-2) {};
}
\end{pgfonlayer}
\node [anchor=west](node4-3) at ([xshift=1.7em]node4-2.east){\scriptsize{汉语}};
\node [anchor=north](node4-3-2) at (node4-3.south){\scriptsize{真实数据}};
\begin{pgfonlayer}{background}
{
\node[fill=blue!20,rounded corners=2pt,inner sep=0.1em,draw=black,line width=0.6pt,minimum width=3.85em] [fit =(node4-3)(node4-3-2)] (remark4-3) {};
}
\end{pgfonlayer}
\node [anchor=west](node4-4) at ([xshift=1.7em]node4-1.east){\scriptsize{汉语}};
\node [anchor=north](node4-4-2) at (node4-4.south){\scriptsize{真实数据}};
\node [anchor=west,fill=red!20,inner sep=0.1em,minimum width=3em,draw=black,line width=0.6pt,rounded corners=2pt](node4-1) at ([xshift=2.0em,yshift=1.6em]node3-2.east){\scriptsize{英语}};
\node [anchor=north,fill=green!20,inner sep=0.1em,minimum width=3em,draw=black,line width=0.6pt,rounded corners=2pt](node4-2) at (node4-1.south){\scriptsize{英语}};
\node [anchor=west,fill=green!20,inner sep=0.1em,minimum width=3em,draw=black,line width=0.6pt,rounded corners=2pt](node4-3) at (node4-1.east){\scriptsize{汉语}};
\node [anchor=north,fill=green!20,inner sep=0.1em,minimum width=3em,draw=black,line width=0.6pt,rounded corners=2pt](node4-4) at (node4-3.south){\scriptsize{汉语}};
\begin{pgfonlayer}{background}
{
\node[fill=blue!20,rounded corners=2pt,inner sep=0.1em,draw=black,line width=0.6pt,minimum width=3.85em] [fit =(node4-4)(node4-4-2)] (remark4-3) {};
}
\end{pgfonlayer}
\node [anchor=center] (node5-1) at ([xshift=4.3em,yshift=-1.48em]node4-4.east) {\scriptsize{英语}};
\node [anchor=west] (node5-2) at ([xshift=1.0em]node5-1.east) {\scriptsize{汉语}};
\node [anchor=center] (node5-1) at ([xshift=3.4em,yshift=0.25em]node4-3.east) {\scriptsize{英语}};
\node [anchor=west] (node5-2) at ([xshift=0.8em]node5-1.east) {\scriptsize{汉语}};
\node [anchor=north] (node5-3) at ([xshift=1.65em]node5-1.south) {\scriptsize{正向翻译模型}};
\draw [->,line width=0.6pt](node5-1.east)--(node5-2.west);
\begin{pgfonlayer}{background}
{
\node[fill=red!20,rounded corners=2pt,inner sep=0.2em,draw=black,line width=0.6pt,minimum width=6.0em] [fit =(node5-1)(node5-2)(node5-3)] (remark3) {};
\node[fill=blue!20,inner sep=0.1em,draw=black,line width=0.6pt,minimum width=6.0em,drop shadow,rounded corners=2pt] [fit =(node5-1)(node5-2)(node5-3)] (remark3) {};
}
\end{pgfonlayer}
\draw [->,line width=0.6pt]([xshift=-2em]remark3.west)--(remark3.west) node [pos=0.5,above] (pos3) {\scriptsize{训练}};
\node [anchor=south](d1) at ([xshift=0.0em,yshift=2em]remark3.north){\scriptsize{真实数据:}};
\node [anchor=north](d2) at ([xshift=0.35em]d1.south){\scriptsize{伪数据:}};
\node [anchor=south](d3) at ([xshift=0.0em,yshift=0em]d1.north){\scriptsize{额外数据:}};
\node [anchor=west,fill=green!20,minimum width=1em](d1-1) at ([xshift=-0.0em]d1.east){};
\node [anchor=west,fill=red!20,minimum width=1em](d2-1) at ([xshift=-0.0em]d2.east){};
\node [anchor=west,fill=yellow!20,minimum width=1em](d3-1) at ([xshift=-0.0em]d3.east){};
\end{tikzpicture}
\ No newline at end of file
......@@ -35,9 +35,9 @@
\begin{pgfonlayer}{background}
{
\node [rectangle,inner sep=0.1em,fill=ugreen!20!white] [fit = (w0) (index0)] (wordbox0) {};
\node [rectangle,inner sep=0.1em,fill=ugreen!20!white] [fit = (w1) (index1)] (wordbox1) {};
\node [rectangle,inner sep=0.1em,fill=ugreen!20!white] [fit = (w2) (index2)] (wordbox2) {};
\node [rectangle,draw,inner sep=0.1em,fill=ugreen!20!white] [fit = (w0) (index0)] (wordbox0) {};
\node [rectangle,draw,inner sep=0.1em,fill=ugreen!20!white] [fit = (w1) (index1)] (wordbox1) {};
\node [rectangle,draw,inner sep=0.1em,fill=ugreen!20!white] [fit = (w2) (index2)] (wordbox2) {};
}
\end{pgfonlayer}
......
......@@ -8,7 +8,7 @@
\node [anchor=south] (w1) at (o1.north) {\footnotesize{桌子}};
\node [anchor=south] (w2) at (o2.north) {\footnotesize{椅子}};
{
\node [anchor=south,fill=red!20!white] (cosine) at (w1.north) {\footnotesize{$\textrm{cosine}(\textrm{‘桌子’},\textrm{‘椅子’})=0.5$}};
\node [anchor=south,fill=red!20!white] (cosine) at (w1.north) {\footnotesize{$\textrm{cos}(\textrm{‘桌子’},\textrm{‘椅子’})=0.5$}};
}
\end{scope}
}
......
......@@ -7,7 +7,7 @@
\node [anchor=south] (w1) at (o1.north) {\footnotesize{桌子}};
\node [anchor=south] (w2) at (o2.north) {\footnotesize{椅子}};
{
\node [anchor=south,fill=red!20!white] (cosine) at (w1.north) {\footnotesize{$\textrm{cosine}(\textrm{‘桌子’},\textrm{‘椅子’})=0$}};
\node [anchor=south,fill=red!20!white] (cosine) at (w1.north) {\footnotesize{$\textrm{cos}(\textrm{‘桌子’},\textrm{‘椅子’})=0$}};
}
\end{scope}
......
......@@ -1564,9 +1564,9 @@ z_t&=&\gamma z_{t-1}+(1-\gamma) \frac{\partial J}{\partial {\theta}_t} \cdot \f
\parinterval 为了使神经网络模型训练更加稳定,通常还会考虑其他策略。
\begin{itemize}
\item {\small\bfnew{批量归一化}}\index{批量归一化}(Batch Normalization)\index{Batch Normalization}。批量归一化,顾名思义,是以进行学习时的小批量样本为单位进行归一化\upcite{ioffe2015batch}。具体而言,就是对神经网络隐层输出的每一个维度,沿着批次的方向进行均值为0、方差为1的归一化。在深层神经网络中,每一层网络都可以使用批量归一化操作。这样使神经网络任意一层的输入不至于过大或过小,从而防止隐层中异常值导致模型状态的巨大改变。
\item {\small\bfnew{批量标准化}}\index{批量标准化}(Batch Normalization)\index{Batch Normalization}。批量标准化,顾名思义,是以进行学习时的小批量样本为单位进行标准化\upcite{ioffe2015batch}。具体而言,就是对神经网络隐层输出的每一个维度,沿着批次的方向进行均值为0、方差为1的标准化。在深层神经网络中,每一层网络都可以使用批量标准化操作。这样使神经网络任意一层的输入不至于过大或过小,从而防止隐层中异常值导致模型状态的巨大改变。
\item {\small\bfnew{归一化}}\index{层归一化}(Layer Normalization)\index{Layer Normalization}。类似的,层归一化更多是针对自然语言处理这种序列处理任务\upcite{Ba2016LayerN},它和批量归一化的原理是一样的,只是归一化操作是在序列上同一层网络的输出结果上进行的,也就是归一化操作沿着序列方向进行。这种方法可以很好的避免序列上不同位置神经网络输出结果的不可比性。同时由于归一化后所有的结果都转化到一个可比的范围,使得隐层状态可以在不同层之间进行自由组合。
\item {\small\bfnew{标准化}}\index{层标准化}(Layer Normalization)\index{Layer Normalization}。类似的,层标准化更多是针对自然语言处理这种序列处理任务\upcite{Ba2016LayerN},它和批量标准化的原理是一样的,只是标准化操作是在序列上同一层网络的输出结果上进行的,也就是标准化操作沿着序列方向进行。这种方法可以很好的避免序列上不同位置神经网络输出结果的不可比性。同时由于标准化后所有的结果都转化到一个可比的范围,使得隐层状态可以在不同层之间进行自由组合。
\item {\small\bfnew{残差网络}}\index{残差网络}(Residual Networks)\index{Residual Networks}。最初,残差网络是为了解决神经网络持续加深时的模型退化问题\upcite{DBLP:journals/corr/HeZRS15},但是残差结构对解决梯度消失和梯度爆炸问题也有所帮助。有了残差结构,可以很轻松的构建几十甚至上百层的神经网络,而不用担心层数过深造成的梯度消失问题。残差网络的结构如图\ref{fig:9-51}所示。图\ref{fig:9-51}中右侧的曲线叫做{\small\bfnew{跳接}}\index{跳接}(Skip Connection)\index{Skip Connection},通过跳接在激活函数前,将上一层(或几层)之前的输出与本层计算的输出相加,将求和的结果输入到激活函数中作为本层的输出。假设残差结构的输入为$ {\mathbi{x}}_l $,输出为$ {\mathbi{x}}_{l+1} $,则有
......@@ -1593,7 +1593,7 @@ z_t&=&\gamma z_{t-1}+(1-\gamma) \frac{\partial J}{\partial {\theta}_t} \cdot \f
\label{eq:9-45}
\end{eqnarray}
由上式可知,残差网络可以将后一层的梯度$ \frac{\partial L}{\partial {\mathbi{x}}_{l+1}} $不经过任何乘法项直接传递到$ \frac{\partial L}{\partial {\mathbi{x}}_l} $,从而缓解了梯度经过每一层后多次累乘造成的梯度消失问题。在{\chaptertwelve}中还会看到,在机器翻译中残差结构可以和层归一化一起使用,而且这种组合可以取得很好的效果。
由上式可知,残差网络可以将后一层的梯度$ \frac{\partial L}{\partial {\mathbi{x}}_{l+1}} $不经过任何乘法项直接传递到$ \frac{\partial L}{\partial {\mathbi{x}}_l} $,从而缓解了梯度经过每一层后多次累乘造成的梯度消失问题。在{\chaptertwelve}中还会看到,在机器翻译中残差结构可以和层标准化一起使用,而且这种组合可以取得很好的效果。
\end{itemize}
......@@ -1946,7 +1946,7 @@ z_t&=&\gamma z_{t-1}+(1-\gamma) \frac{\partial J}{\partial {\theta}_t} \cdot \f
\label{eq:9-120}
\end{eqnarray}
\noindent 这里,exp($\cdot$)表示指数函数。Softmax函数是一个典型的归一化函数,它可以将输入的向量的每一维都转化为0-1之间的数,同时保证所有维的和等于1。Softmax的另一个优点是,它本身(对于输出的每一维)都是可微的(如图\ref{fig:softmax}所示),因此可以直接使用基于梯度的方法进行优化。实际上,Softmax经常被用于分类任务。也可以把机器翻译中目标语单词的生成看作一个分类问题,它的类别数是|$V$|。
\noindent 这里,exp($\cdot$)表示指数函数。Softmax函数是一个典型的标准化函数,它可以将输入的向量的每一维都转化为0-1之间的数,同时保证所有维的和等于1。Softmax的另一个优点是,它本身(对于输出的每一维)都是可微的(如图\ref{fig:softmax}所示),因此可以直接使用基于梯度的方法进行优化。实际上,Softmax经常被用于分类任务。也可以把机器翻译中目标语单词的生成看作一个分类问题,它的类别数是|$V$|。
%----------------------------------------------
\begin{figure}[htp]
......@@ -2040,7 +2040,7 @@ z_t&=&\gamma z_{t-1}+(1-\gamma) \frac{\partial J}{\partial {\theta}_t} \cdot \f
\subsubsection{1. One-hot编码}
\parinterval {\small\sffamily\bfseries{One-hot编码}}\index{One-hot编码}(也称{\small\sffamily\bfseries{独热编码}}\index{独热编码})是传统的单词表示方法。One-hot编码把单词表示为词汇表大小的0-1向量,其中只有该词所对应的那一项是1,而其余所有项都是。举个简单的例子,假如有一个词典,里面包含10k个单词,并进行编号。那么每个单词都可以表示为一个10k维的One-hot向量,它仅在对应编号那个维度为1,其他维度都为0,如图\ref{fig:9-64}所示。
\parinterval {\small\sffamily\bfseries{One-hot编码}}\index{One-hot编码}(也称{\small\sffamily\bfseries{独热编码}}\index{独热编码})是传统的单词表示方法。One-hot编码把单词表示为词汇表大小的0-1向量,其中只有该词所对应的那一项是1,而其余所有项都是0。举个简单的例子,假如有一个词典,里面包含10k个单词,并进行编号。那么每个单词都可以表示为一个10k维的One-hot向量,它仅在对应编号那个维度为1,其他维度都为0,如图\ref{fig:9-64}所示。
%----------------------------------------------
\begin{figure}[htp]
......
This source diff could not be displayed because it is too large. You can view the blob instead.
Markdown 格式
0%
您添加了 0 到此讨论。请谨慎行事。
请先完成此评论的编辑!
注册 或者 后发表评论