Commit 7d4e3774 by 孟霞

Merge branch 'mengxia' into 'caorunzhe'

fig-chapter7

See merge request !43
parents 2dfaa569 6217af1c
......@@ -153,7 +153,10 @@
%Table 1------------------------
%--5.2 Neural Network Basics-----------------------------------------
<<<<<<< HEAD
=======
\sectionnewpage
>>>>>>> master
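\section{Neural Network Basics}
\parinterval A neural network is a computational model built from a large number of interconnected nodes (also called neurons). So what is a neuron? How are neurons connected to one another? And how is a neural network described mathematically? This section works through these questions to give a systematic introduction to the basics of neural networks.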
......@@ -834,7 +837,10 @@ x_0\cdot w_0+x_1\cdot w_1+x_2\cdot w_2 & = & 0\cdot 1+0\cdot 1+1\cdot 1 \nonumbe
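\parinterval Later in this book we will also see that deep networks can bring clear performance gains in machine translation.
%--5.3 Tensor Implementation of Neural Networks-----------------------------------------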
<<<<<<< HEAD
=======
\sectionnewpage
>>>>>>> master
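\section{Tensor Implementation of Neural Networks}
\parinterval Inside a neural network, the input passes through a series of transformations before the output is produced. The process resembles data ``flowing'' from layer to layer. This raises two natural questions: in what form does the data ``flow'' through the network, and how can this ``flow'' be implemented in code?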
......@@ -1203,7 +1209,10 @@ y&=&{\rm{Sigmoid}}({\rm{Tanh}}(\mathbf x\cdot \mathbf w^1+\mathbf b^1)\cdot \mat
%-------------------------------------------
%--5.4 Parameter Training of Neural Networks-----------------------------------------
<<<<<<< HEAD
=======
\sectionnewpage
>>>>>>> master
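\section{Parameter Training of Neural Networks}
\parinterval Put simply, a neural network can be viewed as an expression composed of variables and functions, for example $ \mathbf y=\mathbf x+\mathbf b $, $ \mathbf y={\rm{ReLU}}(\mathbf x\cdot \mathbf w+\mathbf b) $, $ \mathbf y={\rm{Sigmoid}}({\rm{ReLU}}(\mathbf x\cdot \mathbf w^1+\mathbf b^1)\cdot \mathbf w^2+\mathbf b^2) $, and so on. Here $ \mathbf x $ and $ \mathbf y $ are the input and output variables, while the remaining variables such as $ \mathbf w $ and $ \mathbf b $ are the {\small\sffamily\bfseries{model parameters}}\index{Model Parameters}. Once the form of the expression and the model parameters are fixed, the neural network model is fixed. The form of the expression is usually designed by the system developer, whereas the number of model parameters can be very large, so the parameters must be learned automatically. This process is called model learning or {\small\bfnew{training}}\index{Training}. To this end, a certain amount of data with reference answers is usually prepared, known as {\small\sffamily\bfseries{annotated data}} or labeled data\index{Annotated Data/Labeled Data}. These data are used to learn the model parameters, which corresponds to parameter estimation in statistical models. In machine learning, training the parameters of a statistical model on annotated data is generally called {\small\sffamily\bfseries{supervised training}}\index{Supervised Training} (also rendered as ``guided training''). In this chapter, unless stated otherwise, model training always means supervised training. So how does a neural network actually use annotated data to train its parameters?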
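As a rough illustration of the "network as an expression" view, the sketch below evaluates $\mathbf y={\rm{Sigmoid}}({\rm{ReLU}}(\mathbf x\cdot \mathbf w^1+\mathbf b^1)\cdot \mathbf w^2+\mathbf b^2)$ in plain Python. The helper names (`affine`, `relu`, `sigmoid`, `two_layer`) and all numeric values are assumptions chosen for illustration; they do not come from the book or from any particular toolkit.

```python
import math


def affine(x, w, b):
    """Compute x . w + b for a row vector x (length n), matrix w (n x m), bias b (length m)."""
    return [sum(x[i] * w[i][j] for i in range(len(x))) + b[j]
            for j in range(len(b))]


def relu(v):
    """Element-wise ReLU: negative entries become 0."""
    return [max(0.0, a) for a in v]


def sigmoid(v):
    """Element-wise logistic sigmoid."""
    return [1.0 / (1.0 + math.exp(-a)) for a in v]


def two_layer(x, w1, b1, w2, b2):
    """Evaluate y = Sigmoid(ReLU(x . w1 + b1) . w2 + b2)."""
    h = relu(affine(x, w1, b1))
    return sigmoid(affine(h, w2, b2))


# Hypothetical toy parameters (illustrative values only).
x = [1.0, 2.0]
w1 = [[1.0, -1.0],
      [0.5, 0.0]]
b1 = [0.0, 0.5]
w2 = [[1.0],
      [1.0]]
b2 = [-2.0]

hidden = relu(affine(x, w1, b1))   # pre-activation is [2.0, -0.5]; ReLU zeroes the second entry
y = two_layer(x, w1, b1, w2, b2)   # the output pre-activation is 0.0
print(hidden, y)
```

Here the form of the expression is fixed by `two_layer`, while `w1`, `b1`, `w2`, `b2` play the role of the model parameters; training would mean adjusting those values from annotated data rather than setting them by hand as done here.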
......@@ -1920,7 +1929,10 @@ w_{t+1}&=&w_t-\frac{\eta}{\sqrt{z_t+\epsilon}} v_t
%-------------------------------------------
%--5.5 Neural Language Models-----------------------------------------
<<<<<<< HEAD
=======
\sectionnewpage
>>>>>>> master
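\section{Neural Language Models}\label{sec5:nlm}
\parinterval Neural networks give us a tool: once the input and output of a problem are defined, the mapping between them can be learned. Many natural language processing tasks can therefore be implemented with neural networks. In machine translation, for example, the input source-language sentence and the output target-language sentence can be modeled with a neural network; in text classification, the input text and the output class label can be modeled in the same way.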
......@@ -2287,7 +2299,10 @@ Jobs was the CEO of {\red{\underline{apple}}}.
%-------------------------------------------
%--5.6 Summary and Further Reading-----------------------------------------
<<<<<<< HEAD
=======
\sectionnewpage
>>>>>>> master
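\section{Summary and Further Reading}
\parinterval Neural networks offer a fresh approach to natural language processing problems, and what is called deep learning is a family of models and methods built on multi-layer neural network architectures. This chapter has given an overview that runs from the basic concepts of neural networks to their application in language modeling. Space does not permit covering everything about neural networks and deep learning; interested readers may consult ``Neural Network Methods in Natural Language Processing''\cite{goldberg2017neural} and ``Deep Learning''\cite{lecun2015deep}. In addition, many research directions deserve attention: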
......
......@@ -3,9 +3,9 @@
\tikzstyle{word} = [font=\scriptsize]
\tikzstyle{model} = [rectangle,draw,minimum height=3em,minimum width=6em,rounded corners=4pt,fill=red!15!white]
\node [model,fill=blue!15!white] (ate) at (0,0) {Attention};
\node [anchor=center] (ate) at (0,0) {};
\node [model,minimum width=10.5em] (decoder) at ([xshift=8em]ate.east) {Decoder};
\node [model,minimum width=10.5em] (decoder) at ([xshift=6em]ate.east) {Decoder};
\node [word] (w1) at ([yshift=-2em,xshift=1em]decoder.south) {$x_3$};
\node [word] (w2) at ([xshift=-1em]w1.west) {\#};
\node [word] (w3) at ([xshift=-1em]w2.west) {\#};
......@@ -22,14 +22,14 @@
\draw [->] (w3.north) -- ([yshift=1.3em]w3.north);
\draw [->] (w4.north) -- ([yshift=1.3em]w4.north);
\draw [->] (w5.north) -- ([yshift=1.4em]w5.north);
\draw [->] (w6.north) -- ([yshift=1.4em]w6.north);
\draw [->] (w6.north) -- ([yshift=1.3em]w6.north);
\draw [->] ([yshift=-1.4em]w7.south) -- (w7.south);
\draw [->] ([yshift=-1.4em]w8.south) -- (w8.south);
\draw [->] ([yshift=-1.4em]w9.south) -- (w9.south);
%encoder
\node [model,minimum width=10.5em] (encoder) at ([xshift=-8em]ate.west) {Encoder};
\node [model,minimum width=10.5em] (encoder) at ([xshift=-6em]ate.west) {Encoder};
\node [word] (we1) at ([yshift=-2em,xshift=1em]encoder.south) {\#};
\node [word] (we2) at ([xshift=-1em]we1.west) {\#};
\node [word] (we3) at ([xshift=-1em]we2.west) {$x_2$};
......@@ -44,7 +44,6 @@
\draw [->] (we5.north) -- ([yshift=1.3em]we5.north);
\draw [->] (we6.north) -- ([yshift=1.4em]we6.north);
\draw [->,very thick] ([xshift=0.5em]encoder) -- ([xshift=-0.5em]ate);
\draw [->,very thick] ([xshift=0.5em]ate) -- ([xshift=-0.5em]decoder);
\draw [->,very thick] ([xshift=0.5em]encoder)--([xshift=-0.5em]decoder);
\end{scope}
\end{tikzpicture}
\ No newline at end of file
\begin{tikzpicture}
\begin{scope}
\tikzstyle{node1} = [rectangle,draw,minimum height=2em,minimum width=8em,rounded corners=2pt,fill=orange!10]
\tikzstyle{node2} = [rectangle,draw,minimum height=1.3em,minimum width=10em,rounded corners=2pt,fill=blue!15!white]
\tikzstyle{node3} = [rectangle,draw,minimum height=2em,minimum width=4em,rounded corners=2pt,fill=orange!10]
\node [anchor=north,inner sep=0mm,node1] (n1) at (0,0) {Parallel\ Data};
\node [anchor=north,node2] (n2) at ([xshift=0em,yshift=-2em]n1.south) {Reverse\ NMT\ System};
\node [anchor=north,node3] (n3) at ([xshift=-3em,yshift=-2em]n2.south) {M$_{\textrm{pseudo}}$};
\node [anchor=west,node3] (n31) at ([xshift=2em,yshift=0em]n3.east) {M$_{\textrm{target}}$};
\node [anchor=north west,node1,minimum height=4em,minimum width=8em] (n4) at ([xshift=5em,yshift=0em]n1.north east) {};
\node [anchor=south west,fill=orange!10,minimum height=1.6em,minimum width=3.6em] (n41) at ([xshift=0.2em,yshift=0.2em]n4.south west) {M$_{\textrm{pseudo}}$};
\node [anchor=south east,fill=orange!10,minimum height=1.6em,minimum width=3.6em] (n42) at ([xshift=-0.2em,yshift=0.2em]n4.south east) {M$_{\textrm{target}}$};
\node [anchor=north,fill=orange!10,minimum height=1.6em,minimum width=7.6em] (n43) at ([xshift=0em,yshift=-0.2em]n4.north) {Parallel\ Data};
\node [anchor=north,node2] (n5) at ([xshift=0em,yshift=-3em]n4.south) {Final\ NMT\ System};
\draw [->,thick,black!60,line width=1mm] (n1.east) -- ([xshift=0em,yshift=1em]n4.west);
\draw [->,thick,black!20,line width=1mm] (n1.south) -- (n2.north);
\draw [->,thick,black!20,line width=1mm] (n2.south) -- ([xshift=0em,yshift=-2em]n2.south);
\draw [->,thick,black!40,line width=1mm] (n3.east) -- (n31.west);
\draw [->,thick,black!60,line width=1mm] (n31.north east) -- (n4.south west);
\draw [->,thick,black!20,line width=1mm] (n4.south) -- (n5.north);
\node [anchor=center] (node1) at (-2.9,1) {\small{Training:}};
\node [anchor=center] (node11) at (-2.5,1) {};
\node [anchor=center] (node12) at (-1.7,1) {};
\node [anchor=center] (node2) at (-2.9,0.5) {\small{Inference:}};
\node [anchor=center] (node21) at (-2.5,0.5) {};
\node [anchor=center] (node22) at (-1.7,0.5) {};
\node [anchor=west,draw=black,minimum width=5.6em,minimum height=2.2em,fill=blue!20,rounded corners=2pt] (node1-1) at (0,0) {\footnotesize{Bilingual data}};
\node [anchor=south,draw=black,minimum width=4.5em,minimum height=2.2em,fill=blue!20,rounded corners=2pt] (node1-2) at ([yshift=-5em]node1-1.south) {\footnotesize{Pseudo target-language data}};
\node [anchor=west,draw=black,minimum width=4.5em,minimum height=2.2em,fill=red!20,rounded corners=2pt] (node2-1) at ([xshift=-8.8em,yshift=-2.5em]node1-1.west) {\footnotesize{Reverse NMT system}};
\node [anchor=west,draw=black,minimum width=4.5em,minimum height=2.2em,fill=red!20,rounded corners=2pt] (node3-1) at ([xshift=3em,yshift=-2.5em]node1-1.east) {\footnotesize{Forward NMT system}};
\draw [-,thick] (n4.west) -- (n4.east);
\draw [-,thick] (n4.south) -- ([xshift=0em,yshift=2em]n4.south);
\draw [-stealth](node1-1.west)--([xshift=3em]node2-1.north);
\draw [-stealth](node1-1.east)--([xshift=-3em]node3-1.north);
\draw [-stealth](node1-2.east)--([xshift=-3em]node3-1.south);
\draw [-stealth](node11.east)--(node12.west);
\draw [-stealth,dashed](node21.east)--(node22.west);
\draw [-stealth,dashed]([xshift=3em]node2-1.south)--(node1-2.west);
\end{scope}
\end{tikzpicture}
\ No newline at end of file
\begin{tikzpicture}
\begin{scope}
\tikzstyle{word} = [font=\scriptsize]
\tikzstyle{model} = [rectangle,draw,minimum height=3em,minimum width=5em,rounded corners=4pt,fill=blue!15!white]
\tikzstyle{model} = [rectangle,draw,minimum height=2.5em,minimum width=5em,rounded corners=4pt,fill=blue!15!white]
\node [model,minimum width=10.5em] (encoder0) at (0,0) {Encoder};
\node [word] (w1) at ([yshift=-2em,xshift=1em]encoder0.south) {\#};
......@@ -22,24 +22,24 @@
\draw [->] (w5.north) -- ([yshift=1.3em]w5.north);
\draw [->] (w6.north) -- ([yshift=1.4em]w6.north);
\draw [<-] (w7.south) -- ([yshift=-1.4em]w7.south);
\draw [<-] (w8.south) -- ([yshift=-1.4em]w8.south);
\draw [<-] (w9.south) -- ([yshift=-1.4em]w9.south);
\draw [->] (w7.south) -- ([yshift=-1.4em]w7.south);
\draw [->] (w8.south) -- ([yshift=-1.4em]w8.south);
\draw [->] (w9.south) -- ([yshift=-1.4em]w9.south);
\node [model] (encoder1) at ([xshift=8em]encoder0.east) {Encoder};
\node [model,fill=red!15!white] (decoder) at ([xshift=6em]encoder1.east) {Decoder};
\node [] (sinput) at ([yshift=-3em]encoder1.south) {source input};
\node [] (tinput) at ([yshift=-3em]decoder.south) {target input};
\node [] (output) at ([yshift=3em]decoder.north) {target output};
\node [model,fill=red!15!white] (decoder) at ([xshift=5em]encoder1.east) {Decoder};
\node [] (sinput) at ([yshift=-3em]encoder1.south) {\footnotesize{source input}};
\node [] (tinput) at ([yshift=-3em]decoder.south) {\footnotesize{target input}};
\node [] (output) at ([yshift=3em]decoder.north) {\footnotesize{target output}};
\draw [->] (sinput) -- (encoder1);
\draw [->] (tinput) -- (decoder);
\draw [->] (decoder) -- (output);
\coordinate (do0) at ([yshift=1em]encoder1.north);
\coordinate (do1) at ([xshift=4em]do0.east);
\coordinate (do2) at ([yshift=-2.5em]do1.south);
\coordinate (do1) at ([xshift=3.5em]do0.east);
\coordinate (do2) at ([yshift=-2.3em]do1.south);
\draw [-] (encoder1.north) -- (do0);
\draw [-] (do0) -- (do1);
......@@ -47,13 +47,16 @@
\draw [->] (do2) -- (decoder.west);
\begin{pgfonlayer}{background}
\node [rectangle,inner sep=2em,fill=black!5,rounded corners=4pt] [fit =(w4) (w6) (w9) (encoder0) ] (box) {};
\node [rectangle,inner sep=1em,fill=black!5,rounded corners=4pt] [fit =(w4) (w6) (w9) (encoder0) ] (box) {};
\end{pgfonlayer}
\node [] (left) at ([yshift=-1.5em]box.south) {Pre-training with monolingual data};
\node [] (right) at ([xshift=9.8em]left.east) {Fine-tune on translation task};
\node [] (left) at ([yshift=-1.5em]box.south) {Encoder pre-training with monolingual data};
\node [] (right) at ([xshift=11em]left.east) {Fine-tuning on the translation task};
\node[anchor=north] (arrow1) at (3.7,0.1){};
\draw[fill=yellow!20]([yshift=-0.3em]arrow1.north)--([xshift=-1em,yshift=0.5em]arrow1.north west)--([xshift=-1em,yshift=0.1em]arrow1.north west)--([xshift=-2.6em,yshift=0.1em]arrow1.north west)--([xshift=-2.6em,yshift=-0.1em]arrow1.south west)--([xshift=-1em,yshift=-0.1em]arrow1.south west)--([xshift=-1em,yshift=-0.5em]arrow1.south west)--([yshift=-0.3em]arrow1.north);
\draw [->,black!50!white,line width=3pt,draw] ([xshift=1em]encoder0.east) -- ([xshift=-1em]encoder1.west);
\end{scope}
\end{tikzpicture}
\ No newline at end of file
\begin{tikzpicture}
\begin{scope}
\tikzstyle{node1} = [rectangle,draw,minimum height=2em,minimum width=8em,rounded corners=2pt,fill=orange!10]
\tikzstyle{node2} = [rectangle,draw,minimum height=1.3em,minimum width=10em,rounded corners=2pt,fill=blue!15!white]
\tikzstyle{node3} = [rectangle,draw,minimum height=2em,minimum width=4em,rounded corners=2pt,fill=orange!10]
\node [anchor=center] (node1) at (-2.3,1) {\small{Training:}};
\node [anchor=center] (node11) at (-2.0,1) {};
\node [anchor=center] (node12) at (-1.1,1) {};
\node [anchor=center] (node2) at (-2.3,0.5) {\small{Inference:}};
\node [anchor=center] (node21) at (-2.0,0.5) {};
\node [anchor=center] (node22) at (-1.1,0.5) {};
\node [anchor=west,draw=black,minimum width=5.6em,minimum height=2.2em,fill=blue!20,rounded corners=2pt] (node1-1) at (0,0) {\footnotesize{Bilingual data}};
\node [anchor=south,draw=black,minimum width=4.5em,minimum height=2.2em,fill=blue!20,rounded corners=2pt] (node1-2) at ([yshift=-5em]node1-1.south) {\footnotesize{Pseudo target-language data}};
\node [anchor=west,draw=black,minimum width=4.5em,minimum height=2.2em,fill=red!20,rounded corners=2pt] (node2-1) at ([xshift=-7.3em,yshift=-2.5em]node1-1.west) {\footnotesize{Forward NMT system}};
\node [anchor=west,draw=black,minimum width=4.5em,minimum height=2.2em,fill=red!20,rounded corners=2pt] (node3-1) at ([xshift=1.5em,yshift=-2.5em]node1-1.east) {\footnotesize{Reverse NMT system}};
\node [anchor=north,inner sep=0mm,node1] (n1) at (0,0) {Parallel\ Data};
\node [anchor=north,node2] (n2) at ([xshift=0em,yshift=-2em]n1.south) {NMT\ System};
\node [anchor=north,node3] (n3) at ([xshift=-3em,yshift=-2em]n2.south) {Mono$_{src}$};
\node [anchor=west,node3] (n31) at ([xshift=2em,yshift=0em]n3.east) {Pseudo$_{tgt}$};
\node [anchor=east,draw=black,minimum width=5.6em,minimum height=2.2em,fill=blue!20,rounded corners=2pt] (node4-1) at ([xshift=18em]node1-1) {\footnotesize{Bilingual data}};
\node [anchor=south,draw=black,minimum width=4.5em,minimum height=2.2em,fill=blue!20,rounded corners=2pt] (node4-2) at ([yshift=-5em]node4-1.south) {\footnotesize{Pseudo target-language data}};
\node [anchor=north west,node1,minimum height=4em,minimum width=8em] (n4) at ([xshift=4.7em,yshift=0em]n1.north east) {};
\node [anchor=south west,fill=orange!10,minimum height=1.6em,minimum width=3.6em] (n41) at ([xshift=0.2em,yshift=0.2em]n4.south west) {M$_{src}$};
\node [anchor=south east,fill=orange!10,minimum height=1.6em,minimum width=3.6em] (n42) at ([xshift=-0.2em,yshift=0.2em]n4.south east) {P$_{tgt}$};
\node [anchor=north,fill=orange!10,minimum height=1.6em,minimum width=7.6em] (n43) at ([xshift=0em,yshift=-0.2em]n4.north) {Parallel\ Data};
\node [anchor=north,node2] (n5) at ([xshift=0em,yshift=-2em]n4.south) {Reverse\ NMT\ System};
\node [anchor=north,node3] (n6) at ([xshift=-3.3em,yshift=-2em]n5.south) {Pseudo$_{src}$};
\node [anchor=west,node3] (n61) at ([xshift=2em,yshift=0em]n6.east) {Mono$_{tgt}$};
\node [anchor=east,draw=black,minimum width=4.5em,minimum height=2.2em,fill=red!20,rounded corners=2pt] (node5-1) at ([xshift=15.5em]node3-1.east) {\footnotesize{Forward NMT system}};
\node [anchor=north west,node1,minimum height=4em,minimum width=8em] (n7) at ([xshift=4.7em,yshift=0em]n4.north east) {};
\node [anchor=south west,fill=orange!10,minimum height=1.6em,minimum width=3.6em] (n71) at ([xshift=0.2em,yshift=0.2em]n7.south west) {P$_{src}$};
\node [anchor=south east,fill=orange!10,minimum height=1.6em,minimum width=3.6em] (n72) at ([xshift=-0.2em,yshift=0.2em]n7.south east) {M$_{tgt}$};
\node [anchor=north,fill=orange!10,minimum height=1.6em,minimum width=7.6em] (n73) at ([xshift=0em,yshift=-0.2em]n7.north) {Parallel\ Data};
\node [anchor=north,node2] (n8) at ([xshift=0em,yshift=-3em]n7.south) {Final\ NMT\ System};
\draw [->,thick,black!60,line width=1mm] (n1.east) -- ([xshift=0em,yshift=1em]n4.west);
\draw [->,thick,black!20,line width=1mm] (n1.south) -- (n2.north);
\draw [->,thick,black!20,line width=1mm] (n2.south) -- ([xshift=0em,yshift=-2em]n2.south);
\draw [->,thick,black!40,line width=1mm] (n3.east) -- (n31.west);
\draw [->,thick,black!60,line width=1mm] (n31.north east) -- (n4.south west);
\draw [->,thick,black!60,line width=1mm] ([xshift=0em,yshift=1em]n4.east) -- ([xshift=0em,yshift=1em]n7.west);
\draw [->,thick,black!20,line width=1mm] (n4.south) -- (n5.north);
\draw [->,thick,black!20,line width=1mm] (n5.south) -- ([xshift=0em,yshift=-2em]n5.south);
\draw [->,thick,black!40,line width=1mm] (n61.west) -- (n6.east);
\draw [->,black!60,line width=1mm] (n61.north east) -- (n7.south west);
\draw [->,thick,black!20,line width=1mm] (n7.south) -- (n8.north);
\draw [-,thick] (n4.west) -- (n4.east);
\draw [-,thick] (n4.south) -- ([xshift=0em,yshift=2em]n4.south);
\draw [-,thick] (n7.west) -- (n7.east);
\draw [-,thick] (n7.south) -- ([xshift=0em,yshift=2em]n7.south);
\draw [-stealth](node1-1.west)--([xshift=3em]node2-1.north);
\draw [-stealth](node1-1.east)--([xshift=-3em]node3-1.north);
\draw [-stealth](node1-2.east)--([xshift=-3em]node3-1.south);
\draw [-stealth](node11.east)--(node12.west);
\draw [-stealth,dashed](node21.east)--(node22.west);
\draw [-stealth,dashed]([xshift=3em]node2-1.south)--(node1-2.west);
\draw [-stealth,dashed]([xshift=3em]node3-1.south)--(node4-2.west);
\draw [-stealth](node4-1.east)--([xshift=-3em]node5-1.north);
\draw [-stealth](node4-2.east)--([xshift=-3em]node5-1.south);
\end{scope}
\end{tikzpicture}
\ No newline at end of file
\begin{tikzpicture}
\begin{scope}
\node [anchor=center] (node1) at (-2.3,0) {\small{$x,y$: bilingual data}};
\node [anchor=center] (node2) at (-2.1,-0.5) {\small{$z$: monolingual data}};
\node [anchor=center] (node1-1) at (0,0) {\small{$y'$}};
\node [anchor=center] (node1) at (-2.3,1.5) {\small{$x,y$: bilingual data}};
\node [anchor=center] (node2) at (-2.1,1) {\small{$z$: monolingual data}};
\node [anchor=center] (node3-1) at ([xshift=5.5em,yshift=-0.1em]node1-1.east) {\small{$z'$}};
\node[anchor=south,draw,rounded corners,minimum height=1.5em,minimum width=4em,fill=blue!20](node1-2) at ([yshift=-3em]node1-1.south) {\small{softmax}};
\node[anchor=south,draw,rounded corners,minimum height=2.5em,minimum width=4em,fill=red!20](node1-3) at ([yshift=-4.5em]node1-2.south) {\small{Decoder}};
\node[anchor=south](node1-4) at ([yshift=-3em]node1-3.south) {\small{$y$}};
\node[anchor=west](node2-2) at ([xshift=-5.5em]node1-4.west) {\small{$x$}};
\node [anchor=center] (labela) at ([xshift=3.5em,yshift=-1.5em]node2-2.south) {\small{(a) Baseline}};
\node[anchor=north,draw,rounded corners,minimum height=2.5em,minimum width=4em,fill=red!20](node2-1) at ([yshift=3.5em]node2-2.north) {\small{Encoder}};
\node[anchor=south,draw,rounded corners,minimum height=1.5em,minimum width=4em,fill=blue!20](node3-2) at ([yshift=-3em]node3-1.south) {\small{softmax}};
\node[anchor=south,draw,rounded corners,minimum height=2.2em,minimum width=4em,fill=red!20](node1-3) at ([yshift=-4.0em]node1-2.south) {\small{Decoder}};
\node[anchor=south,draw,rounded corners,minimum height=2.2em,minimum width=4em,fill=yellow!20](node3-3) at ([yshift=-4.0em]node3-2.south) {\small{LM}};
\node[anchor=south](node1-4) at ([xshift=-0.6em,yshift=-3em]node1-3.south) {\small{$y$}};
\node[anchor=south](node3-41) at ([xshift=-0.6em,yshift=-3em]node3-3.south) {\small{$y$}};
\node[anchor=south](node3-42) at ([xshift=0.6em,yshift=-2.9em]node3-3.south) {\small{$z$}};
\node[anchor=west](node2-2) at ([xshift=-4.9em]node1-4.west) {\small{$x$}};
\node[anchor=north,draw,rounded corners,minimum height=2.2em,minimum width=4em,fill=red!20](node2-1) at ([yshift=4em]node2-2.north) {\small{Encoder}};
\draw [->](node1-4.north)--(node1-3);
\node [rectangle,rounded corners,draw=red,line width=0.2mm,densely dashed,inner sep=0.4em] [fit = (node3-2) (node3-3)] (inputshadow) {};
\draw [->](node1-4.north)--([xshift=-0.6em]node1-3.south);
\draw [->](node1-3.north)--(node1-2);
\draw [->](node1-2.north)--(node1-1);
\draw [->](node2-2.north)--(node2-1);
\draw[->](node2-1.east)--(node1-3.west);
\draw[->](node2-1.north)--([yshift=1em]node2-1.north)--([xshift=2.5em,yshift=1em]node2-1.north)--([xshift=2.5em,yshift=-0.4em]node2-1.north)--(node1-3.west);
\end{scope}
\begin{scope}[xshift=2.3in,yshift=0.6in]
\node [anchor=center] (node1-1) at (0,0) {\small{$y'$}};
\node[anchor=south,draw,rounded corners,minimum height=1.5em,minimum width=4em,fill=blue!20](node1-2) at ([yshift=-3em]node1-1.south) {\small{softmax}};
\node[anchor=south,draw,rounded corners,minimum height=2.5em,minimum width=4em,fill=red!20](node1-3) at ([yshift=-4.5em]node1-2.south) {\small{Decoder}};
\node[anchor=south,draw,rounded corners,minimum height=1.5em,minimum width=4em,fill=blue!20](node3-1) at ([xshift=6em,yshift=0em]node1-3.south) {\small{softmax}};
\node[anchor=south](node3-2) at ([yshift=3em]node3-1.south) {\small{$z'$}};
\node[anchor=south,draw,rounded corners,minimum height=2em,minimum width=4em,fill=yellow!20](node1-4) at ([yshift=-4em]node1-3.south) {\small{LM}};
\node[anchor=south](node1-5) at ([yshift=-3em]node1-4.south) {\small{$y$}};
\node [anchor=center] (labelb) at ([yshift=-1.5em]node1-5.south) {\small{(b) Multi-task learning}};
\node[anchor=west](node2-2) at ([xshift=-5em]node1-5.west) {\small{$x$}};
\node[anchor=north,draw,rounded corners,minimum height=2.5em,minimum width=4em,fill=red!20](node2-1) at ([yshift=5.3em]node2-2.north) {\small{Encoder}};
\draw [->](node3-41.north)--([xshift=-0.6em]node3-3.south);
\draw [->](node3-42.north)--([xshift=0.6em]node3-3.south);
\draw [->]([xshift=0.6em]node3-3.north)--([xshift=0.6em]node3-2.south);
\draw [->](node3-2.north)--(node3-1);
\draw[->]([xshift=-0.6em]node3-3.north)--([xshift=-0.6em,yshift=0.6em]node3-3.north)--([xshift=-3em,yshift=0.6em]node3-3.north)--([xshift=-3em,yshift=-3em]node3-3.north)--([xshift=-5.6em,yshift=-3em]node3-3.north)--([xshift=0.6em]node1-3.south);
\draw [->](node1-5.north)--(node1-4);
\draw [->](node1-4.north)--(node1-3);
\draw [->](node1-3.north)--(node1-2);
\draw [->](node1-2.north)--(node1-1);
\draw [->](node2-2.north)--(node2-1);
\draw [->](node3-1.north)--(node3-2);
\draw[->](node2-1.north)--([yshift=1.8em]node2-1.north)--(node1-3.west);
\draw [->]([yshift=0.8em]node1-4.north)--([xshift=6em,yshift=0.8em]node1-4.north)--(node3-1.south);
%\draw[->](node2-1.north)--([yshift=1em]node2-1.north)--([xshift=2.5em,yshift=1em]node2-1.north)--([xshift=2.5em,yshift=-0.4em]node2-1.north)--(node1-3.west);
\end{scope}
\end{tikzpicture}
\ No newline at end of file