Commit 6797dee8 by 曹润柘

合并分支 'caorunzhe' 到 'master'

Caorunzhe

查看合并请求 !776
parents 154466b9 484276b0
......@@ -105,7 +105,7 @@
\parinterval 机器翻译有两种常用的推断方式\ \dash \ 自左向右推断和自右向左推断。自左向右推断符合现实世界中人类的语言使用规律,因为人在翻译一个句子时,总是习惯从句子开始的部分往后生成\footnote{有些语言中,文字是自右向左书写,这时自右向左推断更符合人类使用这种语言的习惯。}。不过,有时候人也会使用当前单词后面的译文信息。也就是说,翻译也需要“未来” 的文字信息。于是很容易想到使用自右向左的方法对译文进行生成。
\parinterval 以上两种推断方式在神经机器翻译中都有应用,对于源语言句子$\seq{x}=\{x_1,x_2,\dots,x_m\}$和目标语言句子$\seq{y}=\{y_1,y_2,\dots,y_n\}$,自左向右的翻译可以被描述为公式\eqref{eq:14-1}
\parinterval 以上两种推断方式在神经机器翻译中都有应用,对于源语言句子$\seq{x}=\{x_1,\dots,x_m\}$和目标语言句子$\seq{y}=\{y_1,\dots,y_n\}$,自左向右的翻译可以被描述为公式\eqref{eq:14-1}
\begin{eqnarray}
\funp{P}(\seq{y}\vert\seq{x}) &=& \prod_{j=1}^n \funp{P}(y_j\vert\seq{y}_{<j},\seq{x})
......@@ -118,7 +118,7 @@
\label{eq:14-2}
\end{eqnarray}
\noindent 其中,$\seq{y}_{<j}=\{y_1,y_2,\dots,y_{j-1}\}$$\seq{y}_{>j}=\{y_{j+1},y_{j+2},\dots,y_n\}$。可以看到,自左向右推断和自右向左推断本质上是一样的。{\chapterten}{\chaptertwelve}均使用了自左向右的推断方法。自右向左推断比较简单的实现方式是:在训练过程中直接将双语数据中的目标语言句子进行反转,之后仍然使用原始的模型进行训练即可。在推断的时候,生成的目标语言词串也需要进行反转得到最终的译文。有时候,使用自右向左的推断方式会取得更好的效果\upcite{DBLP:conf/wmt/SennrichHB16}。不过更多情况下需要同时使用词串左端(历史)和右端(未来)的信息。有多种思路可以融合左右两端信息:
\noindent 其中,$\seq{y}_{<j}=\{y_1,\dots,y_{j-1}\}$$\seq{y}_{>j}=\{y_{j+1},\dots,y_n\}$。可以看到,自左向右推断和自右向左推断本质上是一样的。{\chapterten}{\chaptertwelve}均使用了自左向右的推断方法。自右向左推断比较简单的实现方式是:在训练过程中直接将双语数据中的目标语言句子进行反转,之后仍然使用原始的模型进行训练即可。在推断的时候,生成的目标语言词串也需要进行反转得到最终的译文。有时候,使用自右向左的推断方式会取得更好的效果\upcite{DBLP:conf/wmt/SennrichHB16}。不过更多情况下需要同时使用词串左端(历史)和右端(未来)的信息。有多种思路可以融合左右两端信息:
\begin{itemize}
\vspace{0.5em}
......
......@@ -103,7 +103,7 @@
\node [anchor=north] (pos1) at ([xshift=1.5em,yshift=-1.0em]node0-2.south) {\small{(a) GPT模型结构}};
\node [anchor=north] (pos2) at ([xshift=1.5em,yshift=-1.0em]node0-6.south) {\small{(b) BERT模型结构}};
\node [anchor=south] (ex) at ([xshift=2.1em,yshift=0.5em]node3-1.north) {\small{TRM:transformer}};
\node [anchor=south] (ex) at ([xshift=2.1em,yshift=0.5em]node3-1.north) {\small{TRM:Transformer}};
......
......@@ -60,7 +60,7 @@
\node [anchor=west,fill=red!20,minimum width=1.5em](d2-1) at ([xshift=-0.0em]d2.east){};
\node [anchor=west,fill=yellow!20,minimum width=1.5em](d3-1) at ([xshift=-0.0em]d3.east){};
\node [anchor=north] (d4) at ([xshift=1em]d1.south) {\small{训练:}};
\node [anchor=north] (d5) at ([xshift=0.5em]d2.south) {\small{}};
\node [anchor=north] (d5) at ([xshift=0.5em]d2.south) {\small{}};
\draw [->,thick] ([xshift=0em]d4.east)--([xshift=1.5em]d4.east);
\draw [->,thick,dashed] ([xshift=0em]d5.east)--([xshift=1.5em]d5.east);
......
\begin{tikzpicture}
\begin{scope}
\node [anchor=center] (node1) at (0,0) {\textbf{Machine translation}, sometiomes referred to by the abbreviation \textbf{MT} (not to be };
\node [anchor=north] (node2) at (node1.south) {confused with computer-aided translation,,machine-aided human translation inter};
\node [anchor=center] (node1) at (0,0) {\textbf{Machine Translation}, sometimes referred to by the abbreviation \textbf{MT} (not to be };
\node [anchor=north] (node2) at (node1.south) {confused with computer-aided translation,machine-aided human translation inter};
\node [anchor=north] (node3) at (node2.south) {-active translation), is a subfield of computational linguistics that investigates the};
\node [anchor=north] (node4) at ([xshift=-1.8em]node3.south) {use of software to translate text or speech from one language to another.};
\node [anchor=south] (node5) at ([xshift=-12.8em,yshift=0.5em]node1.north) {\Large{WIKIPEDIA}};
......
......@@ -12,8 +12,8 @@
\node[node,anchor=west,minimum width=6em,minimum height=2.4em,fill=blue!20,line width=0.6pt] (decoder2) at ([xshift=4em,yshift=0em]decoder1.east){\small 解码器};
\node[node,anchor=west,minimum width=6em,minimum height=2.4em,fill=blue!30,line width=0.6pt] (decoder3) at ([xshift=3em]decoder2.east){\small 解码器};
\node[anchor=north,font=\scriptsize,fill=yellow!20] (w1) at ([yshift=-1.6em]decoder1.south){知识 \ 就是 \ 力量 \ \ <EOS>};
\node[anchor=north,font=\scriptsize,fill=green!20] (w3) at ([yshift=-1.6em]decoder3.south){Wissen \ ist \ Machit \ . \ <EOS>};
\node[anchor=north,font=\scriptsize,fill=yellow!20] (w1) at ([yshift=-1.6em]decoder1.south){知识 \ 就是 \ 力量 \ \ <eos>};
\node[anchor=north,font=\scriptsize,fill=green!20] (w3) at ([yshift=-1.6em]decoder3.south){Wissen \ ist \ Machit \ . \ <eos>};
\node[anchor=south,font=\scriptsize,fill=orange!20] (w2) at ([yshift=1.6em]encoder1.north){Knowledge \ is \ power \ . };
\node[anchor=south,font=\scriptsize,fill=orange!20] (w4) at ([yshift=1.6em]encoder3.north){Knowledge \ is \ power \ . };
......
......@@ -18,7 +18,7 @@
\node[anchor=south,font=\footnotesize,inner sep=0pt] (cache)at ([yshift=2em,xshift=1.5em]key.north){\small\bfnew{Cache}};
\node[draw,anchor=east,minimum size=1.8em,fill=orange!15] (dt) at ([yshift=2.1em,xshift=-4em]key.west){${\mathbi{d}}_{t}$};
\node[anchor=north,font=\footnotesize] (readlab) at ([xshift=2.8em,yshift=0.3em]dt.north){\red{reading}};
\node[anchor=north,font=\footnotesize] (readlab) at ([xshift=2.8em,yshift=0.3em]dt.north){\red{读取}};
\node[draw,anchor=east,minimum size=1.8em,fill=ugreen!15] (st) at ([xshift=-3.7em]dt.west){${\mathbi{s}}_{t}$};
\node[draw,anchor=east,minimum size=1.8em,fill=red!15] (st2) at ([xshift=-0.85em,yshift=3.5em]dt.west){$ \widetilde{\mathbi{s}}_{t}$};
......@@ -27,10 +27,10 @@
\draw[-,thick] (add.0) -- (add.180);
\draw[-,thick] (add.90) -- (add.-90);
\node[anchor=north,inner sep=0pt,font=\footnotesize,text=red] at ([xshift=-0.08em,yshift=-1em]add.south){combining};
\node[anchor=north,inner sep=0pt,font=\footnotesize,text=red] at ([xshift=-0em,yshift=-0.5em]add.south){融合};
\node[draw,anchor=east,minimum size=1.8em,fill=yellow!15] (ct) at ([xshift=-2em,yshift=-3.5em]st.west){$ {\mathbi{C}}_{t}$};
\node[anchor=north,font=\footnotesize] (matchlab) at ([xshift=6.7em,yshift=-0.1em]ct.north){\red{mathching}};
\node[anchor=north,font=\footnotesize] (matchlab) at ([xshift=6.7em,yshift=-0.1em]ct.north){\red{匹配}};
\node[anchor=east] (y) at ([xshift=-6em,yshift=1em]st.west){$\mathbi{y}_{t-1}$};
......
Markdown 格式
0%
您添加了 0 到此讨论。请谨慎行事。
请先完成此评论的编辑!
注册 或者 后发表评论