Merge branch 'master' into 'caorunzhe'

Master

See merge request !799

Commit 633f9701 authored Jan 06, 2021 by 曹润柘
--- a/Chapter15/chapter15.tex
+++ b/Chapter15/chapter15.tex
@@ -642,7 +642,7 @@ $\mathbi{g}_l$会作为输入的一部分送入第$l+1$层。其网络的结构

 \noindent 其中，$\mathbi{w}$和$\mathbi{b}$为可学习参数。进一步将公式\eqref{eq:15-41}展开后可得：
 \begin{eqnarray}
-\mathbi{x}_{l+1}^{\textrm{post}} &=& \frac{\mathbi{x}_l+\mathbi{y}_l}{\bm  \sigma} \cdot \mathbi{w} - \frac{\bm  \mu}{\bm  \sigma} \cdot \mathbi{w}+\mathbi{b} \nonumber \\ 
+\mathbi{x}_{l+1}^{\textrm{post}} &=& \frac{\mathbi{x}_l+\mathbi{y}_l}{\bm  \sigma} \cdot \mathbi{w} - \frac{\bm  \mu}{\bm  \sigma} \cdot \mathbi{w}+\mathbi{b} \nonumber \\
                                 &=& \frac{\mathbi{w}}{\bm  \sigma} \cdot \mathbi{x}_{l+1}^{\textrm{pre}}-\frac{\mathbi{w}}{\bm  \sigma} \cdot {\bm  \mu}+\mathbi{b}
 \label{eq:15-42}
 \end{eqnarray}
@@ -1015,7 +1015,7 @@ lr &=& d_{\textrm{model}}^{-0.5}\cdot step\_num^{-0.5}

 \parinterval 另一种方法是直接在目标语言端使用句法树进行建模。与源语言句法树的建模不同，目标语言句法树的生成伴随着译文的生成，因此无法像源语言端一样将整个句法树一起处理。这样译文生成问题本质上就变成了目标语言树结构的生成，从这个角度说，这个过程与统计机器翻译中串到树的模型是类似的（见{\chaptereight}）。树结构的生成有很多种策略，基本的思想均是根据已经生成的局部结构预测新的局部结构，并将这些局部结构拼装成更大的结构，直到得到完整的句法树结构\upcite{DBLP:conf/iclr/Alvarez-MelisJ17}{\red（文献格式错误？）}。

-\parinterval 实现目标语言句法树生成的一种手段是将形式文法扩展，以适应分布式表示学习框架。这样，可以使用形式文法描述句法树的生成过程（见{\chapterthree}），同时利用分布式表示来进行建模和学习。比如，可以使用基于循环神经网络的文法描述方法，把句法分析过程看作是一个循环神经网络的执行过程\upcite{DBLP:conf/naacl/DyerKBS16}{\red（文献格式错误？）}。此外，也可以从多任务学习出发，用多个解码端共同完成目标语言句子的生成\upcite{DBLP:journals/corr/LuongLSVK15}{\red（文献格式错误？）}。图\ref{fig:15-25}展示了由一个编码器（汉语）和多个解码器组成的序列生成模型。其中不同解码器分别负责不同的任务：第一个用于预测翻译结果，即翻译任务；{\red 第二个用于预测句法结构；第三个用于重新生成源语言序列，进行自编码。（描述和图不对应？）}其设计思想是各个任务之间能够相互辅助，使得编码器的表示能包含更多的信息，进而让多个任务都获得性能提升。这种方法也可以使用在多个编码器上，其思想是类似的。
+\parinterval 实现目标语言句法树生成的一种手段是将形式文法扩展，以适应分布式表示学习框架。这样，可以使用形式文法描述句法树的生成过程（见{\chapterthree}），同时利用分布式表示来进行建模和学习。比如，可以使用基于循环神经网络的文法描述方法，把句法分析过程看作是一个循环神经网络的执行过程\upcite{DBLP:conf/naacl/DyerKBS16}{\red（文献格式错误？）}。此外，也可以从{\small\sffamily\bfnew{多任务学习}}\index{多任务学习}（Multitask Learning）\index{Multitask Learning}学习出发，用多个解码端共同完成目标语言句子的生成\upcite{DBLP:journals/corr/LuongLSVK15}{\red（文献格式错误？）}。图\ref{fig:15-25}展示了由一个编码器（汉语）和多个解码器组成的序列生成模型。其中不同解码器分别负责不同的任务：第一个用于预测翻译结果，即翻译任务；{\red 第二个用于预测句法结构；第三个用于重新生成源语言序列，进行自编码。（描述和图不对应？）}其设计思想是各个任务之间能够相互辅助，使得编码器的表示能包含更多的信息，进而让多个任务都获得性能提升。这种方法也可以使用在多个编码器上，其思想是类似的。

 %----------------------------------------------
 \begin{figure}[htp]
@@ -1248,7 +1248,7 @@ f(x) &=& x \cdot \delta(\beta x) \\
 %----------------------------------------------------------------------------------------

 \sectionnewpage
-\section{小结及深入阅读}
+\section{小结及拓展阅读}

 \parinterval 模型结构优化一直是机器翻译研究的重要方向。一方面，对于通用框架（如注意力机制）的结构改良可以服务于多种自然语言处理任务，另一方面，针对机器翻译的问题设计相适应的模型结构也是极具价值的。本章节重点介绍了神经机器翻译中结构优化的几种方法，内容涉及注意力机制的改进、深层神经网络的构建、句法结构的使用以及自动结构搜索等几个方面。此外，还有若干问题值得关注：


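The context lines of the first Chapter15/chapter15.tex hunk expand the post-norm output into eq:15-42, where x_{l+1}^pre = x_l + y_l. Below is a minimal pure-Python sketch that checks this identity numerically. It makes three assumptions. First, the layer normalization of eq:15-41 has the form LN(v) = (v - μ)/σ · w + b, with element-wise w and b. Second, μ and σ are the mean and population standard deviation of v. Third, there is no ε term, although practical implementations usually add a small ε under the square root. The helper names layer_norm and expanded_form and the random test vectors are hypothetical:

```python
import math
import random


def layer_norm(v, w, b):
    """Assumed form of eq:15-41: (v - mu) / sigma * w + b, element-wise w and b.

    Uses the population standard deviation and no epsilon term (an assumption).
    Returns the normalized vector together with mu and sigma.
    """
    n = len(v)
    mu = sum(v) / n
    sigma = math.sqrt(sum((t - mu) ** 2 for t in v) / n)
    out = [(t - mu) / sigma * wi + bi for t, wi, bi in zip(v, w, b)]
    return out, mu, sigma


def expanded_form(x_pre, w, b, mu, sigma):
    """Right-hand side of eq:15-42: w/sigma * x_pre - w/sigma * mu + b."""
    return [wi / sigma * t - wi / sigma * mu + bi for t, wi, bi in zip(x_pre, w, b)]


random.seed(0)
d = 8  # hypothetical hidden size
x_l = [random.gauss(0.0, 1.0) for _ in range(d)]  # input to layer l
y_l = [random.gauss(0.0, 1.0) for _ in range(d)]  # sublayer output for layer l
w = [random.uniform(0.5, 1.5) for _ in range(d)]  # learnable scale (random here)
b = [random.uniform(-0.1, 0.1) for _ in range(d)]  # learnable shift (random here)

# x_{l+1}^pre is the residual sum before normalization
x_pre = [a + c for a, c in zip(x_l, y_l)]

# Left-hand side: post-norm output LN(x_l + y_l)
x_post, mu, sigma = layer_norm(x_pre, w, b)

# Right-hand side: the expanded expression from eq:15-42
rhs = expanded_form(x_pre, w, b, mu, sigma)

max_diff = max(abs(p - q) for p, q in zip(x_post, rhs))
print(f"max |LN(x_pre) - expanded| = {max_diff:.2e}")
```

Both sides agree up to floating-point rounding. The expanded form makes explicit that the post-norm output is the residual sum x_{l+1}^pre scaled element-wise by w/σ and then shifted.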
--- a/Chapter16/Figures/figure-application-process-of-back-translation.tex
+++ b/Chapter16/Figures/figure-application-process-of-back-translation.tex
@@ -58,9 +58,9 @@

 \draw [->,thick]([xshift=-3.2em]remark3.west)--(remark3.west) node [pos=0.5,above] (pos3) {\small{训练}};

-\node [anchor=south](d1) at ([xshift=-1.5em,yshift=1em]remark1.north){\small{真实数据：}};
+\node [anchor=south](d1) at ([xshift=-1.5em,yshift=1em]remark1.north){\small{真实双语数据：}};
 \node [anchor=west](d2) at ([xshift=2.0em]d1.east){\small{伪数据：}};
-\node [anchor=west](d3) at ([xshift=2.0em]d2.east){\small{额外数据：}};
+\node [anchor=west](d3) at ([xshift=2.0em]d2.east){\small{额外单语数据：}};
 \node [anchor=west,fill=green!20,minimum width=1.5em](d1-1) at ([xshift=-0.0em]d1.east){};
 \node [anchor=west,fill=red!20,minimum width=1.5em](d2-1) at ([xshift=-0.0em]d2.east){};
 \node [anchor=west,fill=yellow!20,minimum width=1.5em](d3-1) at ([xshift=-0.0em]d3.east){};

--- a/Chapter16/Figures/figure-comparison-of-structure-between-gpt-and-bert-model.tex
+++ b/Chapter16/Figures/figure-comparison-of-structure-between-gpt-and-bert-model.tex
@@ -103,7 +103,7 @@
 \node [anchor=north] (pos1) at ([xshift=1.5em,yshift=-1.0em]node0-2.south) {\small{(a) GPT模型结构}};
 \node [anchor=north] (pos2) at ([xshift=1.5em,yshift=-1.0em]node0-6.south) {\small{(b) BERT模型结构}};

-\node [anchor=south] (ex) at ([xshift=2.1em,yshift=0.5em]node3-1.north) {\small{TRM：Transformer}};
+\node [anchor=south] (ex) at ([xshift=2.1em,yshift=0.5em]node3-1.north) {\small{TRM：标准Transformer模块}};




--- a/Chapter16/Figures/figure-example-of-iterative-back-translation.tex
+++ b/Chapter16/Figures/figure-example-of-iterative-back-translation.tex
@@ -53,9 +53,9 @@
 \draw [->,thick]([yshift=-0.75em]node5-1.east)--(remark3.north west);
 \draw [->,thick]([yshift=-0.75em]node6-1.east)--(remark3.south west);

-\node [anchor=south](d1) at ([xshift=-0.7em,yshift=5.5em]remark1.north){\small{真实数据：}};
+\node [anchor=south](d1) at ([xshift=-0.7em,yshift=5.5em]remark1.north){\small{真实双语数据：}};
 \node [anchor=west](d2) at ([xshift=2.0em]d1.east){\small{伪数据：}};
-\node [anchor=west](d3) at ([xshift=2.0em]d2.east){\small{额外数据：}};
+\node [anchor=west](d3) at ([xshift=2.0em]d2.east){\small{额外单语数据：}};
 \node [anchor=west,fill=green!20,minimum width=1.5em](d1-1) at ([xshift=-0.0em]d1.east){};
 \node [anchor=west,fill=red!20,minimum width=1.5em](d2-1) at ([xshift=-0.0em]d2.east){};
 \node [anchor=west,fill=yellow!20,minimum width=1.5em](d3-1) at ([xshift=-0.0em]d3.east){};

--- a/Chapter16/Figures/figure-the-iterative-process-of-bidirectional-training.tex
+++ b/Chapter16/Figures/figure-the-iterative-process-of-bidirectional-training.tex
@@ -14,14 +14,14 @@

 \node(process_1_1)[process, right of = monolingual_X, xshift=2.5cm, yshift=-1.5cm]{\textbf{$M^0_{x\to y}$}};
 \node(process_1_2)[process, right of = process_1_1, xshift=5cm, fill=red!25]{$M^0_{y\to x}$};
-\node(process_2_1)[process, below of = process_1_1, yshift=-1.2cm]{解码过程};
-\node(process_2_2)[process, below of = process_1_2, yshift=-1.2cm, fill=red!25]{解码过程};
+\node(process_2_1)[process, below of = process_1_1, yshift=-1.2cm]{翻译过程};
+\node(process_2_2)[process, below of = process_1_2, yshift=-1.2cm, fill=red!25]{翻译过程};
 \node(process_3_1)[state, below of = process_2_1, yshift=-1.2cm, fill=color1!25]{\{$x_i,\hat{y}^0_i$\}};
 \node(process_3_2)[state, below of = process_2_2, yshift=-1.2cm, fill=blue!25]{\{$\hat{x}^0_i,{y_i}$\}};
 \node(process_4_1)[process, below of = process_3_1, yshift=-1.2cm]{\textbf{$M^1_{x\to y}$}};
 \node(process_4_2)[process, below of = process_3_2, yshift=-1.2cm, fill=red!25]{$M^1_{y\to x}$};
-\node(process_5_1)[process, below of = process_4_1, yshift=-1.2cm]{解码过程};
-\node(process_5_2)[process, below of = process_4_2, yshift=-1.2cm, fill=red!25]{解码过程};
+\node(process_5_1)[process, below of = process_4_1, yshift=-1.2cm]{翻译过程};
+\node(process_5_2)[process, below of = process_4_2, yshift=-1.2cm, fill=red!25]{翻译过程};
 \node(process_6_1)[state, below of = process_5_1, yshift=-1.2cm, fill=color1!25]{\{$x_i,\hat{y}^1_i$\}};
 \node(process_6_2)[state, below of = process_5_2, yshift=-1.2cm, fill=blue!25]{\{$\hat{x}^1_i,{y_i}$\}};
 \node(process_7_1)[process, below of = process_6_1, yshift=-1.2cm]{\textbf{$M^2_{x\to y}$}};

--- a/Chapter16/chapter16.tex
+++ b/Chapter16/chapter16.tex