Merge branch 'master' into 'mengxia'

Master See merge request !433

7b3fbd9f · 孟霞 · df5f0c2b · 92a5de00
Commit 7b3fbd9f authored Nov 19, 2020 by 孟霞
--- a/Chapter10/Figures/mt-history.png
+++ b/Chapter10/Figures/mt-history.png
--- a/Chapter10/chapter10.tex
+++ b/Chapter10/chapter10.tex
@@ -272,7 +272,7 @@ NMT                     & 21.7          & 18.7           & -13.7      \\
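 \parinterval  The encoder-decoder framework is a typical “representation”-based model. The encoder converts the input word sequence, through some transformation, into a new “representation” that contains all the information in the input sequence. The decoder then converts this “representation” back into the output word sequence. A central problem here is representation learning: how to define a representation of the input word sequence, learn it automatically, and use it to generate the output sequence. In general, different representation learning methods correspond to different machine translation models. For example, in the earliest neural machine translation models, each source-language sentence was represented as a single standalone vector, so the representation was static; with the attention mechanism, the representation of the source-language sentence is dynamic, that is, a different representation is used when translating each target-language word.
-\parinterval  Figure \ref{fig:10-5} shows a simple example of applying the encoder-decoder structure to machine translation. Given the Chinese sentence “我/对/你/感到/满意”, the encoder encodes it into a real-valued vector $(0.2, -1, 6, 5, 0.7, -2)$, which is the “representation” of the source-language sentence. Surprising as it may seem, the neural machine translation model treats this vector as equivalent to the input sequence. The numbers in the vector have no meaning of their own, yet the decoder can extract from them the information contained in the source-language sentence. Some researchers also regard each dimension of the vector as a “feature”, so that the source-language sentence is represented as a combination of multiple “features” that can be learned automatically. Given this “representation” of the source-language sentence, the decoder takes the real-valued vector as input and generates the target-language sentence “I am satisfied with you” word by word.
+\parinterval  Figure \ref{fig:10-5} shows a simple example of applying the encoder-decoder structure to machine translation. Given the Chinese sentence “我/对/你/感到/满意”, the encoder encodes it into a real-valued vector $(0.2, -1, 6, \\ 5, 0.7, -2)$, which is the “representation” of the source-language sentence. Surprising as it may seem, the neural machine translation model treats this vector as equivalent to the input sequence. The numbers in the vector have no meaning of their own, yet the decoder can extract from them the information contained in the source-language sentence. Some researchers also regard each dimension of the vector as a “feature”, so that the source-language sentence is represented as a combination of multiple “features” that can be learned automatically. Given this “representation” of the source-language sentence, the decoder takes the real-valued vector as input and generates the target-language sentence “I am satisfied with you” word by word.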
 %----------------------------------------------
 \begin{figure}[htp]
@@ -966,7 +966,7 @@ L(\mathbi{Y},\widehat{\mathbi{Y}}) = \sum_{j=1}^n L_{\textrm{ce}}(\mathbi{y}_j,\
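 \parinterval The parameters of a neural network are mainly the linear transformation matrices and biases in each layer. They must be initialized before training begins. However, because the network structure of neural machine translation is complex, the loss function is usually non-convex, and different initializations lead to different optimization results. Extensive practice has also shown that neural machine translation models are very sensitive to the initialization method, and high-performing systems often require a specific one.
-\parinterval Because the LSTM is a commonly used model in neural machine translation, the following takes the LSTM (see Section \ref{sec:lstm-cell}) as an example to introduce initialization methods for machine translation models; these methods also extend to structures such as the GRU. The details are as follows:
+\parinterval Because the LSTM is a commonly used model in neural machine translation, the following takes the LSTM (see Section \ref{sec:lstm-cell}\\ ) as an example to introduce initialization methods for machine translation models; these methods also extend to structures such as the GRU. The details are as follows: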
 \begin{itemize}
 \vspace{0.5em}
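
The static versus dynamic distinction drawn in the Chapter10 paragraph above can be illustrated with a small sketch. It is not taken from the book: the function names are hypothetical, averaging the encoder states stands in for whatever fixed vector an early model would produce, and dot-product attention is only one possible way to obtain a per-word representation.

```python
import math


def static_representation(states):
    """Collapse all encoder states into one fixed vector (here: their mean).

    Assumption: averaging is a stand-in for the single sentence vector of
    early models; the same vector is reused for every target word.
    """
    n = len(states)
    return [sum(column) / n for column in zip(*states)]


def dynamic_representation(states, query):
    """Dot-product attention: a query-dependent weighted sum of encoder states.

    `query` plays the role of the decoder state for the current target word,
    so each target word can receive a different source representation.
    """
    scores = [sum(s * q for s, q in zip(state, query)) for state in states]
    peak = max(scores)  # subtract the maximum for numerical stability
    weights = [math.exp(score - peak) for score in scores]
    total = sum(weights)
    weights = [w / total for w in weights]
    dim = len(states[0])
    return [sum(w * state[d] for w, state in zip(weights, states)) for d in range(dim)]
```

With two encoder states `[[1.0, 0.0], [0.0, 1.0]]`, the static vector is the same for every target word, while a query close to the first state pulls the dynamic representation toward that state.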

--- a/Chapter11/Figures/figure-convolution-kernel.tex
+++ b/Chapter11/Figures/figure-convolution-kernel.tex
@@ -51,9 +51,9 @@
 %\node[minimum width = 1.8cm] (sub) at ([xshift=-5.5cm,yshift=2cm]num9_9.east) {};
-\draw[decorate,decoration={brace,mirror,raise=0pt,amplitude=0.3cm},black,thick] ([yshift=0.4cm,xshift=-0.1cm]num1_1.west) -- node[att,xshift=-0.5cm]{$q$} ([yshift=-0.4cm,xshift=-0.1cm]num3_3.west);
+\draw[decorate,decoration={brace,mirror,raise=0pt,amplitude=0.3cm},black,thick] ([yshift=0.4cm,xshift=-0.1cm]num1_1.west) -- node[att,xshift=-0.5cm]{$Q$} ([yshift=-0.4cm,xshift=-0.1cm]num3_3.west);
-\draw[decorate,decoration={brace,raise=0pt,amplitude=0.3cm},black,thick] ([xshift=-0.4cm,yshift=0.1cm]num1.north) -- node[att,yshift=0.5cm]{$k$}([xshift=0.4cm,yshift=0.1cm]num7.north);
+\draw[decorate,decoration={brace,raise=0pt,amplitude=0.3cm},black,thick] ([xshift=-0.4cm,yshift=0.1cm]num1.north) -- node[att,yshift=0.5cm]{$U$}([xshift=0.4cm,yshift=0.1cm]num7.north);
-\draw[decorate,decoration={brace,mirror,raise=0pt,amplitude=0.3cm},black,thick] ([xshift=0.5cm,yshift=0.00cm]num9_9.south) -- node[att,xshift=0.5cm,yshift=-0.3cm]{$o$}([xshift=0.5cm,yshift=0.00cm]num9.south);
+\draw[decorate,decoration={brace,mirror,raise=0pt,amplitude=0.3cm},black,thick] ([xshift=0.5cm,yshift=0.00cm]num9_9.south) -- node[att,xshift=0.5cm,yshift=-0.3cm]{$O$}([xshift=0.5cm,yshift=0.00cm]num9.south);
 \end{tikzpicture}
\ No newline at end of file
--- a/Chapter11/Figures/figure-deep-vs-light.tex
+++ b/Chapter11/Figures/figure-deep-vs-light.tex
@@ -21,10 +21,10 @@
 	\draw[line width=0.9pt, gray!80, -latex] (l\point_3.east) -- (r2_3.west);
 	}
-	\node[vuale] at (-1.5em, 1.9em) {$x_2$};
+	\node[vuale] at (-1.5em, 1.9em) {$\mathbi{x}_2$};
-	\node[vuale] at (-1.5em, 9.9em) {$x_1$};
+	\node[vuale] at (-1.5em, 9.9em) {$\mathbi{x}_1$};
-	\node[vuale] at (6.5em, 1.9em) {$y_1$};
+	\node[vuale] at (6.5em, 1.9em) {$\mathbi{z}_2$};
-	\node[vuale] at (6.5em, 9.9em) {$y_2$};
+	\node[vuale] at (6.5em, 9.9em) {$\mathbi{z}_1$};
 	\node (t2) at (2.5em, -1em) {\large{$\cdots$}};
 	\node [anchor=north,font=\tiny] at ([yshift=-0.2em]t2.south) {depthwise convolution};
@@ -46,10 +46,10 @@
 	\draw[line width=0.9pt, cyan!80, -latex] (l\point_3.east) -- (r2_3.west);
 	}
-	\node[vuale] at (-1.5em, 1.9em) {$x_2$};
+	\node[vuale] at (-1.5em, 1.9em) {$\mathbi{x}_2$};
-	\node[vuale] at (-1.5em, 9.9em) {$x_1$};
+	\node[vuale] at (-1.5em, 9.9em) {$\mathbi{x}_1$};
-	\node[vuale] at (6.5em, 1.9em) {$y_1$};
+	\node[vuale] at (6.5em, 1.9em) {$\mathbi{z}_2$};
-	\node[vuale] at (6.5em, 9.9em) {$y_2$};
+	\node[vuale] at (6.5em, 9.9em) {$\mathbi{z}_1$};
 	\node (t2) at (2.5em, -1em) {\large{$\cdots$}};
 	\node [anchor=north,font=\tiny] at ([yshift=-0.2em]t2.south) {lightweight convolution};

--- a/Chapter11/Figures/figure-single-glu.tex
+++ b/Chapter11/Figures/figure-single-glu.tex
@@ -63,9 +63,9 @@ $\otimes$: & element-wise multiplication \\
 	\draw[-latex,thick] (b.east) -- (c2.west);
 	\draw[-latex,thick] (c2.east) -- ([xshift=0.4cm]c2.east); 
-	\node[inner sep=0pt, font=\tiny] at (0.75cm, -0.4cm) {$X$};
+	\node[inner sep=0pt, font=\tiny] at (0.75cm, -0.4cm) {$\mathbi{X}$};
-	\node[inner sep=0pt, font=\tiny] at ([yshift=-0.4cm]a.south) {$B=X * V + c$};
+	\node[inner sep=0pt, font=\tiny] at ([yshift=-0.4cm]a.south) {$\mathbi{B}=\mathbi{X} * \mathbi{V} + \mathbi{b}_{\mathbi{V}}$};
-	\node[inner sep=0pt, font=\tiny] at ([yshift=-0.4cm]b.south) {$A=X * W + b$};
+	\node[inner sep=0pt, font=\tiny] at ([yshift=-0.4cm]b.south) {$\mathbi{A}=\mathbi{X} * \mathbi{W} + \mathbi{b}_{\mathbi{W}}$};
-	\node[inner sep=0pt, font=\tiny] at (8.5cm, -0.4cm) {$Y=A \otimes \sigma(B)$};
+	\node[inner sep=0pt, font=\tiny] at (8.5cm, -0.4cm) {$\mathbi{Y}=\mathbi{A} \otimes \sigma(\mathbi{B})$};
 \end{tikzpicture}
\ No newline at end of file
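
The labels in this figure describe a gated linear unit: two convolutions of the same input, one of which is passed through a sigmoid and used to gate the other element by element. The following is a minimal pure-Python sketch of that computation, not the book's implementation. The function names, the unpadded ("valid") convolution, the argument layout, and the pairing of each bias with its own kernel are all assumptions made for illustration.

```python
import math


def conv1d(x, kernel, bias):
    """Valid 1-D convolution over sequence positions.

    x:      T x d_in       (one input vector per position)
    kernel: k x d_in x d_out
    bias:   d_out
    Returns (T - k + 1) x d_out; no padding is applied (assumption).
    """
    k = len(kernel)
    d_out = len(bias)
    out = []
    for t in range(len(x) - k + 1):
        row = list(bias)
        for i in range(k):
            for c, value in enumerate(x[t + i]):
                for o in range(d_out):
                    row[o] += value * kernel[i][c][o]
        out.append(row)
    return out


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def glu(x, w, b_w, v, b_v):
    """Y = A (element-wise product) sigmoid(B), with A and B two convolutions of x."""
    a = conv1d(x, w, b_w)  # candidate values
    b = conv1d(x, v, b_v)  # gate pre-activations
    return [[ai * sigmoid(bi) for ai, bi in zip(row_a, row_b)]
            for row_a, row_b in zip(a, b)]
```

When the gate kernel and its bias are all zeros, every gate value is 0.5, so the output is simply half of the first convolution; this makes the role of the sigmoid branch easy to check by hand.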
--- a/Chapter11/Figures/figure-standard.tex
+++ b/Chapter11/Figures/figure-standard.tex
@@ -32,12 +32,12 @@
 	\draw[line width=0.5pt, cyan!80, -latex] (l3_\point.east) -- ([xshift=0em,yshift=0.1em]r3_2.west);
 	}
-	\node[vuale] at ([xshift=-0.9em]l1_1.west) {$x_3$};
+	\node[vuale] at ([xshift=-0.9em]l1_1.west) {$\mathbi{x}_3$};
-	\node[vuale] at ([xshift=-0.9em]l2_1.west) {$x_2$};
+	\node[vuale] at ([xshift=-0.9em]l2_1.west) {$\mathbi{x}_2$};
-	\node[vuale] at ([xshift=-0.9em]l3_1.west) {$x_1$};
+	\node[vuale] at ([xshift=-0.9em]l3_1.west) {$\mathbi{x}_1$};
-	\node[vuale] at ([xshift=0.9em]r1_1.east) {$y_3$};
+	\node[vuale] at ([xshift=0.9em]r1_1.east) {$\mathbi{z}_3$};
-	\node[vuale] at ([xshift=0.9em]r2_1.east) {$y_3$};
+	\node[vuale] at ([xshift=0.9em]r2_1.east) {$\mathbi{z}_2$};
-	\node[vuale] at ([xshift=0.9em]r3_1.east) {$y_3$};
+	\node[vuale] at ([xshift=0.9em]r3_1.east) {$\mathbi{z}_1$};
 	\node (t1) at (2.5em, -1em) {\large{$\cdots$}};
 	\node [anchor=north,font=\tiny] at ([yshift=-0.2em]t1.south) {standard convolution};
@@ -66,12 +66,12 @@
 	\draw[line width=0.5pt, cyan!80, -latex] (l\point_2.east) -- (r3_2.west);
 	}
-	\node[vuale] at ([xshift=-0.9em]l1_1.west) {$x_3$};
+	\node[vuale] at ([xshift=-0.9em]l1_1.west) {$\mathbi{x}_3$};
-	\node[vuale] at ([xshift=-0.9em]l2_1.west) {$x_2$};
+	\node[vuale] at ([xshift=-0.9em]l2_1.west) {$\mathbi{x}_2$};
-	\node[vuale] at ([xshift=-0.9em]l3_1.west) {$x_1$};
+	\node[vuale] at ([xshift=-0.9em]l3_1.west) {$\mathbi{x}_1$};
-	\node[vuale] at ([xshift=0.9em]r1_1.east) {$y_3$};
+	\node[vuale] at ([xshift=0.9em]r1_1.east) {$\mathbi{z}_3$};
-	\node[vuale] at ([xshift=0.9em]r2_1.east) {$y_3$};
+	\node[vuale] at ([xshift=0.9em]r2_1.east) {$\mathbi{z}_2$};
-	\node[vuale] at ([xshift=0.9em]r3_1.east) {$y_3$};
+	\node[vuale] at ([xshift=0.9em]r3_1.east) {$\mathbi{z}_1$};
 	\node (t2) at (2.5em, -1em) {\large{$\cdots$}};
 	\node [anchor=north,font=\tiny] at ([yshift=-0.2em]t2.south) {depthwise convolution};
@@ -102,12 +102,12 @@
 	\draw[line width=0.5pt, cyan!80, -latex] (l3_\point.east) -- (r3_2.west);
 	}
-	\node[vuale] at ([xshift=-0.9em]l1_1.west) {$x_3$};
+	\node[vuale] at ([xshift=-0.9em]l1_1.west) {$\mathbi{x}_3$};
-	\node[vuale] at ([xshift=-0.9em]l2_1.west) {$x_2$};
+	\node[vuale] at ([xshift=-0.9em]l2_1.west) {$\mathbi{x}_2$};
-	\node[vuale] at ([xshift=-0.9em]l3_1.west) {$x_1$};
+	\node[vuale] at ([xshift=-0.9em]l3_1.west) {$\mathbi{x}_1$};
-	\node[vuale] at ([xshift=0.9em]r1_1.east) {$y_3$};
+	\node[vuale] at ([xshift=0.9em]r1_1.east) {$\mathbi{z}_3$};
-	\node[vuale] at ([xshift=0.9em]r2_1.east) {$y_3$};
+	\node[vuale] at ([xshift=0.9em]r2_1.east) {$\mathbi{z}_2$};
-	\node[vuale] at ([xshift=0.9em]r3_1.east) {$y_3$};
+	\node[vuale] at ([xshift=0.9em]r3_1.east) {$\mathbi{z}_1$};
 	\node (t3) at (2.5em, -1em) {\large{$\cdots$}};
 	\node [anchor=north,font=\tiny] at ([yshift=-0.2em]t3.south) {pointwise convolution};

--- a/Chapter11/Figures/figure-structural-comparison-a.tex
+++ b/Chapter11/Figures/figure-structural-comparison-a.tex
@@ -12,8 +12,8 @@
 \node(num7)[num,right of = num6,xshift = 1.2cm]{\textcolor{blue!70}{$\mathbi{e}_7$}};
 \node(num8)[num,right of = num7,xshift = 1.2cm]{\textcolor{blue!70}{$\mathbi{e}_8$}};
 \node(num9)[num,right of = num8,xshift = 1.2cm]{$\mathbi{e}_9$};
-\node(A)[below of = num2,yshift = -0.6cm]{A};
+%\node(A)[below of = num2,yshift = -0.6cm]{A};
-\node(B)[below of = num8,yshift = -0.6cm]{B};
+%\node(B)[below of = num8,yshift = -0.6cm]{B};
 \draw [->, thick, color = blue!80](num2.east)--(num3.west);
@@ -23,5 +23,8 @@
 \draw [->, thick, color = blue!80](num6.east)--(num7.west);
 \draw [->, thick, color = blue!80](num7.east)--(num8.west);
+\draw [->,thick,color = black!70] (num1) -- (num2);
+\draw [->,thick,color =black!70] (num8) -- (num9);
 \end{tikzpicture}
\ No newline at end of file
--- a/Chapter11/Figures/figure-structural-comparison-b.tex
+++ b/Chapter11/Figures/figure-structural-comparison-b.tex
@@ -13,8 +13,8 @@
 \node(num1_8)[num,right of = num1_7,xshift = 1.2cm]{\textcolor{blue!70}{$\mathbi{e}_8$}};
 \node(num1_9)[num,right of = num1_8,xshift = 1.2cm]{$\mathbi{e}_9$};
 \node(num1_10)[num,right of = num1_9,xshift = 1.2cm, fill = blue!40]{$\mathbi{0}$};
-\node(A)[below of = num2,yshift = -0.6cm]{A};
+%\node(A)[below of = num2,yshift = -0.6cm]{A};
-\node(B)[below of = num8,yshift = -0.6cm]{B};
+%\node(B)[below of = num8,yshift = -0.6cm]{B};
 \node(num2_0)[num,above of = num1_0,yshift = 1.2cm, fill = blue!40]{\textcolor{white}{$\mathbi{0}$}};
 \node(num2_1)[num,right of = num2_0,xshift = 1.2cm]{\textbf2};

--- a/Chapter11/chapter11.tex
+++ b/Chapter11/chapter11.tex
--- a/Chapter16/Figures/figure-contrast-of-traditional-machine-learning&transfer-learning.jpg
+++ b/Chapter16/Figures/figure-contrast-of-traditional-machine-learning&transfer-learning.jpg
--- a/Chapter16/Figures/figure-knowledge-distillation-based-translation-process.jpg
+++ b/Chapter16/Figures/figure-knowledge-distillation-based-translation-process.jpg
--- a/Chapter16/Figures/figure-parameter-initialization-method-diagram.jpg
+++ b/Chapter16/Figures/figure-parameter-initialization-method-diagram.jpg
--- a/Chapter16/Figures/figure-pivot-based-translation-process.jpg
+++ b/Chapter16/Figures/figure-pivot-based-translation-process.jpg
--- a/Chapter16/Figures/figure-three-common-methods-of-adding-noise.tex
+++ b/Chapter16/Figures/figure-three-common-methods-of-adding-noise.tex
@@ -138,7 +138,7 @@
 \node [anchor=south,inner sep=2pt,minimum height=1.5em,minimum width=3.0em] (c10) at (c11.north) {\scriptsize{source language}};
 \node [anchor=south,inner sep=2pt,minimum height=1.5em,minimum width=3.0em] (c30) at (c31.north) {\small{$n$=3}};
-\node [anchor=south,inner sep=2pt,minimum height=1.5em,minimum width=3.0em] (c50) at (c51.north) {\small{$S$}};
+\node [anchor=south,inner sep=2pt,minimum height=1.5em,minimum width=3.0em] (c50) at (c51.north) {\small{$\mathbi{S}$}};
 \node [anchor=south,inner sep=2pt] (c60) at (c61.north) {\scriptsize{sort}};
 \node [anchor=south,inner sep=2pt] (c60-2) at (c60.north) {\scriptsize{ascending}};

--- a/Chapter16/Figures/multi-language-single-model-system-diagram.jpg
+++ b/Chapter16/Figures/multi-language-single-model-system-diagram.jpg
--- a/Chapter16/chapter16.tex
+++ b/Chapter16/chapter16.tex
--- a/Chapter9/chapter9.tex
+++ b/Chapter9/chapter9.tex
@@ -2166,6 +2166,6 @@ Jobs was the CEO of {\red{\underline{apple}}}.
 \vspace{0.5em}
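 \item To further improve neural language models, besides improving the model itself, one can introduce new structures or other useful information into the model, and there is much representative work in this area worth noting. Examples include introducing word features beyond word embeddings into neural language models, such as linguistic features (morphological, syntactic, and semantic features)\upcite{Wu2012FactoredLM,Adel2015SyntacticAS}, contextual information\upcite{mikolov2012context,Wang2015LargerContextLM}, and external knowledge such as knowledge graphs\upcite{Ahn2016ANK}; or introducing character-level information, fed into the model either on its own as character features\upcite{Kim2016CharacterAwareNL,Hwang2017CharacterlevelLM} or together with word features\upcite{Onoe2016GatedWR,Verwimp2017CharacterWordLL}. Introducing bidirectional models into neural language models is also a very effective approach, since word prediction can then use textual information from both the past and the future\upcite{Graves2013HybridSR,bahdanau2014neural,Peters2018DeepCW}.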
 \vspace{0.5em}
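-\item Word embedding is an important advance in natural language processing in recent years. “Embedding” refers to a class of methods: in principle, any process that produces a distributed representation of an object can be regarded as “embedding” in a broad sense. Representation learning based on this idea has also become a frontier method in natural language processing. For example, how to build distributed representations of tree structures, and even graph structures, has become an important approach to analyzing natural language\upcite{DBLP:journals/corr/abs-1809-01854,Yin2018StructVAETL,Aharoni2017TowardsSN,Bastings2017GraphCE,KoncelKedziorski2019TextGF}. In addition, besides language modeling, there are many other ways to learn word embeddings, such as SENNA\upcite{collobert2011natural}, word2vec\upcite{DBLP:journals/corr/abs-1301-3781,mikolov2013distributed}, GloVe\upcite{DBLP:conf/emnlp/PenningtonSM14}, and CoVe\upcite{mccann2017learned}.
+\item Word embedding is an important advance in natural language processing in recent years. “Embedding” refers to a class of methods: in principle, any process that produces a distributed representation of an object can be regarded as “embedding” in a broad sense. Representation learning based on this idea has also become a frontier method in natural language processing. For example, how to build distributed representations of tree structures, and even graph structures, has become an important approach to analyzing natural language\upcite{DBLP:journals/corr/abs-1809-01854,Yin2018StructVAETL,Aharoni2017TowardsSN,Bastings2017GraphCE,KoncelKedziorski2019TextGF}. In addition, besides language modeling, there are many other ways to learn word embeddings, such as SENNA\upcite{2011Natural}, word2vec\upcite{DBLP:journals/corr/abs-1301-3781,mikolov2013distributed}, GloVe\upcite{DBLP:conf/emnlp/PenningtonSM14}, and CoVe\upcite{mccann2017learned}.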
 \vspace{0.5em}
 \end{itemize}
--- a/bibliography.bib
+++ b/bibliography.bib
--- a/mt-book-xelatex.tex
+++ b/mt-book-xelatex.tex
@@ -139,14 +139,14 @@
 %\include{Chapter6/chapter6}
 %\include{Chapter7/chapter7}
 %\include{Chapter8/chapter8}
-\include{Chapter9/chapter9}
+%\include{Chapter9/chapter9}
-\include{Chapter10/chapter10}
+%\include{Chapter10/chapter10}
-\include{Chapter11/chapter11}
+%\include{Chapter11/chapter11}
-\include{Chapter12/chapter12}
+%\include{Chapter12/chapter12}
 %\include{Chapter13/chapter13}
 %\include{Chapter14/chapter14}
 %\include{Chapter15/chapter15}
-%\include{Chapter16/chapter16}
+\include{Chapter16/chapter16}
 %\include{Chapter17/chapter17}
 %\include{Chapter18/chapter18}
 %\include{ChapterAppend/chapterappend}