Merge branch 'caorunzhe' into 'master'

Caorunzhe. See merge request !415

Commit ed12a0aa authored Nov 16, 2020 by 曹润柘
--- a/Chapter10/Figures/mt-history.png
+++ b/Chapter10/Figures/mt-history.png
--- a/Chapter10/chapter10.tex
+++ b/Chapter10/chapter10.tex
@@ -272,7 +272,7 @@ NMT                     & 21.7          & 18.7           & -13.7      \\
 \parinterval  编码器-解码器框架是一种典型的基于“表示”的模型。编码器的作用是将输入的文字序列通过某种转换变为一种新的“表示”形式，这种“表示”包含了输入序列的所有信息。之后，解码器把这种“表示”重新转换为输出的文字序列。这其中的一个核心问题是表示学习，即：如何定义对输入文字序列的表示形式，并自动学习这种表示，同时应用它生成输出序列。一般来说，不同的表示学习方法可以对应不同的机器翻译模型，比如，在最初的神经机器翻译模型中，源语言句子都被表示为一个独立的向量，这时表示结果是静态的；而在注意力机制中，源语言句子的表示是动态的，也就是翻译目标语言的每个单词时都会使用不同的表示结果。
-\parinterval  图\ref{fig:10-5}是一个应用编码器-解码器结构来解决机器翻译问题的简单实例。给定一个中文句子“我/对/你/感到/满意”，编码器会将这句话编码成一个实数向量$(0.2, -1, 6, 5, 0.7, -2)$，这个向量就是源语言句子的“表示”结果。虽然有些不可思议，但是神经机器翻译模型把这个向量等同于输入序列。向量中的数字并没有实际的意义，然而解码器却能从中提取到源语言句子中所包含的信息。也有研究者把向量的每一个维度看作是一个“特征”，这样源语言句子就被表示成多个“特征”的联合，而且这些特征可以被自动学习。有了这样的源语言句子的“表示”，解码器可以把这个实数向量作为输入，然后逐词生成目标语言句子“I am satisfied with you”。
+\parinterval  图\ref{fig:10-5}是一个应用编码器-解码器结构来解决机器翻译问题的简单实例。给定一个中文句子“我/对/你/感到/满意”，编码器会将这句话编码成一个实数向量$(0.2, -1, 6, \\ 5, 0.7, -2)$，这个向量就是源语言句子的“表示”结果。虽然有些不可思议，但是神经机器翻译模型把这个向量等同于输入序列。向量中的数字并没有实际的意义，然而解码器却能从中提取到源语言句子中所包含的信息。也有研究者把向量的每一个维度看作是一个“特征”，这样源语言句子就被表示成多个“特征”的联合，而且这些特征可以被自动学习。有了这样的源语言句子的“表示”，解码器可以把这个实数向量作为输入，然后逐词生成目标语言句子“I am satisfied with you”。
 %----------------------------------------------
 \begin{figure}[htp]
@@ -966,7 +966,7 @@ L(\mathbi{Y},\widehat{\mathbi{Y}}) = \sum_{j=1}^n L_{\textrm{ce}}(\mathbi{y}_j,\
 \parinterval 神经网络的参数主要是各层中的线性变换矩阵和偏置。在训练开始时，需要对参数进行初始化。但是，由于神经机器翻译的网络结构复杂，因此损失函数往往不是凸函数，不同初始化会导致不同的优化结果。而且在大量实践中已经发现，神经机器翻译模型对初始化方式非常敏感，性能优异的系统往往需要特定的初始化方式。
-\parinterval 因为LSTM是神经机器翻译中常用的一种模型，下面以LSTM模型为例（见\ref{sec:lstm-cell}节），介绍机器翻译模型的初始化方法，这些方法也可以推广到GRU等结构。具体内容如下：
+\parinterval 因为LSTM是神经机器翻译中常用的一种模型，下面以LSTM模型为例（见\ref{sec:lstm-cell}\\ 节），介绍机器翻译模型的初始化方法，这些方法也可以推广到GRU等结构。具体内容如下：
 \begin{itemize}
 \vspace{0.5em}

--- a/Chapter11/Figures/figure-convolution-kernel.tex
+++ b/Chapter11/Figures/figure-convolution-kernel.tex
@@ -51,9 +51,9 @@
 %\node[minimum width = 1.8cm] (sub) at ([xshift=-5.5cm,yshift=2cm]num9_9.east) {};
-\draw[decorate,decoration={brace,mirror,raise=0pt,amplitude=0.3cm},black,thick] ([yshift=0.4cm,xshift=-0.1cm]num1_1.west) -- node[att,xshift=-0.5cm]{$q$} ([yshift=-0.4cm,xshift=-0.1cm]num3_3.west);
+\draw[decorate,decoration={brace,mirror,raise=0pt,amplitude=0.3cm},black,thick] ([yshift=0.4cm,xshift=-0.1cm]num1_1.west) -- node[att,xshift=-0.5cm]{$Q$} ([yshift=-0.4cm,xshift=-0.1cm]num3_3.west);
-\draw[decorate,decoration={brace,raise=0pt,amplitude=0.3cm},black,thick] ([xshift=-0.4cm,yshift=0.1cm]num1.north) -- node[att,yshift=0.5cm]{$k$}([xshift=0.4cm,yshift=0.1cm]num7.north);
+\draw[decorate,decoration={brace,raise=0pt,amplitude=0.3cm},black,thick] ([xshift=-0.4cm,yshift=0.1cm]num1.north) -- node[att,yshift=0.5cm]{$K$}([xshift=0.4cm,yshift=0.1cm]num7.north);
-\draw[decorate,decoration={brace,mirror,raise=0pt,amplitude=0.3cm},black,thick] ([xshift=0.5cm,yshift=0.00cm]num9_9.south) -- node[att,xshift=0.5cm,yshift=-0.3cm]{$o$}([xshift=0.5cm,yshift=0.00cm]num9.south);
+\draw[decorate,decoration={brace,mirror,raise=0pt,amplitude=0.3cm},black,thick] ([xshift=0.5cm,yshift=0.00cm]num9_9.south) -- node[att,xshift=0.5cm,yshift=-0.3cm]{$O$}([xshift=0.5cm,yshift=0.00cm]num9.south);
 \end{tikzpicture}
\ No newline at end of file
--- a/Chapter11/Figures/figure-single-glu.tex
+++ b/Chapter11/Figures/figure-single-glu.tex
@@ -63,9 +63,9 @@ $\otimes$： & 按位乘运算 \\
 	\draw[-latex,thick] (b.east) -- (c2.west);
 	\draw[-latex,thick] (c2.east) -- ([xshift=0.4cm]c2.east); 
-	\node[inner sep=0pt, font=\tiny] at (0.75cm, -0.4cm) {$X$};
+	\node[inner sep=0pt, font=\tiny] at (0.75cm, -0.4cm) {$\mathbi{X}$};
-	\node[inner sep=0pt, font=\tiny] at ([yshift=-0.4cm]a.south) {$B=X * V + c$};
+	\node[inner sep=0pt, font=\tiny] at ([yshift=-0.4cm]a.south) {$\mathbi{B}=\mathbi{X} * \mathbi{V} + \mathbi{b}_{\mathbi{V}}$};
-	\node[inner sep=0pt, font=\tiny] at ([yshift=-0.4cm]b.south) {$A=X * W + b$};
+	\node[inner sep=0pt, font=\tiny] at ([yshift=-0.4cm]b.south) {$\mathbi{A}=\mathbi{X} * \mathbi{W} + \mathbi{b}_{\mathbi{W}}$};
-	\node[inner sep=0pt, font=\tiny] at (8.5cm, -0.4cm) {$Y=A \otimes \sigma(B)$};
+	\node[inner sep=0pt, font=\tiny] at (8.5cm, -0.4cm) {$\mathbi{Y}=\mathbi{A} \otimes \sigma(\mathbi{B})$};
 \end{tikzpicture}
\ No newline at end of file
--- a/Chapter11/Figures/figure-structural-comparison-a.tex
+++ b/Chapter11/Figures/figure-structural-comparison-a.tex
@@ -12,8 +12,8 @@
 \node(num7)[num,right of = num6,xshift = 1.2cm]{\textcolor{blue!70}{$\mathbi{e}_7$}};
 \node(num8)[num,right of = num7,xshift = 1.2cm]{\textcolor{blue!70}{$\mathbi{e}_8$}};
 \node(num9)[num,right of = num8,xshift = 1.2cm]{$\mathbi{e}_9$};
-\node(A)[below of = num2,yshift = -0.6cm]{A};
+%\node(A)[below of = num2,yshift = -0.6cm]{A};
-\node(B)[below of = num8,yshift = -0.6cm]{B};
+%\node(B)[below of = num8,yshift = -0.6cm]{B};
 \draw [->, thick, color = blue!80](num2.east)--(num3.west);
@@ -23,5 +23,8 @@
 \draw [->, thick, color = blue!80](num6.east)--(num7.west);
 \draw [->, thick, color = blue!80](num7.east)--(num8.west);
+\draw [->,thick,color = black!70] (num1) -- (num2);
+\draw [->,thick,color =black!70] (num8) -- (num9);
 \end{tikzpicture}
\ No newline at end of file
--- a/Chapter11/Figures/figure-structural-comparison-b.tex
+++ b/Chapter11/Figures/figure-structural-comparison-b.tex
@@ -13,8 +13,8 @@
 \node(num1_8)[num,right of = num1_7,xshift = 1.2cm]{\textcolor{blue!70}{$\mathbi{e}_8$}};
 \node(num1_9)[num,right of = num1_8,xshift = 1.2cm]{$\mathbi{e}_9$};
 \node(num1_10)[num,right of = num1_9,xshift = 1.2cm, fill = blue!40]{$\mathbi{0}$};
-\node(A)[below of = num2,yshift = -0.6cm]{A};
+%\node(A)[below of = num2,yshift = -0.6cm]{A};
-\node(B)[below of = num8,yshift = -0.6cm]{B};
+%\node(B)[below of = num8,yshift = -0.6cm]{B};
 \node(num2_0)[num,above of = num1_0,yshift = 1.2cm, fill = blue!40]{\textcolor{white}{$\mathbi{0}$}};
 \node(num2_1)[num,right of = num2_0,xshift = 1.2cm]{\textbf2};

--- a/Chapter11/chapter11.tex
+++ b/Chapter11/chapter11.tex