Merge branch 'master' into 'mengxia'

Master: see merge request !194

7fce7e1a · 孟霞 · f7aa1b25 · 57dc6137
Commit 7fce7e1a authored May 16, 2020 by 孟霞
--- a/Book/Chapter1/chapter1.tex
+++ b/Book/Chapter1/chapter1.tex
@@ -2,6 +2,14 @@
 % !TEX encoding = UTF-8 Unicode
 %----------------------------------------------------------------------------------------
+% 机器翻译：统计建模与深度学习方法
+% Machine Translation: Statistical Modeling and Deep Learning Methods
+%
+% Copyright 2020
+% 肖桐(xiaotong@mail.neu.edu.cn) 朱靖波 (zhujingbo@mail.neu.edu.cn)
+%----------------------------------------------------------------------------------------
+%----------------------------------------------------------------------------------------
 %    CONFIGURATIONS
 %----------------------------------------------------------------------------------------

--- a/Book/Chapter2/chapter2.tex
+++ b/Book/Chapter2/chapter2.tex
@@ -2,6 +2,14 @@
 % !TEX encoding = UTF-8 Unicode
 %----------------------------------------------------------------------------------------
+% 机器翻译：统计建模与深度学习方法
+% Machine Translation: Statistical Modeling and Deep Learning Methods
+%
+% Copyright 2020
+% 肖桐(xiaotong@mail.neu.edu.cn) 朱靖波 (zhujingbo@mail.neu.edu.cn)
+%----------------------------------------------------------------------------------------
+%----------------------------------------------------------------------------------------
 %    CONFIGURATIONS
 %----------------------------------------------------------------------------------------
@@ -939,9 +947,9 @@ I cannot see without my reading \underline{\ \ \ \ \ \ \ \ }
 \end{eqnarray}
 \begin{eqnarray}
 c_{\textrm{KN}}(\cdot) = \left\{\begin{array}{ll}
-\textrm{count}(\cdot) & \textrm{for\ highest\ order}  \\ 
+\textrm{count}(\cdot) & \textrm{for\ highest\ order}  \\
-\textrm{catcount}(\cdot) & \textrm{for\ lower\ order} 
+\textrm{catcount}(\cdot) & \textrm{for\ lower\ order}
-\end{array}\right. 
+\end{array}\right.
 \label{eq:2-41}
 \end{eqnarray}
 \noindent 其中catcount$(\cdot)$表示的是基于某个单个词作为第$n$个词的$n$-gram的种类数目。
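
Review note: the context line above defines catcount as the number of distinct n-gram types in which a given word is the n-th word. A minimal sketch of that count follows. The function name `continuation_counts` and the toy bigram list are assumptions made for illustration and do not come from the book's sources.

```python
from collections import defaultdict


def continuation_counts(ngrams):
    """Map each word to the number of distinct n-gram types that end in it."""
    contexts = defaultdict(set)
    for ngram in ngrams:
        *history, word = ngram
        # A set keeps each history once, so repeated tokens count as one type.
        contexts[word].add(tuple(history))
    return {word: len(histories) for word, histories in contexts.items()}


# Hypothetical bigram tokens; ("reading", "glasses") occurs twice but is one type.
bigrams = [
    ("reading", "glasses"),
    ("my", "glasses"),
    ("reading", "glasses"),
    ("san", "francisco"),
    ("san", "francisco"),
]
counts = continuation_counts(bigrams)
```

In this toy data "glasses" follows two different words while "francisco" follows only one, so the lower-order count favors "glasses" even though both appear as tokens more than once.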

--- a/Book/Chapter3/Chapter3.tex
+++ b/Book/Chapter3/Chapter3.tex
@@ -2,6 +2,14 @@
 % !TEX encoding = UTF-8 Unicode
 %----------------------------------------------------------------------------------------
+% 机器翻译：统计建模与深度学习方法
+% Machine Translation: Statistical Modeling and Deep Learning Methods
+%
+% Copyright 2020
+% 肖桐(xiaotong@mail.neu.edu.cn) 朱靖波 (zhujingbo@mail.neu.edu.cn)
+%----------------------------------------------------------------------------------------
+%----------------------------------------------------------------------------------------
 %    CONFIGURATIONS
 %----------------------------------------------------------------------------------------

--- a/Book/Chapter4/Figures/cky-algorithm.tex
+++ b/Book/Chapter4/Figures/cky-algorithm.tex
@@ -9,7 +9,7 @@
 \begin{scope}[scale=0.85]
 \node[srcnode] (c1) at (0,0) {\normalsize{\textbf{Function} CKY-Algorithm($\textbf{s},G$)}};
-\node[srcnode,anchor=north west] (c21) at ([xshift=1.5em,yshift=0.4em]c1.south west) {\normalsize{\textbf{fore} $j=0$ to $ J - 1$}};
+\node[srcnode,anchor=north west] (c21) at ([xshift=1.5em,yshift=0.4em]c1.south west) {\normalsize{\textbf{for} $j=0$ to $ J - 1$}};
 \node[srcnode,anchor=north west] (c22) at ([xshift=1.5em,yshift=0.4em]c21.south west) {\normalsize{$span[j,j+1 ]$.Add($A \to a \in G$)}};
 \node[srcnode,anchor=north west] (c3) at ([xshift=-1.5em,yshift=0.4em]c22.south west) {\normalsize{\textbf{for} $l$ = 1 to $J$}};
 \node[srcnode,anchor=west] (c31) at ([xshift=6em]c3.east) {\normalsize{// length of span}};
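
Review note: the hunk shows only the first lines of the CKY pseudocode (initialise the length-1 spans, then loop over span lengths). The sketch below fills in the remainder with the textbook CKY recognizer for a grammar in Chomsky normal form; it is an assumption about what the rest of the figure contains, not a copy of it. The rule encoding (`lexical_rules`, `binary_rules`) and the toy grammar are hypothetical.

```python
from collections import defaultdict


def cky_recognize(words, lexical_rules, binary_rules, start="S"):
    """Return True if `start` derives `words` under a CNF grammar.

    lexical_rules: list of (A, a) pairs for rules A -> a
    binary_rules:  list of (A, (B, C)) pairs for rules A -> B C
    """
    J = len(words)
    span = defaultdict(set)
    # for j = 0 to J - 1: span[j, j+1].Add(A -> a in G)
    for j in range(J):
        for lhs, word in lexical_rules:
            if word == words[j]:
                span[j, j + 1].add(lhs)
    # for l = 1 to J (length of span); l = 1 has no split point, so it is a no-op.
    for l in range(1, J + 1):
        for j in range(0, J - l + 1):          # start of span
            for k in range(j + 1, j + l):      # split point
                for lhs, (left, right) in binary_rules:
                    if left in span[j, k] and right in span[k, j + l]:
                        span[j, j + l].add(lhs)
    return start in span[0, J]


# Hypothetical toy grammar.
lexical = [("NP", "she"), ("V", "eats"), ("NP", "fish")]
binary = [("S", ("NP", "VP")), ("VP", ("V", "NP"))]
accepted = cky_recognize(["she", "eats", "fish"], lexical, binary)
rejected = cky_recognize(["eats", "she", "fish"], lexical, binary)
```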

--- a/Book/Chapter4/chapter4.tex
+++ b/Book/Chapter4/chapter4.tex
@@ -2,6 +2,14 @@
 % !TEX encoding = UTF-8 Unicode
 %----------------------------------------------------------------------------------------
+% 机器翻译：统计建模与深度学习方法
+% Machine Translation: Statistical Modeling and Deep Learning Methods
+%
+% Copyright 2020
+% 肖桐(xiaotong@mail.neu.edu.cn) 朱靖波 (zhujingbo@mail.neu.edu.cn)
+%----------------------------------------------------------------------------------------
+%----------------------------------------------------------------------------------------
 %    CONFIGURATIONS configurations
 %----------------------------------------------------------------------------------------
 \renewcommand\figurename{图}%将figure改为图
@@ -780,7 +788,7 @@ dr = start_i-end_{i-1}-1
 \subsubsection{翻译候选匹配}
-\parinterval 在解码时，首先要知道每个源语言短语可能的译文都是什么。对于一个源语言短语，每个可能的译文也被称作{\small\bfnew{翻译候选}}\index{翻译候选}（Translation Candidate）\index{Translation Candidate}。实现翻译候选的匹配很简单。只需要遍历输入的源语言句子中所有可能的短语，之后在短语表中找到相应的翻译即可。比如，图\ref{fig:4-27}展示了句子``桌子\ 上\ 有\ 一个\ 苹果''的翻译候选匹配结果。可以看到，不同的短语会对应若干翻译候选。这些翻译候选会保存在所对应的跨度中。比如，``upon the table''是短语``桌子 上 有''的翻译候选，即对应源语言跨度[0,3]。\\ \\ \\ 
+\parinterval 在解码时，首先要知道每个源语言短语可能的译文都是什么。对于一个源语言短语，每个可能的译文也被称作{\small\bfnew{翻译候选}}\index{翻译候选}（Translation Candidate）\index{Translation Candidate}。实现翻译候选的匹配很简单。只需要遍历输入的源语言句子中所有可能的短语，之后在短语表中找到相应的翻译即可。比如，图\ref{fig:4-27}展示了句子``桌子\ 上\ 有\ 一个\ 苹果''的翻译候选匹配结果。可以看到，不同的短语会对应若干翻译候选。这些翻译候选会保存在所对应的跨度中。比如，``upon the table''是短语``桌子 上 有''的翻译候选，即对应源语言跨度[0,3]。\\ \\ \\
 %----------------------------------------------
 \begin{figure}[htp]
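
Review note: the paragraph in this hunk describes translation candidate matching as enumerating every phrase of the source sentence and looking it up in the phrase table, storing the hits under the corresponding span. A minimal sketch follows, assuming the phrase table is a plain dictionary keyed by space-joined source phrases. Only the entry for "桌子 上 有" comes from the paragraph; the other entries and the function name are hypothetical.

```python
def match_translation_candidates(source_words, phrase_table, max_len=None):
    """Return {(start, end): candidates} for every source span found in the table."""
    n = len(source_words)
    max_len = max_len or n
    matches = {}
    for start in range(n):
        for end in range(start + 1, min(n, start + max_len) + 1):
            phrase = " ".join(source_words[start:end])
            if phrase in phrase_table:
                # Spans are half-open: [start, end) in word positions.
                matches[(start, end)] = phrase_table[phrase]
    return matches


phrase_table = {
    "桌子 上 有": ["upon the table"],  # taken from the paragraph above
    "桌子": ["table"],                 # hypothetical entry
    "一个 苹果": ["an apple"],         # hypothetical entry
}
sentence = "桌子 上 有 一个 苹果".split()
candidates = match_translation_candidates(sentence, phrase_table)
```

With these entries, "upon the table" is stored under span (0, 3), matching the [0,3] span mentioned in the paragraph. A real decoder would usually cap `max_len` to keep the number of spans manageable.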

--- a/Book/Chapter5/Figures/fig-broadcast.tex
+++ b/Book/Chapter5/Figures/fig-broadcast.tex
@@ -8,7 +8,7 @@
    \node [fill=orange!20,inner sep=0pt,minimum height=0.49cm,minimum width=0.49cm] at (\x,\y) {$\number\value{mycount1}$};
    \addtocounter{mycount1}{1};
  }
-\node [anchor=south] (varlabel) at (0,0.6) {$\textbf{s}$};
+\node [anchor=south] (varlabel) at (0,0.6) {$\mathbf{s}$};
 \node [anchor=north] (labelc) at (0,-0.7) {\footnotesize{(a)}};
 \end{scope}
@@ -20,7 +20,7 @@
    \node [fill=green!20,inner sep=0pt,minimum height=0.48cm,minimum width=0.48cm] at (\x,\y) {$1$};
    \addtocounter{mycount1}{1};
  }
-\node [anchor=south] (varlabel) at (0,0.1) {$\textbf{b}$};
+\node [anchor=south] (varlabel) at (0,0.1) {$\mathbf{b}$};
 \node [anchor=north] (labelc) at (0,-0.7) {\footnotesize{(b)}};
 \end{scope}
@@ -34,7 +34,7 @@
    \node [fill=orange!20,inner sep=0pt,minimum height=0.49cm,minimum width=0.49cm] at (\x,\y) {$\number\value{mycount1}$};
    \addtocounter{mycount1}{1};
  }
-\node [anchor=south] (varlabel) at (0,0.6) {$\textbf{s}$};
+\node [anchor=south] (varlabel) at (0,0.6) {$\mathbf{s}$};
 \end{scope}
 \begin{scope}[yshift=-1in,xshift=1.5in]
 \setcounter{mycount1}{1}
@@ -49,8 +49,8 @@
    \node [fill=purple!20,inner sep=0pt,minimum height=0.49cm,minimum width=0.49cm] at (\x,\y) {$1$};
    \addtocounter{mycount1}{1};
  }
-\node [anchor=center] (plabel) at (-4.5em,0) {\huge{$\textbf{+}$}};
+\node [anchor=center] (plabel) at (-4.5em,0) {\huge{$\mathbf{+}$}};
-\node [anchor=south] (varlabel) at (0,0.6) {$\textbf{b}$};
+\node [anchor=south] (varlabel) at (0,0.6) {$\mathbf{b}$};
 \node [anchor=north] (labelc) at (0,-0.7) {\footnotesize{(c)}};
 \end{scope}
 \begin{scope}[yshift=-1in,xshift=3in]
@@ -61,8 +61,8 @@
    \node [fill=orange!20,inner sep=0pt,minimum height=0.49cm,minimum width=0.49cm] at (\x,\y) {$\number\value{mycount1}$};
    \addtocounter{mycount1}{1};
  }
-\node [anchor=center] (plabel) at (-4.5em,0) {\huge{$\textbf{=}$}};
+\node [anchor=center] (plabel) at (-4.5em,0) {\huge{$\mathbf{=}$}};
-\node [anchor=south] (varlabel) at (0,0.6) {$\textbf{s+b}$};
+\node [anchor=south] (varlabel) at (0,0.6) {$\mathbf{s}+\mathbf{b}$};
 \end{scope}
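
Review note: the figure being edited illustrates broadcasting, where a vector b is added to every row of s to give s + b. The sketch below shows that operation with plain Python lists. The shapes and values are assumptions, since the hunks do not show the grid sizes used in the figure.

```python
def broadcast_add(matrix, row):
    """Add a 1-D `row` to every row of a 2-D `matrix` (lists of numbers)."""
    if any(len(r) != len(row) for r in matrix):
        raise ValueError("row length must match the matrix width")
    return [[x + y for x, y in zip(r, row)] for r in matrix]


s = [[1, 2, 3, 4], [5, 6, 7, 8]]  # hypothetical 2x4 matrix
b = [1, 1, 1, 1]                  # vector of ones, as in panel (b)
result = broadcast_add(s, b)
```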

--- a/Book/Chapter5/Figures/fig-code-niutensor-one.tex
+++ b/Book/Chapter5/Figures/fig-code-niutensor-one.tex
@@ -59,8 +59,8 @@
 \draw [thick,->] (layer3.north) -- (y.south);
 \node [anchor=west,align=left] (xshape) at (x.east) {\tiny{shape: 3*4*5}};
 \node [anchor=west,align=left] (yshape) at (y.east) {\tiny{shape: 3*4*4}};
-\node [anchor=south west,align=left,inner sep=2pt] (l1shape) at (layer1.north) {\tiny{shape: 3*4*3}};
+\node [anchor=south west,align=left,inner sep=2pt] (l1shape) at ([xshift=0.3em]layer1.north) {\tiny{shape: 3*4*3}};
-\node [anchor=south west,align=left,inner sep=2pt] (l2shape) at (layer2.north) {\tiny{shape: 3*4*6}};
+\node [anchor=south west,align=left,inner sep=2pt] (l2shape) at ([xshift=0.3em]layer2.north) {\tiny{shape: 3*4*6}};
 \end{tikzpicture}
 \end{center}
 \end{tcolorbox}

--- a/Book/Chapter5/Figures/fig-code-tensor-define-2.tex
+++ b/Book/Chapter5/Figures/fig-code-tensor-define-2.tex
 %%%------------------------------------------------------------------------------------------------------------
-\begin{tcolorbox}[enhanced,width=11cm,frame engine=empty,boxrule=0.1mm,size=title,colback=blue!10!white]
+\begin{tcolorbox}[enhanced,width=12cm,frame engine=empty,boxrule=0.1mm,size=title,colback=blue!10!white]
 \begin{flushleft}
 {\scriptsize
 \begin{tabbing}
-\texttt{XTensor tensor;} \hspace{12em} \= // 声明张量tensor \\
+\texttt{XTensor tensor;} \hspace{14em} \= // 声明张量tensor \\
 \texttt{int sizes[6] = \{2,3,4,2,3,4\};} \> // 张量的形状为2*3*4*2*3*4 \\
 \texttt{InitTensor(\&tensor, 6, sizes, X\_FLOAT);} \> // 定义形状为sizes的6阶张量
 \end{tabbing}
@@ -12,11 +12,11 @@
 \end{tcolorbox}
 \hspace{0.1in} \scriptsize{(a) NiuTensor定义张量程序}
 \\
-\begin{tcolorbox}[enhanced,width=11cm,frame engine=empty,boxrule=0.1mm,size=title,colback=blue!10!white]
+\begin{tcolorbox}[enhanced,width=12cm,frame engine=empty,boxrule=0.1mm,size=title,colback=blue!10!white]
 \begin{flushleft}
 {\scriptsize
 \begin{tabbing}
-\texttt{XTensor a, b, c;} \hspace{11.5em} \= // 声明张量tensor \\
+\texttt{XTensor a, b, c;} \hspace{13.5em} \= // 声明张量tensor \\
 \texttt{InitTensor1D(\&a, 10, X\_INT);} \> // 10维的整数型向量\\
 \texttt{InitTensor1D(\&b, 10);} \> // 10维的向量，缺省类型(浮点)\\
 \texttt{InitTensor4D(\&c, 10, 20, 30, 40);} \> // 10*20*30*40的4阶张量(浮点)
@@ -26,11 +26,11 @@
 \end{tcolorbox}
 \hspace{0.1in} \scriptsize{(b) 定义张量的简便方式程序}
 \\
-\begin{tcolorbox}[enhanced,width=11cm,frame engine=empty,boxrule=0.1mm,size=title,colback=blue!10!white]
+\begin{tcolorbox}[enhanced,width=12cm,frame engine=empty,boxrule=0.1mm,size=title,colback=blue!10!white]
 \begin{flushleft}
 {\scriptsize
 \begin{tabbing}
-\texttt{XTensor tensorGPU;} \hspace{10.5em} \= // 声明张量tensor \\
+\texttt{XTensor tensorGPU;} \hspace{12.5em} \= // 声明张量tensor \\
 \texttt{InitTensor2D(\&tensorGPU, 10, 20,} $\backslash$ \> // 在编号为0的GPU上定义张量 \\
 \hspace{6.7em} \texttt{X\_FLOAT, 0);}
 \end{tabbing}

--- a/Book/Chapter5/Figures/fig-code-tensor-define.tex
+++ b/Book/Chapter5/Figures/fig-code-tensor-define.tex
 %------------------------------------------------------------------------------------------------------------
-\begin{tcolorbox}[enhanced,width=11cm,frame engine=empty,boxrule=0.1mm,size=title,colback=blue!10!white]
+\begin{tcolorbox}[enhanced,width=12cm,frame engine=empty,boxrule=0.1mm,size=title,colback=blue!10!white]
 \begin{flushleft}
 {\scriptsize
 \begin{tabbing}
-\texttt{\#include "source/tensor/XTensor.h"} \hspace{4em} \= // 引用XTensor定义的头文件 \\
+\texttt{\#include "source/tensor/XTensor.h"} \hspace{6em} \= // 引用XTensor定义的头文件 \\
 \texttt{using namespace nts;} \> // 引用nts命名空间 \\
 \ \\

--- a/Book/Chapter5/Figures/fig-code-tensor-operation.tex
+++ b/Book/Chapter5/Figures/fig-code-tensor-operation.tex
 %%%------------------------------------------------------------------------------------------------------------
-\begin{tcolorbox}[enhanced,width=11cm,frame engine=empty,boxrule=0.1mm,size=title,colback=blue!10!white]
+\begin{tcolorbox}[enhanced,width=12cm,frame engine=empty,boxrule=0.1mm,size=title,colback=blue!10!white]
 \begin{flushleft}
 {\scriptsize
 \begin{tabbing}
-\texttt{XTensor a, b, c, d, e;} \hspace{7em} \= // 声明张量tensor \\
+\texttt{XTensor a, b, c, d, e;} \hspace{9em} \= // 声明张量tensor \\
 \texttt{InitTensor3D(\&a, 2, 3, 4);} \> // a为2*3*4的3阶张量 \\
 \texttt{InitTensor3D(\&b, 2, 3, 4);} \> // b为2*3*4的3阶张量 \\
 \texttt{InitTensor3D(\&c, 2, 3, 4);} \> // c为2*3*4的3阶张量 \\
@@ -19,11 +19,11 @@
 \end{tcolorbox}
 \hspace{0.1in} \scriptsize{(a) 张量进行1阶运算}
 \\
-\begin{tcolorbox}[enhanced,width=11cm,frame engine=empty,boxrule=0.1mm,size=title,colback=blue!10!white]
+\begin{tcolorbox}[enhanced,width=12cm,frame engine=empty,boxrule=0.1mm,size=title,colback=blue!10!white]
 \begin{flushleft}
 {\scriptsize
 \begin{tabbing}
-\texttt{XTensor a, b, c;} \hspace{10.0em} \= // 声明张量tensor \\
+\texttt{XTensor a, b, c;} \hspace{12.0em} \= // 声明张量tensor \\
 \texttt{InitTensor4D(\&a, 2, 2, 3, 4);} \> // a为2*2*3*4的4阶张量 \\
 \texttt{InitTensor2D(\&b, 4, 5);} \> // b为4*5的矩阵 \\
 \texttt{a.SetDataRand();} \> // 随机初始化a \\

--- a/Book/Chapter5/chapter5.tex
+++ b/Book/Chapter5/chapter5.tex
--- a/Book/Chapter6/Chapter6.tex
+++ b/Book/Chapter6/Chapter6.tex
@@ -2,6 +2,14 @@
 % !TEX encoding = UTF-8 Unicode
 %----------------------------------------------------------------------------------------
+% 机器翻译：统计建模与深度学习方法
+% Machine Translation: Statistical Modeling and Deep Learning Methods
+%
+% Copyright 2020
+% 肖桐(xiaotong@mail.neu.edu.cn) 朱靖波 (zhujingbo@mail.neu.edu.cn)
+%----------------------------------------------------------------------------------------
+%----------------------------------------------------------------------------------------
 %    CONFIGURATIONS
 %----------------------------------------------------------------------------------------

--- a/Book/Chapter7/Chapter7.tex
+++ b/Book/Chapter7/Chapter7.tex
@@ -2,6 +2,14 @@
 % !TEX encoding = UTF-8 Unicode
 %----------------------------------------------------------------------------------------
+% 机器翻译：统计建模与深度学习方法
+% Machine Translation: Statistical Modeling and Deep Learning Methods
+%
+% Copyright 2020
+% 肖桐(xiaotong@mail.neu.edu.cn) 朱靖波 (zhujingbo@mail.neu.edu.cn)
+%----------------------------------------------------------------------------------------
+%----------------------------------------------------------------------------------------
 %    CONFIGURATIONS
 %----------------------------------------------------------------------------------------
@@ -443,7 +451,7 @@ y = f(x)
 \parinterval 正则化的一种实现是在训练目标中引入一个正则项。在神经机器翻译中，引入正则项的训练目标为：
 \begin{eqnarray}
-\hat{\mathbf{w}}=\argmax_{\mathbf{w}}L(\mathbf{w}) + \lambda R(\mathbf{w})
+\widehat{\mathbf{w}}=\argmax_{\mathbf{w}}L(\mathbf{w}) + \lambda R(\mathbf{w})
 \label{eq:7-2}
 \end{eqnarray}
@@ -1854,11 +1862,13 @@ L_{\textrm{seq}} = - \textrm{logP}_{\textrm{s}}(\hat{\mathbf{y}} | \mathbf{x})
 \vspace{0.5em}
 \item 无指导机器翻译。无指导机器翻译由于其不需要双语语料即可训练翻译模型的特性，在稀缺资源机器翻译的场景中有非常大的潜力而得到广泛的关注。目前无指导机器翻译主要有两种范式：第一种先得到词典的翻译，然后得到短语表的翻译和相应的统计机器翻译系统，最后使用统计机器翻译系统生成伪双语平行语料训练神经机器翻译系统\cite{DBLP:conf/acl/ArtetxeLA19}；第二种是先预训练语言模型来初始化神经机器翻译系统的编码器和解码器，然后使用翻译中回译以及降噪自编码器来训练神经机器翻译系统\cite{lample2019cross}。尽管目前无指导机器翻译在富资源的语种上取得了很大进展，但是离实际应用还有很远距离。比如，目前无指导系统都依赖于大量单语数据，而实际上稀缺资源的语种不但双语语料少，单语语料也少；此外，这些系统还无法在远距离如中英这些字母表重合少，需要大范围调序的语种对上取得可接受的结果；使用大量单语训练无指导系统还面临数据来自于不同领域的问题\cite{DBLP:journals/corr/abs-2004-05516}。设计更鲁棒，使用单语数据更高效的无指导机器翻译方法乃至新范式会是未来的趋势。
 \vspace{0.5em}
-\item 更多上下文信息的建模。由于人类语言潜在的歧义性，传统的神经机器翻译在单句翻译中可能会出现歧义。为此，一些研究工作在翻译过程中尝试引入更多的上下文信息，比如多模态翻译、基于树的翻译或者篇章级翻译。多模态翻译的目标就是在给定一个图片和其源语描述的情况下，生成目标语言的描述。一般做法就是通过一个额外的编码器来提取图像特征\cite{DBLP:journals/corr/ElliottFH15,DBLP:conf/acl/HitschlerSR16}，然后通过权重门控机制、注意力网络等融合到系统中\cite{DBLP:conf/wmt/HuangLSOD16}。
+\item 图片翻译。由于人类语言潜在的歧义性，传统的神经机器翻译在单句翻译中可能会出现歧义。为此，一些研究工作在翻译过程中尝试引入更多的上下文信息，比如多模态翻译、基于树的翻译或者篇章级翻译。比如，图片翻译的目标就是在给定一个图片和其源语描述的情况下，生成目标语言的描述。一般做法就是通过一个额外的编码器来提取图像特征\cite{DBLP:journals/corr/ElliottFH15,DBLP:conf/acl/HitschlerSR16}，然后通过权重门控机制、注意力网络等融合到系统中\cite{DBLP:conf/wmt/HuangLSOD16}。
-\parinterval 基于树的翻译是指在翻译模型中引入句法结构树或依存树，从而引入更多的句法信息。一种常用的做法是将句法树进行序列化，从而保留序列到序列的模型结构\cite{DBLP:conf/emnlp/CurreyH18,DBLP:conf/acl/SaundersSGB18}。在此基础上，一些研究工作引入了更多的解析结果\cite{DBLP:conf/acl/SumitaUZTM18,DBLP:conf/coling/ZaremoodiH18}。同时，也有一些研究工作直接使用Tree-LSTMs等网络结构\cite{DBLP:conf/acl/TaiSM15,DBLP:conf/iclr/ShenTSC19}来直接表示树结构，并将其应用到神经机器翻译模型中\cite{DBLP:conf/acl/EriguchiHT16,Yang2017TowardsBH,DBLP:conf/acl/ChenHCC17}。
+\vspace{0.5em}
+\item 基于树的翻译。这类方法在翻译模型中引入句法结构树或依存树，从而引入更多的句法信息。一种常用的做法是将句法树进行序列化，从而保留序列到序列的模型结构\cite{DBLP:conf/emnlp/CurreyH18,DBLP:conf/acl/SaundersSGB18}。在此基础上，一些研究工作引入了更多的解析结果\cite{DBLP:conf/acl/SumitaUZTM18,DBLP:conf/coling/ZaremoodiH18}。同时，也有一些研究工作直接使用Tree-LSTMs等网络结构\cite{DBLP:conf/acl/TaiSM15,DBLP:conf/iclr/ShenTSC19}来直接表示树结构，并将其应用到神经机器翻译模型中\cite{DBLP:conf/acl/EriguchiHT16,Yang2017TowardsBH,DBLP:conf/acl/ChenHCC17}。
-\parinterval 篇章级翻译是为了引入篇章级上下文信息，来处理篇章翻译中译文不连贯，主谓不一致等歧义现象。为此，一些研究人员针对该问题进行了改进，主要可以分为两类方法：一种是将当前句子与上下文进行句子级的拼接，不改变模型的结构\cite{DBLP:conf/discomt/TiedemannS17}，另外一种是采用额外的编码器来捕获篇章信息\cite{DBLP:journals/corr/JeanLFC17,DBLP:journals/corr/abs-1805-10163,DBLP:conf/emnlp/ZhangLSZXZL18}。编码器的结构除了传统的RNN、自注意力网络，还有利用层级注意力来编码之前的多句上文\cite{Werlen2018DocumentLevelNM,tan-etal-2019-hierarchical}，使用可选择的稀疏注意力机制对整个文档进行篇章建模\cite{DBLP:conf/naacl/MarufMH19},使用记忆网络、缓存机制等对篇章中的关键词进行提取\cite{DBLP:conf/coling/KuangXLZ18,DBLP:journals/tacl/TuLSZ18}或者采用两阶段解码的方式\cite{DBLP:conf/aaai/XiongH0W19,DBLP:conf/acl/VoitaST19}。除了从建模角度引入上下文信息，也有一些工作使用篇章级修正模型\cite{DBLP:conf/emnlp/VoitaST19}或者语言模型\cite{DBLP:journals/corr/abs-1910-00553}对句子级翻译模型的译文进行修正，或者通过自学习在解码过程中保持翻译连贯性\cite{DBLP:journals/corr/abs-2003-05259}。
+\vspace{0.5em}
+\item 篇章级翻译。可以通过引入篇章级上下文信息，来处理篇章翻译中译文不连贯，主谓不一致等问题。为此，一些研究人员针对该问题进行了改进，主要可以分为两类方法：一种是将当前句子与上下文进行句子级的拼接，不改变模型的结构\cite{DBLP:conf/discomt/TiedemannS17}，另外一种是采用额外的编码器来捕获篇章信息\cite{DBLP:journals/corr/JeanLFC17,DBLP:journals/corr/abs-1805-10163,DBLP:conf/emnlp/ZhangLSZXZL18}。编码器的结构除了传统的RNN、自注意力网络，还有利用层级注意力来编码之前的多句上文\cite{Werlen2018DocumentLevelNM,tan-etal-2019-hierarchical}，使用可选择的稀疏注意力机制对整个文档进行篇章建模\cite{DBLP:conf/naacl/MarufMH19},使用记忆网络、缓存机制等对篇章中的关键词进行提取\cite{DBLP:conf/coling/KuangXLZ18,DBLP:journals/tacl/TuLSZ18}或者采用两阶段解码的方式\cite{DBLP:conf/aaai/XiongH0W19,DBLP:conf/acl/VoitaST19}。除了从建模角度引入上下文信息，也有一些工作使用篇章级修正模型\cite{DBLP:conf/emnlp/VoitaST19}或者语言模型\cite{DBLP:journals/corr/abs-1910-00553}对句子级翻译模型的译文进行修正，或者通过自学习在解码过程中保持翻译连贯性\cite{DBLP:journals/corr/abs-2003-05259}。
 \vspace{0.5em}
 \item 语音翻译。在日常生活中，语音翻译也是有很大的需求。针对语音到文本翻译的特点，最简单的做法是使用自动语音识别（ASR）将语音转换成文本，然后送入文本翻译模型进行翻译\cite{DBLP:conf/icassp/Ney99,DBLP:conf/interspeech/MatusovKN05}。然而为了避免流水线中的错误传播和高延迟问题，现在通常采用端到端的建模做法\cite{DBLP:conf/naacl/DuongACBC16,DBLP:journals/corr/BerardPSB16}。同时，针对语音翻译数据稀缺的问题，一些研究工作采用各种方法来进行缓解，包括预训练\cite{DBLP:conf/naacl/BansalKLLG19}、多任务学习\cite{Weiss2017SequencetoSequenceMC,DBLP:conf/icassp/BerardBKP18}、课程学习\cite{DBLP:conf/interspeech/KanoS017}、注意力传递\cite{DBLP:journals/tacl/SperberNNW19}和知识精炼\cite{DBLP:conf/interspeech/LiuXZHWWZ19,DBLP:conf/icassp/JiaJMWCCALW19}。
 \vspace{0.5em}

--- a/Book/Chapter7/Figures/figure-batch-generation-method.tex
+++ b/Book/Chapter7/Figures/figure-batch-generation-method.tex
 \begin{tikzpicture}
-	\tikzstyle{node} = [minimum height=1.0*1.2em,draw=teal,fill=teal!10]
+	\tikzstyle{node} = [minimum height=1.0*1.2em,draw,fill=green!20]
 	\tikzstyle{legend} = [minimum height=1.0*1.2em,minimum width=1.0*1.2em,draw]
-	\tikzstyle{node2} = [minimum width=1.0*1.2em,minimum height=4.1*1.2em,draw=blue,fill=blue!10]
+	\tikzstyle{node2} = [minimum width=1.0*1.2em,minimum height=4.1*1.2em,draw,fill=blue!20]
 	\node[node,minimum width=2.8*1.2em] (node1) at (0,0) {};
 	\node[node,minimum width=4.0*1.2em,anchor=north west] (node2) at (node1.south west) {};
 	\node[node,minimum width=3.2*1.2em,anchor=north west] (node3) at (node2.south west) {};
@@ -12,12 +12,12 @@
 	\node[node,minimum width=2.8*1.2em,anchor=north west] (node6) at (node5.south west) {};
 	\node[node,minimum width=3.2*1.2em,anchor=north west] (node7) at (node6.south west) {};
 	\node[node,minimum width=4.0*1.2em,anchor=north west] (node8) at (node7.south west) {};
-	\node[font=\footnotesize,anchor=east] (line1) at (node1.west) {gpu1};
+	\node[font=\footnotesize,anchor=east] (line1) at (node1.west) {GPU1};
-	\node[font=\footnotesize,anchor=east] (line2) at (node2.west) {gpu2};
+	\node[font=\footnotesize,anchor=east] (line2) at (node2.west) {GPU2};
-	\node[font=\footnotesize,anchor=east] (line3) at (node3.west) {gpu3};
+	\node[font=\footnotesize,anchor=east] (line3) at (node3.west) {GPU3};
-	\node[font=\footnotesize,anchor=east] (line4) at (node4.west) {gpu4};
+	\node[font=\footnotesize,anchor=east] (line4) at (node4.west) {GPU4};
 	\node[node2,anchor = north west] (grad2) at ([xshift=0.3em]node5.north east) {};
-	\draw[->] (-1.4em*1.2,-3.62*1.2em) -- (9em*1.2,-3.62*1.2em);
+	\draw[->,thick] (-1.4em*1.2,-3.62*1.2em) -- (9em*1.2,-3.62*1.2em);
 	\node[node,minimum width=2.8*1.2em] (node9) at (16em,0) {};
 	\node[node,minimum width=4.0*1.2em,anchor=north west] (node10) at (node9.south west) {};
@@ -29,11 +29,11 @@
 	\node[node,minimum width=3.2*1.2em,anchor=north west] (node15) at (node11.north east) {};
 	\node[node,minimum width=4.0*1.2em,anchor=north west] (node16) at (node12.north east) {};
 	\node[node2,anchor = north west] (grad3) at ([xshift=0.5em]node13.north east) {};
-	\node[font=\footnotesize,anchor=east] (line1) at (node9.west) {gpu1};
+	\node[font=\footnotesize,anchor=east] (line1) at (node9.west) {GPU1};
-	\node[font=\footnotesize,anchor=east] (line2) at (node10.west) {gpu2};
+	\node[font=\footnotesize,anchor=east] (line2) at (node10.west) {GPU2};
-	\node[font=\footnotesize,anchor=east] (line3) at (node11.west) {gpu3};
+	\node[font=\footnotesize,anchor=east] (line3) at (node11.west) {GPU3};
-	\node[font=\footnotesize,anchor=east] (line4) at (node12.west) {gpu4};
+	\node[font=\footnotesize,anchor=east] (line4) at (node12.west) {GPU4};
-	\draw[->] (13.6*1.2em,-3.62*1.2em) -- (20.5*1.2em,-3.62*1.2em);
+	\draw[->,thick] (node12.south west) -- ([xshift=3em]node16.south east);
 	\begin{pgfonlayer}{background}
 	\node [rectangle,inner sep=-0.0em,draw] [fit = (node1) (node2) (node3) (node4)] (box1) {};
 	\node [rectangle,inner sep=-0.0em,draw] [fit = (node5) (node6) (node7) (node8)] (box2) {};
@@ -46,9 +46,9 @@
 	\node[legend] (legend3) at (2em,2em) {};
 	\node[font=\footnotesize,anchor=west] (idle) at (legend3.east) {:空闲};
-	\node[legend,anchor=west,draw=teal,fill=teal!10] (legend4) at ([xshift = 2em]idle.east) {};
+	\node[legend,anchor=west,draw,fill=green!30] (legend4) at ([xshift = 2em]idle.east) {};
 	\node[font=\footnotesize,anchor=west] (FB) at (legend4.east) {:前向/反向};
-	\node[legend,anchor=west,draw=blue,fill=blue!10] (legend5) at ([xshift = 2em]FB.east) {};
+	\node[legend,anchor=west,draw,fill=blue!30] (legend5) at ([xshift = 2em]FB.east) {};
 	\node[font=\footnotesize,anchor=west] (grad_sync) at (legend5.east) {:梯度更新};
 \end{tikzpicture}
\ No newline at end of file
--- a/Book/Chapter7/Figures/figure-machine-translation-performance-curve.tex
+++ b/Book/Chapter7/Figures/figure-machine-translation-performance-curve.tex
@@ -12,7 +12,7 @@
 \draw [-,very thick,draw=ublue] ([xshift=0.7em,yshift=3em]n1.north) .. controls +(north:7em) and +(south:0em) .. ([xshift=17em,yshift=9em]n1.north);
 {\footnotesize
-\node [anchor=south] (n4) at ([xshift=7em,yshift=5em]n1.north) {性能快速爬升阶段};
+\node [anchor=south] (n4) at ([xshift=8em,yshift=5em]n1.north) {性能快速爬升阶段（红色）};
 \node [anchor=west] (n5) at ([xshift=0em,yshift=-2em]n4.west) {数据的作用会非常明显};
 \draw [-,very thick,draw=red] ([xshift=0.7em,yshift=3em]n1.north) .. controls +(north:5.9em) and +(south:0em) .. ([xshift=10em,yshift=9.6em]n1.north);

--- a/Book/Chapter7/Figures/figure-randomly-generation-vs-generate-by-sentence-length.tex
+++ b/Book/Chapter7/Figures/figure-randomly-generation-vs-generate-by-sentence-length.tex
 \begin{tikzpicture}
-	\tikzstyle{node} = [minimum height=1.0*1.2em,draw=teal,fill=teal!10]
+	\tikzstyle{node} = [minimum height=1.0*1.2em,draw,fill=green!20]
 	\node[node,minimum width=2.0*1.2em] (sent1) at (0,0) {};
 	\node[node,minimum width=5.0*1.2em,anchor=north west] (sent2) at (sent1.south west) {};
 	\node[node,minimum width=1.0*1.2em,anchor=north west] (sent3) at (sent2.south west) {};
@@ -11,15 +11,15 @@
 	\node[node,minimum width=4.5*1.2em,anchor=north west] (sent7) at (sent6.south west) {};
 	\node[node,minimum width=5*1.2em,anchor=north west] (sent8) at (sent7.south west) {};
-	\node[font=\footnotesize,anchor=east] (line1) at (sent1.west) {sent1};
+	\node[font=\footnotesize,anchor=east] (line1) at (sent1.west) {句子1};
-	\node[font=\footnotesize,anchor=east] (line2) at (sent2.west) {sent2};
+	\node[font=\footnotesize,anchor=east] (line2) at (sent2.west) {句子2};
-	\node[font=\footnotesize,anchor=east] (line3) at (sent3.west) {sent3};
+	\node[font=\footnotesize,anchor=east] (line3) at (sent3.west) {句子3};
-	\node[font=\footnotesize,anchor=east] (line4) at (sent4.west) {sent4};
+	\node[font=\footnotesize,anchor=east] (line4) at (sent4.west) {句子4};
-	\node[font=\footnotesize,anchor=east] (line5) at (sent5.west) {sent1};
+	\node[font=\footnotesize,anchor=east] (line5) at (sent5.west) {句子1};
-	\node[font=\footnotesize,anchor=east] (line6) at (sent6.west) {sent2};
+	\node[font=\footnotesize,anchor=east] (line6) at (sent6.west) {句子2};
-	\node[font=\footnotesize,anchor=east] (line7) at (sent7.west) {sent3};
+	\node[font=\footnotesize,anchor=east] (line7) at (sent7.west) {句子3};
-	\node[font=\footnotesize,anchor=east] (line8) at (sent8.west) {sent4};
+	\node[font=\footnotesize,anchor=east] (line8) at (sent8.west) {句子4};
 	\begin{pgfonlayer}{background}
 	\node [rectangle,inner sep=-0.0em,draw] [fit = (sent1) (sent2) (sent3) (sent4)] (box1) {};
 	\node [rectangle,inner sep=-0.0em,draw] [fit = (sent5) (sent6) (sent7) (sent8)] (box2) {};

--- a/Book/Chapter7/Figures/figure-word-root.tex
+++ b/Book/Chapter7/Figures/figure-word-root.tex
@@ -3,16 +3,16 @@
 \node[] (do) at (0,0) {{\red do}}; 
 \node[anchor = west] (does) at ([xshift = 1em]do.east) {{\red do}es};
 \node[anchor = west] (doing) at ([xshift = 0.7em]does.east) {{\red do}ing};
-\node[anchor = north] (do_root) at ([yshift = -1em]does.south) {do};
+\node[anchor = north] (do_root) at ([yshift = -1.5em]does.south) {do};
 \node[anchor = west] (new) at ([xshift = 2em]doing.east) {{\red new}}; 
 \node[anchor = west] (newer) at ([xshift = 1em]new.east) {{\red new}er};
 \node[anchor = west] (newest) at ([xshift = 0.7em]newer.east) {{\red new}est};
-\node[anchor = north] (new_root) at ([yshift = -1em]newer.south) {new};
+\node[anchor = north] (new_root) at ([yshift = -1.5em]newer.south) {new};
-\draw [->] (do_root.north) .. controls +(north:0.4) and +(south:0.6) ..(do.south);
+\draw [->] ([yshift=0.2em]do_root.north) .. controls +(north:0.4) and +(south:0.6) ..(do.south);
 \draw [->] (do_root.north) -- (does.south);
-\draw [->] (do_root.north) .. controls +(north:0.4) and +(south:0.6) ..(doing.south);
+\draw [->] ([yshift=0.2em]do_root.north) .. controls +(north:0.4) and +(south:0.6) ..(doing.south);
-\draw [->] (new_root.north) .. controls +(north:0.4) and +(south:0.6) ..(new.south);
+\draw [->] ([yshift=0.2em]new_root.north) .. controls +(north:0.4) and +(south:0.6) ..(new.south);
 \draw [->] (new_root.north) -- (newer.south);
-\draw [->] (new_root.north) .. controls +(north:0.4) and +(south:0.6) ..(newest.south);
+\draw [->] ([yshift=0.2em]new_root.north) .. controls +(north:0.4) and +(south:0.6) ..(newest.south);
 \end{tikzpicture}
\ No newline at end of file
--- a/Book/ChapterAppend/ChapterAppend.tex
+++ b/Book/ChapterAppend/ChapterAppend.tex
@@ -2,6 +2,14 @@
 % !TEX encoding = UTF-8 Unicode
 %----------------------------------------------------------------------------------------
+% 机器翻译：统计建模与深度学习方法
+% Machine Translation: Statistical Modeling and Deep Learning Methods
+%
+% Copyright 2020
+% 肖桐(xiaotong@mail.neu.edu.cn) 朱靖波 (zhujingbo@mail.neu.edu.cn)
+%----------------------------------------------------------------------------------------
+%----------------------------------------------------------------------------------------
 %    CONFIGURATIONS
 %----------------------------------------------------------------------------------------
@@ -208,7 +216,7 @@ S = N(b^{\infty}(V(\mathbf{s}|\mathbf{t};2))) \cup (\mathop{\cup}\limits_{ij} N(
 \parinterval 为了理解这个公式，先介绍几个概念。
 \begin{itemize}
-\item $V(\mathbf{s}|\mathbf{t})$表示Viterbi词对齐，$V(\mathbf{s}|\mathbf{t},1)$、$V(\mathbf{s}|\mathbf{t},2)$和$V(\mathbf{s}|\mathbf{t},3)$就分别对应了模型1、2 和3 的Viterbi 词对齐； 
+\item $V(\mathbf{s}|\mathbf{t})$表示Viterbi词对齐，$V(\mathbf{s}|\mathbf{t},1)$、$V(\mathbf{s}|\mathbf{t},2)$和$V(\mathbf{s}|\mathbf{t},3)$就分别对应了模型1、2 和3 的Viterbi 词对齐；
 \item 把那些满足第$j$个源语言单词对应第$i$个目标语言单词（$a_j=i$）的词对齐构成的集合记为$\mathbf{A}_{i \leftrightarrow j}(\mathbf{s},\mathbf{t})$。通常称这些对齐中$j$和$i$被``钉''在了一起。在$\mathbf{A}_{i \leftrightarrow j}(\mathbf{s},\mathbf{t})$中使$\textrm{P}(\mathbf{a}|\mathbf{s},\mathbf{t})$达到最大的那个词对齐被记为$V_{i \leftrightarrow j}(\mathbf{s},\mathbf{t})$；
 \item 如果两个词对齐，通过交换两个词对齐连接就能互相转化，则称它们为邻居。一个词对齐$\mathbf{a}$的所有邻居记为$N(\mathbf{a})$。
 \end{itemize}
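
Review note: the last item in the list above calls two word alignments neighbors when swapping two alignment links turns one into the other. A minimal sketch of enumerating such neighbors follows, assuming an alignment is stored as a tuple whose j-th entry is the target position a_j. The function name and the example alignment are hypothetical, and only the swap operation named in the item is covered.

```python
from itertools import combinations


def swap_neighbors(alignment):
    """All alignments reachable by swapping the targets of two source positions."""
    neighbors = []
    for j1, j2 in combinations(range(len(alignment)), 2):
        if alignment[j1] == alignment[j2]:
            continue  # swapping identical links would reproduce the same alignment
        swapped = list(alignment)
        swapped[j1], swapped[j2] = swapped[j2], swapped[j1]
        neighbors.append(tuple(swapped))
    return neighbors


a = (1, 2, 2)  # hypothetical alignment: a_0 = 1, a_1 = 2, a_2 = 2
neighbors = swap_neighbors(a)
```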

--- a/Book/ChapterPreface/ChapterPreface.tex
+++ b/Book/ChapterPreface/ChapterPreface.tex
 % !Mode:: "TeX:UTF-8"
 % !TEX encoding = UTF-8 Unicode
+%----------------------------------------------------------------------------------------
+% 机器翻译：统计建模与深度学习方法
+% Machine Translation: Statistical Modeling and Deep Learning Methods
+%
+% Copyright 2020
+% 肖桐(xiaotong@mail.neu.edu.cn) 朱靖波 (zhujingbo@mail.neu.edu.cn)
+%----------------------------------------------------------------------------------------
 \renewcommand\figurename{图}
 %----------------------------------------------------------------------------------------

--- a/Book/mt-book-xelatex.idx
+++ b/Book/mt-book-xelatex.idx
--- a/Book/mt-book-xelatex.ptc
+++ b/Book/mt-book-xelatex.ptc
--- a/Book/mt-book-xelatex.tex
+++ b/Book/mt-book-xelatex.tex
 % !Mode:: "TeX:UTF-8"
 % !TEX encoding = UTF-8 Unicode
+%----------------------------------------------------------------------------------------
+% 机器翻译：统计建模与深度学习方法
+% Machine Translation: Statistical Modeling and Deep Learning Methods
+%
+% Copyright 2020
+% 肖桐(xiaotong@mail.neu.edu.cn) 朱靖波 (zhujingbo@mail.neu.edu.cn)
+%----------------------------------------------------------------------------------------
 %----------------------------------------------------------------------------------------
 %	BASIC CONFIGURATIONS
@@ -122,14 +129,14 @@
 %	CHAPTERS
 %----------------------------------------------------------------------------------------
-%\include{Chapter1/chapter1}
+\include{Chapter1/chapter1}
-%\include{Chapter2/chapter2}
+\include{Chapter2/chapter2}
-%\include{Chapter3/chapter3}
+\include{Chapter3/chapter3}
 \include{Chapter4/chapter4}
-%\include{Chapter5/chapter5}
+\include{Chapter5/chapter5}
-%\include{Chapter6/chapter6}
+\include{Chapter6/chapter6}
-%\include{Chapter7/chapter7}
+\include{Chapter7/chapter7}
-%\include{ChapterAppend/chapterappend}
+\include{ChapterAppend/chapterappend}
 %----------------------------------------------------------------------------------------