Merge branch 'master' into 'mengxia'

Master. See merge request !174

Commit 7082772e authored Sep 09, 2020 by 孟霞
--- a/Chapter1/chapter1.tex
+++ b/Chapter1/chapter1.tex
@@ -127,7 +127,7 @@
 \parinterval 随着电子计算机的发展，研究者开始尝试使用计算机来进行自动翻译。1954年，美国乔治敦大学在IBM公司支持下，启动了第一次真正的机器翻译实验。翻译的目标是将几个简单的俄语句子翻译成为英语，翻译系统包含6条翻译规则和250词汇。这次翻译实验中测试了50个化学文本句子，取得了初步成功。在某种意义上来说，这个实验显示了采用基于词典和翻译规则的方法可以实现机器翻译过程。虽然只是取得了初步成功，但却引起了苏联、英国和日本研究机构的机器翻译研究热，大大推动了早期机器翻译的研究进展。
-\parinterval 1957年，Noam Chomsky在\emph{Syntactic Structures}中描述了转换生成语法\upcite{chomsky1957syntactic}，并使用数学方法来研究自然语言，建立了包括上下文有关语法、上下文无关语法等4种类型的语法。这些工作最终为今天计算机中广泛使用的“形式语言”奠定了基础。而他的思想也深深地影响了同时期的语言学和自然语言处理领域的学者。特别是是，早期基于规则的机器翻译中也大量使用了这些思想。
+\parinterval 1957年，Noam Chomsky在\emph{Syntactic Structures}中描述了转换生成语法\upcite{chomsky1957syntactic}，并使用数学方法来研究自然语言，建立了包括上下文有关语法、上下文无关语法等4种类型的语法。这些工作最终为今天计算机中广泛使用的“形式语言”奠定了基础。而他的思想也深深地影响了同时期的语言学和自然语言处理领域的学者。特别的是，早期基于规则的机器翻译中也大量使用了这些思想。
 \parinterval 虽然在这段时间，使用机器进行翻译的议题越加火热，但是事情并不总是一帆风顺，怀疑论者对机器翻译一直存有质疑，并很容易找出一些机器翻译无法解决的问题。自然地，人们也期望能够客观地评估一下机器翻译的可行性。当时美国基金资助组织委任自动语言处理咨询会承担了这项任务。经过近两年的调查与分析，该委员会于1966年11月公布了一个题为\emph{LANGUAGE AND MACHINES}的报告（图\ref{fig:1-5}），即ALPAC报告。该报告全面否定了机器翻译的可行性，为机器翻译的研究泼了一盆冷水。
@@ -142,7 +142,7 @@
 \parinterval 随后美国政府终止了对机器翻译研究的支持，这导致整个产业界和学术界都开始回避机器翻译。没有了政府的支持，企业也无法进行大规模投入，机器翻译的研究就此受挫。
-\parinterval 从历史上看，包括机器翻译在内很多人工智能领域在那个年代并不受“待见”，其主要原因在于当时的技术水平还比较低，而大家又对机器翻译等技术的期望过高。最后发现，当时的机器翻译水平无法满足实际需要，因此转而排斥它。但是，也正是这一盆冷水，让研究人员可以更加冷静地思考机器翻译的发展方向，为后来的爆发蓄力。
+\parinterval 从历史上看，包括机器翻译在内，很多人工智能领域在那个年代并不受“待见”，其主要原因在于当时的技术水平还比较低，而大家又对机器翻译等技术的期望过高。最后发现，当时的机器翻译水平无法满足实际需要，因此转而排斥它。但是，也正是这一盆冷水，让研究人员可以更加冷静地思考机器翻译的发展方向，为后来的爆发蓄力。
 %----------------------------------------------------------------------------------------
 %    NEW SUB-SECTION
@@ -174,7 +174,7 @@
 \vspace{0.5em}
 \item 第二，神经网络的连续空间模型有更强的表示能力。机器翻译中的一个基本问题是：如何表示一个句子？统计机器翻译把句子的生成过程看作是短语或者规则的推导，这本质上是一个离散空间上的符号系统。深度学习把传统的基于离散化的表示变成了连续空间的表示。比如，用实数空间的分布式表示代替了离散化的词语表示，而整个句子可以被描述为一个实数向量。这使得翻译问题可以在连续空间上描述，进而大大缓解了传统离散空间模型维度灾难等问题。更重要的是，连续空间模型可以用梯度下降等方法进行优化，具有很好的数学性质并且易于实现。
 \vspace{0.5em}
-\item 第三，深度网络学习算法的发展和GPU\index{GPU}（Graphics Processing Unit）\index{Graphics Processing Unit}等并行计算设备为训练神经网络提供了可能。早期的基于神经网络的方法一直没有在机器翻译甚至自然语言处理领域得到大规模应用，其中一个重要的原因是这类方法需要大量的浮点运算，而且以前计算机的计算能力无法达到这个要求。随着GPU等并行计算设备的进步，训练大规模神经网络也变为了可能。现在已经可以在几亿、几十亿，甚至上百亿句对上训练机器翻译系统，系统研发的周期越来越短，进展日新月异。
+\item 第三，深度网络学习算法的发展和GPU\index{GPU}（Graphics Processing Unit）\index{Graphics Processing Unit}等并行计算设备为训练神经网络提供了可能。早期的基于神经网络的方法一直没有在机器翻译甚至自然语言处理领域得到大规模应用，其中一个重要的原因是这类方法需要大量的浮点运算，但是以前计算机的计算能力无法达到这个要求。随着GPU等并行计算设备的进步，训练大规模神经网络也变为了可能。现在已经可以在几亿、几十亿，甚至上百亿句对上训练机器翻译系统，系统研发的周期越来越短，进展日新月异。
 \vspace{0.5em}
 \end{itemize}
@@ -200,7 +200,7 @@
 \sectionnewpage
 \section{机器翻译现状及挑战}
 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
-\parinterval 机器翻译技术发展到今天已经过无数次迭代，技术范式也经过若干次更替，近些年机器翻译的应用也如雨后春笋。今天的机器翻译的质量究竟如何呢？乐观地说，在很多特定的条件下，机器翻译的译文结果是非常不错的，甚至可以接近人工翻译的结果。然而，在开放式翻译任务中，机器翻译的结果还并不完美。更严格来说，机器翻译的质量远没有达到人们所期望的程度。对于有些人提到的“机器翻译代替人工翻译”也并不是事实。比如，在高精度同声传译任务中，机器翻译仍需要更多打磨；再比如，针对于小说的翻译，机器翻译还无法做到与人工翻译媲美；甚至有人尝试用机器翻译系统翻译中国古代诗词，这里更多的是娱乐的味道。但是毫无疑问的是，机器翻译可以帮助人类，甚至有朝一日可以代替一些低端的人工翻译工作。
+\parinterval 机器翻译技术发展到今天已经过无数次迭代，技术范式也经过若干次更替，近些年机器翻译的应用也如雨后春笋相继浮现。今天的机器翻译的质量究竟如何呢？乐观地说，在很多特定的条件下，机器翻译的译文结果是非常不错的，甚至可以接近人工翻译的结果。然而，在开放式翻译任务中，机器翻译的结果还并不完美。更严格来说，机器翻译的质量远没有达到人们所期望的程度。对于有些人提到的“机器翻译代替人工翻译”也并不是事实。比如，在高精度同声传译任务中，机器翻译仍需要更多打磨；再比如，针对于小说的翻译，机器翻译还无法做到与人工翻译媲美；甚至有人尝试用机器翻译系统翻译中国古代诗词，这里更多的是娱乐的味道。但是毫无疑问的是，机器翻译可以帮助人类，甚至有朝一日可以代替一些低端的人工翻译工作。
 \parinterval 图\ref{fig:1-7}展示了机器翻译和人工翻译质量的一个对比结果。在汉语到英语的新闻翻译任务中，如果对译文进行人工评价（五分制），那么机器翻译的译文得分为3.9分，人工译文得分为4.7分（人的翻译也不是完美的）。可见，在这个任务中机器翻译表现不错，但是与人还有一定差距。如果换一种方式评价，把人的译文作为参考答案，用机器翻译的译文与其进行比对（百分制），会发现机器翻译的得分只有47分。当然，这个结果并不是说机器翻译的译文质量很差，它更多的是表明机器翻译系统可以生成一些与人工翻译不同的译文，机器翻译也具有一定的创造性。这也类似于，很多围棋选手都想向AlphaGo学习，因为智能围棋系统也可以走出一些人类从未走过的妙招。
@@ -287,7 +287,7 @@
 \subsection{转换法}
-\parinterval 通常一个典型的{\small\bfnew{基于转换规则的机器翻译}}\index{基于转换规则的机器翻译}（Transfer-based Translation）\index{Transfer-based Translation}的过程可以被视为“独立分析-独立生成-相关转换”的过程\upcite{parsing2009speech}。如图\ref{fig:1-11}所示，这些过程可以分成六个步骤，其中每一个步骤都是通过相应的翻译规则来完成。比如，第一个步骤中需要构建源语词法分析规则，第二个步骤中需要构建源语句法分析规则，第三个和第四个步骤中需要构建转换规则，其中包括源语言-目标语言词汇和结构转换规则等等。
+\parinterval 通常一个典型的{\small\bfnew{基于转换规则的机器翻译}}\index{基于转换规则的机器翻译}（Transfer-based Translation）\index{Transfer-based Translation}的过程可以被视为“独立分析-相关转换-独立生成”的过程\upcite{parsing2009speech}。如图\ref{fig:1-11}所示，这些过程可以分成六个步骤，其中每一个步骤都是通过相应的翻译规则来完成。比如，第一个步骤中需要构建源语词法分析规则，第二个步骤中需要构建源语句法分析规则，第三个和第四个步骤中需要构建转换规则，其中包括源语言-目标语言词汇和结构转换规则等等。
 %----------------------------------------------
 \begin{figure}[htp]
@@ -506,7 +506,7 @@
 \parinterval 首先，推荐一本书$Statistical\ Machine\ Translation$\upcite{koehn2009statistical}，其作者是机器翻译领域著名学者Philipp Koehn教授。该书是机器翻译领域内的经典之作，介绍了统计机器翻译技术的进展。该书从语言学和概率学两个方面介绍了统计机器翻译的构成要素，然后介绍了统计机器翻译的主要模型：基于词、基于短语和基于树的模型，以及机器翻译评价、语言建模、判别式训练等方法。此外，作者在该书的最新版本中增加了神经机器翻译的章节，方便研究人员全面了解机器翻译的最新发展趋势\upcite{DBLP:journals/corr/abs-1709-07809}。
-\parinterval $Foundations\ of\ Statistical\ Natural\ Language\ Processing$\upcite{manning1999foundations}中文译名《统计自然语言处理基础》，作者是自然语言处理领域的权威Chris Manning教授和Hinrich Sch$\ddot{\textrm{u}}$tze教授。该书对统计自然语言处理方法进行了全面介绍。书中讲解了统计自然语言处理所需的语言学和概率论基础知识，介绍了机器翻译评价、语言建模、判别式训练以及整合语言学信息等基础方法。其中也包含了构建自然语言处理工具所需的基本理论和算法，提供了对数学和语言学基础内容广泛而严格的覆盖，以及统计方法的详细讨论。
+\parinterval $Foundations\ of\ Statistical\ Natural\ Language\ Processing$\upcite{manning1999foundations}中文译名《统计自然语言处理基础》，作者是自然语言处理领域的权威Chris Manning教授和Hinrich Sch$\ddot{\textrm{u}}$tze教授。该书对统计自然语言处理方法进行了全面介绍。书中讲解了统计自然语言处理所需的语言学和概率论基础知识，介绍了机器翻译评价、语言建模、判别式训练以及整合语言学信息等基础方法。其中也包含了构建自然语言处理工具所需的基本理论和算法，并且涵盖了数学和语言学基础内容以及相关的统计方法。
 \parinterval 《统计自然语言处理（第2版）》\upcite{宗成庆2013统计自然语言处理}由中国科学院自动化所宗成庆教授所著。该书中系统介绍了统计自然语言处理的基本概念、理论方法和最新研究进展，既有对基础知识和理论模型的介绍，也有对相关问题的研究背景、实现方法和技术现状的详细阐述。可供从事自然语言处理、机器翻译等研究的相关人员参考。

--- a/Chapter10/Figures/figure-3-base-problom-of-p.tex
+++ b/Chapter10/Figures/figure-3-base-problom-of-p.tex
@@ -15,9 +15,9 @@
 					\node[rnnnode,minimum height=0.5\base,fill=green!30!white,anchor=west] (eemb\x) at ([xshift=0.4\base]eemb\y.east) {\tiny{$e_x()$}};
 				\foreach \x in {1,2,...,3}
 					\node[rnnnode,fill=blue!30!white,anchor=south] (enc\x) at ([yshift=0.3\base]eemb\x.north) {};
-			        \node[] (enclabel1) at (enc1) {\tiny{$\textbf{h}_{m-2}$}};
+			        \node[] (enclabel1) at (enc1) {\tiny{$\vectorn{h}_{m-2}$}};
-			        \node[] (enclabel2) at (enc2) {\tiny{$\textbf{h}_{m-1}$}};
+			        \node[] (enclabel2) at (enc2) {\tiny{$\vectorn{h}_{m-1}$}};
-			        \node[rnnnode,fill=purple!30!white] (enclabel3) at (enc3) {\tiny{$\textbf{h}_{m}$}};
+			        \node[rnnnode,fill=purple!30!white] (enclabel3) at (enc3) {\tiny{$\vectorn{h}_{m}$}};
 				\node[wordnode,left=0.4\base of enc1] (init1) {$\cdots$};
 				\node[wordnode,left=0.4\base of eemb1] (init2) {$\cdots$};
@@ -29,7 +29,7 @@
 				\foreach \x in {1,2,...,3}
 					\node[rnnnode,minimum height=0.5\base,fill=green!30!white,anchor=south] (demb\x) at ([yshift=\base]enc\x.north) {\tiny{$e_y()$}};
 				\foreach \x in {1,2,...,3}
-					\node[rnnnode,fill=blue!30!white,anchor=south] (dec\x) at ([yshift=0.3\base]demb\x.north) {{\tiny{$\textbf{s}_\x$}}};
+					\node[rnnnode,fill=blue!30!white,anchor=south] (dec\x) at ([yshift=0.3\base]demb\x.north) {{\tiny{$\vectorn{s}_\x$}}};
 				\foreach \x in {1,2,...,3}
 					\node[rnnnode,minimum height=0.5\base,fill=red!30!white,anchor=south] (softmax\x) at ([yshift=0.3\base]dec\x.north) {\tiny{Softmax}};
 				\node[wordnode,right=0.4\base of demb3] (end1) {$\cdots$};
@@ -73,7 +73,7 @@
 				\draw[-latex'] (enc3.north) .. controls +(north:0.3\base) and +(east:\base) .. (bridge) .. controls +(west:2.7\base) and +(west:0.3\base) .. (dec1.west);
 				{
-				\node [anchor=east] (line1) at ([xshift=-3em,yshift=0.5em]softmax1.west) {\scriptsize{基于RNN的隐层状态$\textbf{s}_i$}};
+				\node [anchor=east] (line1) at ([xshift=-3em,yshift=0.5em]softmax1.west) {\scriptsize{基于RNN的隐层状态$\vectorn{s}_i$}};
 				\node [anchor=north west] (line2) at ([yshift=0.3em]line1.south west) {\scriptsize{预测目标词的概率}};
 				\node [anchor=north west] (line3) at ([yshift=0.3em]line2.south west) {\scriptsize{通常，用Softmax函数}};
 				\node [anchor=north west] (line4) at ([yshift=0.3em]line3.south west) {\scriptsize{实现 $\textrm{P}(y_i|...)$}};
@@ -90,7 +90,7 @@
 				\node [anchor=west] (line21) at ([xshift=1.3em,yshift=1.5em]enc3.east)  {\scriptsize{源语编码器最后一个}};
 				\node [anchor=north west] (line22) at ([yshift=0.3em]line21.south west) {\scriptsize{循环单元的输出被}};
 				\node [anchor=north west] (line23) at ([yshift=0.3em]line22.south west) {\scriptsize{看作是句子的表示,}};
-				\node [anchor=north west] (line24) at ([yshift=0.3em]line23.south west) {\scriptsize{记为$\textbf{C}$}};
+				\node [anchor=north west] (line24) at ([yshift=0.3em]line23.south west) {\scriptsize{记为$\vectorn{C}$}};
 				}
 				\begin{pgfonlayer}{background}
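
Note on the notation change: throughout the Chapter10 figures this commit replaces `\textbf{...}` and `\mathbf{...}` with `\vectorn{...}` for vector and matrix symbols. The definition of `\vectorn` is not part of this diff, so the following is only a minimal sketch of what such a macro might look like, assuming it renders bold upright math; the book's actual preamble may well use `\boldsymbol` or something else.

```latex
% Minimal sketch; \vectorn below is a stand-in, NOT the book's actual definition.
\documentclass{article}
% Assumption: bold upright math, matching the \mathbf labels this commit replaces.
\providecommand{\vectorn}[1]{\mathbf{#1}}
\begin{document}
% Labels taken from the figure in this diff.
$\vectorn{h}_{m-1}$, $\vectorn{s}_i$, $\vectorn{C}$
\end{document}
```

Routing every vector symbol through one macro means the typeface can later be changed in a single place rather than in each figure file.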

--- a/Chapter10/Figures/figure-a-simple-example-for-tl.tex
+++ b/Chapter10/Figures/figure-a-simple-example-for-tl.tex
@@ -9,14 +9,14 @@
 \node [pos=0.4,left,xshift=-36em,yshift=7em,font=\small] (original0) {\quad 源语（中文）输入：};
 \node [pos=0.4,left,xshift=-22em,yshift=7em,font=\small] (original1) {
 \begin{tabular}[t]{l}
-\parbox{14em}{``我''、``很''、``好''、``<eos>'' }
+\parbox{14em}{“我”、“很”、“好”、“<eos>” }
 \end{tabular}
 };
 %译文1--------------mt1
 \node[font=\small] (mt1) at ([xshift=0em,yshift=-1em]original0.south) {目标语（英文）输出：};
 \node[font=\small] (ts1) at ([xshift=0em,yshift=-1em]original1.south)  {
 \begin{tabular}[t]{l}
-\parbox{14em}{``I''、``am''、``fine''、``<eos>''}
+\parbox{14em}{“I”、“am”、“fine”、“<eos>”}
 \end{tabular}
 };

--- a/Chapter10/Figures/figure-calculation-process-of-context-vector-c.tex
+++ b/Chapter10/Figures/figure-calculation-process-of-context-vector-c.tex
@@ -8,26 +8,26 @@
 \begin{scope}
-\node [anchor=west,draw,fill=red!20!white,inner sep=3pt,minimum width=2em,minimum height=1.2em] (h1) at (0,0) {\scriptsize{$\textbf{h}_1$}};
+\node [anchor=west,draw,fill=red!20!white,inner sep=3pt,minimum width=2em,minimum height=1.2em] (h1) at (0,0) {\scriptsize{$\vectorn{h}_1$}};
-\node [anchor=west,draw,fill=red!20!white,inner sep=3pt,minimum width=2em,minimum height=1.2em] (h2) at ([xshift=1em]h1.east) {\scriptsize{$\textbf{h}_2$}};
+\node [anchor=west,draw,fill=red!20!white,inner sep=3pt,minimum width=2em,minimum height=1.2em] (h2) at ([xshift=1em]h1.east) {\scriptsize{$\vectorn{h}_2$}};
 \node [anchor=west,inner sep=0pt,minimum width=3em] (h3) at ([xshift=0.5em]h2.east) {\scriptsize{...}};
-\node [anchor=west,draw,fill=red!20!white,inner sep=3pt,minimum width=2em,minimum height=1.2em] (h4) at ([xshift=0.5em]h3.east) {\scriptsize{$\textbf{h}_m$}};
+\node [anchor=west,draw,fill=red!20!white,inner sep=3pt,minimum width=2em,minimum height=1.2em] (h4) at ([xshift=0.5em]h3.east) {\scriptsize{$\vectorn{h}_m$}};
 \node [anchor=south,circle,minimum size=1.0em,draw,ublue,thick] (sum) at ([yshift=2em]h2.north east) {};
 \draw [thick,-,ublue] (sum.north) -- (sum.south);
 \draw [thick,-,ublue] (sum.west) -- (sum.east);
-\node [anchor=south,draw,fill=green!20!white,inner sep=3pt,minimum width=2em,minimum height=1.2em] (th1) at ([yshift=2em,xshift=-1em]sum.north west) {\scriptsize{$\textbf{s}_{j-1}$}};
+\node [anchor=south,draw,fill=green!20!white,inner sep=3pt,minimum width=2em,minimum height=1.2em] (th1) at ([yshift=2em,xshift=-1em]sum.north west) {\scriptsize{$\vectorn{s}_{j-1}$}};
-\node [anchor=west,draw,fill=green!20!white,inner sep=3pt,minimum width=2em,minimum height=1.2em] (th2) at ([xshift=2em]th1.east) {\scriptsize{$\textbf{s}_{j}$}};
+\node [anchor=west,draw,fill=green!20!white,inner sep=3pt,minimum width=2em,minimum height=1.2em] (th2) at ([xshift=2em]th1.east) {\scriptsize{$\vectorn{s}_{j}$}};
-\draw [->] (h1.north) .. controls +(north:0.8) and +(west:1) ..  (sum.190) node [pos=0.3,left] {\scriptsize{$\alpha_{1,j}$}};
+\draw [->] (h1.north) .. controls +(north:0.8) and +(west:1) ..  (sum.190) node [pos=0.2,left] {\scriptsize{$\alpha_{1,j}$}};
 \draw [->] (h2.north) .. controls +(north:0.6) and +(220:0.2) ..  (sum.220) node [pos=0.2,right] {\scriptsize{$\alpha_{2,j}$}};
 \draw [->] (h4.north) .. controls +(north:0.8) and +(east:1) ..  (sum.-10) node [pos=0.1,left] (alphan) {\scriptsize{$\alpha_{m,j}$}};
 \draw [->] ([xshift=-1.5em]th1.west) -- ([xshift=-0.1em]th1.west);
 \draw [->] ([xshift=0.1em]th1.east) -- ([xshift=-0.1em]th2.west);
 \draw [->] ([xshift=0.1em]th2.east) -- ([xshift=1.5em]th2.east);
-\draw [->] (sum.north) .. controls +(north:0.8) and +(west:0.2) ..  ([yshift=-0.4em,xshift=-0.1em]th2.west) node [pos=0.2,right] (ci) {\scriptsize{$\textbf{C}_{j}$}};
+\draw [->] (sum.north) .. controls +(north:0.8) and +(west:0.2) ..  ([yshift=-0.4em,xshift=-0.1em]th2.west) node [pos=0.2,right] (ci) {\scriptsize{$\vectorn{C}_{j}$}};
 \node [anchor=south,inner sep=1pt] (output) at ([yshift=0.8em]th2.north) {\scriptsize{输出层}};
 \draw [->] ([yshift=0.1em]th2.north) -- ([yshift=-0.1em]output.south);
@@ -39,11 +39,11 @@
 \node [anchor=north] (enc42) at ([yshift=0.5em]enc4.south) {\scriptsize{(位置$4$)}};
 {
-\node [anchor=west] (math1) at ([xshift=5em,yshift=1em]th2.east) {$\textbf{C}_j = \sum_{i} \alpha_{i,j} \textbf{h}_i \ \ $};
+\node [anchor=west] (math1) at ([xshift=5em,yshift=1em]th2.east) {$\vectorn{C}_j = \sum_{i} \alpha_{i,j} \vectorn{h}_i \ \ $};
 }
 {
 \node [anchor=north west] (math2) at ([yshift=-2em]math1.south west) {$\alpha_{i,j} = \frac{\exp(\beta_{i,j})}{\sum_{i'} \exp(\beta_{i',j})}$};
-\node [anchor=north west] (math3) at ([yshift=-0em]math2.south west) {$\beta_{i,j} = a(\textbf{s}_{j-1}, \textbf{h}_i)$};
+\node [anchor=north west] (math3) at ([yshift=-0em]math2.south west) {$\beta_{i,j} = a(\vectorn{s}_{j-1}, \vectorn{h}_i)$};
 }
 \begin{pgfonlayer}{background}
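
The three formulas in this figure (`math1` to `math3`) can be checked with a small worked example. The sketch below assumes three source positions, two-dimensional hidden states, and made-up scores $\beta_{i,j}$ chosen so that the Softmax comes out exactly; none of these numbers come from the book, and `\vectorn` is given a stand-in definition.

```latex
% Fragment for a document that loads amsmath.
% Assumption: \vectorn renders bold math; all numbers are illustrative only.
\providecommand{\vectorn}[1]{\mathbf{#1}}
\begin{align*}
(\beta_{1,j},\ \beta_{2,j},\ \beta_{3,j}) &= (\ln 2,\ 0,\ 0) \\
\sum_{i'} \exp(\beta_{i',j}) &= 2 + 1 + 1 = 4 \\
(\alpha_{1,j},\ \alpha_{2,j},\ \alpha_{3,j}) &= \left(\tfrac{2}{4},\ \tfrac{1}{4},\ \tfrac{1}{4}\right) = (0.5,\ 0.25,\ 0.25) \\
\vectorn{C}_j = \sum_{i} \alpha_{i,j} \vectorn{h}_i
  &= 0.5 \begin{pmatrix} 1 \\ 0 \end{pmatrix}
   + 0.25 \begin{pmatrix} 0 \\ 1 \end{pmatrix}
   + 0.25 \begin{pmatrix} 1 \\ 1 \end{pmatrix}
   = \begin{pmatrix} 0.75 \\ 0.5 \end{pmatrix}
\end{align*}
```

The weights sum to one, so $\vectorn{C}_j$ is a convex combination of the encoder states, dominated here by $\vectorn{h}_1$.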

--- a/Chapter10/Figures/figure-encoder-decoder-process.tex
+++ b/Chapter10/Figures/figure-encoder-decoder-process.tex
@@ -2,9 +2,9 @@
 \begin{scope}
 \small{
-\node [anchor=south west,minimum width=15em] (source) at (0,0) {\textbf{source}: 我\ \ \ \ 对\ \ \ \ 你\ \ \ \ 感到\ \ \ \ 满意};
+\node [anchor=south west,minimum width=15em] (source) at (0,0) {\textbf{源语}: 我\ \ \ \ 对\ \ \ \ 你\ \ \ \ 感到\ \ \ \ 满意};
 {
-\node [anchor=south west,minimum width=15em] (target) at ([yshift=12em]source.north west) {\textbf{target}: I\ \ am\ \ \ satisfied\ \ \ with\ \ \ you};
+\node [anchor=south west,minimum width=15em] (target) at ([yshift=12em]source.north west) {\textbf{目标语}: I\ \ am\ \ \ satisfied\ \ \ with\ \ \ you};
 }
 {
 \node [anchor=center,minimum width=9.6em,minimum height=1.8em,draw,rounded corners=0.3em] (hidden) at ([yshift=6em]source.north) {};
@@ -24,7 +24,7 @@
 \node [anchor=west,minimum width=1.5em,minimum size=1.5em] (cell08) at (cell06.east){\small{
 \hspace{0.6em}
 \begin{tabular}{l}
-源语言句子的``表示''
+源语句子的“表示”
 \end{tabular}
 }
 };

--- a/Chapter10/Figures/figure-encoder-decoder-with-attention.tex
+++ b/Chapter10/Figures/figure-encoder-decoder-with-attention.tex
@@ -80,9 +80,9 @@
 \draw[<-] ([yshift=0.1em,xshift=1em]t6.north) -- ([yshift=1.2em,xshift=1em]t6.north);
-\draw [->] ([yshift=3em]s6.north) -- ([yshift=4em]s6.north) -- ([yshift=4em]t1.north) node [pos=0.5,fill=green!30,inner sep=2pt] (c1) {\scriptsize{表示$\textbf{C}_1$}} -- ([yshift=3em]t1.north) ;
+\draw [->] ([yshift=3em]s6.north) -- ([yshift=4em]s6.north) -- ([yshift=4em]t1.north) node [pos=0.5,fill=green!30,inner sep=2pt] (c1) {\scriptsize{表示$\vectorn{C}_1$}} -- ([yshift=3em]t1.north) ;
-\draw [->] ([yshift=3em]s5.north) -- ([yshift=5.3em]s5.north) -- ([yshift=5.3em]t2.north) node [pos=0.5,fill=green!30,inner sep=2pt] (c2) {\scriptsize{表示$\textbf{C}_2$}} -- ([yshift=3em]t2.north) ;
+\draw [->] ([yshift=3em]s5.north) -- ([yshift=5.3em]s5.north) -- ([yshift=5.3em]t2.north) node [pos=0.5,fill=green!30,inner sep=2pt] (c2) {\scriptsize{表示$\vectorn{C}_2$}} -- ([yshift=3em]t2.north) ;
-\draw [->] ([yshift=3.5em]s3.north) -- ([yshift=6.6em]s3.north) -- ([yshift=6.6em]t4.north) node [pos=0.5,fill=green!30,inner sep=2pt] (c3) {\scriptsize{表示$\textbf{C}_i$}} -- ([yshift=3.5em]t4.north) ;
+\draw [->] ([yshift=3.5em]s3.north) -- ([yshift=6.6em]s3.north) -- ([yshift=6.6em]t4.north) node [pos=0.5,fill=green!30,inner sep=2pt] (c3) {\scriptsize{表示$\vectorn{C}_i$}} -- ([yshift=3.5em]t4.north) ;
 \node [anchor=north] (smore) at ([yshift=3.5em]s3.north) {...};
 \node [anchor=north] (tmore) at ([yshift=3.5em]t4.north) {...};

--- a/Chapter10/Figures/figure-example-of-context-vector-calculation-process.tex
+++ b/Chapter10/Figures/figure-example-of-context-vector-calculation-process.tex
@@ -104,9 +104,9 @@
 %\visible<3->
 {
 % coverage score formula node
-\node [anchor=north west] (formula) at ([xshift=-0.3\hnode,yshift=-1.5\hnode]attn11.south) {\small{不同$\textbf{C}_j$所对应的源语言词的权重是不同的}};
+\node [anchor=north west] (formula) at ([xshift=-0.3\hnode,yshift=-1.5\hnode]attn11.south) {\small{不同$\vectorn{C}_j$所对应的源语言词的权重是不同的}};
-\node [anchor=north west] (example) at (formula.south west) {\footnotesize{$\textbf{C}_2=0.4 \times \textbf{h}(\textrm{``你''}) + 0.4 \times \textbf{h}(\textrm{``什么''}) +$}};
+\node [anchor=north west] (example) at (formula.south west) {\footnotesize{$\vectorn{C}_2=0.4 \times \vectorn{h}(\textrm{“你”}) + 0.4 \times \vectorn{h}(\textrm{“什么”}) +$}};
-\node [anchor=north west] (example2) at ([yshift=0.4em]example.south west) {\footnotesize{$\ \ \ \ \ \ \ \ 0 \times \textbf{h}(\textrm{``都''}) + 0.1 \times \textbf{h}(\textrm{`` 没''}) + ..$}};
+\node [anchor=north west] (example2) at ([yshift=0.4em]example.south west) {\footnotesize{$\ \ \ \ \ \ \ \ 0 \times \vectorn{h}(\textrm{“都”}) + 0.1 \times \vectorn{h}(\textrm{“ 没”}) + ..$}};
 }
 %\visible<3->
@@ -138,12 +138,12 @@
 %\visible<2->
 {
-\node[anchor=west] (sc1) at ([xshift=0.9\hnode]attn16.east) {$\textbf{C}_1 = \sum_{i=1}^{8} \alpha_{i1} \textbf{h}_{i}$};
+\node[anchor=west] (sc1) at ([xshift=0.9\hnode]attn16.east) {$\vectorn{C}_1 = \sum_{i=1}^{8} \alpha_{i1} \vectorn{h}_{i}$};
 }
 %\visible<3->
 {
-\node[anchor=west] (sc2) at ([xshift=0.9\hnode,yshift=0.0\hnode]attn26.east) {$\textbf{C}_2 = \sum_{i=1}^{8} \alpha_{i2} \textbf{h}_{i}$};
+\node[anchor=west] (sc2) at ([xshift=0.9\hnode,yshift=0.0\hnode]attn26.east) {$\vectorn{C}_2 = \sum_{i=1}^{8} \alpha_{i2} \vectorn{h}_{i}$};
 }
 \end{tikzpicture}
\ No newline at end of file
--- a/Chapter10/Figures/figure-gru01.tex
+++ b/Chapter10/Figures/figure-gru01.tex
@@ -78,8 +78,8 @@
        \end{scope}
        \begin{scope}
-            \node[wordnode,anchor=south] () at (aux71) {$\mathbf{h}_{t-1}$};
+            \node[wordnode,anchor=south] () at (aux71) {$\vectorn{h}_{t-1}$};
-            \node[wordnode,anchor=west] () at (aux12) {$\mathbf{x}_t$};
+            \node[wordnode,anchor=west] () at (aux12) {$\vectorn{x}_t$};
        \end{scope}

--- a/Chapter10/Figures/figure-gru02.tex
+++ b/Chapter10/Figures/figure-gru02.tex
@@ -91,8 +91,8 @@
        \end{scope}
        \begin{scope}
-            \node[wordnode,anchor=south] () at (aux71) {$\mathbf{h}_{t-1}$};
+            \node[wordnode,anchor=south] () at (aux71) {$\vectorn{h}_{t-1}$};
-            \node[wordnode,anchor=west] () at (aux12) {$\mathbf{x}_t$};
+            \node[wordnode,anchor=west] () at (aux12) {$\vectorn{x}_t$};
        \end{scope}

--- a/Chapter10/Figures/figure-gru03.tex
+++ b/Chapter10/Figures/figure-gru03.tex
@@ -109,11 +109,11 @@
        \end{scope}
        \begin{scope}
-             \node[wordnode,anchor=south] () at (aux71) {$\mathbf{h}_{t-1}$};
+             \node[wordnode,anchor=south] () at (aux71) {$\vectorn{h}_{t-1}$};
-            \node[wordnode,anchor=west] () at (aux12) {$\mathbf{x}_t$};
+            \node[wordnode,anchor=west] () at (aux12) {$\vectorn{x}_t$};
            {
-                \node[wordnode,anchor=east] () at (aux87) {$\mathbf{h}_{t}$};
+                \node[wordnode,anchor=east] () at (aux87) {$\vectorn{h}_{t}$};
-                \node[wordnode,anchor=south] () at (aux78) {$\mathbf{h}_{t}$};
+                \node[wordnode,anchor=south] () at (aux78) {$\vectorn{h}_{t}$};
            }
        \end{scope}

--- a/Chapter10/Figures/figure-lstm01.tex
+++ b/Chapter10/Figures/figure-lstm01.tex
@@ -84,9 +84,9 @@
        \end{scope}
        \begin{scope}
-            \node[wordnode,anchor=south] () at ([xshift=0.5\base]aux21) {$\mathbf{h}_{t-1}$};
+            \node[wordnode,anchor=south] () at ([xshift=0.5\base]aux21) {$\vectorn{h}_{t-1}$};
-            \node[wordnode,anchor=west] () at (aux12) {$\mathbf{x}_t$};
+            \node[wordnode,anchor=west] () at (aux12) {$\vectorn{x}_t$};
-            \node[wordnode,anchor=south] () at ([xshift=0.5\base]aux51) {$\mathbf{c}_{t-1}$};
+            \node[wordnode,anchor=south] () at ([xshift=0.5\base]aux51) {$\vectorn{c}_{t-1}$};
        \end{scope}

--- a/Chapter10/Figures/figure-lstm02.tex
+++ b/Chapter10/Figures/figure-lstm02.tex
@@ -99,9 +99,9 @@
         \end{scope}
        \begin{scope}
-            \node[wordnode,anchor=south] () at ([xshift=0.5\base]aux21) {$\mathbf{h}_{t-1}$};
+            \node[wordnode,anchor=south] () at ([xshift=0.5\base]aux21) {$\vectorn{h}_{t-1}$};
-            \node[wordnode,anchor=west] () at (aux12) {$\mathbf{x}_t$};
+            \node[wordnode,anchor=west] () at (aux12) {$\vectorn{x}_t$};
-            \node[wordnode,anchor=south] () at ([xshift=0.5\base]aux51) {$\mathbf{c}_{t-1}$};
+            \node[wordnode,anchor=south] () at ([xshift=0.5\base]aux51) {$\vectorn{c}_{t-1}$};
        \end{scope}

--- a/Chapter10/Figures/figure-lstm03.tex
+++ b/Chapter10/Figures/figure-lstm03.tex
@@ -113,11 +113,11 @@
        \end{scope}
        \begin{scope}
-            \node[wordnode,anchor=south] () at ([xshift=0.5\base]aux21) {$\mathbf{h}_{t-1}$};
+            \node[wordnode,anchor=south] () at ([xshift=0.5\base]aux21) {$\vectorn{h}_{t-1}$};
-            \node[wordnode,anchor=west] () at (aux12) {$\mathbf{x}_t$};
+            \node[wordnode,anchor=west] () at (aux12) {$\vectorn{x}_t$};
-            \node[wordnode,anchor=south] () at ([xshift=0.5\base]aux51) {$\mathbf{c}_{t-1}$};
+            \node[wordnode,anchor=south] () at ([xshift=0.5\base]aux51) {$\vectorn{c}_{t-1}$};
            {
-                \node[wordnode,anchor=south] () at ([xshift=-0.5\base]aux59) {$\mathbf{c}_{t}$};
+                \node[wordnode,anchor=south] () at ([xshift=-0.5\base]aux59) {$\vectorn{c}_{t}$};
            }
        \end{scope}

--- a/Chapter10/Figures/figure-lstm04.tex
+++ b/Chapter10/Figures/figure-lstm04.tex
@@ -131,15 +131,15 @@
        \end{scope}
        \begin{scope}
-            \node[wordnode,anchor=south] () at ([xshift=0.5\base]aux21) {$\mathbf{h}_{t-1}$};
+            \node[wordnode,anchor=south] () at ([xshift=0.5\base]aux21) {$\vectorn{h}_{t-1}$};
-            \node[wordnode,anchor=west] () at (aux12) {$\mathbf{x}_t$};
+            \node[wordnode,anchor=west] () at (aux12) {$\vectorn{x}_t$};
-            \node[wordnode,anchor=south] () at ([xshift=0.5\base]aux51) {$\mathbf{c}_{t-1}$};
+            \node[wordnode,anchor=south] () at ([xshift=0.5\base]aux51) {$\vectorn{c}_{t-1}$};
            {
-                \node[wordnode,anchor=south] () at ([xshift=-0.5\base]aux59) {$\mathbf{c}_{t}$};
+                \node[wordnode,anchor=south] () at ([xshift=-0.5\base]aux59) {$\vectorn{c}_{t}$};
            }
            {
-                \node[wordnode,anchor=east] () at (aux68) {$\mathbf{h}_{t}$};
+                \node[wordnode,anchor=east] () at (aux68) {$\vectorn{h}_{t}$};
-                \node[wordnode,anchor=south] () at ([xshift=-0.5\base]aux29) {$\mathbf{h}_{t}$};
+                \node[wordnode,anchor=south] () at ([xshift=-0.5\base]aux29) {$\vectorn{h}_{t}$};
            }
        \end{scope}

--- a/Chapter10/Figures/figure-output-layer-structur.tex
+++ b/Chapter10/Figures/figure-output-layer-structur.tex
@@ -123,7 +123,7 @@
                    \draw [->,thick] ([xshift=0.2em,yshift=0.1em]hidden.north west) -- (target.south west);
                    \draw [->,thick] ([xshift=-0.2em,yshift=0.1em]hidden.north east) -- (target.south east);
-                    \node [anchor=south] () at ([yshift=0.3em]hidden.north) {\scriptsize{$\hat{\mathbf{s}}=\mathbf{Ws}$}};
+                    \node [anchor=south] () at ([yshift=0.3em]hidden.north) {\scriptsize{$\hat{\vectorn{s}}=\vectorn{Ws}$}};
                }
                {

--- a/Chapter10/Figures/figure-query-model-corresponding-to-attention-mechanism.tex
+++ b/Chapter10/Figures/figure-query-model-corresponding-to-attention-mechanism.tex
@@ -12,17 +12,17 @@
 \tikzstyle{rnode} = [draw,minimum width=3.5em,minimum height=1.2em]
-\node [rnode,anchor=south west,fill=red!20!white] (value1) at (0,0) {\scriptsize{$\textbf{h}(\textrm{``你''})$}};
+\node [rnode,anchor=south west,fill=red!20!white] (value1) at (0,0) {\scriptsize{$\vectorn{h}(\textrm{“你”})$}};
-\node [rnode,anchor=south west,fill=red!20!white] (value2) at ([xshift=1em]value1.south east) {\scriptsize{$\textbf{h}(\textrm{``什么''})$}};
+\node [rnode,anchor=south west,fill=red!20!white] (value2) at ([xshift=1em]value1.south east) {\scriptsize{$\vectorn{h}(\textrm{“什么”})$}};
-\node [rnode,anchor=south west,fill=red!20!white] (value3) at ([xshift=1em]value2.south east) {\scriptsize{$\textbf{h}(\textrm{``也''})$}};
+\node [rnode,anchor=south west,fill=red!20!white] (value3) at ([xshift=1em]value2.south east) {\scriptsize{$\vectorn{h}(\textrm{“也”})$}};
-\node [rnode,anchor=south west,fill=red!20!white] (value4) at ([xshift=1em]value3.south east) {\scriptsize{$\textbf{h}(\textrm{``没''})$}};
+\node [rnode,anchor=south west,fill=red!20!white] (value4) at ([xshift=1em]value3.south east) {\scriptsize{$\vectorn{h}(\textrm{“没”})$}};
-\node [rnode,anchor=south west,fill=green!20!white] (key1) at ([yshift=0.2em]value1.north west) {\scriptsize{$\textbf{h}(\textrm{``你''})$}};
+\node [rnode,anchor=south west,fill=green!20!white] (key1) at ([yshift=0.2em]value1.north west) {\scriptsize{$\vectorn{h}(\textrm{“你”})$}};
-\node [rnode,anchor=south west,fill=green!20!white] (key2) at ([yshift=0.2em]value2.north west) {\scriptsize{$\textbf{h}(\textrm{``什么''})$}};
+\node [rnode,anchor=south west,fill=green!20!white] (key2) at ([yshift=0.2em]value2.north west) {\scriptsize{$\vectorn{h}(\textrm{“什么”})$}};
-\node [rnode,anchor=south west,fill=green!20!white] (key3) at ([yshift=0.2em]value3.north west) {\scriptsize{$\textbf{h}(\textrm{``也''})$}};
+\node [rnode,anchor=south west,fill=green!20!white] (key3) at ([yshift=0.2em]value3.north west) {\scriptsize{$\vectorn{h}(\textrm{“也”})$}};
-\node [rnode,anchor=south west,fill=green!20!white] (key4) at ([yshift=0.2em]value4.north west) {\scriptsize{$\textbf{h}(\textrm{``没''})$}};
+\node [rnode,anchor=south west,fill=green!20!white] (key4) at ([yshift=0.2em]value4.north west) {\scriptsize{$\vectorn{h}(\textrm{“没”})$}};
-\node [rnode,anchor=east] (query) at ([xshift=-2em]key1.west) {\scriptsize{$\textbf{s}(\textrm{``you''})$}};
+\node [rnode,anchor=east] (query) at ([xshift=-2em]key1.west) {\scriptsize{$\vectorn{s}(\textrm{“you”})$}};
 \node [anchor=east] (querylabel) at ([xshift=-0.2em]query.west) {\scriptsize{query}};
 \draw [->] ([yshift=1pt,xshift=6pt]query.north) .. controls +(90:1em) and +(90:1em) .. ([yshift=1pt]key1.north);

--- a/Chapter10/Figures/figure-the-whole-of-lstm.tex
+++ b/Chapter10/Figures/figure-the-whole-of-lstm.tex
@@ -141,15 +141,15 @@
 \end{scope}
 \begin{scope}
-\node[wordnode,anchor=south] () at ([xshift=0.5\base]aux21) {$\mathbf{h}_{t-1}$};
+\node[wordnode,anchor=south] () at ([xshift=0.5\base]aux21) {$\vectorn{h}_{t-1}$};
-\node[wordnode,anchor=west] () at (aux12) {$\mathbf{x}_t$};
+\node[wordnode,anchor=west] () at (aux12) {$\vectorn{x}_t$};
-\node[wordnode,anchor=south] () at ([xshift=0.5\base]aux51) {$\mathbf{c}_{t-1}$};
+\node[wordnode,anchor=south] () at ([xshift=0.5\base]aux51) {$\vectorn{c}_{t-1}$};
 {
-\node[wordnode,anchor=south] () at ([xshift=-0.5\base]aux59) {$\mathbf{c}_{t}$};
+\node[wordnode,anchor=south] () at ([xshift=-0.5\base]aux59) {$\vectorn{c}_{t}$};
 }
 {
-\node[wordnode,anchor=east] () at (aux68) {$\mathbf{h}_{t}$};
+\node[wordnode,anchor=east] () at (aux68) {$\vectorn{h}_{t}$};
-\node[wordnode,anchor=south] () at ([xshift=-0.5\base]aux29) {$\mathbf{h}_{t}$};
+\node[wordnode,anchor=south] () at ([xshift=-0.5\base]aux29) {$\vectorn{h}_{t}$};
 }
 \end{scope}
@@ -170,19 +170,19 @@
 \begin{scope}
 {
 % forget gate formula
-\node[formulanode,anchor=south east,text width=10em] () at ([shift={(4\base,1.5\base)}]aux51) {遗忘门\\$\mathbf{f}_t=\sigma(\mathbf{W}_f[\mathbf{h}_{t-1},\mathbf{x}_t]+\mathbf{b}_f)$};
+\node[formulanode,anchor=south east,text width=10em] () at ([shift={(4\base,1.5\base)}]aux51) {遗忘门\\$\vectorn{f}_t=\sigma(\vectorn{W}_f[\vectorn{h}_{t-1},\vectorn{x}_t]+\vectorn{b}_f)$};
 }
 {
 % input gate formula
-\node[formulanode,anchor=north east,text width=10em] () at ([shift={(4\base,-1.5\base)}]aux21) {输入门\\$\mathbf{i}_t=\sigma(\mathbf{W}_i[\mathbf{h}_{t-1},\mathbf{x}_t]+\mathbf{b}_i)$\\$\hat{\mathbf{c}}_t=\mathrm{tanh}(\mathbf{W}_c[\mathbf{h}_{t-1},\mathbf{x}_t]+\mathbf{b}_c)$};
+\node[formulanode,anchor=north east,text width=10em] () at ([shift={(4\base,-1.5\base)}]aux21) {输入门\\$\vectorn{i}_t=\sigma(\vectorn{W}_i[\vectorn{h}_{t-1},\vectorn{x}_t]+\vectorn{b}_i)$\\$\hat{\vectorn{c}}_t=\mathrm{tanh}(\vectorn{W}_c[\vectorn{h}_{t-1},\vectorn{x}_t]+\vectorn{b}_c)$};
 }
 {
 % cell update formula
-\node[formulanode,anchor=south west,text width=10em] () at ([shift={(-4\base,1.5\base)}]aux59) {记忆更新\\$\mathbf{c}_{t}=\mathbf{f}_t\cdot \mathbf{c}_{t-1}+\mathbf{i}_t\cdot \hat{\mathbf{c}}_t$};
+\node[formulanode,anchor=south west,text width=10em] () at ([shift={(-4\base,1.5\base)}]aux59) {记忆更新\\$\vectorn{c}_{t}=\vectorn{f}_t\cdot \vectorn{c}_{t-1}+\vectorn{i}_t\cdot \hat{\vectorn{c}}_t$};
 }
 {
 % output gate formula
-\node[formulanode,anchor=north west,text width=10em] () at ([shift={(-4\base,-1.5\base)}]aux29) {输出门\\$\mathbf{o}_t=\sigma(\mathbf{W}_o[\mathbf{h}_{t-1},\mathbf{x}_t]+\mathbf{b}_o)$\\$\mathbf{h}_{t}=\mathbf{o}_t\cdot \mathrm{tanh}(\mathbf{c}_{t})$};
+\node[formulanode,anchor=north west,text width=10em] () at ([shift={(-4\base,-1.5\base)}]aux29) {输出门\\$\vectorn{o}_t=\sigma(\vectorn{W}_o[\vectorn{h}_{t-1},\vectorn{x}_t]+\vectorn{b}_o)$\\$\vectorn{h}_{t}=\vectorn{o}_t\cdot \mathrm{tanh}(\vectorn{c}_{t})$};
 }
 \end{scope}
 \end{tikzpicture}
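
The gate formulas in this figure can be read with some shape bookkeeping. The sketch below assumes a hidden size $d$ and an input size $k$ (names introduced here, not in the figure), reads $[\cdot,\cdot]$ as concatenation and the "$\cdot$" in the memory update as an element-wise product, and uses a stand-in definition of `\vectorn`.

```latex
% Fragment for a document that loads amsmath and amssymb.
% Assumptions: d (hidden size) and k (input size) are illustrative names;
% \vectorn renders bold math; \odot stands for the element-wise product.
\providecommand{\vectorn}[1]{\mathbf{#1}}
\begin{align*}
\vectorn{h}_{t-1},\ \vectorn{c}_{t-1} \in \mathbb{R}^{d},\ \vectorn{x}_t \in \mathbb{R}^{k}
  &\ \Rightarrow\ [\vectorn{h}_{t-1},\vectorn{x}_t] \in \mathbb{R}^{d+k} \\
\vectorn{W}_f,\vectorn{W}_i,\vectorn{W}_c,\vectorn{W}_o \in \mathbb{R}^{d \times (d+k)},\
\vectorn{b}_f,\vectorn{b}_i,\vectorn{b}_c,\vectorn{b}_o \in \mathbb{R}^{d}
  &\ \Rightarrow\ \vectorn{f}_t,\ \vectorn{i}_t,\ \hat{\vectorn{c}}_t,\ \vectorn{o}_t \in \mathbb{R}^{d} \\
\vectorn{c}_t &= \vectorn{f}_t \odot \vectorn{c}_{t-1} + \vectorn{i}_t \odot \hat{\vectorn{c}}_t \\
\vectorn{f}_t \to \mathbf{1},\ \vectorn{i}_t \to \mathbf{0}
  &\ \Rightarrow\ \vectorn{c}_t \to \vectorn{c}_{t-1} \quad \text{(memory kept)} \\
\vectorn{f}_t \to \mathbf{0},\ \vectorn{i}_t \to \mathbf{1}
  &\ \Rightarrow\ \vectorn{c}_t \to \hat{\vectorn{c}}_t \quad \text{(memory overwritten)}
\end{align*}
```

Because the gates are sigmoid outputs, they only approach 0 and 1; the last two lines describe limiting behavior, not values the cell reaches exactly.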

--- a/Chapter10/Figures/figure-word-embedding-structure.tex
+++ b/Chapter10/Figures/figure-word-embedding-structure.tex
@@ -14,9 +14,9 @@
                    \node[rnnnode,minimum height=0.5\base,fill=green!30!white,anchor=west] (eemb\x) at ([xshift=0.4\base]eemb\y.east) {\tiny{$e_x()$}};
                \foreach \x in {1,2,...,3}
                    \node[rnnnode,fill=blue!30!white,anchor=south] (enc\x) at ([yshift=0.3\base]eemb\x.north) {};
-                    \node[] (enclabel1) at (enc1) {\tiny{$\textbf{h}_{m-2}$}};
+                    \node[] (enclabel1) at (enc1) {\tiny{$\vectorn{h}_{m-2}$}};
-                    \node[] (enclabel2) at (enc2) {\tiny{$\textbf{h}_{m-1}$}};
+                    \node[] (enclabel2) at (enc2) {\tiny{$\vectorn{h}_{m-1}$}};
-                    \node[rnnnode,fill=purple!30!white] (enclabel3) at (enc3) {\tiny{$\textbf{h}_{m}$}};
+                    \node[rnnnode,fill=purple!30!white] (enclabel3) at (enc3) {\tiny{$\vectorn{h}_{m}$}};
                \node[wordnode,left=0.4\base of enc1] (init1) {$\cdots$};
                \node[wordnode,left=0.4\base of eemb1] (init2) {$\cdots$};
@@ -28,7 +28,7 @@
                \foreach \x in {1,2,...,3}
                    \node[rnnnode,minimum height=0.5\base,fill=green!30!white,anchor=south] (demb\x) at ([yshift=\base]enc\x.north) {\tiny{$e_y()$}};
                \foreach \x in {1,2,...,3}
-                    \node[rnnnode,fill=blue!30!white,anchor=south] (dec\x) at ([yshift=0.3\base]demb\x.north) {{\tiny{$\textbf{s}_\x$}}};
+                    \node[rnnnode,fill=blue!30!white,anchor=south] (dec\x) at ([yshift=0.3\base]demb\x.north) {{\tiny{$\vectorn{s}_\x$}}};
                \foreach \x in {1,2,...,3}
                    \node[rnnnode,minimum height=0.5\base,fill=red!30!white,anchor=south] (softmax\x) at ([yshift=0.3\base]dec\x.north) {\tiny{Softmax}};
                \node[wordnode,right=0.4\base of demb3] (end1) {$\cdots$};

--- a/Chapter10/chapter10.tex
+++ b/Chapter10/chapter10.tex
--- a/Chapter2/Figures/figure-schematic-chain-rule.tex
+++ b/Chapter2/Figures/figure-schematic-chain-rule.tex
-%%% outline
-%-------------------------------------------------------------------------
-\begin{tikzpicture}
-\node [anchor=north west](num1)  at (0,0) {\large{A}};
-\node [anchor=north west](num2)  at ([xshift=5.8em,yshift=1.44em]num1.south west) {\large{B}};
-\node [anchor=north west](num3)  at ([xshift=5.8em,yshift=1.44em]num2.south west) {\large{C}};
-\node [anchor=north west](num4)  at ([xshift=5.8em,yshift=1.44em]num3.south west) {\large{D}};
-\node [anchor=north west](num5)  at ([xshift=0.04em,yshift=-2.5em]num3.south west) {\large{E}};
-\draw [<-,very thick,black] (num1.east)--(num2.west);
-\draw [->,very thick,black] (num2.east)--(num3.west);
-\draw [<-,very thick,black] (num3.east)--(num4.west);
-\draw [->,very thick,black] (num3.south)--(num5.north);
-\end{tikzpicture}
--- a/Chapter2/chapter2.tex
+++ b/Chapter2/chapter2.tex
@@ -41,7 +41,7 @@
 %----------------------------------------------------------------------------------------
 \subsection{随机变量和概率}
-\parinterval 在自然界中，很多{\small\bfnew{事件}}\index{事件}（Event）\index{Event}是否会发生是不确定的。例如，明天会下雨、掷一枚硬币是正面朝上、扔一个骰子的点数是1等。这些事件可能会发生也可能不会发生。通过大量的重复试验，能发现其具有某种规律性的事件叫做{\small\sffamily\bfseries{随机事件}}\index{随机事件}。
+\parinterval 在自然界中，很多{\small\bfnew{事件}}\index{事件}（Event）\index{Event}是否会发生是不确定的。例如，明天会下雨、掷一枚硬币是正面朝上、扔一个骰子的点数是1等。这些事件可能会发生也可能不会发生。通过大量的重复试验，能发现具有某种规律性的事件叫做{\small\sffamily\bfseries{随机事件}}\index{随机事件}。
 \parinterval {\small\sffamily\bfseries{随机变量}}\index{随机变量}（Random Variable）\index{Random Variable}是对随机事件发生可能状态的描述，是随机事件的数量表征。设$\Omega = \{ \omega \}$为一个随机试验的样本空间，$X=X(\omega)$就是定义在样本空间$\Omega$上的单值实数函数，即$X=X(\omega)$为随机变量，记为$X$。随机变量是一种能随机选取数值的变量，常用大写的英语字母或希腊字母表示，其取值通常用小写字母来表示。例如，用$A$ 表示一个随机变量，用$a$表示变量$A$的一个取值。根据随机变量可以选取的值的某些性质，可以将其划分为离散变量和连续变量。
@@ -62,7 +62,7 @@
 \begin{tabular}{c|c c c c c c}
 \rule{0pt}{15pt}     $A$ & $a_1=1$ & $a_2=2$ & $a_3=3$ & $a_4=4$ & $a_5=5$ & $a_6=6$\\
               \hline
-\rule{0pt}{15pt}     $\funp{P}_i$ & $\funp{P}_1=\frac{4}{25}$  &  $\funp{P}_2=\frac{3}{25}$ &  $\funp{P}_3=\frac{4}{25}$ & $\funp{P}_4=\frac{6}{25}$ & $\funp{P}_5=\frac{3}{25}$ & $\funp{P}_6=\frac{1}{25}$  \\
+\rule{0pt}{15pt}     $\funp{P}_i$ & $\funp{P}_1=\frac{4}{25}$  &  $\funp{P}_2=\frac{3}{25}$ &  $\funp{P}_3=\frac{4}{25}$ & $\funp{P}_4=\frac{6}{25}$ & $\funp{P}_5=\frac{3}{25}$ & $\funp{P}_6=\frac{5}{25}$  \\
             \end{tabular}
             \label{tab:2-1}
 \end{table}
@@ -70,7 +70,7 @@
 \parinterval 除此之外，概率函数$\funp{P}(\cdot)$还具有非负性、归一性等特点。非负性是指，所有的概率函数$\funp{P}(\cdot)$都必须是大于等于0的数值，概率函数中不可能出现负数，即$\forall{x},\funp{P}{(x)}\geq{0}$。归一性，又称规范性，简单的说就是所有可能发生的事件的概率总和为1，即$\sum_{x}\funp{P}{(x)}={1}$。
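 \parinterval As a quick check of the normalization property, and assuming the six values listed in Table~\ref{tab:2-1}, the probabilities sum to one:
 \begin{eqnarray}
 \sum_{i=1}^{6}\funp{P}_i & = & \frac{4+3+4+6+3+5}{25} \nonumber \\
                          & = & \frac{25}{25} = 1 \nonumber
 \end{eqnarray}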
-\parinterval 对于离散变量$A$，$\funp{P}(A=a)$是个确定的值，可以表示事件$A=a$的可能性大小；而对于连续变量，求在某个定点处的概率是无意义的，只能求其落在某个取值区间内的概率。因此，用{\small\sffamily\bfseries{概率分布函数}}\index{概率分布函数}$F(x)$和{\small\sffamily\bfseries{概率密度函数}}\index{概率密度函数}$f(x)$来统一描述随机变量取值的分布情况（如图\ref{fig:2-1}）。概率分布函数$F(x)$表示取值小于等于某个值的概率，是概率的累加（或积分）形式。假设$A$是一个随机变量，$a$是任意实数，将函数$F(a)=\funp{P}\{A\leq a\}$定义为$A$的分布函数。通过分布函数，可以清晰地表示任何随机变量的概率。
+\parinterval 对于离散变量$A$，$\funp{P}(A=a)$是个确定的值，可以表示事件$A=a$的可能性大小；而对于连续变量，求在某个定点处的概率是无意义的，只能求其落在某个取值区间内的概率。因此，用{\small\sffamily\bfseries{概率分布函数}}\index{概率分布函数}$F(x)$和{\small\sffamily\bfseries{概率密度函数}}\index{概率密度函数}$f(x)$来统一描述随机变量取值的分布情况（如图\ref{fig:2-1}）。概率分布函数$F(x)$表示取值小于等于某个值的概率，是概率的累加（或积分）形式。假设$A$是一个随机变量，$a$是任意实数，将函数$F(a)=\funp{P}\{A\leq a\}$定义为$A$的分布函数。通过分布函数，可以清晰地表示任何随机变量的概率分布情况。
 %----------------------------------------------
 \begin{figure}[htp]
@@ -81,7 +81,7 @@
 \end{figure}
 %-------------------------------------------
-\parinterval 概率密度函数反映了变量在某个区间内的概率变化快慢，概率密度函数的值是概率的变化率，该连续变量的概率也就是对概率密度函数求积分得到的结果。设$f(x) \geq 0$是连续变量$X$的概率密度函数，$X$的分布函数就可以用如下公式定义：
+\parinterval 概率密度函数反映了变量在某个区间内的概率变化快慢，概率密度函数的值是概率的变化率，该连续变量的概率分布函数也就是对概率密度函数求积分得到的结果。设$f(x) \geq 0$是连续变量$X$的概率密度函数，$X$的分布函数就可以用如下公式定义：
 \begin{eqnarray}
 F(x)=\int_{-\infty}^x f(x)\textrm{d}x
 \label{eq:2-1}
@@ -92,9 +92,9 @@ F(x)=\int_{-\infty}^x f(x)\textrm{d}x
 %----------------------------------------------------------------------------------------
 \subsection{联合概率、条件概率和边缘概率}
-\parinterval {\small\sffamily\bfseries{联合概率}}\index{联合概率}（Joint Probability）\index{Joint Probability}是指多个事件共同发生，每个随机变量满足各自条件的概率，表示为$\funp{P}(AB)$或$\funp{P}(A\cap{B})$。{\small\sffamily\bfseries{条件概率}}\index{条件概率}（Conditional Probability）\index{Conditional Probability}是指$A$、$B$为任意的两个事件，在事件$A$已出现的前提下，事件$B$出现的概率，使用$\funp{P}(B \mid A)$表示。
+\parinterval {\small\sffamily\bfseries{联合概率}}\index{联合概率}（Joint Probability）\index{Joint Probability}是指多个事件共同发生，每个随机变量满足各自条件的概率。如事件$A$和事件$B$的联合概率可以表示为$\funp{P}(AB)$或$\funp{P}(A\cap{B})$。{\small\sffamily\bfseries{条件概率}}\index{条件概率}（Conditional Probability）\index{Conditional Probability}是指$A$、$B$为任意的两个事件，在事件$A$已出现的前提下，事件$B$出现的概率，使用$\funp{P}(B \mid A)$表示。
-\parinterval 贝叶斯法则（见\ref{sec:2.2.3}小节）是条件概率计算时的重要依据，条件概率可以表示为
+\parinterval 贝叶斯法则（见\ref{sec:2.2.3}小节）是条件概率计算时的重要依据，条件概率可以表示为：
 \begin{eqnarray}
 \funp{P}{(B|A)} & = & \frac{\funp{P}(A\cap{B})}{\funp{P}(A)}  \nonumber \\
                           & = & \frac{\funp{P}(A)\funp{P}(B|A)}{\funp{P}(A)}  \nonumber \\
@@ -102,13 +102,13 @@ F(x)=\int_{-\infty}^x f(x)\textrm{d}x
 \label{eq:2-2}
 \end{eqnarray}
-\parinterval {\small\sffamily\bfseries{边缘概率}}\index{边缘概率}（Marginal Probability）\index{Marginal Probability}是和联合概率对应的，它指的是$\funp{P}(X=a)$或$\funp{P}(Y=b)$，即仅与单个随机变量有关的概率。对于离散随机变量$X$和$Y$，如果知道$\funp{P}(X,Y)$，则边缘概率$\funp{P}(X)$可以通过求和的方式得到。对于$\forall x \in X $，有
+\parinterval {\small\sffamily\bfseries{边缘概率}}\index{边缘概率}（Marginal Probability）\index{Marginal Probability}是和联合概率对应的，它指的是$\funp{P}(X=a)$或$\funp{P}(Y=b)$，即仅与单个随机变量有关的概率。对于离散随机变量$X$和$Y$，如果知道$\funp{P}(X,Y)$，则边缘概率$\funp{P}(X)$可以通过求和的方式得到。对于$\forall x \in X $，有：
 \begin{eqnarray}
 \funp{P}(X=x)=\sum_{y}  \funp{P}(X=x,Y=y)
 \label{eq:2-3}
 \end{eqnarray}
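 \parinterval As a small illustration, assume a purely hypothetical joint distribution in which $Y$ takes only the values 0 and 1, with $\funp{P}(X=0,Y=0)=0.1$ and $\funp{P}(X=0,Y=1)=0.3$. Summing over $y$ as in Equation \ref{eq:2-3} gives:
 \begin{eqnarray}
 \funp{P}(X=0) & = & \funp{P}(X=0,Y=0)+\funp{P}(X=0,Y=1) \nonumber \\
               & = & 0.1+0.3 = 0.4 \nonumber
 \end{eqnarray}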
-\parinterval 对于连续变量，边缘概率$\funp{P}(X)$需要通过积分得到，如下式所示
+\parinterval 对于连续变量，边缘概率$\funp{P}(X)$需要通过积分得到，如下式所示：
 \begin{eqnarray}
 \funp{P}(X=x)=\int \funp{P}(x,y)\textrm{d}y
 \label{eq:2-4}
@@ -148,38 +148,12 @@ F(x)=\int_{-\infty}^x f(x)\textrm{d}x
 \label{eq:2-5}
 \end{eqnarray}
-\parinterval 推广到$n$个事件，可以得到了{\small\bfnew{链式法则}}\index{链式法则}（Chain Rule\index{Chain Rule}）的公式
+\parinterval 推广到$n$个事件，可以得到{\small\bfnew{链式法则}}\index{链式法则}（Chain Rule\index{Chain Rule}）的公式：
 \begin{eqnarray}
 \funp{P}(x_1,x_2, \ldots ,x_n)=\funp{P}(x_1) \prod_{i=2}^n \funp{P}(x_i \mid x_1,x_2, \ldots ,x_{i-1})
 \label{eq:2-6}
 \end{eqnarray}
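 \parinterval As a minimal instance of Equation \ref{eq:2-6}, setting $n=3$ expands the joint probability into three factors, each conditioned on all preceding variables:
 \begin{eqnarray}
 \funp{P}(x_1,x_2,x_3)=\funp{P}(x_1) \cdot \funp{P}(x_2 \mid x_1) \cdot \funp{P}(x_3 \mid x_1,x_2) \nonumber
 \end{eqnarray}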
-\parinterval 下面的例子有助于更好的理解链式法则，如图\ref{fig:2-3}所示，$A$、$B$、$C$、$D$、$E$分别代表五个事件，其中，$A$只和$B$有关，$C$只和$B$、$D$有关，$E$只和$C$有关，$B$和$D$不依赖其他任何事件。则$P(A,B,C,D,E)$的表达式如下式：
-\begin{eqnarray}
-&   & \funp{P}(A,B,C,D,E) \nonumber \\
-&=&\funp{P}(E \mid A,B,C,D) \cdot \funp{P}(A,B,C,D) \nonumber \\
-&=&\funp{P}(E \mid A,B,C,D) \cdot \funp{P}(D \mid A,B,C) \cdot \funp{P}(A,B,C) \nonumber \\
-&=&\funp{P}(E \mid A,B,C,D) \cdot \funp{P}(D \mid A,B,C) \cdot \funp{P}(C \mid A,B) \cdot \funp{P}(A,B) \nonumber \\
-&=&\funp{P}(E \mid A,B,C,D) \cdot \funp{P}(D \mid A,B,C) \cdot \funp{P}(C \mid A,B) \cdot \funp{P}(B \mid A) \cdot \funp{P}(A)
-\label{eq:2-7}
-\end{eqnarray}
-\parinterval 根据图\ref {fig:2-3} 易知$E$只和$C$有关，所以$\funp{P}(E \mid A,B,C,D)=\funp{P}(E \mid C)$；$D$不依赖于其他事件，所以$\funp{P}(D \mid A,B,C)=\funp{P}(D)$；$C$只和$B$、$D$有关，所以$\funp{P}(C \mid A,B)=\funp{P}(C \mid B)$；$B$不依赖于其他事件，所以$\funp{P}(B \mid  A)=\funp{P}(B)$。最终化简可得：
-\begin{eqnarray}
-\funp{P}(A,B,C,D,E)=\funp{P}(E \mid C) \cdot \funp{P}(D) \cdot \funp{P}(C \mid B) \cdot \funp{P}(B)\cdot \funp{P}(A \mid B)
-\label{eq:2-8}
-\end{eqnarray}
-%----------------------------------------------
-\begin{figure}[htp]
-\centering
-\input{./Chapter2/Figures/figure-schematic-chain-rule}
-\setlength{\belowcaptionskip}{-1cm}
-\caption{事件$A$、$B$、$C$、$D$、$E$之间的关系图}
-\label{fig:2-3}
-\end{figure}
-%-------------------------------------------
 %----------------------------------------------------------------------------------------
 %    NEW SUB-SECTION
 %----------------------------------------------------------------------------------------
@@ -214,7 +188,7 @@ F(x)=\int_{-\infty}^x f(x)\textrm{d}x
 \label{eq:2-10}
 \end{eqnarray}
-\parinterval {\small\sffamily\bfseries{贝叶斯法则}}\index{贝叶斯法则}（Bayes' Rule）\index{Bayes' Rule}是概率论中的一个经典公式，通常用于已知$\funp{P}(A \mid B)$求$\funp{P}(B \mid A)$。可以表述为：设$\{B_1, \ldots ,B_n\}$是某个集合$\Sigma$的一个划分，$A$为事件，则对于$i=1, \ldots ,n$，有如下公式
+\parinterval {\small\sffamily\bfseries{贝叶斯法则}}\index{贝叶斯法则}（Bayes' Rule）\index{Bayes' Rule}是概率论中的一个经典公式，通常用于已知$\funp{P}(A \mid B)$求$\funp{P}(B \mid A)$。可以表述为：设$\{B_1, \ldots ,B_n\}$是某个集合$\Sigma$的一个划分，$A$为事件，则对于$i=1, \ldots ,n$，有如下公式：
 \begin{eqnarray}
 \funp{P}(B_i \mid A) & = & \frac {\funp{P}(A  B_i)} { \funp{P}(A) } \nonumber \\
                                   & = & \frac {\funp{P}(A \mid B_i)\funp{P}(B_i) } { \sum_{k=1}^n\funp{P}(A \mid B_k)\funp{P}(B_k) }
@@ -253,7 +227,7 @@ F(x)=\int_{-\infty}^x f(x)\textrm{d}x
 \label{eg:2-1}
 \end{example}
-\parinterval 在这两句话中，“太阳从东方升起”是一件确定性事件（在地球上），几乎不需要查阅更多信息就可以确认，因此这件事的信息熵相对较低；而“明天天气多云”这件事，需要关注天气预报，才能大概率确定这件事，它的不确定性很高，因而它的信息熵也就相对较高。因此，信息熵也是对事件不确定性的度量。进一步，定义{\small\bfnew{自信息}}\index{自信息}（Self-information）\index{Self-information}为一个事件$X$的自信息的表达式为：
+\parinterval 在这两句话中，“太阳从东方升起”是一件确定性事件（在地球上），几乎不需要查阅更多信息就可以确认，因此这件事的信息熵相对较低；而“明天天气多云”这件事，需要关注天气预报，才能大概率确定这件事，它的不确定性很高，因而它的信息熵也就相对较高。因此，信息熵也是对事件不确定性的度量。进一步，一个事件$X$的{\small\bfnew{自信息}}\index{自信息}（Self-information）\index{Self-information}的表达式为：
 \begin{eqnarray}
 \funp{I}(x)=-\log \funp{P}(x)
 \label{eq:2-13}
@@ -314,7 +288,7 @@ F(x)=\int_{-\infty}^x f(x)\textrm{d}x
 \label{eq:2-16}
 \end{eqnarray}
-\parinterval 结合相对熵公式可知，交叉熵是KL距离公式中的右半部分。因此，当概率分布$\funp{P}(x)$固定时，求关于$\funp{Q}$的交叉熵的最小值等价于求KL距离的最小值。从实践的角度来说，交叉熵与KL距离的目的相同：都是用来描述两个分布的差异，由于交叉熵计算上更加直观方便，因此在机器翻译中被广泛应用。
+\parinterval 结合相对熵公式可知，交叉熵是KL距离公式中的右半部分。因此，当概率分布$\funp{P}(x)$固定时，求关于$\funp{Q}$的交叉熵的最小值等价于求KL距离的最小值。从实践的角度来说，交叉熵与KL距离的目的相同：都是用来描述两个分布的差异。由于交叉熵计算上更加直观方便，因此在机器翻译中被广泛应用。
 %----------------------------------------------------------------------------------------
 %    NEW SECTION
@@ -336,7 +310,7 @@ F(x)=\int_{-\infty}^x f(x)\textrm{d}x
 \end{figure}
 %-------------------------------------------
-\parinterval 此时玩家的胜利似乎只能来源于运气。不过，这里的假设“随便选一个数字”本身就是一个概率模型，它对骰子的六个面的出现做了均匀分布假设。
+\parinterval 此时玩家的胜利似乎只能来源于运气。不过，这里的假设“随便选一个数字，获胜的概率是一样的”本身就是一个概率模型，它对骰子的六个面的出现做了均匀分布假设：
 \begin{eqnarray}
 \funp{P}(\text{1})=\funp{P}(\text{2})= \ldots =\funp{P}(\text{5})=\funp{P}(\text{6})=1/6
 \label{eq:2-17}
@@ -448,7 +422,7 @@ F(x)=\int_{-\infty}^x f(x)\textrm{d}x
 \label{eq:2-20}
 \end{eqnarray}
-\noindent 其中，$V$为词汇表。本质上，这个方法和计算单词出现概率$\funp{P}(w_i)$的方法是一样的。但是这里的问题是：当$m$较大时，词串$w_1 w_2 \ldots w_m$可能非常低频，甚至在数据中没有出现过。这时，由于$\textrm{count}(w_1 w_2 \ldots w_m) \approx 0$，公式\ref{eq:seq-mle}的结果会不准确，甚至产生0概率的情况。这是观测低频事件时经常出现的问题。对于这个问题，另一种概思路是对多个联合出现的事件进行独立性假设，这里可以假设$w_1$、$w_2\ldots w_m$的出现是相互独立的，于是
+\noindent 其中，$V$为词汇表。本质上，这个方法和计算单词出现概率$\funp{P}(w_i)$的方法是一样的。但是这里的问题是：当$m$较大时，词串$w_1 w_2 \ldots w_m$可能非常低频，甚至在数据中没有出现过。这时，由于$\textrm{count}(w_1 w_2 \ldots w_m) \approx 0$，公式\ref{eq:seq-mle}的结果会不准确，甚至产生0概率的情况。这是观测低频事件时经常出现的问题。对于这个问题，另一种思路是对多个联合出现的事件进行独立性假设，这里可以假设$w_1$、$w_2\ldots w_m$的出现是相互独立的，于是：
 \begin{eqnarray}
 \funp{P}(w_1 w_2 \ldots w_m) & = & \funp{P}(w_1) \funp{P}(w_2) \ldots \funp{P}(w_m) \label{eq:seq-independ}
 \label{eq:2-21}
@@ -481,7 +455,7 @@ F(x)=\int_{-\infty}^x f(x)\textrm{d}x
 \end{definition}
 %-------------------------------------------
-\parinterval 直接求$\funp{P}(w_1 w_2 \ldots w_m)$并不简单，因为如果把整个词串$w_1 w_2 \ldots w_m$作为一个变量，模型的参数量会非常大。$w_1 w_2 \ldots w_m$有$|V|^m$种可能性，这里$|V|$表示词汇表大小。显然，当$m$ 增大时，模型的复杂度会急剧增加，甚至都无法进行存储和计算。既然把$w_1 w_2 \ldots w_m$作为一个变量不好处理，就可以考虑对这个序列的生成过程进行分解。使用链式法则（见\ref{sec:chain-rule} 节），很容易得到
+\parinterval 直接求$\funp{P}(w_1 w_2 \ldots w_m)$并不简单，因为如果把整个词串$w_1 w_2 \ldots w_m$作为一个变量，模型的参数量会非常大。$w_1 w_2 \ldots w_m$有$|V|^m$种可能性，这里$|V|$表示词汇表大小。显然，当$m$ 增大时，模型的复杂度会急剧增加，甚至都无法进行存储和计算。既然把$w_1 w_2 \ldots w_m$作为一个变量不好处理，就可以考虑对这个序列的生成过程进行分解。使用链式法则（见\ref{sec:chain-rule} 节），很容易得到：
 \begin{eqnarray}
 \funp{P}(w_1 w_2 \ldots w_m)=\funp{P}(w_1)\funp{P}(w_2|w_1)\funp{P}(w_3|w_1 w_2) \ldots \funp{P}(w_m|w_1 w_2 \ldots w_{m-1})
 \label{eq:2-22}
@@ -515,7 +489,7 @@ F(x)=\int_{-\infty}^x f(x)\textrm{d}x
 \end{center}
 %------------------------------------------------------
-\parinterval 可以看到，1-gram语言模型只是$n$-gram语言模型的一种特殊形式。基于独立性假设，1-gram假定当前单词出现与否与任何历史都无关，这种方法大大化简了求解句子概率的复杂度。比如，上一节中公式\ref{eq:seq-independ}就是一个1-gram语言模型。但是，句子中的单词并非完全相互独立的，这种独立性假设并不能完美的描述客观世界的问题。如果需要更精确地获取句子的概率，就需要使用更长的“历史”信息，比如，2-gram、3-gram、甚至更高阶的语言模型。
+\parinterval 可以看到，1-gram语言模型只是$n$-gram语言模型的一种特殊形式。基于独立性假设，1-gram假定当前单词出现与否与任何历史都无关，这种方法大大化简了求解句子概率的复杂度。比如，上一节中公式\ref{eq:seq-independ}就是一个1-gram语言模型。但是，句子中的单词并非完全相互独立的，这种独立性假设并不能完美地描述客观世界的问题。如果需要更精确地获取句子的概率，就需要使用更长的“历史”信息，比如，2-gram、3-gram、甚至更高阶的语言模型。
 \parinterval $n$-gram的优点在于，它所使用的历史信息是有限的，即$n-1$个单词。这种性质也反映了经典的马尔可夫链的思想\upcite{liuke-markov-2004,resnick1992adventures}，有时也被称作马尔可夫假设或者马尔可夫属性。因此$n$-gram也可以被看作是变长序列上的一种马尔可夫模型，比如，2-gram语言模型对应着1阶马尔可夫模型，3-gram语言模型对应着2阶马尔可夫模型，以此类推。
@@ -537,11 +511,12 @@ F(x)=\int_{-\infty}^x f(x)\textrm{d}x
 \end{itemize}
 \vspace{0.5em}
-\parinterval 极大似然估计方法（基于频次的方法）和掷骰子游戏中介绍的统计词汇概率的方法是一致的，它的核心是使用$n$-gram出现的频次进行参数估计。基于人工神经网络的方法在近些年也非常受关注，它直接利用多层神经网络对问题的输入$w_{m-n+1} \ldots w_{m-1}$和输出$\funp{P}(w_m|w_{m-n+1}  \ldots  w_{m-1})$进行建模，而模型的参数通过网络中神经元之间连接的权重进行体现。严格意义上了来说，基于人工神经网络的方法并不算基于$n$-gram的方法，或者说它并没有显性记录$n$-gram的生成概率，也不依赖$n$-gram的频次进行参数估计。为了保证内容的连贯性，接下来仍以传统$n$-gram语言模型为基础进行讨论，基于人工神经网络的方法将会在{\chapternine}进行详细介绍。
+\parinterval 极大似然估计方法（基于频次的方法）和掷骰子游戏中介绍的统计词汇概率的方法是一致的，它的核心是使用$n$-gram出现的频次进行参数估计。基于人工神经网络的方法在近些年也非常受关注，它直接利用多层神经网络对问题的输入$w_{m-n+1} \ldots w_{m-1}$和输出$\funp{P}(w_m|w_{m-n+1}  \ldots  w_{m-1})$进行建模，而模型的参数通过网络中神经元之间连接的权重进行体现。严格来说，基于人工神经网络的方法并不算基于$n$-gram的方法，或者说它并没有显性记录$n$-gram的生成概率，也不依赖$n$-gram的频次进行参数估计。为了保证内容的连贯性，接下来仍以传统$n$-gram语言模型为基础进行讨论，基于人工神经网络的方法将会在{\chapternine}进行详细介绍。
 \parinterval $n$-gram语言模型的使用非常简单。可以直接用它来对词序列出现的概率进行计算。比如，可以使用一个2-gram语言模型计算一个句子出现的概率，其中单词之间用斜杠分隔，如下：
 \begin{eqnarray}
- & &\funp{P}_{2-\textrm{gram}}{(\textrm{确实/现在/数据/很多})} \nonumber \\
+ & &\funp{P}_{2-\textrm{gram}}{(\textrm{确实/现在/数据/很
+/多})} \nonumber \\
 &= & \funp{P}(\textrm{确实}) \times \funp{P}(\textrm{现在}|\textrm{确实})\times \funp{P}(\textrm{数据}|\textrm{现在}) \times \nonumber \\
 &  & \funp{P}(\textrm{很}|\textrm{数据})\times \funp{P}(\textrm{多}|\textrm{很})
 \label{eq:2-25}
@@ -555,9 +530,9 @@ F(x)=\int_{-\infty}^x f(x)\textrm{d}x
 \subsection{参数估计和平滑算法}
-对于$n$-gram语言模型，每个$\funp{P}(w_m|w_{m-n+1} \ldots w_{m-1})$都可以被看作是模型的{\small\bfnew{参数}}\index{参数}（Parameter\index{Parameter}）。而$n$-gram语言模型的一个核心任务是估计这些参数的值，即参数估计。通常，参数估计可以通过在数据上的统计得到。一种简单的方法是：给定一定数量的句子，统计每个$n$-gram 出现的频次，并利用公式\ref{eq:2-24}得到每个参数$\funp{P}(w_m|w_{m-n+1} \ldots w_{m-1})$的值。这个过程也被称作模型的{\small\bfnew{训练}}\index{训练}（Training\index{训练}）。对于自然语言处理任务来说，统计模型的训练是至关重要的。在本书后面的内容中也会看到，不同的问题可能需要不同的模型以及不同的模型训练方法。而很多研究工作也都集中在优化模型训练的效果上。
+对于$n$-gram语言模型，每个$\funp{P}(w_m|w_{m-n+1} \ldots w_{m-1})$都可以被看作是模型的{\small\bfnew{参数}}\index{参数}（Parameter\index{Parameter}）。而$n$-gram语言模型的一个核心任务是估计这些参数的值，即参数估计。通常，参数估计可以通过在数据上的统计得到。一种简单的方法是：给定一定数量的句子，统计每个$n$-gram 出现的频次，并利用公式\ref{eq:2-24}得到每个参数$\funp{P}(w_m|w_{m-n+1} \ldots w_{m-1})$的值。这个过程也被称作模型的{\small\bfnew{训练}}\index{训练}（Training\index{Training}）。对于自然语言处理任务来说，统计模型的训练是至关重要的。在本书后面的内容中也会看到，不同的问题可能需要不同的模型以及不同的模型训练方法，并且很多研究工作也都集中在优化模型训练的效果上。
-\parinterval 回到$n$-gram语言模型上。前面所使用的参数估计方法并不完美，因为它无法很好的处理低频或者未见现象。比如，在式\ref{eq:2-25}所示的例子中，如果语料中从没有“确实”和“现在”两个词连续出现的情况，即$\textrm{count}(\textrm{确实}\ \textrm{现在})=0$。 那么使用2-gram 计算句子“确实/现在/数据/很多”的概率时，会出现如下情况
+\parinterval 回到$n$-gram语言模型上。前面所使用的参数估计方法并不完美，因为它无法很好地处理低频或者未见现象。比如，在式\ref{eq:2-25}所示的例子中，如果语料中从没有“确实”和“现在”两个词连续出现的情况，即$\textrm{count}(\textrm{确实}\ \textrm{现在})=0$，那么使用2-gram 计算句子“确实/现在/数据/很/多”的概率时，会出现如下情况：
 \begin{eqnarray}
 \funp{P}(\textrm{现在}|\textrm{确实}) & =  & \frac{\textrm{count}(\textrm{确实}\ \textrm{现在})}{\textrm{count}(\textrm{确实})} \nonumber \\
                                                                     & =  & \frac{0}{\textrm{count}(\textrm{确实})} \nonumber \\
@@ -595,7 +570,7 @@ F(x)=\int_{-\infty}^x f(x)\textrm{d}x
 \label{eq:2-27}
 \end{eqnarray}
-\noindent 其中，$V$表示词表，$|V|$为词表中单词的个数，$w$为词表中的一个词。有时候，加法平滑方法会将$\theta$取1，这时称之为加一平滑或是拉普拉斯平滑。这种方法比较容易理解，也比较简单，因此也往往被用于对系统的快速原型中。
+\noindent 其中，$V$表示词表，$|V|$为词表中单词的个数，$w$为词表中的一个词，count表示统计单词或短语出现的次数。有时候，加法平滑方法会将$\theta$取1，这时称之为加一平滑或是拉普拉斯平滑。这种方法比较容易理解，也比较简单，因此也往往被用于系统的快速原型中。
 \parinterval 举一个例子。假设在一个英语文档中随机采样一些单词（词表大小$|V|=20$），各个单词出现的次数为：“look”出现4次，“people”出现3次，“am”出现2次，“what”出现1次，“want”出现1次，“do”出现1次。图\ref{fig:2-12} 给出了在平滑之前和平滑之后的概率分布。
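 \parinterval As a worked sketch of this example, assume the 1-gram form of additive smoothing with $\theta=1$, i.e., $\funp{P}(w)=\frac{\textrm{count}(w)+\theta}{\sum_{w^{\prime}}\textrm{count}(w^{\prime})+\theta|V|}$ (both this form and the choice of $\theta$ are assumptions made here for illustration). The six observed words account for $4+3+2+1+1+1=12$ occurrences, so:
 \begin{eqnarray}
 \funp{P}(\textrm{look}) & = & \frac{4+1}{12+20} = \frac{5}{32} \nonumber \\
 \funp{P}(\textrm{what}) & = & \frac{1+1}{12+20} = \frac{2}{32} \nonumber \\
 \funp{P}(w_{\textrm{unseen}}) & = & \frac{0+1}{12+20} = \frac{1}{32} \nonumber
 \end{eqnarray}
 \noindent Without smoothing, the corresponding estimates would be $4/12$, $1/12$ and $0$. Under the same assumptions the smoothed distribution still sums to one: the six observed words contribute $18/32$ and the 14 unseen words contribute $14/32$.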
@@ -617,25 +592,25 @@ F(x)=\int_{-\infty}^x f(x)\textrm{d}x
 \vspace{-0.5em}
 \parinterval {\small\bfnew{古德-图灵估计法}}\index{古德-图灵估计法}（Good-Turing Estimate）\index{Good-Turing Estimate}是Alan Turing和他的助手Irving John Good开发的，作为他们在二战期间破解德国密码机Enigma所使用的方法的一部分，在1953 年Irving John Good将其发表。这一方法也是很多平滑算法的核心，其基本思路是：把非零的$n$元语法单元的概率降低匀给一些低概率$n$元语法单元，以减小最大似然估计与真实概率之间的偏离\upcite{good1953population,gale1995good}。
-\parinterval 假定在语料库中出现$r$次的$n$-gram有$n_r$个，特别的，出现0次的$n$-gram（即未登录词及词串）出现的次数为$n_0$个。语料库中全部单词的总个数为$N$，显然
+\parinterval 假定在语料库中出现$r$次的$n$-gram有$n_r$个，特别地，出现0次的$n$-gram（即未登录词及词串）有$n_0$个。语料库中全部$n$-gram出现的总次数为$N$，显然：
 \begin{eqnarray}
 N = \sum_{r=1}^{\infty}{r\,n_r}
 \label{eq:2-28}
 \end{eqnarray}
-\parinterval 这时，出现$r$次的$n$-gram的相对频率为$r/N$，也就是不做平滑处理时的概率估计。为了解决零概率问题，对于任何一个出现$r$次的$n$-gram，古德-图灵估计法利用出现$r+1$次的$n$-gram统计量重新假设它出现$r^*$次，这里
+\parinterval 这时，出现$r$次的$n$-gram的相对频率为$r/N$，也就是不做平滑处理时的概率估计。为了解决零概率问题，对于任何一个出现$r$次的$n$-gram，古德-图灵估计法利用出现$r+1$次的$n$-gram统计量重新假设它出现$r^*$次：
 \begin{eqnarray}
 r^* = (r + 1)\frac{n_{r + 1}}{n_r}
 \label{eq:2-29}
 \end{eqnarray}
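 \parinterval As a small worked sketch of Equation \ref{eq:2-29}, assume a hypothetical corpus with $n_1=6$, $n_2=2$ and $n_3=1$ (these counts are invented for illustration). The re-estimated counts for $r=1$ and $r=2$ are then both smaller than the raw counts:
 \begin{eqnarray}
 r^*\big|_{r=1} & = & (1+1)\frac{n_2}{n_1} = 2 \times \frac{2}{6} = \frac{2}{3} \nonumber \\
 r^*\big|_{r=2} & = & (2+1)\frac{n_3}{n_2} = 3 \times \frac{1}{2} = \frac{3}{2} \nonumber
 \end{eqnarray}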
-\parinterval 基于这个公式，就可以估计所有0次$n$-gram的频次$n_0 r^*=(r+1)n_1=n_1$。要把这个重新估计的统计数转化为概率，需要进行归一化处理：对于每个统计数为$r$的事件，其概率为
+\parinterval 基于这个公式，就可以估计所有0次$n$-gram的频次：令$r=0$，有$n_0 r^*=(0+1)n_1=n_1$。要把这个重新估计的统计数转化为概率，需要进行归一化处理：对于每个统计数为$r$的事件，其概率为：
 \begin{eqnarray}
 \funp{P}_r=\frac{r^*}{N}
 \label{eq:2-30}
 \end{eqnarray}
-\noindent 其中
+\noindent 其中：
 \begin{eqnarray}
 N & = & \sum_{r=0}^{\infty}{r^{*}n_r} \nonumber \\
  & = & \sum_{r=0}^{\infty}{(r + 1)n_{r + 1}} \nonumber \\
@@ -687,11 +662,11 @@ N & = & \sum_{r=0}^{\infty}{r^{*}n_r} \nonumber \\
 \parinterval 首先介绍一下Absolute Discounting平滑算法，公式如下所示：
 \begin{eqnarray}
-\funp{P}_{\textrm{AbsDiscount}}(w_i | w_{i-1}) = \frac{c(w_{i-1},w_i )-d}{c(w_{i-1})} + \lambda(w_{i-1})\funp{P}(w)
+\funp{P}_{\textrm{AbsDiscount}}(w_i | w_{i-1}) = \frac{c(w_{i-1},w_i )-d}{c(w_{i-1})} + \lambda(w_{i-1})\funp{P}(w_{i})
 \label{eq:2-33}
 \end{eqnarray}
-\noindent 其中$d$表示被裁剪的值，$\lambda$是一个正则化常数。可以看到第一项是经过减值调整过的2-gram的概率值，第二项则相当于一个带权重$\lambda$的1-gram的插值项。然而这种插值模型极易受到原始1-gram 模型的干扰。
+\noindent 其中$d$表示被裁剪的值，$\lambda$是一个正则化常数，$c(\cdot)$是count$(\cdot)$的缩写。可以看到第一项是经过减值调整过的2-gram的概率值，第二项则相当于一个带权重$\lambda$的1-gram的插值项。然而这种插值模型极易受到原始1-gram 模型的干扰。
 \parinterval 假设这里使用2-gram和1-gram的插值模型预测下面句子中下划线处的词
@@ -707,29 +682,29 @@ I cannot see without my reading \underline{\ \ \ \ \ \ \ \ }
 \parinterval 为了评估$\funp{P}_{\textrm{cont}}$，统计使用当前词作为第二个词所出现2-gram的种类，2-gram种类越多，这个词作为第二个词出现的可能性越高，呈正比：
 \begin{eqnarray}
-\funp{P}_{\textrm{cont}}(w_i) \varpropto |w_{i-1}: c(w_{i-1} w_i )>0|
+\funp{P}_{\textrm{cont}}(w_i) \varpropto |\{ w_{i-1}: c(w_{i-1},w_i )>0 \}|
 \label{eq:2-34}
 \end{eqnarray}
-通过全部的二元语法的种类做归一化可得到评估的公式
+通过全部的二元语法的种类做归一化可得到评估的公式：
 \begin{eqnarray}
-\funp{P}_{\textrm{cont}}(w_i) = \frac{|\{ w_{i-1}:c(w_{i-1} w_i )>0 \}|}{|\{ (w_{j-1}, w_j):c(w_{j-1}w_j )>0 \}|}
+\funp{P}_{\textrm{cont}}(w_i) = \frac{|\{ w_{i-1}:c(w_{i-1},w_i )>0 \}|}{|\{ (w_{j-1}, w_j):c(w_{j-1},w_j )>0 \}|}
 \label{eq:2-35}
 \end{eqnarray}
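 \parinterval As a toy illustration, assume a hypothetical corpus containing exactly four distinct 2-gram types: ``San Francisco'', ``reading glasses'', ``my glasses'' and ``sun glasses''. Under this assumption, Equation \ref{eq:2-35} gives:
 \begin{eqnarray}
 \funp{P}_{\textrm{cont}}(\textrm{Francisco}) & = & \frac{|\{\textrm{San}\}|}{4} = \frac{1}{4} \nonumber \\
 \funp{P}_{\textrm{cont}}(\textrm{glasses}) & = & \frac{|\{\textrm{reading},\textrm{my},\textrm{sun}\}|}{4} = \frac{3}{4} \nonumber
 \end{eqnarray}
 \noindent In this sketch, ``glasses'' follows three different words while ``Francisco'' follows only one, so ``glasses'' receives the larger continuation probability regardless of how often each 2-gram occurs.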
-\parinterval 基于分母的变化还有另一种形式
+\parinterval 基于分母的变化还有另一种形式：
 \begin{eqnarray}
-\funp{P}_{\textrm{cont}}(w_i) = \frac{|\{ w_{i-1}:c(w_{i-1} w_i )>0 \}|}{\sum_{w^{\prime}}|\{ w_{i-1}^{\prime}:c(w_{i-1}^{\prime} w_i^{\prime} )>0 \}|}
+\funp{P}_{\textrm{cont}}(w_i) = \frac{|\{ w_{i-1}:c(w_{i-1},w_i )>0 \}|}{\sum_{w^{\prime}_{i}}|\{ w_{i-1}^{\prime}:c(w_{i-1}^{\prime},w_i^{\prime} )>0 \}|}
 \label{eq:2-36}
 \end{eqnarray}
-结合基础的Absolute discounting计算公式，从而得到了Kneser-Ney平滑方法的公式
+结合基础的Absolute discounting计算公式，从而得到了Kneser-Ney平滑方法的公式：
 \begin{eqnarray}
 \funp{P}_{\textrm{KN}}(w_i|w_{i-1}) = \frac{\max(c(w_{i-1},w_i )-d,0)}{c(w_{i-1})}+ \lambda(w_{i-1})\funp{P}_{\textrm{cont}}(w_i)
 \label{eq:2-37}
 \end{eqnarray}
-\noindent 其中
+\noindent 其中：
 \begin{eqnarray}
 \lambda(w_{i-1}) = \frac{d}{c(w_{i-1})}|\{w:c(w_{i-1},w)>0\}|
 \label{eq:2-38}
@@ -737,14 +712,14 @@ I cannot see without my reading \underline{\ \ \ \ \ \ \ \ }
 \noindent 这里$\max(\cdot)$保证了分子部分为不小于0的数，原始1-gram更新成$\funp{P}_{\textrm{cont}}$概率分布，$\lambda$是正则化项。
-\parinterval 为了更具普适性，不仅局限为2-gram和1-gram的插值模型，利用递归的方式可以得到更通用的Kneser-Ney平滑公式
+\parinterval 为了更具普适性，不局限于2-gram和1-gram的插值模型，利用递归的方式可以得到更通用的Kneser-Ney平滑公式：
 \begin{eqnarray}
-\funp{P}_{\textrm{KN}}(w_i|w_{i-n+1}  \ldots w_{i-1}) & = & \frac{\max(c_{\textrm{KN}}(w_{i-n+1} \ldots w_{i-1})-d,0)}{c_{\textrm{KN}}(w_{i-n+1} \ldots w_{i-1})} + \nonumber \\
+\funp{P}_{\textrm{KN}}(w_i|w_{i-n+1}  \ldots w_{i-1}) & = & \frac{\max(c_{\textrm{KN}}(w_{i-n+1} \ldots w_{i})-d,0)}{c_{\textrm{KN}}(w_{i-n+1} \ldots w_{i-1})} + \nonumber \\
                                                   &   &  \lambda(w_{i-n+1} \ldots w_{i-1})\funp{P}_{\textrm{KN}}(w_i|w_{i-n+2} \ldots w_{i-1})
 \label{eq:2-39}
 \end{eqnarray}
 \begin{eqnarray}
-\lambda(w_{i-1}) =  \frac{d}{c_{\textrm{KN}}(w_{i-n+1}^{i-1})}|\{w:c_{\textrm{KN}}(w_{i-n+1} \ldots w_{i-1}w)>0\}
+\lambda(w_{i-n+1} \ldots w_{i-1}) =  \frac{d}{c_{\textrm{KN}}(w_{i-n+1} \ldots w_{i-1})}|\{w:c_{\textrm{KN}}(w_{i-n+1} \ldots w_{i-1},w)>0\}|
 \label{eq:2-40}
 \end{eqnarray}
 \begin{eqnarray}
@@ -779,7 +754,7 @@ c_{\textrm{KN}}(\cdot) = \left\{\begin{array}{ll}
 \begin{itemize}
 \vspace{0.5em}
-\item 预测输入句子的可能性。比如，有如下两个句子，
+\item 预测输入句子的可能性。比如，有如下两个句子
 \vspace{0.8em}
 \hspace{10em} The boy caught the cat.
@@ -821,7 +796,7 @@ c_{\textrm{KN}}(\cdot) = \left\{\begin{array}{ll}
 \noindent 这里$\arg$即argument（参数），$\argmax_x f(x)$表示返回使$f(x)$达到最大的$x$。$\argmax_{w \in \chi}$\\$\funp{P}(w)$表示找到使语言模型得分$\funp{P}(w)$达到最大的单词序列$w$。$\chi$ 是搜索问题的解空间，它是所有可能的单词序列$w$的集合。$\hat{w}$可以被看做该搜索问题中的“最优解”，即概率最大的单词序列。
-\parinterval 在序列生成任务中，最简单的策略就是对词表中的词汇进行任意组合，通过这种枚举的方式得到全部可能的序列。但是，很多时候并生成序列的长度是无法预先知道的。比如，机器翻译中目标语序列的长度是任意的。那么怎样判断一个序列何时完成了生成过程呢？这里借用人类书写中文和英文的过程：句子的生成首先从一片空白开始，然后从左到右逐词生成，除了第一个单词，所有单词的生成都依赖于前面已经生成的单词。为了方便计算机实现，通常定义单词序列从一个特殊的符号<sos>后开始生成。同样地，一个单词序列的结束也用一个特殊的符号<eos>来表示。
+\parinterval 在序列生成任务中，最简单的策略就是对词表中的词汇进行任意组合，通过这种枚举的方式得到全部可能的序列。但是，很多时候生成序列的长度是无法预先知道的。比如，机器翻译中目标语序列的长度是任意的。那么怎样判断一个序列何时完成了生成过程呢？这里借用现代人类书写中文和英文的过程：句子的生成首先从一片空白开始，然后从左到右逐词生成，除了第一个单词，所有单词的生成都依赖于前面已经生成的单词。为了方便计算机实现，通常定义单词序列从一个特殊的符号$<$sos$>$后开始生成。同样地，一个单词序列的结束也用一个特殊的符号$<$eos$>$来表示。
 \parinterval 对于一个序列$<$sos$>$\ I\ agree\ $<$eos$>$，图\ref{fig:2-13}展示语言模型视角下该序列的生成过程。该过程通过在序列的末尾不断附加词表中的单词来逐渐扩展序列，直到这段序列结束。这种生成单词序列的过程被称作{\small\bfnew{自左向右生成}}\index{自左向右生成}（Left-to-right Generation）\index{Left-to-right Generation}。注意，这种序列生成策略与$n$-gram的思想天然契合，因为$n$-gram语言模型中，每个词的生成概率依赖前面（左侧）若干词，因此$n$-gram语言模型也是一种自左向右的计算模型。
@@ -919,7 +894,7 @@ c_{\textrm{KN}}(\cdot) = \left\{\begin{array}{ll}
 \end{figure}
 %-------------------------------------------
-\parinterval 这样，语言模型的打分与解空间树的遍历就融合在一起了。于是，序列生成的问题可以被重新描述为：寻找所有单词序列组成的解空间树中权重总和最大的一条路径。在这个定义下，前面提到的两种枚举词序列的方法就是经典的{\small\bfnew{深度优先搜索}}\index{深度优先搜索}（Depth-first Search）\index{Depth-first Search}和{\small\bfnew{宽度优先搜索}}\index{宽度优先搜索}（Breadth-first Search）\index{Breadth-first Search}的雏形\upcite{even2011graph,tarjan1972depth}。在后面的内容中可以看到，从遍历解空间树的角度出发，可以对原始这些搜索策略的效率进行优化。
+\parinterval 这样，语言模型的打分与解空间树的遍历就融合在一起了。于是，序列生成的问题可以被重新描述为：寻找所有单词序列组成的解空间树中权重总和最大的一条路径。在这个定义下，前面提到的两种枚举词序列的方法就是经典的{\small\bfnew{深度优先搜索}}\index{深度优先搜索}（Depth-first Search）\index{Depth-first Search}和{\small\bfnew{宽度优先搜索}}\index{宽度优先搜索}（Breadth-first Search）\index{Breadth-first Search}的雏形\upcite{even2011graph,tarjan1972depth}。在后面的内容中，从遍历解空间树的角度出发，可以对这些原始搜索策略的效率进行优化。
 %----------------------------------------------------------------------------------------
 %    NEW SUB-SECTION
@@ -1025,7 +1000,7 @@ c_{\textrm{KN}}(\cdot) = \left\{\begin{array}{ll}
 \end{figure}
 %-------------------------------------------
-\parinterval 束搜索也有很多的改进版本。回忆一下，在无信息搜索策略中可以使用剪枝技术来提升搜索的效率。而实际上，束搜索本身也是一种剪枝方法。因此有时也把束搜索称作{\small\bfnew{束剪枝}}\index{束剪枝}（Beam Pruning）\index{Beam Pruning}。在这里有很多其它的剪枝策略可供选择，例如可以只保留与当前最佳路径得分相差在$\theta$之内的路径，也就是搜索只保留得分差距在一定范围内的路径，这种方法也被称作{\small\bfnew{直方图剪枝}}\index{直方图剪枝}（Histogram Pruning）\index{Histogram Pruning}。
+\parinterval 束搜索也有很多的改进版本。回忆一下，在无信息搜索策略中可以使用剪枝技术来提升搜索的效率。而实际上，束搜索本身也是一种剪枝方法。因此有时也把束搜索称作{\small\bfnew{束剪枝}}\index{束剪枝}（Beam Pruning）\index{Beam Pruning}。在这里有很多其它的剪枝策略可供选择，例如可以只保留与当前最佳路径得分相差在$\theta$之内的路径，也就是进行搜索时只保留得分差距在一定范围内的路径，这种方法通常被称作{\small\bfnew{阈值剪枝}}\index{阈值剪枝}（Threshold Pruning）\index{Threshold Pruning}。
 \parinterval 对于语言模型来说，当多个路径中最高得分比当前搜索到的最好的解的得分低时，可以立刻停止搜索。因为此时序列越长语言模型得分$\log \funp{P}(w_1 w_2 \ldots w_m)$会越低，继续扩展这些路径不会产生更好的结果。这个技术通常也被称为{\small\bfnew{最佳停止条件}}\index{最佳停止条件}（Optimal Stopping Criteria）\index{Optimal Stopping Criteria}。类似的思想也被用于机器翻译等任务\upcite{DBLP:conf/emnlp/HuangZM17,DBLP:conf/emnlp/Yang0M18}。
@@ -1051,7 +1026,7 @@ c_{\textrm{KN}}(\cdot) = \left\{\begin{array}{ll}
 \vspace{0.5em}
 \item 本章更多地关注了语言模型的基本问题和求解思路，但是基于$n$-gram的方法并不是语言建模的唯一方法。从现在自然语言处理的前沿看，端到端的深度学习方法在很多任务中都取得了领先的性能。语言模型同样可以使用这些方法\upcite{jing2019a}，而且在近些年取得了巨大成功。例如，最早提出的前馈神经语言模型\upcite{bengio2003a}和后来的基于循环单元的语言模型\upcite{mikolov2010recurrent}、基于长短期记忆单元的语言模型\upcite{sundermeyer2012lstm}以及现在非常流行的Transformer\upcite{vaswani2017attention}。 关于神经语言模型的内容，会在{\chapternine}进行进一步介绍。
 \vspace{0.5em}
-\item 最后，本章结合语言模型的序列生成任务对搜索技术进行了介绍。类似地，机器翻译任务也需要从大量的翻译后选中快速寻找最优译文。因此在机器翻译任务中也使用了搜索方法，这个过程通常被称作{\small\bfnew{解码}}\index{解码}（Decoding）\index{Decoding}。例如，有研究者在基于词的翻译模型中尝试使用启发式搜索\upcite{DBLP:conf/acl/OchUN01,DBLP:conf/acl/WangW97,tillmann1997a}以及贪婪搜索方法\upcite{germann2001fast}\upcite{germann2003greedy}，也有研究者研究基于短语的栈解码方法\upcite{Koehn2007Moses,DBLP:conf/amta/Koehn04}。此外，解码方法还包括有限状态机解码\upcite{bangalore2001a}\upcite{DBLP:journals/mt/BangaloreR02}以及基于语言学约束的解码\upcite{venugopal2007an,zollmann2007the,liu2006tree,galley2006scalable,chiang2005a}。相关内容将在{\chaptereight} 和{\chapterfourteen} 进行介绍。
+\item 最后，本章结合语言模型的序列生成任务对搜索技术进行了介绍。类似地，机器翻译任务也需要从大量的翻译候选中快速寻找最优译文。因此在机器翻译任务中也使用了搜索方法，这个过程通常被称作{\small\bfnew{解码}}\index{解码}（Decoding）\index{Decoding}。例如，有研究者在基于词的翻译模型中尝试使用启发式搜索\upcite{DBLP:conf/acl/OchUN01,DBLP:conf/acl/WangW97,tillmann1997a}以及贪婪搜索方法\upcite{germann2001fast}\upcite{germann2003greedy}，也有研究者研究基于短语的栈解码方法\upcite{Koehn2007Moses,DBLP:conf/amta/Koehn04}。此外，解码方法还包括有限状态机解码\upcite{bangalore2001a}\upcite{DBLP:journals/mt/BangaloreR02}以及基于语言学约束的解码\upcite{venugopal2007an,zollmann2007the,liu2006tree,galley2006scalable,chiang2005a}。相关内容将在{\chaptereight} 和{\chapterfourteen} 进行介绍。
 \vspace{0.5em}
 \end{itemize}
 \end{adjustwidth}
--- a/Chapter4/Figures/representation-of-reference-answer-set-in-hyter.tex
+++ b/Chapter4/Figures/representation-of-reference-answer-set-in-hyter.tex
@@ -4,7 +4,7 @@
 		\node[unit] (u1)at (0,0){};
 		\node[unit,anchor=west](u2) at ([xshift=7em]u1.east){};
 		\node[unit,anchor=west](u3) at ([xshift=1.5em]u2.east){};
-		\node[unit,anchor=west](u4) at ([xshift=8em]u3.east){};
+		\node[unit,anchor=west](u4) at ([xshift=5em]u3.east){};
 		\node[unit,anchor=west](u5) at ([xshift=1.5em]u4.east){};
 		\node[unit,anchor=west](u6) at ([xshift=5em]u5.east){};
 		\node[unit,anchor=west,line width=1.5pt](u7) at ([xshift=2em]u6.east){};
@@ -14,7 +14,7 @@
 		\draw[->,red,line width=1.5pt](u1.east)-- node[inner sep=0pt,color=red,above]{\footnotesize the approval rate}(u2.west);
 		\draw[->,out=-30,in=-150,red,line width=1.5pt] (u1.south east) to  node[inner sep=0pt,color=red,below]{\footnotesize the approval level}(u2.south west);
 		\draw[->,line width=1.5pt](u2.east) -- node[above]{\footnotesize for} (u3.west);
-		\draw[->,line width=1.5pt](u3.east) -- node[above]{\footnotesize national football team} (u4.west);
+		\draw[->,line width=1.5pt](u3.east) -- node[above]{\footnotesize the proposal} (u4.west);
 		\draw[->,line width=1.5pt](u4.east) -- node[above]{\footnotesize was} (u5.west);
 		\draw[->,out=40,in=140,blue,line width=1.5pt] (u5.north east) to  node[inner sep=0pt,color=blue,above]{\footnotesize practically}(u6.north west);
 		\draw[->,blue,line width=1.5pt](u5.east)-- node[inner sep=0pt,color=blue,above]{\footnotesize close to}(u6.west);

--- a/Chapter4/chapter4.tex
+++ b/Chapter4/chapter4.tex
@@ -131,7 +131,7 @@
 %    NEW SUB-SECTION
 %----------------------------------------------------------------------------------------
-\subsection{打分标准}
+\subsection{打分标准} \label{sec:human-eval-scoring}
 \parinterval 如何对译文进行打分是机器翻译评价的核心问题。在人工评价方法中，一种被广泛使用的方法是{\small\sffamily\bfseries{直接评估}}\index{直接评估}（Direct Assessment，DA）\index{Direct Assessment}\upcite{DBLP:conf/amta/WhiteOO94}，这种评价方法需要评价者给出对机器译文的绝对评分：在给定一个机器译文和一个参考答案的情况下，评价者直接给出1-100的分数用来表征机器译文的质量。与其类似的策略是对机器翻译质量进行等级评定\upcite{DBLP:journals/mt/PrzybockiPBS09}，常见的是在5级或7级标准中指定单一等级用以反映机器翻译质量。也有研究者提出利用语言测试技术对机器翻译质量进行评价\upcite{reeder2006direct}，其中涉及多等级内容的评价：第一等级测试简单的短语、成语、词汇等；第二等级利用简单的句子测试机器翻译在简单文本上的表现；第三等级利用稍复杂的句子测试机器翻译在复杂语法结构上的表现；第四等级测试引入更加复杂的补语结构和附加语等等。
@@ -222,9 +222,9 @@ Candidate：cat is standing in the ground
 %    NEW SUBSUB-SECTION
 %----------------------------------------------------------------------------------------
-\subsubsection{2.基于$\bm{n}$-gram的方法}
+\subsubsection{2.基于$\bm{n}$-gram的方法} \label{sec:ngram-eval}
-\parinterval BLUE是目前使用最广泛的自动评价指标。BLEU 是Bilingual Evaluation Understudy的缩写，由IBM 的研究人员在2002 年提出\upcite{DBLP:conf/acl/PapineniRWZ02}。通过采用$n$-gram匹配的方式评定机器翻译结果和参考答案之间的相似度，机器译文越接近参考答案就认定它的质量越高。$n$-gram是指$n$个连续单词组成的单元，称为{\small\sffamily\bfseries{$\bm{n}$元语法单元}}\index{$\bm{n}$元语法单元}（见{\chapterthree}）。$n$越大表示评价时考虑的匹配片段越大。
+\parinterval BLEU是目前使用最广泛的自动评价指标。BLEU 是Bilingual Evaluation Understudy的缩写，由IBM 的研究人员在2002 年提出\upcite{DBLP:conf/acl/PapineniRWZ02}。通过采用$n$-gram匹配的方式评定机器翻译结果和参考答案之间的相似度，机器译文越接近参考答案就认定它的质量越高。$n$-gram是指$n$个连续单词组成的单元，称为{\small\sffamily\bfseries{$\bm{n}$元语法单元}}\index{$\bm{n}$元语法单元}（见{\chapterthree}）。$n$越大表示评价时考虑的匹配片段越大。
 \parinterval BLEU 的计算首先考虑待评价机器译文中$n$-gram在参考答案中的匹配率，称为{\small\sffamily\bfseries{$\bm{n}$-gram准确率}}\index{$\bm{n}$-gram准确率}（$n$-gram Precision）\index{$n$-gram Precision}。其计算方法如下：
 \begin{eqnarray}
@@ -243,7 +243,7 @@ Candidate：the the the the
 \parinterval 在引入截断方式之前，该译文的1-gram准确率为4/4 = 1，这显然是不合理的。在引入截断的方式之后，``the'' 在译文中出现4 次，在参考答案中出现2 次，截断操作则是取二者的最小值，即$\mathrm{Count_{hit}}$= 2，$\mathrm{Count_{output}}$= 4，该译文的1-gram准确率为2/4。
-\parinterval 译文整体的准确率等于各$n$-gram的加权平均：
+\parinterval 令$N$表示考虑的最大$n$-gram的大小，则译文整体的准确率等于各$n$-gram的加权平均：
 \begin{eqnarray}
 \mathrm{P}_{\mathrm{avg}} = \exp (\sum\limits_{n = 1}^N {w_n} \cdot \log \mathrm{P}_n )
 \label{eq:4-5}
@@ -260,11 +260,11 @@ Candidate：the the the the
 \noindent 其中，$c$表示机器译文的句子长度，$r$表示参考答案的句子长度。最终BLEU的计算公式为：
 \begin{eqnarray}
-\mathrm {BLEU} = \mathrm {BP} \cdot \exp(\sum\limits_{i = 1}^N {{w_n} \cdot {{{\mathop{\mathrm {log}}\nolimits} }\mathrm P_n}} )
+\mathrm {BLEU} = \mathrm {BP} \cdot \exp(\sum\limits_{n = 1}^N {{w_n} \cdot {{{\mathop{\mathrm {log}}\nolimits} }\mathrm P_n}} )
 \label{eq:4-7}
 \end{eqnarray}
-\parinterval 实际上，BLEU的计算也是一种综合考虑{\small\sffamily\bfseries{准确率}}\index{准确率}（Precision）\index{Precision}和{\small\sffamily\bfseries{召回率}}\index{召回率}（Recall）\index{Recall}的方法。公式中，$\exp(\sum\limits_{i = 1}^N {{w_n} \cdot {{{\mathop{\mathrm {log}}\nolimits} }\mathrm P_n}} )$是一种准确率的表示。BP本是一种召回率的度量，它会惩罚过短的结果。这种设计同分类系统中评价指标F1值是有相通之处的\upcite{DBLP:conf/muc/Chinchor92}。
+\parinterval 实际上，BLEU的计算也是一种综合考虑{\small\sffamily\bfseries{准确率}}\index{准确率}（Precision）\index{Precision}和{\small\sffamily\bfseries{召回率}}\index{召回率}（Recall）\index{Recall}的方法。公式中，$\exp(\sum\limits_{n = 1}^N {{w_n} \cdot {{{\mathop{\mathrm {log}}\nolimits} }\mathrm P_n}} )$是一种准确率的表示。BP本是一种召回率的度量，它会惩罚过短的结果。这种设计同分类系统中评价指标F1值是有相通之处的\upcite{DBLP:conf/muc/Chinchor92}。
 \parinterval 从机器翻译的发展来看，BLEU 的意义在于它给系统研发人员提供了一种简单、高效、可重复的自动评价手段，在研发机器翻译系统时可以不需要依赖人工评价。同时，BLEU 也有很多创新之处，包括引入$n$-gram的匹配，截断计数和短句惩罚等等，包括NIST 等很多评价指标都是受到BLEU 的启发。此外，BLEU本身也有很多不同的实现方式，包括IBM-BLEU\upcite{DBLP:conf/acl/PapineniRWZ02}、NIST-BLEU\footnote{NIST-BLEU是指美国国家标准与技术研究院（NIST）开发的机器翻译评价工具mteval中实现的一种BLEU计算的方法。}、BLEU-SBP\upcite{DBLP:conf/emnlp/ChiangDCN08}、SacreBLEU等，使用不同实现方式得到评价结果会有差异。因此在实际使用BLEU进行评价时需要确认其实现细节，以保证结果与相关工作评价要求相符。
@@ -278,13 +278,9 @@ Candidate：the the the the
 \subsection{基于词对齐的方法}
-\parinterval 基于词对齐的方法，顾名思义就是根据参考答案中的单词与译文中的单词之间的对齐关系对机器翻译译文进行评价。词对齐的概念也被用于统计机器翻译的建模（\chapterfive），这里借用了相同的思想来度量机器译文与参考答案之间的匹配程度。在基于$n$-gram匹配的评价方法中（如BLEU），BP可以起到一些度量召回率的作用，但是这类方法并没有对召回率进行准确的定义。与其不同的是，基于词对齐的方法在机器译文和参考答案的单词之间建立一对一的对应关系，这种评价方法在引入准确率的同时还能显性引入召回率作为评价指标。
+\parinterval 基于词对齐的方法，顾名思义就是根据参考答案中的单词与译文中的单词之间的对齐关系对机器翻译译文进行评价。词对齐的概念也被用于统计机器翻译的建模（\chapterfive），这里借用了相同的思想来度量机器译文与参考答案之间的匹配程度。在基于$n$-gram匹配的评价方法中（如BLEU），BP可以起到一些度量召回率的作用，但是这类方法并没有对召回率进行准确的定义。与其不同的是，基于词对齐的方法在机器译文和参考答案的单词之间建立一对一的对应关系，这种评价方法在引入准确率的同时还能显性引入召回率作为评价所考虑的因素。
-\parinterval 在基于词对齐的自动评价方法中，一种典型的方法是Meteor。该方法通过计算精确的word-to-word匹配来度量一个译文的质量\upcite{DBLP:conf/acl/BanerjeeL05}，并且在``绝对''匹配之外，还引入了``波特词干匹配''和``同义词''匹配。在下面的内容中，将利用实例对Meteor方法进行介绍。
+\parinterval 在基于词对齐的自动评价方法中，一种典型的方法是Meteor。该方法通过计算精确的{\small\bfnew{单词到单词}}\index{单词到单词}（Word-to-Word\index{Word-to-Word}）的匹配来度量一个译文的质量\upcite{DBLP:conf/acl/BanerjeeL05}，并且在`` 绝对''匹配之外，还引入了`` 波特词干匹配''和``同义词''匹配。在下面的内容中，将利用实例对Meteor方法进行介绍。
-\parinterval 在Meteor方法中，首先在机器译文与参考答案之间建立单词之间的对应关系，再根据其对应关系计算精确率和召回率。
-\parinterval （1）单词之间的对应关系在建立过程中主要涉及三个模型，在对齐过程中依次使用这三个模型进行匹配：\\\\\\
 \begin{example}
 Candidate：Can I have it like he ?
@@ -293,9 +289,13 @@ Candidate：Can I have it like he ?
 \label{eg:4-2}
 \end{example}
+\parinterval 在Meteor方法中，首先在机器译文与参考答案之间建立单词之间的对应关系，再根据其对应关系计算准确率和召回率。
+\parinterval （1）单词之间的对应关系在建立过程中主要涉及三个模型，在对齐过程中依次使用这三个模型进行匹配：\\\\\\
 \begin{itemize}
 \vspace{0.5em}
-\item {\small\sffamily\bfseries{``绝对''匹配模型}}\index{``绝对''匹配模型}（Exact Module）\index{Exact Module}。绝对匹配模型在建立单词对应关系时，要求机器译文端的单词与参考答案端的单词完全一致，并且在参考答案端至多有1个单词与机器译文端的单词对应，否则会将其视为多种对应情况。对于实例\ref{eg:4-2}，使用``绝对''匹配模型，共有两种匹配结果：
+\item {\small\sffamily\bfseries{``绝对''匹配模型}}\index{``绝对''匹配模型}（Exact Module）\index{Exact Module}。绝对匹配模型在建立单词对应关系时，要求机器译文端的单词与参考答案端的单词完全一致，并且在参考答案端至多有1个单词与机器译文端的单词对应，否则会将其视为多种对应情况。对于实例\ref{eg:4-2}，使用``绝对''匹配模型，共有两种匹配结果，如图\ref{fig:4-3}所示。
 %----------------------------------------------
 \begin{figure}[htp]
@@ -308,7 +308,7 @@ Candidate：Can I have it like he ?
 %----------------------------------------------
 \vspace{0.5em}
-\item {\small\sffamily\bfseries{``波特词干''匹配模型}}\index{``波特词干''匹配模型}（Porter Stem Module）\index{Porter Stem Module}。该模型在``绝对''匹配结果的基础上，对尚未对齐的单词进行基于词干的匹配，只需机器译文端单词与参考答案端单词的词干相同即可，如上文中的``do''和``did''。对于图\ref{fig:4-3}的结果，再使用``波特词干'' 匹配模型，结果如下：
+\item {\small\sffamily\bfseries{``波特词干''匹配模型}}\index{``波特词干''匹配模型}（Porter Stem Module）\index{Porter Stem Module}。该模型在``绝对''匹配结果的基础上，对尚未对齐的单词进行基于词干的匹配，只需机器译文端单词与参考答案端单词的词干相同即可，如上文中的``do''和``did''。对于图\ref{fig:4-3}的结果，再使用``波特词干'' 匹配模型，得到如图\ref{fig:4-4}所示的结果。
 %----------------------------------------------
 \begin{figure}[htp]
@@ -320,7 +320,7 @@ Candidate：Can I have it like he ?
 %----------------------------------------------
 \vspace{0.5em}
-\item {\small\sffamily\bfseries{``同义词''匹配模型}}\index{``同义词''匹配模型}（WN synonymy module）\index{WN Synonymy Module}。该模型在前两个模型匹配结果的基础上，对尚未对齐的单词进行同义词的匹配，即基于WordNet词典匹配机器译文与参考答案中的同义词。如上例中的``eat''和``have''。
+\item {\small\sffamily\bfseries{``同义词''匹配模型}}\index{``同义词''匹配模型}（WN Synonymy Module）\index{WN Synonymy Module}。该模型在前两个模型匹配结果的基础上，对尚未对齐的单词进行同义词的匹配，即基于WordNet词典匹配机器译文与参考答案中的同义词。如上例中的``eat''和``have''。图\ref{fig:4-5}给出了一个真实的例子。
 %----------------------------------------------
 \begin{figure}[htp]
@@ -334,7 +334,7 @@ Candidate：Can I have it like he ?
 \vspace{0.5em}
 \end{itemize}
-\parinterval 经过上面的处理，可以得到若干对机器译文与参考答案的对齐关系，下一步需要从中确定一个拥有最大的子集的对齐关系（即机器译文中被对齐的单词个数最多的对齐关系）。但是在上例中的两种对齐关系子集基数相同，这种情况下，需要选择一个对齐关系中交叉现象出现最少的对齐关系。于是，最终的对齐关系如图\ref{fig:4-6}所示：
+\parinterval 经过上面的处理，可以得到机器译文与参考答案之间的单词对齐关系。下一步需要从中确定一个拥有最大的子集的对齐关系，即机器译文中被对齐的单词个数最多的对齐关系。但是，上例中两种对齐关系的子集基数相同。这种情况下，需要选择交叉现象出现最少的对齐关系。于是，最终的对齐关系如图\ref{fig:4-6}所示。
 %----------------------------------------------
 \begin{figure}[htp]
@@ -347,13 +347,13 @@ Candidate：Can I have it like he ?
 \parinterval （2）在得到机器译文与参考答案的对齐关系后，需要基于对齐关系计算准确率和召回率。
-\parinterval 准确率：机器译文中命中单词与机器译文单词总数的比值。即：
+\parinterval 准确率：机器译文中命中单词数与机器译文单词总数的比值。即：
 \begin{eqnarray}
 \mathrm P = \frac {\rm{Count}_{hit}}{\rm{Count}_{candidate}}
 \label{eq:4-8}
 \end{eqnarray}
-\parinterval 召回率：机器译文中命中单词个数与参考答案单词总数的比值。即：
+\parinterval 召回率：机器译文中命中单词数与参考答案单词总数的比值。即：
 \begin{eqnarray}
 \mathrm R = \frac {\rm{Count}_{hit}}{\rm{Count}_{reference}}
 \label{eq:4-9}
@@ -365,21 +365,21 @@ Candidate：Can I have it like he ?
 \label{eq:4-10}
 \end{eqnarray}
-\parinterval 在上文提到的评价指标中，无论是准确率、召回率还是$\rm F_{mean}$，都是基于单个词汇信息衡量译文质量，而忽略了语序问题。为了将语序问题纳入道评价内容中，Meteor会考虑更长的匹配：将机器译文按照最长匹配长度分块，并对``块数''较多的机器译文给予惩罚。例如上例中，机器译文被分为了三个``块''——``Can I have this''、``like he do''、``？''在这种情况下，看起来上例中的准确率、召回率都还不错，但最终会受到很严重的惩罚。这种罚分机制能够识别出机器译文中的词序问题，因为当待测译文词序与参考答案相差较大时，机器译文将会被分割得比较零散，这种惩罚机制的计算公式如式\ref{eq:4-11}，其中$\rm Count_{chunks}$表示匹配的块数。
+\parinterval 在上文提到的评价指标中，无论是准确率、召回率还是$\rm F_{mean}$，都是基于单个词汇信息衡量译文质量，而忽略了语序问题。为了将语序问题考虑进来，Meteor会考虑更长的匹配：将机器译文按照最长匹配长度分块，并对``块数''较多的机器译文给予惩罚。例如上例中，机器译文被分为了三个``块''：``Can I have this''、``like he do''、``？''。在这种情况下，看起来上例中的准确率、召回率都还不错，但最终会受到很严重的惩罚。这种罚分机制能够识别出机器译文中的词序问题，因为当待测译文词序与参考答案相差较大时，机器译文将会被分割得比较零散。这种惩罚机制的计算公式如式\ref{eq:4-11}所示，其中$\rm Count_{chunks}$表示匹配的块数。
 \begin{eqnarray}
-\rm P = 0.5*{\left({\frac{\rm Count_{chunks}}{\rm Count_{hit}}} \right)^3}
+\rm Penalty = 0.5 \cdot {\left({\frac{\rm Count_{chunks}}{\rm Count_{hit}}} \right)^3}
 \label{eq:4-11}
 \end{eqnarray}
 \parinterval Meteor评价方法的最终评分为：
 \begin{eqnarray}
-\rm score = {F_{mean}}*(1 - P)
+\rm score = {F_{mean}} \cdot (1 - Penalty)
 \label{eq:4-12}
 \end{eqnarray}
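 \parinterval 下面用一组假设的数值示意惩罚项对最终得分的影响（这些数值仅用于演示，并非实例\ref{eg:4-2}的真实统计结果）：设$\rm Count_{hit}=6$，$\rm Count_{chunks}=3$，$\rm F_{mean}=0.8$，则由式\ref{eq:4-11}可得惩罚项的取值，再代入式\ref{eq:4-12}可得最终得分：
 \begin{eqnarray}
 0.5 \cdot {\left(\frac{3}{6}\right)^3} &=& 0.0625 \nonumber \\
 \mathrm{score} &=& 0.8 \cdot (1 - 0.0625) = 0.75 \nonumber
 \end{eqnarray}
 \noindent 如果同样的6个命中单词只构成1个块，惩罚项会降为$0.5 \cdot (1/6)^3 \approx 0.002$，得分约为0.798。也就是说，在命中单词数相同的情况下，匹配越零散，得分越低。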
-\parinterval Meteor方法也是目前使用最广泛的自动评价方法之一，它的创新点之一在于引入了词干匹配和同义词匹配，扩大了词汇匹配的范围。Meteor方法被提出后，很多人尝试对其进行了改进，使其评价结果与人工评价结果更相近。例如Meteor-next在Meteor的基础上增加{\small\sffamily\bfseries{释义匹配器}}\index{释义匹配器}（Paraphrase Matcher）\index{Paraphrase Matcher}，利用该匹配器能够捕获机器译文中与参考答案意思相近的短语，从而在短语层面进行匹配。此外这种方法还引入了{\small\sffamily\bfseries{可调权值向量}}\index{可调权值向量}（Tunable Weight Vector）\index{Tunable Weight Vector}，用于调节每个匹配类型的相应贡献\upcite{DBLP:conf/wmt/DenkowskiL10}；Meteor 1.3在Meteor的基础上增加了改进的{\small\sffamily\bfseries{文本规范器}}\index{文本规范器}（Meteor Normalizer）\index{Meteor Normalizer}、更高精度的释义匹配以及区分内容词和功能词等指标，其中文本规范器能够根据一些规范化规则，将机器译文中意义等价的标点减少到通用的形式。而区分内容词和功能词则能够得到更为准确地词汇对应关系\upcite{DBLP:conf/wmt/DenkowskiL11}；Meteor Universial则通过机器学习方法学习不同语言的可调权值，在对低资源语言进行评价时可对其进行复用，从而实现对低资源语言的译文更准确的评价\upcite{DBLP:conf/wmt/DenkowskiL14}。
+\parinterval Meteor方法是经典的自动评价方法之一。它的创新点在于引入了词干匹配和同义词匹配，扩大了词汇匹配的范围。Meteor方法被提出后，很多人尝试对其进行了改进，使其评价结果与人工评价结果更相近。例如Meteor-next在Meteor的基础上增加{\small\sffamily\bfseries{释义匹配器}}\index{释义匹配器}（Paraphrase Matcher）\index{Paraphrase Matcher}，利用该匹配器能够捕获机器译文中与参考答案意思相近的短语，从而在短语层面进行匹配。此外这种方法还引入了{\small\sffamily\bfseries{可调权值向量}}\index{可调权值向量}（Tunable Weight Vector）\index{Tunable Weight Vector}，用于调节每个匹配类型的相应贡献\upcite{DBLP:conf/wmt/DenkowskiL10}；Meteor 1.3在Meteor的基础上增加了改进的{\small\sffamily\bfseries{文本规范器}}\index{文本规范器}（Meteor Normalizer）\index{Meteor Normalizer}、更高精度的释义匹配以及区分内容词和功能词等指标，其中文本规范器能够根据一些规范化规则，将机器译文中意义等价的标点归一化为通用的形式。而区分内容词和功能词则能够得到更为准确的词汇对应关系\upcite{DBLP:conf/wmt/DenkowskiL11}；Meteor Universal则通过机器学习方法学习不同语言的可调权值，在对低资源语言进行评价时可对其进行复用，从而实现对低资源语言的译文更准确的评价\upcite{DBLP:conf/wmt/DenkowskiL14}。
-\parinterval 由于召回率反映参考答案在何种程度上覆盖目标译文的全部内容，而Meteor在评价过程中显式引入召回率，所以Meteor的评价与人工评价更为接近。但Meteor方法需要借助同义词表、功能词表等外部数据，当外部数据中的目标词对应不正确或缺失相应的目标词时，评价水准就会降低。不仅如此，超参数的设置和使用，对于评分影响较大。
+\parinterval 由于召回率反映机器译文在何种程度上覆盖了参考答案的内容，而Meteor在评价过程中显式引入召回率，所以Meteor的评价与人工评价更为接近。但Meteor方法需要借助同义词表、功能词表等外部数据，当外部数据中的目标词对应不正确或缺失相应的目标词时，评价水准就会降低。特别是，针对汉语等与英语差异较大的语言，使用Meteor方法也会面临很多挑战。不仅如此，超参数的设置和使用，对于评分也有较大影响。
 %----------------------------------------------------------------------------------------
 %    NEW SUB-SECTION
@@ -387,42 +387,42 @@ Candidate：Can I have it like he ?
 \subsection{基于检测点的方法}
-\parinterval 基于词串比对和词对齐的自动评价方法中提出的BLEU、TER 等评价指标可以对译文的整体质量进行评估，但是缺乏对具体问题的细致评价。很多情况下，研究人员需要知道系统是否能够处理特定类型的翻译问题，而不是得到一个笼统的评价结果。基于检测点的方法正是基于此想法\upcite{DBLP:journals/mt/Shiwen93}。基于检测点的评价的优点在于对机器翻译系统给出一个总体评价的同时针对系统在各个具体问题上的翻译能力进行评估，方便比较不同翻译模型的性能。这种方法也被多次用于机器翻译比赛的质量评测。
+\parinterval 基于词串比对和基于词对齐的自动评价方法中提出的BLEU、TER 等评价指标可以对译文的整体质量进行评估，但是缺乏对具体问题的细致评价。很多情况下，研究人员需要知道系统是否能够处理特定类型的翻译问题，而不是得到一个笼统的评价结果。基于检测点的方法正是基于此想法\upcite{DBLP:journals/mt/Shiwen93}。这种评价方法的优点在于对机器翻译系统给出一个总体评价的同时针对系统在各个具体问题上的翻译能力进行评估，方便比较不同翻译模型的性能。这种方法也被多次用于机器翻译比赛的译文质量评估。
 \parinterval 基于检测点的评价根据事先定义好的语言学检测点对译文的相应部分进行打分。如下是几个英中翻译中的检测点实例：
 \begin{example}
 They got up at six this morning.
-\qquad\ \ \ 他们今天早晨六点钟起床。
+\qquad\ \ 他们今天早晨六点钟起床。
-\qquad\ \ \ 检测点：时间词的顺序
+\qquad\ \ 检测点：时间词的顺序
 \label{eg:4-3}
 \end{example}
 \begin{example}
 There are nine cows on the farm.
-\qquad\ \ \ 农场里有九头牛。
+\qquad\ \ 农场里有九头牛。
-\qquad\ \ \ 检测点：量词``头''
+\qquad\ \ 检测点：量词``头''
 \label{eg:4-4}
 \end{example}
 \begin{example}
 His house is on the south bank of the river.
-\qquad\ \ \ 他的房子在河的南岸。
+\qquad\ \ 他的房子在河的南岸。
-\qquad\ \ \ We keep our money in a bank.
+\qquad\ \ We keep our money in a bank.
-\qquad\ \ \ 我们在一家银行存钱。
+\qquad\ \ 我们在一家银行存钱。
-\qquad\ \ \ 检测点：bank 的多义翻译
+\qquad\ \ 检测点：bank 的多义翻译
 \label{eg:4-5}
 \end{example}
-\parinterval 该方法的关键在于检测点的获取，有工作曾提出一种从平行双语句子中自动提取检查点的方法\upcite{DBLP:conf/coling/ZhouWLLZZ08}，借助大量的双语词对齐平行语料，利用自然语言处理工具对其进行词性标注、依存分析、成分分析等处理，利用预先构建的人工词典和人为定义的规则，识别语料中不同类别的检查点，从而构建检查点数据库。其中，将检查点分别设计为单词级（如介词、歧义词等）、短语级（如固定搭配）、句子级（特殊句型、复合句型等）三个层面，在对机器翻译系统进行评价时，在检查点数据库中分别选取不同类别检查点对应的测试数据进行测试，从而了解机器翻译系统在各种重要语言现象方面的翻译能力。除此之外，这种方法也能应用于机器翻译系统之间的性能比较中，通过为各个检查点分配合理的权重，用翻译系统在各个检查点得分的加权平均作为系统得分，从而对机器翻译系统的整体水平作出评价。
+\parinterval 该方法的关键在于检测点的获取。有工作曾提出一种从平行双语句子中自动提取检测点的方法\upcite{DBLP:conf/coling/ZhouWLLZZ08}，借助大量的双语词对齐平行语料，利用自然语言处理工具对其进行词性标注、句法分析等处理，利用预先构建的词典和人工定义的规则，识别语料中不同类别的检测点，从而构建检测点数据库。其中，将检测点分别设计为单词级（如介词、歧义词等）、短语级（如固定搭配）、句子级（特殊句型、复合句型等）三个层面。在对机器翻译系统进行评价时，在检测点数据库中分别选取不同类别检测点对应的测试数据进行测试，从而了解机器翻译系统在各种重要语言现象方面的翻译能力。除此之外，这种方法也能应用于机器翻译系统之间的性能比较中，通过为各个检测点分配合理的权重，用翻译系统在各个检测点得分的加权平均作为系统得分，从而对机器翻译系统的整体水平作出评价。
 \parinterval 基于检测点的评价方法的意义在于，它并不是简单给出一个分数，反而更像是一种诊断型评估方法，能够帮助系统研发人员定位系统问题。因此这类方法更多地使用在对机器翻译系统的翻译能力进行分析上，是对BLEU 等整体评价指标的一种很好的补充。
@@ -432,7 +432,7 @@ His house is on the south bank of the river.
 \subsection{多策略融合的评价方法}\label{Evaluation method of Multi Strategy fusion}
-\parinterval 前面介绍的几种自动评价方法中，大多是从某个单一的角度比对机器译文与参考答案之间的相似度，例如BLEU更关注$n$-gram是否命中、Meteor更关注机器译文与参考答案之间的词对齐信息、WER、PER与TER等方法只关注机器译文与参考译文之间的编辑距离，此外还有一些并不常见的自动评价方法比较关注机器译文和参考译文在语法、句法方面的相似度。但无一例外的是，每种自动评价的关注点都是单一的，无法对译文质量进行全面、综合的评价。为了克服这种限制，研究人员们提出了一些基于多策略融合的译文质量评估方法，以期提高自动评价方法与人工评价方法的结果一致性。
+\parinterval 前面介绍的几种自动评价方法中，大多是从某个单一的角度比对机器译文与参考答案之间的相似度。例如，BLEU更关注$n$-gram是否命中；Meteor更关注机器译文与参考答案之间的词对齐信息；WER、PER与TER等方法只关注机器译文与参考译文之间的编辑距离。此外还有一些方法比较关注机器译文和参考译文在语法、句法方面的相似度。但无一例外的是，每种自动评价的关注点都是单一的，无法对译文质量进行全面、综合的评价。为了克服这种限制，研究人员们提出了一些基于多策略融合的译文质量评估方法，以期提高自动评价与人工评价结果的一致性。
 \parinterval 基于策略融合的自动评价方法往往会将多个基于词汇、句法和语义的自动评价方法融合在内，其中比较核心的问题是如何将多个评价方法进行合理的组合。目前提出的方法中颇具代表性的是使用参数化方式和非参数化方式对多种自动评价方法进行筛选和组合。
@@ -456,13 +456,13 @@ His house is on the south bank of the river.
 \subsubsection{1.增大参考答案集}
-\parinterval BLUE、Meteor、TER等自动评价方法的结果往往与人工评价结果存在差距，其主要原因是这些自动评价方法往往通过直接比对机器译文与有限的参考答案之间的``外在差异''，由于参考答案集可覆盖的人类译文数量过少，当机器译文本来十分合理但却未被包含在参考答案集中时，就会将其质量过分低估。
+\parinterval BLEU、Meteor、TER等自动评价方法的结果往往与人工评价结果存在差距，一个主要原因是这些自动评价方法直接比对机器译文与有限的参考答案之间的``外在差异''。由于参考答案集可覆盖的人类译文数量过少，当机器译文本来十分合理但却未被包含在参考答案集中时，其质量就会被过分低估。
-\parinterval HyTER是2012年被提出的一种自动评价方法，它致力于得到所有可能译文的紧凑编码，从而实现自动评价过程中访问所有合理译文\upcite{DBLP:conf/naacl/DreyerM12}。这种评价方法的原理非常简单直观：
+\parinterval 针对这个问题，HyTER自动评价方法致力于得到所有可能译文的紧凑编码，从而实现自动评价过程中访问所有合理的译文\upcite{DBLP:conf/naacl/DreyerM12}。这种评价方法的原理非常简单直观：
 \begin{itemize}
 \vspace{0.5em}
-\item 通过注释工具标记出一个短语的所有备选含义（同义词）并存储在一起作为一个同义单元。可以认为每个同义单元表达了一个语义概念。在生成参考答案时，可以通过对某参考答案中的短语用同义单元进行替换生成一个新的参考答案。例如，将中文句子``对国足的支持率接近于0''翻译为英文，同义单元有以下几种：
+\item 通过注释工具标记出一个短语的所有备选含义（同义词）并存储在一起作为一个同义单元。可以认为每个同义单元表达了一个语义概念。在生成参考答案时，可以通过对某参考答案中的短语用同义单元进行替换生成一个新的参考答案。例如，将中文句子``对提案的支持率接近于0''翻译为英文，同义单元有以下几种：
 \noindent [THE-SUPPORT-RATE]：
@@ -504,19 +504,19 @@ His house is on the south bank of the river.
 \end{figure}
 %----------------------------------------------
-\parinterval 但是在捷克语中主语``městská rada''或是``zastupitelstvo města''的性别必须由动词来反映，那么上述捷克语的参考答案集中有部分存在语法错误。为了避免此类现象的出现，研究人员在同义单元中加入了将同义单元组合在一起必须满足的限制条件\upcite{DBLP:conf/tsd/BojarMTZ13}，从而在增大参考答案集地同时确保了每个参考答案的准确性
+\parinterval 但是在捷克语中主语``městská rada''或是``zastupitelstvo města''的性别必须由动词来反映，那么上述捷克语的参考答案集中有部分存在语法错误。为了避免此类现象的出现，研究人员在同义单元中加入了将同义单元组合在一起必须满足的限制条件\upcite{DBLP:conf/tsd/BojarMTZ13}，从而在增大参考答案集的同时确保了每个参考答案的准确性。
-\parinterval 将参考答案集扩大后，可以继续沿用BLEU或NIST等基于$n$元语法的方法进行自动评价，但是传统方法往往会忽略多重参考答案中的重复信息，于是对每个$n$元语法进行加权的自动评价方法被提出\upcite{DBLP:conf/eamt/QinS15}。该方法根据每个$n$元语法单元的长度、在参考答案集中出现的次数、被虚词（如``the''``by''``a''等）分开后的分散度等方面，确定其在计算最终分数时所占的权重。以BLEU方法为例，原分数计算方式如公式13所示：
+\parinterval 将参考答案集扩大后，可以继续沿用BLEU或NIST等基于$n$元语法的方法进行自动评价，但是传统方法往往会忽略多重参考答案中的重复信息，于是对每个$n$元语法进行加权的自动评价方法被提出\upcite{DBLP:conf/eamt/QinS15}。该方法根据每个$n$元语法单元的长度、在参考答案集中出现的次数、被虚词（如“the”，“by”，“a”等）分开后的分散度等方面，确定其在计算最终分数时所占的权重。以BLEU方法为例（\ref{sec:ngram-eval}节），可以将式\ref{eq:4-7}改写为：
 \begin{eqnarray}
-\mathrm {BLEU} &=& \mathrm {BP} \cdot {\rm{exp}}(\sum\limits_{i = 1}^N {{w_n} \cdot {{{\mathop{\rm log}\nolimits} }\mathrm{P}_n}} )
+\mathrm{BLEU} &=& \mathrm {BP} \cdot {\rm{exp}}(\sum\limits_{n = 1}^N {{w_n} \cdot \log (\mathrm{S}_n \times \mathrm{P}_n} ))
-\label{eq:4-13}\\
-\mathrm{BLEU} &=& \mathrm {BP} \cdot {\rm{exp}}(\sum\limits_{i = 1}^N {{w_n} \cdot \log (\mathrm{S}_n \times \mathrm{P}_n} ))
 \label{eq:4-14}\\
-\mathrm{S}_n &=& \mathrm{Ngram_{diver}} \times \log (n + \frac{M}{\rm{Count_{ref}}})
+\mathrm{S}_n &=& n\mathrm{-gram_{diver}} \cdot \log (n + \frac{M}{\rm{Count_{ref}}})
 \label{eq:4-15}
 \end{eqnarray}
-\parinterval 本方法分数的计算方法见公式\ref{eq:4-14}，其中$\mathrm{S}_n$即为为某个$n$元语法单元分配的权重，计算方式见公式\ref{eq:4-15}，公式中$n$为语法单语的长度，$M$为参考答案集中出现该$n$元语法单元的参考答案数量，$\rm{Count_{ref}}$为参考答案集大小。$\mathrm{Ngram_{diver}}$为该$n$元语法单元的分散度，用$n$元语法单元种类数量与语法单元总数的比值计算。
+\noindent 其中，$\mathrm{S}_n$是为某个$n$-gram分配的权重，$n$为该$n$-gram的长度，$M$为参考答案集中包含该$n$-gram的参考答案数量，$\rm{Count_{ref}}$ 为参考答案集大小。$n\mathrm{-gram_{diver}}$为该$n$-gram的分散度，用$n$-gram种类数量与语法单元总数的比值计算。
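 \parinterval 作为示意，假设某个2-gram（$n=2$）在4个参考答案中的3个里出现（$M=3$，$\rm{Count_{ref}}=4$），其分散度为0.5，并假设对数为自然对数（以上数值和对数的底均为演示用的假设），则由式\ref{eq:4-15}可得：
 \begin{eqnarray}
 \mathrm{S}_n &=& 0.5 \cdot \log (2 + \frac{3}{4}) \nonumber \\
 &=& 0.5 \cdot \log 2.75 \nonumber \\
 &\approx& 0.506 \nonumber
 \end{eqnarray}
 \noindent 在其他条件不变时，$n$越大、出现该$n$-gram的参考答案越多，对数项越大，权重也就越大。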
+\parinterval 需要注意的是，HyTER方法对参考译文的标注有特殊要求，因此需要单独培训译员并开发相应的标注系统。这在一定程度上也增加了该方法被使用的难度。
 %----------------------------------------------------------------------------------------
 %    NEW SUBSUB-SECTION
@@ -524,11 +524,11 @@ His house is on the south bank of the river.
 \subsubsection{2.利用分布式表示进行质量评价}
-\parinterval 2003年，在自然语言处理的神经语言建模任务中引入了词嵌入技术，其思想是把每个单词映射为多维实数空间中的一个点（具体表现为一个实数向量），这种技术也被称作单词的分布式表示。在这项技术中，研究人员们发现单词之间的关系可以通过空间的几何性质进行刻画，意义相近的单词之间的欧式距离也十分相近。（单词分布式表示的具体内容，将在书的{\chapternine}详细介绍，在此不再赘述。）
+\parinterval {\small\bfnew{词嵌入}}\index{词嵌入}（Word Embedding\index{Word Embedding}）技术是近些年自然语言处理中的重要成果，其思想是把每个单词映射为多维实数空间中的一个点（具体表现为一个实数向量），这种技术也被称作单词的{\small\bfnew{分布式表示}}\index{分布式表示}（Distributed Representation\index{Distributed Representation}）。在这项技术中，单词之间的关系可以通过空间的几何性质进行刻画，意义相近的单词之间的欧氏距离也比较小（单词分布式表示的具体内容将在本书的{\chapternine}详细介绍，在此不再赘述）。
-\parinterval 受词嵌入技术的启发，研究人员尝试借助参考答案和机器译文的分布式表示来进行译文质量评价，为译文质量评价提供了新思路。在自然语言的上下文中，表示是与每个单词、句子或文档相关联的数学对象。这个对象通常是一个向量，其中每个元素的值在某种程度上描述了相关单词、句子或文档的语义或句法属性。{\small\sffamily\bfseries{分布式表示评价度量}}\index{分布式表示评价度量}（Distributed Representations Evaluation Metrics，DREEM）\index{Distributed Representations Evaluation Metrics}将单词或句子的分布式表示映射到连续的低维空间，发现在该空间中，具有相似句法和语义属性的单词彼此接近\upcite{bengio2003a,DBLP:conf/emnlp/SocherPHNM11,DBLP:conf/emnlp/SocherPWCMNP13}，证明了利用分布式表示实现译文质量评估的可行性。
+\parinterval 受词嵌入技术的启发，研究人员尝试借助参考答案和机器译文的分布式表示来进行译文质量评价，为译文质量评价提供了新思路。在自然语言处理中，表示是与每个单词、句子或文档相关联的数学对象。这个对象通常是一个向量，其中每个元素的值在某种程度上描述了相关单词、句子或文档的语义或句法属性。基于这个想法，研究人员提出了{\small\sffamily\bfseries{分布式表示评价度量}}\index{分布式表示评价度量}（Distributed Representations Evaluation Metrics，DREEM）\index{Distributed Representations Evaluation Metrics}\upcite{chen-guo-2015-representation}。这种方法将单词或句子的分布式表示映射到连续的低维空间，发现在该空间中，具有相似句法和语义属性的单词彼此接近，类似的结论也出现在相关工作中\upcite{bengio2003a,DBLP:conf/emnlp/SocherPHNM11,DBLP:conf/emnlp/SocherPWCMNP13}。而这个特点可以被应用到译文质量评估中。
-\parinterval 在该类方法中，分布式表示的选取是一个十分关键的问题，理想的情况下，分布式表示应该涵盖句子在词汇、句法、语法、语义、依存关系等各个方面的信息。目前常见的分布式表示方式如表\ref{tab:4-2}所示。除此之外，还可以通过词袋模型、循环神经网路、卷积神经网络、深层平均网络\upcite{iyyer-etal-2015-deep}、Quick-Thought模型\upcite{DBLP:conf/iclr/LogeswaranL18}等将词向量表示转换为句子向量表示。
+\parinterval 在DREEM中，分布式表示的选取是一个十分关键的问题，理想的情况下，分布式表示应该涵盖句子在词汇、句法、语法、语义、依存关系等各个方面的信息。目前常见的分布式表示方式如表\ref{tab:4-2}所示。除此之外，还可以通过词袋模型、循环神经网络等将词向量表示转换为句子向量表示。
 \begin{table}[htp]{
 \begin{center}
@@ -549,41 +549,45 @@ His house is on the south bank of the river.
 \end{center}
 }\end{table}
-\parinterval DREEM方法中选取了能够反映句子中使用的特定词汇的One-hot向量、能够反映词汇信息的词嵌入向量\upcite{bengio2003a}、能够反映句子的合成语义信息的{\small\sffamily\bfseries{递归自动编码}}\index{递归自动编码}（Recursive Autoencoder Embedding, RAE）\index{Recursive Autoencoder Embedding}，这三种表示级联在一起，最终形成句子的向量表示。在得到机器译文和参考答案的上述分布式表示后，利用余弦相似度和长度惩罚对机器译文质量进行评价。机器译文$t$和参考答案$r$之间的相似度如公式\ref{eq:4-16}所示，其中${v_i}(t)$和${v_i}(r)$分别是机器译文和参考答案的向量表示中的第$i$个元素，$N$是向量表示的维度大小。
+\parinterval DREEM方法中选取了能够反映句子中使用的特定词汇的One-hot向量、能够反映词汇信息的词嵌入向量\upcite{bengio2003a}、能够反映句子的合成语义信息的{\small\sffamily\bfseries{递归自动编码}}\index{递归自动编码}（Recursive Autoencoder Embedding, RAE）\index{Recursive Autoencoder Embedding}，这三种表示级联在一起，最终形成句子的向量表示。在得到机器译文和参考答案的上述分布式表示后，利用余弦相似度和长度惩罚对机器译文质量进行评价。机器译文$o$和参考答案$g$之间的相似度如公式\ref{eq:4-16}所示，其中${v_i}(o)$和${v_i}(g)$分别是机器译文和参考答案的向量表示中的第$i$个元素，$N$是向量表示的维度大小。
 \begin{eqnarray}
-\mathrm {cos}(t,r) = \frac{{\sum\limits_{i = 1}^N {{v_i}(t) \cdot {v_i}(r)} }}{{\sqrt {\sum\limits_{i = 1}^N {v_i^2(t)} } \sqrt {\sum\limits_{i = 1}^N {v_i^2(r)} } }}
+\mathrm {cos}(o,g) = \frac{{\sum\limits_{i = 1}^N {{v_i}(o) \cdot {v_i}(g)} }}{{\sqrt {\sum\limits_{i = 1}^N {v_i^2(o)} } \sqrt {\sum\limits_{i = 1}^N {v_i^2(g)} } }}
 \label{eq:4-16}
 \end{eqnarray}
-\parinterval 在此基础上，DREEM方法还引入了长度惩罚项，对与参考答案长度相差太多的机器译文进行惩罚，长度惩罚项如公式\ref{eq:4-17}所示，其中${l_t}$和${l_r}$分别是机器译文和参考答案长度：
+\parinterval 在此基础上，DREEM方法还引入了长度惩罚项，对与参考答案长度相差太多的机器译文进行惩罚，长度惩罚项如公式\ref{eq:4-17}所示，其中${l_o}$和${l_g}$分别是机器译文和参考答案长度：
 \begin{eqnarray}
 \mathrm{BP} = \left\{ \begin{array}{l}
-\exp (1 - {{{l_r}} \mathord{\left/
+\exp (1 - {{{l_g}} \mathord{\left/
- {\vphantom {{{l_r}} {{l_t}}}} \right.
+ {\vphantom {{{l_g}} {{l_o}}}} \right.
- \kern-\nulldelimiterspace} {{l_t}}})\quad {l_t} < {l_r}\\
+ \kern-\nulldelimiterspace} {{l_o}}})\quad {l_o} < {l_g}\\
-\exp (1 - {{{l_t}} \mathord{\left/
+\exp (1 - {{{l_o}} \mathord{\left/
- {\vphantom {{{l_t}} {{l_r}}}} \right.
+ {\vphantom {{{l_o}} {{l_g}}}} \right.
- \kern-\nulldelimiterspace} {{l_r}}})\quad {l_t} \ge {l_r}
+ \kern-\nulldelimiterspace} {{l_g}}})\quad {l_o} \ge {l_g}
 \end{array} \right.
 \label{eq:4-17}
 \end{eqnarray}
 \parinterval 机器译文的最终得分如下，其中$\alpha$是一个需要手动设置的参数：
 \begin{eqnarray}
-\mathrm{score}(t,r) = \mathrm{cos}{^\alpha }(t,r) \times \mathrm{BP}
+\mathrm{score}(o,g) = \mathrm{cos}{^\alpha }(o,g) \times \mathrm{BP}
 \label{eq:4-18}
 \end{eqnarray}
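 \parinterval 下面用一组假设的数值示意上述计算过程（向量、句长和$\alpha$的取值均为演示用的假设）：设机器译文和参考答案的向量表示分别为$(1,2,2)$和$(2,1,2)$，句长分别为$l_o=8$和$l_g=10$，并取$\alpha=1$。由式\ref{eq:4-16}、式\ref{eq:4-17}和式\ref{eq:4-18}可得：
 \begin{eqnarray}
 \mathrm{cos}(o,g) &=& \frac{1 \cdot 2 + 2 \cdot 1 + 2 \cdot 2}{\sqrt{1+4+4} \cdot \sqrt{4+1+4}} = \frac{8}{9} \approx 0.889 \nonumber \\
 \mathrm{BP} &=& \exp (1 - \frac{10}{8}) = \exp(-0.25) \approx 0.779 \nonumber \\
 \mathrm{score}(o,g) &\approx& 0.889^{1} \times 0.779 \approx 0.692 \nonumber
 \end{eqnarray}
 \noindent 由于$l_o < l_g$，这里使用了式\ref{eq:4-17}中的第一个分支。可以看到，即使两个向量的方向比较接近，过短的译文仍然会因为长度惩罚而被扣分。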
-\parinterval 与传统自动评价方法中对机器译文与参考答案的外在的词汇或是$n$元语法单元进行比较不同，该方法观察到的不只是单词的多余、缺少、乱序等问题，还可以从句法、语义等更深层的内容对两者进行相似度对比。此方法在译文质量评价方面的成功，也鼓励了更多研究人员利用分布式表示方法进行译文质量评价。
-\parinterval 在DREEM方法取得成功后，基于词嵌入的词对齐自动评价方法被提出\upcite{DBLP:journals/corr/MatsuoKS17}，该方法中先得到机器译文与参考答案的词对齐关系后，通过平均对齐关系$x_i$和$y_i$中两者的词嵌入相似度来计算机器译文与参考答案的相似度，具体见公式\ref{eq:4-19}，其中$x$是机器译文，$y$是参考答案，函数$\varphi(\cdot)$用来计算对齐关系$x_i$和$y_i$的相似度。
+\parinterval 本质上，分布式表示是对句子语义的一种统计表示。因此，它可以帮助评价系统捕捉一些从简单的词或者句子片段中不易发现的现象，进而进行更深层的句子匹配。
+\parinterval 在DREEM方法取得成功后，基于词嵌入的词对齐自动评价方法被提出\upcite{DBLP:journals/corr/MatsuoKS17}。该方法先得到机器译文与参考答案的词对齐关系，再通过对齐关系中两者的词嵌入相似度来计算机器译文与参考答案的相似度，公式如下：
 \begin{eqnarray}
-\mathrm{ASS}(x,y) = \frac{1}{{\left| x \right|\left| y \right|}}\sum\limits_{i = 1}^{\left| x \right|} {\sum\limits_{j = 1}^{\left| y \right|} {\varphi ({x_i},{y_j})} }
+\mathrm{ASS}(o,g) = \frac{1}{{m \cdot l}}\sum\limits_{i = 1}^{m} {\sum\limits_{j = 1}^{l} {\varphi (o,g,i,j)} }
 \label{eq:4-19}
 \end{eqnarray}
+\noindent 其中，$o$是机器译文，$g$是参考答案，$m$表示译文$o$的长度，$l$表示参考答案$g$的长度，函数$\varphi(o,g,i,j)$用来计算$o$中第$i$个词和$g$中第$j$个词之间对齐关系的相似度。
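 \parinterval 举一个简单的例子（其中的相似度取值为演示用的假设）：设$m=2$，$l=2$，并且$\varphi(o,g,1,1)=0.9$，$\varphi(o,g,1,2)=0.1$，$\varphi(o,g,2,1)=0.2$，$\varphi(o,g,2,2)=0.8$，则由式\ref{eq:4-19}可得：
 \begin{eqnarray}
 \mathrm{ASS}(o,g) &=& \frac{1}{2 \cdot 2}(0.9 + 0.1 + 0.2 + 0.8) = 0.5 \nonumber
 \end{eqnarray}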
 \parinterval 此外，将分布式表示与相对排序融合也是一个很有趣的想法\upcite{DBLP:journals/csl/GuzmanJMN17}，在这个尝试中，研究人员利用分布式表示提取参考答案和多个机器译文中的句法信息和语义信息，利用神经网络模型对多个机器译文进行排序。
-\parinterval 在基于分布式表示的这类译文质量评价方法中，译文和参考答案的所有词汇信息和句法语义信息都被包含在句子的分布式表示中，克服了单一参考答案的限制。但是同时也带来了新的问题，一方面将句子转化成分布式表示使评价过程变得不太直观，另一方面该类评价方法的优劣与分布式表示的选取息息相关，为了获得与人工评价更相关的评价效果，分布式表示的选取和组合方式还需要进一步的研究。
+\parinterval 在基于分布式表示的这类译文质量评价方法中，译文和参考答案的所有词汇信息和句法语义信息都被包含在句子的分布式表示中，克服了单一参考答案的限制。但是同时也带来了新的问题，一方面将句子转化成分布式表示使评价过程变得不那么具有可解释性，另一方面分布式表示的质量也会对评价结果有较大的影响。
 %----------------------------------------------------------------------------------------
 %    NEW SUB-SECTION
@@ -609,7 +613,7 @@ His house is on the south bank of the river.
 \vspace{0.5em}
 \end{itemize}
-\parinterval 目前在机器译文质量评价的领域中，有很多研究工作尝试比较各种有参考答案的自动评价方法（主要以BLEU、NIST等基于$n$元语法的方法为主）与人工评价方法的相关性。整体来看，这些方法与人工评价具有一定的相关性，自动评价结果能够较好地翻译译文质量\upcite{coughlin2003correlating}\upcite{doddington2002automatic}。
+\parinterval 目前在机器译文质量评价的领域中，有很多研究工作尝试比较各种有参考答案的自动评价方法（主要以BLEU、NIST等基于$n$元语法的方法为主）与人工评价方法的相关性。整体来看，这些方法与人工评价具有一定的相关性，自动评价结果能够较好地反映译文质量\upcite{coughlin2003correlating,doddington2002automatic}。
 \parinterval 但是也有相关研究指出，不应该对有参考答案的自动评价方法过于乐观，而应该存谨慎态度，因为目前的自动评价方法对于流利度的评价并不可靠，同时参考答案的体裁和风格往往会对自动评价结果产生很大影响\upcite{culy2003limits}。同时，有研究者提出，在机器翻译研究过程中，忽略实际的示例翻译而仅仅通过BLEU等自动评价方式得分的提高来表明机器翻译质量的提高是不可取的，因为BLEU的提高并不足以反映翻译质量的真正提高，而在另一些情况下，为了实现翻译质量的显著提高，并不需要提高BLEU\upcite{callison2006re}。
@@ -620,9 +624,9 @@ His house is on the south bank of the river.
 \sectionnewpage
 \section{无参考答案的自动评价}
-\parinterval 无参考答案自动评价在机器翻译领域又被称作{\small\sffamily\bfseries{质量评估}}\index{质量评估}（Quality Estimation，\\QE）\index{Quality Estimation，QE}。与传统的译文质量评价方法不同，质量评估旨在不参照标准译文的情况下，对机器翻译系统的输出在单词、短语、句子、文档等各个层次进行评价，于是在质量评估这个任务的基础上衍生出了单词级质量评估、短语级质量评估、句子级质量评估和文档级质量评估几种相关任务。
+\parinterval 无参考答案自动评价在机器翻译领域又被称作{\small\sffamily\bfseries{质量评估}}\index{质量评估}（Quality Estimation，\\QE）\index{Quality Estimation，QE}。与传统的译文质量评价方法不同，质量评估旨在不参照标准译文的情况下，对机器翻译系统的输出在单词、短语、句子、文档等各个层次进行评价。
-\parinterval 人们对于无参考答案自动评价的需求大多来源于机器翻译的实际应用。例如，在机器翻译的译后编辑过程中，译员不仅仅希望了解机器翻译系统的整体翻译质量，还需要了解该系统在某个句子上的表现如何：该机器译文的质量是否很差？需要修改的内容有多少？是否值得进行后编辑？这时，译员更加关注系统在单个数据点上（比如一段话）的可信度而非系统在测试数据集上的平均质量。这时，太多的人工介入就无法保证使用机器翻译所带来的高效性，因此在机器翻译输出译文的同时，需要质量评估系统给出对译文质量的预估结果。这些需求也促使研究人员在质量评估问题上投入了更多的研究力量。包括WMT、CCMT等知名机器翻译评测中也都设置了相关任务，受到了业界的认可。
+\parinterval 人们对于无参考答案自动评价的需求大多来源于机器翻译的实际应用。例如，在机器翻译的译后编辑过程中，译员不仅仅希望了解机器翻译系统的整体翻译质量，还需要了解该系统在某个句子上的表现如何：该机器译文的质量是否很差？需要修改的内容有多少？是否值得进行后编辑？这时，译员更加关注系统在单个数据点上（比如一段话）的可信度而非系统在测试数据集上的平均质量。这时，太多的人工介入就无法保证使用机器翻译所带来的高效性，因此在机器翻译输出译文的同时，需要质量评估系统给出对译文质量的预估结果。这些需求也促使研究人员在质量评估问题上投入了更多的研究力量。包括WMT、CCMT等知名机器翻译评测中也都设置了相关任务，受到了业界的关注。
 %----------------------------------------------------------------------------------------
 %    NEW SUB-SECTION
@@ -674,7 +678,7 @@ scharfzeichnen.（德语）
 \vspace{0.5em}
 \item {\small\sffamily\bfseries{找出译文中翻译错误的单词}}。单词级质量评估任务要求预测一个与译文等长的质量标签序列，该标签序列反映译文端的每个单词是否能够准确表达出其对应的源端单词的含义，若是可以，则标签为``OK''，反之则为``BAD''。图\ref{fig:4-11}中的连线表示单词之间的对齐关系，图\ref{fig:4-11}中的MT tags即为该过程中需要预测的质量标签序列。
 \vspace{0.5em}
-\item {\small\sffamily\bfseries{找出源文中导致翻译错误的单词}}。单词级质量评估任务还要求预测一个与源文等长的质量标签序列，该标签序列反映源文端的每个单词是否会导致本次翻译出现错误，若是不会，则标签为``OK''，反之则为``BAD''。图\ref{fig:4-11}中的Source tags即为该过程中的质量标签序列。在实际实现中，质量评估系统往往先预测译文端的质量标签序列，并根据源文与译文之间的对齐关系，推测源端的质量标签序列。
+\item {\small\sffamily\bfseries{找出源文中导致翻译错误的单词}}。单词级质量评估任务还要求预测一个与源文等长的质量标签序列，该标签序列反映源文端的每个单词是否会导致本次翻译出现错误，若是不会，则标签为``OK''，反之则为``BAD''。图\ref{fig:4-11}中的Source tags即为该过程中的质量标签序列。在具体应用时，质量评估系统往往先预测译文端的质量标签序列，并根据源文与译文之间的对齐关系，推测源端的质量标签序列。
 \vspace{0.5em}
 \item {\small\sffamily\bfseries{找出在翻译句子时出现漏译现象的位置}}。单词级质量评估任务同时也要求预测一个能够捕捉到漏译现象的质量标签序列，在译文端单词的两侧位置进行预测，若某位置未出现漏译，则该位置的质量标签为``OK''，否则为``BAD''。图\ref{fig:4-11}中的Gap tags即为该过程中的质量标签序列。为了检测句子翻译中的漏译现象，需要在译文中标记缺口，即译文中的每个单词两边都各有一个``GAP''标记，如图\ref{fig:4-11}所示。
 \vspace{0.5em}
@@ -686,7 +690,7 @@ scharfzeichnen.（德语）
 \subsubsection{2.短语级质量评估}
-\parinterval 短语级质量评估可以看做是单词级质量评估任务的扩展：机器翻译系统引发的错误往往都是相互关联的，解码过程中某个单词出错会导致更多的错误，特别是在其局部上下文当中，以单词的``局部上下文''为基本单元进行指令评估即为短语级质量评估。
+\parinterval 短语级质量评估可以看做是单词级质量评估任务的扩展：机器翻译系统引发的错误往往都是相互关联的，解码过程中某个单词出错会导致更多的错误，特别是在其局部上下文当中，以单词的``局部上下文''为基本单元进行质量评估即为短语级质量评估。
 \parinterval 短语级质量评估与单词级质量评估类似，其目标是找出短语中的翻译错误、短语内部语序问题及漏译问题。短语级质量评估任务可以被定义为：以若干个连续单词组成的短语为基本评估单位，参照源语言句子，自动标记出短语内部的错误以及短语之间是否存在漏译。其中的短语错误包括短语内部单词的错译和漏译、短语内部单词的语序错误，而漏译问题则特指短语之间的漏译错误。在短语级质量评估任务中，输入是机器译文和源语言句子，输出是一系列标签序列，即图\ref{fig:4-12}中的Phrase-target tags、Gap tags，标签序列中的每个标签对应翻译中的每个单词，并表明该位置是否出现错误。
@@ -702,7 +706,7 @@ scharfzeichnen.（德语）
 \parinterval 下面以实例\ref{eg:4-8}为例介绍该任务的具体内容：
 \begin{example}
-短语级质量评估任务
+短语级质量评估任务（短语间用 || 分隔）
 源句：Bei Patienten mit || eingeschränkter Nierenfunktion kann || Insulinabbaus ||
@@ -732,28 +736,28 @@ scharfzeichnen.（德语）
 \subsubsection{3.句子级质量评估}
-\parinterval 迄今为止，质量评估的大部分工作都集中在句子层次的预测上，这是因为多数情况下机器翻译系统的处理都是逐句进行，系统用户也总是每次翻译一个句子或是以句子为单位组成的文本块（段落、文档等），因此以句子作为质量评估的基本单元是相当自然的。
+\parinterval 迄今为止，质量评估的大部分工作都集中在句子层次的预测上，这是因为多数情况下机器翻译系统的处理都是逐句进行，系统用户也总是每次翻译一个句子或是以句子为单位组成的文本块（段落、文档等），因此以句子作为质量评估的基本单元是很自然的。
-\parinterval 句子级质量评估的目标是生成能够反映译文句子整体质量的质量标签——可以是离散型的表示某种质量等级的标签，也可以是连续型的基于评分的标签。虽然以不同的标准进行评估，同一个译文句子的质量标签可能有所不同，但可以肯定的是句子的最终质量绝不是句子中单词质量的简单累加。因为与词级的质量评估相比，句子级质量评估也会关注是否保留源句的语义、译文的语义是否连贯、译文中的单词顺序是否合理等因素。
+\parinterval 句子级质量评估的目标是生成能够反映译文句子整体质量的标签\ \dash \ 可以是离散型的表示某种质量等级的标签，也可以是连续型的基于评分的标签。虽然以不同的标准进行评估，同一个译文句子的质量标签可能有所不同，但可以肯定的是句子的最终质量绝不是句子中单词质量的简单累加。因为与词级的质量评估相比，句子级质量评估也会关注是否保留源句的语义、译文的语义是否连贯、译文中的单词顺序是否合理等因素。
-\parinterval 句子级质量评估，顾名思义就是根据某种评价标准，通过建立模型来预测一个反映句子质量的标签。人们可以根据句子翻译的目的、后编辑的工作难度、是否达到发表要求或是是否能让非母语者读懂等各个角度、各个标准去设定句子级质量评估的标准。句子级质量评估任务的发展经历过下面几个阶段：
+\parinterval 句子级质量评估系统需要根据某种评价标准，通过建立预测模型来生成一个反映句子质量的标签。人们可以根据句子翻译的目的、后编辑的工作难度、是否达到发表要求或是是否能让非母语者读懂等各个角度、各个标准去设定句子级质量评估的标准。句子级质量评估任务有多种形式：
 \begin{itemize}
 \vspace{0.5em}
-\item 区分``人工翻译''和``机器翻译''。在最初的工作中，研究人员试图训练一个能够区分人工翻译和机器翻译的二分类器完成句子级的质量评估\upcite{gamon2005sentence}，将被分类器判断为``人工翻译''的机器译文视为优秀的译文，将被分类器判断为``机器翻译''的机器译文视为较差的译文。一方面，这种评估方式不够直观，另一方面，这种评估方式并不十分合理，因为通过人工比对发现很多被判定为``机器翻译''的译文具有与人们期望的人类翻译相同的质量水平。
+\item 区分``人工翻译''和``机器翻译''。在早期的工作中，研究人员试图训练一个能够区分人工翻译和机器翻译的二分类器完成句子级的质量评估\upcite{gamon2005sentence}，将被分类器判断为``人工翻译''的机器译文视为优秀的译文，将被分类器判断为``机器翻译''的机器译文视为较差的译文。一方面，这种评估方式不够直观，另一方面，这种评估方式并不十分准确，因为通过人工比对发现很多被判定为``机器翻译''的译文具有与人们期望的人类翻译相同的质量水平。
 \vspace{0.5em}
-\item 预测反映译文句子质量的``质量标签''。此后，研究人员们试图使用人工为机器译文分配能够反映译文质量的标签\upcite{DBLP:conf/lrec/Quirk04}，例如``不可接受''``一定程度上可接受''``可接受''``理想''等，同时将获取机器译文的质量标签作为句子级质量评估的任务目标。
+\item 预测反映译文句子质量的``质量标签''。在同一时期，研究人员们也尝试使用人工为机器译文分配能够反映译文质量的标签\upcite{DBLP:conf/lrec/Quirk04}，例如``不可接受''、``一定程度上可接受''、`` 可接受''、`` 理想''等类型的质量标签，同时将获取机器译文的质量标签作为句子级质量评估的任务目标。
 \vspace{0.5em}
-\item 预测译文句子的相对排名。当相对排序（详见4.2节）的译文评价方法被引入后，给出机器译文的相对排名成为句子级质量评估的任务目标。
+\item 预测译文句子的相对排名。当相对排序（详见\ref{sec:human-eval-scoring}节）的译文评价方法被引入后，给出机器译文的相对排名成为句子级质量评估的任务目标。
 \vspace{0.5em}
-\item 预测译文句子的后编辑工作量。在最近的研究中，句子级地质量评估一直在探索各种类型的离散或连续的后编辑标签。例如，通过测量以秒为单位的后编辑时间对译文句子进行评分；通过测量预测后编辑过程所需的击键数对译文句子进行评分；通过计算{\small\sffamily\bfseries{人工译后编辑距离}}\index{人工译后编辑距离}（Human Translation Error Rate，HTER）\index{Human Translation Error Rate，HTER}，即在后编辑过程中编辑（插入/删除/替换）)数量与参考翻译长度的占比率对译文句子进行评分。HTER的计算公式为：
+\item 预测译文句子的后编辑工作量。在最近的研究中，句子级的质量评估一直在探索各种类型的离散或连续的后编辑标签。例如，通过测量以秒为单位的后编辑时间对译文句子进行评分；通过测量后编辑过程所需的击键数对译文句子进行评分；通过计算{\small\sffamily\bfseries{人工译后编辑距离}}\index{人工译后编辑距离}（Human-targeted Translation Edit Rate，HTER）\index{Human-targeted Translation Edit Rate，HTER}，即在后编辑过程中编辑（插入/删除/替换）数量与参考翻译长度的比值对译文句子进行评分。HTER的计算公式为：
 \vspace{0.5em}
 \begin{eqnarray}
 \rm{HTER}= \frac{\mbox{编辑操作数目}}{\mbox{翻译后编辑结果长度}}
 \label{eq:4-20}
 \end{eqnarray}
-\parinterval 这种质量评估方式往往以单词级质量评估为基础，在其结果的基础上进行计算。以实例\ref{eg:4-7}中词级质量评估结果为例，与编辑后结果相比较，机器翻译译文中有四处漏译（``Mit''``können''``Sie''``einzelne''）、三处误译（``dem''``Scharfzeichner''\\``scharfzeichnen''分别被误译为``Der''``Schärfen-Werkezug''``Schärfer''）、一处多译（``erscheint''），因而需要进行4次插入操作、3次替换操作和1次删除操作，而最终译文长度为12，则有$\rm HTER=(4+3+1)/12=0.667$。需要注意的是，即便这种评估方式以单词级质量评估为基础，也不意味这句子级质量评估只是在单词级质量评估的结果上通过简单的计算来获得其得分，在实际研究中，常将其视为一个回归问题，利用大量数据学习其评分规则。
+\parinterval 这种质量评估方式往往以单词级质量评估为基础，在其结果的基础上进行计算。以实例\ref{eg:4-7}中词级质量评估结果为例，与编辑后结果相比较，机器翻译译文中有四处漏译（``Mit''、``können''、``Sie''、``einzelne''）、三处误译（``dem''、\\``Scharfzeichner''、``scharfzeichnen''分别被误译为``Der''、``Schärfen-Werkezug''、``Schärfer''）、一处多译（``erscheint''），因而需要进行4次插入操作、3次替换操作和1次删除操作，而最终译文长度为12，则有$\rm HTER=(4+3+1)/12 \approx 0.667$。需要注意的是，即便这种评估方式以单词级质量评估为基础，也不意味着句子级质量评估只是在单词级质量评估的结果上通过简单的计算来获得其得分。在实际研究中，常将其视为一个回归问题，利用大量数据学习其评分规则。
 \vspace{0.5em}
 \end{itemize}
@@ -763,7 +767,7 @@ scharfzeichnen.（德语）
 \subsubsection{4.文档级质量评估}
-\parinterval 文档级质量评估的主要目的就是对机器翻译得到的译文文档进行打分。文档级质量评估中，``文档''这个术语很多时候并不单单指一整篇文档，而是指包含多个句子的文本，例如包含3到5个句子的段落或是像新闻文章一样的长文本。
+\parinterval 文档级质量评估的主要目的是对机器翻译得到的整个译文文档进行打分。文档级质量评估中，``文档''很多时候并不单单指一整篇文档，而是指包含多个句子的文本，例如包含3到5个句子的段落或是像新闻文章一样的长文本。
 \parinterval 传统的机器翻译任务中，往往以一个句子作为输入和翻译的单元，而忽略了文档中句子之间的联系，这可能会使文档的论述要素受到影响，最终导致整个文档的语义不连贯。如实例1所示，在第二句中``he''原本指代第一句中的``housewife''，这里出现了错误，但这种错误在句子级的质量评估中并不能被发现。
@@ -786,13 +790,15 @@ Reference： A few days ago, {\red he} contacted the News Channel and said that 
 \vspace{0.5em}
 \item 阅读理解测试得分情况。以往衡量文档译文质量的主要方法是采用理解测试\upcite{DBLP:conf/icassp/JonesGSGHRW05}，即利用提前设计好的与文档相关的阅读理解题目（包括多项选择题类型和问答题类型）对母语为目标语言的多个测试者进行测试，将测试者在给定文档对应问卷的所有问题上得到的分数作为质量标签。
 \vspace{0.5em}
-\item 两阶段后编辑工作量。 最近的研究工作中，多是采用对文档译文进行后编辑的工作量作为评价指标评估文档译文的质量，为了准确获取文档后编辑的工作量，两阶段后编辑方法被提出\upcite{DBLP:conf/eamt/ScartonZVGS15}，即第一阶段对文档中的句子单独在无语境情况下进行后编辑，第二阶段将所有句子重新合并成文档后再进行后编辑。两阶段中后编辑工作量的总和越多，意味着文档译文质量越差。
+\item 后编辑工作量。 最近的研究工作中，多是采用对文档译文进行后编辑的工作量评估文档译文的质量。为了准确获取文档后编辑的工作量，两阶段后编辑方法被提出\upcite{DBLP:conf/eamt/ScartonZVGS15}，即第一阶段对文档中的句子单独在无语境情况下进行后编辑，第二阶段将所有句子重新合并成文档后再进行后编辑。两阶段中后编辑工作量的总和越多，意味着文档译文质量越差。
 \vspace{0.5em}
 \end{itemize}
-\parinterval 在文档级质量评估任务中，需要对译文文档做一些更细粒度的注释，注释内容包括错误、错误类型和错误的严重程度，最终在注释的基础上对译文文档质量进行评估。
+\parinterval 在文档级质量评估任务中，需要对译文文档做一些更细粒度的注释，注释内容包括错误位置、错误类型和错误的严重程度，最终在注释的基础上对译文文档质量进行评估。
-\parinterval 文档级质量评估与更细粒度的词级和句子级的质量评价相比更加复杂、更加难以实现。其难点之一在于文档级的质量评估过程中需要根据一些主观的质量标准去对文档进行评分，例如在注释的过程中，对于错误的严重程度并没有严格的界限和规定，只能靠评测人员主观判断，这就意味着随着出现主观偏差的注释的增多，文档级质量评估的参考价值会大打折扣。另一方面，根据所有注释（错误、错误类型及其严重程度）对整个文档进行评分本身就具有不合理性，因为译文中有些在抛开上下文环境的情况下可以并判定为``翻译的不错的''单词和句子，一旦被放在文档中的语境后就可能变得不合理，而某些在无语境条件下看起来翻译得``糟糕透了''的单词和句子，一旦被放在文档中的语境中可能会变得恰到好处。此外，构建一个质量评测模型势必需要大量的标注数据，而文档级质量评测所需要的带有注释的数据的获取代价相当高。
+\parinterval 与更细粒度的词级和句子级的质量评价相比，文档级质量评估更加复杂。其难点之一在于文档级的质量评估过程中需要根据一些主观的质量标准去对文档进行评分，例如在注释的过程中，对于错误的严重程度并没有严格的界限和规定，只能靠评测人员主观判断，这就意味着随着出现主观偏差的注释的增多，文档级质量评估的参考价值会大打折扣。另一方面，根据所有注释（错误位置、错误类型及其严重程度）对整个文档进行评分本身就具有不合理性，因为译文中有些在抛开上下文环境的情况下可以被判定为``翻译得不错的''单词和句子，一旦被放在文档中的语境后就可能变得不合理，而某些在无语境条件下看起来翻译得`` 糟糕透了''的单词和句子，一旦被放在文档中的语境中可能会变得恰到好处。此外，构建一个质量评测模型势必需要大量的标注数据，而文档级质量评测所需要的带有注释的数据的获取代价相当高。
+\parinterval 实际上，文档级质量评估与其它文档级自然语言处理任务面临的问题是一样的。由于数据稀缺，无论是系统研发，还是结果评价都面临很大挑战。这些问题也会在本书的{\chaptersixteen}和{\chapterseventeen}进行讨论。
 %----------------------------------------------------------------------------------------
 %    NEW SUB-SECTION
@@ -800,47 +806,41 @@ Reference： A few days ago, {\red he} contacted the News Channel and said that 
 \subsection{怎样构建质量评估模型}
-\parinterval 不同于有参考答案的自动评价，质量评估任务中对译文质量进行评价需要更加``复杂''的计算方式。质量评估本质上是一个统计推断问题，即：如何根据以往得到的经验对从未见过的机器译文的质量做出预测。从这个角度说，质量评估和机器翻译问题一样，都需要设计模型进行求解，而不是像BLEU计算一样直接使用一两个指标性的公式就能得到结果。
+\parinterval 不同于有参考答案的自动评价，质量评估方法的实现较为复杂。质量评估可以被看作是一个统计推断问题，即：如何根据以往得到的经验对从未见过的机器译文的质量做出预测。从这个角度说，质量评估和机器翻译问题一样，都需要设计模型进行求解，而不是像BLEU计算一样直接使用指标性的公式计算就能得到结果。
-\parinterval 实际上，质量评估的灵感最初来源于语音识别中的置信度评价，所以最初研究人员也尝试通过翻译模型中的后验概率来直接评价翻译质量\upcite{DBLP:conf/interspeech/FetterDR96}，然而仅仅依靠概率值作为评价标准显然是远远不够的，其效果也让人大失所望。直到2003年，质量评估被定义为一个有监督的预测类机器学习问题，此后，``使用机器学习算法从许多通过特征（或学习表示）描述的句子翻译示例中归纳模型''成为了处理质量评估问题的基本思路。
+\parinterval 实际上，质量评估的灵感最初来源于语音识别中的置信度评价，所以最初研究人员也尝试通过翻译模型中的后验概率来直接评价翻译质量\upcite{DBLP:conf/interspeech/FetterDR96}，然而仅仅依靠概率值作为评价标准显然是远远不够的，其效果也让人大失所望。之后，质量评估被定义为一个有监督的预测类机器学习问题。这也催生了质量评估的新范式：使用机器学习算法从描述句子的某种表示中归纳出评估模型。
 \parinterval 研究人员将质量评估模型的基本框架设计为两部分：
-\parinterval（1）特征提取模块：用于在数据中提取能够反映翻译结果``质量''的特征。
-\parinterval（2）质量评估模块：基于提取的特征，利用机器学习算法预测翻译结果``质量''。
-\parinterval 特征提取模块主要提取四个方面的特征：从源文中提取复杂度特征、从机器译文中提取流畅度特征、借助机器翻译系统提取翻译置信度特征、比照源文和机器译文提取充分度特征。
 \begin{itemize}
 \vspace{0.5em}
-\item 复杂度特征：反映了翻译一个源文的难易程度，翻译难度越大，译文质量低的可能性就越大。源文的形态句法信息最能反映源文的复杂度，例如源文的长度越长，源文往往越复杂；源文的句法树越宽、越深，源文往往越复杂，因为源句的句法树越深代表句子结构越复杂，源句的句法树越宽代表句子中各成分相互联系越多，正如图\ref{fig:4-13}所示；源文中定语从句的数量越多，源文往往更复杂。
+\item 表示/特征学习模块：用于在数据中提取能够反映翻译结果质量的“特征”；
+\vspace{0.5em}
+\item 质量评估模块：基于句子的表示结果，利用机器学习算法预测翻译结果的质量。
+\vspace{0.5em}
+\end{itemize}
-%----------------------------------------------
+\parinterval 在传统机器学习的观点下，句子都是由某些特征表示的。因此需要从句子中“抽取”出能代表句子的特征{\color{red}（参考文献！）}，常用的特征有：
-\begin{figure}[htp]
-    \centering
-	\subfigure[较浅较窄的句法书意味着较简单的句子结构]{\input{./Chapter4/Figures/a-shallow-and-narrow-grammar-means-a-simpler-sentence-structure}}
-	\subfigure[较深较宽的语法树意味着更复杂的句子结构]{\input{./Chapter4/Figures/A-deeper-and-wider-grammar-tree-means-more-complex-sentence-structures}}
-   \caption{句法树隐含着复杂度特征}
-   \label{fig:4-13}
-\end{figure}
-%----------------------------------------------
+\begin{itemize}
+\vspace{0.5em}
+\item 复杂度特征：反映了翻译一个源文的难易程度，翻译难度越大，译文质量低的可能性就越大；
 \vspace{0.5em}
-\item 流畅度特征：反映了译文的自然度、流畅度、语法合理程度。为了衡量译文的流畅度，往往需要借助大型目标语言语料库、语言模型和语法检查工具等。例如借助大型目标语料库和统计语言模型获取的译文3-gram语言模型概率、利用语法检查工具获取的译文语法正确性等等，这些数学性指标均可用来衡量译文的流畅度。
+\item 流畅度特征：反映了译文的自然度、流畅度、语法合理程度；
 \vspace{0.5em}
-\item 置信度特征：反映了机器翻译系统对输出的译文的置信程度。翻译系统解码过程中对应的译文的全局概率、最终$n$-best清单中翻译假设的数量、译文中的词语在$n$-best输出中的出现次数等指标都可以作为机器翻译提供的置信度特征用于质量评估。
+\item 置信度特征：反映了机器翻译系统对输出的译文的置信程度；
-\item 充分度特征：反映了源文和机器译文在不同语言层次上的密切程度或关联程度。比较常用的充分度特征包括源文和译文的长度比、源文和译文的词对齐信息、源文和译文表层结构（例如括号、数字、标点符号等）数量的绝对差异、源文和译文句法树的深度和宽度差异、源文和译文中命名实体数量的差异、源文和译文之间$n$元语法单元的匹配比例，此外，还可以用源文和译文的分布式表示衡量其间的相似性。由于源文和译文之间语言的不同，充分度特征是最难可靠提取的特征类型。
+\vspace{0.5em}
+\item 充分度特征：反映了源文和机器译文在不同语言层次上的密切程度或关联程度。
 \vspace{0.5em}
 \end{itemize}
-\parinterval 随着深度学习技术的发展，目前比较常用的特征提取手段还包括利用神经网络模型自动提取质量特征，但由于这种方法的可解释性比较差，研究人员无法对该方法提取到的质量特征类型进行判断。
+\parinterval 随着深度学习技术的发展，另一种思路是使用表示学习技术生成句子的分布式表示，这样就避免了人工设计特征的代价。同时，由于表示学习所得到的分布式表示可以更全面地反映句子的特点，因此在质量评估任务上也取得了很好的效果{\color{red}（参考文献！）}。比如，最近的一些工作中大量使用了神经机器翻译模型来获得双语句子的表示结果，并用于质量评估。这么做的好处在于，质量评估可以直接复用机器翻译的模型，从某种意义上降低了质量评估系统开发的代价。关于表示学习和神经机器翻译的内容在{\chapternine}和{\chapterten}会有进一步描述。
-\parinterval 提取到质量特征之后，使用质量评估模块对译文质量进行预测。质量评估模型通常由分类算法或回归算法实现：
+\parinterval 在得到句子表示之后，可以使用质量评估模块对译文质量进行预测。质量评估模型通常由分类算法或回归算法实现：
 \begin{itemize}
 \vspace{0.5em}
-\item 句子级和文档级质量评估多由回归算法实现。由于在句子级和文档级的质量评估中，标签是使用连续数字表示的，因此回归算法是最合适的选择，其中最常用的算法有朴素贝叶斯、线性回归、支持向量机、脊回归、偏最小二乘法、随机森林算法等。
+\item 句子级和文档级质量评估多由回归算法实现。由于在句子级和文档级的质量评估中，标签是使用连续数字表示的，因此回归算法是最合适的选择；
 \vspace{0.5em}
 \item 单词级和短语级质量评估多由分类算法实现。对于单词级质量评估任务中标记``OK''或``BAD''，这对应了经典的二分类问题，因此经常使用分类算法对其进行预测，自动分类算法在{\chapterthree}已经涉及，质量评估中直接使用成熟的分类器即可。
 \vspace{0.5em}
@@ -852,21 +852,20 @@ Reference： A few days ago, {\red he} contacted the News Channel and said that 
 \subsection{质量评估的应用场景}
-\parinterval 质量评估在过去十年中越来越受欢迎，这个现象在一些机器翻译领域之外的人看来非常令人费解，无论是人工评价方法还是自动评价方法都能够对机器译文质量进行评价从而衡量机器翻译系统的整体性能，无参考答案评价方法看起来似乎是没有存在的必要，但事实却并非如此。
+\parinterval 很多情况下参考答案是很难获取的，例如，在很多人工翻译生产环节中，译员的任务就是“创造”翻译。如果已经有了答案，译员根本不需要工作，也谈不上应用机器翻译技术了。这时更多的是希望通过质量评估帮助译员有效地选择机器翻译结果。质量评估的应用场景还有很多，例如：
-\parinterval 传统的有参考答案的评价方式可以通过计算机器译文与参考答案之间的相似性用来评测机器翻译系统的整体性能，从而加快系统研发进程。与其相比，无需参考答案的质量评估是在无参考答案的情况下，直接对机器译文质量作出预测，这个课题的提出，其目的并不在于其``评价''功能而在于其``预测''功能。
-\parinterval 大多数情况下参考答案的获取具有很大难度，因而质量评估比传统的有参考答案的自动评价方法更接近生产生活的实际情况，更适合被应用到生产实践和实时机器翻译场景中，为社会创造更多的实用价值和商业价值。下面将列举几个质量评估合理的应用场景：
 \begin{itemize}
 \vspace{0.5em}
-\item 判断人工后编辑的工作量，辅助人工后编辑过程。人工后编辑工作中有两个不可避免的问题：待编辑的机器译文是否值得改、待编辑的机器译文需要修改哪里。对于一些质量较差的机器译文来说，人工重译远远比修改译文的效率高，后编辑人员可以借助质量评估模型提供的句子分数或是编辑距离这两个指标筛选出值得进行后编辑的机器译文，另一方面，质量评估模型可以为每条机器译文提供{错误内容、错误类型、错误严重程度}的注释，这些内容将帮助后编辑人员准确定位到需要修改的位置，同时在一定程度上提示后编辑人员采取何种修改策略，势必能大大减少后编辑的工作内容。
+\item 判断人工后编辑的工作量。人工后编辑工作中有两个不可避免的问题：1）待编辑的机器译文是否值得改？2）待编辑的机器译文需要修改哪里？对于一些质量较差的机器译文来说，人工重译远远比修改译文的效率高，后编辑人员可以借助质量评估系统提供的指标筛选出值得进行后编辑的机器译文，另一方面，质量评估模型可以为每条机器译文提供{错误内容、错误类型、错误严重程度}的注释，这些内容将帮助后编辑人员准确定位到需要修改的位置，同时在一定程度上提示后编辑人员采取何种修改策略，势必能大大减少后编辑的工作内容。
+\vspace{0.5em}
+\item 自动识别并更正翻译错误。质量评估模型和{\small\sffamily\bfseries{自动后编辑模型}}\index{自动后编辑模型}（Automatic Post-Editing，APE）\index{Automatic Post-Editing, APE}也是很有潜力的应用方向。因为质量评估可以预测出错的位置，进而可以使用自动方法修正这些错误。但是，在这种应用模式中，质量评估的精度是非常关键的，因为如果预测错误可能会产生错误的修改，甚至带来整体译文质量的下降。
 \vspace{0.5em}
-\item 自动识别翻译错误，助力机器翻译后编辑工作完全自动化.质量评估模型和{\small\sffamily\bfseries{自动后编辑模型}}\index{自动后编辑模型}（Automatic Podt-Editing,APE）\index{Automatic Podt-Editing,APE}协同工作能够实现后编辑工作的自动化和流水化，提高日常工作效率，创造更多的经济效益：将机器翻译系统输出作为质量评估模型的输入，质量评估模型能够自动识别出机器译文中不准确、不流畅的现象，完成三方面内容：锁定错误出现的位置、识别错误类型、描述错误的严重程度。此后，自动后编辑模型将根据质量评估模型提供的错误提示，自动对译文中的错误进行修改，并生成$n$个最优的后编辑译文。质量评估模型此后将充当``评委''的角色，在最优后编辑译文列表中筛选出后编辑工作的最终输出。比较遗憾的是，目前自动后编辑模型的输出结果与人工后编辑结果相比仍存在一定的差距，在一些对译文准确性要求较高的场合仍需要人工后编辑的参与，但相信随着质量评估技术和自动后编辑技术的发展，后编辑工作的完全自动化在不远的将来必然可以实现。
+\item 辅助外语交流和学习。例如，在很多社交网站上，用户会利用外语进行交流。质量评估模型可以提示该用户输入的内容中存在的用词、语法等问题，这样用户可以重新对内容进行修改。甚至质量评估可以帮助外语学习者发现外语使用中的问题，例如，对于一个英语初学者，如果能提示他/她写的句子中的明显错误，对他/她的外语学习是非常有帮助的。
 \vspace{0.5em}
-\item 多语言场景下，参与人机协同过程实现无语言障碍交流。在跨国性质的服务网站和社交网站这种典型的多语言场景下，质量评估模型将鼓励用户使用非母语语言进行交流。例如在某社交网站上，当一名英国用户尝试用自己并不熟练的德语对某个德国用户的言论发表评价时，质量评估模型可以提示该用户评论内容中存在的用词、语法等问题。或者该用户选择借助机器翻译系统将英文评论内容翻译为德文，质量评估模型可以对翻译内容进行评分，由用户根据评分的高低决定使用机器翻译系统输出的德文进行评论还是使用原本的英文进行评论；例如在某国际酒店的预定网站上，酒店经营者希望使用机器翻译系统将某些服务评价内容译为多种语言供顾客参考，使用质量评估模型后，可以筛选出更加准确流畅的评价译文对顾客进行公示。在大型国际会议现场，{\small\sffamily\bfseries{自动语音识别系统}}\index{自动语音识别系统}（Automatic Speech Recognition，ASR）\index{Automatic Speech Recognition，ASR}、{\small\sffamily\bfseries{机器翻译系统}}\index{机器翻译系统}（Machine Translation，MT）\index{Machine Translation，MT}和质量评估工具的相互配合，有望在未来取代随身翻译和国际会议现场的同声传译专业人员：ASR系统给出较为准确的语音识别结果，由几个高性能的MT系统对其进行翻译，产生若干翻译结果，使用质量评估工具对翻译结果进行质量评估，评分最高的译文作为整体输出。比较遗憾的是，目前语音识别和机器翻译的发展水平都并未达到会议级别的要求，所以以机器代替专业人员还需要很长一段时间。
 \end{itemize}
+\parinterval 需要注意的是，质量评估的应用模式还没有完全得到验证。这一方面是由于，质量评估的应用非常依赖与人的交互过程。但是，改变人的工作习惯是很困难的，因此质量评估系统在应用时往往需要很长的时间适应到场景中，或者说人也要适应质量评估系统的行为。另一方面，质量评估的很多应用场景还没有完全被发掘出来，需要更长的时间进行探索。
 %----------------------------------------------------------------------------------------
 %    NEW SECTION
 %----------------------------------------------------------------------------------------
@@ -874,18 +873,18 @@ Reference： A few days ago, {\red he} contacted the News Channel and said that 
 \sectionnewpage
 \section{小结及深入阅读}
-\parinterval 译文的质量评价是机器翻译研究中不可或缺的环节。与其他任务不同，由于自然语言高度的歧义性和表达方式的多样性，机器翻译的参考答案本身就不唯一。此外，对译文准确、全面的评价准则很难制定，导致译文质量的自动评价变得异常艰难，因此其也成为了广受关注的研究课题。本章系统阐述了译文质量评估的研究现状和主要挑战。从人类参与程度和标注类型两个角度对译文质量评价中的经典方法进行介绍，其中对广受学界关注的无参考译文的质量评估问题从方法、模型、应用各个角度进行着重介绍，力求读者对领域内的热点内容有更加全面的了解。比较遗憾的是，由于篇幅限制笔者无法对译文评价的相关工作讲述得面面俱到，除了章节中的内容，还有很多研究问题值得关注：
+\parinterval 译文的质量评价是机器翻译研究中不可或缺的环节。与其他任务不同，由于自然语言高度的歧义性和表达方式的多样性，机器翻译的参考答案本身就不唯一。此外，对译文准确、全面的评价准则很难制定，导致译文质量的自动评价变得异常艰难，因此也成为了广受关注的研究课题。本章系统阐述了译文质量评估的研究现状和主要挑战。从人类参与程度和标注类型两个角度对译文质量评价中的经典方法进行介绍，力求让读者对领域内的经典及热点内容有更加全面的了解。不过，由于篇幅限制笔者无法对译文评价的相关工作进行面面俱到的描述，还有很多研究方向值得关注：
 \begin{itemize}
 \vspace{0.5em}
-\item 基于句法和语义的机器译文质量自动评价方法。本章内容中介绍的自动评价多是基于表面字符串形式判定机器翻译结果和参考译文之间的相似度，而忽略了更抽象的语言层次的信息。基于句法和语义的机器译文质量自动评价方法在在评价度量标准中加入能反映句法信息\upcite{DBLP:conf/acl/LiuG05}和语义信息\upcite{DBLP:conf/wmt/GimenezM07a}的相关内容，通过比较机器译文与参考答案之间的句法相似度和语义等价性\upcite{DBLP:journals/mt/PadoCGJM09}，能够大大提高自动评价与人工评价之间的相关性。其中句法信息往往能够对机器译文流利度方面的评价起到促进作用\upcite{DBLP:conf/acl/LiuG05}，常见的句法信息包括语法成分\upcite{DBLP:conf/acl/LiuG05}、依赖关系\upcite{DBLP:conf/ssst/OwczarzakGW07}\upcite{DBLP:conf/wmt/OwczarzakGW07}\upcite{DBLP:conf/coling/YuWXJLL14}、句法结构\upcite{DBLP:conf/wmt/PopovicN09}等。语义信息则机器翻译的充分性评价更有帮助\upcite{DBLP:conf/acl/BanchsL11}\upcite{reeder2006measuring}，近年来也有很多很多用于机器译文质量评估的语义框架被提出，如AM-FM\upcite{DBLP:conf/acl/BanchsL11}、XMEANT\upcite{DBLP:conf/acl/LoBSW14}等。
+\item 基于句法和语义的机器译文质量自动评价方法。本章内容中介绍的自动评价多是基于表面字符串形式判定机器翻译结果和参考译文之间的相似度，而忽略了更抽象的语言层次的信息。基于句法和语义的机器译文质量自动评价方法在评价度量标准中加入能反映句法信息\upcite{DBLP:conf/acl/LiuG05}和语义信息\upcite{DBLP:conf/wmt/GimenezM07a}的相关内容，通过比较机器译文与参考答案之间的句法相似度和语义等价性\upcite{DBLP:journals/mt/PadoCGJM09}，能够大大提高自动评价与人工评价之间的相关性。其中句法信息往往能够对机器译文流利度方面的评价起到促进作用\upcite{DBLP:conf/acl/LiuG05}，常见的句法信息包括语法成分\upcite{DBLP:conf/acl/LiuG05}、依存关系\upcite{DBLP:conf/ssst/OwczarzakGW07}\upcite{DBLP:conf/wmt/OwczarzakGW07}\upcite{DBLP:conf/coling/YuWXJLL14}等。语义信息则对机器翻译的充分性评价更有帮助\upcite{DBLP:conf/acl/BanchsL11}\upcite{reeder2006measuring}，近年来也有很多用于机器译文质量评估的语义框架被提出，如AM-FM\upcite{DBLP:conf/acl/BanchsL11}、XMEANT\upcite{DBLP:conf/acl/LoBSW14}等。
 \vspace{0.5em}
-\item 对机器译文中的错误分析和错误分类。无论是人工评价还是自动评价手段，其评价结果只能反映机器翻译系统性能，而无法确切表明机器翻译系统的强项或弱点是什么、系统最常犯什么类型的错误、一个特定的修改是否改善了系统的某一方面、排名较差的系统是否在任何方面都优于排名较好的系统等等。对机器译文进行错误分析和错误分类有助于找出机器翻译系统中存在的主要问题，以便集中精力进行研究改进\upcite{DBLP:conf/lrec/VilarXDN06}。相关的研究工作中，一些致力于错误分类方法的设计，如手动的机器译文错误分类框架\upcite{DBLP:conf/lrec/VilarXDN06}、自动的机器译文错误分类框架\upcite{popovic2011human}、基于语言学的错误分类方法\upcite{DBLP:journals/mt/CostaLLCC15}以及目前被用作篇章级质量评估注释标准的MQM错误分类框架\upcite{lommel2014using}；其他的研究工作则致力于对机器译文进行错误分析，如引入形态句法信息的自动错误分析框架\upcite{DBLP:conf/wmt/PopovicGGLNMFB06}、引入词错误率(WER)和位置无关词错误率(PER)的错误分析框架\upcite{DBLP:conf/wmt/PopovicN07}、基于检索的错误分析工具tSEARCH\upcite{DBLP:conf/acl/GonzalezMM13}等等。
+\item 对机器译文中的错误分析和错误分类。无论是人工评价还是自动评价手段，其评价结果只能反映机器翻译系统性能，而无法确切表明机器翻译系统的优点和弱点是什么、系统最常犯什么类型的错误、一个特定的修改是否改善了系统的某一方面、排名较差的系统是否在任何方面都优于排名较好的系统等等。对机器译文进行错误分析和错误分类有助于找出机器翻译系统中存在的主要问题，以便集中精力进行研究改进\upcite{DBLP:conf/lrec/VilarXDN06}。相关的研究工作中，一些致力于错误分类方法的设计，如手动的机器译文错误分类框架\upcite{DBLP:conf/lrec/VilarXDN06}、自动的机器译文错误分类框架\upcite{popovic2011human}、基于语言学的错误分类方法\upcite{DBLP:journals/mt/CostaLLCC15}以及目前被用作篇章级质量评估注释标准的MQM错误分类框架\upcite{lommel2014using}；其他的研究工作则致力于对机器译文进行错误分析，如引入形态句法信息的自动错误分析框架\upcite{DBLP:conf/wmt/PopovicGGLNMFB06}、引入词错误率（WER）和位置无关词错误率（PER）的错误分析框架\upcite{DBLP:conf/wmt/PopovicN07}、基于检索的错误分析工具tSEARCH\upcite{DBLP:conf/acl/GonzalezMM13}等等。
 \vspace{0.5em}
-\item 译文质量的多角度评价。章节内主要介绍的几种经典方法如BLEU、TER、METEOR等，大都是从某个单一的角度计算机器译文和参考答案的相似性，如何对译文从多个角度进行综合评价是需要进一步思考的问题，\ref{Evaluation method of Multi Strategy fusion}节中介绍的多策略融合评价方法就可以看作是一种多角度评价方法，其思想是将各种评价方法下的译文得分通过某种方式进行组合，从而实现对译文的综合评价。译文质量多角度评价的另一种思路则是直接将BLEU、TER、Meteor等多种指标看做是某种特征，使用分类、回归、排序等机器学习手段形成一种综合度量。此外，也有相关工作专注于多等级的译文质量评价，使用聚类算法大致将译文按其质量分为不同等级，并对不同质量等级的译文按照不同权重组合几种不同的评价方法。
+\item 译文质量的多角度评价。章节内主要介绍的几种经典方法如BLEU、TER、METEOR等，大都是从某个单一的角度计算机器译文和参考答案的相似性，如何对译文从多个角度进行综合评价是需要进一步思考的问题，\ref{Evaluation method of Multi Strategy fusion}节中介绍的多策略融合评价方法就可以看作是一种多角度评价方法，其思想是将各种评价方法下的译文得分通过某种方式进行组合，从而实现对译文的综合评价。译文质量多角度评价的另一种思路则是直接将BLEU、TER、Meteor等多种指标看做是某种特征，使用分类、回归、排序等机器学习手段形成一种综合度量。此外，也有相关工作专注于多等级的译文质量评价，使用聚类算法将译文按其质量大致分为不同等级，并对不同质量等级的译文按照不同权重组合几种不同的评价方法。
 \vspace{0.5em}
-\item 不同评价方法的应用场景有明显不同：人工评价主要用于需要对机器翻译系统进行准确的评估的场合。例如，在系统对比中利用人工评价方法对不同系统进行人工评价、给出最终排名，或上线机器翻译服务时对翻译品质进行详细的测试；有参考答案的自动评价则可以为机器翻译系统提供快速、相对可靠的评价。在机器翻译系统的快速研发过程中，一般都使用有参考答案的自动评价方法对最终模型的性能进行评估。有相关研究工作专注在机器翻译模型的训练过程中充分利用评价信息进行参数调优（如BLEU分数），其中比较有代表性的工作包括最小错误率训练\upcite{DBLP:conf/acl/Och03}、最小风险训练等\upcite{DBLP:conf/acl/ShenCHHWSL16}。这部分内容可以参考{\chapterseven}和{\chapterthirteen}进行进一步阅读；无参考答案的质量评估主要用来对译文质量做出预测，经常被应用在是在一些无法提供参考译文的实时翻译场景中，例如人机交互过程、自动纠错、后编辑等\upcite{DBLP:conf/wmt/FreitagCR19}。
+\item 不同评价方法的应用场景有明显不同：人工评价主要用于需要对机器翻译系统进行准确的评估的场合。例如，在系统对比中利用人工评价方法对不同系统进行人工评价、给出最终排名，或上线机器翻译服务时对翻译品质进行详细的测试；有参考答案的自动评价则可以为机器翻译系统提供快速、相对可靠的评价。在机器翻译系统的快速研发过程中，一般都使用有参考答案的自动评价方法对最终模型的性能进行评估。有相关研究工作专注在机器翻译模型的训练过程中充分利用评价信息进行参数调优（如BLEU分数），其中比较有代表性的工作包括最小错误率训练\upcite{DBLP:conf/acl/Och03}、最小风险训练\upcite{DBLP:conf/acl/ShenCHHWSL16}等。这部分内容可以参考{\chapterseven}和{\chapterthirteen}进行进一步阅读；无参考答案的质量评估主要用来对译文质量做出预测，经常被应用在一些无法提供参考译文的实时翻译场景中，例如人机交互过程、自动纠错、后编辑等\upcite{DBLP:conf/wmt/FreitagCR19}。
 \vspace{0.5em}
-\item 质量评估领域比较值得关注的一个研究问题是如何使模型更加鲁棒，因为通常情况下，一个质量评估模型会受语种、评价等级等问题的约束，设计一个能应用于任何语种，同时从单词、短语、句子等各个等级对译文质量进行评估的模型是很有难度的。Bicici等人最先关注质量评估的鲁棒性问题，并设计开发了一种与语言无关的机器翻译性能预测器\upcite{DBLP:journals/mt/BiciciGG13}，此后又在该工作的基础上研究如何利用外在的、与语言无关的特征对译文进行句子级别的质量评估\upcite{DBLP:conf/wmt/BiciciW14}，该项研究的最终成果是一个与语言无关，可以从各个等级对译文质量进行评估的模型——RTMs（Referential Translation Machines）\upcite{DBLP:conf/wmt/BiciciLW15a}。
+\item 另一个比较值得关注的研究问题是如何使模型更加鲁棒，因为通常情况下，一个质量评估模型会受语种、评价策略等问题的约束，设计一个能应用于任何语种，同时从单词、短语、句子等各个等级对译文质量进行评估的模型是很有难度的。Bicici等人最先关注质量评估的鲁棒性问题，并设计开发了一种与语言无关的机器翻译性能预测器\upcite{DBLP:journals/mt/BiciciGG13}，此后又在该工作的基础上研究如何利用外在的、与语言无关的特征对译文进行句子级别的质量评估\upcite{DBLP:conf/wmt/BiciciW14}，该项研究的最终成果是一个与语言无关，可以从各个等级对译文质量进行评估的模型——RTMs（Referential Translation Machines）\upcite{DBLP:conf/wmt/BiciciLW15a}。
 \vspace{0.5em}
 \end{itemize}
--- a/Chapter5/chapter5.tex
+++ b/Chapter5/chapter5.tex
@@ -37,7 +37,7 @@ IBM模型由Peter F. Brown等人于上世纪九十年代初提出\cite{DBLP:jour
 \parinterval 在翻译任务中，我们希望得到一个源语言到目标语言的翻译。对于人类来说这个问题很简单，但是让计算机做这样的工作却很困难。这里面临的第一个问题是：如何对翻译进行建模？从计算机的角度来看，这就需要把自然语言的翻译问题转换为计算机可计算的问题。
-\parinterval 那么，基于单词的统计机器翻译模型又是如何描述翻译问题的呢？Peter F. Brown等人提出了一个观点\cite{Peter1993The}：在翻译一个句子时，可以把其中的每个单词翻译成对应的目标语言单词，然后调整这些目标语言单词的顺序，最后得到整个句子的翻译结果，而这个过程可以用统计模型来描述。尽管在人看来使用两个语言单词之间的对应进行翻译是很自然的事，但是对于计算机来说可是向前迈出了一大步。
+\parinterval 那么，基于单词的统计机器翻译模型又是如何描述翻译问题的呢？Peter F. Brown等人提出了一个观点\cite{DBLP:journals/coling/BrownPPM94}：在翻译一个句子时，可以把其中的每个单词翻译成对应的目标语言单词，然后调整这些目标语言单词的顺序，最后得到整个句子的翻译结果，而这个过程可以用统计模型来描述。尽管在人看来使用两个语言单词之间的对应进行翻译是很自然的事，但是对于计算机来说可是向前迈出了一大步。
 \parinterval 先来看一个例子。图 \ref{fig:5-1}展示了一个汉语翻译到英语的例子。首先，可以把源语言句子中的单词``我''、``对''、``你''、``感到''和``满意''分别翻译为``I''、``with''、``you''、``am''\ 和``satisfied''，然后调整单词的顺序，比如，``am''放在译文的第2个位置，``you''应该放在最后的位置等等，最后得到译文``I am satisfied with you''。
@@ -529,7 +529,7 @@ g(\vectorn{s},\vectorn{t}) \equiv \prod_{j,i \in \widehat{A}}{\funp{P}(s_j,t_i)}
 %----------------------------------------------
 \vspace{-0.5em}
-\parinterval IBM模型也是建立在如上统计模型之上。具体来说，IBM模型的基础是{\small\sffamily\bfseries{噪声信道模型}}\index{噪声信道模型}（Noise Channel Model）\index{Noise Channel Model}，它是由Shannon在上世纪40年代末提出来的\cite{shannon1949communication}，并于上世纪80年代应用在语言识别领域，后来又被Brown等人用于统计机器翻译中\cite{brown1990statistical,Peter1993The}。
+\parinterval IBM模型也是建立在如上统计模型之上。具体来说，IBM模型的基础是{\small\sffamily\bfseries{噪声信道模型}}\index{噪声信道模型}（Noise Channel Model）\index{Noise Channel Model}，它是由Shannon在上世纪40年代末提出来的\cite{shannon1949communication}，并于上世纪80年代应用在语言识别领域，后来又被Brown等人用于统计机器翻译中\cite{brown1990statistical,DBLP:journals/coling/BrownPPM94}。
 \parinterval 在噪声信道模型中，源语言句子$\vectorn{s}$（信宿）被看作是由目标语言句子$\vectorn{t}$（信源）经过一个有噪声的信道得到的。如果知道了$\vectorn{s}$和信道的性质，可以通过$\funp{P}(\vectorn{t}|\vectorn{s})$得到信源的信息，这个过程如图\ref{fig:5-13}所示。
@@ -578,7 +578,7 @@ g(\vectorn{s},\vectorn{t}) \equiv \prod_{j,i \in \widehat{A}}{\funp{P}(s_j,t_i)}
 \parinterval 公式\ref{eq:5-16}展示了IBM模型最基础的建模方式，它把模型分解为两项：（反向）翻译模型$\funp{P}(\vectorn{s}|\vectorn{t})$和语言模型$\funp{P}(\vectorn{t})$。一个很自然的问题是：直接用$\funp{P}(\vectorn{t}|\vectorn{s})$定义翻译问题不就可以了吗，为什么要用$\funp{P}(\vectorn{s}|\vectorn{t})$和$\funp{P}(\vectorn{t})$的联合模型？从理论上来说，正向翻译模型$\funp{P}(\vectorn{t}|\vectorn{s})$和反向翻译模型$\funp{P}(\vectorn{s}|\vectorn{t})$的数学建模可以是一样的，因为我们只需要在建模的过程中把两个语言调换即可。使用$\funp{P}(\vectorn{s}|\vectorn{t})$和$\funp{P}(\vectorn{t})$的联合模型的意义在于引入了语言模型，它可以很好的对译文的流畅度进行评价，确保结果是通顺的目标语言句子。
-\parinterval 可以回忆一下\ref{sec:sentence-level-translation}节中讨论的问题，如果只使用翻译模型可能会造成一个局面：译文的单词都和源语言单词对应的很好，但是由于语序的问题，读起来却不像人说的话。从这个角度说，引入语言模型是十分必要的。这个问题在Brown等人的论文中也有讨论\cite{Peter1993The}，他们提到单纯使用$\funp{P}(\vectorn{s}|\vectorn{t})$会把概率分配给一些翻译对应比较好但是不合法的目标语句子，而且这部分概率可能会很大，影响模型的决策。这也正体现了IBM模型的创新之处，作者用数学技巧把$\funp{P}(\vectorn{t})$引入进来，保证了系统的输出是通顺的译文。语言模型也被广泛使用在语音识别等领域以保证结果的流畅性，甚至应用的历史比机器翻译要长得多，这里的方法也有借鉴相关工作的味道。
+\parinterval 可以回忆一下\ref{sec:sentence-level-translation}节中讨论的问题，如果只使用翻译模型可能会造成一个局面：译文的单词都和源语言单词对应的很好，但是由于语序的问题，读起来却不像人说的话。从这个角度说，引入语言模型是十分必要的。这个问题在Brown等人的论文中也有讨论\cite{DBLP:journals/coling/BrownPPM94}，他们提到单纯使用$\funp{P}(\vectorn{s}|\vectorn{t})$会把概率分配给一些翻译对应比较好但是不合法的目标语句子，而且这部分概率可能会很大，影响模型的决策。这也正体现了IBM模型的创新之处，作者用数学技巧把$\funp{P}(\vectorn{t})$引入进来，保证了系统的输出是通顺的译文。语言模型也被广泛使用在语音识别等领域以保证结果的流畅性，甚至应用的历史比机器翻译要长得多，这里的方法也有借鉴相关工作的味道。
 实际上，在机器翻译中引入语言模型是一个很深刻的概念。在IBM模型之后相当长的时间里，语言模型一直是机器翻译各个部件中最重要的部分。对译文连贯性的建模也是所有系统中需要包含的内容（即使隐形体现）。
@@ -1088,18 +1088,21 @@ c_{\mathbb{E}}(s_u|t_v)=\sum\limits_{i=1}^{N}  c_{\mathbb{E}}(s_u|t_v;s^{[i]},t^
 \sectionnewpage
 \section{小结及深入阅读}
-\parinterval 本章对IBM系列模型中的IBM模型1进行了详细的介绍和讨论，从一个简单的基于单词的翻译模型开始，本章从建模、解码、训练多个维度对统计机器翻译进行了描述，期间涉及了词对齐、优化等多个重要概念。IBM模型共分为5个模型，对翻译问题的建模依次由浅入深，同时模型复杂度也依次增加，我们将在下一章对IBM模型2-5进行详细的介绍和讨论。IBM模型作为入门统计机器翻译的``必经之路''，其思想对今天的机器翻译仍然产生着影响。虽然单独使用IBM模型进行机器翻译现在已经不多见，甚至很多从事神经机器翻译等前沿研究的人对IBM模型已经逐渐淡忘，但是不能否认IBM模型标志着一个时代的开始。从某种意义上讲，当使用公式$\hat{\vectorn{t}} = \argmax_{\vectorn{t}} \funp{P}(\vectorn{t}|\vectorn{s})$描述机器翻译问题的时候，或多或少都在与IBM模型使用相似的思想。
+\parinterval 本章对IBM系列模型中的IBM模型1进行了详细的介绍和讨论，从一个简单的基于单词的翻译模型开始，本章从建模、解码、训练多个维度对统计机器翻译进行了描述，期间涉及了词对齐、优化等多个重要概念。IBM模型共分为5个模型，对翻译问题的建模依次由浅入深，同时模型复杂度也依次增加，我们将在{\chaptersix}对IBM模型2-5进行详细的介绍和讨论。IBM模型作为入门统计机器翻译的``必经之路''，其思想对今天的机器翻译仍然产生着影响。虽然单独使用IBM模型进行机器翻译现在已经不多见，甚至很多从事神经机器翻译等前沿研究的人对IBM模型已经逐渐淡忘，但是不能否认IBM模型标志着一个时代的开始。从某种意义上讲，当使用公式$\hat{\vectorn{t}} = \argmax_{\vectorn{t}} \funp{P}(\vectorn{t}|\vectorn{s})$描述机器翻译问题的时候，或多或少都在与IBM模型使用相似的思想。
-{\color{red}词对齐需要扩充，还不太清楚具体是什么，需要问老师}
+\parinterval 当然，本书也无法涵盖IBM模型的所有内涵，很多内容需要感兴趣的读者继续研究和挖掘。其中最值得关注的是统计词对齐问题。由于词对齐是IBM模型训练的间接产物，因此IBM模型成为了自动词对齐的重要方法。比如IBM模型训练工具GIZA++更多的是被用于自动词对齐任务，而非简单的训练IBM模型参数\upcite{och2003systematic}。
-\parinterval 当然，本书也无法涵盖IBM模型的所有内涵，很多内容需要感兴趣的读者继续研究和挖掘，有两个方向可以考虑：
 \begin{itemize}
 \vspace{0.5em}
-\item IBM模型在提出后的十余年中，一直受到了学术界的关注。一个比较有代表性的成果是GIZA++（\url{https://github.com/moses-smt/giza-pp}），它集成了IBM模型和隐马尔可夫模型，并实现了这些模型的训练。在随后相当长的一段时间里，GIZA++也是机器翻译研究的标配，用于获得双语平行数据上单词一级的对齐结果。此外，研究者也对IBM模型进行了大量的分析，为后人研究统计机器翻译提供了大量依据\cite{och2004alignment}。虽然IBM模型很少被独立使用，甚至直接用基于IBM模型的解码器也不多见，但是它通常会作为其他模型的一部分参与到对翻译的建模中。这部分工作会在下一章{\color{red}基于短语和句法的模型}中进行讨论\cite{koehn2003statistical}。此外，IBM模型也给机器翻译提供了一种非常简便的计算双语词串对应好坏的方式，因此也被广泛用于度量双语词串对应的强度，是自然语言处理中的一种常用特征。
+\item 在IBM基础模型之上，有很多改进的工作。例如，对空对齐、低频词进行额外处理\upcite{DBLP:conf/acl/Moore04}；考虑源语言-目标语言和目标语言-源语言双向词对齐进行更好的词对齐对称化\upcite{肖桐1991面向统计机器翻译的重对齐方法研究}；使用词典、命名实体等多种信息对模型进行改进\upcite{2005Improving}；通过引入短语增强IBM基础模型\upcite{1998Grammar}；引入相邻单词对齐之间的依赖关系增加模型鲁棒性\upcite{DBLP:conf/acl-vlc/DaganCG93}等；也可以对IBM模型的正向和反向结果进行对称化处理，以得到更加准确的词对齐结果\upcite{och2003systematic}。
+\item 随着词对齐概念的不断深入，也有很多词对齐方面的工作并不依赖IBM模型。比如，可以直接使用判别式模型利用分类器解决词对齐问题\upcite{ittycheriah2005maximum}；使用带参数控制的动态规划方法来提高词对齐准确率\upcite{DBLP:conf/naacl/GaleC91}；甚至可以把对齐的思想用于短语和句法结构的双语对应\upcite{xiao2013unsupervised}；也可以使用无监督的对称词对齐方法，对正向和反向模型进行联合训练，并结合数据的相似性\upcite{DBLP:conf/naacl/LiangTK06}；除了GIZA++，研究人员也开发了很多优秀的自动对齐工具，比如，FastAlign\upcite{DBLP:conf/naacl/DyerCS13}、Berkeley Aligner（\url{https://github.com/mhajiloo/berkeleyaligner}）等，这些工具现在也有很广泛的应用。
 \vspace{0.5em}
-\item 除了在机器翻译建模上的开创性工作，IBM模型的另一项重要贡献是建立了统计词对齐的基础模型。在训练IBM模型的过程中，除了学习到模型参数，还可以得到双语数据上的词对齐结果。也就是说词对齐标注是IBM模型训练的间接产物。这也使得IBM模型成为了自动词对齐的重要方法。包括GIZA++在内的很多工作，实际上更多的是被用于自动词对齐任务，而非简单的训练IBM模型参数。随着词对齐概念的不断深入，这个任务逐渐成为了自然语言处理中的重要分支，比如，对IBM模型的结果进行对称化\cite{och2003systematic}，也可以直接使用判别式模型利用分类模型解决词对齐问题\cite{ittycheriah2005maximum}，甚至可以把对齐的思想用于短语和句法结构的双语对应\cite{xiao2013unsupervised}。除了GIZA++，研究人员也开发了很多优秀的自动词对齐工具，比如，FastAlign （\url{https://github.com/clab/fast_align}）、Berkeley Aligner（\url{https://github.com/mhajiloo/berkeleyaligner}）等，这些工具现在也有很广泛的应用。
+\item 一种较为通用的词对齐评价标准是{\bfnew{对齐错误率}}（Alignment Error Rate, AER）\upcite{DBLP:journals/coling/FraserM07}。在此基础之上也可以对词对齐评价方法进行改进，以提高对齐质量与机器翻译评价得分BLEU的相关性\upcite{DBLP:conf/acl/DeNeroK07,paul2007all,黄书剑2009一种错误敏感的词对齐评价方法}。也有工作通过统计机器翻译系统性能的提升来评价对齐质量\upcite{DBLP:journals/coling/FraserM07}。不过，在相当长的时间内，词对齐质量对机器翻译系统的影响究竟如何并没有统一的结论。有些时候，词对齐的错误率下降了，但是机器翻译系统的译文品质并没有随之提升。但是，这个问题比较复杂，需要进一步的论证。不过，可以肯定的是，词对齐可以帮助人们分析机器翻译的行为。甚至在最新的神经机器翻译中，如何在神经网络模型中寻求两种语言单词之间的对应关系也是对模型进行解释的有效手段之一\upcite{DBLP:journals/corr/FengLLZ16}。
 \vspace{0.5em}
+\item 基于单词的翻译模型的解码问题也是早期研究者所关注的。比较经典的方法是贪婪方法\upcite{germann2003greedy}。也有研究者对不同的解码方法进行了对比\upcite{germann2001fast}，并给出了一些加速解码的思路。随后，也有工作进一步对这些方法进行改进\upcite{DBLP:conf/coling/UdupaFM04,DBLP:conf/naacl/RiedelC09}。实际上，基于单词的模型的解码是一个NP完全问题\upcite{knight1999decoding}，这也是为什么机器翻译的解码十分困难的原因。关于翻译模型解码算法的时间复杂度也有很多讨论\upcite{DBLP:conf/eacl/UdupaM06,DBLP:conf/emnlp/LeuschMN08,DBLP:journals/mt/FlemingKN15}。
 \end{itemize}

--- a/Chapter6/chapter6.tex
+++ b/Chapter6/chapter6.tex
@@ -34,7 +34,7 @@
 \sectionnewpage
 \section{基于扭曲度的翻译模型}
-下面将介绍扭曲度在机器翻译中的定义及使用方法。这也带来了两个新的翻译模型\ \dash\ IBM模型2\upcite{Peter1993The}和HMM翻译模型\upcite{vogel1996hmm}。
+下面将介绍扭曲度在机器翻译中的定义及使用方法。这也带来了两个新的翻译模型\ \dash\ IBM模型2\upcite{DBLP:journals/coling/BrownPPM94}和HMM翻译模型\upcite{vogel1996hmm}。
 %----------------------------------------------------------------------------------------
 %    NEW SUB-SECTION
@@ -71,7 +71,7 @@
 %----------------------------------------------------------------------------------------
 \subsection{IBM模型2}
-\parinterval 对于建模来说，IBM模型1很好地化简了翻译问题，但是由于使用了很强的假设，导致模型和实际情况有较大差异。其中一个比较严重的问题是假设词对齐的生成概率服从均匀分布。IBM模型2抛弃了这个假设\upcite{Peter1993The}。它认为词对齐是有倾向性的，它与源语言单词的位置和目标语言单词的位置有关。具体来说，对齐位置$a_j$的生成概率与位置$j$、源语言句子长度$m$和目标语言句子长度$l$有关，形式化表述为：
+\parinterval 对于建模来说，IBM模型1很好地化简了翻译问题，但是由于使用了很强的假设，导致模型和实际情况有较大差异。其中一个比较严重的问题是假设词对齐的生成概率服从均匀分布。IBM模型2抛弃了这个假设\upcite{DBLP:journals/coling/BrownPPM94}。它认为词对齐是有倾向性的，它与源语言单词的位置和目标语言单词的位置有关。具体来说，对齐位置$a_j$的生成概率与位置$j$、源语言句子长度$m$和目标语言句子长度$l$有关，形式化表述为：
 \begin{eqnarray}
 \funp{P}(a_j|a_1^{j-1},s_1^{j-1},m,\vectorn{t}) \equiv a(a_j|j,m,l)
@@ -93,7 +93,7 @@
 \begin{eqnarray}
 \funp{P}(m|\vectorn{t}) & \equiv & \varepsilon \label{eq:s-len-gen-prob} \\
-\funp{P}(s_j|a_1^{j},s_1^{j-1},m,\vectorn{t}) & \equiv & f(s_j|t_{a_j}) 
+\funp{P}(s_j|a_1^{j},s_1^{j-1},m,\vectorn{t}) & \equiv & f(s_j|t_{a_j})
 \label{eq:s-word-gen-prob}
 \end{eqnarray}
@@ -173,7 +173,7 @@
 \parinterval 从前面的介绍可知，IBM模型1和模型2把不同的源语言单词看作相互独立的单元来进行词对齐和翻译。换句话说，即使某个源语言短语中的两个单词都对齐到同一个目标语单词，它们之间也是相互独立的。这样IBM模型1和模型2对于多个源语言单词对齐到同一个目标语单词的情况并不能很好地进行描述。
-\parinterval 这里将会给出另一个翻译模型，能在一定程度上解决上面提到的问题\upcite{Peter1993The,och2003systematic}。该模型把目标语言生成源语言的过程分解为如下几个步骤：首先，确定每个目标语言单词生成源语言单词的个数，这里把它称为{\small\sffamily\bfseries{繁衍率}}\index{繁衍率}或{\small\sffamily\bfseries{产出率}}\index{产出率}（Fertility）\index{Fertility}；其次，决定目标语言句子中每个单词生成的源语言单词都是什么，即决定生成的第一个源语言单词是什么，生成的第二个源语言单词是什么，以此类推。这样每个目标语言单词就对应了一个源语言单词列表；最后把各组源语言单词列表中的每个单词都放置到合适的位置上，完成目标语言译文到源语言句子的生成。
+\parinterval 这里将会给出另一个翻译模型，能在一定程度上解决上面提到的问题\upcite{DBLP:journals/coling/BrownPPM94,och2003systematic}。该模型把目标语言生成源语言的过程分解为如下几个步骤：首先，确定每个目标语言单词生成源语言单词的个数，这里把它称为{\small\sffamily\bfseries{繁衍率}}\index{繁衍率}或{\small\sffamily\bfseries{产出率}}\index{产出率}（Fertility）\index{Fertility}；其次，决定目标语言句子中每个单词生成的源语言单词都是什么，即决定生成的第一个源语言单词是什么，生成的第二个源语言单词是什么，以此类推。这样每个目标语言单词就对应了一个源语言单词列表；最后把各组源语言单词列表中的每个单词都放置到合适的位置上，完成目标语言译文到源语言句子的生成。
 \parinterval 对于句对$(\vectorn{s},\vectorn{t})$，令$\varphi$表示产出率，同时令${\tau}$表示每个目标语言单词对应的源语言单词列表。图{\ref{fig:6-5}}描述了一个英语句子生成汉语句子的过程。
@@ -320,7 +320,7 @@ p_0+p_1                            & = & 1 \label{eq:6-21}
 \parinterval IBM模型3仍然存在问题，比如，它不能很好地处理一个目标语言单词生成多个源语言单词的情况。这个问题在模型1和模型2中也存在。如果一个目标语言单词对应多个源语言单词，往往这些源语言单词构成短语或搭配。但是模型1-3把这些源语言单词看成独立的单元，而实际上它们是一个整体。这就造成了在模型1-3中这些源语言单词可能会``分散''开。为了解决这个问题，模型4对模型3进行了进一步修正。
-\parinterval 为了更清楚的阐述，这里引入新的术语\ \dash \ {\small\bfnew{概念单元}}\index{概念单元}或{\small\bfnew{概念}}\index{概念}（Concept）\index{Concept}。词对齐可以被看作概念之间的对应。这里的概念是指具有独立语法或语义功能的一组单词。依照Brown等人的表示方法\upcite{Peter1993The}，可以把概念记为cept.。每个句子都可以被表示成一系列的cept.。这里要注意的是，源语言句子中的cept.数量不一定等于目标句子中的cept.数量。因为有些cept. 可以为空，因此可以把那些空对的单词看作空cept.。比如，在图\ref{fig:6-8}的实例中，``了''就对应一个空cept.。
+\parinterval 为了更清楚的阐述，这里引入新的术语\ \dash \ {\small\bfnew{概念单元}}\index{概念单元}或{\small\bfnew{概念}}\index{概念}（Concept）\index{Concept}。词对齐可以被看作概念之间的对应。这里的概念是指具有独立语法或语义功能的一组单词。依照Brown等人的表示方法\upcite{DBLP:journals/coling/BrownPPM94}，可以把概念记为cept.。每个句子都可以被表示成一系列的cept.。这里要注意的是，源语言句子中的cept.数量不一定等于目标句子中的cept.数量。因为有些cept. 可以为空，因此可以把那些空对的单词看作空cept.。比如，在图\ref{fig:6-8}的实例中，``了''就对应一个空cept.。
 %----------------------------------------------
 \begin{figure}[htp]
@@ -471,17 +471,15 @@ p_0+p_1                            & = & 1 \label{eq:6-21}
 \sectionnewpage
 \section{小结及深入阅读}
-{\color{red}产出率需要增加}
+本章在IBM模型1的基础上进一步介绍了IBM模型2-5以及HMM模型。同时，本章引入了两个新的概念\ \dash\ 扭曲度和繁衍率。它们都是机器翻译中的经典概念，也经常出现在机器翻译的建模中。另一方面，通过对上述模型的分析，本章进一步探讨建模中的若干基础问题，例如，如何把翻译问题分解为若干步骤，并建立合理的模型解释这些步骤；如何对复杂问题进行化简，以得到可以计算的模型等等。这些思想也在很多自然语言处理问题中被使用。此外，关于扭曲度和繁衍率还有一些问题值得关注：
-\parinterval 本章对IBM系列模型进行了全面的介绍和讨论，从一个简单的基于单词的翻译模型开始，本章以建模、解码、训练多个维度对统计机器翻译进行了描述，期间也涉及了词对齐、优化等多个重要概念。IBM 模型共分为5个模型，对翻译问题的建模依次由浅入深，同时模型复杂度也依次增加。IBM模型作为入门统计机器翻译的``必经之路''，其思想对今天的机器翻译仍然产生着影响。虽然单独使用IBM模型进行机器翻译现在已经不多见，甚至很多从事神经机器翻译等前沿研究的人对IBM模型已经逐渐淡忘，但是不能否认IBM模型标志着一个时代的开始。从某种意义上，当使用公式$\hat{\vectorn{t}} = \argmax_{\vectorn{t}} \funp{P}(\vectorn{t}|\vectorn{s})$描述机器翻译问题的时候，或多或少都在与IBM模型使用相似的思想。
-\parinterval 当然，本书也无法涵盖IBM模型的所有内涵，很多内容需要感兴趣的读者继续研究和挖掘，有两个方向可以考虑：
 \begin{itemize}
 \vspace{0.5em}
-\item IBM模型在提出后的十余年中，一直受到了学术界的关注。一个比较有代表性的成果是GIZA++（\url{https://github.com/moses-smt/giza-pp}），它集成了IBM模型和隐马尔可夫模型，并实现了这些模型的训练。在随后相当长的一段时间里，GIZA++也是机器翻译研究的标配，用于获得双语平行数据上单词一级的对齐结果。此外，研究者也对IBM模型进行了大量的分析，为后人研究统计机器翻译提供了大量依据\upcite{och2004alignment}。虽然IBM模型很少被独立使用，甚至直接用基于IBM模型的解码器也不多见，但是它通常会作为其他模型的一部分参与到对翻译的建模中。这部分工作会在下一章基于短语和句法的模型中进行讨论\upcite{koehn2003statistical}。此外，IBM模型也给机器翻译提供了一种非常简便的计算双语词串对应好坏的方式，因此也被广泛用于度量双语词串对应的强度，是自然语言处理中的一种常用特征。
+\item 扭曲度是机器翻译中的一个经典概念。广义上来说，事物位置的变换都可以用扭曲度进行描述，比如，在物理成像系统中，扭曲度模型可以帮助进行镜头校正\upcite{1966Decentering,ClausF05}。在机器翻译中，扭曲度本质上在描述源语言和目标语言单词顺序的偏差。这种偏差可以用于对调序的建模。因此扭曲度的使用也可以被看做是一种对调序问题的描述，这也是机器翻译区别于语音识别等任务的主要因素之一。在早期的统计机器翻译系统中，如Pharaoh\upcite{DBLP:conf/amta/Koehn04}，大量使用了扭曲度这个概念。虽然，随着机器翻译的发展，更复杂的调序模型被提出\upcite{Gros2008MSD,xiong2006maximum,och2004alignment,DBLP:conf/naacl/KumarB05,li-etal-2014-neural,vaswani2017attention}，但是扭曲度所引发的对调序问题的思考是非常深刻的，这也是IBM模型最大的贡献之一。
 \vspace{0.5em}
-\item 除了在机器翻译建模上的开创性工作，IBM模型的另一项重要贡献是建立了统计词对齐的基础模型。在训练IBM模型的过程中，除了学习到模型参数，还可以得到双语数据上的词对齐结果。也就是说词对齐标注是IBM模型训练的间接产物。这也使得IBM模型成为了自动词对齐的重要方法。包括GIZA++在内的很多工作，实际上更多的是被用于自动词对齐任务，而非简单的训练IBM模型参数。随着词对齐概念的不断深入，这个任务逐渐成为了自然语言处理中的重要分支，比如，对IBM模型的结果进行对称化\upcite{och2003systematic}，也可以直接使用判别式模型利用分类模型解决词对齐问题\upcite{ittycheriah2005maximum}，甚至可以把对齐的思想用于短语和句法结构的双语对应\upcite{xiao2013unsupervised}。除了GIZA++，研究人员也开发了很多优秀的自动词对齐工具，比如，FastAlign （\url{https://github.com/clab/fast_align}）、Berkeley Aligner（\url{https://github.com/mhajiloo/berkeleyaligner}）等，这些工具现在也有很广泛的应用。
+\item IBM模型的另一个贡献是在机器翻译中引入了繁衍率的概念。本质上，繁衍率是一种对翻译长度的建模。在IBM模型中，通过计算单词的繁衍率就可以得到整个句子的长度。需要注意的是，在机器翻译中译文长度对翻译性能有着至关重要的影响。虽然，在很多机器翻译模型中并没有直接使用繁衍率这个概念，但是几乎所有的现代机器翻译系统中都有译文长度的控制模块。比如，在统计机器翻译和神经机器翻译中，都把译文单词数量作为一个特征用于生成合理长度的译文\upcite{Koehn2007Moses,ChiangLMMRS05,bahdanau2014neural}。此外，在神经机器翻译中，非自回归的解码中也使用繁衍率模型对译文长度进行预测\upcite{2018Non}。
 \vspace{0.5em}
 \end{itemize}

--- a/Chapter7/Figures/figure-basic-process-of-translation.tex
+++ b/Chapter7/Figures/figure-basic-process-of-translation.tex
@@ -4,12 +4,12 @@
 \begin{scope}[minimum height = 18pt]
-\node[anchor=east] (s0) at (-0.5em, 0) {$\textbf{s}$:};
+\node[anchor=east] (s0) at (-0.5em, 0) {$\seq{s}$：};
 \node[anchor=west,fill=gray!20] (s1) at (0, 0) {\footnotesize{桌子 上}};
 \node[anchor=west,fill=gray!20] (s2) at ([xshift=1em]s1.east) {\footnotesize{有}};
 \node[anchor=west,fill=gray!20] (s3) at ([xshift=1em]s2.east) {\footnotesize{一个 苹果}};
-\node[anchor=east] (t0) at (-0.5em, -1.5) {$\textbf{t}$:};
+\node[anchor=east] (t0) at (-0.5em, -1.5) {$\seq{t}$：};
 \node[anchor=north] (l) at ([xshift=7em,yshift=-0.5em]t0.south) {\footnotesize{(a)\ }};
 \end{scope}
@@ -18,12 +18,12 @@
 \begin{scope}[xshift=15em,minimum height = 18pt]
-\node[anchor=east] (s0) at (-0.5em, 0) {$\textbf{s}$:};
+\node[anchor=east] (s0) at (-0.5em, 0) {$\seq{s}$：};
 \node[anchor=west,fill=gray!20] (s1) at (0, 0) {\footnotesize{桌子 上}};
 \node[anchor=west,fill=red!20] (s2) at ([xshift=1em]s1.east) {\footnotesize{有}};
 \node[anchor=west,fill=gray!20] (s3) at ([xshift=1em]s2.east) {\footnotesize{一个 苹果}};
-\node[anchor=east] (t0) at (-0.5em, -1.5) {$\textbf{t}$:};
+\node[anchor=east] (t0) at (-0.5em, -1.5) {$\seq{t}$：};
 {
 \node[anchor=west,fill=red!20] (t1) at (0, -1.5) {\footnotesize{There is}};
 \path[<->, thick] (s2.south) edge (t1.north);
@@ -36,12 +36,12 @@
 \begin{scope}[yshift=-9.5em,minimum height = 18pt]
-\node[anchor=east] (s0) at (-0.5em, 0) {$\textbf{s}$:};
+\node[anchor=east] (s0) at (-0.5em, 0) {$\seq{s}$：};
 \node[anchor=west,fill=gray!20] (s1) at (0, 0) {\footnotesize{桌子 上}};
 \node[anchor=west,fill=gray!20] (s2) at ([xshift=1em]s1.east) {\footnotesize{有}};
 \node[anchor=west,fill=red!20] (s3) at ([xshift=1em]s2.east) {\footnotesize{一个 苹果}};
-\node[anchor=east] (t0) at (-0.5em, -1.5) {$\textbf{t}$:};
+\node[anchor=east] (t0) at (-0.5em, -1.5) {$\seq{t}$：};
 {
 \node[anchor=west,fill=gray!20] (t1) at (0, -1.5) {\footnotesize{There is}};
 \path[<->, thick] (s2.south) edge (t1.north);
@@ -58,12 +58,12 @@
 \begin{scope}[xshift=15em,yshift=-9.5em,minimum height = 18pt]%[scale=0.5]
-\node[anchor=east] (s0) at (-0.5em, 0) {$\textbf{s}$:};
+\node[anchor=east] (s0) at (-0.5em, 0) {$\seq{s}$：};
 \node[anchor=west,fill=red!20] (s1) at (0, 0) {\footnotesize{桌子 上}};
 \node[anchor=west,fill=gray!20] (s2) at ([xshift=1em]s1.east) {\footnotesize{有}};
 \node[anchor=west,fill=gray!20] (s3) at ([xshift=1em]s2.east) {\footnotesize{一个 苹果}};
-\node[anchor=east] (t0) at (-0.5em, -1.5) {$\textbf{t}$:};
+\node[anchor=east] (t0) at (-0.5em, -1.5) {$\seq{t}$：};
 {
 \node[anchor=west,fill=gray!20] (t1) at (0, -1.5) {\footnotesize{There is}};
 \path[<->, thick] (s2.south) edge (t1.north);

--- a/Chapter7/Figures/figure-derivation-consist-of-bilingual-phrase.tex
+++ b/Chapter7/Figures/figure-derivation-consist-of-bilingual-phrase.tex
@@ -19,8 +19,8 @@
 \path[<->, thick] (s3.south) edge (t3.north);
 }
-\node[anchor=south] (s0) at ([xshift=-2em,yshift=0em]s1.south) {\textbf{s:}};
+\node[anchor=south] (s0) at ([xshift=-2em,yshift=0em]s1.south) {$\seq{s}$：};
-\node[anchor=east] (t0) at ([xshift=0em,yshift=-3.5em]s0.east) {\textbf{t:}};
+\node[anchor=east] (t0) at ([xshift=0em,yshift=-3.5em]s0.east) {$\seq{t}$：};
 \node[anchor=south,inner sep=0pt,yshift=-0.3em] (sp1) at (s1.north) {\footnotesize{$\bar{s}_{a_1 = 1}$}};
 \node[anchor=south,inner sep=0pt,yshift=-0.3em] (sp2) at (s2.north) {\footnotesize{$\bar{s}_{a_2 = 2}$}};

--- a/Chapter7/Figures/figure-example-of-hypothesis-recombination.tex
+++ b/Chapter7/Figures/figure-example-of-hypothesis-recombination.tex
@@ -5,7 +5,7 @@
 {
 \node [anchor=north,inner sep=2pt,fill=red!20,minimum height=2em,minimum width=3em] (h0) at (0,0) {\small{null}};
 \node [anchor=north west,inner sep=1.0pt,fill=black] (hl0) at (h0.north west) {\scriptsize{{\color{white} \textbf{0}}}};
-\node [anchor=north,rotate=90,inner sep=1pt,minimum width=2em,fill=black] (pt0) at (h0.east) {\footnotesize{{\color{white} \textbf{P=1}}}};
+\node [anchor=north,rotate=90,inner sep=1pt,minimum width=2em,fill=black] (pt0) at (h0.east) {\footnotesize{{\color{white} \textbf{$\funp{P}$=1}}}};
 \node [anchor=west,inner sep=2pt,fill=red!20,minimum height=2em,minimum width=3em] (h2) at ([xshift=2.2em,yshift=3.5em]h0.east) {\small{an}};
 \node [anchor=west,inner sep=2pt,fill=red!20,minimum height=2em,minimum width=3em] (h3) at ([xshift=2.2em]h2.east) {\small{apple}};
@@ -13,8 +13,8 @@
 \node [anchor=north west,inner sep=1.0pt,fill=black] (hl2) at (h2.north west) {\scriptsize{{\color{white} \textbf{1}}}};
 \node [anchor=north west,inner sep=1.0pt,fill=black] (hl3) at (h3.north west) {\scriptsize{{\color{white} \textbf{2}}}};
-\node [anchor=north,rotate=90,inner sep=1pt,minimum width=2em,fill=black] (pt2) at (h2.east) {\footnotesize{{\color{white} \textbf{P=.3}}}};
+\node [anchor=north,rotate=90,inner sep=1pt,minimum width=2em,fill=black] (pt2) at (h2.east) {\footnotesize{{\color{white} \textbf{$\funp{P}$=.3}}}};
-\node [anchor=north,rotate=90,inner sep=1pt,minimum width=2em,fill=black] (pt3) at (h3.east) {\footnotesize{{\color{white} \textbf{P=.5}}}};
+\node [anchor=north,rotate=90,inner sep=1pt,minimum width=2em,fill=black] (pt3) at (h3.east) {\footnotesize{{\color{white} \textbf{$\funp{P}$=.5}}}};
 \draw [->,very thick,ublue] ([xshift=0.1em]pt0.south) -- ([xshift=-0.1em]h2.west);
 \draw [->,very thick,ublue] ([xshift=0.1em]pt2.south) -- ([xshift=-0.1em]h3.west);
@@ -22,12 +22,12 @@
 {
 \node [anchor=west,inner sep=2pt,fill=red!20,minimum height=2em,minimum width=3em] (h1) at ([xshift=7em]h0.east) {\small{an apple}};
 \node [anchor=north west,inner sep=1.0pt,fill=black] (hl1) at (h1.north west) {\scriptsize{{\color{white} \textbf{1-2}}}};
-\node [anchor=north,rotate=90,inner sep=1pt,minimum width=2em,fill=black] (pt1) at (h1.east) {\footnotesize{{\color{white} \textbf{P=.5}}}};
+\node [anchor=north,rotate=90,inner sep=1pt,minimum width=2em,fill=black] (pt1) at (h1.east) {\footnotesize{{\color{white} \textbf{$\funp{P}$=.5}}}};
 \draw [->,very thick,ublue] ([xshift=0.1em]pt0.south) -- ([xshift=-0.1em]h1.west);
 }
 }
 {
-\node [anchor=north west,inner sep=2pt,fill=red!20,minimum height=2em,minimum width=3em] (h4) at ([yshift=-7em]h0.south west) {\small{null}};
+\node [anchor=north west,inner sep=2pt,fill=red!20,minimum height=2em,minimum width=3em] (h4) at ([yshift=-9em]h0.south west) {\small{null}};
 \node [anchor=west,inner sep=2pt,fill=red!20,minimum height=2em,minimum width=3em] (h5) at ([xshift=2.2em]h4.east) {\small{he}};
 \node [anchor=west,inner sep=2pt,fill=red!20,minimum height=2em,minimum width=3em] (h6) at ([xshift=2.2em,yshift=3.5em]h4.east) {\small{it}};
 \node [anchor=west,inner sep=2pt,fill=red!20,minimum height=2em,minimum width=3em] (h8) at ([xshift=2.2em]h6.east) {\small{is not}};
@@ -37,10 +37,10 @@
 \node [anchor=north west,inner sep=1.0pt,fill=black] (hl5) at (h6.north west) {\scriptsize{{\color{white} \textbf{1}}}};
 \node [anchor=north west,inner sep=1.0pt,fill=black] (hl5) at (h8.north west) {\scriptsize{{\color{white} \textbf{2}}}};
-\node [anchor=north,rotate=90,inner sep=1pt,minimum width=2em,fill=black] (pt4) at (h4.east) {\footnotesize{{\color{white} \textbf{P=1}}}};
+\node [anchor=north,rotate=90,inner sep=1pt,minimum width=2em,fill=black] (pt4) at (h4.east) {\footnotesize{{\color{white} \textbf{$\funp{P}$=1}}}};
-\node [anchor=north,rotate=90,inner sep=1pt,minimum width=2em,fill=black] (pt5) at (h5.east) {\footnotesize{{\color{white} \textbf{P=.3}}}};
+\node [anchor=north,rotate=90,inner sep=1pt,minimum width=2em,fill=black] (pt5) at (h5.east) {\footnotesize{{\color{white} \textbf{$\funp{P}$=.3}}}};
-\node [anchor=north,rotate=90,inner sep=1pt,minimum width=2em,fill=black] (pt6) at (h6.east) {\footnotesize{{\color{white} \textbf{P=.4}}}};
+\node [anchor=north,rotate=90,inner sep=1pt,minimum width=2em,fill=black] (pt6) at (h6.east) {\footnotesize{{\color{white} \textbf{$\funp{P}$=.4}}}};
-\node [anchor=north,rotate=90,inner sep=1pt,minimum width=2em,fill=black] (pt8) at (h8.east) {\footnotesize{{\color{white} \textbf{P=.2}}}};
+\node [anchor=north,rotate=90,inner sep=1pt,minimum width=2em,fill=black] (pt8) at (h8.east) {\footnotesize{{\color{white} \textbf{$\funp{P}$=.2}}}};
 \draw [->,very thick,ublue] ([xshift=0.1em]pt4.south) -- ([xshift=-0.1em]h5.west);
 \draw [->,very thick,ublue] ([xshift=0.1em]pt4.south) -- ([xshift=-0.1em]h6.west);
@@ -48,15 +48,15 @@
 {
 \node [anchor=west,inner sep=2pt,fill=red!20,minimum height=2em,minimum width=3em] (h7) at ([xshift=2.2em]h5.east) {\small{is not}};
-\node [anchor=north,rotate=90,inner sep=1pt,minimum width=2em,fill=black] (pt7) at (h7.east) {\footnotesize{{\color{white} \textbf{P=.2}}}};
+\node [anchor=north,rotate=90,inner sep=1pt,minimum width=2em,fill=black] (pt7) at (h7.east) {\footnotesize{{\color{white} \textbf{$\funp{P}$=.2}}}};
 \node [anchor=north west,inner sep=1.0pt,fill=black] (hl5) at (h7.north west) {\scriptsize{{\color{white} \textbf{2}}}};
 \draw [->,very thick,ublue] ([xshift=0.1em]pt5.south) -- ([xshift=-0.1em]h7.west);
 }
 }
-\node[anchor=north] (l1) at ([xshift=6em,yshift=-1em]h0.south) {\scriptsize{(a)\ 原假设（译文相同时）}};
+\node[anchor=north] (l1) at ([xshift=5.5em,yshift=-1em]h0.south) {\scriptsize{原假设}};
-\node[anchor=north] (l2) at ([xshift=6em,yshift=-1em]h4.south) {\scriptsize{(c)\ 原假设（译文不同时）}};
+\node[anchor=north] (l2) at ([xshift=5.5em,yshift=-1em]h4.south) {\scriptsize{原假设}};
-%\node [anchor=west,inner sep=2pt,fill=red!20,minimum height=2em,minimum width=3em,opacity=0.7] (h1) at ([xshift=-1em,yshift=2em]h2.north) {原假设};
+\node[anchor=north] (part1) at ([xshift=16em,yshift=-2em]h0.south){\scriptsize{（a）译文相同时的假设重组}};
 \end{scope}
@@ -68,7 +68,7 @@
 {
 \node [anchor=north,inner sep=2pt,fill=red!20,minimum height=2em,minimum width=3em] (h0) at (0,0) {\small{null}};
 \node [anchor=north west,inner sep=1.0pt,fill=black] (hl0) at (h0.north west) {\scriptsize{{\color{white} \textbf{0}}}};
-\node [anchor=north,rotate=90,inner sep=1pt,minimum width=2em,fill=black] (pt0) at (h0.east) {\footnotesize{{\color{white} \textbf{P=1}}}};
+\node [anchor=north,rotate=90,inner sep=1pt,minimum width=2em,fill=black] (pt0) at (h0.east) {\footnotesize{{\color{white} \textbf{$\funp{P}$=1}}}};
 \node [anchor=west,inner sep=2pt,fill=red!20,minimum height=2em,minimum width=3em] (h2) at ([xshift=2.2em,yshift=3.5em]h0.east) {\small{an}};
 \node [anchor=west,inner sep=2pt,fill=red!20,minimum height=2em,minimum width=3em] (h3) at ([xshift=2.2em]h2.east) {\small{apple}};
@@ -76,8 +76,8 @@
 \node [anchor=north west,inner sep=1.0pt,fill=black] (hl2) at (h2.north west) {\scriptsize{{\color{white} \textbf{1}}}};
 \node [anchor=north west,inner sep=1.0pt,fill=black] (hl3) at (h3.north west) {\scriptsize{{\color{white} \textbf{2}}}};
-\node [anchor=north,rotate=90,inner sep=1pt,minimum width=2em,fill=black] (pt2) at (h2.east) {\footnotesize{{\color{white} \textbf{P=.3}}}};
+\node [anchor=north,rotate=90,inner sep=1pt,minimum width=2em,fill=black] (pt2) at (h2.east) {\footnotesize{{\color{white} \textbf{$\funp{P}$=.3}}}};
-\node [anchor=north,rotate=90,inner sep=1pt,minimum width=2em,fill=black] (pt3) at (h3.east) {\footnotesize{{\color{white} \textbf{P=.5}}}};
+\node [anchor=north,rotate=90,inner sep=1pt,minimum width=2em,fill=black] (pt3) at (h3.east) {\footnotesize{{\color{white} \textbf{$\funp{P}$=.5}}}};
 \draw [->,very thick,ublue] ([xshift=0.1em]pt0.south) -- ([xshift=-0.1em]h2.west);
 \draw [->,very thick,ublue] ([xshift=0.1em]pt2.south) -- ([xshift=-0.1em]h3.west);
@@ -87,7 +87,7 @@
 }
 }
 {
-\node [anchor=north west,inner sep=2pt,fill=red!20,minimum height=2em,minimum width=3em] (h4) at ([yshift=-7em]h0.south west) {\small{null}};
+\node [anchor=north west,inner sep=2pt,fill=red!20,minimum height=2em,minimum width=3em] (h4) at ([yshift=-9em]h0.south west) {\small{null}};
 \node [anchor=west,inner sep=2pt,fill=red!20,minimum height=2em,minimum width=3em] (h5) at ([xshift=2.2em]h4.east) {\small{he}};
 \node [anchor=west,inner sep=2pt,fill=red!20,minimum height=2em,minimum width=3em] (h6) at ([xshift=2.2em,yshift=3.5em]h4.east) {\small{it}};
 \node [anchor=west,inner sep=2pt,fill=red!20,minimum height=2em,minimum width=3em] (h8) at ([xshift=2.2em]h6.east) {\small{is not}};
@@ -97,10 +97,10 @@
 \node [anchor=north west,inner sep=1.0pt,fill=black] (hl5) at (h6.north west) {\scriptsize{{\color{white} \textbf{1}}}};
 \node [anchor=north west,inner sep=1.0pt,fill=black] (hl5) at (h8.north west) {\scriptsize{{\color{white} \textbf{2}}}};
-\node [anchor=north,rotate=90,inner sep=1pt,minimum width=2em,fill=black] (pt4) at (h4.east) {\footnotesize{{\color{white} \textbf{P=1}}}};
+\node [anchor=north,rotate=90,inner sep=1pt,minimum width=2em,fill=black] (pt4) at (h4.east) {\footnotesize{{\color{white} \textbf{$\funp{P}$=1}}}};
-\node [anchor=north,rotate=90,inner sep=1pt,minimum width=2em,fill=black] (pt5) at (h5.east) {\footnotesize{{\color{white} \textbf{P=.3}}}};
+\node [anchor=north,rotate=90,inner sep=1pt,minimum width=2em,fill=black] (pt5) at (h5.east) {\footnotesize{{\color{white} \textbf{$\funp{P}$=.3}}}};
-\node [anchor=north,rotate=90,inner sep=1pt,minimum width=2em,fill=black] (pt6) at (h6.east) {\footnotesize{{\color{white} \textbf{P=.4}}}};
+\node [anchor=north,rotate=90,inner sep=1pt,minimum width=2em,fill=black] (pt6) at (h6.east) {\footnotesize{{\color{white} \textbf{$\funp{P}$=.4}}}};
-\node [anchor=north,rotate=90,inner sep=1pt,minimum width=2em,fill=black] (pt8) at (h8.east) {\footnotesize{{\color{white} \textbf{P=.2}}}};
+\node [anchor=north,rotate=90,inner sep=1pt,minimum width=2em,fill=black] (pt8) at (h8.east) {\footnotesize{{\color{white} \textbf{$\funp{P}$=.2}}}};
 \draw [->,very thick,ublue] ([xshift=0.1em]pt4.south) -- ([xshift=-0.1em]h5.west);
 \draw [->,very thick,ublue] ([xshift=0.1em]pt4.south) -- ([xshift=-0.1em]h6.west);
@@ -111,17 +111,15 @@
 }
 }
 {
 {
 \node [anchor=west,inner sep=2pt,fill=red!20,minimum height=2em,minimum width=3em,opacity=0.3] (h1) at ([xshift=7em]h0.east) {\small{an apple}};
 \node [anchor=north west,inner sep=1.0pt,fill=black,opacity=0.3] (hl1) at (h1.north west) {\scriptsize{{\color{white} \textbf{1-2}}}};
-\node [anchor=north,rotate=90,inner sep=1pt,minimum width=2em,fill=black,opacity=0.3] (pt1) at (h1.east) {\footnotesize{{\color{white} \textbf{P=.5}}}};
+\node [anchor=north,rotate=90,inner sep=1pt,minimum width=2em,fill=black,opacity=0.3] (pt1) at (h1.east) {\footnotesize{{\color{white} \textbf{$\funp{P}$=.5}}}};
 }
 {
 \node [anchor=west,inner sep=2pt,fill=red!20,minimum height=2em,minimum width=3em,opacity=0.3] (h7) at ([xshift=2.2em]h5.east) {\small{is not}};
-\node [anchor=north,rotate=90,inner sep=1pt,minimum width=2em,fill=black,opacity=0.3] (pt7) at (h7.east) {\footnotesize{{\color{white} \textbf{P=.2}}}};
+\node [anchor=north,rotate=90,inner sep=1pt,minimum width=2em,fill=black,opacity=0.3] (pt7) at (h7.east) {\footnotesize{{\color{white} \textbf{$\funp{P}$=.2}}}};
 \node [anchor=north west,inner sep=1.0pt,fill=black,opacity=0.3] (hl5) at (h7.north west) {\scriptsize{{\color{white} \textbf{2}}}};
 }
 }
@@ -132,12 +130,10 @@
 \node [anchor=west] (l21) at ([xshift=0em, yshift=-1em]l2.west) {\footnotesize{较低假设}};
 %\node [anchor=west,inner sep=2pt,fill=red!20,minimum height=2em,minimum width=3em,opacity=0.7] (h1) at ([xshift=-1em,yshift=2em]h2.north) {重组假设};
-\node[anchor=north] (l1) at ([xshift=6em,yshift=-1em]h0.south) {\scriptsize{(c)\ 重组假设（译文相同时）}};
+\node[anchor=north] (l1) at ([xshift=7.5em,yshift=-1em]h0.south) {\scriptsize{重组假设}};
-\node[anchor=north] (l2) at ([xshift=6em,yshift=-1em]h4.south) {\scriptsize{(d)\ 重组假设（译文不同时）}};
+\node[anchor=north] (l2) at ([xshift=7.5em,yshift=-1em]h4.south) {\scriptsize{重组假设}};
+\node[anchor=north] (part2) at ([xshift=0em,yshift=-14em]h0.south){\scriptsize{（b）译文不同时的假设重组}};
 \end{scope}
 \end{tikzpicture}
\ No newline at end of file
--- a/Chapter7/Figures/figure-example-of-n-gram-1.tex
+++ b/Chapter7/Figures/figure-example-of-n-gram-1.tex
@@ -2,17 +2,17 @@
 %%% 引入短语翻译
 {\small
 \begin{tabular}{l | l}
-{\red{词串}}翻译表 & P \\ \hline
+{\red{词串}}翻译表 & $\funp{P}$ \\ \hline
 我 $\to$ I & 0.6 \\
 喜欢 $\to$ like & 0.3 \\
 红 $\to$ red & 0.8 \\
 红 $\to$ black & 0.1 \\
 茶 $\to$ tea & 0.8\\
-我 喜欢 $\to$ I like & 0.3\\
+我/喜欢 $\to$ I like & 0.3\\
-我 喜欢 $\to$ I liked & 0.2\\
+我/喜欢 $\to$ I liked & 0.2\\
-绿 茶 $\to$ green tea & 0.5\\
+绿/茶 $\to$ green tea & 0.5\\
-绿 茶 $\to$ the green tea & 0.1\\
+绿/茶 $\to$ the green tea & 0.1\\
-红 茶 $\to$ black tea & 0.7\\
+红/茶 $\to$ black tea & 0.7\\
 ... & 
 \end{tabular}
 }
--- a/Chapter7/Figures/figure-example-of-n-gram-2.tex
+++ b/Chapter7/Figures/figure-example-of-n-gram-2.tex
@@ -12,7 +12,7 @@
 \node [anchor=west] (s2) at ([xshift=1.0em]s1.east) {喜欢};
 \node [anchor=west] (s3) at ([xshift=1.0em]s2.east) {\red{红}};
 \node [anchor=west] (s4) at ([xshift=1.0em]s3.east) {茶};
-\node [anchor=east] (s) at (s1.west) {$\textbf{s}=$};
+\node [anchor=east] (s) at (s1.west) {$\seq{s}=$};
 }
 \end{scope}
@@ -22,7 +22,7 @@
 \node [anchor=west] (t2) at ([xshift=0.8em,yshift=-0.0em]t1.east) {like};
 \node [anchor=west] (t3) at ([xshift=0.6em,yshift=-0.0em]t2.east) {red};
 \node [anchor=west] (t4) at ([xshift=1.15em,yshift=-0.1em]t3.east) {tea};
-\node [anchor=east] (t) at ([xshift=-0.2em]t1.west) {$\textbf{t}=$};
+\node [anchor=east] (t) at ([xshift=-0.2em]t1.west) {$\seq{t}=$};
 }
 \end{scope}
@@ -44,7 +44,7 @@
 \node [anchor=west] (s2) at ([xshift=1.0em]s1.east) {喜欢};
 \node [anchor=west] (s3) at ([xshift=1.0em]s2.east) {\red{红}};
 \node [anchor=west] (s4) at ([xshift=1.0em]s3.east) {茶};
-\node [anchor=east] (s) at (s1.west) {$\textbf{s}=$};
+\node [anchor=east] (s) at (s1.west) {$\seq{s}=$};
 }
 \end{scope}
@@ -54,7 +54,7 @@
 \node [anchor=west] (t2) at ([xshift=0.8em,yshift=-0.0em]t1.east) {like};
 \node [anchor=west] (t3) at ([xshift=0.6em,yshift=-0.0em]t2.east) {black};
 \node [anchor=west] (t4) at ([xshift=1.0em,yshift=-0.1em]t3.east) {tea};
-\node [anchor=east] (t) at ([xshift=-0.2em]t1.west) {$\textbf{t}=$};
+\node [anchor=east] (t) at ([xshift=-0.2em]t1.west) {$\seq{t}=$};
 }
 \end{scope}

--- a/Chapter7/Figures/figure-example-of-stack-decode.tex
+++ b/Chapter7/Figures/figure-example-of-stack-decode.tex
@@ -6,7 +6,7 @@
 {
 \node [anchor=north,inner sep=2pt,fill=red!20,minimum height=2em,minimum width=3em] (h0) at (0,0) {\scriptsize{null}};
 \node [anchor=north west,inner sep=1.5pt,fill=black] (hl0) at (h0.north west) {\scriptsize{{\color{white} \textbf{0}}}};
-\node [anchor=north,rotate=90,inner sep=1pt,minimum width=2em,fill=black] (pt0) at (h0.east) {\scriptsize{{\color{white} \textbf{P=1}}}};
+\node [anchor=north,rotate=90,inner sep=1pt,minimum width=2em,fill=black] (pt0) at (h0.east) {\scriptsize{{\color{white} \textbf{$\funp{P}$=1}}}};
 }
 {
 \node [anchor=west,inner sep=2pt,fill=red!20,minimum height=2em,minimum width=3em] (h13) at ([xshift=2.1em,yshift=6em]h0.east) {\scriptsize{there is}};
@@ -17,8 +17,8 @@
 \node [anchor=north west,inner sep=1.0pt,fill=black] (hl3) at (h13.north west) {\scriptsize{{\color{white} \textbf{3}}}};
-\node [anchor=north,rotate=90,inner sep=1pt,minimum width=2em,fill=black] (pt1) at (h1.east) {\scriptsize{{\color{white} \textbf{P=.2}}}};
+\node [anchor=north,rotate=90,inner sep=1pt,minimum width=2em,fill=black] (pt1) at (h1.east) {\scriptsize{{\color{white} \textbf{$\funp{P}$=.2}}}};
-\node [anchor=north,rotate=90,inner sep=1pt,minimum width=2em,fill=black] (pt3) at (h13.east) {\scriptsize{{\color{white} \textbf{P=.5}}}};
+\node [anchor=north,rotate=90,inner sep=1pt,minimum width=2em,fill=black] (pt3) at (h13.east) {\scriptsize{{\color{white} \textbf{$\funp{P}$=.5}}}};
 \node [anchor=west,inner sep=2pt,fill=red!20,minimum height=2em,minimum width=3em] (h2) at ([xshift=2.1em]h1.east) {\scriptsize{have}};
 \node [anchor=west,inner sep=2pt,minimum height=2em,minimum width=3em] (h22) at ([xshift=2.1em]h12.east) {\small{\textbf{...}}};
@@ -32,15 +32,15 @@
 \node [anchor=north west,inner sep=1.0pt,fill=black] (hl3) at (h3.north west) {\scriptsize{{\color{white} \textbf{2}}}};
 \node [anchor=north west,inner sep=1.0pt,fill=black] (hl33) at (h33.north west) {\scriptsize{{\color{white} \textbf{4-5}}}};
-\node [anchor=north,rotate=90,inner sep=1pt,minimum width=2em,fill=black] (pt2) at (h2.east) {\scriptsize{{\color{white} \textbf{P=.5}}}};
+\node [anchor=north,rotate=90,inner sep=1pt,minimum width=2em,fill=black] (pt2) at (h2.east) {\scriptsize{{\color{white} \textbf{$\funp{P}$=.5}}}};
-\node [anchor=north,rotate=90,inner sep=1pt,minimum width=2em,fill=black] (pt23) at (h23.east) {\scriptsize{{\color{white} \textbf{P=.5}}}};
+\node [anchor=north,rotate=90,inner sep=1pt,minimum width=2em,fill=black] (pt23) at (h23.east) {\scriptsize{{\color{white} \textbf{$\funp{P}$=.5}}}};
-\node [anchor=north,rotate=90,inner sep=1pt,minimum width=2em,fill=black] (pt3) at (h3.east) {\scriptsize{{\color{white} \textbf{P=.5}}}};
+\node [anchor=north,rotate=90,inner sep=1pt,minimum width=2em,fill=black] (pt3) at (h3.east) {\scriptsize{{\color{white} \textbf{$\funp{P}$=.5}}}};
-\node [anchor=north,rotate=90,inner sep=1pt,minimum width=2em,fill=black] (pt33) at (h33.east) {\scriptsize{{\color{white} \textbf{P=.5}}}};
+\node [anchor=north,rotate=90,inner sep=1pt,minimum width=2em,fill=black] (pt33) at (h33.east) {\scriptsize{{\color{white} \textbf{$\funp{P}$=.5}}}};
 }
 \node [anchor=north] (l0) at ([xshift=0.2em,yshift=-0.7em]h0.south) {\small{\textbf{未译词}}};
-\node [anchor=north] (l1) at ([xshift=0.3em,yshift=-0.7em]h1.south) {\small{\textbf{已译1词}}};
+\node [anchor=north] (l1) at ([xshift=0.3em,yshift=-0.7em]h1.south) {\small{\textbf{已译}1\textbf{词}}};
-\node [anchor=north] (l2) at ([xshift=0.3em,yshift=-0.7em]h2.south) {\small{\textbf{已译2词}}};
+\node [anchor=north] (l2) at ([xshift=0.3em,yshift=-0.7em]h2.south) {\small{\textbf{已译}2\textbf{词}}};
-\node [anchor=north] (l3) at ([xshift=0.3em,yshift=-0.7em]h3.south) {\small{\textbf{已译3词}}};
+\node [anchor=north] (l3) at ([xshift=0.3em,yshift=-0.7em]h3.south) {\small{\textbf{已译}3\textbf{词}}};
 \begin{pgfonlayer}{background}
 \node [rectangle,inner sep=0.3em,fill=blue!10] [fit = (h0) (pt0)] (box0) {};

--- a/Chapter7/Figures/figure-example-of-translation-base-word-1.tex
+++ b/Chapter7/Figures/figure-example-of-translation-base-word-1.tex
@@ -4,7 +4,7 @@
 {\small
 \begin{tabular}{l | l}
-单词翻译表 & P \\ \hline
+单词翻译表 & $\funp{P}$ \\ \hline
 我 $\to$ I & 0.6 \\
 喜欢 $\to$ like & 0.3 \\
 绿 $\to$ green & 0.9 \\

--- a/Chapter7/Figures/figure-example-of-translation-base-word-2.tex
+++ b/Chapter7/Figures/figure-example-of-translation-base-word-2.tex
@@ -7,7 +7,7 @@
 \node [anchor=west] (s2) at ([xshift=1.0em]s1.east) {喜欢};
 \node [anchor=west] (s3) at ([xshift=1.0em]s2.east) {{\color{ugreen} 绿}};
 \node [anchor=west] (s4) at ([xshift=1.07em]s3.east) {茶};
-\node [anchor=east] (s) at (s1.west) {$\textbf{s}=$};
+\node [anchor=east] (s) at (s1.west) {$\seq{s}=$};
 }
 \end{scope}
@@ -18,7 +18,7 @@
 \node [anchor=west] (t2) at ([xshift=0.8em,yshift=-0.0em]t1.east) {like};
 \node [anchor=west] (t3) at ([xshift=1.0em,yshift=-0.2em]t2.east) {green};
 \node [anchor=west] (t4) at ([xshift=0.78em,yshift=0.1em]t3.east) {tea};
-\node [anchor=east] (t) at ([xshift=-0.3em]t1.west) {$\textbf{t}=$};
+\node [anchor=east] (t) at ([xshift=-0.3em]t1.west) {$\seq{t}=$};
 }
 \end{scope}

--- a/Chapter7/Figures/figure-example-of-translation-black-tea-1.tex
+++ b/Chapter7/Figures/figure-example-of-translation-black-tea-1.tex
@@ -2,7 +2,7 @@
 %%% 基于单词的模型的问题
 {\small
 \begin{tabular}{l | l}
-单词翻译表 & P \\ \hline
+单词翻译表 & $\funp{P}$ \\ \hline
 我 $\to$ I & 0.6 \\
 喜欢 $\to$ like & 0.3 \\
 红 $\to$ red & 0.8 \\

--- a/Chapter7/Figures/figure-example-of-translation-black-tea-2.tex
+++ b/Chapter7/Figures/figure-example-of-translation-black-tea-2.tex
@@ -10,7 +10,7 @@
 \node [anchor=west] (s2) at ([xshift=1.0em]s1.east) {喜欢};
 \node [anchor=west] (s3) at ([xshift=1.0em]s2.east) {\red{红}};
 \node [anchor=west] (s4) at ([xshift=1.0em]s3.east) {茶};
-\node [anchor=east] (s) at (s1.west) {$\textbf{s}=$};
+\node [anchor=east] (s) at (s1.west) {$\seq{s}=$};
 }
 \end{scope}
@@ -21,7 +21,7 @@
 \node [anchor=west] (t2) at ([xshift=0.8em,yshift=-0.0em]t1.east) {like};
 \node [anchor=west] (t3) at ([xshift=1.0em,yshift=-0.0em]t2.east) {red};
 \node [anchor=west] (t4) at ([xshift=1.0em,yshift=-0.1em]t3.east) {tea};
-\node [anchor=east] (t) at ([xshift=-0.3em]t1.west) {$\textbf{t}=$};
+\node [anchor=east] (t) at ([xshift=-0.3em]t1.west) {$\seq{t}=$};
 }
 \end{scope}
@@ -34,7 +34,7 @@
 \begin{pgfonlayer}{background}
 {
 \node [rectangle,draw,thick,inner sep=0.2em,fill=white,drop shadow] [fit = (t3) (t4)] (problemphrase) {};
-\node [anchor=north,text width=8em,align=left] (problemlabel) at (problemphrase.south) {\begin{spacing}{0.8}\scriptsize{``红 茶''为一种搭配，应该翻译为``black tea''}\end{spacing}};
+\node [anchor=north,text width=8em,align=left] (problemlabel) at (problemphrase.south) {\begin{spacing}{0.8}\scriptsize{“红 茶”为一种搭配，应该翻译为“black tea”}\end{spacing}};
 }
 \end{pgfonlayer}

--- a/Chapter7/Figures/figure-example-of-vocabulary-translation-probability.tex
+++ b/Chapter7/Figures/figure-example-of-vocabulary-translation-probability.tex
@@ -39,10 +39,10 @@
 \node[align=center,elementnode,minimum size=0.3cm,inner sep=0.1pt,fill=blue!50] (la4) at (a41) {};
 \node[align=center,elementnode,minimum size=0.3cm,inner sep=0.1pt,fill=blue!50] (la5) at (a30) {};
-\node[anchor=west] (f1) at ([xshift=3em,yshift=0.8em]a43.east) {\small{$\textrm{P}_{\textrm{lex}}(\bar{t}|\bar{s})=w(t_1|s_1)\times$}};
+\node[anchor=west] (f1) at ([xshift=3em,yshift=0.8em]a43.east) {\small{$\funp{P}_{\textrm{lex}}(\bar{t}|\bar{s})=\sigma (t_1|s_1)\times$}};
-\node[anchor=north] (f2) at ([xshift=5.2em]f1.south) {\small{$\frac{1}{2}(w(t_2|s_2)+w(t_4|s_2))\times$}};
+\node[anchor=north] (f2) at ([xshift=5.2em]f1.south) {\small{$\frac{1}{2}(\sigma (t_2|s_2)+\sigma (t_4|s_2))\times$}};
-\node[anchor=north west] (f3) at (f2.south west) {\small{$w(N|s_3)\times$}};
+\node[anchor=north west] (f3) at (f2.south west) {\small{$\sigma (N|s_3)\times$}};
-\node[anchor=north west] (f4) at (f3.south west) {\small{$w(t_4|s_4)\times$}};
+\node[anchor=north west] (f4) at (f3.south west) {\small{$\sigma (t_4|s_4)\times$}};
 \end{scope}

--- a/Chapter7/Figures/figure-example-of-zh2en-translation-base-phrase.tex
+++ b/Chapter7/Figures/figure-example-of-zh2en-translation-base-phrase.tex
@@ -7,20 +7,20 @@
 {\small
 \node[anchor=north,fill=green!20] (s1) at (0,0) {进口};
-\node [anchor=north,fill=red!20] (s2) at ([xshift=4em,yshift=0em]s1.north) {大幅度};
+\node [anchor=west,fill=red!20] (s2) at ([xshift=1em,yshift=0em]s1.east) {大幅度};
-\node[anchor=north,fill=blue!20] (s3) at ([xshift=4.5em,yshift=0em]s2.north) {下降 了};
+\node[anchor=west,fill=blue!20] (s3) at ([xshift=1em,yshift=0em]s2.east) {下降\ \ \ 了};
 \node[anchor=west,fill=green!20] (t1) at ([xshift=0em,yshift=-4em]s1.west) {The imports have};
-\node[anchor=north,fill=red!20] (t2) at ([xshift=8em,yshift=0em]t1.north) {drastically};
+\node[anchor=west,fill=red!20] (t2) at ([xshift=1em,yshift=0em]t1.east) {drastically};
-\node[anchor=north,fill=blue!20] (t3) at ([xshift=5.7em,yshift=0em]t2.north) {fallen};
+\node[anchor=west,fill=blue!20] (t3) at ([xshift=1em,yshift=0em]t2.east) {fallen};
 \path[<->, thick] (s1.south) edge (t1.north);
 \path[<->, thick] (s2.south) edge (t2.north);
 \path[<->, thick] (s3.south) edge (t3.north);
 }
-\node[anchor=south] (s0) at ([xshift=-3em,yshift=0em]s1.south) {源语言:};
+\node[anchor=south] (s0) at ([xshift=-3em,yshift=0em]s1.south) {源语言：};
-\node[anchor=east] (t0) at ([xshift=0em,yshift=-3.5em]s0.east) {目标语言:};
+\node[anchor=east] (t0) at ([xshift=0em,yshift=-3.5em]s0.east) {目标语言：};
 \end{scope}
 \end{tikzpicture}

--- a/Chapter7/Figures/figure-function-image-about-weight-and-Bleu-2.tex
+++ b/Chapter7/Figures/figure-function-image-about-weight-and-Bleu-2.tex
-%%%------------------------------------------------------------------------------------------------------------
-%%% 特征权重调优
-\begin{tikzpicture}
-\begin{scope}
-\node[anchor=west] (x0) at (0, 0) {};
-\draw[->,thick] (x0.center) -- ([xshift=8.2em]x0.east);
-\draw[->,thick] (x0.center) -- ([yshift=5.6em]x0.center);
-\node[anchor=north] (zero) at ([yshift=0.1em]x0.south) {\small{0}};
-\node[anchor=north] (wx) at ([xshift=4em,yshift=0.1em]x0.south) {\small{$w_x$}};
-\node[anchor=north] (wi) at ([xshift=8em,yshift=0.1em]x0.south) {\small{$w_i$}};
-{
-\draw[thick] ([yshift=2em]x0.center) -- ([xshift=4em,yshift=2em]x0.center);
-\draw[thick] ([xshift=4em,yshift=4em]x0.center) -- ([xshift=8em,yshift=4em]x0.center);
-\draw[thick,dotted] ([xshift=4em]x0.center) -- ([xshift=4em,yshift=5.5em]x0.center);
-\node[anchor=north] (e1) at ([xshift=2em,yshift=3em]x0.north) {\small{$d^*=d_1$}};
-\node[anchor=north] (e2) at ([xshift=6.2em,yshift=5em]x0.north) {\small{$d^*=d_2$}};
-\node[anchor=north,rotate=90] (e2) at ([xshift=-1.3em,yshift=3.6em]x0.south) {\small{BLEU}};
-\draw[decorate,decoration={brace,amplitude=0.4em},red,thick] ([xshift=4em,yshift=0.5em]x0.south) -- ([xshift=8.2em,yshift=0.5em]x0.south);
-\node[anchor=north] (wi) at ([xshift=6.1em,yshift=2.1em]x0.south) {\footnotesize{\red{挑选$w_i$}}};
-}
-\end{scope}
-\end{tikzpicture}
--- a/Chapter7/Figures/figure-function-image-about-weight-and-Bleu-1.tex
+++ b/Chapter7/Figures/figure-function-image-about-weight-and-Bleu-1.tex
@@ -12,17 +12,42 @@
 \draw[thick] ([yshift=2em]x0.center) -- ([xshift=8em,yshift=4em]x0.center);
 \node[anchor=north] (e1) at ([xshift=6em,yshift=6em]x0.south) {\small{$d_1$}};
 \node[anchor=north] (e2) at ([xshift=7em,yshift=4em]x0.south) {\small{$d_2$}};
-\node[anchor=north,rotate=90] (e2) at ([xshift=-1.3em,yshift=3.6em]x0.south) {\small{model score}};
+\node[anchor=north,rotate=90] (e2) at ([xshift=-1.3em,yshift=4em]x0.south) {\small{score}};
 }
 {
 \node [anchor=center,draw=red,circle,inner sep=2pt,thick] (x1) at ([xshift=4em,yshift=3em]x0.center) {};
-\draw[thick,dotted] ([xshift=4em]x0.center) -- ([xshift=4em,yshift=3em]x0.center);
+\draw[thick,dotted] ([xshift=4em]x0.center) -- ([xshift=4em,yshift=3.6em]x0.center);
 }
 \node[anchor=north] (zero) at ([yshift=0.1em]x0.south) {\small{0}};
-\node[anchor=north] (wx) at ([xshift=4em,yshift=0.1em]x0.south) {\small{$w_x$}};
+\node[anchor=north] (wx) at ([xshift=4em,yshift=0.1em]x0.south) {\small{$\lambda_x$}};
-\node[anchor=north] (wi) at ([xshift=8em,yshift=0.1em]x0.south) {\small{$w_i$}};
+\node[anchor=north] (wi) at ([xshift=8em,yshift=0.1em]x0.south) {\small{$\lambda_i$}};
 \end{scope}
+\begin{scope}[xshift=1.7in]
+\node[anchor=west] (x0) at (0, 0) {};
+\draw[->,thick] (x0.center) -- ([xshift=8.2em]x0.east);
+\draw[->,thick] (x0.center) -- ([yshift=5.6em]x0.center);
+\node[anchor=north] (zero) at ([yshift=0.1em]x0.south) {\small{0}};
+\node[anchor=north] (wx) at ([xshift=4em,yshift=0.1em]x0.south) {\small{$\lambda_x$}};
+\node[anchor=north] (wi) at ([xshift=8em,yshift=0.1em]x0.south) {\small{$\lambda_i$}};
+{
+\draw[thick] ([yshift=2em]x0.center) -- ([xshift=4em,yshift=2em]x0.center);
+\draw[thick] ([xshift=4em,yshift=4em]x0.center) -- ([xshift=8em,yshift=4em]x0.center);
+\draw[thick,dotted] ([xshift=4em]x0.center) -- ([xshift=4em,yshift=5.5em]x0.center);
+\node[anchor=north] (e1) at ([xshift=2em,yshift=3em]x0.north) {\small{$\hat{d}=d_1$}};
+\node[anchor=north] (e2) at ([xshift=6.2em,yshift=5em]x0.north) {\small{$\hat{d}=d_2$}};
+\node[anchor=north,rotate=90] (e2) at ([xshift=-1.3em,yshift=4em]x0.south) {\small{BLEU}};
+\draw[decorate,decoration={brace,amplitude=0.4em},red,thick] ([xshift=4em,yshift=0.5em]x0.south) -- ([xshift=8.2em,yshift=0.5em]x0.south);
+\node[anchor=north] (wi) at ([xshift=6.1em,yshift=2.1em]x0.south) {\footnotesize{\red{挑选$\hat{\lambda}_i$}}};
+}
+\end{scope}
 \end{tikzpicture}
--- a/Chapter7/Figures/figure-grid-search-2.tex
+++ b/Chapter7/Figures/figure-grid-search-2.tex
-\begin{tikzpicture}
-\begin{scope}[scale=0.62] 
-{\tiny
-\draw[step=1,help lines,color=black] (0,0) grid (4,4); 
-\node[anchor=north] (y2) at ([xshift=-3.3em,yshift=0em]n1.north) {0.01};
-\node[anchor=north] (y1) at ([xshift=0em,yshift=-3.3em]y2.south) {0.00};
-\node[anchor=north] (y3) at ([xshift=0em,yshift=4.5em]y2.north) {0.02};
-\node[anchor=north] (y4) at ([xshift=0em,yshift=6.6em]y3.north) {$\vdots$};
-\node[anchor=north] (y5) at ([xshift=0em,yshift=2em]y4.north) {1.00};
-\node[anchor=north] (x1) at ([xshift=2em,yshift=-3em]n1.south) {$\lambda_1$};
-\node[anchor=north] (x2) at ([xshift=4.5em,yshift=0em]x1.north) {$\lambda_2$};
-\node[anchor=north] (x3) at ([xshift=4em,yshift=-1em]x2.north) {$...$};
-\node[anchor=north] (x4) at ([xshift=5em,yshift=1em]x3.north) {$\lambda_{M-1}$};
-\node[anchor=north] (x5) at ([xshift=5em,yshift=0em]x4.north) {$\lambda_M$};
-\draw [-](n1) (0,4) -- (0,4.4);
-\draw [-](n2) (1,4) -- (1,4.4);
-\draw [-](n3) (2,4) -- (2,4.4);
-\draw [-](n4) (3,4) -- (3,4.4);
-\draw [-](n5) (4,4) -- (4,4.4);
-\node [anchor=center,draw,circle,inner sep=1.5pt,red!30,fill=red!30] (r31) at (2,4) {};
-\node [anchor=center,draw,circle,inner sep=1.5pt,red!30,fill=red!30] (r32) at (2,0) {};
-\node [anchor=center,draw,circle,inner sep=1.5pt,red!30,fill=red!30] (r33) at (2,2) {};
-\node [anchor=center,draw,circle,inner sep=1.5pt,red!30,fill=red!30] (r35) at (2,1) {};
-\node [anchor=center,draw,circle,inner sep=1.5pt,ugreen!50,fill=ugreen!50] (r34) at (2,3) {};
-\draw [-,very thick,red!50, dashed] (1,2) -- (2,4) -- (3,2) -- (2,3) -- (1,2) -- (3,2) -- (2,1) -- (1,2) -- (2,0) -- (3,2);
-\draw [-,very thick,blue!50] (0,1) -- (1,2);
-\draw [-,very thick,blue!50] (3,2) -- (4,4);
-\draw [-,very thick,ugreen!50, dashed] (1,2) -- (2,3) -- (3,2);
-\node [anchor=center,draw,circle,inner sep=1.5pt,blue!30,fill=blue!30] (r11) at (0,1) {};
-\node [anchor=center,draw,circle,inner sep=1.5pt,blue!30,fill=blue!30] (r12) at (1,2) {};
-\node [anchor=center,draw,circle,inner sep=1.5pt,blue!30,fill=blue!30] (r14) at (3,2) {};
-\node [anchor=center,draw,circle,inner sep=1.5pt,blue!30,fill=blue!30] (r15) at (4,4) {};
-}
-\end{scope}
-\end{tikzpicture}
\ No newline at end of file
--- a/Chapter7/Figures/figure-grid-search-1.tex
+++ b/Chapter7/Figures/figure-grid-search-1.tex
@@ -3,13 +3,13 @@
 {\tiny
 \draw[step=1,help lines,color=black] (0,0) grid (4,4); 
-\node[anchor=north] (y2) at ([xshift=-3.3em,yshift=0em]n1.north) {0.01};
+\node[anchor=north] (y2) at (-5.3em,1.5) {0.01};
 \node[anchor=north] (y1) at ([xshift=0em,yshift=-3.3em]y2.south) {0.00};
 \node[anchor=north] (y3) at ([xshift=0em,yshift=4.5em]y2.north) {0.02};
 \node[anchor=north] (y4) at ([xshift=0em,yshift=6.6em]y3.north) {$\vdots$};
 \node[anchor=north] (y5) at ([xshift=0em,yshift=2em]y4.north) {1.00};
-\node[anchor=north] (x1) at ([xshift=2em,yshift=-3em]n1.south) {$\lambda_1$};
+\node[anchor=north] (x1) at (1em,-3em) {$\lambda_1$};
 \node[anchor=north] (x2) at ([xshift=4.5em,yshift=0em]x1.north) {$\lambda_2$};
 \node[anchor=north] (x3) at ([xshift=4em,yshift=-1em]x2.north) {$...$};
 \node[anchor=north] (x4) at ([xshift=5em,yshift=1em]x3.north) {$\lambda_{M-1}$};
@@ -28,11 +28,11 @@
 \node [anchor=center,draw,circle,inner sep=1.5pt,red!30,fill=red!30] (r35) at (2,1) {};
 \node [anchor=center,draw,circle,inner sep=1.5pt,blue!30,fill=blue!30] (f11) at ([xshift=0em,yshift=23em]y2.north) {};
-\node[anchor=south] (f12) at ([xshift=5em,yshift=-0.5em]f11.south) {\scriptsize{fixed}};
+\node[anchor=south] (f12) at ([xshift=8.5em,yshift=-1em]f11.south) {\scriptsize{固定的权重}};
 \node [anchor=center,draw,circle,inner sep=1.5pt,ugreen!50,fill=ugreen!50] (f21) at ([xshift=0em,yshift=-4em]f11.north) {};
-\node[anchor=south] (f22) at ([xshift=8.5em,yshift=-0.5em]f21.south) {\scriptsize{valid choices}};
+\node[anchor=south] (f22) at ([xshift=8.5em,yshift=-1em]f21.south) {\scriptsize{有效取值点}};
 \node [anchor=center,draw,circle,inner sep=1.5pt,red!30,fill=red!30] (f31) at ([xshift=0em,yshift=-4em]f21.north) {};
-\node[anchor=south] (f32) at ([xshift=9.5em,yshift=-0.5em]f31.south) {\scriptsize{invalid choices}};
+\node[anchor=south] (f32) at ([xshift=8.5em,yshift=-1em]f31.south) {\scriptsize{无效取值点}};
 \draw [-,very thick,red!50, dashed] (1,2) -- (2,4) -- (3,2) -- (2,3) -- (1,2) -- (3,2) -- (2,1) -- (1,2) -- (2,0) -- (3,2);
 \draw [-,very thick,blue!50] (0,1) -- (1,2);
@@ -44,4 +44,45 @@
 \node [anchor=center,draw,circle,inner sep=1.5pt,blue!30,fill=blue!30] (r15) at (4,4) {};
 }
 \end{scope}
+\begin{scope}[scale=0.62,xshift=3in] 
+{\tiny
+\draw[step=1,help lines,color=black] (0,0) grid (4,4); 
+\node[anchor=north] (y2) at (-5.3em,1.5) {0.01};
+\node[anchor=north] (y1) at ([xshift=0em,yshift=-3.3em]y2.south) {0.00};
+\node[anchor=north] (y3) at ([xshift=0em,yshift=4.5em]y2.north) {0.02};
+\node[anchor=north] (y4) at ([xshift=0em,yshift=6.6em]y3.north) {$\vdots$};
+\node[anchor=north] (y5) at ([xshift=0em,yshift=2em]y4.north) {1.00};
+\node[anchor=north] (x1) at (1em,-3em) {$\lambda_1$};
+\node[anchor=north] (x2) at ([xshift=4.5em,yshift=0em]x1.north) {$\lambda_2$};
+\node[anchor=north] (x3) at ([xshift=4em,yshift=-1em]x2.north) {$...$};
+\node[anchor=north] (x4) at ([xshift=5em,yshift=1em]x3.north) {$\lambda_{M-1}$};
+\node[anchor=north] (x5) at ([xshift=5em,yshift=0em]x4.north) {$\lambda_M$};
+\draw [-](n1) (0,4) -- (0,4.4);
+\draw [-](n2) (1,4) -- (1,4.4);
+\draw [-](n3) (2,4) -- (2,4.4);
+\draw [-](n4) (3,4) -- (3,4.4);
+\draw [-](n5) (4,4) -- (4,4.4);
+\node [anchor=center,draw,circle,inner sep=1.5pt,red!30,fill=red!30] (r31) at (2,4) {};
+\node [anchor=center,draw,circle,inner sep=1.5pt,red!30,fill=red!30] (r32) at (2,0) {};
+\node [anchor=center,draw,circle,inner sep=1.5pt,red!30,fill=red!30] (r33) at (2,2) {};
+\node [anchor=center,draw,circle,inner sep=1.5pt,red!30,fill=red!30] (r35) at (2,1) {};
+\node [anchor=center,draw,circle,inner sep=1.5pt,ugreen!50,fill=ugreen!50] (r34) at (2,3) {};
+\draw [-,very thick,red!50, dashed] (1,2) -- (2,4) -- (3,2) -- (2,3) -- (1,2) -- (3,2) -- (2,1) -- (1,2) -- (2,0) -- (3,2);
+\draw [-,very thick,blue!50] (0,1) -- (1,2);
+\draw [-,very thick,blue!50] (3,2) -- (4,4);
+\draw [-,very thick,ugreen!50, dashed] (1,2) -- (2,3) -- (3,2);
+\node [anchor=center,draw,circle,inner sep=1.5pt,blue!30,fill=blue!30] (r11) at (0,1) {};
+\node [anchor=center,draw,circle,inner sep=1.5pt,blue!30,fill=blue!30] (r12) at (1,2) {};
+\node [anchor=center,draw,circle,inner sep=1.5pt,blue!30,fill=blue!30] (r14) at (3,2) {};
+\node [anchor=center,draw,circle,inner sep=1.5pt,blue!30,fill=blue!30] (r15) at (4,4) {};
+}
+\end{scope}
 \end{tikzpicture}
\ No newline at end of file
--- a/Chapter7/Figures/figure-phrase-extraction-consistent-with-word-alignment-1.tex
+++ b/Chapter7/Figures/figure-phrase-extraction-consistent-with-word-alignment-1.tex
@@ -62,7 +62,7 @@
 \begin{scope}[xshift = 1.5in, yshift = 1.3in]
 {\scriptsize
-\node (rules) {\textbf{抽取得到的短语:}};
+\node (rules) {\textbf{抽取得到的短语：}};
 \draw[-] (rules.south west)--([xshift=2.0in]rules.south west);
 {

--- a/Chapter7/Figures/figure-phrase-extraction-consistent-with-word-alignment.tex
+++ b/Chapter7/Figures/figure-phrase-extraction-consistent-with-word-alignment.tex
@@ -50,11 +50,11 @@
 {
-\node [anchor=west] (p1line1) at ([xshift=4em,yshift=1em]a75.east) {\footnotesize{$\bar{s}_i$: 天气\ \ \ \ \ \ }};
+\node [anchor=west] (p1line1) at ([xshift=4em,yshift=1em]a75.east) {\footnotesize{$\bar{s}_i$： 天气\ \ \ \ \ \ }};
-\node [anchor=north west] (p1line2) at ([xshift=0]p1line1.south west) {\footnotesize{$\bar{t}_i$: The\ \ \ weather\ \ \ \ \ }};
+\node [anchor=north west] (p1line2) at ([xshift=0]p1line1.south west) {\footnotesize{$\bar{t}_i$： The\ \ \ weather\ \ \ \ \ }};
-\node [anchor=west] (p2line1) at ([xshift=4em]a72.east) {\footnotesize{$\bar{s}_j$: 真\ \ \ 好 \ \ }};
+\node [anchor=west] (p2line1) at ([xshift=4em]a72.east) {\footnotesize{$\bar{s}_j$： 真\ \ \ 好 \ \ }};
-\node [anchor=north west] (p2line2) at ([xshift=0]p2line1.south west) {\footnotesize{$\bar{t}_j$: very\ \ \ good\ \ \ \ \ \ \ \ }};
+\node [anchor=north west] (p2line2) at ([xshift=0]p2line1.south west) {\footnotesize{$\bar{t}_j$： very\ \ \ good\ \ \ \ \ \ \ \ }};
 \node [anchor=east] (p2line3) at ([xshift=0em,yshift=-4em]p1line2.east) {};
 \begin{pgfonlayer}{background}

--- a/Chapter7/Figures/figure-reorder-base-distance.tex
+++ b/Chapter7/Figures/figure-reorder-base-distance.tex
@@ -5,7 +5,7 @@
 \begin{scope}[minimum height = 20pt]
-\node[anchor=east] (s0) at (-0.5em, 0) {$\textbf{s}$:};
+\node[anchor=east] (s0) at (-0.5em, 0) {$\seq{s}$：};
 \node[anchor=west,fill=green!20] (s1) at (0, 0) {\small{在\ \ 桌子\ \ 上\ \ \;的}};
 \node[anchor=south] (n1) at ([xshift=-2.5em,yshift=-0.5em]s1.north) {\small{1}};
 \node[anchor=south] (n2) at ([xshift=-0.7em,yshift=-0.5em]s1.north) {\small{2}};
@@ -14,7 +14,7 @@
 \node[anchor=west,fill=red!20] (s2) at ([xshift=1em]s1.east) {\small{苹果}};
 \node[anchor=south] (n5) at ([yshift=-0.5em]s2.north) {\small{5}};
-\node[anchor=east] (t0) at (-0.5em, -1.5) {$\textbf{t}$:};
+\node[anchor=east] (t0) at (-0.5em, -1.5) {$\seq{t}$：};
 \node[anchor=west,fill=red!20] (t1) at (0, -1.5) {\small{the apple}};
 \node[anchor=west,fill=green!20] (t2) at ([xshift=1.3em]t1.east) {\small{on the table}};

--- a/Chapter7/Figures/figure-reorder-base-phrase-translation.tex
+++ b/Chapter7/Figures/figure-reorder-base-phrase-translation.tex
@@ -5,11 +5,11 @@
 \begin{scope}[minimum height = 20pt]
-\node[anchor=east] (s0) at (-0.5em, 0) {$\textbf{s}$:};
+\node[anchor=east] (s0) at (-0.5em, 0) {$\seq{s}$：};
 \node[anchor=west,fill=green!20] (s1) at (0, 0) {\footnotesize{在 桌子 上 的}};
 \node[anchor=west,fill=red!20] (s2) at ([xshift=1em]s1.east) {\footnotesize{苹果}};
-\node[anchor=east] (t0) at (-0.5em, -1.5) {$\textbf{t}$:};
+\node[anchor=east] (t0) at (-0.5em, -1.5) {$\seq{t}$：};
 \node[anchor=west,fill=red!20] (t1) at (0, -1.5) {\footnotesize{the apple}};
 \node[anchor=west,fill=green!20] (t2) at ([xshift=1em]t1.east) {\footnotesize{on the table}};

--- a/Chapter7/Figures/figure-search-space-representation-of-feature-weight-1.tex
+++ b/Chapter7/Figures/figure-search-space-representation-of-feature-weight-1.tex
-\begin{tikzpicture}
-\begin{scope}[scale=0.55] 
-{\tiny
-\draw[step=1,help lines,color=black] grid (4,4); 
-\node[anchor=north] (y2) at ([xshift=-3.3em,yshift=0em]n1.north) {0.01};
-\node[anchor=north] (y1) at ([xshift=0em,yshift=-3.3em]y2.south) {0.00};
-\node[anchor=north] (y3) at ([xshift=0em,yshift=4.5em]y2.north) {0.02};
-\node[anchor=north] (y4) at ([xshift=0em,yshift=6.6em]y3.north) {$\vdots$};
-\node[anchor=north] (y5) at ([xshift=0em,yshift=2em]y4.north) {1.00};
-\node[anchor=north] (x1) at ([xshift=2em,yshift=-3em]n1.south) {$\lambda_1$};
-\node[anchor=north] (x2) at ([xshift=4.5em,yshift=0em]x1.north) {$\lambda_2$};
-\node[anchor=north] (x3) at ([xshift=4em,yshift=-1em]x2.north) {$...$};
-\node[anchor=north] (x4) at ([xshift=5em,yshift=1em]x3.north) {$\lambda_{M-1}$};
-\node[anchor=north] (x5) at ([xshift=5em,yshift=0em]x4.north) {$\lambda_M$};
-\draw [-](n1) (0,4) -- (0,4.4);
-\draw [-](n2) (1,4) -- (1,4.4);
-\draw [-](n3) (2,4) -- (2,4.4);
-\draw [-](n4) (3,4) -- (3,4.4);
-\draw [-](n5) (4,4) -- (4,4.4);
-\draw[decorate,decoration={brace}](0,4.7) --(4,4.7) node [xshift=-4em,yshift=1.5em,align=center](label1) {M dimensions};	
-\draw[decorate,decoration={brace}](4.5,4.3) --(4.5,0) node [xshift=2.3em,yshift=5.8em,align=center](label2) {Values};	
-}
-\end{scope}
-\end{tikzpicture}
\ No newline at end of file
--- a/Chapter7/Figures/figure-search-space-representation-of-feature-weight-2.tex
+++ b/Chapter7/Figures/figure-search-space-representation-of-feature-weight-2.tex
-\begin{tikzpicture}
-\begin{scope}[scale=0.55] 
-{\tiny
-\draw[step=1,help lines,color=black] grid (4,4); 
-\node[anchor=north] (y2) at ([xshift=-3.3em,yshift=0em]n1.north) {0.01};
-\node[anchor=north] (y1) at ([xshift=0em,yshift=-3.3em]y2.south) {0.00};
-\node[anchor=north] (y3) at ([xshift=0em,yshift=4.5em]y2.north) {0.02};
-\node[anchor=north] (y4) at ([xshift=0em,yshift=6.6em]y3.north) {$\vdots$};
-\node[anchor=north] (y5) at ([xshift=0em,yshift=2em]y4.north) {1.00};
-\node[anchor=north] (x1) at ([xshift=2em,yshift=-3em]n1.south) {$\lambda_1$};
-\node[anchor=north] (x2) at ([xshift=4.5em,yshift=0em]x1.north) {$\lambda_2$};
-\node[anchor=north] (x3) at ([xshift=4em,yshift=-1em]x2.north) {$...$};
-\node[anchor=north] (x4) at ([xshift=5em,yshift=1em]x3.north) {$\lambda_{M-1}$};
-\node[anchor=north] (x5) at ([xshift=5em,yshift=0em]x4.north) {$\lambda_M$};
-\draw [-](n1) (0,4) -- (0,4.4);
-\draw [-](n2) (1,4) -- (1,4.4);
-\draw [-](n3) (2,4) -- (2,4.4);
-\draw [-](n4) (3,4) -- (3,4.4);
-\draw [-](n5) (4,4) -- (4,4.4);
-\node [anchor=center,draw,circle,inner sep=1.5pt,blue!30,fill=blue!30] (r11) at (0,1) {};
-\node [anchor=center,draw,circle,inner sep=1.5pt,blue!30,fill=blue!30] (r12) at (1,2) {};
-\node [anchor=center,draw,circle,inner sep=1.5pt,blue!30,fill=blue!30] (r13) at (2,1) {};
-\node [anchor=center,draw,circle,inner sep=1.5pt,blue!30,fill=blue!30] (r14) at (3,2) {};
-\node [anchor=center,draw,circle,inner sep=1.5pt,blue!30,fill=blue!30] (r15) at (4,4) {};
-\draw [-,very thick,blue!50] (0,1) -- (1,2) -- (2,1) -- (3,2) -- (4,4);
-\node[anchor=north] (p1) at ([xshift=5em,yshift=13em]n5.north) {\scriptsize{$\leftarrow$ \textbf{path}:}};
-\node[anchor=north] (e1) at ([xshift=0,yshift=-0.4em]p1.south) {$w_1 = 0.01$};
-\node[anchor=north] (e2) at ([xshift=0,yshift=-0.8em]e1.south) {$w_2 = 0.02$};
-\node[anchor=north] (e3) at ([xshift=0,yshift=0.4em]e2.south) {$\vdots$};
-\node[anchor=north] (e4) at ([xshift=0,yshift=-0.2em]e3.south) {$w_M = 1.00$};
-}
-\end{scope}
-\end{tikzpicture}
\ No newline at end of file
--- a/Chapter7/Figures/figure-search-space-representation-of-feature-weight-3.tex
+++ b/Chapter7/Figures/figure-search-space-representation-of-feature-weight-3.tex
@@ -3,13 +3,85 @@
 {\tiny
 \draw[step=1,help lines,color=black] grid (4,4); 
-\node[anchor=north] (y2) at ([xshift=-3.3em,yshift=0em]n1.north) {0.01};
+\draw [-](n1) (0,4) -- (0,4.4);
+\draw [-](n2) (1,4) -- (1,4.4);
+\draw [-](n3) (2,4) -- (2,4.4);
+\draw [-](n4) (3,4) -- (3,4.4);
+\draw [-](n5) (4,4) -- (4,4.4);
+\node[anchor=north] (y2) at (-5.3em,1.5) {0.01};
 \node[anchor=north] (y1) at ([xshift=0em,yshift=-3.3em]y2.south) {0.00};
 \node[anchor=north] (y3) at ([xshift=0em,yshift=4.5em]y2.north) {0.02};
 \node[anchor=north] (y4) at ([xshift=0em,yshift=6.6em]y3.north) {$\vdots$};
 \node[anchor=north] (y5) at ([xshift=0em,yshift=2em]y4.north) {1.00};
-\node[anchor=north] (x1) at ([xshift=2em,yshift=-3em]n1.south) {$\lambda_1$};
+\node[anchor=north] (x1) at (1em,-3em) {$\lambda_1$};
+\node[anchor=north] (x2) at ([xshift=4.5em,yshift=0em]x1.north) {$\lambda_2$};
+\node[anchor=north] (x3) at ([xshift=4em,yshift=-1em]x2.north) {$...$};
+\node[anchor=north] (x4) at ([xshift=5em,yshift=1em]x3.north) {$\lambda_{M-1}$};
+\node[anchor=north] (x5) at ([xshift=5em,yshift=0em]x4.north) {$\lambda_M$};
+\draw[decorate,decoration={brace}](0,4.7) --(4,4.7) node [xshift=-4em,yshift=1.5em,align=center](label1) {M个特征函数};	
+\draw[decorate,decoration={brace}](4.5,4.3) --(4.5,0) node [xshift=2.3em,yshift=6.8em,align=center](label2) {V种};	
+\node[anchor=north] (label3) at ([xshift=0em,yshift=-2.5em]label2.north) {取值};	
+}
+\node[anchor=north] (l1) at ([xshift=0em,yshift=-2.5em]x3.south) {\footnotesize{(a)搜索空间}};
+\end{scope}
+\begin{scope}[scale=0.55,xshift=3.2in] 
+{\tiny
+\draw[step=1,help lines,color=black] grid (4,4); 
+\node[anchor=north] (y2) at (-5.3em,1.5) {0.01};
+\node[anchor=north] (y1) at ([xshift=0em,yshift=-3.3em]y2.south) {0.00};
+\node[anchor=north] (y3) at ([xshift=0em,yshift=4.5em]y2.north) {0.02};
+\node[anchor=north] (y4) at ([xshift=0em,yshift=6.6em]y3.north) {$\vdots$};
+\node[anchor=north] (y5) at ([xshift=0em,yshift=2em]y4.north) {1.00};
+\node[anchor=north] (x1) at (1em,-3em) {$\lambda_1$};
+\node[anchor=north] (x2) at ([xshift=4.5em,yshift=0em]x1.north) {$\lambda_2$};
+\node[anchor=north] (x3) at ([xshift=4em,yshift=-1em]x2.north) {$...$};
+\node[anchor=north] (x4) at ([xshift=5em,yshift=1em]x3.north) {$\lambda_{M-1}$};
+\node[anchor=north] (x5) at ([xshift=5em,yshift=0em]x4.north) {$\lambda_M$};
+\draw [-](n1) (0,4) -- (0,4.4);
+\draw [-](n2) (1,4) -- (1,4.4);
+\draw [-](n3) (2,4) -- (2,4.4);
+\draw [-](n4) (3,4) -- (3,4.4);
+\draw [-](n5) (4,4) -- (4,4.4);
+\node [anchor=center,draw,circle,inner sep=1.5pt,blue!30,fill=blue!30] (r11) at (0,1) {};
+\node [anchor=center,draw,circle,inner sep=1.5pt,blue!30,fill=blue!30] (r12) at (1,2) {};
+\node [anchor=center,draw,circle,inner sep=1.5pt,blue!30,fill=blue!30] (r13) at (2,1) {};
+\node [anchor=center,draw,circle,inner sep=1.5pt,blue!30,fill=blue!30] (r14) at (3,2) {};
+\node [anchor=center,draw,circle,inner sep=1.5pt,blue!30,fill=blue!30] (r15) at (4,4) {};
+\draw [-,very thick,blue!50] (0,1) -- (1,2) -- (2,1) -- (3,2) -- (4,4);
+\node[anchor=north] (p1) at (5.7,4.3) {\scriptsize{$\leftarrow$ \textbf{path}:}};
+\node[anchor=north] (e1) at ([xshift=0,yshift=-0.4em]p1.south) {$w_1 = 0.01$};
+\node[anchor=north] (e2) at ([xshift=0,yshift=-0.8em]e1.south) {$w_2 = 0.02$};
+\node[anchor=north] (e3) at ([xshift=0,yshift=0.4em]e2.south) {$\vdots$};
+\node[anchor=north] (e4) at ([xshift=0,yshift=-0.2em]e3.south) {$w_M = 1.00$};
+}
+\node[anchor=north] (l1) at ([xshift=0em,yshift=-2.5em]x3.south) {\footnotesize{(b)一条搜索路径}};
+\end{scope}
+\begin{scope}[scale=0.55,xshift=6.8in] 
+{\tiny
+\draw[step=1,help lines,color=black] grid (4,4); 
+\node[anchor=north] (y2) at (-5.3em,1.5) {0.01};
+\node[anchor=north] (y1) at ([xshift=0em,yshift=-3.3em]y2.south) {0.00};
+\node[anchor=north] (y3) at ([xshift=0em,yshift=4.5em]y2.north) {0.02};
+\node[anchor=north] (y4) at ([xshift=0em,yshift=6.6em]y3.north) {$\vdots$};
+\node[anchor=north] (y5) at ([xshift=0em,yshift=2em]y4.north) {1.00};
+\node[anchor=north] (x1) at (1em,-3em) {$\lambda_1$};
 \node[anchor=north] (x2) at ([xshift=4.5em,yshift=0em]x1.north) {$\lambda_2$};
 \node[anchor=north] (x3) at ([xshift=4em,yshift=-1em]x2.north) {$...$};
 \node[anchor=north] (x4) at ([xshift=5em,yshift=1em]x3.north) {$\lambda_{M-1}$};
@@ -43,8 +115,10 @@
 \draw [-,very thick,ugreen!50] (0,2) -- (1,3) -- (2,4) -- (3,0) -- (4,2);
 \draw [-,very thick,red!50] (0,4) -- (1,3) -- (2,2) -- (3,3) -- (4,1);
-\draw[decorate,decoration={brace}](4.5,4.3) --(4.5,0) node [xshift=2.3em,yshift=7.5em,align=center](label1) {$M^V$};	
+\draw[decorate,decoration={brace}](4.5,4.3) --(4.5,0) node [xshift=2.3em,yshift=6.5em,align=center](label1) {$V^M$};
-\node[anchor=north] (label2) at ([xshift=0em,yshift=-2.5em]label1.north) {pathes};
+\node[anchor=north] (label2) at ([xshift=0em,yshift=-2.5em]label1.north) {种组合};
 }
+\node[anchor=north] (l1) at ([xshift=0em,yshift=-2.5em]x3.south) {\footnotesize{(c)多条搜索路径}};
 \end{scope}
 \end{tikzpicture}
\ No newline at end of file
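
Review note on the search-space figure above: with M feature weights that each take one of V discrete values, an exhaustive grid contains V^M weight vectors, and each vector is one path through the grid. The following is a minimal sketch of that enumeration, not code from this repository. The function name `enumerate_weight_grid` and the toy values are assumptions made for illustration.

```python
from itertools import product


def enumerate_weight_grid(num_features, values):
    """Return an iterator over every weight vector on the grid.

    Each vector picks one of `values` for each of `num_features` weights,
    so the grid holds len(values) ** num_features vectors in total.
    """
    return product(values, repeat=num_features)


# Hypothetical toy setting: M = 3 feature weights, V = 3 candidate values.
M = 3
CANDIDATE_VALUES = [0.0, 0.5, 1.0]

grid = list(enumerate_weight_grid(M, CANDIDATE_VALUES))

# The number of paths grows exponentially in the number of features.
assert len(grid) == len(CANDIDATE_VALUES) ** M
```

Because the count is exponential in M, exhaustive enumeration is only feasible for very small grids, which is why the figure contrasts a single path with the full set of paths.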
--- a/Chapter7/Figures/figure-three-types-of-reorder-method-in-msd.tex
+++ b/Chapter7/Figures/figure-three-types-of-reorder-method-in-msd.tex
@@ -52,10 +52,10 @@
 {
-\node [anchor=west] (p1line1) at ([xshift=3.5em,yshift=0.5em]a75.east) {\footnotesize{M(monotone):单调调序}};
+\node [anchor=west] (p1line1) at ([xshift=3.5em,yshift=0.5em]a75.east) {\footnotesize{M(monotone)：单调调序}};
-\node [anchor=north west] (p1line2) at ([xshift=0,yshift=-1em]p1line1.south west) {\footnotesize{S(swap): 与前面一个短语}};
+\node [anchor=north west] (p1line2) at ([xshift=0,yshift=-1em]p1line1.south west) {\footnotesize{S(swap)： 与前面一个短语}};
-\node [anchor=north west] (p1line3) at ([xshift=3.5em]p1line2.south west) {\footnotesize{位置进行交换}};
+\node [anchor=north west] (p1line3) at ([xshift=3.8em]p1line2.south west) {\footnotesize{位置进行交换}};
-\node [anchor=north west] (p1line4) at ([xshift=-3.5em,yshift=-1em]p1line3.south west) {\footnotesize{D(discontinuous):非连续调序}};
+\node [anchor=north west] (p1line4) at ([xshift=-3.5em,yshift=-1em]p1line3.south west) {\footnotesize{D(discontinuous)：非连续调序}};
 \node [anchor=east] (p1line5) at ([xshift=0em,yshift=3em]p1line4.east) {};
 \node [anchor=east] (p1line6) at ([xshift=0em,yshift=7em]p1line4.east) {};

--- a/Chapter7/Figures/figure-translation-hypothesis-extension.tex
+++ b/Chapter7/Figures/figure-translation-hypothesis-extension.tex
@@ -6,7 +6,7 @@
 {
 \node [anchor=north,inner sep=2pt,fill=red!20,minimum height=2em,minimum width=3.5em] (h0) at (0,0) {\small{null}};
 \node [anchor=north west,inner sep=1.5pt,fill=black] (hl0) at (h0.north west) {\scriptsize{{\color{white} \textbf{0}}}};
-\node [anchor=north,rotate=90,inner sep=1pt,minimum width=2em,fill=black] (pt0) at (h0.east) {\footnotesize{{\color{white} \textbf{P=1}}}};
+\node [anchor=north,rotate=90,inner sep=1pt,minimum width=2em,fill=black] (pt0) at (h0.east) {\footnotesize{{\color{white} \textbf{$\funp{P}$=1}}}};
 }
 {
@@ -16,8 +16,8 @@
 \node [anchor=north west,inner sep=1.5pt,fill=black] (hl1) at (h1.north west) {\scriptsize{{\color{white} \textbf{2}}}};
 \node [anchor=north west,inner sep=1.5pt,fill=black] (hl2) at (h2.north west) {\scriptsize{{\color{white} \textbf{1}}}};
 \node [anchor=north west,inner sep=1.5pt,fill=black] (hl3) at (h3.north west) {\scriptsize{{\color{white} \textbf{3}}}};
-\node [anchor=north,rotate=90,inner sep=1pt,minimum width=2em,fill=black] (pt1) at (h1.east) {\footnotesize{{\color{white} \textbf{P=.2}}}};
+\node [anchor=north,rotate=90,inner sep=1pt,minimum width=2em,fill=black] (pt1) at (h1.east) {\footnotesize{{\color{white} \textbf{$\funp{P}$=.2}}}};
-\node [anchor=north,rotate=90,inner sep=1pt,minimum width=2em,fill=black] (pt2) at (h2.east) {\footnotesize{{\color{white} \textbf{P=.3}}}};
+\node [anchor=north,rotate=90,inner sep=1pt,minimum width=2em,fill=black] (pt2) at (h2.east) {\footnotesize{{\color{white} \textbf{$\funp{P}$=.3}}}};
 \node [anchor=north,rotate=90,inner sep=1pt,minimum width=2em,fill=black] (pt3) at (h3.east) {\footnotesize{{\color{white} \textbf{P=.5}}}};
 \draw [->,very thick,ublue] ([xshift=0.1em]pt0.south) -- ([xshift=-0.1em]h1.west);
@@ -38,11 +38,11 @@
 \node [anchor=north west,inner sep=1.5pt,fill=black] (hl7) at (h7.north west) {\scriptsize{{\color{white} \textbf{1-2}}}};
 \node [anchor=north west,inner sep=1.5pt,fill=black] (hl8) at (h8.north west) {\scriptsize{{\color{white} \textbf{5}}}};
-\node [anchor=north,rotate=90,inner sep=1pt,minimum width=2em,fill=black] (pt4) at (h4.east) {\footnotesize{{\color{white} \textbf{P=.1}}}};
+\node [anchor=north,rotate=90,inner sep=1pt,minimum width=2em,fill=black] (pt4) at (h4.east) {\footnotesize{{\color{white} \textbf{$\funp{P}$=.1}}}};
-\node [anchor=north,rotate=90,inner sep=1pt,minimum width=2em,fill=black] (pt5) at (h5.east) {\footnotesize{{\color{white} \textbf{P=.4}}}};
+\node [anchor=north,rotate=90,inner sep=1pt,minimum width=2em,fill=black] (pt5) at (h5.east) {\footnotesize{{\color{white} \textbf{$\funp{P}$=.4}}}};
-\node [anchor=north,rotate=90,inner sep=1pt,minimum width=2em,fill=black] (pt6) at (h6.east) {\footnotesize{{\color{white} \textbf{P=.3}}}};
+\node [anchor=north,rotate=90,inner sep=1pt,minimum width=2em,fill=black] (pt6) at (h6.east) {\footnotesize{{\color{white} \textbf{$\funp{P}$=.3}}}};
-\node [anchor=north,rotate=90,inner sep=1pt,minimum width=2em,fill=black] (pt7) at (h7.east) {\footnotesize{{\color{white} \textbf{P=.4}}}};
+\node [anchor=north,rotate=90,inner sep=1pt,minimum width=2em,fill=black] (pt7) at (h7.east) {\footnotesize{{\color{white} \textbf{$\funp{P}$=.4}}}};
-\node [anchor=north,rotate=90,inner sep=1pt,minimum width=2em,fill=black] (pt8) at (h8.east) {\footnotesize{{\color{white} \textbf{P=.2}}}};
+\node [anchor=north,rotate=90,inner sep=1pt,minimum width=2em,fill=black] (pt8) at (h8.east) {\footnotesize{{\color{white} \textbf{$\funp{P}$=.2}}}};
 \draw [->,very thick,ublue] ([xshift=0.1em]pt1.south) -- ([xshift=1em,yshift=0.7em]pt1.south);
@@ -65,7 +65,7 @@
 {
 \draw [->,ultra thick,red,line width=2pt,opacity=0.7] ([xshift=-0.5em]h0.west) -- ([xshift=0.7em]h0.east) -- ([xshift=-0.2em]h3.west) -- ([xshift=0.8em]h3.east) -- ([xshift=-0.2em]h5.west) -- ([xshift=0.8em]h5.east) -- ([xshift=-0.2em]h7.west) -- ([xshift=1.5em]h7.east);
-\node [anchor=north west] (wtranslabel) at ([yshift=-3em]h0.south west) {\small{翻译路径:}};
+\node [anchor=north west] (wtranslabel) at ([yshift=-3em]h0.south west) {\small{翻译路径：}};
 \draw [->,ultra thick,red,line width=1.5pt,opacity=0.7] (wtranslabel.east) -- ([xshift=1.5em]wtranslabel.east);
 }
 \end{scope}

--- a/Chapter7/Figures/figure-translation-option.tex
+++ b/Chapter7/Figures/figure-translation-option.tex
@@ -4,7 +4,7 @@
 \begin{tikzpicture}
 \begin{scope}[minimum height = 16pt]
-\node[anchor=east] (s0) at (-0.8em, 0) {$\textbf{s}$:};
+\node[anchor=east] (s0) at (-0.8em, 0) {$\textbf{s}$：};
 \node[anchor=west] (s1) at (0, 0) {桌子};
 \node[anchor=west] (s2) at ([xshift=2em]s1.east) {上};
 \node[anchor=west] (s3) at ([xshift=2.3em]s2.east) {有};

--- a/Chapter7/Figures/figure-unlimited-phrase-extraction.tex
+++ b/Chapter7/Figures/figure-unlimited-phrase-extraction.tex
@@ -41,11 +41,11 @@
 \node[tgtnode] (tgt7) at ([yshift=-0.5*1.0cm]tgt6.north east) {\scriptsize{?}};
 \node[tgtnode] (tgt8) at ([yshift=-0.5*1.0cm]tgt7.north east) {\scriptsize{EOS}};
-\node [anchor=west] (p1line1) at ([xshift=4em,yshift=1em]a57.east) {\footnotesize{$\bar{s}_i$: 什么\ \ \ 都\ \ \ 没}};
+\node [anchor=west] (p1line1) at ([xshift=4em,yshift=1em]a57.east) {\footnotesize{$\bar{s}_i$： 什么\ \ \ 都\ \ \ 没}};
-\node [anchor=north west] (p1line2) at ([xshift=0]p1line1.south west) {\footnotesize{$\bar{t}_i$: learned\ \ \ nothing\ \ \ ? \ \ \ \ \ \ \ \ \ \ \ \ }};
+\node [anchor=north west] (p1line2) at ([xshift=0]p1line1.south west) {\footnotesize{$\bar{t}_i$： learned\ \ \ nothing\ \ \ ? \ \ \ \ \ \ \ \ \ \ \ \ }};
-\node [anchor=west] (p2line1) at ([xshift=4em]a53.east) {\footnotesize{$\bar{s}_j$: 到\ \ \ ?}};
+\node [anchor=west] (p2line1) at ([xshift=4em]a53.east) {\footnotesize{$\bar{s}_j$： 到\ \ \ ?}};
-\node [anchor=north west] (p2line2) at ([xshift=0]p2line1.south west) {\footnotesize{$\bar{t}_j$: Have\ \ \ you\ \ \ learned\ \ \ nothing}};
+\node [anchor=north west] (p2line2) at ([xshift=0]p2line1.south west) {\footnotesize{$\bar{t}_j$： Have\ \ \ you\ \ \ learned\ \ \ nothing}};
 \node [anchor=east] (p1line3) at ([xshift=0em,yshift=2.9cm]p2line2.east) {};
 \begin{pgfonlayer}{background}

--- a/Chapter7/Figures/figure-word-and-phrase-translation-regard-as-path.tex
+++ b/Chapter7/Figures/figure-word-and-phrase-translation-regard-as-path.tex
@@ -10,7 +10,7 @@
 \node [anchor=west] (s4) at ([xshift=2em]s3.east) {\textbf{表示}};
 \node [anchor=west] (s5) at ([xshift=2em]s4.east) {\textbf{满意}};
-\node [anchor=south west] (sentlabel) at ([yshift=-0.5em]s1.north west) {\scriptsize{\textbf{待翻译句子(已经分词):}}};
+\node [anchor=south west] (sentlabel) at ([yshift=-0.5em]s1.north west) {\scriptsize{\textbf{待翻译句子（已经分词）：}}};
 \draw [->,very thick,ublue] (s1.south) -- ([yshift=-0.7em]s1.south);
 \draw [->,very thick,ublue] (s2.south) -- ([yshift=-0.7em]s2.south);
@@ -80,38 +80,38 @@
 {\tiny
-\node [anchor=north,rotate=90,inner sep=1pt,minimum width=2.55em,fill=black] (pt11) at (t11.east) {{\color{white} \textbf{P=.4}}};
+\node [anchor=north,rotate=90,inner sep=1pt,minimum width=2.55em,fill=black] (pt11) at (t11.east) {{\color{white} \textbf{$\funp{P}$=.4}}};
-\node [anchor=north,rotate=90,inner sep=1pt,minimum width=2.55em,fill=black] (pt12) at (t12.east) {{\color{white} \textbf{P=.2}}};
+\node [anchor=north,rotate=90,inner sep=1pt,minimum width=2.55em,fill=black] (pt12) at (t12.east) {{\color{white} \textbf{$\funp{P}$=.2}}};
-\node [anchor=north,rotate=90,inner sep=1pt,minimum width=2.55em,fill=black] (pt13) at (t13.east) {{\color{white} \textbf{P=.4}}};
+\node [anchor=north,rotate=90,inner sep=1pt,minimum width=2.55em,fill=black] (pt13) at (t13.east) {{\color{white} \textbf{$\funp{P}$=.4}}};
 {
-\node [anchor=north,rotate=90,inner sep=1pt,minimum width=2.55em,fill=black] (pt14) at (t14.east) {{\color{white} \textbf{P=.1}}};
+\node [anchor=north,rotate=90,inner sep=1pt,minimum width=2.55em,fill=black] (pt14) at (t14.east) {{\color{white} \textbf{$\funp{P}$=.1}}};
-\node [anchor=north,rotate=90,inner sep=1pt,minimum width=2.55em,fill=black] (pt15) at (t15.east) {{\color{white} \textbf{P=.2}}};
+\node [anchor=north,rotate=90,inner sep=1pt,minimum width=2.55em,fill=black] (pt15) at (t15.east) {{\color{white} \textbf{$\funp{P}$=.2}}};
 }
-\node [anchor=north,rotate=90,inner sep=1pt,minimum width=2.55em,fill=black] (pt21) at (t21.east) {{\color{white} \textbf{P=.4}}};
+\node [anchor=north,rotate=90,inner sep=1pt,minimum width=2.55em,fill=black] (pt21) at (t21.east) {{\color{white} \textbf{$\funp{P}$=.4}}};
-\node [anchor=north,rotate=90,inner sep=1pt,minimum width=2.55em,fill=black] (pt22) at (t22.east) {{\color{white} \textbf{P=.3}}};
+\node [anchor=north,rotate=90,inner sep=1pt,minimum width=2.55em,fill=black] (pt22) at (t22.east) {{\color{white} \textbf{$\funp{P}$=.3}}};
-\node [anchor=north,rotate=90,inner sep=1pt,minimum width=2.55em,fill=black] (pt23) at (t23.east) {{\color{white} \textbf{P=.3}}};
+\node [anchor=north,rotate=90,inner sep=1pt,minimum width=2.55em,fill=black] (pt23) at (t23.east) {{\color{white} \textbf{$\funp{P}$=.3}}};
 {
-\node [anchor=north,rotate=90,inner sep=1pt,minimum width=2.55em,fill=black] (pt24) at (t24.east) {{\color{white} \textbf{P=.2}}};
+\node [anchor=north,rotate=90,inner sep=1pt,minimum width=2.55em,fill=black] (pt24) at (t24.east) {{\color{white} \textbf{$\funp{P}$=.2}}};
-\node [anchor=north,rotate=90,inner sep=1pt,minimum width=2.55em,fill=black] (pt25) at (t25.east) {{\color{white} \textbf{P=.1}}};
+\node [anchor=north,rotate=90,inner sep=1pt,minimum width=2.55em,fill=black] (pt25) at (t25.east) {{\color{white} \textbf{$\funp{P}$=.1}}};
 }
-\node [anchor=north,rotate=90,inner sep=1pt,minimum width=2.55em,fill=black] (pt31) at (t31.east) {{\color{white} \textbf{P=1}}};
+\node [anchor=north,rotate=90,inner sep=1pt,minimum width=2.55em,fill=black] (pt31) at (t31.east) {{\color{white} \textbf{$\funp{P}$=1}}};
 {
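-\node [anchor=north,rotate=90,inner sep=1pt,minimum width=2.55em,fill=black] (pt33) at (t32.east) {{\color{white} \textbf{P=.4}}};
+\node [anchor=north,rotate=90,inner sep=1pt,minimum width=2.55em,fill=black] (pt32) at (t32.east) {{\color{white} \textbf{$\funp{P}$=.4}}};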
+\node [anchor=north,rotate=90,inner sep=1pt,minimum width=2.55em,fill=black] (pt33) at (t32.east) {{\color{white} \textbf{$\funp{P}$=.4}}};
-\node [anchor=north,rotate=90,inner sep=1pt,minimum width=2.55em,fill=black] (pt33) at (t33.east) {{\color{white} \textbf{P=.3}}};
+\node [anchor=north,rotate=90,inner sep=1pt,minimum width=2.55em,fill=black] (pt33) at (t33.east) {{\color{white} \textbf{$\funp{P}$=.3}}};
 }
-\node [anchor=north,rotate=90,inner sep=1pt,minimum width=2.55em,fill=black] (pt41) at (t41.east) {{\color{white} \textbf{P=.5}}};
+\node [anchor=north,rotate=90,inner sep=1pt,minimum width=2.55em,fill=black] (pt41) at (t41.east) {{\color{white} \textbf{$\funp{P}$=.5}}};
-\node [anchor=north,rotate=90,inner sep=1pt,minimum width=2.55em,fill=black] (pt42) at (t42.east) {{\color{white} \textbf{P=.5}}};
+\node [anchor=north,rotate=90,inner sep=1pt,minimum width=2.55em,fill=black] (pt42) at (t42.east) {{\color{white} \textbf{$\funp{P}$=.5}}};
 {
-\node [anchor=north,rotate=90,inner sep=1pt,minimum width=2.55em,fill=black] (pt43) at (t43.east) {{\color{white} \textbf{P=.3}}};
+\node [anchor=north,rotate=90,inner sep=1pt,minimum width=2.55em,fill=black] (pt43) at (t43.east) {{\color{white} \textbf{$\funp{P}$=.3}}};
-\node [anchor=north,rotate=90,inner sep=1pt,minimum width=2.55em,fill=black] (pt44) at (t44.east) {{\color{white} \textbf{P=.2}}};
+\node [anchor=north,rotate=90,inner sep=1pt,minimum width=2.55em,fill=black] (pt44) at (t44.east) {{\color{white} \textbf{$\funp{P}$=.2}}};
 }
-\node [anchor=north,rotate=90,inner sep=1pt,minimum width=2.55em,fill=black] (pt51) at (t51.east) {{\color{white} \textbf{P=.5}}};
+\node [anchor=north,rotate=90,inner sep=1pt,minimum width=2.55em,fill=black] (pt51) at (t51.east) {{\color{white} \textbf{$\funp{P}$=.5}}};
-\node [anchor=north,rotate=90,inner sep=1pt,minimum width=2.55em,fill=black] (pt52) at (t52.east) {{\color{white} \textbf{P=.4}}};
+\node [anchor=north,rotate=90,inner sep=1pt,minimum width=2.55em,fill=black] (pt52) at (t52.east) {{\color{white} \textbf{$\funp{P}$=.4}}};
-\node [anchor=north,rotate=90,inner sep=1pt,minimum width=2.55em,fill=black] (pt53) at (t53.east) {{\color{white} \textbf{P=.1}}};
+\node [anchor=north,rotate=90,inner sep=1pt,minimum width=2.55em,fill=black] (pt53) at (t53.east) {{\color{white} \textbf{$\funp{P}$=.1}}};
 }
@@ -143,13 +143,13 @@
 }
 {
-\node [anchor=north west] (wtranslabel) at ([yshift=-4em]t15.south west) {\scriptsize{翻译路径（仅含有单词）:}};
+\node [anchor=north west] (wtranslabel) at ([yshift=-4em]t15.south west) {\scriptsize{翻译路径（仅含有单词）}};
-\draw [->,ultra thick,red,line width=1.5pt,opacity=0.7] (wtranslabel.east) -- ([xshift=1em]wtranslabel.east);
+\draw [->,ultra thick,red,line width=1.5pt,opacity=0.7] ([xshift=0.2em]wtranslabel.east) -- ([xshift=1.2em]wtranslabel.east);
 }
 {
-\node [anchor=north west] (ptranslabel) at ([yshift=-5.5em]t15.south west) {\scriptsize{翻译路径（含有短语）:}};
+\node [anchor=north west] (ptranslabel) at ([yshift=-5.5em]t15.south west) {\scriptsize{翻译路径（含有短语）}};
-\draw [->,ultra thick,ublue,line width=1.5pt,opacity=0.7] ([xshift=0.65em]ptranslabel.east) -- ([xshift=1.65em]ptranslabel.east);
+\draw [->,ultra thick,ublue,line width=1.5pt,opacity=0.7] ([xshift=0.95em]ptranslabel.east) -- ([xshift=1.95em]ptranslabel.east);
 }
 \end{scope}

--- a/Chapter7/Figures/figure-word-translation-regard-as-path.tex
+++ b/Chapter7/Figures/figure-word-translation-regard-as-path.tex
@@ -10,7 +10,7 @@
 \node [anchor=west] (s4) at ([xshift=2em]s3.east) {\textbf{表示}};
 \node [anchor=west] (s5) at ([xshift=2em]s4.east) {\textbf{满意}};
-\node [anchor=south west] (sentlabel) at ([yshift=-0.5em]s1.north west) {\scriptsize{\textbf{待翻译句子(已经分词):}}};
+\node [anchor=south west] (sentlabel) at ([yshift=-0.5em]s1.north west) {\scriptsize{\textbf{待翻译句子（已经分词）：}}};
 \draw [->,very thick,ublue] (s1.south) -- ([yshift=-0.7em]s1.south);
 \draw [->,very thick,ublue] (s2.south) -- ([yshift=-0.7em]s2.south);
@@ -52,22 +52,22 @@
 {\tiny
-\node [anchor=north,rotate=90,inner sep=1pt,minimum width=2.55em,fill=black] (pt11) at (t11.east) {{\color{white} \textbf{P=.4}}};
+\node [anchor=north,rotate=90,inner sep=1pt,minimum width=2.55em,fill=black] (pt11) at (t11.east) {{\color{white} \textbf{$\funp{P}$=.4}}};
-\node [anchor=north,rotate=90,inner sep=1pt,minimum width=2.55em,fill=black] (pt12) at (t12.east) {{\color{white} \textbf{P=.2}}};
+\node [anchor=north,rotate=90,inner sep=1pt,minimum width=2.55em,fill=black] (pt12) at (t12.east) {{\color{white} \textbf{$\funp{P}$=.2}}};
-\node [anchor=north,rotate=90,inner sep=1pt,minimum width=2.55em,fill=black] (pt13) at (t13.east) {{\color{white} \textbf{P=.4}}};
+\node [anchor=north,rotate=90,inner sep=1pt,minimum width=2.55em,fill=black] (pt13) at (t13.east) {{\color{white} \textbf{$\funp{P}$=.4}}};
-\node [anchor=north,rotate=90,inner sep=1pt,minimum width=2.55em,fill=black] (pt21) at (t21.east) {{\color{white} \textbf{P=.4}}};
+\node [anchor=north,rotate=90,inner sep=1pt,minimum width=2.55em,fill=black] (pt21) at (t21.east) {{\color{white} \textbf{$\funp{P}$=.4}}};
-\node [anchor=north,rotate=90,inner sep=1pt,minimum width=2.55em,fill=black] (pt22) at (t22.east) {{\color{white} \textbf{P=.3}}};
+\node [anchor=north,rotate=90,inner sep=1pt,minimum width=2.55em,fill=black] (pt22) at (t22.east) {{\color{white} \textbf{$\funp{P}$=.3}}};
-\node [anchor=north,rotate=90,inner sep=1pt,minimum width=2.55em,fill=black] (pt23) at (t23.east) {{\color{white} \textbf{P=.3}}};
+\node [anchor=north,rotate=90,inner sep=1pt,minimum width=2.55em,fill=black] (pt23) at (t23.east) {{\color{white} \textbf{$\funp{P}$=.3}}};
-\node [anchor=north,rotate=90,inner sep=1pt,minimum width=2.55em,fill=black] (pt31) at (t31.east) {{\color{white} \textbf{P=1}}};
+\node [anchor=north,rotate=90,inner sep=1pt,minimum width=2.55em,fill=black] (pt31) at (t31.east) {{\color{white} \textbf{$\funp{P}$=1}}};
-\node [anchor=north,rotate=90,inner sep=1pt,minimum width=2.55em,fill=black] (pt41) at (t41.east) {{\color{white} \textbf{P=.5}}};
+\node [anchor=north,rotate=90,inner sep=1pt,minimum width=2.55em,fill=black] (pt41) at (t41.east) {{\color{white} \textbf{$\funp{P}$=.5}}};
-\node [anchor=north,rotate=90,inner sep=1pt,minimum width=2.55em,fill=black] (pt42) at (t42.east) {{\color{white} \textbf{P=.5}}};
+\node [anchor=north,rotate=90,inner sep=1pt,minimum width=2.55em,fill=black] (pt42) at (t42.east) {{\color{white} \textbf{$\funp{P}$=.5}}};
-\node [anchor=north,rotate=90,inner sep=1pt,minimum width=2.55em,fill=black] (pt51) at (t51.east) {{\color{white} \textbf{P=.5}}};
+\node [anchor=north,rotate=90,inner sep=1pt,minimum width=2.55em,fill=black] (pt51) at (t51.east) {{\color{white} \textbf{$\funp{P}$=.5}}};
-\node [anchor=north,rotate=90,inner sep=1pt,minimum width=2.55em,fill=black] (pt52) at (t52.east) {{\color{white} \textbf{P=.4}}};
+\node [anchor=north,rotate=90,inner sep=1pt,minimum width=2.55em,fill=black] (pt52) at (t52.east) {{\color{white} \textbf{$\funp{P}$=.4}}};
-\node [anchor=north,rotate=90,inner sep=1pt,minimum width=2.55em,fill=black] (pt53) at (t53.east) {{\color{white} \textbf{P=.1}}};
+\node [anchor=north,rotate=90,inner sep=1pt,minimum width=2.55em,fill=black] (pt53) at (t53.east) {{\color{white} \textbf{$\funp{P}$=.1}}};
 }

--- a/Chapter7/chapter7.tex
+++ b/Chapter7/chapter7.tex
--- a/Chapter7/Figures/figure-chinese-syntax-tree.tex
+++ b/Chapter7/Figures/figure-chinese-syntax-tree.tex
--- a/Chapter8/Figures/figure-cky-algorithm.tex
+++ b/Chapter8/Figures/figure-cky-algorithm.tex
@@ -12,11 +12,11 @@
 \node[srcnode,anchor=north west] (c21) at ([xshift=1.5em,yshift=0.4em]c1.south west) {\normalsize{\textbf{for} $j=0$ to $ J - 1$}};
 \node[srcnode,anchor=north west] (c22) at ([xshift=1.5em,yshift=0.4em]c21.south west) {\normalsize{$span[j,j+1 ]$.Add($A \to a \in G$)}};
 \node[srcnode,anchor=north west] (c3) at ([xshift=-1.5em,yshift=0.4em]c22.south west) {\normalsize{\textbf{for} $l$ = 1 to $J$}};
-\node[srcnode,anchor=west] (c31) at ([xshift=6em]c3.east) {\normalsize{// length of span}};
+\node[srcnode,anchor=west] (c31) at ([xshift=6em]c3.east) {\normalsize{// 跨度长度}};
 \node[srcnode,anchor=north west] (c4) at ([xshift=1.5em,yshift=0.4em]c3.south west) {\normalsize{\textbf{for} $j$ = 0 to $J-l$}};
-\node[srcnode,anchor=north west] (c41) at ([yshift=0.4em]c31.south west) {\normalsize{// beginning of span}};
+\node[srcnode,anchor=north west] (c41) at ([yshift=0.4em]c31.south west) {\normalsize{// 跨度起始位置}};
 \node[srcnode,anchor=north west] (c5) at ([xshift=1.5em,yshift=0.4em]c4.south west) {\normalsize{\textbf{for} $k$ = $j$ to $j+l$}};
-\node[srcnode,anchor=north west] (c51) at ([yshift=0.4em]c41.south west) {\normalsize{// partition of span}};
+\node[srcnode,anchor=north west] (c51) at ([yshift=0.4em]c41.south west) {\normalsize{// 跨度切分位置}};
 \node[srcnode,anchor=north west] (c6) at ([xshift=1.5em,yshift=0.4em]c5.south west) {\normalsize{$hypos$ = Compose($span[j, k], span[k, j+l]$)}};
 \node[srcnode,anchor=north west] (c7) at ([yshift=0.4em]c6.south west) {\normalsize{$span[j, j+l]$.Update($hypos$)}};
 \node[srcnode,anchor=north west] (c8) at ([xshift=-4.5em,yshift=0.4em]c7.south west) {\normalsize{\textbf{return} $span[0, J]$}};
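
Review note: the pseudocode in this figure can be checked against a small runnable sketch. The version below is a minimal CKY recognizer, assuming a grammar in Chomsky normal form; `cky_recognize`, `lexical_rules`, `binary_rules` and the toy grammar are hypothetical names and data introduced only for illustration. Unlike the figure, the sketch starts the span length at 2 and keeps the split point `k` strictly inside the span, so that neither sub-span is empty.

```python
from collections import defaultdict


def cky_recognize(words, lexical_rules, binary_rules):
    """Fill a chart keyed by (start, end) with the nonterminals covering each span.

    lexical_rules: dict mapping a word to the set of A with A -> word
    binary_rules:  dict mapping (B, C) to the set of A with A -> B C
    """
    n = len(words)
    span = defaultdict(set)
    # Spans of length 1 come from the lexical rules.
    for j in range(n):
        span[(j, j + 1)] |= lexical_rules.get(words[j], set())
    for l in range(2, n + 1):                # span length
        for j in range(0, n - l + 1):        # span start
            for k in range(j + 1, j + l):    # split point inside the span
                for b in span[(j, k)]:
                    for c in span[(k, j + l)]:
                        span[(j, j + l)] |= binary_rules.get((b, c), set())
    return span


# Toy grammar (an assumption, not taken from the book).
lexical_rules = {"猫": {"NP"}, "喜欢": {"VV"}, "吃": {"VV"}, "鱼": {"NP"}}
binary_rules = {("VV", "NP"): {"VP"}, ("VV", "VP"): {"VP"}, ("NP", "VP"): {"IP"}}
chart = cky_recognize(["猫", "喜欢", "吃", "鱼"], lexical_rules, binary_rules)
```

With this toy grammar the full span `(0, 4)` ends up containing `IP`, which corresponds to the `return span[0, J]` step of the figure.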

--- a/Chapter8/Figures/figure-combination-of-translation-with-different-rules.tex
+++ b/Chapter8/Figures/figure-combination-of-translation-with-different-rules.tex
@@ -22,26 +22,29 @@
 \draw[decorate,decoration={mirror,brace}]([xshift=0.5em,yshift=-1em]q2.west) --([xshift=7em,yshift=-1em]q2.west) node [xshift=0em,yshift=-1em,align=center](label1) {};	
 {\scriptsize
-\node[anchor=west] (h1) at ([xshift=1em,yshift=-12em]q2.west) {{Span[0,3]下的翻译假设：}};
+\node[anchor=west] (h1) at ([xshift=1em,yshift=-15em]q2.west) {{Span[0,3]下的翻译假设：}};
-\node[anchor=west] (h2) at ([xshift=0em,yshift=-1.3em]h1.west) {{X: imports and exports}};
+\node[anchor=west] (h2) at ([xshift=0em,yshift=-1.3em]h1.west) {{X：imports and exports}};
-\node[anchor=west] (h6) at ([xshift=0em,yshift=-1.3em]h2.west) {{S: the import and export}};
+\node[anchor=west] (h6) at ([xshift=0em,yshift=-1.3em]h2.west) {{S：the import and export}};
 }
 {\scriptsize
-\node[anchor=west] (h21) at ([xshift=9em,yshift=2em]h1.east) {{替换$\textrm{X}_1$后生成的翻译假设：}};
+\node[anchor=west] (h21) at ([xshift=9em,yshift=5.0em]h1.east) {{替换$\textrm{X}_1$后生成的翻译假设：}};
-\node[anchor=west] (h22) at ([xshift=0em,yshift=-1.3em]h21.west) {{X: imports and exports have drastically fallen}};
+\node[anchor=west] (h22) at ([xshift=0em,yshift=-1.3em]h21.west) {{X：imports and exports have drastically fallen}};
-\node[anchor=west] (h23) at ([xshift=0em,yshift=-1.3em]h22.west) {{X: the import and export have drastically fallen}};
+\node[anchor=west] (h23) at ([xshift=0em,yshift=-1.3em]h22.west) {{X：the import and export have drastically fallen}};
-\node[anchor=west] (h24) at ([xshift=0em,yshift=-1.3em]h23.west) {{X: imports and exports have drastically fallen}};
+\node[anchor=west] (h24) at ([xshift=0em,yshift=-1.3em]h23.west) {{X：imports and exports have drastically fallen}};
-\node[anchor=west] (h25) at ([xshift=0em,yshift=-1.3em]h24.west) {{X: the import and export have drastically fallen}};
+\node[anchor=west] (h25) at ([xshift=0em,yshift=-1.3em]h24.west) {{X：the import and export have drastically fallen}};
-\node[anchor=west] (h26) at ([xshift=0em,yshift=-1.3em]h25.west) {{X: imports and exports has drastically fallen}};
+\node[anchor=west] (h26) at ([xshift=0em,yshift=-1.3em]h25.west) {{X：imports and exports has drastically fallen}};
-\node[anchor=west] (h27) at ([xshift=0em,yshift=-1.3em]h26.west) {{X: the import and export has drastically fallen}};
+\node[anchor=west] (h27) at ([xshift=0em,yshift=-1.3em]h26.west) {{X：the import and export has drastically fallen}};
 }
-\node [rectangle,inner sep=0.1em,rounded corners=1pt,draw] [fit = (h1) (h5) (h6)] (gl1) {};
+\node [rectangle,inner sep=0.1em,rounded corners=1pt,draw] [fit = (h1) (h2) (h6)] (gl1) {};
 \node [rectangle,inner sep=0.1em,rounded corners=1pt,draw] [fit = (h21) (h25) (h27)] (gl2) {};
-\draw [->,ublue,thick] ([xshift=0.6em,yshift=0.2em]n4.south) .. controls +(south:2em) and +(east:0em) ..   ([xshift=-0em,yshift=2em]gl2.west);
+\node [anchor=east,circle,inner sep=2pt,drop shadow,thick,draw=ublue,fill=white] (join) at ([xshift=4em,yshift=1em]gl1.north east) {\tiny{组合}};
-\draw [->,ublue,thick] ([xshift=0em,yshift=0em]gl1.east) .. controls +(north:2.2em) and +(east:0em) ..   ([xshift=-0em,yshift=2em]gl2.west);
+\draw [->,ublue,thick] ([xshift=0.6em,yshift=0.2em]n4.south) .. controls +(south:2em) and +(north:2em) ..   (join.90);
+\draw [->,ublue,thick] ([xshift=0em,yshift=1em]gl1.south east) .. controls +(east:2em) and +(south:2em) ..   (join.-90);
+\draw [->,ublue,thick] (join.0) -- ([xshift=2.3em]join.0);
 \end{scope}
 \end{tikzpicture}

--- a/Chapter8/Figures/figure-content-of-chart-in-tree-based-decoding.tex
+++ b/Chapter8/Figures/figure-content-of-chart-in-tree-based-decoding.tex
@@ -27,7 +27,7 @@
 \node [anchor=north] (t5) at (cell43.south) {\tiny{$l$=3}};
 \node [anchor=north] (t5) at (cell44.south) {\tiny{$l$=4}};
-\node [anchor=north] (chartlabel) at ([yshift=-1em]cell42.south east) {\footnotesize{\textbf{chart}}};
+\node [anchor=north] (chartlabel) at ([yshift=-1em]cell42.south east) {\footnotesize{Chart}};
 {\footnotesize
 \node [anchor=north west] (w1) at ([yshift=-2.5em,xshift=-1.0em]cell41.south west) {猫};

--- a/Chapter8/Figures/figure-example-of-hyper-graph.tex
+++ b/Chapter8/Figures/figure-example-of-hyper-graph.tex
@@ -3,34 +3,34 @@
 \begin{center}
 \begin{tikzpicture}\footnotesize
 \begin{scope}[scale=0.7]
-\node [anchor=center,draw,thick,circle,inner sep=13pt,ublue] (s1) at (0,0) {};
+\node [anchor=center,draw,very thick,circle,inner sep=13pt,ublue,fill=white,drop shadow] (s1) at (0,0) {};
 \node [anchor=north] (t11) at ([yshift=-0.8em]s1.north) {VP};
 \node [anchor=north] (t12) at ([yshift=-0.3em]t11.south) {[0,2]};
-\node [anchor=center,draw,thick,circle,inner sep=13pt,ublue] (s2) at ([xshift=12em,yshift=-5em]s1.north) {};
+\node [anchor=center,draw,very thick,circle,inner sep=13pt,ublue,fill=white,drop shadow] (s2) at ([xshift=12em,yshift=-3.5em]s1.north) {};
 \node [anchor=north] (t21) at ([yshift=-0.8em]s2.north) {NP};
 \node [anchor=north] (t22) at ([yshift=-0.3em]t21.south) {[0,2]};
-\node [anchor=center,draw,thick,circle,inner sep=13pt,ublue] (s3) at ([xshift=-6em,yshift=-13em]s1.south) {};
+\node [anchor=center,draw,very thick,circle,inner sep=13pt,ublue,fill=white,drop shadow] (s3) at ([xshift=-6em,yshift=-13em]s1.south) {};
 \node [anchor=north] (t31) at ([yshift=-0.8em]s3.north) {VV};
 \node [anchor=north] (t32) at ([yshift=-0.3em]t31.south) {[0,1]};
-\node [anchor=center,draw,thick,circle,inner sep=13pt,ublue] (s4) at ([xshift=13em,yshift=2.9em]s3.south) {};
+\node [anchor=center,draw,very thick,circle,inner sep=13pt,ublue,fill=white,drop shadow] (s4) at ([xshift=13em,yshift=2.9em]s3.south) {};
 \node [anchor=north] (t41) at ([yshift=-0.8em]s4.north) {NN};
 \node [anchor=north] (t42) at ([yshift=-0.3em]t41.south) {[1,2]};
-\node [anchor=center,draw,thick,circle,inner sep=13pt,ublue] (s5) at ([xshift=13em,yshift=2.9em]s4.south) {};
+\node [anchor=center,draw,very thick,circle,inner sep=13pt,ublue,fill=white,drop shadow] (s5) at ([xshift=13em,yshift=2.9em]s4.south) {};
 \node [anchor=north] (t51) at ([yshift=-0.8em]s5.north) {NP};
 \node [anchor=north] (t52) at ([yshift=-0.3em]t51.south) {[1,2]};
 {
-\draw [->,red!50,very thick] ([xshift=-1em,yshift=-0.3em]s3.north) .. controls +(north:10em) and +(south:10em) .. ([xshift=0em,yshift=0em]s1.south);
+\draw [->,red!60,very thick] ([yshift=0.1em]s3.100) .. controls +(north:8em) and +(south:10em) .. ([xshift=0em,yshift=-0.2em]s1.south);
-\draw [->,red!50,very thick] ([xshift=-1em,yshift=-0.3em]s5.north) .. controls +(north:8em) and +(south:14em) .. ([xshift=0em,yshift=0em]s1.south);
+\draw [->,red!60,very thick] ([yshift=0.1em]s5.110) .. controls +(north:8em) and +(south:12em) .. ([xshift=0em,yshift=-0.2em]s1.south);
 }
 {
-\draw [->,blue!50,very thick] ([xshift=-1em,yshift=-0.3em]s4.north) .. controls +(north:8em) and +(south:8em) .. ([xshift=0em,yshift=0em]s2.south);
+\draw [->,ugreen,very thick] ([yshift=0.1em]s4.90) .. controls +(north:9em) and +(south:7em) .. ([xshift=0em,yshift=-0.2em]s2.south);
-\draw [->,blue!50,very thick] ([xshift=1em,yshift=-0.3em]s5.north) .. controls +(north:9em) and +(south:7em) .. ([xshift=0em,yshift=0em]s2.south);
+\draw [->,ugreen,very thick] ([yshift=0.1em]s5.90) .. controls +(north:9em) and +(south:7em) .. ([xshift=0em,yshift=-0.2em]s2.south);
 }
 \node [anchor=north] (t51) at ([yshift=7em]s3.north) {edge1};

--- a/Chapter7/Figures/figure-example-of-translation-use-syntactic-structure.tex
+++ b/Chapter7/Figures/figure-example-of-translation-use-syntactic-structure.tex
--- a/Chapter8/Figures/figure-examples-of-translation-with-complex-ordering.tex
+++ b/Chapter8/Figures/figure-examples-of-translation-with-complex-ordering.tex
@@ -7,13 +7,13 @@
 {\scriptsize
-\node[anchor=west] (ref) at (0,0) {{\sffamily\bfseries{参考答案:}} The Chinese star performance troupe presented a wonderful Peking opera as well as singing and dancing };
+\node[anchor=west] (ref) at (0,0) {{\sffamily\bfseries{参考答案：}} The Chinese star performance troupe presented a wonderful Peking opera as well as singing and dancing };
 \node[anchor=north west] (ref2) at (ref.south west) {{\color{white} \sffamily\bfseries{Reference:}} performance to Hong Kong audience .};
-\node[anchor=north west] (hifst) at (ref2.south west) {{\sffamily\bfseries{层次短语系统:}} Star troupe of China, highlights of Peking opera and dance show to the audience of Hong Kong .};
+\node[anchor=north west] (hifst) at (ref2.south west) {{\sffamily\bfseries{层次短语系统：}} Star troupe of China, highlights of Peking opera and dance show to the audience of Hong Kong .};
-\node[anchor=north west] (synhifst) at (hifst.south west) {{\sffamily\bfseries{句法系统:}} Chinese star troupe};
+\node[anchor=north west] (synhifst) at (hifst.south west) {{\sffamily\bfseries{句法系统：}} Chinese star troupe};
 \node[anchor=west, fill=green!20!white, inner sep=0.25em] (synhifstpart2) at (synhifst.east) {presented};
@@ -25,7 +25,7 @@
 \node[anchor=west] (synhifstpart6) at (synhifstpart5.east) {.};
-\node[anchor=north west] (input) at ([yshift=-6.5em]synhifst.south west) {\sffamily\bfseries{源语句法树:}};
+\node[anchor=north west] (input) at ([yshift=-6.5em]synhifst.south west) {\sffamily\bfseries{源语句法树：}};
 \begin{scope}[scale = 0.9, grow'=up, sibling distance=5pt, level distance=30pt, xshift=3.49in, yshift=-3.1in]

--- a/Chapter8/Figures/figure-extract-hierarchical-phrase-rules.tex
+++ b/Chapter8/Figures/figure-extract-hierarchical-phrase-rules.tex
@@ -63,7 +63,7 @@
 {\scriptsize
 \node (phrase) {\textbf{抽取得到的短语:}};
 \draw[-] (phrase.south west)--([xshift=1.9in]phrase.south west);
-\node[anchor=north west] (rules) at ([yshift=-7.5em]phrase.south west) {\textbf{抽取得到的规则:}};
+\node[anchor=north west] (rules) at ([yshift=-7.5em]phrase.south west) {\textbf{抽取得到的层次短语规则:}};
 \draw[-] (rules.south west)--([xshift=1.9in]rules.south west);
 {

--- a/Chapter8/Figures/figure-hierarchical-phrase-rule-match-generate.tex
+++ b/Chapter8/Figures/figure-hierarchical-phrase-rule-match-generate.tex
@@ -17,26 +17,29 @@
 {\scriptsize
 \node[anchor=west] (h1) at ([xshift=1em,yshift=-7em]q2.west) {{Span[0,3]下的翻译假设：}};
-\node[anchor=west] (h2) at ([xshift=0em,yshift=-1.3em]h1.west) {{X: the imports and exports}};
+\node[anchor=west] (h2) at ([xshift=0em,yshift=-1.3em]h1.west) {{X：the imports and exports}};
-\node[anchor=west] (h3) at ([xshift=0em,yshift=-1.3em]h2.west) {{X: imports and exports}};
+\node[anchor=west] (h3) at ([xshift=0em,yshift=-1.3em]h2.west) {{X：imports and exports}};
-\node[anchor=west] (h4) at ([xshift=0em,yshift=-1.3em]h3.west) {{X: exports and imports}};
+\node[anchor=west] (h4) at ([xshift=0em,yshift=-1.3em]h3.west) {{X：exports and imports}};
-\node[anchor=west] (h5) at ([xshift=0em,yshift=-1.3em]h4.west) {{X: the imports and the exports}};
+\node[anchor=west] (h5) at ([xshift=0em,yshift=-1.3em]h4.west) {{X：the imports and the exports}};
-\node[anchor=west] (h6) at ([xshift=0em,yshift=-1.3em]h5.west) {{S: the import and export}};
+\node[anchor=west] (h6) at ([xshift=0em,yshift=-1.3em]h5.west) {{S：the import and export}};
 }
 {\scriptsize
 \node[anchor=west] (h21) at ([xshift=9em,yshift=0em]h1.east) {{替换$\textrm{X}_1$后生成的翻译假设：}};
-\node[anchor=west] (h22) at ([xshift=0em,yshift=-1.3em]h21.west) {{X: the imports and exports have drastically fallen}};
+\node[anchor=west] (h22) at ([xshift=0em,yshift=-1.3em]h21.west) {{X：the imports and exports have drastically fallen}};
-\node[anchor=west] (h23) at ([xshift=0em,yshift=-1.3em]h22.west) {{X: imports and exports have drastically fallen}};
+\node[anchor=west] (h23) at ([xshift=0em,yshift=-1.3em]h22.west) {{X：imports and exports have drastically fallen}};
-\node[anchor=west] (h24) at ([xshift=0em,yshift=-1.3em]h23.west) {{X: exports and imports have drastically fallen}};
+\node[anchor=west] (h24) at ([xshift=0em,yshift=-1.3em]h23.west) {{X：exports and imports have drastically fallen}};
-\node[anchor=west] (h25) at ([xshift=0em,yshift=-1.3em]h24.west) {{X: the imports and the exports have drastically fallen}};
+\node[anchor=west] (h25) at ([xshift=0em,yshift=-1.3em]h24.west) {{X：the imports and the exports have drastically fallen}};
 }
 \node [rectangle,inner sep=0.1em,rounded corners=1pt,draw] [fit = (h1) (h5) (h6)] (gl1) {};
 \node [rectangle,inner sep=0.1em,rounded corners=1pt,draw] [fit = (h21) (h25)] (gl2) {};
-\draw [->,ublue,thick] ([xshift=0.6em,yshift=0.2em]n2.south) .. controls +(south:2em) and +(east:0em) ..   ([xshift=-0em,yshift=2em]gl2.west);
+\node [anchor=east,circle,inner sep=2pt,drop shadow,thick,draw=ublue,fill=white] (join) at ([xshift=3em,yshift=-2em]gl1.north east) {\tiny{组合}};
-\draw [->,ublue,thick] ([xshift=0em,yshift=1em]gl1.east) .. controls +(north:2.2em) and +(east:0em) ..   ([xshift=-0em,yshift=2em]gl2.west);
+\draw [->,ublue,thick] ([xshift=0.6em,yshift=0.2em]n2.south) .. controls +(south:2em) and +(north:2em) ..   (join.90);
+\draw [->,ublue,thick] ([xshift=0em,yshift=1em]gl1.south east) .. controls +(east:2em) and +(south:2em) ..   (join.-90);
+\draw [->,ublue,thick] (join.0) -- ([xshift=1.7em]join.0);
 \end{scope}
 \end{tikzpicture}
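
Review note: the "组合" step drawn in this figure can be sketched as slot substitution. The code below is only an illustration: `fill_slot` is a hypothetical helper, and the rule target side `X1 have drastically fallen` is inferred from the hypotheses listed in the figure rather than copied from the rule itself. Hypotheses are filtered by label, so the `S` hypothesis is not used to fill an `X` slot, which matches the four outputs shown on the right-hand side.

```python
def fill_slot(target_tokens, slot, slot_label, hypotheses):
    """Replace `slot` in a rule's target side by every hypothesis with a matching label.

    hypotheses: list of (label, translation) pairs for the span covered by the slot.
    Returns one new translation string per matching hypothesis.
    """
    results = []
    for label, translation in hypotheses:
        if label != slot_label:
            continue  # e.g. an S hypothesis cannot fill an X slot
        out = []
        for tok in target_tokens:
            out.extend(translation.split() if tok == slot else [tok])
        results.append(" ".join(out))
    return results


span_0_3 = [
    ("X", "the imports and exports"),
    ("X", "imports and exports"),
    ("X", "exports and imports"),
    ("X", "the imports and the exports"),
    ("S", "the import and export"),
]
new_hypos = fill_slot(["X1", "have", "drastically", "fallen"], "X1", "X", span_0_3)
```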

--- a/Chapter7/Figures/figure-long-distance-dependence-in-zh2en-translation.tex
+++ b/Chapter7/Figures/figure-long-distance-dependence-in-zh2en-translation.tex
--- a/Chapter8/Figures/figure-one-best-node-alignment-and-alignment-matrix.tex
+++ b/Chapter8/Figures/figure-one-best-node-alignment-and-alignment-matrix.tex
@@ -6,7 +6,7 @@
 \begin{flushright}
 \begin{tikzpicture}
-\begin{scope}[scale=0.47]
+\begin{scope}[scale=0.60]
 {\Large
 \begin{scope}[sibling distance=17pt, level distance = 35pt]
@@ -30,7 +30,7 @@
 \end{scope}
 }
-\begin{scope}[xshift=2.3in, yshift=-0.3in]
+\begin{scope}[xshift=1.8in, yshift=-0.3in]
 \node[anchor=west, rotate=60] at (0.8,-0.6) {VP$^{[1]}$};
 \node[anchor=west, rotate=60] at (1.8,-0.6) {VBZ$^{[2]}$};
 \node[anchor=west, rotate=60] at (2.8,-0.6) {ADVP$^{[3]}$};
@@ -54,12 +54,12 @@
 \node[fill=blue!40, scale=1.1, inner sep=1pt, minimum size=12pt] at (4,-2) {{\color{white} 1}};
 \node[fill=blue!40, scale=1.1, inner sep=1pt, minimum size=12pt] at (5,-4) {{\color{white} 1}};
-\node[] at (4,-6.3) {{\color{blue!40} $\blacksquare$} = fixed alignment};
+\node[] at (4,-6.3) {{\color{blue!40} $\blacksquare$} = 确定的对齐};
-\node[] at (4,-7.2) {Matrix 1: 1-best alignment};
+\node[] at (4,-7.2) {Matrix 1: 1-best对齐};
 \end{scope}
-\begin{scope}[xshift=6.1in, yshift=-0.3in]
+\begin{scope}[xshift=4.8in, yshift=-0.3in]
 \node[anchor=west, rotate=60] at (0.8,-0.6) {VP$^{[1]}$};
 \node[anchor=west, rotate=60] at (1.8,-0.6) {VBZ$^{[2]}$};
 \node[anchor=west, rotate=60] at (2.8,-0.6) {ADVP$^{[3]}$};
@@ -92,8 +92,8 @@
 \node[fill=blue!40, scale=0.65, inner sep=1pt, minimum size=12pt] at (3,-4) {{\color{white} \small{.3}}};
 \node[fill=blue!40, scale=0.9, inner sep=1pt, minimum size=12pt] at (5,-4) {{\color{white} \small{.7}}};
-\node[] at (4,-6.3) {{\color{blue!40} $\blacksquare$} = possible alignment};
+\node[] at (4,-6.3) {{\color{blue!40} $\blacksquare$} = 概率化对齐};
-\node[] at (4,-7.2) {Matrix 2: posterior};
+\node[] at (4,-7.2) {Matrix 2: 对齐概率};
 \node[] at (9,-7.2) {};%占位符
 \end{scope}
@@ -112,8 +112,8 @@
 \begin{tabular}[t]{C{0.48\linewidth} C{0.48\linewidth} }
 \begin{tabular}{l L{150pt}}
-\multicolumn{2}{l}{\textbf{\footnotesize{Minimal Rules}}} \\
+\multicolumn{2}{l}{\textbf{\small{最小规则}}} \\
-\multicolumn{2}{l}{\textbf{\footnotesize{Extracted from Matrix 1 (1-best)}}} \\
+\multicolumn{2}{l}{\textbf{\small{Matrix 1 (基于1-best对齐)}}} \\
 \hline
 \footnotesize{$r_3$} & \footnotesize{AD(大幅度) $\rightarrow$ RB(drastically)} \\
 \footnotesize{$r_4$} & \footnotesize{VV(减少) $\rightarrow$ VBN(fallen)} \\
@@ -128,8 +128,8 @@
 &
 \begin{tabular}{l L{150pt}}
-\multicolumn{2}{l}{\textbf{\small{Minimal Rules}}} \\
+\multicolumn{2}{l}{\textbf{\small{最小规则}}} \\
-\multicolumn{2}{l}{\textbf{\small{Extracted from Matrix 2 (posterior)}}} \\
+\multicolumn{2}{l}{\textbf{\small{Matrix 2 (基于对齐概率)}}} \\
 \hline
 \footnotesize{$r_3$} & \footnotesize{AD(大幅度) $\rightarrow$ RB(drastically)} \\
 \footnotesize{$r_4$} & \footnotesize{VV(减少) $\rightarrow$ VBN(fallen)} \\

--- a/Chapter8/Figures/figure-result-of-tree-binarization.tex
+++ b/Chapter8/Figures/figure-result-of-tree-binarization.tex
@@ -9,13 +9,13 @@
 \Tree[.\node(n1){NP};
     	[.NNP \node(sw1){美国}; ]
     	[.NN \node(sw2){总统}; ]
-        [.NN \node(sw3){唐纳德}; ]
+        [.NN \node(sw3){乔治}; ]
-        [.NN \node(sw4){特朗普}; ]
+        [.NN \node(sw4){华盛顿}; ]
     ]
 \node [anchor=north] (tw1) at ([yshift=-2em]sw1.south) {U.S.};
 \node [anchor=north] (tw2) at ([yshift=-2em]sw2.south) {President};
-\node [anchor=north] (tw3) at ([yshift=-2em]sw3.south) {Trump};
+\node [anchor=north] (tw3) at ([yshift=-2em,xshift=2em]sw3.south) {Washington};
 \draw [-,dashed] (sw1.south) -- (tw1.north);
 \draw [-,dashed] (sw2.south) -- (tw2.north);
@@ -33,15 +33,15 @@
 	[.NP-BAR
     	    [.NN \node(sw2){总统}; ]
 	    [.NP-BAR
-                [.NN \node(sw3){唐纳德}; ]
+                [.NN \node(sw3){乔治}; ]
-                [.NN \node(sw4){特朗普}; ]
+                [.NN \node(sw4){华盛顿}; ]
             ]
         ]
     ]
 \node [anchor=north] (tw1) at ([yshift=-4.5em]sw1.south) {U.S.};
 \node [anchor=north] (tw2) at ([yshift=-2.75em]sw2.south) {President};
-\node [anchor=north] (tw3) at ([yshift=-1em]sw3.south) {Trump};
+\node [anchor=north] (tw3) at ([yshift=-1em,xshift=2em]sw3.south) {Washington};
 \draw [-,dashed] (sw1.south) -- (tw1.north);
 \draw [-,dashed] (sw2.south) -- (tw2.north);
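
Review note: the binarized tree in the second half of this figure corresponds to a right binarization that introduces `NP-BAR` nodes. A minimal sketch follows, assuming trees are encoded as `(label, [children])` tuples with plain strings as leaf words; `binarize_right` and this encoding are illustrative choices, not the book's implementation.

```python
def binarize_right(tree):
    """Right-binarize a tree given as (label, [children]); leaves are plain strings."""
    if isinstance(tree, str):
        return tree
    label, children = tree
    children = [binarize_right(c) for c in children]
    bar = label if label.endswith("-BAR") else label + "-BAR"
    # Repeatedly group the two rightmost children under a virtual X-BAR node.
    while len(children) > 2:
        children = children[:-2] + [(bar, children[-2:])]
    return (label, children)


flat = ("NP", [("NNP", ["美国"]), ("NN", ["总统"]), ("NN", ["乔治"]), ("NN", ["华盛顿"])])
binary = binarize_right(flat)
```

After binarization, the sub-tree covering 乔治 华盛顿 has its own `NP-BAR` node, so a rule translating just that sub-tree becomes extractable.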

--- a/Chapter8/Figures/figure-role-of-syntax-tree-in-different-decoding-methods.tex
+++ b/Chapter8/Figures/figure-role-of-syntax-tree-in-different-decoding-methods.tex
@@ -17,13 +17,13 @@
     ]
 \node [anchor=west] (target) at ([xshift=1em]bsw3.east) {Cats like eating fish};
-\node [anchor=north,inner sep=3pt] (cap1) at ([yshift=-1em]target.south west) {(a) 基于树的解码};
+\node [anchor=north,inner sep=3pt] (cap1) at ([xshift=-1.0em,yshift=-1em]target.south west) {(a) 基于树的解码};
 \draw [->,thick] (bsw3.east) -- (target.west);
 \node [anchor=west] (sourcelabel) at ([xshift=6em,yshift=-1em]bsn0.east) {显式输入的结构};
-\node [anchor=west] (source2) at ([xshift=3.3em]target.east) {猫$\ \ \;$喜欢$\ \;$吃\ 鱼};
+\node [anchor=west] (source2) at ([xshift=3.3em,yshift=0.0em]target.east) {猫$\ \ \;$喜欢$\ \;$吃\ 鱼};
 \node [anchor=west] (target2) at ([xshift=1em]source2.east) {Cats like eating fish};
-\node [anchor=north,inner sep=3pt] (cap2) at ([xshift=1.1em,yshift=-1em]target2.south west) {(b) 基于串的解码};
+\node [anchor=north,inner sep=3pt] (cap2) at ([xshift=-1.5em,yshift=-1em]target2.south west) {(b) 基于串的解码};
 \draw [->,thick] (source2.east) -- (target2.west);
 \begin{pgfonlayer}{background}
@@ -32,7 +32,7 @@
 }
 \end{pgfonlayer}
-\begin{scope}[xshift=3.18in,yshift=-0em,sibling distance=10pt]
+\begin{scope}[xshift=3.18in,yshift=-0.28em,sibling distance=10pt]
 \Tree[.\node(bsn0){IP};
          [.\node(bsn1){NP};
               [.\node(bsn2){NN}; ]
@@ -44,7 +44,7 @@
     ]
 \begin{pgfonlayer}{background}
-\node [draw,dashed,rectangle,inner sep=1em,thick,red,rounded corners=5pt] (box) [fit = (bsn0) (bsn1) (bsn2) (bsn3) (bsn4) (bsn5)] {};
+\node [draw,dashed,rectangle,inner sep=0.7em,thick,red,rounded corners=5pt] (box) [fit = (bsn0) (bsn1) (bsn2) (bsn3) (bsn4) (bsn5)] {};
 \node [anchor=north west] (boxlabel) at ([xshift=2em,yshift=-2em]box.north east) {隐含结构};
 \end{pgfonlayer}

--- a/Chapter8/Figures/figure-structure-of-chart.tex
+++ b/Chapter8/Figures/figure-structure-of-chart.tex
@@ -2,45 +2,30 @@
 %%%  基于树的解码方法 - chart-based decoding
 \begin{center}
 \begin{tikzpicture}
-\begin{scope}%[scale=0.2]
+\begin{scope}
-\node [anchor=north] (ch) at (0,0) {\large{\textbf{Chart}}};
+\node [anchor=south west,draw,fill=ugreen!20,minimum width=2.8em,minimum height=2.8em,inner sep=1pt] (cell11) at (0,0) {\scriptsize{cell[1,2]}};
+\node [anchor=south west,draw,fill=red!20,minimum width=2.8em,minimum height=2.8em,inner sep=1pt] (cell12) at (cell11.south east) {\scriptsize{cell[0,2]}};
+\node [anchor=south west,draw,fill=orange!30,minimum width=2.8em,minimum height=2.8em,inner sep=1pt] (cell21) at (cell11.north west) {\scriptsize{cell[0,1]}};
+\node [anchor=south west,draw,fill=gray!20,minimum width=2.8em,minimum height=2.8em,inner sep=1pt] (cell22) at (cell21.south east) {\scriptsize{N/A}};
+\draw [->,thick] ([xshift=-1em,yshift=1em]cell21.north west)--([xshift=-1em,yshift=-1em]cell11.south west);
+\draw [->,thick] ([xshift=-1em,yshift=1em]cell21.north west)--([xshift=1em,yshift=1em]cell22.north east);
+\node [anchor=north west,fill=orange!30,draw,drop shadow,align=left,minimum width=4em] (cell11label) at ([xshift=4em,yshift=1em]cell22.north east) {\footnotesize{VV[0,1]}};
+\node [anchor=north west,fill=ugreen!20,draw,drop shadow,align=left,minimum width=4em] (cell12label) at ([yshift=-1em]cell11label.south west) {\footnotesize{NN[1,2]}\\\footnotesize{NP[1,2]}};
+\node [anchor=north west,fill=red!20,draw,drop shadow,align=left,minimum width=4em] (cell21label) at ([yshift=-1em]cell12label.south west) {\footnotesize{VP[0,2]}\\\footnotesize{NP[0,2]}};
+\draw [->,very thick,dotted] ([yshift=0.3em]cell11label.west) .. controls +(west:2em)  and +(north:1.5em) .. ([xshift=1em,yshift=-0.5em]cell21.north);
+\draw [->,very thick,dotted] ([yshift=-0.5em]cell12label.west) -- ([yshift=-0.5em,xshift=-7.5em]cell12label.west);
+\draw [->,very thick,dotted] ([yshift=-0.3em]cell21label.west) .. controls +(west:2em)  and +(south:1.5em) .. ([xshift=1em,yshift=0.5em]cell12.south);
+\node [anchor=south] (label1) at ([yshift=1em]cell21.north east) {\footnotesize{跨度大小}};
+\node [anchor=north] (l21) at ([xshift=-2.0em,yshift=1em]cell21.west) {\footnotesize{起}};
+\node [anchor=north] (l22) at ([xshift=0em,yshift=0.5em]l21.south) {\footnotesize{始}};
+\node [anchor=north] (l23) at ([xshift=0em,yshift=0.5em]l22.south) {\footnotesize{位}};
+\node [anchor=north] (l24) at ([xshift=0em,yshift=0.5em]l23.south) {\footnotesize{置}};
+\node [anchor=north] (labelchart) at (cell11.south east) {\small{Chart（表格）}};
-\draw [->,ublue] ([xshift=-1em,yshift=-1em]ch.south) -- ([xshift=-1em,yshift=-9em]ch.south);
-\draw [->,ublue] ([xshift=-1em,yshift=-1em]ch.south) -- ([xshift=10em,yshift=-1em]ch.south);
-{\small
-\node [anchor=north] (l11) at ([xshift=-1.7em,yshift=-2.5em]ch.south) {{起}};
-\node [anchor=north] (l12) at ([xshift=0em,yshift=0.5em]l11.south) {{始}};
-\node [anchor=north] (l13) at ([xshift=0em,yshift=0.5em]l12.south) {{位}};
-\node [anchor=north] (l14) at ([xshift=0em,yshift=0.5em]l13.south) {{置}};
-\node [anchor=north] (l2) at ([xshift=4.5em,yshift=0.4em]ch.south) {{跨度大小}};
-}
-\draw [-,ublue] ([xshift=1em,yshift=-2em]ch.south) -- ([xshift=1em,yshift=-8em]ch.south);
-\draw [-,ublue] ([xshift=5em,yshift=-2em]ch.south) -- ([xshift=5em,yshift=-8em]ch.south);
-\draw [-,ublue] ([xshift=9em,yshift=-2em]ch.south) -- ([xshift=9em,yshift=-8em]ch.south);
-\draw [-,ublue] ([xshift=1em,yshift=-2em]ch.south) -- ([xshift=9em,yshift=-2em]ch.south);
-\draw [-,ublue] ([xshift=1em,yshift=-5em]ch.south) -- ([xshift=9em,yshift=-5em]ch.south);
-\draw [-,ublue] ([xshift=1em,yshift=-8em]ch.south) -- ([xshift=9em,yshift=-8em]ch.south);
-\node [anchor=north,rectangle,draw=red!40, inner sep=0mm,minimum height=4em,minimum width=9em,rounded corners=2pt,very thick] (n1) at ([xshift=18em,yshift=2em]ch.south) {};
-\node [anchor=north,rectangle,draw=red!40, inner sep=0mm,minimum height=4em,minimum width=9em,rounded corners=2pt,very thick] (n2) at ([xshift=0em,yshift=-0.5em]n1.south) {};
-\node [anchor=north,rectangle,draw=red!40, inner sep=0mm,minimum height=4em,minimum width=9em,rounded corners=2pt,very thick] (n3) at ([xshift=0em,yshift=-0.5em]n2.south) {};
-\node [anchor=north] (n11) at ([xshift=0em,yshift=-0.5em]n1.north) {Cell[0,1]:};
-\node [anchor=north] (n12) at ([xshift=1em,yshift=-1.2em]n11.north) {VV[0,1]};
-\node [anchor=north] (n21) at ([xshift=0em,yshift=-0.1em]n2.north) {Cell[1,2]:};
-\node [anchor=north] (n22) at ([xshift=1em,yshift=-1.2em]n21.north) {NN[1,2]};
-\node [anchor=north] (n23) at ([xshift=0em,yshift=-1.3em]n22.north) {NP[1,2]};
-\node [anchor=north] (n31) at ([xshift=0em,yshift=-0.1em]n3.north) {Cell[0,2]:};
-\node [anchor=north] (n32) at ([xshift=1em,yshift=-1.2em]n31.north) {VP[0,2]};
-\node [anchor=north] (n33) at ([xshift=0em,yshift=-1.3em]n32.north) {NP[0,2]};
-\draw [->,blue!40,very thick] ([xshift=0em,yshift=-0.5em]n1.west) .. controls +(west:6em) and +(north:3em) .. ([xshift=-15em,yshift=-2em]n1.south);
-\draw [->,blue!40,very thick] ([xshift=0em,yshift=1em]n2.west) .. controls +(west:2em) and +(north:2em) .. ([xshift=-14.5em,yshift=0em]n2.south);
-\draw [->,blue!40,very thick] ([xshift=0em,yshift=-0.5em]n3.west) .. controls +(west:5em) and +(south:0.5em) .. ([xshift=-12em,yshift=5em]n3.south);
 \end{scope}
 \end{tikzpicture}
 \end{center}
--- a/Chapter8/Figures/figure-syntax-tree-with-admissible-node.tex
+++ b/Chapter8/Figures/figure-syntax-tree-with-admissible-node.tex
@@ -36,8 +36,8 @@
 \draw[dashed] (cw4.south) .. controls +(south:2.0) and +(north:0.6) .. ([yshift=-0.4em]tw3.north);
 \draw[dashed] (cw5.south) .. controls +(south:2.0) and +(north:0.6) .. ([yshift=-0.4em]tw3.north);
-\node [anchor=south west,align=left,fill=red!20,drop shadow] (label1) at ([xshift=0.5em]n11.north east) {\footnotesize{span=\{3\}}\\\footnotesize{c-span=\{1,3-6\}}};
+\node [anchor=south west,align=left,fill=red!20,drop shadow] (label1) at ([xshift=0.5em,yshift=-1.3em]n11.north east) {\footnotesize{可达范围=\{3\}}\\\footnotesize{补充范围=\{1,3-6\}}};
-\node [anchor=south west,align=left,fill=blue!20,drop shadow] (label2) at ([xshift=0.5em,yshift=-0.5em]n4.north east) {\footnotesize{span=\{3-6\}}\\\footnotesize{c-span=\{1\}}};
+\node [anchor=south west,align=left,fill=blue!20,drop shadow] (label2) at ([xshift=0.5em,yshift=-0.5em]n4.north east) {\footnotesize{可达范围=\{3-6\}}\\\footnotesize{补充范围=\{1\}}};
 \begin{pgfonlayer}{background}
 \node [rectangle,fill=red!20,inner sep=0] [fit = (n11)] (n11box) {};
@@ -58,7 +58,7 @@
 {
 \node [anchor=north] (n11boxlabel) at (label1.south) {\footnotesize{{\red{不可信}}}};
-\node [anchor=north] (n4boxlabel) at (label2.south) {\footnotesize{{\red{可信}}}};
+\node [anchor=north] (n4boxlabel) at (label2.south) {\footnotesize{{{\color{ublue} 可信}}}};
 }
 {
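
Review note: the 可信/不可信 labels in this figure follow from comparing each node's 可达范围 (span) with its 补充范围 (complement span). The sketch below assumes the usual GHKM-style criterion: a node is admissible when the contiguous closure of its span does not overlap its complement span. `is_admissible` is a hypothetical helper, and treating a node with an empty span as not admissible is an assumption of this sketch.

```python
def is_admissible(span, c_span):
    """Return True if the closure of `span` is disjoint from the complement span.

    span, c_span: sets of target-side word positions.
    """
    if not span:
        return False  # assumption: unaligned nodes are treated as not admissible
    closure = set(range(min(span), max(span) + 1))
    return closure.isdisjoint(c_span)


# Values read off the two labels in the figure.
node_a = is_admissible({3}, {1, 3, 4, 5, 6})      # span={3}, c-span={1,3-6}
node_b = is_admissible({3, 4, 5, 6}, {1})         # span={3-6}, c-span={1}
```

The first node is rejected because position 3 also appears in its complement span; the second is accepted because nothing outside it reaches positions 3 to 6.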

--- a/Chapter8/Figures/figure-tree-binarization.tex
+++ b/Chapter8/Figures/figure-tree-binarization.tex
@@ -11,14 +11,14 @@
 \Tree[.\node(n1){NP};
     	[.NNP \node(sw1){美国}; ]
     	[.NN \node(sw2){总统}; ]
-        [.NN \node(sw3){唐纳德}; ]
+        [.NN \node(sw3){乔治}; ]
-        [.NN \node(sw4){特朗普}; ]
+        [.NN \node(sw4){华盛顿}; ]
     ]
 }
 \node [anchor=north] (tw1) at ([yshift=-2em]sw1.south) {U.S.};
 \node [anchor=north] (tw2) at ([yshift=-2em]sw2.south) {President};
-\node [anchor=north] (tw3) at ([yshift=-2em]sw3.south) {Trump};
+\node [anchor=north] (tw3) at ([yshift=-2em,xshift=1.5em]sw3.south) {Washington};
 \draw [-,dashed] (sw1.south) -- (tw1.north);
 \draw [-,dashed] (sw2.south) -- (tw2.north);
@@ -26,12 +26,12 @@
 \draw [-,dashed] (sw4.south) -- (tw3.north);
 \node [anchor=west] (rulelabel1) at ([xshift=1in,yshift=0.3em]n1.east) {{抽取到的规则：}};
-\node [anchor=north west] (rule1) at (rulelabel1.south west) {NP(NNP$_1$ NN$_2$ NN(唐纳德) NN(特朗普))};
+\node [anchor=north west] (rule1) at (rulelabel1.south west) {NP(NNP$_1$ NN$_2$ NN(乔治) NN(华盛顿))};
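-\node [anchor=north west] (rule1t) at ([yshift=0.2em]rule1.south west) {$\to$ NNP$_1$ NN$_2$ Trump};
+\node [anchor=north west] (rule1t) at ([yshift=0.2em]rule1.south west) {$\to$ NNP$_1$ NN$_2$ Washington};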
-\node [anchor=north west] (rule2) at (rule1t.south west) {NP(NNP$_1$ NN(总统) NN(唐纳德) NN(特朗普))};
+\node [anchor=north west] (rule2) at (rule1t.south west) {NP(NNP$_1$ NN(总统) NN(乔治) NN(华盛顿))};
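-\node [anchor=north west] (rule2t) at ([yshift=0.2em]rule2.south west) {$\to$ NNP$_1$ President Trump};
+\node [anchor=north west] (rule2t) at ([yshift=0.2em]rule2.south west) {$\to$ NNP$_1$ President Washington};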
 \node [anchor=north west] (rulelabel2) at ([yshift=-0.3em]rule2t.south west) {{{\red{不能}}抽取到的规则：}};
-\node [anchor=north west] (rule3) at (rulelabel2.south west) {NP(NN(唐纳德) NN(特朗普)) $\to$ Trump};
+\node [anchor=north west] (rule3) at (rulelabel2.south west) {NP(NN(乔治) NN(华盛顿)) $\to$ Washington};
 \end{scope}
 }

--- a/Chapter8/chapter8.tex
+++ b/Chapter8/chapter8.tex
--- a/bibliography.bib
+++ b/bibliography.bib
--- a/mt-book-xelatex.tex
+++ b/mt-book-xelatex.tex
@@ -134,9 +134,9 @@
 %\include{Chapter1/chapter1}
 %\include{Chapter2/chapter2}
 %\include{Chapter3/chapter3}
-\include{Chapter4/chapter4}
+%\include{Chapter4/chapter4}
 %\include{Chapter5/chapter5}
-%\include{Chapter6/chapter6}
+\include{Chapter6/chapter6}
 %\include{Chapter7/chapter7}
 %\include{Chapter8/chapter8}
 %\include{Chapter9/chapter9}

--- a/structure.tex
+++ b/structure.tex
@@ -687,3 +687,4 @@ addtohook={%
-\newcommand\funp{}%函数P等使用，空是斜体，textrm是加粗
+\newcommand\funp{}%函数P等使用，空是斜体，textrm是正体
 \newcommand\vectorn{\mathbf}%向量N等使用
+\newcommand\seq{\mathbf}%序列N等使用