Merge branch 'caorunzhe' into 'zhoutao'

Caorunzhe. See merge request !181

Commit c9d57ffd authored Sep 10, 2020 by zhoutao
--- a/Chapter1/Figures/figure-eniac.jpg
+++ b/Chapter1/Figures/figure-eniac.jpg
--- a/Chapter1/chapter1.tex
+++ b/Chapter1/chapter1.tex
@@ -113,16 +113,7 @@
 \parinterval 早在17世纪，如Descartes、Leibniz、Cave\ Beck、Athanasius\ Kircher和Johann\ Joachim\ Becher等很多学者就提出采用机器词典（电子词典）来克服语言障碍的想法\upcite{knowlson1975universal}，这种想法在当时是很超前的。随着语言学、计算机科学等学科的发展，在19世纪30年代使用计算模型进行自动翻译的思想开始萌芽，如当时法国科学家Georges Artsrouni就提出用机器来进行翻译的想法。只是那时依然没有合适的实现手段，所以这种想法的合理性无法被证实。
-\parinterval 随着第二次世界大战爆发， 对文字进行加密和解密成为重要的军事需求，这也使得数学和密码学变得相当发达。在战争结束一年后，世界上第一台通用电子数字计算机于1946年研制成功（图\ref{fig:1-4}），至此使用机器进行翻译有了真正实现的可能。
+\parinterval 随着第二次世界大战爆发， 对文字进行加密和解密成为重要的军事需求，这也使得数学和密码学变得相当发达。在战争结束一年后，世界上第一台通用电子数字计算机于1946年研制成功，至此使用机器进行翻译有了真正实现的可能。
-%----------------------------------------------
-\begin{figure}[htp]
-    \centering
-\includegraphics[scale=0.4]{./Chapter1/Figures/figure-eniac.jpg}
-    \caption{世界上第一台通用电子数字计算机“埃尼阿克”（ENIAC）}
-    \label{fig:1-4}
-\end{figure}
-%-------------------------------------------
 \parinterval 基于战时密码学领域与通讯领域的研究，Claude Elwood Shannon在1948年提出使用“噪声信道”描述语言的传输过程，并借用热力学中的“{\small\bfnew{熵}}\index{熵}”（Entropy）\index{Entropy}来刻画消息中的信息量\upcite{DBLP:journals/bstj/Shannon48}。次年，Shannon与Warren Weaver更是合著了著名的\emph{The Mathematical Theory of Communication}\upcite{shannon1949the}，这些工作都为后期的统计机器翻译打下了理论基础。
@@ -136,7 +127,7 @@
 \parinterval 随着电子计算机的发展，研究者开始尝试使用计算机来进行自动翻译。1954年，美国乔治敦大学在IBM公司支持下，启动了第一次真正的机器翻译实验。翻译的目标是将几个简单的俄语句子翻译成为英语，翻译系统包含6条翻译规则和250词汇。这次翻译实验中测试了50个化学文本句子，取得了初步成功。在某种意义上来说，这个实验显示了采用基于词典和翻译规则的方法可以实现机器翻译过程。虽然只是取得了初步成功，但却引起了苏联、英国和日本研究机构的机器翻译研究热，大大推动了早期机器翻译的研究进展。
-\parinterval 1957年，Noam Chomsky在\emph{Syntactic Structures}中描述了转换生成语法\upcite{Chomsky1957Syntactic}，并使用数学方法来研究自然语言，建立了包括上下文有关语法、上下文无关语法等4种类型的语法。这些工作最终为今天计算机中广泛使用的“形式语言”奠定了基础。而他的思想也深深地影响了同时期的语言学和自然语言处理领域的学者。特别是是，早期基于规则的机器翻译中也大量使用了这些思想。
+\parinterval 1957年，Noam Chomsky在\emph{Syntactic Structures}中描述了转换生成语法\upcite{chomsky1957syntactic}，并使用数学方法来研究自然语言，建立了包括上下文有关语法、上下文无关语法等4种类型的语法。这些工作最终为今天计算机中广泛使用的“形式语言”奠定了基础。而他的思想也深深地影响了同时期的语言学和自然语言处理领域的学者。特别的是，早期基于规则的机器翻译中也大量使用了这些思想。
 \parinterval 虽然在这段时间，使用机器进行翻译的议题越加火热，但是事情并不总是一帆风顺，怀疑论者对机器翻译一直存有质疑，并很容易找出一些机器翻译无法解决的问题。自然地，人们也期望能够客观地评估一下机器翻译的可行性。当时美国基金资助组织委任自动语言处理咨询会承担了这项任务。经过近两年的调查与分析，该委员会于1966年11月公布了一个题为\emph{LANGUAGE AND MACHINES}的报告（图\ref{fig:1-5}），即ALPAC报告。该报告全面否定了机器翻译的可行性，为机器翻译的研究泼了一盆冷水。
@@ -151,7 +142,7 @@
 \parinterval 随后美国政府终止了对机器翻译研究的支持，这导致整个产业界和学术界都开始回避机器翻译。没有了政府的支持，企业也无法进行大规模投入，机器翻译的研究就此受挫。
-\parinterval 从历史上看，包括机器翻译在内很多人工智能领域在那个年代并不受“待见”，其主要原因在于当时的技术水平还比较低，而大家又对机器翻译等技术的期望过高。最后发现，当时的机器翻译水平无法满足实际需要，因此转而排斥它。但是，也正是这一盆冷水，让研究人员可以更加冷静地思考机器翻译的发展方向，为后来的爆发蓄力。
+\parinterval 从历史上看，包括机器翻译在内，很多人工智能领域在那个年代并不受“待见”，其主要原因在于当时的技术水平还比较低，而大家又对机器翻译等技术的期望过高。最后发现，当时的机器翻译水平无法满足实际需要，因此转而排斥它。但是，也正是这一盆冷水，让研究人员可以更加冷静地思考机器翻译的发展方向，为后来的爆发蓄力。
 %----------------------------------------------------------------------------------------
 %    NEW SUB-SECTION
@@ -183,7 +174,7 @@
 \vspace{0.5em}
 \item 第二，神经网络的连续空间模型有更强的表示能力。机器翻译中的一个基本问题是：如何表示一个句子？统计机器翻译把句子的生成过程看作是短语或者规则的推导，这本质上是一个离散空间上的符号系统。深度学习把传统的基于离散化的表示变成了连续空间的表示。比如，用实数空间的分布式表示代替了离散化的词语表示，而整个句子可以被描述为一个实数向量。这使得翻译问题可以在连续空间上描述，进而大大缓解了传统离散空间模型维度灾难等问题。更重要的是，连续空间模型可以用梯度下降等方法进行优化，具有很好的数学性质并且易于实现。
 \vspace{0.5em}
-\item 第三，深度网络学习算法的发展和GPU\index{GPU}（Graphics Processing Unit）\index{Graphics Processing Unit}等并行计算设备为训练神经网络提供了可能。早期的基于神经网络的方法一直没有在机器翻译甚至自然语言处理领域得到大规模应用，其中一个重要的原因是这类方法需要大量的浮点运算，而且以前计算机的计算能力无法达到这个要求。随着GPU等并行计算设备的进步，训练大规模神经网络也变为了可能。现在已经可以在几亿、几十亿，甚至上百亿句对上训练机器翻译系统，系统研发的周期越来越短，进展日新月异。
+\item 第三，深度网络学习算法的发展和GPU\index{GPU}（Graphics Processing Unit）\index{Graphics Processing Unit}等并行计算设备为训练神经网络提供了可能。早期的基于神经网络的方法一直没有在机器翻译甚至自然语言处理领域得到大规模应用，其中一个重要的原因是这类方法需要大量的浮点运算，但是以前计算机的计算能力无法达到这个要求。随着GPU等并行计算设备的进步，训练大规模神经网络也变为了可能。现在已经可以在几亿、几十亿，甚至上百亿句对上训练机器翻译系统，系统研发的周期越来越短，进展日新月异。
 \vspace{0.5em}
 \end{itemize}
@@ -209,7 +200,7 @@
 \sectionnewpage
 \section{机器翻译现状及挑战}
 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
-\parinterval 机器翻译技术发展到今天已经过无数次迭代，技术范式也经过若干次更替，近些年机器翻译的应用也如雨后春笋。今天的机器翻译的质量究竟如何呢？乐观地说，在很多特定的条件下，机器翻译的译文结果是非常不错的，甚至可以接近人工翻译的结果。然而，在开放式翻译任务中，机器翻译的结果还并不完美。更严格来说，机器翻译的质量远没有达到人们所期望的程度。对于有些人提到的“机器翻译代替人工翻译”也并不是事实。比如，在高精度同声传译任务中，机器翻译仍需要更多打磨；再比如，针对于小说的翻译，机器翻译还无法做到与人工翻译媲美；甚至有人尝试用机器翻译系统翻译中国古代诗词，这里更多的是娱乐的味道。但是毫无疑问的是，机器翻译可以帮助人类，甚至有朝一日可以代替一些低端的人工翻译工作。
+\parinterval 机器翻译技术发展到今天已经过无数次迭代，技术范式也经过若干次更替，近些年机器翻译的应用也如雨后春笋相继浮现。今天的机器翻译的质量究竟如何呢？乐观地说，在很多特定的条件下，机器翻译的译文结果是非常不错的，甚至可以接近人工翻译的结果。然而，在开放式翻译任务中，机器翻译的结果还并不完美。更严格来说，机器翻译的质量远没有达到人们所期望的程度。对于有些人提到的“机器翻译代替人工翻译”也并不是事实。比如，在高精度同声传译任务中，机器翻译仍需要更多打磨；再比如，针对于小说的翻译，机器翻译还无法做到与人工翻译媲美；甚至有人尝试用机器翻译系统翻译中国古代诗词，这里更多的是娱乐的味道。但是毫无疑问的是，机器翻译可以帮助人类，甚至有朝一日可以代替一些低端的人工翻译工作。
 \parinterval 图\ref{fig:1-7}展示了机器翻译和人工翻译质量的一个对比结果。在汉语到英语的新闻翻译任务中，如果对译文进行人工评价（五分制），那么机器翻译的译文得分为3.9分，人工译文得分为4.7分（人的翻译也不是完美的）。可见，在这个任务中机器翻译表现不错，但是与人还有一定差距。如果换一种方式评价，把人的译文作为参考答案，用机器翻译的译文与其进行比对（百分制），会发现机器翻译的得分只有47分。当然，这个结果并不是说机器翻译的译文质量很差，它更多的是表明机器翻译系统可以生成一些与人工翻译不同的译文，机器翻译也具有一定的创造性。这也类似于，很多围棋选手都想向AlphaGo学习，因为智能围棋系统也可以走出一些人类从未走过的妙招。
@@ -265,7 +256,7 @@
 \subsection{规则的定义与层次}
-\parinterval 规则就像语言中的“IF-THEN”语句，如果满足条件，则执行相应的语义动作。比如，可以将待翻译句子中的某个词，使用目标语言单词进行替换，但是这种替换并非随意的，而是在语言学知识的指导下进行的。
+\parinterval 规则就像语言中的“If-then”语句，如果满足条件，则执行相应的语义动作。比如，可以将待翻译句子中的某个词，使用目标语言单词进行替换，但是这种替换并非随意的，而是在语言学知识的指导下进行的。
 \parinterval 图\ref{fig:1-9}展示了一个使用转换法进行翻译的实例。这里，利用一个简单的汉译英规则库完成对句子“我对你感到满意”的翻译。当翻译“我”时，从规则库中找到规则1，该规则表示遇到单词“我”就翻译为“I”；类似地，也可以从规则库中找到规则4，该规则表示翻译调序，即将单词“you”放到“be satisfied with”后面。这种通过规则表示单词之间的对应关系也为统计机器翻译方法提供了思路。如统计机器翻译中，基于短语的翻译模型使用短语对对原文进行替换，详细描述可以参考{\chapterseven}。
@@ -296,7 +287,7 @@
 \subsection{转换法}
-\parinterval 通常一个典型的{\small\bfnew{基于转换规则的机器翻译}}\index{基于转换规则的机器翻译}（Transfer-based Translation）\index{Transfer-based Translation}的过程可以被视为“独立分析-独立生成-相关转换”的过程\upcite{parsing2009speech}。如图\ref{fig:1-11}所示，这些过程可以分成六个步骤，其中每一个步骤都是通过相应的翻译规则来完成。比如，第一个步骤中需要构建源语词法分析规则，第二个步骤中需要构建源语句法分析规则，第三个和第四个步骤中需要构建转换规则，其中包括源语言-目标语言词汇和结构转换规则等等。
+\parinterval 通常一个典型的{\small\bfnew{基于转换规则的机器翻译}}\index{基于转换规则的机器翻译}（Transfer-based Translation）\index{Transfer-based Translation}的过程可以被视为“独立分析-相关转换-独立生成”的过程\upcite{parsing2009speech}。如图\ref{fig:1-11}所示，这些过程可以分成六个步骤，其中每一个步骤都是通过相应的翻译规则来完成。比如，第一个步骤中需要构建源语词法分析规则，第二个步骤中需要构建源语句法分析规则，第三个和第四个步骤中需要构建转换规则，其中包括源语言-目标语言词汇和结构转换规则等等。
 %----------------------------------------------
 \begin{figure}[htp]
@@ -353,15 +344,15 @@
 \parinterval 在基于规则的机器翻译时代，机器翻译技术研究有一个特点就是{\small\bfnew{语法}}\index{语法}（Grammer）\index{Grammer}和{\small\bfnew{算法}}\index{算法}（Algorithm）\index{Algorithm}分开，相当于是把语言分析和程序设计分开。传统方式使用程序代码来实现翻译规则，并把所谓的翻译规则隐含在程序代码实现中。其中最大问题是一旦翻译规则发生修改，程序代码也需要进行相应修改，导致维护代价非常高。此外书写翻译规则的语言学家与编代码的程序员沟通代价也非常高，有时候会出现鸡同鸭讲的感觉。把语法和算法分开对于基于规则的机器翻译技术来说最大好处就是可以将语言学家和程序员的工作分开，各自发挥自己的优势。
-\parinterval 这种语言分析和程序设计分开的实现方式也使得基于人工书写翻译规则的机器翻译方法非常直观，语言学家可以很容易地将翻译知识利用规则的方法表达出来，并且不需要修改系统代码。例如：1991年，东北大学自然语言处理实验室王宝库教授提出的规则描述语言（CTRDL）\upcite{王宝库1991机器翻译系统中一种规则描述语言}。以及1995年，同为东北大学自然语言处理实验室的姚天顺教授提出的词汇语义驱动算法\upcite{唐泓英1995基于搭配词典的词汇语义驱动算法}，都是在这种思想上对机器翻译方法的一种改进。此外，使用规则本身就具有一定的优势。
+\parinterval 这种语言分析和程序设计分开的实现方式也使得基于人工书写翻译规则的机器翻译方法非常直观，语言学家可以很容易地将翻译知识利用规则的方法表达出来，并且不需要修改系统代码。例如：1991年，东北大学自然语言处理实验室王宝库教授提出的规则描述语言（CTRDL）\upcite{王宝库1991机器翻译系统中一种规则描述语言}。以及1995年，同为东北大学自然语言处理实验室的姚天顺教授提出的词汇语义驱动算法\upcite{唐泓英1995基于搭配词典的词汇语义驱动算法}，都是在这种思想上对机器翻译方法的一种改进。此外，使用规则本身就具有一定的优势：
 \begin{itemize}
 \vspace{0.5em}
-\item 首先，翻译规则的书写颗粒度具有很大的可伸缩性；
+\item 翻译规则的书写颗粒度具有很大的可伸缩性。
 \vspace{0.5em}
-\item 其次，较大颗粒度的翻译规则有很强的概括能力，较小颗粒度的翻译规则具有精细的描述能力；
+\item 较大颗粒度的翻译规则有很强的概括能力，较小颗粒度的翻译规则具有精细的描述能力。
 \vspace{0.5em}
-\item 最后，翻译规则便于处理复杂的句法结构和进行深层次的语义理解，比如解决翻译过程中的长距离依赖问题。
+\item 翻译规则便于处理复杂的句法结构和进行深层次的语义理解，比如解决翻译过程中的长距离依赖问题。
 \vspace{0.5em}
 \end{itemize}
@@ -395,15 +386,15 @@
 \end{figure}
 %-------------------------------------------
-\parinterval 当然，基于实例的机器翻译也并不完美。
+\parinterval 当然，基于实例的机器翻译也并不完美：
 \begin{itemize}
 \vspace{0.5em}
-\item 首先，这种方法对翻译实例的精确度要求非常高，一个实例的错误可能会导致一个句型都无法翻译正确；
+\item 这种方法对翻译实例的精确度要求非常高，一个实例的错误可能会导致一个句型都无法翻译正确。
 \vspace{0.5em}
-\item 其次，实例维护较为困难，实例库的构建通常需要单词级对齐的标注，而保证词对齐的质量是非常困难的工作，这也大大增加了实例库维护的难度；
+\item 实例维护较为困难，实例库的构建通常需要单词级对齐的标注，而保证词对齐的质量是非常困难的工作，这也大大增加了实例库维护的难度。
 \vspace{0.5em}
-\item 再次，尽管可以通过实例或者模板进行翻译，但是其覆盖度仍然有限。在实际应用中，很多句子无法找到可以匹配的实例或者模板。
+\item 尽管可以通过实例或者模板进行翻译，但是其覆盖度仍然有限。在实际应用中，很多句子无法找到可以匹配的实例或者模板。
 \vspace{0.5em}
 \end{itemize}
@@ -447,14 +438,14 @@
 \end{figure}
 %-------------------------------------------
-\parinterval 与统计机器翻译相比，神经机器翻译的优势体现在其不需要特征工程，所有信息由神经网络自动从原始输入中提取。而且，相比于统计机器翻译中所使用的离散化的表示。神经机器翻译中词和句子的分布式连续空间表示可以为建模提供更为丰富的信息，同时可以使用相对成熟的基于梯度的方法优化模型。此外，神经网络的存储需求较小，天然适合小设备上的应用。当然，神经机器翻译也存在问题。
+\parinterval 与统计机器翻译相比，神经机器翻译的优势体现在其不需要特征工程，所有信息由神经网络自动从原始输入中提取。而且，相比于统计机器翻译中所使用的离散化的表示。神经机器翻译中词和句子的分布式连续空间表示可以为建模提供更为丰富的信息，同时可以使用相对成熟的基于梯度的方法优化模型。此外，神经网络的存储需求较小，天然适合小设备上的应用。当然，神经机器翻译也存在问题：
 \begin{itemize}
 \vspace{0.5em}
-\item 首先，虽然脱离了特征工程，但神经网络的结构需要人工设计，即使设计好结构，系统的调优、超参数的设置等仍然依赖大量的实验。
+\item 虽然脱离了特征工程，但神经网络的结构需要人工设计，即使设计好结构，系统的调优、超参数的设置等仍然依赖大量的实验。
 \vspace{0.5em}
-\item 其次，神经机器翻译现在缺乏可解释性，其过程和人的认知差异很大，通过人的先验知识干预的程度差。
+\item 神经机器翻译现在缺乏可解释性，其过程和人的认知差异很大，通过人的先验知识干预的程度差。
 \vspace{0.5em}
-\item 再次，神经机器翻译对数据的依赖很大，数据规模、质量对性能都有很大影响，特别是在数据稀缺的情况下，充分训练神经网络很有挑战性。
+\item 神经机器翻译对数据的依赖很大，数据规模、质量对性能都有很大影响，特别是在数据稀缺的情况下，充分训练神经网络很有挑战性。
 \vspace{0.5em}
 \end{itemize}
@@ -515,7 +506,7 @@
 \parinterval 首先，推荐一本书$Statistical\ Machine\ Translation$\upcite{koehn2009statistical}，其作者是机器翻译领域著名学者Philipp Koehn教授。该书是机器翻译领域内的经典之作，介绍了统计机器翻译技术的进展。该书从语言学和概率学两个方面介绍了统计机器翻译的构成要素，然后介绍了统计机器翻译的主要模型：基于词、基于短语和基于树的模型，以及机器翻译评价、语言建模、判别式训练等方法。此外，作者在该书的最新版本中增加了神经机器翻译的章节，方便研究人员全面了解机器翻译的最新发展趋势\upcite{DBLP:journals/corr/abs-1709-07809}。
-\parinterval $Foundations\ of\ Statistical\ Natural\ Language\ Processing$\upcite{manning1999foundations}中文译名《统计自然语言处理基础》，作者是自然语言处理领域的权威Chris Manning教授和Hinrich Sch$\ddot{\textrm{u}}$tze教授。该书对统计自然语言处理方法进行了全面介绍。书中讲解了统计自然语言处理所需的语言学和概率论基础知识，介绍了机器翻译评价、语言建模、判别式训练以及整合语言学信息等基础方法。其中也包含了构建自然语言处理工具所需的基本理论和算法，提供了对数学和语言学基础内容广泛而严格的覆盖，以及统计方法的详细讨论。
+\parinterval $Foundations\ of\ Statistical\ Natural\ Language\ Processing$\upcite{manning1999foundations}中文译名《统计自然语言处理基础》，作者是自然语言处理领域的权威Chris Manning教授和Hinrich Sch$\ddot{\textrm{u}}$tze教授。该书对统计自然语言处理方法进行了全面介绍。书中讲解了统计自然语言处理所需的语言学和概率论基础知识，介绍了机器翻译评价、语言建模、判别式训练以及整合语言学信息等基础方法。其中也包含了构建自然语言处理工具所需的基本理论和算法，并且涵盖了数学和语言学基础内容以及相关的统计方法。
 \parinterval 《统计自然语言处理（第2版）》\upcite{宗成庆2013统计自然语言处理}由中国科学院自动化所宗成庆教授所著。该书中系统介绍了统计自然语言处理的基本概念、理论方法和最新研究进展，既有对基础知识和理论模型的介绍，也有对相关问题的研究背景、实现方法和技术现状的详细阐述。可供从事自然语言处理、机器翻译等研究的相关人员参考。

--- a/Chapter10/Figures/figure-3-base-problom-of-p.tex
+++ b/Chapter10/Figures/figure-3-base-problom-of-p.tex
@@ -15,9 +15,9 @@
 					\node[rnnnode,minimum height=0.5\base,fill=green!30!white,anchor=west] (eemb\x) at ([xshift=0.4\base]eemb\y.east) {\tiny{$e_x()$}};
 				\foreach \x in {1,2,...,3}
 					\node[rnnnode,fill=blue!30!white,anchor=south] (enc\x) at ([yshift=0.3\base]eemb\x.north) {};
-			        \node[] (enclabel1) at (enc1) {\tiny{$\textbf{h}_{m-2}$}};
+			        \node[] (enclabel1) at (enc1) {\tiny{$\vectorn{h}_{m-2}$}};
-			        \node[] (enclabel2) at (enc2) {\tiny{$\textbf{h}_{m-1}$}};
+			        \node[] (enclabel2) at (enc2) {\tiny{$\vectorn{h}_{m-1}$}};
-			        \node[rnnnode,fill=purple!30!white] (enclabel3) at (enc3) {\tiny{$\textbf{h}_{m}$}};
+			        \node[rnnnode,fill=purple!30!white] (enclabel3) at (enc3) {\tiny{$\vectorn{h}_{m}$}};
 				\node[wordnode,left=0.4\base of enc1] (init1) {$\cdots$};
 				\node[wordnode,left=0.4\base of eemb1] (init2) {$\cdots$};
@@ -29,7 +29,7 @@
 				\foreach \x in {1,2,...,3}
 					\node[rnnnode,minimum height=0.5\base,fill=green!30!white,anchor=south] (demb\x) at ([yshift=\base]enc\x.north) {\tiny{$e_y()$}};
 				\foreach \x in {1,2,...,3}
-					\node[rnnnode,fill=blue!30!white,anchor=south] (dec\x) at ([yshift=0.3\base]demb\x.north) {{\tiny{$\textbf{s}_\x$}}};
+					\node[rnnnode,fill=blue!30!white,anchor=south] (dec\x) at ([yshift=0.3\base]demb\x.north) {{\tiny{$\vectorn{s}_\x$}}};
 				\foreach \x in {1,2,...,3}
 					\node[rnnnode,minimum height=0.5\base,fill=red!30!white,anchor=south] (softmax\x) at ([yshift=0.3\base]dec\x.north) {\tiny{Softmax}};
 				\node[wordnode,right=0.4\base of demb3] (end1) {$\cdots$};
@@ -73,7 +73,7 @@
 				\draw[-latex'] (enc3.north) .. controls +(north:0.3\base) and +(east:\base) .. (bridge) .. controls +(west:2.7\base) and +(west:0.3\base) .. (dec1.west);
 				{
-				\node [anchor=east] (line1) at ([xshift=-3em,yshift=0.5em]softmax1.west) {\scriptsize{基于RNN的隐层状态$\textbf{s}_i$}};
+				\node [anchor=east] (line1) at ([xshift=-3em,yshift=0.5em]softmax1.west) {\scriptsize{基于RNN的隐层状态$\vectorn{s}_i$}};
 				\node [anchor=north west] (line2) at ([yshift=0.3em]line1.south west) {\scriptsize{预测目标词的概率}};
 				\node [anchor=north west] (line3) at ([yshift=0.3em]line2.south west) {\scriptsize{通常，用Softmax函数}};
 				\node [anchor=north west] (line4) at ([yshift=0.3em]line3.south west) {\scriptsize{实现 $\textrm{P}(y_i|...)$}};
@@ -90,7 +90,7 @@
 				\node [anchor=west] (line21) at ([xshift=1.3em,yshift=1.5em]enc3.east)  {\scriptsize{源语编码器最后一个}};
 				\node [anchor=north west] (line22) at ([yshift=0.3em]line21.south west) {\scriptsize{循环单元的输出被}};
 				\node [anchor=north west] (line23) at ([yshift=0.3em]line22.south west) {\scriptsize{看作是句子的表示,}};
-				\node [anchor=north west] (line24) at ([yshift=0.3em]line23.south west) {\scriptsize{记为$\textbf{C}$}};
+				\node [anchor=north west] (line24) at ([yshift=0.3em]line23.south west) {\scriptsize{记为$\vectorn{C}$}};
 				}
 				\begin{pgfonlayer}{background}

--- a/Chapter10/Figures/figure-a-simple-example-for-tl.tex
+++ b/Chapter10/Figures/figure-a-simple-example-for-tl.tex
@@ -9,14 +9,14 @@
 \node [pos=0.4,left,xshift=-36em,yshift=7em,font=\small] (original0) {\quad 源语（中文）输入：};
 \node [pos=0.4,left,xshift=-22em,yshift=7em,font=\small] (original1) {
 \begin{tabular}[t]{l}
-\parbox{14em}{``我''、``很''、``好''、``<eos>'' }
+\parbox{14em}{“我”、“很”、“好”、“<eos>” }
 \end{tabular}
 };
 %译文1--------------mt1
 \node[font=\small] (mt1) at ([xshift=0em,yshift=-1em]original0.south) {目标语（英文）输出：};
 \node[font=\small] (ts1) at ([xshift=0em,yshift=-1em]original1.south)  {
 \begin{tabular}[t]{l}
-\parbox{14em}{``I''、``am''、``fine''、``<eos>''}
+\parbox{14em}{“I”、“am”、“fine”、“<eos>”}
 \end{tabular}
 };

--- a/Chapter10/Figures/figure-calculation-process-of-context-vector-c.tex
+++ b/Chapter10/Figures/figure-calculation-process-of-context-vector-c.tex
@@ -8,26 +8,26 @@
 \begin{scope}
-\node [anchor=west,draw,fill=red!20!white,inner sep=3pt,minimum width=2em,minimum height=1.2em] (h1) at (0,0) {\scriptsize{$\textbf{h}_1$}};
+\node [anchor=west,draw,fill=red!20!white,inner sep=3pt,minimum width=2em,minimum height=1.2em] (h1) at (0,0) {\scriptsize{$\vectorn{h}_1$}};
-\node [anchor=west,draw,fill=red!20!white,inner sep=3pt,minimum width=2em,minimum height=1.2em] (h2) at ([xshift=1em]h1.east) {\scriptsize{$\textbf{h}_2$}};
+\node [anchor=west,draw,fill=red!20!white,inner sep=3pt,minimum width=2em,minimum height=1.2em] (h2) at ([xshift=1em]h1.east) {\scriptsize{$\vectorn{h}_2$}};
 \node [anchor=west,inner sep=0pt,minimum width=3em] (h3) at ([xshift=0.5em]h2.east) {\scriptsize{...}};
-\node [anchor=west,draw,fill=red!20!white,inner sep=3pt,minimum width=2em,minimum height=1.2em] (h4) at ([xshift=0.5em]h3.east) {\scriptsize{$\textbf{h}_m$}};
+\node [anchor=west,draw,fill=red!20!white,inner sep=3pt,minimum width=2em,minimum height=1.2em] (h4) at ([xshift=0.5em]h3.east) {\scriptsize{$\vectorn{h}_m$}};
 \node [anchor=south,circle,minimum size=1.0em,draw,ublue,thick] (sum) at ([yshift=2em]h2.north east) {};
 \draw [thick,-,ublue] (sum.north) -- (sum.south);
 \draw [thick,-,ublue] (sum.west) -- (sum.east);
-\node [anchor=south,draw,fill=green!20!white,inner sep=3pt,minimum width=2em,minimum height=1.2em] (th1) at ([yshift=2em,xshift=-1em]sum.north west) {\scriptsize{$\textbf{s}_{j-1}$}};
+\node [anchor=south,draw,fill=green!20!white,inner sep=3pt,minimum width=2em,minimum height=1.2em] (th1) at ([yshift=2em,xshift=-1em]sum.north west) {\scriptsize{$\vectorn{s}_{j-1}$}};
-\node [anchor=west,draw,fill=green!20!white,inner sep=3pt,minimum width=2em,minimum height=1.2em] (th2) at ([xshift=2em]th1.east) {\scriptsize{$\textbf{s}_{j}$}};
+\node [anchor=west,draw,fill=green!20!white,inner sep=3pt,minimum width=2em,minimum height=1.2em] (th2) at ([xshift=2em]th1.east) {\scriptsize{$\vectorn{s}_{j}$}};
-\draw [->] (h1.north) .. controls +(north:0.8) and +(west:1) ..  (sum.190) node [pos=0.3,left] {\scriptsize{$\alpha_{1,j}$}};
+\draw [->] (h1.north) .. controls +(north:0.8) and +(west:1) ..  (sum.190) node [pos=0.2,left] {\scriptsize{$\alpha_{1,j}$}};
 \draw [->] (h2.north) .. controls +(north:0.6) and +(220:0.2) ..  (sum.220) node [pos=0.2,right] {\scriptsize{$\alpha_{2,j}$}};
 \draw [->] (h4.north) .. controls +(north:0.8) and +(east:1) ..  (sum.-10) node [pos=0.1,left] (alphan) {\scriptsize{$\alpha_{m,j}$}};
 \draw [->] ([xshift=-1.5em]th1.west) -- ([xshift=-0.1em]th1.west);
 \draw [->] ([xshift=0.1em]th1.east) -- ([xshift=-0.1em]th2.west);
 \draw [->] ([xshift=0.1em]th2.east) -- ([xshift=1.5em]th2.east);
-\draw [->] (sum.north) .. controls +(north:0.8) and +(west:0.2) ..  ([yshift=-0.4em,xshift=-0.1em]th2.west) node [pos=0.2,right] (ci) {\scriptsize{$\textbf{C}_{j}$}};
+\draw [->] (sum.north) .. controls +(north:0.8) and +(west:0.2) ..  ([yshift=-0.4em,xshift=-0.1em]th2.west) node [pos=0.2,right] (ci) {\scriptsize{$\vectorn{C}_{j}$}};
 \node [anchor=south,inner sep=1pt] (output) at ([yshift=0.8em]th2.north) {\scriptsize{输出层}};
 \draw [->] ([yshift=0.1em]th2.north) -- ([yshift=-0.1em]output.south);
@@ -39,11 +39,11 @@
 \node [anchor=north] (enc42) at ([yshift=0.5em]enc4.south) {\scriptsize{(位置$4$)}};
 {
-\node [anchor=west] (math1) at ([xshift=5em,yshift=1em]th2.east) {$\textbf{C}_j = \sum_{i} \alpha_{i,j} \textbf{h}_i \ \ $};
+\node [anchor=west] (math1) at ([xshift=5em,yshift=1em]th2.east) {$\vectorn{C}_j = \sum_{i} \alpha_{i,j} \vectorn{h}_i \ \ $};
 }
 {
 \node [anchor=north west] (math2) at ([yshift=-2em]math1.south west) {$\alpha_{i,j} = \frac{\exp(\beta_{i,j})}{\sum_{i'} \exp(\beta_{i',j})}$};
-\node [anchor=north west] (math3) at ([yshift=-0em]math2.south west) {$\beta_{i,j} = a(\textbf{s}_{j-1}, \textbf{h}_i)$};
+\node [anchor=north west] (math3) at ([yshift=-0em]math2.south west) {$\beta_{i,j} = a(\vectorn{s}_{j-1}, \vectorn{h}_i)$};
 }
 \begin{pgfonlayer}{background}
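
Review note on the figure above: the three formulas it typesets ($\vectorn{C}_j = \sum_i \alpha_{i,j} \vectorn{h}_i$, $\alpha_{i,j}$ as a softmax over $\beta_{i,j}$, and $\beta_{i,j} = a(\vectorn{s}_{j-1}, \vectorn{h}_i)$) can be sanity-checked with a small sketch. This is only an illustration under stated assumptions: the figure leaves the scoring function $a(\cdot,\cdot)$ unspecified, so a plain dot product is assumed here, and the names `softmax`, `dot`, and `context_vector` are hypothetical helpers, not part of the repository.

```python
import math


def softmax(scores):
    """alpha_{i,j} = exp(beta_{i,j}) / sum_{i'} exp(beta_{i',j})."""
    m = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    """Assumed scoring function a(s, h); the figure does not fix its form."""
    return sum(a * b for a, b in zip(u, v))


def context_vector(s_prev, hs, score=dot):
    """Return (C_j, alphas) for decoder state s_{j-1} and encoder states h_1..h_m."""
    betas = [score(s_prev, h) for h in hs]   # beta_{i,j} = a(s_{j-1}, h_i)
    alphas = softmax(betas)                  # attention weights over source positions
    dim = len(hs[0])
    # C_j = sum_i alpha_{i,j} * h_i, computed one dimension at a time
    c = [sum(alpha * h[k] for alpha, h in zip(alphas, hs)) for k in range(dim)]
    return c, alphas


if __name__ == "__main__":
    hs = [[1.0, 0.0], [0.0, 1.0]]            # toy encoder states h_1, h_2
    c, alphas = context_vector([0.0, 0.0], hs)
    print(alphas, c)
```

With a zero decoder state every score is equal, so the weights are uniform and the context vector is the average of the encoder states; a decoder state aligned with one encoder state shifts nearly all of the weight to that position.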

--- a/Chapter10/Figures/figure-encoder-decoder-process.tex
+++ b/Chapter10/Figures/figure-encoder-decoder-process.tex
@@ -2,9 +2,9 @@
 \begin{scope}
 \small{
-\node [anchor=south west,minimum width=15em] (source) at (0,0) {\textbf{source}: 我\ \ \ \ 对\ \ \ \ 你\ \ \ \ 感到\ \ \ \ 满意};
+\node [anchor=south west,minimum width=15em] (source) at (0,0) {\textbf{源语}: 我\ \ \ \ 对\ \ \ \ 你\ \ \ \ 感到\ \ \ \ 满意};
 {
-\node [anchor=south west,minimum width=15em] (target) at ([yshift=12em]source.north west) {\textbf{target}: I\ \ am\ \ \ satisfied\ \ \ with\ \ \ you};
+\node [anchor=south west,minimum width=15em] (target) at ([yshift=12em]source.north west) {\textbf{目标语}: I\ \ am\ \ \ satisfied\ \ \ with\ \ \ you};
 }
 {
 \node [anchor=center,minimum width=9.6em,minimum height=1.8em,draw,rounded corners=0.3em] (hidden) at ([yshift=6em]source.north) {};
@@ -24,7 +24,7 @@
 \node [anchor=west,minimum width=1.5em,minimum size=1.5em] (cell08) at (cell06.east){\small{
 \hspace{0.6em}
 \begin{tabular}{l}
-源语言句子的``表示''
+源语句子的“表示”
 \end{tabular}
 }
 };

--- a/Chapter10/Figures/figure-encoder-decoder-with-attention.tex
+++ b/Chapter10/Figures/figure-encoder-decoder-with-attention.tex
@@ -80,9 +80,9 @@
 \draw[<-] ([yshift=0.1em,xshift=1em]t6.north) -- ([yshift=1.2em,xshift=1em]t6.north);
-\draw [->] ([yshift=3em]s6.north) -- ([yshift=4em]s6.north) -- ([yshift=4em]t1.north) node [pos=0.5,fill=green!30,inner sep=2pt] (c1) {\scriptsize{表示$\textbf{C}_1$}} -- ([yshift=3em]t1.north) ;
+\draw [->] ([yshift=3em]s6.north) -- ([yshift=4em]s6.north) -- ([yshift=4em]t1.north) node [pos=0.5,fill=green!30,inner sep=2pt] (c1) {\scriptsize{表示$\vectorn{C}_1$}} -- ([yshift=3em]t1.north) ;
-\draw [->] ([yshift=3em]s5.north) -- ([yshift=5.3em]s5.north) -- ([yshift=5.3em]t2.north) node [pos=0.5,fill=green!30,inner sep=2pt] (c2) {\scriptsize{表示$\textbf{C}_2$}} -- ([yshift=3em]t2.north) ;
+\draw [->] ([yshift=3em]s5.north) -- ([yshift=5.3em]s5.north) -- ([yshift=5.3em]t2.north) node [pos=0.5,fill=green!30,inner sep=2pt] (c2) {\scriptsize{表示$\vectorn{C}_2$}} -- ([yshift=3em]t2.north) ;
-\draw [->] ([yshift=3.5em]s3.north) -- ([yshift=6.6em]s3.north) -- ([yshift=6.6em]t4.north) node [pos=0.5,fill=green!30,inner sep=2pt] (c3) {\scriptsize{表示$\textbf{C}_i$}} -- ([yshift=3.5em]t4.north) ;
+\draw [->] ([yshift=3.5em]s3.north) -- ([yshift=6.6em]s3.north) -- ([yshift=6.6em]t4.north) node [pos=0.5,fill=green!30,inner sep=2pt] (c3) {\scriptsize{表示$\vectorn{C}_i$}} -- ([yshift=3.5em]t4.north) ;
 \node [anchor=north] (smore) at ([yshift=3.5em]s3.north) {...};
 \node [anchor=north] (tmore) at ([yshift=3.5em]t4.north) {...};

--- a/Chapter10/Figures/figure-example-of-context-vector-calculation-process.tex
+++ b/Chapter10/Figures/figure-example-of-context-vector-calculation-process.tex
@@ -104,9 +104,9 @@
 %\visible<3->
 {
 % coverage score formula node
-\node [anchor=north west] (formula) at ([xshift=-0.3\hnode,yshift=-1.5\hnode]attn11.south) {\small{不同$\textbf{C}_j$所对应的源语言词的权重是不同的}};
+\node [anchor=north west] (formula) at ([xshift=-0.3\hnode,yshift=-1.5\hnode]attn11.south) {\small{不同$\vectorn{C}_j$所对应的源语言词的权重是不同的}};
-\node [anchor=north west] (example) at (formula.south west) {\footnotesize{$\textbf{C}_2=0.4 \times \textbf{h}(\textrm{``你''}) + 0.4 \times \textbf{h}(\textrm{``什么''}) +$}};
+\node [anchor=north west] (example) at (formula.south west) {\footnotesize{$\vectorn{C}_2=0.4 \times \vectorn{h}(\textrm{“你”}) + 0.4 \times \vectorn{h}(\textrm{“什么”}) +$}};
-\node [anchor=north west] (example2) at ([yshift=0.4em]example.south west) {\footnotesize{$\ \ \ \ \ \ \ \ 0 \times \textbf{h}(\textrm{``都''}) + 0.1 \times \textbf{h}(\textrm{`` 没''}) + ..$}};
+\node [anchor=north west] (example2) at ([yshift=0.4em]example.south west) {\footnotesize{$\ \ \ \ \ \ \ \ 0 \times \vectorn{h}(\textrm{“都”}) + 0.1 \times \vectorn{h}(\textrm{“ 没”}) + ..$}};
 }
 %\visible<3->
@@ -138,12 +138,12 @@
 %\visible<2->
 {
-\node[anchor=west] (sc1) at ([xshift=0.9\hnode]attn16.east) {$\textbf{C}_1 = \sum_{i=1}^{8} \alpha_{i1} \textbf{h}_{i}$};
+\node[anchor=west] (sc1) at ([xshift=0.9\hnode]attn16.east) {$\vectorn{C}_1 = \sum_{i=1}^{8} \alpha_{i1} \vectorn{h}_{i}$};
 }
 %\visible<3->
 {
-\node[anchor=west] (sc2) at ([xshift=0.9\hnode,yshift=0.0\hnode]attn26.east) {$\textbf{C}_2 = \sum_{i=1}^{8} \alpha_{i2} \textbf{h}_{i}$};
+\node[anchor=west] (sc2) at ([xshift=0.9\hnode,yshift=0.0\hnode]attn26.east) {$\vectorn{C}_2 = \sum_{i=1}^{8} \alpha_{i2} \vectorn{h}_{i}$};
 }
 \end{tikzpicture}
\ No newline at end of file
--- a/Chapter10/Figures/figure-gru01.tex
+++ b/Chapter10/Figures/figure-gru01.tex
@@ -78,8 +78,8 @@
        \end{scope}
        \begin{scope}
-            \node[wordnode,anchor=south] () at (aux71) {$\mathbf{h}_{t-1}$};
+            \node[wordnode,anchor=south] () at (aux71) {$\vectorn{h}_{t-1}$};
-            \node[wordnode,anchor=west] () at (aux12) {$\mathbf{x}_t$};
+            \node[wordnode,anchor=west] () at (aux12) {$\vectorn{x}_t$};
        \end{scope}

--- a/Chapter10/Figures/figure-gru02.tex
+++ b/Chapter10/Figures/figure-gru02.tex
@@ -91,8 +91,8 @@
        \end{scope}
        \begin{scope}
-            \node[wordnode,anchor=south] () at (aux71) {$\mathbf{h}_{t-1}$};
+            \node[wordnode,anchor=south] () at (aux71) {$\vectorn{h}_{t-1}$};
-            \node[wordnode,anchor=west] () at (aux12) {$\mathbf{x}_t$};
+            \node[wordnode,anchor=west] () at (aux12) {$\vectorn{x}_t$};
        \end{scope}

--- a/Chapter10/Figures/figure-gru03.tex
+++ b/Chapter10/Figures/figure-gru03.tex
@@ -109,11 +109,11 @@
        \end{scope}
        \begin{scope}
-             \node[wordnode,anchor=south] () at (aux71) {$\mathbf{h}_{t-1}$};
+             \node[wordnode,anchor=south] () at (aux71) {$\vectorn{h}_{t-1}$};
-            \node[wordnode,anchor=west] () at (aux12) {$\mathbf{x}_t$};
+            \node[wordnode,anchor=west] () at (aux12) {$\vectorn{x}_t$};
            {
-                \node[wordnode,anchor=east] () at (aux87) {$\mathbf{h}_{t}$};
+                \node[wordnode,anchor=east] () at (aux87) {$\vectorn{h}_{t}$};
-                \node[wordnode,anchor=south] () at (aux78) {$\mathbf{h}_{t}$};
+                \node[wordnode,anchor=south] () at (aux78) {$\vectorn{h}_{t}$};
            }
        \end{scope}

--- a/Chapter10/Figures/figure-lstm01.tex
+++ b/Chapter10/Figures/figure-lstm01.tex
@@ -84,9 +84,9 @@
        \end{scope}
        \begin{scope}
-            \node[wordnode,anchor=south] () at ([xshift=0.5\base]aux21) {$\mathbf{h}_{t-1}$};
+            \node[wordnode,anchor=south] () at ([xshift=0.5\base]aux21) {$\vectorn{h}_{t-1}$};
-            \node[wordnode,anchor=west] () at (aux12) {$\mathbf{x}_t$};
+            \node[wordnode,anchor=west] () at (aux12) {$\vectorn{x}_t$};
-            \node[wordnode,anchor=south] () at ([xshift=0.5\base]aux51) {$\mathbf{c}_{t-1}$};
+            \node[wordnode,anchor=south] () at ([xshift=0.5\base]aux51) {$\vectorn{c}_{t-1}$};
        \end{scope}

--- a/Chapter10/Figures/figure-lstm02.tex
+++ b/Chapter10/Figures/figure-lstm02.tex
@@ -99,9 +99,9 @@
         \end{scope}
        \begin{scope}
-            \node[wordnode,anchor=south] () at ([xshift=0.5\base]aux21) {$\mathbf{h}_{t-1}$};
+            \node[wordnode,anchor=south] () at ([xshift=0.5\base]aux21) {$\vectorn{h}_{t-1}$};
-            \node[wordnode,anchor=west] () at (aux12) {$\mathbf{x}_t$};
+            \node[wordnode,anchor=west] () at (aux12) {$\vectorn{x}_t$};
-            \node[wordnode,anchor=south] () at ([xshift=0.5\base]aux51) {$\mathbf{c}_{t-1}$};
+            \node[wordnode,anchor=south] () at ([xshift=0.5\base]aux51) {$\vectorn{c}_{t-1}$};
        \end{scope}

--- a/Chapter10/Figures/figure-lstm03.tex
+++ b/Chapter10/Figures/figure-lstm03.tex
@@ -113,11 +113,11 @@
        \end{scope}
        \begin{scope}
-            \node[wordnode,anchor=south] () at ([xshift=0.5\base]aux21) {$\mathbf{h}_{t-1}$};
+            \node[wordnode,anchor=south] () at ([xshift=0.5\base]aux21) {$\vectorn{h}_{t-1}$};
-            \node[wordnode,anchor=west] () at (aux12) {$\mathbf{x}_t$};
+            \node[wordnode,anchor=west] () at (aux12) {$\vectorn{x}_t$};
-            \node[wordnode,anchor=south] () at ([xshift=0.5\base]aux51) {$\mathbf{c}_{t-1}$};
+            \node[wordnode,anchor=south] () at ([xshift=0.5\base]aux51) {$\vectorn{c}_{t-1}$};
            {
-                \node[wordnode,anchor=south] () at ([xshift=-0.5\base]aux59) {$\mathbf{c}_{t}$};
+                \node[wordnode,anchor=south] () at ([xshift=-0.5\base]aux59) {$\vectorn{c}_{t}$};
            }
        \end{scope}

--- a/Chapter10/Figures/figure-lstm04.tex
+++ b/Chapter10/Figures/figure-lstm04.tex
@@ -131,15 +131,15 @@
        \end{scope}
        \begin{scope}
-            \node[wordnode,anchor=south] () at ([xshift=0.5\base]aux21) {$\mathbf{h}_{t-1}$};
+            \node[wordnode,anchor=south] () at ([xshift=0.5\base]aux21) {$\vectorn{h}_{t-1}$};
-            \node[wordnode,anchor=west] () at (aux12) {$\mathbf{x}_t$};
+            \node[wordnode,anchor=west] () at (aux12) {$\vectorn{x}_t$};
-            \node[wordnode,anchor=south] () at ([xshift=0.5\base]aux51) {$\mathbf{c}_{t-1}$};
+            \node[wordnode,anchor=south] () at ([xshift=0.5\base]aux51) {$\vectorn{c}_{t-1}$};
            {
-                \node[wordnode,anchor=south] () at ([xshift=-0.5\base]aux59) {$\mathbf{c}_{t}$};
+                \node[wordnode,anchor=south] () at ([xshift=-0.5\base]aux59) {$\vectorn{c}_{t}$};
            }
            {
-                \node[wordnode,anchor=east] () at (aux68) {$\mathbf{h}_{t}$};
+                \node[wordnode,anchor=east] () at (aux68) {$\vectorn{h}_{t}$};
-                \node[wordnode,anchor=south] () at ([xshift=-0.5\base]aux29) {$\mathbf{h}_{t}$};
+                \node[wordnode,anchor=south] () at ([xshift=-0.5\base]aux29) {$\vectorn{h}_{t}$};
            }
        \end{scope}

--- a/Chapter10/Figures/figure-output-layer-structur.tex
+++ b/Chapter10/Figures/figure-output-layer-structur.tex
@@ -123,7 +123,7 @@
                    \draw [->,thick] ([xshift=0.2em,yshift=0.1em]hidden.north west) -- (target.south west);
                    \draw [->,thick] ([xshift=-0.2em,yshift=0.1em]hidden.north east) -- (target.south east);
-                    \node [anchor=south] () at ([yshift=0.3em]hidden.north) {\scriptsize{$\hat{\mathbf{s}}=\mathbf{Ws}$}};
+                    \node [anchor=south] () at ([yshift=0.3em]hidden.north) {\scriptsize{$\hat{\vectorn{s}}=\vectorn{Ws}$}};
                }
                {

--- a/Chapter10/Figures/figure-query-model-corresponding-to-attention-mechanism.tex
+++ b/Chapter10/Figures/figure-query-model-corresponding-to-attention-mechanism.tex
@@ -12,17 +12,17 @@
 \tikzstyle{rnode} = [draw,minimum width=3.5em,minimum height=1.2em]
-\node [rnode,anchor=south west,fill=red!20!white] (value1) at (0,0) {\scriptsize{$\textbf{h}(\textrm{``你''})$}};
+\node [rnode,anchor=south west,fill=red!20!white] (value1) at (0,0) {\scriptsize{$\vectorn{h}(\textrm{“你”})$}};
-\node [rnode,anchor=south west,fill=red!20!white] (value2) at ([xshift=1em]value1.south east) {\scriptsize{$\textbf{h}(\textrm{``什么''})$}};
+\node [rnode,anchor=south west,fill=red!20!white] (value2) at ([xshift=1em]value1.south east) {\scriptsize{$\vectorn{h}(\textrm{“什么”})$}};
-\node [rnode,anchor=south west,fill=red!20!white] (value3) at ([xshift=1em]value2.south east) {\scriptsize{$\textbf{h}(\textrm{``也''})$}};
+\node [rnode,anchor=south west,fill=red!20!white] (value3) at ([xshift=1em]value2.south east) {\scriptsize{$\vectorn{h}(\textrm{“也”})$}};
-\node [rnode,anchor=south west,fill=red!20!white] (value4) at ([xshift=1em]value3.south east) {\scriptsize{$\textbf{h}(\textrm{``没''})$}};
+\node [rnode,anchor=south west,fill=red!20!white] (value4) at ([xshift=1em]value3.south east) {\scriptsize{$\vectorn{h}(\textrm{“没”})$}};
-\node [rnode,anchor=south west,fill=green!20!white] (key1) at ([yshift=0.2em]value1.north west) {\scriptsize{$\textbf{h}(\textrm{``你''})$}};
+\node [rnode,anchor=south west,fill=green!20!white] (key1) at ([yshift=0.2em]value1.north west) {\scriptsize{$\vectorn{h}(\textrm{“你”})$}};
-\node [rnode,anchor=south west,fill=green!20!white] (key2) at ([yshift=0.2em]value2.north west) {\scriptsize{$\textbf{h}(\textrm{``什么''})$}};
+\node [rnode,anchor=south west,fill=green!20!white] (key2) at ([yshift=0.2em]value2.north west) {\scriptsize{$\vectorn{h}(\textrm{“什么”})$}};
-\node [rnode,anchor=south west,fill=green!20!white] (key3) at ([yshift=0.2em]value3.north west) {\scriptsize{$\textbf{h}(\textrm{``也''})$}};
+\node [rnode,anchor=south west,fill=green!20!white] (key3) at ([yshift=0.2em]value3.north west) {\scriptsize{$\vectorn{h}(\textrm{“也”})$}};
-\node [rnode,anchor=south west,fill=green!20!white] (key4) at ([yshift=0.2em]value4.north west) {\scriptsize{$\textbf{h}(\textrm{``没''})$}};
+\node [rnode,anchor=south west,fill=green!20!white] (key4) at ([yshift=0.2em]value4.north west) {\scriptsize{$\vectorn{h}(\textrm{“没”})$}};
-\node [rnode,anchor=east] (query) at ([xshift=-2em]key1.west) {\scriptsize{$\textbf{s}(\textrm{``you''})$}};
+\node [rnode,anchor=east] (query) at ([xshift=-2em]key1.west) {\scriptsize{$\vectorn{s}(\textrm{“you”})$}};
 \node [anchor=east] (querylabel) at ([xshift=-0.2em]query.west) {\scriptsize{query}};
 \draw [->] ([yshift=1pt,xshift=6pt]query.north) .. controls +(90:1em) and +(90:1em) .. ([yshift=1pt]key1.north);

--- a/Chapter10/Figures/figure-the-whole-of-lstm.tex
+++ b/Chapter10/Figures/figure-the-whole-of-lstm.tex
@@ -141,15 +141,15 @@
 \end{scope}
 \begin{scope}
-\node[wordnode,anchor=south] () at ([xshift=0.5\base]aux21) {$\mathbf{h}_{t-1}$};
+\node[wordnode,anchor=south] () at ([xshift=0.5\base]aux21) {$\vectorn{h}_{t-1}$};
-\node[wordnode,anchor=west] () at (aux12) {$\mathbf{x}_t$};
+\node[wordnode,anchor=west] () at (aux12) {$\vectorn{x}_t$};
-\node[wordnode,anchor=south] () at ([xshift=0.5\base]aux51) {$\mathbf{c}_{t-1}$};
+\node[wordnode,anchor=south] () at ([xshift=0.5\base]aux51) {$\vectorn{c}_{t-1}$};
 {
-\node[wordnode,anchor=south] () at ([xshift=-0.5\base]aux59) {$\mathbf{c}_{t}$};
+\node[wordnode,anchor=south] () at ([xshift=-0.5\base]aux59) {$\vectorn{c}_{t}$};
 }
 {
-\node[wordnode,anchor=east] () at (aux68) {$\mathbf{h}_{t}$};
+\node[wordnode,anchor=east] () at (aux68) {$\vectorn{h}_{t}$};
-\node[wordnode,anchor=south] () at ([xshift=-0.5\base]aux29) {$\mathbf{h}_{t}$};
+\node[wordnode,anchor=south] () at ([xshift=-0.5\base]aux29) {$\vectorn{h}_{t}$};
 }
 \end{scope}
@@ -170,19 +170,19 @@
 \begin{scope}
 {
 % forget gate formula
-\node[formulanode,anchor=south east,text width=10em] () at ([shift={(4\base,1.5\base)}]aux51) {遗忘门\\$\mathbf{f}_t=\sigma(\mathbf{W}_f[\mathbf{h}_{t-1},\mathbf{x}_t]+\mathbf{b}_f)$};
+\node[formulanode,anchor=south east,text width=10em] () at ([shift={(4\base,1.5\base)}]aux51) {遗忘门\\$\vectorn{f}_t=\sigma(\vectorn{W}_f[\vectorn{h}_{t-1},\vectorn{x}_t]+\vectorn{b}_f)$};
 }
 {
 % input gate formula
-\node[formulanode,anchor=north east,text width=10em] () at ([shift={(4\base,-1.5\base)}]aux21) {输入门\\$\mathbf{i}_t=\sigma(\mathbf{W}_i[\mathbf{h}_{t-1},\mathbf{x}_t]+\mathbf{b}_i)$\\$\hat{\mathbf{c}}_t=\mathrm{tanh}(\mathbf{W}_c[\mathbf{h}_{t-1},\mathbf{x}_t]+\mathbf{b}_c)$};
+\node[formulanode,anchor=north east,text width=10em] () at ([shift={(4\base,-1.5\base)}]aux21) {输入门\\$\vectorn{i}_t=\sigma(\vectorn{W}_i[\vectorn{h}_{t-1},\vectorn{x}_t]+\vectorn{b}_i)$\\$\hat{\vectorn{c}}_t=\mathrm{tanh}(\vectorn{W}_c[\vectorn{h}_{t-1},\vectorn{x}_t]+\vectorn{b}_c)$};
 }
 {
 % cell update formula
-\node[formulanode,anchor=south west,text width=10em] () at ([shift={(-4\base,1.5\base)}]aux59) {记忆更新\\$\mathbf{c}_{t}=\mathbf{f}_t\cdot \mathbf{c}_{t-1}+\mathbf{i}_t\cdot \hat{\mathbf{c}}_t$};
+\node[formulanode,anchor=south west,text width=10em] () at ([shift={(-4\base,1.5\base)}]aux59) {记忆更新\\$\vectorn{c}_{t}=\vectorn{f}_t\cdot \vectorn{c}_{t-1}+\vectorn{i}_t\cdot \hat{\vectorn{c}}_t$};
 }
 {
 % output gate formula
-\node[formulanode,anchor=north west,text width=10em] () at ([shift={(-4\base,-1.5\base)}]aux29) {输出门\\$\mathbf{o}_t=\sigma(\mathbf{W}_o[\mathbf{h}_{t-1},\mathbf{x}_t]+\mathbf{b}_o)$\\$\mathbf{h}_{t}=\mathbf{o}_t\cdot \mathrm{tanh}(\mathbf{c}_{t})$};
+\node[formulanode,anchor=north west,text width=10em] () at ([shift={(-4\base,-1.5\base)}]aux29) {输出门\\$\vectorn{o}_t=\sigma(\vectorn{W}_o[\vectorn{h}_{t-1},\vectorn{x}_t]+\vectorn{b}_o)$\\$\vectorn{h}_{t}=\vectorn{o}_t\cdot \mathrm{tanh}(\vectorn{c}_{t})$};
 }
 \end{scope}
 \end{tikzpicture}
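The four formula boxes in this figure (forget gate, input gate, memory update, output gate) can be read as one update step. The following is a minimal sketch in plain Python, assuming that `\cdot` in the figure means an element-wise product and that `[h_{t-1}, x_t]` is vector concatenation. The names `lstm_step`, `affine` and the `params` dictionary are illustrative and do not come from the book.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def affine(W, v, b):
    """Compute W v + b, where W is a list of rows."""
    return [sum(w * x for w, x in zip(row, v)) + bi for row, bi in zip(W, b)]

def lstm_step(x_t, h_prev, c_prev, params):
    """One LSTM step following the gate formulas shown in the figure."""
    hx = h_prev + x_t  # list concatenation, i.e. [h_{t-1}, x_t]
    f = [sigmoid(z) for z in affine(params["W_f"], hx, params["b_f"])]        # forget gate
    i = [sigmoid(z) for z in affine(params["W_i"], hx, params["b_i"])]        # input gate
    c_hat = [math.tanh(z) for z in affine(params["W_c"], hx, params["b_c"])]  # candidate memory
    o = [sigmoid(z) for z in affine(params["W_o"], hx, params["b_o"])]        # output gate
    c_t = [ft * cp + it * ch for ft, cp, it, ch in zip(f, c_prev, i, c_hat)]  # memory update
    h_t = [ot * math.tanh(ct) for ot, ct in zip(o, c_t)]
    return h_t, c_t

# Hypothetical sizes: hidden size 2, input size 1, all weights zero.
zeros = [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
params = {name: zeros for name in ("W_f", "W_i", "W_c", "W_o")}
params.update({name: [0.0, 0.0] for name in ("b_f", "b_i", "b_c", "b_o")})
h_t, c_t = lstm_step([1.0], [0.0, 0.0], [1.0, -2.0], params)
```

With all-zero weights every gate equals 0.5 and the candidate memory is 0, so the new cell state is simply half of the previous one. That makes the step easy to check by hand.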

--- a/Chapter10/Figures/figure-word-embedding-structure.tex
+++ b/Chapter10/Figures/figure-word-embedding-structure.tex
@@ -14,9 +14,9 @@
                    \node[rnnnode,minimum height=0.5\base,fill=green!30!white,anchor=west] (eemb\x) at ([xshift=0.4\base]eemb\y.east) {\tiny{$e_x()$}};
                \foreach \x in {1,2,...,3}
                    \node[rnnnode,fill=blue!30!white,anchor=south] (enc\x) at ([yshift=0.3\base]eemb\x.north) {};
-                    \node[] (enclabel1) at (enc1) {\tiny{$\textbf{h}_{m-2}$}};
+                    \node[] (enclabel1) at (enc1) {\tiny{$\vectorn{h}_{m-2}$}};
-                    \node[] (enclabel2) at (enc2) {\tiny{$\textbf{h}_{m-1}$}};
+                    \node[] (enclabel2) at (enc2) {\tiny{$\vectorn{h}_{m-1}$}};
-                    \node[rnnnode,fill=purple!30!white] (enclabel3) at (enc3) {\tiny{$\textbf{h}_{m}$}};
+                    \node[rnnnode,fill=purple!30!white] (enclabel3) at (enc3) {\tiny{$\vectorn{h}_{m}$}};
                \node[wordnode,left=0.4\base of enc1] (init1) {$\cdots$};
                \node[wordnode,left=0.4\base of eemb1] (init2) {$\cdots$};
@@ -28,7 +28,7 @@
                \foreach \x in {1,2,...,3}
                    \node[rnnnode,minimum height=0.5\base,fill=green!30!white,anchor=south] (demb\x) at ([yshift=\base]enc\x.north) {\tiny{$e_y()$}};
                \foreach \x in {1,2,...,3}
-                    \node[rnnnode,fill=blue!30!white,anchor=south] (dec\x) at ([yshift=0.3\base]demb\x.north) {{\tiny{$\textbf{s}_\x$}}};
+                    \node[rnnnode,fill=blue!30!white,anchor=south] (dec\x) at ([yshift=0.3\base]demb\x.north) {{\tiny{$\vectorn{s}_\x$}}};
                \foreach \x in {1,2,...,3}
                    \node[rnnnode,minimum height=0.5\base,fill=red!30!white,anchor=south] (softmax\x) at ([yshift=0.3\base]dec\x.north) {\tiny{Softmax}};
                \node[wordnode,right=0.4\base of demb3] (end1) {$\cdots$};

--- a/Chapter10/chapter10.tex
+++ b/Chapter10/chapter10.tex
--- a/Chapter2/Figures/figure-schematic-chain-rule.tex
+++ b/Chapter2/Figures/figure-schematic-chain-rule.tex
-%%% outline
-%-------------------------------------------------------------------------
-\begin{tikzpicture}
-\node [anchor=north west](num1)  at (0,0) {\large{A}};
-\node [anchor=north west](num2)  at ([xshift=5.8em,yshift=1.44em]num1.south west) {\large{B}};
-\node [anchor=north west](num3)  at ([xshift=5.8em,yshift=1.44em]num2.south west) {\large{C}};
-\node [anchor=north west](num4)  at ([xshift=5.8em,yshift=1.44em]num3.south west) {\large{D}};
-\node [anchor=north west](num5)  at ([xshift=0.04em,yshift=-2.5em]num3.south west) {\large{E}};
-\draw [<-,very thick,black] (num1.east)--(num2.west);
-\draw [->,very thick,black] (num2.east)--(num3.west);
-\draw [<-,very thick,black] (num3.east)--(num4.west);
-\draw [->,very thick,black] (num3.south)--(num5.north);
-\end{tikzpicture}
--- a/Chapter2/chapter2.tex
+++ b/Chapter2/chapter2.tex
@@ -41,7 +41,7 @@
 %----------------------------------------------------------------------------------------
 \subsection{随机变量和概率}
-\parinterval 在自然界中，很多{\small\bfnew{事件}}\index{事件}（Event）\index{Event}是否会发生是不确定的。例如，明天会下雨、掷一枚硬币是正面朝上、扔一个骰子的点数是1等。这些事件可能会发生也可能不会发生。通过大量的重复试验，能发现其具有某种规律性的事件叫做{\small\sffamily\bfseries{随机事件}}\index{随机事件}。
+\parinterval 在自然界中，很多{\small\bfnew{事件}}\index{事件}（Event）\index{Event}是否会发生是不确定的。例如，明天会下雨、掷一枚硬币是正面朝上、扔一个骰子的点数是1等。这些事件可能会发生也可能不会发生。通过大量的重复试验，能发现具有某种规律性的事件叫做{\small\sffamily\bfseries{随机事件}}\index{随机事件}。
 \parinterval {\small\sffamily\bfseries{随机变量}}\index{随机变量}（Random Variable）\index{Random Variable}是对随机事件发生可能状态的描述，是随机事件的数量表征。设$\Omega = \{ \omega \}$为一个随机试验的样本空间，$X=X(\omega)$就是定义在样本空间$\Omega$上的单值实数函数，即$X=X(\omega)$为随机变量，记为$X$。随机变量是一种能随机选取数值的变量，常用大写的英语字母或希腊字母表示，其取值通常用小写字母来表示。例如，用$A$ 表示一个随机变量，用$a$表示变量$A$的一个取值。根据随机变量可以选取的值的某些性质，可以将其划分为离散变量和连续变量。
@@ -62,7 +62,7 @@
 \begin{tabular}{c|c c c c c c}
 \rule{0pt}{15pt}     $A$ & $a_1=1$ & $a_2=2$ & $a_3=3$ & $a_4=4$ & $a_5=5$ & $a_6=6$\\
               \hline
-\rule{0pt}{15pt}     $\funp{P}_i$ & $\funp{P}_1=\frac{4}{25}$  &  $\funp{P}_2=\frac{3}{25}$ &  $\funp{P}_3=\frac{4}{25}$ & $\funp{P}_4=\frac{6}{25}$ & $\funp{P}_5=\frac{3}{25}$ & $\funp{P}_6=\frac{1}{25}$  \\
+\rule{0pt}{15pt}     $\funp{P}_i$ & $\funp{P}_1=\frac{4}{25}$  &  $\funp{P}_2=\frac{3}{25}$ &  $\funp{P}_3=\frac{4}{25}$ & $\funp{P}_4=\frac{6}{25}$ & $\funp{P}_5=\frac{3}{25}$ & $\funp{P}_6=\frac{5}{25}$  \\
             \end{tabular}
             \label{tab:2-1}
 \end{table}
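The change to $\funp{P}_6$ in this hunk can be checked against the normalization property: the six probabilities must sum to 1. A small sketch, with the helper name `is_valid_distribution` chosen only for illustration:

```python
from fractions import Fraction

# Probabilities from the table after the correction (P_6 = 5/25).
probs = {1: Fraction(4, 25), 2: Fraction(3, 25), 3: Fraction(4, 25),
         4: Fraction(6, 25), 5: Fraction(3, 25), 6: Fraction(5, 25)}

def is_valid_distribution(p):
    """Check non-negativity and normalization."""
    return all(v >= 0 for v in p.values()) and sum(p.values()) == 1

old_probs = dict(probs)
old_probs[6] = Fraction(1, 25)  # the value before the correction

print(is_valid_distribution(probs))      # True
print(is_valid_distribution(old_probs))  # False
print(sum(old_probs.values()))           # 21/25
```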
@@ -70,7 +70,7 @@
 \parinterval 除此之外，概率函数$\funp{P}(\cdot)$还具有非负性、归一性等特点。非负性是指，所有的概率函数$\funp{P}(\cdot)$都必须是大于等于0的数值，概率函数中不可能出现负数，即$\forall{x},\funp{P}{(x)}\geq{0}$。归一性，又称规范性，简单的说就是所有可能发生的事件的概率总和为1，即$\sum_{x}\funp{P}{(x)}={1}$。
-\parinterval 对于离散变量$A$，$\funp{P}(A=a)$是个确定的值，可以表示事件$A=a$的可能性大小；而对于连续变量，求在某个定点处的概率是无意义的，只能求其落在某个取值区间内的概率。因此，用{\small\sffamily\bfseries{概率分布函数}}\index{概率分布函数}$F(x)$和{\small\sffamily\bfseries{概率密度函数}}\index{概率密度函数}$f(x)$来统一描述随机变量取值的分布情况（如图\ref{fig:2-1}）。概率分布函数$F(x)$表示取值小于等于某个值的概率，是概率的累加（或积分）形式。假设$A$是一个随机变量，$a$是任意实数，将函数$F(a)=\funp{P}\{A\leq a\}$定义为$A$的分布函数。通过分布函数，可以清晰地表示任何随机变量的概率。
+\parinterval 对于离散变量$A$，$\funp{P}(A=a)$是个确定的值，可以表示事件$A=a$的可能性大小；而对于连续变量，求在某个定点处的概率是无意义的，只能求其落在某个取值区间内的概率。因此，用{\small\sffamily\bfseries{概率分布函数}}\index{概率分布函数}$F(x)$和{\small\sffamily\bfseries{概率密度函数}}\index{概率密度函数}$f(x)$来统一描述随机变量取值的分布情况（如图\ref{fig:2-1}）。概率分布函数$F(x)$表示取值小于等于某个值的概率，是概率的累加（或积分）形式。假设$A$是一个随机变量，$a$是任意实数，将函数$F(a)=\funp{P}\{A\leq a\}$定义为$A$的分布函数。通过分布函数，可以清晰地表示任何随机变量的概率分布情况。
 %----------------------------------------------
 \begin{figure}[htp]
@@ -81,7 +81,7 @@
 \end{figure}
 %-------------------------------------------
-\parinterval 概率密度函数反映了变量在某个区间内的概率变化快慢，概率密度函数的值是概率的变化率，该连续变量的概率也就是对概率密度函数求积分得到的结果。设$f(x) \geq 0$是连续变量$X$的概率密度函数，$X$的分布函数就可以用如下公式定义：
+\parinterval 概率密度函数反映了变量在某个区间内的概率变化快慢，概率密度函数的值是概率的变化率，该连续变量的概率分布函数也就是对概率密度函数求积分得到的结果。设$f(x) \geq 0$是连续变量$X$的概率密度函数，$X$的分布函数就可以用如下公式定义：
 \begin{eqnarray}
 F(x)=\int_{-\infty}^x f(x)\textrm{d}x
 \label{eq:2-1}
@@ -92,9 +92,9 @@ F(x)=\int_{-\infty}^x f(x)\textrm{d}x
 %----------------------------------------------------------------------------------------
 \subsection{联合概率、条件概率和边缘概率}
-\parinterval {\small\sffamily\bfseries{联合概率}}\index{联合概率}（Joint Probability）\index{Joint Probability}是指多个事件共同发生，每个随机变量满足各自条件的概率，表示为$\funp{P}(AB)$或$\funp{P}(A\cap{B})$。{\small\sffamily\bfseries{条件概率}}\index{条件概率}（Conditional Probability）\index{Conditional Probability}是指$A$、$B$为任意的两个事件，在事件$A$已出现的前提下，事件$B$出现的概率，使用$\funp{P}(B \mid A)$表示。
+\parinterval {\small\sffamily\bfseries{联合概率}}\index{联合概率}（Joint Probability）\index{Joint Probability}是指多个事件共同发生，每个随机变量满足各自条件的概率。如事件$A$和事件$B$的联合概率可以表示为$\funp{P}(AB)$或$\funp{P}(A\cap{B})$。{\small\sffamily\bfseries{条件概率}}\index{条件概率}（Conditional Probability）\index{Conditional Probability}是指$A$、$B$为任意的两个事件，在事件$A$已出现的前提下，事件$B$出现的概率，使用$\funp{P}(B \mid A)$表示。
-\parinterval 贝叶斯法则（见\ref{sec:2.2.3}小节）是条件概率计算时的重要依据，条件概率可以表示为
+\parinterval 贝叶斯法则（见\ref{sec:2.2.3}小节）是条件概率计算时的重要依据，条件概率可以表示为：
 \begin{eqnarray}
 \funp{P}{(B|A)} & = & \frac{\funp{P}(A\cap{B})}{\funp{P}(A)}  \nonumber \\
                           & = & \frac{\funp{P}(A)\funp{P}(B|A)}{\funp{P}(A)}  \nonumber \\
@@ -102,25 +102,25 @@ F(x)=\int_{-\infty}^x f(x)\textrm{d}x
 \label{eq:2-2}
 \end{eqnarray}
-\parinterval {\small\sffamily\bfseries{边缘概率}}\index{边缘概率}（Marginal Probability）\index{Marginal Probability}是和联合概率对应的，它指的是$\funp{P}(X=a)$或$\funp{P}(Y=b)$，即仅与单个随机变量有关的概率。对于离散随机变量$X$和$Y$，如果知道$\funp{P}(X,Y)$，则边缘概率$\funp{P}(X)$可以通过求和的方式得到。对于$\forall x \in X $，有
+\parinterval {\small\sffamily\bfseries{边缘概率}}\index{边缘概率}（Marginal Probability）\index{Marginal Probability}是和联合概率对应的，它指的是$\funp{P}(X=a)$或$\funp{P}(Y=b)$，即仅与单个随机变量有关的概率。对于离散随机变量$X$和$Y$，如果知道$\funp{P}(X,Y)$，则边缘概率$\funp{P}(X)$可以通过求和的方式得到。对于$\forall x \in X $，有：
 \begin{eqnarray}
 \funp{P}(X=x)=\sum_{y}  \funp{P}(X=x,Y=y)
 \label{eq:2-3}
 \end{eqnarray}
-\parinterval 对于连续变量，边缘概率$\funp{P}(X)$需要通过积分得到，如下式所示
+\parinterval 对于连续变量，边缘概率$\funp{P}(X)$需要通过积分得到，如下式所示：
 \begin{eqnarray}
 \funp{P}(X=x)=\int \funp{P}(x,y)\textrm{d}y
 \label{eq:2-4}
 \end{eqnarray}
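For the discrete case, the marginal and conditional probabilities can be computed directly from a joint table. The sketch below uses a made-up joint distribution; the variable values and the function names are assumptions for illustration only.

```python
from fractions import Fraction

# Hypothetical joint distribution P(X, Y), X in {0, 1}, Y in {"a", "b"}.
joint = {
    (0, "a"): Fraction(1, 8), (0, "b"): Fraction(3, 8),
    (1, "a"): Fraction(2, 8), (1, "b"): Fraction(2, 8),
}

def marginal_x(joint, x):
    """P(X=x) = sum over y of P(X=x, Y=y)."""
    return sum(p for (xi, _), p in joint.items() if xi == x)

def conditional_y_given_x(joint, y, x):
    """P(Y=y | X=x) = P(X=x, Y=y) / P(X=x)."""
    return joint[(x, y)] / marginal_x(joint, x)

print(marginal_x(joint, 0))                  # 1/2
print(conditional_y_given_x(joint, "b", 0))  # 3/4
```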
-\parinterval 为了更好地区分条件概率、边缘概率和联合概率，这里用一个图形面积的计算来举例说明。如图\ref{fig:2-2}所示，矩形$A$代表事件$X$发生所对应的所有可能状态，矩形$B$代表事件$Y$发生所对应的所有可能状态，矩形$C$代表$A$和$B$的交集，则
+\parinterval 为了更好地区分条件概率、边缘概率和联合概率，这里用一个图形面积的计算来举例说明。如图\ref{fig:2-2}所示，矩形$A$代表事件$X$发生所对应的所有可能状态，矩形$B$代表事件$Y$发生所对应的所有可能状态，矩形$C$代表$A$和$B$的交集，则：
 \begin{itemize}
 \vspace{0.5em}
-\item 边缘概率：矩形$A$或者矩形$B$的面积；
+\item 边缘概率：矩形$A$或者矩形$B$的面积。
 \vspace{0.5em}
-\item 联合概率：矩形$C$的面积；
+\item 联合概率：矩形$C$的面积。
 \vspace{0.5em}
 \item 条件概率：联合概率/对应的边缘概率，如：$\funp{P}(A \mid B)$=矩形$C$的面积/矩形$B$的面积。
 \vspace{0.5em}
@@ -148,45 +148,27 @@ F(x)=\int_{-\infty}^x f(x)\textrm{d}x
 \label{eq:2-5}
 \end{eqnarray}
-\parinterval 推广到$n$个事件，可以得到了{\small\bfnew{链式法则}}\index{链式法则}（Chain Rule\index{Chain Rule}）的公式
+\parinterval 推广到$n$个事件，可以得到{\small\bfnew{链式法则}}\index{链式法则}（Chain Rule\index{Chain Rule}）的公式：
 \begin{eqnarray}
 \funp{P}(x_1,x_2, \ldots ,x_n)=\funp{P}(x_1) \prod_{i=2}^n \funp{P}(x_i \mid x_1,x_2, \ldots ,x_{i-1})
 \label{eq:2-6}
 \end{eqnarray}
-\parinterval 下面的例子有助于更好的理解链式法则，如图\ref{fig:2-3}所示，$A$、$B$、$C$、$D$、$E$分别代表五个事件，其中，$A$只和$B$有关，$C$只和$B$、$D$有关，$E$只和$C$有关，$B$和$D$不依赖其他任何事件。则$P(A,B,C,D,E)$的表达式如下式：
+\parinterval 链式法则经常被用于对事件序列的建模。比如，事件$A$只依赖于事件$B$，事件$B$依赖于事件$C$，应用链式法则有：
-\begin{eqnarray}
-&   & \funp{P}(A,B,C,D,E) \nonumber \\
-&=&\funp{P}(E \mid A,B,C,D) \cdot \funp{P}(A,B,C,D) \nonumber \\
-&=&\funp{P}(E \mid A,B,C,D) \cdot \funp{P}(D \mid A,B,C) \cdot \funp{P}(A,B,C) \nonumber \\
-&=&\funp{P}(E \mid A,B,C,D) \cdot \funp{P}(D \mid A,B,C) \cdot \funp{P}(C \mid A,B) \cdot \funp{P}(A,B) \nonumber \\
-&=&\funp{P}(E \mid A,B,C,D) \cdot \funp{P}(D \mid A,B,C) \cdot \funp{P}(C \mid A,B) \cdot \funp{P}(B \mid A) \cdot \funp{P}(A)
-\label{eq:2-7}
-\end{eqnarray}
-\parinterval 根据图\ref {fig:2-3} 易知$E$只和$C$有关，所以$\funp{P}(E \mid A,B,C,D)=\funp{P}(E \mid C)$；$D$不依赖于其他事件，所以$\funp{P}(D \mid A,B,C)=\funp{P}(D)$；$C$只和$B$、$D$有关，所以$\funp{P}(C \mid A,B)=\funp{P}(C \mid B)$；$B$不依赖于其他事件，所以$\funp{P}(B \mid  A)=\funp{P}(B)$。最终化简可得：
 \begin{eqnarray}
-\funp{P}(A,B,C,D,E)=\funp{P}(E \mid C) \cdot \funp{P}(D) \cdot \funp{P}(C \mid B) \cdot \funp{P}(B)\cdot \funp{P}(A \mid B)
+\funp{P}(A,B,C) & = & \funp{P}(A \mid B,C)\funp{P}(B \mid C)\funp{P}(C) \nonumber \\
-\label{eq:2-8}
+                & = & \funp{P}(A \mid B)\funp{P}(B \mid C)\funp{P}(C)
+\label{eq:chain-rule-example}
 \end{eqnarray}
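The two steps of this example can be checked numerically. The sketch below builds a joint distribution from made-up tables for $\funp{P}(C)$, $\funp{P}(B \mid C)$ and $\funp{P}(A \mid B)$, then verifies that the general chain rule holds and that $\funp{P}(A \mid B,C)$ reduces to $\funp{P}(A \mid B)$ under the stated dependency assumption. All numbers and names are hypothetical.

```python
from fractions import Fraction as F
from itertools import product

p_c = {0: F(1, 4), 1: F(3, 4)}                    # P(C=c)
p_b_given_c = {(0, 0): F(1, 2), (1, 0): F(1, 2),  # P(B=b | C=c), key (b, c)
               (0, 1): F(1, 5), (1, 1): F(4, 5)}
p_a_given_b = {(0, 0): F(2, 3), (1, 0): F(1, 3),  # P(A=a | B=b), key (a, b)
               (0, 1): F(1, 10), (1, 1): F(9, 10)}

# Joint built under the assumption that A depends on C only through B.
joint = {(a, b, c): p_a_given_b[(a, b)] * p_b_given_c[(b, c)] * p_c[c]
         for a, b, c in product((0, 1), repeat=3)}

def prob(a=None, b=None, c=None):
    """Sum the joint over all entries that match the values given (None = any)."""
    want = (a, b, c)
    return sum(p for key, p in joint.items()
               if all(w is None or w == k for w, k in zip(want, key)))

a, b, c = 1, 1, 0
p_a_given_bc = prob(a=a, b=b, c=c) / prob(b=b, c=c)
p_b_given_c_val = prob(b=b, c=c) / prob(c=c)

# General chain rule: P(A,B,C) = P(A|B,C) P(B|C) P(C).
assert joint[(a, b, c)] == p_a_given_bc * p_b_given_c_val * prob(c=c)
# Simplification used in the second line of the equation: P(A|B,C) = P(A|B).
assert p_a_given_bc == prob(a=a, b=b) / prob(b=b)
```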
-%----------------------------------------------
-\begin{figure}[htp]
-\centering
-\input{./Chapter2/Figures/figure-schematic-chain-rule}
-\setlength{\belowcaptionskip}{-1cm}
-\caption{事件$A$、$B$、$C$、$D$、$E$之间的关系图}
-\label{fig:2-3}
-\end{figure}
-%-------------------------------------------
 %----------------------------------------------------------------------------------------
 %    NEW SUB-SECTION
 %----------------------------------------------------------------------------------------
 \subsection{贝叶斯法则}\label{sec:2.2.3}
-\parinterval 首先介绍一下全概率公式：{\small\bfnew{全概率公式}}\index{全概率公式}（Law Of Total Probability）\index{Law Of Total Probability}是概率论中重要的公式，它可以将一个复杂事件发生的概率分解成不同情况的小事件发生概率的和。这里先介绍一个概念——划分。集合$\Sigma$的一个划分事件为$\{B_1, \ldots ,B_n\}$是指它们满足$\bigcup_{i=1}^n B_i=S \textrm{且}B_iB_j=\varnothing , i,j=1, \ldots ,n,i\neq j$。此时事件$A$的全概率公式可以被描述为：
+\parinterval 首先介绍一下全概率公式：{\small\bfnew{全概率公式}}\index{全概率公式}（Law of Total Probability）\index{Law of Total Probability}是概率论中重要的公式，它可以将一个复杂事件发生的概率分解成不同情况的小事件发生概率的和。这里先介绍“划分”这一概念。$\{B_1, \ldots ,B_n\}$是集合$\Sigma$的一个划分，是指它们满足$\bigcup_{i=1}^n B_i=\Sigma \textrm{且}B_iB_j=\varnothing , i,j=1, \ldots ,n,i\neq j$。此时事件$A$的全概率公式可以被描述为：
 \begin{eqnarray}
 \funp{P}(A)=\sum_{k=1}^n \funp{P}(A \mid B_k)\funp{P}(B_k)
 \label{eq:2-9}
@@ -214,14 +196,14 @@ F(x)=\int_{-\infty}^x f(x)\textrm{d}x
 \label{eq:2-10}
 \end{eqnarray}
-\parinterval {\small\sffamily\bfseries{贝叶斯法则}}\index{贝叶斯法则}（Bayes' Rule）\index{Bayes' Rule}是概率论中的一个经典公式，通常用于已知$\funp{P}(A \mid B)$求$\funp{P}(B \mid A)$。可以表述为：设$\{B_1, \ldots ,B_n\}$是某个集合$\Sigma$的一个划分，$A$为事件，则对于$i=1, \ldots ,n$，有如下公式
+\parinterval {\small\sffamily\bfseries{贝叶斯法则}}\index{贝叶斯法则}（Bayes' Rule）\index{Bayes' Rule}是概率论中的一个经典公式，通常用于已知$\funp{P}(A \mid B)$求$\funp{P}(B \mid A)$。可以表述为：设$\{B_1, \ldots ,B_n\}$是某个集合$\Sigma$的一个划分，$A$为事件，则对于$i=1, \ldots ,n$，有如下公式：
 \begin{eqnarray}
 \funp{P}(B_i \mid A) & = & \frac {\funp{P}(A  B_i)} { \funp{P}(A) } \nonumber \\
                                   & = & \frac {\funp{P}(A \mid B_i)\funp{P}(B_i) } { \sum_{k=1}^n\funp{P}(A \mid B_k)\funp{P}(B_k) }
 \label{eq:2-11}
 \end{eqnarray}
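Bayes' rule over a partition can be sketched in a few lines, with the denominator computed by the law of total probability. The partition size and all numbers below are made up for illustration, and `bayes_posterior` is a hypothetical helper name.

```python
from fractions import Fraction as F

def bayes_posterior(priors, likelihoods):
    """Return P(B_i | A) for every i, given P(B_i) and P(A | B_i)."""
    # Denominator: P(A) = sum_k P(A | B_k) P(B_k)
    p_a = sum(likelihoods[i] * priors[i] for i in priors)
    return {i: likelihoods[i] * priors[i] / p_a for i in priors}

# Hypothetical partition {B_1, B_2, B_3}.
priors = {1: F(1, 2), 2: F(3, 10), 3: F(1, 5)}       # P(B_i)
likelihoods = {1: F(1, 10), 2: F(1, 5), 3: F(1, 2)}  # P(A | B_i)

posterior = bayes_posterior(priors, likelihoods)
print(posterior[3])             # 10/21
print(sum(posterior.values()))  # 1
```

Here $B_3$ has the smallest prior but the largest posterior, because the observation $A$ is much more likely under $B_3$ than under the other two events.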
-\noindent 其中，等式右端的分母部分使用了全概率公式。进一步，令$\bar{B}$表示事件$B$不发生的情况，由上式，也可以得到贝叶斯公式的另外一种写法:
+\noindent 其中，等式右端的分母部分使用了全概率公式。进一步，令$\bar{B}$表示事件$B$不发生的情况，由上式，也可以得到贝叶斯公式的另外一种写法：
 \begin{eqnarray}
 \funp{P}(B \mid A) & = & \frac { \funp{P}(A \mid B)\funp{P}(B) }  {\funp{P}(A)} \nonumber \\
                     & = & \frac { \funp{P}(A \mid B)\funp{P}(B) }  {\funp{P}(A \mid B)\funp{P}(B)+\funp{P}(A \mid \bar{B}) \funp{P}(\bar{B})}
@@ -253,7 +235,7 @@ F(x)=\int_{-\infty}^x f(x)\textrm{d}x
 \label{eg:2-1}
 \end{example}
-\parinterval 在这两句话中，“太阳从东方升起”是一件确定性事件（在地球上），几乎不需要查阅更多信息就可以确认，因此这件事的信息熵相对较低；而“明天天气多云”这件事，需要关注天气预报，才能大概率确定这件事，它的不确定性很高，因而它的信息熵也就相对较高。因此，信息熵也是对事件不确定性的度量。进一步，定义{\small\bfnew{自信息}}\index{自信息}（Self-information）\index{Self-information}为一个事件$X$的自信息的表达式为：
+\parinterval 在这两句话中，“太阳从东方升起”是一件确定性事件（在地球上），几乎不需要查阅更多信息就可以确认，因此这件事的信息熵相对较低；而“明天天气多云”这件事，需要关注天气预报，才能大概率确定这件事，它的不确定性很高，因而它的信息熵也就相对较高。因此，信息熵也是对事件不确定性的度量。进一步，一个事件$X$的{\small\bfnew{自信息}}\index{自信息}（Self-information）\index{Self-information}的表达式为：
 \begin{eqnarray}
 \funp{I}(x)=-\log \funp{P}(x)
 \label{eq:2-13}
@@ -272,8 +254,8 @@ F(x)=\int_{-\infty}^x f(x)\textrm{d}x
 \parinterval 自信息处理的是变量单一取值的情况。若量化整个概率分布中的不确定性或信息量，可以用信息熵，记为$\funp{H}(x)$。其公式如下：
 \begin{eqnarray}
-\funp{H}(x) & = & \sum_{x \in \textrm{X}}[ \funp{P}(x) \funp{I}(x)] \nonumber \\
+\funp{H}(x) & = & \sum_{x \in X}[ \funp{P}(x) \funp{I}(x)] \nonumber \\
-              & = & - \sum_{x \in \textrm{X} } [\funp{P}(x)\log(\funp{P}(x)) ]
+              & = & - \sum_{x \in X } [\funp{P}(x)\log(\funp{P}(x)) ]
 \label{eq:2-14}
 \end{eqnarray}
@@ -287,8 +269,8 @@ F(x)=\int_{-\infty}^x f(x)\textrm{d}x
 \parinterval 如果同一个随机变量$X$上有两个概率分布$\funp{P}(x)$和$\funp{Q}(x)$，那么可以使用{\small\bfnew{Kullback-Leibler距离}}\index{Kullback-Leibler距离}或{\small\bfnew{KL距离}}\index{KL距离}（KL Distance\index{KL Distance}）来衡量这两个分布的不同（也称作KL 散度），这种度量就是{\small\bfnew{相对熵}}\index{相对熵}（Relative Entropy）\index{Relative Entropy}。其公式如下：
 \begin{eqnarray}
-\funp{D}_{\textrm{KL}}(\funp{P}\parallel \funp{Q}) & = & \sum_{x \in \textrm{X}} [ \funp{P}(x)\log \frac{\funp{P}(x) }{ \funp{Q}(x) } ]  \nonumber \\
+\funp{D}_{\textrm{KL}}(\funp{P}\parallel \funp{Q}) & = & \sum_{x \in X} [ \funp{P}(x)\log \frac{\funp{P}(x) }{ \funp{Q}(x) } ]  \nonumber \\
-                                                                                       & = & \sum_{x \in \textrm{X} }[ \funp{P}(x)(\log \funp{P}(x)-\log \funp{Q}(x))]
+                                                                                       & = & \sum_{x \in X }[ \funp{P}(x)(\log \funp{P}(x)-\log \funp{Q}(x))]
 \label{eq:2-15}
 \end{eqnarray}
@@ -310,11 +292,11 @@ F(x)=\int_{-\infty}^x f(x)\textrm{d}x
 \parinterval {\small\bfnew{交叉熵}}\index{交叉熵}（Cross-entropy）\index{Cross-entropy}是一个与KL距离密切相关的概念，它的公式是：
 \begin{eqnarray}
-\funp{H}(\funp{P},\funp{Q})=-\sum_{x \in \textrm{X}} [\funp{P}(x) \log \funp{Q}(x) ]
+\funp{H}(\funp{P},\funp{Q})=-\sum_{x \in X} [\funp{P}(x) \log \funp{Q}(x) ]
 \label{eq:2-16}
 \end{eqnarray}
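The relation between entropy, KL distance and cross-entropy can be checked numerically: $\funp{H}(\funp{P},\funp{Q}) = \funp{H}(\funp{P}) + \funp{D}_{\textrm{KL}}(\funp{P}\parallel \funp{Q})$. The sketch below assumes natural logarithms (the base only rescales the values) and assumes $\funp{Q}(x)>0$ wherever $\funp{P}(x)>0$. The two distributions are made up.

```python
import math

def entropy(p):
    """H(P) = -sum_x P(x) log P(x)."""
    return -sum(px * math.log(px) for px in p.values() if px > 0)

def kl_divergence(p, q):
    """D_KL(P || Q) = sum_x P(x) (log P(x) - log Q(x))."""
    return sum(px * (math.log(px) - math.log(q[x])) for x, px in p.items() if px > 0)

def cross_entropy(p, q):
    """H(P, Q) = -sum_x P(x) log Q(x)."""
    return -sum(px * math.log(q[x]) for x, px in p.items() if px > 0)

p = {"a": 0.5, "b": 0.25, "c": 0.25}
q = {"a": 0.25, "b": 0.25, "c": 0.5}

# Cross-entropy differs from the KL distance only by H(P), which does not depend on Q.
assert math.isclose(cross_entropy(p, q), entropy(p) + kl_divergence(p, q))
# The KL distance of a distribution to itself is zero.
assert kl_divergence(p, p) == 0.0
```

Since $\funp{H}(\funp{P})$ is constant when $\funp{P}$ is fixed, minimizing either quantity over $\funp{Q}$ gives the same result.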
-\parinterval 结合相对熵公式可知，交叉熵是KL距离公式中的右半部分。因此，当概率分布$\funp{P}(x)$固定时，求关于$\funp{Q}$的交叉熵的最小值等价于求KL距离的最小值。从实践的角度来说，交叉熵与KL距离的目的相同：都是用来描述两个分布的差异，由于交叉熵计算上更加直观方便，因此在机器翻译中被广泛应用。
+\parinterval 结合相对熵公式可知，交叉熵是KL距离公式中的右半部分。因此，当概率分布$\funp{P}(x)$固定时，求关于$\funp{Q}$的交叉熵的最小值等价于求KL距离的最小值。从实践的角度来说，交叉熵与KL距离的目的相同：都是用来描述两个分布的差异。由于交叉熵计算上更加直观方便，因此在机器翻译中被广泛应用。
 %----------------------------------------------------------------------------------------
 %    NEW SECTION
@@ -336,7 +318,7 @@ F(x)=\int_{-\infty}^x f(x)\textrm{d}x
 \end{figure}
 %-------------------------------------------
-\parinterval 此时玩家的胜利似乎只能来源于运气。不过，这里的假设“随便选一个数字”本身就是一个概率模型，它对骰子的六个面的出现做了均匀分布假设。
+\parinterval 此时玩家的胜利似乎只能来源于运气。不过，这里的假设“随便选一个数字，获胜的概率是一样的”本身就是一个概率模型，它对骰子的六个面的出现做了均匀分布假设：
 \begin{eqnarray}
 \funp{P}(\text{1})=\funp{P}(\text{2})= \ldots =\funp{P}(\text{5})=\funp{P}(\text{6})=1/6
 \label{eq:2-17}
@@ -448,7 +430,7 @@ F(x)=\int_{-\infty}^x f(x)\textrm{d}x
 \label{eq:2-20}
 \end{eqnarray}
-\noindent 其中，$V$为词汇表。本质上，这个方法和计算单词出现概率$\funp{P}(w_i)$的方法是一样的。但是这里的问题是：当$m$较大时，词串$w_1 w_2 \ldots w_m$可能非常低频，甚至在数据中没有出现过。这时，由于$\textrm{count}(w_1 w_2 \ldots w_m) \approx 0$，公式\ref{eq:seq-mle}的结果会不准确，甚至产生0概率的情况。这是观测低频事件时经常出现的问题。对于这个问题，另一种概思路是对多个联合出现的事件进行独立性假设，这里可以假设$w_1$、$w_2\ldots w_m$的出现是相互独立的，于是
+\noindent 其中，$V$为词汇表。本质上，这个方法和计算单词出现概率$\funp{P}(w_i)$的方法是一样的。但是这里的问题是：当$m$较大时，词串$w_1 w_2 \ldots w_m$可能非常低频，甚至在数据中没有出现过。这时，由于$\textrm{count}(w_1 w_2 \ldots w_m) \approx 0$，公式\ref{eq:seq-mle}的结果会不准确，甚至产生0概率的情况。这是观测低频事件时经常出现的问题。对于这个问题，另一种思路是对多个联合出现的事件进行独立性假设，这里可以假设$w_1$、$w_2\ldots w_m$的出现是相互独立的，于是：
 \begin{eqnarray}
 \funp{P}(w_1 w_2 \ldots w_m) & = & \funp{P}(w_1) \funp{P}(w_2) \ldots \funp{P}(w_m) \label{eq:seq-independ}
 \label{eq:2-21}
@@ -481,7 +463,7 @@ F(x)=\int_{-\infty}^x f(x)\textrm{d}x
 \end{definition}
 %-------------------------------------------
-\parinterval 直接求$\funp{P}(w_1 w_2 \ldots w_m)$并不简单，因为如果把整个词串$w_1 w_2 \ldots w_m$作为一个变量，模型的参数量会非常大。$w_1 w_2 \ldots w_m$有$|V|^m$种可能性，这里$|V|$表示词汇表大小。显然，当$m$ 增大时，模型的复杂度会急剧增加，甚至都无法进行存储和计算。既然把$w_1 w_2 \ldots w_m$作为一个变量不好处理，就可以考虑对这个序列的生成过程进行分解。使用链式法则（见\ref{sec:chain-rule} 节），很容易得到
+\parinterval 直接求$\funp{P}(w_1 w_2 \ldots w_m)$并不简单，因为如果把整个词串$w_1 w_2 \ldots w_m$作为一个变量，模型的参数量会非常大。$w_1 w_2 \ldots w_m$有$|V|^m$种可能性，这里$|V|$表示词汇表大小。显然，当$m$ 增大时，模型的复杂度会急剧增加，甚至都无法进行存储和计算。既然把$w_1 w_2 \ldots w_m$作为一个变量不好处理，就可以考虑对这个序列的生成过程进行分解。使用链式法则（见\ref{sec:chain-rule} 节），很容易得到：
 \begin{eqnarray}
 \funp{P}(w_1 w_2 \ldots w_m)=\funp{P}(w_1)\funp{P}(w_2|w_1)\funp{P}(w_3|w_1 w_2) \ldots \funp{P}(w_m|w_1 w_2 \ldots w_{m-1})
 \label{eq:2-22}
@@ -515,7 +497,7 @@ F(x)=\int_{-\infty}^x f(x)\textrm{d}x
 \end{center}
 %------------------------------------------------------
-\parinterval 可以看到，1-gram语言模型只是$n$-gram语言模型的一种特殊形式。基于独立性假设，1-gram假定当前单词出现与否与任何历史都无关，这种方法大大化简了求解句子概率的复杂度。比如，上一节中公式\ref{eq:seq-independ}就是一个1-gram语言模型。但是，句子中的单词并非完全相互独立的，这种独立性假设并不能完美的描述客观世界的问题。如果需要更精确地获取句子的概率，就需要使用更长的“历史”信息，比如，2-gram、3-gram、甚至更高阶的语言模型。
+\parinterval 可以看到，1-gram语言模型只是$n$-gram语言模型的一种特殊形式。基于独立性假设，1-gram假定当前单词出现与否与任何历史都无关，这种方法大大化简了求解句子概率的复杂度。比如，上一节中公式\ref{eq:seq-independ}就是一个1-gram语言模型。但是，句子中的单词并非完全相互独立的，这种独立性假设并不能完美地描述客观世界的问题。如果需要更精确地获取句子的概率，就需要使用更长的“历史”信息，比如，2-gram、3-gram、甚至更高阶的语言模型。
 \parinterval $n$-gram的优点在于，它所使用的历史信息是有限的，即$n-1$个单词。这种性质也反映了经典的马尔可夫链的思想\upcite{liuke-markov-2004,resnick1992adventures}，有时也被称作马尔可夫假设或者马尔可夫属性。因此$n$-gram也可以被看作是变长序列上的一种马尔可夫模型，比如，2-gram语言模型对应着1阶马尔可夫模型，3-gram语言模型对应着2阶马尔可夫模型，以此类推。
@@ -523,7 +505,7 @@ F(x)=\int_{-\infty}^x f(x)\textrm{d}x
 \begin{itemize}
 \vspace{0.5em}
-\item {\small\bfnew{基于频次的方法}}\index{基于频次的方法}。直接利用词序列在训练数据中出现的频次计算出$\funp{P}(w_m|w_{m-n+1}$\\$ \ldots  w_{m-1})$
+\item {\small\bfnew{基于频次的方法}}\index{基于频次的方法}。直接利用词序列在训练数据中出现的频次计算出$\funp{P}(w_m|w_{m-n+1}$\\$ \ldots  w_{m-1})$：
 \begin{eqnarray}
 \funp{P}(w_m|w_{m-n+1} \ldots w_{m-1})=\frac{\textrm{count}(w_{m-n+1} \ldots w_m)}{\textrm{count}(w_{m-n+1} \ldots w_{m-1})}
 \label{eq:2-24}
@@ -537,11 +519,12 @@ F(x)=\int_{-\infty}^x f(x)\textrm{d}x
 \end{itemize}
 \vspace{0.5em}
-\parinterval 极大似然估计方法（基于频次的方法）和掷骰子游戏中介绍的统计词汇概率的方法是一致的，它的核心是使用$n$-gram出现的频次进行参数估计。基于人工神经网络的方法在近些年也非常受关注，它直接利用多层神经网络对问题的输入$w_{m-n+1} \ldots w_{m-1}$和输出$\funp{P}(w_m|w_{m-n+1}  \ldots  w_{m-1})$进行建模，而模型的参数通过网络中神经元之间连接的权重进行体现。严格意义上了来说，基于人工神经网络的方法并不算基于$n$-gram的方法，或者说它并没有显性记录$n$-gram的生成概率，也不依赖$n$-gram的频次进行参数估计。为了保证内容的连贯性，接下来仍以传统$n$-gram语言模型为基础进行讨论，基于人工神经网络的方法将会在{\chapternine}进行详细介绍。
+\parinterval 极大似然估计方法（基于频次的方法）和掷骰子游戏中介绍的统计词汇概率的方法是一致的，它的核心是使用$n$-gram出现的频次进行参数估计。基于人工神经网络的方法在近些年也非常受关注，它直接利用多层神经网络对问题的输入$w_{m-n+1} \ldots w_{m-1}$和输出$\funp{P}(w_m|w_{m-n+1}  \ldots  w_{m-1})$进行建模，而模型的参数通过网络中神经元之间连接的权重进行体现。严格来说，基于人工神经网络的方法并不算基于$n$-gram的方法，或者说它并没有显性记录$n$-gram的生成概率，也不依赖$n$-gram的频次进行参数估计。为了保证内容的连贯性，接下来仍以传统$n$-gram语言模型为基础进行讨论，基于人工神经网络的方法将会在{\chapternine}进行详细介绍。
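The count-based estimate can be sketched for the 2-gram case as follows. The three-sentence corpus is made up, and the function names are illustrative. As in the formula above, the denominator is the raw count of the history, and the first word is scored with its unigram relative frequency.

```python
from collections import Counter
from fractions import Fraction

def train_bigram(sentences):
    """Count unigrams and bigrams over tokenized sentences."""
    unigrams, bigrams = Counter(), Counter()
    for words in sentences:
        unigrams.update(words)
        bigrams.update(zip(words, words[1:]))
    return unigrams, bigrams

def bigram_prob(unigrams, bigrams, prev, word):
    """P(word | prev) = count(prev word) / count(prev); 0 if prev is unseen."""
    if unigrams[prev] == 0:
        return Fraction(0)
    return Fraction(bigrams[(prev, word)], unigrams[prev])

def sentence_prob(unigrams, bigrams, words):
    """P(w_1) * product over i of P(w_i | w_{i-1})."""
    prob = Fraction(unigrams[words[0]], sum(unigrams.values()))
    for prev, word in zip(words, words[1:]):
        prob *= bigram_prob(unigrams, bigrams, prev, word)
    return prob

# Hypothetical toy corpus, already segmented into words.
corpus = [["确实", "现在", "数据", "很", "多"],
          ["现在", "数据", "很", "多"],
          ["数据", "很", "少"]]
unigrams, bigrams = train_bigram(corpus)

print(sentence_prob(unigrams, bigrams, ["确实", "现在", "数据", "很", "多"]))  # 1/18
print(sentence_prob(unigrams, bigrams, ["确实", "数据"]))                      # 0
```

The second call returns 0 because the 2-gram never occurs in the corpus, which is the zero-probability problem that smoothing is meant to address.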
 \parinterval $n$-gram语言模型的使用非常简单。可以直接用它来对词序列出现的概率进行计算。比如，可以使用一个2-gram语言模型计算一个句子出现的概率，其中单词之间用斜杠分隔，如下：
 \begin{eqnarray}
- & &\funp{P}_{2-\textrm{gram}}{(\textrm{确实/现在/数据/很多})} \nonumber \\
+ & &\funp{P}_{2-\textrm{gram}}{(\textrm{确实/现在/数据/很
+/多})} \nonumber \\
 &= & \funp{P}(\textrm{确实}) \times \funp{P}(\textrm{现在}|\textrm{确实})\times \funp{P}(\textrm{数据}|\textrm{现在}) \times \nonumber \\
 &  & \funp{P}(\textrm{很}|\textrm{数据})\times \funp{P}(\textrm{多}|\textrm{很})
 \label{eq:2-25}
@@ -555,9 +538,9 @@ F(x)=\int_{-\infty}^x f(x)\textrm{d}x
 \subsection{参数估计和平滑算法}
-对于$n$-gram语言模型，每个$\funp{P}(w_m|w_{m-n+1} \ldots w_{m-1})$都可以被看作是模型的{\small\bfnew{参数}}\index{参数}（Parameter\index{参数}）。而$n$-gram语言模型的一个核心任务是估计这些参数的值，即{\small\bfnew{参数估计}}\index{参数估计}（Parameter Estimation\index{Parameter Estimation}）。通常，参数估计可以通过在数据上的统计得到。一种简单的方法是：给定一定数量的句子，统计每个$n$-gram 出现的频次，并利用公式\ref{eq:2-24}得到每个参数$\funp{P}(w_m|w_{m-n+1} \ldots w_{m-1})$的值。这个过程也被称作模型的{\small\bfnew{训练}}\index{训练}（Training\index{训练}）。对于自然语言处理任务来说，统计模型的训练是至关重要的。在本书后面的内容中也会看到，不同的问题可能需要不同的模型以及不同的模型训练方法。而很多研究工作也都集中在优化模型训练的效果上。
+对于$n$-gram语言模型，每个$\funp{P}(w_m|w_{m-n+1} \ldots w_{m-1})$都可以被看作是模型的{\small\bfnew{参数}}\index{参数}（Parameter\index{Parameter}）。而$n$-gram语言模型的一个核心任务是估计这些参数的值，即参数估计。通常，参数估计可以通过在数据上的统计得到。一种简单的方法是：给定一定数量的句子，统计每个$n$-gram 出现的频次，并利用公式\ref{eq:2-24}得到每个参数$\funp{P}(w_m|w_{m-n+1} \ldots w_{m-1})$的值。这个过程也被称作模型的{\small\bfnew{训练}}\index{训练}（Training\index{Training}）。对于自然语言处理任务来说，统计模型的训练是至关重要的。在本书后面的内容中也会看到，不同的问题可能需要不同的模型以及不同的模型训练方法，并且很多研究工作也都集中在优化模型训练的效果上。
-\parinterval 回到$n$-gram语言模型上。前面所使用的参数估计方法并不完美，因为它无法很好的处理低频或者未见现象。比如，在式\ref{eq:2-25}所示的例子中，如果语料中从没有“确实”和“现在”两个词连续出现的情况，即$\textrm{count}(\textrm{确实}\ \textrm{现在})=0$。 那么使用2-gram 计算句子“确实/现在/数据/很多”的概率时，会出现如下情况
+\parinterval 回到$n$-gram语言模型上。前面所使用的参数估计方法并不完美，因为它无法很好地处理低频或者未见现象。比如，在式\ref{eq:2-25}所示的例子中，如果语料中从没有“确实”和“现在”两个词连续出现的情况，即$\textrm{count}(\textrm{确实}\ \textrm{现在})=0$，那么使用2-gram 计算句子“确实/现在/数据/很多”的概率时，会出现如下情况：
 \begin{eqnarray}
 \funp{P}(\textrm{现在}|\textrm{确实}) & =  & \frac{\textrm{count}(\textrm{确实}\ \textrm{现在})}{\textrm{count}(\textrm{确实})} \nonumber \\
                                                                     & =  & \frac{0}{\textrm{count}(\textrm{确实})} \nonumber \\
@@ -595,9 +578,9 @@ F(x)=\int_{-\infty}^x f(x)\textrm{d}x
 \label{eq:2-27}
 \end{eqnarray}
-\noindent 其中，$V$表示词表，$|V|$为词表中单词的个数，$w$为词表中的一个词。有时候，加法平滑方法会将$\theta$取1，这时称之为加一平滑或是拉普拉斯平滑。这种方法比较容易理解，也比较简单，因此也往往被用于对系统的快速原型中。
+\noindent 其中，$V$表示词表，$|V|$为词表中单词的个数，$w$为词表中的一个词，count表示统计单词或短语出现的次数。有时候，加法平滑方法会将$\theta$取1，这时称之为加一平滑或是拉普拉斯平滑。这种方法比较容易理解，也比较简单，因此也往往被用于系统的快速原型实现中。
-\parinterval 举一个例子。假设在一个英语文档中随机采样一些单词（词表大小$|V|=20$），各个单词出现的次数为：“look”: 4，“people”: 3，“am”: 2，“what”: 1，“want”: 1，“do”: 1。图\ref{fig:2-12} 给出了在平滑之前和平滑之后的概率分布。
+\parinterval 举一个例子。假设在一个英语文档中随机采样一些单词（词表大小$|V|=20$），各个单词出现的次数为：“look”出现4次，“people”出现3次，“am”出现2次，“what”出现1次，“want”出现1次，“do”出现1次。图\ref{fig:2-12} 给出了在平滑之前和平滑之后的概率分布。
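The numbers in this example can be reproduced with a short sketch. It assumes the usual unigram form of additive smoothing, $(\textrm{count}(w)+\theta)/(N+\theta|V|)$ with $N$ the total number of sampled words, since the exact equation is outside this hunk. `additive_smoothing` is an illustrative name.

```python
from fractions import Fraction

counts = {"look": 4, "people": 3, "am": 2, "what": 1, "want": 1, "do": 1}
vocab_size = 20  # |V| from the example; 14 words were never sampled

def additive_smoothing(count, total, vocab_size, theta=1):
    """(count + theta) / (total + theta * |V|); theta = 1 gives add-one smoothing."""
    return Fraction(count + theta, total + theta * vocab_size)

total = sum(counts.values())  # 12 sampled words

print(Fraction(counts["look"], total))                        # 1/3, before smoothing
print(additive_smoothing(counts["look"], total, vocab_size))  # 5/32, after smoothing
print(additive_smoothing(0, total, vocab_size))               # 1/32, for each unseen word

# The smoothed values still form a distribution over all 20 words.
seen_mass = sum(additive_smoothing(c, total, vocab_size) for c in counts.values())
unseen_mass = (vocab_size - len(counts)) * additive_smoothing(0, total, vocab_size)
assert seen_mass + unseen_mass == 1
```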
 %----------------------------------------------
 \begin{figure}[htp]
@@ -617,25 +600,25 @@ F(x)=\int_{-\infty}^x f(x)\textrm{d}x
 \vspace{-0.5em}
 \parinterval {\small\bfnew{古德-图灵估计法}}\index{古德-图灵估计法}（Good-Turing Estimate）\index{Good-Turing Estimate}是Alan Turing和他的助手Irving John Good开发的，作为他们在二战期间破解德国密码机Enigma所使用的方法的一部分，在1953 年Irving John Good将其发表。这一方法也是很多平滑算法的核心，其基本思路是：把非零的$n$元语法单元的概率降低匀给一些低概率$n$元语法单元，以减小最大似然估计与真实概率之间的偏离\upcite{good1953population,gale1995good}。
-\parinterval 假定在语料库中出现$r$次的$n$-gram有$n_r$个，特别的，出现0次的$n$-gram（即未登录词及词串）出现的次数为$n_0$个。语料库中全部单词的总个数为$N$，显然
+\parinterval 假定在语料库中出现$r$次的$n$-gram有$n_r$个，特别地，出现0次的$n$-gram（即未登录词及词串）有$n_0$个。语料库中全部单词的总个数为$N$，显然：
 \begin{eqnarray}
 N = \sum_{r=1}^{\infty}{r\,n_r}
 \label{eq:2-28}
 \end{eqnarray}
-\parinterval 这时，出现$r$次的$n$-gram的相对频率为$r/N$，也就是不做平滑处理时的概率估计。为了解决零概率问题，对于任何一个出现$r$次的$n$-gram，古德-图灵估计法利用出现$r+1$次的$n$-gram统计量重新假设它出现$r^*$次，这里
+\parinterval 这时，出现$r$次的$n$-gram的相对频率为$r/N$，也就是不做平滑处理时的概率估计。为了解决零概率问题，对于任何一个出现$r$次的$n$-gram，古德-图灵估计法利用出现$r+1$次的$n$-gram统计量重新假设它出现$r^*$次：
 \begin{eqnarray}
 r^* = (r + 1)\frac{n_{r + 1}}{n_r}
 \label{eq:2-29}
 \end{eqnarray}
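The adjusted count $r^*$ can be computed directly from the count-of-counts $n_r$. This is only a sketch of the plain formula: values of $r$ with $n_{r+1}=0$ are skipped here, which is a simplification chosen for the example. The word counts are hypothetical.

```python
from collections import Counter
from fractions import Fraction

def good_turing_adjusted_counts(ngram_counts):
    """Map each observed count r to r* = (r + 1) * n_{r+1} / n_r."""
    n = Counter(ngram_counts.values())  # n[r] = number of n-grams seen exactly r times
    return {r: Fraction((r + 1) * n[r + 1], n[r])
            for r in sorted(n) if n[r + 1] > 0}

# Hypothetical counts: n_1 = 3, n_2 = 1, n_3 = 1, n_4 = 1.
ngram_counts = {"look": 4, "people": 3, "am": 2, "what": 1, "want": 1, "do": 1}
adjusted = good_turing_adjusted_counts(ngram_counts)

print(adjusted[1])  # 2/3
print(adjusted[2])  # 3

# Probability mass reserved for all unseen n-grams together: n_1 / N.
n_1 = sum(1 for c in ngram_counts.values() if c == 1)
unseen_mass = Fraction(n_1, sum(ngram_counts.values()))
print(unseen_mass)  # 1/4
```

On such a tiny sample the adjusted counts are not monotone, which is why practical implementations usually smooth the $n_r$ values first.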
-\parinterval 基于这个公式，就可以估计所有0次$n$-gram的频次$n_0 r^*=(r+1)n_1=n_1$。要把这个重新估计的统计数转化为概率，需要进行归一化处理：对于每个统计数为$r$的事件，其概率为
+\parinterval 基于这个公式，就可以估计所有0次$n$-gram的频次$n_0 r^*=(r+1)n_1=n_1$。要把这个重新估计的统计数转化为概率，需要进行归一化处理：对于每个统计数为$r$的事件，其概率为：
 \begin{eqnarray}
 \funp{P}_r=\frac{r^*}{N}
 \label{eq:2-30}
 \end{eqnarray}
-\noindent 其中
+\noindent 其中：
 \begin{eqnarray}
 N & = & \sum_{r=0}^{\infty}{r^{*}n_r} \nonumber \\
  & = & \sum_{r=0}^{\infty}{(r + 1)n_{r + 1}} \nonumber \\
@@ -687,11 +670,11 @@ N & = & \sum_{r=0}^{\infty}{r^{*}n_r} \nonumber \\
 \parinterval 首先介绍一下Absolute Discounting平滑算法，公式如下所示：
 \begin{eqnarray}
-\funp{P}_{\textrm{AbsDiscount}}(w_i | w_{i-1}) = \frac{c(w_{i-1},w_i )-d}{c(w_{i-1})} + \lambda(w_{i-1})\funp{P}(w)
+\funp{P}_{\textrm{AbsDiscount}}(w_i | w_{i-1}) = \frac{c(w_{i-1},w_i )-d}{c(w_{i-1})} + \lambda(w_{i-1})\funp{P}(w_{i})
 \label{eq:2-33}
 \end{eqnarray}
-\noindent 其中$d$表示被裁剪的值，$\lambda$是一个正则化常数。可以看到第一项是经过减值调整过的2-gram的概率值，第二项则相当于一个带权重$\lambda$的1-gram的插值项。然而这种插值模型极易受到原始1-gram 模型的干扰。
+\noindent 其中$d$表示被裁剪的值，$\lambda$是一个正则化常数，$c(\cdot)$是count$(\cdot)$的缩写。可以看到第一项是经过减值调整过的2-gram的概率值，第二项则相当于一个带权重$\lambda$的1-gram的插值项。然而这种插值模型极易受到原始1-gram 模型的干扰。
 \parinterval 假设这里使用2-gram和1-gram的插值模型预测下面句子中下划线处的词
@@ -707,29 +690,29 @@ I cannot see without my reading \underline{\ \ \ \ \ \ \ \ }
 \parinterval 为了评估$\funp{P}_{\textrm{cont}}$，统计使用当前词作为第二个词所出现2-gram的种类，2-gram法种类越多，这个词作为第二个词出现的可能性越高，呈正比：
 \begin{eqnarray}
-\funp{P}_{\textrm{cont}}(w_i) \varpropto |w_{i-1}: c(w_{i-1} w_i )>0|
+\funp{P}_{\textrm{cont}}(w_i) \varpropto |w_{i-1}: c(w_{i-1},w_i )>0|
 \label{eq:2-34}
 \end{eqnarray}
-通过全部的二元语法的种类做归一化可得到评估的公式
+通过全部的二元语法的种类做归一化可得到评估的公式：
 \begin{eqnarray}
-\funp{P}_{\textrm{cont}}(w_i) = \frac{|\{ w_{i-1}:c(w_{i-1} w_i )>0 \}|}{|\{ (w_{j-1}, w_j):c(w_{j-1}w_j )>0 \}|}
+\funp{P}_{\textrm{cont}}(w_i) = \frac{|\{ w_{i-1}:c(w_{i-1},w_i )>0 \}|}{|\{ (w_{j-1}, w_j):c(w_{j-1},w_j )>0 \}|}
 \label{eq:2-35}
 \end{eqnarray}
-\parinterval 基于分母的变化还有另一种形式
+\parinterval 基于分母的变化还有另一种形式：
 \begin{eqnarray}
-\funp{P}_{\textrm{cont}}(w_i) = \frac{|\{ w_{i-1}:c(w_{i-1} w_i )>0 \}|}{\sum_{w^{\prime}}|\{ w_{i-1}^{\prime}:c(w_{i-1}^{\prime} w_i^{\prime} )>0 \}|}
+\funp{P}_{\textrm{cont}}(w_i) = \frac{|\{ w_{i-1}:c(w_{i-1},w_i )>0 \}|}{\sum_{w^{\prime}_{i}}|\{ w_{i-1}^{\prime}:c(w_{i-1}^{\prime},w_i^{\prime} )>0 \}|}
 \label{eq:2-36}
 \end{eqnarray}
-结合基础的Absolute discounting计算公式，从而得到了Kneser-Ney平滑方法的公式
+结合基础的Absolute discounting计算公式，从而得到了Kneser-Ney平滑方法的公式：
 \begin{eqnarray}
 \funp{P}_{\textrm{KN}}(w_i|w_{i-1}) = \frac{\max(c(w_{i-1},w_i )-d,0)}{c(w_{i-1})}+ \lambda(w_{i-1})\funp{P}_{\textrm{cont}}(w_i)
 \label{eq:2-37}
 \end{eqnarray}
-\noindent 其中
+\noindent 其中：
 \begin{eqnarray}
 \lambda(w_{i-1}) = \frac{d}{c(w_{i-1})}|\{w:c(w_{i-1},w)>0\}|
 \label{eq:2-38}
@@ -737,14 +720,14 @@ I cannot see without my reading \underline{\ \ \ \ \ \ \ \ }
 \noindent 这里$\max(\cdot)$保证了分子部分为不小于0的数，原始1-gram更新成$\funp{P}_{\textrm{cont}}$概率分布，$\lambda$是正则化项。
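The 2-gram Kneser-Ney formula, together with $\funp{P}_{\textrm{cont}}$ and $\lambda$, can be sketched as below. The sketch assumes that $c(w_{i-1})$ is the sum of the 2-gram counts starting with $w_{i-1}$, that the history was seen in training, and that $d=0.75$. The counts are made up and the function name is illustrative.

```python
from collections import Counter
from fractions import Fraction

def kneser_ney_bigram(bigram_counts, d=Fraction(3, 4)):
    """Return prob(word, prev), an interpolated Kneser-Ney estimate for 2-grams."""
    context_count = Counter()  # c(w_{i-1})
    followers = Counter()      # |{w : c(w_{i-1}, w) > 0}|
    predecessors = Counter()   # |{w_{i-1} : c(w_{i-1}, w_i) > 0}|
    for (prev, word), c in bigram_counts.items():
        if c > 0:
            context_count[prev] += c
            followers[prev] += 1
            predecessors[word] += 1
    bigram_types = sum(followers.values())  # number of distinct 2-grams seen

    def prob(word, prev):
        c = Fraction(bigram_counts.get((prev, word), 0))
        discounted = max(c - d, Fraction(0)) / context_count[prev]
        lam = d * followers[prev] / context_count[prev]
        p_cont = Fraction(predecessors[word], bigram_types)
        return discounted + lam * p_cont

    return prob

# Hypothetical counts: "Francisco" is frequent but follows only one word.
bigram_counts = {("reading", "glasses"): 2, ("reading", "books"): 1,
                 ("my", "glasses"): 1, ("San", "Francisco"): 5}
prob = kneser_ney_bigram(bigram_counts)

print(prob("glasses", "reading"))    # 2/3
print(prob("Francisco", "reading"))  # 1/8
```

Although "Francisco" has the highest raw count, it receives little probability after "reading", because its continuation probability depends on how many distinct words precede it rather than on how often it occurs.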
-\parinterval 为了更具普适性，不仅局限为2-gram和1-gram的插值模型，利用递归的方式可以得到更通用的Kneser-Ney平滑公式
+\parinterval 为了更具普适性，不仅局限为2-gram和1-gram的插值模型，利用递归的方式可以得到更通用的Kneser-Ney平滑公式：
 \begin{eqnarray}
-\funp{P}_{\textrm{KN}}(w_i|w_{i-n+1}  \ldots w_{i-1}) & = & \frac{\max(c_{\textrm{KN}}(w_{i-n+1} \ldots w_{i-1})-d,0)}{c_{\textrm{KN}}(w_{i-n+1} \ldots w_{i-1})} + \nonumber \\
+\funp{P}_{\textrm{KN}}(w_i|w_{i-n+1}  \ldots w_{i-1}) & = & \frac{\max(c_{\textrm{KN}}(w_{i-n+1} \ldots w_{i})-d,0)}{c_{\textrm{KN}}(w_{i-n+1} \ldots w_{i-1})} + \nonumber \\
                                                   &   &  \lambda(w_{i-n+1} \ldots w_{i-1})\funp{P}_{\textrm{KN}}(w_i|w_{i-n+2} \ldots w_{i-1})
 \label{eq:2-39}
 \end{eqnarray}
 \begin{eqnarray}
-\lambda(w_{i-1}) =  \frac{d}{c_{\textrm{KN}}(w_{i-n+1}^{i-1})}|\{w:c_{\textrm{KN}}(w_{i-n+1} \ldots w_{i-1}w)>0\}
+\lambda(w_{i-n+1} \ldots w_{i-1}) =  \frac{d}{c_{\textrm{KN}}(w_{i-n+1}^{i-1})}|\{w:c_{\textrm{KN}}(w_{i-n+1} \ldots w_{i-1},w)>0\}|
 \label{eq:2-40}
 \end{eqnarray}
 \begin{eqnarray}
@@ -769,7 +752,7 @@ c_{\textrm{KN}}(\cdot) = \left\{\begin{array}{ll}
 \begin{itemize}
 \vspace{0.5em}
-\item {\small\bfnew{训练}}\index{训练}（Training\index{Training}）：从训练数据上估计出语言模型的参数；
+\item {\small\bfnew{训练}}\index{训练}（Training\index{Training}）：从训练数据上估计出语言模型的参数。
 \vspace{0.5em}
 \item {\small\bfnew{预测}}\index{预测}（Prediction\index{Prediction}）：用训练好的语言模型对新输入的句子进行概率评估，或者生成新的句子。
 \vspace{0.5em}
@@ -779,7 +762,7 @@ c_{\textrm{KN}}(\cdot) = \left\{\begin{array}{ll}
 \begin{itemize}
 \vspace{0.5em}
-\item 预测输入句子的可能性。比如，有如下两个句子，
+\item 预测输入句子的可能性。比如，有如下两个句子
 \vspace{0.8em}
 \hspace{10em} The boy caught the cat.
@@ -821,9 +804,9 @@ c_{\textrm{KN}}(\cdot) = \left\{\begin{array}{ll}
 \noindent 这里$\arg$即argument（参数），$\argmax_x f(x)$表示返回使$f(x)$达到最大的$x$。$\argmax_{w \in \chi}$\\$\funp{P}(w)$表示找到使语言模型得分$\funp{P}(w)$达到最大的单词序列$w$。$\chi$ 是搜索问题的解空间，它是所有可能的单词序列$w$的集合。$\hat{w}$可以被看做该搜索问题中的“最优解”，即概率最大的单词序列。
-\parinterval 在序列生成任务中，最简单的策略就是对词表中的词汇进行任意组合，通过这种枚举的方式得到全部可能的序列。但是，很多时候并生成序列的长度是无法预先知道的。比如，机器翻译中目标语序列的长度是任意的。那么怎样判断一个序列何时完成了生成过程呢？这里借用人类书写中文和英文的过程：句子的生成首先从一片空白开始，然后从左到右逐词生成，除了第一个单词，所有单词的生成都依赖于前面已经生成的单词。为了方便计算机实现，通常定义单词序列从一个特殊的符号<sos>后开始生成。同样地，一个单词序列的结束也用一个特殊的符号<eos>来表示。
+\parinterval 在序列生成任务中，最简单的策略就是对词表中的词汇进行任意组合，通过这种枚举的方式得到全部可能的序列。但是，很多时候生成序列的长度是无法预先知道的。比如，机器翻译中目标语序列的长度是任意的。那么怎样判断一个序列何时完成了生成过程呢？这里借用现代人类书写中文和英文的过程：句子的生成首先从一片空白开始，然后从左到右逐词生成，除了第一个单词，所有单词的生成都依赖于前面已经生成的单词。为了方便计算机实现，通常定义单词序列从一个特殊的符号<sos>后开始生成。同样地，一个单词序列的结束也用一个特殊的符号<eos>来表示。
-\parinterval 对于一个序列$<$sos$>$\ \ I\ \ agree\ \ $<$eos$>$，图\ref{fig:2-13}展示语言模型视角下该序列的生成过程。该过程通过在序列的末尾不断附加词表中的单词来逐渐扩展序列，直到这段序列结束。这种生成单词序列的过程被称作{\small\bfnew{自左向右生成}}\index{自左向右生成}（Left-to-right Generation）\index{Left-to-right Generation}。注意，这种序列生成策略与$n$-gram的思想天然契合，因为$n$-gram语言模型中，每个词的生成概率依赖前面（左侧）若干词，因此$n$-gram语言模型也是一种自左向右的计算模型。
+\parinterval 对于一个序列$<$sos$>$\ I\ agree\ $<$eos$>$，图\ref{fig:2-13}展示了语言模型视角下该序列的生成过程。该过程通过在序列的末尾不断附加词表中的单词来逐渐扩展序列，直到这段序列结束。这种生成单词序列的过程被称作{\small\bfnew{自左向右生成}}\index{自左向右生成}（Left-to-right Generation）\index{Left-to-right Generation}。注意，这种序列生成策略与$n$-gram的思想天然契合，因为$n$-gram语言模型中，每个词的生成概率依赖前面（左侧）若干词，因此$n$-gram语言模型也是一种自左向右的计算模型。
 %----------------------------------------------
 \begin{figure}[htp]
@@ -836,17 +819,17 @@ c_{\textrm{KN}}(\cdot) = \left\{\begin{array}{ll}
 \parinterval 在这种序列生成方式的基础上，实现搜索通常有两种方法\ \dash\ 深度优先遍历和宽度优先遍历\upcite{DBLP:books/mg/CormenLR89}。在深度优先遍历中，每次从词表中可重复地选择一个单词，然后从左至右地生成序列，直到<eos>被选择，此时一个完整的单词序列被生成出来。然后从<eos>回退到上一个单词，选择之前词表中未被选择到的候选单词代替<eos>，并继续挑选下一个单词直到<eos>被选到，如果上一个单词的所有可能都被枚举过，那么回退到上上一个单词继续枚举，直到回退到<sos>，这时候枚举结束。在宽度优先遍历中，每次不是只选择一个单词，而是枚举所有单词。
-有一个简单的例子。假设词表只含两个单词\{a, b\}，从<sos>开始枚举所有单词，有三种可能：
+有一个简单的例子。假设词表只含两个单词$\{a, b\}$，从<sos>开始枚举所有候选，有三种可能：
 \begin{eqnarray}
-\text{\{<sos> a, <sos> b, <sos> <eos>\}} \nonumber
+\{\text{<sos>}\ a, \text{<sos>}\ b, \text{<sos>}\ \text{<eos>}\} \nonumber
 \end{eqnarray}
-\noindent 其中可以划分成长度为0的完整的单词序列集合\{<sos> <eos>\}和长度为1的未结束的单词序列片段集合\{<sos> a, <sos> b\}，然后下一步对未结束的单词序列枚举词表中的所有单词，可以生成：
+\noindent 其中可以划分成长度为0的完整的单词序列集合$\{\text{<sos>}\ \text{<eos>}\}$和长度为1的未结束的单词序列片段集合$\{\text{<sos>}\ a, \text{<sos>}\ b\}$，然后下一步对未结束的单词序列枚举词表中的所有单词，可以生成：
 \begin{eqnarray}
-\text{\{<sos> a a, <sos> a b, <sos> a <eos>, <sos> b a, <sos> b b, <sos> b <eos>\}} \nonumber
+\{\text{<sos>}\ a\ a, \text{<sos>}\ a\ b, \text{<sos>}\ a\ \text{<eos>}, \text{<sos>}\ b\ a, \text{<sos>}\ b\ b, \text{<sos>}\ b\ \text{<eos>}\} \nonumber
 \end{eqnarray}
-\parinterval 此时可以划分出长度为1的完整单词序列集合\{<sos> a <eos>, <sos> b <eos>\}，以及长度为2的未结束单词序列片段集合\{<sos> a a, <sos> a b, <sos> b a, <sos> b b\}。以此类推，继续生成未结束序列，直到单词序列的长度达到所允许的最大长度。
+\parinterval 此时可以划分出长度为1的完整单词序列集合$\{\text{<sos>}\ a\ \text{<eos>}, \text{<sos>}\ b\ \text{<eos>}\}$，以及长度为2的未结束单词序列片段集合$\{\text{<sos>}\ a\ a, \text{<sos>}\ a\ b, \text{<sos>}\ b\ a, \text{<sos>}\ b\ b\}$。以此类推，继续生成未结束序列，直到单词序列的长度达到所允许的最大长度。
 \parinterval 对于这两种搜索算法，通常可以从以下四个方面评价：
@@ -874,8 +857,8 @@ c_{\textrm{KN}}(\cdot) = \left\{\begin{array}{ll}
 {
 \begin{tabular}{c|c|c}
 \rule{0pt}{10pt} & 时间复杂度 & 空间复杂度\\ \hline
-\rule{0pt}{10pt} 深度优先 & $\textrm{O}({(|V|+1)}^{m-1})$ & $\textrm{O}(m)$ \\
+\rule{0pt}{10pt} 深度优先 & $O({(|V|+1)}^{m-1})$ & $O(m)$ \\
-\rule{0pt}{10pt} 宽度优先 & $\textrm{O}({(|V|+1)}^{m-1}$) & $\textrm{O}({(|V|+1)}^{m})$ \\
+\rule{0pt}{10pt} 宽度优先 & $O({(|V|+1)}^{m-1})$ & $O({(|V|+1)}^{m})$ \\
 \end{tabular}
 \label{tab:2-3}
 }
@@ -883,7 +866,7 @@ c_{\textrm{KN}}(\cdot) = \left\{\begin{array}{ll}
 }\end{table}
 %------------------------------------------------------
-\parinterval 那么是否有比枚举策略更高效的方法呢？答案是肯定的。一种直观的方法是将搜索的过程表示成树型结构，称为解空间树。它包含了搜索过程中可生成的全部序列。该树的根节点恒为$<$sos$>$，代表序列均从$<$sos$>$ 开始。该树结构中非叶子节点的兄弟节点有$|V|$个，由词表和结束符号$<$eos$>$构成。从图\ref{fig:2-14}可以看到，对于一个最大长度为4的序列的搜索过程，生成某个单词序列的过程实际上就是访问解空间树中从根节点<sos> 开始一直到叶子节点<eos>结束的某条路径，而这条的路径上节点按顺序组成了一段独特的单词序列。此时对所有可能单词序列的枚举就变成了对解空间树的遍历。并且枚举的过程与语言模型打分的过程也是一致的，每枚举一个词$i$也就是在上图选择$w_i$一列的一个节点，语言模型就可以为当前的树节点$w_i$给出一个分值，即$\funp{P}(w_i | w_1 w_2 \ldots w_{i-1})$。对于$n$-gram语言模型，这个分值$\funp{P}(w_i | w_1 w_2 \ldots w_{i-1})=\funp{P}(w_i | w_{i-n+1} \ldots w_{i-1})$
+\parinterval 那么是否有比枚举策略更高效的方法呢？答案是肯定的。一种直观的方法是将搜索的过程表示成树型结构，称为解空间树。它包含了搜索过程中可生成的全部序列。该树的根节点恒为<sos>，代表序列均从<sos> 开始。该树结构中非叶子节点的兄弟节点有$|V|$个，由词表和结束符号<eos>构成。从图\ref{fig:2-14}可以看到，对于一个最大长度为4的序列的搜索过程，生成某个单词序列的过程实际上就是访问解空间树中从根节点<sos> 开始一直到叶子节点<eos>结束的某条路径，而这条路径上的节点按顺序组成了一段独特的单词序列。此时对所有可能单词序列的枚举就变成了对解空间树的遍历。并且枚举的过程与语言模型打分的过程也是一致的，每枚举一个词$i$也就是在上图选择$w_i$一列的一个节点，语言模型就可以为当前的树节点$w_i$给出一个分值，即$\funp{P}(w_i | w_1 w_2 \ldots w_{i-1})$。对于$n$-gram语言模型，这个分值$\funp{P}(w_i | w_1 w_2 \ldots w_{i-1})=\funp{P}(w_i | w_{i-n+1} \ldots w_{i-1})$。
 %----------------------------------------------
 \begin{figure}[htp]
@@ -919,7 +902,7 @@ c_{\textrm{KN}}(\cdot) = \left\{\begin{array}{ll}
 \end{figure}
 %-------------------------------------------
-\parinterval 这样，语言模型的打分与解空间树的遍历就融合在一起了。于是，序列生成的问题可以被重新描述为：寻找所有单词序列组成的解空间树中权重总和最大的一条路径。在这个定义下，前面提到的两种枚举词序列的方法就是经典的{\small\bfnew{深度优先搜索}}\index{深度优先搜索}（Depth-first Search）\index{Depth-first Search}和{\small\bfnew{宽度优先搜索}}\index{宽度优先搜索}（Breadth-first Search）\index{Breadth-first Search}的雏形\upcite{even2011graph,tarjan1972depth}。在后面的内容中可以看到，从遍历解空间树的角度出发，可以对原始这些搜索策略的效率进行优化。
+\parinterval 这样，语言模型的打分与解空间树的遍历就融合在一起了。于是，序列生成的问题可以被重新描述为：寻找所有单词序列组成的解空间树中权重总和最大的一条路径。在这个定义下，前面提到的两种枚举词序列的方法就是经典的{\small\bfnew{深度优先搜索}}\index{深度优先搜索}（Depth-first Search）\index{Depth-first Search}和{\small\bfnew{宽度优先搜索}}\index{宽度优先搜索}（Breadth-first Search）\index{Breadth-first Search}的雏形\upcite{even2011graph,tarjan1972depth}。在后面的内容中，从遍历解空间树的角度出发，可以对这些原始搜索策略的效率进行优化。
 %----------------------------------------------------------------------------------------
 %    NEW SUB-SECTION
@@ -993,7 +976,7 @@ c_{\textrm{KN}}(\cdot) = \left\{\begin{array}{ll}
 \subsubsection{1.贪婪搜索}
-\parinterval {\small\bfnew{贪婪搜索}}\index{贪婪搜索}（Greedy Search）\index{Greedy Search}基于一种思想：当一个问题可以拆分为多个子问题时，如果一直选择子问题的最优解就能得到原问题的最优解，那么就可以不必遍历原始的解空间，而是使用这种“贪婪”的策略进行搜索。基于这种思想，它每次都优先挑选得分最高的词进行扩展，这一点与改进过的深度优先搜索类似。但是它们的区别在于，贪婪搜索在搜索到一个完整的序列，也就是搜索到<eos>即停止，而改进的深度优先搜索会遍历整个解空间。因此贪婪搜索非常高效，其时间和空间复杂度仅为$\textrm{O}(m)$，这里$m$为单词序列的长度。
+\parinterval {\small\bfnew{贪婪搜索}}\index{贪婪搜索}（Greedy Search）\index{Greedy Search}基于一种思想：当一个问题可以拆分为多个子问题时，如果一直选择子问题的最优解就能得到原问题的最优解，那么就可以不必遍历原始的解空间，而是使用这种“贪婪”的策略进行搜索。基于这种思想，它每次都优先挑选得分最高的词进行扩展，这一点与改进过的深度优先搜索类似。但是它们的区别在于，贪婪搜索在搜索到一个完整的序列，也就是搜索到<eos>即停止，而改进的深度优先搜索会遍历整个解空间。因此贪婪搜索非常高效，其时间和空间复杂度仅为$O(m)$，这里$m$为单词序列的长度。
 \parinterval 由于贪婪搜索并没有遍历整个解空间，所以该方法不保证一定能找到最优解。比如对于如图\ref{fig:2-18}所示的一个搜索结构，贪婪搜索将选择红线所示的序列，该序列的最终得分是-1.7。但是，对比图\ref{fig:2-16}可以发现，在另一条路径上有得分更高的序列“<sos>\ I\ agree\ <eos>”，它的得分为-1.5。此时贪婪搜索并没有找到最优解。这是因为贪婪搜索选择的单词只是当前步骤得分最高的，而最后生成的单词序列的得分还取决于它未生成部分的得分。因此当得分最高的单词的子树中未生成部分的得分远远小于其他子树时，贪婪搜索提供的解的质量会非常差。同样的问题可以出现在使用贪婪搜索的任意时刻。但是，即使是这样，凭借其简单的思想以及在真实问题上的效果，贪婪搜索在很多场景中仍然得到了深入应用。
@@ -1014,7 +997,7 @@ c_{\textrm{KN}}(\cdot) = \left\{\begin{array}{ll}
 \parinterval 贪婪搜索会产生质量比较差的解是由于当前单词的错误选择造成的。既然每次只挑选一个单词可能会产生错误，那么可以通过同时考虑更多候选单词来缓解这个问题，也就是对于一个位置，可以同时将其扩展到若干个节点。这样就扩大了搜索的范围，进而使得优质解被找到的概率增大。
-\parinterval 常见的做法是每一次生成新单词的时候都挑选得分最高的前$B$个单词，然后扩展这$B$个单词的$T$个孩子节点，得到$BT$条新路径，最后保留其中得分最高的$B$条路径。从另外一个角度理解，它相当于比贪婪搜索看到了更多的路径，因而它更有可能找到好的解。这个方法通常被称为{\small\bfnew{束搜索}}\index{束搜索}（Beam Search）\index{Beam Search}。图\ref{fig:2-19}展示了一个束大小为3的例子，其中束大小代表每次选择单词时保留的词数。比起贪婪搜索，束搜索在实际表现中非常优秀，它的时间、空间复杂度仅为贪婪搜索的常数倍，也就是$\textrm{O}(Bm)$。
+\parinterval 常见的做法是每一次生成新单词的时候都挑选得分最高的前$B$个单词，然后扩展这$B$个单词的$T$个孩子节点，得到$BT$条新路径，最后保留其中得分最高的$B$条路径。从另外一个角度理解，它相当于比贪婪搜索看到了更多的路径，因而它更有可能找到好的解。这个方法通常被称为{\small\bfnew{束搜索}}\index{束搜索}（Beam Search）\index{Beam Search}。图\ref{fig:2-19}展示了一个束大小为3的例子，其中束大小代表每次选择单词时保留的词数。比起贪婪搜索，束搜索在实际表现中非常优秀，它的时间、空间复杂度仅为贪婪搜索的常数倍，也就是$O(Bm)$。
 %----------------------------------------------
 \begin{figure}[htp]
@@ -1025,7 +1008,7 @@ c_{\textrm{KN}}(\cdot) = \left\{\begin{array}{ll}
 \end{figure}
 %-------------------------------------------
-\parinterval 束搜索也有很多的改进版本。回忆一下，在无信息搜索策略中可以使用剪枝技术来提升搜索的效率。而实际上，束搜索本身也是一种剪枝方法。因此有时也把束搜索称作{\small\bfnew{束剪枝}}\index{束剪枝}（Beam Pruning）\index{Beam Pruning}。在这里有很多其它的剪枝策略可供选择，例如可以只保留与当前最佳路径得分相差在$\theta$之内的路径，也就是搜索只保留得分差距在一定范围内的路径，这种方法也被称作{\small\bfnew{直方图剪枝}}\index{直方图剪枝}（Histogram Pruning）\index{Histogram Pruning}。
+\parinterval 束搜索也有很多的改进版本。回忆一下，在无信息搜索策略中可以使用剪枝技术来提升搜索的效率。而实际上，束搜索本身也是一种剪枝方法。因此有时也把束搜索称作{\small\bfnew{束剪枝}}\index{束剪枝}（Beam Pruning）\index{Beam Pruning}。在这里有很多其它的剪枝策略可供选择，例如可以只保留与当前最佳路径得分相差在$\theta$之内的路径，也就是进行搜索时只保留得分差距在一定范围内的路径，这种方法也被称作{\small\bfnew{阈值剪枝}}\index{阈值剪枝}（Threshold Pruning）\index{Threshold Pruning}。
 \parinterval 对于语言模型来说，当多个路径中最高得分比当前搜索到的最好的解的得分低时，可以立刻停止搜索。因为此时序列越长语言模型得分$\log \funp{P}(w_1 w_2 \ldots w_m)$会越低，继续扩展这些路径不会产生更好的结果。这个技术通常也被称为{\small\bfnew{最佳停止条件}}\index{最佳停止条件}（Optimal Stopping Criteria）\index{Optimal Stopping Criteria}。类似的思想也被用于机器翻译等任务\upcite{DBLP:conf/emnlp/HuangZM17,DBLP:conf/emnlp/Yang0M18}。
@@ -1051,7 +1034,7 @@ c_{\textrm{KN}}(\cdot) = \left\{\begin{array}{ll}
 \vspace{0.5em}
 \item 本章更多地关注了语言模型的基本问题和求解思路，但是基于$n$-gram的方法并不是语言建模的唯一方法。从现在自然语言处理的前沿看，端到端的深度学习方法在很多任务中都取得了领先的性能。语言模型同样可以使用这些方法\upcite{jing2019a}，而且在近些年取得了巨大成功。例如，最早提出的前馈神经语言模型\upcite{bengio2003a}和后来的基于循环单元的语言模型\upcite{mikolov2010recurrent}、基于长短期记忆单元的语言模型\upcite{sundermeyer2012lstm}以及现在非常流行的Transformer\upcite{vaswani2017attention}。 关于神经语言模型的内容，会在{\chapternine}进行进一步介绍。
 \vspace{0.5em}
-\item 最后，本章结合语言模型的序列生成任务对搜索技术进行了介绍。类似地，机器翻译任务也需要从大量的翻译后选中快速寻找最优译文。因此在机器翻译任务中也使用了搜索方法，这个过程通常被称作{\small\bfnew{解码}}\index{解码}（Decoding）\index{Decoding}。例如，有研究者在基于词的翻译模型中尝试使用启发式搜索\upcite{DBLP:conf/acl/OchUN01,DBLP:conf/acl/WangW97,tillmann1997a}以及贪婪搜索方法\upcite{germann2001fast}\upcite{germann2003greedy}，也有研究者研究基于短语的栈解码方法\upcite{Koehn2007Moses,DBLP:conf/amta/Koehn04}。此外，解码方法还包括有限状态机解码\upcite{bangalore2001a}\upcite{bangalore2000stochastic}以及基于语言学约束的解码\upcite{venugopal2007an,zollmann2007the,liu2006tree,galley2006scalable,chiang2005a}。相关内容将在{\chaptereight} 和{\chapterfourteen} 进行介绍。
+\item 最后，本章结合语言模型的序列生成任务对搜索技术进行了介绍。类似地，机器翻译任务也需要从大量的翻译候选中快速寻找最优译文。因此在机器翻译任务中也使用了搜索方法，这个过程通常被称作{\small\bfnew{解码}}\index{解码}（Decoding）\index{Decoding}。例如，有研究者在基于词的翻译模型中尝试使用启发式搜索\upcite{DBLP:conf/acl/OchUN01,DBLP:conf/acl/WangW97,tillmann1997a}以及贪婪搜索方法\upcite{germann2001fast}\upcite{germann2003greedy}，也有研究者研究基于短语的栈解码方法\upcite{Koehn2007Moses,DBLP:conf/amta/Koehn04}。此外，解码方法还包括有限状态机解码\upcite{bangalore2001a}\upcite{DBLP:journals/mt/BangaloreR02}以及基于语言学约束的解码\upcite{venugopal2007an,zollmann2007the,liu2006tree,galley2006scalable,chiang2005a}。相关内容将在{\chaptereight} 和{\chapterfourteen} 进行介绍。
 \vspace{0.5em}
 \end{itemize}
 \end{adjustwidth}
--- a/Chapter3/Figures/figure-crf-to-deal-with-sequence-problems.tex
+++ b/Chapter3/Figures/figure-crf-to-deal-with-sequence-problems.tex
@@ -8,9 +8,9 @@
 		\node[anchor=west,hide](yn-1)at([xshift=2em]dots.east){$y_{n-1}$};
 		\node[anchor=west,hide](yn)at([xshift=2em]yn-1.east){$y_n$};
-		\node[anchor=north,draw,line width=1pt,inner sep=2pt,fill=red!30,minimum height=2em,minimum width=12em](see)at ([yshift=-3em,xshift=2em]y3.south){$X=x_1,x_2,\ldots,x_{n-1},x_n$};
+		\node[anchor=north,draw,line width=1pt,inner sep=2pt,fill=red!30,minimum height=2em,minimum width=12em](see)at ([yshift=-3em,xshift=2em]y3.south){$\mathbf{X}=(x_1,x_2,\ldots,x_{n-1},x_n)$};
-		\node[anchor=south,font=\footnotesize] at ([yshift=1em,xshift=2em]y3.north){(待预测的隐藏状态序列)};
+		\node[anchor=south,font=\footnotesize] at ([yshift=1em,xshift=2em]y3.north){待预测的隐藏状态序列};
-		\node[anchor=north,font=\footnotesize] at ([yshift=-1em]see.south){(可见状态序列)};
+		\node[anchor=north,font=\footnotesize] at ([yshift=-1em]see.south){可见状态序列};
 		\draw[line width=1pt] (y1.east) -- (y2.west);
 		\draw[line width=1pt] (y2.east) -- (y3.west);

--- a/Chapter3/Figures/figure-cross-type-word-segmentation-ambiguity.tex
+++ b/Chapter3/Figures/figure-cross-type-word-segmentation-ambiguity.tex
@@ -44,15 +44,15 @@
 }
 {
-\node [anchor=west,thick,draw,minimum width=3.4em,minimum height=1.5em] (w1) at (c3.west){};
+\node [anchor=west,thick,draw,minimum width=3.4em,minimum height=1.5em,ugreen] (w1) at (c3.west){};
-\draw [->,thick] (entry3.30) ..controls +(70:1) and +(south:1.5).. ([xshift=0.3em]w1.south) node [pos=0.5, above] {\footnotesize{命中}};
+\draw [->,thick,ugreen] (entry3.30) ..controls +(70:1) and +(south:1.5).. ([xshift=0.3em]w1.south) node [pos=0.5, above] {\footnotesize{命中}};
 }
 {
-\node [anchor=west,very thick,draw,dotted,minimum width=3.4em,minimum height=1.9em,red] (w3) at (c2.west){};
+\node [anchor=west,very thick,draw,dotted,minimum width=3.4em,minimum height=1.9em,ublue] (w3) at (c2.west){};
-\draw [->,very thick,dotted,red] ([yshift=-0.2em]entry6.30) ..controls +(60:2) and +(south:3).. ([xshift=-0.6em]w3.south) node [pos=0.5, below] {\color{red}{\footnotesize{命中}}};
+\draw [->,very thick,dotted,ublue] ([yshift=-0.2em]entry6.30) ..controls +(60:2) and +(south:3).. ([xshift=-0.6em]w3.south) node [pos=0.5, below] {\color{ublue}{\footnotesize{命中}}};
 }

--- a/Chapter3/Figures/figure-example-of-hmm.tex
+++ b/Chapter3/Figures/figure-example-of-hmm.tex
@@ -2,15 +2,19 @@
 	\tikzstyle{unit} = [draw,circle,line width=0.8pt,align=center,fill=green!30,minimum size=1em]
 		\node[minimum width=3em,minimum height=1.8em] (o) at (0,0){};
-		\node[anchor=north,inner sep=1pt,font=\footnotesize] (state_A) at ([xshift=-1em,yshift=-1em]o.south){state A};
+		\node[anchor=north,inner sep=1pt,font=\footnotesize] (state_A) at ([xshift=-0em,yshift=-1em]o.south){隐藏状态A};
-		\node[anchor=north,inner sep=1pt,font=\footnotesize] (state_B) at ([yshift=-2em]state_A.south){state B};
+		\node[anchor=north,inner sep=1pt,font=\footnotesize] (state_B) at ([yshift=-1.6em]state_A.south){隐藏状态B};
-		\node[anchor=north,inner sep=1pt,font=\footnotesize] (state_C) at ([yshift=-2em]state_B.south){state C};
+		\node[anchor=north,inner sep=1pt,font=\footnotesize] (state_C) at ([yshift=-1.6em]state_B.south){隐藏状态C};
-		\node[anchor=north,inner sep=1pt,font=\footnotesize] (state_D) at ([yshift=-2em]state_C.south){state D};
+		\node[anchor=north,inner sep=1pt,font=\footnotesize] (state_D) at ([yshift=-1.6em]state_C.south){隐藏状态D};
 		\node[anchor=west,inner sep=1pt,font=\footnotesize] (c1) at ([yshift=0.2em,xshift=2em]o.east){T};
 		\node[anchor=west,inner sep=1pt,font=\footnotesize] (c2) at ([xshift=5em]c1.east){F};
 		\node[anchor=west,inner sep=1pt,font=\footnotesize] (c3) at ([xshift=5em]c2.east){F};
 		\node[anchor=west,inner sep=1pt,font=\footnotesize] (c4) at ([xshift=5em]c3.east){T};
+		\node[anchor=south,font=\scriptsize] (cl1) at (c1.north) {时刻1};
+		\node[anchor=south,font=\scriptsize] (cl2) at (c2.north) {时刻2};
+		\node[anchor=south,font=\scriptsize] (cl3) at (c3.north) {时刻3};
+		\node[anchor=south,font=\scriptsize] (cl4) at (c4.north) {时刻4};
 		\node[anchor=north,unit,fill=red!30] (u11) at ([yshift=-1.6em]c1.south){};
 		\node[anchor=north,unit] (u21) at ([yshift=-1.6em]u11.south){};
@@ -32,23 +36,22 @@
 		\node[anchor=north,unit] (u34) at ([yshift=-1.6em]u24.south){};
 		\node[anchor=north,unit] (u44) at ([yshift=-1.6em]u34.south){};
-		\draw[line width=1pt] (o.north west)--(o.south east);
+		\node[anchor=south west,align=center,font=\footnotesize] (label) at ([yshift=-1.4em,xshift=-4em]o.45){可见状态序列};
-		\node[anchor=south west,align=center,font=\tiny] at ([yshift=-1.4em,xshift=-1.2em]o.45){$i+1$位置\\隐藏状态};
+		\draw[->] ([xshift=-0.2em]label.east) --  ([xshift=0.7em]label.east);
-	\node[anchor=north east,align=center,font=\tiny] at ([yshift=1.2em,xshift=1.2em]o.-135){$i$位置\\可见状态};
 		\draw[->,line width=1pt] (u11.east) -- node[above,red!50,font=\footnotesize]{0.65}(u12.west);
 		\draw[->,line width=1pt] (u12.east) -- node[above,red!50,font=\footnotesize]{0.55}(u13.west);
-		\draw[->,line width=1pt] (u12.east) -- node[right,pos=0.6,font=\footnotesize]{0.45}(u23.west);
+		\draw[->,line width=1pt] (u12.east) -- node[right,pos=0.6,font=\footnotesize,xshift=0.2em]{0.45}(u23.west);
 		\draw[->,line width=1pt] (u13.east) -- node[above,red!50,font=\footnotesize]{0.5}(u14.west);
-		\draw[->,line width=1pt] (u13.east) -- node[right,pos=0.6,font=\footnotesize]{0.5}(u24.west);
+		\draw[->,line width=1pt] (u13.east) -- node[right,pos=0.6,font=\footnotesize,xshift=0.2em]{0.5}(u24.west);
-		\draw[->,line width=1pt] (u11.east) -- node[right,font=\footnotesize]{0.35}(u22.west);	
+		\draw[->,line width=1pt] (u11.east) -- node[right,font=\footnotesize,xshift=0.4em,yshift=-0.1em]{0.35}(u22.west);	
-		\draw[->,line width=1pt] (u22.east) -- node[left,pos=0.4,font=\footnotesize]{0.3}(u13.west);
+		\draw[->,line width=1pt] (u22.east) -- node[left,pos=0.4,font=\footnotesize,yshift=0.1em,xshift=-0.3em]{0.3}(u13.west);
-		\draw[->,line width=1pt] (u22.east) -- node[font=\footnotesize]{0.2}(u23.west);
+		\draw[->,line width=1pt] (u22.east) -- node[below,font=\footnotesize,yshift=0.2em]{0.2}(u23.west);
-		\draw[->,line width=1pt] (u22.east) -- node[font=\footnotesize]{0.2}(u33.west);
+		\draw[->,line width=1pt] (u22.east) -- node[below,font=\footnotesize,yshift=0.1em]{0.2}(u33.west);
-		\draw[->,line width=1pt] (u22.east) -- node[below,font=\footnotesize]{0.3}(u43.west);
+		\draw[->,line width=1pt] (u22.east) -- node[below,font=\footnotesize,yshift=-0.1em]{0.3}(u43.west);
-		\draw[->,line width=1pt] (u23.east) -- node[left,pos=0.4,font=\footnotesize]{0.35}(u14.west);
+		\draw[->,line width=1pt] (u23.east) -- node[left,pos=0.4,font=\footnotesize,xshift=-0.2em]{0.35}(u14.west);
-		\draw[->,line width=1pt] (u23.east) -- node[font=\footnotesize]{0.15}(u24.west);
+		\draw[->,line width=1pt] (u23.east) -- node[below,font=\footnotesize,yshift=0.2em]{0.15}(u24.west);
-		\draw[->,line width=1pt] (u23.east) -- node[font=\footnotesize]{0.15}(u34.west);
+		\draw[->,line width=1pt] (u23.east) -- node[below,font=\footnotesize,xshift=0.4em,yshift=-0.2em,pos=0.6]{0.15}(u34.west);
-		\draw[->,line width=1pt] (u23.east) -- node[below,font=\footnotesize]{0.35}(u44.west);
+		\draw[->,line width=1pt] (u23.east) -- node[below,font=\footnotesize,yshift=-0.3em]{0.35}(u44.west);
 \end{tikzpicture}
\ No newline at end of file
--- a/Chapter3/Figures/figure-example-of-word-segmentation-based-on-dictionary.tex
+++ b/Chapter3/Figures/figure-example-of-word-segmentation-based-on-dictionary.tex
@@ -102,8 +102,8 @@
 }
 {
-\node [anchor=west,thick,draw,red,minimum width=1.6em,minimum height=1.3em] (w18) at ([xshift=0.1em]c8.west){};
+\node [anchor=west,thick,draw,ublue,minimum width=1.6em,minimum height=1.3em] (w18) at ([xshift=0.1em]c8.west){};
-\node [anchor=north] (l18) at ([yshift=0.2em]w18.south) {{\color{red} \footnotesize{命中:2}}};
+\node [anchor=north] (l18) at ([yshift=-0.2em]w18.south) {{\color{ublue} \footnotesize{命中：第2号单词}}};
 }
 \end{tikzpicture}

--- a/Chapter3/Figures/figure-examples-of-chinese-word-segmentation-based-on-1-gram-model.tex
+++ b/Chapter3/Figures/figure-examples-of-chinese-word-segmentation-based-on-1-gram-model.tex
@@ -28,7 +28,7 @@
 \draw [->,very thick,ublue] ([xshift=0.2em]corpus.east) -- ([xshift=4.2em]corpus.east)  node [pos=0.5, above] {\color{red}{\scriptsize{统计学习}}};
-\draw [->,very thick,ublue] ([xshift=0.2em]model.east) -- ([xshift=4.2em]model.east)  node [pos=0.5, above] {\color{red}{\scriptsize{搜索\&计算}}};
+\draw [->,very thick,ublue] ([xshift=0.2em]model.east) -- ([xshift=4.2em]model.east)  node [pos=0.5, above] {\color{red}{\scriptsize{预测}}};
 {\scriptsize
 \node [anchor=north west] (sentlabel) at ([xshift=6.2em,yshift=-1em]model.north east) {\color{red}{自动分词系统}};
@@ -53,7 +53,7 @@
 \end{pgfonlayer}
 {
-\draw [-,thick,red,dotted] ([yshift=0.3em]modellabel.north) ..controls +(north:0.5) and +(south:0.5).. ([xshift=-3em]label1content.south);
+\draw [-,thick,dotted] ([yshift=0.3em]modellabel.north) ..controls +(north:0.5) and +(south:0.5).. ([xshift=-3em]label1content.south);
 }
 }
@@ -122,7 +122,7 @@
 \end{pgfonlayer}
 {
-\draw [-,thick,red,dotted] (segcontent.north) ..controls +(north:0.7) and +(south:0.7).. (segsystem.south);
+\draw [-,thick,dotted] (segcontent.north) ..controls +(north:0.7) and +(south:0.7).. (segsystem.south);
 }
 \end{tikzpicture}

--- a/Chapter3/Figures/figure-labeling-named-entities-in-bio-format.tex
+++ b/Chapter3/Figures/figure-labeling-named-entities-in-bio-format.tex
@@ -18,12 +18,12 @@
 		\node[anchor=west,inner sep=0pt,font=\footnotesize] at ([xshift=0.5em]n17.east){\Large{/}};
 		\node[unit,anchor=west] (n18) at ([xshift=1.2em]n17.east){首都};
-		\node[lab,anchor=north] at ([yshift=-1.4em,xshift=0.2em]n11.south){B-GPE};
+		\node[lab,anchor=north] at ([yshift=-1.4em,xshift=0.2em]n11.south){B-CIT};
 		\node[lab,anchor=north] at ([yshift=-0.8em,xshift=0.2em]n12.south){O};
-		\node[lab,anchor=north] at ([yshift=-1.4em,xshift=0.2em]n13.south){B-GPE};
+		\node[lab,anchor=north] at ([yshift=-1.4em,xshift=0.2em]n13.south){B-CNT};
-		\node[lab,anchor=north] at ([yshift=-1.4em,xshift=0.2em]n14.south){I-GPE};
+		\node[lab,anchor=north] at ([yshift=-1.4em,xshift=0.2em]n14.south){I-CNT};
-		\node[lab,anchor=north] at ([yshift=-1.4em,xshift=0.2em]n15.south){I-GPE};
+		\node[lab,anchor=north] at ([yshift=-1.4em,xshift=0.2em]n15.south){I-CNT};
-		\node[lab,anchor=north] at ([yshift=-1.4em,xshift=0.2em]n16.south){I-GPE};
+		\node[lab,anchor=north] at ([yshift=-1.4em,xshift=0.2em]n16.south){I-CNT};
 		\node[lab,anchor=north] at ([yshift=-0.8em,xshift=0.2em]n17.south){O};
 		\node[lab,anchor=north] at ([yshift=-0.8em,xshift=0.2em]n18.south){O};
 \end{tikzpicture}
\ No newline at end of file
--- a/Chapter3/Figures/figure-labeling-named-entities-in-bioes-format.tex
+++ b/Chapter3/Figures/figure-labeling-named-entities-in-bioes-format.tex
@@ -18,12 +18,12 @@
 		\node[anchor=west,inner sep=0pt,font=\footnotesize] at ([xshift=0.5em]n17.east){\Large{/}};
 		\node[unit,anchor=west] (n18) at ([xshift=1.2em]n17.east){首都};
-		\node[lab,anchor=north] at ([yshift=-1.4em,xshift=0.2em]n11.south){S-GPE};
+		\node[lab,anchor=north] at ([yshift=-1.4em,xshift=0.2em]n11.south){S-CIT};
 		\node[lab,anchor=north] at ([yshift=-0.8em,xshift=0.2em]n12.south){O};
-		\node[lab,anchor=north] at ([yshift=-1.4em,xshift=0.2em]n13.south){B-GPE};
+		\node[lab,anchor=north] at ([yshift=-1.4em,xshift=0.2em]n13.south){B-CNT};
-		\node[lab,anchor=north] at ([yshift=-1.4em,xshift=0.2em]n14.south){I-GPE};
+		\node[lab,anchor=north] at ([yshift=-1.4em,xshift=0.2em]n14.south){I-CNT};
-		\node[lab,anchor=north] at ([yshift=-1.4em,xshift=0.2em]n15.south){I-GPE};
+		\node[lab,anchor=north] at ([yshift=-1.4em,xshift=0.2em]n15.south){I-CNT};
-		\node[lab,anchor=north] at ([yshift=-1.4em,xshift=0.2em]n16.south){E-GPE};
+		\node[lab,anchor=north] at ([yshift=-1.4em,xshift=0.2em]n16.south){E-CNT};
 		\node[lab,anchor=north] at ([yshift=-0.8em,xshift=0.2em]n17.south){O};
 		\node[lab,anchor=north] at ([yshift=-0.8em,xshift=0.2em]n18.south){O};
 \end{tikzpicture}
\ No newline at end of file
--- a/Chapter3/Figures/figure-mt-system-as-a-black-box.tex
+++ b/Chapter3/Figures/figure-mt-system-as-a-black-box.tex
@@ -8,12 +8,12 @@
 \begin{tikzpicture}
 \begin{scope}
-\node [] (input) at (0,0) {{\scriptsize 猫喜欢吃鱼}};
+\node [] (input) at (-0.6em,0) {{\scriptsize 猫喜欢吃鱼}};
 \node [] (output) at ([xshift=3.35in]input.east) {{\scriptsize Cats like eating fish}};
 \draw[->,thick] ([xshift=-1pt]input.east) -- ([xshift=8pt]input.east);
-\draw[->,thick] ([xshift=-10pt]output.west) -- ([xshift=-0pt]output.west);
+\draw[->,thick] ([xshift=-6pt]output.west) -- ([xshift=2pt]output.west);
 %{
 %\draw[->,thick] ([xshift=-12pt]mtengine.west) -- ([xshift=-2pt]mtengine.west);
@@ -29,13 +29,13 @@
 \node [anchor=south] (outputlabel) at ([yshift=-0.5em]output.north) {{\scriptsize \color{red}{\textbf{输出}}}};
 {
-\node [anchor=west] (mtinputlabel) at ([xshift=0.32in]inputlabel.east) {{\scriptsize \color{red}{\textbf{}}}};
+\node [anchor=west] (mtinputlabel) at ([xshift=0.35in]inputlabel.east) {{\scriptsize \color{red}{\textbf{}}}};
 \node [anchor=west] (mtoutputlabel) at ([xshift=0.88in]mtinputlabel.east) {{\scriptsize \color{red}{\textbf{}}}};
 \node[rectangle,draw=ublue, inner sep=0mm] [fit = (mtinputlabel) (mtoutputlabel) (inputmarking) (outputmarking)] {};
 }
 {
-\node[rectangle,fill=ublue,inner sep=0mm] [fit = (mtinputlabel) (mtoutputlabel) (inputmarking) (outputmarking)] {{\color{white} \textbf{\Large{MT 系统}}}};
+\node[rectangle,fill=ublue,inner sep=2pt] [fit = (mtinputlabel) (mtoutputlabel) (inputmarking) (outputmarking)] {{\color{white} \textbf{\Large{MT 系统}}}};
 }

--- a/Chapter3/Figures/figure-mt=language-analysis+translation-engine.tex
+++ b/Chapter3/Figures/figure-mt=language-analysis+translation-engine.tex
@@ -8,7 +8,7 @@
 \begin{tikzpicture}
 \begin{scope}
-\node [] (input) at (0,0) {{\scriptsize 猫喜欢吃鱼}};
+\node [] (input) at (-0.6em,0) {{\scriptsize 猫喜欢吃鱼}};
 {
 \begin{scope}[scale=0.8,xshift=0.9in,yshift=-0.87in,level distance=20pt,sibling distance=-1pt,grow'=up]
@@ -46,7 +46,7 @@
 \node [] (output) at ([xshift=3.35in]input.east) {{\scriptsize Cats like eating fish}};
 \draw[->,thick] ([xshift=-1pt]input.east) -- ([xshift=8pt]input.east);
-\draw[->,thick] ([xshift=-10pt]output.west) -- ([xshift=-0pt]output.west);
+\draw[->,thick] ([xshift=-6pt]output.west) -- ([xshift=2pt]output.west);
 {
 \draw[->,thick] ([xshift=-12pt]mtengine.west) -- ([xshift=-2pt]mtengine.west);
@@ -62,9 +62,9 @@
 \node [anchor=south] (outputlabel) at ([yshift=-0.5em]output.north) {{\scriptsize \color{red}{\textbf{输出}}}};
 {
-\node [anchor=west] (mtinputlabel) at ([xshift=0.29in]inputlabel.east) {{\scriptsize \color{red}{\textbf{实际的输入}}}};
+\node [anchor=west] (mtinputlabel) at ([xshift=0.35in]inputlabel.east) {{\scriptsize \color{red}{\textbf{实际的输入}}}};
 \node [anchor=west] (mtoutputlabel) at ([xshift=1.0in]mtinputlabel.east) {{\scriptsize \color{red}{\textbf{实际的输出}}}};
-\node[rectangle,draw=ublue, inner sep=0mm] [fit = (mtinputlabel) (mtoutputlabel) (inputmarking) (outputmarking)] {};
+\node[rectangle,draw=ublue, inner sep=2pt] [fit = (mtinputlabel) (mtoutputlabel) (inputmarking) (outputmarking)] {};
 }
 \end{scope}

--- a/Chapter3/Figures/figure-ner-based-on-hmm.tex
+++ b/Chapter3/Figures/figure-ner-based-on-hmm.tex
@@ -3,12 +3,12 @@
 	\tikzstyle{word} = [draw,inner sep=2pt,line width=1pt,align=center,drop shadow,fill=red!30,font=\footnotesize,minimum height=1.4em,minimum width=1.6em]
 		\coordinate (o) at (0,0);
-		\node[anchor=west,class] (c1) at ([xshift=0em]o.east){B-GPE};
+		\node[anchor=west,class] (c1) at ([xshift=0em]o.east){B-CIT};
 		\node[anchor=west,class] (c2) at ([xshift=4em]o.east){O};
-		\node[anchor=west,class] (c3) at ([xshift=8em]o.east){B-GPE};
+		\node[anchor=west,class] (c3) at ([xshift=8em]o.east){B-CNT};
-		\node[anchor=west,class] (c4) at ([xshift=12em]o.east){I-GPE};
+		\node[anchor=west,class] (c4) at ([xshift=12em]o.east){I-CNT};
-		\node[anchor=west,class] (c5) at ([xshift=16em]o.east){I-GPE};
+		\node[anchor=west,class] (c5) at ([xshift=16em]o.east){I-CNT};
-		\node[anchor=west,class] (c6) at ([xshift=20em]o.east){I-GPE};
+		\node[anchor=west,class] (c6) at ([xshift=20em]o.east){I-CNT};
 		\node[anchor=west,class] (c7) at ([xshift=24em]o.east){O};
 		\node[anchor=west,class] (c8) at ([xshift=28em]o.east){O};

--- a/Chapter3/Figures/figure-process-of-statistical-syntax-analysis.tex
+++ b/Chapter3/Figures/figure-process-of-statistical-syntax-analysis.tex
@@ -55,7 +55,7 @@
 \draw [->,very thick,ublue] ([xshift=0.2em]corpus.east) -- ([xshift=4.2em]corpus.east)  node [pos=0.5, above] {\color{red}{\scriptsize{统计学习}}};
-\draw [->,very thick,ublue] ([xshift=0.2em]model.east) -- ([xshift=4.2em]model.east)  node [pos=0.5, above] {\color{red}{\scriptsize{搜索\&计算}}};
+\draw [->,very thick,ublue] ([xshift=0.2em]model.east) -- ([xshift=4.2em]model.east)  node [pos=0.5, above] {\color{red}{\scriptsize{预测}}};
 {\scriptsize
 \node [anchor=north west] (sentlabel) at ([xshift=6.2em,yshift=-1em]model.north east) {{\color{ublue} {\scriptsize \textbf{统计分析模型}}}};

--- a/Chapter3/Figures/figure-process-sequence-labeling-by-classfication.tex
+++ b/Chapter3/Figures/figure-process-sequence-labeling-by-classfication.tex
 \begin{tikzpicture}
 	\tikzstyle{unit} = [draw,minimum size=1em,circle]
-		\node[unit,fill=green!20] (g1) at (0,0){};
+		\node[unit,fill=ugreen!20] (g1) at (0,0){};
-		\node[anchor=west,unit,fill=green!20]	(g2)at([xshift=1.8em]g1.east){};
+		\node[anchor=west,unit,fill=ugreen!20]	(g2)at([xshift=1.8em]g1.east){};
-		\node[anchor=west,unit,fill=green!20]	(g3)at([xshift=1.8em]g2.east){};
+		\node[anchor=west,unit,fill=ugreen!20]	(g3)at([xshift=1.8em]g2.east){};
-		\node[anchor=west,unit,fill=green!20]	(g4)at([xshift=1.8em]g3.east){};
+		\node[anchor=west,unit,fill=ugreen!20]	(g4)at([xshift=1.8em]g3.east){};
 		\node[anchor=north,unit,fill=red!30]	(r1)at([yshift=-4em]g1.south){};
 		\node[anchor=north,unit,fill=red!30]	(r2)at([yshift=-4em]g2.south){};
@@ -12,15 +12,15 @@
 		\node[anchor=north,unit,fill=red!30]	(r4)at([yshift=-4em]g4.south){};
 		\begin{pgfonlayer}{background}
-        	\node [draw=green!20,rectangle,inner sep=2pt,rounded corners=4pt,dashed,line width=1.5pt] [fit = (g1)(g2)(g3)(g4)] (box1) {};
+        	\node [draw=ugreen!70,rectangle,inner sep=2pt,rounded corners=4pt,dashed,line width=1.0pt] [fit = (g1)(g2)(g3)(g4)] (box1) {};
-        	\node [draw=red!30,rectangle,inner sep=2pt,rounded corners=4pt,dashed,line width=1.5pt] [fit = (r1)(r2)(r3)(r4)] (box2) {};
+        	\node [draw=red!70,rectangle,inner sep=2pt,rounded corners=4pt,dashed,line width=1.0pt] [fit = (r1)(r2)(r3)(r4)] (box2) {};
    	\end{pgfonlayer}
 		\node[anchor=north,draw,inner sep=2pt,rounded corners=2pt,fill=blue!30,minimum width=6em](cla) at ([yshift=-1em]box1.south){分类器};
-		\node[anchor=south,font=\scriptsize] at ([yshift=0.4em,xshift=1.4em]g2.north){(待预测标签)};
+		\node[anchor=south,font=\scriptsize] at ([yshift=0.4em,xshift=1.4em]g2.north){待预测标签序列};
-		\node[anchor=north,font=\scriptsize] at ([yshift=-0.4em,xshift=1.4em]r2.south){(待标注标签)};
+		\node[anchor=north,font=\scriptsize] at ([yshift=-0.4em,xshift=1.4em]r2.south){观测序列};
 		\draw[->,thick] (cla.north) -- (box1.south);
 		\draw[->,thick] (box2.north) -- (cla.south);

--- a/Chapter3/Figures/figure-process-sequence-labeling-by-crf.tex
+++ b/Chapter3/Figures/figure-process-sequence-labeling-by-crf.tex
 \begin{tikzpicture}
 	\tikzstyle{unit} = [draw,minimum size=1em,circle]
-		\node[unit,fill=green!20] (g1) at (0,0){};
+		\node[unit,fill=ugreen!20] (g1) at (0,0){};
-		\node[anchor=west,unit,fill=green!20]	(g2)at([xshift=1.8em]g1.east){};
+		\node[anchor=west,unit,fill=ugreen!20]	(g2)at([xshift=1.8em]g1.east){};
-		\node[anchor=west,unit,fill=green!20]	(g3)at([xshift=1.8em]g2.east){};
+		\node[anchor=west,unit,fill=ugreen!20]	(g3)at([xshift=1.8em]g2.east){};
-		\node[anchor=west,unit,fill=green!20]	(g4)at([xshift=1.8em]g3.east){};
+		\node[anchor=west,unit,fill=ugreen!20]	(g4)at([xshift=1.8em]g3.east){};
 		\node[anchor=north,unit,fill=red!30]	(r1)at([yshift=-1.8em,xshift=1.4em]g2.south){};
-		\node[anchor=south,font=\scriptsize] at ([yshift=0.4em,xshift=1.4em]g2.north){(待预测标签)};
+		\node[anchor=south,font=\scriptsize] at ([yshift=0.4em,xshift=1.4em]g2.north){待预测标签序列};
-		\node[anchor=north,font=\scriptsize] at ([yshift=-0.4em]r1.south){(待标注标签)};
+		\node[anchor=north,font=\scriptsize] at ([yshift=-0.4em]r1.south){观测序列};
 		\draw[-,thick] (g1.east) -- (g2.west);
 		\draw[-,thick] (g2.east) -- (g3.west);

--- a/Chapter3/Figures/figure-process-sequence-labeling-by-hmm.tex
+++ b/Chapter3/Figures/figure-process-sequence-labeling-by-hmm.tex
 \begin{tikzpicture}
 	\tikzstyle{unit} = [draw,minimum size=1em,circle]
-		\node[unit,fill=green!2] (g1) at (0,0){};
+		\node[unit,fill=ugreen!20] (g1) at (0,0){};
-		\node[anchor=west,unit,fill=green!20]	(g2)at([xshift=1.8em]g1.east){};
+		\node[anchor=west,unit,fill=ugreen!20]	(g2)at([xshift=1.8em]g1.east){};
-		\node[anchor=west,unit,fill=green!20]	(g3)at([xshift=1.8em]g2.east){};
+		\node[anchor=west,unit,fill=ugreen!20]	(g3)at([xshift=1.8em]g2.east){};
-		\node[anchor=west,unit,fill=green!20]	(g4)at([xshift=1.8em]g3.east){};
+		\node[anchor=west,unit,fill=ugreen!20]	(g4)at([xshift=1.8em]g3.east){};
 		\node[anchor=north,unit,fill=red!30]	(r1)at([yshift=-1.8em]g1.south){};
 		\node[anchor=north,unit,fill=red!30]	(r2)at([yshift=-1.8em]g2.south){};
 		\node[anchor=north,unit,fill=red!30]	(r3)at([yshift=-1.8em]g3.south){};
 		\node[anchor=north,unit,fill=red!30]	(r4)at([yshift=-1.8em]g4.south){};
-		\node[anchor=south,font=\scriptsize] at ([yshift=0.4em,xshift=1.4em]g2.north){(待预测标签)};
+		\node[anchor=south,font=\scriptsize] at ([yshift=0.4em,xshift=1.4em]g2.north){待预测标签序列};
-		\node[anchor=north,font=\scriptsize] at ([yshift=-0.4em,xshift=1.4em]r2.south){(待标注标签)};
+		\node[anchor=north,font=\scriptsize] at ([yshift=-0.4em,xshift=1.4em]r2.south){观测序列};
 		\draw[->,thick] (g1.east) -- (g2.west);
 		\draw[->,thick] (g2.east) -- (g3.west);

--- a/Chapter3/Figures/figure-word-segmentation-based-on-statistics.tex
+++ b/Chapter3/Figures/figure-word-segmentation-based-on-statistics.tex
@@ -44,7 +44,7 @@
 }
 {
-\draw [->,very thick,ublue] ([xshift=0.2em]model.east) -- ([xshift=4.2em]model.east)  node [pos=0.5, above] {\color{red}{\scriptsize{搜索\&计算}}};
+\draw [->,very thick,ublue] ([xshift=0.2em]model.east) -- ([xshift=4.2em]model.east)  node [pos=0.5, above] {\color{red}{\scriptsize{推断}}};
 }
 {\scriptsize
@@ -71,7 +71,7 @@
 \node [anchor=east,draw,dashed,red,thick,minimum width=13em,minimum height=1.4em] (final) at (p2seg2.east) {};
 \node [anchor=west,red] (finallabel) at ([xshift=3.1em]sentlabel.east) {输出概率最大的结果};
 %\node [anchor=north east,red] (finallabel2) at ([yshift=0.5em]finallabel.south east) {的结果};
-\draw [->,thick,red] ([xshift=0.0em,yshift=-0.5em]final.north east) ..controls +(east:0.3) and +(south:0.0).. ([xshift=1.0em]finallabel.south);
+\draw [->,thick,red] ([xshift=0.0em,yshift=-0.5em]final.north east) ..controls +(east:0.2) and +(south:1.0).. ([xshift=2.0em]finallabel.south);
 }
 }

--- a/Chapter3/chapter3.tex
+++ b/Chapter3/chapter3.tex
--- a/Chapter4/Figures/representation-of-reference-answer-set-in-hyter.tex
+++ b/Chapter4/Figures/representation-of-reference-answer-set-in-hyter.tex
@@ -4,7 +4,7 @@
 		\node[unit] (u1)at (0,0){};
 		\node[unit,anchor=west](u2) at ([xshift=7em]u1.east){};
 		\node[unit,anchor=west](u3) at ([xshift=1.5em]u2.east){};
-		\node[unit,anchor=west](u4) at ([xshift=8em]u3.east){};
+		\node[unit,anchor=west](u4) at ([xshift=5em]u3.east){};
 		\node[unit,anchor=west](u5) at ([xshift=1.5em]u4.east){};
 		\node[unit,anchor=west](u6) at ([xshift=5em]u5.east){};
 		\node[unit,anchor=west,line width=1.5pt](u7) at ([xshift=2em]u6.east){};
@@ -14,7 +14,7 @@
 		\draw[->,red,line width=1.5pt](u1.east)-- node[inner sep=0pt,color=red,above]{\footnotesize the approval rate}(u2.west);
 		\draw[->,out=-30,in=-150,red,line width=1.5pt] (u1.south east) to  node[inner sep=0pt,color=red,below]{\footnotesize the approval level}(u2.south west);
 		\draw[->,line width=1.5pt](u2.east) -- node[above]{\footnotesize for} (u3.west);
-		\draw[->,line width=1.5pt](u3.east) -- node[above]{\footnotesize national football team} (u4.west);
+		\draw[->,line width=1.5pt](u3.east) -- node[above]{\footnotesize the proposal} (u4.west);
 		\draw[->,line width=1.5pt](u4.east) -- node[above]{\footnotesize was} (u5.west);
 		\draw[->,out=40,in=140,blue,line width=1.5pt] (u5.north east) to  node[inner sep=0pt,color=blue,above]{\footnotesize practically}(u6.north west);
 		\draw[->,blue,line width=1.5pt](u5.east)-- node[inner sep=0pt,color=blue,above]{\footnotesize close to}(u6.west);

--- a/Chapter4/Figures/schematic-diagram-of-phrase-level-quality assessment-task.tex
+++ b/Chapter4/Figures/schematic-diagram-of-phrase-level-quality assessment-task.tex
--- a/Chapter4/Figures/schematic-diagram-of-word-level-quality-assessment-task.log
+++ b/Chapter4/Figures/schematic-diagram-of-word-level-quality-assessment-task.log
--- a/Chapter4/chapter4.tex
+++ b/Chapter4/chapter4.tex
--- a/Chapter5/chapter5.tex
+++ b/Chapter5/chapter5.tex
@@ -37,7 +37,7 @@ IBM模型由Peter F. Brown等人于上世纪九十年代初提出\cite{DBLP:jour
 \parinterval 在翻译任务中，我们希望得到一个源语言到目标语言的翻译。对于人类来说这个问题很简单，但是让计算机做这样的工作却很困难。这里面临的第一个问题是：如何对翻译进行建模？从计算机的角度来看，这就需要把自然语言的翻译问题转换为计算机可计算的问题。
-\parinterval 那么，基于单词的统计机器翻译模型又是如何描述翻译问题的呢？Peter F. Brown等人提出了一个观点\cite{Peter1993The}：在翻译一个句子时，可以把其中的每个单词翻译成对应的目标语言单词，然后调整这些目标语言单词的顺序，最后得到整个句子的翻译结果，而这个过程可以用统计模型来描述。尽管在人看来使用两个语言单词之间的对应进行翻译是很自然的事，但是对于计算机来说可是向前迈出了一大步。
+\parinterval 那么，基于单词的统计机器翻译模型又是如何描述翻译问题的呢？Peter F. Brown等人提出了一个观点\cite{DBLP:journals/coling/BrownPPM94}：在翻译一个句子时，可以把其中的每个单词翻译成对应的目标语言单词，然后调整这些目标语言单词的顺序，最后得到整个句子的翻译结果，而这个过程可以用统计模型来描述。尽管在人看来使用两个语言单词之间的对应进行翻译是很自然的事，但是对于计算机来说可是向前迈出了一大步。
 \parinterval 先来看一个例子。图 \ref{fig:5-1}展示了一个汉语翻译到英语的例子。首先，可以把源语言句子中的单词``我''、``对''、``你''、``感到''和``满意''分别翻译为``I''、``with''、``you''、``am''\ 和``satisfied''，然后调整单词的顺序，比如，``am''放在译文的第2个位置，``you''应该放在最后的位置等等，最后得到译文``I am satisfied with you''。
@@ -50,7 +50,7 @@ IBM模型由Peter F. Brown等人于上世纪九十年代初提出\cite{DBLP:jour
 \end{figure}
 %----------------------------------------------
-\parinterval 上面的例子反映了人在做翻译时所使用的一些知识：首先，两种语言单词的顺序可能不一致，而且译文需要符合目标语的习惯，这也就是常说的翻译的{\small\sffamily\bfseries{流畅度}}\index{流畅度}问题（Fluency）\index{Fluency}；其次，源语言单词需要准确的被翻译出来，也就是常说的翻译的{\small\sffamily\bfseries{准确性}}\index{准确性}(Accuracy)\index{Accuracy}问题和{\small\sffamily\bfseries{充分性}}\index{充分性}（Adequacy）\index{Adequacy}问题。为了达到以上目的，传统观点认为翻译过程需要包含三个步骤\cite{jurafsky2000speech}：
+\parinterval 上面的例子反映了人在做翻译时所使用的一些知识：首先，两种语言单词的顺序可能不一致，而且译文需要符合目标语的习惯，这也就是常说的翻译的{\small\sffamily\bfseries{流畅度}}\index{流畅度}问题（Fluency）\index{Fluency}；其次，源语言单词需要准确地被翻译出来，也就是常说的翻译的{\small\sffamily\bfseries{准确性}}\index{准确性}（Accuracy）\index{Accuracy}问题和{\small\sffamily\bfseries{充分性}}\index{充分性}（Adequacy）\index{Adequacy}问题。为了达到以上目的，传统观点认为翻译过程需要包含三个步骤\cite{parsing2009speech}：
 \begin{itemize}
 \vspace{0.5em}
@@ -529,7 +529,7 @@ g(\vectorn{s},\vectorn{t}) \equiv \prod_{j,i \in \widehat{A}}{\funp{P}(s_j,t_i)}
 %----------------------------------------------
 \vspace{-0.5em}
-\parinterval IBM模型也是建立在如上统计模型之上。具体来说，IBM模型的基础是{\small\sffamily\bfseries{噪声信道模型}}\index{噪声信道模型}（Noise Channel Model）\index{Noise Channel Model}，它是由Shannon在上世纪40年代末提出来的\cite{shannon1949communication}，并于上世纪80年代应用在语言识别领域，后来又被Brown等人用于统计机器翻译中\cite{brown1990statistical,Peter1993The}。
+\parinterval IBM模型也是建立在如上统计模型之上。具体来说，IBM模型的基础是{\small\sffamily\bfseries{噪声信道模型}}\index{噪声信道模型}（Noisy Channel Model）\index{Noisy Channel Model}，它是由Shannon在上世纪40年代末提出来的\cite{shannon1949communication}，并于上世纪80年代应用在语音识别领域，后来又被Brown等人用于统计机器翻译中\cite{brown1990statistical,DBLP:journals/coling/BrownPPM94}。
 \parinterval 在噪声信道模型中，源语言句子$\vectorn{s}$（信宿）被看作是由目标语言句子$\vectorn{t}$（信源）经过一个有噪声的信道得到的。如果知道了$\vectorn{s}$和信道的性质，可以通过$\funp{P}(\vectorn{t}|\vectorn{s})$得到信源的信息，这个过程如图\ref{fig:5-13}所示。
@@ -578,7 +578,7 @@ g(\vectorn{s},\vectorn{t}) \equiv \prod_{j,i \in \widehat{A}}{\funp{P}(s_j,t_i)}
 \parinterval 公式\ref{eq:5-16}展示了IBM模型最基础的建模方式，它把模型分解为两项：（反向）翻译模型$\funp{P}(\vectorn{s}|\vectorn{t})$和语言模型$\funp{P}(\vectorn{t})$。一个很自然的问题是：直接用$\funp{P}(\vectorn{t}|\vectorn{s})$定义翻译问题不就可以了吗，为什么要用$\funp{P}(\vectorn{s}|\vectorn{t})$和$\funp{P}(\vectorn{t})$的联合模型？从理论上来说，正向翻译模型$\funp{P}(\vectorn{t}|\vectorn{s})$和反向翻译模型$\funp{P}(\vectorn{s}|\vectorn{t})$的数学建模可以是一样的，因为我们只需要在建模的过程中把两个语言调换即可。使用$\funp{P}(\vectorn{s}|\vectorn{t})$和$\funp{P}(\vectorn{t})$的联合模型的意义在于引入了语言模型，它可以很好的对译文的流畅度进行评价，确保结果是通顺的目标语言句子。
-\parinterval 可以回忆一下\ref{sec:sentence-level-translation}节中讨论的问题，如果只使用翻译模型可能会造成一个局面：译文的单词都和源语言单词对应的很好，但是由于语序的问题，读起来却不像人说的话。从这个角度说，引入语言模型是十分必要的。这个问题在Brown等人的论文中也有讨论\cite{Peter1993The}，他们提到单纯使用$\funp{P}(\vectorn{s}|\vectorn{t})$会把概率分配给一些翻译对应比较好但是不合法的目标语句子，而且这部分概率可能会很大，影响模型的决策。这也正体现了IBM模型的创新之处，作者用数学技巧把$\funp{P}(\vectorn{t})$引入进来，保证了系统的输出是通顺的译文。语言模型也被广泛使用在语音识别等领域以保证结果的流畅性，甚至应用的历史比机器翻译要长得多，这里的方法也有借鉴相关工作的味道。
+\parinterval 可以回忆一下\ref{sec:sentence-level-translation}节中讨论的问题，如果只使用翻译模型可能会造成一个局面：译文的单词都和源语言单词对应得很好，但是由于语序的问题，读起来却不像人说的话。从这个角度说，引入语言模型是十分必要的。这个问题在Brown等人的论文中也有讨论\cite{DBLP:journals/coling/BrownPPM94}，他们提到单纯使用$\funp{P}(\vectorn{s}|\vectorn{t})$会把概率分配给一些翻译对应比较好但是不合法的目标语句子，而且这部分概率可能会很大，影响模型的决策。这也正体现了IBM模型的创新之处，作者用数学技巧把$\funp{P}(\vectorn{t})$引入进来，保证了系统的输出是通顺的译文。语言模型也被广泛使用在语音识别等领域以保证结果的流畅性，甚至应用的历史比机器翻译要长得多，这里的方法也有借鉴相关工作的味道。
 实际上，在机器翻译中引入语言模型是一个很深刻的概念。在IBM模型之后相当长的时间里，语言模型一直是机器翻译各个部件中最重要的部分。对译文连贯性的建模也是所有系统中需要包含的内容（即使隐形体现）。
@@ -1088,18 +1088,21 @@ c_{\mathbb{E}}(s_u|t_v)=\sum\limits_{i=1}^{N}  c_{\mathbb{E}}(s_u|t_v;s^{[i]},t^
 \sectionnewpage
 \section{小结及深入阅读}
-\parinterval 本章对IBM系列模型中的IBM模型1进行了详细的介绍和讨论，从一个简单的基于单词的翻译模型开始，本章从建模、解码、训练多个维度对统计机器翻译进行了描述，期间涉及了词对齐、优化等多个重要概念。IBM模型共分为5个模型，对翻译问题的建模依次由浅入深，同时模型复杂度也依次增加，我们将在下一章对IBM模型2-5进行详细的介绍和讨论。IBM模型作为入门统计机器翻译的``必经之路''，其思想对今天的机器翻译仍然产生着影响。虽然单独使用IBM模型进行机器翻译现在已经不多见，甚至很多从事神经机器翻译等前沿研究的人对IBM模型已经逐渐淡忘，但是不能否认IBM模型标志着一个时代的开始。从某种意义上讲，当使用公式$\hat{\vectorn{t}} = \argmax_{\vectorn{t}} \funp{P}(\vectorn{t}|\vectorn{s})$描述机器翻译问题的时候，或多或少都在与IBM模型使用相似的思想。
+\parinterval 本章对IBM系列模型中的IBM模型1进行了详细的介绍和讨论，从一个简单的基于单词的翻译模型开始，本章从建模、解码、训练多个维度对统计机器翻译进行了描述，期间涉及了词对齐、优化等多个重要概念。IBM模型共分为5个模型，对翻译问题的建模依次由浅入深，同时模型复杂度也依次增加，我们将在{\chaptersix}对IBM模型2-5进行详细的介绍和讨论。IBM模型作为入门统计机器翻译的``必经之路''，其思想对今天的机器翻译仍然产生着影响。虽然单独使用IBM模型进行机器翻译现在已经不多见，甚至很多从事神经机器翻译等前沿研究的人对IBM模型已经逐渐淡忘，但是不能否认IBM模型标志着一个时代的开始。从某种意义上讲，当使用公式$\hat{\vectorn{t}} = \argmax_{\vectorn{t}} \funp{P}(\vectorn{t}|\vectorn{s})$描述机器翻译问题的时候，或多或少都在与IBM模型使用相似的思想。
-{\color{red}词对齐需要扩充，还不太清楚具体是什么，需要问老师}
+\parinterval 当然，本书也无法涵盖IBM模型的所有内涵，很多内容需要感兴趣的读者继续研究和挖掘。其中最值得关注的是统计词对齐问题。由于词对齐是IBM模型训练的间接产物，因此IBM模型成为了自动词对齐的重要方法。比如IBM模型训练工具GIZA++更多的是被用于自动词对齐任务，而非简单地训练IBM模型参数\upcite{och2003systematic}。
-\parinterval 当然，本书也无法涵盖IBM模型的所有内涵，很多内容需要感兴趣的读者继续研究和挖掘，有两个方向可以考虑：
 \begin{itemize}
 \vspace{0.5em}
-\item IBM模型在提出后的十余年中，一直受到了学术界的关注。一个比较有代表性的成果是GIZA++（\url{https://github.com/moses-smt/giza-pp}），它集成了IBM模型和隐马尔可夫模型，并实现了这些模型的训练。在随后相当长的一段时间里，GIZA++也是机器翻译研究的标配，用于获得双语平行数据上单词一级的对齐结果。此外，研究者也对IBM模型进行了大量的分析，为后人研究统计机器翻译提供了大量依据\cite{och2004alignment}。虽然IBM模型很少被独立使用，甚至直接用基于IBM模型的解码器也不多见，但是它通常会作为其他模型的一部分参与到对翻译的建模中。这部分工作会在下一章{\color{red}基于短语和句法的模型}中进行讨论\cite{koehn2003statistical}。此外，IBM模型也给机器翻译提供了一种非常简便的计算双语词串对应好坏的方式，因此也被广泛用于度量双语词串对应的强度，是自然语言处理中的一种常用特征。
+\item 在IBM基础模型之上，有很多改进的工作。例如，对空对齐、低频词进行额外处理\upcite{DBLP:conf/acl/Moore04}；考虑源语言-目标语言和目标语言-源语言双向词对齐进行更好的词对齐对称化\upcite{肖桐1991面向统计机器翻译的重对齐方法研究}；使用词典、命名实体等多种信息对模型进行改进\upcite{2005Improving}；通过引入短语增强IBM基础模型\upcite{1998Grammar}；引入相邻单词对齐之间的依赖关系增加模型鲁棒性\upcite{DBLP:conf/acl-vlc/DaganCG93}等；也可以对IBM模型的正向和反向结果进行对称化处理，以得到更加准确的词对齐结果\upcite{och2003systematic}。
+\item 随着词对齐概念的不断深入，也有很多词对齐方面的工作并不依赖IBM模型。比如，可以直接使用判别式模型利用分类器解决词对齐问题\upcite{ittycheriah2005maximum}；使用带参数控制的动态规划方法来提高词对齐准确率\upcite{DBLP:conf/naacl/GaleC91}；甚至可以把对齐的思想用于短语和句法结构的双语对应\upcite{xiao2013unsupervised}；也可以使用无监督的对称词对齐方法，对正向和反向模型进行联合训练，并结合数据的相似性\upcite{DBLP:conf/naacl/LiangTK06}。除了GIZA++，研究人员也开发了很多优秀的自动对齐工具，比如，FastAlign\upcite{DBLP:conf/naacl/DyerCS13}、Berkeley Aligner（\url{https://github.com/mhajiloo/berkeleyaligner}）等，这些工具现在也有很广泛的应用。
 \vspace{0.5em}
-\item 除了在机器翻译建模上的开创性工作，IBM模型的另一项重要贡献是建立了统计词对齐的基础模型。在训练IBM模型的过程中，除了学习到模型参数，还可以得到双语数据上的词对齐结果。也就是说词对齐标注是IBM模型训练的间接产物。这也使得IBM模型成为了自动词对齐的重要方法。包括GIZA++在内的很多工作，实际上更多的是被用于自动词对齐任务，而非简单的训练IBM模型参数。随着词对齐概念的不断深入，这个任务逐渐成为了自然语言处理中的重要分支，比如，对IBM模型的结果进行对称化\cite{och2003systematic}，也可以直接使用判别式模型利用分类模型解决词对齐问题\cite{ittycheriah2005maximum}，甚至可以把对齐的思想用于短语和句法结构的双语对应\cite{xiao2013unsupervised}。除了GIZA++，研究人员也开发了很多优秀的自动词对齐工具，比如，FastAlign （\url{https://github.com/clab/fast_align}）、Berkeley Aligner（\url{https://github.com/mhajiloo/berkeleyaligner}）等，这些工具现在也有很广泛的应用。
+\item 一种较为通用的词对齐评价标准是{\bfnew{对齐错误率}}（Alignment Error Rate, AER）\upcite{DBLP:journals/coling/FraserM07}。在此基础之上也可以对词对齐评价方法进行改进，以提高对齐质量与机器翻译评价得分BLEU的相关性\upcite{DBLP:conf/acl/DeNeroK07,paul2007all,黄书剑2009一种错误敏感的词对齐评价方法}。也有工作通过统计机器翻译系统性能的提升来评价对齐质量\upcite{DBLP:journals/coling/FraserM07}。不过，在相当长的时间内，词对齐质量对机器翻译系统的影响究竟如何并没有统一的结论。有些时候，词对齐的错误率下降了，但是机器翻译系统的译文品质却没有随之提升。这个问题比较复杂，需要进一步的论证。不过，可以肯定的是，词对齐可以帮助人们分析机器翻译的行为。甚至在最新的神经机器翻译中，如何在神经网络模型中寻求两种语言单词之间的对应关系也是对模型进行解释的有效手段之一\upcite{DBLP:journals/corr/FengLLZ16}。
 \vspace{0.5em}
+\item 基于单词的翻译模型的解码问题也是早期研究者所关注的。比较经典的方法是贪婪方法\upcite{germann2003greedy}。也有研究者对不同的解码方法进行了对比\upcite{germann2001fast}，并给出了一些加速解码的思路。随后，也有工作进一步对这些方法进行改进\upcite{DBLP:conf/coling/UdupaFM04,DBLP:conf/naacl/RiedelC09}。实际上，基于单词的模型的解码是一个NP完全问题\upcite{knight1999decoding}，这也是机器翻译的解码十分困难的原因。关于翻译模型解码算法的时间复杂度也有很多讨论\upcite{DBLP:conf/eacl/UdupaM06,DBLP:conf/emnlp/LeuschMN08,DBLP:journals/mt/FlemingKN15}。
 \end{itemize}
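
Reviewer note on the AER bullet above: the metric is named but not defined in the text. A minimal sketch follows, assuming the commonly used definition with sure links $S$, possible links $P$ (with $S \subseteq P$) and predicted links $A$: $\mathrm{AER} = 1 - (|A \cap S| + |A \cap P|)/(|A| + |S|)$. The function name and the toy alignments are hypothetical and are not taken from the book.

```python
def alignment_error_rate(predicted, sure, possible=None):
    """AER = 1 - (|A & S| + |A & P|) / (|A| + |S|).

    predicted: iterable of (source_pos, target_pos) links produced by an aligner (A)
    sure:      gold links annotated as certain (S)
    possible:  gold links annotated as acceptable (P); S is always treated as a subset of P
    """
    a = set(predicted)
    s = set(sure)
    p = s if possible is None else set(possible) | s
    if not a and not s:
        return 0.0  # nothing to predict and nothing predicted: no error (a convention chosen here)
    return 1.0 - (len(a & s) + len(a & p)) / (len(a) + len(s))


# Hypothetical toy example: two correct links and one wrong link.
predicted_links = {(1, 1), (2, 2), (3, 4)}
sure_links = {(1, 1), (2, 2)}
possible_links = {(3, 3)}
aer = alignment_error_rate(predicted_links, sure_links, possible_links)
```

A lower value is better; a prediction identical to the sure links gives an AER of 0.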

--- a/Chapter6/Figures/figure-example-of-t-s-generate.tex
+++ b/Chapter6/Figures/figure-example-of-t-s-generate.tex
@@ -10,39 +10,39 @@
 }
 {\scriptsize
 \node [anchor=west,minimum height=2.5em,minimum width=5.0em] (sf1) at ([xshift=1em]st.east) {};
-\node [rectangle,draw,anchor=west,line width=1pt,minimum height=2.5em,minimum width=5.0em,fill=green!30,drop shadow] (s1) at ([xshift=2.48em]sf1.east) {科学家};
+\node [rectangle,draw,anchor=west,line width=1pt,minimum height=2.5em,minimum width=5.0em,fill=green!30,drop shadow] (s1) at ([xshift=2.5em]sf1.east) {科学家};
-\node [rectangle,draw,anchor=west,line width=1pt,minimum height=2.5em,minimum width=5.0em,fill=green!30,drop shadow] (s2) at ([xshift=2.19em]s1.east) {们};
+\node [rectangle,draw,anchor=west,line width=1pt,minimum height=2.5em,minimum width=5.0em,fill=green!30,drop shadow] (s2) at ([xshift=2.5em]s1.east) {们};
-\node [rectangle,draw,anchor=west,line width=1pt,minimum height=2.5em,minimum width=5.0em,fill=green!30,drop shadow] (s3) at ([xshift=2.185em]s2.east) {并不};
+\node [rectangle,draw,anchor=west,line width=1pt,minimum height=2.5em,minimum width=5.0em,fill=green!30,drop shadow] (s3) at ([xshift=2.5em]s2.east) {并不};
-\node [rectangle,draw,anchor=west,line width=1pt,minimum height=2.5em,minimum width=5.0em,fill=green!30,drop shadow] (s4) at ([xshift=2.183em]s3.east) {知道};
+\node [rectangle,draw,anchor=west,line width=1pt,minimum height=2.5em,minimum width=5.0em,fill=green!30,drop shadow] (s4) at ([xshift=2.5em]s3.east) {知道};
 }
 {\scriptsize
-\node [anchor=west] (tau11) at ([xshift=1.5em]taut.east) {$\tau_0$\tiny{1.NULL}};
+\node [anchor=west] (tau11) at ([xshift=1.24em]taut.east) {$\tau_0$\; \tiny{1.NULL}};
 \begin{pgfonlayer}{background}
-\node [rounded rectangle,draw,line width=1pt,minimum height=3.0em,minimum width=6.8em,fill=red!30,drop shadow] (tau1) [fit = (tau11)] {};
+\node [rounded rectangle,draw,line width=1pt,minimum height=3.0em,minimum width=7.0em,fill=red!30,drop shadow] (tau1) [fit = (tau11)] {};
 \end{pgfonlayer}
-\node [anchor=west] (tau21) at ([xshift=1.80em]tau1.east) {$\tau_1$};
+\node [anchor=west] (tau21) at ([xshift=1.575em]tau1.east) {$\tau_1$\;};
 \node [anchor=west] (tau22) at ([yshift=-0.2em,xshift=-0.5em]tau21.north east) {\tiny{1.科学家}};
 \node [anchor=west] (tau23) at ([yshift=0.2em,xshift=-0.5em]tau21.south east) {\tiny{2.们}};
 \begin{pgfonlayer}{background}
-\node [rounded rectangle,draw,line width=1pt,minimum height=3.0em,minimum width=6.8em,fill=blue!30,drop shadow] (tau2)[fit = (tau21) (tau22) (tau23)] {};
+\node [rounded rectangle,draw,line width=1pt,minimum height=3.0em,minimum width=7.0em,fill=red!30,drop shadow] (tau2)[fit = (tau21) (tau22) (tau23)] {};
 \end{pgfonlayer}
-\node [anchor=west] (tau31) at ([xshift=2.05em]tau2.east) {$\tau_2$\tiny{1.NULL}};
+\node [anchor=west] (tau31) at ([xshift=1.997em]tau2.east) {$\tau_2$\; \tiny{1.NULL}};
 \begin{pgfonlayer}{background}
-\node [rounded rectangle,draw,line width=1pt,minimum height=3.0em,minimum width=6.8em,fill=red!30,drop shadow] (tau3) [fit = (tau31)] {};
+\node [rounded rectangle,draw,line width=1pt,minimum height=3.0em,minimum width=7.0em,fill=red!30,drop shadow] (tau3) [fit = (tau31)] {};
 \end{pgfonlayer}
-\node [anchor=west] (tau41) at ([xshift=2.2em]tau3.east) {$\tau_3$\tiny{1.并不}};
+\node [anchor=west] (tau41) at ([xshift=2.153em]tau3.east) {$\tau_3$\; \tiny{1.并不}};
 \begin{pgfonlayer}{background}
-\node [rounded rectangle,draw,line width=1pt,minimum height=3.0em,minimum width=6.8em,fill=red!30,drop shadow] (tau4) [fit = (tau41)] {};
+\node [rounded rectangle,draw,line width=1pt,minimum height=3.0em,minimum width=7.0em,fill=red!30,drop shadow] (tau4) [fit = (tau41)] {};
 \end{pgfonlayer}
-\node [anchor=west] (tau51) at ([xshift=2.2em]tau4.east) {$\tau_4$\tiny{1.知道}};
+\node [anchor=west] (tau51) at ([xshift=2.1525em]tau4.east) {$\tau_4$\; \tiny{1.知道}};
 \begin{pgfonlayer}{background}
-\node [rounded rectangle,draw,line width=1pt,minimum height=3.0em,minimum width=6.8em,fill=red!30,drop shadow] (tau5) [fit = (tau51)] {};
+\node [rounded rectangle,draw,line width=1pt,minimum height=3.0em,minimum width=7.0em,fill=red!30,drop shadow] (tau5) [fit = (tau51)] {};
 \end{pgfonlayer}
 }
@@ -73,37 +73,37 @@
 {\scriptsize
 \node [anchor=west,minimum height=2.5em,minimum width=5.0em] (sf12) at ([yshift=-15.0em,xshift=1em]st.east) {};
-\node [rectangle,draw,anchor=west,line width=1pt,minimum height=2.5em,minimum width=5.0em,fill=green!30,drop shadow] (s12) at ([xshift=2.48em]sf12.east) {科学家};
+\node [rectangle,draw,anchor=west,line width=1pt,minimum height=2.5em,minimum width=5.0em,fill=green!30,drop shadow] (s12) at ([xshift=2.5em]sf12.east) {科学家};
-\node [rectangle,draw,anchor=west,line width=1pt,minimum height=2.5em,minimum width=5.0em,fill=green!30,drop shadow] (s22) at ([xshift=2.19em]s12.east) {们};
+\node [rectangle,draw,anchor=west,line width=1pt,minimum height=2.5em,minimum width=5.0em,fill=green!30,drop shadow] (s22) at ([xshift=2.5em]s12.east) {们};
-\node [rectangle,draw,anchor=west,line width=1pt,minimum height=2.5em,minimum width=5.0em,fill=green!30,drop shadow] (s32) at ([xshift=2.185em]s22.east) {并不};
+\node [rectangle,draw,anchor=west,line width=1pt,minimum height=2.5em,minimum width=5.0em,fill=green!30,drop shadow] (s32) at ([xshift=2.5em]s22.east) {并不};
-\node [rectangle,draw,anchor=west,line width=1pt,minimum height=2.5em,minimum width=5.0em,fill=green!30,drop shadow] (s42) at ([xshift=2.183em]s32.east) {知道};
+\node [rectangle,draw,anchor=west,line width=1pt,minimum height=2.5em,minimum width=5.0em,fill=green!30,drop shadow] (s42) at ([xshift=2.5em]s32.east) {知道};
 }
 {\scriptsize
-\node [anchor=west] (tau112) at ([yshift=-15.0em,xshift=1.5em]taut.east) {$\tau_0$\tiny{1.NULL}};
+\node [anchor=west] (tau112) at ([yshift=-15.0em,xshift=1.24em]taut.east) {$\tau_0$\; \tiny{1.NULL}};
 \begin{pgfonlayer}{background}
 \node [rounded rectangle,draw,line width=1pt,minimum height=3.0em,minimum width=6.8em,fill=red!30,drop shadow] (tau12) [fit = (tau112)] {};
 \end{pgfonlayer}
-\node [anchor=west] (tau212) at ([xshift=1.80em]tau12.east) {$\tau_1$};
+\node [anchor=west] (tau212) at ([xshift=1.6762em]tau12.east) {$\tau_1$\;};
 \node [anchor=west] (tau222) at ([yshift=-0.2em,xshift=-0.5em]tau212.north east) {\tiny{1.们}};
 \node [anchor=west] (tau232) at ([yshift=0.2em,xshift=-0.5em]tau212.south east) {\tiny{2.科学家}};
 \begin{pgfonlayer}{background}
 \node [rounded rectangle,draw,line width=1pt,minimum height=3.0em,minimum width=6.8em,fill=yellow!30,drop shadow] (tau22)[fit = (tau212) (tau222) (tau232)] {};
 \end{pgfonlayer}
-\node [anchor=west] (tau312) at ([xshift=2.05em]tau22.east) {$\tau_2$\tiny{1.NULL}};
+\node [anchor=west] (tau312) at ([xshift=1.997em]tau22.east) {$\tau_2$\; \tiny{1.NULL}};
 \begin{pgfonlayer}{background}
 \node [rounded rectangle,draw,line width=1pt,minimum height=3.0em,minimum width=6.8em,fill=red!30,drop shadow] (tau32) [fit = (tau312)] {};
 \end{pgfonlayer}
-\node [anchor=west] (tau412) at ([xshift=2.2em]tau32.east) {$\tau_3$\tiny{1.并不}};
+\node [anchor=west] (tau412) at ([xshift=1.9555em]tau32.east) {$\tau_3$\; \tiny{1.并不}};
 \begin{pgfonlayer}{background}
 \node [rounded rectangle,draw,line width=1pt,minimum height=3.0em,minimum width=6.8em,fill=red!30,drop shadow] (tau42) [fit = (tau412)] {};
 \end{pgfonlayer}
-\node [anchor=west] (tau512) at ([xshift=2.2em]tau42.east) {$\tau_4$\tiny{1.知道}};
+\node [anchor=west] (tau512) at ([xshift=2.2525em]tau42.east) {$\tau_4$\; \tiny{1.知道}};
 \begin{pgfonlayer}{background}
 \node [rounded rectangle,draw,line width=1pt,minimum height=3.0em,minimum width=6.8em,fill=red!30,drop shadow] (tau52) [fit = (tau512)] {};
 \end{pgfonlayer}
@@ -131,6 +131,7 @@
 \draw [->,thick] (d42.north) -- ([yshift=-4.45em]s32.south);
 \draw [->,thick] (d52.north) -- ([yshift=-4.45em]s42.south);
 %\end{scope}
 \end{tikzpicture}

--- a/Chapter6/Figures/figure-examples-of-sequential-translation-and-reorder-translation.tex
+++ b/Chapter6/Figures/figure-examples-of-sequential-translation-and-reorder-translation.tex
@@ -24,8 +24,9 @@
 		\draw[line width=1.2pt,dashed] ([yshift=-0.3em]n14.south) -- ([yshift=0.2em]n24.north);
 		\draw[line width=1.2pt,dashed] ([yshift=-0.3em]n15.south) -- ([yshift=0.2em]n25.north);
 		\draw[line width=1.2pt,dashed] ([yshift=-0.3em]n16.south) -- ([yshift=0.2em]n26.north);
+        \node[anchor=west] at([xshift=5.5em,yshift=-3em]n21.east){(a)顺序翻译对齐结果};
 \end{scope}
-\begin{scope}[yshift=-10.0em]
+\begin{scope}[yshift=-11.5em]
 	\tikzstyle{cand} = [draw,inner sep=4pt,line width=1pt,align=center,drop shadow,minimum height =1.6em,minimum width=4.2em,fill=green!30]
 	\tikzstyle{ref} = [draw,inner sep=4pt,line width=1pt,align=center,drop shadow,minimum height =1.6em,minimum width=4.2em,fill=red!30]
@@ -48,6 +49,7 @@
 		\draw[line width=1.2pt,dashed,out=-40,in=140] ([yshift=-0.3em]n14.south) to ([yshift=0.2em]n26.north);
 		\draw[line width=1.2pt,dashed,out=-140,in=40] ([yshift=-0.3em]n15.south) to ([yshift=0.2em]n23.north);
 		\draw[line width=1.2pt,dashed,out=-140,in=40] ([yshift=-0.3em]n16.south) to ([yshift=0.2em]n24.north);
+		\node[anchor=west] at([xshift=5.5em,yshift=-3em]n21.east){(b)调序翻译对齐结果};
 \end{scope}
 \end{tikzpicture}
 %---------------------------------------------------------------------
\ No newline at end of file
--- a/Chapter6/Figures/figure-probability-translation-process.tex
+++ b/Chapter6/Figures/figure-probability-translation-process.tex
@@ -11,39 +11,39 @@
 }
 {\scriptsize
 \node [anchor=west,minimum height=2.5em,minimum width=5.0em] (sf1) at ([xshift=1em]st.east) {};
-\node [rectangle,draw,anchor=west,line width=1pt,minimum height=2.5em,minimum width=5.0em,fill=green!30,drop shadow] (s1) at ([xshift=2.48em]sf1.east) {科学家};
+\node [rectangle,draw,anchor=west,line width=1pt,minimum height=2.5em,minimum width=5.0em,fill=green!30,drop shadow] (s1) at ([xshift=2.5em]sf1.east) {科学家};
-\node [rectangle,draw,anchor=west,line width=1pt,minimum height=2.5em,minimum width=5.0em,fill=green!30,drop shadow] (s2) at ([xshift=2.19em]s1.east) {们};
+\node [rectangle,draw,anchor=west,line width=1pt,minimum height=2.5em,minimum width=5.0em,fill=green!30,drop shadow] (s2) at ([xshift=2.5em]s1.east) {们};
-\node [rectangle,draw,anchor=west,line width=1pt,minimum height=2.5em,minimum width=5.0em,fill=green!30,drop shadow] (s3) at ([xshift=2.185em]s2.east) {并不};
+\node [rectangle,draw,anchor=west,line width=1pt,minimum height=2.5em,minimum width=5.0em,fill=green!30,drop shadow] (s3) at ([xshift=2.5em]s2.east) {并不};
-\node [rectangle,draw,anchor=west,line width=1pt,minimum height=2.5em,minimum width=5.0em,fill=green!30,drop shadow] (s4) at ([xshift=2.183em]s3.east) {知道};
+\node [rectangle,draw,anchor=west,line width=1pt,minimum height=2.5em,minimum width=5.0em,fill=green!30,drop shadow] (s4) at ([xshift=2.5em]s3.east) {知道};
 }
 {\scriptsize
-\node [anchor=west] (tau11) at ([xshift=1.5em]taut.east) {$\tau_0$\tiny{1.NULL}};
+\node [anchor=west] (tau11) at ([xshift=1.24em]taut.east) {$\tau_0$\; \tiny{1.NULL}};
 \begin{pgfonlayer}{background}
-\node [rounded rectangle,draw,line width=1pt,minimum height=3.0em,minimum width=6.8em,fill=red!30,drop shadow] (tau1) [fit = (tau11)] {};
+\node [rounded rectangle,draw,line width=1pt,minimum height=3.0em,minimum width=7.0em,fill=red!30,drop shadow] (tau1) [fit = (tau11)] {};
 \end{pgfonlayer}
-\node [anchor=west] (tau21) at ([xshift=1.80em]tau1.east) {$\tau_1$};
+\node [anchor=west] (tau21) at ([xshift=1.575em]tau1.east) {$\tau_1$\;};
 \node [anchor=west] (tau22) at ([yshift=-0.2em,xshift=-0.5em]tau21.north east) {\tiny{1.科学家}};
 \node [anchor=west] (tau23) at ([yshift=0.2em,xshift=-0.5em]tau21.south east) {\tiny{2.们}};
 \begin{pgfonlayer}{background}
-\node [rounded rectangle,draw,line width=1pt,minimum height=3.0em,minimum width=6.8em,fill=red!30,drop shadow] (tau2)[fit = (tau21) (tau22) (tau23)] {};
+\node [rounded rectangle,draw,line width=1pt,minimum height=3.0em,minimum width=7.0em,fill=red!30,drop shadow] (tau2)[fit = (tau21) (tau22) (tau23)] {};
 \end{pgfonlayer}
-\node [anchor=west] (tau31) at ([xshift=2.05em]tau2.east) {$\tau_2$\tiny{1.NULL}};
+\node [anchor=west] (tau31) at ([xshift=1.997em]tau2.east) {$\tau_2$\; \tiny{1.NULL}};
 \begin{pgfonlayer}{background}
-\node [rounded rectangle,draw,line width=1pt,minimum height=3.0em,minimum width=6.8em,fill=red!30,drop shadow] (tau3) [fit = (tau31)] {};
+\node [rounded rectangle,draw,line width=1pt,minimum height=3.0em,minimum width=7.0em,fill=red!30,drop shadow] (tau3) [fit = (tau31)] {};
 \end{pgfonlayer}
-\node [anchor=west] (tau41) at ([xshift=2.2em]tau3.east) {$\tau_3$\tiny{1.并不}};
+\node [anchor=west] (tau41) at ([xshift=2.153em]tau3.east) {$\tau_3$\; \tiny{1.并不}};
 \begin{pgfonlayer}{background}
-\node [rounded rectangle,draw,line width=1pt,minimum height=3.0em,minimum width=6.8em,fill=red!30,drop shadow] (tau4) [fit = (tau41)] {};
+\node [rounded rectangle,draw,line width=1pt,minimum height=3.0em,minimum width=7.0em,fill=red!30,drop shadow] (tau4) [fit = (tau41)] {};
 \end{pgfonlayer}
-\node [anchor=west] (tau51) at ([xshift=2.2em]tau4.east) {$\tau_4$\tiny{1.知道}};
+\node [anchor=west] (tau51) at ([xshift=2.1525em]tau4.east) {$\tau_4$\; \tiny{1.知道}};
 \begin{pgfonlayer}{background}
-\node [rounded rectangle,draw,line width=1pt,minimum height=3.0em,minimum width=6.8em,fill=red!30,drop shadow] (tau5) [fit = (tau51)] {};
+\node [rounded rectangle,draw,line width=1pt,minimum height=3.0em,minimum width=7.0em,fill=red!30,drop shadow] (tau5) [fit = (tau51)] {};
 \end{pgfonlayer}
 }
@@ -51,27 +51,27 @@
 {\scriptsize
 \node [anchor=west] (phi11) at ([xshift=2.3em]phit.east) {$\phi_0$\ 0};
 \begin{pgfonlayer}{background}
-\node [rounded rectangle,draw,line width=1pt,minimum height=3.0em,minimum width=6.8em,fill=blue!30,drop shadow] (phi1) [fit = (phi11)] {};
+\node [rounded rectangle,draw,line width=1pt,minimum height=3.0em,minimum width=7.0em,fill=blue!30,drop shadow] (phi1) [fit = (phi11)] {};
 \end{pgfonlayer}
-\node [anchor=west] (phi21) at ([xshift=2.947em]phi1.east) {$\phi_1$\ 2};
+\node [anchor=west] (phi21) at ([xshift=2.867em]phi1.east) {$\phi_1$\ 2};
 \begin{pgfonlayer}{background}
-\node [rounded rectangle,draw,line width=1pt,minimum height=3.0em,minimum width=6.8em,fill=blue!30,drop shadow] (phi2) [fit = (phi21)] {};
+\node [rounded rectangle,draw,line width=1pt,minimum height=3.0em,minimum width=7.0em,fill=blue!30,drop shadow] (phi2) [fit = (phi21)] {};
 \end{pgfonlayer}
-\node [anchor=west] (phi31) at ([xshift=2.876em]phi2.east) {$\phi_2$\ 0};
+\node [anchor=west] (phi31) at ([xshift=3.087em]phi2.east) {$\phi_2$\ 0};
 \begin{pgfonlayer}{background}
-\node [rounded rectangle,draw,line width=1pt,minimum height=3.0em,minimum width=6.8em,fill=blue!30,drop shadow] (phi3) [fit = (phi31)] {};
+\node [rounded rectangle,draw,line width=1pt,minimum height=3.0em,minimum width=7.0em,fill=blue!30,drop shadow] (phi3) [fit = (phi31)] {};
 \end{pgfonlayer}
-\node [anchor=west] (phi41) at ([xshift=2.8715em]phi3.east) {$\phi_3$\ 1};
+\node [anchor=west] (phi41) at ([xshift=3.086em]phi3.east) {$\phi_3$\ 1};
 \begin{pgfonlayer}{background}
-\node [rounded rectangle,draw,line width=1pt,minimum height=3.0em,minimum width=6.8em,fill=blue!30,drop shadow] (phi4) [fit = (phi41)] {};
+\node [rounded rectangle,draw,line width=1pt,minimum height=3.0em,minimum width=7.0em,fill=blue!30,drop shadow] (phi4) [fit = (phi41)] {};
 \end{pgfonlayer}
-\node [anchor=west] (phi51) at ([xshift=2.86925em]phi4.east) {$\phi_4$\ 1};
+\node [anchor=west] (phi51) at ([xshift=3.086em]phi4.east) {$\phi_4$\ 1};
 \begin{pgfonlayer}{background}
-\node [rounded rectangle,draw,line width=1pt,minimum height=3.0em,minimum width=6.8em,fill=blue!30,drop shadow] (phi5) [fit = (phi51)] {};
+\node [rounded rectangle,draw,line width=1pt,minimum height=3.0em,minimum width=7.0em,fill=blue!30,drop shadow] (phi5) [fit = (phi51)] {};
 \end{pgfonlayer}
 }
@@ -105,7 +105,6 @@
 \draw [->,thick] (t4.north) -- (phi4.south);
 \draw [->,thick] (t5.north) -- (phi5.south);
 {\scriptsize
 \node [anchor=west] (sent11) at ([xshift=1em,yshift=-0.3em]s4.south east) {把这些源语};
 \node [anchor=west] (sent12) at ([yshift=-1em]sent11.west) {言单词放在};

--- a/Chapter6/chapter6.tex
+++ b/Chapter6/chapter6.tex
@@ -34,7 +34,7 @@
 \sectionnewpage
 \section{基于扭曲度的翻译模型}
-下面将介绍扭曲度在机器翻译中的定义及使用方法。这也带来了两个新的翻译模型\ \dash\ IBM模型2\cite{Peter1993The}和HMM翻译模型\cite{vogel1996hmm}。
+下面将介绍扭曲度在机器翻译中的定义及使用方法。这也带来了两个新的翻译模型\ \dash\ IBM模型2\upcite{DBLP:journals/coling/BrownPPM94}和HMM翻译模型\upcite{vogel1996hmm}。
 %----------------------------------------------------------------------------------------
 %    NEW SUB-SECTION
@@ -71,44 +71,45 @@
 %----------------------------------------------------------------------------------------
 \subsection{IBM模型2}
-\parinterval 对于建模来说，IBM模型1很好地化简了翻译问题，但是由于使用了很强的假设，导致模型和实际情况有较大差异。其中一个比较严重的问题是假设词对齐的生成概率服从均匀分布。IBM模型2抛弃了这个假设\cite{Peter1993The}。它认为词对齐是有倾向性的，它与源语言单词的位置和目标语言单词的位置有关。具体来说，对齐位置$a_j$的生成概率与位置$j$、源语言句子长度$m$和目标语言句子长度$l$有关，形式化表述为：
+\parinterval 对于建模来说，IBM模型1很好地化简了翻译问题，但是由于使用了很强的假设，导致模型和实际情况有较大差异。其中一个比较严重的问题是假设词对齐的生成概率服从均匀分布。IBM模型2抛弃了这个假设\upcite{DBLP:journals/coling/BrownPPM94}。它认为词对齐是有倾向性的，它与源语言单词的位置和目标语言单词的位置有关。具体来说，对齐位置$a_j$的生成概率与位置$j$、源语言句子长度$m$和目标语言句子长度$l$有关，形式化表述为：
 \begin{eqnarray}
-\textrm{P}(a_j|a_1^{j-1},s_1^{j-1},m,\mathbf{t}) \equiv a(a_j|j,m,l)
+\funp{P}(a_j|a_1^{j-1},s_1^{j-1},m,\vectorn{t}) \equiv a(a_j|j,m,l)
 \label{eq:6-1}
 \end{eqnarray}
-\parinterval 这里还用{\chapterthree}中的例子（图\ref{fig:6-4-a}）来进行说明。在IBM模型1中，``桌子''对齐到目标语言四个位置的概率是一样的。但在IBM模型2中，``桌子''对齐到``table''被形式化为$a(a_j |j,m,l)=a(3|2,3,3)$，意思是对于源语言位置2（$j=2$）的词，如果它的源语言和目标语言都是3个词（$l=3,m=3$），对齐到目标语言位置3（$a_j=3$）的概率是多少？因为$a(a_j|j,m,l)$也是模型需要学习的参数，因此``桌子''对齐到不同目标语言单词的概率也是不一样的。理想的情况下，通过$a(a_j|j,m,l)$，``桌子''对齐到``table''应该得到更高的概率。
+\parinterval 这里还用{\chapterthree}中的例子（图\ref{fig:6-3}）来进行说明。在IBM模型1中，``桌子''对齐到目标语言四个位置的概率是一样的。但在IBM模型2中，``桌子''对齐到``table''被形式化为$a(a_j |j,m,l)=a(3|2,3,3)$，意思是对于源语言位置2（$j=2$）的词，如果它的源语言和目标语言都是3个词（$l=3,m=3$），对齐到目标语言位置3（$a_j=3$）的概率是多少？因为$a(a_j|j,m,l)$也是模型需要学习的参数，因此``桌子''对齐到不同目标语言单词的概率也是不一样的。理想的情况下，通过$a(a_j|j,m,l)$，``桌子''对齐到``table''应该得到更高的概率。
 %----------------------------------------------
 \begin{figure}[htp]
    \centering
 \input{./Chapter6/Figures/figure-zh-en-bilingual-sentence-pairs}
    \caption{汉译英句对及词对齐}
-    \label{fig:6-4-a}
+    \label{fig:6-3}
 \end{figure}
 %----------------------------------------------
 \parinterval IBM模型2的其他假设均与模型1相同，即源语言长度预测概率及源语言单词生成概率被定义为：
 \begin{eqnarray}
-\textrm{P}(m|\mathbf{t}) & \equiv & \varepsilon \label{eq:s-len-gen-prob} \\
+\funp{P}(m|\vectorn{t}) & \equiv & \varepsilon \label{eq:s-len-gen-prob} \\
-\textrm{P}(s_j|a_1^{j},s_1^{j-1},m,\mathbf{t}) & \equiv & f(s_j|t_{a_j}) \label{eq:s-word-gen-prob}
+\funp{P}(s_j|a_1^{j},s_1^{j-1},m,\vectorn{t}) & \equiv & f(s_j|t_{a_j})
+\label{eq:s-word-gen-prob}
 \end{eqnarray}
-把公式\ref{eq:s-len-gen-prob}、\ref{eq:s-word-gen-prob}和\ref{eq:6-1} 重新带入公式$\textrm{P}(\mathbf{s},\mathbf{a}|\mathbf{t})=\textrm{P}(m|\mathbf{t})\prod_{j=1}^{m}{\textrm{P}(a_j|a_1^{j-1},s_1^{j-1},m,\mathbf{t})\textrm{P}(s_j|a_1^{j},s_1^{j-1},}$\\${m,\mathbf{t})}$ 和$\textrm{P}(\mathbf{s}|\mathbf{t})= \sum_{\mathbf{a}}\textrm{P}(\mathbf{s},\mathbf{a}|\mathbf{t})$，可以得到IBM模型2的数学描述：
+把公式\ref{eq:s-len-gen-prob}、\ref{eq:s-word-gen-prob}和\ref{eq:6-1} 重新代入公式$\funp{P}(\vectorn{s},\vectorn{a}|\vectorn{t})=\funp{P}(m|\vectorn{t})\prod_{j=1}^{m}{\funp{P}(a_j|a_1^{j-1},s_1^{j-1},m,\vectorn{t})}$\\${\funp{P}(s_j|a_1^{j},s_1^{j-1},m,\vectorn{t})}$ 和$\funp{P}(\vectorn{s}|\vectorn{t})= \sum_{\vectorn{a}}\funp{P}(\vectorn{s},\vectorn{a}|\vectorn{t})$，可以得到IBM模型2的数学描述：
 \begin{eqnarray}
-\textrm{P}(\mathbf{s}| \mathbf{t}) & = &  \sum_{\mathbf{a}}{\textrm{P}(\mathbf{s},\mathbf{a}| \mathbf{t})} \nonumber \\
+\funp{P}(\vectorn{s}| \vectorn{t}) & = &  \sum_{\vectorn{a}}{\funp{P}(\vectorn{s},\vectorn{a}| \vectorn{t})} \nonumber \\
                       & = & \sum_{a_1=0}^{l}{\cdots}\sum _{a_m=0}^{l}{\varepsilon}\prod_{j=1}^{m}{a(a_j|j,m,l)f(s_j|t_{a_j})}
-\label{eq:6-2}
+\label{eq:6-4}
 \end{eqnarray}
-\parinterval 类似于模型1，模型2的表达式\ref{eq:6-2}也能被拆分为两部分进行理解。第一部分：遍历所有的$\mathbf{a}$；第二部分：对于每个$\mathbf{a}$累加对齐概率$\textrm{P}(\mathbf{s},\mathbf{a}| \mathbf{t})$，即计算对齐概率$a(a_j|j,m,l)$和词汇翻译概率$f(s_j|t_{a_j})$对于所有源语言位置的乘积。
+\parinterval 类似于模型1，模型2的表达式\ref{eq:6-4}也能被拆分为两部分进行理解。第一部分：遍历所有的$\vectorn{a}$；第二部分：对于每个$\vectorn{a}$累加对齐概率$\funp{P}(\vectorn{s},\vectorn{a}| \vectorn{t})$，即计算对齐概率$a(a_j|j,m,l)$和词汇翻译概率$f(s_j|t_{a_j})$对于所有源语言位置的乘积。
 \parinterval 同样的，模型2的解码及训练优化和模型1的十分相似，在此不再赘述，详细推导过程可以参看{\chapterfive}解码及计算优化部分。这里直接给出IBM模型2的最终表达式：
 \begin{eqnarray}
-\textrm{P}(\mathbf{s}| \mathbf{t}) & = & \varepsilon \prod\limits_{j=1}^{m} \sum\limits_{i=0}^{l} a(i|j,m,l) f(s_j|t_i)
+\funp{P}(\vectorn{s}| \vectorn{t}) & = & \varepsilon \prod\limits_{j=1}^{m} \sum\limits_{i=0}^{l} a(i|j,m,l) f(s_j|t_i)
-\label{eq:6-3}
+\label{eq:6-5}
 \end{eqnarray}
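+\parinterval 作为一个示意性的小例子（句长取值仅为便于说明的假设），令$m=2$、$l=1$。此时公式\ref{eq:6-4}中的4项求和可以整理为两个因子的乘积，这与公式\ref{eq:6-5}的形式一致：
+\begin{eqnarray}
+\funp{P}(\vectorn{s}| \vectorn{t}) & = & \varepsilon \sum_{a_1=0}^{1}\sum_{a_2=0}^{1} a(a_1|1,2,1)f(s_1|t_{a_1}) \, a(a_2|2,2,1)f(s_2|t_{a_2}) \nonumber \\
+                       & = & \varepsilon \big[a(0|1,2,1)f(s_1|t_0)+a(1|1,2,1)f(s_1|t_1)\big] \nonumber \\
+                       &   & \times \big[a(0|2,2,1)f(s_2|t_0)+a(1|2,2,1)f(s_2|t_1)\big] \nonumber
+\end{eqnarray}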
@@ -118,40 +119,40 @@
 \subsection{隐马尔可夫模型}
-\parinterval IBM模型把翻译问题定义为生成词对齐的问题，模型翻译质量的好坏与词对齐有着非常紧密的联系。IBM模型1假设对齐概率仅依赖于目标语言句子长度，即对齐概率服从均匀分布；IBM模型2假设对齐概率与源语言、目标语言的句子长度以及源语言位置和目标语言位置相关。虽然IBM模型2已经覆盖了一部分词对齐问题，但是该模型只考虑到了单词的绝对位置，并未考虑到相邻单词间的关系。图\ref{fig:6-5} 展示了一个简单的实例，可以看到的是，汉语的每个单词都被分配给了英语句子中的每一个单词，但是单词并不是任意分布在各个位置上的，而是倾向于生成簇。也就是说，如果源语言的两个单词位置越近，它们的译文在目标语言句子中的位置也越近。
+\parinterval IBM模型把翻译问题定义为生成词对齐的问题，模型翻译质量的好坏与词对齐有着非常紧密的联系。IBM模型1假设对齐概率仅依赖于目标语言句子长度，即对齐概率服从均匀分布；IBM模型2假设对齐概率与源语言、目标语言的句子长度以及源语言位置和目标语言位置相关。虽然IBM模型2已经覆盖了一部分词对齐问题，但是该模型只考虑到了单词的绝对位置，并未考虑到相邻单词间的关系。图\ref{fig:6-4} 展示了一个简单的实例，可以看到的是，汉语的每个单词都被分配给了英语句子中的某一个单词，但是单词并不是任意分布在各个位置上的，而是倾向于生成簇。也就是说，源语言的两个单词位置越近，它们的译文在目标语言句子中的位置往往也越近。
 %----------------------------------------------
 \begin{figure}[htp]
    \centering
 \input{./Chapter6/Figures/figure-zh-en-sentence-alignment}
    \caption{汉译英句对及对齐}
-    \label{fig:6-5}
+    \label{fig:6-4}
 \end{figure}
 %----------------------------------------------
-\parinterval 针对此问题，基于HMM的词对齐模型抛弃了IBM模型1-2的绝对位置假设，将一阶隐马尔可夫模型用于词对齐问题\cite{vogel1996hmm}。HMM词对齐模型认为，单词与单词之间并不是毫无联系的，对齐概率应该取决于对齐位置的差异而不是本身单词所在的位置。具体来说，位置$j$的对齐概率$a_j$与前一个位置$j-1$的对齐位置$a_{j-1}$和译文长度$l$有关，形式化的表述为：
+\parinterval 针对此问题，基于HMM的词对齐模型抛弃了IBM模型1-2的绝对位置假设，将一阶隐马尔可夫模型用于词对齐问题\upcite{vogel1996hmm}。HMM词对齐模型认为，单词与单词之间并不是毫无联系的，对齐概率应该取决于对齐位置的差异而不是本身单词所在的位置。具体来说，位置$j$的对齐$a_j$的概率与前一个位置$j-1$的对齐位置$a_{j-1}$和译文长度$l$有关，形式化的表述为：
 \begin{eqnarray}
-\textrm{P}(a_{j}|a_{1}^{j-1},s_{1}^{j-1},m,\mathbf{t})\equiv\textrm{P}(a_{j}|a_{j-1},l)
+\funp{P}(a_{j}|a_{1}^{j-1},s_{1}^{j-1},m,\vectorn{t})\equiv\funp{P}(a_{j}|a_{j-1},l)
-\label{eq:6-4}
+\label{eq:6-6}
 \end{eqnarray}
-\parinterval 这里用图\ref{fig:6-5}的例子对公式进行说明。在IBM模型1-2中，单词的对齐都是与单词所在的绝对位置有关。但在HMM词对齐模型中，``你''对齐到``you''被形式化为$\textrm{P}(a_{j}|a_{j-1},l)= P(5|4,5)$，意思是对于源语言位置$3(j=3)$上的单词，如果它的译文是第5个目标语言单词，上一个对齐位置是$4(a_{2}=4)$，对齐到目标语言位置$5(a_{j}=5)$的概率是多少？理想的情况下，通过$\textrm{P}(a_{j}|a_{j-1},l)$，``你''对齐到``you''应该得到更高的概率，并且由于源语言单词``对''和``你''距离很近，因此其对应的对齐位置``with''和``you''的距离也应该很近。
+\parinterval 这里用图\ref{fig:6-4}的例子对公式进行说明。在IBM模型1-2中，单词的对齐都是与单词所在的绝对位置有关。但在HMM词对齐模型中，``你''对齐到``you''被形式化为$\funp{P}(a_{j}|a_{j-1},l)= \funp{P}(5|4,5)$，意思是对于源语言位置$3(j=3)$上的单词，在上一个对齐位置是$4(a_{2}=4)$的条件下，它对齐到目标语言位置$5(a_{j}=5)$的概率是多少？理想的情况下，通过$\funp{P}(a_{j}|a_{j-1},l)$，``你''对齐到``you''应该得到更高的概率，并且由于源语言单词``对''和``你''距离很近，因此其对应的对齐位置``with''和``you''的距离也应该很近。
-\parinterval 把公式$\textrm{P}(s_j|a_1^{j},s_1^{j-1},m,\mathbf{t}) \equiv f(s_j|t_{a_j})$和\ref{eq:6-4}重新带入公式$\textrm{P}(\mathbf{s},\mathbf{a}|\mathbf{t})=\textrm{P}(m|\mathbf{t})$\\$\prod_{j=1}^{m}{\textrm{P}(a_j|a_1^{j-1},s_1^{j-1},m,\mathbf{t})\textrm{P}(s_j|a_1^{j},s_1^{j-1},m,\mathbf{t})}$和$\textrm{P}(\mathbf{s}|\mathbf{t})= \sum_{\mathbf{a}}\textrm{P}(\mathbf{s},\mathbf{a}|\mathbf{t})$,可得HMM词对齐模型的数学描述：
+\parinterval 把公式$\funp{P}(s_j|a_1^{j},s_1^{j-1},m,\vectorn{t}) \equiv f(s_j|t_{a_j})$和\ref{eq:6-6}重新代入公式$\funp{P}(\vectorn{s},\vectorn{a}|\vectorn{t})=\funp{P}(m|\vectorn{t})$\\$\prod_{j=1}^{m}{\funp{P}(a_j|a_1^{j-1},s_1^{j-1},m,\vectorn{t})\funp{P}(s_j|a_1^{j},s_1^{j-1},m,\vectorn{t})}$和$\funp{P}(\vectorn{s}|\vectorn{t})= \sum_{\vectorn{a}}\funp{P}(\vectorn{s},\vectorn{a}|\vectorn{t})$，可得HMM词对齐模型的数学描述：
 \begin{eqnarray}
-\textrm{P}(\mathbf{s}| \mathbf{t})=\sum_{\mathbf{a}}{\textrm{P}(m|\mathbf{t})}\prod_{j=1}^{m}{\textrm{P}(a_{j}|a_{j-1},l)f(s_{j}|t_{a_j})}
+\funp{P}(\vectorn{s}| \vectorn{t})=\sum_{\vectorn{a}}{\funp{P}(m|\vectorn{t})}\prod_{j=1}^{m}{\funp{P}(a_{j}|a_{j-1},l)f(s_{j}|t_{a_j})}
-\label{eq:6-5}
+\label{eq:6-7}
 \end{eqnarray}
-\parinterval 此外，为了使得HMM的对齐概率$\textrm{P}(a_{j}|a_{j-1},l)$满足归一化的条件，这里还假设其对齐概率只取决于$a_{j}-a_{j-1}$，即：
+\parinterval 此外，为了使得HMM的对齐概率$\funp{P}(a_{j}|a_{j-1},l)$满足归一化的条件，这里还假设其对齐概率只取决于$a_{j}-a_{j-1}$，即：
 \begin{eqnarray}
-\textrm{P}(a_{j}|a_{j-1},l)=\frac{\mu(a_{j}-a_{j-1})}{\sum_{i=1}^{l}{\mu(i-a_{j-1})}}
+\funp{P}(a_{j}|a_{j-1},l)=\frac{\mu(a_{j}-a_{j-1})}{\sum_{i=1}^{l}{\mu(i-a_{j-1})}}
-\label{eq:6-6}
+\label{eq:6-8}
 \end{eqnarray}
 \noindent 其中，$\mu( \cdot )$是隐马尔可夫模型的参数，可以通过训练得到。
-\parinterval 需要注意的是，公式\ref{eq:6-5}之所以被看作是一种隐马尔可夫模型，是由于其形式与标准的一阶隐马尔可夫模型无异。$\textrm{P}(a_{j}|a_{j-1},l)$可以被看作是一种状态转移概率，$f(s_{j}|t_{a_j})$可以被看作是一种发射概率。关于隐马尔可夫模型具体的数学描述也可参考{\chapterthree}中的相关内容。
+\parinterval 需要注意的是，公式\ref{eq:6-7}之所以被看作是一种隐马尔可夫模型，是由于其形式与标准的一阶隐马尔可夫模型无异。$\funp{P}(a_{j}|a_{j-1},l)$可以被看作是一种状态转移概率，$f(s_{j}|t_{a_j})$可以被看作是一种发射概率。关于隐马尔可夫模型具体的数学描述也可参考{\chapterthree}中的相关内容。
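+\parinterval 下面用一个示意性的例子说明公式\ref{eq:6-8}的归一化（其中$l=3$、$a_{j-1}=2$以及$\mu(\cdot)$的取值均为假设，并非训练结果）。此时分母为$\mu(-1)+\mu(0)+\mu(1)$，因此：
+\begin{eqnarray}
+\funp{P}(a_j=3|a_{j-1}=2,l=3) & = & \frac{\mu(1)}{\mu(-1)+\mu(0)+\mu(1)} \nonumber
+\end{eqnarray}
+\noindent 若假设$\mu(-1)=1$、$\mu(0)=2$、$\mu(1)=5$，则该概率为$5/8$，而$a_j$取1、2、3的概率之和为$1/8+2/8+5/8=1$。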
@@ -172,17 +173,17 @@
 \parinterval 从前面的介绍可知，IBM模型1和模型2把不同的源语言单词看作相互独立的单元来进行词对齐和翻译。换句话说，即使某个源语言短语中的两个单词都对齐到同一个目标语单词，它们之间也是相互独立的。这样IBM模型1和模型2对于多个源语言单词对齐到同一个目标语单词的情况并不能很好地进行描述。
-\parinterval 这里将会给出另一个翻译模型，能在一定程度上解决上面提到的问题\cite{Peter1993The,och2003systematic}。该模型把目标语言生成源语言的过程分解为如下几个步骤：首先，确定每个目标语言单词生成源语言单词的个数，这里把它称为{\small\sffamily\bfseries{繁衍率}}\index{繁衍率}或{\small\sffamily\bfseries{产出率}}\index{产出率}（Fertility）\index{Fertility}；其次，决定目标语言句子中每个单词生成的源语言单词都是什么，即决定生成的第一个源语言单词是什么，生成的第二个源语言单词是什么，以此类推。这样每个目标语言单词就对应了一个源语言单词列表；最后把各组源语言单词列表中的每个单词都放置到合适的位置上，完成目标语言译文到源语言句子的生成。
+\parinterval 这里将会给出另一个翻译模型，能在一定程度上解决上面提到的问题\upcite{DBLP:journals/coling/BrownPPM94,och2003systematic}。该模型把目标语言生成源语言的过程分解为如下几个步骤：首先，确定每个目标语言单词生成源语言单词的个数，这里把它称为{\small\sffamily\bfseries{繁衍率}}\index{繁衍率}或{\small\sffamily\bfseries{产出率}}\index{产出率}（Fertility）\index{Fertility}；其次，决定目标语言句子中每个单词生成的源语言单词都是什么，即决定生成的第一个源语言单词是什么，生成的第二个源语言单词是什么，以此类推。这样每个目标语言单词就对应了一个源语言单词列表；最后把各组源语言单词列表中的每个单词都放置到合适的位置上，完成目标语言译文到源语言句子的生成。
-\parinterval 对于句对$(\mathbf{s},\mathbf{t})$，令$\varphi$表示产出率，同时令${\tau}$表示每个目标语言单词对应的源语言单词列表。图{\ref{fig:6-6}}描述了一个英语句子生成汉语句子的过程。
+\parinterval 对于句对$(\vectorn{s},\vectorn{t})$，令$\varphi$表示产出率，同时令${\tau}$表示每个目标语言单词对应的源语言单词列表。图{\ref{fig:6-5}}描述了一个英语句子生成汉语句子的过程。
 \begin{itemize}
 \vspace{0.3em}
 \item 首先，对于每个英语单词$t_i$决定它的产出率$\varphi_{i}$。比如``Scientists''的产出率是2，可表示为${\varphi}_{1}=2$。这表明它会生成2个汉语单词；
 \vspace{0.3em}
-\item 其次，确定英语句子中每个单词生成的汉语单词列表。比如``Scientists''生成``科学家''和``们''两个汉语单词，可表示为${\tau}_1=\{{\tau}_{11}=\textrm{``科学家''},{\tau}_{12}=\textrm{``们''}$。 这里用特殊的空标记NULL表示翻译对空的情况；
+\item 其次，确定英语句子中每个单词生成的汉语单词列表。比如``Scientists''生成``科学家''和``们''两个汉语单词，可表示为${\tau}_1=\{{\tau}_{11}=\textrm{``科学家''},{\tau}_{12}=\textrm{``们''}\}$。 这里用特殊的空标记NULL表示翻译对空的情况；
 \vspace{0.3em}
-\item 最后，把生成的所有汉语单词放在合适的位置。比如``科学家''和``们''分别放在$\mathbf{s}$的位置1和位置2。可以用符号$\pi$记录生成的单词在源语言句子$\mathbf{s}$中的位置。比如``Scientists'' 生成的汉语单词在$\mathbf{s}$ 中的位置表示为${\pi}_{1}=\{{\pi}_{11}=1,{\pi}_{12}=2\}$。
+\item 最后，把生成的所有汉语单词放在合适的位置。比如``科学家''和``们''分别放在$\vectorn{s}$的位置1和位置2。可以用符号$\pi$记录生成的单词在源语言句子$\vectorn{s}$中的位置。比如``Scientists'' 生成的汉语单词在$\vectorn{s}$ 中的位置表示为${\pi}_{1}=\{{\pi}_{11}=1,{\pi}_{12}=2\}$。
 \vspace{0.3em}
 \end{itemize}
@@ -191,18 +192,18 @@
    \centering
 \input{./Chapter6/Figures/figure-probability-translation-process}
   \caption{基于产出率的翻译模型执行过程}
-   \label{fig:6-6}
+   \label{fig:6-5}
 \end{figure}
 %----------------------------------------------
-\parinterval 为了表述清晰，这里重新说明每个符号的含义。$\mathbf{s}$、$\mathbf{t}$、$m$和$l$分别表示源语言句子、目标语言译文、源语言单词数量以及译文单词数量。$\mathbf{\varphi}$、$\mathbf{\tau}$ 和$\mathbf{\pi}$分别表示产出率、生成的源语言单词以及它们在源语言句子中的位置。${\varphi}_{i}$表示第$i$个目标语言单词$t_i$的产出率。${\tau}_{i}$和${\pi}_i$ 分别表示$t_i$生成的源语言单词列表及其在源语言句子$\mathbf{s}$中的位置列表。
+\parinterval 为了表述清晰，这里重新说明每个符号的含义。$\vectorn{s}$、$\vectorn{t}$、$m$和$l$分别表示源语言句子、目标语言译文、源语言单词数量以及译文单词数量。$\vectorn{\varphi}$、$\vectorn{\tau}$ 和$\vectorn{\pi}$分别表示产出率、生成的源语言单词以及它们在源语言句子中的位置。${\varphi}_{i}$表示第$i$个目标语言单词$t_i$的产出率。${\tau}_{i}$和${\pi}_i$ 分别表示$t_i$生成的源语言单词列表及其在源语言句子$\vectorn{s}$中的位置列表。
-\parinterval 可以看出，一组$\tau$和$\pi$（记为$<\tau,\pi>$）可以决定一个对齐$\mathbf{a}$和一个源语句子$\mathbf{s}$。
+\parinterval 可以看出，一组$\tau$和$\pi$（记为$<\tau,\pi>$）可以决定一个对齐$\vectorn{a}$和一个源语句子$\vectorn{s}$。
-\noindent 相反的，一个对齐$\mathbf{a}$和一个源语句子$\mathbf{s}$可以对应多组$<\tau,\pi>$。如图\ref{fig:6-7}所示，不同的$<\tau,\pi>$对应同一个源语言句子和词对齐。它们的区别在于目标语单词``Scientists''生成的源语言单词``科学家''和`` 们''的顺序不同。这里把不同的$<\tau,\pi>$对应到的相同的源语句子$\mathbf{s}$和对齐$\mathbf{a}$记为$<\mathbf{s},\mathbf{a}>$。因此计算$\textrm{P}(\mathbf{s},\mathbf{a}| \mathbf{t})$时需要把每个可能结果的概率加起来，如下：
+\noindent 相反的，一个对齐$\vectorn{a}$和一个源语句子$\vectorn{s}$可以对应多组$<\tau,\pi>$。如图\ref{fig:6-6}所示，不同的$<\tau,\pi>$对应同一个源语言句子和词对齐。它们的区别在于目标语单词``Scientists''生成的源语言单词``科学家''和`` 们''的顺序不同。这里把不同的$<\tau,\pi>$对应到的相同的源语句子$\vectorn{s}$和对齐$\vectorn{a}$记为$<\vectorn{s},\vectorn{a}>$。因此计算$\funp{P}(\vectorn{s},\vectorn{a}| \vectorn{t})$时需要把每个可能结果的概率加起来，如下：
 \begin{equation}
-\textrm{P}(\mathbf{s},\mathbf{a}| \mathbf{t})=\sum_{{<\tau,\pi>}\in{<\mathbf{s},\mathbf{a}>}}{\textrm{P}(\tau,\pi|\mathbf{t}) }
+\funp{P}(\vectorn{s},\vectorn{a}| \vectorn{t})=\sum_{{<\tau,\pi>}\in{<\vectorn{s},\vectorn{a}>}}{\funp{P}(\tau,\pi|\vectorn{t}) }
-\label{eq:6-7}
+\label{eq:6-9}
 \end{equation}
 %----------------------------------------------
@@ -210,33 +211,33 @@
    \centering
 \input{./Chapter6/Figures/figure-example-of-t-s-generate}
   \caption{不同$\tau$和$\pi$对应相同的源语言句子和词对齐的情况}
-   \label{fig:6-7}
+   \label{fig:6-6}
 \end{figure}
 %----------------------------------------------
-\parinterval 不过$<\mathbf{s},\mathbf{a}>$中有多少组$<\tau,\pi>$呢？通过图\ref{fig:6-6}中的例子，可以推出$<\mathbf{s},\mathbf{a}>$应该包含$\prod_{i=0}^{l}{\varphi_i !}$个不同的二元组$<\tau,\pi>$。 这是因为在给定源语言句子和词对齐时，对于每一个$\tau_i$都有$\varphi_{i}!$种排列。
+\parinterval 不过$<\vectorn{s},\vectorn{a}>$中有多少组$<\tau,\pi>$呢？通过图\ref{fig:6-5}中的例子，可以推出$<\vectorn{s},\vectorn{a}>$应该包含$\prod_{i=0}^{l}{\varphi_i !}$个不同的二元组$<\tau,\pi>$。 这是因为在给定源语言句子和词对齐时，对于每一个$\tau_i$都有$\varphi_{i}!$种排列。
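+\parinterval 举一个示意性的例子（产出率的取值为假设）：若$l=3$且$\varphi_0=0$、$\varphi_1=2$、$\varphi_2=1$、$\varphi_3=3$，则同一个源语言句子和词对齐所对应的二元组个数为：
+\begin{eqnarray}
+\prod_{i=0}^{3}{\varphi_i !} & = & 0! \times 2! \times 1! \times 3! \;=\; 12 \nonumber
+\end{eqnarray}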
-\parinterval 进一步，$\textrm{P}(\tau,\pi|\mathbf{t})$可以被表示如图\ref{fig:6-8}的形式。其中$\tau_{i1}^{k-1}$表示$\tau_{i1}\tau_{i2}\cdots \tau_{i(k-1)}$，$\pi_{i1}^{ k-1}$表示$\pi_{i1}\pi_{i2}\cdots \pi_{i(k-1)}$。可以把图\ref{fig:6-8}中的公式分为5个部分，并用不同的序号和颜色进行标注。每部分的具体含义是：
+\parinterval 进一步，$\funp{P}(\tau,\pi|\vectorn{t})$可以被表示如图\ref{fig:6-7}的形式。其中$\tau_{i1}^{k-1}$表示$\tau_{i1}\tau_{i2}\cdots \tau_{i(k-1)}$，$\pi_{i1}^{ k-1}$表示$\pi_{i1}\pi_{i2}\cdots \pi_{i(k-1)}$。可以把图\ref{fig:6-7}中的公式分为5个部分，并用不同的序号和颜色进行标注。每部分的具体含义是：
 %----------------------------------------------
 \begin{figure}[htp]
    \centering
 \input{./Chapter6/Figures/figure-expression}
-   \caption{{$\textrm{P}(\tau,\pi|t)$}的详细表达式}
+   \caption{{$\funp{P}(\tau,\pi|\vectorn{t})$}的详细表达式}
 \setlength{\belowcaptionskip}{-0.5em}
-   \label{fig:6-8}
+   \label{fig:6-7}
 \end{figure}
 %----------------------------------------------
 \begin{itemize}
 \vspace{0.5em}
-\item 第一部分：每个$i\in[1,l]$的目标语单词的产出率建模（{\color{red!70} 红色}），即$\varphi_i$的生成概率。它依赖于$\mathbf{t}$和区间$[1,i-1]$的目标语单词的产出率$\varphi_1^{i-1}$。\footnote{这里约定，当$i=1$ 时，$\varphi_1^0$ 表示空。}
+\item 第一部分：每个$i\in[1,l]$的目标语单词的产出率建模（{\color{red!70} 红色}），即$\varphi_i$的生成概率。它依赖于$\vectorn{t}$和区间$[1,i-1]$的目标语单词的产出率$\varphi_1^{i-1}$。\footnote{这里约定，当$i=1$ 时，$\varphi_1^0$ 表示空。}
 \vspace{0.5em}
-\item 第二部分：$i=0$时的产出率建模（{\color{blue!70} 蓝色}），即空标记$t_0$的产出率生成概率。它依赖于$\mathbf{t}$和区间$[1,i-1]$的目标语单词的产出率$\varphi_1^l$。
+\item 第二部分：$i=0$时的产出率建模（{\color{blue!70} 蓝色}），即空标记$t_0$的产出率生成概率。它依赖于$\vectorn{t}$和区间$[1,l]$的目标语单词的产出率$\varphi_1^l$。
 \vspace{0.5em}
-\item 第三部分：词汇翻译建模（{\color{green!70} 绿色}），目标语言单词$t_i$生成第$k$个源语言单词$\tau_{ik}$时的概率，依赖于$\mathbf{t}$、所有目标语言单词的产出率$\varphi_0^l$、区间$i\in[1,l]$的目标语言单词生成的源语言单词$\tau_1^{i-1}$和目标语单词$t_i$生成的前$k$个源语言单词$\tau_{i1}^{k-1}$。
+\item 第三部分：词汇翻译建模（{\color{green!70} 绿色}），目标语言单词$t_i$生成第$k$个源语言单词$\tau_{ik}$时的概率，依赖于$\vectorn{t}$、所有目标语言单词的产出率$\varphi_0^l$、区间$[1,i-1]$的目标语言单词生成的源语言单词$\tau_1^{i-1}$和目标语单词$t_i$生成的前$k-1$个源语言单词$\tau_{i1}^{k-1}$。
 \vspace{0.5em}
 \item 第四部分：对于每个$i\in[1,l]$的目标语言单词生成的源语言单词的扭曲度建模（{\color{yellow!70!black} 黄色}），即第$i$个目标语言单词生成的第$k$个源语言单词在源文中的位置$\pi_{ik}$ 的概率。其中$\pi_1^{i-1}$ 表示区间$[1,i-1]$的目标语言单词生成的源语言单词的扭曲度，$\pi_{i1}^{k-1}$表示第$i$个目标语言单词生成的前$k-1$个源语言单词的扭曲度。
 \vspace{0.5em}
@@ -249,63 +250,65 @@
 \subsection{IBM 模型3}
-\parinterval IBM模型3通过一些假设对图\ref{fig:6-8}所表示的基本模型进行了化简。具体来说，对于每个$i\in[1,l]$，假设$\textrm{P}(\varphi_i |\varphi_1^{i-1},\mathbf{t})$仅依赖于$\varphi_i$和$t_i$，$\textrm{P}(\pi_{ik}|\pi_{i1}^{k-1},\pi_1^{i-1},\tau_0^l,\varphi_0^l,\mathbf{t})$仅依赖于$\pi_{ik}$、$i$、$m$和$l$。而对于所有的$i\in[0,l]$，假设$\textrm{P}(\tau_{ik}|\tau_{i1}^{k-1},\tau_1^{i-1},\varphi_0^l,\mathbf{t})$仅依赖于$\tau_{ik}$和$t_i$。这些假设的形式化描述为：
+\parinterval IBM模型3通过一些假设对图\ref{fig:6-7}所表示的基本模型进行了化简。具体来说，对于每个$i\in[1,l]$，假设$\funp{P}(\varphi_i |\varphi_1^{i-1},\vectorn{t})$仅依赖于$\varphi_i$和$t_i$，$\funp{P}(\pi_{ik}|\pi_{i1}^{k-1},\pi_1^{i-1},\tau_0^l,\varphi_0^l,\vectorn{t})$仅依赖于$\pi_{ik}$、$i$、$m$和$l$。而对于所有的$i\in[0,l]$，假设$\funp{P}(\tau_{ik}|\tau_{i1}^{k-1},\tau_1^{i-1},\varphi_0^l,\vectorn{t})$仅依赖于$\tau_{ik}$和$t_i$。这些假设的形式化描述为：
 \begin{eqnarray}
-\textrm{P}(\varphi_i|\varphi_1^{i-1},\mathbf{t})                                                              & = &{\textrm{P}(\varphi_i|t_i)} \label{eq:6-8} \\
+\funp{P}(\varphi_i|\varphi_1^{i-1},\vectorn{t})                                                              & = &{\funp{P}(\varphi_i|t_i)} \label{eq:6-10} \\
-\textrm{P}(\tau_{ik} = s_j |\tau_{i1}^{k-1},\tau_{1}^{i-1},\varphi_0^t,\mathbf{t})             & = & t(s_j|t_i) \label{eq:6-9} \\
+\funp{P}(\tau_{ik} = s_j |\tau_{i1}^{k-1},\tau_{1}^{i-1},\varphi_0^l,\vectorn{t})             & = & t(s_j|t_i) \label{eq:6-11} \\
-\textrm{P}(\pi_{ik} = j |\pi_{i1}^{k-1},\pi_{1}^{i-1},\tau_{0}^{l},\varphi_{0}^{l},\mathbf{t}) & = & d(j|i,m,l) \label{eq:6-10}
+\funp{P}(\pi_{ik} = j |\pi_{i1}^{k-1},\pi_{1}^{i-1},\tau_{0}^{l},\varphi_{0}^{l},\vectorn{t}) & = & d(j|i,m,l) \label{eq:6-12}
 \end{eqnarray}
-\parinterval 通常把$d(j|i,m,l)$称为扭曲度函数。这里$\textrm{P}(\varphi_i|\varphi_1^{i-1},\mathbf{t})={\textrm{P}(\varphi_i|t_i)}$和${\textrm{P}(\pi_{ik}=j|\pi_{i1}^{k-1},}$ $\pi_{1}^{i-1},\tau_0^l,\varphi_0^l,\mathbf{t})=d(j|i,m,l)$仅对$1 \le i \le l$成立。这样就完成了图\ref{fig:6-8}中第1、 3和4部分的建模。
+\parinterval 通常把$d(j|i,m,l)$称为扭曲度函数。这里$\funp{P}(\varphi_i|\varphi_1^{i-1},\vectorn{t})={\funp{P}(\varphi_i|t_i)}$和${\funp{P}(\pi_{ik}=j|\pi_{i1}^{k-1},}$ $\pi_{1}^{i-1},\tau_0^l,\varphi_0^l,\vectorn{t})=d(j|i,m,l)$仅对$1 \le i \le l$成立。这样就完成了图\ref{fig:6-7}中第1、 3和4部分的建模。
-\parinterval 对于$i=0$的情况需要单独进行考虑。实际上，$t_0$只是一个虚拟的单词。它要对应$\mathbf{s}$中原本为空对齐的单词。这里假设：要等其他非空对应单词都被生成（放置）后，才考虑这些空对齐单词的生成（放置）。即非空对单词都被生成后，在那些还有空的位置上放置这些空对的源语言单词。此外，在任何的空位置上放置空对的源语言单词都是等概率的，即放置空对齐源语言单词服从均匀分布。这样在已经放置了$k$个空对齐源语言单词的时候，应该还有$\varphi_0-k$个空位置。如果第$j$个源语言位置为空，那么
+\parinterval 对于$i=0$的情况需要单独进行考虑。实际上，$t_0$只是一个虚拟的单词。它要对应$\vectorn{s}$中原本为空对齐的单词。这里假设：要等其他非空对应单词都被生成（放置）后，才考虑这些空对齐单词的生成（放置）。即非空对单词都被生成后，在那些还有空的位置上放置这些空对的源语言单词。此外，在任何的空位置上放置空对的源语言单词都是等概率的，即放置空对齐源语言单词服从均匀分布。这样在已经放置了$k$个空对齐源语言单词的时候，应该还有$\varphi_0-k$个空位置。如果第$j$个源语言位置为空，那么
 \begin{equation}
-\textrm{P}(\pi_{0k}=j|\pi_{01}^{k-1},\pi_1^l,\tau_0^l,\varphi_0^l,\mathbf{t})=\frac{1}{\varphi_0-k}
+\funp{P}(\pi_{0(k+1)}=j|\pi_{01}^{k},\pi_1^l,\tau_0^l,\varphi_0^l,\vectorn{t})=\frac{1}{\varphi_0-k}
+\label{eq:6-13}
 \end{equation}
 否则
 \begin{equation}
-\textrm{P}(\pi_{0k}=j|\pi_{01}^{k-1},\pi_1^l,\tau_0^l,\varphi_0^l,\mathbf{t})=0
+\funp{P}(\pi_{0(k+1)}=j|\pi_{01}^{k},\pi_1^l,\tau_0^l,\varphi_0^l,\vectorn{t})=0
+\label{eq:6-14}
 \end{equation}
 这样对于$t_0$所对应的$\tau_0$，就有
 {
 \begin{eqnarray}
-\prod_{k=1}^{\varphi_0}{\textrm{P}(\pi_{0k}|\pi_{01}^{k-1},\pi_{1}^{l},\tau_{0}^{l},\varphi_{0}^{l},\mathbf{t})         }=\frac{1}{\varphi_{0}!}
+\prod_{k=1}^{\varphi_0}{\funp{P}(\pi_{0k}|\pi_{01}^{k-1},\pi_{1}^{l},\tau_{0}^{l},\varphi_{0}^{l},\vectorn{t})         }=\frac{1}{\varphi_{0}!}
-\label{eq:6-11}
+\label{eq:6-15}
 \end{eqnarray}
 }
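+\parinterval 例如（$\varphi_0=3$仅为示意性的假设），放置第一个空对齐的源语言单词时有3个空位置可选，放置第二个时剩2个，放置第三个时剩1个，于是：
+\begin{eqnarray}
+\frac{1}{3} \times \frac{1}{2} \times \frac{1}{1} & = & \frac{1}{3!} \nonumber
+\end{eqnarray}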
 \parinterval 而上面提到的$t_0$所对应的这些空位置是如何生成的呢？即如何确定哪些位置是要放置空对齐的源语言单词。在IBM模型3中，假设在所有的非空对齐源语言单词都被生成出来后（共$\varphi_1+\varphi_2+\cdots {\varphi}_l$个非空对源语单词），这些单词后面都以$p_1$概率随机地产生一个``槽''用来放置空对齐单词。这样，${\varphi}_0$就服从了一个二项分布。于是得到
 {
 \begin{eqnarray}
-\textrm{P}(\varphi_0|\mathbf{t})=\big(\begin{array}{c}
+\funp{P}(\varphi_0|\vectorn{t})=\big(\begin{array}{c}
 \varphi_1+\varphi_2+\cdots \varphi_l\\
 \varphi_0\\
 \end{array}\big)p_0^{\varphi_1+\varphi_2+\cdots \varphi_l-\varphi_0}p_1^{\varphi_0}
-\label{eq:6-12}
+\label{eq:6-16}
 \end{eqnarray}
 }
-\noindent 其中，$p_0+p_1=1$。到此为止，已经完成了图\ref{fig:6-8}中第2和5部分的建模。最终根据这些假设可以得到$\textrm{P}(\mathbf{s}| \mathbf{t})$的形式为：
+\noindent 其中，$p_0+p_1=1$。到此为止，已经完成了图\ref{fig:6-7}中第2和5部分的建模。最终根据这些假设可以得到$\funp{P}(\vectorn{s}| \vectorn{t})$的形式为：
 {
 \begin{eqnarray}
-{\textrm{P}(\mathbf{s}| \mathbf{t})}&= &{\sum_{a_1=0}^{l}{\cdots}\sum_{a_m=0}^{l}{\Big[\big(\begin{array}{c}
+{\funp{P}(\vectorn{s}| \vectorn{t})}&= &{\sum_{a_1=0}^{l}{\cdots}\sum_{a_m=0}^{l}{\Big[\big(\begin{array}{c}
 m-\varphi_0\\
 \varphi_0\\
 \end{array}\big)}p_0^{m-2\varphi_0}p_1^{\varphi_0}\prod_{i=1}^{l}{{\varphi_i}!n(\varphi_i|t_i)    }} \nonumber \\
 & & \times{\prod_{j=1}^{m}{t(s_j|t_{a_j})} \times \prod_{j=1,a_j\neq 0}^{m}{d(j|a_j,m,l)}} \Big]
-\label{eq:6-13}
+\label{eq:6-17}
 \end{eqnarray}
 }
-\noindent 其中，$n(\varphi_i |t_i)={\textrm{P}(\varphi_i|t_i)}$表示产出率的分布。这里的约束条件为，
+\noindent 其中，$n(\varphi_i |t_i)={\funp{P}(\varphi_i|t_i)}$表示产出率的分布。这里的约束条件为，
 {
 \begin{eqnarray}
-\sum_{s_x}t(s_x|t_y)                     & = &1 \label{eq:6-14} \\
+\sum_{s_x}t(s_x|t_y)                     & = &1 \label{eq:6-18} \\
-\sum_{j}d(j|i,m,l)                & = & 1 \label{eq:6-15} \\
+\sum_{j}d(j|i,m,l)                & = & 1 \label{eq:6-19} \\
-\sum_{\varphi} n(\varphi|t_y) & = &1 \label{eq:6-16} \\
+\sum_{\varphi} n(\varphi|t_y) & = &1 \label{eq:6-20} \\
-p_0+p_1                            & = & 1 \label{eq:6-17}
+p_0+p_1                            & = & 1 \label{eq:6-21}
 \end{eqnarray}
 }
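+\parinterval 这里给出一个关于公式\ref{eq:6-16}的示意性算例（句长、产出率和$p_1$的取值均为假设）。设$m=5$、$\varphi_0=1$，则$\varphi_1+\varphi_2+\cdots+\varphi_l=m-\varphi_0=4$，于是：
+\begin{eqnarray}
+\funp{P}(\varphi_0=1|\vectorn{t}) & = & \big(\begin{array}{c}
+4\\
+1\\
+\end{array}\big)p_0^{3}p_1^{1} \;=\; 4p_0^{3}p_1 \nonumber
+\end{eqnarray}
+\noindent 若进一步假设$p_1=0.1$（即$p_0=0.9$），则该值为$4\times 0.729\times 0.1=0.2916$。这也对应公式\ref{eq:6-17}中的$m-\varphi_0=4$和$m-2\varphi_0=3$。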
@@ -317,32 +320,32 @@ p_0+p_1                            & = & 1 \label{eq:6-17}
 \parinterval IBM模型3仍然存在问题，比如，它不能很好地处理一个目标语言单词生成多个源语言单词的情况。这个问题在模型1和模型2中也存在。如果一个目标语言单词对应多个源语言单词，往往这些源语言单词构成短语或搭配。但是模型1-3把这些源语言单词看成独立的单元，而实际上它们是一个整体。这就造成了在模型1-3中这些源语言单词可能会``分散''开。为了解决这个问题，模型4对模型3进行了进一步修正。
-\parinterval 为了更清楚的阐述，这里引入新的术语\ \dash \ {\small\bfnew{概念单元}}\index{概念单元}或{\small\bfnew{概念}}\index{概念}（Concept）\index{Concept}。词对齐可以被看作概念之间的对应。这里的概念是指具有独立语法或语义功能的一组单词。依照Brown等人的表示方法\cite{Peter1993The}，可以把概念记为cept.。每个句子都可以被表示成一系列的cept.。这里要注意的是，源语言句子中的cept.数量不一定等于目标句子中的cept.数量。因为有些cept. 可以为空，因此可以把那些空对的单词看作空cept.。比如，在图\ref{fig:6-9}的实例中，``了''就对应一个空cept.。
+\parinterval 为了更清楚地阐述，这里引入新的术语\ \dash \ {\small\bfnew{概念单元}}\index{概念单元}或{\small\bfnew{概念}}\index{概念}（Concept）\index{Concept}。词对齐可以被看作概念之间的对应。这里的概念是指具有独立语法或语义功能的一组单词。依照Brown等人的表示方法\upcite{DBLP:journals/coling/BrownPPM94}，可以把概念记为cept.。每个句子都可以被表示成一系列的cept.。这里要注意的是，源语言句子中的cept.数量不一定等于目标句子中的cept.数量。因为有些cept. 可以为空，因此可以把那些空对的单词看作空cept.。比如，在图\ref{fig:6-8}的实例中，``了''就对应一个空cept.。
 %----------------------------------------------
 \begin{figure}[htp]
    \centering
 \input{./Chapter6/Figures/figure-word-alignment}
   \caption{词对齐的汉译英句对及独立单词cept.的位置（记为$[i]$）}
-   \label{fig:6-9}
+   \label{fig:6-8}
 \end{figure}
 %----------------------------------------------
-\parinterval 在IBM模型的词对齐框架下，目标语的cept.只能是那些非空对齐的目标语单词，而且每个cept.只能由一个目标语言单词组成（通常把这类由一个单词组成的cept.称为独立单词cept.）。这里用$[i]$表示第$i$ 个独立单词cept.在目标语言句子中的位置。换句话说，$[i]$表示第$i$个非空对的目标语单词的位置。比如在本例中``mind''在$\mathbf{t}$中的位置表示为$[3]$。
+\parinterval 在IBM模型的词对齐框架下，目标语的cept.只能是那些非空对齐的目标语单词，而且每个cept.只能由一个目标语言单词组成（通常把这类由一个单词组成的cept.称为独立单词cept.）。这里用$[i]$表示第$i$ 个独立单词cept.在目标语言句子中的位置。换句话说，$[i]$表示第$i$个非空对的目标语单词的位置。比如在本例中``mind''在$\vectorn{t}$中的位置表示为$[3]$。
 \parinterval 另外，可以用$\odot_{i}$表示位置为$[i]$的目标语言单词对应的那些源语言单词位置的平均值，如果这个平均值不是整数则对它向上取整。比如在本例中，目标语句中第4个cept. （``.''）对应在源语言句子中的第5个单词。可表示为${\odot}_{4}=5$。
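+\parinterval 再看一个需要取整的示意性例子（对齐位置为假设，并非图\ref{fig:6-8}中的情况）：若位置为$[i]$的目标语言单词对应源语言句子中的第2个和第5个单词，则
+\begin{eqnarray}
+{\odot}_{i} & = & \Big\lceil \frac{2+5}{2} \Big\rceil \;=\; \lceil 3.5 \rceil \;=\; 4 \nonumber
+\end{eqnarray}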
 \parinterval 利用这些新引进的概念，模型4对模型3的扭曲度进行了修改。主要是把扭曲度分解为两类参数。对于$[i]$对应的源语言单词列表（$\tau_{[i]}$）中的第一个单词（$\tau_{[i]1}$），它的扭曲度用如下公式计算：
 \begin{equation}
-\textrm{P}(\pi_{[i]1}=j|{\pi}_1^{[i]-1},{\tau}_0^l,{\varphi}_0^l,\mathbf{t})=d_{1}(j-{\odot}_{i-1}|A(t_{[i-1]}),B(s_j))
+\funp{P}(\pi_{[i]1}=j|{\pi}_1^{[i]-1},{\tau}_0^l,{\varphi}_0^l,\vectorn{t})=d_{1}(j-{\odot}_{i-1}|A(t_{[i-1]}),B(s_j))
-\label{eq:6-18}
+\label{eq:6-22}
 \end{equation}
 \noindent 其中，第$i$个目标语言单词生成的第$k$个源语言单词的位置用变量$\pi_{ik}$表示。而对于列表($\tau_{[i]}$)中的其他的单词($\tau_{[i]k},1 < k \le \varphi_{[i]}$)的扭曲度，用如下公式计算：
 \begin{equation}
-\textrm{P}(\pi_{[i]k}=j|{\pi}_{[i]1}^{k-1},\pi_1^{[i]-1},\tau_0^l,\varphi_0^l,\mathbf{t})=d_{>1}(j-\pi_{[i]k-1}|B(s_j))
+\funp{P}(\pi_{[i]k}=j|{\pi}_{[i]1}^{k-1},\pi_1^{[i]-1},\tau_0^l,\varphi_0^l,\vectorn{t})=d_{>1}(j-\pi_{[i]k-1}|B(s_j))
-\label{eq:6-19}
+\label{eq:6-23}
 \end{equation}
 \parinterval 这里的函数$A(\cdot)$和函数$B(\cdot)$分别把目标语言和源语言的单词映射到单词的词类。这么做的目的是要减小参数空间的大小。词类信息通常可以通过外部工具得到，比如Brown聚类等。另一种简单的方法是把单词直接映射为它的词性。这样可以直接用现在已经非常成熟的词性标注工具解决问题。
@@ -357,32 +360,32 @@ p_0+p_1                            & = & 1 \label{eq:6-17}
 \subsection{ IBM 模型5}
-\parinterval 模型3和模型4并不是``准确''的模型。这两个模型会把一部分概率分配给一些根本就不存在的句子。这个问题被称作IBM模型3和模型4的{\small\bfnew{缺陷}}\index{缺陷}（Deficiency）\index{Deficiency}。说的具体一些，模型3和模型4 中并没有这样的约束：如果已经放置了某个源语言单词的位置不能再放置其他单词，也就是说句子的任何位置只能放置一个词，不能多也不能少。由于缺乏这个约束，模型3和模型4中在所有合法的词对齐上概率和不等于1。 这部分缺失的概率被分配到其他不合法的词对齐上。举例来说，如图\ref{fig:6-10}所示，``吃/早饭''和``have breakfast''之间的合法词对齐用直线表示 。但是在模型3和模型4中， 它们的概率和为$0.9<1$。 损失掉的概率被分配到像5和6这样的对齐上了（红色）。虽然IBM模型并不支持一对多的对齐，但是模型3和模型4把概率分配给这些`` 不合法''的词对齐上，因此也就产生所谓的缺陷。
+\parinterval 模型3和模型4并不是``准确''的模型。这两个模型会把一部分概率分配给一些根本就不存在的句子。这个问题被称作IBM模型3和模型4的{\small\bfnew{缺陷}}\index{缺陷}（Deficiency）\index{Deficiency}。说得具体一些，模型3和模型4 中并没有这样的约束：已经放置了某个源语言单词的位置不能再放置其他单词，也就是说句子的任何位置只能放置一个词，不能多也不能少。由于缺乏这个约束，模型3和模型4中在所有合法的词对齐上概率和不等于1。 这部分缺失的概率被分配到其他不合法的词对齐上。举例来说，如图\ref{fig:6-9}所示，``吃/早饭''和``have breakfast''之间的合法词对齐用直线表示 。但是在模型3和模型4中， 它们的概率和为$0.9<1$。 损失掉的概率被分配到像5和6这样的对齐上了（红色）。虽然IBM模型并不支持一对多的对齐，但是模型3和模型4把概率分配给了这些``不合法''的词对齐，因此也就产生所谓的缺陷。
 %----------------------------------------------
 \begin{figure}[htp]
    \centering
 \input{./Chapter6/Figures/figure-word-alignment&probability-distribution-in-ibm-model-3}
    \caption{IBM模型3的词对齐及概率分配}
-    \label{fig:6-10}
+    \label{fig:6-9}
 \end{figure}
 %----------------------------------------------
 \parinterval 为了解决这个问题，模型5在模型中增加了额外的约束。基本想法是，在放置一个源语言单词的时候检查这个位置是否已经放置了单词，如果可以则把这个放置过程赋予一定的概率，否则把它作为不可能事件。基于这个想法，就需要在逐个放置源语言单词的时候判断源语言句子的哪些位置为空。这里引入一个变量$v(j, {\tau_1}^{[i]-1}, \tau_{[i]1}^{k-1})$，它表示在放置$\tau_{[i]k}$之前（$\tau_1^{[i]-1}$ 和$\tau_{[i]1}^{k-1}$已经被放置完了），从源语言句子的第一个位置到位置$j$（包含$j$）为止还有多少个空位置。这里，把这个变量简写为$v_j$。于是，对于$[i]$所对应的源语言单词列表（$\tau_{[i]}$）中的第一个单词（$\tau_{[i]1}$），有：
 \begin{eqnarray}
-\textrm{P}(\pi_{[i]1} = j | \pi_1^{[i]-1}, \tau_0^l, \varphi_0^l, \mathbf{t}) & = & d_1(v_j|B(s_j), v_{\odot_{i-1}}, v_m-(\varphi_{[i]}-1)) \cdot \nonumber \\
+\funp{P}(\pi_{[i]1} = j | \pi_1^{[i]-1}, \tau_0^l, \varphi_0^l, \vectorn{t}) & = & d_1(v_j|B(s_j), v_{\odot_{i-1}}, v_m-(\varphi_{[i]}-1)) \cdot \nonumber \\
                                                                                                   &     & (1-\delta(v_j,v_{j-1}))
-\label{eq:6-20}
+\label{eq:6-24}
 \end{eqnarray}
 \parinterval 对于其他单词（$\tau_{[i]k}$, $1 < k\le\varphi_{[i]}$），有：
 \begin{eqnarray}
-&   & \textrm{P}(\pi_{[i]k}=j|\pi_{[i]1}^{k-1}, \pi_1^{[i]-1}, \tau_0^l, \varphi_0^l,\mathbf{t}) \nonumber \\
+&   & \funp{P}(\pi_{[i]k}=j|\pi_{[i]1}^{k-1}, \pi_1^{[i]-1}, \tau_0^l, \varphi_0^l,\vectorn{t}) \nonumber \\
 &= & d_{>1}(v_j-v_{\pi_{[i]k-1}}|B(s_j), v_m-v_{\pi_{[i]k-1}}-\varphi_{[i]}+k) \cdot (1-\delta(v_j,v_{j-1}))
-\label{eq:6-20}
+\label{eq:6-25}
 \end{eqnarray}
-\noindent 这里，因子$1-\delta(v_j, v_{j-1})$是用来判断第$j$个位置是不是为空。如果第$j$个位置为空则$v_j = v_{j-1}$，这样$\textrm{P}(\pi_{[i]1}=j|\pi_1^{[i]-1}, \tau_0^l, \varphi_0^l, \mathbf{t}) = 0$。这样就从模型上避免了模型3和模型4中生成不存在的字符串的问题。这里还要注意的是，对于放置第一个单词的情况，影响放置的因素有$v_j$，$B(s_i)$和$v_{j-1}$。此外还要考虑位置$j$放置了第一个源语言单词以后它的右边是不是还有足够的位置留给剩下的$k-1$个源语言单词。参数$v_m-(\varphi_{[i]}-1)$正是为了考虑这个因素，这里$v_m$表示整个源语言句子中还有多少空位置，$\varphi_{[i]}-1$ 表示源语言位置$j$右边至少还要留出的空格数。对于放置非第一个单词的情况，主要是要考虑它和前一个放置位置的相对位置。这主要体现在参数$v_j-v_{\varphi_{[i]}k-1}$上。式\ref{eq:6-20} 的其他部分都可以用上面的理论解释，这里不再赘述。
+\noindent 这里，因子$1-\delta(v_j, v_{j-1})$是用来判断第$j$个位置是不是为空。如果第$j$个位置不为空（已经被占用）则$v_j = v_{j-1}$，这样$\funp{P}(\pi_{[i]1}=j|\pi_1^{[i]-1}, \tau_0^l, \varphi_0^l, \vectorn{t}) = 0$。这样就从模型上避免了模型3和模型4中生成不存在的字符串的问题。这里还要注意的是，对于放置第一个单词的情况，影响放置的因素有$v_j$，$B(s_j)$和$v_{\odot_{i-1}}$。此外还要考虑位置$j$放置了第一个源语言单词以后它的右边是不是还有足够的位置留给剩下的$\varphi_{[i]}-1$个源语言单词。参数$v_m-(\varphi_{[i]}-1)$正是为了考虑这个因素，这里$v_m$表示整个源语言句子中还有多少空位置，$\varphi_{[i]}-1$ 表示源语言位置$j$右边至少还要留出的空格数。对于放置非第一个单词的情况，主要是要考虑它和前一个放置位置的相对位置。这主要体现在参数$v_j-v_{\pi_{[i]k-1}}$上。式\ref{eq:6-25} 的其他部分都可以用上面的理论解释，这里不再赘述。
 \parinterval 实际上，模型5和模型4的思想基本一致，即，先确定$\tau_{[i]1}$的绝对位置，然后再确定$\tau_{[i]}$中剩余单词的相对位置。模型5消除了产生不存在的句子的可能性，不过模型5的复杂性也大大增加了。
 %----------------------------------------------------------------------------------------
@@ -415,7 +418,7 @@ p_0+p_1                            & = & 1 \label{eq:6-17}
 \parinterval 本质上，IBM模型词对齐的``不完整''问题是IBM模型本身的缺陷。解决这个问题有很多思路。一种思路是，反向训练后，合并源语言单词，然后再正向训练。这里用汉英翻译为例来解释这个方法。首先反向训练，就是把英语当作待翻译语言，而把汉语当作目标语言进行训练（参数估计）。这样可以得到一个词对齐结果（参数估计的中间结果）。在这个词对齐结果里面，一个汉语单词可对应多个英语单词。之后，扫描每个英语句子，如果有多个英语单词对应同一个汉语单词，就把这些英语单词合并成一个英语单词。处理完之后，再把汉语当作源语言而把英语当作目标语言进行训练。这样就可以把一个汉语单词对应到合并的英语单词上。虽然从模型上看，还是一个汉语单词对应一个英语``单词''，但实质上已经把这个汉语单词对应到多个英语单词上了。训练完之后，再利用这些参数进行翻译（解码）时，就能把一个中文单词翻译成多个英文单词了。但是反向训练后再训练也存在一些问题。首先，合并英语单词会使数据变得更稀疏，训练不充分。其次，由于IBM模型的词对齐结果并不是高精度的，利用它的词对齐结果来合并一些英文单词可能造成严重的错误，比如：把本来很独立的几个单词合在了一起。因此，还要考虑实际需要和问题的严重程度来决定是否使用该方法。
-\parinterval 另一种思路是双向对齐之后进行词对齐{\small\sffamily\bfseries{对称化}}\index{对称化}（Symmetrization）\index{Symmetrization}。这个方法可以在IBM词对齐的基础上获得对称的词对齐结果。思路很简单，用正向（汉语为源语言，英语为目标语言）和反向（汉语为目标语言，英语为源语言）同时训练。这样可以得到两个词对齐结果。然后利用一些启发性方法用这两个词对齐生成对称的结果（比如，取`` 并集''、``交集''等），这样就可以得到包含一对多和多对多的词对齐结果\cite{och2003systematic}。比如，在基于短语的统计机器翻译中已经很成功地使用了这种词对齐信息进行短语的获取。直到今天，对称化仍然是很多自然语言处理系统中的一个关键步骤。
+\parinterval 另一种思路是双向对齐之后进行词对齐{\small\sffamily\bfseries{对称化}}\index{对称化}（Symmetrization）\index{Symmetrization}。这个方法可以在IBM词对齐的基础上获得对称的词对齐结果。思路很简单，用正向（汉语为源语言，英语为目标语言）和反向（汉语为目标语言，英语为源语言）同时训练。这样可以得到两个词对齐结果。然后利用一些启发性方法用这两个词对齐生成对称的结果（比如，取`` 并集''、``交集''等），这样就可以得到包含一对多和多对多的词对齐结果\upcite{och2003systematic}。比如，在基于短语的统计机器翻译中已经很成功地使用了这种词对齐信息进行短语的获取。直到今天，对称化仍然是很多自然语言处理系统中的一个关键步骤。
 %----------------------------------------------------------------------------------------
 %    NEW SUB-SECTION
@@ -423,23 +426,23 @@ p_0+p_1                            & = & 1 \label{eq:6-17}
 \subsection{``缺陷''问题}
-\parinterval IBM模型的缺陷是指翻译模型会把一部分概率分配给一些根本不存在的源语言字符串。如果用$\textrm{P}(\textrm{well}|\mathbf{t})$表示$\textrm{P}(\mathbf{s}| \mathbf{t})$在所有的正确的（可以理解为语法上正确的）$\mathbf{s}$上的和，即
+\parinterval IBM模型的缺陷是指翻译模型会把一部分概率分配给一些根本不存在的源语言字符串。如果用$\funp{P}(\textrm{well}|\vectorn{t})$表示$\funp{P}(\vectorn{s}| \vectorn{t})$在所有的正确的（可以理解为语法上正确的）$\vectorn{s}$上的和，即
 \begin{eqnarray}
-\textrm{P}(\textrm{well}|\mathbf{t})=\sum_{\mathbf{s}\textrm{\;is\;well\;formed}}{\textrm{P}(\mathbf{s}| \mathbf{t})}
+\funp{P}(\textrm{well}|\vectorn{t})=\sum_{\vectorn{s}\textrm{\;is\;well\;formed}}{\funp{P}(\vectorn{s}| \vectorn{t})}
-\label{eq:6-22}
+\label{eq:6-26}
 \end{eqnarray}
-\parinterval 类似地，用$\textrm{P}(\textrm{ill}|\mathbf{t})$表示$\textrm{P}(\mathbf{s}| \mathbf{t})$在所有的错误的（可以理解为语法上错误的）$\mathbf{s}$上的和。如果$\textrm{P}(\textrm{well}|\mathbf{t})+ \textrm{P}(\textrm{ill}|\mathbf{t})<1$，就把剩余的部分定义为$\textrm{P}(\textrm{failure}|\mathbf{t})$。它的形式化定义为，
+\parinterval 类似地，用$\funp{P}(\textrm{ill}|\vectorn{t})$表示$\funp{P}(\vectorn{s}| \vectorn{t})$在所有的错误的（可以理解为语法上错误的）$\vectorn{s}$上的和。如果$\funp{P}(\textrm{well}|\vectorn{t})+ \funp{P}(\textrm{ill}|\vectorn{t})<1$，就把剩余的部分定义为$\funp{P}(\textrm{failure}|\vectorn{t})$。它的形式化定义为，
 \begin{eqnarray}
-\textrm{P}({\textrm{failure}|\mathbf{t}})  = 1 - \textrm{P}({\textrm{well}|\mathbf{t}}) - \textrm{P}({\textrm{ill}|\mathbf{t}})
+\funp{P}({\textrm{failure}|\vectorn{t}})  = 1 - \funp{P}({\textrm{well}|\vectorn{t}}) - \funp{P}({\textrm{ill}|\vectorn{t}})
-\label{eq:6-23}
+\label{eq:6-27}
 \end{eqnarray}
-\parinterval 本质上，模型3和模型4就是对应$\textrm{P}({\textrm{failure}|\mathbf{t}})>0$的情况。这部分概率是模型损失掉的。有时候也把这类缺陷称为{\small\bfnew{物理缺陷}}\index{物理缺陷}（Physical Deficiency\index{Physical Deficiency}）或{\small\bfnew{技术缺陷}}\index{技术缺陷}（Technical Deficiency\index{Technical Deficiency}）。还有一种缺陷被称作{\small\bfnew{精神缺陷}}（Spiritual Deficiency\index{Spiritual Deficiency}）或{\small\bfnew{逻辑缺陷}}\index{逻辑缺陷}（Logical Deficiency\index{Logical Deficiency}），它是指$\textrm{P}({\textrm{well}|\mathbf{t}}) + \textrm{P}({\textrm{ill}|\mathbf{t}}) = 1$ 且$\textrm{P}({\textrm{ill}|\mathbf{t}}) > 0$的情况。模型1 和模型2 就有逻辑缺陷。可以注意到，技术缺陷只存在于模型3 和模型4 中，模型1和模型2并没有技术缺陷问题。根本原因在于模型1和模型2的词对齐是从源语言出发对应到目标语言，$\mathbf{t}$到$\mathbf{s}$ 的翻译过程实际上是从单词$s_1$开始到单词$s_m$ 结束，依次把每个源语言单词$s_j$对应到唯一一个目标语言位置。显然，这个过程能够保证每个源语言单词仅对应一个目标语言单词。但是，模型3 和模型4中对齐是从目标语言出发对应到源语言，$\mathbf{t}$到$\mathbf{s}$的翻译过程从$t_1$开始$t_l$ 结束，依次把目标语言单词$t_i$生成的单词对应到某个源语言位置上。但是这个过程不能保证$t_i$中生成的单词所对应的位置没有被其他单词占用，因此也就产生了缺陷。
+\parinterval 本质上，模型3和模型4就是对应$\funp{P}({\textrm{failure}|\vectorn{t}})>0$的情况。这部分概率是模型损失掉的。有时候也把这类缺陷称为{\small\bfnew{物理缺陷}}\index{物理缺陷}（Physical Deficiency\index{Physical Deficiency}）或{\small\bfnew{技术缺陷}}\index{技术缺陷}（Technical Deficiency\index{Technical Deficiency}）。还有一种缺陷被称作{\small\bfnew{精神缺陷}}（Spiritual Deficiency\index{Spiritual Deficiency}）或{\small\bfnew{逻辑缺陷}}\index{逻辑缺陷}（Logical Deficiency\index{Logical Deficiency}），它是指$\funp{P}({\textrm{well}|\vectorn{t}}) + \funp{P}({\textrm{ill}|\vectorn{t}}) = 1$ 且$\funp{P}({\textrm{ill}|\vectorn{t}}) > 0$的情况。模型1 和模型2 就有逻辑缺陷。可以注意到，技术缺陷只存在于模型3 和模型4 中，模型1和模型2并没有技术缺陷问题。根本原因在于模型1和模型2的词对齐是从源语言出发对应到目标语言，$\vectorn{t}$到$\vectorn{s}$ 的翻译过程实际上是从单词$s_1$开始到单词$s_m$ 结束，依次把每个源语言单词$s_j$对应到唯一一个目标语言位置。显然，这个过程能够保证每个源语言单词仅对应一个目标语言单词。但是，模型3 和模型4中对齐是从目标语言出发对应到源语言，$\vectorn{t}$到$\vectorn{s}$的翻译过程从$t_1$开始到$t_l$ 结束，依次把目标语言单词$t_i$生成的单词对应到某个源语言位置上。但是这个过程不能保证$t_i$生成的单词所对应的位置没有被其他单词占用，因此也就产生了缺陷。
-\parinterval 这里还要强调的是，技术缺陷是模型3和模型4是模型本身的缺陷造成的，如果有一个``更好''的模型就可以完全避免这个问题。而逻辑缺陷几乎是不能从模型上根本解决的，因为对于任意一种语言都不能枚举所有的句子（$\textrm{P}({\textrm{ill}|\mathbf{t}})$实际上是得不到的）。
+\parinterval 这里还要强调的是，技术缺陷是由模型3和模型4本身的缺陷造成的，如果有一个``更好''的模型就可以完全避免这个问题。而逻辑缺陷几乎是不能从模型上根本解决的，因为对于任意一种语言都不能枚举所有的句子（$\funp{P}({\textrm{ill}|\vectorn{t}})$实际上是得不到的）。
-\parinterval IBM的模型5已经解决了技术缺陷问题。但逻辑缺陷的解决很困难，因为即使对于人来说也很难判断一个句子是不是``良好''的句子。当然可以考虑用语言模型来缓解这个问题，不过由于在翻译的时候源语言句子都是定义``良好''的句子，$\textrm{P}({\textrm{ill}|\mathbf{t}})$对$\textrm{P}(\mathbf{s}| \mathbf{t})$的影响并不大。但用输入的源语言句子$\mathbf{s}$的``良好性''并不能解决技术缺陷，因为技术缺陷是模型的问题或者模型参数估计方法的问题。无论输入什么样的$\mathbf{s}$，模型3和模型4的技术缺陷问题都存在。
+\parinterval IBM的模型5已经解决了技术缺陷问题。但逻辑缺陷的解决很困难，因为即使对于人来说也很难判断一个句子是不是``良好''的句子。当然可以考虑用语言模型来缓解这个问题，不过由于在翻译的时候源语言句子都是定义``良好''的句子，$\funp{P}({\textrm{ill}|\vectorn{t}})$对$\funp{P}(\vectorn{s}| \vectorn{t})$的影响并不大。但用输入的源语言句子$\vectorn{s}$的``良好性''并不能解决技术缺陷，因为技术缺陷是模型的问题或者模型参数估计方法的问题。无论输入什么样的$\vectorn{s}$，模型3和模型4的技术缺陷问题都存在。
 %----------------------------------------------------------------------------------------
 %    NEW SUB-SECTION
@@ -447,7 +450,7 @@ p_0+p_1                            & = & 1 \label{eq:6-17}
 \subsection{句子长度}
-\parinterval 在IBM模型中，$\textrm{P}(\mathbf{t})\textrm{P}(\mathbf{s}| \mathbf{t})$会随着目标语言句子长度的增加而减少，因为这种模型有多个概率化的因素组成，乘积项越多结果的值越小。这也就是说，IBM模型会更倾向选择长度短一些的目标语言句子。显然这种对短句子的偏向性并不是机器翻译所期望的。
+\parinterval 在IBM模型中，$\funp{P}(\vectorn{t})\funp{P}(\vectorn{s}| \vectorn{t})$会随着目标语言句子长度的增加而减少，因为这种模型由多个概率化的因素组成，乘积项越多结果的值越小。这也就是说，IBM模型会更倾向选择长度短一些的目标语言句子。显然这种对短句子的偏向性并不是机器翻译所期望的。
 \parinterval 这个问题在很多机器翻译系统中都存在。它实际上也是一种{\small\bfnew{系统偏置}}\index{系统偏置}（System Bias）\index{System Bias}的体现。为了消除这种偏置，可以通过在模型中增加一个短句子惩罚因子来抵消掉模型对短句子的倾向性。比如，可以定义一个惩罚因子，它的值随着长度的减少而增加。不过，简单引入这样的惩罚因子会导致模型并不符合一个严格的噪声信道模型。它对应一个基于判别式框架的翻译模型，这部分内容会在{\chapterseven}进行介绍。
@@ -457,7 +460,7 @@ p_0+p_1                            & = & 1 \label{eq:6-17}
 \subsection{其他问题}
-\parinterval 模型5的意义是什么？模型5的提出是为了消除模型3和模型4的缺陷。缺陷的本质是，$\textrm{P}(\mathbf{s},\mathbf{a}| \mathbf{t})$在所有合理的对齐上概率和不为1。 但是，在这里更关心是哪个对齐$\mathbf{a}$使$\textrm{P}(\mathbf{s},\mathbf{a}| \mathbf{t})$达到最大，即使$\textrm{P}(\mathbf{s},\mathbf{a}|\mathbf{t})$不符合概率分布的定义，也并不影响我们寻找理想的对齐$\mathbf{a}$。从工程的角度说，$\textrm{P}(\mathbf{s},\mathbf{a}| \mathbf{t})$不归一并不是一个十分严重的问题。遗憾的是，实际上到现在为止有太多对IBM模型3和模型4中的缺陷进行过系统的实验和分析，但对于这个问题到底有多严重并没有定论。当然用模型5是可以解决这个问题。但是如果用一个非常复杂的模型去解决了一个并不产生严重后果的问题，那这个模型也就没有太大意义了（从实践的角度）。
+\parinterval 模型5的意义是什么？模型5的提出是为了消除模型3和模型4的缺陷。缺陷的本质是，$\funp{P}(\vectorn{s},\vectorn{a}| \vectorn{t})$在所有合理的对齐上概率和不为1。 但是，在这里更关心的是哪个对齐$\vectorn{a}$使$\funp{P}(\vectorn{s},\vectorn{a}| \vectorn{t})$达到最大，即使$\funp{P}(\vectorn{s},\vectorn{a}|\vectorn{t})$不符合概率分布的定义，也并不影响我们寻找理想的对齐$\vectorn{a}$。从工程的角度说，$\funp{P}(\vectorn{s},\vectorn{a}| \vectorn{t})$不归一并不是一个十分严重的问题。遗憾的是，到现在为止并没有太多研究对IBM模型3和模型4中的缺陷进行过系统的实验和分析，因此对于这个问题到底有多严重并没有定论。当然用模型5是可以解决这个问题的。但是如果用一个非常复杂的模型去解决了一个并不产生严重后果的问题，那这个模型也就没有太大意义了（从实践的角度）。
 \parinterval 概念（cept.）的意义是什么？经过前面的分析可知，IBM模型的词对齐模型使用了cept.这个概念。但是，在IBM模型中使用的cept.最多只能对应一个目标语言单词（模型并没有用到源语言cept. 的概念）。因此可以直接用单词代替cept.。这样，即使不引入cept.的概念，也并不影响IBM模型的建模。实际上，cept.的引入确实可以帮助我们从语法和语义的角度解释词对齐过程。不过，这个方法在IBM 模型中的效果究竟如何还没有定论。
@@ -468,17 +471,15 @@ p_0+p_1                            & = & 1 \label{eq:6-17}
 \sectionnewpage
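 \section{Summary and Further Reading}
-{\color{red}Fertility needs to be added}
+This chapter builds on IBM Model 1 to further introduce IBM Models 2-5 and the HMM model. It also introduces two new concepts\ \dash\ distortion and fertility. Both are classic concepts in machine translation and appear frequently in machine translation modeling. In addition, through the analysis of these models, the chapter further explores several basic questions in modeling, for example, how to decompose the translation problem into several steps and build a sound model to explain these steps, and how to simplify a complex problem to obtain a computable model. These ideas are also used in many natural language processing problems. Beyond this, several points about distortion and fertility deserve attention:
-\parinterval This chapter gave a comprehensive introduction to and discussion of the IBM models. Starting from a simple word-based translation model, it described statistical machine translation along the dimensions of modeling, decoding, and training, and along the way covered several important concepts such as word alignment and optimization. There are five IBM models; their modeling of the translation problem goes progressively deeper, and model complexity increases accordingly. As the ``rite of passage'' for getting started in statistical machine translation, the ideas of the IBM models still influence machine translation today. Although using the IBM models alone for machine translation is now rare, and many people working on frontier research such as neural machine translation have gradually forgotten them, it cannot be denied that the IBM models marked the beginning of an era. In a sense, whenever the formula $\hat{\mathbf{t}} = \argmax_{\mathbf{t}} \textrm{P}(\mathbf{t}|\mathbf{s})$ is used to describe the machine translation problem, ideas similar to those of the IBM models are being used to some degree.
-\parinterval Of course, this book cannot cover everything about the IBM models; much is left for interested readers to study and explore further. Two directions are worth considering: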
 \begin{itemize}
 \vspace{0.5em}
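-\item In the ten-plus years after they were proposed, the IBM models received continued attention from the research community. One representative result is GIZA++ (\url{https://github.com/moses-smt/giza-pp}), which integrates the IBM models and the hidden Markov model and implements their training. For a long time afterwards, GIZA++ was standard equipment in machine translation research, used to obtain word-level alignments on bilingual parallel data. Researchers have also analyzed the IBM models extensively, providing a wealth of evidence for later research on statistical machine translation\cite{och2004alignment}. Although the IBM models are rarely used on their own, and decoders built directly on them are uncommon, they usually take part in translation modeling as a component of other models. This work is discussed in the next chapter on phrase-based and syntax-based models\cite{koehn2003statistical}. In addition, the IBM models give machine translation a very simple way to measure how well two bilingual word strings correspond, so they are widely used to measure the strength of that correspondence and serve as a common feature in natural language processing.
+\item Distortion is a classic concept in machine translation. Broadly speaking, any change in the position of things can be described by distortion; for example, in physical imaging systems, a distortion model can help with lens correction\upcite{1966Decentering,ClausF05}. In machine translation, distortion essentially describes the deviation between the word order of the source language and that of the target language. This deviation can be used to model reordering. The use of distortion can therefore be seen as a description of the reordering problem, which is one of the main factors distinguishing machine translation from tasks such as speech recognition. Early statistical machine translation systems, such as Pharaoh\upcite{DBLP:conf/amta/Koehn04}, made heavy use of distortion. Although more complex reordering models have been proposed as machine translation has developed\upcite{Gros2008MSD,xiong2006maximum,och2004alignment,DBLP:conf/naacl/KumarB05,li-etal-2014-neural,vaswani2017attention}, the thinking about reordering that distortion prompted is profound, and this is one of the greatest contributions of the IBM models.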
 \vspace{0.5em}
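-\item Beyond the pioneering work on machine translation modeling, another important contribution of the IBM models is establishing the basic model for statistical word alignment. Training the IBM models yields not only the model parameters but also word alignments on the bilingual data. That is, word alignment annotation is a by-product of IBM model training. This made the IBM models an important method for automatic word alignment. Much work, including GIZA++, is in fact used more for automatic word alignment than for simply training IBM model parameters. As the notion of word alignment developed, the task gradually became an important branch of natural language processing: for example, the results of the IBM models can be symmetrized\cite{och2003systematic}, word alignment can be solved directly with a discriminative classification model\cite{ittycheriah2005maximum}, and the idea of alignment can even be applied to bilingual correspondence between phrases and syntactic structures\cite{xiao2013unsupervised}. Besides GIZA++, researchers have developed many excellent automatic word alignment tools, such as FastAlign (\url{https://github.com/clab/fast_align}) and Berkeley Aligner (\url{https://github.com/mhajiloo/berkeleyaligner}), which are also widely used today.
+\item Another contribution of the IBM models is introducing the concept of fertility into machine translation. In essence, fertility is a way of modeling translation length. In the IBM models, the length of the whole sentence is obtained by computing the fertility of each word. Note that in machine translation the length of the translation has a critical effect on translation quality. Although many machine translation models do not use fertility directly, almost all modern machine translation systems include a module for controlling translation length. For example, both statistical and neural machine translation use the number of words in the translation as a feature for generating translations of reasonable length\upcite{Koehn2007Moses,ChiangLMMRS05,bahdanau2014neural}. In addition, in neural machine translation, non-autoregressive decoding also uses a fertility model to predict translation length\upcite{2018Non}.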
 \vspace{0.5em}
 \end{itemize}

--- a/Chapter7/Figures/figure-basic-process-of-translation.tex
+++ b/Chapter7/Figures/figure-basic-process-of-translation.tex
@@ -4,12 +4,12 @@
 \begin{scope}[minimum height = 18pt]
-\node[anchor=east] (s0) at (-0.5em, 0) {$\textbf{s}$:};
+\node[anchor=east] (s0) at (-0.5em, 0) {$\seq{s}$：};
 \node[anchor=west,fill=gray!20] (s1) at (0, 0) {\footnotesize{桌子 上}};
 \node[anchor=west,fill=gray!20] (s2) at ([xshift=1em]s1.east) {\footnotesize{有}};
 \node[anchor=west,fill=gray!20] (s3) at ([xshift=1em]s2.east) {\footnotesize{一个 苹果}};
-\node[anchor=east] (t0) at (-0.5em, -1.5) {$\textbf{t}$:};
+\node[anchor=east] (t0) at (-0.5em, -1.5) {$\seq{t}$：};
 \node[anchor=north] (l) at ([xshift=7em,yshift=-0.5em]t0.south) {\footnotesize{(a)\ }};
 \end{scope}
@@ -18,12 +18,12 @@
 \begin{scope}[xshift=15em,minimum height = 18pt]
-\node[anchor=east] (s0) at (-0.5em, 0) {$\textbf{s}$:};
+\node[anchor=east] (s0) at (-0.5em, 0) {$\seq{s}$：};
 \node[anchor=west,fill=gray!20] (s1) at (0, 0) {\footnotesize{桌子 上}};
 \node[anchor=west,fill=red!20] (s2) at ([xshift=1em]s1.east) {\footnotesize{有}};
 \node[anchor=west,fill=gray!20] (s3) at ([xshift=1em]s2.east) {\footnotesize{一个 苹果}};
-\node[anchor=east] (t0) at (-0.5em, -1.5) {$\textbf{t}$:};
+\node[anchor=east] (t0) at (-0.5em, -1.5) {$\seq{t}$：};
 {
 \node[anchor=west,fill=red!20] (t1) at (0, -1.5) {\footnotesize{There is}};
 \path[<->, thick] (s2.south) edge (t1.north);
@@ -36,12 +36,12 @@
 \begin{scope}[yshift=-9.5em,minimum height = 18pt]
-\node[anchor=east] (s0) at (-0.5em, 0) {$\textbf{s}$:};
+\node[anchor=east] (s0) at (-0.5em, 0) {$\seq{s}$：};
 \node[anchor=west,fill=gray!20] (s1) at (0, 0) {\footnotesize{桌子 上}};
 \node[anchor=west,fill=gray!20] (s2) at ([xshift=1em]s1.east) {\footnotesize{有}};
 \node[anchor=west,fill=red!20] (s3) at ([xshift=1em]s2.east) {\footnotesize{一个 苹果}};
-\node[anchor=east] (t0) at (-0.5em, -1.5) {$\textbf{t}$:};
+\node[anchor=east] (t0) at (-0.5em, -1.5) {$\seq{t}$：};
 {
 \node[anchor=west,fill=gray!20] (t1) at (0, -1.5) {\footnotesize{There is}};
 \path[<->, thick] (s2.south) edge (t1.north);
@@ -58,12 +58,12 @@
 \begin{scope}[xshift=15em,yshift=-9.5em,minimum height = 18pt]%[scale=0.5]
-\node[anchor=east] (s0) at (-0.5em, 0) {$\textbf{s}$:};
+\node[anchor=east] (s0) at (-0.5em, 0) {$\seq{s}$：};
 \node[anchor=west,fill=red!20] (s1) at (0, 0) {\footnotesize{桌子 上}};
 \node[anchor=west,fill=gray!20] (s2) at ([xshift=1em]s1.east) {\footnotesize{有}};
 \node[anchor=west,fill=gray!20] (s3) at ([xshift=1em]s2.east) {\footnotesize{一个 苹果}};
-\node[anchor=east] (t0) at (-0.5em, -1.5) {$\textbf{t}$:};
+\node[anchor=east] (t0) at (-0.5em, -1.5) {$\seq{t}$：};
 {
 \node[anchor=west,fill=gray!20] (t1) at (0, -1.5) {\footnotesize{There is}};
 \path[<->, thick] (s2.south) edge (t1.north);

--- a/Chapter7/Figures/figure-derivation-consist-of-bilingual-phrase.tex
+++ b/Chapter7/Figures/figure-derivation-consist-of-bilingual-phrase.tex
@@ -19,8 +19,8 @@
 \path[<->, thick] (s3.south) edge (t3.north);
 }
-\node[anchor=south] (s0) at ([xshift=-2em,yshift=0em]s1.south) {\textbf{s:}};
+\node[anchor=south] (s0) at ([xshift=-2em,yshift=0em]s1.south) {$\seq{s}$：};
-\node[anchor=east] (t0) at ([xshift=0em,yshift=-3.5em]s0.east) {\textbf{t:}};
+\node[anchor=east] (t0) at ([xshift=0em,yshift=-3.5em]s0.east) {$\seq{t}$：};
 \node[anchor=south,inner sep=0pt,yshift=-0.3em] (sp1) at (s1.north) {\footnotesize{$\bar{s}_{a_1 = 1}$}};
 \node[anchor=south,inner sep=0pt,yshift=-0.3em] (sp2) at (s2.north) {\footnotesize{$\bar{s}_{a_2 = 2}$}};

--- a/Chapter7/Figures/figure-example-of-hypothesis-recombination.tex
+++ b/Chapter7/Figures/figure-example-of-hypothesis-recombination.tex
@@ -5,7 +5,7 @@
 {
 \node [anchor=north,inner sep=2pt,fill=red!20,minimum height=2em,minimum width=3em] (h0) at (0,0) {\small{null}};
 \node [anchor=north west,inner sep=1.0pt,fill=black] (hl0) at (h0.north west) {\scriptsize{{\color{white} \textbf{0}}}};
-\node [anchor=north,rotate=90,inner sep=1pt,minimum width=2em,fill=black] (pt0) at (h0.east) {\footnotesize{{\color{white} \textbf{P=1}}}};
+\node [anchor=north,rotate=90,inner sep=1pt,minimum width=2em,fill=black] (pt0) at (h0.east) {\footnotesize{{\color{white} \textbf{$\funp{P}$=1}}}};
 \node [anchor=west,inner sep=2pt,fill=red!20,minimum height=2em,minimum width=3em] (h2) at ([xshift=2.2em,yshift=3.5em]h0.east) {\small{an}};
 \node [anchor=west,inner sep=2pt,fill=red!20,minimum height=2em,minimum width=3em] (h3) at ([xshift=2.2em]h2.east) {\small{apple}};
@@ -13,8 +13,8 @@
 \node [anchor=north west,inner sep=1.0pt,fill=black] (hl2) at (h2.north west) {\scriptsize{{\color{white} \textbf{1}}}};
 \node [anchor=north west,inner sep=1.0pt,fill=black] (hl3) at (h3.north west) {\scriptsize{{\color{white} \textbf{2}}}};
-\node [anchor=north,rotate=90,inner sep=1pt,minimum width=2em,fill=black] (pt2) at (h2.east) {\footnotesize{{\color{white} \textbf{P=.3}}}};
+\node [anchor=north,rotate=90,inner sep=1pt,minimum width=2em,fill=black] (pt2) at (h2.east) {\footnotesize{{\color{white} \textbf{$\funp{P}$=.3}}}};
-\node [anchor=north,rotate=90,inner sep=1pt,minimum width=2em,fill=black] (pt3) at (h3.east) {\footnotesize{{\color{white} \textbf{P=.5}}}};
+\node [anchor=north,rotate=90,inner sep=1pt,minimum width=2em,fill=black] (pt3) at (h3.east) {\footnotesize{{\color{white} \textbf{$\funp{P}$=.5}}}};
 \draw [->,very thick,ublue] ([xshift=0.1em]pt0.south) -- ([xshift=-0.1em]h2.west);
 \draw [->,very thick,ublue] ([xshift=0.1em]pt2.south) -- ([xshift=-0.1em]h3.west);
@@ -22,12 +22,12 @@
 {
 \node [anchor=west,inner sep=2pt,fill=red!20,minimum height=2em,minimum width=3em] (h1) at ([xshift=7em]h0.east) {\small{an apple}};
 \node [anchor=north west,inner sep=1.0pt,fill=black] (hl1) at (h1.north west) {\scriptsize{{\color{white} \textbf{1-2}}}};
-\node [anchor=north,rotate=90,inner sep=1pt,minimum width=2em,fill=black] (pt1) at (h1.east) {\footnotesize{{\color{white} \textbf{P=.5}}}};
+\node [anchor=north,rotate=90,inner sep=1pt,minimum width=2em,fill=black] (pt1) at (h1.east) {\footnotesize{{\color{white} \textbf{$\funp{P}$=.5}}}};
 \draw [->,very thick,ublue] ([xshift=0.1em]pt0.south) -- ([xshift=-0.1em]h1.west);
 }
 }
 {
-\node [anchor=north west,inner sep=2pt,fill=red!20,minimum height=2em,minimum width=3em] (h4) at ([yshift=-7em]h0.south west) {\small{null}};
+\node [anchor=north west,inner sep=2pt,fill=red!20,minimum height=2em,minimum width=3em] (h4) at ([yshift=-9em]h0.south west) {\small{null}};
 \node [anchor=west,inner sep=2pt,fill=red!20,minimum height=2em,minimum width=3em] (h5) at ([xshift=2.2em]h4.east) {\small{he}};
 \node [anchor=west,inner sep=2pt,fill=red!20,minimum height=2em,minimum width=3em] (h6) at ([xshift=2.2em,yshift=3.5em]h4.east) {\small{it}};
 \node [anchor=west,inner sep=2pt,fill=red!20,minimum height=2em,minimum width=3em] (h8) at ([xshift=2.2em]h6.east) {\small{is not}};
@@ -37,10 +37,10 @@
 \node [anchor=north west,inner sep=1.0pt,fill=black] (hl5) at (h6.north west) {\scriptsize{{\color{white} \textbf{1}}}};
 \node [anchor=north west,inner sep=1.0pt,fill=black] (hl5) at (h8.north west) {\scriptsize{{\color{white} \textbf{2}}}};
-\node [anchor=north,rotate=90,inner sep=1pt,minimum width=2em,fill=black] (pt4) at (h4.east) {\footnotesize{{\color{white} \textbf{P=1}}}};
+\node [anchor=north,rotate=90,inner sep=1pt,minimum width=2em,fill=black] (pt4) at (h4.east) {\footnotesize{{\color{white} \textbf{$\funp{P}$=1}}}};
-\node [anchor=north,rotate=90,inner sep=1pt,minimum width=2em,fill=black] (pt5) at (h5.east) {\footnotesize{{\color{white} \textbf{P=.3}}}};
+\node [anchor=north,rotate=90,inner sep=1pt,minimum width=2em,fill=black] (pt5) at (h5.east) {\footnotesize{{\color{white} \textbf{$\funp{P}$=.3}}}};
-\node [anchor=north,rotate=90,inner sep=1pt,minimum width=2em,fill=black] (pt6) at (h6.east) {\footnotesize{{\color{white} \textbf{P=.4}}}};
+\node [anchor=north,rotate=90,inner sep=1pt,minimum width=2em,fill=black] (pt6) at (h6.east) {\footnotesize{{\color{white} \textbf{$\funp{P}$=.4}}}};
-\node [anchor=north,rotate=90,inner sep=1pt,minimum width=2em,fill=black] (pt8) at (h8.east) {\footnotesize{{\color{white} \textbf{P=.2}}}};
+\node [anchor=north,rotate=90,inner sep=1pt,minimum width=2em,fill=black] (pt8) at (h8.east) {\footnotesize{{\color{white} \textbf{$\funp{P}$=.2}}}};
 \draw [->,very thick,ublue] ([xshift=0.1em]pt4.south) -- ([xshift=-0.1em]h5.west);
 \draw [->,very thick,ublue] ([xshift=0.1em]pt4.south) -- ([xshift=-0.1em]h6.west);
@@ -48,15 +48,15 @@
 {
 \node [anchor=west,inner sep=2pt,fill=red!20,minimum height=2em,minimum width=3em] (h7) at ([xshift=2.2em]h5.east) {\small{is not}};
-\node [anchor=north,rotate=90,inner sep=1pt,minimum width=2em,fill=black] (pt7) at (h7.east) {\footnotesize{{\color{white} \textbf{P=.2}}}};
+\node [anchor=north,rotate=90,inner sep=1pt,minimum width=2em,fill=black] (pt7) at (h7.east) {\footnotesize{{\color{white} \textbf{$\funp{P}$=.2}}}};
 \node [anchor=north west,inner sep=1.0pt,fill=black] (hl5) at (h7.north west) {\scriptsize{{\color{white} \textbf{2}}}};
 \draw [->,very thick,ublue] ([xshift=0.1em]pt5.south) -- ([xshift=-0.1em]h7.west);
 }
 }
-\node[anchor=north] (l1) at ([xshift=6em,yshift=-1em]h0.south) {\scriptsize{(a)\ 原假设（译文相同时）}};
+\node[anchor=north] (l1) at ([xshift=5.5em,yshift=-1em]h0.south) {\scriptsize{原假设}};
-\node[anchor=north] (l2) at ([xshift=6em,yshift=-1em]h4.south) {\scriptsize{(c)\ 原假设（译文不同时）}};
+\node[anchor=north] (l2) at ([xshift=5.5em,yshift=-1em]h4.south) {\scriptsize{原假设}};
-%\node [anchor=west,inner sep=2pt,fill=red!20,minimum height=2em,minimum width=3em,opacity=0.7] (h1) at ([xshift=-1em,yshift=2em]h2.north) {原假设};
+\node[anchor=north] (part1) at ([xshift=16em,yshift=-2em]h0.south){\scriptsize{（a）译文相同时的假设重组}};
 \end{scope}
@@ -68,7 +68,7 @@
 {
 \node [anchor=north,inner sep=2pt,fill=red!20,minimum height=2em,minimum width=3em] (h0) at (0,0) {\small{null}};
 \node [anchor=north west,inner sep=1.0pt,fill=black] (hl0) at (h0.north west) {\scriptsize{{\color{white} \textbf{0}}}};
-\node [anchor=north,rotate=90,inner sep=1pt,minimum width=2em,fill=black] (pt0) at (h0.east) {\footnotesize{{\color{white} \textbf{P=1}}}};
+\node [anchor=north,rotate=90,inner sep=1pt,minimum width=2em,fill=black] (pt0) at (h0.east) {\footnotesize{{\color{white} \textbf{$\funp{P}$=1}}}};
 \node [anchor=west,inner sep=2pt,fill=red!20,minimum height=2em,minimum width=3em] (h2) at ([xshift=2.2em,yshift=3.5em]h0.east) {\small{an}};
 \node [anchor=west,inner sep=2pt,fill=red!20,minimum height=2em,minimum width=3em] (h3) at ([xshift=2.2em]h2.east) {\small{apple}};
@@ -76,8 +76,8 @@
 \node [anchor=north west,inner sep=1.0pt,fill=black] (hl2) at (h2.north west) {\scriptsize{{\color{white} \textbf{1}}}};
 \node [anchor=north west,inner sep=1.0pt,fill=black] (hl3) at (h3.north west) {\scriptsize{{\color{white} \textbf{2}}}};
-\node [anchor=north,rotate=90,inner sep=1pt,minimum width=2em,fill=black] (pt2) at (h2.east) {\footnotesize{{\color{white} \textbf{P=.3}}}};
+\node [anchor=north,rotate=90,inner sep=1pt,minimum width=2em,fill=black] (pt2) at (h2.east) {\footnotesize{{\color{white} \textbf{$\funp{P}$=.3}}}};
-\node [anchor=north,rotate=90,inner sep=1pt,minimum width=2em,fill=black] (pt3) at (h3.east) {\footnotesize{{\color{white} \textbf{P=.5}}}};
+\node [anchor=north,rotate=90,inner sep=1pt,minimum width=2em,fill=black] (pt3) at (h3.east) {\footnotesize{{\color{white} \textbf{$\funp{P}$=.5}}}};
 \draw [->,very thick,ublue] ([xshift=0.1em]pt0.south) -- ([xshift=-0.1em]h2.west);
 \draw [->,very thick,ublue] ([xshift=0.1em]pt2.south) -- ([xshift=-0.1em]h3.west);
@@ -87,7 +87,7 @@
 }
 }
 {
-\node [anchor=north west,inner sep=2pt,fill=red!20,minimum height=2em,minimum width=3em] (h4) at ([yshift=-7em]h0.south west) {\small{null}};
+\node [anchor=north west,inner sep=2pt,fill=red!20,minimum height=2em,minimum width=3em] (h4) at ([yshift=-9em]h0.south west) {\small{null}};
 \node [anchor=west,inner sep=2pt,fill=red!20,minimum height=2em,minimum width=3em] (h5) at ([xshift=2.2em]h4.east) {\small{he}};
 \node [anchor=west,inner sep=2pt,fill=red!20,minimum height=2em,minimum width=3em] (h6) at ([xshift=2.2em,yshift=3.5em]h4.east) {\small{it}};
 \node [anchor=west,inner sep=2pt,fill=red!20,minimum height=2em,minimum width=3em] (h8) at ([xshift=2.2em]h6.east) {\small{is not}};
@@ -97,10 +97,10 @@
 \node [anchor=north west,inner sep=1.0pt,fill=black] (hl5) at (h6.north west) {\scriptsize{{\color{white} \textbf{1}}}};
 \node [anchor=north west,inner sep=1.0pt,fill=black] (hl5) at (h8.north west) {\scriptsize{{\color{white} \textbf{2}}}};
-\node [anchor=north,rotate=90,inner sep=1pt,minimum width=2em,fill=black] (pt4) at (h4.east) {\footnotesize{{\color{white} \textbf{P=1}}}};
+\node [anchor=north,rotate=90,inner sep=1pt,minimum width=2em,fill=black] (pt4) at (h4.east) {\footnotesize{{\color{white} \textbf{$\funp{P}$=1}}}};
-\node [anchor=north,rotate=90,inner sep=1pt,minimum width=2em,fill=black] (pt5) at (h5.east) {\footnotesize{{\color{white} \textbf{P=.3}}}};
+\node [anchor=north,rotate=90,inner sep=1pt,minimum width=2em,fill=black] (pt5) at (h5.east) {\footnotesize{{\color{white} \textbf{$\funp{P}$=.3}}}};
-\node [anchor=north,rotate=90,inner sep=1pt,minimum width=2em,fill=black] (pt6) at (h6.east) {\footnotesize{{\color{white} \textbf{P=.4}}}};
+\node [anchor=north,rotate=90,inner sep=1pt,minimum width=2em,fill=black] (pt6) at (h6.east) {\footnotesize{{\color{white} \textbf{$\funp{P}$=.4}}}};
-\node [anchor=north,rotate=90,inner sep=1pt,minimum width=2em,fill=black] (pt8) at (h8.east) {\footnotesize{{\color{white} \textbf{P=.2}}}};
+\node [anchor=north,rotate=90,inner sep=1pt,minimum width=2em,fill=black] (pt8) at (h8.east) {\footnotesize{{\color{white} \textbf{$\funp{P}$=.2}}}};
 \draw [->,very thick,ublue] ([xshift=0.1em]pt4.south) -- ([xshift=-0.1em]h5.west);
 \draw [->,very thick,ublue] ([xshift=0.1em]pt4.south) -- ([xshift=-0.1em]h6.west);
@@ -111,17 +111,15 @@
 }
 }
 {
 {
 \node [anchor=west,inner sep=2pt,fill=red!20,minimum height=2em,minimum width=3em,opacity=0.3] (h1) at ([xshift=7em]h0.east) {\small{an apple}};
 \node [anchor=north west,inner sep=1.0pt,fill=black,opacity=0.3] (hl1) at (h1.north west) {\scriptsize{{\color{white} \textbf{1-2}}}};
-\node [anchor=north,rotate=90,inner sep=1pt,minimum width=2em,fill=black,opacity=0.3] (pt1) at (h1.east) {\footnotesize{{\color{white} \textbf{P=.5}}}};
+\node [anchor=north,rotate=90,inner sep=1pt,minimum width=2em,fill=black,opacity=0.3] (pt1) at (h1.east) {\footnotesize{{\color{white} \textbf{$\funp{P}$=.5}}}};
 }
 {
 \node [anchor=west,inner sep=2pt,fill=red!20,minimum height=2em,minimum width=3em,opacity=0.3] (h7) at ([xshift=2.2em]h5.east) {\small{is not}};
-\node [anchor=north,rotate=90,inner sep=1pt,minimum width=2em,fill=black,opacity=0.3] (pt7) at (h7.east) {\footnotesize{{\color{white} \textbf{P=.2}}}};
+\node [anchor=north,rotate=90,inner sep=1pt,minimum width=2em,fill=black,opacity=0.3] (pt7) at (h7.east) {\footnotesize{{\color{white} \textbf{$\funp{P}$=.2}}}};
 \node [anchor=north west,inner sep=1.0pt,fill=black,opacity=0.3] (hl5) at (h7.north west) {\scriptsize{{\color{white} \textbf{2}}}};
 }
 }
@@ -132,12 +130,10 @@
 \node [anchor=west] (l21) at ([xshift=0em, yshift=-1em]l2.west) {\footnotesize{较低假设}};
 %\node [anchor=west,inner sep=2pt,fill=red!20,minimum height=2em,minimum width=3em,opacity=0.7] (h1) at ([xshift=-1em,yshift=2em]h2.north) {重组假设};
-\node[anchor=north] (l1) at ([xshift=6em,yshift=-1em]h0.south) {\scriptsize{(c)\ 重组假设（译文相同时）}};
+\node[anchor=north] (l1) at ([xshift=7.5em,yshift=-1em]h0.south) {\scriptsize{重组假设}};
-\node[anchor=north] (l2) at ([xshift=6em,yshift=-1em]h4.south) {\scriptsize{(d)\ 重组假设（译文不同时）}};
+\node[anchor=north] (l2) at ([xshift=7.5em,yshift=-1em]h4.south) {\scriptsize{重组假设}};
+\node[anchor=north] (part2) at ([xshift=0em,yshift=-14em]h0.south){\scriptsize{（b）译文不同时的假设重组}};
 \end{scope}
 \end{tikzpicture}
\ No newline at end of file
--- a/Chapter7/Figures/figure-example-of-n-gram-1.tex
+++ b/Chapter7/Figures/figure-example-of-n-gram-1.tex
@@ -2,17 +2,17 @@
 %%% 引入短语翻译
 {\small
 \begin{tabular}{l | l}
-{\red{词串}}翻译表 & P \\ \hline
+{\red{词串}}翻译表 & $\funp{P}$ \\ \hline
 我 $\to$ I & 0.6 \\
 喜欢 $\to$ like & 0.3 \\
 红 $\to$ red & 0.8 \\
 红 $\to$ black & 0.1 \\
 茶 $\to$ tea & 0.8\\
-我 喜欢 $\to$ I like & 0.3\\
+我/喜欢 $\to$ I like & 0.3\\
-我 喜欢 $\to$ I liked & 0.2\\
+我/喜欢 $\to$ I liked & 0.2\\
-绿 茶 $\to$ green tea & 0.5\\
+绿/茶 $\to$ green tea & 0.5\\
-绿 茶 $\to$ the green tea & 0.1\\
+绿/茶 $\to$ the green tea & 0.1\\
-红 茶 $\to$ black tea & 0.7\\
+红/茶 $\to$ black tea & 0.7\\
 ... & 
 \end{tabular}
 }
--- a/Chapter7/Figures/figure-example-of-n-gram-2.tex
+++ b/Chapter7/Figures/figure-example-of-n-gram-2.tex
@@ -12,7 +12,7 @@
 \node [anchor=west] (s2) at ([xshift=1.0em]s1.east) {喜欢};
 \node [anchor=west] (s3) at ([xshift=1.0em]s2.east) {\red{红}};
 \node [anchor=west] (s4) at ([xshift=1.0em]s3.east) {茶};
-\node [anchor=east] (s) at (s1.west) {$\textbf{s}=$};
+\node [anchor=east] (s) at (s1.west) {$\seq{s}=$};
 }
 \end{scope}
@@ -22,7 +22,7 @@
 \node [anchor=west] (t2) at ([xshift=0.8em,yshift=-0.0em]t1.east) {like};
 \node [anchor=west] (t3) at ([xshift=0.6em,yshift=-0.0em]t2.east) {red};
 \node [anchor=west] (t4) at ([xshift=1.15em,yshift=-0.1em]t3.east) {tea};
-\node [anchor=east] (t) at ([xshift=-0.2em]t1.west) {$\textbf{t}=$};
+\node [anchor=east] (t) at ([xshift=-0.2em]t1.west) {$\seq{t}=$};
 }
 \end{scope}
@@ -44,7 +44,7 @@
 \node [anchor=west] (s2) at ([xshift=1.0em]s1.east) {喜欢};
 \node [anchor=west] (s3) at ([xshift=1.0em]s2.east) {\red{红}};
 \node [anchor=west] (s4) at ([xshift=1.0em]s3.east) {茶};
-\node [anchor=east] (s) at (s1.west) {$\textbf{s}=$};
+\node [anchor=east] (s) at (s1.west) {$\seq{s}=$};
 }
 \end{scope}
@@ -54,7 +54,7 @@
 \node [anchor=west] (t2) at ([xshift=0.8em,yshift=-0.0em]t1.east) {like};
 \node [anchor=west] (t3) at ([xshift=0.6em,yshift=-0.0em]t2.east) {black};
 \node [anchor=west] (t4) at ([xshift=1.0em,yshift=-0.1em]t3.east) {tea};
-\node [anchor=east] (t) at ([xshift=-0.2em]t1.west) {$\textbf{t}=$};
+\node [anchor=east] (t) at ([xshift=-0.2em]t1.west) {$\seq{t}=$};
 }
 \end{scope}

--- a/Chapter7/Figures/figure-example-of-stack-decode.tex
+++ b/Chapter7/Figures/figure-example-of-stack-decode.tex
@@ -6,7 +6,7 @@
 {
 \node [anchor=north,inner sep=2pt,fill=red!20,minimum height=2em,minimum width=3em] (h0) at (0,0) {\scriptsize{null}};
 \node [anchor=north west,inner sep=1.5pt,fill=black] (hl0) at (h0.north west) {\scriptsize{{\color{white} \textbf{0}}}};
-\node [anchor=north,rotate=90,inner sep=1pt,minimum width=2em,fill=black] (pt0) at (h0.east) {\scriptsize{{\color{white} \textbf{P=1}}}};
+\node [anchor=north,rotate=90,inner sep=1pt,minimum width=2em,fill=black] (pt0) at (h0.east) {\scriptsize{{\color{white} \textbf{$\funp{P}$=1}}}};
 }
 {
 \node [anchor=west,inner sep=2pt,fill=red!20,minimum height=2em,minimum width=3em] (h13) at ([xshift=2.1em,yshift=6em]h0.east) {\scriptsize{there is}};
@@ -17,8 +17,8 @@
 \node [anchor=north west,inner sep=1.0pt,fill=black] (hl3) at (h13.north west) {\scriptsize{{\color{white} \textbf{3}}}};
-\node [anchor=north,rotate=90,inner sep=1pt,minimum width=2em,fill=black] (pt1) at (h1.east) {\scriptsize{{\color{white} \textbf{P=.2}}}};
+\node [anchor=north,rotate=90,inner sep=1pt,minimum width=2em,fill=black] (pt1) at (h1.east) {\scriptsize{{\color{white} \textbf{$\funp{P}$=.2}}}};
-\node [anchor=north,rotate=90,inner sep=1pt,minimum width=2em,fill=black] (pt3) at (h13.east) {\scriptsize{{\color{white} \textbf{P=.5}}}};
+\node [anchor=north,rotate=90,inner sep=1pt,minimum width=2em,fill=black] (pt3) at (h13.east) {\scriptsize{{\color{white} \textbf{$\funp{P}$=.5}}}};
 \node [anchor=west,inner sep=2pt,fill=red!20,minimum height=2em,minimum width=3em] (h2) at ([xshift=2.1em]h1.east) {\scriptsize{have}};
 \node [anchor=west,inner sep=2pt,minimum height=2em,minimum width=3em] (h22) at ([xshift=2.1em]h12.east) {\small{\textbf{...}}};
@@ -32,15 +32,15 @@
 \node [anchor=north west,inner sep=1.0pt,fill=black] (hl3) at (h3.north west) {\scriptsize{{\color{white} \textbf{2}}}};
 \node [anchor=north west,inner sep=1.0pt,fill=black] (hl33) at (h33.north west) {\scriptsize{{\color{white} \textbf{4-5}}}};
-\node [anchor=north,rotate=90,inner sep=1pt,minimum width=2em,fill=black] (pt2) at (h2.east) {\scriptsize{{\color{white} \textbf{P=.5}}}};
+\node [anchor=north,rotate=90,inner sep=1pt,minimum width=2em,fill=black] (pt2) at (h2.east) {\scriptsize{{\color{white} \textbf{$\funp{P}$=.5}}}};
-\node [anchor=north,rotate=90,inner sep=1pt,minimum width=2em,fill=black] (pt23) at (h23.east) {\scriptsize{{\color{white} \textbf{P=.5}}}};
+\node [anchor=north,rotate=90,inner sep=1pt,minimum width=2em,fill=black] (pt23) at (h23.east) {\scriptsize{{\color{white} \textbf{$\funp{P}$=.5}}}};
-\node [anchor=north,rotate=90,inner sep=1pt,minimum width=2em,fill=black] (pt3) at (h3.east) {\scriptsize{{\color{white} \textbf{P=.5}}}};
+\node [anchor=north,rotate=90,inner sep=1pt,minimum width=2em,fill=black] (pt3) at (h3.east) {\scriptsize{{\color{white} \textbf{$\funp{P}$=.5}}}};
-\node [anchor=north,rotate=90,inner sep=1pt,minimum width=2em,fill=black] (pt33) at (h33.east) {\scriptsize{{\color{white} \textbf{P=.5}}}};
+\node [anchor=north,rotate=90,inner sep=1pt,minimum width=2em,fill=black] (pt33) at (h33.east) {\scriptsize{{\color{white} \textbf{$\funp{P}$=.5}}}};
 }
 \node [anchor=north] (l0) at ([xshift=0.2em,yshift=-0.7em]h0.south) {\small{\textbf{未译词}}};
-\node [anchor=north] (l1) at ([xshift=0.3em,yshift=-0.7em]h1.south) {\small{\textbf{已译1词}}};
+\node [anchor=north] (l1) at ([xshift=0.3em,yshift=-0.7em]h1.south) {\small{\textbf{已译}1\textbf{词}}};
-\node [anchor=north] (l2) at ([xshift=0.3em,yshift=-0.7em]h2.south) {\small{\textbf{已译2词}}};
+\node [anchor=north] (l2) at ([xshift=0.3em,yshift=-0.7em]h2.south) {\small{\textbf{已译}2\textbf{词}}};
-\node [anchor=north] (l3) at ([xshift=0.3em,yshift=-0.7em]h3.south) {\small{\textbf{已译3词}}};
+\node [anchor=north] (l3) at ([xshift=0.3em,yshift=-0.7em]h3.south) {\small{\textbf{已译}3\textbf{词}}};
 \begin{pgfonlayer}{background}
 \node [rectangle,inner sep=0.3em,fill=blue!10] [fit = (h0) (pt0)] (box0) {};

--- a/Chapter7/Figures/figure-example-of-translation-base-word-1.tex
+++ b/Chapter7/Figures/figure-example-of-translation-base-word-1.tex
@@ -4,7 +4,7 @@
 {\small
 \begin{tabular}{l | l}
-单词翻译表 & P \\ \hline
+单词翻译表 & $\funp{P}$ \\ \hline
 我 $\to$ I & 0.6 \\
 喜欢 $\to$ like & 0.3 \\
 绿 $\to$ green & 0.9 \\

--- a/Chapter7/Figures/figure-example-of-translation-base-word-2.tex
+++ b/Chapter7/Figures/figure-example-of-translation-base-word-2.tex
@@ -7,7 +7,7 @@
 \node [anchor=west] (s2) at ([xshift=1.0em]s1.east) {喜欢};
 \node [anchor=west] (s3) at ([xshift=1.0em]s2.east) {{\color{ugreen} 绿}};
 \node [anchor=west] (s4) at ([xshift=1.07em]s3.east) {茶};
-\node [anchor=east] (s) at (s1.west) {$\textbf{s}=$};
+\node [anchor=east] (s) at (s1.west) {$\seq{s}=$};
 }
 \end{scope}
@@ -18,7 +18,7 @@
 \node [anchor=west] (t2) at ([xshift=0.8em,yshift=-0.0em]t1.east) {like};
 \node [anchor=west] (t3) at ([xshift=1.0em,yshift=-0.2em]t2.east) {green};
 \node [anchor=west] (t4) at ([xshift=0.78em,yshift=0.1em]t3.east) {tea};
-\node [anchor=east] (t) at ([xshift=-0.3em]t1.west) {$\textbf{t}=$};
+\node [anchor=east] (t) at ([xshift=-0.3em]t1.west) {$\seq{t}=$};
 }
 \end{scope}

--- a/Chapter7/Figures/figure-example-of-translation-black-tea-1.tex
+++ b/Chapter7/Figures/figure-example-of-translation-black-tea-1.tex
@@ -2,7 +2,7 @@
 %%% 基于单词的模型的问题
 {\small
 \begin{tabular}{l | l}
-单词翻译表 & P \\ \hline
+单词翻译表 & $\funp{P}$ \\ \hline
 我 $\to$ I & 0.6 \\
 喜欢 $\to$ like & 0.3 \\
 红 $\to$ red & 0.8 \\

--- a/Chapter7/Figures/figure-example-of-translation-black-tea-2.tex
+++ b/Chapter7/Figures/figure-example-of-translation-black-tea-2.tex
@@ -10,7 +10,7 @@
 \node [anchor=west] (s2) at ([xshift=1.0em]s1.east) {喜欢};
 \node [anchor=west] (s3) at ([xshift=1.0em]s2.east) {\red{红}};
 \node [anchor=west] (s4) at ([xshift=1.0em]s3.east) {茶};
-\node [anchor=east] (s) at (s1.west) {$\textbf{s}=$};
+\node [anchor=east] (s) at (s1.west) {$\seq{s}=$};
 }
 \end{scope}
@@ -21,7 +21,7 @@
 \node [anchor=west] (t2) at ([xshift=0.8em,yshift=-0.0em]t1.east) {like};
 \node [anchor=west] (t3) at ([xshift=1.0em,yshift=-0.0em]t2.east) {red};
 \node [anchor=west] (t4) at ([xshift=1.0em,yshift=-0.1em]t3.east) {tea};
-\node [anchor=east] (t) at ([xshift=-0.3em]t1.west) {$\textbf{t}=$};
+\node [anchor=east] (t) at ([xshift=-0.3em]t1.west) {$\seq{t}=$};
 }
 \end{scope}
@@ -34,7 +34,7 @@
 \begin{pgfonlayer}{background}
 {
 \node [rectangle,draw,thick,inner sep=0.2em,fill=white,drop shadow] [fit = (t3) (t4)] (problemphrase) {};
-\node [anchor=north,text width=8em,align=left] (problemlabel) at (problemphrase.south) {\begin{spacing}{0.8}\scriptsize{``红 茶''为一种搭配，应该翻译为``black tea''}\end{spacing}};
+\node [anchor=north,text width=8em,align=left] (problemlabel) at (problemphrase.south) {\begin{spacing}{0.8}\scriptsize{“红 茶”为一种搭配，应该翻译为“black tea”}\end{spacing}};
 }
 \end{pgfonlayer}

--- a/Chapter7/Figures/figure-example-of-vocabulary-translation-probability.tex
+++ b/Chapter7/Figures/figure-example-of-vocabulary-translation-probability.tex
@@ -39,10 +39,10 @@
 \node[align=center,elementnode,minimum size=0.3cm,inner sep=0.1pt,fill=blue!50] (la4) at (a41) {};
 \node[align=center,elementnode,minimum size=0.3cm,inner sep=0.1pt,fill=blue!50] (la5) at (a30) {};
-\node[anchor=west] (f1) at ([xshift=3em,yshift=0.8em]a43.east) {\small{$\textrm{P}_{\textrm{lex}}(\bar{t}|\bar{s})=w(t_1|s_1)\times$}};
+\node[anchor=west] (f1) at ([xshift=3em,yshift=0.8em]a43.east) {\small{$\funp{P}_{\textrm{lex}}(\bar{t}|\bar{s})=\sigma (t_1|s_1)\times$}};
-\node[anchor=north] (f2) at ([xshift=5.2em]f1.south) {\small{$\frac{1}{2}(w(t_2|s_2)+w(t_4|s_2))\times$}};
+\node[anchor=north] (f2) at ([xshift=5.2em]f1.south) {\small{$\frac{1}{2}(\sigma (t_2|s_2)+\sigma (t_4|s_2))\times$}};
-\node[anchor=north west] (f3) at (f2.south west) {\small{$w(N|s_3)\times$}};
+\node[anchor=north west] (f3) at (f2.south west) {\small{$\sigma (N|s_3)\times$}};
-\node[anchor=north west] (f4) at (f3.south west) {\small{$w(t_4|s_4)\times$}};
+\node[anchor=north west] (f4) at (f3.south west) {\small{$\sigma (t_4|s_4)\times$}};
 \end{scope}

--- a/Chapter7/Figures/figure-example-of-zh2en-translation-base-phrase.tex
+++ b/Chapter7/Figures/figure-example-of-zh2en-translation-base-phrase.tex
@@ -7,20 +7,20 @@
 {\small
 \node[anchor=north,fill=green!20] (s1) at (0,0) {进口};
-\node [anchor=north,fill=red!20] (s2) at ([xshift=4em,yshift=0em]s1.north) {大幅度};
+\node [anchor=west,fill=red!20] (s2) at ([xshift=1em,yshift=0em]s1.east) {大幅度};
-\node[anchor=north,fill=blue!20] (s3) at ([xshift=4.5em,yshift=0em]s2.north) {下降 了};
+\node[anchor=west,fill=blue!20] (s3) at ([xshift=1em,yshift=0em]s2.east) {下降\ \ \ 了};
 \node[anchor=west,fill=green!20] (t1) at ([xshift=0em,yshift=-4em]s1.west) {The imports have};
-\node[anchor=north,fill=red!20] (t2) at ([xshift=8em,yshift=0em]t1.north) {drastically};
+\node[anchor=west,fill=red!20] (t2) at ([xshift=1em,yshift=0em]t1.east) {drastically};
-\node[anchor=north,fill=blue!20] (t3) at ([xshift=5.7em,yshift=0em]t2.north) {fallen};
+\node[anchor=west,fill=blue!20] (t3) at ([xshift=1em,yshift=0em]t2.east) {fallen};
 \path[<->, thick] (s1.south) edge (t1.north);
 \path[<->, thick] (s2.south) edge (t2.north);
 \path[<->, thick] (s3.south) edge (t3.north);
 }
-\node[anchor=south] (s0) at ([xshift=-3em,yshift=0em]s1.south) {源语言:};
+\node[anchor=south] (s0) at ([xshift=-3em,yshift=0em]s1.south) {源语言：};
-\node[anchor=east] (t0) at ([xshift=0em,yshift=-3.5em]s0.east) {目标语言:};
+\node[anchor=east] (t0) at ([xshift=0em,yshift=-3.5em]s0.east) {目标语言：};
 \end{scope}
 \end{tikzpicture}

--- a/Chapter7/Figures/figure-function-image-about-weight-and-Bleu-2.tex
+++ b/Chapter7/Figures/figure-function-image-about-weight-and-Bleu-2.tex
-%%%------------------------------------------------------------------------------------------------------------
-%%% 特征权重调优
-\begin{tikzpicture}
-\begin{scope}
-\node[anchor=west] (x0) at (0, 0) {};
-\draw[->,thick] (x0.center) -- ([xshift=8.2em]x0.east);
-\draw[->,thick] (x0.center) -- ([yshift=5.6em]x0.center);
-\node[anchor=north] (zero) at ([yshift=0.1em]x0.south) {\small{0}};
-\node[anchor=north] (wx) at ([xshift=4em,yshift=0.1em]x0.south) {\small{$w_x$}};
-\node[anchor=north] (wi) at ([xshift=8em,yshift=0.1em]x0.south) {\small{$w_i$}};
-{
-\draw[thick] ([yshift=2em]x0.center) -- ([xshift=4em,yshift=2em]x0.center);
-\draw[thick] ([xshift=4em,yshift=4em]x0.center) -- ([xshift=8em,yshift=4em]x0.center);
-\draw[thick,dotted] ([xshift=4em]x0.center) -- ([xshift=4em,yshift=5.5em]x0.center);
-\node[anchor=north] (e1) at ([xshift=2em,yshift=3em]x0.north) {\small{$d^*=d_1$}};
-\node[anchor=north] (e2) at ([xshift=6.2em,yshift=5em]x0.north) {\small{$d^*=d_2$}};
-\node[anchor=north,rotate=90] (e2) at ([xshift=-1.3em,yshift=3.6em]x0.south) {\small{BLEU}};
-\draw[decorate,decoration={brace,amplitude=0.4em},red,thick] ([xshift=4em,yshift=0.5em]x0.south) -- ([xshift=8.2em,yshift=0.5em]x0.south);
-\node[anchor=north] (wi) at ([xshift=6.1em,yshift=2.1em]x0.south) {\footnotesize{\red{挑选$w_i$}}};
-}
-\end{scope}
-\end{tikzpicture}
--- a/Chapter7/Figures/figure-function-image-about-weight-and-Bleu-1.tex
+++ b/Chapter7/Figures/figure-function-image-about-weight-and-Bleu-1.tex
@@ -12,17 +12,42 @@
 \draw[thick] ([yshift=2em]x0.center) -- ([xshift=8em,yshift=4em]x0.center);
 \node[anchor=north] (e1) at ([xshift=6em,yshift=6em]x0.south) {\small{$d_1$}};
 \node[anchor=north] (e2) at ([xshift=7em,yshift=4em]x0.south) {\small{$d_2$}};
-\node[anchor=north,rotate=90] (e2) at ([xshift=-1.3em,yshift=3.6em]x0.south) {\small{model score}};
+\node[anchor=north,rotate=90] (e2) at ([xshift=-1.3em,yshift=4em]x0.south) {\small{score}};
 }
 {
 \node [anchor=center,draw=red,circle,inner sep=2pt,thick] (x1) at ([xshift=4em,yshift=3em]x0.center) {};
-\draw[thick,dotted] ([xshift=4em]x0.center) -- ([xshift=4em,yshift=3em]x0.center);
+\draw[thick,dotted] ([xshift=4em]x0.center) -- ([xshift=4em,yshift=3.6em]x0.center);
 }
 \node[anchor=north] (zero) at ([yshift=0.1em]x0.south) {\small{0}};
-\node[anchor=north] (wx) at ([xshift=4em,yshift=0.1em]x0.south) {\small{$w_x$}};
+\node[anchor=north] (wx) at ([xshift=4em,yshift=0.1em]x0.south) {\small{$\lambda_x$}};
-\node[anchor=north] (wi) at ([xshift=8em,yshift=0.1em]x0.south) {\small{$w_i$}};
+\node[anchor=north] (wi) at ([xshift=8em,yshift=0.1em]x0.south) {\small{$\lambda_i$}};
 \end{scope}
+\begin{scope}[xshift=1.7in]
+\node[anchor=west] (x0) at (0, 0) {};
+\draw[->,thick] (x0.center) -- ([xshift=8.2em]x0.east);
+\draw[->,thick] (x0.center) -- ([yshift=5.6em]x0.center);
+\node[anchor=north] (zero) at ([yshift=0.1em]x0.south) {\small{0}};
+\node[anchor=north] (wx) at ([xshift=4em,yshift=0.1em]x0.south) {\small{$\lambda_x$}};
+\node[anchor=north] (wi) at ([xshift=8em,yshift=0.1em]x0.south) {\small{$\lambda_i$}};
+{
+\draw[thick] ([yshift=2em]x0.center) -- ([xshift=4em,yshift=2em]x0.center);
+\draw[thick] ([xshift=4em,yshift=4em]x0.center) -- ([xshift=8em,yshift=4em]x0.center);
+\draw[thick,dotted] ([xshift=4em]x0.center) -- ([xshift=4em,yshift=5.5em]x0.center);
+\node[anchor=north] (e1) at ([xshift=2em,yshift=3em]x0.north) {\small{$\hat{d}=d_1$}};
+\node[anchor=north] (e2) at ([xshift=6.2em,yshift=5em]x0.north) {\small{$\hat{d}=d_2$}};
+\node[anchor=north,rotate=90] (e2) at ([xshift=-1.3em,yshift=4em]x0.south) {\small{BLEU}};
+\draw[decorate,decoration={brace,amplitude=0.4em},red,thick] ([xshift=4em,yshift=0.5em]x0.south) -- ([xshift=8.2em,yshift=0.5em]x0.south);
+\node[anchor=north] (wi) at ([xshift=6.1em,yshift=2.1em]x0.south) {\footnotesize{\red{挑选$\hat{\lambda}_i$}}};
+}
+\end{scope}
 \end{tikzpicture}
--- a/Chapter7/Figures/figure-grid-search-2.tex
+++ b/Chapter7/Figures/figure-grid-search-2.tex
-\begin{tikzpicture}
-\begin{scope}[scale=0.62] 
-{\tiny
-\draw[step=1,help lines,color=black] (0,0) grid (4,4); 
-\node[anchor=north] (y2) at ([xshift=-3.3em,yshift=0em]n1.north) {0.01};
-\node[anchor=north] (y1) at ([xshift=0em,yshift=-3.3em]y2.south) {0.00};
-\node[anchor=north] (y3) at ([xshift=0em,yshift=4.5em]y2.north) {0.02};
-\node[anchor=north] (y4) at ([xshift=0em,yshift=6.6em]y3.north) {$\vdots$};
-\node[anchor=north] (y5) at ([xshift=0em,yshift=2em]y4.north) {1.00};
-\node[anchor=north] (x1) at ([xshift=2em,yshift=-3em]n1.south) {$\lambda_1$};
-\node[anchor=north] (x2) at ([xshift=4.5em,yshift=0em]x1.north) {$\lambda_2$};
-\node[anchor=north] (x3) at ([xshift=4em,yshift=-1em]x2.north) {$...$};
-\node[anchor=north] (x4) at ([xshift=5em,yshift=1em]x3.north) {$\lambda_{M-1}$};
-\node[anchor=north] (x5) at ([xshift=5em,yshift=0em]x4.north) {$\lambda_M$};
-\draw [-](n1) (0,4) -- (0,4.4);
-\draw [-](n2) (1,4) -- (1,4.4);
-\draw [-](n3) (2,4) -- (2,4.4);
-\draw [-](n4) (3,4) -- (3,4.4);
-\draw [-](n5) (4,4) -- (4,4.4);
-\node [anchor=center,draw,circle,inner sep=1.5pt,red!30,fill=red!30] (r31) at (2,4) {};
-\node [anchor=center,draw,circle,inner sep=1.5pt,red!30,fill=red!30] (r32) at (2,0) {};
-\node [anchor=center,draw,circle,inner sep=1.5pt,red!30,fill=red!30] (r33) at (2,2) {};
-\node [anchor=center,draw,circle,inner sep=1.5pt,red!30,fill=red!30] (r35) at (2,1) {};
-\node [anchor=center,draw,circle,inner sep=1.5pt,ugreen!50,fill=ugreen!50] (r34) at (2,3) {};
-\draw [-,very thick,red!50, dashed] (1,2) -- (2,4) -- (3,2) -- (2,3) -- (1,2) -- (3,2) -- (2,1) -- (1,2) -- (2,0) -- (3,2);
-\draw [-,very thick,blue!50] (0,1) -- (1,2);
-\draw [-,very thick,blue!50] (3,2) -- (4,4);
-\draw [-,very thick,ugreen!50, dashed] (1,2) -- (2,3) -- (3,2);
-\node [anchor=center,draw,circle,inner sep=1.5pt,blue!30,fill=blue!30] (r11) at (0,1) {};
-\node [anchor=center,draw,circle,inner sep=1.5pt,blue!30,fill=blue!30] (r12) at (1,2) {};
-\node [anchor=center,draw,circle,inner sep=1.5pt,blue!30,fill=blue!30] (r14) at (3,2) {};
-\node [anchor=center,draw,circle,inner sep=1.5pt,blue!30,fill=blue!30] (r15) at (4,4) {};
-}
-\end{scope}
-\end{tikzpicture}
\ No newline at end of file
--- a/Chapter7/Figures/figure-grid-search-1.tex
+++ b/Chapter7/Figures/figure-grid-search-1.tex
@@ -3,13 +3,13 @@
 {\tiny
 \draw[step=1,help lines,color=black] (0,0) grid (4,4); 
-\node[anchor=north] (y2) at ([xshift=-3.3em,yshift=0em]n1.north) {0.01};
+\node[anchor=north] (y2) at (-5.3em,1.5) {0.01};
 \node[anchor=north] (y1) at ([xshift=0em,yshift=-3.3em]y2.south) {0.00};
 \node[anchor=north] (y3) at ([xshift=0em,yshift=4.5em]y2.north) {0.02};
 \node[anchor=north] (y4) at ([xshift=0em,yshift=6.6em]y3.north) {$\vdots$};
 \node[anchor=north] (y5) at ([xshift=0em,yshift=2em]y4.north) {1.00};
-\node[anchor=north] (x1) at ([xshift=2em,yshift=-3em]n1.south) {$\lambda_1$};
+\node[anchor=north] (x1) at (1em,-3em) {$\lambda_1$};
 \node[anchor=north] (x2) at ([xshift=4.5em,yshift=0em]x1.north) {$\lambda_2$};
 \node[anchor=north] (x3) at ([xshift=4em,yshift=-1em]x2.north) {$...$};
 \node[anchor=north] (x4) at ([xshift=5em,yshift=1em]x3.north) {$\lambda_{M-1}$};
@@ -28,11 +28,11 @@
 \node [anchor=center,draw,circle,inner sep=1.5pt,red!30,fill=red!30] (r35) at (2,1) {};
 \node [anchor=center,draw,circle,inner sep=1.5pt,blue!30,fill=blue!30] (f11) at ([xshift=0em,yshift=23em]y2.north) {};
-\node[anchor=south] (f12) at ([xshift=5em,yshift=-0.5em]f11.south) {\scriptsize{fixed}};
+\node[anchor=south] (f12) at ([xshift=8.5em,yshift=-1em]f11.south) {\scriptsize{固定的权重}};
 \node [anchor=center,draw,circle,inner sep=1.5pt,ugreen!50,fill=ugreen!50] (f21) at ([xshift=0em,yshift=-4em]f11.north) {};
-\node[anchor=south] (f22) at ([xshift=8.5em,yshift=-0.5em]f21.south) {\scriptsize{valid choices}};
+\node[anchor=south] (f22) at ([xshift=8.5em,yshift=-1em]f21.south) {\scriptsize{有效取值点}};
 \node [anchor=center,draw,circle,inner sep=1.5pt,red!30,fill=red!30] (f31) at ([xshift=0em,yshift=-4em]f21.north) {};
-\node[anchor=south] (f32) at ([xshift=9.5em,yshift=-0.5em]f31.south) {\scriptsize{invalid choices}};
+\node[anchor=south] (f32) at ([xshift=8.5em,yshift=-1em]f31.south) {\scriptsize{无效取值点}};
 \draw [-,very thick,red!50, dashed] (1,2) -- (2,4) -- (3,2) -- (2,3) -- (1,2) -- (3,2) -- (2,1) -- (1,2) -- (2,0) -- (3,2);
 \draw [-,very thick,blue!50] (0,1) -- (1,2);
@@ -44,4 +44,45 @@
 \node [anchor=center,draw,circle,inner sep=1.5pt,blue!30,fill=blue!30] (r15) at (4,4) {};
 }
 \end{scope}
+\begin{scope}[scale=0.62,xshift=3in] 
+{\tiny
+\draw[step=1,help lines,color=black] (0,0) grid (4,4); 
+\node[anchor=north] (y2) at (-5.3em,1.5) {0.01};
+\node[anchor=north] (y1) at ([xshift=0em,yshift=-3.3em]y2.south) {0.00};
+\node[anchor=north] (y3) at ([xshift=0em,yshift=4.5em]y2.north) {0.02};
+\node[anchor=north] (y4) at ([xshift=0em,yshift=6.6em]y3.north) {$\vdots$};
+\node[anchor=north] (y5) at ([xshift=0em,yshift=2em]y4.north) {1.00};
+\node[anchor=north] (x1) at (1em,-3em) {$\lambda_1$};
+\node[anchor=north] (x2) at ([xshift=4.5em,yshift=0em]x1.north) {$\lambda_2$};
+\node[anchor=north] (x3) at ([xshift=4em,yshift=-1em]x2.north) {$...$};
+\node[anchor=north] (x4) at ([xshift=5em,yshift=1em]x3.north) {$\lambda_{M-1}$};
+\node[anchor=north] (x5) at ([xshift=5em,yshift=0em]x4.north) {$\lambda_M$};
+\draw [-](n1) (0,4) -- (0,4.4);
+\draw [-](n2) (1,4) -- (1,4.4);
+\draw [-](n3) (2,4) -- (2,4.4);
+\draw [-](n4) (3,4) -- (3,4.4);
+\draw [-](n5) (4,4) -- (4,4.4);
+\node [anchor=center,draw,circle,inner sep=1.5pt,red!30,fill=red!30] (r31) at (2,4) {};
+\node [anchor=center,draw,circle,inner sep=1.5pt,red!30,fill=red!30] (r32) at (2,0) {};
+\node [anchor=center,draw,circle,inner sep=1.5pt,red!30,fill=red!30] (r33) at (2,2) {};
+\node [anchor=center,draw,circle,inner sep=1.5pt,red!30,fill=red!30] (r35) at (2,1) {};
+\node [anchor=center,draw,circle,inner sep=1.5pt,ugreen!50,fill=ugreen!50] (r34) at (2,3) {};
+\draw [-,very thick,red!50, dashed] (1,2) -- (2,4) -- (3,2) -- (2,3) -- (1,2) -- (3,2) -- (2,1) -- (1,2) -- (2,0) -- (3,2);
+\draw [-,very thick,blue!50] (0,1) -- (1,2);
+\draw [-,very thick,blue!50] (3,2) -- (4,4);
+\draw [-,very thick,ugreen!50, dashed] (1,2) -- (2,3) -- (3,2);
+\node [anchor=center,draw,circle,inner sep=1.5pt,blue!30,fill=blue!30] (r11) at (0,1) {};
+\node [anchor=center,draw,circle,inner sep=1.5pt,blue!30,fill=blue!30] (r12) at (1,2) {};
+\node [anchor=center,draw,circle,inner sep=1.5pt,blue!30,fill=blue!30] (r14) at (3,2) {};
+\node [anchor=center,draw,circle,inner sep=1.5pt,blue!30,fill=blue!30] (r15) at (4,4) {};
+}
+\end{scope}
 \end{tikzpicture}
\ No newline at end of file
--- a/Chapter7/Figures/figure-phrase-extraction-consistent-with-word-alignment-1.tex
+++ b/Chapter7/Figures/figure-phrase-extraction-consistent-with-word-alignment-1.tex
@@ -62,7 +62,7 @@
 \begin{scope}[xshift = 1.5in, yshift = 1.3in]
 {\scriptsize
-\node (rules) {\textbf{抽取得到的短语:}};
+\node (rules) {\textbf{抽取得到的短语：}};
 \draw[-] (rules.south west)--([xshift=2.0in]rules.south west);
 {

--- a/Chapter7/Figures/figure-phrase-extraction-consistent-with-word-alignment.tex
+++ b/Chapter7/Figures/figure-phrase-extraction-consistent-with-word-alignment.tex
@@ -50,11 +50,11 @@
 {
-\node [anchor=west] (p1line1) at ([xshift=4em,yshift=1em]a75.east) {\footnotesize{$\bar{s}_i$: 天气\ \ \ \ \ \ }};
+\node [anchor=west] (p1line1) at ([xshift=4em,yshift=1em]a75.east) {\footnotesize{$\bar{s}_i$： 天气\ \ \ \ \ \ }};
-\node [anchor=north west] (p1line2) at ([xshift=0]p1line1.south west) {\footnotesize{$\bar{t}_i$: The\ \ \ weather\ \ \ \ \ }};
+\node [anchor=north west] (p1line2) at ([xshift=0]p1line1.south west) {\footnotesize{$\bar{t}_i$： The\ \ \ weather\ \ \ \ \ }};
-\node [anchor=west] (p2line1) at ([xshift=4em]a72.east) {\footnotesize{$\bar{s}_j$: 真\ \ \ 好 \ \ }};
+\node [anchor=west] (p2line1) at ([xshift=4em]a72.east) {\footnotesize{$\bar{s}_j$： 真\ \ \ 好 \ \ }};
-\node [anchor=north west] (p2line2) at ([xshift=0]p2line1.south west) {\footnotesize{$\bar{t}_j$: very\ \ \ good\ \ \ \ \ \ \ \ }};
+\node [anchor=north west] (p2line2) at ([xshift=0]p2line1.south west) {\footnotesize{$\bar{t}_j$： very\ \ \ good\ \ \ \ \ \ \ \ }};
 \node [anchor=east] (p2line3) at ([xshift=0em,yshift=-4em]p1line2.east) {};
 \begin{pgfonlayer}{background}

--- a/Chapter7/Figures/figure-reorder-base-distance.tex
+++ b/Chapter7/Figures/figure-reorder-base-distance.tex
@@ -5,7 +5,7 @@
 \begin{scope}[minimum height = 20pt]
-\node[anchor=east] (s0) at (-0.5em, 0) {$\textbf{s}$:};
+\node[anchor=east] (s0) at (-0.5em, 0) {$\seq{s}$：};
 \node[anchor=west,fill=green!20] (s1) at (0, 0) {\small{在\ \ 桌子\ \ 上\ \ \;的}};
 \node[anchor=south] (n1) at ([xshift=-2.5em,yshift=-0.5em]s1.north) {\small{1}};
 \node[anchor=south] (n2) at ([xshift=-0.7em,yshift=-0.5em]s1.north) {\small{2}};
@@ -14,7 +14,7 @@
 \node[anchor=west,fill=red!20] (s2) at ([xshift=1em]s1.east) {\small{苹果}};
 \node[anchor=south] (n5) at ([yshift=-0.5em]s2.north) {\small{5}};
-\node[anchor=east] (t0) at (-0.5em, -1.5) {$\textbf{t}$:};
+\node[anchor=east] (t0) at (-0.5em, -1.5) {$\seq{t}$：};
 \node[anchor=west,fill=red!20] (t1) at (0, -1.5) {\small{the apple}};
 \node[anchor=west,fill=green!20] (t2) at ([xshift=1.3em]t1.east) {\small{on the table}};

--- a/Chapter7/Figures/figure-reorder-base-phrase-translation.tex
+++ b/Chapter7/Figures/figure-reorder-base-phrase-translation.tex
@@ -5,11 +5,11 @@
 \begin{scope}[minimum height = 20pt]
-\node[anchor=east] (s0) at (-0.5em, 0) {$\textbf{s}$:};
+\node[anchor=east] (s0) at (-0.5em, 0) {$\seq{s}$：};
 \node[anchor=west,fill=green!20] (s1) at (0, 0) {\footnotesize{在 桌子 上 的}};
 \node[anchor=west,fill=red!20] (s2) at ([xshift=1em]s1.east) {\footnotesize{苹果}};
-\node[anchor=east] (t0) at (-0.5em, -1.5) {$\textbf{t}$:};
+\node[anchor=east] (t0) at (-0.5em, -1.5) {$\seq{t}$：};
 \node[anchor=west,fill=red!20] (t1) at (0, -1.5) {\footnotesize{the apple}};
 \node[anchor=west,fill=green!20] (t2) at ([xshift=1em]t1.east) {\footnotesize{on the table}};

--- a/Chapter7/Figures/figure-search-space-representation-of-feature-weight-1.tex
+++ b/Chapter7/Figures/figure-search-space-representation-of-feature-weight-1.tex
-\begin{tikzpicture}
-\begin{scope}[scale=0.55] 
-{\tiny
-\draw[step=1,help lines,color=black] grid (4,4); 
-\node[anchor=north] (y2) at ([xshift=-3.3em,yshift=0em]n1.north) {0.01};
-\node[anchor=north] (y1) at ([xshift=0em,yshift=-3.3em]y2.south) {0.00};
-\node[anchor=north] (y3) at ([xshift=0em,yshift=4.5em]y2.north) {0.02};
-\node[anchor=north] (y4) at ([xshift=0em,yshift=6.6em]y3.north) {$\vdots$};
-\node[anchor=north] (y5) at ([xshift=0em,yshift=2em]y4.north) {1.00};
-\node[anchor=north] (x1) at ([xshift=2em,yshift=-3em]n1.south) {$\lambda_1$};
-\node[anchor=north] (x2) at ([xshift=4.5em,yshift=0em]x1.north) {$\lambda_2$};
-\node[anchor=north] (x3) at ([xshift=4em,yshift=-1em]x2.north) {$...$};
-\node[anchor=north] (x4) at ([xshift=5em,yshift=1em]x3.north) {$\lambda_{M-1}$};
-\node[anchor=north] (x5) at ([xshift=5em,yshift=0em]x4.north) {$\lambda_M$};
-\draw [-](n1) (0,4) -- (0,4.4);
-\draw [-](n2) (1,4) -- (1,4.4);
-\draw [-](n3) (2,4) -- (2,4.4);
-\draw [-](n4) (3,4) -- (3,4.4);
-\draw [-](n5) (4,4) -- (4,4.4);
-\draw[decorate,decoration={brace}](0,4.7) --(4,4.7) node [xshift=-4em,yshift=1.5em,align=center](label1) {M dimensions};	
-\draw[decorate,decoration={brace}](4.5,4.3) --(4.5,0) node [xshift=2.3em,yshift=5.8em,align=center](label2) {Values};	
-}
-\end{scope}
-\end{tikzpicture}
\ No newline at end of file
--- a/Chapter7/Figures/figure-search-space-representation-of-feature-weight-2.tex
+++ b/Chapter7/Figures/figure-search-space-representation-of-feature-weight-2.tex
-\begin{tikzpicture}
-\begin{scope}[scale=0.55] 
-{\tiny
-\draw[step=1,help lines,color=black] grid (4,4); 
-\node[anchor=north] (y2) at ([xshift=-3.3em,yshift=0em]n1.north) {0.01};
-\node[anchor=north] (y1) at ([xshift=0em,yshift=-3.3em]y2.south) {0.00};
-\node[anchor=north] (y3) at ([xshift=0em,yshift=4.5em]y2.north) {0.02};
-\node[anchor=north] (y4) at ([xshift=0em,yshift=6.6em]y3.north) {$\vdots$};
-\node[anchor=north] (y5) at ([xshift=0em,yshift=2em]y4.north) {1.00};
-\node[anchor=north] (x1) at ([xshift=2em,yshift=-3em]n1.south) {$\lambda_1$};
-\node[anchor=north] (x2) at ([xshift=4.5em,yshift=0em]x1.north) {$\lambda_2$};
-\node[anchor=north] (x3) at ([xshift=4em,yshift=-1em]x2.north) {$...$};
-\node[anchor=north] (x4) at ([xshift=5em,yshift=1em]x3.north) {$\lambda_{M-1}$};
-\node[anchor=north] (x5) at ([xshift=5em,yshift=0em]x4.north) {$\lambda_M$};
-\draw [-](n1) (0,4) -- (0,4.4);
-\draw [-](n2) (1,4) -- (1,4.4);
-\draw [-](n3) (2,4) -- (2,4.4);
-\draw [-](n4) (3,4) -- (3,4.4);
-\draw [-](n5) (4,4) -- (4,4.4);
-\node [anchor=center,draw,circle,inner sep=1.5pt,blue!30,fill=blue!30] (r11) at (0,1) {};
-\node [anchor=center,draw,circle,inner sep=1.5pt,blue!30,fill=blue!30] (r12) at (1,2) {};
-\node [anchor=center,draw,circle,inner sep=1.5pt,blue!30,fill=blue!30] (r13) at (2,1) {};
-\node [anchor=center,draw,circle,inner sep=1.5pt,blue!30,fill=blue!30] (r14) at (3,2) {};
-\node [anchor=center,draw,circle,inner sep=1.5pt,blue!30,fill=blue!30] (r15) at (4,4) {};
-\draw [-,very thick,blue!50] (0,1) -- (1,2) -- (2,1) -- (3,2) -- (4,4);
-\node[anchor=north] (p1) at ([xshift=5em,yshift=13em]n5.north) {\scriptsize{$\leftarrow$ \textbf{path}:}};
-\node[anchor=north] (e1) at ([xshift=0,yshift=-0.4em]p1.south) {$w_1 = 0.01$};
-\node[anchor=north] (e2) at ([xshift=0,yshift=-0.8em]e1.south) {$w_2 = 0.02$};
-\node[anchor=north] (e3) at ([xshift=0,yshift=0.4em]e2.south) {$\vdots$};
-\node[anchor=north] (e4) at ([xshift=0,yshift=-0.2em]e3.south) {$w_M = 1.00$};
-}
-\end{scope}
-\end{tikzpicture}
\ No newline at end of file
--- a/Chapter7/Figures/figure-search-space-representation-of-feature-weight-3.tex
+++ b/Chapter7/Figures/figure-search-space-representation-of-feature-weight-3.tex
@@ -3,13 +3,85 @@
 {\tiny
 \draw[step=1,help lines,color=black] grid (4,4); 
-\node[anchor=north] (y2) at ([xshift=-3.3em,yshift=0em]n1.north) {0.01};
+\draw [-](n1) (0,4) -- (0,4.4);
+\draw [-](n2) (1,4) -- (1,4.4);
+\draw [-](n3) (2,4) -- (2,4.4);
+\draw [-](n4) (3,4) -- (3,4.4);
+\draw [-](n5) (4,4) -- (4,4.4);
+\node[anchor=north] (y2) at (-5.3em,1.5) {0.01};
 \node[anchor=north] (y1) at ([xshift=0em,yshift=-3.3em]y2.south) {0.00};
 \node[anchor=north] (y3) at ([xshift=0em,yshift=4.5em]y2.north) {0.02};
 \node[anchor=north] (y4) at ([xshift=0em,yshift=6.6em]y3.north) {$\vdots$};
 \node[anchor=north] (y5) at ([xshift=0em,yshift=2em]y4.north) {1.00};
-\node[anchor=north] (x1) at ([xshift=2em,yshift=-3em]n1.south) {$\lambda_1$};
+\node[anchor=north] (x1) at (1em,-3em) {$\lambda_1$};
+\node[anchor=north] (x2) at ([xshift=4.5em,yshift=0em]x1.north) {$\lambda_2$};
+\node[anchor=north] (x3) at ([xshift=4em,yshift=-1em]x2.north) {$...$};
+\node[anchor=north] (x4) at ([xshift=5em,yshift=1em]x3.north) {$\lambda_{M-1}$};
+\node[anchor=north] (x5) at ([xshift=5em,yshift=0em]x4.north) {$\lambda_M$};
+\draw[decorate,decoration={brace}](0,4.7) --(4,4.7) node [xshift=-4em,yshift=1.5em,align=center](label1) {M个特征函数};	
+\draw[decorate,decoration={brace}](4.5,4.3) --(4.5,0) node [xshift=2.3em,yshift=6.8em,align=center](label2) {V种};	
+\node[anchor=north] (label3) at ([xshift=0em,yshift=-2.5em]label2.north) {取值};	
+}
+\node[anchor=north] (l1) at ([xshift=0em,yshift=-2.5em]x3.south) {\footnotesize{(a)搜索空间}};
+\end{scope}
+\begin{scope}[scale=0.55,xshift=3.2in] 
+{\tiny
+\draw[step=1,help lines,color=black] grid (4,4); 
+\node[anchor=north] (y2) at (-5.3em,1.5) {0.01};
+\node[anchor=north] (y1) at ([xshift=0em,yshift=-3.3em]y2.south) {0.00};
+\node[anchor=north] (y3) at ([xshift=0em,yshift=4.5em]y2.north) {0.02};
+\node[anchor=north] (y4) at ([xshift=0em,yshift=6.6em]y3.north) {$\vdots$};
+\node[anchor=north] (y5) at ([xshift=0em,yshift=2em]y4.north) {1.00};
+\node[anchor=north] (x1) at (1em,-3em) {$\lambda_1$};
+\node[anchor=north] (x2) at ([xshift=4.5em,yshift=0em]x1.north) {$\lambda_2$};
+\node[anchor=north] (x3) at ([xshift=4em,yshift=-1em]x2.north) {$...$};
+\node[anchor=north] (x4) at ([xshift=5em,yshift=1em]x3.north) {$\lambda_{M-1}$};
+\node[anchor=north] (x5) at ([xshift=5em,yshift=0em]x4.north) {$\lambda_M$};
+\draw [-](n1) (0,4) -- (0,4.4);
+\draw [-](n2) (1,4) -- (1,4.4);
+\draw [-](n3) (2,4) -- (2,4.4);
+\draw [-](n4) (3,4) -- (3,4.4);
+\draw [-](n5) (4,4) -- (4,4.4);
+\node [anchor=center,draw,circle,inner sep=1.5pt,blue!30,fill=blue!30] (r11) at (0,1) {};
+\node [anchor=center,draw,circle,inner sep=1.5pt,blue!30,fill=blue!30] (r12) at (1,2) {};
+\node [anchor=center,draw,circle,inner sep=1.5pt,blue!30,fill=blue!30] (r13) at (2,1) {};
+\node [anchor=center,draw,circle,inner sep=1.5pt,blue!30,fill=blue!30] (r14) at (3,2) {};
+\node [anchor=center,draw,circle,inner sep=1.5pt,blue!30,fill=blue!30] (r15) at (4,4) {};
+\draw [-,very thick,blue!50] (0,1) -- (1,2) -- (2,1) -- (3,2) -- (4,4);
+\node[anchor=north] (p1) at (5.7,4.3) {\scriptsize{$\leftarrow$ \textbf{path}:}};
+\node[anchor=north] (e1) at ([xshift=0,yshift=-0.4em]p1.south) {$\lambda_1 = 0.01$};
+\node[anchor=north] (e2) at ([xshift=0,yshift=-0.8em]e1.south) {$\lambda_2 = 0.02$};
+\node[anchor=north] (e3) at ([xshift=0,yshift=0.4em]e2.south) {$\vdots$};
+\node[anchor=north] (e4) at ([xshift=0,yshift=-0.2em]e3.south) {$\lambda_M = 1.00$};
+}
+\node[anchor=north] (l1) at ([xshift=0em,yshift=-2.5em]x3.south) {\footnotesize{(b)一条搜索路径}};
+\end{scope}
+\begin{scope}[scale=0.55,xshift=6.8in] 
+{\tiny
+\draw[step=1,help lines,color=black] grid (4,4); 
+\node[anchor=north] (y2) at (-5.3em,1.5) {0.01};
+\node[anchor=north] (y1) at ([xshift=0em,yshift=-3.3em]y2.south) {0.00};
+\node[anchor=north] (y3) at ([xshift=0em,yshift=4.5em]y2.north) {0.02};
+\node[anchor=north] (y4) at ([xshift=0em,yshift=6.6em]y3.north) {$\vdots$};
+\node[anchor=north] (y5) at ([xshift=0em,yshift=2em]y4.north) {1.00};
+\node[anchor=north] (x1) at (1em,-3em) {$\lambda_1$};
 \node[anchor=north] (x2) at ([xshift=4.5em,yshift=0em]x1.north) {$\lambda_2$};
 \node[anchor=north] (x3) at ([xshift=4em,yshift=-1em]x2.north) {$...$};
 \node[anchor=north] (x4) at ([xshift=5em,yshift=1em]x3.north) {$\lambda_{M-1}$};
@@ -43,8 +115,10 @@
 \draw [-,very thick,ugreen!50] (0,2) -- (1,3) -- (2,4) -- (3,0) -- (4,2);
 \draw [-,very thick,red!50] (0,4) -- (1,3) -- (2,2) -- (3,3) -- (4,1);
-\draw[decorate,decoration={brace}](4.5,4.3) --(4.5,0) node [xshift=2.3em,yshift=7.5em,align=center](label1) {$M^V$};	
+\draw[decorate,decoration={brace}](4.5,4.3) --(4.5,0) node [xshift=2.3em,yshift=6.5em,align=center](label1) {$V^M$};	
-\node[anchor=north] (label2) at ([xshift=0em,yshift=-2.5em]label1.north) {pathes};
+\node[anchor=north] (label2) at ([xshift=0em,yshift=-2.5em]label1.north) {种组合};
 }
+\node[anchor=north] (l1) at ([xshift=0em,yshift=-2.5em]x3.south) {\footnotesize{(c)多条搜索路径}};
 \end{scope}
 \end{tikzpicture}
\ No newline at end of file
--- a/Chapter7/Figures/figure-three-types-of-reorder-method-in-msd.tex
+++ b/Chapter7/Figures/figure-three-types-of-reorder-method-in-msd.tex
@@ -52,10 +52,10 @@
 {
-\node [anchor=west] (p1line1) at ([xshift=3.5em,yshift=0.5em]a75.east) {\footnotesize{M(monotone):单调调序}};
+\node [anchor=west] (p1line1) at ([xshift=3.5em,yshift=0.5em]a75.east) {\footnotesize{M(monotone)：单调调序}};
-\node [anchor=north west] (p1line2) at ([xshift=0,yshift=-1em]p1line1.south west) {\footnotesize{S(swap): 与前面一个短语}};
+\node [anchor=north west] (p1line2) at ([xshift=0,yshift=-1em]p1line1.south west) {\footnotesize{S(swap)： 与前面一个短语}};
-\node [anchor=north west] (p1line3) at ([xshift=3.5em]p1line2.south west) {\footnotesize{位置进行交换}};
+\node [anchor=north west] (p1line3) at ([xshift=3.8em]p1line2.south west) {\footnotesize{位置进行交换}};
-\node [anchor=north west] (p1line4) at ([xshift=-3.5em,yshift=-1em]p1line3.south west) {\footnotesize{D(discontinuous):非连续调序}};
+\node [anchor=north west] (p1line4) at ([xshift=-3.5em,yshift=-1em]p1line3.south west) {\footnotesize{D(discontinuous)：非连续调序}};
 \node [anchor=east] (p1line5) at ([xshift=0em,yshift=3em]p1line4.east) {};
 \node [anchor=east] (p1line6) at ([xshift=0em,yshift=7em]p1line4.east) {};

--- a/Chapter7/Figures/figure-translation-hypothesis-extension.tex
+++ b/Chapter7/Figures/figure-translation-hypothesis-extension.tex
@@ -6,7 +6,7 @@
 {
 \node [anchor=north,inner sep=2pt,fill=red!20,minimum height=2em,minimum width=3.5em] (h0) at (0,0) {\small{null}};
 \node [anchor=north west,inner sep=1.5pt,fill=black] (hl0) at (h0.north west) {\scriptsize{{\color{white} \textbf{0}}}};
-\node [anchor=north,rotate=90,inner sep=1pt,minimum width=2em,fill=black] (pt0) at (h0.east) {\footnotesize{{\color{white} \textbf{P=1}}}};
+\node [anchor=north,rotate=90,inner sep=1pt,minimum width=2em,fill=black] (pt0) at (h0.east) {\footnotesize{{\color{white} \textbf{$\funp{P}$=1}}}};
 }
 {
@@ -16,8 +16,8 @@
 \node [anchor=north west,inner sep=1.5pt,fill=black] (hl1) at (h1.north west) {\scriptsize{{\color{white} \textbf{2}}}};
 \node [anchor=north west,inner sep=1.5pt,fill=black] (hl2) at (h2.north west) {\scriptsize{{\color{white} \textbf{1}}}};
 \node [anchor=north west,inner sep=1.5pt,fill=black] (hl3) at (h3.north west) {\scriptsize{{\color{white} \textbf{3}}}};
-\node [anchor=north,rotate=90,inner sep=1pt,minimum width=2em,fill=black] (pt1) at (h1.east) {\footnotesize{{\color{white} \textbf{P=.2}}}};
+\node [anchor=north,rotate=90,inner sep=1pt,minimum width=2em,fill=black] (pt1) at (h1.east) {\footnotesize{{\color{white} \textbf{$\funp{P}$=.2}}}};
-\node [anchor=north,rotate=90,inner sep=1pt,minimum width=2em,fill=black] (pt2) at (h2.east) {\footnotesize{{\color{white} \textbf{P=.3}}}};
+\node [anchor=north,rotate=90,inner sep=1pt,minimum width=2em,fill=black] (pt2) at (h2.east) {\footnotesize{{\color{white} \textbf{$\funp{P}$=.3}}}};
 \node [anchor=north,rotate=90,inner sep=1pt,minimum width=2em,fill=black] (pt3) at (h3.east) {\footnotesize{{\color{white} \textbf{P=.5}}}};
 \draw [->,very thick,ublue] ([xshift=0.1em]pt0.south) -- ([xshift=-0.1em]h1.west);
@@ -38,11 +38,11 @@
 \node [anchor=north west,inner sep=1.5pt,fill=black] (hl7) at (h7.north west) {\scriptsize{{\color{white} \textbf{1-2}}}};
 \node [anchor=north west,inner sep=1.5pt,fill=black] (hl8) at (h8.north west) {\scriptsize{{\color{white} \textbf{5}}}};
-\node [anchor=north,rotate=90,inner sep=1pt,minimum width=2em,fill=black] (pt4) at (h4.east) {\footnotesize{{\color{white} \textbf{P=.1}}}};
+\node [anchor=north,rotate=90,inner sep=1pt,minimum width=2em,fill=black] (pt4) at (h4.east) {\footnotesize{{\color{white} \textbf{$\funp{P}$=.1}}}};
-\node [anchor=north,rotate=90,inner sep=1pt,minimum width=2em,fill=black] (pt5) at (h5.east) {\footnotesize{{\color{white} \textbf{P=.4}}}};
+\node [anchor=north,rotate=90,inner sep=1pt,minimum width=2em,fill=black] (pt5) at (h5.east) {\footnotesize{{\color{white} \textbf{$\funp{P}$=.4}}}};
-\node [anchor=north,rotate=90,inner sep=1pt,minimum width=2em,fill=black] (pt6) at (h6.east) {\footnotesize{{\color{white} \textbf{P=.3}}}};
+\node [anchor=north,rotate=90,inner sep=1pt,minimum width=2em,fill=black] (pt6) at (h6.east) {\footnotesize{{\color{white} \textbf{$\funp{P}$=.3}}}};
-\node [anchor=north,rotate=90,inner sep=1pt,minimum width=2em,fill=black] (pt7) at (h7.east) {\footnotesize{{\color{white} \textbf{P=.4}}}};
+\node [anchor=north,rotate=90,inner sep=1pt,minimum width=2em,fill=black] (pt7) at (h7.east) {\footnotesize{{\color{white} \textbf{$\funp{P}$=.4}}}};
-\node [anchor=north,rotate=90,inner sep=1pt,minimum width=2em,fill=black] (pt8) at (h8.east) {\footnotesize{{\color{white} \textbf{P=.2}}}};
+\node [anchor=north,rotate=90,inner sep=1pt,minimum width=2em,fill=black] (pt8) at (h8.east) {\footnotesize{{\color{white} \textbf{$\funp{P}$=.2}}}};
 \draw [->,very thick,ublue] ([xshift=0.1em]pt1.south) -- ([xshift=1em,yshift=0.7em]pt1.south);
@@ -65,7 +65,7 @@
 {
 \draw [->,ultra thick,red,line width=2pt,opacity=0.7] ([xshift=-0.5em]h0.west) -- ([xshift=0.7em]h0.east) -- ([xshift=-0.2em]h3.west) -- ([xshift=0.8em]h3.east) -- ([xshift=-0.2em]h5.west) -- ([xshift=0.8em]h5.east) -- ([xshift=-0.2em]h7.west) -- ([xshift=1.5em]h7.east);
-\node [anchor=north west] (wtranslabel) at ([yshift=-3em]h0.south west) {\small{翻译路径:}};
+\node [anchor=north west] (wtranslabel) at ([yshift=-3em]h0.south west) {\small{翻译路径：}};
 \draw [->,ultra thick,red,line width=1.5pt,opacity=0.7] (wtranslabel.east) -- ([xshift=1.5em]wtranslabel.east);
 }
 \end{scope}

--- a/Chapter7/Figures/figure-translation-option.tex
+++ b/Chapter7/Figures/figure-translation-option.tex
@@ -4,7 +4,7 @@
 \begin{tikzpicture}
 \begin{scope}[minimum height = 16pt]
-\node[anchor=east] (s0) at (-0.8em, 0) {$\textbf{s}$:};
+\node[anchor=east] (s0) at (-0.8em, 0) {$\seq{s}$：};
 \node[anchor=west] (s1) at (0, 0) {桌子};
 \node[anchor=west] (s2) at ([xshift=2em]s1.east) {上};
 \node[anchor=west] (s3) at ([xshift=2.3em]s2.east) {有};

--- a/Chapter7/Figures/figure-unlimited-phrase-extraction.tex
+++ b/Chapter7/Figures/figure-unlimited-phrase-extraction.tex
@@ -41,11 +41,11 @@
 \node[tgtnode] (tgt7) at ([yshift=-0.5*1.0cm]tgt6.north east) {\scriptsize{?}};
 \node[tgtnode] (tgt8) at ([yshift=-0.5*1.0cm]tgt7.north east) {\scriptsize{EOS}};
-\node [anchor=west] (p1line1) at ([xshift=4em,yshift=1em]a57.east) {\footnotesize{$\bar{s}_i$: 什么\ \ \ 都\ \ \ 没}};
+\node [anchor=west] (p1line1) at ([xshift=4em,yshift=1em]a57.east) {\footnotesize{$\bar{s}_i$： 什么\ \ \ 都\ \ \ 没}};
-\node [anchor=north west] (p1line2) at ([xshift=0]p1line1.south west) {\footnotesize{$\bar{t}_i$: learned\ \ \ nothing\ \ \ ? \ \ \ \ \ \ \ \ \ \ \ \ }};
+\node [anchor=north west] (p1line2) at ([xshift=0]p1line1.south west) {\footnotesize{$\bar{t}_i$： learned\ \ \ nothing\ \ \ ? \ \ \ \ \ \ \ \ \ \ \ \ }};
-\node [anchor=west] (p2line1) at ([xshift=4em]a53.east) {\footnotesize{$\bar{s}_j$: 到\ \ \ ?}};
+\node [anchor=west] (p2line1) at ([xshift=4em]a53.east) {\footnotesize{$\bar{s}_j$： 到\ \ \ ?}};
-\node [anchor=north west] (p2line2) at ([xshift=0]p2line1.south west) {\footnotesize{$\bar{t}_j$: Have\ \ \ you\ \ \ learned\ \ \ nothing}};
+\node [anchor=north west] (p2line2) at ([xshift=0]p2line1.south west) {\footnotesize{$\bar{t}_j$： Have\ \ \ you\ \ \ learned\ \ \ nothing}};
 \node [anchor=east] (p1line3) at ([xshift=0em,yshift=2.9cm]p2line2.east) {};
 \begin{pgfonlayer}{background}

--- a/Chapter7/Figures/figure-word-and-phrase-translation-regard-as-path.tex
+++ b/Chapter7/Figures/figure-word-and-phrase-translation-regard-as-path.tex
@@ -10,7 +10,7 @@
 \node [anchor=west] (s4) at ([xshift=2em]s3.east) {\textbf{表示}};
 \node [anchor=west] (s5) at ([xshift=2em]s4.east) {\textbf{满意}};
-\node [anchor=south west] (sentlabel) at ([yshift=-0.5em]s1.north west) {\scriptsize{\textbf{待翻译句子(已经分词):}}};
+\node [anchor=south west] (sentlabel) at ([yshift=-0.5em]s1.north west) {\scriptsize{\textbf{待翻译句子（已经分词）：}}};
 \draw [->,very thick,ublue] (s1.south) -- ([yshift=-0.7em]s1.south);
 \draw [->,very thick,ublue] (s2.south) -- ([yshift=-0.7em]s2.south);
@@ -80,38 +80,38 @@
 {\tiny
-\node [anchor=north,rotate=90,inner sep=1pt,minimum width=2.55em,fill=black] (pt11) at (t11.east) {{\color{white} \textbf{P=.4}}};
+\node [anchor=north,rotate=90,inner sep=1pt,minimum width=2.55em,fill=black] (pt11) at (t11.east) {{\color{white} \textbf{$\funp{P}$=.4}}};
-\node [anchor=north,rotate=90,inner sep=1pt,minimum width=2.55em,fill=black] (pt12) at (t12.east) {{\color{white} \textbf{P=.2}}};
+\node [anchor=north,rotate=90,inner sep=1pt,minimum width=2.55em,fill=black] (pt12) at (t12.east) {{\color{white} \textbf{$\funp{P}$=.2}}};
-\node [anchor=north,rotate=90,inner sep=1pt,minimum width=2.55em,fill=black] (pt13) at (t13.east) {{\color{white} \textbf{P=.4}}};
+\node [anchor=north,rotate=90,inner sep=1pt,minimum width=2.55em,fill=black] (pt13) at (t13.east) {{\color{white} \textbf{$\funp{P}$=.4}}};
 {
-\node [anchor=north,rotate=90,inner sep=1pt,minimum width=2.55em,fill=black] (pt14) at (t14.east) {{\color{white} \textbf{P=.1}}};
+\node [anchor=north,rotate=90,inner sep=1pt,minimum width=2.55em,fill=black] (pt14) at (t14.east) {{\color{white} \textbf{$\funp{P}$=.1}}};
-\node [anchor=north,rotate=90,inner sep=1pt,minimum width=2.55em,fill=black] (pt15) at (t15.east) {{\color{white} \textbf{P=.2}}};
+\node [anchor=north,rotate=90,inner sep=1pt,minimum width=2.55em,fill=black] (pt15) at (t15.east) {{\color{white} \textbf{$\funp{P}$=.2}}};
 }
-\node [anchor=north,rotate=90,inner sep=1pt,minimum width=2.55em,fill=black] (pt21) at (t21.east) {{\color{white} \textbf{P=.4}}};
+\node [anchor=north,rotate=90,inner sep=1pt,minimum width=2.55em,fill=black] (pt21) at (t21.east) {{\color{white} \textbf{$\funp{P}$=.4}}};
-\node [anchor=north,rotate=90,inner sep=1pt,minimum width=2.55em,fill=black] (pt22) at (t22.east) {{\color{white} \textbf{P=.3}}};
+\node [anchor=north,rotate=90,inner sep=1pt,minimum width=2.55em,fill=black] (pt22) at (t22.east) {{\color{white} \textbf{$\funp{P}$=.3}}};
-\node [anchor=north,rotate=90,inner sep=1pt,minimum width=2.55em,fill=black] (pt23) at (t23.east) {{\color{white} \textbf{P=.3}}};
+\node [anchor=north,rotate=90,inner sep=1pt,minimum width=2.55em,fill=black] (pt23) at (t23.east) {{\color{white} \textbf{$\funp{P}$=.3}}};
 {
-\node [anchor=north,rotate=90,inner sep=1pt,minimum width=2.55em,fill=black] (pt24) at (t24.east) {{\color{white} \textbf{P=.2}}};
+\node [anchor=north,rotate=90,inner sep=1pt,minimum width=2.55em,fill=black] (pt24) at (t24.east) {{\color{white} \textbf{$\funp{P}$=.2}}};
-\node [anchor=north,rotate=90,inner sep=1pt,minimum width=2.55em,fill=black] (pt25) at (t25.east) {{\color{white} \textbf{P=.1}}};
+\node [anchor=north,rotate=90,inner sep=1pt,minimum width=2.55em,fill=black] (pt25) at (t25.east) {{\color{white} \textbf{$\funp{P}$=.1}}};
 }
-\node [anchor=north,rotate=90,inner sep=1pt,minimum width=2.55em,fill=black] (pt31) at (t31.east) {{\color{white} \textbf{P=1}}};
+\node [anchor=north,rotate=90,inner sep=1pt,minimum width=2.55em,fill=black] (pt31) at (t31.east) {{\color{white} \textbf{$\funp{P}$=1}}};
 {
-\node [anchor=north,rotate=90,inner sep=1pt,minimum width=2.55em,fill=black] (pt33) at (t32.east) {{\color{white} \textbf{P=.4}}};
+\node [anchor=north,rotate=90,inner sep=1pt,minimum width=2.55em,fill=black] (pt32) at (t32.east) {{\color{white} \textbf{$\funp{P}$=.4}}};
-\node [anchor=north,rotate=90,inner sep=1pt,minimum width=2.55em,fill=black] (pt33) at (t33.east) {{\color{white} \textbf{P=.3}}};
+\node [anchor=north,rotate=90,inner sep=1pt,minimum width=2.55em,fill=black] (pt33) at (t33.east) {{\color{white} \textbf{$\funp{P}$=.3}}};
 }
-\node [anchor=north,rotate=90,inner sep=1pt,minimum width=2.55em,fill=black] (pt41) at (t41.east) {{\color{white} \textbf{P=.5}}};
+\node [anchor=north,rotate=90,inner sep=1pt,minimum width=2.55em,fill=black] (pt41) at (t41.east) {{\color{white} \textbf{$\funp{P}$=.5}}};
-\node [anchor=north,rotate=90,inner sep=1pt,minimum width=2.55em,fill=black] (pt42) at (t42.east) {{\color{white} \textbf{P=.5}}};
+\node [anchor=north,rotate=90,inner sep=1pt,minimum width=2.55em,fill=black] (pt42) at (t42.east) {{\color{white} \textbf{$\funp{P}$=.5}}};
 {
-\node [anchor=north,rotate=90,inner sep=1pt,minimum width=2.55em,fill=black] (pt43) at (t43.east) {{\color{white} \textbf{P=.3}}};
+\node [anchor=north,rotate=90,inner sep=1pt,minimum width=2.55em,fill=black] (pt43) at (t43.east) {{\color{white} \textbf{$\funp{P}$=.3}}};
-\node [anchor=north,rotate=90,inner sep=1pt,minimum width=2.55em,fill=black] (pt44) at (t44.east) {{\color{white} \textbf{P=.2}}};
+\node [anchor=north,rotate=90,inner sep=1pt,minimum width=2.55em,fill=black] (pt44) at (t44.east) {{\color{white} \textbf{$\funp{P}$=.2}}};
 }
-\node [anchor=north,rotate=90,inner sep=1pt,minimum width=2.55em,fill=black] (pt51) at (t51.east) {{\color{white} \textbf{P=.5}}};
+\node [anchor=north,rotate=90,inner sep=1pt,minimum width=2.55em,fill=black] (pt51) at (t51.east) {{\color{white} \textbf{$\funp{P}$=.5}}};
-\node [anchor=north,rotate=90,inner sep=1pt,minimum width=2.55em,fill=black] (pt52) at (t52.east) {{\color{white} \textbf{P=.4}}};
+\node [anchor=north,rotate=90,inner sep=1pt,minimum width=2.55em,fill=black] (pt52) at (t52.east) {{\color{white} \textbf{$\funp{P}$=.4}}};
-\node [anchor=north,rotate=90,inner sep=1pt,minimum width=2.55em,fill=black] (pt53) at (t53.east) {{\color{white} \textbf{P=.1}}};
+\node [anchor=north,rotate=90,inner sep=1pt,minimum width=2.55em,fill=black] (pt53) at (t53.east) {{\color{white} \textbf{$\funp{P}$=.1}}};
 }
@@ -143,13 +143,13 @@
 }
 {
-\node [anchor=north west] (wtranslabel) at ([yshift=-4em]t15.south west) {\scriptsize{翻译路径（仅含有单词）:}};
+\node [anchor=north west] (wtranslabel) at ([yshift=-4em]t15.south west) {\scriptsize{翻译路径（仅含有单词）}};
-\draw [->,ultra thick,red,line width=1.5pt,opacity=0.7] (wtranslabel.east) -- ([xshift=1em]wtranslabel.east);
+\draw [->,ultra thick,red,line width=1.5pt,opacity=0.7] ([xshift=0.2em]wtranslabel.east) -- ([xshift=1.2em]wtranslabel.east);
 }
 {
-\node [anchor=north west] (ptranslabel) at ([yshift=-5.5em]t15.south west) {\scriptsize{翻译路径（含有短语）:}};
+\node [anchor=north west] (ptranslabel) at ([yshift=-5.5em]t15.south west) {\scriptsize{翻译路径（含有短语）}};
-\draw [->,ultra thick,ublue,line width=1.5pt,opacity=0.7] ([xshift=0.65em]ptranslabel.east) -- ([xshift=1.65em]ptranslabel.east);
+\draw [->,ultra thick,ublue,line width=1.5pt,opacity=0.7] ([xshift=0.95em]ptranslabel.east) -- ([xshift=1.95em]ptranslabel.east);
 }
 \end{scope}

--- a/Chapter7/Figures/figure-word-translation-regard-as-path.tex
+++ b/Chapter7/Figures/figure-word-translation-regard-as-path.tex
@@ -10,7 +10,7 @@
 \node [anchor=west] (s4) at ([xshift=2em]s3.east) {\textbf{表示}};
 \node [anchor=west] (s5) at ([xshift=2em]s4.east) {\textbf{满意}};
-\node [anchor=south west] (sentlabel) at ([yshift=-0.5em]s1.north west) {\scriptsize{\textbf{待翻译句子(已经分词):}}};
+\node [anchor=south west] (sentlabel) at ([yshift=-0.5em]s1.north west) {\scriptsize{\textbf{待翻译句子（已经分词）：}}};
 \draw [->,very thick,ublue] (s1.south) -- ([yshift=-0.7em]s1.south);
 \draw [->,very thick,ublue] (s2.south) -- ([yshift=-0.7em]s2.south);
@@ -52,22 +52,22 @@
 {\tiny
-\node [anchor=north,rotate=90,inner sep=1pt,minimum width=2.55em,fill=black] (pt11) at (t11.east) {{\color{white} \textbf{P=.4}}};
+\node [anchor=north,rotate=90,inner sep=1pt,minimum width=2.55em,fill=black] (pt11) at (t11.east) {{\color{white} \textbf{$\funp{P}$=.4}}};
-\node [anchor=north,rotate=90,inner sep=1pt,minimum width=2.55em,fill=black] (pt12) at (t12.east) {{\color{white} \textbf{P=.2}}};
+\node [anchor=north,rotate=90,inner sep=1pt,minimum width=2.55em,fill=black] (pt12) at (t12.east) {{\color{white} \textbf{$\funp{P}$=.2}}};
-\node [anchor=north,rotate=90,inner sep=1pt,minimum width=2.55em,fill=black] (pt13) at (t13.east) {{\color{white} \textbf{P=.4}}};
+\node [anchor=north,rotate=90,inner sep=1pt,minimum width=2.55em,fill=black] (pt13) at (t13.east) {{\color{white} \textbf{$\funp{P}$=.4}}};
-\node [anchor=north,rotate=90,inner sep=1pt,minimum width=2.55em,fill=black] (pt21) at (t21.east) {{\color{white} \textbf{P=.4}}};
+\node [anchor=north,rotate=90,inner sep=1pt,minimum width=2.55em,fill=black] (pt21) at (t21.east) {{\color{white} \textbf{$\funp{P}$=.4}}};
-\node [anchor=north,rotate=90,inner sep=1pt,minimum width=2.55em,fill=black] (pt22) at (t22.east) {{\color{white} \textbf{P=.3}}};
+\node [anchor=north,rotate=90,inner sep=1pt,minimum width=2.55em,fill=black] (pt22) at (t22.east) {{\color{white} \textbf{$\funp{P}$=.3}}};
-\node [anchor=north,rotate=90,inner sep=1pt,minimum width=2.55em,fill=black] (pt23) at (t23.east) {{\color{white} \textbf{P=.3}}};
+\node [anchor=north,rotate=90,inner sep=1pt,minimum width=2.55em,fill=black] (pt23) at (t23.east) {{\color{white} \textbf{$\funp{P}$=.3}}};
-\node [anchor=north,rotate=90,inner sep=1pt,minimum width=2.55em,fill=black] (pt31) at (t31.east) {{\color{white} \textbf{P=1}}};
+\node [anchor=north,rotate=90,inner sep=1pt,minimum width=2.55em,fill=black] (pt31) at (t31.east) {{\color{white} \textbf{$\funp{P}$=1}}};
-\node [anchor=north,rotate=90,inner sep=1pt,minimum width=2.55em,fill=black] (pt41) at (t41.east) {{\color{white} \textbf{P=.5}}};
+\node [anchor=north,rotate=90,inner sep=1pt,minimum width=2.55em,fill=black] (pt41) at (t41.east) {{\color{white} \textbf{$\funp{P}$=.5}}};
-\node [anchor=north,rotate=90,inner sep=1pt,minimum width=2.55em,fill=black] (pt42) at (t42.east) {{\color{white} \textbf{P=.5}}};
+\node [anchor=north,rotate=90,inner sep=1pt,minimum width=2.55em,fill=black] (pt42) at (t42.east) {{\color{white} \textbf{$\funp{P}$=.5}}};
-\node [anchor=north,rotate=90,inner sep=1pt,minimum width=2.55em,fill=black] (pt51) at (t51.east) {{\color{white} \textbf{P=.5}}};
+\node [anchor=north,rotate=90,inner sep=1pt,minimum width=2.55em,fill=black] (pt51) at (t51.east) {{\color{white} \textbf{$\funp{P}$=.5}}};
-\node [anchor=north,rotate=90,inner sep=1pt,minimum width=2.55em,fill=black] (pt52) at (t52.east) {{\color{white} \textbf{P=.4}}};
+\node [anchor=north,rotate=90,inner sep=1pt,minimum width=2.55em,fill=black] (pt52) at (t52.east) {{\color{white} \textbf{$\funp{P}$=.4}}};
-\node [anchor=north,rotate=90,inner sep=1pt,minimum width=2.55em,fill=black] (pt53) at (t53.east) {{\color{white} \textbf{P=.1}}};
+\node [anchor=north,rotate=90,inner sep=1pt,minimum width=2.55em,fill=black] (pt53) at (t53.east) {{\color{white} \textbf{$\funp{P}$=.1}}};
 }

--- a/Chapter7/chapter7.tex
+++ b/Chapter7/chapter7.tex
--- a/Chapter7/Figures/figure-chinese-syntax-tree.tex
+++ b/Chapter7/Figures/figure-chinese-syntax-tree.tex
--- a/Chapter8/Figures/figure-cky-algorithm.tex
+++ b/Chapter8/Figures/figure-cky-algorithm.tex
@@ -12,11 +12,11 @@
 \node[srcnode,anchor=north west] (c21) at ([xshift=1.5em,yshift=0.4em]c1.south west) {\normalsize{\textbf{for} $j=0$ to $ J - 1$}};
 \node[srcnode,anchor=north west] (c22) at ([xshift=1.5em,yshift=0.4em]c21.south west) {\normalsize{$span[j,j+1 ]$.Add($A \to a \in G$)}};
 \node[srcnode,anchor=north west] (c3) at ([xshift=-1.5em,yshift=0.4em]c22.south west) {\normalsize{\textbf{for} $l$ = 1 to $J$}};
-\node[srcnode,anchor=west] (c31) at ([xshift=6em]c3.east) {\normalsize{// length of span}};
+\node[srcnode,anchor=west] (c31) at ([xshift=6em]c3.east) {\normalsize{// 跨度长度}};
 \node[srcnode,anchor=north west] (c4) at ([xshift=1.5em,yshift=0.4em]c3.south west) {\normalsize{\textbf{for} $j$ = 0 to $J-l$}};
-\node[srcnode,anchor=north west] (c41) at ([yshift=0.4em]c31.south west) {\normalsize{// beginning of span}};
+\node[srcnode,anchor=north west] (c41) at ([yshift=0.4em]c31.south west) {\normalsize{// 跨度起始位置}};
 \node[srcnode,anchor=north west] (c5) at ([xshift=1.5em,yshift=0.4em]c4.south west) {\normalsize{\textbf{for} $k$ = $j$ to $j+l$}};
-\node[srcnode,anchor=north west] (c51) at ([yshift=0.4em]c41.south west) {\normalsize{// partition of span}};
+\node[srcnode,anchor=north west] (c51) at ([yshift=0.4em]c41.south west) {\normalsize{// 跨度切分位置}};
 \node[srcnode,anchor=north west] (c6) at ([xshift=1.5em,yshift=0.4em]c5.south west) {\normalsize{$hypos$ = Compose($span[j, k], span[k, j+l]$)}};
 \node[srcnode,anchor=north west] (c7) at ([yshift=0.4em]c6.south west) {\normalsize{$span[j, j+l]$.Update($hypos$)}};
 \node[srcnode,anchor=north west] (c8) at ([xshift=-4.5em,yshift=0.4em]c7.south west) {\normalsize{\textbf{return} $span[0, J]$}};
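
The pseudocode drawn in this figure can be sketched in Python roughly as follows. This is a minimal recognizer, not the book's implementation: the names `cky`, `lexical_rules`, and `binary_rules` are hypothetical, the grammar is assumed to be in Chomsky normal form, and the toy rules below are invented for illustration. `Compose` and `Update` from the figure are folded into a set union. Unlike the figure, the length loop starts at 2 and the split point `k` stays strictly inside the span, since the skipped iterations only touch empty spans.

```python
from collections import defaultdict


def cky(words, lexical_rules, binary_rules):
    """Return the set of non-terminals that cover the whole input.

    lexical_rules: dict mapping a word to the set of non-terminals A with A -> word.
    binary_rules:  dict mapping a pair (B, C) to the set of non-terminals A with A -> B C.
    """
    J = len(words)
    span = defaultdict(set)

    # Spans of length 1: apply the lexical rules A -> a.
    for j in range(J):
        span[j, j + 1] |= lexical_rules.get(words[j], set())

    for l in range(2, J + 1):              # span length
        for j in range(0, J - l + 1):      # span start
            for k in range(j + 1, j + l):  # split point inside the span
                for left in span[j, k]:
                    for right in span[k, j + l]:
                        # Compose the two sub-spans and update span[j, j+l].
                        span[j, j + l] |= binary_rules.get((left, right), set())

    return span[0, J]


# Toy grammar (an assumption for this sketch, not taken from the book).
lexical_rules = {
    "猫": {"NN", "NP"},
    "喜欢": {"VV"},
    "吃": {"VV"},
    "鱼": {"NN", "NP"},
}
binary_rules = {
    ("VV", "NP"): {"VP"},
    ("VV", "VP"): {"VP"},
    ("NP", "VP"): {"IP"},
}

result = cky(["猫", "喜欢", "吃", "鱼"], lexical_rules, binary_rules)
```

With this toy grammar, `result` holds the non-terminals that span the full sentence; an input the grammar cannot parse yields an empty set.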

--- a/Chapter8/Figures/figure-combination-of-translation-with-different-rules.tex
+++ b/Chapter8/Figures/figure-combination-of-translation-with-different-rules.tex
@@ -22,26 +22,29 @@
 \draw[decorate,decoration={mirror,brace}]([xshift=0.5em,yshift=-1em]q2.west) --([xshift=7em,yshift=-1em]q2.west) node [xshift=0em,yshift=-1em,align=center](label1) {};	
 {\scriptsize
-\node[anchor=west] (h1) at ([xshift=1em,yshift=-12em]q2.west) {{Span[0,3]下的翻译假设：}};
+\node[anchor=west] (h1) at ([xshift=1em,yshift=-15em]q2.west) {{Span[0,3]下的翻译假设：}};
-\node[anchor=west] (h2) at ([xshift=0em,yshift=-1.3em]h1.west) {{X: imports and exports}};
+\node[anchor=west] (h2) at ([xshift=0em,yshift=-1.3em]h1.west) {{X：imports and exports}};
-\node[anchor=west] (h6) at ([xshift=0em,yshift=-1.3em]h2.west) {{S: the import and export}};
+\node[anchor=west] (h6) at ([xshift=0em,yshift=-1.3em]h2.west) {{S：the import and export}};
 }
 {\scriptsize
-\node[anchor=west] (h21) at ([xshift=9em,yshift=2em]h1.east) {{替换$\textrm{X}_1$后生成的翻译假设：}};
+\node[anchor=west] (h21) at ([xshift=9em,yshift=5.0em]h1.east) {{替换$\textrm{X}_1$后生成的翻译假设：}};
-\node[anchor=west] (h22) at ([xshift=0em,yshift=-1.3em]h21.west) {{X: imports and exports have drastically fallen}};
+\node[anchor=west] (h22) at ([xshift=0em,yshift=-1.3em]h21.west) {{X：imports and exports have drastically fallen}};
-\node[anchor=west] (h23) at ([xshift=0em,yshift=-1.3em]h22.west) {{X: the import and export have drastically fallen}};
+\node[anchor=west] (h23) at ([xshift=0em,yshift=-1.3em]h22.west) {{X：the import and export have drastically fallen}};
-\node[anchor=west] (h24) at ([xshift=0em,yshift=-1.3em]h23.west) {{X: imports and exports have drastically fallen}};
+\node[anchor=west] (h24) at ([xshift=0em,yshift=-1.3em]h23.west) {{X：imports and exports have drastically fallen}};
-\node[anchor=west] (h25) at ([xshift=0em,yshift=-1.3em]h24.west) {{X: the import and export have drastically fallen}};
+\node[anchor=west] (h25) at ([xshift=0em,yshift=-1.3em]h24.west) {{X：the import and export have drastically fallen}};
-\node[anchor=west] (h26) at ([xshift=0em,yshift=-1.3em]h25.west) {{X: imports and exports has drastically fallen}};
+\node[anchor=west] (h26) at ([xshift=0em,yshift=-1.3em]h25.west) {{X：imports and exports has drastically fallen}};
-\node[anchor=west] (h27) at ([xshift=0em,yshift=-1.3em]h26.west) {{X: the import and export has drastically fallen}};
+\node[anchor=west] (h27) at ([xshift=0em,yshift=-1.3em]h26.west) {{X：the import and export has drastically fallen}};
 }
-\node [rectangle,inner sep=0.1em,rounded corners=1pt,draw] [fit = (h1) (h5) (h6)] (gl1) {};
+\node [rectangle,inner sep=0.1em,rounded corners=1pt,draw] [fit = (h1) (h2) (h6)] (gl1) {};
 \node [rectangle,inner sep=0.1em,rounded corners=1pt,draw] [fit = (h21) (h25) (h27)] (gl2) {};
-\draw [->,ublue,thick] ([xshift=0.6em,yshift=0.2em]n4.south) .. controls +(south:2em) and +(east:0em) ..   ([xshift=-0em,yshift=2em]gl2.west);
+\node [anchor=east,circle,inner sep=2pt,drop shadow,thick,draw=ublue,fill=white] (join) at ([xshift=4em,yshift=1em]gl1.north east) {\tiny{组合}};
-\draw [->,ublue,thick] ([xshift=0em,yshift=0em]gl1.east) .. controls +(north:2.2em) and +(east:0em) ..   ([xshift=-0em,yshift=2em]gl2.west);
+\draw [->,ublue,thick] ([xshift=0.6em,yshift=0.2em]n4.south) .. controls +(south:2em) and +(north:2em) ..   (join.90);
+\draw [->,ublue,thick] ([xshift=0em,yshift=1em]gl1.south east) .. controls +(east:2em) and +(south:2em) ..   (join.-90);
+\draw [->,ublue,thick] (join.0) -- ([xshift=2.3em]join.0);
 \end{scope}
 \end{tikzpicture}

--- a/Chapter8/Figures/figure-content-of-chart-in-tree-based-decoding.tex
+++ b/Chapter8/Figures/figure-content-of-chart-in-tree-based-decoding.tex
@@ -27,7 +27,7 @@
 \node [anchor=north] (t5) at (cell43.south) {\tiny{$l$=3}};
 \node [anchor=north] (t5) at (cell44.south) {\tiny{$l$=4}};
-\node [anchor=north] (chartlabel) at ([yshift=-1em]cell42.south east) {\footnotesize{\textbf{chart}}};
+\node [anchor=north] (chartlabel) at ([yshift=-1em]cell42.south east) {\footnotesize{Chart}};
 {\footnotesize
 \node [anchor=north west] (w1) at ([yshift=-2.5em,xshift=-1.0em]cell41.south west) {猫};

--- a/Chapter8/Figures/figure-example-of-hyper-graph.tex
+++ b/Chapter8/Figures/figure-example-of-hyper-graph.tex
@@ -3,34 +3,34 @@
 \begin{center}
 \begin{tikzpicture}\footnotesize
 \begin{scope}[scale=0.7]
-\node [anchor=center,draw,thick,circle,inner sep=13pt,ublue] (s1) at (0,0) {};
+\node [anchor=center,draw,very thick,circle,inner sep=13pt,ublue,fill=white,drop shadow] (s1) at (0,0) {};
 \node [anchor=north] (t11) at ([yshift=-0.8em]s1.north) {VP};
 \node [anchor=north] (t12) at ([yshift=-0.3em]t11.south) {[0,2]};
-\node [anchor=center,draw,thick,circle,inner sep=13pt,ublue] (s2) at ([xshift=12em,yshift=-5em]s1.north) {};
+\node [anchor=center,draw,very thick,circle,inner sep=13pt,ublue,fill=white,drop shadow] (s2) at ([xshift=12em,yshift=-3.5em]s1.north) {};
 \node [anchor=north] (t21) at ([yshift=-0.8em]s2.north) {NP};
 \node [anchor=north] (t22) at ([yshift=-0.3em]t21.south) {[0,2]};
-\node [anchor=center,draw,thick,circle,inner sep=13pt,ublue] (s3) at ([xshift=-6em,yshift=-13em]s1.south) {};
+\node [anchor=center,draw,very thick,circle,inner sep=13pt,ublue,fill=white,drop shadow] (s3) at ([xshift=-6em,yshift=-13em]s1.south) {};
 \node [anchor=north] (t31) at ([yshift=-0.8em]s3.north) {VV};
 \node [anchor=north] (t32) at ([yshift=-0.3em]t31.south) {[0,1]};
-\node [anchor=center,draw,thick,circle,inner sep=13pt,ublue] (s4) at ([xshift=13em,yshift=2.9em]s3.south) {};
+\node [anchor=center,draw,very thick,circle,inner sep=13pt,ublue,fill=white,drop shadow] (s4) at ([xshift=13em,yshift=2.9em]s3.south) {};
 \node [anchor=north] (t41) at ([yshift=-0.8em]s4.north) {NN};
 \node [anchor=north] (t42) at ([yshift=-0.3em]t41.south) {[1,2]};
-\node [anchor=center,draw,thick,circle,inner sep=13pt,ublue] (s5) at ([xshift=13em,yshift=2.9em]s4.south) {};
+\node [anchor=center,draw,very thick,circle,inner sep=13pt,ublue,fill=white,drop shadow] (s5) at ([xshift=13em,yshift=2.9em]s4.south) {};
 \node [anchor=north] (t51) at ([yshift=-0.8em]s5.north) {NP};
 \node [anchor=north] (t52) at ([yshift=-0.3em]t51.south) {[1,2]};
 {
-\draw [->,red!50,very thick] ([xshift=-1em,yshift=-0.3em]s3.north) .. controls +(north:10em) and +(south:10em) .. ([xshift=0em,yshift=0em]s1.south);
+\draw [->,red!60,very thick] ([yshift=0.1em]s3.100) .. controls +(north:8em) and +(south:10em) .. ([xshift=0em,yshift=-0.2em]s1.south);
-\draw [->,red!50,very thick] ([xshift=-1em,yshift=-0.3em]s5.north) .. controls +(north:8em) and +(south:14em) .. ([xshift=0em,yshift=0em]s1.south);
+\draw [->,red!60,very thick] ([yshift=0.1em]s5.110) .. controls +(north:8em) and +(south:12em) .. ([xshift=0em,yshift=-0.2em]s1.south);
 }
 {
-\draw [->,blue!50,very thick] ([xshift=-1em,yshift=-0.3em]s4.north) .. controls +(north:8em) and +(south:8em) .. ([xshift=0em,yshift=0em]s2.south);
+\draw [->,ugreen,very thick] ([yshift=0.1em]s4.90) .. controls +(north:9em) and +(south:7em) .. ([xshift=0em,yshift=-0.2em]s2.south);
-\draw [->,blue!50,very thick] ([xshift=1em,yshift=-0.3em]s5.north) .. controls +(north:9em) and +(south:7em) .. ([xshift=0em,yshift=0em]s2.south);
+\draw [->,ugreen,very thick] ([yshift=0.1em]s5.90) .. controls +(north:9em) and +(south:7em) .. ([xshift=0em,yshift=-0.2em]s2.south);
 }
 \node [anchor=north] (t51) at ([yshift=7em]s3.north) {edge1};

--- a/Chapter7/Figures/figure-example-of-translation-use-syntactic-structure.tex
+++ b/Chapter7/Figures/figure-example-of-translation-use-syntactic-structure.tex
--- a/Chapter8/Figures/figure-examples-of-translation-with-complex-ordering.tex
+++ b/Chapter8/Figures/figure-examples-of-translation-with-complex-ordering.tex
@@ -7,13 +7,13 @@
 {\scriptsize
-\node[anchor=west] (ref) at (0,0) {{\sffamily\bfseries{参考答案:}} The Chinese star performance troupe presented a wonderful Peking opera as well as singing and dancing };
+\node[anchor=west] (ref) at (0,0) {{\sffamily\bfseries{参考答案：}} The Chinese star performance troupe presented a wonderful Peking opera as well as singing and dancing };
 \node[anchor=north west] (ref2) at (ref.south west) {{\color{white} \sffamily\bfseries{Reference:}} performance to Hong Kong audience .};
-\node[anchor=north west] (hifst) at (ref2.south west) {{\sffamily\bfseries{层次短语系统:}} Star troupe of China, highlights of Peking opera and dance show to the audience of Hong Kong .};
+\node[anchor=north west] (hifst) at (ref2.south west) {{\sffamily\bfseries{层次短语系统：}} Star troupe of China, highlights of Peking opera and dance show to the audience of Hong Kong .};
-\node[anchor=north west] (synhifst) at (hifst.south west) {{\sffamily\bfseries{句法系统:}} Chinese star troupe};
+\node[anchor=north west] (synhifst) at (hifst.south west) {{\sffamily\bfseries{句法系统：}} Chinese star troupe};
 \node[anchor=west, fill=green!20!white, inner sep=0.25em] (synhifstpart2) at (synhifst.east) {presented};
@@ -25,7 +25,7 @@
 \node[anchor=west] (synhifstpart6) at (synhifstpart5.east) {.};
-\node[anchor=north west] (input) at ([yshift=-6.5em]synhifst.south west) {\sffamily\bfseries{源语句法树:}};
+\node[anchor=north west] (input) at ([yshift=-6.5em]synhifst.south west) {\sffamily\bfseries{源语句法树：}};
 \begin{scope}[scale = 0.9, grow'=up, sibling distance=5pt, level distance=30pt, xshift=3.49in, yshift=-3.1in]

--- a/Chapter8/Figures/figure-extract-hierarchical-phrase-rules.tex
+++ b/Chapter8/Figures/figure-extract-hierarchical-phrase-rules.tex
@@ -63,7 +63,7 @@
 {\scriptsize
 \node (phrase) {\textbf{抽取得到的短语:}};
 \draw[-] (phrase.south west)--([xshift=1.9in]phrase.south west);
-\node[anchor=north west] (rules) at ([yshift=-7.5em]phrase.south west) {\textbf{抽取得到的规则:}};
+\node[anchor=north west] (rules) at ([yshift=-7.5em]phrase.south west) {\textbf{抽取得到的层次短语规则:}};
 \draw[-] (rules.south west)--([xshift=1.9in]rules.south west);
 {

--- a/Chapter8/Figures/figure-hierarchical-phrase-rule-match-generate.tex
+++ b/Chapter8/Figures/figure-hierarchical-phrase-rule-match-generate.tex
@@ -17,26 +17,29 @@
 {\scriptsize
 \node[anchor=west] (h1) at ([xshift=1em,yshift=-7em]q2.west) {{Span[0,3]下的翻译假设：}};
-\node[anchor=west] (h2) at ([xshift=0em,yshift=-1.3em]h1.west) {{X: the imports and exports}};
+\node[anchor=west] (h2) at ([xshift=0em,yshift=-1.3em]h1.west) {{X：the imports and exports}};
-\node[anchor=west] (h3) at ([xshift=0em,yshift=-1.3em]h2.west) {{X: imports and exports}};
+\node[anchor=west] (h3) at ([xshift=0em,yshift=-1.3em]h2.west) {{X：imports and exports}};
-\node[anchor=west] (h4) at ([xshift=0em,yshift=-1.3em]h3.west) {{X: exports and imports}};
+\node[anchor=west] (h4) at ([xshift=0em,yshift=-1.3em]h3.west) {{X：exports and imports}};
-\node[anchor=west] (h5) at ([xshift=0em,yshift=-1.3em]h4.west) {{X: the imports and the exports}};
+\node[anchor=west] (h5) at ([xshift=0em,yshift=-1.3em]h4.west) {{X：the imports and the exports}};
-\node[anchor=west] (h6) at ([xshift=0em,yshift=-1.3em]h5.west) {{S: the import and export}};
+\node[anchor=west] (h6) at ([xshift=0em,yshift=-1.3em]h5.west) {{S：the import and export}};
 }
 {\scriptsize
 \node[anchor=west] (h21) at ([xshift=9em,yshift=0em]h1.east) {{替换$\textrm{X}_1$后生成的翻译假设：}};
-\node[anchor=west] (h22) at ([xshift=0em,yshift=-1.3em]h21.west) {{X: the imports and exports have drastically fallen}};
+\node[anchor=west] (h22) at ([xshift=0em,yshift=-1.3em]h21.west) {{X：the imports and exports have drastically fallen}};
-\node[anchor=west] (h23) at ([xshift=0em,yshift=-1.3em]h22.west) {{X: imports and exports have drastically fallen}};
+\node[anchor=west] (h23) at ([xshift=0em,yshift=-1.3em]h22.west) {{X：imports and exports have drastically fallen}};
-\node[anchor=west] (h24) at ([xshift=0em,yshift=-1.3em]h23.west) {{X: exports and imports have drastically fallen}};
+\node[anchor=west] (h24) at ([xshift=0em,yshift=-1.3em]h23.west) {{X：exports and imports have drastically fallen}};
-\node[anchor=west] (h25) at ([xshift=0em,yshift=-1.3em]h24.west) {{X: the imports and the exports have drastically fallen}};
+\node[anchor=west] (h25) at ([xshift=0em,yshift=-1.3em]h24.west) {{X：the imports and the exports have drastically fallen}};
 }
 \node [rectangle,inner sep=0.1em,rounded corners=1pt,draw] [fit = (h1) (h5) (h6)] (gl1) {};
 \node [rectangle,inner sep=0.1em,rounded corners=1pt,draw] [fit = (h21) (h25)] (gl2) {};
-\draw [->,ublue,thick] ([xshift=0.6em,yshift=0.2em]n2.south) .. controls +(south:2em) and +(east:0em) ..   ([xshift=-0em,yshift=2em]gl2.west);
+\node [anchor=east,circle,inner sep=2pt,drop shadow,thick,draw=ublue,fill=white] (join) at ([xshift=3em,yshift=-2em]gl1.north east) {\tiny{组合}};
-\draw [->,ublue,thick] ([xshift=0em,yshift=1em]gl1.east) .. controls +(north:2.2em) and +(east:0em) ..   ([xshift=-0em,yshift=2em]gl2.west);
+\draw [->,ublue,thick] ([xshift=0.6em,yshift=0.2em]n2.south) .. controls +(south:2em) and +(north:2em) ..   (join.90);
+\draw [->,ublue,thick] ([xshift=0em,yshift=1em]gl1.south east) .. controls +(east:2em) and +(south:2em) ..   (join.-90);
+\draw [->,ublue,thick] (join.0) -- ([xshift=1.7em]join.0);
 \end{scope}
 \end{tikzpicture}

--- a/Chapter7/Figures/figure-long-distance-dependence-in-zh2en-translation.tex
+++ b/Chapter7/Figures/figure-long-distance-dependence-in-zh2en-translation.tex
--- a/Chapter8/Figures/figure-one-best-node-alignment-and-alignment-matrix.tex
+++ b/Chapter8/Figures/figure-one-best-node-alignment-and-alignment-matrix.tex
@@ -6,7 +6,7 @@
 \begin{flushright}
 \begin{tikzpicture}
-\begin{scope}[scale=0.47]
+\begin{scope}[scale=0.60]
 {\Large
 \begin{scope}[sibling distance=17pt, level distance = 35pt]
@@ -30,7 +30,7 @@
 \end{scope}
 }
-\begin{scope}[xshift=2.3in, yshift=-0.3in]
+\begin{scope}[xshift=1.8in, yshift=-0.3in]
 \node[anchor=west, rotate=60] at (0.8,-0.6) {VP$^{[1]}$};
 \node[anchor=west, rotate=60] at (1.8,-0.6) {VBZ$^{[2]}$};
 \node[anchor=west, rotate=60] at (2.8,-0.6) {ADVP$^{[3]}$};
@@ -54,12 +54,12 @@
 \node[fill=blue!40, scale=1.1, inner sep=1pt, minimum size=12pt] at (4,-2) {{\color{white} 1}};
 \node[fill=blue!40, scale=1.1, inner sep=1pt, minimum size=12pt] at (5,-4) {{\color{white} 1}};
-\node[] at (4,-6.3) {{\color{blue!40} $\blacksquare$} = fixed alignment};
+\node[] at (4,-6.3) {{\color{blue!40} $\blacksquare$} = 确定的对齐};
-\node[] at (4,-7.2) {Matrix 1: 1-best alignment};
+\node[] at (4,-7.2) {Matrix 1: 1-best对齐};
 \end{scope}
-\begin{scope}[xshift=6.1in, yshift=-0.3in]
+\begin{scope}[xshift=4.8in, yshift=-0.3in]
 \node[anchor=west, rotate=60] at (0.8,-0.6) {VP$^{[1]}$};
 \node[anchor=west, rotate=60] at (1.8,-0.6) {VBZ$^{[2]}$};
 \node[anchor=west, rotate=60] at (2.8,-0.6) {ADVP$^{[3]}$};
@@ -92,8 +92,8 @@
 \node[fill=blue!40, scale=0.65, inner sep=1pt, minimum size=12pt] at (3,-4) {{\color{white} \small{.3}}};
 \node[fill=blue!40, scale=0.9, inner sep=1pt, minimum size=12pt] at (5,-4) {{\color{white} \small{.7}}};
-\node[] at (4,-6.3) {{\color{blue!40} $\blacksquare$} = possible alignment};
+\node[] at (4,-6.3) {{\color{blue!40} $\blacksquare$} = 概率化对齐};
-\node[] at (4,-7.2) {Matrix 2: posterior};
+\node[] at (4,-7.2) {Matrix 2: 对齐概率};
 \node[] at (9,-7.2) {};%占位符
 \end{scope}
@@ -112,8 +112,8 @@
 \begin{tabular}[t]{C{0.48\linewidth} C{0.48\linewidth} }
 \begin{tabular}{l L{150pt}}
-\multicolumn{2}{l}{\textbf{\footnotesize{Minimal Rules}}} \\
+\multicolumn{2}{l}{\textbf{\small{最小规则}}} \\
-\multicolumn{2}{l}{\textbf{\footnotesize{Extracted from Matrix 1 (1-best)}}} \\
+\multicolumn{2}{l}{\textbf{\small{Matrix 1 (基于1-best对齐)}}} \\
 \hline
 \footnotesize{$r_3$} & \footnotesize{AD(大幅度) $\rightarrow$ RB(drastically)} \\
 \footnotesize{$r_4$} & \footnotesize{VV(减少) $\rightarrow$ VBN(fallen)} \\
@@ -128,8 +128,8 @@
 &
 \begin{tabular}{l L{150pt}}
-\multicolumn{2}{l}{\textbf{\small{Minimal Rules}}} \\
+\multicolumn{2}{l}{\textbf{\small{最小规则}}} \\
-\multicolumn{2}{l}{\textbf{\small{Extracted from Matrix 2 (posterior)}}} \\
+\multicolumn{2}{l}{\textbf{\small{Matrix 2 (基于对齐概率)}}} \\
 \hline
 \footnotesize{$r_3$} & \footnotesize{AD(大幅度) $\rightarrow$ RB(drastically)} \\
 \footnotesize{$r_4$} & \footnotesize{VV(减少) $\rightarrow$ VBN(fallen)} \\

--- a/Chapter8/Figures/figure-result-of-tree-binarization.tex
+++ b/Chapter8/Figures/figure-result-of-tree-binarization.tex
@@ -9,13 +9,13 @@
 \Tree[.\node(n1){NP};
     	[.NNP \node(sw1){美国}; ]
     	[.NN \node(sw2){总统}; ]
-        [.NN \node(sw3){唐纳德}; ]
+        [.NN \node(sw3){乔治}; ]
-        [.NN \node(sw4){特朗普}; ]
+        [.NN \node(sw4){华盛顿}; ]
     ]
 \node [anchor=north] (tw1) at ([yshift=-2em]sw1.south) {U.S.};
 \node [anchor=north] (tw2) at ([yshift=-2em]sw2.south) {President};
-\node [anchor=north] (tw3) at ([yshift=-2em]sw3.south) {Trump};
+\node [anchor=north] (tw3) at ([yshift=-2em,xshift=2em]sw3.south) {Washington};
 \draw [-,dashed] (sw1.south) -- (tw1.north);
 \draw [-,dashed] (sw2.south) -- (tw2.north);
@@ -33,15 +33,15 @@
 	[.NP-BAR
     	    [.NN \node(sw2){总统}; ]
 	    [.NP-BAR
-                [.NN \node(sw3){唐纳德}; ]
+                [.NN \node(sw3){乔治}; ]
-                [.NN \node(sw4){特朗普}; ]
+                [.NN \node(sw4){华盛顿}; ]
             ]
         ]
     ]
 \node [anchor=north] (tw1) at ([yshift=-4.5em]sw1.south) {U.S.};
 \node [anchor=north] (tw2) at ([yshift=-2.75em]sw2.south) {President};
-\node [anchor=north] (tw3) at ([yshift=-1em]sw3.south) {Trump};
+\node [anchor=north] (tw3) at ([yshift=-1em,xshift=2em]sw3.south) {Washington};
 \draw [-,dashed] (sw1.south) -- (tw1.north);
 \draw [-,dashed] (sw2.south) -- (tw2.north);

--- a/Chapter8/Figures/figure-role-of-syntax-tree-in-different-decoding-methods.tex
+++ b/Chapter8/Figures/figure-role-of-syntax-tree-in-different-decoding-methods.tex
@@ -17,13 +17,13 @@
     ]
 \node [anchor=west] (target) at ([xshift=1em]bsw3.east) {Cats like eating fish};
-\node [anchor=north,inner sep=3pt] (cap1) at ([yshift=-1em]target.south west) {(a) 基于树的解码};
+\node [anchor=north,inner sep=3pt] (cap1) at ([xshift=-1.0em,yshift=-1em]target.south west) {(a) 基于树的解码};
 \draw [->,thick] (bsw3.east) -- (target.west);
 \node [anchor=west] (sourcelabel) at ([xshift=6em,yshift=-1em]bsn0.east) {显式输入的结构};
-\node [anchor=west] (source2) at ([xshift=3.3em]target.east) {猫$\ \ \;$喜欢$\ \;$吃\ 鱼};
+\node [anchor=west] (source2) at ([xshift=3.3em,yshift=0.0em]target.east) {猫$\ \ \;$喜欢$\ \;$吃\ 鱼};
 \node [anchor=west] (target2) at ([xshift=1em]source2.east) {Cats like eating fish};
-\node [anchor=north,inner sep=3pt] (cap2) at ([xshift=1.1em,yshift=-1em]target2.south west) {(b) 基于串的解码};
+\node [anchor=north,inner sep=3pt] (cap2) at ([xshift=-1.5em,yshift=-1em]target2.south west) {(b) 基于串的解码};
 \draw [->,thick] (source2.east) -- (target2.west);
 \begin{pgfonlayer}{background}
@@ -32,7 +32,7 @@
 }
 \end{pgfonlayer}
-\begin{scope}[xshift=3.18in,yshift=-0em,sibling distance=10pt]
+\begin{scope}[xshift=3.18in,yshift=-0.28em,sibling distance=10pt]
 \Tree[.\node(bsn0){IP};
          [.\node(bsn1){NP};
               [.\node(bsn2){NN}; ]
@@ -44,7 +44,7 @@
     ]
 \begin{pgfonlayer}{background}
-\node [draw,dashed,rectangle,inner sep=1em,thick,red,rounded corners=5pt] (box) [fit = (bsn0) (bsn1) (bsn2) (bsn3) (bsn4) (bsn5)] {};
+\node [draw,dashed,rectangle,inner sep=0.7em,thick,red,rounded corners=5pt] (box) [fit = (bsn0) (bsn1) (bsn2) (bsn3) (bsn4) (bsn5)] {};
 \node [anchor=north west] (boxlabel) at ([xshift=2em,yshift=-2em]box.north east) {隐含结构};
 \end{pgfonlayer}

--- a/Chapter8/Figures/figure-structure-of-chart.tex
+++ b/Chapter8/Figures/figure-structure-of-chart.tex
@@ -2,45 +2,30 @@
 %%%  基于树的解码方法 - chart-based decoding
 \begin{center}
 \begin{tikzpicture}
-\begin{scope}%[scale=0.2]
+\begin{scope}
-\node [anchor=north] (ch) at (0,0) {\large{\textbf{Chart}}};
+\node [anchor=south west,draw,fill=ugreen!20,minimum width=2.8em,minimum height=2.8em,inner sep=1pt] (cell11) at (0,0) {\scriptsize{cell[1,2]}};
+\node [anchor=south west,draw,fill=red!20,minimum width=2.8em,minimum height=2.8em,inner sep=1pt] (cell12) at (cell11.south east) {\scriptsize{cell[0,2]}};
+\node [anchor=south west,draw,fill=orange!30,minimum width=2.8em,minimum height=2.8em,inner sep=1pt] (cell21) at (cell11.north west) {\scriptsize{cell[0,1]}};
+\node [anchor=south west,draw,fill=gray!20,minimum width=2.8em,minimum height=2.8em,inner sep=1pt] (cell22) at (cell21.south east) {\scriptsize{N/A}};
+\draw [->,thick] ([xshift=-1em,yshift=1em]cell21.north west)--([xshift=-1em,yshift=-1em]cell11.south west);
+\draw [->,thick] ([xshift=-1em,yshift=1em]cell21.north west)--([xshift=1em,yshift=1em]cell22.north east);
+\node [anchor=north west,fill=orange!30,draw,drop shadow,align=left,minimum width=4em] (cell11label) at ([xshift=4em,yshift=1em]cell22.north east) {\footnotesize{VV[0,1]}};
+\node [anchor=north west,fill=ugreen!20,draw,drop shadow,align=left,minimum width=4em] (cell12label) at ([yshift=-1em]cell11label.south west) {\footnotesize{NN[1,2]}\\\footnotesize{NP[1,2]}};
+\node [anchor=north west,fill=red!20,draw,drop shadow,align=left,minimum width=4em] (cell21label) at ([yshift=-1em]cell12label.south west) {\footnotesize{VP[0,2]}\\\footnotesize{NP[0,2]}};
+\draw [->,very thick,dotted] ([yshift=0.3em]cell11label.west) .. controls +(west:2em)  and +(north:1.5em) .. ([xshift=1em,yshift=-0.5em]cell21.north);
+\draw [->,very thick,dotted] ([yshift=-0.5em]cell12label.west) -- ([yshift=-0.5em,xshift=-7.5em]cell12label.west);
+\draw [->,very thick,dotted] ([yshift=-0.3em]cell21label.west) .. controls +(west:2em)  and +(south:1.5em) .. ([xshift=1em,yshift=0.5em]cell12.south);
+\node [anchor=south] (label1) at ([yshift=1em]cell21.north east) {\footnotesize{跨度大小}};
+\node [anchor=north] (l21) at ([xshift=-2.0em,yshift=1em]cell21.west) {\footnotesize{起}};
+\node [anchor=north] (l22) at ([xshift=0em,yshift=0.5em]l21.south) {\footnotesize{始}};
+\node [anchor=north] (l23) at ([xshift=0em,yshift=0.5em]l22.south) {\footnotesize{位}};
+\node [anchor=north] (l24) at ([xshift=0em,yshift=0.5em]l23.south) {\footnotesize{置}};
+\node [anchor=north] (labelchart) at (cell11.south east) {\small{Chart（表格）}};
-\draw [->,ublue] ([xshift=-1em,yshift=-1em]ch.south) -- ([xshift=-1em,yshift=-9em]ch.south);
-\draw [->,ublue] ([xshift=-1em,yshift=-1em]ch.south) -- ([xshift=10em,yshift=-1em]ch.south);
-{\small
-\node [anchor=north] (l11) at ([xshift=-1.7em,yshift=-2.5em]ch.south) {{起}};
-\node [anchor=north] (l12) at ([xshift=0em,yshift=0.5em]l11.south) {{始}};
-\node [anchor=north] (l13) at ([xshift=0em,yshift=0.5em]l12.south) {{位}};
-\node [anchor=north] (l14) at ([xshift=0em,yshift=0.5em]l13.south) {{置}};
-\node [anchor=north] (l2) at ([xshift=4.5em,yshift=0.4em]ch.south) {{跨度大小}};
-}
-\draw [-,ublue] ([xshift=1em,yshift=-2em]ch.south) -- ([xshift=1em,yshift=-8em]ch.south);
-\draw [-,ublue] ([xshift=5em,yshift=-2em]ch.south) -- ([xshift=5em,yshift=-8em]ch.south);
-\draw [-,ublue] ([xshift=9em,yshift=-2em]ch.south) -- ([xshift=9em,yshift=-8em]ch.south);
-\draw [-,ublue] ([xshift=1em,yshift=-2em]ch.south) -- ([xshift=9em,yshift=-2em]ch.south);
-\draw [-,ublue] ([xshift=1em,yshift=-5em]ch.south) -- ([xshift=9em,yshift=-5em]ch.south);
-\draw [-,ublue] ([xshift=1em,yshift=-8em]ch.south) -- ([xshift=9em,yshift=-8em]ch.south);
-\node [anchor=north,rectangle,draw=red!40, inner sep=0mm,minimum height=4em,minimum width=9em,rounded corners=2pt,very thick] (n1) at ([xshift=18em,yshift=2em]ch.south) {};
-\node [anchor=north,rectangle,draw=red!40, inner sep=0mm,minimum height=4em,minimum width=9em,rounded corners=2pt,very thick] (n2) at ([xshift=0em,yshift=-0.5em]n1.south) {};
-\node [anchor=north,rectangle,draw=red!40, inner sep=0mm,minimum height=4em,minimum width=9em,rounded corners=2pt,very thick] (n3) at ([xshift=0em,yshift=-0.5em]n2.south) {};
-\node [anchor=north] (n11) at ([xshift=0em,yshift=-0.5em]n1.north) {Cell[0,1]:};
-\node [anchor=north] (n12) at ([xshift=1em,yshift=-1.2em]n11.north) {VV[0,1]};
-\node [anchor=north] (n21) at ([xshift=0em,yshift=-0.1em]n2.north) {Cell[1,2]:};
-\node [anchor=north] (n22) at ([xshift=1em,yshift=-1.2em]n21.north) {NN[1,2]};
-\node [anchor=north] (n23) at ([xshift=0em,yshift=-1.3em]n22.north) {NP[1,2]};
-\node [anchor=north] (n31) at ([xshift=0em,yshift=-0.1em]n3.north) {Cell[0,2]:};
-\node [anchor=north] (n32) at ([xshift=1em,yshift=-1.2em]n31.north) {VP[0,2]};
-\node [anchor=north] (n33) at ([xshift=0em,yshift=-1.3em]n32.north) {NP[0,2]};
-\draw [->,blue!40,very thick] ([xshift=0em,yshift=-0.5em]n1.west) .. controls +(west:6em) and +(north:3em) .. ([xshift=-15em,yshift=-2em]n1.south);
-\draw [->,blue!40,very thick] ([xshift=0em,yshift=1em]n2.west) .. controls +(west:2em) and +(north:2em) .. ([xshift=-14.5em,yshift=0em]n2.south);
-\draw [->,blue!40,very thick] ([xshift=0em,yshift=-0.5em]n3.west) .. controls +(west:5em) and +(south:0.5em) .. ([xshift=-12em,yshift=5em]n3.south);
 \end{scope}
 \end{tikzpicture}
 \end{center}
--- a/Chapter8/Figures/figure-syntax-tree-with-admissible-node.tex
+++ b/Chapter8/Figures/figure-syntax-tree-with-admissible-node.tex
@@ -36,8 +36,8 @@
 \draw[dashed] (cw4.south) .. controls +(south:2.0) and +(north:0.6) .. ([yshift=-0.4em]tw3.north);
 \draw[dashed] (cw5.south) .. controls +(south:2.0) and +(north:0.6) .. ([yshift=-0.4em]tw3.north);
-\node [anchor=south west,align=left,fill=red!20,drop shadow] (label1) at ([xshift=0.5em]n11.north east) {\footnotesize{span=\{3\}}\\\footnotesize{c-span=\{1,3-6\}}};
+\node [anchor=south west,align=left,fill=red!20,drop shadow] (label1) at ([xshift=0.5em,yshift=-1.3em]n11.north east) {\footnotesize{可达范围=\{3\}}\\\footnotesize{补充范围=\{1,3-6\}}};
-\node [anchor=south west,align=left,fill=blue!20,drop shadow] (label2) at ([xshift=0.5em,yshift=-0.5em]n4.north east) {\footnotesize{span=\{3-6\}}\\\footnotesize{c-span=\{1\}}};
+\node [anchor=south west,align=left,fill=blue!20,drop shadow] (label2) at ([xshift=0.5em,yshift=-0.5em]n4.north east) {\footnotesize{可达范围=\{3-6\}}\\\footnotesize{补充范围=\{1\}}};
 \begin{pgfonlayer}{background}
 \node [rectangle,fill=red!20,inner sep=0] [fit = (n11)] (n11box) {};
@@ -58,7 +58,7 @@
 {
 \node [anchor=north] (n11boxlabel) at (label1.south) {\footnotesize{{\red{不可信}}}};
-\node [anchor=north] (n4boxlabel) at (label2.south) {\footnotesize{{\red{可信}}}};
+\node [anchor=north] (n4boxlabel) at (label2.south) {\footnotesize{{{\color{ublue} 可信}}}};
 }
 {

--- a/Chapter8/Figures/figure-tree-binarization.tex
+++ b/Chapter8/Figures/figure-tree-binarization.tex
@@ -11,14 +11,14 @@
 \Tree[.\node(n1){NP};
     	[.NNP \node(sw1){美国}; ]
     	[.NN \node(sw2){总统}; ]
-        [.NN \node(sw3){唐纳德}; ]
+        [.NN \node(sw3){乔治}; ]
-        [.NN \node(sw4){特朗普}; ]
+        [.NN \node(sw4){华盛顿}; ]
     ]
 }
 \node [anchor=north] (tw1) at ([yshift=-2em]sw1.south) {U.S.};
 \node [anchor=north] (tw2) at ([yshift=-2em]sw2.south) {President};
-\node [anchor=north] (tw3) at ([yshift=-2em]sw3.south) {Trump};
+\node [anchor=north] (tw3) at ([yshift=-2em,xshift=1.5em]sw3.south) {Washington};
 \draw [-,dashed] (sw1.south) -- (tw1.north);
 \draw [-,dashed] (sw2.south) -- (tw2.north);
@@ -26,12 +26,12 @@
 \draw [-,dashed] (sw4.south) -- (tw3.north);
 \node [anchor=west] (rulelabel1) at ([xshift=1in,yshift=0.3em]n1.east) {{抽取到的规则：}};
-\node [anchor=north west] (rule1) at (rulelabel1.south west) {NP(NNP$_1$ NN$_2$ NN(唐纳德) NN(特朗普))};
+\node [anchor=north west] (rule1) at (rulelabel1.south west) {NP(NNP$_1$ NN$_2$ NN(乔治) NN(华盛顿))};
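-\node [anchor=north west] (rule1t) at ([yshift=0.2em]rule1.south west) {$\to$ NNP$_1$ NN$_2$ Trump};
+\node [anchor=north west] (rule1t) at ([yshift=0.2em]rule1.south west) {$\to$ NNP$_1$ NN$_2$ Washington};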
-\node [anchor=north west] (rule2) at (rule1t.south west) {NP(NNP$_1$ NN(总统) NN(唐纳德) NN(特朗普))};
+\node [anchor=north west] (rule2) at (rule1t.south west) {NP(NNP$_1$ NN(总统) NN(乔治) NN(华盛顿))};
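-\node [anchor=north west] (rule2t) at ([yshift=0.2em]rule2.south west) {$\to$ NNP$_1$ President Trump};
+\node [anchor=north west] (rule2t) at ([yshift=0.2em]rule2.south west) {$\to$ NNP$_1$ President Washington};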
 \node [anchor=north west] (rulelabel2) at ([yshift=-0.3em]rule2t.south west) {{{\red{不能}}抽取到的规则：}};
-\node [anchor=north west] (rule3) at (rulelabel2.south west) {NP(NN(唐纳德) NN(特朗普)) $\to$ Trump};
+\node [anchor=north west] (rule3) at (rulelabel2.south west) {NP(NN(乔治) NN(华盛顿)) $\to$ Washington};
 \end{scope}
 }

--- a/Chapter8/chapter8.tex
+++ b/Chapter8/chapter8.tex
--- a/Chapter9/Figures/biological-neuron.jpg
+++ b/Chapter9/Figures/biological-neuron.jpg
--- a/Chapter9/Figures/deep-learning.jpg
+++ b/Chapter9/Figures/deep-learning.jpg
--- a/Chapter9/Figures/feature-engineering.jpg
+++ b/Chapter9/Figures/feature-engineering.jpg
--- a/Chapter9/Figures/fig-4-gram.tex
+++ b/Chapter9/Figures/fig-4-gram.tex
+\begin{tikzpicture}
+\begin{scope}
+\node [anchor=west] (w0) at (0,0) {\footnotesize{$w_{i-3}$}};
+\node [anchor=west] (w1) at ([xshift=2em]w0.east) {\footnotesize{$w_{i-2}$}};
+\node [anchor=west] (w2) at ([xshift=2em]w1.east) {\footnotesize{$w_{i-1}$}};
+\node [anchor=north] (index0) at ([yshift=0.5em]w0.south) {\tiny(index)};
+\node [anchor=north] (index1) at ([yshift=0.5em]w1.south) {\tiny(index)};
+\node [anchor=north] (index2) at ([yshift=0.5em]w2.south) {\tiny(index)};
+\node [anchor=south,draw,inner sep=3pt,fill=blue!20!white] (e0) at ([yshift=1em]w0.north) {\tiny{$\textbf{e}_0=w_{i-3} \textbf{C}$}};
+\node [anchor=south,draw,inner sep=3pt,fill=blue!20!white] (e1) at ([yshift=1em]w1.north) {\tiny{$\textbf{e}_1=w_{i-2} \textbf{C}$}};
+\node [anchor=south,draw,inner sep=3pt,fill=blue!20!white] (e2) at ([yshift=1em]w2.north) {\tiny{$\textbf{e}_2=w_{i-1} \textbf{C}$}};
+\node [anchor=south,draw,minimum width=9em,inner sep=3pt,fill=orange!20!white] (h0) at ([yshift=1.5em]e1.north) {\tiny{$\textbf{h}_0=\textrm{Tanh}([\textbf{e}_0,\textbf{e}_1,\textbf{e}_2] \textbf{H} + \textbf{d})$}};
+\node [anchor=south,draw,minimum width=9em,inner sep=3pt,fill=orange!20!white] (h1) at ([yshift=1.5em]h0.north) {\tiny{$\textbf{y}=\textrm{Softmax}(\textbf{h}_0 \textbf{U})$}};
+\node [anchor=south] (ylabel) at ([yshift=1em]h1.north) {\footnotesize{$\textrm{P}(w_i|w_{i-3}w_{i-2}w_{i-1})$}};
+\draw [->,line width=1pt] ([yshift=0.1em]w0.north) -- ([yshift=-0.1em]e0.south);
+\draw [->,line width=1pt] ([yshift=0.1em]w1.north) -- ([yshift=-0.1em]e1.south);
+\draw [->,line width=1pt] ([yshift=0.1em]w2.north) -- ([yshift=-0.1em]e2.south);
+\draw [->,line width=1pt] ([yshift=0.1em]e0.north) -- ([xshift=-2em,yshift=-0.1em]h0.south);
+\draw [->,line width=1pt] ([yshift=0.1em]e1.north) -- ([yshift=-0.1em]h0.south);
+\draw [->,line width=1pt] ([yshift=0.1em]e2.north) -- ([xshift=2em,yshift=-0.1em]h0.south);
+\draw [->,line width=1pt] ([yshift=0.1em]h0.north) -- ([yshift=-0.1em]h1.south);
+\draw [->,line width=1pt] ([yshift=0.1em]h1.north) -- ([yshift=-0.1em]ylabel.south);
+{
+\draw [->,dashed,red,line width=1pt] ([xshift=1em,yshift=0.1em]e1.north) -- ([xshift=1em,yshift=-0.1em]h1.south);
+\draw [->,dashed,red,line width=1pt] ([xshift=-1em,yshift=0.1em]e0.north) .. controls +(north:2) and +(south:1) .. ([xshift=-3em,yshift=-0.1em]h1.south);
+\draw [->,dashed,red,line width=1pt] ([xshift=1em,yshift=0.1em]e2.north) .. controls +(north:2) and +(south:1) .. ([xshift=3em,yshift=-0.1em]h1.south);
+}
+\begin{pgfonlayer}{background}
+{
+\node [rectangle,inner sep=0.1em,fill=ugreen!20!white] [fit = (w0) (index0)] (wordbox0) {};
+\node [rectangle,inner sep=0.1em,fill=ugreen!20!white] [fit = (w1) (index1)] (wordbox1) {};
+\node [rectangle,inner sep=0.1em,fill=ugreen!20!white] [fit = (w2) (index2)] (wordbox2) {};
+}
+\end{pgfonlayer}
+\end{scope}
+\end{tikzpicture}
\ No newline at end of file
--- a/Chapter9/Figures/fig-absolute-loss.tex
+++ b/Chapter9/Figures/fig-absolute-loss.tex
+%%%------------------------------------------------------------------------------------------------------------
+ \begin{tikzpicture}
+\begin{scope}[yscale=0.2,xscale=0.8]
+\draw[-,very thick,ublue,domain=-4.2:3.5,samples=100] plot (\x,{ - 1/14 * (\x + 4) * (\x + 1) * (\x - 1) * (\x - 3)});
+{
+\draw[-,very thick,ugreen,domain=-3.8:3.0,samples=100] plot (\x,{ - 1/14 * (4*\x*\x*\x + 3*\x*\x - 26*\x - 1)});
+}
+\draw[->,thick] (-6,0) -- (5,0);
+\draw[->,thick] (-5,-4) -- (-5,5);
+\draw [<-] (-2.5,4) -- (-2,5) node [pos=1,right,inner sep=2pt] {\footnotesize{答案$\tilde{\textbf{y}}_i$}};
+{
+\draw [<-] (-3,-3) -- (-2.5,-2) node [pos=0,left,inner sep=2pt] {\footnotesize{预测$\textbf{y}_i$}};}
+{
+\draw [<-] (2.3,1) -- (3.3,2) node [pos=1,right,inner sep=2pt] {\footnotesize{偏差$|\tilde{\textbf{y}}_i - \textbf{y}_i|$}};
+\foreach \x in {-3.8,-3.7,...,3.0}{
+    \pgfmathsetmacro{\p}{- 1/14 * (\x + 4) * (\x + 1) * (\x - 1) * (\x - 3)};
+    \pgfmathsetmacro{\q}{- 1/14 * (4*\x*\x*\x + 3*\x*\x - 26*\x - 1)};
+    \draw [-] (\x,\p) -- (\x, \q);
+}
+}
+\end{scope}
+\end{tikzpicture}
+%%%------------------------------------------------------------------------------------------------------------
--- a/Chapter9/Figures/fig-artificial-neuron.tex
+++ b/Chapter9/Figures/fig-artificial-neuron.tex
+%%%------------------------------------------------------------------------------------------------------------
+\begin{tikzpicture}
+\begin{scope}
+\node [anchor=center,circle,draw,ublue,very thick,minimum size=3.5em,fill=white,drop shadow={shadow xshift=0.1em,shadow yshift=-0.1em}] (neuron) at (0,0) {};
+\node [anchor=east] (x1) at ([xshift=-6em]neuron.west) {\Large{$x_1$}};
+\node [anchor=center] (x0) at ([yshift=3em]x1.center) {\Large{$x_0$}};
+\node [anchor=center] (x2) at ([yshift=-3em]x1.center) {\Large{$b$}};
+\node [anchor=west] (y) at ([xshift=6em]neuron.east) {\Large{$y$}};
+\node [anchor=center] (neuronmath) at (neuron.center) {\Large{$f$}};
+\draw [->,thick] (x0.east) -- (neuron.150) node [pos=0.5,above] {$w_0$};
+\draw [->,thick] (x1.east) -- (neuron.180) node [pos=0.5,above] {$w_1$};
+\draw [->,thick] (x2.east) -- (neuron.210) node [pos=0.5,above] {$$};
+\draw [->,thick] (neuron.east) -- (y.west);
+\end{scope}
+\end{tikzpicture}
+%
+%%%------------------------------------------------------------------------------------------------------------
--- a/Chapter9/Figures/fig-back-propagation-hid.tex
+++ b/Chapter9/Figures/fig-back-propagation-hid.tex
+%%%------------------------------------------------------------------------------------------------------------
+\begin{tikzpicture}
+\begin{scope}
+\node [anchor=center,draw,fill=red!20,minimum height=1.8em,minimum width=2.5em] (h) at (0,0) {$\textbf{h}^{k-1}$};
+\node [anchor=west,draw,fill=blue!20,minimum height=1.8em,minimum width=2.5em] (s) at ([xshift=6em]h.east) {$\textbf{s}^{k}$};
+\node [anchor=west,draw,fill=green!20,minimum height=1.8em,minimum width=2.5em] (h2) at ([xshift=6em]s.east) {$\textbf{h}^{k}$};
+\node [anchor=east] (prev) at ([xshift=-2em]h.west) {...};
+\node [anchor=west] (next) at ([xshift=2em]h2.east) {...};
+\draw [->,thick] ([xshift=0.1em]prev.east) -- ([xshift=-0.1em]h.west);
+\draw [->,thick] ([xshift=0.1em]h.east) -- ([xshift=-0.1em]s.west) node [pos=0.5,below] {\scriptsize{$\textbf{s}^k = \textbf{h}^{k-1}\textbf{w}^k$}};
+\draw [->,thick] ([xshift=0.1em]s.east) -- ([xshift=-0.1em]h2.west) node [pos=0.5,below] {\scriptsize{$\textbf{h}^k = f^k(\textbf{s}^{k})$}};
+\draw [->,thick] ([xshift=0.1em]h2.east) -- ([xshift=-0.1em]next.west);
+{
+\draw [<-,thick,red] ([xshift=0.1em,yshift=0.4em]h2.east) -- ([xshift=-0.1em,yshift=0.4em]next.west) node [pos=0.8,above] {\scriptsize{反向传播}};
+}
+{
+\draw [<-,thick,red] ([xshift=0.1em,yshift=0.4em]s.east) -- ([xshift=-0.1em,yshift=0.4em]h2.west) node [pos=0.5,above] {\scriptsize{反向传播}};
+}
+{
+\draw [<-,thick,red] ([xshift=0.1em,yshift=0.4em]h.east) -- ([xshift=-0.1em,yshift=0.4em]s.west) node [pos=0.5,above] {\scriptsize{反向传播}};
+}
+{
+\draw [->,thick,red,dashed] ([yshift=-0.1em]h.south) -- ([yshift=-1em]h.south) -- ([yshift=-1em]h2.south) -- ([yshift=-0.1em]h2.south);
+\node [anchor=north,red] (recur) at ([yshift=-1em]s.south) {\scriptsize{$k=k-1$重复上述过程}};
+}
+{
+\node [anchor=south] (h2label) at (h2.north) {$\frac{\partial L}{\partial \textbf{h}^{k}}$};
+}
+{
+\node [anchor=south] (slabel) at (s.north) {$\frac{\partial L}{\partial \textbf{s}^{k}}$};
+}
+{
+\node [anchor=south] (hlabel) at (h.north) {$\frac{\partial L}{\partial \textbf{h}^{k-1}}$, $\frac{\partial L}{\partial \textbf{w}^{k}}$};
+}
+\end{scope}
+\end{tikzpicture}
+%%%------------------------------------------------------------------------------------------------------------
--- a/Chapter9/Figures/fig-back-propagation-output1.tex
+++ b/Chapter9/Figures/fig-back-propagation-output1.tex
+%%%------------------------------------------------------------------------------------------------------------
+\begin{tikzpicture}
+\begin{scope}
+\node [anchor=west,minimum height=1.7em,fill=blue!20,draw] (s) at (0,0) {$\textbf{s}^{K}$};
+\node [anchor=west,minimum height=1.7em,fill=green!20,draw] (h2) at ([xshift=5.5em]s.east) {$\textbf{h}^{K}$};
+\node [anchor=west,minimum height=1.7em,fill=orange!20,draw] (l) at ([xshift=5.5em]h2.east) {$L$};
+\draw [->] (s.east) -- (h2.west);
+\draw [->] (h2.east) -- (l.west);
+\draw [->,very thick,red] ([yshift=1em,xshift=-0.1em]l.north) -- ([yshift=1em,xshift=0.1em]h2.north) node [pos=0.5,above] {\scriptsize{求梯度{$\frac{\partial L}{\partial \textbf{h}^K} = ?$}}};
+\draw [->,very thick,red] ([yshift=1em,xshift=-0.1em]h2.north) -- ([yshift=1em,xshift=0.1em]s.north) node [pos=0.5,above] {\scriptsize{求梯度{$\frac{\partial f^K(\textbf{s}^K)}{\partial \textbf{s}^K} = ?$}}};
+\draw [-,very thick,red] ([yshift=0.5em]l.north) -- ([yshift=1.5em]l.north);
+\draw [-,very thick,red] ([yshift=0.5em]h2.north) -- ([yshift=1.5em]h2.north);
+\draw [-,very thick,red] ([yshift=0.5em]s.north) -- ([yshift=1.5em]s.north);
+\end{scope}
+\end{tikzpicture}
+%%%------------------------------------------------------------------------------------------------------------
--- a/Chapter9/Figures/fig-back-propagation-output2.tex
+++ b/Chapter9/Figures/fig-back-propagation-output2.tex
+%%%------------------------------------------------------------------------------------------------------------
+\begin{tikzpicture}
+\begin{scope}
+\node [anchor=center,minimum height=1.7em,fill=yellow!20,draw] (h) at (0,0) {$\textbf{h}^{K-1}$};
+\node [anchor=west,minimum height=1.7em,fill=blue!20,draw] (s) at ([xshift=6.0em]h.east) {$\textbf{s}^{K}$};
+\draw [->] (h.east) -- (s.west);
+\node [anchor=south west,inner sep=2pt] (step100) at ([xshift=0.5em,yshift=-0.8em]h.north east) {\scriptsize{$\textbf{s}^K = \textbf{h}^{K-1} \textbf{w}^K$}};
+\node [anchor=south west] (slabel) at ([yshift=1em,xshift=0.3em]s.north) {\scriptsize{\red{\textbf{{已经得到：$\pi^K = \frac{\partial L}{\partial \textbf{s}^K}$}}}}};
+\draw [->,red] ([yshift=0.3em]slabel.south) .. controls +(south:0.5) and +(north:0.5) .. ([xshift=0.5em]s.north);
+{
+\draw [->,very thick,red] ([yshift=1em,xshift=-0.1em]s.north) -- ([yshift=1em,xshift=0.1em]h.north) node [pos=0.5,above] {\scriptsize{{$\frac{\partial L}{\partial \textbf{w}^K} = ?$, $\frac{\partial L}{\partial \textbf{h}^{K-1}} = ?$}}};
+\draw [-,very thick,red] ([yshift=0.5em]h.north) -- ([yshift=1.5em]h.north);
+\draw [-,very thick,red] ([yshift=0.5em]s.north) -- ([yshift=1.5em]s.north);
+}
+\end{scope}
+\end{tikzpicture}
+%%%------------------------------------------------------------------------------------------------------------
--- a/Chapter9/Figures/fig-back-propagation.tex
+++ b/Chapter9/Figures/fig-back-propagation.tex
+%%%------------------------------------------------------------------------------------------------------------
+\begin{tikzpicture}
+\begin{scope}
+\tikzstyle{layernode} = [draw,thick,fill=ugreen!30!white,blur shadow={shadow xshift=1pt,shadow yshift=-1pt}];
+\node [anchor=center,layernode,minimum height=4em,minimum width=1em] (layer01) at (0,0) {};
+\node [anchor=north west,layernode,minimum height=3em,minimum width=1em] (layer02) at ([xshift=3em]layer01.north east) {};
+\node [anchor=south west,layernode,minimum height=3em,minimum width=1em] (layer03) at ([xshift=7em]layer01.south east) {};
+\node [anchor=south west,layernode,minimum height=4em,minimum width=1em] (layer04) at ([xshift=11em]layer01.south east) {};
+\node [anchor=south west,layernode,minimum height=4em,minimum width=1em] (layer05) at ([xshift=3em]layer04.south east) {};
+\node [anchor=east] (input) at ([xshift=-1em]layer01.west){\scriptsize{输入}};
+\node [anchor=west] (output) at ([xshift=1em]layer05.east){\scriptsize{输出}};
+{
+\draw [<-,very thick,red] ([xshift=-1em,yshift=-0.3em]layer01.west) -- ([xshift=-0.1em,yshift=-0.3em]layer01.west)node [pos=0.5,above] {\small{\ding{178}}};
+\draw [<-,very thick,red] ([xshift=0.1em,yshift=-0.8em]layer01.north east) -- ([xshift=-0.1em,yshift=-0.8em]layer02.north west)node [pos=0.5,above] {\small{\ding{177}}};
+\draw [<-,very thick,red] ([xshift=0.1em,yshift=0.2em]layer01.south east) -- ([xshift=-0.1em,yshift=0.2em]layer03.south west)node [pos=0.5,below] {\small{\ding{176}}};
+\draw [<-,very thick,red] ([xshift=0.1em,yshift=-0.8em]layer02.north east) -- ([xshift=-0.1em,yshift=-0.8em]layer04.north west)node [pos=0.5,above] {\small{\ding{175}}};
+\draw [<-,very thick,red] ([xshift=0.1em,yshift=0.2em]layer03.south east) -- ([xshift=-0.1em,yshift=0.2em]layer04.south west)node [pos=0.5,below] {\small{\ding{174}}};
+\draw [<-,very thick,red] ([xshift=0.1em,yshift=-0.3em]layer04.east) -- ([xshift=-0.1em,yshift=-0.3em]layer05.west)node [pos=0.5,above] {\small{\ding{173}}};
+\draw [<-,very thick,red] ([xshift=0.1em,yshift=-0.3em]layer05.east) -- ([xshift=1.0em,yshift=-0.3em]layer05.east)node [pos=0.5,above] {\small{\ding{172}}};
+}
+{
+\draw [<-,thin] ([xshift=0.3em,yshift=-0.7em]layer04.east) .. controls +(-35:1) and +(145:1) .. ([xshift=-2em,yshift=-0.9em]layer05.south west) node [pos=1,below] {\scriptsize{反向：$h_{i}$ 处的梯度$\frac{\partial L}{\partial h_i}$}};
+}
+\end{scope}
+\end{tikzpicture}
+%%%------------------------------------------------------------------------------------------------------------
--- a/Chapter9/Figures/fig-bias.tex
+++ b/Chapter9/Figures/fig-bias.tex
+%%%------------------------------------------------------------------------------------------------------------
+\begin{tikzpicture}
+%% a two-layer neural network
+\begin{scope}
+{
+\draw [->,thick] (-1.8,0) -- (1.8,0);
+\draw [->,thick] (0,0) -- (0,2);
+\draw [-] (-0.05,1) -- (0.05,1);
+\node [anchor=east,inner sep=1pt] (label1) at (0,1) {\tiny{1}};
+\node [anchor=south east,inner sep=1pt] (label2) at (0,0) {\tiny{0}};
+\node [anchor=south east,inner sep=1pt] (labela) at (0.2,-0.5) {\footnotesize{(a)}};
+}
+{\node [anchor=north west,align=left] (wblabel) at (-1.8,2) {{\scriptsize{$w_1=100$}}\\[-0ex] \scriptsize{\ $b_1=0$}};}
+{\draw [-,very thick,ublue,rounded corners=0.1em] (-1.5,0) -- (0,0) -- (0,1) -- (1.5,1);}
+\end{scope}
+%---------------------------------------------------------------------------------------------
+\begin{scope}[xshift=1.6in]
+{
+\draw [->,thick] (-1.8,0) -- (1.8,0);
+\draw [->,thick] (0,0) -- (0,2);
+\draw [-] (-0.05,1) -- (0.05,1);
+\node [anchor=east,inner sep=1pt] (label1) at (0,1) {\tiny{1}};
+\node [anchor=south east,inner sep=1pt] (label2) at (0,0) {\tiny{0}};
+\node [anchor=south east,inner sep=1pt] (labelb) at (0.2,-0.5) {\footnotesize{(b)}};
+}
+{\node [anchor=north west,align=left] (wblabel) at (-1.8,2) {\scriptsize{$w_1=100$}\\[-0ex] {\scriptsize{\ $b_1=-2$}}};}
+{\draw [-,very thick,ublue,rounded corners=0.1em] (-1.5,0) -- (0.25,0) -- (0.25,1) -- (1.5,1);}
+\end{scope}
+%-----------------------------------------------------------------------------------------------
+\begin{scope}[xshift=3.2in]
+{
+\draw [->,thick] (-1.8,0) -- (1.8,0);
+\draw [->,thick] (0,0) -- (0,2);
+\draw [-] (-0.05,1) -- (0.05,1);
+\node [anchor=east,inner sep=1pt] (label1) at (0,1) {\tiny{1}};
+\node [anchor=south east,inner sep=1pt] (label2) at (0,0) {\tiny{0}};
+\node [anchor=south east,inner sep=1pt] (labelc) at (0.2,-0.5) {\footnotesize{(c)}};
+}
+{\node [anchor=north west,align=left] (wblabel) at (-1.8,2) {\scriptsize{$w_1=100$}\\[-0ex] {\scriptsize{\ $b_1=-4$}}};}
+{\draw [-,very thick,ublue,rounded corners=0.1em] (-1.5,0) -- (0.5,0) -- (0.5,1) -- (1.5,1);}
+\end{scope}
+\end{tikzpicture}
+%%%------------------------------------------------------------------------------------------------------------
--- a/Chapter9/Figures/fig-broadcast.tex
+++ b/Chapter9/Figures/fig-broadcast.tex
+%%%------------------------------------------------------------------------------------------------------------
+\begin{tikzpicture}
+\begin{scope}[xshift=0.6in]
+\setcounter{mycount1}{1}
+\draw[step=0.5cm,color=orange,thick] (-1,-0.5) grid (1,0.5);
+\foreach \y in {+0.25,-0.25}
+  \foreach \x in {-0.75,-0.25,0.25,0.75}{
+    \node [fill=orange!20,inner sep=0pt,minimum height=0.49cm,minimum width=0.49cm] at (\x,\y) {$\number\value{mycount1}$};
+    \addtocounter{mycount1}{1};
+  }
+\node [anchor=south] (varlabel) at (0,0.6) {$\mathbf{s}$};
+\node [anchor=north] (labelc) at (0,-0.7) {\footnotesize{(a)}};
+\end{scope}
+\begin{scope}[xshift=2.1in]
+\setcounter{mycount1}{1}
+\draw[step=0.5cm,color=ugreen,thick] (-1,-0.5) grid (1,0);
+\foreach \y in {-0.25}
+  \foreach \x in {-0.75,-0.25,0.25,0.75}{
+    \node [fill=green!20,inner sep=0pt,minimum height=0.48cm,minimum width=0.48cm] at (\x,\y) {$1$};
+    \addtocounter{mycount1}{1};
+  }
+\node [anchor=south] (varlabel) at (0,0.1) {$\mathbf{b}$};
+\node [anchor=north] (labelc) at (0,-0.7) {\footnotesize{(b)}};
+\end{scope}
+\begin{scope}[yshift=-1in]
+\setcounter{mycount1}{1}
+\draw[step=0.5cm,color=orange,thick] (-1,-0.5) grid (1,0.5);
+\foreach \y in {+0.25,-0.25}
+  \foreach \x in {-0.75,-0.25,0.25,0.75}{
+    \node [fill=orange!20,inner sep=0pt,minimum height=0.49cm,minimum width=0.49cm] at (\x,\y) {$\number\value{mycount1}$};
+    \addtocounter{mycount1}{1};
+  }
+\node [anchor=south] (varlabel) at (0,0.6) {$\mathbf{s}$};
+\end{scope}
+\begin{scope}[yshift=-1in,xshift=1.5in]
+\setcounter{mycount1}{1}
+\draw[step=0.5cm,color=ugreen,thick] (-1,-0.5) grid (1,0.5);
+\foreach \y in {+0.25}
+  \foreach \x in {-0.75,-0.25,0.25,0.75}{
+    \node [fill=green!20,inner sep=0pt,minimum height=0.49cm,minimum width=0.49cm] at (\x,\y) {$1$};
+    \addtocounter{mycount1}{1};
+  }
+  \foreach \y in {-0.25}
+  \foreach \x in {-0.75,-0.25,0.25,0.75}{
+    \node [fill=purple!20,inner sep=0pt,minimum height=0.49cm,minimum width=0.49cm] at (\x,\y) {$1$};
+    \addtocounter{mycount1}{1};
+  }
+\node [anchor=center] (plabel) at (-4.5em,0) {\huge{$\mathbf{+}$}};
+\node [anchor=south] (varlabel) at (0,0.6) {$\mathbf{b}$};
+\node [anchor=north] (labelc) at (0,-0.7) {\footnotesize{(c)}};
+\end{scope}
+\begin{scope}[yshift=-1in,xshift=3in]
+\setcounter{mycount1}{2}
+\draw[step=0.5cm,color=orange,thick] (-1,-0.5) grid (1,0.5);
+\foreach \y in {+0.25,-0.25}
+  \foreach \x in {-0.75,-0.25,0.25,0.75}{
+    \node [fill=orange!20,inner sep=0pt,minimum height=0.49cm,minimum width=0.49cm] at (\x,\y) {$\number\value{mycount1}$};
+    \addtocounter{mycount1}{1};
+  }
+\node [anchor=center] (plabel) at (-4.5em,0) {\huge{$\mathbf{=}$}};
+\node [anchor=south] (varlabel) at (0,0.6) {$\mathbf{s}+\mathbf{b}$};
+\end{scope}
+\end{tikzpicture}
+%%%------------------------------------------------------------------------------------------------------------
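The figure above illustrates broadcasting: the row vector b is replicated along the missing dimension so that it can be added to every row of s. A minimal sketch of that rule in plain Python, assuming s is a list of rows and b is a single row of the same width (the function name `broadcast_add` is hypothetical and not part of NiuTensor):

```python
def broadcast_add(s, b):
    """Add the row vector b to every row of the matrix s.

    This mimics broadcasting: b is conceptually copied once per row of s
    before the element-wise addition is performed.
    """
    if any(len(row) != len(b) for row in s):
        raise ValueError("each row of s must have the same length as b")
    return [[x + y for x, y in zip(row, b)] for row in s]


# Same values as in the figure: s holds 1..8, b is a row of ones.
s = [[1, 2, 3, 4],
     [5, 6, 7, 8]]
b = [1, 1, 1, 1]
print(broadcast_add(s, b))  # [[2, 3, 4, 5], [6, 7, 8, 9]]
```

The result matches panel (c) of the figure, where s + b holds the values 2 to 9.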
--- a/Chapter9/Figures/fig-code-back-propagation-1.tex
+++ b/Chapter9/Figures/fig-code-back-propagation-1.tex
+%%%------------------------------------------------------------------------------------------------------------
+ \begin{tcolorbox}
+[bicolor,sidebyside,width=13cm,righthand width=4cm,size=title,frame engine=empty,
+ colback=blue!10!white,colbacklower=black!5!white]
+ {\scriptsize
+\begin{tabbing}
+\texttt{XTensor x, y, gold, h[5], w[5], s[5];} \\
+\texttt{XTensor dh[5], dw[5], ds[5];} \\
+\texttt{...} // 前向过程 \\
+\texttt{h[0] = x;} \\
+\texttt{y = h[4];} \\
+\texttt{} \\
+\texttt{CrossEntropyBackward(dh[4], y, gold);} \\
+\texttt{SoftmaxBackward(y, s[4], dh[4], ds[4]);}\\
+\texttt{MMul(h[3], {\scriptsize X\_TRANS}, ds[4], {\scriptsize X\_NOTRANS}, dw[4]);}\\
+\texttt{MMul(ds[4], {\scriptsize X\_NOTRANS}, w[4], {\scriptsize X\_TRANS}, dh[3]);}\\
+\texttt{} \\
+\texttt{dh[2] = dh[3];}\\
+\texttt{ReluBackward(h[2], s[2], dh[2], ds[2]);}\\
+\texttt{MMul(h[1], {\scriptsize X\_TRANS}, ds[2], {\scriptsize X\_NOTRANS}, dw[2]);}\\
+\texttt{MMul(ds[2], {\scriptsize X\_NOTRANS}, w[2], {\scriptsize X\_TRANS}, dh[1]);}\\
+\texttt{} \\
+\texttt{dh[1] = dh[1] + dh[3];}\\
+\texttt{...} // 继续反向传播 \\
+\texttt{} \\
+\texttt{for(unsigned i = 0; i < 5; i++)\{} \\
+\texttt{} \ \ \ \ ... // 通过{\texttt{dw[i]}}访问参数的梯度\\
+\texttt{\}}
+\end{tabbing}
+}
+\tcblower
+\begin{center}
+\begin{tikzpicture}
+\node [anchor=south,draw,rounded corners,inner sep=2pt,minimum width=8em,minimum height=1.2em,fill=red!30!white,blur shadow={shadow xshift=1pt,shadow yshift=-1pt}] (h1) at (0,0) {\scriptsize{x (input)}};
+\node [anchor=south,draw,rounded corners,inner sep=2pt,minimum width=8em,minimum height=1.2em,fill=green!30!white,blur shadow={shadow xshift=1pt,shadow yshift=-1pt}] (h2) at ([yshift=1.5em]h1.north) {\scriptsize{h1 = Relu(x * w1)}};
+\node [anchor=south,draw,rounded corners,inner sep=2pt,minimum width=8em,minimum height=1.2em,fill=green!30!white,blur shadow={shadow xshift=1pt,shadow yshift=-1pt}] (h3) at ([yshift=1.5em]h2.north) {\scriptsize{h2 = Relu(h1 * w2)}};
+\node [anchor=south,draw,rounded corners,inner sep=2pt,minimum width=8em,minimum height=1.2em,fill=green!30!white,blur shadow={shadow xshift=1pt,shadow yshift=-1pt}] (h4) at ([yshift=1.5em]h3.north) {\scriptsize{h3 = h2 + h1}};
+{\draw [<-,very thick,red] (h1.north) -- (h2.south);}
+{\draw [<-,very thick,red] (h2.north) -- (h3.south);}
+{\draw [<-,very thick,red] (h3.north) -- (h4.south);}
+{\draw [<-,very thick,red,rounded corners] (h2.east) -- ([xshift=0.5em]h2.east) -- ([xshift=0.5em,yshift=0.5em]h3.north east) -- ([xshift=-2em,yshift=0.5em]h3.north east) -- ([xshift=-2em,yshift=1.5em]h3.north east);}
+\node [anchor=south,draw,rounded corners,inner sep=2pt,minimum width=8.0em,minimum height=1.2em,fill=red!30!white,blur shadow={shadow xshift=1pt,shadow yshift=-1pt}] (slayer) at ([yshift=1.5em]h4.north) {\tiny{h4 = Softmax(h3 * w4) (output)}};
+\node [anchor=south] (losslabel) at (slayer.north) {\scriptsize{\textbf{Cross Entropy Loss}}};
+{\draw [<-,very thick,red] (h4.north) -- (slayer.south);}
+\end{tikzpicture}
+\end{center}
+\end{tcolorbox}
+%%%------------------------------------------------------------------------------------------------------------
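The listing above propagates gradients by hand through a network with a skip connection (h3 = h2 + h1), so the gradient at h1 is the sum of two contributions: one arriving through h2 and one arriving directly from h3. A minimal scalar sketch of that accumulation in Python follows. It is an illustration under simplifying assumptions, not the NiuTensor code: all quantities are scalars, and a squared-error loss replaces the Softmax and cross-entropy output layer of the figure. The names `forward` and `backward` are hypothetical.

```python
def relu(v):
    return v if v > 0.0 else 0.0


def forward(x, w1, w2, w4, gold):
    """Scalar forward pass: two ReLU layers, a skip connection, a linear output."""
    s1 = x * w1
    h1 = relu(s1)
    s2 = h1 * w2
    h2 = relu(s2)
    h3 = h2 + h1                    # skip connection
    y = h3 * w4
    loss = 0.5 * (y - gold) ** 2    # assumption: squared error, not cross entropy
    return s1, h1, s2, h2, h3, y, loss


def backward(x, w1, w2, w4, gold):
    """Manual back-propagation; returns (dL/dw1, dL/dw2, dL/dw4)."""
    s1, h1, s2, h2, h3, y, _ = forward(x, w1, w2, w4, gold)
    dy = y - gold
    dw4 = h3 * dy
    dh3 = dy * w4
    dh2 = dh3                           # h3 = h2 + h1 passes the gradient to h2 ...
    ds2 = dh2 if s2 > 0.0 else 0.0      # ReLU backward
    dw2 = h1 * ds2
    dh1 = ds2 * w2                      # contribution through h2
    dh1 = dh1 + dh3                     # ... and directly to h1 (accumulate)
    ds1 = dh1 if s1 > 0.0 else 0.0      # ReLU backward
    dw1 = x * ds1
    return dw1, dw2, dw4


print(backward(1.0, 0.5, 2.0, -1.0, 0.5))  # (6.0, 1.0, -3.0)
```

Omitting the accumulation step would give dL/dw1 = 4.0 instead of 6.0 for these inputs, which is why the gradient from the skip connection has to be added and not overwritten.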
--- a/Chapter9/Figures/fig-code-back-propagation-2.tex
+++ b/Chapter9/Figures/fig-code-back-propagation-2.tex
+%%%------------------------------------------------------------------------------------------------------------
+\begin{tcolorbox}
+[bicolor,sidebyside,width=13cm,righthand width=4cm,size=title,frame engine=empty,
+ colback=blue!10!white,colbacklower=black!5!white]
+ {\scriptsize
+\begin{tabbing}
+\texttt{XTensor x, loss, gold, h[5], w[5], b[5];} \\
+\texttt{...} \\
+\texttt{} \\
+\texttt{h[1] = Relu(MMul(x, w[1]) + b[1]);} \\
+\texttt{h[2] = Relu(MMul(h[1], w[2]) + b[2]);} \\
+\texttt{h[3] = HardTanH(h[2]);} \\
+\texttt{h[4] = Softmax(MMul(h[3], w[4]));} \\
+\texttt{loss = CrossEntropy(h[4], gold);} \\
+\texttt{} \\
+\texttt{XNet net;}\\
+{\texttt{net.Backward(loss);} //一行代码实现自动微分}\\
+\texttt{} \\
+\texttt{for(unsigned i = 0; i < 5; i++)\{} \\
+\texttt{} \ \ \ \ ... // 通过{\texttt{w[i].grad}}访问参数的梯度\\
+\texttt{\}}
+\end{tabbing}
+}
+\tcblower
+\begin{center}
+\begin{tikzpicture}
+\node [anchor=south,draw,rounded corners,inner sep=2pt,minimum width=8em,minimum height=1.0em,fill=red!30!white,blur shadow={shadow xshift=1pt,shadow yshift=-1pt}] (h1) at (0,0) {\tiny{x (input)}};
+\node [anchor=south,draw,rounded corners,inner sep=2pt,minimum width=8em,minimum height=1.0em,fill=green!30!white,blur shadow={shadow xshift=1pt,shadow yshift=-1pt}] (h2) at ([yshift=1.0em]h1.north) {\tiny{h1 = Relu(x * w1 + b1)}};
+\node [anchor=south,draw,rounded corners,inner sep=2pt,minimum width=8em,minimum height=1.0em,fill=green!30!white,blur shadow={shadow xshift=1pt,shadow yshift=-1pt}] (h3) at ([yshift=1.0em]h2.north) {\tiny{h2 = Relu(h1 * w2 + b2)}};
+\node [anchor=south,draw,rounded corners,inner sep=2pt,minimum width=8em,minimum height=1.0em,fill=green!30!white,blur shadow={shadow xshift=1pt,shadow yshift=-1pt}] (h4) at ([yshift=1.0em]h3.north) {\tiny{h3 = HardTanh(h2)}};
+\draw [->,very thick] (h1.north) -- (h2.south);
+\draw [->,very thick] (h2.north) -- (h3.south);
+\draw [->,very thick] (h3.north) -- (h4.south);
+\node [anchor=south,draw,rounded corners,inner sep=2pt,minimum width=8.0em,minimum height=1.0em,fill=red!30!white,blur shadow={shadow xshift=1pt,shadow yshift=-1pt}] (slayer) at ([yshift=1.0em]h4.north) {\tiny{h4 = Softmax(h3 * w4) (output)}};
+\node [anchor=south] (losslabel) at (slayer.north) {\scriptsize{\textbf{Cross Entropy Loss}}};
+\draw [->,very thick] (h4.north) -- (slayer.south);
+\end{tikzpicture}
+\end{center}
+\end{tcolorbox}
+%%%------------------------------------------------------------------------------------------------------------
--- a/Chapter9/Figures/fig-code-fnnlm.tex
+++ b/Chapter9/Figures/fig-code-fnnlm.tex
+%%%------------------------------------------------------------------------------------------------------------
+\begin{tcolorbox}
+[bicolor,sidebyside,width=12cm,righthand width=4cm,size=title,frame engine=empty,
+ colback=blue!10!white,colbacklower=black!5!white]
+ {\scriptsize
+\begin{tabbing}
+\texttt{XTensor w[3], e[3], e01, eAll, h0, y;} \\
+\texttt{XTensor C, H, d, U;} \\
+\texttt{...}\\
+\texttt{} \\
+\texttt{for(unsigned i = 0; i < 3; i++)} \\
+\texttt{\ \ \ \ e[i] = MMul(w[i], C);}\\
+\texttt{e01 = Concatenate(e[0], e[1], -1);}\\
+\texttt{eAll = Concatenate(e01, e[2], -1);}\\
+\texttt{} \\
+\texttt{h0 = TanH(MMul(eAll, H) + d);}\\
+\texttt{y = Softmax(MMul(h0, U));}\\
+\texttt{} \\
+\texttt{for(unsigned k = 0; k < size; k++)\{} \\
+\texttt{} \ \ \ \ ... // {\texttt{y}}的第$k$元素表示 $\textrm{P}(w|...)$\\
+\texttt{} \ \ \ \ ... // $w$为词汇表里第$k$个词\\
+\texttt{\}}
+\end{tabbing}
+}
+\tcblower
+\begin{center}
+\begin{tikzpicture}
+\begin{scope}
+\node [anchor=west] (w0) at (0,0) {\scriptsize{$w_{i-3}$}};
+\node [anchor=west] (w1) at ([xshift=0.5em]w0.east) {\scriptsize{$w_{i-2}$}};
+\node [anchor=west] (w2) at ([xshift=0.5em]w1.east) {\scriptsize{$w_{i-1}$}};
+\node [anchor=north] (index0) at ([yshift=0.5em]w0.south) {\tiny(index)};
+\node [anchor=north] (index1) at ([yshift=0.5em]w1.south) {\tiny(index)};
+\node [anchor=north] (index2) at ([yshift=0.5em]w2.south) {\tiny(index)};
+\node [anchor=south,draw,inner sep=3pt,align=left] (e0) at ([yshift=1.0em]w0.north) {\tiny{$e_0:$}\\\tiny{$w_{i-3} \textbf{C}$}};
+\node [anchor=south,draw,inner sep=3pt,align=left] (e1) at ([yshift=1.0em]w1.north) {\tiny{$e_1:$}\\\tiny{$w_{i-2} \textbf{C}$}};
+\node [anchor=south,draw,inner sep=3pt,align=left] (e2) at ([yshift=1.0em]w2.north) {\tiny{$e_2:$}\\\tiny{$w_{i-1} \textbf{C}$}};
+\node [anchor=south,draw,minimum width=9em,inner sep=3pt] (h0) at ([yshift=1.5em]e1.north) {\tiny{$h_0=\textrm{Tanh}([e_0,e_1,e_2] \textbf{H} + \textbf{d})$}};
+\node [anchor=south,draw,minimum width=9em,inner sep=3pt] (h1) at ([yshift=1.5em]h0.north) {\tiny{$y=\textrm{Softmax}(h_0 \textbf{U})$}};
+\node [anchor=south] (ylabel) at ([yshift=1em]h1.north) {\scriptsize{$\textrm{P}(w_i|w_{i-3}w_{i-2}w_{i-1})$}};
+\draw [->] ([yshift=0.1em]w0.north) -- ([yshift=-0.1em]e0.south);
+\draw [->] ([yshift=0.1em]w1.north) -- ([yshift=-0.1em]e1.south);
+\draw [->] ([yshift=0.1em]w2.north) -- ([yshift=-0.1em]e2.south);
+\draw [->] ([yshift=0.1em]e0.north) -- ([xshift=-2em,yshift=-0.1em]h0.south);
+\draw [->] ([yshift=0.1em]e1.north) -- ([yshift=-0.1em]h0.south);
+\draw [->] ([yshift=0.1em]e2.north) -- ([xshift=2em,yshift=-0.1em]h0.south);
+\draw [->] ([yshift=0.1em]h0.north) -- ([yshift=-0.1em]h1.south);
+\draw [->] ([yshift=0.1em]h1.north) -- ([yshift=-0.1em]ylabel.south);
+\end{scope}
+\end{tikzpicture}
+\end{center}
+\end{tcolorbox}
+%%%------------------------------------------------------------------------------------------------------------
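The forward pass drawn in fig-code-fnnlm.tex can be traced with a few lines of plain Python. This is only a sketch under stated assumptions: it does not use NiuTensor, the toy sizes (`VOCAB`, `EMB`, `HID`) and the random initialisation are hypothetical, and each context word is represented as a one-hot row vector so that `w[i] * C` picks one row of `C`, as in the figure.

```python
import math
import random

def matvec_row(x, M):
    """Row vector x (length n) times matrix M (n x m) gives a row vector of length m."""
    return [sum(x[i] * M[i][j] for i in range(len(x))) for j in range(len(M[0]))]

def softmax(z):
    """Numerically stable softmax over a list of scores."""
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [v / total for v in exps]

def one_hot(index, size):
    return [1.0 if i == index else 0.0 for i in range(size)]

def fnnlm_forward(context, C, H, d, U):
    """context: three word indices (w_{i-3}, w_{i-2}, w_{i-1})."""
    vocab_size = len(C)
    e = []
    for idx in context:
        # e_i = w_i C, then the three embeddings are concatenated
        e.extend(matvec_row(one_hot(idx, vocab_size), C))
    # h0 = Tanh([e0, e1, e2] H + d)
    h0 = [math.tanh(s + b) for s, b in zip(matvec_row(e, H), d)]
    # y = Softmax(h0 U)
    return softmax(matvec_row(h0, U))

def random_matrix(rows, cols, rng):
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

rng = random.Random(0)
VOCAB, EMB, HID = 5, 4, 6          # hypothetical toy sizes
C = random_matrix(VOCAB, EMB, rng)
H = random_matrix(3 * EMB, HID, rng)
d = [0.0] * HID
U = random_matrix(HID, VOCAB, rng)

y = fnnlm_forward([1, 3, 0], C, H, d, U)
print(y)  # y[k] plays the role of P(w | context) for the k-th vocabulary word
```

The output has one entry per vocabulary word and sums to 1, which is what the final loop in the figure's listing relies on.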
--- a/Chapter9/Figures/fig-code-niutensor-rnn.tex
+++ b/Chapter9/Figures/fig-code-niutensor-rnn.tex
+%%%------------------------------------------------------------------------------------------------------------
+ \begin{tcolorbox}
+[bicolor,sidebyside,width=11cm,righthand width=4cm,size=title,frame engine=empty,
+ colback=blue!10!white,colbacklower=black!5!white]
+ {\scriptsize
+\begin{tabbing}
+\texttt{XTensor x[3], y[3], r, wh;} \\
+\texttt{XTensor h1, h2, w1, b1, h3, h4;} \\
+\texttt{XList splits;} \\
+\texttt{...} \\
+\texttt{for(unsigned i = 0; i < 3; i++)\{} \\
+\texttt{\hspace{2em}r = Concatenate(x[i] + r) * wh;}\\
+\texttt{\hspace{2em}splits.Add(\&r);}\\
+\texttt{\}}\\
+\texttt{} \\
+\texttt{h1 = Merge(splits, 0);}\\
+\texttt{h2 = Relu(h1 * w1 + b1);}\\
+\texttt{h3 = h1 + h2;} \\
+\texttt{h4 = Softmax(h3);} \\
+\texttt{} \\
+\texttt{Split(h4, splits, 0);} \\
+\texttt{} \\
+\texttt{for(unsigned i = 0; i < 3; i++)\{} \\
+\texttt{\hspace{2em}y[i] = *(XTensor*)splits.Get(i);}\\
+\texttt{\hspace{2em}y[i].Dump(stdout);}\\
+\texttt{\}}
+\end{tabbing}
+}
+\tcblower
+\begin{center}
+\begin{tikzpicture}
+\node [draw,circle,inner sep=1pt,fill=red!30!white,blur shadow={shadow xshift=1pt,shadow yshift=-1pt}] (x1) at (0,0) {\footnotesize{$\textrm{x}_1$}};
+\node [anchor=west,draw,circle,inner sep=1pt,fill=red!30!white,blur shadow={shadow xshift=1pt,shadow yshift=-1pt}] (x2) at ([xshift=2em]x1.east) {\footnotesize{$\textrm{x}_2$}};
+\node [anchor=west,draw,circle,inner sep=1pt,fill=red!30!white,blur shadow={shadow xshift=1pt,shadow yshift=-1pt}] (x3) at ([xshift=2em]x2.east) {\footnotesize{$\textrm{x}_3$}};
+\node [anchor=south,draw,rounded corners,inner sep=2pt,minimum width=2.5em,fill=green!30!white,blur shadow={shadow xshift=1pt,shadow yshift=-1pt}] (rlayer1) at ([yshift=1em]x1.north) {\tiny{rlayer}};
+\node [anchor=south,draw,rounded corners,inner sep=2pt,minimum width=2.5em,fill=green!30!white,blur shadow={shadow xshift=1pt,shadow yshift=-1pt}] (rlayer2) at ([yshift=1em]x2.north) {\tiny{rlayer}};
+\node [anchor=south,draw,rounded corners,inner sep=2pt,minimum width=2.5em,fill=green!30!white,blur shadow={shadow xshift=1pt,shadow yshift=-1pt}] (rlayer3) at ([yshift=1em]x3.north) {\tiny{rlayer}};
+\draw [->,thick] (x1.north) -- (rlayer1.south);
+\draw [->,thick] (x2.north) -- (rlayer2.south);
+\draw [->,thick] (x3.north) -- (rlayer3.south);
+\draw [->,thick] (rlayer1.east) -- (rlayer2.west);
+\draw [->,thick] (rlayer2.east) -- (rlayer3.west);
+\draw [->,thick] (rlayer1.north) -- ([yshift=1em]rlayer1.north);
+\draw [->,thick] (rlayer2.north) -- ([yshift=1em]rlayer2.north);
+\draw [->,thick] (rlayer3.north) -- ([yshift=1em]rlayer3.north);
+{
+\node [anchor=south,draw,rounded corners,inner sep=2pt,minimum width=9.4em,minimum height=1.0em,fill=green!30!white,blur shadow={shadow xshift=1pt,shadow yshift=-1pt}] (h1) at ([yshift=1em]rlayer2.north) {\scriptsize{h1 = Merge($\cdot$)}};
+\node [anchor=south,draw,rounded corners,inner sep=2pt,minimum width=9.4em,minimum height=1.0em,fill=green!30!white,blur shadow={shadow xshift=1pt,shadow yshift=-1pt}] (h2) at ([yshift=1em]h1.north) {\scriptsize{h2 = Relu($\cdot$)}};
+\node [anchor=south,draw,rounded corners,inner sep=2pt,minimum width=9.4em,minimum height=1.0em,fill=green!30!white,blur shadow={shadow xshift=1pt,shadow yshift=-1pt}] (h3) at ([yshift=1em]h2.north) {\scriptsize{h3 = Sum($\cdot$)}};
+\node [anchor=south,draw,rounded corners,inner sep=2pt,minimum width=9.4em,minimum height=1.0em,fill=green!30!white,blur shadow={shadow xshift=1pt,shadow yshift=-1pt}] (h4) at ([yshift=1em]h3.north) {\scriptsize{h4 = Softmax($\cdot$)}};
+\draw [->,thick] (h1.north) -- (h2.south);
+\draw [->,thick] (h2.north) -- (h3.south);
+\draw [->,thick] (h3.north) -- (h4.south);
+\draw [->,thick,rounded corners] (h1.east) -- ([xshift=0.5em]h1.east) -- ([xshift=0.5em,yshift=0.5em]h2.north east) -- ([xshift=-2em,yshift=0.5em]h2.north east) -- ([xshift=-2em,yshift=1em]h2.north east);
+}
+{
+\node [anchor=south,draw,rounded corners,inner sep=2pt,minimum width=9.4em,minimum height=1.0em,fill=green!30!white,blur shadow={shadow xshift=1pt,shadow yshift=-1pt}] (slayer) at ([yshift=1em]h4.north) {\scriptsize{Split($\cdot$)}};
+\node [anchor=south,draw,circle,inner sep=1pt,fill=red!30!white,blur shadow={shadow xshift=1pt,shadow yshift=-1pt}] (y2) at ([yshift=1em]slayer.north) {\footnotesize{$\textrm{y}_2$}};
+\node [anchor=east,draw,circle,inner sep=1pt,fill=red!30!white,blur shadow={shadow xshift=1pt,shadow yshift=-1pt}] (y1) at ([xshift=-2em]y2.west) {\footnotesize{$\textrm{y}_1$}};
+\node [anchor=west,draw,circle,inner sep=1pt,fill=red!30!white,blur shadow={shadow xshift=1pt,shadow yshift=-1pt}] (y3) at ([xshift=2em]y2.east) {\footnotesize{$\textrm{y}_3$}};
+\draw [<-,thick] (y1.south) -- ([yshift=-1em]y1.south);
+\draw [<-,thick] (y2.south) -- ([yshift=-1em]y2.south);
+\draw [<-,thick] (y3.south) -- ([yshift=-1em]y3.south);
+}
+{
+\draw [->,thick] (h4.north) -- (slayer.south);
+}
+\end{tikzpicture}
+\end{center}
+\end{tcolorbox}
+%%%------------------------------------------------------------------------------------------------------------
--- a/Chapter9/Figures/fig-code-out.tex
+++ b/Chapter9/Figures/fig-code-out.tex
+%%%------------------------------------------------------------------------------------------------------------
+\begin{tcolorbox}[enhanced,width=11cm,frame engine=empty,boxrule=0.1mm,size=title,colback=black!10!white]
+\begin{flushleft}
+{\scriptsize
+\begin{tabbing}
+\texttt{order=2 dimsize=2,2 dtype=X\_FLOAT dense=1.000000} \\
+\texttt{3.605762e-001 2.992340e-001 1.393780e-001 7.301248e-001}
+\end{tabbing}
+}
+\end{flushleft}
+\end{tcolorbox}
+%%%------------------------------------------------------------------------------------------------------------
--- a/Chapter9/Figures/fig-code-tensor-define-2.tex
+++ b/Chapter9/Figures/fig-code-tensor-define-2.tex
+%%%------------------------------------------------------------------------------------------------------------
+\begin{tcolorbox}[enhanced,width=12cm,frame engine=empty,boxrule=0.1mm,size=title,colback=blue!10!white]
+\begin{flushleft}
+{\scriptsize
+\begin{tabbing}
+\texttt{XTensor tensor;} \hspace{14em} \= // declare the tensor \\
+\texttt{int sizes[6] = \{2,3,4,2,3,4\};} \> // the tensor shape is 2*3*4*2*3*4 \\
+\texttt{InitTensor(\&tensor, 6, sizes, X\_FLOAT);} \> // define a 6th-order tensor of shape sizes
+\end{tabbing}
+}
+\end{flushleft}
+\end{tcolorbox}
+\hspace{0.1in} \scriptsize{(a) Defining a tensor with NiuTensor}
+\\
+\begin{tcolorbox}[enhanced,width=12cm,frame engine=empty,boxrule=0.1mm,size=title,colback=blue!10!white]
+\begin{flushleft}
+{\scriptsize
+\begin{tabbing}
+\texttt{XTensor a, b, c;} \hspace{13.5em} \= // declare tensors a, b, c \\
+\texttt{InitTensor1D(\&a, 10, X\_INT);} \> // 10-dimensional integer vector\\
+\texttt{InitTensor1D(\&b, 10);} \> // 10-dimensional vector, default type (float)\\
+\texttt{InitTensor4D(\&c, 10, 20, 30, 40);} \> // 4th-order tensor of shape 10*20*30*40 (float)
+\end{tabbing}
+}
+\end{flushleft}
+\end{tcolorbox}
+\hspace{0.1in} \scriptsize{(b) Shorthand ways to define tensors}
+\\
+\begin{tcolorbox}[enhanced,width=12cm,frame engine=empty,boxrule=0.1mm,size=title,colback=blue!10!white]
+\begin{flushleft}
+{\scriptsize
+\begin{tabbing}
+\texttt{XTensor tensorGPU;} \hspace{12.5em} \= // declare the tensor \\
+\texttt{InitTensor2D(\&tensorGPU, 10, 20,} $\backslash$ \> // define the tensor on GPU 0 \\
+\hspace{6.7em} \texttt{X\_FLOAT, 0);}
+\end{tabbing}
+}
+\end{flushleft}
+\end{tcolorbox}
+\hspace{0.1in} \scriptsize{(c) Defining a tensor on a GPU}
+%%%------------------------------------------------------------------------------------------------------------
--- a/Chapter9/Figures/fig-code-tensor-define.tex
+++ b/Chapter9/Figures/fig-code-tensor-define.tex
+%------------------------------------------------------------------------------------------------------------
+\begin{tcolorbox}[enhanced,width=12cm,frame engine=empty,boxrule=0.1mm,size=title,colback=blue!10!white]
+\begin{flushleft}
+{\scriptsize
+\begin{tabbing}
+\texttt{\#include "source/tensor/XTensor.h"} \hspace{6em} \= // include the header that defines XTensor \\
+\texttt{using namespace nts;} \> // use the nts namespace \\
+\ \\
+\texttt{int main(int argc, const char ** argv)\{} \\
+\ \ \ \ \texttt{XTensor tensor;} \> // declare the tensor \\
+\ \ \ \ \texttt{InitTensor2D(\&tensor, 2, 2, X\_FLOAT);} \> // define the tensor as a 2*2 matrix \\
+\ \ \ \ \texttt{tensor.SetDataRand();} \> // initialize from a uniform distribution on [0,1] \\
+\ \ \ \ \texttt{tensor.Dump(stdout);} \> // print the tensor contents \\
+\ \ \ \ \texttt{return 0;}\\
+\texttt{\}}
+\end{tabbing}
+}
+\end{flushleft}
+\end{tcolorbox}
+%%%------------------------------------------------------------------------------------------------------------
--- a/Chapter9/Figures/fig-code-tensor-operation.tex
+++ b/Chapter9/Figures/fig-code-tensor-operation.tex
+%%%------------------------------------------------------------------------------------------------------------
+\begin{tcolorbox}[enhanced,width=12cm,frame engine=empty,boxrule=0.1mm,size=title,colback=blue!10!white]
+\begin{flushleft}
+{\scriptsize
+\begin{tabbing}
+\texttt{XTensor a, b, c, d, e;} \hspace{9em} \= // declare the tensors \\
+\texttt{InitTensor3D(\&a, 2, 3, 4);} \> // a is a 3rd-order tensor of shape 2*3*4 \\
+\texttt{InitTensor3D(\&b, 2, 3, 4);} \> // b is a 3rd-order tensor of shape 2*3*4 \\
+\texttt{InitTensor3D(\&c, 2, 3, 4);} \> // c is a 3rd-order tensor of shape 2*3*4 \\
+\texttt{a.SetDataRand();} \> // randomly initialize a \\
+\texttt{b.SetDataRand();} \> // randomly initialize b \\
+\texttt{c.SetDataRand();} \> // randomly initialize c \\
+\texttt{d = a + b * c;} \> // d is assigned a + b * c \\
+\texttt{d = ((a + b) * d - b / c ) * d;} \> // d can be reused inside nested expressions \\
+\texttt{e = Sigmoid(d);} \> // e is d passed through the Sigmoid activation
+\end{tabbing}
+}
+\end{flushleft}
+\end{tcolorbox}
+\hspace{0.1in} \scriptsize{(a) First-order operations on tensors}
+\\
+\begin{tcolorbox}[enhanced,width=12cm,frame engine=empty,boxrule=0.1mm,size=title,colback=blue!10!white]
+\begin{flushleft}
+{\scriptsize
+\begin{tabbing}
+\texttt{XTensor a, b, c;} \hspace{12.0em} \= // declare the tensors \\
+\texttt{InitTensor4D(\&a, 2, 2, 3, 4);} \> // a is a 4th-order tensor of shape 2*2*3*4 \\
+\texttt{InitTensor2D(\&b, 4, 5);} \> // b is a 4*5 matrix \\
+\texttt{a.SetDataRand();} \> // randomly initialize a \\
+\texttt{b.SetDataRand();} \> // randomly initialize b \\
+\texttt{c = MMul(a, b);} \> // the product is a 4th-order tensor of shape 2*2*3*5
+\end{tabbing}
+}
+\end{flushleft}
+\end{tcolorbox}
+\hspace{0.1in} \scriptsize{(b) Matrix multiplication between tensors}
+%%%------------------------------------------------------------------------------------------------------------
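The shape comment in panel (b) of fig-code-tensor-operation.tex (a 2*2*3*4 tensor times a 4*5 matrix gives a 2*2*3*5 tensor) follows from simple shape arithmetic. The helper below is a hypothetical illustration, not part of NiuTensor: it assumes the leading dimensions of the left operand act as batch dimensions and only its last dimension is contracted with the first dimension of the matrix.

```python
def mmul_shape(a_shape, b_shape):
    """Shape of a (batched) matrix product, assuming b is a plain 2-D matrix.

    The last dimension of a is contracted with the first dimension of b;
    every other dimension of a is carried over unchanged.
    """
    if len(a_shape) < 1 or len(b_shape) != 2:
        raise ValueError("expected a tensor and a 2-D matrix")
    if a_shape[-1] != b_shape[0]:
        raise ValueError(
            f"inner dimensions differ: {a_shape[-1]} vs {b_shape[0]}")
    return tuple(a_shape[:-1]) + (b_shape[1],)

result = mmul_shape((2, 2, 3, 4), (4, 5))
print(result)
```

With the shapes from the listing, the contracted dimension of size 4 disappears and the trailing dimension becomes 5.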
--- a/Chapter9/Figures/fig-corresponence-between-matrix-element-and-output.tex
+++ b/Chapter9/Figures/fig-corresponence-between-matrix-element-and-output.tex
+%%%------------------------------------------------------------------------------------------------------------
+\begin{tikzpicture}
+\begin{scope}
+\tikzstyle{neuronnode} = [minimum size=1.5em,circle,draw,ublue,very thick,fill=white,drop shadow={shadow xshift=0.1em,shadow yshift=-0.1em}]
+\node [anchor=center,neuronnode] (neuron00) at (0,0) {};
+\node [anchor=center,neuronnode] (neuron01) at ([yshift=-3em]neuron00) {};
+\node [anchor=center,neuronnode] (neuron02) at ([yshift=-3em]neuron01) {};
+\node [anchor=east] (x0) at ([xshift=-6em]neuron00.west) {$x_0$};
+\node [anchor=east] (x1) at ([xshift=-6em]neuron01.west) {$x_1$};
+\node [anchor=east] (x2) at ([xshift=-6em]neuron02.west) {$b$};
+\node [anchor=west] (y0) at ([xshift=4em]neuron00.east) {$y_0$: \scriptsize{temperature}};
+\draw [->,red!50,line width=0.4mm] (x0.east) -- (neuron00.180) node [pos=0.1,above] {\tiny{$w_{00}$}};
+\draw [->,red!50,line width=0.4mm] (x1.east) -- (neuron00.200) node [pos=0.1,above] {\tiny{$w_{10}$}};
+\draw [->,red!50,line width=0.4mm] (x2.east) -- (neuron00.220) node [pos=0.05,above,yshift=0.3em] {\tiny{$b_{0}$}};
+\draw [->,red!30,line width=0.4mm] (neuron00.east) -- (y0.west);
+\node [anchor=west] (y1) at ([xshift=4em]neuron01.east) {$y_1$: \scriptsize{humidity}};
+\draw [->,blue!50,line width=0.4mm] (x0.east) -- (neuron01.160) node [pos=0.4,above] {\tiny{$w_{01}$}};
+\draw [->,blue!50,line width=0.4mm] (x1.east) -- (neuron01.180) node [pos=0.35,above,yshift=-0.2em] {\tiny{$w_{11}$}};
+\draw [->,blue!50,line width=0.4mm] (x2.east) -- (neuron01.200) node [pos=0.3,below,yshift=0.2em] {\tiny{$b_{1}$}};
+\draw [->,blue!30,line width=0.4mm] (neuron01.east) -- (y1.west);
+\node [anchor=west] (y2) at ([xshift=4em]neuron02.east) {$y_2$: \scriptsize{wind force}};
+\draw [->,purple!40,line width=0.4mm] (x0.east) -- (neuron02.140) node [pos=0.1,below,yshift=-0.2em] {\tiny{$w_{02}$}};
+\draw [->,purple!40,line width=0.4mm] (x1.east) -- (neuron02.160) node [pos=0.1,below] {\tiny{$w_{12}$}};
+\draw [->,purple!40,line width=0.4mm] (x2.east) -- (neuron02.180) node [pos=0.3,below] {\tiny{$b_{2}$}};
+\draw [->,purple!30,line width=0.4mm] (neuron02.east) -- (y2.west);
+\end{scope}
+\end{tikzpicture}
+%%%------------------------------------------------------------------------------------------------------------
--- a/Chapter9/Figures/fig-derivative1.tex
+++ b/Chapter9/Figures/fig-derivative1.tex
+%%%------------------------------------------------------------------------------------------------------------
+\begin{tikzpicture}
+\begin{axis}[  
+  name=sigmoid,
+  width=6cm, height=4.5cm, 
+  xtick={-5,-2.5,...,6},
+  ytick={0,0.5,1.0},
+  xlabel={$x$},
+  ylabel={$y$},
+  xlabel style={xshift=2.4cm,yshift=0.7cm},
+  axis y line=middle,%center the y axis
+  ylabel style={xshift=0.1cm,yshift=0cm},
+  x axis line style={->},%arrow on the x axis
+  axis line style={very thick},
+  axis x line*=bottom,%drop the top x axis and keep only the bottom one
+  xmin=-6,
+  xmax=6,
+  ymin=0,
+  ymax=1.4]
+\addplot[draw=red,very thick]{1/(1 + exp( -x))};
+\end{axis}
+\node [anchor=north] (labelc) at (2.3,-0.5) {\footnotesize{(a)}};
+\begin{axis}[
+  at={(sigmoid.east)},
+  anchor=east, 
+  xshift=6cm,
+  yshift=0cm,  
+  width=6cm, height=4.5cm, 
+  xtick={-4,-2,...,4},
+  ytick={0,0.1,0.2},
+  xlabel={$x$},
+  ylabel={$y$},
+  xlabel style={xshift=2.4cm,yshift=0.7cm},
+  axis y line=middle,%center the y axis
+  ylabel style={xshift=0.1cm,yshift=0cm},
+  x axis line style={->},%arrow on the x axis
+  axis line style={very thick},
+  axis x line*=bottom,%drop the top x axis and keep only the bottom one
+  xmin=-6,
+  xmax=6,
+  ymin=0,
+  ymax=0.3]
+\addplot[draw=ublue,very thick]{(1/(1 + exp( -x)))*(1-(1/(1 + exp( -x))))};
+\end{axis}
+\node [anchor=north] (labelc) at (8.2,-0.5) {\footnotesize{(b)}};
+\end{tikzpicture}
+%%%------------------------------------------------------------------------------------------------------------
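The two curves plotted in fig-derivative1.tex are the sigmoid and its derivative, which the second `\addplot` writes as `sigmoid(x) * (1 - sigmoid(x))`. A minimal Python sketch can confirm that identity against a finite-difference estimate; the helper names and the step size `eps` are assumptions made for this illustration.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid_grad(x):
    """Analytic derivative: sigmoid(x) * (1 - sigmoid(x))."""
    s = sigmoid(x)
    return s * (1.0 - s)

def numeric_grad(f, x, eps=1e-6):
    """Central finite difference, used only as an independent check."""
    return (f(x + eps) - f(x - eps)) / (2.0 * eps)

for x in (-4.0, -1.0, 0.0, 2.0, 5.0):
    print(f"x = {x:5.1f}  analytic = {sigmoid_grad(x):.6f}  "
          f"numeric = {numeric_grad(sigmoid, x):.6f}")
```

The derivative peaks at x = 0 with value 0.25, which is why the right-hand panel uses `ymax=0.3`.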
--- a/Chapter9/Figures/fig-derivative2.tex
+++ b/Chapter9/Figures/fig-derivative2.tex
+%%%------------------------------------------------------------------------------------------------------------
+\begin{tikzpicture}
+\begin{axis}[  
+  name=tanh,
+  width=6cm, height=4.5cm, 
+  xtick={-5,-2.5,...,6},
+  ytick={-1.0,-0.5,0,0.5,1.0},
+  xlabel={$x$},
+  ylabel={$y$},
+  xlabel style={xshift=2.4cm,yshift=2.5cm},
+  axis y line=middle,%center the y axis
+  ylabel style={xshift=0.1cm,yshift=0cm},
+  x axis line style={->},%arrow on the x axis
+  axis line style={very thick},
+  axis x line*=middle,%place the x axis at y=0
+  xmin=-6,
+  xmax=6,
+  ymin=-1.2,
+  ymax=1.2]
+\addplot[draw=red,very thick]{tanh(x)};
+\end{axis}
+\node [anchor=north] (labelc) at (2.3,-0.5) {\footnotesize{(a)}};
+\begin{axis}[  
+  at={(tanh.east)},
+  anchor=east, 
+  xshift=6cm,
+  yshift=0cm, 
+  width=6cm, height=4.5cm, 
+  xtick={-10,-5,...,10},
+  ytick={0,0.5,1},
+  xlabel={$x$},
+  ylabel={$y$},
+  xlabel style={xshift=2.4cm,yshift=0.7cm},
+  axis y line=middle,%center the y axis
+  ylabel style={xshift=0.1cm,yshift=0cm},
+  x axis line style={->},%arrow on the x axis
+  axis line style={very thick},
+  axis x line*=bottom,%drop the top x axis and keep only the bottom one
+  xmin=-10,
+  xmax=10,
+  ymin=0,
+  ymax=1.2]
+\addplot[draw=ublue,very thick]{1-tanh(x)*tanh(x)};
+\end{axis}
+\node [anchor=north] (labelc) at (8.2,-0.5) {\footnotesize{(b)}};
+\end{tikzpicture}
+%%%------------------------------------------------------------------------------------------------------------
--- a/Chapter9/Figures/fig-derivative3.tex
+++ b/Chapter9/Figures/fig-derivative3.tex
+%%%------------------------------------------------------------------------------------------------------------
+\begin{tikzpicture}
+\begin{axis}[  
+  name=relu,
+  width=6cm, height=4.5cm, 
+  xtick={-1.0,-0.5,...,1.0},
+  ytick={0.5,1.0},
+  xlabel={$x$},
+  ylabel={$y$},
+  xlabel style={xshift=2.4cm,yshift=0.7cm},
+  axis y line=middle,%center the y axis
+  ylabel style={xshift=0.1cm,yshift=0cm},
+  x axis line style={->},%arrow on the x axis
+  axis line style={very thick},
+  axis x line*=middle,%place the x axis at y=0
+  xmin=-1.2,
+  xmax=1.2,
+  ymin=0,
+  ymax=1.2]
+\addplot[domain=-1:0,draw=red,very thick]{0};
+\addplot[domain=0:1,draw=red,very thick]{x};
+\end{axis}
+\node [anchor=north] (labelc) at (2.3,-0.5) {\footnotesize{(a)}};
+\begin{axis}[  
+  at={(relu.east)},
+  anchor=east, 
+  xshift=6cm,
+  yshift=0cm, 
+  width=6cm, height=4.5cm, 
+  xtick={-4,-2,...,4},
+  ytick={0,0.2,...,1.0},
+  xlabel={$x$},
+  ylabel={$y$},
+  xlabel style={xshift=2.4cm,yshift=0.7cm},
+  axis y line=middle,%center the y axis
+  ylabel style={xshift=0.1cm,yshift=0cm},
+  x axis line style={->},%arrow on the x axis
+  axis line style={very thick},
+  axis x line*=middle,%place the x axis at y=0
+  xmin=-5,
+  xmax=5,
+  ymin=0,
+  ymax=1.2]
+\addplot[domain=-4:0,draw=ublue,very thick]{0};
+\addplot[domain=0:4,draw=ublue,very thick]{1};
+\end{axis}
+\node [anchor=north] (labelc) at (8.2,-0.5) {\footnotesize{(b)}};
+\end{tikzpicture}
+%%%------------------------------------------------------------------------------------------------------------
--- a/Chapter9/Figures/fig-different-forms-of-neuronal-input.tex
+++ b/Chapter9/Figures/fig-different-forms-of-neuronal-input.tex
+%%%------------------------------------------------------------------------------------------------------------
+\begin{tikzpicture}
+\begin{scope}
+\draw [->,thick] (0,0) -- (2.5,0);
+\draw [->,thick] (0,0) -- (0, 1.5);
+\draw [-,very thick,ublue,domain=0.6:2,samples=100] plot (\x,{ 1/\x - 0.2});
+\node [anchor=east] (ylabel) at (0, 3.2em) {\footnotesize{$x_0$}};
+\node [anchor=north] (xlabel) at (5em, 0em) {\scriptsize{Distance (km)}};
+\end{scope}
+\begin{scope}[xshift=9em]
+\draw [->,thick] (0,0) -- (2.5,0);
+\draw [->,thick] (0,0) -- (0, 1.5);
+\draw [-,very thick,ublue,domain=0.4:2,samples=100] plot (\x,{ 0.5/\x});
+\node [anchor=east] (ylabel) at (0, 3.2em) {\footnotesize{$x_1$}};
+\node [anchor=north] (xlabel) at (5em, 0em) {\scriptsize{Ticket price (yuan)}};
+\end{scope}
+\begin{scope}[xshift=18em]
+\draw [->,thick] (0,0) -- (2.5,0);
+\draw [->,thick] (0,0) -- (0, 1.5);
+\node [anchor=east] (ylabel) at (0, 3.2em) {\footnotesize{$x_2$}};
+\node [anchor=south, fill=ublue, minimum width=1.5em, minimum height=0.1em, inner sep=0] (histogram1) at (1.5em, 0) {};
+\node [anchor=south, fill=ublue, minimum width=1.5em, minimum height=3em, inner sep=0] (histogram2) at (4.0em, 0) {};
+\node [anchor=north,align=center] (hlabel1) at (histogram1.south) {\tiny{girlfriend}\\[-1ex]\tiny{absent}};
+\node [anchor=north,align=center] (hlabel2) at (histogram2.south) {\tiny{girlfriend}\\[-1ex]\tiny{present}};
+\end{scope}
+\end{tikzpicture}
+%%%------------------------------------------------------------------------------------------------------------
--- a/Chapter9/Figures/fig-embedding-matrix.tex
+++ b/Chapter9/Figures/fig-embedding-matrix.tex
+%%%------------------------------------------------------------------------------------------------------------
+\begin{tikzpicture}
+\begin{scope}
+\node [anchor=center,inner sep=2pt] (e) at (0,0) {\small{$e=w$}};
+\node [anchor=west,inner sep=2pt] (c) at (e.east) {\small{$\textbf{C}$}};
+\begin{pgfonlayer}{background}
+\node [rectangle,inner sep=0.4em,draw,fill=blue!20!white] [fit = (e) (c)] (box) {};
+\end{pgfonlayer}
+\draw [->,thick] ([yshift=-1em]box.south)--([yshift=-0.1em]box.south) node [pos=0,below] (bottom1) {\small{word $w$}};
+\draw [->,thick] ([yshift=0.1em]box.north)--([yshift=1em]box.north) node [pos=1,above] (top1) {\scriptsize{$e$=(8,.2,-1,.9,...,2.3)}};
+\node [anchor=north] (bottom2) at ([yshift=0.3em]bottom1.south) {\scriptsize{$w$=(0,0,1,0,...,0)}};
+\node [anchor=south] (top2) at ([yshift=-0.3em]top1.north) {\small{distributed representation of $w$}};
+{
+\node [anchor=north west,fill=red!20!white] (cmatrix) at ([xshift=3em,yshift=1.0em]c.north east) {\scriptsize{$\begin{pmatrix} 1 & .2 & -.2 & 8 & ... & 0 \\ .6 & .8 & -2 & 1 & ... & -.2 \\ 8 & .2 & -1 & .9 & ... & 2.3 \\ 1 & 1.2 & -.9 & 3 & ... & .2 \\ ... & ... & ... & ... & ... & ... \\ 1 & .3 & 3 & .9 & ... & 5.1 \end{pmatrix}$}};
+\node [anchor=west,inner sep=2pt,fill=red!30!white] (c) at (e.east) {\small{$\textbf{C}$}};
+\draw [<-,thick] (c.east) -- ([xshift=3em]c.east);
+}
+{
+\node [anchor=south,draw,fill=green!20!white] (e2) at ([yshift=1.5em]cmatrix.north) {\scriptsize{$\textbf{C}$ from an external word embedding system}};
+\draw [->,very thick,dashed] (e2.south) -- (cmatrix.north);
+}
+\end{scope}
+\end{tikzpicture}
+%%%------------------------------------------------------------------------------------------------------------
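fig-embedding-matrix.tex shows the lookup `e = w C`, where `w` is a one-hot row vector and `C` is the embedding matrix. The sketch below illustrates why this product simply selects a row of `C`. The 4x3 matrix is a hypothetical truncation of the values shown in the figure (first three columns of the first four rows), chosen only to keep the example small.

```python
def one_hot(index, size):
    """One-hot row vector with a 1 at position `index`."""
    return [1.0 if i == index else 0.0 for i in range(size)]

def embed(w, C):
    """Row vector w (1 x |V|) times matrix C (|V| x d)."""
    return [sum(w[i] * C[i][j] for i in range(len(w)))
            for j in range(len(C[0]))]

# Hypothetical truncated embedding matrix: 4 words, 3 dimensions.
C = [
    [1.0, 0.2, -0.2],
    [0.6, 0.8, -2.0],
    [8.0, 0.2, -1.0],
    [1.0, 1.2, -0.9],
]

w = one_hot(2, len(C))   # the word with index 2, i.e. w = (0, 0, 1, 0)
e = embed(w, C)
print(e)                 # identical to the third row of C
```

In practice the multiplication is usually replaced by a direct row lookup, since all but one term of each sum is zero.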
--- a/Chapter9/Figures/fig-embedding.tex
+++ b/Chapter9/Figures/fig-embedding.tex
+%%%------------------------------------------------------------------------------------------------------------
+\begin{tikzpicture}
+{
+\begin{scope}[xshift=2in]
+\node [anchor=north west] (o1) at (0,0) {\footnotesize{$\begin{bmatrix} .1 \\ -1 \\ 2 \\ ... \\ 0 \end{bmatrix}$}};
+\node [anchor=north west] (o2) at ([xshift=1em]o1.north east) {\footnotesize{$\begin{bmatrix} 1 \\ 2 \\ .2 \\ ... \\ -1 \end{bmatrix}$}};
+\node [anchor=north east] (v) at ([xshift=-0em]o1.north west) {\footnotesize{$\begin{matrix} \textrm{\ \ \ attribute}_1 \\ \textrm{\ \ \ attribute}_2 \\ \textrm{\ \ \ attribute}_3 \\ ... \\ \textrm{attribute}_{512} \end{matrix}$}};
+\node [anchor=south] (w1) at (o1.north) {\footnotesize{table}};
+\node [anchor=south] (w2) at (o2.north) {\footnotesize{chair}};
+{
+\node [anchor=south,fill=red!20!white] (cosine) at (w1.north) {\footnotesize{$\textrm{cosine}(\textrm{`table'},\textrm{`chair'})=0.5$}};
+}
+\end{scope}
+}
+\end{tikzpicture}
+%%%------------------------------------------------------------------------------------------------------------
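fig-embedding.tex compares two word vectors by their cosine. A minimal sketch of that similarity measure is given below. The short vectors are hypothetical toy values and are not the 512-dimensional vectors abbreviated in the figure, so the result is not expected to match the 0.5 shown there.

```python
import math

def cosine(u, v):
    """Cosine of the angle between two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine is undefined for a zero vector")
    return dot / (norm_u * norm_v)

# Hypothetical toy embeddings for two words.
table = [1.0, 2.0, 0.5, -1.0]
chair = [0.8, 1.5, 1.0, -0.5]
print(f"cosine(table, chair) = {cosine(table, chair):.4f}")
```

The value lies in [-1, 1]; vectors pointing in the same direction score 1 and orthogonal vectors score 0.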
--- a/Chapter9/Figures/fig-fit.tex
+++ b/Chapter9/Figures/fig-fit.tex
+%%%------------------------------------------------------------------------------------------------------------
+\begin{tikzpicture}
+%% a two-layer neural network
+\begin{scope}
+\tikzstyle{neuronnode} = [minimum size=1.7em,circle,draw,ublue,very thick,inner sep=1pt, fill=white,align=center,drop shadow={shadow xshift=0.1em,shadow yshift=-0.1em}]
+%% input and hidden layers
+\node [neuronnode] (n10) at (0,0) {\tiny{$f$}\\[-1ex] \tiny{$\sum$}};
+\node [neuronnode] (n11) at (1.4,0) {\tiny{$f$}\\[-1ex] \tiny{$\sum$}};
+\draw [-,ublue] (n10.west) -- (n10.east);
+\draw [-,ublue] (n11.west) -- (n11.east);
+\node [anchor=north] (x1) at ([yshift=-6em]n11.south) {$x_1$};
+\node [anchor=north] (labela) at ([xshift=3.5em,yshift=-0.5em]x1.south) {\footnotesize{(a) Fitting a small segment of a function}};
+\node [anchor=north] (b) at ([yshift=-6em]n10.south) {$b$};
+{
+\draw [->,thick,red] (b.north) -- ([yshift=-0.1em]n10.south);
+\draw [->,thick,ugreen] (x1.north) -- ([yshift=-0.1em]n10.290);
+}
+{
+\draw [->,thick,blue] (b.north) -- ([yshift=-0.1em]n11.250);
+\draw [->,thick,purple] (x1.north) -- ([yshift=-0.1em]n11.south);
+}
+%% output layers
+\node [neuronnode] (n20) at (0.7,5em) {\scriptsize{$\sum$}};
+{\draw [->,thick,brown] ([yshift=0.1em]n10.north) -- ([yshift=-0.1em]n20.250);}
+{\draw [->,thick,orange] ([yshift=0.1em]n11.north) -- ([yshift=-0.1em]n20.290);}
+\node [] (y) at ([yshift=3em]n20.north) {$y$};
+\draw [->,thick] ([yshift=0.1em]n20.north) -- (y.south);
+%% weight and bias
+{\node [anchor=center,rotate=90,fill=white,inner sep=1pt] (b0) at ([yshift=3em,xshift=-0.5em]b.north) {\tiny{$b_1=-6$}};}
+{\node [anchor=center,rotate=-59,fill=white,inner sep=1pt] (w2) at ([yshift=1.2em,xshift=-1.2em]x1.north) {\tiny{$w_1=100$}};}
+{\node [anchor=center,rotate=59,fill=white,inner sep=1pt] (b1) at ([yshift=5.1em,xshift=2.3em]b.north) {\tiny{$b_2=-4$}};}
+{\node [anchor=center,rotate=90,fill=white,inner sep=1pt] (w1) at ([yshift=3em,xshift=0.5em]x1.north) {\tiny{$w_2=100$}};}
+{\node [anchor=center,rotate=62,fill=white,inner sep=1pt] (w21) at ([yshift=1.8em,xshift=0.2em]n10.north) {\tiny{$w'_1=-0.7$}};}
+{\node [anchor=center,rotate=-62,fill=white,inner sep=1pt] (w22) at ([yshift=1.8em,xshift=-0.2em]n11.north) {\tiny{$w'_2=0.7$}};}
+%% sigmoid box
+\begin{scope}
+{
+\node [anchor=west] (flabel) at ([xshift=0.5in]y.east) {\scriptsize{sigmoid:}};
+\node [anchor=north east] (slabel) at ([xshift=0]flabel.south east) {\scriptsize{sum:}};
+\node [anchor=west,inner sep=2pt] (flabel2) at (flabel.east) {\scriptsize{$f(s)=1/(1+e^{-s})$}};
+\node [anchor=west,inner sep=2pt] (flabel3) at (slabel.east) {\scriptsize{$s=x_1 \cdot w + b$}};
+\draw [->,thick,dotted] ([yshift=-0.3em,xshift=-0.1em]n11.60)  .. controls +(east:1) and +(west:2) ..  ([xshift=-0.2em]flabel.west) ;
+\begin{pgfonlayer}{background}
+{
+\node [rectangle,inner sep=0.2em,fill=blue!20,drop shadow={shadow xshift=0.1em,shadow yshift=-0.1em}] [fit = (flabel) (flabel2) (flabel3)] (funcbox) {};
+}
+\end{pgfonlayer}
+}
+\end{scope}
+%% output illustration
+\begin{scope}[xshift=1.6in,yshift=0.1in]
+{
+\draw [->,thick] (-1.6,0) -- (1.6,0);
+\draw [->,thick] (0,0) -- (0,2);
+\draw [-] (-0.05,1) -- (0.05,1);
+\node [anchor=east,inner sep=1pt] (label1) at (0,1) {\tiny{1}};
+\node [anchor=south east,inner sep=1pt] (label2) at (0,0) {\tiny{0}};
+}
+{\draw [->,dashed] (0.6,-0.05) -- (0.6,-0.75in);}
+{\draw [-,very thick,ublue,rounded corners=0.1em] (-1.5,0) -- (0.5,0) -- (0.5,0.7) -- (0.7,0.7) -- (0.7,0) -- (1.5,0);}
+\end{scope}
+\begin{scope}[xshift=1.6in,yshift=-1.0in]
+{
+\draw [->,thick] (-1.6,0) -- (1.6,0);
+\draw [->,thick] (0,0) -- (0,2);
+\draw [-,very thick,red,domain=-1.5:1.5,samples=100] plot (\x,{0.2 * (\x +0.4)^3 + 1.2 - 0.3 *(\x + 0.8)^2});
+}
+{
+\foreach \n in {0.5}{
+    \pgfmathsetmacro{\result}{0.2 * (\n + 0.1 + 0.4)^3 + 1.2 - 0.3 *(\n + 0.1 + 0.8)^2};
+    \draw [-,ublue,thick] (\n,0) -- (\n, \result) -- (\n + 0.2, \result) -- (\n + 0.2, 0);
+}
+}
+\end{scope}
+\end{scope}
+%----------------------------------------------------------------------------------------
+\begin{scope}[xshift=2.6in]
+\tikzstyle{neuronnode} = [minimum size=1.7em,circle,draw,ublue,very thick,inner sep=1pt, fill=white,align=center,drop shadow={shadow xshift=0.1em,shadow yshift=-0.1em}]
+%% input and hidden layers
+\node [neuronnode] (n10) at (0,0) {\tiny{$f$}\\[-1ex] \tiny{$\sum$}};
+\node [neuronnode] (n11) at (1.4,0) {\tiny{$f$}\\[-1ex] \tiny{$\sum$}};
+\draw [-,ublue] (n10.west) -- (n10.east);
+\draw [-,ublue] (n11.west) -- (n11.east);
+\node [anchor=north] (x1) at ([yshift=-6em]n11.south) {$x_1$};
+\node [anchor=north] (labelb) at ([xshift=6em,yshift=-0.5em]x1.south) {\footnotesize{(b) Fitting a larger segment of a function}};
+\node [anchor=north] (b) at ([yshift=-6em]n10.south) {$b$};
+{
+\draw [->,thick,red] (b.north) -- ([yshift=-0.1em]n10.south);
+\draw [->,thick,ugreen] (x1.north) -- ([yshift=-0.1em]n10.290);
+}
+{
+\draw [->,thick,blue] (b.north) -- ([yshift=-0.1em]n11.250);
+\draw [->,thick,purple] (x1.north) -- ([yshift=-0.1em]n11.south);
+}
+{
+\node [neuronnode] (n12) at (2.5,0) {\tiny{$f$}\\[-1ex] \tiny{$\sum$}};
+\node [neuronnode] (n13) at (3.4,0) {\tiny{$f$}\\[-1ex] \tiny{$\sum$}};
+\draw [-,ublue] (n12.west) -- (n12.east);
+\draw [-,ublue] (n13.west) -- (n13.east);
+\draw [->,thick] (b.north) -- ([yshift=-0.1em]n12.250);
+\draw [->,thick] (x1.north) -- ([yshift=-0.1em]n12.270);
+\draw [->,thick] (b.north) -- ([yshift=-0.1em]n13.230);
+\draw [->,thick] (x1.north) -- ([yshift=-0.1em]n13.250);
+}
+%% output layers
+\node [neuronnode] (n20) at (0.7,5em) {\scriptsize{$\sum$}};
+{\draw [->,thick,brown] ([yshift=0.1em]n10.north) -- ([yshift=-0.1em]n20.250);}
+{\draw [->,thick,orange] ([yshift=0.1em]n11.north) -- ([yshift=-0.1em]n20.290);}
+\node [] (y) at ([yshift=3em]n20.north) {$y$};
+\draw [->,thick] ([yshift=0.1em]n20.north) -- (y.south);
+{
+\draw [->,thick] ([yshift=0.1em]n12.north) -- ([yshift=-0.1em]n20.330);
+\draw [->,thick] ([yshift=0.1em]n13.north) -- ([yshift=-0.1em]n20.340);
+}
+%% weight and bias
+{\node [anchor=center,rotate=90,fill=white,inner sep=1pt] (b0) at ([yshift=3em,xshift=-0.5em]b.north) {\tiny{$b_1=-6$}};}
+{\node [anchor=center,rotate=-59,fill=white,inner sep=1pt] (w2) at ([yshift=1.2em,xshift=-1.2em]x1.north) {\tiny{$w_1=100$}};}
+{\node [anchor=center,rotate=59,fill=white,inner sep=1pt] (b1) at ([yshift=5.1em,xshift=2.3em]b.north) {\tiny{$b_2=-4$}};}
+{\node [anchor=center,rotate=90,fill=white,inner sep=1pt] (w1) at ([yshift=3em,xshift=0.5em]x1.north) {\tiny{$w_2=100$}};}
+{\node [anchor=center,rotate=62,fill=white,inner sep=1pt] (w21) at ([yshift=1.8em,xshift=0.2em]n10.north) {\tiny{$w'_1=-0.7$}};}
+{\node [anchor=center,rotate=-62,fill=white,inner sep=1pt] (w22) at ([yshift=1.8em,xshift=-0.2em]n11.north) {\tiny{$w'_2=0.7$}};}
+%% sigmoid box
+\begin{scope}
+{
+\node [anchor=west] (flabel) at ([xshift=0.8in]y.east) {\scriptsize{sigmoid:}};
+\node [anchor=north east] (slabel) at ([xshift=0]flabel.south east) {\scriptsize{sum:}};
+\node [anchor=west,inner sep=2pt] (flabel2) at (flabel.east) {\scriptsize{$f(s)=1/(1+e^{-s})$}};
+\node [anchor=west,inner sep=2pt] (flabel3) at (slabel.east) {\scriptsize{$s=x_1 \cdot w + b$}};
+\draw [->,thick,dotted] ([yshift=-0.3em,xshift=-0.1em]n11.60)  .. controls +(east:1) and +(west:2) ..  ([xshift=-0.2em]flabel.west) ;
+\begin{pgfonlayer}{background}
+{
+\node [rectangle,inner sep=0.2em,fill=blue!20,drop shadow={shadow xshift=0.1em,shadow yshift=-0.1em}] [fit = (flabel) (flabel2) (flabel3)] (funcbox) {};
+}
+\end{pgfonlayer}
+}
+\end{scope}
+%% output illustration
+\begin{scope}[xshift=2.1in,yshift=0.1in]
+\draw [->,thick] (-1.6,0) -- (1.6,0);
+\draw [->,thick] (0,0) -- (0,2);
+\draw [-] (-0.05,1) -- (0.05,1);
+{\draw [-,very thick,ublue,rounded corners=0.1em] (-1.5,0) -- (0.5,0) -- (0.5,0.7) -- (0.7,0.7) -- (0.7,0) -- (1.5,0);}
+{\draw [-,very thick,ublue,rounded corners=0.1em] (-1.5,0) -- (0.7,0) -- (0.7,0.6) -- (0.9,0.6) -- (0.9,0) -- (1.5,0);}
+{\draw [->,dashed] (0.8,-0.05) -- (0.8,-0.78in);}
+\end{scope}
+\begin{scope}[xshift=2.1in,yshift=-1.0in]
+{
+\draw [->,thick] (-1.6,0) -- (1.6,0);
+\draw [->,thick] (0,0) -- (0,2);
+\draw [-,very thick,red,domain=-1.5:1.5,samples=100] plot (\x,{0.2 * (\x +0.4)^3 + 1.2 - 0.3 *(\x + 0.8)^2});
+}
+\foreach \n in {0.5}{
+    \pgfmathsetmacro{\result}{0.2 * (\n + 0.1 + 0.4)^3 + 1.2 - 0.3 *(\n + 0.1 + 0.8)^2};
+    \draw [-,ublue,thick] (\n,0) -- (\n, \result) -- (\n + 0.2, \result) -- (\n + 0.2, 0);
+}
+{
+\foreach \n in {0.7}{
+    \pgfmathsetmacro{\result}{0.2 * (\n + 0.1 + 0.4)^3 + 1.2 - 0.3 *(\n + 0.1 + 0.8)^2};
+    \draw [-,ublue,thick] (\n,0) -- (\n, \result) -- (\n + 0.2, \result) -- (\n + 0.2, 0);
+}
+}
+\end{scope}
+\end{scope}
+\end{tikzpicture}
+%%%------------------------------------------------------------------------------------------------------------
--- a/Chapter9/Figures/fig-forward-propagation-hid.tex
+++ b/Chapter9/Figures/fig-forward-propagation-hid.tex
+%%%------------------------------------------------------------------------------------------------------------
+\begin{tikzpicture}
+\begin{scope}
+\node [anchor=center,draw,fill=red!20,minimum height=1.8em,minimum width=2.5em] (h) at (0,0) {$\textbf{h}^{k-1}$};
+\node [anchor=west,draw,fill=blue!20,minimum height=1.8em,minimum width=2.5em] (s) at ([xshift=6em]h.east) {$\textbf{s}^{k}$};
+\node [anchor=west,draw,fill=green!20,minimum height=1.8em,minimum width=2.5em] (h2) at ([xshift=6em]s.east) {$\textbf{h}^{k}$};
+\node [anchor=east] (prev) at ([xshift=-2em]h.west) {...};
+\node [anchor=west] (next) at ([xshift=2em]h2.east) {...};
+\draw [->,thick] ([xshift=0.1em]prev.east) -- ([xshift=-0.1em]h.west);
+\draw [->,thick] ([xshift=0.1em]h.east) -- ([xshift=-0.1em]s.west) node [pos=0.5,below] {\scriptsize{$\textbf{s}^k = \textbf{h}^{k-1}\textbf{w}^k$}};
+\draw [->,thick] ([xshift=0.1em]s.east) -- ([xshift=-0.1em]h2.west) node [pos=0.5,below] {\scriptsize{$\textbf{h}^k = f^k(\textbf{s}^{k})$}};
+\draw [->,thick] ([xshift=0.1em]h2.east) -- ([xshift=-0.1em]next.west);
+\end{scope}
+\end{tikzpicture}
+%%%------------------------------------------------------------------------------------------------------------
--- a/Chapter9/Figures/fig-forward-propagation-output.tex
+++ b/Chapter9/Figures/fig-forward-propagation-output.tex
+%%%------------------------------------------------------------------------------------------------------------
+\begin{tikzpicture}
+\begin{scope}
+\node [anchor=center,minimum height=1.7em,fill=yellow!20,draw] (h) at (0,0) {$\textbf{h}^{K-1}$};
+\node [anchor=west,minimum height=1.7em,fill=blue!20,draw] (s) at ([xshift=5.5em]h.east) {$\textbf{s}^{K}$};
+\node [anchor=west,minimum height=1.7em,fill=green!20,draw] (h2) at ([xshift=5.5em]s.east) {$\textbf{h}^{K}$};
+\node [anchor=west,minimum height=1.7em,fill=orange!20,draw] (l) at ([xshift=5.5em]h2.east) {$L$};
+\draw [->] (h.east) -- (s.west);
+\draw [->] (s.east) -- (h2.west);
+\draw [->] (h2.east) -- (l.west) node [pos=0.5,above] {\tiny{loss}};
+\node [anchor=south west,inner sep=2pt] (step100) at ([xshift=0.5em,yshift=-0.8em]h.north east) {\tiny{$\textbf{s}^K = \textbf{h}^{K-1} \textbf{w}^K$}};
+\node [anchor=south west,inner sep=2pt] (step101) at (step100.north west) {\tiny{linear transformation}};
+\node [anchor=south west,inner sep=2pt] (step200) at ([xshift=0.5em,yshift=-0.8em]s.north east) {\tiny{$\textbf{h}^K = f^K(\textbf{s}^K)$}};
+\node [anchor=south west,inner sep=2pt] (step201) at (step200.north west) {\tiny{activation function}};
+\node [anchor=south,inner sep=1pt] (outputlabel) at ([yshift=0.0em]h2.north) {\tiny{\textbf{output layer}}};
+{
+\draw[decorate,thick,decoration={brace,mirror,raise=0.4em,amplitude=2mm}] (h.south west) -- (s.south west) node [pos=0.5,below,yshift=-1em] {\scriptsize{\textbf{Stage 1: linear transformation}}};
+}
+{
+\draw[decorate,thick,decoration={brace,mirror,raise=0.4em,amplitude=2mm}] ([xshift=0.2em]s.south west) -- (l.south east) node [pos=0.5,below,yshift=-1em] (step2) {\scriptsize{\textbf{Stage 2: activation function + loss function}}};
+}
+\end{scope}
+\end{tikzpicture}
+%%%------------------------------------------------------------------------------------------------------------
--- a/Chapter9/Figures/fig-forward-propagation.tex
+++ b/Chapter9/Figures/fig-forward-propagation.tex
+%%%------------------------------------------------------------------------------------------------------------
+\begin{tikzpicture}
+\begin{scope}
+\tikzstyle{layernode} = [draw,thick,fill=ugreen!30!white,blur shadow={shadow xshift=1pt,shadow yshift=-1pt}];
+\node [anchor=center,layernode,minimum height=4em,minimum width=1em] (layer01) at (0,0) {};
+\node [anchor=north west,layernode,minimum height=3em,minimum width=1em] (layer02) at ([xshift=3em]layer01.north east) {};
+\node [anchor=south west,layernode,minimum height=3em,minimum width=1em] (layer03) at ([xshift=7em]layer01.south east) {};
+\node [anchor=south west,layernode,minimum height=4em,minimum width=1em] (layer04) at ([xshift=11em]layer01.south east) {};
+\node [anchor=south west,layernode,minimum height=4em,minimum width=1em] (layer05) at ([xshift=3em]layer04.south east) {};
+\node [anchor=east] (input) at ([xshift=-1em]layer01.west){\scriptsize{input}};
+\node [anchor=west] (output) at ([xshift=1em]layer05.east){\scriptsize{output}};
+{
+\draw [->,very thick,ublue] ([xshift=-1em]layer01.west) -- ([xshift=-0.1em]layer01.west)node [pos=0.5,above] {\small{\ding{172}}};
+}
+{
+\draw [->,very thick,ublue] ([xshift=0.1em,yshift=-0.5em]layer01.north east) -- ([xshift=-0.1em,yshift=-0.5em]layer02.north west)node [pos=0.5,above] {\small{\ding{173}}};
+}
+{
+\draw [->,very thick,ublue] ([xshift=0.1em,yshift=0.5em]layer01.south east) -- ([xshift=-0.1em,yshift=0.5em]layer03.south west)node [pos=0.5,below] {\small{\ding{174}}};
+}
+{
+\draw [->,very thick,ublue] ([xshift=0.1em,yshift=-0.5em]layer02.north east) -- ([xshift=-0.1em,yshift=-0.5em]layer04.north west)node [pos=0.5,above] {\small{\ding{175}}};
+\draw [->,very thick,ublue] ([xshift=0.1em,yshift=0.5em]layer03.south east) -- ([xshift=-0.1em,yshift=0.5em]layer04.south west)node [pos=0.5,below] {\small{\ding{176}}};
+\draw [->,very thick,ublue] ([xshift=0.1em]layer04.east) -- ([xshift=-0.1em]layer05.west)node [pos=0.5,above] {\small{\ding{177}}};
+\draw [->,very thick,ublue] ([xshift=0.1em]layer05.east) -- ([xshift=1.0em]layer05.east)node [pos=0.5,above] {\small{\ding{178}}};
+}
+{
+\draw [<-,thin] ([xshift=0.3em,yshift=0.3em]layer04.east) .. controls +(35:1) and +(215:1) .. ([xshift=-2em,yshift=0.3em]layer05.north west) node [pos=1,above] {\scriptsize{forward: output $h_{i}$ of layer $i$}};
+}
+\end{scope}
+\end{tikzpicture}
+%%%------------------------------------------------------------------------------------------------------------
--- a/Chapter9/Figures/fig-four-layers-of-neural-network.tex
+++ b/Chapter9/Figures/fig-four-layers-of-neural-network.tex
+%%%------------------------------------------------------------------------------------------------------------
+\begin{tikzpicture}
+\begin{scope}[]
+\def\neuronsep{1.3}
+\tikzstyle{neuronnode} = [minimum size=1.7em,circle,draw,ublue,very thick,inner sep=1pt, fill=white,align=center,drop shadow={shadow xshift=0.1em,shadow yshift=-0.1em}]
+%%% layer 1
+\foreach \n in {1,...,5}{
+    \node [neuronnode] (neuron0\n) at (\n * \neuronsep,0) {\tiny{$f_1$}\\[-1ex] \tiny{$\sum$}};
+    \draw [-,ublue] (neuron0\n.east) -- (neuron0\n.west);
+}
+\foreach \n in {1,...,5}{
+    \foreach \m in {1,...,5}{
+        \draw [<-] ([yshift=-0.1em]neuron0\m.south) -- ([yshift=-1.8em]neuron0\n.south);
+    }
+    \node [anchor=north] (x\n) at ([yshift=-1.8em]neuron0\n.south) {$x_\n$};
+}
+\node [anchor=west] (w1label) at ([xshift=-0.5em,yshift=0.5em]x5.north east) {$\textbf{w}_1$};
+\begin{pgfonlayer}{background}
+\node [rectangle,inner sep=0.2em,fill=red!20] [fit = (neuron01) (neuron05)] (layer01) {};
+\end{pgfonlayer}
+\node [anchor=west] (layer00label) at ([xshift=1.3em]x5.east) {\footnotesize{layer 0}};
+\node [anchor=west] (layer00label2) at (layer00label.east) {\footnotesize{\red{(input layer)}}};
+{
+\node [anchor=west] (layer01label) at ([xshift=1em]layer01.east) {\footnotesize{layer 1}};
+}
+{
+\node [anchor=west] (layer01label2) at (layer01label.east) {\footnotesize{\red{({hidden layer})}}};
+}
+%%% layer 2
+{
+\foreach \n in {2,...,4}{
+    \node [neuronnode] (neuron1\n) at (\n * \neuronsep,3.5em) {\tiny{$f_2$}\\[-1ex] \tiny{$\sum$}};
+    \draw [-,ublue] (neuron1\n.east) -- (neuron1\n.west);
+}
+\foreach \n in {2,...,4}{
+    \foreach \m in {1,...,5}{
+        \draw [<-] ([yshift=-0.1em]neuron1\n.south) -- (neuron0\m.north);
+    }
+}
+\node [anchor=west] (w2label) at ([xshift=-2.5em,yshift=4.6em]x5.north east) {$\textbf{w}_2$};
+\begin{pgfonlayer}{background}
+{
+\node [rectangle,inner sep=0.2em,fill=ugreen!20] [fit = (neuron12) (neuron14)] (layer02) {};
+}
+\end{pgfonlayer}
+\node [anchor=west] (layer02label) at ([xshift=4.4em]layer02.east) {\footnotesize{layer 2}};
+{
+\node [anchor=west] (layer02label2) at (layer02label.east) {\footnotesize{\red{({hidden layer})}}};
+}
+}
+%%% layer 3
+{
+\foreach \n in {1,...,5}{
+    \node [neuronnode] (neuron2\n) at (\n * \neuronsep,7em) {\tiny{$f_3$}\\[-1ex] \tiny{$\sum$}};
+    \draw [-,ublue] (neuron2\n.east) -- (neuron2\n.west);
+}
+\foreach \n in {1,...,5}{
+    \foreach \m in {2,...,4}{
+        \draw [<-] ([yshift=-0.1em]neuron2\n.south) -- (neuron1\m.north);
+    }
+    \node [anchor=south] (y\n) at ([yshift=1.2em]neuron2\n.north) {$y_\n$};
+    \draw [<-,thick] ([yshift=1.1em]neuron2\n.north) -- (neuron2\n.north);
+}
+\node [anchor=west] (w3label) at ([xshift=-2.5em,yshift=7.5em]x5.north east) {$\textbf{w}_3$};
+\begin{pgfonlayer}{background}
+{
+\node [rectangle,inner sep=0.2em,fill=blue!20] [fit = (neuron21) (neuron25)] (layer03) {};
+}
+\end{pgfonlayer}
+\node [anchor=west] (layer03label) at ([xshift=1em]layer03.east) {\footnotesize{layer 3}};
+{
+\node [anchor=west] (layer03label2) at (layer03label.east) {\footnotesize{\red{({output layer})}}};
+}
+}
+\end{scope}
+\end{tikzpicture}
+%%%------------------------------------------------------------------------------------------------------------
--- a/Chapter9/Figures/fig-gaussian.tex
+++ b/Chapter9/Figures/fig-gaussian.tex
+%%%------------------------------------------------------------------------------------------------------------
+\begin{tikzpicture}
+        \draw[->, line width=1pt](-1.2,0)--(1.2,0)node[left,below,font=\tiny]{$x$};
+        \draw[->, line width=1pt](0,-1.2)--(0,1.2)node[right,font=\tiny]{$y$};
+        \draw[dashed](-1.2,1)--(1.2,1);
+        \foreach \x in {-1.0,-0.5,0.0,0.5,1.0}{\draw(\x,0)--(\x,0.05)node[below,outer sep=2pt,font=\tiny]at(\x,0){\x};}
+        \foreach \y in {0.5,1.0}{\draw(0,\y)--(0.05,\y)node[left,outer sep=2pt,font=\tiny]at(0,\y){\y};}
+        \draw[color=red ,domain=-1.2:1.2, line width=1pt]plot(\x,{exp(-1*((\x)^2))});
+        \node[black,anchor=south] at (0,1.2) {\small $y =e^{-x^2}$};
+        \end{tikzpicture}
+%%%------------------------------------------------------------------------------------------------------------
--- a/Chapter9/Figures/fig-gradient-descent.tex
+++ b/Chapter9/Figures/fig-gradient-descent.tex
+%%%------------------------------------------------------------------------------------------------------------
+\pgfplotsset{
+  colormap={whitered}{color(-1cm)=(orange!75!red);color(1cm)=(white)}
+}
+\begin{tikzpicture}[
+  declare function = {mu1=1;},
+  declare function = {mu2=2;},
+  declare function = {sigma1=0.5;},
+  declare function = {sigma2=1;},
+  declare function = {normal(\m,\s)=1/(\s*sqrt(2*pi))*exp(-(x-\m)^2/(2*\s^2));},
+  declare function = {bivar(\ma,\sa,\mb,\sb)=1/(2*pi*\sa*\sb) * exp(-((x-\ma)^2/\sa^2 + (y-\mb)^2/\sb^2))/2;}]
+  \footnotesize{
+  {
+  \begin{scope}
+  \begin{axis}[
+    colormap name  = whitered,
+    width          = 8cm,
+    height         = 5cm,
+    view           = {20}{45},
+    enlargelimits  = false,
+    grid           = major,
+    domain         = -1:3,
+    y domain       = 0:4,
+    samples        = 30,
+    xlabel         = $\textbf{w}^{[1]}$,
+    ylabel         = $\textbf{w}^{[2]}$,
+    xlabel style   = {xshift=0em,yshift=0.8em},
+    ylabel style   = {xshift=0.2em,yshift=0.8em},
+    zlabel         = {$J(\textbf{w})$},
+    ztick          = {-0.1},
+    colorbar,
+    colorbar style = {
+      at     = {(1.2,0.5)},
+      anchor = north west,
+      ytick  = {0,-0.1},
+      height = 0.25*\pgfkeysvalueof{/pgfplots/parent axis height},
+      title  = {}
+    }
+  ]
+    \addplot3 [surf] {-bivar(mu1,sigma1,mu2,sigma2)};
+    \node [circle,fill=red,minimum size=3pt,inner sep=1.5pt] () at (axis cs:0.5,2,-0.01) {};
+    \draw [->,very thick,ublue] (axis cs:0.5,2,-0.01) -- (axis cs:0.8,1.6,-0.03) node [pos=1,right,inner sep=2pt] {\tiny{-$\frac{\partial J(\textbf{w})}{\partial \textbf{w}}$}};
+    \draw [->,very thick,dotted] (axis cs:0.5,2,-0.01) -- (axis cs:0.2,1.5,-0.03);
+    \draw [->,very thick,dotted] (axis cs:0.5,2,-0.01) -- (axis cs:0.2,3.5,-0.03);
+    %\draw [black!50] (axis cs:0,-1,0) -- (axis cs:0,4,0);
+  \end{axis}
+  \end{scope}
+  }
+  }
+\end{tikzpicture}
+%%%------------------------------------------------------------------------------------------------------------
--- a/Chapter9/Figures/fig-identity.tex
+++ b/Chapter9/Figures/fig-identity.tex
+%%%------------------------------------------------------------------------------------------------------------
+\begin{tikzpicture}
+        \draw[->, line width=1pt](-1.2,0)--(1.2,0)node[left,below,font=\tiny]{$x$};
+        \draw[->, line width=1pt](0,-1.2)--(0,1.2)node[right,font=\tiny]{$y$};
+        \foreach \x in {-1.0,-0.5,0.0,0.5,1.0}{\draw(\x,0)--(\x,0.05)node[below,outer sep=2pt,font=\tiny]at(\x,0){\x};}
+        \foreach \y in {0.5,1.0}{\draw(0,\y)--(0.05,\y)node[left,outer sep=2pt,font=\tiny]at(0,\y){\y};}
+        \draw[color=red ,domain=-1:1, line width=1pt]plot(\x,\x);
+        \node[black,anchor=south] at (0,1.2) {\small $y =x$};
+        \end{tikzpicture}
+%%%------------------------------------------------------------------------------------------------------------
--- a/Chapter9/Figures/fig-linear-transformation.tex
+++ b/Chapter9/Figures/fig-linear-transformation.tex
+%%%------------------------------------------------------------------------------------------------------------
+\begin{footnotesize}
+$$
+\begin{smallmatrix}  \underbrace{
+    \left\{
+        \begin{smallmatrix}
+            \left[
+            \begin{array}{cccc}
+             1& 0 &0 \\
+             0& 1 &0 \\
+             0& 0 &1
+            \end{array}
+            \right ]
+            \cdots
+            \left[
+            \begin{array}{cccc}
+                1& 0 &0 \\
+                0& 1 &0 \\
+                0& 0 &1
+            \end{array}
+            \right]
+        \end{smallmatrix}
+        \right\}
+     }\\5
+\end{smallmatrix}
+\times
+\begin{smallmatrix}
+\left[
+    \begin{array}{cccc}
+    1\\
+    1\\
+    1
+    \end{array}
+\right ]
+\end{smallmatrix}
+=
+\begin{smallmatrix}  \underbrace{
+    \left\{
+        \begin{smallmatrix}
+            \left[
+            \begin{array}{cccc}
+             1 \\
+             1 \\
+             1
+            \end{array}
+            \right ]
+            \cdots
+            \left[
+            \begin{array}{cccc}
+                1 \\
+                1 \\
+                1
+            \end{array}
+            \right]
+        \end{smallmatrix}
+        \right\}
+     }\\5
+\end{smallmatrix}
+$$
+\end{footnotesize}
+%%%------------------------------------------------------------------------------------------------------------
--- a/Chapter9/Figures/fig-model-training.tex
+++ b/Chapter9/Figures/fig-model-training.tex
+%%%------------------------------------------------------------------------------------------------------------
+\begin{tikzpicture}
+\begin{scope}
+\node [anchor=west,draw,thick,minimum width=4em,minimum height=1.7em,fill=blue!20] (encoder) at (0,0) {module};
+\node [anchor=south,minimum width=4em,minimum height=1.7em] (space) at ([yshift=0.3em]encoder.north) {\footnotesize{target system}};
+\begin{pgfonlayer}{background}
+\node [rectangle,draw,thick,fill=red!20] [fit = (encoder) (space)] (system) {};
+\end{pgfonlayer}
+\node [anchor=north] (data) at ([yshift=-1em]system.south) {\scriptsize{\textbf{labeled data for the target task}}};
+\draw [->,thick] (data.north) -- ([yshift=-0.1em]system.south);
+\node [anchor=north] (label) at ([yshift=-0em]data.south) {\scriptsize{(a) standard method}};
+\end{scope}
+\begin{scope}[xshift=2.8in]
+\node [anchor=west,draw,dashed,thick,minimum width=4em,minimum height=1.7em,fill=blue!20] (encoder) at (0,0) {module};
+\node [anchor=south,minimum width=4em,minimum height=1.7em] (space) at ([yshift=0.3em]encoder.north) {\footnotesize{target system}};
+\node [anchor=center,draw,thick,minimum width=4em,minimum height=1.7em,fill=green!20] (encoderpre) at ([xshift=-7em]encoder.center) {\footnotesize{language model}};
+\draw [->,thick] (encoderpre.east) -- (encoder.west);
+\begin{pgfonlayer}{background}
+\node [rectangle,draw,thick,fill=red!20] [fit = (encoder) (space)] (system) {};
+\end{pgfonlayer}
+\node [anchor=north] (data) at ([yshift=-1em]system.south) {\scriptsize{\textbf{labeled data for the target task}}};
+\draw [->,thick] (data.north) -- ([yshift=-0.1em]system.south);
+\node [anchor=north] (data2) at ([yshift=-1em,xshift=-7em]system.south) {\scriptsize{\textbf{large-scale unlabeled data}}};
+\draw [->,thick] (data2.north) -- ([yshift=-0.1em]encoderpre.south);
+\node [anchor=north] (label) at ([yshift=-0em,xshift=-4em]data.south) {\scriptsize{(b) pre-training + fine-tuning}};
+\end{scope}
+\end{tikzpicture}
+%%%------------------------------------------------------------------------------------------------------------
--- a/Chapter9/Figures/fig-more-layers.tex
+++ b/Chapter9/Figures/fig-more-layers.tex
+%%%------------------------------------------------------------------------------------------------------------
+\begin{tikzpicture}
+\begin{scope}[]
+\def\neuronsep{1.3}
+\tikzstyle{neuronnode} = [minimum size=1.7em,circle,draw,ublue,very thick,inner sep=1pt, fill=white,align=center,drop shadow={shadow xshift=0.1em,shadow yshift=-0.1em}]
+%%% layer 1
+\foreach \n in {1,...,5}{
+    \node [neuronnode] (neuron0\n) at (\n * \neuronsep,0) {\tiny{$f_1$}\\[-1ex] \tiny{$\sum$}};
+    \draw [-,ublue] (neuron0\n.east) -- (neuron0\n.west);
+}
+\foreach \n in {1,...,5}{
+    \foreach \m in {1,...,5}{
+        \draw [<-] ([yshift=-0.1em]neuron0\m.south) -- ([yshift=-1.8em]neuron0\n.south);
+    }
+    \node [anchor=north] (x\n) at ([yshift=-1.8em]neuron0\n.south) {$x_\n$};
+ {
+    \draw [<-,thick] ([yshift=1.1em]neuron0\n.north) -- (neuron0\n.north);
+    \node [anchor=south] (y\n) at ([yshift=1.2em]neuron0\n.north) {$y_\n$};
+    }
+}
+\node [anchor=west] (w1label) at ([xshift=-0.5em,yshift=0.5em]x5.north east) {$\textbf{w}_1$};
+\begin{pgfonlayer}{background}
+\node [rectangle,inner sep=0.2em,fill=red!20] [fit = (neuron01) (neuron05)] (layer01) {};
+\end{pgfonlayer}
+{
+\node [anchor=west] (layer01label) at ([xshift=2em]layer01.east) {\footnotesize{single-layer neural network}};
+}
+\node[anchor=north] (arrow1) at ([yshift=0em]x3.south) {};
+\draw[fill=blue!20,draw=blue!30]([xshift=-0.20em]arrow1.north west)--([xshift=0.20em]arrow1.north east)--([yshift=0.15em,xshift=0.20em]arrow1.south east)--([yshift=0.15em,xshift=0.8em]arrow1.south east) --([yshift=-0.8em]arrow1.south)--([yshift=0.15em,xshift=-0.8em]arrow1.south west)--([yshift=0.15em,xshift=-0.20em]arrow1.south west)--([xshift=-0.20em]arrow1.north west);
+\end{scope}
+%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
+\begin{scope}[xshift=0.0em,yshift=-12.7em]
+\def\neuronsep{1.3}
+\tikzstyle{neuronnode} = [minimum size=1.7em,circle,draw,ublue,very thick,inner sep=1pt, fill=white,align=center,drop shadow={shadow xshift=0.1em,shadow yshift=-0.1em}]
+%%% layer 1
+\foreach \n in {1,...,5}{
+    \node [neuronnode] (neuron0\n) at (\n * \neuronsep,0) {\tiny{$f_1$}\\[-1ex] \tiny{$\sum$}};
+    \draw [-,ublue] (neuron0\n.east) -- (neuron0\n.west);
+}
+\foreach \n in {1,...,5}{
+    \foreach \m in {1,...,5}{
+        \draw [<-] ([yshift=-0.1em]neuron0\m.south) -- ([yshift=-1.8em]neuron0\n.south);
+    }
+    \node [anchor=north] (x\n) at ([yshift=-1.8em]neuron0\n.south) {$x_\n$};
+}
+\node [anchor=west] (w1label) at ([xshift=-0.5em,yshift=0.5em]x5.north east) {$\textbf{w}_1$};
+\begin{pgfonlayer}{background}
+\node [rectangle,inner sep=0.2em,fill=red!20] [fit = (neuron01) (neuron05)] (layer01) {};
+\end{pgfonlayer}
+%%% layer 2
+\foreach \n in {2,...,4}{
+    \node [neuronnode] (neuron1\n) at (\n * \neuronsep,3.5em) {\tiny{$f_2$}\\[-1ex] \tiny{$\sum$}};
+    \draw [-,ublue] (neuron1\n.east) -- (neuron1\n.west);
+}
+\foreach \n in {2,...,4}{
+    \foreach \m in {1,...,5}{
+        \draw [<-] ([yshift=-0.1em]neuron1\n.south) -- (neuron0\m.north);
+    }
+    \draw [<-,thick] ([yshift=1.1em]neuron1\n.north) -- (neuron1\n.north);
+    \node [anchor=south] (y\n) at ([yshift=1.25em]neuron1\n.north) {$y_\n$};
+}
+\node [anchor=west] (w2label) at ([xshift=-2.5em,yshift=4.6em]x5.north east) {$\textbf{w}_2$};
+\node [anchor=west] (layer02label) at ([xshift=2.7em]w2label.east) {\footnotesize{two-layer neural network}};
+\begin{pgfonlayer}{background}
+{
+\node [rectangle,inner sep=0.2em,fill=ugreen!20] [fit = (neuron12) (neuron14)] (layer02) {};
+}
+\end{pgfonlayer}
+\node[anchor=north] (arrow1) at ([yshift=0em]x3.south) {};
+\draw[fill=blue!20,draw=blue!30]([xshift=-0.20em]arrow1.north west)--([xshift=0.20em]arrow1.north east)--([yshift=0.15em,xshift=0.20em]arrow1.south east)--([yshift=0.15em,xshift=0.8em]arrow1.south east) --([yshift=-0.8em]arrow1.south)--([yshift=0.15em,xshift=-0.8em]arrow1.south west)--([yshift=0.15em,xshift=-0.20em]arrow1.south west)--([xshift=-0.20em]arrow1.north west);
+\end{scope}
+%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
+\begin{scope}[xshift=0.0em,yshift=-29em]
+\def\neuronsep{1.3}
+\tikzstyle{neuronnode} = [minimum size=1.7em,circle,draw,ublue,very thick,inner sep=1pt, fill=white,align=center,drop shadow={shadow xshift=0.1em,shadow yshift=-0.1em}]
+%%% layer 1
+\foreach \n in {1,...,5}{
+    \node [neuronnode] (neuron0\n) at (\n * \neuronsep,0) {\tiny{$f_1$}\\[-1ex] \tiny{$\sum$}};
+    \draw [-,ublue] (neuron0\n.east) -- (neuron0\n.west);
+}
+\foreach \n in {1,...,5}{
+    \foreach \m in {1,...,5}{
+        \draw [<-] ([yshift=-0.1em]neuron0\m.south) -- ([yshift=-1.8em]neuron0\n.south);
+    }
+    \node [anchor=north] (x\n) at ([yshift=-1.8em]neuron0\n.south) {$x_\n$};
+}
+\node [anchor=west] (w1label) at ([xshift=-0.5em,yshift=0.5em]x5.north east) {$\textbf{w}_1$};
+\begin{pgfonlayer}{background}
+\node [rectangle,inner sep=0.2em,fill=red!20] [fit = (neuron01) (neuron05)] (layer01) {};
+\end{pgfonlayer}
+%%% layer 2
+{
+\foreach \n in {2,...,4}{
+    \node [neuronnode] (neuron1\n) at (\n * \neuronsep,3.5em) {\tiny{$f_2$}\\[-1ex] \tiny{$\sum$}};
+    \draw [-,ublue] (neuron1\n.east) -- (neuron1\n.west);
+}
+\foreach \n in {2,...,4}{
+    \foreach \m in {1,...,5}{
+        \draw [<-] ([yshift=-0.1em]neuron1\n.south) -- (neuron0\m.north);
+    }
+}
+\node [anchor=west] (w2label) at ([xshift=-2.5em,yshift=4.6em]x5.north east) {$\textbf{w}_2$};
+\begin{pgfonlayer}{background}
+{
+\node [rectangle,inner sep=0.2em,fill=ugreen!20] [fit = (neuron12) (neuron14)] (layer02) {};
+}
+\end{pgfonlayer}
+\node [anchor=west] (layer02label) at ([xshift=5.2em]layer02.east) {\footnotesize{three-layer neural network}};
+}
+%%% layer 3
+{
+\foreach \n in {1,...,5}{
+    \node [neuronnode] (neuron2\n) at (\n * \neuronsep,7em) {\tiny{$f_3$}\\[-1ex] \tiny{$\sum$}};
+    \draw [-,ublue] (neuron2\n.east) -- (neuron2\n.west);
+}
+\foreach \n in {1,...,5}{
+    \foreach \m in {2,...,4}{
+        \draw [<-] ([yshift=-0.1em]neuron2\n.south) -- (neuron1\m.north);
+    }
+    \node [anchor=south] (y\n) at ([yshift=1.25em]neuron2\n.north) {$y_\n$};
+    \draw [<-,thick] ([yshift=1.1em]neuron2\n.north) -- (neuron2\n.north);
+}
+\node [anchor=west] (w3label) at ([xshift=-2.5em,yshift=7.5em]x5.north east) {$\textbf{w}_3$};
+\begin{pgfonlayer}{background}
+{
+\node [rectangle,inner sep=0.2em,fill=blue!20] [fit = (neuron21) (neuron25)] (layer03) {};
+}
+\end{pgfonlayer}
+}
+\end{scope}
+\end{tikzpicture}
+%%%------------------------------------------------------------------------------------------------------------
--- a/Chapter9/Figures/fig-multilayer-neural-network-example.tex
+++ b/Chapter9/Figures/fig-multilayer-neural-network-example.tex
+%%%------------------------------------------------------------------------------------------------------------
+\begin{tikzpicture}
+\begin{scope}
+\def\neuronsep{1}
+\tikzstyle{neuronnode} = [minimum size=1.2em,circle,draw,ublue,very thick,inner sep=1pt, fill=white,align=center,drop shadow={shadow xshift=0.1em,shadow yshift=-0.1em}];
+%%% layer 1
+\foreach \n in {1,...,4}{
+    \node [neuronnode] (neuron0\n) at (\n * \neuronsep,0) {};
+    \draw [->] ([yshift=-0.8em]neuron0\n.south) -- ([yshift=-0.1em]neuron0\n.south) node [pos=0,below] {\tiny{...}};
+}
+\begin{pgfonlayer}{background}
+\node [rectangle,inner sep=0.2em,fill=red!20] [fit = (neuron01) (neuron04)] (layer01) {};
+\node [anchor=east] (layer01label) at (layer01.west) {\scriptsize{layer $k-1$}};
+\end{pgfonlayer}
+%%% layer 2
+\foreach \n in {1,...,4}{
+    \node [neuronnode] (neuron1\n) at (\n * \neuronsep,3em) {};
+}
+\foreach \n in {1,...,4}{
+    \foreach \m in {1,...,4}{
+        \draw [<-] ([yshift=-0.1em]neuron1\n.south) -- (neuron0\m.north);
+    }
+}
+\begin{pgfonlayer}{background}
+\node [rectangle,inner sep=0.2em,fill=ugreen!20] [fit = (neuron11) (neuron14)] (layer02) {};
+\node [anchor=east] (layer02label) at (layer02.west) {\scriptsize{layer $k$}};
+\end{pgfonlayer}
+%%% layer 3
+\foreach \n in {1,...,4}{
+    \node [neuronnode] (neuron2\n) at (\n * \neuronsep,6em) {};
+    \draw [<-] ([yshift=0.8em]neuron2\n.north) -- ([yshift=0.0em]neuron2\n.north) node [pos=0,above] {\tiny{...}};
+}
+\foreach \n in {1,...,4}{
+    \foreach \m in {1,...,4}{
+        \draw [<-] ([yshift=-0.1em]neuron2\n.south) -- (neuron1\m.north);
+    }
+}
+\begin{pgfonlayer}{background}
+\node [rectangle,inner sep=0.2em,fill=blue!20] [fit = (neuron21) (neuron24)] (layer03) {};
+\node [anchor=east] (layer03label) at (layer03.west) {\scriptsize{layer $k+1$}};
+\end{pgfonlayer}
+%%% output layer
+\foreach \n in {1,...,4}{
+    \node [neuronnode] (neuron3\n) at (\n * \neuronsep,9.4em) {};
+{
+    \draw [<-] ([yshift=0.6em]neuron3\n.north) -- ([yshift=0.0em]neuron3\n.north) node [pos=0,above] {\tiny{output}};
+    }
+{
+    \draw [<-,very thick] ([yshift=0.6em]neuron3\n.north) -- ([yshift=0.0em]neuron3\n.north) node [pos=0,above] {\tiny{output}};
+    }
+    \draw [->] ([yshift=-0.6em]neuron3\n.south) -- ([yshift=0.0em]neuron3\n.south);
+}
+\begin{pgfonlayer}{background}
+\node [rectangle,inner sep=0.2em,fill=ugreen!20] [fit = (neuron31) (neuron34)] (layer04) {};
+\node [anchor=east] (layer04label) at (layer04.west) {\scriptsize{layer $K$ (output)}};
+\end{pgfonlayer}
+{
+\node [neuronnode,draw=red,fill=red!20!white,inner sep=1pt] (neuron12new) at (2 * \neuronsep,3em) {};
+\node [anchor=east] (neuronsamplelabel) at ([yshift=-1em]layer02label.south east) {{\textbf{\tiny{layer $k$, neuron $i$}}}};
+\draw [->,dashed,very thick,red] ([xshift=-0.2em,yshift=0.2em]neuronsamplelabel.east) .. controls +(30:1) and +(220:1) .. ([xshift=-0em,yshift=-0em]neuron12new.210);
+}
+{
+\foreach \n in {1,...,4}{
+\draw [<-,thick,red] (neuron2\n.south) -- (neuron12.north);
+}
+}
+{
+\draw [<-,thick,red] (neuron14.south) -- (neuron04.north);
+\node [anchor=north] (wlabel) at (layer02.south east) {{\scriptsize{$w_{4,4}^{k}$}}};
+}
+\end{scope}
+\end{tikzpicture}
+%%%------------------------------------------------------------------------------------------------------------
--- a/Chapter9/Figures/fig-one-hot.tex
+++ b/Chapter9/Figures/fig-one-hot.tex
+%%%------------------------------------------------------------------------------------------------------------
+\begin{tikzpicture}
+\begin{scope}
+\node [anchor=north west] (o1) at (0,0) {\footnotesize{$\begin{bmatrix} 0 \\ 1 \\ 0 \\ 0 \\ 0 \\ ... \\ 0 \end{bmatrix}$}};
+\node [anchor=north west] (o2) at ([xshift=1em]o1.north east) {\footnotesize{$\begin{bmatrix} 0 \\ 0 \\ 0 \\ 1 \\ 0 \\ ... \\ 0 \end{bmatrix}$}};
+\node [anchor=north east] (v) at ([xshift=-0em]o1.north west) {\footnotesize{$\begin{matrix} \textrm{you}_1 \\ \textrm{table}_2 \\ \textrm{he}_3 \\ \textrm{chair}_4 \\ \textrm{we}_5 \\ ... \\ \textrm{hello}_{10k} \end{matrix}$}};
+\node [anchor=south] (w1) at (o1.north) {\footnotesize{table}};
+\node [anchor=south] (w2) at (o2.north) {\footnotesize{chair}};
+{
+\node [anchor=south,fill=red!20!white] (cosine) at (w1.north) {\footnotesize{$\textrm{cosine}(\textrm{`table'},\textrm{`chair'})=0$}};
+}
+\end{scope}
+\end{tikzpicture}
+%%%------------------------------------------------------------------------------------------------------------
--- a/Chapter9/Figures/fig-parallel.tex
+++ b/Chapter9/Figures/fig-parallel.tex
+%%%------------------------------------------------------------------------------------------------------------
+\begin{tikzpicture}
+%%%%%%%%%%%%%%%%
+% parameter server + processor
+\begin{scope}[]
+{\scriptsize
+\tikzstyle{parametershard} = [draw,thick,minimum width=4em,align=left,rounded corners=2pt]
+{
+\node[parametershard,anchor=west,fill=yellow!10] (param1) at (0,0) {$W_o$};
+\node (param2) at ([xshift=1em]param1.east) {};
+\node[parametershard,anchor=west,fill=red!10] (param3) at ([xshift=1em]param2.east) {$W_h$};
+\node[anchor=south,inner sep=1pt] (serverlabel) at ([yshift=0.2em]param2.north) {\footnotesize{\textbf{parameter server}: $\mathbf w_{new} = \mathbf w - \alpha\cdot \frac{\partial L}{\partial \mathbf w}$}};
+}
+\begin{pgfonlayer}{background}
+{
+\node[rectangle,draw,thick,inner sep=2pt,fill=gray!20] [fit = (param1) (param2) (param3) (serverlabel)] (serverbox) {};
+}
+\end{pgfonlayer}
+\tikzstyle{processor} = [draw,thick,fill=orange!20,minimum width=4em,align=left,rounded corners=2pt]
+{
+\node [processor,anchor=north,align=center] (processor2) at ([yshift=-1.2in]serverlabel.south) {\scriptsize{Processor 2}\\\scriptsize{on GPU2 (G2)}};
+\node [anchor=north] (labela) at ([xshift=4em,yshift=-1em]processor2.south) {\footnotesize {(a) synchronous update}};
+\node [processor,anchor=east,align=center] (processor1) at ([xshift=-1em]processor2.west) {\scriptsize{Processor 1}\\\scriptsize{on GPU1 (G1)}};
+\node [processor,anchor=west,align=center] (processor3) at ([xshift=1em]processor2.east) {\scriptsize{Processor 3}\\\scriptsize{on GPU3 (G3)}};
+}
+{
+\draw[->,very thick,red] ([xshift=-0.5em,yshift=2pt]processor2.north) -- ([xshift=-0.5em,yshift=-2pt]serverbox.south) node [pos=0.5,align=right,xshift=-2em] (pushlabel) {\scriptsize{$\frac{\partial L}{\partial \mathbf w}$}};
+\draw[<-,very thick,blue] ([xshift=0.5em,yshift=2pt]processor2.north) -- ([xshift=0.5em,yshift=-2pt]serverbox.south) node [pos=0.5,align=left,xshift=2.2em] (fetchlabel) {\scriptsize{$\mathbf w_{new}$}};
+\draw[->,very thick,red] ([xshift=-0.5em,yshift=2pt]processor3.north) --
+ ([xshift=3em,yshift=-2pt]serverbox.south);
+\draw[<-,very thick,blue] ([xshift=0.5em,yshift=2pt]processor3.north) -- ([xshift=4em,yshift=-2pt]serverbox.south) node [pos=0.5,align=left,xshift=2.2em] (fetchlabel) {\scriptsize{fetch (F)}};
+\draw[->,very thick,red] ([xshift=-0.5em,yshift=2pt]processor1.north) -- ([xshift=-4em,yshift=-2pt]serverbox.south) node [pos=0.5,align=right,xshift=-2em] (pushlabel) {\scriptsize{push (P)}};
+\draw[<-,very thick,blue] ([xshift=0.5em,yshift=2pt]processor1.north) -- ([xshift=-3em,yshift=-2pt]serverbox.south);
+}
+%%%%%%%%%%%
+% synchronous mode
+\tikzstyle{job} = [draw,rotate=90,minimum height=0.25in]
+\scriptsize{
+{
+\node[job,anchor=south west,fill=blue!50] (fetch11) at ([xshift=6em,yshift=1em]processor3.east) {\textbf{F}};
+\node[job,anchor=west,fill=orange!30] (minibatch11) at ([yshift=1pt]fetch11.east) {\tiny{minibatch3}};
+\node[job,anchor=west,fill=red!50] (push11) at ([yshift=1pt]minibatch11.east) {\textbf{P}};
+\node[job,anchor=north west,fill=blue!50] (fetch12) at ([xshift=0.8em]fetch11.south west) {\textbf{F}};
+\node[job,anchor=west,fill=orange!30] (minibatch12) at ([yshift=1pt]fetch12.east) {\tiny{minibatch2}};
+\node[job,anchor=west,fill=red!50] (push12) at ([yshift=1pt]minibatch12.east) {\textbf{P}};
+\node[job,anchor=north west,fill=blue!50] (fetch13) at ([xshift=0.8em]fetch12.south west) {\textbf{F}};
+\node[job,anchor=west,fill=orange!30,minimum width=8em] (minibatch13) at ([yshift=1pt]fetch13.east) {\scriptsize{minibatch1}};
+\node[job,anchor=west,fill=red!50] (push13) at ([yshift=1pt]minibatch13.east) {\textbf{P}};
+\node[anchor=south west,draw,fill=gray!20,minimum width=8.0em] (update11) at ([yshift=4.0em]push11.north east) {Update};
+\node[anchor=north] (G11) at (fetch11.west) {\small{G3}};
+\node[anchor=north] (G12) at (fetch12.west) {\small{G2}};
+\node[anchor=north] (G13) at (fetch13.west) {\small{G1}};
+\node[anchor=north,align=center] (synlabel) at (G12.south) {\small{\textbf{Synchronous}}\\\small{\textbf{Training}}};
+\draw[->,thick] ([xshift=1em]G13.east) -- ([xshift=1em,yshift=1.4in]G13.east) node [pos=0.5,rotate=90,yshift=-1em] {\small{time line}};
+}
+}
+{
+\draw [<->,thin,dotted] ([xshift=-1pt]minibatch11.north) .. controls +(west:3em) and +(east:3em) .. ([xshift=1pt]processor3.east);
+\draw [<->,thin,dotted] ([xshift=-1pt]fetch11.north) .. controls +(west:4em) and +(east:4em) .. ([xshift=-0.5em,yshift=0.3in]processor3.north);
+\draw [<->,thin,dotted] ([xshift=-1pt]push11.north) -- ([xshift=-4em,yshift=0.8in]processor3.north);
+}
+{
+\draw [<->,thin,dotted] ([xshift=-1pt]update11.west) -- ([xshift=1pt,yshift=-1.5em]serverbox.north east);
+}
+}
+\end{scope}
+\begin{scope}[yshift=-2.5in]
+{\scriptsize
+\tikzstyle{parametershard} = [draw,thick,minimum width=4em,align=left,rounded corners=2pt]
+{
+\node[parametershard,anchor=west,fill=yellow!10] (param1) at (0,0) {$W_o$};
+\node (param2) at ([xshift=1em]param1.east) {};
+\node[parametershard,anchor=west,fill=red!10] (param3) at ([xshift=1em]param2.east) {$W_h$};
+\node[anchor=south,inner sep=1pt] (serverlabel) at ([yshift=0.2em]param2.north) {\footnotesize{\textbf{parameter server}: $\mathbf w_{new} = \mathbf w - \alpha\cdot \frac{\partial L}{\partial \mathbf w}$}};
+}
+\begin{pgfonlayer}{background}
+{
+\node[rectangle,draw,thick,inner sep=2pt,fill=gray!20] [fit = (param1) (param2) (param3) (serverlabel)] (serverbox) {};
+}
+\end{pgfonlayer}
+\tikzstyle{processor} = [draw,thick,fill=orange!20,minimum width=4em,align=left,rounded corners=2pt]
+{
+\node [processor,anchor=north,align=center] (processor2) at ([yshift=-1.2in]serverlabel.south) {\scriptsize{Processor 2}\\\scriptsize{on GPU2 (G2)}};
+\node [anchor=north] (label) at ([xshift=4em,yshift=-1em]processor2.south) {\footnotesize {(b) Asynchronous update}};
+\node [processor,anchor=east,align=center] (processor1) at ([xshift=-1em]processor2.west) {\scriptsize{Processor 1}\\\scriptsize{on GPU1 (G1)}};
+\node [processor,anchor=west,align=center] (processor3) at ([xshift=1em]processor2.east) {\scriptsize{Processor 3}\\\scriptsize{on GPU3 (G3)}};
+}
+{
+\draw[->,very thick,red] ([xshift=-0.5em,yshift=2pt]processor2.north) -- ([xshift=-0.5em,yshift=-2pt]serverbox.south) node [pos=0.5,align=right,xshift=-2em] (pushlabel) {\scriptsize{$\frac{\partial L}{\partial \mathbf w}$}};
+\draw[<-,very thick,blue] ([xshift=0.5em,yshift=2pt]processor2.north) -- ([xshift=0.5em,yshift=-2pt]serverbox.south) node [pos=0.5,align=left,xshift=2.2em] (fetchlabel) {\scriptsize{$\mathbf w_{new}$}};
+\draw[->,very thick,red] ([xshift=-0.5em,yshift=2pt]processor3.north) --
+ ([xshift=3em,yshift=-2pt]serverbox.south);
+\draw[<-,very thick,blue] ([xshift=0.5em,yshift=2pt]processor3.north) -- ([xshift=4em,yshift=-2pt]serverbox.south) node [pos=0.5,align=left,xshift=2.2em] (fetchlabel) {\scriptsize{fetch (F)}};
+\draw[->,very thick,red] ([xshift=-0.5em,yshift=2pt]processor1.north) -- ([xshift=-4em,yshift=-2pt]serverbox.south) node [pos=0.5,align=right,xshift=-2em] (pushlabel) {\scriptsize{push (P)}};
+\draw[<-,very thick,blue] ([xshift=0.5em,yshift=2pt]processor1.north) -- ([xshift=-3em,yshift=-2pt]serverbox.south);
+}
+%%%%%%%%%%%
+% asynchronous mode
+\tikzstyle{job} = [draw,rotate=90,minimum height=0.25in]
+\scriptsize{
+{
+\node[job,anchor=south west,fill=blue!50] (fetch21) at ([xshift=6em,yshift=1em]processor3.east) {\textbf{F}};
+\node[job,anchor=west,fill=orange!30] (minibatch21) at ([yshift=1pt]fetch21.east) {\tiny{minibatch3}};
+\node[job,anchor=west,fill=red!50] (push21) at ([yshift=1pt]minibatch21.east) {\textbf{P}};
+\node[job,anchor=north west,fill=blue!50] (fetch22) at ([xshift=0.8em]fetch21.south west) {\textbf{F}};
+\node[job,anchor=west,fill=orange!30] (minibatch22) at ([yshift=1pt]fetch22.east) {\tiny{minibatch2}};
+\node[job,anchor=west,fill=red!50] (push22) at ([yshift=1pt]minibatch22.east) {\textbf{P}};
+\node[job,anchor=north west,fill=blue!50] (fetch23) at ([xshift=0.8em]fetch22.south west) {\textbf{F}};
+\node[job,anchor=west,fill=orange!30,minimum width=8em] (minibatch23) at ([yshift=1pt]fetch23.east) {\scriptsize{minibatch1}};
+\node[job,anchor=west,fill=red!50] (push23) at ([yshift=1pt]minibatch23.east) {\textbf{P}};
+\node[anchor=south west,draw,fill=gray!20,minimum width=0.59in] (update21) at ([yshift=2pt]push21.north east) {Update};
+\node[anchor=south west,draw,fill=gray!20,minimum width=0.25in] (update22) at ([yshift=2pt]push23.north east) {\tiny{Upd.}};
+\node[anchor=north] (G21) at (fetch21.west) {\small{G3}};
+\node[anchor=north] (G22) at (fetch22.west) {\small{G2}};
+\node[anchor=north] (G23) at (fetch23.west) {\small{G1}};
+\node[anchor=north,align=center] (synlabel) at (G22.south) {\small{\textbf{Asynchronous}}\\\small{\textbf{Training}}};
+\draw[->,thick] ([xshift=1em]G23.east) -- ([xshift=1em,yshift=1.4in]G23.east) node [pos=0.5,rotate=90,yshift=-1em] {\small{time line}};
+\draw [<->,thin,dotted] ([xshift=-1pt]update21.west) -- ([xshift=1pt,yshift=-1.5em]serverbox.north east);
+\draw [<->,thin,dotted] ([xshift=-1pt]update22.west) -- ([xshift=1pt,yshift=-1.5em]serverbox.north east);
+}
+}
+}
+\end{scope}
+\end{tikzpicture}
+%%%------------------------------------------------------------------------------------------------------------
--- a/Chapter9/Figures/fig-perceptron-mode.tex
+++ b/Chapter9/Figures/fig-perceptron-mode.tex
+%%%------------------------------------------------------------------------------------------------------------
+ \begin{tikzpicture}
+\begin{scope}
+\node [anchor=center,circle,draw,ublue,very thick,minimum size=3.5em,fill=white,drop shadow={shadow xshift=0.1em,shadow yshift=-0.1em}] (neuron) at (0,0) {};
+\node [anchor=east] (x1) at ([xshift=-6em]neuron.west) {\Large{$x_1$}};
+\node [anchor=center] (x0) at ([yshift=3em]x1.center) {\Large{$x_0$}};
+\node [anchor=center] (x2) at ([yshift=-3em]x1.center) {\Large{$x_2$}};
+\node [anchor=west] (y) at ([xshift=6em]neuron.east) {\Large{$y$}};
+\node [anchor=center] (neuronmath) at (neuron.center) {\red{\small{$\sum \ge \sigma$}}};
+\draw [->,thick] (x0.east) -- (neuron.150) node [pos=0.5,above] {$w_0$};
+\draw [->,thick] (x1.east) -- (neuron.180) node [pos=0.5,above] {$w_1$};
+\draw [->,thick] (x2.east) -- (neuron.210) node [pos=0.5,above] {$w_2$};
+\draw [->,thick] (neuron.east) -- (y.west);
+\end{scope}
+\end{tikzpicture}
+%%%------------------------------------------------------------------------------------------------------------
--- a/Chapter9/Figures/fig-perceptron-to-predict-1.tex
+++ b/Chapter9/Figures/fig-perceptron-to-predict-1.tex
+%%%------------------------------------------------------------------------------------------------------------
+ \begin{tikzpicture}
+\begin{scope}
+\node [anchor=center,circle,draw,ublue,very thick,minimum size=3.5em,fill=white,drop shadow={shadow xshift=0.1em,shadow yshift=-0.1em}] (neuron) at (0,0) {};
+\node [anchor=east] (x1) at ([xshift=-6em]neuron.west) {$x_1$: ticket cheap enough?};
+\node [anchor=center] (x0) at ([yshift=3em]x1.center) {$x_0$: close enough?};
+\node [anchor=center] (x2) at ([yshift=-3em]x1.center) {$x_2$: girlfriend likes it?};
+\node [anchor=west] (y) at ([xshift=2em]neuron.east) {$y$: go or not?};
+{
+\draw [->,thick] (x0.east) -- (neuron.150) node [pos=0.5,above,yshift=0.2em] {\small{$w_0=1$}};
+\draw [->,thick] (x1.east) -- (neuron.180) node [pos=0.5,above,yshift=-0.1em] {\small{$w_1=1$}};
+\draw [->,thick] (x2.east) -- (neuron.210) node [pos=0.5,above,yshift=0.1em] {\small{$w_2=1$}};
+}
+\draw [->,thick] (neuron.east) -- (y.west);
+\node [anchor=center] (neuronmath) at (neuron.center) {\small{$\sum \ge \sigma$}};
+\end{scope}
+\end{tikzpicture}
+%%%------------------------------------------------------------------------------------------------------------
--- a/Chapter9/Figures/fig-perceptron-to-predict-2.tex
+++ b/Chapter9/Figures/fig-perceptron-to-predict-2.tex
+%%%------------------------------------------------------------------------------------------------------------
+\begin{tikzpicture}
+\begin{scope}
+\node [anchor=center,circle,draw,ublue,very thick,minimum size=3.5em,fill=white,drop shadow={shadow xshift=0.1em,shadow yshift=-0.1em}] (neuron) at (0,0) {};
+\node [anchor=east] (x1) at ([xshift=-6em]neuron.west) {$x_1$: ticket cheap enough?};
+\node [anchor=center] (x0) at ([yshift=3em]x1.center) {$x_0$: close enough?};
+\node [anchor=center] (x2) at ([yshift=-3em]x1.center) {$x_2$: girlfriend likes it?};
+\node [anchor=west] (y) at ([xshift=2em]neuron.east) {$y$: go or not?};
+\draw [->,thin] (x0.east) -- (neuron.150) node [pos=0.5,above,yshift=0.2em] {\small{$w_0=0.5$}};
+\draw [->,line width=0.5mm] (x1.east) -- (neuron.180) node [pos=0.5,above,yshift=-0.1em] {{\small\boldmath$w_1=2$}};
+\draw [->,thin] (x2.east) -- (neuron.210) node [pos=0.5,above,yshift=0.1em] {\small{$w_2=0.5$}};
+\draw [->,thick] (neuron.east) -- (y.west);
+\node [anchor=center] (neuronmath) at (neuron.center) {\small{$\sum \ge \sigma$}};
+\node [anchor=south] (ylabel) at (y.north) {\textbf{Not going!}};
+\end{scope}
+\end{tikzpicture}
+%%%------------------------------------------------------------------------------------------------------------
--- a/Chapter9/Figures/fig-perceptron-to-predict-3.tex
+++ b/Chapter9/Figures/fig-perceptron-to-predict-3.tex
+%%%------------------------------------------------------------------------------------------------------------
+\begin{tikzpicture}
+\begin{scope}
+\node [anchor=center,circle,draw,ublue,very thick,minimum size=3.5em,fill=white,drop shadow={shadow xshift=0.1em,shadow yshift=-0.1em}] (neuron) at (0,0) {};
+\node [anchor=east] (x1) at ([xshift=-6em]neuron.west) {$x_1$: cheapness};
+\node [anchor=center] (x0) at ([yshift=3em]x1.center) {$x_0$: closeness};
+\node [anchor=center] (x2) at ([yshift=-3em]x1.center) {$x_2$: girlfriend likes it?};
+\node [anchor=west] (y) at ([xshift=2em]neuron.east) {$y$: go or not?};
+\draw [->,thick] (neuron.east) -- (y.west);
+\node [anchor=center] (neuronmath) at (neuron.center) {\small{$\sum \ge \sigma$}};
+{
+\draw [->,dotted] (x0.east) -- (neuron.150) node [pos=0.5,above,yshift=0.2em] {\small{$w_0=0$}};
+\draw [->,dotted] (x1.east) -- (neuron.180) node [pos=0.5,above,yshift=-0.1em] {\small{$w_1=0$}};
+\draw [->,line width=0.5mm] (x2.east) -- (neuron.210) node [pos=0.5,above,yshift=0.1em] {\small{$w_2=10$}};
+}
+\end{scope}
+\end{tikzpicture}
+%%%------------------------------------------------------------------------------------------------------------
--- a/Chapter9/Figures/fig-piecewise.tex
+++ b/Chapter9/Figures/fig-piecewise.tex
+%%%------------------------------------------------------------------------------------------------------------
+\begin{tikzpicture}
+%% panel (b): piecewise-constant (step) approximation of the curve
+\begin{scope}[xshift=2in]
+\tikzstyle{neuronnode} = [minimum size=1.7em,circle,draw,ublue,very thick,inner sep=1pt, fill=white,align=center,drop shadow={shadow xshift=0.1em,shadow yshift=-0.1em}]
+%% output illustration
+\begin{scope}[xshift=2.8in,yshift=0.1in]
+{
+\draw [->,thick] (-2.2,0) -- (2.2,0);
+\draw [->,thick] (0,0) -- (0,2);
+\draw [-] (-0.05,1) -- (0.05,1);
+\node [anchor=north,inner sep=1pt] (labelb) at (0,-0.2) {\footnotesize{(b)}};
+}
+{
+\draw [->,thick] (-2.2,0) -- (2.2,0);
+\draw [->,thick] (0,0) -- (0,2);
+\draw [-,very thick,red,domain=-1.98:2,samples=100] plot (\x,{0.2 * (\x +0.4)^3 + 1.2 - 0.3 *(\x + 0.8)^2});
+}
+\foreach \n in {-1.9,-1.7,...,1.9}{
+    \pgfmathsetmacro{\result}{0.2 * (\n + 0.1 + 0.4)^3 + 1.2 - 0.3 *(\n + 0.1 + 0.8)^2}
+    \draw [-,ublue,thick] (\n,0) -- (\n, \result) -- (\n + 0.2, \result) -- (\n + 0.2, 0);
+}
+\end{scope}
+\end{scope}
+%% panel (a): the target curve
+\begin{scope}[xshift=0in]
+\tikzstyle{neuronnode} = [minimum size=1.7em,circle,draw,ublue,very thick,inner sep=1pt, fill=white,align=center,drop shadow={shadow xshift=0.1em,shadow yshift=-0.1em}]
+%% output illustration
+\begin{scope}[xshift=2.8in,yshift=0.1in]
+{
+\draw [->,thick] (-2.2,0) -- (2.2,0);
+\draw [->,thick] (0,0) -- (0,2);
+\draw [-] (-0.05,1) -- (0.05,1);
+\node [anchor=east,inner sep=1pt] (label1) at (0,1) {\tiny{1}};
+\node [anchor=south east,inner sep=1pt] (label2) at (0,0) {\tiny{0}};
+\node [anchor=north,inner sep=1pt] (labela) at (0,-0.2) {\footnotesize{(a)}};
+}
+{
+\draw [->,thick] (-2.2,0) -- (2.2,0);
+\draw [->,thick] (0,0) -- (0,2);
+\draw [-,very thick,red,domain=-1.98:2,samples=100] plot (\x,{0.2 * (\x +0.4)^3 + 1.2 - 0.3 *(\x + 0.8)^2});
+}
+\end{scope}
+\end{scope}
+\end{tikzpicture}
+%%%------------------------------------------------------------------------------------------------------------
--- a/Chapter9/Figures/fig-relu.tex
+++ b/Chapter9/Figures/fig-relu.tex
+%%%------------------------------------------------------------------------------------------------------------
+\begin{tikzpicture}
+        \draw[->, line width=1pt](-1.2,0)--(1.2,0)node[left,below,font=\tiny]{$x$};
+        \draw[->, line width=1pt](0,-1.2)--(0,1.2)node[right,font=\tiny]{$y$};
+        \draw[dashed](-1.2,1)--(1.2,1);
+        \draw[dashed](-1.2,-1)--(1.2,-1);
+        \foreach \x in {-1.0,-0.5,0.0,0.5,1.0}{\draw(\x,0)--(\x,0.05)node[below,outer sep=2pt,font=\tiny]at(\x,0){\x};}
+        \foreach \y in {0.5,1.0}{\draw(0,\y)--(0.05,\y)node[left,outer sep=2pt,font=\tiny]at(0,\y){\y};}
+        \draw[color=red ,domain=-1.2:1.2, line width=1pt]plot(\x,{max(\x,0)});
+        \node[black,anchor=south] at (0,1.2) {\small $y =\max (0, x)$};
+        \end{tikzpicture}
+%%%------------------------------------------------------------------------------------------------------------
--- a/Chapter9/Figures/fig-residual-structure.tex
+++ b/Chapter9/Figures/fig-residual-structure.tex
+%%%------------------------------------------------------------------------------------------------------------
+\begin{tikzpicture}
+\begin{scope}
+\node [anchor=center] (node6) at (0,0) {};
+\node[anchor=west](node6-1) at ([xshift=-0.2em,yshift=-0.6em]node6.east) {\footnotesize{$\mathrm{ReLU}$}};
+\node [anchor=north](node3)at ([yshift=-1.2em]node6.south){$\bigoplus$};
+\draw[->,thick]([yshift=-0.32em]node3.north)--(node6.south);
+\node [anchor=north,draw,thick](node2)at ([yshift=-1.2em]node3.south){\small{weight layer}};
+\draw[->,thick](node2.north)--([yshift=0.35em]node3.south);
+\node[anchor=west](node2-1) at ([xshift=2.1em,yshift=1.2em]node2.east) {$\mathbf{x}$};
+\node[anchor=north](node2-2) at ([xshift=0.2em,yshift=-0.3em]node2-1.south) {\footnotesize{$\mathrm{identity}$}};
+\node [anchor=east](node4) at ([xshift=-0.2em]node2.west) {$\textrm{F}(\mathbf{x})$};
+\node [anchor=east](node5) at ([xshift=-0.3em]node3.west) {$\textrm{F}(\mathbf{x})+\mathbf{x}$};
+\node [anchor=north](node1) at ([yshift=-1.8em]node2.south) {};
+\draw[->,thick]([yshift=0.0em]node1.north)--(node2.south);
+\node [anchor=east](node1-1) at ([xshift=1em,yshift=0.4em]node1.east) {$\mathbf{x}$};
+\draw[->,thick]([xshift=-1.3em,yshift=0.8em]node1-1.east)--([xshift=2.7em,yshift=0.8em]node1-1.east)--([xshift=2.7em,yshift=5.35em]node1-1.east)--([xshift=-0.4em]node3.east);
+\end{scope}
+\end{tikzpicture}
+%%%------------------------------------------------------------------------------------------------------------
--- a/Chapter9/Figures/fig-rnn-lm.tex
+++ b/Chapter9/Figures/fig-rnn-lm.tex
+\begin{tikzpicture}
+\begin{scope}
+\tikzstyle{rnnnode} = [draw,inner sep=5pt,minimum width=4em,minimum height=1.5em,fill=green!30!white,blur shadow={shadow xshift=1pt,shadow yshift=-1pt}]
+{
+\node [anchor=west,rnnnode] (node11) at (0,0) {\scriptsize{RNN Cell}};
+\node [anchor=west,rnnnode] (node12) at ([xshift=2em]node11.east) {\scriptsize{RNN Cell}};
+\node [anchor=west,rnnnode] (node13) at ([xshift=2em]node12.east) {\scriptsize{RNN Cell}};
+\node [anchor=west,rnnnode] (node14) at ([xshift=2em]node13.east) {\scriptsize{RNN Cell}};
+}
+\node [anchor=north,rnnnode,fill=red!30!white] (e1) at ([yshift=-1.2em]node11.south) {\tiny{$e_1=w_1\textbf{C}$}};
+\node [anchor=north,rnnnode,fill=red!30!white] (e2) at ([yshift=-1.2em]node12.south) {\tiny{$e_2=w_2\textbf{C}$}};
+\node [anchor=north,rnnnode,fill=red!30!white] (e3) at ([yshift=-1.2em]node13.south) {\tiny{$e_3=w_3\textbf{C}$}};
+\node [anchor=north,rnnnode,fill=red!30!white] (e4) at ([yshift=-1.2em]node14.south) {\tiny{$e_4=w_4\textbf{C}$}};
+\node [anchor=north] (w1) at ([yshift=-1em]e1.south) {\footnotesize{$w_1$}};
+\node [anchor=north] (w2) at ([yshift=-1em]e2.south) {\footnotesize{$w_2$}};
+\node [anchor=north] (w3) at ([yshift=-1em]e3.south) {\footnotesize{$w_3$}};
+\node [anchor=north] (w4) at ([yshift=-1em]e4.south) {\footnotesize{$w_4$}};
+\draw [->,thick] ([yshift=0.1em]w1.north)--([yshift=-0.1em]e1.south);
+\draw [->,thick] ([yshift=0.1em]w2.north)--([yshift=-0.1em]e2.south);
+\draw [->,thick] ([yshift=0.1em]w3.north)--([yshift=-0.1em]e3.south);
+\draw [->,thick] ([yshift=0.1em]w4.north)--([yshift=-0.1em]e4.south);
+\draw [->,thick] ([yshift=0.1em]e1.north)--([yshift=-0.1em]node11.south);
+\draw [->,thick] ([yshift=0.1em]e2.north)--([yshift=-0.1em]node12.south);
+\draw [->,thick] ([yshift=0.1em]e3.north)--([yshift=-0.1em]node13.south);
+\draw [->,thick] ([yshift=0.1em]e4.north)--([yshift=-0.1em]node14.south);
+{
+\node [anchor=south,rnnnode] (node21) at ([yshift=1.5em]node11.north) {\scriptsize{RNN Cell}};
+\node [anchor=south,rnnnode] (node22) at ([yshift=1.5em]node12.north) {\scriptsize{RNN Cell}};
+\node [anchor=south,rnnnode] (node23) at ([yshift=1.5em]node13.north) {\scriptsize{RNN Cell}};
+\node [anchor=south,rnnnode] (node24) at ([yshift=1.5em]node14.north) {\scriptsize{RNN Cell}};
+\node [anchor=south,rnnnode,fill=blue!30!white] (node31) at ([yshift=1.5em]node21.north) {\scriptsize{Softmax($\cdot$)}};
+\node [anchor=south,rnnnode,fill=blue!30!white] (node32) at ([yshift=1.5em]node22.north) {\scriptsize{Softmax($\cdot$)}};
+\node [anchor=south,rnnnode,fill=blue!30!white] (node33) at ([yshift=1.5em]node23.north) {\scriptsize{Softmax($\cdot$)}};
+\node [anchor=south,rnnnode,fill=blue!30!white] (node34) at ([yshift=1.5em]node24.north) {\scriptsize{Softmax($\cdot$)}};
+}
+{
+\draw [->,thick] ([yshift=0.1em]node31.north)--([yshift=1em]node31.north) node[pos=1,above] {\scriptsize{$\textrm{P}(w_2)$}};
+\draw [->,thick] ([yshift=0.1em]node32.north)--([yshift=1em]node32.north) node[pos=1,above] {\scriptsize{$\textrm{P}(w_3|w_2)$}};
+\draw [->,thick] ([yshift=0.1em]node33.north)--([yshift=1em]node33.north) node[pos=1,above] {\scriptsize{$\textrm{P}(w_4|w_2 w_3)$}};
+\draw [->,thick] ([yshift=0.1em]node34.north)--([yshift=1em]node34.north) node[pos=1,above] {\scriptsize{$\textrm{P}(w_5|w_2 w_3 w_4)$}};
+\draw [->,thick] ([yshift=0.1em]node21.north)--([yshift=-0.1em]node31.south);
+\draw [->,thick] ([yshift=0.1em]node22.north)--([yshift=-0.1em]node32.south);
+\draw [->,thick] ([yshift=0.1em]node23.north)--([yshift=-0.1em]node33.south);
+\draw [->,thick] ([yshift=0.1em]node24.north)--([yshift=-0.1em]node34.south);
+\draw [->,thick] ([xshift=-1em]node21.west)--([xshift=-0.1em]node21.west);
+\draw [->,thick] ([xshift=0.1em]node21.east)--([xshift=-0.1em]node22.west);
+\draw [->,thick] ([xshift=0.1em]node22.east)--([xshift=-0.1em]node23.west);
+\draw [->,thick] ([xshift=0.1em]node23.east)--([xshift=-0.1em]node24.west);
+\draw [->,thick] ([xshift=0.1em]node24.east)--([xshift=1em]node24.east);
+}
+\draw [->,thick] ([yshift=0.1em]node11.north)--([yshift=-0.1em]node21.south);
+\draw [->,thick] ([yshift=0.1em]node12.north)--([yshift=-0.1em]node22.south);
+\draw [->,thick] ([yshift=0.1em]node13.north)--([yshift=-0.1em]node23.south);
+\draw [->,thick] ([yshift=0.1em]node14.north)--([yshift=-0.1em]node24.south);
+\draw [->,thick] ([xshift=-1em]node11.west)--([xshift=-0.1em]node11.west);
+\draw [->,thick] ([xshift=0.1em]node11.east)--([xshift=-0.1em]node12.west);
+\draw [->,thick] ([xshift=0.1em]node12.east)--([xshift=-0.1em]node13.west);
+\draw [->,thick] ([xshift=0.1em]node13.east)--([xshift=-0.1em]node14.west);
+\draw [->,thick] ([xshift=0.1em]node14.east)--([xshift=1em]node14.east);
+\end{scope}
+\end{tikzpicture}
\ No newline at end of file
--- a/Chapter9/Figures/fig-rnn-model.tex
+++ b/Chapter9/Figures/fig-rnn-model.tex
+\begin{tikzpicture}
+\begin{scope}
+\tikzstyle{rnnnode} = [draw,inner sep=5pt,minimum width=4em,minimum height=1.5em,fill=green!30!white,blur shadow={shadow xshift=1pt,shadow yshift=-1pt}]
+\node [anchor=west,rnnnode] (node11) at (0,0) {\scriptsize{RNN Cell}};
+\node [anchor=west,rnnnode] (node12) at ([xshift=2em]node11.east) {\scriptsize{RNN Cell}};
+\node [anchor=west,rnnnode] (node13) at ([xshift=2em]node12.east) {\scriptsize{RNN Cell}};
+\node [anchor=west,rnnnode] (node14) at ([xshift=2em]node13.east) {\scriptsize{RNN Cell}};
+\node [anchor=north,rnnnode,fill=red!30!white] (e1) at ([yshift=-1.2em]node11.south) {\scriptsize{embedding}};
+\node [anchor=north,rnnnode,fill=red!30!white] (e2) at ([yshift=-1.2em]node12.south) {\scriptsize{embedding}};
+\node [anchor=north,rnnnode,fill=red!30!white] (e3) at ([yshift=-1.2em]node13.south) {\scriptsize{embedding}};
+\node [anchor=north,rnnnode,fill=red!30!white] (e4) at ([yshift=-1.2em]node14.south) {\scriptsize{embedding}};
+\node [anchor=north] (w1) at ([yshift=-1em]e1.south) {\footnotesize{Jobs}};
+\node [anchor=north] (w2) at ([yshift=-1em]e2.south) {\footnotesize{works}};
+\node [anchor=north] (w3) at ([yshift=-1em]e3.south) {\footnotesize{at}};
+\node [anchor=north] (w4) at ([yshift=-1em]e4.south) {\footnotesize{Apple}};
+\draw [->,thick] ([yshift=0.1em]w1.north)--([yshift=-0.1em]e1.south);
+\draw [->,thick] ([yshift=0.1em]w2.north)--([yshift=-0.1em]e2.south);
+\draw [->,thick] ([yshift=0.1em]w3.north)--([yshift=-0.1em]e3.south);
+\draw [->,thick] ([yshift=0.1em]w4.north)--([yshift=-0.1em]e4.south);
+\draw [->,thick] ([yshift=0.1em]e1.north)--([yshift=-0.1em]node11.south);
+\draw [->,thick] ([yshift=0.1em]e2.north)--([yshift=-0.1em]node12.south);
+\draw [->,thick] ([yshift=0.1em]e3.north)--([yshift=-0.1em]node13.south);
+\draw [->,thick] ([yshift=0.1em]e4.north)--([yshift=-0.1em]node14.south);
+\node [anchor=south,rnnnode] (node21) at ([yshift=1.5em]node11.north) {\scriptsize{RNN Cell}};
+\node [anchor=south,rnnnode] (node22) at ([yshift=1.5em]node12.north) {\scriptsize{RNN Cell}};
+\node [anchor=south,rnnnode] (node23) at ([yshift=1.5em]node13.north) {\scriptsize{RNN Cell}};
+\node [anchor=south,rnnnode] (node24) at ([yshift=1.5em]node14.north) {\scriptsize{RNN Cell}};
+\node [anchor=south] (node31) at ([yshift=1.0em]node21.north) {\scriptsize{representation}};
+\node [anchor=south west] (node31new) at ([yshift=-0.3em]node31.north west) {\scriptsize{``Jobs''}};
+\node [anchor=south] (node32) at ([yshift=1.0em]node22.north) {\scriptsize{representation}};
+\node [anchor=south west] (node32new) at ([yshift=-0.3em]node32.north west) {\scriptsize{``Jobs works''}};
+\node [anchor=south] (node33) at ([yshift=1.0em]node23.north) {\scriptsize{representation}};
+\node [anchor=south west] (node33new) at ([yshift=-0.3em]node33.north west) {\scriptsize{``Jobs works at''}};
+\node [anchor=south] (node34) at ([yshift=1.0em]node24.north) {\scriptsize{representation}};
+\node [anchor=south west] (node34new) at ([yshift=-0.3em]node34.north west) {\scriptsize{``Jobs works at Apple''}};
+\draw [->,thick] ([yshift=0.1em]node21.north)--([yshift=-0.1em]node31.south);
+\draw [->,thick] ([yshift=0.1em]node22.north)--([yshift=-0.1em]node32.south);
+\draw [->,thick] ([yshift=0.1em]node23.north)--([yshift=-0.1em]node33.south);
+\draw [->,thick] ([yshift=0.1em]node24.north)--([yshift=-0.1em]node34.south);
+\draw [->,thick] ([xshift=-1em]node21.west)--([xshift=-0.1em]node21.west);
+\draw [->,thick] ([xshift=0.1em]node21.east)--([xshift=-0.1em]node22.west);
+\draw [->,thick] ([xshift=0.1em]node22.east)--([xshift=-0.1em]node23.west);
+\draw [->,thick] ([xshift=0.1em]node23.east)--([xshift=-0.1em]node24.west);
+\draw [->,thick] ([xshift=0.1em]node24.east)--([xshift=1em]node24.east);
+\draw [->,thick] ([yshift=0.1em]node11.north)--([yshift=-0.1em]node21.south);
+\draw [->,thick] ([yshift=0.1em]node12.north)--([yshift=-0.1em]node22.south);
+\draw [->,thick] ([yshift=0.1em]node13.north)--([yshift=-0.1em]node23.south);
+\draw [->,thick] ([yshift=0.1em]node14.north)--([yshift=-0.1em]node24.south);
+\draw [->,thick] ([xshift=-1em]node11.west)--([xshift=-0.1em]node11.west);
+\draw [->,thick] ([xshift=0.1em]node11.east)--([xshift=-0.1em]node12.west);
+\draw [->,thick] ([xshift=0.1em]node12.east)--([xshift=-0.1em]node13.west);
+\draw [->,thick] ([xshift=0.1em]node13.east)--([xshift=-0.1em]node14.west);
+\draw [->,thick] ([xshift=0.1em]node14.east)--([xshift=1em]node14.east);
+{
+\node [anchor=south] (toplabel1) at ([yshift=2em,xshift=-1.3em]node32new.north) {\footnotesize{Representation of ``Apple'':}};
+\node [anchor=west,fill=blue!20!white,minimum width=3em] (toplabel2) at (toplabel1.east) {\footnotesize{context}};
+}
+{
+\node [anchor=west,fill=red!20!white,minimum width=3em] (toplabel3) at (toplabel2.east) {\footnotesize{word}};
+}
+\begin{pgfonlayer}{background}
+{
+\node [rectangle,inner sep=2pt,draw,thick,dashed,red] [fit = (e4)] (r2) {};
+\draw [->,thick,red] (r2.west) .. controls +(west:0.8) and +(south:2) .. ([xshift=1.3em]toplabel3.south);
+}
+{
+\node [rectangle,inner sep=2pt,draw,thick,dashed,ublue,fill=white] [fit = (node33) (node33new)] (r1) {};
+\draw [->,thick,ublue] ([xshift=-2em]r1.north) .. controls +(north:0.7) and +(south:0.7) .. ([xshift=-0.5em]toplabel2.south);
+}
+\end{pgfonlayer}
+\end{scope}
+\end{tikzpicture}
\ No newline at end of file
--- a/Chapter9/Figures/fig-rotation.tex
+++ b/Chapter9/Figures/fig-rotation.tex
+%%%------------------------------------------------------------------------------------------------------------
+\begin{tikzpicture}
+\tikzstyle{neuron} = [rectangle,draw,thick,fill=red!30,red!35,minimum height=2em,minimum width=2em,font=\small]
+\node[neuron,anchor=north] (a1) at (0,0) {};
+\draw[->,thick] ([xshift=-2em,yshift=0em]a1.south) to ([xshift=3em,yshift=0em]a1.south);
+\draw[->,thick] ([xshift=0em,yshift=-4em]a1.west) to ([xshift=0em,yshift=2em]a1.west);
+\node[below] at ([xshift=0.5em,yshift=-1em]a1.west){0};
+\node[below] at ([xshift=2em,yshift=-1em]a1.west){1};
+\node[below] at ([xshift=-0.5em,yshift=2em]a1.west){1};
+\node [anchor=west] (x) at ([xshift=-0.7em,yshift=1em]a1.south) {\Large{$\textbf{F}$}};
+{
+\tikzstyle{neuron} = [rectangle,draw,thick,fill=red!30,red!35,minimum height=2em,minimum width=2em,font=\small]
+\node[neuron,anchor=north] (a2) at ([xshift=10em,yshift=0em]a1.south) {};
+\draw[->,thick] ([xshift=-2em,yshift=0em]a2.north) to ([xshift=3em,yshift=0em]a2.north);
+\draw[->,thick] ([xshift=0em,yshift=-2em]a2.west) to ([xshift=0em,yshift=4em]a2.west);
+\node[above] at ([xshift=0.5em,yshift=1em]a2.west){0};
+\node[above] at ([xshift=2em,yshift=1em]a2.west){1};
+\node[below] at ([xshift=-0.5em,yshift=0em]a2.west){$-1$};
+\node [anchor=west] (x) at ([xshift=-3.5cm,yshift=2em]a2.north) {\scriptsize{
+    $w=\begin{bmatrix}
+    1&0&0\\
+    0&-1&0\\
+    0&0&1
+    \end{bmatrix}$}
+    };
+\node [anchor=west,rotate = 180] (x) at ([xshift=0.7em,yshift=1em]a2.south) {\Large{$\textbf{F}$}};
+\draw[-stealth, line width=2pt,dashed] ([xshift=4em,yshift=0em]a1.south) to ([xshift=-3em,yshift=0em]a2.north);
+}
+\end{tikzpicture}
+%%%------------------------------------------------------------------------------------------------------------
--- a/Chapter9/Figures/fig-save.tex
+++ b/Chapter9/Figures/fig-save.tex
+%%%------------------------------------------------------------------------------------------------------------
+\begin{tikzpicture}
+\begin{scope}
+\setcounter{mycount1}{1}
+\draw[step=0.5cm,thick] (0,-0) grid (1.5,0.5);
+\foreach \x in {0.25,0.75,1.25}{
+    \node [fill=green!20,inner sep=0pt,minimum height=0.49cm,minimum width=0.49cm](vector1) at (\x,0.25) {$\number\value{mycount1}$};
+    \addtocounter{mycount1}{1}
+}
+\node [anchor=north] (labela) at ([xshift=-1.2em,yshift=-0em]vector1.south) {\footnotesize{(a) }};
+\end{scope}
+\begin{scope}[xshift=1.2in]
+\draw[step=0.5cm,thick] (0,-0) grid (3.0,0.5);
+\setcounter{mycount2}{1}
+\foreach \x in {0.25,0.75,1.25}{
+    \node [fill=green!20,inner sep=0pt,minimum height=0.49cm,minimum width=0.49cm] (vector2)at (\x,0.25) {$\number\value{mycount2}$};
+    \addtocounter{mycount2}{1}
+}
+\foreach \x in {1.75,2.25,2.75}{
+    \node [fill=red!20,inner sep=0pt,minimum height=0.49cm,minimum width=0.49cm] at (\x,0.25) {$\number\value{mycount2}$};
+    \addtocounter{mycount2}{1}
+}
+\node [anchor=north] (labelb) at ([xshift=0.3em,yshift=-0em]vector2.south) {\footnotesize{(b) }};
+\end{scope}
+\begin{scope}[yshift=-0.6in]
+\draw[step=0.5cm,thick] (0,-0) grid (6.0,0.5);
+\setcounter{mycount3}{1}
+\foreach \x in {0.25,0.75,1.25}{
+    \node [fill=green!20,inner sep=0pt,minimum height=0.49cm,minimum width=0.49cm] at (\x,0.25) {$\number\value{mycount3}$};
+    \addtocounter{mycount3}{1}
+}
+\foreach \x in {1.75,2.25,2.75}{
+    \node [fill=red!20,inner sep=0pt,minimum height=0.49cm,minimum width=0.49cm] at (\x,0.25) {$\number\value{mycount3}$};
+    \addtocounter{mycount3}{1}
+}
+\foreach \x in {3.25,3.75,4.25}{
+    \node [fill=green!20,inner sep=0pt,minimum height=0.49cm,minimum width=0.49cm] at (\x,0.25) {$\number\value{mycount3}$};
+    \addtocounter{mycount3}{1}
+}
+\foreach \x in {4.75,5.25,5.75}{
+    \node [fill=red!20,inner sep=0pt,minimum height=0.49cm,minimum width=0.49cm] at (\x,0.25) {$\number\value{mycount3}$};
+    \addtocounter{mycount3}{1}
+}
+\draw[decorate,thick,decoration={brace,mirror,raise=0.2em}] (0,-0.2) -- (2.95,-0.2);
+\draw[decorate,thick,decoration={brace,mirror,raise=0.2em}] (3.05,-0.2) -- (6,-0.2);
+\node [anchor=north] (subtensor1) at (1.5,-0.4) {\footnotesize{$3 \times 2$ sub-tensor}};
+\node [anchor=north] (subtensor2) at (4.5,-0.4) {\footnotesize{$3 \times 2$ sub-tensor}};
+\node [anchor=north] (labelc) at (3,-0.8) {\footnotesize{(c)}};
+\end{scope}
+\end{tikzpicture}
+%%%------------------------------------------------------------------------------------------------------------
--- a/Chapter9/Figures/fig-sawtooth.tex
+++ b/Chapter9/Figures/fig-sawtooth.tex
+%%%------------------------------------------------------------------------------------------------------------
+\begin{tikzpicture}
+\begin{scope}
+\node [anchor=center,color=red] (node1) at (0,0) {};
+\node [anchor=north,color=red] (node4) at ([xshift=9.0em,yshift=0.88em]node1.south) {\large{$\bullet$}};
+\node [anchor=north,color=red] (node5) at ([xshift=6.35em,yshift=-2.5em]node1.south) {\large{$\bullet$}};
+\node [anchor=north,color=red] (node6) at ([xshift=3.8em,yshift=-0.5em]node1.south) {\large{$\bullet$}};
+\node [anchor=north,color=red] (node7) at ([xshift=2.6em,yshift=-2.0em]node1.south) {\large{$\bullet$}};
+\node [anchor=north,color=red] (node8) at ([xshift=1.22em,yshift=-0.91em]node1.south) {\large{$\bullet$}};
+\node [anchor=north,color=red] (node9) at ([xshift=0.6em,yshift=-1.7em]node1.south) {\large{$\bullet$}};
+\node [anchor=north,color=red] (node10) at ([xshift=0.0em,yshift=-1.2em]node1.south) {\large{$\bullet$}};
+\draw[-,ublue,line width=0.3mm]([xshift=0.5em,yshift=0.46em]node4.south west)--([xshift=-0.5em,yshift=-0.4em]node5.north east);
+\draw[-,ublue,line width=0.3mm]([xshift=-0.45em,yshift=0.52em]node6.south east)--([xshift=0.47em,yshift=-0.43em]node5.north west);
+\draw[-,ublue,line width=0.3mm]([xshift=0.5em,yshift=0.46em]node6.south west)--([xshift=-0.5em,yshift=-0.4em]node7.north east);
+\draw[-,ublue,line width=0.3mm]([xshift=-0.45em,yshift=0.52em]node8.south east)--([xshift=0.47em,yshift=-0.43em]node7.north west);
+\draw[-,ublue,line width=0.3mm]([xshift=0.5em,yshift=0.46em]node8.south west)--([xshift=-0.5em,yshift=-0.4em]node9.north east);
+\draw[-,ublue,line width=0.3mm]([xshift=-0.78em,yshift=0.77em]node9.south east)--([xshift=0.78em,yshift=-0.68em]node10.north west);
+\draw [-,ublue] (0,0) .. controls (2,0) and (3,-1.0)..(3,-1.5) .. controls (3,-2.2) and (2,-1.75)..(1.5,-1.65)..controls (1.5,-1.65) and (0.5,-1.45)..(0,-1.45)..controls (-0.5,-1.45) and (-1.5,-1.65)..(-1.5,-1.65)..controls (-2,-1.75)and (-3,-2.2).. (-3,-1.5)..controls (-3,-1.0) and (-2,0)..(0,0);
+\draw [-,ublue] (0,0.5)..controls (2,0.5) and (4,-1.0).. (4,-1.7)..controls(4,-2.6)and (3,-2.3)..(2,-2.05)..controls (2,-2.05) and (1,-1.80)..(0,-1.80)..controls (-1,-1.80)and (-2,-2.05)..(-2,-2.05)..controls(-3,-2.3)and(-4,-2.6)..(-4,-1.7)..controls(-4,-1.0)and (-2,0.5)..(0,0.5);
+\draw[-,ublue](0,1.0)..controls(3,1.0) and (5,-1.0)..(5,-1.9)..controls (5,-3.2)and (4,-2.7)..(3,-2.5)..controls (3,-2.5) and (2,-2.20)..(0,-2.15)..controls (-2,-2.20)and (-3,-2.5)..(-3,-2.5)..controls (-4,-2.7) and (-5,-3.2) ..(-5,-1.9)..controls (-5,-1.0) and (-3,1.0)..(0,1.0);
+\draw[-,ublue] (0,-0.3)..controls (1.5,-0.3)and (2.5,-1.0)..(2.5,-1.4)..controls(2.5,-1.8)and (2,-1.55)..(1.5,-1.45) ..controls (1.5,-1.45) and (0.5,-1.25)..(0,-1.25) .. controls(-0.5,-1.25)and (-1.5,-1.45)..(-1.5,-1.45)..controls(-2,-1.55)and (-2.5,-1.8) ..(-2.5,-1.4)..controls(-2.5,-1.0) and (-1.5,-0.3)..(0,-0.3);
+\draw[-,ublue](0,-0.5)..controls (1.0,-0.5) and (1.9,-0.8)..(1.9,-1.3)..controls(1.9,-1.5)and (1.5,-1.3)..(1.0,-1.2) ..controls(1.0,-1.2) and (0.5,-1.1)..(0,-1.1)..controls(-0.5,-1.1) and (-1.0,-1.2)..(-1.0,-1.2)..controls (-1.5,-1.3)and (-1.9,-1.5)..(-1.9,-1.3) ..controls(-1.9,-0.8)and (-1.0,-0.5) ..(0,-0.5);
+\draw[-,ublue](0,-0.7)..controls(1.0,-0.7) and (1.4,-0.9)..(1.4,-1.1) .. controls(1.4,-1.25) and (1.2,-1.15)..(1.0,-1.1)..controls(1.0,-1.1) and (0.5,-0.95)..(0,-0.95)..controls(-0.5,-0.95)and (-1.0,-1.1) ..(-1.0,-1.1)..controls(-1.2,-1.15) and (-1.4,-1.25)..(-1.4,-1.1)..controls(-1.4,-0.9) and (-1.0,-0.7)..(0,-0.7);
+\draw[-,ublue](0,-0.75)..controls(0.7,-0.75)and (1.0,-0.9)..(1.0,-1.0)..controls(1.0,-1.05) and (0.9,-1.05)..(0.7,-1.0)..controls(0.5,-0.95)and (0.3,-0.9)..(0,-0.9)..controls(-0.3,-0.9)and (-0.5,-0.95)..(-0.7,-1.0)..controls(-0.9,-1.05)and (-1.0,-1.05)..(-1.0,-1.0) ..controls(-1.0,-0.9)and (-0.7,-0.75)..(0,-0.75);
+\draw[-,ublue](0,-0.8)..controls(0.5,-0.8) and (0.6,-0.85)..(0.6,-0.9)..controls(0.6,-0.93)and (0.5,-0.91)..(0.3,-0.88)..controls(0.2,-0.87)and (0.1,-0.86)..(0,-0.86)..controls(-0.1,-0.86)and(-0.2,-0.87)..(-0.3,-0.88)..controls(-0.5,-0.91) and(-0.6,-0.93) ..(-0.6,-0.9)..controls(-0.6,-0.85)and (-0.5,-0.8)..(0,-0.8);
+\node [anchor=north] (labela) at (0,-2.7) {\footnotesize{(a) The ``sawtooth'' phenomenon in gradient descent}};
+\end{scope}
+%%%%%%%%%%%%%%%%%%%%%%%%%%%
+\begin{scope}[yshift=-1.8in]
+\node [anchor=center,color=red] (node1) at (0,0) {};
+\node [anchor=north,color=red] (node2) at ([xshift=0.0em,yshift=-1.2em]node1.south) {\large{$\bullet$}};
+\node [anchor=north,color=red] (node3) at ([xshift=3.55em,yshift=-0.981em]node1.south) {\large{$\bullet$}};
+\node [anchor=north,color=red] (node4) at ([xshift=7.75em,yshift=-2.91em]node1.south) {\large{$\bullet$}};
+\node [anchor=north,color=red] (node5) at ([xshift=11.38em,yshift=-1.11em]node1.south) {\large{$\bullet$}};
+\draw[-,ublue,line width=0.3mm]([xshift=0.79em,yshift=-0.59em]node2.north west)--([xshift=-0.75em,yshift=0.6em]node3.south east);
+\draw[-,ublue,line width=0.3mm]([xshift=0.79em,yshift=0.66em]node3.south west)--([xshift=-0.76em,yshift=-0.5em]node4.north east);
+\draw[-,ublue,line width=0.3mm]([xshift=0.79em,yshift=-0.59em]node4.north west)--([xshift=-0.75em,yshift=0.6em]node5.south east);
+\draw [-,ublue] (0,0) .. controls (2,0) and (3,-1.0)..(3,-1.5) .. controls (3,-2.2) and (2,-1.75)..(1.5,-1.65)..controls (1.5,-1.65) and (0.5,-1.45)..(0,-1.45)..controls (-0.5,-1.45) and (-1.5,-1.65)..(-1.5,-1.65)..controls (-2,-1.75)and (-3,-2.2).. (-3,-1.5)..controls (-3,-1.0) and (-2,0)..(0,0);
+\draw [-,ublue] (0,0.5)..controls (2,0.5) and (4,-1.0).. (4,-1.7)..controls(4,-2.6)and (3,-2.3)..(2,-2.05)..controls (2,-2.05) and (1,-1.80)..(0,-1.80)..controls (-1,-1.80)and (-2,-2.05)..(-2,-2.05)..controls(-3,-2.3)and(-4,-2.6)..(-4,-1.7)..controls(-4,-1.0)and (-2,0.5)..(0,0.5);
+\draw[-,ublue](0,1.0)..controls(3,1.0) and (5,-1.0)..(5,-1.9)..controls (5,-3.2)and (4,-2.7)..(3,-2.5)..controls (3,-2.5) and (2,-2.20)..(0,-2.15)..controls (-2,-2.20)and (-3,-2.5)..(-3,-2.5)..controls (-4,-2.7) and (-5,-3.2) ..(-5,-1.9)..controls (-5,-1.0) and (-3,1.0)..(0,1.0);
+\draw[-,ublue] (0,-0.3)..controls (1.5,-0.3)and (2.5,-1.0)..(2.5,-1.4)..controls(2.5,-1.8)and (2,-1.55)..(1.5,-1.45) ..controls (1.5,-1.45) and (0.5,-1.25)..(0,-1.25) .. controls(-0.5,-1.25)and (-1.5,-1.45)..(-1.5,-1.45)..controls(-2,-1.55)and (-2.5,-1.8) ..(-2.5,-1.4)..controls(-2.5,-1.0) and (-1.5,-0.3)..(0,-0.3);
+\draw[-,ublue](0,-0.5)..controls (1.0,-0.5) and (1.9,-0.8)..(1.9,-1.3)..controls(1.9,-1.5)and (1.5,-1.3)..(1.0,-1.2) ..controls(1.0,-1.2) and (0.5,-1.1)..(0,-1.1)..controls(-0.5,-1.1) and (-1.0,-1.2)..(-1.0,-1.2)..controls (-1.5,-1.3)and (-1.9,-1.5)..(-1.9,-1.3) ..controls(-1.9,-0.8)and (-1.0,-0.5) ..(0,-0.5);
+\draw[-,ublue](0,-0.7)..controls(1.0,-0.7) and (1.4,-0.9)..(1.4,-1.1) .. controls(1.4,-1.25) and (1.2,-1.15)..(1.0,-1.1)..controls(1.0,-1.1) and (0.5,-0.95)..(0,-0.95)..controls(-0.5,-0.95)and (-1.0,-1.1) ..(-1.0,-1.1)..controls(-1.2,-1.15) and (-1.4,-1.25)..(-1.4,-1.1)..controls(-1.4,-0.9) and (-1.0,-0.7)..(0,-0.7);
+\draw[-,ublue](0,-0.75)..controls(0.7,-0.75)and (1.0,-0.9)..(1.0,-1.0)..controls(1.0,-1.05) and (0.9,-1.05)..(0.7,-1.0)..controls(0.5,-0.95)and (0.3,-0.9)..(0,-0.9)..controls(-0.3,-0.9)and (-0.5,-0.95)..(-0.7,-1.0)..controls(-0.9,-1.05)and (-1.0,-1.05)..(-1.0,-1.0) ..controls(-1.0,-0.9)and (-0.7,-0.75)..(0,-0.75);
+\draw[-,ublue](0,-0.8)..controls(0.5,-0.8) and (0.6,-0.85)..(0.6,-0.9)..controls(0.6,-0.93)and (0.5,-0.91)..(0.3,-0.88)..controls(0.2,-0.87)and (0.1,-0.86)..(0,-0.86)..controls(-0.1,-0.86)and(-0.2,-0.87)..(-0.3,-0.88)..controls(-0.5,-0.91) and(-0.6,-0.93) ..(-0.6,-0.9)..controls(-0.6,-0.85)and (-0.5,-0.8)..(0,-0.8);
+\node [anchor=north] (labelb) at (0,-3) {\footnotesize{(b) Gradient descent with momentum updates more ``smoothly''}};
+\end{scope}
+\end{tikzpicture}
+%%%------------------------------------------------------------------------------------------------------------
--- a/Chapter9/Figures/fig-sigmoid.tex
+++ b/Chapter9/Figures/fig-sigmoid.tex
+%%%------------------------------------------------------------------------------------------------------------
+ \begin{tikzpicture}
+        \draw[->, line width=1pt](-1.2,0)--(1.2,0)node[left,below,font=\tiny]{$x$};
+        \draw[->, line width=1pt](0,-1.2)--(0,1.2)node[right,font=\tiny]{$y$};
+        \draw[dashed](-1.2,1)--(1.2,1);
+        \foreach \x in {-1,-0.5,0,0.5,1}{\draw(\x,0)--(\x,0.05)node[below,outer sep=2pt,font=\tiny]at(\x,0){
+            \pgfmathparse{(\x)*5}
+            \pgfmathresult};}
+        \foreach \y in {0.5,1.0}{\draw(0,\y)--(0.05,\y)node[left,outer sep=2pt,font=\tiny]at(0,\y){\y};}
+        \draw[color=red,domain=-1.2:1.2, line width=1pt]plot(\x,{1/(1+(exp(-5*\x)))});
+        \node[black,anchor=south] at (0,1.2) {\small $y = \frac{1}{1+e^{-x}}$};
+        \end{tikzpicture}
+%%%------------------------------------------------------------------------------------------------------------
--- a/Chapter9/Figures/fig-single-layer-of-neural-network-for-weather-prediction.tex
+++ b/Chapter9/Figures/fig-single-layer-of-neural-network-for-weather-prediction.tex
+%%%------------------------------------------------------------------------------------------------------------
+\begin{tikzpicture}
+\begin{scope}
+\tikzstyle{neuronnode} = [minimum size=1.5em,circle,draw,ublue,very thick,fill=white,drop shadow={shadow xshift=0.1em,shadow yshift=-0.1em}]
+\node [anchor=center,neuronnode] (neuron00) at (0,0) {};
+\node [anchor=center,neuronnode] (neuron01) at ([yshift=-3em]neuron00) {};
+\node [anchor=center,neuronnode] (neuron02) at ([yshift=-3em]neuron01) {};
+\node [anchor=east] (x0) at ([xshift=-6em]neuron00.west) {$x_0$};
+\node [anchor=east] (x1) at ([xshift=-6em]neuron01.west) {$x_1$};
+\node [anchor=east] (x2) at ([xshift=-6em]neuron02.west) {$b$};
+\node [anchor=west] (y0) at ([xshift=4em]neuron00.east) {$y_0$};
+\draw [->] (x0.east) -- (neuron00.180) node [pos=0.1,above] {\tiny{$w_{00}$}};
+\draw [->] (x1.east) -- (neuron00.200) node [pos=0.1,above] {\tiny{$w_{10}$}};
+\draw [->] (x2.east) -- (neuron00.220) node [pos=0.05,above,yshift=0.3em] {\tiny{$b_{0}$}};
+\draw [->] (neuron00.east) -- (y0.west);
+\node [anchor=west] (y1) at ([xshift=4em]neuron01.east) {$y_1$};
+\draw [->] (x0.east) -- (neuron01.160) node [pos=0.4,above] {\tiny{$w_{01}$}};
+\draw [->] (x1.east) -- (neuron01.180) node [pos=0.35,above,yshift=-0.2em] {\tiny{$w_{11}$}};
+\draw [->] (x2.east) -- (neuron01.200) node [pos=0.3,below,yshift=0.2em] {\tiny{$b_{1}$}};
+\draw [->] (neuron01.east) -- (y1.west);
+\node [anchor=west] (y2) at ([xshift=4em]neuron02.east) {$y_2$};
+\draw [->] (x0.east) -- (neuron02.140) node [pos=0.1,below,yshift=-0.2em] {\tiny{$w_{02}$}};
+\draw [->] (x1.east) -- (neuron02.160) node [pos=0.1,below] {\tiny{$w_{12}$}};
+\draw [->] (x2.east) -- (neuron02.180) node [pos=0.3,below] {\tiny{$b_{2}$}};
+\draw [->] (neuron02.east) -- (y2.west);
+\node [anchor=east,align=left] (inputlabel) at ([xshift=-0.1em]x1.west) {\scriptsize{Input vector}:\\\small{$\textbf{x}=(x_0,x_1)$}};
+\node [anchor=west,align=left] (outputlabel) at ([xshift=0.1em]y1.east) {\scriptsize{Output vector}:\\\small{$\textbf{y}=(y_0,y_1,y_2)$}};
+\begin{pgfonlayer}{background}
+\node [rectangle,inner sep=0.4em,fill=red!20] [fit = (neuron00) (neuron01) (neuron02)] (layer) {};
+\node [anchor=south] (layerlabel) at ([yshift=0.2em]layer.north) {\scriptsize{One layer of neurons}};
+\node [rectangle,inner sep=0.1em,fill=ugreen!20] [fit = (x0) (x1)] (inputshadow) {};
+\node [rectangle,inner sep=0.1em,fill=blue!20] [fit = (y0) (y1) (y2)] (outputshadow) {};
+\end{pgfonlayer}
+\end{scope}
+\end{tikzpicture}
+%%%------------------------------------------------------------------------------------------------------------
--- a/Chapter9/Figures/fig-softplus.tex
+++ b/Chapter9/Figures/fig-softplus.tex
+%%%------------------------------------------------------------------------------------------------------------
+\begin{tikzpicture}
+\draw[->, line width=1pt](-1.2,0)--(1.2,0)node[left,below,font=\tiny]{$x$};
+\draw[->, line width=1pt](0,-1.2)--(0,1.2)node[right,font=\tiny]{$y$};
+\foreach \x in {-1.0,-0.5,0.0,0.5,1.0}{\draw(\x,0)--(\x,0.05)node[below,outer sep=2pt,font=\tiny]at(\x,0){\x};}
+ \foreach \y in {1.0,0.5}{\draw(0,\y)--(0.05,\y)node[left,outer sep=2pt,font=\tiny]at(0,\y){\y};}
+\draw[color=red ,domain=-1.2:1, line width=1pt]plot(\x,{ln(1+exp(\x))});
+\node[black,anchor=south] at (0,1.2) {\small $y = \ln(1+e^x)$};
+\end{tikzpicture}
+%%%------------------------------------------------------------------------------------------------------------
--- a/Chapter9/Figures/fig-tanh.tex
+++ b/Chapter9/Figures/fig-tanh.tex
+%%%------------------------------------------------------------------------------------------------------------
+\begin{tikzpicture}
+        \draw[->, line width=1pt](-1.2,0)--(1.2,0)node[left,below,font=\tiny]{$x$};
+        \draw[->, line width=1pt](0,-1.2)--(0,1.2)node[right,font=\tiny]{$y$};
+        \draw[dashed](-1.2,1)--(1.2,1);
+        \draw[dashed](-1.2,-1)--(1.2,-1);
+        \foreach \x in {-1.0,-0.5,0.0,0.5,1.0}{\draw(\x,0)--(\x,0.05)node[below,outer sep=2pt,font=\tiny]at(\x,0){\x};}
+        \foreach \y in {0.5,1.0}{\draw(0,\y)--(0.05,\y)node[left,outer sep=2pt,font=\tiny]at(0,\y){\y};}
+        \draw[color=red ,domain=-1.2:1.2, line width=1pt]plot(\x,{tanh(\x)});
+        \node[black,anchor=south] at (0,1.2) {\small $y = \frac{e^{x}-e^{-x}}{e^{x}+e^{-x}}$};
+        \end{tikzpicture}
+%%%------------------------------------------------------------------------------------------------------------
--- a/Chapter9/Figures/fig-tensor-mul.tex
+++ b/Chapter9/Figures/fig-tensor-mul.tex
+%%%------------------------------------------------------------------------------------------------------------
+\begin{tikzpicture}
+\begin{scope}[yshift=6.5em,xshift=1em]
+{
+\setcounter{mycount1}{1}
+\draw[step=0.5cm,color=orange,thick] (-1,-1) grid (1,1);
+\foreach \y in {+0.75,+0.25,-0.25,-0.75}
+  \foreach \x in {-0.75,-0.25,0.25,0.75}{
+    \node [fill=orange!20,inner sep=0pt,minimum height=0.49cm,minimum width=0.49cm] at (\x,\y) {$\number\value{mycount1}$};
+    \addtocounter{mycount1}{1};
+  }
+}
+\node [anchor=south west] (label11) at (-1.3,0.9) {\footnotesize{\ding{172}}};
+\end{scope}
+\begin{scope}[yshift=6em,xshift=0.5em]
+{
+\setcounter{mycount2}{2}
+\draw[step=0.5cm,color=blue,thick] (-1,-1) grid (1,1);
+\foreach \y in {+0.75,+0.25,-0.25,-0.75}
+  \foreach \x in {-0.75,-0.25,0.25,0.75}{
+    \node [fill=blue!20,inner sep=0pt,minimum height=0.49cm,minimum width=0.49cm] at (\x,\y) {$\number\value{mycount2}$};
+    \addtocounter{mycount2}{1};
+  }
+}
+\node [anchor=south west] (label12) at (-1.3,0.9) {\footnotesize{\ding{173}}};
+\end{scope}
+\begin{scope}[yshift=5.5em,xshift=0em]
+{
+\setcounter{mycount3}{3}
+\draw[step=0.5cm,color=ugreen,thick] (-1,-1) grid (1,1);
+\foreach \y in {+0.75,+0.25,-0.25,-0.75}
+  \foreach \x in {-0.75,-0.25,0.25,0.75}{
+    \node [fill=green!20,inner sep=0pt,minimum height=0.49cm,minimum width=0.49cm] at (\x,\y) {$\number\value{mycount3}$};
+    \addtocounter{mycount3}{1};
+  }
+}
+\node [anchor=south west] (label13) at (-1.3,0.9) {\footnotesize{\ding{174}}};
+\end{scope}
+\begin{scope}[yshift=5em,xshift=-0.5em]
+{
+\setcounter{mycount4}{4}
+\draw[step=0.5cm,color=red,thick] (-1,-1) grid (1,1);
+\foreach \y in {+0.75,+0.25,-0.25,-0.75}
+  \foreach \x in {-0.75,-0.25,0.25,0.75}{
+    \node [fill=red!20,inner sep=0pt,minimum height=0.49cm,minimum width=0.49cm] at (\x,\y) {$\number\value{mycount4}$};
+    \addtocounter{mycount4}{1};
+  }
+\node [anchor=north] (xlabel) at (0,-1.2) {$\textbf{x}$};
+}
+\node [anchor=south west] (label14) at (-1.3,0.9) {\footnotesize{\ding{175}}};
+\end{scope}
+\begin{scope}[yshift=5em,xshift=1.5in]
+{
+\draw[step=0.5cm,thick] (-0.5,-1) grid (0.5,1.0);
+\node [fill=black!20,inner sep=0pt,minimum height=0.49cm,minimum width=0.49cm] at (-0.25,0.75) {\small{$-1$}};
+\node [fill=black!20,inner sep=0pt,minimum height=0.49cm,minimum width=0.49cm] at (-0.25,0.25) {$0$};
+\node [fill=black!20,inner sep=0pt,minimum height=0.49cm,minimum width=0.49cm] at (-0.25,-0.25) {$1$};
+\node [fill=black!20,inner sep=0pt,minimum height=0.49cm,minimum width=0.49cm] at (-0.25,-0.75) {$0$};
+\node [fill=black!20,inner sep=0pt,minimum height=0.49cm,minimum width=0.49cm] at (0.25,0.75) {$0$};
+\node [fill=black!20,inner sep=0pt,minimum height=0.49cm,minimum width=0.49cm] at (0.25,0.25) {\small{$-1$}};
+\node [fill=black!20,inner sep=0pt,minimum height=0.49cm,minimum width=0.49cm] at (0.25,-0.25) {$1$};
+\node [fill=black!20,inner sep=0pt,minimum height=0.49cm,minimum width=0.49cm] at (0.25,-0.75) {$0$};
+\node [anchor=north] (xlabel) at (0,-1.2) {$\textbf{w}$};
+}
+{\draw [->,thick,dashed] (-1.5in+2em+1.5em,-0.3) .. controls +(east:2) and +(west:1) .. (-0.55,0.8) node [pos=0.5,left] {\scriptsize{\textbf{Matrix multiplication}}};}
+{\draw [->,thick,dashed] (-1.5in+2em+1.0em,-0.5) .. controls +(east:2) and +(west:1) .. (-0.55,0.8) ;}
+{\draw [->,thick,dashed] (-1.5in+2em+0.5em,-0.7) .. controls +(east:2.5) and +(west:1) .. (-0.55,0.8) ;}
+{\draw [->,thick,dashed] (-1.5in+2em,-0.9) .. controls +(east:3) and +(west:1) .. (-0.55,0.8);}
+\end{scope}
+\begin{scope}[yshift=6.5em,xshift=1em+3in]
+{
+\draw[step=0.5cm,color=orange,thick] (-0.5,-1) grid (0.5,1.0);
+\foreach \y in {+0.75,+0.25,-0.25,-0.75}{
+  \setcounter{mycount1}{2}
+  \foreach \x in {-0.25,0.25}{
+    \node [fill=orange!20,inner sep=0pt,minimum height=0.49cm,minimum width=0.49cm] at (\x,\y) {$\number\value{mycount1}$};
+    \addtocounter{mycount1}{-1};
+  }
+}
+}
+\node [anchor=south west] (label21) at (-0.8,0.9) {\footnotesize{\ding{172}}};
+\end{scope}
+\begin{scope}[yshift=6em,xshift=0.5em+3in]
+{
+\draw[step=0.5cm,color=blue,thick] (-0.5,-1) grid (0.5,1.0);
+\foreach \y in {+0.75,+0.25,-0.25,-0.75}{
+  \setcounter{mycount1}{2}
+  \foreach \x in {-0.25,0.25}{
+    \node [fill=blue!20,inner sep=0pt,minimum height=0.49cm,minimum width=0.49cm] at (\x,\y) {$\number\value{mycount1}$};
+    \addtocounter{mycount1}{-1};
+  }
+}
+}
+\node [anchor=south west] (label22) at (-0.8,0.9) {\footnotesize{\ding{173}}};
+\end{scope}
+\begin{scope}[yshift=5.5em,xshift=0em+3in]
+{
+\draw[step=0.5cm,color=ugreen,thick] (-0.5,-1) grid (0.5,1.0);
+\foreach \y in {+0.75,+0.25,-0.25,-0.75}{
+  \setcounter{mycount1}{2}
+  \foreach \x in {-0.25,0.25}{
+    \node [fill=green!20,inner sep=0pt,minimum height=0.49cm,minimum width=0.49cm] at (\x,\y) {$\number\value{mycount1}$};
+    \addtocounter{mycount1}{-1};
+  }
+}
+}
+\node [anchor=south west] (label23) at (-0.8,0.9) {\footnotesize{\ding{174}}};
+\end{scope}
+\begin{scope}[yshift=5.0em,xshift=-0.5em+3in]
+{
+\draw[step=0.5cm,color=red,thick] (-0.5,-1) grid (0.5,1.0);
+\foreach \y in {+0.75,+0.25,-0.25,-0.75}{
+  \setcounter{mycount1}{2}
+  \foreach \x in {-0.25,0.25}{
+    \node [fill=red!20,inner sep=0pt,minimum height=0.49cm,minimum width=0.49cm] at (\x,\y) {$\number\value{mycount1}$};
+    \addtocounter{mycount1}{-1};
+  }
+}
+}
+\node [anchor=south west] (label24) at (-0.8,0.9) {\footnotesize{\ding{175}}};
+{
+\node [anchor=north] (xlabel) at (0,-1.2) {$\textbf{x} \cdot \textbf{w}$};
+\node [anchor=center] (elabel) at (-0.7in,0) {\Huge{$\textbf{=}$}};
+}
+\end{scope}
+\end{tikzpicture}
+%%%------------------------------------------------------------------------------------------------------------
--- a/Chapter9/Figures/fig-tensor-sample.tex
+++ b/Chapter9/Figures/fig-tensor-sample.tex
+%%%------------------------------------------------------------------------------------------------------------
+\newcounter{mycount1}
+\newcounter{mycount2}
+\newcounter{mycount3}
+\newcounter{mycount4}
+\begin{tikzpicture}
+\begin{scope}[yshift=6.5em,xshift=1em]
+\setcounter{mycount1}{1}
+\draw[step=0.5cm,color=orange,line width=0.2mm] (-2,-2) grid (1,1);
+\foreach \y in {+0.5,-0.5,-1.5}
+  \foreach \x in {-1.5,-0.5,0.5}{
+    \node [fill=orange!20,inner sep=0pt,minimum height=0.98cm,minimum width=0.98cm] at (\x,\y) {\number\value{mycount1}};
+    \addtocounter{mycount1}{1};
+  }
+\end{scope}
+\begin{scope}[yshift=5.5em,xshift=0em]
+\setcounter{mycount2}{2}
+\draw[step=0.5cm,color=blue,line width=0.2mm] (-2,-2) grid (1,1);
+\foreach \y in {+0.5,-0.5,-1.5}
+  \foreach \x in {-1.5,-0.5,0.5}{
+    \node [fill=blue!20,inner sep=0pt,minimum height=0.98cm,minimum width=0.98cm] at (\x,\y) {\number\value{mycount2}};
+    \addtocounter{mycount2}{1};
+  }
+\end{scope}
+\begin{scope}[yshift=4.5em,xshift=-1em]
+\setcounter{mycount3}{3}
+\draw[step=0.5cm,color=ugreen,line width=0.2mm] (-2,-2) grid (1,1);
+\foreach \y in {+0.5,-0.5,-1.5}
+  \foreach \x in {-1.5,-0.5,0.5}{
+    \node [fill=green!20,inner sep=0pt,minimum height=0.98cm,minimum width=0.98cm] at (\x,\y) {\number\value{mycount3}};
+    \addtocounter{mycount3}{1};
+  }
+\end{scope}
+\begin{scope}[yshift=3.5em,xshift=-2em]
+\setcounter{mycount4}{4}
+\draw[step=0.5cm,color=red,line width=0.2mm] (-2,-2) grid (1,1);
+\foreach \y in {+0.5,-0.5,-1.5}
+  \foreach \x in {-1.5,-0.5,0.5}{
+    \node [fill=red!20,inner sep=0pt,minimum height=0.98cm,minimum width=0.98cm] at (\x,\y) {\number\value{mycount4}};
+    \addtocounter{mycount4}{1};
+  }
+\end{scope}
+\end{tikzpicture}
+%%%------------------------------------------------------------------------------------------------------------
\ No newline at end of file
--- a/Chapter9/Figures/fig-the-amount-of-data-in-a-bilingual-corpus.tex
+++ b/Chapter9/Figures/fig-the-amount-of-data-in-a-bilingual-corpus.tex
+%%%------------------------------------------------------------------------------------------------------------
+ \begin{tikzpicture}
+      \scriptsize{
+\begin{semilogyaxis}[
+    width=.75\textwidth,
+    height=.30\textwidth,
+    yticklabel style={/pgf/number format/precision=1,/pgf/number format/fixed zerofill},
+    xticklabel style={/pgf/number format/1000 sep=},
+    xlabel style={yshift=0.5em},
+    xlabel={\footnotesize{Year}},ylabel={\footnotesize{Number of sentences}},
+    ymin=1,ymax=1000000000000,
+    xmin=1999,xmax=2020,xtick={2000,2005,2010,2015,2020},
+    legend style={yshift=-5em,xshift=0em,legend cell align=left,legend plot pos=right}
+]
+\addplot[purple,mark=star,very thick] coordinates {(2001,10000) (2005,2000000) (2008,8000000) (2009,9000000) (2011,10000000) (2012,12000000) (2014,20000000) (2016,30000000) (2018,40000000) };
+\addlegendentry{\tiny{Bi-text used in MT papers}\ \ \ \ \ \ \ \ \ \ }
+{
+\addplot[ublue,mark=otimes*,very thick] coordinates {(2005,10000000) (2008,100000000) (2012,3000000000) (2016,5000000000) (2019,10000000000) };
+\addlegendentry{\tiny{Bi-text used in practical systems}}
+}
+\end{semilogyaxis}
+}
+ \end{tikzpicture}
+%%%------------------------------------------------------------------------------------------------------------
--- a/Chapter9/Figures/fig-translation.tex
+++ b/Chapter9/Figures/fig-translation.tex
+%%%------------------------------------------------------------------------------------------------------------
+\begin{tikzpicture}
+\tikzstyle{neuron} = [rectangle,draw,thick,fill=red!30,red!35,minimum height=2em,minimum width=2em,font=\small]
+\node[neuron,anchor=north] (a1) at (0,0) {};
+\draw[->,thick] ([xshift=-2em,yshift=0em]a1.south) to ([xshift=3em,yshift=0em]a1.south);
+\draw[->,thick] ([xshift=0em,yshift=-4em]a1.west) to ([xshift=0em,yshift=2em]a1.west);
+\node[below] at ([xshift=0.5em,yshift=-1em]a1.west){0};
+\node[below] at ([xshift=2em,yshift=-1em]a1.west){1};
+\node[below] at ([xshift=-0.5em,yshift=2em]a1.west){1};
+\node [anchor=west] (x) at ([xshift=-0.7em,yshift=1em]a1.south) {\Large{$\textbf{F}$}};
+{
+\tikzstyle{neuron} = [rectangle,draw,thick,fill=red!30,red!35,minimum height=2em,minimum width=2em,font=\small]
+\node[neuron,anchor=north] (a2) at ([xshift=10em,yshift=0em]a1.south) {};
+\draw[->,thick] ([xshift=-2em,yshift=0em]a2.north) to ([xshift=3em,yshift=0em]a2.north);
+\draw[->,thick] ([xshift=0em,yshift=-2em]a2.west) to ([xshift=0em,yshift=4em]a2.west);
+\node[above] at ([xshift=0.5em,yshift=1em]a2.west){0};
+\node[above] at ([xshift=2em,yshift=1em]a2.west){1};
+\node[below] at ([xshift=-0.5em,yshift=0em]a2.west){$-1$};
+\node [anchor=west] (x) at ([xshift=-3.5cm,yshift=2em]a2.north) {\scriptsize{
+    $w=\begin{bmatrix}
+    1&0&0\\
+    0&-1&0\\
+    0&0&1
+    \end{bmatrix}$}
+    };
+\node [anchor=west,rotate = 180] (x) at ([xshift=0.7em,yshift=1em]a2.south) {\Large{$\textbf{F}$}};
+\draw[-stealth, line width=2pt,dashed] ([xshift=4em,yshift=0em]a1.south) to ([xshift=-3em,yshift=0em]a2.north);
+}
+{
+\tikzstyle{neuron} = [rectangle,draw,thick,fill=red!30,red!35,minimum height=2em,minimum width=2em,font=\small]
+\node[neuron,anchor=north] (a3) at ([xshift=11em,yshift=2.05em]a2.south) {};
+\draw[->,thick] ([xshift=-3em,yshift=0em]a3.north) to ([xshift=2em,yshift=0em]a3.north);
+\draw[->,thick] ([xshift=-1em,yshift=-2em]a3.west) to ([xshift=-1em,yshift=4em]a3.west);
+\node[above] at ([xshift=-0.5em,yshift=1em]a3.west){0};
+\node[above] at ([xshift=1em,yshift=1em]a3.west){1};
+\node[left] at ([xshift=-0.75em,yshift=-0.5em]a3.west){$-1$};
+\node [anchor=west,rotate = 180] (x) at ([xshift=0.7em,yshift=1em]a3.south) {\Large{$\textbf{F}$}};
+\node [anchor=west] (x) at ([xshift=-4cm,yshift=2em]a3.north) {\scriptsize{
+    $b=\begin{bmatrix}
+    0.5&0&0\\
+    0&0&0\\
+    0&0&0
+    \end{bmatrix}$}
+    };
+\draw[-stealth, line width=2pt,dashed] ([xshift=3em,yshift=1em]a2.east) to ([xshift=-3em,yshift=1em]a3.west);
+}
+\end{tikzpicture}
+%%%------------------------------------------------------------------------------------------------------------
--- a/Chapter9/Figures/fig-two-layer-neural-network.tex
+++ b/Chapter9/Figures/fig-two-layer-neural-network.tex
+%%%------------------------------------------------------------------------------------------------------------
+\begin{tikzpicture}
+%% a two-layer neural network
+\begin{scope}
+\tikzstyle{neuronnode} = [minimum size=1.7em,circle,draw,ublue,very thick,inner sep=1pt, fill=white,align=center,drop shadow={shadow xshift=0.1em,shadow yshift=-0.1em}]
+%% input and hidden layers
+\node [neuronnode] (n10) at (0,0) {\tiny{$f$}\\[-1ex] \tiny{$\sum$}};
+\node [neuronnode] (n11) at (1.0,0) {\tiny{$f$}\\[-1ex] \tiny{$\sum$}};
+\draw [-,ublue] (n10.west) -- (n10.east);
+\draw [-,ublue] (n11.west) -- (n11.east);
+\node [anchor=north] (x1) at ([yshift=-4em]n11.south) {$x_1$};
+\node [anchor=north] (b) at ([yshift=-4em]n10.south) {$b$};
+{
+\draw [->,thick] (b.north) -- ([yshift=-0.1em]n10.south);
+\draw [->,thick] (x1.north) -- ([yshift=-0.1em]n10.290);
+}
+{
+\draw [->,thick] (b.north) -- ([yshift=-0.1em]n11.250);
+\draw [->,thick] (x1.north) -- ([yshift=-0.1em]n11.south);
+}
+{
+\draw [->,thick,blue] (b.north) -- ([yshift=-0.1em]n11.250);
+\draw [->,thick,purple] (x1.north) -- ([yshift=-0.1em]n11.south);
+}
+%% output layers
+\node [neuronnode] (n20) at (0.5,4em) {\scriptsize{$\sum$}};
+{\draw [->,thick,brown] ([yshift=0.1em]n10.north) -- ([yshift=-0.1em]n20.250);}
+{\draw [->,thick,orange] ([yshift=0.1em]n11.north) -- ([yshift=-0.1em]n20.290);}
+\node [] (y) at ([yshift=2.5em]n20.north) {$y$};
+\draw [->,thick] ([yshift=0.1em]n20.north) -- (y.south);
+%% weight and bias
+{\node [anchor=center,rotate=90,fill=white,inner sep=1pt] (b0) at ([yshift=2em,xshift=-0.5em]b.north) {\scriptsize{$b_1$}};}
+{\node [anchor=center,rotate=-59,fill=white,inner sep=1pt] (w2) at ([yshift=1em,xshift=-1.0em]x1.north) {\scriptsize{$w_1$}};}
+{\node [anchor=center,rotate=62,fill=white,inner sep=1pt] (w21) at ([yshift=1.2em,xshift=-0.2em]n10.north) {\scriptsize{$w'_1$}};}
+{\node [anchor=center,rotate=-62,fill=white,inner sep=1pt] (w22) at ([yshift=1.2em,xshift=0.2em]n11.north) {\scriptsize{$w'_2$}};}
+{\node [anchor=center,rotate=59,fill=white,inner sep=1pt] (b1) at ([yshift=3.4em,xshift=1.5em]b.north) {\scriptsize{$b_2$}};}
+{\node [anchor=center,rotate=90,fill=white,inner sep=1pt] (w1) at ([yshift=2em,xshift=0.5em]x1.north) {\scriptsize{$w_2$}};}
+%% sigmoid box
+\begin{scope}
+{
+\node [anchor=west] (flabel) at ([xshift=1in]y.east) {\footnotesize{sigmoid:}};
+\node [anchor=north east] (slabel) at ([xshift=0]flabel.south east) {\footnotesize{sum:}};
+\node [anchor=west,inner sep=2pt] (flabel2) at (flabel.east) {\footnotesize{$f(s)=1/(1+e^{-s})$}};
+\node [anchor=west,inner sep=2pt] (flabel3) at (slabel.east) {\footnotesize{$s=x_1 \cdot w + b$}};
+\draw [->,thick,dotted] ([yshift=-0.3em,xshift=-0.1em]n11.60)  .. controls +(east:1) and +(west:2) ..  ([xshift=-0.2em]flabel.west) ;
+\begin{pgfonlayer}{background}
+{
+\node [rectangle,inner sep=0.2em,fill=blue!20,drop shadow={shadow xshift=0.1em,shadow yshift=-0.1em}] [fit = (flabel) (flabel2) (flabel3)] (funcbox) {};
+}
+\end{pgfonlayer}
+}
+\end{scope}
+\end{scope}
+\end{tikzpicture}
+%%%------------------------------------------------------------------------------------------------------------
--- a/Chapter9/Figures/fig-w1.tex
+++ b/Chapter9/Figures/fig-w1.tex
+%%%------------------------------------------------------------------------------------------------------------
+\begin{tikzpicture}
+%% step-shaped outputs of one hidden neuron for different weight settings
+\begin{scope}
+{
+\draw [->,thick] (-1.8,0) -- (1.8,0);
+\draw [->,thick] (0,0) -- (0,2);
+\draw [-] (-0.05,1) -- (0.05,1);
+\node [anchor=east,inner sep=1pt] (label1) at (0,1) {\tiny{1}};
+\node [anchor=south east,inner sep=1pt] (label2) at (0,0) {\tiny{0}};
+\node [anchor=south east,inner sep=1pt] (labela) at (0.2,-0.5) {\footnotesize{(a)}};
+}
+{\node [anchor=north west,align=left] (wblabel) at (-1.8,2) {\scriptsize{$w_1=100$}\\[-0ex] {\scriptsize{\ $b_1=-4$}}};}
+{\draw [-,very thick,ublue,rounded corners=0.1em] (-1.5,0) -- (0.5,0) -- (0.5,1) -- (1.5,1);}
+\end{scope}
+%---------------------------------------------------------------------------------------------
+\begin{scope}[xshift=1.6in]
+{
+\draw [->,thick] (-1.8,0) -- (1.8,0);
+\draw [->,thick] (0,0) -- (0,2);
+\draw [-] (-0.05,1) -- (0.05,1);
+\node [anchor=east,inner sep=1pt] (label1) at (0,1) {\tiny{1}};
+\node [anchor=south east,inner sep=1pt] (label2) at (0,0) {\tiny{0}};
+\node [anchor=south east,inner sep=1pt] (labelb) at (0.2,-0.5) {\footnotesize{(b)}};
+}
+{\node [anchor=north west,align=left] (wblabel) at (-1.8,2) {{\scriptsize{$w'_1=0.9$}}};}
+{\draw [-,very thick,ublue,rounded corners=0.1em] (-1.8,0) -- (0.5,0) -- (0.5,0.9) -- (1.8,0.9);}
+\end{scope}
+%-----------------------------------------------------------------------------------------------
+\begin{scope}[xshift=3.2in]
+{
+\draw [->,thick] (-1.8,0) -- (1.8,0);
+\draw [->,thick] (0,0) -- (0,2);
+\draw [-] (-0.05,1) -- (0.05,1);
+\node [anchor=east,inner sep=1pt] (label1) at (0,1) {\tiny{1}};
+\node [anchor=south east,inner sep=1pt] (label2) at (0,0) {\tiny{0}};
+\node [anchor=south east,inner sep=1pt] (labelc) at (0.2,-0.5) {\footnotesize{(c)}};
+}
+{\node [anchor=north west,align=left] (wblabel) at (-1.8,2) {{\scriptsize{$w'_1=0.7$}}};}
+{\draw [-,very thick,ublue,rounded corners=0.1em] (-1.5,0) -- (0.5,0) -- (0.5,0.7) -- (1.5,0.7);}
+\end{scope}
+\end{tikzpicture}
+%%%------------------------------------------------------------------------------------------------------------
--- a/Chapter9/Figures/fig-w2.tex
+++ b/Chapter9/Figures/fig-w2.tex
+%%%------------------------------------------------------------------------------------------------------------
+\begin{tikzpicture}
+%% step-shaped outputs after adding a second hidden neuron
+\begin{scope}
+{
+\draw [->,thick] (-1.8,0) -- (1.8,0);
+\draw [->,thick] (0,0) -- (0,2);
+\draw [-] (-0.05,1) -- (0.05,1);
+\node [anchor=east,inner sep=1pt] (label1) at (0,1) {\tiny{1}};
+\node [anchor=south east,inner sep=1pt] (label2) at (0,0) {\tiny{0}};
+\node [anchor=south east,inner sep=1pt] (labela) at (0.2,-0.5) {\footnotesize{(a)}};
+}
+{\node [anchor=north west,align=left] (wblabel) at (-1.8,2) {{\scriptsize{$w'_1=0.7$}}};}
+{\draw [-,very thick,ublue,rounded corners=0.1em] (-1.5,0) -- (0.5,0) -- (0.5,0.7) -- (1.5,0.7);}
+\end{scope}
+%---------------------------------------------------------------------------------------------
+\begin{scope}[xshift=1.6in]
+{
+\draw [->,thick] (-1.8,0) -- (1.8,0);
+\draw [->,thick] (0,0) -- (0,2);
+\draw [-] (-0.05,1) -- (0.05,1);
+\node [anchor=east,inner sep=1pt] (label1) at (0,1) {\tiny{1}};
+\node [anchor=south east,inner sep=1pt] (label2) at (0,0) {\tiny{0}};
+\node [anchor=south east,inner sep=1pt] (labelb) at (0.2,-0.5) {\footnotesize{(b)}};
+}
+{\node [anchor=north west,align=left] (wblabel) at (-1.8,2) {{\scriptsize{$w_2=100$}}\\[-0ex] {\scriptsize{\ $b_2=-6$}}\\[-0ex] {\scriptsize{\ $w'_2=0.7$}}};}
+{\draw [-,very thick,ublue,rounded corners=0.1em] (-1.5,0) -- (0.5,0) -- (0.5,0.7) -- (0.7,0.7) -- (0.7,1.4) -- (1.5,1.4);}
+\end{scope}
+%-----------------------------------------------------------------------------------------------
+\begin{scope}[xshift=3.2in]
+{
+\draw [->,thick] (-1.8,0) -- (1.8,0);
+\draw [->,thick] (0,0) -- (0,2);
+\draw [-] (-0.05,1) -- (0.05,1);
+\node [anchor=east,inner sep=1pt] (label1) at (0,1) {\tiny{1}};
+\node [anchor=south east,inner sep=1pt] (label2) at (0,0) {\tiny{0}};
+\node [anchor=south east,inner sep=1pt] (labelc) at (0.2,-0.5) {\footnotesize{(c)}};
+}
+{\node [anchor=north west,align=left] (wblabel) at (-1.8,2) {\scriptsize{$w_2=100$}\\[-0ex] \scriptsize{\ $b_2=-6$}\\[-0ex] {\scriptsize{\ $w'_2=-0.7$}}};}
+{\draw [-,very thick,ublue,rounded corners=0.1em] (-1.5,0) -- (0.5,0) -- (0.5,0.7) -- (0.7,0.7) -- (0.7,0) -- (1.5,0);}
+\end{scope}
+\end{tikzpicture}
+%%%------------------------------------------------------------------------------------------------------------
--- a/Chapter9/Figures/fig-weather-forward.tex
+++ b/Chapter9/Figures/fig-weather-forward.tex
+%%%------------------------------------------------------------------------------------------------------------
+\begin{tikzpicture}
+\node [anchor=west,minimum width=1.5em,minimum height=1.5em] (part1) at (0,0) {\footnotesize{$y$}};
+\node [anchor=north,minimum width=1.5em,minimum height=1.5em] (part1-2) at ([xshift=-1.2em,yshift=-0.3em]part1.south) {\scriptsize {$1\times1$}};
+\node [anchor=north,draw,minimum width=4.0em,minimum height=1.5em,fill=orange!20] (part2) at ([yshift=-1.5em]part1.south) {\footnotesize {$\rm{Sigmoid}$}};
+\draw [-,thick](part1.south)--(part2.north);
+\node [anchor=north,minimum width=1.5em,minimum height=1.5em] (part2-2) at ([xshift=-1.2em,yshift=-0.3em]part2.south) {\scriptsize {$1\times1$}};
+\node [anchor=north,draw,minimum width=4.0em,minimum height=1.5em,fill=green!20] (part3) at ([yshift=-1.5em]part2.south) {\footnotesize {$\rm{ADD}$}};
+\draw [-,thick](part2.south)--(part3.north);
+\node [anchor=north,minimum width=1.5em,minimum height=1.5em] (part3-2) at ([xshift=-1.2em,yshift=-0.3em]part3.south) {\scriptsize {$1\times1$}};
+\node [anchor=north,draw,minimum width=4.0em,minimum height=1.5em,fill=blue!20] (part4) at ([yshift=-1.5em]part3.south) {\footnotesize {$\rm{MUL}$}};
+\draw [-,thick](part3.south)--(part4.north);
+\node [anchor=north,minimum width=1.5em,minimum height=1.5em] (part4-2) at ([xshift=-1.2em,yshift=-0.2em]part4.south) {\scriptsize {$1\times 2$}};
+\node [anchor=north,minimum width=4.0em,minimum height=1.5em] (part5) at ([yshift=-1.4em]part4.south) {\footnotesize {$\mathbf a$}};
+\draw [-,thick](part4.south)--([yshift=-0.1em]part5.north);
+%%%%%%%%%%%%%%%%%%%%%%%%%%%%
+\node [anchor=west,minimum width=2.0em,minimum height=1.5em,draw,fill=red!20] (part5-3) at ([xshift=0.0em,yshift=0.1em]part5.east) {\footnotesize {$\mathbf w^{[2]}$}};
+\node [anchor=west,minimum width=2.0em,minimum height=1.5em,draw,fill=orange!40] (part5-4) at ([xshift=2.0em,yshift=0.0em]part5-3.east) {\footnotesize {$ b^{[2]}$}};
+\draw[-,thick](part4.south)--(part5-3.north);
+\draw[-,thick](part3.south)--(part5-4.north);
+\node [anchor=south,minimum width=1.5em,minimum height=1.5em] (part5-3-1) at ([xshift=1.1em,yshift=-0.45em]part5-3.north) {\scriptsize {$2\times 1$}};
+\node [anchor=south,minimum width=1.5em,minimum height=1.5em] (part5-4-1) at ([xshift=1.1em,yshift=-0.45em]part5-4.north) {\scriptsize {$1\times1$}};
+%%%%%%%%%%%%%%%%%%%%%%%%%%
+\node [anchor=north,minimum width=1.5em,minimum height=1.5em] (part5-2) at ([xshift=-1.2em,yshift=-0.2em]part5.south) {\scriptsize {$1\times 2$}};
+\node [anchor=north,draw,minimum width=4.0em,minimum height=1.5em,fill=yellow!20] (part6) at ([yshift=-1.4em]part5.south) {\footnotesize {$\rm{Tanh}$}};
+\draw [-,thick]([yshift=0.1em]part5.south)--(part6.north);
+\node [anchor=north,minimum width=1.5em,minimum height=1.5em] (part6-2) at ([xshift=-1.2em,yshift=-0.3em]part6.south) {\scriptsize {$1\times 2$}};
+\node [anchor=north,draw,minimum width=4.0em,minimum height=1.5em,fill=green!20] (part7) at ([yshift=-1.5em]part6.south) {\footnotesize {$\rm{ADD}$}};
+\draw [-,thick](part6.south)--(part7.north);
+\node [anchor=north,minimum width=1.5em,minimum height=1.5em] (part7-2) at ([xshift=-1.2em,yshift=-0.3em]part7.south) {\scriptsize {$1\times 2$}};
+\node [anchor=north,draw,minimum width=4.0em,minimum height=1.5em,fill=blue!20] (part8) at ([yshift=-1.5em]part7.south) {\footnotesize {$\rm{MUL}$}};
+\draw [-,thick](part7.south)--(part8.north);
+\node [anchor=north,minimum width=1.5em,minimum height=1.5em] (part8-2) at ([xshift=-1.2em,yshift=-0.2em]part8.south) {\scriptsize{$1\times 3$}};
+\node [anchor=north,minimum width=4.0em,minimum height=1.5em] (part9) at ([yshift=-1.4em]part8.south) {\footnotesize {$\mathbf x$}};
+\draw [-,thick](part8.south)--([yshift=-0.1em]part9.north);
+%%%%%%%%%%%%%%%%%%%%%%%%%%%%
+\node [anchor=west,minimum width=2.0em,minimum height=1.5em,draw,fill=red!20] (part9-3) at ([xshift=0.0em,yshift=0.1em]part9.east) {\footnotesize {$\mathbf w^{[1]}$}};
+\node [anchor=west,minimum width=2.0em,minimum height=1.5em,draw,fill=orange!40] (part9-4) at ([xshift=2.0em,yshift=0.0em]part9-3.east) {\footnotesize {$\mathbf b^{[1]}$}};
+\draw[-,thick](part8.south)--(part9-3.north);
+\draw[-,thick](part7.south)--(part9-4.north);
+\node [anchor=south,minimum width=1.5em,minimum height=1.5em] (part9-3-1) at ([xshift=1.1em,yshift=-0.45em]part9-3.north) {\scriptsize {$3\times 2$}};
+\node [anchor=south,minimum width=1.5em,minimum height=1.5em] (part9-4-1) at ([xshift=1.1em,yshift=-0.45em]part9-4.north) {\scriptsize {$1\times 2$}};
+%%%%%%%%%%%%%%%%%%%%%%%%%%
+\end{tikzpicture}
+%%------------------------------------------------------------------------------------------------------------
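+% Reading aid, not part of the figure: a sketch of the computation this graph appears to encode when read
+% bottom-up, inferred from the node labels alone. The row-vector convention and the shapes of x (1x3) and
+% w^{[2]} (2x1) are assumptions, chosen so that both MUL nodes are well-defined given w^{[1]} (3x2).
+%   \mathbf a = \mathrm{Tanh}(\mathbf x \cdot \mathbf w^{[1]} + \mathbf b^{[1]})   % (1x3)(3x2) + (1x2) -> 1x2
+%   y = \mathrm{Sigmoid}(\mathbf a \cdot \mathbf w^{[2]} + b^{[2]})               % (1x2)(2x1) + (1x1) -> 1x1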
--- a/Chapter9/Figures/fig-weather.tex
+++ b/Chapter9/Figures/fig-weather.tex
+%%%------------------------------------------------------------------------------------------------------------
+\begin{tikzpicture}
+\begin{scope}
+% left
+\node [anchor=west,draw=ublue,minimum width=3.55em,fill=yellow!20] (part1-1) at (0,0) {\scriptsize{sky condition}};
+\node [anchor=north] (inputlabel) at ([yshift=2em]part1-1.north) {\scriptsize{input}};
+\node [anchor=north,draw=ublue,minimum width=3.55em,fill=yellow!20] (part1-2) at ([yshift=-2.0em]part1-1.south) {\scriptsize {low-level air temperature}};
+\node [anchor=north,draw=ublue,minimum width=3.55em,fill=yellow!20] (part1-3) at ([yshift=-2.0em]part1-2.south) {\scriptsize {horizontal air pressure}};
+\node [rectangle,rounded corners,draw=black!50,densely dashed,inner sep=0.4em] [fit = (part1-1) (part1-2) (part1-3) (inputlabel)] (inputshadow) {};
+\node [anchor=north,draw=ublue,minimum width=3.55em,fill=yellow!20] (part1-4) at ([yshift=-2.0em]part1-3.south) {\scriptsize {bias 1}};
+\node [anchor=north,minimum width=2.5em] (part1-5) at ([yshift=-0.5em]part1-4.south) {\scriptsize {input layer}};
+% middle
+\node [circle,anchor=west,draw=ublue,minimum width=2.5em,fill=blue!20] (part2-1) at ([xshift=2.0em,yshift=1.7em]part1-2.east) {\scriptsize {temperature}};
+\node [anchor=north] (hidlabel) at ([yshift=3.1em]part2-1.north) {\scriptsize{features}};
+\node [circle,anchor=west,draw=ublue,minimum width=2.5em,fill=blue!20] (part2-2) at ([xshift=2.0em,yshift=-1.7em]part1-2.east) {\scriptsize {wind speed}};
+\node [rectangle,rounded corners,draw=black!50,densely dashed,inner sep=0.4em] [fit = (part2-1) (part2-2) (hidlabel) ] (hidshadow) {};
+\node [circle,anchor=west,draw=ublue,minimum width=2.5em,fill=blue!20,inner sep=2pt] (part2-3) at ([xshift=2.0em,yshift=-1.7em]part1-3.east) {\scriptsize {bias 2}};
+\node [anchor=north,minimum width=3.0em] (part2-4) at ([xshift=0.0em,yshift=-1.6em]part2-3.south) {\scriptsize{hidden layer}};
+\node [anchor=north] (labela) at ([xshift=0.0em,yshift=-3em]part2-3.south) {\footnotesize {(a)}};
+% right
+\node [anchor=west,draw=ublue,minimum width=3.0em,fill=purple!20] (part3-1) at ([xshift=2em,yshift=0.0em]part2-2.east) {\scriptsize {clothing index}};
+\node [anchor=north,minimum width=3.0em] (part3-2) at ([yshift=-5.55em]part3-1.south) {\scriptsize{output layer}};
+%\node[anchor=south,minimum height=18em,minimum width=16.0em,draw=ublue,dotted,thick] (part2out) at ([xshift=4.8em,yshift=-11em]part1-2.north) {};
+% connections
+\draw [->,line width=0.2mm,ublue](part1-1.east)--([xshift=-0.05em]part2-1.170);
+\draw [->,line width=0.2mm,ublue](part1-1.east)--([xshift=-0.05em]part2-2.165);
+\draw [->,line width=0.2mm,ublue](part1-2.east)--([xshift=-0.05em]part2-1.175);
+\draw [->,line width=0.2mm,ublue](part1-2.east)--([xshift=-0.05em]part2-2.175);
+\draw [->,line width=0.2mm,ublue](part1-3.east)--([xshift=-0.05em]part2-1.185);
+\draw [->,line width=0.2mm,ublue](part1-3.east)--([xshift=-0.05em]part2-2.185);
+\draw [->,line width=0.2mm,ublue](part1-4.east)--([xshift=-0.05em]part2-1.195);
+\draw [->,line width=0.2mm,ublue](part1-4.east)--([xshift=-0.05em]part2-2.195);
+\draw [->,line width=0.2mm,ublue](part2-1.east)--([xshift=-0.05em,yshift=0.2em]part3-1.west);
+\draw [->,line width=0.2mm,ublue](part2-2.east)--([xshift=-0.05em]part3-1.west);
+\draw [->,line width=0.2mm,ublue](part2-3.east)--([xshift=-0.05em,yshift=-0.2em]part3-1.west);
+\end{scope}
+\begin{scope}[xshift=3.0in]
+% left
+\node [anchor=west,align=center,draw=ublue,minimum width=3.55em,minimum height=1.33em,fill=yellow!20] (part1-1) at (0,0) {\normalsize{$x_1$}};
+\node [anchor=north] (inputlabel) at ([yshift=2em]part1-1.north) {\scriptsize{input $\mathbf x $}};
+\node [anchor=north,draw=ublue,minimum width=3.55em,minimum height=1.33em,fill=yellow!20] (part1-2) at ([yshift=-2.0em]part1-1.south) {\normalsize{$x_2$}};
+\node [anchor=north,draw=ublue,minimum width=3.55em,minimum height=1.33em,fill=yellow!20] (part1-3) at ([yshift=-2.0em]part1-2.south) {\normalsize{$x_3$}};
+\node [rectangle,rounded corners,draw=black!50,densely dashed,inner sep=0.4em] [fit = (part1-1) (part1-2) (part1-3) (inputlabel)] (inputshadow) {};
+\node [anchor=north,draw=ublue,minimum width=3.55em,fill=yellow!20] (part1-4) at ([yshift=-2.0em]part1-3.south) {\footnotesize {$\mathbf b^{[1]} $}};
+\node [anchor=north,minimum width=2.5em] (part1-5) at ([yshift=-0.5em]part1-4.south) {\scriptsize {input layer}};
+% middle
+\node [circle,anchor=west,draw=ublue,minimum width=2.5em,fill=blue!20] (part2-1) at ([xshift=2.0em,yshift=1.7em]part1-2.east) {\large{$a_1$}};
+\node [anchor=north] (hidlabel) at ([yshift=3.1em]part2-1.north) {\scriptsize{features $\mathbf a $}};
+\node [circle,anchor=west,draw=ublue,minimum width=2.5em,fill=blue!20] (part2-2) at ([xshift=2.0em,yshift=-1.7em]part1-2.east) {\large{$a_2$}};
+\node [rectangle,rounded corners,draw=black!50,densely dashed,inner sep=0.4em] [fit = (part2-1) (part2-2) (hidlabel) ] (hidshadow) {};
+\node [circle,anchor=west,draw=ublue,minimum width=2.5em,fill=blue!20,inner sep=2pt] (part2-3) at ([xshift=2.0em,yshift=-1.7em]part1-3.east) {\large {$b^{[2]} $}};
+\node [anchor=north,minimum width=3.0em] (part2-4) at ([xshift=0.0em,yshift=-1.6em]part2-3.south) {\scriptsize{hidden layer}};
+\node [anchor=north] (labelb) at ([xshift=0.0em,yshift=-3em]part2-3.south) {\footnotesize {(b)}};
+% right
+\node [anchor=west,draw=ublue,minimum width=3.0em,fill=purple!20] (part3-1) at ([xshift=2em,yshift=0.0em]part2-2.east) {\large{$y$}};
+\node [anchor=north,minimum width=3.0em] (part3-2) at ([yshift=-5.55em]part3-1.south) {\scriptsize{output layer}};
+%\node[anchor=south,minimum height=18em,minimum width=16.0em,draw=ublue,dotted,thick] (part2out) at ([xshift=4.8em,yshift=-11em]part1-2.north) {};
+% connections
+\draw [->,line width=0.2mm,ublue](part1-1.east)--([xshift=-0.05em]part2-1.170);
+\draw [->,line width=0.2mm,ublue](part1-1.east)--([xshift=-0.05em]part2-2.165);
+\draw [->,line width=0.2mm,ublue](part1-2.east)--([xshift=-0.05em]part2-1.175);
+\draw [->,line width=0.2mm,ublue](part1-2.east)--([xshift=-0.05em]part2-2.175);
+\draw [->,line width=0.2mm,ublue](part1-3.east)--([xshift=-0.05em]part2-1.185);
+\draw [->,line width=0.2mm,ublue](part1-3.east)--([xshift=-0.05em]part2-2.185);
+\draw [->,line width=0.2mm,ublue](part1-4.east)--([xshift=-0.05em]part2-1.195);
+\draw [->,line width=0.2mm,ublue](part1-4.east)--([xshift=-0.05em]part2-2.195);
+\draw [->,line width=0.2mm,ublue](part2-1.east)--([xshift=-0.05em,yshift=0.2em]part3-1.west);
+\draw [->,line width=0.2mm,ublue](part2-2.east)--([xshift=-0.05em]part3-1.west);
+\draw [->,line width=0.2mm,ublue](part2-3.east)--([xshift=-0.05em,yshift=-0.2em]part3-1.west);
+\end{scope}
+\end{tikzpicture}
+%%%------------------------------------------------------------------------------------------------------------
+%%------------------------------------------------------------------------------------------------------------
--- a/Chapter9/Figures/fig-weight.tex
+++ b/Chapter9/Figures/fig-weight.tex
+%%%------------------------------------------------------------------------------------------------------------
+\begin{tikzpicture}
+%% sigmoid curves for increasing weight $w_1$ with bias $b_1=0$
+\begin{scope}
+{
+\draw [->,thick] (-1.8,0) -- (1.8,0);
+\draw [->,thick] (0,0) -- (0,2);
+\draw [-] (-0.05,1) -- (0.05,1);
+\node [anchor=east,inner sep=1pt] (label1) at (0,1) {\tiny{1}};
+\node [anchor=south east,inner sep=1pt] (label2) at (0,0) {\tiny{0}};
+\node [anchor=south east,inner sep=1pt] (labela) at (0.2,-0.5) {\footnotesize{(a)}};
+}
+{\node [anchor=north west,align=left] (wblabel) at (-1.8,2) {\scriptsize{$w_1=1$}\\[-0ex] \scriptsize{\ $b_1=0$}};}
+{\draw [-,very thick,ublue,domain=-1.5:1.5,samples=100] plot (\x,{1/(1+exp(-2*\x))});}
+\end{scope}
+%---------------------------------------------------------------------------------------------
+\begin{scope}[xshift=1.6in]
+{
+\draw [->,thick] (-1.8,0) -- (1.8,0);
+\draw [->,thick] (0,0) -- (0,2);
+\draw [-] (-0.05,1) -- (0.05,1);
+\node [anchor=east,inner sep=1pt] (label1) at (0,1) {\tiny{1}};
+\node [anchor=south east,inner sep=1pt] (label2) at (0,0) {\tiny{0}};
+\node [anchor=south east,inner sep=1pt] (labelb) at (0.2,-0.5) {\footnotesize{(b)}};
+}
+{\node [anchor=north west,align=left] (wblabel) at (-1.8,2) {{\scriptsize{$w_1=10$}}\\[-0ex] \scriptsize{\ $b_1=0$}};}
+{\draw [-,very thick,ublue,domain=-1.5:1.5,samples=100] plot (\x,{1/(1+exp(-4*\x))});}
+\end{scope}
+%-----------------------------------------------------------------------------------------------
+\begin{scope}[xshift=3.2in]
+{
+\draw [->,thick] (-1.8,0) -- (1.8,0);
+\draw [->,thick] (0,0) -- (0,2);
+\draw [-] (-0.05,1) -- (0.05,1);
+\node [anchor=east,inner sep=1pt] (label1) at (0,1) {\tiny{1}};
+\node [anchor=south east,inner sep=1pt] (label2) at (0,0) {\tiny{0}};
+\node [anchor=south east,inner sep=1pt] (labelc) at (0.2,-0.5) {\footnotesize{(c)}};
+}
+{\node [anchor=north west,align=left] (wblabel) at (-1.8,2) {{\scriptsize{$w_1=100$}}\\[-0ex] \scriptsize{\ $b_1=0$}};}
+{\draw [-,very thick,ublue,rounded corners=0.1em] (-1.5,0) -- (0,0) -- (0,1) -- (1.5,1);}
+\end{scope}
+\end{tikzpicture}
+%%%------------------------------------------------------------------------------------------------------------
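+% Reading aid, not part of the figure: the panels illustrate
+%   y = \mathrm{Sigmoid}(w_1 x + b_1) = \frac{1}{1+\exp(-(w_1 x + b_1))}
+% With b_1 = 0, a larger w_1 makes the transition around x = 0 steeper; panel (c) draws the limiting unit step directly.
+% The curves in (a) and (b) are plotted with slopes 2 and 4 rather than the labelled 1 and 10,
+% so they are presumably schematic rather than to scale.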
--- a/Chapter9/Figures/word-graph.jpg
+++ b/Chapter9/Figures/word-graph.jpg
--- a/Chapter9/chapter9.tex
+++ b/Chapter9/chapter9.tex
--- a/ChapterAppend/chapterappend.tex
+++ b/ChapterAppend/chapterappend.tex
@@ -193,8 +193,8 @@ a(i|j,m,l) &=\frac{c(i|j;\mathbf{s},\mathbf{t})}  {\sum_{i}c(i|j;\mathbf{s},\mat
 For a training set of $K$ samples $\{(\mathbf{s}^{[1]},\mathbf{t}^{[1]}),...,(\mathbf{s}^{[K]},\mathbf{t}^{[K]})\}$, the M-Step computation can be adjusted to:
 \begin{eqnarray}
-f(s_u|t_v) &=\frac{\sum_{k=0}^{K}c_{\mathbb{E}}(s_u|t_v;\mathbf{s}^{[k]},\mathbf{t}^{[k]}) }    {\sum_{s_u} \sum_{k=0}^{K} c_{\mathbb{E}}(s_u|t_v;\mathbf{s}^{[k]},\mathbf{t}^{[k]})} \\
+f(s_u|t_v) &=\frac{\sum_{k=1}^{K}c_{\mathbb{E}}(s_u|t_v;\mathbf{s}^{[k]},\mathbf{t}^{[k]}) }    {\sum_{s_u} \sum_{k=1}^{K} c_{\mathbb{E}}(s_u|t_v;\mathbf{s}^{[k]},\mathbf{t}^{[k]})} \\
-a(i|j,m,l) &=\frac{\sum_{k=0}^{K}c_{\mathbb{E}}(i|j;\mathbf{s}^{[k]},\mathbf{t}^{[k]})}  {\sum_{i}\sum_{k=0}^{K}c_{\mathbb{E}}(i|j;\mathbf{s}^{[k]},\mathbf{t}^{[k]})}
+a(i|j,m,l) &=\frac{\sum_{k=1}^{K}c_{\mathbb{E}}(i|j;\mathbf{s}^{[k]},\mathbf{t}^{[k]})}  {\sum_{i}\sum_{k=1}^{K}c_{\mathbb{E}}(i|j;\mathbf{s}^{[k]},\mathbf{t}^{[k]})}
 \label{eq:append-3}
 \end{eqnarray}

--- a/ChapterPreface/chapterpreface.tex
+++ b/ChapterPreface/chapterpreface.tex
@@ -46,7 +46,7 @@
 \vspace{0.5em}
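-This book offers a comprehensive review of how machine translation technology has developed over the past thirty years, and gives a thorough introduction to machine translation methods around the theme of {\sffamily\bfseries statistical modeling for machine translation}. In writing it, the author has tried to explain the basic models of machine translation in plain language with concise examples, and to discuss the related research frontier. The book also draws on a great deal of practical experience, including many details of building machine translation systems. In this sense, it is not merely a theoretical text: it also connects to applications of machine translation and gives readers many concrete ideas for putting the technology into practice.
+This book offers a comprehensive review of how machine translation technology has developed over the past thirty years, and gives a thorough introduction to machine translation methods around the theme of {\sffamily\bfseries modeling for machine translation}. In writing it, the author has tried to explain the basic models of machine translation in plain language with concise examples, and to discuss the related research frontier. The book also draws on a great deal of practical experience, including many details of building machine translation systems. In this sense, it is not merely a theoretical text: it also connects to applications of machine translation and gives readers many concrete ideas for putting the technology into practice.
 This book is suitable for senior undergraduates and graduate students in computer-science-related majors, and can also serve as a reference for researchers in natural language processing, especially machine translation. In addition, each chapter has a clearly defined topic and fairly self-contained content, so readers can also use any single chapter as study material on a particular topic.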

--- a/bibliography.bib
+++ b/bibliography.bib
--- a/mt-book-xelatex.tex
+++ b/mt-book-xelatex.tex
@@ -136,7 +136,7 @@
 %\include{Chapter3/chapter3}
 %\include{Chapter4/chapter4}
 %\include{Chapter5/chapter5}
-%\include{Chapter6/chapter6}
+\include{Chapter6/chapter6}
 %\include{Chapter7/chapter7}
 %\include{Chapter8/chapter8}
 %\include{Chapter9/chapter9}
@@ -149,7 +149,7 @@
 %\include{Chapter16/chapter16}
 %\include{Chapter17/chapter17}
 %\include{Chapter18/chapter18}
-\include{ChapterAppend/chapterappend}
+%\include{ChapterAppend/chapterappend}
 %----------------------------------------------------------------------------------------

--- a/run.sh
+++ b/run.sh
--- a/structure.tex
+++ b/structure.tex
@@ -76,7 +76,7 @@
 %	BIBLIOGRAPHY AND INDEX
 %----------------------------------------------------------------------------------------
-\usepackage[style=numeric,citestyle=numeric,sorting=nyt,sortcites=true,maxbibnames=40,minbibnames=30,autopunct=true,babel=hyphen,hyperref=true,abbreviate=false,backref=true,backend=biber,autocite=plain]{biblatex}
+\usepackage[style=numeric,citestyle=numeric,sorting=none,sortcites=true,maxbibnames=40,minbibnames=30,autopunct=true,babel=hyphen,hyperref=true,abbreviate=false,backref=true,backend=biber,autocite=plain]{biblatex}
 %maxbibnames sets the maximum number of authors shown for a reference
 %minbibnames: if a reference has more than maxbibnames authors, only minbibnames authors are shown
 \addbibresource{bibliography.bib} % BibTeX bibliography file
@@ -685,5 +685,6 @@ addtohook={%
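 \newcommand\chapterseventeen{Chapter 17}
 \newcommand\chaptereighteen{Chapter 18}%*
-\newcommand\funp{\textrm}% used for the function P, etc.
+\newcommand\funp{}% used for the function P, etc.; an empty body gives italic, \textrm gives upright roman
-\newcommand\vectorn{\mathbf}% used for vectors such as N
+\newcommand\vectorn{\textbf}% used for vectors such as N
+\newcommand\seq{}% used for sequences such as N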