Commit aacc5f12 by 曹润柘

update chapter3 to 3.4

parent 20d19d47
......@@ -48,7 +48,7 @@
\node [ugreen] (input) at (0,0) {猫喜欢吃鱼};
\node [draw,thick,anchor=west,ublue] (preprocessing) at ([xshift=1em]input.east) {分词系统};
\node [ugreen,anchor=west] (mtinput) at ([xshift=1em]preprocessing.east) {猫/喜欢/吃/鱼};
\node [draw,thick,anchor=west,ublue] (smt) at ([xshift=1em]mtinput.east) {SMT系统};
\node [draw,thick,anchor=west,ublue] (smt) at ([xshift=1em]mtinput.east) {MT系统};
\node [anchor=west] (mtoutput) at ([xshift=1em]smt.east) {...};
\draw [->,thick,ublue] ([xshift=0.1em]input.east) -- ([xshift=-0.2em]preprocessing.west);
\draw [->,thick,ublue] ([xshift=0.2em]preprocessing.east) -- ([xshift=-0.1em]mtinput.west);
......
......@@ -4,6 +4,7 @@
%----------------------------------------------------------------------------------------
\renewcommand\figurename{}%将figure改为图
\renewcommand\tablename{}%将table改为表
%\renewcommand\arraystretch{1.5}%将表格高度调整为1.5倍
\chapterimage{chapter_head_1.pdf} % Chapter heading image
\chapter{词法、语法及统计思想基础}
......@@ -77,20 +78,20 @@
\parinterval 连续变量是在其取值区间内连续取值,无法被一一列举,具有无限个取值的变量。例如,图书馆的开馆时间是8:30-22:00,用$X$代表某人进入图书馆的时间,时间的取值范围是[8:30,22:00]这个时间区间,$X$是一个连续变量。
\parinterval 概率是度量随机事件呈现其每个可能状态的可能性的数值,概率的大小表征了随机事件在一次试验中发生的可能性大小。用$\textrm{P}(\cdot )$表示一个随机事件的可能性,即事件发生的概率。比如$\textrm{P}(\textrm{太阳从东方升起})$表示“太阳从东方升起的可能性”,同理,$\textrm{P}(A=B)$表示的就是“$A=B$”这件事的可能性。
\parinterval 概率是度量随机事件呈现其每个可能状态的可能性的数值,本质上它是一个测度函数\cite{茆诗松2011概率论与数理统计教程}\cite{kolmogorov2018foundations}。概率的大小表征了随机事件在一次试验中发生的可能性大小。用$\textrm{P}(\cdot )$表示一个随机事件的可能性,即事件发生的概率。比如$\textrm{P}(\textrm{太阳从东方升起})$表示“太阳从东方升起的可能性”,同理,$\textrm{P}(A=B)$表示的就是“$A=B$”这件事的可能性。
\parinterval 在概率论中,一个很简单的获取概率的方式是利用相对频度作为概率的估计值。如果$\{x_1,x_2,\dots,x_n \}$是一个试验的样本空间,在相同情况下重复试验$N$次,观察到样本$x_i (1\leq{i}\leq{n})$的次数为$n_N (x_i )$,那么$x_i$在这$N$次试验中的相对频率是$\frac{n_N (x_i )}{N}$。当$N$越来越大时,相对频率也就越来越接近真实概率$\textrm{P}(x_i)$,即$\lim_{N \to \infty}\frac{n_N (x_i )}{N}=\textrm{P}(x_i)$。
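\parinterval 下面用一段简短的Python代码来示意相对频度估计:通过模拟掷骰子,统计某个点数出现的相对频度,并观察它随试验次数$N$的增大逐渐接近真实概率的过程(该代码只是辅助理解的示例,并非书中系统的实现):

\begin{verbatim}
import random

def relative_freq(n_trials):
    # repeat the experiment n_trials times and count n_N(x) for each outcome
    counts = {x: 0 for x in range(1, 7)}
    for _ in range(n_trials):
        counts[random.randint(1, 6)] += 1   # assume a fair die
    # use the relative frequency n_N(x)/N as the estimate of P(x)
    return {x: c / n_trials for x, c in counts.items()}

# as N grows, the estimate approaches the true probability 1/6
for n in (100, 10000, 1000000):
    print(n, relative_freq(n)[6])
\end{verbatim}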
\parinterval 概率函数是用函数形式给出离散变量每个取值发生的概率,其实就是将变量的概率分布转化为数学表达形式。如果我们把$A$看做一个离散变量,$a$看做变量$A$的一个取值,那么$\textrm{P}(A)$被称作变量$A$的概率函数,$\textrm{P}(A=a)$被称作$A = a$的概率值,简记为$\textrm{P}(a)$下表为离散变量$A$的概率分布,给出了$A$的所有取值及其概率。
\parinterval 概率函数是用函数形式给出离散变量每个取值发生的概率,其实就是将变量的概率分布转化为数学表达形式。如果我们把$A$看做一个离散变量,$a$看做变量$A$的一个取值,那么$\textrm{P}(A)$被称作变量$A$的概率函数,$\textrm{P}(A=a)$被称作$A = a$的概率值,简记为$\textrm{P}(a)$。例如,在相同条件下掷一个骰子50次,用$A$表示投骰子出现的点数这个离散变量,$a_i$表示点数的取值,$\textrm{P}_i$表示$A=a_i$的概率值。表\ref{tab1}为$A$的概率分布,给出了$A$的所有取值及其概率。
%表1--------------------------------------------------------------------
\begin{table}[htp]
\centering
\caption{离散变量A的概率分布}
\begin{tabular}{c|c c c c c}
\begin{tabular}{c|c c c c c c}
\hline
A & $a_1$ & $a_2$ & ... & $a_n$ & ...\\
\rule{0pt}{15pt} A & $a_1=1$ & $a_2=2$ & $a_3=3$ & $a_4=4$ & $a_5=5$ & $a_6=6$\\
\hline
$P_i$ & $P_1$ & $P_2$ & ... & $P_n$ & ... \\
\rule{0pt}{15pt} $\textrm{P}_i$ & $\textrm{P}_1=\frac{4}{25}$ & $\textrm{P}_2=\frac{3}{25}$ & $\textrm{P}_3=\frac{4}{25}$ & $\textrm{P}_4=\frac{6}{25}$ & $\textrm{P}_5=\frac{3}{25}$ & $\textrm{P}_6=\frac{1}{5}$ \\
\hline
\end{tabular}
\label{tab1}
......@@ -224,7 +225,7 @@
\label{eqC2.9}
\end{equation}
\parinterval 此可以看出使用链式法则可以大大减小求解概率表达式时的计算量。
\parinterval 由此可以看出,使用链式法则可以大大减小求解概率表达式时的计算量。
\subsection{贝叶斯法则(Bayes’Rule)}\index{Chapter2.2.4}
......@@ -246,6 +247,21 @@
\parinterval 这就是全概率公式。
\parinterval 举个例子,小张从家到公司有三条路分别为a,b,c,选择每条路的概率分别为0.5,0.3,0.2,那么:
\parinterval $S_a$:选择a路去上班,$S_b$:选择b路去上班,$S_c$:选择c路去上班,$S$:小张去上班。这四件事的关系即为:$S_a$、$S_b$、$S_c$是$S$的一个划分。
\parinterval 如果三条路不拥堵的概率分别为$\textrm{P}({S_{a}^{'}})$=0.2,$\textrm{P}({S_{b}^{'}})$=0.4,$\textrm{P}({S_{c}^{'}})$=0.7,那么事件L:小张上班没有遇到拥堵情况的概率就是:
\begin{equation}
\begin{split}
{\textrm{P}(L)} &= {\textrm{P}( L| S_a )\textrm{P}(S_a )+\textrm{P}( L| S_b )\textrm{P}(S_b )+\textrm{P}( L| S_c )\textrm{P}(S_c )}\\
& ={\textrm{P}({S_{a}^{'}})\textrm{P}(S_a)+\textrm{P}({S_{b}^{'}})\textrm{P}(S_b)+\textrm{P}({S_{c}^{'}})\textrm{P}(S_c) }\\
& ={0.2 \times 0.5 + 0.4 \times 0.3 + 0.7 \times 0.2}\\
& ={0.36}\nonumber \\
\end{split}
\end{equation}
%$\textrm{P}(L)=\textrm{P}( L| S_a )\textrm{P}(S_a )+\textrm{P}( L| S_b )\textrm{P}(S_b )+\textrm{P}( L| S_c )\textrm{P}(S_c )=\textrm{P}({S_{a}^{'}})\textrm{P}(S_a)+\textrm{P}({S_{b}^{'}})\textrm{P}(S_b)+\textrm{P}({S_{c}^{'}})\textrm{P}(S_c)=0.36$
\parinterval 贝叶斯法则(Bayes' rule)是概率论中的一个定理,通常用于在已知$\textrm{P}(A \mid B)$的情况下求$\textrm{P}(B \mid A)$。其内容如下:
\parinterval$B_1,…,B_n$是S的一个划分,A为事件,则对于$i=1,…,n$,有如下公式
......@@ -298,7 +314,7 @@
\parinterval 熵是热力学中的一个概念,同时也是对系统无序性的一种度量标准,在机器翻译领域,最常用到的是信息熵这一概念。一条信息的信息量大小与它的不确定性有着直接的关系,如果我们需要确认一件非常不确定甚至于一无所知的事情,那么需要理解大量的相关信息才能确认清楚;同样的,如果我们对某件事已经非常确定,那么就不需要太多的信息就可以把它搞清楚。
\begin{example}
确定性的事件
确定性和不确定性的事件
\qquad\qquad\quad“太阳从东方升起”
......@@ -321,7 +337,7 @@
\begin{figure}[htp]
\centering
\includegraphics[scale=1]{ ./Chapter2/Figures/figure-Self-information-function.jpg}
\caption{自信息函数图像}
\caption{自信息函数图像{\red图片需要换一个高清的}}
\label{figureC2.6}
\end{figure}
%-------------------------------------------
......@@ -380,32 +396,32 @@
\parinterval 在机器翻译中,分词系统的好坏往往会决定机器翻译的质量。分词的目的是定义系统处理的基本单元,那么什么叫做“词”呢?关于词的定义有很多,比如:\\
%-------------------------------------------
\begin{definition}[词]
新华字典
\begin{definition}
《新华字典》
语言里最小的可以独立运用的单位:词汇。
\end{definition}
%-------------------------------------------
%-------------------------------------------
\begin{definition}[词]
维基百科
\begin{definition}
《维基百科》
单词(word),含有语义内容或语用内容,且能被单独念出来的的最小单位。
\end{definition}
%-------------------------------------------
%-------------------------------------------
\begin{definition}[词]
国语辞典
\begin{definition}
《国语辞典》
語句中具有完整概念,能獨立自由運用的基本單位。
\end{definition}
%-------------------------------------------
%-------------------------------------------
\begin{definition}[词]
现代汉语词典
\begin{definition}
《现代汉语词典》
说话或诗歌、文章、戏剧中的语句
\end{definition}
......@@ -507,7 +523,7 @@
\label{eqC2.9}
\end{equation}
\parinterval 但是这个游戏没有人规定骰子是均匀的(有些被坑了的感觉 :) )。但是如果骰子的六个面不均匀呢?我们可以用一种更加“聪明”的方式定义一个新模型,即定义骰子的每一个面都以一定的概率出现,而不是相同的概率。这里,为了保证概率的归一性,我们只需定义$\theta_1 \sim \theta_5$,最后一个面的概率用1减去前几个面的概率之和进行表示,即
\parinterval 但是这个游戏没有人规定骰子是均匀的(有些被坑了的感觉)。但是如果骰子的六个面不均匀呢?我们可以用一种更加“聪明”的方式定义一个新模型,即定义骰子的每一个面都以一定的概率出现,而不是相同的概率。这里,为了保证概率的归一性,我们只需定义$\theta_1 \sim \theta_5$,最后一个面的概率用1减去前几个面的概率之和进行表示,即
\begin{equation}
\begin{split}
\textrm{P}("1") &=\theta_1 \\
......@@ -655,5 +671,278 @@
\parinterval 当然,真正的分词系统还需要解决很多其它问题,比如使用动态规划等方法高效搜索最优解以及如何处理未见过的词等等,由于本节的重点是介绍中文分词的基础方法和统计建模思想,因此无法覆盖所有中文分词的技术内容,有兴趣的读者可以参考2.6节的相关文献做进一步深入研究。
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section{语言模型 }\index{Chapter2.4}
\parinterval 在基于统计的汉语分词模型中,我们通过“大题小作”的技巧,利用独立性假设把整个句子的单词切分概率转化为每个单个词出现概率的乘积。这里,每个单词也被称作1-gram(或uni-gram),而1-gram概率的乘积实际上也是在度量词序列出现的可能性(记为$\textrm{P}(w_1 w_2...w_m)$)。这种计算整个单词序列概率$\textrm{P}(w_1 w_2...w_m)$的方法被称为统计语言模型。1-gram语言模型是最简单的一种语言模型,它没有考虑任何的上下文。很自然的一个问题是:能否考虑上下文信息构建更强大的语言模型,进而得到更准确的分词结果。下面我们将进一步介绍更加通用的$n$-gram语言模型,它在机器翻译及其它自然语言处理任务中有更加广泛的应用。
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\subsection{n-gram语言模型}\index{Chapter2.4.1}
\parinterval 语言模型的目的是描述文字序列出现的规律。如果使用统计建模的方式,语言模型可以被定义为计算$\textrm{P}(w_1 w_2...w_m)$,也就是计算整个词序列$w_1 w_2...w_m$出现的可能性大小。具体定义如下,
%----------------------------------------------
% 定义3.1
\begin{definition}[]
词汇表$V$上的语言模型是一个函数$\textrm{P}(w_1 w_2...w_m)$,它表示$V^+$上的一个概率分布。其中,对于任何词串$w_1 w_2...w_m\in{V^+}$,有$\textrm{P}(w_1 w_2...w_m)\geq{0}$。而且对于所有的词串,函数满足归一化条件$\sum_{w_1 w_2...w_m\in{V^+}}\textrm{P}(w_1 w_2...w_m)=1$。
\end{definition}
%-------------------------------------------
\parinterval 直接求$\textrm{P}(w_1 w_2...w_m)$并不简单,因为如果把$w_1 w_2...w_m$整个作为一个变量,模型的参数量会非常大。$w_1 w_2...w_m$有$|V|^m$种可能性,这里$|V|$表示词汇表大小。显然,当$m$增大时,模型的复杂度会急剧增加,甚至都无法进行存储和计算。既然把$w_1 w_2...w_m$整个作为一个变量不好处理,就可以考虑对这个序列的生成进行分解。使用链式法则,很容易得到
\begin{equation}
\textrm{P}(w_1 w_2...w_m)=\textrm{P}(w_1)\textrm{P}(w_2|w_1)\textrm{P}(w_3|w_1 w_2)...\textrm{P}(w_m|w_1 w_2...w_{m-1})
\label{eq:2.4.1.1}
\end{equation}
这样,$w_1 w_2...w_m$的生成可以被看作是逐个生成每个单词的过程,即首先生成$w_1$,然后根据$w_1$再生成$w_2$,然后根据$w_1 w_2$再生成$w_3$,以此类推,直到根据前面所有$m-1$个词生成序列的最后一个单词$w_m$。这个模型把联合概率$\textrm{P}(w_1 w_2...w_m)$分解为多个条件概率的乘积,虽然可以对生成序列的过程进行分解,但是模型的复杂度和以前是一样的,比如,$\textrm{P}(w_m|w_1 w_2...w_{m-1})$仍然不好计算。
\parinterval 换一个角度看,$\textrm{P}(w_m|w_1 w_2...w_{m-1})$体现了一种基于“历史”的单词生成模型,也就是把前面生成的所有单词作为“历史”,并参考这个“历史”生成当前单词。但是这个“历史”的长度和整个序列长度是相关的,是一种长度会变化的历史序列。为了化简问题,一种自然的想法是使用定长历史,比如,每次只考虑前面$n-1$个历史单词来生成当前单词,这就是$n$-gram语言模型。这个模型的数学描述如下:
\begin{equation}
\textrm{P}(w_m|w_{m-n+1}...w_{m-1})=\textrm{P}(w_m|w_1 w_2...w_{m-1})
\label{eq:2.4.1.2}
\end{equation}
\parinterval 这样,整个序列$w_1 w_2...w_m$的生成概率就可以被重新定义,如表\ref{tab:n-gram-model-of-different-n}所示:
%------------------------------------------------------
% 表1.2
\begin{table}[htp]{
\begin{center}
\caption{$n$-gram语言模型取不同$n$值的模型描述}
\label{tab:n-gram-model-of-different-n}
{\scriptsize
\begin{tabular}{l|l|l l l}
\toprule
\textbf{链式法则} & \textbf{1-gram} & \textbf{2-gram} & $...$ & \textbf{$n$-gram}\\
\midrule
$\textrm{P}(w_1 w_2...w_m)$ = & $\textrm{P}(w_1 w_2...w_m)$ = & $\textrm{P}(w_1 w_2...w_m)$ = & $...$ & $\textrm{P}(w_1 w_2...w_m)$ = \\
$\textrm{P}(w_1)\times$ & $\textrm{P}(w_1)\times$ & $\textrm{P}(w_1)\times$ & $...$ & $\textrm{P}(w_1)\times$ \\
$\textrm{P}(w_2|w_1)\times$ & $\textrm{P}(w_2)\times$ & $\textrm{P}(w_2|w_1)\times$ & $...$ & $\textrm{P}(w_2|w_1)\times$\\
$\textrm{P}(w_3|w_1 w_2)\times$ & $\textrm{P}(w_3)\times$ & $\textrm{P}(w_3|w_2)\times$ & $...$ & $\textrm{P}(w_3|w_1 w_2)\times$ \\
$\textrm{P}(w_4|w_1 w_2 w_3)\times$ & $\textrm{P}(w_4)\times$ & $\textrm{P}(w_4|w_3)\times$ & $...$ & $\textrm{P}(w_4|w_1 w_2 w_3)\times$ \\
$...$ & $...$ & $...$ & $...$ & $...$ \\
$\textrm{P}(w_m|w_1 ... w_{m-1})$ & $\textrm{P}(w_m)$ & $\textrm{P}(w_m|w_{m-1})$ & $...$ & $\textrm{P}(w_m|w_{m-n+1} ... w_{m-1})$ \\
\end{tabular}
}
\end{center}
}\end{table}
%------------------------------------------------------
\parinterval 可以看到,1-gram语言模型只是$n$-gram语言模型的一种特殊形式。$n$-gram的优点在于,它所使用的历史信息是有限的,即$n-1$个单词。这种性质也反映了经典的马尔可夫链的思想\cite{刘克2004实用马尔可夫决策过程}\cite{resnick1992adventures},有时也被称作马尔可夫假设或者马尔可夫属性。因此$n$-gram也可以被看作是变长序列上的一种马尔可夫模型,比如,2-gram语言模型对应着1阶马尔可夫模型,3-gram语言模型对应着2阶马尔可夫模型,以此类推。
\parinterval 那么,如何计算$\textrm{P}(w_m|w_{m-n+1} ... w_{m-1})$?有很多种选择,比如:
\begin{adjustwidth}{1em}{}
\begin{itemize}
\item 极大似然估计。直接利用不同词序列在训练数据中出现的频度计算出$\textrm{P}(w_m|w_{m-n+1} ... w_{m-1})$
\begin{equation}
\textrm{P}(w_m|w_{m-n+1}...w_{m-1})=\frac{count(w_{m-n+1}...w_m)}{count(w_{m-n+1}...w_{m-1})}
\label{eq:2.4.1.3}
\end{equation}
\item 人工神经网络方法。构建一个人工神经网络估计$\textrm{P}(w_m|w_{m-n+1} ... w_{m-1})$的值,比如,可以构建一个前馈神经网络来对$n$-gram进行建模。
\end{itemize}
\end{adjustwidth}
\parinterval 极大似然估计方法和前面介绍的统计分词中的方法是一致的,它的核心是使用$n$-gram出现的频度进行参数估计,因此也是自然语言处理中一类经典的$n$-gram方法。基于人工神经网络的方法在近些年也非常受关注,它直接利用多层神经网络对问题的输入$(w_{m-n+1}...w_{m-1})$和输出$(\textrm{P}(w_m|w_{m-n+1} ... w_{m-1}))$进行建模,而模型的参数通过网络中神经元之间连接的权重进行体现。严格意义上来说,基于人工神经网络的方法并不算基于$n$-gram的方法,或者说它并不显性记录$n$-gram的生成概率,也不依赖$n$-gram的频度进行参数估计。为了保证内容的连贯性,本章将仍以传统$n$-gram语言模型为基础进行讨论,基于人工神经网络的方法将会在第五章和第六章进行详细介绍。
\parinterval 使用$n$-gram语言模型非常简单。我们可以像2.3.2节中一样,直接用它来对词序列出现的概率进行计算。比如,可以使用一个2-gram语言模型计算一个分词序列的概率
\begin{equation}
\begin{aligned}
& \textrm{P}_{2-gram}{('\textrm{确实}/\textrm{现在}/\textrm{数据}/\textrm{}/\textrm{}')} \\ = \quad & \textrm{P}('\textrm{确实}')\times\textrm{P}('\textrm{现在}'|'\textrm{确实}')\times\textrm{P}('\textrm{数据}'|'\textrm{现在}')\times \\
& \textrm{P}('\textrm{}'|'\textrm{数据}')\times\textrm{P}('\textrm{}'|'\textrm{}')
\label{eq:2.4.1.4}
\end{aligned}
\end{equation}
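\parinterval 为了更直观地展示上述计算过程,下面给出一段用极大似然估计训练2-gram语言模型并为分词序列打分的Python代码。其中的语料只是两句虚构的小例子,数值仅用于示意,并不对应书中的真实统计结果:

\begin{verbatim}
from collections import Counter

def train_bigram(corpus):
    # corpus: a list of tokenized sentences (lists of words)
    uni, bi = Counter(), Counter()
    for sent in corpus:
        uni.update(sent)
        bi.update(zip(sent[:-1], sent[1:]))
    return uni, bi

def sentence_prob(uni, bi, sent, total):
    # P(w1 w2 ... wm) = P(w1) * P(w2|w1) * ... * P(wm|w_{m-1})
    p = uni[sent[0]] / total
    for w_prev, w in zip(sent[:-1], sent[1:]):
        p *= bi[(w_prev, w)] / uni[w_prev]   # MLE: count(w_prev w)/count(w_prev)
    return p

corpus = [["确实", "现在", "数据", "很", "多"],
          ["现在", "数据", "不", "多"]]
uni, bi = train_bigram(corpus)
print(sentence_prob(uni, bi, ["确实", "现在", "数据", "很", "多"], sum(uni.values())))
\end{verbatim}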
\parinterval$n$-gram语言模型为代表的统计语言模型的应用非常广泛。除了分词,在文本生成、信息检索、摘要等等自然语言处理任务中,语言模型都有举足轻重的地位。包括近些年非常受关注的预训练模型,本质上也是统计语言模型。这些技术都会在后续章节进行介绍。值得注意的是,统计语言模型给我们解决自然语言处理问题提供了一个非常好的建模思路,即:把整个序列生成的问题转化为逐个生成单词的问题。很快我们就会看到,这种建模方式会被广泛的用于机器翻译建模中,在统计机器翻译和神经机器翻译中都会有明显的体现。
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\subsection{未登录词和平滑算法}\index{Chapter2.4.2}
\parinterval 在式\ref{eq:2.4.1.4}的例子中,如果语料中从没有“确实”和“现在”两个词连续出现的情况,那么使用2-gram计算 “确实/现在/数据/很/多”的切分方式的概率时,会出现如下情况
\begin{equation}
\textrm{P}('\textrm{现在}'|'\textrm{确实}') = \frac{count('\textrm{确实}\,\textrm{现在}')}{count('\textrm{确实}')} = \frac{0}{count('\textrm{确实}')} = 0
\label{eq:2.4.1.5}
\end{equation}
\parinterval 显然,这个结果是不能接受的。因为即使语料中没有 “确实”和“现在”两个词连续出现,但是这种搭配也是客观存在的。这时简单的用极大似然估计得到概率却是0,导致整个切分结果的概率为0。更常见的问题是那些根本没有出现在词表中的词,称为未登录词(Out-of-Vocabulary, OOV),比如一些生僻词,可能模型训练阶段从来没有看到过,这时模型仍然会给出0概率。图\ref{fig:2.4.1.1}展示了词语出现频度的分布,可以看到绝大多数词都是低频词。
%----------------------------------------------
% 图2.4.1.1
\begin{figure}[htp]
\centering
\includegraphics{./Chapter2/Figures/figure-word-frequency-distribution.jpg}
\caption{词语频度分布({\red 需要一个高清图片}}
\label{fig:2.4.1.1}
\end{figure}
%---------------------------
\parinterval 为了解决未登录词引起的零概率问题,常用的做法是对模型进行平滑处理,也就是给可能出现的情况一个非零的概率,使得模型不会对整个序列给出零概率。平滑可以用“劫富济贫”这一思想理解,在保证所有情况的概率和为1的前提下,使极低概率的部分可以从高概率的部分分配一部分概率,从而达到平滑的目的。
\parinterval 语言模型使用的平滑算法有很多,比如加法平滑方法、古德-图灵估计法、Katz平滑法等。在本节中,主要介绍三种平滑方法:加法平滑法、古德-图灵估计法和Kneser-Ney平滑。
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\subsubsection{加法平滑方法}\index{Chapter2.4.2.1}
\parinterval 加法平滑方法可以说是最简单的平滑技术,我们首先介绍这一方法,希望通过它了解平滑算法的思想。
\parinterval 通常情况下,我们会利用采集到的语料库来模拟现实生活中真实全部的语料库。但是采集总是不充分的,比如无法涵盖所有的词汇,直接依据这样的语料所获得的统计信息来计算现实中的语言概率就会产生偏差。假设依据某语料$C$(其中从未出现“确实 现在”这个二元语法)评估一个已经分好词的句子$S$=“确实 现在 物价 很 高”的概率,那么在计算“确实 现在”的概率时会得到0,进而使得$\textrm{P}(S) = 0$。显然这个结果是不够准确的,根据我们的常识,句子$S$是有出现的可能性的,这样句子的概率值不应该是0。
\begin{equation}
\textrm{P}(\textrm{现在}|\textrm{确实}) = \frac{count(\textrm{确实}\,\textrm{现在})}{count(\textrm{确实})} = \frac{0}{count(\textrm{确实})} = 0
\label{eq:2.4.1.6}
\end{equation}
\parinterval 为了避免这种由于数据所产生的评估预测概率为0的问题,采用“数据平滑”的方式对最大似然估计方法进行调整。通常的平滑方法都是为了提高低概率(如零概率),或者降低高概率,这种做法的思想比较类似于“劫富济贫”。
\parinterval 加法平滑方法(additive smoothing)假设每个$n$元语法出现的次数比实际统计次数多$\theta$次,$0 \leqslant\theta\leqslant 1$,使得分子部分不为0。那么,前文例子中“确实 现在”的概率可以用如下方法计算。
\begin{equation}
\textrm{P}(\textrm{现在}|\textrm{确实}) = \frac{\theta + count(\textrm{确实}\,\textrm{现在})}{\sum_{w}^{|V|}(\theta + count(\textrm{确实}w))} = \frac{\theta + count(\textrm{确实}\,\textrm{现在})}{\theta{|V|} + count(\textrm{确实})}
\label{eq:2.4.1.7}
\end{equation}
\parinterval 这里$V$表示词表,$|V|$为词表中单词的个数,$w$为词表中的词。常见的加法平滑方法会将$\theta$取1,这时又被称为加一平滑或拉普拉斯平滑。这种方法比较容易理解,也比较简单,但是一些研究者认为这种方法的表现较差,因此,其实际的使用效果还要视具体情况而定。
\parinterval 举一个例子来形象地描述加法平滑方法。假设从一个英文文档中随机抽取词汇,已抽到的词共12个,词典大小$|V|$=20,已抽到词汇的统计结果为:4 look,3 people,2 am,1 what,1 want,1 do。图\ref{fig:2.4.1.2}给出了平滑之前和平滑之后的概率分布的对比。
%----------------------------------------------
% 图2.4.1.2
\begin{figure}[htp]
\centering
\includegraphics{./Chapter2/Figures/figure-no-smoothing&smoothed-probability-distributions.jpg}
\caption{无平滑和有平滑后的概率分布({\red需要高清图片}}
\label{fig:2.4.1.2}
\end{figure}
%---------------------------
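\parinterval 结合上面的例子,下面的Python片段对比了加一平滑($\theta=1$)前后单词概率的变化,其中词表大小$|V|=20$,计数来自上面抽取的12个词(代码仅为帮助理解的示例):

\begin{verbatim}
theta = 1            # add-one (Laplace) smoothing
V = 20               # vocabulary size |V|
counts = {"look": 4, "people": 3, "am": 2, "what": 1, "want": 1, "do": 1}
total = sum(counts.values())             # 12 observed tokens

def p_unsmoothed(w):
    return counts.get(w, 0) / total      # unseen words get probability 0

def p_smoothed(w):
    # (theta + count(w)) / (theta * |V| + total)
    return (theta + counts.get(w, 0)) / (theta * V + total)

print(p_unsmoothed("look"), p_smoothed("look"))   # 0.333... -> 0.15625
print(p_unsmoothed("you"),  p_smoothed("you"))    # 0.0      -> 0.03125
\end{verbatim}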
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\subsubsection{古德-图灵估计法}\index{Chapter2.4.2.2}
\parinterval 古德-图灵估计法是图灵(Alan Turing)和他的助手古德(I. J. Good)开发的,作为他们在二战期间破解德国密码机Enigma所使用方法的一部分,在1953年由古德发表。这一方法也是很多平滑算法的核心,其基本思路是:把非零的$n$元语法的概率调低,匀给一些低概率的$n$元语法,以修正最大似然估计与真实概率之间的偏离\cite{good1953population}\cite{gale1995good}。
\parinterval 假定在语料库中出现$r$次的$n$元语法有$n_r$个,特别地,出现0次的$n$元语法(即未登录的$n$元语法)有$n_0$个。语料库中全部$n$元语法出现的总次数为$N$,显然
\begin{equation}
N = \sum_{r=1}^{\infty}{r\,n_r}
\label{eq:2.4.1.8}
\end{equation}
\parinterval 这时,出现$r$次的$n$元语法的相对频率为$r/N$,这也是不做平滑处理时这些$n$元语法的概率估计。为了解决零概率问题,Good-Turing方法对于任何一个出现$r$次的$n$元语法,都利用出现$r+1$次的$n$元语法的统计量,重新假设它出现$r^*$次,这里
\begin{equation}
r^* = (r + 1)\frac{n_{r + 1}}{n_r}
\label{eq:2.4.1.9}
\end{equation}
\parinterval 基于这个公式,就可以估计所有0次$n$元语法的频次总和$n_0 r_0^*=(r_0+1)n_1=n_1$。要把这个重新估计的统计数转化为概率,只需要进行归一化处理:对于每个统计数为$r$的事件,其概率为$\textrm{p}_r=r^*/N$,其中
\begin{equation}
N = \sum_{r=0}^{\infty}{r^{*}n_r} = \sum_{r=0}^{\infty}{(r + 1)n_{r + 1}} = \sum_{r=1}^{\infty}{r\,n_r}
\label{eq:2.4.1.10}
\end{equation}
也就是说,$N$仍然为整个样本分布最初的计数。这样,样本中所有观测到的事件的概率之和为:
\begin{equation}
\sum_{r>0}{p_r n_r} = 1 - \frac{n_1}{N} < 1
\label{eq:2.4.1.11}
\end{equation}
其中$n_1/N$的概率余量就是分配给所有统计次数为0的事件的。
\parinterval Good-Turing方法最终通过出现1次的$n$元语法估计了统计为0的事件概率,达到了平滑的效果。
\parinterval 我们使用一个例子来说明这个方法是如何通过已见事物的数量来预测未见事物的数量的。仍然考虑前面加法平滑方法中英文词汇抽取的例子,按照Good-Turing方法进行修正,结果如表\ref{tab:results-of-en-vocabulary-extraction}所示。
%------------------------------------------------------
% 表1.3
\begin{table}[htp]{
\begin{center}
\caption{英文词汇抽取统计结果}
\label{tab:results-of-en-vocabulary-extraction}
{
\begin{tabular}{l|l|l|l}
\toprule
\textbf{$r$} & \textbf{$n_r$} & \textbf{$r^*$} & \textbf{$p_r$}\\
\midrule
0 & 14 & 0.21 & 0.018 \\
1 & 3 & 0.67 & 0.056 \\
2 & 1 & 3 & 0.25 \\
3 & 1 & 4 & 0.333 \\
4 & 1 & - & - \\
\bottomrule
\end{tabular}
}
\end{center}
}\end{table}
%------------------------------------------------------
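\parinterval 表\ref{tab:results-of-en-vocabulary-extraction}中的$r^*$和$p_r$可以用下面的Python片段直接验证(其中$n_0=14$,$n_1=3$,$n_2=n_3=n_4=1$,$N=12$,代码仅为示例):

\begin{verbatim}
n = {0: 14, 1: 3, 2: 1, 3: 1, 4: 1}       # n_r: number of n-grams seen r times
N = sum(r * n_r for r, n_r in n.items())  # N = 12

def r_star(r):
    # Good-Turing adjusted count: r* = (r + 1) * n_{r+1} / n_r
    return (r + 1) * n.get(r + 1, 0) / n[r]

for r in range(4):
    print(r, round(r_star(r), 2), round(r_star(r) / N, 3))
# expected: (0, 0.21, 0.018) (1, 0.67, 0.056) (2, 3.0, 0.25) (3, 4.0, 0.333)
\end{verbatim}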
\parinterval 很多时候会出现$n_{r+1}=0$的情况,尤其当$r$很大时这种情况很常见,而且此时$n_r$的统计本身也会有噪声。简单的Good-Turing方法无法很好地应对这些复杂的情况,但随着后续研究的发展,它成为了很多其它平滑方法的基础。
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\subsubsection{Kneser-Ney平滑方法}\index{Chapter2.4.2.3}
\parinterval Kneser-Ney平滑方法是由R. Kneser和H. Ney于1995年提出的用于计算$n$元语法概率分布的方法\cite{kneser1995improved}\cite{chen1999empirical}。它基于绝对减值(absolute discounting)平滑,并被广泛认为是最有效的平滑方法之一。这种平滑方法改进了absolute discounting中与高阶分布相结合的低阶分布的计算方法,使不同阶的分布得到充分的利用。这种算法综合利用了其他多种平滑算法的思想,是一种先进而且标准的平滑算法。
\parinterval 首先介绍一下absolute discounting平滑算法,公式如下所示
\begin{equation}
\textrm{P}_{AbsDiscount}(w_i | w_{i-1}) = \frac{c(w_{i-1},w_i )-d}{c(w_{i-1})} + \lambda(w_{i-1})\textrm{P}(w_i)
\label{eq:2.4.1.12}
\end{equation}
其中$d$是固定的折扣值,$\lambda$是一个归一化常数。可以看到第一项是经过减值调整过的2-gram的概率值,第二项则相当于一个带权重$\lambda$的1-gram的插值项。然而这种插值模型极易受到原始1-gram模型的干扰。
\parinterval 假设我们使用2-gram和1-gram的插值模型预测下面句子中“[BLANK]”处的词:“I can’t see without my reading [BLANK]”,直觉上我们会猜测这个地方的词应该是glasses,但是在训练语料库中Francisco出现的频率非常高。如果在预测时仍然使用的是标准的1-gram模型,那么计算机会由于高概率选择Francisco填入句子的空白处,这结果明显是不合理的。当使用的是混合的插值模型时,如果reading Francisco这种二元语法并没有出现在语料中,就会导致1-gram对结果的影响变大,使得仍然会做出与标准1-gram模型相同的结果,犯下相同的错误。
\parinterval 观察语料中的二元语法发现,Francisco的前一个词仅是San,不会出现reading。这个分析提醒了我们,考虑前一个词的影响是有帮助的,比如仅在前一个词是San时,我们才给Francisco赋予一个较高的概率值。基于这种想法,改进原有的1-gram模型,创造一个新的1-gram模型$\textrm{P}_{continuation}$,使这个模型可以通过考虑前一个词的影响来评估当前词作为第二个词出现的可能性。
\parinterval 为了评估$\textrm{P}_{continuation}$,统计使用当前词作为第二个词所出现二元语法的种类,二元语法种类越多,这个词作为第二个词出现的可能性越高,呈正比:
\begin{equation}
\textrm{P}_{continuation}(w_i) \varpropto |\{ w_{i-1}: c(w_{i-1} w_i )>0 \}|
\label{eq:2.4.1.13}
\end{equation}
通过全部的二元语法的种类做归一化可得到评估的公式
\begin{equation}
\textrm{P}_{continuation}(w_i) = \frac{|\{ w_{i-1}:c(w_{i-1} w_i )>0 \}|}{|\{ (w_{j-1}, w_j):c(w_{j-1},w_j )>0 \}|}
\label{eq:2.4.1.14}
\end{equation}
\parinterval 基于分母的变化还有另一种形式
\begin{equation}
\textrm{P}_{continuation}(w_i) = \frac{|\{ w_{i-1}:c(w_{i-1} w_i )>0 \}|}{\sum_{w^{\prime}_i}|\{ w^{\prime}_{i-1}:c(w^{\prime}_{i-1} w^{\prime}_i )>0 \}|}
\label{eq:2.4.1.15}
\end{equation}
结合基础的absolute discounting计算公式,从而得到了Kneser-Ney平滑方法的公式
\begin{equation}
\begin{aligned}
\textrm{P}_{KN}(w_i|w_{i-1}) & = \frac{\max(c(w_{i-1},w_i )-d,0)}{c(w_{i-1})} + \lambda(w_{i-1})\textrm{P}_{continuation}(w_i) \\
\lambda(w_{i-1}) & = \frac{d}{c(w_{i-1})}|\{w:c(w_{i-1},w)>0\}|
\label{eq:2.4.1.16}
\end{aligned}
\end{equation}
其中max部分保证了分子部分是一个不小于0的数,原始的1-gram分布被更新成$\textrm{P}_{continuation}$概率分布,$\lambda$是归一化项。
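\parinterval 下面给出2-gram情形下插值式Kneser-Ney平滑的一个简化Python实现,对应上面的公式。其中折扣值$d$取0.75,语料是一个虚构的小例子,仅用于说明$\textrm{P}_{continuation}$和$\lambda$的计算流程:

\begin{verbatim}
from collections import Counter, defaultdict

def train_kn_bigram(corpus, d=0.75):
    bigrams = Counter()
    for sent in corpus:
        bigrams.update(zip(sent[:-1], sent[1:]))
    c_prev = Counter()                 # c(w_{i-1})
    followers = defaultdict(set)       # distinct words following w_{i-1}
    histories = defaultdict(set)       # distinct words preceding w_i
    for (u, v), c in bigrams.items():
        c_prev[u] += c
        followers[u].add(v)
        histories[v].add(u)
    n_types = len(bigrams)             # number of distinct bigram types

    def p_kn(w, w_prev):
        p_cont = len(histories[w]) / n_types            # P_continuation(w)
        lam = d * len(followers[w_prev]) / c_prev[w_prev]
        return max(bigrams[(w_prev, w)] - d, 0) / c_prev[w_prev] + lam * p_cont
    return p_kn

corpus = [["I", "can", "not", "see"], ["I", "see", "San", "Francisco"]]
p_kn = train_kn_bigram(corpus)
print(p_kn("Francisco", "San"), p_kn("Francisco", "see"))
\end{verbatim}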
\parinterval 为了更具普适性,不局限于2-gram和1-gram的插值模型,可以利用递归的方式得到更通用的公式
\begin{equation}
\begin{aligned}
\textrm{P}_{KN}(w_i|w_{i-n+1}...w_{i-1}) & = \frac{\max(c_{KN}(w_{i-n+1}...w_{i})-d,0)}{c_{KN}(w_{i-n+1}...w_{i-1})} + \\
& \quad \lambda(w_{i-n+1}...w_{i-1})\textrm{P}_{KN}(w_i|w_{i-n+2}...w_{i-1}) \\
\lambda(w_{i-n+1}...w_{i-1}) & = \frac{d}{c_{KN}(w_{i-n+1}...w_{i-1})}|\{w:c_{KN}(w_{i-n+1}...w_{i-1}w)>0\}| \\
c_{KN}(\cdot) & = \begin{cases} \textrm{count}(\cdot) & \textrm{for the highest order} \\ \textrm{continuationcount}(\cdot) & \textrm{for lower order} \end{cases}
\label{eq:2.4.1.17}
\end{aligned}
\end{equation}
\parinterval 其中continuationcount表示的是基于某个单个词作为第$n$个词的$n$元语法的种类数目。
\parinterval 我们前面提到,Kneser-Ney Smoothing是当前一种标准的、被广泛采用的、先进的平滑算法。还有很多以此为基础衍生出来的算法,有兴趣的读者可以通过参考文献\cite{parsing2009speech}\cite{ney1994structuring}\cite{chen1999empirical}做进一步了解。
\parinterval
......@@ -11,14 +11,14 @@
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\chapter{基于词的翻译模型}
\hspace{2em}使用统计方法对翻译进行建模是机器翻译发展中的重要里程碑。这种思想也影响了随后的统计机器翻译和神经机器翻译。虽然技术不断发展,传统的统计模型已经不再``新鲜'',但是对于今天机器翻译的研究仍然有启示作用。想要了解前沿、展望未来,我们更要冷静的思考前任给我们带来了什么。基于此,本章将主要介绍统计机器翻译的开山之作\ \ \ \ IBM模型,它主要提出了使用统计模型进行翻译的思想,并使用基于单词对齐的方式完成了机器翻译的统计建模。IBM模型由Peter E. Brown等人在1993年提出({\red 参考文献!!!})\ \ \ \ 《The Mathematics of Statistical Machine Translation: Parameter Estimation》。客观的说,这篇文章的视野和对问题的理解,已经超过当时绝大多数人所能看到的东西,其衍生出来的一系列方法和新的问题还被后人花费将近10年的时间来进行研究与讨论。时至今日,IBM模型中的一些思想仍然影响着很多研究工作。
\hspace{2em}使用统计方法对翻译进行建模是机器翻译发展中的重要里程碑。这种思想也影响了随后的统计机器翻译和神经机器翻译。虽然技术不断发展,传统的统计模型已经不再``新鲜'',但是对于今天机器翻译的研究仍然有启示作用。想要了解前沿、展望未来,我们更要冷静地思考前人给我们带来了什么。基于此,本章将主要介绍统计机器翻译的开山之作\ \ \ \ IBM模型,它主要提出了使用统计模型进行翻译的思想,并使用基于单词对齐的方式完成了机器翻译的统计建模。IBM模型由Peter E. Brown等人在1993年提出\ \ \ \ 《The Mathematics of Statistical Machine Translation: Parameter Estimation》\cite{brown1993mathematics}。客观地说,这篇文章的视野和对问题的理解,已经超过当时绝大多数人所能看到的东西,其衍生出来的一系列方法和新的问题还被后人花费将近10年的时间来进行研究与讨论。时至今日,IBM模型中的一些思想仍然影响着很多研究工作。
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section{什么是基于词的翻译模型}\index{Chapter3.1}
\parinterval 在机器翻译中,我们希望得到一个源语言到目标语言的翻译。虽然对于人类来说这个问题很简单,但是让计算机做这样的工作却困难很多,因为我们需要把翻译``描述''成计算机可以计算的形式。因此这里面临的第一个问题是:如何对翻译进行建模?从计算机的角度来说,需要把抽象的翻译问题转换为可计算的问题,这样问题又可以被重新描述为:如何将翻译转换为一个可计算的模型或过程?
\parinterval 那么,基于单词的统计机器翻译模型又是如何描述翻译问题的呢?Peter E. Brown等人提出了一个观点({\red 参考文献!!!}):在翻译源语句时,通常是把每个源语句的单词翻译成对应的目标语单词,然后调整这些单词的顺序,最后得到翻译结果,而这个过程可以用统计模型描述。尽管在人看来使用两个语言单词之间的对应进行翻译是很自然的事,但是对于计算机来说是迈出了一大步。
\parinterval 那么,基于单词的统计机器翻译模型又是如何描述翻译问题的呢?Peter E. Brown等人提出了一个观点\cite{brown1993mathematics}:在翻译源语句时,通常是把每个源语句的单词翻译成对应的目标语单词,然后调整这些单词的顺序,最后得到翻译结果,而这个过程可以用统计模型描述。尽管在人看来使用两个语言单词之间的对应进行翻译是很自然的事,但是对于计算机来说是迈出了一大步。
\parinterval 先来看一个例子。图 \ref{fig:figure-zh-en-translation-example}展示了一个汉语翻译到英语的例子。首先我们把源语句的单词``我''、``对''、``你''、``感到''和``满意''分别翻译为``I''、``with''、``you''、``am''\ 和``satisfied'',然后调整单词的顺序,比如``am''放在译文的第2个位置,``you''应该放在最后的位置等,最后得到译文``I am satisfied with you''。
%空一行用来段落换行,noindent取消首行缩进,hspace{}指定缩进距离,1em等于两个英文字符|一个汉字
......@@ -110,11 +110,11 @@
\end{figure}
%---------------------------
\parinterval 对于第二个问题,尽管机器能够找到很多这样的译文选择路径,但它并不知道那些路径是好的。说的再直白一些,简单的枚举路径实际上就是一个体力活,没有什么智能。因此计算机还需要再聪明一些,运用它的能够``掌握''的知识判断哪个结果是好的。这一步是最具挑战的,当然也有很多思路。在统计机器翻译中,这个问题被定义为:设计一种统计模型,它可以给每个译文一个概率值,这个概率值越高表示译文质量越好。如图\ref{fig:process-of-machine-translation}所示,每个单词翻译候选的右侧黑色框里的数字就是单词的翻译概率。使用这些单词的翻译概率,我们可以得到整句译文的概率(符号P表示)。这样,我们用概率化的模型描述了每个翻译候选的可能性。基于每个翻译候选的可能性,机器翻译系统可以对所有的译文选择路径进行打分,图\ref{figureC3.4}中第一条翻译路径的分数为0.042,第二条是0.006,以此类推。最后,系统可以选择分数最高的路径作为源语句的最终译文。
\parinterval 对于第二个问题,尽管机器能够找到很多这样的译文选择路径,但它并不知道哪些路径是好的。说得再直白一些,简单的枚举路径实际上就是一个体力活,没有什么智能。因此计算机还需要再聪明一些,运用它能够``掌握''的知识判断哪个结果是好的。这一步是最具挑战的,当然也有很多思路。在统计机器翻译中,这个问题被定义为:设计一种统计模型,它可以给每个译文一个概率值,这个概率值越高表示译文质量越好。如图\ref{fig:process-of-machine-translation}所示,每个单词翻译候选的右侧黑色框里的数字就是单词的翻译概率。使用这些单词的翻译概率,我们可以得到整句译文的概率(符号P表示)。这样,我们用概率化的模型描述了每个翻译候选的可能性。基于每个翻译候选的可能性,机器翻译系统可以对所有的译文选择路径进行打分,图\ref{fig:process-of-machine-translation}中第一条翻译路径的分数为0.042,第二条是0.006,以此类推。最后,系统可以选择分数最高的路径作为源语句的最终译文。
\subsubsection{(三)人工 vs. 机器}\index{Chapter3.2.1.3}
\parinterval 人在翻译时的决策和推断是非常确定并且快速的,但机器翻译处理这个问题却充满了不确定性和概率化的思想。当然它们也有类似的地方。首先,计算机使用统计模型目的是建立处理翻译问题的基本模式,并储存相关的模型参数,这个和我们大脑的作用是类似的\footnote{这里,并不是要把统计模型等同于生物学或者认知科学上的人脑,我们指的是他们处理问题时发挥的作用类似。};其次,计算机对统计模型进行训练的过程相当于人学习知识的过程,或者二者都可以称为学习;再有,计算机使用学习到的模型对新的句子进行翻译的过程相当于人运用知识进行翻译的过程。在统计机器翻译中,模型学习的过程称为\textbf{训练},目的是从双语平行数据中自动学习翻译知识;我们把应用模型的过程称为\textbf{解码}\textbf{推断},目的是使用学习的知识对新的句子进行翻译。这就是当前机器实现翻译的两个核心步骤:训练和解码。图\ref{figureC3.4}右侧标注了在机器翻译过程中这两个部分的体现。这样,统计机器翻译的核心由三部分构成 - 建模、训练和解码。本章后续内容会围绕这三个问题展开讨论。
\parinterval 人在翻译时的决策和推断是非常确定并且快速的,但机器翻译处理这个问题却充满了不确定性和概率化的思想。当然它们也有类似的地方。首先,计算机使用统计模型的目的是建立处理翻译问题的基本模式,并储存相关的模型参数,这个和我们大脑的作用是类似的\footnote{这里,并不是要把统计模型等同于生物学或者认知科学上的人脑,我们指的是他们处理问题时发挥的作用类似。};其次,计算机对统计模型进行训练的过程相当于人学习知识的过程,或者二者都可以称为学习;再有,计算机使用学习到的模型对新的句子进行翻译的过程相当于人运用知识进行翻译的过程。在统计机器翻译中,模型学习的过程称为\textbf{训练},目的是从双语平行数据中自动学习翻译知识;我们把应用模型的过程称为\textbf{解码}或\textbf{推断},目的是使用学习到的知识对新的句子进行翻译。这就是当前机器实现翻译的两个核心步骤:训练和解码。图\ref{fig:process-of-machine-translation}右侧标注了在机器翻译过程中这两个部分的体现。这样,统计机器翻译的核心由三部分构成:建模、训练和解码。本章后续内容会围绕这三个问题展开讨论。
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\subsection{基本框架}\index{Chapter3.2.2}
......@@ -154,6 +154,8 @@
% 表
\begin{table}[htp]
\centering
\caption{汉译英单词翻译概率}
\label{tab:word-translation-examples}
\begin{tabular}{l | l | l}
源语言 & 目标语言 & 翻译概率 \\ \hline
& I & 0.50 \\
......@@ -163,8 +165,7 @@
& am & 0.10 \\
... & ... & ... \\
\end{tabular}
\caption{汉译英单词翻译概率}
\label{tab:word-translation-examples}
\end{table}
%---------------------------
......@@ -255,7 +256,7 @@
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\subsection{句子级翻译模型}\index{Chapter3.2.4}
\label{sec:sentence-level-translation}
\parinterval 在获得单词翻译概率的基础上,本节继续介绍如何获取句子级翻译概率。如图\ref{fig:role-of-P(t|s)-in-sentence-level-translation}所示,条件概率$\textrm{P}(t|s)$表示给出源语言句子$s$的情况下译文为$t$的概率。这也是整个句子级翻译模型的核心,它处于整个统计机器翻译流程的中心,一方面我们需要从数据中学习这个模型的参数,另一方面,对于新输入的句子,我们需要使用这个模型得到最佳的译文。下面介绍句子级翻译的建模方法。
%----------------------------------------------
% 图3.9
......@@ -291,14 +292,14 @@
\parinterval 回到设计$g(s,t)$的问题上。这里,我们采用``大题小作''的方法,这个技巧在第二章已经进行了充分的介绍。具体来说,直接建模句子之间的对应比较困难,但可以利用词之间的对应来描述句子之间的对应关系。这里,就用到了上一小节所介绍的单词翻译概率。
\parinterval 我们首先引入一个非常重要的概念\ \ \textbf{词对齐},它是统计机器翻译中最基本的概念之一。词对齐描述了并行句对中单词之间的对应关系,它体现了一种观点:本质上句子之间的对应是由词之间的对应表示的。当然,这个观点在神经机器翻译或者其它模型中可能会有不同的理解,但是翻译句子的过程中我们考虑词一级的对应关系是符合我们对语言的认知的。图\ref{figureC3.11}展示了一个句对$s$$t$,单词的右下标数字表示了该词在句中的位置,而虚线表示的是句子$s$$t$中的词对齐关系。比如,``满意''的右下标数字5表示在句子$s$中处于第5个位置,``satisfied''的右下标数字3表示在句子$t$中处于第3个位置,``满意''和``satisfied''之间的虚线表示两个单词之间是对齐的。为方便描述,我们用二元组$(j,i)$来描述词对齐,它表示源语句第$j$个单词对应目标语句第$i$个单词,即单词$s_j$$t_i$对应。通常,也会把$(j,i)$称作一条词对齐连接。图\ref{figureC3.11}中共有5条虚线,表示有5组单词之间的词对齐连接。我们把这些词对齐连接构成的集合称为$A$,即$A={\{(1,1),(2,4),(3,5),(4,2)(5,3)}\}$
\parinterval 我们首先引入一个非常重要的概念\ \ \textbf{词对齐},它是统计机器翻译中最基本的概念之一。词对齐描述了并行句对中单词之间的对应关系,它体现了一种观点:本质上句子之间的对应是由词之间的对应表示的。当然,这个观点在神经机器翻译或者其它模型中可能会有不同的理解,但是翻译句子的过程中我们考虑词一级的对应关系是符合我们对语言的认知的。图\ref{fig:zh-en-translation-sentence-pairs&word-alignment-connection}展示了一个句对$s$和$t$,单词的右下标数字表示了该词在句中的位置,而虚线表示的是句子$s$和$t$中的词对齐关系。比如,``满意''的右下标数字5表示在句子$s$中处于第5个位置,``satisfied''的右下标数字3表示在句子$t$中处于第3个位置,``满意''和``satisfied''之间的虚线表示两个单词之间是对齐的。为方便描述,我们用二元组$(j,i)$来描述词对齐,它表示源语句第$j$个单词对应目标语句第$i$个单词,即单词$s_j$和$t_i$对应。通常,也会把$(j,i)$称作一条词对齐连接。图\ref{fig:zh-en-translation-sentence-pairs&word-alignment-connection}中共有5条虚线,表示有5组单词之间的词对齐连接。我们把这些词对齐连接构成的集合称为$A$,即$A={\{(1,1),(2,4),(3,5),(4,2),(5,3)}\}$。
%----------------------------------------------
% 图3.11
% 图3.10
\begin{figure}[htp]
\centering
\input{./Chapter3/Figures/figure311}
\input{./Chapter3/Figures/figure-zh-en-translation-sentence-pairs&word-alignment-connection}
\caption{汉英互译句对及词对齐连接(蓝色虚线)}
\label{figureC3.11}
\label{fig:zh-en-translation-sentence-pairs&word-alignment-connection}
\end{figure}
%---------------------------
......@@ -309,7 +310,7 @@ g(s,t) = \prod_{(j,i)\in \widehat{A}}\textrm{P}(s_j,t_i)
\label{eqC3.8}
\end{equation}
\noindent其中$g(s,t)$被定义为句子$s$中的单词和句子$t$中的单词的翻译概率的乘积,并且这两个单词之间必须有对齐连接。$\textrm{P}(s_j,t_i)$表示具有对齐链接的源语词$s_j$和目标语词$t_i$的单词翻译概率。以图\ref{figureC3.11}中的句对为例,其中``我''与``I''、``对''与``with''、``你''与``you''\\等相互对应,可以把它们的翻译概率相乘得到$g(s,t)$的计算结果,如下:
\noindent其中$g(s,t)$被定义为句子$s$中的单词和句子$t$中的单词的翻译概率的乘积,并且这两个单词之间必须有对齐连接。$\textrm{P}(s_j,t_i)$表示具有对齐链接的源语词$s_j$和目标语词$t_i$的单词翻译概率。以图\ref{fig:zh-en-translation-sentence-pairs&word-alignment-connection}中的句对为例,其中``我''与``I''、``对''与``with''、``你''与``you''\\等相互对应,可以把它们的翻译概率相乘得到$g(s,t)$的计算结果,如下:
\begin{eqnarray}
{g(s,t)}&= & \textrm{P}(\textrm{``我'',``I''}) \times \textrm{P}(\textrm{``对'',``with''}) \times \textrm{P}(\textrm{``你'',``you''}) \times \nonumber \\
......@@ -321,15 +322,15 @@ g(s,t) = \prod_{(j,i)\in \widehat{A}}\textrm{P}(s_j,t_i)
\subsubsection{(二)生成流畅的译文}\index{Chapter3.2.4.2}
\parinterval 公式\ref{eqC3.8}定义的$g(s,t)$存在的问题是没有考虑词序信息。我们用一个简单的例子说明这个问题。如图\ref{figureC3.12}所示,源语句``我 对 你 感到 满意''有两个翻译结果,第一个翻译结果是``I am satisfied with you'',第二个是``I with you am satisfied''。虽然这两个译文包含的目标语单词是一样的,但词序存在很大差异。比如,它们都选择了``satisfied''作为源语单词``满意''的译文,但是在第一个翻译结果中``satisfied''处于第3个位置,而第二个结果中处于最后的位置。显然第一个翻译结果更符合英文的表达习惯,翻译的质量更高。可遗憾的,对于有明显差异的两个译文,公式\ref{eqC3.8}计算得到的函数$g(\cdot)$的值却是一样的。
\parinterval 公式\ref{eqC3.8}定义的$g(s,t)$存在的问题是没有考虑词序信息。我们用一个简单的例子说明这个问题。如图\ref{fig:example-translation-alignment}所示,源语句``我 对 你 感到 满意''有两个翻译结果,第一个翻译结果是``I am satisfied with you'',第二个是``I with you am satisfied''。虽然这两个译文包含的目标语单词是一样的,但词序存在很大差异。比如,它们都选择了``satisfied''作为源语单词``满意''的译文,但是在第一个翻译结果中``satisfied''处于第3个位置,而第二个结果中处于最后的位置。显然第一个翻译结果更符合英文的表达习惯,翻译的质量更高。可遗憾的是,对于有明显差异的两个译文,公式\ref{eqC3.8}计算得到的函数$g(\cdot)$的值却是一样的。
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%----------------------------------------------
% 图3.12
% 图3.11
\begin{figure}[htp]
\centering
\input{./Chapter3/Figures/figure312}
\input{./Chapter3/Figures/figure-example-translation-alignment}
\caption{同一个源语言句子的两个不同译文及其词对齐(蓝色虚线表示词对齐关系)}
\label{figureC3.12}
\label{fig:example-translation-alignment}
\end{figure}
%---------------------------
......@@ -352,19 +353,20 @@ g(s,t) \equiv \prod_{j,i \in \widehat{A}}{\textrm{P}(s_j,t_i)} \times \textrm{P
\label{eqC3.13}
\end{equation}
\parinterval 如图\ref{figureC3.14}所示,语言模型$\textrm{P}_{lm}(t)$给分别$t^{'}$$t^{''}$赋予0.0107和0.0009的概率,这表明句子$t^{'}$更符合英文的表达。这与我们的期望是相吻合的。它们再分别乘以$\prod_{j,i \in \widehat{A}}{\textrm{P}(s_j,t_i)}$的值,就得到公式\ref{eqC3.13}定义的函数$g(\cdot)$的值。显然句子$t^{'}$的分数更高。同时它也是我们希望得到的翻译结果。至此,我们完成了对函数$g(s,t)$的一个简单定义,把它带入公式\ref{eqC3.7}就得到了同时考虑准确性和流畅性的句子级统计翻译模型。
\parinterval 如图\ref{fig:scores-of-different-translation_model&language_model}所示,语言模型$\textrm{P}_{lm}(t)$分别给$t^{'}$和$t^{''}$赋予0.0107和0.0009的概率,这表明句子$t^{'}$更符合英文的表达。这与我们的期望是相吻合的。它们再分别乘以$\prod_{j,i \in \widehat{A}}{\textrm{P}(s_j,t_i)}$的值,就得到公式\ref{eqC3.13}定义的函数$g(\cdot)$的值。显然句子$t^{'}$的分数更高。同时它也是我们希望得到的翻译结果。至此,我们完成了对函数$g(s,t)$的一个简单定义,把它代入公式\ref{eqC3.7}就得到了同时考虑准确性和流畅性的句子级统计翻译模型。
%----------------------------------------------
% 图3.14
% 图3.12
\begin{figure}[htp]
\centering
\input{./Chapter3/Figures/figure314}
\input{./Chapter3/Figures/figure-scores-of-different-translation_model&language_model}
\caption{不同译文的语言模型得分和翻译模型得分}
\label{figureC3.14}
\label{fig:scores-of-different-translation_model&language_model}
\end{figure}
%---------------------------
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\subsection{解码}\index{Chapter3.2.5}
\label{sec:simple-decoding}
\parinterval \textbf{解码}是指,在得到翻译模型后,对于新输入的句子生成最佳译文的过程。具体来说,当给定任意的源语句$s$,解码系统要找到翻译概率最大的目标语译文$\hat{t}$。这个过程可以被形式化描述为:
......@@ -411,17 +413,17 @@ $m$ & $n$ & $n^m \cdot m!$ \\ \hline
\parinterval 对于如此巨大的搜索空间,我们需要一种十分有效的搜索算法才能实现机器翻译的解码。这里介绍一种贪婪的解码算法,把解码分成若干步骤,每步只翻译一个单词,并保留当前``最好''的结果,直至所有源语言单词都被翻译完毕。
\parinterval\ref{figureC3.18}给出了贪婪解码算法的伪代码。其中$\pi$保存所有源语单词的候选译文,$\pi[j]$表示第$j$个源语单词的候选翻译集合,$best$保存当前最好翻译结果,$h$保存当前步生成的所有译文候选。算法的主体有两层循环,在内层循环中如果第$j$个源语单词没有被翻译过,则用$best$和它的候选译文$\pi[j]$生成新的翻译,再存于$h$中,即操作$h=h\cup{\textrm{JOIN}(best,\pi[j])}$。外层循环再从$h$中选择得分最好的结果存于$best$中,即操作$best=\textrm{PruneForTop1}(h)$,并标识相应的源语单词已翻译,即$used[best.j]=true$。该算法的核心在于,我们一直维护一个当前最好的结果,之后每一步考虑扩展这个结果的所有可能,并计算模型得分,然后再保留扩展后的最好结果。注意,在每一步中,只有排名第一的结果才会被保留,其它结果都会被丢弃。这也体现了贪婪的思想。显然这个方法不能保证搜索到全局最优的结果,但是由于每次扩展只考虑一个最好的结果,因此该方法速度很快。图\ref{figureC3.18-2}给出了算法执行过程的简单示例。当然,机器翻译的解码方法有很多,这里我们仅仅使用简单的贪婪搜索方法来解决机器翻译的解码问题,在后续章节会对更加优秀的解码方法进行介绍。
\parinterval\ref{fig:greedy-MT-decoding}给出了贪婪解码算法的伪代码。其中$\pi$保存所有源语单词的候选译文,$\pi[j]$表示第$j$个源语单词的候选翻译集合,$best$保存当前最好翻译结果,$h$保存当前步生成的所有译文候选。算法的主体有两层循环,在内层循环中如果第$j$个源语单词没有被翻译过,则用$best$和它的候选译文$\pi[j]$生成新的翻译,再存于$h$中,即操作$h=h\cup{\textrm{JOIN}(best,\pi[j])}$。外层循环再从$h$中选择得分最好的结果存于$best$中,即操作$best=\textrm{PruneForTop1}(h)$,并标识相应的源语单词已翻译,即$used[best.j]=true$。该算法的核心在于,我们一直维护一个当前最好的结果,之后每一步考虑扩展这个结果的所有可能,并计算模型得分,然后再保留扩展后的最好结果。注意,在每一步中,只有排名第一的结果才会被保留,其它结果都会被丢弃。这也体现了贪婪的思想。显然这个方法不能保证搜索到全局最优的结果,但是由于每次扩展只考虑一个最好的结果,因此该方法速度很快。图\ref{fig:greedy-MT-decoding}给出了算法执行过程的简单示例。当然,机器翻译的解码方法有很多,这里我们仅仅使用简单的贪婪搜索方法来解决机器翻译的解码问题,在后续章节会对更加优秀的解码方法进行介绍。
%----------------------------------------------
% 图3.17
% 图3.13
\begin{figure}[htp]
\centering
\input{./Chapter3/Figures/figure318-1}
\input{./Chapter3/Figures/figure318-2}
\input{./Chapter3/Figures/figure-greedy-MT-decoding-1}
\input{./Chapter3/Figures/figure-greedy-MT-decoding-2}
\caption{贪婪的机器翻译解码过程实例}
\label{figureC3.18}
\label{fig:greedy-MT-decoding}
\end{figure}
%---------------------------
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
......@@ -431,28 +433,28 @@ $m$ & $n$ & $n^m \cdot m!$ \\ \hline
\subsection{噪声信道模型}\index{Chapter3.3.1}
\parinterval 首先分析下统计机器翻译或IBM模型是从什么角度去完成翻译的。人在做翻译时比较简单。对于给定的源语句$s$,不会尝试太多的可能性,而是快速地翻译一个或者若干个正确的译文$\widehat{t}$。因此在人看来除了正确的译文外,其他的翻译都是不正确的。而统计机器翻译更多地强调可能性大小,即所有的译文都是可能的。换句话说,对于源语句$s$,所有可能的目标语词串$t$都是可能的译文,只是可能性大小不同。即每对$(s,t)$都有一个概率值$\textrm{P}(t|s)$来描述$s$翻译为$t$的好与坏。如图\ref{figureC3.19}所示。
\parinterval 首先分析下统计机器翻译或IBM模型是从什么角度去完成翻译的。人在做翻译时比较简单。对于给定的源语句$s$,不会尝试太多的可能性,而是快速地翻译一个或者若干个正确的译文$\widehat{t}$。因此在人看来除了正确的译文外,其他的翻译都是不正确的。而统计机器翻译更多地强调可能性大小,即所有的译文都是可能的。换句话说,对于源语句$s$,所有可能的目标语词串$t$都是可能的译文,只是可能性大小不同。即每对$(s,t)$都有一个概率值$\textrm{P}(t|s)$来描述$s$翻译为$t$的好与坏。如图\ref{fig:different-translation-candidate-space}所示。
%----------------------------------------------
% 图3.19
% 图3.14
\begin{figure}[htp]
\centering
\input{./Chapter3/Figures/figure319}
\input{./Chapter3/Figures/figure-different-translation-candidate-space}
\caption{不同翻译候选空间的对比:人(左)vs 机器翻译 (右)}
\label{figureC3.19}
\label{fig:different-translation-candidate-space}
\end{figure}
%---------------------------
\parinterval IBM模型也是建立在如上统计模型之上。具体来说,IBM模型的基础是\textbf{噪声信道模型}(Noise Channel Model),它是由香农在上世纪40年代末提出来的({\red 参考文献?}),并于上世纪80年代应用在语言识别领域,后来又被Brown等人用于统计机器翻译中({\red 参考文献?2篇 Brown等人还有一篇})
\parinterval IBM模型也是建立在如上统计模型之上。具体来说,IBM模型的基础是\textbf{噪声信道模型}(Noise Channel Model),它是由香农在上世纪40年代末提出来的\cite{shannon1949communication},并于上世纪80年代应用在语音识别领域,后来又被Brown等人用于统计机器翻译中\cite{brown1990statistical}。
\parinterval 在噪声信道模型中,源语言句子$s$(信宿)被看作是由目标语言句子$t$(信源)经过一个有噪声的信道得到的。如果知道了$s$和信道的性质,我们可以通过$\textrm{P}(t|s)$得到信源的信息,这个过程如图\ref{figureC3.20}所示。
\parinterval 在噪声信道模型中,源语言句子$s$(信宿)被看作是由目标语言句子$t$(信源)经过一个有噪声的信道得到的。如果知道了$s$和信道的性质,我们可以通过$\textrm{P}(t|s)$得到信源的信息,这个过程如图\ref{fig:noise-channel-model}所示。
%----------------------------------------------
% 图3.20
\begin{figure}[htp]
\centering
\input{./Chapter3/Figures/figure320}
\input{./Chapter3/Figures/figure-noise-channel-model}
\caption{噪声信道模型,其中$s$表示信宿,$t$表示信源}
\label{figureC3.20}
\label{fig:noise-channel-model}
\end{figure}
%---------------------------
......@@ -463,7 +465,7 @@ $m$ & $n$ & $n^m \cdot m!$ \\ \hline
\label{eqC3.18}
\end{equation}
\noindent\hspace{2em}公式\ref{eqC3.18}的核心内容之一是定义$\textrm{P}(t|s)$。在IBM模型中,我们使用贝叶斯准则对$\textrm{P}(t|s)$进行如下变换:
\parinterval 公式\ref{eqC3.18}的核心内容之一是定义$\textrm{P}(t|s)$。在IBM模型中,我们使用贝叶斯准则对$\textrm{P}(t|s)$进行如下变换:
\begin{eqnarray}
\textrm{P}(t|s) & = &\frac{\textrm{P}(s,t)}{\textrm{P}(s)} \nonumber \\
......@@ -471,31 +473,34 @@ $m$ & $n$ & $n^m \cdot m!$ \\ \hline
\label{eqC3.19}
\end{eqnarray}
\noindent\hspace{2em}公式\ref{eqC3.19}可以分为三部分。第一部分是$\textrm{P}(s|t)$,也称为翻译模型。它表示给定目标语句$t$生成源语句$s$的概率,需要注意翻译的方向已经从$\textrm{P}(s|t)$转向了$\textrm{P}(t|s)$,但无须刻意的区分,可以简单地理解为翻译模型刻画了$s$$t$的翻译对应程度。第二部分是$\textrm{P}(t)$,也称为语言模型。它表示的是目标语句$t$的通顺程度。第三部分是$\textrm{P}(s)$,也是语言模型,但刻画的是源语句$s$的通顺程度,在后续的建模中这一项是可以被化简的。
\parinterval 公式\ref{eqC3.19}将$s$到$t$的翻译概率转化为$\frac{\textrm{P}(s|t)\textrm{P}(t)}{\textrm{P}(s)}$,包括三个部分:第一部分是由译文$t$到源语言句子$s$的翻译概率$\textrm{P}(s|t)$,也被称为翻译模型。它表示给定目标语句$t$生成源语句$s$的概率,需要注意的是翻译的方向已经从$\textrm{P}(t|s)$转向了$\textrm{P}(s|t)$,但无须刻意地区分,可以简单地理解为翻译模型刻画了$s$和$t$的翻译对应程度;第二部分是$\textrm{P}(t)$,也被称为语言模型。它表示的是目标语言句子$t$的流畅度;第三部分是$\textrm{P}(s)$,表示源语言句子$s$出现的可能性。因为$s$是输入的不变量,而且$\textrm{P}(s) > 0$,所以省略分母部分$\textrm{P}(s)$不会影响$\frac{\textrm{P}(s|t)\textrm{P}(t)}{\textrm{P}(s)}$最大值的求解。于是,机器翻译的目标可以被重新定义为:给定源语句子$s$,寻找这样的目标语译文$t$,它使得翻译模型$\textrm{P}(s|t)$和语言模型$\textrm{P}(t)$乘积最大:
\noindent\hspace{2em}因此我们可以将翻译问题重新表示为公式\ref{eqC3.20}。其中因为$s$的变化不会影响对目标译文$t$的选择,所以可以省略$\textrm{P}(s)$。该式可以理解为:给定源语句子$s$,要寻找这样的目标语译文$t$,它使得翻译模型$\textrm{P}(s|t)$和语言模型$\textrm{P}(t)$乘积最大。
\begin{equation}
\begin{split}
{\widehat{t}}&={\argmax_t\textrm{P}(t|s)}\\
&={\argmax_t \frac{\textrm{P}(t|s)\textrm{P}(t)}{\textrm{P}(s)}=\argmax_t\textrm{P}(s|t)\textrm{P}(t) }
\begin{eqnarray}
\hat{t} & = & \argmax_t \textrm{P}(t|s) \nonumber \\
& = & \argmax_t \frac{\textrm{P}(s|t)\textrm{P(t)}}{\textrm{P}(s)} \nonumber \\
& = & \argmax_t \textrm{P}(s|t)\textrm{P}(t)
\label{eqC3.20}
\end{split}
\end{equation}
\end{eqnarray}
\noindent\hspace{2em}公式\ref{eqC3.20}是IBM模型最基础的建模方式,它把问题分解为两项:翻译模型和语言模型。这样的做法隐含着一个深刻的概念:如果没有语言模型,并且翻译模型不够强大的话,可能会生成局部翻译得当,但整体上不通顺的句子。见图\ref{figureC3.12}描述的例子。为了避免这个问题,从数学技巧上把$\textrm{P}$加了进来,但这并不是必要的过程
\parinterval 公式\ref{eqC3.20}展示了IBM模型最基础的建模方式,它把模型分解为两项:(反向)翻译模型$\textrm{P}(s|t)$和语言模型$\textrm{P}(t)$。一个很自然的问题是:直接用$\textrm{P}(t|s)$定义翻译问题不就可以了吗,干嘛用$\textrm{P}(s|t)$和$\textrm{P}(t)$的联合模型?从理论上来说,正向翻译模型$\textrm{P}(t|s)$和反向翻译模型$\textrm{P}(s|t)$的数学建模可以是一样的,因为我们只需要在建模的过程中把两个语言调换即可。使用$\textrm{P}(s|t)$和$\textrm{P}(t)$的联合模型的意义在于引入了语言模型,它可以很好地对译文的流畅度进行评价,确保结果是通顺的目标语言句子。可以回忆一下\ref{sec:sentence-level-translation}节中讨论的问题,如果只使用翻译模型可能会造成一个局面:译文的单词都和源语言单词对应得很好,但是由于语序的问题,读起来却不像人说的话。从这个角度说,引入语言模型是十分必要的。这个问题在Brown等人的论文中也有讨论\cite{brown1990statistical},他们提到单纯使用$\textrm{P}(t|s)$会把概率分配给一些翻译对应比较好但是不合法的目标语句子,而且这部分概率可能会很大,影响模型的决策。这也正体现了IBM模型的创新之处,作者用数学技巧把$\textrm{P}(t)$引入进来,保证了系统的输出是通顺的译文。语言模型也被广泛使用在语音识别等领域以保证结果的流畅性,甚至应用的历史比机器翻译要长得多,这里的方法也有借鉴相关工作的味道。
\noindent\hspace{2em}上述就是IBM模型的建模思想。其中的过程就是为了引入$\textrm{P}(t)$,而非简单地为了使用贝叶斯变换。
\subsection{建模}\index{Chapter3.3.2}
实际上,在机器翻译中引入语言模型是一个很深刻的概念。在IBM模型之后相当长的时间里,语言模型一直是机器翻译各个部件中最重要的部分。即使现在机器翻译的模型已经更新换代,对译文连贯性的建模也是所有系统中需要包含的内容(即便是以隐性的方式体现)。
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\noindent\hspace{2em}IBM模型把翻译问题描述为:给定源语句$s$,在所有可能的译文中找到使翻译模型$\textrm{P}(s|t)$和语言模型$\textrm{P}(t)$乘积最大的译文$\widehat{t}$,如公式\ref{eqC3.20}所示。在具体解决翻译问题时,需要面临三个基本问题,如下所示。
\noindent\hspace{2em}1. 建模:如何描述计算$\textrm{P}(s|t)$$\textrm{P}(t)$的计算方式。换句话说,如何用可计算的方式把概率描述出来。这也是最核心的问题。
\subsection{建模}\index{Chapter3.3.2}
\noindent\hspace{2em}2. 训练:如何获得计算$\textrm{P}(s|t)$$\textrm{P}(t)$所需的参数。即如何从数据中得到得到模型的最优参数。
\parinterval 公式\ref{eqC3.20}给出了统计机器翻译问题的数学描述。为了实现这个过程,面临着三个基本问题:
\noindent\hspace{2em}3. 解码:如何完成搜索最优解的过程$argmax$
\begin{itemize}
\vspace{0.5em}
\item \textbf{建模}(modeling):如何建立$\textrm{P}(s|t)$$\textrm{P}(t)$的数学模型。换句话说,需要用可计算的方式把翻译问题进行描述,这也是最核心的问题。
\vspace{0.5em}
\item \textbf{训练}(training):如何获得$\textrm{P}(s|t)$和$\textrm{P}(t)$所需的参数。即从数据中得到模型的最优参数。
\vspace{0.5em}
\item \textbf{解码}(decoding):如何完成搜索最优解的过程。即完成$\argmax$
\vspace{0.5em}
\end{itemize}
\noindent\hspace{2em}我们先不介绍上述三个问题该如何解决,而与\ref{chapter3.2.3}小节中的公式\ref{eqC3.13}做比较,即$g(s,t)$函数。如图\ref{figureC3.21}所示,我们看到$g(s,t)$函数可以与本节的建模方式相对应。即$g(s,t)$函数中红色部分求译文$t$的可能性大小,对应翻译模型$\textrm{P}(s|t)$;蓝色部分求译文的平滑或流畅程度,对应语言模型$\textrm{P}(t)$。尽管这种对应并不是严格的,但也间接地完成了翻译问题的建模
\parinterval 为了理解以上的问题,可以先回忆一下\ref{chapter3.2.3}小节中的公式\ref{eqC3.13},即$g(s,t)$函数的定义,它用于评估一个译文的好与坏。如图\ref{figureC3.21}所示,$g(s,t)$函数与公式\ref{eqC3.20}的建模方式非常一致,即$g(s,t)$函数中红色部分描述译文$t$的可能性大小,对应翻译模型$\textrm{P}(s|t)$;蓝色部分描述译文的平滑或流畅程度,对应语言模型$\textrm{P}(t)$。尽管这种对应并不十分严格,但也可以看出在处理机器翻译问题上,很多想法的本质是一样的。
%----------------------------------------------
% 图3.21
\begin{figure}[htp]
......@@ -506,206 +511,216 @@ $m$ & $n$ & $n^m \cdot m!$ \\ \hline
\end{figure}
%---------------------------
\noindent\hspace{2em}$g(s,t)$函数对翻译问题的建模很粗糙。因此下面我们将介绍IBM模型中更严谨和科学的定义与建模。对于语言模型$\textrm{P}(t)$和解码过程(即$argmax$)在前面的内容中都有介绍,所以重点介绍如何求解$\textrm{P}(s|t)$。主要包括两个问题:第一、翻译模型建模,即$\textrm{P}(s|t)$的计算方法;第二、翻译模型参数估计,即计算$\textrm{P}(s|t)$所需的参数。本节主要回答第一个问题,第二个问题留在后面进行介绍。
\parinterval 当然,用$g(s,t)$函数对翻译问题建模的方式还很粗糙,因此下面我们将介绍IBM模型中对问题更严谨的定义与建模。对于语言模型$\textrm{P}(t)$和解码过程,前面的内容中都已有介绍,所以本章的后半部分会重点介绍两个问题:如何定义$\textrm{P}(s|t)$模型,以及如何训练模型参数。本节主要回答第一个问题,第二个问题留在后面进行介绍。
\noindent\textbf{词对齐}\index{Chapter3.3.2.1}
\subsubsection{(一)词对齐}\index{Chapter3.3.2.1}
\noindent\hspace{2em}IBM模型中有一个非常基础的假设—词对齐(或称单词对齐)。词对齐描述了句子和它的译文之间在单词级别的对应。具体地说,给定源语句$s$和目标语译文$t$,并且$s$$s_1$$s_m$$m$个单词组成,$t$$t_1$$t_n$共n个单词组成。IBM模型假设词对齐满足下述两个条件。
\parinterval IBM模型中有一个非常基础的假设是词对齐假设(或称单词对齐假设)。\textbf{词对齐}(word alignment)描述了源语言句子和目标语句子之间单词级别的对应。具体地说,给定源语句子$s$和目标语译文$t$,其中$s$$s_1$$s_m$$m$个单词组成,$t$$t_1$$t_n$$n$个单词组成。IBM模型假设词对齐满足下述两个条件。
\noindent\hspace{2em}第一、一个源语言单词只能对应一个目标语单词。图\ref{figureC3.22}表示的例子中,(a)和\\(c)都满足该条件,尽管(c)中的“谢谢”和“你”都对应“thanks”,但并不违背该规则。而(b)不满足该条件,因为“谢谢”同时对应到了两个目标语单词上。因此这种词对齐称为非对称的词对齐。这样假设的目的也是为了减少建模的复杂度。在后来的方法中也提出了双向词对齐,用于建模一个源语言单词对应到多个目标语单词的情况。
\begin{itemize}
\vspace{0.5em}
\item 一个源语言单词只能对应一个目标语单词。在图\ref{fig:different-alignment-comparison}表示的例子中,(a)和(c)都满足该条件,尽管(c)中的“谢谢”和“你”都对应“thanks”,但并不违背条件。而(b)不满足条件,因为“谢谢”同时对应到了两个目标语单词上。这个约束条件也导致这里的词对齐变成一种\textbf{非对称的词对齐},因为它只对源语言做了约束,但是目标语言没有。使用这样的约束的目的是为了减少建模的复杂度。在后来的方法中也提出了双向词对齐,用于建模一个源语言单词对应到多个目标语单词的情况。
%----------------------------------------------
% 图3.21
% 图3.17
\begin{figure}[htp]
\centering
\input{./Chapter3/Figures/figure322}
\caption{此处为图片的描述...}
\label{figureC3.22}
\input{./Chapter3/Figures/figure-different-alignment-comparison}
\caption{不同词对齐对比}
\label{fig:different-alignment-comparison}
\end{figure}
%---------------------------
\noindent\hspace{2em}第二、源语言单词可以翻译为空,这时它对应到一个虚拟或伪造的目标语单词$t_0$。图\ref{figureC3.23}表示的例子中,“在”没有对应到“on the table”中的任意词,而是把它对应到$t_0$上。此时所有的源语言单词都能找到一个目标语单词对应,只不过有的单词对应到 上。这个条件或规则的提出主要建模对空翻译,即源语言单词对应第0个目标语单词$t_0$的情况。
\vspace{0.5em}
\item 源语言单词可以翻译为空,这时它对应到一个虚拟或伪造的目标语单词$t_0$。在图\ref{fig:alignment-of-empty-translation}表示的例子中,``在''没有对应到``on the table''中的任意一个词,而是把它对应到$t_0$上。这个条件保证了所有的源语言单词都能找到一个目标语单词对应。这个条件也很好地引入了\textbf{空对齐}的思想,即源语言单词不对应任何真实存在的单词的情况。而这种空对齐的情况在翻译中是频繁出现的,比如虚词的翻译。
%----------------------------------------------
% 图3.21
% 图3.18
\begin{figure}[htp]
\centering
\input{./Chapter3/Figures/figure323}
\caption{此处为图片的描述...}
\label{figureC3.23}
\input{./Chapter3/Figures/figure-alignment-of-empty-translation}
\caption{空翻译的对齐(``在''对应到$t_0$}
\label{fig:alignment-of-empty-translation}
\end{figure}
%---------------------------
\vspace{0.5em}
\end{itemize}
\noindent\hspace{2em}那如何描述词对齐呢?给定源语句子$s$、目标译文$t$和词对齐$a$。其中$a_1$是由$a_m$\\到 共$m$个项依次组成,即$a=a_1...a_m$$a_j$表示第$j$个源语单词$s_j$对应的目标语单词的位置。如图\ref{figureC3.24}所示,实线表示的是“在 桌子 上”和“on the table”单词之间的对应。该对应关系记为$a_1=0$,$a_2=3$,$a_3=1$。它表示第1个源语单词“在”对应到目标语译文的第0个位置,第2个源语单词“桌子”对应在目标语译文的位置是3,第3个源语单词“上”对应在目标语译文的位置是1
\parinterval 通常,我们把词对齐记为$a$,它由$a_1$到$a_m$共$m$个词对齐连接组成,即$a=a_1...a_m$,其中$m$表示源语言句子的长度,$a_j$表示第$j$个源语单词所对应的目标语单词的位置。如图\ref{fig:word-alignment-instance}所示,实线表示的是``在 桌子 上''和``on the table''之间的词对齐。具体的词对齐关系可以记为$a_1=0, a_2=3, a_3=1$,它表示第1个源语单词``在''对应到目标语译文的第0个位置,第2个源语单词``桌子''对应到目标语译文的第3个位置,第3个源语单词``上''对应到目标语译文的第1个位置。
%----------------------------------------------
% 图3.21
% 图3.19
\begin{figure}[htp]
\centering
\input{./Chapter3/Figures/figure324}
\caption{此处为图片的描述...}
\label{figureC3.24}
\input{./Chapter3/Figures/figure-word-alignment-instance}
\caption{词对齐实例}
\label{fig:word-alignment-instance}
\end{figure}
%---------------------------
\noindent\textbf{建模翻译模型}\index{Chapter3.3.2.2}
\subsubsection{(二)基于词对齐的翻译模型}\index{Chapter3.3.2.2}
\parinterval 直接估计$\textrm{P}(s|t)$很难,因为大部分句子即使在大规模的语料中也只出现过一次甚至没有出现过。为了解决这个问题,IBM模型的建模思想是:句子之间的对应可以由单词之间的对应进行表示。更具体的说,把句子之间对应的概率转换为所有可能的词对齐的生成概率,如下:
\noindent\hspace{2em}直接对句子的翻译概率分布$\textrm{P}(s|t)$进行建模很难,因为大部分句子即使在大规模的语料中也只出现过一次。为了解决这个问题,IBM模型提出的第一个建模思想:句子之间的对应可以由单词之间的对应进行表示。更具体的说,把句子之间对应的概率转换为所有可能的词对齐的生成概率,如公式\ref{eqC3.21}所示。
\begin{equation}
\textrm{P}(s|t)=\sum_a\textrm{P}(s,a|t)
\label{eqC3.21}
\end{equation}
\noindent\hspace{2em}换句话说,公式\ref{eqC3.21}在求解$t$$s$的翻译概率或者$s$$t$的互译概率时,枚举$s$$t$之间所有可能的单词对齐,并把对应的对齐概率进行求和,得到了$t$$s$的翻译概率
\parinterval 公式\ref{eqC3.21}使用了简单的全概率公式把$\textrm{P}(s|t)$进行展开。通过访问$s$$t$之间所有可能的词对齐$a$,并把对应的对齐概率进行求和,得到了$t$$s$的翻译概率。这里,可以把词对齐看作翻译的隐含变量,这样从$t$$s$的生成就变为从$t$同时生成$s$和隐含变量$a$的问题。引入隐含变量是生成式模型常用的手段,通过使用隐含变量,可以把较为困难的端到端学习问题转化为分步学习问题
\noindent\hspace{2em}举个例子说明公式\ref{eqC3.21}。如图\ref{figureC3.25}所示,表示把求“谢谢 你”到“thank you”的翻译概率分解为9种可能的词对齐对应的概率的加和。我们用$t$$s$分别表示“谢谢 你”和“thank you”。为什么是9种词对齐呢?$s$加上空标记共3个词,而$t$仅有2个词,并且都有可能对应到$s$中任意词,所以共有$3\times3=9$种可能
\parinterval 举个例子说明公式\ref{eqC3.21}的实际意义。如图\ref{fig:alignment-of-all-words-in-zh-en-sentence}所示,可以把从``谢谢 你''到``thank you''的翻译分解为9种可能的词对齐。因为源语言句子$s$有2个词,目标语言句子$t$加上空标记$t_0$共3个词,因此每个源语言单词有3个可能对齐的位置,整个句子共有$3\times3=9$种可能的词对齐
%----------------------------------------------
% 图3.21
% 图3.20
\begin{figure}[htp]
\centering
\input{./Chapter3/Figures/figure325}
\caption{此处为图片的描述...}
\label{figureC3.25}
\input{./Chapter3/Figures/figure-alignment-of-all-words-in-zh-en-sentence}
\caption{一个汉译英句对的所有词对齐可能}
\label{fig:alignment-of-all-words-in-zh-en-sentence}
\end{figure}
%---------------------------
\noindent\hspace{2em}在求解词对齐对应的概率 时,我们也可以用公式\ref{eqC3.13}定义的$g(s,t)$函数,但IBM模型用生成模型更加深刻更加细致地定义了它的计算,如公式\ref{eqC3.22}所示。其中$s$$a$$t$分别代表源语句子、目标语译文和词对齐;$s_j$$a_j$分别表示第$j$个源语单词及其对齐;$s_1^{j-1}$$a_1^{j}$表示第$j-1$个源语单词和第$j$个源语单词的对齐;$m$表示源语句子的长度。
\parinterval 接下来的问题是如何定义$\textrm{P}(s,a|t)$,即如何定义词对齐的生成概率。但是,隐含变量$a$仍然很复杂,因此直接定义$\textrm{P}(s,a|t)$也很困难。在IBM模型中,为了化简问题,$\textrm{P}(s,a|t)$被进一步分解。使用链式法则,可以得到:
%%%%%%%%%%%%%%%%%%%%%%%%%
\begin{equation}
\textrm{P}(s,a|t)=\textrm{P}(m|t)\prod_{j=1}^{m}{\textrm{P}(a_j|a_1^{j-1},s_1^{j-1},m,t)\textrm{P}(s_j|a_1^{j},s_1^{j-1},m,t)}
\label{eqC3.22}
\end{equation}
\noindent\hspace{2em}生成模型不像端到端的深度学习,而是把生成数据的过程分解成若干步,通过概率分布对每一步进行建模,然后把它们组合成联合概率。如图\ref{figureC3.26}所示,我们将公式\ref{eqC3.22}分解为四个部分,并用不同的序号和颜色进行表示。下面我们介绍下每一部分的含义。
\noindent\hspace{2em}1. 根据译文$t$选择源文$s$的长度$m$,即估计概率分布$\textrm{P}(m|t)$
\noindent\hspace{2em}2. 当确定源文长度$m$后,循环每个位置$j$逐次生成单词。
\noindent 其中$s_j$和$a_j$分别表示第$j$个源语言单词及第$j$个源语言单词所对应的目标语位置,$s_1^{j-1}$表示前$j-1$个源语言单词,$a_1^{j}$表示前$j$个词对齐连接,$m$表示源语句子的长度。公式\ref{eqC3.22}可以进一步被分解为四个部分,具体含义如下:
\noindent\hspace{2em}3. 根据译文$t$、源文长度$m$、已经生成的源语单词$s_1^{j-1}$和对齐$a_1^{j-1}$,生成第$j$个位置的对齐结果$a_j$,用概率分布$\textrm{P}(a_j|a_1^{j-1},s_1^{j-1},m,t)$表示。
\begin{itemize}
\vspace{0.5em}
\item 根据译文$t$选择源文$s$的长度$m$,用$\textrm{P}(m|t)$表示;
\vspace{0.5em}
\item 当确定源文长度$m$后,循环每个位置$j$逐次生成每个源语言单词$s_j$,也就是$\prod_{j=1}^m$计算的内容;
\vspace{0.5em}
\item 根据译文$t$、源文长度$m$、已经生成的源语单词$s_1^{j-1}$和对齐$a_1^{j-1}$,生成第$j$个位置的对齐结果$a_j$,用$\textrm{P}(a_j|a_1^{j-1},s_1^{j-1},m,t)$表示;
\vspace{0.5em}
\item 根据译文$t$、源文长度$m$、已经生成的源语单词$s_1^{j-1}$和对齐$a_1^j$,生成第$j$个位置的源语言单词$s_j$,用$\textrm{P}(s_j|a_1^{j},s_1^{j-1},m,t)$表示。
\vspace{0.5em}
\end{itemize}
\noindent\hspace{2em}4. 根据译文$t$、源文长度$m$、已经生成的源语单词$s_1^{j-1}$和对齐$a_1^j$,生成第$j$个位置的源语言单词$s_j$,即$\textrm{P}(s_j|a_1^{j},s_1^{j-1},m,t)$
%----------------------------------------------
% 图3.26
\begin{figure}[htp]
\centering
\input{./Chapter3/Figures/figure326}
\caption{此处为图片的描述...}
\label{figureC3.26}
\end{figure}
%---------------------------
%\begin{figure}[htp]
% \centering
%\input{./Chapter3/Figures/figure326}
% \caption{{\red 这个图应该和公式3.19合并,因为都是描述的一个事情}}
% \label{figureC3.26}
%\end{figure}
%%---------------------------
\noindent\hspace{2em}换句话说,当我们求概率分布$\textrm{P}(s,a|t)$时,首先根据译文$t$确定源文$s$的长度$m$;当知道源文有多少个单词后,循环$m$次,依次生成第1个到第$m$个源文单词;当生成第$j$个源文单词时,要先确定它是由哪个目标语译文单词生成的,即确定生成的源语单词对应的译文单词;当知道了目标语译文单词的位置或词对齐,就能确定第$j$个位置有哪些候选的源文单词。
\parinterval 换句话说,当我们求概率分布$\textrm{P}(s,a|t)$时,首先根据译文$t$确定源语言句子$s$的长度$m$;当知道源文有多少个单词后,循环$m$次,依次生成第1个到第$m$个源语言单词;当生成第$j$个源语言单词时,要先确定它是由哪个目标语译文单词生成的,即确定生成的源语言单词对应的译文单词的位置;当知道了目标语译文单词的位置,就能确定第$j$个位置的源语言单词。
\noindent\textbf{举例说明}\index{Chapter3.3.2.3}
\parinterval 需要注意的是公式\ref{eqC3.22}定义的模型并没有做任何化简和假设,也就是说公式的左右两端是严格相等的。在后面的内容中会看到,这种将一个整体进行拆分的方法可以有助于分步骤化简并处理问题。
\noindent\hspace{2em}我们用一个简单的例子来说明公式\ref{eqC3.22}。如图3.25所示,源语句子$s$是“在 桌子 上”,目标语译文$t$是“on the table”,以及词对齐$a$等于${1-0,2-3,3-1}$。基于当前的假设,我们套用公式\ref{eqC3.22}$t$生成$s$$a$,即求概率$\textrm{P}(s,a|t)$。求解的过程如下所示。
\subsubsection{(三)基于词对齐的翻译实例}\index{Chapter3.3.2.3}
%----------------------------------------------
% 图3.26
\begin{figure}[htp]
\centering
\input{./Chapter3/Figures/figure327}
\caption{此处为图片的描述...}
\caption{汉译英词对齐实例}
\label{figureC3.27}
\end{figure}
%---------------------------
\noindent\hspace{2em}1. 首先根据译文确定源文$s$的单词数量,可知有3个单词。我们用公式\ref{eqC3.23}表示。
\begin{equation}
\textrm{P}(m=3|'t_0\;on\;the\;table')
\label{eqC3.23}
\end{equation}
\parinterval 我们用一个简单的例子来对公式\ref{eqC3.22}进行进一步说明。如图\ref{figureC3.27}所示,源语言句子``在 桌子 上''与目标语译文``on the table''之间的词对齐为$a=\{1-0,2-3,3-1\}$。基于当前的假设,我们可以套用公式\ref{eqC3.22}计算从$t$生成$s$和$a$的概率,即求$\textrm{P}(s,a|t)$。求解的过程如下所示:
\noindent\hspace{2em}2. 再确定源语单词$s_1$由谁生成的且生成的是什么。由词对齐知道$s_1$由第0个目标语单词生成的,也就是$t_0$。当知道了$s_1$$t_0$生成的,就可以通过$t_0$生成源语第一个单词“在”。我们用公式\ref{eqC3.24}表示。
\begin{equation}
\begin{split}
&{\textrm{P}(a_1\;= 0\;\; |\phi,\phi,3,'t_0\;on\;the\;table')\quad \times}\\
&{\textrm{P}(s_1\;= \textrm{}\;|\{1-0\},\phi,3,'t_0\;on\;the\;table') }
\label{eqC3.24}
\end{split}
\end{equation}
\begin{itemize}
\noindent\hspace{2em}3. 类似于过程2,我们依次确定源语单词$s_2$$s_3$由谁生成且生成的是什么。如公式\ref{eqC3.25}\ref{eqC3.26}所示。
\begin{equation}
\begin{split}
&{\textrm{P}(a_2\;= 3\;\; |\{1-0\},'\textrm{}',3,'t_0\;on\;the\;table')\quad \times}\\
&{\textrm{P}(s_1\;= \textrm{桌子}\;|\{1-0,2-3\},'\textrm{}',3,'t_0\;on\;the\;table') }
\label{eqC3.25}
\end{split}
\end{equation}
\begin{equation}
\begin{split}
&{\textrm{P}(a_3\;= 1\;\; |\{1-0,2-3\},'\textrm{\;桌子}',3,'t_0\;on\;the\;table')\quad \times}\\
&{\textrm{P}(s_1\;= \textrm{}\;|\{1-0,2-3,3-1\},'\textrm{\;桌子}',3,'t_0\;on\;the\;table') }
\label{eqC3.26}
\end{split}
\end{equation}
\vspace{0.5em}
\item 首先根据译文确定源文$s$的单词数量($m=3$),即$\textrm{P}(m=3|\textrm{``}t_0\;\textrm{on\;the\;table''})$
\noindent\hspace{2em}4. 最后将公式\ref{eqC3.23}\ref{eqC3.24}\ref{eqC3.25}\ref{eqC3.26}乘到一起,就得到概率 ,如公式\ref{eqC3.27}所示。
\begin{equation}
\begin{split}
{\textrm{P}(s,a|t)}\; &=\;{\textrm{P}(m|t) \prod\limits_{j=1}^{m} \textrm{P}(a_j|a_{1}^{j-1},s_{1}^{j-1},m,t) \textrm{P}(s_j|a_{1}^{j},s_{1}^{j-1},m,t)}\\
&={\textrm{P}(m=3 \mid \textrm{'$t_0$ on the table'}){\times}}\\
&\quad\;{\textrm{P}(a_1=0 \mid \phi,\phi,3,\textrm{'$t_0$ on the table'}){\times} }\\
&\quad\;{\textrm{P}(f_1=\textrm{} \mid \textrm{\{1-0\}},\phi,3,\textrm{'$t_0$ on the table'}){\times} } \\
&\quad\;{\textrm{P}(a_2=3 \mid \textrm{\{1-0\}},\textrm{'在'},3,\textrm{'$t_0$ on the table'}) {\times}}\\
&\quad\;{\textrm{P}(f_2=\textrm{桌子} \mid \textrm{\{1-0,2-3\}},\textrm{'在'},3,\textrm{'$t_0$ on the table'}) {\times}} \\
&\quad\;{\textrm{P}(a_3=1 \mid \textrm{\{1-0,2-3\}},\textrm{'在 桌子'},3,\textrm{'$t_0$ on the table'}) {\times}}\\
&\quad\;{\textrm{P}(f_3=\textrm{} \mid \textrm{\{1-0,2-3,3-1\}},\textrm{'在 桌子'},3,\textrm{'$t_0$ on the table'}) }
\vspace{0.5em}
\item 再确定源语单词$s_1$由谁生成的且生成的是什么。可以看到$s_1$由第0个目标语单词生成的,也就是$t_0$,表示为$\textrm{P}(a_1\;= 0\;\; |\phi,\phi,3,\textrm{``}t_0\;\textrm{on\;the\;table''})$,其中$\phi$表示空。当知道了$s_1$是由$t_0$生成的,就可以通过$t_0$生成源语第一个单词``在'',即$\textrm{P}(s_1\;= \textrm{}\;|\{1-0\},\phi,3,\textrm{``}t_0\;on\;the\;table\textrm{''}) $
\vspace{0.5em}
\item 类似于生成$s_1$,我们依次确定源语单词$s_2$$s_3$由谁生成且生成的是什么;
\vspace{0.5em}
\end{itemize}
\parinterval 最后得到基于词对齐$a$的翻译概率为:
\begin{eqnarray}
\textrm{P}(s,a|t)\; &= & \textrm{P}(m|t) \prod\limits_{j=1}^{m} \textrm{P}(a_j|a_{1}^{j-1},s_{1}^{j-1},m,t) \textrm{P}(s_j|a_{1}^{j},s_{1}^{j-1},m,t) \nonumber \\
&=&\textrm{P}(m=3 \mid \textrm{``$t_0$ on the table''}){\times} \nonumber \\
&&{\textrm{P}(a_1=0 \mid \phi,\phi,3,\textrm{``$t_0$ on the table''}){\times} } \nonumber \\
&&{\textrm{P}(s_1=\textrm{``在''} \mid \textrm{\{1-0\}},\phi,3,\textrm{``$t_0$ on the table''}){\times} } \nonumber \\
&&{\textrm{P}(a_2=3 \mid \textrm{\{1-0\}},\textrm{``在''},3,\textrm{``$t_0$ on the table''}) {\times}} \nonumber \\
&&{\textrm{P}(s_2=\textrm{``桌子''} \mid \textrm{\{1-0,2-3\}},\textrm{``在''},3,\textrm{``$t_0$ on the table''}) {\times}} \nonumber \\
&&{\textrm{P}(a_3=1 \mid \textrm{\{1-0,2-3\}},\textrm{``在 桌子''},3,\textrm{``$t_0$ on the table''}) {\times}} \nonumber \\
&&{\textrm{P}(s_3=\textrm{``上''} \mid \textrm{\{1-0,2-3,3-1\}},\textrm{``在 桌子''},3,\textrm{``$t_0$ on the table''}) }
\label{eqC3.27}
\end{split}
\end{equation}
\end{eqnarray}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section{IBM模型1-2}\index{Chapter3.4}
\noindent\hspace{2em}回顾公式\ref{eqC3.21}和公式\ref{eqC3.22},我们发现了两个严重的问题。问题一、对于公式(3.20),如何遍历所有的对齐$a$;问题二、对于公式\ref{eqC3.22},如何计算$\textrm{P}(m|t)$$\textrm{P}(a_j|$\\$a_1^{j-1},s_1^{j-1},m,t)$$\textrm{P}(s_j|a_1^{j},s_1^{j-1},m,t)$。Peter E. Brown等人总共提出了5种解决方法。第一个问题可以通过一定的数学技巧进行高效的求解;对于第二个问题,可以通过一些假设进行化简,依据化简的层次和复杂度不同,可以分为IBM模型1、IBM模型2、IBM模型3、IBM模型4以及IBM模型5。本节首先介绍较为简单的IBM模型1-2。
\parinterval 公式\ref{eqC3.21}和公式\ref{eqC3.22}把翻译问题定义为对词对齐和句子同时进行生成的问题。其中有两个问题:首先,公式\ref{eqC3.21}的右端($\sum_a\textrm{P}(s,a|t)$)要求对所有的词对齐概率进行求和,但是词对齐的数量随着句子长度是呈指数增长的,如何遍历所有的对齐$a$?其次,公式\ref{eqC3.22}虽然对词对齐的问题进行了描述,但是模型中的很多参数仍然很复杂,如何计算$\textrm{P}(m|t)$$\textrm{P}(a_j|a_1^{j-1},s_1^{j-1},m,t)$$\textrm{P}(s_j|a_1^{j},s_1^{j-1},m,t)$?针对这些问题,Brown等人总共提出了5种解决方案,这也就是被后人所熟知的IBM翻译模型。第一个问题可以通过一定的数学或者工程技巧进行求解;第二个问题可以通过一些假设进行化简,依据化简的层次和复杂度不同,可以分为IBM模型1、IBM模型2、IBM模型3、IBM模型4以及IBM模型5。本节首先介绍较为简单的IBM模型1-2。
%从此处往下公式+2
%%%%%%%%%%%%%%%%%%%%%%%
\subsection{IBM模型1}\index{Chapter3.4.1}
\noindent\hspace{2em}在IBM模型1中通过一些假设,对公式\ref{eqC3.22}中的三个项进行了简化。
\parinterval IBM模型1对公式\ref{eqC3.22}中的三项进行了简化。具体化简方法如下:
\noindent\hspace{2em}第一、假设$\textrm{P}(m|t)$为常数$\varepsilon$,即源语言的长度是等分布的。如公式\ref{eqC3.28}所示。
\begin{itemize}
\vspace{0.5em}
\item 假设$\textrm{P}(m|t)$为常数$\varepsilon$,即源语言的长度的生成概率服从均匀分布,如下:
\begin{equation}
\textrm{P}(m|t)\; \equiv \; \varepsilon
\label{eqC3.28}
\end{equation}
\noindent\hspace{2em}第二、对齐概率$\textrm{P}(a_j|a_1^{j-1},s_1^{j-1},m,t)$仅依赖于译文长度$l=1$,即假设对齐概率也是均匀分布。换句话说,对于任何$j$到它对齐到目标语句子的任何位置都是等概率的。比如译文为“on the table”,再加上$t_0$共4个位置,相应的源语句子的单词对齐到这4个位置的概率是一样的。
\vspace{0.5em}
\item 对齐概率$\textrm{P}(a_j|a_1^{j-1},s_1^{j-1},m,t)$仅依赖于译文长度$l$,即每个词对齐连接的概率也服从均匀分布。换句话说,对于任意的源语言位置$j$,它对齐到目标语句子任何位置的概率都是相等的。比如译文为``on the table'',再加上$t_0$共4个位置,任意源语单词对齐到这4个位置的概率是一样的。具体描述如下:
\begin{equation}
\textrm{P}(a_j|a_1^{j-1},s_1^{j-1},m,t) \equiv \frac{1}{l+1}
\label{eqC3.29}
\end{equation}
\noindent\hspace{2em}第三、源语单词$s_j$生成概率$\textrm{P}(a_j|a_1^{j-1},s_1^{j-1},m,t)$仅依懒与其对齐的译文单词$t_{a_i}$,即词汇翻译概率$f(s_j|t_{a_i})$。此时词汇翻译概率满足$\sum_{s_j}{f(s_j|t_{a_i})}=1$。比如在图\ref{figureC3.27}表示的例子中,源语单词“上”生成的概率只和与它对齐的“on”有关系,与其他单词没有关系。
\vspace{0.5em}
\item 源语单词$s_j$的生成概率$\textrm{P}(s_j|a_1^{j},s_1^{j-1},m,t)$仅依赖于与其对齐的译文单词$t_{a_j}$,即词汇翻译概率$f(s_j|t_{a_j})$。此时词汇翻译概率满足$\sum_{s_j}{f(s_j|t_{a_j})}=1$。比如在图\ref{figureC3.27}表示的例子中,源语单词``上''出现的概率只和与它对齐的单词``on''有关系,与其它单词没有关系。
\begin{equation}
\textrm{P}(s_j|a_1^{j},s_1^{j-1},m,t) \equiv f(s_j|t_{a_j})
\label{eqC3.30}
\end{equation}
\noindent\hspace{2em}我们用一个简单的例子说明公式\ref{eqC3.30}。如图\ref{figureC3.28}所示,其中“桌子”对齐“table”。可形象化的描述为$f(s_2 |t_(a_2 ))=f(桌子|table)$,表示给定“table”翻译为“桌子”的概率。
我们用一个简单的例子对公式\ref{eqC3.30}进行说明。比如,在图\ref{fig:zh-en-bilingual-sentence-pairs}所示的实例中,``桌子''对齐到``table'',可被描述为$f(s_2 |t_{a_2})=f(\textrm{``桌子''}|\textrm{``table''})$,表示给定``table''翻译为``桌子''的概率。通常,$f(s_2 |t_{a_2})$被认为是一种概率词典,它反映了两种语言在词汇一级的对应程度。
\vspace{0.5em}
\end{itemize}
%----------------------------------------------
% 图3.28
% 图3.22
\begin{figure}[htp]
\centering
\input{./Chapter3/Figures/figure328}
\caption{此处为图片的描述...}
\label{figureC3.28}
\input{./Chapter3/Figures/figure-zh-en-bilingual-sentence-pairs}
\caption{汉译英双语句对及词对齐}
\label{fig:zh-en-bilingual-sentence-pairs}
\end{figure}
%---------------------------
\noindent\hspace{2em}将上述三个假设和公式\ref{eqC3.22}代入公式\ref{eqC3.21}中,得到概率$\textrm{P}(s|t)$的表示式,如公式\ref{eqC3.31}所示。
\begin{equation}
\begin{split}
{\textrm{P}(s|t)}&=\;{\sum_a{\textrm{P}(s,a|t)}}\\
&=\;{\sum_a{\textrm{P}(m|t)}\prod_{j=1}^{m}{\textrm{P}(a_j|a_1^{j-1},s_1^{j-1},m,t)\textrm{P}(s_j |a_1^j,m,t)}}\\
&=\;{\sum_a{\varepsilon}\prod_{j=1}^{m}{\frac{1}{l+1}f(s_j|t_{a_j})}}\\
&=\;{\sum_a{\frac{\varepsilon}{(l+1)^m}}\prod_{j=1}^{m}f(s_j|t_{a_j})}
\parinterval 将上述三个假设和公式\ref{eqC3.22}代入公式\ref{eqC3.21}中,得到$\textrm{P}(s|t)$的表达式:
\begin{eqnarray}
\textrm{P}(s|t) & = & \sum_a{\textrm{P}(s,a|t)} \nonumber \\
& = & \sum_a{\textrm{P}(m|t)}\prod_{j=1}^{m}{\textrm{P}(a_j|a_1^{j-1},s_1^{j-1},m,t)\textrm{P}(s_j |a_1^j,s_1^{j-1},m,t)} \nonumber \\
& = & \sum_a{\varepsilon}\prod_{j=1}^{m}{\frac{1}{l+1}f(s_j|t_{a_j})} \nonumber \\
& = &\sum_a{\frac{\varepsilon}{(l+1)^m}}\prod_{j=1}^{m}f(s_j|t_{a_j})
\label{eqC3.31}
\end{split}
\end{equation}
\end{eqnarray}
\noindent\hspace{2em}在公式\ref{eqC3.31}中需要遍历所有的对齐,即$\sum_a{\bullet}$。但这种表示不够直观,因此我们把这个过程重新表示为公式\ref{eqC3.32}
\parinterval 在公式\ref{eqC3.31}中,我们需要遍历所有的词对齐,即$\sum_a{\cdot}$。但这种表示不够直观,因此可以把这个过程重新表示为如下形式:
\begin{equation}
\textrm{P}(s|t)={\sum_{a_1=0}^{l}\cdots}{\sum_{a_m=0}^{l}\frac{\varepsilon}{(l+1)^m}}{\prod_{j=1}^{m}f(s_j|t_{a_j})}
\label{eqC3.32}
\end{equation}
\noindent\hspace{2em}我们可以把公式\ref{eqC3.32}分为两个部分进行理解和计算。第一部分:遍历所有的对齐$a$。其中$a$$\{a_1,...,a_m\}$组成,每个$a_j\in \{a_1,...,a_m\}$从译文的开始位置$(0)$循环到截止位置$(l)$。如图\ref{figureC3.28}表示的例子,描述的是源语单词$s_3$从译文的开始$t_0$遍历到结尾$t_3$,即$a_3$。第二部分: 对于每个$a$累计对齐概率$\textrm{P}(s,a|t)$
\parinterval 公式\ref{eqC3.32}分为两个主要部分。第一部分:遍历所有的对齐$a$。其中$a$$\{a_1,...,a_m\}$组成,每个$a_j\in \{a_1,...,a_m\}$从译文的开始位置$(0)$循环到截止位置$(l)$。如图\ref{fig:zh-en-bilingual-sentence-pairs}表示的例子,描述的是源语单词$s_3$从译文的开始$t_0$遍历到结尾$t_3$,即$a_3$的取值范围。第二部分: 对于每个$a$累加对齐概率$\textrm{P}(s,a|t)=\frac{\varepsilon}{(l+1)^m}{\prod_{j=1}^{m}f(s_j|t_{a_j})}$
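\parinterval 下面的Python片段按照公式\ref{eqC3.32}的方式直接枚举所有词对齐来计算IBM模型1的$\textrm{P}(s|t)$。其中的词汇翻译概率$f(\cdot|\cdot)$是为了演示而假设的数值,并非真实训练得到的参数;同时这种枚举的计算量是指数级的,后面的小节会讨论更高效的计算方式:

\begin{verbatim}
import itertools

def ibm1_prob(src, tgt, f, epsilon=1.0):
    # P(s|t) = epsilon/(l+1)^m * sum_a prod_j f(s_j | t_{a_j})
    tgt = ["<null>"] + tgt                    # t_0: the empty word
    l, m = len(tgt) - 1, len(src)
    total = 0.0
    for a in itertools.product(range(l + 1), repeat=m):   # all (l+1)^m alignments
        p = 1.0
        for j, aj in enumerate(a):
            p *= f.get((src[j], tgt[aj]), 0.0)
        total += p
    return epsilon / (l + 1) ** m * total

# assumed (hypothetical) word translation probabilities f(s_j | t_i)
f = {("在", "on"): 0.6, ("在", "<null>"): 0.3,
     ("桌子", "table"): 0.7, ("上", "on"): 0.4, ("上", "the"): 0.1}
print(ibm1_prob(["在", "桌子", "上"], ["on", "the", "table"], f))
\end{verbatim}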
%----------------------------------------------
% 图3.29
\begin{figure}[htp]
......@@ -716,230 +731,353 @@ $m$ & $n$ & $n^m \cdot m!$ \\ \hline
\end{figure}
%---------------------------
\noindent\hspace{2em}这样就得到了模型1中翻译概率的计算式。它的形式相比原始的计算式要简单许多。可以看出模型1的假设把翻译模型化简成了非常简单的形式。对于给定的$s$$a$$t$,只要知道$\varepsilon$$t(s_j |t_(a_j ))$就可以计算出$\textrm{P}(s|t)$,进而求出$\textrm{P}(s|t)$
\parinterval 这样就得到了IBM模型1中句子翻译概率的计算式。可以看出IBM模型1的假设把翻译模型化简成了非常简单的形式。对于给定的$s$、$a$和$t$,只要知道$\varepsilon$和$f(s_j|t_{a_j})$就可以计算出$\textrm{P}(s,a|t)$,进而求出$\textrm{P}(s|t)$。
\subsection{IBM模型2}\index{Chapter3.4.2}
\noindent\hspace{2em}IBM模型1中的假设大大化简了问题的难度,但是这些假设显然并不与实际相符。特别是模型1中假设词对齐服从均与分布,这显然存在问题。如图\ref{figureC3.28},尽管译文$t$$t'$的质量更好,但对于IBM模型1来说翻译概率相同。这是因为当词对齐服从均匀分布时,模型会忽略了翻译的调序问题。因此当单词翻译相同但顺序不同时,翻译概率一样。
\parinterval IBM模型1很好地化简了问题,但是由于使用了很强的假设,导致模型和实际情况有较大差异。其中一个比较严重的问题是假设词对齐的生成概率服从均匀分布。图\ref{fig:different-translation-result-in-different-score-IBM1}展示了一个简单的实例,尽管译文$t$比$t'$的质量更好,但对于IBM模型1来说它们的翻译概率相同。这是因为当词对齐服从均匀分布时,模型会忽略目标语言单词的位置信息。因此当单词翻译相同但顺序不同时,翻译概率一样。同时,不合理的对齐也会导致使用不合理的词汇翻译概率,因为源语言单词是由错误位置的目标语单词生成的。虽然这个问题可以通过引入目标语语言模型进行缓解,但是翻译模型仍然需要给出更合理的建模方式,以保证翻译译文的选择是正确的。
%----------------------------------------------
% 图3.24
\begin{figure}[htp]
\centering
\input{./Chapter3/Figures/figure-different-translation-result-in-different-score-IBM1}
\caption{不同的译文导致不同IBM模型1得分的情况}
\label{fig:different-translation-result-in-different-score-IBM1}
\end{figure}
%---------------------------
\parinterval 因此,IBM模型2抛弃了对对齐概率$\textrm{P}(a_j|a_1^{j-1},s_1^{j-1},m,t)$服从均匀分布的假设。在IBM模型2中,我们认为词对齐是有倾向性的,对齐至少要与源语单词的位置和目标语单词的位置有关。具体来说,对齐位置$a_j$的生成概率与语言单位位置$j$、源语句子长度$m$和译文长度$l$有关,形式化表述为:
\begin{equation}
\textrm{P}(a_j|a_1^{j-1},s_1^{j-1},m,t) \equiv a(a_j|j,m,l)
\label{eqC3.33}
\end{equation}
\parinterval 我们还用图\ref{fig:different-translation-result-in-different-score-IBM1}中的例子来说明公式\ref{eqC3.33}。在IBM模型1中,``桌子''对齐到译文四个位置上的单词的概率是一样的;但在IBM模型2中,``桌子''对齐到``table''被形式化为$a(a_j |j,m,l)=a(3|2,3,3)$,即对于源文位置2($j=2$)的单词,在源文和目标语译文都是3个词($l=m=3$)的情况下,它对齐到目标语译文位置3($a_j=3$)的概率是多少。因为$a(a_j|j,m,l)$也是模型需要学习的参数,``桌子''对齐到不同位置上的目标语单词的概率也就不一样。理想情况下,通过$a(a_j|j,m,l)$,``桌子''对齐到``table''应该得到更高的概率。
\parinterval IBM模型2的其他假设均与模型1相同。把公式\ref{eqC3.28}\ref{eqC3.29}\ref{eqC3.33}重新带入公式\ref{eqC3.22}\ref{eqC3.21},可以得到IBM模型2的数学描述:
\begin{eqnarray}
\textrm{P}(s|t) & = & \sum_a{\textrm{P}(s,a|t)} \nonumber \\
& = & \sum_{a_1=0}^{l}{\cdots}\sum _{a_m=0}^{l}{\varepsilon}\prod_{j=1}^{m}{a(a_j|j,m,l)f(s_j|t_{a_j})}
\label{eqC3.34}
\end{eqnarray}
\parinterval 类似于模型1,模型2的表达式\ref{eqC3.34}也能被拆分为两部分进行理解。第一部分:遍历所有的$a$;第二部分:对于每个$a$累加对齐概率$\textrm{P}(s,a|t)$,即计算对齐概率$a(a_j|j,m,l)$和词汇翻译概率$f(s_j|t_{a_j})$对于所有源语言位置的乘积。
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\subsection{计算优化}\index{Chapter3.4.3}
\parinterval 如果模型参数给定,我们可以使用IBM模型1-2对新的句子进行翻译。比如,我们可以使用\ref{sec:simple-decoding}节描述的解码方法搜索最优译文,或者使用自左向右解码 + 剪枝的方法。在搜索过程中,只需要通过公式\ref{eqC3.32}\ref{eqC3.34}计算每个译文候选的IBM模型翻译概率。但是,公式\ref{eqC3.32}\ref{eqC3.34}的高计算复杂度导致这些模型很难直接使用。以IBM模型1为例,这里把公式\ref{eqC3.32}重写为:
\begin{equation}
\textrm{P}(s|t) = \frac{\epsilon}{(l+1)^{m}} \underbrace{\sum\limits_{a_1=0}^{l} ... \sum\limits_{a_m=0}^{l}}_{(l+1)^m\textrm{次循环}} \underbrace{\prod\limits_{j=1}^{m} f(s_j|t_{a_j})}_{m\textrm{次循环}}
\label{eqC3.35}
\end{equation}
\noindent可以看到,遍历所有的词对齐需要$(l+1)^m$次循环,遍历所有源语言位置累计$f(s_j|t_{a_j})$需要$m$次循环,因此这个模型的计算复杂度为$O((l+1)^m m)$。当$m$较大时,计算这样的模型几乎是不可能的。不过,经过仔细观察,可以发现还有更加有效的方法进行计算,如下:
\begin{equation}
\sum\limits_{a_1=0}^{l} ... \sum\limits_{a_m=0}^{l} \prod\limits_{j=1}^{m} f(s_j|t_{a_j}) = \prod\limits_{j=1}^{m} \sum\limits_{i=0}^{l} f(s_j|t_i)
\label{eqC3.35new}
\end{equation}
\noindent 公式\ref{eqC3.35new}的技巧在于把若干个乘积的加法(等式左手端)转化为若干加法结果的乘积(等式右手端),这样省去了多次循环,把$O((l+1)^m m)$的计算复杂度降为$O((l+1)m)$。图\ref{fig:example-of-formula1.29}对这个过程进行了进一步解释。
%----------------------------------------------
% 图3.32-new
\begin{figure}[htp]
\centering
\input{./Chapter3/Figures/figure-example-of-formula1.29}
\caption{$\sum\limits_{a_1=0}^{l} ... \sum\limits_{a_m=0}^{l} \prod\limits_{j=1}^{m} f(s_j|t_{a_j}) = \prod\limits_{j=1}^{m} \sum\limits_{i=0}^{l} f(s_j|t_i)$的实例}
\label{fig:example-of-formula1.29}
\end{figure}
%---------------------------
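\parinterval 公式\ref{eqC3.35new}本质上是乘法对加法的分配律,可以用一个很小的数值例子来验证。下面这段示意性的Python代码在一个随机构造的(假设的)词汇翻译概率表上,分别按等式左端的枚举方式和右端的因式分解方式进行计算,两者结果一致,但所需的循环次数相差巨大:

{\small
\begin{verbatim}
import itertools, random

random.seed(0)
src = ['s1', 's2', 's3']                     # m = 3
tgt = ['t0', 't1', 't2']                     # l = 2,t0为空单词
f = {(s, t): random.random() for s in src for t in tgt}

# 左端:枚举全部(l+1)^m个对齐,复杂度O((l+1)^m m)
lhs = 0.0
for align in itertools.product(range(len(tgt)), repeat=len(src)):
    p = 1.0
    for j, i in enumerate(align):
        p *= f[(src[j], tgt[i])]
    lhs += p

# 右端:每个源语位置先求和再连乘,复杂度O((l+1)m)
rhs = 1.0
for s in src:
    rhs *= sum(f[(s, t)] for t in tgt)

assert abs(lhs - rhs) < 1e-12                # 两种方式结果相同
\end{verbatim}
}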
\parinterval 接着,利用公式\ref{eqC3.35new}的方式,可以把公式\ref{eqC3.32}\ref{eqC3.34}重写表示为:
\begin{eqnarray}
\textrm{IBM模型1:\ \ \ \ } \textrm{P}(s|t) & = & \frac{\epsilon}{(l+1)^{m}} \prod\limits_{j=1}^{m} \sum\limits_{i=0}^{l} f(s_j|t_i) \label{eq:final-model1} \\
\textrm{IBM模型2:\ \ \ \ }\textrm{P}(s|t) & = & \epsilon \prod\limits_{j=1}^{m} \sum\limits_{i=0}^{l} a(i|j,m,l) f(s_j|t_i) \label{eq:final-model2}
\label{eqC3.35new2}
\end{eqnarray}
公式\ref{eq:final-model1}\ref{eq:final-model2}是IBM模型1-2的最终表达式,在解码和训练中可以被直接使用。
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\subsection{训练}\index{Chapter3.4.4}
\parinterval 在完成了建模和解码的基础上,剩下的问题是如何得到模型的参数。这也是整个统计机器翻译里最重要的内容。下面我们将会对IBM模型1-2的参数估计方法进行介绍
\subsubsection{(一)目标函数}\index{Chapter3.4.4.1}
\parinterval 统计机器翻译模型的训练是一个典型的优化问题。简单来说,训练就是在给定数据集(训练集)上调整参数使得目标函数的值达到最大(或最小),此时得到的参数被称为是该模型在该目标函数下的最优解(图\ref{figureC3.33})。
%----------------------------------------------
% 图3.33
\begin{figure}[htp]
\centering
\input{./Chapter3/Figures/figure333}
\caption{一个目标函数的优化结果示例}
\label{figureC3.33}
\end{figure}
%---------------------------
\parinterval 在IBM模型中,优化的目标函数被定义为$\textrm{P}(s|t)$。也就是,对于给定的句对$(s,t)$,最大化翻译概率$\textrm{P}(s|t)$。这里用符号$\textrm{P}_{\theta}(s|t)$表示模型由参数$\theta$决定,模型训练可以被描述为对目标函数$\textrm{P}_{\theta}(s|t)$的优化过程:
\begin{equation}
\widehat{\theta}=\argmax_{\theta}\textrm{P}_{\theta}(s|t)
\label{eqC3.36}
\end{equation}
\noindent其中,$\argmax_{\theta}$表示求最优参数的过程(或优化过程)。
\parinterval 公式\ref{eqC3.36}实际上也是一种基于极大似然的模型训练方法。这里,可以把$\textrm{P}_{\theta}(s|t)$看作是模型对数据描述的一个似然函数,可以记做$\textrm{L}(s,t;\theta)$。也就是,我们的优化目标实际上是对似然函数的优化:$\widehat{\theta}=\argmax_{\theta \in \Theta}\textrm{L}(s,t;\theta)$,其中\{$\widehat{\theta}$\}表示最优解可能不止一组,$\Theta$表示参数空间。
\parinterval 回到IBM模型的优化问题上。以IBM模型1为例,我们优化的目标是最大化翻译概率$\textrm{P}(s|t)$。使用公式\ref{eq:final-model1} ,可以把这个目标表述为:
\begin{eqnarray}
& & \textrm{max}\Big(\frac{\varepsilon}{(l+1)^m}\prod_{j=1}^{m}\sum_{i=0}^{l}{f({s_j|t_i})}\Big) \nonumber \\
& \textrm{s.t.} & \textrm{任意单词} t_{y}:\;\sum_{s_x}{f(s_x|t_y)}=1 \nonumber%%%%%不显示公式序号
\label{eqC3.37}
\end{eqnarray}
\noindent其中,$\textrm{max}(\cdot)$表示最大化,$\frac{\varepsilon}{(l+1)^m}\prod_{j=1}^{m}\sum_{i=0}^{l}{f({s_j|t_i})}$是目标函数,$f({s_j|t_i})$是模型的参数,$\sum_{s_x}{f(s_x|t_y)}=1$是优化的约束条件,保证翻译概率满足归一化的要求。需要注意的是$\{f(s_x |t_y)\}$对应了很多参数,每个源语言单词和每个目标语单词的组合都对应一个参数$f(s_x |t_y)$
%%%%%%%%%%%%%%%%%%%%%%%%
\subsubsection {(二)优化}\index{Chapter3.4.4.2}
\parinterval 我们已经把IBM模型的参数训练问题定义为带约束的目标函数优化问题。由于目标函数是可微分函数,解决这类问题的一种常用手法是把带约束的优化问题转化为不带约束的优化问题。这里用到了拉格朗日乘数法(The Lagrange Multiplier Method),它的基本思想是把含有$n$个变量和$m$个约束条件的优化问题转化为含有$n+m$个变量的无约束优化问题
\parinterval 这里,我们的目标是$\max(\textrm{P}_{\theta}(s|t))$,约束条件是对于任意的目标语单词$t_y$,有$\sum_{s_x}{\textrm{P}(s_x|t_y)}=1$。根据拉格朗日乘数法,可以把上述带约束的优化问题重新定义为最大化如下拉格朗日函数的问题:
\begin{equation}
L(f,\lambda)=\frac{\epsilon}{(l+1)^m}\prod_{j=1}^{m}\sum_{i=0}^{l}{f(s_j|t_i)}-\sum_{t_y}{\lambda_{t_y}(\sum_{s_x}{f(s_x|t_y)}-1)}
\label{eqC3.37}
\end{equation}
\parinterval $L(f,\lambda)$包含两部分,$\frac{\epsilon}{(l+1)^m}\prod_{j=1}^{m}\sum_{i=0}^{l}{f(s_j|t_i)}$是原始的目标函数,$\sum_{t_y}{\lambda_{t_y}(\sum_{s_x}{f(s_x|t_y)}-1)}$是原始的约束条件乘以拉格朗日乘数$\lambda_{t_y}$,拉格朗日乘数的数量和约束条件的数量相同。图\ref{figureC3.35}通过图例说明了$L(f,\lambda)$各部分的意义。
%----------------------------------------------
% 图3.35
\begin{figure}[htp]
\centering
\input{./Chapter3/Figures/figure338}
\caption{拉格朗日乘数法(IBM模型1)}
\label{figureC3.35}
\end{figure}
%---------------------------
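\noindent\hspace{2em} 作为一个与翻译模型无关的小例子(仅用来直观感受拉格朗日乘数法,例子中的函数和约束是任意假设的),考虑在约束$x+y=1$下最大化$g(x,y)=xy$:构造拉格朗日函数$L(x,y,\lambda)=xy-\lambda(x+y-1)$,令$\frac{\partial L}{\partial x}=y-\lambda=0$、$\frac{\partial L}{\partial y}=x-\lambda=0$、$\frac{\partial L}{\partial \lambda}=-(x+y-1)=0$,解得$x=y=\frac{1}{2}$。IBM模型1的参数估计采用的是同样的思路,只是变量换成了所有的$f(s_x|t_y)$,约束换成了每个$t_y$上的归一化条件。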
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\noindent\hspace{2em}因为$L(f,\lambda)$是可微分函数,因此可以通过计算$L(f,\lambda)$导数为零的点得到极值点。由于这个模型里仅有$f(s_x|t_y)$一种类型的参数,我们只需要计算如下导数:
\begin{eqnarray}
\frac{\partial L(f,\lambda)}{\partial f(s_u|t_v)}& = & \frac{\partial \big[ \frac{\epsilon}{(l+1)^{m}} \prod\limits_{j=1}^{m} \sum\limits_{i=0}^{l} f(s_j|t_i) \big]}{\partial f(s_u|t_v)} - \nonumber \\
& & \frac{\partial \big[ \sum_{t_y} \lambda_{t_y} (\sum_{s_x} f(s_x|t_y) -1) \big]}{\partial f(s_u|t_v)} \nonumber \\
& = & \frac{\epsilon}{(l+1)^{m}} \cdot \frac{\partial \big[ \prod\limits_{j=1}^{m} \sum\limits_{i=0}^{l} f(s_j|t_i) \big]}{\partial f(s_u|t_v)} - \lambda_{t_v}
\label{eqC3.38}
\end{eqnarray}
\noindent\hspace{2em}为了求$\frac{\partial \big[ \prod\limits_{j=1}^{m} \sum\limits_{i=0}^{l} f(s_j|t_i) \big]}{\partial f(s_u|t_v)}$,这里引入一个辅助函数。令$g(z)=\alpha z^{\beta}$为变量$z$的函数,显然,
$\frac{\partial g(z)}{\partial z} = \alpha \beta z^{\beta-1} = \frac{\beta}{z}\alpha z^{\beta} = \frac{\beta}{z} g(z)$。这里可以把$\prod_{j=1}^{m} \sum_{i=0}^{l} f(s_j|t_i)$看做$g(z)=\alpha z^{\beta}$的实例。首先,令$z=\sum_{i=0}^{l}f(s_u|t_i)$,注意$s_u$为给定的源语单词。然后,把$\beta$定义为$\sum_{i=0}^{l}f(s_u|t_i)$$\prod_{j=1}^{m} \sum_{i=0}^{l} f(s_j|t_i)$中出现的次数,即源语句子中与$s_u$相同的单词的个数。
\begin{equation}
\beta=\sum_{j=1}^{m} \delta(s_j,s_u)
\label{eqC3.38}
\end{equation}
\noindent其中,当$x=y$时$\delta(x,y)=1$,否则为0。
\noindent\hspace{2em}根据$\frac{\partial g(z)}{\partial z} = \frac{\beta}{z} g(z)$,可以得到
\begin{equation}
\frac{\partial g(z)}{\partial z} = \frac{\partial \big[ \prod\limits_{j=1}^{m} \sum\limits_{i=0}^{l} f(s_j|t_i) \big]}{\partial \big[ \sum\limits_{i=0}^{l}f(s_u|t_i) \big]} = \frac{\sum\limits_{j=1}^{m} \delta(s_j,s_u)}{\sum\limits_{i=0}^{l}f(s_u|t_i)} \prod\limits_{j=1}^{m} \sum\limits_{i=0}^{l} f(s_j|t_i)
\label{eqC3.39}
\end{equation}
\noindent\hspace{2em}根据$\frac{\partial g(z)}{\partial z}$$\frac{\partial z}{\partial f}$计算的结果,可以得到
\begin{equation}
\begin{split}
{\frac{\partial \big[ \prod_{j=1}^{m} \sum_{i=0}^{l} f(s_j|t_i) \big]}{\partial f(s_u|t_v)}}& = {{\frac{\partial \big[ \prod\limits_{j=1}^{m} \sum\limits_{i=0}^{l} f(s_j|t_i) \big]}{\partial \big[ \sum\limits_{i=0}^{l}f(s_u|t_i) \big]}} \cdot{\frac{\partial \big[ \sum\limits_{i=0}^{l}f(s_u|t_i) \big]}{\partial f(s_u|t_v)}}}\\
& = {\frac{\sum\limits_{j=1}^{m} \delta(s_j,s_u)}{\sum\limits_{i=0}^{l}f(s_u|t_i)} \prod\limits_{j=1}^{m} \sum\limits_{i=0}^{l} f(s_j|t_i) \cdot \sum\limits_{i=0}^{l} \delta(t_i,t_v)}\\
\label{eqC3.39-40}
\end{split}
\end{equation}
%把原来的图3.36替换成了公式
\noindent\hspace{2em}$\frac{\partial \big[ \prod_{j=1}^{m} \sum_{i=0}^{l} f(s_j|t_i) \big]}{\partial f(s_u|t_v)}$进一步代入$\frac{\partial L(f,\lambda)}{\partial f(s_u|t_v)}$,得到$L(f,\lambda)$的导数
\begin{equation}
\begin{split}
&{\frac{\partial L(f,\lambda)}{\partial f(s_u|t_v)}}\\
&={\frac{\epsilon}{(l+1)^{m}} \cdot \color{red}{\frac{\partial \big[ \prod\limits_{j=1}^{m} \sum\limits_{i=0}^{l} f(s_j|t_i) \big]}{\partial f(s_u|t_v)}} - \lambda_{t_v}}\\
&={\frac{\epsilon}{(l+1)^{m}} \cdot \color{red}{\frac{\sum_{j=1}^{m} \delta(s_j,s_u) \cdot \sum_{i=0}^{l} \delta(t_i,t_v)}{\sum_{i=0}^{l}f(s_u|t_i)} \prod\limits_{j=1}^{m} \sum\limits_{i=0}^{l} f(s_j|t_i)} - \lambda_{t_v}}\\
\label{eqC3.40}
\end{split}
\end{equation}
\noindent\hspace{2em}$\frac{\partial L(f,\lambda)}{\partial f(s_u|t_v)}=0$,有
\begin{equation}
\begin{split}
f(s_u|t_v) = \frac{\lambda_{t_v}^{-1} \epsilon}{(l+1)^{m}} \cdot \frac{\sum\limits_{j=1}^{m} \delta(s_j,s_u) \cdot \sum\limits_{i=0}^{l} \delta(t_i,t_v)}{\sum\limits_{i=0}^{l}f(s_u|t_i)} \prod\limits_{j=1}^{m} \sum\limits_{i=0}^{l} f(s_j|t_i) \cdot f(s_u|t_v)
\label{eqC3.41}
\end{split}
\end{equation}
\noindent\hspace{2em} 将上式稍作调整得到下式,可以看出,这不是一个计算$f(s_u|t_v)$的解析式,因为等式右端仍含有$f(s_u|t_v)$
\begin{equation}
\begin{split}
f(s_u|t_v) = \lambda_{t_v}^{-1} \frac{\epsilon}{(l+1)^{m}} \prod\limits_{j=1}^{m} \sum\limits_{i=0}^{l} f(s_j|t_i) \sum\limits_{j=1}^{m} \delta(s_j,s_u) \sum\limits_{i=0}^{l} \delta(t_i,t_v) \frac{f(s_u|t_v) }{\sum\limits_{i=0}^{l}f(s_u|t_i)}
\label{eqC3.42}
\end{split}
\end{equation}
\noindent\hspace{2em} 通过采用一个非常经典的期望最大化(Expectation Maximization)方法,简称EM方法(或算法),我们仍可以利用上式迭代地计算$f(s_u|t_v)$,使其最终收敛到最优值。该方法的思想是:用当前的参数,求似然函数的期望;之后最大化这个期望,同时得到新的一组参数的值。对于IBM模型来说,其迭代过程就是反复使用公式\ref{eqC3.42},具体如图\ref{figureC3.28}所示。
%----------------------------------------------
% 图3.28
\begin{figure}[htp]
\centering
\input{./Chapter3/Figures/figure-1}
\caption{IBM模型迭代过程示意图}
\label{figureC3.28}
\end{figure}
%---------------------------
\noindent\hspace{2em} 为了化简$f(s_u|t_v)$的计算,在此对公式\ref{eqC3.42}进行了重新组织,见图\ref{figureC3.29}。红色部分表示翻译概率P$(s|t)$;蓝色部分表示$(s_u,t_v)$在句对$(s,t)$中配对的总次数,即“$t_v$翻译为$s_u$”在所有对齐中出现的次数;绿色部分表示$f(s_u|t_v)$对于所有的$t_i$的相对值,即“$t_v$翻译为$s_u$”在所有对齐中出现的相对概率;蓝色与绿色部分相乘表示“$t_v$翻译为$s_u$”这个事件出现次数的期望的估计,称之为期望频次(expected count)。
%----------------------------------------------
% 图3.29
\begin{figure}[htp]
\centering
\input{./Chapter3/Figures/figure-2}
\caption{公式\ref{eqC3.42}的更详细解释}
\label{figureC3.29}
\end{figure}
%---------------------------
\noindent\hspace{2em} 更具体地,期望频次是事件在其分布下出现的次数的期望。其计算公式为:$c_{\mathbb{E}}(X)=\sum_i c(x_i) \cdot \textrm{P}(x_i)$,其中$c(x_i)$表示$x_i$出现的次数,$\textrm{P}(x_i)$表示$x_i$出现的概率。表\ref{tab:calculation-of-the-expected-frequency}展示了事件$X$的期望频次的详细计算过程,其中$x_1,x_2,x_3$分别对应事件$X$出现2次、1次和5次的情况。
\begin{table}[h]
\centering
\caption{期望频次的详细计算过程}
\label{tab:calculation-of-the-expected-frequency}
\subtable{
\begin{tabular}{cc}
\multicolumn{1}{c|}{$x_i$} & c($x_i$) \\ \hline
\multicolumn{1}{c|}{$x_1$} & 2 \\
\multicolumn{1}{c|}{$x_2$} & 1 \\
\multicolumn{1}{c|}{$x_3$} & 5 \\ \hline
\multicolumn{2}{c}{c(X)=8}
\end{tabular}
\label{tab:firsttable}
}
\qquad
\subtable{
\begin{tabular}{cccc}
\multicolumn{1}{c|}{$x_i$} & c($x_i$) & P($x_i$) & $c(x_i)\cdot$P($x_i$) \\ \hline
\multicolumn{1}{c|}{$x_1$} & 2 & 0.1 & 0.2 \\
\multicolumn{1}{c|}{$x_2$} & 1 & 0.3 & 0.3 \\
\multicolumn{1}{c|}{$x_3$} & 5 & 0.2 & 1.0 \\ \hline
\multicolumn{4}{c}{$c_{\mathbb{E}}(X)$=0.2+0.3+1.0=1.5}
\end{tabular}
\label{tab:secondtable}
}
\end{table}
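\noindent\hspace{2em} 按照这个定义,表\ref{tab:calculation-of-the-expected-frequency}中的结果也可以用几行示意性的Python代码直接复算(其中的数值取自上表,变量名是为演示而假设的):

{\small
\begin{verbatim}
# 期望频次 c_E(X) = sum_i c(x_i) * P(x_i),数据取自上表
counts = {'x1': 2, 'x2': 1, 'x3': 5}
probs  = {'x1': 0.1, 'x2': 0.3, 'x3': 0.2}
expected_count = sum(counts[x] * probs[x] for x in counts)
print(expected_count)   # 0.2 + 0.3 + 1.0 = 1.5
\end{verbatim}
}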
\noindent\hspace{2em} 因为在P$(s|t)$中,$t_v$翻译(连接)到$s_u$的期望频次为:
\begin{equation}
\begin{split}
c_{\mathbb{E}}(s_u|t_v;s,t) \equiv \sum\limits_{j=1}^{m} \delta(s_j,s_u) \sum\limits_{i=0}^{l} \delta(t_i,t_v) \cdot \frac {f(s_u|t_v)}{\sum\limits_{i=0}^{l}f(s_u|t_i)}
\label{eqC3.43}
\end{split}
\end{equation}
\noindent\hspace{2em} 所以公式\ref {eqC3.42}可重写为:
\begin{equation}
\begin{split}
f(s_u|t_v)=\lambda_{t_v}^{-1} \cdot \textrm{P}(s|t) \cdot c_{\mathbb{E}}(s_u|t_v;s,t)
\label{eqC3.44}
\end{split}
\end{equation}
\noindent\hspace{2em} 在此如果令$\lambda_{t_v}^{'}=\frac{\lambda_{t_v}}{\textrm{P}(s|t)}$,可得:
\begin{equation}
\begin{split}
f(s_u|t_v) &=\lambda_{t_v}^{-1} \cdot \textrm{P}(s|t) \cdot c_{\mathbb{E}}(s_u|t_v;s,t) \\
&={(\lambda_{t_v}^{'})}^{-1} \cdot c_{\mathbb{E}}(s_u|t_v;s,t)
\label{eqC3.45}
\end{split}
\end{equation}
\noindent\hspace{2em} 又因为IBM模型对$f(\cdot|\cdot)$的约束如下:
\begin{equation}
\begin{split}
\forall t_y : \sum\limits_{s_x} f(s_x|t_y) =1
\label{eqC3.46}
\end{split}
\end{equation}
\noindent\hspace{2em} 为了满足$f(\cdot|\cdot)$的概率归一化约束,易知$\lambda_{t_v}^{'}$的计算为:
\begin{equation}
\begin{split}
\lambda_{t_v}^{'}=\sum\limits_{s_u} c_{\mathbb{E}}(s_u|t_v;s,t)
\label{eqC3.47}
\end{split}
\end{equation}
\noindent\hspace{2em} 因此,$f(s_u|t_v)$的计算式可进一步变换成下式:
\begin{equation}
\begin{split}
f(s_u|t_v)=\frac{c_{\mathbb{E}}(s_u|t_v;s,t)} { \sum\limits_{s_u} c_{\mathbb{E}}(s_u|t_v;s,t) }
\label{eqC3.48}
\end{split}
\end{equation}
\noindent\hspace{2em} 总的来说,对于实际中我们拥有的$N$个互译的句对(称作平行语料)$\{(s^{[1]},t^{[1]}),(s^{[2]},t^{[2]}),...,(s^{[N]},t^{[N]})\}$,$f(s_u|t_v)$的期望频次为:
\begin{equation}
\begin{split}
c_{\mathbb{E}}(s_u|t_v)=\sum\limits_{i=1}^{N} c_{\mathbb{E}}(s_u|t_v;s^{[i]},t^{[i]})
\label{eqC3.49}
\end{split}
\end{equation}
\noindent\hspace{2em} 于是有$f(s_u|t_v)$的计算公式和迭代过程如下:
%----------------------------------------------
% 图3.30
\begin{figure}[htp]
\centering
\input{./Chapter3/Figures/figure-3}
\caption{$f(s_u|t_v)$的计算公式和迭代过程}
\label{figureC3.30}
\end{figure}
%---------------------------
\noindent\hspace{2em} 完整的EM算法如下图。其中E-Step对应4-5行,目的是计算$c_{\mathbb{E}}(\cdot)$;M-Step对应6-9行,目的是计算$f(\cdot)$
%----------------------------------------------
% 图3.31
\begin{figure}[htp]
\centering
\input{./Chapter3/Figures/figure-4}
\caption{EM算法流程图}
\label{figureC3.31}
\end{figure}
%---------------------------
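\noindent\hspace{2em} 为了帮助理解图\ref{figureC3.31}中E-Step和M-Step的交替过程,这里再给出IBM模型1训练的一个极简示意实现(Python)。其中的平行语料只有两个虚构的句对,并且为了突出主干,省略了空单词$t_0$、数值下溢处理等实际系统必须考虑的细节,因此它只是对上述流程的粗略对照,而不是可直接使用的训练工具:

{\small
\begin{verbatim}
from collections import defaultdict

# 虚构的极小平行语料,仅用于演示EM的迭代过程
corpus = [(['我', '满意'], ['I', 'satisfied']),
          (['我'], ['I'])]

# 初始化:f(s|t)取均匀分布
src_vocab = {s for s_sent, _ in corpus for s in s_sent}
f = defaultdict(lambda: 1.0 / len(src_vocab))

for it in range(5):                      # 迭代若干轮直至近似收敛
    count = defaultdict(float)           # 期望频次 c_E(s_u|t_v)
    total = defaultdict(float)           # 归一化因子 lambda'_{t_v}
    # E-Step:用当前的f累计期望频次
    for s_sent, t_sent in corpus:
        for s in s_sent:
            denom = sum(f[(s, t)] for t in t_sent)
            for t in t_sent:
                c = f[(s, t)] / denom
                count[(s, t)] += c
                total[t] += c
    # M-Step:用归一化后的期望频次重新估计f(s|t)
    for (s, t), c in count.items():
        f[(s, t)] = c / total[t]
\end{verbatim}
}

\noindent\hspace{2em} 可以看到,E-Step只依赖当前的$f(\cdot|\cdot)$,而M-Step用归一化后的期望频次得到新的$f(\cdot|\cdot)$,对应的正是公式\ref{eqC3.48}的形式。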
\noindent\hspace{2em}$\frac{\partial \big[ \prod_{j=1}^{m} \sum_{i=0}^{l} f(s_j|t_i) \big]}{\partial f(s_u|t_v)}$进一步代入$\frac{\partial L(f,\lambda)}{\partial f(s_u|t_v)}$
\noindent\hspace{2em} 同样的,EM算法可以直接用于训练IBM模型2。对于句对$(s,t)$$m=|s|$$l=|t|$,E-Step的计算公式如下,其中参数$f(s_j|t_i)$与IBM模型1一样:
\begin{equation}
\begin{split}
c_{\mathbb{E}}(s_u|t_v;s,t) &=\sum\limits_{j=1}^{m} \sum\limits_{i=0}^{l} \frac{f(s_u|t_v)a(i|j,m,l) \delta(s_j,s_u)\delta (t_i,t_v) } {\sum_{k=0}^{l} f(s_u|t_k)a(k|j,m,l)} \\
c_{\mathbb{E}}(i|j,m,l;s,t) &=\frac{f(s_j|t_i)a(i|j,m,l)} {\sum_{k=0}^{l} f(s_j|t_k)a(k|j,m,l)}
\label{eqC3.50}
\end{split}
\end{equation}
\noindent\hspace{2em}$\frac{\partial L(f,\lambda)}{\partial f(s_u|t_v)}=0$,有
\noindent\hspace{2em} M-Step的计算公式如下,其中参数$a(i|j,m,l)$表示调序概率:
\begin{equation}
\begin{split}
f(s_u|t_v) &=\frac{\sum_{k=0}^{K}c_{\mathbb{E}}(s_u|t_v;s^{[k]},t^{[k]}) } {\sum_{s_u} \sum_{k=0}^{K} c_{\mathbb{E}}(s_u|t_v;s^{[k]},t^{[k]})} \\
a(i|j,m,l) &=\frac{\sum_{k=0}^{K}c_{\mathbb{E}}(i|j;s^{[k]},t^{[k]})} {\sum_{i}\sum_{k=0}^{K}c_{\mathbb{E}}(i|j;s^{[k]},t^{[k]})}
\label{eqC3.51}
\end{split}
\end{equation}
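\noindent\hspace{2em} 在前面IBM模型1的示意代码基础上,只需把E-Step中每个对齐位置的权重换成$f(s_j|t_i)a(i|j,m,l)$,并同时为$a(\cdot)$累计期望频次,就得到IBM模型2的训练过程。下面是E-Step的一个示意性片段(Python),其中$f$、$a$等数据结构的组织方式是为演示而假设的,同样省略了空单词等细节;M-Step只需再按公式\ref{eqC3.51}分别对两组期望频次做归一化:

{\small
\begin{verbatim}
# IBM模型2 E-Step示意:对一个句对(src, tgt)累计期望频次
# f[(s, t)]为词汇翻译概率,a[(i, j, m, l)]为对齐概率,均为简化的演示表示
def estep_model2(src, tgt, f, a, count_f, count_a):
    m, l = len(src), len(tgt)
    for j, s in enumerate(src, start=1):
        # 归一化因子:sum_k f(s_j|t_k) * a(k|j,m,l)
        denom = sum(f[(s, tgt[k - 1])] * a[(k, j, m, l)]
                    for k in range(1, l + 1))
        for i, t in enumerate(tgt, start=1):
            c = f[(s, t)] * a[(i, j, m, l)] / denom
            count_f[(s, t)] += c        # 之后归一化得到新的f(s|t)
            count_a[(i, j, m, l)] += c  # 之后归一化得到新的a(i|j,m,l)
\end{verbatim}
}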
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%最新一版改到这里
\section{IBM模型及隐马尔可夫模型}\index{Chapter3.5}
\subsection{基本翻译模型}\index{Chapter3.5.1}
\noindent\hspace{2em}如图\ref{figureC3.5.3}所示,我们可以把公式\ref{eqC3.5.2}分为5个部分,并用不同的序号和颜色进行标注。下面我们介绍每一部分的含义。
\noindent\hspace{2em}第一、对每个$j\in[1,l]$的目标语单词的繁衍率进行建模,即$\varphi_j$的概率,它依赖于$t$和区间$[1,j-1]$的目标语单词的繁衍率$\varphi_1^{j-1}$。
......
%%% outline
%-------------------------------------------------------------------------
\begin{tikzpicture}
\node [anchor=west,inner sep=2pt,fill=red!20,minimum height=3em] (eq1) at (0,0) {$f(s_u|t_v)$};
\node [anchor=west,inner sep=2pt] (eq2) at ([xshift=-2pt]eq1.east) {$=$};
\node [anchor=west,inner sep=2pt] (eq3) at ([xshift=-2pt]eq2.east) {$\lambda_{t_v}^{-1}$};
\node [anchor=west,inner sep=2pt] (eq4) at ([xshift=-2pt]eq3.east) {$\frac{\epsilon}{(l+1)^{m}}$};
\node [anchor=west,inner sep=2pt,fill=red!20,minimum height=3em] (eq5) at ([xshift=-2pt]eq4.east) {\footnotesize{$\prod\limits_{j=1}^{m} \sum\limits_{i=0}^{l} f(s_j|t_i)$}};
\node [anchor=west,inner sep=2pt] (eq6) at ([xshift=-2pt]eq5.east) {\footnotesize{$\sum\limits_{j=1}^{m} \delta(s_j,s_u) \sum\limits_{i=0}^{l} \delta(t_i,t_v)$}};
\node [anchor=west,inner sep=2pt,fill=red!20,minimum height=3em] (eq7) at ([xshift=-2pt,yshift=-0pt]eq6.east) {$\frac{f(s_u|t_v)}{\sum_{i=0}^{l}f(s_u|t_i)}$};
\node [anchor=south west,inner sep=2pt] (label1) at ([yshift=1em]eq1.north west) {\footnotesize{\textbf{新的参数值}}};
\node [anchor=south east,inner sep=2pt] (label2) at ([yshift=1em,xshift=-5em]eq7.north east) {\footnotesize{\textbf{旧的参数值}}};
\draw [<-,thick] (label1.south) .. controls +(south:1em) and +(north:1em) .. ([xshift=-1em]eq1.north);
\draw [<-,thick] (label2.south) .. controls +(300:1em) and +(north:1em) .. ([xshift=1em]eq7.north);
\draw [<-,thick] ([xshift=-0.5em]label2.south) .. controls +(240:1em) and +(north:1em) .. ([xshift=1em]eq5.north);
\end{tikzpicture}
%---------------------------------------------------------------------
%%% outline
%-------------------------------------------------------------------------
\begin{tikzpicture}
\node [anchor=west,inner sep=2pt,minimum height=2em] (eq1) at (0,0) {$f(s_u|t_v)$};
\node [anchor=west,inner sep=2pt] (eq2) at ([xshift=-2pt]eq1.east) {$=$};
\node [anchor=west,inner sep=2pt,minimum height=2em] (eq3) at ([xshift=-2pt]eq2.east) {$\lambda_{t_v}^{-1}$};
\node [anchor=west,inner sep=2pt,minimum height=3.0em] (eq4) at ([xshift=-3pt]eq3.east) {\footnotesize{$\frac{\epsilon}{(l+1)^{m}} \prod\limits_{j=1}^{m} \sum\limits_{i=0}^{l} f(s_j|t_i)$}};
\node [anchor=west,inner sep=2pt,minimum height=3.0em] (eq5) at ([xshift=1pt]eq4.east) {\footnotesize{$\sum\limits_{j=1}^{m} \delta(s_j,s_u) \sum\limits_{i=0}^{l} \delta(t_i,t_v)$}};
\node [anchor=west,inner sep=2pt,minimum height=3.0em] (eq6) at ([xshift=1pt]eq5.east) {$\frac{f(s_u|t_v)}{\sum_{i=0}^{l}f(s_u|t_i)}$};
{
\node [anchor=west,inner sep=2pt,fill=red!20,minimum height=3.0em] (eq4) at ([xshift=-3pt]eq3.east) {\footnotesize{$\frac{\epsilon}{(l+1)^{m}} \prod\limits_{j=1}^{m} \sum\limits_{i=0}^{l} f(s_j|t_i)$}};
}
{
\node [anchor=west,inner sep=2pt,fill=blue!20,minimum height=3.0em] (eq5) at ([xshift=1pt]eq4.east) {\footnotesize{$\sum\limits_{j=1}^{m} \delta(s_j,s_u) \sum\limits_{i=0}^{l} \delta(t_i,t_v)$}};
}
{
\node [anchor=west,inner sep=2pt,fill=green!20,minimum height=3.0em] (eq6) at ([xshift=1pt]eq5.east) {$\frac{f(s_u|t_v)}{\sum_{i=0}^{l}f(s_u|t_i)}$};
}
{
\node [anchor=south west,inner sep=2pt] (label1) at (eq4.north west) {\textbf{\scriptsize{翻译概率$\textrm{P}(s|t)$}}};
}
{
\node [anchor=south west,inner sep=2pt] (label2) at (eq5.north west) {\textbf{\scriptsize{配对的总次数}}};
\node [anchor=south west,inner sep=2pt] (label2part2) at ([yshift=-3pt]label2.north west) {\textbf{\scriptsize{$(s_u,t_v)$在句对$(s,t)$}}};
}
{
\node [anchor=south west,inner sep=2pt] (label3) at (eq6.north west) {\textbf{\scriptsize{有的$t_i$的相对值}}};
\node [anchor=south west,inner sep=2pt] (label4) at ([yshift=-3pt]label3.north west) {\textbf{\scriptsize{$f(s_u|t_v)$对于所}}};
}
{
\node [anchor=east,rotate=90] (neweq1) at ([yshift=-0em]eq4.south) {=};
\node [anchor=north,inner sep=1pt] (neweq1full) at (neweq1.west) {\large{$\textrm{P}(s|t)$}};
}
{
\draw[decorate,thick,decoration={brace,amplitude=5pt,mirror}] ([yshift=-0.2em]eq5.south west) -- ([yshift=-0.2em]eq6.south east) node [pos=0.4,below,xshift=-0.0em,yshift=-0.3em] (expcount1) {\footnotesize{\textbf{'$t_v$翻译为$s_u$'这个事件}}};
\node [anchor=north west] (expcount2) at ([yshift=0.5em]expcount1.south west) {\footnotesize{\textbf{出现次数的期望的估计}}};
\node [anchor=north west] (expcount3) at ([yshift=0.5em]expcount2.south west) {\footnotesize{\textbf{称之为期望频次expected count}}};
}
\end{tikzpicture}
%%% outline
%-------------------------------------------------------------------------
\begin{tikzpicture}
\node [anchor=west,inner sep=2pt] (eq1) at (0,0) {$f(s_u|t_v)$};
\node [anchor=west] (eq2) at (eq1.east) {$=$\ };
\draw [-] ([xshift=0.3em]eq2.east) -- ([xshift=11.6em]eq2.east);
\node [anchor=south west] (eq3) at ([xshift=1em]eq2.east) {$\sum_{i=1}^{N} c_{\mathbb{E}}(s_u|t_v;s^{[i]},t^{[i]})$};
\node [anchor=north west] (eq4) at (eq2.east) {$\sum_{s_u} \sum_{i=1}^{N} c_{\mathbb{E}}(s_u|t_v;s^{[i]},t^{[i]})$};
{
\node [anchor=south] (label1) at ([yshift=-6em,xshift=3em]eq1.north west) {利用这个公式计算};
\node [anchor=north west] (label1part2) at ([yshift=0.3em]label1.south west) {新的$f(s_u|t_v)$};
}
{
\node [anchor=west] (label2) at ([xshift=5em]label1.east) {用当前的$f(s_u|t_v)$};
\node [anchor=north west] (label2part2) at ([yshift=0.3em]label2.south west) {计算期望频次$c_{\mathbb{E}}(\cdot)$};
}
{
\node [anchor=west,fill=red!20,inner sep=2pt] (eq1) at (0,0) {$f(s_u|t_v)$};
}
\begin{pgfonlayer}{background}
{
\node[rectangle,fill=blue!20,inner sep=0] [fit = (eq3) (eq4)] (c) {};
}
{
\node[rectangle,draw,red,inner sep=0] [fit = (label1) (label1part2)] (flabel) {};
}
{
\node[rectangle,draw,ublue,inner sep=0] [fit = (label2) (label2part2)] (clabel) {};
}
\end{pgfonlayer}
{
\draw [->,thick] (eq1.south) ..controls +(south:1.5em) and +(north:1.5em).. (flabel.north);
}
{
\draw [->,thick] (c.south) ..controls +(south:1.0em) and +(north:1.0em).. (clabel.north);
}
{
\draw [->,thick] ([yshift=1em]flabel.east) -- ([yshift=1em]clabel.west);
\draw [<-,thick] ([yshift=-1em]flabel.east) -- ([yshift=-1em]clabel.west) node [pos=0.5,above,yshift=0.3em] {\footnotesize{\textbf{反复执行}}};
}
\end{tikzpicture}
\definecolor{ublue}{rgb}{0.152,0.250,0.545}
\definecolor{ugreen}{rgb}{0,0.5,0}
%%% outline
%-------------------------------------------------------------------------
\begin{tikzpicture}
\node [anchor=north west] (line1) at (0,0) {\textbf{IBM模型1的训练(EM算法)}};
\node [anchor=north west] (line2) at ([yshift=-0.3em]line1.south west) {输入: 平行语料${(s^{[1]},t^{[1]}),...,(s^{[N]},t^{[N]})}$};
\node [anchor=north west] (line3) at ([yshift=-0.1em]line2.south west) {输出: 参数$f(\cdot|\cdot)$的最优值};
\node [anchor=north west] (line4) at ([yshift=-0.1em]line3.south west) {1: \textbf{Function} \textsc{TrainItWithEM}($\{(s^{[1]},t^{[1]}),...,(s^{[N]},t^{[N]})\}$) };
\node [anchor=north west] (line5) at ([yshift=-0.1em]line4.south west) {2: \ \ Initialize $f(\cdot|\cdot)$ \hspace{5em} $\rhd$ 比如给$f(\cdot|\cdot)$一个均匀分布};
\node [anchor=north west] (line6) at ([yshift=-0.1em]line5.south west) {3: \ \ Loop until $f(\cdot|\cdot)$ converges};
\node [anchor=north west] (line7) at ([yshift=-0.1em]line6.south west) {4: \ \ \ \ \textbf{foreach} $k = 1$ to $N$ \textbf{do}};
\node [anchor=north west] (line8) at ([yshift=-0.1em]line7.south west) {5: \ \ \ \ \ \ \ \footnotesize{$c_{\mathbb{E}}(s_u|t_v;s^{[k]},t^{[k]}) = \sum\limits_{j=1}^{|s^{[k]}|} \delta(s_j,s_u) \sum\limits_{i=0}^{|t^{[k]}|} \delta(t_i,t_v) \cdot \frac{f(s_u|t_v)}{\sum_{i=0}^{l}f(s_u|t_i)}$}\normalsize{}};
\node [anchor=north west] (line9) at ([yshift=-0.1em]line8.south west) {6: \ \ \ \ \textbf{foreach} $t_v$ appears at least one of $\{t^{[1]},...,t^{[N]}\}$ \textbf{do}};
\node [anchor=north west] (line10) at ([yshift=-0.1em]line9.south west) {7: \ \ \ \ \ \ \ $\lambda_{t_v}^{'} = \sum_{s_u} \sum_{k=1}^{N} c_{\mathbb{E}}(s_u|t_v;s^{[k]},t^{[k]})$};
\node [anchor=north west] (line11) at ([yshift=-0.1em]line10.south west) {8: \ \ \ \ \ \ \ \textbf{foreach} $s_u$ appears at least one of $\{s^{[1]},...,s^{[N]}\}$ \textbf{do}};
\node [anchor=north west] (line12) at ([yshift=-0.1em]line11.south west) {9: \ \ \ \ \ \ \ \ \ $f(s_u|t_v) = \sum_{k=1}^{N} c_{\mathbb{E}}(s_u|t_v;s^{[k]},t^{[k]}) \cdot (\lambda_{t_v}^{'})^{-1}$};
\node [anchor=north west] (line13) at ([yshift=-0.1em]line12.south west) {10: \ \textbf{return} $f(\cdot|\cdot)$};
\begin{pgfonlayer}{background}
{
\node[rectangle,draw=ublue, inner sep=0mm] [fit =(line1)(line2)(line3)(line4)(line5)(line6)(line7)(line11)(line8)(line9)(line13)] {};
}
\end{pgfonlayer}
\end{tikzpicture}
%%% outline
%-------------------------------------------------------------------------
\definecolor{ublue}{rgb}{0.152,0.250,0.545}
\begin{tikzpicture}
\begin{scope}
\node [anchor=west] (s1) at (0,0) {\textbf{我}};
\node [anchor=west] (s2) at ([xshift=3em]s1.east) {\textbf{对}};
\node [anchor=west] (s3) at ([xshift=3em]s2.east) {\textbf{你}};
\node [anchor=west] (s4) at ([xshift=3em]s3.east) {\textbf{表示}};
\node [anchor=west] (s5) at ([xshift=3em]s4.east) {\textbf{满意}};
\node [anchor=south west] (sentlabel) at ([yshift=-0.5em]s1.north west) {\scriptsize{\textbf{\color{red}{待翻译句子(已经分词):}}}};
{
\draw [->,very thick,ublue] (s1.south) -- ([yshift=-0.7em]s1.south);
\draw [->,very thick,ublue] (s2.south) -- ([yshift=-0.7em]s2.south);
\draw [->,very thick,ublue] (s3.south) -- ([yshift=-0.7em]s3.south);
\draw [->,very thick,ublue] (s4.south) -- ([yshift=-0.7em]s4.south);
\draw [->,very thick,ublue] (s5.south) -- ([yshift=-0.7em]s5.south);
{\small
\node [anchor=north,inner sep=2pt,fill=red!20,minimum height=1.5em,minimum width=2.5em] (t11) at ([yshift=-1em]s1.south) {I};
\node [anchor=north,inner sep=2pt,fill=red!20,minimum height=1.5em,minimum width=2.5em] (t12) at ([yshift=-0.8em]t11.south) {me};
\node [anchor=north,inner sep=2pt,fill=red!20,minimum height=1.5em,minimum width=2.5em] (t13) at ([yshift=-0.8em]t12.south) {I'm};
\node [anchor=north west,inner sep=1pt,fill=black] (tl11) at (t11.north west) {\tiny{{\color{white} \textbf{1}}}};
\node [anchor=north west,inner sep=1pt,fill=black] (tl12) at (t12.north west) {\tiny{{\color{white} \textbf{1}}}};
\node [anchor=north west,inner sep=1pt,fill=black] (tl13) at (t13.north west) {\tiny{{\color{white} \textbf{1}}}};
\node [anchor=north,inner sep=2pt,fill=green!20,minimum height=1.5em,minimum width=2.5em] (t21) at ([yshift=-1em]s2.south) {to};
\node [anchor=north,inner sep=2pt,fill=green!20,minimum height=1.5em,minimum width=2.5em] (t22) at ([yshift=-0.8em]t21.south) {with};
\node [anchor=north,inner sep=2pt,fill=green!20,minimum height=1.5em,minimum width=2.5em] (t23) at ([yshift=-0.8em]t22.south) {for};
\node [anchor=north west,inner sep=1pt,fill=black] (tl21) at (t21.north west) {\tiny{{\color{white} \textbf{2}}}};
\node [anchor=north west,inner sep=1pt,fill=black] (tl22) at (t22.north west) {\tiny{{\color{white} \textbf{2}}}};
\node [anchor=north west,inner sep=1pt,fill=black] (tl23) at (t23.north west) {\tiny{{\color{white} \textbf{2}}}};
\node [anchor=north,inner sep=2pt,fill=blue!20,minimum height=1.5em,minimum width=2.5em] (t31) at ([yshift=-1em]s3.south) {you};
\node [anchor=north west,inner sep=1pt,fill=black] (tl31) at (t31.north west) {\tiny{{\color{white} \textbf{3}}}};
\node [anchor=north,inner sep=2pt,fill=orange!20,minimum height=1.5em,minimum width=3em] (t41) at ([yshift=-1em]s4.south) {$\phi$};
\node [anchor=north,inner sep=2pt,fill=orange!20,minimum height=1.5em,minimum width=3em] (t42) at ([yshift=-0.8em]t41.south) {show};
\node [anchor=north west,inner sep=1pt,fill=black] (tl41) at (t41.north west) {\tiny{{\color{white} \textbf{4}}}};
\node [anchor=north west,inner sep=1pt,fill=black] (tl42) at (t42.north west) {\tiny{{\color{white} \textbf{4}}}};
\node [anchor=north,inner sep=2pt,fill=purple!20,minimum height=1.5em,minimum width=4.5em] (t51) at ([yshift=-1em]s5.south) {satisfy};
\node [anchor=north,inner sep=2pt,fill=purple!20,minimum height=1.5em,minimum width=4.5em] (t52) at ([yshift=-0.8em]t51.south) {satisfied};
\node [anchor=north,inner sep=2pt,fill=purple!20,minimum height=1.5em,minimum width=4.5em] (t53) at ([yshift=-0.8em]t52.south) {satisfies};
\node [anchor=north west,inner sep=1pt,fill=black] (tl51) at (t51.north west) {\tiny{{\color{white} \textbf{5}}}};
\node [anchor=north west,inner sep=1pt,fill=black] (tl52) at (t52.north west) {\tiny{{\color{white} \textbf{5}}}};
\node [anchor=north west,inner sep=1pt,fill=black] (tl53) at (t53.north west) {\tiny{{\color{white} \textbf{5}}}};
}
}
\end{scope}
\begin{scope}
{\small
{
\node [anchor=west,inner sep=2pt,fill=red!20,minimum height=1.5em,minimum width=2.5em] (ft11) at ([yshift=-1.2in]t11.west) {I'm};
\node [anchor=center,inner sep=2pt,fill=purple!20,minimum height=1.5em,minimum width=5em] (ft12) at ([xshift=6.0em]ft11.center) {satisfied};
\node [anchor=center,inner sep=2pt,fill=green!20,minimum height=1.5em,minimum width=2.5em] (ft13) at ([xshift=6.0em]ft12.center) {with};
\node [anchor=center,inner sep=2pt,fill=blue!20,minimum height=1.5em,minimum width=2.5em] (ft14) at ([xshift=5.0em]ft13.center) {you};
}
{
\node [anchor=north west,inner sep=1pt,fill=black] (ftl11) at (ft11.north west) {\tiny{{\color{white} \textbf{1}}}};
\node [anchor=north west,inner sep=1pt,fill=black] (ftl12) at (ft12.north west) {\tiny{{\color{white} \textbf{5}}}};
\node [anchor=north west,inner sep=1pt,fill=black] (ftl13) at (ft13.north west) {\tiny{{\color{white} \textbf{2}}}};
\node [anchor=north west,inner sep=1pt,fill=black] (ftl14) at (ft14.north west) {\tiny{{\color{white} \textbf{3}}}};
}
{
\draw [->,thick] ([yshift=-0.1em]t13.south) -- ([yshift=0.1em]ft11.north);
\draw [->,thick] ([yshift=0.1em]t22.south east) ..controls +(280:3em) and +(north:3em).. ([yshift=0.1em]ft13.north);
\draw [->,thick] ([yshift=-0.1em,xshift=0.2em]t31.south west) ..controls +(south:3em) and +(north:3em).. ([yshift=0.1em,xshift=0.2em]ft14.north west);
\draw [->,thick] ([yshift=0.1em]t52.south west) ..controls +(250:4em) and +(north:4em).. ([yshift=0.1em]ft12.north);
\node [anchor=east,inner sep=1pt] (nulltranslabel) at (t42.south west) {\scriptsize{\textbf{翻空}}};
\draw [->,thick] ([yshift=0.1em]t41.south west) ..controls +(250:1em) and +(north:1em).. (nulltranslabel.north);
}
}
\end{scope}
\begin{scope}
{
\node [anchor=north west] (label1) at (ft11.south west) {\small{选择最佳单词翻译,调整词序,得到完美的结果}};
}
{
\draw[decorate,thick,decoration={brace,amplitude=5pt,mirror}] ([yshift=8em,xshift=-0.5em]t13.south west) -- ([xshift=-0.5em]t13.south west) node [pos=0.5,left,xshift=-0.5em,yshift=0.5em] (label2) {\footnotesize{\textbf{学习到的}}};
\node [anchor=north west] (label2part2) at ([yshift=0.3em]label2.south west) {\footnotesize{\textbf{单词翻译}}};
}
{
\draw[decorate,thick,decoration={brace,amplitude=5pt,mirror}] ([yshift=-0.2em,xshift=-0.5em]t13.south west) -- ([yshift=-5em,xshift=-0.5em]t13.south west) node [pos=0.5,left,xshift=-0.5em,yshift=0.5em] (label3) {\footnotesize{\textbf{运用知识}}};
\node [anchor=north west] (label3part2) at ([yshift=0.3em]label3.south west) {\footnotesize{\textbf{生成译文}}};
}
\end{scope}
\end{tikzpicture}
%---------------------------------------------------------------------
\begin{tikzpicture}
\begin{scope}
\node [anchor=west] (s1) at (0,0) {\textbf{我}};
\node [anchor=west] (s2) at ([xshift=3em]s1.east) {\textbf{对}};
\node [anchor=west] (s3) at ([xshift=3em]s2.east) {\textbf{你}};
\node [anchor=west] (s4) at ([xshift=3em]s3.east) {\textbf{表示}};
\node [anchor=west] (s5) at ([xshift=3em]s4.east) {\textbf{满意}};
\node [anchor=south west] (sentlabel) at ([yshift=-0.5em]s1.north west) {\scriptsize{\textbf{\color{red}{待翻译句子(已经分词):}}}};
\draw [->,very thick,ublue] (s1.south) -- ([yshift=-0.7em]s1.south);
\draw [->,very thick,ublue] (s2.south) -- ([yshift=-0.7em]s2.south);
\draw [->,very thick,ublue] (s3.south) -- ([yshift=-0.7em]s3.south);
\draw [->,very thick,ublue] (s4.south) -- ([yshift=-0.7em]s4.south);
\draw [->,very thick,ublue] (s5.south) -- ([yshift=-0.7em]s5.south);
{\small
\node [anchor=north,inner sep=2pt,fill=red!20,minimum height=1.5em,minimum width=2.5em] (t11) at ([yshift=-1em]s1.south) {I};
\node [anchor=north,inner sep=2pt,fill=red!20,minimum height=1.5em,minimum width=2.5em] (t12) at ([yshift=-0.8em]t11.south) {me};
\node [anchor=north,inner sep=2pt,fill=red!20,minimum height=1.5em,minimum width=2.5em] (t13) at ([yshift=-0.8em]t12.south) {I'm};
\node [anchor=north west,inner sep=1pt,fill=black] (tl11) at (t11.north west) {\tiny{{\color{white} \textbf{1}}}};
\node [anchor=north west,inner sep=1pt,fill=black] (tl12) at (t12.north west) {\tiny{{\color{white} \textbf{1}}}};
\node [anchor=north west,inner sep=1pt,fill=black] (tl13) at (t13.north west) {\tiny{{\color{white} \textbf{1}}}};
\node [anchor=north,inner sep=2pt,fill=green!20,minimum height=1.5em,minimum width=2.5em] (t21) at ([yshift=-1em]s2.south) {to};
\node [anchor=north,inner sep=2pt,fill=green!20,minimum height=1.5em,minimum width=2.5em] (t22) at ([yshift=-0.8em]t21.south) {with};
\node [anchor=north,inner sep=2pt,fill=green!20,minimum height=1.5em,minimum width=2.5em] (t23) at ([yshift=-0.8em]t22.south) {for};
\node [anchor=north west,inner sep=1pt,fill=black] (tl21) at (t21.north west) {\tiny{{\color{white} \textbf{2}}}};
\node [anchor=north west,inner sep=1pt,fill=black] (tl22) at (t22.north west) {\tiny{{\color{white} \textbf{2}}}};
\node [anchor=north west,inner sep=1pt,fill=black] (tl23) at (t23.north west) {\tiny{{\color{white} \textbf{2}}}};
\node [anchor=north,inner sep=2pt,fill=blue!20,minimum height=1.5em,minimum width=2.5em] (t31) at ([yshift=-1em]s3.south) {you};
\node [anchor=north west,inner sep=1pt,fill=black] (tl31) at (t31.north west) {\tiny{{\color{white} \textbf{3}}}};
\node [anchor=north,inner sep=2pt,fill=orange!20,minimum height=1.5em,minimum width=3em] (t41) at ([yshift=-1em]s4.south) {$\phi$};
\node [anchor=north,inner sep=2pt,fill=orange!20,minimum height=1.5em,minimum width=3em] (t42) at ([yshift=-0.8em]t41.south) {show};
\node [anchor=north west,inner sep=1pt,fill=black] (tl41) at (t41.north west) {\tiny{{\color{white} \textbf{4}}}};
\node [anchor=north west,inner sep=1pt,fill=black] (tl42) at (t42.north west) {\tiny{{\color{white} \textbf{4}}}};
\node [anchor=north,inner sep=2pt,fill=purple!20,minimum height=1.5em,minimum width=4.5em] (t51) at ([yshift=-1em]s5.south) {satisfy};
\node [anchor=north,inner sep=2pt,fill=purple!20,minimum height=1.5em,minimum width=4.5em] (t52) at ([yshift=-0.8em]t51.south) {satisfied};
\node [anchor=north,inner sep=2pt,fill=purple!20,minimum height=1.5em,minimum width=4.5em] (t53) at ([yshift=-0.8em]t52.south) {satisfies};
\node [anchor=north west,inner sep=1pt,fill=black] (tl51) at (t51.north west) {\tiny{{\color{white} \textbf{5}}}};
\node [anchor=north west,inner sep=1pt,fill=black] (tl52) at (t52.north west) {\tiny{{\color{white} \textbf{5}}}};
\node [anchor=north west,inner sep=1pt,fill=black] (tl53) at (t53.north west) {\tiny{{\color{white} \textbf{5}}}};
}
{\tiny
{
\node [anchor=north,rotate=90,inner sep=1pt,minimum width=2.55em,fill=black] (pt11) at (t11.east) {{\color{white} \textbf{P=.4}}};
\node [anchor=north,rotate=90,inner sep=1pt,minimum width=2.55em,fill=black] (pt12) at (t12.east) {{\color{white} \textbf{P=.2}}};
\node [anchor=north,rotate=90,inner sep=1pt,minimum width=2.55em,fill=black] (pt13) at (t13.east) {{\color{white} \textbf{P=.4}}};
\node [anchor=north,rotate=90,inner sep=1pt,minimum width=2.55em,fill=black] (pt21) at (t21.east) {{\color{white} \textbf{P=.4}}};
\node [anchor=north,rotate=90,inner sep=1pt,minimum width=2.55em,fill=black] (pt22) at (t22.east) {{\color{white} \textbf{P=.3}}};
\node [anchor=north,rotate=90,inner sep=1pt,minimum width=2.55em,fill=black] (pt23) at (t23.east) {{\color{white} \textbf{P=.3}}};
\node [anchor=north,rotate=90,inner sep=1pt,minimum width=2.55em,fill=black] (pt31) at (t31.east) {{\color{white} \textbf{P=1}}};
\node [anchor=north,rotate=90,inner sep=1pt,minimum width=2.55em,fill=black] (pt41) at (t41.east) {{\color{white} \textbf{P=.5}}};
\node [anchor=north,rotate=90,inner sep=1pt,minimum width=2.55em,fill=black] (pt42) at (t42.east) {{\color{white} \textbf{P=.5}}};
\node [anchor=north,rotate=90,inner sep=1pt,minimum width=2.55em,fill=black] (pt51) at (t51.east) {{\color{white} \textbf{P=.5}}};
\node [anchor=north,rotate=90,inner sep=1pt,minimum width=2.55em,fill=black] (pt52) at (t52.east) {{\color{white} \textbf{P=.4}}};
\node [anchor=north,rotate=90,inner sep=1pt,minimum width=2.55em,fill=black] (pt53) at (t53.east) {{\color{white} \textbf{P=.1}}};
}
}
\end{scope}
\begin{scope}
{\small
\node [anchor=west,inner sep=2pt,fill=red!20,minimum height=1.5em,minimum width=2.5em] (ft11) at ([yshift=-1.4in]t11.west) {I'm};
\node [anchor=center,inner sep=2pt,fill=purple!20,minimum height=1.5em,minimum width=5em] (ft12) at ([xshift=6.0em]ft11.center) {satisfied};
\node [anchor=center,inner sep=2pt,fill=green!20,minimum height=1.5em,minimum width=2.5em] (ft13) at ([xshift=6.0em]ft12.center) {with};
\node [anchor=center,inner sep=2pt,fill=blue!20,minimum height=1.5em,minimum width=2.5em] (ft14) at ([xshift=5.0em]ft13.center) {you};
{
\node [anchor=west,inner sep=2pt,fill=red!20,minimum height=1.5em,minimum width=2.5em] (ft21) at ([yshift=-3em]ft11.west) {I'm};
\node [anchor=center,inner sep=2pt,fill=purple!20,minimum height=1.5em,minimum width=5em] (ft22) at ([xshift=6.0em]ft21.center) {satisfy};
\node [anchor=center,inner sep=2pt,fill=green!20,minimum height=1.5em,minimum width=2.5em] (ft23) at ([xshift=6.0em]ft22.center) {to};
\node [anchor=center,inner sep=2pt,fill=blue!20,minimum height=1.5em,minimum width=2.5em] (ft24) at ([xshift=5.0em]ft23.center) {you};
}
{
\node [anchor=west,inner sep=2pt,fill=red!20,minimum height=1.5em,minimum width=2.5em] (ft31) at ([yshift=-3em]ft21.west) {I'm};
\node [anchor=center,inner sep=2pt,fill=purple!20,minimum height=1.5em,minimum width=5em] (ft32) at ([xshift=6.0em]ft31.center) {satisfy};
\node [anchor=center,inner sep=2pt,fill=blue!20,minimum height=1.5em,minimum width=2.5em] (ft33) at ([xshift=6.0em]ft32.center) {you};
\node [anchor=center,inner sep=2pt,fill=green!20,minimum height=1.5em,minimum width=2.5em] (ft34) at ([xshift=5.0em]ft33.center) {to};
}
\node [anchor=north west,inner sep=1pt,fill=black] (ftl11) at (ft11.north west) {\tiny{{\color{white} \textbf{1}}}};
\node [anchor=north west,inner sep=1pt,fill=black] (ftl12) at (ft12.north west) {\tiny{{\color{white} \textbf{5}}}};
\node [anchor=north west,inner sep=1pt,fill=black] (ftl13) at (ft13.north west) {\tiny{{\color{white} \textbf{2}}}};
\node [anchor=north west,inner sep=1pt,fill=black] (ftl14) at (ft14.north west) {\tiny{{\color{white} \textbf{3}}}};
{
\node [anchor=north west,inner sep=1pt,fill=black] (ftl21) at (ft21.north west) {\tiny{{\color{white} \textbf{1}}}};
\node [anchor=north west,inner sep=1pt,fill=black] (ftl22) at (ft22.north west) {\tiny{{\color{white} \textbf{5}}}};
\node [anchor=north west,inner sep=1pt,fill=black] (ftl23) at (ft23.north west) {\tiny{{\color{white} \textbf{2}}}};
\node [anchor=north west,inner sep=1pt,fill=black] (ftl24) at (ft24.north west) {\tiny{{\color{white} \textbf{3}}}};
}
{
\node [anchor=north west,inner sep=1pt,fill=black] (ftl31) at (ft31.north west) {\tiny{{\color{white} \textbf{1}}}};
\node [anchor=north west,inner sep=1pt,fill=black] (ftl32) at (ft32.north west) {\tiny{{\color{white} \textbf{5}}}};
\node [anchor=north west,inner sep=1pt,fill=black] (ftl33) at (ft33.north west) {\tiny{{\color{white} \textbf{3}}}};
\node [anchor=north west,inner sep=1pt,fill=black] (ftl34) at (ft34.north west) {\tiny{{\color{white} \textbf{2}}}};
}
{\tiny
{
\node [anchor=north,rotate=90,inner sep=1pt,minimum width=2.5em,fill=black] (pft11) at (ft11.east) {{\color{white} \textbf{P=.4}}};
\node [anchor=north,rotate=90,inner sep=1pt,minimum width=2.5em,fill=black] (pft12) at (ft12.east) {{\color{white} \textbf{P=.4}}};
\node [anchor=north,rotate=90,inner sep=1pt,minimum width=2.5em,fill=black] (pft13) at (ft13.east) {{\color{white} \textbf{P=.3}}};
\node [anchor=north,rotate=90,inner sep=1pt,minimum width=2.5em,fill=black] (pft14) at (ft14.east) {{\color{white} \textbf{P=1}}};
\node [anchor=north,rotate=90,inner sep=1pt,minimum width=2.5em,fill=black] (pft21) at (ft21.east) {{\color{white} \textbf{P=.4}}};
\node [anchor=north,rotate=90,inner sep=1pt,minimum width=2.5em,fill=black] (pft22) at (ft22.east) {{\color{white} \textbf{P=.1}}};
\node [anchor=north,rotate=90,inner sep=1pt,minimum width=2.5em,fill=black] (pft23) at (ft23.east) {{\color{white} \textbf{P=.4}}};
\node [anchor=north,rotate=90,inner sep=1pt,minimum width=2.5em,fill=black] (pft24) at (ft24.east) {{\color{white} \textbf{P=1}}};
\node [anchor=north,rotate=90,inner sep=1pt,minimum width=2.5em,fill=black] (pft31) at (ft31.east) {{\color{white} \textbf{P=.4}}};
\node [anchor=north,rotate=90,inner sep=1pt,minimum width=2.5em,fill=black] (pft32) at (ft32.east) {{\color{white} \textbf{P=.1}}};
\node [anchor=north,rotate=90,inner sep=1pt,minimum width=2.5em,fill=black] (pft33) at (ft33.east) {{\color{white} \textbf{P=1}}};
\node [anchor=north,rotate=90,inner sep=1pt,minimum width=2.5em,fill=black] (pft34) at (ft34.east) {{\color{white} \textbf{P=.4}}};
}
}
\begin{pgfonlayer}{background}
\node[rectangle,draw=ublue,red,inner sep=0.1em,fill=white] [fit = (ft11) (pft14)] (trans1) {};
{
\node[rectangle,draw=ublue,ublue,inner sep=0.1em,fill=white] [fit = (ft21) (pft24)] (trans1) {};
}
{
\node[rectangle,draw=ublue,ublue,inner sep=0.1em,fill=white] [fit = (ft31) (pft34)] (trans1) {};
}
\end{pgfonlayer}
{
\node [anchor=west,inner sep=2pt,minimum height=1.5em,minimum width=2.5em] (ft41) at ([yshift=-2em]ft31.west) {...};
}
{
\node [anchor=west,inner sep=2pt,minimum height=1.5em,minimum width=2.5em] (ft42) at ([yshift=-2em]ft32.west) {\scriptsize{\textbf{所有翻译单元都是概率化的}}};
\node [anchor=west,inner sep=1pt,fill=black] (ft43) at (ft42.east) {{\color{white} \tiny{\textbf{P=概率}}}};
}
}
\end{scope}
\begin{scope}
{\small
\node [anchor=east] (label4) at ([yshift=0.8em]ft11.west) {翻译就是一条};
\node [anchor=north west] (label4part2) at ([yshift=0.7em]label4.south west) {译文选择路径};
}
{\small
\node [anchor=east] (label5) at ([yshift=0.4em]ft21.west) {不同的译文对};
\node [anchor=north west] (label5part2) at ([yshift=0.7em]label5.south west) {应不同的路径};
}
{\small
\node [anchor=east] (label6) at ([yshift=0.4em]ft31.west) {单词翻译的词};
\node [anchor=north west] (label6part2) at ([yshift=0.7em]label6.south west) {序也可能不同};
}
{\small
\node [anchor=east] (label7) at ([yshift=0.4em]ft41.west) {可能的翻译路};
\node [anchor=north west] (label7part2) at ([yshift=0.7em]label7.south west) {径非常多};
}
\end{scope}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\begin{scope}
{
\draw[decorate,thick,decoration={brace,amplitude=5pt}] ([yshift=8em,xshift=1.5em]t53.south east) -- ([xshift=1.5em]t53.south east) node [pos=0.5,right,xshift=0.5em,yshift=2.0em] (label2) {\footnotesize{\textbf{从双语数}}};
\node [anchor=north west] (label2part2) at ([yshift=0.3em]label2.south west) {\footnotesize{\textbf{据中自动}}};
\node [anchor=north west] (label2part3) at ([yshift=0.3em]label2part2.south west) {\footnotesize{\textbf{学习词典}}};
\node [anchor=north west] (label2part4) at ([yshift=0.3em]label2part3.south west) {\footnotesize{\textbf{(训练)}}};
}
{
\draw[decorate,thick,decoration={brace,amplitude=5pt}] ([yshift=-1.0em,xshift=5.7em]t53.south west) -- ([yshift=-10.5em,xshift=5.7em]t53.south west) node [pos=0.5,right,xshift=0.5em,yshift=2.0em] (label3) {\footnotesize{\textbf{利用概率}}};
\node [anchor=north west] (label3part2) at ([yshift=0.3em]label3.south west) {\footnotesize{\textbf{化的词典}}};
\node [anchor=north west] (label3part3) at ([yshift=0.3em]label3part2.south west) {\footnotesize{\textbf{进行翻译}}};
\node [anchor=north west] (label3part4) at ([yshift=0.3em]label3part3.south west) {\footnotesize{\textbf{(解码)}}};
}
\end{scope}
\begin{scope}
{
\node [anchor=west] (score1) at ([xshift=1.5em]ft14.east) {\footnotesize{P=0.042}};
\node [anchor=west] (score2) at ([xshift=1.5em]ft24.east) {\footnotesize{P=0.006}};
\node [anchor=west] (score3) at ([xshift=1.5em]ft34.east) {\footnotesize{P=0.003}};
\node [anchor=south] (scorelabel) at (score1.north) {\scriptsize{\textbf{\color{red}{都赋予一个模型得分}}}};
\node [anchor=south] (scorelabel2) at ([yshift=-0.5em]scorelabel.north) {\scriptsize{\textbf{\color{red}{系统给每个译文}}}};
}
{
\node [anchor=north] (scorelabel2) at (score3.south) {\scriptsize{\textbf{选择得分}}};
\node [anchor=north west] (scorelabel2part2) at ([xshift=-0.5em,yshift=0.5em]scorelabel2.south west) {\scriptsize{\textbf{最高的译文}}};
\node [anchor=center,draw=ublue,circle,thick,fill=white,inner sep=1pt,circular drop shadow={shadow xshift=0.05em,shadow yshift=-0.05em}] (head1) at ([xshift=0.3em]score1.east) {\scriptsize{{\color{ugreen} \textbf{ok}}}};
}
\end{scope}
\begin{scope}
{
\draw [->,ultra thick,ublue,line width=2pt,opacity=0.7] ([xshift=-0.5em,yshift=-0.3em]t13.west) -- ([xshift=0.8em,yshift=-0.3em]t13.east) -- ([xshift=-0.2em,yshift=-0.3em]t21.west) -- ([xshift=0.8em,yshift=-0.3em]t21.east) -- ([xshift=-0.2em,yshift=-0.3em]t31.west) -- ([xshift=0.8em,yshift=-0.3em]t31.east) -- ([xshift=-0.2em,yshift=-0.3em]t41.west) -- ([xshift=0.8em,yshift=-0.3em]t41.east) -- ([xshift=-0.2em,yshift=-0.3em]t51.west) -- ([xshift=1.2em,yshift=-0.3em]t51.east);
}
\draw [->,ultra thick,red,line width=2pt,opacity=0.7] ([xshift=-0.5em,yshift=-0.5em]t13.west) -- ([xshift=0.8em,yshift=-0.5em]t13.east) -- ([xshift=-0.2em,yshift=-0.5em]t22.west) -- ([xshift=0.8em,yshift=-0.5em]t22.east) -- ([xshift=-0.2em,yshift=-0.5em]t31.west) -- ([xshift=0.8em,yshift=-0.5em]t31.east) -- ([xshift=-0.2em,yshift=-0.5em]t41.west) -- ([xshift=0.8em,yshift=-0.5em]t41.east) -- ([xshift=-0.2em,yshift=-0.5em]t52.west) -- ([xshift=1.2em,yshift=-0.5em]t52.east);
\end{scope}
\end{tikzpicture}
%%% outline
%-------------------------------------------------------------------------
\begin{tikzpicture}
{\scriptsize
\node [anchor=north west,inner sep=1pt] (entry1) at (0,0) {\tiny{\textbf{1:} 这 是 数据 $\leftrightarrow$ This is data}};
\node [anchor=north west,inner sep=1pt] (entry2) at ([yshift=0.1em]entry1.south west) {\tiny{\textbf{2:} 小心 !$\leftrightarrow$ Look out !}};
\node [anchor=north west,inner sep=1pt] (entry3) at ([yshift=0.1em]entry2.south west) {\tiny{\textbf{3:} 你 是 谁 $\leftrightarrow$ Who are you}};
\node [anchor=north west,inner sep=2pt] (entry4) at ([yshift=0.1em]entry3.south west) {...};
\node [anchor=south west] (corpuslabel) at (entry1.north west) {{\color{ublue} \textbf{双语平行数据}}};
\begin{pgfonlayer}{background}
\node[rectangle,draw=ublue,thick,inner sep=0.2em,fill=white,drop shadow,minimum height=1.6cm] [fit = (entry1) (entry2) (entry3) (entry4) (corpuslabel)] (corpus) {};
\end{pgfonlayer}
}
\node [anchor=west,ugreen] (P) at ([xshift=4em,yshift=-0.7em]corpus.east){P($t|s$)};
\node [anchor=south] (modellabel) at (P.north) {{\color{ublue} {\scriptsize \textbf{翻译模型}}}};
\begin{pgfonlayer}{background}
\node[rectangle,draw=ublue,thick,inner sep=0.2em,fill=white,drop shadow,minimum height=1.6cm] [fit = (P) (modellabel)] (model) {};
\end{pgfonlayer}
\draw [->,very thick,ublue] ([xshift=0.2em]corpus.east) -- ([xshift=3.2em]corpus.east) node [pos=0.5, above] {\color{red}{\scriptsize{模型学习}}};
{
\draw [->,very thick,ublue] ([xshift=0.4em]model.east) -- ([xshift=3.4em]model.east) node [inner sep=0pt,pos=0.5, above,yshift=0.3em] (decodingarrow) {\color{red}{\scriptsize{穷举\&计算}}};
{\scriptsize
\node [anchor=north west,inner sep=2pt] (sentlabel) at ([xshift=5.5em,yshift=-0.3em]model.north east) {{\color{ublue} \textbf{机器翻译引擎}}};
\node [anchor=north west] (sent) at ([yshift=-0.5em]sentlabel.south west) {\textbf{对任意句子}};
\node [anchor=north west] (sentpart2) at ([yshift=0.3em]sent.south west) {\textbf{进行翻译}};
}
}
\begin{pgfonlayer}{background}
{
\node[rectangle,draw=ublue,thick,inner sep=0.2em,fill=white,drop shadow,minimum height=1.6cm] [fit = (sentlabel) (sent) (sentpart2)] (segsystem) {};
}
\end{pgfonlayer}
\end{tikzpicture}
%---------------------------------------------------------------------
%%% outline
%-------------------------------------------------------------------------
\begin{tikzpicture}
\begin{scope}
{
\node [pos=0.5,left,xshift=-0.5em,yshift=2.0em] (label2) {Input: a word pair (x, y) and a sentence pair (s, t)};
\node [anchor=north west] (label2part2) at ([yshift=0.3em]label2.south west) {Output: the number of (x, y) in the (s, t)};
\node [anchor=north west] (label2part3) at ([yshift=0.3em]label2part2.south west) {{1:$count$ $\leftarrow$ 0}};
\node [anchor=north west] (label2part4) at ([yshift=0.3em]label2part3.south west) {{2:\textbf{for} $s\underline{\hbox to 0.1cm{}}word$ $\leftarrow$ $S_1$$S_{length(s)}$ \textbf{do} }};
\node [anchor=north west] (label2part5)at ([yshift=0.3em]label2part4.south west) {{3:\quad\textbf{for} $t\underline{\hbox to 0.1cm{}}word$ $\leftarrow$ $t_1$$t_{length(t)}$ \textbf{do} }};
\node [anchor=north west] (label2part6)at ([yshift=0.3em]label2part5.south west) {{4:\quad\quad\textbf{if} $s\underline{\hbox to 0.1cm{}}word$ == x \textbf{and} $t\underline{\hbox to 0.1cm{}}word$==y \textbf{then} }};
\node [anchor=north west] (label2part7) at ([yshift=0.3em]label2part6.south west) {{5:\quad\quad\quad\quad $count$ $\leftarrow$ $count$ + 1 }};
\node [anchor=north west] (label2part8) at ([yshift=0.3em]label2part7.south west) {{6:\quad\quad \textbf{end if} }};
\node [anchor=north west] (label2part9) at ([yshift=0.3em]label2part8.south west) {{7:\quad \textbf{end for} }};
\node [anchor=north west] (label2part10) at ([yshift=0.3em]label2part9.south west) {{8: \textbf{end for} }};
}
\end{scope}
\end{tikzpicture}
%---------------------------------------------------------------------
%%% outline
%-------------------------------------------------------------------------
\begin{tikzpicture}
\definecolor{ugreen}{rgb}{0,0.5,0}
\definecolor{ublue}{rgb}{0.152,0.250,0.545}
\node [anchor=west,draw,thick,minimum width=18.8em,minimum height=1.2em] (sent) at (0,0) {};
\node [anchor=west,draw,thick,circle,minimum size=1.6em,red] (s1) at ([xshift=0.0em,yshift=-2em]sent.south west) {};
\node [anchor=west,draw,thick,circle,minimum size=1.6em,ugreen] (s2) at ([xshift=2.7em]s1.east) {};
\node [anchor=west,draw,thick,circle,minimum size=1.6em,orange] (s3) at ([xshift=2.7em]s2.east) {};
\node [anchor=west,draw,thick,circle,minimum size=1.6em,ublue] (s4) at ([xshift=2.7em]s3.east) {};
\node [anchor=west,draw,thick,circle,minimum size=1.6em,purple] (s5) at ([xshift=2.7em]s4.east) {};
{
\node [anchor=west,draw,thick,circle,minimum size=1.6em,red,fill=red] (t1) at ([yshift=-3.5em]s1.west) {};
\node [anchor=west,draw,thick,circle,minimum size=1.6em,ugreen,fill=ugreen] (t2) at ([xshift=2.7em]t1.east) {};
\node [anchor=west,draw,thick,circle,minimum size=1.6em,orange,fill=orange] (t3) at ([xshift=2.7em]t2.east) {};
\node [anchor=west,draw,thick,circle,minimum size=1.6em,ublue,fill=ublue] (t4) at ([xshift=2.7em]t3.east) {};
\node [anchor=west,draw,thick,circle,minimum size=1.6em,purple,fill=purple] (t5) at ([xshift=2.7em]t4.east) {};
}
{
\node [anchor=west,draw,thick,circle,minimum size=1.6em,red,fill=red] (ft1) at ([yshift=-3.5em]t1.west) {};
\node [anchor=west,draw,thick,circle,minimum size=1.6em,ublue,fill=ublue] (ft2) at ([xshift=2.7em]ft1.east) {};
\node [anchor=west,draw,thick,circle,minimum size=1.6em,purple,fill=purple] (ft3) at ([xshift=2.7em]ft2.east) {};
\node [anchor=west,draw,thick,circle,minimum size=1.6em,ugreen,fill=ugreen] (ft4) at ([xshift=2.7em]ft3.east) {};
\node [anchor=west,draw,thick,circle,minimum size=1.6em,orange,fill=orange] (ft5) at ([xshift=2.7em]ft4.east) {};
}
\draw [->,thick,double] ([yshift=-0.1em]sent.south) -- ([yshift=-.8em]sent.south);
{
\draw [->,thick] ([yshift=-0.1em]s1.south) -- ([yshift=0.1em]t1.north);
\draw [->,thick] ([yshift=-0.1em]s2.south) -- ([yshift=0.1em]t2.north);
\draw [->,thick] ([yshift=-0.1em]s3.south) -- ([yshift=0.1em]t3.north);
\draw [->,thick] ([yshift=-0.1em]s4.south) -- ([yshift=0.1em]t4.north);
\draw [->,thick] ([yshift=-0.1em]s5.south) -- ([yshift=0.1em]t5.north);
}
{
\draw [->,thick] ([yshift=-0.1em]t1.south) -- ([yshift=0.1em]ft1.north);
\draw [->,thick] ([yshift=-0.1em]t2.south) -- ([yshift=0.1em]ft4.north);
\draw [->,thick] ([yshift=-0.1em]t3.south) -- ([yshift=0.1em]ft5.north);
\draw [->,thick] ([yshift=-0.1em]t4.south) -- ([yshift=0.1em]ft2.north);
\draw [->,thick] ([yshift=-0.1em]t5.south) -- ([yshift=0.1em]ft3.north);
}
{
\node [anchor=north west] (label1) at ([xshift=1.5em,yshift=0.5em]s5.east) {\textbf{分析}};
\node [anchor=north west] (label2) at ([xshift=1.5em,yshift=0.5em]t5.east) {\textbf{转换}};
\node [anchor=north west] (label3) at ([xshift=1.5em,yshift=0.5em]ft5.east) {\textbf{生成}};
}
\end{tikzpicture}
%---------------------------------------------------------------------
%%% outline
%-------------------------------------------------------------------------
\begin{tikzpicture}
\begin{scope}
{
\node [pos=0.5,left,xshift=-0.5em,yshift=2.0em] (label2) {\footnotesize{{$s^1$ = 机器\quad {\color{red}翻译}\quad\quad\quad\quad 计算机\quad\quad 进行\quad {\color{red}翻译} }}};
\node [anchor=north west] (label2part2) at ([yshift=0.3em]label2.south west) {\footnotesize{{$t^1$ = machine\; {\color{red}translation}\; is\; just\; {\color{red}translation}\; by\; computer }}};
\node [anchor=north west] (label2part3) at ([yshift=0.3em]label2part2.south west) {\footnotesize{{$s^2$ = 那\quad 人工\quad {\color{red}翻译}\quad\quad ? }}};
\node [anchor=north west] (label2part4) at ([yshift=0.3em]label2part3.south west) {\footnotesize{{$t^2$ = so\; what\; is\; human\; {\color{red}translation}\;? }}};
}
\end{scope}
\end{tikzpicture}
%---------------------------------------------------------------------
%%% outline
%-------------------------------------------------------------------------
\begin{tikzpicture}
\begin{scope}
{
\node [pos=0.5,left,xshift=-0.5em,yshift=2.0em] (label2) {\footnotesize{{s = 机器\quad {\color{red}翻译}\quad\quad\quad\quad 计算机\quad\quad 进行\quad {\color{red}翻译} }}};
\node [anchor=north west] (label2part2) at ([yshift=0.3em]label2.south west) {\footnotesize{{t = machine\; {\color{red}translation}\; is\; just\; {\color{red}translation}\; by\; computer }}};
}
\end{scope}
\end{tikzpicture}
%---------------------------------------------------------------------
\begin{center}
%%% outline
%-------------------------------------------------------------------------
{\footnotesize
\definecolor{ublue}{rgb}{0.152,0.250,0.545}
\begin{tikzpicture}
\begin{scope}
\node [anchor=west, font=\normalsize] (s1) at (0,0) {};
\node [anchor=west, font=\normalsize] (s2) at ([xshift=3em]s1.east) {};
\node [anchor=west, font=\normalsize] (s3) at ([xshift=5.6em]s2.east) {};
\node [anchor=west, font=\normalsize] (s4) at ([xshift=5.1em]s3.east) {感到};
\node [anchor=west, font=\normalsize] (s5) at ([xshift=3.1em]s4.east) {满意};
\end{scope}
\begin{scope}[yshift=-7em]
\node [anchor=west, font=\normalsize] (t1) at (0.4em,0) {I};
\node [anchor=west, font=\normalsize] (t2) at ([xshift=3.5em,yshift=-0.1em]t1.east) {am};
\node [anchor=west, font=\normalsize] (t3) at ([xshift=3.5em,yshift=0.1em]t2.east) {satisfied};
\node [anchor=west, font=\normalsize] (t4) at ([xshift=3.5em]t3.east) {with};
\node [anchor=west, font=\normalsize] (t5) at ([xshift=3.5em,yshift=-0.2em]t4.east) {you};
\end{scope}
{
\draw [-,very thick,ublue,dashed] (s1.south) -- (t1.north);
\draw [-,very thick,ublue,dashed] (s4.south) -- ([yshift=0.3em]t2.north);
\draw [-,very thick,ublue,dashed] (s2.south) ..controls +(south:1em) and +(north:1em).. (t4.north);
\draw [-,very thick,ublue,dashed] (s3.south) ..controls +(south:0.5em) and +(north:1.5em).. (t5.north);
\draw [-,very thick,ublue,dashed] (s5.south) -- (t3.north);
}
\end{tikzpicture}
}
%---------------------------------------------------------------------
\end{center}
%%% outline
%-------------------------------------------------------------------------
\begin{tikzpicture}
\begin{scope}
{
\node [pos=0.5,left,xshift=-0.5em,yshift=2.0em] (label2) {\footnotesize{{$t^1$ =$<$s$>$ {\color{red}machine}\; {\color{red}translation}\; is\; just\; translation\; by\; computer\; $<$/s$>$ }}};
\node [anchor=north west] (label2part2) at ([yshift=0.3em]label2.south west) {\footnotesize{{$t^2$ =$<$s$>$ how\; does\; this\; {\color{red}machine}\; work\; $?$ $<$/s$>$ }}};
\node [anchor=north west] (label2part3) at ([yshift=0.3em]label2part2.south west) {\footnotesize{{$t^3$ =$<$s$>$ we\; have\; replaced\; the\; old\; adding\; {\color{red}machine}\; with\; a\; computer\; $<$/s$>$ }}};
\node [anchor=north west] (label2part4) at ([yshift=0.3em]label2part3.south west) {\footnotesize{{$t^4$ =$<$s$>$ several\; approaches\; to\; {\color{red}machine}\; learning\; are\; used\; to\; solve\; problems\; $<$/s$>$ }}};
}
\end{scope}
\end{tikzpicture}
%---------------------------------------------------------------------
%%% outline
%-------------------------------------------------------------------------
\begin{tikzpicture}
\definecolor{ugreen}{rgb}{0,0.5,0}
\definecolor{ublue}{rgb}{0.152,0.250,0.545}
\node [anchor=west,inner sep=2pt,fill=red!20,thick,minimum width=3.6em,minimum height=0.8em] (s1) at (0,0) {$\phi$};
{%第一列
\node [anchor=north west] (label1) at ([xshift=2.5em,yshift=6.3em]s1.east) {{\scriptsize \textbf{第1步}}};
\node [anchor=west,inner sep=2pt,fill=green!20,thick,minimum width=3.6em,minimum height=0.8em] (s21) at ([xshift=2.0em,yshift=3.5em]s1.east) {$w_1^1$};
\node [anchor=west,inner sep=2pt,fill=green!20,thick,minimum width=3.6em,minimum height=0.8em] (s22) at ([xshift=2.0em,yshift=0.0em]s1.east) {$w_2^1$};
\node [anchor=west] (s23) at ([xshift=3.3em,yshift=-3.2em]s1.east) {$\vdots$};
\node [anchor=west,inner sep=2pt,fill=green!20,thick,minimum width=3.6em,minimum height=0.8em] (s24) at ([xshift=2.0em,yshift=-7.0em]s1.east) {\scriptsize{$w_{m \bullet n}^1$}};
}
{
\draw [->,thick] ([yshift=0.0em]s1.east) -- ([yshift=0.0em]s21.west);
\draw [->,thick] ([yshift=0.0em]s1.east) -- ([yshift=0.0em]s22.west);
\draw [->,thick] ([yshift=0.0em]s1.east) -- ([xshift=-1.3em,yshift=-0.2em]s23.west);
\draw [->,thick] ([yshift=0.0em]s1.east) -- ([yshift=0.0em]s24.west);
}
{%第二列
\node [anchor=north west] (label2) at ([xshift=9.3em,yshift=6.3em]s1.east) {{\scriptsize \textbf{第2步}}};
\node [anchor=west,inner sep=2pt,fill=blue!20,thick,minimum width=3.6em,minimum height=0.8em] (s31) at ([xshift=8.8em,yshift=3.5em]s1.east) {$w_1^2$};
\node [anchor=west,inner sep=2pt,fill=blue!20,thick,minimum width=3.6em,minimum height=0.8em] (s32) at ([xshift=8.8em,yshift=0.0em]s1.east) {$w_2^2$};
\node [anchor=west] (s33) at ([xshift=10.1em,yshift=-3.2em]s1.east) {$\vdots$};
\node [anchor=west,inner sep=2pt,fill=blue!20,thick,minimum width=3.6em,minimum height=0.8em] (s34) at ([xshift=8.8em,yshift=-7.0em]s1.east) {\tiny{$w_{(m-1) \bullet n}^2$}};
}
{
\draw [->,thick] ([yshift=0.0em]s21.east) -- ([yshift=0.0em]s31.west);
\draw [->,thick] ([yshift=0.0em]s21.east) -- ([yshift=0.0em]s32.west);
\draw [->,thick] ([yshift=0.0em]s21.east) -- ([xshift=-1.3em,yshift=-0.2em]s33.west);
\draw [->,thick] ([yshift=0.0em]s21.east) -- ([yshift=0.0em]s34.west);
\draw [->,thick] ([yshift=0.0em]s22.east) -- ([yshift=0.0em]s31.west);
\draw [->,thick] ([yshift=0.0em]s22.east) -- ([yshift=0.0em]s32.west);
\draw [->,thick] ([yshift=0.0em]s22.east) -- ([xshift=-1.3em,yshift=-0.2em]s33.west);
\draw [->,thick] ([yshift=0.0em]s22.east) -- ([yshift=0.0em]s34.west);
\draw [->,thick] ([xshift=1.3em,yshift=-0.3em]s23.east) -- ([yshift=0.0em]s31.west);
\draw [->,thick] ([xshift=1.3em,yshift=-0.3em]s23.east) -- ([yshift=0.0em]s32.west);
\draw [->,thick] ([xshift=1.3em,yshift=-0.3em]s23.east) -- ([xshift=-1.3em,yshift=-0.3em]s33.west);
\draw [->,thick] ([xshift=1.3em,yshift=-0.3em]s23.east) -- ([yshift=0.0em]s34.west);
\draw [->,thick] ([yshift=0.0em]s24.east) -- ([yshift=0.0em]s31.west);
\draw [->,thick] ([yshift=0.0em]s24.east) -- ([yshift=0.0em]s32.west);
\draw [->,thick] ([yshift=0.0em]s24.east) -- ([xshift=-1.3em,yshift=-0.2em]s33.west);
\draw [->,thick] ([yshift=0.0em]s24.east) -- ([yshift=0.0em]s34.west);
}
{%第三列
\node [anchor=west] (s41) at ([xshift=15.6em,yshift=3.5em]s1.east) {$\cdots$};
\node [anchor=west] (s42) at ([xshift=15.6em,yshift=0.0em]s1.east) {$\cdots$};
\node [anchor=west] (s43) at ([xshift=15.6em,yshift=-3.5em]s1.east) {$\cdots$};
\node [anchor=west] (s44) at ([xshift=15.6em,yshift=-7.0em]s1.east) {$\cdots$};
\draw [->,thick] ([yshift=0.0em]s31.east) -- ([yshift=0.0em]s41.west);
\draw [->,thick] ([yshift=0.0em]s31.east) -- ([yshift=0.0em]s42.west);
\draw [->,thick] ([yshift=0.0em]s31.east) -- ([yshift=0.0em]s43.west);
\draw [->,thick] ([yshift=0.0em]s31.east) -- ([yshift=0.0em]s44.west);
\draw [->,thick] ([yshift=0.0em]s32.east) -- ([yshift=0.0em]s41.west);
\draw [->,thick] ([yshift=0.0em]s32.east) -- ([yshift=0.0em]s42.west);
\draw [->,thick] ([yshift=0.0em]s32.east) -- ([yshift=0.0em]s43.west);
\draw [->,thick] ([yshift=0.0em]s32.east) -- ([yshift=0.0em]s44.west);
\draw [->,thick] ([xshift=1.3em,yshift=-0.3em]s33.east) -- ([yshift=0.0em]s41.west);
\draw [->,thick] ([xshift=1.3em,yshift=-0.3em]s33.east) -- ([yshift=0.0em]s42.west);
\draw [->,thick] ([xshift=1.3em,yshift=-0.3em]s33.east) -- ([yshift=0.0em]s43.west);
\draw [->,thick] ([xshift=1.3em,yshift=-0.3em]s33.east) -- ([yshift=0.0em]s44.west);
\draw [->,thick] ([yshift=0.0em]s34.east) -- ([yshift=0.0em]s41.west);
\draw [->,thick] ([yshift=0.0em]s34.east) -- ([yshift=0.0em]s42.west);
\draw [->,thick] ([yshift=0.0em]s34.east) -- ([yshift=0.0em]s43.west);
\draw [->,thick] ([yshift=0.0em]s34.east) -- ([yshift=0.0em]s44.west);
}
{%第四列
\node [anchor=north west] (label3) at ([xshift=19.4em,yshift=6.3em]s1.east) {{\scriptsize \textbf{第m步}}};
\node [anchor=west,inner sep=2pt,fill=orange!20,thick,minimum width=3.6em,minimum height=0.8em] (s51) at ([xshift=19.0em,yshift=3.5em]s1.east) {$w_1^m$};
\node [anchor=west,inner sep=2pt,fill=orange!20,thick,minimum width=3.6em,minimum height=0.8em] (s52) at ([xshift=19.0em,yshift=0.0em]s1.east) {$w_2^m$};
\node [anchor=west,inner sep=2pt,fill=orange!20,thick,minimum width=3.6em,minimum height=0.8em] (s53) at ([xshift=19.0em,yshift=-3.5em]s1.east) {$w_{?}^{m}$};
\node [anchor=west,inner sep=2pt,fill=orange!20,thick,minimum width=3.6em,minimum height=0.8em] (s54) at ([xshift=19.0em,yshift=-7.0em]s1.east) {\scriptsize{$w_{1 \bullet n}^m$}};
\draw [->,thick] ([yshift=0.0em]s41.east) -- ([yshift=0.0em]s51.west);
\draw [->,thick] ([yshift=0.0em]s42.east) -- ([yshift=0.0em]s52.west);
\draw [->,thick] ([yshift=0.0em]s43.east) -- ([yshift=0.0em]s53.west);
\draw [->,thick] ([yshift=0.0em]s44.east) -- ([yshift=0.0em]s54.west);
}
\end{tikzpicture}
%---------------------------------------------------------------------
%%% outline
%-------------------------------------------------------------------------
\begin{tikzpicture}
\definecolor{ugreen}{rgb}{0,0.5,0}
\definecolor{ublue}{rgb}{0.152,0.250,0.545}
\node [anchor=west,inner sep=2pt,fill=red!20,thick,minimum width=3.6em,minimum height=0.8em] (s1) at (0,0) {$\phi$};
{%第一列
\node [anchor=north west] (label1) at ([xshift=2.5em,yshift=6.3em]s1.east) {{\scriptsize \textbf{第1步}}};
\node [anchor=west,inner sep=2pt,fill=green!20,thick,minimum width=3.6em,minimum height=0.8em] (s21) at ([xshift=2.0em,yshift=3.5em]s1.east) {$w_1^1$};
\node [anchor=west,inner sep=2pt,fill=green!20,thick,minimum width=3.6em,minimum height=0.8em] (s22) at ([xshift=2.0em,yshift=0.0em]s1.east) {$w_2^1$};
\node [anchor=west] (s23) at ([xshift=3.3em,yshift=-3.2em]s1.east) {$\vdots$};
\node [anchor=west,inner sep=2pt,fill=green!20,thick,minimum width=3.6em,minimum height=0.8em] (s24) at ([xshift=2.0em,yshift=-7.0em]s1.east) {\scriptsize{$w_{m \bullet n}^1$}};
}
{
\draw [->,thick,red] ([yshift=0.0em]s1.east) -- ([yshift=0.0em]s21.west);
\draw [->,thick,densely dashed] ([yshift=0.0em]s1.east) -- ([yshift=0.0em]s22.west);
\draw [->,thick,densely dashed] ([yshift=0.0em]s1.east) -- ([xshift=-1.3em,yshift=-0.2em]s23.west);
\draw [->,thick,densely dashed] ([yshift=0.0em]s1.east) -- ([yshift=0.0em]s24.west);
}
{%第二列
\node [anchor=north west] (label2) at ([xshift=9.3em,yshift=6.3em]s1.east) {{\scriptsize \textbf{第2步}}};
\node [anchor=west,inner sep=2pt,fill=blue!20,thick,minimum width=3.6em,minimum height=0.8em] (s31) at ([xshift=8.8em,yshift=3.5em]s1.east) {$w_1^2$};
\node [anchor=west,inner sep=2pt,fill=blue!20,thick,minimum width=3.6em,minimum height=0.8em] (s32) at ([xshift=8.8em,yshift=0.0em]s1.east) {$w_2^2$};
\node [anchor=west] (s33) at ([xshift=10.1em,yshift=-3.2em]s1.east) {$\vdots$};
\node [anchor=west,inner sep=2pt,fill=blue!20,thick,minimum width=3.6em,minimum height=0.8em] (s34) at ([xshift=8.8em,yshift=-7.0em]s1.east) {\tiny{$w_{(m-1) \bullet n}^2$}};
}
{
\draw [->,thick,densely dashed] ([yshift=0.0em]s21.east) -- ([yshift=0.0em]s31.west);
\draw [->,thick,densely dashed] ([yshift=0.0em]s21.east) -- ([yshift=0.0em]s32.west);
\draw [->,thick,red] ([yshift=0.0em]s21.east) -- ([xshift=-1.3em,yshift=-0.2em]s33.west);
\draw [->,thick,densely dashed] ([yshift=0.0em]s21.east) -- ([yshift=0.0em]s34.west);
}
{%第三列
\node [anchor=west] (s41) at ([xshift=15.6em,yshift=3.5em]s1.east) {$\cdots$};
\node [anchor=west] (s42) at ([xshift=15.6em,yshift=0.0em]s1.east) {$\cdots$};
\node [anchor=west] (s43) at ([xshift=15.6em,yshift=-3.5em]s1.east) {$\cdots$};
\node [anchor=west] (s44) at ([xshift=15.6em,yshift=-7.0em]s1.east) {$\cdots$};
\draw [->,thick,densely dashed] ([xshift=1.3em,yshift=-0.3em]s33.east) -- ([yshift=0.0em]s41.west);
\draw [->,thick,red] ([xshift=1.3em,yshift=-0.3em]s33.east) -- ([yshift=0.0em]s42.west);
\draw [->,thick,densely dashed] ([xshift=1.3em,yshift=-0.3em]s33.east) -- ([yshift=0.0em]s43.west);
\draw [->,thick,densely dashed] ([xshift=1.3em,yshift=-0.3em]s33.east) -- ([yshift=0.0em]s44.west);
}
{%第四列
\node [anchor=north west] (label3) at ([xshift=19.4em,yshift=6.3em]s1.east) {{\scriptsize \textbf{第m步}}};
\node [anchor=west,inner sep=2pt,fill=orange!20,thick,minimum width=3.6em,minimum height=0.8em] (s51) at ([xshift=19.0em,yshift=3.5em]s1.east) {$w_1^m$};
\node [anchor=west,inner sep=2pt,fill=orange!20,thick,minimum width=3.6em,minimum height=0.8em] (s52) at ([xshift=19.0em,yshift=0.0em]s1.east) {$w_2^m$};
\node [anchor=west,inner sep=2pt,fill=orange!20,thick,minimum width=3.6em,minimum height=0.8em] (s53) at ([xshift=19.0em,yshift=-3.5em]s1.east) {$w_{?}^{m}$};
\node [anchor=west,inner sep=2pt,fill=orange!20,thick,minimum width=3.6em,minimum height=0.8em] (s54) at ([xshift=19.0em,yshift=-7.0em]s1.east) {\scriptsize{$w_{1 \bullet n}^m$}};
\draw [->,thick,densely dashed] ([yshift=0.0em]s41.east) -- ([yshift=0.0em]s51.west);
\draw [->,thick,densely dashed] ([yshift=0.0em]s42.east) -- ([yshift=0.0em]s52.west);
\draw [->,thick,red] ([yshift=0.0em]s43.east) -- ([yshift=0.0em]s53.west);
\draw [->,thick,densely dashed] ([yshift=0.0em]s44.east) -- ([yshift=0.0em]s54.west);
}
\end{tikzpicture}
%---------------------------------------------------------------------
%%% outline
%-------------------------------------------------------------------------
\begin{tikzpicture}
\node [anchor=west] (e1) at (0,0) {$g(s,t)$};
\node [anchor=west] (e2) at (e1.east) {$=$};
\node [anchor=west,inner sep=2pt,fill=red!20] (e3) at (e2.east) {$\prod\nolimits_{(j,i) \in \hat{A}} \textrm{P}(s_j,t_i)$};
\node [anchor=west,inner sep=1pt] (e4) at (e3.east) {$\times$};
\node [anchor=west,inner sep=3pt,fill=blue!20] (e5) at (e4.east) {$\textrm{P}_{lm}(t)$};
\node [anchor=north west,inner sep=1pt] (n1) at ([xshift=2.5em,yshift=-1em]e1.south west) {$\textrm{P}(s|t)$};
\node [anchor=north] (n1part2) at ([yshift=0.3em]n1.south) {\scriptsize{\textbf{翻译模型}}};
\node [anchor=west,inner sep=1pt] (n2) at ([xshift=2em]n1.east) {$\textrm{P}(t)$};
\node [anchor=north] (n2part2) at ([yshift=0.3em]n2.south) {\scriptsize{\textbf{语言模型}}};
\draw [->,thick] (e3.south) .. controls +(south:1em) and +(north:1em) .. (n1.north);
\draw [->,thick] (e5.south) .. controls +(south:1em) and +(70:1em) .. (n2.north);
\end{tikzpicture}
%---------------------------------------------------------------------
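% For reference, a small Python sketch of how the score g(s,t) in the figure above can be assembled
% from a word-pair probability table and a language model score, kept as a comment so this figure
% file still compiles. The names word_pair_prob and lm_prob are illustrative assumptions.
%
%   def g(s, t, alignment, word_pair_prob, lm_prob):
%       """g(s,t) = product of P(s_j, t_i) over aligned pairs (j, i), times the LM score P_lm(t).
%
%       s, t: token lists; alignment: list of (j, i) index pairs;
%       word_pair_prob: dict mapping (source_word, target_word) -> probability;
%       lm_prob: function returning the language model probability of t.
%       """
%       score = 1.0
%       for (j, i) in alignment:
%           score *= word_pair_prob.get((s[j], t[i]), 1e-9)  # small floor for unseen pairs
%       return score * lm_prob(t)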
%%% outline
%-------------------------------------------------------------------------
\begin{tikzpicture}
\node [anchor=west] (eq1) at (0,0) {$\textrm{P}(s,a|t)=$};
\node [anchor=west,inner sep=1pt,minimum height=2.64em] (eq2) at (eq1.east) {$\textrm{P}(m|t)$};
\node [anchor=west,inner sep=1pt,minimum height=2.64em] (eq3) at ([xshift=1pt]eq2.east) {$\prod\limits_{j=1}^{m}$};
\node [anchor=west,inner sep=1pt,minimum height=2.64em] (eq4) at ([xshift=1pt]eq3.east) {$\textrm{P}(a_j|a_{1}^{j-1},s_{1}^{j-1},m,t)$};
\node [anchor=west,inner sep=1pt,minimum height=2.64em] (eq5) at ([xshift=1pt]eq4.east) {$\textrm{P}(s_j|a_{1}^{j},s_{1}^{j-1},m,t)$};
{
\node [anchor=west,inner sep=1pt,minimum height=2.64em,fill=red!20] (eq2) at (eq1.east) {$\textrm{P}(m|t)$};
}
{
\node [anchor=west,inner sep=1pt,minimum height=2.64em,fill=blue!20] (eq3) at ([xshift=1pt]eq2.east) {$\prod\limits_{j=1}^{m}$};
}
{
\node [anchor=west,inner sep=1pt,minimum height=2.64em,fill=green!20] (eq4) at ([xshift=1pt]eq3.east) {$\textrm{P}(a_j|a_{1}^{j-1},s_{1}^{j-1},m,t)$};
}
{
\node [anchor=west,inner sep=1pt,minimum height=2.64em,fill=purple!20] (eq5) at ([xshift=1pt]eq4.east) {$\textrm{P}(s_j|a_{1}^{j},s_{1}^{j-1},m,t)$};
}
{
\node [anchor=north,draw,circle,inner sep=1pt,ublue] (c1) at ([yshift=-1pt]eq2.south) {\tiny{\textbf{1}}};
}
{
\node [anchor=north,draw,circle,inner sep=1pt,ublue] (c2) at ([yshift=-1pt]eq3.south) {\tiny{\textbf{2}}};
}
{
\node [anchor=north,draw,circle,inner sep=1pt,ublue] (c3) at ([yshift=-1pt]eq4.south) {\tiny{\textbf{3}}};
}
{
\node [anchor=north,draw,circle,inner sep=1pt,ublue] (c4) at ([yshift=-1pt]eq5.south) {\tiny{\textbf{4}}};
}
\end{tikzpicture}
%---------------------------------------------------------------------
%%% outline
%-------------------------------------------------------------------------
\begin{tikzpicture}
{\small
\node [anchor=west,inner sep=2pt] (s1) at (0,0) {$s_1$:在};
\node [anchor=west,inner sep=2pt] (s2) at ([xshift=1em]s1.east) {$s_2$:桌子};
\node [anchor=west,inner sep=2pt] (s3) at ([xshift=1em]s2.east) {$s_3$:上};
\node [anchor=north,inner sep=2pt] (t1) at ([yshift=-1.7em]s1.center) {$t_1$:on};
\node [anchor=north,inner sep=2pt] (t2) at ([yshift=-1.6em]s2.center) {$t_2$:the};
\node [anchor=north,inner sep=2pt] (t3) at ([yshift=-1.6em]s3.center) {$t_3$:table};
\node [anchor=east,inner sep=2pt] (t0) at ([xshift=-1.5em]t1.west) {$t_0$};
\draw [-] (s1.south) -- (t0.north);
\draw [-] (s2.south) -- (t3.north);
\draw [-] (s3.south) -- (t1.north);
}
\end{tikzpicture}
%---------------------------------------------------------------------
%%% outline
%-------------------------------------------------------------------------
\begin{tikzpicture}
{
\node [anchor=west,inner sep=2pt] (s1) at (0,0) {$s_1$:在};
\node [anchor=west,inner sep=2pt] (s2) at ([xshift=2em]s1.east) {$s_2$:桌子};
\node [anchor=west,inner sep=2pt] (s3) at ([xshift=2em]s2.east) {$s_3$:上};
\node [anchor=north,inner sep=2pt] (t1) at ([yshift=-2.4em]s1.center) {$t_1$:on};
\node [anchor=north,inner sep=2pt] (t2) at ([yshift=-2.4em]s2.center) {$t_2$:the};
\node [anchor=north,inner sep=2pt] (t3) at ([yshift=-2.4em]s3.center) {$t_3$:table};
\node [anchor=east,inner sep=2pt] (t0) at ([xshift=-2.2em]t1.west) {$t_0$};
}
{
\draw [-,dashed,thick] (s3.south) -- (t0);
\draw [-,dashed,thick] (s3.south) -- (t1);
\draw [-,dashed,thick] (s3.south) -- (t2);
\draw [-,dashed,thick] (s3.south) -- (t3);
}
\end{tikzpicture}
%---------------------------------------------------------------------
\begin{center}
\begin{tikzpicture}
{
\node [anchor=west,inner sep=2pt] (s1) at (0,0) {$s_1$:在};
\node [anchor=west,inner sep=2pt] (s2) at ([xshift=2em]s1.east) {$s_2$:桌子};
\node [anchor=west,inner sep=2pt] (s3) at ([xshift=2em]s2.east) {$s_3$:上};
\node [anchor=north,inner sep=2pt] (t1) at ([yshift=-2.4em]s1.center) {$t_1$:on};
\node [anchor=north,inner sep=2pt] (t2) at ([yshift=-2.4em]s2.center) {$t_2$:the};
\node [anchor=north,inner sep=2pt] (t3) at ([yshift=-2.4em]s3.center) {$t_3$:table};
\node [anchor=east,inner sep=2pt] (t0) at ([xshift=-2.2em]t1.west) {$t_0$};
\draw [-,dashed,thick] (s1.south) -- (t0.north);
\draw [-,dashed,thick] (s2.south) -- (t3.north);
\draw [-,dashed,thick] (s3.south) -- (t1.north);
}
\draw [-,dashed,thick,red] (s2.south) -- (t3.north);
\end{tikzpicture}
\end{center}
\begin{tikzpicture}
\node [anchor=west] (eq1) at (0,0) {$\sum\limits_{a_1=0}^{l} ... \sum\limits_{a_m=0}^{l} \prod\limits_{j=1}^{m} f(s_j|t_{a_j})$};
\node [anchor=west] (eq1part2) at (eq1.east) {$=$};
\node [anchor=west,inner sep=2pt] (eq1part3) at (eq1part2.east) {$\prod\limits_{j=1}^{m} \sum\limits_{i=0}^{l} f(s_j|t_i)$};
{
\node [anchor=west,inner sep=2pt,fill=red!20] (eq1part3) at (eq1part2.east) {$\prod\limits_{j=1}^{m} \sum\limits_{i=0}^{l} f(s_j|t_i)$};
}
{
\node [anchor=west] (eq2) at ([xshift=5em,yshift=-4.5em]eq1.west) {$\textrm{P}(s|t) = \frac{\epsilon}{(l+1)^{m}} $};
\node [anchor=west,inner sep=2pt] (eq2part2) at ([xshift=-0.3em]eq2.east) {$\prod\limits_{j=1}^{m} \sum\limits_{i=0}^{l} f(s_j|t_i)$};
\node [anchor=east] (eq2label) at ([xshift=-0em,yshift=0.2em]eq2.west) {\small{IBM模型1:}};
\node [anchor=west,inner sep=2pt,fill=red!20] (eq2part2) at ([xshift=-0.3em]eq2.east) {$\prod\limits_{j=1}^{m} \sum\limits_{i=0}^{l} f(s_j|t_i)$};
}
{
\node [anchor=west] (eq3) at ([xshift=5em,yshift=-7.5em]eq1.west) {$\textrm{P}(s|t) = \epsilon$};
\node [anchor=west,inner sep=2pt] (eq3part2) at ([xshift=-0.3em]eq3.east) {$\prod\limits_{j=1}^{m} \sum\limits_{i=0}^{l} a(i|j,m,l) f(s_j|t_i)$};
\node [anchor=east] (eq3label) at ([xshift=-0em,yshift=0.2em]eq3.west) {\small{类似的,IBM模型2:}};
}
\begin{pgfonlayer}{background}
{
\node[rectangle,draw=red,inner sep=2pt] [fit = (eq2) (eq2part2) (eq2label) (eq3) (eq3part2) (eq3label)] {};
}
\end{pgfonlayer}
{
\draw [->,thick] ([xshift=-1em]eq1part3.south) .. controls +(south:1.3em) and +(north:1.3em) .. ([xshift=1em]eq2part2.north);
}
\end{tikzpicture}
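% A quick numeric check (sketch) of the rearrangement highlighted above: summing prod_j f(s_j|t_{a_j})
% over all (l+1)^m alignments equals prod_j sum_i f(s_j|t_i), which is what makes IBM Model 1 tractable.
% Kept as a comment; the probability table f below is made-up toy data, not values from the book.
%
%   from itertools import product
%
%   def brute_force(f, s, t_ext):
%       """Enumerate all alignments: sum over a_1..a_m of prod_j f(s_j | t_{a_j})."""
%       total = 0.0
%       for a in product(range(len(t_ext)), repeat=len(s)):
%           p = 1.0
%           for j, i in enumerate(a):
%               p *= f[(s[j], t_ext[i])]
%           total += p
%       return total
%
%   def factored(f, s, t_ext):
%       """Equivalent factored form: prod_j sum_i f(s_j | t_i)."""
%       total = 1.0
%       for s_j in s:
%           total *= sum(f[(s_j, t_i)] for t_i in t_ext)
%       return total
%
%   s = ['在', '桌子', '上']
%   t_ext = ['<t0>', 'on', 'the', 'table']          # position 0 stands for the empty word t_0
%   f = {pair: 0.1 + 0.05 * k for k, pair in enumerate(product(s, t_ext))}
%   assert abs(brute_force(f, s, t_ext) - factored(f, s, t_ext)) < 1e-9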
\begin{tikzpicture}
\begin{scope}[scale=0.8]
\draw [-,very thick] (0,0) sin (1,1) cos (2,0) sin (3,-1) cos (4,0) sin (7,-1);
\draw [-latex,thick] (-0.5,-1.2) -- (8,-1.2);
\draw [-latex,thick] (-0.5,-1.2) -- (-0.5,1.3);
\draw [-,dashed] (1,1) -- (1,-1.2);
\node [anchor=north] at (1,-1.2) {\scriptsize{参数$x$的最优解}};
\node [anchor=center] at (1,-1.2) {$\bullet$};
\node [anchor=center] at (1,1) {$\bullet$};
\node [anchor=west] at (1.3,1) {\scriptsize{目标函数$f(x)$的最大值}};
\draw [<-] (6.5,-0.8) -- (7.0,-0.3);
\node [anchor=south west,inner sep=1pt] at (7.0,-0.3) {\scriptsize{$f(x)$}};
\end{scope}
\end{tikzpicture}
\begin{tikzpicture}
\node [anchor=west] (eq1) at (0,0) {$\hat{\theta}$};
\node [anchor=west] (eq2) at ([yshift=-0.2em]eq1.east) {=};
\node [anchor=west,inner sep=2pt] (eq3) at ([yshift=-0.0em]eq2.east) {$\argmax$};
\node [anchor=north,inner sep=1pt] (eq3part2) at ([yshift=-0.2em]eq3.south) {\scriptsize{$\theta$}};
\node [anchor=west,inner sep=2pt] (eq4) at ([xshift=0.1em]eq3.east) {$\textrm{P}_{\theta}(s|t)$};
{
\node [anchor=west,inner sep=2pt,fill=red!20,minimum height=1.35em] (eq3) at ([yshift=-0.0em]eq2.east) {$\argmax$};
\node [anchor=west,inner sep=2pt,fill=green!20] (eq4) at ([xshift=0.1em]eq3.east) {$\textrm{P}_{\theta}(s|t)$};
\node [anchor=north,draw,inner sep=3pt,fill=red!20] (eq3label) at ([yshift=-1.5em]eq3.south west) {\footnotesize{\textbf{求最优参数}}};
\node [anchor=north,draw,inner sep=3pt,fill=green!20] (eq4label) at ([yshift=-1.5em]eq4.south east) {\footnotesize{\textbf{目标函数}}};
\draw [->,thick] ([xshift=-1em]eq3.south) .. controls +(south:1em) and +(north:1em) .. (eq3label.north);
\draw [->,thick] (eq4.south) .. controls +(south:1em) and +(north:1em) .. (eq4label.north);
}
\end{tikzpicture}
\begin{tikzpicture}
\node [anchor=west,inner sep=2pt,fill=green!20] (eq1) at (0,0) {$L(f,\lambda)$};
\node [anchor=west,inner sep=2pt] (eq2) at (eq1.east) {$=$};
\node [anchor=west,inner sep=2pt] (eq3) at (eq2.east) {$\frac{\epsilon}{(l+1)^{m}} \prod\limits_{j=1}^{m} \sum\limits_{i=0}^{l}$};
\node [anchor=west,inner sep=2pt] (eq4) at (eq3.east) {\black{$f(s_j|t_{i})$}};
\node [anchor=west,inner sep=2pt] (eq5) at (eq4.east) {$-$};
\node [anchor=north west,inner sep=2pt] (eq6) at ([yshift=-6pt]eq3.south west) {$\sum_{t_y}$};
\node [anchor=west,inner sep=1pt,minimum height=1.85em] (eq7) at (eq6.east) {$\lambda_{t_y}$};
\node [anchor=west,inner sep=1pt,minimum height=1.5em] (eq8) at ([xshift=3pt]eq7.east) {$(\sum_{s_x}$};
\node [anchor=west,inner sep=1pt,minimum height=1.5em] (eq9) at (eq8.east) {\black{$f(s_x|t_y)$}};
\node [anchor=west,inner sep=1pt,minimum height=1.5em] (eq10) at (eq9.east) {$-1)$};
{
\node [anchor=west,inner sep=2pt,fill=green!20] (eq1) at (0,0) {$L(f,\lambda)$};
\node [anchor=west,inner sep=2pt] (eq2) at (eq1.east) {$=$};
\node [anchor=west,inner sep=2pt] (eq3) at (eq2.east) {$\frac{\epsilon}{(l+1)^{m}} \prod\limits_{j=1}^{m} \sum\limits_{i=0}^{l}$};
\node [anchor=west,inner sep=2pt,draw,ublue,fill=blue!20] (eq4) at (eq3.east) {\black{$f(s_j|t_{i})$}};
\node [anchor=west,inner sep=2pt] (eq5) at (eq4.east) {$-$};
\node [anchor=north west,inner sep=2pt] (eq6) at ([yshift=-6pt]eq3.south west) {$\sum_{t_y}$};
\node [anchor=west,inner sep=1pt,fill=purple!20,minimum height=1.85em] (eq7) at (eq6.east) {$\lambda_{t_y}$};
\node [anchor=west,inner sep=1pt,minimum height=1.5em] (eq8) at ([xshift=3pt]eq7.east) {$(\sum_{s_x}$};
\node [anchor=west,inner sep=1pt,draw,ublue,fill=blue!20,minimum height=1.5em] (eq9) at (eq8.east) {\black{$f(s_x|t_y)$}};
\node [anchor=west,inner sep=1pt,minimum height=1.5em] (eq10) at (eq9.east) {$-1)$};
}
\begin{pgfonlayer}{background}
\node[rectangle,red,inner sep=2pt,fill=red!20] [fit = (eq3) (eq4)] (oldobj) {};
{
\node[rectangle,red,inner sep=2pt,fill=orange!20] [fit = (eq8) (eq9) (eq10)] (constraint) {};
}
\end{pgfonlayer}
\node [anchor=west,draw,fill=green!20] (label1) at ([xshift=-2em,yshift=3em]eq1.north west) {新目标函数};
\node [anchor=west,draw,fill=red!20] (label2) at ([xshift=2em]label1.east) {旧目标函数};
{
\node [anchor=west,draw,fill=blue!20] (label3) at ([xshift=2em]label2.east) {参数:词翻译概率};
\node [anchor=west,draw,fill=purple!20] (label4) at ([xshift=-4em,yshift=-2.5em]eq6.south west) {参数:拉格朗日乘数};
\node [anchor=west,draw,fill=orange!20] (label5) at ([xshift=3em]label4.east) {参数约束条件};
}
\draw [<-,thick] ([xshift=-1em]label1.south) .. controls +(south:2em) and +(north:2em) .. (eq1.north);
\draw [<-,thick] ([xshift=-1em]label2.south) .. controls +(south:1.0em) and +(north:1.0em) .. (oldobj.north);
{
\draw [<-,thick] ([xshift=0em]label3.south) .. controls +(south:6.5em) and +(north:1.5em) .. (eq9.north);
\draw [<-,thick] ([xshift=0em]label3.south) .. controls +(south:1.5em) and +(north:1.5em) .. (eq4.north);
\draw [<-,thick] ([xshift=-2em]label4.north) .. controls +(north:1.5em) and +(south:1.5em) .. (eq7.south);
\draw [<-,thick] ([xshift=0em]label5.north) .. controls +(north:1.5em) and +(south:1.5em) .. (constraint.south);
}
\end{tikzpicture}
\begin{tikzpicture}
\node[anchor=west,inner sep=2pt] (eq11) at (0,0) {\large{$\frac{\partial \big[ \prod\limits_{j=1}^{m} \sum\limits_{i=0}^{l} f(s_j|t_i) \big]}{\partial f(s_u|t_v)}$}};
\node[anchor=west,inner sep=2pt] (eq12) at ([yshift=-9pt]eq11.east) {=};
\node[anchor=west,inner sep=2pt,fill=red!20] (eq13) at ([yshift=2pt]eq12.east) {\large{$\frac{\partial \big[ \prod\limits_{j=1}^{m} \sum\limits_{i=0}^{l} f(s_j|t_i) \big]}{\partial \big[ \sum\limits_{i=0}^{l}f(s_u|t_i) \big]}$}};
\node[anchor=west,inner sep=2pt] (eq14) at ([yshift=-1pt]eq13.east) {$\cdot$};
\node[anchor=west,inner sep=2pt,fill=blue!20,minimum height=4em] (eq15) at ([yshift=8pt]eq14.east) {\large{$\frac{\partial \big[ \sum\limits_{i=0}^{l}f(s_u|t_i) \big]}{\partial f(s_u|t_v)}$}};
\node[anchor=south,inner sep=1pt] (label1) at ([yshift=1pt]eq13.north) {\footnotesize{$\partial g(z)/\partial z$}};
\node[anchor=south,inner sep=1pt] (label2) at ([yshift=1pt]eq15.north) {\footnotesize{$\partial z/\partial f$}};
\node[anchor=north west,inner sep=2pt] (eq21) at (eq13.south west) {$\frac{\sum\limits_{j=1}^{m} \delta(s_j,s_u)}{\sum\limits_{i=0}^{l}f(s_u|t_i)} \prod\limits_{j=1}^{m} \sum\limits_{i=0}^{l} f(s_j|t_i) \cdot \sum\limits_{i=0}^{l} \delta(t_i,t_v)$};
\node[anchor=east,inner sep=2pt] (eq20) at (eq21.west) {=};
\end{tikzpicture}
%%% outline
%-------------------------------------------------------------------------
\begin{tikzpicture}
{\large
\node [anchor=west,minimum height=3em,inner sep=2pt] (eq11) at (0,0) {max};
\node [anchor=west,inner sep=2pt] (eq12) at ([xshift=2pt]eq11.east) {$\Big( \frac{\epsilon}{(l+1)^{m}} \prod\limits_{j=1}^{m} \sum\limits_{i=0}^{l}$};
\node [anchor=west,inner sep=2pt] (eq13) at (eq12.east) {$f(s_j|t_i)$};
\node [anchor=west,inner sep=2pt] (eq14) at ([yshift=0.2em]eq13.east) {$\Big)$};
\node [anchor=north west,minimum height=1.8em,inner sep=2pt] (eq21) at ([yshift=-0.5em]eq11.south west) {s.t.};
\node [anchor=west] (eq22) at ([xshift=0.5em,yshift=-0.1em]eq21.east) {\normalsize{for each word $t_y$}:};
\node [anchor=west] (eq23) at ([xshift=0.1em,yshift=-0.0em]eq22.east) {$\sum_{s_x} f(s_x|t_y) = 1$};
{
\node [anchor=west,minimum height=3em,inner sep=2pt,fill=green!20] (eq11) at (0,0) {max};
\node [anchor=west,draw,thin,ublue,inner sep=2pt,fill=blue!20] (eq13) at (eq12.east) {\black{$f(s_j|t_i)$}};
\node [anchor=north west,minimum height=1.8em,inner sep=2pt,fill=purple!20] (eq21) at ([yshift=-0.5em]eq11.south west) {s.t.};
\node [anchor=west,fill=orange!20] (eq23) at ([xshift=0.1em,yshift=-0.0em]eq22.east) {$\sum_{s_x} f(s_x|t_y) = 1$};
}
\begin{pgfonlayer}{background}
{
\node[rectangle,red,inner sep=0pt,fill=red!20] [fit = (eq12) (eq13) (eq14)] (eq1full) (eq1obj) {};
}
\end{pgfonlayer}
}
{
\node [anchor=west,draw,fill=green!20] (label1) at ([xshift=-2em,yshift=3em]eq11.north west) {最大化函数};
\node [anchor=west,draw,fill=red!20] (label2) at ([xshift=4em]label1.east) {目标函数};
\node [anchor=west,draw,fill=blue!20] (label3) at ([xshift=4em]label2.east) {参数};
\node [anchor=west,draw,fill=purple!20] (label4) at ([xshift=-2em,yshift=-2.5em]eq21.south west) {subject to=满足...约束};
\node [anchor=west,draw,fill=orange!20] (label5) at ([xshift=3em]label4.east) {翻译概率归一化约束条件};
\draw [<-,thick] ([xshift=-2em]label1.south) .. controls +(south:2em) and +(north:2em) .. (eq11.north);
\draw [<-,thick] ([xshift=-0em]label2.south) .. controls +(south:1.7em) and +(north:1.7em) .. (eq1obj.north);
\draw [<-,thick] ([xshift=-0em]label3.south) .. controls +(south:2.5em) and +(north:2.5em) .. (eq13.north);
\draw [<-,thick] ([xshift=2em]label4.north) .. controls +(north:1.5em) and +(south:1.5em) .. (eq21.south);
\draw [<-,thick] ([xshift=0em]label5.north) .. controls +(north:1.0em) and +(south:1.0em) .. (eq23.south);
}
\end{tikzpicture}
\begin{tikzpicture}
\node [anchor=west,inner sep=2pt,fill=green!20] (eq1) at (0,0) {$L(f,\lambda)$};
\node [anchor=west,inner sep=2pt] (eq2) at (eq1.east) {$=$};
\node [anchor=west,inner sep=2pt] (eq3) at (eq2.east) {$\frac{\epsilon}{(l+1)^{m}} \prod\limits_{j=1}^{m} \sum\limits_{i=0}^{l}$};
\node [anchor=west,inner sep=2pt] (eq4) at (eq3.east) {\black{$f(s_j|t_{i})$}};
\node [anchor=west,inner sep=2pt] (eq5) at (eq4.east) {$-$};
\node [anchor=north west,inner sep=2pt] (eq6) at ([yshift=-6pt]eq3.south west) {$\sum_{t_y}$};
\node [anchor=west,inner sep=1pt,minimum height=1.85em] (eq7) at (eq6.east) {$\lambda_{t_y}$};
\node [anchor=west,inner sep=1pt,minimum height=1.5em] (eq8) at ([xshift=3pt]eq7.east) {$(\sum_{s_x}$};
\node [anchor=west,inner sep=1pt,minimum height=1.5em] (eq9) at (eq8.east) {\black{$f(s_x|t_y)$}};
\node [anchor=west,inner sep=1pt,minimum height=1.5em] (eq10) at (eq9.east) {$-1)$};
{
\node [anchor=west,inner sep=2pt,fill=green!20] (eq1) at (0,0) {$L(f,\lambda)$};
\node [anchor=west,inner sep=2pt] (eq2) at (eq1.east) {$=$};
\node [anchor=west,inner sep=2pt] (eq3) at (eq2.east) {$\frac{\epsilon}{(l+1)^{m}} \prod\limits_{j=1}^{m} \sum\limits_{i=0}^{l}$};
\node [anchor=west,inner sep=2pt,draw,ublue,fill=blue!20] (eq4) at (eq3.east) {\black{$f(s_j|t_{i})$}};
\node [anchor=west,inner sep=2pt] (eq5) at (eq4.east) {$-$};
\node [anchor=north west,inner sep=2pt] (eq6) at ([yshift=-6pt]eq3.south west) {$\sum_{t_y}$};
\node [anchor=west,inner sep=1pt,fill=purple!20,minimum height=1.85em] (eq7) at (eq6.east) {$\lambda_{t_y}$};
\node [anchor=west,inner sep=1pt,minimum height=1.5em] (eq8) at ([xshift=3pt]eq7.east) {$(\sum_{s_x}$};
\node [anchor=west,inner sep=1pt,draw,ublue,fill=blue!20,minimum height=1.5em] (eq9) at (eq8.east) {\black{$f(s_x|t_y)$}};
\node [anchor=west,inner sep=1pt,minimum height=1.5em] (eq10) at (eq9.east) {$-1)$};
}
\begin{pgfonlayer}{background}
\node[rectangle,red,inner sep=2pt,fill=red!20] [fit = (eq3) (eq4)] (oldobj) {};
{
\node[rectangle,red,inner sep=2pt,fill=orange!20] [fit = (eq8) (eq9) (eq10)] (constraint) {};
}
\end{pgfonlayer}
\node [anchor=west,draw,fill=green!20] (label1) at ([xshift=-2em,yshift=3em]eq1.north west) {新目标函数};
\node [anchor=west,draw,fill=red!20] (label2) at ([xshift=2em]label1.east) {旧目标函数};
{
\node [anchor=west,draw,fill=blue!20] (label3) at ([xshift=2em]label2.east) {参数:词翻译概率};
\node [anchor=west,draw,fill=purple!20] (label4) at ([xshift=-4em,yshift=-2.5em]eq6.south west) {参数:拉格朗日乘数};
\node [anchor=west,draw,fill=orange!20] (label5) at ([xshift=3em]label4.east) {参数约束条件};
}
\draw [<-,thick] ([xshift=-1em]label1.south) .. controls +(south:2em) and +(north:2em) .. (eq1.north);
\draw [<-,thick] ([xshift=-1em]label2.south) .. controls +(south:1.0em) and +(north:1.0em) .. (oldobj.north);
{
\draw [<-,thick] ([xshift=0em]label3.south) .. controls +(south:6.5em) and +(north:1.5em) .. (eq9.north);
\draw [<-,thick] ([xshift=0em]label3.south) .. controls +(south:1.5em) and +(north:1.5em) .. (eq4.north);
\draw [<-,thick] ([xshift=-2em]label4.north) .. controls +(north:1.5em) and +(south:1.5em) .. (eq7.south);
\draw [<-,thick] ([xshift=0em]label5.north) .. controls +(north:1.5em) and +(south:1.5em) .. (constraint.south);
}
\end{tikzpicture}
%%% outline
%-------------------------------------------------------------------------
\begin{tikzpicture}
{\scriptsize
\node [anchor=north west,inner sep=1pt] (entry1) at (0,0) {\tiny{\textbf{1:} 这 是 数据 $\leftrightarrow$ This is data}};
\node [anchor=north west,inner sep=1pt] (entry2) at ([yshift=0.1em]entry1.south west) {\tiny{\textbf{2:} 小心 !$\leftrightarrow$ Look out !}};
\node [anchor=north west,inner sep=1pt] (entry3) at ([yshift=0.1em]entry2.south west) {\tiny{\textbf{3:} 你 是 谁 $\leftrightarrow$ Who are you}};
\node [anchor=north west,inner sep=2pt] (entry4) at ([yshift=0.1em]entry3.south west) {...};
\node [anchor=south west] (corpuslabel) at (entry1.north west) {{\color{ublue} \textbf{双语平行数据}}};
\begin{pgfonlayer}{background}
\node[rectangle,draw=ublue,thick,inner sep=0.2em,fill=white,drop shadow,minimum height=1.6cm] [fit = (entry1) (entry2) (entry3) (entry4) (corpuslabel)] (corpus) {};
\end{pgfonlayer}
}
\node [anchor=west,ugreen] (P) at ([xshift=4em,yshift=-0.7em]corpus.east){P($t|s$)};
\node [anchor=south] (modellabel) at (P.north) {{\color{ublue} {\scriptsize \textbf{翻译模型}}}};
\begin{pgfonlayer}{background}
\node[rectangle,draw=ublue,thick,inner sep=0.2em,fill=white,drop shadow,minimum height=1.6cm] [fit = (P) (modellabel)] (model) {};
\end{pgfonlayer}
\draw [->,very thick,ublue] ([xshift=0.2em]corpus.east) -- ([xshift=3.0em]corpus.east) node [inner sep=0pt,pos=0.5,above,yshift=0.3em] (trainingarrow) {\color{red}{\scriptsize{模型学习}}};
\draw [->,very thick,ublue] ([xshift=0.4em]model.east) -- ([xshift=3.4em]model.east) node [inner sep=0pt,pos=0.5,above,yshift=0.3em] (decodingarrow) {\color{red}{\scriptsize{穷举\&计算}}};
{\scriptsize
\node [anchor=north west,inner sep=2pt] (sentlabel) at ([xshift=5.5em,yshift=-0.3em]model.north east) {{\color{ublue} \textbf{机器翻译引擎}}};
\node [anchor=north west] (sent) at ([yshift=-0.5em]sentlabel.south west) {\textbf{对任意句子}};
\node [anchor=north west] (sentpart2) at ([yshift=0.3em]sent.south west) {\textbf{进行翻译}};
}
\begin{pgfonlayer}{background}
\node[rectangle,draw=ublue,thick,inner sep=0.2em,fill=white,drop shadow,minimum height=1.6cm] [fit = (sentlabel) (sent) (sentpart2)] (segsystem) {};
\end{pgfonlayer}
\node[rectangle,fill=white,fill opacity=0.85,inner sep=0pt] [fit = (segsystem) (decodingarrow)] (segsystem2) {};
\node[rectangle,fill=white,fill opacity=0.85,inner sep=0pt] [fit = (corpus) (trainingarrow)] (corpus2) {};
\end{tikzpicture}
%---------------------------------------------------------------------
%%% outline
%-------------------------------------------------------------------------
\begin{tikzpicture}
\node [anchor=west,inner sep=2pt,fill=red!20,minimum height=3em] (eq1) at (0,0) {$f(s_u|t_v)$};
\node [anchor=west,inner sep=2pt] (eq2) at ([xshift=-2pt]eq1.east) {$=$};
\node [anchor=west,inner sep=2pt] (eq3) at ([xshift=-2pt]eq2.east) {$\lambda_{t_v}^{-1}$};
\node [anchor=west,inner sep=2pt] (eq4) at ([xshift=-2pt]eq3.east) {$\frac{\epsilon}{(l+1)^{m}}$};
\node [anchor=west,inner sep=2pt,fill=red!20,minimum height=3em] (eq5) at ([xshift=-2pt]eq4.east) {\footnotesize{$\prod\limits_{j=1}^{m} \sum\limits_{i=0}^{l} f(s_j|t_i)$}};
\node [anchor=west,inner sep=2pt] (eq6) at ([xshift=-2pt]eq5.east) {\footnotesize{$\sum\limits_{j=1}^{m} \delta(s_j,s_u) \sum\limits_{i=0}^{l} \delta(t_i,t_v)$}};
\node [anchor=west,inner sep=2pt,fill=red!20,minimum height=3em] (eq7) at ([xshift=-2pt,yshift=-0pt]eq6.east) {$\frac{f(s_u|t_v)}{\sum_{i=0}^{l}f(s_u|t_i)}$};
\node [anchor=south west,inner sep=2pt] (label1) at ([yshift=1em]eq1.north west) {\footnotesize{\textbf{新的参数值}}};
\node [anchor=south east,inner sep=2pt] (label2) at ([yshift=1em,xshift=-5em]eq7.north east) {\footnotesize{\textbf{旧的参数值}}};
\draw [<-,thick] (label1.south) .. controls +(south:1em) and +(north:1em) .. ([xshift=-1em]eq1.north);
\draw [<-,thick] (label2.south) .. controls +(300:1em) and +(north:1em) .. ([xshift=1em]eq7.north);
\draw [<-,thick] ([xshift=-0.5em]label2.south) .. controls +(240:1em) and +(north:1em) .. ([xshift=1em]eq5.north);
\end{tikzpicture}
%---------------------------------------------------------------------
%%% outline
%-------------------------------------------------------------------------
\begin{tikzpicture}
\node [anchor=west,inner sep=2pt,minimum height=2em] (eq1) at (0,0) {$f(s_u|t_v)$};
\node [anchor=west,inner sep=2pt] (eq2) at ([xshift=-2pt]eq1.east) {$=$};
\node [anchor=west,inner sep=2pt,minimum height=2em] (eq3) at ([xshift=-2pt]eq2.east) {$\lambda_{t_v}^{-1}$};
\node [anchor=west,inner sep=2pt,minimum height=3.0em] (eq4) at ([xshift=-3pt]eq3.east) {\footnotesize{$\frac{\epsilon}{(l+1)^{m}} \prod\limits_{j=1}^{m} \sum\limits_{i=0}^{l} f(s_j|t_i)$}};
\node [anchor=west,inner sep=2pt,minimum height=3.0em] (eq5) at ([xshift=1pt]eq4.east) {\footnotesize{$\sum\limits_{j=1}^{m} \delta(s_j,s_u) \sum\limits_{i=0}^{l} \delta(t_i,t_v)$}};
\node [anchor=west,inner sep=2pt,minimum height=3.0em] (eq6) at ([xshift=1pt]eq5.east) {$\frac{f(s_u|t_v)}{\sum_{i=0}^{l}f(s_u|t_i)}$};
{
\node [anchor=west,inner sep=2pt,fill=red!20,minimum height=3.0em] (eq4) at ([xshift=-3pt]eq3.east) {\footnotesize{$\frac{\epsilon}{(l+1)^{m}} \prod\limits_{j=1}^{m} \sum\limits_{i=0}^{l} f(s_j|t_i)$}};
}
{
\node [anchor=west,inner sep=2pt,fill=blue!20,minimum height=3.0em] (eq5) at ([xshift=1pt]eq4.east) {\footnotesize{$\sum\limits_{j=1}^{m} \delta(s_j,s_u) \sum\limits_{i=0}^{l} \delta(t_i,t_v)$}};
}
{
\node [anchor=west,inner sep=2pt,fill=green!20,minimum height=3.0em] (eq6) at ([xshift=1pt]eq5.east) {$\frac{f(s_u|t_v)}{\sum_{i=0}^{l}f(s_u|t_i)}$};
}
{
\node [anchor=south west,inner sep=2pt] (label1) at (eq4.north west) {\textbf{\scriptsize{翻译概率$\textrm{P}(s|t)$}}};
}
{
\node [anchor=south west,inner sep=2pt] (label2) at (eq5.north west) {\textbf{\scriptsize{配对的总次数}}};
\node [anchor=south west,inner sep=2pt] (label2part2) at ([yshift=-3pt]label2.north west) {\textbf{\scriptsize{$(s_u,t_v)$在句对$(s,t)$}}};
}
{
\node [anchor=south west,inner sep=2pt] (label3) at (eq6.north west) {\textbf{\scriptsize{有的$t_i$的相对值}}};
\node [anchor=south west,inner sep=2pt] (label4) at ([yshift=-3pt]label3.north west) {\textbf{\scriptsize{$f(s_u|t_v)$对于所}}};
}
{
\node [anchor=east,rotate=90] (neweq1) at ([yshift=-0em]eq4.south) {=};
\node [anchor=north,inner sep=1pt] (neweq1full) at (neweq1.west) {\large{$\textrm{P}(s|t)$}};
}
{
\draw[decorate,thick,decoration={brace,amplitude=5pt,mirror}] ([yshift=-0.2em]eq5.south west) -- ([yshift=-0.2em]eq6.south east) node [pos=0.4,below,xshift=-0.0em,yshift=-0.3em] (expcount1) {\footnotesize{\textbf{“$t_v$翻译为$s_u$”这个事件}}};
\node [anchor=north west] (expcount2) at ([yshift=0.5em]expcount1.south west) {\footnotesize{\textbf{出现次数的期望的估计}}};
\node [anchor=north west] (expcount3) at ([yshift=0.5em]expcount2.south west) {\footnotesize{\textbf{称之为期望频次(expected count)}}};
}
\end{tikzpicture}
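% A sketch of how the expected count c_E(s_u|t_v; s,t) described above could be computed for a single
% sentence pair, kept as a comment. The function name expected_count is illustrative; f is the current
% word translation probability table, and t_ext already contains the empty word t_0 at position 0.
%
%   def expected_count(s_u, t_v, s, t_ext, f):
%       """f(s_u|t_v) relative to all t_i, times the number of (s_u, t_v) pairings in (s, t)."""
%       ratio = f[(s_u, t_v)] / sum(f[(s_u, t_i)] for t_i in t_ext)
%       return ratio * sum(1 for s_j in s if s_j == s_u) * sum(1 for t_i in t_ext if t_i == t_v)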
%%% outline
%-------------------------------------------------------------------------
\begin{tikzpicture}
\node [anchor=west,inner sep=2pt] (eq1) at (0,0) {$f(s_u|t_v)$};
\node [anchor=west] (eq2) at (eq1.east) {$=$\ };
\draw [-] ([xshift=0.3em]eq2.east) -- ([xshift=11.6em]eq2.east);
\node [anchor=south west] (eq3) at ([xshift=1em]eq2.east) {$\sum_{i=1}^{N} c_{\mathbb{E}}(s_u|t_v;s^{[i]},t^{[i]})$};
\node [anchor=north west] (eq4) at (eq2.east) {$\sum_{s_u} \sum_{i=1}^{N} c_{\mathbb{E}}(s_u|t_v;s^{[i]},t^{[i]})$};
{
\node [anchor=south] (label1) at ([yshift=-6em,xshift=3em]eq1.north west) {利用这个公式计算};
\node [anchor=north west] (label1part2) at ([yshift=0.3em]label1.south west) {新的$f(s_u|t_v)$};
}
{
\node [anchor=west] (label2) at ([xshift=5em]label1.east) {用当前的$f(s_u|t_v)$};
\node [anchor=north west] (label2part2) at ([yshift=0.3em]label2.south west) {计算期望频次$c_{\mathbb{E}}(\cdot)$};
}
{
\node [anchor=west,fill=red!20,inner sep=2pt] (eq1) at (0,0) {$f(s_u|t_v)$};
}
\begin{pgfonlayer}{background}
{
\node[rectangle,fill=blue!20,inner sep=0] [fit = (eq3) (eq4)] (c) {};
}
{
\node[rectangle,draw,red,inner sep=0] [fit = (label1) (label1part2)] (flabel) {};
}
{
\node[rectangle,draw,ublue,inner sep=0] [fit = (label2) (label2part2)] (clabel) {};
}
\end{pgfonlayer}
{
\draw [->,thick] (eq1.south) ..controls +(south:1.5em) and +(north:1.5em).. (flabel.north);
}
{
\draw [->,thick] (c.south) ..controls +(south:1.0em) and +(north:1.0em).. (clabel.north);
}
{
\draw [->,thick] ([yshift=1em]flabel.east) -- ([yshift=1em]clabel.west);
\draw [<-,thick] ([yshift=-1em]flabel.east) -- ([yshift=-1em]clabel.west) node [pos=0.5,above,yshift=0.3em] {\footnotesize{\textbf{反复执行}}};
}
\end{tikzpicture}
\definecolor{ublue}{rgb}{0.152,0.250,0.545}
\definecolor{ugreen}{rgb}{0,0.5,0}
%%% outline
%-------------------------------------------------------------------------
\begin{tikzpicture}
\node [anchor=north west] (line1) at (0,0) {\textbf{IBM模型1的训练(EM算法)}};
\node [anchor=north west] (line2) at ([yshift=-0.3em]line1.south west) {输入: 平行语料$\{(s^{[1]},t^{[1]}),...,(s^{[N]},t^{[N]})\}$};
\node [anchor=north west] (line3) at ([yshift=-0.1em]line2.south west) {输出: 参数$f(\cdot|\cdot)$的最优值};
\node [anchor=north west] (line4) at ([yshift=-0.1em]line3.south west) {1: \textbf{Function} \textsc{TrainItWithEM}($\{(s^{[1]},t^{[1]}),...,(s^{[N]},t^{[N]})\}$) };
\node [anchor=north west] (line5) at ([yshift=-0.1em]line4.south west) {2: \ \ Initialize $f(\cdot|\cdot)$ \hspace{5em} $\rhd$ 比如给$f(\cdot|\cdot)$一个均匀分布};
\node [anchor=north west] (line6) at ([yshift=-0.1em]line5.south west) {3: \ \ Loop until $f(\cdot|\cdot)$ converges};
\node [anchor=north west] (line7) at ([yshift=-0.1em]line6.south west) {4: \ \ \ \ \textbf{foreach} $k = 1$ to $N$ \textbf{do}};
\node [anchor=north west] (line8) at ([yshift=-0.1em]line7.south west) {5: \ \ \ \ \ \ \ \footnotesize{$c_{\mathbb{E}}(s_u|t_v;s^{[k]},t^{[k]}) = \sum\limits_{j=1}^{|s^{[k]}|} \delta(s_j,s_u) \sum\limits_{i=0}^{|t^{[k]}|} \delta(t_i,t_v) \cdot \frac{f(s_u|t_v)}{\sum_{i=0}^{|t^{[k]}|}f(s_u|t_i)}$}\normalsize{}};
\node [anchor=north west] (line9) at ([yshift=-0.1em]line8.south west) {6: \ \ \ \ \textbf{foreach} $t_v$ that appears in at least one of $\{t^{[1]},...,t^{[N]}\}$ \textbf{do}};
\node [anchor=north west] (line10) at ([yshift=-0.1em]line9.south west) {7: \ \ \ \ \ \ \ $\lambda_{t_v}^{'} = \sum_{s_u} \sum_{k=1}^{N} c_{\mathbb{E}}(s_u|t_v;s^{[k]},t^{[k]})$};
\node [anchor=north west] (line11) at ([yshift=-0.1em]line10.south west) {8: \ \ \ \ \ \ \ \textbf{foreach} $s_u$ that appears in at least one of $\{s^{[1]},...,s^{[N]}\}$ \textbf{do}};
\node [anchor=north west] (line12) at ([yshift=-0.1em]line11.south west) {9: \ \ \ \ \ \ \ \ \ $f(s_u|t_v) = \sum_{k=1}^{N} c_{\mathbb{E}}(s_u|t_v;s^{[k]},t^{[k]}) \cdot (\lambda_{t_v}^{'})^{-1}$};
\node [anchor=north west] (line13) at ([yshift=-0.1em]line12.south west) {10: \ \textbf{return} $f(\cdot|\cdot)$};
\begin{pgfonlayer}{background}
{
\node[rectangle,draw=ublue, inner sep=0mm] [fit =(line1)(line2)(line3)(line4)(line5)(line6)(line7)(line11)(line8)(line9)(line13)] {};
}
\end{pgfonlayer}
\end{tikzpicture}
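% For reference, a compact Python sketch of the EM loop in the algorithm above, kept as a comment.
% It is a simplified sketch (fixed number of iterations instead of a convergence test, uniform
% initialization); corpus is assumed to be a list of (s, t) pairs of token lists.
%
%   from collections import defaultdict
%
%   def train_ibm1(corpus, iterations=10):
%       f = defaultdict(lambda: 1.0)              # f(s_u|t_v); uniform once normalized in the E-step
%       for _ in range(iterations):
%           count = defaultdict(float)            # expected counts c_E(s_u|t_v) over the corpus
%           total = defaultdict(float)            # lambda'_{t_v}
%           for s, t in corpus:
%               t_ext = [None] + t                # None stands for the empty word t_0
%               for s_j in s:
%                   norm = sum(f[(s_j, t_i)] for t_i in t_ext)
%                   for t_i in t_ext:
%                       c = f[(s_j, t_i)] / norm  # expected count contributed by this occurrence
%                       count[(s_j, t_i)] += c
%                       total[t_i] += c
%           f = defaultdict(float, {(s_u, t_v): c / total[t_v] for (s_u, t_v), c in count.items()})
%       return f
%
%   # e.g. train_ibm1([(['谢谢', '你'], ['thank', 'you'])])[('谢谢', 'thank')]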
%%% outline
%-------------------------------------------------------------------------
\begin{tikzpicture}
\begin{scope}
{
\node [anchor=west,inner sep=2pt] (s1) at (0,0) {谢谢};
\node [anchor=west,inner sep=2pt] (s2) at ([xshift=0.4em]s1.east) {};
\node [anchor=north,inner sep=2pt] (t1) at ([yshift=-1.4em]s1.center) {thank};
\node [anchor=north,inner sep=2pt] (t2) at ([yshift=-1.6em]s2.center) {you};
\node [anchor=east,inner sep=2pt] (t0) at ([xshift=-0.2em,yshift=-0.05em]t1.west) {$t_0$};
\draw [-] (s1.south) -- (t0.north);
\draw [-] (s2.south) -- (t0.north);
{
\node [anchor=south east,inner sep=0pt] (p) at (t0.north west) {\small{{\color{ugreen} P(}}};
\node [anchor=south west,inner sep=0pt] (p2) at ([yshift=0.2em]t2.north east) {\small{{\color{ugreen} )}}};
\node [anchor=west] (eq) at ([xshift=0.7em]p2.east) {\small{+}};
}
}
\end{scope}
\begin{scope}[xshift=1.5in]
{
\node [anchor=west,inner sep=2pt] (s1) at (0,0) {谢谢};
\node [anchor=west,inner sep=2pt] (s2) at ([xshift=0.4em]s1.east) {};
\node [anchor=north,inner sep=2pt] (t1) at ([yshift=-1.4em]s1.center) {thank};
\node [anchor=north,inner sep=2pt] (t2) at ([yshift=-1.6em]s2.center) {you};
\node [anchor=east,inner sep=2pt] (t0) at ([xshift=-0.2em,yshift=-0.05em]t1.west) {$t_0$};
\draw [-] (s1.south) -- (t0.north);
\draw [-] (s2.south) -- (t1.north);
{
\node [anchor=south east,inner sep=0pt] (p) at (t0.north west) {\small{{\color{ugreen} P(}}};
\node [anchor=south west,inner sep=0pt] (p2) at ([yshift=0.2em]t2.north east) {\small{{\color{ugreen} )}}};
\node [anchor=west] (eq) at ([xshift=0.7em]p2.east) {\small{+}};
}
}
\end{scope}
\begin{scope}[xshift=3in]
{
\node [anchor=west,inner sep=2pt] (s1) at (0,0) {谢谢};
\node [anchor=west,inner sep=2pt] (s2) at ([xshift=0.4em]s1.east) {};
\node [anchor=north,inner sep=2pt] (t1) at ([yshift=-1.4em]s1.center) {thank};
\node [anchor=north,inner sep=2pt] (t2) at ([yshift=-1.6em]s2.center) {you};
\node [anchor=east,inner sep=2pt] (t0) at ([xshift=-0.2em,yshift=-0.05em]t1.west) {$t_0$};
\draw [-] (s1.south) -- (t0.north);
\draw [-] (s2.south) -- (t2.north);
{
\node [anchor=south east,inner sep=0pt] (p) at (t0.north west) {\small{{\color{ugreen} P(}}};
\node [anchor=south west,inner sep=0pt] (p2) at ([yshift=0.2em]t2.north east) {\small{{\color{ugreen} )}}};
\node [anchor=west] (eq) at ([xshift=0.7em]p2.east) {\small{+}};
}
}
\end{scope}
\begin{scope}[yshift=-0.6in]
{
\node [anchor=west,inner sep=2pt] (s1) at (0,0) {谢谢};
\node [anchor=west,inner sep=2pt] (s2) at ([xshift=0.4em]s1.east) {};
\node [anchor=north,inner sep=2pt] (t1) at ([yshift=-1.4em]s1.center) {thank};
\node [anchor=north,inner sep=2pt] (t2) at ([yshift=-1.6em]s2.center) {you};
\node [anchor=east,inner sep=2pt] (t0) at ([xshift=-0.2em,yshift=-0.05em]t1.west) {$t_0$};
\draw [-] (s1.south) -- ([yshift=-0.2em]t1.north);
\draw [-] (s2.south) -- (t0.north);
{
\node [anchor=south east,inner sep=0pt] (p) at (t0.north west) {\small{{\color{ugreen} P(}}};
\node [anchor=south west,inner sep=0pt] (p2) at ([yshift=0.2em]t2.north east) {\small{{\color{ugreen} )}}};
\node [anchor=west] (eq) at ([xshift=0.7em]p2.east) {\small{+}};
}
}
\end{scope}
\begin{scope}[xshift=1.5in,yshift=-0.6in]
{
\node [anchor=west,inner sep=2pt] (s1) at (0,0) {谢谢};
\node [anchor=west,inner sep=2pt] (s2) at ([xshift=0.4em]s1.east) {};
\node [anchor=north,inner sep=2pt] (t1) at ([yshift=-1.4em]s1.center) {thank};
\node [anchor=north,inner sep=2pt] (t2) at ([yshift=-1.6em]s2.center) {you};
\node [anchor=east,inner sep=2pt] (t0) at ([xshift=-0.2em,yshift=-0.05em]t1.west) {$t_0$};
\draw [-] (s1.south) -- ([yshift=-0.2em]t1.north);
\draw [-] (s2.south) -- ([yshift=-0.2em]t1.north);
{
\node [anchor=south east,inner sep=0pt] (p) at (t0.north west) {\small{{\color{ugreen} P(}}};
\node [anchor=south west,inner sep=0pt] (p2) at ([yshift=0.2em]t2.north east) {\small{{\color{ugreen} )}}};
\node [anchor=west] (eq) at ([xshift=0.7em]p2.east) {\small{+}};
}
}
\end{scope}
\begin{scope}[xshift=3in,yshift=-0.6in]
{
\node [anchor=west,inner sep=2pt] (s1) at (0,0) {谢谢};
\node [anchor=west,inner sep=2pt] (s2) at ([xshift=0.4em]s1.east) {};
\node [anchor=north,inner sep=2pt] (t1) at ([yshift=-1.4em]s1.center) {thank};
\node [anchor=north,inner sep=2pt] (t2) at ([yshift=-1.6em]s2.center) {you};
\node [anchor=east,inner sep=2pt] (t0) at ([xshift=-0.2em,yshift=-0.05em]t1.west) {$t_0$};
\draw [-] (s1.south) -- ([yshift=-0.2em]t1.north);
\draw [-] (s2.south) -- (t2.north);
{
\node [anchor=south east,inner sep=0pt] (p) at (t0.north west) {\small{{\color{ugreen} P(}}};
\node [anchor=south west,inner sep=0pt] (p2) at ([yshift=0.2em]t2.north east) {\small{{\color{ugreen} )}}};
\node [anchor=west] (eq) at ([xshift=0.7em]p2.east) {\small{+}};
}
}
\end{scope}
\begin{scope}[yshift=-1.2in]
{
\node [anchor=west,inner sep=2pt] (s1) at (0,0) {谢谢};
\node [anchor=west,inner sep=2pt] (s2) at ([xshift=0.4em]s1.east) {};
\node [anchor=north,inner sep=2pt] (t1) at ([yshift=-1.4em]s1.center) {thank};
\node [anchor=north,inner sep=2pt] (t2) at ([yshift=-1.6em]s2.center) {you};
\node [anchor=east,inner sep=2pt] (t0) at ([xshift=-0.2em,yshift=-0.05em]t1.west) {$t_0$};
\draw [-] (s1.south) -- (t2.north);
\draw [-] (s2.south) -- (t0.north);
{
\node [anchor=south east,inner sep=0pt] (p) at (t0.north west) {\small{{\color{ugreen} P(}}};
\node [anchor=south west,inner sep=0pt] (p2) at ([yshift=0.2em]t2.north east) {\small{{\color{ugreen} )}}};
\node [anchor=west] (eq) at ([xshift=0.7em]p2.east) {\small{+}};
}
}
\end{scope}
\begin{scope}[xshift=1.5in,yshift=-1.2in]
{
\node [anchor=west,inner sep=2pt] (s1) at (0,0) {谢谢};
\node [anchor=west,inner sep=2pt] (s2) at ([xshift=0.4em]s1.east) {};
\node [anchor=north,inner sep=2pt] (t1) at ([yshift=-1.4em]s1.center) {thank};
\node [anchor=north,inner sep=2pt] (t2) at ([yshift=-1.6em]s2.center) {you};
\node [anchor=east,inner sep=2pt] (t0) at ([xshift=-0.2em,yshift=-0.05em]t1.west) {$t_0$};
\draw [-] (s1.south) -- (t2.north);
\draw [-] (s2.south) -- (t1.north);
{
\node [anchor=south east,inner sep=0pt] (p) at (t0.north west) {\small{{\color{ugreen} P(}}};
\node [anchor=south west,inner sep=0pt] (p2) at ([yshift=0.2em]t2.north east) {\small{{\color{ugreen} )}}};
\node [anchor=west] (eq) at ([xshift=0.7em]p2.east) {\small{+}};
}
}
\end{scope}
\begin{scope}[xshift=3in,yshift=-1.2in]
{
\node [anchor=west,inner sep=2pt] (s1) at (0,0) {谢谢};
\node [anchor=west,inner sep=2pt] (s2) at ([xshift=0.4em]s1.east) {};
\node [anchor=north,inner sep=2pt] (t1) at ([yshift=-1.4em]s1.center) {thank};
\node [anchor=north,inner sep=2pt] (t2) at ([yshift=-1.6em]s2.center) {you};
\node [anchor=east,inner sep=2pt] (t0) at ([xshift=-0.2em,yshift=-0.05em]t1.west) {$t_0$};
\draw [-] (s1.south) -- (t2.north);
\draw [-] (s2.south) -- (t2.north);
{
\node [anchor=south east,inner sep=0pt] (p) at (t0.north west) {\small{{\color{ugreen} P(}}};
\node [anchor=south west,inner sep=0pt] (p2) at ([yshift=0.2em]t2.north east) {\small{{\color{ugreen} )}}};
\node [anchor=west] (eq) at ([xshift=0.7em]p2.east) {\normalsize{= \ P($s|t$)}};
}
}
\end{scope}
\end{tikzpicture}
%---------------------------------------------------------------------
%%% outline
%-------------------------------------------------------------------------
\begin{tikzpicture}
{\small
\node [anchor=west,inner sep=2pt] (s1) at (0,0) {};
\node [anchor=west,inner sep=2pt] (s2) at ([xshift=1em]s1.east) {桌子};
\node [anchor=west,inner sep=2pt] (s3) at ([xshift=1em]s2.east) {};
\node [anchor=north,inner sep=2pt] (t1) at ([yshift=-1.7em]s1.center) {on};
\node [anchor=north,inner sep=2pt] (t2) at ([yshift=-1.5em]s2.center) {the};
\node [anchor=north,inner sep=2pt] (t3) at ([yshift=-1.5em]s3.center) {table};
\node [anchor=east,inner sep=2pt] (t0) at ([xshift=-1.5em]t1.west) {$t_0$};
\draw [-] (s1.south) -- (t0.north);
\draw [-] (s2.south) -- (t3.north);
\draw [-] (s3.south) .. controls +(south:1em) and +(north:1em) .. (t1.north);
}
\end{tikzpicture}
%---------------------------------------------------------------------
%%% outline
%-------------------------------------------------------------------------
\begin{tikzpicture}
\begin{scope}
{\small
\node [anchor=west,inner sep=2pt] (s1) at (0,0) {谢谢};
\node [anchor=west,inner sep=2pt] (s2) at ([xshift=2em]s1.east) {};
\node [anchor=north,inner sep=2pt] (t1) at ([yshift=-2.2em]s1.center) {thank};
\node [anchor=north,inner sep=2pt] (t2) at ([yshift=-2.45em]s2.center) {you};
\draw [-] (s1.south) -- ([yshift=-0.2em]t1.north);
\draw [-] (s2.south) -- (t2.north);
\node [anchor=center,draw=ublue,circle,thick,fill=white,inner sep=1pt,circular drop shadow={shadow xshift=0.1em,shadow yshift=-0.1em}] (mark) at ([xshift=0.8em,yshift=-0.7em]s2.south east) {{\color{ugreen} \tiny{\textbf{Yes}}}};
}
\end{scope}
\begin{scope}[xshift=1.8in]
{\small
\node [anchor=west,inner sep=2pt] (s1) at (0,0) {谢谢};
\node [anchor=west,inner sep=2pt] (s2) at ([xshift=2em]s1.east) {};
\node [anchor=north,inner sep=2pt] (t1) at ([yshift=-2.2em]s1.center) {thank};
\node [anchor=north,inner sep=2pt] (t2) at ([yshift=-2.45em]s2.center) {you};
\draw [-] (s1.south) -- ([yshift=-0.2em]t1.north);
\draw [-] (s1.south) -- (t2.north);
\node [anchor=center,draw=ublue,circle,thick,fill=white,inner sep=1.5pt,circular drop shadow={shadow xshift=0.1em,shadow yshift=-0.1em}] (mark) at ([xshift=0.8em,yshift=-0.7em]s2.south east) {{\color{red} \tiny{\textbf{No}}}};
}
\end{scope}
\begin{scope}[xshift=3.6in]
{\small
\node [anchor=west,inner sep=2pt] (s1) at (0,0) {谢谢};
\node [anchor=west,inner sep=2pt] (s2) at ([xshift=2em]s1.east) {};
\node [anchor=north,inner sep=2pt] (t1) at ([yshift=-2.2em]s1.center) {thank};
\node [anchor=north,inner sep=2pt] (t2) at ([yshift=-2.45em]s2.center) {you};
\draw [-] (s1.south) -- ([yshift=-0.2em]t1.north);
\draw [-] (s2.south) -- ([yshift=-0.2em]t1.north);
\node [anchor=center,draw=ublue,circle,thick,fill=white,inner sep=1pt,circular drop shadow={shadow xshift=0.1em,shadow yshift=-0.1em}] (mark) at ([xshift=0.8em,yshift=-0.7em]s2.south east) {{\color{ugreen} \tiny{\textbf{Yes}}}};
}
\end{scope}
\end{tikzpicture}
%---------------------------------------------------------------------
%%% outline
%-------------------------------------------------------------------------
\begin{tikzpicture}
\node [draw,red,fill=red!10,thick,anchor=center,circle,inner sep=3.5pt] (s1) at (0,0) {\black{$s$}};
\node [draw,ublue,fill=blue!10,thick,anchor=center,circle,inner sep=2pt] (t) at ([xshift=1in]s1.east) {\black{$\hat{t}$}};
\draw [->,thick,] (s1.north east) .. controls +(north east:1em) and +(north west:1em).. (t.north west) node[pos=0.5,below] {\tiny{正确翻译}};
\node [draw,red,fill=red!10,thick,anchor=center,circle,inner sep=3.5pt] (s) at ([xshift=13em,yshift=0em]s1.east) {\black{$s$}};
\node [draw,ublue,fill=blue!10,thick,anchor=center,circle,inner sep=2pt] (t1) at ([xshift=1in]s.east) {\black{$t_1$}};
\node [draw,ublue,fill=blue!10,thick,anchor=center,circle,inner sep=2pt] (t2) at ([xshift=3em,yshift=2em]t1.north east) {\black{$t_2$}};
\node [draw,ublue,fill=blue!10,thick,anchor=center,circle,inner sep=2pt] (t3) at ([xshift=1em,yshift=4em]t1.north east) {\black{$t_3$}};
\node [draw,ublue,fill=blue!10,thick,anchor=center,circle,inner sep=2pt] (t4) at ([xshift=3em,yshift=-1.5em]t1.north east) {\black{$t_4$}};
\node [draw,dashed,ublue,fill=blue!10,thick,anchor=center,circle,minimum size=18pt] (t5) at ([xshift=3em]t3.east) {};
\node [draw,dashed,ublue,fill=blue!10,thick,anchor=center,circle,minimum size=18pt] (t6) at ([xshift=3em]t2.east) {};
\node [draw,dashed,ublue,fill=blue!10,thick,anchor=center,circle,minimum size=18pt] (t7) at ([xshift=3em]t4.east) {};
\draw [->,thick,] (s.north east) .. controls +(north east:1em) and +(north west:1em).. (t1.north west) node[pos=0.5,below] {\tiny{P ($t_1|s$)=0.1}};
\draw [->,thick,] (s.60) .. controls +(50:4em) and +(west:1em).. (t2.west) node[pos=0.5,below] {\tiny{P($t_2|s$)=0.2}};
\draw [->,thick,] (s.north) .. controls +(70:4em) and +(west:1em).. (t3.west) node[pos=0.5,above,xshift=-1em] {\tiny{P($t_3|s$)=0.3}};
\draw [->,thick,] (s.south east) .. controls +(300:3em) and +(south west:1em).. (t4.south west) node[pos=0.5,below] {\tiny{P($t_4|s$)=0.1}};
\node [anchor=center] (foot1) at ([xshift=3.8em,yshift=-3em]s1.south) {\footnotesize{人的翻译候选空间}};
\node [anchor=center] (foot2) at ([xshift=7em,yshift=-3em]s.south) {\footnotesize{机器的翻译候选空间}};
\end{tikzpicture}
%---------------------------------------------------------------------
%%% outline
%-------------------------------------------------------------------------
\begin{tikzpicture}
\begin{scope}
\node [anchor=west] (s1) at (0,0) {$s$ = 在\ \ 桌子\ \ };
\node [anchor=west] (t1) at ([yshift=-2em]s1.west) {$t$ = on\ \ the\ \ table};
\draw [->,double,thick,ublue] ([yshift=0.2em]s1.south) -- ([yshift=-0.8em]s1.south);
\end{scope}
\begin{scope}[xshift=1.5in]
\node [anchor=west] (s2) at (0,0) {$s$ = 在\ \ 桌子\ \ };
\node [anchor=west] (t2) at ([yshift=-2em]s2.west) {$t'$ = table \ on\ \ the};
\draw [->,double,thick,ublue] ([yshift=0.2em]s2.south) -- ([yshift=-0.8em]s2.south);
\end{scope}
\node [anchor=north] (score11) at ([yshift=-2.0em]s1.south) {$\textrm{P}(s|t)$};
\node [anchor=north] (score12) at ([yshift=-2.0em]s2.south) {$\textrm{P}(s|t')$};
\node [anchor=west] (comp1) at ([xshift=2.3em]score11.east) {\large{$\mathbf{=}$}};
\node [anchor=east] (label1) at ([xshift=-1em,yshift=0.1em]score11.west) {\textbf{IBM模型1:}};
{
\node [anchor=north] (score21) at ([yshift=0.2em]score11.south) {$\textrm{P}(s|t)$};
\node [anchor=north] (score22) at ([yshift=0.2em]score12.south) {$\textrm{P}(s|t')$};
\node [anchor=west] (comp2) at ([xshift=2.3em]score21.east) {\large{$\mathbf{>}$}};
\node [anchor=east] (label2) at ([xshift=-1em,yshift=0.1em]score21.west) {\textbf{理想:}};
}
\end{tikzpicture}
%---------------------------------------------------------------------
\definecolor{ublue}{rgb}{0.152,0.250,0.545}
\definecolor{ugreen}{rgb}{0,0.5,0}
%%% outline
%-------------------------------------------------------------------------
\begin{tikzpicture}
{\footnotesize
\node [anchor=west] (mid) at (0,0) {$\alpha(1,0)\alpha(2,0) + \alpha(1,0)\alpha(2,1) + \alpha(1,0)\alpha(2,2) +$};
\node [anchor=west] (mid2) at ([yshift=-2em]mid.west) {$\alpha(1,1)\alpha(2,0) + \alpha(1,1)\alpha(2,1) + \alpha(1,1)\alpha(2,2)+$};
\node [anchor=west] (mid3) at ([yshift=-2em]mid2.west) {$\alpha(1,2)\alpha(2,0) + \alpha(1,2)\alpha(2,1) + \alpha(1,2)\alpha(2,2)$};
}
\begin{pgfonlayer}{background}
\node[rectangle,draw=ublue,red,inner sep=0.1em,fill=white] [fit = (mid) (mid2) (mid3)] (exampleeq) {};
\end{pgfonlayer}
{\footnotesize
{
\node [anchor=north] (eq1) at ([xshift=2em,yshift=-2em]exampleeq.south west) {$\sum\limits_{y_1=0}^{2} \sum\limits_{y_2=0}^{2} \alpha(1,y_1)\alpha(2,y_2)$};
\node [anchor=west] (eq1part2) at ([xshift=-1em,yshift=-3em]eq1.west) {$=$};
\node [anchor=west] (eq1part3) at ([xshift=-0.5em]eq1part2.east) {$\sum\limits_{y_1=0}^{2} \sum\limits_{y_2=0}^{2} \prod\limits_{x=1}^{2} $};
\node [anchor=west,inner sep=2pt] (eq1part4) at ([xshift=-0.3em]eq1part3.east) {$\alpha(x,y_x)$};
}
{
\node [anchor=north] (eq2) at ([xshift=-2em,yshift=-2em]exampleeq.south east) {$(\alpha(1,0)+\alpha(1,1)+\alpha(1,2))\cdot$};
\node [anchor=west] (eq2part2) at ([yshift=-1.5em]eq2.west) {$(\alpha(2,0)+\alpha(2,1)+\alpha(2,2))$};
\node [anchor=west] (eq2part3) at ([xshift=2.1in]eq1part2.east){$=$};
\node [anchor=west] (eq2part4) at ([xshift=-0.5em]eq2part3.east){$\prod\limits_{x=1}^{2} \sum\limits_{y=0}^{2}$};
\node [anchor=west,inner sep=2pt] (eq2part5) at ([xshift=-0.3em]eq2part4.east){$\alpha(x,y)$};
}
}
\begin{pgfonlayer}{background}
{
\node[rectangle,draw=ublue,red,inner sep=0.1em,fill=white] [fit = (eq1) (eq1part2) (eq1part3)] (eq1full) {};
}
{
\node[rectangle,draw=ublue,red,inner sep=0.1em,fill=white] [fit = (eq2) (eq2part2) (eq2part3) (eq2part4)] (eq2full) {};
}
\end{pgfonlayer}
{
\draw [->,thick] ([xshift=-3em]exampleeq.south) .. controls +(south:1.5em) and +(north:1.5em) .. (eq1full.north);
}
{
\draw [->,thick] ([xshift=3em]exampleeq.south) .. controls +(south:1.5em) and +(north:1.5em) .. (eq2full.north);
}
{
\node [anchor=west] at ([xshift=0.7em]eq1full.east) {\LARGE{\textbf{=}}};
}
{
{\large
\node [anchor=west] (feq) at ([xshift=3em,yshift=-3em]eq1full.south west) {$\sum\limits_{a_1=0}^{l} ... \sum\limits_{a_m=0}^{l} \prod\limits_{j=1}^{m}$};
\node [anchor=west,inner sep=2pt,fill=blue!20] (feqpart2) at ([xshift=-0.3em]feq.east) {$f(s_j|t_{a_j})$};
\node [anchor=west,inner sep=1pt] (feqpart3) at (feqpart2.east) {=};
\node [anchor=west] (feqpart4) at (feqpart3.east) {$\prod\limits_{j=1}^{m} \sum\limits_{i=0}^{l}$};
\node [anchor=west,inner sep=2pt,fill=blue!20] (feqpart5) at ([xshift=-0.3em]feqpart4.east) {$f(s_j|t_i)$};
}
\draw [->,thick] (eq1part4.south) .. controls +(south:2.5em) and +(north:2.5em) .. (feqpart2.north);
\draw [->,thick] (eq2part5.south) .. controls +(south:1.5em) and +(north:1.5em) .. (feqpart5.north);
\node [anchor=west,inner sep=2pt,fill=blue!20] (eq1part4) at ([xshift=-0.3em]eq1part3.east) {\footnotesize{$\alpha(x,y_x)$}};
\node [anchor=west,inner sep=2pt,fill=blue!20] (eq2part5) at ([xshift=-0.3em]eq2part4.east){\footnotesize{$\alpha(x,y)$}};
}
\end{tikzpicture}
%%% outline
%-------------------------------------------------------------------------
\begin{tabular}{| l | l |}
\hline
& {\footnotesize{$\prod\limits_{(j,i) \in \hat{A}} \textrm{P}(s_j,t_i)$} } \\ \hline
\begin{tikzpicture}
\begin{scope}
{\footnotesize
\begin{scope}
\node [anchor=west] (s1) at (0,0) {$_1$};
\node [anchor=west] (s2) at ([xshift=2.2em]s1.east) {$_2$};
\node [anchor=west] (s3) at ([xshift=3.2em]s2.east) {$_3$};
\node [anchor=west] (s4) at ([xshift=3.6em]s3.east) {感到$_4$};
\node [anchor=west] (s5) at ([xshift=1.9em]s4.east) {满意$_5$};
\node [anchor=east] (s) at (s1.west) {$s=$};
\end{scope}
\begin{scope}[yshift=-3.6em]
\node [anchor=west] (t1) at (0.35em,0) {I$_1$};
\node [anchor=west] (t2) at ([xshift=2.3em,yshift=-0.1em]t1.east) {am$_2$};
\node [anchor=west] (t3) at ([xshift=2.3em,yshift=0.1em]t2.east) {satisfied$_3$};
\node [anchor=west] (t4) at ([xshift=2.3em]t3.east) {with$_4$};
\node [anchor=west] (t5) at ([xshift=2.3em,yshift=-0.2em]t4.east) {you$_5$};
\node [anchor=east] (t) at (t1.west) {$t'=$};
\end{scope}
\draw [-,thick,ublue,dashed] (s1.south) -- (t1.north);
\draw [-,thick,ublue,dashed] (s4.south) -- ([yshift=0.3em]t2.north);
\draw [-,thick,ublue,dashed] (s2.south) ..controls +(south:1em) and +(north:1em).. (t4.north);
\draw [-,thick,ublue,dashed] (s3.south) ..controls +(south:0.5em) and +(north:1.5em).. (t5.north);
\draw [-,thick,ublue,dashed] (s5.south) -- (t3.north);
}
\end{scope}
\end{tikzpicture}
& {\tikz{\node[minimum height=3.2em]{\small{0.0023}};}} \\
\begin{tikzpicture}
\begin{scope}
{\footnotesize
\begin{scope}
\node [anchor=west] (s1) at (0,0) {$_1$};
\node [anchor=west] (s2) at ([xshift=2.5em]s1.east) {$_2$};
\node [anchor=west] (s3) at ([xshift=2.5em]s2.east) {$_3$};
\node [anchor=west] (s4) at ([xshift=2.5em]s3.east) {感到$_4$};
\node [anchor=west] (s5) at ([xshift=2.5em]s4.east) {满意$_5$};
\node [anchor=east] (s) at (s1.west) {$s=$};
\end{scope}
\begin{scope}[yshift=-3.6em]
\node [anchor=center] (t1) at ([yshift=-1.6em]s1.south) {I$_1$};
\node [anchor=center] (t2) at ([yshift=-1.6em]s2.south) {with$_2$};
\node [anchor=center] (t3) at ([yshift=-1.7em]s3.south) {you$_3$};
\node [anchor=center] (t4) at ([yshift=-1.7em]s4.south) {am$_4$};
\node [anchor=center] (t5) at ([yshift=-1.6em]s5.south) {satisfied$_5$};
\node [anchor=center] (t) at ([xshift=-1.3em]t1.west) {$t''=$};
\end{scope}
\draw [-,thick,ublue,dashed] (s1.south) -- (t1.north);
\draw [-,thick,ublue,dashed] (s2.south) -- (t2.north);
\draw [-,thick,ublue,dashed] (s3.south) -- (t3.north);
\draw [-,thick,ublue,dashed] (s4.south) -- (t4.north);
\draw [-,thick,ublue,dashed] (s5.south) -- (t5.north);
}
\end{scope}
\end{tikzpicture}
&{\tikz{\node[minimum height=3.2em]{\small{0.0023}};}}\\
\hline
\end{tabular}
%---------------------------------------------------------------------
\begin{tikzpicture}
{
\node [anchor=north west,inner sep=2pt,align=left] (line1) at (0,0) {\textrm{\textbf{Function} \textsc{WordDecoding}($s$)}};
\node [anchor=north west,inner sep=2pt,align=left] (line2) at ([yshift=-1pt]line1.south west) {\textrm{1: $\pi = $\textsc{GetTransOptions}($s$)}};
\node [anchor=north west,inner sep=2pt,align=left] (line3) at ([yshift=-1pt]line2.south west) {\textrm{2: $best = \phi$}};
\node [anchor=north west,inner sep=2pt,align=left] (line4) at ([yshift=-1pt]line3.south west) {\textrm{3: \textbf{for} $i$ in $[1,m]$ \textbf{do}}};
\node [anchor=north west,inner sep=2pt,align=left] (line5) at ([yshift=-1pt]line4.south west) {\textrm{4: \hspace{1em} $h = \phi$}};
\node [anchor=north west,inner sep=2pt,align=left] (line6) at ([yshift=-1pt]line5.south west) {\textrm{5: \hspace{1em} \textbf{foreach} $j$ in $[1,m]$ \textbf{do}}};
\node [anchor=north west,inner sep=2pt,align=left] (line7) at ([yshift=-1pt]line6.south west) {\textrm{6: \hspace{2em} \textbf{if} $used[j]=$ \textbf{false} \textbf{then}}};
\node [anchor=north west,inner sep=2pt,align=left] (line8) at ([yshift=-1pt]line7.south west) {\textrm{7: \hspace{3em} $h = h \cup \textrm{\textsc{Join}}(best,\pi[j])$}};
\node [anchor=north west,inner sep=2pt,align=left] (line9) at ([yshift=-1pt]line8.south west) {\textrm{8: \hspace{1em} $best = \textrm{\textsc{PruneForTop1}}(h)$}};
\node [anchor=north west,inner sep=2pt,align=left] (line10) at ([yshift=-1pt]line9.south west) {\textrm{9: \hspace{1em} $used[best.j] = \textrm{\textsc{\textbf{true}}}$}};
\node [anchor=north west,inner sep=2pt,align=left] (line11) at ([yshift=-1pt]line10.south west) {\textrm{10: \textbf{return} $best.translation$}};
\node [anchor=south west,inner sep=2pt,align=left] (head1) at ([yshift=1pt]line1.north west) {输出: 找到的最佳译文};
\node [anchor=south west,inner sep=2pt,align=left] (head2) at ([yshift=1pt]head1.north west) {输入: 源语句子$s=s_1...s_m$};
}
\begin{pgfonlayer}{background}
\node[rectangle,draw=ublue,thick,inner sep=0.2em,fill=white,drop shadow,minimum height=1.6cm] [fit = (head2) (line8) (line11)] (algorithm) {};
%% highlights
%\begin{pgfonlayer}{background}
{
\node[anchor=west,fill=blue!20,minimum height=0.16in,minimum width=2.21in] (line2highlight) at (line2.west) {};
}
{
\node[anchor=west,fill=blue!20,minimum height=0.16in,minimum width=2.21in] (line3highlight) at (line3.west) {};
\node[anchor=west,fill=blue!20,minimum height=0.16in,minimum width=2.21in] (line5highlight) at (line5.west) {};
}
%\end{pgfonlayer}
\end{pgfonlayer}
{
%% remark 1
\begin{scope}
{
\node [anchor=north west,align=left] (remark1) at ([xshift=0.4in]algorithm.north east) {获取每个单词\\的翻译候选};
\node [anchor=west,draw,thick,circle,minimum size=0.3em,inner sep=2.1pt,red] (s1) at ([yshift=-0.7em,xshift=0.5em]remark1.north east){1};
\node [anchor=west,draw,thick,circle,minimum size=0.3em,inner sep=2.1pt,ugreen] (s2) at ([xshift=0.4em]s1.east) {2};
\node [anchor=west,draw,thick,circle,minimum size=0.3em,inner sep=2.1pt,orange] (s3) at ([xshift=0.4em]s2.east) {3};
\node [anchor=west,draw,thick,circle,minimum size=0.3em,inner sep=3.0pt,ublue] (s4) at ([xshift=0.4em]s3.east) {...};
\node [anchor=west,draw,thick,circle,minimum size=0.3em,inner sep=1.5pt,purple] (s5) at ([xshift=0.4em]s4.east) {$m$};
\node [anchor=center,draw,thick,circle,minimum size=0.3em,inner sep=2pt,red,fill=red] (t1) at ([yshift=-1.7em]s1.center) {{\color{white} $n$}};
\node [anchor=center,draw,thick,circle,minimum size=0.3em,inner sep=2pt,ugreen,fill=ugreen] (t2) at ([yshift=-1.7em]s2.center) {{\color{white} $n$}};
\node [anchor=center,draw,thick,circle,minimum size=0.3em,inner sep=2pt,orange,fill=orange] (t3) at ([yshift=-1.7em]s3.center) {{\color{white} $n$}};
\node [anchor=center,draw,thick,circle,minimum size=0.3em,inner sep=2pt,ublue,fill=ublue] (t4) at ([yshift=-1.7em]s4.center) {{\color{white} $n$}};
\node [anchor=center,draw,thick,circle,minimum size=0.3em,inner sep=2pt,purple,fill=purple] (t5) at ([yshift=-1.7em]s5.center) {{\color{white} $n$}};
\draw [->,thick] ([yshift=-0.1em]s1.south) -- ([yshift=0.1em]t1.north);
\draw [->,thick] ([yshift=-0.1em]s2.south) -- ([yshift=0.1em]t2.north);
\draw [->,thick] ([yshift=-0.1em]s3.south) -- ([yshift=0.1em]t3.north);
\draw [->,thick] ([yshift=-0.1em]s4.south) -- ([yshift=0.1em]t4.north);
\draw [->,thick] ([yshift=-0.1em]s5.south) -- ([yshift=0.1em]t5.north);
\begin{pgfonlayer}{background}
{
\node[rectangle,draw,inner sep=0.2em,fill=blue!10] [fit = (remark1) (t5)] (remark1label) {};
}
\end{pgfonlayer}
}
\end{scope}
%% end of remark 1
%% remark 2
\begin{scope}
{
\node [anchor=north west,draw,inner sep=2pt,fill=blue!10] (remark2) at ([xshift=-0.2em,yshift=-1em]remark1.south west) {$best$用于保存当前最好的翻译结果};
}
\end{scope}
%% end of remark 2
\node [anchor=north west] (remark4) at ([xshift=21.8em,yshift=-0.6em]line7.east) {};
%% remark 3
\begin{scope}
{
\node [anchor=north west,draw,inner sep=2pt,fill=blue!10] (remark3) at ([yshift=-0.5em]remark2.south west) {$h$用于保存每步生成的所有译文候选};
}
\end{scope}
%% end of remark 3
{
\draw [->,thick] (line2highlight.east) ..controls +(east:1em) and +(west:1em).. (remark1label.west);
}
{
\draw [->,thick] (line3highlight.east) ..controls +(east:1em) and +(west:1em).. ([yshift=0.3em]remark2.south west);
\draw [->,thick] (line5highlight.east) ..controls +(east:1em) and +(west:1em).. ([yshift=0.3em]remark3.south west);
}
}
\end{tikzpicture}
%%% outline
%-------------------------------------------------------------------------
\begin{tikzpicture}
{
\node [anchor=north west,inner sep=2pt,align=left] (line1) at (0,0) {\textrm{\textbf{Function} \textsc{WordDecoding}($s$)}};
\node [anchor=north west,inner sep=2pt,align=left] (line2) at ([yshift=-1pt]line1.south west) {\textrm{1: $\pi = $\textsc{GetTransOptions}($s$)}};
\node [anchor=north west,inner sep=2pt,align=left] (line3) at ([yshift=-1pt]line2.south west) {\textrm{2: $best = \phi$}};
\node [anchor=north west,inner sep=2pt,align=left] (line4) at ([yshift=-1pt]line3.south west) {\textrm{3: \textbf{for} $i$ in $[1,m]$ \textbf{do}}};
\node [anchor=north west,inner sep=2pt,align=left] (line5) at ([yshift=-1pt]line4.south west) {\textrm{4: \hspace{1em} $h = \phi$}};
\node [anchor=north west,inner sep=2pt,align=left] (line6) at ([yshift=-1pt]line5.south west) {\textrm{5: \hspace{1em} \textbf{foreach} $j$ in $[1,m]$ \textbf{do}}};
\node [anchor=north west,inner sep=2pt,align=left] (line7) at ([yshift=-1pt]line6.south west) {\textrm{6: \hspace{2em} \textbf{if} $used[j]=$ \textbf{false} \textbf{then}}};
\node [anchor=north west,inner sep=2pt,align=left] (line8) at ([yshift=-1pt]line7.south west) {\textrm{7: \hspace{3em} $h = h \cup \textrm{\textsc{Join}}(best,\pi[j])$}};
\node [anchor=north west,inner sep=2pt,align=left] (line9) at ([yshift=-1pt]line8.south west) {\textrm{8: \hspace{1em} $best = \textrm{\textsc{PruneForTop1}}(h)$}};
\node [anchor=north west,inner sep=2pt,align=left] (line10) at ([yshift=-1pt]line9.south west) {\textrm{9: \hspace{1em} $used[best.j] = \textrm{\textsc{\textbf{true}}}$}};
\node [anchor=north west,inner sep=2pt,align=left] (line11) at ([yshift=-1pt]line10.south west) {\textrm{10: \textbf{return} $best.translation$}};
\node [anchor=south west,inner sep=2pt,align=left] (head1) at ([yshift=1pt]line1.north west) {输出: 找到的最佳译文};
\node [anchor=south west,inner sep=2pt,align=left] (head2) at ([yshift=1pt]head1.north west) {输入: 源语句子$s=s_1...s_m$};
}
\begin{pgfonlayer}{background}
\node[rectangle,draw=ublue,thick,inner sep=0.2em,fill=white,drop shadow,minimum height=1.6cm] [fit = (head2) (line8) (line11)] (algorithm) {};
%% highlights
%\begin{pgfonlayer}{background}
{
\node[anchor=west,fill=blue!20,minimum height=0.16in,minimum width=2.21in] (line2highlight) at (line2.west) {};
}
{
\node[anchor=west,fill=blue!20,minimum height=0.16in,minimum width=2.21in] (line3highlight) at (line3.west) {};
\node[anchor=west,fill=blue!20,minimum height=0.16in,minimum width=2.21in] (line5highlight) at (line5.west) {};
}
{
\node[anchor=west,fill=blue!20,minimum height=0.16in,minimum width=2.21in] (line8highlight) at (line8.west) {};
}
{
\node[anchor=west,fill=blue!20,minimum height=0.16in,minimum width=2.21in] (line9highlight) at (line9.west) {};
}
{
\node[anchor=west,fill=blue!20,minimum height=0.16in,minimum width=2.21in] (line10highlight) at (line10.west) {};
}
%\end{pgfonlayer}
\end{pgfonlayer}
{
%% remark 1
\begin{scope}
{
\node [anchor=north west,align=left] (remark1) at ([xshift=0.4in]algorithm.north east) {获取每个单词\\的翻译候选};
\node [anchor=west,draw,thick,circle,minimum size=0.3em,inner sep=2.1pt,red] (s1) at ([yshift=-0.7em,xshift=0.5em]remark1.north east){1};
\node [anchor=west,draw,thick,circle,minimum size=0.3em,inner sep=2.1pt,ugreen] (s2) at ([xshift=0.4em]s1.east) {2};
\node [anchor=west,draw,thick,circle,minimum size=0.3em,inner sep=2.1pt,orange] (s3) at ([xshift=0.4em]s2.east) {3};
\node [anchor=west,draw,thick,circle,minimum size=0.3em,inner sep=3.0pt,ublue] (s4) at ([xshift=0.4em]s3.east) {...};
\node [anchor=west,draw,thick,circle,minimum size=0.3em,inner sep=1.5pt,purple] (s5) at ([xshift=0.4em]s4.east) {$m$};
\node [anchor=center,draw,thick,circle,minimum size=0.3em,inner sep=2pt,red,fill=red] (t1) at ([yshift=-1.7em]s1.center) {{\color{white} $n$}};
\node [anchor=center,draw,thick,circle,minimum size=0.3em,inner sep=2pt,ugreen,fill=ugreen] (t2) at ([yshift=-1.7em]s2.center) {{\color{white} $n$}};
\node [anchor=center,draw,thick,circle,minimum size=0.3em,inner sep=2pt,orange,fill=orange] (t3) at ([yshift=-1.7em]s3.center) {{\color{white} $n$}};
\node [anchor=center,draw,thick,circle,minimum size=0.3em,inner sep=2pt,ublue,fill=ublue] (t4) at ([yshift=-1.7em]s4.center) {{\color{white} $n$}};
\node [anchor=center,draw,thick,circle,minimum size=0.3em,inner sep=2pt,purple,fill=purple] (t5) at ([yshift=-1.7em]s5.center) {{\color{white} $n$}};
\draw [->,thick] ([yshift=-0.1em]s1.south) -- ([yshift=0.1em]t1.north);
\draw [->,thick] ([yshift=-0.1em]s2.south) -- ([yshift=0.1em]t2.north);
\draw [->,thick] ([yshift=-0.1em]s3.south) -- ([yshift=0.1em]t3.north);
\draw [->,thick] ([yshift=-0.1em]s4.south) -- ([yshift=0.1em]t4.north);
\draw [->,thick] ([yshift=-0.1em]s5.south) -- ([yshift=0.1em]t5.north);
\begin{pgfonlayer}{background}
{
\node[rectangle,draw,inner sep=0.2em,fill=blue!10] [fit = (remark1) (t5)] (remark1label) {};
}
\end{pgfonlayer}
}
\end{scope}
%% end of remark 1
%% remark 2
\begin{scope}
{
\node [anchor=north west,draw,inner sep=2pt,fill=blue!10] (remark2) at ([xshift=-0.2em,yshift=-1em]remark1.south west) {$best$用于保存当前最好的翻译结果};
}
\end{scope}
%% end of remark 2
%% remark 3
\begin{scope}
{
\node [anchor=north west,draw,inner sep=2pt,fill=blue!10] (remark3) at ([yshift=-0.5em]remark2.south west) {$h$用于保存每步生成的所有译文候选};
}
\end{scope}
%% end of remark 3
%% remark 4
\begin{scope}
{
\node [anchor=north west,inner sep=2pt,align=left] (remark4) at ([xshift=0.25em,yshift=-0.6em]remark3.south west) {\textsc{Join}($a,b$) 返回\\$a$$b$ 的所有组合};
{
\node [anchor=north west,inner sep=1pt,align=center,draw] (a1) at ([yshift=-0.2em]remark4.north east) {a1\\a2};
\node [anchor=west] (join) at (a1.east) {$\times$};
\node [anchor=north west,inner sep=1pt,align=center,draw] (b1) at ([xshift=1.5em]a1.north east) {b1\\b2};
\node [anchor=west] (join) at (b1.east) {$=$};
\node [anchor=north west,inner sep=1pt,align=center,draw] (result) at ([xshift=1.5em]b1.north east) {a1b1 a1b2\\a2b1 a2b2};
}
\begin{pgfonlayer}{background}
{
\node[rectangle,draw,inner sep=2pt,fill=blue!10] [fit = (remark4) (result)] (remark4label) {};
}
\end{pgfonlayer}
}
\end{scope}
%% end of remark 4
%% remark 5
\begin{scope}
{
\node [anchor=north west,align=left] (remark5) at ([xshift=0.0em,yshift=-1.3em]remark4.south west) {\textsc{PruneForTop1}\\保留得分最高的结果};
\node [anchor=west,draw,inner sep=1pt] (s1) at ([yshift=-0.5em,xshift=1.2em]remark5.north east){0.234};
\node [anchor=north west,draw,inner sep=1pt] (s2) at ([yshift=-0.2em]s1.south west){0.197};
\node [anchor=north west,draw,inner sep=1pt] (s3) at ([yshift=-0.2em]s2.south west){0.083};
\draw [-] ([yshift=-0.1em,xshift=-0.2em]s1.south west) -- ([yshift=-0.1em,xshift=3em]s1.south east);
\node [anchor=west] (top1) at ([xshift=0.1em]s1.east) {\color{red}{$\gets$ top1}};
\begin{pgfonlayer}{background}
{
\node[rectangle,draw,inner sep=0.2em,fill=blue!10] [fit = (remark5) (s3) (top1)] (remark5label) {};
}
\end{pgfonlayer}
}
\end{scope}
% end of remark 5
%% remark 6
\begin{scope}
{
\node [anchor=north west,align=left] (remark6) at ([xshift=0.0em,yshift=-1.3em]remark5.south west) {记录已经翻译过\\的源语单词};
\node [anchor=west,draw,thick,circle,minimum size=0.3em,inner sep=2.1pt,red] (s1) at ([yshift=-1.3em,xshift=0.5em]remark6.north east){1};
\node [anchor=west,draw,thick,circle,minimum size=0.3em,inner sep=2.1pt,ugreen] (s2) at ([xshift=0.4em]s1.east) {2};
\node [anchor=west,draw,thick,circle,minimum size=0.3em,inner sep=2.1pt,orange] (s3) at ([xshift=0.4em]s2.east) {3};
\node [anchor=west,draw,thick,circle,minimum size=0.3em,inner sep=3.0pt,ublue] (s4) at ([xshift=0.4em]s3.east) {...};
\node [anchor=west,draw,thick,circle,minimum size=0.3em,inner sep=1.5pt,purple] (s5) at ([xshift=0.4em]s4.east) {$m$};
\draw [-,thick,red] (s1.north east) -- (s1.south west);
\draw [-,thick,orange] (s3.north east) -- (s3.south west);
\begin{pgfonlayer}{background}
{
\node[rectangle,draw,inner sep=0.2em,fill=blue!10] [fit = (remark6) (s5)] (remark6label) {};
}
\end{pgfonlayer}
}
\end{scope}
% end of remark 6
{
\draw [->,thick] (line2highlight.east) ..controls +(east:1em) and +(west:1em).. (remark1label.west);
}
{
\draw [->,thick] (line3highlight.east) ..controls +(east:1em) and +(west:1em).. ([yshift=0.3em]remark2.south west);
\draw [->,thick] (line5highlight.east) ..controls +(east:1em) and +(west:1em).. ([yshift=0.3em]remark3.south west);
}
{
\draw [->,thick] (line8highlight.east) ..controls +(east:1em) and +(west:1em).. ([yshift=0.5em]remark4label.west);
}
{
\draw [->,thick] (line9highlight.east) ..controls +(east:1em) and +(west:1em).. ([yshift=0.5em]remark5label.west);
}
{
\draw [->,thick] (line10highlight.east) ..controls +(east:1em) and +(west:1em).. ([yshift=0.5em]remark6label.south west);
}
}
\end{tikzpicture}
%---------------------------------------------------------------------
%%% outline
%-------------------------------------------------------------------------
\begin{tikzpicture}
\node [draw,red,fill=red!10,thick,anchor=center,circle,inner sep=3.5pt] (s) at (0,0) {\black{$s$}};
\node [draw,ublue,fill=blue!10,thick,anchor=center,circle,inner sep=3.3pt] (t) at ([xshift=1.5in]s.east) {\black{$t$}};
\draw [<->,thick,] (s.east) -- (t.west) node [pos=0.5,draw,fill=white] {噪声信道};
\node [anchor=east] at (s.west) {\scriptsize{信宿}};
\node [anchor=west] at (t.east) {\scriptsize{信源}};
\end{tikzpicture}
%---------------------------------------------------------------------
%%% outline
%-------------------------------------------------------------------------
\begin{tabular}{| l | l |}
\hline
& {\footnotesize{$\prod\limits_{(j,i) \in \hat{A}} \textrm{P}(s_j,t_i)$} \color{red}{{\footnotesize{$\times\textrm{P}_{lm}(t)$}}}} \\ \hline
\begin{tikzpicture}
\begin{scope}
{\footnotesize
\begin{scope}
\node [anchor=west] (s1) at (0,0) {$_1$};
\node [anchor=west] (s2) at ([xshift=2.2em]s1.east) {$_2$};
\node [anchor=west] (s3) at ([xshift=3.2em]s2.east) {$_3$};
\node [anchor=west] (s4) at ([xshift=3.6em]s3.east) {感到$_4$};
\node [anchor=west] (s5) at ([xshift=1.9em]s4.east) {满意$_5$};
\node [anchor=east] (s) at (s1.west) {$s=$};
\end{scope}
\begin{scope}[yshift=-3.6em]
\node [anchor=west] (t1) at (0.35em,0) {I$_1$};
\node [anchor=west] (t2) at ([xshift=2.3em,yshift=-0.1em]t1.east) {am$_2$};
\node [anchor=west] (t3) at ([xshift=2.3em,yshift=0.1em]t2.east) {satisfied$_3$};
\node [anchor=west] (t4) at ([xshift=2.3em]t3.east) {with$_4$};
\node [anchor=west] (t5) at ([xshift=2.3em,yshift=-0.2em]t4.east) {you$_5$};
\node [anchor=east] (t) at (t1.west) {$t'=$};
\end{scope}
\draw [-,thick,ublue,dashed] (s1.south) -- (t1.north);
\draw [-,thick,ublue,dashed] (s4.south) -- ([yshift=0.3em]t2.north);
\draw [-,thick,ublue,dashed] (s2.south) ..controls +(south:1em) and +(north:1em).. (t4.north);
\draw [-,thick,ublue,dashed] (s3.south) ..controls +(south:0.5em) and +(north:1.5em).. (t5.north);
\draw [-,thick,ublue,dashed] (s5.south) -- (t3.north);
}
\end{scope}
\end{tikzpicture}
& {\tikz{\node[minimum height=3.2em]{\small{0.0023}{\color{red}\small{$\times$0.0107}}};}} \\
\begin{tikzpicture}
\begin{scope}
{\footnotesize
\begin{scope}
\node [anchor=west] (s1) at (0,0) {$_1$};
\node [anchor=west] (s2) at ([xshift=2.5em]s1.east) {$_2$};
\node [anchor=west] (s3) at ([xshift=2.5em]s2.east) {$_3$};
\node [anchor=west] (s4) at ([xshift=2.5em]s3.east) {感到$_4$};
\node [anchor=west] (s5) at ([xshift=2.5em]s4.east) {满意$_5$};
\node [anchor=east] (s) at (s1.west) {$s=$};
\end{scope}
\begin{scope}[yshift=-3.6em]
\node [anchor=center] (t1) at ([yshift=-1.6em]s1.south) {I$_1$};
\node [anchor=center] (t2) at ([yshift=-1.6em]s2.south) {with$_2$};
\node [anchor=center] (t3) at ([yshift=-1.7em]s3.south) {you$_3$};
\node [anchor=center] (t4) at ([yshift=-1.7em]s4.south) {am$_4$};
\node [anchor=center] (t5) at ([yshift=-1.6em]s5.south) {satisfied$_5$};
\node [anchor=center] (t) at ([xshift=-1.3em]t1.west) {$t''=$};
\end{scope}
\draw [-,thick,ublue,dashed] (s1.south) -- (t1.north);
\draw [-,thick,ublue,dashed] (s2.south) -- (t2.north);
\draw [-,thick,ublue,dashed] (s3.south) -- (t3.north);
\draw [-,thick,ublue,dashed] (s4.south) -- (t4.north);
\draw [-,thick,ublue,dashed] (s5.south) -- (t5.north);
}
\end{scope}
\end{tikzpicture}
&{ \tikz{\node[minimum height=3em]{\small{0.0023}{\color{red}\small{$\times$0.0009}}};}}\\
\hline
\end{tabular}
%---------------------------------------------------------------------
%%% outline
%-------------------------------------------------------------------------
\begin{tikzpicture}
{\small
\node [anchor=west,inner sep=2pt] (s1) at (0,0) {$s_1$:在};
\node [anchor=west,inner sep=2pt] (s2) at ([xshift=1em]s1.east) {$s_2$:桌子};
\node [anchor=west,inner sep=2pt] (s3) at ([xshift=1em]s2.east) {$s_3$:上};
\node [anchor=north,inner sep=2pt] (t1) at ([yshift=-1.7em]s1.center) {$t_1$:on};
\node [anchor=north,inner sep=2pt] (t2) at ([yshift=-1.6em]s2.center) {$t_2$:the};
\node [anchor=north,inner sep=2pt] (t3) at ([yshift=-1.6em]s3.center) {$t_3$:table};
\node [anchor=east,inner sep=2pt] (t0) at ([xshift=-1.5em]t1.west) {$t_0$};
\draw [-] (s1.south) -- (t0.north);
\draw [-] (s2.south) -- (t3.north);
\draw [-] (s3.south) -- (t1.north);
}
\end{tikzpicture}
%---------------------------------------------------------------------
%%% outline
%-------------------------------------------------------------------------
\begin{tikzpicture}
{
\node [anchor=west,inner sep=2pt] (s1) at (0,0) {$s_1$:在};
\node [anchor=west,inner sep=2pt] (s2) at ([xshift=2em]s1.east) {$s_2$:桌子};
\node [anchor=west,inner sep=2pt] (s3) at ([xshift=2em]s2.east) {$s_3$:上};
\node [anchor=north,inner sep=2pt] (t1) at ([yshift=-2.4em]s1.center) {$t_1$:on};
\node [anchor=north,inner sep=2pt] (t2) at ([yshift=-2.4em]s2.center) {$t_2$:the};
\node [anchor=north,inner sep=2pt] (t3) at ([yshift=-2.4em]s3.center) {$t_3$:table};
\node [anchor=east,inner sep=2pt] (t0) at ([xshift=-2.2em]t1.west) {$t_0$};
\draw [-,dashed,thick] (s1.south) -- (t0.north);
\draw [-,dashed,thick] (s2.south) -- (t3.north);
\draw [-,dashed,thick] (s3.south) -- (t1.north);
}
\begin{pgfonlayer}{background}
{
\path [fill=red!20] (s2.north west) -- (s2.south west) -- (t3.north west) -- (t3.south west) -- (t3.south east) -- (t3.north east) -- (s2.south east) -- (s2.north east) -- (s2.north west);
}
\end{pgfonlayer}
\end{tikzpicture}
%---------------------------------------------------------------------
%%% outline
%-------------------------------------------------------------------------
\begin{tikzpicture}
\begin{scope}
\node [anchor=west] (s1) at (0,0) {$_1$};
\node [anchor=west] (s2) at ([xshift=3em]s1.east) {$_2$};
\node [anchor=west] (s3) at ([xshift=4.6em]s2.east) {$_3$};
\node [anchor=west] (s4) at ([xshift=5.1em]s3.east) {感到$_4$};
\node [anchor=west] (s5) at ([xshift=3.1em]s4.east) {满意$_5$};
\node [anchor=east] (s) at (s1.west) {$s=$};
\end{scope}
\begin{scope}[yshift=-7.0em]
\node [anchor=west] (t1) at (0.4em,0) {I$_1$};
\node [anchor=west] (t2) at ([xshift=3.5em,yshift=-0.1em]t1.east) {am$_2$};
\node [anchor=west] (t3) at ([xshift=3.5em,yshift=0.1em]t2.east) {satisfied$_3$};
\node [anchor=west] (t4) at ([xshift=3.5em]t3.east) {with$_4$};
\node [anchor=west] (t5) at ([xshift=3.5em,yshift=-0.2em]t4.east) {you$_5$};
\node [anchor=east] (t) at ([xshift=-0.3em]t1.west) {$t=$};
\end{scope}
\draw [-,thick,ublue,dashed] (s1.south) -- (t1.north);
\draw [-,thick,ublue,dashed] (s4.south) -- ([yshift=0.3em]t2.north);
\draw [-,thick,ublue,dashed] (s2.south) ..controls +(south:1em) and +(north:1em).. (t4.north);
\draw [-,thick,ublue,dashed] (s3.south) ..controls +(south:0.5em) and +(north:1.5em).. (t5.north);
\draw [-,thick,ublue,dashed] (s5.south) -- (t3.north);
\end{tikzpicture}
%---------------------------------------------------------------------
\def\CTeXPreproc{Created by ctex v0.2.13, don't edit!}
\documentclass[cjk,t,compress,12pt]{standalone}
%\documentclass{article}
%\usepackage{beamerarticle}
\usepackage{pstricks}
\usepackage{etex}
\usepackage{eso-pic,graphicx}
\usepackage{fancybox}
\usepackage{amsmath,amssymb}
\usepackage{setspace}
\usepackage{xcolor}
\usepackage{CJK}
\usepackage{tikz}
\usepackage{tikz-qtree}
\usepackage{hyperref}
\usetikzlibrary{arrows,decorations.pathreplacing}
\usetikzlibrary{shadows} % LATEX and plain TEX when using Tik Z
\usepgflibrary{arrows} % LATEX and plain TEX and pure pgf
\usetikzlibrary{arrows} % LATEX and plain TEX when using Tik Z
\usetikzlibrary{decorations}
\usetikzlibrary{arrows,shapes}
\usetikzlibrary{decorations.text}
\usetikzlibrary{positioning,fit,calc}
\usetikzlibrary{mindmap,backgrounds} % mind map
\DeclareMathOperator*{\argmax}{arg\,max}
\DeclareMathOperator*{\argmin}{arg\,min}
\definecolor{ublue}{rgb}{0.152,0.250,0.545}
\definecolor{ugreen}{rgb}{0,0.5,0}
\begin{document}
\begin{CJK}{UTF8}{you}
%%% outline
%-------------------------------------------------------------------------
\begin{tikzpicture}
{\footnotesize
\node [anchor=west] (mid) at (0,0) {$\alpha(1,0)\alpha(2,0) + \alpha(1,0)\alpha(2,1) + \alpha(1,0)\alpha(2,2) +$};
\node [anchor=west] (mid2) at ([yshift=-2em]mid.west) {$\alpha(1,1)\alpha(2,0) + \alpha(1,1)\alpha(2,1) + \alpha(1,1)\alpha(2,2)+$};
\node [anchor=west] (mid3) at ([yshift=-2em]mid2.west) {$\alpha(1,2)\alpha(2,0) + \alpha(1,2)\alpha(2,1) + \alpha(1,2)\alpha(2,2)$};
}
\begin{pgfonlayer}{background}
\node[rectangle,draw=ublue,red,inner sep=0.1em,fill=white] [fit = (mid) (mid2) (mid3)] (exampleeq) {};
\end{pgfonlayer}
{\footnotesize
{
\node [anchor=north] (eq1) at ([xshift=2em,yshift=-2em]exampleeq.south west) {$\sum\limits_{y_1=0}^{2} \sum\limits_{y_2=0}^{2} \alpha(1,y_1)\alpha(2,y_2)$};
\node [anchor=west] (eq1part2) at ([xshift=-1em,yshift=-3em]eq1.west) {$=$};
\node [anchor=west] (eq1part3) at ([xshift=-0.5em]eq1part2.east) {$\sum\limits_{y_1=0}^{2} \sum\limits_{y_2=0}^{2} \prod\limits_{x=1}^{2} $};
\node [anchor=west,inner sep=2pt] (eq1part4) at ([xshift=-0.3em]eq1part3.east) {$\alpha(x,y_x)$};
}
{
\node [anchor=north] (eq2) at ([xshift=-2em,yshift=-2em]exampleeq.south east) {$(\alpha(1,0)+\alpha(1,1)+\alpha(1,2))\cdot$};
\node [anchor=west] (eq2part2) at ([yshift=-1.5em]eq2.west) {$(\alpha(2,0)+\alpha(2,1)+\alpha(2,2))$};
\node [anchor=west] (eq2part3) at ([xshift=2.1in]eq1part2.east){$=$};
\node [anchor=west] (eq2part4) at ([xshift=-0.5em]eq2part3.east){$\prod\limits_{x=1}^{2} \sum\limits_{y=0}^{2}$};
\node [anchor=west,inner sep=2pt] (eq2part5) at ([xshift=-0.3em]eq2part4.east){$\alpha(x,y)$};
}
}
\begin{pgfonlayer}{background}
{
\node[rectangle,draw=ublue,red,inner sep=0.1em,fill=white] [fit = (eq1) (eq1part2) (eq1part3)] (eq1full) {};
}
{
\node[rectangle,draw=ublue,red,inner sep=0.1em,fill=white] [fit = (eq2) (eq2part2) (eq2part3) (eq2part4)] (eq2full) {};
}
\end{pgfonlayer}
{
\draw [->,thick] ([xshift=-3em]exampleeq.south) .. controls +(south:1.5em) and +(north:1.5em) .. (eq1full.north);
}
{
\draw [->,thick] ([xshift=3em]exampleeq.south) .. controls +(south:1.5em) and +(north:1.5em) .. (eq2full.north);
}
{
\node [anchor=west] at ([xshift=0.7em]eq1full.east) {\LARGE{\textbf{=}}};
}
{
{\large
\node [anchor=west] (feq) at ([xshift=3em,yshift=-3em]eq1full.south west) {$\sum\limits_{a_1=0}^{l} ... \sum\limits_{a_m=0}^{l} \prod\limits_{j=1}^{m}$};
\node [anchor=west,inner sep=2pt,fill=blue!20] (feqpart2) at ([xshift=-0.3em]feq.east) {$f(s_j|t_{a_j})$};
\node [anchor=west,inner sep=1pt] (feqpart3) at (feqpart2.east) {=};
\node [anchor=west] (feqpart4) at (feqpart3.east) {$\prod\limits_{j=1}^{m} \sum\limits_{i=0}^{l}$};
\node [anchor=west,inner sep=2pt,fill=blue!20] (feqpart5) at ([xshift=-0.3em]feqpart4.east) {$f(s_j|t_i)$};
}
\draw [->,thick] (eq1part4.south) .. controls +(south:2.5em) and +(north:2.5em) .. (feqpart2.north);
\draw [->,thick] (eq2part5.south) .. controls +(south:1.5em) and +(north:1.5em) .. (feqpart5.north);
\node [anchor=west,inner sep=2pt,fill=blue!20] (eq1part4) at ([xshift=-0.3em]eq1part3.east) {\footnotesize{$\alpha(x,y_x)$}};
\node [anchor=west,inner sep=2pt,fill=blue!20] (eq2part5) at ([xshift=-0.3em]eq2part4.east){\footnotesize{$\alpha(x,y)$}};
}
\end{tikzpicture}
%---------------------------------------------------------------------
\end{CJK}
\end{document}
......@@ -68,3 +68,155 @@
year={1993},
publisher={Springer}
}
@article{rabiner1989tutorial,
title={A tutorial on hidden Markov models and selected applications in speech recognition},
author={Rabiner, Lawrence R},
journal={Proceedings of the IEEE},
volume={77},
number={2},
pages={257--286},
year={1989},
publisher={IEEE}
}
@article{rabiner1986introduction,
title={An introduction to hidden Markov models},
author={Rabiner, Lawrence and Juang, B},
journal={IEEE ASSP Magazine},
volume={3},
number={1},
pages={4--16},
year={1986},
publisher={IEEE}
}
@book{parsing2009speech,
title={Speech and language processing},
author={Jurafsky, Daniel and Martin, James H},
year={2009},
publisher={Prentice Hall}
}
@article{ney1994structuring,
title={On structuring probabilistic dependences in stochastic language modelling},
author={Ney, Hermann and Essen, Ute and Kneser, Reinhard},
journal={Computer Speech \& Language},
volume={8},
number={1},
pages={1--38},
year={1994}
}
@article{chen1999empirical,
title={An empirical study of smoothing techniques for language modeling},
author={Chen, Stanley F and Goodman, Joshua},
journal={Computer Speech \& Language},
volume={13},
number={4},
pages={359--394},
year={1999},
publisher={Elsevier}
}
@book{茆诗松2011概率论与数理统计教程,
title={概率论与数理统计教程: 第二版},
author={茆诗松 and 程依明 and 濮晓龙},
year={2011},
publisher={北京: 高等教育出版社}
}
@book{kolmogorov2018foundations,
title={Foundations of the theory of probability: Second English Edition},
author={Kolmogorov, Andre Nikolaevich and Bharucha-Reid, Albert T},
year={2018},
publisher={Courier Dover Publications}
}
@article{brown1993mathematics,
title={The mathematics of statistical machine translation: Parameter estimation},
author={Brown, Peter F and Pietra, Vincent J Della and Pietra, Stephen A Della and Mercer, Robert L},
journal={Computational linguistics},
volume={19},
number={2},
pages={263--311},
year={1993},
publisher={MIT Press}
}
@article{shannon1949communication,
title={Communication theory of secrecy systems},
author={Shannon, Claude E},
journal={Bell System Technical Journal},
volume={28},
number={4},
pages={656--715},
year={1949},
publisher={Wiley Online Library}
}
@article{brown1990statistical,
title={A statistical approach to machine translation},
author={Brown, Peter F and Cocke, John and Della Pietra, Stephen A and Della Pietra, Vincent J and Jelinek, Frederick and Lafferty, John and Mercer, Robert L and Roossin, Paul S},
journal={Computational linguistics},
volume={16},
number={2},
pages={79--85},
year={1990}
}
@book{刘克2004实用马尔可夫决策过程,
title={实用马尔可夫决策过程},
author={刘克},
volume={3},
year={2004},
publisher={清华大学出版社有限公司}
}
@book{resnick1992adventures,
title={Adventures in stochastic processes},
author={Resnick, Sidney I},
year={1992},
publisher={Springer Science \& Business Media}
}
@article{good1953population,
title={The population frequencies of species and the estimation of population parameters},
author={Good, Irving J},
journal={Biometrika},
volume={40},
number={3-4},
pages={237--264},
year={1953},
publisher={Oxford University Press}
}
@article{gale1995good,
title={Good-Turing frequency estimation without tears},
author={Gale, William A and Sampson, Geoffrey},
journal={Journal of quantitative linguistics},
volume={2},
number={3},
pages={217--237},
year={1995},
publisher={Taylor \& Francis}
}
@inproceedings{kneser1995improved,
title={Improved backing-off for m-gram language modeling},
author={Kneser, Reinhard and Ney, Hermann},
booktitle={1995 International Conference on Acoustics, Speech, and Signal Processing},
volume={1},
pages={181--184},
year={1995},
organization={IEEE}
}
\ No newline at end of file