updates of Appendix B

Commit 19d4ab2e authored Apr 15, 2020 by xiaotong
--- a/Book/ChapterAppend/ChapterAppend.tex
+++ b/Book/ChapterAppend/ChapterAppend.tex
@@ -51,7 +51,7 @@ c(1|\mathbf{s},\mathbf{t}) & = & \sum_{\mathbf{a}}\big[\textrm{P}_{\theta}(\math
 t(s|t) & = & \lambda_{t}^{-1} \times \sum_{k=1}^{K}c(s|t;\mathbf{s}^{[k]},\mathbf{t}^{[k]}) \label{eq:1.7} \\
 d(j|i,m,l) & = & \mu_{iml}^{-1} \times \sum_{k=1}^{K}c(j|i,m,l;\mathbf{s}^{[k]},\mathbf{t}^{[k]}) \label{eq:1.8} \\
 n(\varphi|t) & = & \nu_{t}^{-1} \times \sum_{k=1}^{K}c(\varphi |t;\mathbf{s}^{[k]},\mathbf{t}^{[k]}) \label{eq:1.9} \\
-p^x & = & \zeta^{-1} \sum_{k=1}^{K}c(x;\mathbf{s}^{[k]},\mathbf{t}^{[k]}) \label{eq:1.10}
+p_x & = & \zeta^{-1} \sum_{k=1}^{K}c(x;\mathbf{s}^{[k]},\mathbf{t}^{[k]}) \label{eq:1.10}
 \end{eqnarray}
 %----------------------------------------------

@@ -62,7 +62,7 @@ c(s|t,\mathbf{s},\mathbf{t}) \approx \sum_{\mathbf{a} \in \mathbf{S}}\big[\textr
 \end{eqnarray}
 %----------------------------------------------

-\parinterval 同理可以获得式\ref{eq:1.3}-\ref{eq:1.6}的修改结果。进一步，在IBM模型3中，可以如下定义\textrm{S}：
+\parinterval 同理可以获得式\ref{eq:1.3}-\ref{eq:1.6}的修改结果。进一步，在IBM模型3中，可以如下定义$S$：

 \begin{eqnarray}
 S = N(b^{\infty}(V(\mathbf{s}|\mathbf{t};2))) \cup (\mathop{\cup}\limits_{ij} N(b_{i \leftrightarrow j}^{\infty}(V_{i \leftrightarrow j}(\mathbf{s}|\mathbf{t},2))))
@@ -74,64 +74,48 @@ S = N(b^{\infty}(V(\mathbf{s}|\mathbf{t};2))) \cup (\mathop{\cup}\limits_{ij} N(

 \begin{itemize}
 \item $V(\mathbf{s}|\mathbf{t})$表示Viterbi词对齐，$V(\mathbf{s}|\mathbf{t},1)$、$V(\mathbf{s}|\mathbf{t},2)$和$V(\mathbf{s}|\mathbf{t},3)$就分别对应了模型1、2 和3 的Viterbi 词对齐； 
-\item 把那些满足第$j$个源语言语单词对应第$i$个目标语言单词（$a_j=i$）的词对齐构成的集合记为$\mathbf{A}_{i \leftrightarrow j}(\mathbf{s},\mathbf{t})$。通常称这些对齐中$j$和$i$被``钉''在了一起。在$\mathbf{A}_{i \leftrightarrow j}(\mathbf{s},\mathbf{t})$中使$\textrm{P}(\mathbf{a}|\mathbf{s},\mathbf{t})$达到最大的那个词对齐被记为$V_{i \leftrightarrow j}(\mathbf{s},\mathbf{t})$；
+\item 把那些满足第$j$个源语言单词对应第$i$个目标语言单词（$a_j=i$）的词对齐构成的集合记为$\mathbf{A}_{i \leftrightarrow j}(\mathbf{s},\mathbf{t})$。通常称这些对齐中$j$和$i$被``钉''在了一起。在$\mathbf{A}_{i \leftrightarrow j}(\mathbf{s},\mathbf{t})$中使$\textrm{P}(\mathbf{a}|\mathbf{s},\mathbf{t})$达到最大的那个词对齐被记为$V_{i \leftrightarrow j}(\mathbf{s},\mathbf{t})$；
 \item 如果两个词对齐，通过交换两个词对齐连接就能互相转化，则称它们为邻居。一个词对齐$\mathbf{a}$的所有邻居记为$N(\mathbf{a})$。
 \end{itemize}

 \vspace{0.3em}
-\parinterval 公式\ref{eq:1.12}中，$b^{\infty}(V(\mathbf{s}|\mathbf{t};2))$ 和 $b_{i \leftrightarrow j}^{\infty}(V_{i \leftrightarrow j}(\mathbf{s}|\mathbf{t},2))$ 分别是对 $V(\mathbf{s}|\mathbf{t};3)$ 和 $V_{i \leftrightarrow j}(\mathbf{s}|\mathbf{t},3)$ 的估计。在计算$S$的过程中，需要知道一个对齐$\bf{a}$的邻居$\bf{a}^{'}$的概率，即如何通过$\textrm{P}_{\theta}(\mathbf{a},\mathbf{s}|\mathbf{t})$计算$\textrm{p}_{\theta}(\mathbf{a}',\mathbf{s}|\mathbf{t})$。在模型3中，如果$\bf{a}$和$\bf{a}'$区别于某个源语单词的对齐到的目标位置上（$a_j \neq a_{j}'$），那么
+\parinterval 公式\ref{eq:1.12}中，$b^{\infty}(V(\mathbf{s}|\mathbf{t};2))$ 和 $b_{i \leftrightarrow j}^{\infty}(V_{i \leftrightarrow j}(\mathbf{s}|\mathbf{t},2))$ 分别是对 $V(\mathbf{s}|\mathbf{t};3)$ 和 $V_{i \leftrightarrow j}(\mathbf{s}|\mathbf{t},3)$ 的估计。在计算$S$的过程中，需要知道一个对齐$\bf{a}$的邻居$\bf{a}^{'}$的概率，即通过$\textrm{P}_{\theta}(\mathbf{a},\mathbf{s}|\mathbf{t})$计算$\textrm{P}_{\theta}(\mathbf{a}',\mathbf{s}|\mathbf{t})$。在模型3中，如果$\bf{a}$和$\bf{a}'$仅区别于某个源语单词对齐到的目标位置上（$a_j \neq a_{j}'$），那么

-\begin{small}
 \begin{eqnarray}
-\textrm{P}_{\theta}(\mathbf{a}',\mathbf{s}|\mathbf{t}) = \textrm{P}_{\theta}(\mathbf{a},\mathbf{s}|\mathbf{t}) \cdot \frac{\varphi_{j'}+1}{\varphi_j} \cdot \frac{n(\varphi_{j'}+1|t_{j'})}{n(\varphi_{j'}|t_{j'})} \cdot \frac{n(\varphi_{j-1}|t_{j})}{n(\varphi_{j}|t_{j})} \cdot \frac{t(s_i|t_{j'})}{t(s_{i}|t_{j})} \cdot \frac{d(i|j',m,l)}{d(i|j,m,l)}
+\textrm{P}_{\theta}(\mathbf{a}',\mathbf{s}|\mathbf{t}) & = & \textrm{P}_{\theta}(\mathbf{a},\mathbf{s}|\mathbf{t}) \cdot  \nonumber \\
+                                                                                   &     & \frac{\varphi_{i'}+1}{\varphi_i} \cdot \frac{n(\varphi_{i'}+1|t_{i'})}{n(\varphi_{i'}|t_{i'})} \cdot \frac{n(\varphi_{i}-1|t_{i})}{n(\varphi_{i}|t_{i})} \cdot \nonumber \\
+                                                                                   &     & \frac{t(s_j|t_{i'})}{t(s_{j}|t_{i})} \cdot \frac{d(j|i',m,l)}{d(j|i,m,l)}
 \label{eq:1.13}
 \end{eqnarray}
-\end{small}
 %----------------------------------------------

-\parinterval 如果$\bf{a}$和$\bf{a}'$区别于两个位置$i_1$和$i_2$的对齐上，$a_{j_{1}}=a{j_{2}}'$且$a_{j_{2}}=a{j_{1}}'$，那么
+\parinterval 如果$\bf{a}$和$\bf{a}'$区别于两个位置$j_1$和$j_2$的对齐上，$a_{j_{1}}=a_{j_{2}}'$且$a_{j_{2}}=a_{j_{1}}'$，那么
 \begin{eqnarray}
-\textrm{P}_{\theta}(\mathbf{a'},\mathbf{s}|\mathbf{t}) = \textrm{P}_{\theta}(\mathbf{a},\mathbf{s}|\mathbf{t}) \cdot \frac{t(s_{i_{2}}|t_{a_{i_{2}}})}{t(s_{i_{1}}|t_{a{i_{1}}})} \cdot \frac{d(i_{2})|a{i_{2}},m,l)}{d(i_{1}|a_{i_{1}},m,l)}
+\textrm{P}_{\theta}(\mathbf{a'},\mathbf{s}|\mathbf{t}) = \textrm{P}_{\theta}(\mathbf{a},\mathbf{s}|\mathbf{t}) \cdot \frac{t(s_{j_{2}}|t_{a_{j_{2}}})}{t(s_{j_{1}}|t_{a_{j_{1}}})} \cdot \frac{d(j_{2}|a_{j_{2}},m,l)}{d(j_{1}|a_{j_{1}},m,l)}
 \label{eq:1.14}
 \end{eqnarray}
 %----------------------------------------------

-\parinterval 这样每次迭代就可以仅在\textrm{S}上进行计数。相比整个词对齐空间，\textrm{S}只是一个非常小的子集，因此运算复杂度可以大大被降低。本质上说，这里定义\textrm{S}是为了用模型2的Viterbi对齐来估计模型3的Viterbi对齐。
-
-\parinterval 对于模型3的参数估计过程，实际上是建立在模型1和模型2的参数估计结果上的。这不仅是因为模型3要利用模型2的Viterbi对齐，而且还因为模型3参数的初值也要直接利用模型2的参数。从这个角度说，模型1，2，3是有序的且向前依赖的。单独的对模型3的参数进行估计是极其困难的。实际上IBM的模型4和模型5也具有这样的性质，即他们都可以利用前一个模型参数估计的结果作为自身参数的初始值。
+\parinterval 相比整个词对齐空间，$S$只是一个非常小的子集，因此运算复杂度可以被大大降低。可以看到，模型3的参数估计过程是建立在模型1和模型2的参数估计结果上的。这不仅是因为模型3要利用模型2的Viterbi对齐，而且还因为模型3参数的初值也要直接利用模型2的参数。从这个角度说，模型1，2，3是有序的且向前依赖的。单独地对模型3的参数进行估计是极其困难的。实际上IBM的模型4和模型5也具有这样的性质，即它们都可以利用前一个模型参数估计的结果作为自身参数的初始值。

 \section{IBM模型4训练方法}

-\parinterval 模型4的参数估计基本与模型3一致。需要修改的是扭曲度的估计公式，如下：
+\parinterval 模型4的参数估计基本与模型3一致。需要修改的是扭曲度的估计公式，对于目标语第$i$个cept.生成的第一个单词，可以得到（假设有$K$个训练样本）：
 \begin{eqnarray}
-c_1(\Delta_i|ca,cb;\mathbf{s},\mathbf{t}) = \sum_{\mathbf{a}}(\textrm{P}_{\theta}(\mathbf{s},\mathbf{a}|\mathbf{t}) \times s_1(\Delta_i|ca,cb;\mathbf{a},\mathbf{s},\mathbf{t}))
+d_1(\Delta_j|ca,cb;\mathbf{s},\mathbf{t}) = \mu_{1cacb}^{-1} \times \sum_{k=1}^{K}c_1(\Delta_j|ca,cb;\mathbf{s}^{[k]},\mathbf{t}^{[k]})
 \label{eq:1.15}
 \end{eqnarray}
-\begin{small}
-\begin{eqnarray}
-s_1(\Delta_i|ca,cb;\rm{a},\mathbf{s},\mathbf{t}) = \sum_{p=1}^l (\varepsilon(\phi_p) \cdot \delta(\pi_{p1}-\odot _{[p]},\Delta_i) \cdot \delta(A(e_{p-1}),ca) \cdot \delta(B(\tau_{p1}),cb))
-\label{eq:1.16}
-\end{eqnarray}
-\end{small}
-\begin{eqnarray}
-d_1(\Delta_i|ca,cb;\mathbf{s},\mathbf{t}) = \mu_{1cacb}^{-1} \times \sum_{s=1}^{S}c(\Delta_i|ca,cb;\mathbf{s}(s),\mathbf{t}(s))
-\label{eq:1.17}
-\end{eqnarray}
-\begin{eqnarray}
-c_{>1}(\Delta_i|cb;\mathbf{s},\mathbf{t}) = \sum_{\mathbf{a}}(\textrm{p}_{\theta}(\mathbf{s},\mathbf{a}|\mathbf{t}) \times s_{>1}(\Delta_i|cb;\mathbf{a},\mathbf{s},\mathbf{t}))
-\label{eq:1.18}
-\end{eqnarray}
-\begin{eqnarray}
-s_{>1}(\Delta_i|cb;\mathbf{a},\mathbf{s},\mathbf{t}) = \sum_{p=1}^l(\varepsilon(\phi_p-1)\sum_{k=2}^{\phi_p}\delta(p-\pi_{[p]k-1},\Delta_i) \cdot \delta(B(\tau_{[p]k}),cb))
-\label{eq:1.19}
-\end{eqnarray}
+
+其中，
+
 \begin{eqnarray}
-d_{>1}(\Delta_i|cb;\mathbf{s},\mathbf{t}) = \mu_{>1cb}^{-1} \times \sum_{s=1}^{S}c_{>1}(\Delta_i|cb;\mathbf{s}(s),\mathbf{t}(s))
-\label{eq:1.20}
+c_1(\Delta_j|ca,cb;\mathbf{s},\mathbf{t})           & = & \sum_{\mathbf{a}}\big[\textrm{P}_{\theta}(\mathbf{s},\mathbf{a}|\mathbf{t}) \times s_1(\Delta_j|ca,cb;\mathbf{a},\mathbf{s},\mathbf{t})\big] \label{eq:1.16} \\
+s_1(\Delta_j|ca,cb;\mathbf{a},\mathbf{s},\mathbf{t}) & = & \sum_{i=1}^l \big[\varepsilon(\phi_i) \cdot \delta(\pi_{i1}-\odot _{i},\Delta_j) \cdot \nonumber \\
+                                                                           &     & \delta(A(t_{i-1}),ca) \cdot \delta(B(\tau_{i1}),cb) \big] \label{eq:1.17}
 \end{eqnarray}
-%----------------------------------------------

-\parinterval 其中，
+且
+
 \begin{eqnarray}
 \varepsilon(x) = \begin{cases}
 0 & x \leq 0 \\
@@ -139,63 +123,78 @@ d_{>1}(\Delta_i|cb;\mathbf{s},\mathbf{t}) = \mu_{>1cb}^{-1} \times \sum_{s=1}^{S
 \end{cases}
 \label{eq:1.21}
 \end{eqnarray}
+
+对于目标语第$i$个cept.生成的其他单词（非第一个单词），可以得到：
+
+\begin{eqnarray}
+d_{>1}(\Delta_j|cb;\mathbf{s},\mathbf{t}) = \mu_{>1cb}^{-1} \times \sum_{k=1}^{K}c_{>1}(\Delta_j|cb;\mathbf{s}^{[k]},\mathbf{t}^{[k]})
+\label{eq:1.18}
+\end{eqnarray}
+
+其中，
+
+\begin{eqnarray}
+c_{>1}(\Delta_j|cb;\mathbf{s},\mathbf{t})                  & = & \sum_{\mathbf{a}}\big[\textrm{P}_{\theta}(\mathbf{s},\mathbf{a}|\mathbf{t}) \times s_{>1}(\Delta_j|cb;\mathbf{a},\mathbf{s},\mathbf{t}) \big] \label{eq:1.19} \\
+s_{>1}(\Delta_j|cb;\mathbf{a},\mathbf{s},\mathbf{t}) & = & \sum_{i=1}^l \big[\varepsilon(\phi_i-1)\sum_{k=2}^{\phi_i}\delta(\pi_{[i]k}-\pi_{[i]k-1},\Delta_j) \cdot \nonumber \\
+                                                                                  &    & \delta(B(\tau_{[i]k}),cb) \big] \label{eq:1.20}
+\end{eqnarray}
+
 %----------------------------------------------

-\parinterval $ca$和$cb$分别表示目标语和源语的某个词类。
+\noindent 这里，$ca$和$cb$分别表示目标语言和源语言的某个词类。模型4需要像模型3一样，通过定义一个词对齐集合$S$，使得每次迭代都在$S$上进行，进而降低运算量。模型4中$S$的定义为：

-\parinterval 模型4需要像模型3一样，通过定义一个词对齐集合\textrm{S}，使得每次迭代都在\textrm{S}上进行，进而降低运算量。模型4中\textrm{S}的定义为，
 \begin{eqnarray}
 S = N(\tilde{b}^{\infty}(V(\mathbf{s}|\mathbf{t};2))) \cup (\mathop{\cup}\limits_{ij} N(\tilde{b}_{i \leftrightarrow j}^{\infty}(V_{i \leftrightarrow j}(\mathbf{s}|\mathbf{t},2))))
 \label{eq:1.22}
 \end{eqnarray}
 %----------------------------------------------

-\parinterval 对于一个对齐$\mathbf{a}$，可用模型3对它的邻居进行排名，即按$\textrm{p}_{\theta}(b(\mathbf{a})|\mathbf{s},\mathbf{t};3)$排序。$\tilde{b}(\mathbf{a})$ \\ 表示这个排名表中满足$\textrm{p}_{\theta}(\mathbf{a}'|\mathbf{s},\mathbf{t};4) > \textrm{P}_{\theta}⁡(\mathbf{a}|\mathbf{s},\mathbf{t};4)$的最高排名的$\mathbf{a}'$。同理可知$\tilde{b}_{i \leftrightarrow j}^{\infty}(\mathbf{a})$ \\ 的意义。这里之所以不用模型3中采用的方法直接利用$b^{\infty}(\mathbf{a})$得到模型4中高概率的对齐，是因为模型4中，要想获得某个对齐$\mathbf{a}$的邻居$\mathbf{a}'$，必须做很大调整，比如：调整$\tau_{[j]1}$和$\odot_{[j]}$等等。这个过程要比模型3的相应过程复杂得多。因此在模型4中只能借助于模型3的中间步骤来进行估计。
+\parinterval 对于一个对齐$\mathbf{a}$，可用模型3对它的邻居进行排名，即按$\textrm{P}_{\theta}(b(\mathbf{a})|\mathbf{s},\mathbf{t};3)$排序，其中$b(\mathbf{a})$表示$\mathbf{a}$的邻居。$\tilde{b}(\mathbf{a})$ 表示这个排名表中满足$\textrm{P}_{\theta}(\mathbf{a}'|\mathbf{s},\mathbf{t};4) > \textrm{P}_{\theta}(\mathbf{a}|\mathbf{s},\mathbf{t};4)$的最高排名的$\mathbf{a}'$。同理可知$\tilde{b}_{i \leftrightarrow j}^{\infty}(\mathbf{a})$ 的意义。这里之所以不用模型3中采用的方法直接利用$b^{\infty}(\mathbf{a})$得到模型4中高概率的对齐，是因为模型4中，要想获得某个对齐$\mathbf{a}$的邻居$\mathbf{a}'$，必须做很大调整，比如：调整$\tau_{[i]1}$和$\odot_{i}$等等。这个过程要比模型3的相应过程复杂得多。因此在模型4中只能借助于模型3的中间步骤来进行参数估计。
 \setlength{\belowdisplayskip}{3pt}%调整空白大小
 %%%%%%%%%%%%%%%%%%%%%%%%%%%%
 \section{IBM模型5训练方法}
-\parinterval 模型5的参数估计过程也与模型3的过程基本一致，二者的区别在于扭曲度的估计公式。在模型5中，
+\parinterval 模型5的参数估计过程也与模型3的过程基本一致，二者的区别在于扭曲度的估计公式。在模型5中，对于目标语第$i$个cept.生成的第一个单词，可以得到（假设有$K$个训练样本）：
+
 \begin{eqnarray}
-c_1(\Delta_i|cb,v1,v2;\mathbf{s},\mathbf{t}) = \sum_{\mathbf{a}}(\textrm{P}(\mathbf{s},\mathbf{a}|\mathbf{t}) \times s_1(\Delta_i|cb,v1,v2;\mathbf{a},\mathbf{s},\mathbf{t}))
+d_1(\Delta_j|cb;\mathbf{s},\mathbf{t}) = \mu_{1cb}^{-1} \times \sum_{k=1}^{K}c_1(\Delta_j|cb;\mathbf{s}^{[k]},\mathbf{t}^{[k]})
 \label{eq:1.23}
 \end{eqnarray}
+
+其中，
+
 \begin{eqnarray}
-s_1(\Delta_i|cb,v1,v2;\rm{a},\mathbf{s},\mathbf{t}) & = & \sum_{p=1}^l (\varepsilon(\phi_p) \cdot \delta(v_{\pi_{p1}},\Delta_i) \cdot \delta(X_{\{p-1\}},v1) \nonumber \\
-& & \cdot \delta(v_m-\phi_p+1,v2) \cdot \delta(v_{\pi_{p1}},v_{\pi_{p1-1}})
-\label{eq:1.24}
-\end{eqnarray}
-\begin{eqnarray}
-d_1(\Delta_i|cb;\mathbf{s},\mathbf{t}) = \mu_{1cb}^{-1} \times \sum_{s=1}^{S}c(\Delta_i|cb;\mathbf{f}(s),\mathbf{e}(s))
-\label{eq:1.25}
+c_1(\Delta_j|cb,v_x,v_y;\mathbf{s},\mathbf{t})                   & = & \sum_{\mathbf{a}}\Big[ \textrm{P}(\mathbf{s},\mathbf{a}|\mathbf{t}) \times s_1(\Delta_j|cb,v_x,v_y;\mathbf{a},\mathbf{s},\mathbf{t}) \Big] \label{eq:1.24} \\
+s_1(\Delta_j|cb,v_x,v_y;\mathbf{a},\mathbf{s},\mathbf{t}) & = & \sum_{i=1}^l \Big [ \varepsilon(\phi_i) \cdot \delta(v_{\pi_{i1}},\Delta_j) \cdot \delta(v_{\odot _{i-1}},v_x) \nonumber \\
+                                                                                          &    & \cdot \delta(v_m-\phi_i+1,v_y) \cdot \delta(v_{\pi_{i1}},v_{\pi_{i1}-1} )\Big] \label{eq:1.25}
 \end{eqnarray}
+
+
+对于目标语第$i$个cept.生成的其他单词（非第一个单词），可以得到：
+
 \begin{eqnarray}
-c_{>1}(\Delta_i|cb,v;\mathbf{s},\mathbf{t}) = \sum_{\mathbf{a}}(\textrm{p}(\mathbf{f},\mathbf{s}|\mathbf{t}) \times s_{>1}(\Delta_i|cb,v;\mathbf{a},\mathbf{s},\mathbf{t}))
+d_{>1}(\Delta_j|cb,v;\mathbf{s},\mathbf{t}) = \mu_{>1cb}^{-1} \times \sum_{k=1}^{K}c_{>1}(\Delta_j|cb,v;\mathbf{s}^{[k]},\mathbf{t}^{[k]})
 \label{eq:1.26}
 \end{eqnarray}
-%\begin{small}
-\begin{eqnarray}
-s_{>1}(\Delta_i|cb,v;\mathbf{a},\mathbf{s},\mathbf{t}) & = & \sum_{p=1}^l(\varepsilon(\phi_p-1)\sum_{k=2}^{\phi_p}(\delta(v_{\pi_{pk}}-V_{\pi_{[p]k-1}},\Delta_i)  \nonumber \\
-& & \cdot \delta(B(\tau_{[p]k}) ,cb) \cdot \delta(vm-v_{\pi_{p(k-1)}}-\phi_p+k,v) \nonumber \\
-& & \cdot \delta(v_{\pi_{p1}},v_{\pi_{p1-1}})))
-\label{eq:1.27}
-\end{eqnarray}
-%\end{small}
+
+其中，
+
 \begin{eqnarray}
-d_{>1}(\Delta_i|cb,v;\mathbf{s},\mathbf{t}) = \mu_{>1cb}^{-1} \times \sum_{s=1}^{S}c_{>1}(\Delta_i|cb,v;\mathbf{f}(s),\mathbf{e}(s))
-\label{eq:1.28}
+c_{>1}(\Delta_j|cb,v;\mathbf{s},\mathbf{t})                   & =  & \sum_{\mathbf{a}}\Big[\textrm{P}(\mathbf{a},\mathbf{s}|\mathbf{t}) \times s_{>1}(\Delta_j|cb,v;\mathbf{a},\mathbf{s},\mathbf{t}) \Big] \label{eq:1.27} \\
+s_{>1}(\Delta_j|cb,v;\mathbf{a},\mathbf{s},\mathbf{t}) & = & \sum_{i=1}^l\Big[\varepsilon(\phi_i-1)\sum_{k=2}^{\phi_i} \big[\delta(v_{\pi_{ik}}-v_{\pi_{[i]k}-1},\Delta_j)  \nonumber \\
+                                                                                    &     & \cdot \delta(B(\tau_{[i]k}) ,cb) \cdot \delta(v_m-v_{\pi_{i(k-1)}}-\phi_i+k,v) \nonumber \\
+                                                                                    &     & \cdot \delta(v_{\pi_{i1}},v_{\pi_{i1}-1}) \big] \Big] \label{eq:1.28}
 \end{eqnarray}
+
 %----------------------------------------------
 \vspace{0.5em}

-\parinterval 这里$X_{\{p-1\}}$表示在位置小于$p$的非空对的目标语单词对应的源语单词的平均置位。
-
-\parinterval 从式(\ref{eq:1.24})中可以看出因子$\delta(v_{\pi_{p1}},v_{\pi_{p1-1}})$保证了，即使对齐$\mathbf{a}$不合理（一个源语位置对应多个目标语位置）也可以避免在这个不合理的对齐上计算结果。需要注意的是因子$\delta(v_{\pi_{p1}},v_{\pi_{p1-1}})$，只能保证$\mathbf{a}$中不合理的部分不产生坏的影响，而$\mathbf{a}$中其他正确的部分仍会参与迭代。
+\parinterval 从式(\ref{eq:1.25})中可以看出，因子$\delta(v_{\pi_{i1}},v_{\pi_{i1}-1})$保证了即使对齐$\mathbf{a}$不合理（一个源语位置对应多个目标语位置），也可以避免在这个不合理的对齐上计算结果。需要注意的是，因子$\delta(v_{\pi_{i1}},v_{\pi_{i1}-1})$确保了$\mathbf{a}$中不合理的部分不产生坏的影响，而$\mathbf{a}$中其他正确的部分仍会参与迭代。

-\parinterval 不过上面的参数估计过程与前面4个模型中参数估计过程并不完全一样。前面四个模型在每次迭代中，可以在给定$\mathbf{s}$、$\mathbf{t}$和一个对齐$\mathbf{a}$的情况下直接计算并更新参数。但是在模型5的参数估计过程中，如公式(\ref{eq:1.24})中，需要模拟出由$\mathbf{t}$生成$\mathbf{s}$的过程才能得到正确的结果，因为从$\mathbf{t}$、$\mathbf{s}$和$\mathbf{a}$中是不能直接得到 的正确结果的。具体说，就是要从目标语句子的第一个单词开始到最后一个单词结束，依次生成每个目标语单词对应的源语单词，每处理完一个目标语单词就要暂停，然后才能计算式(\ref{eq:1.24})中求和符号里面的内容。这也就是说即使给定了$\mathbf{s}$、$\mathbf{t}$和一个对齐$\mathbf{a}$，也不能直接在它们上计算，必须重新模拟$\mathbf{t}$到$\mathbf{s}$的生成过程。
+\parinterval 不过上面的参数估计过程与IBM前4个模型的参数估计过程并不完全一样。IBM前4个模型在每次迭代中，可以在给定$\mathbf{s}$、$\mathbf{t}$和一个对齐$\mathbf{a}$的情况下直接计算并更新参数。但是在模型5的参数估计过程中（如公式\ref{eq:1.24}），需要模拟出由$\mathbf{t}$生成$\mathbf{s}$的过程才能得到正确的结果，因为从$\mathbf{t}$、$\mathbf{s}$和$\mathbf{a}$中是不能直接得到 的正确结果的。具体说，就是要从目标语言句子的第一个单词开始到最后一个单词结束，依次生成每个目标语言单词对应的源语言单词，每处理完一个目标语言单词就要暂停，然后才能计算式\ref{eq:1.24}中求和符号里面的内容。这也就是说即使给定了$\mathbf{s}$、$\mathbf{t}$和一个对齐$\mathbf{a}$，也不能直接在它们上进行计算，必须重新模拟$\mathbf{t}$到$\mathbf{s}$的生成过程。

-\parinterval 从前面的分析可以看出，虽然模型5比模型4更精确，但是模型5过于复杂以至于给参数估计增加了巨大的计算量（对于每组$\mathbf{t}$、$\mathbf{s}$和$\mathbf{a}$都要模拟$\mathbf{t}$生成$\mathbf{s}$的翻译过程，时间复杂度成指数增加）。因此模型5并不具有很强的实际意义。
+\parinterval 从前面的分析可以看出，虽然模型5比模型4更精确，但是模型5过于复杂以至于给参数估计增加了计算量（对于每组$\mathbf{t}$、$\mathbf{s}$和$\mathbf{a}$都要模拟$\mathbf{t}$生成$\mathbf{s}$的翻译过程）。因此模型5的开发对于系统实现是一个挑战。

-\parinterval 在模型5中同样需要定义一个词对齐集合S，使得每次迭代都在\textrm{S}上进行。这里对\textrm{S}进行如下定义
+\parinterval 在模型5中同样需要定义一个词对齐集合$S$，使得每次迭代都在$S$上进行。可以对$S$进行如下定义：
 \begin{eqnarray}
 S = N(\tilde{\tilde{b}}^{\infty}(V(\mathbf{s}|\mathbf{t};2))) \cup (\mathop{\cup}\limits_{ij} N(\tilde{\tilde{b}}_{i \leftrightarrow j}^{\infty}(V_{i \leftrightarrow j}(\mathbf{s}|\mathbf{t},2))))
 \label{eq:1.29}
@@ -203,7 +202,7 @@ d_{>1}(\Delta_i|cb,v;\mathbf{s},\mathbf{t}) = \mu_{>1cb}^{-1} \times \sum_{s=1}^
 \vspace{0.5em}

 %----------------------------------------------
-\parinterval 这里$\tilde{\tilde{b}}(\mathbf{a})$借用了模型4中$\tilde{b}(\mathbf{a})$的概念。不过$\tilde{\tilde{b}}(\mathbf{a})$表示在利用模型3进行排名的列表中满足$\textrm{p}_{\theta}(\mathbf{a}'|\mathbf{s},\mathbf{t};5)$的最高排名的词对齐。
+\parinterval 这里$\tilde{\tilde{b}}(\mathbf{a})$借用了模型4中$\tilde{b}(\mathbf{a})$的概念。不过$\tilde{\tilde{b}}(\mathbf{a})$表示在利用模型3进行排名的列表中满足$\textrm{P}_{\theta}(\mathbf{a}'|\mathbf{s},\mathbf{t};5) > \textrm{P}_{\theta}(\mathbf{a}|\mathbf{s},\mathbf{t};5)$的最高排名的词对齐。
 \end{appendices}
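
Note on eq. 1.13 as revised in this commit: the point of the formula is that a neighbour's probability can be obtained from the current alignment's probability with a handful of table lookups, without rescoring the whole sentence pair. The sketch below mirrors the five factors of eq. 1.13 term by term. The function name, the dictionary layout of the tables `n`, `t`, `d`, and the toy numbers are assumptions made for illustration; they are not taken from the book or from any toolkit, and the case where the old or new target position is the empty word is deliberately not handled.

```python
import math


def model3_move_ratio(j, i, i_new, src, tgt, phi, n, t, d):
    """Return P(a', s | t) / P(a, s | t) for a single move (eq. 1.13).

    Source position j (1-based) is re-linked from target position i to
    target position i_new. Hypothetical table layout:
      phi[i]              fertility of target position i under alignment a
      n[(fertility, t_w)] fertility probability n(phi | t)
      t[(s_w, t_w)]       lexical translation probability t(s | t)
      d[(j, i, m, l)]     distortion probability d(j | i, m, l)
    """
    m, l = len(src), len(tgt)
    s_j, t_i, t_new = src[j - 1], tgt[i - 1], tgt[i_new - 1]

    fert_factor = (phi[i_new] + 1) / phi[i]
    gain = n[(phi[i_new] + 1, t_new)] / n[(phi[i_new], t_new)]  # i_new gains a word
    loss = n[(phi[i] - 1, t_i)] / n[(phi[i], t_i)]              # i loses a word
    lexical = t[(s_j, t_new)] / t[(s_j, t_i)]
    distortion = d[(j, i_new, m, l)] / d[(j, i, m, l)]
    return fert_factor * gain * loss * lexical * distortion


# Toy tables (made-up numbers, only to exercise the function).
src, tgt = ["a", "b"], ["x", "y"]
phi = {1: 2, 2: 0}
n = {(1, "y"): 0.4, (0, "y"): 0.2, (1, "x"): 0.6, (2, "x"): 0.3}
t = {("a", "y"): 0.1, ("a", "x"): 0.5}
d = {(1, 2, 2, 2): 0.3, (1, 1, 2, 2): 0.6}
ratio = model3_move_ratio(1, 1, 2, src, tgt, phi, n, t, d)
```

Multiplying `ratio` by the known value of P(a, s | t) gives the neighbour's probability, which is what makes enumerating N(a) affordable.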


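Note on eq. 1.20 as revised in this commit: for one fixed alignment, the statistic s_{>1} simply counts how often a non-first word of a cept sits at displacement Delta from the previous word of the same cept while belonging to source word class cb. A minimal sketch follows, assuming the tableau is given as two parallel lists: `pi[i]` holds the source positions generated by the i-th target cept in order, and `tau[i]` holds the corresponding source words. These names, the `word_class` mapping standing in for B(.), and the toy data are hypothetical. The `x > 0` branch of epsilon is not visible in the hunk, so the value 1 is assumed here, following the usual definition.

```python
def epsilon(x):
    """Indicator of eq. 1.21: 0 for x <= 0; 1 for x > 0 (assumed branch)."""
    return 1 if x > 0 else 0


def kronecker(x, y):
    """Kronecker delta: 1 when both arguments are equal, otherwise 0."""
    return 1 if x == y else 0


def s_gt1(delta, cb, pi, tau, word_class):
    """Count events for non-first words of each cept under one alignment."""
    total = 0
    for positions, words in zip(pi, tau):
        phi_i = len(positions)
        if epsilon(phi_i - 1) == 0:
            continue  # cepts with fertility 0 or 1 have no non-first word
        for k in range(1, phi_i):  # k = 2 .. phi_i in the 1-based notation
            step = positions[k] - positions[k - 1]
            total += kronecker(step, delta) * kronecker(word_class[words[k]], cb)
    return total


# Toy tableau: three cepts with fertilities 3, 1 and 2 (made-up data).
pi = [[2, 3, 5], [1], [4, 6]]
tau = [["b", "c", "e"], ["a"], ["d", "f"]]
word_class = {"a": 0, "b": 0, "c": 1, "d": 0, "e": 1, "f": 1}
```

Summing this count over alignments, weighted by P(s, a | t), gives the expected count c_{>1} of eq. 1.19.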

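Note on eqs. 1.15 and 1.18 as revised in this commit: both have the same shape as eqs. 1.7 to 1.10, namely expected counts summed over the K training pairs and then rescaled by a normaliser. The sketch below assumes that the normaliser (written as mu with the conditioning context as subscript) is the total count for that context, so that the resulting distribution over Delta sums to one. The function name and the representation of counts as dictionaries keyed by `(delta, context)` are assumptions made for illustration.

```python
import math
from collections import defaultdict


def normalize_counts(sample_counts):
    """Turn per-sample expected counts into a distortion table.

    sample_counts: iterable with one dict per training pair k, each mapping
    (delta, context) -> expected count c(delta | context; s[k], t[k]).
    Returns a dict mapping (delta, context) -> d(delta | context).
    """
    totals = defaultdict(float)
    for counts in sample_counts:
        for key, value in counts.items():
            totals[key] += value  # sum over k = 1..K

    normaliser = defaultdict(float)  # plays the role of mu for each context
    for (_, context), value in totals.items():
        normaliser[context] += value

    return {
        (delta, context): value / normaliser[context]
        for (delta, context), value in totals.items()
        if normaliser[context] > 0
    }


# Two toy training pairs with contexts (ca, cb) (made-up counts).
samples = [
    {(1, ("A", "B")): 1.5, (2, ("A", "B")): 0.5},
    {(1, ("A", "B")): 0.5, (1, ("A", "C")): 2.0},
]
d1 = normalize_counts(samples)
```

For d_{>1} the context would be the single class cb (or `(cb, v)` in Model 5) instead of the pair `(ca, cb)`; the normalisation step itself is unchanged.
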
--- a/Book/mt-book-xelatex.idx
+++ b/Book/mt-book-xelatex.idx
+\indexentry{Chapter3.1|hyperpage}{9}
+\indexentry{Chapter3.2|hyperpage}{11}
+\indexentry{Chapter3.2.1|hyperpage}{11}
+\indexentry{Chapter3.2.1.1|hyperpage}{11}
+\indexentry{Chapter3.2.1.2|hyperpage}{12}
+\indexentry{Chapter3.2.1.3|hyperpage}{13}
+\indexentry{Chapter3.2.2|hyperpage}{13}
+\indexentry{Chapter3.2.3|hyperpage}{14}
+\indexentry{Chapter3.2.3.1|hyperpage}{14}
+\indexentry{Chapter3.2.3.2|hyperpage}{14}
+\indexentry{Chapter3.2.3.3|hyperpage}{16}
+\indexentry{Chapter3.2.4|hyperpage}{17}
+\indexentry{Chapter3.2.4.1|hyperpage}{17}
+\indexentry{Chapter3.2.4.2|hyperpage}{19}
+\indexentry{Chapter3.2.5|hyperpage}{21}
+\indexentry{Chapter3.3|hyperpage}{24}
+\indexentry{Chapter3.3.1|hyperpage}{24}
+\indexentry{Chapter3.3.2|hyperpage}{26}
+\indexentry{Chapter3.3.2.1|hyperpage}{27}
+\indexentry{Chapter3.3.2.2|hyperpage}{27}
+\indexentry{Chapter3.3.2.3|hyperpage}{29}
+\indexentry{Chapter3.4|hyperpage}{30}
+\indexentry{Chapter3.4.1|hyperpage}{30}
+\indexentry{Chapter3.4.2|hyperpage}{32}
+\indexentry{Chapter3.4.3|hyperpage}{33}
+\indexentry{Chapter3.4.4|hyperpage}{34}
+\indexentry{Chapter3.4.4.1|hyperpage}{34}
+\indexentry{Chapter3.4.4.2|hyperpage}{35}
+\indexentry{Chapter3.5|hyperpage}{41}
+\indexentry{Chapter3.5.1|hyperpage}{41}
+\indexentry{Chapter3.5.2|hyperpage}{44}
+\indexentry{Chapter3.5.3|hyperpage}{45}
+\indexentry{Chapter3.5.4|hyperpage}{47}
+\indexentry{Chapter3.5.5|hyperpage}{48}
+\indexentry{Chapter3.5.5|hyperpage}{51}
+\indexentry{Chapter3.6|hyperpage}{51}
+\indexentry{Chapter3.6.1|hyperpage}{51}
+\indexentry{Chapter3.6.2|hyperpage}{52}
+\indexentry{Chapter3.6.4|hyperpage}{53}
+\indexentry{Chapter3.6.5|hyperpage}{54}
+\indexentry{Chapter3.7|hyperpage}{54}
--- a/Book/mt-book-xelatex.ptc
+++ b/Book/mt-book-xelatex.ptc
 \boolfalse {citerequest}\boolfalse {citetracker}\boolfalse {pagetracker}\boolfalse {backtracker}\relax 
-\babel@toc {english}{}
 \defcounter {refsection}{0}\relax 
-\contentsline {part}{\@mypartnumtocformat {I}{附录}}{7}{part.1}%
+\select@language {english}
+\defcounter {refsection}{0}\relax 
+\contentsline {part}{\@mypartnumtocformat {I}{统计机器翻译}}{7}{part.1}
 \ttl@starttoc {default@1}
 \defcounter {refsection}{0}\relax 
-\contentsline {chapter}{\numberline {A}附录A}{9}{appendix.1.A}%
+\contentsline {chapter}{\numberline {1}基于词的机器翻译模型}{9}{chapter.1}
+\defcounter {refsection}{0}\relax 
+\contentsline {section}{\numberline {1.1}什么是基于词的翻译模型}{9}{section.1.1}
+\defcounter {refsection}{0}\relax 
+\contentsline {section}{\numberline {1.2}构建一个简单的机器翻译系统}{11}{section.1.2}
+\defcounter {refsection}{0}\relax 
+\contentsline {subsection}{\numberline {1.2.1}如何进行翻译？}{11}{subsection.1.2.1}
+\defcounter {refsection}{0}\relax 
+\contentsline {subsubsection}{机器翻译流程}{12}{section*.6}
+\defcounter {refsection}{0}\relax 
+\contentsline {subsubsection}{人工翻译 vs. 机器翻译}{13}{section*.8}
+\defcounter {refsection}{0}\relax 
+\contentsline {subsection}{\numberline {1.2.2}基本框架}{13}{subsection.1.2.2}
+\defcounter {refsection}{0}\relax 
+\contentsline {subsection}{\numberline {1.2.3}单词翻译概率}{14}{subsection.1.2.3}
+\defcounter {refsection}{0}\relax 
+\contentsline {subsubsection}{什么是单词翻译概率？}{14}{section*.10}
+\defcounter {refsection}{0}\relax 
+\contentsline {subsubsection}{如何从一个双语平行数据中学习？}{14}{section*.12}
+\defcounter {refsection}{0}\relax 
+\contentsline {subsubsection}{如何从大量的双语平行数据中学习？}{16}{section*.13}
+\defcounter {refsection}{0}\relax 
+\contentsline {subsection}{\numberline {1.2.4}句子级翻译模型}{17}{subsection.1.2.4}
+\defcounter {refsection}{0}\relax 
+\contentsline {subsubsection}{基础模型}{17}{section*.15}
+\defcounter {refsection}{0}\relax 
+\contentsline {subsubsection}{生成流畅的译文}{19}{section*.17}
+\defcounter {refsection}{0}\relax 
+\contentsline {subsection}{\numberline {1.2.5}解码}{21}{subsection.1.2.5}
+\defcounter {refsection}{0}\relax 
+\contentsline {section}{\numberline {1.3}基于词的翻译建模}{24}{section.1.3}
+\defcounter {refsection}{0}\relax 
+\contentsline {subsection}{\numberline {1.3.1}噪声信道模型}{24}{subsection.1.3.1}
+\defcounter {refsection}{0}\relax 
+\contentsline {subsection}{\numberline {1.3.2}统计机器翻译的三个基本问题}{26}{subsection.1.3.2}
+\defcounter {refsection}{0}\relax 
+\contentsline {subsubsection}{词对齐}{27}{section*.26}
+\defcounter {refsection}{0}\relax 
+\contentsline {subsubsection}{基于词对齐的翻译模型}{27}{section*.29}
+\defcounter {refsection}{0}\relax 
+\contentsline {subsubsection}{基于词对齐的翻译实例}{29}{section*.31}
+\defcounter {refsection}{0}\relax 
+\contentsline {section}{\numberline {1.4}IBM模型1-2}{30}{section.1.4}
+\defcounter {refsection}{0}\relax 
+\contentsline {subsection}{\numberline {1.4.1}IBM模型1}{30}{subsection.1.4.1}
+\defcounter {refsection}{0}\relax 
+\contentsline {subsection}{\numberline {1.4.2}IBM模型2}{32}{subsection.1.4.2}
+\defcounter {refsection}{0}\relax 
+\contentsline {subsection}{\numberline {1.4.3}解码及计算优化}{33}{subsection.1.4.3}
+\defcounter {refsection}{0}\relax 
+\contentsline {subsection}{\numberline {1.4.4}训练}{34}{subsection.1.4.4}
+\defcounter {refsection}{0}\relax 
+\contentsline {subsubsection}{目标函数}{34}{section*.36}
+\defcounter {refsection}{0}\relax 
+\contentsline {subsubsection}{优化}{35}{section*.38}
+\defcounter {refsection}{0}\relax 
+\contentsline {section}{\numberline {1.5}IBM模型3-5及隐马尔可夫模型}{41}{section.1.5}
+\defcounter {refsection}{0}\relax 
+\contentsline {subsection}{\numberline {1.5.1}基于产出率的翻译模型}{41}{subsection.1.5.1}
+\defcounter {refsection}{0}\relax 
+\contentsline {subsection}{\numberline {1.5.2}IBM 模型3}{44}{subsection.1.5.2}
+\defcounter {refsection}{0}\relax 
+\contentsline {subsection}{\numberline {1.5.3}IBM 模型4}{45}{subsection.1.5.3}
+\defcounter {refsection}{0}\relax 
+\contentsline {subsection}{\numberline {1.5.4} IBM 模型5}{47}{subsection.1.5.4}
+\defcounter {refsection}{0}\relax 
+\contentsline {subsection}{\numberline {1.5.5}隐马尔可夫模型}{48}{subsection.1.5.5}
+\defcounter {refsection}{0}\relax 
+\contentsline {subsubsection}{隐马尔可夫模型}{49}{section*.50}
+\defcounter {refsection}{0}\relax 
+\contentsline {subsubsection}{词对齐模型}{50}{section*.52}
+\defcounter {refsection}{0}\relax 
+\contentsline {subsection}{\numberline {1.5.6}解码和训练}{51}{subsection.1.5.6}
+\defcounter {refsection}{0}\relax 
+\contentsline {section}{\numberline {1.6}问题分析}{51}{section.1.6}
+\defcounter {refsection}{0}\relax 
+\contentsline {subsection}{\numberline {1.6.1}词对齐及对称化}{51}{subsection.1.6.1}
+\defcounter {refsection}{0}\relax 
+\contentsline {subsection}{\numberline {1.6.2}Deficiency}{52}{subsection.1.6.2}
+\defcounter {refsection}{0}\relax 
+\contentsline {subsection}{\numberline {1.6.3}句子长度}{53}{subsection.1.6.3}
+\defcounter {refsection}{0}\relax 
+\contentsline {subsection}{\numberline {1.6.4}其他问题}{54}{subsection.1.6.4}
+\defcounter {refsection}{0}\relax 
+\contentsline {section}{\numberline {1.7}小结及深入阅读}{54}{section.1.7}
+\defcounter {refsection}{0}\relax 
+\contentsline {part}{\@mypartnumtocformat {II}{附录}}{57}{part.2}
+\ttl@stoptoc {default@1}
+\ttl@starttoc {default@2}
+\defcounter {refsection}{0}\relax 
+\contentsline {chapter}{\numberline {A}附录A}{59}{Appendix.1.A}
 \defcounter {refsection}{0}\relax 
-\contentsline {chapter}{\numberline {B}附录B}{11}{appendix.2.B}%
+\contentsline {chapter}{\numberline {B}附录B}{61}{Appendix.2.B}
 \defcounter {refsection}{0}\relax 
-\contentsline {section}{\numberline {B.1}IBM模型3训练方法}{11}{section.2.B.1}%
+\contentsline {section}{\numberline {B.1}IBM模型3训练方法}{61}{section.2.B.1}
 \defcounter {refsection}{0}\relax 
-\contentsline {section}{\numberline {B.2}IBM模型4训练方法}{13}{section.2.B.2}%
+\contentsline {section}{\numberline {B.2}IBM模型4训练方法}{63}{section.2.B.2}
 \defcounter {refsection}{0}\relax 
-\contentsline {section}{\numberline {B.3}IBM模型5训练方法}{15}{section.2.B.3}%
+\contentsline {section}{\numberline {B.3}IBM模型5训练方法}{65}{section.2.B.3}
 \contentsfinish 
--- a/Book/mt-book-xelatex.tex
+++ b/Book/mt-book-xelatex.tex
@@ -114,7 +114,7 @@

 %\include{Chapter1/chapter1}
 %\include{Chapter2/chapter2}
-%\include{Chapter3/chapter3}
+\include{Chapter3/chapter3}
 %\include{Chapter4/chapter4}
 %\include{Chapter5/chapter5}
 %\include{Chapter6/chapter6}