Commit e5564fa2 by 孟霞

Merge branch 'caorunzhe' into 'mengxia'

Caorunzhe

See merge request !253
parents 2d52d067 34827943
......@@ -593,10 +593,10 @@ F(x)=\int_{-\infty}^x f(x)\textrm{d}x
% NEW SUBSUB-SECTION
%----------------------------------------------------------------------------------------
\subsubsection{2. 古德-图灵估计法}
\subsubsection{2. 古德-图灵估计}
\vspace{-0.5em}
\parinterval {\small\bfnew{古德-图灵估计}}\index{古德-图灵估计法}(Good-Turing Estimate)\index{Good-Turing Estimate}是Alan Turing和他的助手Irving John Good开发的,作为他们在二战期间破解德国密码机Enigma所使用方法的一部分,在1953 年Irving John Good将其发表。这一方法也是很多平滑算法的核心,其基本思路是:把非零的$n$元语法单元的概率降低,匀给一些低概率$n$元语法单元,以减小最大似然估计与真实概率之间的偏离\upcite{good1953population,gale1995good}
\parinterval {\small\bfnew{古德-图灵估计}}\index{古德-图灵估计}(Good-Turing Estimate)\index{Good-Turing Estimate}是Alan Turing和他的助手Irving John Good开发的,作为他们在二战期间破解德国密码机Enigma所使用方法的一部分,在1953 年Irving John Good将其发表。这一方法也是很多平滑算法的核心,其基本思路是:把非零的$n$元语法单元的概率降低,匀给一些低概率$n$元语法单元,以减小最大似然估计与真实概率之间的偏离\upcite{good1953population,gale1995good}
\parinterval 假定在语料库中出现$r$次的$n$-gram有$n_r$个,特别的,出现0次的$n$-gram(即未登录词及词串)有$n_0$个。语料库中全部单词的总个数为$N$,显然:
\begin{eqnarray}
......
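
The hunk above cuts off before the estimate itself is written out. As a quick illustration of the idea in the Good-Turing paragraph (not part of the book's source), here is a minimal Python sketch of the standard adjusted count r* = (r + 1) · n_{r+1} / n_r. The function name and the fallback used when n_{r+1} = 0 are assumptions; a practical implementation would also smooth the n_r values (e.g., Simple Good-Turing).

```python
from collections import Counter

def good_turing_adjusted_counts(ngram_counts):
    """Good-Turing adjusted counts r* = (r + 1) * n_{r+1} / n_r.

    ngram_counts: dict mapping each observed n-gram to its raw count r.
    Returns a dict mapping each observed count r to its adjusted count r*.
    """
    # n_r: number of distinct n-grams that occur exactly r times
    n = Counter(ngram_counts.values())
    adjusted = {}
    for r, n_r in n.items():
        n_r1 = n.get(r + 1, 0)
        # Fall back to the raw count when n_{r+1} is zero (an assumption;
        # a practical implementation smooths the n_r values first).
        adjusted[r] = (r + 1) * n_r1 / n_r if n_r1 else float(r)
    return adjusted

# The probability mass reserved for unseen n-grams is n_1 / N,
# where N is the total number of n-gram tokens in the corpus.
```
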
......@@ -10,8 +10,8 @@
\node [anchor=west,inner sep=2pt] (eq1) at (0,0) {$f(s_u|t_v)$};
\node [anchor=west] (eq2) at (eq1.east) {$=$\ };
\draw [-] ([xshift=0.3em]eq2.east) -- ([xshift=11.6em]eq2.east);
\node [anchor=south west] (eq3) at ([xshift=1em]eq2.east) {$\sum_{i=1}^{N} c_{\mathbb{E}}(s_u|t_v;s^{[i]},t^{[i]})$};
\node [anchor=north west] (eq4) at (eq2.east) {$\sum_{s_u} \sum_{i=1}^{N} c_{\mathbb{E}}(s_u|t_v;s^{[i]},t^{[i]})$};
\node [anchor=south west] (eq3) at ([xshift=1em]eq2.east) {$\sum_{i=1}^{K} c_{\mathbb{E}}(s_u|t_v;s^{[i]},t^{[i]})$};
\node [anchor=north west] (eq4) at (eq2.east) {$\sum_{s_u} \sum_{i=1}^{K} c_{\mathbb{E}}(s_u|t_v;s^{[i]},t^{[i]})$};
{
\node [anchor=south] (label1) at ([yshift=-6em,xshift=3em]eq1.north west) {利用这个公式计算};
......
......@@ -6,17 +6,17 @@
%-------------------------------------------------------------------------
\begin{tikzpicture}
\node [anchor=north west] (line1) at (0,0) {\small\sffamily\bfseries{IBM模型1的训练(EM算法)}};
\node [anchor=north west] (line2) at ([yshift=-0.3em]line1.south west) {输入: 平行语料${(\seq{s}^{[1]},\seq{t}^{[1]}),...,(\seq{s}^{[N]},\seq{t}^{[N]})}$};
\node [anchor=north west] (line2) at ([yshift=-0.3em]line1.south west) {输入: 平行语料${(\seq{s}^{[1]},\seq{t}^{[1]}),...,(\seq{s}^{[K]},\seq{t}^{[K]})}$};
\node [anchor=north west] (line3) at ([yshift=-0.1em]line2.south west) {输出: 参数$f(\cdot|\cdot)$的最优值};
\node [anchor=north west] (line4) at ([yshift=-0.1em]line3.south west) {1: \textbf{Function} \textsc{EM}($\{(\seq{s}^{[1]},\seq{t}^{[1]}),...,(\seq{s}^{[N]},\seq{t}^{[N]})\}$) };
\node [anchor=north west] (line4) at ([yshift=-0.1em]line3.south west) {1: \textbf{Function} \textsc{EM}($\{(\seq{s}^{[1]},\seq{t}^{[1]}),...,(\seq{s}^{[K]},\seq{t}^{[K]})\}$) };
\node [anchor=north west] (line5) at ([yshift=-0.1em]line4.south west) {2: \ \ Initialize $f(\cdot|\cdot)$ \hspace{5em} $\rhd$ 比如给$f(\cdot|\cdot)$一个均匀分布};
\node [anchor=north west] (line6) at ([yshift=-0.1em]line5.south west) {3: \ \ Loop until $f(\cdot|\cdot)$ converges};
\node [anchor=north west] (line7) at ([yshift=-0.1em]line6.south west) {4: \ \ \ \ \textbf{foreach} $k = 1$ to $N$ \textbf{do}};
\node [anchor=north west] (line7) at ([yshift=-0.1em]line6.south west) {4: \ \ \ \ \textbf{foreach} $k = 1$ to $K$ \textbf{do}};
\node [anchor=north west] (line8) at ([yshift=-0.1em]line7.south west) {5: \ \ \ \ \ \ \ \footnotesize{$c_{\mathbb{E}}(\seq{s}_u|\seq{t}_v;\seq{s}^{[k]},\seq{t}^{[k]}) = \sum\limits_{j=1}^{|\seq{s}^{[k]}|} \delta(s_j,s_u) \sum\limits_{i=0}^{|\seq{t}^{[k]}|} \delta(t_i,t_v) \cdot \frac{f(s_u|t_v)}{\sum_{i=0}^{l}f(s_u|t_i)}$}\normalsize{}};
\node [anchor=north west] (line9) at ([yshift=-0.1em]line8.south west) {6: \ \ \ \ \textbf{foreach} $t_v$ appears at least one of $\{\seq{t}^{[1]},...,\seq{t}^{[N]}\}$ \textbf{do}};
\node [anchor=north west] (line10) at ([yshift=-0.1em]line9.south west) {7: \ \ \ \ \ \ \ $\lambda_{t_v}^{'} = \sum_{s_u} \sum_{k=1}^{N} c_{\mathbb{E}}(s_u|t_v;\seq{s}^{[k]},\seq{t}^{[k]})$};
\node [anchor=north west] (line11) at ([yshift=-0.1em]line10.south west) {8: \ \ \ \ \ \ \ \textbf{foreach} $s_u$ appears at least one of $\{\seq{s}^{[1]},...,\seq{s}^{[N]}\}$ \textbf{do}};
\node [anchor=north west] (line12) at ([yshift=-0.1em]line11.south west) {9: \ \ \ \ \ \ \ \ \ $f(s_u|t_v) = \sum_{k=1}^{N} c_{\mathbb{E}}(s_u|t_v;\seq{s}^{[k]},\seq{t}^{[k]}) \cdot (\lambda_{t_v}^{'})^{-1}$};
\node [anchor=north west] (line9) at ([yshift=-0.1em]line8.south west) {6: \ \ \ \ \textbf{foreach} $t_v$ appears at least one of $\{\seq{t}^{[1]},...,\seq{t}^{[K]}\}$ \textbf{do}};
\node [anchor=north west] (line10) at ([yshift=-0.1em]line9.south west) {7: \ \ \ \ \ \ \ $\lambda_{t_v}^{'} = \sum_{s_u} \sum_{k=1}^{K} c_{\mathbb{E}}(s_u|t_v;\seq{s}^{[k]},\seq{t}^{[k]})$};
\node [anchor=north west] (line11) at ([yshift=-0.1em]line10.south west) {8: \ \ \ \ \ \ \ \textbf{foreach} $s_u$ appears at least one of $\{\seq{s}^{[1]},...,\seq{s}^{[K]}\}$ \textbf{do}};
\node [anchor=north west] (line12) at ([yshift=-0.1em]line11.south west) {9: \ \ \ \ \ \ \ \ \ $f(s_u|t_v) = \sum_{k=1}^{K} c_{\mathbb{E}}(s_u|t_v;\seq{s}^{[k]},\seq{t}^{[k]}) \cdot (\lambda_{t_v}^{'})^{-1}$};
\node [anchor=north west] (line13) at ([yshift=-0.1em]line12.south west) {10: \ \textbf{return} $f(\cdot|\cdot)$};
\begin{pgfonlayer}{background}
......
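
The EM pseudocode in this figure maps almost line by line onto a short program. Below is a minimal Python sketch, not the book's code, assuming the parallel corpus is a list of `(src_tokens, tgt_tokens)` pairs, a `<NULL>` token is prepended to every target sentence to play the role of t_0, and a fixed number of iterations stands in for the convergence test.

```python
from collections import defaultdict

def train_ibm1(corpus, iterations=10):
    """EM training of IBM Model 1 lexical translation probabilities f(s|t).

    corpus: list of (src_tokens, tgt_tokens) pairs; a NULL token is added
    to every target sentence so that empty alignment is possible.
    """
    corpus = [(s, ["<NULL>"] + t) for s, t in corpus]

    # Initialize f(s|t) with a uniform distribution over source words
    src_vocab = {w for s, _ in corpus for w in s}
    f = defaultdict(lambda: 1.0 / len(src_vocab))

    for _ in range(iterations):
        count = defaultdict(float)   # expected counts c_E(s|t)
        total = defaultdict(float)   # sum over s of c_E(s|t), i.e. lambda'_t
        # E-step: collect expected counts over all K sentence pairs
        for src, tgt in corpus:
            for s_word in src:
                # normalization term: sum_i f(s_j | t_i), including NULL
                norm = sum(f[(s_word, t_word)] for t_word in tgt)
                for t_word in tgt:
                    c = f[(s_word, t_word)] / norm
                    count[(s_word, t_word)] += c
                    total[t_word] += c
        # M-step: renormalize expected counts into probabilities
        for (s_word, t_word), c in count.items():
            f[(s_word, t_word)] = c / total[t_word]
    return f

# Example:
# f = train_ibm1([(["在", "桌子", "上"], ["on", "the", "table"])])
```
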
......@@ -15,7 +15,7 @@
\end{pgfonlayer}
}
\node [anchor=west,ugreen] (P) at ([xshift=5em,yshift=-0.7em]corpus.east){P($\mathbf{t}|\mathbf{s}$)};
\node [anchor=west,ugreen] (P) at ([xshift=5em,yshift=-0.7em]corpus.east){$\funp{P}(\seq{t}|\seq{s})$};
\node [anchor=south] (modellabel) at (P.north) {{\color{ublue} {\scriptsize \sffamily\bfseries{翻译模型}}}};
\begin{pgfonlayer}{background}
......
......@@ -50,7 +50,7 @@ IBM模型由Peter F. Brown等人于上世纪九十年代初提出\upcite{DBLP:jo
\end{figure}
%----------------------------------------------
\parinterval 上面的例子反映了人在做翻译时所使用的一些知识:首先,两种语言单词的顺序可能不一致,而且译文需要符合目标语的习惯,这也就是常说的翻译的{\small\sffamily\bfseries{流畅度}}\index{流畅度}问题(Fluency)\index{Fluency};其次,源语言单词需要准确地被翻译出来,也就是常说的翻译的{\small\sffamily\bfseries{准确性}}\index{准确性}(Accuracy)\index{Accuracy}问题和{\small\sffamily\bfseries{充分性}}\index{充分性}(Adequacy)\index{Adequacy}问题。为了达到以上目的,传统观点认为翻译过程需要包含三个步骤\upcite{parsing2009speech}
\parinterval 上面的例子反映了人在做翻译时所使用的一些知识:首先,两种语言单词的顺序可能不一致,而且译文需要符合目标语的习惯,这也就是常说的翻译的{\small\sffamily\bfseries{流畅度}}\index{流畅度}问题(Fluency)\index{Fluency};其次,源语言单词需要准确地被翻译出来,也就是常说的翻译的{\small\sffamily\bfseries{准确性}}\index{准确性}(Accuracy)\index{Accuracy}问题和{\small\sffamily\bfseries{充分性}}\index{充分性}(Adequacy)\index{Adequacy}问题。为了达到以上目的,传统观点认为翻译过程需要包含三个步骤\upcite{parsing2009speech}
\begin{itemize}
\vspace{0.5em}
......@@ -273,13 +273,13 @@ $\seq{t}$ = machine\; \underline{translation}\; is\; a\; process\; of\; generati
\subsubsection{3. 如何从大量的双语平行数据中进行学习?}
\parinterval 如果有更多的句子,上面的方法同样适用。假设,有$N$个互译句对$\{(\seq{s}^{[1]},\seq{t}^{[1]})$,...,\\$(\seq{s}^{[N]},\seq{t}^{[N]})\}$。仍然可以使用基于相对频次的方法估计翻译概率$\funp{P}(x,y)$,具体方法如下:
\parinterval 如果有更多的句子,上面的方法同样适用。假设,有$K$个互译句对$\{(\seq{s}^{[1]},\seq{t}^{[1]})$,...,\\$(\seq{s}^{[K]},\seq{t}^{[K]})\}$。仍然可以使用基于相对频次的方法估计翻译概率$\funp{P}(x,y)$,具体方法如下:
\begin{eqnarray}
\funp{P}(x,y) = \frac{{\sum_{i=1}^{N} c(x,y;\seq{s}^{[i]},\seq{t}^{[i]})}}{\sum_{i=1}^{N}{{\sum_{x',y'} c(x',y';\seq{s}^{[i]},\seq{t}^{[i]})}}}
\funp{P}(x,y) = \frac{{\sum_{i=1}^{K} c(x,y;\seq{s}^{[i]},\seq{t}^{[i]})}}{\sum_{i=1}^{K}{{\sum_{x',y'} c(x',y';\seq{s}^{[i]},\seq{t}^{[i]})}}}
\label{eq:5-4}
\end{eqnarray}
\parinterval 与公式\ref{eq:5-1}相比,公式\ref{eq:5-4}的分子、分母都多了一项累加符号$\sum_{i=1}^{N} \cdot$,它表示遍历语料库中所有的句对。换句话说,当计算词的共现次数时,需要对每个句对上的计数结果进行累加。从统计学习的角度,使用更大规模的数据进行参数估计可以提高结果的可靠性。计算单词的翻译概率也是一样,在小规模的数据上看,很多翻译现象的特征并不突出,但是当使用的数据量增加到一定程度,翻译的规律会很明显的体现出来。
\parinterval 与公式\ref{eq:5-1}相比,公式\ref{eq:5-4}的分子、分母都多了一项累加符号$\sum_{i=1}^{K} \cdot$,它表示遍历语料库中所有的句对。换句话说,当计算词的共现次数时,需要对每个句对上的计数结果进行累加。从统计学习的角度,使用更大规模的数据进行参数估计可以提高结果的可靠性。计算单词的翻译概率也是一样,在小规模的数据上看,很多翻译现象的特征并不突出,但是当使用的数据量增加到一定程度,翻译的规律会很明显的体现出来。
\parinterval 举个例子,实例\ref{eg:5-2}展示了一个由两个句对构成的平行语料库。
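
As a concrete counterpart to the relative-frequency formula above (an illustration, not the book's source), the following Python sketch estimates P(x, y) over K sentence pairs. It assumes the per-pair co-occurrence count c(x, y; s, t) is taken to be (occurrences of x in s) × (occurrences of y in t), which is one common choice.

```python
from collections import defaultdict

def cooccurrence_probabilities(corpus):
    """Relative-frequency estimate of P(x, y) over K sentence pairs.

    corpus: list of (src_tokens, tgt_tokens) pairs.
    Assumes c(x, y; s, t) = (# of x in s) * (# of y in t).
    """
    counts = defaultdict(float)
    for src, tgt in corpus:
        for x in set(src):
            for y in set(tgt):
                counts[(x, y)] += src.count(x) * tgt.count(y)
    total = sum(counts.values())  # denominator: sum over all pairs and sentences
    return {pair: c / total for pair, c in counts.items()}
```
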
......@@ -633,7 +633,7 @@ g(\seq{s},\seq{t}) \equiv \prod_{j,i \in \widehat{A}}{\funp{P}(s_j,t_i)} \times
\end{figure}
%----------------------------------------------
\item 源语言单词可以翻译为空,这时它对应到一个虚拟或伪造的目标语单词$t_0$。在图\ref{fig:5-16}所示的例子中,``在''没有对应到``on the table''中的任意一个词,而是把它对应到$t_0$上。这样,所有的源语言单词都能找到一个目标语单词对应。这种设计也很好地引入了{\small\sffamily\bfseries{空对齐}}\index{空对齐}的思想,即源语言单词不对应任何真实存在的单词的情况。而这种空对齐的情况在翻译中是频繁出现的,比如虚词的翻译。
\item 源语言单词可以翻译为空,这时它对应到一个虚拟或伪造的目标语单词$t_0$。在图\ref{fig:5-16}所示的例子中,``在''没有对应到``on the table''中的任意一个词,而是把它对应到$t_0$上。这样,所有的源语言单词都能找到一个目标语单词对应。这种设计也很好地引入了{\small\sffamily\bfseries{空对齐}}\index{空对齐}(Empty Alignment\index{Empty Alignment}的思想,即源语言单词不对应任何真实存在的单词的情况。而这种空对齐的情况在翻译中是频繁出现的,比如虚词的翻译。
%----------------------------------------------
\begin{figure}[htp]
......@@ -703,7 +703,7 @@ g(\seq{s},\seq{t}) \equiv \prod_{j,i \in \widehat{A}}{\funp{P}(s_j,t_i)} \times
\subsection{基于词对齐的翻译实例}
\parinterval 用前面图\ref{fig:5-16}中例子来对公式\ref{eq:5-18}进行说明。例子中,源语言句子``在\ \ 桌子\ \ 上''目标语译文``on the table''之间的词对齐为$\seq{a}=\{\textrm{1-0, 2-3, 3-1}\}$。公式\ref{eq:5-18}的计算过程如下:
\parinterval 用前面图\ref{fig:5-16}中例子来对公式\ref{eq:5-18}进行说明。例子中,源语言句子``在\ \ 桌子\ \ 上''目标语译文``on the table''之间的词对齐为$\seq{a}=\{\textrm{1-0, 2-3, 3-1}\}$ 公式\ref{eq:5-18}的计算过程如下:
\begin{itemize}
\vspace{0.5em}
......@@ -720,11 +720,11 @@ g(\seq{s},\seq{t}) \equiv \prod_{j,i \in \widehat{A}}{\funp{P}(s_j,t_i)} \times
\funp{P}(\seq{s},\seq{a}|\seq{t})\; &= & \funp{P}(m|\seq{t}) \prod\limits_{j=1}^{m} \funp{P}(a_j|a_{1}^{j-1},s_{1}^{j-1},m,\seq{t}) \funp{P}(s_j|a_{1}^{j},s_{1}^{j-1},m,\seq{t}) \nonumber \\
&=&\funp{P}(m=3 \mid \textrm{$t_0$ on the table}){\times} \nonumber \\
&&{\funp{P}(a_1=0 \mid \phi,\phi,3,\textrm{$t_0$ on the table}){\times} } \nonumber \\
&&{\funp{P}(f_1=\textrm{} \mid \textrm{\{1-0\}},\phi,3,\textrm{$t_0$ on the table}){\times} } \nonumber \\
&&{\funp{P}(s_1=\textrm{} \mid \textrm{\{1-0\}},\phi,3,\textrm{$t_0$ on the table}){\times} } \nonumber \\
&&{\funp{P}(a_2=3 \mid \textrm{\{1-0\}},\textrm{},3,\textrm{$t_0$ on the table}) {\times}} \nonumber \\
&&{\funp{P}(f_2=\textrm{桌子} \mid \textrm{\{1-0, 2-3\}},\textrm{},3,\textrm{$t_0$ on the table}) {\times}} \nonumber \\
&&{\funp{P}(s_2=\textrm{桌子} \mid \textrm{\{1-0, 2-3\}},\textrm{},3,\textrm{$t_0$ on the table}) {\times}} \nonumber \\
&&{\funp{P}(a_3=1 \mid \textrm{\{1-0, 2-3\}},\textrm{\ \ 桌子},3,\textrm{$t_0$ on the table}) {\times}} \nonumber \\
&&{\funp{P}(f_3=\textrm{} \mid \textrm{\{1-0, 2-3, 3-1\}},\textrm{\ \ 桌子},3,\textrm{$t_0$ on the table}) }
&&{\funp{P}(s_3=\textrm{} \mid \textrm{\{1-0, 2-3, 3-1\}},\textrm{\ \ 桌子},3,\textrm{$t_0$ on the table}) }
\label{eq:5-19}
\end{eqnarray}
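
Under IBM Model 1's simplifying assumptions, the product above collapses to ε / (l + 1)^m · ∏_j f(s_j | t_{a_j}). The sketch below computes that simplified form (an illustration, not the book's code); it assumes f is a dict keyed by (source word, target word), tgt[0] is the NULL word t_0, and alignment[j] gives a_j.

```python
def ibm1_sentence_prob(src, tgt, alignment, f, epsilon=1.0):
    """P(s, a | t) under IBM Model 1's simplifying assumptions:
    epsilon / (l + 1)^m * prod_j f(s_j | t_{a_j}).
    tgt[0] is the NULL word t_0; alignment[j] is the target position a_j.
    """
    l = len(tgt) - 1          # target length, excluding NULL
    m = len(src)              # source length
    prob = epsilon / (l + 1) ** m
    for j, s_word in enumerate(src):
        prob *= f.get((s_word, tgt[alignment[j]]), 0.0)
    return prob

# Example from the figure: "在 桌子 上" / "t_0 on the table", a = {1-0, 2-3, 3-1}
# ibm1_sentence_prob(["在", "桌子", "上"],
#                    ["<NULL>", "on", "the", "table"], [0, 3, 1], f)
```
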
......@@ -1064,10 +1064,10 @@ f(s_u|t_v)=\frac{c_{\mathbb{E}}(s_u|t_v;\seq{s},\seq{t})} { \sum\limits_{s_u} c
\end{figure}
%----------------------------------------------
\parinterval 进一步,假设有$N$个互译的句对(称作平行语料):
$\{(\seq{s}^{[1]},\seq{t}^{[1]}),...,(\seq{s}^{[N]},\seq{t}^{[N]})\}$$f(s_u|t_v)$的期望频次为:
\parinterval 进一步,假设有$K$个互译的句对(称作平行语料):
$\{(\seq{s}^{[1]},\seq{t}^{[1]}),...,(\seq{s}^{[K]},\seq{t}^{[K]})\}$$f(s_u|t_v)$的期望频次为:
\begin{eqnarray}
c_{\mathbb{E}}(s_u|t_v)=\sum\limits_{i=1}^{N} c_{\mathbb{E}}(s_u|t_v;s^{[i]},t^{[i]})
c_{\mathbb{E}}(s_u|t_v)=\sum\limits_{i=1}^{K} c_{\mathbb{E}}(s_u|t_v;s^{[i]},t^{[i]})
\label{eq:5-46}
\end{eqnarray}
......
......@@ -12,16 +12,16 @@
\node [anchor=west,inner sep=2pt,minimum height=2.5em] (eq1) at (0,0) {${\textrm{P}(\tau,\pi|\mathbf{t}) = \prod_{i=1}^{l}\hspace{6.0em} \times \ \hspace{5.5em}\times}$};
\node [anchor=west,inner sep=2pt,minimum height=2.5em] (eq1) at (0,0) {${\funp{P}(\tau,\pi|\seq{t}) = \prod_{i=1}^{l}\hspace{6.0em} \times \ \hspace{5.5em}\times}$};
\node [anchor=north west,inner sep=2pt,minimum height=2.5em] (eq2) at ([xshift=-16.06em,yshift=0.0em]eq1.south east) {${\prod_{i=0}^l{\prod_{k=1}^{\varphi_i}\hspace{9.6em}} \ \ \times}$};
\node [anchor=north west,inner sep=2pt,minimum height=2.5em] (eq3) at ([xshift=-16.05em,yshift=0.0em]eq2.south east) {${\prod_{i=1}^l{\prod_{k=1}^{\varphi_i}}\hspace{11.5em} \times}$};
\node [anchor=north west,inner sep=2pt,minimum height=2.5em] (eq4) at ([xshift=-17.44em,yshift=0.0em]eq3.south east) {{${\prod_{k=1}^{\varphi_0}}$}};
\node [anchor=west,inner sep=2pt,minimum height=2.0em,fill=red!30] (part1) at ([xshift=-13.4em,yshift=0.0em]eq1.east) {{${\textrm{P}(\varphi_i|\varphi_{1}^{i-1},\mathbf{t})}$}};
\node [anchor=west,inner sep=2pt,minimum height=2.0em,fill=blue!30] (part2) at ([xshift=-6.4em,yshift=0.0em]eq1.east) {{${\textrm{P}(\varphi_0|\varphi_{1}^{l},\mathbf{t})}$}};
\node [anchor=west,inner sep=2pt,minimum height=2.0em,fill=green!30] (part3) at ([xshift=-11em,yshift=0.0em]eq2.east) {{${\textrm{P}(\tau_{ik}|\tau_{i1}^{k-1},\tau_{1}^{i-1},\varphi_{0}^{l},\mathbf{t} )}$}};
\node [anchor=west,inner sep=2pt,minimum height=2.0em,fill=yellow!30] (part4) at ([xshift=-12.5em,yshift=0.0em]eq3.east) {{${\textrm{P}(\pi_{ik}|\pi_{i1}^{k-1},\pi_{1}^{i-1},\tau_{0}^{l},\varphi_{0}^{l},\mathbf{t} )}$}};
\node [anchor=west,inner sep=2pt,minimum height=2.0em,fill=gray!30] (part5) at ([xshift=0.0em,yshift=0.0em]eq4.east) {{${\textrm{P}(\pi_{0k}|\pi_{01}^{k-1},\pi_{1}^{l},\tau_{0}^{l},\varphi_{0}^{l},\mathbf{t} )}$}};
\node [anchor=west,inner sep=2pt,minimum height=2.0em,fill=red!30] (part1) at ([xshift=-13.4em,yshift=0.0em]eq1.east) {{${\funp{P}(\varphi_i|\varphi_{1}^{i-1},\seq{t})}$}};
\node [anchor=west,inner sep=2pt,minimum height=2.0em,fill=blue!30] (part2) at ([xshift=-6.4em,yshift=0.0em]eq1.east) {{${\funp{P}(\varphi_0|\varphi_{1}^{l},\seq{t})}$}};
\node [anchor=west,inner sep=2pt,minimum height=2.0em,fill=green!30] (part3) at ([xshift=-11em,yshift=0.0em]eq2.east) {{${\funp{P}(\tau_{ik}|\tau_{i1}^{k-1},\tau_{1}^{i-1},\varphi_{0}^{l},\seq{t} )}$}};
\node [anchor=west,inner sep=2pt,minimum height=2.0em,fill=yellow!30] (part4) at ([xshift=-12.5em,yshift=0.0em]eq3.east) {{${\funp{P}(\pi_{ik}|\pi_{i1}^{k-1},\pi_{1}^{i-1},\tau_{0}^{l},\varphi_{0}^{l},\seq{t} )}$}};
\node [anchor=west,inner sep=2pt,minimum height=2.0em,fill=gray!30] (part5) at ([xshift=0.0em,yshift=0.0em]eq4.east) {{${\funp{P}(\pi_{0k}|\pi_{01}^{k-1},\pi_{1}^{l},\tau_{0}^{l},\varphi_{0}^{l},\seq{t} )}$}};
\end{tikzpicture}
......
......@@ -4,10 +4,10 @@
{
{
\node [anchor=north west] (st) at (0,0) {$\mathbf{s}$};
\node [anchor=north west] (st) at (0,0) {$\seq{s}$};
\node [anchor=north] (taut) at ([yshift=-3em]st.south) {\sffamily\bfseries{$\tau$}};
\node [anchor=north] (phit) at ([yshift=-3em]taut.south) {\sffamily\bfseries{$\phi$}};
\node [anchor=north] (tt) at ([yshift=-3em]phit.south) {$\mathbf{t}$};
\node [anchor=north] (tt) at ([yshift=-3em]phit.south) {$\seq{t}$};
}
{\scriptsize
\node [anchor=west,minimum height=2.5em,minimum width=5.0em] (sf1) at ([xshift=1em]st.east) {};
......
......@@ -14,8 +14,8 @@
\node [anchor=west] (eq2) at ([xshift=3.0em,yshift=0.0em]eq1.east) {早饭};
\node [anchor=north] (eq3) at ([xshift=0.0em,yshift=-2.0em]eq1.south) {have};
\node [anchor=north] (eq4) at ([xshift=0.0em,yshift=-2.0em]eq2.south) {breakfast};
\node [anchor=east] (eq5) at ([xshift=-1.0em,yshift=-1.8em]eq1.west) {$\mathbf{a}_{1}$};
\node [anchor=west] (eq6) at ([xshift=1.0em,yshift=-1.8em]eq2.east) {$\textrm{P}(\mathbf{s},\mathbf{a}_{1}|\mathbf{t})=0.5$};
\node [anchor=east] (eq5) at ([xshift=-1.0em,yshift=-1.8em]eq1.west) {$\seq{a}_{1}$};
\node [anchor=west] (eq6) at ([xshift=1.0em,yshift=-1.8em]eq2.east) {$\funp{P}(\seq{s},\seq{a}_{1}|\seq{t})=0.5$};
\draw [-,very thick](eq1.south) -- (eq3.north);
\draw [-,very thick](eq2.south) -- (eq4.north);
\node [anchor=west] (eq7) at ([xshift=13.1em,yshift=1.4em]eq2.east) {};
......@@ -33,8 +33,8 @@
\node [anchor=west] (eq2) at ([xshift=3.0em,yshift=0.0em]eq1.east) {早饭};
\node [anchor=north] (eq3) at ([xshift=0.0em,yshift=-2.0em]eq1.south) {have};
\node [anchor=north] (eq4) at ([xshift=0.0em,yshift=-2.0em]eq2.south) {breakfast};
\node [anchor=east] (eq5) at ([xshift=-1.0em,yshift=-1.8em]eq1.west) {$\mathbf{a}_{2}$};
\node [anchor=west] (eq6) at ([xshift=1.0em,yshift=-1.8em]eq2.east) {$\textrm{P}(\mathbf{s},\mathbf{a}_{2}|\mathbf{t})=0.1$};
\node [anchor=east] (eq5) at ([xshift=-1.0em,yshift=-1.8em]eq1.west) {$\seq{a}_{2}$};
\node [anchor=west] (eq6) at ([xshift=1.0em,yshift=-1.8em]eq2.east) {$\funp{P}(\seq{s},\seq{a}_{2}|\seq{t})=0.1$};
\draw [-,very thick](eq1.south) -- (eq4.north);
\draw [-,very thick](eq2.south) -- (eq3.north);
\end{scope}
......@@ -44,8 +44,8 @@
\node [anchor=west] (eq2) at ([xshift=3.0em,yshift=0.0em]eq1.east) {早饭};
\node [anchor=north] (eq3) at ([xshift=0.0em,yshift=-2.0em]eq1.south) {have};
\node [anchor=north] (eq4) at ([xshift=0.0em,yshift=-2.0em]eq2.south) {breakfast};
\node [anchor=east] (eq5) at ([xshift=-1.0em,yshift=-1.8em]eq1.west) {$\mathbf{a}_{3}$};
\node [anchor=west] (eq6) at ([xshift=1.0em,yshift=-1.8em]eq2.east) {$\textrm{P}(\mathbf{s},\mathbf{a}_{3}|\mathbf{t})=0.1$};
\node [anchor=east] (eq5) at ([xshift=-1.0em,yshift=-1.8em]eq1.west) {$\seq{a}_{3}$};
\node [anchor=west] (eq6) at ([xshift=1.0em,yshift=-1.8em]eq2.east) {$\funp{P}(\seq{s},\seq{a}_{3}|\seq{t})=0.1$};
\draw [-,very thick](eq1.south) -- (eq3.north);
\draw [-,very thick](eq2.south) -- (eq3.north);
\end{scope}
......@@ -55,8 +55,8 @@
\node [anchor=west] (eq2) at ([xshift=3.0em,yshift=0.0em]eq1.east) {早饭};
\node [anchor=north] (eq3) at ([xshift=0.0em,yshift=-2.0em]eq1.south) {have};
\node [anchor=north] (eq4) at ([xshift=0.0em,yshift=-2.0em]eq2.south) {breakfast};
\node [anchor=east] (eq5) at ([xshift=-1.0em,yshift=-1.8em]eq1.west) {$\mathbf{a}_{4}$};
\node [anchor=west] (eq6) at ([xshift=1.0em,yshift=-1.8em]eq2.east) {$\textrm{P}(\mathbf{s},\mathbf{a}_{4}|\mathbf{t})=0.1$};
\node [anchor=east] (eq5) at ([xshift=-1.0em,yshift=-1.8em]eq1.west) {$\seq{a}_{4}$};
\node [anchor=west] (eq6) at ([xshift=1.0em,yshift=-1.8em]eq2.east) {$\funp{P}(\seq{s},\seq{a}_{4}|\seq{t})=0.1$};
\draw [-,very thick](eq1.south) -- (eq4.north);
\draw [-,very thick](eq2.south) -- (eq4.north);
\end{scope}
......@@ -66,8 +66,8 @@
\node [anchor=west] (eq2) at ([xshift=3.0em,yshift=0.0em]eq1.east) {早饭};
\node [anchor=north] (eq3) at ([xshift=0.0em,yshift=-2.0em]eq1.south) {have};
\node [anchor=north] (eq4) at ([xshift=0.0em,yshift=-2.0em]eq2.south) {breakfast};
\node [anchor=east] (eq5) at ([xshift=-1.0em,yshift=-1.8em]eq1.west) {$\mathbf{a}_{5}$};
\node [anchor=west] (eq6) at ([xshift=1.0em,yshift=-1.8em]eq2.east) {$\textrm{P}(\mathbf{s},\mathbf{a}_{5}|\mathbf{t})=0.05$};
\node [anchor=east] (eq5) at ([xshift=-1.0em,yshift=-1.8em]eq1.west) {$\seq{a}_{5}$};
\node [anchor=west] (eq6) at ([xshift=1.0em,yshift=-1.8em]eq2.east) {$\funp{P}(\seq{s},\seq{a}_{5}|\seq{t})=0.05$};
\draw [-,very thick](eq1.south) -- (eq3.north);
\draw [-,very thick](eq1.south) -- (eq4.north);
\draw [-,very thick](eq2.south) -- (eq3.north);
......@@ -81,8 +81,8 @@
\node [anchor=west] (eq2) at ([xshift=3.0em,yshift=0.0em]eq1.east) {早饭};
\node [anchor=north] (eq3) at ([xshift=0.0em,yshift=-2.0em]eq1.south) {have};
\node [anchor=north] (eq4) at ([xshift=0.0em,yshift=-2.0em]eq2.south) {breakfast};
\node [anchor=east] (eq5) at ([xshift=-1.0em,yshift=-1.8em]eq1.west) {$\mathbf{a}_{6}$};
\node [anchor=west] (eq6) at ([xshift=1.0em,yshift=-1.8em]eq2.east) {$\textrm{P}(\mathbf{s},\mathbf{a}_{6}|\mathbf{t})=0.05$};
\node [anchor=east] (eq5) at ([xshift=-1.0em,yshift=-1.8em]eq1.west) {$\seq{a}_{6}$};
\node [anchor=west] (eq6) at ([xshift=1.0em,yshift=-1.8em]eq2.east) {$\funp{P}(\seq{s},\seq{a}_{6}|\seq{t})=0.05$};
\draw [-,very thick](eq1.south) -- (eq3.north);
\draw [-,very thick](eq2.south) -- (eq4.north);
\draw [-,very thick](eq2.south) -- (eq3.north);
......
......@@ -39,8 +39,8 @@
\draw [->,thick] (s5.south) -- (t6.north);
}
\node [anchor=east] (ss) at ([xshift=-0.5em]s1.west) {$\mathbf{s}$};
\node [anchor=east] (tt) at ([xshift=-0.5em]t1.west) {$\mathbf{t}$};
\node [anchor=east] (ss) at ([xshift=-0.5em]s1.west) {$\seq{s}$};
\node [anchor=east] (tt) at ([xshift=-0.5em]t1.west) {$\seq{t}$};
}
\end{tikzpicture}
......
......@@ -4,7 +4,7 @@
\begin{tikzpicture}
\begin{scope}[minimum height = 16pt]
\node[anchor=east] (s0) at (-0.8em, 0) {$\textbf{s}$};
\node[anchor=east] (s0) at (-0.8em, 0) {$\seq{s}$};
\node[anchor=west] (s1) at (0, 0) {桌子};
\node[anchor=west] (s2) at ([xshift=2em]s1.east) {};
\node[anchor=west] (s3) at ([xshift=2.3em]s2.east) {};
......
......@@ -8,7 +8,7 @@
[.NP
[.NP
[.DT the ]
[.NN import ]
[.NN imports ]
]
[.IN in ]
[.NP \edge[roof]; {North Korea} ]
......
......@@ -8,7 +8,7 @@
\tikzstyle{srcnode} = [anchor=south west]
\begin{scope}[scale=0.85]
\node[srcnode] (c1) at (0,0) {\normalsize{\textbf{Function} CKY-Algorithm($\textbf{s},G$)}};
\node[srcnode] (c1) at (0,0) {\normalsize{\textbf{Function} CKY-Algorithm($\seq{s},G$)}};
\node[srcnode,anchor=north west] (c21) at ([xshift=1.5em,yshift=0.4em]c1.south west) {\normalsize{\textbf{for} $j=0$ to $ J - 1$}};
\node[srcnode,anchor=north west] (c22) at ([xshift=1.5em,yshift=0.4em]c21.south west) {\normalsize{$span[j,j+1 ]$.Add($A \to a \in G$)}};
\node[srcnode,anchor=north west] (c3) at ([xshift=-1.5em,yshift=0.4em]c22.south west) {\normalsize{\textbf{for} $l$ = 1 to $J$}};
......@@ -21,7 +21,7 @@
\node[srcnode,anchor=north west] (c7) at ([yshift=0.4em]c6.south west) {\normalsize{$span[j, j+l]$.Update($hypos$)}};
\node[srcnode,anchor=north west] (c8) at ([xshift=-4.5em,yshift=0.4em]c7.south west) {\normalsize{\textbf{return} $span[0, J]$}};
\node[anchor=west] (c9) at ([xshift=-3.2em,yshift=1.7em]c1.west) {\small{\textrm{参数:}\textbf{s}为输入字符串。$G$为输入CFG。$J$为待分析字符串长度。}};
\node[anchor=west] (c9) at ([xshift=-3.2em,yshift=1.7em]c1.west) {\small{\textrm{参数:}\seq{s}为输入字符串。$G$为输入CFG。$J$为待分析字符串长度。}};
\node[anchor=west] (c10) at ([xshift=0em,yshift=1.3em]c9.west) {\small{\textrm{输出:字符串全部可能的语法分析结果}}};
\node[anchor=west] (c11) at ([xshift=0em,yshift=1.3em]c10.west) {\small{\textrm{输入:符合乔姆斯基范式的待分析字符串和一个上下文无关文法(CFG)}}};
......
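
The CKY pseudocode in this figure can be fleshed out into a compact recognizer. A minimal Python sketch follows, an illustration rather than the book's implementation; it assumes the grammar G is in Chomsky normal form and is passed as two maps, terminal → {A : A → a} and (B, C) → {A : A → B C}.

```python
def cky(words, lexical_rules, binary_rules):
    """CKY recognition for a CFG in Chomsky normal form.

    lexical_rules: dict terminal -> set of nonterminals A with A -> a.
    binary_rules:  dict (B, C)   -> set of nonterminals A with A -> B C.
    Returns the chart; the input is accepted if the start symbol
    is in chart[(0, J)].
    """
    J = len(words)
    chart = {}
    # length-1 spans: apply lexical rules
    for j in range(J):
        chart[(j, j + 1)] = set(lexical_rules.get(words[j], set()))
    # longer spans, from short to long
    for l in range(2, J + 1):            # span length
        for j in range(0, J - l + 1):    # span start
            cell = set()
            for k in range(j + 1, j + l):        # split point
                for B in chart[(j, k)]:
                    for C in chart[(k, j + l)]:
                        cell |= binary_rules.get((B, C), set())
            chart[(j, j + l)] = cell
    return chart
```
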
......@@ -64,8 +64,8 @@
\node[anchor=west](k2) at ([xshift=0.2em,yshift=-1.7em]k1.west){{[0,1]}};
\node[anchor=west](k3) at ([xshift=0em,yshift=-1.5em]k2.west){{[1,2]}};
\node[anchor=west](k4) at ([xshift=0em,yshift=-1.5em]k3.west){{[2,5]}};
\node[anchor=west](k5) at ([xshift=0em,yshift=-1.5em]k4.west){{[3,6]}};
\node[anchor=west](k4) at ([xshift=0em,yshift=-1.5em]k3.west){{[2,3]}};
\node[anchor=west](k5) at ([xshift=0em,yshift=-1.5em]k4.west){{[3,4]}};
\node[anchor=west](k6) at ([xshift=0em,yshift=-1.5em]k5.west){{[0,2]}};
\node[anchor=west](k7) at ([xshift=0em,yshift=-1.5em]k6.west){{[1,3]}};
\node[anchor=west](k8) at ([xshift=0em,yshift=-1.5em]k7.west){{[2,4]}};
......
......@@ -224,7 +224,7 @@
{
\node [anchor=center] (n6) at ([yshift=-4em]n5.center) {\scriptsize{6}};
\node [anchor=center] (k6) at ([yshift=-4em]k5.center) {\scriptsize{[{\blue 0},{\blue 2}]}};
\node [anchor=west] (t6) at ([xshift=0.2em,yshift=-4em]t5.west) {\scriptsize{none}};
\node [anchor=west] (t6) at ([xshift=0.2em,yshift=-4.2em]t5.west) {\scriptsize{none}};
\node [anchor=center,selectnode,fill=red!20] (alig22) at (cell22.center) {\tiny{}};
}
......@@ -337,7 +337,7 @@
\node [anchor=center] (sep1) at ([yshift=-1.7em]n7.center) {\scriptsize{...}};
\node [anchor=center] (n8) at ([yshift=-3.4em]n7.center) {\scriptsize{15}};
\node [anchor=center] (k8) at ([yshift=-3.4em]k7.center) {\scriptsize{[{\blue 0},{\blue 5}]}};
\node [anchor=west] (t8) at ([yshift=-3.4em]t7.west) {\tiny{S $\to$ AB}};
\node [anchor=west] (t8) at ([yshift=-3.4em]t7.west) {\scriptsize{S $\to$ AB}};
\node [anchor=center,selectnode,fill=red!20] (alig33) at (cell33.center) {\tiny{}};
\node [anchor=center,selectnode,fill=red!20] (alig42) at (cell42.center) {\tiny{}};
......
......@@ -31,7 +31,7 @@
\node [anchor=north west] (rule2) at (rule1t.south west) {NP(NNP$_1$ NN(总统) NN(乔治) NN(华盛顿))};
\node [anchor=north west] (rule2t) at ([yshift=0.2em]rule2.south west) {$\to$ NNP$_1$ President Trump};
\node [anchor=north west] (rulelabel2) at ([yshift=-0.3em]rule2t.south west) {{{\red{不能}}抽取到的规则:}};
\node [anchor=north west] (rule3) at (rulelabel2.south west) {NP(NN(乔治) NN(华盛顿)) $\to$ Trump};
\node [anchor=north west] (rule3) at (rulelabel2.south west) {NP(NN(乔治) NN(华盛顿)) $\to$ Washington};
\end{scope}
}
......
......@@ -2312,13 +2312,12 @@ year = {2012}
}
@article{shannon1949communication,
title ={Communication theory of secrecy systems},
author ={Shannon, Claude E},
author ={Claude E. Shannon},
journal ={Bell system technical journal},
volume ={28},
number ={4},
pages ={656--715},
year ={1949},
publisher ={Wiley Online Library}
year ={1949}
}
@inproceedings{DBLP:conf/acl/Moore04,
author = {Robert C. Moore},
......@@ -2352,8 +2351,8 @@ year = {2012}
}
@article{1998Grammar,
title={Grammar Inference and Statistical Machine Translation},
author={Ye-Yi Wang and Jaime Carbonell},
year={1998},
author={Ye-Yi Wang and Wayne Ward},
year={1999},
publisher={Carnegie Mellon University}
}
......@@ -3227,12 +3226,10 @@ year = {2012}
@inproceedings{2014Dynamic,
title={Dynamic Phrase Tables for Machine Translation in an Interactive Post-editing Scenario},
author={Germann, Ulrich},
author={Ulrich Germann },
publisher = {Association for Machine Translation in the Americas},
year={2014},
}
%%%%% chapter 7------------------------------------------------------
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
......@@ -3249,7 +3246,7 @@ year = {2012}
}
@article{chiang2007hierarchical,
title={Hierarchical Phrase-Based Translation},
author ={Chiang David},
author ={David Chiang},
journal ={Computational Linguistics},
volume ={33},
number ={2},
......@@ -3258,8 +3255,7 @@ year = {2012}
}
@book{cocke1969programming,
title ={Programming Languages and Their Compilers: Preliminary Notes},
author ={Cocke, J. and Schwartz, J.T.},
lccn ={76374279},
author ={Cocke, John and Schwartz, J.T.},
year ={1970},
publisher ={Courant Institute of Mathematical Sciences, New York University}
}
......@@ -3273,7 +3269,7 @@ year = {2012}
year = {1967}
}
@article{kasami1966efficient,
author ={Kasami, Tadao},
author ={Tadao Kasami},
title ={An efficient recognition and syntax-analysis algorithm for context-free languages},
journal ={Coordinated Science Laboratory Report no. R-257},
year ={1966}
......@@ -3298,14 +3294,14 @@ year = {2012}
}
@inproceedings{huang2006statistical,
title ={Statistical syntax-directed translation with extended domain of locality},
author ={Huang, Liang and Knight, Kevin and Joshi, Aravind},
author ={Liang Huang and Kevin Knight and Aravind Joshi},
pages ={66--73},
year ={2006},
publisher ={Computationally Hard Problems \& Joint Inference in Speech \& Language Processing}
}
@inproceedings{galley2004s,
title ={What’s in a translation rule?},
author ={Galley, Michel and Hopkins, Mark and Knight, Kevin and Marcu, Daniel},
  author ={Michel Galley and Mark Hopkins and Kevin Knight and Daniel Marcu},
publisher={Proceedings of the Human Language Technology Conference of the North American Chapter of the Association for Computational Linguistics},
pages ={273--280},
year ={2004}
......@@ -3655,7 +3651,10 @@ year = {2012}
}
@article{Zhai2012Treebased,
title={Treebased translation without using parse trees},
author={Zhai, Feifei and Zhang, Jiajun and Zhou, Yu and Zong, Chengqing},
author = {Feifei Zhai and
Jiajun Zhang and
Yu Zhou and
Chengqing Zong},
publisher = {International Conference on Computational Linguistics},
year={2012},
}
......@@ -3771,7 +3770,7 @@ year = {2012}
}
@inproceedings{bangalore2001computing,
title ={Computing consensus translation from multiple machine translation systems},
author ={Bangalore, B and Bordel, German and Riccardi, Giuseppe},
  author ={Srinivas Bangalore and German Bordel and Giuseppe Riccardi},
pages ={351--354},
year ={2001},
organization ={The Institute of Electrical and Electronics Engineers}
......@@ -3790,7 +3789,7 @@ year = {2012}
}
@article{xiao2013bagging,
title ={Bagging and boosting statistical machine translation systems},
author ={Xiao, Tong and Zhu, Jingbo and Liu, Tongran},
author ={Tong Xiao and Jingbo Zhu and Tongran Liu },
publisher ={Artificial Intelligence},
volume ={195},
pages ={496--527},
......