Merge branch 'caorunzhe' into 'master'

Caorunzhe: see merge request !1037

Commit 16123b24 authored Mar 03, 2021 by 曹润柘
--- a/Chapter1/Figures/figure-zh-sentences-into-en-sentences.tex
+++ b/Chapter1/Figures/figure-zh-sentences-into-en-sentences.tex
@@ -12,10 +12,11 @@
 {\footnotesize
 \node [anchor=north west] (example1) at (0,0) {\textbf{1:} 源=什么\ 时候\ 开始};
 \node [anchor=north west] (example1part2) at ([yshift=0.5em]example1.south west) {\hspace{1em} 译=\ When will it start};
-\node [anchor=north west] (example2) at ([yshift=0.1em]example1part2.south west) {\textbf{2:} 源=我\ 对\ 他\ 感到\ 高兴};
+\node [anchor=north west] (example2) at ([yshift=0.1em]example1part2.south west) {\textbf{2:} 源=我\ 对\ 他\ 感到\ 失望};
-\node [anchor=north west] (example2part2) at ([yshift=0.5em]example2.south west) {\hspace{1em} 译=\ I am happy with him};
+\node [anchor=north west] (example2part2) at ([yshift=0.5em]example2.south west) {\hspace{1em} 译=\ I am disappointed with him};
 \node [anchor=north west] (example3) at ([yshift=0.1em]example2part2.south west) {\hspace{1em} ...};
 \node [anchor=south west] (examplebaselabel) at (example1.north west) {{\color{ublue} 资源1：翻译实例库}};
+\node [anchor=north east,opacity=0] (empty) at ([yshift=-5em]example2part2.south east) {examplebaselab};
 }
 }
@@ -40,7 +41,7 @@
 \begin{pgfonlayer}{background}
 {
-\node[rectangle,draw=ublue, thick,inner sep=0mm] [fit = (entry1) (entry2) (entry3) (entry4) (dictionarylabel)] {};
+\node[rectangle,draw=ublue, thick,inner sep=0mm] [fit = (entry1) (entry2) (entry3) (entry4) (dictionarylabel) (empty)] {};
 }
 \end{pgfonlayer}
@@ -49,20 +50,20 @@
 \begin{scope}[xshift=2.3in]
 {\footnotesize
 \node [anchor=north west,inner sep=1mm] (w1) at (0,1.7em) {我};
-\node [anchor=north west,inner sep=1mm] (w2) at ([xshift=0.3em]w1.north east) {对};
+\node [anchor=north west,inner sep=1mm] (w2) at ([xshift=1.05em]w1.north east) {对};
-\node [anchor=north west,inner sep=1mm] (w3) at ([xshift=0.3em]w2.north east) {你};
+\node [anchor=north west,inner sep=1mm] (w3) at ([xshift=1.05em]w2.north east) {你};
-\node [anchor=north west,inner sep=1mm] (w4) at ([xshift=0.3em]w3.north east) {感到};
+\node [anchor=north west,inner sep=1mm] (w4) at ([xshift=1.05em]w3.north east) {感到};
-\node [anchor=north west,inner sep=1mm] (w5) at ([xshift=0.3em]w4.north east) {满意};
+\node [anchor=north west,inner sep=1mm] (w5) at ([xshift=1.05em]w4.north east) {满意};
 }
 \end{scope}
 \begin{scope}[xshift=2.3in,yshift=-0.2in]
 {\footnotesize
 \node [anchor=north west,inner sep=1mm] (c1) at (0,0) {我};
-\node [anchor=north west,inner sep=1mm] (c2) at ([xshift=0.3em]c1.north east) {对};
+\node [anchor=north west,inner sep=1mm] (c2) at ([xshift=1.05em]c1.north east) {对};
-\node [anchor=north west,inner sep=1mm] (c3) at ([xshift=0.3em]c2.north east) {他};
+\node [anchor=north west,inner sep=1mm] (c3) at ([xshift=1.05em]c2.north east) {他};
-\node [anchor=north west,inner sep=1mm] (c4) at ([xshift=0.3em]c3.north east) {感到};
+\node [anchor=north west,inner sep=1mm] (c4) at ([xshift=1.05em]c3.north east) {感到};
-\node [anchor=north west,inner sep=1mm] (c5) at ([xshift=0.3em]c4.north east) {高兴};
+\node [anchor=north west,inner sep=1mm] (c5) at ([xshift=1.05em]c4.north east) {失望};
 }
 \end{scope}
@@ -70,7 +71,7 @@
 {\footnotesize
 \node [anchor=west,inner sep=1mm] (e1) at (0,0) {I};
 \node [anchor=west,inner sep=1mm] (e2) at ([xshift=0.3em]e1.east) {am};
-\node [anchor=west,inner sep=1mm] (e3) at ([xshift=0.3em]e2.east) {happy};
+\node [anchor=west,inner sep=1mm] (e3) at ([xshift=0.3em]e2.east) {disappointed};
 \node [anchor=west,inner sep=1mm] (e4) at ([xshift=0.3em]e3.east) {with};
 \node [anchor=west,inner sep=1mm] (e5) at ([xshift=0.3em]e4.east) {him};
 }
@@ -94,16 +95,16 @@
 {
 \draw[double,->,thick,ublue] (e3.south)--([yshift=-1.2em]e3.south) node[pos=0.5,right,xshift=0.2em,yshift=0.2em] (step1) {\color{red}{\tiny{用“你”替换“他”}}};
-\draw[->,dotted,thick,red] ([xshift=-0.1em]entry2.east)..controls +(east:4) and +(west:4)..([yshift=-0.6em,xshift=-0.5em]e3.south) ;
+\draw[->,dotted,thick,red] ([xshift=0.2em]entry2.east)..controls +(east:4) and +(west:4)..([yshift=-0.6em,xshift=-0.5em]e3.south) ;
 }
 \begin{scope}[xshift=2.3in,yshift=-0.9in]
 {\footnotesize
 \node [anchor=north west,inner sep=1mm] (c1) at (0,0) {我};
-\node [anchor=north west,inner sep=1mm] (c2) at ([xshift=0.3em]c1.north east) {对};
+\node [anchor=north west,inner sep=1mm] (c2) at ([xshift=1.05em]c1.north east) {对};
-\node [anchor=north west,inner sep=1mm] (c3) at ([xshift=0.3em]c2.north east) {\footnotesize{{\color{ublue} 你}}};
+\node [anchor=north west,inner sep=1mm] (c3) at ([xshift=1.05em]c2.north east) {\footnotesize{{\color{ublue} 你}}};
-\node [anchor=north west,inner sep=1mm] (c4) at ([xshift=0.3em]c3.north east) {感到};
+\node [anchor=north west,inner sep=1mm] (c4) at ([xshift=1.05em]c3.north east) {感到};
-\node [anchor=north west,inner sep=1mm] (c5) at ([xshift=0.3em]c4.north east) {高兴};
+\node [anchor=north west,inner sep=1mm] (c5) at ([xshift=1.05em]c4.north east) {失望};
 }
 \end{scope}
@@ -111,7 +112,7 @@
 {\footnotesize
 \node [anchor=west,inner sep=1mm] (e1) at (0,0) {I};
 \node [anchor=west,inner sep=1mm] (e2) at ([xshift=0.3em]e1.east) {am};
-\node [anchor=west,inner sep=1mm] (e3) at ([xshift=0.3em]e2.east) {happy};
+\node [anchor=west,inner sep=1mm] (e3) at ([xshift=0.3em]e2.east) {disappointed};
 \node [anchor=west,inner sep=1mm] (e4) at ([xshift=0.3em]e3.east) {with};
 \node [anchor=west,inner sep=1mm] (e5) at ([xshift=0.3em,yshift=-0.2em]e4.east) {\textbf{{\color{ublue} you}}};
 }
@@ -122,28 +123,28 @@
 }
 {
-\draw[double,->,thick,ublue] (e3.south)--([yshift=-1.2em]e3.south) node[pos=0.5,right,xshift=0.2em,yshift=0.2em] (step1) {\color{red}{\tiny{用“满意”替换“高兴”}}};
+\draw[double,->,thick,ublue] (e3.south)--([yshift=-1.2em]e3.south) node[pos=0.5,right,xshift=0.2em,yshift=0.2em] (step1) {\color{red}{\tiny{用“满意”替换“失望”}}};
-\draw[->,dotted,thick,red] ([xshift=-1.2em,yshift=-0.6em]entry3.north east)..controls +(east:2) and +(west:3)..([yshift=-0.6em,xshift=-0.5em]e3.south) ;
+\draw[->,dotted,thick,red] ([xshift=0.2em,yshift=-0em]entry3.east)..controls +(east:2) and +(west:3)..([yshift=-0.6em,xshift=-0.5em]e3.south) ;
 }
 \begin{scope}[xshift=2.3in,yshift=-1.6in]
 {\footnotesize
 \node [anchor=north west,inner sep=1mm] (c1) at (0,0) {我};
-\node [anchor=north west,inner sep=1mm] (c2) at ([xshift=0.3em]c1.north east) {对};
+\node [anchor=north west,inner sep=1mm] (c2) at ([xshift=1.05em]c1.north east) {对};
-\node [anchor=north west,inner sep=1mm] (c3) at ([xshift=0.3em]c2.north east) {你};
+\node [anchor=north west,inner sep=1mm] (c3) at ([xshift=1.05em]c2.north east) {你};
-\node [anchor=north west,inner sep=1mm] (c4) at ([xshift=0.3em]c3.north east) {感到};
+\node [anchor=north west,inner sep=1mm] (c4) at ([xshift=1.05em]c3.north east) {感到};
-\node [anchor=north west,inner sep=1mm] (c5) at ([xshift=0.3em]c4.north east) {\footnotesize{{\color{ublue} 满意}}};
+\node [anchor=north west,inner sep=1mm] (c5) at ([xshift=1.05em]c4.north east) {\footnotesize{{\color{ublue} 满意}}};
 }
 \end{scope}
 \begin{scope}[xshift=2.3in,yshift=-2.0in]
 {\footnotesize
 \node [anchor=west,inner sep=1mm] (e1) at (0,0) {I};
-\node [anchor=west,inner sep=1mm] (e2) at ([xshift=0.3em]e1.east) {am};
+\node [anchor=west,inner sep=1mm] (e2) at ([xshift=0.7em]e1.east) {am};
-\node [anchor=west,inner sep=1mm] (e3) at ([xshift=0.3em]e2.east) {\textbf{{\color{ublue} satisfied}}};
+\node [anchor=west,inner sep=1mm] (e3) at ([xshift=0.7em]e2.east) {\textbf{{\color{ublue} satisfied}}};
-\node [anchor=west,inner sep=1mm] (e4) at ([xshift=0.3em]e3.east) {with};
+\node [anchor=west,inner sep=1mm] (e4) at ([xshift=0.7em]e3.east) {with};
-\node [anchor=west,inner sep=1mm] (e5) at ([xshift=0.3em,yshift=-0.2em]e4.east) {you};
+\node [anchor=west,inner sep=1mm] (e5) at ([xshift=0.7em,yshift=-0.2em]e4.east) {you};
 }
 \end{scope}

--- a/Chapter1/chapter1.tex
+++ b/Chapter1/chapter1.tex
--- a/Chapter15/Figures/figure-relative-position-coding-and-absolute-position-coding.tex
+++ b/Chapter15/Figures/figure-relative-position-coding-and-absolute-position-coding.tex
@@ -133,9 +133,9 @@
 \draw[->,standard] ([yshift=-0.3em]sa2.south) -- ([xshift=-4em,yshift=-0.3em]sa2.south) -- ([xshift=-4em,yshift=2em]sa2.south) -- ([xshift=-3.5em,yshift=2em]sa2.south);
 \draw[->,standard] ([yshift=0.2em]res3.north) -- ([xshift=-4em,yshift=0.2em]res3.north) -- ([xshift=-4em,yshift=2.5em]res3.north) -- ([xshift=-3.5em,yshift=2.5em]res3.north);
-\draw[->,standard] ([xshift=0em]wi.east) -- ([xshift=3.2em,yshift=0em]wi.east) -- ([xshift=-0em,yshift=0em]pos2.south);
+\draw[->,standard] ([xshift=0em]wi.east) -- ([xshift=3.25em,yshift=0em]wi.east) -- ([xshift=-0em,yshift=0em]pos2.south);
-\draw[->,standard] ([xshift=0em]wi.east) -- ([xshift=6.7em,yshift=0em]wi.east) -- ([xshift=-0em,yshift=0em]pos3.south);
+\draw[->,standard] ([xshift=0em]wi.east) -- ([xshift=6.78em,yshift=0em]wi.east) -- ([xshift=-0em,yshift=0em]pos3.south);
-\draw[->,standard] ([xshift=0em]wi.east) -- ([xshift=10.2em,yshift=0em]wi.east) -- ([xshift=-0em,yshift=0em]pos4.south);
+\draw[->,standard] ([xshift=0em]wi.east) -- ([xshift=10.3em,yshift=0em]wi.east) -- ([xshift=-0em,yshift=0em]pos4.south);
 \draw[->,standard] ([xshift=0em]pos2.north) -- ([xshift=0em,yshift=2.1em]pos2.north) -- ([xshift=-0em,yshift=0em]sa1.east);
 \draw[->,standard] ([xshift=0em]pos3.north) -- ([xshift=0em,yshift=9.6em]pos3.north) -- ([xshift=-0em,yshift=0em]dot1.east);
 \draw[->,standard] ([xshift=0em]pos4.north) -- ([xshift=0em,yshift=12.3em]pos4.north) -- ([xshift=-0em,yshift=0em]sa2.east);

--- a/Chapter15/chapter15.tex
+++ b/Chapter15/chapter15.tex
@@ -688,9 +688,9 @@ v_i &=& \mathbi{I}_d^{\textrm{T}}\textrm{Tanh}(\mathbi{W}_d\mathbi{Q}_i)
 \vspace{0.5em}
 \item 类似于标准的Transformer初始化方式，使用Xavier初始化方式来初始化除了词嵌入以外的所有参数矩阵。词嵌入矩阵服从$\mathbb{N}(0,d^{-\frac{1}{2}})$的高斯分布，其中$d$代表词嵌入的维度。
 \vspace{0.5em}
-\item 对编码器中自注意力机制的参数矩阵以及前馈神经网络中所有参数矩阵进行缩放因子为$0.67 {L}^{-\frac{1}{4}}$的缩放，$L$为编码器层数。
+\item 对编码器中部分自注意力机制的参数矩阵以及前馈神经网络的参数矩阵进行缩放因子为$0.67 {L}^{-\frac{1}{4}}$的缩放，$L$为编码器层数。
 \vspace{0.5em}
-\item 对解码器中全部注意力机制的参数矩阵以及前馈神经网络中所有参数矩阵进行缩放因子为$(9 {M})^{-\frac{1}{4}}$的缩放，其中$M$为解码器层数。
+\item 对解码器中部分注意力机制的参数矩阵、前馈神经网络的参数矩阵以及前馈前馈神经网络的嵌入式输入进行缩放因子为$(9 {M})^{-\frac{1}{4}}$的缩放，其中$M$为解码器层数。
 \vspace{0.5em}
 \end{itemize}
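
Review note on the Chapter15 hunk above: the two revised items give layer-dependent scaling factors, $0.67 {L}^{-\frac{1}{4}}$ for the encoder and $(9 {M})^{-\frac{1}{4}}$ for the decoder. The sketch below only evaluates those two expressions to show how the factors shrink as the layer counts grow. The function names are hypothetical, and which parameter matrices the factors apply to is not decided here; the hunk itself says "part of" the attention matrices.

```python
def encoder_scale(num_encoder_layers: int) -> float:
    """Scaling factor 0.67 * L^(-1/4), with L the number of encoder layers."""
    return 0.67 * num_encoder_layers ** -0.25


def decoder_scale(num_decoder_layers: int) -> float:
    """Scaling factor (9 * M)^(-1/4), with M the number of decoder layers."""
    return (9 * num_decoder_layers) ** -0.25


if __name__ == "__main__":
    # Hypothetical layer counts, chosen only to illustrate the trend.
    for layers in (6, 12, 24, 48):
        print(layers, round(encoder_scale(layers), 4), round(decoder_scale(layers), 4))
```

Both factors decrease monotonically with depth, so deeper stacks start from smaller initial weights in the scaled matrices.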

--- a/Chapter2/chapter2.tex
+++ b/Chapter2/chapter2.tex
@@ -579,7 +579,7 @@ F(x)=\int_{-\infty}^x f(x)\textrm{d}x
 \label{eq:2-27}
 \end{eqnarray}
-\noindent 其中，$V$表示词表，$|V|$为词表中单词的个数，$w$为词表中的一个词，c表示统计单词或短语出现的次数。有时候，加法平滑方法会将$\theta$取1，这时称之为加一平滑或是拉普拉斯平滑。这种方法比较容易理解，也比较简单，因此也往往被用于对系统的快速原型中。
+\noindent 其中，$V$表示词表，$|V|$为词表中单词的个数，$w$为词表中的一个词，c表示统计单词或短语出现的次数。有时候，加法平滑方法会将$\theta$取1，这时称之为加一平滑或是拉普拉斯平滑。这种方法比较容易理解，也比较简单，因此常被用于对系统的快速实现上。
 \parinterval 举一个例子。假设在一个英语文档中随机采样一些单词（词表大小$|V|=20$），各个单词出现的次数为：“look”出现4次，“people”出现3次，“am”出现2次，“what”出现1次，“want”出现1次，“do”出现1次。图\ref{fig:2-11} 给出了在平滑之前和平滑之后的概率分布。
@@ -803,7 +803,7 @@ c(\cdot) & \textrm{当计算最高阶模型时}  \\
 \parinterval 从词序列建模的角度看，这两类预测问题本质上是一样的。因为，它们都在使用语言模型对词序列进行概率评估。但是，从实现上看，词序列的生成问题更难。因为，它不仅要对所有可能的词序列进行打分，同时要“找到”最好的词序列。由于潜在的词序列不计其数，因此这个“找”最优词序列的过程并不简单。
-\parinterval 实际上，生成最优词序列的问题也是自然语言处理中的一大类问题\ \dash\ {\small\bfnew{序列生成}}\index{序列生成}（Sequence Generation）\index{Sequence Generation}。机器翻译就是一个非常典型的序列生成问题：在机器翻译任务中，需要根据源语言词序列生成与之相对应的目标语言词序列。但是语言模型本身并不能“制造”单词序列的。因此，严格地说，序列生成问题的本质并非让语言模型凭空“生成”序列，而是使用语言模型在所有候选的单词序列中“找出”最佳序列。这个过程对应着经典的{\small\bfnew{搜索问题}}\index{搜索问题}（Search Problem）\index{Search Problem}。下面将着重介绍序列生成背后的建模方法，以及在序列生成里常用的搜索技术。
+\parinterval 实际上，生成最优词序列的问题也是自然语言处理中的一大类问题\ \dash\ {\small\bfnew{序列生成}}\index{序列生成}（Sequence Generation）\index{Sequence Generation}。机器翻译就是一个非常典型的序列生成任务：在机器翻译任务中，需要根据源语言词序列生成与之相对应的目标语言词序列。但是语言模型本身并不能“制造”单词序列的。因此，严格地说，序列生成任务的本质并非让语言模型凭空“生成”序列，而是使用语言模型在所有候选的单词序列中“找出”最佳序列。这个过程对应着经典的{\small\bfnew{搜索问题}}\index{搜索问题}（Search Problem）\index{Search Problem}。下面将着重介绍序列生成任务背后的建模方法，以及在序列生成任务里常用的搜索技术。
 %----------------------------------------------------------------------------------------
 %    NEW SUB-SECTION
@@ -811,7 +811,7 @@ c(\cdot) & \textrm{当计算最高阶模型时}  \\
 \subsection{搜索问题的建模}
-\parinterval 基于语言模型的序列生成问题可以被定义为：在无数任意排列的单词序列中找到概率最高的序列。这里单词序列$w = w_1 w_2 \ldots w_m$的语言模型得分$\funp{P}(w)$度量了这个序列的合理性和流畅性。在序列生成任务中，基于语言模型的搜索问题可以被描述为：
+\parinterval 基于语言模型的序列生成任务可以被定义为：在无数任意排列的单词序列中找到概率最高的序列。这里单词序列$w = w_1 w_2 \ldots w_m$的语言模型得分$\funp{P}(w)$度量了这个序列的合理性和流畅性。在序列生成任务中，基于语言模型的搜索问题可以被描述为：
 \begin{eqnarray}
 \hat{w} = \argmax_{w \in \chi}\funp{P}(w)
 \label{eq:2-42}
@@ -832,7 +832,7 @@ c(\cdot) & \textrm{当计算最高阶模型时}  \\
 \end{figure}
 %-------------------------------------------
-\parinterval 在这种序列生成方式的基础上，实现搜索通常有两种方法\ \dash\ 深度优先遍历和宽度优先遍历\upcite{DBLP:books/mg/CormenLR89}。在深度优先遍历中，每次从词表中可重复地选择一个单词，然后从左至右地生成序列，直到<eos>被选择，此时一个完整的单词序列被生成出来。然后从<eos>回退到上一个单词，选择之前词表中未被选择到的候选单词代替<eos>，并继续挑选下一个单词直到<eos>被选到，如果上一个单词的所有可能都被枚举过，那么回退到上上一个单词继续枚举，直到回退到<sos>，这时候枚举结束。在宽度优先遍历中，每次不是只选择一个单词，而是枚举所有单词。
+\parinterval 在这种序列生成策略的基础上，实现搜索通常有两种方法\ \dash\ 深度优先遍历和宽度优先遍历\upcite{DBLP:books/mg/CormenLR89}。在深度优先遍历中，每次从词表中选择一个单词（可重复），然后从左至右地生成序列，直到<eos>被选择，此时一个完整的单词序列被生成出来。然后从<eos>回退到上一个单词，选择之前词表中未被选择到的候选单词代替<eos>，并继续挑选下一个单词直到<eos>被选到，如果上一个单词的所有可能都被枚举过，那么回退到上上一个单词继续枚举，直到回退到<sos>，这时候枚举结束。在宽度优先遍历中，每次不是只选择一个单词，而是枚举所有单词。
 \parinterval 有一个简单的例子。假设词表只含两个单词$\{a, b\}$，从<sos>开始枚举所有候选，有三种可能：
 \begin{eqnarray}
@@ -916,7 +916,7 @@ c(\cdot) & \textrm{当计算最高阶模型时}  \\
 \end{figure}
 %-------------------------------------------
-\parinterval 这样，语言模型的打分与解空间树的遍历就融合在一起了。于是，序列生成的问题可以被重新描述为：寻找所有单词序列组成的解空间树中权重总和最大的一条路径。在这个定义下，前面提到的两种枚举词序列的方法就是经典的{\small\bfnew{深度优先搜索}}\index{深度优先搜索}（Depth-first Search）\index{Depth-first Search}和{\small\bfnew{宽度优先搜索}}\index{宽度优先搜索}（Breadth-first Search）\index{Breadth-first Search}的雏形\upcite{even2011graph,tarjan1972depth}。在后面的内容中，从遍历解空间树的角度出发，可以对这些原始的搜索策略的效率进行优化。
+\parinterval 这样，语言模型的打分与解空间树的遍历就融合在一起了。于是，序列生成任务可以被重新描述为：寻找所有单词序列组成的解空间树中权重总和最大的一条路径。在这个定义下，前面提到的两种枚举词序列的方法就是经典的{\small\bfnew{深度优先搜索}}\index{深度优先搜索}（Depth-first Search）\index{Depth-first Search}和{\small\bfnew{宽度优先搜索}}\index{宽度优先搜索}（Breadth-first Search）\index{Breadth-first Search}的雏形\upcite{even2011graph,tarjan1972depth}。在后面的内容中，从遍历解空间树的角度出发，可以对这些原始的搜索策略的效率进行优化。
 %----------------------------------------------------------------------------------------
 %    NEW SUB-SECTION
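
Review note on the Chapter2 hunk at line 579: the context paragraph there gives a concrete sample ($|V|=20$; "look" 4, "people" 3, "am" 2, "what" 1, "want" 1, "do" 1) for add-one smoothing. The sketch below works that example through, assuming the unigram form of additive smoothing, $(c(w)+\theta)/(\sum_{w'} c(w') + \theta|V|)$. Equation 2-27 is not visible in this diff, so that form is an assumption, and the function name is hypothetical.

```python
from fractions import Fraction


def additive_smoothing(counts, vocab_size, theta=1):
    """Additive smoothing over a unigram sample (assumed form, see note above).

    Returns (probabilities of the seen words, probability of one unseen word).
    """
    theta = Fraction(theta)
    total = sum(counts.values())
    denom = total + theta * vocab_size
    seen = {word: (count + theta) / denom for word, count in counts.items()}
    unseen = theta / denom
    return seen, unseen


# Counts taken from the example paragraph in the hunk; theta = 1 is add-one smoothing.
counts = {"look": 4, "people": 3, "am": 2, "what": 1, "want": 1, "do": 1}
seen, unseen = additive_smoothing(counts, vocab_size=20)

if __name__ == "__main__":
    for word, prob in seen.items():
        print(word, prob)
    print("each unseen word:", unseen)
```

With 12 observed tokens and 20 word types the denominator is 32, so "look" moves from 4/12 to 5/32 and each of the 14 unseen words receives 1/32. The distribution still sums to 1.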

--- a/Chapter5/chapter5.tex
+++ b/Chapter5/chapter5.tex
@@ -162,7 +162,7 @@ IBM模型由Peter F. Brown等人于上世纪九十年代初提出\upcite{DBLP:jo
 \subsection{统计机器翻译的基本框架}
-\parinterval 为了对统计机器翻译有一个直观的认识，下面将介绍如何构建一个非常简单的统计机器翻译系统，其中涉及到的很多思想来自IBM模型。这里，仍然使用数据驱动的统计建模方法。图\ref{fig:5-5}展示了系统的主要流程，包括两个步骤：
+\parinterval 为了对统计机器翻译有一个直观的认识，下面将介绍如何构建一个非常简单的统计机器翻译系统，其中涉及到的很多思想来自IBM模型。这里，仍然使用数据驱动的统计建模方法。图\ref{fig:5-5}展示了统计机器翻译的主要流程，包括两个步骤：
 \begin{itemize}
 \vspace{0.5em}

--- a/Chapter8/chapter8.tex
+++ b/Chapter8/chapter8.tex
@@ -44,7 +44,7 @@
 \end{figure}
 %-------------------------------------------
-\parinterval 当然，可以使用平滑算法对长短语的概率进行估计，但是使用过长的短语在实际系统研发中仍然不现实。图\ref{fig:8-1}展示了一个汉语到英语的翻译实例。源语言的两个短语（蓝色和红色高亮）在目标语言中产生了调序。但是，这两个短语在源语言句子中横跨11个单词。如果直接使用这11个单词构成的短语进行翻译，显然会有非常严重的数据稀疏问题，因为很难期望在训练数据中见到一模一样的短语。
+\parinterval 当然，可以使用平滑算法对长短语的概率进行估计，但是使用过长的短语在实际系统研发中仍然不现实。图\ref{fig:8-1}展示了一个汉语到英语的翻译实例。源语言的两个短语（蓝色和红色高亮）在目标语言中产生了调序。但是，这两个短语在源语言句子中横跨8个单词。如果直接使用这8个单词构成的短语进行翻译，显然会有非常严重的数据稀疏问题，因为很难期望在训练数据中见到一模一样的短语。
 \parinterval 仅使用连续词串不能处理所有的翻译问题，其根本原因在于句子的表层串很难描述片段之间大范围的依赖。一个新的思路是使用句子的层次结构信息进行建模。{\chapterthree}已经介绍了句法分析基础。对于每个句子，都可以用句法树描述它的结构。