Merge branch 'caorunzhe' into 'master'

Caorunzhe See merge request !779

Commit 98b7832d authored Jan 05, 2021 by 曹润柘
--- a/Chapter16/Figures/figure-mass.tex
+++ b/Chapter16/Figures/figure-mass.tex
@@ -7,12 +7,12 @@
 \node [anchor=center,model,fill=blue!20] (decoder) at ([xshift=7.5em]ate.east) {\small{解码器}};
 \node [anchor=north,word] (w1) at ([yshift=-1.5em,xshift=0em]decoder.south) {\small{$x_3$}};
 \node [anchor=west,word] (w2) at ([xshift=0em]w1.east) {\small{$x_4$}};
-\node [anchor=west,word] (w3) at ([xshift=0em]w2.east) {[M]};
+\node [anchor=west,word] (w3) at ([xshift=0em]w2.east) {<M>};
-\node [anchor=east,word] (w4) at ([xshift=0em]w1.west) {[M]};
+\node [anchor=east,word] (w4) at ([xshift=0em]w1.west) {<M>};
-\node [anchor=east,word] (w5) at ([xshift=0em]w4.west) {[M]};
+\node [anchor=east,word] (w5) at ([xshift=0em]w4.west) {<M>};
-\node [anchor=east,word] (w6) at ([xshift=0em]w5.west) {[M]};
+\node [anchor=east,word] (w6) at ([xshift=0em]w5.west) {<M>};
-\node [anchor=west,word] (w7) at ([xshift=0em]w3.east) {[M]};
+\node [anchor=west,word] (w7) at ([xshift=0em]w3.east) {<M>};
 \node [anchor=south,word] (w8) at ([yshift=1.5em,xshift=0em]decoder.north) {\small{$x_4$}};
 \node [anchor=east,word] (w9) at (w8.west) {\small{$x_3$}};
@@ -33,11 +33,11 @@
 %encoder
 \node [model] (encoder) at ([xshift=-7.5em]ate.west) {\small{编码器}};
-\node [anchor=north,word] (we1) at ([yshift=-1.5em,xshift=0em]encoder.south) {[M]};
+\node [anchor=north,word] (we1) at ([yshift=-1.5em,xshift=0em]encoder.south) {<M>};
-\node [anchor=west,word] (we2) at ([xshift=0em]we1.east) {[M]};
+\node [anchor=west,word] (we2) at ([xshift=0em]we1.east) {<M>};
 \node [anchor=west,word] (we3) at ([xshift=0em]we2.east) {\small{$x_6$}};
-\node [anchor=east,word] (we4) at ([xshift=0em]we1.west) {[M]};
+\node [anchor=east,word] (we4) at ([xshift=0em]we1.west) {<M>};
 \node [anchor=east,word] (we5) at ([xshift=0em]we4.west) {\small{$x_2$}};
 \node [anchor=east,word] (we6) at ([xshift=0em]we5.west) {\small{$x_1$}};
 \node [anchor=west,word] (we7) at ([xshift=0em]we3.east) {\small{$x_7$}};
@@ -51,5 +51,5 @@
 \draw [->,thick] (we7.north) -- ([yshift=1.35em]we7.north);
 \draw [->,very thick] ([xshift=0.5em]encoder)--([xshift=-0.3em]decoder);
-\node [anchor=south] (ex) at ([xshift=-4.0em,yshift=1.0em]encoder.north) {\small{[M]：Mask}};
+\node [anchor=south] (ex) at ([xshift=-4.0em,yshift=1.0em]encoder.north) {\small{<M>：<Mask>}};
 \end{tikzpicture}
\ No newline at end of file
--- a/Chapter16/Figures/figure-multi-language-single-model-system-diagram.tex
+++ b/Chapter16/Figures/figure-multi-language-single-model-system-diagram.tex
@@ -3,13 +3,13 @@
 %-------------------------------------------------------------------------
 \begin{tikzpicture}
 \tikzstyle{lan}=[font=\footnotesize,inner ysep=2pt,minimum height=1em]
-\node[minimum height=3em,minimum width=8em,fill=orange!20,draw,rounded corners=2pt,align=center,line width=0.6pt] (sys) at (0,0){多语言 \\ 单模型系统};
+\node[minimum height=4em,minimum width=8em,fill=orange!20,draw,rounded corners=2pt,align=center,line width=0.6pt,font=\small] (sys) at (0,0){多语言 \\ 单模型系统};
-\node[draw,font=\footnotesize,minimum width=4em,fill=red!20,rounded corners=1pt,line width=0.6pt] (en) at (-3em,4em){英语};
+\node[draw,font=\footnotesize,minimum width=4em,fill=red!20,rounded corners=1pt,line width=0.6pt] (en) at (-3em,5em){英语};
-\node[draw,font=\footnotesize,minimum width=4em,fill=red!20,rounded corners=1pt,line width=0.6pt] (fr) at (3em,4em){法语};
+\node[draw,font=\footnotesize,minimum width=4em,fill=red!20,rounded corners=1pt,line width=0.6pt] (fr) at (3em,5em){法语};
-\node[minimum width=4em]  at (6.6em,4em){$\dots$};
+\node[minimum width=4em]  at (6.6em,5em){$\dots$};
-\node[draw,font=\footnotesize,minimum width=4em,fill=blue!20,rounded corners=1pt,line width=0.6pt] (de) at (-3em,-4em){德语};
+\node[draw,font=\footnotesize,minimum width=4em,fill=blue!20,rounded corners=1pt,line width=0.6pt] (de) at (-3em,-5em){德语};
-\node[draw,font=\footnotesize,minimum width=4em,fill=blue!20,rounded corners=1pt,line width=0.6pt] (sp) at (3em,-4em){西班牙语};
+\node[draw,font=\footnotesize,minimum width=4em,fill=blue!20,rounded corners=1pt,line width=0.6pt] (sp) at (3em,-5em){西班牙语};
-\node[minimum width=4em]  at (6.6em,-4em){$\dots$};
+\node[minimum width=4em]  at (6.6em,-5em){$\dots$};
 \draw[->,thick] (en.-90) -- ([xshift=-1em]sys.90);
 \draw[->,thick] (fr.-90) -- ([xshift=1em]sys.90);

--- a/Chapter16/Figures/figure-parameter-initialization-method-diagram.tex
+++ b/Chapter16/Figures/figure-parameter-initialization-method-diagram.tex
@@ -12,10 +12,10 @@
 \node[node,anchor=west,minimum width=6em,minimum height=2.4em,fill=blue!20,line width=0.6pt] (decoder2) at ([xshift=4em,yshift=0em]decoder1.east){\small 解码器};
 \node[node,anchor=west,minimum width=6em,minimum height=2.4em,fill=blue!30,line width=0.6pt] (decoder3) at ([xshift=3em]decoder2.east){\small 解码器};
-\node[anchor=north,font=\scriptsize,fill=yellow!20] (w1) at ([yshift=-1.6em]decoder1.south){知识 \ 就是 \ 力量 \ 。 \ <eos>};
+\node[anchor=north,font=\scriptsize,fill=yellow!20,drop shadow,draw] (w1) at ([yshift=-1.6em]decoder1.south){知识 \ 就是 \ 力量 \ 。 \ <eos>};
-\node[anchor=north,font=\scriptsize,fill=green!20] (w3) at ([yshift=-1.6em]decoder3.south){Wissen  \ ist \ Machit \ . \ <eos>};
+\node[anchor=north,font=\scriptsize,fill=green!20,drop shadow,draw] (w3) at ([yshift=-1.6em]decoder3.south){El conocimiento es poder . <eos>};
-\node[anchor=south,font=\scriptsize,fill=orange!20] (w2) at ([yshift=1.6em]encoder1.north){Knowledge \ is \ power \ . };
+\node[anchor=south,font=\scriptsize,fill=orange!20,drop shadow,draw] (w2) at ([yshift=1.6em]encoder1.north){Knowledge \ is \ power \ . };
-\node[anchor=south,font=\scriptsize,fill=orange!20] (w4) at ([yshift=1.6em]encoder3.north){Knowledge \ is \ power \ . };
+\node[anchor=south,font=\scriptsize,fill=orange!20,drop shadow,draw] (w4) at ([yshift=1.6em]encoder3.north){Knowledge \ is \ power \ . };
 \draw[->,thick] (decoder1.-90) -- (w1.north);
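
The added lines in this hunk use the `drop shadow` option. In TikZ that key comes from the `shadows` library, so the preamble that compiles this figure is assumed to load it (the preamble is not part of this diff). A minimal standalone sketch, where the document class and node text are illustrative rather than taken from the repository:

```latex
\documentclass[tikz]{standalone}
\usetikzlibrary{shadows} % provides the `drop shadow` key
\begin{document}
\begin{tikzpicture}
  % Same option set as the changed nodes: filled box, border, and shadow.
  \node[font=\scriptsize,fill=yellow!20,drop shadow,draw] (w1) at (0,0)
    {Knowledge \ is \ power \ .};
\end{tikzpicture}
\end{document}
```

Without the library, TikZ is expected to stop with an unknown-key error for `drop shadow`, so this is worth checking if the figure fails to build after the merge.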

--- a/Chapter16/chapter16.tex
+++ b/Chapter16/chapter16.tex
--- a/ChapterAppend/chapterappend.tex
+++ b/ChapterAppend/chapterappend.tex
@@ -245,16 +245,16 @@ c(i|j,m,l;\mathbf{s},\mathbf{t}) &=&\frac{f(s_j|t_i)a(i|j,m,l)}   {\sum_{k=0}^{l
 \parinterval M-Step的计算公式如下，其中参数$a(i|j,m,l)$表示调序概率：
 \begin{eqnarray}
-f(s_u|t_v) &=\frac{c(s_u|t_v;\mathbf{s},\mathbf{t}) }    {\sum_{s_u} c(s_u|t_v;\mathbf{s},\mathbf{t})} \\
+f(s_u|t_v) &=&\frac{c(s_u|t_v;\mathbf{s},\mathbf{t}) }    {\sum_{s_u} c(s_u|t_v;\mathbf{s},\mathbf{t})} \\
-a(i|j,m,l) &=\frac{c(i|j;\mathbf{s},\mathbf{t})}  {\sum_{i}c(i|j;\mathbf{s},\mathbf{t})}
+a(i|j,m,l) &=&\frac{c(i|j;\mathbf{s},\mathbf{t})}  {\sum_{i}c(i|j;\mathbf{s},\mathbf{t})}
 \label{eq:append-2}
 \end{eqnarray}
 对于由$K$个样本组成的训练集$\{(\mathbf{s}^{[1]},\mathbf{t}^{[1]}),...,(\mathbf{s}^{[K]},\mathbf{t}^{[K]})\}$，可以将M-Step的计算调整为：
 \begin{eqnarray}
-f(s_u|t_v) &=\frac{\sum_{k=1}^{K}c_{\mathbb{E}}(s_u|t_v;\mathbf{s}^{[k]},\mathbf{t}^{[k]}) }    {\sum_{s_u} \sum_{k=1}^{K} c_{\mathbb{E}}(s_u|t_v;\mathbf{s}^{[k]},\mathbf{t}^{[k]})} \\
+f(s_u|t_v) &=&\frac{\sum_{k=1}^{K}c_{\mathbb{E}}(s_u|t_v;\mathbf{s}^{[k]},\mathbf{t}^{[k]}) }    {\sum_{s_u} \sum_{k=1}^{K} c_{\mathbb{E}}(s_u|t_v;\mathbf{s}^{[k]},\mathbf{t}^{[k]})} \\
-a(i|j,m,l) &=\frac{\sum_{k=1}^{K}c_{\mathbb{E}}(i|j;\mathbf{s}^{[k]},\mathbf{t}^{[k]})}  {\sum_{i}\sum_{k=1}^{K}c_{\mathbb{E}}(i|j;\mathbf{s}^{[k]},\mathbf{t}^{[k]})}
+a(i|j,m,l) &=&\frac{\sum_{k=1}^{K}c_{\mathbb{E}}(i|j;\mathbf{s}^{[k]},\mathbf{t}^{[k]})}  {\sum_{i}\sum_{k=1}^{K}c_{\mathbb{E}}(i|j;\mathbf{s}^{[k]},\mathbf{t}^{[k]})}
 \label{eq:append-3}
 \end{eqnarray}
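
As a quick check on the normalization in `eq:append-3`, the following worked instance uses made-up expected counts. The numbers, the choice $K=2$, and the two candidate source words $s_1$, $s_2$ for a fixed $t_v$ are all hypothetical and serve only to show that the resulting $f(\cdot|t_v)$ sums to one:

```latex
\begin{eqnarray}
\sum_{k=1}^{2} c_{\mathbb{E}}(s_1|t_v;\mathbf{s}^{[k]},\mathbf{t}^{[k]}) &=& 1.5 + 1.5 = 3 \nonumber \\
\sum_{k=1}^{2} c_{\mathbb{E}}(s_2|t_v;\mathbf{s}^{[k]},\mathbf{t}^{[k]}) &=& 0.5 + 0.5 = 1 \nonumber \\
f(s_1|t_v) &=& \frac{3}{3+1} = 0.75 \nonumber \\
f(s_2|t_v) &=& \frac{1}{3+1} = 0.25 \nonumber
\end{eqnarray}
```

The same pattern applies to $a(i|j,m,l)$, with the sum in the denominator running over $i$ instead of $s_u$.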
@@ -294,13 +294,13 @@ p_x & = & \zeta^{-1} \sum_{k=1}^{K}c(x;\mathbf{s}^{[k]},\mathbf{t}^{[k]}) \label
 \parinterval 在模型3中，因为繁衍率的引入，并不能像模型1和模型2那样，在保证正确性的情况下加速参数估计的过程。这就使得每次迭代过程中，都不得不面对大小为$(l+1)^m$的词对齐空间。遍历所有$(l+1)^m$个词对齐所带来的高时间复杂度显然是不能被接受的。因此就要考虑能否仅利用词对齐空间中的部分词对齐对这些参数进行估计。比较简单的方法是仅使用Viterbi对齐来进行参数估计，这里Viterbi 词对齐可以被简单的看作搜索到的最好词对齐。遗憾的是，在模型3中并没有方法直接获得Viterbi对齐。这样只能采用一种折中的策略，即仅考虑那些使得$\funp{P}_{\theta}(\mathbf{s},\mathbf{a}|\mathbf{t})$ 达到较高值的词对齐。这里把这部分词对齐组成的集合记为$S$。式\ref{eq:1.2}可以被修改为：
 \begin{eqnarray}
-c(s|t,\mathbf{s},\mathbf{t}) \approx \sum_{\mathbf{a} \in S}\big[\funp{P}_{\theta}(\mathbf{s},\mathbf{a}|\mathbf{t}) \times \sum_{j=1}^{m}(\delta(s_j,\mathbf{s}) \cdot \delta(t_{a_{j}},\mathbf{t})) \big]
+c(s|t,\mathbf{s},\mathbf{t}) &\approx & \sum_{\mathbf{a} \in S}\big[\funp{P}_{\theta}(\mathbf{s},\mathbf{a}|\mathbf{t}) \times \sum_{j=1}^{m}(\delta(s_j,\mathbf{s}) \cdot \delta(t_{a_{j}},\mathbf{t})) \big]
 \label{eq:1.11}
 \end{eqnarray}
 \parinterval 同理可以获得式\ref{eq:1.3}-\ref{eq:1.6}的修改结果。进一步，在IBM模型3中，可以定义$S$如下：
 \begin{eqnarray}
-S = N(b^{\infty}(V(\mathbf{s}|\mathbf{t};2))) \cup (\mathop{\cup}\limits_{ij} N(b_{i \leftrightarrow j}^{\infty}(V_{i \leftrightarrow j}(\mathbf{s}|\mathbf{t},2))))
+S &=& N(b^{\infty}(V(\mathbf{s}|\mathbf{t};2))) \cup (\mathop{\cup}\limits_{ij} N(b_{i \leftrightarrow j}^{\infty}(V_{i \leftrightarrow j}(\mathbf{s}|\mathbf{t},2))))
 \label{eq:1.12}
 \end{eqnarray}
@@ -323,7 +323,7 @@ S = N(b^{\infty}(V(\mathbf{s}|\mathbf{t};2))) \cup (\mathop{\cup}\limits_{ij} N(
 \parinterval 如果$\bf{a}$和$\bf{a}'$区别于两个位置$j_1$和$j_2$的对齐上，$a_{j_{1}}=a_{j_{2}^{'}}$且$a_{j_{2}}=a_{j_{1}^{'}}$，那么
 \begin{eqnarray}
-\funp{P}_{\theta}(\mathbf{a'},\mathbf{s}|\mathbf{t}) = \funp{P}_{\theta}(\mathbf{a},\mathbf{s}|\mathbf{t}) \cdot \frac{t(s_{j_{2}}|t_{a_{j_{2}}})}{t(s_{j_{1}}|t_{a_{j_{1}}})} \cdot \frac{d(j_{2}|a_{j_{2}},m,l)}{d(j_{1}|a_{j_{1}},m,l)}
+\funp{P}_{\theta}(\mathbf{a'},\mathbf{s}|\mathbf{t}) &=& \funp{P}_{\theta}(\mathbf{a},\mathbf{s}|\mathbf{t}) \cdot \frac{t(s_{j_{2}}|t_{a_{j_{2}}})}{t(s_{j_{1}}|t_{a_{j_{1}}})} \cdot \frac{d(j_{2}|a_{j_{2}},m,l)}{d(j_{1}|a_{j_{1}},m,l)}
 \label{eq:1.14}
 \end{eqnarray}
@@ -337,7 +337,7 @@ S = N(b^{\infty}(V(\mathbf{s}|\mathbf{t};2))) \cup (\mathop{\cup}\limits_{ij} N(
 \parinterval 模型4的参数估计基本与模型3一致。需要修改的是扭曲度的估计公式，对于目标语第$i$个cept.生成的第一单词，可以得到（假设有$K$个训练样本）：
 \begin{eqnarray}
-d_1(\Delta_j|ca,cb) = \mu_{1cacb}^{-1} \times \sum_{k=1}^{K}c_1(\Delta_j|ca,cb;\mathbf{s}^{[k]},\mathbf{t}^{[k]})
+d_1(\Delta_j|ca,cb) &=& \mu_{1cacb}^{-1} \times \sum_{k=1}^{K}c_1(\Delta_j|ca,cb;\mathbf{s}^{[k]},\mathbf{t}^{[k]})
 \label{eq:1.15}
 \end{eqnarray}
@@ -352,7 +352,7 @@ s_1(\Delta_j|ca,cb;\rm{a},\mathbf{s},\mathbf{t}) & = & \sum_{i=1}^l \big[\vareps
 且
 \begin{eqnarray}
-\varepsilon(x) = \begin{cases}
+\varepsilon(x) &=& \begin{cases}
 0 & x \leq 0 \\
 1 & x > 0
 \end{cases}
@@ -362,7 +362,7 @@ s_1(\Delta_j|ca,cb;\rm{a},\mathbf{s},\mathbf{t}) & = & \sum_{i=1}^l \big[\vareps
 对于目标语第$i$个cept.生成的其他单词（非第一个单词），可以得到：
 \begin{eqnarray}
-d_{>1}(\Delta_j|cb) = \mu_{>1cb}^{-1} \times \sum_{k=1}^{K}c_{>1}(\Delta_j|cb;\mathbf{s}^{[k]},\mathbf{t}^{[k]})
+d_{>1}(\Delta_j|cb) &=& \mu_{>1cb}^{-1} \times \sum_{k=1}^{K}c_{>1}(\Delta_j|cb;\mathbf{s}^{[k]},\mathbf{t}^{[k]})
 \label{eq:1.18}
 \end{eqnarray}
@@ -377,7 +377,7 @@ s_{>1}(\Delta_j|cb;\mathbf{a},\mathbf{s},\mathbf{t}) & = & \sum_{i=1}^l \big[\va
 \noindent 这里，$ca$和$cb$分别表示目标语言和源语言的某个词类。模型4需要像模型3一样，通过定义一个词对齐集合$S$，使得每次迭代都在$S$上进行，进而降低运算量。模型4中$S$的定义为：
 \begin{eqnarray}
-\textrm{S} = N(\tilde{b}^{\infty}(V(\mathbf{s}|\mathbf{t};2))) \cup (\mathop{\cup}\limits_{ij} N(\tilde{b}_{i \leftrightarrow j}^{\infty}(V_{i \leftrightarrow j}(\mathbf{s}|\mathbf{t},2))))
+\textrm{S} &=& N(\tilde{b}^{\infty}(V(\mathbf{s}|\mathbf{t};2))) \cup (\mathop{\cup}\limits_{ij} N(\tilde{b}_{i \leftrightarrow j}^{\infty}(V_{i \leftrightarrow j}(\mathbf{s}|\mathbf{t},2))))
 \label{eq:1.22}
 \end{eqnarray}
@@ -392,7 +392,7 @@ s_{>1}(\Delta_j|cb;\mathbf{a},\mathbf{s},\mathbf{t}) & = & \sum_{i=1}^l \big[\va
 \parinterval 模型5的参数估计过程也模型4的过程基本一致，二者的区别在于扭曲度的估计公式。在模型5中，对于目标语第$i$个cept.生成的第一单词，可以得到（假设有$K$个训练样本）：
 \begin{eqnarray}
-d_1(\Delta_j|cb) = \mu_{1cb}^{-1} \times \sum_{k=1}^{K}c_1(\Delta_j|cb;\mathbf{s}^{[k]},\mathbf{t}^{[k]})
+d_1(\Delta_j|cb) &=& \mu_{1cb}^{-1} \times \sum_{k=1}^{K}c_1(\Delta_j|cb;\mathbf{s}^{[k]},\mathbf{t}^{[k]})
 \label{eq:1.23}
 \end{eqnarray}
@@ -408,7 +408,7 @@ s_1(\Delta_j|cb,v_x,v_y;\mathbf{a},\mathbf{s},\mathbf{t}) & = & \sum_{i=1}^l \Bi
 对于目标语第$i$个cept.生成的其他单词（非第一个单词），可以得到：
 \begin{eqnarray}
-d_{>1}(\Delta_j|cb,v) = \mu_{>1cb}^{-1} \times \sum_{k=1}^{K}c_{>1}(\Delta_j|cb,v;\mathbf{s}^{[k]},\mathbf{t}^{[k]})
+d_{>1}(\Delta_j|cb,v) &=& \mu_{>1cb}^{-1} \times \sum_{k=1}^{K}c_{>1}(\Delta_j|cb,v;\mathbf{s}^{[k]},\mathbf{t}^{[k]})
 \label{eq:1.26}
 \end{eqnarray}
@@ -431,7 +431,7 @@ s_{>1}(\Delta_j|cb,v;\mathbf{a},\mathbf{s},\mathbf{t}) & = & \sum_{i=1}^l\Big[\v
 \parinterval 在模型5中同样需要定义一个词对齐集合$S$，使得每次迭代都在$S$上进行。可以对$S$进行如下定义
 \begin{eqnarray}
-\textrm{S} = N(\tilde{\tilde{b}}^{\infty}(V(\mathbf{s}|\mathbf{t};2))) \cup (\mathop{\cup}\limits_{ij} N(\tilde{\tilde{b}}_{i \leftrightarrow j}^{\infty}(V_{i \leftrightarrow j}(\mathbf{s}|\mathbf{t},2))))
+\textrm{S} &=& N(\tilde{\tilde{b}}^{\infty}(V(\mathbf{s}|\mathbf{t};2))) \cup (\mathop{\cup}\limits_{ij} N(\tilde{\tilde{b}}_{i \leftrightarrow j}^{\infty}(V_{i \leftrightarrow j}(\mathbf{s}|\mathbf{t},2))))
 \label{eq:1.29}
 \end{eqnarray}
 \vspace{0.5em}
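
Every `chapterappend.tex` hunk above makes the same kind of change: the relation sign is wrapped in `&...&`. The `eqnarray` environment lays out each row in three columns (right-aligned, centered, left-aligned), so the relation belongs in the middle column. With a single `&`, as in the removed `&=` lines, the whole right-hand side lands in the centered middle column. With no `&` at all, the entire row sits in the right-aligned first column. A minimal sketch with placeholder symbols ($x$, $y$, $a$, $b$, $c$ are illustrative only):

```latex
\documentclass{article}
\begin{document}
\begin{eqnarray}
x &=& a + b \\       % left side | relation | right side
y &\approx& c        % other relations use the same three-column pattern
\end{eqnarray}
\end{document}
```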