合并分支 'caorunzhe' 到 'zengxin'

Caorunzhe 查看合并请求 !307

合并分支 'caorunzhe' 到 'zengxin'
Caorunzhe 查看合并请求 !307
e43bf648 · zengxin · 2ef067af · 1801ec0d · e43bf648 · e43bf648
Commit e43bf648 authored Oct 09, 2020 by zengxin
--- a/Chapter2/chapter2.tex
+++ b/Chapter2/chapter2.tex
@@ -150,7 +150,7 @@ F(x)=\int_{-\infty}^x f(x)\textrm{d}x

 \parinterval 推广到$n$个事件，可以得到{\small\bfnew{链式法则}}\index{链式法则}（Chain Rule\index{Chain Rule}）的公式：
 \begin{eqnarray}
-\funp{P}(x_1,x_2, \ldots ,x_n)=\funp{P}(x_1) \prod_{i=2}^n \funp{P}(x_i \mid x_1, \ldots ,x_{i-1})
+\funp{P}(x_1,x_2, \ldots ,x_n)=\prod_{i=1}^n \funp{P}(x_i \mid x_1, \ldots ,x_{i-1})
 \label{eq:2-6}
 \end{eqnarray}

@@ -258,7 +258,7 @@ F(x)=\int_{-\infty}^x f(x)\textrm{d}x
 \label{eq:2-14}
 \end{eqnarray}

-\parinterval 一个分布的信息熵也就是从该分布中得到的一个事件的期望信息量。比如，$a$、$b$、$c$、$d$四支球队，四支队伍夺冠的概率分别是$\funp{P}_1$、$\funp{P}_2$、$\funp{P}_3$、$\funp{P}_4$，某个人对比赛不感兴趣但是又想知道哪只球队夺冠，通过使用二分法2次就确定哪支球队夺冠了。但假设这四只球队中$c$的实力可以碾压其他球队，那么猜1次就可以确定。所以对于前面这种情况，哪只球队夺冠的信息量较高，信息熵也相对较高；对于后面这种情况，因为结果是容易猜到的，信息量和信息熵也就相对较低。因此可以得知：分布越尖锐熵越低，分布越均匀熵越高。
+\parinterval 一个分布的信息熵也就是从该分布中得到的一个事件的期望信息量。比如，$a$、$b$、$c$、$d$四支球队，四支队伍夺冠的概率分别是$\funp{P}_1$、$\funp{P}_2$、$\funp{P}_3$、$\funp{P}_4$，某个人对比赛不感兴趣但是又想知道哪只球队夺冠，使用2次二分法就能确定哪支球队夺冠了。但假设这四只球队中$c$的实力可以碾压其他球队，那么猜1次就可以确定。所以对于前面这种情况，哪只球队夺冠的信息量较高，信息熵也相对较高；对于后面这种情况，因为结果是容易猜到的，信息量和信息熵也就相对较低。因此可以得知：分布越尖锐熵越低，分布越均匀熵越高。

 %----------------------------------------------------------------------------------------
 %    NEW SUBSUB-SECTION
@@ -1022,7 +1022,7 @@ c(\cdot) & \textrm{当计算最高阶模型时}  \\

 \parinterval 对于语言模型来说，当多个路径中最高得分比当前搜索到的最好的解的得分低时，可以立刻停止搜索。因为此时序列越长语言模型得分$\log \funp{P}(w_1 w_2 \ldots w_m)$会越低，继续扩展这些路径不会产生更好的结果。这个技术通常也被称为{\small\bfnew{最佳停止条件}}\index{最佳停止条件}（Optimal Stopping Criteria）\index{Optimal Stopping Criteria}。类似的思想也被用于机器翻译等任务\upcite{DBLP:conf/emnlp/HuangZM17,DBLP:conf/emnlp/Yang0M18}。

-\parinterval 总的来说，虽然局部搜索由于没有遍历完整的解空间，使得这类方法无法保证找到最优解。但是，局部搜索算法大大降低了搜索过程的时间、空间复杂度。因此在语言模型生成和机器翻译的解码过程中常常使用局部搜索算法。在{\chapterseven}、{\chapterten}中还将介绍这些算法的具体应用。
+\parinterval 总的来说，虽然局部搜索没有遍历完整的解空间，使得这类方法无法保证找到最优解。但是，局部搜索算法大大降低了搜索过程的时间、空间复杂度。因此在语言模型生成和机器翻译的解码过程中常常使用局部搜索算法。在{\chapterseven}、{\chapterten}中还将介绍这些算法的具体应用。

 %----------------------------------------------------------------------------------------
 %    NEW SECTION

--- a/Chapter5/chapter5.tex
+++ b/Chapter5/chapter5.tex
@@ -978,7 +978,7 @@ f(s_u|t_v) = \lambda_{t_v}^{-1} \frac{\varepsilon}{(l+1)^{m}} \prod\limits_{j=1}
 \end{figure}
 %----------------------------------------------

-\parinterval 为了化简$f(s_u|t_v)$的计算，在此对公式\eqref{eq:5-39}进行了重新组织，见图\ref{fig:5-25}。其中，红色部分表示翻译概率P$(\seq{s}|\seq{t})$；蓝色部分表示$(s_u,t_v)$ 在句对$(\seq{s},\seq{t})$中配对的总次数，即“$t_v$翻译为$s_u$”在所有对齐中出现的次数；绿色部分表示$f(s_u|t_v)$对于所有的$t_i$的相对值，即“$t_v$翻译为$s_u$”在所有对齐中出现的相对概率；蓝色与绿色部分相乘表示“$t_v$翻译为$s_u$”这个事件出现次数的期望的估计，称之为{\small\sffamily\bfseries{期望频次}}\index{期望频次}(Expected Count)\index{Expected Count}。
+\parinterval 为了化简$f(s_u|t_v)$的计算，在此对公式\eqref{eq:5-39}进行了重新组织，见图\ref{fig:5-25}。其中，红色部分表示翻译概率P$(\seq{s}|\seq{t})$；蓝色部分表示$(s_u,t_v)$ 在句对$(\seq{s},\seq{t})$中配对的总次数，即“$t_v$翻译为$s_u$”在所有对齐中出现的次数；绿色部分表示$f(s_u|t_v)$对于所有的$t_i$的相对值，即“$t_v$翻译为$s_u$”在所有对齐中出现的相对概率；蓝色与绿色部分相乘表示“$t_v$翻译为$s_u$”这个事件出现次数的期望的估计，称之为{\small\sffamily\bfseries{期望频次}}\index{期望频次}（Expected Count）\index{Expected Count}。
 \vspace{-0.3em}
 %----------------------------------------------
 \begin{figure}[htp]

--- a/Chapter7/Figures/figure-translation-hypothesis-extension.tex
+++ b/Chapter7/Figures/figure-translation-hypothesis-extension.tex
@@ -64,7 +64,7 @@
 }

 {
-\draw [->,ultra thick,red,line width=2pt,opacity=0.7] ([xshift=-0.5em]h0.west) -- ([xshift=0.7em]h0.east) -- ([xshift=-0.2em]h3.west) -- ([xshift=0.8em]h3.east) -- ([xshift=-0.2em]h5.west) -- ([xshift=0.8em]h5.east) -- ([xshift=-0.2em]h7.west) -- ([xshift=1.5em]h7.east);
+\draw [->,ultra thick,red,line width=2pt,opacity=0.7] ([xshift=-0.5em,yshift=-0.5em]h0.west) -- ([xshift=0.7em,yshift=-0.5em]h0.east) -- ([xshift=-0.2em,yshift=-0.5em]h3.west) -- ([xshift=0.8em,yshift=-0.5em]h3.east) -- ([xshift=-0.2em,yshift=-0.5em]h5.west) -- ([xshift=0.8em,yshift=-0.5em]h5.east) -- ([xshift=-0.2em,yshift=-0.5em]h7.west) -- ([xshift=1.5em,yshift=-0.5em]h7.east);
 \node [anchor=north west] (wtranslabel) at ([yshift=-3em]h0.south west) {\small{翻译路径：}};
 \draw [->,ultra thick,red,line width=1.5pt,opacity=0.7] (wtranslabel.east) -- ([xshift=1.5em]wtranslabel.east);
 }

--- a/Chapter7/Figures/figure-word-and-phrase-translation-regard-as-path.tex
+++ b/Chapter7/Figures/figure-word-and-phrase-translation-regard-as-path.tex
@@ -129,13 +129,13 @@
 \draw[decorate,thick,decoration={brace,amplitude=5pt,mirror}] ([yshift=0em,xshift=-0.5em]t11.north west) -- ([xshift=-0.5em]t13.south west) node [pos=0.5,left,xshift=1.0em,yshift=0.0em,text width=5em,align=left] (label2) {\footnotesize{\textbf{单词翻译}}};

 {
-\draw [->,ultra thick,red,line width=2pt,opacity=0.7] ([xshift=-0.5em]t13.west) -- ([xshift=0.8em]t13.east) -- ([xshift=-0.2em]t22.west) -- ([xshift=0.8em]t22.east) -- ([xshift=-0.2em]t31.west) -- ([xshift=0.8em]t31.east) -- ([xshift=-0.2em]t41.west) -- ([xshift=0.8em]t41.east) -- ([xshift=-0.2em]t52.west) -- ([xshift=1.2em]t52.east);
+\draw [->,ultra thick,red,line width=2pt,opacity=0.7] ([xshift=-0.5em,yshift=-0.5em]t13.west) -- ([xshift=0.8em,yshift=-0.5em]t13.east) -- ([xshift=-0.2em,yshift=-0.5em]t22.west) -- ([xshift=0.8em,yshift=-0.5em]t22.east) -- ([xshift=-0.2em,yshift=-0.5em]t31.west) -- ([xshift=0.8em,yshift=-0.5em]t31.east) -- ([xshift=-0.2em,yshift=-0.5em]t41.west) -- ([xshift=0.8em,yshift=-0.5em]t41.east) -- ([xshift=-0.2em,yshift=-0.5em]t52.west) -- ([xshift=1.2em,yshift=-0.5em]t52.east);
 }

 {
-\draw [->,ultra thick,ublue,line width=2pt,opacity=0.7] ([xshift=-0.5em]t15.west) -- ([xshift=0.8em]t15.east) -- ([xshift=-0.2em]t32.west) -- ([xshift=1.2em]t32.east);
+\draw [->,ultra thick,ublue,line width=2pt,opacity=0.7] ([xshift=-0.5em,yshift=-0.5em]t15.west) -- ([xshift=0.8em,yshift=-0.5em]t15.east) -- ([xshift=-0.2em,yshift=-0.5em]t32.west) -- ([xshift=1.2em,yshift=-0.5em]t32.east);

-\draw [->,ultra thick,ublue,line width=2pt,opacity=0.7] ([xshift=-0.5em,yshift=0.1em]t13.west) -- ([xshift=0.8em,yshift=0.1em]t13.east) -- ([xshift=-0.2em]t25.west) -- ([xshift=0.8em]t25.east) -- ([xshift=-0.2em,yshift=0.1em]t41.west) -- ([xshift=0.8em,yshift=0.1em]t41.east) -- ([xshift=-0.2em,yshift=0.1em]t52.west) -- ([xshift=1.2em,yshift=0.1em]t52.east);
+\draw [->,ultra thick,ublue,line width=2pt,opacity=0.7] ([xshift=-0.5em,yshift=-0.4em]t13.west) -- ([xshift=0.8em,yshift=-0.4em]t13.east) -- ([xshift=-0.2em,yshift=-0.4em]t25.west) -- ([xshift=0.8em,yshift=-0.4em]t25.east) -- ([xshift=-0.2em,yshift=-0.4em]t41.west) -- ([xshift=0.8em,yshift=-0.4em]t41.east) -- ([xshift=-0.2em,yshift=-0.4em]t52.west) -- ([xshift=1.2em,yshift=-0.4em]t52.east);
 }

 {

--- a/Chapter7/Figures/figure-word-translation-regard-as-path.tex
+++ b/Chapter7/Figures/figure-word-translation-regard-as-path.tex
@@ -84,7 +84,7 @@
 \begin{scope}

 {
-\draw [->,ultra thick,red,line width=2pt,opacity=0.7] ([xshift=-0.5em]t13.west) -- ([xshift=0.8em]t13.east) -- ([xshift=-0.2em]t22.west) -- ([xshift=0.8em]t22.east) -- ([xshift=-0.2em]t31.west) -- ([xshift=0.8em]t31.east) -- ([xshift=-0.2em]t41.west) -- ([xshift=0.8em]t41.east) -- ([xshift=-0.2em]t52.west) -- ([xshift=1.2em]t52.east);
+\draw [->,ultra thick,red,line width=2pt,opacity=0.7] ([xshift=-0.5em,yshift=-0.5em]t13.west) -- ([xshift=0.8em,yshift=-0.5em]t13.east) -- ([xshift=-0.2em,yshift=-0.5em]t22.west) -- ([xshift=0.8em,yshift=-0.5em]t22.east) -- ([xshift=-0.2em,yshift=-0.5em]t31.west) -- ([xshift=0.8em,yshift=-0.5em]t31.east) -- ([xshift=-0.2em,yshift=-0.5em]t41.west) -- ([xshift=0.8em,yshift=-0.5em]t41.east) -- ([xshift=-0.2em,yshift=-0.5em]t52.west) -- ([xshift=1.2em,yshift=-0.5em]t52.east);
 }

 \end{scope}

--- a/bibliography.bib
+++ b/bibliography.bib
@@ -1941,18 +1941,6 @@ pages = {150--155},
 publisher = {Annual Meeting of the Association for Computational Linguistics},
 year = {2015},
 }
-@inproceedings{DBLP:conf/acl/ShenCHHWSL16,
-author = {Shiqi Shen and
-Yong Cheng and
-Zhongjun He and
-Wei He and
-Hua Wu and
-Maosong Sun and
-Yang Liu},
-title = {Minimum Risk Training for Neural Machine Translation},
-publisher = {Annual Meeting of the Association for Computational Linguistics},
-year = {2016},
-}
 @inproceedings{kulesza2004learning,
 title={A learning approach to improving sentence-level MT evaluation},
 author={Kulesza, Alex and Shieber, Stuart},
@@ -2179,16 +2167,6 @@ year = {2012}
  publisher = {Annual Meeting of the Association for Computational Linguistics},
  year      = {2017}
 }
-@inproceedings{DBLP:conf/wmt/YankovskayaTF19,
-  author    = {Elizaveta Yankovskaya and
-               Andre T{\"{a}}ttar and
-               Mark Fishel},
-  title     = {Quality Estimation and Translation Metrics via Pre-trained Word and
-               Sentence Embeddings},
-  pages     = {101--105},
-  publisher = {Annual Meeting of the Association for Computational Linguistics},
-  year      = {2019}
-}
 @inproceedings{DBLP:conf/wmt/KimLKN19,
  author    = {Hyun Kim and
               Joon-Ho Lim and