Commit be196d93 by 曹润柘

update chapter 2

parent a744eab3
......@@ -233,8 +233,7 @@ F(X)=\int_{-\infty}^x f(x)dx
\item $S$: Xiao Zhang travels to work
\end{itemize}
\parinterval Clearly, $S_a$, $S_b$ and $S_c$ form a partition of $S$. If the probabilities that the three roads are free of congestion are $\textrm{P}({S_{a}^{'}})=0.2$, $\textrm{P}({S_{b}^{'}})=0.4$ and $\textrm{P}({S_{c}^{'}})=0.7$, then the probability of the event $L$ that Xiao Zhang encounters no congestion on the way to work is:
%--------------------------------------------
\begin{eqnarray}
{\textrm{P}(L)} &=& {\textrm{P}( L| S_a )\textrm{P}(S_a )+\textrm{P}( L| S_b )\textrm{P}(S_b )+\textrm{P}( L| S_c )\textrm{P}(S_c )}\nonumber \\
......@@ -244,7 +243,6 @@ F(X)=\int_{-\infty}^x f(x)dx
%--------------------------------------------
\parinterval {\small\sffamily\bfseries{Bayes' rule}} is a classic formula in probability theory, typically used to obtain $\textrm{P}(B \mid A)$ when $\textrm{P}(A \mid B)$ is known. It can be stated as follows: let $\{B_1,...,B_n\}$ be a partition of $S$ and let $A$ be an event; then for $i=1,...,n$ we have:
%--------------------------------------------
\begin{eqnarray}
\textrm{P}(B_i \mid A) & = & \frac {\textrm{P}(A B_i)} { \textrm{P}(A) } \nonumber \\
......@@ -309,6 +307,7 @@ F(X)=\int_{-\infty}^x f(x)dx
\subsubsection{KL Distance}\index{Chapter2.2.5.2}
\parinterval If two separate probability distributions $\textrm{P}(x)$ and $\textrm{Q}(x)$ are defined over the same random variable $X$, the KL distance (Kullback-Leibler divergence) can be used to measure how different the two distributions are. This measure is the {\small\bfnew{relative entropy}}, defined as follows:
\begin{eqnarray}
\textrm{D}_{\textrm{KL}}(\textrm{P}\parallel \textrm{Q}) & = & \sum_{x \in \textrm{X}} [ \textrm{P}(x)\log \frac{\textrm{P}(x) }{ \textrm{Q}(x) } ] \nonumber \\
& = & \sum_{x \in \textrm{X} }[ \textrm{P}(x)(\log\textrm{P}(x)-\log \textrm{Q}(x))]
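\parinterval As a quick illustration (the two distributions below are assumed for this example only), let $\textrm{P}=(0.5,0.5)$ and $\textrm{Q}=(0.75,0.25)$ be distributions over a binary variable. Using base-2 logarithms:
\begin{eqnarray}
\textrm{D}_{\textrm{KL}}(\textrm{P}\parallel \textrm{Q}) & = & 0.5\log_2\frac{0.5}{0.75} + 0.5\log_2\frac{0.5}{0.25} \approx 0.208 \nonumber \\
\textrm{D}_{\textrm{KL}}(\textrm{Q}\parallel \textrm{P}) & = & 0.75\log_2\frac{0.75}{0.5} + 0.25\log_2\frac{0.25}{0.5} \approx 0.189 \nonumber
\end{eqnarray}
The two values differ, which makes the asymmetry of the KL distance concrete: it is not a true distance metric.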
......@@ -585,7 +584,6 @@ F(X)=\int_{-\infty}^x f(x)dx
\parinterval Take the sentence ``确实现在数据很多'' as an example. If it is segmented as ``确实/现在/数据/很/多'', the probability of this segmentation, P(``确实/现在/数据/很/多''), can be computed by multiplying the probabilities of the individual words.
\begin{eqnarray}
&\textrm{P}&\textrm{(``确实/现在/数据/很/多'')} \nonumber \\
& = &\textrm{P}\textrm{(``确实'')} \cdot \textrm{P}\textrm{(``现在'')} \cdot \textrm{P}\textrm{(``数据'')} \cdot \textrm{P}\textrm{(``很'')} \cdot \textrm{P}\textrm{(``多'')} \nonumber \\
......@@ -633,7 +631,6 @@ F(X)=\int_{-\infty}^x f(x)dx
%-------------------------------------------
\parinterval Computing $\textrm{P}(w_1 w_2...w_m)$ directly is not straightforward, because treating the whole word string $w_1 w_2...w_m$ as a single variable would require an enormous number of parameters: $w_1 w_2...w_m$ has $|V|^m$ possible values, where $|V|$ is the vocabulary size. Clearly, as $m$ grows, the model complexity increases dramatically, to the point where the model can no longer be stored or computed at all. Since treating $w_1 w_2...w_m$ as a single variable is impractical, we can instead decompose the process that generates the sequence. Using the chain rule, we easily obtain
\begin{eqnarray}
\textrm{P}(w_1 w_2...w_m)=\textrm{P}(w_1)\textrm{P}(w_2|w_1)\textrm{P}(w_3|w_1 w_2)...\textrm{P}(w_m|w_1 w_2...w_{m-1})
\label{eq:2.4-1}
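\parinterval As an illustration (this instantiation is ours, reusing the segmentation example from above), the chain rule rewrites the probability of ``确实/现在/数据/很/多'' as:
\begin{eqnarray}
&\textrm{P}&\textrm{(``确实/现在/数据/很/多'')} \nonumber \\
& = & \textrm{P}(\textrm{确实}) \cdot \textrm{P}(\textrm{现在}|\textrm{确实}) \cdot \textrm{P}(\textrm{数据}|\textrm{确实}\ \textrm{现在}) \nonumber \\
& & \cdot\ \textrm{P}(\textrm{很}|\textrm{确实}\ \textrm{现在}\ \textrm{数据}) \cdot \textrm{P}(\textrm{多}|\textrm{确实}\ \textrm{现在}\ \textrm{数据}\ \textrm{很}) \nonumber
\end{eqnarray}
Each factor conditions on the full history, which is precisely what the $n$-gram approximation discussed next will truncate.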
......@@ -665,14 +662,14 @@ F(X)=\int_{-\infty}^x f(x)dx
}
\end{center}
\vspace{-0.3em}
\parinterval As can be seen, the 1-gram language model is just a special case of the $n$-gram language model. The advantage of $n$-grams is that the history they rely on is bounded, namely the preceding $n-1$ words. This property reflects the classic idea of Markov chains\cite{liuke-markov-2004}\cite{resnick1992adventures}, and is sometimes called the Markov assumption or the Markov property. An $n$-gram model can therefore be viewed as a Markov model over variable-length sequences: a 2-gram language model corresponds to a first-order Markov model, a 3-gram language model to a second-order Markov model, and so on.
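\parinterval To make the correspondence concrete (again an illustration of ours based on the earlier segmentation example), a 2-gram language model keeps only the immediately preceding word in each conditional:
\begin{eqnarray}
&\textrm{P}&\textrm{(``确实/现在/数据/很/多'')} \nonumber \\
& \approx & \textrm{P}(\textrm{确实}) \cdot \textrm{P}(\textrm{现在}|\textrm{确实}) \cdot \textrm{P}(\textrm{数据}|\textrm{现在}) \nonumber \\
& & \cdot\ \textrm{P}(\textrm{很}|\textrm{数据}) \cdot \textrm{P}(\textrm{多}|\textrm{很}) \nonumber
\end{eqnarray}
Only adjacent word pairs need to be estimated, which is what keeps the number of parameters manageable.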
\parinterval So, how can we compute $\textrm{P}(w_m|w_{m-n+1} ... w_{m-1})$? There are many options, for example:
\vspace{0.3em}
\begin{itemize}
\item {\small\bfnew{Maximum likelihood estimation}}. Compute $\textrm{P}(w_m|w_{m-n+1} ... w_{m-1})$ directly from the frequency of the word sequence in the training data:
\begin{eqnarray}
\textrm{P}(w_m|w_{m-n+1}...w_{m-1})=\frac{\textrm{count}(w_{m-n+1}...w_m)}{\textrm{count}(w_{m-n+1}...w_{m-1})}
\label{eq:2.4-3}
......@@ -728,7 +725,7 @@ F(X)=\int_{-\infty}^x f(x)dx
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\subsubsection{Additive Smoothing}\index{Chapter2.4.2.1}
\parinterval {\small\bfnew{Additive smoothing}} is a simple smoothing technique. We introduce it first in the hope that it conveys the basic idea behind smoothing algorithms. In practice, a collected corpus is used to approximate the true, complete corpus, and of course no corpus can cover every linguistic phenomenon. A common problem is that the corpus does not cover the full vocabulary, so a language model estimated directly from such corpus statistics will be biased. Suppose we use a corpus $C$ in which the bigram ``确实 现在'' never occurs to evaluate the probability of the segmented sentence $S$ = ``确实/现在/物价/很/高''. When the probability of ``确实/现在'' is computed, we get $\textrm{P}(S) = 0$, which is clearly unreasonable.
\parinterval Additive smoothing assumes that every $n$-gram occurs $\theta$ times more often than its actual count, with $0 \le \theta\le 1$. In this way the numerator of the probability estimate never becomes 0. Recomputing $\textrm{P}(\textrm{现在}|\textrm{确实})$ gives:
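\parinterval A standard way to write this estimate for the bigram case, assuming a vocabulary $V$ (the notation here is ours), is:
\begin{eqnarray}
\textrm{P}(\textrm{现在}|\textrm{确实}) & = & \frac{\theta + \textrm{count}(\textrm{确实}\ \textrm{现在})}{\theta|V| + \textrm{count}(\textrm{确实})} \nonumber
\end{eqnarray}
Since $\theta>0$, the numerator is positive even for the unseen bigram ``确实 现在'', so $\textrm{P}(S)$ no longer collapses to 0; choosing $\theta=1$ recovers classic add-one (Laplace) smoothing.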
......@@ -764,7 +761,6 @@ N = \sum_{r=1}^{\infty}{r\,n_r}
\end{eqnarray}
\parinterval The relative frequency of an $n$-gram occurring $r$ times is then $r/N$, which is exactly the probability estimate without smoothing. To solve the zero-probability problem, for any $n$-gram that occurs $r$ times, the Good-Turing estimate uses the statistics of $n$-grams occurring $r+1$ times to re-estimate its count as $r^*$, where
\begin{eqnarray}
r^* = (r + 1)\frac{n_{r + 1}}{n_r}
\label{eq:2.4-9}
......@@ -785,7 +781,6 @@ N & = & \sum_{r=0}^{\infty}{r^{*}n_r} \nonumber \\
\end{eqnarray}
That is, $N$ is still the original total count of the sample distribution. The probabilities of all events observed in the sample sum to:
\begin{eqnarray}
\textrm{P}(r>0) & = & \sum_{r>0}{\textrm{P}_r} \nonumber \\
& = & 1 - \frac{n_1}{N} < 1
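\parinterval A tiny numeric example may help (the counts $n_r$ below are assumed purely for illustration). Suppose $n_1=3$, $n_2=2$ and $n_3=1$, so that $N=1\cdot 3+2\cdot 2+3\cdot 1=10$. Then:
\begin{eqnarray}
r^{*} & = & (1+1)\frac{n_2}{n_1} = 2\times\frac{2}{3} \approx 1.33 \quad (\textrm{for } r=1) \nonumber \\
\textrm{P}(r>0) & = & 1 - \frac{n_1}{N} = 1 - \frac{3}{10} = 0.7 \nonumber
\end{eqnarray}
The remaining probability mass of $0.3$ is exactly what the Good-Turing estimate reserves for $n$-grams that never occurred in the sample.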
......@@ -887,7 +882,7 @@ c_{\textrm{KN}}(\cdot) & = & \begin{cases} \textrm{count}(\cdot)\quad\quad \text
\end{eqnarray}
\noindent where catcount$(\cdot)$ denotes the number of distinct $n$-gram types whose $n$-th (final) word is the given word. For instance, if the only 2-grams ending in a word $w$ are ``$a\ w$'' and ``$b\ w$'', then catcount$(w)=2$.
\parinterval Kneser-Ney smoothing is the foundation of many language modeling toolkits\cite{wang-etal-2018-niutrans}\cite{heafield-2011-kenlm}\cite{stolcke2002srilm}. Many further algorithms have been derived from it; interested readers can learn more from the references\cite{parsing2009speech}\cite{ney1994structuring}\cite{chen1999empirical}.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section{Syntactic Parsing (Phrase-Structure Parsing)}\index{Chapter2.5}
......@@ -953,6 +948,7 @@ c_{\textrm{KN}}(\cdot) & = & \begin{cases} \textrm{count}(\cdot)\quad\quad \text
A context-free grammar can be viewed as a system $G=<N,\Sigma,R,S>$, where
\begin{itemize}
\item $N$ is a set of non-terminal symbols
\item $\Sigma$ is a set of terminal symbols
\item $R$ is a set of rules (productions); each rule $r \in R$ has the form $X \to Y_1Y_2...Y_n$, where $X \in N$ and $Y_i \in N \cup \Sigma$
......@@ -1020,6 +1016,7 @@ s_0 \overset{r_1}{\Rightarrow} s_1 \overset{r_2}{\Rightarrow} s_2 \overset{r_3}{
\begin{itemize}
\item $\forall i \in [0,n], s_i \in (N\cup\Sigma)^*$ \hspace{3.5em} $\lhd$ $s_i$ is a well-formed string
\item $\forall j \in [1,n], r_j \in R$ \hspace{6.3em} $\lhd$ $r_j$ is a rule of $G$
\item $s_0 \in S$ \hspace{10.9em} $\lhd$ $s_0$ is the start non-terminal
......@@ -1095,6 +1092,7 @@ s_0 \overset{r_1}{\Rightarrow} s_1 \overset{r_2}{\Rightarrow} s_2 \overset{r_3}{
A probabilistic context-free grammar can be viewed as a system $G=<N,\Sigma,R,S>$, where
\begin{itemize}
\item $N$ is a set of non-terminal symbols
\item $\Sigma$ is a set of terminal symbols
\item $R$ is a set of rules (productions); each rule $r \in R$ has the form $p:X \to Y_1Y_2...Y_n$, where $X \in N$ and $Y_i \in N \cup \Sigma$; each rule $r$ is associated with a probability $p$ indicating how likely it is to be used in generation.
......
......@@ -328,7 +328,6 @@ year={2017}}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%% chapter 2------------------------------------------------------
......@@ -512,9 +511,48 @@ year={2015}
publisher={Prentice-Hall Englewood Cliffs, NJ}
}
@inproceedings{heafield-2011-kenlm,
title = "{K}en{LM}: Faster and Smaller Language Model Queries",
author = "Heafield, Kenneth",
booktitle = "Proceedings of the Sixth Workshop on Statistical Machine Translation",
month = jul,
year = "2011",
address = "Edinburgh, Scotland",
publisher = "Association for Computational Linguistics",
url = "https://www.aclweb.org/anthology/W11-2123",
pages = "187--197"
}
@inproceedings{wang-etal-2018-niutrans,
title = "The {N}iu{T}rans Machine Translation System for {WMT}18",
author = "Wang, Qiang and
Li, Bei and
Liu, Jiqiang and
Jiang, Bojian and
Zhang, Zheyang and
Li, Yinqiao and
Lin, Ye and
Xiao, Tong and
Zhu, Jingbo",
booktitle = "Proceedings of the Third Conference on Machine Translation: Shared Task Papers",
month = oct,
year = "2018",
address = "Belgium, Brussels",
publisher = "Association for Computational Linguistics",
url = "https://www.aclweb.org/anthology/W18-6430",
doi = "10.18653/v1/W18-6430",
pages = "528--534"
}
@inproceedings{stolcke2002srilm,
    title = "{SRILM} -- An Extensible Language Modeling Toolkit",
    author = "Stolcke, Andreas",
    booktitle = "Proceedings of the International Conference on Spoken Language Processing (ICSLP)",
    year = "2002"
}
%%%%% chapter 2------------------------------------------------------
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%% chapter 3------------------------------------------------------
......
......@@ -112,13 +112,13 @@
% CHAPTERS
%----------------------------------------------------------------------------------------
\include{Chapter1/chapter1}
%\include{Chapter1/chapter1}
\include{Chapter2/chapter2}
\include{Chapter3/chapter3}
\include{Chapter4/chapter4}
\include{Chapter5/chapter5}
\include{Chapter6/chapter6}
\include{ChapterAppend/chapterappend}
%\include{Chapter3/chapter3}
%\include{Chapter4/chapter4}
%\include{Chapter5/chapter5}
%\include{Chapter6/chapter6}
%\include{ChapterAppend/chapterappend}
......
% !Mode:: "TeX:UTF-8"
% !TEX encoding = UTF-8 Unicode
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% The Legrand Orange Book
% LaTeX Template
% Version 2.4 (26/09/2018)
%
% This template was downloaded from:
% http://www.LaTeXTemplates.com
%
% Original author:
% Mathias Legrand (legrand.mathias@gmail.com) with modifications by:
% Vel (vel@latextemplates.com)
%
% License:
% CC BY-NC-SA 3.0 (http://creativecommons.org/licenses/by-nc-sa/3.0/)
%
% Compiling this template:
% This template uses biber for its bibliography and makeindex for its index.
% When you first open the template, compile it from the command line with the
% commands below to make sure your LaTeX distribution is configured correctly:
%
% 1) pdflatex main
% 2) makeindex main.idx -s StyleInd.ist
% 3) biber main
% 4) pdflatex main x 2
%
% After this, when you wish to update the bibliography/index use the appropriate
% command above and make sure to compile with pdflatex several times
% afterwards to propagate your changes to the document.
%
% This template also uses a number of packages which may need to be
% updated to the newest versions for the template to compile. It is strongly
% recommended you update your LaTeX distribution if you have any
% compilation errors.
%
% Important note:
% Chapter heading images should have a 2:1 width:height ratio,
% e.g. 920px width and 460px height.
%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%----------------------------------------------------------------------------------------
% PACKAGES AND OTHER DOCUMENT CONFIGURATIONS
%----------------------------------------------------------------------------------------
......@@ -61,7 +21,7 @@
%\IfFileExists{C:/WINDOWS/win.ini}
{\newcommand{\mycfont}{song}}
%{\newcommand{\mycfont}{gbsn}}
{\newcommand{\mycfont}{gbsn}}
% set the formula font to Computer Modern Roman
\AtBeginDocument{
......@@ -98,7 +58,7 @@
\node[inner sep=0pt] (background) at (current page.center) {\includegraphics[width=\paperwidth]{background.pdf}};
\draw (current page.center) node [fill=ocre!30!white,fill opacity=0.6,text opacity=1,inner sep=1cm]{\Huge\centering\bfseries\sffamily\parbox[c][][t]{\paperwidth}{\centering 机器翻译:统计建模与深度学习方法\\[15pt] % Book title
%{\Large Do we need a subtitle?}\\[20pt] % Subtitle
{\huge 肖桐}}}; % Author name
{\LARGE 肖桐\ \ 朱靖波}}}; % Author name
\end{tikzpicture}
\vfill
\endgroup
......@@ -111,38 +71,51 @@
~\vfill
\thispagestyle{empty}
\noindent Copyright \copyright\ 2020 Xiao Tong\\ % Copyright notice
\noindent Copyright \copyright\ 2020 肖桐\ \ 朱靖波\\ % Copyright notice
\noindent \textsc{Published by \red{Publisher}}\\ % Publisher
\noindent \textsc{东北大学自然语言处理实验室\ /\ 小牛翻译}\\ % Publisher
\noindent \textsc{\url{http://47.105.50.196/NiuTrans/Toy-MT-Introduction/tree/master/Book}}\\ % URL
\noindent {\red{Licensed under the Creative Commons Attribution-NonCommercial 3.0 Unported License (the ``License''). You may not use this file except in compliance with the License. You may obtain a copy of the License at \url{http://creativecommons.org/licenses/by-nc/3.0}. Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an \textsc{``as is'' basis, without warranties or conditions of any kind}, either express or implied. See the License for the specific language governing permissions and limitations under the License.}}\\ % License information, replace this with your own license (if any)
\noindent \textit{First printing, \red{March 2019}} % Printing/edition date
\noindent \textit{First Edition, April 2020}
%----------------------------------------------------------------------------------------
% TABLE OF CONTENTS
% ACKNOWLEDGE PAGE
%----------------------------------------------------------------------------------------
\chapterimage{chapter_head_1.pdf} % artwork for the table-of-contents heading
\newpage
~\vfill
\thispagestyle{empty}
\pagestyle{empty} % Disable headers and footers for the following pages
{\large
\noindent {\color{red} We thank everyone who has contributed to this book} \\
\tableofcontents % print the table of contents
\noindent 曹润柘、曾信、孟霞、单韦乔、姜雨帆、王子扬、刘辉、许诺、李北、刘继强、张哲旸、周书涵、周涛、张裕浩、李炎洋、刘晓倩、牛蕊 \\
}
\cleardoublepage % make sure the chapter page starts on an odd page
%----------------------------------------------------------------------------------------
% TABLE OF CONTENTS
%----------------------------------------------------------------------------------------
%\usechapterimagefalse % If you don't want to include a chapter image, use this to toggle images off - it can be enabled later with \usechapterimagetrue
\chapterimage{chapter_head_1.pdf} % artwork for the table-of-contents heading
\pagestyle{empty} % Disable headers and footers for the following pages
\tableofcontents % print the table of contents
\cleardoublepage % make sure the chapter page starts on an odd page
\pagestyle{fancy} % Enable headers and footers again
%----------------------------------------------------------------------------------------
% CHAPTERS
%----------------------------------------------------------------------------------------
\include{Chapter1/chapter1}
%\include{Chapter1/chapter1}
\include{Chapter2/chapter2}
\include{Chapter3/chapter3}
\include{Chapter5/chapter5}
\include{Chapter6/chapter6}
%\include{Chapter3/chapter3}
%\include{Chapter4/chapter4}
%\include{Chapter5/chapter5}
%\include{Chapter6/chapter6}
%\include{ChapterAppend/chapterappend}
%----------------------------------------------------------------------------------------
......