Commit be196d93 by 曹润柘

update chapter 2

parent a744eab3
......@@ -233,8 +233,7 @@ F(X)=\int_{-\infty}^x f(x)dx
\item $S$: Xiao Zhang travels to work
\end{itemize}
\parinterval Clearly, $S_a$, $S_b$ and $S_c$ form a partition of $S$. If the probabilities that the three roads are free of congestion are $\textrm{P}({S_{a}^{'}})=0.2$, $\textrm{P}({S_{b}^{'}})=0.4$ and $\textrm{P}({S_{c}^{'}})=0.7$, then the probability of the event $L$ that Xiao Zhang encounters no congestion on the way to work is:
%--------------------------------------------
\begin{eqnarray}
{\textrm{P}(L)} &=& {\textrm{P}( L| S_a )\textrm{P}(S_a )+\textrm{P}( L| S_b )\textrm{P}(S_b )+\textrm{P}( L| S_c )\textrm{P}(S_c )}\nonumber \\
......@@ -244,7 +243,6 @@ F(X)=\int_{-\infty}^x f(x)dx
%--------------------------------------------
\parinterval {\small\sffamily\bfseries{Bayes' rule}} is a classic formula in probability theory, typically used to obtain $\textrm{P}(B \mid A)$ when $\textrm{P}(A \mid B)$ is known. It can be stated as follows: let $\{B_1,...,B_n\}$ be a partition of $S$ and let $A$ be an event; then for $i=1,...,n$ we have:
%--------------------------------------------
\begin{eqnarray}
\textrm{P}(B_i \mid A) & = & \frac {\textrm{P}(A B_i)} { \textrm{P}(A) } \nonumber \\
......@@ -309,6 +307,7 @@ F(X)=\int_{-\infty}^x f(x)dx
\subsubsection{KL Distance}\index{Chapter2.2.5.2}
\parinterval If two separate probability distributions $\textrm{P}(x)$ and $\textrm{Q}(x)$ are defined over the same random variable $X$, the KL distance (Kullback-Leibler divergence) can be used to measure how different the two distributions are. This measure is the {\small\bfnew{relative entropy}}, defined as follows:
\begin{eqnarray}
\textrm{D}_{\textrm{KL}}(\textrm{P}\parallel \textrm{Q}) & = & \sum_{x \in \textrm{X}} [ \textrm{P}(x)\log \frac{\textrm{P}(x) }{ \textrm{Q}(x) } ] \nonumber \\
& = & \sum_{x \in \textrm{X} }[ \textrm{P}(x)(\log\textrm{P}(x)-\log \textrm{Q}(x))]
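\parinterval As a quick illustration (the two distributions below are assumed for this example only), let $\textrm{P}=(0.5,0.5)$ and $\textrm{Q}=(0.75,0.25)$ be distributions over a binary variable. Using base-2 logarithms:
\begin{eqnarray}
\textrm{D}_{\textrm{KL}}(\textrm{P}\parallel \textrm{Q}) & = & 0.5\log_2\frac{0.5}{0.75} + 0.5\log_2\frac{0.5}{0.25} \approx 0.208 \nonumber \\
\textrm{D}_{\textrm{KL}}(\textrm{Q}\parallel \textrm{P}) & = & 0.75\log_2\frac{0.75}{0.5} + 0.25\log_2\frac{0.25}{0.5} \approx 0.189 \nonumber
\end{eqnarray}
The two values differ, which makes the asymmetry of the KL distance concrete: it is not a true distance metric.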
......@@ -585,7 +584,6 @@ F(X)=\int_{-\infty}^x f(x)dx
\parinterval Take the sentence ``确实现在数据很多'' as an example. If it is segmented as ``确实/现在/数据/很/多'', the probability of this segmentation, P(``确实/现在/数据/很/多''), can be computed by multiplying the probabilities of the individual words.
\begin{eqnarray}
&\textrm{P}&\textrm{(``确实/现在/数据/很/多'')} \nonumber \\
& = &\textrm{P}\textrm{(``确实'')} \cdot \textrm{P}\textrm{(``现在'')} \cdot \textrm{P}\textrm{(``数据'')} \cdot \textrm{P}\textrm{(``很'')} \cdot \textrm{P}\textrm{(``多'')} \nonumber \\
......@@ -633,7 +631,6 @@ F(X)=\int_{-\infty}^x f(x)dx
%-------------------------------------------
\parinterval Computing $\textrm{P}(w_1 w_2...w_m)$ directly is not straightforward, because treating the whole word string $w_1 w_2...w_m$ as a single variable would require an enormous number of parameters: $w_1 w_2...w_m$ has $|V|^m$ possible values, where $|V|$ is the vocabulary size. Clearly, as $m$ grows, the model complexity increases dramatically, to the point where the model can no longer be stored or computed at all. Since treating $w_1 w_2...w_m$ as a single variable is impractical, we can instead decompose the process that generates the sequence. Using the chain rule, we easily obtain
\begin{eqnarray}
\textrm{P}(w_1 w_2...w_m)=\textrm{P}(w_1)\textrm{P}(w_2|w_1)\textrm{P}(w_3|w_1 w_2)...\textrm{P}(w_m|w_1 w_2...w_{m-1})
\label{eq:2.4-1}
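\parinterval As an illustration (this instantiation is ours, reusing the segmentation example from above), the chain rule rewrites the probability of ``确实/现在/数据/很/多'' as:
\begin{eqnarray}
&\textrm{P}&\textrm{(``确实/现在/数据/很/多'')} \nonumber \\
& = & \textrm{P}(\textrm{确实}) \cdot \textrm{P}(\textrm{现在}|\textrm{确实}) \cdot \textrm{P}(\textrm{数据}|\textrm{确实}\ \textrm{现在}) \nonumber \\
& & \cdot\ \textrm{P}(\textrm{很}|\textrm{确实}\ \textrm{现在}\ \textrm{数据}) \cdot \textrm{P}(\textrm{多}|\textrm{确实}\ \textrm{现在}\ \textrm{数据}\ \textrm{很}) \nonumber
\end{eqnarray}
Each factor conditions on the full history, which is precisely what the $n$-gram approximation discussed next will truncate.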
......@@ -665,14 +662,14 @@ F(X)=\int_{-\infty}^x f(x)dx
}
\end{center}
\vspace{-0.3em}
\parinterval As can be seen, the 1-gram language model is just a special case of the $n$-gram language model. The advantage of $n$-grams is that the history they rely on is bounded, namely the preceding $n-1$ words. This property reflects the classic idea of Markov chains\cite{liuke-markov-2004}\cite{resnick1992adventures}, and is sometimes called the Markov assumption or the Markov property. An $n$-gram model can therefore be viewed as a Markov model over variable-length sequences: a 2-gram language model corresponds to a first-order Markov model, a 3-gram language model to a second-order Markov model, and so on.
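\parinterval To make the correspondence concrete (again an illustration of ours based on the earlier segmentation example), a 2-gram language model keeps only the immediately preceding word in each conditional:
\begin{eqnarray}
&\textrm{P}&\textrm{(``确实/现在/数据/很/多'')} \nonumber \\
& \approx & \textrm{P}(\textrm{确实}) \cdot \textrm{P}(\textrm{现在}|\textrm{确实}) \cdot \textrm{P}(\textrm{数据}|\textrm{现在}) \nonumber \\
& & \cdot\ \textrm{P}(\textrm{很}|\textrm{数据}) \cdot \textrm{P}(\textrm{多}|\textrm{很}) \nonumber
\end{eqnarray}
Only adjacent word pairs need to be estimated, which is what keeps the number of parameters manageable.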
\parinterval So, how can we compute $\textrm{P}(w_m|w_{m-n+1} ... w_{m-1})$? There are many options, for example:
\vspace{0.3em}
\begin{itemize}
\item {\small\bfnew{Maximum likelihood estimation}}. Compute $\textrm{P}(w_m|w_{m-n+1} ... w_{m-1})$ directly from the frequency of the word sequence in the training data:
\begin{eqnarray}
\textrm{P}(w_m|w_{m-n+1}...w_{m-1})=\frac{\textrm{count}(w_{m-n+1}...w_m)}{\textrm{count}(w_{m-n+1}...w_{m-1})}
\label{eq:2.4-3}
......@@ -728,7 +725,7 @@ F(X)=\int_{-\infty}^x f(x)dx
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\subsubsection{Additive Smoothing}\index{Chapter2.4.2.1}
\parinterval {\small\bfnew{Additive smoothing}} is a simple smoothing technique. We introduce it first in the hope that it conveys the basic idea behind smoothing algorithms. In practice, a collected corpus is used to approximate the true, complete corpus, and of course no corpus can cover every linguistic phenomenon. A common problem is that the corpus does not cover the full vocabulary, so a language model estimated directly from such corpus statistics will be biased. Suppose we use a corpus $C$ in which the bigram ``确实 现在'' never occurs to evaluate the probability of the segmented sentence $S$ = ``确实/现在/物价/很/高''. When the probability of ``确实/现在'' is computed, we get $\textrm{P}(S) = 0$, which is clearly unreasonable.
\parinterval Additive smoothing assumes that every $n$-gram occurs $\theta$ times more often than its actual count, with $0 \le \theta\le 1$. In this way the numerator of the probability estimate never becomes 0. Recomputing $\textrm{P}(\textrm{现在}|\textrm{确实})$ gives:
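\parinterval A standard way to write this estimate for the bigram case, assuming a vocabulary $V$ (the notation here is ours), is:
\begin{eqnarray}
\textrm{P}(\textrm{现在}|\textrm{确实}) & = & \frac{\theta + \textrm{count}(\textrm{确实}\ \textrm{现在})}{\theta|V| + \textrm{count}(\textrm{确实})} \nonumber
\end{eqnarray}
Since $\theta>0$, the numerator is positive even for the unseen bigram ``确实 现在'', so $\textrm{P}(S)$ no longer collapses to 0; choosing $\theta=1$ recovers classic add-one (Laplace) smoothing.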
......@@ -764,7 +761,6 @@ N = \sum_{r=1}^{\infty}{r\,n_r}
\end{eqnarray}
\parinterval The relative frequency of an $n$-gram occurring $r$ times is then $r/N$, which is exactly the probability estimate without smoothing. To solve the zero-probability problem, for any $n$-gram that occurs $r$ times, the Good-Turing estimate uses the statistics of $n$-grams occurring $r+1$ times to re-estimate its count as $r^*$, where
\begin{eqnarray}
r^* = (r + 1)\frac{n_{r + 1}}{n_r}
\label{eq:2.4-9}
......@@ -785,7 +781,6 @@ N & = & \sum_{r=0}^{\infty}{r^{*}n_r} \nonumber \\
\end{eqnarray}
That is, $N$ is still the original total count of the sample distribution. The probabilities of all events observed in the sample sum to:
\begin{eqnarray}
\textrm{P}(r>0) & = & \sum_{r>0}{\textrm{P}_r} \nonumber \\
& = & 1 - \frac{n_1}{N} < 1
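\parinterval A tiny numeric example may help (the counts $n_r$ below are assumed purely for illustration). Suppose $n_1=3$, $n_2=2$ and $n_3=1$, so that $N=1\cdot 3+2\cdot 2+3\cdot 1=10$. Then:
\begin{eqnarray}
r^{*} & = & (1+1)\frac{n_2}{n_1} = 2\times\frac{2}{3} \approx 1.33 \quad (\textrm{for } r=1) \nonumber \\
\textrm{P}(r>0) & = & 1 - \frac{n_1}{N} = 1 - \frac{3}{10} = 0.7 \nonumber
\end{eqnarray}
The remaining probability mass of $0.3$ is exactly what the Good-Turing estimate reserves for $n$-grams that never occurred in the sample.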
......@@ -887,7 +882,7 @@ c_{\textrm{KN}}(\cdot) & = & \begin{cases} \textrm{count}(\cdot)\quad\quad \text
\end{eqnarray}
\noindent where catcount$(\cdot)$ denotes the number of distinct $n$-gram types whose $n$-th (final) word is the given word. For instance, if the only 2-grams ending in a word $w$ are ``$a\ w$'' and ``$b\ w$'', then catcount$(w)=2$.
\parinterval Kneser-Ney smoothing is the foundation of many language modeling toolkits\cite{wang-etal-2018-niutrans}\cite{heafield-2011-kenlm}\cite{stolcke2002srilm}. Many further algorithms have been derived from it; interested readers can learn more from the references\cite{parsing2009speech}\cite{ney1994structuring}\cite{chen1999empirical}.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section{Syntactic Parsing (Phrase-Structure Parsing)}\index{Chapter2.5}
......@@ -953,6 +948,7 @@ c_{\textrm{KN}}(\cdot) & = & \begin{cases} \textrm{count}(\cdot)\quad\quad \text
A context-free grammar can be viewed as a system $G=<N,\Sigma,R,S>$, where
\begin{itemize}
\item $N$ is a set of non-terminal symbols
\item $\Sigma$ is a set of terminal symbols
\item $R$ is a set of rules (productions); each rule $r \in R$ has the form $X \to Y_1Y_2...Y_n$, where $X \in N$ and $Y_i \in N \cup \Sigma$
......@@ -1020,6 +1016,7 @@ s_0 \overset{r_1}{\Rightarrow} s_1 \overset{r_2}{\Rightarrow} s_2 \overset{r_3}{
\begin{itemize}
\item $\forall i \in [0,n], s_i \in (N\cup\Sigma)^*$ \hspace{3.5em} $\lhd$ $s_i$ is a well-formed string
\item $\forall j \in [1,n], r_j \in R$ \hspace{6.3em} $\lhd$ $r_j$ is a rule of $G$
\item $s_0 \in S$ \hspace{10.9em} $\lhd$ $s_0$ is the start non-terminal
......@@ -1095,6 +1092,7 @@ s_0 \overset{r_1}{\Rightarrow} s_1 \overset{r_2}{\Rightarrow} s_2 \overset{r_3}{
A probabilistic context-free grammar can be viewed as a system $G=<N,\Sigma,R,S>$, where
\begin{itemize}
\item $N$ is a set of non-terminal symbols
\item $\Sigma$ is a set of terminal symbols
\item $R$ is a set of rules (productions); each rule $r \in R$ has the form $p:X \to Y_1Y_2...Y_n$, where $X \in N$ and $Y_i \in N \cup \Sigma$; each rule $r$ is associated with a probability $p$ indicating how likely it is to be used in generation.
......
......@@ -328,7 +328,6 @@ year={2017}}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%% chapter 2------------------------------------------------------
......@@ -512,9 +511,48 @@ year={2015}
publisher={Prentice-Hall Englewood Cliffs, NJ}
}
@inproceedings{heafield-2011-kenlm,
title = "{K}en{LM}: Faster and Smaller Language Model Queries",
author = "Heafield, Kenneth",
booktitle = "Proceedings of the Sixth Workshop on Statistical Machine Translation",
month = jul,
year = "2011",
address = "Edinburgh, Scotland",
publisher = "Association for Computational Linguistics",
url = "https://www.aclweb.org/anthology/W11-2123",
pages = "187--197"
}
@inproceedings{wang-etal-2018-niutrans,
title = "The {N}iu{T}rans Machine Translation System for {WMT}18",
author = "Wang, Qiang and
Li, Bei and
Liu, Jiqiang and
Jiang, Bojian and
Zhang, Zheyang and
Li, Yinqiao and
Lin, Ye and
Xiao, Tong and
Zhu, Jingbo",
booktitle = "Proceedings of the Third Conference on Machine Translation: Shared Task Papers",
month = oct,
year = "2018",
address = "Belgium, Brussels",
publisher = "Association for Computational Linguistics",
url = "https://www.aclweb.org/anthology/W18-6430",
doi = "10.18653/v1/W18-6430",
pages = "528--534"
}
@inproceedings{stolcke2002srilm,
    title = "{SRILM} -- An Extensible Language Modeling Toolkit",
    author = "Stolcke, Andreas",
    booktitle = "Proceedings of the International Conference on Spoken Language Processing (ICSLP)",
    year = "2002"
}
%%%%% chapter 2------------------------------------------------------
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%% chapter 3------------------------------------------------------
......
......@@ -112,13 +112,13 @@
% CHAPTERS
%----------------------------------------------------------------------------------------
\include{Chapter1/chapter1}
%\include{Chapter1/chapter1}
\include{Chapter2/chapter2}
\include{Chapter3/chapter3}
\include{Chapter4/chapter4}
\include{Chapter5/chapter5}
\include{Chapter6/chapter6}
\include{ChapterAppend/chapterappend}
%\include{Chapter3/chapter3}
%\include{Chapter4/chapter4}
%\include{Chapter5/chapter5}
%\include{Chapter6/chapter6}
%\include{ChapterAppend/chapterappend}
......
% !Mode:: "TeX:UTF-8"
% !TEX encoding = UTF-8 Unicode
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% The Legrand Orange Book
% LaTeX Template
% Version 2.4 (26/09/2018)
%
% This template was downloaded from:
% http://www.LaTeXTemplates.com
%
% Original author:
% Mathias Legrand (legrand.mathias@gmail.com) with modifications by:
% Vel (vel@latextemplates.com)
%
% License:
% CC BY-NC-SA 3.0 (http://creativecommons.org/licenses/by-nc-sa/3.0/)
%
% Compiling this template:
% This template uses biber for its bibliography and makeindex for its index.
% When you first open the template, compile it from the command line with the
% commands below to make sure your LaTeX distribution is configured correctly:
%
% 1) pdflatex main
% 2) makeindex main.idx -s StyleInd.ist
% 3) biber main
% 4) pdflatex main x 2
%
% After this, when you wish to update the bibliography/index use the appropriate
% command above and make sure to compile with pdflatex several times
% afterwards to propagate your changes to the document.
%
% This template also uses a number of packages which may need to be
% updated to the newest versions for the template to compile. It is strongly
% recommended you update your LaTeX distribution if you have any
% compilation errors.
%
% Important note:
% Chapter heading images should have a 2:1 width:height ratio,
% e.g. 920px width and 460px height.
%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%----------------------------------------------------------------------------------------
% PACKAGES AND OTHER DOCUMENT CONFIGURATIONS
%----------------------------------------------------------------------------------------
......@@ -61,7 +21,7 @@
%\IfFileExists{C:/WINDOWS/win.ini}
{\newcommand{\mycfont}{song}}
%{\newcommand{\mycfont}{gbsn}}
{\newcommand{\mycfont}{gbsn}}
% set the formula font to Computer Modern Roman
\AtBeginDocument{
......@@ -98,7 +58,7 @@
\node[inner sep=0pt] (background) at (current page.center) {\includegraphics[width=\paperwidth]{background.pdf}};
\draw (current page.center) node [fill=ocre!30!white,fill opacity=0.6,text opacity=1,inner sep=1cm]{\Huge\centering\bfseries\sffamily\parbox[c][][t]{\paperwidth}{\centering 机器翻译:统计建模与深度学习方法\\[15pt] % Book title
%{\Large Do we need a subtitle?}\\[20pt] % Subtitle
{\huge 肖桐}}}; % Author name
{\LARGE 肖桐\ \ 朱靖波}}}; % Author name
\end{tikzpicture}
\vfill
\endgroup
......@@ -111,38 +71,51 @@
~\vfill
\thispagestyle{empty}
\noindent Copyright \copyright\ 2020 Xiao Tong\\ % Copyright notice
\noindent Copyright \copyright\ 2020 肖桐\ \ 朱靖波\\ % Copyright notice
\noindent \textsc{Published by \red{Publisher}}\\ % Publisher
\noindent \textsc{东北大学自然语言处理实验室\ /\ 小牛翻译}\\ % Publisher
\noindent \textsc{\url{http://47.105.50.196/NiuTrans/Toy-MT-Introduction/tree/master/Book}}\\ % URL
\noindent {\red{Licensed under the Creative Commons Attribution-NonCommercial 3.0 Unported License (the ``License''). You may not use this file except in compliance with the License. You may obtain a copy of the License at \url{http://creativecommons.org/licenses/by-nc/3.0}. Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an \textsc{``as is'' basis, without warranties or conditions of any kind}, either express or implied. See the License for the specific language governing permissions and limitations under the License.}}\\ % License information, replace this with your own license (if any)
\noindent \textit{First printing, \red{March 2019}} % Printing/edition date
\noindent \textit{First Edition, April 2020}
%----------------------------------------------------------------------------------------
% TABLE OF CONTENTS
% ACKNOWLEDGE PAGE
%----------------------------------------------------------------------------------------
\chapterimage{chapter_head_1.pdf} % artwork for the table-of-contents heading
\newpage
~\vfill
\thispagestyle{empty}
\pagestyle{empty} % Disable headers and footers for the following pages
{\large
\noindent {\color{red} We thank everyone who has contributed to this book} \\
\tableofcontents % print the table of contents
\noindent 曹润柘、曾信、孟霞、单韦乔、姜雨帆、王子扬、刘辉、许诺、李北、刘继强、张哲旸、周书涵、周涛、张裕浩、李炎洋、刘晓倩、牛蕊 \\
}
\cleardoublepage % make sure the chapter page starts on an odd page
%----------------------------------------------------------------------------------------
% TABLE OF CONTENTS
%----------------------------------------------------------------------------------------
%\usechapterimagefalse % If you don't want to include a chapter image, use this to toggle images off - it can be enabled later with \usechapterimagetrue
\chapterimage{chapter_head_1.pdf} % artwork for the table-of-contents heading
\pagestyle{empty} % Disable headers and footers for the following pages
\tableofcontents % print the table of contents
\cleardoublepage % make sure the chapter page starts on an odd page
\pagestyle{fancy} % Enable headers and footers again
%----------------------------------------------------------------------------------------
% CHAPTERS
%----------------------------------------------------------------------------------------
\include{Chapter1/chapter1}
%\include{Chapter1/chapter1}
\include{Chapter2/chapter2}
\include{Chapter3/chapter3}
\include{Chapter5/chapter5}
\include{Chapter6/chapter6}
%\include{Chapter3/chapter3}
%\include{Chapter4/chapter4}
%\include{Chapter5/chapter5}
%\include{Chapter6/chapter6}
%\include{ChapterAppend/chapterappend}
%----------------------------------------------------------------------------------------
......