Merge branch 'caorunzhe' into 'master'

Caorunzhe

See merge request !87

Commit 41d5b8db authored Aug 22, 2020 by 曹润柘
Parents: 29939616, 4f05fac1
--- a/Chapter1/chapter1.tex
+++ b/Chapter1/chapter1.tex
@@ -111,7 +111,7 @@

 \parinterval 人工翻译已经存在了上千年，而机器翻译又起源于什么时候呢？机器翻译跌宕起伏的发展史可以分为萌芽期、受挫期、快速成长期和爆发期四个阶段。

-\parinterval 早在17世纪，如Descartes就提出使用世界语言，即使用统一符号表示不同语言、相同含义的词汇，来克服语言障碍的想法\upcite{knowlson1975universal}，这种想法在当时是很超前的。随着语言学、计算机科学等学科的发展，在19世纪30年代使用计算模型进行自动翻译的思想开始萌芽，如当时法国科学家Georges Artsrouni就提出用机器来进行翻译的想法。只是那时依然没有合适的实现手段，所以这种想法的合理性无法被证实。
+\parinterval 17世纪，Descartes提出世界语言的概念\upcite{knowlson1975universal}，他希望使用统一符号表示不同语言、相同含义的词汇，以此来克服语言障碍，这种想法在当时是很超前的。随着语言学、计算机科学等学科的发展，在19世纪30年代使用计算模型进行自动翻译的思想开始萌芽，如当时法国科学家Georges Artsrouni就提出用机器来进行翻译的想法。只是那时依然没有合适的实现手段，所以这种想法的合理性无法被证实。

 \parinterval 随着第二次世界大战爆发， 对文字进行加密和解密成为重要的军事需求，这也使得数学和密码学变得相当发达。在战争结束一年后，世界上第一台通用电子数字计算机于1946年研制成功（图\ref{fig:1-4}），至此使用机器进行翻译有了真正实现的可能。


--- a/Chapter2/Figures/figure-example-of-beam-search.tex
+++ b/Chapter2/Figures/figure-example-of-beam-search.tex
@@ -9,10 +9,10 @@
 %	\node[anchor=north,minimum width=1.8em,minimum height=1em,fill=blue!10] (l1) at ([yshift=-1em]eos.south){};
 %	\node[anchor=north,minimum width=1.8em,minimum height=1em,fill=red!10] (l2) at ([yshift=-0.5em]l1.south){};
 		
-	\node[anchor=west,unit] (w1) at ([xshift=1.5em,yshift=7em]eos.east){$w_1$};
+	\node[anchor=north,unit] (w1) at ([xshift=0em,yshift=-1.8em]eos.south){$w_1$};
 	\node[anchor=north,unit,fill=blue!10] (n11) at ([yshift=-0.5em]w1.south){$<$sos$>$};
 		
-	\node[anchor=west,unit,fill=red!20,opacity=0.3] (n24) at ([xshift=4.5em]n11.east){an};
+	\node[anchor=west,unit,fill=red!20,opacity=0.3] (n24) at ([xshift=6.5em,yshift=4.3em]n11.east){an};
 \node [anchor=north,rotate=90,inner sep=1pt,minimum width=2em,fill=black,opacity=0.3] (pt24) at (n24.east) {\small{{\color{white} \textbf{-1.4}}}};
 	\node[anchor=south,unit,fill=red!20] (n23) at ([yshift=0.1em]n24.north){one};
 \node [anchor=north,rotate=90,inner sep=1pt,minimum width=2em,fill=black] (pt23) at (n23.east) {\small{{\color{white} \textbf{-0.6}}}};
@@ -30,7 +30,7 @@
 \node [anchor=north,rotate=90,inner sep=1pt,minimum width=2em,fill=black,opacity=0.3] (pt27) at (n27.east) {\small{{\color{white} \textbf{-7.2}}}};
 	\node[anchor=south,unit] (w2) at ([yshift=0.5em]n21.north){$w_2$};
 		
-	\node[anchor=west,unit,fill=red!20] (n31) at ([yshift=3em,xshift=6em]n21.east){is};
+	\node[anchor=west,unit,fill=red!20] (n31) at ([yshift=4.7em,xshift=8em]n21.east){is};
 \node [anchor=north,rotate=90,inner sep=1pt,minimum width=2em,fill=black] (pt31) at (n31.east) {\small{{\color{white} \textbf{-0.1}}}};
 	\node[anchor=north,unit,fill=blue!10] (n32) at ([yshift=-0.1em]n31.south){$<$eos$>$};
 \node [anchor=north,rotate=90,inner sep=1pt,minimum width=2em,fill=black] (pt32) at (n32.east) {\small{{\color{white} \textbf{-0.6}}}};
@@ -49,7 +49,7 @@
 \node [anchor=north,rotate=90,inner sep=1pt,minimum width=2em,fill=black] (pt41) at (n41.east) {\small{{\color{white} \textbf{-0.1}}}};
 	\node[anchor=north,unit,fill=red!20,opacity=0.3,minimum width=3.5em,minimum height=2.5em] (n51) at ([yshift=-0.1em]n41.south){…};
 \node [anchor=north,rotate=90,inner sep=1pt,minimum width=2.5em,fill=black,opacity=0.3] (pt51) at (n51.east) {\small{{\color{white} \textbf{$<$-0.7}}}};
-	\node[anchor=south,unit] (w3) at ([yshift=0.5em]n31.north){$w_2$};
+	\node[anchor=south,unit] (w3) at ([yshift=0.5em]n31.north){$w_3$};
 		
 	\draw[->,ublue,very thick] (n11.east) -- (n21.west);
 	\draw[->,ublue,very thick] (n11.east) -- (n22.west);
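The beam-search figure in the hunk above keeps several partial hypotheses at each position (w1, w2, w3) and ranks them by accumulated log-probability. The Python sketch below illustrates that idea under stated assumptions. `TOY_LM`, `next_word_logprobs` and `beam_search` are hypothetical names, and the probabilities are invented toy values rather than the scores drawn in the figure.

```python
import math
from typing import Dict, List, Tuple

SOS, EOS = "<sos>", "<eos>"

# Hypothetical toy model: prefix -> {next word: probability}; invented values.
TOY_LM: Dict[Tuple[str, ...], Dict[str, float]] = {
    (SOS,): {"I": 0.5, "agree": 0.4, EOS: 0.1},
    (SOS, "I"): {"agree": 0.4, EOS: 0.6},
    (SOS, "agree"): {EOS: 0.9, "I": 0.1},
    (SOS, "I", "agree"): {EOS: 1.0},
    (SOS, "agree", "I"): {EOS: 1.0},
}

Hypothesis = Tuple[List[str], float]  # (word sequence, total log-probability)


def next_word_logprobs(prefix: List[str]) -> Dict[str, float]:
    """Log-probabilities of each next word; unseen prefixes can only end."""
    probs = TOY_LM.get(tuple(prefix), {EOS: 1.0})
    return {w: math.log(p) for w, p in probs.items()}


def beam_search(beam_size: int, max_len: int) -> Hypothesis:
    """Keep the beam_size best partial sequences at every step."""
    beam: List[Hypothesis] = [([SOS], 0.0)]
    finished: List[Hypothesis] = []
    for _ in range(max_len):
        # Expand every live hypothesis by every possible next word.
        candidates = [
            (seq + [w], score + lp)
            for seq, score in beam
            for w, lp in next_word_logprobs(seq).items()
        ]
        # Prune: keep only the beam_size highest-scoring expansions.
        candidates.sort(key=lambda c: c[1], reverse=True)
        beam = []
        for seq, score in candidates[:beam_size]:
            if seq[-1] == EOS:
                finished.append((seq, score))
            else:
                beam.append((seq, score))
        if not beam:  # every surviving hypothesis has produced <eos>
            break
    finished.extend(beam)  # if max_len was hit, fall back to unfinished ones
    return max(finished, key=lambda c: c[1])


if __name__ == "__main__":
    for k in (1, 2):
        seq, score = beam_search(beam_size=k, max_len=5)
        print(k, " ".join(seq), round(math.exp(score), 2))
```

When run as a script, it should print `1 <sos> I <eos> 0.3` and `2 <sos> agree <eos> 0.36`. With `beam_size=1` the procedure degenerates to greedy search. A beam of 2 keeps `agree` alive long enough to find the more probable sequence. Hypotheses of different lengths are compared by raw log-probability without length normalization, which is a simplification.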

--- a/Chapter2/Figures/figure-example-of-greedy-search.tex
+++ b/Chapter2/Figures/figure-example-of-greedy-search.tex
@@ -6,17 +6,17 @@
 	\node[fill=blue!40,anchor=north,align=left,inner sep=2pt,minimum width=5em](spe)at(words.south){\color{white}{\small\bfnew{特殊符号}}};
 	\node[fill=blue!10,anchor=north,align=left,inner sep=3pt,minimum width=5em](eos)at(spe.south){$<$sos$>$\\[-0.5ex]$<$eos$>$};
 		
-	\node[anchor=west,unit] (w1) at ([xshift=2em,yshift=4.5em]eos.east){$w_1$};
+	\node[anchor=north,unit] (w1) at ([xshift=2.5em,yshift=-1em]eos.south){$w_1$};
 	\node[anchor=north,unit,fill=blue!10] (n11) at ([yshift=-0.5em]w1.south){$<$sos$>$};
 		
-\node [anchor=north] (wtranslabel) at ([xshift=0em,yshift=-1em]n11.south) {\small{生成顺序:}};
+\node [anchor=north] (wtranslabel) at ([xshift=-2.5em,yshift=-3em]n11.south) {\small{生成顺序:}};
 \draw [->,ultra thick,red,line width=1.5pt,opacity=0.7] (wtranslabel.east) -- ([xshift=1.5em]wtranslabel.east);

 	\node[anchor=west,unit,fill=red!20] (n22) at ([xshift=5em]n11.east){agree};
 \node [anchor=north,rotate=90,inner sep=1pt,minimum width=2em,fill=black] (pt22) at (n22.east) {\small{{\color{white} \textbf{-0.4}}}};
-	\node[anchor=south,unit,fill=red!20] (n21) at ([yshift=0.3em]n22.north){I};
+	\node[anchor=south,unit,fill=red!20] (n21) at ([yshift=5.5em]n22.north){I};
 \node [anchor=north,rotate=90,inner sep=1pt,minimum width=2em,fill=black] (pt21) at (n21.east) {\small{{\color{white} \textbf{-0.5}}}};
-	\node[anchor=north,unit,fill=blue!10] (n23) at ([yshift=-0.3em]n22.south){$<$eos$>$};
+	\node[anchor=north,unit,fill=blue!10] (n23) at ([yshift=-3em]n22.south){$<$eos$>$};
 \node [anchor=north,rotate=90,inner sep=1pt,minimum width=2em,fill=black] (pt23) at (n23.east) {\small{{\color{white} \textbf{-2.2}}}};
 	\node[anchor=south,unit] (w2) at ([yshift=0.5em]n21.north){$w_2$};
 		

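The greedy-search figure in the hunk above follows a single path. At each step of the generation order it keeps only the best-scoring next word, and it stops once <eos> is produced. Below is a minimal Python sketch, assuming a hypothetical next-word table `TOY_LM` with invented probabilities (these are not the figure's scores) and a hypothetical function name `greedy_search`.

```python
import math
from typing import Dict, List, Tuple

SOS, EOS = "<sos>", "<eos>"

# Hypothetical toy model: prefix -> {next word: probability}; invented values.
TOY_LM: Dict[Tuple[str, ...], Dict[str, float]] = {
    (SOS,): {"I": 0.6, "agree": 0.3, EOS: 0.1},
    (SOS, "I"): {"agree": 0.7, EOS: 0.3},
    (SOS, "I", "agree"): {EOS: 0.9, "I": 0.1},
}


def greedy_search(max_len: int) -> Tuple[List[str], float]:
    """Append the single most probable next word at every step."""
    seq, logprob = [SOS], 0.0
    for _ in range(max_len):
        probs = TOY_LM.get(tuple(seq), {EOS: 1.0})  # unseen prefixes must end
        word = max(probs, key=probs.get)  # local argmax, no backtracking
        seq.append(word)
        logprob += math.log(probs[word])
        if word == EOS:
            break
    return seq, logprob


if __name__ == "__main__":
    seq, lp = greedy_search(max_len=10)
    print(" ".join(seq), round(math.exp(lp), 3))
```

When run as a script, it should print `<sos> I agree <eos> 0.378`. Each step makes one local argmax decision and never revisits it. The cost is therefore linear in the output length, but an early choice that looks best locally can rule out a more probable complete sequence.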
--- a/Chapter2/chapter2.tex
+++ b/Chapter2/chapter2.tex
@@ -686,7 +686,7 @@ N & = & \sum_{r=0}^{\infty}{r^{*}n_r} \nonumber \\

 \subsubsection{3.Kneser-Ney平滑方法}

-\parinterval Kneser-Ney平滑方法是由Reinhard Kneser和Hermann Ney于1995年提出的用于计算$n$元语法概率分布的方法\upcite{kneser1995improved,chen1999empirical}，并被广泛认为是最有效的平滑方法之一。这种平滑方法改进了Absolute Discounting\upcite{ney1994on,ney1991on}中与高阶分布相结合的低阶分布的计算方法，使不同阶分布得到充分的利用。这种算法也综合利用了其他多种平滑算法的思想。
+\parinterval Kneser-Ney平滑方法是由Reinhard Kneser和Hermann Ney于1995年提出的用于计算$n$元语法概率分布的方法\upcite{kneser1995improved,chen1999empirical}，并被广泛认为是最有效的平滑方法之一。这种平滑方法改进了Absolute Discounting\upcite{ney1994on,ney1991smoothing}中与高阶分布相结合的低阶分布的计算方法，使不同阶分布得到充分的利用。这种算法也综合利用了其他多种平滑算法的思想。

 \parinterval 首先介绍一下Absolute Discounting平滑算法，公式如下所示：
 \begin{eqnarray}
@@ -823,7 +823,7 @@ c_{\textrm{KN}}(\cdot) = \left\{\begin{array}{ll}
 \label{eq:2-40}
 \end{eqnarray}

-\noindent 这里$\arg$即argument（参数），$\argmax_x f(x)$表示返回使$f(x)$达到最大的$x$。$\argmax_{w \in \chi}\funp{P}(w)$表示找到使语言模型得分$\funp{P}(w)$达到最大的单词序列$w$。$\chi$ 是搜索问题的解空间，它是所有可能的单词序列$w$的集合。$\hat{w}$可以被看做该搜索问题中的“最优解”，即概率最大的单词序列。
+\noindent 这里$\arg$即argument（参数），$\argmax_x f(x)$表示返回使$f(x)$达到最大的$x$。$\argmax_{w \in \chi}$\\$\funp{P}(w)$表示找到使语言模型得分$\funp{P}(w)$达到最大的单词序列$w$。$\chi$ 是搜索问题的解空间，它是所有可能的单词序列$w$的集合。$\hat{w}$可以被看做该搜索问题中的“最优解”，即概率最大的单词序列。

 \parinterval 在序列生成任务中，最简单的策略就是对词表中的词汇进行任意组合，通过这种枚举的方式得到全部可能的序列。但是，很多时候并生成序列的长度是无法预先知道的。比如，机器翻译中目标语序列的长度是任意的。那么怎样判断一个序列何时完成了生成过程呢？这里借用人类书写中文和英文的过程：句子的生成首先从一片空白开始，然后从左到右逐词生成，除了第一个单词，所有单词的生成都依赖于前面已经生成的单词。为了方便计算机实现，通常定义单词序列从一个特殊的符号<sos>后开始生成。同样地，一个单词序列的结束也用一个特殊的符号<eos>来表示。

@@ -925,7 +925,7 @@ c_{\textrm{KN}}(\cdot) = \left\{\begin{array}{ll}
 \end{figure}
 %-------------------------------------------

-\parinterval 这样，语言模型的打分与解空间树的遍历就融合了在一起。于是，序列生成的问题可以被重新描述为：寻找所有单词序列组成的解空间树中权重总和最大的一条路径。在这个定义下，前面提到的两种枚举词序列的方法就是经典的{\small\bfnew{深度优先搜索}}\index{深度优先搜索}（Depth-first Search）\index{Depth-first Search}和{\small\bfnew{宽度优先搜索}}\index{宽度优先搜索}（Breadth-first Search）\index{Breadth-first Search}的雏形。在后面的内容中可以看到，从遍历解空间树的角度出发，可以对原始这些搜索策略的效率进行优化。
+\parinterval 这样，语言模型的打分与解空间树的遍历就融合了在一起。于是，序列生成的问题可以被重新描述为：寻找所有单词序列组成的解空间树中权重总和最大的一条路径。在这个定义下，前面提到的两种枚举词序列的方法就是经典的{\small\bfnew{深度优先搜索}}\index{深度优先搜索}（Depth-first Search）\upcite{even2011graph}\index{Depth-first Search}和{\small\bfnew{宽度优先搜索}}\index{宽度优先搜索}（Breadth-first Search）\upcite{lee1961an}\index{Breadth-first Search}的雏形。在后面的内容中可以看到，从遍历解空间树的角度出发，可以对原始这些搜索策略的效率进行优化。

 %----------------------------------------------------------------------------------------
 %    NEW SUB-SECTION
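The chapter2.tex hunks above do three things:

- They define the search target as ŵ = argmax over w ∈ χ of P(w).
- They frame every sequence between <sos> and <eos>.
- They call exhaustive enumeration the prototype of depth-first and breadth-first search over the solution-space tree.

The Python sketch below makes this concrete under stated assumptions. The bigram table `BIGRAM`, the length cap `MAX_TOKENS` and the function names are hypothetical, and the probabilities are invented.

```python
import math
from collections import deque
from typing import Dict, List, Tuple

SOS, EOS = "<sos>", "<eos>"

# Hypothetical bigram model P(next word | previous word); invented numbers.
BIGRAM: Dict[str, Dict[str, float]] = {
    SOS: {"I": 0.6, "agree": 0.4},
    "I": {"agree": 0.5, EOS: 0.5},
    "agree": {"I": 0.2, EOS: 0.8},
}
# At most this many tokens after <sos> (including <eos>), so the tree is finite.
MAX_TOKENS = 4

Node = Tuple[List[str], float]  # (word sequence, total log-probability)


def expand(node: Node) -> List[Node]:
    """Children of a tree node: extend the sequence by one possible next word."""
    seq, logp = node
    if len(seq) - 1 >= MAX_TOKENS:  # length cap reached: prune this branch
        return []
    return [(seq + [w], logp + math.log(p)) for w, p in BIGRAM[seq[-1]].items()]


def exhaustive_search(depth_first: bool) -> Node:
    """Enumerate every complete sequence and return the argmax of P(w).

    Popping from the right of the deque makes it a stack (depth-first);
    popping from the left makes it a queue (breadth-first).
    """
    frontier = deque([([SOS], 0.0)])
    best: Node = ([], -math.inf)
    while frontier:
        node = frontier.pop() if depth_first else frontier.popleft()
        seq, logp = node
        if seq[-1] == EOS:  # complete sequence: compare with the best so far
            if logp > best[1]:
                best = node
            continue
        frontier.extend(expand(node))
    return best


if __name__ == "__main__":
    for depth_first in (True, False):
        seq, logp = exhaustive_search(depth_first)
        print("DFS" if depth_first else "BFS", " ".join(seq), round(math.exp(logp), 2))
```

When run as a script, both lines should report `<sos> agree <eos>` with probability 0.32. The two strategies visit the same complete sequences and differ only in the order in which the frontier is consumed. Without the length cap the tree would be infinite. Even with the cap, the number of nodes grows exponentially with its value, which is why the efficiency of these raw strategies needs to be improved.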

--- a/bibliography-old.bib
+++ b/bibliography-old.bib
@@ -1174,12 +1174,22 @@
  biburl    = {https://dblp.org/rec/books/mg/CormenLR89.bib},
  bibsource = {dblp computer science bibliography, https://dblp.org}
 }
-%没有出版社
-@book{russell2003artificial,
-	title={Artificial Intelligence : A Modern Approach},
-	author={Stuart J. {Russell} and Peter {Norvig}},
-	//notes="Sourced from Microsoft Academic - https://academic.microsoft.com/paper/2122410182",
-	year={2003}
+
+@article{DBLP:journals/ai/SabharwalS11,
+  author    = {Ashish Sabharwal and
+               Bart Selman},
+  title     = {S. Russell, P. Norvig, Artificial Intelligence: {A} Modern Approach,
+               Third Edition},
+  journal   = {Artif. Intell.},
+  volume    = {175},
+  number    = {5-6},
+  pages     = {935--937},
+  year      = {2011},
+  url       = {https://doi.org/10.1016/j.artint.2011.01.005},
+  doi       = {10.1016/j.artint.2011.01.005},
+  timestamp = {Sat, 27 May 2017 14:24:41 +0200},
+  biburl    = {https://dblp.org/rec/journals/ai/SabharwalS11.bib},
+  bibsource = {dblp computer science bibliography, https://dblp.org}
 }

 @book{sahni1978fundamentals,
@@ -1370,11 +1380,12 @@
  number={3},
  year={1957},
 }
-%没有出版社
+
 @book{lowerre1976the,
 	title={The HARPY speech recognition system},
 	author={Bruce T. {Lowerre}},
 	//notes="Sourced from Microsoft Academic - https://academic.microsoft.com/paper/2137095888",
+	publisher={Carnegie Mellon University},
 	year={1976}
 }

@@ -1419,13 +1430,13 @@
 	year={1994}
 }

-@inproceedings{ney1991on,
+@inproceedings{ney1991smoothing,
  title={On smoothing techniques for bigram-based natural language modelling},
-	author={H. {Ney} and U. {Essen}},
-	booktitle={[Proceedings] ICASSP 91: 1991 International Conference on Acoustics, Speech, and Signal Processing},
+  author={Ney, Hermann and Essen, Ute},
+  booktitle={Acoustics, Speech, and Signal Processing, IEEE International Conference on},
  pages={825--828},
-	//notes={Sourced from Microsoft Academic - https://academic.microsoft.com/paper/2020749563},
-	year={1991}
+  year={1991},
+  organization={IEEE Computer Society}
 }

 @article{chen1999an,
@@ -1438,13 +1449,13 @@
 	//notes={Sourced from Microsoft Academic - https://academic.microsoft.com/paper/2158195707},
 	year={1999}
 }
-%需要确认
+
 @book{bell1990text,
 	title={Text compression},
 	author={Timothy C. {Bell} and John G. {Cleary} and Ian H. {Witten}},
 	//notes={Sourced from Microsoft Academic - https://academic.microsoft.com/paper/2611071497},
 	year={1990},
-	publisher={Prentice-Hall, Inc.}
+	publisher={Prentice Hall}
 }

 @article{katz1987estimation,
@@ -1686,6 +1697,22 @@
 	year={2000}
 }

+@article{lee1961an,
+	title="An Algorithm for Path Connections and Its Applications",
+	author="C. Y. {Lee}",
+	journal="Ire Transactions on Electronic Computers",
+	volume="10",
+	number="3",
+	pages="346--365",
+	year="1961"
+}
+
+@book{even2011graph,
+  title={Graph algorithms},
+  author={Even, Shimon},
+  year={2011},
+  publisher={Cambridge University Press}
+}

 %%%%% chapter 2------------------------------------------------------
 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

--- a/bibliography.bib
+++ b/bibliography.bib
@@ -18,6 +18,13 @@
  year ={2019},
 }

+@book{knowlson1975universal,
+	title={Universal Language Schemes in England and France 1600-1800},
+	author={James {Knowlson}},
+	year={1975},
+	publisher={University of Toronto Press}
+}
+
 @article{DBLP:journals/bstj/Shannon48,
  author    = {Claude E. Shannon},
  title     = {A mathematical theory of communication},
@@ -25,12 +32,7 @@
  volume    = {27},
  number    = {3},
  pages     = {379--423},
-  year      = {1948},
-  //url       = {https://doi.org/10.1002/j.1538-7305.1948.tb01338.x},
-  //doi       = {10.1002/j.1538-7305.1948.tb01338.x},
-  //timestamp = {Sat, 30 May 2020 20:01:09 +0200},
-  //biburl    = {https://dblp.org/rec/journals/bstj/Shannon48.bib},
-  //bibsource = {dblp computer science bibliography, https://dblp.org}
+  year      = {1948}
 }

 @article{shannon1949the,
@@ -38,10 +40,20 @@
 	author={Claude E. {Shannon} and Warren {Weaver}},
 	journal={IEEE Transactions on Instrumentation and Measurement},
 	volume={13},
-	//notes={Sourced from Microsoft Academic - https://academic.microsoft.com/paper/2993383518},
 	year={1949}
 }

+@article{weaver1955translation,
+  title={Translation},
+  author={Weaver, Warren},
+  journal={Machine translation of languages},
+  volume={14},
+  number={15-23},
+  pages={10},
+  year={1955},
+  publisher={Cambridge: Technology Press, MIT}
+}
+
 @article{Chomsky1957Syntactic,
  title={Syntactic Structures},
  author={Chomsky, Noam},
@@ -51,24 +63,22 @@
  year={1957},
 }

-@article{DBLP:journals/coling/BrownCPPJLMR90,
-  author    = {Peter F. Brown and
-               John Cocke and
-               Stephen Della Pietra and
-               Vincent J. Della Pietra and
-               Frederick Jelinek and
-               John D. Lafferty and
-               Robert L. Mercer and
-               Paul S. Roossin},
-  title     = {A Statistical Approach to Machine Translation},
-  journal   = {Computational Linguistics},
-  volume    = {16},
-  number    = {2},
-  pages     = {79--85},
-  year      = {1990},
-  //timestamp = {Mon, 11 May 2020 15:46:08 +0200},
-  //biburl    = {https://dblp.org/rec/journals/coling/BrownCPPJLMR90.bib},
-  //bibsource = {dblp computer science bibliography, https://dblp.org}
+@inproceedings{DBLP:conf/coling/SatoN90,
+  author    = {Satoshi Sato and
+               Makoto Nagao},
+  title     = {Toward Memory-based Translation},
+  booktitle = {13th International Conference on Computational Linguistics, {COLING}
+               1990, University of Helsinki, Finland, August 20-25, 1990},
+  pages     = {247--252},
+  year      = {1990}
+}
+
+@article{nagao1984framework,
+  title={A framework of a mechanical translation between Japanese and English by analogy principle},
+  author={Nagao, Makoto},
+  journal={Artificial and human intelligence},
+  pages={351--354},
+  year={1984}
 }

 @article{DBLP:journals/coling/BrownPPM94,
@@ -81,32 +91,7 @@
  volume    = {19},
  number    = {2},
  pages     = {263--311},
-  year      = {1993},
-  //timestamp = {Mon, 11 May 2020 15:46:10 +0200},
-  //biburl    = {https://dblp.org/rec/journals/coling/BrownPPM94.bib},
-  //bibsource = {dblp computer science bibliography, https://dblp.org}
-}
-
-@article{nagao1984framework,
-  title={A framework of a mechanical translation between Japanese and English by analogy principle},
-  author={Nagao, Makoto},
-  journal={Artificial and human intelligence},
-  pages={351--354},
-  year={1984}
-}
-
-@inproceedings{DBLP:conf/coling/SatoN90,
-  author    = {Satoshi Sato and
-               Makoto Nagao},
-  title     = {Toward Memory-based Translation},
-  booktitle = {13th International Conference on Computational Linguistics, {COLING}
-               1990, University of Helsinki, Finland, August 20-25, 1990},
-  pages     = {247--252},
-  year      = {1990},
-  //url       = {https://www.aclweb.org/anthology/C90-3044/},
-  //timestamp = {Mon, 16 Sep 2019 17:08:53 +0200},
-  //biburl    = {https://dblp.org/rec/conf/coling/SatoN90.bib},
-  //bibsource = {dblp computer science bibliography, https://dblp.org}
+  year      = {1993}
 }

 @article{DBLP:journals/coling/BrownCPPJLMR90,
@@ -119,14 +104,11 @@
               Robert L. Mercer and
               Paul S. Roossin},
  title     = {A Statistical Approach to Machine Translation},
-  journal   = {Comput. Linguistics},
+  journal   = {Computational Linguistics},
  volume    = {16},
  number    = {2},
  pages     = {79--85},
-  year      = {1990},
-  //timestamp = {Mon, 11 May 2020 15:46:08 +0200},
-  //biburl    = {https://dblp.org/rec/journals/coling/BrownCPPJLMR90.bib},
-  //bibsource = {dblp computer science bibliography, https://dblp.org}
+  year      = {1990}
 }

 @article{nirenburg1989knowledge,
@@ -154,7 +136,6 @@
 	volume={26},
 	number={4},
 	pages={638--641},
-	//notes="Sourced from Microsoft Academic - https://academic.microsoft.com/paper/1579838312",
 	year={2000}
 }

@@ -192,28 +173,67 @@
 	volume={19},
 	number={1},
 	pages={75--102},
-	//notes="Sourced from Microsoft Academic - https://academic.microsoft.com/paper/1489181569",
 	year={1993}
 }

-@article{brown1990statistical,
-  author    = {Peter F. Brown and
-               John Cocke and
-               Stephen Della Pietra and
-               Vincent J. Della Pietra and
-               Frederick Jelinek and
-               John D. Lafferty and
-               Robert L. Mercer and
-               Paul S. Roossin},
-  title     = {A Statistical Approach to Machine Translation},
-  journal   = {Computational Linguistics},
-  volume    = {16},
-  number    = {2},
-  pages     = {79--85},
-  year      = {1990},
-  //timestamp = {Wed, 13 Feb 2002 09:26:36 +0100},
-  //biburl    = {https://dblp.org/rec/journals/coling/BrownCPPJLMR90.bib},
-  //bibsource = {dblp computer science bibliography, https://dblp.org}
+@inproceedings{DBLP:journals/corr/LuongPM15,
+  author    = {Thang Luong and
+               Hieu Pham and
+               Christopher D. Manning},
+  //editor    = {Llu{\'{\i}}s M{\`{a}}rquez and
+               Chris Callison{-}Burch and
+               Jian Su and
+               Daniele Pighin and
+               Yuval Marton},
+  title     = {Effective Approaches to Attention-based Neural Machine Translation},
+  booktitle = {Proceedings of the 2015 Conference on Empirical Methods in Natural
+               Language Processing, {EMNLP} 2015, Lisbon, Portugal, September 17-21,
+               2015},
+  pages     = {1412--1421},
+  publisher = {The Association for Computational Linguistics},
+  year      = {2015}
+}
+
+@inproceedings{DBLP:journals/corr/GehringAGYD17,
+  author    = {Jonas Gehring and
+               Michael Auli and
+               David Grangier and
+               Denis Yarats and
+               Yann N. Dauphin},
+  //editor    = {Doina Precup and
+               Yee Whye Teh},
+  title     = {Convolutional Sequence to Sequence Learning},
+  booktitle = {Proceedings of the 34th International Conference on Machine Learning,
+               {ICML} 2017, Sydney, NSW, Australia, 6-11 August 2017},
+  series    = {Proceedings of Machine Learning Research},
+  volume    = {70},
+  pages     = {1243--1252},
+  publisher = {{PMLR}},
+  year      = {2017}
+}
+
+@inproceedings{NIPS2017_7181,
+  author    = {Ashish Vaswani and
+               Noam Shazeer and
+               Niki Parmar and
+               Jakob Uszkoreit and
+               Llion Jones and
+               Aidan N. Gomez and
+               Lukasz Kaiser and
+               Illia Polosukhin},
+  //editor    = {Isabelle Guyon and
+               Ulrike von Luxburg and
+               Samy Bengio and
+               Hanna M. Wallach and
+               Rob Fergus and
+               S. V. N. Vishwanathan and
+               Roman Garnett},
+  title     = {Attention is All you Need},
+  booktitle = {Advances in Neural Information Processing Systems 30: Annual Conference
+               on Neural Information Processing Systems 2017, 4-9 December 2017,
+               Long Beach, CA, {USA}},
+  pages     = {5998--6008},
+  year      = {2017}
 }

 @inproceedings{bahdanau2014neural,
@@ -225,11 +245,7 @@
  title     = {Neural Machine Translation by Jointly Learning to Align and Translate},
  booktitle = {3rd International Conference on Learning Representations, {ICLR} 2015,
               San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings},
-  year      = {2015},
-  //url       = {http://arxiv.org/abs/1409.0473},
-  //timestamp = {Wed, 17 Jul 2019 10:40:54 +0200},
-  //biburl    = {https://dblp.org/rec/journals/corr/BahdanauCB14.bib},
-  //bibsource = {dblp computer science bibliography, https://dblp.org}
+  year      = {2015}
 }

 @inproceedings{NIPS2014_5346,
@@ -246,23 +262,14 @@
               on Neural Information Processing Systems 2014, December 8-13 2014,
               Montreal, Quebec, Canada},
  pages     = {3104--3112},
-  year      = {2014},
-  //url       = {http://papers.nips.cc/paper/5346-sequence-to-sequence-learning-with-neural-networks},
-  //timestamp = {Fri, 06 Mar 2020 16:58:11 +0100},
-  //biburl    = {https://dblp.org/rec/conf/nips/SutskeverVL14.bib},
-  //bibsource = {dblp computer science bibliography, https://dblp.org}
+  year      = {2014}
 }

 @book{koehn2009statistical,
  author    = {Philipp Koehn},
  title     = {Statistical Machine Translation},
  publisher = {Cambridge University Press},
-  year      = {2010},
-  //url       = {http://www.statmt.org/book/},
-  //isbn      = {978-0-521-87415-1},
-  //timestamp = {Tue, 25 Jun 2019 09:00:29 +0200},
-  //biburl    = {https://dblp.org/rec/books/daglib/0032677.bib},
-  //bibsource = {dblp computer science bibliography, https://dblp.org}
+  year      = {2010}
 }

 @article{DBLP:journals/corr/abs-1709-07809,
@@ -270,12 +277,7 @@
  title     = {Neural Machine Translation},
  journal   = {CoRR},
  volume    = {abs/1709.07809},
-  year      = {2017},
-  //url       = {http://arxiv.org/abs/1709.07809},
-  //eprint    = {1709.07809},
-  //timestamp = {Mon, 13 Aug 2018 16:47:37 +0200},
-  //biburl    = {https://dblp.org/rec/journals/corr/abs-1709-07809.bib},
-  //bibsource = {dblp computer science bibliography, https://dblp.org}
+  year      = {2017}
 }

 @book{manning1999foundations,
@@ -299,12 +301,7 @@
  title     = {Deep Learning},
  series    = {Adaptive computation and machine learning},
  publisher = {{MIT} Press},
-  year      = {2016},
-  //url       = {http://www.deeplearningbook.org/},
-  //isbn      = {978-0-262-03561-3},
-  //timestamp = {Sat, 25 Mar 2017 20:16:59 +0100},
-  //biburl    = {https://dblp.org/rec/books/daglib/0040158.bib},
-  //bibsource = {dblp computer science bibliography, https://dblp.org}
+  year      = {2016}
 }

 @article{goldberg2017neural,
@@ -338,27 +335,55 @@
  journal ={中文信息学报},
  volume ={34},
  pages ={4},
-  year ={2020},
-  //note ={\url{https://nndl.github.io/}}
+  year ={2020}
 }

-@book{knowlson1975universal,
-	title={Universal Language Schemes in England and France 1600-1800},
-	author={James {Knowlson}},
-	//notes={Sourced from Microsoft Academic - https://academic.microsoft.com/paper/2088082035},
-	year={1975},
-	publisher={University of Toronto Press}
+%%%%% chapter 1------------------------------------------------------
+%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
+
+%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
+%%%%% chapter 2------------------------------------------------------
+
+@book{kolmogorov2018foundations,
+  title ={Foundations of the theory of probability: Second English Edition},
+  author ={Kolmogorov, Andre Nikolaevich and Bharucha-Reid, Albert T},
+  year ={2018},
+  publisher ={Courier Dover Publications}
 }

-@article{weaver1955translation,
-  title={Translation},
-  author={Weaver, Warren},
-  journal={Machine translation of languages},
-  volume={14},
-  number={15-23},
-  pages={10},
-  year={1955},
-  publisher={Cambridge: Technology Press, MIT}
+@book{mao-prob-book-2011,
+  title ={概率论与数理统计教程: 第二版},
+  author ={魏宗舒},
+  year ={2011},
+  publisher ={北京: 高等教育出版社}
+}
+
+@article{resnick1992adventures,
+    author = {Barbour, A. and Resnick, Sidney},
+    year = {1993},
+    month = {12},
+    pages = {1474},
+    title = {Adventures in Stochastic Processes.},
+    volume = {88},
+    journal = {Journal of the American Statistical Association}
+}
+
+@book{liuke-markov-2004,
+  title ={实用马尔可夫决策过程},
+  author ={刘克},
+  year ={2004},
+  publisher ={清华大学出版社}
+}
+
+@article{gale1995good,
+  author    = {William A. Gale and
+               Geoffrey Sampson},
+  title     = {Good-Turing Frequency Estimation Without Tears},
+  journal   = {Journal of Quantitative Linguistics},
+  volume    = {2},
+  number    = {3},
+  pages     = {217--237},
+  year      = {1995}
 }

 @article{good1953population,
@@ -372,26 +397,427 @@
  publisher ={Oxford University Press}
 }

-@article{gale1995good,
-  author    = {William A. Gale and
-               Geoffrey Sampson},
-  title     = {Good-Turing Frequency Estimation Without Tears},
-  journal   = {Journal of Quantitative Linguistics},
-  volume    = {2},
-  number    = {3},
-  pages     = {217--237},
-  year      = {1995},
-  //url       = {https://doi.org/10.1080/09296179508590051},
-  //doi       = {10.1080/09296179508590051},
-  //timestamp = {Sat, 20 May 2017 00:22:46 +0200},
-  //biburl    = {https://dblp.org/rec/journals/jql/GaleS95.bib},
-  //bibsource = {dblp computer science bibliography, https://dblp.org}
+@inproceedings{kneser1995improved,
+  author    = {Reinhard Kneser and
+               Hermann Ney},
+  title     = {Improved backing-off for M-gram language modeling},
+  booktitle = {1995 International Conference on Acoustics, Speech, and Signal Processing,
+               {ICASSP} '95, Detroit, Michigan, USA, May 08-12, 1995},
+  pages     = {181--184},
+  publisher = {{IEEE} Computer Society},
+  year      = {1995}
 }
-%%%%% chapter 1------------------------------------------------------
-%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

-%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
-%%%%% chapter 2------------------------------------------------------
+@inproceedings{ney1991smoothing,
+  title={On smoothing techniques for bigram-based natural language modelling},
+  author={Ney, Hermann and Essen, Ute},
+  booktitle={Acoustics, Speech, and Signal Processing, IEEE International Conference on},
+  pages={825--828},
+  year={1991},
+  organization={IEEE Computer Society}
+}
+
+@article{ney1994on,
+	title={On structuring probabilistic dependences in stochastic language modelling},
+	author={Hermann {Ney} and Ute {Essen} and Reinhard {Kneser}},
+	journal={Computer Speech \& Language},
+	volume={8},
+	number={1},
+	pages={1--38},
+	year={1994}
+}
+
+@inproceedings{stolcke2002srilm,
+  author    = {Andreas Stolcke},
+  //editor    = {John H. L. Hansen and
+               Bryan L. Pellom},
+  title     = {{SRILM} - an extensible language modeling toolkit},
+  booktitle = {7th International Conference on Spoken Language Processing, {ICSLP2002}
+               - {INTERSPEECH} 2002, Denver, Colorado, USA, September 16-20, 2002},
+  publisher = {{ISCA}},
+  year      = {2002}
+}
+
+@inproceedings{heafield-2011-kenlm,
+  author    = {Kenneth Heafield},
+  //editor    = {Chris Callison{-}Burch and
+               Philipp Koehn and
+               Christof Monz and
+               Omar Zaidan},
+  title     = {KenLM: Faster and Smaller Language Model Queries},
+  booktitle = {Proceedings of the Sixth Workshop on Statistical Machine Translation,
+               WMT@EMNLP 2011, Edinburgh, Scotland, UK, July 30-31, 2011},
+  pages     = {187--197},
+  publisher = {Association for Computational Linguistics},
+  year      = {2011}
+}
+
+@article{chen1999empirical,
+  author    = {Stanley F. Chen and
+               Joshua Goodman},
+  title     = {An empirical study of smoothing techniques for language modeling},
+  journal   = {Computer Speech \& Language},
+  volume    = {13},
+  number    = {4},
+  pages     = {359--393},
+  year      = {1999}
+}
+
+@article{ney1994structuring,
+  author    = {Hermann Ney and
+               Ute Essen and
+               Reinhard Kneser},
+  title     = {On structuring probabilistic dependences in stochastic language modelling},
+  journal   = {Computer Speech \& Language},
+  volume    = {8},
+  number    = {1},
+  pages     = {1--38},
+  year      = {1994}
+}
+
+@book{parsing2009speech,
+  author    = {Dan Jurafsky and
+               James H. Martin},
+  title     = {Speech and language processing: an introduction to natural language
+               processing, computational linguistics, and speech recognition, 2nd
+               Edition},
+  series    = {Prentice Hall series in artificial intelligence},
+  publisher = {Prentice Hall, Pearson Education International},
+  year      = {2009}
+}
+
+@book{DBLP:books/mg/CormenLR89,
+  author    = {Thomas H. Cormen and
+               Charles E. Leiserson and
+               Ronald L. Rivest},
+  title     = {Introduction to Algorithms},
+  publisher = {The {MIT} Press and McGraw-Hill Book Company},
+  year      = {1989}
+}
+
+@book{even2011graph,
+  title={Graph algorithms},
+  author={Even, Shimon},
+  year={2011},
+  publisher={Cambridge University Press}
+}
+
+@article{lee1961an,
+	title="An Algorithm for Path Connections and Its Applications",
+	author="C. Y. {Lee}",
+	journal="Ire Transactions on Electronic Computers",
+	volume="10",
+	number="3",
+	pages="346--365",
+	year="1961"
+}
+
+@article{DBLP:journals/ai/SabharwalS11,
+  author    = {Ashish Sabharwal and
+               Bart Selman},
+  title     = {S. Russell, P. Norvig, Artificial Intelligence: {A} Modern Approach,
+               Third Edition},
+  journal   = {Artificial Intelligence},
+  volume    = {175},
+  number    = {5-6},
+  pages     = {935--937},
+  year      = {2011}
+}
+
+@book{sahni1978fundamentals,
+	title={Fundamentals of Computer Algorithms},
+	author={Sartaj {Sahni} and Ellis {Horowitz}},
+	year={1978},
+	publisher={Computer Science Press}
+}
+
+@article{hart1968a,
+	title={A Formal Basis for the Heuristic Determination of Minimum Cost Paths},
+	author={Peter E. {Hart} and Nils J. {Nilsson} and Bertram {Raphael}},
+	journal={IEEE Transactions on Systems Science and Cybernetics},
+	volume={4},
+	number={2},
+	pages={100--107},
+	year={1968}
+}
+
+@book{lowerre1976the,
+	title={The HARPY speech recognition system},
+	author={Bruce T. {Lowerre}},
+	publisher={Carnegie Mellon University},
+	year={1976}
+}
+
+@book{bishop1995neural,
+	title={Neural networks for pattern recognition},
+	author={Christopher M. {Bishop}},
+	year={1995},
+	publisher={Oxford university press}
+}
+
+@article{åström1965optimal,
+	title={Optimal control of Markov processes with incomplete state information},
+	author={Karl Johan {Åström}},
+	journal={Journal of Mathematical Analysis and Applications},
+	volume={10},
+	number={1},
+	pages={174--205},
+	year={1965}
+}
+
+@article{korf1990real,
+	title={Real-time heuristic search},
+	author={Richard E. {Korf}},
+	journal={Artificial Intelligence},
+	volume={42},
+	number={2},
+	pages={189--211},
+	year={1990}
+}
+%缩写
+@article{jelinek1980interpolated,
+	title={Interpolated estimation of Markov source parameters from sparse data},
+	author={F. {Jelinek}},
+	journal={Proc. Workshop on Pattern Recognition in Practice, 1980},
+	pages={381--397},
+	year={1980}
+}
+
+@article{katz1987estimation,
+	title={Estimation of probabilities from sparse data for the language model component of a speech recognizer},
+	author={S. {Katz}},
+	journal={IEEE Transactions on Acoustics, Speech, and Signal Processing},
+	volume={35},
+	number={3},
+	pages={400--401},
+	year={1987}
+}
+
+@article{witten1991the,
+	title={The zero-frequency problem: estimating the probabilities of novel events in adaptive text compression},
+	author={I.H. {Witten} and T.C. {Bell}},
+	journal={IEEE Transactions on Information Theory},
+	volume={37},
+	number={4},
+	pages={1085--1094},
+	year={1991}
+}
+
+@book{bell1990text,
+	title={Text compression},
+	author={Timothy C. {Bell} and John G. {Cleary} and Ian H. {Witten}},
+	year={1990},
+	publisher={Prentice Hall}
+}
+
+@article{goodman2001a,
+	title={A bit of progress in language modeling},
+	author={Joshua T. {Goodman}},
+	journal={Computer Speech \& Language},
+	volume={15},
+	number={4},
+	pages={403--434},
+	year={2001}
+}
+
+@article{chen1999an,
+	title={An empirical study of smoothing techniques for language modeling},
+	author={Stanley F. {Chen} and Joshua {Goodman}},
+	journal={Computer Speech \& Language},
+	volume={13},
+	number={4},
+	pages={359--394},
+	year={1999}
+}
+
+@inproceedings{kirchhoff2005improved,
+	title={Improved Language Modeling for Statistical Machine Translation},
+	author={Katrin {Kirchhoff} and Mei {Yang}},
+	booktitle={Proceedings of the ACL Workshop on Building and Using Parallel Texts},
+	pages={125--128},
+	year={2005}
+}
+
+@inproceedings{koehn2007factored,
+	title={Factored Translation Models},
+	author={Philipp {Koehn} and Hieu {Hoang}},
+	booktitle={Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL)},
+	pages={868--876},
+	year={2007}
+}
+
+@inproceedings{sarikaya2007joint,
+	title={Joint Morphological-Lexical Language Modeling for Machine Translation},
+	author={Ruhi {Sarikaya} and Yonggang {Deng}},
+	booktitle={Human Language Technologies 2007: The Conference of the North American Chapter of the Association for Computational Linguistics; Companion Volume, Short Papers},
+	pages={145--148},
+	year={2007}
+}
+
+@inproceedings{heafield2011kenlm,
+	title={KenLM: Faster and Smaller Language Model Queries},
+	author={Kenneth {Heafield}},
+	booktitle={Proceedings of the Sixth Workshop on Statistical Machine Translation},
+	pages={187--197},
+	year={2011}
+}
+
+@inproceedings{federico2006how,
+	title={How Many Bits Are Needed To Store Probabilities for Phrase-Based Translation?},
+	author={Marcello {Federico} and Nicola {Bertoldi}},
+	booktitle={Proceedings on the Workshop on Statistical Machine Translation},
+	pages={94--101},
+	year={2006}
+}
+
+@inproceedings{federico2007efficient,
+	title={Efficient Handling of N-gram Language Models for Statistical Machine Translation},
+	author={Marcello {Federico} and Mauro {Cettolo}},
+	booktitle={Proceedings of the Second Workshop on Statistical Machine Translation},
+	pages={88--95},
+	year={2007}
+}
+
+@inproceedings{talbot2007randomised,
+	title={Randomised Language Modelling for Statistical Machine Translation},
+	author={David {Talbot} and Miles {Osborne}},
+	booktitle={Proceedings of the 45th Annual Meeting of the Association of Computational Linguistics},
+	pages={512--519},
+	year={2007}
+}
+
+@inproceedings{talbot2007smoothed,
+	title={Smoothed Bloom Filter Language Models: Tera-Scale LMs on the Cheap},
+	author={David {Talbot} and Miles {Osborne}},
+	booktitle={Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL)},
+	pages={468--476},
+	year={2007}
+}
+
+@article{jing2019a,
+	title={A Survey on Neural Network Language Models},
+	author={Kun {Jing} and Jungang {Xu}},
+	journal={arXiv preprint arXiv:1906.03591},
+	year={2019}
+}
+
+@article{bengio2003a,
+	title={A neural probabilistic language model},
+	author={Yoshua {Bengio} and Réjean {Ducharme} and Pascal {Vincent} and Christian {Jauvin}},
+	journal={Journal of Machine Learning Research},
+	volume={3},
+	number={6},
+	pages={1137--1155},
+	year={2003}
+}
+
+@inproceedings{mikolov2010recurrent,
+  author    = {Tomas Mikolov and
+               Martin Karafi{\'{a}}t and
+               Luk{\'{a}}{\v{s}} Burget and
+               Jan {\v{C}}ernock{\'{y}} and
+               Sanjeev Khudanpur},
+  title     = {Recurrent neural network based language model},
+  booktitle = {{INTERSPEECH} 2010, 11th Annual Conference of the International Speech
+               Communication Association, Makuhari, Chiba, Japan, September 26-30,
+               2010},
+  pages     = {1045--1048},
+  publisher = {{ISCA}},
+  year      = {2010}
+}
+
+@inproceedings{sundermeyer2012lstm,
+	title={LSTM Neural Networks for Language Modeling},
+	author={Martin {Sundermeyer} and Ralf {Schlüter} and Hermann {Ney}},
+	booktitle={INTERSPEECH},
+	pages={194--197},
+	year={2012}
+}
+
+@inproceedings{vaswani2017attention,
+	title={Attention is All You Need},
+	author={Ashish {Vaswani} and Noam {Shazeer} and Niki {Parmar} and Jakob {Uszkoreit} and Llion {Jones} and Aidan N. {Gomez} and Lukasz {Kaiser} and Illia {Polosukhin}},
+	booktitle={Proceedings of the 31st International Conference on Neural Information Processing Systems},
+	pages={5998--6008},
+	year={2017}
+}
+
+@inproceedings{tillmann1997a,
+	title={A DP-based Search Using Monotone Alignments in Statistical Translation},
+	author={Christoph {Tillmann} and Stephan {Vogel} and Hermann {Ney} and Alex {Zubiaga}},
+	booktitle={Proceedings of the 35th Annual Meeting of the Association for Computational Linguistics},
+	pages={289--296},
+	year={1997}
+}
+
+@inproceedings{DBLP:conf/acl/WangW97,
+  author    = {Ye{-}Yi Wang and
+               Alex Waibel},
+  editor    = {Philip R. Cohen and
+               Wolfgang Wahlster},
+  title     = {Decoding Algorithm in Statistical Machine Translation},
+  booktitle = {35th Annual Meeting of the Association for Computational Linguistics
+               and 8th Conference of the European Chapter of the Association for
+               Computational Linguistics, Proceedings of the Conference, 7-12 July
+               1997, Universidad Nacional de Educaci{\'{o}}n a Distancia (UNED),
+               Madrid, Spain},
+  pages     = {366--372},
+  publisher = {Morgan Kaufmann Publishers / {ACL}},
+  year      = {1997}
+}
+
+@inproceedings{DBLP:conf/acl/OchUN01,
+  author    = {Franz Josef Och and
+               Nicola Ueffing and
+               Hermann Ney},
+  title     = {An Efficient A* Search Algorithm for Statistical Machine Translation},
+  booktitle = {Proceedings of the {ACL} Workshop on Data-Driven Methods in Machine
+               Translation, Toulouse, France, July 7, 2001},
+  year      = {2001}
+}
+
+@inproceedings{germann2001fast,
+	title={Fast Decoding and Optimal Decoding for Machine Translation},
+	author={Ulrich {Germann} and Michael {Jahr} and Kevin {Knight} and Daniel {Marcu} and Kenji {Yamada}},
+	booktitle={Proceedings of 39th Annual Meeting of the Association for Computational Linguistics},
+	pages={228--235},
+	year={2001}
+}
+
+@inproceedings{germann2003greedy,
+	title={Greedy decoding for statistical machine translation in almost linear time},
+	author={Ulrich {Germann}},
+	booktitle={Proceedings of the 2003 Conference of the North American Chapter of the Association for Computational Linguistics on Human Language Technology - Volume 1},
+	pages={1--8},
+	year={2003}
+}
+
+@inproceedings{bangalore2001a,
+	title={A finite-state approach to machine translation},
+	author={S. {Bangalore} and G. {Riccardi}},
+	booktitle={IEEE Workshop on Automatic Speech Recognition and Understanding, 2001. ASRU '01.},
+	pages={381--388},
+	year={2001}
+}
+
+@inproceedings{bangalore2000stochastic,
+	title={Stochastic finite-state models for spoken language machine translation},
+	author={Srinivas {Bangalore} and Giuseppe {Riccardi}},
+	booktitle={Proceedings of the 2000 NAACL-ANLP Workshop on Embedded Machine Translation Systems - Volume 5},
+	pages={52--59},
+	year={2000}
+}
+
+@inproceedings{venugopal2007an,
+	title={An Efficient Two-Pass Approach to Synchronous-CFG Driven Statistical MT},
+	author={Ashish {Venugopal} and Andreas {Zollmann} and Stephan {Vogel}},
+	booktitle={Human Language Technologies 2007: The Conference of the North American Chapter of the Association for Computational Linguistics; Proceedings of the Main Conference},
+	pages={500--507},
+	year={2007}
+}

 %%%%% chapter 2------------------------------------------------------
 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%