Merge branch 'caorunzhe' into 'mengxia'

Caorunzhe: See merge request !279

Commit 72b8e7c9 authored Sep 25, 2020 by 孟霞
--- a/Chapter10/chapter10.tex
+++ b/Chapter10/chapter10.tex
@@ -1245,7 +1245,7 @@ L(\vectorn{\emph{Y}},\widehat{\vectorn{\emph{Y}}}) = \sum_{j=1}^n L_{\textrm{ce}
 %    NEW SECTION
 %----------------------------------------------------------------------------------------
 \sectionnewpage
-\section{小节及深入阅读}
+\section{小节及拓展阅读}

 \parinterval 神经机器翻译是近几年的热门方向。无论是前沿性的技术探索，还是面向应用落地的系统研发，神经机器翻译已经成为当下最好的选择之一。研究人员对神经机器翻译的热情使得这个领域得到了快速的发展。本章作为神经机器翻译的入门章节，对神经机器翻译的建模思想和基础框架进行了描述。同时，对常用的神经机器翻译架构\ \dash \ 循环神经网络进行了讨论与分析。


--- a/Chapter12/chapter12.tex
+++ b/Chapter12/chapter12.tex
@@ -573,7 +573,7 @@ Transformer Deep（48层） & 30.2            & 43.1            & 194$\times 10^
 %----------------------------------------------------------------------------------------
 %    NEW SECTION  12.3
 %----------------------------------------------------------------------------------------
-\section{小结及深入阅读}
+\section{小结及拓展阅读}

 \parinterval 编码器-解码器框架提供了一个非常灵活的机制，因为开发者只需要设计编码器和解码器的结构就能完成机器翻译。但是，架构的设计是深度学习中最具挑战的工
 作，优秀的架构往往需要长时间的探索和大量的实验验证，而且还需要一点点 “灵感”。前面介绍的基于循环神经网络的翻译模型和注意力机制就是研究人员通过长期

--- a/Chapter7/chapter7.tex
+++ b/Chapter7/chapter7.tex
@@ -579,11 +579,11 @@ dr = start_i-end_{i-1}-1

 \parinterval 对于每种调序类型，都可以定义一个调序概率，如下：
 \begin{eqnarray}
-\funp{P}(\mathbf{o}|\seq{s},\seq{t},\seq{a}) = \prod_{i=1}^{K} \funp{P}(o_i| \bar{s}_{a_i}, \bar{t}_i, a_{i-1}, a_i)
+\funp{P}(\seq{o}|\seq{s},\seq{t},\seq{a}) = \prod_{i=1}^{K} \funp{P}(o_i| \bar{s}_{a_i}, \bar{t}_i, a_{i-1}, a_i)
 \label{eq:7-16}
 \end{eqnarray}

-\noindent 其中，$o_i$表示（目标语言）第$i$个短语的调序方向，$\mathbf{o}=\{o_i\}$表示短语序列的调序方向，$K$表示短语的数量。短语之间的调序概率是由双语短语以及短语对齐决定的，$o$表示调序的种类，可以取M、S、D 中的任意一种。而整个句子调序的好坏就是把相邻的短语之间的调序概率相乘（对应取log后的加法）。这样，公式\eqref{eq:7-16}把调序的好坏定义为新的特征，对于M、S、D总共就有三个特征。除了当前短语和前一个短语的调序特征，还可以定义当前短语和后一个短语的调序特征，即将上述公式中的$a_{i-1}$换成$a_{i+1}$。 于是，又可以得到三个特征。因此在MSD调序中总共可以有6个特征。
+\noindent 其中，$o_i$表示（目标语言）第$i$个短语的调序方向，$\seq{o}=\{o_i\}$表示短语序列的调序方向，$K$表示短语的数量。短语之间的调序概率是由双语短语以及短语对齐决定的，$o$表示调序的种类，可以取M、S、D 中的任意一种。而整个句子调序的好坏就是把相邻的短语之间的调序概率相乘（对应取log后的加法）。这样，公式\eqref{eq:7-16}把调序的好坏定义为新的特征，对于M、S、D总共就有三个特征。除了当前短语和前一个短语的调序特征，还可以定义当前短语和后一个短语的调序特征，即将上述公式中的$a_{i-1}$换成$a_{i+1}$。 于是，又可以得到三个特征。因此在MSD调序中总共可以有6个特征。

 \parinterval 具体实现时，通常使用词对齐对两个短语间的调序关系进行判断。图\ref{fig:7-22}展示了这个过程。先判断短语的左上角和右上角是否存在词对齐，再根据其位置对调序类型进行划分。每个短语对应的调序概率都可以用相对频次估计进行计算。而MSD调序模型也相当于在短语表中的每个双语短语后添加6个特征。不过，调序模型一般并不会和短语表一起存储，因此在系统中通常会看到两个独立的模型文件，分别保存短语表和调序模型。


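Reviewer note on the Chapter 7 hunk: the paragraph around eq. 7-16 describes classifying each phrase as M, S, or D and estimating the orientation probabilities by relative frequency. A minimal Python sketch of that idea follows. It is an illustration only, not the book's implementation: the function names, the phrase-span orientation test (the text describes a word-alignment-based check instead), and the start-of-sentence span `(-1, -1)` are all assumptions made here.

```python
from collections import Counter, defaultdict


def orientation(prev_span, cur_span):
    """Classify the current phrase's orientation relative to the previous one.

    Spans are inclusive (start, end) source-side positions. This phrase-span
    test is a simplification; the text describes a word-alignment-based check.
    """
    prev_start, prev_end = prev_span
    cur_start, cur_end = cur_span
    if cur_start == prev_end + 1:
        return "M"  # monotone: continues right after the previous phrase
    if cur_end == prev_start - 1:
        return "S"  # swap: sits immediately before the previous phrase
    return "D"      # discontinuous: anything else


def estimate_msd(observations):
    """Relative-frequency estimate of P(o | source phrase, target phrase).

    observations: iterable of (src_phrase, tgt_phrase, orientation) triples.
    """
    counts = defaultdict(Counter)
    for src, tgt, o in observations:
        counts[(src, tgt)][o] += 1
    return {
        pair: {o: c[o] / sum(c.values()) for o in "MSD"}
        for pair, c in counts.items()
    }


def reordering_score(probs, derivation):
    """Multiply per-phrase orientation probabilities, as in eq. 7-16.

    derivation: list of (src_phrase, tgt_phrase, src_span) in target order.
    The start-of-sentence span (-1, -1) is an assumption of this sketch.
    """
    score, prev_span = 1.0, (-1, -1)
    for src, tgt, span in derivation:
        score *= probs[(src, tgt)][orientation(prev_span, span)]
        prev_span = span
    return score
```

In a log-linear model the product would normally be replaced by a sum of log-probabilities, one feature per orientation type, which matches the "three features for M, S, D" remark in the paragraph.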
--- a/Chapter9/chapter9.tex
+++ b/Chapter9/chapter9.tex
@@ -2301,7 +2301,7 @@ Jobs was the CEO of {\red{\underline{apple}}}.
 %----------------------------------------------------------------------------------------

 \sectionnewpage
-\section{小结及深入阅读}
+\section{小结及拓展阅读}

 \parinterval  神经网络为解决自然语言处理问题提供了全新的思路。而所谓深度学习也是建立在多层神经网络结构之上的一系列模型和方法。本章从神经网络的基本概念到其在语言建模中的应用进行了概述。由于篇幅所限，这里无法覆盖所有神经网络和深度学习的相关内容，感兴趣的读者可以进一步阅读\textit{Neural Network Methods in Natural Language Processing}\cite{goldberg2017neural}和\textit{Deep Learning}\cite{Goodfellow-et-al-2016}。此外，也有一些研究方向值得关注：


--- a/ChapterAppend/chapterappend.tex
+++ b/ChapterAppend/chapterappend.tex
@@ -193,8 +193,8 @@ a(i|j,m,l) &=\frac{c(i|j;\mathbf{s},\mathbf{t})}  {\sum_{i}c(i|j;\mathbf{s},\mat
 对于由$K$个样本组成的训练集$\{(\mathbf{s}^{[1]},\mathbf{t}^{[1]}),...,(\mathbf{s}^{[K]},\mathbf{t}^{[K]})\}$，可以将M-Step的计算调整为：

 \begin{eqnarray}
-f(s_u|t_v) &=\frac{\sum_{k=0}^{K}c_{\mathbb{E}}(s_u|t_v;\mathbf{s}^{[k]},\mathbf{t}^{[k]}) }    {\sum_{s_u} \sum_{k=1}^{K} c_{\mathbb{E}}(s_u|t_v;\mathbf{s}^{[k]},\mathbf{t}^{[k]})} \\
-a(i|j,m,l) &=\frac{\sum_{k=0}^{K}c_{\mathbb{E}}(i|j;\mathbf{s}^{[k]},\mathbf{t}^{[k]})}  {\sum_{i}\sum_{k=1}^{K}c_{\mathbb{E}}(i|j;\mathbf{s}^{[k]},\mathbf{t}^{[k]})}
+f(s_u|t_v) &=\frac{\sum_{k=1}^{K}c_{\mathbb{E}}(s_u|t_v;\mathbf{s}^{[k]},\mathbf{t}^{[k]}) }    {\sum_{s_u} \sum_{k=1}^{K} c_{\mathbb{E}}(s_u|t_v;\mathbf{s}^{[k]},\mathbf{t}^{[k]})} \\
+a(i|j,m,l) &=\frac{\sum_{k=1}^{K}c_{\mathbb{E}}(i|j;\mathbf{s}^{[k]},\mathbf{t}^{[k]})}  {\sum_{i}\sum_{k=1}^{K}c_{\mathbb{E}}(i|j;\mathbf{s}^{[k]},\mathbf{t}^{[k]})}
 \label{eq:append-3}
 \end{eqnarray}

@@ -228,7 +228,7 @@ c(1|\mathbf{s},\mathbf{t}) & = & \sum_{\mathbf{a}}\big[\funp{P}_{\theta}(\mathbf
 \begin{eqnarray}
 t(s|t) & = & \lambda_{t}^{-1} \times \sum_{k=1}^{K}c(s|t;\mathbf{s}^{[k]},\mathbf{t}^{[k]}) \label{eq:1.7} \\
 d(j|i,m,l) & = & \mu_{iml}^{-1} \times \sum_{k=1}^{K}c(j|i,m,l;\mathbf{s}^{[k]},\mathbf{t}^{[k]}) \label{eq:1.8} \\
-n(\varphi|t) & = & \nu_{t}^{-1} \times \sum_{s=1}^{K}c(\varphi |t;\mathbf{s}^{[k]},\mathbf{t}^{[k]}) \label{eq:1.9} \\
+n(\varphi|t) & = & \nu_{t}^{-1} \times \sum_{k=1}^{K}c(\varphi |t;\mathbf{s}^{[k]},\mathbf{t}^{[k]}) \label{eq:1.9} \\
 p_x & = & \zeta^{-1} \sum_{k=1}^{K}c(x;\mathbf{s}^{[k]},\mathbf{t}^{[k]}) \label{eq:1.10}
 \end{eqnarray}


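Reviewer note on the ChapterAppend hunks: the index fixes (`k=0` to `k=1`, and `s=1` to `k=1`) make the numerator and denominator range over the same K samples. The sketch below shows the multi-sample M-step for the translation table under that reading. It is a hedged illustration; the function name and the input layout (one dict of expected counts per sentence pair) are assumptions, not taken from the book.

```python
from collections import defaultdict


def m_step_translation(expected_counts_per_pair):
    """Re-estimate f(s_u | t_v) from expected counts of K sentence pairs.

    expected_counts_per_pair: list of K dicts, one per training sample,
    each mapping (s_u, t_v) -> expected count c_E(s_u | t_v; s^[k], t^[k]).
    """
    totals = defaultdict(float)  # numerator: sum over k = 1..K
    norm = defaultdict(float)    # denominator: sum over s_u and k = 1..K
    for counts in expected_counts_per_pair:  # each sample visited exactly once
        for (s, t), c in counts.items():
            totals[(s, t)] += c
            norm[t] += c
    return {(s, t): c / norm[t] for (s, t), c in totals.items() if norm[t] > 0}
```

Because both sums run over the same samples, the resulting values for a fixed `t_v` sum to one, which is the property an off-by-one lower bound in the numerator would silently obscure on paper.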
--- a/bibliography.bib
+++ b/bibliography.bib
@@ -252,12 +252,11 @@
  year      = {2010}
 }

-@article{DBLP:journals/corr/abs-1709-07809,
+@book{DBLP:journals/corr/abs-1709-07809,
  author    = {Philipp Koehn},
  title     = {Neural Machine Translation},
-  journal   = {CoRR},
-  volume    = {abs/1709.07809},
-  year      = {2017}
+  publisher   = {Cambridge University Press},
+  year      = {2020}
 }

 @book{宗成庆2013统计自然语言处理,
@@ -841,8 +840,7 @@
 %%%%% chapter 3------------------------------------------------------

 @inproceedings{ng2002discriminative,
-  author    = {Andrew Y. Ng and
-               Michael I. Jordan},
+  author    = {Ng, Andrew Y and Jordan, Michael I},
  title     = {On Discriminative vs. Generative Classifiers: {A} comparison of logistic
               regression and naive Bayes},
  pages     = {841--848},
@@ -850,7 +848,6 @@
  year      = {2001},
 }

-
 @inproceedings{huang2008coling,
 	author = {Huang, Liang},
    title = {Coling 2008: Advanced Dynamic Programming in Computational Linguistics: Theory, Algorithms and Applications-Tutorial notes},
@@ -898,7 +895,7 @@

 @article{Baum1966Statistical,
  title={Statistical Inference for Probabilistic Functions of Finite State Markov Chains},
-  author={Baum, Leonard E. and Petrie, Ted},
+  author={Baum, Leonard E and Petrie, Ted},
  journal={Annals of Mathematical Stats},
  volume={37},
  number={6},
@@ -909,7 +906,7 @@
 @article{baum1970maximization,
  title={A maximization technique occurring in the statistical analysis of probabilistic functions of Markov chains},
  author={Baum, Leonard E and Petrie, Ted and Soules, George and Weiss, Norman},
-  journal={The annals of mathematical statistics},
+  journal={Annals of Mathematical Stats},
  volume={41},
  number={1},
  pages={164--171},
@@ -918,15 +915,17 @@

 @article{1977Maximum,
  title={Maximum likelihood from incomplete data via the EM algorithm},
-  author={ Dempster, A. P. },
-  journal={Journal of the Royal Statal Society},
+  author={Dempster, Arthur P and Laird, Nan M and Rubin, Donald B},
+  journal={Journal of the Royal Statistical Society: Series B (Methodological)},
  volume={39},
-  year={1977},
+  number={1},
+  pages={1--22},
+  year={1977}
 }

 @article{1967Error,
  title={Error bounds for convolutional codes and an asymptotically optimum decoding algorithm},
-  author={ Viterbi, Andrew J. },
+  author={Viterbi, Andrew},
  journal={IEEE Transactions on Information Theory},
  volume={13},
  number={2},
@@ -942,11 +941,10 @@
 }

 @inproceedings{brants-2000-tnt,
-    title = {{T}n{T} {--} A Statistical Part-of-Speech Tagger},
+    title = {TnT - {A} Statistical Part-of-Speech Tagger},
    author = {Brants, Thorsten},
-    month = apr,
    year = {2000},
-    publisher = {Association for Computational Linguistics},
+    publisher = {Annual Meeting of the Association for Computational Linguistics},
    pages = {224--231},
 }

@@ -954,7 +952,6 @@
    title = {Chunk Parsing Revisited},
    author = {Yoshimasa Tsuruoka and
               Jun'ichi Tsujii},
-    month = oct,
    year = {2005},
    publisher = {Annual Meeting of the Association for Computational Linguistics},
    pages = {133--140},
@@ -966,7 +963,6 @@
      Wang, Houfeng  and
      Yu, Shiwen  and
      Xin, Chengsheng},
-    month = jul,
    year = {2003},
    publisher = {Annual Meeting of the Association for Computational Linguistics},
    pages = {92--97},
@@ -1089,9 +1085,7 @@
 }

 @inproceedings{DBLP:conf/muc/BlackRM98,
-  author    = {William J. Black and
-               Fabio Rinaldi and
-               David Mowatt},
+  author    = {Black, William J and Rinaldi, Fabio and Mowatt, David},
  title     = {{FACILE:} Description of the {NE} System Used for {MUC-7}},
  publisher = {Annual Meeting of the Association for Computational Linguistics},
  year      = {1998},
@@ -1163,7 +1157,7 @@
  year={1996}
 }

-@article{mitchell1996m,
+@book{mitchell1996m,
  title={Machine Learning},
  author={Mitchell, Tom},
  journal={McCraw Hill},
@@ -1195,8 +1189,7 @@
  volume={153},
  number={3731},
  pages={34--37},
-  year={1966},
-  publisher={American Association for the Advancement of Science}
+  year={1966}
 }

 %%%%% chapter 3------------------------------------------------------
@@ -1422,8 +1415,7 @@
  journal={Computer Speech \& Language},
  volume={45},
  pages={180--200},
-  year={2017},
-  publisher={Elsevier}
+  year={2017}
 }
 @inproceedings{gamon2005sentence,
  title={Sentence-level MT evaluation without reference translations: Beyond language modeling},
@@ -1499,8 +1491,7 @@
  volume={27},
  number={3-4},
  pages={171--192},
-  year={2013},
-  publisher={Springer}
+  year={2013}
 }
 @inproceedings{DBLP:conf/wmt/BiciciW14,
  author    = {Ergun Bi{\c{c}}ici and
@@ -1801,7 +1792,7 @@
 @inproceedings{popovic2011human,
  title={From human to automatic error classification for machine translation output},
  author={Popovic, Maja and Burchardt, Aljoscha and others},
-  booktitle={European Association for Machine Translation},
+  publisher={European Association for Machine Translation},
  year={2011}
 }
 @article{DBLP:journals/mt/CostaLLCC15,
@@ -2219,6 +2210,7 @@ year = {2012}
 }
 @article{kepler2019unbabel,
  title={Unbabel's Participation in the WMT19 Translation Quality Estimation Shared Task},
+  pages={78--84},
  author={Kepler, F{\'a}bio and Tr{\'e}nous, Jonay and Treviso, Marcos and Vera, Miguel and G{\'o}is, Ant{\'o}nio and Farajian, M Amin and Lopes, Ant{\'o}nio V and Martins, Andr{\'e} FT},
  year={2019}
 }
@@ -2271,15 +2263,8 @@ year = {2012}
  year={2000},
  publisher={Pearson Education India}
 }
-@article{devlin2018bert,
-  title={Bert: Pre-training of deep bidirectional transformers for language understanding},
-  author={Devlin, Jacob and Chang, Ming-Wei and Lee, Kenton and Toutanova, Kristina},
-  journal={arXiv preprint arXiv:1810.04805},
-  year={2018}
-}
 %%%%% chapter 4------------------------------------------------------
 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
-
 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
 %%%%% chapter 5------------------------------------------------------
 @article{brown1990statistical,
@@ -2310,7 +2295,7 @@ year = {2012}
 }
 @article{shannon1949communication,
  title ={Communication theory of secrecy systems},
-  author ={Claude E. Shannon},
+  author ={Claude Elwood Shannon},
  journal ={Bell system technical journal},
  volume ={28},
  number ={4},
@@ -2416,7 +2401,7 @@ year = {2012}
  year={2009}
 }
 @article{DBLP:journals/coling/FraserM07,
-  author    = {Alexander M. Fraser and
+  author    = {Alexander Fraser and
               Daniel Marcu},
  title     = {Measuring Word Alignment Quality for Statistical Machine Translation},
  journal   = {Computational Linguistics},
@@ -2537,7 +2522,7 @@ year = {2012}
 }
 @article{xiao2013unsupervised,
  title ={Unsupervised sub-tree alignment for tree-to-tree translation},
-  author ={Xiao, Tong and Zhu, Jingbo},
+  author ={Tong Xiao and Jingbo Zhu},
  journal ={Journal of Artificial Intelligence Research},
  volume ={48},
  pages ={733--782},
@@ -2573,7 +2558,6 @@ year = {2012}
  publisher = {Annual Meeting of the Association for Computational Linguistics},
  year      = {2005},
 }
-
 %%%%% chapter 6------------------------------------------------------
 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

@@ -2947,7 +2931,7 @@ year = {2012}
 }

 @inproceedings{robert2007faster,
-  author    = {Robert C Moore and
+  author    = {Robert C. Moore and
               Chris Quirk},
  title     = {Faster Beam-Search Decoding for Phrasal Statistical Machine Translation},
  publisher = {Machine Translation Summit XI},
@@ -3177,7 +3161,7 @@ year = {2012}
 }
 @inproceedings{DBLP:conf/naacl/ZettlemoyerM07,
  author    = {Luke S. Zettlemoyer and
-               Robert Moore},
+               Robert C. Moore},
  title     = {Selective Phrase Pair Extraction for Improved Statistical Machine
               Translation},
  pages     = {209--212},
@@ -3393,7 +3377,7 @@ year = {2012}
 @inproceedings{charniak2006multilevel,
 	title={Multilevel Coarse-to-Fine PCFG Parsing},
 	author={Eugene {Charniak} and Mark {Johnson} and Micha {Elsner} and Joseph {Austerweil} and David {Ellis} and Isaac {Haxton} and Catherine {Hill} and R. {Shrivaths} and Jeremy {Moore} and Michael {Pozar} and Theresa {Vu}},
-	booktitle={Proceedings of the Human Language Technology Conference of the North American Chapter of the Association for Computational Linguistics},
+	publisher={Proceedings of the Human Language Technology Conference of the North American Chapter of the Association for Computational Linguistics},
 	pages={168--175},
 	year={2006}
 }
@@ -3769,9 +3753,9 @@ year = {2012}
 @inproceedings{bangalore2001computing,
  title ={Computing consensus translation from multiple machine translation systems},
  author ={Srinivas Bangalore, German Bordel and Giuseppe Riccardi},
+  publisher = {IEEE Workshop on Automatic Speech Recognition and Understanding},
  pages ={351--354},
-  year ={2001},
-  organization ={The Institute of Electrical and Electronics Engineers}
+  year ={2001}
 }
 @inproceedings{rosti2007combining,
  author    = {Antti-Veikko I. Rosti and
@@ -3809,7 +3793,7 @@ year = {2012}
               Mei Yang and
               Jianfeng Gao and
               Patrick Nguyen and
-               Robert Moore},
+               Robert C. Moore},
  title     = {Indirect-HMM-based Hypothesis Alignment for Combining Outputs from
               Machine Translation Systems},
  pages     = {98--107},