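\parinterval Building on wartime research in cryptography and communications, Claude Elwood Shannon proposed in 1948 to model the transmission of language as a ``noisy channel''. He also borrowed the notion of {\small\bfnew{entropy}}\index{熵}\index{Entropy} from thermodynamics to measure the amount of information in a message\upcite{DBLP:journals/bstj/Shannon48}. The following year, Shannon and Warren Weaver co-authored the well-known book \emph{The Mathematical Theory of Communication}\upcite{shannon1949the}. Together, these works laid the theoretical foundation for later statistical machine translation.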
\parinterval We begin with the law of total probability. The {\small\bfnew{law of total probability}}\index{全概率公式}\index{Law of Total Probability} is an important result in probability theory. It decomposes the probability of a complex event into a sum of the probabilities of smaller events, one for each possible case. The law relies on the notion of a partition. Events $\{B_1, \ldots, B_n\}$ form a partition of the set $\Sigma$ if $\bigcup_{i=1}^n B_i=\Sigma$ and $B_i \cap B_j=\varnothing$ for all $i,j=1, \ldots ,n$ with $i\neq j$. The law of total probability for an event $A$ can then be written as:
\{\text{<sos>}\ a, \text{<sos>}\ b, \text{<sos>}\ \text{<eos>}\}\nonumber
\end{eqnarray}
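\noindent This set can be partitioned into two parts. The first is the set of complete word sequences of length 0, $\{\text{<sos>}\ \text{<eos>}\}$. The second is the set of unfinished word-sequence prefixes of length 1, $\{\text{<sos>}\ a, \text{<sos>}\ b\}$. In the next step, each unfinished sequence is extended with every symbol in the vocabulary (here $a$, $b$, and $\text{<eos>}$), which generates: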
\begin{eqnarray}
\{\text{<sos>}\ a\ a, \text{<sos>}\ a\ b, \text{<sos>}\ a\ \text{<eos>}, \text{<sos>}\ b\ a, \text{<sos>}\ b\ b, \text{<sos>}\ b\ \text{<eos>}\}\nonumber
\end{eqnarray}
\parinterval This set in turn partitions into two parts. The first is the set of complete word sequences of length 1, $\{\text{<sos>}\ a\ \text{<eos>}, \text{<sos>}\ b\ \text{<eos>}\}$. The second is the set of unfinished prefixes of length 2, $\{\text{<sos>}\ a\ a, \text{<sos>}\ a\ b, \text{<sos>}\ b\ a, \text{<sos>}\ b\ b\}$. The process continues in the same way: unfinished sequences are extended again and again until their length reaches the maximum allowed length.
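\parinterval The enumeration procedure above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the book's own implementation. The function names \texttt{expand} and \texttt{enumerate\_sequences} are hypothetical, and so is the toy vocabulary $\{a, b\}$. The code also assumes a rule that a prefix already at the maximum length may only be followed by $\text{<eos>}$. As in the text, sequence length counts words only and excludes $\text{<sos>}$ and $\text{<eos>}$.

```python
from typing import List, Tuple

SOS, EOS = "<sos>", "<eos>"
Seq = Tuple[str, ...]


def expand(prefixes: List[Seq], words: List[str],
           allow_words: bool = True) -> Tuple[List[Seq], List[Seq]]:
    """One enumeration step (hypothetical helper).

    Extends every unfinished prefix with every vocabulary word and with
    <eos>, then splits the results into complete sequences (ending in
    <eos>) and unfinished prefixes. If allow_words is False, only <eos>
    is appended, which closes off prefixes at the maximum length.
    """
    symbols = words + [EOS] if allow_words else [EOS]
    complete: List[Seq] = []
    unfinished: List[Seq] = []
    for prefix in prefixes:
        for sym in symbols:
            seq = prefix + (sym,)
            if sym == EOS:
                complete.append(seq)
            else:
                unfinished.append(seq)
    return complete, unfinished


def enumerate_sequences(words: List[str], max_len: int) -> List[Seq]:
    """Collect every complete sequence with at most max_len words (hypothetical helper)."""
    results: List[Seq] = []
    frontier: List[Seq] = [(SOS,)]
    # At iteration `length`, every prefix in the frontier holds `length` words.
    for length in range(max_len + 1):
        done, frontier = expand(frontier, words, allow_words=length < max_len)
        results.extend(done)
    return results


if __name__ == "__main__":
    done, unfinished = expand([(SOS,)], ["a", "b"])
    print(done)        # [('<sos>', '<eos>')]
    print(unfinished)  # [('<sos>', 'a'), ('<sos>', 'b')]
    print(len(enumerate_sequences(["a", "b"], 2)))  # 7
```

\parinterval Calling \texttt{expand} on $\{\text{<sos>}\}$ reproduces the first partition described above. Calling it again on the resulting unfinished prefixes reproduces the second. A vocabulary of $|V|$ words yields $|V|^m$ complete sequences of length $m$, so exhaustive enumeration like this is only feasible for toy examples.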
//booktitle = {13th International Conference on Computational Linguistics, {COLING}
1990, University of Helsinki, Finland, August 20-25, 1990},
pages = {247--252},
year = {1990}
...
...
@@ -213,7 +213,7 @@
Daniele Pighin and
Yuval Marton},
title = {Effective Approaches to Attention-based Neural Machine Translation},
//booktitle = {Proceedings of the 2015 Conference on Empirical Methods in Natural
Language Processing, {EMNLP} 2015, Lisbon, Portugal, September 17-21,
2015},
pages = {1412--1421},
...
...
@@ -230,7 +230,7 @@
//editor = {Doina Precup and
Yee Whye Teh},
title = {Convolutional Sequence to Sequence Learning},
//booktitle = {Proceedings of the 34th International Conference on Machine Learning,
{ICML} 2017, Sydney, NSW, Australia, 6-11 August 2017},
series = {Proceedings of Machine Learning Research},
volume = {70},
...
...
@@ -246,7 +246,7 @@
//editor = {Yoshua Bengio and
Yann LeCun},
title = {Neural Machine Translation by Jointly Learning to Align and Translate},
//booktitle = {3rd International Conference on Learning Representations, {ICLR} 2015,
San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings},
year = {2015}
}
...
...
@@ -261,7 +261,7 @@
Neil D. Lawrence and
Kilian Q. Weinberger},
title = {Sequence to Sequence Learning with Neural Networks},
//booktitle = {Advances in Neural Information Processing Systems 27: Annual Conference
on Neural Information Processing Systems 2014, December 8-13 2014,
Montreal, Quebec, Canada},
pages = {3104--3112},
...
...
@@ -397,7 +397,7 @@
author = {Reinhard Kneser and
Hermann Ney},
title = {Improved backing-off for M-gram language modeling},
//booktitle = {1995 International Conference on Acoustics, Speech, and Signal Processing,
{ICASSP} '95, Detroit, Michigan, USA, May 08-12, 1995},
pages = {181--184},
publisher = {{IEEE} Computer Society},
...
...
@@ -407,7 +407,7 @@
@inproceedings{ney1991smoothing,
title={On smoothing techniques for bigram-based natural language modelling},
author={Ney, Hermann and Essen, Ute},
//booktitle={Acoustics, Speech, and Signal Processing, IEEE International Conference on},
pages={825--828},
year={1991},
organization={IEEE Computer Society}
...
...
@@ -418,7 +418,7 @@
//editor = {John H. L. Hansen and
Bryan L. Pellom},
title = {{SRILM} - an extensible language modeling toolkit},
//booktitle = {7th International Conference on Spoken Language Processing, {ICSLP2002}
- {INTERSPEECH} 2002, Denver, Colorado, USA, September 16-20, 2002},
publisher = {{ISCA}},
year = {2002}
...
...
@@ -553,7 +553,7 @@
Mingbo Ma},
title = {When to Finish? Optimal Beam Search for Neural Text Generation (modulo
beam size)},
//booktitle = {Proceedings of the 2017 Conference on Empirical Methods in Natural
Language Processing, {EMNLP} 2017, Copenhagen, Denmark, September
9-11, 2017},
pages = {2134--2139},
...
...
@@ -571,7 +571,7 @@
Jun'ichi Tsujii},
title = {Breaking the Beam Search Curse: {A} Study of (Re-)Scoring Methods
and Stopping Criteria for Neural Machine Translation},
//booktitle = {Proceedings of the 2018 Conference on Empirical Methods in Natural
Language Processing, Brussels, Belgium, October 31 - November 4, 2018},
pages = {3054--3059},
publisher = {Association for Computational Linguistics},
...
...
@@ -626,7 +626,7 @@
@inproceedings{kirchhoff2005improved,
title={Improved Language Modeling for Statistical Machine Translation},
author={Katrin {Kirchhoff} and Mei {Yang}},
//booktitle={Proceedings of the ACL Workshop on Building and Using Parallel Texts},
pages={125--128},
year={2005}
}
...
...
@@ -634,7 +634,7 @@
@inproceedings{koehn2007factored,
title={Factored Translation Models},
author={Philipp {Koehn} and Hieu {Hoang}},
//booktitle={Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL)},
pages={868--876},
year={2007}
}
...
...
@@ -642,7 +642,7 @@
@inproceedings{sarikaya2007joint,
title={Joint Morphological-Lexical Language Modeling for Machine Translation},
author={Ruhi {Sarikaya} and Yonggang {Deng}},
//booktitle={Human Language Technologies 2007: The Conference of the North American Chapter of the Association for Computational Linguistics; Companion Volume, Short Papers},
pages={145--148},
year={2007}
}
...
...
@@ -650,7 +650,7 @@
@inproceedings{heafield2011kenlm,
title={KenLM: Faster and Smaller Language Model Queries},
author={Kenneth {Heafield}},
//booktitle={Proceedings of the Sixth Workshop on Statistical Machine Translation},
pages={187--197},
year={2011}
}
...
...
@@ -658,7 +658,7 @@
@inproceedings{federico2006how,
title={How Many Bits Are Needed To Store Probabilities for Phrase-Based Translation?},
author={Marcello {Federico} and Nicola {Bertoldi}},
//booktitle={Proceedings on the Workshop on Statistical Machine Translation},
pages={94--101},
year={2006}
}
...
...
@@ -666,7 +666,7 @@
@inproceedings{federico2007efficient,
title={Efficient Handling of N-gram Language Models for Statistical Machine Translation},
author={Marcello {Federico} and Mauro {Cettolo}},
//booktitle={Proceedings of the Second Workshop on Statistical Machine Translation},
pages={88--95},
year={2007}
}
...
...
@@ -674,7 +674,7 @@
@inproceedings{talbot2007randomised,
title={Randomised Language Modelling for Statistical Machine Translation},
author={David {Talbot} and Miles {Osborne}},
//booktitle={Proceedings of the 45th Annual Meeting of the Association of Computational Linguistics},
pages={512--519},
year={2007}
}
...
...
@@ -682,7 +682,7 @@
@inproceedings{talbot2007smoothed,
title={Smoothed Bloom Filter Language Models: Tera-Scale LMs on the Cheap},
author={David {Talbot} and Miles {Osborne}},
//booktitle={Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL)},
pages={468--476},
year={2007}
}
...
...
@@ -714,7 +714,7 @@
Keikichi Hirose and
Satoshi Nakamura},
title = {Recurrent neural network based language model},
//booktitle = {{INTERSPEECH} 2010, 11th Annual Conference of the International Speech
Communication Association, Makuhari, Chiba, Japan, September 26-30,
2010},
pages = {1045--1048},
...
...
@@ -725,7 +725,7 @@
@inproceedings{sundermeyer2012lstm,
title={LSTM Neural Networks for Language Modeling.},
author={Martin {Sundermeyer} and Ralf {Schlüter} and Hermann {Ney}},
//booktitle={INTERSPEECH},
pages={194--197},
year={2012}
}
...
...
@@ -733,7 +733,7 @@
@inproceedings{vaswani2017attention,
title={Attention is All You Need},
author={Ashish {Vaswani} and Noam {Shazeer} and Niki {Parmar} and Jakob {Uszkoreit} and Llion {Jones} and Aidan N. {Gomez} and Lukasz {Kaiser} and Illia {Polosukhin}},
//booktitle={Proceedings of the 31st International Conference on Neural Information Processing Systems},
pages={5998--6008},
year={2017}
}
...
...
@@ -741,7 +741,7 @@
@inproceedings{tillmann1997a,
title={A DP-based Search Using Monotone Alignments in Statistical Translation},
author={Christoph {Tillmann} and Stephan {Vogel} and Hermann {Ney} and Alex {Zubiaga}},
//booktitle={Proceedings of the 35th Annual Meeting of the Association for Computational Linguistics},
pages={289--296},
year={1997}
}
...
...
@@ -752,7 +752,7 @@
//editor = {Philip R. Cohen and
Wolfgang Wahlster},
title = {Decoding Algorithm in Statistical Machine Translation},
//booktitle = {35th Annual Meeting of the Association for Computational Linguistics
and 8th Conference of the European Chapter of the Association for
Computational Linguistics, Proceedings of the Conference, 7-12 July
1997, Universidad Nacional de Educaci{\'{o}}n a Distancia (UNED),
...
...
@@ -767,7 +767,7 @@
Nicola Ueffing and
Hermann Ney},
title = {An Efficient A* Search Algorithm for Statistical Machine Translation},
//booktitle = {Proceedings of the {ACL} Workshop on Data-Driven Methods in Machine
Translation, Toulouse, France, July 7, 2001},
year = {2001}
}
...
...
@@ -775,7 +775,7 @@
@inproceedings{germann2001fast,
title={Fast Decoding and Optimal Decoding for Machine Translation},
author={Ulrich {Germann} and Michael {Jahr} and Kevin {Knight} and Daniel {Marcu} and Kenji {Yamada}},
//booktitle={Proceedings of 39th Annual Meeting of the Association for Computational Linguistics},
pages={228--235},
year={2001}
}
...
...
@@ -783,7 +783,7 @@
@inproceedings{germann2003greedy,
title={Greedy decoding for statistical machine translation in almost linear time},
author={Ulrich {Germann}},
//booktitle={NAACL '03 Proceedings of the 2003 Conference of the North American Chapter of the Association for Computational Linguistics on Human Language Technology - Volume 1},
pages={1--8},
year={2003}
}
...
...
@@ -807,7 +807,7 @@
Antal van den Bosch and
Annie Zaenen},
title = {Moses: Open Source Toolkit for Statistical Machine Translation},
//booktitle = {{ACL} 2007, Proceedings of the 45th Annual Meeting of the Association
for Computational Linguistics, June 23-30, 2007, Prague, Czech Republic},
publisher = {The Association for Computational Linguistics},
year = {2007}
...
...
@@ -819,7 +819,7 @@
Kathryn Taylor},
title = {Pharaoh: {A} Beam Search Decoder for Phrase-Based Statistical Machine
Translation Models},
//booktitle = {Machine Translation: From Real Users to Research, 6th Conference of
the Association for Machine Translation in the Americas, {AMTA} 2004,
Washington, DC, USA, September 28-October 2, 2004, Proceedings},
series = {Lecture Notes in Computer Science},
...
...
@@ -832,7 +832,7 @@
@inproceedings{bangalore2001a,
title={A finite-state approach to machine translation},
author={S. {Bangalore} and G. {Riccardi}},
//booktitle={IEEE Workshop on Automatic Speech Recognition and Understanding, 2001. ASRU '01.},
pages={381--388},
year={2001}
}
...
...
@@ -840,7 +840,7 @@
@inproceedings{bangalore2000stochastic,
title={Stochastic finite-state models for spoken language machine translation},
author={Srinivas {Bangalore} and Giuseppe {Riccardi}},
//booktitle={NAACL-ANLP-EMTS '00 Proceedings of the 2000 NAACL-ANLP Workshop on Embedded machine translation systems - Volume 5},
pages={52--59},
year={2000}
}
...
...
@@ -848,7 +848,7 @@
@inproceedings{venugopal2007an,
title={An Efficient Two-Pass Approach to Synchronous-CFG Driven Statistical MT},
author={Ashish {Venugopal} and Andreas {Zollmann} and Stephan {Vogel}},
//booktitle={Human Language Technologies 2007: The Conference of the North American Chapter of the Association for Computational Linguistics; Proceedings of the Main Conference},
pages={500--507},
year={2007}
}
...
...
@@ -864,7 +864,7 @@
Christof Monz},
title = {The Syntax Augmented {MT} {(SAMT)} System at the Shared Task for the
2007 {ACL} Workshop on Statistical Machine Translation},
//booktitle = {Proceedings of the Second Workshop on Statistical Machine Translation,
WMT@ACL 2007, Prague, Czech Republic, June 23, 2007},
pages = {216--219},
publisher = {Association for Computational Linguistics},
...
...
@@ -879,7 +879,7 @@
Claire Cardie and
Pierre Isabelle},
title = {Tree-to-String Alignment Template for Statistical Machine Translation},
//booktitle = {{ACL} 2006, 21st International Conference on Computational Linguistics
and 44th Annual Meeting of the Association for Computational Linguistics,
Proceedings of the Conference, Sydney, Australia, 17-21 July 2006},
publisher = {The Association for Computer Linguistics},
...
...
@@ -899,7 +899,7 @@
Pierre Isabelle},
title = {Scalable Inference and Training of Context-Rich Syntactic Translation
Models},
//booktitle = {{ACL} 2006, 21st International Conference on Computational Linguistics
and 44th Annual Meeting of the Association for Computational Linguistics,
Proceedings of the Conference, Sydney, Australia, 17-21 July 2006},
publisher = {The Association for Computer Linguistics},
...
...
@@ -912,7 +912,7 @@
Hwee Tou Ng and
Kemal Oflazer},
title = {A Hierarchical Phrase-Based Model for Statistical Machine Translation},
//booktitle = {{ACL} 2005, 43rd Annual Meeting of the Association for Computational
Linguistics, Proceedings of the Conference, 25-30 June 2005, University