Merge branch 'caorunzhe' into 'mengxia'

Caorunzhe

See merge request !1131

Commit e8548365 authored Sep 14, 2021 by 孟霞
--- a/Chapter7/chapter7.tex
+++ b/Chapter7/chapter7.tex
@@ -914,7 +914,7 @@ dr & = & {\rm{start}}_i-{\rm{end}}_{i-1}-1
 \vspace{0.5em}
 \item 统计机器翻译的成功很大程度上来自判别模型引入任意特征的能力。因此，在统计机器翻译时代，很多工作都集中在新特征的设计上。比如，可以基于不同的统计特征和先验知识设计翻译特征\upcite{och2004smorgasbord,Chiang200911,gildea2003loosely}，也可以模仿分类任务设计大规模的稀疏特征\upcite{DBLP:conf/emnlp/ChiangMR08}。模型训练和特征权重调优也是统计机器翻译中的重要问题，除了最小错误率训练，还有很多方法，比如，最大似然估计\upcite{koehn2003statistical,DBLP:journals/coling/BrownPPM94}、判别式方法\upcite{Blunsom2008A}、贝叶斯方法\upcite{Blunsom2009A,Cohn2009A}、最小风险训练\upcite{smith2006minimum,li2009first}、基于Margin的方法\upcite{watanabe2007online,Chiang200911}以及基于排序模型的方法（PRO）\upcite{Hopkins2011Tuning,dreyer2015apro}。实际上，统计机器翻译的训练和解码也存在不一致的问题，比如，特征值由双语数据上的极大似然估计得到（没有剪枝），而解码时却使用束剪枝，而且模型的目标是最大化机器翻译评价指标。对于这个问题也可以通过调整训练的目标函数进行缓解\upcite{XiaoA,marcu2006practical}。
 \vspace{0.5em}
-\item 短语表是基于短语的系统中的重要模块。但是，简单地利用基于频次的方法估计得到的翻译概率无法很好地处理低频短语。这时就需要对短语表进行平滑\upcite{DBLP:conf/iwslt/ZensN08,DBLP:conf/emnlp/SchwenkCF07,boxing2011unpacking,DBLP:conf/coling/DuanSZ10}。另一方面，随着数据量的增长和抽取短语长度的增大，短语表的体积会急剧膨胀，这也大大增加了系统的存储消耗，同时过大的短语表也会带来短语查询效率的下降。针对这个问题，很多工作尝试对短语表进行压缩。一种思路是限制短语的长度\upcite{DBLP:conf/naacl/QuirkM06,DBLP:journals/coling/MarinoBCGLFC06}；另一种广泛使用的思路是使用一些指标或者分类器来对短语进行剪枝，其核心思想是判断每个短语的质量\upcite{DBLP:conf/emnlp/ZensSX12}，并过滤掉低质量的短语。代表性的方法有：基于假设检验的剪枝\upcite{DBLP:conf/emnlp/JohnsonMFK07}、基于熵的剪枝\upcite{DBLP:conf/emnlp/LingGTB12}、两阶段短语抽取方法\upcite{DBLP:conf/naacl/ZettlemoyerM07}、基于解码中短语使用频率的方法\upcite{DBLP:conf/naacl/EckVW07}等。此外，短语表的存储方式也是在实际使用中需要考虑的问题。因此，也有研究者尝试使用更加紧凑、高效的结构保存短语表。其中最具代表性的结构是后缀数组（Suffix Arrays），这种结构可以充分利用短语之间有重叠的性质，减少了重复存储\upcite{DBLP:conf/acl/Callison-BurchBS05,DBLP:conf/acl/Callison-BurchBS05,DBLP:conf/naacl/ZensN07,2014Dynamic}。
+\item 短语表是基于短语的系统中的重要模块。但是，简单地利用基于频次的方法估计得到的翻译概率无法很好地处理低频短语。这时就需要对短语表进行平滑\upcite{DBLP:conf/iwslt/ZensN08,DBLP:conf/emnlp/SchwenkCF07,boxing2011unpacking,DBLP:conf/coling/DuanSZ10}。另一方面，随着数据量的增长和抽取短语长度的增大，短语表的体积会急剧膨胀，这也大大增加了系统的存储消耗，同时过大的短语表也会带来短语查询效率的下降。针对这个问题，很多工作尝试对短语表进行压缩。一种思路是限制短语的长度\upcite{DBLP:conf/naacl/QuirkM06,DBLP:journals/coling/MarinoBCGLFC06}；另一种广泛使用的思路是使用一些指标或者分类器来对短语进行剪枝，其核心思想是判断每个短语的质量\upcite{DBLP:conf/emnlp/ZensSX12}，并过滤掉低质量的短语。代表性的方法有：基于假设检验的剪枝\upcite{DBLP:conf/emnlp/JohnsonMFK07}、基于熵的剪枝\upcite{DBLP:conf/emnlp/LingGTB12}、两阶段短语抽取方法\upcite{DBLP:conf/naacl/ZettlemoyerM07}、基于解码中短语使用频率的方法\upcite{DBLP:conf/naacl/EckVW07}等。此外，短语表的存储方式也是在实际使用中需要考虑的问题。因此，也有研究者尝试使用更加紧凑、高效的结构保存短语表。其中最具代表性的结构是后缀数组（Suffix Arrays），这种结构可以充分利用短语之间有重叠的性质，减少了重复存储\upcite{DBLP:conf/acl/Callison-BurchBS05,DBLP:conf/naacl/McNamee-and-Mayfield06,DBLP:conf/naacl/ZensN07,2014Dynamic}。
 \vspace{0.5em}
 \end{itemize}

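The paragraph revised in the Chapter 7 hunk notes that suffix arrays exploit the overlap between phrases so that they are not stored repeatedly. As a rough sketch of that idea (not the implementation from any of the cited papers), the Python below keeps a tokenized corpus once, sorts the start positions of its suffixes, and finds every occurrence of a query phrase by binary search. Phrase counts can then be computed on demand instead of being read from a precomputed phrase table. The sketch is monolingual, uses a naive construction suitable only for toy corpora, and the names `build_suffix_array` and `find_phrase` are hypothetical.

```python
from typing import List


def build_suffix_array(tokens: List[str]) -> List[int]:
    """Return the start positions of all suffixes of `tokens`, sorted lexicographically.

    Naive construction: every suffix is materialized as a list, so time and
    memory are roughly quadratic. Fine for illustration, not for real corpora.
    """
    return sorted(range(len(tokens)), key=lambda i: tokens[i:])


def find_phrase(tokens: List[str], sa: List[int], phrase: List[str]) -> List[int]:
    """Return the sorted corpus positions where `phrase` occurs.

    Because suffixes are sorted, their first-k-token prefixes are also in
    non-decreasing order, so all suffixes starting with `phrase` form one
    contiguous block of `sa` that two binary searches can locate.
    """
    k = len(phrase)

    # Lower bound: first suffix whose k-token prefix is >= phrase.
    lo, hi = 0, len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if tokens[sa[mid]:sa[mid] + k] < phrase:
            lo = mid + 1
        else:
            hi = mid
    start = lo

    # Upper bound: first suffix whose k-token prefix is > phrase.
    hi = len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if tokens[sa[mid]:sa[mid] + k] <= phrase:
            lo = mid + 1
        else:
            hi = mid

    # Occurrences are the suffixes in [start, lo); report them in corpus order.
    return sorted(sa[start:lo])


if __name__ == "__main__":
    corpus = "the cat sat on the mat the cat ran".split()
    sa = build_suffix_array(corpus)
    print(find_phrase(corpus, sa, ["the", "cat"]))  # [0, 6]
```

The corpus is stored only once; any phrase, of any length, is a contiguous range of the sorted suffix array, which is why no explicit list of extracted phrases has to be kept.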
--- a/bibliography.bib
+++ b/bibliography.bib
@@ -145,44 +145,7 @@ new
 	pages={75--102},
 	year={1993}
 }
-@inproceedings{Wu2016GooglesNM,
-  author    = {Yonghui Wu and
-               Mike Schuster and
-               Zhifeng Chen and
-               Quoc V. Le and
-               Mohammad Norouzi and
-               Wolfgang Macherey and
-               Maxim Krikun and
-               Yuan Cao and
-               Qin Gao and
-               Klaus Macherey and
-               Jeff Klingner and
-               Apurva Shah and
-               Melvin Johnson and
-               Xiaobing Liu and
-               Lukasz Kaiser and
-               Stephan Gouws and
-               Yoshikiyo Kato and
-               Taku Kudo and
-               Hideto Kazawa and
-               Keith Stevens and
-               George Kurian and
-               Nishant Patil and
-			   Wei Wang and
-               Cliff Young and
-               Jason Smith and
-               Jason Riesa and
-               Alex Rudnick and
-               Oriol Vinyals and
-               Greg Corrado and
-               Macduff Hughes and
-               Jeffrey Dean},
-  title     = {Google's Neural Machine Translation System: Bridging the Gap between
-               Human and Machine Translation},
-  publisher   = {CoRR},
-  volume    = {abs/1609.08144},
-  year      = {2016}
-}
 @inproceedings{DBLP:journals/corr/LuongPM15,
  author    = {Thang Luong and
               Hieu Pham and
@@ -3015,7 +2978,7 @@ year = {2012}
  publisher = {Annual Meeting of the Association for Computational Linguistics},
  year      = {2005}
 }
-@inproceedings{DBLP:conf/acl/Callison-BurchBS05,
+@inproceedings{DBLP:conf/naacl/McNamee-and-Mayfield06,
  author    = {Paul McNamee and James Mayfield},
  title     = {Translation of Multiword Expressions Using Parallel Suffix Arrays},
  publisher = {Association for Machine Translation in the Americas},
@@ -3351,16 +3314,7 @@ year = {2012}
  publisher = {Annual Meeting of the Association for Computational Linguistics},
  year      = {2006}
 }
-@inproceedings{DBLP:conf/emnlp/DeNeefeKWM07,
-  author    = {Steve DeNeefe and
-               Kevin Knight and
-			   Wei Wang and
-               Daniel Marcu},
-  title     = {What Can Syntax-Based {MT} Learn from Phrase-Based MT?},
-  pages     = {755--763},
-  publisher = {Annual Meeting of the Association for Computational Linguistics},
-  year      = {2007}
-}
 @inproceedings{DBLP:conf/wmt/LiuG08,
  author    = {Ding Liu and
               Daniel Gildea},
@@ -7781,20 +7735,7 @@ author    = {Zhuang Liu and
  volume    = {abs/2006.10369},
  year      = {2020}
 }
-@inproceedings{DBLP:conf/aclnmt/HuLLLLWXZ20,
-  author    = {Chi Hu and
-               Bei Li and
-               Yinqiao Li and
-               Ye Lin and
-               Yanyang Li and
-               Chenglong Wang and
-               Tong Xiao and
-               Jingbo Zhu},
-  title     = {The NiuTrans System for WNGT 2020 Efficiency Task},
-  pages     = {204--210},
-  publisher = {Annual Meeting of the Association for Computational Linguistics},
-  year      = {2020}
-}
 @inproceedings{DBLP:journals/corr/abs-2010-02416,
  author    = {Yi-Te Hsu and
               Sarthak Garg and
@@ -7926,7 +7867,7 @@ author    = {Zhuang Liu and
 			   Xiao, Tong  and  
 			   Zhu, Jingbo},
  title     = {The NiuTrans Machine Translation Systems for WMT20},
-  month          = {November},
+  month          = {11},
  year           = {2020},
  publisher      = {Annual Meeting of the Association for Computational Linguistics},
  pages     = {336--343}
@@ -9232,21 +9173,7 @@ author    = {Zhuang Liu and
  publisher = {{IEEE} Conference on Computer Vision and Pattern Recognition},
  year      = {2017}
 }
-@inproceedings{DBLP:conf/coling/XuHJFWHJXZ20,
-  author    = {Chen Xu and
-               Bojie Hu and
-               Yufan Jiang and
-               Kai Feng and
-               Zeyang Wang and
-               Shen Huang and
-               Qi Ju and
-               Tong Xiao and
-               Jingbo Zhu},
-  title     = {Dynamic Curriculum Learning for Low-Resource Neural Machine Translation},
-  pages     = {3977--3989},
-  publisher = {International Conference on Computational Linguistics},
-  year      = {2020}
-}
 @inproceedings{DBLP:conf/acml/WuXTZQLL18,
  author    = {Lijun Wu and
               Yingce Xia and
@@ -9341,15 +9268,7 @@ author    = {Zhuang Liu and
  publisher = {Conference on Empirical Methods in Natural Language Processing},
  year      = {2007}
 }
-@inproceedings{DBLP:conf/emnlp/ShiPK16,
-  author    = {Xing Shi and
-               Inkit Padhi and
-               Kevin Knight},
-  title     = {Does String-Based Neural {MT} Learn Source Syntax?},
-  pages     = {1526--1534},
-  publisher = {Conference on Empirical Methods in Natural Language Processing},
-  year      = {2016}
-}
 @inproceedings{tu2017neural,
  title={Neural machine translation with reconstruction},
  author={Tu, Zhaopeng and Liu, Yang and Shang, Lifeng and Liu, Xiaohua and Li, Hang},
@@ -10768,12 +10687,6 @@ author    = {Zhuang Liu and
  publisher = {Asian Federation of Natural Language Processing},
  year      = {2017}
 }
-@inproceedings{2018When,
-  title={When and Why are Pre-trainedWord Embeddings Useful for Neural Machine Translation?},
-  author={ Qi, Ye  and  Sachan, Devendra Singh  and  Felix, Matthieu  and  Padmanabhan, Sarguna Janani  and  Neubig, Graham },
-  publisher = {Annual Conference of the North American Chapter of the Association for Computational Linguistics},
-  year={2018},
-}
 @inproceedings{DBLP:conf/acl/PetersABP17,
  author    = {Matthew Peters and
               Waleed Ammar and