\item{\small\bfnew{Convolution layer}} and {\small\bfnew{gated linear unit}} (Gated Linear Units, GLU\index{Gated Linear Units, GLU}): the boxes with the yellow background are the convolution modules, which use gated linear units as the non-linear function; earlier work\upcite{Dauphin2017LanguageMW} showed that this kind of non-linearity is better suited to sequence modeling tasks. For simplicity the figure shows only a single convolution layer, but in practice several convolution layers are usually stacked in order to better capture sentence-level information.
\noindent where $\mathbi{A},\mathbi{B}\in\mathbb{R}^d$, $\mathbi{W}\in\mathbb{R}^{K\times d \times d}$, $\mathbi{V}\in\mathbb{R}^{K\times d \times d}$, and $\mathbi{b}_\mathbi{W},\mathbi{b}_\mathbi{V}\in\mathbb{R}^d$. Here $\mathbi{W}$ and $\mathbi{V}$ denote the convolution kernels, while $\mathbi{b}_\mathbi{W}$ and $\mathbi{b}_\mathbi{V}$ are bias vectors. After the convolution operation, a non-linear transformation is applied.
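\parinterval Following the standard definition of the gated linear unit\upcite{Dauphin2017LanguageMW}, this non-linear transformation can be written as follows, where $\sigma$ denotes the Sigmoid function and $\otimes$ element-wise multiplication:
\begin{equation}
\mathbi{y} = \mathbi{A} \otimes \sigma(\mathbi{B})
\end{equation}
\noindent In other words, one half of the convolution output acts as a gate that controls how much of the other half is passed on.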
\rule{0pt}{20pt}Layer Type &\begin{tabular}[l]{@{}l@{}}Complexity\\ per Layer\end{tabular}&\begin{tabular}[l]{@{}l@{}}Sequential\\ Operations\end{tabular}&\begin{tabular}[l]{@{}l@{}}Maximum\\ Path Length\end{tabular}\\\hline
\parinterval Researchers have analyzed in some detail how back-translation generates pseudo-bilingual data. The common view is that the better the reverse (target-to-source) translation model, the higher the quality of the generated pseudo data, and hence the larger the gain for the forward model ({\color{red} citation needed!}). One problem that back-translation faces is that the reverse translation model is trained only on limited bilingual data, so the quality of the pseudo source-language data it produces is hard to guarantee. To address this, {\small\sffamily\bfnew{Iterative Back Translation}}\index{迭代式回译}\index{Iterative Back Translation} can be used\upcite{DBLP:conf/aclnmt/HoangKHC18}: monolingual data on both the source-language and target-language sides is exploited, and back-translation is applied repeatedly to improve the forward and reverse translation models. Figure~\ref{fig:16-2-xc} shows the framework of iterative back-translation. First, a forward translation model is trained on the bilingual data; next, source-language monolingual data is back-translated to generate pseudo-bilingual data that improves the reverse translation model; then the reverse model and target-language monolingual data are used to generate pseudo-bilingual data that improves the forward model. This process forms a closed loop, so it can be repeated until neither the forward nor the reverse translation model improves any further, as in the sketch below.
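\parinterval The following Python fragment is only an illustration of this training schedule, not an implementation from any toolkit: the callables \texttt{train} and \texttt{translate} are assumed to be supplied by the user and stand for ``train an NMT model on (source, target) pairs'' and ``translate a list of sentences'', respectively, and the function name is chosen just for this sketch.

\begin{verbatim}
def iterative_back_translation(bitext, src_mono, tgt_mono,
                               train, translate, rounds=3):
    # bitext: list of (src, tgt) pairs; src_mono / tgt_mono: monolingual text
    forward = train(bitext)                          # source -> target
    backward = train([(t, s) for (s, t) in bitext])  # target -> source
    for _ in range(rounds):  # in practice: repeat until no further improvement
        # 1) back-translate source monolingual data with the forward model;
        #    the resulting pseudo pairs retrain the backward model
        pseudo_tgt = translate(forward, src_mono)
        backward = train([(t, s) for (s, t) in bitext]
                         + list(zip(pseudo_tgt, src_mono)))
        # 2) back-translate target monolingual data with the backward model;
        #    the resulting pseudo pairs retrain the forward model
        pseudo_src = translate(backward, tgt_mono)
        forward = train(bitext + list(zip(pseudo_src, tgt_mono)))
    return forward, backward
\end{verbatim}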
\parinterval Although pre-trained word embeddings learn rich representations from massive amounts of monolingual data, a major shortcoming is that they cannot handle polysemy. The same word often means different things in different contexts, yet its embedding stays exactly the same, so the model has to work out the meaning of each word in the current context during encoding, which increases the modeling burden. Contextualized word embeddings have therefore received wide attention in recent years\upcite{DBLP:conf/acl/PetersABP17,mccann2017learned,DBLP:conf/naacl/PetersNIGCLZ18}. A contextualized word embedding means that the representation of a word depends not only on the word itself but also on the context in which it appears. Since each word then has a different embedding in different contexts, the embeddings can no longer be stored in a simple embedding matrix; the usual approach is to pre-train a language model on massive monolingual data so that the model acquires strong feature-extraction abilities\upcite{DBLP:conf/naacl/PetersNIGCLZ18,radford2018improving,devlin2019bert}. For example, {\small\bfnew{Embeddings from Language Models}} (ELMo)\index{ELMo}\index{来自语言模型的嵌入} pre-trains a BiLSTM on the language modeling task and obtains the contextualized embedding of each word by linearly combining the representations of different layers, achieving the best performance on many natural language processing tasks at the time\upcite{DBLP:conf/naacl/PetersNIGCLZ18}. ({\color{red} Xu: a figure could be added here, similar to the one in the ELMo paper})
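\parinterval As a small illustration of this layer-mixing idea (a sketch written for this description, not the original ELMo implementation; the function name \texttt{mix\_layers} and the use of NumPy are assumptions made here), the contextual embedding of each position can be obtained as a softmax-weighted sum of the hidden states of all layers, scaled by a task-specific factor:

\begin{verbatim}
import numpy as np

def mix_layers(layer_states, layer_scores, gamma=1.0):
    # layer_states: (num_layers, seq_len, hidden_dim), e.g. stacked outputs
    #               of a pre-trained bidirectional LSTM language model
    # layer_scores: one unnormalized scalar per layer, learned downstream
    s = np.exp(layer_scores - np.max(layer_scores))
    s = s / s.sum()                       # softmax over the layer axis
    # weighted sum over layers -> (seq_len, hidden_dim)
    return gamma * np.tensordot(s, layer_states, axes=(0, 0))

# toy usage: 3 layers, 5 tokens, hidden size 4, equal initial weights
emb = mix_layers(np.random.randn(3, 5, 4), np.zeros(3))
\end{verbatim}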
\parinterval The core idea of BERT is to pre-train with the {\small\bfnew{Masked Language Model}} (MLM)\index{掩码语言模型}\index{MLM} task. The masked language model is similar in spirit to a cloze test: some of the words in the input sentence are randomly selected and masked, and the model has to predict them. Concretely, each selected word is replaced by a special token [Mask], so during training the model cannot see the word at the masked position and must predict it from the surrounding context, which strengthens its ability to extract contextual features. Experiments show that, compared with merely using contextualized word embeddings in downstream tasks, a model pre-trained on large-scale monolingual data has stronger representational power. Moreover, compared with the unidirectional pre-trained model GPT, BERT's bidirectional encoding has also been shown to work better.
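\parinterval The masking step itself is easy to sketch. The fragment below is a simplified illustration of the procedure described above: it only replaces the sampled tokens with a [MASK] symbol and records what the model has to predict (the full BERT recipe additionally keeps some sampled tokens unchanged or replaces them with random words); the 15\% masking rate follows the original paper, and the helper name \texttt{mask\_tokens} is chosen only for this sketch.

\begin{verbatim}
import random

def mask_tokens(tokens, mask_prob=0.15, mask_symbol="[MASK]"):
    masked, labels = [], []
    for tok in tokens:
        if random.random() < mask_prob:
            masked.append(mask_symbol)   # hide the word from the model
            labels.append(tok)           # ... and ask the model to recover it
        else:
            masked.append(tok)
            labels.append(None)          # no prediction loss at this position
    return masked, labels

# toy usage
print(mask_tokens("machine translation is fun".split()))
\end{verbatim}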
author={Ashish {Vaswani} and Noam {Shazeer} and Niki {Parmar} and Jakob {Uszkoreit} and Llion {Jones} and Aidan N. {Gomez} and Lukasz {Kaiser} and Illia {Polosukhin}},
publisher={Conference on Neural Information Processing Systems},
pages={5998--6008},
year={2017}
}
@inproceedings{DBLP:conf/acl/LiLWJXZLL20,
author = {Bei Li and
Hui Liu and
...
...
@@ -4417,20 +4425,7 @@ author = {Yoshua Bengio and
pages = {157--166},
year = {1994}
}
@inproceedings{NIPS2017_7181,
author = {Ashish Vaswani and
Noam Shazeer and
Niki Parmar and
Jakob Uszkoreit and
Llion Jones and
Aidan N. Gomez and
Lukasz Kaiser and
Illia Polosukhin},
title = {Attention is All you Need},
publisher = {Conference on Neural Information Processing Systems},
pages = {5998--6008},
year = {2017}
}
@article{StahlbergNeural,
title={Neural Machine Translation: A Review},
author={Felix Stahlberg},
...
...
@@ -4980,6 +4975,94 @@ author = {Yoshua Bengio and
title = {Conformer: Convolution-augmented Transformer for Speech Recognition},
pages = {5036--5040},
publisher = {International Speech Communication Association},
year = {2020}
}
@inproceedings{DBLP:conf/icassp/DongXX18,
author = {Linhao Dong and
Shuang Xu and
Bo Xu},
title = {Speech-Transformer: {A} No-Recurrence Sequence-to-Sequence Model for
Speech Recognition},
pages = {5884--5888},
publisher = {Institute of Electrical and Electronics Engineers},
year = {2018}
}
@article{DBLP:journals/corr/abs-1802-05751,
author = {Niki Parmar and
Ashish Vaswani and
Jakob Uszkoreit and
Lukasz Kaiser and
Noam Shazeer and
Alexander Ku},
title = {Image Transformer},
journal = {CoRR},
volume = {abs/1802.05751},
year = {2018}
}
@inproceedings{vaswani2017attention,
title={Attention is All You Need},
author={Ashish {Vaswani} and Noam {Shazeer} and Niki {Parmar} and Jakob {Uszkoreit} and Llion {Jones} and Aidan N. {Gomez} and Lukasz {Kaiser} and Illia {Polosukhin}},
publisher={Conference on Neural Information Processing Systems},
pages={5998--6008},
year={2017}
}
%----------
%----------
@inproceedings{DBLP:conf/iclr/RaePJHL20,
author = {Jack W. Rae and
Anna Potapenko and
...
...
@@ -7449,7 +7590,502 @@ author = {Yoshua Bengio and
publisher = {Conference on Empirical Methods in Natural Language Processing},
year = {2019}
}
@inproceedings{DBLP:conf/acl/FadaeeBM17a,
author = {Marzieh Fadaee and
Arianna Bisazza and
Christof Monz},
title = {Data Augmentation for Low-Resource Neural Machine Translation},
pages = {567--573},
publisher = {Association for Computational Linguistics},
year = {2017}
}
@inproceedings{DBLP:conf/emnlp/WangPDN18,
author = {Xinyi Wang and
Hieu Pham and
Zihang Dai and
Graham Neubig},
title = {SwitchOut: an Efficient Data Augmentation Algorithm for Neural Machine
Translation},
pages = {856--861},
publisher = {Conference on Empirical Methods in Natural Language Processing},
year = {2018}
}
@inproceedings{DBLP:conf/emnlp/MartonCR09,
author = {Yuval Marton and
Chris Callison-Burch and
Philip Resnik},
title = {Improved Statistical Machine Translation Using Monolingually-Derived
Paraphrases},
pages = {381--390},
publisher = {Conference on Empirical Methods in Natural Language Processing},
year = {2009}
}
@inproceedings{DBLP:conf/eacl/LapataSM17,
author = {Jonathan Mallinson and
Rico Sennrich and
Mirella Lapata},
title = {Paraphrasing Revisited with Neural Machine Translation},
pages = {881--893},
publisher = {European Chapter of the Association for Computational Linguistics},
year = {2017}
}
@inproceedings{DBLP:conf/aclnmt/ImamuraFS18,
author = {Kenji Imamura and
Atsushi Fujita and
Eiichiro Sumita},
title = {Enhancement of Encoder and Attention Using Target Monolingual Corpora
in Neural Machine Translation},
pages = {55--63},
publisher = {Association for Computational Linguistics},
year = {2018}
}
@inproceedings{DBLP:conf/icml/VincentLBM08,
author = {Pascal Vincent and
Hugo Larochelle and
Yoshua Bengio and
Pierre-Antoine Manzagol},
title = {Extracting and composing robust features with denoising autoencoders},
series = {{ACM} International Conference Proceeding Series},
volume = {307},
pages = {1096--1103},
publisher = {International Conference on Machine Learning},
year = {2008}
}
@article{DBLP:journals/ipm/FarhanTAJATT20,
author = {Wael Farhan and
Bashar Talafha and
Analle Abuammar and
Ruba Jaikat and
Mahmoud Al-Ayyoub and
Ahmad Bisher Tarakji and
Anas Toma},
title = {Unsupervised dialectal neural machine translation},
journal = {Information Processing and Management},
volume = {57},
number = {3},
pages = {102181},
year = {2020}
}
@inproceedings{DBLP:conf/iclr/LampleCDR18,
author = {Guillaume Lample and
Alexis Conneau and
Ludovic Denoyer and
Marc'Aurelio Ranzato},
title = {Unsupervised Machine Translation Using Monolingual Corpora Only},
publisher = {International Conference on Learning Representations},
year = {2018}
}
@article{DBLP:journals/coling/BhagatH13,
author = {Rahul Bhagat and
Eduard H. Hovy},
title = {What Is a Paraphrase?},
journal = {Computational Linguistics},
volume = {39},
number = {3},
pages = {463--472},
year = {2013}
}
@article{2010Generating,
title = {Generating Phrasal and Sentential Paraphrases: A Survey of Data-Driven Methods},
author = {Nitin Madnani and Bonnie J. Dorr},
journal = {Computational Linguistics},
volume = {36},
number = {3},
pages = {341--387},
year = {2010}
}
@inproceedings{DBLP:conf/wmt/GuoH19,
author = {Yinuo Guo and
Junfeng Hu},
title = {Meteor++ 2.0: Adopt Syntactic Level Paraphrase Knowledge into Machine
Translation Evaluation},
pages = {501--506},
publisher = {Association for Computational Linguistics},
year = {2019}
}
@inproceedings{DBLP:conf/acl/ZhouSW19,
author = {Zhong Zhou and
Matthias Sperber and
Alexander H. Waibel},
title = {Paraphrases as Foreign Languages in Multilingual Neural Machine Translation},
pages = {113--122},
publisher = {Association for Computational Linguistics},
year = {2019}
}
@inproceedings{yasuda2008method,
title = {Method for Building Sentence-Aligned Corpus from Wikipedia},
author = {Keiji Yasuda and Eiichiro Sumita},
publisher = {{AAAI} Workshop on Wikipedia and Artificial Intelligence},
pages = {263--268},
year = {2008}
}
@article{2005Improving,
title = {Improving Machine Translation Performance by Exploiting Non-Parallel Corpora},
author = {Dragos Stefan Munteanu and Daniel Marcu},
journal = {Computational Linguistics},
volume = {31},
number = {4},
pages = {477--504},
year = {2005}
}
@inproceedings{DBLP:conf/naacl/SmithQT10,
author = {Jason R. Smith and
Chris Quirk and
Kristina Toutanova},
title = {Extracting Parallel Sentences from Comparable Corpora using Document
Level Alignment},
pages = {403--411},
publisher = {The Association for Computational Linguistics},
year = {2010}
}
@article{DBLP:journals/jair/RuderVS19,
author = {Sebastian Ruder and
Ivan Vulic and
Anders S{\o}gaard},
title = {A Survey of Cross-lingual Word Embedding Models},
journal = {J. Artif. Intell. Res.},
volume = {65},
pages = {569--631},
year = {2019}
}
@inproceedings{DBLP:conf/acl/TuLLLL16,
author = {Zhaopeng Tu and
Zhengdong Lu and
Yang Liu and
Xiaohua Liu and
Hang Li},
title = {Modeling Coverage for Neural Machine Translation},
publisher = {Annual Meeting of the Association for Computational Linguistics},
year = {2016}
}
@article{DBLP:journals/tacl/TuLLLL17,
author = {Zhaopeng Tu and
Yang Liu and
Zhengdong Lu and
Xiaohua Liu and
Hang Li},
title = {Context Gates for Neural Machine Translation},
journal = {Transactions of the Association for Computational Linguistics},
volume = {5},
pages = {87--99},
year = {2017}
}
@inproceedings{DBLP:conf/wmt/WangCJYCLSWY17,
author = {Yuguang Wang and
Shanbo Cheng and
Liyang Jiang and
Jiajun Yang and
Wei Chen and
Muze Li and
Lin Shi and
Yanfeng Wang and
Hongtao Yang},
title = {Sogou Neural Machine Translation Systems for {WMT17}},
pages = {410--415},
publisher = {Association for Computational Linguistics},