Chapter 15 further reading and bib; Chapter 13 text and bib

Commit 8d62bb4d authored Dec 23, 2020 by 单韦乔
--- a/Chapter13/chapter13.tex
+++ b/Chapter13/chapter13.tex
--- a/Chapter15/chapter15.tex
+++ b/Chapter15/chapter15.tex
@@ -1332,10 +1332,12 @@ f(x) &=& x \cdot \delta(\beta x) \\
 \sectionnewpage
 \section{Summary and Further Reading}

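-\parinterval Beyond the multi-branch networks described above, a multi-scale approach can represent the input features hierarchically and bring in phrase information\upcite{DBLP:conf/emnlp/HaoWSZT19}. In addition, when modifying the attention weight distribution of a self-attention network, different scaling factors can likewise be used to distinguish content words from function words in the sequence\upcite{DBLP:conf/emnlp/Lin0RLS18}.
+\parinterval The design of neural network architectures has long been an active research topic. This chapter introduced work on optimizing the Transformer architecture, covering improvements to the attention mechanism, optimization of network connections, tree-structured models, and automatic neural architecture search. Thanks to the highly parallel computation of self-attention and its ability to model global dependencies, the Transformer has achieved excellent results on a wide range of natural language processing tasks, and it is therefore often used as the baseline model for them. This success is due in part to multi-head self-attention. The multi-head mechanism lets the model extract features from several representation subspaces, which is similar in spirit to the multi-branch idea. Researchers who analyzed the heads on the encoder side found that some heads play a crucial role during learning and admit linguistic interpretations\upcite{DBLP:journals/corr/abs-1905-09418}, while the remaining heads have no substantive interpretation and can be pruned to remove redundancy from the network. In the Transformer, however, more heads do not necessarily mean better performance. {\red An interesting finding is that a model trained with multiple heads can have most of its heads removed at inference time with no noticeable change in performance, while running more efficiently on serial computing units such as CPUs}\upcite{Michel2019AreSH}.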

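-\parinterval Efficient Transformers: to address the high time complexity of processing long text, in addition to the methods introduced in the sections above, researchers have built more efficient Transformer architectures by simplifying the network structure\upcite{DBLP:journals/corr/abs-2004-05150,DBLP:journals/corr/abs-2006-04768,DBLP:journals/corr/abs-2009-14794}.
+\parinterval To further improve the Transformer, regularization can be applied during training to increase the differences between heads\upcite{DBLP:conf/emnlp/LiTYLZ18}. A multi-scale approach can also be used to represent the input features hierarchically and bring in phrase information\upcite{DBLP:conf/emnlp/HaoWSZT19}. In addition, when modifying the attention weight distribution of a self-attention network, different scaling factors can likewise be used to distinguish content words from function words in the sequence\upcite{DBLP:conf/emnlp/Lin0RLS18}. Beyond the encoder-decoder modeling paradigm described above, latent variable models can be defined to capture the underlying semantic information of a sentence\upcite{Su2018VariationalRN,DBLP:conf/acl/SetiawanSNP20}, or a joint abstract representation of the source and target languages can be learned directly\upcite{Li2020NeuralMT}.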

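-\parinterval  ODE work (images)
+\parinterval The Transformer's strong performance stems from its sublayer design of self-attention and feed-forward networks, while residual connections {\red (citation needed)} and layer normalization {\red (citation needed)} make training more stable. In addition, the Transformer uses several kinds of dropout during training to mitigate overfitting\upcite{JMLR:v15:srivastava14a}, and label smoothing can be applied when computing the cross-entropy loss to improve generalization\upcite{Szegedy_2016_CVPR}. Some researchers combined these techniques and proposed the RNMT+ model\upcite{Chen2018TheBO}, which uses recurrent neural networks as the encoder and decoder together with residual connections, layer normalization, multi-head attention, and label smoothing. This model achieves performance comparable to the Transformer, which also suggests that the Transformer is like a work of art in which every component works together and none can be omitted.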

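-\parinterval Architecture search is one approach to optimizing model structure and has attracted growing attention in recent years. The search strategy, as the key step linking the search space and the evaluation method, plays a very important role in the overall approach; different kinds of search strategy also determine what search space the search can run in, how performance estimation helps complete the search, and so on. Besides the methods based on evolutionary algorithms, reinforcement learning, and gradients described in detail earlier, approaches based on Bayesian optimization and random search can also explore the search space effectively. Bayesian optimization has already achieved some success in hyperparameter optimization, and for architecture search researchers have also tried using tree-based models to jointly optimize the model architecture and hyperparameters\upcite{DBLP:conf/nips/BergstraBBK11,DBLP:conf/lion/HutterHL11,DBLP:conf/icml/BergstraYC13,DBLP:conf/ijcai/DomhanSH15,DBLP:conf/icml/MendozaKFSH16,DBLP:journals/corr/abs-1807-06906}. Random search serves as a baseline in many works; it randomly selects and operates on search units within the search space and finally finds an architecture suited to the current task through combination or evolution\upcite{li2020automated,DBLP:conf/cvpr/BenderLCCCKL20,DBLP:conf/uai/LiT19}. For architecture search, how to improve the stability of current search strategies\upcite{DBLP:conf/iccv/ChenXW019,DBLP:conf/icml/ChenH20,DBLP:conf/iclr/XuX0CQ0X20}, and how to search in larger search spaces\upcite{DBLP:conf/iclr/XieZLL19,DBLP:conf/acl/LiHZXJXZLL20,DBLP:conf/iclr/CaiZH19} and on more tasks\upcite{DBLP:conf/emnlp/JiangHXZZ19,DBLP:conf/icml/SoLL19}, are important problems in urgent need of solutions.
+\parinterval To address the high complexity of processing long text, one fairly direct approach is to optimize the self-attention mechanism so that its complexity is reduced to $O(N)$, where $N$ is the length of the input sequence. Examples include the Longformer model, which uses local attention based on a sliding window\upcite{DBLP:journals/corr/abs-2004-05150}; Performer, which is based on positive orthogonal random features\upcite{DBLP:journals/corr/abs-2009-14794}; Linformer, which redesigns the attention computation with a low-rank factorization\upcite{DBLP:journals/corr/abs-2006-04768}; and the Star-Transformer model, which uses a star-shaped topology (Star-Transformer).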
+
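+\parinterval Architecture search is one approach to optimizing model structure and has attracted increasingly wide attention in recent years. The search strategy is the key step linking the search space and the evaluation method, and it plays a very important role in the overall approach; different kinds of search strategy also determine what search space the search can run in and how performance estimation helps complete the search. Besides the evolutionary, reinforcement learning, and gradient-based methods described in detail earlier, approaches based on Bayesian optimization and random search can also explore the search space effectively. Bayesian optimization has already achieved some success in hyperparameter optimization, and for architecture search researchers have also tried using tree-based models to jointly optimize the model architecture and hyperparameters\upcite{DBLP:conf/nips/BergstraBBK11,DBLP:conf/lion/HutterHL11,DBLP:conf/icml/BergstraYC13,DBLP:conf/ijcai/DomhanSH15,DBLP:conf/icml/MendozaKFSH16,DBLP:journals/corr/abs-1807-06906}. Random search serves as a baseline in many works; it randomly selects and operates on search units within the search space and finally finds an architecture suited to the current task through combination or evolution\upcite{li2020automated,DBLP:conf/cvpr/BenderLCCCKL20,DBLP:conf/uai/LiT19}. For architecture search, questions such as how to improve the stability of current search strategies\upcite{DBLP:conf/iccv/ChenXW019,DBLP:conf/icml/ChenH20,DBLP:conf/iclr/XuX0CQ0X20}, how to search in larger search spaces\upcite{DBLP:conf/iclr/XieZLL19,DBLP:conf/acl/LiHZXJXZLL20,DBLP:conf/iclr/CaiZH19}, and how to search on more tasks\upcite{DBLP:conf/emnlp/JiangHXZZ19,DBLP:conf/icml/SoLL19} have become important problems in urgent need of solutions.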
--- a/bibliography.bib
+++ b/bibliography.bib
@@ -7022,6 +7022,31 @@ author    = {Yoshua Bengio and
  year      = {2020}
 }

+@inproceedings{DBLP:conf/cvpr/RebuffiKSL17,
+  author    = {Sylvestre{-}Alvise Rebuffi and
+               Alexander Kolesnikov and
+               Georg Sperl and
+               Christoph H. Lampert},
+  title     = {iCaRL: Incremental Classifier and Representation Learning},
+  pages     = {5533--5542},
+  publisher = {IEEE Conference on Computer Vision and Pattern Recognition},
+  year      = {2017}
+}
+
+@inproceedings{DBLP:conf/eccv/CastroMGSA18,
+  author    = {Francisco M. Castro and
+               Manuel J. Mar{\'{\i}}n{-}Jim{\'{e}}nez and
+               Nicol{\'{a}}s Guil and
+               Cordelia Schmid and
+               Karteek Alahari},
+  title     = {End-to-End Incremental Learning},
+  series    = {Lecture Notes in Computer Science},
+  volume    = {11216},
+  pages     = {241--257},
+  publisher = {European Conference on Computer Vision},
+  year      = {2018}
+}
+
 %%%%% chapter 13------------------------------------------------------
 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
@@ -10572,6 +10597,117 @@ author    = {Zhuang Liu and
  year      = {2019}
 }

+@inproceedings{DBLP:journals/corr/abs-1905-09418,
+  author    = {Elena Voita and
+               David Talbot and
+               Fedor Moiseev and
+               Rico Sennrich and
+               Ivan Titov},
+  title     = {Analyzing Multi-Head Self-Attention: Specialized Heads Do the Heavy
+               Lifting, the Rest Can Be Pruned},
+  pages     = {5797--5808},
+  publisher = {Annual Meeting of the Association for Computational Linguistics},
+  year      = {2019},
+}
+
+@inproceedings{Michel2019AreSH,
+  author    = {Paul Michel and
+               Omer Levy and
+               Graham Neubig},
+  title     = {Are Sixteen Heads Really Better than One?},
+  publisher = {Conference and Workshop on Neural Information Processing Systems},
+  pages     = {14014--14024},
+  year      = {2019}
+}
+
+@inproceedings{DBLP:conf/emnlp/LiTYLZ18,
+  author    = {Jian Li and
+               Zhaopeng Tu and
+               Baosong Yang and
+               Michael R. Lyu and
+               Tong Zhang},
+  title     = {Multi-Head Attention with Disagreement Regularization},
+  pages     = {2897--2903},
+  publisher = {Conference on Empirical Methods in Natural Language Processing},
+  year      = {2018}
+}
+
+@inproceedings{Su2018VariationalRN,
+  title={Variational Recurrent Neural Machine Translation},
+  author={Jinsong Su and Shan Wu and Deyi Xiong and Yaojie Lu and Xianpei Han and Biao Zhang},
+  publisher={AAAI Conference on Artificial Intelligence},
+  pages={5488--5495},
+  year={2018}
+}
+
+@inproceedings{DBLP:conf/acl/SetiawanSNP20,
+  author    = {Hendra Setiawan and
+               Matthias Sperber and
+               Udhyakumar Nallasamy and
+               Matthias Paulik},
+  title     = {Variational Neural Machine Translation with Normalizing Flows},
+  publisher = {Annual Meeting of the Association for Computational Linguistics},
+  year      = {2020}
+}
+
+@inproceedings{Li2020NeuralMT,
+  author    = {Yanyang Li and
+               Qiang Wang and
+               Tong Xiao and
+               Tongran Liu and
+               Jingbo Zhu},
+  title     = {Neural Machine Translation with Joint Representation},
+  pages     = {8285--8292},
+  publisher = {AAAI Conference on Artificial Intelligence},
+  year      = {2020}
+}
+
+@inproceedings{JMLR:v15:srivastava14a,
+  author  = {Nitish Srivastava and Geoffrey Hinton and Alex Krizhevsky and Ilya Sutskever and Ruslan Salakhutdinov},
+  title   = {Dropout: A Simple Way to Prevent Neural Networks from Overfitting},
+  publisher = {Journal of Machine Learning Research},
+  year    = {2014},
+  volume  = {15},
+  pages   = {1929--1958},
+}
+
+@inproceedings{Szegedy_2016_CVPR,
+  author    = {Christian Szegedy and
+               Vincent Vanhoucke and
+               Sergey Ioffe and
+               Jonathon Shlens and
+               Zbigniew Wojna},
+  title     = {Rethinking the Inception Architecture for Computer Vision},
+  publisher = {IEEE Conference on Computer Vision and Pattern Recognition},
+  pages     = {2818--2826},
+  year      = {2016},
+}
+
+@inproceedings{Chen2018TheBO,
+  author    = {Mia Xu Chen and
+               Orhan Firat and
+               Ankur Bapna and
+               Melvin Johnson and
+               Wolfgang Macherey and
+               George F. Foster and
+               Llion Jones and
+               Mike Schuster and
+               Noam Shazeer and
+               Niki Parmar and
+               Ashish Vaswani and
+               Jakob Uszkoreit and
+               Lukasz Kaiser and
+               Zhifeng Chen and
+               Yonghui Wu and
+               Macduff Hughes},
+  title     = {The Best of Both Worlds: Combining Recent Advances in Neural Machine
+               Translation},
+  pages     = {76--86},
+  publisher = {Annual Meeting of the Association for Computational Linguistics},
+  year      = {2018}
+}
+
 %%%%% chapter 15------------------------------------------------------
 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%