Merge branch 'caorunzhe' into 'master'

Caorunzhe See merge request !376

Commit 6cdb9337 authored Nov 09, 2020 by 曹润柘
--- a/Chapter11/chapter11.tex
+++ b/Chapter11/chapter11.tex
@@ -333,7 +333,7 @@ x_{l+1} = x_l + F (x_l)
 \label{eq:11-3}
 \end{eqnarray}

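-\noindent where $x_l$ denotes the input vector of layer $l$ and $\mathcal{F} (x_l)$ is the sublayer operation. If $l=2$, Equation \eqref{eq:11-3} says that the input of layer 3 ($x_3$) equals the output of layer 2 ($\mathcal{F}(x_2)$) plus the input of layer 2 ($x_2$).
+\noindent where $x_l$ denotes the input vector of layer $l$ and ${F} (x_l)$ is the sublayer operation. If $l=2$, Equation \eqref{eq:11-3} says that the input of layer 3 ($x_3$) equals the output of layer 2 (${F}(x_2)$) plus the input of layer 2 ($x_2$).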

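 \parinterval In ConvS2S, residual connections are applied mainly in the gated convolutional network and the multi-hop self-attention mechanism. To stack more convolutional layers, a residual connection is added between the input and the output of each convolutional layer, described mathematically as follows: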
 \begin{eqnarray}

--- a/Chapter12/Figures/figure-point-product-attention-model.tex
+++ b/Chapter12/Figures/figure-point-product-attention-model.tex
@@ -59,7 +59,7 @@
 {
 \node [rectangle,inner sep=0.2em,rounded corners=1pt,fill=green!10,drop shadow,draw=ugreen,minimum width=10em] [fit = (line1) (line2) (line3) (line4)] (box1) {};
 \node [rectangle,inner sep=0.1em,rounded corners=1pt,very thick,dotted,draw=ugreen] [fit = (Q1) (K1) (V1)] (box0) {};
-\draw [->,dotted,very thick,ugreen] ([yshift=-1.5em,xshift=1.2em]box1.east) -- ([yshift=-1.5em,xshift=0.1em]box1.east);
+\draw [->,dotted,very thick,ugreen] ([yshift=-1.5em,xshift=1.8em]box1.east) -- ([yshift=-1.5em,xshift=0.1em]box1.east);
 }
 {
 \node [rectangle,inner sep=0.2em,rounded corners=1pt,fill=blue!20!white,drop shadow,draw=blue] [fit = (line11) (line12) (line13)] (box2) {};

--- a/Chapter12/Figures/figure-process-of-5.tex
+++ b/Chapter12/Figures/figure-process-of-5.tex
@@ -116,7 +116,11 @@
 % )
 \node(bra2) at ([xshift=0.2em,yshift=0]mid.east){)};
 % red box
-\node[rectangle,minimum width=4.0em,minimum height=1.5em,draw=red](p222) at([xshift=0em,yshift=-1.0em]mid.north) {};
+\node[rectangle,minimum width=4.0em,minimum height=1.5em,draw=red,line width=1pt](p222) at([xshift=0em,yshift=-1.0em]mid.north) {};
+
+\node[rectangle,minimum width=4.0em,minimum height=1.5em,draw=ugreen,ultra thick,dotted,thick,font=\footnotesize](sub) at([xshift=-12em,yshift=1.0em]p222.west) {Row-wise Softmax};
+\draw[->,dotted,very thick,draw=ugreen] (p222.west) .. controls +(north:0.5) and +(east:1) .. (sub.east);
+

 %%%% v
 \node(tbv3) at ([xshift=0.5em,yshift=0]bra2.east){

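The annotation node added in this hunk labels the red box as a Softmax applied row by row. A minimal sketch of that operation, assuming the red box encloses a score matrix $A$ with entries $a_{ij}$ (the symbols $A$, $a_{ij}$, $i$, $j$ and $k$ are introduced here for illustration and do not appear in the figure source):

```latex
\begin{eqnarray}
\mathrm{Softmax}(A)_{ij} & = & \frac{\exp(a_{ij})}{\sum_{k} \exp(a_{ik})} \nonumber
\end{eqnarray}
```

Each row $i$ is normalized independently of the others, so every row of the result sums to 1.
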
--- a/Chapter12/chapter12.tex
+++ b/Chapter12/chapter12.tex
@@ -398,11 +398,11 @@

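 %\parinterval In a broad sense, a residual connection is also called a short connection, that is, a connection spanning a short distance. The idea is simple: shorten the distance between layers. As shown in Figure \ref{fig:12-49}, sublayer 1 skips sublayer 2 through the residual connection and passes information directly to sublayer 3. This makes information flow more efficient and mitigates the vanishing/exploding gradient problem that tends to arise when training deep networks, which makes deep networks easier to train. It is computed as: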
 %\begin{eqnarray}
-%x_{l+1} = x_l + \mathcal{F} (x_l)
+%x_{l+1} = x_l + {F} (x_l)
 %\label{eq:12-50}
 %\end{eqnarray}

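-%\noindent where $x_l$ denotes the input vector of layer $l$ and $\mathcal{F} (x_l)$ is the sublayer operation. If $l=2$, Equation \eqref{eq:12-50} says that the input of layer 3 ($x_3$) equals the output of layer 2 ($\mathcal{F}(x_2)$) plus the input of layer 2 ($x_2$). The red boxes in Figure \ref{fig:12-50} mark where the residual connections sit in the Transformer.
+%\noindent where $x_l$ denotes the input vector of layer $l$ and ${F} (x_l)$ is the sublayer operation. If $l=2$, Equation \eqref{eq:12-50} says that the input of layer 3 ($x_3$) equals the output of layer 2 (${F}(x_2)$) plus the input of layer 2 ($x_2$). The red boxes in Figure \ref{fig:12-50} mark where the residual connections sit in the Transformer.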

 %----------------------------------------------
 \begin{figure}[htp]
@@ -415,7 +415,7 @@

 \parinterval During Transformer training, the residual operation adds together the outputs of all preceding layers, as in:
 \begin{eqnarray}
-x_{l+1} = x_l + \mathcal{F} (x_l)
+x_{l+1} = x_l + F (x_l)
 \label{eq:12-50}
 \end{eqnarray}


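The recurrence in the hunk above can be unrolled to make explicit the statement that the outputs of all preceding layers are added together. This is a sketch rather than text from the book: the final-layer index $L$ and the summation index $i$ are notation introduced here, and $F$ is assumed to stand for the sublayer operation of whichever layer it is applied to.

```latex
\begin{eqnarray}
x_{l+2} & = & x_{l+1} + F(x_{l+1}) \ = \ x_l + F(x_l) + F(x_{l+1}) \nonumber \\
x_L & = & x_l + \sum_{i=l}^{L-1} F(x_i) \nonumber
\end{eqnarray}
```

Under these assumptions, $x_L$ depends on $x_l$ through an identity term in addition to the summed sublayer outputs, so the derivative of $x_L$ with respect to $x_l$ always contains the identity.
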
--- a/Chapter16/chapter16.tex
+++ b/Chapter16/chapter16.tex
--- a/Chapter9/chapter9.tex
+++ b/Chapter9/chapter9.tex
@@ -2162,7 +2162,7 @@ Jobs was the CEO of {\red{\underline{apple}}}.

 \begin{itemize}
 \vspace{0.5em}
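-\item End-to-end learning is one of the hallmarks of neural methods. System developers no longer need to design the latent structure between input and output, and even feature engineering becomes unnecessary. On the other hand, because end-to-end learning is carried out entirely by the neural network, with no human prior knowledge guiding the process, the learned structures and parameters are hard to interpret. Many researchers address this problem through work on {\small\sffamily\bfseries{explainable machine learning}}\index{Explainable Machine Learning} (可解释机器学习)\index{可解释机器学习}\upcite{moraffah2020causal}. For natural language processing, interpretability of the methods is essential. From another angle, how to use prior knowledge to improve end-to-end learning is also a widely studied direction\upcite{arthur2016incorporating,zhang-etal-2017-prior}, for example, how to use syntactic knowledge to improve natural language processing models\upcite{stahlberg2016syntactically,currey2019incorporating,Yang2017TowardsBH,marevcek2018extracting,blevins2018deep}.
+\item End-to-end learning is one of the hallmarks of neural methods. System developers no longer need to design the latent structure between input and output, and even feature engineering becomes unnecessary. On the other hand, because end-to-end learning is carried out entirely by the neural network, with no human prior knowledge guiding the process, the learned structures and parameters are hard to interpret. Many researchers address this problem through work on {\small\sffamily\bfseries{explainable machine learning}}\index{Explainable Machine Learning} (可解释机器学习)\index{可解释机器学习}\upcite{moraffah2020causal,Kovalerchuk2020SurveyOE,DoshiVelez2017TowardsAR}. For natural language processing, interpretability of the methods is essential. From another angle, how to use prior knowledge to improve end-to-end learning is also a widely studied direction\upcite{arthur2016incorporating,zhang-etal-2017-prior}, for example, how to use syntactic knowledge to improve natural language processing models\upcite{stahlberg2016syntactically,currey2019incorporating,Yang2017TowardsBH,marevcek2018extracting,blevins2018deep}.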
 \vspace{0.5em}
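 \item To further improve neural language models, besides refining the model itself, one can introduce new structures or other useful information into it, and there is much representative work in this area. For example, word features beyond word embeddings can be added to a neural language model, such as linguistic features (morphological, syntactic, and semantic features)\upcite{Wu2012FactoredLM,Adel2015SyntacticAS}, contextual information\upcite{mikolov2012context,Wang2015LargerContextLM}, and external knowledge such as knowledge graphs\upcite{Ahn2016ANK}. Character-level information can also be introduced, fed into the model as character features either alone\upcite{Kim2016CharacterAwareNL,Hwang2017CharacterlevelLM} or together with word features\upcite{Onoe2016GatedWR,Verwimp2017CharacterWordLL}. Using bidirectional models in neural language modeling is another highly effective approach, since word prediction can then draw on text from both the past and the future\upcite{Graves2013HybridSR,bahdanau2014neural,Peters2018DeepCW}.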
 \vspace{0.5em}

--- a/bibliography.bib
+++ b/bibliography.bib
@@ -4313,6 +4313,23 @@ year = {2012}
  volume={abs/1904.02342}
 }

+@article{Kovalerchuk2020SurveyOE,
+  title={Survey of explainable machine learning with visual and granular methods beyond quasi-explanations},
+  author={Boris Kovalerchuk and 
+          Muhammad Ahmad and 
+		  Ankur Teredesai},
+  journal={ArXiv},
+  year={2020},
+  volume={abs/2009.10221}
+}
+
+@article{DoshiVelez2017TowardsAR,
+  title={Towards A Rigorous Science of Interpretable Machine Learning},
+  author={Finale Doshi-Velez and 
+          Been Kim},
+  journal={arXiv: Machine Learning},
+  year={2017}
+}

 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
 %%%%% chapter 9------------------------------------------------------
@@ -6055,6 +6072,20 @@ pages ={157-166},
  publisher = {Annual Meeting of the Association for Computational Linguistics},
  year      = {2019}
 }
+@article{2015OnGulcehre,
+  title = {On Using Monolingual Corpora in Neural Machine Translation},
+  author = {Caglar Gulcehre and
+           Orhan Firat and
+           Kelvin Xu and
+           Kyunghyun Cho and
+           Loic Barrault and
+           Huei-Chi Lin and
+           Fethi Bougares and
+           Holger Schwenk and
+           Yoshua Bengio},
+  journal = {Computer Science},
+  year = {2015},
+}
 %%%%% chapter 16------------------------------------------------------
 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%