Merge branch 'zengxin' into 'caorunzhe'

Zengxin (see merge request !301)

Commit a194633f authored Oct 09, 2020 by zengxin
--- a/Chapter10/chapter10.tex
+++ b/Chapter10/chapter10.tex
--- a/Chapter12/Figures/figure-point-product-attention-model.tex
+++ b/Chapter12/Figures/figure-point-product-attention-model.tex
@@ -27,13 +27,13 @@

 {
 \node [anchor=east] (line1) at ([xshift=-4em,yshift=1em]MatMul.west) {\scriptsize{自注意力机制的Query}};
-\node [anchor=north west] (line2) at ([yshift=0.3em]line1.south west) {\scriptsize{Key和Value均来自同一句子}};
-\node [anchor=north west] (line3) at ([yshift=0.3em]line2.south west) {\scriptsize{编码-解码注意力机制}};
+\node [anchor=north west] (line2) at ([yshift=0.3em]line1.south west) {\scriptsize{Key和Value均来自同一句}};
+\node [anchor=north west] (line3) at ([yshift=0.3em]line2.south west) {\scriptsize{子编码-解码注意力机制}};
 \node [anchor=north west] (line4) at ([yshift=0.3em]line3.south west) {\scriptsize{与前面讲的一样}};
 }
 {
-\node [anchor=west] (line11) at ([xshift=3em,yshift=0em]MatMul.east) {\scriptsize{Query和Key的转置}};
-\node [anchor=north west] (line12) at ([yshift=0.3em]line11.south west) {\scriptsize{进行点积,得到句子内部}};
+\node [anchor=west] (line11) at ([xshift=3em,yshift=0em]MatMul.east) {\scriptsize{Query和Key的转置进}};
+\node [anchor=north west] (line12) at ([yshift=0.3em]line11.south west) {\scriptsize{行点积,得到句子内部}};
 \node [anchor=north west] (line13) at ([yshift=0.3em]line12.south west) {\scriptsize{各个位置的相关性}};
 }

@@ -57,7 +57,7 @@

 \begin{pgfonlayer}{background}
 {
-\node [rectangle,inner sep=0.2em,rounded corners=1pt,fill=green!10,drop shadow,draw=ugreen] [fit = (line1) (line2) (line3) (line4)] (box1) {};
+\node [rectangle,inner sep=0.2em,rounded corners=1pt,fill=green!10,drop shadow,draw=ugreen,minimum width=10em] [fit = (line1) (line2) (line3) (line4)] (box1) {};
 \node [rectangle,inner sep=0.1em,rounded corners=1pt,very thick,dotted,draw=ugreen] [fit = (Q1) (K1) (V1)] (box0) {};
 \draw [->,dotted,very thick,ugreen] ([yshift=-1.5em,xshift=1.2em]box1.east) -- ([yshift=-1.5em,xshift=0.1em]box1.east);
 }

--- a/Chapter12/chapter12.tex
+++ b/Chapter12/chapter12.tex
@@ -356,7 +356,7 @@

 \subsection{掩码操作}

-\parinterval 在公式\ref{eq:12-47}中提到了掩码（Mask），它的目的是对向量中某些值进行掩盖，避免无关位置的数值对运算造成影响。Transformer中的掩码主要应用在注意力机制中的相关性系数计算，具体方式是在相关性系数矩阵上累加一个掩码矩阵。该矩阵在需要掩码的位置的值为负无穷$-$inf（具体实现时是一个非常小的数，比如$-$1e-9），其余位置为0，这样在进行了Softmax归一化操作之后，被掩码掉的位置计算得到的权重便近似为0，也就是说对无用信息分配的权重为0，从而避免了其对结果产生影响。Transformer包含两种掩码：
+\parinterval 在公式\eqref{eq:12-47}中提到了掩码（Mask），它的目的是对向量中某些值进行掩盖，避免无关位置的数值对运算造成影响。Transformer中的掩码主要应用在注意力机制中的相关性系数计算，具体方式是在相关性系数矩阵上累加一个掩码矩阵。该矩阵在需要掩码的位置的值为负无穷$-$inf（具体实现时是一个非常小的数，比如$-$1e-9），其余位置为0，这样在进行了Softmax归一化操作之后，被掩码掉的位置计算得到的权重便近似为0，也就是说对无用信息分配的权重为0，从而避免了其对结果产生影响。Transformer包含两种掩码：

 \begin{itemize}
 \vspace{0.5em}
@@ -402,7 +402,7 @@ x_{l+1} = x_l + \mathcal{F} (x_l)
 \label{eq:12-50}
 \end{eqnarray}

-\noindent 其中，$x_l$表示$l$层网络的输入向量，$\mathcal{F} (x_l)$是子层运算。如果$l=2$，那么公式\ref{eq:12-50}可以解释为，第3层的输入（$x_3$）等于第2层的输出（$\mathcal{F}(x_2)$）加上第二层的输入（$x_2$）。图\ref{fig:12-50} 中的红色方框展示了Transformer 中残差连接的位置。
+\noindent 其中，$x_l$表示$l$层网络的输入向量，$\mathcal{F} (x_l)$是子层运算。如果$l=2$，那么公式\eqref{eq:12-50}可以解释为，第3层的输入（$x_3$）等于第2层的输出（$\mathcal{F}(x_2)$）加上第二层的输入（$x_2$）。图\ref{fig:12-50} 中的红色方框展示了Transformer 中残差连接的位置。

 %----------------------------------------------
 \begin{figure}[htp]