17改错

47f21f9b · 孟霞 · e71d392a · 47f21f9b · 47f21f9b
Commit 47f21f9b authored Jan 14, 2021 by 孟霞
--- a/Chapter17/Figures/figure-multiencoder.tex
+++ b/Chapter17/Figures/figure-multiencoder.tex
@@ -6,7 +6,7 @@
 \tikzstyle{every node}=[scale=0.7]
 \node(encoder_c)[coder]{\large{编码器}};
 \node(encoder_s)[coder, right of = encoder_c, xshift=3.5cm, fill=red!30]{\large{编码器}};
-\node(h_pre)[above of = encoder_c, yshift=1.3cm,scale=1.3]{${\mathbi{h}}_{\rm pre}$};
+\node(h_pre)[above of = encoder_c, yshift=1.3cm,scale=1.3]{${\mathbi{h}}^{\rm pre}$};
 \node(h)[above of = encoder_s, yshift=1.3cm,scale=1.3]{$\mathbi{h}$};
 \node(cir)[circle,very thick, right of = h, draw=black!90,minimum width=0.5cm,xshift=1.1cm]{};
 \draw[-,very thick,draw=black!90]([xshift=0.04cm]cir.west)--([xshift=-0.04cm]cir.east);

--- a/Chapter17/chapter17.tex
+++ b/Chapter17/chapter17.tex
@@ -546,7 +546,7 @@
 \subsubsection{2. 多编码器结构}


-\parinterval 另一种思路是对传统的编码器-解码器框架进行更改，引入额外的编码器来对上下文句子进行编码，该结构被称为多编码器结构\upcite{DBLP:conf/acl/LiLWJXZLL20,DBLP:conf/discomt/SugiyamaY19}。这种结构最早被应用在基于循环神经网络的篇章级翻译模型中\upcite{DBLP:journals/corr/JeanLFC17,DBLP:conf/coling/KuangX18,DBLP:conf/naacl/BawdenSBH18,DBLP:conf/pacling/YamagishiK19}，后期证明在Transformer模型上同样适用\upcite{DBLP:journals/corr/abs-1805-10163,DBLP:conf/emnlp/ZhangLSZXZL18}。图\ref{fig:17-18}展示了一个基于Transformer模型的多编码器结构，基于源语言当前待翻译句子的编码表示$\mathbi{h}$和上下文句子的编码表示$\mathbi{h}^{\textrm pre}$，模型首先通过注意力机制提取上下文信息$\mathbi{d}$：
+\parinterval 另一种思路是对传统的编码器-解码器框架进行更改，引入额外的编码器来对上下文句子进行编码，该结构被称为多编码器结构\upcite{DBLP:conf/acl/LiLWJXZLL20,DBLP:conf/discomt/SugiyamaY19}。这种结构最早被应用在基于循环神经网络的篇章级翻译模型中\upcite{DBLP:journals/corr/JeanLFC17,DBLP:conf/coling/KuangX18,DBLP:conf/naacl/BawdenSBH18,DBLP:conf/pacling/YamagishiK19}，后期证明在Transformer模型上同样适用\upcite{DBLP:journals/corr/abs-1805-10163,DBLP:conf/emnlp/ZhangLSZXZL18}。图\ref{fig:17-18}展示了一个基于Transformer模型的多编码器结构，基于源语言当前待翻译句子的编码表示$\mathbi{h}$和上下文句子的编码表示$\mathbi{h}^{\textrm pre}$，模型首先通过注意力机制提取句子间上下文信息$\mathbi{d}$：
 \begin{eqnarray}
 \mathbi{d}&=&\textrm{Attention}(\mathbi{h},\mathbi{h}^{\textrm pre},\mathbi{h}^{\textrm pre})
 \label{eq:17-3-3}
@@ -604,7 +604,7 @@
 \noindent 之后，分别计算词级和句子级注意力模型。需要注意的是句子级注意力添加了一个前馈全连接网络子层FFN。其具体计算方式如下：

 \begin{eqnarray}
-\mathbi{s}^k&=&\textrm{WordAttention}(\mathbi{q}_{w},\mathbi{h}^{k},\mathbi{h}^{k})
+\mathbi{s}^k&=&\textrm{WordAttention}(\mathbi{q}_{w},\mathbi{h}^{\textrm {pre}k},\mathbi{h}^{\textrm{pre}k})
 \label{eq:17-3-7}\\
 \mathbi{d}_t&=&\textrm{FFN}(\textrm{SentAttention}(\mathbi{q}_{s},\mathbi{s},\mathbi{s}))
 \label{eq:17-3-9}