exposure-bias

Commit 6dfd43b3 authored Dec 28, 2020 by zengxin
--- a/exposure bias/2019-acl-Best-Bridging the Gap between Training and Inference for Neural Machine Translation.pdf
+++ b/exposure bias/2019-acl-Best-Bridging the Gap between Training and Inference for Neural Machine Translation.pdf
--- a/exposure bias/exposure-bias.md
+++ b/exposure bias/exposure-bias.md
+## Bridging the Gap between Training and Inference for Neural Machine Translation, ACL 2019 best paper
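+* During training, the context used to predict the current word comes from the reference translation, whereas at inference time it comes from the model's own earlier predictions. This mismatch between training and inference leads to errors at inference time. The paper therefore samples the context words during training so that the context contains both words from the reference translation and words predicted by the model, which reduces the mismatch.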
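+* For the prediction at step $j$, the method works as follows:
+  1. Select a model-predicted word (the oracle word) $y_{j-1}^{\textrm{oracle}}$, either at the word level (greedy search: take the highest-probability word) or at the sentence level (beam search: take the word from the candidate sentence that scores best under the sentence-level metric).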
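+     1. Word level: add noise to the model's predicted scores, then apply softmax and argmax:
+        $\eta = -\log(-\log u)$
+        $\tilde{o}_{j-1} = (o_{j-1} + \eta)/\tau$
+        $\tilde{P}_{j-1} = \textrm{softmax}(\tilde{o}_{j-1})$
+        $y_{j-1}^{\textrm{oracle}} = \textrm{argmax}(\tilde{P}_{j-1})$
+        Here $o_{j-1}$ is the model's pre-softmax score vector, $\eta$ is Gumbel noise, $u$ is a random number drawn uniformly from $(0,1)$, and $\tau$ is the temperature. As $\tau \to 0$ the softmax approaches argmax; as $\tau \to \infty$ it approaches a uniform distribution.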
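+     2. Sentence level: use BLEU as the sentence-level metric; use force decoding so that the decoded sentence has the same length as the reference translation; noise is introduced in the same way.
+        Force decoding:
+        * At step $j \leq |y^*|$, if EOS is the highest-probability word (the sentence would end before reaching the required length), select the second most probable word as the $j$-th predicted word;
+        * If the sentence would run longer than $|y^*|$, select EOS as the $(|y^*|+1)$-th word.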
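+  2. Sample $y_{j-1}$: with probability $p$ take the reference word $y_{j-1}^*$, and with probability $1-p$ take the model-predicted word $y_{j-1}^{\textrm{oracle}}$.
+     1. If $y_{j-1}$ always comes from the model's predictions, the model converges slowly and may even get stuck in a local optimum; if $y_{j-1}$ comes from the reference translation with very high probability, the training-inference mismatch is not addressed. **Therefore $p$ is not fixed; it decreases gradually as training proceeds.**
+     2. At the start $p=1$, meaning the model is trained entirely on the reference translation.
+     3. The paper defines $p$ as a decay function of the training epoch index $e$:
+        $p = \frac{\mu}{\mu + \exp(e/\mu)}$
+        where $\mu$ is a hyperparameter. (At $e=0$ the formula gives $\mu/(\mu+1)$, which is close to 1 when $\mu$ is large.)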
+  3. Use the sampled word $y_{j-1}$ in place of the reference word $y_{j-1}^*$ that a standard model would use as context.
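
The word-level steps above can be sketched in a few lines of plain Python. This is only an illustrative sketch and not the authors' implementation: the function names (`word_level_oracle`, `teacher_forcing_prob`, `sample_context_word`), the toy score vector, and the choice of drawing independent noise per vocabulary entry are assumptions made for the example. Sentence-level oracle selection (beam search, BLEU, force decoding) is not covered.

```python
import math
import random


def gumbel_noise(rng):
    """Draw eta = -log(-log u) with u ~ Uniform(0, 1)."""
    # random() can return exactly 0.0, so clamp to keep log(u) defined.
    u = max(rng.random(), 1e-12)
    return -math.log(-math.log(u))


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [x / total for x in exps]


def word_level_oracle(scores, tau, rng):
    """Return the index of the oracle word given pre-softmax scores o_{j-1}.

    Assumption: one independent Gumbel sample is added per vocabulary entry.
    """
    noisy = [(s + gumbel_noise(rng)) / tau for s in scores]
    probs = softmax(noisy)
    return max(range(len(probs)), key=probs.__getitem__)


def teacher_forcing_prob(epoch, mu):
    """Decay schedule p = mu / (mu + exp(epoch / mu))."""
    ratio = epoch / mu
    if ratio > 700:  # math.exp would overflow; p is effectively 0 here
        return 0.0
    return mu / (mu + math.exp(ratio))


def sample_context_word(reference_word, oracle_word, p, rng):
    """With probability p use the reference word, otherwise the oracle word."""
    return reference_word if rng.random() < p else oracle_word


# Toy usage: a 3-word vocabulary with hypothetical scores.
rng = random.Random(0)
scores = [2.0, 0.5, -1.0]
oracle = word_level_oracle(scores, tau=0.5, rng=rng)
p = teacher_forcing_prob(epoch=3, mu=12)
y_prev = sample_context_word(reference_word=1, oracle_word=oracle, p=p, rng=rng)
```

In a real training loop `scores` would be the decoder's output at step $j-1$ and `y_prev` would be fed to the decoder as context for step $j$. The values of `tau` and `mu` here are placeholders, not settings taken from the paper.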
\ No newline at end of file
--- a/unclassified/2019-acl-Best-Bridging the Gap between Training and Inference for Neural Machine Translation.pdf
+++ b/unclassified/2019-acl-Best-Bridging the Gap between Training and Inference for Neural Machine Translation.pdf
--- a/unclassified/unclassified.md
+++ b/unclassified/unclassified.md
-## Bridging the Gap between Training and Inference for Neural Machine Translation, ACL 2019 best paper
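-* During training, the context used to predict the current word comes from the reference translation, whereas at inference time it comes from the model's own earlier predictions. This mismatch between training and inference leads to errors at inference time. The paper therefore samples the context words during training so that the context contains both words from the reference translation and words predicted by the model, which reduces the mismatch.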