Commit 85fbffb8 by xiaotong

updates

parent e5adc002
......@@ -1139,17 +1139,17 @@ s_0 \overset{r_1}{\Rightarrow} s_1 \overset{r_2}{\Rightarrow} s_2 \overset{r_3}{
\label{eq:2.5-5}
\end{eqnarray}
-\parinterval In this way we can obtain the probability of each derivation d. This model explains the generation process of a word string quite well. For example, given the rule set
+\parinterval In this way we can obtain the probability of each derivation $d$. This model explains the generation process of a word string quite well. For example, given the rule set
\begin{eqnarray}
-r_3: & \textrm{VV} \to \text{吃}\nonumber \\
-r_4: & \textrm{NN} \to \text{鱼}\nonumber \\
-r_6: & \textrm{VP} \to \textrm{VV} \textrm{NN} \nonumber
+r_3: & &\textrm{VV} \to \text{吃}\nonumber \\
+r_4: & & \textrm{NN} \to \text{鱼}\nonumber \\
+r_6: & & \textrm{VP} \to \textrm{VV} \textrm{NN} \nonumber
\end{eqnarray}
\parinterval The probability of $d_1=r_3 \cdot r_4 \cdot r_6$ can then be computed as
\begin{eqnarray}
\textrm{P}(d_1) & = &\textrm{P}(r_3) \cdot \textrm{P}(r_4) \cdot \textrm{P}(r_6)\nonumber \\
-& = & \textrm{P}(\textrm{VV} \to \text{吃}) \cdot \textrm{P}(\textrm{NN} \to \text{鱼}) \cdot \textrm{P}(\textrm{VP} \to \textrm{VV NN})
+& = & \textrm{P}(\textrm{``VV} \to \text{吃''}) \cdot \textrm{P}(\textrm{``NN} \to \text{鱼''}) \cdot \textrm{P}(\textrm{``VP} \to \textrm{VV NN''})
\label{eq:2.5-6}
\end{eqnarray}
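\parinterval As a minimal sketch of this computation (the \texttt{rule\_probs} table and its values are hypothetical, for illustration only), the probability of a derivation is simply the product of the probabilities of the rules it uses:

\begin{verbatim}
# Hypothetical rule probabilities; the rules mirror r3, r4, r6 above.
rule_probs = {
    "VV -> 吃": 0.1,     # P(r3)
    "NN -> 鱼": 0.2,     # P(r4)
    "VP -> VV NN": 0.5,  # P(r6)
}

def derivation_prob(rules):
    """P(d) = product of P(r) over all rules r used in derivation d."""
    p = 1.0
    for r in rules:
        p *= rule_probs[r]
    return p

print(derivation_prob(["VV -> 吃", "NN -> 鱼", "VP -> VV NN"]))  # 0.01
\end{verbatim}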
......
......@@ -209,7 +209,7 @@
\parinterval The method above needs to be usable on more sentences. Suppose there are $N$ parallel sentence pairs $(\mathbf{s}^1,\mathbf{t}^1)$, ..., $(\mathbf{s}^N,\mathbf{t}^N)$. We can still estimate the probability by relative frequency, as follows:
\begin{eqnarray}
-\textrm{P}(x,y) = \frac{{\sum_{i=1}^{N} c(x,y;\mathbf{s}^i,\mathbf{t}^i)}}{\sum_{i=1}^{n}{{\sum_{x',y'} c(x',y';x^i,y^i)}}}
+\textrm{P}(x,y) = \frac{{\sum_{i=1}^{N} c(x,y;\mathbf{s}^i,\mathbf{t}^i)}}{\sum_{i=1}^{N}{{\sum_{x',y'} c(x',y';\mathbf{s}^i,\mathbf{t}^i)}}}
\label{eqC3.5-new}
\end{eqnarray}
......@@ -233,8 +233,8 @@
\begin{eqnarray}
{\textrm{P}(\textrm{``翻译''},\textrm{``translation''})} & = & {\frac{c(\textrm{``翻译''},\textrm{``translation''};\mathbf{s}^{1},\mathbf{t}^{1})+c(\textrm{``翻译''},\textrm{``translation''};\mathbf{s}^{2},\mathbf{t}^{2})}{\sum_{x',y'} c(x',y';\mathbf{s}^{1},\mathbf{t}^{1}) + \sum_{x',y'} c(x',y';\mathbf{s}^{2},\mathbf{t}^{2})}} \nonumber \\
& = & \frac{4 + 1}{|\mathbf{s}^{1}| \times |\mathbf{t}^{1}| + |\mathbf{s}^{2}| \times |\mathbf{t}^{2}|} \nonumber \\
-& = & \frac{4 + 1}{9 \times 7 + 5 \times 7} \nonumber \\
-& = & \frac{5}{102}
+& = & \frac{4 + 1}{9 \times 7 + 5 \times 6} \nonumber \\
+& = & \frac{5}{93}
\label{eqC3.6-new}
\end{eqnarray}
}
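\parinterval A small sketch of this estimator (the counts and sentence lengths are taken from the example above; the counting scheme that makes each per-pair total equal $|\mathbf{s}| \times |\mathbf{t}|$ is assumed):

\begin{verbatim}
from fractions import Fraction

# c("翻译","translation"; s_i, t_i) for the two sentence pairs above
pair_counts = [4, 1]

# Per-pair totals sum_{x',y'} c(x',y'; s_i, t_i); under this counting
# scheme each total equals |s_i| * |t_i|: 9*7 and 5*6.
totals = [9 * 7, 5 * 6]

p = Fraction(sum(pair_counts), sum(totals))
print(p, float(p))  # 5/93 ~ 0.0538
\end{verbatim}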
......@@ -698,7 +698,7 @@ g(\mathbf{s},\mathbf{t}) \equiv \prod_{j,i \in \widehat{A}}{\textrm{P}(s_j,t_i)}
\parinterval Given the model parameters, IBM Models 1-2 can be used to translate new sentences. For example, the best translation can be searched for with the decoding method described in Section \ref{sec:simple-decoding}, or with left-to-right decoding plus pruning. During the search, only Eqs. \ref{eqC3.25-new} and \ref{eqC3.27-new} are needed to compute the IBM-model translation probability of each candidate. However, the high computational complexity of Eqs. \ref{eqC3.25-new} and \ref{eqC3.27-new} makes these models hard to use directly. Taking IBM Model 1 as an example, Eq. \ref{eqC3.25-new} is rewritten here as:
\begin{eqnarray}
-\textrm{P}(\mathbf{s}| \mathbf{t}) = \frac{\epsilon}{(l+1)^{m}} \underbrace{\sum\limits_{a_1=0}^{l} ... \sum\limits_{a_m=0}^{l}}_{(l+1)^m\textrm{ iterations}} \underbrace{\prod\limits_{j=1}^{m} f(s_j|t_{a_j})}_{m\textrm{ iterations}}
+\textrm{P}(\mathbf{s}| \mathbf{t}) = \frac{\varepsilon}{(l+1)^{m}} \underbrace{\sum\limits_{a_1=0}^{l} \cdots \sum\limits_{a_m=0}^{l}}_{(l+1)^m\textrm{ iterations}} \underbrace{\prod\limits_{j=1}^{m} f(s_j|t_{a_j})}_{m\textrm{ iterations}}
\label{eqC3.28-new}
\end{eqnarray}
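\parinterval As a point of comparison, a direct implementation of this form enumerates all $(l+1)^m$ alignments; the sketch below (function name and toy usage are hypothetical) is therefore impractical beyond toy inputs:

\begin{verbatim}
from itertools import product

def ibm1_prob_naive(s, t, f, eps=0.1):
    """Sums prod_j f(s_j|t_{a_j}) over all (l+1)^m alignments a.
    t[0] is the empty word t_0, so l = len(t) - 1 and m = len(s)."""
    l, m = len(t) - 1, len(s)
    total = 0.0
    for a in product(range(l + 1), repeat=m):   # (l+1)^m alignments
        p = 1.0
        for j, a_j in enumerate(a):             # m factors each
            p *= f.get((s[j], t[a_j]), 0.0)
        total += p
    return eps / (l + 1) ** m * total
\end{verbatim}

\parinterval Because the sum of products distributes into a product of sums, this enumeration can be avoided entirely, as the following rewriting shows.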
......@@ -722,8 +722,8 @@ g(\mathbf{s},\mathbf{t}) \equiv \prod_{j,i \in \widehat{A}}{\textrm{P}(s_j,t_i)}
\parinterval Then, applying the manipulation in Eq. \ref{eqC3.29-new}, Eqs. \ref{eqC3.25-new} and \ref{eqC3.27-new} can be rewritten as:
\begin{eqnarray}
-\textrm{IBM Model 1:\ \ \ \ } \textrm{P}(\mathbf{s}| \mathbf{t}) & = & \frac{\epsilon}{(l+1)^{m}} \prod\limits_{j=1}^{m} \sum\limits_{i=0}^{l} f(s_j|t_i) \label{eq:final-model1} \\
-\textrm{IBM Model 2:\ \ \ \ }\textrm{P}(\mathbf{s}| \mathbf{t}) & = & \epsilon \prod\limits_{j=1}^{m} \sum\limits_{i=0}^{l} a(i|j,m,l) f(s_j|t_i) \label{eq:final-model2}
+\textrm{IBM Model 1:\ \ \ \ } \textrm{P}(\mathbf{s}| \mathbf{t}) & = & \frac{\varepsilon}{(l+1)^{m}} \prod\limits_{j=1}^{m} \sum\limits_{i=0}^{l} f(s_j|t_i) \label{eq:final-model1} \\
+\textrm{IBM Model 2:\ \ \ \ }\textrm{P}(\mathbf{s}| \mathbf{t}) & = & \varepsilon \prod\limits_{j=1}^{m} \sum\limits_{i=0}^{l} a(i|j,m,l) f(s_j|t_i) \label{eq:final-model2}
\label{eqC3.31-new}
\end{eqnarray}
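\parinterval A sketch of the rewritten Model 1 computation (the toy lexical table \texttt{f} is hypothetical, and \texttt{t[0]} plays the role of the empty word $t_0$): the product-of-sums form needs only $m \cdot (l+1)$ operations instead of $(l+1)^m$, and it returns exactly the same value as the naive enumeration above:

\begin{verbatim}
def ibm1_prob(s, t, f, eps=0.1):
    """P(s|t) = eps / (l+1)^m * prod_j sum_i f(s_j|t_i).
    t[0] is the empty word t_0, so l = len(t) - 1 and m = len(s)."""
    l, m = len(t) - 1, len(s)
    p = eps / (l + 1) ** m
    for s_j in s:                                       # m iterations ...
        p *= sum(f.get((s_j, t_i), 0.0) for t_i in t)   # ... of l+1 terms
    return p

# Toy parameters, for illustration only
f = {("谢谢", "thanks"): 0.7, ("你", "you"): 0.6}
print(ibm1_prob(["谢谢", "你"], ["<eps>", "thanks", "you"], f))
\end{verbatim}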
......@@ -772,11 +772,11 @@ g(\mathbf{s},\mathbf{t}) \equiv \prod_{j,i \in \widehat{A}}{\textrm{P}(s_j,t_i)}
\parinterval Here the objective is $\max(\textrm{P}_{\theta}(\mathbf{s}|\mathbf{t}))$, subject to the constraint $\sum_{s_x}{\textrm{P}(s_x|t_y)}=1$ for every target-language word $t_y$. By the method of Lagrange multipliers, this optimization problem can be recast as maximizing the following Lagrangian:
\begin{eqnarray}
-L(f,\lambda)=\frac{\epsilon}{(l+1)^m}\prod_{j=1}^{m}\sum_{i=0}^{l}\prod_{j=1}^{m}{f(s_j|t_i)}-\sum_{u}{\lambda_{t_y}(\sum_{s_x}{f(s_x|t_y)}-1)}
+L(f,\lambda)=\frac{\varepsilon}{(l+1)^m}\prod_{j=1}^{m}\sum_{i=0}^{l}{f(s_j|t_i)}-\sum_{t_y}{\lambda_{t_y}(\sum_{s_x}{f(s_x|t_y)}-1)}
\label{eqC3.33-new}
\end{eqnarray}
-\parinterval $L(f,\lambda)$ has two parts: $\frac{\epsilon}{(l+1)^m}\prod_{j=1}^{m}\sum_{i=0}^{l}\prod_{j=1}^{m}{f(s_j|t_i)}$ is the original objective function, and $\sum_{u}{\lambda_{t_y}(\sum_{s_x}{f(s_x|t_y)}-1)}$ is the original constraints multiplied by the Lagrange multipliers $\lambda_{t_y}$; there are as many multipliers as constraints. Figure \ref{fig:3-23} illustrates the meaning of each part of $L(f,\lambda)$.
+\parinterval $L(f,\lambda)$ has two parts: $\frac{\varepsilon}{(l+1)^m}\prod_{j=1}^{m}\sum_{i=0}^{l}{f(s_j|t_i)}$ is the original objective function, and $\sum_{t_y}{\lambda_{t_y}(\sum_{s_x}{f(s_x|t_y)}-1)}$ is the original constraints multiplied by the Lagrange multipliers $\lambda_{t_y}$; there are as many multipliers as constraints. Figure \ref{fig:3-23} illustrates the meaning of each part of $L(f,\lambda)$.
\vspace{6.0em}
%----------------------------------------------
% 图3.23
......@@ -790,9 +790,9 @@ L(f,\lambda)=\frac{\epsilon}{(l+1)^m}\prod_{j=1}^{m}\sum_{i=0}^{l}\prod_{j=1}^{m
\noindent Since $L(f,\lambda)$ is a differentiable function, its extreme points can be found where the derivative of $L(f,\lambda)$ equals zero. Since the model has only one type of parameter, $f(s_x|t_y)$, only the following derivative needs to be computed.
\begin{eqnarray}
-\frac{\partial L(f,\lambda)}{\partial f(s_u|t_v)}& = & \frac{\partial \big[ \frac{\epsilon}{(l+1)^{m}} \prod\limits_{j=1}^{m} \sum\limits_{i=0}^{l} f(s_j|t_i) \big]}{\partial f(s_u|t_v)} - \nonumber \\
+\frac{\partial L(f,\lambda)}{\partial f(s_u|t_v)}& = & \frac{\partial \big[ \frac{\varepsilon}{(l+1)^{m}} \prod\limits_{j=1}^{m} \sum\limits_{i=0}^{l} f(s_j|t_i) \big]}{\partial f(s_u|t_v)} - \nonumber \\
& & \frac{\partial \big[ \sum_{t_y} \lambda_{t_y} (\sum_{s_x} f(s_x|t_y) -1) \big]}{\partial f(s_u|t_v)} \nonumber \\
-& = & \frac{\epsilon}{(l+1)^{m}} \cdot \frac{\partial \big[ \prod\limits_{j=1}^{m} \sum\limits_{i=0}^{l} f(s_j|t_{a_j}) \big]}{\partial f(s_u|t_v)} - \lambda_{t_v}
+& = & \frac{\varepsilon}{(l+1)^{m}} \cdot \frac{\partial \big[ \prod\limits_{j=1}^{m} \sum\limits_{i=0}^{l} f(s_j|t_i) \big]}{\partial f(s_u|t_v)} - \lambda_{t_v}
\label{eqC3.34-new}
\end{eqnarray}
......@@ -821,20 +821,20 @@ $\frac{\partial g(z)}{\partial z} = \alpha \beta z^{\beta-1} = \frac{\beta}{z}\a
\parinterval Substituting $\frac{\partial \big[ \prod_{j=1}^{m} \sum_{i=0}^{l} f(s_j|t_i) \big]}{\partial f(s_u|t_v)}$ back into $\frac{\partial L(f,\lambda)}{\partial f(s_u|t_v)}$ yields the derivative of $L(f,\lambda)$:
\begin{eqnarray}
& &{\frac{\partial L(f,\lambda)}{\partial f(s_u|t_v)}}\nonumber \\
-&=&{\frac{\epsilon}{(l+1)^{m}} \cdot \frac{\partial \big[ \prod\limits_{j=1}^{m} \sum\limits_{i=0}^{l} f(s_j|t_{a_j}) \big]}{\partial f(s_u|t_v)} - \lambda_{t_v}}\nonumber \\
-&=&{\frac{\epsilon}{(l+1)^{m}} \frac{\sum_{j=1}^{m} \delta(s_j,s_u) \cdot \sum_{i=0}^{l} \delta(t_i,t_v)}{\sum_{i=0}^{l}f(s_u|t_i)} \prod\limits_{j=1}^{m} \sum\limits_{i=0}^{l} f(s_j|t_i) - \lambda_{t_v}}
+&=&{\frac{\varepsilon}{(l+1)^{m}} \cdot \frac{\partial \big[ \prod\limits_{j=1}^{m} \sum\limits_{i=0}^{l} f(s_j|t_i) \big]}{\partial f(s_u|t_v)} - \lambda_{t_v}}\nonumber \\
+&=&{\frac{\varepsilon}{(l+1)^{m}} \frac{\sum_{j=1}^{m} \delta(s_j,s_u) \cdot \sum_{i=0}^{l} \delta(t_i,t_v)}{\sum_{i=0}^{l}f(s_u|t_i)} \prod\limits_{j=1}^{m} \sum\limits_{i=0}^{l} f(s_j|t_i) - \lambda_{t_v}}
\label{eqC3.38-new}
\end{eqnarray}
\parinterval Setting $\frac{\partial L(f,\lambda)}{\partial f(s_u|t_v)}=0$, we have
\begin{eqnarray}
-f(s_u|t_v) = \frac{\lambda_{t_v}^{-1} \epsilon}{(l+1)^{m}} \cdot \frac{\sum\limits_{j=1}^{m} \delta(s_j,s_u) \cdot \sum\limits_{i=0}^{l} \delta(t_i,t_v)}{\sum\limits_{i=0}^{l}f(s_u|t_i)} \prod\limits_{j=1}^{m} \sum\limits_{i=0}^{l} f(s_j|t_i) \cdot f(s_u|t_v)
+f(s_u|t_v) = \frac{\lambda_{t_v}^{-1} \varepsilon}{(l+1)^{m}} \cdot \frac{\sum\limits_{j=1}^{m} \delta(s_j,s_u) \cdot \sum\limits_{i=0}^{l} \delta(t_i,t_v)}{\sum\limits_{i=0}^{l}f(s_u|t_i)} \prod\limits_{j=1}^{m} \sum\limits_{i=0}^{l} f(s_j|t_i) \cdot f(s_u|t_v)
\label{eqC3.39-new}
\end{eqnarray}
\noindent \hspace{2em} Slightly rearranging the equation above gives the form below. Note that this is not a closed-form expression for $f(s_u|t_v)$, because the right-hand side still contains $f(s_u|t_v)$.
\begin{eqnarray}
-f(s_u|t_v) = \lambda_{t_v}^{-1} \frac{\epsilon}{(l+1)^{m}} \prod\limits_{j=1}^{m} \sum\limits_{i=0}^{l} f(s_j|t_i) \sum\limits_{j=1}^{m} \delta(s_j,s_u) \sum\limits_{i=0}^{l} \delta(t_i,t_v) \frac{f(s_u|t_v) }{\sum\limits_{i=0}^{l}f(s_u|t_i)}
+f(s_u|t_v) = \lambda_{t_v}^{-1} \frac{\varepsilon}{(l+1)^{m}} \prod\limits_{j=1}^{m} \sum\limits_{i=0}^{l} f(s_j|t_i) \sum\limits_{j=1}^{m} \delta(s_j,s_u) \sum\limits_{i=0}^{l} \delta(t_i,t_v) \frac{f(s_u|t_v) }{\sum\limits_{i=0}^{l}f(s_u|t_i)}
\label{eqC3.40-new}
\end{eqnarray}
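\parinterval Read as a fixed-point update, the right-hand side above factors into a normalizer $\lambda_{t_v}^{-1}$ (which also absorbs the $\frac{\varepsilon}{(l+1)^m}\prod_j\sum_i f(s_j|t_i)$ term, since that term does not depend on $(s_u,t_v)$) times an expected count. A sketch of that expected count for one sentence pair (function name and toy usage are hypothetical):

\begin{verbatim}
def expected_count(s_u, t_v, s, t, f):
    """c_E(s_u|t_v; s,t) = f(s_u|t_v) / sum_i f(s_u|t_i)
                           * sum_j delta(s_j,s_u) * sum_i delta(t_i,t_v)"""
    denom = sum(f.get((s_u, t_i), 0.0) for t_i in t)
    if denom == 0.0:
        return 0.0
    posterior = f.get((s_u, t_v), 0.0) / denom  # alignment posterior mass
    n_s = sum(1 for s_j in s if s_j == s_u)     # occurrences of s_u in s
    n_t = sum(1 for t_i in t if t_i == t_v)     # occurrences of t_v in t
    return posterior * n_s * n_t
\end{verbatim}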
......@@ -910,19 +910,19 @@ f(s_u|t_v)=\frac{c_{\mathbb{E}}(s_u|t_v;\mathbf{s},\mathbf{t})} { \sum\limits_{
\end{eqnarray}
\noindent \hspace{2em} In general, for the $N$ parallel sentence pairs we have in practice (called a parallel corpus),
-${(\mathbf{s}^{[1]},\mathbf{t}^{[1]}),(\mathbf{s}^{[2]},\mathbf{t}^{[2]}),...,(\mathbf{s}^{[N]},\mathbf{t}^{[N]})}$, the expected count for $f(\mathbf{s}_u|\mathbf{t}_v)$ is:
+${(\mathbf{s}^{[1]},\mathbf{t}^{[1]}),(\mathbf{s}^{[2]},\mathbf{t}^{[2]}),...,(\mathbf{s}^{[N]},\mathbf{t}^{[N]})}$, the expected count for $f(s_u|t_v)$ is:
\begin{eqnarray}
-c_{\mathbb{E}}(\mathbf{s}_u|\mathbf{t}_v)=\sum\limits_{i=1}^{N} c_{\mathbb{E}}(\mathbf{s}_u|\mathbf{t}_v;s^{[i]},t^{[i]})
+c_{\mathbb{E}}(s_u|t_v)=\sum\limits_{i=1}^{N} c_{\mathbb{E}}(s_u|t_v;s^{[i]},t^{[i]})
\label{eqC3.47-new}
\end{eqnarray}
-\noindent \hspace{2em} This gives the update formula and iterative procedure for $f(\mathbf{s}_u|\mathbf{t}_v)$, shown below:
+\noindent \hspace{2em} This gives the update formula and iterative procedure for $f(s_u|t_v)$, shown below:
%----------------------------------------------
% 图3.30
\begin{figure}[htp]
\centering
\input{./Chapter3/Figures/figure-calculation-formula&iterative-process-of-function}
-\caption{Update formula and iterative procedure for $f(\mathbf{s}_u|\mathbf{t}_v)$}
+\caption{Update formula and iterative procedure for $f(s_u|t_v)$}
\label{fig:3-27}
\end{figure}
%---------------------------
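\parinterval Putting the pieces together gives a compact EM-style training loop for IBM Model 1. The sketch below inlines the expected-count computation above; the toy corpus (with \texttt{<eps>} as the empty word) is hypothetical:

\begin{verbatim}
from collections import defaultdict

def train_ibm1(corpus, n_iters=10):
    """corpus: list of (s, t) token-list pairs, with the empty word in t.
    Returns lexical probabilities f(s_u|t_v) estimated by EM."""
    f = defaultdict(float)
    src_vocab = {w for s, _ in corpus for w in s}
    for s, t in corpus:                  # uniform initialization
        for s_u in s:
            for t_v in t:
                f[(s_u, t_v)] = 1.0 / len(src_vocab)
    for _ in range(n_iters):
        counts = defaultdict(float)      # E-step: c_E summed over corpus
        for s, t in corpus:
            for s_u in set(s):
                denom = sum(f[(s_u, t_i)] for t_i in t)
                for t_v in set(t):
                    counts[(s_u, t_v)] += (f[(s_u, t_v)] / denom
                                           * s.count(s_u) * t.count(t_v))
        totals = defaultdict(float)      # M-step: renormalize per t_v,
        for (s_u, t_v), c in counts.items():  # i.e. the 1/lambda factor
            totals[t_v] += c
        for (s_u, t_v), c in counts.items():
            f[(s_u, t_v)] = c / totals[t_v]
    return f

corpus = [(["谢谢", "你"], ["<eps>", "thanks", "you"]),
          (["谢谢"], ["<eps>", "thanks"])]
print(train_ibm1(corpus)[("谢谢", "thanks")])
\end{verbatim}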
......
......@@ -8,7 +8,7 @@
\node [anchor=west,inner sep=2pt,fill=red!20,minimum height=3em] (eq1) at (0,0) {$f(s_u|t_v)$};
\node [anchor=west,inner sep=2pt] (eq2) at ([xshift=-2pt]eq1.east) {$=$};
\node [anchor=west,inner sep=2pt] (eq3) at ([xshift=-2pt]eq2.east) {$\lambda_{t_v}^{-1}$};
-\node [anchor=west,inner sep=2pt] (eq4) at ([xshift=-2pt]eq3.east) {$\frac{\epsilon}{(l+1)^{m}}$};
+\node [anchor=west,inner sep=2pt] (eq4) at ([xshift=-2pt]eq3.east) {$\frac{\varepsilon}{(l+1)^{m}}$};
\node [anchor=west,inner sep=2pt,fill=red!20,minimum height=3em] (eq5) at ([xshift=-2pt]eq4.east) {\footnotesize{$\prod\limits_{j=1}^{m} \sum\limits_{i=0}^{l} f(s_j|t_i)$}};
\node [anchor=west,inner sep=2pt] (eq6) at ([xshift=-2pt]eq5.east) {\footnotesize{$\sum\limits_{j=1}^{m} \delta(s_j,s_u) \sum\limits_{i=0}^{l} \delta(t_i,t_v)$}};
\node [anchor=west,inner sep=2pt,fill=red!20,minimum height=3em] (eq7) at ([xshift=-2pt,yshift=-0pt]eq6.east) {$\frac{f(s_u|t_v)}{\sum_{i=0}^{l}f(s_u|t_i)}$};
......
......@@ -11,7 +11,7 @@
\node [anchor=west,inner sep=2pt,minimum height=2em] (eq1) at (0,0) {$f(s_u|t_v)$};
\node [anchor=west,inner sep=2pt] (eq2) at ([xshift=-2pt]eq1.east) {$=$};
\node [anchor=west,inner sep=2pt,minimum height=2em] (eq3) at ([xshift=-2pt]eq2.east) {$\lambda_{t_v}^{-1}$};
-\node [anchor=west,inner sep=2pt,minimum height=3.0em] (eq4) at ([xshift=-3pt]eq3.east) {\footnotesize{$\frac{\epsilon}{(l+1)^{m}} \prod\limits_{j=1}^{m} \sum\limits_{i=0}^{l} f(s_j|t_i)$}};
+\node [anchor=west,inner sep=2pt,minimum height=3.0em] (eq4) at ([xshift=-3pt]eq3.east) {\footnotesize{$\frac{\varepsilon}{(l+1)^{m}} \prod\limits_{j=1}^{m} \sum\limits_{i=0}^{l} f(s_j|t_i)$}};
\node [anchor=west,inner sep=2pt,minimum height=3.0em] (eq5) at ([xshift=1pt]eq4.east) {\footnotesize{$\sum\limits_{j=1}^{m} \delta(s_j,s_u) \sum\limits_{i=0}^{l} \delta(t_i,t_v)$}};
\node [anchor=west,inner sep=2pt,minimum height=3.0em] (eq6) at ([xshift=1pt]eq5.east) {$\frac{f(s_u|t_v)}{\sum_{i=0}^{l}f(s_u|t_i)}$};
......
......@@ -7,23 +7,23 @@
\begin{tikzpicture}
-\node [anchor=west,inner sep=2pt] (eq1) at (0,0) {$f(\mathbf{s}_u|\mathbf{t}_v)$};
+\node [anchor=west,inner sep=2pt] (eq1) at (0,0) {$f(s_u|t_v)$};
\node [anchor=west] (eq2) at (eq1.east) {$=$\ };
\draw [-] ([xshift=0.3em]eq2.east) -- ([xshift=11.6em]eq2.east);
-\node [anchor=south west] (eq3) at ([xshift=1em]eq2.east) {$\sum_{i=1}^{N} c_{\mathbb{E}}(\mathbf{s}_u|\mathbf{t}_v;s^{[i]},t^{[i]})$};
-\node [anchor=north west] (eq4) at (eq2.east) {$\sum_{\mathbf{s}_u} \sum_{i=1}^{N} c_{\mathbb{E}}(\mathbf{s}_u|\mathbf{t}_v;s^{[i]},t^{[i]})$};
+\node [anchor=south west] (eq3) at ([xshift=1em]eq2.east) {$\sum_{i=1}^{N} c_{\mathbb{E}}(s_u|t_v;s^{[i]},t^{[i]})$};
+\node [anchor=north west] (eq4) at (eq2.east) {$\sum_{s_u} \sum_{i=1}^{N} c_{\mathbb{E}}(s_u|t_v;s^{[i]},t^{[i]})$};
{
\node [anchor=south] (label1) at ([yshift=-6em,xshift=3em]eq1.north west) {use this formula to compute};
-\node [anchor=north west] (label1part2) at ([yshift=0.3em]label1.south west) {the new $f(\mathbf{s}_u|\mathbf{t}_v)$};
+\node [anchor=north west] (label1part2) at ([yshift=0.3em]label1.south west) {the new $f(s_u|t_v)$};
}
{
-\node [anchor=west] (label2) at ([xshift=5em]label1.east) {use the current $f(\mathbf{s}_u|\mathbf{t}_v)$};
+\node [anchor=west] (label2) at ([xshift=5em]label1.east) {use the current $f(s_u|t_v)$};
\node [anchor=north west] (label2part2) at ([yshift=0.3em]label2.south west) {to compute expected counts $c_{\mathbb{E}}(\cdot)$};
}
{
-\node [anchor=west,fill=red!20,inner sep=2pt] (eq1) at (0,0) {$f(\mathbf{s}_u|\mathbf{t}_v)$};
+\node [anchor=west,fill=red!20,inner sep=2pt] (eq1) at (0,0) {$f(s_u|t_v)$};
}
\begin{pgfonlayer}{background}
......
......@@ -8,7 +8,7 @@
\node [anchor=west] (e2) at (e1.east) {$=$};
\node [anchor=west,inner sep=2pt,fill=red!20] (e3) at (e2.east) {$\prod\nolimits_{(j,i) \in \hat{A}} \textrm{P}(s_j,t_i)$};
\node [anchor=west,inner sep=1pt] (e4) at (e3.east) {$\times$};
-\node [anchor=west,inner sep=3pt,fill=blue!20] (e5) at (e4.east) {$\textrm{P}_{lm}(\mathbf{t})$};
+\node [anchor=west,inner sep=3pt,fill=blue!20] (e5) at (e4.east) {$\textrm{P}_{\textrm{lm}}(\mathbf{t})$};
\node [anchor=north west,inner sep=1pt] (n1) at ([xshift=7.0em,yshift=-0.5em]e1.south west) {$\textrm{P}(\mathbf{s}|\mathbf{t})$};
\node [anchor=north] (n1part2) at ([yshift=0.3em]n1.south) {\scriptsize{{translation model}}};
\node [anchor=west,inner sep=1pt] (n2) at ([xshift=4.0em]n1.east) {$\textrm{P}(\mathbf{t})$};
......
......@@ -65,7 +65,7 @@
\item In the same year, Dzmitry Bahdanau et al. applied the attention mechanism to machine translation for the first time, jointly modeling the translation and the correspondence between local translation units \cite{bahdanau2014neural}. The significance of this work lies in using a more effective model to represent source-language information, while using attention to model the links between different parts of the two languages. This approach handles the translation of long sentences effectively, and the intermediate attention results are interpretable to a certain extent\footnote{For example, the attention strength between words of the target and source sentences can, to some degree, reflect how likely the words are to be translations of each other.}. However, compared with earlier neural machine translation models, the attention model also introduces extra cost: the amount of computation is larger.
-\item In 2016, Google released the GNMT system, based on multi-layer recurrent neural networks. It integrated the neural machine translation techniques of the time with many improvements, significantly outperformed phrase-based machine translation systems \cite{Wu2016GooglesNM}, and attracted wide attention. Within less than a year, Facebook developed a new neural machine translation system based on convolutional neural networks (CNN) \cite{DBLP:journals/corr/GehringAGYD17}, which performed better than recurrent neural network (RNN) based systems and achieved a nearly 9-fold speedup.
+\item In 2016, Google released the GNMT system, based on multi-layer recurrent neural networks. It integrated the neural machine translation techniques of the time with many improvements, significantly outperformed phrase-based machine translation systems \cite{Wu2016GooglesNM}, and attracted wide attention. Within less than a year, Facebook developed a new neural machine translation system based on convolutional neural networks (CNN) \cite{DBLP:journals/corr/GehringAGYD17}, which performed better than recurrent neural network (RNN) based systems and achieved a clear speedup.
\item In 2017, Ashish Vaswani and colleagues at Google proposed a new translation model, Transformer. It abandons CNN and RNN structures entirely: using only the self-attention mechanism and feed-forward networks, without any sequence-aligned recurrent framework, it achieves strong performance and neatly addresses the long-distance dependency problem in translation \cite{NIPS2017_7181}. Transformer was the first model built entirely on attention; it is faster to compute, achieves better translation results, and has become the mainstream neural machine translation framework.
\end{itemize}
......@@ -1389,7 +1389,7 @@ L(\mathbf{Y},\hat{\mathbf{Y}}) = \sum_{j=1}^n L_{\textrm{ce}}(\mathbf{y}_j,\hat{
\begin{figure}[htp]
\centering
\input{./Chapter6/Figures/figure-Calculation-of-context-vector-C}
-\caption{Computation of the context vector C}
+\caption{Computation of the context vector $\mathbf{C}$}
\label{fig:6-39}
\end{figure}
%----------------------------------------------
......@@ -1432,8 +1432,8 @@ L(\mathbf{Y},\hat{\mathbf{Y}}) = \sum_{j=1}^n L_{\textrm{ce}}(\mathbf{y}_j,\hat{
\parinterval Why does this formulation represent positional information well? There are three reasons. First, sine and cosine are bounded periodic functions, so the positional encodings of sequences of different lengths are all confined to [-1,1]; adding them to the word embeddings therefore does not introduce values of a very different scale. Second, different dimensions of the positional encoding correspond to different sinusoids, which gives meaning to the multi-dimensional representation space. Finally, by the trigonometric identities:
%------------------------
\begin{eqnarray}
-\textrm{sin}(\alpha + \beta) &=& \textrm{sin}\alpha \textrm{cos} \beta + \textrm{cos} \alpha \textrm{sin} \beta \nonumber \\
-\textrm{cos}(\alpha + \beta) &=& \textrm{cos} \alpha \textrm{cos} \beta - \textrm{sin} \alpha \textrm{sin} \beta
+\textrm{sin}(\alpha + \beta) &=& \textrm{sin}\alpha \cdot \textrm{cos} \beta + \textrm{cos} \alpha \cdot \textrm{sin} \beta \nonumber \\
+\textrm{cos}(\alpha + \beta) &=& \textrm{cos} \alpha \cdot \textrm{cos} \beta - \textrm{sin} \alpha \cdot \textrm{sin} \beta
\label{eqC6.43}
\end{eqnarray}
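\parinterval A small sketch of the sinusoidal positional encoding described here ($d_{model}$ and the sequence length are arbitrary choices; the standard Transformer formulas $PE(pos,2i)=\textrm{sin}(pos/10000^{2i/d_{model}})$ and $PE(pos,2i+1)=\textrm{cos}(pos/10000^{2i/d_{model}})$ are assumed):

\begin{verbatim}
import numpy as np

def positional_encoding(max_len, d_model):
    """PE[pos, 2i]   = sin(pos / 10000^(2i/d_model)),
    PE[pos, 2i+1] = cos(pos / 10000^(2i/d_model)); values lie in [-1, 1]."""
    pos = np.arange(max_len)[:, None]          # positions, as a column
    i = np.arange(0, d_model, 2)[None, :]      # even dimension indices
    angles = pos / np.power(10000.0, i / d_model)
    pe = np.zeros((max_len, d_model))
    pe[:, 0::2] = np.sin(angles)
    pe[:, 1::2] = np.cos(angles)
    return pe

pe = positional_encoding(max_len=128, d_model=512)
print(pe.shape, pe.min(), pe.max())  # (128, 512), values within [-1, 1]
\end{verbatim}

\parinterval By the identities above, for any fixed offset $k$, $PE(pos+k)$ can be expressed as a linear function of $PE(pos)$, which is what allows the model to generalize to relative positions.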
......@@ -1707,8 +1707,8 @@ lrate = d_{model}^{-0.5} \cdot \textrm{min} (step^{-0.5} , step \cdot warmup\_st
\label{tab:word-translation-examples}
\begin{tabular}{l | l l l}
-\multirow{2}{*}{\#} & \multicolumn{2}{c}{BLEU} & \multirow{2}{*}{params} \\
-& EN-DE & EN-FR & \\ \hline
+\multirow{2}{*}{System} & \multicolumn{2}{c}{BLEU[\%]} & \# of \\
+& EN-DE & EN-FR & params \\ \hline
Transformer Base & 27.3 & 38.1 & 65$\times 10^{6}$ \\
Transformer Big & 28.4 & 41.8 & 213$\times 10^{6}$ \\
Transformer Deep (48 layers) & 30.2 & 43.1 & 194$\times 10^{6}$ \\
......