Commit a3a7c1da by zengxin

chapter7 fig

parent 46043306
...@@ -270,7 +270,7 @@ ...@@ -270,7 +270,7 @@
\subsubsection{大词表和OOV问题} \subsubsection{大词表和OOV问题}
\parinterval 首先来具体看一看神经机器翻译的大词表问题。神经机器翻译模型训练和解码都依赖于源语言和目标语言的词表。在建模中,词表中的每一个单词都会被转换为分布式(向量)表示,即词嵌入。这些向量会作为模型的输入(见第六章)。如果每个单词都对应一个向量,那么单词的各种变形(时态、语态等)都会导致词表和相应的向量数量的增加。 \parinterval 首先来具体看一看神经机器翻译的大词表问题。神经机器翻译模型训练和解码都依赖于源语言和目标语言的词表。在建模中,词表中的每一个单词都会被转换为分布式(向量)表示,即词嵌入。这些向量会作为模型的输入(见第六章)。如果每个单词都对应一个向量,那么单词的各种变形(时态、语态等)都会导致词表和相应的向量数量的增加。\ref{fig:7-7}展示了一些英语单词的时态语态变化。
%---------------------------------------------- %----------------------------------------------
\begin{figure}[htp] \begin{figure}[htp]
...@@ -1180,9 +1180,7 @@ b &=& \omega_{\textrm{high}}\cdot |\mathbf{x}| ...@@ -1180,9 +1180,7 @@ b &=& \omega_{\textrm{high}}\cdot |\mathbf{x}|
\label{eq:7-15} \label{eq:7-15}
\end{eqnarray} \end{eqnarray}
\noindent 其中$\gamma_{k}$表示第$k$个系统的权重,且满足$\sum_{k=1}^{K} \gamma_{k} = 1$。公式\ref{eq:7-15}是一种线性模型。权重$\{ \gamma_{k}\}$可以在开发集上自动调整,比如,使用最小错误率训练得到最优的权重(见第四章)。不过在实践中发现,如果这$K$个模型都是由一个基础模型衍生出来的,权重$\{ \gamma_{k}\}$对最终结果的影响并不大。因此,有时候也简单的将权重设置为$\gamma_{k} = \frac{1}{K}$ \noindent 其中$\gamma_{k}$表示第$k$个系统的权重,且满足$\sum_{k=1}^{K} \gamma_{k} = 1$。公式\ref{eq:7-15}是一种线性模型。权重$\{ \gamma_{k}\}$可以在开发集上自动调整,比如,使用最小错误率训练得到最优的权重(见第四章)。不过在实践中发现,如果这$K$个模型都是由一个基础模型衍生出来的,权重$\{ \gamma_{k}\}$对最终结果的影响并不大。因此,有时候也简单的将权重设置为$\gamma_{k} = \frac{1}{K}$。图\ref{fig:7-25}展示了对三个模型预测结果的集成。
\parinterval 公式\ref{eq:7-15}是一种典型的线性插值模型,这类模型在语言建模等任务中已经得到成功应用。从统计学习的角度,对多个模型的插值可以有效的降低经验错误率。不过,多模型集成依赖一个假设:这些模型之间需要有一定的互补性。这种互补性有时也体现在多个模型预测的上限上,称为Oracle。比如,可以把这$K$个模型输出中BLEU最高的结果作为Oracle,也可以选择每个预测结果中使BLEU达到最高的译文单词,这样构成的句子作为Oracle。当然,并不是说Oracle提高,模型集成的结果一定会变好。因为Oracle是最理想情况下的结果,而实际预测的结果与Oracle往往有很大差异。如何使用Oracle进行模型优化也是很多研究者在探索的问题。
%---------------------------------------------- %----------------------------------------------
\begin{figure}[htp] \begin{figure}[htp]
...@@ -1193,6 +1191,8 @@ b &=& \omega_{\textrm{high}}\cdot |\mathbf{x}| ...@@ -1193,6 +1191,8 @@ b &=& \omega_{\textrm{high}}\cdot |\mathbf{x}|
\end{figure} \end{figure}
%---------------------------------------------- %----------------------------------------------
\parinterval 公式\ref{eq:7-15}是一种典型的线性插值模型,这类模型在语言建模等任务中已经得到成功应用。从统计学习的角度,对多个模型的插值可以有效的降低经验错误率。不过,多模型集成依赖一个假设:这些模型之间需要有一定的互补性。这种互补性有时也体现在多个模型预测的上限上,称为Oracle。比如,可以把这$K$个模型输出中BLEU最高的结果作为Oracle,也可以选择每个预测结果中使BLEU达到最高的译文单词,这样构成的句子作为Oracle。当然,并不是说Oracle提高,模型集成的结果一定会变好。因为Oracle是最理想情况下的结果,而实际预测的结果与Oracle往往有很大差异。如何使用Oracle进行模型优化也是很多研究者在探索的问题。
\parinterval 此外,如何构建集成用的模型也是非常重要的,甚至说这部分工作会成为模型集成方法中最困难的部分。绝大多数时候,模型生成并没有固定的方法。系统研发者大多也是``八仙过海、各显神通''。一些常用的方法有: \parinterval 此外,如何构建集成用的模型也是非常重要的,甚至说这部分工作会成为模型集成方法中最困难的部分。绝大多数时候,模型生成并没有固定的方法。系统研发者大多也是``八仙过海、各显神通''。一些常用的方法有:
\begin{itemize} \begin{itemize}
......
\begin{tikzpicture} \begin{tikzpicture}
\tikzstyle{node} = [minimum height=1.0em,draw=teal,fill=teal!10] \tikzstyle{node} = [minimum height=1.0*1.2em,draw=teal,fill=teal!10]
\tikzstyle{legend} = [minimum height=1.0em,minimum width=1.0em,draw] \tikzstyle{legend} = [minimum height=1.0*1.2em,minimum width=1.0*1.2em,draw]
\tikzstyle{node2} = [minimum width=1.0em,minimum height=4.1em,draw=blue,fill=blue!10] \tikzstyle{node2} = [minimum width=1.0*1.2em,minimum height=4.1*1.2em,draw=blue,fill=blue!10]
\node[node,minimum width=2.8em] (node1) at (0,0) {}; \node[node,minimum width=2.8*1.2em] (node1) at (0,0) {};
\node[node,minimum width=4.0em,anchor=north west] (node2) at (node1.south west) {}; \node[node,minimum width=4.0*1.2em,anchor=north west] (node2) at (node1.south west) {};
\node[node,minimum width=3.2em,anchor=north west] (node3) at (node2.south west) {}; \node[node,minimum width=3.2*1.2em,anchor=north west] (node3) at (node2.south west) {};
\node[node,minimum width=3.0em,anchor=north west] (node4) at (node3.south west) {}; \node[node,minimum width=3.0*1.2em,anchor=north west] (node4) at (node3.south west) {};
\node[node2,anchor = north west] (grad1) at ([xshift=1.2em]node1.north east) {}; \node[node2,anchor = north west] (grad1) at ([xshift=1.2em]node1.north east) {};
\node[node,minimum width=3.7em,anchor=north west] (node5) at (grad1.north east) {}; \node[node,minimum width=3.7*1.2em,anchor=north west] (node5) at (grad1.north east) {};
\node[node,minimum width=2.8em,anchor=north west] (node6) at (node5.south west) {}; \node[node,minimum width=2.8*1.2em,anchor=north west] (node6) at (node5.south west) {};
\node[node,minimum width=3.2em,anchor=north west] (node7) at (node6.south west) {}; \node[node,minimum width=3.2*1.2em,anchor=north west] (node7) at (node6.south west) {};
\node[node,minimum width=4.0em,anchor=north west] (node8) at (node7.south west) {}; \node[node,minimum width=4.0*1.2em,anchor=north west] (node8) at (node7.south west) {};
\node[font=\scriptsize,anchor=east] (line1) at (node1.west) {gpu1}; \node[font=\footnotesize,anchor=east] (line1) at (node1.west) {gpu1};
\node[font=\scriptsize,anchor=east] (line2) at (node2.west) {gpu2}; \node[font=\footnotesize,anchor=east] (line2) at (node2.west) {gpu2};
\node[font=\scriptsize,anchor=east] (line3) at (node3.west) {gpu3}; \node[font=\footnotesize,anchor=east] (line3) at (node3.west) {gpu3};
\node[font=\scriptsize,anchor=east] (line4) at (node4.west) {gpu4}; \node[font=\footnotesize,anchor=east] (line4) at (node4.west) {gpu4};
\node[node2,anchor = north west] (grad2) at ([xshift=0.3em]node5.north east) {}; \node[node2,anchor = north west] (grad2) at ([xshift=0.3em]node5.north east) {};
\draw[->] (-1.4em,-3.62em) -- (9.5em,-3.62em); \draw[->] (-1.4em*1.2,-3.62*1.2em) -- (9em*1.2,-3.62*1.2em);
\node[node,minimum width=2.8em] (node9) at (15em,0) {}; \node[node,minimum width=2.8*1.2em] (node9) at (16em,0) {};
\node[node,minimum width=4.0em,anchor=north west] (node10) at (node9.south west) {}; \node[node,minimum width=4.0*1.2em,anchor=north west] (node10) at (node9.south west) {};
\node[node,minimum width=3.2em,anchor=north west] (node11) at (node10.south west) {}; \node[node,minimum width=3.2*1.2em,anchor=north west] (node11) at (node10.south west) {};
\node[node,minimum width=3.0em,anchor=north west] (node12) at (node11.south west) {}; \node[node,minimum width=3.0*1.2em,anchor=north west] (node12) at (node11.south west) {};
\node[node,minimum width=3.7em,anchor=north west] (node13) at (node9.north east) {}; \node[node,minimum width=3.7*1.2em,anchor=north west] (node13) at (node9.north east) {};
\node[node,minimum width=2.8em,anchor=north west] (node14) at (node10.north east) {}; \node[node,minimum width=2.8*1.2em,anchor=north west] (node14) at (node10.north east) {};
\node[node,minimum width=3.2em,anchor=north west] (node15) at (node11.north east) {}; \node[node,minimum width=3.2*1.2em,anchor=north west] (node15) at (node11.north east) {};
\node[node,minimum width=4.0em,anchor=north west] (node16) at (node12.north east) {}; \node[node,minimum width=4.0*1.2em,anchor=north west] (node16) at (node12.north east) {};
\node[node2,anchor = north west] (grad3) at ([xshift=0.5em]node13.north east) {}; \node[node2,anchor = north west] (grad3) at ([xshift=0.5em]node13.north east) {};
\node[font=\scriptsize,anchor=east] (line1) at (node9.west) {gpu1}; \node[font=\footnotesize,anchor=east] (line1) at (node9.west) {gpu1};
\node[font=\scriptsize,anchor=east] (line2) at (node10.west) {gpu2}; \node[font=\footnotesize,anchor=east] (line2) at (node10.west) {gpu2};
\node[font=\scriptsize,anchor=east] (line3) at (node11.west) {gpu3}; \node[font=\footnotesize,anchor=east] (line3) at (node11.west) {gpu3};
\node[font=\scriptsize,anchor=east] (line4) at (node12.west) {gpu4}; \node[font=\footnotesize,anchor=east] (line4) at (node12.west) {gpu4};
\draw[->] (13.6em,-3.62em) -- (22.2em,-3.62em); \draw[->] (13.6*1.2em,-3.62*1.2em) -- (20.5*1.2em,-3.62*1.2em);
\begin{pgfonlayer}{background} \begin{pgfonlayer}{background}
\node [rectangle,inner sep=-0.0em,draw] [fit = (node1) (node2) (node3) (node4)] (box1) {}; \node [rectangle,inner sep=-0.0em,draw] [fit = (node1) (node2) (node3) (node4)] (box1) {};
\node [rectangle,inner sep=-0.0em,draw] [fit = (node5) (node6) (node7) (node8)] (box2) {}; \node [rectangle,inner sep=-0.0em,draw] [fit = (node5) (node6) (node7) (node8)] (box2) {};
\node [rectangle,inner sep=-0.0em,draw] [fit = (node9) (node13) (node12) (node16)] (box2) {}; \node [rectangle,inner sep=-0.0em,draw] [fit = (node9) (node13) (node12) (node16)] (box2) {};
\end{pgfonlayer} \end{pgfonlayer}
\node[font=\scriptsize,anchor=north] (legend1) at ([xshift=3em]node4.south) {一步一更新}; \node[font=\footnotesize,anchor=north] (legend1) at ([xshift=3em]node4.south) {一步一更新};
\node[font=\scriptsize,anchor=north] (legend2) at ([xshift=2.5em]node12.south) {累积两步更新}; \node[font=\footnotesize,anchor=north] (legend2) at ([xshift=2.5em]node12.south) {累积两步更新};
\node[font=\scriptsize,anchor=north] (time1) at (grad2.south) {time}; \node[font=\footnotesize,anchor=north] (time1) at (grad2.south) {time};
\node[font=\scriptsize,anchor=north] (time1) at (grad3.south) {time}; \node[font=\footnotesize,anchor=north] (time1) at (grad3.south) {time};
\node[legend] (legend3) at (2em,2em) {}; \node[legend] (legend3) at (2em,2em) {};
\node[font=\scriptsize,anchor=west] (idle) at (legend3.east) {:空闲}; \node[font=\footnotesize,anchor=west] (idle) at (legend3.east) {:空闲};
\node[legend,anchor=west,draw=teal,fill=teal!10] (legend4) at ([xshift = 2em]idle.east) {}; \node[legend,anchor=west,draw=teal,fill=teal!10] (legend4) at ([xshift = 2em]idle.east) {};
\node[font=\scriptsize,anchor=west] (FB) at (legend4.east) {:前向/反向}; \node[font=\footnotesize,anchor=west] (FB) at (legend4.east) {:前向/反向};
\node[legend,anchor=west,draw=blue,fill=blue!10] (legend5) at ([xshift = 2em]FB.east) {}; \node[legend,anchor=west,draw=blue,fill=blue!10] (legend5) at ([xshift = 2em]FB.east) {};
\node[font=\scriptsize,anchor=west] (grad_sync) at (legend5.east) {:梯度更新}; \node[font=\footnotesize,anchor=west] (grad_sync) at (legend5.east) {:梯度更新};
\end{tikzpicture} \end{tikzpicture}
\ No newline at end of file
\begin{tikzpicture} \begin{tikzpicture}
\tikzstyle{node} = [minimum height=1.0em,draw=teal,fill=teal!10] \tikzstyle{node} = [minimum height=1.0*1.2em,draw=teal,fill=teal!10]
\node[node,minimum width=2.0em] (sent1) at (0,0) {}; \node[node,minimum width=2.0*1.2em] (sent1) at (0,0) {};
\node[node,minimum width=5.0em,anchor=north west] (sent2) at (sent1.south west) {}; \node[node,minimum width=5.0*1.2em,anchor=north west] (sent2) at (sent1.south west) {};
\node[node,minimum width=1.0em,anchor=north west] (sent3) at (sent2.south west) {}; \node[node,minimum width=1.0*1.2em,anchor=north west] (sent3) at (sent2.south west) {};
\node[node,minimum width=3.0em,anchor=north west] (sent4) at (sent3.south west) {}; \node[node,minimum width=3.0*1.2em,anchor=north west] (sent4) at (sent3.south west) {};
\node[node,minimum width=4.0em] (sent5) at (12em,0) {}; \node[node,minimum width=4.0*1.2em] (sent5) at (14em,0) {};
\node[node,minimum width=4.5em,anchor=north west] (sent6) at (sent5.south west) {}; \node[node,minimum width=4.5*1.2em,anchor=north west] (sent6) at (sent5.south west) {};
\node[node,minimum width=4.5em,anchor=north west] (sent7) at (sent6.south west) {}; \node[node,minimum width=4.5*1.2em,anchor=north west] (sent7) at (sent6.south west) {};
\node[node,minimum width=5em,anchor=north west] (sent8) at (sent7.south west) {}; \node[node,minimum width=5*1.2em,anchor=north west] (sent8) at (sent7.south west) {};
\node[font=\scriptsize,anchor=east] (line1) at (sent1.west) {sent1}; \node[font=\footnotesize,anchor=east] (line1) at (sent1.west) {sent1};
\node[font=\scriptsize,anchor=east] (line2) at (sent2.west) {sent2}; \node[font=\footnotesize,anchor=east] (line2) at (sent2.west) {sent2};
\node[font=\scriptsize,anchor=east] (line3) at (sent3.west) {sent3}; \node[font=\footnotesize,anchor=east] (line3) at (sent3.west) {sent3};
\node[font=\scriptsize,anchor=east] (line4) at (sent4.west) {sent4}; \node[font=\footnotesize,anchor=east] (line4) at (sent4.west) {sent4};
\node[font=\scriptsize,anchor=east] (line5) at (sent5.west) {sent1}; \node[font=\footnotesize,anchor=east] (line5) at (sent5.west) {sent1};
\node[font=\scriptsize,anchor=east] (line6) at (sent6.west) {sent2}; \node[font=\footnotesize,anchor=east] (line6) at (sent6.west) {sent2};
\node[font=\scriptsize,anchor=east] (line7) at (sent7.west) {sent3}; \node[font=\footnotesize,anchor=east] (line7) at (sent7.west) {sent3};
\node[font=\scriptsize,anchor=east] (line8) at (sent8.west) {sent4}; \node[font=\footnotesize,anchor=east] (line8) at (sent8.west) {sent4};
\begin{pgfonlayer}{background} \begin{pgfonlayer}{background}
\node [rectangle,inner sep=-0.0em,draw] [fit = (sent1) (sent2) (sent3) (sent4)] (box1) {}; \node [rectangle,inner sep=-0.0em,draw] [fit = (sent1) (sent2) (sent3) (sent4)] (box1) {};
\node [rectangle,inner sep=-0.0em,draw] [fit = (sent5) (sent6) (sent7) (sent8)] (box2) {}; \node [rectangle,inner sep=-0.0em,draw] [fit = (sent5) (sent6) (sent7) (sent8)] (box2) {};
\end{pgfonlayer} \end{pgfonlayer}
\node[font=\scriptsize] (node1) at ([yshift=-3em]sent2.south) {随机生成}; \node[font=\footnotesize] (node1) at ([yshift=-3.4em]sent2.south) {随机生成};
\node[font=\scriptsize] (node2) at ([yshift=-1em]sent8.south) {排序生成}; \node[font=\footnotesize] (node2) at ([yshift=-1em]sent8.south) {排序生成};
\end{tikzpicture} \end{tikzpicture}
\ No newline at end of file
\begin{center} \begin{center}
\centerline{以英语为例:}
\vspace{0.5em}
\begin{tikzpicture} \begin{tikzpicture}
\node[rounded corners=3pt,minimum width=10.0em,minimum height=2.0em,draw,thick,fill=green!5,font=\scriptsize,drop shadow,inner sep=0.5em] (left) at (0,0) { \node[rounded corners=3pt,minimum width=10.0em,minimum height=2.0em,draw,thick,fill=green!5,font=\scriptsize,drop shadow,inner sep=0.5em] (left) at (0,0) {
\begin{tabular}{c} \begin{tabular}{c}
......
Markdown 格式
0%
您添加了 0 到此讨论。请谨慎行事。
请先完成此评论的编辑!
注册 或者 后发表评论