Figures for Chapter 9

Commit 0932b887 authored Nov 03, 2020 by 孟霞
--- a/Chapter9/Figures/figure-deep-learning.jpg
+++ b/Chapter9/Figures/figure-deep-learning.jpg
--- a/Chapter9/Figures/figure-embedding-matrix.tex
+++ b/Chapter9/Figures/figure-embedding-matrix.tex
 %%%------------------------------------------------------------------------------------------------------------
 \begin{tikzpicture}
 \begin{scope}
-\node [anchor=center,inner sep=2pt] (e) at (0,0) {\small{$e=w$}};
-\node [anchor=west,inner sep=2pt] (c) at (e.east) {\small{${\bm C}$}};
+\node [anchor=center,inner sep=2pt] (e) at (0,0) {\small{$\vectorn{\emph{e}}=\vectorn{\emph{o}}$}};
+\node [anchor=west,inner sep=2pt] (c) at (e.east) {\small{$\vectorn{\emph{C}}$}};

 \begin{pgfonlayer}{background}
 \node [rectangle,inner sep=0.4em,draw,fill=blue!20!white] [fit = (e) (c)] (box) {};
 \end{pgfonlayer}

 \draw [->,thick] ([yshift=-1em]box.south)--([yshift=-0.1em]box.south) node [pos=0,below] (bottom1) {\small{单词$w$的one-hot表示}};
-\draw [->,thick] ([yshift=0.1em]box.north)--([yshift=1em]box.north) node [pos=1,above] (top1) {\scriptsize{${\bm e}$=(8,.2,-1,.9,...,1)}};
-\node [anchor=north] (bottom2) at ([yshift=0.3em]bottom1.south) {\scriptsize{${\bm o}$=(0,0,1,0,...,0)}};
+\draw [->,thick] ([yshift=0.1em]box.north)--([yshift=1em]box.north) node [pos=1,above] (top1) {\scriptsize{$\vectorn{\emph{e}}$=(8,.2,-1,.9,...,1)}};
+\node [anchor=north] (bottom2) at ([yshift=0.3em]bottom1.south) {\scriptsize{$\vectorn{\emph{o}}$=(0,0,1,0,...,0)}};
 \node [anchor=south] (top2) at ([yshift=-0.3em]top1.north) {\small{单词$w$的分布式表示}};

 {
 \node [anchor=north west,fill=red!20!white] (cmatrix) at ([xshift=3em,yshift=1.0em]c.north east) {\scriptsize{$\begin{pmatrix} 1 & .2 & -.2 & 8 & ... & 0 \\ .6 & .8 & -2 & 1 & ... & -.2 \\ 8 & .2 & -1 & .9 & ... & 2.3 \\ 1 & 1.2 & -.9 & 3 & ... & .2 \\ ... & ... & ... & ... & ... & ... \\ 1 & .3 & 3 & .9 & ... & 5.1 \end{pmatrix}$}};
-\node [anchor=west,inner sep=2pt,fill=red!30!white] (c) at (e.east) {\small{$\textbf{C}$}};
+\node [anchor=west,inner sep=2pt,fill=red!30!white] (c) at (e.east) {\small{$\vectorn{\emph{C}}$}};
 \draw [<-,thick] (c.east) -- ([xshift=3em]c.east);
 }

 {
-\node [anchor=south,draw,fill=green!20!white] (e2) at ([yshift=1.5em]cmatrix.north) {\scriptsize{外部词嵌入系统得到的${\bm C}$}};
+\node [anchor=south,draw,fill=green!20!white] (e2) at ([yshift=1.5em]cmatrix.north) {\scriptsize{外部词嵌入系统得到的$\vectorn{\emph{C}}$}};
 \draw [->,very thick,dashed] (e2.south) -- (cmatrix.north);
 }


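The relabelled node in `figure-embedding-matrix.tex` now reads $e = oC$. The distributed representation $e$ of word $w$ is the one-hot row vector $o$ multiplied by the embedding matrix $C$, and that product simply selects one row of $C$. Below is a minimal pure-Python sketch of that lookup. It assumes a toy 4×4 matrix whose values are copied from the first four columns of the figure's first four rows, with the figure's `...` entries omitted. The helpers `one_hot` and `vec_mat` are hypothetical names used only for illustration.

```python
# Toy embedding matrix C: one row per vocabulary word, one column per embedding dim.
# Values are the first four columns of the first four rows drawn in the figure;
# the figure's "..." columns and rows are omitted (illustrative assumption).
C = [
    [1.0, 0.2, -0.2, 8.0],
    [0.6, 0.8, -2.0, 1.0],
    [8.0, 0.2, -1.0, 0.9],
    [1.0, 1.2, -0.9, 3.0],
]


def one_hot(index, size):
    """Hypothetical helper: row vector o with a single 1 at position `index`."""
    return [1.0 if k == index else 0.0 for k in range(size)]


def vec_mat(o, M):
    """Hypothetical helper: row vector times matrix, e_j = sum_k o_k * M[k][j]."""
    return [sum(o[k] * M[k][j] for k in range(len(M))) for j in range(len(M[0]))]


o = one_hot(2, len(C))  # o = (0, 0, 1, 0), as drawn under the box in the figure
e = vec_mat(o, C)       # e = oC selects the third row of C
print(e)                # [8.0, 0.2, -1.0, 0.9]
```

Implementations typically skip the explicit product and index row $w$ of $C$ directly. This gives the same vector at $O(d)$ cost instead of $O(|V| \cdot d)$.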
--- a/Chapter9/Figures/figure-feature-engineering.jpg
+++ b/Chapter9/Figures/figure-feature-engineering.jpg
--- a/Chapter9/Figures/figure-four-layers-of-neural-network.tex
+++ b/Chapter9/Figures/figure-four-layers-of-neural-network.tex
@@ -25,7 +25,7 @@
 \node [rectangle,inner sep=0.2em,fill=red!20] [fit = (neuron01) (neuron05)] (layer01) {};
 \end{pgfonlayer}

-\node [anchor=west] (layer00label) at ([xshift=1.3em]x5.east) {\footnotesize{第0层}};
+\node [anchor=west] (layer00label) at ([xshift=1.4em]x5.east) {\footnotesize{第0层}};
 \node [anchor=west] (layer00label2) at (layer00label.east) {\footnotesize{(输入层)}};
 {
 \node [anchor=west] (layer01label) at ([xshift=1em]layer01.east) {\footnotesize{第1层}};

--- a/Chapter9/Figures/figure-multilayer-neural-network-example.tex
+++ b/Chapter9/Figures/figure-multilayer-neural-network-example.tex
@@ -54,7 +54,7 @@
 \foreach \n in {1,...,4}{
    \node [neuronnode] (neuron3\n) at (\n * \neuronsep,12.4em) {};
 {
-    \draw [<-] ([yshift=0.6em]neuron3\n.north) -- ([yshift=0.0em]neuron3\n.north) node [pos=0,above] {\scriptsize{output}};
+    \draw [<-] ([yshift=0.6em]neuron3\n.north) -- ([yshift=0.0em]neuron3\n.north) node [pos=0,above] {\scriptsize{输出}};
    }

    \draw [->] ([yshift=-0.6em]neuron3\n.south) -- ([yshift=0.0em]neuron3\n.south);

--- a/Chapter9/Figures/figure-parallel.tex
+++ b/Chapter9/Figures/figure-parallel.tex
@@ -25,10 +25,10 @@
 \tikzstyle{processor} = [draw,thick,fill=orange!20,minimum width=4em,align=left,rounded corners=2pt]

 {
-\node [processor,anchor=north,align=center] (processor2) at ([yshift=-1.2in]serverlabel.south) {\scriptsize{处理器 2}\\\scriptsize{on GPU2 (G2)}};
+\node [processor,anchor=north,align=center] (processor2) at ([yshift=-1.2in]serverlabel.south) {\scriptsize{处理器 2}\\\scriptsize{(G2)}};
 \node [anchor=north] (labela) at ([xshift=4em,yshift=-1em]processor2.south) {\footnotesize {(a)同步更新}};
-\node [processor,anchor=east,align=center] (processor1) at ([xshift=-1em]processor2.west) {\scriptsize{处理器 1}\\\scriptsize{on GPU1 (G1)}};
-\node [processor,anchor=west,align=center] (processor3) at ([xshift=1em]processor2.east) {\scriptsize{处理器 3}\\\scriptsize{on GPU3 (G3)}};
+\node [processor,anchor=east,align=center] (processor1) at ([xshift=-1em]processor2.west) {\scriptsize{处理器 1}\\\scriptsize{(G1)}};
+\node [processor,anchor=west,align=center] (processor3) at ([xshift=1em]processor2.east) {\scriptsize{处理器 3}\\\scriptsize{(G3)}};
 }

 {
@@ -103,10 +103,10 @@
 \tikzstyle{processor} = [draw,thick,fill=orange!20,minimum width=4em,align=left,rounded corners=2pt]

 {
-\node [processor,anchor=north,align=center] (processor2) at ([yshift=-1.2in]serverlabel.south) {\scriptsize{处理器 2}\\\scriptsize{on GPU2 (G2)}};
+\node [processor,anchor=north,align=center] (processor2) at ([yshift=-1.2in]serverlabel.south) {\scriptsize{处理器 2}\\\scriptsize{(G2)}};
 \node [anchor=north] (label) at ([xshift=4em,yshift=-1em]processor2.south) {\footnotesize {(b)异步更新}};
-\node [processor,anchor=east,align=center] (processor1) at ([xshift=-1em]processor2.west) {\scriptsize{处理器 1}\\\scriptsize{on GPU1 (G1)}};
-\node [processor,anchor=west,align=center] (processor3) at ([xshift=1em]processor2.east) {\scriptsize{处理器 3}\\\scriptsize{on GPU3 (G3)}};
+\node [processor,anchor=east,align=center] (processor1) at ([xshift=-1em]processor2.west) {\scriptsize{处理器 1}\\\scriptsize{(G1)}};
+\node [processor,anchor=west,align=center] (processor3) at ([xshift=1em]processor2.east) {\scriptsize{处理器 3}\\\scriptsize{(G3)}};
 }

 {

--- a/Chapter9/Figures/figure-the-amount-of-data-in-a-bilingual-corpus.tex
+++ b/Chapter9/Figures/figure-the-amount-of-data-in-a-bilingual-corpus.tex
@@ -7,7 +7,7 @@
    yticklabel style={/pgf/number format/precision=1,/pgf/number format/fixed zerofill},
    xticklabel style={/pgf/number format/1000 sep=},
    xlabel style={yshift=0.5em},
-    xlabel={\footnotesize{Year}},ylabel={\footnotesize{句子数量}},
+    xlabel={\footnotesize{Year}},ylabel={\footnotesize{句子数量(个)}},
    ymin=1,ymax=1000000000000,
    xmin=1999,xmax=2020,xtick={2000,2005,2010,2015,2020},
    legend style={yshift=-5em,xshift=0em,legend cell align=left,legend plot pos=right}

--- a/Chapter9/Figures/figure-w2.tex
+++ b/Chapter9/Figures/figure-w2.tex
@@ -10,7 +10,7 @@
 \node [anchor=south east,inner sep=1pt] (label2) at (0,0) {\tiny{0}};
 \node [anchor=south east,inner sep=1pt] (labela) at (0.2,-0.5) {\footnotesize{(a)}};
 }
-{\node [anchor=north west,align=left] (wblabel) at (-1.8,2) {{\scriptsize{\ $w'_{11}=0.7$}}};}
+{\node [anchor=north west,align=left] (wblabel) at (-2,2) {{\scriptsize{\ $w'_{11}=0.7$}}};}
 {\draw [-,very thick,ublue,rounded corners=0.1em] (-1.5,0) -- (0.5,0) -- (0.5,0.7) -- (1.5,0.7);}
 \end{scope}
 %---------------------------------------------------------------------------------------------
@@ -23,7 +23,7 @@
 \node [anchor=south east,inner sep=1pt] (label2) at (0,0) {\tiny{0}};
 \node [anchor=south east,inner sep=1pt] (labelb) at (0.2,-0.5) {\footnotesize{(b)}};
 }
-{\node [anchor=north west,align=left] (wblabel) at (-1.8,2) {{\scriptsize{\ $w_{12}=100$}}\\[-0ex] {\scriptsize{\ $b_2=-6$}}\\[-0ex] {\scriptsize{\ $w'_{21}=0.7$}}};}
+{\node [anchor=north west,align=left] (wblabel) at (-2,2) {{\scriptsize{\ $w_{12}=100$}}\\[-0ex] {\scriptsize{\ $b_2=-6$}}\\[-0ex] {\scriptsize{\ $w'_{21}=0.7$}}};}
 {\draw [-,very thick,ublue,rounded corners=0.1em] (-1.5,0) -- (0.5,0) -- (0.5,0.7) -- (0.7,0.7) -- (0.7,1.4) -- (1.5,1.4);}
 \end{scope}
 %-----------------------------------------------------------------------------------------------
@@ -37,7 +37,7 @@
 \node [anchor=south east,inner sep=1pt] (label2) at (0,0) {\tiny{0}};
 \node [anchor=south east,inner sep=1pt] (labelc) at (0.2,-0.5) {\footnotesize{(c)}};
 }
-{\node [anchor=north west,align=left] (wblabel) at (-1.8,2) {\scriptsize{\ $w_{12}=100$}\\[-0ex] \scriptsize{\ $b_2=-6$}\\[-0ex] {\scriptsize{\ $w'_{21}=-0.7$}}};}
+{\node [anchor=north west,align=left] (wblabel) at (-2,2) {\scriptsize{\ $w_{12}=100$}\\[-0ex] \scriptsize{\ $b_2=-6$}\\[-0ex] {\scriptsize{\ $w'_{21}=-0.7$}}};}
 {\draw [-,very thick,ublue,rounded corners=0.1em] (-1.5,0) -- (0.5,0) -- (0.5,0.7) -- (0.7,0.7) -- (0.7,0) -- (1.5,0);}
 \end{scope}


--- a/Chapter9/chapter9.tex
+++ b/Chapter9/chapter9.tex
@@ -1522,7 +1522,7 @@ z_t&=&\gamma z_{t-1}+(1-\gamma) \frac{\partial J}{\partial {\theta}_t} \cdot  \f
 \vspace{0.5em}
 \end{itemize}

-\parinterval  图\ref{fig:9-47}对比了同步更新和异步更新的区别，在这个例子中，使用4台设备对一个两层神经网络中的参数进行更新，其中使用了一个{\small\bfnew{参数服务器}}\index{参数服务器}（Parameter Server\index{Parameter Server}）来保存最新的参数，不同设备（Worker，图中的G1、G2、G3）可以通过同步或者异步的方式访问参数服务器。图中的$ {\bm \theta}_o $和$ {\bm \theta}_h $分别代表输出层和隐藏层的全部参数，操作push(P) 表示设备向参数服务器传送梯度，操作fetch(F)表示参数服务器向设备传送更新后的参数。
+\parinterval  图\ref{fig:9-47}对比了同步更新和异步更新的区别，在这个例子中，使用4台设备对一个两层神经网络中的参数进行更新，其中使用了一个{\small\bfnew{参数服务器}}\index{参数服务器}（Parameter Server\index{Parameter Server}）来保存最新的参数，不同设备（图中的G1、G2、G3）可以通过同步或者异步的方式访问参数服务器。图中的$ {\bm \theta}_o $和$ {\bm \theta}_h $分别代表输出层和隐藏层的全部参数，操作push(P) 表示设备向参数服务器传送梯度，操作fetch(F)表示参数服务器向设备传送更新后的参数。

 \parinterval  此外，在使用多个设备进行并行训练的时候，由于设备间带宽的限制，大量的数据传输会有较高的延时。对于复杂神经网络来说，设备间参数和梯度传递的时间消耗也会成为一个不得不考虑的因素。有时候，设备间数据传输的时间甚至比模型计算的时间都长，大大降低了并行度\upcite{xiao2017fast}。对于这种问题，可以考虑对数据进行压缩或者减少传输的次数来缓解问题。

@@ -1908,7 +1908,7 @@ z_t&=&\gamma z_{t-1}+(1-\gamma) \frac{\partial J}{\partial {\theta}_t} \cdot  \f
 \end{figure}
 %-------------------------------------------

-\parinterval  为了有一个直观的认识，这里以4-gram的FNNLM语言模型为例，即根据前三个单词$ w_{i-3} $、 $ w_{i-2} $ 、$ w_{i-1} $预测当前单词$ w_i $的概率。模型结构如图\ref{fig:9-60}所示。从结构上看，FNNLM是一个典型的多层神经网络结构。主要有三层：
+\parinterval  为了有一个直观的认识，这里以4-gram的FNNLM为例，即根据前三个单词$ w_{i-3} $、 $ w_{i-2} $ 、$ w_{i-1} $预测当前单词$ w_i $的概率。模型结构如图\ref{fig:9-60}所示。从结构上看，FNNLM是一个典型的多层神经网络结构。主要有三层：

 \begin{itemize}
 \vspace{0.3em}
@@ -1980,9 +1980,9 @@ z_t&=&\gamma z_{t-1}+(1-\gamma) \frac{\partial J}{\partial {\theta}_t} \cdot  \f

 \parinterval  在FNNLM中，所有的参数、输入、输出都是连续变量，因此FNNLM也是典型的一个连续空间模型。通过使用交叉熵等损失函数，FNNLM很容易进行优化。比如，可以使用梯度下降方法对FNNLM的模型参数进行训练。

-\parinterval  虽然FNNLM模型形式简单，却为处理自然语言提供了一个全新的视角。首先，该模型重新定义了“词是什么”\ \dash \ 它并非词典的一项，而是可以用一个连续实数向量进行表示的可计算的“量”。此外，由于$n$-gram不再是离散的符号序列，模型不需要记录$n$-gram，所以很好的缓解了上面所提到的数据稀疏问题，模型体积也大大减小。
+\parinterval  虽然FNNLM形式简单，却为处理自然语言提供了一个全新的视角。首先，该模型重新定义了“词是什么”\ \dash \ 它并非词典的一项，而是可以用一个连续实数向量进行表示的可计算的“量”。此外，由于$n$-gram不再是离散的符号序列，模型不需要记录$n$-gram，所以很好的缓解了上面所提到的数据稀疏问题，模型体积也大大减小。

-\parinterval  当然，FNNLM模型也引发后人的许多思考，比如：神经网络每一层都学到了什么？是词法、句法，还是一些其他知识？如何理解词的分布式表示？等等。在随后的内容中也会看到，随着近几年深度学习和自然语言处理的发展，部分问题已经得到了很好的解答，但是仍有许多问题需要进一步探索。
+\parinterval  当然，FNNLM也引发后人的许多思考，比如：神经网络每一层都学到了什么？是词法、句法，还是一些其他知识？如何理解词的分布式表示？等等。在随后的内容中也会看到，随着近几年深度学习和自然语言处理的发展，部分问题已经得到了很好的解答，但是仍有许多问题需要进一步探索。

 %----------------------------------------------------------------------------------------
 %    NEW SUB-SECTION
@@ -1990,7 +1990,7 @@ z_t&=&\gamma z_{t-1}+(1-\gamma) \frac{\partial J}{\partial {\theta}_t} \cdot  \f

 \subsection{对于长序列的建模}

-\parinterval  FNNLM模型固然有效，但是和传统的$n$-gram语言模型一样需要依赖有限上下文假设，也就是$ w_i $的生成概率只依赖于之前的$ n-1 $个单词。很自然的一个想法是引入更大范围的历史信息，这样可以捕捉单词间的长距离依赖。
+\parinterval  FNNLM模型有效，但是和传统的$n$-gram语言模型一样需要依赖有限上下文假设，也就是$ w_i $的生成概率只依赖于之前的$ n-1 $个单词。很自然的一个想法是引入更大范围的历史信息，这样可以捕捉单词间的长距离依赖。

 %----------------------------------------------------------------------------------------
 %    NEW SUBSUB-SECTION
@@ -2138,7 +2138,7 @@ Jobs was the CEO of {\red{\underline{apple}}}.

 \parinterval  回忆一下神经语言模型的结构，它需要在每个位置预测单词生成的概率。这个概率是由若干层神经网络进行计算后，通过输出层得到的。实际上，在送入输出层之前，系统已经得到了这个位置的一个向量（隐藏层的输出），因此可以把它看作是含有一部分上下文信息的表示结果。

-\parinterval  以RNN为例，图\ref{fig:9-68}展示了一个由四个词组成的句子，这里使用了一个两层循环神经网络对其进行建模。可以看到，对于第三个位置，RNN已经积累了从第1个单词到第3个单词的信息，因此可以看作是单词1-3（“乔布斯\ 就职\ 于”）的一种表示；另一方面，第4个单词的词嵌入可以看作是“苹果”自身的表示。这样，可以把第3 个位置RNN的输出和第4个位置的词嵌入进行合并，就得到了第4个位置上含有上下文信息的表示结果。从另一个角度说，这里得到了“苹果”的一种新的表示，它不仅包含苹果这个词自身的信息，也包含它前文的信息。
+\parinterval  以RNNLM为例，图\ref{fig:9-68}展示了一个由四个词组成的句子，这里使用了一个两层循环神经网络对其进行建模。可以看到，对于第三个位置，RNNLM已经积累了从第1个单词到第3个单词的信息，因此可以看作是单词1-3（“乔布斯\ 就职\ 于”）的一种表示；另一方面，第4个单词的词嵌入可以看作是“苹果”自身的表示。这样，可以把第3个位置RNNLM的输出和第4个位置的词嵌入进行合并，就得到了第4个位置上含有上下文信息的表示结果。从另一个角度说，这里得到了“苹果”的一种新的表示，它不仅包含苹果这个词自身的信息，也包含它前文的信息。

 %----------------------------------------------
 \begin{figure}[htp]