Commit 7b3fbd9f by 孟霞

Merge branch 'master' into 'mengxia'

Master

See merge request !433
parents df5f0c2b 92a5de00

Chapter10/Figures/mt-history.png (binary image changed; 182 KB | 186 KB)
......@@ -272,7 +272,7 @@ NMT & 21.7 & 18.7 & -13.7 \\
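\parinterval The encoder-decoder framework is a typical ``representation''-based model. The encoder transforms the input word sequence into a new ``representation'' that contains all the information in the input sequence. The decoder then converts this ``representation'' back into the output word sequence. A core problem here is representation learning, that is, how to define a representation of the input word sequence, learn it automatically, and use it to generate the output sequence. In general, different representation learning methods correspond to different machine translation models. For example, in the earliest neural machine translation models, each source-language sentence is represented as a single vector, so the representation is static; with the attention mechanism, the representation of the source-language sentence is dynamic, meaning that a different representation is used when translating each target-language word.
\parinterval Figure~\ref{fig:10-5} shows a simple example of applying the encoder-decoder structure to machine translation. Given the Chinese sentence ``我/对/你/感到/满意'', the encoder encodes it into a real-valued vector $(0.2, -1, 6, 5, 0.7, -2)$, which is the ``representation'' of the source-language sentence. Surprising as it may seem, the neural machine translation model treats this vector as equivalent to the input sequence. The numbers in the vector have no concrete meaning by themselves, yet the decoder can extract from them the information contained in the source-language sentence. Some researchers also regard each dimension of the vector as a ``feature'', so that the source-language sentence is represented as a combination of multiple ``features'' that can be learned automatically. Given this ``representation'', the decoder takes the real-valued vector as input and generates the target-language sentence ``I am satisfied with you'' word by word.
\parinterval Figure~\ref{fig:10-5} shows a simple example of applying the encoder-decoder structure to machine translation. Given the Chinese sentence ``我/对/你/感到/满意'', the encoder encodes it into a real-valued vector $(0.2, -1, 6, \\ 5, 0.7, -2)$, which is the ``representation'' of the source-language sentence. Surprising as it may seem, the neural machine translation model treats this vector as equivalent to the input sequence. The numbers in the vector have no concrete meaning by themselves, yet the decoder can extract from them the information contained in the source-language sentence. Some researchers also regard each dimension of the vector as a ``feature'', so that the source-language sentence is represented as a combination of multiple ``features'' that can be learned automatically. Given this ``representation'', the decoder takes the real-valued vector as input and generates the target-language sentence ``I am satisfied with you'' word by word.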
%----------------------------------------------
\begin{figure}[htp]
......@@ -966,7 +966,7 @@ L(\mathbi{Y},\widehat{\mathbi{Y}}) = \sum_{j=1}^n L_{\textrm{ce}}(\mathbi{y}_j,\
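\parinterval The parameters of a neural network are mainly the linear transformation matrices and biases in each layer. They must be initialized before training starts. However, because the network structure of neural machine translation is complex, the loss function is usually non-convex, and different initializations lead to different optimization results. Moreover, extensive practice has shown that neural machine translation models are very sensitive to the initialization method, and high-performing systems often require a specific initialization scheme.
\parinterval Because LSTM is a commonly used model in neural machine translation, the following takes the LSTM model as an example (see Section~\ref{sec:lstm-cell}) to introduce initialization methods for machine translation models; these methods also extend to structures such as GRU. The details are as follows:
\parinterval Because LSTM is a commonly used model in neural machine translation, the following takes the LSTM model as an example (see Section~\ref{sec:lstm-cell}\\ ) to introduce initialization methods for machine translation models; these methods also extend to structures such as GRU. The details are as follows: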
\begin{itemize}
\vspace{0.5em}
......
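As a compact restatement of the encoder-decoder paragraph in the first Chapter 10 hunk above, the static-representation case can be sketched as the formula below. The symbols $\mathbi{C}$, $\textrm{Encoder}(\cdot)$, $x_i$, $y_j$, $m$ and $n$ are notation assumed here for illustration and do not appear in the diff; \mathbi is the bold-italic macro that the diff itself uses.

```latex
% Sketch only: C, Encoder(.), x_i, y_j, m and n are assumed notation, not taken from the diff.
% \mathbi is the project's own bold-italic macro.
\begin{eqnarray}
\mathbi{C} & = & \textrm{Encoder}(x_1,\ldots,x_m) \nonumber \\
\textrm{P}(y_1,\ldots,y_n \mid x_1,\ldots,x_m) & = & \prod_{j=1}^{n} \textrm{P}(y_j \mid y_1,\ldots,y_{j-1},\mathbi{C})
\end{eqnarray}
```

In the example from that hunk, $\mathbi{C}$ would be the vector $(0.2, -1, 6, 5, 0.7, -2)$. With an attention mechanism, the single $\mathbi{C}$ would be replaced by a position-dependent $\mathbi{C}_j$, which is what the paragraph calls a dynamic representation.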
......@@ -51,9 +51,9 @@
%\node[minimum width = 1.8cm] (sub) at ([xshift=-5.5cm,yshift=2cm]num9_9.east) {};
\draw[decorate,decoration={brace,mirror,raise=0pt,amplitude=0.3cm},black,thick] ([yshift=0.4cm,xshift=-0.1cm]num1_1.west) -- node[att,xshift=-0.5cm]{$q$} ([yshift=-0.4cm,xshift=-0.1cm]num3_3.west);
\draw[decorate,decoration={brace,raise=0pt,amplitude=0.3cm},black,thick] ([xshift=-0.4cm,yshift=0.1cm]num1.north) -- node[att,yshift=0.5cm]{$k$}([xshift=0.4cm,yshift=0.1cm]num7.north);
\draw[decorate,decoration={brace,mirror,raise=0pt,amplitude=0.3cm},black,thick] ([xshift=0.5cm,yshift=0.00cm]num9_9.south) -- node[att,xshift=0.5cm,yshift=-0.3cm]{$o$}([xshift=0.5cm,yshift=0.00cm]num9.south);
\draw[decorate,decoration={brace,mirror,raise=0pt,amplitude=0.3cm},black,thick] ([yshift=0.4cm,xshift=-0.1cm]num1_1.west) -- node[att,xshift=-0.5cm]{$Q$} ([yshift=-0.4cm,xshift=-0.1cm]num3_3.west);
\draw[decorate,decoration={brace,raise=0pt,amplitude=0.3cm},black,thick] ([xshift=-0.4cm,yshift=0.1cm]num1.north) -- node[att,yshift=0.5cm]{$U$}([xshift=0.4cm,yshift=0.1cm]num7.north);
\draw[decorate,decoration={brace,mirror,raise=0pt,amplitude=0.3cm},black,thick] ([xshift=0.5cm,yshift=0.00cm]num9_9.south) -- node[att,xshift=0.5cm,yshift=-0.3cm]{$O$}([xshift=0.5cm,yshift=0.00cm]num9.south);
\end{tikzpicture}
\ No newline at end of file
......@@ -21,10 +21,10 @@
\draw[line width=0.9pt, gray!80, -latex] (l\point_3.east) -- (r2_3.west);
}
\node[vuale] at (-1.5em, 1.9em) {$x_2$};
\node[vuale] at (-1.5em, 9.9em) {$x_1$};
\node[vuale] at (6.5em, 1.9em) {$y_1$};
\node[vuale] at (6.5em, 9.9em) {$y_2$};
\node[vuale] at (-1.5em, 1.9em) {$\mathbi{x}_2$};
\node[vuale] at (-1.5em, 9.9em) {$\mathbi{x}_1$};
\node[vuale] at (6.5em, 1.9em) {$\mathbi{z}_2$};
\node[vuale] at (6.5em, 9.9em) {$\mathbi{z}_1$};
\node (t2) at (2.5em, -1em) {\large{$\cdots$}};
\node [anchor=north,font=\tiny] at ([yshift=-0.2em]t2.south) {depthwise convolution};
......@@ -46,10 +46,10 @@
\draw[line width=0.9pt, cyan!80, -latex] (l\point_3.east) -- (r2_3.west);
}
\node[vuale] at (-1.5em, 1.9em) {$x_2$};
\node[vuale] at (-1.5em, 9.9em) {$x_1$};
\node[vuale] at (6.5em, 1.9em) {$y_1$};
\node[vuale] at (6.5em, 9.9em) {$y_2$};
\node[vuale] at (-1.5em, 1.9em) {$\mathbi{x}_2$};
\node[vuale] at (-1.5em, 9.9em) {$\mathbi{x}_1$};
\node[vuale] at (6.5em, 1.9em) {$\mathbi{z}_2$};
\node[vuale] at (6.5em, 9.9em) {$\mathbi{z}_1$};
\node (t2) at (2.5em, -1em) {\large{$\cdots$}};
\node [anchor=north,font=\tiny] at ([yshift=-0.2em]t2.south) {lightweight convolution};
......
......@@ -63,9 +63,9 @@ $\otimes$: & element-wise multiplication \\
\draw[-latex,thick] (b.east) -- (c2.west);
\draw[-latex,thick] (c2.east) -- ([xshift=0.4cm]c2.east);
\node[inner sep=0pt, font=\tiny] at (0.75cm, -0.4cm) {$X$};
\node[inner sep=0pt, font=\tiny] at ([yshift=-0.4cm]a.south) {$B=X * V + c$};
\node[inner sep=0pt, font=\tiny] at ([yshift=-0.4cm]b.south) {$A=X * W + b$};
\node[inner sep=0pt, font=\tiny] at (8.5cm, -0.4cm) {$Y=A \otimes \sigma(B)$};
\node[inner sep=0pt, font=\tiny] at (0.75cm, -0.4cm) {$\mathbi{X}$};
\node[inner sep=0pt, font=\tiny] at ([yshift=-0.4cm]a.south) {$\mathbi{B}=\mathbi{X} * \mathbi{V} + \mathbi{b}_{\mathbi{V}}$};
\node[inner sep=0pt, font=\tiny] at ([yshift=-0.4cm]b.south) {$\mathbi{A}=\mathbi{X} * \mathbi{W} + \mathbi{b}_{\mathbi{W}}$};
\node[inner sep=0pt, font=\tiny] at (8.5cm, -0.4cm) {$\mathbi{Y}=\mathbi{A} \otimes \sigma(\mathbi{B})$};
\end{tikzpicture}
\ No newline at end of file
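For reference, the gated linear unit labelled in the figure code above can be written as a single display. This is a sketch under the assumption that each bias is indexed by the kernel it accompanies ($\mathbi{b}_{\mathbi{W}}$ with $\mathbi{W}$, $\mathbi{b}_{\mathbi{V}}$ with $\mathbi{V}$); $*$ is taken to denote convolution, $\sigma$ the sigmoid function, and $\otimes$ element-wise multiplication.

```latex
% Sketch only: assumes each bias carries the index of its own kernel.
\begin{eqnarray}
\mathbi{A} & = & \mathbi{X} * \mathbi{W} + \mathbi{b}_{\mathbi{W}} \nonumber \\
\mathbi{B} & = & \mathbi{X} * \mathbi{V} + \mathbi{b}_{\mathbi{V}} \nonumber \\
\mathbi{Y} & = & \mathbi{A} \otimes \sigma(\mathbi{B})
\end{eqnarray}
```

Because every element of $\sigma(\mathbi{B})$ lies in $(0,1)$, it acts as a gate that controls how much of the corresponding element of $\mathbi{A}$ reaches the output $\mathbi{Y}$.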
......@@ -32,12 +32,12 @@
\draw[line width=0.5pt, cyan!80, -latex] (l3_\point.east) -- ([xshift=0em,yshift=0.1em]r3_2.west);
}
\node[vuale] at ([xshift=-0.9em]l1_1.west) {$x_3$};
\node[vuale] at ([xshift=-0.9em]l2_1.west) {$x_2$};
\node[vuale] at ([xshift=-0.9em]l3_1.west) {$x_1$};
\node[vuale] at ([xshift=0.9em]r1_1.east) {$y_3$};
\node[vuale] at ([xshift=0.9em]r2_1.east) {$y_3$};
\node[vuale] at ([xshift=0.9em]r3_1.east) {$y_3$};
\node[vuale] at ([xshift=-0.9em]l1_1.west) {$\mathbi{x}_3$};
\node[vuale] at ([xshift=-0.9em]l2_1.west) {$\mathbi{x}_2$};
\node[vuale] at ([xshift=-0.9em]l3_1.west) {$\mathbi{x}_1$};
\node[vuale] at ([xshift=0.9em]r1_1.east) {$\mathbi{z}_3$};
\node[vuale] at ([xshift=0.9em]r2_1.east) {$\mathbi{z}_2$};
\node[vuale] at ([xshift=0.9em]r3_1.east) {$\mathbi{z}_1$};
\node (t1) at (2.5em, -1em) {\large{$\cdots$}};
\node [anchor=north,font=\tiny] at ([yshift=-0.2em]t1.south) {standard convolution};
......@@ -66,12 +66,12 @@
\draw[line width=0.5pt, cyan!80, -latex] (l\point_2.east) -- (r3_2.west);
}
\node[vuale] at ([xshift=-0.9em]l1_1.west) {$x_3$};
\node[vuale] at ([xshift=-0.9em]l2_1.west) {$x_2$};
\node[vuale] at ([xshift=-0.9em]l3_1.west) {$x_1$};
\node[vuale] at ([xshift=0.9em]r1_1.east) {$y_3$};
\node[vuale] at ([xshift=0.9em]r2_1.east) {$y_3$};
\node[vuale] at ([xshift=0.9em]r3_1.east) {$y_3$};
\node[vuale] at ([xshift=-0.9em]l1_1.west) {$\mathbi{x}_3$};
\node[vuale] at ([xshift=-0.9em]l2_1.west) {$\mathbi{x}_2$};
\node[vuale] at ([xshift=-0.9em]l3_1.west) {$\mathbi{x}_1$};
\node[vuale] at ([xshift=0.9em]r1_1.east) {$\mathbi{z}_3$};
\node[vuale] at ([xshift=0.9em]r2_1.east) {$\mathbi{z}_2$};
\node[vuale] at ([xshift=0.9em]r3_1.east) {$\mathbi{z}_1$};
\node (t2) at (2.5em, -1em) {\large{$\cdots$}};
\node [anchor=north,font=\tiny] at ([yshift=-0.2em]t2.south) {depthwise convolution};
......@@ -102,12 +102,12 @@
\draw[line width=0.5pt, cyan!80, -latex] (l3_\point.east) -- (r3_2.west);
}
\node[vuale] at ([xshift=-0.9em]l1_1.west) {$x_3$};
\node[vuale] at ([xshift=-0.9em]l2_1.west) {$x_2$};
\node[vuale] at ([xshift=-0.9em]l3_1.west) {$x_1$};
\node[vuale] at ([xshift=0.9em]r1_1.east) {$y_3$};
\node[vuale] at ([xshift=0.9em]r2_1.east) {$y_3$};
\node[vuale] at ([xshift=0.9em]r3_1.east) {$y_3$};
\node[vuale] at ([xshift=-0.9em]l1_1.west) {$\mathbi{x}_3$};
\node[vuale] at ([xshift=-0.9em]l2_1.west) {$\mathbi{x}_2$};
\node[vuale] at ([xshift=-0.9em]l3_1.west) {$\mathbi{x}_1$};
\node[vuale] at ([xshift=0.9em]r1_1.east) {$\mathbi{z}_3$};
\node[vuale] at ([xshift=0.9em]r2_1.east) {$\mathbi{z}_2$};
\node[vuale] at ([xshift=0.9em]r3_1.east) {$\mathbi{z}_1$};
\node (t3) at (2.5em, -1em) {\large{$\cdots$}};
\node [anchor=north,font=\tiny] at ([yshift=-0.2em]t3.south) {pointwise convolution};
......
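The three convolution variants drawn by the figure code above (standard, depthwise and pointwise) differ mainly in how many weights they use. As a rough comparison, assume a one-dimensional convolution with kernel width $K$ and equal input and output dimension $d$; both symbols and the $N$ names are assumptions introduced for illustration, and biases are ignored.

```latex
% Sketch only: K (kernel width), d (channel dimension) and the N symbols are assumed; biases ignored.
\begin{eqnarray}
N_{\textrm{standard}} & = & K \cdot d \cdot d \nonumber \\
N_{\textrm{depthwise}} & = & K \cdot d \nonumber \\
N_{\textrm{pointwise}} & = & d \cdot d
\end{eqnarray}
```

For instance, with $K=3$ and $d=512$ these counts are 786,432, 1,536 and 262,144 respectively, which is why a depthwise convolution followed by a pointwise one is much cheaper than a single standard convolution.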
......@@ -12,8 +12,8 @@
\node(num7)[num,right of = num6,xshift = 1.2cm]{\textcolor{blue!70}{$\mathbi{e}_7$}};
\node(num8)[num,right of = num7,xshift = 1.2cm]{\textcolor{blue!70}{$\mathbi{e}_8$}};
\node(num9)[num,right of = num8,xshift = 1.2cm]{$\mathbi{e}_9$};
\node(A)[below of = num2,yshift = -0.6cm]{A};
\node(B)[below of = num8,yshift = -0.6cm]{B};
%\node(A)[below of = num2,yshift = -0.6cm]{A};
%\node(B)[below of = num8,yshift = -0.6cm]{B};
\draw [->, thick, color = blue!80](num2.east)--(num3.west);
......@@ -23,5 +23,8 @@
\draw [->, thick, color = blue!80](num6.east)--(num7.west);
\draw [->, thick, color = blue!80](num7.east)--(num8.west);
\draw [->,thick,color = black!70] (num1) -- (num2);
\draw [->,thick,color =black!70] (num8) -- (num9);
\end{tikzpicture}
\ No newline at end of file
......@@ -13,8 +13,8 @@
\node(num1_8)[num,right of = num1_7,xshift = 1.2cm]{\textcolor{blue!70}{$\mathbi{e}_8$}};
\node(num1_9)[num,right of = num1_8,xshift = 1.2cm]{$\mathbi{e}_9$};
\node(num1_10)[num,right of = num1_9,xshift = 1.2cm, fill = blue!40]{$\mathbi{0}$};
\node(A)[below of = num2,yshift = -0.6cm]{A};
\node(B)[below of = num8,yshift = -0.6cm]{B};
%\node(A)[below of = num2,yshift = -0.6cm]{A};
%\node(B)[below of = num8,yshift = -0.6cm]{B};
\node(num2_0)[num,above of = num1_0,yshift = 1.2cm, fill = blue!40]{\textcolor{white}{$\mathbi{0}$}};
\node(num2_1)[num,right of = num2_0,xshift = 1.2cm]{\textbf2};
......
......@@ -138,7 +138,7 @@
\node [anchor=south,inner sep=2pt,minimum height=1.5em,minimum width=3.0em] (c10) at (c11.north) {\scriptsize{source language}};
\node [anchor=south,inner sep=2pt,minimum height=1.5em,minimum width=3.0em] (c30) at (c31.north) {\small{$n$=3}};
\node [anchor=south,inner sep=2pt,minimum height=1.5em,minimum width=3.0em] (c50) at (c51.north) {\small{$S$}};
\node [anchor=south,inner sep=2pt,minimum height=1.5em,minimum width=3.0em] (c50) at (c51.north) {\small{$\mathbi{S}$}};
\node [anchor=south,inner sep=2pt] (c60) at (c61.north) {\scriptsize{sort}};
\node [anchor=south,inner sep=2pt] (c60-2) at (c60.north) {\scriptsize{ascending}};
......
This source diff could not be displayed because it is too large. You can view the blob instead.
......@@ -2166,6 +2166,6 @@ Jobs was the CEO of {\red{\underline{apple}}}.
\vspace{0.5em}
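\item To further improve the performance of neural language models, besides improving the model itself, new structures or other useful information can be introduced into the model, and there is much representative work in this area worth noting. For example, word features other than word embeddings can be introduced into neural language models, such as linguistic features (morphological, syntactic, and semantic features)\upcite{Wu2012FactoredLM,Adel2015SyntacticAS}, contextual information\upcite{mikolov2012context,Wang2015LargerContextLM}, and external knowledge such as knowledge graphs\upcite{Ahn2016ANK}; alternatively, character-level information can be introduced, fed into the model either as character features alone\upcite{Kim2016CharacterAwareNL,Hwang2017CharacterlevelLM} or together with word features\upcite{Onoe2016GatedWR,Verwimp2017CharacterWordLL}; introducing bidirectional models into neural language models is also a very effective approach, since word prediction can then use textual information from both the past and the future\upcite{Graves2013HybridSR,bahdanau2014neural,Peters2018DeepCW}.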
\vspace{0.5em}
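\item Word embedding is an important advance in natural language processing in recent years. ``Embedding'' refers to a class of methods: in principle, any process that produces a distributed representation of an object can be regarded as ``embedding'' in a broad sense. Representation learning based on this idea has also become a frontier approach in natural language processing. For example, how to build distributed representations of tree structures, and even graph structures, has become an important approach to analyzing natural language\upcite{DBLP:journals/corr/abs-1809-01854,Yin2018StructVAETL,Aharoni2017TowardsSN,Bastings2017GraphCE,KoncelKedziorski2019TextGF}. In addition, besides language modeling, there are many other ways to learn word embeddings, such as SENNA\upcite{collobert2011natural}, word2vec\upcite{DBLP:journals/corr/abs-1301-3781,mikolov2013distributed}, GloVe\upcite{DBLP:conf/emnlp/PenningtonSM14}, and CoVe\upcite{mccann2017learned}.
\item Word embedding is an important advance in natural language processing in recent years. ``Embedding'' refers to a class of methods: in principle, any process that produces a distributed representation of an object can be regarded as ``embedding'' in a broad sense. Representation learning based on this idea has also become a frontier approach in natural language processing. For example, how to build distributed representations of tree structures, and even graph structures, has become an important approach to analyzing natural language\upcite{DBLP:journals/corr/abs-1809-01854,Yin2018StructVAETL,Aharoni2017TowardsSN,Bastings2017GraphCE,KoncelKedziorski2019TextGF}. In addition, besides language modeling, there are many other ways to learn word embeddings, such as SENNA\upcite{2011Natural}, word2vec\upcite{DBLP:journals/corr/abs-1301-3781,mikolov2013distributed}, GloVe\upcite{DBLP:conf/emnlp/PenningtonSM14}, and CoVe\upcite{mccann2017learned}.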
\vspace{0.5em}
\end{itemize}
......@@ -139,14 +139,14 @@
%\include{Chapter6/chapter6}
%\include{Chapter7/chapter7}
%\include{Chapter8/chapter8}
\include{Chapter9/chapter9}
\include{Chapter10/chapter10}
\include{Chapter11/chapter11}
\include{Chapter12/chapter12}
%\include{Chapter9/chapter9}
%\include{Chapter10/chapter10}
%\include{Chapter11/chapter11}
%\include{Chapter12/chapter12}
%\include{Chapter13/chapter13}
%\include{Chapter14/chapter14}
%\include{Chapter15/chapter15}
%\include{Chapter16/chapter16}
\include{Chapter16/chapter16}
%\include{Chapter17/chapter17}
%\include{Chapter18/chapter18}
%\include{ChapterAppend/chapterappend}
......