Commit c053ef8a by zengxin

Merge branch 'caorunzhe' into 'zengxin'

Caorunzhe

See merge request !432
parents 984ce1bd 583c3721
......@@ -567,7 +567,7 @@
\parinterval Convolution is an efficient way of computing over grid-structured data and has achieved remarkable results in fields such as image and speech processing. This chapter introduced the concept of convolution and its properties, and discussed operations such as pooling and padding in detail. The RNN-based translation models introduced earlier, once augmented with attention, already far surpass statistical machine translation models, but the sequential computation of recurrent networks limits parallelism and makes training slow. This chapter presented a model paradigm with high parallel-computing capability, namely the encoder-decoder framework based on convolutional neural networks. On machine translation it achieves performance comparable to the RNN-based GNMT model while greatly shortening the training cycle. Beyond the basics, this chapter also extended the discussion of convolution to depth-wise convolution, point-wise convolution, lightweight convolution, and dynamic convolution. In addition, convolutional neural networks and their variants are widely applied to other natural language processing tasks such as text classification and named entity recognition.
\parinterval Unlike machine translation, text classification focuses on extracting features from a sequence and then predicting a category from the compressed feature representation. Convolutional neural networks can extract $n$-gram features from a sequence, so they can also be applied to text classification; the basic architecture consists of an input layer, convolutional layers, pooling layers, and a fully connected layer. Beyond the TextCNN model introduced in this chapter\upcite{Kim2014ConvolutionalNN}, much follow-up work has improved on this design, for example by modifying the input layer to introduce richer features\upcite{DBLP:conf/acl/NguyenG15,DBLP:conf/aaai/LaiXLZ15}, by improving the convolutional layers\upcite{DBLP:conf/acl/ChenXLZ015,DBLP:conf/emnlp/LeiBJ15}, and by improving the pooling layers\upcite{Kalchbrenner2014ACN,DBLP:conf/acl/ChenXLZ015}. In named entity recognition, convolutional neural networks can likewise be used for feature extraction\upcite{2011Natural,DBLP:conf/cncl/ZhouZXQBX17}, and the more efficient dilated convolution can model longer context\upcite{DBLP:conf/emnlp/StrubellVBM17}. Some work has also explored convolutional neural networks for extracting character-level features\upcite{DBLP:conf/acl/MaH16,DBLP:conf/emnlp/LiDWCM17,DBLP:conf/acl-codeswitch/WangCK18}.
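The paragraph above can be illustrated with a minimal pure-Python sketch of TextCNN-style feature extraction (a hypothetical toy version, not the reference implementation of \upcite{Kim2014ConvolutionalNN}): filters of several window sizes act as $n$-gram detectors, and max-over-time pooling compresses the activations into a fixed-length feature vector for the fully connected classifier.

```python
def textcnn_features(embeddings, filters):
    """Toy TextCNN-style feature extraction.

    embeddings: list of d-dimensional word vectors for one sentence.
    filters: dict mapping a window size n to a list of filters, each an
    n x d weight matrix (list of lists). Each filter detects one n-gram
    pattern; max-over-time pooling keeps its strongest activation, so the
    output length equals the number of filters, not the sentence length.
    """
    feats = []
    for n, weight_list in filters.items():
        for w in weight_list:           # one filter of window size n
            acts = []
            for i in range(len(embeddings) - n + 1):
                window = embeddings[i:i + n]
                # dot product of the filter with the n-gram window
                acts.append(sum(w[j][k] * window[j][k]
                                for j in range(n)
                                for k in range(len(window[0]))))
            feats.append(max(acts))     # max-over-time pooling
    return feats
```

A real model would learn the filter weights and feed `feats` into a softmax classifier; here the weights are fixed toy values.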
......
\begin{tikzpicture}
\tikzstyle{bignode} = [line width=0.6pt,draw=black,minimum width=6.3em,minimum height=2.2em,fill=blue!20,rounded corners=2pt]
\tikzstyle{middlenode} = [line width=0.6pt,draw=black,minimum width=5.6em,minimum height=2.2em,fill=blue!20,rounded corners=2pt]
\node [anchor=center] (node1-1) at (0,0) {\scriptsize{汉语}};
\node [anchor=west] (node1-2) at ([xshift=1.0em]node1-1.east) {\scriptsize{英语}};
\node [anchor=north] (node1-3) at ([xshift=1.65em]node1-1.south) {\scriptsize{反向翻译模型}};
\draw [->,line width=0.6pt](node1-1.east)--(node1-2.west);
\begin{pgfonlayer}{background}
{
\node[fill=red!20,rounded corners=2pt,inner sep=0.2em,draw=black,line width=0.6pt,minimum width=6.0em] [fit =(node1-1)(node1-2)(node1-3)] (remark1) {};
}
\end{pgfonlayer}
\node [anchor=north](node2-1) at ([xshift=-1.93em,yshift=-1.95em]remark1.south){\scriptsize{汉语}};
\node [anchor=north](node2-1-2) at (node2-1.south){\scriptsize{真实数据}};
\begin{pgfonlayer}{background}
{
\node[fill=blue!20,rounded corners=2pt,inner sep=0.1em,draw=black,line width=0.6pt,minimum width=3.85em] [fit =(node2-1)(node2-1-2)] (remark2-1) {};
}
\end{pgfonlayer}
\node [anchor=west](node2-2) at ([xshift=0.82em,yshift=0.68em]remark2-1.east){\scriptsize{英语}};
\node [anchor=north](node2-2-2) at (node2-2.south){\scriptsize{真实数据}};
\begin{pgfonlayer}{background}
{
\node[fill=green!20,rounded corners=2pt,inner sep=0.1em,draw=black,line width=0.6pt,minimum width=3.85em] [fit =(node2-2)(node2-2-2)] (remark2-2) {};
}
\end{pgfonlayer}
\draw [->,line width=0.6pt]([yshift=-2.0em]remark1.south)--(remark1.south) node [pos=0.5,right] (pos1) {\scriptsize{训练}};
\node [anchor=west](node3-1) at ([xshift=5.0em,yshift=0.1em]node1-2.east){\scriptsize{汉语}};
\node [anchor=north](node3-1-2) at (node3-1.south){\scriptsize{真实数据}};
\begin{pgfonlayer}{background}
{
\node[fill=blue!20,rounded corners=2pt,inner sep=0.1em,draw=black,line width=0.6pt,minimum width=3.85em] [fit =(node3-1)(node3-1-2)] (remark3-1) {};
}
\end{pgfonlayer}
\node [anchor=north](node3-2) at ([yshift=-2.15em]remark3-1.south){\scriptsize{英语}};
\node [anchor=north](node3-2-2) at (node3-2.south){\scriptsize{伪数据}};
\begin{pgfonlayer}{background}
{
\node[fill=yellow!20,rounded corners=2pt,inner sep=0.1em,draw=black,line width=0.6pt,minimum width=3.85em] [fit =(node3-2)(node3-2-2)] (remark3-2) {};
}
\end{pgfonlayer}
\draw [->,line width=0.6pt](remark3-1.south)--(remark3-2.north) node [pos=0.5,right] (pos2) {\scriptsize{翻译}};
\begin{pgfonlayer}{background}
{
\node[rounded corners=2pt,inner sep=0.3em,draw=black,line width=0.6pt,dotted] [fit =(remark3-1)(remark3-2)] (remark2) {};
}
\end{pgfonlayer}
\draw [->,line width=0.6pt](remark1.east)--([yshift=2.40em]remark2.west) node [pos=0.5,above] (pos2) {\scriptsize{模型翻译}};
\node [anchor=south](pos2-2) at ([yshift=-0.5em]pos2.north){\scriptsize{使用反向}};
\draw[decorate,thick,decoration={brace,amplitude=5pt}] ([yshift=1.3em,xshift=1.5em]node3-1.east) -- ([yshift=-7.7em,xshift=1.5em]node3-1.east) node [pos=0.1,right,xshift=0.0em,yshift=0.0em] (label1) {\scriptsize{{混合}}};
\node [anchor=west](node4-1) at ([xshift=3.5em,yshift=3.94em]node3-2.east){\scriptsize{英语}};
\node [anchor=north](node4-1-2) at (node4-1.south){\scriptsize{伪数据}};
\begin{pgfonlayer}{background}
{
\node[fill=yellow!20,rounded corners=2pt,inner sep=0.1em,draw=black,line width=0.6pt,minimum width=3.85em] [fit =(node4-1)(node4-1-2)] (remark4-1) {};
}
\end{pgfonlayer}
\node [anchor=north](node4-2) at ([yshift=-1.59em]node4-1.south){\scriptsize{英语}};
\node [anchor=north](node4-2-2) at (node4-2.south){\scriptsize{真实数据}};
\begin{pgfonlayer}{background}
{
\node[fill=green!20,rounded corners=2pt,inner sep=0.1em,draw=black,line width=0.6pt,minimum width=3.85em] [fit =(node4-2)(node4-2-2)] (remark4-2) {};
}
\end{pgfonlayer}
\node [anchor=west](node4-3) at ([xshift=1.7em]node4-2.east){\scriptsize{汉语}};
\node [anchor=north](node4-3-2) at (node4-3.south){\scriptsize{真实数据}};
\begin{pgfonlayer}{background}
{
\node[fill=blue!20,rounded corners=2pt,inner sep=0.1em,draw=black,line width=0.6pt,minimum width=3.85em] [fit =(node4-3)(node4-3-2)] (remark4-3) {};
}
\end{pgfonlayer}
\node [anchor=west](node4-4) at ([xshift=1.7em]node4-1.east){\scriptsize{汉语}};
\node [anchor=north](node4-4-2) at (node4-4.south){\scriptsize{真实数据}};
\begin{pgfonlayer}{background}
{
\node[fill=blue!20,rounded corners=2pt,inner sep=0.1em,draw=black,line width=0.6pt,minimum width=3.85em] [fit =(node4-4)(node4-4-2)] (remark4-4) {};
}
\end{pgfonlayer}
\node [anchor=center] (node5-1) at ([xshift=4.3em,yshift=-1.48em]node4-4.east) {\scriptsize{英语}};
\node [anchor=west] (node5-2) at ([xshift=1.0em]node5-1.east) {\scriptsize{汉语}};
\node [anchor=north] (node5-3) at ([xshift=1.65em]node5-1.south) {\scriptsize{正向翻译模型}};
\draw [->,line width=0.6pt](node5-1.east)--(node5-2.west);
\begin{pgfonlayer}{background}
{
\node[fill=red!20,rounded corners=2pt,inner sep=0.2em,draw=black,line width=0.6pt,minimum width=6.0em] [fit =(node5-1)(node5-2)(node5-3)] (remark3) {};
}
\end{pgfonlayer}
\draw [->,line width=0.6pt]([xshift=-2em]remark3.west)--(remark3.west) node [pos=0.5,above] (pos3) {\scriptsize{训练}};
\end{tikzpicture}
\ No newline at end of file
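The pipeline depicted in the figure above can be sketched in a few lines of Python (a hedged toy version; `train_fn` and `translate_fn` are hypothetical stand-ins for a real NMT system's training and decoding):

```python
def back_translate(real_pairs, mono_target, train_fn, translate_fn):
    """Back-translation pipeline sketch.

    real_pairs: list of (source, target) bilingual sentence pairs.
    mono_target: monolingual target-language sentences.
    """
    # Step 1: train a reverse (target -> source) model on the real data.
    reverse_model = train_fn([(t, s) for (s, t) in real_pairs])
    # Step 2: back-translate monolingual target sentences into pseudo sources.
    pseudo_pairs = [(translate_fn(reverse_model, t), t) for t in mono_target]
    # Step 3: mix real and pseudo pairs and train the forward model.
    forward_model = train_fn(real_pairs + pseudo_pairs)
    return forward_model, pseudo_pairs
```

Only the source side of the pseudo pairs is machine-generated; the target side stays real, which is why the forward model can still learn fluent target-language output.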
......@@ -235,7 +235,7 @@
\end{pgfonlayer}
{\scriptsize
\node [anchor=center] (cy00-2) at ([xshift=6.7em,yshift=0.2em]pos4-212) {\tiny{$n$-best}};
\node [anchor=center,minimum height=1.8em,minimum width=0.8em,fill=orange!30] (cy11-2) at ([xshift=0.0em,yshift=-1.8em]pos4-212) {};
\node [anchor=center,minimum height=1.5em,minimum width=0.8em,fill=blue!30] (cy12-2) at ([xshift=1.3em,yshift=-0.15em]cy11-2) {};
\node [anchor=center,minimum height=2.5em,minimum width=0.8em,fill=black!30] (cy13-2) at ([xshift=1.3em,yshift=0.5em]cy12-2) {};
......
......@@ -5,7 +5,7 @@
\node [rectangle,inner sep=2pt,font=\scriptsize] (top) at ([yshift=3em,xshift=0em]center.north) {
\begin{tabular}{c}
翻译模型 \\
$\textrm{P}(\ \mathbi{y}|\ \mathbi{x})$
\end{tabular}
};
......@@ -24,7 +24,7 @@ The weather is \\so good today.
\node [rectangle,inner sep=2pt,font=\scriptsize] (down) at ([yshift=-3em,xshift=0em]center.south) {
\begin{tabular}{c}
翻译模型 \\
$\textrm{P}(\ \mathbi{x}|\ \mathbi{y})$
\end{tabular}
};
......
\begin{tikzpicture}
\begin{scope}
\node [anchor=center] (node1-1) at (0,0) {\small{$y'$}};
\node[anchor=south,line width=0.6pt,draw,rounded corners,minimum height=1.5em,minimum width=4em,fill=blue!20](node1-2) at ([yshift=-3em]node1-1.south) {\small{softmax}};
\node[anchor=north,line width=0.6pt,draw,rounded corners,minimum height=2.2em,minimum width=4em,fill=red!20](node1-3) at ([yshift=-2.0em]node1-2.south) {\small{Decoder}};
\node[anchor=north,line width=0.6pt,draw,rounded corners,minimum height=2.2em,minimum width=4em,fill=yellow!20](node3-3) at ([yshift=-2.0em]node1-3.south) {\small{LM}};
\node[anchor=west,line width=0.6pt,draw,rounded corners,minimum height=1.5em,minimum width=4em,fill=blue!20](node3-2) at ([xshift=2em]node3-3.east) {\small{softmax}};
\node [anchor=north] (node3-1) at ([yshift=3.0em]node3-2.north) {\small{$z'$}};
\node[anchor=north](node3-41) at ([xshift=-0.6em,yshift=-2em]node3-3.south) {\small{$y$}};
\node[anchor=north](node3-42) at ([xshift=0.6em,yshift=-2em]node3-3.south) {\small{$z$}};
\node[anchor=east,line width=0.6pt,draw,rounded corners,minimum height=2.2em,minimum width=4em,fill=red!20](node2-1) at ([xshift=-2em]node1-3.west) {\small{Encoder}};
\node[anchor=north](node2-2) at ([yshift=-2em]node2-1.south) {\small{$x$}};
\node [rectangle,rounded corners,draw=red,line width=0.2mm,densely dashed,inner sep=0.4em] [fit = (node3-2) (node3-3)] (inputshadow) {};
\draw [->,thick](node1-3.north)--(node1-2);
\draw [->,thick](node1-2.north)--(node1-1);
\draw [->,thick](node2-2.north)--(node2-1);
......@@ -24,11 +27,31 @@
\draw [->,thick](node3-41.north)--([xshift=-0.6em]node3-3.south);
\draw [->,thick](node3-42.north)--([xshift=0.6em]node3-3.south);
\draw [->,thick](node3-3.north)--(node1-3.south);
\draw [->,thick](node3-2.north)--(node3-1);
\draw[->,thick]([xshift=-0.6em]node3-3.north)--([xshift=-0.6em,yshift=0.6em]node3-3.north)--([xshift=-3em,yshift=0.6em]node3-3.north)--([xshift=-3em,yshift=-3em]node3-3.north)--([xshift=-5.6em,yshift=-3em]node3-3.north)--([xshift=0.6em]node1-3.south);
\draw[->,thick](node3-3.east)--(node3-2.west);
\node [anchor=east] (node2-1-1) at ([xshift=-12.0em,yshift=-4.25em]node1-1.west) {\small{$y'$}};
\node[anchor=south,line width=0.6pt,draw,rounded corners,minimum height=1.5em,minimum width=4em,fill=blue!20](node2-1-2) at ([yshift=-3em]node2-1-1.south) {\small{softmax}};
\node[anchor=north,line width=0.6pt,draw,rounded corners,minimum height=2.2em,minimum width=4em,fill=red!20](node2-1-3) at ([yshift=-2.0em]node2-1-2.south) {\small{Decoder}};
\node[anchor=east,line width=0.6pt,draw,rounded corners,minimum height=2.2em,minimum width=4em,fill=red!20](node2-2-1) at ([xshift=-2em]node2-1-3.west) {\small{Encoder}};
\node[anchor=north](node2-2-2) at ([yshift=-2em]node2-2-1.south) {\small{$x$}};
\node[anchor=north](node2-2-3) at ([yshift=-2em]node2-1-3.south) {\small{$y$}};
\draw [->,thick](node2-1-2.north)--(node2-1-1);
\draw [->,thick](node2-2-2.north)--(node2-2-1);
\draw[->,thick](node2-2-1.east)--(node2-1-3.west);
\draw [->,thick](node2-1-3.north)--(node2-1-2.south);
\draw [->,thick](node2-2-3.north)--(node2-1-3);
\node [anchor=east] (node1) at ([xshift=-2.0em,yshift=4em]node2-1-1.west) {\small{$x,y$:双语数据}};
\node [anchor=north] (node2) at ([xshift=0.45em]node1.south) {\small{$z$}:单语数据};
\node [anchor=north](pos1) at ([yshift=-3.5em]node3-3.south) {\small{(b)多任务学习}};
\node [anchor=east](pos2) at ([xshift=-10.0em]pos1.west) {\small{(a)单任务学习}};
%\draw[->](node2-1.north)--([yshift=1em]node2-1.north)--([xshift=2.5em,yshift=1em]node2-1.north)--([xshift=2.5em,yshift=-0.4em]node2-1.north)--(node1-3.west);
\end{scope}
\end{tikzpicture}
\ No newline at end of file
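Under the assumption of a simple additive objective (the weighting $\lambda$ is hypothetical, not taken from the figure), the multi-task setup in panel (b) can be sketched as:

```latex
% Hypothetical sketch: the shared decoder is trained on the translation
% task while the LM branch adds an auxiliary objective over y and z.
\begin{equation}
\mathcal{L}_{\textrm{total}} =
  \mathcal{L}_{\textrm{MT}}(y' \,|\, x)
  + \lambda \, \mathcal{L}_{\textrm{LM}}(z' \,|\, y, z)
\end{equation}
```

In this view the bilingual data $x,y$ drives the translation loss and the monolingual data $z$ drives the language-model loss, with the LM module's representation also fed into the decoder.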
......@@ -6,17 +6,17 @@
\node [anchor=north,inner sep=2pt,fill=green!20,minimum height=1.5em,minimum width=3.0em] (a14) at ([yshift=-0.2em]a13.south) {感到};
\node [anchor=north,inner sep=2pt,fill=green!20,minimum height=1.5em,minimum width=3.0em] (a15) at ([yshift=-0.2em]a14.south) {满意};
\node [anchor=south east,inner sep=1pt,fill=black] (pa11) at (a11.south east) {\tiny{\color{white} \textbf{1}}};
\node [anchor=south east,inner sep=1pt,fill=black] (pa12) at (a12.south east) {\tiny{\color{white} \textbf{2}}};
\node [anchor=south east,inner sep=1pt,fill=black] (pa13) at (a13.south east) {\tiny{\color{white} \textbf{3}}};
\node [anchor=south east,inner sep=1pt,fill=black] (pa14) at (a14.south east) {\tiny{\color{white} \textbf{4}}};
\node [anchor=south east,inner sep=1pt,fill=black] (pa15) at (a15.south east) {\tiny{\color{white} \textbf{5}}};
\node [anchor=west,inner sep=2pt,fill=green!20,minimum height=1.5em,minimum width=2.0em] (a21) at ([xshift=1.0em]a11.east) {\footnotesize{$P=0.1$}};
\node [anchor=north,inner sep=2pt,fill=red!20,minimum height=1.5em,minimum width=2.0em] (a22) at ([yshift=-0.2em]a21.south) {\footnotesize{$P=0.1$}};
\node [anchor=north,inner sep=2pt,fill=green!20,minimum height=1.5em,minimum width=2.0em] (a23) at ([yshift=-0.2em]a22.south) {\footnotesize{$P=0.1$}};
\node [anchor=north,inner sep=2pt,fill=green!20,minimum height=1.5em,minimum width=2.0em] (a24) at ([yshift=-0.2em]a23.south) {\footnotesize{$P=0.1$}};
\node [anchor=north,inner sep=2pt,fill=green!20,minimum height=1.5em,minimum width=2.0em] (a25) at ([yshift=-0.2em]a24.south) {\footnotesize{$P=0.1$}};
\node [anchor=west,inner sep=2pt] (a31) at ([xshift=0.3em]a23.east) {$\Rightarrow$};
......@@ -25,13 +25,13 @@
\node [anchor=north,inner sep=2pt,fill=green!20,minimum height=1.5em,minimum width=3.0em] (a43) at ([yshift=-0.2em]a42.south) {感到};
\node [anchor=north,inner sep=2pt,fill=green!20,minimum height=1.5em,minimum width=3.0em] (a44) at ([yshift=-0.2em]a43.south) {满意};
\node [anchor=south east,inner sep=1pt,fill=black] (pa41) at (a41.south east) {\tiny{\color{white} \textbf{1}}};
\node [anchor=south east,inner sep=1pt,fill=black] (pa42) at (a42.south east) {\tiny{\color{white} \textbf{3}}};
\node [anchor=south east,inner sep=1pt,fill=black] (pa43) at (a43.south east) {\tiny{\color{white} \textbf{4}}};
\node [anchor=south east,inner sep=1pt,fill=black] (pa44) at (a44.south east) {\tiny{\color{white} \textbf{5}}};
\node [anchor=south,inner sep=2pt,minimum height=1.5em,minimum width=3.0em] (a10) at (a11.north) {\scriptsize{源语言}};
\node [anchor=south,inner sep=2pt,minimum height=1.5em,minimum width=3.0em] (a20) at (a21.north) {\small{$P$}};
\node [anchor=south,inner sep=2pt] (a30) at (a41.north) {\scriptsize{丢弃的结果}};
\node [anchor=south,inner sep=2pt] (a30-2) at (a30.north) {\scriptsize{部分词随机}};
\node [anchor=north,inner sep=2pt] (pos1) at ([xshift=0.5em,yshift=-0.5em]a25.south) {\scriptsize{(a)部分词随机丢弃的加噪方法}};
......@@ -42,17 +42,17 @@
\node [anchor=north,inner sep=2pt,fill=blue!20,minimum height=1.5em,minimum width=3.0em] (b14) at ([yshift=-0.2em]b13.south) {感到};
\node [anchor=north,inner sep=2pt,fill=blue!20,minimum height=1.5em,minimum width=3.0em] (b15) at ([yshift=-0.2em]b14.south) {满意};
\node [anchor=south east,inner sep=1pt,fill=black] (pb11) at (b11.south east) {\tiny{\color{white} \textbf{1}}};
\node [anchor=south east,inner sep=1pt,fill=black] (pb12) at (b12.south east) {\tiny{\color{white} \textbf{2}}};
\node [anchor=south east,inner sep=1pt,fill=black] (pb13) at (b13.south east) {\tiny{\color{white} \textbf{3}}};
\node [anchor=south east,inner sep=1pt,fill=black] (pb14) at (b14.south east) {\tiny{\color{white} \textbf{4}}};
\node [anchor=south east,inner sep=1pt,fill=black] (pb15) at (b15.south east) {\tiny{\color{white} \textbf{5}}};
\node [anchor=west,inner sep=2pt,fill=blue!20,minimum height=1.5em,minimum width=2.0em] (b21) at ([xshift=1.0em]b11.east) {\footnotesize{$P=0.1$}};
\node [anchor=north,inner sep=2pt,fill=blue!20,minimum height=1.5em,minimum width=2.0em] (b22) at ([yshift=-0.2em]b21.south) {\footnotesize{$P=0.1$}};
\node [anchor=north,inner sep=2pt,fill=red!20,minimum height=1.5em,minimum width=2.0em] (b23) at ([yshift=-0.2em]b22.south) {\footnotesize{$P=0.1$}};
\node [anchor=north,inner sep=2pt,fill=blue!20,minimum height=1.5em,minimum width=2.0em] (b24) at ([yshift=-0.2em]b23.south) {\footnotesize{$P=0.1$}};
\node [anchor=north,inner sep=2pt,fill=blue!20,minimum height=1.5em,minimum width=2.0em] (b25) at ([yshift=-0.2em]b24.south) {\footnotesize{$P=0.1$}};
\node [anchor=west,inner sep=2pt] (b31) at ([xshift=0.3em]b23.east) {$\Rightarrow$};
......@@ -62,14 +62,14 @@
\node [anchor=north,inner sep=2pt,fill=blue!20,minimum height=1.5em,minimum width=3.0em] (b44) at ([yshift=-0.2em]b43.south) {感到};
\node [anchor=north,inner sep=2pt,fill=blue!20,minimum height=1.5em,minimum width=3.0em] (b45) at ([yshift=-0.2em]b44.south) {满意};
\node [anchor=south east,inner sep=1pt,fill=black] (pb41) at (b41.south east) {\tiny{\color{white} \textbf{1}}};
\node [anchor=south east,inner sep=1pt,fill=black] (pb42) at (b42.south east) {\tiny{\color{white} \textbf{2}}};
\node [anchor=south east,inner sep=1pt,fill=black] (pb43) at (b43.south east) {\tiny{\color{white} \textbf{3}}};
\node [anchor=south east,inner sep=1pt,fill=black] (pb44) at (b44.south east) {\tiny{\color{white} \textbf{4}}};
\node [anchor=south east,inner sep=1pt,fill=black] (pb45) at (b45.south east) {\tiny{\color{white} \textbf{5}}};
\node [anchor=south,inner sep=2pt,minimum height=1.5em,minimum width=3.0em] (b10) at (b11.north) {\scriptsize{源语言}};
\node [anchor=south,inner sep=2pt,minimum height=1.5em,minimum width=3.0em] (b20) at (b21.north) {\small{$P$}};
\node [anchor=south,inner sep=2pt] (b30) at (b41.north) {\scriptsize{屏蔽的结果}};
\node [anchor=south,inner sep=2pt] (b30-2) at (b30.north) {\scriptsize{部分词随机}};
\node [anchor=north,inner sep=2pt] (pos2) at ([xshift=0.5em,yshift=-0.5em]b25.south) {\scriptsize{(b)部分词随机屏蔽的加噪方法}};
......@@ -80,23 +80,23 @@
\node [anchor=north,inner sep=2pt,fill=yellow!20,minimum height=1.5em,minimum width=3.0em] (c14) at ([yshift=-0.2em]c13.south) {感到};
\node [anchor=north,inner sep=2pt,fill=yellow!20,minimum height=1.5em,minimum width=3.0em] (c15) at ([yshift=-0.2em]c14.south) {满意};
\node [anchor=south east,inner sep=1pt,fill=black] (pc11) at (c11.south east) {\tiny{\color{white} \textbf{1}}};
\node [anchor=south east,inner sep=1pt,fill=black] (pc12) at (c12.south east) {\tiny{\color{white} \textbf{2}}};
\node [anchor=south east,inner sep=1pt,fill=black] (pc13) at (c13.south east) {\tiny{\color{white} \textbf{3}}};
\node [anchor=south east,inner sep=1pt,fill=black] (pc14) at (c14.south east) {\tiny{\color{white} \textbf{4}}};
\node [anchor=south east,inner sep=1pt,fill=black] (pc15) at (c15.south east) {\tiny{\color{white} \textbf{5}}};
\node [anchor=west,inner sep=2pt] (c21) at ([xshift=0.35em]c11.east) {\footnotesize{+}};
\node [anchor=west,inner sep=2pt] (c22) at ([xshift=0.35em]c12.east) {\footnotesize{+}};
\node [anchor=west,inner sep=2pt] (c23) at ([xshift=0.35em]c13.east) {\footnotesize{+}};
\node [anchor=west,inner sep=2pt] (c24) at ([xshift=0.35em]c14.east) {\footnotesize{+}};
\node [anchor=west,inner sep=2pt] (c25) at ([xshift=0.35em]c15.east) {\footnotesize{+}};
\node [anchor=west,inner sep=2pt,fill=yellow!20,minimum height=1.5em,minimum width=2.5em] (c31) at ([xshift=0.423em]c21.east) {\footnotesize{2.54}};
\node [anchor=north,inner sep=2pt,fill=yellow!20,minimum height=1.5em,minimum width=2.5em] (c32) at ([yshift=-0.2em]c31.south) {\footnotesize{0.63}};
\node [anchor=north,inner sep=2pt,fill=yellow!20,minimum height=1.5em,minimum width=2.5em] (c33) at ([yshift=-0.2em]c32.south) {\footnotesize{1.77}};
\node [anchor=north,inner sep=2pt,fill=yellow!20,minimum height=1.5em,minimum width=2.5em] (c34) at ([yshift=-0.2em]c33.south) {\footnotesize{1.32}};
\node [anchor=north,inner sep=2pt,fill=yellow!20,minimum height=1.5em,minimum width=2.5em] (c35) at ([yshift=-0.2em]c34.south) {\footnotesize{2.15}};
\node [anchor=west,inner sep=2pt] (c41) at ([xshift=0.55em]c31.east) {\footnotesize{=}};
\node [anchor=west,inner sep=2pt] (c42) at ([xshift=0.55em]c32.east) {\footnotesize{=}};
......@@ -104,31 +104,31 @@
\node [anchor=west,inner sep=2pt] (c44) at ([xshift=0.55em]c34.east) {\footnotesize{=}};
\node [anchor=west,inner sep=2pt] (c45) at ([xshift=0.55em]c35.east) {\footnotesize{=}};
\node [anchor=west,inner sep=2pt,fill=yellow!20,minimum height=1.5em,minimum width=3.0em] (c51) at ([xshift=0.56em]c41.east) {\footnotesize{$S_{1}=3.54$}};
\node [anchor=north,inner sep=2pt,fill=yellow!20,minimum height=1.5em,minimum width=3.0em] (c52) at ([yshift=-0.2em]c51.south) {\footnotesize{$S_{2}=2.63$}};
\node [anchor=north,inner sep=2pt,fill=yellow!20,minimum height=1.5em,minimum width=3.0em] (c53) at ([yshift=-0.2em]c52.south) {\footnotesize{$S_{3}=4.77$}};
\node [anchor=north,inner sep=2pt,fill=yellow!20,minimum height=1.5em,minimum width=3.0em] (c54) at ([yshift= -0.2em]c53.south) {\footnotesize{$S_{4}=5.33$}};
\node [anchor=north,inner sep=2pt,fill=yellow!20,minimum height=1.5em,minimum width=3.0em] (c55) at ([yshift=-0.2em]c54.south) {\footnotesize{$S_{5}=7.15$}};
\node [anchor=west,inner sep=2pt,fill=yellow!20,minimum height=1.5em,minimum width=3.0em] (c61) at ([xshift=4.55em]c51.east) {\footnotesize{$S_{2}^{'}=2.63$}};
\node [anchor=north,inner sep=2pt,fill=yellow!20,minimum height=1.5em,minimum width=3.0em] (c62) at ([yshift=-0.2em]c61.south) {\footnotesize{$S_{1}^{'}=3.54$}};
\node [anchor=north,inner sep=2pt,fill=yellow!20,minimum height=1.5em,minimum width=3.0em] (c63) at ([yshift=-0.2em]c62.south) {\footnotesize{$S_{3}^{'}=4.77$}};
\node [anchor=north,inner sep=2pt,fill=yellow!20,minimum height=1.5em,minimum width=3.0em] (c64) at ([yshift=-0.2em]c63.south) {\footnotesize{$S_{4}^{'}=5.33$}};
\node [anchor=north,inner sep=2pt,fill=yellow!20,minimum height=1.5em,minimum width=3.0em] (c65) at ([yshift=-0.2em]c64.south) {\footnotesize{$S_{5}^{'}=7.15$}};
\node [anchor=north,inner sep=2pt] (c71) at ([yshift=-12.3em]b31.south) {$\Rightarrow$};
\node [anchor=west,inner sep=2pt,fill=red!20,minimum height=1.5em,minimum width=3.0em] (c81) at ([xshift=1.99em]c61.east) {};
\node [anchor=north,inner sep=2pt,fill=red!20,minimum height=1.5em,minimum width=3.0em] (c82) at ([yshift=-0.2em]c81.south) {};
\node [anchor=north,inner sep=2pt,fill=yellow!20,minimum height=1.5em,minimum width=3.0em] (c83) at ([yshift=-0.2em]c82.south) {};
\node [anchor=north,inner sep=2pt,fill=yellow!20,minimum height=1.5em,minimum width=3.0em] (c84) at ([yshift=-0.2em]c83.south) {感到};
\node [anchor=north,inner sep=2pt,fill=yellow!20,minimum height=1.5em,minimum width=3.0em] (c85) at ([yshift=-0.2em]c84.south) {满意};
\node [anchor=south east,inner sep=1pt,fill=black] (pc81) at (c81.south east) {\tiny{\color{white} \textbf{0}}};
\node [anchor=south east,inner sep=1pt,fill=black] (pc81) at (c81.south east) {\tiny{\color{white} \textbf{2}}};
\node [anchor=south east,inner sep=1pt,fill=black] (pc82) at (c82.south east) {\tiny{\color{white} \textbf{1}}};
\node [anchor=south east,inner sep=1pt,fill=black] (pc83) at (c83.south east) {\tiny{\color{white} \textbf{2}}};
\node [anchor=south east,inner sep=1pt,fill=black] (pc84) at (c84.south east) {\tiny{\color{white} \textbf{3}}};
\node [anchor=south east,inner sep=1pt,fill=black] (pc85) at (c85.south east) {\tiny{\color{white} \textbf{4}}};
\node [anchor=south east,inner sep=1pt,fill=black] (pc83) at (c83.south east) {\tiny{\color{white} \textbf{3}}};
\node [anchor=south east,inner sep=1pt,fill=black] (pc84) at (c84.south east) {\tiny{\color{white} \textbf{4}}};
\node [anchor=south east,inner sep=1pt,fill=black] (pc85) at (c85.south east) {\tiny{\color{white} \textbf{5}}};
\draw [->,dashed](c51.east)--(c62.west);
\draw [->,dashed](c52.east)--(c61.west);
......@@ -137,8 +137,8 @@
\draw [->,dashed](c55.east)--(c65.west);
\node [anchor=south,inner sep=2pt,minimum height=1.5em,minimum width=3.0em] (c10) at (c11.north) {\scriptsize{源语言}};
\node [anchor=south,inner sep=2pt,minimum height=1.5em,minimum width=3.0em] (c30) at (c31.north) {\small{n=3}};
\node [anchor=south,inner sep=2pt,minimum height=1.5em,minimum width=3.0em] (c50) at (c51.north) {\small{S}};
\node [anchor=south,inner sep=2pt,minimum height=1.5em,minimum width=3.0em] (c30) at (c31.north) {\small{$n=3$}};
\node [anchor=south,inner sep=2pt,minimum height=1.5em,minimum width=3.0em] (c50) at (c51.north) {\small{$\mathbi{S}$}};
\node [anchor=south,inner sep=2pt] (c60) at (c61.north) {\scriptsize{进行排序}};
\node [anchor=south,inner sep=2pt] (c60-2) at (c60.north) {\scriptsize{由小到大}};
......
......@@ -20,8 +20,8 @@
\node [anchor=west] (a15-2) at ([xshift=-4.25em]a15.west) {\tiny{$\cdots$}};
\node [anchor=east] (a13-3) at ([yshift=0.8em]a13-2.west) {\small{无监督语言}};
\node [anchor=north] (a13-4) at ([xshift=0em]a13-3.south) {\small{模型隐藏层}};
\node [anchor=east] (a13-3) at ([yshift=0.8em]a13-2.west) {\small{模型语言}};
\node [anchor=north] (a13-4) at ([xshift=0em]a13-3.south) {\small{隐藏层}};
\node [anchor=east] (a14-3) at ([yshift=0.8em]a14-2.west) {\small{神经机器翻译}};
\node [anchor=north] (a14-4) at ([xshift=0.5em]a14-3.south) {\small{模型隐藏层}};
......
......@@ -66,7 +66,7 @@
\subsubsection{2. 神经网络的第二次高潮和第二次寒冬}
\parinterval 虽然第一代神经网络受到了打击,但是20世纪80年代,第二代人工神经网络开始萌发新的生机。在这个发展阶段,生物属性已经不再是神经网络的唯一灵感来源,在{\small\bfnew{连接主义}}\index{连接主义}(Connectionism)\index{Connectionism}和分布式表示两种思潮的影响下,神经网络方法再次走入了人们的视线。
\parinterval 虽然第一代神经网络受到了打击,但是20世纪80年代,第二代人工神经网络开始萌发新的生机。在这个发展阶段,生物属性已经不再是神经网络的唯一灵感来源,在{\small\bfnew{连接主义}}\index{连接主义}(Connectionism)\index{Connectionism}和分布式表示两种思潮的影响下,神经网络方法再次走入了人们的视线。
\vspace{0.3em}
\parinterval (1)符号主义与连接主义
......@@ -102,7 +102,7 @@
\vspace{0.5em}
\end{itemize}
\parinterval 另外,从应用的角度,数据量的快速提升和模型容量的增加也为深度学习的成功提供了条件,数据量的增加使得深度学习有了用武之地,例如,2000年以来,无论在学术研究还是在工业实践中,双语数据的使用数量都在逐年上升(如图\ref{fig:9-1}所示)。现在的深度学习模型参数量都十分巨大,因此需要大规模数据才能保证模型学习的充分性,而大数据时代的到来为训练这样的模型提供了数据基础。
\parinterval 另外,从应用的角度来看,数据量的快速提升和模型容量的增加也为深度学习的成功提供了条件,数据量的增加使得深度学习有了用武之地,例如,2000年以来,无论在学术研究还是在工业实践中,双语数据的使用数量都在逐年上升(如图\ref{fig:9-1}所示)。现在的深度学习模型参数量都十分巨大,因此需要大规模数据才能保证模型学习的充分性,而大数据时代的到来为训练这样的模型提供了数据基础。
%----------------------------------------------------------------------
\begin{figure}[htp]
......@@ -142,7 +142,7 @@
\begin{itemize}
\vspace{0.5em}
\item 特征的构造需要耗费大量的时间和精力。在传统机器学习的特征工程方法中,特征提取过程往往依赖于大量的先验假设,都基于人力完成的,这样导致相关系统的研发周期也大大增加;
\item 特征的构造需要耗费大量的时间和精力。在传统机器学习的特征工程方法中,特征提取都是基于人力完成的,该过程往往依赖于大量的先验假设,会导致相关系统的研发周期也大大增加;
\vspace{0.5em}
\item 最终的系统性能强弱非常依赖特征的选择。有一句话在业界广泛流传:“数据和特征决定了机器学习的上限”,但是人的智力和认知是有限的,因此人工设计的特征的准确性和覆盖度会存在瓶颈;
\vspace{0.5em}
......@@ -150,7 +150,7 @@
\vspace{0.5em}
\end{itemize}
\parinterval 端到端学习将人们从大量的特征提取工作之中解放出来,可以不需要太多人的先验知识。从某种意义上讲,对问题的特征提取完全是自动完成的,这也意味着哪怕系统开发者不是该任务的“专家”也可以完成相关系统的开发。此外,端到端学习实际上也隐含了一种新的对问题的表示形式\ $\dash$分布式表示。 在这种框架下,模型的输入可以被描述为分布式的实数向量,这样模型可以有更多的维度描述一个事物,同时避免传统符号系统对客观事物离散化的刻画。比如,在自然语言处理中,表示学习重新定义了什么是词,什么是句子。在本章后面的内容中也会看到,表示学习可以让计算机对语言文字的描述更加准确和充分。
\parinterval 端到端学习将人们从大量的特征提取工作之中解放出来,可以不需要太多人的先验知识。从某种意义上讲,对问题的特征提取完全是自动完成的,这也意味着即使系统开发者不是该任务的“专家”也可以完成相关系统的开发。此外,端到端学习实际上也隐含了一种新的对问题的表示形式\ $\dash$分布式表示。 在这种框架下,模型的输入可以被描述为分布式的实数向量,这样模型可以有更多的维度描述一个事物,同时避免传统符号系统对客观事物离散化的刻画。比如,在自然语言处理中,表示学习重新定义了什么是词,什么是句子。在本章后面的内容中也会看到,表示学习可以让计算机对语言文字的描述更加准确和充分。
%----------------------------------------------------------------------------------------
% NEW SUBSUB-SECTION
......@@ -196,7 +196,7 @@
\subsection{线性代数基础} \label{sec:9.2.1}
\parinterval 线性代数作为一个数学分支,广泛应用于科学和工程中,神经网络的数学描述中也大量使用了线性代数工具。因此,这里对线性代数的一些概念进行简要介绍,以方便后续对神经网络数学描述。
\parinterval 线性代数作为一个数学分支,广泛应用于科学和工程中,神经网络的数学描述中也大量使用了线性代数工具。因此,这里对线性代数的一些概念进行简要介绍,以方便后续对神经网络进行数学描述。
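The paragraph above notes that linear algebra is the basic toolkit for describing neural networks. A minimal sketch of the central operation, a matrix-vector product $\mathbf{y} = \mathbf{W}\mathbf{x}$, written in plain Python with made-up numbers for illustration:

```python
# Hedged sketch: the matrix-vector product used throughout the
# neural-network descriptions. Values are toy examples.
def matvec(W, x):
    # each output component is the dot product of one row of W with x
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in W]

W = [[1, 2],
     [3, 4]]
x = [1, 1]
print(matvec(W, x))  # [3, 7]
```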
%----------------------------------------------------------------------------------------
% NEW SUBSUB-SECTION
......@@ -740,7 +740,7 @@ x_1\cdot w_1+x_2\cdot w_2+x_3\cdot w_3 & = & 0\cdot 1+0\cdot 1+1\cdot 1 \nonumbe
%-------------------------------------------
\vspace{-0.5em}
\parinterval 那激活函数又是什么?神经元在接收到经过线性变换的结果后,通过激活函数的处理,得到最终的输出$ \mathbf y $。激活函数的目的是解决实际问题中的非线性变换,线性变换只能拟合直线,而激活函数的加入,使神经网络具有了拟合曲线的能力。 特别是在实际问题中,很多现象都无法用简单的线性关系描述,这时可以使用非线性激活函数来描述更加复杂的问题。常见的非线性函数有Sigmoid、ReLU、Tanh等。如图\ref{fig:9-15}列举了几种激活函数的形式。
\parinterval 那激活函数又是什么?神经元在接收到经过线性变换的结果后,通过激活函数的处理,得到最终的输出$ \mathbf y $。激活函数的目的是解决实际问题中的非线性变换,线性变换只能拟合直线,而激活函数的加入,使神经网络具有了拟合曲线的能力。 特别是在实际问题中,很多现象都无法用简单的线性关系描述,这时可以使用非线性激活函数来描述更加复杂的问题。常见的非线性激活函数有Sigmoid、ReLU、Tanh等。如图\ref{fig:9-15}列举了几种激活函数的形式。
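The three nonlinear activation functions named above (Sigmoid, ReLU, Tanh) can be sketched directly from their standard definitions; the function names below are just illustrative:

```python
import math

# Standard definitions of the activation functions mentioned above.

def sigmoid(x):
    # 1 / (1 + e^{-x}), squashes input into (0, 1)
    return 1.0 / (1.0 + math.exp(-x))

def relu(x):
    # max(0, x): zero for negative input, identity for positive
    return x if x > 0 else 0.0

def tanh(x):
    # hyperbolic tangent, squashes input into (-1, 1)
    return math.tanh(x)

print(sigmoid(0.0))            # 0.5
print(relu(-2.0), relu(3.0))   # 0.0 3.0
print(tanh(0.0))               # 0.0
```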
%----------------------------------------------
\begin{figure}[htp]
......@@ -1069,7 +1069,7 @@ f(x)=\begin{cases} 0 & x\le 0 \\x & x>0\end{cases}
\parinterval 有了张量这个工具,可以很容易地实现任意的神经网络。反过来,神经网络都可以被看作是张量的函数。一种经典的神经网络计算模型是:给定输入张量,通过各个神经网络层所对应的张量计算之后,最后得到输出张量。这个过程也被称作{\small\sffamily\bfseries{前向传播}}\index{前向传播}(Forward Propagation\index{Forward Propagation}),它常常被应用在使用神经网络对新的样本进行推断中。
\parinterval 来看一个具体的例子,如图\ref{fig:9-37}(a)是一个根据天气情况判断穿衣指数(穿衣指数是人们穿衣薄厚的依据)的过程,将当天的天空状况、低空气温、水平气压作为输入,通过一层神经元在输入数据中提取温度、风速两方面的特征,并根据这两方面的特征判断穿衣指数。需要注意的是,在实际的神经网络中,并不能准确地知道神经元究竟可以提取到哪方面的特征,以上表述是为了让读者更好地理解神经网络的建模过程和前向传播过程。这里将上述过程建模为如图\ref{fig:9-37}(b)所示的两层神经网络。
\parinterval 来看一个具体的例子,如图\ref{fig:9-37}是一个根据天气情况判断穿衣指数(穿衣指数是人们穿衣薄厚的依据)的过程,将当天的天空状况、低空气温、水平气压作为输入,通过一层神经元在输入数据中提取温度、风速两方面的特征,并根据这两方面的特征判断穿衣指数。需要注意的是,在实际的神经网络中,并不能准确地知道神经元究竟可以提取到哪方面的特征,以上表述是为了让读者更好地理解神经网络的建模过程和前向传播过程。这里将上述过程建模为如图\ref{fig:9-37}所示的两层神经网络。
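The two-layer forward propagation described above (three weather inputs, a hidden layer extracting two features, one output for the clothing index) can be sketched as follows; all weights and input values are made up purely for illustration:

```python
import math

# Hedged sketch of the two-layer forward pass in the example above:
# 3 inputs -> 2 hidden features (tanh) -> 1 scalar output.
def forward(x, W1, b1, W2, b2):
    # hidden layer: linear transform followed by tanh activation
    h = [math.tanh(sum(xi * wij for xi, wij in zip(x, row)) + bj)
         for row, bj in zip(W1, b1)]
    # output layer: linear transform only
    return sum(hi * wi for hi, wi in zip(h, W2)) + b2

x  = [0.2, -1.0, 0.5]                        # sky, temperature, pressure (toy)
W1 = [[0.5, -0.3, 0.8], [0.1, 0.9, -0.4]]    # one weight row per hidden unit
b1 = [0.0, 0.1]
W2 = [1.2, -0.7]
b2 = 0.5
print(forward(x, W1, b1, W2, b2))
```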
%----------------------------------------------
\begin{figure}[htp]
......@@ -2162,10 +2162,10 @@ Jobs was the CEO of {\red{\underline{apple}}}.
\begin{itemize}
\vspace{0.5em}
\item 端到端学习是神经网络方法的特点之一。这样,系统开发者不需要设计输入和输出的隐含结构,甚至连特征工程都不再需要。但是,另一方面,由于这种端到端学习完全由神经网络自行完成,整个学习过程没有人的先验知识做指导,导致学习的结构和参数很难进行解释。针对这个问题也有很多研究者进行{\small\sffamily\bfseries{可解释机器学习}}\index{可解释机器学习}(Explainable Machine Learning)\index{Explainable Machine Learning}的研究\upcite{DBLP:journals/corr/abs-1905-09418,moraffah2020causal,blodgett2020language,}。对于自然语言处理,方法的可解释性是十分必要的。从另一个角度说,如何使用先验知识改善端到端学习也是很多人关注的方向\upcite{arthur2016incorporating,zhang-etal-2017-prior},比如,如何使用句法知识改善自然语言处理模型\upcite{stahlberg2016syntactically,currey2019incorporating,Yang2017TowardsBH,marevcek2018extracting,blevins2018deep}
\item 端到端学习是神经网络方法的特点之一。这样,系统开发者不需要设计输入和输出的隐含结构,甚至连特征工程都不再需要。但是,另一方面,由于这种端到端学习完全由神经网络自行完成,整个学习过程没有人的先验知识做指导,导致学习的结构和参数很难进行解释。针对这个问题也有很多研究者进行{\small\sffamily\bfseries{可解释机器学习}}\index{可解释机器学习}(Explainable Machine Learning)\index{Explainable Machine Learning}的研究\upcite{moraffah2020causal,Kovalerchuk2020SurveyOE,DoshiVelez2017TowardsAR}。对于自然语言处理,方法的可解释性是十分必要的。从另一个角度说,如何使用先验知识改善端到端学习也是很多人关注的方向\upcite{arthur2016incorporating,zhang-etal-2017-prior},比如,如何使用句法知识改善自然语言处理模型\upcite{stahlberg2016syntactically,currey2019incorporating,Yang2017TowardsBH,marevcek2018extracting,blevins2018deep}
\vspace{0.5em}
\item 为了进一步提高神经语言模型性能,除了改进模型,还可以在模型中引入新的结构或是其他有效信息,该领域也有很多典型工作值得关注。例如在神经语言模型中引入除了词嵌入以外的单词特征,如语言特征(形态、语法、语义特征等)\upcite{Wu2012FactoredLM,Adel2015SyntacticAS}、上下文信息\upcite{mikolov2012context,Wang2015LargerContextLM}、知识图谱等外部知识\upcite{Ahn2016ANK};或是在神经语言模型中引入字符级信息,将其作为字符特征单独\upcite{Kim2016CharacterAwareNL,Hwang2017CharacterlevelLM}或与单词特征一起\upcite{Onoe2016GatedWR,Verwimp2017CharacterWordLL}送入模型中;在神经语言模型中引入双向模型也是一种十分有效的尝试,在单词预测时可以同时利用来自过去和未来的文本信息\upcite{Graves2013HybridSR,bahdanau2014neural,Peters2018DeepCW}
\vspace{0.5em}
\item 词嵌入是自然语言处理近些年的重要进展。所谓“嵌入”是一类方法,理论上,把一个事物进行分布式表示的过程都可以被看作是广义上的“嵌入”。基于这种思想的表示学习也成为了自然语言处理中的前沿方法。比如,如何对树结构,甚至图结构进行分布式表示成为了分析自然语言的重要方法\upcite{DBLP:journals/corr/abs-1809-01854,Yin2018StructVAETL,Aharoni2017TowardsSN}。此外,除了语言建模,还有很多方式可以进行词嵌入的学习,比如,SENNA\upcite{collobert2011natural}、word2vec\upcite{DBLP:journals/corr/abs-1301-3781,mikolov2013distributed}、Glove\upcite{DBLP:conf/emnlp/PenningtonSM14}、CoVe\upcite{mccann2017learned} 等。
\item 词嵌入是自然语言处理近些年的重要进展。所谓“嵌入”是一类方法,理论上,把一个事物进行分布式表示的过程都可以被看作是广义上的“嵌入”。基于这种思想的表示学习也成为了自然语言处理中的前沿方法。比如,如何对树结构,甚至图结构进行分布式表示成为了分析自然语言的重要方法\upcite{DBLP:journals/corr/abs-1809-01854,Yin2018StructVAETL,Aharoni2017TowardsSN,Bastings2017GraphCE,KoncelKedziorski2019TextGF}。此外,除了语言建模,还有很多方式可以进行词嵌入的学习,比如,SENNA\upcite{2011Natural}、word2vec\upcite{DBLP:journals/corr/abs-1301-3781,mikolov2013distributed}、Glove\upcite{DBLP:conf/emnlp/PenningtonSM14}、CoVe\upcite{mccann2017learned} 等。
\vspace{0.5em}
\end{itemize}
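The list above surveys word embeddings as distributed representations. A minimal sketch of the usual way such vectors are compared, cosine similarity, with tiny made-up vectors rather than real word2vec or GloVe embeddings:

```python
import math

# Hedged sketch: words as dense vectors, compared by cosine similarity.
# The embedding table below is invented for illustration only.
def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

emb = {
    "king":  [0.8, 0.6, 0.1],
    "queen": [0.7, 0.7, 0.2],
    "apple": [0.1, 0.2, 0.9],
}
print(cosine(emb["king"], emb["queen"]))  # near 1: similar words
print(cosine(emb["king"], emb["apple"]))  # much smaller: dissimilar words
```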
......@@ -3867,8 +3867,7 @@ year = {2012}
volume={18},
number={4},
pages={467--479},
year={1992},
publisher={MIT Press}
year={1992}
}
@inproceedings{mikolov2012context,
......@@ -3877,10 +3876,9 @@ year = {2012}
Tomas and
Zweig and
Geoffrey},
booktitle={2012 IEEE Spoken Language Technology Workshop (SLT)},
publisher={IEEE Spoken Language Technology Workshop},
pages={234--239},
year={2012},
organization={IEEE}
year={2012}
}
@article{zaremba2014recurrent,
......@@ -3905,7 +3903,7 @@ year = {2012}
Jan and
Schmidhuber and
Jurgen},
journal={arXiv: Learning},
journal={International Conference on Machine Learning},
year={2016}
}
......@@ -3917,7 +3915,7 @@ year = {2012}
Nitish Shirish and
Socher and
Richard},
journal={arXiv: Computation and Language},
journal={International Conference on Learning Representations},
year={2017}
}
......@@ -3934,12 +3932,11 @@ year = {2012}
@article{baydin2017automatic,
title ={Automatic differentiation in machine learning: a survey},
author ={Baydin, At{\i}l{\i}m G{\"u}nes and Pearlmutter, Barak A and Radul, Alexey Andreyevich and Siskind, Jeffrey Mark},
journal ={The Journal of Machine Learning Research},
journal ={Journal of Machine Learning Research},
volume ={18},
number ={1},
pages ={5595--5637},
year ={2017},
publisher ={JMLR. org}
year ={2017}
}
@article{qian1999momentum,
......@@ -3977,9 +3974,8 @@ year = {2012}
author = {Diederik P. Kingma and
Jimmy Ba},
title = {Adam: {A} Method for Stochastic Optimization},
booktitle = {3rd International Conference on Learning Representations, {ICLR} 2015,
San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings},
year = {2015},
publisher = {International Conference on Learning Representations},
year = {2015}
}
@inproceedings{ioffe2015batch,
......@@ -3987,13 +3983,10 @@ year = {2012}
Christian Szegedy},
title = {Batch Normalization: Accelerating Deep Network Training by Reducing
Internal Covariate Shift},
booktitle = {Proceedings of the 32nd International Conference on Machine Learning,
{ICML} 2015, Lille, France, 6-11 July 2015},
series = {{JMLR} Workshop and Conference Proceedings},
publisher = {International Conference on Machine Learning},
volume = {37},
pages = {448--456},
publisher = {JMLR.org},
year = {2015},
year = {2015}
}
@article{Ba2016LayerN,
......@@ -4003,7 +3996,7 @@ year = {2012}
title = {Layer Normalization},
journal = {CoRR},
volume = {abs/1607.06450},
year = {2016},
year = {2016}
}
@inproceedings{mikolov2013distributed,
......@@ -4013,11 +4006,9 @@ year = {2012}
Gregory S. Corrado and
Jeffrey Dean},
title = {Distributed Representations of Words and Phrases and their Compositionality},
booktitle = {Advances in Neural Information Processing Systems 26: 27th Annual
Conference on Neural Information Processing Systems 2013. Proceedings
of a meeting held December 5-8, 2013, Lake Tahoe, Nevada, United States},
publisher = {Conference on Neural Information Processing Systems},
pages = {3111--3119},
year = {2013},
year = {2013}
}
@inproceedings{arthur2016incorporating,
......@@ -4025,12 +4016,9 @@ year = {2012}
Graham Neubig and
Satoshi Nakamura},
title = {Incorporating Discrete Translation Lexicons into Neural Machine Translation},
booktitle = {Proceedings of the 2016 Conference on Empirical Methods in Natural
Language Processing, {EMNLP} 2016, Austin, Texas, USA, November 1-4,
2016},
pages = {1557--1567},
publisher = {The Association for Computational Linguistics},
year = {2016},
publisher = {Annual Meeting of the Association for Computational Linguistics},
year = {2016}
}
@inproceedings{stahlberg2016syntactically,
......@@ -4039,10 +4027,7 @@ year = {2012}
Aurelien Waite and
Bill Byrne},
title = {Syntactically Guided Neural Machine Translation},
booktitle = {Proceedings of the 54th Annual Meeting of the Association for Computational
Linguistics, {ACL} 2016, August 7-12, 2016, Berlin, Germany, Volume
2: Short Papers},
publisher = {The Association for Computer Linguistics},
publisher = {Annual Meeting of the Association for Computational Linguistics},
year = {2016},
}
......@@ -4051,12 +4036,9 @@ year = {2012}
Alessandro Moschitti},
title = {Embedding Semantic Similarity in Tree Kernels for Domain Adaptation
of Relation Extraction},
booktitle = {Proceedings of the 51st Annual Meeting of the Association for Computational
Linguistics, {ACL} 2013, 4-9 August 2013, Sofia, Bulgaria, Volume
1: Long Papers},
pages = {1498--1507},
publisher = {The Association for Computer Linguistics},
year = {2013},
publisher = {Annual Meeting of the Association for Computational Linguistics},
year = {2013}
}
@inproceedings{perozzi2014deepwalk,
......@@ -4064,42 +4046,32 @@ year = {2012}
Rami Al-Rfou and
Steven Skiena},
title = {DeepWalk: online learning of social representations},
booktitle = {The 20th {ACM} {SIGKDD} International Conference on Knowledge Discovery
and Data Mining, {KDD} '14, New York, NY, {USA} - August 24 - 27,
2014},
publisher = {ACM SIGKDD Conference on Knowledge Discovery and Data Mining},
pages = {701--710},
publisher = {{ACM}},
year = {2014},
year = {2014}
}
@article{collobert2011natural,
author = {Ronan Collobert and
Jason Weston and
L{\'{e}}on Bottou and
Michael Karlen and
Koray Kavukcuoglu and
Pavel P. Kuksa},
title = {Natural Language Processing (Almost) from Scratch},
journal = {Journal of Machine Learning Research},
volume = {12},
pages = {2493--2537},
year = {2011},
@article{2011Natural,
title={Natural Language Processing (almost) from Scratch},
  author={Collobert, Ronan and Weston, Jason and Bottou, L{\'{e}}on and Karlen, Michael and Kavukcuoglu, Koray and Kuksa, Pavel},
  journal={Journal of Machine Learning Research},
  volume={12},
  number={1},
  pages={2493--2537},
year={2011}
}
@inproceedings{mccann2017learned,
author = {Bryan McCann and
James Bradbury and
Caiming Xiong and
Richard Socher},
title = {Learned in Translation: Contextualized Word Vectors},
booktitle = {Advances in Neural Information Processing Systems 30: Annual Conference
on Neural Information Processing Systems 2017, 4-9 December 2017,
Long Beach, CA, {USA}},
booktitle = {Conference on Neural Information Processing Systems},
pages = {6294--6305},
year = {2017},
year = {2017}
}
%%%%%%%%%%%%%%%%%%%%%%%神经语言模型,检查修改%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%神经语言模型,检查修改%%%%%%%%%%%%%%%%%%%%%%%%%
@inproceedings{Peters2018DeepCW,
title={Deep contextualized word representations},
author={Matthew E. Peters and
......@@ -4135,13 +4107,13 @@ year = {2012}
}
@inproceedings{Onoe2016GatedWR,
title={Gated Word-Character Recurrent Language Model},
author={Yasumasa Miyamoto and
Kyunghyun Cho},
publisher={arXiv preprint arXiv:1606.01700},
year={2016}
author = {Yasumasa Miyamoto and
Kyunghyun Cho},
title = {Gated Word-Character Recurrent Language Model},
pages = {1992--1997},
publisher = {Annual Meeting of the Association for Computational Linguistics},
year = {2016}
}
@inproceedings{Hwang2017CharacterlevelLM,
title={Character-level language modeling with hierarchical recurrent neural networks},
author={Kyuyeon Hwang and
......@@ -4216,12 +4188,11 @@ year = {2012}
Ruocheng Guo and
Adrienne Raglin and
Huan Liu},
journal={ACM SIGKDD Explorations Newsletter},
journal={ACM SIGKDD Conference on Knowledge Discovery and Data Mining},
volume={22},
number={1},
pages={18--33},
year={2020},
publisher={ACM New York, NY, USA}
year={2020}
}
@incollection{nguyen2019understanding,
......@@ -4231,7 +4202,7 @@ year = {2012}
Jeff Clune},
pages={55--76},
year={2019},
publisher={Explainable AI}
publisher={Springer}
}
@inproceedings{yang2017improving,
title={Improving adversarial neural machine translation with prior knowledge},
......@@ -4250,15 +4221,16 @@ year = {2012}
title={Incorporating source syntax into transformer-based neural machine translation},
author={Anna Currey and
Kenneth Heafield},
publisher={Proceedings of the Fourth Conference on Machine Translation},
publisher={Annual Meeting of the Association for Computational Linguistics},
pages={24--33},
year={2019}
}
@article{currey2018multi,
title={Multi-source syntactic neural machine translation},
author={Anna Currey and
Kenneth Heafield},
journal={arXiv preprint arXiv:1808.10267},
journal={Conference on Empirical Methods in Natural Language Processing},
year={2018}
}
@inproceedings{marevcek2018extracting,
......@@ -4272,7 +4244,7 @@ year = {2012}
@article{blevins2018deep,
title={Deep rnns encode soft hierarchical syntax},
author={Blevins, Terra and Levy, Omer and Zettlemoyer, Luke},
journal={arXiv preprint arXiv:1805.04218},
journal={Annual Meeting of the Association for Computational Linguistics},
year={2018}
}
@inproceedings{Yin2018StructVAETL,
......@@ -4288,7 +4260,7 @@ year = {2012}
title={Towards String-To-Tree Neural Machine Translation},
author={Roee Aharoni and
Yoav Goldberg},
journal={arXiv preprint arXiv:1704.04743},
journal={Annual Meeting of the Association for Computational Linguistics},
year={2017}
}
......@@ -4308,9 +4280,8 @@ year = {2012}
Dhanush Bekal and Yi Luan and
Mirella Lapata and
Hannaneh Hajishirzi},
journal={ArXiv},
year={2019},
volume={abs/1904.02342}
journal={Annual Conference of the North American Chapter of the Association for Computational Linguistics},
year={2019}
}
@article{Kovalerchuk2020SurveyOE,
......@@ -4327,7 +4298,7 @@ year = {2012}
title={Towards A Rigorous Science of Interpretable Machine Learning},
author={Finale Doshi-Velez and
Been Kim},
journal={arXiv: Machine Learning},
journal={arXiv preprint arXiv:1702.08608},
year={2017}
}
......@@ -4349,7 +4320,7 @@ year = {2012}
title = {Does Multi-Encoder Help? {A} Case Study on Context-Aware Neural Machine
Translation},
pages = {3512--3518},
publisher = {Association for Computational Linguistics},
publisher = {Annual Meeting of the Association for Computational Linguistics},
year = {2020}
}
......@@ -4359,7 +4330,7 @@ year = {2012}
Abe Ittycheriah},
title = {Supervised Attentions for Neural Machine Translation},
pages = {2283--2288},
publisher = {The Association for Computational Linguistics},
publisher = {Annual Meeting of the Association for Computational Linguistics},
year = {2016}
}
......@@ -4370,7 +4341,7 @@ year = {2012}
Eiichiro Sumita},
title = {Neural Machine Translation with Supervised Attention},
pages = {3093--3102},
publisher = {The Association for Computational Linguistics},
publisher = {Annual Meeting of the Association for Computational Linguistics},
year = {2016}
}
......@@ -4384,16 +4355,16 @@ year = {2012}
title = {Fast and Robust Neural Network Joint Models for Statistical Machine
Translation},
pages = {1370--1380},
publisher = {The Association for Computer Linguistics},
year = {2014},
publisher = {Annual Meeting of the Association for Computational Linguistics},
year = {2014}
}
@inproceedings{Schwenk_continuousspace,
author = {Holger Schwenk},
title = {Continuous Space Translation Models for Phrase-Based Statistical Machine
Translation},
pages = {1071--1080},
publisher = {Indian Institute of Technology Bombay},
year = {2012},
publisher = {International Conference on Computational Linguistics},
year = {2012}
}
@inproceedings{kalchbrenner-blunsom-2013-recurrent,
author = {Nal Kalchbrenner and
......@@ -4401,25 +4372,24 @@ year = {2012}
title = {Recurrent Continuous Translation Models},
pages = {1700--1709},
publisher = {Annual Meeting of the Association for Computational Linguistics},
year = {2013},
year = {2013}
}
@article{HochreiterThe,
author = {Sepp Hochreiter},
title = {The Vanishing Gradient Problem During Learning Recurrent Neural Nets
and Problem Solutions},
journal = {International Journal of Uncertainty, Fuzziness and Knowledge-Based
Systems},
journal = {International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems},
volume = {6},
number = {2},
pages = {107--116},
year = {1998},
year = {1998}
}
@article{BENGIO1994Learning,
author = {Yoshua Bengio and
Patrice Y. Simard and
Paolo Frasconi},
title = {Learning long-term dependencies with gradient descent is difficult},
journal = {Institute of Electrical and Electronics Engineers},
  journal = {IEEE Transactions on Neural Networks},
volume = {5},
number = {2},
pages = {157--166},
......@@ -4435,15 +4405,14 @@ author = {Yoshua Bengio and
Lukasz Kaiser and
Illia Polosukhin},
title = {Attention is All you Need},
publisher = {Advances in Neural Information Processing Systems 30: Annual Conference
on Neural Information Processing Systems},
publisher = {Conference on Neural Information Processing Systems},
pages = {5998--6008},
year = {2017},
year = {2017}
}
@article{StahlbergNeural,
title={Neural Machine Translation: A Review},
author={Felix Stahlberg},
journal={journal of artificial intelligence research},
journal={Journal of Artificial Intelligence Research},
year={2020},
volume={69},
pages={343-418}
......@@ -4455,8 +4424,8 @@ author = {Yoshua Bengio and
Marcello Federico},
title = {Neural versus Phrase-Based Machine Translation Quality: a Case Study},
pages = {257--267},
publisher = {The Association for Computational Linguistics},
year = {2016},
publisher = {Annual Meeting of the Association for Computational Linguistics},
year = {2016}
}
@article{Hassan2018AchievingHP,
author = {Hany Hassan and
......@@ -4498,19 +4467,19 @@ author = {Yoshua Bengio and
Lidia S. Chao},
title = {Learning Deep Transformer Models for Machine Translation},
pages = {1810--1822},
publisher = {Association for Computational Linguistics},
publisher = {Annual Meeting of the Association for Computational Linguistics},
year = {2019}
}
@article{Li2020NeuralMT,
@inproceedings{Li2020NeuralMT,
author = {Yanyang Li and
Qiang Wang and
Tong Xiao and
Tongran Liu and
Jingbo Zhu},
title = {Neural Machine Translation with Joint Representation},
journal = {CoRR},
volume = {abs/2002.06546},
year = {2020},
pages = {8285--8292},
publisher = {AAAI Conference on Artificial Intelligence},
year = {2020}
}
@article{HochreiterLong,
author = {Hochreiter, Sepp and Schmidhuber, Jürgen},
......@@ -4519,7 +4488,7 @@ author = {Yoshua Bengio and
pages = {1735-80},
title = {Long Short-term Memory},
volume = {9},
journal = {Neural computation},
journal = {Neural Computation}
}
@inproceedings{Cho2014Learning,
author = {Kyunghyun Cho and
......@@ -4531,24 +4500,18 @@ author = {Yoshua Bengio and
Yoshua Bengio},
title = {Learning Phrase Representations using {RNN} Encoder-Decoder for Statistical
Machine Translation},
publisher = {Proceedings of the 2014 Conference on Empirical Methods in Natural
Language Processing, {EMNLP} 2014, October 25-29, 2014, Doha, Qatar,
{A} meeting of SIGDAT, a Special Interest Group of the {ACL}},
publisher = {Annual Meeting of the Association for Computational Linguistics},
pages = {1724--1734},
//publisher = {{ACL}},
year = {2014},
year = {2014}
}
@inproceedings{pmlr-v9-glorot10a,
author = {Xavier Glorot and
Yoshua Bengio},
title = {Understanding the difficulty of training deep feedforward neural networks},
publisher = {Proceedings of the Thirteenth International Conference on Artificial
Intelligence and Statistics, {AISTATS} 2010, Chia Laguna Resort, Sardinia,
Italy, May 13-15, 2010},
publisher = {International Conference on Artificial Intelligence and Statistics},
volume = {9},
pages = {249--256},
//publisher = {JMLR.org},
year = {2010},
year = {2010}
}
@inproceedings{xiao2017fast,
author = {Tong Xiao and
......@@ -4556,12 +4519,9 @@ author = {Yoshua Bengio and
Tongran Liu and
Chunliang Zhang},
title = {Fast Parallel Training of Neural Language Models},
publisher = {Proceedings of the Twenty-Sixth International Joint Conference on
Artificial Intelligence, {IJCAI} 2017, Melbourne, Australia, August
19-25, 2017},
publisher = {International Joint Conference on Artificial Intelligence},
pages = {4193--4199},
//publisher = {ijcai.org},
year = {2017},
year = {2017}
}
@inproceedings{Gu2017NonAutoregressiveNM,
author = {Jiatao Gu and
......@@ -4571,7 +4531,7 @@ author = {Yoshua Bengio and
Richard Socher},
title = {Non-Autoregressive Neural Machine Translation},
publisher = {International Conference on Learning Representations},
year = {2018},
year = {2018}
}
@inproceedings{li-etal-2018-simple,
author = {Yanyang Li and
......@@ -4581,12 +4541,9 @@ author = {Yoshua Bengio and
Changming Xu and
Jingbo Zhu},
title = {A Simple and Effective Approach to Coverage-Aware Neural Machine Translation},
publisher = {Proceedings of the 56th Annual Meeting of the Association for Computational
Linguistics, {ACL} 2018, Melbourne, Australia, July 15-20, 2018, Volume
2: Short Papers},
publisher = {Annual Meeting of the Association for Computational Linguistics},
pages = {292--297},
//publisher = {Association for Computational Linguistics},
year = {2018},
year = {2018}
}
@inproceedings{TuModeling,
author = {Zhaopeng Tu and
......@@ -4595,11 +4552,8 @@ author = {Yoshua Bengio and
Xiaohua Liu and
Hang Li},
title = {Modeling Coverage for Neural Machine Translation},
publisher = {Proceedings of the 54th Annual Meeting of the Association for Computational
Linguistics, {ACL} 2016, August 7-12, 2016, Berlin, Germany, Volume
1: Long Papers},
//publisher = {The Association for Computer Linguistics},
year = {2016},
publisher = {Annual Meeting of the Association for Computational Linguistics},
year = {2016}
}
@inproceedings{DBLP:journals/corr/SennrichFCBHHJL17,
author = {Rico Sennrich and
......@@ -4614,23 +4568,17 @@ author = {Yoshua Bengio and
Jozef Mokry and
Maria Nadejde},
title = {Nematus: a Toolkit for Neural Machine Translation},
publisher = {Proceedings of the 15th Conference of the European Chapter of the
Association for Computational Linguistics, {EACL} 2017, Valencia,
Spain, April 3-7, 2017, Software Demonstrations},
  publisher = {European Chapter of the Association for Computational Linguistics},
pages = {65--68},
//publisher = {Association for Computational Linguistics},
year = {2017},
year = {2017}
}
@inproceedings{DBLP:journals/corr/abs-1905-13324,
author = {Biao Zhang and
Rico Sennrich},
title = {A Lightweight Recurrent Network for Sequence Modeling},
publisher = {Proceedings of the 57th Conference of the Association for Computational
Linguistics, {ACL} 2019, Florence, Italy, July 28- August 2, 2019,
Volume 1: Long Papers},
publisher = {Annual Meeting of the Association for Computational Linguistics},
pages = {1538--1548},
//publisher = {Association for Computational Linguistics},
year = {2019},
year = {2019}
}
@article{Lei2017TrainingRA,
author = {Tao Lei and
......@@ -4639,7 +4587,7 @@ author = {Yoshua Bengio and
title = {Training RNNs as Fast as CNNs},
journal = {CoRR},
volume = {abs/1709.02755},
year = {2017},
year = {2017}
}
@inproceedings{Zhang2018SimplifyingNM,
author = {Biao Zhang and
......@@ -4649,22 +4597,18 @@ author = {Yoshua Bengio and
Huiji Zhang},
title = {Simplifying Neural Machine Translation with Addition-Subtraction Twin-Gated
Recurrent Networks},
publisher = {Proceedings of the 2018 Conference on Empirical Methods in Natural
Language Processing, Brussels, Belgium, October 31 - November 4, 2018},
publisher = {Conference on Empirical Methods in Natural Language Processing},
pages = {4273--4283},
//publisher = {Association for Computational Linguistics},
year = {2018},
year = {2018}
}
@inproceedings{Liu_2019_CVPR,
author = {Shikun Liu and
Edward Johns and
Andrew J. Davison},
title = {End-To-End Multi-Task Learning With Attention},
publisher = {{IEEE} Conference on Computer Vision and Pattern Recognition, {CVPR}
2019, Long Beach, CA, USA, June 16-20, 2019},
publisher = {IEEE Conference on Computer Vision and Pattern Recognition},
pages = {1871--1880},
//publisher = {Computer Vision Foundation / {IEEE}},
year = {2019},
year = {2019}
}
@inproceedings{DBLP:journals/corr/abs-1811-00498,
author = {Ra{\'{u}}l V{\'{a}}zquez and
......@@ -4672,11 +4616,9 @@ author = {Yoshua Bengio and
J{\"{o}}rg Tiedemann and
Mathias Creutz},
title = {Multilingual {NMT} with a Language-Independent Attention Bridge},
publisher = {Proceedings of the 4th Workshop on Representation Learning for NLP,
RepL4NLP@ACL 2019, Florence, Italy, August 2, 2019},
publisher = {Annual Meeting of the Association for Computational Linguistics},
pages = {33--39},
//publisher = {Association for Computational Linguistics},
year = {2019},
year = {2019}
}
@inproceedings{MoradiInterrogating,
author = {Pooya Moradi and
......@@ -4684,11 +4626,9 @@ author = {Yoshua Bengio and
Anoop Sarkar},
title = {Interrogating the Explanatory Power of Attention in Neural Machine
Translation},
publisher = {Proceedings of the 3rd Workshop on Neural Generation and Translation@EMNLP-IJCNLP
2019, Hong Kong, November 4, 2019},
publisher = {Conference on Empirical Methods in Natural Language Processing},
pages = {221--230},
//publisher = {Association for Computational Linguistics},
year = {2019},
year = {2019}
}
@inproceedings{WangNeural,
author = {Xing Wang and
......@@ -4698,11 +4638,9 @@ author = {Yoshua Bengio and
Deyi Xiong and
Min Zhang},
title = {Neural Machine Translation Advised by Statistical Machine Translation},
publisher = {Proceedings of the Thirty-First {AAAI} Conference on Artificial Intelligence,
February 4-9, 2017, San Francisco, California, {USA}},
publisher = {AAAI Conference on Artificial Intelligence},
pages = {3330--3336},
//publisher = {{AAAI} Press},
year = {2017},
year = {2017}
}
@inproceedings{Xiao2019SharingAW,
author = {Tong Xiao and
......@@ -4711,12 +4649,9 @@ author = {Yoshua Bengio and
Zhengtao Yu and
Tongran Liu},
title = {Sharing Attention Weights for Fast Transformer},
publisher = {Proceedings of the Twenty-Eighth International Joint Conference on
Artificial Intelligence, {IJCAI} 2019, Macao, China, August 10-16,
2019},
publisher = {International Joint Conference on Artificial Intelligence},
pages = {5292--5298},
//publisher = {ijcai.org},
year = {2019},
year = {2019}
}
@inproceedings{Yang2017TowardsBH,
author = {Baosong Yang and
......@@ -4726,36 +4661,27 @@ author = {Yoshua Bengio and
Jingbo Zhu},
title = {Towards Bidirectional Hierarchical Representations for Attention-based
Neural Machine Translation},
publisher = {Conference on Empirical Methods in Natural Language Processing},
pages = {1432--1441},
year = {2017}
}
@inproceedings{Wang2019TreeTI,
author = {Yau-Shian Wang and
Hung-yi Lee and
Yun-Nung Chen},
title = {Tree Transformer: Integrating Tree Structures into Self-Attention},
publisher = {Conference on Empirical Methods in Natural Language Processing},
pages = {1061--1070},
year = {2019}
}
@inproceedings{DBLP:journals/corr/abs-1809-01854,
author = {Jetic Gu and
Hassan S. Shavarani and
Anoop Sarkar},
title = {Top-down Tree Structured Decoding with Syntactic Connections for Neural Machine Translation and Parsing},
publisher = {Conference on Empirical Methods in Natural Language Processing},
pages = {401--413},
year = {2018}
}
@inproceedings{DBLP:journals/corr/abs-1808-09374,
author = {Xinyi Wang and
Pengcheng Yin and
Graham Neubig},
title = {A Tree-based Decoder for Neural Machine Translation},
publisher = {Conference on Empirical Methods in Natural Language Processing},
pages = {4772--4777},
year = {2018}
}
@article{DBLP:journals/corr/ZhangZ16c,
author = {Jiajun Zhang and
title = {Bridging Neural Machine Translation and Bilingual Dictionaries},
journal = {CoRR},
volume = {abs/1610.07272},
year = {2016}
}
@article{Dai2019TransformerXLAL,
author = {Zihang Dai and
title = {Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context},
journal = {CoRR},
volume = {abs/1901.02860},
year = {2019}
}
@inproceedings{li-etal-2019-word,
author = {Xintong Li and
Max Meng and
Shuming Shi},
title = {On the Word Alignment from Neural Machine Translation},
publisher = {Annual Meeting of the Association for Computational Linguistics},
pages = {1293--1303},
year = {2019}
}
@inproceedings{Werlen2018DocumentLevelNM,
James Henderson},
title = {Document-Level Neural Machine Translation with Hierarchical Attention
Networks},
publisher = {Conference on Empirical Methods in Natural Language Processing},
pages = {2947--2954},
year = {2018}
}
@inproceedings{DBLP:journals/corr/abs-1805-10163,
author = {Elena Voita and
Rico Sennrich and
Ivan Titov},
title = {Context-Aware Neural Machine Translation Learns Anaphora Resolution},
publisher = {Annual Meeting of the Association for Computational Linguistics},
pages = {1264--1274},
year = {2018}
}
@article{DBLP:journals/corr/abs-1906-00532,
author = {Aishwarya Bhandare and
Translation Model},
journal = {CoRR},
volume = {abs/1906.00532},
year = {2019}
}
@inproceedings{Zhang2018SpeedingUN,
Lei Shen and
Qun Liu},
title = {Speeding Up Neural Machine Translation Decoding by Cube Pruning},
publisher = {Conference on Empirical Methods in Natural Language Processing},
pages = {4284--4294},
year = {2018}
}
@inproceedings{DBLP:journals/corr/SeeLM16,
author = {Abigail See and
Minh-Thang Luong and
Christopher D. Manning},
title = {Compression of Neural Machine Translation Models via Pruning},
publisher = {Conference on Computational Natural Language Learning},
pages = {291--301},
year = {2016}
}
@inproceedings{DBLP:journals/corr/ChenLCL17,
author = {Yun Chen and
Yong Cheng and
Victor O. K. Li},
title = {A Teacher-Student Framework for Zero-Resource Neural Machine Translation},
pages = {1925--1935},
publisher = {Annual Meeting of the Association for Computational Linguistics},
year = {2017}
}
@article{Hinton2015Distilling,
author = {Geoffrey E. Hinton and
title = {Distilling the Knowledge in a Neural Network},
journal = {CoRR},
volume = {abs/1503.02531},
year = {2015}
}
@inproceedings{Ott2018ScalingNM,
title={Scaling Neural Machine Translation},
author={Myle Ott and Sergey Edunov and David Grangier and M. Auli},
publisher={Annual Meeting of the Association for Computational Linguistics},
year={2018}
}
@inproceedings{Lin2020TowardsF8,
Alexander M. Rush},
title = {Sequence-Level Knowledge Distillation},
pages = {1317--1327},
publisher = {Annual Meeting of the Association for Computational Linguistics},
year = {2016}
}
@article{Akaike1969autoregressive,
title = {The Best of Both Worlds: Combining Recent Advances in Neural Machine
Translation},
pages = {76--86},
publisher = {Annual Meeting of the Association for Computational Linguistics},
year = {2018}
}
@inproceedings{He2018LayerWiseCB,
title={Layer-Wise Coordination between Encoder and Decoder for Neural Machine Translation},
author={Tianyu He and X. Tan and Yingce Xia and D. He and T. Qin and Zhibo Chen and T. Liu},
publisher={Conference on Neural Information Processing Systems},
year={2018}
}
@inproceedings{cho-etal-2014-properties,
Yoshua Bengio},
title = {On the Properties of Neural Machine Translation: Encoder-Decoder Approaches},
pages = {103--111},
publisher = {Annual Meeting of the Association for Computational Linguistics},
year = {2014}
}
Yoshua Bengio},
title = {On Using Very Large Target Vocabulary for Neural Machine Translation},
pages = {1--10},
publisher = {Annual Meeting of the Association for Computational Linguistics},
year = {2015}
}
Hieu Pham and
Christopher D. Manning},
title = {Effective Approaches to Attention-based Neural Machine Translation},
publisher = {Conference on Empirical Methods in Natural Language Processing},
pages = {1412--1421},
year = {2015}
}
Haifeng Wang},
title = {Improved Neural Machine Translation with {SMT} Features},
pages = {151--157},
publisher = {AAAI Conference on Artificial Intelligence},
year = {2016}
}
@inproceedings{zhang-etal-2017-prior,
Xu, Jingfang and
Sun, Maosong},
year = {2017},
publisher = {Annual Meeting of the Association for Computational Linguistics},
pages = {1514--1523},
}
title = {Bilingual Dictionary Based Neural Machine Translation without Using
Parallel Sentences},
pages = {1570--1579},
publisher = {Annual Meeting of the Association for Computational Linguistics},
year = {2020}
}
Deyi Xiong},
title = {Encoding Gated Translation Memory into Neural Machine Translation},
pages = {3042--3047},
publisher = {Annual Meeting of the Association for Computational Linguistics},
year = {2018}
}
@inproceedings{yang-etal-2016-hierarchical,
Eduard H. Hovy},
title = {Hierarchical Attention Networks for Document Classification},
pages = {1480--1489},
publisher = {Annual Meeting of the Association for Computational Linguistics},
year = {2016}
}
%%%%% chapter 10------------------------------------------------------
Douwe Kiela},
title = {Code-Switched Named Entity Recognition with Embedding Attention},
pages = {154--158},
publisher = {Annual Meeting of the Association for Computational Linguistics},
year = {2018}
}
title = {Leveraging Linguistic Structures for Named Entity Recognition with
Bidirectional Recursive Neural Networks},
pages = {2664--2669},
publisher = {Annual Meeting of the Association for Computational Linguistics},
year = {2017}
}
author = {Xuezhe Ma and
Eduard H. Hovy},
title = {End-to-end Sequence Labeling via Bi-directional LSTM-CNNs-CRF},
publisher = {Annual Meeting of the Association for Computational Linguistics},
year = {2016}
}
Andrew McCallum},
title = {Fast and Accurate Entity Recognition with Iterated Dilated Convolutions},
pages = {2670--2680},
publisher = {Annual Meeting of the Association for Computational Linguistics},
year = {2017}
}
year = {2017}
}
@article{2011Natural,
title={Natural Language Processing (Almost) from Scratch},
author={Collobert, Ronan and Weston, Jason and Bottou, L{\'{e}}on and Karlen, Michael and Kavukcuoglu, Koray and Kuksa, Pavel},
journal={Journal of Machine Learning Research},
volume={12},
number={1},
pages={2493--2537},
year={2011}
}
@inproceedings{DBLP:conf/acl/NguyenG15,
author = {Thien Huu Nguyen and
Ralph Grishman},
title = {Event Detection and Domain Adaptation with Convolutional Neural Networks},
pages = {365--371},
publisher = {Annual Meeting of the Association for Computational Linguistics},
year = {2015}
}
Jun Zhao},
title = {Recurrent Convolutional Neural Networks for Text Classification},
pages = {2267--2273},
publisher = {AAAI Conference on Artificial Intelligence},
year = {2015}
}
Jun Zhao},
title = {Event Extraction via Dynamic Multi-Pooling Convolutional Neural Networks},
pages = {167--176},
publisher = {Annual Meeting of the Association for Computational Linguistics},
year = {2015}
}
Tommi S. Jaakkola},
title = {Molding CNNs for text: non-linear, non-consecutive convolutions},
pages = {1565--1575},
publisher = {Annual Meeting of the Association for Computational Linguistics},
year = {2015}
}
title = {Effective Use of Word Order for Text Categorization with Convolutional
Neural Networks},
pages = {103--112},
publisher = {Annual Meeting of the Association for Computational Linguistics},
year = {2015}
}
Ralph Grishman},
title = {Relation Extraction: Perspective from Convolutional Neural Networks},
pages = {39--48},
publisher = {Annual Meeting of the Association for Computational Linguistics},
year = {2015}
}
@article{StahlbergNeural,
title={Neural Machine Translation: A Review},
author={Felix Stahlberg},
journal={Journal of Artificial Intelligence Research},
year={2020},
volume={69},
pages={343-418}
@article{Waibel1989PhonemeRU,
title={Phoneme recognition using time-delay neural networks},
author={Alexander H. Waibel and Toshiyuki Hanazawa and Geoffrey E. Hinton and K. Shikano and K. Lang},
journal={IEEE Transactions on Acoustics, Speech, and Signal Processing},
year={1989},
volume={37},
pages={328-339}
pages={541-551}
}
@article{726791,
author={Y. {Lecun} and L. {Bottou} and Y. {Bengio} and P. {Haffner}},
journal={Proceedings of the IEEE},
title={Gradient-based learning applied to document recognition},
volume={86},
number={11},
pages={2278-2324},
}
@inproceedings{DBLP:journals/corr/HeZRS15,
@article{Girshick2015FastR,
title={Fast R-CNN},
author={Ross B. Girshick},
journal={International Conference on Computer Vision},
year={2015},
pages={1440-1448}
}
@inproceedings{Kalchbrenner2014ACN,
title={A Convolutional Neural Network for Modelling Sentences},
author={Nal Kalchbrenner and Edward Grefenstette and P. Blunsom},
publisher={Annual Meeting of the Association for Computational Linguistics},
pages={655--665},
year={2014}
}
@inproceedings{Kim2014ConvolutionalNN,
title={Convolutional Neural Networks for Sentence Classification},
author={Yoon Kim},
publisher={Conference on Empirical Methods in Natural Language Processing},
pages = {1746--1751},
year={2014}
}
Bowen Zhou and
Bing Xiang},
pages = {174--179},
publisher={Annual Meeting of the Association for Computational Linguistics},
year={2015}
}
author = {C{\'{\i}}cero Nogueira dos Santos and
Maira Gatti},
pages = {69--78},
publisher = {Annual Meeting of the Association for Computational Linguistics},
year={2014}
}
Angela Fan and
Michael Auli and
David Grangier},
publisher={International Conference on Machine Learning},
volume = {70},
pages = {933--941},
year={2017}
Michael Auli and
David Grangier and
Yann N. Dauphin},
publisher={Annual Meeting of the Association for Computational Linguistics},
pages = {123--135},
year={2017}
}
author = {Lukasz Kaiser and
Aidan N. Gomez and
Fran{\c{c}}ois Chollet},
journal = {International Conference on Learning Representations},
year={2018},
}
Yann N. Dauphin and
Michael Auli},
title = {Pay Less Attention with Lightweight and Dynamic Convolutions},
publisher = {International Conference on Learning Representations},
year = {2019},
}
Shaoqing Ren and
Jian Sun},
title = {Deep Residual Learning for Image Recognition},
publisher = {IEEE Conference on Computer Vision and Pattern Recognition},
pages = {770--778},
year = {2016},
}
Arthur Szlam and
Jason Weston and
Rob Fergus},
publisher={Conference on Neural Information Processing Systems},
pages = {2440--2448},
year={2015}
}
@inproceedings{Islam2020HowMP,
author = {Md. Amirul Islam and
Sen Jia and
Neil D. B. Bruce},
title = {How much Position Information Do Convolutional Neural Networks Encode?},
publisher = {International Conference on Learning Representations},
year = {2020},
}
@inproceedings{Sutskever2013OnTI,
title={On the importance of initialization and momentum in deep learning},
author = {Ilya Sutskever and
James Martens and
George E. Dahl and
Geoffrey E. Hinton},
publisher = {International Conference on Machine Learning},
pages = {1139--1147},
year={2013}
}
@article{Bengio2013AdvancesIO,
title={Advances in optimizing recurrent networks},
author={Yoshua Bengio and Nicolas Boulanger-Lewandowski and Razvan Pascanu},
journal={IEEE International Conference on Acoustics, Speech and Signal Processing},
year={2013},
pages={8624-8628}
}
@article{Chollet2017XceptionDL,
title={Xception: Deep Learning with Depthwise Separable Convolutions},
author = {Fran{\c{c}}ois Chollet},
journal={IEEE Conference on Computer Vision and Pattern Recognition},
year={2017},
pages={1800-1807}
}
title={Rotation, Scaling and Deformation Invariant Scattering for Texture Discrimination},
author = {Laurent Sifre and
St{\'{e}}phane Mallat},
journal={IEEE Conference on Computer Vision and Pattern Recognition},
year={2013},
pages={1233-1240}
}
@article{Taigman2014DeepFaceCT,
title={DeepFace: Closing the Gap to Human-Level Performance in Face Verification},
author={Yaniv Taigman and Ming Yang and Marc'Aurelio Ranzato and Lior Wolf},
journal={IEEE Conference on Computer Vision and Pattern Recognition},
year={2014},
pages={1701-1708}
}
Mirk{\'{o}} Visontai and
Raziel Alvarez and
Carolina Parada},
publisher={Conference of the International Speech Communication Association},
pages = {1136--1140},
year={2015}
}
Dongdong Chen and
Lu Yuan and
Zicheng Liu},
journal = {IEEE Conference on Computer Vision and Pattern Recognition},
year={2020},
pages={11027-11036}
}
Chloe Hillier and
Timothy P. Lillicrap},
title = {Compressive Transformers for Long-Range Sequence Modelling},
publisher = {International Conference on Learning Representations},
year = {2020}
}
Yujun Lin and
Song Han},
title = {Lite Transformer with Long-Short Range Attention},
publisher = {International Conference on Learning Representations},
year = {2020}
}
title = {Analyzing Multi-Head Self-Attention: Specialized Heads Do the Heavy
Lifting, the Rest Can Be Pruned},
pages = {5797--5808},
publisher = {Annual Meeting of the Association for Computational Linguistics},
year = {2019},
}
Bowen Zhou and
Yoshua Bengio},
title = {A Structured Self-Attentive Sentence Embedding},
publisher = {International Conference on Learning Representations},
year = {2017},
}
@inproceedings{Shaw2018SelfAttentionWR,
Jakob Uszkoreit and
Ashish Vaswani},
title = {Self-Attention with Relative Position Representations},
publisher = {Proceedings of the Human Language Technology Conference of
the North American Chapter of the Association for Computational Linguistics},
pages = {464--468},
year = {2018},
}
Shaoqing Ren and
Jian Sun},
title = {Deep Residual Learning for Image Recognition},
publisher = {IEEE Conference on Computer Vision and Pattern Recognition},
pages = {770--778},
year = {2016},
}
Jonathon Shlens and
Zbigniew Wojna},
title = {Rethinking the Inception Architecture for Computer Vision},
publisher = {IEEE Conference on Computer Vision and Pattern Recognition},
pages = {2818--2826},
year = {2016},
}
Deyi Xiong and
Jinsong Su},
title = {Accelerating Neural Transformer via an Average Attention Network},
publisher = {Annual Meeting of the Association for Computational Linguistics},
pages = {1789--1798},
year = {2018},
}
Yann N. Dauphin and
Michael Auli},
title = {Pay Less Attention with Lightweight and Dynamic Convolutions},
publisher = {International Conference on Learning Representations},
year = {2019},
}
Ruslan Salakhutdinov},
title = {Transformer-XL: Attentive Language Models beyond a Fixed-Length Context},
pages = {2978--2988},
publisher = {Annual Meeting of the Association for Computational Linguistics},
year = {2019}
}
@article{Liu2020LearningTE,
......@@ -5729,7 +5630,7 @@ author = {Yoshua Bengio and
Tong Zhang},
title = {Modeling Localness for Self-Attention Networks},
pages = {4449--4458},
publisher = {Annual Meeting of the Association for Computational Linguistics},
year = {2018}
}
@inproceedings{DBLP:journals/corr/abs-1904-03107,
......@@ -5740,7 +5641,7 @@ author = {Yoshua Bengio and
Zhaopeng Tu},
title = {Convolutional Self-Attention Networks},
pages = {4040--4045},
publisher = {Annual Meeting of the Association for Computational Linguistics},
year = {2019},
}
@article{Wang2018MultilayerRF,
......@@ -5759,7 +5660,7 @@ author = {Yoshua Bengio and
title = {Training Deeper Neural Machine Translation Models with Transparent
Attention},
pages = {3028--3033},
publisher = {Annual Meeting of the Association for Computational Linguistics},
year = {2018}
}
@inproceedings{Dou2018ExploitingDR,
......@@ -5770,7 +5671,7 @@ author = {Yoshua Bengio and
Tong Zhang},
title = {Exploiting Deep Representations for Neural Machine Translation},
pages = {4253--4262},
publisher = {Annual Meeting of the Association for Computational Linguistics},
year = {2018}
}
@inproceedings{Wang2019ExploitingSC,
......@@ -5789,13 +5690,13 @@ author = {Yoshua Bengio and
Tong Zhang},
title = {Dynamic Layer Aggregation for Neural Machine Translation with Routing-by-Agreement},
pages = {86--93},
publisher = {AAAI Conference on Artificial Intelligence},
year = {2019}
}
@inproceedings{Wei2020MultiscaleCD,
title={Multiscale Collaborative Deep Models for Neural Machine Translation},
author={Xiangpeng Wei and Heng Yu and Yue Hu and Yue Zhang and Rongxiang Weng and Weihua Luo},
publisher={Annual Meeting of the Association for Computational Linguistics},
year={2020}
}
Lukasz Kaiser and
Anselm Levskaya},
title = {Reformer: The Efficient Transformer},
journal = {International Conference on Learning Representations},
year = {2020}
}
@article{li2020shallow,
title={Shallow-to-Deep Training for Neural Machine Translation},
author={Li, Bei and Wang, Ziyang and Liu, Hui and Jiang, Yufan and Du, Quan and Xiao, Tong and Wang, Huizhen and Zhu, Jingbo},
journal={Conference on Empirical Methods in Natural Language Processing},
year={2020}
}
%%%%% chapter 12------------------------------------------------------
journal = {Computer Science},
year = {2015},
}
@phdthesis{黄书剑0统计机器翻译中的词对齐研究,
title={统计机器翻译中的词对齐研究},
author={黄书剑},
publisher={南京大学},
year={2012}
}
@article{DBLP:journals/corr/MikolovLS13,
author = {Tomas Mikolov and
Quoc V. Le and
Ilya Sutskever},
title = {Exploiting Similarities among Languages for Machine Translation},
journal = {CoRR},
volume = {abs/1309.4168},
year = {2013}
}
@inproceedings{DBLP:conf/acl/VulicK16,
author = {Ivan Vulic and
Anna Korhonen},
title = {On the Role of Seed Lexicons in Learning Bilingual Word Embeddings},
publisher = {Association for Computational Linguistics},
year = {2016}
}
@inproceedings{DBLP:conf/iclr/SmithTHH17,
author = {Samuel L. Smith and
David H. P. Turban and
Steven Hamblin and
Nils Y. Hammerla},
title = {Offline bilingual word vectors, orthogonal transformations and the
inverted softmax},
publisher = {International Conference on Learning Representations},
year = {2017}
}
@inproceedings{DBLP:conf/acl/ArtetxeLA17,
author = {Mikel Artetxe and
Gorka Labaka and
Eneko Agirre},
title = {Learning bilingual word embeddings with (almost) no bilingual data},
pages = {451--462},
publisher = {Association for Computational Linguistics},
year = {2017}
}
@article{1966ASchnemann,
title={A generalized solution of the orthogonal procrustes problem},
author={Sch{\"{o}}nemann, Peter H.},
journal={Psychometrika},
volume={31},
number={1},
pages={1-10},
year={1966},
}
@inproceedings{DBLP:conf/iclr/LampleCRDJ18,
author = {Guillaume Lample and
Alexis Conneau and
Marc'Aurelio Ranzato and
Ludovic Denoyer and
Herv{\'{e}} J{\'{e}}gou},
title = {Word translation without parallel data},
publisher = {International Conference on Learning Representations},
year = {2018}
}
@inproceedings{DBLP:conf/acl/ZhangLLS17,
author = {Meng Zhang and
Yang Liu and
Huanbo Luan and
Maosong Sun},
title = {Adversarial Training for Unsupervised Bilingual Lexicon Induction},
pages = {1959--1970},
publisher = {Association for Computational Linguistics},
year = {2017}
}
@inproceedings{DBLP:conf/emnlp/XuYOW18,
author = {Ruochen Xu and
Yiming Yang and
Naoki Otani and
Yuexin Wu},
title = {Unsupervised Cross-lingual Transfer of Word Embedding Spaces},
pages = {2465--2474},
publisher = {Association for Computational Linguistics},
year = {2018}
}
@inproceedings{DBLP:conf/emnlp/Alvarez-MelisJ18,
author = {David Alvarez-Melis and
Tommi S. Jaakkola},
title = {Gromov-Wasserstein Alignment of Word Embedding Spaces},
pages = {1881--1890},
publisher = {Association for Computational Linguistics},
year = {2018}
}
@inproceedings{DBLP:conf/lrec/GarneauGBDL20,
author = {Nicolas Garneau and
Mathieu Godbout and
David Beauchemin and
Audrey Durand and
Luc Lamontagne},
title = {A Robust Self-Learning Method for Fully Unsupervised Cross-Lingual
Mappings of Word Embeddings: Making the Method Robustly Reproducible
as Well},
pages = {5546--5554},
publisher = {European Language Resources Association},
year = {2020}
}
@inproceedings{DBLP:conf/naacl/XingWLL15,
author = {Chao Xing and
Dong Wang and
Chao Liu and
Yiye Lin},
title = {Normalized Word Embedding and Orthogonal Transform for Bilingual Word
Translation},
pages = {1006--1011},
publisher = {Association for Computational Linguistics},
year = {2015}
}
@inproceedings{DBLP:conf/emnlp/VulicGRK19,
author = {Ivan Vulic and
Goran Glavas and
Roi Reichart and
Anna Korhonen},
title = {Do We Really Need Fully Unsupervised Cross-Lingual Embeddings?},
pages = {4406--4417},
publisher = {Association for Computational Linguistics},
year = {2019}
}
@inproceedings{DBLP:conf/acl/SogaardVR18,
author = {Anders S{\o}gaard and
Sebastian Ruder and
Ivan Vulic},
title = {On the Limitations of Unsupervised Bilingual Dictionary Induction},
pages = {778--788},
publisher = {Association for Computational Linguistics},
year = {2018}
}
@article{DBLP:journals/talip/MarieF20,
author = {Benjamin Marie and
Atsushi Fujita},
title = {Iterative Training of Unsupervised Neural and Statistical Machine
Translation Systems},
journal = {{ACM} Trans. Asian Low Resour. Lang. Inf. Process.},
volume = {19},
number = {5},
pages = {68:1--68:21},
year = {2020}
}
@inproceedings{DBLP:conf/acl/ArtetxeLA19,
author = {Mikel Artetxe and
Gorka Labaka and
Eneko Agirre},
title = {An Effective Approach to Unsupervised Machine Translation},
pages = {194--203},
publisher = {Association for Computational Linguistics},
year = {2019}
}
@inproceedings{DBLP:conf/acl/PourdamghaniAGK19,
author = {Nima Pourdamghani and
Nada Aldarrab and
Marjan Ghazvininejad and
Kevin Knight and
Jonathan May},
title = {Translating Translationese: {A} Two-Step Approach to Unsupervised
Machine Translation},
pages = {3057--3062},
publisher = {Association for Computational Linguistics},
year = {2019}
}
@inproceedings{DBLP:conf/iclr/LampleCDR18,
author = {Guillaume Lample and
Alexis Conneau and
Ludovic Denoyer and
Marc'Aurelio Ranzato},
title = {Unsupervised Machine Translation Using Monolingual Corpora Only},
publisher = {International Conference on Learning Representations},
year = {2018}
}
@inproceedings{DBLP:conf/nips/ConneauL19,
author = {Alexis Conneau and
Guillaume Lample},
title = {Cross-lingual Language Model Pretraining},
pages = {7057--7067},
publisher = {Conference on Neural Information Processing Systems},
year = {2019}
}
@article{DBLP:journals/ipm/FarhanTAJATT20,
author = {Wael Farhan and
Bashar Talafha and
Analle Abuammar and
Ruba Jaikat and
Mahmoud Al-Ayyoub and
Ahmad Bisher Tarakji and
Anas Toma},
title = {Unsupervised dialectal neural machine translation},
journal = {Information Processing \& Management},
volume = {57},
number = {3},
pages = {102181},
year = {2020}
}
@inproceedings{A2020Li,
title={A Simple and Effective Approach to Robust Unsupervised Bilingual Dictionary Induction},
author={Yanyang Li and Yingfeng Luo and Ye Lin and Quan Du and Huizhen Wang and Shujian Huang and Tong Xiao and Jingbo Zhu},
publisher={International Conference on Computational Linguistics},
year={2020}
}
@inproceedings{2018When,
title={When and Why are Pre-trained Word Embeddings Useful for Neural Machine Translation?},
author={Qi, Ye and Sachan, Devendra Singh and Felix, Matthieu and Padmanabhan, Sarguna Janani and Neubig, Graham},
publisher={Annual Conference of the North American Chapter of the Association for Computational Linguistics},
year={2018}
}
@inproceedings{DBLP:conf/emnlp/ClinchantJN19,
author = {St{\'{e}}phane Clinchant and
Kweon Woo Jung and
Vassilina Nikoulina},
title = {On the use of {BERT} for Neural Machine Translation},
pages = {108--117},
publisher = {Annual Meeting of the Association for Computational Linguistics},
year = {2019}
}
@inproceedings{DBLP:conf/emnlp/ImamuraS19,
author = {Kenji Imamura and
Eiichiro Sumita},
title = {Recycling a Pre-trained {BERT} Encoder for Neural Machine Translation},
booktitle = {Proceedings of the 3rd Workshop on Neural Generation and Translation@EMNLP-IJCNLP
2019, Hong Kong, November 4, 2019},
pages = {23--31},
publisher = {Annual Meeting of the Association for Computational Linguistics},
year = {2019}
}
@inproceedings{DBLP:conf/aaai/YangW0Z00020,
author = {Jiacheng Yang and
Mingxuan Wang and
Hao Zhou and
Chengqi Zhao and
Weinan Zhang and
Yong Yu and
Lei Li},
title = {Towards Making the Most of {BERT} in Neural Machine Translation},
pages = {9378--9385},
publisher = {AAAI Conference on Artificial Intelligence},
year = {2020}
}
@inproceedings{DBLP:conf/aaai/WengYHCL20,
author = {Rongxiang Weng and
Heng Yu and
Shujian Huang and
Shanbo Cheng and
Weihua Luo},
title = {Acquiring Knowledge from Pre-Trained Model to Neural Machine Translation},
pages = {9266--9273},
publisher = {AAAI Conference on Artificial Intelligence},
year = {2020}
}
@article{DBLP:journals/corr/abs-2001-08210,
author = {Yinhan Liu and
Jiatao Gu and
Naman Goyal and
Xian Li and
Sergey Edunov and
Marjan Ghazvininejad and
Mike Lewis and
Luke Zettlemoyer},
title = {Multilingual Denoising Pre-training for Neural Machine Translation},
journal = {CoRR},
volume = {abs/2001.08210},
year = {2020}
}
@inproceedings{DBLP:conf/aaai/JiZDZCL20,
author = {Baijun Ji and
Zhirui Zhang and
Xiangyu Duan and
Min Zhang and
Boxing Chen and
Weihua Luo},
title = {Cross-Lingual Pre-Training Based Transfer for Zero-Shot Neural Machine
Translation},
pages = {115--122},
publisher = {AAAI Conference on Artificial Intelligence},
year = {2020}
}
@inproceedings{DBLP:conf/acl/LewisLGGMLSZ20,
author = {Mike Lewis and
Yinhan Liu and
Naman Goyal and
Marjan Ghazvininejad and
Abdelrahman Mohamed and
Omer Levy and
Veselin Stoyanov and
Luke Zettlemoyer},
title = {{BART:} Denoising Sequence-to-Sequence Pre-training for Natural Language
Generation, Translation, and Comprehension},
pages = {7871--7880},
publisher = {Annual Meeting of the Association for Computational Linguistics},
year = {2020}
}
@article{DBLP:journals/corr/abs-2009-08088,
author = {Zhen Yang and
Bojie Hu and
Ambyera Han and
Shen Huang and
Qi Ju},
title = {Code-switching pre-training for neural machine translation},
journal = {CoRR},
volume = {abs/2009.08088},
year = {2020}
}
@article{DBLP:journals/corr/abs-2010-09403,
author = {Dusan Varis and
Ondrej Bojar},
title = {Unsupervised Pretraining for Neural Machine Translation Using Elastic
Weight Consolidation},
journal = {CoRR},
volume = {abs/2010.09403},
year = {2020}
}
@inproceedings{DBLP:conf/emnlp/LampleOCDR18,
author = {Guillaume Lample and
Myle Ott and
Alexis Conneau and
Ludovic Denoyer and
Marc'Aurelio Ranzato},
title = {Phrase-Based {\&} Neural Unsupervised Machine Translation},
pages = {5039--5049},
publisher = {Annual Meeting of the Association for Computational Linguistics},
year = {2018}
}
@article{DBLP:journals/jbd/ShortenK19,
author = {Connor Shorten and
Taghi M. Khoshgoftaar},
title = {A survey on Image Data Augmentation for Deep Learning},
journal = {J. Big Data},
volume = {6},
pages = {60},
year = {2019}
}
@inproceedings{DBLP:conf/naacl/MohiuddinJ19,
author = {Tasnim Mohiuddin and
Shafiq R. Joty},
title = {Revisiting Adversarial Autoencoder for Unsupervised Word Translation
with Cycle Consistency and Improved Training},
pages = {3857--3867},
publisher = {Annual Meeting of the Association for Computational Linguistics},
year = {2019}
}
@inproceedings{DBLP:conf/acl/HuangQC19,
author = {Jiaji Huang and
Qiang Qiu and
Kenneth Church},
title = {Hubless Nearest Neighbor Search for Bilingual Lexicon Induction},
pages = {4072--4080},
publisher = {Annual Meeting of the Association for Computational Linguistics},
year = {2019}
}
@article{DBLP:journals/corr/abs-1811-01124,
author = {Jean Alaux and
Edouard Grave and
Marco Cuturi and
Armand Joulin},
title = {Unsupervised Hyperalignment for Multilingual Word Embeddings},
journal = {CoRR},
volume = {abs/1811.01124},
year = {2018}
}
@inproceedings{DBLP:conf/emnlp/DouZH18,
author = {Zi-Yi Dou and
Zhi-Hao Zhou and
Shujian Huang},
title = {Unsupervised Bilingual Lexicon Induction via Latent Variable Models},
pages = {621--626},
publisher = {Annual Meeting of the Association for Computational Linguistics},
year = {2018}
}
@inproceedings{DBLP:conf/emnlp/HoshenW18,
author = {Yedid Hoshen and
Lior Wolf},
title = {Non-Adversarial Unsupervised Word Translation},
pages = {469--478},
publisher = {Annual Meeting of the Association for Computational Linguistics},
year = {2018}
}
@inproceedings{DBLP:conf/emnlp/KimGN18,
author = {Yunsu Kim and
Jiahui Geng and
Hermann Ney},
title = {Improving Unsupervised Word-by-Word Translation with Language Model
and Denoising Autoencoder},
pages = {862--868},
publisher = {Annual Meeting of the Association for Computational Linguistics},
year = {2018}
}
@inproceedings{DBLP:conf/emnlp/MukherjeeYH18,
author = {Tanmoy Mukherjee and
Makoto Yamada and
Timothy M. Hospedales},
title = {Learning Unsupervised Word Translations Without Adversaries},
pages = {627--632},
publisher = {Annual Meeting of the Association for Computational Linguistics},
year = {2018}
}
@inproceedings{DBLP:conf/emnlp/JoulinBMJG18,
author = {Armand Joulin and
Piotr Bojanowski and
Tomas Mikolov and
Herv{\'{e}} J{\'{e}}gou and
Edouard Grave},
title = {Loss in Translation: Learning Bilingual Word Mapping with a Retrieval
Criterion},
pages = {2979--2984},
publisher = {Annual Meeting of the Association for Computational Linguistics},
year = {2018}
}
@inproceedings{DBLP:conf/emnlp/ChenC18,
author = {Xilun Chen and
Claire Cardie},
title = {Unsupervised Multilingual Word Embeddings},
pages = {261--270},
publisher = {Annual Meeting of the Association for Computational Linguistics},
year = {2018}
}
@inproceedings{DBLP:conf/emnlp/TaitelbaumCG19,
author = {Hagai Taitelbaum and
Gal Chechik and
Jacob Goldberger},
title = {Multilingual word translation using auxiliary languages},
pages = {1330--1335},
publisher = {Annual Meeting of the Association for Computational Linguistics},
year = {2019}
}
@inproceedings{DBLP:conf/acl/YangLCLS19,
author = {Pengcheng Yang and
Fuli Luo and
Peng Chen and
Tianyu Liu and
Xu Sun},
title = {{MAAM:} {A} Morphology-Aware Alignment Model for Unsupervised Bilingual
Lexicon Induction},
pages = {3190--3196},
publisher = {Annual Meeting of the Association for Computational Linguistics},
year = {2019}
}
@inproceedings{DBLP:conf/acl/OrmazabalALSA19,
author = {Aitor Ormazabal and
Mikel Artetxe and
Gorka Labaka and
Aitor Soroa and
Eneko Agirre},
title = {Analyzing the Limitations of Cross-lingual Word Embedding Mappings},
pages = {4990--4995},
publisher = {Annual Meeting of the Association for Computational Linguistics},
year = {2019}
}
@inproceedings{DBLP:conf/acl/ArtetxeLA19a,
author = {Mikel Artetxe and
Gorka Labaka and
Eneko Agirre},
title = {Bilingual Lexicon Induction through Unsupervised Machine Translation},
pages = {5002--5007},
publisher = {Annual Meeting of the Association for Computational Linguistics},
year = {2019}
}
@inproceedings{DBLP:conf/rep4nlp/VulicKG20,
author = {Ivan Vulic and
Anna Korhonen and
Goran Glavas},
title = {Improving Bilingual Lexicon Induction with Unsupervised Post-Processing
of Monolingual Word Vector Spaces},
pages = {45--54},
publisher = {Annual Meeting of the Association for Computational Linguistics},
year = {2020}
}
@article{hartmann2018empirical,
title={Empirical observations on the instability of aligning word vector spaces with GANs},
author={Hartmann, Mareike and Kementchedjhieva, Yova and S{\o}gaard, Anders},
year={2018}
}
@inproceedings{DBLP:conf/emnlp/Kementchedjhieva19,
author = {Yova Kementchedjhieva and
Mareike Hartmann and
Anders S{\o}gaard},
title = {Lost in Evaluation: Misleading Benchmarks for Bilingual Dictionary
Induction},
pages = {3334--3339},
publisher = {Annual Meeting of the Association for Computational Linguistics},
year = {2019}
}
@inproceedings{DBLP:conf/nips/HartmannKS19,
author = {Mareike Hartmann and
Yova Kementchedjhieva and
Anders S{\o}gaard},
title = {Comparing Unsupervised Word Translation Methods Step by Step},
pages = {6031--6041},
publisher = {Conference on Neural Information Processing Systems},
year = {2019}
}
@inproceedings{DBLP:conf/emnlp/HartmannKS18,
author = {Mareike Hartmann and
Yova Kementchedjhieva and
Anders S{\o}gaard},
title = {Why is unsupervised alignment of English embeddings from different
algorithms so hard?},
pages = {582--586},
publisher = {Annual Meeting of the Association for Computational Linguistics},
year = {2018}
}
@inproceedings{DBLP:conf/naacl/HeymanVVM19,
author = {Geert Heyman and
Bregt Verreet and
Ivan Vulic and
Marie-Francine Moens},
title = {Learning Unsupervised Multilingual Word Embeddings with Incremental
Multilingual Hubs},
pages = {1890--1902},
publisher = {Annual Meeting of the Association for Computational Linguistics},
year = {2019}
}
@article{2019ADabre,
title={A Survey of Multilingual Neural Machine Translation},
author={Dabre, Raj and Chu, Chenhui and Kunchukuttan, Anoop},
year={2019}
}
@inproceedings{DBLP:conf/naacl/ZophK16,
author = {Barret Zoph and
Kevin Knight},
title = {Multi-Source Neural Translation},
pages = {30--34},
publisher = {Annual Conference of the North American Chapter of the Association for Computational Linguistics},
year = {2016}
}
@inproceedings{DBLP:conf/naacl/FiratCB16,
author = {Orhan Firat and
Kyunghyun Cho and
Yoshua Bengio},
title = {Multi-Way, Multilingual Neural Machine Translation with a Shared Attention
Mechanism},
pages = {866--875},
publisher = {Annual Meeting of the Association for Computational Linguistics},
year = {2016}
}
@article{DBLP:journals/tacl/JohnsonSLKWCTVW17,
author = {Melvin Johnson and
Mike Schuster and
Quoc V. Le and
Maxim Krikun and
Yonghui Wu and
Zhifeng Chen and
Nikhil Thorat and
Fernanda B. Vi{\'{e}}gas and
Martin Wattenberg and
Greg Corrado and
Macduff Hughes and
Jeffrey Dean},
title = {Google's Multilingual Neural Machine Translation System: Enabling
Zero-Shot Translation},
journal = {Trans. Assoc. Comput. Linguistics},
volume = {5},
pages = {339--351},
year = {2017}
}
@inproceedings{DBLP:conf/emnlp/KimPPKN19,
author = {Yunsu Kim and
Petre Petrov and
Pavel Petrushkov and
Shahram Khadivi and
Hermann Ney},
title = {Pivot-based Transfer Learning for Neural Machine Translation between
Non-English Languages},
pages = {866--876},
publisher = {Association for Computational Linguistics},
year = {2019}
}
@inproceedings{DBLP:conf/acl/ChenLCL17,
author = {Yun Chen and
Yang Liu and
Yong Cheng and
Victor O. K. Li},
title = {A Teacher-Student Framework for Zero-Resource Neural Machine Translation},
pages = {1925--1935},
publisher = {Association for Computational Linguistics},
year = {2017}
}
@article{DBLP:journals/mt/WuW07,
author = {Hua Wu and
Haifeng Wang},
title = {Pivot language approach for phrase-based statistical machine translation},
journal = {Mach. Transl.},
volume = {21},
number = {3},
pages = {165--181},
year = {2007}
}
@inproceedings{Farsi2010somayeh,
author = {Somayeh Bakhshaei and
Shahram Khadivi and
Noushin Riahi},
title = {Farsi-German statistical machine translation through bridge language},
publisher = {International Telecommunications Symposium},
pages = {165--181},
year = {2010}
}
@inproceedings{DBLP:conf/acl/ZahabiBK13,
author = {Samira Tofighi Zahabi and
Somayeh Bakhshaei and
Shahram Khadivi},
title = {Using Context Vectors in Improving a Machine Translation System with
Bridge Language},
pages = {318--322},
publisher = {The Association for Computational Linguistics},
year = {2013}
}
@inproceedings{DBLP:conf/emnlp/ZhuHWZWZ14,
author = {Xiaoning Zhu and
Zhongjun He and
Hua Wu and
Conghui Zhu and
Haifeng Wang and
Tiejun Zhao},
title = {Improving Pivot-Based Statistical Machine Translation by Pivoting
the Co-occurrence Count of Phrase Pairs},
pages = {1665--1675},
publisher = {Association for Computational Linguistics},
year = {2014}
}
@inproceedings{DBLP:conf/acl/MiuraNSTN15,
author = {Akiva Miura and
Graham Neubig and
Sakriani Sakti and
Tomoki Toda and
Satoshi Nakamura},
title = {Improving Pivot Translation by Remembering the Pivot},
pages = {573--577},
publisher = {The Association for Computational Linguistics},
year = {2015}
}
@inproceedings{DBLP:conf/acl/CohnL07,
author = {Trevor Cohn and
Mirella Lapata},
title = {Machine Translation by Triangulation: Making Effective Use of Multi-Parallel
Corpora},
publisher = {The Association for Computational Linguistics},
year = {2007}
}
@inproceedings{DBLP:conf/acl/WuW09,
author = {Hua Wu and
Haifeng Wang},
title = {Revisiting Pivot Language Approach for Machine Translation},
pages = {154--162},
publisher = {The Association for Computational Linguistics},
year = {2009}
}
@article{DBLP:journals/corr/ChengLYSX16,
author = {Yong Cheng and
Yang Liu and
Qian Yang and
Maosong Sun and
Wei Xu},
title = {Neural Machine Translation with Pivot Languages},
journal = {CoRR},
volume = {abs/1611.04928},
year = {2016}
}
@inproceedings{DBLP:conf/interspeech/KauersVFW02,
author = {Manuel Kauers and
Stephan Vogel and
Christian F{\"{u}}gen and
Alex Waibel},
title = {Interlingua based statistical machine translation},
publisher = {International Speech Communication Association},
year = {2002}
}
@inproceedings{de2006catalan,
title={Catalan-English statistical machine translation without parallel corpus: bridging through Spanish},
author={De Gispert, Adri{\`a} and Marino, Jose B},
booktitle={Proc. of 5th International Conference on Language Resources and Evaluation (LREC)},
pages={65--68},
year={2006}
}
@inproceedings{DBLP:conf/naacl/UtiyamaI07,
author = {Masao Utiyama and
Hitoshi Isahara},
title = {A Comparison of Pivot Methods for Phrase-Based Statistical Machine
Translation},
pages = {484--491},
publisher = {The Association for Computational Linguistics},
year = {2007}
}
@inproceedings{DBLP:conf/ijcnlp/Costa-JussaHB11,
author = {Marta R. Costa-juss{\`{a}} and
Carlos A. Henr{\'{\i}}quez Q. and
Rafael E. Banchs},
title = {Enhancing scarce-resource language translation through pivot combinations},
pages = {1361--1365},
publisher = {The Association for Computational Linguistics},
year = {2011}
}
@article{DBLP:journals/corr/HintonVD15,
author = {Geoffrey E. Hinton and
Oriol Vinyals and
Jeffrey Dean},
title = {Distilling the Knowledge in a Neural Network},
journal = {CoRR},
volume = {abs/1503.02531},
year = {2015}
}
@article{gu2018meta,
title={Meta-learning for low-resource neural machine translation},
author={Gu, Jiatao and Wang, Yong and Chen, Yun and Cho, Kyunghyun and Li, Victor OK},
journal={arXiv preprint arXiv:1808.08437},
year={2018}
}
@inproceedings{DBLP:conf/naacl/GuHDL18,
author = {Jiatao Gu and
Hany Hassan and
Jacob Devlin and
Victor O. K. Li},
title = {Universal Neural Machine Translation for Extremely Low Resource Languages},
pages = {344--354},
publisher = {Association for Computational Linguistics},
year = {2018}
}
@inproceedings{DBLP:conf/icml/FinnAL17,
author = {Chelsea Finn and
Pieter Abbeel and
Sergey Levine},
title = {Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks},
series = {Proceedings of Machine Learning Research},
volume = {70},
pages = {1126--1135},
publisher = {International Conference on Machine Learning},
year = {2017}
}
@inproceedings{DBLP:conf/acl/DongWHYW15,
author = {Daxiang Dong and
Hua Wu and
Wei He and
Dianhai Yu and
Haifeng Wang},
title = {Multi-Task Learning for Multiple Language Translation},
pages = {1723--1732},
publisher = {The Association for Computational Linguistics},
year = {2015}
}
@article{DBLP:journals/tacl/LeeCH17,
author = {Jason Lee and
Kyunghyun Cho and
Thomas Hofmann},
title = {Fully Character-Level Neural Machine Translation without Explicit
Segmentation},
journal = {Trans. Assoc. Comput. Linguistics},
volume = {5},
pages = {365--378},
year = {2017}
}
@inproceedings{DBLP:conf/lrec/RiktersPK18,
author = {Matiss Rikters and
Marcis Pinnis and
Rihards Krislauks},
title = {Training and Adapting Multilingual {NMT} for Less-resourced and Morphologically
Rich Languages},
publisher = {European Language Resources Association},
year = {2018}
}
@article{DBLP:journals/tkde/PanY10,
author = {Sinno Jialin Pan and
Qiang Yang},
title = {A Survey on Transfer Learning},
journal = {{IEEE} Trans. Knowl. Data Eng.},
volume = {22},
number = {10},
pages = {1345--1359},
year = {2010}
}
@book{2009Handbook,
title={Handbook Of Research On Machine Learning Applications and Trends: Algorithms, Methods and Techniques - 2 Volumes},
author={Olivas, Emilio Soria and Guerrero, Jose David Martin and Sober, Marcelino Martinez and Benedito, Jose Rafael Magdalena and Lopez, Antonio Jose Serrano},
publisher={Information Science Reference - Imprint of: IGI Publishing},
year={2009}
}
@incollection{DBLP:books/crc/aggarwal14/Pan14,
author = {Sinno Jialin Pan},
title = {Transfer Learning},
booktitle = {Data Classification: Algorithms and Applications},
pages = {537--570},
publisher = {{CRC} Press},
year = {2014}
}
@inproceedings{DBLP:conf/iclr/TanRHQZL19,
author = {Xu Tan and
Yi Ren and
Di He and
Tao Qin and
Zhou Zhao and
Tie-Yan Liu},
title = {Multilingual Neural Machine Translation with Knowledge Distillation},
publisher = {OpenReview.net},
year = {2019}
}
@article{platanios2018contextual,
title={Contextual parameter generation for universal neural machine translation},
author={Platanios, Emmanouil Antonios and Sachan, Mrinmaya and Neubig, Graham and Mitchell, Tom},
journal={arXiv preprint arXiv:1808.08493},
year={2018}
}
@inproceedings{ji2020cross,
title={Cross-Lingual Pre-Training Based Transfer for Zero-Shot Neural Machine Translation},
author={Ji, Baijun and Zhang, Zhirui and Duan, Xiangyu and Zhang, Min and Chen, Boxing and Luo, Weihua},
booktitle={Proceedings of the AAAI Conference on Artificial Intelligence},
volume={34},
number={01},
pages={115--122},
year={2020}
}
@inproceedings{DBLP:conf/wmt/KocmiB18,
author = {Tom Kocmi and
Ondrej Bojar},
title = {Trivial Transfer Learning for Low-Resource Neural Machine Translation},
pages = {244--252},
publisher = {Association for Computational Linguistics},
year = {2018}
}
@inproceedings{DBLP:conf/acl/ZhangWTS20,
author = {Biao Zhang and
Philip Williams and
Ivan Titov and
Rico Sennrich},
title = {Improving Massively Multilingual Neural Machine Translation and Zero-Shot
Translation},
pages = {1628--1639},
publisher = {Association for Computational Linguistics},
year = {2020}
}
@inproceedings{DBLP:conf/naacl/PaulYSN09,
author = {Michael Paul and
Hirofumi Yamamoto and
Eiichiro Sumita and
Satoshi Nakamura},
title = {On the Importance of Pivot Language Selection for Statistical Machine
Translation},
pages = {221--224},
publisher = {The Association for Computational Linguistics},
year = {2009}
}
@article{dabre2019brief,
title={A Brief Survey of Multilingual Neural Machine Translation},
author={Dabre, Raj and Chu, Chenhui and Kunchukuttan, Anoop},
journal={arXiv preprint arXiv:1905.05395},
year={2019}
}
@article{dabre2020survey,
title={A survey of multilingual neural machine translation},
author={Dabre, Raj and Chu, Chenhui and Kunchukuttan, Anoop},
journal={ACM Computing Surveys (CSUR)},
volume={53},
number={5},
pages={1--38},
year={2020}
}
@article{DBLP:journals/corr/MikolovLS13,
author = {Tomas Mikolov and
Quoc V. Le and
Ilya Sutskever},
title = {Exploiting Similarities among Languages for Machine Translation},
journal = {CoRR},
volume = {abs/1309.4168},
year = {2013}
}
%%%%% chapter 16------------------------------------------------------
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
@@ -139,14 +139,14 @@
%\include{Chapter6/chapter6}
%\include{Chapter7/chapter7}
%\include{Chapter8/chapter8}
\include{Chapter9/chapter9}
\include{Chapter10/chapter10}
\include{Chapter11/chapter11}
\include{Chapter12/chapter12}
%\include{Chapter9/chapter9}
%\include{Chapter10/chapter10}
%\include{Chapter11/chapter11}
%\include{Chapter12/chapter12}
%\include{Chapter13/chapter13}
%\include{Chapter14/chapter14}
%\include{Chapter15/chapter15}
%\include{Chapter16/chapter16}
\include{Chapter16/chapter16}
%\include{Chapter17/chapter17}
%\include{Chapter18/chapter18}
%\include{ChapterAppend/chapterappend}