update 17

Commit bb5e660a authored Dec 23, 2020 by 曹润柘
--- a/Chapter17/Figures/figure-application-of-multimodal-machine-translation-to-multitask-learning.tex
+++ b/Chapter17/Figures/figure-application-of-multimodal-machine-translation-to-multitask-learning.tex
+\tikzstyle{coder} = [rectangle,thick,rounded corners,minimum width=2.8cm,minimum height=1.3cm,text centered,draw=black,fill=red!25]
+\begin{tikzpicture}[node distance = 0,scale = 1]
+\tikzstyle{every node}=[scale=1]
+\node(x)[]{\LARGE x};
+\node(encoder)[coder, above of = x,yshift=2cm]{\Large{编码器}};
+\node(decoder_left)[coder, above of = encoder, yshift=2.5cm,fill=blue!25]{\Large{解码器}};
+\node(y_hat)[above of = decoder_left, yshift=2cm]{\LARGE{$\rm y'$}};
+\node(y)[above of = decoder_left, xshift=-3cm]{\LARGE{$\rm y$}};
+\node(decoder_right)[coder, above of = encoder, xshift=5cm,fill=yellow!25]{\Large{解码器}};
+
+\node(figure)[draw=white,above of = decoder_right,yshift=2.5cm,scale=0.25] {\includegraphics[width=0.62\textwidth]{./Chapter17/Figures/figure-bank-without-attention.png}};
+
+\draw[->,very thick](x)to(encoder);
+\draw[->,very thick](encoder)to(decoder_left)node[right,xshift=-0.1cm,yshift=-1.25cm,scale=1.1]{翻译};
+\draw[->,very thick](decoder_left)to(y_hat);
+\draw[->,very thick](y)to(decoder_left);
+\draw[->,very thick](encoder)to(decoder_right)node[right,xshift=-3.5cm,yshift=0.25cm,scale=1.1]{生成图片};
+\draw[->,very thick](decoder_right)to(figure);
+\end{tikzpicture}
\ No newline at end of file
--- a/Chapter17/Figures/figure-bank-with-attention.png
+++ b/Chapter17/Figures/figure-bank-with-attention.png
--- a/Chapter17/Figures/figure-bank-without-attention.png
+++ b/Chapter17/Figures/figure-bank-without-attention.png
--- a/Chapter17/Figures/figure-comparison-of-attention-mechanism-of-target-word-bank.tex
+++ b/Chapter17/Figures/figure-comparison-of-attention-mechanism-of-target-word-bank.tex
+\begin{tikzpicture}[node distance = 0,scale = 0.7]
+\tikzstyle{every node}=[scale=0.7]
+\node[draw=white] (input_without) at (0,0){\includegraphics[width=0.62\textwidth]{./Chapter17/Figures/figure-bank-without-attention.png}};
+\node[draw=white] (input_with) at (10,0){\includegraphics[width=0.62\textwidth]{./Chapter17/Figures/figure-bank-with-attention.png}};
+\end{tikzpicture}
\ No newline at end of file
--- a/Chapter17/Figures/figure-dog-with-hat.png
+++ b/Chapter17/Figures/figure-dog-with-hat.png
--- a/Chapter17/Figures/figure-the-encoder-explicitly-incorporates-semantic-information.tex
+++ b/Chapter17/Figures/figure-the-encoder-explicitly-incorporates-semantic-information.tex
+\tikzstyle{word} = [rectangle,thick,minimum width=2cm,minimum height=0.7cm,text centered,]
+\begin{tikzpicture}[node distance = 0,scale = 0.9]
+\tikzstyle{every node}=[scale=0.9]
+\node(figure)[draw=white,scale=0.4] {\includegraphics[width=0.62\textwidth]{./Chapter17/Figures/figure-bank-without-attention.png}};
+\node(river)[word, right of = figure, xshift=5cm, yshift=0.35cm, fill=blue!45]{river};
+\node(mountain)[word, above of = river, yshift=0.75cm, fill=blue!45]{mountain};
+\node(child)[word, above of = mountain, yshift=0.75cm, fill=blue!15]{child};
+\node(man)[word, above of = child, yshift=0.75cm, fill=blue!25]{man};
+\node(jump)[word, below of = river, yshift=-0.75cm, fill=blue!30]{jump};
+\node(bank)[word, below of = jump, yshift=-0.75cm, fill=blue!65]{bank};
+\node(sky)[word, below of = bank, yshift=-0.75cm, fill=blue!30]{sky};
+\node(tree)[word, below of = sky, yshift=-0.75cm, fill=blue!15]{tree};
+\node(cir)[circle,very thick, minimum width=0.6cm, xshift=8cm,  draw=black]{};
+\node(decoder)[rectangle, rounded corners, minimum width=2.5cm, minimum height=1.2cm, right of = cir,xshift=3cm, draw=black, fill=blue!25]{\large{解码器}};
+\node(yn_1)[below of = decoder,yshift=-2cm,scale=1.2]{$\rm y_{n-1}$};
+\node(yn_2)[above of = decoder,yshift=2cm,scale=1.2]{$\rm y'_{n-1}$(bank)};
+
+\draw[->, very thick]([xshift=0.1cm]figure.east)to([xshift=2cm]figure.east);
+\draw[-,very thick]([xshift=-0.03cm]cir.east)to([xshift=0.03cm]cir.west);
+\draw[-,very thick]([yshift=0.03cm]cir.south)to([yshift=-0.03cm]cir.north);
+\draw[->, very thick]([xshift=0.1cm]cir.east)to([xshift=-0.1cm]decoder.west);
+\draw[->, very thick](yn_1)to([yshift=-0.1cm]decoder.south);
+\draw[->, very thick]([yshift=0.1cm]decoder.north)to(yn_2);
+
+\draw[->, thick, color=blue!45]([xshift=0.05cm]river.east)to([xshift=-0.05cm]cir.west);
+\draw[->, thick, color=blue!45]([xshift=0.05cm]mountain.east)to([xshift=-0.05cm]cir.west);
+\draw[->, thick, color=blue!15]([xshift=0.05cm]child.east)to([xshift=-0.05cm]cir.west);
+\draw[->, thick, color=blue!25]([xshift=0.05cm]man.east)to([xshift=-0.05cm]cir.west);
+\draw[->, thick, color=blue!30]([xshift=0.05cm]jump.east)to([xshift=-0.05cm]cir.west);
+\draw[->, thick, color=blue!65]([xshift=0.05cm]bank.east)to([xshift=-0.05cm]cir.west);
+\draw[->, thick, color=blue!30]([xshift=0.05cm]sky.east)to([xshift=-0.05cm]cir.west);
+\draw[->, thick, color=blue!15]([xshift=0.05cm]tree.east)to([xshift=-0.05cm]cir.west);
+\end{tikzpicture}
\ No newline at end of file
--- a/Chapter17/Figures/figure-traditional-methods-of-image-description.tex
+++ b/Chapter17/Figures/figure-traditional-methods-of-image-description.tex
+\definecolor{color_gray}{rgb}{0.278,0.337,0.426}
+\definecolor{color_green}{rgb}{0.663,0.82,0.557}
+\definecolor{color_orange}{rgb}{0.957,0.694,0.514}
+\definecolor{color_blue}{rgb}{0.335,0.708,0.735}
+\tikzstyle{description} = [rectangle,rounded corners=1mm, minimum width=3cm,minimum height=0.6cm,text centered]
+\begin{tikzpicture}[node distance = 0,scale = 0.8]
+\tikzstyle{every node}=[scale=0.8]
+
+
+
+\node(figure-1)[draw=white,scale=0.25] at (0,0){\includegraphics[width=0.62\textwidth]{./Chapter17/Figures/figure-dog-with-hat.png}};
+\node(ground-1)[rectangle,rounded corners, minimum width=5cm, minimum height=3.5cm,right of = figure-1, xshift=5cm,fill=blue!15]{};
+\node(text-1)[right of = figure-1, xshift=3.6cm,yshift=2cm,scale=1.2]{\textcolor{color_gray}{描述候选池}};
+\node(text_1-1)[description, right of = figure-1, xshift=4.2cm,yshift=1.2cm,fill=color_gray!50]{\textcolor{white}{天空中有很多鸟。}};
+\node(text_2-1)[description, right of = figure-1, xshift=5.3cm,yshift=0.5cm,fill=color_green]{\textcolor{white}{孩子从河岸上跳下来。}};
+\node(text_3-1)[description, right of = figure-1, xshift=4.5cm,yshift=-0.2cm,fill=color_orange]{\textcolor{white}{狗在吐舌头。}};
+\node(surd-1)[right of = text_3-1, xshift=2cm,scale=1.5]{\textcolor{red}{$\surd$}};
+\node(text_4-1)[description, right of = figure-1, xshift=5.2cm,yshift=-0.9cm,fill=color_blue]{\textcolor{white}{男人戴着眼镜。}};
+\node(point-1)[right of = figure-1, xshift=5cm,yshift=-1.4cm,scale=1.5]{...};
+\draw[->,very thick](figure-1)to([xshift=-0.1cm]ground-1.west);
+
+\node(figure)[draw=white,scale=0.25]at ([xshift=20.0em]figure-1.east){\includegraphics[width=0.62\textwidth]{./Chapter17/Figures/figure-dog-with-hat.png}};
+\node(ground)[rectangle,rounded corners, minimum width=5cm, minimum height=1.5cm,right of = figure, xshift=5cm,yshift=-0.8cm,fill=blue!15]{\large{图片中有\underline{\textcolor{red}{狗}}，\underline{\textcolor{red}{帽子}}，\underline{\quad\ }。}};
+\node(dog)[rectangle,rounded corners, minimum width=1cm, minimum height=0.7cm,right of = figure, xshift=3cm,yshift=1.5cm,thick, draw=color_orange,fill=color_orange!50]{狗};
+\node(hat)[rectangle,rounded corners, minimum width=1.5cm, minimum height=0.7cm,right of = figure, xshift=4.5cm,yshift=1.5cm,thick, draw=color_green,fill=color_green!50]{帽子};
+\draw[->, very thick,color=black!60](figure.east)to([xshift=-0.1cm]dog.west)node[left,xshift=-0.2cm,yshift=-0.1cm,color=black]{图片检测};
+\draw[->, very thick,color=black!60]([yshift=-0.1cm]hat.south)to([yshift=0.1cm]ground.north)node[right,xshift=-0.2cm,yshift=0.5cm,color=black]{模板填充};
+
+
+\end{tikzpicture}
\ No newline at end of file
--- a/Chapter17/chapter17.tex
+++ b/Chapter17/chapter17.tex
@@ -311,6 +311,7 @@
 %----------------------------------------------------------------------------------------------------
 \begin{figure}[htp]
 \centering
+\input{./Chapter17/Figures/figure-comparison-of-attention-mechanism-of-target-word-bank}
 \caption{The target word ``bank'' before and after applying the attention mechanism}
 \label{tab:17-2-3-c}
 \end{figure}
@@ -336,13 +337,14 @@

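 \parinterval Methods based on multi-task learning usually combine the translation task with other vision tasks and train them jointly. Multi-task learning has already been mentioned in {\chapterfifteen} and {\chaptersixteen}. A common multi-task framework shares part of the model parameters across several related tasks to learn what the tasks have in common, and uses task-specific modules to learn what is particular to each task. In multimodal machine translation, the main strategy is to treat translation as the primary task and to add auxiliary tasks related to other modalities, using these auxiliary tasks to help the model learn the linguistic knowledge of the source language.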

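-\parinterval As shown in Figure 4, the multimodal machine translation task can be decomposed into two subtasks: machine translation and image generation\upcite{DBLP:conf/ijcnlp/ElliottK17}. Machine translation is the primary task and image generation is the auxiliary task. Image generation here means generating the corresponding image from an image description; it is discussed later. A single encoder models the source-language data, and two decoders (a translation decoder and an image decoder) learn the translation task and the image generation task respectively. The task-specific top layers learn the features particular to each task, while the shared bottom layers learn a richer text representation. In addition, research on visual question answering\upcite{DBLP:conf/nips/LuYBP16} suggests that multi-layer attention should not be introduced in multimodal tasks, because it can cause severe overfitting. From another angle, improving the generalization ability of the model through multi-task learning is also an effective way to prevent overfitting. Similar ideas are widely used in multimodal natural language processing, for example in image captioning and visual question answering\upcite{DBLP:conf/iccv/AntolALMBZP15}.
+\parinterval As shown in Figure~\ref{fig:17-13}, the multimodal machine translation task can be decomposed into two subtasks: machine translation and image generation\upcite{DBLP:conf/ijcnlp/ElliottK17}. Machine translation is the primary task and image generation is the auxiliary task. Image generation here means generating the corresponding image from an image description; it is discussed later. A single encoder models the source-language data, and two decoders (a translation decoder and an image decoder) learn the translation task and the image generation task respectively. The task-specific top layers learn the features particular to each task, while the shared bottom layers learn a richer text representation. In addition, research on visual question answering\upcite{DBLP:conf/nips/LuYBP16} suggests that multi-layer attention should not be introduced in multimodal tasks, because it can cause severe overfitting. From another angle, improving the generalization ability of the model through multi-task learning is also an effective way to prevent overfitting. Similar ideas are widely used in multimodal natural language processing, for example in image captioning and visual question answering\upcite{DBLP:conf/iccv/AntolALMBZP15}.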
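+
+\parinterval The joint training described above can be sketched with a single objective. The notation here is an illustrative assumption and is not taken from the cited work: $\theta_{\mathrm{enc}}$ denotes the shared encoder parameters, $\theta_{\mathrm{mt}}$ and $\theta_{\mathrm{img}}$ denote the parameters of the translation decoder and the image decoder, and $\lambda \ge 0$ is a hypothetical weight on the auxiliary task. One plausible form is a weighted sum of the two task losses:
+% Illustrative notation only: L_mt, L_img and \lambda are assumed symbols, not the cited paper's.
+\begin{equation}
+L(\theta_{\mathrm{enc}},\theta_{\mathrm{mt}},\theta_{\mathrm{img}}) = L_{\mathrm{mt}}(\theta_{\mathrm{enc}},\theta_{\mathrm{mt}}) + \lambda \cdot L_{\mathrm{img}}(\theta_{\mathrm{enc}},\theta_{\mathrm{img}})
+\end{equation}
+\noindent Under this sketch, only $\theta_{\mathrm{enc}}$ receives gradients from both terms, which is how the auxiliary task could influence the shared text representation. Setting $\lambda = 0$ recovers a plain translation model.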

 %----------------------------------------------------------------------------------------------------
 \begin{figure}[htp]
 \centering
+\input{./Chapter17/Figures/figure-application-of-multimodal-machine-translation-to-multitask-learning.tex}
 \caption{Multi-task learning applied to multimodal machine translation}
-\label{tab:17-2-4-c}
+\label{fig:17-13}
 \end{figure}
 %----------------------------------------------------------------------------------------------------

@@ -357,12 +359,13 @@
 %----------------------------------------------------------------------------------------------------
 \begin{figure}[htp]
 \centering
+\input{./Chapter17/Figures/figure-traditional-methods-of-image-description}
 \caption{Traditional methods for image captioning}
 \label{tab:17-2-5-c}
 \end{figure}
 %----------------------------------------------------------------------------------------------------

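-\parinterval Traditional image captioning follows two paradigms: retrieval-based methods and template-based methods. Retrieval-based methods (Figure 5, left) select a sentence from a given pool of candidate descriptions as the caption of the image. Their drawback is that the selected sentence may match the image poorly. Template-based methods (Figure 5, right) detect visual features in the image and fill the detected content into a predesigned template. Their drawback is that the generated captions are rigid, as if `cast from the same mold'. In recent years, convolutional neural networks have proved highly effective in computer vision, and recurrent neural networks in natural language processing. Inspired by the encoder-decoder framework in machine translation, a framework that uses a convolutional neural network as the encoder to encode the image and a recurrent neural network as the decoder to generate the description has gradually become the basic paradigm for image captioning. This section starts from this basic encoder-decoder paradigm\upcite{DBLP:conf/cvpr/VinyalsTBE15,DBLP:conf/icml/XuBKCCSZB15} and then introduces improvements to the encoder and to the decoder.
+\parinterval Traditional image captioning follows two paradigms: retrieval-based methods and template-based methods. Retrieval-based methods (Figure 5, left) select a sentence from a given pool of candidate descriptions as the caption of the image. Their drawback is that the selected sentence may match the image poorly. Template-based methods (Figure 5, right) detect visual features in the image and fill the detected content into a predesigned template. Their drawback is that the generated captions are rigid, as if ``cast from the same mold''. In recent years, convolutional neural networks have proved highly effective in computer vision, and recurrent neural networks in natural language processing. Inspired by the encoder-decoder framework in machine translation, a framework that uses a convolutional neural network as the encoder to encode the image and a recurrent neural network as the decoder to generate the description has gradually become the basic paradigm for image captioning. This section starts from this basic encoder-decoder paradigm\upcite{DBLP:conf/cvpr/VinyalsTBE15,DBLP:conf/icml/XuBKCCSZB15} and then introduces improvements to the encoder and to the decoder.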

 %----------------------------------------------------------------------------------------
 %    NEW SUBSUB-SECTION
@@ -399,13 +402,14 @@

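 \parinterval For the encoder-decoder framework to be fully effective in image captioning, the encoder must also represent the image information well. Most improvements to the encoder start from this point, typically by adding semantic information\upcite{DBLP:conf/cvpr/YouJWFL16,DBLP:conf/cvpr/ChenZXNSLC17,DBLP:journals/pami/FuJCSZ17} and positional information\upcite{DBLP:conf/cvpr/ChenZXNSLC17,DBLP:conf/ijcai/LiuSWWY17} about the image to the encoder.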

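-\parinterval The semantic information of an image generally refers to the entities, attributes, scenes and so on that appear in it. As shown in Figure XX, attribute or entity detectors extract attribute and entity words such as ``child'', ``river'' and ``bank'' from the image as its semantic information. The global image feature is extracted to initialize the recurrent neural network, and an attention mechanism then computes the attention weights between the target word and the attribute or entity words. A context vector is computed from these weights, so that the encoded semantic information is passed to the decoder\upcite{DBLP:conf/cvpr/YouJWFL16}. When decoding the word ``bank'', the model pays more attention to ``bank'' in the semantic information of the image. Besides entities and attributes, the scene information of the image can also be added to the encoder\upcite{DBLP:journals/pami/FuJCSZ17}. Detecting attributes, entities and scenes involves work on object detection, such as Faster-RCNN\upcite{DBLP:journals/pami/RenHG017} and YOLO\upcite{DBLP:journals/corr/abs-1804-02767,DBLP:journals/corr/abs-2004-10934}, which is not discussed further here.
+\parinterval The semantic information of an image generally refers to the entities, attributes, scenes and so on that appear in it. As shown in Figure~\ref{fig:17-16}, attribute or entity detectors extract attribute and entity words such as ``child'', ``river'' and ``bank'' from the image as its semantic information. The global image feature is extracted to initialize the recurrent neural network, and an attention mechanism then computes the attention weights between the target word and the attribute or entity words. A context vector is computed from these weights, so that the encoded semantic information is passed to the decoder\upcite{DBLP:conf/cvpr/YouJWFL16}. When decoding the word ``bank'', the model pays more attention to ``bank'' in the semantic information of the image. Besides entities and attributes, the scene information of the image can also be added to the encoder\upcite{DBLP:journals/pami/FuJCSZ17}. Detecting attributes, entities and scenes involves work on object detection, such as Faster-RCNN\upcite{DBLP:journals/pami/RenHG017} and YOLO\upcite{DBLP:journals/corr/abs-1804-02767,DBLP:journals/corr/abs-2004-10934}, which is not discussed further here.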
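+
+\parinterval The attention step over detected words can be written out as a small worked formula. This is a generic attention sketch under assumed notation, not necessarily the exact formulation of the cited work: $e_1,\dots,e_M$ are the embeddings of the $M$ detected attribute or entity words, $s_{n-1}$ is the decoder state before the $n$-th target word is produced, and $a(\cdot,\cdot)$ is a hypothetical scoring function.
+% Illustrative notation only: e_i, s_{n-1}, a(.,.), \alpha_{n,i} and C_n are assumed symbols.
+\begin{equation}
+\alpha_{n,i} = \frac{\exp\big(a(s_{n-1},e_i)\big)}{\sum_{j=1}^{M}\exp\big(a(s_{n-1},e_j)\big)}, \qquad C_n = \sum_{i=1}^{M}\alpha_{n,i}\, e_i
+\end{equation}
+\noindent The weights $\alpha_{n,i}$ are non-negative and sum to one over the detected words, and the context vector $C_n$ is their weighted average. In the example from the text, decoding ``bank'' would correspond to a comparatively large weight on the detected word ``bank''.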

 %----------------------------------------------------------------------------------------------------
 \begin{figure}[htp]
 \centering
+\input{./Chapter17/Figures/figure-the-encoder-explicitly-incorporates-semantic-information}
 \caption{The encoder ``explicitly'' incorporates semantic information}
-\label{tab:17-2-6-c}
+\label{fig:17-16}
 \end{figure}
 %----------------------------------------------------------------------------------------------------