合并分支 'master' 到 'caorunzhe'

Master 查看合并请求 !920

合并分支 'master' 到 'caorunzhe'
Master 查看合并请求 !920
6c97b510 · 曹润柘 · df534c62 · a6a8e910 · 6c97b510 · 6c97b510
Commit 6c97b510 authored Jan 15, 2021 by 曹润柘
--- a/Chapter17/Figures/figure-application-of-multimodal-machine-translation-to-multitask-learning.tex
+++ b/Chapter17/Figures/figure-application-of-multimodal-machine-translation-to-multitask-learning.tex
-\tikzstyle{coder} = [rectangle,rounded corners,minimum height=2.2em,minimum width=4.3em,text centered,draw=black,fill=red!25]
+\tikzstyle{coder} = [rectangle,thick,rounded corners,minimum height=2.2em,minimum width=4.3em,text centered,draw=black,fill=red!20]
 \begin{tikzpicture}[node distance = 0,scale = 0.75]
 \tikzstyle{every node}=[scale=0.75]
 \node(x)[]{$x$};
-\node(encoder)[coder, above of = x,yshift=4em]{{编码器}};
-\node(decoder_left)[coder, above of = encoder, yshift=6em,fill=blue!25]{{解码器}};
+\node(encoder)[coder, above of = x,yshift=4em]{\large{编码器}};
+\node(decoder_left)[coder, above of = encoder, yshift=6em,fill=blue!20]{\large{解码器}};
 \node(y_hat)[above of = decoder_left, yshift=4em]{{$y$}};
 \node(y)[above of = decoder_left, xshift=-6em]{{$y_{<}$}};
-\node(decoder_right)[coder, above of = encoder, xshift=11em,fill=yellow!25]{{解码器}};
+\node(decoder_right)[coder, above of = encoder, xshift=11em,fill=yellow!20]{\large{解码器}};

 \node(figure)[draw=white,above of = decoder_right,yshift=6.5em,scale=0.25] {\includegraphics[width=0.62\textwidth]{./Chapter17/Figures/figure-bank-without-attention.jpg}};

-\node [anchor=south,scale=1.2] (node1) at ([xshift=-2.5em,yshift=4.5em]y.north) {\small{$x$：源语言文本数据}};
-\node [anchor=north,scale=1.2] (node2) at ([xshift=0.57em]node1.south){\small{$y$：目标语言文本数据}};
+\node [anchor=south,scale=1.2] (node1) at ([xshift=-2.5em,yshift=4.5em]y.north) {{$x$：源语言文本数据}};
+\node [anchor=north,scale=1.2] (node2) at ([xshift=0.57em]node1.south){{$y$：目标语言文本数据}};

 \draw[->,thick](x)to(encoder);
-\draw[->,thick](encoder)to(decoder_left)node[right,xshift=-0.1cm,yshift=-1.25cm,scale=1.2]{\small{翻译}};
+\draw[->,thick](encoder)to(decoder_left)node[right,xshift=-0.1cm,yshift=-1.25cm,scale=1.2]{{翻译}};
 \draw[->,thick](decoder_left)to(y_hat);
 \draw[->,thick](y)to(decoder_left);
-\draw[->,thick](encoder)to(decoder_right)node[left,xshift=-3.1em,yshift=0.25cm,scale=1.2]{\small{生成图片}};
+\draw[->,thick](encoder)to(decoder_right)node[left,xshift=-3.1em,yshift=0.25cm,scale=1.2]{{生成图片}};
 \draw[->,thick](decoder_right)to(figure);
 \end{tikzpicture}
\ No newline at end of file
--- a/Chapter17/Figures/figure-three-ways-of-dual-decoder-speech-translation.tex
+++ b/Chapter17/Figures/figure-three-ways-of-dual-decoder-speech-translation.tex
-\tikzstyle{coder} = [rectangle,thick,rounded corners,minimum height=2.2em,minimum width=4.3em,text centered,draw=black!70,fill=red!20]
+\tikzstyle{coder} = [rectangle,thick,rounded corners,minimum height=2.2em,minimum width=4.3em,text centered,draw=black,fill=red!20]

 \begin{tikzpicture}[node distance = 0,scale = 0.75]
 \tikzstyle{every node}=[scale=0.75]

--- a/Chapter17/Figures/figure-traditional-methods-of-image-description.tex
+++ b/Chapter17/Figures/figure-traditional-methods-of-image-description.tex
@@ -2,31 +2,31 @@
 \definecolor{color_green}{rgb}{0.663,0.82,0.557}
 \definecolor{color_orange}{rgb}{0.957,0.694,0.514}
 \definecolor{color_blue}{rgb}{0.335,0.708,0.735}
-\tikzstyle{description} = [rectangle,rounded corners=1mm, minimum width=3cm,minimum height=0.6cm,text centered]
+\tikzstyle{description} = [rectangle,rounded corners=1mm, minimum width=3cm,minimum height=0.6cm,text centered,draw,thick]
 \begin{tikzpicture}[node distance = 0,scale = 0.8]
 \tikzstyle{every node}=[scale=0.8]



 \node(figure-1)[draw=white,scale=0.25] at (0,0){\includegraphics[width=0.62\textwidth]{./Chapter17/Figures/figure-dog-with-hat.png}};
-\node(ground-1)[rectangle,rounded corners, minimum width=5cm, minimum height=3.5cm,right of = figure-1, xshift=5cm,fill=blue!20]{};
-\node(text-1)[right of = figure-1, xshift=3.6cm,yshift=2cm,scale=1.2]{\textcolor{color_gray}{描述候选池}};
-\node(text_1-1)[description, right of = figure-1, xshift=4.2cm,yshift=1.2cm,fill=color_gray!50]{\textcolor{white}{天空中有很多鸟。}};
-\node(text_2-1)[description, right of = figure-1, xshift=5.3cm,yshift=0.5cm,fill=color_green]{\textcolor{white}{孩子从河岸上跳下来。}};
-\node(text_3-1)[description, right of = figure-1, xshift=4.5cm,yshift=-0.2cm,fill=color_orange]{\textcolor{white}{狗在吐舌头。}};
+\node(ground-1)[rectangle,rounded corners, minimum width=5cm, minimum height=3.5cm,right of = figure-1, xshift=5cm,fill=gray!10,draw,thick,drop shadow]{};
+\node(text-1)[right of = figure-1, xshift=3.6cm,yshift=2.1cm,scale=1.2]{{描述候选池}};
+\node(text_1-1)[description, right of = figure-1, xshift=4.2cm,yshift=1.2cm,fill=gray!20]{{天空中有很多鸟。}};
+\node(text_2-1)[description, right of = figure-1, xshift=5.3cm,yshift=0.5cm,fill=green!20]{{孩子从河岸上跳下来。}};
+\node(text_3-1)[description, right of = figure-1, xshift=4.5cm,yshift=-0.2cm,fill=orange!20]{{狗在吐舌头。}};
 \node(surd-1)[right of = text_3-1, xshift=2cm,scale=1.5]{\textcolor{red}{$\surd$}};
-\node(text_4-1)[description, right of = figure-1, xshift=5.2cm,yshift=-0.9cm,fill=color_blue]{\textcolor{white}{男人戴着眼镜。}};
+\node(text_4-1)[description, right of = figure-1, xshift=5.2cm,yshift=-0.9cm,fill=blue!20]{{男人戴着眼镜。}};
 \node(point-1)[right of = figure-1, xshift=5cm,yshift=-1.4cm,scale=1.5]{...};
 \draw[->,thick](figure-1)to([xshift=-0.1cm]ground-1.west);

 \node(figure)[draw=white,scale=0.25]at ([xshift=20.0em]figure-1.east){\includegraphics[width=0.62\textwidth]{./Chapter17/Figures/figure-dog-with-hat.png}};
-\node(ground)[rectangle,rounded corners, minimum width=5cm, minimum height=1.5cm,right of = figure, xshift=5cm,yshift=-2.6em,fill=blue!20]{\large{图片中有\underline{\textcolor{red}{狗}}，\underline{\textcolor{red}{帽子}}，\underline{\quad\ }。}};
-\node(dog)[rectangle,rounded corners, minimum width=1cm, minimum height=0.7cm,right of = figure, xshift=3cm,yshift=1.5cm,thick, draw=color_orange,fill=color_orange!50]{狗};
-\node(hat)[rectangle,rounded corners, minimum width=1.5cm, minimum height=0.7cm,right of = figure, xshift=4.5cm,yshift=1.5cm,thick, draw=color_green,fill=color_green!50]{帽子};
-\draw[->, thick,color=black!60](figure.east)to([xshift=-0.1cm]dog.west)node[left,xshift=-0.2cm,yshift=-0.1cm,color=black]{图片检测};
-\draw[->, thick,color=black!60]([yshift=-0.1cm]hat.south)to([yshift=0.1cm]ground.north)node[right,xshift=-0.2cm,yshift=0.5cm,color=black]{模板填充};
+\node(ground)[rectangle,rounded corners, minimum width=5cm, minimum height=1.5cm,right of = figure, xshift=5cm,yshift=-2.6em,fill=gray!10,draw,thick,drop shadow]{\large{图片中有\underline{\textcolor{red}{狗}}，\underline{\textcolor{red}{帽子}}，\underline{\quad\ }。}};
+\node(dog)[rectangle,rounded corners, minimum width=1cm, minimum height=0.7cm,right of = figure, xshift=3cm,yshift=1.5cm,thick, draw,fill=orange!20,thick]{狗};
+\node(hat)[rectangle,rounded corners, minimum width=1.5cm, minimum height=0.7cm,right of = figure, xshift=4.5cm,yshift=1.5cm,thick, draw,fill=green!20,thick]{帽子};
+\draw[->, thick](figure.east)to([xshift=-0.1cm]dog.west)node[left,xshift=-0.2cm,yshift=-0.1cm,color=black]{图片检测};
+\draw[->, thick]([yshift=-0.1cm]hat.south)to([yshift=0.1cm]ground.north)node[right,xshift=-0.2cm,yshift=0.5cm,color=black]{模板填充};

-\node [anchor=north](pos1)at ([xshift=-3.8em,yshift=-0.5em]ground-1.south){(a) 基于检索的图像描述生成};
-\node [anchor=north](pos2)at ([xshift=-3.8em,yshift=-0.5em]ground.south){(b) 基于模板的图像描述生成};
+\node [anchor=north](pos1)at ([xshift=-3.8em,yshift=-1em]ground-1.south){(a) 基于检索的图像描述生成};
+\node [anchor=north](pos2)at ([xshift=-3.8em,yshift=-1em]ground.south){(b) 基于模板的图像描述生成};

 \end{tikzpicture}
\ No newline at end of file
--- a/Chapter17/chapter17.tex
+++ b/Chapter17/chapter17.tex
@@ -62,7 +62,7 @@

 \subsection{音频处理}

-\parinterval 为了保证对相关内容描述的完整性，这里对语音处理的基本知识作简要介绍。不同于文本，音频本质上是经过若干信号处理之后的{\small\bfnew{波形}}（Waveform）\index{Waveform}。具体来说，声音是一种空气的震动，因此可以被转换为模拟信号。模拟信号是一段连续的信号，经过采样变为离散的数字信号。采样是每隔固定的时间记录一下声音的振幅，采样率表示每秒的采样点数，单位是赫兹（Hz）。采样率越高，结果的损失则越小。通常来说，采样的标准是能够通过离散化的数字信号重现原始语音。日常生活中使用的手机和电脑设备的采样率一般为16kHz，表示每秒16000个采样点；而音频CD的采样率可以达到44.1kHz。 经过进一步的量化，将采样点的值转换为整型数值保存，从而减少占用的存储空间，通常采用的是16位量化。将采样率和量化位数相乘，就可以得到{\small\bfnew{比特率}}\index{比特率}（Bits Per Second，BPS）\index{Bits Per Second}，表示音频每秒占用的位数。例如，16kHz采样率和16位量化的音频，比特率为256kb/s。音频处理的整体流程如图\ref{fig:17-2}所示\upcite{洪青阳2020语音识别原理与应用,陈果果2020语音识别实战}。
+\parinterval 为了保证对相关内容描述的完整性，这里对语音处理的基本知识作简要介绍。不同于文本，音频本质上是经过若干信号处理之后的{\small\bfnew{波形}}（Waveform）\index{Waveform}。具体来说，声音是一种空气的震动，因此可以被转换为模拟信号。模拟信号是一段连续的信号，经过采样变为离散的数字信号。采样是每隔固定的时间记录一下声音的振幅，采样率表示每秒的采样点数，单位是赫兹（Hz）。采样率越高，采样的结果与原始的语音越相像。通常来说，采样的标准是能够通过离散化的数字信号重现原始语音。日常生活中使用的手机和电脑设备的采样率一般为16kHz，表示每秒16000个采样点；而音频CD的采样率可以达到44.1kHz。 经过进一步的量化，将采样点的值转换为整型数值保存，从而减少占用的存储空间，通常采用的是16位量化。将采样率和量化位数相乘，就可以得到{\small\bfnew{比特率}}\index{比特率}（Bits Per Second，BPS）\index{Bits Per Second}，表示音频每秒占用的位数。例如，16kHz采样率和16位量化的音频，比特率为256kb/s。音频处理的整体流程如图\ref{fig:17-2}所示\upcite{洪青阳2020语音识别原理与应用,陈果果2020语音识别实战}。

 %----------------------------------------------------------------------------------------------------
 \begin{figure}[htp]
@@ -385,7 +385,7 @@

 \parinterval 要想使编码器-解码器框架在图像描述生成中充分发挥作用，编码器也要更好的表示图像信息。对于编码器的改进，通常体现在向编码器中添加图像的语义信息\upcite{DBLP:conf/cvpr/YouJWFL16,DBLP:conf/cvpr/ChenZXNSLC17,DBLP:journals/pami/FuJCSZ17}和位置信息\upcite{DBLP:conf/cvpr/ChenZXNSLC17,DBLP:conf/ijcai/LiuSWWY17}。

-\parinterval 图像的语义信息一般是指图像中存在的实体、属性、场景等等。如图\ref{fig:17-17}所示，从图像中利用属性或实体检测器提取出“girl”、“river”、“bank”等属性词和实体词，将他们作为图像的语义信息编码的一部分，再利用注意力机制计算目标语言单词与这些属性词或实体词之间的注意力权重\upcite{DBLP:conf/cvpr/YouJWFL16}。当然，除了图像中的实体和属性作为语义信息外，也可以将图片的场景信息加入到编码器当中\upcite{DBLP:journals/pami/FuJCSZ17}。有关如何做属性、实体和场景的检测，涉及到目标检测任务的工作，例如Faster-RCNN\upcite{DBLP:journals/pami/RenHG017}、YOLO\upcite{DBLP:journals/corr/abs-1804-02767,DBLP:journals/corr/abs-2004-10934}等等,这里不再赘述。
+\parinterval 图像的语义信息一般是指图像中存在的实体、属性、场景等等。如图\ref{fig:17-17}所示，从图像中利用属性或实体检测器提取出“jump”、“girl”、“river”、“bank”等属性词和实体词，将他们作为图像的语义信息编码的一部分，再利用注意力机制计算目标语言单词与这些属性词或实体词之间的注意力权重\upcite{DBLP:conf/cvpr/YouJWFL16}。当然，除了图像中的实体和属性作为语义信息外，也可以将图片的场景信息加入到编码器当中\upcite{DBLP:journals/pami/FuJCSZ17}。有关如何做属性、实体和场景的检测，涉及到目标检测任务的工作，例如Faster-RCNN\upcite{DBLP:journals/pami/RenHG017}、YOLO\upcite{DBLP:journals/corr/abs-1804-02767,DBLP:journals/corr/abs-2004-10934}等等,这里不再赘述。

 %----------------------------------------------------------------------------------------------------
 \begin{figure}[htp]