Merge branch 'caorunzhe' into 'master'

Caorunzhe See merge request !717

Commit 8b63e72e authored Dec 27, 2020 by 曹润柘
--- a/Chapter15/Figures/figure-common-multi-branch-structure-1.png
+++ b/Chapter15/Figures/figure-common-multi-branch-structure-1.png
--- a/Chapter15/Figures/figure-common-multi-branch-structure-2.png
+++ b/Chapter15/Figures/figure-common-multi-branch-structure-2.png
--- a/Chapter15/Figures/figure-convolutional-attention-network.png
+++ b/Chapter15/Figures/figure-convolutional-attention-network.png
--- a/Chapter15/Figures/figure-encoder-structure-of-transformer-model-optimized-by-nas.jpg
+++ b/Chapter15/Figures/figure-encoder-structure-of-transformer-model-optimized-by-nas.jpg
--- a/Chapter15/Figures/figure-encoder-structure-of-transformer-model-optimized-by-nas.tex
+++ b/Chapter15/Figures/figure-encoder-structure-of-transformer-model-optimized-by-nas.tex
@@ -3,22 +3,22 @@

 %left
 \begin{scope}
-\foreach \x/\d in {1/2em, 2/8em, 3/18em, 4/24em}
-	\node[unit,fill=yellow!20] at (0,\d) (ln_\x) {层正则};
+\foreach \x/\d in {1/2em, 2/8em}
+	\node[unit,fill=yellow!20] at (0,\d) (ln_\x) {层正则化};

-\foreach \x/\d in {1/4em, 2/20em}
+\foreach \x/\d in {1/4em}
 	\node[unit,fill=green!20] at (0,\d) (sa_\x) {8头自注意力：512};

-\foreach \x/\d in {1/6em, 2/16em, 3/22em, 4/32em}
+\foreach \x/\d in {1/6em, 2/16em}
 	\node[draw,circle,minimum size=1em,inner sep=1pt] at (0,\d) (add_\x) {\scriptsize\bfnew{+}};

-\foreach \x/\d in {2/14em, 4/30em}
+\foreach \x/\d in {2/14em}
 	\node[unit,fill=red!20] at (0,\d) (conv_\x) {卷积$1 \times 1$：512};

-\foreach \x/\d in {1/10em,3/26em}
+\foreach \x/\d in {1/10em}
 	\node[unit,fill=red!20] at (0,\d) (conv_\x) {卷积$1 \times 1$：2048};

-\foreach \x/\d in {1/12em, 2/28em}
+\foreach \x/\d in {1/12em}
 	\node[unit,fill=blue!20] at (0,\d) (relu_\x) {RELU};

 \draw[->,thick] ([yshift=-1.4em]ln_1.-90) -- ([yshift=-0.1em]ln_1.-90);
@@ -29,80 +29,57 @@
 \draw[->,thick] ([yshift=0.1em]conv_1.90) -- ([yshift=-0.1em]relu_1.-90);
 \draw[->,thick] ([yshift=0.1em]relu_1.90) -- ([yshift=-0.1em]conv_2.-90);
 \draw[->,thick] ([yshift=0.1em]conv_2.90) -- ([yshift=-0.1em]add_2.-90);
-\draw[->,thick] ([yshift=0.1em]add_2.90) -- ([yshift=-0.1em]ln_3.-90);
-\draw[->,thick] ([yshift=0.1em]ln_3.90) -- ([yshift=-0.1em]sa_2.-90);
-\draw[->,thick] ([yshift=0.1em]sa_2.90) -- ([yshift=-0.1em]add_3.-90);
-\draw[->,thick] ([yshift=0.1em]add_3.90) -- ([yshift=-0.1em]ln_4.-90);
-\draw[->,thick] ([yshift=0.1em]ln_4.90) -- ([yshift=-0.1em]conv_3.-90);
-\draw[->,thick] ([yshift=0.1em]conv_3.90) -- ([yshift=-0.1em]relu_2.-90);
-\draw[->,thick] ([yshift=0.1em]relu_2.90) -- ([yshift=-0.1em]conv_4.-90);
-\draw[->,thick] ([yshift=0.1em]conv_4.90) -- ([yshift=-0.1em]add_4.-90);
-\draw[->,thick] ([yshift=0.1em]add_4.90) -- ([yshift=1em]add_4.90);
+\draw[->,thick] ([yshift=0.1em]add_2.90) -- ([yshift=1em]add_2.90);
+

 \draw[->,thick] ([yshift=-0.8em]ln_1.-90) .. controls ([xshift=5em,yshift=-0.8em]ln_1.-90) and ([xshift=5em]add_1.0) .. (add_1.0);
 \draw[->,thick] (add_1.0) .. controls ([xshift=5em]add_1.0) and ([xshift=5em]add_2.0) .. (add_2.0);
-\draw[->,thick] (add_2.0) .. controls ([xshift=5em]add_2.0) and ([xshift=5em]add_3.0) .. (add_3.0);
-\draw[->,thick] (add_3.0) .. controls ([xshift=5em]add_3.0) and ([xshift=5em]add_4.0) .. (add_4.0);

 \node[font=\scriptsize] at (0em, -1em){(a) Transformer编码器中若干块的结构};
 \end{scope}

 %right
 \begin{scope}[xshift=14em]
-\foreach \x/\d in {1/2em, 2/8em, 3/16em, 4/22em, 5/28em}
-	\node[unit,fill=yellow!20] at (0,\d) (ln_\x) {层正则};
+\foreach \x/\d in {1/2em, 2/8em, 3/14em}
+	\node[unit,fill=yellow!20] at (0,\d) (ln_\x) {层正则化};

 \node[unit,fill=green!20] at (0,24em) (sa_1) {8头自注意力：512};

-\foreach \x/\d in {1/6em, 2/14em, 3/20em, 4/26em, 5/36em}
+\foreach \x/\d in {1/6em, 2/12em, 3/22em}
 	\node[draw,circle,minimum size=1em,inner sep=1pt] at (0,\d) (add_\x) {\scriptsize\bfnew{+}};

-\node[unit,fill=red!20] at (0,30em) (conv_4) {卷积$1 \times 1$：2048};
-\node[unit,fill=red!20] at (0,34em) (conv_5) {卷积$1 \times 1$：512};
+\node[unit,fill=red!20] at (0,16em) (conv_4) {卷积$1 \times 1$：2048};
+\node[unit,fill=red!20] at (0,20em) (conv_5) {卷积$1 \times 1$：512};
+
+\node[unit,fill=blue!20] at (0,18em) (relu_3) {RELU};
+\node[unit,fill=cyan!20] at (0,4em) (conv_3) {Sep卷积$9 \times 1$：256};
+\node[unit,fill=green!20] at (0,10em) (sa_1) {8头自注意力：512};

-\node[unit,fill=blue!20] at (0,32em) (relu_3) {RELU};
-\node[unit,fill=red!20] at (0,4em) (glu_1) {门控线性单元：512};
-\node[unit,fill=red!20] at (-3em,10em) (conv_1) {卷积$1 \times 1$：2048};
-\node[unit,fill=cyan!20] at (3em,10em) (conv_2) {卷积$3 \times 1$：256};
-\node[unit,fill=blue!20] at (-3em,12em) (relu_1) {RELU};
-\node[unit,fill=blue!20] at (3em,12em) (relu_2) {RELU};
-\node[unit,fill=cyan!20] at (0em,18em) (conv_3) {Sep卷积$9 \times 1$：256};


 \draw[->,thick] ([yshift=-1.4em]ln_1.-90) -- ([yshift=-0.1em]ln_1.-90);
-\draw[->,thick] ([yshift=0.1em]ln_1.90) -- ([yshift=-0.1em]glu_1.-90);
-\draw[->,thick] ([yshift=0.1em]glu_1.90) -- ([yshift=-0.1em]add_1.-90);
+\draw[->,thick] ([yshift=0.1em]ln_1.90) -- ([yshift=-0.1em]conv_3.-90);
+\draw[->,thick] ([yshift=0.1em]conv_3.90) -- ([yshift=-0.1em]add_1.-90);
 \draw[->,thick] ([yshift=0.1em]add_1.90) -- ([yshift=-0.1em]ln_2.-90);
-\draw[->,thick] ([,yshift=0.1em]ln_2.135) -- ([yshift=-0.1em]conv_1.-90);
-\draw[->,thick] ([yshift=0.1em]ln_2.45) -- ([yshift=-0.1em]conv_2.-90);
-\draw[->,thick] ([yshift=0.1em]conv_1.90) -- ([yshift=-0.1em]relu_1.-90);
-\draw[->,thick] ([yshift=0.1em]conv_2.90) -- ([yshift=-0.1em]relu_2.-90);
-\draw[->,thick] ([yshift=0.1em]relu_1.90) -- ([yshift=-0.1em]add_2.-135);
-\draw[->,thick] ([yshift=0.1em]relu_2.90) -- ([yshift=-0.1em]add_2.-45);
+\draw[->,thick] ([,yshift=0.1em]ln_2.90) -- ([yshift=-0.1em]sa_1.-90);
+\draw[->,thick] ([yshift=0.1em]sa_1.90) -- ([yshift=-0.1em]add_2.-90);
 \draw[->,thick] ([yshift=0.1em]add_2.90) -- ([yshift=-0.1em]ln_3.-90);
-\draw[->,thick] ([yshift=0.1em]ln_3.90) -- ([yshift=-0.1em]conv_3.-90);
-\draw[->,thick] ([yshift=0.1em]conv_3.90) -- ([yshift=-0.1em]add_3.-90);
-\draw[->,thick] ([yshift=0.1em]add_3.90) -- ([yshift=-0.1em]ln_4.-90);
-\draw[->,thick] ([yshift=0.1em]ln_4.90) -- ([yshift=-0.1em]sa_1.-90);
-\draw[->,thick] ([yshift=0.1em]sa_1.90) -- ([yshift=-0.1em]add_4.-90);
-\draw[->,thick] ([yshift=0.1em]add_4.90) -- ([yshift=-0.1em]ln_5.-90);
-\draw[->,thick] ([yshift=0.1em]ln_5.90) -- ([yshift=-0.1em]conv_4.-90);
+\draw[->,thick] ([yshift=0.1em]ln_3.90) -- ([yshift=-0.1em]conv_4.-90);
 \draw[->,thick] ([yshift=0.1em]conv_4.90) -- ([yshift=-0.1em]relu_3.-90);
 \draw[->,thick] ([yshift=0.1em]relu_3.90) -- ([yshift=-0.1em]conv_5.-90);
-\draw[->,thick] ([yshift=0.1em]conv_5.90) -- ([yshift=-0.1em]add_5.-90);
-\draw[->,thick] ([yshift=0.1em]add_5.90) -- ([yshift=1em]add_5.90);
+\draw[->,thick] ([yshift=0.1em]conv_5.90) -- ([yshift=-0.1em]add_3.-90);
+\draw[->,thick] ([yshift=0.1em]add_3.90) -- ([yshift=1em]add_3.90);

 \draw[->,thick] ([yshift=-0.8em]ln_1.-90) .. controls ([xshift=5em,yshift=-0.8em]ln_1.-90) and ([xshift=5em]add_1.0) .. (add_1.0);
-\draw[->,thick] (add_1.0) .. controls ([xshift=8em]add_1.0) and ([xshift=8em]add_3.0) .. (add_3.0);
-\draw[->,thick] (add_3.0) .. controls ([xshift=5em]add_3.0) and ([xshift=5em]add_4.0) .. (add_4.0);
-\draw[->,thick] (add_4.0) .. controls ([xshift=5em]add_4.0) and ([xshift=5em]add_5.0) .. (add_5.0);
+\draw[->,thick] (add_1.0) .. controls ([xshift=5em]add_1.0) and ([xshift=5em]add_2.0) .. (add_2.0);
+\draw[->,thick] (add_2.0) .. controls ([xshift=5em]add_2.0) and ([xshift=5em]add_3.0) .. (add_3.0);

 \node[font=\scriptsize,align=center] at (0em, -1.5em){(b) 使用结构搜索方法优化后的 \\ Transformer编码器中若干块的结构};

-\node[minimum size=0.8em,inner sep=0pt,rounded corners=1pt,draw,fill=blue!20] (act) at (5.5em, 38em){};
+\node[minimum size=0.8em,inner sep=0pt,rounded corners=1pt,draw,fill=blue!20] (act) at (5.5em, 24em){};
 \node[anchor=west,font=\footnotesize] at ([xshift=0.1em]act.east){激活函数};
 \node[anchor=north,minimum size=0.8em,inner sep=0pt,rounded corners=1pt,draw,fill=yellow!20] (nor) at ([yshift=-0.6em]act.south){};
-\node[anchor=west,font=\footnotesize] at ([xshift=0.1em]nor.east){正则化};
+\node[anchor=west,font=\footnotesize] at ([xshift=0.1em]nor.east){层正则化};
 \node[anchor=north,minimum size=0.8em,inner sep=0pt,rounded corners=1pt,draw,fill=cyan!20] (wc) at ([yshift=-0.6em]nor.south){};
 \node[anchor=west,font=\footnotesize] at ([xshift=0.1em]wc.east){宽卷积};
 \node[anchor=north,minimum size=0.8em,inner sep=0pt,rounded corners=1pt,draw,fill=green!20] (at) at ([yshift=-0.6em]wc.south){};

--- a/Chapter15/Figures/figure-evolution-and-change-of-ml-methods.jpg
+++ b/Chapter15/Figures/figure-evolution-and-change-of-ml-methods.jpg
--- a/Chapter15/Figures/figure-layer-fusion-method.tex
+++ b/Chapter15/Figures/figure-layer-fusion-method.tex
@@ -17,7 +17,7 @@

 \node [anchor=west,rectangle,minimum height=1.5em,minimum width=2.5em,rounded corners=5pt] (n6) at ([xshift=1em,yshift=0em]n5.east) {$\ldots$};

-\node [anchor=west,encnode,draw=red!60!black!80,fill=red!20] (n7) at ([xshift=1em,yshift=0em]n6.east) {$\mathbi{h}_{N-1}$};
+\node [anchor=west,encnode,draw=red!60!black!80,fill=red!20] (n7) at ([xshift=1em,yshift=0em]n6.east) {$\mathbi{h}_{L-1}$};

 \node [anchor=north,rectangle,draw=teal!80, inner sep=0mm,minimum height=2em,minimum width=8em,fill=teal!17,rounded corners=5pt,thick] (n8) at ([xshift=3em,yshift=-1.2em]n4.south) {权重聚合$\mathbi{g}$};


--- a/Chapter15/Figures/figure-main-flow-of-neural-network-structure-search.png
+++ b/Chapter15/Figures/figure-main-flow-of-neural-network-structure-search.png
--- a/Chapter15/Figures/figure-main-flow-of-neural-network-structure-search.tex
+++ b/Chapter15/Figures/figure-main-flow-of-neural-network-structure-search.tex
+
+\begin{tikzpicture}
+\tikzstyle{node}=[draw,minimum height=1.4em,minimum width=2em,rounded corners=1pt,thick]
+
+\begin{scope}[scale=0.36]
+\tikzstyle{every node}=[scale=0.36]
+
+\node[draw=ublue,very thick,drop shadow,fill=white,minimum width=40em,minimum height=25em] (rec3) at (2.25,0){};
+\node[draw=ublue,very thick,drop shadow,fill=white,minimum width=22em,minimum height=25em] (rec2) at (-12.4,0){};
+\node[draw=ublue,very thick,drop shadow,fill=white,minimum width=24em,minimum height=25em] (rec1) at (-24,0){};
+
+%left
+\node[text=ublue] (label1) at (-26.4,4){\Huge\bfnew{结构空间}};
+\node[align=left] at (-24,-0.5){\Huge\bfnew{1.前馈神经网络} \\ [4ex] \Huge\bfnew{2.卷积神经网络} \\ [4ex] \Huge\bfnew{3.循环神经网络} \\  [4ex] \Huge\bfnew{4. Transformer网络} \\ [4ex] \Huge\bfnew{...}};
+
+\draw[ublue,very thick,-latex] (rec1.0) -- node[align=center,above,text=violet]{\huge{设计} \\ \huge{搜索} \\ \huge{空间}}(rec2.180);
+
+%mid
+\node[text=ublue] (label2) at (-14.4,4){\Huge\bfnew{搜索空间}};
+\node[align=left] at (-12.4,-0.5){\Huge\bfnew{循环神经网络} \\ [4ex] \Huge\bfnew{1.普通RNN网络} \\ [4ex] \Huge\bfnew{2. LSTM网络} \\  [4ex] \Huge\bfnew{3. GRU网络} \\ [4ex] \Huge\bfnew{...}};
+
+\draw[ublue,very thick,-latex] (rec2.0) -- node[align=center,above,text=violet]{\huge{选择} \\ \huge{搜索} \\ \huge{策略}}(rec3.180);
+
+\draw[ublue,very thick,-latex,out=-150,in=-30] (rec3.-90) to node[above,text=violet,yshift=1em]{\huge{迭代结构搜索的过程}}(rec2.-90);
+
+\draw[ublue,very thick,-latex,out=60,in=130] ([xshift=-8em]rec3.90) to node[above,text=violet]{\huge{性能评估}}([xshift=8em]rec3.90);
+%right
+\node[node] (n1) at (0,0){};
+\node[node] (n2) at (1.5,0){};
+\node[node] (n3) at (3,0){};
+\node[node] (n4) at (4.5,0){};
+\node[node] (n5) at (1.5,-1.3){};
+\node[node] (n6) at (3,-1.3){};
+\node[node] (n7) at (2.25,-2.4){};
+\node[node] (n8) at (3,1.3){};
+
+\draw[->,thick] (n1.0) -- (n2.180);
+\draw[->,thick] (n2.0) -- (n3.180);
+\draw[->,thick] (n3.0) -- (n4.180);
+\draw[->,thick,out=60,in=180] (n1.90) to (n8.180);
+\draw[->,thick,out=-10,in=90] (n8.0) to (n4.90);
+\draw[->,thick,out=90,in=-90] (n5.90) to (n3.-90);
+\draw[->,thick,out=90,in=-90] (n6.90) to (n4.-90);
+\draw[->,thick,out=90,in=-90] (n7.90) to (n5.-90);
+\draw[->,thick,out=90,in=-90] (n7.90) to (n6.-90);
+\node[font=\huge] (ht) at (-0.2,2.2){$\mathbi h_t$};
+
+\node[draw,font=\huge,inner sep=0pt,minimum width=4em,minimum height=4em,very thick,rounded corners=2pt] (ht-1) at (-3,0) {$\mathbi h_{t-1}$};
+\node[draw,font=\huge,inner sep=0pt,minimum width=4em,minimum height=4em,very thick,rounded corners=2pt] (ht+1) at (7.5,0) {$\mathbi h_{t+1}$};
+
+\node[font=\huge] (xt) at (2.25,-4.2){$x_t$};
+\node[font=\Huge]  at (9,0){$\cdots$};
+\node[font=\Huge]  at (-4.5,0){$\cdots$};
+
+\node[text=ublue] (label3) at (-2,4){\Huge\bfnew{找到的模型结构}};
+
+\node[draw,rounded corners=6pt,very thick,minimum width=16em,minimum height=15em] (box1) at (2.25,0){};
+
+\draw[->,very thick] (ht-1.0) -- (box1.180);
+\draw[->,very thick] (box1.0) -- (ht+1.180);
+\draw[->,very thick] (ht-1.90) -- ([yshift=2em]ht-1.90);
+\draw[->,very thick] (ht+1.90) -- ([yshift=2em]ht+1.90);
+\draw[->,very thick] (box1.90) -- ([yshift=2em]box1.90);
+\draw[->,very thick] ([yshift=-2em]ht-1.-90) -- (ht-1.-90);
+\draw[->,very thick] ([yshift=-2em]ht+1.-90) -- (ht+1.-90);
+\draw[->,very thick] ([yshift=-2em]box1.-90) -- (box1.-90);
+\end{scope}
+\end{tikzpicture}
\ No newline at end of file
--- a/Chapter15/Figures/figure-multi-scale-local-modeling.png
+++ b/Chapter15/Figures/figure-multi-scale-local-modeling.png
--- a/Chapter15/Figures/figure-post-norm-vs-pre-norm.tex
+++ b/Chapter15/Figures/figure-post-norm-vs-pre-norm.tex
@@ -39,8 +39,8 @@
 \node [rectangle,inner sep=0.3em,fill=blue!10] [fit = (x3) (F2) (n2) (ln2) (x4) (k2)] (box1) {};
 \end{pgfonlayer}

-\node [anchor=north] (c1) at (box0.south){\footnotesize {(a)后作方式的残差连接}};
-\node [anchor=north] (c2) at (box1.south){\footnotesize {(b)前作方式的残差连接}};
+\node [anchor=north] (c1) at (box0.south){\footnotesize {(a)Post-Norm方式的残差连接}};
+\node [anchor=north] (c2) at (box1.south){\footnotesize {(b)Pre-Norm方式的残差连接}};
 \end{scope}
 \end{tikzpicture}
 \end{center}
\ No newline at end of file
--- a/Chapter15/Figures/figure-transparent-attention-mechanism.png
+++ b/Chapter15/Figures/figure-transparent-attention-mechanism.png
--- a/Chapter15/chapter15.tex
+++ b/Chapter15/chapter15.tex
--- a/Chapter16/chapter16.tex
+++ b/Chapter16/chapter16.tex
@@ -134,7 +134,7 @@
 %----------------------------------------------------------------------------------------
 \subsubsection{3. 双语句对挖掘}

-\parinterval 在双语平行语料缺乏的时候，从可比语料中挖掘可用的双语句对也是一种有效的方法\upcite{finding2006adafre,2005Improving,DBLP:conf/emnlp/WuZHGQLL19}。可比语料是指源语言和目标语言虽然不是完全互译的文本，但是蕴含了丰富的双语对照知识，可以从中挖掘出可用的双语句对来训练。相比双语平行语料来说,可比语料相对容易获取，比如，多种语言报道的新闻事件、多种语言的维基百科词条和多种语言翻译的书籍（如圣经等）等。如图\ref{fig:16-4}中的维基百科词条所示。
+\parinterval 在双语平行语料缺乏的时候，从可比语料中挖掘可用的双语句对也是一种有效的方法\upcite{finding2006adafre,DBLP:journals/coling/MunteanuM05,DBLP:conf/emnlp/WuZHGQLL19}。可比语料是指源语言和目标语言虽然不是完全互译的文本，但是蕴含了丰富的双语对照知识，可以从中挖掘出可用的双语句对来训练。相比双语平行语料来说,可比语料相对容易获取，比如，多种语言报道的新闻事件、多种语言的维基百科词条和多种语言翻译的书籍（如圣经等）等。如图\ref{fig:16-4}中的维基百科词条所示。

 %----------------------------------------------
 \begin{figure}[htp]
@@ -187,7 +187,7 @@

 \parinterval 需要注意的是，在神经机器翻译中使用预训练词嵌入有两种方法。一种方法是直接将词嵌入作为固定的输入，也就是在训练神经机器翻译模型的过程中，并不调整词嵌入的参数。这样做的目的是完全将词嵌入模块独立出来，机器翻译可以被看作是在固定的词嵌入输入上进行的建模，从而降低了机器翻译模型学习的难度。另一种方法是仍然遵循``预训练+微调''的策略，将词嵌入作为机器翻译模型部分参数的初始值。在之后机器翻译训练过程中，词嵌入模型结果会被进一步更新。近些年，在词嵌入预训练的基础上进行微调的方法越来越受到研究者的青睐。因为在实践中发现，完全用单语数据学习的单词表示，与双语数据上的翻译任务并不完全匹配。同时目标语言的信息也会影响源语言的表示学习。

-\parinterval 虽然预训练词嵌入在海量的单语数据上学习到了丰富的表示，但词嵌入很主要的一个缺点是无法解决一词多义问题。在不同的上下文中，同一个单词经常表示不同的意思，但词嵌入是完全相同的。模型需要在编码过程中通过上下文去理解每个词在当前语境下的含义。因此，上下文词向量在近些年得到了广泛的关注\upcite{DBLP:conf/acl/PetersABP17,mccann2017learned,DBLP:conf/naacl/PetersNIGCLZ18}。上下文词嵌入是指一个词的表示不仅依赖于单词自身，还依赖于上下文语境。由于在不同的上下文中，每个词对应的词嵌入是不同的，因此无法简单地通过词嵌入矩阵来表示，通常的做法是使用海量的单语数据预训练语言模型任务，使模型具备丰富的特征提取能力\upcite{DBLP:conf/naacl/PetersNIGCLZ18,radford2018improving,devlin2019bert}。
+\parinterval 虽然预训练词嵌入在海量的单语数据上学习到了丰富的表示，但词嵌入很主要的一个缺点是无法解决一词多义问题。在不同的上下文中，同一个单词经常表示不同的意思，但词嵌入是完全相同的。模型需要在编码过程中通过上下文去理解每个词在当前语境下的含义。因此，上下文词向量在近些年得到了广泛的关注\upcite{DBLP:conf/acl/PetersABP17,mccann2017learned,DBLP:journals/corr/abs-1802-05365}。上下文词嵌入是指一个词的表示不仅依赖于单词自身，还依赖于上下文语境。由于在不同的上下文中，每个词对应的词嵌入是不同的，因此无法简单地通过词嵌入矩阵来表示，通常的做法是使用海量的单语数据预训练语言模型任务，使模型具备丰富的特征提取能力\upcite{DBLP:journals/corr/abs-1802-05365,radford2018improving,devlin2019bert}。

 %----------------------------------------------------------------------------------------
 %    NEW SUB-SUB-SECTION