Figures and symbols for Chapter 9

Commit c1abc697 authored Oct 23, 2020 by 孟霞
--- a/Chapter9/Figures/biological-neuron.jpg
+++ b/Chapter9/Figures/biological-neuron.jpg
--- a/Chapter9/Figures/deep-learning.jpg
+++ b/Chapter9/Figures/deep-learning.jpg
--- a/Chapter9/Figures/feature-engineering.jpg
+++ b/Chapter9/Figures/feature-engineering.jpg
--- a/Chapter9/Figures/fig-bias.tex
+++ b/Chapter9/Figures/fig-bias.tex
@@ -10,7 +10,7 @@
 \node [anchor=south east,inner sep=1pt] (label2) at (0,0) {\tiny{0}};
 \node [anchor=south east,inner sep=1pt] (labela) at (0.2,-0.5) {\footnotesize{(a)}};
 }
-{\node [anchor=north west,align=left] (wblabel) at (-1.8,2) {{\scriptsize{$w_{11}=100$}}\\[-0ex] \scriptsize{\ $b_1=0$}};}
+{\node [anchor=north west,align=left] (wblabel) at (-1.8,2) {{\scriptsize{\ $w_{11}=100$}}\\[-0ex] \scriptsize{\ $b_1=0$}};}
 {\draw [-,very thick,ublue,rounded corners=0.1em] (-1.5,0) -- (0,0) -- (0,1) -- (1.5,1);}
 \end{scope}
 %---------------------------------------------------------------------------------------------
@@ -23,7 +23,7 @@
 \node [anchor=south east,inner sep=1pt] (label2) at (0,0) {\tiny{0}};
 \node [anchor=south east,inner sep=1pt] (labelb) at (0.2,-0.5) {\footnotesize{(b)}};
 }
-{\node [anchor=north west,align=left] (wblabel) at (-1.8,2) {\scriptsize{$w_{11}=100$}\\[-0ex] {\scriptsize{\ $b_1=-2$}}};}
+{\node [anchor=north west,align=left] (wblabel) at (-1.8,2) {\scriptsize{\ $w_{11}=100$}\\[-0ex] {\scriptsize{\ $b_1=-2$}}};}
 {\draw [-,very thick,ublue,rounded corners=0.1em] (-1.5,0) -- (0.25,0) -- (0.25,1) -- (1.5,1);}
 \end{scope}
 %-----------------------------------------------------------------------------------------------
@@ -36,7 +36,7 @@
 \node [anchor=south east,inner sep=1pt] (label2) at (0,0) {\tiny{0}};
 \node [anchor=south east,inner sep=1pt] (labelc) at (0.2,-0.5) {\footnotesize{(c)}};
 }
-{\node [anchor=north west,align=left] (wblabel) at (-1.8,2) {\scriptsize{$w_{11}=100$}\\[-0ex] {\scriptsize{\ $b_1=-4$}}};}
+{\node [anchor=north west,align=left] (wblabel) at (-1.8,2) {\scriptsize{\ $w_{11}=100$}\\[-0ex] {\scriptsize{\ $b_1=-4$}}};}
 {\draw [-,very thick,ublue,rounded corners=0.1em] (-1.5,0) -- (0.5,0) -- (0.5,1) -- (1.5,1);}
 \end{scope}
 \end{tikzpicture}

--- a/Chapter9/Figures/fig-fit.tex
+++ b/Chapter9/Figures/fig-fit.tex
@@ -36,8 +36,8 @@
 {\node [anchor=center,rotate=-59,fill=white,inner sep=1pt] (w2) at ([yshift=1.2em,xshift=-1.2em]x1.north) {\tiny{$w_{11}=100$}};}
 {\node [anchor=center,rotate=59,fill=white,inner sep=1pt] (b1) at ([yshift=5.1em,xshift=2.3em]b.north) {\tiny{$b_2=-4$}};}
 {\node [anchor=center,rotate=90,fill=white,inner sep=1pt] (w1) at ([yshift=3em,xshift=0.5em]x1.north) {\tiny{$w_{12}=100$}};}
-{\node [anchor=center,rotate=62,fill=white,inner sep=1pt] (w21) at ([yshift=1.8em,xshift=0.2em]n10.north) {\tiny{$w_{21}=-0.7$}};}
-{\node [anchor=center,rotate=-62,fill=white,inner sep=1pt] (w22) at ([yshift=1.8em,xshift=-0.2em]n11.north) {\tiny{$w_{22}=0.7$}};}
+{\node [anchor=center,rotate=62,fill=white,inner sep=1pt] (w21) at ([yshift=1.8em,xshift=0.2em]n10.north) {\tiny{$w'_{11}=-0.7$}};}
+{\node [anchor=center,rotate=-62,fill=white,inner sep=1pt] (w22) at ([yshift=1.8em,xshift=-0.2em]n11.north) {\tiny{$w'_{21}=0.7$}};}

 %% sigmoid box
 \begin{scope}
@@ -130,8 +130,8 @@
 {\node [anchor=center,rotate=-59,fill=white,inner sep=1pt] (w2) at ([yshift=1.2em,xshift=-1.2em]x1.north) {\tiny{$w_{11}=100$}};}
 {\node [anchor=center,rotate=59,fill=white,inner sep=1pt] (b1) at ([yshift=5.1em,xshift=2.3em]b.north) {\tiny{$b_2=-4$}};}
 {\node [anchor=center,rotate=90,fill=white,inner sep=1pt] (w1) at ([yshift=3em,xshift=0.5em]x1.north) {\tiny{$w_{12}=100$}};}
-{\node [anchor=center,rotate=62,fill=white,inner sep=1pt] (w21) at ([yshift=1.8em,xshift=0.2em]n10.north) {\tiny{$w_{21}=-0.7$}};}
-{\node [anchor=center,rotate=-62,fill=white,inner sep=1pt] (w22) at ([yshift=1.8em,xshift=-0.2em]n11.north) {\tiny{$w_{22}=0.7$}};}
+{\node [anchor=center,rotate=62,fill=white,inner sep=1pt] (w21) at ([yshift=1.8em,xshift=0.2em]n10.north) {\tiny{$w'_{11}=-0.7$}};}
+{\node [anchor=center,rotate=-62,fill=white,inner sep=1pt] (w22) at ([yshift=1.8em,xshift=-0.2em]n11.north) {\tiny{$w'_{21}=0.7$}};}

 %% sigmoid box
 \begin{scope}

--- a/Chapter9/Figures/fig-four-layers-of-neural-network.tex
+++ b/Chapter9/Figures/fig-four-layers-of-neural-network.tex
@@ -19,7 +19,7 @@
    \node [anchor=north] (x\n) at ([yshift=-2em]neuron0\n.south) {$x_\n$};
 }

-\node [anchor=west] (w1label) at ([xshift=-0.5em,yshift=0.8em]x5.north east) {${\vectorn{\emph{W}}}_1$};
+\node [anchor=west] (w1label) at ([xshift=-0.5em,yshift=0.8em]x5.north east) {${\vectorn{\emph{W}}}^{[1]}$};

 \begin{pgfonlayer}{background}
 \node [rectangle,inner sep=0.2em,fill=red!20] [fit = (neuron01) (neuron05)] (layer01) {};
@@ -47,7 +47,7 @@
    }
 }

-\node [anchor=west] (w2label) at ([xshift=-2.5em,yshift=5.4em]x5.north east) {${\vectorn{\emph{W}}}_2$};
+\node [anchor=west] (w2label) at ([xshift=-2.5em,yshift=5.4em]x5.north east) {${\vectorn{\emph{W}}}^{[2]}$};

 \begin{pgfonlayer}{background}
 {
@@ -77,7 +77,7 @@
    \draw [<-,thick] ([yshift=1.1em]neuron2\n.north) -- (neuron2\n.north);
 }

-\node [anchor=west] (w3label) at ([xshift=-2.5em,yshift=9.5em]x5.north east) {${\vectorn{\emph{W}}}_3$};
+\node [anchor=west] (w3label) at ([xshift=-2.5em,yshift=9.5em]x5.north east) {${\vectorn{\emph{W}}}^{[3]}$};

 \begin{pgfonlayer}{background}
 {

--- a/Chapter9/Figures/fig-two-layer-neural-network.tex
+++ b/Chapter9/Figures/fig-two-layer-neural-network.tex
@@ -36,8 +36,8 @@
 %% weight and bias
 {\node [anchor=center,rotate=90,fill=white,inner sep=1pt] (b0) at ([yshift=2em,xshift=-0.5em]b.north) {\scriptsize{$b_1$}};}
 {\node [anchor=center,rotate=-59,fill=white,inner sep=1pt] (w2) at ([yshift=1em,xshift=-1.0em]x1.north) {\scriptsize{$w_{11}$}};}
-{\node [anchor=center,rotate=62,fill=white,inner sep=1pt] (w21) at ([yshift=1.2em,xshift=-0.2em]n10.north) {\scriptsize{$w_{21}$}};}
-{\node [anchor=center,rotate=-62,fill=white,inner sep=1pt] (w22) at ([yshift=1.2em,xshift=0.2em]n11.north) {\scriptsize{$w_{22}$}};}
+{\node [anchor=center,rotate=62,fill=white,inner sep=1pt] (w21) at ([yshift=1.2em,xshift=-0.2em]n10.north) {\scriptsize{$w'_{11}$}};}
+{\node [anchor=center,rotate=-62,fill=white,inner sep=1pt] (w22) at ([yshift=1.2em,xshift=0.2em]n11.north) {\scriptsize{$w'_{21}$}};}
 {\node [anchor=center,rotate=59,fill=white,inner sep=1pt] (b1) at ([yshift=3.4em,xshift=1.5em]b.north) {\scriptsize{$b_2$}};}
 {\node [anchor=center,rotate=90,fill=white,inner sep=1pt] (w1) at ([yshift=2em,xshift=0.5em]x1.north) {\scriptsize{$w_{12}$}};}


--- a/Chapter9/Figures/fig-w1.tex
+++ b/Chapter9/Figures/fig-w1.tex
@@ -10,7 +10,7 @@
 \node [anchor=south east,inner sep=1pt] (label2) at (0,0) {\tiny{0}};
 \node [anchor=south east,inner sep=1pt] (labela) at (0.2,-0.5) {\footnotesize{(a)}};
 }
-{\node [anchor=north west,align=left] (wblabel) at (-1.8,2) {\scriptsize{$w_{11}=100$}\\[-0ex] {\scriptsize{\ $b_1=-4$}}};}
+{\node [anchor=north west,align=left] (wblabel) at (-1.8,2) {\scriptsize{\ $w_{11}=100$}\\[-0ex] {\scriptsize{\ $b_1=-4$}}};}
 {\draw [-,very thick,ublue,rounded corners=0.1em] (-1.5,0) -- (0.5,0) -- (0.5,1) -- (1.5,1);}
 \end{scope}
 %---------------------------------------------------------------------------------------------
@@ -23,7 +23,7 @@
 \node [anchor=south east,inner sep=1pt] (label2) at (0,0) {\tiny{0}};
 \node [anchor=south east,inner sep=1pt] (labelb) at (0.2,-0.5) {\footnotesize{(b)}};
 }
-{\node [anchor=north west,align=left] (wblabel) at (-1.8,2) {{\scriptsize{$w_{21}=0.9$}}};}
+{\node [anchor=north west,align=left] (wblabel) at (-1.8,2) {{\scriptsize{\ $w'_{11}=0.9$}}};}
 {\draw [-,very thick,ublue,rounded corners=0.1em] (-1.8,0) -- (0.5,0) -- (0.5,0.9) -- (1.8,0.9);}
 \end{scope}
 %-----------------------------------------------------------------------------------------------
@@ -37,7 +37,7 @@
 \node [anchor=south east,inner sep=1pt] (label2) at (0,0) {\tiny{0}};
 \node [anchor=south east,inner sep=1pt] (labelc) at (0.2,-0.5) {\footnotesize{(c)}};
 }
-{\node [anchor=north west,align=left] (wblabel) at (-1.8,2) {{\scriptsize{$w_{21}=0.7$}}};}
+{\node [anchor=north west,align=left] (wblabel) at (-1.8,2) {{\scriptsize{\ $w'_{11}=0.7$}}};}
 {\draw [-,very thick,ublue,rounded corners=0.1em] (-1.5,0) -- (0.5,0) -- (0.5,0.7) -- (1.5,0.7);}
 \end{scope}


--- a/Chapter9/Figures/fig-w2.tex
+++ b/Chapter9/Figures/fig-w2.tex
@@ -10,7 +10,7 @@
 \node [anchor=south east,inner sep=1pt] (label2) at (0,0) {\tiny{0}};
 \node [anchor=south east,inner sep=1pt] (labela) at (0.2,-0.5) {\footnotesize{(a)}};
 }
-{\node [anchor=north west,align=left] (wblabel) at (-1.8,2) {{\scriptsize{$w_{21}=0.7$}}};}
+{\node [anchor=north west,align=left] (wblabel) at (-1.8,2) {{\scriptsize{\ $w'_{11}=0.7$}}};}
 {\draw [-,very thick,ublue,rounded corners=0.1em] (-1.5,0) -- (0.5,0) -- (0.5,0.7) -- (1.5,0.7);}
 \end{scope}
 %---------------------------------------------------------------------------------------------
@@ -23,7 +23,7 @@
 \node [anchor=south east,inner sep=1pt] (label2) at (0,0) {\tiny{0}};
 \node [anchor=south east,inner sep=1pt] (labelb) at (0.2,-0.5) {\footnotesize{(b)}};
 }
-{\node [anchor=north west,align=left] (wblabel) at (-1.8,2) {{\scriptsize{$w_{12}=100$}}\\[-0ex] {\scriptsize{\ $b_2=-6$}}\\[-0ex] {\scriptsize{\ $w_{22}=0.7$}}};}
+{\node [anchor=north west,align=left] (wblabel) at (-1.8,2) {{\scriptsize{\ $w_{12}=100$}}\\[-0ex] {\scriptsize{\ $b_2=-6$}}\\[-0ex] {\scriptsize{\ $w'_{21}=0.7$}}};}
 {\draw [-,very thick,ublue,rounded corners=0.1em] (-1.5,0) -- (0.5,0) -- (0.5,0.7) -- (0.7,0.7) -- (0.7,1.4) -- (1.5,1.4);}
 \end{scope}
 %-----------------------------------------------------------------------------------------------
@@ -37,7 +37,7 @@
 \node [anchor=south east,inner sep=1pt] (label2) at (0,0) {\tiny{0}};
 \node [anchor=south east,inner sep=1pt] (labelc) at (0.2,-0.5) {\footnotesize{(c)}};
 }
-{\node [anchor=north west,align=left] (wblabel) at (-1.8,2) {\scriptsize{$w_{12}=100$}\\[-0ex] \scriptsize{\ $b_2=-6$}\\[-0ex] {\scriptsize{\ $w_{22}=-0.7$}}};}
+{\node [anchor=north west,align=left] (wblabel) at (-1.8,2) {\scriptsize{\ $w_{12}=100$}\\[-0ex] \scriptsize{\ $b_2=-6$}\\[-0ex] {\scriptsize{\ $w'_{21}=-0.7$}}};}
 {\draw [-,very thick,ublue,rounded corners=0.1em] (-1.5,0) -- (0.5,0) -- (0.5,0.7) -- (0.7,0.7) -- (0.7,0) -- (1.5,0);}
 \end{scope}


--- a/Chapter9/Figures/fig-weather.tex
+++ b/Chapter9/Figures/fig-weather.tex
@@ -3,32 +3,48 @@
 \tikzstyle{neuronnode} = [minimum size=2.2em,circle,draw,ublue,very thick,inner sep=1pt, fill=white,align=center,drop shadow={shadow xshift=0.1em,shadow yshift=-0.1em}]
 \node [anchor=west,minimum width=2.0em,minimum height=1.5em] (bias10) at (0,0.05)  {\footnotesize{${\vectorn{\emph{b}}}^{[1]}$}};
 \node [anchor=west,minimum width=2.0em,minimum height=1.5em] (bias11) at ([xshift=-1.5em,yshift=-0.3em]bias10.south)  {\footnotesize{偏置1}};
+\node [anchor=center,rotate=13,fill=white,inner sep=1pt] (b11) at ([yshift=1.0em,xshift=1.8em]bias10.north) {\scriptsize{$b_{11}$}};
+
 \node [anchor=west,minimum width=2.0em,minimum height=1.5em] (input10) at (2,0) {\footnotesize {$x_1$}};
 \node [anchor=west,minimum width=2.0em,minimum height=1.5em] (input11) at ([xshift=-2.1em,yshift=-0.3em]input10.south) {\footnotesize {天空状况}};
+
 \node [anchor=west,minimum width=2.0em,minimum height=1.5em] (input20) at (4,0) {\footnotesize {$x_2$}};
-\node [anchor=west,minimum width=2.0em,minimum height=1.5em] (input21) at ([xshift=-2.1em,yshift=-0.3em]input20.south) {\footnotesize {低温气温}};
+\node [anchor=center,rotate=38,fill=white,inner sep=1pt] (w12) at ([yshift=1.25em,xshift=0.85em]input20.north) {\scriptsize{$w_{22}$}};
+\node [anchor=west,minimum width=2.0em,minimum height=1.5em] (input21) at ([xshift=-2.1em,yshift=-0.3em]input20.south) {\footnotesize {低空气温}};
 \node [anchor=west,minimum width=2.0em,minimum height=1.5em] (input30) at (6,0) {\footnotesize {$x_3$}};
+\node [anchor=center,rotate=-35,fill=white,inner sep=1pt] (w13) at ([yshift=1.2em,xshift=-1.0em]input30.north) {\scriptsize{$w_{32}$}};
 \node [anchor=west,minimum width=2.0em,minimum height=1.5em] (input31) at ([xshift=-2.1em,yshift=-0.3em]input30.south) {\footnotesize {水平气压}};

-\node [neuronnode] (n10) at ([xshift=1.5em,yshift=4em]input10.east) {\tiny{$f$}\\[-1ex] \tiny{$\sum$}};
-\node [anchor=west,minimum width=2.0em,minimum height=1.5em] (bias20) at ([xshift=-4em,yshift=0.5em]n10.west)  {\footnotesize {$b^{[2]}$}};
+\node [neuronnode] (n10) at ([xshift=1.5em,yshift=4em]input10.east) {\tiny{Tanh}\\[-1ex] \tiny{$\sum$}};
+\node [anchor=east,minimum width=2.0em,minimum height=1.5em] (a1) at ([xshift=2.3em,yshift=0em]n10.east) {\footnotesize {温度}};
+\node [anchor=center,rotate=0,fill=white,inner sep=1pt] (w21) at ([xshift=0.8em,yshift=0.8em]n10.north) {\scriptsize{$w'_{11}$}};
+\node [anchor=west,minimum width=2.0em,minimum height=1.5em] (bias20) at ([xshift=-5em,yshift=0.3em]n10.west)  {\footnotesize {$b^{[2]}$}};
 \node [anchor=west,minimum width=2.0em,minimum height=1.5em] (bias21) at ([xshift=-1.5em,yshift=-0.3em]bias20.south)  {\footnotesize {偏置2}};
-\node [neuronnode] (n11) at ([xshift=1.5em,yshift=4em]input20.east){\tiny{$f$}\\[-1ex] \tiny{$\sum$}};
+\node [anchor=center,rotate=25,fill=white,inner sep=1pt] (b21) at ([yshift=1.1em,xshift=1.9em]bias20.north) {\scriptsize{$b'_{11}$}};
+\node [neuronnode] (n11) at ([xshift=1.5em,yshift=4em]input20.east){\tiny{Tanh}\\[-1ex] \tiny{$\sum$}};
+\node [anchor=east,minimum width=2.0em,minimum height=1.5em] (a1) at ([xshift=2.3em,yshift=0em]n11.east) {\footnotesize {风速}};
+\node [anchor=center,rotate=-15,fill=white,inner sep=1pt] (w22) at ([yshift=1.05em,xshift=-1.8em]n11.north) {\scriptsize{$w'_{21}$}};
 \draw [-,ublue] (n10.west) -- (n10.east);
 \draw [-,ublue] (n11.west) -- (n11.east);
-\node [neuronnode] (n20) at ([xshift=1.5em,yshift=8em]input10.east) {\tiny{$f$}\\[-1ex] \tiny{$\sum$}};
+\node [neuronnode] (n20) at ([xshift=1.5em,yshift=8em]input10.east) {\tiny{Sigmoid}\\[-1ex] \tiny{$\sum$}};
+\node [anchor=east,minimum width=2.0em,minimum height=1.5em] (a1) at ([xshift=3.9em,yshift=0em]n20.east) {\footnotesize {穿衣指数}};
 \draw [-,ublue] (n20.west) -- (n20.east);
 \node [anchor=west,minimum width=2.0em,minimum height=1.5em] (output) at ([xshift=0.5em,yshift=12em]input10.east)  {\footnotesize {$y$}};

-\draw [->,thick] (input10.north) -- (n10.south);
-\draw [->,thick] (input20.north) -- (n10.south);
-\draw [->,thick] (input20.north) -- (n11.south);
-\draw [->,thick] (input30.north) -- (n11.south);
-\draw [->,thick] (n10.north) -- (n20.south);
-\draw [->,thick] (n11.north) -- (n20.south);
-\draw [->,thick] (bias20.north) -- (n20.south);
-\draw [->,thick] (n20.north) -- (output.south);
-\draw [->,thick] (bias10.north) -- (n10.south);
+\draw [->,thick,ublue!70,line width=0.33mm] (input10.north) -- (n10.south);
+\draw [->,thick,ugreen,line width=0.33mm] (input20.north) -- (n10.south);
+\draw [->,thick,blue!80,line width=0.33mm] (input20.north) -- (n11.south);
+\draw [->,thick,red!40,line width=0.33mm] (input30.north) -- (n11.south);
+\draw [->,thick,brown,line width=0.33mm] (n10.north) -- (n20.south);
+\draw [->,thick,ugreen!40,line width=0.33mm] (n11.north) -- (n20.south);
+\draw [->,thick,purple,line width=0.33mm] (bias20.north) -- (n20.south);
+\draw [->,thick,line width=0.33mm] (n20.north) -- (output.south);
+\draw [->,thick,red!80,line width=0.33mm] (bias10.north) -- (n10.south);
+\draw [->,thick,orange,line width=0.33mm] (bias10.north) -- (n11.south);
+\node [anchor=center,rotate=10,fill=white,inner sep=1pt] (b12) at ([yshift=1.0em,xshift=4em]bias10.north) {\scriptsize{$b_{12}$}};
+\node [anchor=center,rotate=40,fill=white,inner sep=1pt] (w11) at ([yshift=0.55em,xshift=1.5em]input10.north) {\scriptsize{$w_{11}$}};
+\node [anchor=center,rotate=-30,fill=white,inner sep=1pt] (w12) at ([yshift=0.5em,xshift=-1.4em]input20.north) {\scriptsize{$w_{21}$}};
+
 \end{tikzpicture}
 %%%------------------------------------------------------------------------------------------------------------
 %%------------------------------------------------------------------------------------------------------------

--- a/Chapter9/Figures/fig-weight.tex
+++ b/Chapter9/Figures/fig-weight.tex
@@ -10,7 +10,7 @@
 \node [anchor=south east,inner sep=1pt] (label2) at (0,0) {\tiny{0}};
 \node [anchor=south east,inner sep=1pt] (labela) at (0.2,-0.5) {\footnotesize{(a)}};
 }
-{\node [anchor=north west,align=left] (wblabel) at (-1.8,2) {\scriptsize{$w_{11}=1$}\\[-0ex] \scriptsize{\ $b_1=0$}};}
+{\node [anchor=north west,align=left] (wblabel) at (-1.8,2) {\scriptsize{\ $w_{11}=1$}\\[-0ex] \scriptsize{\ $b_1=0$}};}
 {\draw [-,very thick,ublue,domain=-1.5:1.5,samples=100] plot (\x,{1/(1+exp(-2*\x))});}
 \end{scope}
 %---------------------------------------------------------------------------------------------
@@ -23,7 +23,7 @@
 \node [anchor=south east,inner sep=1pt] (label2) at (0,0) {\tiny{0}};
 \node [anchor=south east,inner sep=1pt] (labelb) at (0.2,-0.5) {\footnotesize{(b)}};
 }
-{\node [anchor=north west,align=left] (wblabel) at (-1.8,2) {{\scriptsize{$w_{11}=10$}}\\[-0ex] \scriptsize{\ $b_1=0$}};}
+{\node [anchor=north west,align=left] (wblabel) at (-1.8,2) {{\scriptsize{\ $w_{11}=10$}}\\[-0ex] \scriptsize{\ $b_1=0$}};}
 {\draw [-,very thick,ublue,domain=-1.5:1.5,samples=100] plot (\x,{1/(1+exp(-4*\x))});}
 \end{scope}
 %-----------------------------------------------------------------------------------------------
@@ -36,7 +36,7 @@
 \node [anchor=south east,inner sep=1pt] (label2) at (0,0) {\tiny{0}};
 \node [anchor=south east,inner sep=1pt] (labelc) at (0.2,-0.5) {\footnotesize{(c)}};
 }
-{\node [anchor=north west,align=left] (wblabel) at (-1.8,2) {{\scriptsize{$w_{11}=100$}}\\[-0ex] \scriptsize{\ $b_1=0$}};}
+{\node [anchor=north west,align=left] (wblabel) at (-1.8,2) {{\scriptsize{\ $w_{11}=100$}}\\[-0ex] \scriptsize{\ $b_1=0$}};}
 {\draw [-,very thick,ublue,rounded corners=0.1em] (-1.5,0) -- (0,0) -- (0,1) -- (1.5,1);}
 \end{scope}
 \end{tikzpicture}

--- a/Chapter9/chapter9.tex
+++ b/Chapter9/chapter9.tex
@@ -146,7 +146,7 @@
        \includegraphics[width=8cm]{./Chapter9/Figures/deep-learning.jpg}
    \end{minipage}%
    }
-\caption{特征工程{\small\sffamily\bfseries{vs}}端到端学习\red{（图要换）}}
+\caption{特征工程{\small\sffamily\bfseries{vs}}端到端学习}
 \label{fig:9-2}
 \end {figure}
 %------------------------------------------------------------------------------
@@ -171,13 +171,13 @@

 \subsubsection{2. 深度学习的效果}

-\parinterval 相比于传统的基于特征工程的方法，基于深度学习的模型更加方便、通用，在系统性能上也普遍更优。这里以语言建模任务为例。语言建模的目的是开发一个模型来描述词串出现的可能性（见{\chaptertwo}）。这个任务已经有着很长时间的历史。表\ref{tab:5-1}给出了不同方法在常用的PTB数据集上的困惑度结果 \footnote{困惑度越低标明语言建模的效果越好。} 。传统的$ n$-gram语言模型由于面临维度灾难和数据稀疏问题，最终语言模型的性能并不是很好。而在深度学习模型中，通过引入循环神经网络等结构，所得到的语言模型可以更好地描述序列生成的问题。而最新的基于Transformer架构的语言模型将PPL从最初的178.0 下降到了惊人的35.7。可见深度学习为这个任务带来的进步是巨大的。
+\parinterval 相比于传统的基于特征工程的方法，基于深度学习的模型更加方便、通用，在系统性能上也普遍更优。这里以语言建模任务为例。语言建模的目的是开发一个模型来描述词串出现的可能性（见{\chaptertwo}）。这个任务已经有着很长时间的历史。表\ref{tab:9-1}给出了不同方法在常用的PTB数据集上的困惑度结果 \footnote{困惑度越低表明语言建模的效果越好。} 。传统的$ n$-gram语言模型由于面临维度灾难和数据稀疏问题，最终语言模型的性能并不是很好。而在深度学习模型中，通过引入循环神经网络等结构，所得到的语言模型可以更好地描述序列生成的问题。而最新的基于Transformer架构的语言模型将PPL从最初的178.0 下降到了惊人的35.7。可见深度学习为这个任务带来的进步是巨大的。

 %----------------------------------------------------------------------------------------------------
 \begin{table}[htp]
 \centering
 \caption{不同方法在PTB语言建模任务上的困惑度（PPL）}
-\label{tab:5-1}
+\label{tab:9-1}
 \small
 \begin{tabular}{l | l l l}
 \rule{0pt}{15pt}     模型 & 作者 & 年份 & PPL  \\
@@ -508,18 +508,7 @@ l_p({\vectorn{\emph{x}}}) & = & {\Vert{\vectorn{\emph{x}}}\Vert}_p \nonumber \\
 %----------------------------------------------------------------------------------------
 \subsection{人工神经元和感知机}

-\parinterval 生物学中，神经元是神经系统的基本组成单元。图\ref{fig:9-3}展示了一个生物神经元实例。
-
-%----------------------------------------------
-\begin{figure}[htp]
-\centering
-\includegraphics[width=8cm]{./Chapter9/Figures/biological-neuron.jpg}
-\caption{生物神经元\red{（图要换）}}
-\label{fig:9-3}
-\end{figure}
-%-------------------------------------------
-
-\parinterval 同样，人工神经元是人工神经网络的基本单元。在人们的想象中，人工神经元应该与生物神经元类似。但事实上，二者在形态上是有明显差别的。如图\ref{fig:9-4} 是一个典型的人工神经元，其本质是一个形似$ y=f({\vectorn{\emph{x}}}\cdot {\vectorn{\emph{w}}}+b) $的函数。显而易见，一个神经元主要由$ {\vectorn{\emph{x}}} $，$ {\vectorn{\emph{w}}} $，$ b $，$ f $四个部分构成。其中$ {\vectorn{\emph{x}}} $是一个形如$ (x_1,x_2,\dots,x_n) $ 的实数向量，在一个神经元中担任“输入”的角色。$ {\vectorn{\emph{w}}} $通常被理解为神经元连接的{\small\sffamily\bfseries{权重}}\index{权重}（Weight）\index{Weight}（对于一个人工神经元，权重是一个向量，表示为$ {\vectorn{\emph{w}}} $；对于由多个神经元组成的神经网络，权重是一个矩阵，表示为$ {\vectorn{\emph{W}}} $），其中的每一个元素都对应着一个输入和一个输出，代表着“某输入对某输出的贡献程度”。$ b $被称作偏置（对于一个人工神经元，偏置是一个实数，表示为$b$；对于神经网络中的某一层，偏置是一个向量，表示为${\vectorn{\emph{b}}}$）。$ f $被称作激活函数，用于对输入向量各项加权和后进行某种变换。可见，一个人工神经元的功能是将输入向量与权重矩阵右乘（做内积）后，加上偏置量，经过一个激活函数得到一个标量结果。
+\parinterval 生物学中，神经元是神经系统的基本组成单元。同样，人工神经元是人工神经网络的基本单元。在人们的想象中，人工神经元应该与生物神经元类似。但事实上，二者在形态上是有明显差别的。如图\ref{fig:9-4} 是一个典型的人工神经元，其本质是一个形似$ y=f({\vectorn{\emph{x}}}\cdot {\vectorn{\emph{w}}}+b) $的函数。显而易见，一个神经元主要由$ {\vectorn{\emph{x}}} $，$ {\vectorn{\emph{w}}} $，$ b $，$ f $四个部分构成。其中$ {\vectorn{\emph{x}}} $是一个形如$ (x_1,x_2,\dots,x_n) $ 的实数向量，在一个神经元中担任“输入”的角色。$ {\vectorn{\emph{w}}} $通常被理解为神经元连接的{\small\sffamily\bfseries{权重}}\index{权重}（Weight）\index{Weight}（对于一个人工神经元，权重是一个向量，表示为$ {\vectorn{\emph{w}}} $；对于由多个神经元组成的神经网络，权重是一个矩阵，表示为$ {\vectorn{\emph{W}}} $），其中的每一个元素都对应着一个输入和一个输出，代表着“某输入对某输出的贡献程度”。$ b $被称作偏置（对于一个人工神经元，偏置是一个实数，表示为$b$；对于神经网络中的某一层，偏置是一个向量，表示为${\vectorn{\emph{b}}}$）。$ f $被称作激活函数，用于对输入向量各项加权和后进行某种变换。可见，一个人工神经元的功能是将输入向量与权重矩阵右乘（做内积）后，加上偏置量，经过一个激活函数得到一个标量结果。

 %----------------------------------------------
 \begin{figure}[htp]
@@ -806,7 +795,7 @@ x_1\cdot w_1+x_2\cdot w_2+x_3\cdot w_3 & = & 0\cdot 1+0\cdot 1+1\cdot 1 \nonumbe
 \parinterval 众所周知，单层神经网络无法解决线性不可分问题，比如经典的异或问题。但是具有一个隐藏层的两层神经网络在理论上就可以拟合所有的函数了。接下来我们分析一下为什么仅仅是多了一层，神经网络就能变得如此强大。对于二维空间（平面），“拟合”是指：把平面上一系列的点，用一条光滑的曲线连接起来，并用函数来表示这条拟合的曲线。这个概念可以推广到更高维空间上。在用神经网络解决问题时，可以通过拟合训练数据中的“ 数据点”来获得输入与输出之间的函数关系，并利用其对未知数据做出判断。可以假设输入与输出之间存在一种函数关系，而神经网络的“拟合”是要尽可能地逼近原函数输出值，与原函数输出值越逼近，则意味着拟合得越好。


-\parinterval 如图\ref{fig:9-18}是一个以Sigmoid作为隐藏层激活函数的两层神经网络。通过调整参数$ {\vectorn{\emph{W}}}_1=(w_{11},w_{12}) $，$ {\vectorn{\emph{b}}}=(b_1,b_2) $和$ {\vectorn{\emph{W}}}_2=(w_{21},w_{22}) $ 的值，可以不断地改变目标函数的形状。
+\parinterval 如图\ref{fig:9-18}是一个以Sigmoid作为隐藏层激活函数的两层神经网络。通过调整参数$ {\vectorn{\emph{W}}}^{[1]}=(w_{11},w_{12}) $，$ {\vectorn{\emph{b}}}=(b_1,b_2) $和$ {\vectorn{\emph{W}}}^{[2]}={(w'_{11},w'_{21})}^{\textrm{T}} $的值，可以不断地改变目标函数的形状。


 %----------------------------------------------
@@ -818,7 +807,7 @@ x_1\cdot w_1+x_2\cdot w_2+x_3\cdot w_3 & = & 0\cdot 1+0\cdot 1+1\cdot 1 \nonumbe
 \end{figure}
 %-------------------------------------------

-\parinterval 设置$ w_{21}=1 $，$ w_{11}=1 $，$ b_1=0 $，其他参数设置为0。可以得到如图\ref{fig:9-19}(a)所示的目标函数，此时目标函数还是比较平缓的。通过调大$ w_{11} $，可以将图\ref{fig:9-19}(a) 中函数的坡度调得更陡：当$ w_{11}=10 $时，如图\ref{fig:9-19}(b)所示，目标函数的坡度与图\ref{fig:9-19}(a)相比变得更陡了；当$ w_{11}=100 $时，如图\ref{fig:9-19}(c)所示,目标函数的坡度变得更陡、更尖锐，已经逼近一个阶梯函数。
+\parinterval 设置$ w'_{11}=1 $，$ w_{11}=1 $，$ b_1=0 $，其他参数设置为0。可以得到如图\ref{fig:9-19}(a)所示的目标函数，此时目标函数还是比较平缓的。通过调大$ w_{11} $，可以将图\ref{fig:9-19}(a) 中函数的坡度调得更陡：当$ w_{11}=10 $时，如图\ref{fig:9-19}(b)所示，目标函数的坡度与图\ref{fig:9-19}(a)相比变得更陡了；当$ w_{11}=100 $时，如图\ref{fig:9-19}(c)所示,目标函数的坡度变得更陡、更尖锐，已经逼近一个阶梯函数。

 %----------------------------------------------
 \begin{figure}[htp]
@@ -830,7 +819,7 @@ x_1\cdot w_1+x_2\cdot w_2+x_3\cdot w_3 & = & 0\cdot 1+0\cdot 1+1\cdot 1 \nonumbe
 %-------------------------------------------


-\parinterval 设置$ w_{21}=1 $，$ w_{11}=100 $，$ b_1=0 $，其他参数设置为0。可以得到如图\ref{fig:9-20}(a)所示的目标函数，此时目标函数是一个阶梯函数，其“阶梯”恰好与y轴重合。通过改变$ b_1 $，可以将整个函数沿x轴向左右平移：当$ b_1=-2 $时，如图\ref{fig:9-20}(b)所示，与图\ref{fig:9-20}(a)相比目标函数的形状没有发生改变，但其位置沿x轴向右平移；当$ b_1=-4 $时，如图\ref{fig:9-20}(c)所示，目标函数的位置继续沿x轴向右平移。
+\parinterval 设置$ w'_{11}=1 $，$ w_{11}=100 $，$ b_1=0 $，其他参数设置为0。可以得到如图\ref{fig:9-20}(a)所示的目标函数，此时目标函数是一个阶梯函数，其“阶梯”恰好与y轴重合。通过改变$ b_1 $，可以将整个函数沿x轴向左右平移：当$ b_1=-2 $时，如图\ref{fig:9-20}(b)所示，与图\ref{fig:9-20}(a)相比目标函数的形状没有发生改变，但其位置沿x轴向右平移；当$ b_1=-4 $时，如图\ref{fig:9-20}(c)所示，目标函数的位置继续沿x轴向右平移。

 %----------------------------------------------
 \begin{figure}[htp]
@@ -841,7 +830,7 @@ x_1\cdot w_1+x_2\cdot w_2+x_3\cdot w_3 & = & 0\cdot 1+0\cdot 1+1\cdot 1 \nonumbe
 \end {figure}
 %-------------------------------------------

-\parinterval 设置$ w_{21}=1 $，$ w_{11}=100 $，$ b_1=-4 $，其他参数设置为0。可以得到如图\ref{fig:9-21}\\(a)所示的目标函数，此时目标函数是一个阶梯函数，该阶梯函数取得最大值的分段处为$ y=1 $。 通过改变$ w_{21} $，可以将目标函数“拉高”或是“压扁”。如图\ref{fig:9-21}(b)和(c)所示,目标函数变得 “扁”了。最终，该阶梯函数取得最大值的分段处约为$ y=0.7 $。
+\parinterval 设置$ w'_{11}=1 $，$ w_{11}=100 $，$ b_1=-4 $，其他参数设置为0。可以得到如图\ref{fig:9-21}\\(a)所示的目标函数，此时目标函数是一个阶梯函数，该阶梯函数取得最大值的分段处为$ y=1 $。 通过改变$ w'_{11} $，可以将目标函数“拉高”或是“压扁”。如图\ref{fig:9-21}(b)和(c)所示,目标函数变得 “扁”了。最终，该阶梯函数取得最大值的分段处约为$ y=0.7 $。

 %----------------------------------------------
 \begin{figure}[htp]
@@ -852,7 +841,7 @@ x_1\cdot w_1+x_2\cdot w_2+x_3\cdot w_3 & = & 0\cdot 1+0\cdot 1+1\cdot 1 \nonumbe
 \end {figure}
 %-------------------------------------------

-\parinterval 设置$ w_{21}=0.7 $，$ w_{11}=100 $，$ b_1=-4 $，其他参数设置为0。可以得到如图\ref{fig:9-22}(a)所示的目标函数，此时目标函数是一个阶梯函数。若是将其他参数设置为$ w_{22}=0.7 $，$ w_{12}=100 $，$ b_2=16 $，由图\ref{fig:9-22}(b)可以看出，原来目标函数的“阶梯”由一级变成了两级，由此可以推测，将第二组参数进行设置，可以使目标函数分段数增多；若将第二组参数中的$ w_{22} $由原来的$ 0.7 $设置为$ -0.7 $，可得到如图\ref{fig:9-22}(c)所示的目标函数，与图\ref{fig:9-22}(b)相比，原目标函数的“第二级阶梯”向下翻转，由此可见${\vectorn{\emph{W}}}_2$的符号决定了目标函数的翻转方向。
+\parinterval 设置$ w'_{11}=0.7 $，$ w_{11}=100 $，$ b_1=-4 $，其他参数设置为0。可以得到如图\ref{fig:9-22}(a)所示的目标函数，此时目标函数是一个阶梯函数。若是将其他参数设置为$ w'_{21}=0.7 $，$ w_{12}=100 $，$ b_2=-6 $，由图\ref{fig:9-22}(b)可以看出，原来目标函数的“阶梯”由一级变成了两级，由此可以推测，将第二组参数进行设置，可以使目标函数分段数增多；若将第二组参数中的$ w'_{21} $由原来的$ 0.7 $设置为$ -0.7 $，可得到如图\ref{fig:9-22}(c)所示的目标函数，与图\ref{fig:9-22}(b)相比，原目标函数的“第二级阶梯”向下翻转，由此可见${\vectorn{\emph{W}}}^{[2]}$的符号决定了目标函数的翻转方向。

 %----------------------------------------------
 \begin{figure}[htp]
@@ -1053,13 +1042,13 @@ f(x)=\begin{cases} 0 & x\le 0 \\x & x>0\end{cases}

 \parinterval 实现神经网络的开源系统有很多，比如，使用经典的Python工具包Numpy。也可以使用成熟的深度学习框架，比如，Tensorflow和Pytorch就是非常受欢迎的深度学习工具包，除此之外还有很多其他优秀的框架：CNTK、MXNet、PaddlePaddle、\\Keras、Chainer、dl4j、NiuTensor等。开发者可以根据自身的喜好和开发项目的要求选择所采用的框架。

-\parinterval NiuTensor是一个面向自然语言处理任务的张量库，它支持丰富的张量计算接口，如张量的声明、定义和张量的各种代数运算，各种单元算子，如$ + $、$ - $、$ \ast $、$ / $、Log（取对数）、Exp（指数运算）、Power（幂方运算）、Absolute（绝对值）等，还有Sigmoid、Softmax等激活函数，除了上述单元算子外。NiuTensor还支持张量之间的高阶运算，其中最常用的是矩阵乘法。表\ref{tab:5-2}展示了一些NiuTensor支持的其他函数操作。
+\parinterval NiuTensor是一个面向自然语言处理任务的张量库，它支持丰富的张量计算接口，如张量的声明、定义和张量的各种代数运算，各种单元算子，如$ + $、$ - $、$ \ast $、$ / $、Log（取对数）、Exp（指数运算）、Power（幂方运算）、Absolute（绝对值）等，还有Sigmoid、Softmax等激活函数。除了上述单元算子外，NiuTensor还支持张量之间的高阶运算，其中最常用的是矩阵乘法。表\ref{tab:9-2}展示了一些NiuTensor支持的其他函数操作。

 %--------------------------------------------------------------------
 \begin{table}[htp]
 \centering
 \caption{NiuTensor支持的部分函数}
-\label{tab:5-2}
+\label{tab:9-2}
 \small
 \begin{tabular}{l | l}
 \rule{0pt}{15pt}     函数 & 描述  \\
@@ -1099,7 +1088,7 @@ f(x)=\begin{cases} 0 & x\le 0 \\x & x>0\end{cases}
 \begin{figure}[htp]
 \centering
 \input{./Chapter9/Figures/fig-weather}
-\caption{判断穿衣指数问题的神经网络过程\red{（图需要改）}}
+\caption{判断穿衣指数问题的神经网络过程}
 \label{fig:9-37}
 \end{figure}
 %-------------------------------------------
@@ -1165,13 +1154,13 @@ y&=&{\textrm{Sigmoid}}({\textrm{Tanh}}({\vectorn{\emph{x}}}\cdot {\vectorn{\emph

 \parinterval 通常，可以通过设计{\small\sffamily\bfseries{损失函数}}\index{损失函数}（Loss Function）\index{Loss Function}来度量正确答案$ \widetilde{\vectorn{\emph{y}}}_i $和神经网络输出$ {\vectorn{\emph{y}}}_i $之间的偏差。而这个损失函数往往充当训练的{\small\sffamily\bfseries{目标函数}}\index{目标函数}（Objective Function）\index{Objective Function}，神经网络训练就是通过不断调整神经网络内部的参数而使损失函数最小化。图\ref{fig:9-42}展示了一个绝对值损失函数的实例。

-\parinterval 这里用$ Loss(\widetilde{\vectorn{\emph{y}}}_i,{\vectorn{\emph{y}}}_i) $表示网络输出$ {\vectorn{\emph{y}}}_i $相对于答案$ \widetilde{\vectorn{\emph{y}}}_i $的损失，简记为$ L $。表\ref{tab:5-3}是几种常见损失函数的定义。需要注意的是，没有一种损失函数可以适用于所有的问题。损失函数的选择取决于许多因素，包括：数据中是否有离群点、模型结构的选择、是否易于找到函数的导数以及预测结果的置信度等。对于相同的神经网络，不同的损失函数会对训练得到的模型产生不同的影响。对于新的问题，如果无法找到已有的、适合于该问题的损失函数，研究人员也可以自定义损失函数。因此设计新的损失函数也是神经网络中有趣的研究方向。
+\parinterval 这里用$ Loss(\widetilde{\vectorn{\emph{y}}}_i,{\vectorn{\emph{y}}}_i) $表示网络输出$ {\vectorn{\emph{y}}}_i $相对于答案$ \widetilde{\vectorn{\emph{y}}}_i $的损失，简记为$ L $。表\ref{tab:9-3}是几种常见损失函数的定义。需要注意的是，没有一种损失函数可以适用于所有的问题。损失函数的选择取决于许多因素，包括：数据中是否有离群点、模型结构的选择、是否易于找到函数的导数以及预测结果的置信度等。对于相同的神经网络，不同的损失函数会对训练得到的模型产生不同的影响。对于新的问题，如果无法找到已有的、适合于该问题的损失函数，研究人员也可以自定义损失函数。因此设计新的损失函数也是神经网络中有趣的研究方向。

 %--------------------------------------------------------------------
 \begin{table}[htp]
 \centering
 \caption{常见的损失函数}
-\label{tab:5-3}
+\label{tab:9-3}
 \small
 \begin{tabular}{l | l l}
 \rule{0pt}{15pt}     名称 & 定义 & 应用  \\
@@ -1327,13 +1316,13 @@ J({\bm \theta})&=&\frac{1}{m}\sum_{i=j}^{j+m-1}{L({\vectorn{\emph{x}}}_i,\wideti

 \parinterval 顾名思义，符号微分就是通过建立符号表达式求解微分的方法：借助符号表达式和求导公式，推导出目标函数关于自变量的微分表达式，最后再带入具体数值得到微分结果。例如，对于表达式$ L({\bm \theta})={\vectorn{\emph{x}}}\cdot {\bm \theta}+2{\bm \theta}^2 $，可以手动推导出微分表达式$ \frac{\partial L({\bm \theta})}{\partial {\bm \theta}}=\vectorn{\emph{x}}+4{\bm \theta}  $，最后将具体数值$ \vectorn{\emph{x}} = {(\begin{array}{cc} 2 & -3\end{array})} $和$ {\bm \theta} = {(\begin{array}{cc} -1 & 1\end{array})} $带入后，得到微分结果$\frac{\partial L({\bm \theta})}{\partial {\bm \theta}}= {(\begin{array}{cc} 2 & -3\end{array})}+4{(\begin{array}{cc} -1 & 1\end{array})}= {(\begin{array}{cc} -2 & 1\end{array})}$。

-\parinterval  使用这种求梯度的方法，要求必须将目标函数转化成一种完整的数学表达式，这个过程中存在{\small\bfnew{表达式膨胀}}\index{表达式膨胀}（Expression Swell）\index{Expression Swell}的问题，很容易导致符号微分求解的表达式急速“膨胀”，大大增加系统存储和处理表达式的负担。关于这个问题的一个实例请看表\ref{tab:5-4}。在深层的神经网络中，神经元数量和参数量极大，损失函数的表达式会非常冗长，不易存储和管理，而且，仅仅写出损失函数的微分表达式就是一个很庞大的工作量。从另一方面来说，这里真正需要的是微分的结果值，而不是微分表达式，推导微分表达式仅仅是求解过程中的中间产物。
+\parinterval  使用这种求梯度的方法，要求必须将目标函数转化成一种完整的数学表达式，这个过程中存在{\small\bfnew{表达式膨胀}}\index{表达式膨胀}（Expression Swell）\index{Expression Swell}的问题，很容易导致符号微分求解的表达式急速“膨胀”，大大增加系统存储和处理表达式的负担。关于这个问题的一个实例请看表\ref{tab:9-4}。在深层的神经网络中，神经元数量和参数量极大，损失函数的表达式会非常冗长，不易存储和管理，而且，仅仅写出损失函数的微分表达式就是一个很庞大的工作量。从另一方面来说，这里真正需要的是微分的结果值，而不是微分表达式，推导微分表达式仅仅是求解过程中的中间产物。

 %--------------------------------------------------------------------
 \begin{table}[htp]
 \centering
 \caption{符号微分的表达式随函数的规模增加而膨胀}
-\label{tab:5-4}
+\label{tab:9-4}
 \small
 \begin{tabular}{l | l l}
 \rule{0pt}{18pt}     函数 & 微分表达式 & 化简的微分表达式  \\
@@ -2190,6 +2179,6 @@ Jobs was the CEO of {\red{\underline{apple}}}.
 \vspace{0.5em}
 \item 为了进一步提高神经语言模型性能，除了改进模型，还可以在模型中引入新的结构或是其他有效信息，该领域也有很多典型工作值得关注。例如在神经语言模型中引入除了词嵌入以外的单词特征，如语言特征（形态、语法、语义特征等）\upcite{Wu2012FactoredLM,Adel2015SyntacticAS}、上下文信息\upcite{mikolov2012context,Wang2015LargerContextLM}、知识图谱等外部知识\upcite{Ahn2016ANK}；或是在神经语言模型中引入字符级信息，将其作为字符特征单独\upcite{Kim2016CharacterAwareNL,Hwang2017CharacterlevelLM}或与单词特征一起\upcite{Onoe2016GatedWR,Verwimp2017CharacterWordLL}送入模型中；在神经语言模型中引入双向模型也是一种十分有效的尝试，在单词预测时可以同时利用来自过去和未来的文本信息\upcite{Graves2013HybridSR,bahdanau2014neural,Peters2018DeepCW}；在神经语言模型中引入注意力机制能够明显提高模型性能，\ref{sec:9.5.2.2}节对此有简短介绍，除了Transformer模型，GPT\upcite{radford2018improving}和BERT\upcite{devlin2019bert}也是不错的工作。
 \vspace{0.5em}
-\item 词嵌入是自然语言处理近些年的重要进展。所谓“嵌入”是一类方法，理论上，把一个事物进行分布式表示的过程都可以被看作是广义上的“嵌入”。基于这种思想的表示学习也成为了自然语言处理中的前沿方法。比如，如何对树结构，甚至图结构进行分布式表示\upcite{DBLP:journals/corr/abs-1809-01854,Yin2018StructVAETL,Aharoni2017TowardsSN}成为了分析自然语言的重要方法。此外，除了语言建模，还有很多方式可以进行词嵌入的学习，比如，SENNA\upcite{collobert2011natural}、word2vec\upcite{DBLP:journals/corr/abs-1301-3781}\upcite{mikolov2013distributed}、Glove\upcite{DBLP:conf/emnlp/PenningtonSM14}、CoVe\upcite{mccann2017learned} 等。
+\item 词嵌入是自然语言处理近些年的重要进展。所谓“嵌入”是一类方法，理论上，把一个事物进行分布式表示的过程都可以被看作是广义上的“嵌入”。基于这种思想的表示学习也成为了自然语言处理中的前沿方法。比如，如何对树结构，甚至图结构进行分布式表示\upcite{DBLP:journals/corr/abs-1809-01854,Yin2018StructVAETL,Aharoni2017TowardsSN}成为了分析自然语言的重要方法。此外，除了语言建模，还有很多方式可以进行词嵌入的学习，比如，SENNA\upcite{collobert2011natural}、word2vec\upcite{DBLP:journals/corr/abs-1301-3781,mikolov2013distributed}、Glove\upcite{DBLP:conf/emnlp/PenningtonSM14}、CoVe\upcite{mccann2017learned} 等。
 \vspace{0.5em}
 \end{itemize}
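
Reviewer note on the renamed weights. The `chapter9.tex` hunks that introduce $w'_{11}$ and $w'_{21}$ describe how a two-layer network with a Sigmoid hidden layer builds a staircase function. The sketch below is a quick numerical check of that description, not code from the book. It assumes a scalar input, a linear output layer with no output bias, and the parameter values labelled in `fig-w2.tex` ($w_{11}=100$, $b_1=-4$, $w_{12}=100$, $b_2=-6$). The names `sigmoid`, `two_layer`, `wp11` and `wp21` are hypothetical, chosen only for this illustration.

```python
import math

def sigmoid(z):
    # Numerically stable logistic function: never exponentiates a large positive number.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def two_layer(x, w11=0.0, b1=0.0, w12=0.0, b2=0.0, wp11=0.0, wp21=0.0):
    # Hidden layer: two Sigmoid units driven by the same scalar input x.
    h1 = sigmoid(w11 * x + b1)
    h2 = sigmoid(w12 * x + b2)
    # Output layer (assumed linear, no bias): weighted sum with w'_{11} and w'_{21}.
    return wp11 * h1 + wp21 * h2

xs = [-1.0, 0.05, 1.0]

# Panel (b) of fig-w2: both output weights are 0.7, giving a two-level staircase.
staircase = [two_layer(x, w11=100, b1=-4, w12=100, b2=-6, wp11=0.7, wp21=0.7) for x in xs]

# Panel (c) of fig-w2: w'_{21} = -0.7 flips the second step back down.
flipped = [two_layer(x, w11=100, b1=-4, w12=100, b2=-6, wp11=0.7, wp21=-0.7) for x in xs]

print(staircase)  # approximately 0, 0.7, 1.4
print(flipped)    # first and last values are approximately 0
```

Each hidden unit switches near $x=-b/w$, so with these values the two steps sit at about $x=0.04$ and $x=0.06$. The step positions drawn in the TikZ figures (0.5 and 0.7) are schematic rather than to scale.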