Merge branch 'caorunzhe' into 'master'

Caorunzhe See merge request !436

Commit faa55b43 authored Nov 19, 2020 by 曹润柘
--- a/Chapter9/Figures/figure-activate.tex
+++ b/Chapter9/Figures/figure-activate.tex
@@ -6,7 +6,7 @@
 \foreach \x in {-1.0,-0.5,0.0,0.5,1.0}{\draw(\x,0)--(\x,0.05)node[below,outer sep=2pt,font=\scriptsize]at(\x,0){\x};}
 \foreach \y in {1.0,0.5}{\draw(0,\y)--(0.05,\y)node[left,outer sep=2pt,font=\scriptsize]at(0,\y){\y};}
 \draw[color=red ,domain=-1.4:1, line width=1pt]plot(\x,{ln(1+(exp(\x)))});
-\node[black,anchor=south] at (0,1.4) {\small $y = ln(1+e^x)$};
+\node[black,anchor=south] at (0,1.4) {\small $y = \ln(1+{\textrm e}^x)$};
 \node [anchor=south east,inner sep=1pt] (labela) at (0.8,-2) {\footnotesize{(a) Softplus}};
 \end{scope}
@@ -15,13 +15,13 @@
 \draw[->, line width=1pt](-1.4,0)--(1.4,0)node[left,below,font=\scriptsize]{$x$};
 \draw[->, line width=1pt](0,-1.4)--(0,1.4)node[right,font=\scriptsize]{$y$};
-\draw[dashed](-1.4,1)--(1.4,1);
+\draw[dashed](0,1)--(1.4,1);
 \foreach \x in {-1,-0.5,0,0.5,1}{\draw(\x,0)--(\x,0.05)node[below,outer sep=2pt,font=\scriptsize]at(\x,0){
      \pgfmathparse{(\x)*5}
      \pgfmathresult};}
-\foreach \y in {0.5,1.0}{\draw(0,\y)--(0.05,\y)node[left,outer sep=2pt,font=\scriptsize]at(0,\y){\y};}
+\foreach \y in {0.5,1.0}{\draw(0,\y)--(0.05,\y)node[left,outer sep=2pt,font=\scriptsize]at(-0.15,\y){\y};}
 \draw[color=red,domain=-1.4:1.4, line width=1pt]plot(\x,{1/(1+(exp(-5*\x)))});
-\node[black,anchor=south] at (0,1.4) {\small $y = \frac{1}{1+e^{-x}}$};
+\node[black,anchor=south] at (0,1.4) {\small $y = \frac{1}{1+{\textrm e}^{-x}}$};
 \node [anchor=south east,inner sep=1pt] (labelb) at (0.8,-2) {\footnotesize{(b) Sigmoid}};
 \end{scope}
 %%%------------------------------------------------------------------------------------------------------------
@@ -29,12 +29,12 @@
 \begin{scope}[xshift=3.2in]
 \draw[->, line width=1pt](-1.4,0)--(1.4,0)node[left,below,font=\scriptsize]{$x$};
        \draw[->, line width=1pt](0,-1.4)--(0,1.4)node[right,font=\scriptsize]{$y$};
-        \draw[dashed](-1.4,1)--(1.4,1);
+        \draw[dashed](0,1)--(1.4,1);
-        \draw[dashed](-1.4,-1)--(1.4,-1);
+        \draw[dashed](-1.4,-1)--(0,-1);
        \foreach \x in {-1.0,-0.5,0.0,0.5,1.0}{\draw(\x,0)--(\x,0.05)node[below,outer sep=2pt,font=\scriptsize]at(\x,0){\x};}
-        \foreach \y in {0.5,1.0}{\draw(0,\y)--(0.05,\y)node[left,outer sep=2pt,font=\scriptsize]at(0,\y){\y};}
+        \foreach \y in {-1.0,-0.5,0.5,1.0}{\draw(0,\y)--(0.05,\y)node[left,outer sep=2pt,font=\scriptsize]at(0,\y){\y};}
        \draw[color=red ,domain=-1.4:1.4, line width=1pt]plot(\x,{tanh(\x)});
-        \node[black,anchor=south] at (0,1.4) {\small $y = \frac{e^{x}-e^{-x}}{e^{x}+e^{-x}}$};
+        \node[black,anchor=south] at (0,1.4) {\small $y = \frac{{\textrm e}^{x}-{\textrm e}^{-x}}{{\textrm e}^{x}+{\textrm e}^{-x}}$};
 \node [anchor=south east,inner sep=1pt] (labelc) at (0.8,-2) {\footnotesize{(c) Tanh}};
 \end{scope}
@@ -43,8 +43,6 @@
 \begin{scope}[yshift=-1.7in]
  \draw[->, line width=1pt](-1.4,0)--(1.4,0)node[left,below,font=\scriptsize]{$x$};
        \draw[->, line width=1pt](0,-1.4)--(0,1.4)node[right,font=\scriptsize]{$y$};
-        \draw[dashed](-1.4,1)--(1.4,1);
-        \draw[dashed](-1.4,-1)--(1.4,-1);
        \foreach \x in {-1.0,-0.5,0.0,0.5,1.0}{\draw(\x,0)--(\x,0.05)node[below,outer sep=2pt,font=\scriptsize]at(\x,0){\x};}
        \foreach \y in {0.5,1.0}{\draw(0,\y)--(0.05,\y)node[left,outer sep=2pt,font=\scriptsize]at(0,\y){\y};}
        \draw[color=red ,domain=-1.4:1.4, line width=1pt]plot(\x,{max(\x,0)});
@@ -56,9 +54,8 @@
 \begin{scope}[yshift=-1.7in,xshift=1.6in]
        \draw[->, line width=1pt](-1.4,0)--(1.4,0)node[left,below,font=\scriptsize]{$x$};
        \draw[->, line width=1pt](0,-1.4)--(0,1.4)node[right,font=\scriptsize]{$y$};
-        \draw[dashed](-1.4,1)--(1.4,1);
        \foreach \x in {-1.0,-0.5,0.0,0.5,1.0}{\draw(\x,0)--(\x,0.05)node[below,outer sep=2pt,font=\scriptsize]at(\x,0){\x};}
-        \foreach \y in {0.5,1.0}{\draw(0,\y)--(0.05,\y)node[left,outer sep=2pt,font=\scriptsize]at(0,\y){\y};}
+        \foreach \y in {0.5,1.0}{\draw(0,\y)--(0.05,\y)node[left,outer sep=2pt,font=\scriptsize]at(-0.15,\y){\y};}
        \draw[color=red ,domain=-1.4:1.4, line width=1pt]plot(\x,{exp(-1*((\x)^2))});
        \node[black,anchor=south] at (0,1.4) {\small $y =e^{-x^2}$};
 \node [anchor=south east,inner sep=1pt] (labele) at (0.8,-2) {\footnotesize{(e) Gaussian}};

--- a/Chapter9/Figures/figure-broadcast.tex
+++ b/Chapter9/Figures/figure-broadcast.tex
@@ -2,10 +2,10 @@
 \begin{tikzpicture}
 \begin{scope}[xshift=0.6in]
 \setcounter{mycount1}{1}
-\draw[step=0.5cm,color=orange,thick] (-1,-0.5) grid (1,0.5);
+\draw[step=0.5cm,color=orange!70,thick] (-1,-0.5) grid (1,0.5);
 \foreach \y in {+0.25,-0.25}
  \foreach \x in {-0.75,-0.25,0.25,0.75}{
-    \node [fill=orange!20,inner sep=0pt,minimum height=0.49cm,minimum width=0.49cm] at (\x,\y) {$\number\value{mycount1}$};
+    \node [fill=orange!15,inner sep=0pt,minimum height=0.49cm,minimum width=0.49cm] at (\x,\y) {$\number\value{mycount1}$};
    \addtocounter{mycount1}{1};
  }
 \node [anchor=south] (varlabel) at (0,0.6) {$\mathbi{s}$};
@@ -14,10 +14,10 @@
 \begin{scope}[xshift=2.1in]
 \setcounter{mycount1}{1}
-\draw[step=0.5cm,color=ugreen,thick] (-1,-0.5) grid (1,0);
+\draw[step=0.5cm,color=ugreen!70,thick] (-1,-0.5) grid (1,0);
 \foreach \y in {-0.25}
  \foreach \x in {-0.75,-0.25,0.25,0.75}{
-    \node [fill=green!20,inner sep=0pt,minimum height=0.48cm,minimum width=0.48cm] at (\x,\y) {$1$};
+    \node [fill=green!15,inner sep=0pt,minimum height=0.48cm,minimum width=0.48cm] at (\x,\y) {$1$};
    \addtocounter{mycount1}{1};
  }
 \node [anchor=south] (varlabel) at (0,0.1) {$\mathbi{b}$};
@@ -28,25 +28,25 @@
 \begin{scope}[yshift=-1in]
 \setcounter{mycount1}{1}
-\draw[step=0.5cm,color=orange,thick] (-1,-0.5) grid (1,0.5);
+\draw[step=0.5cm,color=orange!70,thick] (-1,-0.5) grid (1,0.5);
 \foreach \y in {+0.25,-0.25}
  \foreach \x in {-0.75,-0.25,0.25,0.75}{
-    \node [fill=orange!20,inner sep=0pt,minimum height=0.49cm,minimum width=0.49cm] at (\x,\y) {$\number\value{mycount1}$};
+    \node [fill=orange!15,inner sep=0pt,minimum height=0.49cm,minimum width=0.49cm] at (\x,\y) {$\number\value{mycount1}$};
    \addtocounter{mycount1}{1};
  }
 \node [anchor=south] (varlabel) at (0,0.6) {$\mathbi{s}$};
 \end{scope}
 \begin{scope}[yshift=-1in,xshift=1.5in]
 \setcounter{mycount1}{1}
-\draw[step=0.5cm,color=ugreen,thick] (-1,-0.5) grid (1,0.5);
+\draw[step=0.5cm,color=ugreen!70,thick] (-1,-0.5) grid (1,0.5);
 \foreach \y in {+0.25}
  \foreach \x in {-0.75,-0.25,0.25,0.75}{
-    \node [fill=green!20,inner sep=0pt,minimum height=0.49cm,minimum width=0.49cm] at (\x,\y) {$1$};
+    \node [fill=green!15,inner sep=0pt,minimum height=0.49cm,minimum width=0.49cm] at (\x,\y) {$1$};
    \addtocounter{mycount1}{1};
  }
  \foreach \y in {-0.25}
  \foreach \x in {-0.75,-0.25,0.25,0.75}{
-    \node [fill=purple!20,inner sep=0pt,minimum height=0.49cm,minimum width=0.49cm] at (\x,\y) {$1$};
+    \node [fill=purple!15,inner sep=0pt,minimum height=0.49cm,minimum width=0.49cm] at (\x,\y) {$1$};
    \addtocounter{mycount1}{1};
  }
 \node [anchor=center] (plabel) at (-4.5em,0) {\huge{$\mathbf{+}$}};
@@ -55,10 +55,10 @@
 \end{scope}
 \begin{scope}[yshift=-1in,xshift=3in]
 \setcounter{mycount1}{2}
-\draw[step=0.5cm,color=orange,thick] (-1,-0.5) grid (1,0.5);
+\draw[step=0.5cm,color=orange!70,thick] (-1,-0.5) grid (1,0.5);
 \foreach \y in {+0.25,-0.25}
  \foreach \x in {-0.75,-0.25,0.25,0.75}{
-    \node [fill=orange!20,inner sep=0pt,minimum height=0.49cm,minimum width=0.49cm] at (\x,\y) {$\number\value{mycount1}$};
+    \node [fill=orange!15,inner sep=0pt,minimum height=0.49cm,minimum width=0.49cm] at (\x,\y) {$\number\value{mycount1}$};
    \addtocounter{mycount1}{1};
  }
 \node [anchor=center] (plabel) at (-4.5em,0) {\huge{$\mathbf{=}$}};

--- a/Chapter9/Figures/figure-different-forms-of-neuronal-input.tex
+++ b/Chapter9/Figures/figure-different-forms-of-neuronal-input.tex
@@ -2,29 +2,30 @@
 \begin{tikzpicture}
 \begin{scope}
-\draw [->,thick] (0,0) -- (2.5,0);
+\draw [->,thick] (0,0) -- (3.1,0);
-\draw [->,thick] (0,0) -- (0, 1.5);
+\draw [->,thick] (0,0) -- (0, 2.1);
-\draw [-,very thick,ublue,domain=0.6:2,samples=100] plot (\x,{ 1/\x - 0.2});
+\draw [-,very thick,ublue,domain=0.55:2.6,samples=100] plot (\x,{ 1/\x - 0.2});
-\node [anchor=east] (ylabel) at (0, 3.2em) {\footnotesize{$x_1$}};
+\node [anchor=east] (ylabel) at (0, 4.4em) {\footnotesize{$x_1$}};
-\node [anchor=north] (xlabel) at (5em, 0em) {\scriptsize{距离(km)}};
+\node [anchor=north] (xlabel) at (6.5em, 0em) {\scriptsize{距离(km)}};
 \end{scope}
-\begin{scope}[xshift=9em]
+\begin{scope}[xshift=10em]
-\draw [->,thick] (0,0) -- (2.5,0);
+\draw [->,thick] (0,0) -- (3.1,0);
-\draw [->,thick] (0,0) -- (0, 1.5);
+\draw [->,thick] (0,0) -- (0, 2.1);
-\draw [-,very thick,ublue,domain=0.4:2,samples=100] plot (\x,{ 0.5/\x});
+\draw [-,very thick,ublue,domain=0.3:2.6,samples=100] plot (\x,{ 0.5/\x});
-\node [anchor=east] (ylabel) at (0, 3.2em) {\footnotesize{$x_2$}};
+\node [anchor=east] (ylabel) at (0, 4.4em) {\footnotesize{$x_2$}};
-\node [anchor=north] (xlabel) at (5em, 0em) {\scriptsize{票价(元)}};
+\node [anchor=north] (xlabel) at (6.5em, 0em) {\scriptsize{票价(元)}};
 \end{scope}
-\begin{scope}[xshift=18em]
+\begin{scope}[xshift=20em]
-\draw [->,thick] (0,0) -- (2.5,0);
+\draw [->,thick] (0,0) -- (3.1,0);
-\draw [->,thick] (0,0) -- (0, 1.5);
+\draw [->,thick] (0,0) -- (0, 2.1);
-\node [anchor=east] (ylabel) at (0, 3.2em) {\footnotesize{$x_3$}};
+\node [anchor=east] (ylabel) at (0, 4.4em) {\footnotesize{$x_3$}};
 \node [anchor=south, fill=ublue, minimum width=1.5em, minimum height=0.1em, inner sep=0] (histogram1) at (1.5em, 0) {};
 \node [anchor=south, fill=ublue, minimum width=1.5em, minimum height=3em, inner sep=0] (histogram2) at (4.0em, 0) {};
-\node [anchor=north] (hlabel1) at (histogram1.south) {\tiny{女友不去}};
+\node [anchor=north] (hlabel1) at (histogram1.south) {\tiny{不喜欢}};
-\node [anchor=north] (hlabel2) at (histogram2.south) {\tiny{女友去}};
+\node [anchor=north] (hlabel2) at (histogram2.south) {\tiny{喜欢}};
+\node [anchor=north] (xlabel) at (6.5em, 0em) {\scriptsize{是否喜欢}};
 \end{scope}
 \end{tikzpicture}

--- a/Chapter9/Figures/figure-embedding.tex
+++ b/Chapter9/Figures/figure-embedding.tex
@@ -8,7 +8,7 @@
 \node [anchor=south] (w1) at (o1.north) {\footnotesize{桌子}};
 \node [anchor=south] (w2) at (o2.north) {\footnotesize{椅子}};
 {
-\node [anchor=south,fill=red!20!white] (cosine) at (w1.north) {\footnotesize{$\textrm{cosine}(\textrm{`桌子'},\textrm{`椅子'})=0.5$}};
+\node [anchor=south,fill=red!20!white] (cosine) at (w1.north) {\footnotesize{$\textrm{cosine}(\textrm{‘桌子’},\textrm{‘椅子’})=0.5$}};
 }
 \end{scope}
 }

--- a/Chapter9/Figures/figure-fit.tex
+++ b/Chapter9/Figures/figure-fit.tex
@@ -44,7 +44,7 @@
 {
 \node [anchor=west] (flabel) at ([xshift=0.5in]y.east) {\scriptsize{Sigmoid:}};
 \node [anchor=north east] (slabel) at ([xshift=0]flabel.south east) {\scriptsize{Sum:}};
-\node [anchor=west,inner sep=2pt] (flabel2) at (flabel.east) {\scriptsize{$f(s_2)=1/(1+e^{-s_2})$}};
+\node [anchor=west,inner sep=2pt] (flabel2) at (flabel.east) {\scriptsize{$f(s_2)=1/(1+{\textrm e}^{-s_2})$}};
 \node [anchor=west,inner sep=2pt] (flabel3) at (slabel.east) {\scriptsize{$s_2=x_1 \cdot w_{12} + b$}};
 \draw [->,thick,dotted] ([yshift=-0.3em,xshift=-0.1em]n11.60)  .. controls +(east:1) and +(west:2) ..  ([xshift=-0.2em]flabel.west) ;
@@ -138,7 +138,7 @@
 {
 \node [anchor=west] (flabel) at ([xshift=0.8in]y.east) {\scriptsize{Sigmoid:}};
 \node [anchor=north east] (slabel) at ([xshift=0]flabel.south east) {\scriptsize{Sum:}};
-\node [anchor=west,inner sep=2pt] (flabel2) at (flabel.east) {\scriptsize{$f(s_2)=1/(1+e^{-s_2})$}};
+\node [anchor=west,inner sep=2pt] (flabel2) at (flabel.east) {\scriptsize{$f(s_2)=1/(1+{\textrm e}^{-s_2})$}};
 \node [anchor=west,inner sep=2pt] (flabel3) at (slabel.east) {\scriptsize{$s_2=x_1 \cdot w_{12} + b$}};
 \draw [->,thick,dotted] ([yshift=-0.3em,xshift=-0.1em]n11.60)  .. controls +(east:1) and +(west:2) ..  ([xshift=-0.2em]flabel.west) ;
 \begin{pgfonlayer}{background}

--- a/Chapter9/Figures/figure-four-layers-of-neural-network.tex
+++ b/Chapter9/Figures/figure-four-layers-of-neural-network.tex
@@ -26,12 +26,12 @@
 \end{pgfonlayer}
 \node [anchor=west] (layer00label) at ([xshift=1.4em]x5.east) {\footnotesize{第0层}};
-\node [anchor=west] (layer00label2) at (layer00label.east) {\footnotesize{(输入层)}};
+\node [anchor=west] (layer00label2) at (layer00label.east) {\footnotesize{（输入层）}};
 {
 \node [anchor=west] (layer01label) at ([xshift=1em]layer01.east) {\footnotesize{第1层}};
 }
 {
-\node [anchor=west] (layer01label2) at (layer01label.east) {\footnotesize{({隐层})}};
+\node [anchor=west] (layer01label2) at (layer01label.east) {\footnotesize{（{隐层}）}};
 }
 %%% layer 2
@@ -57,7 +57,7 @@
 \node [anchor=west] (layer02label) at ([xshift=5em]layer02.east) {\footnotesize{第2层}};
 {
-\node [anchor=west] (layer02label2) at (layer02label.east) {\footnotesize{({隐层})}};
+\node [anchor=west] (layer02label2) at (layer02label.east) {\footnotesize{（{隐层}）}};
 }
 }
@@ -87,7 +87,7 @@
 \node [anchor=west] (layer03label) at ([xshift=1em]layer03.east) {\footnotesize{第3层}};
 {
-\node [anchor=west] (layer03label2) at (layer03label.east) {\footnotesize{({输出层})}};
+\node [anchor=west] (layer03label2) at (layer03label.east) {\footnotesize{（{输出层}）}};
 }
 }

--- a/Chapter9/Figures/figure-one-hot.tex
+++ b/Chapter9/Figures/figure-one-hot.tex
@@ -7,7 +7,7 @@
 \node [anchor=south] (w1) at (o1.north) {\footnotesize{桌子}};
 \node [anchor=south] (w2) at (o2.north) {\footnotesize{椅子}};
 {
-\node [anchor=south,fill=red!20!white] (cosine) at (w1.north) {\footnotesize{$\textrm{cosine}(\textrm{`桌子'},\textrm{`椅子'})=0$}};
+\node [anchor=south,fill=red!20!white] (cosine) at (w1.north) {\footnotesize{$\textrm{cosine}(\textrm{‘桌子’},\textrm{‘椅子’})=0$}};
 }
 \end{scope}

--- a/Chapter9/Figures/figure-parallel.tex
+++ b/Chapter9/Figures/figure-parallel.tex
@@ -22,7 +22,7 @@
 }
 \end{pgfonlayer}
-\tikzstyle{processor} = [draw,thick,fill=orange!20,minimum width=4em,align=left,rounded corners=2pt]
+\tikzstyle{processor} = [draw,thick,fill=orange!15,minimum width=4em,align=left,rounded corners=2pt]
 {
 \node [processor,anchor=north,align=center] (processor2) at ([yshift=-1.2in]serverlabel.south) {\footnotesize{处理器 2}\\\footnotesize{(G2)}};
@@ -47,15 +47,15 @@
 \footnotesize{
 {
-\node[job,anchor=south west,fill=blue!50] (fetch11) at ([xshift=6em,yshift=-0.2em]processor3.east) {\textbf{F}};
+\node[job,anchor=south west,fill=blue!30] (fetch11) at ([xshift=6em,yshift=-0.2em]processor3.east) {\textbf{F}};
-\node[job,anchor=west,fill=orange!30] (minibatch11) at ([yshift=1pt]fetch11.east) {\scriptsize{minibatch3}};
+\node[job,anchor=west,fill=orange!25] (minibatch11) at ([yshift=1pt]fetch11.east) {\scriptsize{minibatch3}};
-\node[job,anchor=west,fill=red!50] (push11) at ([yshift=1pt]minibatch11.east) {\textbf{P}};
+\node[job,anchor=west,fill=red!30] (push11) at ([yshift=1pt]minibatch11.east) {\textbf{P}};
-\node[job,anchor=north west,fill=blue!50] (fetch12) at ([xshift=0.8em]fetch11.south west) {\textbf{F}};
+\node[job,anchor=north west,fill=blue!30] (fetch12) at ([xshift=0.8em]fetch11.south west) {\textbf{F}};
-\node[job,anchor=west,fill=orange!30] (minibatch12) at ([yshift=1pt]fetch12.east) {\scriptsize{minibatch2}};
+\node[job,anchor=west,fill=orange!25] (minibatch12) at ([yshift=1pt]fetch12.east) {\scriptsize{minibatch2}};
-\node[job,anchor=west,fill=red!50] (push12) at ([yshift=1pt]minibatch12.east) {\textbf{P}};
+\node[job,anchor=west,fill=red!30] (push12) at ([yshift=1pt]minibatch12.east) {\textbf{P}};
-\node[job,anchor=north west,fill=blue!50] (fetch13) at ([xshift=0.8em]fetch12.south west) {\textbf{F}};
+\node[job,anchor=north west,fill=blue!30] (fetch13) at ([xshift=0.8em]fetch12.south west) {\textbf{F}};
-\node[job,anchor=west,fill=orange!30,minimum width=8.2em] (minibatch13) at ([yshift=1pt]fetch13.east) {\footnotesize{minibatch1}};
+\node[job,anchor=west,fill=orange!25,minimum width=8.2em] (minibatch13) at ([yshift=1pt]fetch13.east) {\footnotesize{minibatch1}};
-\node[job,anchor=west,fill=red!50] (push13) at ([yshift=1pt]minibatch13.east) {\textbf{P}};
+\node[job,anchor=west,fill=red!30] (push13) at ([yshift=1pt]minibatch13.east) {\textbf{P}};
 \node[anchor=south west,draw,fill=gray!20,minimum width=7.7em] (update11) at ([yshift=3.82em]push11.north east) {更新};
 \node[anchor=north] (G11) at (fetch11.west) {\small{G3}};
@@ -100,7 +100,7 @@
 }
 \end{pgfonlayer}
-\tikzstyle{processor} = [draw,thick,fill=orange!20,minimum width=4em,align=left,rounded corners=2pt]
+\tikzstyle{processor} = [draw,thick,fill=orange!15,minimum width=4em,align=left,rounded corners=2pt]
 {
 \node [processor,anchor=north,align=center] (processor2) at ([yshift=-1.2in]serverlabel.south) {\footnotesize{处理器 2}\\\footnotesize{(G2)}};
@@ -125,15 +125,15 @@
 \footnotesize{
 {
-\node[job,anchor=south west,fill=blue!50] (fetch21) at ([xshift=6em,yshift=-0.3em]processor3.east) {\textbf{F}};
+\node[job,anchor=south west,fill=blue!30] (fetch21) at ([xshift=6em,yshift=-0.3em]processor3.east) {\textbf{F}};
-\node[job,anchor=west,fill=orange!30] (minibatch21) at ([yshift=1pt]fetch21.east) {\scriptsize{minibatch3}};
+\node[job,anchor=west,fill=orange!25] (minibatch21) at ([yshift=1pt]fetch21.east) {\scriptsize{minibatch3}};
-\node[job,anchor=west,fill=red!50] (push21) at ([yshift=1pt]minibatch21.east) {\textbf{P}};
+\node[job,anchor=west,fill=red!30] (push21) at ([yshift=1pt]minibatch21.east) {\textbf{P}};
-\node[job,anchor=north west,fill=blue!50] (fetch22) at ([xshift=0.8em]fetch21.south west) {\textbf{F}};
+\node[job,anchor=north west,fill=blue!30] (fetch22) at ([xshift=0.8em]fetch21.south west) {\textbf{F}};
-\node[job,anchor=west,fill=orange!30] (minibatch22) at ([yshift=1pt]fetch22.east) {\scriptsize{minibatch2}};
+\node[job,anchor=west,fill=orange!25] (minibatch22) at ([yshift=1pt]fetch22.east) {\scriptsize{minibatch2}};
-\node[job,anchor=west,fill=red!50] (push22) at ([yshift=1pt]minibatch22.east) {\textbf{P}};
+\node[job,anchor=west,fill=red!30] (push22) at ([yshift=1pt]minibatch22.east) {\textbf{P}};
-\node[job,anchor=north west,fill=blue!50] (fetch23) at ([xshift=0.8em]fetch22.south west) {\textbf{F}};
+\node[job,anchor=north west,fill=blue!30] (fetch23) at ([xshift=0.8em]fetch22.south west) {\textbf{F}};
-\node[job,anchor=west,fill=orange!30,minimum width=8.25em] (minibatch23) at ([yshift=1pt]fetch23.east) {\footnotesize{minibatch1}};
+\node[job,anchor=west,fill=orange!25,minimum width=8.25em] (minibatch23) at ([yshift=1pt]fetch23.east) {\footnotesize{minibatch1}};
-\node[job,anchor=west,fill=red!50] (push23) at ([yshift=1pt]minibatch23.east) {\textbf{P}};
+\node[job,anchor=west,fill=red!30] (push23) at ([yshift=1pt]minibatch23.east) {\textbf{P}};
 \node[anchor=south west,draw,fill=gray!20,minimum width=0.6in] (update21) at ([yshift=2pt]push21.north east) {更新};
 \node[anchor=south west,draw,fill=gray!20,minimum width=0.25in] (update22) at ([yshift=2.8pt]push23.north east) {\tiny{更新}};

--- a/Chapter9/Figures/figure-piecewise.tex
+++ b/Chapter9/Figures/figure-piecewise.tex
@@ -33,7 +33,7 @@
 \draw [->,thick] (-2.2,0) -- (2.2,0);
 \draw [->,thick] (0,0) -- (0,2);
 \draw [-] (-0.05,1) -- (0.05,1);
-\node [anchor=east,inner sep=1pt] (label1) at (0,1) {\tiny{1}};
+\node [anchor=east,inner sep=1pt] (label1) at (0,1.18) {\tiny{1}};
 \node [anchor=south east,inner sep=1pt] (label2) at (0,0) {\tiny{0}};
 \node [anchor=north,inner sep=1pt] (labela) at (0,-0.2) {\footnotesize{(a)}};
 }

--- a/Chapter9/Figures/figure-rnn-model.tex
+++ b/Chapter9/Figures/figure-rnn-model.tex
@@ -10,7 +10,7 @@
 \node [anchor=north,rnnnode,fill=red!30!white] (e2) at ([yshift=-1.2em]node12.south) {\scriptsize{embedding}};
 \node [anchor=north,rnnnode,fill=red!30!white] (e3) at ([yshift=-1.2em]node13.south) {\scriptsize{embedding}};
 \node [anchor=north,rnnnode,fill=red!30!white] (e4) at ([yshift=-1.2em]node14.south) {\scriptsize{embedding}};
-\node [anchor=north] (w1) at ([yshift=-1em]e1.south) {\footnotesize{乔布斯}};
+\node [anchor=north] (w1) at ([yshift=-1em]e1.south) {\footnotesize{亚伦}};
 \node [anchor=north] (w2) at ([yshift=-1em]e2.south) {\footnotesize{任职}};
 \node [anchor=north] (w3) at ([yshift=-1em]e3.south) {\footnotesize{于}};
 \node [anchor=north] (w4) at ([yshift=-1em]e4.south) {\footnotesize{苹果}};
@@ -31,13 +31,13 @@
 \node [anchor=south,rnnnode] (node24) at ([yshift=1.5em]node14.north) {\scriptsize{RNN Cell}};
 \node [anchor=south] (node31) at ([yshift=1.0em]node21.north) {\scriptsize{的表示}};
-\node [anchor=south west] (node31new) at ([yshift=-0.3em]node31.north west) {\scriptsize{``乔布斯''}};
+\node [anchor=south west] (node31new) at ([yshift=-0.3em]node31.north west) {\scriptsize{“亚伦”}};
 \node [anchor=south] (node32) at ([yshift=1.0em]node22.north) {\scriptsize{的表示\ \ \ }};
-\node [anchor=south west] (node32new) at ([yshift=-0.3em]node32.north west) {\scriptsize{``乔布斯 任职''}};
+\node [anchor=south west] (node32new) at ([yshift=-0.3em]node32.north west) {\scriptsize{“亚伦 任职”}};
 \node [anchor=south] (node33) at ([yshift=1.0em]node23.north) {\scriptsize{的表示\ \ \ \ \ \ \ \ }};
-\node [anchor=south west] (node33new) at ([yshift=-0.3em]node33.north west) {\scriptsize{``乔布斯 任职 于''}};
+\node [anchor=south west] (node33new) at ([yshift=-0.3em]node33.north west) {\scriptsize{“亚伦 任职 于”}};
 \node [anchor=south] (node34) at ([yshift=1.0em]node24.north) {\scriptsize{的表示\ \ \ \ \ \ \ \ }};
-\node [anchor=south west] (node34new) at ([yshift=-0.3em]node34.north west) {\scriptsize{``乔布斯 任职 于 苹果''}};
+\node [anchor=south west] (node34new) at ([yshift=-0.3em]node34.north west) {\scriptsize{“亚伦 任职 于 苹果”}};
 \draw [->,thick] ([yshift=0.1em]node21.north)--([yshift=-0.1em]node31.south);
 \draw [->,thick] ([yshift=0.1em]node22.north)--([yshift=-0.1em]node32.south);
@@ -62,7 +62,7 @@
 \draw [->,thick] ([xshift=0.1em]node14.east)--([xshift=1em]node14.east);
 {
-\node [anchor=south] (toplabel1) at ([yshift=2em,xshift=-1.3em]node32new.north) {\footnotesize{``苹果''的表示：}};
+\node [anchor=south] (toplabel1) at ([yshift=2em,xshift=-1.3em]node32new.north) {\footnotesize{“苹果”的表示：}};
 \node [anchor=west,fill=blue!20!white,minimum width=3em] (toplabel2) at (toplabel1.east) {\footnotesize{上下文}};
 }
 {

--- a/Chapter9/Figures/figure-save.tex
+++ b/Chapter9/Figures/figure-save.tex
@@ -4,7 +4,7 @@
 \setcounter{mycount1}{1}
 \draw[step=0.5cm,thick] (0,-0) grid (1.5,0.5);
 \foreach \x in {0.25,0.75,1.25}{
-    \node [fill=green!20,inner sep=0pt,minimum height=0.49cm,minimum width=0.49cm](vector1) at (\x,0.25) {$\number\value{mycount1}$};
+    \node [fill=green!15,inner sep=0pt,minimum height=0.49cm,minimum width=0.49cm](vector1) at (\x,0.25) {$\number\value{mycount1}$};
    \addtocounter{mycount1}{1};
 }
 \node [anchor=north] (labela) at ([xshift=-1.2em,yshift=-0em]vector1.south) {\footnotesize{(a) }};
@@ -14,11 +14,11 @@
 \draw[step=0.5cm,thick] (0,-0) grid (3.0,0.5);
 \setcounter{mycount2}{1}
 \foreach \x in {0.25,0.75,1.25}{
-    \node [fill=green!20,inner sep=0pt,minimum height=0.49cm,minimum width=0.49cm] (vector2)at (\x,0.25) {$\number\value{mycount2}$};
+    \node [fill=green!15,inner sep=0pt,minimum height=0.49cm,minimum width=0.49cm] (vector2)at (\x,0.25) {$\number\value{mycount2}$};
    \addtocounter{mycount2}{1};
 }
 \foreach \x in {1.75,2.25,2.75}{
-    \node [fill=red!20,inner sep=0pt,minimum height=0.49cm,minimum width=0.49cm] at (\x,0.25) {$\number\value{mycount2}$};
+    \node [fill=red!15,inner sep=0pt,minimum height=0.49cm,minimum width=0.49cm] at (\x,0.25) {$\number\value{mycount2}$};
    \addtocounter{mycount2}{1};
 }
 \node [anchor=north] (labelb) at ([xshift=0.3em,yshift=-0em]vector2.south) {\footnotesize{(b) }};
@@ -28,19 +28,19 @@
 \draw[step=0.5cm,thick] (0,-0) grid (6.0,0.5);
 \setcounter{mycount3}{1}
 \foreach \x in {0.25,0.75,1.25}{
-    \node [fill=green!20,inner sep=0pt,minimum height=0.49cm,minimum width=0.49cm] at (\x,0.25) {$\number\value{mycount3}$};
+    \node [fill=green!15,inner sep=0pt,minimum height=0.49cm,minimum width=0.49cm] at (\x,0.25) {$\number\value{mycount3}$};
    \addtocounter{mycount3}{1};
 }
 \foreach \x in {1.75,2.25,2.75}{
-    \node [fill=red!20,inner sep=0pt,minimum height=0.49cm,minimum width=0.49cm] at (\x,0.25) {$\number\value{mycount3}$};
+    \node [fill=red!15,inner sep=0pt,minimum height=0.49cm,minimum width=0.49cm] at (\x,0.25) {$\number\value{mycount3}$};
    \addtocounter{mycount3}{1};
 }
 \foreach \x in {3.25,3.75,4.25}{
-    \node [fill=green!20,inner sep=0pt,minimum height=0.49cm,minimum width=0.49cm] at (\x,0.25) {$\number\value{mycount3}$};
+    \node [fill=green!15,inner sep=0pt,minimum height=0.49cm,minimum width=0.49cm] at (\x,0.25) {$\number\value{mycount3}$};
    \addtocounter{mycount3}{1};
 }
 \foreach \x in {4.75,5.25,5.75}{
-    \node [fill=red!20,inner sep=0pt,minimum height=0.49cm,minimum width=0.49cm] at (\x,0.25) {$\number\value{mycount3}$};
+    \node [fill=red!15,inner sep=0pt,minimum height=0.49cm,minimum width=0.49cm] at (\x,0.25) {$\number\value{mycount3}$};
    \addtocounter{mycount3}{1};
 }
 \draw[decorate,thick,decoration={brace,mirror,raise=0.2em}] (0,-0.2) -- (2.95,-0.2);

--- a/Chapter9/Figures/figure-softmax.tex
+++ b/Chapter9/Figures/figure-softmax.tex
@@ -14,10 +14,10 @@
 % ymajorgrids,
  %xmajorgrids,
 axis x line*=bottom,
-  xmin=-6,
+  xmin=-6.4,
-  xmax=6,
+  xmax=6.4,
  ymin=0,
-  ymax=1]
+  ymax=1.2]
 \addplot[draw=ublue,very thick]{(tanh(x/2) + 1)/2};
 \end{axis}
 \end{tikzpicture}

--- a/Chapter9/Figures/figure-tensor-mul.tex
+++ b/Chapter9/Figures/figure-tensor-mul.tex
@@ -4,10 +4,10 @@
 \begin{scope}[yshift=6.5em,xshift=1em]
 {
 \setcounter{mycount1}{1}
-\draw[step=0.5cm,color=orange,thick] (-1,-1) grid (1,1);
+\draw[step=0.5cm,color=orange!70,thick] (-1,-1) grid (1,1);
 \foreach \y in {+0.75,+0.25,-0.25,-0.75}
  \foreach \x in {-0.75,-0.25,0.25,0.75}{
-    \node [fill=orange!20,inner sep=0pt,minimum height=0.49cm,minimum width=0.49cm] at (\x,\y) {$\number\value{mycount1}$};
+    \node [fill=orange!15,inner sep=0pt,minimum height=0.49cm,minimum width=0.49cm] at (\x,\y) {$\number\value{mycount1}$};
    \addtocounter{mycount1}{1};
  }
 }
@@ -17,10 +17,10 @@
 \begin{scope}[yshift=6em,xshift=0.5em]
 {
 \setcounter{mycount2}{2}
-\draw[step=0.5cm,color=blue,thick] (-1,-1) grid (1,1);
+\draw[step=0.5cm,color=blue!70,thick] (-1,-1) grid (1,1);
 \foreach \y in {+0.75,+0.25,-0.25,-0.75}
  \foreach \x in {-0.75,-0.25,0.25,0.75}{
-    \node [fill=blue!20,inner sep=0pt,minimum height=0.49cm,minimum width=0.49cm] at (\x,\y) {$\number\value{mycount2}$};
+    \node [fill=blue!15,inner sep=0pt,minimum height=0.49cm,minimum width=0.49cm] at (\x,\y) {$\number\value{mycount2}$};
    \addtocounter{mycount2}{1};
  }
 }
@@ -30,10 +30,10 @@
 \begin{scope}[yshift=5.5em,xshift=0em]
 {
 \setcounter{mycount3}{3}
-\draw[step=0.5cm,color=ugreen,thick] (-1,-1) grid (1,1);
+\draw[step=0.5cm,color=ugreen!70,thick] (-1,-1) grid (1,1);
 \foreach \y in {+0.75,+0.25,-0.25,-0.75}
  \foreach \x in {-0.75,-0.25,0.25,0.75}{
-    \node [fill=green!20,inner sep=0pt,minimum height=0.49cm,minimum width=0.49cm] at (\x,\y) {$\number\value{mycount3}$};
+    \node [fill=green!15,inner sep=0pt,minimum height=0.49cm,minimum width=0.49cm] at (\x,\y) {$\number\value{mycount3}$};
    \addtocounter{mycount3}{1};
  }
 }
@@ -43,10 +43,10 @@
 \begin{scope}[yshift=5em,xshift=-0.5em]
 {
 \setcounter{mycount4}{4}
-\draw[step=0.5cm,color=red,thick] (-1,-1) grid (1,1);
+\draw[step=0.5cm,color=red!70,thick] (-1,-1) grid (1,1);
 \foreach \y in {+0.75,+0.25,-0.25,-0.75}
  \foreach \x in {-0.75,-0.25,0.25,0.75}{
-    \node [fill=red!20,inner sep=0pt,minimum height=0.49cm,minimum width=0.49cm] at (\x,\y) {$\number\value{mycount4}$};
+    \node [fill=red!15,inner sep=0pt,minimum height=0.49cm,minimum width=0.49cm] at (\x,\y) {$\number\value{mycount4}$};
    \addtocounter{mycount4}{1};
  }
 \node [anchor=north] (xlabel) at (0,-1.2) {$\mathbi{x}$};
@@ -76,11 +76,11 @@
 \begin{scope}[yshift=6.5em,xshift=1em+3in]
 {
-\draw[step=0.5cm,color=orange,thick] (-0.5,-1) grid (0.5,1.0);
+\draw[step=0.5cm,color=orange!70,thick] (-0.5,-1) grid (0.5,1.0);
 \foreach \y in {+0.75,+0.25,-0.25,-0.75}{
  \setcounter{mycount1}{2}
  \foreach \x in {-0.25,0.25}{
-    \node [fill=orange!20,inner sep=0pt,minimum height=0.49cm,minimum width=0.49cm] at (\x,\y) {$\number\value{mycount1}$};
+    \node [fill=orange!15,inner sep=0pt,minimum height=0.49cm,minimum width=0.49cm] at (\x,\y) {$\number\value{mycount1}$};
    \addtocounter{mycount1}{-1};
  }
 }
@@ -90,11 +90,11 @@
 \begin{scope}[yshift=6em,xshift=0.5em+3in]
 {
-\draw[step=0.5cm,color=blue,thick] (-0.5,-1) grid (0.5,1.0);
+\draw[step=0.5cm,color=blue!70,thick] (-0.5,-1) grid (0.5,1.0);
 \foreach \y in {+0.75,+0.25,-0.25,-0.75}{
  \setcounter{mycount1}{2}
  \foreach \x in {-0.25,0.25}{
-    \node [fill=blue!20,inner sep=0pt,minimum height=0.49cm,minimum width=0.49cm] at (\x,\y) {$\number\value{mycount1}$};
+    \node [fill=blue!15,inner sep=0pt,minimum height=0.49cm,minimum width=0.49cm] at (\x,\y) {$\number\value{mycount1}$};
    \addtocounter{mycount1}{-1};
  }
 }
@@ -104,11 +104,11 @@
 \begin{scope}[yshift=5.5em,xshift=0em+3in]
 {
-\draw[step=0.5cm,color=ugreen,thick] (-0.5,-1) grid (0.5,1.0);
+\draw[step=0.5cm,color=ugreen!70,thick] (-0.5,-1) grid (0.5,1.0);
 \foreach \y in {+0.75,+0.25,-0.25,-0.75}{
  \setcounter{mycount1}{2}
  \foreach \x in {-0.25,0.25}{
-    \node [fill=green!20,inner sep=0pt,minimum height=0.49cm,minimum width=0.49cm] at (\x,\y) {$\number\value{mycount1}$};
+    \node [fill=green!15,inner sep=0pt,minimum height=0.49cm,minimum width=0.49cm] at (\x,\y) {$\number\value{mycount1}$};
    \addtocounter{mycount1}{-1};
  }
 }
@@ -118,11 +118,11 @@
 \begin{scope}[yshift=5.0em,xshift=-0.5em+3in]
 {
-\draw[step=0.5cm,color=red,thick] (-0.5,-1) grid (0.5,1.0);
+\draw[step=0.5cm,color=red!70,thick] (-0.5,-1) grid (0.5,1.0);
 \foreach \y in {+0.75,+0.25,-0.25,-0.75}{
  \setcounter{mycount1}{2}
  \foreach \x in {-0.25,0.25}{
-    \node [fill=red!20,inner sep=0pt,minimum height=0.49cm,minimum width=0.49cm] at (\x,\y) {$\number\value{mycount1}$};
+    \node [fill=red!15,inner sep=0pt,minimum height=0.49cm,minimum width=0.49cm] at (\x,\y) {$\number\value{mycount1}$};
    \addtocounter{mycount1}{-1};
  }
 }

--- a/Chapter9/Figures/figure-tensor-sample.tex
+++ b/Chapter9/Figures/figure-tensor-sample.tex
@@ -7,40 +7,40 @@
 \begin{tikzpicture}
 \begin{scope}[yshift=6.5em,xshift=1em]
 \setcounter{mycount1}{1}
-\draw[step=0.5cm,color=orange,line width=0.4mm] (-2,-2) grid (1,1);
+\draw[step=0.5cm,color=orange!70,line width=0.4mm] (-2,-2) grid (1,1);
 \foreach \y in {+0.5,-0.5,-1.5}
  \foreach \x in {-1.5,-0.5,0.5}{
-    \node [fill=orange!20,inner sep=0pt,minimum height=0.98cm,minimum width=0.98cm] at (\x,\y) {\number\value{mycount1}};
+    \node [fill=orange!15,inner sep=0pt,minimum height=0.98cm,minimum width=0.98cm] at (\x,\y) {\number\value{mycount1}};
    \addtocounter{mycount1}{1};
  }
 \end{scope}
-\begin{scope}[yshift=5.5em,xshift=0em]
+\begin{scope}[yshift=5em,xshift=-0.5em]
 \setcounter{mycount2}{2}
-\draw[step=0.5cm,color=blue,line width=0.4mm] (-2,-2) grid (1,1);
+\draw[step=0.5cm,color=blue!70,line width=0.4mm] (-2,-2) grid (1,1);
 \foreach \y in {+0.5,-0.5,-1.5}
  \foreach \x in {-1.5,-0.5,0.5}{
-    \node [fill=blue!20,inner sep=0pt,minimum height=0.98cm,minimum width=0.98cm] at (\x,\y) {\number\value{mycount2}};
+    \node [fill=blue!15,inner sep=0pt,minimum height=0.98cm,minimum width=0.98cm] at (\x,\y) {\number\value{mycount2}};
    \addtocounter{mycount2}{1};
  }
 \end{scope}
-\begin{scope}[yshift=4.5em,xshift=-1em]
+\begin{scope}[yshift=3.5em,xshift=-2em]
 \setcounter{mycount3}{3}
-\draw[step=0.5cm,color=ugreen,line width=0.4mm] (-2,-2) grid (1,1);
+\draw[step=0.5cm,color=ugreen!70,line width=0.4mm] (-2,-2) grid (1,1);
 \foreach \y in {+0.5,-0.5,-1.5}
  \foreach \x in {-1.5,-0.5,0.5}{
-    \node [fill=green!20,inner sep=0pt,minimum height=0.98cm,minimum width=0.98cm] at (\x,\y) {\number\value{mycount3}};
+    \node [fill=ugreen!15,inner sep=0pt,minimum height=0.98cm,minimum width=0.98cm] at (\x,\y) {\number\value{mycount3}};
    \addtocounter{mycount3}{1};
  }
 \end{scope}
-\begin{scope}[yshift=3.5em,xshift=-2em]
+\begin{scope}[yshift=2em,xshift=-3.5em]
 \setcounter{mycount4}{4}
-\draw[step=0.5cm,color=red,line width=0.4mm] (-2,-2) grid (1,1);
+\draw[step=0.5cm,color=red!70,line width=0.4mm] (-2,-2) grid (1,1);
 \foreach \y in {+0.5,-0.5,-1.5}
  \foreach \x in {-1.5,-0.5,0.5}{
-    \node [fill=red!20,inner sep=0pt,minimum height=0.98cm,minimum width=0.98cm] at (\x,\y) {\number\value{mycount4}};
+    \node [fill=red!15,inner sep=0pt,minimum height=0.98cm,minimum width=0.98cm] at (\x,\y) {\number\value{mycount4}};
    \addtocounter{mycount4}{1};
  }
 \end{scope}

--- a/Chapter9/Figures/figure-the-amount-of-data-in-a-bilingual-corpus.tex
+++ b/Chapter9/Figures/figure-the-amount-of-data-in-a-bilingual-corpus.tex
@@ -7,17 +7,17 @@
    yticklabel style={/pgf/number format/precision=1,/pgf/number format/fixed zerofill},
    xticklabel style={/pgf/number format/1000 sep=},
    xlabel style={yshift=0.5em},
-    xlabel={\footnotesize{Year}},ylabel={\footnotesize{句子数量(个)}},
+    xlabel={\footnotesize{年份}},ylabel={\footnotesize{句子数量(个)}},
    ymin=1,ymax=1000000000000,
    xmin=1999,xmax=2020,xtick={2000,2005,2010,2015,2020},
    legend style={yshift=-5em,xshift=0em,legend cell align=left,legend plot pos=right}
 ]
 \addplot[purple,mark=square,mark=star,very thick] coordinates {(2001,10000) (2005,2000000) (2008,8000000) (2009,9000000) (2011,10000000) (2012,12000000) (2014,20000000) (2016,30000000) (2018,40000000) };
-\addlegendentry{\tiny{Bi-text used in MT papers}\ \ \ \ \ \ \ \ \ \ }
+\addlegendentry{\tiny{机器翻译论文中使用的双语数据量}\ \ \ \ \ \ \ \ \ \ }
 {
 \addplot[ublue,mark=otimes*,very thick] coordinates {(2005,10000000) (2008,100000000) (2012,3000000000) (2016,5000000000) (2019,10000000000) };
-\addlegendentry{\tiny{Bi-text used in practical systems}}
+\addlegendentry{\tiny{实用系统中使用的双语数据量}}
 }
 \end{semilogyaxis}

--- a/Chapter9/Figures/figure-two-layer-neural-network.tex
+++ b/Chapter9/Figures/figure-two-layer-neural-network.tex
@@ -46,7 +46,7 @@
 {
 \node [anchor=west] (flabel) at ([xshift=1in]y.east) {\footnotesize{Sigmoid:}};
 \node [anchor=north east] (slabel) at ([xshift=0]flabel.south east) {\footnotesize{Sum:}};
-\node [anchor=west,inner sep=2pt] (flabel2) at (flabel.east) {\footnotesize{$f(s_2)=1/(1+e^{-s_2})$}};
+\node [anchor=west,inner sep=2pt] (flabel2) at (flabel.east) {\footnotesize{$f(s_2)=1/(1+{\textrm e}^{-s_2})$}};
 \node [anchor=west,inner sep=2pt] (flabel3) at (slabel.east) {\footnotesize{$s_2=x_1 \cdot w_{12} + b$}};
 \draw [->,thick,dotted] ([yshift=-0.3em,xshift=-0.1em]n11.60)  .. controls +(east:1) and +(west:2) ..  ([xshift=-0.2em]flabel.west) ;

--- a/Chapter9/chapter9.tex
+++ b/Chapter9/chapter9.tex
@@ -436,7 +436,7 @@ f(c{\mathbi{v}})&=&cf({\mathbi{v}})
 \label{eq:9-13}\end{pmatrix}
 \end{eqnarray}
-\parinterval 上例中矩阵$ {\mathbi{A}} $定义了一个从$ {\mathbb R}^n $到$ {\mathbb R}^m $的线性映射：向量$ {\mathbi{x}}^{\textrm{T}}\in {\mathbb R}^n $和$ {\mathbi{y}}^{\textrm{T}}\in {\mathbb R}^m $别为两个空间中的列向量，即大小为$ n\times 1 $ 和$ m\times 1 $ 的矩阵。
+\parinterval 上例中矩阵$ {\mathbi{A}} $定义了一个从$ {\mathbb R}^n $到$ {\mathbb R}^m $的线性映射：向量$ {\mathbi{x}}^{\textrm{T}}\in {\mathbb R}^n $和$ {\mathbi{y}}^{\textrm{T}}\in {\mathbb R}^m $分别为两个空间中的列向量，即大小为$ n\times 1 $ 和$ m\times 1 $ 的矩阵。
 %----------------------------------------------------------------------------------------
 %    NEW SUBSUB-SECTION
@@ -444,20 +444,20 @@ f(c{\mathbi{v}})&=&cf({\mathbi{v}})
 \subsubsection{6. 范数}
-\parinterval 工程领域，经常会使用被称为{\small\bfnew{范数}}\index{范数}（Norm）\index{Norm}的函数衡量向量大小，范数为向量空间内的所有向量赋予非负的长度或大小。对于一个$n$维向量$ {\mathbi{x}} $，一个常见的范数函数为$ l_p $ 范数，通常表示为$ {\Vert{\mathbi{x}}\Vert}_p $ ，其中$p\ge 1$，是一个标量形式的参数。常用的$ p $的取值有$ 1 $、$ 2 $、$ \infty $等。范数的计算方式如公式\eqref{eq:9-14}：
+\parinterval 工程领域，经常会使用被称为{\small\bfnew{范数}}\index{范数}（Norm）\index{Norm}的函数衡量向量大小，范数为向量空间内的所有向量赋予非负的长度或大小。对于一个$n$维向量$ {\mathbi{x}} $，一个常见的范数函数为$ l_p $ 范数，通常表示为$ {\Vert{\mathbi{x}}\Vert}_p $ ，其中$p\ge 1$，是一个标量形式的参数。常用的$ p $的取值有$ 1 $、$ 2 $、$ \infty $等。范数的计算方式如公式\eqref{eq:9-14}所示：
 \begin{eqnarray}
 l_p({\mathbi{x}}) & = & {\Vert{\mathbi{x}}\Vert}_p \nonumber \\
               & = & {\left (\sum_{i=1}^{n}{{\vert x_{i}\vert}^p}\right )}^{\frac{1}{p}}
 \label{eq:9-14}
 \end{eqnarray}
-\parinterval $ l_1 $范数为向量的各个元素的绝对值之和，如公式\eqref{eq:9-15}：
+\parinterval $ l_1 $范数为向量的各个元素的绝对值之和，如公式\eqref{eq:9-15}所示：
 \begin{eqnarray}
 {\Vert{\mathbi{x}}\Vert}_1&=&\sum_{i=1}^{n}{\vert x_{i}\vert}
 \label{eq:9-15}
 \end{eqnarray}
-\parinterval $ l_2 $范数为向量的各个元素平方和的二分之一次方，如公式\eqref{eq:9-16}：
+\parinterval $ l_2 $范数为向量的各个元素平方和的二分之一次方，如公式\eqref{eq:9-16}所示：
 \begin{eqnarray}
 {\Vert{\mathbi{x}}\Vert}_2&=&\sqrt{\sum_{i=1}^{n}{{x_{i}}^2}} \nonumber \\
                                      &=&\sqrt{{\mathbi{x}}^{\textrm T}{\mathbi{x}}}
@@ -466,7 +466,7 @@ l_p({\mathbi{x}}) & = & {\Vert{\mathbi{x}}\Vert}_p \nonumber \\
 \parinterval $ l_2 $范数被称为{\small\bfnew{欧几里得范数}}\index{欧几里得范数}（Euclidean Norm）\index{Euclidean Norm}。从几何角度，向量也可以表示为从原点出发的一个带箭头的有向线段，其$ l_2 $范数为线段的长度，也常被称为向量的模。$ l_2 $ 范数在机器学习中非常常用。向量$ {\mathbi{x}} $的$ l_2 $范数经常简化表示为$ \Vert{\mathbi{x}}\Vert $，可以通过点积$ {\mathbi{x}}^{\textrm T}{\mathbi{x}} $进行计算。
-\parinterval $ l_{\infty} $范数为向量的各个元素的最大绝对值，如公式\eqref{eq:9-17}：
+\parinterval $ l_{\infty} $范数为向量的各个元素的最大绝对值，如公式\eqref{eq:9-17}所示：
 \begin{eqnarray}
 {\Vert{\mathbi{x}}\Vert}_{\infty}&=&{\textrm{max}}\{\vert x_1\vert,\vert x_2\vert,\dots,\vert x_n\vert\}
 \label{eq:9-17}
@@ -541,14 +541,14 @@ y=\begin{cases} 0 & \sum_{i}{x_i\cdot w_i}-\sigma <0\\1 & \sum_{i}{x_i\cdot w_i}
 \vspace{0.5em}
 \end{itemize}
-\parinterval 在这种情况下应该如何做出决定呢？比如，女朋友很希望和你一起去看音乐会，但是剧场很远而且票价500元，如果这些因素对你都是同等重要的（即$ w_1=w_2=w_3 $，假设这三个权重都设置为1），那么会得到一个综合得分，如公式\eqref{eq:9-20}：
+\parinterval 在这种情况下应该如何做出决定呢？比如，女朋友很希望和你一起去看音乐会，但是剧场很远而且票价500元，如果这些因素对你都是同等重要的（即$ w_1=w_2=w_3 $，假设这三个权重都设置为1），那么会得到一个综合得分，如公式\eqref{eq:9-20}所示：
 \begin{eqnarray}
 x_1\cdot w_1+x_2\cdot w_2+x_3\cdot w_3 & = & 0\cdot 1+0\cdot 1+1\cdot 1 \nonumber \\
                                                                     & = & 1
 \label{eq:9-20}
 \end{eqnarray}
-\parinterval 如果你不是十分纠结的人，能够接受不完美的事情，你可能会把$ \sigma $设置为1，于是$ \sum{w_i\cdot x_i}-\sigma \ge 0 $，那么你会去音乐会。可以看出，上面的例子的本质就是一个如图\ref{fig:9-6}的感知机：
+\parinterval 如果你不是十分纠结的人，能够接受不完美的事情，你可能会把$ \sigma $设置为1，于是$ \sum{w_i\cdot x_i}-\sigma \ge 0 $，那么你会去音乐会。可以看出，上面的例子的本质就是一个如图\ref{fig:9-6}所示的感知机：
 %----------------------------------------------
 \begin{figure}[htp]
@@ -599,7 +599,7 @@ x_1\cdot w_1+x_2\cdot w_2+x_3\cdot w_3 & = & 0\cdot 1+0\cdot 1+1\cdot 1 \nonumbe
 \parinterval $ x_3 $：女朋友是否喜欢
-\parinterval 在新修改的模型中，$ x_1 $和$ x_2 $变成了连续变量，$ x_3 $仍然是离散变量，如图\ref{fig:9-8}。
+\parinterval 在新修改的模型中，$ x_1 $和$ x_2 $变成了连续变量，$ x_3 $仍然是离散变量，如图\ref{fig:9-8}所示。
 %----------------------------------------------
 \begin{figure}[htp]
@@ -610,7 +610,7 @@ x_1\cdot w_1+x_2\cdot w_2+x_3\cdot w_3 & = & 0\cdot 1+0\cdot 1+1\cdot 1 \nonumbe
 \end{figure}
 %-------------------------------------------
-\parinterval 使用修改后的模型做决策：女朋友很希望和你一起，但是剧场有20km远而且票价有500元。于是有$ x_1=10/20 $，$ x_2=150/500 $，$ x_3=1 $。此时决策过程如公式\eqref{eq:9-22}：
+\parinterval 使用修改后的模型做决策：女朋友很希望和你一起，但是剧场有20km远而且票价有500元。于是有$ x_1=10/20 $，$ x_2=150/500 $，$ x_3=1 $。此时决策过程如公式\eqref{eq:9-22}所示：
 \begin{eqnarray}
 \sum_{i}{x_i\cdot w_i} & = & 0.5\cdot 0.5+0.3\cdot 2+1\cdot 0.5 \nonumber \\
                                   & = & 1.35 \nonumber \\
@@ -672,7 +672,7 @@ x_1\cdot w_1+x_2\cdot w_2+x_3\cdot w_3 & = & 0\cdot 1+0\cdot 1+1\cdot 1 \nonumbe
 \parinterval 为了建立多层神经网络，首先需要把前面提到的简单的神经元进行扩展，把多个神经元组成一“层”神经元。比如，很多实际问题需要同时有多个输出，这时可以把多个相同的神经元并列起来，每个神经元都会有一个单独的输出，这就构成一“层”，形成了单层神经网络。单层神经网络中的每一个神经元都对应着一组权重和一个输出，可以把单层神经网络中的不同输出看作一个事物不同角度的描述。
-\parinterval 举个简单的例子，预报天气时，往往需要预测温度、湿度和风力，这就意味着如果使用单层神经网络进行预测，需要设置3个神经元。如图\ref{fig:9-10}，此时权重矩阵如公式\eqref{eq:9-105}：
+\parinterval 举个简单的例子，预报天气时，往往需要预测温度、湿度和风力，这就意味着如果使用单层神经网络进行预测，需要设置3个神经元。如图\ref{fig:9-10}所示，此时权重矩阵如公式\eqref{eq:9-105}所示：
 \begin{eqnarray}
 {\mathbi{W}}=\begin{pmatrix} w_{11} & w_{12} & w_{13}\\ w_{21} & w_{22} & w_{23}\end{pmatrix}
@@ -727,7 +727,7 @@ x_1\cdot w_1+x_2\cdot w_2+x_3\cdot w_3 & = & 0\cdot 1+0\cdot 1+1\cdot 1 \nonumbe
 \end{figure}
 %-------------------------------------------
-\parinterval 也就是说，线性变换提供了对输入数据进行空间中旋转、平移的能力。当然，线性变换也适用于更加复杂的情况，这也为神经网络提供了拟合不同函数的能力。比如，可以利用线性变换将三维图形投影到二维平面上，或者将二维平面上的图形映射到三维空间。如图\ref{fig:9-14}，通过一个简单的线性变换，可以将三维图形投影到二维平面上。
+\parinterval 也就是说，线性变换提供了对输入数据进行空间中旋转、平移的能力。当然，线性变换也适用于更加复杂的情况，这也为神经网络提供了拟合不同函数的能力。比如，可以利用线性变换将三维图形投影到二维平面上，或者将二维平面上的图形映射到三维空间。如图\ref{fig:9-14}所示，通过一个简单的线性变换，可以将三维图形投影到二维平面上。
 \vspace{-0.5em}
 %----------------------------------------------
@@ -740,7 +740,7 @@ x_1\cdot w_1+x_2\cdot w_2+x_3\cdot w_3 & = & 0\cdot 1+0\cdot 1+1\cdot 1 \nonumbe
 %-------------------------------------------
 \vspace{-0.5em}
-\parinterval 那激活函数又是什么？神经元在接收到经过线性变换的结果后，通过激活函数的处理，得到最终的输出$ \mathbf y $。激活函数的目的是解决实际问题中的非线性变换，线性变换只能拟合直线，而激活函数的加入，使神经网络具有了拟合曲线的能力。 特别是在实际问题中，很多现象都无法用简单的线性关系描述，这时可以使用非线性激活函数来描述更加复杂的问题。常见的非线性激活函数有Sigmoid、ReLU、Tanh等。如图\ref{fig:9-15}列举了几种激活函数的形式。
+\parinterval 那激活函数又是什么？神经元在接收到经过线性变换的结果后，通过激活函数的处理，得到最终的输出$ \mathbf y $。激活函数的目的是解决实际问题中的非线性变换，线性变换只能拟合直线，而激活函数的加入，使神经网络具有了拟合曲线的能力。 特别是在实际问题中，很多现象都无法用简单的线性关系描述，这时可以使用非线性激活函数来描述更加复杂的问题。常见的非线性激活函数有Sigmoid、ReLU、Tanh等。图\ref{fig:9-15}中列举了几种激活函数的形式。
 %----------------------------------------------
 \begin{figure}[htp]
@@ -769,7 +769,7 @@ x_1\cdot w_1+x_2\cdot w_2+x_3\cdot w_3 & = & 0\cdot 1+0\cdot 1+1\cdot 1 \nonumbe
 \end{figure}
 %-------------------------------------------
-\parinterval 在多层神经网络中，通常包括输入层、输出层和至少一个隐藏层。如图\ref{fig:9-17}是一个由四层神经网络构成的模型，包括输入层、输出层和两个隐藏层。\\
+\parinterval 在多层神经网络中，通常包括输入层、输出层和至少一个隐藏层。图\ref{fig:9-17}展示了一个由四层神经网络构成的模型，包括输入层、输出层和两个隐藏层。\\
 %----------------------------------------------------------------------------------------
 %    NEW SUB-SECTION
@@ -909,7 +909,7 @@ x_1\cdot w_1+x_2\cdot w_2+x_3\cdot w_3 & = & 0\cdot 1+0\cdot 1+1\cdot 1 \nonumbe
 \parinterval 简单来说，张量是一种通用的工具，用于描述由多个数据构成的量。比如，输入的量有三个维度在变化，用矩阵不容易描述，但是用张量却很容易。
-\parinterval 从计算机实现的角度来看，现在所有深度学习框架都把张量定义为“多维数组”。张量有一个非常重要的属性\ \dash \ {\small\bfnew{阶}}\index{阶}（Rank）\index{Rank}。可以将多维数组中“维”的属性与张量的“阶”的属性作类比，这两个属性都表示多维数组（张量）有多少个独立的方向。例如，3是一个标量（Scalar），相当于一个0维数组或0阶张量；$ {(\begin{array}{cccc} 2 & -3 & 0.8 & 0.2\end{array})}^{\textrm T} $ 是一个向量（Vector），相当于一个1维数组或1阶张量；$ \begin{pmatrix} -1 & 3 & 7\\ 0.2 & 2 & 9\end{pmatrix} $是一个矩阵（Matrix），相当于一个2维数组或2阶张量；如图\ref{fig:9-25}，这是一个3 维数组或3阶张量，其中，每个$4 \times 4$的方形代表一个2阶张量，这样的方形有4个，最终形成3阶张量。
+\parinterval 从计算机实现的角度来看，现在所有深度学习框架都把张量定义为“多维数组”。张量有一个非常重要的属性\ \dash \ {\small\bfnew{阶}}\index{阶}（Rank）\index{Rank}。可以将多维数组中“维”的属性与张量的“阶”的属性作类比，这两个属性都表示多维数组（张量）有多少个独立的方向。例如，3是一个标量（Scalar），相当于一个0维数组或0阶张量；$ {(\begin{array}{cccc} 2 & -3 & 0.8 & 0.2\end{array})}^{\textrm T} $ 是一个向量（Vector），相当于一个1维数组或1阶张量；$ \begin{pmatrix} -1 & 3 & 7\\ 0.2 & 2 & 9\end{pmatrix} $是一个矩阵（Matrix），相当于一个2维数组或2阶张量；如图\ref{fig:9-25}所示，这是一个3 维数组或3阶张量，其中，每个$4 \times 4$的方形代表一个2阶张量，这样的方形有4个，最终形成3阶张量。
 %----------------------------------------------
 \begin{figure}[htp]
@@ -1069,7 +1069,7 @@ f(x)=\begin{cases} 0 & x\le 0 \\x & x>0\end{cases}
 \parinterval 有了张量这个工具，可以很容易地实现任意的神经网络。反过来，神经网络都可以被看作是张量的函数。一种经典的神经网络计算模型是：给定输入张量，通过各个神经网络层所对应的张量计算之后，最后得到输出张量。这个过程也被称作{\small\sffamily\bfseries{前向传播}}\index{前向传播}（Forward Propagation\index{Forward Propagation}），它常常被应用在使用神经网络对新的样本进行推断中。
-\parinterval 来看一个具体的例子，如图\ref{fig:9-37}是一个根据天气情况判断穿衣指数（穿衣指数是人们穿衣薄厚的依据）的过程，将当天的天空状况、低空气温、水平气压作为输入，通过一层神经元在输入数据中提取温度、风速两方面的特征，并根据这两方面的特征判断穿衣指数。需要注意的是，在实际的神经网络中，并不能准确地知道神经元究竟可以提取到哪方面的特征，以上表述是为了让读者更好地理解神经网络的建模过程和前向传播过程。这里将上述过程建模为如图\ref{fig:9-37}所示的两层神经网络。
+\parinterval 来看一个具体的例子，图\ref{fig:9-37}展示了一个根据天气情况判断穿衣指数（穿衣指数是人们穿衣薄厚的依据）的过程，将当天的天空状况、低空气温、水平气压作为输入，通过一层神经元在输入数据中提取温度、风速两方面的特征，并根据这两方面的特征判断穿衣指数。需要注意的是，在实际的神经网络中，并不能准确地知道神经元究竟可以提取到哪方面的特征，以上表述是为了让读者更好地理解神经网络的建模过程和前向传播过程。这里将上述过程建模为如图\ref{fig:9-37}所示的两层神经网络。
 %----------------------------------------------
 \begin{figure}[htp]
@@ -1278,7 +1278,7 @@ J({\bm \theta})&=&\frac{1}{m}\sum_{i=j}^{j+m-1}{L({\mathbi{x}}_i,\widetilde{\mat
 \noindent {\small\sffamily\bfseries{（1）数值微分\index{数值微分}（Numerical Differentiation）\index{Numerical Differentiation}}}
 \vspace{0.5em}
-\parinterval 数学上，梯度的求解其实就是求函数偏导的问题。导数是用极限来定义的，如公式\eqref{eq:9-33}：
+\parinterval 数学上，梯度的求解其实就是求函数偏导的问题。导数是用极限来定义的，如公式\eqref{eq:9-33}所示：
 \begin{eqnarray}
 \frac{\partial L({\bm \theta})}{\partial {\bm \theta}}&=&\lim\limits_{\Delta {\bm \theta} \to 0}\frac{L({\bm \theta}+\Delta {\bm \theta})-L({\bm \theta}-\Delta {\bm \theta})}{2\Delta {\bm \theta}}
 \label{eq:9-33}
@@ -1349,7 +1349,7 @@ $+2x^2+x+1)$ & \ \ $(x^4+2x^3+2x^2+x+1)$ & $+6x+1$ \\
 \noindent 这里，$\bar{{\mathbi{h}}_i}$表示损失函数$L$相对于${\mathbi{h}}_i$的梯度信息，它会被保存在节点$i$处。为了计算$\bar{{\mathbi{h}}_i}$，需要从网络的输出反向计算每一个节点处的梯度。具体实现时，这个过程由一个包括前向计算和反向计算的两阶段方法实现。
-\parinterval 首先，从神经网络的输入，逐层计算每层网络的输出值。如图\ref{fig:9-44}，第$ i $ 层的输出$ {\mathbi{h}}_i $ 作为第$ i+1 $ 层的输入，数据流在神经网络内部逐层传递。
+\parinterval 首先，从神经网络的输入，逐层计算每层网络的输出值。如图\ref{fig:9-44}所示，第$ i $ 层的输出$ {\mathbi{h}}_i $ 作为第$ i+1 $ 层的输入，数据流在神经网络内部逐层传递。
 %----------------------------------------------
 \begin{figure}[htp]
@@ -1370,7 +1370,7 @@ $+2x^2+x+1)$ & \ \ $(x^4+2x^3+2x^2+x+1)$ & $+6x+1$ \\
 \vspace{0.5em}
 \end{itemize}
-\parinterval  对于反向计算的实现，一般从神经网络的输出开始，逆向逐层计算每层网络输入所对应的微分结果。如图\ref{fig:9-45}，在第$ i $层计算此处的梯度$ \frac{\partial L}{\partial {\mathbi{h}}_i} $，并将微分值向前一层传递，根据链式法则继续计算梯度。
+\parinterval  对于反向计算的实现，一般从神经网络的输出开始，逆向逐层计算每层网络输入所对应的微分结果。如图\ref{fig:9-45}所示，在第$ i $层计算此处的梯度$ \frac{\partial L}{\partial {\mathbi{h}}_i} $，并将微分值向前一层传递，根据链式法则继续计算梯度。
 %----------------------------------------------
 \begin{figure}[htp]
@@ -1439,7 +1439,7 @@ v_t&=&\beta v_{t-1}+(1-\beta)\frac{\partial J}{\partial \theta_t}
 \parinterval  在神经网络的学习中，学习率的设置很重要。学习率过小， 会导致学习花费过多时间；反过来，学习率过大，则会导致学习发散，甚至造成模型的“跑偏”。在深度学习实现过程中，有一种被称为学习率{\small\bfnew{衰减}}\index{衰减}（Decay）\index{Decay}的方法，即最初设置较大的学习率，随着学习的进行，使学习率逐渐减小，这种方法相当于将“全体”参数的学习率值一起降低。AdaGrad梯度下降算法进一步发展了这个思想\upcite{duchi2011adaptive}。
-\parinterval  AdaGrad会为参数的每个元素适当地调整学习率，与此同时进行学习。其参数更新方式如公式\eqref{eq:9-36}和\eqref{eq:9-37}：
+\parinterval  AdaGrad会为参数的每个元素适当地调整学习率，与此同时进行学习。其参数更新方式如公式\eqref{eq:9-36}和\eqref{eq:9-37}所示：
 \begin{eqnarray}
 z_t&=&z_{t-1}+\frac{\partial J}{\partial {\theta}_t} \cdot \frac{\partial J}{\partial {\theta}_t}
 \label{eq:9-36}\\
@@ -1650,7 +1650,7 @@ z_t&=&\gamma z_{t-1}+(1-\gamma) \frac{\partial J}{\partial {\theta}_t} \cdot  \f
 \vspace{0.5em}
 \item  $ {\mathbi{h}}^K $：整个网络的输出；
 \vspace{0.5em}
-\item  $ {\mathbi{s}}^k $：第$ k $层的线性变换结果，其计算方式如公式\eqref{eq:9-109}：
+\item  $ {\mathbi{s}}^k $：第$ k $层的线性变换结果，其计算方式如公式\eqref{eq:9-109}所示：
       \begin{eqnarray}
       {\mathbi{s}}^k & = & {\mathbi{h}}^{k-1}{\mathbi{W}}^k \nonumber \\
                   & = & \sum{h_j^{k-1}w_{j,i}^k}
@@ -1661,7 +1661,7 @@ z_t&=&\gamma z_{t-1}+(1-\gamma) \frac{\partial J}{\partial {\theta}_t} \cdot  \f
 \vspace{0.5em}
 \end{itemize}
-\parinterval  于是，在神经网络的第$ k $层，前向计算过程如公式\eqref{eq:9-46}：
+\parinterval  于是，在神经网络的第$ k $层，前向计算过程如公式\eqref{eq:9-46}所示：
 \begin{eqnarray}
 {\mathbi{h}}^k & = & f^k({\mathbi{s}}^k) \nonumber \\
            & = & f^k({\mathbi{h}}^{k-1}{\mathbi{W}}^k)
@@ -1731,7 +1731,7 @@ z_t&=&\gamma z_{t-1}+(1-\gamma) \frac{\partial J}{\partial {\theta}_t} \cdot  \f
 \vspace{0.5em}
 \item $ \frac{\partial L}{\partial {\mathbi{h}}^K} $表示损失函数$ L $相对网络输出$ {\mathbi{h}}^K $的梯度。比如，对于平方损失$ L=\frac{1}{2}{\Vert \widetilde {\mathbi{y}}-{\mathbi{h}}^K\Vert}^2 $，有$ \frac{\partial L}{\partial {\mathbi{h}}^K}= {\mathbi{h}}^K-\widetilde{\mathbi{y}} $。计算结束后，将$ \frac{\partial L}{\partial {\mathbi{h}}^K} $向前传递。
 \vspace{0.5em}
-\item $ \frac{\partial f^T({\mathbi{s}}^K)}{\partial {\mathbi{s}}^K} $表示激活函数相对于其输入$ {\mathbi{s}}^K $的梯度。比如，对于Sigmoid函数$ f({\mathbi{s}})=\frac{1}{1+e^{- {\mathbi{s}}}}$，有$ \frac{\partial f({\mathbi{s}})}{\partial {\mathbi{s}}}=f({\mathbi{s}}) (1-f({\mathbi{s}}))$
+\item $ \frac{\partial f^T({\mathbi{s}}^K)}{\partial {\mathbi{s}}^K} $表示激活函数相对于其输入$ {\mathbi{s}}^K $的梯度。比如，对于Sigmoid函数$ f({\mathbi{s}})=\frac{1}{1+{\textrm e}^{- {\mathbi{s}}}}$，有$ \frac{\partial f({\mathbi{s}})}{\partial {\mathbi{s}}}=f({\mathbi{s}}) (1-f({\mathbi{s}}))$
 \vspace{0.5em}
 \end{itemize}
 \end{spacing}
@@ -1849,19 +1849,19 @@ z_t&=&\gamma z_{t-1}+(1-\gamma) \frac{\partial J}{\partial {\theta}_t} \cdot  \f
 \subsection{基于前馈神经网络的语言模型}
-\parinterval  回顾一下{\chaptertwo}的内容，语言建模的问题被定义为：对于一个词序列$ w_1w_2\dots w_m$，如何计算该词序列的可能性？词序列出现的概率可以通过链式法则得到，如公式\eqref{eq:9-57}：
+\parinterval  回顾一下{\chaptertwo}的内容，语言建模的问题被定义为：对于一个词序列$ w_1w_2\dots w_m$，如何计算该词序列的可能性？词序列出现的概率可以通过链式法则得到，如公式\eqref{eq:9-57}所示：
 \begin{eqnarray}
 \funp{P}(w_1w_2\dots w_m)&=&\funp{P}(w_1)\funp{P}(w_2|w_1)\funp{P}(w_3|w_1w_2)\dots \funp{P}(w_m|w_1\dots w_{m-1})
 \label{eq:9-57}
 \end{eqnarray}
-\parinterval  由于$ \funp{P}(w_m|w_1\dots w_{m-1}) $需要建模$ m-1 $个词构成的历史信息，这个模型仍然很复杂。于是就有了基于局部历史的$n$-gram语言模型，如公式\eqref{eq:9-58}：
+\parinterval  由于$ \funp{P}(w_m|w_1\dots w_{m-1}) $需要建模$ m-1 $个词构成的历史信息，这个模型仍然很复杂。于是就有了基于局部历史的$n$-gram语言模型，如公式\eqref{eq:9-58}所示：
 \begin{eqnarray}
 \funp{P}(w_m|w_1\dots w_{m-1})&=&\funp{P}(w_m|w_{m-n+1}\dots w_{m-1})
 \label{eq:9-58}
 \end{eqnarray}
-\noindent  其中，$\funp{P}(w_m|w_{m-n+1}\dots w_{m-1}) $可以通过相对频次估计进行计算，如公式\eqref{eq:9-110}，其中$ {\textrm{count}}(\cdot) $表示在训练数据上的频次：
+\noindent  其中，$\funp{P}(w_m|w_{m-n+1}\dots w_{m-1}) $可以通过相对频次估计进行计算，如公式\eqref{eq:9-110}所示，其中$ {\textrm{count}}(\cdot) $表示在训练数据上的频次：
 \begin{eqnarray}
 \funp{P}(w_m|w_{m-n+1}\dots w_{m-1})&=&\frac{{\textrm{count}}(w_{m-n+1}\dots w_m)}{{\textrm{count}}(w_{m-n+1}\dots w_{m-1})}
 \label{eq:9-110}
@@ -2116,7 +2116,7 @@ z_t&=&\gamma z_{t-1}+(1-\gamma) \frac{\partial J}{\partial {\theta}_t} \cdot  \f
 \parinterval  目前，词嵌入已经成为诸多自然语言处理系统的标配，也衍生出很多有趣的研究方向。但是，冷静地看，词嵌入依旧存在一些问题：每个词都对应唯一的向量表示，那么对于一词多义现象，词义需要通过上下文进行区分，这时使用简单的词嵌入是无法处理的。有一个著名的例子：
 \begin{example}
-Jobs was the CEO of {\red{\underline{apple}}}.
+Aaron is an employee of {\red{\underline{apple}}}.
 \hspace{2em} He finally ate the {\red{\underline{apple}}}.
 \end{example}