合并分支 'caorunzhe' 到 'master'

Caorunzhe 查看合并请求 !452

合并分支 'caorunzhe' 到 'master'
Caorunzhe 查看合并请求 !452
8c1b3175 · 曹润柘 · 5207477a · d83c25d1 · 8c1b3175
Commit 8c1b3175 authored Nov 23, 2020 by 曹润柘
--- a/Chapter9/chapter9.tex
+++ b/Chapter9/chapter9.tex
@@ -1179,7 +1179,7 @@ y&=&{\textrm{Sigmoid}}({\textrm{Tanh}}({\mathbi{x}}\cdot {\mathbi{W}}^{[1]}+{\ma
 \label{eq:9-28}
 \end{eqnarray}

-\noindent 其中，$ \widehat{\bm \theta} $表示在训练数据上使损失的平均值达到最小的参数。$ \frac{1}{n}\sum_{i=1}^{n}{L({\mathbi{x}}_i,\widetilde{\mathbi{y}}_i;{\bm \theta})} $也被称作{\small\sffamily\bfseries{代价函数}}\index{代价函数}（Cost Function）\index{Cost Function}，它是损失函数均值期望的估计，记为$ J({\bm \theta}) $。
+\noindent 其中，$ \widehat{\bm \theta} $表示在训练数据上使损失的平均值达到最小的参数，$n$为训练数据总量。$ \frac{1}{n}\sum_{i=1}^{n}{L({\mathbi{x}}_i,\widetilde{\mathbi{y}}_i;{\bm \theta})} $也被称作{\small\sffamily\bfseries{代价函数}}\index{代价函数}（Cost Function）\index{Cost Function}，它是损失函数均值期望的估计，记为$ J({\bm \theta}) $。

 \parinterval 参数优化的核心问题是：找到使代价函数$ J({\bm\theta}) $达到最小的$ \bm \theta $。然而$ J({\bm\theta}) $可能会包含大量的参数，比如，基于神经网络的机器翻译模型的参数量可能会超过一亿个。这时不可能用手动方法进行调参。为了实现高效的参数优化，比较常用的手段是使用{\small\bfnew{梯度下降方法}}\index{梯度下降方法}（The Gradient Descent Method）\index{The Gradient Descent Method}。

@@ -1389,7 +1389,7 @@ $+2x^2+x+1)$ & \ \ $(x^4+2x^3+2x^2+x+1)$ & $+6x+1$ \\

 \subsubsection{3. 基于梯度的方法的变种和改进}\label{sec:9.4.2.3}

-\parinterval  参数优化通常基于梯度下降算法，即在每个更新步骤$ t $，沿梯度方向更新参数：
+\parinterval  参数优化通常基于梯度下降算法，即在每个更新步骤$ t $，沿梯度反方向更新参数：
 \begin{eqnarray}
 {\bm \theta}_{t+1}&=&{\bm \theta}_{t}-\alpha \cdot \frac{\partial J({\bm \theta}_t)}{\partial {\bm \theta}_t}
 \label{}