更新 chapter9.tex

e95e8678 · 孟霞 · b7d33ced · e95e8678
Commit e95e8678 authored Nov 23, 2020 by 孟霞
--- a/Chapter9/chapter9.tex
+++ b/Chapter9/chapter9.tex
@@ -1179,7 +1179,7 @@ y&=&{\textrm{Sigmoid}}({\textrm{Tanh}}({\mathbi{x}}\cdot {\mathbi{W}}^{[1]}+{\ma
 \label{eq:9-28}
 \end{eqnarray}

-\noindent 其中，$ \widehat{\bm \theta} $表示在训练数据上使损失的平均值达到最小的参数。$ \frac{1}{n}\sum_{i=1}^{n}{L({\mathbi{x}}_i,\widetilde{\mathbi{y}}_i;{\bm \theta})} $也被称作{\small\sffamily\bfseries{代价函数}}\index{代价函数}（Cost Function）\index{Cost Function}，它是损失函数均值期望的估计，记为$ J({\bm \theta}) $。
+\noindent 其中，$ \widehat{\bm \theta} $表示在训练数据上使损失的平均值达到最小的参数，$n$为训练数据总量。$ \frac{1}{n}\sum_{i=1}^{n}{L({\mathbi{x}}_i,\widetilde{\mathbi{y}}_i;{\bm \theta})} $也被称作{\small\sffamily\bfseries{代价函数}}\index{代价函数}（Cost Function）\index{Cost Function}，它是损失函数均值期望的估计，记为$ J({\bm \theta}) $。

 \parinterval 参数优化的核心问题是：找到使代价函数$ J({\bm\theta}) $达到最小的$ \bm \theta $。然而$ J({\bm\theta}) $可能会包含大量的参数，比如，基于神经网络的机器翻译模型的参数量可能会超过一亿个。这时不可能用手动方法进行调参。为了实现高效的参数优化，比较常用的手段是使用{\small\bfnew{梯度下降方法}}\index{梯度下降方法}（The Gradient Descent Method）\index{The Gradient Descent Method}。