wording

Commit c767cce8 authored Aug 06, 2018 by xiaotong
--- a/source/tensor/function/LogSoftmax.cpp
+++ b/source/tensor/function/LogSoftmax.cpp
@@ -252,7 +252,7 @@ There are two ways to implement this process.
 Method 1. we compute dE/dy and dy/dx resepectively, and then reach dE/dx by dE/dx = dE/dy * dy/dx
 (or more precisely dE/dx_j = \sum_{i} {dE/dy_i * dy_i/dx_j})
 Method 2. we compute dE/dx (or dE/dx_j) in a single step, rather than resorting to the
-sub-models dE/dy and dy/dx. We can do this by using dE/dx_j = -gold_j + exp(y_j)
+sub-models of dE/dy and dy/dx. We can do this by using dE/dx_j = -gold_j + exp(y_j)

 Here we choose Method 2, i.e., we straightforwardly compute dE/dx_j by

@@ -261,12 +261,12 @@ dE/dx_j = -gold_j + exp(y_j)
 (or dE/dx_j = -\delta(i,j) + exp(y_j) for a Maximum A Posteriori Estimation (MAP))

 Method 1 is also fine but is more time consuming due to the summation over dimensions.
-Note that this method is not good for the standard version softmax when working with
-the cross entropy loss. Because it is numerical unstable. When we use a usual method to
+Note that this method is not good for the standard version softmax when we work with
+the cross entropy loss because it is numerical unstable. When we use a usual method to
 define softmax, we have softmax: y_i = log(e^{x_i} / \sum_{k} e^{x_k}). It is trivial to
-know that dy_i/dx_j = y_i * \delta(i,j) - y_i * y_j. As y_i and y_j could be a small number,
-y_i * y_i would result in a much smaller one with a risk of lossing precision. This is even
-worse we multiply dy_i/dx_j with dE/dy_i. So it is in general to use log softmax instead for
+know that dy_i/dx_j = y_i * \delta(i,j) - y_i * y_j. As y_i and y_j could be small numbers,
+y_i * y_i would result in a much smaller value with a risk of lossing precision. This is even
+worse we multiply dy_i/dx_j with dE/dy_i. So it is in general to use log softmax for
 better numerical stability.

 >> gold - gold standard to measure error (or loss)
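
The two methods described in the comment above can be compared with a small standalone sketch. It is not the code in LogSoftmax.cpp: the function names (LogSoftmaxSketch, GradSingleStep, GradChainRule) are hypothetical, and it assumes the loss is the cross entropy E = -\sum_{i} gold_i * y_i with y the log softmax output and gold summing to 1. Under those assumptions dy_i/dx_j = \delta(i,j) - exp(y_j), so the chain-rule sum of Method 1 collapses to the single-step formula of Method 2.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper (not from the repository): log softmax of a non-empty x,
// y_i = x_i - max(x) - log(sum_k exp(x_k - max(x))).
// Subtracting max(x) keeps exp() from overflowing.
std::vector<double> LogSoftmaxSketch(const std::vector<double>& x) {
    const double maxX = *std::max_element(x.begin(), x.end());
    double sum = 0.0;
    for (double v : x) sum += std::exp(v - maxX);
    const double logSum = std::log(sum);
    std::vector<double> y(x.size());
    for (std::size_t i = 0; i < x.size(); ++i) y[i] = x[i] - maxX - logSum;
    return y;
}

// Method 2: dE/dx_j = -gold_j + exp(y_j), computed in one pass.
std::vector<double> GradSingleStep(const std::vector<double>& gold,
                                   const std::vector<double>& y) {
    std::vector<double> dedx(y.size());
    for (std::size_t j = 0; j < y.size(); ++j)
        dedx[j] = -gold[j] + std::exp(y[j]);
    return dedx;
}

// Method 1: dE/dx_j = sum_i dE/dy_i * dy_i/dx_j, with
// dE/dy_i = -gold_i (assumed cross entropy loss) and
// dy_i/dx_j = delta(i,j) - exp(y_j) for log softmax.
// The inner sum over i is what makes this quadratic in the dimension.
std::vector<double> GradChainRule(const std::vector<double>& gold,
                                  const std::vector<double>& y) {
    std::vector<double> dedx(y.size(), 0.0);
    for (std::size_t j = 0; j < y.size(); ++j) {
        for (std::size_t i = 0; i < y.size(); ++i) {
            const double dedy = -gold[i];
            const double dydx = (i == j ? 1.0 : 0.0) - std::exp(y[j]);
            dedx[j] += dedy * dydx;
        }
    }
    return dedx;
}
```

With a one-hot gold vector, both functions should return the same gradient up to rounding, and its entries should sum to zero because the exp(y_j) terms sum to 1.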