|
|
@@ -143,7 +143,7 @@ |
|
|
|
"cell_type": "markdown", |
|
|
|
"metadata": {}, |
|
|
|
"source": [ |
|
|
|
"## 2. 推导过程\n", |
|
|
|
"## 3. 推导过程\n", |
|
|
|
"\n", |
|
|
|
"首先,我们要明确一下我们要求什么,我们要求的是我们的$loss$对于神经元输出($z_i$)的梯度,即:\n", |
|
|
|
"\n", |
|
|
@@ -158,14 +158,26 @@ |
|
|
|
"$$\n", |
|
|
|
"\n", |
|
|
|
"有个人可能有疑问了,这里为什么是$a_j$而不是$a_i$,这里要看一下$softmax$的公式了,因为$softmax$公式的特性,它的分母包含了所有神经元的输出,所以,对于不等于$i$的其他输出里面,也包含着$z_i$,所有的$a$都要纳入到计算范围中,并且后面的计算可以看到需要分为$i = j$和$i \\ne j$两种情况求导。\n", |
|
|
|
"\n", |
|
|
|
"### 2.1 针对$a_j$的偏导\n", |
|
|
|
"\n" |
|
|
|
] |
|
|
|
}, |
|
|
|
{ |
|
|
|
"cell_type": "markdown", |
|
|
|
"metadata": {}, |
|
|
|
"source": [ |
|
|
|
"### 3.1 针对$a_j$的偏导\n", |
|
|
|
"\n", |
|
|
|
"$$\n", |
|
|
|
"\\frac{\\partial C}{\\partial a_j} = \\frac{(\\partial -\\sum_j y_j ln a_j)}{\\partial a_j} = -\\sum_j y_j \\frac{1}{a_j}\n", |
|
|
|
"$$\n", |
|
|
|
"\n", |
|
|
|
"### 2.2 针对$z_i$的偏导\n", |
|
|
|
"\n" |
|
|
|
] |
|
|
|
}, |
|
|
|
{ |
|
|
|
"cell_type": "markdown", |
|
|
|
"metadata": {}, |
|
|
|
"source": [ |
|
|
|
"### 3.2 针对$z_i$的偏导\n", |
|
|
|
"\n", |
|
|
|
"如果 $i=j$ :\n", |
|
|
|
"\n", |
|
|
@@ -188,8 +200,14 @@ |
|
|
|
"$$\n", |
|
|
|
"(\\frac{u}{v})' = \\frac{u'v - uv'}{v^2} \n", |
|
|
|
"$$\n", |
|
|
|
"\n", |
|
|
|
"### 2.3 整体的推导\n", |
|
|
|
"\n" |
|
|
|
] |
|
|
|
}, |
|
|
|
{ |
|
|
|
"cell_type": "markdown", |
|
|
|
"metadata": {}, |
|
|
|
"source": [ |
|
|
|
"### 3.3 整体的推导\n", |
|
|
|
"\n", |
|
|
|
"\\begin{eqnarray}\n", |
|
|
|
"\\frac{\\partial C}{\\partial z_i} & = & (-\\sum_j y_j \\frac{1}{a_j} ) \\frac{\\partial a_j}{\\partial z_i} \\\\\n", |
|
|
@@ -234,7 +252,7 @@ |
|
|
|
"cell_type": "markdown", |
|
|
|
"metadata": {}, |
|
|
|
"source": [ |
|
|
|
"## 3. 问题\n", |
|
|
|
"## 4. 问题\n", |
|
|
|
"如何将本节所讲的softmax,交叉熵代价函数应用到上节所讲的BP方法中?" |
|
|
|
] |
|
|
|
}, |
|
|
|