
softmax_ce.ipynb 6.8 kB

{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Softmax & the Cross-Entropy Cost Function\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Softmax is often used as the output layer of a neural network for classification tasks. Since differentiation is the key step in backpropagation, working through the derivative here also gives a deeper understanding of backpropagation itself and invites further thought about how gradients propagate.\n",
"\n",
"## The softmax function\n",
"\n",
"The softmax (\"soft maximum\") function generally serves as the output layer of a neural network for classification. Its outputs can be read as the probabilities of choosing each class: for a task with three classes, softmax turns the raw outputs, according to their relative sizes, into three class probabilities that sum to 1.\n",
"\n",
"The softmax function has the form:\n",
"\n",
"$$\n",
"S_i = \\frac{e^{z_i}}{\\sum_k e^{z_k}}\n",
"$$\n",
"\n",
"* $S_i$ is the class probability output by softmax\n",
"* $z_k$ is the output of the $k$-th neuron\n",
"\n",
"\n",
"More concretely, as the figure shows:\n",
"\n",
"![softmax_demo](images/softmax_demo.png)\n",
"\n",
"Put plainly, softmax maps the original outputs $[3,1,-3]$ to values in $(0,1)$ whose sum is 1 (so they satisfy the properties of a probability distribution). We can therefore read them as probabilities, and when choosing the output node we simply pick the one with the largest probability (i.e. the largest value) as the prediction.\n",
"\n",
"\n",
"\n",
"First consider the output of a single neuron, pictured below:\n",
"\n",
"![softmax_neuron](images/softmax_neuron.png)\n",
"\n",
"Let the neuron's output be:\n",
"\n",
"$$\n",
"z_i = \\sum_{j} w_{ij} x_{j} + b\n",
"$$\n",
"\n",
"where $w_{ij}$ is the $j$-th weight of the $i$-th neuron and $b$ is the bias; $z_i$ is the $i$-th output of the network.\n",
"\n",
"Applying a softmax function to this output gives:\n",
"\n",
"$$\n",
"a_i = \\frac{e^{z_i}}{\\sum_k e^{z_k}}\n",
"$$\n",
"\n",
"where $a_i$ is the $i$-th output of softmax, i.e. the result of applying the softmax function on the right-hand side.\n",
"\n",
"\n",
"### The loss function\n",
"\n",
"Backpropagation requires a loss function, which expresses the error between the true values and the network's estimates; only once this error is known can we know how to adjust the network's weights.\n",
"\n",
"The loss function can take many forms. Here we use the cross-entropy function, mainly because its derivative is simple and easy to compute, and because cross-entropy avoids the slow learning that some other loss functions suffer from. The **[cross-entropy function](https://blog.csdn.net/u014313009/article/details/51043064)** is:\n",
"\n",
"$$\n",
"C = - \\sum_i y_i \\ln a_i\n",
"$$\n",
"\n",
"where $y_i$ is the true classification result.\n",
"\n"
]
},
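{
"cell_type": "markdown",
"metadata": {},
"source": [
"The softmax and cross-entropy formulas above can be sketched in a few lines of NumPy. This cell is an illustrative addition, not part of the derivation; subtracting the maximum before exponentiating is a standard numerical-stability trick that does not change the result.\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"\n",
"def softmax(z):\n",
"    # shift by max(z) for numerical stability; the result is unchanged\n",
"    e = np.exp(z - np.max(z))\n",
"    return e / e.sum()\n",
"\n",
"def cross_entropy(y, a):\n",
"    # C = -sum_i y_i ln(a_i)\n",
"    return -np.sum(y * np.log(a))\n",
"\n",
"z = np.array([3.0, 1.0, -3.0])   # the example outputs from the text\n",
"a = softmax(z)\n",
"print(a, a.sum())                # values in (0, 1) that sum to 1\n",
"y = np.array([1.0, 0.0, 0.0])    # one-hot true label\n",
"print(cross_entropy(y, a))\n"
]
},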
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Derivation\n",
"\n",
"First, let us be clear about what we want: the gradient of the loss with respect to the neuron outputs $z_i$, namely:\n",
"\n",
"$$\n",
"\\frac{\\partial C}{\\partial z_i}\n",
"$$\n",
"\n",
"By the chain rule for composite functions:\n",
"\n",
"$$\n",
"\\frac{\\partial C}{\\partial z_i} = \\sum_j \\frac{\\partial C}{\\partial a_j} \\frac{\\partial a_j}{\\partial z_i}\n",
"$$\n",
"\n",
"You may wonder why this involves $a_j$ rather than only $a_i$. Look again at the softmax formula: because its denominator contains the outputs of all neurons, every output $a_j$ with $j \\ne i$ also depends on $z_i$. All of the $a$'s must therefore enter the computation, and as shown below the derivative splits into the two cases $i = j$ and $i \\ne j$.\n",
"\n",
"### The partial derivative with respect to $a_j$\n",
"\n",
"$$\n",
"\\frac{\\partial C}{\\partial a_j} = \\frac{\\partial (- \\sum_k y_k \\ln a_k)}{\\partial a_j} = - \\frac{y_j}{a_j}\n",
"$$\n",
"\n",
"### The partial derivative with respect to $z_i$\n",
"\n",
"If $i = j$:\n",
"\n",
"\\begin{eqnarray}\n",
"\\frac{\\partial a_i}{\\partial z_i} & = & \\frac{\\partial (\\frac{e^{z_i}}{\\sum_k e^{z_k}})}{\\partial z_i} \\\\\n",
" & = & \\frac{(\\sum_k e^{z_k}) e^{z_i} - (e^{z_i})^2}{(\\sum_k e^{z_k})^2} \\\\\n",
" & = & (\\frac{e^{z_i}}{\\sum_k e^{z_k}} ) (1 - \\frac{e^{z_i}}{\\sum_k e^{z_k}} ) \\\\\n",
" & = & a_i (1 - a_i)\n",
"\\end{eqnarray}\n",
"\n",
"If $i \\ne j$:\n",
"\\begin{eqnarray}\n",
"\\frac{\\partial a_j}{\\partial z_i} & = & \\frac{\\partial (\\frac{e^{z_j}}{\\sum_k e^{z_k}})}{\\partial z_i} \\\\\n",
" & = & \\frac{0 \\cdot \\sum_k e^{z_k} - e^{z_j} \\cdot e^{z_i} }{(\\sum_k e^{z_k})^2} \\\\\n",
" & = & - \\frac{e^{z_j}}{\\sum_k e^{z_k}} \\cdot \\frac{e^{z_i}}{\\sum_k e^{z_k}} \\\\\n",
" & = & -a_j a_i\n",
"\\end{eqnarray}\n",
"\n",
"Both cases use the quotient rule, for $u$ and $v$ both functions of the differentiation variable:\n",
"$$\n",
"(\\frac{u}{v})' = \\frac{u'v - uv'}{v^2}\n",
"$$\n",
"\n",
"### Putting it together\n",
"\n",
"\\begin{eqnarray}\n",
"\\frac{\\partial C}{\\partial z_i} & = & \\sum_j ( - \\frac{y_j}{a_j} ) \\frac{\\partial a_j}{\\partial z_i} \\\\\n",
" & = & - \\frac{y_i}{a_i} a_i ( 1 - a_i) + \\sum_{j \\ne i} \\frac{y_j}{a_j} a_i a_j \\\\\n",
" & = & -y_i + y_i a_i + \\sum_{j \\ne i} y_j a_i \\\\\n",
" & = & -y_i + a_i \\sum_{j} y_j \\\\\n",
" & = & a_i - y_i\n",
"\\end{eqnarray}\n",
"\n",
"The last step uses the fact that the true labels form a probability distribution (e.g. a one-hot vector), so $\\sum_j y_j = 1$. The gradient of the cross-entropy loss with respect to each softmax input is thus simply $a_i - y_i$."
]
},
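{
"cell_type": "markdown",
"metadata": {},
"source": [
"The derived result can be checked numerically against central finite differences. This cell is an illustrative addition and assumes a one-hot label vector.\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"\n",
"def softmax(z):\n",
"    e = np.exp(z - np.max(z))\n",
"    return e / e.sum()\n",
"\n",
"def loss(z, y):\n",
"    # C = -sum_i y_i ln(a_i)\n",
"    return -np.sum(y * np.log(softmax(z)))\n",
"\n",
"rng = np.random.RandomState(0)\n",
"z = rng.randn(5)\n",
"y = np.zeros(5)\n",
"y[2] = 1.0                       # one-hot label\n",
"\n",
"analytic = softmax(z) - y        # the derived gradient: a - y\n",
"eps = 1e-6\n",
"numeric = np.zeros(5)\n",
"for i in range(5):\n",
"    d = np.zeros(5); d[i] = eps\n",
"    numeric[i] = (loss(z + d, y) - loss(z - d, y)) / (2 * eps)\n",
"\n",
"print(np.max(np.abs(analytic - numeric)))  # agreement up to floating-point error\n"
]
},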
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Question\n",
"How can the softmax and cross-entropy cost function from this section be applied to the method of the previous section?"
]
},
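{
"cell_type": "markdown",
"metadata": {},
"source": [
"One hedged sketch of an answer (the previous section's network is not reproduced here, so the shapes and names `X`, `Y`, `W`, `b` below are illustrative assumptions): replace the output activation with softmax, use the cross-entropy cost, and backpropagate the error term $a - y$ derived above.\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"\n",
"def softmax_rows(Z):\n",
"    # row-wise softmax for a batch of outputs\n",
"    E = np.exp(Z - Z.max(axis=1, keepdims=True))\n",
"    return E / E.sum(axis=1, keepdims=True)\n",
"\n",
"# illustrative data: batch of 4 samples, 3 inputs, 3 classes\n",
"rng = np.random.RandomState(1)\n",
"X = rng.randn(4, 3)\n",
"Y = np.eye(3)[[0, 1, 2, 0]]       # one-hot labels\n",
"W = rng.randn(3, 3) * 0.1\n",
"b = np.zeros(3)\n",
"\n",
"lr = 0.5\n",
"for step in range(100):\n",
"    A = softmax_rows(X @ W + b)   # forward: softmax output layer\n",
"    delta = (A - Y) / len(X)      # backward: the a - y error term\n",
"    W -= lr * X.T @ delta         # gradient step on weights and bias\n",
"    b -= lr * delta.sum(axis=0)\n",
"\n",
"loss = -np.mean(np.sum(Y * np.log(softmax_rows(X @ W + b)), axis=1))\n",
"print(loss)                       # decreases as training proceeds\n"
]
},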
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## References\n",
"\n",
"* Softmax & cross-entropy\n",
" * [The cross-entropy cost function (purpose and derivation)](https://blog.csdn.net/u014313009/article/details/51043064)\n",
" * [A hand-worked example that walks through the softmax function and its derivative](https://www.jianshu.com/p/ffa51250ba2e)\n",
" * [An easy-to-follow derivation of the softmax cross-entropy loss](https://www.jianshu.com/p/c02a1fbffad6)"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.5.2"
},
"main_language": "python"
},
"nbformat": 4,
"nbformat_minor": 2
}
