# Softmax & Cross-Entropy Cost Function
`Softmax` is commonly used as the output layer of neural networks for classification tasks. A key step in backpropagation is computing derivatives, and working through the softmax derivation gives a deeper understanding of how backpropagation works, along with more insight into how gradients propagate.

## 1. The softmax function

The `softmax` ("soft maximum") function usually serves as the output layer of a classification network. Its outputs can be read as the probabilities of selecting each class: for a task with three classes, softmax converts the relative magnitudes of the three outputs into three class probabilities that sum to 1.

Literally, softmax splits into `soft` and `max`. `max`, as the name suggests, means taking the maximum. The essence of softmax lies in `soft`, the opposite of `hard`: many scenarios call for finding the single largest element of an array, which is really a `hardmax`. The following implements hardmax with `NumPy`.
```python
import numpy as np

a = np.array([1, 2, 3, 4, 5])  # create an ndarray
a_max = np.max(a)
print(a_max)  # 5
```
The example above shows that hardmax's defining trait is picking out exactly one maximum value: all or nothing. In practice this is often unreasonable. In text classification, for instance, an article usually touches on several topics to varying degrees, so we would rather obtain a probability (confidence) value for each candidate class, read as how plausibly the article belongs to that class. This is where `soft` comes in: **softmax no longer singles out one maximum; instead it assigns every output class a probability that expresses how likely the input is to belong to it.**

The softmax function has the form:

$$
S_i = \frac{e^{z_i}}{\sum_k e^{z_k}}
$$

* $S_i$ is the class-probability output of softmax
* $z_k$ is the output of the $k$-th neuron

This is illustrated in the figure below:

![softmax_demo](images/softmax_demo.png)

Plainly put, softmax maps the original outputs $[3, 1, -3]$ to values in $(0, 1)$ whose sum is 1 (satisfying the properties of a probability distribution), so we can interpret them as probabilities. When selecting the output node at the end, we pick the node with the largest probability (i.e., the largest value) as the prediction.
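As a quick check of this claim, here is a minimal NumPy sketch that maps $[3, 1, -3]$ to probabilities; the `softmax` helper is named here for illustration and is not part of the notebook's original code:

```python
import numpy as np

def softmax(z):
    # Subtracting the max is a standard numerical-stability trick;
    # it leaves the result unchanged.
    e = np.exp(z - np.max(z))
    return e / np.sum(e)

a = softmax(np.array([3.0, 1.0, -3.0]))
print(a)        # approximately [0.8789 0.1189 0.0022]
print(a.sum())  # 1.0
```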
Start from the output of a single neuron, shown in the figure below:

![softmax_neuron](images/softmax_neuron.png)

Define the neuron's output as:

$$
z_i = \sum_{j} w_{ij} x_{j} + w_b
$$

where $w_{ij}$ is the $j$-th weight of the $i$-th neuron and $w_b$ is the bias; $z_i$ is the $i$-th output of the network. **Note that no activation function such as sigmoid is applied here.**

Adding a softmax function on top of this output gives:

$$
a_i = \frac{e^{z_i}}{\sum_k e^{z_k}}
$$

where $a_i$ is the $i$-th softmax output; the right-hand side is simply the softmax function applied to the $z$ values.
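Putting the two formulas together, a forward pass through such a layer might look like the following sketch; the layer sizes and random values are made up for illustration:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - np.max(z))
    return e / np.sum(e)

rng = np.random.default_rng(0)
W = rng.normal(size=(3, 4))          # W[i, j]: j-th weight of neuron i (3 classes, 4 inputs)
b = rng.normal(size=3)               # one bias per neuron
x = np.array([1.0, 0.5, -0.2, 2.0])  # an arbitrary input sample

z = W @ x + b                        # z_i = sum_j w_ij * x_j + w_b
a = softmax(z)                       # class probabilities
print(a, a.sum())                    # the probabilities sum to 1
```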
## 2. The cross-entropy loss function

Backpropagation needs a loss function that measures the error between the true values and the network's estimates; only once we know the error can we know how to adjust the network's weights.

One design goal of neural networks is to let machines learn the way people do. **When people analyze something new and discover a large mistake, they correct it forcefully.** Take shooting a basketball: the further a player's shot deviates from the right direction, the larger the adjustment to the shooting angle should be, making the ball more likely to go in. Likewise, we want an ANN to adjust its parameters more during backpropagation when the gap between prediction and ground truth is larger, so that training converges faster. However, **if an ANN is trained with the quadratic cost function, the observed behavior is the opposite: the larger the error, the smaller the parameter updates may be, and the slower the training.**
Take binary classification with a single neuron and run two experiments (using the sigmoid activation that is common in neural networks). Both experiments feed in the same sample $x = 1.0$ (whose true class is $y = 0$); each initializes its parameters randomly, so the first forward pass yields a different output value and hence a different cost (error):

![cross_entropy_loss_1](images/cross_entropy_loss_1.png)
Experiment 1: initial output 0.82

![cross_entropy_loss_2](images/cross_entropy_loss_2.png)
Experiment 2: initial output 0.98

In experiment 1, the random initialization makes the first output 0.82 (the true value is 0); after 300 training iterations the output falls from 0.82 to 0.09, approaching the true value. In experiment 2 the first output is 0.98, and after the same 300 iterations it only falls to 0.20.

The sigmoid activation function commonly used in neural networks has the following curve:
![cross_entropy_loss_sigmod.png](images/cross_entropy_loss_sigmod.png)

As the figure shows, the gradient at experiment 2's initial output (0.98) is clearly smaller than at experiment 1's (0.82), so experiment 2's parameters descend more slowly. This is why a larger initial cost (error) leads to slower training here: it contradicts our expectation that, as with people, a larger mistake should produce a larger correction and hence faster learning.
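The numbers behind this slowdown are easy to reproduce. For the quadratic cost $C = \frac{1}{2}(a - y)^2$ with sigmoid output $a$, the gradient is $\partial C / \partial z = (a - y)\, a (1 - a)$, and the $a(1-a)$ factor collapses as the neuron saturates; a minimal sketch:

```python
# Gradient of the quadratic cost w.r.t. z for a sigmoid neuron:
# dC/dz = (a - y) * a * (1 - a), where a is the sigmoid output.
y = 0.0
for a in (0.82, 0.98):
    grad = (a - y) * a * (1 - a)
    print(f"a = {a}: dC/dz = {grad:.4f}")
# a = 0.82 -> 0.1210, a = 0.98 -> 0.0192: despite the larger error,
# the saturated neuron receives a roughly 6x smaller gradient.
```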
The loss function can take many forms. Here we use the cross-entropy function, mainly because its derivative is simple and cheap to compute, and because cross-entropy cures the slow learning that afflicts certain other loss functions. The **[cross-entropy function](https://blog.csdn.net/u014313009/article/details/51043064)** is:

$$
C = - \sum_i y_i \ln a_i
$$

where $y_i$ is the true classification result.
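Concretely, with a one-hot label only one term of the sum survives; a small sketch with made-up numbers:

```python
import numpy as np

y = np.array([0.0, 1.0, 0.0])   # one-hot label: the true class is class 1
a = np.array([0.1, 0.7, 0.2])   # a hypothetical softmax output

C = -np.sum(y * np.log(a))      # C = -sum_i y_i ln a_i = -ln(0.7)
print(C)                        # ~0.3567
```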
## 3. Derivation

First, let's be clear about what we want: the gradient of the loss with respect to the neuron outputs $z_i$, i.e.:

$$
\frac{\partial C}{\partial z_i}
$$

By the chain rule for composite functions:

$$
\frac{\partial C}{\partial z_i} = \sum_j \frac{\partial C}{\partial a_j} \frac{\partial a_j}{\partial z_i}
$$

You might wonder why $a_j$ appears here rather than just $a_i$. Look again at the softmax formula: its denominator contains the outputs of all neurons, so every output $a_j$ with $j \ne i$ also depends on $z_i$. All of the $a$ values must therefore enter the computation (hence the sum over $j$), and as the derivation below shows, the two cases $i = j$ and $i \ne j$ must be differentiated separately.
### 3.1 Partial derivative with respect to $a_j$

Only the $k = j$ term of the sum depends on $a_j$, so:

$$
\frac{\partial C}{\partial a_j} = \frac{\partial \left(-\sum_k y_k \ln a_k\right)}{\partial a_j} = -\frac{y_j}{a_j}
$$
### 3.2 Partial derivative with respect to $z_i$

Both cases below use the quotient rule for two functions $u$, $v$ of the same variable:

$$
\left(\frac{u}{v}\right)' = \frac{u'v - uv'}{v^2}
$$

If $i = j$:

\begin{eqnarray}
\frac{\partial a_i}{\partial z_i} & = & \frac{\partial (\frac{e^{z_i}}{\sum_k e^{z_k}})}{\partial z_i} \\
 & = & \frac{e^{z_i} \sum_k e^{z_k} - (e^{z_i})^2}{(\sum_k e^{z_k})^2} \\
 & = & \left(\frac{e^{z_i}}{\sum_k e^{z_k}}\right) \left(1 - \frac{e^{z_i}}{\sum_k e^{z_k}}\right) \\
 & = & a_i (1 - a_i)
\end{eqnarray}

If $i \ne j$:

\begin{eqnarray}
\frac{\partial a_j}{\partial z_i} & = & \frac{\partial (\frac{e^{z_j}}{\sum_k e^{z_k}})}{\partial z_i} \\
 & = & \frac{0 \cdot \sum_k e^{z_k} - e^{z_j} \cdot e^{z_i}}{(\sum_k e^{z_k})^2} \\
 & = & - \frac{e^{z_j}}{\sum_k e^{z_k}} \cdot \frac{e^{z_i}}{\sum_k e^{z_k}} \\
 & = & -a_j a_i
\end{eqnarray}
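Together the two cases say the softmax Jacobian is $\partial a_j / \partial z_i = a_i(\delta_{ij} - a_j)$, i.e. $\mathrm{diag}(a) - a a^\top$ in matrix form. A finite-difference sketch (with an arbitrarily chosen $z$) confirms it:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - np.max(z))
    return e / np.sum(e)

z = np.array([3.0, 1.0, -3.0])
a = softmax(z)

# Analytic Jacobian from the two cases: a_i(1 - a_i) on the diagonal, -a_j a_i off it.
J_analytic = np.diag(a) - np.outer(a, a)

# Numerical Jacobian via central differences.
eps = 1e-6
J_numeric = np.zeros((3, 3))
for i in range(3):
    dz = np.zeros(3)
    dz[i] = eps
    J_numeric[:, i] = (softmax(z + dz) - softmax(z - dz)) / (2 * eps)

print(np.max(np.abs(J_analytic - J_numeric)))  # on the order of 1e-10
```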
### 3.3 Putting it together

\begin{eqnarray}
\frac{\partial C}{\partial z_i} & = & \sum_j \left(-\frac{y_j}{a_j}\right) \frac{\partial a_j}{\partial z_i} \\
 & = & - \frac{y_i}{a_i} a_i ( 1 - a_i) + \sum_{j \ne i} \frac{y_j}{a_j} a_i a_j \\
 & = & -y_i + y_i a_i + \sum_{j \ne i} y_j a_i \\
 & = & -y_i + a_i \sum_{j} y_j \\
 & = & a_i - y_i
\end{eqnarray}

The last step uses $\sum_j y_j = 1$, which holds because $y$ is a one-hot label (or, more generally, a probability distribution).
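The simplicity of $a_i - y_i$ is easy to verify numerically; here is a sketch comparing it against central differences of the loss, with arbitrarily chosen values:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - np.max(z))
    return e / np.sum(e)

def cross_entropy(z, y):
    return -np.sum(y * np.log(softmax(z)))

z = np.array([3.0, 1.0, -3.0])
y = np.array([0.0, 1.0, 0.0])     # one-hot, so sum(y) = 1

grad_analytic = softmax(z) - y    # the a_i - y_i result derived above

eps = 1e-6
grad_numeric = np.zeros(3)
for i in range(3):
    dz = np.zeros(3)
    dz[i] = eps
    grad_numeric[i] = (cross_entropy(z + dz, y) - cross_entropy(z - dz, y)) / (2 * eps)

print(grad_analytic)
print(grad_numeric)               # matches to ~1e-9
```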
### 3.4 Parameter update

The partial derivative of the error with respect to a weight is (recall that $\partial z_i / \partial w_{ij} = x_j$):

$$
\frac{\partial C}{\partial w_{ij}} = (a_i - y_i)\, x_j
$$

The error term is:

$$
\delta_i = -(a_i - y_i) = y_i - a_i
$$

The parameter update rule is:

$$
w_{ij} = w_{ij} + \eta\, \delta_i x_j
$$

where

$$
a_i = \frac{e^{z_i}}{\sum_k e^{z_k}}
$$

$$
z_i = \sum_{j} w_{ij} x_{j} + w_b
$$
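A single-sample gradient-descent loop built from these update equations might look like the following sketch; the sizes, learning rate, and random data are all made up for illustration:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - np.max(z))
    return e / np.sum(e)

rng = np.random.default_rng(1)
W = rng.normal(size=(3, 4))          # 3 classes, 4 inputs
b = np.zeros(3)
x = rng.normal(size=4)
y = np.array([0.0, 1.0, 0.0])        # one-hot target
eta = 0.1                            # learning rate

for step in range(5):
    a = softmax(W @ x + b)
    delta = y - a                    # delta_i = y_i - a_i
    W += eta * np.outer(delta, x)    # w_ij += eta * delta_i * x_j
    b += eta * delta                 # the bias gradient is just delta
    print(f"step {step}: loss = {-np.sum(y * np.log(a)):.4f}")
# the printed loss shrinks step by step on this single sample
```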
### 3.5 Update equations for the quadratic cost

For comparison, the update equations when using the quadratic cost function are:

$$
\delta_j = a_j (1-a_j) (y_j - a_j)
$$

$$
w_{ji} = w_{ji} + \eta \delta_j x_{ji}
$$

Note that $w_{ji}$ here is indexed differently from the definition above!
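Comparing the two error terms at the saturated output from section 2 makes the difference concrete (a sketch reusing the same numbers):

```python
# Error terms for output a = 0.98 with target y = 0 (the slow case from section 2).
a, y = 0.98, 0.0
delta_ce  = y - a                    # cross-entropy:  -0.98
delta_mse = a * (1 - a) * (y - a)    # quadratic cost: -0.0192
print(delta_ce, delta_mse)
# the quadratic-cost delta is ~50x smaller: the extra a(1-a) factor
# is what slows learning down when the neuron saturates
```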
## 4. Question

How can the softmax and cross-entropy cost function introduced in this section be applied to the BP (backpropagation) method from the previous section?
## References

* [A Detailed Explanation of the Softmax Function](https://zhuanlan.zhihu.com/p/105722023)
* [Loss Functions: Cross-Entropy Explained](https://zhuanlan.zhihu.com/p/115277553)
* [The Cross-Entropy Cost Function (Purpose and Derivation)](https://blog.csdn.net/u014313009/article/details/51043064)
* [A Hand-Worked, Step-by-Step Guide to the Softmax Function and Its Derivatives](https://www.jianshu.com/p/ffa51250ba2e)
* [An Easy-to-Follow Derivation of the Softmax Cross-Entropy Loss](https://www.jianshu.com/p/c02a1fbffad6)