{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Softmax & Cross-entropy cost function\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Softmax is often added as the output layer of a neural network for classification tasks. The key step when backpropagating through it is taking its derivative; working through this derivation also gives a deeper understanding of backpropagation and of how gradients propagate.\n",
"\n",
"## 1. softmax function\n",
"\n",
"The softmax ('soft maximum') function usually serves as the output layer for classification tasks in a neural network. We can interpret the softmax output as the probability of selecting each category. For example, in a classification task with three classes, the softmax function outputs the probability of selecting each of the three classes according to their relative sizes, and the probabilities sum to 1.\n",
"\n",
"The softmax function has the form:\n",
"\n",
"$$\n",
"S_i = \\frac{e^{z_i}}{\\sum_k e^{z_k}}\n",
"$$\n",
"\n",
"* $S_i$ is the class probability output by the softmax\n",
"* $z_k$ is the output of the $k$-th neuron\n",
"\n",
"A more vivid illustration is given in the following graph:\n",
"\n",
"![softmax_demo](images/softmax_demo.png)\n",
"\n",
"Put simply, softmax takes the original outputs, e.g. $[3, 1, 3]$, and maps them to values in $(0, 1)$ whose sum is 1 (satisfying the properties of a probability distribution). We can then interpret these values as probabilities and, when choosing an output node, select the one with the largest probability (i.e. the largest value) as our prediction.\n"
]
},
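{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a quick sanity check, here is a minimal NumPy sketch of the softmax function (not part of the original derivation; the shift by `np.max(z)` is a standard numerical-stability trick that cancels in the ratio, and the input $[3, 1, 3]$ matches the example above):\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"\n",
"def softmax(z):\n",
"    # subtract the max before exponentiating: it cancels in the ratio\n",
"    # and avoids overflow for large z\n",
"    e = np.exp(z - np.max(z))\n",
"    return e / e.sum()\n",
"\n",
"a = softmax(np.array([3.0, 1.0, 3.0]))\n",
"print(a)        # each value lies in (0, 1)\n",
"print(a.sum())  # the values sum to 1\n"
]
},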
{
"cell_type": "markdown",
"metadata": {},
"source": [
"First, consider the output of a neuron; the following graph shows one neuron:\n",
"\n",
"![softmax_neuron](images/softmax_neuron.png)\n",
"\n",
"We assume that the output of the neuron is:\n",
"\n",
"$$\n",
"z_i = \\sum_{j} w_{ij} x_{j} + b\n",
"$$\n",
"\n",
"where $w_{ij}$ is the $j$-th weight of the $i$-th neuron and $b$ is the bias. $z_i$ is the $i$-th output of the network.\n",
"\n",
"Adding a softmax function to this output, we have:\n",
"\n",
"$$\n",
"a_i = \\frac{e^{z_i}}{\\sum_k e^{z_k}}\n",
"$$\n",
"\n",
"where $a_i$ is the $i$-th output value of the softmax, and the right-hand side applies the softmax function to $z_i$.\n"
]
},
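{
"cell_type": "markdown",
"metadata": {},
"source": [
"To make the notation concrete, a small sketch of the neuron outputs $z_i = \\sum_j w_{ij} x_j + b$ followed by softmax (the shapes and random values are illustrative assumptions, not taken from the text):\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"rng = np.random.default_rng(0)\n",
"W = rng.normal(size=(3, 4))  # W[i, j]: the j-th weight of the i-th neuron\n",
"x = rng.normal(size=4)       # input vector\n",
"b = 1.0                      # bias, a scalar as in the formula above\n",
"\n",
"z = W @ x + b                # z_i = sum_j w_ij * x_j + b\n",
"print(softmax(z))            # class probabilities, summing to 1\n"
]
},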
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 1.1 loss function\n",
"\n",
"During the training of a neural network we need to compute a loss function, which is the error between the true values and the network's estimates. Only when we have this error is it possible to know how to change the weights in the network.\n",
"\n",
"There are many forms of loss function; the one used here is the cross-entropy function, mainly because its derivative turns out to be very simple and convenient to compute, and because cross entropy can alleviate the problem of slow learning. The **[cross-entropy function](https://blog.csdn.net/u014313009/article/details/51043064)** is:\n",
"\n",
"$$\n",
"C = - \\sum_i y_i \\ln a_i\n",
"$$\n",
"\n",
"where $y_i$ is the true classification label.\n"
]
},
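{
"cell_type": "markdown",
"metadata": {},
"source": [
"A minimal sketch of the cross-entropy loss for a single sample, assuming a one-hot label vector `y` and reusing the `softmax` defined above (the small `eps` only guards against `log(0)` and is not part of the formula):\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"def cross_entropy(a, y, eps=1e-12):\n",
"    # C = -sum_i y_i * ln(a_i)\n",
"    return -np.sum(y * np.log(a + eps))\n",
"\n",
"z = np.array([3.0, 1.0, 3.0])\n",
"y = np.array([1.0, 0.0, 0.0])  # one-hot: the true class is class 0\n",
"print(cross_entropy(softmax(z), y))\n"
]
},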
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 2. Derivation process\n",
"\n",
"First, we need to be clear about what we want: the gradient of the $loss$ with respect to the neuron output $z_i$, which is:\n",
"\n",
"$$\n",
"\\frac{\\partial C}{\\partial z_i}\n",
"$$\n",
"\n",
"According to the chain rule for composite functions:\n",
"\n",
"$$\n",
"\\frac{\\partial C}{\\partial z_i} = \\sum_j \\frac{\\partial C}{\\partial a_j} \\frac{\\partial a_j}{\\partial z_i}\n",
"$$\n",
"\n",
"One may ask why we sum over all $a_j$ instead of just using $a_i$. Looking at the formula of $softmax$, because of its special structure its denominator contains the outputs of all neurons, so every output $a_j$ depends on $z_i$ even when $j \\ne i$. Therefore all the $a_j$ must be included in the calculation, and the backward calculation needs to be divided into two cases: $i = j$ and $i \\ne j$.\n",
"\n",
"### 2.1 The partial derivative with respect to $a_j$\n",
"\n",
"$$\n",
"\\frac{\\partial C}{\\partial a_j} = \\frac{\\partial (-\\sum_k y_k \\ln a_k)}{\\partial a_j} = - \\frac{y_j}{a_j}\n",
"$$\n",
"\n",
"### 2.2 The partial derivative with respect to $z_i$\n",
"\n",
"If $i = j$:\n",
"\n",
"\\begin{eqnarray}\n",
"\\frac{\\partial a_i}{\\partial z_i} & = & \\frac{\\partial (\\frac{e^{z_i}}{\\sum_k e^{z_k}})}{\\partial z_i} \\\\\n",
" & = & \\frac{\\sum_k e^{z_k} e^{z_i} - (e^{z_i})^2}{(\\sum_k e^{z_k})^2} \\\\\n",
" & = & (\\frac{e^{z_i}}{\\sum_k e^{z_k}} ) (1 - \\frac{e^{z_i}}{\\sum_k e^{z_k}} ) \\\\\n",
" & = & a_i (1 - a_i)\n",
"\\end{eqnarray}\n",
"\n",
"If $i \\ne j$:\n",
"\n",
"\\begin{eqnarray}\n",
"\\frac{\\partial a_j}{\\partial z_i} & = & \\frac{\\partial (\\frac{e^{z_j}}{\\sum_k e^{z_k}})}{\\partial z_i} \\\\\n",
" & = & \\frac{0 \\cdot \\sum_k e^{z_k} - e^{z_j} \\cdot e^{z_i} }{(\\sum_k e^{z_k})^2} \\\\\n",
" & = & - \\frac{e^{z_j}}{\\sum_k e^{z_k}} \\cdot \\frac{e^{z_i}}{\\sum_k e^{z_k}} \\\\\n",
" & = & -a_j a_i\n",
"\\end{eqnarray}\n",
"\n",
"Both cases use the quotient rule: if $u$ and $v$ are functions of the variable being differentiated, then\n",
"\n",
"$$\n",
"(\\frac{u}{v})' = \\frac{u'v - uv'}{v^2}\n",
"$$\n"
]
},
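{
"cell_type": "markdown",
"metadata": {},
"source": [
"A quick numerical check of the two cases above, reusing the `softmax` sketch defined earlier: the analytic Jacobian entries $a_i(1 - a_i)$ (for $i = j$) and $-a_j a_i$ (for $i \\ne j$) should match central finite differences. The helper names here are illustrative, not from the text.\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"def softmax_jacobian(z):\n",
"    # analytic Jacobian: J[i, j] = da_j/dz_i = a_i (delta_ij - a_j)\n",
"    a = softmax(z)\n",
"    return np.diag(a) - np.outer(a, a)\n",
"\n",
"def numeric_jacobian(z, h=1e-6):\n",
"    # central finite differences: perturb one z_i at a time\n",
"    J = np.zeros((len(z), len(z)))\n",
"    for i in range(len(z)):\n",
"        zp, zm = z.copy(), z.copy()\n",
"        zp[i] += h\n",
"        zm[i] -= h\n",
"        J[i] = (softmax(zp) - softmax(zm)) / (2 * h)\n",
"    return J\n",
"\n",
"z = np.array([3.0, 1.0, 3.0])\n",
"print(np.max(np.abs(softmax_jacobian(z) - numeric_jacobian(z))))  # ~1e-10\n"
]
},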
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 2.3 Derivation of the whole\n",
"\n",
"\\begin{eqnarray}\n",
"\\frac{\\partial C}{\\partial z_i} & = & \\sum_j (- \\frac{y_j}{a_j}) \\frac{\\partial a_j}{\\partial z_i} \\\\\n",
" & = & - \\frac{y_i}{a_i} a_i ( 1 - a_i) + \\sum_{j \\ne i} \\frac{y_j}{a_j} a_i a_j \\\\\n",
" & = & -y_i + y_i a_i + \\sum_{j \\ne i} y_j a_i \\\\\n",
" & = & -y_i + a_i \\sum_{j} y_j \\\\\n",
" & = & -y_i + a_i\n",
"\\end{eqnarray}\n",
"\n",
"The last step uses the fact that the label vector is one-hot (a probability distribution), so $\\sum_j y_j = 1$."
]
},
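{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a sanity check of the final result $\\frac{\\partial C}{\\partial z_i} = a_i - y_i$, we can compare the analytic gradient with central finite differences of the loss, reusing the `softmax` and `cross_entropy` sketches from above:\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"def grad_numeric(z, y, h=1e-6):\n",
"    # central finite differences of C(z) = cross_entropy(softmax(z), y)\n",
"    g = np.zeros_like(z)\n",
"    for i in range(len(z)):\n",
"        zp, zm = z.copy(), z.copy()\n",
"        zp[i] += h\n",
"        zm[i] -= h\n",
"        g[i] = (cross_entropy(softmax(zp), y) - cross_entropy(softmax(zm), y)) / (2 * h)\n",
"    return g\n",
"\n",
"z = np.array([3.0, 1.0, 3.0])\n",
"y = np.array([0.0, 1.0, 0.0])  # one-hot label\n",
"print(softmax(z) - y)          # analytic gradient a - y\n",
"print(grad_numeric(z, y))      # should agree to ~1e-6\n"
]
},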
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 3. Question\n",
"How can the softmax and cross-entropy cost function from this section be applied to the BP method from the previous section?"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## References\n",
"\n",
"* Softmax & cross entropy\n",
" * [Cross-entropy cost function (purpose and derivation)](https://blog.csdn.net/u014313009/article/details/51043064)\n",
" * [A worked example walking you step by step through the softmax function and its derivative](https://www.jianshu.com/p/ffa51250ba2e)\n",
" * [An easy-to-understand derivation of the softmax cross-entropy loss](https://www.jianshu.com/p/c02a1fbffad6)"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.8"
}
},
"nbformat": 4,
"nbformat_minor": 2
}
