
knn_classification_EN.ipynb 88 kB


  1. {
  2. "cells": [
  3. {
  4. "cell_type": "markdown",
  5. "metadata": {},
  6. "source": [
  7. "# kNN Classification\n",
  8. "\n",
  9. "\n",
  10. "The K-Nearest Neighbor (kNN) classification algorithm is theoretically mature and one of the simplest machine learning algorithms. Its idea is: ***if most of the k samples most similar to a given sample (that is, its nearest neighbors in feature space) belong to one category, then the sample is assigned to that category as well.*** Although kNN relies on limit theorems in principle, each classification decision involves only a small number of neighboring samples. Because kNN decides from these few neighbors rather than by discriminating class regions, it can be better suited than region-based methods to sample sets whose class regions intersect or overlap heavily. \n",
  11. "\n",
  12. "The kNN algorithm can be used not only for classification but also for regression. To estimate an attribute of a sample, find its k nearest neighbors and assign the average of their attribute values to the sample. A more useful variant weights each neighbor's influence by its distance to the sample, for example with weights inversely proportional to the distance, so that closer neighbors count more.\n",
  13. "\n",
  14. "The main disadvantage of the algorithm in classification appears when the classes are unbalanced, for example when one class has very many samples and the other classes have very few. When a new sample is input, samples of the large class may then make up most of its k neighbors, which can produce a misclassification. We therefore need to reduce the influence of class size on the result. \n",
  15. "Another disadvantage is the computational cost: the distance between each sample to be classified and every known sample must be calculated in order to find its k nearest neighbors. A commonly used remedy is to prune the known samples in advance, removing those that have little effect on classification. The algorithm is better suited to automatic classification of classes with many samples, while classes with few samples are more prone to misclassification.\n",
  16. "\n",
  17. "kNN is one of the most direct ways to classify unknown data. The following picture and description show what kNN does:\n",
  18. "![knn](images/knn.png)\n",
  19. "\n",
  20. "In short, kNN can be seen as follows: **you have a set of data whose classes you already know. \n",
  21. "When a new data point arrives, you compute its distance to every point in the training data, pick the K training points closest to the new point, check which classes they belong to, and classify the new point by majority vote.**\n"
  22. ]
  23. },
  24. {
  25. "cell_type": "markdown",
  26. "metadata": {},
  27. "source": [
  28. "## Algorithm steps:\n",
  29. "\n",
  30. "* step.1---Import the training samples\n",
  31. "* step.2---Convert the features of the samples into numbers\n",
  32. "* step.3---Calculate the distance between the unknown sample and a training sample.\n",
  33. "* step.4---Record the distance calculated in step 3 and save the category to which the training sample belongs.\n",
  34. "* step.5---Repeat steps 3 and 4 until the distances to all training samples have been calculated.\n",
  35. "* step.6---Sort the training samples by their distance to the unknown sample and find the K nearest samples.\n",
  36. "* step.7---Count the number of occurrences of each class label among the K nearest samples\n",
  37. "* step.8---Choose the label with the highest occurrence frequency as the class label of the unknown sample"
  38. ]
  39. },
  40. {
  41. "cell_type": "markdown",
  42. "metadata": {},
  43. "source": [
  44. "## Generate data"
  45. ]
  46. },
  47. {
  48. "cell_type": "code",
  49. "execution_count": 1,
  50. "metadata": {},
  51. "outputs": [
  52. {
  53. "data": {
  54. "image/png": "\n",
  55. "text/plain": [
  56. "<Figure size 432x288 with 1 Axes>"
  57. ]
  58. },
  59. "metadata": {
  60. "needs_background": "light"
  61. },
  62. "output_type": "display_data"
  63. },
  64. {
  65. "data": {
  66. "image/png": "\n",
  67. "text/plain": [
  68. "<Figure size 432x288 with 1 Axes>"
  69. ]
  70. },
  71. "metadata": {
  72. "needs_background": "light"
  73. },
  74. "output_type": "display_data"
  75. }
  76. ],
  77. "source": [
  78. "%matplotlib inline\n",
  79. "\n",
  80. "import numpy as np\n",
  81. "import matplotlib.pyplot as plt\n",
  82. "\n",
  83. "# data generation\n",
  84. "np.random.seed(314)\n",
  85. "data_size_1 = 300\n",
  86. "x1_1 = np.random.normal(loc=5.0, scale=1.0, size=data_size_1)\n",
  87. "x2_1 = np.random.normal(loc=4.0, scale=1.0, size=data_size_1)\n",
  88. "y_1 = [0 for _ in range(data_size_1)]\n",
  89. "\n",
  90. "data_size_2 = 400\n",
  91. "x1_2 = np.random.normal(loc=10.0, scale=2.0, size=data_size_2)\n",
  92. "x2_2 = np.random.normal(loc=8.0, scale=2.0, size=data_size_2)\n",
  93. "y_2 = [1 for _ in range(data_size_2)]\n",
  94. "\n",
  95. "x1 = np.concatenate((x1_1, x1_2), axis=0)\n",
  96. "x2 = np.concatenate((x2_1, x2_2), axis=0)\n",
  97. "x = np.hstack((x1.reshape(-1,1), x2.reshape(-1,1)))\n",
  98. "y = np.concatenate((y_1, y_2), axis=0)\n",
  99. "\n",
  100. "data_size_all = data_size_1+data_size_2\n",
  101. "shuffled_index = np.random.permutation(data_size_all)\n",
  102. "x = x[shuffled_index]\n",
  103. "y = y[shuffled_index]\n",
  104. "\n",
  105. "split_index = int(data_size_all*0.7)\n",
  106. "x_train = x[:split_index]\n",
  107. "y_train = y[:split_index]\n",
  108. "x_test = x[split_index:]\n",
  109. "y_test = y[split_index:]\n",
  110. "\n",
  111. "# visualize data\n",
  112. "plt.scatter(x_train[:,0], x_train[:,1], c=y_train, marker='.')\n",
  113. "plt.title(\"train data\")\n",
  114. "plt.show()\n",
  115. "plt.scatter(x_test[:,0], x_test[:,1], c=y_test, marker='.')\n",
  116. "plt.title(\"test data\")\n",
  117. "plt.show()\n"
  118. ]
  119. },
  120. {
  121. "cell_type": "markdown",
  122. "metadata": {},
  123. "source": [
  124. "## Program"
  125. ]
  126. },
  127. {
  128. "cell_type": "code",
  129. "execution_count": 2,
  130. "metadata": {},
  131. "outputs": [],
  132. "source": [
  133. "import numpy as np\n",
  134. "import operator\n",
  135. "\n",
  136. "class KNN(object):\n",
  137. "\n",
  138. "    def __init__(self, k=3):\n",
  139. "        self.k = k\n",
  140. "\n",
  141. "    def fit(self, x, y):\n",
  142. "        self.x = x\n",
  143. "        self.y = y\n",
  144. "\n",
  145. "    def _square_distance(self, v1, v2):\n",
  146. "        return np.sum(np.square(v1-v2))\n",
  147. "\n",
  148. "    def _vote(self, ys):\n",
  149. "        ys_unique = np.unique(ys)\n",
  150. "        vote_dict = {}\n",
  151. "        for y in ys:\n",
  152. "            if y not in vote_dict.keys():\n",
  153. "                vote_dict[y] = 1\n",
  154. "            else:\n",
  155. "                vote_dict[y] += 1\n",
  156. "        sorted_vote_dict = sorted(vote_dict.items(), key=operator.itemgetter(1), reverse=True)\n",
  157. "        return sorted_vote_dict[0][0]\n",
  158. "\n",
  159. "    def predict(self, x):\n",
  160. "        y_pred = []\n",
  161. "        for i in range(len(x)):\n",
  162. "            dist_arr = [self._square_distance(x[i], self.x[j]) for j in range(len(self.x))]\n",
  163. "            sorted_index = np.argsort(dist_arr)\n",
  164. "            top_k_index = sorted_index[:self.k]\n",
  165. "            y_pred.append(self._vote(ys=self.y[top_k_index]))\n",
  166. "        return np.array(y_pred)\n",
  167. "\n",
  168. "    def score(self, y_true=None, y_pred=None):\n",
  169. "        if y_true is None and y_pred is None:\n",
  170. "            y_pred = self.predict(self.x)\n",
  171. "            y_true = self.y\n",
  172. "        score = 0.0\n",
  173. "        for i in range(len(y_true)):\n",
  174. "            if y_true[i] == y_pred[i]:\n",
  175. "                score += 1\n",
  176. "        score /= len(y_true)\n",
  177. "        return score"
  178. ]
  179. },
  180. {
  181. "cell_type": "code",
  182. "execution_count": 3,
  183. "metadata": {},
  184. "outputs": [
  185. {
  186. "name": "stdout",
  187. "output_type": "stream",
  188. "text": [
  189. "train accuracy: 0.986\n",
  190. "test accuracy: 0.957\n"
  191. ]
  192. }
  193. ],
  194. "source": [
  195. "# data preprocessing\n",
  196. "x_train = (x_train - np.min(x_train, axis=0)) / (np.max(x_train, axis=0) - np.min(x_train, axis=0))\n",
  197. "x_test = (x_test - np.min(x_test, axis=0)) / (np.max(x_test, axis=0) - np.min(x_test, axis=0))\n",
  198. "\n",
  199. "# knn classifier\n",
  200. "clf = KNN(k=3)\n",
  201. "clf.fit(x_train, y_train)\n",
  202. "\n",
  203. "print('train accuracy: {:.3}'.format(clf.score()))\n",
  204. "\n",
  205. "y_test_pred = clf.predict(x_test)\n",
  206. "print('test accuracy: {:.3}'.format(clf.score(y_test, y_test_pred)))"
  207. ]
  208. },
  209. {
  210. "cell_type": "markdown",
  211. "metadata": {},
  212. "source": [
  213. "## sklearn program"
  214. ]
  215. },
  216. {
  217. "cell_type": "code",
  218. "execution_count": 7,
  219. "metadata": {},
  220. "outputs": [
  221. {
  222. "name": "stdout",
  223. "output_type": "stream",
  224. "text": [
  225. "Feature dimensions: (1797, 64)\n",
  226. "Label dimensions: (1797,)\n"
  227. ]
  228. }
  229. ],
  230. "source": [
  231. "%matplotlib inline\n",
  232. "\n",
  233. "import matplotlib.pyplot as plt\n",
  234. "from sklearn import datasets, neighbors, linear_model\n",
  235. "\n",
  236. "# load data\n",
  237. "digits = datasets.load_digits()\n",
  238. "X_digits = digits.data\n",
  239. "y_digits = digits.target\n",
  240. "\n",
  241. "print(\"Feature dimensions: \", X_digits.shape)\n",
  242. "print(\"Label dimensions: \", y_digits.shape)\n"
  243. ]
  244. },
  245. {
  246. "cell_type": "code",
  247. "execution_count": 8,
  248. "metadata": {},
  249. "outputs": [
  250. {
  251. "data": {
  252. "image/png": "\n",
  253. "text/plain": [
  254. "<Figure size 432x288 with 10 Axes>"
  255. ]
  256. },
  257. "metadata": {
  258. "needs_background": "light"
  259. },
  260. "output_type": "display_data"
  261. }
  262. ],
  263. "source": [
  264. "# plot sample images\n",
  265. "nplot = 10\n",
  266. "fig, axes = plt.subplots(nrows=1, ncols=nplot)\n",
  267. "\n",
  268. "for i in range(nplot):\n",
  269. "    img = X_digits[i].reshape(8, 8)\n",
  270. "    axes[i].imshow(img)\n",
  271. "    axes[i].set_title(y_digits[i])\n"
  272. ]
  273. },
  274. {
  275. "cell_type": "code",
  276. "execution_count": 9,
  277. "metadata": {},
  278. "outputs": [],
  279. "source": [
  280. "# split train / test data\n",
  281. "n_samples = len(X_digits)\n",
  282. "n_train = int(0.4 * n_samples)\n",
  283. "\n",
  284. "X_train = X_digits[:n_train]\n",
  285. "y_train = y_digits[:n_train]\n",
  286. "X_test = X_digits[n_train:]\n",
  287. "y_test = y_digits[n_train:]\n"
  288. ]
  289. },
  290. {
  291. "cell_type": "code",
  292. "execution_count": 12,
  293. "metadata": {},
  294. "outputs": [
  295. {
  296. "name": "stdout",
  297. "output_type": "stream",
  298. "text": [
  299. "KNN score: 0.953661\n",
  300. "LogisticRegression score: 0.908248\n"
  301. ]
  302. }
  303. ],
  304. "source": [
  305. "# do KNN classification\n",
  306. "knn = neighbors.KNeighborsClassifier()\n",
  307. "logistic = linear_model.LogisticRegression()\n",
  308. "\n",
  309. "print('KNN score: %f' % knn.fit(X_train, y_train).score(X_test, y_test))\n",
  310. "print('LogisticRegression score: %f' % logistic.fit(X_train, y_train).score(X_test, y_test))"
  311. ]
  312. },
  313. {
  314. "cell_type": "markdown",
  315. "metadata": {},
  316. "source": [
  317. "## References\n",
  318. "* [Digits Classification Exercise](http://scikit-learn.org/stable/auto_examples/exercises/plot_digits_classification_exercise.html)\n",
  319. "* [knn算法的原理与实现](https://zhuanlan.zhihu.com/p/36549000)"
  320. ]
  321. }
  322. ],
  323. "metadata": {
  324. "kernelspec": {
  325. "display_name": "Python 3",
  326. "language": "python",
  327. "name": "python3"
  328. },
  329. "language_info": {
  330. "codemirror_mode": {
  331. "name": "ipython",
  332. "version": 3
  333. },
  334. "file_extension": ".py",
  335. "mimetype": "text/x-python",
  336. "name": "python",
  337. "nbconvert_exporter": "python",
  338. "pygments_lexer": "ipython3",
  339. "version": "3.6.8"
  340. }
  341. },
  342. "nbformat": 4,
  343. "nbformat_minor": 2
  344. }
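
The introductory cell of the notebook above mentions weighting neighbors by their distance, while its `KNN` class uses an unweighted majority vote. The following is a minimal, dependency-free sketch of the weighted variant, assuming weights of 1/distance. The function names `majority_knn_predict` and `weighted_knn_predict`, the `eps` parameter, and the toy data are illustrative assumptions and do not come from the notebook.

```python
import math
from collections import Counter, defaultdict


def _k_nearest(train_points, train_labels, query, k):
    """Return (distance, label) pairs for the k training points closest to query."""
    pairs = []
    for point, label in zip(train_points, train_labels):
        # Euclidean distance between the query and one training point.
        distance = math.sqrt(sum((p - q) ** 2 for p, q in zip(point, query)))
        pairs.append((distance, label))
    return sorted(pairs, key=lambda pair: pair[0])[:k]


def majority_knn_predict(train_points, train_labels, query, k=3):
    """Unweighted vote: each of the k neighbors counts equally."""
    nearest = _k_nearest(train_points, train_labels, query, k)
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def weighted_knn_predict(train_points, train_labels, query, k=3, eps=1e-12):
    """Weighted vote: each neighbor contributes 1 / (distance + eps)."""
    nearest = _k_nearest(train_points, train_labels, query, k)
    scores = defaultdict(float)
    for distance, label in nearest:
        # eps (an assumed safeguard) avoids division by zero for exact matches.
        scores[label] += 1.0 / (distance + eps)
    return max(scores, key=scores.get)


# Toy data (assumed): one class-0 point near the query, two class-1 points far away.
train_points = [(0.0, 0.0), (1.0, 0.0), (1.0, 0.1)]
train_labels = [0, 1, 1]
query = (0.1, 0.0)

majority_label = majority_knn_predict(train_points, train_labels, query, k=3)
weighted_label = weighted_knn_predict(train_points, train_labels, query, k=3)
print(majority_label, weighted_label)  # prints "1 0"
```

With k=3 the unweighted vote follows the two distant class-1 points, whereas the weighted vote follows the single nearby class-0 point.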

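The preprocessing cell of the notebook above rescales `x_test` with the minimum and maximum of the test set itself. A common alternative is to compute these statistics on the training set only and reuse them for the test set, so that both sets share one scale. The sketch below assumes that approach; `fit_min_max`, `apply_min_max`, and the demo arrays are hypothetical names that are not part of the notebook, and applying it to the notebook's data may change the reported accuracies.

```python
import numpy as np


def fit_min_max(x_train):
    """Compute per-feature minimum and range from the training data only."""
    x_train = np.asarray(x_train, dtype=float)
    x_min = x_train.min(axis=0)
    x_range = x_train.max(axis=0) - x_min
    # Guard against constant features, which would give a zero range.
    x_range[x_range == 0] = 1.0
    return x_min, x_range


def apply_min_max(x, x_min, x_range):
    """Rescale any data set with statistics obtained from fit_min_max."""
    return (np.asarray(x, dtype=float) - x_min) / x_range


# Demo arrays (assumed values, not the notebook's generated data).
x_train_demo = np.array([[0.0, 10.0], [5.0, 20.0], [10.0, 30.0]])
x_test_demo = np.array([[2.5, 15.0], [12.0, 30.0]])

x_min, x_range = fit_min_max(x_train_demo)
x_train_scaled = apply_min_max(x_train_demo, x_min, x_range)
x_test_scaled = apply_min_max(x_test_demo, x_min, x_range)
print(x_test_scaled)
```

Test values outside the training range map outside [0, 1] (here 12.0 becomes 1.2), which is expected when the scale is fixed by the training set.
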
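Machine learning is increasingly applied in fields such as aircraft and robotics, with the goal of using computers to achieve human-like intelligence and thereby make equipment intelligent and unmanned. This course aims to guide students to master the basic knowledge, typical methods, and techniques of machine learning, to spark their interest in the subject through concrete application cases, and to encourage them to analyze and solve the problems and challenges faced by aircraft and robots from the perspective of artificial intelligence. The course covers Python programming basics, machine learning models, and the fundamentals and implementation of unsupervised learning, supervised learning, and deep learning. It also teaches how to apply machine learning to practical problems, thereby comprehensively improving students' overall abilities.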