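{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# The kNN Classification Algorithm\n", "\n", "\n", "The k-nearest neighbor (kNN) classification algorithm is a theoretically mature method and one of the simplest machine learning algorithms. The idea is: ***if most of the k samples most similar to a given sample (i.e., its nearest neighbors in feature space) belong to one class, then the sample also belongs to that class***.\n", "\n", "Although kNN in principle also relies on the [limit theorem](https://baike.baidu.com/item/%E6%9E%81%E9%99%90%E5%AE%9A%E7%90%86/13672616), a classification decision involves only a very small number of neighboring samples. Because kNN determines the class from a limited set of nearby samples rather than by delimiting class regions, it is better suited than other methods to sample sets whose class regions overlap or intersect heavily.\n", "\n", "kNN can be used for regression as well as classification: find the `k` nearest neighbors of a sample and assign the average of their target values to it. A more useful variant gives neighbors at different distances different weights, e.g., a weight inversely proportional to the distance, so that closer neighbors have more influence.\n", "\n", "kNN is arguably the most direct way to classify unknown data. The figure and description below are essentially all it takes to understand what kNN does.\n", "![knn](images/knn.png)\n", "\n", "In short, kNN can be viewed as follows: **you have a set of data whose classes are already known. When a new data point arrives, compute its distance to every point in the training data, pick the K training points nearest to it, check which classes those points belong to, and assign the new point to the majority class**.\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Problems with the algorithm:\n", "1. When the samples are imbalanced, e.g., one class has a very large number of samples while the other classes have few, the K neighbors of a new sample may be dominated by the large class, which can lead to misclassification. To reduce the influence of class size, the neighbors can be weighted (neighbors closer to the sample get larger weights).\n", "2. The computational cost is high, because classifying each sample requires computing its distance to all known samples to find its K nearest neighbors. A common remedy is to edit the known samples in advance, removing those that contribute little to classification. The algorithm is best suited to automatic classification of classes with large sample sizes; classes with small sample sizes are more prone to misclassification.\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 1. Algorithm steps\n", "\n", "Input:\n", "* Training data: $T=\\{(x_1,y_1),(x_2,y_2), \\dots, (x_N,y_N)\\}$, where $x_i \\in X=\\mathbb{R}^n$, $y_i \\in Y = \\{0, 1, \\dots, K-1\\}$ (here $K$ is the number of classes), $i=1,2,\\dots,N$\n", "* Input data from the user: $x_u$\n", "\n", "Output: the predicted class $y_{pred}$\n", "\n", "\n", "1. Prepare the data;\n", "2. Compute the **distance** between the test sample and each training sample;\n", "3. Sort the distances in increasing order;\n", "4. Select the `k` points with the smallest distances;\n", "5. Count how often each class occurs among these `k` points;\n", "6. 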
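Return the class that occurs most frequently among the top `k` points as the predicted class of the test sample.\n", "\n", "\n", "\n", "**Further thinking:**\n", "* What are the difficult parts of the procedure above?\n", "* How can each step be expressed in a programming language?\n", "\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 1.1 Distance computation\n", "\n", "There are several ways to measure the distance between points in a space, such as the Manhattan distance and the Euclidean distance; kNN usually uses the Euclidean distance. Taking the two-dimensional plane as an example, the Euclidean distance between two points is:\n", "$$\n", "d = \\sqrt{(x_2 - x_1)^2 + (y_2 - y_1)^2}\n", "$$\n", "\n", "In two dimensions this is simply the distance between $(x_1,y_1)$ and $(x_2, y_2)$. Extended to $n$-dimensional space, the formula becomes:\n", "$$\n", "d(p, q) = \\sqrt{ (p_1-q_1)^2 + (p_2-q_2)^2 + \\dots + (p_n-q_n)^2 } = \\sqrt{ \\sum_{i=1}^{n} (p_i-q_i)^2}\n", "$$\n", "\n", "The simplest brute-force form of kNN computes the distance between the `query point` and `every training point`, stores and sorts these distances, and then checks which class is most common among the first k." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "## 2. A mental model for machine learning\n", "\n", "Working through kNN from principle to algorithm to implementation yields a general mental model of machine learning: given a problem, how to think about it and solve it as a machine learning problem.\n", "\n", "![machine learning - methodology](images/ml_methodology.png)\n", "\n", "\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The figure above shows the classic machine learning workflow:\n", "* Problem: what problem do we need to solve?\n", "* Core idea: by what means can it be solved?\n", "* Mathematical theory: how do we build the mathematical model, and which mathematical methods do we use?\n", "* Algorithm: how do we turn the mathematical theory and the processing flow into a procedure a computer can carry out?\n", "* Programming: how do we turn the algorithm into a program the computer can execute?\n", "* Testing: how do we use training and test data to validate the algorithm?\n", "* Further thinking: what results does the method achieve, what problems does it have, and how can it be improved?\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 3. 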
生成数据" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "data": { "image/png": "iVBORw0KGgoAAAANSUhEUgAAAW4AAAEFCAYAAADDkQ0WAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjQuMywgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy/MnkTPAAAACXBIWXMAAAsTAAALEwEAmpwYAAAioUlEQVR4nO3df5Rc5X3f8fdXu6tdhC1AQfwINgLqgLFTp7XWjS0gcSSD3WMQAnoaA0a4mFK7oSc1qdMARuW4x5xaaXqMY3pcjsE1NsYIghEyoRQtboP4UbOikHNCIDFoDVIIEiWAQUawq2//eGYyV6N7Z2fu3Dv33rmf1zlzdmZ2duaZmZ3vfeb7fJ/nMXdHRESqY0HRDRARkd4ocIuIVIwCt4hIxShwi4hUjAK3iEjFKHBLJZjZ4ZHzo/Pc9jQzG2u77iozOzjmth81s+PN7Dtm9gkz+7iZHWJma8zsSwn3f7KZLUz5VET6psAtpWdmJwAPmNkhjateMLOjzeyrZjbRCMqXNm57MPBd4KjI3x8FnAu8HnP3RwC/B7wFjAL/FRgBfh2YTWjSfwIWmdkpZvaCmW2JnB43sxv7ftIiHShwS+m5+9PAdcA/bly1B/gb4AzAG5f3NH73L4BvAgeZ2cVmNg1sIQTjzWb2gJn9aeTufwT8Z2Bv4/LngVeAU4AxMzvdzM40s+PMbKTRk59tPO4ssNHdT26egIsJBwGR3HT8yilSNDM7EDgYuBV4sXH1W+4+a2Z7gTlC0HUz+2XgXwIfA24CNgD/BHgIOMfdd8Q8xBSwGHgvsAJ4A1gO/ENCwL8I+GXg3zZ+rgfeB9wDfA9YbWbvBd4JHAY8A9yd2QsgEkM9bim7FcAPgGeB8cZ1exNuexjwNiGo/pzQm/4p8G7gbjO7M+ZvTgeeAx4G/hjYBvw/4C/c/Q8a9/FDd3/E3be4+wrgceA04P8Cd7n7R4HfAf5H4/wfmZk+W5Ib/XNJqbn7fe5+CiG4dkxBuPvjwEmEXPa/Bv4W+CtgNfA52nLWjUHO/whcDcwAjwB/CZwDHGZm7wCOBLZH/uYAYBEhF/6rhB73FkJ65vTG+Qcb7RDJhVIlUhnuntTTBsDMDLgP2AF8AXgS+EfAfyGkMt5tZg8AT7r7vwLOJgxCrgIOBY4Htrr7x83sD4DzgA8BdzXu/yRCwF4KfJYQuG9w96vMbBK42N0/l+mTFomhwC2VYWbHAi8RXx2Cu7uZfYfQM/8rQipjAyHnvRP4d8CXaPWg7wA2AQ8QAvSphLQKwI3A/26c/4vGz4fc/dfM7H8RUiqfBp40s6eAV4EDzexx4OvursoSyY1SJVIVBtwOfIqQzoDw/2vNU6O2+lFCnvufEXrF1wD/nNb/+m8TBhhx91lCT3uaEMSvBWYbA6IvEwYqtzR7+r7/UpofJ6RX/tbdf93dfxX4BqAab8mVAreUnpkdBLwL+DLwCcKAIcAYIUiOEb49vgP4I0Jt9g8IPe8PAf+G0JN+mxDIf9PMPtK4j78B/pRQufIV4CPAJCHlcj9wipld3pz0Y2afA04APglYo1Txvc06buDf5/MqiLQoVSJV8Bqh17yTEDRvB3D3X4G/H2QcdfeXgY82/8jMfoXQ8343oQ78Und/28x+091fMrN/ANzbOH3S3V9oBN9NwO+5+21mdh3hILCZ0Js/p3H7V4H/0HioZxo13JjZxcCBub0SIoQeQ9FtEOmamS2Yb5Cyx/tb6O5vtV032kijiJSSAreISMUoxy0iUjEK3CIiFZP74OShhx7qxxxzTN4PIyIyVLZu3fqSuy+N+13ugfuYY45heno674cRERkqZvazpN8pVSIiUjEK3CIiFaPALSJSMQrcIiIVo8AtUmFzc0W3QIqgwC1SUTMzcNhh8
LPE2gMZVgrc0hP18Mpj3Tp45RW46qqiWyKDpsAtXVMPrzxmZuC222Dv3vBT70m9KHBL19TDK49162C2sX7h3Jzek7pR4JauqIdXHs33ohm4335b70ndKHBLV9TDK4/oe9Gk96Recl+Pe3Jy0rVWSbXNzMCJJ8Kbb7aum5iAp56CZcsKa1Ytzc3BgQeCOyyIdLv27g2XX38dRkaKa59kx8y2uvtk3O+0dZnMq1MP76abimlTXY2MwM6dsGfP/r8bH1fQrgv1uKUj9fDKb25O78EwUo9bUlMPr9xmZmD5cnjsMaWt6kSBW+a1eHHRLZAk0RJNpa3qQ1UlIhWlEs36UuAWiVGFqf0q0awvBW6RNlWY2q9JOPWmwC3SpgpT+zUJp95UDii11l5KF51sVNZJRirRrIdO5YDqcUttxaVE8sgbZ50vb5Zobt8Ozz3XOm3fDi++qKBdBwrcUlvtKZE88sZ55csXL4alS/c/qXSzHroK3GY2Zmab2q77gpltzqdZIvmKK6XLI29chXy5VM+8gdvMDgC2AqdGrlsGfCa/Zonkqz0l8qUvwYYNIUc8MdE6mYXAnibdoTprycu8gdvdf+HuHwC2R66+Frg86W/M7BIzmzaz6V27dmXQTJHsxKVEbr8dpqezzRurzlry0nOO28zOA54Anky6jbtf7+6T7j65dOnSftonkrmklMj69dnljVVnLXlKMzh5OrAK+AGw3MwuzbZJIvmZm8s+JRJHddaSp67ruM3sp+7+nsjlY4BvufvHOv2d6rilbF57LXm1wyyqMlRnLVnQsq4iEXmXzGkpXMlb14E72ttuXJ4BOva2RepK9dSSJ03AERGpGAVuEZGKUeAWEakYBW4RkYpR4BYRqRgFbhGRilHgFhGpGAVuEZGKUeAWEakYBW4RkYpR4BYRqRgFbhGRilHgFhmgrHd8l3pS4BbpQhYBN68d36WlLgdGBW6ReWQVcLXje77qdGBU4BaZRxYBVzu+569OB0YFbpEOsgq42vE9X3U7MCpwS211kw/NIuAOYsf3uuR2k9TtwKjALbXUTT40q4Cb947vdcrtxhnEgbFsFLillrrJh2YRcOfmYMOGsLv7xETrZBaCSxY95TrlduPkfWAsI3P3XB9gcnLSp6enc30MkV7MzMCJJ8Kbb4Yg+tRTsGzZvreZm4MDDwT3EHSb9u4Nl19/vfvd2l97LXnH9343Fe7muQyzLN+nsjGzre4+Gfe7rnZ5N7Mx4A53P8PMDPjvwAnATuBsd5/t9Pci85mbG9wHLC4fetNN+95mZAR27kwOuL20Nc8d37t5Llkb5Hs1nyzfpyqZN1ViZgcAW4FTG1edBIy6+4eBxcBp+TVP6mCQOdpe8qGLF8PSpfuf8gzEvSgit1vGfHrZ36c8zBu43f0X7v4BYHvjqheBaxvn38qrYVIfg8zRDlM+tIjnUvd8ell0neM2s5+6+3sil88CfhdY5e5zbbe9BLgE4Oijj17+szIdnqVUBpmjHaZ86CCfSzM1Uvd8+qD1neOOucPVhKB9RnvQBnD364HrIQxOpnkMyUeZ8pMw2BztMOVDB/VcZmZg+XJ47LFi8ukSr+cet5kdAdwGfMLd35jv71RVUh7RD2EZekrRHlyTenLlsnYt3HwznHkm3HOP3qtB6tTjTlPHfSFwJHCvmW0xs4v6ap0MTNnyk8OUb85bETMjo9PI77orDH62t0nvVTFUx10TZctPDipHW7bUUBrdflPK+rmuXQu33NI6uC5YAAsXtn5fxbGBKsm6xy0VVLa1HJo52u3b4bnnWqft2+HFF7MJBGUsXUujm29KWT/X9lJDgLExePDBfN4r6Y0Cdw2UdS2HvOtvy5YaSqPbVe+yfq5xaay9e+FrX6tPrXSZKXDXQB1zyWVc5jNNnrqbb0pZP9dBrK8i/VHgHnJ1/RD2mhrK+3VIk8ro9ptS1mmwQaSxpD8K3EOujh/CXlNDeebCmweENKmMbr4p5ZUGi6axlixRa
qRsFLhroExrOQyih99raiivXHjzgLBlS++pjG6/Ka1bl2+Z3rAM8A4blQPKwAxiAlCvZYZ5lkk2J68cdRQ8/3y4bmwMPvWp7mYczrccbPO5vvVW6/kuXJhtmV7zOZx/fvpZksNQklkElQNKKQyiyqPX1FBeZZLRAcNm0IbeUhnzfVMaGYFHH23VVjfL9bJKg2Ux6Kkeez4UuGUgBlnl0W1qKM8yybh0TVM/B4j2VNMf/mHrumi5XhZpsCwOasNQkllGCtwyEHn0bPvNl+dVJhk3eaVpZCR9RU9777WfA898j53FQa2MJZnDQoFbcpdHz7bfr+B5lkl26m2PjsLDD6dLZbT3XtMeeLp57bI4qJWtJDOtUrbL3XM9LV++3KXeLrjAfXTUPQyhhdPYWLi+n/tcsKC/+3j1VfedO/c/vfpq+vucnXUfH3dfuNB9ZGTf5wzhdUjT5m3b3Ccmwn1MTLg/80zrcSYmWqfm5dnZ5Pua77WLPode7zuuvc3TxIT7zEzy7ZcsSf59UYpsFzDtCXFVgVtylUUQaNcexMr2YX/1VfcXXgjPcWwsPP/mKe3zjh78mge9NAeebl+7fg9qvR6sszgQ56HIdnUK3CoHlNxlvct5dNW6XsrrBi2r553luuWDeO3KVJLZj6Lb1akcUD1uqZRev4J3K03Pf1CySjXl9drF6aXHHvdtogyKbhcdetwanJRKyaMSpDlY9+yzfTUtF1kOog5ysbEylGT2I6t25TWwqcAtlRENYs1JJ+Pj8UGs14D2d38XvhYXHTDaZbXWTFkXG5vvYFLWdnUjz8lHCtxSGSMj8MILIWideWYIQqtX7x/EevnANHtW7mHq+GWX5fkM0slirZkyLjY238HkmWeKmXWZ1UEuz8lHGpysuSqtI9Fc62TjRjj11ORBo17W11i7Fr7//daHcWQkBIwyDI7lIen9Lur/oNMA7qWX9r9OSh7t6uaAmcXApgYnJVZZa2eTNEuzjj02edCol1LBuME6cD/77LyfSTGS3u/268swUFv2ks/5ZDGwieq4JU5Za2fjJAXZ9g929AMz32SXCy6InyQzMlK9QNGNpPc7en1ZDuZFV3T0I6vqHQVu2U/VejRxJXHtpXFxH5jx8fjn1pwYZBZ/n5/+9MCfYq6S3u/26886q/iD+SDLFvOQVflm34EbGAM2Nc5PAD8CngC+SyNPnnRS4C6nKvVotm0LQbY9uI6P7zsL8/zz9//AmCU/t5dfjp/dODaWflZnWSW93+3fUBYsKD5Q5rFEwqBkOVO4U+Ced3DSzA4A/g9wvLtPmNnFwKS7f87MfgR83d3/Z9Lfa3CyfLKciTcIZ58NP/zhvteNjsJZZ8F117UuH3lk+JhDqBBpGh+HN96IH3zLelbnoHUzqJj0ft93X2uQt10WsyrTDHj2OutykG3rVlb/U31tpODuv3D3DwDbG1etBO5rnL8f+K3umyJlUKVd3+fmQhUJ7FuetWABbNrU2g/xkENa5W5nnhkCObQCfNKHtEzbuvWq27LHpPd77drkVQz7nQiTtoZ5EGWLeW/uMIj/qTR13L8EvNo4/xqwpP0GZnaJmU2b2fSuXbv6aZ9kLK5GdXw8fFA3bCjfEpbPP7//Di9JH+TFi0PPetOmVkCanYU77yzfxJosdFMnnFSTDLBtW+v6pBLBtAfzfmqY8w58w7C5Q5rA/RJwUOP8QY3L+3D369190t0nly5d2k/7JGNxPZrVq8PkgjVrylfTHe0tRnd4SfogV+nbRD+63aQgqQe7Y0c4KG7fHgL4yEg4MI6Ph1M/syrLvIFCmdvWk6Tkd/sJ+Gnj50XAf2ucvxv4WKe/0+BkuZW5uqTX6oI8lpAdtG7bmPXgcpZrk5d54LvMbWtHFuWAkcA9Tqgq+XNUVVJ5Zf5HTlNdkMfmCGmkXWe8mxrqMpfLqW3ZySRwpz0pcJdXmf+Rq9x7TjuJpdsJUWUul1PbstMpcGutkhqLLqrfVKaNCapaqtfLW
ilN3a5tMahyufbH7OY+i2hbt8rctiSdygEVuGuqiv/IVZB2caHoYlfzHTwHeUBrLuz12GPdPY8yH2zL3LY4CtwSq2r/yFnIexW8NFuDzczACSfsO2moLBOi0nx7kGz0NQFHhleVJ5+kkffEi7S7pqxbF24bVYYSxqEpnRtCCtxSG3lPvEhTQz43B7fe2pqqD8m7+gxa9PmU4UAiLQrcMpTaA17evce0u6aMjIQp+dEp+nG7+mRtvgNCWfeClECBW4ZOXEok795j2jU2ZmbCWizRKfqbNsHu3fmlrLpJGdVlBmpVKXDL0GlPiQyq95hmzKCIADlfyqisGwtLi6pKZKjEleNddVU569WLKMnstlyxjhVHZdOpqmR00I0RyVN7SuTKK+H221u9x6Zmrvvb3y6uXr2ZXkkKkGnb1ankMS5lFHfwUnAuN/W4pRSyqK9O2jDg0Ufh8MP3v/0w9h7bJ8xEX9eqbaBRd6rjllLLqr46KV+8fv1w1qvH5Zqj+ev211UDjsNDgVsKl0V9dd0G1OIOdu0lj5dd1npdq/76lL19g6ZUiRQq7doeceo0oBY3FT063X50NAx6zs21XtdDDqnm69PreinDQqkSKa0s6qubvbG6TOGPm0zUXvI4O9t6XZqva1Vfn2HYaixrCtxSmCzqq/Nef6SM4g52cfnrpirPetR6KfEUuKUwWQyW1a03lnSwu/XWVv56QcynuqqDkFovJZ5y3FKILCafZJkfr4qkzS/OOQe+/vXwui5b1npdzcJtep3Uk/fyt92oe/mictxSOtG1PbZsCR/IBx/sbXGlKvfG0lRJdKoMufNOWLIEjjgCdu1q7eLe7ZopUWVJP6l8MZl63FK4frf6aqpKb6yfKolBVM6UYfME7dCkHXCkxPrZ6quM6490o+jA2CkNUqb0U53KO+MoVSKllSbdUeXJJEVXScyXBilT+qmq5YuDkCpwm9mBZrbRzB40s/VZN0rqIW05YNq1r8ug6MDYqQpHmydUR9oe9/nAI+5+EvB+MzsxwzZJTfQz+FTF3ljRgXG+3r4GA6sjbeDeAywyMwMmgLeivzSzS8xs2symd+3a1W8bZQhVOd2RVtGBsVNvv47vR5WlGpw0szHgYeAdwJS7/07SbTU4KUnqNPhUdJVEN1U4dXo/qiCPjRQuB77p7t8ys1vMbIW7P5S+iVJHdQoGeW2a0K2k3v6VV8L3vhcu1+n9qLq0gfudQPPYvYfQ8xYpvSJnBBYVGNvTINHrb74ZvvxlOO64Ytom6aTNcV8HfN7MHgYOAKaya5JIPsoyI3DQkqpw1qwJwfzqq4tuofQqVeB29xl3P8ndP+Luv+3uGrqQ0qvbglRR7VU4b7wBmzZp1b2q0gQcqYWiJ76UTdH15NIfBW6phflK4eqk6Hpy6Z8Ctwy9ToGqjnnvfurJ63aQKysFbhl6nQJV3fLe/Uy0qeNBrqy0OqAMtU4TX5qbDOzZU/xKeIOUdqJN0asa1o1WB5Ta6rQg1Zo1+2+oWwdp1nnR4G65KHDL0IsGqiVLWuVwGzdqgK5bqkIpFwVuqYW5uX1ztEUv+FQlqkIpHwVuGXrNgH3ZZWEg8sors1sJrw5VFjrIlY8GJ2XorV0bFlJasCAEnIkJePRROPzw/W/by0p4/ewdWRVFr2pYZ3msDihSCc2v+e77DkSuX99/ZUS0lHBYqyyKXtVQ4ilVIkNt3bqQk43KIkdbpyqLKu42NOyGP3BPTYWk5pQWMKybZnCNy0P3m6NVlYUUabgD99QUrF4N11wTfip410pcbxvC1/t+tuRSlYUUbbgD9/33w+7d4fzu3eGy1EJzavfISMjFNk9jYzA6GoJs2h3hVWUhRRvuwL1yJSxaFM4vWhQuSy1EZ0w+/3zrtGNHuP6II9LlaLWprpTBcFeVrFoFd90VetorV4bLUht5DJ6pykLKYLgDN4RgrYAtGVI1hRRtuFMlIiJDSIF7UFSWKCIZUeAeBJUlikiGUgduM/t9M
3vAzO4xs4VZNmroqCxRRDKUKnCb2XHA+939FOAe4F2ZtmrYqCxRRDKUtqpkFXCImf0Z8CLwx9k1KSNTU+UpA1RZ4sDMzakkT4Zf2lTJUmCXu/8Gobd9cvSXZnaJmU2b2fSuXbv6bWPvyphTXrUKvvIVBe0caTNbqYu0gfs14OnG+WeBo6K/dPfr3X3S3SeXLl3aT/vSUU65luq2Y7vUV9rAvRX4UOP8ewjBuzzqmFOueblhnZZZFUkVuN39YeAlM3sUeNrdf5Jts7rQKVA1c8pXXBF+ljk9kUXALWNqaMC0zKrUirvnelq+fLlnbvNm90WL3CH83Lw5+8cYhKyexxVXhPtonq64Itt2lty2be4TE/u+BBMT7jMzRbdMJD1g2hPiajUn4Nxww3DksLPKxdcxNRShZValbsofuNtTCVNTcMcdrd+Pj1c3UGUVcKuUGsqYllmVOir36oDN3O3u3fC1r7VqoaNrap5zTnUDVZb13TVdBVHLrEodlTtwx6USVq4MQXz37tBLveiiQpvYt5oG3CxpmVWpm3KnSuJSCTVOC4iIQNl73EmphKx6qWWaFi8i0iULVSf5mZyc9Onp6VwfI5Vo/nzRouHvvesgJVIpZrbV3SfjflfuVEme6jQtXhN0RIZKfQN3nWqf63SQEqmB4Qvc3U4hr9MgZ50OUiI1MFw57va89dVXh+XilNdVjlukYjrluMtdVdKr9pTAFVeEudDNyTvRgNVvIKtaIFS9uMjQGK5USTQlMDraWsCiPa/b72CdBvtEpEDDFbijeetrrknO6/Y7WNfN38etsVLG9bLL2i4RSZa0bGBWp1yWde3W5s1hidP25VLTLqfavL/16zv/ffv9z3f7ogzL8rgiQ4gOy7oOV467XVJeN83iTr0MfLb3yO+8c/8eehnyzXHfHMrQLhHpqPqpkriv+t18/Y/bvLfT37UHuVdeSd78d+XKsDQdwNhYWGe0eblM5XgqExSppqSueFanXFMlcV/1+0mDNP9ufNz93HP3/dte7nfz5nAf0S1Z4u6zDJLSSdK32dmiWyBVxtCmStp7wRdeCMcfn+7rf/S+9uyBW26BjRtbZYS9pFfa1wxv3uexx5YvFaEywVzMzMDy5fDYY7BsWdGtkWFT7VRJ9Ks+wI4d8OMfh/QEJH/9j0uJtN8X7F8xEpde6aZdndoiQ2ndupBN0/Zpkofqz5ycmgo97R07Wte9732wZk18z7jTqoBTU3DjjfAnfxJ6yJ1WDZxvAk7z9wcf3P/szapN9qm5mRk48UR4880wvPHUU+p1S+86zZysdo67af36ffPJ69cn37abHdHny/sOooyu29LDTn/bS7uU687MBRe4j46Gt2xsLFwW6RUdctx9BWXgC8DmTrcZWB33+vXuK1Z0Dtru+wbd0dHubt8e0LoJ/v1ob2Mvj5XmoKJ67sxs2+Y+MbHvWzYx4T4zU3TLpGo6Be7UOW4zWwZ8Ju3fZ+6LX4QHHww/kzRTDuef35oSf/XVyWWDcVPbp6Zg27Z8y/uiA6Wzs6Gt3T5WmlmhWvY1M+vWtVZaaJqbU65bstXP4OS1wOVxvzCzS8xs2symd+3a1cdDZCgahL/97eR1TKLaA9qNN4b7uOUW2Ls35NKvvjr7vHN7ffU113S//Gya2mzVc2dibg42bIAFC0Juu3kyg9tuC78XyUKqckAzOw94Angy7vfufj1wPYTByVQty3pALq4XOzvbOVC17yjv3rqPt9+GJ58MgfuDH8w2eKeZ2dnP3/bzePL3RkZg5879K0EhfEEbGRl8m2Q4paoqMbPvA0cTAv8JwFXu/o2426aqKsljP8i0a3VHDyDQuo+oK64IZYLtt1cAFJGUMl+P293Pa9zxMcC3koJ2almvodEMpmk2VmifoHLXXXDDDXDHHa2SwWZQjx4c4tYAT2qXgryI9KCcMyfbUxT95Fyz7r03A3lc0O3lgNNrkO+GDgQitdBX4Hb3GeBj2TQlIsuca14r4MVNF
e/lgJPHt4r2A0HzcRTIRYZKeae8dzu9fD5ZVUx0u+JgtxsQZ13JkVQBo116RIZOeQN3P6JBNovd3JO2KosL5s0DDnQO9FnvMt9+IIhWwKg2W2S4JM3Myeo08B1w8pgFGDdTstPjFDUTMTrLU7MhRSqNPGZOllbaWYCdUiFxaY1Oj1PUTMRoeinrHr2IlMbwBe40ueP5dm2PC4KdHqcsMxGzGicQkVIpZzlgL9pL4NJUpHRT4REt+WteTnqcss1EVJmgyHBJyqFkdRr41mVp7uPcc1tbjSWtGBjdjmx8vDo5Y+W6RSqJ2mxd1mstdLT2eWwsLCbRXDEQ9p1lecMNrUUo9uwJl6vQe9VO7iJDp9o57oMPjl/ytNud36NB7e23W8u37d4d8tnRnLfZvo/dfrmsypJvF5HsJHXFszrllipJ2hChl53f23d2HxsL583iy//6SZUUucOMdrcRqRyGMlXSvkzrK6/sf320FC8uXRAdRDz44NAjhxCum5q91FWr4O670w3y5bEuSS+0k7vIUKluqiQpBRB3fad0QbNk7pVXQrokasWKfYNstLyuPfXSqQ5cO8yISJaSuuJZnXKvKolLAcRd380GwM1UyHzpkPbUy3wb+qqyQ0R6RIdUSaqNFHqRaiOFokxNhcWZ3OGzn01OL1x5ZRi4bFqxAh56qHU5urFC9L5VSy0iXcp8I4Wh1W0uuH351jVr4PHHOy/nqjyziGREgTuN5qBms3f+wQ+Wa6akiAw1Be5+3Hln6GVv3BgCd3t6REQkB9WtKimaKkVEpCAK3GlpRqKIFESpkrTKtgKgiNSGAnc/VCkiIgVIlSqx4Dtm9oiZ3WVmOgCIiAxI2hz3ScCou38YWAycll2TRESkk7SB+0Xg2sb5tzJqi4iIdCFVisPd/xrAzM4CFgL3Rn9vZpcAlwAcffTRfTZRuqZp9SK1kHqtEjNbDVwGnOHuP0+6XenXKhmWYBddOnbRIu3sLlJxndYqSTs4eQTwReCTnYJ26c23u3uVaEKQSG2kzXFfCBwJ3GtmW8zsogzbNDjDFOw0IUikNtLmuL8KfDXjtgxe+yp/VQ52mhAkUhv1rr8etmCnCUEitVDvwA0KdiJSOVpkSkSkYhS4RUQqRoFbRKRiFLhFRCpGgVtEpGIUuEVEKib1WiVdP4DZLuBnjYuHAi/l+oD5UvuLV/XnUPX2Q/WfQ1Xav8zdl8b9IvfAvc+DmU0nLZpSBWp/8ar+HKrefqj+c6h6+0GpEhGRylHgFhGpmEEH7usH/HhZU/uLV/XnUPX2Q/WfQ9XbP9gct4iI9E+pEhGRilHgFhGpmIEEbgu+Y2aPmNldZlbJ5WTN7AtmtrnodqRlZr9vZg+Y2T1mtrDo9vTCzA40s41m9qCZrS+6Pb0wszEz29Q4P2FmPzKzJ8zsu2ZmRbevG23PoXKf52j7I9dV9vM8qB73ScCou38YWAycNqDHzYyZLQM+U3Q70jKz44D3u/spwD3AuwpuUq/OBx5x95OA95vZiUU3qBtmdgCwFTi1cdWnge3u/mvAIZHrSyvmOVTq8xzT/sp/ngcVuF8Erm2cf2tAj5m1a4HLi25EH1YBh5jZnwGnANsKbk+v9gCLGj3UCSryf+Tuv3D3DwDbG1etBO5rnL8f+K1CGtaDmOdQqc9zTPuh4p/ngQRud/9rd/+JmZ0FLATuHcTjZsXMzgOeAJ4sui19WArscvffIPS2Ty64Pb36PvBPgb8EnnL3ZwpuT1q/BLzaOP8asKTAtqSiz3PxBjY4aWargd8FznD3uUE9bkZOJ/RYfwAsN7NLC25PGq8BTzfOPwscVWBb0rgc+Ka7vxdYYmYrim5QSi8BBzXOH0Q11szYjz7PxRrU4OQRwBeBT7r7zwfxmFly9/Pc/WTgU8BWd/9G0W1KYSvwocb59xCCd5W8E3izcX4P8I4C29KPKVo54ZXAjwtsSyr6PBdvUD3uC4EjgXvNbIuZXTSgx5UGd38YeMnMHgWed
vefFN2mHl0HfN7MHgYOIATAKroZOMrM/hx4mWo+D32eC6aZkyIiFaMJOCIiFaPALSJSMQrcIiIVo8AtIlIxCtwiIhWjwC0iUjH/HyazQGSZ+5e+AAAAAElFTkSuQmCC\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" }, { "data": { "image/png": "iVBORw0KGgoAAAANSUhEUgAAAW4AAAEFCAYAAADDkQ0WAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjQuMywgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy/MnkTPAAAACXBIWXMAAAsTAAALEwEAmpwYAAAYOklEQVR4nO3df7DldX3f8ed73YVdkEWUpVIMSwkNbDHS6V5H5UdSdyNqBFLS6bTFJDgk2cYZO5ZJNQHKlsYxU23/YBvtOIw1ShLHwpgOaMMwsmh1+VG8y1Rb4qLVXHDTZr1oceXXAnff/eN7TvZwOefec773+z3n+z3n+Zi5c889P9974bzu57w/n+/nG5mJJKk91k26AEnSaAxuSWoZg1uSWsbglqSWMbglqWUMbk2diIg+110ZEdsiYtOy6/9+RPxMRHwmIt4REW+PiFMi4h9ExL8a8PwXR8RxddUvrWb9pAuQVhMRnwL+O/B94Od6bvpD4C+Bp4H9mfn3IuK3gZOB3cue5p3AKcC1EbE3M/9z5/rXAv8UeJ7i/bAHeFPn68cDSvq3wGUR8SbgNuC7Pbe9Eng4M68p82+VhuGIW22wiyKg3wI8ANxC8f/uGcAdwEnA4c59Pw1cHRF/Y9lzHAaOAtcD3+q5/ovAv+/cBvBe4EngEmBDRFwWEb8UEWdHxCsiYgPwIpCd73dk5sXdL+A3KP4ISLVxxK1G67Q9NmTmFyNijmIU/FXgkxRh+wI9QZmZP4yI1wPnRsR/A57q3HQa8I+A9wEnRcTFmbkI7AU2A+cBF1KM3rcDPwvsA64B/ibwLzrfPwr8HeAu4I+BKyLiPIo/HqdRjL7/ay2/DKnD4FbTnQ98LiL+Q891f9nvjhHxNuADwF9l5q9RhHH3tncDfzszb1r2sMsoAvj/UYzW3wr8EHgkM383Iq4BtmTmg537XxgRX+k87meBOzPztyLizcCvZ+ZvRsS6iFiXmUeRamCrRI2Wmf+Lot/8zc5V6yhG2f3cC/wiRUuFiPhERHwzIh6hGBGf2XvniFgPfAi4CVgAHqRoo/xD4LSIeCVwOnCw5zGbgBOA/wi8nmLEvQ/4BEXfex9wH3DRWv7d0koMbjVeZj4NPAoE8GrgJ52b1i2731Jmvggsda7658DDFD3y2ylGyL1+meKPwq0Uk5efAs7IzLdTtGKuAt5I549GRFxEEe6vA/4dRRvmP/X0trv97rdk5teq+ddLL2dwq/Ei4mTgKxQtkm4f+dsU/e71wIY+jzk9M7sj86XM/CGwKSI299ztTyn62j8G/hbwEYo/EFCE+LXAucAjnevuz8wLOq/9F8DfBf53RBwAPg5cHBH/o9NekWpjcKsNLgW+nJm3UPS876UIzf8J/A7w7LL7nwx8OSJO7Py8PSJ+iWLS8IqIuCAiju+Mzk8F5ilCfA/wYudxP6KYqNzX7VXny7fSfDvFCPyvMvNNmfl64GOAa7xVK4NbbfCbwO0RcTlwAfBfKJbi/UFm7qdojawDiIizKUbl/5JiFPxWipB/mmIZ4e8C/wa4uvPc/wf4M+BVwIcp+uNzwJco/kBcEhHXdfrhRMRvUYzC3wVEZj4KnBcR+zr97d+p7bcgdbiqRI0WEa+hCNX9wOeBd2dmRsQNwDmdu+2lOEAH4HHgys7ywYeBQ5m51PN8nwb+GbArIn4auLvz9a7M/L+d8P0C8NuZeXtEfBz4HHAP8HWKicu7Kdor/7rztN/t9LmJiN8AuiN9qRbhiRTUdhHxit5wHvGxx2Xm88uuW99po0iNZHBLUsvY45akljG4Jallap+cPPXUU/Oss86q+2Ukaars37//iczc0u+22oP7rLPOYn5+vu6XkaSpEhGPDbrNVokktYzBLUktY3BLUssY3JLUM
ga3NKOWSh1rqiYwuKUZtLAAp50Gjw1ct6AmM7ilGbR7Nzz5JNx446QrURkGtzRjFhbg9tvh6NHiu6Pu9jG4pRmzeze82Nn7cGnJUXcbGdzSDOmOtrvB/cILjrrbyOCWZkjvaLvLUXf7GNzSjFhagttug3XrYOPGY18Rxajb5YHtMdQmUxGxAfjTzLy857prKU739At1FSepOq94BfzgB3DkyMtvO/744na1w6rBHRGbKM7n9zM9120F3gMs1laZpMpt3jzpClSFVVslmflsZr4BONhz9R7gukGPiYhdETEfEfOLi2a7JFVp5B53RFwFfAP480H3ycxbMnMuM+e2bOm7D7gkqaQyk5OXATuBzwHbI+J91ZYkSVrJyGfAycyrACLiLOCTmfmxqouSJA3mckBJapmhR9yZec6ynxcAlwJK0pg54lYpHqwhTY7BrZG5l7M0WQa3RuZeztJkGdwaiXs5S5NncGsk7uUsDa+uuSCDW0NzL2dpeHXOBRncGpp7OUvDq3MuKDKz+mftMTc3l/Pz87W+huq3tAQnngiZxX7OXUePFj8/9ZTbgkpdCwuwbRs891yx5/mBA7B162jPERH7M3Ou320jH/Ku2eReztLw+s0F3Xprdc/viFuSKtQ72u4qM+peacRtj1uSKjSOuSCDW5IqMq7zetrj1sxbWrJHr2qMay7I4NZMW1iA7dvh4YdHn/WX+hnHeT1tlWimue+K2sjg1sxy3xW1lcGtmeW+K2org1szyX1XVufJMprL4NZMct+VlXmyjGYzuDVzxrXWts2ctG02lwNqKnTXYg+zJtt9V1a2fNL2Qx9yqWTTDDXijogNEfGFzuWIiM9ExIMRcWdEGP6aqO7H+n37hv94v3kzbNny8q9xrMFtOidtm2/V4I6ITcB+4G2dqy4C1mfmm4HNwKX1lSetrvux/uqr/Xi/Vk7atsOqwZ2Zz2bmG4CDnasOAXs6l5/v95iI2BUR8xExv7i4WE2lUh+9H+u/9z3XZK+Vk7btMPLkZGZ+JzMfiogrgeOAu/vc55bMnMvMuS1btlRRp9SXQVMdJ23bo1R/OiKuAN4PXJ6Z/ufURCz/WN/V/XjvpNponLRtj5GDOyJeC3wAeEdmPl19SdJw+o22u+o468gscHK2Hcqs474aOB24OyL2RcQ1Fdckrar3Y/0gfrzXtBp6xJ2Z53S+fwT4SG0VSUPo/Vj/k5+89OP98cfDSSf58V7TyzXYaq3ux3rnvzVrPORdklrG4FZt7C+Pj7/r2WJwqxbuLjc+/q5nj8GtWri73Pj4u549BrcqN82nBGtaS2Kaf9cazOBW5aZ1d7kmtiSm9XetlRncqlQTd5erapTctJZEE3/XGg+DW5Vq2qZPVY2Sm9iSaNrvWuNjcKsyTdxdrqpRctNaEk38XWt8IjNrfYG5ubmcn5+v9TXUHIcPD95dbtwbGC0swLZt8NxzRagdOFBut8De5+nauBEeeQTOPruqakfXpN+1qhcR+zNzrt9tHvKuSjUpMPqNksvsFtivJfHii0WYf/vbk9s6tkm/a42XrRJNpaom7ga1JI4eheefhxtuqLx0aVUGt2o1qV5rVRN33V0IDx6Exx8vvvbtgw0bits///lmTFRqthjcqs2k1j1XPXG3/Izwe/Yce44mTFRq9hjcqs2k1j33GyU//njx86FDa9uj27XTagKDWy9TRXtj0uuel4+Su19rndBz7bSawODWS1TV3mjauucquHZaTWFw6yWqaG9MazuhzhaMNAqDW3+tqvbGNLcT6mrBSKMwuPXXqmhv2E6Q6jdUcEfEhoj4Qufyxoj4YkR8IyL+KCKi3hI1DlW1N2wnSPVbNbgjYhOwH3hb56pfAQ5m5gXAKT3Xq8WqbG/YTpDqtWpwZ+azmfkG4GDnqh3AlzqX7wXeuvwxEbErIuYjYn5xcbGyYlUP2xtSu5TZZOo1wI87lw8D5y6/Q2beAtwCxe6ApavTWHTbG4N2mrO9ITVLmeB+A
ji5c/nkzs9qOdsYUnuUWVWyF7i0c3kH8OXqypEkraZMcP8JcEZEfBP4EUWQS5LGZOhWSWae0/l+BListookSSvyABxpirgCaDYY3NKUGMf+5/5haAaDW5oSde9/PqkTY+jlDG61hqO9wcax//mkToyhlzO41QqO9lZW9/7nkz4xhl7K4FYrtG20N85PB+PY/3waT4zRZga3Gq9to71xfzqoe//zaT0xRpsZ3Gq8to32xvnpYBwbhE3ziTHaKjLr3QNqbm4u5+fna30NTa+FBdi2DZ577th1GzfCgQOwdevEyhqot95x1Xn48OANwta6B83SEpx4ImQWfxy6jh4tfn7qKTchq0tE7M/MuX63ldlkShqblUZ7t946mZpW0u/TQd111rlBmDtHNpMjbjVWm0Z7S0vw/e+369OBms0Rt1qpLaO9hQXYvh1+/ufr+3SwtNScf68mz8lJNVobToPWnYy84456Jgldw67lHHFLa9C7VHHjRrjvPvipn3rpfdb66aB3lUoT+/oaP0fc0hosn4y8+eZqPx20bQ27xsPglkryiEVNisEtleQRi5oUg1sqwSMWNUmu45ZK8ohF1cl13FINPGJRk2JwSw3VpLXqapZSPe6IODEi7oiI+yLio1UXJUkarOzk5LuBBzPzIuD8iNhWYU2SpBWUDe4jwAkREcBG4PneGyNiV0TMR8T84uLiWmuUJPUoG9yfBd4JfAs4kJnf7b0xM2/JzLnMnNuyZctaa5Qk9Sgb3NcBn8jM84BXR8SFFdYkTRXPTq+qlQ3uk4DursNHgFdWU46qYFA0hzv7qQ5lg/vjwHsj4gFgE7C3upK0FgZFs7Tt7PRqh1LBnZkLmXlRZr4lM/9xZjrGawiDojnc2U91ca+SKWJQNIs7+6kuBvcUMSjqNcrcgTv7qU4G95QwKOo16tyBO/upTgb3lDAoCnWtqBll7qB3y9fjj69ny1fNNoN7Coxjb+g2qGtFzahzB92d/e6/HzZtKs5D+fjjcPAgHDrkzn5aO4N7CnSD4uDBIiC6X7MWFHWtqCkzd7B5M+zZU+zZ3XseSnf8UxU8kYKmwsICbNsGzz1XfNo4cAC2bq32ebuGef666tHsWOlECo64NRXqWlFTdu7AFT6qkyNutV7ZUfFqyp4+rK56NFsccWuq1bWipuzcgSt8VDdH3Gq1pp1Ut2n1qL08WbCmVtNOqtu0ejSdDG61XtOW2DWtHk0fe9yS1DIGtyS1jMEtSS1jcEtSyxjcktQyBrcktYzBXYW9e+GGG4rvklQzg3ut9u6FK66A3//94rvhLalmpYM7Ij4YEV+LiLsi4rgqi2qVe++FZ54pLj/zTPGzJNWoVHBHxNnA+Zl5CXAX8LpKq2qTHTvghBOKyyecUPwsSTUqe8j7TuCUiPgqcAj4g94bI2IXsAvgzDPPXFOBjbdzJ9x5ZzHS3rGj+FmSalRqd8CIuB746cz89Yh4APhgZn6t333dHVCSRlfHftyHgUc7l78HnFHyedrDlSOSGqJscO8H3ti5fA5FeE8vV45IapBSwZ2ZDwBPRMTXgUcz86Fqy2oYV45IapDSywEz872Z+cbM/LUqC2okV45IahBPpDCMMitH9u51pYmkWnjOyTp0e+LPPFOM0O+80/CWNBLP8l7GWlaR2BOXVCODu5+1riKxJy6pRva4++k3Yh621dHtbd90Ezz5pD1uSZUzuPvZsQNuvvlYj3rYEXMdvW0nOSUtY3D3U3b/kbWM1Pvp/UNw881OckoCDO7Bdu4cPSTLjtQHWf6HYPfuY7VJmllOTlapO1K//vpqRse9k5wA998/tkPul5ZqfwlJJRncKymzJHDnTvjwh6sZFXf/EFx44bHrxrC8cGEBTjsNHnus1peRVJLBPUhTNpbauRN+7/fGurxw9+5iQcyNN9b6MpJKMrgHKXsQzVq3f+33+KpbMCtYWIDbb4ejR4vvjrqlBsrMWr+2b9+erXTPPZknnJAJxfd77qnnMVU+v
gK/+quZ69cXJWzYUPwsafyA+RyQq+0fcdd1goMyo9y1Huo+4UPlu6PtF18sfn7hBUfdUhO1O7jr7kOPOtG41kPdJ3yo/O7dx0K7a2nJXrfUNO0O7qZt5rTWXvRqj6/x9GlLS3DbbbBuHWzceOwrohh1uzxQao52b+s6S9unjuHfevgwHDny8uuPPx42b670pSStYqVtXdt95GTZQ9ObbNDeJFUfTt+H4Sy1Q7uDG8odmr6aSW3stNLeJFUfTi+ptdof3FWb5MZOK42qp/HThaRS2j05OaphJvfGNeHZr5beVSXr18OrXvXSx1R5OL2k1lpTcEfEtRFxT1XFVGp5MA67dHAcS/IG1bJzZ3EChvXri3V5N900uUPtJTVW6eCOiK3Ae6orpUL9gnHYkfQ4Di9fqZYnnzy2mLoJSxwlNc5aRtx7gOv63RARuyJiPiLmFxcX1/ASJfULxlFG0qO0JMqsrV6pFs9XKWkVpdZxR8RVwLnAHwKfzMxfGHTfWtdxDzJozXPVq0XWsrZ6pVo8XZk081Zax102uD8LnEmxKuVc4MbM/Fi/+04kuGE84XfDDUU7puv664uR+jgZ8tJUqjy4e574LJo44q7L8pCc9JGbk359SbWZ3iMnx2nQ+u5R11ZXOUIew9GUkppnTcsBM3NhpdH2VBm0EmTUicwqdzN0IlOaSbN1AM5aVBGSVR/cM8Yz40hqDlslwxqmLbJaG6SO/Ubq2KtFUqM1d1vXtq2WGHaisG3/LkkT0b7JyUlu9FTWsBOF/UbIhrmkETSzx920M9sMo2wPvO7Tr0maOs0M7jaulig7UdjGP1KSJqqZrZKm7D09agujzEShJ0iQNKLmTk5O2jiPSrTHLWmZ9k1ONsE4j0p0SZ+kETSzx90Eg/rsZbZxlaQKOeIepF+fvY3LFCVNHYN7JctbGG7qJKkBbJWMoq5lirZfJI3AEfco6limaPtF0ogM7lGttAKkzLI+2y+SRmSrZBjDtDLKHrrexqNEJU2UI+7VDNvKKDtybspRopJawxH3aobdS2QtI+dhzqLjBKakDoN7NcMGcp1no3EHQUk9bJWsZpRWRl2HrjuBKamHwT2MSe8l4g6CknqUCu6ICODTwLnAD4BfzswXK6xLvZzAlNSj7Ij7ImB9Zr45Ir4CXAr8WWVV6eXqHvW7tazUGmUnJw8BezqXn19+Y0Tsioj5iJhfXFwsXZzGxMlPqVVKBXdmficzH4qIK4HjgLuX3X5LZs5l5tyWLVuqqFN18vRpUquUXg4YEVcA7wcuz8yl6kqaQZNeo+3Rm1KrlJ2cfC3wAeAdmfl0tSXNmCZsMuXkp9QqZScnrwZOB+4uFpjwqcz8VGVVzZKmrNGe9JJHSUMr2+P+SGaek5kXd74M7bJsU0gakQfgTJptCkkjMribwDaFpBG4yZQktYzBLUktY3BLUssY3JLUMga3JLWMwS1JLWNwS1LLGNyS1DIGd1NNesdASY1lcDeRJzaQtAKDu4k8sYGkFRjcTeSOgZJW4CZTTeSOgZJWYHA3lTsGShrAVokktYzBLUktY3BLUssY3JLUMga3JLWMwS1JLROZWe8LRCwCj434sFOBJ2oopw7WWg9rrYe11qOOWrdm5pZ+N9Qe3GVExHxmzk26jmFYaz2stR7WWo9x12qrRJJaxuCWpJZpanDfMukCRmCt9bDWelhrPcZaayN73JKkwZo64pYkDWBwS1LLNCq4o/CZiHgwIu6MiEZvOxsR10bEPZOuYxgR8cGI+FpE3BURx026nkEi4sSIuCMi7ouIj066nkEiYkNEfKFzeWNEfDEivhERfxQRMen6ei2rtdHvsd5ae65r5Ptsea3jfI81KriBi4D1mflmYDNw6YTrGSgitgLvmXQdw4iIs4HzM/MS4C7gdRMuaSXvBh7MzIuA8yNi26QLWi4iNgH7gbd1rvoV4GBmXgCc0nP9xPWptbHvsT61NvZ9trzWcb/Hmhbch4A9ncvPT7KQIewBr
pt0EUPaCZwSEV8FLgH+YsL1rOQIcEJn1LqRBv5/kJnPZuYbgIOdq3YAX+pcvhd460QK66NPrY19j/WpFRr6PutT61jfY40K7sz8TmY+FBFXAscBd0+6pn4i4irgG8CfT7qWIW0BFjPz5yhGAhdPuJ6VfBZ4J/At4EBmfnfC9QzjNcCPO5cPA6+eYC0rast7DFr3Phvre6xRwQ0QEVcA7wcuz8ylSdczwGUUf2E/B2yPiPdNuJ7VHAYe7Vz+HnDGBGtZzXXAJzLzPODVEXHhpAsawhPAyZ3LJ9Pw/TVa8h6Ddr3Pxvoea1RwR8RrgQ8A78rMn0y6nkEy86rMvBj4J8D+zPzYpGtaxX7gjZ3L51D8j9VUJwHPdS4fAV45wVqGtZdjveIdwJcnWMuK2vIeg9a9z8b6HmtUcANXA6cDd0fEvoi4ZtIFTYPMfAB4IiK+DjyamQ9NuqYVfBx4b0Q8AGyiCMWm+xPgjIj4JvAjml2z77EajPs95pGTktQyTRtxS5JWYXBLUssY3JLUMga3JLWMwS1JLWNwS1LL/H8/JS6/bhNUFQAAAABJRU5ErkJggg==\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "%matplotlib inline\n", "\n", "import numpy as np\n", "import matplotlib.pyplot as plt\n", "\n", "# Generate synthetic data: two Gaussian clusters\n", "np.random.seed(314)\n", "\n", "data_size1 = 100\n", "x1 = np.random.randn(data_size1, 2) + np.array([4, 4])\n", "y1 = [0 for _ in range(data_size1)]\n", "\n", "data_size2 = 100\n", "x2 = np.random.randn(data_size2, 2) * 2 + np.array([10, 10])\n", "y2 = [1 for _ in range(data_size2)]\n", "\n", "# Merge the two clusters into one data set\n", "x = np.concatenate((x1, x2), axis=0)\n", "y = np.concatenate((y1, y2), axis=0)\n", "\n", "# Shuffle the samples\n", "data_size_all = data_size1 + data_size2\n", "shuffled_index = np.random.permutation(data_size_all)\n", "x = x[shuffled_index]\n", "y = y[shuffled_index]\n", "\n", "# Split into training (70%) and test (30%) data\n", "split_index = int(data_size_all * 0.7)\n", "x_train = x[:split_index]\n", "y_train = y[:split_index]\n", "x_test = x[split_index:]\n", "y_test = y[split_index:]\n", "\n", "# Plot the training data (class 0: red dots, class 1: blue triangles)\n", "plt.scatter(x_train[y_train == 0, 0], x_train[y_train == 0, 1], s=38, c='r', marker='.')\n", "plt.scatter(x_train[y_train == 1, 0], x_train[y_train == 1, 1], s=38, c='b', marker='^')\n", "plt.title('Training data')\n", "plt.savefig('fig-res-knn-traindata.pdf')\n", "plt.show()\n", "\n", "# Plot the test data\n", "plt.scatter(x_test[y_test == 0, 0], x_test[y_test == 0, 1], s=38, c='r', marker='.')\n", "plt.scatter(x_test[y_test == 1, 0], x_test[y_test == 1, 1], s=38, c='b', marker='^')\n", "plt.title('Test data')\n", "plt.savefig('fig-res-knn-testdata.pdf')\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 4. 
The simplest implementation" ] }, { "cell_type": "code", "execution_count": 5, "metadata": { "scrolled": true }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[1, 0, 0, 0, 1, 0, 1, 0, 0, 1, 1, 0, 1, 0, 1, 0, 1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 0, 1, 1, 1, 0, 0, 0, 1, 1, 1, 0, 1, 0, 1, 0, 0, 1, 1, 0, 1, 0, 1, 1, 1, 0, 1, 1, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 1, 1, 0, 1, 0, 0, 1, 0, 1, 0, 0, 1, 1, 0, 1, 1, 1, 1, 0, 1, 1, 1, 1, 0, 0, 0, 1, 0, 0, 1, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 1, 1, 0, 1, 1, 1, 1, 0, 1, 0, 1, 1, 1, 0, 1, 0, 0, 1, 1, 0, 0, 0, 1, 0, 1, 1, 1, 0, 0, 1, 0, 0, 1, 1, 0]\n" ] } ], "source": [ "import numpy as np\n", "import operator\n", "\n", "def knn_distance(v1, v2):\n", "    \"\"\"Squared Euclidean distance between two vectors.\n", "\n", "    The square root is omitted because it is monotonic and does not change\n", "    which neighbors are the nearest.\n", "    \"\"\"\n", "    return np.sum(np.square(v1 - v2))\n", "\n", "def knn_vote(ys):\n", "    \"\"\"Return the class that occurs most often in ys (majority vote).\"\"\"\n", "    vote_dict = {}\n", "    for y in ys:\n", "        if y not in vote_dict:\n", "            vote_dict[y] = 1\n", "        else:\n", "            vote_dict[y] += 1\n", "\n", "    method = 1\n", "\n", "    # Method 1: sort (class, count) pairs by count in descending order\n", "    if method == 1:\n", "        sorted_vote_dict = sorted(vote_dict.items(),\n", "                                  # key=operator.itemgetter(1),\n", "                                  key=lambda item: item[1],\n", "                                  reverse=True)\n", "        return sorted_vote_dict[0][0]\n", "\n", "    # Method 2: loop over the classes and keep the one with the highest count\n", "    if method == 2:\n", "        maxv = maxk = 0\n", "        for y in np.unique(ys):\n", "            if maxv < vote_dict[y]:\n", "                maxv = vote_dict[y]\n", "                maxk = y\n", "        return maxk\n", "\n", "def knn_predict(x, train_x, train_y, k=3):\n", "    \"\"\"\n", "    Classify a single sample.\n", "    Parameters\n", "        x       - the sample to classify\n", "        train_x - training samples\n", "        train_y - training labels\n", "        k       - number of nearest neighbors\n", "    \"\"\"\n", "    dist_arr = [knn_distance(x, train_x[j]) for j in range(len(train_x))]\n", "    sorted_index = np.argsort(dist_arr)\n", "    top_k_index = sorted_index[:k]\n", "    ys = train_y[top_k_index]\n", "    return knn_vote(ys)\n", "\n", "\n", "# Classify every training sample (each sample is its own nearest neighbor here)\n", "y_train_est = [knn_predict(x_train[i], x_train, y_train, k=5) for i in range(len(x_train))]\n", "print(y_train_est)" ] }, { "cell_type": "code", 
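"execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Sketch (not part of the original notebook): distance-weighted kNN voting.\n", "# The notes on imbalanced samples suggest giving closer neighbors larger weights.\n", "# Assumptions: weight = 1 / (distance + eps); the names knn_predict_weighted and\n", "# eps are hypothetical and chosen only for this illustration.\n", "import numpy as np\n", "from collections import Counter, defaultdict\n", "\n", "def knn_predict_weighted(x, train_x, train_y, k=5, eps=1e-8):\n", "    # Euclidean distance from x to every training sample, computed in one vectorized step\n", "    dists = np.sqrt(np.sum((train_x - x) ** 2, axis=1))\n", "    top_k = np.argsort(dists)[:k]\n", "    scores = defaultdict(float)\n", "    for idx in top_k:\n", "        # closer neighbors contribute larger weights; eps avoids division by zero\n", "        scores[train_y[idx]] += 1.0 / (dists[idx] + eps)\n", "    return max(scores, key=scores.get)\n", "\n", "# Tiny hand-made example: two class-0 points next to the query, three distant class-1 points\n", "demo_x = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0], [5.0, 5.2]])\n", "demo_y = np.array([0, 0, 1, 1, 1])\n", "query = np.array([0.2, 0.1])\n", "\n", "# A plain majority vote over k=5 neighbors picks the larger class 1 ...\n", "nearest = np.argsort(np.sqrt(np.sum((demo_x - query) ** 2, axis=1)))[:5]\n", "print(Counter(demo_y[nearest]).most_common(1)[0][0])  # 1\n", "# ... while the distance-weighted vote picks the nearby class 0\n", "print(knn_predict_weighted(query, demo_x, demo_y, k=5))  # 0\n", "\n", "# With the notebook data defined above, the test set could be evaluated like this:\n", "# y_test_est = [knn_predict_weighted(xi, x_train, y_train, k=5) for xi in x_test]\n", "# print(np.mean(np.array(y_test_est) == y_test))" ] }, { "cell_type": "code", 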
"execution_count": 6, "metadata": {}, "outputs": [ { "data": { "image/png": "iVBORw0KGgoAAAANSUhEUgAAAW4AAAEFCAYAAADDkQ0WAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjQuMywgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy/MnkTPAAAACXBIWXMAAAsTAAALEwEAmpwYAAAhe0lEQVR4nO3df4wd1XUH8O/x7noX0xgwbDAisk2KQggVVfFGopC0qR3SJi3GQNUEHAwlkdUfVFVQEzWmWG7VoHaTVnEg/UFDCCRAAgn4RyJE8UIU7EDiNQmpSk0b8AKm1F43AQccDPt8+sd90zc7O/PevJk7M/fOfD/Sk9/vubPP78x9Z869V1QVRETkj3lVN4CIiPrDwE1E5BkGbiIizzBwExF5hoGbiMgzDNxERJ5h4KZCicgnROSgiLwmIq+0r6/I8X43isiHcrz+KhE5IiL/KyIHROQzIiJZ36/Ldt4jIt+2/b5EAAM3FUxVx1X1JABfBfBnqnqSqj6U4/3+RFW/mrNZ96nqiQB+CcAHAKzK+X6pici3ReQ9ZW2P6omBmxpLVQ8A2A7gjKrbQtQPBm6qjIhMicgKEXlYRG5r3zdPRP5RRPaLyDMi8r7Ia74kIleFbr+n3Yv9OxH5qYh8R0SOSbn9EwH8OoA97dsfEJEnReRFEdkYet5HRWSfiEyLyF+271smIlOh52wMvyZmWxeLyEEA5wPY0k4Zvb392Lki8u/t9M0WERlJ035qLgZuqtpnAFwP4Jr27XcCeDOAtwBYA+CvU7zHuQCeB7AYwPEA3t/j+ReJyP8AOADgAQDbRGQUwI0AfhPALwL4PRH5lfbz/x7ABQBOA3C2iLwp1Z6FqOp97ZTRTgAXtVNGe9oPrwdwE8x+7wGwvN/3p2YZrLoB1HjjqrojuKGq3xORzwD4SwDvhQlmvUwD2KSqKiI/ALCwx/O3ALgMwJMAHmy/7lwApwLY1X7OMICzAPwAwCMwB5DNAD6iqj9r99bDBEDWiX8eAfBRAAMA/kVVf5zxfagh2OOmqj0WviEil8H0PicB/FHK99irndnSUgXP9vNvAvDHwaYBPKyqi1V1MYAlAO5rP7aq/dwzAPybiJwU85anpmxrXFs+DeD3YQL3hIiszPpe1AwM3OSa8wFMANgG4IMpX3M047ZuA/BuEVkCcwA5R0Te0c4xTwBYKSLHAvhPAP8O4K8AHAZwOoBDAE4SkQUi8hYAl6bc5kGYlAva6RmIyHYAJ6rqJgDfhkkXESVi4CbXfAnA7wLYCxOQR0XkuCI2pKqvALgdwB+0K0w+ApMOeRbAI6q6VVVfBbAJwA8B/DeAHQB2qepPANwK4LswufE7U272bwD8uYj8FKaXDQCfAvAPIvK/MPn12/LvHdWZcD5uIiK/sMdNROQZBm4iIs8wcBMReYaBm4jIM4UPwDnppJN02bJlRW+GiKhWdu/efVBVR+MeKzxwL1u2DJOTk0VvhoioVkTk2aTHmCohIvIMAzcRkWcYuImIPMPATUTkGQZuIo+1WlW3gKrAwE3kqakp4M1vBp5NrD2gumLgpr6wh+eODRuAl14Crr++6pZQ2Ri4KTX28NwxNQXccw9w9Kj5l59JszBwU2rs4bljwwZgZsZcb7X4mTQNAzelwh6eO4LPIgjcb7zBz6RpGLgpFfbw3BH+LAL8TJql8BVwxsbGlHOV+G1qCjjzTOC11zr3jYwAe/YAS5dW1qxGarWAY48FVIF5oW7X0aPm9iuvAAMD1bWP7BGR3ao6FvdY4ZNMkf+69fBuv72aNjXVwABw4ABw5Mjcx4aHGbSbgj1u6oo9PPe1WvwM6og9bsqMPTy3TU0By5cDjz/OtFWTMHBTTwsXVt0CShIu0WTaqjlYVULkKZZoNhcDN1EMH4b2s0SzuRi4iSJ8GNrPQTjNxsBNFOHD0H4
Owmk2lgNSo0VL6cKDjVwdZMQSzWboVg7IHjc1VlxKpIi8se18eVCiuW8f8Nxzncu+fcD+/QzaTcDATY0VTYkUkTcuKl++cCEwOjr3wtLNZkgVuEVkSES2Re77mIhsL6ZZRMWKK6UrIm/sQ76c/NMzcIvIMQB2A7ggdN9SAFcV1yyiYkVTIn/xF8Ddd5sc8chI5yJiAnuWdAfrrKkoPQO3qv5cVc8GsC909yYAn0x6jYisE5FJEZmcnp620Ewie+JSIl//OjA5aTdvzDprKkrfOW4RuRzAEwCeTHqOqt6sqmOqOjY6OpqnfUTWJaVExsft5Y1ZZ01FynJy8ncArATwVQDLReQau00iKk6rZT8lEod11lSk1HXcIvJjVT09dHsZgC+o6nu7vY513OSaQ4eSZzu0UZXBOmuygdO6EoUUXTLHqXCpaKkDd7i33b49BaBrb5uoqVhPTUXiABwiIs8wcBMReYaBm4jIMwzcRESeYeAmIvIMAzcRkWcYuImIPMPATUTkGQZuIiLPMHATEXmGgZuIyDMM3EREnmHgJiqR7RXfqZkYuIlSsBFwi1rxnTqacmBk4CbqwVbA5YrvxWrSgZGBm6gHGwGXK74Xr0kHRgZuoi5sBVyu+F6sph0YGbipsdLkQ20E3DJWfG9KbjdJ0w6MDNzUSGnyobYCbtErvjcptxunjAOjaxi4qZHS5ENtBNxWC7j7brO6+8hI5yJigouNnnKTcrtxij4wukhUtdANjI2N6eTkZKHbIOrH1BRw5pnAa6+ZILpnD7B06ezntFrAsccCqiboBo4eNbdfeSX9au2HDiWv+J53UeE0+1JnNj8n14jIblUdi3ss1SrvIjIE4F5VvVBEBMCXAJwB4ACAS1R1ptvriXpptcr7gsXlQ2+/ffZzBgaAAweSA24/bS1yxfc0+2JbmZ9VLzY/J5/0TJWIyDEAdgO4oH3X+QAGVfVcAAsBvK+45lETlJmj7ScfunAhMDo691JkIO5HFbldF/Pprn9ORegZuFX156p6NoB97bv2A9jUvv56UQ2j5igzR1unfGgV+9L0fLorUue4ReTHqnp66PbFAP4UwEpVbUWeuw7AOgBYsmTJ8mddOjyTU8rM0dYpH1rmvgSpkabn08uWO8cd84arYIL2hdGgDQCqejOAmwFzcjLLNqgYLuUngXJztHXKh5a1L1NTwPLlwOOPV5NPp3h997hFZDGAewD8lqq+2ut1rCpxR/hL6EJPKdyDC7An55a1a4E77gAuugi4/35+VmXq1uPOUsd9JYBTADwgIjtE5OpcraPSuJafrFO+uWhVjIwMDyPfutWc/Iy2iZ9VNVjH3RCu5SfLytG6lhrKIu0vJdv7unYtcNddnYPrvHnA/Pmdx308N+AT2z1u8pBrczkEOdp9+4Dnnutc9u0D9u+3EwhcLF3LIs0vJdv7Gi01BIChIWDnzmI+K+oPA3cDuDqXQ9H1t66lhrJIO+ud7X2NS2MdPQp89rPNqZV2GQN3AzQxl+ziNJ9Z8tRpfinZ3tcy5lehfBi4a66pX8J+U0NF/x2ypDLS/lKynQYrI41F+TBw11wTv4T9poaKzIUHB4QsqYw0v5SKSoOF01iLFjE14hoG7gZwaS6HMnr4/aaGisqFBweEHTv6T2Wk/aW0YUOxZXp1OcFbNywHpNKUMQCo3zLDIsskg8Erp54KPP+8uW9oCPjQh9KNOOw1HWywr6+/3tnf+fPtlukF+7BmTfZRknUoyawCywHJCWVUefSbGiqqTDJ8wjAI2kB/qYxev5QGBoBduzq11UG5nq00mI2TnuyxF4OBm0pRZpVH2tRQkWWScemaQJ4DRDTV9OlPd+4Ll+vZSIPZOKjVoSTTRQzcVIoierZ58+VFlUnGDV4JDAxkr+iJ9l7zHHh6bdvGQc3Fksy6YOCmwhXRs837E7zIMsluve3BQeDRR7OlMqK916wHnjR/OxsHNddKMrNysl2qWuhl+fLlSs12xRWqg4Oq5hS
auQwNmfvzvOe8efne4+WXVQ8cmHt5+eXs7zkzozo8rDp/vurAwOx9BszfIUub9+5VHRkx7zEyovr0053tjIx0LsHtmZnk9+r1twvvQ7/vHdfe4DIyojo1lfz8RYuSH69Kle0CMKkJcZWBmwplIwhERYOYa1/2l19WffFFs49DQ2b/g0vW/Q4f/IKDXpYDT9q/Xd6DWr8HaxsH4iJU2a5ugZvlgFQ426uch2et66e8rmy29tvmvOVl/O1cKsnMo+p2dSsHZI+bvNLvT/C0svT8y2Ir1VTU3y5OPz32uF8TLqi6XejS4+bJSfJKEZUgwcm6Z57J1bRC2DyJWuZkYy6UZOZhq11Fndhk4CZvhINYMOhkeDg+iPUb0H76U/OzuOqAEWVrrhlXJxvrdTBxtV1pFDn4iIGbvDEwALz4oglaF11kgtCqVXODWD9fmKBnpWqGjl97bZF7kI2NuWZcnGys18Hk6aerGXVp6yBX5OAjnpxsOJ/mkQjmOtmyBbjgguSTRv3Mr7F2LXDnnZ0v48CACRgunBwrQtLnXdX/g24ncK+5Jv88KUW0K80B08aJTZ6cpFiu1s4mCUqzTjst+aRRP6WCcSfrANVLLil6T6qR9HlH73fhRK3rJZ+92DixCdZxUxxXa2fjJAXZ6Bc7/IXpNdjliiviB8kMDPgXKNJI+rzD97tyMK+6oiMPW9U7DNw0h289mriSuGhpXNwXZng4ft+CgUEi8e/54Q+XvouFSvq8o/dffHH1B/MyyxaLYKt8M3fgBjAEYFv7+giAbwJ4AsCX0c6TJ10YuN3kU49m714TZKPBdXh49ijMNWvmfmFEkvftJz+JH904NJR9VKerkj7v6C+UefOqD5RFTJFQFpsjhbsF7p4nJ0XkGADfA/A2VR0RkY8CGFPVPxCRbwL4nKr+a9LreXLSPTZH4pXhkkuA++6bfd/gIHDxxcDnP9+5fcop5msOmAqRwPAw8Oqr8SffbI/qLFuak4pJn/eDD3ZO8kbZGFWZ5YRnv6Muy2xbWrb+T+VaSEFVf66qZwPY175rBYAH29cfAvAb6ZtCLvBp1fdWy1SRALPLs+bNA7Zt66yHeMIJnXK3iy4ygRzoBPikL6lLy7r1K23ZY9LnvXZt8iyGeQfCZK1hLqNssejFHcr4P5WljvtEAC+3rx8CsCj6BBFZJyKTIjI5PT2dp31kWVyN6vCw+aLefbd7U1g+//zcFV6SvsgLF5qe9bZtnYA0MwNs3uzewBob0tQJJ9UkA8DevZ37k0oEsx7M89QwFx346rC4Q5bAfRDAce3rx7Vvz6KqN6vqmKqOjY6O5mkfWRbXo1m1ygwuWL3avZrucG8xvMJL0hfZp18TeaRdpCCpB/vCC+aguG+fCeADA+bAODxsLnlGVbq8gILLbetLUvI7egHw4/a/VwP45/b1bwF4b7fX8eSk21yuLum3uqCIKWTLlraNtk8u25yb3OUT3y63LQo2ygFDgXsYpqrkR2BVifdc/o+cpbqgiMURssg6z3iaGmqXy+XYNnusBO6sFwZud7n8H9nn3nPWQSxpB0S5XC7HttnTLXBzrpIGC0+qH3BpYQJfS/X6mSslkHZui7LK5aLbTPOeVbQtLZfblqRbOSADd0P5+B/ZB1knFwpPdtXr4FnmAS2Y2Ovxx9Pth8sHW5fbFoeBm2L59h/ZhqJnwcuyNNjUFHDGGbMHDbkyICrLrweyI9cAHKovnwefZFH0wIusq6Zs2GCeG+ZCCWNtSudqiIGbGqPogRdZashbLeBrX+sM1QeSV/UpW3h/XDiQUAcDN9VSNOAV3XvMumrKwIAZkh8eoh+3qo9tvQ4Irq4FSQYDN9VOXEqk6N5j1jk2pqbMXCzhIfrbtgGHDxeXskqTMmrKCFRfMXBT7URTImX1HrOcM6giQPZKGbm6sDB1sKqEaiWuHO/6692sV6+iJDNtuWITK45c062qZLDsxhAVKZoSue464Otf7/QeA0Gu+9Zbq6tXD9I
rSQEya7u6lTzGpYziDl4Mzm5jj5ucYKO+OmnBgF27gJNPnvv8OvYeowNmwn9X3xbQaDrWcZPTbNVXJ+WLx8frWa8el2sO56+jf1eecKwPBm6qnI366qadUIs72EVLHq+9tvN39f3v43r7ysZUCVUq69wecZp0Qi1uKHp4uP3goDnp2Wp1/q4nnODn36ff+VLqgqkScpaN+uqgN9aUIfxxg4miJY8zM52/S/B39fXvU4elxmxj4KbK2KivLnr+ERfFHezi8tcBn0c9cr6UeAzcVBkbJ8ua1htLOth97Wud/PW8mG+1rychOV9KPOa4qRI2Bp/YzI/7Imnxi0svBT73OfN3Xbq083cVMc/pd1BP0dPfptH08kXmuMk54bk9duwwX8idO/ubXMnn3liWKolulSGbNwOLFgGLFwPT051V3NPOmRLmSvqJ5YvJ2OOmyuVd6ivgS28sT5VEGZUzLiyewBWauAIOOSzPUl8uzj+SRtWBsVsaxKX0U5PKO+NYT5WIyLEiskVEdorIeL7mUZNlSXf4PJik6iqJXmkQl9JPvpYvliFrjnsNgMdU9XwAZ4nImRbbRA2RtRww69zXLqg6MHarwuHiCf7IGriPAFggIgJgBMDrPZ5PNEeek08+9saqDoy9evs8GeiPrIH7TgDvB/AfAPao6tPhB0VknYhMisjk9PR03jZSDfmc7siq6sDYrbffxM/DZ5lOTorIBgD/rapfEJG7ANyoqt+Ney5PTlKSJp18qrpKIk0VTpM+Dx8UsZDCmwAE/wWOAPiFjO9DDdakYFDUoglpJfX2r7sO+MpXzO0mfR6+y9rjXgbgDphUy3MALlfV2B9T7HGTS1wYEVi2pN5+q2Xy7E8/Dbz1rdW1j+JZLwdU1SlVPV9Vf1VVP5gUtIlc4sqIwLIlVeGsXm0C+caNVbeQ+sUh79QYTZuQKixahfPqq8C2bZx1z1cM3NQIVQ98cU3V9eSUDwM3NUKvUrgmqbqenPJj4Kba6xaompj3zlNP3rSDnKsYuKn2ugWqpuW98wy0aeJBzlWcHZBqrdvAl2CRgSNHqp8Jr0xZB9pUPath03AhBWqsbhNSrV49d0HdJsgyzwtP7rqFgZtqLxyoFi3qlMNt2cITdGmxCsUtDNzUCK3W7Bxt1RM++YRVKO5h4KbaCwL2tdeaE5HXXWdvJrwmVFnwIOcenpyk2lu71kykNG+eCTgjI8CuXcDJJ899bj8z4eVZO9IXVc9q2GRFzA5I5IXgZ77q7BOR4+P5KyPCpYR1rbKoelZDisdUCdXahg0mJxtmI0fbpCoLH1cbqrv6B+6JCZPUnJiouiVUsiC4xuWh8+ZoWWVBVap34J6YAFatAm64wfzL4N0ocb1twPy8z7MkF6ssqGr1DtwPPQQcPmyuHz5sblMjBEO7BwZMLja4DA0Bg4MmyGZdEZ5VFlS1egfuFSuABQvM9QULzG1qhPCIyeef71xeeMHcv3hxthwtF9UlF9S7qmTlSmDrVtPTXrHC3KbGKOLkGassyAX1DtyACdYM2GQRqymoavVOlRAR1RADd1lYlkhEljBwl4FliURkUebALSKfEJFHROR+EZlvs1G1w7JEIrIoU+AWkbcCOEtV3w3gfgBvsdqqumFZIhFZlLWqZCWAE0TkOwD2A7jRXpMsmZhwpwyQZYmlabVYkkf1lzVVMgpgWlV/Daa3/a7wgyKyTkQmRWRyeno6bxv752JOeeVK4FOfYtAuEBezpabIGrgPAXiqff0ZAKeGH1TVm1V1TFXHRkdH87QvG+aUG6lpK7ZTc2UN3LsBvLN9/XSY4O2OJuaUG15u2KRpVokyBW5VfRTAQRHZBeApVf2+3Wal0C1QBTnl9evNvy6nJ2wEXBdTQyXjNKvUKKpa6GX58uVq3fbtqgsWqALm3+3b7W+jDLb2Y/168x7BZf16u+103N69qiMjs/8EIyOqU1NVt4woOwCTmhBX/RyAc8st9chh28rFNzE1FMJpVqlp3A/
c0VTCxARw772dx4eH/Q1UtgKuT6khyzjNKjWR27MDBrnbw4eBz362UwsdnlPz0kv9DVQ267sbOgsip1mlJnI7cMelElasMEH88GHTS7366kqbmFtDA65NnGaVmsbtVElcKqHBaQEiIsD1HndSKsFWL9WlYfFERCmJqTopztjYmE5OTha6jUzC+fMFC+rfe+dBisgrIrJbVcfiHnM7VVKkJg2L5wAdolppbuBuUu1zkw5SRA1Qv8Cddgh5k05yNukgRdQA9cpxR/PWGzea6eKY12WOm8gz3XLcbleV9CuaEli/3oyFDgbvhANW3kDmWyBkvThRbdQrVRJOCQwOdiawiOZ1856s48k+IqpQvQJ3OG99ww3Jed28J+vSvD5ujhUX58t2tV1ElCxp2kBbl0KmdU1r+3YzxWl0utSs06kG7zc+3v310ffv9fyq1GV6XKIaQpdpXeuV445KyutmmdypnxOf0R755s1ze+gu5Jvjfjm40C4i6sr/VEncT/00P//jFu/t9rpokHvppeTFf1esMFPTAcDQkJlnNLjtUjkeywSJ/JTUFbd1KTRVEvdTP08aJHjd8LDqZZfNfm0/77t9u3mP8JIsce/pgqR0EuU2M1N1C8hnqG2qJNoLvvJK4G1vy/bzP/xeR44Ad90FbNnSKSPsJ70SnTM8eM/TTnMvFcEywUJMTQHLlwOPPw4sXVp1a6hu/E6VhH/qA8ALLwAPP2zSE0Dyz/+4lEj0vYC5FSNx6ZU07erWFqqlDRtMNo3Lp1ER/B85OTFhetovvNC57x3vAFavju8Zd5sVcGIC+OIXgW98w/SQu80a2GsATvD48cfnH73p22CfhpuaAs48E3jtNXN6Y88e9rqpf91GTvqd4w6Mj8/OJ4+PJz83zYrovfK+ZZTRpS097PbaftrFXLc1V1yhOjhoPrKhIXObqF/okuPOFZQBfAzA9m7PKa2Oe3xc9bzzugdt1dlBd3Aw3fOjAS1N8M8j2sZ+tpXloMJ6bmv27lUdGZn9kY2MqE5NVd0y8k23wJ05xy0iSwFclfX11n3848DOnebfJEHKYc2azpD4jRuTywbjhrZPTAB79xZb3hc+UTozY9qadltZRoVy2ldrNmzozLQQaLWY6ya78pyc3ATgk3EPiMg6EZkUkcnp6ekcm7AoHIRvvTV5HpOwaED74hfNe9x1F3D0qMmlb9xoP+8cra++4Yb0089mqc1mPbcVrRZw993AvHkmtx1cRIB77jGPE9mQqRxQRC4H8ASAJ+MeV9WbAdwMmJOTmVpm+4RcXC92ZqZ7oIquKK/aeY833gCefNIE7nPOsRu8s4zszPPaPNuj/zcwABw4MLcSFDA/0AYGym8T1VOmqhIRuRPAEpjAfwaA61X1prjnZqoqKWI9yKxzdYcPIEDnPcLWrzdlgtHnMwASUUbW5+NW1cvbb7wMwBeSgnZmtufQCIJploUVogNUtm4FbrkFuPfeTslgENTDB4e4OcCT2sUgT0R9cHPkZDRFkSfnarv3HgTyuKDbzwGn3yCfBg8ERI2QK3Cr6hSA99ppSojNnGtRM+DFDRXv54BTxK+K6IEg2A4DOVGtuDvkPe3w8l5sVUyknXEw7QLEtis5kipguEoPUe24G7jzCAdZG6u5Jy1VFhfMgwMO0D3Q215lPnogCFfAsDabqF6SRubYupS+Ak4RowDjRkp2205VIxHDozw5GpLIayhi5KSzso4C7JYKiUtrdNtOVSMRw+kl2z16InJG/QJ3ltxxr1Xb44Jgt+24MhLR1nkCInKKm+WA/YiWwGWpSElT4REu+QtuJ23HtZGILBMkqpekHIqtS+lLl2V5j8su6yw1ljRjYHg5suFhf3LGzHUTeQmNWbqs31rocO3z0JCZTCKYMRCYPcrylls6k1AcOWJu+9B75UruRLXjd477+OPjpzxNu/J7OKi98UZn+rbDh00+O5zzFpm97ehtV7mSbycie5K64rYuhaVKkhZE6Gfl9+jK7kN
D5rpIfPlfnlRJlSvMcHUbIu+glqmS6DStL7009/5wKV5cuiB8EvH4402PHDDhOhD0UleuBL71rWwn+YqYl6QfXMmdqFb8TZUkpQDi7u+WLghK5l56yaRLws47b3aQDZfXRVMv3erAucIMEdmU1BW3dSm8qiQuBRB3f5oFgINUSK90SDT10mtBX1Z2EFGf0CVVkmkhhX5kWkihKhMTZnImVeAjH0lOL1x3nTlxGTjvPOC73+3cDi+sEH5v1lITUUrWF1KorbS54Oj0ratXAz/8YffpXJlnJiJLGLizCE5qBr3zc85xa6QkEdUaA3cemzebXvaWLSZwR9MjREQF8LeqpGqsFCGiijBwZ8URiURUEaZKsnJtBkAiagwG7jxYKUJEFciUKhHjNhF5TES2iggPAEREJcma4z4fwKCqngtgIYD32WsSERF1kzVw7wewqX39dUttISKiFDKlOFT1vwBARC4GMB/AA+HHRWQdgHUAsGTJkpxNpNQ4rJ6oETLPVSIiqwBcC+BCVf1Z0vOcn6ukLsEuPHXsggVc2Z3Ic93mKsl6cnIxgI8D+O1uQdt5vVZ39wkHBBE1RtYc95UATgHwgIjsEJGrLbapPHUKdhwQRNQYWXPcfwvgby23pXzRWf58DnYcEETUGM2uv65bsOOAIKJGaHbgBhjsiMg7nGSKiMgzDNxERJ5h4CYi8gwDNxGRZxi4iYg8w8BNROSZzHOVpN6AyDSAZ9s3TwJwsNANFovtr57v++B7+wH/98GX9i9V1dG4BwoP3LM2JjKZNGmKD9j+6vm+D763H/B/H3xvP8BUCRGRdxi4iYg8U3bgvrnk7dnG9lfP933wvf2A//vge/vLzXETEVF+TJUQEXmGgZuIyDOlBG4xbhORx0Rkq4h4OZ2siHxMRLZX3Y6sROQTIvKIiNwvIvOrbk8/RORYEdkiIjtFZLzq9vRDRIZEZFv7+oiIfFNEnhCRL4uIVN2+NCL74N33Odz+0H3efp/L6nGfD2BQVc8FsBDA+0rarjUishTAVVW3IysReSuAs1T13QDuB/CWipvUrzUAHlPV8wGcJSJnVt2gNETkGAC7AVzQvuvDAPap6i8DOCF0v7Ni9sGr73NM+73/PpcVuPcD2NS+/npJ27RtE4BPVt2IHFYCOEFEvgPg3QD2Vtyefh0BsKDdQx2BJ/+PVPXnqno2gH3tu1YAeLB9/SEAv1FJw/oQsw9efZ9j2g94/n0uJXCr6n+p6vdF5GIA8wE8UMZ2bRGRywE8AeDJqtuSwyiAaVX9NZje9rsqbk+/7gTwfgD/AWCPqj5dcXuyOhHAy+3rhwAsqrAtmfD7XL3STk6KyCoAfwrgQlVtlbVdS34Hpsf6VQDLReSaituTxSEAT7WvPwPg1ArbksUnAfyTqr4dwCIROa/qBmV0EMBx7evHwY85M+bg97laZZ2cXAzg4wB+W1V/VsY2bVLVy1X1XQA+BGC3qt5UdZsy2A3gne3rp8MEb5+8CcBr7etHAPxChW3JYwKdnPAKAA9X2JZM+H2uXlk97isBnALgARHZISJXl7RdalPVRwEcFJFdAJ5S1e9X3aY+fR7AH4rIowCOgQmAProDwKki8iMAP4Gf+8Hvc8U4cpKIyDMcgENE5BkGbiIizzBwExF5hoGbiMgzDNxERJ5h4CYi8sz/AWhJ7zS4KfvtAAAAAElFTkSuQmCC\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "# Plot the predicted class of each training sample (class 0: red dots; other classes: blue triangles)\n", "for i in range(len(y_train_est)):\n", "    if y_train_est[i] == 0:\n", "        plt.scatter(x_train[i,0], x_train[i,1], s=38, c='r', marker='.')\n", "    else:\n", "        plt.scatter(x_train[i,0], x_train[i,1], s=38, c='b', marker='^')\n", "#plt.rcParams['figure.figsize']=(12.0, 8.0)\n", "#mpl.rcParams['font.family'] = 'SimHei'  # only needed to render Chinese labels (requires the SimHei font)\n", "plt.title(\"Train Results\")\n", "plt.savefig(\"fig-res-knn-train-res.pdf\")\n", "plt.show()" ] },
{ "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Train Accuracy: 100.000000%\n" ] } ], "source": [ "# Compute the accuracy on the training data\n", "n_correct = 0\n", "for i in range(len(x_train)):\n", "    if y_train_est[i] == y_train[i]:\n", "        n_correct += 1\n", "accuracy = n_correct / len(x_train) * 100.0\n", "print(\"Train Accuracy: %f%%\" % accuracy)" ] },
{ "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Test Accuracy: 96.666667%\n", "58 60\n" ] } ], "source": [ "# Predict each test sample with knn_predict and compute the accuracy on the test data\n", "y_test_est = [knn_predict(x_test[i], x_train, y_train, 3) for i in range(len(x_test))]\n", "n_correct = 0\n", "for i in range(len(x_test)):\n", "    if y_test_est[i] == y_test[i]:\n", "        n_correct += 1\n", "accuracy = n_correct / len(x_test) * 100.0\n", "print(\"Test Accuracy: %f%%\" % accuracy)\n", "print(n_correct, len(x_test))" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "## 5. 
Implementing kNN as a class" ] }, { "cell_type": "code", "execution_count": 9, "metadata": { "collapsed": true }, "outputs": [], "source": [ "import numpy as np\n", "import operator\n", "\n", "class KNN(object):\n", "    def __init__(self, k=3):\n", "        \"\"\"Constructor.\n", "        k - number of nearest neighbors\"\"\"\n", "        self.k = k\n", "\n", "    def fit(self, x, y):\n", "        \"\"\"Fit the given data (kNN simply stores the training set).\n", "        x - sample features; y - sample labels\"\"\"\n", "        self.x = x\n", "        self.y = y\n", "        return self\n", "\n", "    def _square_distance(self, v1, v2):\n", "        \"\"\"Squared Euclidean distance between samples v1 and v2 in feature space.\n", "        The square root is omitted because it does not change the ranking of the neighbors.\"\"\"\n", "        return np.sum(np.square(v1 - v2))\n", "\n", "    def _vote(self, ys):\n", "        \"\"\"Majority vote.\n", "        ys - labels of the k nearest neighbors\"\"\"\n", "        vote_dict = {}\n", "        for y in ys:\n", "            vote_dict[y] = vote_dict.get(y, 0) + 1\n", "        sorted_vote_dict = sorted(vote_dict.items(), key=operator.itemgetter(1), reverse=True)\n", "        return sorted_vote_dict[0][0]\n", "\n", "    def predict(self, x):\n", "        \"\"\"Predict the label of every sample (row) in x.\"\"\"\n", "        y_pred = []\n", "        for i in range(len(x)):\n", "            dist_arr = [self._square_distance(x[i], self.x[j]) for j in range(len(self.x))]\n", "            sorted_index = np.argsort(dist_arr)\n", "            top_k_index = sorted_index[:self.k]\n", "            y_pred.append(self._vote(ys=self.y[top_k_index]))\n", "        return np.array(y_pred)\n", "\n", "    def score(self, y_true=None, y_pred=None):\n", "        \"\"\"Return the accuracy of y_pred against y_true.\n", "        If neither is given, score the predictions on the training data.\"\"\"\n", "        if y_true is None and y_pred is None:\n", "            y_pred = self.predict(self.x)\n", "            y_true = self.y\n", "        score = 0.0\n", "        for i in range(len(y_true)):\n", "            if y_true[i] == y_pred[i]:\n", "                score += 1\n", "        score /= len(y_true)\n", "        return score" ] },
{ "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "train accuracy: 100.000000 %\n", "test accuracy: 96.666667 %\n" ] } ], "source": [ "# data preprocessing (optional min-max scaling; the test set must reuse the training-set min/max)\n", "#x_min, x_max = np.min(x_train, axis=0), np.max(x_train, axis=0)\n", "#x_train = (x_train - x_min) / (x_max - x_min)\n", "#x_test = (x_test - x_min) / (x_max - x_min)\n", "\n", "# kNN classifier with k=3\n", "clf = KNN(k=3)\n", "train_acc = clf.fit(x_train, y_train).score() * 100.0\n", "\n", "y_test_pred = clf.predict(x_test)\n", "test_acc = clf.score(y_test, y_test_pred) * 100.0\n", "\n", "print('train accuracy: %f %%' % train_acc)\n", "print('test accuracy: %f %%' % test_acc)" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "## 6. kNN with sklearn" ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Feature dimensions: (1797, 64)\n", "Label dimensions: (1797,)\n" ] } ], "source": [ "# %matplotlib inline\n", "\n", "import matplotlib.pyplot as plt\n", "from sklearn import datasets, neighbors, linear_model\n", "\n", "# load the handwritten digits data set (8x8 images flattened to 64 features)\n", "digits = datasets.load_digits()\n", "X_digits = digits.data\n", "y_digits = digits.target\n", "\n", "print(\"Feature dimensions: \", X_digits.shape)\n", "print(\"Label dimensions: \", y_digits.shape)\n" ] },
{ "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [ { "data": { "image/png": 
"iVBORw0KGgoAAAANSUhEUgAAA5cAAAB4CAYAAABmbwYvAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjQuMywgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy/MnkTPAAAACXBIWXMAAAsTAAALEwEAmpwYAAAXRElEQVR4nO3dfbRVdZ3H8c+X5wcRL5Ig4FUUgcAkjDAakadQZmpGyRmfsxwcFKdWhs5UjC51msimUptM0so0MzV10JXlAyrdMEkySQSEOwKCilyekUGBy72/+eMc9cogv5/ufX777H3fr7VY61z88t1fP/ecffbvnH3ONuecAAAAAABIok3WAwAAAAAA8o/FJQAAAAAgMRaXAAAAAIDEWFwCAAAAABJjcQkAAAAASIzFJQAAAAAgsUIvLs3sLDNba2brzOzzWc9TdGY2wcxuz3qOojOzU8xsVfm+/Y9Zz1NkZnaOmb1sZivN7FNZz9MamNnvzeyarOcoIjPrbWbNZraxxZ8Ds56rqMxssJk9Z2brzexfs56nqMxs2F736UYzG5v1XEVlZp8rPy/Wm9m4rOcpMjObZmavmNlqMxuf9TyhrKjXuTSzGkkrJI2TtFvSHyQd7ZzblOlgBWVmcySNlnS/c+7MrOcpKjPrLGm1pPEq3a+flVTrnNuc6WAFVN6H/FnSSEkDJP3SOXdktlMVm5mdLOlhSd92zn0t63mKxsx6S/qDc+6orGdpDcysTtI3JT0nabmkYc651dlOVWxm1lbSBkkfdc6tyXqeojGzXpL+ImmYpN6SHpLUzxV1MZEhM+uj0n5jqKT+ku5U6XhvT6aDBSjyO5d/J+mPzrnnnHMvSHpK0t9mPFNhOecmSroo6zlagYGSNjjnFjvn6iVtV2nhg/T1lXSbc26jSk+mR5hZkfeZ1eAbkh7LeoiC25L1AK2BmR0mqYdz7lHnXIOkkyW9nvFYrcFISQ0sLCtmsKSXnHPrnXOLJHWR9KGMZyqqj0ta5pxb45yrU+kNheMynilIkQ+Uhqi04n/L/0j6cEazAGlZqtI7xG+9qtVD0suZTlRQ5QX81eUfT5f0rHOuOcuZiszMTpG0WaWzTFA5B5nZUjPbYGZTsx6mwD4saYOZPWFm6yWd6JxjYV95J0t6JOshCmy5pMPNrI+ZfVzSVkmcEVgZ7SU1tfh5h6TDM5rlfWmX9QAVVCOpocXPO8SrK8g551yjpM3lU39ulvQj59xrGY9VaGZ2maRvSzot61mKysxM0r9LmiLpMxmPU2SNklZJOkdSP0nzzOxXzrmtmU5VTD0ljZJ0vEoH4C+Y2b3OuVWZTlV8J6t0BgQqwDm3zsx+rNLHc9pIOs851+T5Z/hgFko6pnwWRB9JR6m04Kx6RX7ncrOkbi1+PqD8d0CulU/NvE1Ss6TpGY9TeM6570oaK+lWMzso22kK6wxJq5xzz2Q9SJE55zY55yaWT2l7VqXT6o/Oeq6CapK0yDm3qHyK5mLl5JS2vDKzHpKOlfS7jEcpLDM7QaWPmB0iqVbSN8yM7yKoAOfcCpVedH1K0oWSlqj0eeKqV+TF5RK9+zTYwSqdUgjk3dUqvQv/D7xiWDlmNvmt0wadc/Mk7VTpM69I31hJo81snaTLJH3RzL6a7UjFY2bjzWxyi7/qpNICE+l7SaUXtd/STqV3jlE5n5I03zn3RtaDFNgnJf3eObfFOfeqpOcljch4pkIys66S7nPOHabSWT2H6d0f96taRV5c/lrSSDMbbmZDVDo15TcZzwQkYmZ9JZ0r6XTn3K6s5ym4HZIuNrOuZnaMSgeKL2Y8UyE55y5yzh3snOst6buSbnDOfTvruQqok6QrzKyzmX1apS/34T5dGc9K6mFmo8ysv6RBknhnvrL4vGXlLZU0wcy6l7/34Xjxxk2l9Ja0oPyO/OmSlufli6oK+5lL59xWM5um0oKyraQvc7kGFMBkSYdKWlH6mJok6RLn3C+yG6mYnHOPlq9tuUKldy0
vYB+CPHPO/dbMTlLpXbUNks7Kw9fa55FzrtHMzpN0t0rHIF91zq3NeKyiO0nSf2U9RJE55x40swkqvYPWLOl7zrnFGY9VSM65FWZ2rUpZv6bSx0dyobDXuQQAAAAAxFPk02IBAAAAAJGwuAQAAAAAJMbiEgAAAACQGItLAAAAAEBiLC4BAAAAAImleimSDtbRdVLXVHrt6env07t32FUBXt1xkLem0yth1zZ2jel9a/t2bdnonPvQB/m3aWYdtL3BYa9DdGzjz2drQ7egXm037QiqC5GnrJsPCtvWEYc1eGvWNR4Y1Gv3suagOp+d2qHdbpf5K/ctzax39/X3OebgDUG9Nje39dZsWh42d1r7kCRZx75PW7uwp5rmI/37GavfnXSc961a9h8h++EdjR2CerVfsTPpOBVRLVkHbS/F58XtS+O/1l8tWe/u4+/j/LtgSVLPbtu9NYe2C7vv73T+58WXXzjIW/Nm03btbn6zKp4Xdx3RxVtz2AFhx9YvbzvYW9PptbDLcbs9xXpedAP9++GQ/YKU3vFZ2va3/9jvM76ZdZJ0r6TDJC2SdJ7bz7VLOqmrjrcJSWZ928bTRnlr/uXSu4J6XfHnU7w1A6e/FtRrzzr/AX2ox9y9q9+6nWXWIfrcFrYgPLrLem/N/deOD+pVc+v8oLoQecr6jfHHB9X99PprvTXfem1SUK+1n/A/IYd42j3+rp+zzHrVl/z7kAWfnxXU667tNd6a28eMDOqV1j4kSdax79Ntex4SVPfmjZ29NR0mrvbWpK1a9h8h++EFr9YG9ep32pKk41REtWQdIs3nxbpj/ff9tFVL1msu/KS3Znf3sAPsKRPmemtm9Fwe1Ku+0f8C9yUjJ3tr5m+8510/Z5l1/ZUjvDX/OTrs2PrSB8/11gy6ZmVQr6YG/2MkRLU8L+6+8XBvzRHdwhbxaR2fpa3l/mNvvpfKzpX0inNumKQaSRPTHAzvQtbxkHU8ZB0PWcdBzvGQdTxkHQ9Zx0PWGfAtLsdLmlO+/YSkcZUdp1Uj63jIOh6yjoes4yDneMg6HrKOh6zjIesM+BaXB0vaVr79uqQelR2nVSPreMg6HrKOh6zjIOd4yDoeso6HrOMh6wz4vmVho6Tu5dvdyz+/i5lNlTRVkjrJ/0FhvCeyjoes4yHrePabNTmnhvt0PGQdD1nHQ9bx8LyYAd87l49LOql8e7yk//dJaefczc65Ec65Ee3VMe35WhOyjoes4yHrePabNTmnhvt0PGQdD1nHQ9bx8LyYAd/i8g5Jfc1skaTNKv2SUBlkHQ9Zx0PW8ZB1HOQcD1nHQ9bxkHU8ZJ2B/Z4W65zbJekzkWZp1cg6HrKOh6zjIes4yDkeso6HrOMh63jIOhthV7bOQMg1LM/stiWo1/UH/a+35jfPPhLU62NXTfPW9Lw5veszVouXtod9BvpntfO8NT8+cXRQr5pbg8pypXnMcG/NvB/eFNSrvtFfc8rBC4N6zdKAoLpqUD8r7HqS3xrv34cc8/2Lg3ot/vKN3pofjD4iqNcB96R3rdy8WDUt7P61e7H/WnYDFP86l9Ui5PEcsg+WJK31l9y/44CgVrOOzs/+I9SWL/ivk/tIbdh1co+6+yJvzQD9MahXa9Vhm+9Eu5KHrhzrrZlz8eCgXiHXIQy5PqNze4K2F8PYIWHX+Azxvc/8wlvzwCj/MY8krf1E0mniaDt0UFDd3KF3p7fRgH31zI1hc8W6nm7YoxUAAAAAgP1gcQkAAAAASIzFJQAAAAAgMRaXAAAAAIDEWFwCAAAAABJjcQkAAAAASIzFJQAAAAAgMRaXAAAAAIDEWFwCAAAAABJrF3uDe8Z/LKjuzG5/8db89aQzg3p1X7TMW3P6kxOCem0e3uSt6RnUqXo0jxnurblp4A2B3bp6Kw58vkNgr+JZeWpHb83MjYOCev308XHemhVn/Cio16ygquoweNbrQXW3Xz3SW3N53Z1Bve7aXuOtOeCep4N6FU3bXod
4az732ceDet39M/9+uO3QsMdHiKYly1PrFcPSN/t6a07tGvb/VN+4w1vzb4vOCep1eK8N3pqmhvVBvarFqdOfSK3XkffvSq1XEdVe9VRqvV687hPemim9/MeEkvTkxMMDqrYH9aoWv1vq338u6F4b1KvfaUu8NT9Y/XBQrymTp3truszO/jm2sWeX1Hqdv2Z0UN2CV/2/j28e+0BQrzoNCKpLincuAQAAAACJsbgEAAAAACTG4hIAAAAAkBiLSwAAAABAYiwuAQAAAACJsbgEAAAAACTG4hIAAAAAkBiLSwAAAABAYu1ib3DnwWGbvHz9R7w1zYvCLoQb4k/PH5Var2qx5qpPBtU9cP53vDUD23dNOs7b+j66KaiuKbUtVo9B16z01ty9xn8heUl66BL/723ckrODenXQ6qC6ahD8uD92sLfkzG5bglqdvtL/O2nXO2zftmddQ1BdXqya5r8o8/XdZwf1qruus7fmhVtGBPVqs83/+xjwlaBWVWNOg/8+PaPn8qBeIfv05ue7B/VqavBfTD1vhnR+1Vszc6P/gvSS1KZuYdJxcumNyccH1a090VLb5kOf/V5qve4+27/f733d+tS2F8OA2/xHVnPuvCOo1/l/HO2tWbq7V1CvbvVbvTXVcEzYfpl/vxCq4RT/850kjXxgjbdmSIfQ4wr/83UaeOcSAAAAAJAYi0sAAAAAQGIsLgEAAAAAibG4BAAAAAAkxuISAAAAAJAYi0sAAAAAQGIsLgEAAAAAibG4BAAAAAAkxuISAAAAAJBYu9gb3FkTtp69Y/4ob81ALUg6ztvadd8dVLdnW4fUtllptVc9FVR3yazJ3prfLnw06Thva+zZJaguT698tO11SFDd8q8d6a2ZMuHxpOO8rfO5bwbVNaW2xerRvGiZt+bTx50c1Gv4w2v9RQ8HtdLCSX28NXvWNYQ1q6AtX/DvgyXphak3emuGzp8a1KuflnhrVk36SVCvYd+5OKguTzpMXO2tGT35wqBeG4e19daE/G4l6cPyZx36fFQthnTwPwYf2DQ8qNeaqz7irel/z6agXk1LlgfVVYNu9VuD6mov3umtuWngLxNO844pl0wPqus9O1/32RA7e6R3DPuz2nnemr+ZeEZQr7zcr5sa1gfVzdw4yFsTelzd/+ELvDVfPzTsAKTtUP9cafwu8nT8DgAAAACoUiwuAQAAAACJsbgEAAAAACTG4hIAAAAAkBiLSwAAAABAYiwuAQAAAACJsbgEAAAAACTG4hIAAAAAkFi72BvstKU5qO7jH1nhrdkWuM12vXt5a84Y8uegXr966ITAreK9rD+uc1Bd77oKD5KiF75VG1S3atKPUtvmyBmXeWtqGuantr0i2rPOf6F0SVo4qY+3ZtMt3YJ6NVzZw1szcFrYXJXUcVvYvrq+cYe3ZsmoO4J6zVzkv8BzqL6/fNFb05Ta1qpHl9lPB9X11PGpbXNn7e7UelWLe7cd560JuYi8JM38rP/C6zOmhl24fOJZ53tr2tQtDOpVaaEXY+8w0V8zcG3XoF4jZ0zz1tTMLt7zYvOY4UF18354k7fmqLsvCurVqXa7t+acO58J6vXkWR/11oTen6pB3bH+49y5Y/yPZUkaWOfP8ORbvhzU64jrN3hrQh6PPrxzCQAAAABIzLu4NLNJZvaKmT1Z/pPeS8t4GznHQ9bxkHU8ZB0PWcdBzvGQdTxkHQ9ZZyP0tNhZzrlvVnQSSOQcE1nHQ9bxkHU8ZB0HOcdD1vGQdTxkHVnoabGnmdkCM7vPzKyiE7Vu5BwPWcdD1vGQdTxkHQc5x0PW8ZB1PGQdWcjicoWkK5xzIyUdKmlMy/9oZlPN7Bkze6ZRuyoxY2ux35wlsk4RWcdD1vGwr46HrONg/xEPWcdD1vGwr85AyOJys6THyrdfknRIy//onLvZOTfCOTeivTqmPF6rst+cJbJOEVnHQ9bxsK+Oh6zjYP8RD1nHQ9bxsK/OQMjicrqkM82sjaRjJC2
u7EitFjnHQ9bxkHU8ZB0PWcdBzvGQdTxkHQ9ZZyBkcXmDpPMlPS1ptnNuaWVHarXIOR6yjoes4yHreMg6DnKOh6zjIet4yDoD3m+Ldc69Jmls5Udp3cg5HrKOh6zjIet4yDoOco6HrOMh63jIOhuhlyJJzYHLtwXVXdnvQW/NeVOnB/Vqf+qGoLoQ/b8+P7VeKI4BtzUF1c0c4b/E0oyey4N6LZg5y1sz7pxTgnrtuKOPt6bm1nzd9+tnjfTW9Hki7Ivjdtb4T/L4+ZBrg3qdunVaUF3Wusx+OqjuS7P/ylvTPGZ4UK8f/vwGb83Q+VODevVrWBJUlydbvjDKW9NxW3NQrwFfTe8F/H6/bptar2px+39P8NbMmBq2r57TMNhb8/fdnw3qtfJU/+fCBtQFtaoa9beM8Nc0/iGoV8+HVnhrwp6t86X9sleD6uobd3hrBl2zMqhX4+C+3poZd4Y9Ro66YJy3ZsBXglrlRpu6hUF1IY+PRyZ8P6jXlEv866YOWh3Ua39CL0UCAAAAAMB7YnEJAAAAAEiMxSUAAAAAIDEWlwAAAACAxFhcAgAAAAASY3EJAAAAAEiMxSUAAAAAIDEWlwAAAACAxNrF3mDzomVBdWfMutRbc/mldwb1un6F/2LIf/po8S4CHaqpYb23ZtySU4J6zR36gLdmzwnbgnrpurCyahB6Mdy6Yzt7a+aOOT+o157LN/t7Bfw+JKn/iRd4a2puDWpVNdpv9T+mv/Qfd6W2vVOfmhZUd+TZf0ltm3nRfuMbQXUD23f11vT4xQFJx8mtDSc2emtWTfpJatsbOv+coLp+s59ObZvVov+sF/01tf79phR2gfML688O6nXk/buC6vLkn0bM89ace+VlQb1qGuYnHSeXQo7jpLD72dyFYccN9Y07vDXjloTdrwdds9Jb0xTUqTrU3zLCWzN2yPKgXmO6+B8f/3zeF4N6damLs6/mnUsAAAAAQGIsLgEAAAAAibG4BAAAAAAkxuISAAAAAJAYi0sAAAAAQGIsLgEAAAAAibG4BAAAAAAkxuISAAAAAJAYi0sAAAAAQGLmnEuvmdkGSav3+uuekjamtpG4Kj374c65D32Qf0jW7xtZv6OSs3/gnKV9Zp3nnKUqzZr79PtG1u8g63jylHWec5aqdF8tkfX7xP7jHZntP1JdXO5zA2bPOOdGVHQjFZK32fM2b0t5mz1v87aUp9nzNOu+5Gn+PM26t7zNnrd5W8rb7Hmbt6U8zZ6nWfclT/PnadZ9ydP8eZp1b1nOzmmxAAAAAIDEWFwCAAAAABKLsbi8OcI2KiVvs+dt3pbyNnve5m0pT7PnadZ9ydP8eZp1b3mbPW/ztpS32fM2b0t5mj1Ps+5LnubP06z7kqf58zTr3jKbveKfuQQAAAAAFB+nxQIAAAAAEqvY4tLMOpnZg2b2nJndbmZWqW2lzcwmmdkrZvZk+c+grGd6L3nOWSLrmMg6jjzlLJF1TGQdR55zlsg6JrKOI085S2SdVCXfuTxX0ivOuWGSaiRNrOC2KmGWc+6E8p/lWQ+zH3nPWSLrmMg6jrzkLJF1TGQdR95zlsg6JrKOIy85S2SdSCUXl+MlzSnffkLSuApuqxJOM7MFZnZflb9ikfecJbKOiazjyEvOElnHRNZx5D1niaxjIus48pKzRNaJVHJxebCkbeXbr0vqUcFtpW2FpCuccyMlHSppTMbz7E+ec5bIOiayjiNPOUtkHRNZx5HnnCWyjoms48hTzhJZJ1LJxeVGSd3Lt7uXf86LzZIeK99+SdIh2Y3ileecJbKOiazjyFPOElnHRNZx5DlniaxjIus48pSzRNaJVHJx+bikk8q3x0uaW8FtpW26pDPNrI2kYyQtznie/clzzhJZx0TWceQpZ4msYyLrOPKcs0TWMZF1HHnKWSLrRCp2nUsz6yjpPkm1kp6TdJ7LyUU1zexQSXdK6irpt865KzMe6T3lOWeJrGMi6zjylLNE1jGRdRx5zlk
i65jIOo485SyRdeIZcpIVAAAAAKCKVfK0WAAAAABAK8HiEgAAAACQGItLAAAAAEBiLC4BAAAAAImxuAQAAAAAJMbiEgAAAACQGItLAAAAAEBi/weiOk1REZeWagAAAABJRU5ErkJggg==\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "# plot the first 10 sample images with their labels\n", "nplot = 10\n", "fig, axes = plt.subplots(nrows=1, ncols=nplot)\n", "\n", "for i in range(nplot):\n", "    img = X_digits[i].reshape(8, 8)\n", "    axes[i].imshow(img)\n", "    axes[i].set_title(y_digits[i])\n", "fig.set_size_inches(16, 9)\n", "fig.savefig('fig-res-digits.pdf')" ] },
{ "cell_type": "code", "execution_count": 13, "metadata": { "collapsed": true }, "outputs": [], "source": [ "# split into training data (first 40% of the samples) and test data (remaining 60%)\n", "n_samples = len(X_digits)\n", "n_train = int(0.4 * n_samples)\n", "\n", "X_train = X_digits[:n_train]\n", "y_train = y_digits[:n_train]\n", "X_test = X_digits[n_train:]\n", "y_test = y_digits[n_train:]\n" ] },
{ "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "KNN score: 0.953661\n", "LogisticRegression score: 0.927711\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "/home/bushuhui/anaconda3/envs/dl/lib/python3.7/site-packages/sklearn/linear_model/_logistic.py:765: ConvergenceWarning: lbfgs failed to converge (status=1):\n", "STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.\n", "\n", "Increase the number of iterations (max_iter) or scale the data as shown in:\n", " https://scikit-learn.org/stable/modules/preprocessing.html\n", "Please also refer to the documentation for alternative solver options:\n", " https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression\n", " extra_warning_msg=_LOGISTIC_SOLVER_CONVERGENCE_MSG)\n" ] } ], "source": [ "# kNN classification (default n_neighbors=5), compared with logistic regression\n", "# note: on the unscaled pixel features LogisticRegression may not converge within its default\n", "# iteration limit (see the ConvergenceWarning in the output); raising max_iter or scaling the data avoids this\n", "knn = neighbors.KNeighborsClassifier()\n", "logistic = linear_model.LogisticRegression()\n", "\n", "print('KNN score: %f' % knn.fit(X_train, y_train).score(X_test, y_test))\n", "print('LogisticRegression score: %f' % logistic.fit(X_train, y_train).score(X_test, y_test))" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "## 7. 
Further thoughts\n", "\n", "* If the input data set is very large, how can the distance computation be sped up?\n", "    - [kd-tree](https://scikit-learn.org/stable/modules/generated/sklearn.neighbors.KDTree.html#sklearn.neighbors.KDTree)\n", "    - Fast Library for Approximate Nearest Neighbors (FLANN)\n", "    - [PyNNDescent for fast Approximate Nearest Neighbors](https://pynndescent.readthedocs.io/en/latest/)\n", "* How should the best `k` be chosen?\n", "    - https://zhuanlan.zhihu.com/p/143092725\n", "* What are the weaknesses of kNN?" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "## References\n", "* [Digits Classification Exercise](http://scikit-learn.org/stable/auto_examples/exercises/plot_digits_classification_exercise.html)\n", "* [Principles and implementation of the kNN algorithm (in Chinese)](https://zhuanlan.zhihu.com/p/36549000)" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.5.4" } }, "nbformat": 4, "nbformat_minor": 2 }