|
|
- {
- "cells": [
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "# k-Means"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "\n",
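- "Depending on whether the training samples carry label information, machine learning can be divided into **supervised learning** and **unsupervised learning**. `Clustering` is a typical form of unsupervised learning: its training samples contain only the `sample features` and **no label information**. A clustering algorithm uses these features to place samples with a similar distribution in feature space into the same cluster.\n",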
- "\n",
- "\n",
- ""
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
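- "## 1. Method\n",
- "\n",
- "Thanks to its speed and good scalability, k-Means is among the most classic clustering methods. ***The k-Means algorithm is a process of repeatedly moving the cluster centers (centroids)***:\n",
- "* move each center to the mean position of the members it contains;\n",
- "* then reassign the members to the clusters.\n",
- "\n",
- "`k` is a hyperparameter of the algorithm that gives the number of clusters. k-Means assigns samples to clusters automatically, but it cannot decide how many clusters there should be. `k` must be a positive integer smaller than the number of training samples. Sometimes the number of clusters is dictated by the problem. For example, a shoe factory with three new styles wants to know the potential customers for each style, so it surveys its customers and looks for three clusters in the data. Other problems do not specify the number of clusters, and the optimal number is uncertain.\n",
- "\n",
- "The parameters of k-Means are the centroid positions of the clusters and the positions of the observations inside them. As with generalized linear models and decision trees, the optimal parameters are the ones that minimize a cost function. The k-Means cost function is:\n",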
- "$$\n",
- "J = \\sum_{k=1}^{K} \\sum_{i \\in C_k} \\lVert x_i - u_k \\rVert^2\n",
- "$$\n",
- "\n",
- "$u_k$ is the centroid of the $k$-th cluster, defined as:\n",
- "$$\n",
- "u_k = \\frac{1}{|C_k|} \\sum_{i \\in C_k} x_i\n",
- "$$\n",
- "\n",
- "\n",
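- "The cost function is the sum of the distortions of all clusters. The distortion of a cluster is the sum of squared distances between its centroid and its members. The more compact the members of a cluster are, the smaller its distortion; the more spread out they are, the larger its distortion."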
- ]
- },
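- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "As a rough illustration of the cost function, the sketch below computes $J$ for a small made-up data set. The helper name `kmeans_cost` and the sample values are assumptions chosen for illustration; they are not part of the original material."
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {},
- "outputs": [],
- "source": [
- "import numpy as np\n",
- "\n",
- "def kmeans_cost(X, labels, centroids):\n",
- "    \"\"\"Sum of squared distances between each sample and its assigned centroid.\"\"\"\n",
- "    diffs = X - centroids[labels]  # offset of every sample from its own centroid\n",
- "    return float(np.sum(diffs ** 2))\n",
- "\n",
- "# hypothetical toy data: two tight groups of two points each\n",
- "X = np.array([[0.0, 0.0], [0.0, 1.0], [4.0, 4.0], [4.0, 5.0]])\n",
- "labels = np.array([0, 0, 1, 1])\n",
- "# centroids are the per-cluster means, following the definition of u_k\n",
- "centroids = np.array([X[labels == k].mean(axis=0) for k in range(2)])\n",
- "print(kmeans_cost(X, labels, centroids))  # every point is 0.5 from its centroid, so J = 4 * 0.25 = 1.0"
- ]
- },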
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
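- "## 2. Algorithm\n",
- "Finding the parameters that minimize the cost function is a process of repeatedly reassigning the observations of each cluster and moving the cluster centroids.\n",
- "\n",
- "Input: $T=\\{ x_1, x_2, \\ldots, x_N\\}$, where $x_i \\in \\mathbb{R}^D$, $i=1,2,\\ldots,N$\n",
- "\n",
- "Output: the clusters $C_k$ and the cluster centers $u_k$, where $k=1,2,\\ldots,K$\n",
- "\n",
- "1. Initialize the centroids $u_k$, for example by picking random samples as cluster centers\n",
- "2. In each iteration, assign every sample to its nearest cluster, i.e. update the cluster sets $C_k$\n",
- "3. Then move each centroid to the mean position of all members of its cluster, i.e. update $u_k$\n",
- "4. Stop if the maximum number of iterations is reached or the change between two iterations falls below a set threshold; otherwise go back to step 2\n",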
- "\n"
- ]
- },
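- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "The four steps can be sketched in a few lines of NumPy. This is a minimal sketch, not a reference implementation: the function name `kmeans`, its parameters `max_iter`, `tol` and `seed`, and the toy data are all assumptions made for illustration. An empty cluster simply keeps its previous centroid, which is one possible convention among several."
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {},
- "outputs": [],
- "source": [
- "import numpy as np\n",
- "\n",
- "def kmeans(X, k, max_iter=100, tol=1e-6, seed=0):\n",
- "    \"\"\"Minimal k-means sketch; returns (labels, centroids).\"\"\"\n",
- "    rng = np.random.default_rng(seed)\n",
- "    # step 1: pick k distinct samples as the initial centroids\n",
- "    centroids = X[rng.choice(len(X), size=k, replace=False)].astype(float)\n",
- "    for _ in range(max_iter):\n",
- "        # step 2: assign every sample to its nearest centroid\n",
- "        d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)\n",
- "        labels = d2.argmin(axis=1)\n",
- "        # step 3: move each centroid to the mean of its members\n",
- "        new_centroids = centroids.copy()\n",
- "        for j in range(k):\n",
- "            members = X[labels == j]\n",
- "            if len(members) > 0:  # an empty cluster keeps its old centroid\n",
- "                new_centroids[j] = members.mean(axis=0)\n",
- "        # step 4: stop once the centroids barely move\n",
- "        shift = np.linalg.norm(new_centroids - centroids)\n",
- "        centroids = new_centroids\n",
- "        if shift < tol:\n",
- "            break\n",
- "    return labels, centroids\n",
- "\n",
- "# hypothetical data: two well-separated groups of three points\n",
- "X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],\n",
- "              [8.0, 8.0], [8.0, 9.0], [9.0, 8.0]])\n",
- "labels, centroids = kmeans(X, k=2)\n",
- "print(labels)\n",
- "print(centroids)"
- ]
- },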
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "## 3. Worked example"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": 1,
- "metadata": {},
- "outputs": [
- {
- "data": {
- "image/png": "iVBORw0KGgoAAAANSUhEUgAAAWoAAAD4CAYAAADFAawfAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjQuMywgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy/MnkTPAAAACXBIWXMAAAsTAAALEwEAmpwYAAAN7klEQVR4nO3dT2zkd3nH8c/H3sxmdqGAlMhVd6N6Dwi0QqqCVzRDVDTqcICC4NJDqMIBH3wpIfypUIJUcar2ghA50EpRMlwYwWGJqiqNSCrjOVQzWrG7iRR2F6QoQP4Q1OVAwVH1G7Lz9GC7s4286zHrn7+Px++XZGn9dx89Hr89/vnP1xEhAEBec6UHAADcGqEGgOQINQAkR6gBIDlCDQDJHanjjd51112xuLhYx5ue2ptvvqnjx48XnSELdjHBLibYxUSGXVy8ePE3EXH3ds+rJdSLi4u6cOFCHW96av1+X+12u+gMWbCLCXYxwS4mMuzC9i9v9jwufQBAcoQaAJIj1ACQHKEGgOQINQAkR6gBIDlCDQDJEWoASI5QA0ByhBoAkiPUAJAcoQaA5Ag1ACRHqAEgOUINAMkRagBIjlADQHJThdr2l2xftv0T29+zfWfdgwEANuwYatsnJH1B0pmI+ICkeUkP1D0YgPoMh0P1ej0Nh8PSo2AK0176OCKpafuIpGOSflXfSADqNBwO1el01O121el0iPUBsOPhthHxuu1vSHpF0v9Iei4innv7y9lekbQiSQsLC+r3+3s86u6sr68XnyELdjHBLqRer6eqqjQej1VVlbrdrqqqKj1WUelvFxFxywdJ75H0I0l3S7pD0r9KevBWr7O0tBSlra2tlR4hDXYxwS4iBoNBNJvNmJubi2azGYPBoPRIxWW4XUi6EDdp6jSXPj4q6ecRcS0i/iDpKUkfruWzBoDatVotra6uanl5Waurq2q1WqVHwg52vPShjUse99k+po1LHx1JF2qdCkCtWq2Wqqoi0gfEjveoI+K8pHOSLkl6cfN1Hq95LgDApmnuUSsivi7p6zXPAgDYBr+ZCADJEWoASI5QA0ByhBoAkiPUAJAcoQaA5Ag1ACRHqAEgOUINAMkRagBIjlADQHKEGgCSI9QAkByhBoDkCDVqNxwOdfbsWQ5RFbu4UZZdHIQT2af6e9TAH2vrxOvRaKRGo3Goj35iFxNZdrE1R1VV6vV6ad8n3KNGrfr9vkajka5fv67RaJT7pOeasYuJLLvYmmM8Hqd+nxBq1KrdbqvRaGh+fl6NRkPtdrv0SMWwi4ksu9iaY25uLvX7hEsfqNXWidf9fl/tdjvll5X7hV1MZNnF1hzdblfLy8tp3yeEGrVrtVppPwD2G7uYyLKLg3AiO5c+ACA5Qg0AyRFqAEiOUANAcoQaAJIj1ACQHKEGgOQINQAkR6gBIDlCDQDJEWoASI5QA0ByhBoAkiPUAJDcVKG2/W7b52z/1PZV23n/HiAAzJhp/x71Y5J+GBF/a7sh6ViNMwEAbrDjPWrb75L0EUlPSlJEjCLitzXPBey5g3DaNLCdaS59nJJ0TdJ3bD9v+wnbx2ueC9hTW6dNd7tddTodYo0DZZpLH0ckfVDSQxFx3vZjkh6R9I83vpDtFUkrkrSwsFD8NN/19fXiM2TBLqRer6eqqjQej1VVlbrdrqqqKj1WUdwuJtLvIiJu+SDpTyX94obH/0rSv9/qdZaWlqK0tbW10iOkwS4iBoNBNJvNmJubi2azGYPBoPRIxXG7mMiwC0kX4iZN3fHSR0T8WtKrtt+3+aSOpCv1fNoA6rF12vTy8rJWV1dTH2QKvN20P/XxkKTe5k98vCzpc/WNBNTjIJw2DWxnqlBHxAuSztQ7CgBgO/xmIgAkR6gBIDlCDQDJEWoASI5QA0ByhBoAkiPUAJAcoQaA5Ag1ACRHqAEgOUINAMkRagBIjlADQHKEGgCSI9TAPhoOhzp79ixnNopd7Ma0BwcAuE1b
B+yORiM1Go1DfdIMu9gd7lED+6Tf72s0Gun69esajUa5D1OtGbvYHUIN7JN2u61Go6H5+Xk1Gg212+3SIxXDLnaHSx/APtk6YLff76vdbh/qL/XZxe4QamAftVotorSJXUyPSx8AkByhBoDkCDUAJEeoASA5Qg0AyRFqAEiOUANAcoQaAJIj1ACQHKEGgOQINQAkR6gBIDlCDQDJEWoASI5QA0ByU4fa9rzt520/XedAAID/bzf3qB+WdLWuQWYRpywD2AtTnfBi+6SkT0j6J0lfrnWiGcEpywD2yrRHcX1L0lclvfNmL2B7RdKKJC0sLBQ/VXh9fb3oDL1eT1VVaTweq6oqdbtdVVVVZJbSu8iEXUywi4n0u4iIWz5I+qSkf978d1vS0zu9ztLSUpS2trZW9P8fDAbRbDZjfn4+ms1mDAaDYrOU3kUm7GKCXUxk2IWkC3GTpk5zj/p+SZ+y/TeS7pT0J7a/GxEP1vOpYzZwyjKAvbJjqCPiUUmPSpLttqR/INLT4ZRlAHuBn6MGgOSm/WaiJCki+pL6tUwCANgW96gBIDlCDQDJEWoASI5QA0ByhBoAkiPUAJAcoQaA5Ag1ACRHqAEgOUINAMkRagBIjlADQHKEGgCSI9QAkByhRu04jR24Pbv6e9TAbnEaO3D7uEeNWvX7fY1GI12/fl2j0Sj3Sc9AUoQatWq322o0Gpqfn1ej0VC73S49EnDgcOkDteI0duD2EWrUjtPYgdvDpQ8ASI5QA0ByhBoAkiPUAJAcoQaA5Ag1ACRHqAEgOUINAMkRagBIjlADQHKEGgCSI9QAkByhBoDkCDUAJLdjqG3fY3vN9hXbl20/vB+DAQA2TPP3qN+S9JWIuGT7nZIu2v6PiLhS82wAAE1xjzoi3oiIS5v//r2kq5JO1D0Y9sZwOFSv1+MEcOAA29U1atuLku6VdL6WabCntk4A73a76nQ6xBo4oKY+isv2OyT9QNIXI+J32zx/RdKKJC0sLBQ/bXp9fb34DKX1ej1VVaXxeKyqqtTtdlVVVemxiuJ2McEuJtLvIiJ2fJB0h6RnJX15mpdfWlqK0tbW1kqPUNxgMIhmsxlzc3PRbDZjMBiUHqk4bhcT7GIiwy4kXYibNHWan/qwpCclXY2Ib9b6WQN7ausE8OXlZa2urnLALHBATXPp435Jn5X0ou0XNp/2tYh4prapsGdarZaqqiLSwAG2Y6gj4j8leR9mAQBsg99MBIDkCDUAJEeoASA5Qg0AyRFqAEiOUANAcoQaAJIj1ACQHKEGgOQINQAkR6gBIDlCDQDJEWoASI5QA0ByhBoAkiPUAJAcoQaA5Ag1ACRHqAEgOUINAMkRagBIjlADQHKEGgCSI9QAkByhBoDkCDUAJEeoASA5Qg0AyRFqAEiOUANAcoQaAJIj1ACQHKEGgOQINQAkR6gBILmpQm37Y7Z/Zvsl24/UPRQAYGLHUNuel/RtSR+XdFrSZ2yfrnuw2zEcDtXr9TQcDkuPAgC3bZp71B+S9FJEvBwRI0nfl/Tpesf64w2HQ3U6HXW7XXU6HWIN4MA7MsXLnJD06g2PvybpL9/+QrZXJK1I0sLCgvr9/l7Mt2u9Xk9VVWk8HquqKnW7XVVVVWSWLNbX14u9P7JhFxPsYiL7LqYJ9VQi4nFJj0vSmTNnot1u79Wb3pWjR4/+X6yPHj2q5eVltVqtIrNk0e/3Ver9kQ27mGAXE9l3Mc2lj9cl3XPD4yc3n5ZSq9XS6uqqlpeXtbq6eugjDeDgm+Ye9Y8lvdf2KW0E+gFJf1frVLep1WqpqioiDWAm7BjqiHjL9uclPStpXlI3Ii7XPhkAQNKU16gj4hlJz9Q8CwBgG/xmIgAkR6gBIDlCDQDJEWoASI5QA0ByhBoAkiPUAJAcoQaA5Ag1ACRHqAEgOUINAMkRagBIjlADQHKEGgCSI9QAkByhBoDkHBF7/0bta5J+uedveHfukvSbwjNkwS4m2MUEu5jIsIs/j4i7
t3tGLaHOwPaFiDhTeo4M2MUEu5hgFxPZd8GlDwBIjlADQHKzHOrHSw+QCLuYYBcT7GIi9S5m9ho1AMyKWb5HDQAzgVADQHIzGWrbH7P9M9sv2X6k9Dyl2L7H9prtK7Yv23649Ewl2Z63/bztp0vPUpLtd9s+Z/untq/abpWeqRTbX9r82PiJ7e/ZvrP0TNuZuVDbnpf0bUkfl3Ra0mdsny47VTFvSfpKRJyWdJ+kvz/Eu5CkhyVdLT1EAo9J+mFEvF/SX+iQ7sT2CUlfkHQmIj4gaV7SA2Wn2t7MhVrShyS9FBEvR8RI0vclfbrwTEVExBsRcWnz37/XxgfkibJTlWH7pKRPSHqi9Cwl2X6XpI9IelKSImIUEb8tOlRZRyQ1bR+RdEzSrwrPs61ZDPUJSa/e8PhrOqRxupHtRUn3SjpfeJRSviXpq5LGheco7ZSka5K+s3kZ6Anbx0sPVUJEvC7pG5JekfSGpP+OiOfKTrW9WQw13sb2OyT9QNIXI+J3pefZb7Y/Kem/IuJi6VkSOCLpg5L+JSLulfSmpEP5fRzb79HGV9unJP2ZpOO2Hyw71fZmMdSvS7rnhsdPbj7tULJ9hzYi3YuIp0rPU8j9kj5l+xfauBT217a/W3akYl6T9FpEbH1ldU4b4T6MPirp5xFxLSL+IOkpSR8uPNO2ZjHUP5b0XtunbDe08c2Bfys8UxG2rY1rkVcj4pul5yklIh6NiJMRsaiN28OPIiLlPae6RcSvJb1q+32bT+pIulJwpJJekXSf7WObHysdJf3G6pHSA+y1iHjL9uclPauN7+J2I+Jy4bFKuV/SZyW9aPuFzad9LSKeKTcSEnhIUm/zjszLkj5XeJ4iIuK87XOSLmnjJ6SeV9JfJedXyAEguVm89AEAM4VQA0ByhBoAkiPUAJAcoQaA5Ag1ACRHqAEguf8FNFbkKND8AT8AAAAASUVORK5CYII=\n",
- "text/plain": [
- "<Figure size 432x288 with 1 Axes>"
- ]
- },
- "metadata": {
- "needs_background": "light"
- },
- "output_type": "display_data"
- }
- ],
- "source": [
- "%matplotlib inline\n",
- "import matplotlib.pyplot as plt\n",
- "import numpy as np\n",
- "\n",
- "X0 = np.array([7, 5, 7, 3, 4, 1, 0, 2, 8, 6, 5, 3])\n",
- "X1 = np.array([5, 7, 7, 3, 6, 4, 0, 2, 7, 8, 5, 7])\n",
- "plt.figure()\n",
- "plt.axis([-1, 9, -1, 9])\n",
- "plt.grid(True)\n",
- "plt.plot(X0, X1, 'k.');"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
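- "Suppose k-Means is initialized with the centroid of the first cluster at the 5th sample and the centroid of the second cluster at the 11th sample. We can then compute the distance from every sample to both centroids and assign each sample to the nearer cluster. The results are shown in the table below:\n",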
- "\n",
- "\n",
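- "The initial centroids and the resulting first assignment are shown in the figure below. The first cluster is drawn with X markers and the second with dots. The centroids are highlighted with larger markers.\n",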
- "\n",
- "\n"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": 2,
- "metadata": {},
- "outputs": [
- {
- "data": {
- "image/png": "iVBORw0KGgoAAAANSUhEUgAAAWoAAAEICAYAAAB25L6yAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjQuMywgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy/MnkTPAAAACXBIWXMAAAsTAAALEwEAmpwYAAAVAklEQVR4nO3df3Dcd33n8ec7cmRwzCVtnRMXR0ZhoPRy5CCx0yJy9KSKOUhJ4K9LU4Jd8HR81zmaQOLLUdKkLdRNJ01ToNNyQ6nLJNHg8wSGaUL4dbK2w7UiYzvkLiQmM7lEsYKhDdD8UMJJtvy+P3bFyo4srWytvx9Jz8eMRvr+2O++9+31Sx99vrv7jcxEklSuM6ouQJI0N4NakgpnUEtS4QxqSSqcQS1JhTOoJalwBrXaKiLeFhGPVVzDRyPis1XWcCoioi8inq66DlXHoBYAEfHBiNgXERMR8bkF3G40It5+ou2Z+c3MfEOr+5+q2UItM/8oM3+zXfd5urW7hyrPqqoLUDEOAX8IvAN4ZcW1zCoiAojMPFp1LbOJiFWZeaTqOrT8OKIWAJn5xcz8EvCj47dFxLqIuC8ino2IH0fENyPijIi4C9gA3BsR4xFx4yy3/ekI90T7R8RbIuIfGsf/3xHRN+P2tYjYERF/D7wEvDYiPhARByLihYh4IiL+U2Pfs4CvAOc1jj8eEedFxO9HxN0zjvnuiHikcX+1iPjXM7aNRsT2iPg/EfFcRPyPiHjFbD2LiPdHxN9HxJ9FxI+A34+I1RFxe0QcjIh/jIj/HhGvnKuPjW0ZEa+bcezPRcQfznKfL+thRLwiIu6OiB81jr03Irpm/YfWkmRQqxU3AE8D5wJdwEeBzMzNwEHgysxcm5m3zXWQ2faPiPXAl6mP5n8W2A58ISLOnXHTzcA24FXAU8A/AVcA/wL4APBnEXFJZr4IXA4cahx/bWYemllDRPw88HngQ43Hcz/10OucsdtVwDuBC4B/C7x/jof1S8ATjb7sAP4Y+HngzcDrgPXALY19Z+3jXD073gl6/hvA2UA38HPAfwZ+spDjqmwGtVpxGPhXwGsy83Bj3nmxPiTmfcD9mXl/Zh7NzG8A+4BfnbHP5zLzkcw80rj/L2fm/826vwO+Drytxfv7NeDLmfmNzDwM3E59quetM/b5VGYeyswfA/dSD90TOZSZf96Y8vh/1H+hfDgzf5yZLwB/BFzd2LddfTxMPaBfl5lTmbk/M59fhOOqEAa1WvEnwOPA1xtTDR9ZxGO/BviPjT/Zn42IZ4F/Rz3Qpo3NvEFEXB4R32pMHzxLPdTXtXh/51EflQPQmO8eoz7ynfaDGT+/BKyd43gzazsXWAPsn/FYvtpYD+3r413A14BdEXEoIm6LiDMX6dgqgEGteWXmC5l5Q2a+Fng3cH1EDExvXujhjlseA+7KzHNmfJ2VmX88220iYjXwBeoj4a7MPIf69EW0WM8h6r8cpo8X1KcMvrfAx/Gy2oAfUp9y+DczHsvZmbkW5u3jS9RDftqrW7xPGqPzP8jMC6n/ZXAFsOUkH48KZFALqL9ioXHSrAPoaJygWtXYdkVEvK4Ras8BU8D0Ky/+EXjtAu7q+P3vBq6MiHdExPT99kXE+Se4fSewGngGOBIRlwP/4bjj/1xEnH2C2+8G3hURA41R5w3ABPAPC3gMs2qMzv+K+pz5vwSIiPUR8Y7Gz3P18SHgvY0evBP493Pc1TE9jIj+iLgoIjqA56lPhRT5yhidHINa036X+mjwI9TnjX/SWAfweuB/AuPACPCXmTnc2HYr8LuNP/W3t3A/x+yfmWPAe6ifWHuG+gj7v3KC52Zj3vda6oH7z8B7gb+dsf271E8WPtG4j/OOu/1jjcf359RHwFdSPzE32ULtrfhv1Kc3vhURz1Pv2/TryOfq43WNWp4FrgG+NMd9HN/zVwP3UA/pA8DfUZ8O0TIRXjhAksrmiFqSCmdQS1LhDGpJKpxB
LUmFa8uHMq1bty57enraceiWvfjii5x11lmV1lAKe9FkL5rsRVMJvdi/f/8PM/Pc2ba1Jah7enrYt29fOw7dslqtRl9fX6U1lMJeNNmLJnvRVEIvIuKpE21z6kOSCmdQS1LhDGpJKpxBLUmFM6glqXAGtSQVzqCWpMIZ1JJUOINakgpnUEtS4QxqSSqcQS1JhTOoJalwBrUkFc6glqTCGdSSVDiDWpIK11JQR8SHI+KRiPhORHw+Il7R7sIktcFtt8Hw8LHrhofr61WseYM6ItYD1wKbMvONQAdwdbsLk9QGl14KV13VDOvh4frypZdWW5fm1Oo1E1cBr4yIw8Aa4FD7SpLUNv39sHs3XHUVPZdfDl/5Sn25v7/qyjSHyMz5d4q4DtgB/AT4emZeM8s+24BtAF1dXRt37dq1yKUuzPj4OGvXrq20hlLYiyZ7Udezcyc9d93F6ObNjG7dWnU5lSvhedHf378/MzfNujEz5/wCfgbYA5wLnAl8CXjfXLfZuHFjVm14eLjqEophL5rsRWbu2ZO5bl0+uXlz5rp19eUVroTnBbAvT5CprZxMfDvwZGY+k5mHgS8Cbz313x+STrvpOendu+sj6cY0yMtOMKoorQT1QeAtEbEmIgIYAA60tyxJbbF377Fz0tNz1nv3VluX5jTvycTMfCAi7gEeBI4A3wY+0+7CJLXBjTe+fF1/vycTC9fSqz4y8/eA32tzLZKkWfjOREkqnEEtSYUzqCWpcAa1JBXOoJakwhnUklQ4g1qSCmdQS1LhDGpJKpxBLUmFM6glqXAGtSQVzqCWpMIZ1Gofr3jdZC90CgxqtY9XvG6yFy8zMjbCrd+8lZGxkcrrGDw4WHkdc2n1KuTSws244jW/9Vvw6U+v3Cte24tjjIyNMHDnAJNTk3R2dDK0ZYje7t7K6pg4MsHg2GBldczHEbXaq7+/Hkwf/3j9+woNJsBezFAbrTE5NclUTjE5NUlttFZpHUc5Wmkd8zGo1V7Dw/XR480317+v5Iuo2ouf6uvpo7Ojk47ooLOjk76evkrrOIMzKq1jPk59qH1mXPH6p9flm7m8ktiLY/R29zK0ZYjaaI2+nr7Kphum69g5vJOt/VuLnPYAg1rtNNcVr1daONmLl+nt7i0iGHu7e5nYMFFELSdiUKt9vOJ1k73QKXCOWpIKZ1BLUuEMakkqnEEtSYUzqCWpcAa1JBXOoJakwhnUklQ4g1qSCmdQS1LhDGpJKpxBreVptktfnYiXxFLhDGotT8df+upEvCSWloCWgjoizomIeyLiuxFxICLK/TxACY699NWJwvr4z4iWCtXqiPqTwFcz8xeANwEH2leStEimw/rKK+GOO47ddscd9fWGtJaAeT+POiLOBn4ZeD9AZk4Ck+0tS1ok/f3wsY/B9u315UsuqYf09u1w++2GtJaEVi4ccAHwDPA3EfEmYD9wXWa+2NbKpMVy/fX179u38+Y3vhG+8516SE+vlwoXmTn3DhGbgG8Bl2XmAxHxSeD5zLz5uP22AdsAurq6Nu7atatNJbdmfHyctWvXVlpDKexF3ZuvvZZzHn6YZy+6iIc+9amqy6mcz4umEnrR39+/PzM3zboxM+f8Al4NjM5Yfhvw5blus3Hjxqza8PBw1SUUw15k5p/+aWZE/vNFF2VG1JdXOJ8XTSX0AtiXJ8jUeac+MvMHETEWEW/IzMeAAeDRxfotIrXdjDnphy65hL4HH2zOWTv9oSWg1Yvb/jYwGBGdwBPAB9pXkrSIhofhlluac9K1WjOcb7kFLr7YE4oqXktBnZkPAbPPnUilmn6d9L33vjyMr7++HtK+jlpLgO9M1PLUyptZWnlTjFQAg1rL0969rY2Up8N6797TU5d0Elqdo5aWlhtvbH3f/n6nPlQ0R9SSVDiDWpIKZ1BLUuEMakkqnEEtSYUzqCWpcAa1JBXOoJakwhnUklQ4g1qSCmdQS6fJ4MOD9HyihzP+4Ax6PtHD4MODVZekJcLP
+pBOg8GHB9l27zZeOvwSAE899xTb7t0GwDUXXVNlaZUZGRuhNlqjr6eP3u7eqsspmkEtnQY3Dd3005Ce9tLhl7hp6KYVGdQjYyMM3DnA5NQknR2dDG0ZMqzn4NSHdBocfO7ggtYvd7XRGpNTk0zlFJNTk9RGa1WXVDSDWjoNNpy9YUHrl7u+nj46OzrpiA46Ozrp6+mruqSiGdTSabBjYAdrzlxzzLo1Z65hx8COiiqqVm93L0Nbhvh4/8ed9miBc9TSaTA9D33T0E0cfO4gG87ewI6BHStyfnpab3evAd0ig1o6Ta656JoVHcw6eU59SFLhDGpJKpxBLUmFM6glqXAGtSQVzqCWpMIZ1JJUOINakgpnUEtS4QxqSSqcQS1JhTOoJalwBrUkFc6glqTCtRzUEdEREd+OiPvaWdCycNttMDx87Lrh4fp6SVqghYyorwMOtKuQZeXSS+Gqq5phPTxcX7700mrrkrQktRTUEXE+8C7gs+0tZ5no74fdu+vhfMst9e+7d9fXS9ICRWbOv1PEPcCtwKuA7Zl5xSz7bAO2AXR1dW3ctWvXIpe6MOPj46xdu7bSGnp27qTnrrsY3byZ0a1bK6ujhF6Uwl402YumEnrR39+/PzM3zboxM+f8Aq4A/rLxcx9w33y32bhxY1ZteHi42gL27Mlcty7z5pvr3/fsqayUyntREHvRZC+aSugFsC9PkKmtTH1cBrw7IkaBXcCvRMTdp/77YxmbnpPevRs+9rHmNMjxJxglqQXzBnVm/k5mnp+ZPcDVwJ7MfF/bK1vK9u49dk56es56795q65K0JHkV8na48caXr+vv92SipJOyoKDOzBpQa0slkqRZ+c5ESSqcQS1JhTOoJalwBrUkFc6glqTCGdSSVDiDWpIKZ1BLUuEMakkqnEEtSYUzqCWpcAa1JBXOoJakwhnUklQ4g1ptNzI2wq3fvJWRsZGqS5GWJC8coLYaGRth4M4BJqcm6ezoZGjLEL3dvVWXJS0pjqjVVrXRGpNTk0zlFJNTk9RGa1WXJC05BrXaqq+nj86OTjqig86OTvp6+qouSVpynPpQW/V29zK0ZYjaaI2+nj6nPaSTYFCr7Xq7ew1o6RQ49SFJhTOoJalwBrUkFc6glqTCGdSSVDiDWpIKZ1BLUuEMakkqnEEtSYUzqCWpcAa1JBXOoJakwhnUklQ4g1qSCjdvUEdEd0QMR8SjEfFIRFx3OgqTJNW18nnUR4AbMvPBiHgVsD8ivpGZj7a5NkkSLYyoM/P7mflg4+cXgAPA+nYXpsUxMjbC4MFBrwAuLWELmqOOiB7gYuCBtlSjRTV9BfCdT+5k4M4Bw1paolq+FFdErAW+AHwoM5+fZfs2YBtAV1cXtVptsWo8KePj45XXULXBg4NMHJngKEeZODLBzuGdTGyYqLqsSvm8aLIXTaX3IjJz/p0izgTuA76WmXfMt/+mTZty3759i1DeyavVavT19VVaQ9WmR9QTRyZYvWo1Q1uGVvy1C31eNNmLphJ6ERH7M3PTbNtaedVHAH8NHGglpFWO6SuAb71gqyEtLWGtTH1cBmwGHo6IhxrrPpqZ97etKi2a3u5eJjZMGNLSEjZvUGfm/wLiNNQiSZqF70yUpMIZ1JJUOINakgpnUEtS4QxqSSqcQS1JhTOoJalwBrUkFc6glqTCGdSSVDiDWpIKZ1BLUuEMakkqnEEtSYUzqCWpcAa1JBXOoJakwhnUklQ4g1qSCmdQS1LhDGpJKpxBLUmFM6glqXAGtSQVzqCWpMIZ1JJUOINakgpnUEtS4QxqSSqcQS1JhTOoJalwBrUkFc6glqTCGdSSVDiDWpIK11JQR8Q7I+KxiHg8Ij7S7qIkSU3zBnVEdAB/AVwOXAj8ekRc2O7CTsXI2AiDBwcZGRupuhRJOmWtjKh/EXg8M5/IzElgF/Ce9pZ18kbGRhi4c4CdT+5k4M4Bw1rSkreqhX3WA2Mzlp8Gfun4nSJiG7ANoKuri1qt
thj1LdjgwUEmjkxwlKNMHJlg5/BOJjZMVFJLKcbHxyv79yiNvWiyF02l96KVoG5JZn4G+AzApk2bsq+vb7EOvSCrx1YzOFYP69WrVrO1fyu93b2V1FKKWq1GVf8epbEXTfaiqfRetDL18T2ge8by+Y11Rert7mVoyxBbL9jK0JahFR/Skpa+VkbUe4HXR8QF1AP6auC9ba3qFPV29zKxYcKQlrQszBvUmXkkIj4IfA3oAHZm5iNtr0ySBLQ4R52Z9wP3t7kWSdIsfGeiJBXOoJakwhnUklQ4g1qSCmdQS1LhDGpJKpxBLUmFM6glqXAGtSQVzqCWpMIZ1JJUOINakgpnUEtS4QxqSSqcQS1JhTOoJalwkZmLf9CIZ4CnFv3AC7MO+GHFNZTCXjTZiyZ70VRCL16TmefOtqEtQV2CiNiXmZuqrqME9qLJXjTZi6bSe+HUhyQVzqCWpMIt56D+TNUFFMReNNmLJnvRVHQvlu0ctSQtF8t5RC1Jy4JBLUmFW5ZBHRHvjIjHIuLxiPhI1fVUJSK6I2I4Ih6NiEci4rqqa6pSRHRExLcj4r6qa6lSRJwTEfdExHcj4kBE9FZdU1Ui4sON/xvfiYjPR8Qrqq5pNssuqCOiA/gL4HLgQuDXI+LCaquqzBHghsy8EHgL8F9WcC8ArgMOVF1EAT4JfDUzfwF4Eyu0JxGxHrgW2JSZbwQ6gKurrWp2yy6ogV8EHs/MJzJzEtgFvKfimiqRmd/PzAcbP79A/T/k+mqrqkZEnA+8C/hs1bVUKSLOBn4Z+GuAzJzMzGcrLapaq4BXRsQqYA1wqOJ6ZrUcg3o9MDZj+WlWaDjNFBE9wMXAAxWXUpVPADcCRyuuo2oXAM8Af9OYBvpsRJxVdVFVyMzvAbcDB4HvA89l5terrWp2yzGodZyIWAt8AfhQZj5fdT2nW0RcAfxTZu6vupYCrAIuAT6dmRcDLwIr8jxORPwM9b+2LwDOA86KiPdVW9XslmNQfw/onrF8fmPdihQRZ1IP6cHM/GLV9VTkMuDdETFKfSrsVyLi7mpLqszTwNOZOf2X1T3Ug3slejvwZGY+k5mHgS8Cb624plktx6DeC7w+Ii6IiE7qJwf+tuKaKhERQX0u8kBm3lF1PVXJzN/JzPMzs4f682FPZhY5cmq3zPwBMBYRb2isGgAerbCkKh0E3hIRaxr/VwYo9MTqqqoLWGyZeSQiPgh8jfpZ3J2Z+UjFZVXlMmAz8HBEPNRY99HMvL+6klSA3wYGGwOZJ4APVFxPJTLzgYi4B3iQ+iukvk2hbyX3LeSSVLjlOPUhScuKQS1JhTOoJalwBrUkFc6glqTCGdSSVDiDWpIK9/8Bsi7Q+mRmA4QAAAAASUVORK5CYII=\n",
- "text/plain": [
- "<Figure size 432x288 with 1 Axes>"
- ]
- },
- "metadata": {
- "needs_background": "light"
- },
- "output_type": "display_data"
- }
- ],
- "source": [
- "C1 = [1, 4, 5, 9, 11]\n",
- "C2 = list(set(range(12)) - set(C1))\n",
- "X0C1, X1C1 = X0[C1], X1[C1]\n",
- "X0C2, X1C2 = X0[C2], X1[C2]\n",
- "plt.figure()\n",
- "plt.title('1st iteration results')\n",
- "plt.axis([-1, 9, -1, 9])\n",
- "plt.grid(True)\n",
- "plt.plot(X0C1, X1C1, 'rx')\n",
- "plt.plot(X0C2, X1C2, 'g.')\n",
- "plt.plot(4,6,'rx',ms=12.0)\n",
- "plt.plot(5,5,'g.',ms=12.0);"
- ]
- },
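- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "The first assignment can be checked numerically. The sketch below recomputes the distance of every sample to the two starting centroids and prints the resulting cluster for each sample; the variable names `dist` and `labels` are illustrative. It should reproduce the index list `C1 = [1, 4, 5, 9, 11]` used in the plotting cell above, and the printed means are the centroids for the next iteration."
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {},
- "outputs": [],
- "source": [
- "import numpy as np\n",
- "\n",
- "X0 = np.array([7, 5, 7, 3, 4, 1, 0, 2, 8, 6, 5, 3])\n",
- "X1 = np.array([5, 7, 7, 3, 6, 4, 0, 2, 7, 8, 5, 7])\n",
- "X = np.column_stack([X0, X1]).astype(float)\n",
- "\n",
- "# initial centroids: the 5th and 11th samples (indices 4 and 10)\n",
- "centroids = X[[4, 10]]\n",
- "\n",
- "# Euclidean distance of every sample to each centroid, shape (12, 2)\n",
- "dist = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)\n",
- "labels = dist.argmin(axis=1)\n",
- "\n",
- "for i, (p, d, c) in enumerate(zip(X, dist, labels)):\n",
- "    print(f'{i:2d}  {p}  d1={d[0]:.2f}  d2={d[1]:.2f}  -> C{c + 1}')\n",
- "\n",
- "print('C1 =', np.flatnonzero(labels == 0).tolist())\n",
- "for k in range(2):\n",
- "    # mean of the members = centroid used in the next iteration\n",
- "    print('new centroid', k + 1, X[labels == k].mean(axis=0).round(2))"
- ]
- },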
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
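- "Now we recompute the centroids of the two clusters, move them to their new positions, recompute the distance from every sample to the new centroids, and reassign each sample to the nearer one. The results are shown in the table below:\n",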
- "\n",
- "\n",
- "\n",
- "The resulting plot:"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": 3,
- "metadata": {},
- "outputs": [
- {
- "data": {
- "image/png": "iVBORw0KGgoAAAANSUhEUgAAAWoAAAEICAYAAAB25L6yAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjQuMywgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy/MnkTPAAAACXBIWXMAAAsTAAALEwEAmpwYAAAUz0lEQVR4nO3dfZBddX3H8feXDQmEKKjBpULCpg+KjI61CeLCaHcbxpFC1Zm2FMWgpk5aWiw6WitSFLXUqe1YdVQciqEFtmYyaFtEWtGwa32IlASYIgRba0ICgsQHHjbobh6+/eOe9S7hZvcu2Zvz2933a2Zn99577jnf8927nz33d+69v8hMJEnlOqzuAiRJEzOoJalwBrUkFc6glqTCGdSSVDiDWpIKZ1Br2kTEmyPiGwe4bWlEDEdE16Gua1wN50XEzXVt/2BFRE9EZETMq7sWHVoG9RwWEQsi4rMRcV9EPB4Rd0bEmZ3YVmZuz8xFmbm32vZQRLy1E9uq1v+UUMvMgcx8Vae2eah1uocqh0E9t80DdgC/CRwN/CWwPiJ66iyqHXUemU/GI15NN4N6DsvMXZl5WWZuy8x9mXkjsBVYDhARfRFxf0S8MyIejogHI+ItY/ePiOdExA0R8VhE/BfwKwfa1vgj3Ii4HHgF8MlqOOST1TInRcRXIuInEfHdiDhn3P3/MSKuiIibImIX0B8RZ0XEHdX2d0TEZeM2+Z/V90eqbfTuPzQTEadFxG0R8Wj1/bRxtw1FxIci4pvVs42bI2LxAfZtrE9/EREPAVdHxGER8Z6I+L+I+HFErI+IZ1fLHxER11XXP1Jtu7u6bVtEnDFu3ZdFxHUttvmUHkbD31e/q8ci4q6IeNGBfieaOQxq/UIVFs8H7h539XE0jraPB/4Q+FREPKu67VPAz4FfAlZXX5PKzEuArwMXVsMhF0bEUcBXgH8GngucC3w6Ik4ed9c3AJcDzwC+AewCzgeOAc4CLoiI11XLvrL6fky1jY377euzgS8BnwCeA3wU+FJEPGe/7b2lqmc+8K4Jdus44NnAicAa4G3A62g8W3ke8FMa/QJ4E42eLqm2/cfAzyZY91O06iHwqmq/n1+t/xzgx1NZr8pkUAuAiDgcGAD+KTPvHXfTbuCDmbk7M28ChoEXVEMPvwu8rzoy/w7wTwdRwtnAtsy8OjP3ZOYdwOeB3x+3zL9l5jero/+fZ+ZQZt5VXf5v4HM0grEdZwH/m5nXVtv7HHAv8Dvjlrk6M/8nM38GrAd+fYL17QPen5kj1fJ/DFySmfdn5ghwGfB71bDIbhoB/auZuTczN2fmY23WPZHdNP6JnQREZm7JzAenYb2qmUEtIuIw4FpgFLhwv5t/nJl7xl1+AlgEHEtzjHvMfQdRxonAqdVQwCMR8QhwHo0j1THjt0VEnBoRgxGxMyIepRGOLYcnWnhei3rvo/HMYcxD434e2+8D2ZmZPx93+UTgX8btyxZgL9BNo9dfBtZFxA8i4iPVP8qDkpm3AJ+kceT+cERcGRHPPNj1qn4G9RwXEQF8lkaA/G5m7m7zrjuBPTSevo9ZOoVN7/+xjTuAr2XmMeO+FmXmBRPc55+BG4AlmXk08BkgDrDs/n5AI0zHWwo80PYePFmr/Tlzv/05IjMfqJ6dfCAzTwZOo/Fs4vzqfruAhePWcxwH9pR9zMxPZOZy4GQaQyB//jT3RwUxqHUF8ELgd6qn7G2pXmb3BeCyiFhYjSW/aQrb/SHwy+Mu3wg8PyJWRcTh1dcpEfHCCdbxDOAnmfnziHgZjTHlMTtpDEf8cst7wk3V9t5QneD8AxrhduMU9mEinwEuj4gTASLi2Ih4bfVzf0S8uBo+eozGkMW+6n53AudW+78C+L0JtvGkHlb9OrU6Ot9F4/zBvgPdWTOHQT2HVSHyRzTGXh+qXj0wHBHntbmKC2kMBzwE/CNw9RQ2/3EaY7Y/jYhPZObjNE6GnUvjaPch4G+ABROs
40+AD0bE48D7aIwjA5CZT9A48fjNavjh5ePvmJk/pnEk+04aJ9zeDZydmT+awj5Mtn83ADdX9X0bOLW67TjgehohvQX4Go3hEIBLabx65qfAB2g8a5hoG7/oIfBM4B+q+95X7dffTtP+qEbhxAGSVDaPqCWpcAa1JBXOoJakwhnUklS4jnx4zOLFi7Onp6cTq27brl27OOqoo2qtoRT2osleNNmLphJ6sXnz5h9l5rGtbutIUPf09LBp06ZOrLptQ0ND9PX11VpDKexFk71oshdNJfQiIg74zl6HPiSpcAa1JBXOoJakwhnUklQ4g1qSCmdQS1LhDGpJKpxBLUmFM6glqXAGtSQVzqCWpMIZ1JJUOINakgpnUEtS4QxqSSqcQS1JhTOoJalwbQV1RLwjIu6OiO9ExOci4ohOFyapAz7yERgcfPJ1g4ON61WsSYM6Io4H/gxYkZkvArqAcztdmKQOOOUUOOecZlgPDjYun3JKvXVpQu3OmTgPODIidgMLgR90riRJHdPfD+vXwznn0HPmmfDv/9643N9fd2WaQGTm5AtFXARcDvwMuDkzz2uxzBpgDUB3d/fydevWTXOpUzM8PMyiRYtqraEU9qLJXjT0rF1Lz7XXsm3VKratXl13ObUr4XHR39+/OTNXtLwxMyf8Ap4F3AIcCxwO/Cvwxonus3z58qzb4OBg3SUUw1402YvMvOWWzMWLc+uqVZmLFzcuz3ElPC6ATXmATG3nZOIZwNbM3JmZu4EvAKcd/P8PSYfc2Jj0+vWNI+lqGOQpJxhVlHaCejvw8ohYGBEBrAS2dLYsSR1x221PHpMeG7O+7bZ669KEJj2ZmJm3RsT1wO3AHuAO4MpOFyapA9797qde19/vycTCtfWqj8x8P/D+DtciSWrBdyZKUuEMakkqnEEtSYUzqCWpcAa1JBXOoJakwhnUklQ4g1qSCmdQS1LhDGpJKpxBLUmFM6glqXAGtSQVzqBW5zjjdZO9aCqlF6XU0QaDWp3jjNdN9qKplF6UUkcb2p2FXJq6cTNec8EFcMUVc3fGa3vRVEovZtCM7B5Rq7P6+xt/jB/6UON7gX8Eh4y9aCqlF1UdPddeW/TvxKBWZw0ONo6YLr208X0uT6JqL5pK6UVVx7ZVq4r+nRjU6pxxM17zwQ/O7Rmv7UVTKb2YQTOyG9TqHGe8brIXTaX0opQ62uDJRHWOM1432YumUnpRSh1t8IhakgpnUEtS4QxqzVyt3ll2IIW+40xqh0GtmWv/d5YdSMHvOJPaYVBr5hr/DrcDhfX4l4IVeJJIaodBrZltorA2pDVLGNSa+VqFtSGtWcTXUWt2KOWDfqQO8Ihas0cpH/QjTTODWrNHKR/0I00zg1qzQykf9CN1gEGtma/VicN2XronzRAGtWa2iV7dYVhrlmgrqCPimIi4PiLujYgtEdHb6cKkSbXzEjzDWrNAu0fUHwf+IzNPAl4CbOlcSVKb9v884YmWu/jiJ3/OsJ/9oRlk0tdRR8TRwCuBNwNk5igw2tmypDa0+jzhVsY+E2T9+sbl8Ufi0gzQzhtelgE7gasj4iXAZuCizNzV0cqk6TKDZpuWWonMnHiBiBXAt4HTM/PWiPg48FhmXrrfcmuANQDd3d3L161b16GS2zM8PMyiRYtqraEU9qKhZ+1aeq69lm2rVjXmyJvjfFw0ldCL/v7+zZm5ouWNmTnhF3AcsG3c5VcAX5roPsuXL8+6DQ4O1l1CMexFZt5yS+bixbl11arMxYsbl+c4HxdNJfQC2JQHyNRJTyZm5kPAjoh4QXXVSuCeafgHIh0aM2i2aamVdl/18TZgICL+G/h14K87VpE03WbQbNNSK219el5m3gm0HjuRSjeDZpuWWvGdiZJUOINakgpnUEtS4QxqSSqcQS1JhTOoJalwBrUkFc6glqTCGdSSVDiDWpIKZ1BLUuEMakkqnEEtSYUzqCWpcAa1dAht3LGRD3/9w2zcsbHuUmpnL9rX1udR
Szp4G3dsZOU1KxndO8r8rvlsOH8DvUt66y6rFvZiajyilg6RoW1DjO4dZW/uZXTvKEPbhuouqTb2YmoMaukQ6evpY37XfLqii/ld8+nr6au7pNrYi6lx6EM6RHqX9LLh/A0MbRuir6dvTj/VtxdTY1BLh1Dvkl5DqWIv2ufQhyQVzqCWpMIZ1JJUOINakgpnUEtS4QxqSSqcQS1JhTOoJalwBrUkFc6glqTCGdSSVDiDWpIKZ1BLUzRw1wA9H+vhsA8cRs/Hehi4a6DukjTL+el50hQM3DXAmi+u4YndTwBw36P3seaLawA478Xn1VmaZjGPqKUpuGTDJb8I6TFP7H6CSzZcUlNFmgvaDuqI6IqIOyLixk4WJJVs+6Pbp3S9NB2mckR9EbClU4XMRs6yPPssPXrplK6XpkNbQR0RJwBnAVd1tpzZY2yW5UsHL2XlNSsN61ni8pWXs/DwhU+6buHhC7l85eU1VaS5IDJz8oUirgc+DDwDeFdmnt1imTXAGoDu7u7l69atm+ZSp2Z4eJhFixbVtv2B7QOs3bqWfezjMA5j9bLVnLe0npNNdfeiJNPRi6/+8KtctfUqHh55mOcueC5vXfZWzug+Y5oqPHR8XDSV0Iv+/v7Nmbmi5Y2ZOeEXcDbw6ernPuDGye6zfPnyrNvg4GCt2//W9m/lkX91ZHZ9oCuP/Ksj81vbv1VbLXX3oiT2osleNJXQC2BTHiBT23l53unAayLit4EjgGdGxHWZ+cZp+CcyaznLsqTpMmlQZ+bFwMUAEdFHY+jDkG6DsyxLmg6+jlqSCjeldyZm5hAw1JFKJEkteUQtSYUzqCWpcAa1JBXOoJakwhnUklQ4g1qSCmdQS1LhDGpJKpxBLUmFM6glqXAGtSQVzqCWpMIZ1JJUOINakgpnUKvjnI1dOjhT+jxqaarGZmMf3TvK/K75bDh/g7PeSFPkEbU6amjbEKN7R9mbexndO8rQtqG6S5JmHINaHdXX08f8rvl0RRfzu+bT19NXd0nSjOPQhzrK2dilg2dQq+OcjV06OA59SFLhDGpJKpxBLUmFM6glqXAGtSQVzqCWpMIZ1JJUOINakgpnUEtS4QxqSSqcQS1JhTOoJalwBrUkFc6glqTCTRrUEbEkIgYj4p6IuDsiLjoUhUmSGtr5POo9wDsz8/aIeAawOSK+kpn3dLg2SRJtHFFn5oOZeXv18+PAFuD4Them6bFxx0YGtg84A7g0g01pjDoieoCXArd2pBpNq7EZwNduXcvKa1Ya1tIM1fZUXBGxCPg88PbMfKzF7WuANQDd3d0MDQ1NV41Py/DwcO011G1g+wAje0bYxz5G9oywdnAtI0tH6i6rVj4umuxFU+m9iMycfKGIw4EbgS9n5kcnW37FihW5adOmaSjv6RsaGqKvr6/WGuo2dkQ9smeEBfMWsOH8DXN+7kIfF032oqmEXkTE5sxc0eq2dl71EcBngS3thLTKMTYD+Oplqw1paQZrZ+jjdGAVcFdE3Fld997MvKljVWna9C7pZWTpiCEtzWCTBnVmfgOIQ1CLJKkF35koSYUzqCWpcAa1JBXOoJakwhnUklQ4g1qSCmdQS1LhDGpJKpxBLUmFM6glqXAGtSQVzqCWpMIZ1JJUOINakgpnUEtS4QxqSSqcQS1JhTOoJalwBrUkFc6glqTCGdSSVDiDWpIKZ1BLUuEMakkqnEEtSYUzqCWpcAa1JBXOoJakwhnUklQ4g1qSCmdQS1LhDGpJKpxBLUmFM6glqXAGtSQVrq2gjohXR8R3I+J7EfGeThclSWqaNKgjogv4FHAmcDLw+og4udOFHYyNOzYysH2AjTs21l2KJB20do6oXwZ8LzO/n5mjwDrgtZ0t6+nbuGMjK69Zydqta1l5zUrDWtKMN6+NZY4Hdoy7fD9w6v4LRcQaYA1Ad3c3Q0ND01HflA1sH2Bkzwj72MfInhHWDq5lZOlILbWUYnh4uLbfR2nsRZO9aCq9F+0EdVsy
80rgSoAVK1ZkX1/fdK16ShbsWMDAjkZYL5i3gNX9q+ld0ltLLaUYGhqirt9HaexFk71oKr0X7Qx9PAAsGXf5hOq6IvUu6WXD+RtYvWw1G87fMOdDWtLM184R9W3Ar0XEMhoBfS7who5WdZB6l/QysnTEkJY0K0wa1Jm5JyIuBL4MdAFrM/PujlcmSQLaHKPOzJuAmzpciySpBd+ZKEmFM6glqXAGtSQVzqCWpMIZ1JJUOINakgpnUEtS4QxqSSqcQS1JhTOoJalwBrUkFc6glqTCGdSSVDiDWpIKZ1BLUuEMakkqXGTm9K80Yidw37SveGoWAz+quYZS2Isme9FkL5pK6MWJmXlsqxs6EtQliIhNmbmi7jpKYC+a7EWTvWgqvRcOfUhS4QxqSSrcbA7qK+suoCD2osleNNmLpqJ7MWvHqCVptpjNR9SSNCsY1JJUuFkZ1BHx6oj4bkR8LyLeU3c9dYmIJRExGBH3RMTdEXFR3TXVKSK6IuKOiLix7lrqFBHHRMT1EXFvRGyJiN66a6pLRLyj+tv4TkR8LiKOqLumVmZdUEdEF/Ap4EzgZOD1EXFyvVXVZg/wzsw8GXg58KdzuBcAFwFb6i6iAB8H/iMzTwJewhztSUQcD/wZsCIzXwR0AefWW1Vrsy6ogZcB38vM72fmKLAOeG3NNdUiMx/MzNurnx+n8Qd5fL1V1SMiTgDOAq6qu5Y6RcTRwCuBzwJk5mhmPlJrUfWaBxwZEfOAhcAPaq6npdkY1McDO8Zdvp85Gk7jRUQP8FLg1ppLqcvHgHcD+2quo27LgJ3A1dUw0FURcVTdRdUhMx8A/g7YDjwIPJqZN9dbVWuzMai1n4hYBHweeHtmPlZ3PYdaRJwNPJyZm+uupQDzgN8ArsjMlwK7gDl5HicinkXj2fYy4HnAURHxxnqram02BvUDwJJxl0+orpuTIuJwGiE9kJlfqLuempwOvCYittEYCvutiLiu3pJqcz9wf2aOPbO6nkZwz0VnAFszc2dm7ga+AJxWc00tzcagvg34tYhYFhHzaZwcuKHmmmoREUFjLHJLZn607nrqkpkXZ+YJmdlD4/FwS2YWeeTUaZn5ELAjIl5QXbUSuKfGkuq0HXh5RCys/lZWUuiJ1Xl1FzDdMnNPRFwIfJnGWdy1mXl3zWXV5XRgFXBXRNxZXffezLypvpJUgLcBA9WBzPeBt9RcTy0y89aIuB64ncYrpO6g0LeS+xZySSrcbBz6kKRZxaCWpMIZ1JJUOINakgpnUEtS4QxqSSqcQS1Jhft/uEEZ1c5o3CIAAAAASUVORK5CYII=\n",
- "text/plain": [
- "<Figure size 432x288 with 1 Axes>"
- ]
- },
- "metadata": {
- "needs_background": "light"
- },
- "output_type": "display_data"
- }
- ],
- "source": [
- "C1 = [1, 2, 4, 8, 9, 11]\n",
- "C2 = list(set(range(12)) - set(C1))\n",
- "X0C1, X1C1 = X0[C1], X1[C1]\n",
- "X0C2, X1C2 = X0[C2], X1[C2]\n",
- "plt.figure()\n",
- "plt.title('2nd iteration results')\n",
- "plt.axis([-1, 9, -1, 9])\n",
- "plt.grid(True)\n",
- "plt.plot(X0C1, X1C1, 'rx')\n",
- "plt.plot(X0C2, X1C2, 'g.')\n",
- "plt.plot(3.8,6.4,'rx',ms=12.0)\n",
- "plt.plot(4.57,4.14,'g.',ms=12.0);"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
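- "We repeat the procedure once more: move the centroids to their new positions, recompute the distance from every sample to the new centroids, and reassign each sample to the nearer one. The results are shown in the table below:\n",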
- "\n",
- "\n",
- "The resulting plot:\n"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": 4,
- "metadata": {},
- "outputs": [
- {
- "data": {
- "image/png": "iVBORw0KGgoAAAANSUhEUgAAAWoAAAEICAYAAAB25L6yAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjQuMywgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy/MnkTPAAAACXBIWXMAAAsTAAALEwEAmpwYAAAUlklEQVR4nO3dfZBddX3H8feXPEkIgp1gLJCw+ARSHLUEJFLbXeOM4gN2OlOKYhgbnbS0KlhtFCgVRcRaR8ERaaNEBbemDKKjCGIn7E5ljAgBWh4CHUpCNggFHxAWcEPIt3/cE+4l7m7usnv3/Hb3/Zq5s3vvOfec7/3m5rO/+7u79xeZiSSpXHvVXYAkaXQGtSQVzqCWpMIZ1JJUOINakgpnUEtS4QxqjVtEdEfEtlG2D0bEiyezpt3O//qIuLuu80+EiMiIeGnddageBrWIiG9GxAMR8WhE/E9EvG8ij5+ZCzLz3upcX4+IT03k8Xe3e6hl5o8z87BOnnMyTUYPVRaDWgDnA12Z+XzgBOBTEXHUcDtGxOxJrayw84+m5No0tRnUIjPvyMyhXVery0ugOa0RER+NiAeBr0XE3tWo7tcRcSdw9GjH3zXCjYhVwMnA6mo65PvV9gMj4tsR8XBEbI6ID7bc95yIuKIa9T8KvCcijomIDRHxSPVK4EsRMbfa/z+ru/5XdY6/2H1qJiJeERH91f3viIgTWrZ9PSIuiogfRMRjEXFDRLxkhMfVVT2290bEVuC66vaVEbGp6s+1EXFIdXtExBci4qHq1cttEXFkta2/9ZVMRLwnIq4f5pwj9fCjEXF/VfPdEbF8tH8TTTGZ6cULwJeBJ2iE9M3Agur2bmAH8E/APGBv4DPAj4HfAxYDtwPbRjl2Ai+tvv868KmWbXsBG4F/BOYCLwbuBd5UbT8HeAr402rfvYGjgGOB2UAXsAk4fbjztTyGbdX3c4B7gDOr870BeAw4rKW+XwLHVMfvBdaN8Li6qnNdCuxT1faO6vivqO7/D8BPqv3fVD3W/YGo9vn9als/8L6WY78HuL7NHh4GDAAHttT1krqfU14m7uKIWgBk5t8A+wKvB64Ehlo27wQ+nplDmfkkcCJwXmb+KjMHgC+O49RHAwdk5iczc3s25rK/ApzUss+GzPxuZu7MzCczc2Nm/jQzd2TmFuBfgT9p83zHAguAz1Tnuw64Cnhnyz7fycyfZeYOGkH96j0c85zMfLzqzV8D52fmpur+nwZeXY2qn6LR48OBqPZ5oM26R/M0jR+iR0TEnMzckpn/OwHHVSEMaj0jM5/OzOuBg4FTWzY9nJm/bbl+II0R3C73jeO0hwAHVtMQj0TEIzRGu4ta9mk9FxHx8oi4KiIerKZDPg0sbPN8BwIDmbmz5bb7gINarj/Y8v0TNIJ9NK31HQJc2PJYfkVj9HxQ9UPhS8BFwEMRsSYint9m3SPKzHuA02m8+ngoItZFxIHjPa7KYVBrOLOp5qgru3/E4gM0pjx2WTKGY+9+rAFgc2bu33LZNzPfMsp9LgbuAl6WjTdAz6QRhu34ObA4Ilqf+0uA+9t/CL+jtb4B4K92ezx7Z+ZPADLzi5l5FHAE8HLg76v7PQ7MbznOi9o8H9Vx/y0z/4jGD4qkMVWlacKgnuEi4oURcVJELIiIWRHxJhrTAOtHudvlwBkR8YKIOBj4wBhO+X805qF3+RnwWPVm2N5VDUdGxGhvUO4LPAoMRsThPHv0P9w5Wt1AY5S8OiLmREQ38HZg3Rgew2j+hUZv/gAgIvaLiD+vvj86Il4bEXNoBPNvaUwrAdwK/FlEzK9+tfC9o5zjWY8vIg6LiDdExLzqmE+2HFfTgEGtpBF024BfA5+j8cbc90a5zydoTBdsBn4EXDaG811CYy71kYj4bmY+DbyNxjzwZuAXwFeB/UY5xkeAd9F4E/ArwL/vtv0c4BvVOU5s3ZCZ22kE8/HVub4MnJKZd43hMYwoM79DYzS7
rpqWub06F8Dzq3p/TaN/vwT+udr2BWA7jRD+Bo258ZE8q4c05qc/Uz2eB4EXAmdMxONRGSLThQMkqWSOqCWpcAa1JBXOoJakwhnUklS4jnyIzMKFC7Orq6sTh27b448/zj777FNrDaWwF032osleNJXQi40bN/4iMw8YbltHgrqrq4ubbrqpE4duW39/P93d3bXWUAp70WQvmuxFUwm9iIgR/8LXqQ9JKpxBLUmFM6glqXAGtSQVzqCWpMIZ1JJUOINakgpnUEtS4QxqSSqcQS1JhTOoJalwBrUkFc6glqTCGdSSVDiDWpIKZ1BLUuEMakkqXFtBHREfiog7IuL2iPhWRDyv04VJ6oDPfhb6+p59W19f43YVa49BHREHAR8ElmbmkcAs4KROFyapA44+Gk48sRnWfX2N60cfXW9dGlW7aybOBvaOiKeA+cDPO1eSpI7p6YHLL4cTT6Tr+OPhmmsa13t66q5Mo4jM3PNOEacB5wFPAj/KzJOH2WcVsApg0aJFR61bt26CSx2bwcFBFixYUGsNpbAXTfaioWvtWrouu4wtK1awZeXKusupXQnPi56eno2ZuXTYjZk56gV4AXAdcAAwB/gu8O7R7nPUUUdl3fr6+uouoRj2osleZOZ112UuXJibV6zIXLiwcX2GK+F5AdyUI2RqO28mvhHYnJkPZ+ZTwJXA68b/80PSpNs1J3355Y2RdDUN8jtvMKoo7QT1VuDYiJgfEQEsBzZ1tixJHXHjjc+ek941Z33jjfXWpVHt8c3EzLwhIq4AbgZ2ALcAazpdmKQOWL36d2/r6fHNxMK19Vsfmflx4OMdrkWSNAz/MlGSCmdQS1LhDGpJKpxBLUmFM6glqXAGtSQVzqCWpMIZ1JJUOINakgpnUEtS4QxqqWTDLZ01EpfUmrYMaqlkuy+dNRKX1JrWDGqpZC1LZ40Y1i2fMe2n4E1PBrU6xxWvm8bTi9HCeiqGdCnPi1LqaINBrc5xxeum8fZiuLCeiiEN5TwvSqmjHSOt0TWei2smlqXWXlTr8+XZZxexPt+U78UE9nPK92IC6yhh/UjGuWai9Nz19MCpp8K55za+TqWR30SbiF5Ml36W8jiqOrouu6zofhrU6qy+Prj4Yjj77MbXmbyI6kT0Yrr0s5THUdWxZcWKsvs50lB7PBenPspSWy92vbzd9XJy9+s1mNK9mOB+TuleTHAdfX19tT8/cepDtXDF66bx9mK4Nw7b+dW9EpXyvCiljnaMlODjuTiiLou9aJqSvdjTSO85jgSnZC86pIRe4IhamqLa+RW8qTqyVtsMaqlku788H0nJL9s1brPrLkDSKFavbn/fnp5if71M4+OIWpIKZ1BLUuEMakkqnEEtSYUzqCWpcAa1JBXOoJakwhnUklQ4g1qSCmdQS1Lh2grqiNg/Iq6IiLsiYlNELOt0YZKkhnZH1BcCP8zMw4FXAZs6V5I0wabQatPScPYY1BGxH/DHwCUAmbk9Mx/pcF3SxJlKq01Lw2jn0/MOBR4GvhYRrwI2Aqdl5uMdrUyaKC2f19x1/PFwzTXtfXSoVIhoLCwwyg4RS4GfAsdl5g0RcSHwaGaevdt+q4BVAIsWLTpq3bp1HSq5PYODgyxYsKDWGkphLxq61q6l67LL2LJiBVtWrqy7nNr5vGgqoRc9PT0bM3PpsBtHWvpl1wV4EbCl5frrgR+Mdh+X4iqLvchnlqvavGJF7QvslsLnRVMJvWA8S3Fl5oPAQEQcVt20HLhzAn6ASJOjZTmrLStXumyVppx2f+vjA0BvRPw38Grg0x2rSJpoU2m1aWkYbS3FlZm3AsPPnUilG245K5et0hTiXyZKUuEMakkqnEEtSYUzqCWpcAa1JBXOoJakwhnUklQ4g1qSCmdQS1LhDGpJKpxBLUmFM6glqXAGtSQVzqCWJoML7DbZizEzqKXJ4AK7TfZizNr6PGpJ49SywC6nngoXXzxzF9i1F2PmiFqaLD09jWA699zG15kc
TPZiTAxqabL09TVGj2ef3fg6k9dstBdjYlBLk6FlgV0++cmZvcCuvRgzg1qaDC6w22Qvxsw3E6XJ4AK7TfZizBxRS1LhDGpJKpxBLUmFM6glqXAGtSQVzqCWpMIZ1JJUOINakgpnUEtS4QxqSSqcQS1JhTOoJalwBrUkFc6glqTCtR3UETErIm6JiKs6WZAk6dnGMqI+DdjUqUKmow0DGzj/x+ezYWBD3aVImsLaWjggIg4G3gqcB/xdRyuaJjYMbGD5pcvZ/vR25s6ay/pT1rNs8bK6y5I0BbW7wssFwGpg35F2iIhVwCqARYsW0d/fP97axmVwcLDWGnq39jK0Y4id7GRoxxBr+9YytGSollrq7kVJ7EWTvWgqvRd7DOqIeBvwUGZujIjukfbLzDXAGoClS5dmd/eIu06K/v5+6qxh3sA8egd6nxlRr+xZWduIuu5elMReNNmLptJ70c6I+jjghIh4C/A84PkR8c3MfHdnS5vali1exvpT1tO/pZ/urm6nPSQ9Z3sM6sw8AzgDoBpRf8SQbs+yxcsMaEnj5u9RS1Lh2n0zEYDM7Af6O1KJJGlYjqglqXAGtSQVzqCWpMIZ1JJUOINakgpnUEtS4QxqSSqcQS1JhTOoJalwBrUkFc6glqTCGdSSVDiDWpIKZ1BLUuEManWcq7FL4zOmz6OWxsrV2KXxc0Stjurf0s/2p7fzdD7N9qe307+lv+6SpCnHoJ6hem/rpeuCLvb6xF50XdBF7229HTlPd1c3c2fNZVbMYu6suXR3dXfkPNJ05tTHDNR7Wy+rvr+KJ556AoD7fnMfq76/CoCTX3nyhJ7L1dil8TOoZ6Cz1p/1TEjv8sRTT3DW+rMmPKjB1dil8XLqYwba+putY7pdUr0M6hloyX5LxnS7pHoZ1DPQecvPY/6c+c+6bf6c+Zy3/LyaKpI0GoN6Bjr5lSez5u1rOGS/QwiCQ/Y7hDVvX9OR+WlJ4+ebiTPUya882WCWpghH1JJUOINakgpnUEtS4QxqSSqcQS1JhTOoJalwBrUkFc6glqTCGdSSVLg9BnVELI6Ivoi4MyLuiIjTJqMwSVJDO39CvgP4cGbeHBH7Ahsj4j8y884O1yZJoo0RdWY+kJk3V98/BmwCDup0YZoYGwY20Lu11xXApSlsTHPUEdEFvAa4oSPVaELtWgF87ea1LL90uWEtTVFtf3peRCwAvg2cnpmPDrN9FbAKYNGiRfT3909Ujc/J4OBg7TXUrXdrL0M7htjJToZ2DLG2by1DS4bqLqtWPi+a7EVT6b2IzNzzThFzgKuAazPz83vaf+nSpXnTTTdNQHnPXX9/P93d3bXWULddI+qhHUPMmz2P9aesn/FrF/q8aLIXTSX0IiI2ZubS4ba181sfAVwCbGonpFWOXSuArzx0pSEtTWHtTH0cB6wAbouIW6vbzszMqztWlSbMssXLGFoyZEhLU9gegzozrwdiEmqRJA3Dv0yUpMIZ1JJUOINakgpnUEtS4QxqSSqcQS1JhTOoJalwBrUkFc6glqTCGdSSVDiDWpIKZ1BLUuEMakkqnEEtSYUzqCWpcAa1JBXOoJakwhnUklQ4g1qSCmdQS1LhDGpJKpxBLUmFM6glqXAGtSQVzqCWpMIZ1JJUOINakgpnUEtS4QxqSSqcQS1JhTOoJalwBrUkFc6glqTCGdSSVDiDWpIK11ZQR8SbI+LuiLgnIj7W6aIkSU17DOqImAVcBBwPHAG8MyKO6HRh47FhYAO9W3vZMLCh7lIkadzaGVEfA9yTmfdm5nZgHfCOzpb13G0Y2MDyS5ezdvNall+63LCWNOXNbmOfg4CBluvbgNfuvlNErAJWASxatIj+/v6JqG/Merf2MrRjiJ3sZGjHEGv71jK0ZKiWWkoxODhY279HaexFk71oKr0X7QR1WzJzDbAGYOnSpdnd3T1Rhx6TeQPz6B1ohPW82fNY2bOSZYuX1VJLKfr7+6nr36M09qLJXjSV3ot2pj7u
Bxa3XD+4uq1IyxYvY/0p61l56ErWn7J+xoe0pKmvnRH1jcDLIuJQGgF9EvCujlY1TssWL2NoyZAhLWla2GNQZ+aOiHg/cC0wC1ibmXd0vDJJEtDmHHVmXg1c3eFaJEnD8C8TJalwBrUkFc6glqTCGdSSVDiDWpIKZ1BLUuEMakkqnEEtSYUzqCWpcAa1JBXOoJakwhnUklQ4g1qSCmdQS1LhDGpJKpxBLUmFi8yc+INGPAzcN+EHHpuFwC9qrqEU9qLJXjTZi6YSenFIZh4w3IaOBHUJIuKmzFxadx0lsBdN9qLJXjSV3gunPiSpcAa1JBVuOgf1mroLKIi9aLIXTfaiqeheTNs5akmaLqbziFqSpgWDWpIKNy2DOiLeHBF3R8Q9EfGxuuupS0Qsjoi+iLgzIu6IiNPqrqlOETErIm6JiKvqrqVOEbF/RFwREXdFxKaIWFZ3TXWJiA9V/zduj4hvRcTz6q5pONMuqCNiFnARcDxwBPDOiDii3qpqswP4cGYeARwL/O0M7gXAacCmuosowIXADzPzcOBVzNCeRMRBwAeBpZl5JDALOKneqoY37YIaOAa4JzPvzcztwDrgHTXXVIvMfCAzb66+f4zGf8iD6q2qHhFxMPBW4Kt111KniNgP+GPgEoDM3J6Zj9RaVL1mA3tHxGxgPvDzmusZ1nQM6oOAgZbr25ih4dQqIrqA1wA31FxKXS4AVgM7a66jbocCDwNfq6aBvhoR+9RdVB0y837gc8BW4AHgN5n5o3qrGt50DGrtJiIWAN8GTs/MR+uuZ7JFxNuAhzJzY921FGA28IfAxZn5GuBxYEa+jxMRL6DxavtQ4EBgn4h4d71VDW86BvX9wOKW6wdXt81IETGHRkj3ZuaVdddTk+OAEyJiC42psDdExDfrLak224BtmbnrldUVNIJ7JnojsDkzH87Mp4ArgdfVXNOwpmNQ3wi8LCIOjYi5NN4c+F7NNdUiIoLGXOSmzPx83fXUJTPPyMyDM7OLxvPhuswscuTUaZn5IDAQEYdVNy0H7qyxpDptBY6NiPnV/5XlFPrG6uy6C5hombkjIt4PXEvjXdy1mXlHzWXV5ThgBXBbRNxa3XZmZl5dX0kqwAeA3mogcy/wlzXXU4vMvCEirgBupvEbUrdQ6J+S+yfkklS46Tj1IUnTikEtSYUzqCWpcAa1JBXOoJakwhnUklQ4g1qSCvf/X4HY9SMyqPMAAAAASUVORK5CYII=\n",
- "text/plain": [
- "<Figure size 432x288 with 1 Axes>"
- ]
- },
- "metadata": {
- "needs_background": "light"
- },
- "output_type": "display_data"
- }
- ],
- "source": [
- "C1 = [0, 1, 2, 4, 8, 9, 10, 11]\n",
- "C2 = list(set(range(12)) - set(C1))\n",
- "X0C1, X1C1 = X0[C1], X1[C1]\n",
- "X0C2, X1C2 = X0[C2], X1[C2]\n",
- "plt.figure()\n",
- "plt.title('3rd iteration results')\n",
- "plt.axis([-1, 9, -1, 9])\n",
- "plt.grid(True)\n",
- "plt.plot(X0C1, X1C1, 'rx')\n",
- "plt.plot(X0C2, X1C2, 'g.')\n",
- "plt.plot(5.5,7.0,'rx',ms=12.0)\n",
- "plt.plot(3.0,3.17,'g.',ms=12.0);"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
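- "Repeating the procedure once more leaves the centroids unchanged, so k-Means stops iterating once its stopping condition is met. Typically the condition is that the change in the cost function between two consecutive iterations falls below a threshold, or that the movement of the centroids between two consecutive iterations falls below a threshold. If these thresholds are small enough, k-Means converges to an optimum, but this is only a local optimum and not necessarily the global one.\n",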
- "\n"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "## 4. Program"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": 5,
- "metadata": {},
- "outputs": [
- {
- "data": {
- "image/png": "\n",
- "text/plain": [
- "<Figure size 1080x648 with 1 Axes>"
- ]
- },
- "metadata": {
- "needs_background": "light"
- },
- "output_type": "display_data"
- }
- ],
- "source": [
- "# This line configures matplotlib to show figures embedded in the notebook, \n",
- "# instead of opening a new window for each figure. More about that later. \n",
- "# If you are using an old version of IPython, try using '%pylab inline' instead.\n",
- "%matplotlib inline\n",
- "\n",
- "# import libraries\n",
- "import numpy as np\n",
- "from sklearn.datasets import make_blobs\n",
- "import matplotlib.pyplot as plt\n",
- "import random\n",
- "\n",
- "# generate the data\n",
- "centers = [(7, 0), (0, 0), (5, 5)]\n",
- "n_samples = 500\n",
- "\n",
- "X, y = make_blobs(n_samples=n_samples, n_features=2, \n",
- " cluster_std=1.0, centers=centers, \n",
- " shuffle=True, random_state=42)\n",
- "\n",
- "# plot the data\n",
- "plt.figure(figsize=(15, 9))\n",
- "\n",
- "marksamples = ['or', 'ob', 'og', 'ok', '^r', '^b', '<g'] # marker styles for the samples\n",
- "for i in range(len(X)):\n",
- " markindex = y[i]\n",
- " plt.plot(X[i, 0], X[i, 1], marksamples[markindex], markersize=10)\n",
- " \n",
- "plt.savefig(\"fig-res-k-means_data.pdf\")\n",
- "plt.show()\n"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": 9,
- "metadata": {},
- "outputs": [],
- "source": [
- "# k-means\n",
- "\n",
- "def calc_distance(v1, v2):\n",
- "    \"\"\"\n",
- "    Squared Euclidean distance between two vectors.\n",
- "    \n",
- "    v1 - first feature vector\n",
- "    v2 - second feature vector\n",
- "    \"\"\"\n",
- "    return np.sum(np.square(v1 - v2))\n",
- "\n",
- "def rand_cluster_cents(X, k):\n",
- "    \"\"\"\n",
- "    Initialise the cluster centres by picking k distinct samples at random.\n",
- "    \n",
- "    X - data samples\n",
- "    k - number of clusters\n",
- "    \"\"\"\n",
- "\n",
- "    # number of samples\n",
- "    n = np.shape(X)[0]\n",
- "    \n",
- "    # shuffled list of sample indices\n",
- "    dataIndex = list(range(n))\n",
- "    random.shuffle(dataIndex)\n",
- "    centroidsIndex = dataIndex[:k]\n",
- "    \n",
- "    # return the chosen samples as initial centres (fancy indexing returns a copy)\n",
- "    return X[centroidsIndex, :]\n",
- "\n",
- "def kmeans(X, k):\n",
- "    \"\"\"\n",
- "    k-Means algorithm.\n",
- "    \n",
- "    X - data samples\n",
- "    k - number of clusters\n",
- "    \"\"\"\n",
- "    # total number of samples\n",
- "    n = np.shape(X)[0]\n",
- "    \n",
- "    # per-sample assignment: [cluster index, squared distance] (n rows x 2 columns)\n",
- "    clusterAssment = np.zeros((n, 2))\n",
- "\n",
- "    # step 1: initialise the centres with randomly chosen samples\n",
- "    centroids = rand_cluster_cents(X, k)\n",
- "    print('initial centroids =', centroids)\n",
- "\n",
- "    iterN = 0\n",
- "    \n",
- "    while True:\n",
- "        clusterChanged = False\n",
- "        \n",
- "        # step 2: assign every sample to the cluster of its nearest centre\n",
- "        for i in range(n):\n",
- "            minDist = np.inf\n",
- "            minIndex = -1\n",
- "            for j in range(k):\n",
- "                # squared distance from sample i to centre j\n",
- "                distJI = calc_distance(centroids[j, :], X[i, :])\n",
- "                if distJI < minDist:\n",
- "                    minDist = distJI\n",
- "                    minIndex = j\n",
- "            \n",
- "            # flag a change when the assignment differs from the previous iteration\n",
- "            if clusterAssment[i, 0] != minIndex:\n",
- "                clusterChanged = True\n",
- "            # minDist is already a squared distance, so it is stored without squaring again\n",
- "            clusterAssment[i, :] = minIndex, minDist\n",
- "        \n",
- "        iterN += 1\n",
- "        sse = sum(clusterAssment[:, 1])\n",
- "        print('the SSE of %d' % iterN + 'th iteration is %f' % sse)\n",
- "        \n",
- "        # step 3: move each centre to the mean of its members\n",
- "        for cent in range(k):\n",
- "            ptsInClust = X[clusterAssment[:, 0] == cent, :]\n",
- "            if len(ptsInClust) > 0:  # keep the old centre if a cluster is empty\n",
- "                centroids[cent, :] = np.mean(ptsInClust, axis=0)\n",
- "        \n",
- "        # stop once no sample changed its cluster\n",
- "        if not clusterChanged:\n",
- "            break\n",
- "    \n",
- "    return centroids, clusterAssment\n"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": 15,
- "metadata": {},
- "outputs": [
- {
- "name": "stdout",
- "output_type": "stream",
- "text": [
- "最初的中心= [[5.44933657 0.06856297]\n",
- " [1.68714164 0.88163976]\n",
- " [1.8820245 1.34542005]]\n",
- "the SSE of 1th iteration is 120663.858114\n",
- "the SSE of 2th iteration is 7006.354257\n",
- "the SSE of 3th iteration is 3567.793025\n",
- "the SSE of 4th iteration is 3502.239035\n"
- ]
- }
- ],
- "source": [
- "# run k-means clustering\n",
- "k = 3 # number of clusters, chosen by the user\n",
- "mycentroids, clusterAssment = kmeans(X, k)"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": 16,
- "metadata": {},
- "outputs": [
- {
- "data": {
- "image/png": "\n",
- "text/plain": [
- "<Figure size 432x288 with 1 Axes>"
- ]
- },
- "metadata": {
- "needs_background": "light"
- },
- "output_type": "display_data"
- },
- {
- "data": {
- "image/png": "\n",
- "text/plain": [
- "<Figure size 432x288 with 1 Axes>"
- ]
- },
- "metadata": {
- "needs_background": "light"
- },
- "output_type": "display_data"
- }
- ],
- "source": [
- "def datashow(dataSet, k, centroids, clusterAssment, fnFig=None):\n",
- "    \"\"\"Show a 2-D clustering result: samples marked by cluster, plus the centres.\"\"\"\n",
- "    from matplotlib import pyplot as plt\n",
- "    num, dim = np.shape(dataSet)  # number of samples, number of features\n",
- "\n",
- "    if dim != 2:\n",
- "        print('sorry, the dimension of your dataset is not 2!')\n",
- "        return 1\n",
- "    marksamples = ['or', 'ob', 'og', 'ok', '^r', '^b', '<g']  # marker styles for the samples\n",
- "    if k > len(marksamples):\n",
- "        print('sorry, your k is too large, please extend marksamples!')\n",
- "        return 1\n",
- "    # plot all samples\n",
- "    for i in range(num):\n",
- "        markindex = int(clusterAssment[i, 0])  # cluster index is stored as a float, cast to int\n",
- "        # the two features are used as the x and y coordinates\n",
- "        plt.plot(dataSet[i, 0], dataSet[i, 1], marksamples[markindex], markersize=6)\n",
- "\n",
- "    # plot the cluster centres\n",
- "    markcentroids = ['o', '*', '^']  # marker styles for the centres\n",
- "    c = ['yellow', 'pink', 'red']\n",
- "    for i in range(k):\n",
- "        # cycle through the styles so that k > 3 does not raise an IndexError\n",
- "        plt.plot(centroids[i, 0], centroids[i, 1], markcentroids[i % len(markcentroids)],\n",
- "                 markersize=15, label=str(i), c=c[i % len(c)])\n",
- "    #plt.legend(loc='upper left')  # legend\n",
- "    plt.xlabel('feature 1')\n",
- "    plt.ylabel('feature 2')\n",
- "\n",
- "    plt.title('k-means cluster result')\n",
- "    if fnFig is not None:\n",
- "        plt.savefig(fnFig)\n",
- "    plt.show()\n",
- "    \n",
- "    \n",
- "# plot the ground-truth labels\n",
- "def trgartshow(dataSet, k, labels, fnFig=None):\n",
- "    from matplotlib import pyplot as plt\n",
- "\n",
- "    num, dim = np.shape(dataSet)\n",
- "    marksamples = ['ob', 'or', 'og', 'ok', '^r', '^b', '<g']\n",
- "    # draw the scatter plot sample by sample, using the marker of its true class\n",
- "    for i in range(num):\n",
- "        plt.plot(dataSet[i, 0], dataSet[i, 1], marksamples[int(labels[i])], markersize=6)\n",
- "\n",
- "    # axis labels and title\n",
- "    plt.xlabel('feature 1')\n",
- "    plt.ylabel('feature 2')\n",
- "    plt.title('true result')\n",
- "\n",
- "    # save (optional) and show the figure\n",
- "    if fnFig is not None:\n",
- "        plt.savefig(fnFig)\n",
- "    plt.show()\n",
- "    \n",
- "# show both plots\n",
- "datashow(X, k, mycentroids, clusterAssment, \"fig-res-k-means_predict.pdf\")\n",
- "trgartshow(X, 3, y, \"fig-res-k-means_groundtruth.pdf\")"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "## 5. Clustering with sklearn\n"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": 6,
- "metadata": {},
- "outputs": [
- {
- "data": {
- "text/plain": [
- "<Figure size 432x288 with 0 Axes>"
- ]
- },
- "metadata": {},
- "output_type": "display_data"
- },
- {
- "data": {
- "image/png": "iVBORw0KGgoAAAANSUhEUgAAAPoAAAECCAYAAADXWsr9AAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjQuMywgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy/MnkTPAAAACXBIWXMAAAsTAAALEwEAmpwYAAAL1UlEQVR4nO3df6hX9R3H8ddrptVS0laL0MiMIUSw/IEsitg0w1a4f5YoFCw29I8tkg3K9s/ov/6K9scIxGpBZqQljNhaSkYMtprXbJnaKDFSKgsNsz+U7L0/vsdhznXPvZ3P537v9/18wBe/997vPe/3vdfX95zz/Z5z3o4IARhs3xrrBgCUR9CBBAg6kABBBxIg6EACBB1IoC+CbnuJ7bdtv2N7TeFaj9k+ZHtXyTqn1bvc9jbbu22/ZfuewvXOs/2a7Teaeg+UrNfUnGD7ddvPl67V1Ntv+03bO21vL1xrqu1Ntvfa3mP7uoK1Zjc/06nbUdurO1l4RIzpTdIESe9KmiVpkqQ3JF1dsN6NkuZK2lXp57tM0tzm/hRJ/y7881nS5Ob+REmvSvpB4Z/x15KekvR8pd/pfkkXV6r1hKRfNPcnSZpaqe4ESR9KuqKL5fXDGn2BpHciYl9EnJD0tKSflCoWEa9IOlxq+Wep90FE7GjufyZpj6TpBetFRBxrPpzY3IodFWV7hqRbJa0rVWOs2L5QvRXDo5IUESci4tNK5RdJejci3utiYf0Q9OmS3j/t4wMqGISxZHumpDnqrWVL1plge6ekQ5K2RETJeg9LulfSlwVrnCkkvWh7yPbKgnWulPSxpMebXZN1ti8oWO90yyVt6Gph/RD0FGxPlvSspNURcbRkrYg4GRHXSpohaYHta0rUsX2bpEMRMVRi+V/jhoiYK+kWSb+0fWOhOueot5v3SETMkfS5pKKvIUmS7UmSlkra2NUy+yHoByVdftrHM5rPDQzbE9UL+fqIeK5W3WYzc5ukJYVKXC9pqe396u1yLbT9ZKFa/xURB5t/D0narN7uXwkHJB04bYtok3rBL+0WSTsi4qOuFtgPQf+npO/ZvrJ5Jlsu6U9j3FNnbFu9fbw9EfFQhXqX2J7a3D9f0mJJe0vUioj7I2JGRMxU7+/2UkTcUaLWKbYvsD3l1H1JN0sq8g5KRHwo6X3bs5tPLZK0u0StM6xQh5vtUm/TZExFxBe2fyXpr+q90vhYRLxVqp7tDZJ+KOli2wck/S4iHi1VT7213p2S3mz2myXptxHx50L1LpP0hO0J6j2RPxMRVd72quRSSZt7z586R9JTEfFCwXp3S1rfrIT2SbqrYK1TT16LJa3qdLnNS/kABlg/bLoDKIygAwkQdCABgg4kQNCBBPoq6IUPZxyzWtSj3ljX66ugS6r5y6z6h6Me9cayXr8FHUABRQ6YsT3QR+FMmzZtxN9z/PhxnXvuuaOqN336yE/mO3z4sC666KJR1Tt6dOTn3Bw7dkyTJ08eVb2DB0d+akNEqDk6bsROnjw5qu8bLyLif34xY34I7Hh00003Va334IMPVq23devWqvXWrCl+QthXHDlypGq9fsCmO5AAQQcSIOhAAgQdSICgAwkQdCABgg4kQNCBBFoFvebIJADdGzbozUUG/6DeJWivlrTC9tWlGwPQnTZr9KojkwB0r03Q04xMAgZVZye1NCfK1z5nF0ALbYLeamRSRKyVtFYa/NNUgfGmzab7QI9MAjIYdo1ee2QSgO612kdv5oSVmhUGoDCOjAMSIOhAAgQdSICgAwkQdCABgg4kQNCBBAg6kACTWkah9uSUWbNmVa03mpFT38Thw4er1lu2bFnVehs3bqxa72xYowMJEHQgAYIOJEDQgQQIOpAAQQcSIOhAAgQdSICgAwkQdCCBNiOZHrN9yPauGg0B6F6bNfofJS0p3AeAgoYNekS8IqnuWQcAOsU+OpAAs9eABDoLOrPXgP7FpjuQQJu31zZI+ruk2bYP
2P55+bYAdKnNkMUVNRoBUA6b7kACBB1IgKADCRB0IAGCDiRA0IEECDqQAEEHEhiI2Wvz5s2rWq/2LLSrrrqqar19+/ZVrbdly5aq9Wr/f2H2GoAqCDqQAEEHEiDoQAIEHUiAoAMJEHQgAYIOJEDQgQQIOpBAm4tDXm57m+3dtt+yfU+NxgB0p82x7l9I+k1E7LA9RdKQ7S0RsbtwbwA60mb22gcRsaO5/5mkPZKml24MQHdGtI9ue6akOZJeLdINgCJan6Zqe7KkZyWtjoijZ/k6s9eAPtUq6LYnqhfy9RHx3Nkew+w1oH+1edXdkh6VtCciHirfEoCutdlHv17SnZIW2t7Z3H5cuC8AHWoze+1vklyhFwCFcGQckABBBxIg6EACBB1IgKADCRB0IAGCDiRA0IEEBmL22rRp06rWGxoaqlqv9iy02mr/PjNijQ4kQNCBBAg6kABBBxIg6EACBB1IgKADCRB0IAGCDiRA0IEE2lwF9jzbr9l+o5m99kCNxgB0p82x7sclLYyIY8313f9m+y8R8Y/CvQHoSJurwIakY82HE5sbAxqAcaTVPrrtCbZ3SjokaUtEMHsNGEdaBT0iTkbEtZJmSFpg+5ozH2N7pe3ttrd33COAb2hEr7pHxKeStklacpavrY2I+RExv6PeAHSkzavul9ie2tw/X9JiSXsL9wWgQ21edb9M0hO2J6j3xPBMRDxfti0AXWrzqvu/JM2p0AuAQjgyDkiAoAMJEHQgAYIOJEDQgQQIOpAAQQcSIOhAAsxeG4WtW7dWrTfoav/9jhw5UrVeP2CNDiRA0IEECDqQAEEHEiDoQAIEHUiAoAMJEHQgAYIOJEDQgQRaB70Z4vC6bS4MCYwzI1mj3yNpT6lGAJTTdiTTDEm3SlpXth0AJbRdoz8s6V5JX5ZrBUApbSa13CbpUEQMDfM4Zq8BfarNGv16SUtt75f0tKSFtp8880HMXgP617BBj4j7I2JGRMyUtFzSSxFxR/HOAHSG99GBBEZ0KamIeFnSy0U6AVAMa3QgAYIOJEDQgQQIOpAAQQcSIOhAAgQdSICgAwkMxOy12rO05s2bV7VebbVnodX+fW7cuLFqvX7AGh1IgKADCRB0IAGCDiRA0IEECDqQAEEHEiDoQAIEHUiAoAMJtDoEtrnU82eSTkr6gks6A+PLSI51/1FEfFKsEwDFsOkOJNA26CHpRdtDtleWbAhA99puut8QEQdtf1fSFtt7I+KV0x/QPAHwJAD0oVZr9Ig42Px7SNJmSQvO8hhmrwF9qs001QtsTzl1X9LNknaVbgxAd9psul8qabPtU49/KiJeKNoVgE4NG/SI2Cfp+xV6AVAIb68BCRB0IAGCDiRA0IEECDqQAEEHEiDoQAIEHUjAEdH9Qu3uF/o1Zs2aVbOctm/fXrXeqlWrqta7/fbbq9ar/febP3+wT8eICJ/5OdboQAIEHUiAoAMJEHQgAYIOJEDQgQQIOpAAQQcSIOhAAgQdSKBV0G1Ptb3J9l7be2xfV7oxAN1pO8Dh95JeiIif2p4k6dsFewLQsWGDbvtCSTdK+pkkRcQJSSfKtgWgS2023a+U9LGkx22/bntdM8jhK2yvtL3ddt1TuwAMq03Qz5E0V9IjETFH0ueS1pz5IEYyAf2rTdAPSDoQEa82H29SL/gAxolhgx4RH0p63/bs5lOLJO0u2hWATrV91f1uSeubV9z3SbqrXEsAutYq6BGxUxL73sA4xZFxQAIEHUiAoAMJEHQgAYIOJEDQgQQIOpAAQQcSGIjZa7WtXLmyar377ruvar2hoaGq9ZYtW1a13qBj9hqQFEEHEiDoQAIEHUiAoAMJEHQgAYIOJEDQgQQIOpDAsEG3Pdv2ztNuR22vrtAbgI4Me824iHhb0rWSZHuCpIOSNpdtC0CXRrrpvkjSuxHxXolmAJQx0qAvl7ShRCMAymkd9Oaa7kslbfw/X2f2GtCn2g5wkKRbJO2IiI/O9sWIWCtprTT4
p6kC481INt1XiM12YFxqFfRmTPJiSc+VbQdACW1HMn0u6TuFewFQCEfGAQkQdCABgg4kQNCBBAg6kABBBxIg6EACBB1IgKADCZSavfaxpNGcs36xpE86bqcfalGPerXqXRERl5z5ySJBHy3b2yNi/qDVoh71xroem+5AAgQdSKDfgr52QGtRj3pjWq+v9tEBlNFva3QABRB0IAGCDiRA0IEECDqQwH8An6mM7XzL9vMAAAAASUVORK5CYII=\n",
- "text/plain": [
- "<Figure size 288x288 with 1 Axes>"
- ]
- },
- "metadata": {
- "needs_background": "light"
- },
- "output_type": "display_data"
- }
- ],
- "source": [
- "from sklearn.datasets import load_digits\n",
- "import matplotlib.pyplot as plt \n",
- "from sklearn.cluster import KMeans\n",
- "\n",
- "# load the digits data\n",
- "digits, dig_label = load_digits(return_X_y=True)\n",
- "\n",
- "# draw one digit\n",
- "plt.gray() \n",
- "plt.matshow(digits[0].reshape([8, 8])) \n",
- "plt.show() \n",
- "\n",
- "# calculate train/test data number\n",
- "N = len(digits)\n",
- "N_train = int(N*0.8)\n",
- "N_test = N - N_train\n",
- "\n",
- "# split train/test data\n",
- "x_train = digits[:N_train, :]\n",
- "y_train = dig_label[:N_train]\n",
- "x_test = digits[N_train:, :]\n",
- "y_test = dig_label[N_train:]\n",
- "\n"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": 7,
- "metadata": {},
- "outputs": [
- {
- "data": {
- "image/png": "iVBORw0KGgoAAAANSUhEUgAAAWoAAAA9CAYAAACEJCMYAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjQuMywgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy/MnkTPAAAACXBIWXMAAAsTAAALEwEAmpwYAAAP0ElEQVR4nO2da2xU1RbH/3tmGOhjrCAvKQ9BEQT1VoKoeBMlEawabfSDgK8YNRgVEz5owgc0AeMDRMVEoxBzFUwIagVzVRDQIJBiIiCPy6NgwSKtthQstpTptJ3u+6Gdzdqr7fTMmenMka5f0nTtWTPn/OfMOfucvfbaeyutNQRBEATv4su0AEEQBCE+UlELgiB4HKmoBUEQPI5U1IIgCB5HKmpBEASPIxW1IAiCx3FUUSulCpVSR5RSZUqp+T0tSnSIDtEhOi5WHW5Q3eVRK6X8AI4CmA6gAsBOALO11ofifKbLjYZCIaucn59v7EAgYPmqqqqMrbXGmTNnutSptVaJ6PD7/Vb5yiuvNHZLS4vlO3nypKWD+xPV4fNduD+OGDHCeu9ll11m7NbWVstXXV1t6aiqqoJSypQT1UG55JJLrPLo0aON3dTUZPl+//13S8f58+e72mzCx2PUqFHWewcMGGDsU6dOWb4//vjD0sGPV6I64tG3b19j03MFsPVrrXHo0CFkZ2dDKYVz586lVMeQIUOMPWjQIMv366+/Wjr475aMjv79+1vl4cOHG5tfS+Fw2NJRVlaGUCgEn8+H2tpa+Hw+KKXQ2tqK1tbWbnXQeuGKK66w3puVldWljubmZktHaWkpcnNz4fP5cPbsWeu9iR6PwYMHd1mOd90CwF9//WXp6k4HAAQ6e5ExBUCZ1vo4ACil1gAoAtBlRc2JVSYAMHnyZMv3xhtvGJufDIsXLzZ2dXU11q9fj2AwCABobGx0uvtOufTSS63y8uXLrX1RXnjhBWNHIhHU1NQktW96ci1YsMDyPfbYY8bm3/Gtt94y9smTJ7Fq1Sr069cPANDQ0JCUpltvvdUqr1y50tgVFRWW79lnnzV2fX09Dh48mNS+s7Ozjf3qq69avpkzZxr7vffes3yLFi0ydnNzM+rq6pLSEQ9aMa1evdry5ebmGvuXX37B008/jYKCAgDAli1bALRdA24Gl/HK59FHHzX2nDlzLN/dd99t7HA4jMrKSvP5aDSasA563d5xxx2Wb8mSJcamN1MA2L9/v7EPHDiARYsWmc9/+eWXANquAV5ZdgWtF95++23Ld8MNNxg7JyfH8tHrdM+ePZg3bx5uu+02AMBnn33maN8U+ls8/PDDlm/u3LnGrq+vt3zvvvuuVV6zZo2x6U0tHk4q6nwAJ0m5AsBN/E1KqTkA5vDXU8X58+etE6crelpH7ITPtI76+npPHI94T23p1BHvaTqdOqqrq83NM5M6otGoJ86PmpoaqwL1+XzWk266dFRXV1sPSF3R0zrc4qSidoTWegWAFUBiTblUIzpEh+gQHf90HRwnFXUlABpIHd7+mmNo05A2zwBgzJgxxuZN16KiImOXlpaipKTENLNofNIp9AmDNt8B4Oabbzb2iy++aPloWCFefNop06ZNM3asKRbj448/Nva4ceMs3/3332/sMWPG4NtvvzUxy+PHjyesgzYpV6xYYfnoseJhlffff9/Y+/fvx5w5c5CXlwcAOH36dMI6pk+fbuwZM2ZYvrKyMmNPnTrV8l1zzTXGrqurs5rcbqDfmYY6AGD+/At9T+PHj7d8tO9k5MiR0FqbmGVsm25DH7RpD9ihsnXr1lk+/qQaCARM30NtbS2Atua703OYhgefeeYZy0f7j/bt22f5rr32WmOPHz8e4XDYxPgjkQiAtrCe0+NBw3L33nuv5Tty5Iixv/nmG8tH+1IqKipQU1OD0tJSR/vsDHq+vfbaa5bv888/NzZvycyaNcsqf/3118Z2GvpwkvWxE8BYpdRopVQQwCwA/3W0
9RQyduxYtLS0oKWlxdUJnyp4zDBTTJw4Ec3NzWhubs7o8ZgwYQKi0Sii0WhGdfBO6kxRUFCA+vp6nDt3LqPHJBgMWr9La2ur1emZLq6//nrU1dWhvr7e6OFJA+lg2LBhaGpqQiQScRwm8xLdHjGtdYtSai6AjQD8AP6jtU6u98gFfr8feXl5cTM/0oGTuF86CAQCGDhwIKqqqjJaQQYCAeTm5uLvv//OmAbAW7/LpEmTsHXrVmitoZTKiDalFHJyckwrNZZpkW4CgQCmTp2KDRs2QGsNv9+fkRuGz+dDfn6+q5anF3B0a9Narwew3u1OaK9+LGsjxrFjx7r0dXbxx5rYNMXFKTQEw5sjtCd/1apVlo+nWCULbX7xnnu6LxpiAIDDhw9b5XA4bI6tm4ryxhtvNDZv6j/00EPG3rlzp+XjTe4pU6YY+/vvv09YB+2g5RkVNPTx1FNPWb4+ffokvK94DBw40Ng00wewQ1SVlXbkj6c2KqUwadIkAMD27dsTvpHSzrdXXnnF8p04ccLY/HfgoSH6UPPjjz8mpAGwW4+ffvqp5du+fbuxeThi6NChVjkSiZjjsW3bNtPcd/pkS3+XWOgkxuuvv27szZs3W75YuCdGspliNC2Thn4A4IsvvjB2LOMnBg2ZAHY95DRUKCMTBUEQPI5U1IIgCB5HKmpBEASPk5buVxpX+vPPPy3f1VdfbWyeUUHjT4C7OCxl2LBhxuajIOloxJtussfz0KG5gB0ndNORR+PyfFj0Sy+9ZGwe21q7dq1VTnY04uWXX25sPvrwp59+MjaPye7Zs8cq0xQyNzHqHTt2GJsfj8LCQmPzgSQ8Bpks9Hs8+OCDlo/GU+l5BKDDSFU+xD5RaIoi/f6AnVbKz49bbrnFKn/33XfG3rRpU8I66HfmA5vuuusuY/O+Ax57pjFrNxkXNG2Xx5mff/55Y9NpDwDgo48+ssp8yoFEoamAvE6gfQl8Wgg+0pmPoHSCPFELgiB4HKmoBUEQPE5aQh+02cRTmWgIgjcneFqY03k2uoLui85SB9ij/nhzkzZlAeDll182drKj4fh32rp1q7HpaEkAePzxx60yHYnlZmIkOvcBb57RtCE+4o03G2n6pRvovvi26Ci38vJyy5fq0AcNafFJe2hz9YEHHrB8PPQRbzZBJ9x5551d+uikZnxUKw/J8PS9RKGhSD5pF71GeMiBp8klO4nZzz//bOwPP/zQ8tF981Gc/PgUFxcbm6f5OYGGLBcuXGj5aGokTzPmvycNjTi9buWJWhAEweNIRS0IguBxHIU+lFLlAOoBRAG0aK0nx//Exc3hw4c9MefH0aNHMzY0mLJs2TL07ds34zq8wunTpzM2dJxSXFyMPn36ZFxHSUkJ/H5/xnVEIpGMa3BLIjHqaVrrxKdGgz1hDk99ozFeHgvlqzmUl5ebH9zNLHY0NZDHWWl8kg+Xve+++6xyXl4eFi5ciFAohCeeeMK87lQTTTOLDYmPQdPb+ATkfOL8rKwsPPLII8jOzsbSpUsd7Zty6NCFtR/4saa/GY8b87SwaDSKwsJCZGVlWQswOIVO0nPddddZPrqyCk8L5PMLK6WQlZUFpZSr1EV6fnzwwQeWj37ne+65x/Lx1DWttdHhdPoBWoEcOHDA2LTPArDjsHylmW3btlnlxsZG9O/fH36/31U8n16PfAg67TvgfTg8fhuJRJK6kdO0Oj4JP+3H4ZP5jxw50ione37QOoOnCscWRAA61mN8Rkh6LtEUynhI6EMQBMHjOH2i1gA2tU+kvbx9cm2LdK2M0F3mR7p0LFmyxKz71tlsYOnSUVxcHPdJJR06lFJYvz7+nF3pOh7dTbyTLh3dzTOcLh18AFGmdHSXZdHbzo9EcVpR/1trXamUGgxgs1KqVGtttbPSsTJCLOyhte6ywk6HjgULFmDAgAGoq6vD3LlzO41HpkPH7NmzEQqF0NDQ0KGpnk4dRUVFyMnJQTgc7jDzYDp19OvXDz6f
L+6Cu+nQkZWVBZ/Ph9bW1ozqGDp0KAKBAKLRaIeRp+nUEQt7aK27rCh70/nhBqfTnFa2/z+llFqHtgVvt8X/1AXo3ZTH7Gh+Kp+2kk+9SWOSbqYepauJb9iwwfJdddVVneoFOg43p6s/B4NBKKUQDAYdx71ojPrJJ5+0fDT2xVfkpitQA21z7PJ9JrKSyN69e43Nb3x0EVE+5JXHs3nsOFHoSiLPPfec5YtNjwnY010CHaeTLCkpMfbSpUuhlEJ2drbj+Gy8oco03tlZbJxC/W5yqunip3R4PWDn+/MVgN58802rzIf+JwpdfJYOnwbsc5H3nfDxD07X1ewKejxvv/12y0dXB5owYYLlo3FjAGbhArfQ65aveLNr1y5j87xxvpA2hbbG4w2v7zZGrZTKUUqFYjaAGQAOxP9U6qEru2RyovxwOGwuvnA4jJaWloxMhN7Q0GAugM4WC02njtjJn8mVMxobG80NtqmpCU1NTRlZSSQcDnviPE3FknGpIJPHgOKV+sMtTs7kIQDWtT81BACs1lo766pMIeFw2PH6Yj3J2bNnTa92bFmhTFQINTU1+OqrrwBcqCAzkXpUU1OD3377Le375dTW1uKTTz4B0HY8gsFghxFi6eDMmTMZX+0GcDfy7mKmsbHRMzcvNzhZius4gH8lsxN64vJmEm2u8VQ12kzJzs62/G7SjegTH2+6rFy50th0tQbATj8aN26cNXR77dq15gnb6RMlbVZPnDjR8s2cOdPY/ILnaYNdhYKcnpC0qc+Hp9PFOvn25s2bZ5WTfZKmN2Cekkib2Hz6AfregoICq4n5zjvvGHv37t0Ja+LNdXoz4qt70N8pFApZ0xN015kXgz7l0SH1NPwA2KEProOmW6YCeuPnq8fQ47NlyxbLl+onVhoS5YtS0xkDly1bZvk2btyYUl00jHX06FHLt3jxYmPT0CjQMQXvhx9+SFiTpOcJgiB4HKmoBUEQPI5U1IIgCB5H9UQPqFKqBkADAFdDzhkDHWxnlNZ6EH9RdHhaxwmH2xAdouNi0OFES6c6ALQFs3viD8AuL2xHdHhTh2xDttGbtpHsdiT0IQiC4HGkohYEQfA4PVlRd5i4KUPbER2p/XwqtyPbkG30lm0ktZ0e6UwUBEEQUoeEPgRBEDyOVNSCIAgep0cqaqVUoVLqiFKqTCk1P4ntlCul/qeU2quU2tX9J0SH6BAdouPi0gEg9XnUAPwAjgEYAyAIYB+ACS63VQ5goOgQHaJDdPRGHbG/nniingKgTGt9XGvdBGANgKIe2I/oEB2iQ3Rc7DoA9EzoIx/ASVKuaH/NDbG1Gne3r2UmOkSH6BAdvUkHAOdrJmaKbtdqFB2iQ3SIjotdR088UVcCGEHKw9tfSxhN1moEEFurUXSIDtEhOnqLDrORlP6h7Sn9OIDRuBCEn+hiOzkAQsTeAaBQdIgO0SE6eouO2F/KQx9a6xal1FwAG9HWc/ofrfVBF5tKaq1G0SE6RIfo+KfriCFDyAVBEDyOjEwUBEHwOFJRC4IgeBypqAVBEDyOVNSCIAgeRypqQRAEjyMVtSAIgseRiloQBMHj/B/yv5/mcRNijQAAAABJRU5ErkJggg==\n",
- "text/plain": [
- "<Figure size 432x288 with 10 Axes>"
- ]
- },
- "metadata": {
- "needs_background": "light"
- },
- "output_type": "display_data"
- }
- ],
- "source": [
- "# do kmeans\n",
- "kmeans = KMeans(n_clusters=10, random_state=0).fit(x_train)\n",
- "\n",
- "# kmeans.labels_ - output label\n",
- "# kmeans.cluster_centers_ - cluster centers\n",
- "\n",
- "# draw cluster centers\n",
- "fig, axes = plt.subplots(nrows=1, ncols=10)\n",
- "for i in range(10):\n",
- " img = kmeans.cluster_centers_[i].reshape(8, 8)\n",
- " axes[i].imshow(img)"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "## 6. Further Questions\n",
- "\n",
- "1. How can the accuracy of a clustering be measured?\n",
- "2. How can the cluster labels be matched to the true classes?\n"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
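- "## 7. Evaluating Clustering Performance\n",
- "\n",
- "### 7.1 Method 1 - ARI\n",
- "\n",
- "If the data being evaluated come with ground-truth class labels, the Adjusted Rand Index (ARI) can be used to evaluate the clustering. ARI plays a role similar to accuracy in classification, while also coping with the fact that cluster labels cannot be matched one-to-one with class labels.\n",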
- "\n"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": 8,
- "metadata": {
- "scrolled": true
- },
- "outputs": [
- {
- "name": "stdout",
- "output_type": "stream",
- "text": [
- "ari_train = 0.687021\n"
- ]
- }
- ],
- "source": [
- "from sklearn.metrics import adjusted_rand_score\n",
- "\n",
- "ari_train = adjusted_rand_score(y_train, kmeans.labels_)\n",
- "print(\"ari_train = %f\" % ari_train)"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
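- "Definition of the contingency table: entry $n_{ij}$ counts the samples that belong to both true class $X_i$ and cluster $Y_j$; $a_i = \\sum_j n_{ij}$ is the $i$-th row sum and $b_j = \\sum_i n_{ij}$ is the $j$-th column sum.\n",
- "\n",
- "\n",
- "Here $X$ denotes the true classes and $Y$ the clusters.\n"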
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
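- "#### 7.1.1 RI\n",
- "To make ARI easier to understand, we first look at RI, the Rand index, on which ARI is built.\n",
- "\n",
- "Given two labelings of the same samples (the true classes and the clustering result), RI is defined as:\n",
- "\n",
- "$$\n",
- "R = \\frac{a + b}{a+b+c+d}\n",
- "$$\n",
- "\n",
- "where a, b, c and d mean:\n",
- "* a : the number of pairs that belong to the same class and were clustered together;\n",
- "* b : the number of pairs that belong to different classes and were placed in different clusters;\n",
- "* c and d : the pairs that belong together but were split, and the pairs that do not belong together but were merged. Both count as errors.\n",
- "\n",
- "The expression shows that a and b are the correct decisions, so R always lies between 0 and 1, and the more accurate the clustering, the closer R is to 1.\n",
- "\n",
- "A key question is what \"number\" means here and how to compute it. Strictly speaking, it is a number of pairs. Any two samples form a pair, so $n$ samples give $n(n-1)/2$ pairs, which is the number of ways to choose two out of $n$. The denominator is the total over all pairs, so the index can be written as:\n",
- "\n",
- "$$\n",
- "R = \\frac{a + b}{a+b+c+d} = \\frac{a + b}{\\binom{n}{2}}\n",
- "$$"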
- ]
- },
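- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "As a small worked example of the pair counting (the five samples and their labels are hypothetical, chosen only for illustration): suppose the true classes are $(A, A, A, B, B)$ and a clustering assigns the clusters $(1, 1, 2, 2, 2)$. There are $\\binom{5}{2} = 10$ pairs in total:\n",
- "\n",
- "* $a = 2$: pairs $(1,2)$ and $(4,5)$ share a class and a cluster;\n",
- "* $b = 4$: pairs $(1,4)$, $(1,5)$, $(2,4)$, $(2,5)$ differ in both class and cluster;\n",
- "* $c = 2$: pairs $(1,3)$ and $(2,3)$ share a class but were split;\n",
- "* $d = 2$: pairs $(3,4)$ and $(3,5)$ differ in class but were merged.\n",
- "\n",
- "$$\n",
- "R = \\frac{a + b}{\\binom{n}{2}} = \\frac{2 + 4}{10} = 0.6\n",
- "$$"
- ]
- },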
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
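- "#### 7.1.2 ARI\n",
- "\n",
- "With this intuition for RI in place, we can turn to ARI.\n",
- "\n",
- "RI has one weakness: it does not penalise enough. Scores tend to be high across the board, so it discriminates poorly between good and bad clusterings. ARI was introduced to fix this; it subtracts the score expected by chance and is therefore much more discriminative than RI.\n",
- "\n",
- "$$\n",
- "ARI = \\frac{Index - ExpectedIndex}{MaxIndex - ExpectedIndex}\n",
- "$$\n",
- "\n",
- "Written out in full:\n",
- "$$\n",
- "ARI = \\frac{ \\sum_{ij} \\binom{n_{ij}}{2} - \\left[ \\sum_i \\binom{a_i}{2} \\sum_j \\binom{b_j}{2} \\right] / \\binom{n}{2} }{ \\frac{1}{2} \\left[ \\sum_i \\binom{a_i}{2} + \\sum_j \\binom{b_j}{2} \\right] - \\left[ \\sum_i \\binom{a_i}{2} \\sum_j \\binom{b_j}{2} \\right] / \\binom{n}{2} }\n",
- "$$\n",
- "\n",
- "ARI takes values in [-1,1]; larger is better. It reflects how much the two partitions overlap, and it requires the data to come with class labels.\n",
- "\n",
- "* $ \\sum_{ij} \\binom{n_{ij}}{2}$ : $n_{ij}$ is the number of samples that belong to class $i$ and were assigned to cluster $j$. This sum is exactly the $a$ of RI: the number of pairs that belong together and were clustered together.\n",
- "\n",
- "* $\\frac{1}{2} \\left[ \\sum_i \\binom{a_i}{2} + \\sum_j \\binom{b_j}{2} \\right]$ : the value the index reaches when the clustering is perfect, because then the pair counts from the row sums $a_i$ and from the column sums $b_j$ both equal the index. This is why it is called MaxIndex in the expression above. (Here $a_i$ and $b_j$ are the row and column sums of the contingency table, not the $a$ and $b$ of RI.)\n",
- "\n",
- "* $\\left[ \\sum_i \\binom{a_i}{2} \\sum_j \\binom{b_j}{2} \\right] / \\binom{n}{2}$ is the expected value of $a$:\n",
- "$$\n",
- "E\\left[\\sum_{ij} \\binom{n_{ij}}{2}\\right] = \\left[ \\sum_i \\binom{a_i}{2} \\sum_j \\binom{b_j}{2} \\right] / \\binom{n}{2}\n",
- "$$\n",
- "\n",
- "Think of the contingency table, with $n(n-1)/2$ possible pairs in total. Count the pairs available along the rows and the pairs available along the columns, multiply the two counts, and divide by the total number of pairs: this is the expected value of $a$."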
- ]
- },
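- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "The formula can be checked by hand on a toy case (hypothetical labels, chosen only for illustration): true classes $(A, A, A, B, B)$ and clusters $(1, 1, 2, 2, 2)$, so $n = 5$. The contingency counts are $n_{A1} = 2$, $n_{A2} = 1$, $n_{B1} = 0$, $n_{B2} = 2$, which gives the row sums $(3, 2)$ and the column sums $(2, 3)$. Then:\n",
- "\n",
- "$$\n",
- "\\sum_{ij} \\binom{n_{ij}}{2} = 1 + 0 + 0 + 1 = 2, \\quad \\sum_i \\binom{a_i}{2} = 3 + 1 = 4, \\quad \\sum_j \\binom{b_j}{2} = 1 + 3 = 4\n",
- "$$\n",
- "\n",
- "$$\n",
- "ARI = \\frac{2 - 4 \\cdot 4 / 10}{\\frac{1}{2}(4 + 4) - 4 \\cdot 4 / 10} = \\frac{0.4}{2.4} \\approx 0.167\n",
- "$$\n",
- "\n",
- "The plain Rand index of the same labeling is $(2 + 4)/10 = 0.6$, so the chance correction pulls this mediocre clustering much closer to 0."
- ]
- },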
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "\n",
- "\n",
- "* [ARI聚类效果评价指标](https://blog.csdn.net/qtlyx/article/details/52678895)\n",
- "* [ARI reference](https://davetang.org/muse/2017/09/21/adjusted-rand-index/)\n",
- "* [聚类性能评估-ARI(调兰德指数)](https://zhuanlan.zhihu.com/p/145856959)\n"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "\n",
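- "### 7.2 Method 2 - Silhouette Coefficient\n",
- "If the data being evaluated have no class labels, the Silhouette Coefficient can be used to measure the quality of the clustering. **The silhouette coefficient accounts for both the cohesion and the separation of the clusters; it takes values in [-1,1], and a larger value indicates a better clustering.** \n",
- "\n",
- "The silhouette coefficient is computed as follows: \n",
- "1. For the $i$-th clustered sample $x_i$, compute the mean distance between $x_i$ and all other samples in its own cluster, denoted $a_i$; this quantifies the cohesion within the cluster. \n",
- "2. Pick a cluster $b$ that does not contain $x_i$ and compute the mean distance between $x_i$ and all samples in $b$; do this for every other cluster and keep the smallest of these mean distances, denoted $b_i$; this quantifies the separation between clusters. \n",
- "3. The silhouette coefficient of sample $x_i$ is $sc_i = \\frac{b_i - a_i}{\\max(b_i, a_i)}$ \n",
- "4. Finally, average $sc_i$ over all samples in $\\mathbf{X}$ to obtain the overall silhouette coefficient of the clustering."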
- ]
- },
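- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "The four steps above can be traced on a tiny example (four hypothetical 1-D points, not taken from the data used in this notebook). Let cluster 1 be $\\{0, 2\\}$ and cluster 2 be $\\{10, 12\\}$, with the absolute difference as the distance:\n",
- "\n",
- "* for $x = 0$: $a = |0 - 2| = 2$ and $b = (10 + 12)/2 = 11$, so $sc = (11 - 2)/11 = 9/11$;\n",
- "* for $x = 2$: $a = 2$ and $b = (8 + 10)/2 = 9$, so $sc = (9 - 2)/9 = 7/9$;\n",
- "* by symmetry, $x = 10$ gives $7/9$ and $x = 12$ gives $9/11$.\n",
- "\n",
- "$$\n",
- "\\overline{sc} = \\frac{1}{4} \\left( 2 \\cdot \\frac{9}{11} + 2 \\cdot \\frac{7}{9} \\right) = \\frac{79}{99} \\approx 0.798\n",
- "$$\n",
- "\n",
- "The value is close to 1 because each cluster is tight compared with the gap between the two clusters."
- ]
- },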
- {
- "cell_type": "code",
- "execution_count": 21,
- "metadata": {},
- "outputs": [
- {
- "data": {
- "image/png": "\n",
- "text/plain": [
- "<Figure size 720x720 with 6 Axes>"
- ]
- },
- "metadata": {
- "needs_background": "light"
- },
- "output_type": "display_data"
- },
- {
- "data": {
- "image/png": "\n",
- "text/plain": [
- "<Figure size 720x720 with 1 Axes>"
- ]
- },
- "metadata": {
- "needs_background": "light"
- },
- "output_type": "display_data"
- }
- ],
- "source": [
- "import numpy as np\n",
- "from sklearn.cluster import KMeans\n",
- "from sklearn.metrics import silhouette_score\n",
- "import matplotlib.pyplot as plt\n",
- "\n",
- "plt.rcParams['figure.figsize']=(10,10)\n",
- "plt.subplot(3,2,1)\n",
- "\n",
- "x1=np.array([1,2,3,1,5,6,5,5,6,7,8,9,7,9]) # initialise the raw data\n",
- "x2=np.array([1,3,2,2,8,6,7,6,7,1,2,1,1,3])\n",
- "X=np.array(list(zip(x1,x2))).reshape(len(x1),2)\n",
- "\n",
- "plt.xlim([0,10])\n",
- "plt.ylim([0,10])\n",
- "plt.title('Instances')\n",
- "plt.scatter(x1,x2)\n",
- "\n",
- "colors=['b','g','r','c','m','y','k','b']\n",
- "markers=['o','s','D','v','^','p','*','+']\n",
- "\n",
- "clusters=[2,3,4,5,8]\n",
- "subplot_counter=1\n",
- "sc_scores=[]\n",
- "for t in clusters:\n",
- "    subplot_counter += 1\n",
- "    plt.subplot(3, 2, subplot_counter)\n",
- "    kmeans_model = KMeans(n_clusters=t).fit(X)  # fit k-means with t clusters\n",
- "\n",
- "    for i, l in enumerate(kmeans_model.labels_):\n",
- "        plt.plot(x1[i], x2[i], color=colors[l], marker=markers[l], ls='None')\n",
- "\n",
- "    plt.xlim([0, 10])\n",
- "    plt.ylim([0, 10])\n",
- "\n",
- "    sc_score = silhouette_score(X, kmeans_model.labels_, metric='euclidean')  # silhouette coefficient\n",
- "    sc_scores.append(sc_score)\n",
- "\n",
- "    plt.title('k=%s,silhouette coefficient=%0.03f' % (t, sc_score))\n",
- "\n",
- "plt.figure()\n",
- "plt.plot(clusters, sc_scores, '*-')  # number of clusters vs. silhouette coefficient\n",
- "plt.xlabel('Number of Clusters')\n",
- "plt.ylabel('Silhouette Coefficient Score')\n",
- "plt.savefig('fig-res-k-means_silhouette_coef.pdf')\n",
- "plt.show() "
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
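- "## 8. How to Choose K\n",
- "\n",
- "The \"elbow method\" gives a rough estimate of a reasonable number of clusters. K-means ultimately aims for *the sum of squared distances from every data point to its cluster centre to level off, so the best number of clusters can be found by watching how this value changes with K. Ideally, as the curve keeps falling and flattens out, its slope shows a turning point (the elbow); from the K at this point onward, adding more cluster centres no longer changes the clustering structure of the data much*.\n",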
- "\n"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": 22,
- "metadata": {},
- "outputs": [
- {
- "data": {
- "image/png": "iVBORw0KGgoAAAANSUhEUgAAAXgAAAEGCAYAAABvtY4XAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjQuMywgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy/MnkTPAAAACXBIWXMAAAsTAAALEwEAmpwYAAASRElEQVR4nO3dbYxcZ3nG8euKvSiD87JSvULxOq4jKjlq45INQ0plFJEgcCgRtUxbURUqaCRLLaKJqExjpL6kauVUliL6oaK1EiAFBxoljhtCFYPqSJQCSXazCY7jWFXTUDwO8kawwiHbZmPufthZe72eGc/MmWfO7DP/n2R5PTOe554v15x9zn3u44gQACA/F5VdAAAgDQIeADJFwANApgh4AMgUAQ8AmVpddgFLrV27NjZu3Fh2GQCwYkxNTb0SEWONnhuogN+4caMmJyfLLgMAVgzbP2j2HFs0AJApAh4AMkXAA0CmCHgAyBQBDwCZGqguGgBYiQ5M17Tn4DGdmJ3TutGKdm7dpG0T4xd8LjUCHgAKODBd0679hzU3f1qSVJud0679h8883+y5foQ8AQ8ABew5eOxMgC+amz+tPQePnfm50XMEPAAMuBOzcx09fqHneomTrABQwLrRStPHWz3XDwQ8ABSwc+smVUZWnfNYZWSVdm7d1PK5fmCLBsDQSNHRsvj/W71vWV00HqR7slar1WDYGIAUlne7SAtH07u3b+5b4KZgeyoiqo2eY4sGwFC4ULdLjtiiATAUuul26bV+X/REwAMYCutGK6o1CPNOOlqKBHSrC6JShTxbNACGQtGOlsWArs3OKXQ2oA9M19r6/2VsERHwAIbCtolx7d6+WeOjFVnS+GiloxOsRQO6jC0itmgADI1tE+Ndb4cUDehebBF1iiN4AGhD0atSy7joiYAHgDYUDeiiW0TdYIsGANrQzhWr7bxHPy+qShrwtkcl3SPpGkkh6Q8i4rsp1wSAVPod0EWlPoL/O0mPRcRv2X6TpDcnXg8AUJcs4G1fLukGSR+TpIh4XdLrqdYDAJwr5UnWqyTNSPqC7Wnb99hes/xFtnfYnrQ9OTMzk7AcABguKQN+taTrJH0uIiYk/UzSHctfFBF7I6IaEdWxsbGE5QDAcEm5B39c0vGIeKL+7wfVIOABoCz9Hv7Vb8mO4CPiR5J+aHuxSfQ9kp5PtR4AdKLobJmVIHUXzScl7at30Lwo6eOJ1wOAtjSbLfOXjxzJ5qg+acBHxDOSGt5pBADK1GyGzOzcvGbn5iX1Z6RvSowqADCU2p0hs5Lv+kTAAxhKjWbLNNPPuz71ErNoAAylRrNlXnv9Df3ktfnzXptypG9KBDyAobV8tszy2+pJ6Uf6pkTAA0BdLyZGDhICHgCWKDoxspOLp1JfaEXAA0CXlgf0jVeP6aGp2pktnlZtlsu3g1K0ZNJFAwBdaHQl7L7v/U/bN+YuehPvdhDwANCFRgEdTV7bqM2y6E2820HAA0AXOgniRm2WRW/i3Q4CHgC60CyIvezfzdosi97Eux0EPAB0oVlA/947N2h8tCJLGh+taPf2zQ1Pmm6bGNfu7Zvbem236KIBgC70omc+9U28CXgA6FLqgC6KLRoAyBQBDwCZIuABIFMEPABkioAHgEwR8ACQKQIeADJFwANApgh4AMgUAQ8AmUo6qsD2S5JOSTot6Y2IqKZcDwBwVj9m0dwYEa/0YR0AwBJs0QBAplIHfEj6hu0p2zsavcD2DtuTtidnZmYSlwMAwyN1wL8rIq6T9H5Jn7B9w/IXRMTeiKhGRHVsbCxxOQAwPJIGfETU6n+flPSwpOtTrgcAOCtZwNteY/vSxZ8lvU/Sc6nWAwCcK2UXzVskPWx7cZ37I+KxhOsBAJZIFvAR8aKkt6V6fwBAa7RJAkCmCHgAyBQBDwCZIuABIFMEPABkioAHgEwR8ACQKQIeADJFwANApgh4AMgUAQ8AmSLgASBTBDwAZIqAB4BM
EfAAkCkCHgAyRcADQKYIeADIFAEPAJki4AEgUwQ8AGSKgAeATBHwAJApAh4AMpU84G2vsj1t+9HUawEAzlrdhzVuk3RU0mV9WAto6MB0TXsOHtOJ2TmtG61o59ZN2jYxXnZZQFJJA972ekkfkPQ3kj6Vci2gmQPTNe3af1hz86clSbXZOe3af1iSSg15vnSQWuotms9K+rSknzd7ge0dtidtT87MzCQuB8Noz8FjZ8J90dz8ae05eKykis5+6dRm5xQ6+6VzYLpWWk3IT7KAt32LpJMRMdXqdRGxNyKqEVEdGxtLVQ6G2InZuY4e74dB/NJBflIewW+R9EHbL0n6qqSbbH854XpAQ+tGKx093g+D+KWD/CQL+IjYFRHrI2KjpA9LOhQRH0m1HtDMzq2bVBlZdc5jlZFV2rl1U0kVDeaXDvJDHzyyt21iXLu3b9b4aEWWND5a0e7tm0s9oTmIXzrIjyOi7BrOqFarMTk5WXYZQF/QRYNesD0VEdVGz/WjDx5AA9smxgl0JEXAAwlxlI4yEfDABXQb0oN6gRWGBydZgRaKXJBErzvKxhE80EKrkL7QUXirXne2btAPHMEDLRS5IKlZT/vllRHGFKAvCHighSIXJDXrdbfF1g36omXA277M9lsbPP6r6UoCBkeRC5KaXWA1+9p8w9czpgC91nQP3vbvaGEa5EnbI5I+FhFP1Z/+oqTrklcHlGxxX7zb/fJGve57Dh5TrUGYM6YAvdbqJOtnJL09Il62fb2kL9neFREPS3J/ygPK1+sLknZu3XRO+6TEmAKk0SrgV0XEy5IUEU/avlHSo7avlDQ48w2AFabobwVAu1oF/Cnbb42I/5Kk+pH8uyUdkPQr6UsD8sWYAvRDq4D/Q0kX2f7liHhekiLilO2btTD+F8gO/enISdMumoh4NiL+U9IDtv/UCyqS7pb0R32rEOgTbqOH3LTTB/9rkq6U9B1JT0k6oYW7NQFZYbQActPOqIJ5SXOSKpIulvTfEdH0JtrAoOh0uyXFbfTY8kGZ2gn4pyT9i6R3SFor6R9sfygifjtpZUAXFgO1Njsn62y7VzuTHNeNVnran840SZStnS2aWyPizyNiPiJejojflPRI6sKATi3dQ5fO7+W90HZLr2+jx5YPynbBI/iIOO8eehHxpTTlAN1rFKjLtdpu6XV/eootH6ATjAtGNopMeFzUy/70Xm/5AJ1imiSycaHg7Pc4gGZbPjdePaYtdx3SVXd8XVvuOkQbJpIh4JGNRoG6ODRpcZJjP09uNpom+aG3j+uhqRq99ugLtmiQjUGc8bJ8y2fLXYe6vkMU0CkCHlkZ9BkvnHhFPyXborF9se0nbT9r+4jtO1OtBawURe4QBXQq5R78/0m6KSLeJulaSTfbfmfC9YCB1+tee6CVZFs0ERGSXq3/c6T+hznyGGqDeJ4A+Uq6B297laQpSb8k6e8j4okGr9khaYckbdiwIWU5wEAY9PMEyEfSNsmIOB0R10paL+l629c0eM3eiKhGRHVsbCxlOQAwVPrSBx8Rs5Iel3RzP9YDAKTtohmzPVr/uSLpvZJeSLUeAOBcKffgr5B0X30f/iJJD0TEownXAwAskbKL5vuSJlK9PwCgNWbRAECmCHgAyBQBDwCZIuABIFMEPABkioAHgEwR8ACQKQIeADJFwANApgh4AMgUAQ8AmSLgASBTBDwAZIqAB4BMEfAAkCkCHgAyRcADQKYIeADIFAEPAJki4AEgUwQ8AGSKgAeATBHwAJApAh4AMpUs4G1faftx28/bPmL7tlRrAQDOtzrhe78h6U8i4mnbl0qasv3NiHg+4ZoAgLpkR/AR8XJEPF3/+ZSko5LGU60HADhXX/bgbW+UNCHpiQbP7bA9aXtyZmamH+UAwFBIHvC2L5H0kKTbI+Kny5+PiL0RUY2I6tjYWOpyAGBoJA142yNaCPd9EbE/5VoAgHOl7KKxpHslHY2Iu1OtAwBoLOUR/BZJH5V0k+1n
6n9+I+F6AIAlkrVJRsS3JTnV+wMAWuNKVgDIFAEPAJki4AEgUwQ8AGSKgAeATBHwAJApAh4AMkXAA0CmUs6DL8WB6Zr2HDymE7NzWjda0c6tm7RtginFAIaPI6LsGs6oVqsxOTnZ9f8/MF3Trv2HNTd/+sxjlhSSxgl7ABmyPRUR1UbPZbVFs+fgsXPCXVoId0mqzc5p1/7DOjBd639hAFCCrAL+xOxcy+fn5k9rz8FjfaoGAMqVVcCvG61c8DUX+hIAgFxkFfA7t25SZWRVy9e08yUAADnIqotm8QTqnoPHVJudO3OCdVFlZJV2bt1USm0A0G9ZBby0EPKLQd9NyyRtlgBykV3AL7U07NuxvM1ysfNm8b0AYCXJOuA71ajNcrHzptMvCn4LAFA2An6JZh02nXTe8FsAgEGRVRdNUc06bDrpvGn1WwAA9BMBv0SjNstOO2+aHe3XZue4ihZAXxHwS2ybGNfu7Zs1PlqRtTC/Zvf2zR1trbQ62mdUAoB+ymrY2CBoNPBsqfHRiv7jjpv6XBWAXLUaNsZJ1h5bPNq//Z+fafg8oxIA9EuyLRrbn7d90vZzqdaQFo6Yt9x1SFfd8XVtuevQQGyBbJsY13gPTtgCQBEp9+C/KOnmhO+vA9M17XzwWdVm5xRaOJG588FnByLke3HCFgCKSBbwEfEtST9O9f6SdOfXjmj+9LnnEOZPh+782pGUy7alFydsAaCI0vfgbe+QtEOSNmzY0NH//clr8x093gudXKXa6agEAOil0tskI2JvRFQjojo2NlZ2OS0tdsgs3RKi9RHAoCo94Itwh48XxVWqAFaSFR3wzTr4U3X292JWDQD0S8o2ya9I+q6kTbaP276112s0a0Vs9nhRvZhVAwD9krKL5ncj4oqIGImI9RFxb6/X6HcrIq2PAFaS0rtoilh6i75+zF7v93oAUASzaABgBWs1i2ZFn2QFADRHwANApgh4AMgUAQ8AmSLgASBTK7pNclB1MpAMAFIh4Hts+S37FgeSSSLkAfQVWzQ9xkAyAIOCgO8xBpIBGBQEfI8xkAzAoCDge4yBZAAGBSdZe4yBZAAGBQGfAPdiBTAI2KIBgEwR8ACQKQIeADJFwANApgh4AMjUQN2yz/aMpB+UXUcLayW9UnYRPcTnGVw5fRaJz5PSL0bEWKMnBirgB53tyWb3PlyJ+DyDK6fPIvF5ysIWDQBkioAHgEwR8J3ZW3YBPcbnGVw5fRaJz1MK9uABIFMcwQNApgh4AMgUAd8G25+3fdL2c2XXUpTtK20/bvt520ds31Z2TUXYvtj2k7afrX+eO8uuqRdsr7I9bfvRsmspyvZLtg/bfsb2ZNn1FGF71PaDtl+wfdT2r5ddUyvswbfB9g2SXpX0TxFxTdn1FGH7CklXRMTTti+VNCVpW0Q8X3JpXbFtSWsi4lXbI5K+Lem2iPheyaUVYvtTkqqSLouIW8qupwjbL0mqRsSgXBjUNdv3Sfr3iLjH9pskvTkiZksuqymO4NsQEd+S9OOy6+iFiHg5Ip6u/3xK0lFJK3Z4fSx4tf7PkfqfFX3UYnu9pA9IuqfsWnCW7csl3SDpXkmKiNcHOdwlAn6o2d4oaULSEyWXUkh9O+MZSSclfTMiVvTnkfRZSZ+W9POS6+iVkPQN21O2d5RdTAFXSZqR9IX69tk9tteUXVQrBPyQsn2JpIck3R4RPy27niIi4nREXCtpvaTrba/YbTTbt0g6GRFTZdfSQ++KiOskvV/SJ+pbnivRaknXSfpcRExI+pmkO8otqTUCfgjV96ofkrQvIvaXXU+v1H9dflzSzSWXUsQWSR+s71t/VdJNtr9cbknFRESt/vdJSQ9Lur7cirp2XNLxJb8hPqiFwB9YBPyQqZ+UvFfS0Yi4u+x6irI9Znu0/nNF0nslvVBqUQVExK6IWB8RGyV9WNKhiPhIyWV1zfaa+sl81bcz3idpRXajRcSPJP3Q9qb6Q++RNNDNCdx0uw22vyLp
3ZLW2j4u6S8i4t5yq+raFkkflXS4vm8tSZ+JiH8tr6RCrpB0n+1VWjhgeSAiVnxrYUbeIunhheMKrZZ0f0Q8Vm5JhXxS0r56B82Lkj5ecj0t0SYJAJliiwYAMkXAA0CmCHgAyBQBDwCZIuABIFMEPNAG24/Zns1huiOGBwEPtGePFq4fAFYMAh5YwvY7bH+/Pmd+TX3G/DUR8W+STpVdH9AJrmQFloiIp2w/IumvJVUkfTkiVuSl9QABD5zvryQ9Jel/Jf1xybUAXWOLBjjfL0i6RNKlki4uuRagawQ8cL5/lPRnkvZJ+tuSawG6xhYNsITt35c0HxH31ydUfsf2TZLulHS1pEvqE0VvjYiDZdYKXAjTJAEgU2zRAECmCHgAyBQBDwCZIuABIFMEPABkioAHgEwR8ACQqf8HYfzSbRnXCzcAAAAASUVORK5CYII=\n",
- "text/plain": [
- "<Figure size 432x288 with 1 Axes>"
- ]
- },
- "metadata": {
- "needs_background": "light"
- },
- "output_type": "display_data"
- }
- ],
- "source": [
- "%matplotlib inline\n",
- "import numpy as np\n",
- "from sklearn.cluster import KMeans\n",
- "from scipy.spatial.distance import cdist\n",
- "import matplotlib.pyplot as plt\n",
- "\n",
- "cluster1=np.random.uniform(0.5,1.5,(2,10))\n",
- "cluster2=np.random.uniform(5.5,6.5,(2,10))\n",
- "cluster3=np.random.uniform(3,4,(2,10))\n",
- "\n",
- "X=np.hstack((cluster1,cluster2,cluster3)).T\n",
- "plt.scatter(X[:,0],X[:,1])\n",
- "plt.xlabel('x1')\n",
- "plt.ylabel('x2')\n",
- "plt.show()"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": 23,
- "metadata": {},
- "outputs": [
- {
- "data": {
- "image/png": "\n",
- "text/plain": [
- "<Figure size 432x288 with 1 Axes>"
- ]
- },
- "metadata": {
- "needs_background": "light"
- },
- "output_type": "display_data"
- }
- ],
- "source": [
- "K=range(1,10)\n",
- "meandistortions=[]\n",
- "\n",
- "for k in K:\n",
- " kmeans=KMeans(n_clusters=k)\n",
- " kmeans.fit(X)\n",
- " meandistortions.append(\\\n",
- " sum(np.min(cdist(X,kmeans.cluster_centers_,'euclidean'),axis=1))/X.shape[0])\n",
- "\n",
- "plt.plot(K,meandistortions,'bx-')\n",
- "plt.xlabel('k')\n",
- "plt.ylabel('Average Dispersion')\n",
- "plt.title('Selecting k with the Elbow Method')\n",
- "plt.show()"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
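- "As the figure above shows, when the number of clusters increases from 1 to 2 and then to 3, each change of K alters the overall clustering structure considerably. The new number of clusters still leaves the algorithm a lot of room to reduce the distortion, so such values of K do not reflect the true number of clusters. Once K grows beyond 3, the average distance falls much more slowly, which means that increasing K further no longer helps much and suggests that K=3 is the best number of clusters here."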
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "## References\n",
- "* [机器学习聚类算法之K-Means](https://www.biaodianfu.com/k-means.html)"
- ]
- }
- ],
- "metadata": {
- "jupytext_formats": "ipynb,py",
- "kernelspec": {
- "display_name": "Python 3",
- "language": "python",
- "name": "python3"
- },
- "language_info": {
- "codemirror_mode": {
- "name": "ipython",
- "version": 3
- },
- "file_extension": ".py",
- "mimetype": "text/x-python",
- "name": "python",
- "nbconvert_exporter": "python",
- "pygments_lexer": "ipython3",
- "version": "3.7.9"
- }
- },
- "nbformat": 4,
- "nbformat_minor": 2
- }
|