1-k-means.ipynb 199 kB

  1. {
  2. "cells": [
  3. {
  4. "cell_type": "markdown",
  5. "metadata": {},
  6. "source": [
  7. "# k-Means"
  8. ]
  9. },
  10. {
  11. "cell_type": "markdown",
  12. "metadata": {},
  13. "source": [
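  14. "## Method\n",
  15. "\n",
  16. "Thanks to its speed and good scalability, K-Means is arguably the best-known clustering algorithm. ***K-Means repeatedly moves the cluster centers: each center, also called a centroid, is moved to the mean position of its members, and the observations are then reassigned to clusters.***\n",
  17. "\n",
  18. "K is a hyperparameter that specifies the number of clusters and must be chosen before the algorithm runs. K-Means assigns observations to clusters automatically, but it cannot decide how many clusters there should be.\n",
  19. "\n",
  20. "K must be a positive integer smaller than the number of observations in the training set. Sometimes the number of clusters is dictated by the problem. For example, a shoe factory with three new styles wants to know the potential customers for each style, so it surveys its customers and looks for three clusters in the data. In other problems the number of clusters is not given, and the optimal number is unknown.\n",
  21. "\n",
  22. "The parameters of K-Means are the centroid positions and the positions of the observations that belong to each cluster. As with generalized linear models and decision trees, the optimal parameters minimize a cost function. The K-Means cost function is:\n",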
  23. "$$\n",
  24. "J = \\sum_{k=1}^{K} \\sum_{i \\in C_k} \\| x_i - u_k \\|^2\n",
  25. "$$\n",
  26. "\n",
  27. "$u_k$ is the centroid of the $k$-th cluster, defined as:\n",
  28. "$$\n",
  29. "u_k = \\frac{1}{|C_k|} \\sum_{x \\in C_k} x\n",
  30. "$$\n",
  31. "\n",
  32. "\n",
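  33. "The cost function is the sum of the distortions of all clusters. The distortion of a cluster is the sum of squared distances between its centroid and its members. The more compact the members of a cluster, the smaller its distortion; the more scattered they are, the larger its distortion.\n",
  34. "\n",
  35. "Minimizing the cost function is an iterative process of reassigning observations to clusters and moving the centroids:\n",
  36. "1. First, the centroids are initialized at random. In practice, each initial centroid is placed at the position of a randomly chosen observation.\n",
  37. "2. In each iteration, K-Means assigns every observation to its nearest centroid and then moves each centroid to the mean position of the cluster's members.\n",
  38. "3. The algorithm stops when the maximum number of iterations is reached or when the change between two consecutive iterations falls below a set threshold; otherwise, step 2 is repeated.\n",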
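  "\n",
  "The three steps above can be sketched in NumPy. This is a minimal sketch, not a reference implementation: the function name `kmeans_sketch` and its parameters `init_idx`, `max_iter`, and `tol` are illustrative assumptions, and the stopping rule used here (total centroid movement below `tol`) is only one possible reading of the threshold in step 3.\n",
  "\n",
  "```python\n",
  "import numpy as np\n",
  "\n",
  "def kmeans_sketch(X, init_idx, max_iter=100, tol=1e-6):\n",
  "    # X: (n, d) array of observations; init_idx: indices of the observations\n",
  "    # used as the initial centroids (step 1).\n",
  "    X = np.asarray(X, dtype=float)\n",
  "    centroids = X[list(init_idx)].copy()\n",
  "    for _ in range(max_iter):\n",
  "        # Step 2a: assign every observation to its nearest centroid.\n",
  "        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)\n",
  "        labels = dists.argmin(axis=1)\n",
  "        # Step 2b: move each centroid to the mean of its members\n",
  "        # (an empty cluster keeps its previous centroid).\n",
  "        new_centroids = centroids.copy()\n",
  "        for k in range(len(centroids)):\n",
  "            if np.any(labels == k):\n",
  "                new_centroids[k] = X[labels == k].mean(axis=0)\n",
  "        # Step 3: stop once the centroids have (almost) stopped moving.\n",
  "        shift = np.linalg.norm(new_centroids - centroids)\n",
  "        centroids = new_centroids\n",
  "        if shift < tol:\n",
  "            break\n",
  "    # Final assignment and cost J (sum of squared distances to centroids).\n",
  "    dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)\n",
  "    labels = dists.argmin(axis=1)\n",
  "    cost = float((dists[np.arange(len(X)), labels] ** 2).sum())\n",
  "    return centroids, labels, cost\n",
  "```\n",
  "\n",
  "Passing the indices of the starting observations through `init_idx` instead of drawing them at random keeps the sketch deterministic, so a hand-worked example can be reproduced step by step.\n",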
  39. "\n"
  40. ]
  41. },
  42. {
  43. "cell_type": "code",
  44. "execution_count": 6,
  45. "metadata": {},
  46. "outputs": [
  47. {
  48. "data": {
  49. "image/png": "iVBORw0KGgoAAAANSUhEUgAAAW4AAAD8CAYAAABXe05zAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADl0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uIDIuMi4yLCBodHRwOi8vbWF0cGxvdGxpYi5vcmcvhp/UCwAADhRJREFUeJzt3WGI5Hd9x/HPZ/ecc+5UFCJbehd6EcQ2CCXuIZmGytDpA0VpnrQQIQrug31SY7QWSYTiw3siYh7YQkjGJxkUevpAbDCW7c6DMsPh3SWgl1MIaUzOJBgfRN0U/tO7/fbB7naOuL393+3+7zff2fcLBm43c5sv35197+x/d+/niBAAII+F0gMAAG4O4QaAZAg3ACRDuAEgGcINAMkQbgBIhnADQDKEGwCSIdwAkMyRJt7oHXfcEadOnWriTdf21ltv6fjx40VnmBXsYopdTLGLqVnYxYULF34TEe+vc99Gwn3q1CmdP3++iTdd23A4VLfbLTrDrGAXU+xiil1MzcIubP+y7n25VAIAyRBuAEiGcANAMoQbAJIh3ACQDOEGgGQINwAkQ7gBIBnCDQDJEG4ASIZwA0AyhBsAkiHcAJAM4QaAZAg3ACRDuAEgmVrhtv0l25ds/8z2d2y/s+nBAAC72zPctk9I+oKk0xHxYUmLkh5oejAAwO7qXio5Iqlt+4ikY5JebW4kAE0bj8caDAYaj8elR8Et2DPcEfErSV+X9LKk1yT9NiJ+3PRgAJoxHo/V6/XU7/fV6/WId0J7HhZs+32S7pd0l6Q3Jf2r7Qcj4qm33W9V0qokLS0taTgcHvy0N2FjY6P4DLOCXUyxC2kwGKiqKm1ubqqqKvX7fVVVVXqsotI9LiLihjdJfyfpyete/qykf77R31leXo7S1tfXS48wM9jFFLuIGI1G0W63Y2FhIdrtdoxGo9IjFTcLjwtJ52OPHu/c6lzjflnSvbaP2baknqTLDX0eAdCwTqejtbU1raysaG1tTZ1Op/RIuEl7XiqJiHO2z0q6KOmqpGclPd70YACa0+l0VFUV0U5qz3BLUkR8TdLXGp4FAFADvzkJAMkQbgBIhnADQDKEGwCSIdwAkAzhBoBkCDcAJEO4ASAZwg0AyRBuAEiGcANAMoQbAJIh3ACQDOEGgGQINwAkQ7jRuPF4rDNnznAordjF9WZlFxlPvK91kAJwq3ZOFJ9MJmq1Wof6qCx2MTUru9iZo6oqDQaDNO8TnnGjUcPhUJPJRNeuXdNkMsl1kvYBYxdTs7KLnTk2NzdTvU8INxrV7XbVarW0uLioVqulbrdbeqRi2MXUrOxiZ46FhYVU7xMulaBROyeKD4dDdbvdFF+GNoVdTM3KLnbm6Pf7WllZSfM+IdxoXKfTSfMB0TR2MTUru8h44j2XSgAgGcINAMkQbgBIhnADQDKEGwCSIdwAkAzhBoBkCDcAJEO4ASAZwg0AyRBuAEiGcANAMoQbAJIh3ACQTK1w236v7bO2f277su08//4hAMyZuv8e92OSfhQRf2u7JelYgzMBAG5gz2fctt8j6WOSnpSkiJhExJtNDwYctIyneQO7qXOp5AOS3pD0bdvP2n7C9vGG5wIO1M5p3v1+X71ej3gjtTqXSo5I+oikhyLinO3HJD0i6Z+uv5PtVUmrkrS0tFT8tOSNjY3iM8wKdiENBgNVVaXNzU1VVaV+v6+qqkqPVRSPi6l0u4iIG94k/ZGkl657+S8l/duN/s7y8nKUtr6+XnqEmcEuIkajUbTb7VhYWIh2ux2j0aj0SMXxuJiahV1IOh979Hjntuelkoh4XdIrtj+0/aqepOeb+TQCNGPnNO+VlRWtra2lOhgWeLu6P1XykKTB9k+UvCjpc82NBDQj42newG5qhTsinpN0uuFZAAA18JuTAJAM4QaAZAg3ACRDuAEgGcINAMkQbgBIhnADQDKEGwCSIdwAkAzhBoBk
CDcAJEO4ASAZwg0AyRBuAEiGcAO30Xg81pkzZzjzUuxiP+oepABgn3YOLJ5MJmq1Wof6JB52sT884wZuk+FwqMlkomvXrmkymeQ6nPaAsYv9IdzAbdLtdtVqtbS4uKhWq6Vut1t6pGLYxf5wqQS4TXYOLB4Oh+p2u4f60gC72B/CDdxGnU6HSG1jF7eOSyUAkAzhBoBkCDcAJEO4ASAZwg0AyRBuAEiGcANAMoQbAJIh3ACQDOEGgGQINwAkQ7gBIBnCDQDJEG4ASKZ2uG0v2n7W9g+bHAgAcGM384z7YUmXmxoEAFBPrXDbPinpk5KeaHac+cIp1gCaUPcEnG9K+oqkdzc4y1zhFGsATdkz3LY/JenXEXHBdvcG91uVtCpJS0tLxU9t3tjYKDrDYDBQVVXa3NxUVVXq9/uqqqrILKV3MUvYxRS7mEq3i4i44U3SGUlXJL0k6XVJ/y3pqRv9neXl5ShtfX296P9/NBpFu92OxcXFaLfbMRqNis1SehezhF1MsYupWdiFpPOxR493bns+446IRyU9Kknbz7j/MSIebObTyPzgFGsATeGU9wZxijWAJtxUuCNiKGnYyCQAgFr4zUkASIZwA0AyhBsAkiHcAJAM4QaAZAg3ACRDuAEgGcINAMkQbgBIhnADQDKEGwCSIdwAkAzhBoBkCDcAJEO4ASAZwo3Gcdo9cLA4AQeN4rR74ODxjBuNGg6HmkwmunbtmiaTSa6TtIEZRbjRqG63q1arpcXFRbVaLXW73dIjAelxqQSN4rR74OARbjSO0+6Bg8WlEgBIhnADQDKEGwCSIdwAkAzhBoBkCDcAJEO4ASAZwg0AyRBuAEiGcANAMoQbAJIh3ACQDOEGgGQINwAks2e4bd9pe932ZduXbD98OwYDAOyuzr/HfVXSlyPiou13S7pg+98j4vmGZwMA7GLPZ9wR8VpEXNz+8+8lXZZ0ounBcDDG47EGgwEnrANz5Kaucds+JekeSeeaGAYHa+eE9X6/r16vR7yBOVH76DLb75L0PUlfjIjf7fLfVyWtStLS0lLx07w3NjaKz1DaYDBQVVXa3NxUVVXq9/uqqqr0WEXxuJhiF1PpdhERe94kvUPSM5L+oc79l5eXo7T19fXSIxQ3Go2i3W7HwsJCtNvtGI1GpUcqjsfFFLuYmoVdSDofNfoaEbV+qsSSnpR0OSK+0ehnERyonRPWV1ZWtLa2xoG9wJyoc6nkPkmfkfRT289tv+6rEfF0c2PhoHQ6HVVVRbSBObJnuCPiPyX5NswCAKiB35wEgGQINwAkQ7gBIBnCDQDJEG4ASIZwA0AyhBsAkiHcAJAM4QaAZAg3ACRDuAEgGcINAMkQbgBIhnADQDKEGwCSIdwAkAzhBoBkCDcAJEO4ASAZwg0AyRBuAEiGcANAMoQbAJIh3ACQDOEGgGQINwAkQ7gBIBnCDQDJEG4ASIZwA0AyhBsAkiHcAJAM4QaAZAg3ACRTK9y2P277F7ZfsP1I00MBAP5/e4bb9qKkb0n6hKS7JX3a9t1NDwYA2F2dZ9wflfRCRLwYERNJ35V0f7Nj7c94PNZgMNB4PC49CgAcuDrhPiHpletevrL9upk0Ho/V6/XU7/fV6/WIN4C5c6TGfbzL6+IP7mSvSlqVpKWlJQ2Hw/1NdosGg4GqqtLm5qaqqlK/31dVVUVmmRUbGxvF3h+zhl1MsYupbLuoE+4rku687uWTkl59+50i4nFJj0vS6dOno9vtHsR8N+3o0aP/F++jR49qZWVFnU6nyCyzYjgcqtT7Y9awiyl2MZVtF3UulfxE0gdt32W7JekBST9odqxb1+l0tLa2ppWVFa2trR36aAOYP3s+446Iq7Y/L+kZSYuS+hFxqfHJ9qHT6aiqKqINYC7VuVSiiHha0tMNzwIAqIHfnASAZAg3ACRDuAEgGcINAMkQbgBIhnADQDKEGwCSIdwAkAzhBoBkCDcAJEO4ASAZwg0AyRBuAEiGcANAMoQbAJIh
3ACQDOEGgGQc8QcHtu//jdpvSPrlgb/hm3OHpN8UnmFWsIspdjHFLqZmYRd/EhHvr3PHRsI9C2yfj4jTpeeYBexiil1MsYupbLvgUgkAJEO4ASCZeQ7346UHmCHsYopdTLGLqVS7mNtr3AAwr+b5GTcAzKW5DLftj9v+he0XbD9Sep5SbN9pe932ZduXbD9ceqaSbC/aftb2D0vPUpLt99o+a/vn24+NTumZSrH9pe2PjZ/Z/o7td5aeqY65C7ftRUnfkvQJSXdL+rTtu8tOVcxVSV+OiD+TdK+kvz/Eu5CkhyVdLj3EDHhM0o8i4k8l/bkO6U5sn5D0BUmnI+LDkhYlPVB2qnrmLtySPirphYh4MSImkr4r6f7CMxUREa9FxMXtP/9eWx+gJ8pOVYbtk5I+KemJ0rOUZPs9kj4m6UlJiohJRLxZdqqijkhq2z4i6ZikVwvPU8s8hvuEpFeue/mKDmmsrmf7lKR7JJ0rO0kx35T0FUmbpQcp7AOS3pD07e3LRk/YPl56qBIi4leSvi7pZUmvSfptRPy47FT1zGO4vcvrDvWPzth+l6TvSfpiRPyu9Dy3m+1PSfp1RFwoPcsMOCLpI5L+JSLukfSWpEP5fSDb79PWV+N3SfpjScdtP1h2qnrmMdxXJN153csnleTLnybYfoe2oj2IiO+XnqeQ+yT9je2XtHXp7K9sP1V2pGKuSLoSETtfeZ3VVsgPo7+W9F8R8UZE/I+k70v6i8Iz1TKP4f6JpA/avst2S1vfbPhB4ZmKsG1tXcu8HBHfKD1PKRHxaEScjIhT2no8/EdEpHhmddAi4nVJr9j+0ParepKeLzhSSS9Lutf2se2PlZ6SfKP2SOkBDlpEXLX9eUnPaOu7xP2IuFR4rFLuk/QZST+1/dz2674aEU8XnAnlPSRpsP3E5kVJnys8TxERcc72WUkXtfUTWM8qyW9Q8puTAJDMPF4qAYC5RrgBIBnCDQDJEG4ASIZwA0AyhBsAkiHcAJAM4QaAZP4X2rUUvmi+hNYAAAAASUVORK5CYII=\n",
  50. "text/plain": [
  51. "<Figure size 432x288 with 1 Axes>"
  52. ]
  53. },
  54. "metadata": {},
  55. "output_type": "display_data"
  56. }
  57. ],
  58. "source": [
  59. "%matplotlib inline\n",
  60. "import matplotlib.pyplot as plt\n",
  61. "import numpy as np\n",
  62. "\n",
  63. "X0 = np.array([7, 5, 7, 3, 4, 1, 0, 2, 8, 6, 5, 3])\n",
  64. "X1 = np.array([5, 7, 7, 3, 6, 4, 0, 2, 7, 8, 5, 7])\n",
  65. "plt.figure()\n",
  66. "plt.axis([-1, 9, -1, 9])\n",
  67. "plt.grid(True)\n",
  68. "plt.plot(X0, X1, 'k.');"
  69. ]
  70. },
  71. {
  72. "cell_type": "markdown",
  73. "metadata": {},
  74. "source": [
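  75. "Suppose K-Means initializes the centroid of the first cluster at the 5th observation and the centroid of the second cluster at the 11th observation. We can then compute the distance from every observation to both centroids and assign each observation to the nearer cluster. The results are shown in the table below:\n",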
  76. "![data_0](images/data_0.png)\n",
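  "\n",
  "As a worked check of one row, taking the initial centroids to be the 5th observation $(4, 6)$ and the 11th observation $(5, 5)$ from the data above, the 2nd observation $(5, 7)$ has distances\n",
  "$$\n",
  "d_1 = \\sqrt{(5-4)^2 + (7-6)^2} = \\sqrt{2} \\approx 1.41, \\qquad d_2 = \\sqrt{(5-5)^2 + (7-5)^2} = 2\n",
  "$$\n",
  "Because $d_1$ is smaller than $d_2$, this observation is assigned to the first cluster.\n",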
  77. "\n",
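  78. "The initial centroids and the resulting cluster assignment are plotted below. The first cluster is drawn with X markers and the second with dots. The centroids are highlighted with larger markers.\n",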
  79. "\n",
  80. "\n"
  81. ]
  82. },
  83. {
  84. "cell_type": "code",
  85. "execution_count": 7,
  86. "metadata": {},
  87. "outputs": [
  88. {
  89. "data": {
  90. "image/png": "iVBORw0KGgoAAAANSUhEUgAAAW4AAAEICAYAAAB/Dx7IAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADl0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uIDIuMi4yLCBodHRwOi8vbWF0cGxvdGxpYi5vcmcvhp/UCwAAFZRJREFUeJzt3X+U3XV95/Hnm0kGCVHpbty0wISBalEKRyGhdaR2Zxx3Lavonj2nrC3C0awnbs8qWGSzKkJdldJ1KVW71h5WUxaYNZuDrqcgVt3J3D1ap2wSZAsYOYfCkEG04io/AnaGhPf+8b3DnYQkcyeZm+/9zDwf59wz8/3e7/3e1/3k5nW/87l35huZiSSpHMfUHUCSND8WtyQVxuKWpMJY3JJUGItbkgpjcUtSYSxudVREvD4i7q85w4cj4vN1ZjhSEZER8fK6c6g7WNwCICLeGxHbI2IqIm6cx+0mIuKNB7s+M7+Vmae3u/2RiojBiHhkvwx/mJnv7tR9Hm0RcWNEfKLuHKrPsroDqGs8CnwCeBNwXM1ZDigiAojMfK7uLAcSEcsyc0/dObT4ecQtADLzy5n5FeD/7X9dRKyKiNsj4vGI+GlEfCsijomIm4E1wG0RsTsiNh7gts8fAR9s+4h4bUR8p7n//xsRg7Nu34iIayLir4FngNMi4l0RsTMinoqIByPiPc1tjwe+BpzY3P/uiDgxIj4aEbfM2udbI+K+5v01IuJVs66biIgrIuJvI+KJiPgfEfGiA41ZRLwzIv46Iv4kIn4KfLS5fn0z388i4usRcUpzfTS3/XFz338bEWfOepzv3m/f3z7AfW4ALgI2Nh/fbc31/yEiftAck/sjYvhAmbVIZKYXL89fqI66b9xv3bXAnwPLm5fXUx35AkwAbzzE/gaBR2Yt77M9cBLVi8W/oDqQ+GfN5Zc1r28Au4BfpfoJcTnwZuCXgQD+KVWhn3Og+2uu+yhwS/P7XwGebt7PcmAj8ADQOyvf/wFOBP4RsBP4twd5bO8E9gDva2Y7DviXzf29qrnuI8B3mtu/CdgBnNDM/irgl2Y9znfvt+9vz1pO4OXN728EPjHrutOBSeDE5nI/8Mt1P5e8dO7iEbfa8SzwS8ApmflsVvPWC/VHbt4B3JGZd2Tmc5n5TWA7VZHPuDEz78vMPc37/2pm/l1W/jfwDaoXk3b8a+CrmfnNzHwWuI6qcF83a5vPZOajmflT4DbgNYfY36OZ+afNbD8H3gNcm5k7s5o2+UPgNc2j7meBFwOvpHrh25mZP2wz96HsBY4FzoiI5Zk5kZl/twD7VZeyuNWO/0x1FPmN5tTEBxdw36cAv92ctng8Ih4HfoPqhWLG5OwbRMT5EfE3zWmbx6lKflWb93ci8PDMQlbz5ZNUR/4zfjTr+2eAlYfY3+R+y6cAn571WH5KdXR9UmZuBf4L8Fng7yPihoh4SZu5DyozHwDeT/WTxY8jYnNEnHik+1X3srg1p8x8KjM/kJmnARcAl8+aQ53vkff+208CN2fmCbMux2fmHx3oNhFxLPAlqiPl1Zl5AnAHVTm2k+dRqnKd2V8AfcAP5vk4XpCtaRJ4z36P57jM/A5AZn4mM9dSTf38CvDvm7d7Glgxaz+/OI/7JDP/e2b+BtVjS+A/Hd7DUQksbgHVJyKab8L1AD0R8aKIWNa87i0R8fJmyT1J9aP53uZN/x44bR53tf/2twAXRMSbImLmfgcj4uSD3L6XalrgMWBPRJwP/PP99v+PI+KlB7n9FuDNETEcEcuBDwBTwHfm8RgO5c+BD0XErwJExEsj4reb358bEb/evN+ngX+gNY53A/8qIlZE9Xntf3OI+9hnDCPi9Ih4Q/NF7R+An8/arxYhi1szPkL1H/6DVPPOP2+uA3gF8L+A3cA48GeZ2Whedy3wkebUwBVt3M8+22fmJPA24MNUZTxJdRR6wOdmZj4FXEpV
wD8Dfhf4y1nXfx/4IvBg8z5O3O/29zcf358CP6H6CeKCzJxuI/ucMvN/Uh3tbo6IJ4F7gfObV78E+K/N3A9TvQl7XfO6PwGmqUr5vwEjh7ibL1DNZz8eEV+heiH7o+bj+RHwT6jGU4vUzCcDJEmF8IhbkgpjcUtSYSxuSSqMxS1JhenIH5latWpV9vf3d2LXbXv66ac5/vjja83QLRyLFseixbFo6Yax2LFjx08y82XtbNuR4u7v72f79u2d2HXbGo0Gg4ODtWboFo5Fi2PR4li0dMNYRMTDc29VcapEkgpjcUtSYSxuSSqMxS1JhbG4JakwFrckFcbilqTCWNySVBiLW5IKY3FLUmEsbkkqjMUtSYWxuCWpMBa3JBXG4pakwljcklQYi1uSCtNWcUfE70fEfRFxb0R8MSJe1Olgkjrgk5+EsbF9142NVetVjDmLOyJOAi4F1mXmmUAP8PZOB5PUAeeeCxde2CrvsbFq+dxz682leWn3nJPLgOMi4llgBfBo5yJJ6pihIdiyBS68kP7zz4evfa1aHhqqO5nmITJz7o0iLgOuAX4OfCMzLzrANhuADQCrV69eu3nz5gWOOj+7d+9m5cqVtWboFo5Fi2NR6d+0if6bb2bi4ouZWL++7ji164bnxdDQ0I7MXNfWxpl5yAvwC8BW4GXAcuArwDsOdZu1a9dm3cbGxuqO0DUcixbHIjO3bs1ctSofuvjizFWrquUlrhueF8D2nKOPZy7tvDn5RuChzHwsM58Fvgy87jBeUCTVbWZOe8uW6ki7OW3ygjcs1dXaKe5dwGsjYkVEBDAM7OxsLEkdsW3bvnPaM3Pe27bVm0vzMuebk5l5Z0TcCtwF7AG+C9zQ6WCSOmDjxheuGxryzcnCtPWpksz8A+APOpxFktQGf3NSkgpjcUtSYSxuSSqMxS1JhbG4JakwFrckFcbilqTCWNySVBiLW5IKY3FLUmEsbkkqjMUtSYWxuCWpMBa3Osczirc4FlpAFrc6xzOKtzgWLzA+Oc6137qW8cnx2nOM7BqpPcd8tHuWd2n+Zp1RnN/7Pfjc55buGcUdi32MT44zfNMw03un6e3pZfSSUQb6BmrLMbVnipHJkdpyzJdH3OqsoaGqqD7+8errEi0qwLGYpTHRYHrvNHtzL9N7p2lMNGrN8RzP1ZpjvixuddbYWHV0edVV1delfFJax+J5g/2D9Pb00hM99Pb0Mtg/WGuOYzim1hzz5VSJOmfWGcWfP6/h7OWlxLHYx0DfAKOXjNKYaDDYP1jb9MRMjk1jm1g/tL6IaRKwuNVJhzqj+FIrK8fiBQb6BrqiKAf6BphaM9UVWdplcatzPKN4i2OhBeQctyQVxuKWpMJY3JJUGItbkgpjcUtSYSxuSSqMxS1JhbG4JakwFrckFcbilqTCWNySVBiLW4vTgU4VdjCeQkyFsbi1OO1/qrCD8RRiKlBbxR0RJ0TErRHx/YjYGRHl/P1DLU2zTxV2sPLe/29kS4Vo94j708BfZeYrgVcDOzsXSVogM+V9wQVw/fX7Xnf99dV6S1sFmvPvcUfES4DfBN4JkJnTwHRnY0kLZGgIPvYxuOKKavmcc6rSvuIKuO46S1tFaudECqcBjwF/ERGvBnYAl2Xm0x1NJi2Uyy+vvl5xBa8580y4996qtGfWS4WJzDz0BhHrgL8BzsvMOyPi08CTmXnVftttADYArF69eu3mzZs7FLk9u3fvZuXKlbVm6BaOReU1l17KCffcw+NnncXdn/lM3XFq5/OipRvGYmhoaEdmrmtr48w85AX4RWBi1vLrga8e6jZr167Nuo2NjdUdoWs4Fpn5x3+cGZE/O+uszIhqeYnzedHSDWMBbM85+njmMudUSWb+KCImI+L0zLwfGAa+d7ivKtJRN2tO++5zzmHwrrtac95Ol6hA7Z4s+H3ASET0Ag8C7+pcJGkBjY3B1Ve35rQbjVZZX301nH22b1CqOG0Vd2beDbQ39yJ1i5nPad92
2wvL+fLLq9L2c9wqkL85qcWpnV+uaeeXdKQuZHFrcdq2rb0j6Zny3rbt6OSSFkC7c9xSWTZubH/boSGnSlQUj7glqTAWtyQVxuKWpMJY3JJUGItbkgpjcUtSYSxuSSqMxS1JhbG4JakwFrckFcbilo6SkXtG6P9UP8f8x2Po/1Q/I/eM1B1JhfJvlUhHwcg9I2y4bQPPPPsMAA8/8TAbbtsAwEVnXVRntNqMT47TmGgw2D/IQN9A3XGKYnFLR8GVo1c+X9oznnn2Ga4cvXJJFvf45DjDNw0zvXea3p5eRi8ZtbznwakS6SjY9cSuea1f7BoTDab3TrM39zK9d5rGRKPuSEWxuKWjYM1L18xr/WI32D9Ib08vPdFDb08vg/2DdUcqisUtHQXXDF/DiuUr9lm3YvkKrhm+pqZE9RroG2D0klE+PvRxp0kOg3Pc0lEwM4995eiV7HpiF2teuoZrhq9ZkvPbMwb6Bizsw2RxS0fJRWddtKSLWgvHqRJJKozFLUmFsbglqTAWtyQVxuKWpMJY3JJUGItbkgpjcUtSYSxuSSqMxS1JhbG4JakwFrckFcbilqTCWNySVJi2izsieiLiuxFxeycDLQqf/CSMje27bmysWi9JR2g+R9yXATs7FWRROfdcuPDCVnmPjVXL555bby5Ji0JbxR0RJwNvBj7f2TiLxNAQbNlSlfXVV1dft2yp1kvSEYrMnHujiFuBa4EXA1dk5lsOsM0GYAPA6tWr127evHmBo87P7t27WblyZa0Z+jdtov/mm5m4+GIm1q+vLUc3jEW3cCxaHIuWbhiLoaGhHZm5rq2NM/OQF+AtwJ81vx8Ebp/rNmvXrs26jY2N1Rtg69bMVasyr7qq+rp1a21Rah+LLuJYtDgWLd0wFsD2nKNbZy7tTJWcB7w1IiaAzcAbIuKW+b+eLCEzc9pbtsDHPtaaNtn/DUtJOgxzFndmfigzT87MfuDtwNbMfEfHk5Vs27Z957Rn5ry3bas3l6RFwbO8d8LGjS9cNzTkm5OSFsS8ijszG0CjI0kkSW3xNyclqTAWtyQVxuKWpMJY3JJUGItbkgpjcUtSYSxuSSqMxS1JhbG4JakwFrckFcbilqTCWNySVBiLW5IKY3FLUmEsbnXc+OQ4137rWsYnx+uOIi0KnkhBHTU+Oc7wTcNM752mt6eX0UtGGegbqDuWVDSPuNVRjYkG03un2Zt7md47TWOiUXckqXgWtzpqsH+Q3p5eeqKH3p5eBvsH644kFc+pEnXUQN8Ao5eM0phoMNg/6DSJtAAsbnXcQN+AhS0tIKdKJKkwFrckFcbilqTCWNySVBiLW5IKY3FLUmEsbkkqjMUtSYWxuCWpMBa3JBXG4pakwljcklQYi1uSCmNxS1Jh5izuiOiLiLGI2BkR90XEZUcjmCTpwNr5e9x7gA9k5l0R8WJgR0R8MzO/1+FskqQDmPOIOzN/mJl3Nb9/CtgJnNTpYFoY45PjjOwa8Qzr0iIyrznuiOgHzgbu7EQYLayZM6xvemgTwzcNW97SItH2qcsiYiXwJeD9mfnkAa7fAGwAWL16NY1GY6EyHpbdu3fXnqFuI7tGmNozxXM8x9SeKTaNbWJqzVTdsWrl86LFsWgpbSwiM+feKGI5cDvw9cy8fq7t161bl9u3b1+AeIev0WgwODhYa4a6zRxxT+2Z4thlxzJ6yeiSP/ejz4sWx6KlG8YiInZk5rp2tm3nUyUBfAHY2U5pq3vMnGF9/anrLW1pEWlnquQ84GLgnoi4u7nuw5l5R+diaaEM9A0wtWbK0pYWkTmLOzO/DcRRyCJJaoO/OSlJhbG4JakwFrckFcbilqTCWNySVBiLW5IKY3FLUmEsbkkqjMUtSYWxuCWpMBa3JBXG4pakwljcklQYi1uSCmNxS1JhLG5JKozFLUmFsbglqTAWtyQVxuKWpMJY3JJUGItbkgpjcUtSYSxuSSqMxS1JhbG4JakwFrckFcbilqTCWNySVBiL
W5IKY3FLUmEsbkkqjMUtSYWxuCWpMBa3JBWmreKOiN+KiPsj4oGI+GCnQ0mSDm7O4o6IHuCzwPnAGcDvRMQZnQ52JMYnxxnZNcL45HjdUSRpwbVzxP1rwAOZ+WBmTgObgbd1NtbhG58cZ/imYTY9tInhm4Ytb0mLzrI2tjkJmJy1/Ajw6/tvFBEbgA0Aq1evptFoLES+eRvZNcLUnime4zmm9kyxaWwTU2umasnSLXbv3l3bv0e3cSxaHIuW0saineKOA6zLF6zIvAG4AWDdunU5ODh4ZMkO07GTxzIyWZX3scuOZf3Qegb6BmrJ0i0ajQZ1/Xt0G8eixbFoKW0s2pkqeQTom7V8MvBoZ+IcuYG+AUYvGWX9qesZvWR0yZe2pMWnnSPubcArIuJU4AfA24Hf7WiqIzTQN8DUmilLW9KiNGdxZ+aeiHgv8HWgB9iUmfd1PJkk6YDaOeImM+8A7uhwFklSG/zNSUkqjMUtSYWxuCWpMBa3JBXG4pakwljcklQYi1uSCmNxS1JhLG5JKozFLUmFsbglqTAWtyQVxuKWpMJY3JJUGItbkgpjcUtSYSLzBef9PfKdRjwGPLzgO56fVcBPas7QLRyLFseixbFo6YaxOCUzX9bOhh0p7m4QEdszc13dObqBY9HiWLQ4Fi2ljYVTJZJUGItbkgqzmIv7hroDdBHHosWxaHEsWooai0U7xy1Ji9ViPuKWpEXJ4pakwizK4o6I34qI+yPigYj4YN156hIRfRExFhE7I+K+iLis7kx1ioieiPhuRNxed5Y6RcQJEXFrRHy/+dwYqDtTXSLi95v/N+6NiC9GxIvqztSORVfcEdEDfBY4HzgD+J2IOKPeVLXZA3wgM18FvBb4d0t4LAAuA3bWHaILfBr4q8x8JfBqluiYRMRJwKXAusw8E+gB3l5vqvYsuuIGfg14IDMfzMxpYDPwtpoz1SIzf5iZdzW/f4rqP+hJ9aaqR0ScDLwZ+HzdWeoUES8BfhP4AkBmTmfm4/WmqtUy4LiIWAasAB6tOU9bFmNxnwRMzlp+hCVaVrNFRD9wNnBnvUlq8ylgI/Bc3UFqdhrwGPAXzWmjz0fE8XWHqkNm/gC4DtgF/BB4IjO/UW+q9izG4o4DrFvSn3mMiJXAl4D3Z+aTdec52iLiLcCPM3NH3Vm6wDLgHOBzmXk28DSwJN8HiohfoPpp/FTgROD4iHhHvanasxiL+xGgb9byyRTy408nRMRyqtIeycwv152nJucBb42ICaqpszdExC31RqrNI8AjmTnzk9etVEW+FL0ReCgzH8vMZ4EvA6+rOVNbFmNxbwNeERGnRkQv1ZsNf1lzplpERFDNZe7MzOvrzlOXzPxQZp6cmf1Uz4etmVnEkdVCy8wfAZMRcXpz1TDwvRoj1WkX8NqIWNH8vzJMIW/ULqs7wELLzD0R8V7g61TvEm/KzPtqjlWX84CLgXsi4u7mug9n5h01ZlL93geMNA9sHgTeVXOeWmTmnRFxK3AX1Sewvkshv/rur7xLUmEW41SJJC1qFrckFcbilqTCWNySVBiLW5IKY3FLUmEsbkkqzP8Hl/EjZI/oyWwAAAAASUVORK5CYII=\n",
  91. "text/plain": [
  92. "<Figure size 432x288 with 1 Axes>"
  93. ]
  94. },
  95. "metadata": {},
  96. "output_type": "display_data"
  97. }
  98. ],
  99. "source": [
  100. "C1 = [1, 4, 5, 9, 11]\n",
  101. "C2 = list(set(range(12)) - set(C1))\n",
  102. "X0C1, X1C1 = X0[C1], X1[C1]\n",
  103. "X0C2, X1C2 = X0[C2], X1[C2]\n",
  104. "plt.figure()\n",
  105. "plt.title('1st iteration results')\n",
  106. "plt.axis([-1, 9, -1, 9])\n",
  107. "plt.grid(True)\n",
  108. "plt.plot(X0C1, X1C1, 'rx')\n",
  109. "plt.plot(X0C2, X1C2, 'g.')\n",
  110. "plt.plot(4,6,'rx',ms=12.0)\n",
  111. "plt.plot(5,5,'g.',ms=12.0);"
  112. ]
  113. },
  114. {
  115. "cell_type": "markdown",
  116. "metadata": {},
  117. "source": [
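  118. "Now we recompute the centroid of each cluster, move the centroids to their new positions, recompute the distance from every observation to the new centroids, and reassign each observation to the nearer one. The results are shown in the table below:\n",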
  119. "\n",
  120. "![data_1](images/data_1.png)\n",
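  "\n",
  "As a worked check of the centroid update, taking the first assignment `C1 = [1, 4, 5, 9, 11]` (0-based indices) from the code above and the remaining seven observations as the second cluster, the new centroids are\n",
  "$$\n",
  "u_1 = \\frac{1}{5}\\left[(5,7) + (4,6) + (1,4) + (6,8) + (3,7)\\right] = (3.8, 6.4)\n",
  "$$\n",
  "$$\n",
  "u_2 = \\frac{1}{7}\\left[(7,5) + (7,7) + (3,3) + (0,0) + (2,2) + (8,7) + (5,5)\\right] = \\left(\\tfrac{32}{7}, \\tfrac{29}{7}\\right) \\approx (4.57, 4.14)\n",
  "$$\n",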
  121. "\n",
  122. "The plot is shown below:"
  123. ]
  124. },
  125. {
  126. "cell_type": "code",
  127. "execution_count": 8,
  128. "metadata": {},
  129. "outputs": [
  130. {
  131. "data": {
  132. "image/png": "iVBORw0KGgoAAAANSUhEUgAAAW4AAAEICAYAAAB/Dx7IAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADl0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uIDIuMi4yLCBodHRwOi8vbWF0cGxvdGxpYi5vcmcvhp/UCwAAFRBJREFUeJzt3X+Q3Hddx/Hnu5cfNAk/DRzSJlxxpKWiCEmhR/1x51WHQgVnYMqPkgoZjKJoceqghanUQgUdxwEHrWJJpeUg1sIo1grF9E6FxtqkrZaSopWkSaGlAeyPS+GuSd7+8d1zl/Qut5fc5rufvedjZufuu/vd/b73fXev++7n+939RGYiSSrHCXUXIEmaH4NbkgpjcEtSYQxuSSqMwS1JhTG4JakwBrcWTES8JSK+OMttayNiIiL6jnddLTWcHxE31rX9hRARuyPi7LrrUL0M7kUsIpZHxMci4t6IeDQibo+Iczqxrczck5mrMvNgY9vjEfG2Tmyr8fgDEZERsaSlhtHM/LlObfN4i4hLI+ITddeh48/gXtyWAHuBnwaeClwCXBsRAzXW1JY699zn0vrPQuoEg3sRy8z9mXlpZu7OzEOZeT2wC1gHEBFDEXFfRFwUEQ9GxP0R8dbp+0fED0TEZyPikYj4d+CHZttW6x5wRFwO/CTwkcbwyUca65wWEV+IiO9ExFcj4ryW+/9VRFwRETdExH5gOCJe1XiV8EhE7I2IS1s2+S+Nrw81tjF4+FBORLw8Im6NiIcbX1/ectt4RLwvIr7UeDVyY0SsnuW5TffptyPiAeCqxvXnRsQdEfFQRNwcET/Wcp/fjoivNx77qxEx0vI833/4Y8+wzVcA7wZe33h+/9G4/i0R8bXG4+6KiPNn+5moYJnpxQuZCdAPfA84rbE8BBwALgOWAq8EHgOe3rh9C3AtsBJ4IfB14IuzPPYAkMCSxvI48LaW21dS7f2/leqVwEuAbwE/0rj9r4CHgbOodjie1KjvRxvLPwZ8E/iFmbbXuO4t0/UBzwD+F9jQ2N4bG8s/0FLf/wDPB05sLH9wluc23ac/AJY31n8J8CDwMqAP+EVgd+P2UxvP9Tkttf5Qy/N8/2GPfV/L8m7g7Mb3lwKfOKyHjwCnNpZ/cLp/Xnrr4h63AIiIpcAo8PHMvLvlpseByzLz8cy8AZgATm0MVbwW+N2s9ty/DHz8GEo4F9idmVdl5oHMvA34NPC6lnX+LjO/lNWrg+9l5nhm3tlY/k/gU1TDPu14FfDfmXlNY3ufAu4Gfr5lnasy878y87tU/6B+/AiPdwh4b2ZONtb/JeAvMvOWzDyYmR8HJoEzgYNUAX56RCzN6hXP/7RZ91wOAS+MiBMz8/7MvGuBHlddxOAWEXECcA0wBbzjsJu/nZkHWpYfA1YBz6Q5Rj7t3mMo47nAyxrDCg9FxEPA+cCzW9Zp3RYR8bKIGIuIfRHxMPArwIzDGTN4zgz13guc1LL8QMv30897Nvsy83sty88FLjrs+ayh2su+B3gn1R7zgxGxJSKe02bds8rM/cDrqfpwf0T8Q0ScdqyPq+5jcC9yERHAx6iGSV6bmY+3edd9VMMDa1quWzuPTR/+sZR7gX/OzKe1XFZl5tuPcJ9PAp8F1mTmU4E/B2KWdQ/3DapwbbWWarjnaMz0fC4/7PmsaOzZk5mfzMyfaNSQVMMsAPuBFS2P82xm94TnmJmfz8yfpRomuRv4y6N7OupmBreuAF4A/HzjJX5bsjqt7zPApRGxIiJOpxrHbdc3gee1LF8PPD8iNkTE0sbljIh4wREe48nAdzLzexHxUuBNLbftoxo2eN6M94QbGtt7U+OA6euB0xt1LIS/BH6l8aogImJl42DqkyPi1Ij4mYhYTnVM4btUwycAdwCvjIhnRMSzqfbMZ/NNYKDxiomI6I+IV0fESqphmYmWx1UPMbgXsYh4LvDLVGO3DzT
OTpiYx5kI76AaPniA6qDaVfPY/IeB10XE/0bEn2Tmo8DPAW+g2ht+gObBvtn8KnBZRDwK/C7VODQAmfkYcDnwpcZQxZmtd8zMb1ONq18EfBt4F3BuZn5rHs9hVpm5nWqc+yNUBz3voTo4SuM5fZDq4OsDwLOozhCBasjqP6gOQt4I/PURNvM3ja/fjojbqP6eL6Lq33eoxvt/dSGej7pLZDqRgiSVxD1uSSqMwS1JhTG4JakwBrckFaYjH4azevXqHBgY6MRDt23//v2sXLmy1hq6hb1oshdN9qKpG3qxY8eOb2XmM9tZtyPBPTAwwPbt2zvx0G0bHx9naGio1hq6hb1oshdN9qKpG3oREW2/89ihEkkqjMEtSYUxuCWpMAa3JBXG4JakwhjcklQYg1uSCmNwS1JhDG5JKozBLUmFMbglqTAGtyQVxuCWpMIY3JJUGINbkgpjcEtSYQxuSSpMW8EdEb8ZEXdFxJcj4lMR8aROFyapA/7wD2Fs7PuvGxurrlcx5gzuiDgJ+A1gfWa+EOgD3tDpwiR1wBlnwHnnNcN7bKxaPuOMeuvSvLQ75+QS4MSIeBxYAXyjcyVJ6pjhYbj2WjjvPAbOOQf+8R+r5eHhuivTPERmzr1SxIXA5cB3gRsz8/wZ1tkEbALo7+9ft2XLlgUudX4mJiZYtWpVrTV0C3vRZC8qA5s3M3DNNezesIHdGzfWXU7tuuH3Ynh4eEdmrm9r5cw84gV4OnAT8ExgKfC3wJuPdJ9169Zl3cbGxuouoWvYiyZ7kZk33ZS5enXu2rAhc/XqanmR64bfC2B7zpHH05d2Dk6eDezKzH2Z+TjwGeDlR/EPRVLdpse0r7222tNuDJs84YClulo7wb0HODMiVkREACPAzs6WJakjbr31+8e0p8e8b7213ro0L3MenMzMWyLiOuA24ABwO/DRThcmqQPe9a4nXjc87MHJwrR1Vklmvhd4b4drkSS1wXdOSlJhDG5JKozBLUmFMbglqTAGtyQVxuCWpMIY3JJUGINbkgpjcEtSYQxuSSqMwS1JhTG4JakwBrckFcbgVuc4o3iTvWjqll50Sx1HweBW5zijeJO9aOqWXnRLHUeh3VnepflrmVGct78drrhi8c4obi+auqUXBc947x63Omt4uPrjfN/7qq8F/FF0jL1o6pZeNOoYuOaaon4mBrc6a2ys2qO65JLq62KelNZeNHVLLxp17N6woaificGtzmmZUZzLLlvcM4rbi6Zu6UXBM94b3OocZxRvshdN3dKLbqnjKHhwUp3jjOJN9qKpW3rRLXUcBfe4JakwBrckFcbgVrlmeufbbAp5R5zUDoNb5Tr8nW+zKegdcVI7DG6Vq/UdeLOFd+upZwUcdJLaYXCrbEcKb0NbPcrgVvlmCm9DWz3M87jVG7rlg4uk48A9bvWObvngIqnDDG71jm754CKpwwxu9YZu+eAi6TgwuFW+mQ5EtnOqoFQog1tlO9LZI4a3elRbwR0RT4uI6yLi7ojYGRGDnS5MmlM7p/wZ3upB7e5xfxj4XGaeBrwI2Nm5kqQ2Hf55ykda7+KLv/9zlv3sEhVszvO4I+IpwE8BbwHIzClgqrNlSW2Y6fOUZzL9mSbXXlstt+6pSwVq5w04zwP2AVdFxIuAHcCFmbm/o5VJC6Xg2bylmURmHnmFiPXAvwFnZeYtEfFh4JHMvOSw9TYBmwD6+/vXbdmypUMlt2diYoJVq1bVWkO3sBeVgc2bGbjmGnZv2FDNMbjI+XvR1A29GB4e3pGZ69taOTOPeAGeDexuWf5J4B+OdJ9169Zl3cbGxuouoWvYi8y86abM1atz14YNmatXV8uLnL8XTd3QC2B7zpHH05c5D05m5gPA3og4tXHVCPCVo/iHItWj4Nm8pZm0e1bJrwOjEfGfwI8Dv9+5kqQFVvBs3tJM2vp0wMy8A2hv7EXqNgXP5i3NxHdOSlJhDG5JKozBLUmFMbglqTAGtyQVxuCWpMIY3JJ
UGINbkgpjcEtSYQxuSSqMwS1JhTG4JakwBrckFcbglqTCGNzScbRt7zY+8K8fYNvebXWXUjt7cfTa+jxuScdu295tjFw9wtTBKZb1LWPrBVsZXDNYd1m1sBfHxj1u6TgZ3z3O1MEpDuZBpg5OMb57vO6SamMvjo3BLR0nQwNDLOtbRl/0saxvGUMDQ3WXVBt7cWwcKpGOk8E1g2y9YCvju8cZGhha1EMD9uLYGNzScTS4ZtCQarAXR8+hEkkqjMEtSYUxuCWpMAa3JBXG4JakwhjcklQYg1uSCmNwS1JhDG5JKozBLUmFMbglqTAGtyQVxuCW5mn0zlEGPjTACb93AgMfGmD0ztG6S9Ii46cDSvMweucom/5+E489/hgA9z58L5v+fhMA5//o+XWWpkXEPW5pHt6z9T3/H9rTHnv8Md6z9T01VaTFqO3gjoi+iLg9Iq7vZEFSN9vz8J55XS91wnz2uC8EdnaqkF7kLNa9Z+1T187reqkT2gruiDgZeBVwZWfL6R3Ts1hfMnYJI1ePGN494vKRy1mxdMX3Xbdi6QouH7m8poq0GEVmzr1SxHXAB4AnA7+VmefOsM4mYBNAf3//ui1btixwqfMzMTHBqlWratv+6J5RNu/azCEOcQInsPGUjZy/tp6DV3X3opssRC/+6Zv/xJW7ruTByQd51vJn8bZT3sbZ/WcvUIXHj78XTd3Qi+Hh4R2Zub6tlTPziBfgXODPGt8PAdfPdZ9169Zl3cbGxmrd/s17bs4T339i9v1eX574/hPz5j0311ZL3b3oJvaiyV40dUMvgO05R7ZOX9o5HfAs4NUR8UrgScBTIuITmfnmo/insmg4i7WkTpkzuDPzYuBigIgYohoqMbTb4CzWkjrB87glqTDzeudkZo4D4x2pRJLUFve4JakwBrckFcbglqTCGNySVBiDW5IKY3BLUmEMbkkqjMEtSYUxuCWpMAa3JBXG4JakwhjcklQYg1uSCmNwS1JhDG51nLPdSwtrXp/HLc3X9Gz3UwenWNa3jK0XbHVWIOkYucetjhrfPc7UwSkO5kGmDk4xvnu87pKk4hnc6qihgSGW9S2jL/pY1reMoYGhukuSiudQiTrK2e6lhWdwq+Oc7V5aWA6VSFJhDG5JKozBLUmFMbglqTAGtyQVxuCWpMIY3JJUGINbkgpjcEtSYQxuSSqMwS1JhTG4JakwBrckFcbglqTCzBncEbEmIsYiYmdE3BURFx6PwiRJM2vn87gPABdl5m0R8WRgR0R8ITO/0uHaJEkzmHOPOzPvz8zbGt8/CuwETup0YVoY2/ZuY3TPqDOsSz1kXmPcETEAvBi4pRPFaGFNz7C+eddmRq4eMbylHtH21GURsQr4NPDOzHxkhts3AZsA+vv7GR8fX6gaj8rExETtNdRtdM8okwcmOcQhJg9MsnlsM5NrJ+suq1b+XjTZi6bSehGZOfdKEUuB64HPZ+Yfz7X++vXrc/v27QtQ3tEbHx9naGio1hrqNr3HPXlgkuVLlrP1gq2Lfu5Hfy+a7EVTN/QiInZk5vp21m3nrJIAPgbsbCe01T2mZ1jfeMpGQ1vqIe0MlZwFbADujIg7Gte9OzNv6FxZWiiDawaZXDtpaEs9ZM7gzswvAnEcapEktcF3TkpSYQxuSSqMwS1JhTG4JakwBrckFcbglqTCGNySVBiDW5IKY3BLUmEMbkkqjMEtSYUxuCWpMAa3JBXG4JakwhjcklQYg1uSCmNwS1JhDG5JKozBLUmFMbglqTAGtyQVxuCWpMIY3JJUGINbkgpjcEtSYQxuSSqMwS1JhTG4JakwBrckFcbglqTCGNySVBiDW5IKY3BLUmEMbkkqjMEtSYVpK7gj4hUR8dWIuCcifqfTRUmSZjdncEdEH/CnwDnA6cAbI+L0Thd2LLbt3cbonlG27d1WdymStODa2eN+KXBPZn4tM6eALcBrOlvW0du2dxsjV4+weddmRq4eMbwl9ZwlbaxzErC3Zfk
+4GWHrxQRm4BNAP39/YyPjy9EffM2umeUyQOTHOIQkwcm2Ty2mcm1k7XU0i0mJiZq+3l0G3vRZC+aSutFO8EdM1yXT7gi86PARwHWr1+fQ0NDx1bZUVq+dzmje6vwXr5kORuHNzK4ZrCWWrrF+Pg4df08uo29aLIXTaX1op2hkvuANS3LJwPf6Ew5x25wzSBbL9jKxlM2svWCrYs+tCX1nnb2uG8FfjgiTgG+DrwBeFNHqzpGg2sGmVw7aWhL6klzBndmHoiIdwCfB/qAzZl5V8crkyTNqJ09bjLzBuCGDtciSWqD75yUpMIY3JJUGINbkgpjcEtSYQxuSSqMwS1JhTG4JakwBrckFcbglqTCGNySVBiDW5IKY3BLUmEMbkkqjMEtSYUxuCWpMAa3JBUmMp8w7++xP2jEPuDeBX/g+VkNfKvmGrqFvWiyF032oqkbevHczHxmOyt2JLi7QURsz8z1ddfRDexFk71oshdNpfXCoRJJKozBLUmF6eXg/mjdBXQRe9FkL5rsRVNRvejZMW5J6lW9vMctST3J4JakwvRkcEfEKyLiqxFxT0T8Tt311CUi1kTEWETsjIi7IuLCumuqU0T0RcTtEXF93bXUKSKeFhHXRcTdjd+NwbprqktE/Gbjb+PLEfGpiHhS3TW1o+eCOyL6gD8FzgFOB94YEafXW1VtDgAXZeYLgDOBX1vEvQC4ENhZdxFd4MPA5zLzNOBFLNKeRMRJwG8A6zPzhUAf8IZ6q2pPzwU38FLgnsz8WmZOAVuA19RcUy0y8/7MvK3x/aNUf6An1VtVPSLiZOBVwJV111KniHgK8FPAxwAycyozH6q3qlotAU6MiCXACuAbNdfTll4M7pOAvS3L97FIw6pVRAwALwZuqbeS2nwIeBdwqO5CavY8YB9wVWPY6MqIWFl3UXXIzK8DfwTsAe4HHs7MG+utqj29GNwxw3WL+pzHiFgFfBp4Z2Y+Unc9x1tEnAs8mJk76q6lCywBXgJckZkvBvYDi/I4UEQ8nerV+CnAc4CVEfHmeqtqTy8G933Ampblkynk5U8nRMRSqtAezczP1F1PTc4CXh0Ru6mGzn4mIj5Rb0m1uQ+4LzOnX3ldRxXki9HZwK7M3JeZjwOfAV5ec01t6cXgvhX44Yg4JSKWUR1s+GzNNdUiIoJqLHNnZv5x3fXUJTMvzsyTM3OA6vfhpswsYs9qoWXmA8DeiDi1cdUI8JUaS6rTHuDMiFjR+FsZoZADtUvqLmChZeaBiHgH8Hmqo8SbM/Oumsuqy1nABuDOiLijcd27M/OGGmtS/X4dGG3s2HwNeGvN9dQiM2+JiOuA26jOwLqdQt767lveJakwvThUIkk9zeCWpMIY3JJUGINbkgpjcEtSYQxuSSqMwS1Jhfk/hpVzVg+cfn8AAAAASUVORK5CYII=\n",
  133. "text/plain": [
  134. "<Figure size 432x288 with 1 Axes>"
  135. ]
  136. },
  137. "metadata": {},
  138. "output_type": "display_data"
  139. }
  140. ],
  141. "source": [
  142. "C1 = [1, 2, 4, 8, 9, 11]\n",
  143. "C2 = list(set(range(12)) - set(C1))\n",
  144. "X0C1, X1C1 = X0[C1], X1[C1]\n",
  145. "X0C2, X1C2 = X0[C2], X1[C2]\n",
  146. "plt.figure()\n",
  147. "plt.title('2nd iteration results')\n",
  148. "plt.axis([-1, 9, -1, 9])\n",
  149. "plt.grid(True)\n",
  150. "plt.plot(X0C1, X1C1, 'rx')\n",
  151. "plt.plot(X0C2, X1C2, 'g.')\n",
  152. "plt.plot(3.8,6.4,'rx',ms=12.0)\n",
  153. "plt.plot(4.57,4.14,'g.',ms=12.0);"
  154. ]
  155. },
  156. {
  157. "cell_type": "markdown",
  158. "metadata": {},
  159. "source": [
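  160. "We repeat the procedure once more: move the centroids to their new positions, recompute the distance from every observation to the new centroids, and reassign each observation to the nearer one. The results are shown in the table below:\n",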
  161. "![data_2](images/data_2.png)\n",
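  "\n",
  "As a worked check of this update, taking the second assignment `C1 = [1, 2, 4, 8, 9, 11]` (0-based indices) from the code above and the remaining six observations as the second cluster, the new centroids are\n",
  "$$\n",
  "u_1 = \\frac{1}{6}\\left[(5,7) + (7,7) + (4,6) + (8,7) + (6,8) + (3,7)\\right] = (5.5, 7)\n",
  "$$\n",
  "$$\n",
  "u_2 = \\frac{1}{6}\\left[(7,5) + (3,3) + (1,4) + (0,0) + (2,2) + (5,5)\\right] = \\left(3, \\tfrac{19}{6}\\right) \\approx (3, 3.17)\n",
  "$$\n",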
  162. "\n",
  163. "The plot is shown below:\n"
  164. ]
  165. },
  166. {
  167. "cell_type": "code",
  168. "execution_count": 9,
  169. "metadata": {},
  170. "outputs": [
  171. {
  172. "data": {
  173. "image/png": "iVBORw0KGgoAAAANSUhEUgAAAW4AAAEICAYAAAB/Dx7IAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADl0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uIDIuMi4yLCBodHRwOi8vbWF0cGxvdGxpYi5vcmcvhp/UCwAAFL5JREFUeJzt3X2QXXV9x/H314REQhC0wVhIwuJDVaqjlqCuVLvb2KnUp5lOS6EYqmmbKa2KT8UHpLbS6NSxFqwWG3UZwa2UQadVC2oNu1U7EUnAijHaMiRko6DQCrigG5J8+8c9y13DPtzN3ptzf7vv18zO5p49e8/3fHP3s7/7O3fvLzITSVI5HlV3AZKk2TG4JakwBrckFcbglqTCGNySVBiDW5IKY3CrLSIiI+LJU3zt+oj4gyNd0yE1jEbEE+usYS4i4i8j4pN116HuYHCLiPhkRNwZEfdHxH9HxB+18/4z88zM/ER1rFdHxNfaef+HiojhQ88hM5dn5u2dPO6REhE91S/KxXXXonoY3AJ4L9CTmY8BXgH8dUScNtmOdYdF3cefTjfXpvnF4BaZuSMzx8ZvVh9PAoiIvojYGxFvjYi7gCuq7X9ejdJ/EBEbprv/8RFwRDwd+AjQW01d3Ft9fWlEvD8i9kTEDyPiIxFx9FTHj4jHRsTnI+LuiPhx9e9V1f6bgBcCH6qO8aFq+8NTORFxXERcWX3/HRHxzoh4VPW1V0fE16p6fhwRuyLizGnObXdV27eAByJicUScGBGfru5/V0S8fsL+z42IbdWzmx9GxAcmnuck9/3iSQ77lerzvdU59kbEkyPiPyLivoi4JyL+ebr/E5XN4BYAEfEPEfEg8F3gTuC6CV9+AvA44GRgY0S8BHgL8BvAU4DJwuURMnMn8CfA1mrq4vjqS38D/BLwbODJwEnAX0x1fBqP2yuq22uAnwIfqo5xEfBV4LXVMV47SSl/DxwHPBH4NeA84DUTvv484HvACuB9wMcjIqY5tXOAlwLHAweBzwH/VZ3HOuANEfGb1b6XAZdVz26eBFwzzf1O5UXV5+Orc9wKXAJ8CXgssKo6R81TBrcAyMw/BY6lMVr9DDA24csHgXdl5lhm/hQ4C7giM7+dmQ8Af3m4x60C8Y+BN2bm/2XmT4D3AGdPdfzM/N/M/HRmPljtv4lGALdyvEXA7wFvz8yfZOZu4G+B9RN2uyMzP5qZB4BPAL8IrJzmbj+YmSNVb04HTsjMd2fmvmpe/aMTzuch4MkRsSIzRzPz663U3YKHaPwiOzEzf5aZHb2OoHoZ3HpYZh6ofuBXAedP+NLdmfmzCbdPBEYm3L5jDoc9AVgGbI+Ie6vpky9U2yc9fkQsi4h/rKY57qcxdXB8FcozWQEsOaTmO2iMjsfdNf6PzHyw+ufyae5zYi9OBk4cP5fqfN5BM/j/kMazi+9GxE0R8bIWam7FhUAA34iIHTNNX6lsXkzRZBZTzXFXDn0LyTuB1RNur5nFfR96X/fQmOr45cz8fovf82bgqcDzMvOuiHg2cAuN4Jps/0OPNz46/U61bQ0w1bFbMfF4I8CuzHzKpDtm/g9wTjWn/tvAtRHxC8ADNH6BAQ8/MzhhsvtgkvPLzLtoPHMhIn4V+HJEfCUzbzuM81GXc8S9wEXE4yPi7IhYHhGLqrnYc4Abpvm2a4BXR8SpEbEMeNcsDvlDYFVELAHIzIM0phL+LiIeX9V00oQ54ckcSyPs742Ix01y/B/SmL9+hGr64xpgU0QcGxEnA28C2vUa6W8A91cXLI+uevqMiDgdICJeFREnVOd9b/U9B4D/Bh4dES+NiKOAdwJLpzjG3TSmjx4+x4j43fELtMCPaYT7gTadk7qMwa2kMS2yl8YP/PuBN2Tmv075DZnXA5fSCPfbmD7kD3UDsAO4KyLuqba9tbqfr1dTH1+mMaKeyqXA0TRGz1+nMbUy0WXA71S
\n",
  174. "text/plain": [
  175. "<Figure size 432x288 with 1 Axes>"
  176. ]
  177. },
  178. "metadata": {},
  179. "output_type": "display_data"
  180. }
  181. ],
  182. "source": [
  183. "C1 = [0, 1, 2, 4, 8, 9, 10, 11]\n",
  184. "C2 = list(set(range(12)) - set(C1))\n",
  185. "X0C1, X1C1 = X0[C1], X1[C1]\n",
  186. "X0C2, X1C2 = X0[C2], X1[C2]\n",
  187. "plt.figure()\n",
  188. "plt.title('3rd iteration results')\n",
  189. "plt.axis([-1, 9, -1, 9])\n",
  190. "plt.grid(True)\n",
  191. "plt.plot(X0C1, X1C1, 'rx')\n",
  192. "plt.plot(X0C2, X1C2, 'g.')\n",
  193. "plt.plot(5.5,7.0,'rx',ms=12.0)\n",
  194. "plt.plot(2.2,2.8,'g.',ms=12.0);"
  195. ]
  196. },
  197. {
  198. "cell_type": "markdown",
  199. "metadata": {},
  200. "source": [
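  201. "Repeating the procedure above once more shows that the cluster centroids no longer move. K-Means stops iterating once a stopping condition is met. Typically, the condition is that the change in the cost function between two consecutive iterations falls below a threshold, or that the change in the centroid positions between two consecutive iterations falls below a threshold. If these thresholds are small enough, K-Means converges to an optimum. That optimum is only guaranteed to be a local one, though, and is not necessarily the global optimum.\n",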
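       "\n",
       "As a sketch of these stopping rules (the symbols $J$, $\\mu_k$, $C_k$, $\\varepsilon$ and $\\delta$ are notation assumed here, not taken from the notebook), write the cost after iteration $t$ as the within-cluster sum of squared distances:\n",
       "\n",
       "$$J^{(t)} = \\sum_{k=1}^{K} \\sum_{x_i \\in C_k^{(t)}} \\lVert x_i - \\mu_k^{(t)} \\rVert^2$$\n",
       "\n",
       "The loop can then stop when either $|J^{(t)} - J^{(t-1)}| < \\varepsilon$ or $\\max_k \\lVert \\mu_k^{(t)} - \\mu_k^{(t-1)} \\rVert < \\delta$, for small user-chosen thresholds $\\varepsilon$ and $\\delta$.\n",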
  202. "\n"
  203. ]
  204. },
  205. {
  206. "cell_type": "markdown",
  207. "metadata": {},
  208. "source": [
  209. "## Program"
  210. ]
  211. },
  212. {
  213. "cell_type": "code",
  214. "execution_count": 10,
  215. "metadata": {},
  216. "outputs": [
  217. {
  218. "data": {
  219. "text/html": [
  220. "<div>\n",
  221. "<style scoped>\n",
  222. " .dataframe tbody tr th:only-of-type {\n",
  223. " vertical-align: middle;\n",
  224. " }\n",
  225. "\n",
  226. " .dataframe tbody tr th {\n",
  227. " vertical-align: top;\n",
  228. " }\n",
  229. "\n",
  230. " .dataframe thead th {\n",
  231. " text-align: right;\n",
  232. " }\n",
  233. "</style>\n",
  234. "<table border=\"1\" class=\"dataframe\">\n",
  235. " <thead>\n",
  236. " <tr style=\"text-align: right;\">\n",
  237. " <th></th>\n",
  238. " <th>sepal-length</th>\n",
  239. " <th>sepal-width</th>\n",
  240. " <th>petal-length</th>\n",
  241. " <th>petal-width</th>\n",
  242. " <th>class</th>\n",
  243. " </tr>\n",
  244. " </thead>\n",
  245. " <tbody>\n",
  246. " <tr>\n",
  247. " <th>0</th>\n",
  248. " <td>5.1</td>\n",
  249. " <td>3.5</td>\n",
  250. " <td>1.4</td>\n",
  251. " <td>0.2</td>\n",
  252. " <td>Iris-setosa</td>\n",
  253. " </tr>\n",
  254. " <tr>\n",
  255. " <th>1</th>\n",
  256. " <td>4.9</td>\n",
  257. " <td>3.0</td>\n",
  258. " <td>1.4</td>\n",
  259. " <td>0.2</td>\n",
  260. " <td>Iris-setosa</td>\n",
  261. " </tr>\n",
  262. " <tr>\n",
  263. " <th>2</th>\n",
  264. " <td>4.7</td>\n",
  265. " <td>3.2</td>\n",
  266. " <td>1.3</td>\n",
  267. " <td>0.2</td>\n",
  268. " <td>Iris-setosa</td>\n",
  269. " </tr>\n",
  270. " <tr>\n",
  271. " <th>3</th>\n",
  272. " <td>4.6</td>\n",
  273. " <td>3.1</td>\n",
  274. " <td>1.5</td>\n",
  275. " <td>0.2</td>\n",
  276. " <td>Iris-setosa</td>\n",
  277. " </tr>\n",
  278. " <tr>\n",
  279. " <th>4</th>\n",
  280. " <td>5.0</td>\n",
  281. " <td>3.6</td>\n",
  282. " <td>1.4</td>\n",
  283. " <td>0.2</td>\n",
  284. " <td>Iris-setosa</td>\n",
  285. " </tr>\n",
  286. " </tbody>\n",
  287. "</table>\n",
  288. "</div>"
  289. ],
  290. "text/plain": [
  291. " sepal-length sepal-width petal-length petal-width class\n",
  292. "0 5.1 3.5 1.4 0.2 Iris-setosa\n",
  293. "1 4.9 3.0 1.4 0.2 Iris-setosa\n",
  294. "2 4.7 3.2 1.3 0.2 Iris-setosa\n",
  295. "3 4.6 3.1 1.5 0.2 Iris-setosa\n",
  296. "4 5.0 3.6 1.4 0.2 Iris-setosa"
  297. ]
  298. },
  299. "execution_count": 10,
  300. "metadata": {},
  301. "output_type": "execute_result"
  302. }
  303. ],
  304. "source": [
  305. "# This line configures matplotlib to show figures embedded in the notebook, \n",
  306. "# instead of opening a new window for each figure. More about that later. \n",
  307. "# If you are using an old version of IPython, try using '%pylab inline' instead.\n",
  308. "%matplotlib inline\n",
  309. "\n",
  310. "# import libraries\n",
  311. "from numpy import *\n",
  312. "import matplotlib.pyplot as plt\n",
  313. "import pandas as pd\n",
  314. "import random\n",
  315. "\n",
  316. "# Load dataset\n",
  317. "names = ['sepal-length', 'sepal-width', 'petal-length', 'petal-width', 'class']\n",
  318. "dataset = pd.read_csv(\"iris.csv\", header=0, index_col=0)\n",
  319. "dataset.head()\n"
  320. ]
  321. },
  322. {
  323. "cell_type": "code",
  324. "execution_count": 12,
  325. "metadata": {
  326. "lines_to_next_cell": 2
  327. },
  328. "outputs": [],
  341. "source": [
  342. "# Encode the class labels: map the three species to 0, 1, 2\n",
  343. "\n",
  344. "dataset.loc[dataset['class'] == 'Iris-setosa', 'class'] = 0\n",
  345. "dataset.loc[dataset['class'] == 'Iris-versicolor', 'class'] = 1\n",
  346. "dataset.loc[dataset['class'] == 'Iris-virginica', 'class'] = 2"
  347. ]
  348. },
  349. {
  350. "cell_type": "code",
  351. "execution_count": 13,
  352. "metadata": {
  353. "lines_to_next_cell": 2
  354. },
  355. "outputs": [],
  356. "source": [
  357. "def originalDatashow(dataSet):\n",
  358. "    # Plot the raw sample points\n",
  359. "    num, dim = shape(dataSet)\n",
  360. "    marksamples = ['ob']  # marker style for the samples\n",
  361. "    for i in range(num):\n",
  362. "        plt.plot(dataSet.iat[i, 0], dataSet.iat[i, 1], marksamples[0], markersize=5)\n",
  363. "    plt.title('original dataset')\n",
  364. "    plt.xlabel('sepal length')\n",
  365. "    plt.ylabel('sepal width')\n",
  366. "    plt.show()"
  367. ]
  368. },
  369. {
  370. "cell_type": "code",
  371. "execution_count": 14,
  372. "metadata": {
  373. "lines_to_end_of_cell_marker": 2,
  374. "scrolled": true
  375. },
  376. "outputs": [
  377. {
  378. "data": {
  379. "image/png": "\n",
  380. "text/plain": [
  381. "<Figure size 432x288 with 1 Axes>"
  382. ]
  383. },
  384. "metadata": {},
  385. "output_type": "display_data"
  386. }
  387. ],
  388. "source": [
  389. "# Select the two features used as sample data\n",
  390. "datamat = dataset.loc[:, ['sepal-length', 'sepal-width']]\n",
  391. "# Ground-truth labels\n",
  392. "labels = dataset.loc[:, ['class']]\n",
  393. "# Show the raw data\n",
  394. "originalDatashow(datamat)"
  395. ]
  396. },
  397. {
  398. "cell_type": "code",
  399. "execution_count": 15,
  400. "metadata": {},
  401. "outputs": [],
  402. "source": [
  403. "import random\n",
  404. "\n",
  405. "def randChosenCent(dataSet, k):\n",
  406. "    \"\"\"Initialize the cluster centers by picking k distinct samples at random.\"\"\"\n",
  407. "\n",
  408. "    # number of samples\n",
  409. "    m = shape(dataSet)[0]\n",
  410. "    # indices of the samples chosen as centers\n",
  411. "    centroidsIndex = []\n",
  412. "\n",
  413. "    # list of all sample indices\n",
  414. "    dataIndex = list(range(m))\n",
  415. "    if False:  # disabled alternative: draw the k samples one at a time\n",
  416. "        for i in range(k):\n",
  417. "            # draw a random position (randint includes both endpoints)\n",
  418. "            randIndex = random.randint(0, len(dataIndex) - 1)\n",
  419. "            # record the index of the sample that was drawn\n",
  420. "            centroidsIndex.append(dataIndex[randIndex])\n",
  421. "            # remove it so that it cannot be drawn again\n",
  422. "            del dataIndex[randIndex]\n",
  423. "    else:\n",
  424. "        random.shuffle(dataIndex)\n",
  425. "        centroidsIndex = dataIndex[:k]\n",
  426. "\n",
  427. "    # look up the chosen samples by position\n",
  428. "    centroids = dataSet.iloc[centroidsIndex]\n",
  429. "    return asmatrix(centroids)"
  430. ]
  431. },
  432. {
  433. "cell_type": "code",
  434. "execution_count": 24,
  435. "metadata": {},
  436. "outputs": [],
  437. "source": [
  438. "\n",
  439. "def distEclud(vecA, vecB):\n",
  440. "    \"\"\"Euclidean distance between two vectors.\"\"\"\n",
  441. "    return sqrt(sum(power(vecA - vecB, 2)))  # equivalent to linalg.norm(vecA - vecB)\n",
  442. "\n",
  443. "\n",
  444. "def kMeans(dataSet, k):\n",
  445. "    # total number of samples\n",
  446. "    m = shape(dataSet)[0]\n",
  447. "    # per-sample assignment: [cluster index, squared distance] (m rows x 2 columns)\n",
  448. "    clusterAssment = asmatrix(zeros((m, 2)))\n",
  449. "\n",
  450. "    # step 1: initialize the cluster centers with randomly chosen samples\n",
  451. "    centroids = randChosenCent(dataSet, k)\n",
  452. "    print('initial centroids =', centroids)\n",
  453. "\n",
  454. "    # flag: True if any sample changed cluster during the current iteration\n",
  455. "    clusterChanged = True\n",
  456. "    # iteration counter\n",
  457. "    iterTime = 0\n",
  458. "\n",
  459. "    # iterate until no sample changes its assignment\n",
  460. "    while clusterChanged:\n",
  461. "        clusterChanged = False\n",
  462. "\n",
  463. "        # step 2: assign each sample to the cluster of its nearest center\n",
  464. "        for i in range(m):\n",
  465. "            # start with an infinite distance\n",
  466. "            minDist = inf\n",
  467. "            # start with an invalid cluster index\n",
  468. "            minIndex = -1\n",
  469. "            # compare the sample against each of the k centers\n",
  470. "            for j in range(k):\n",
  471. "                # distance from the i-th sample to the j-th center\n",
  472. "                distJI = distEclud(centroids[j, :], dataSet.values[i, :])\n",
  473. "                # keep the smallest distance seen so far\n",
  474. "                if distJI < minDist:\n",
  475. "                    # update the smallest distance\n",
  476. "                    minDist = distJI\n",
  477. "                    # remember the corresponding cluster index\n",
  478. "                    minIndex = j\n",
  479. "            # the assignment differs from the previous one: set clusterChanged\n",
  480. "            if clusterAssment[i, 0] != minIndex:\n",
  481. "                clusterChanged = True\n",
  482. "            clusterAssment[i, :] = minIndex, minDist ** 2  # assign the sample to the nearest cluster\n",
  483. "\n",
  484. "        iterTime += 1\n",
  485. "        sse = sum(clusterAssment[:, 1])\n",
  486. "        print('SSE after iteration %d: %f' % (iterTime, sse))\n",
  487. "\n",
  488. "        # step 3: update the cluster centers\n",
  489. "        for cent in range(k):  # recompute each center once all samples are assigned\n",
  490. "            # samples in this cluster; nonzero(...)[0] gives the row indices where the assignment equals cent (without [0] the column indices are returned too)\n",
  491. "            ptsInClust = dataSet.iloc[nonzero(clusterAssment[:, 0].A == cent)[0]]\n",
  492. "            # new center: mean over the rows (axis=0), one value per feature\n",
  493. "            centroids[cent, :] = mean(ptsInClust, axis=0)\n",
  494. "    return centroids, clusterAssment\n"
  495. ]
  496. },
  497. {
  498. "cell_type": "code",
  499. "execution_count": 16,
  500. "metadata": {},
  501. "outputs": [],
  514. "source": [
  515. "# Run k-means clustering\n",
  516. "k = 3  # number of clusters, chosen by the user\n",
  517. "mycentroids, clusterAssment = kMeans(datamat, k)"
  518. ]
  519. },
  520. {
  521. "cell_type": "code",
  522. "execution_count": 26,
  523. "metadata": {},
  524. "outputs": [],
  525. "source": [
  526. "def datashow(dataSet, k, centroids, clusterAssment):  # show the clustering result in 2D\n",
  527. "    from matplotlib import pyplot as plt\n",
  528. "    num, dim = shape(dataSet)  # number of samples num, number of features dim\n",
  529. "\n",
  530. "    if dim != 2:\n",
  531. "        print('Sorry, the dataset must be 2-dimensional!')\n",
  532. "        return 1\n",
  533. "    marksamples = ['or', 'ob', 'og', 'ok', '^r', '^b', '<g']  # marker styles for the samples\n",
  534. "    if k > len(marksamples):\n",
  535. "        print('Sorry, k is too large; add more entries to marksamples!')\n",
  536. "        return 1\n",
  537. "    # plot all samples\n",
  538. "    for i in range(num):\n",
  539. "        markindex = int(clusterAssment[i, 0])  # cluster index, converted from a matrix entry to int\n",
  540. "        # the two features are the x and y coordinates; marker style and size follow\n",
  541. "        plt.plot(dataSet.iat[i, 0], dataSet.iat[i, 1], marksamples[markindex], markersize=6)\n",
  542. "\n",
  543. "    # plot the cluster centers\n",
  544. "    markcentroids = ['o', '*', '^']  # marker styles for the cluster centers\n",
  545. "    label = ['0', '1', '2']\n",
  546. "    c = ['yellow', 'pink', 'red']\n",
  547. "    for i in range(k):\n",
  548. "        plt.plot(centroids[i, 0], centroids[i, 1], markcentroids[i], markersize=15, label=label[i], c=c[i])\n",
  549. "    plt.legend(loc='upper left')  # legend\n",
  550. "    plt.xlabel('sepal length')\n",
  551. "    plt.ylabel('sepal width')\n",
  552. "\n",
  553. "    plt.title('k-means cluster result')  # title\n",
  554. "    plt.show()\n",
  555. "\n",
  556. "\n",
  557. "# Plot the ground-truth classes\n",
  558. "def trgartshow(dataSet, k, labels):\n",
  559. "    from matplotlib import pyplot as plt\n",
  560. "\n",
  561. "    num, dim = shape(dataSet)\n",
  562. "    label = ['0', '1', '2']\n",
  563. "    marksamples = ['ob', 'or', 'og', 'ok', '^r', '^b', '<g']\n",
  564. "    # draw the grouped scatter plot one sample at a time\n",
  565. "    for i in range(num):\n",
  566. "        plt.plot(dataSet.iat[i, 0], dataSet.iat[i, 1], marksamples[int(labels.iat[i, 0])], markersize=6)\n",
  567. "    for i in range(0, num, 50):\n",
  568. "        plt.plot(dataSet.iat[i, 0], dataSet.iat[i, 1], marksamples[int(labels.iat[i, 0])], markersize=6,\n",
  569. "                 label=label[int(labels.iat[i, 0])])\n",
  570. "    plt.legend(loc='upper left')\n",
  571. "\n",
  572. "    # add axis labels and a title\n",
  573. "    plt.xlabel('sepal length')\n",
  574. "    plt.ylabel('sepal width')\n",
  575. "    plt.title('iris true result')  # title\n",
  576. "\n",
  577. "    # show the figure\n",
  578. "    plt.show()\n",
  579. "    # label=labels.iat[i,0]"
  580. ]
  581. },
  582. {
  583. "cell_type": "code",
  584. "execution_count": 9,
  585. "metadata": {},
  586. "outputs": [
  587. {
  588. "data": {
  589. "image/png": "\n",
  590. "text/plain": [
  591. "<Figure size 432x288 with 1 Axes>"
  592. ]
  593. },
  594. "metadata": {
  595. "needs_background": "light"
  596. },
  597. "output_type": "display_data"
  598. },
  599. {
  600. "data": {
  601. "image/png": "\n",
  602. "text/plain": [
  603. "<Figure size 432x288 with 1 Axes>"
  604. ]
  605. },
  606. "metadata": {
  607. "needs_background": "light"
  608. },
  609. "output_type": "display_data"
  610. }
  611. ],
  612. "source": [
  613. "# Plot the clustering result and the ground truth\n",
  614. "datashow(datamat, k, mycentroids, clusterAssment)\n",
  615. "trgartshow(datamat, 3, labels)"
  616. ]
  617. },
  618. {
  619. "cell_type": "markdown",
  620. "metadata": {},
  621. "source": [
  622. "## Clustering with sklearn\n"
  623. ]
  624. },
  625. {
  626. "cell_type": "code",
  627. "execution_count": 27,
  628. "metadata": {},
  629. "outputs": [
  630. {
  631. "data": {
  632. "text/plain": [
  633. "<Figure size 432x288 with 0 Axes>"
  634. ]
  635. },
  636. "metadata": {},
  637. "output_type": "display_data"
  638. },
  639. {
  640. "data": {
  641. "image/png": "\n",
  642. "text/plain": [
  643. "<Figure size 288x288 with 1 Axes>"
  644. ]
  645. },
  646. "metadata": {
  647. "needs_background": "light"
  648. },
  649. "output_type": "display_data"
  650. }
  651. ],
  652. "source": [
  653. "from sklearn.datasets import load_digits\n",
  654. "import matplotlib.pyplot as plt \n",
  655. "from sklearn.cluster import KMeans\n",
  656. "\n",
  657. "# load the digits data\n",
  658. "digits, dig_label = load_digits(return_X_y=True)\n",
  659. "\n",
  660. "# draw one digit\n",
  661. "plt.gray() \n",
  662. "plt.matshow(digits[0].reshape([8, 8])) \n",
  663. "plt.show() \n",
  664. "\n",
  665. "# number of train/test samples\n",
  666. "N = len(digits)\n",
  667. "N_train = int(N*0.8)\n",
  668. "N_test = N - N_train\n",
  669. "\n",
  670. "# split train/test data\n",
  671. "x_train = digits[:N_train, :]\n",
  672. "y_train = dig_label[:N_train]\n",
  673. "x_test = digits[N_train:, :]\n",
  674. "y_test = dig_label[N_train:]\n",
  675. "\n"
  676. ]
  677. },
  678. {
  679. "cell_type": "code",
  680. "execution_count": 28,
  681. "metadata": {},
  682. "outputs": [
  683. {
  684. "data": {
  685. "image/png": "\n",
  686. "text/plain": [
  687. "<Figure size 432x288 with 10 Axes>"
  688. ]
  689. },
  690. "metadata": {
  691. "needs_background": "light"
  692. },
  693. "output_type": "display_data"
  694. }
  695. ],
  696. "source": [
  697. "# run k-means\n",
  698. "kmeans = KMeans(n_clusters=10, random_state=0).fit(x_train)\n",
  699. "\n",
  700. "# kmeans.labels_ - cluster label assigned to each training sample\n",
  701. "# kmeans.cluster_centers_ - cluster centers\n",
  702. "\n",
  703. "# draw the cluster centers as 8x8 images\n",
  704. "fig, axes = plt.subplots(nrows=1, ncols=10)\n",
  705. "for i in range(10):\n",
  706. "    img = kmeans.cluster_centers_[i].reshape(8, 8)\n",
  707. "    axes[i].imshow(img)"
  708. ]
  709. },
  710. {
  711. "cell_type": "markdown",
  712. "metadata": {},
  713. "source": [
  714. "## Exercise - How to calculate the accuracy?\n",
  715. "\n",
  716. "1. How to match cluster labels to ground-truth labels\n",
  717. "2. How to handle digits whose cluster assignment is ambiguous"
  718. ]
  719. },
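       {
       "cell_type": "markdown",
       "metadata": {},
       "source": [
       "One possible approach to the first question, given only as a hedged sketch (the mapping $m$ and the symbols $c_i$, $y_i$ and $N$ are notation assumed here, not part of the original exercise): map each cluster to the ground-truth label that occurs most often inside it, then score the mapped labels like ordinary predictions.\n",
       "\n",
       "$$m(c) = \\arg\\max_{l} \\, \\bigl|\\{ i : c_i = c,\\ y_i = l \\}\\bigr|, \\qquad \\mathrm{accuracy} = \\frac{1}{N} \\sum_{i=1}^{N} \\mathbf{1}[\\, m(c_i) = y_i \\,]$$\n",
       "\n",
       "Here $c_i$ is the cluster index of sample $i$, $y_i$ is its true digit, and $N$ is the number of samples. Two clusters may map to the same digit under this rule, which is one source of the ambiguity raised in the second question."
       ]
       },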
  720. {
  721. "cell_type": "markdown",
  722. "metadata": {},
  723. "source": [
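  724. "## Evaluating clustering performance\n",
  725. "\n",
  726. "Method 1: If the data used for evaluation comes with ground-truth class labels, use the Adjusted Rand Index (ARI). ARI plays a role similar to accuracy in classification, while accounting for the fact that cluster indices cannot be matched one-to-one with class labels.\n",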
  727. "\n"
  728. ]
  729. },
  730. {
  731. "cell_type": "code",
  732. "execution_count": 29,
  733. "metadata": {},
  734. "outputs": [
  735. {
  736. "name": "stdout",
  737. "output_type": "stream",
  738. "text": [
  739. "ari_train = 0.687021\n"
  740. ]
  741. }
  742. ],
  743. "source": [
  744. "from sklearn.metrics import adjusted_rand_score\n",
  745. "\n",
  746. "ari_train = adjusted_rand_score(y_train, kmeans.labels_)\n",
  747. "print(\"ari_train = %f\" % ari_train)"
  748. ]
  749. },
  750. {
  751. "cell_type": "markdown",
  752. "metadata": {},
  753. "source": [
  754. "Given the contingency table:\n",
  755. "![ARI_ct](images/ARI_ct.png)\n",
  756. "\n",
  757. "the adjusted index is:\n",
  758. "![ARI_define](images/ARI_define.png)\n",
  759. "\n",
  760. "* [ARI reference](https://davetang.org/muse/2017/09/21/adjusted-rand-index/)"
  761. ]
  762. },
  763. {
  764. "cell_type": "markdown",
  765. "metadata": {},
  766. "source": [
  767. "\n",
  768. "\n",
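  769. "Method 2: If the data used for evaluation has no class labels, use the Silhouette Coefficient to measure the quality of the clustering result. **The silhouette coefficient accounts for both the cohesion and the separation of the clusters. It takes values in [-1, 1], and a larger value indicates a better clustering.**  \n",
  770. "\n",
  771. "The silhouette coefficient is computed as follows:  \n",
  772. "1. For the $i$-th sample $x_i$ in the clustered data, compute the mean distance between $x_i$ and all other samples in its own cluster, denoted $a_i$; this quantifies within-cluster cohesion.  \n",
  773. "2. Pick a cluster $b$ that does not contain $x_i$ and compute the mean distance between $x_i$ and all samples in $b$. Repeat over all other clusters and take the smallest of these mean distances, denoted $b_i$; this quantifies between-cluster separation.  \n",
  774. "3. For sample $x_i$, the silhouette coefficient is $sc_i = \\frac{b_i - a_i}{\\max(b_i, a_i)}$.  \n",
  775. "4. Finally, average over all samples in the set $\\mathbf{X}$ to obtain the overall silhouette coefficient of the clustering result."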
  776. ]
  777. },
  778. {
  779. "cell_type": "code",
  780. "execution_count": 12,
  781. "metadata": {},
  782. "outputs": [
  783. {
  784. "data": {
  785. "image/png": "\n",
  786. "text/plain": [
  787. "<Figure size 720x720 with 6 Axes>"
  788. ]
  789. },
  790. "metadata": {
  791. "needs_background": "light"
  792. },
  793. "output_type": "display_data"
  794. },
  795. {
  796. "data": {
  797. "image/png": "\n",
  798. "text/plain": [
  799. "<Figure size 720x720 with 1 Axes>"
  800. ]
  801. },
  802. "metadata": {
  803. "needs_background": "light"
  804. },
  805. "output_type": "display_data"
  806. }
  807. ],
  808. "source": [
  809. "import numpy as np\n",
  810. "from sklearn.cluster import KMeans\n",
  811. "from sklearn.metrics import silhouette_score\n",
  812. "import matplotlib.pyplot as plt\n",
  813. "\n",
  814. "plt.rcParams['figure.figsize']=(10,10)\n",
  815. "plt.subplot(3,2,1)\n",
  816. "\n",
  817. "x1=np.array([1,2,3,1,5,6,5,5,6,7,8,9,7,9]) # initialize the raw data\n",
  818. "x2=np.array([1,3,2,2,8,6,7,6,7,1,2,1,1,3])\n",
  819. "X=np.array(list(zip(x1,x2))).reshape(len(x1),2)\n",
  820. "\n",
  821. "plt.xlim([0,10])\n",
  822. "plt.ylim([0,10])\n",
  823. "plt.title('Instances')\n",
  824. "plt.scatter(x1,x2)\n",
  825. "\n",
  826. "colors=['b','g','r','c','m','y','k','b']\n",
  827. "markers=['o','s','D','v','^','p','*','+']\n",
  828. "\n",
  829. "clusters=[2,3,4,5,8]\n",
  830. "subplot_counter=1\n",
  831. "sc_scores=[]\n",
  832. "for t in clusters:\n",
  833. "    subplot_counter +=1\n",
  834. "    plt.subplot(3,2,subplot_counter)\n",
  835. "    kmeans_model=KMeans(n_clusters=t).fit(X) # fit the KMeans model\n",
  836. "\n",
  837. "    for i,l in enumerate(kmeans_model.labels_):\n",
  838. "        plt.plot(x1[i],x2[i],color=colors[l],marker=markers[l],ls='None')\n",
  839. "\n",
  840. "    plt.xlim([0,10])\n",
  841. "    plt.ylim([0,10])\n",
  842. "\n",
  843. "    sc_score=silhouette_score(X,kmeans_model.labels_,metric='euclidean') # compute the silhouette coefficient\n",
  844. "    sc_scores.append(sc_score)\n",
  845. "\n",
  846. "    plt.title('k=%s,silhouette coefficient=%0.03f'%(t,sc_score))\n",
  847. "\n",
  848. "plt.figure()\n",
  849. "plt.plot(clusters,sc_scores,'*-') # plot the silhouette coefficient against the number of clusters\n",
  850. "plt.xlabel('Number of Clusters')\n",
  851. "plt.ylabel('Silhouette Coefficient Score')\n",
  852. "\n",
  853. "plt.show() "
  854. ]
  855. },
  856. {
  857. "cell_type": "markdown",
  858. "metadata": {},
  859. "source": [
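  860. "## How to choose K\n",
  861. "\n",
  862. "The “elbow method” gives a rough estimate of a reasonable number of clusters. K-means ultimately expects *the sum of squared distances from all data points to their assigned cluster centers to level off, so the best number of clusters can be found by observing how this value changes with K. Ideally, as the curve keeps falling and flattens out, its slope shows an inflection point (the elbow); from the K at this point onward, adding more cluster centers no longer changes the clustering structure of the data much*.\n",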
  863. "\n"
  864. ]
  865. },
  866. {
  867. "cell_type": "code",
  868. "execution_count": 18,
  869. "metadata": {},
  870. "outputs": [
  871. {
  872. "data": {
  873. "image/png": "\n",
  874. "text/plain": [
  875. "<Figure size 432x288 with 1 Axes>"
  876. ]
  877. },
  878. "metadata": {},
  879. "output_type": "display_data"
  880. }
  881. ],
  882. "source": [
  883. "%matplotlib inline\n",
  884. "import numpy as np\n",
  885. "from sklearn.cluster import KMeans\n",
  886. "from scipy.spatial.distance import cdist\n",
  887. "import matplotlib.pyplot as plt\n",
  888. "\n",
  889. "cluster1=np.random.uniform(0.5,1.5,(2,10))\n",
  890. "cluster2=np.random.uniform(5.5,6.5,(2,10))\n",
  891. "cluster3=np.random.uniform(3,4,(2,10))\n",
  892. "\n",
  893. "X=np.hstack((cluster1,cluster2,cluster3)).T\n",
  894. "plt.scatter(X[:,0],X[:,1])\n",
  895. "plt.xlabel('x1')\n",
  896. "plt.ylabel('x2')\n",
  897. "plt.show()"
  898. ]
  899. },
  900. {
  901. "cell_type": "code",
  902. "execution_count": 19,
  903. "metadata": {},
  904. "outputs": [
  905. {
  906. "data": {
  907. "image/png": "\n",
  908. "text/plain": [
  909. "<Figure size 432x288 with 1 Axes>"
  910. ]
  911. },
  912. "metadata": {},
  913. "output_type": "display_data"
  914. }
  915. ],
  916. "source": [
  917. "K=range(1,10)\n",
  918. "meandistortions=[]\n",
  919. "\n",
  920. "for k in K:\n",
  921. "    kmeans=KMeans(n_clusters=k)\n",
  922. "    kmeans.fit(X)\n",
  923. "    meandistortions.append(\\\n",
  924. "        sum(np.min(cdist(X,kmeans.cluster_centers_,'euclidean'),axis=1))/X.shape[0])\n",
  925. "\n",
  926. "plt.plot(K,meandistortions,'bx-')\n",
  927. "plt.xlabel('k')\n",
  928. "plt.ylabel('Average Dispersion')\n",
  929. "plt.title('Selecting k with the Elbow Method')\n",
  930. "plt.show()"
  931. ]
  932. },
  933. {
  934. "cell_type": "markdown",
  935. "metadata": {},
  936. "source": [
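  937. "As the plot above shows, when the number of clusters goes from 1 to 2 and then to 3, changing K substantially alters the overall clustering structure. This means each new cluster count gives the algorithm more room to converge, so these K values do not reflect the true number of clusters. Once K grows beyond 3, the average distance falls much more slowly, which means that increasing K further no longer helps convergence and suggests that K=3 is the relatively best number of clusters."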
  938. ]
  939. }
  940. ],
  941. "metadata": {
  942. "jupytext_formats": "ipynb,py",
  943. "kernelspec": {
  944. "display_name": "Python 3",
  945. "language": "python",
  946. "name": "python3"
  947. },
  948. "language_info": {
  949. "codemirror_mode": {
  950. "name": "ipython",
  951. "version": 3
  952. },
  953. "file_extension": ".py",
  954. "mimetype": "text/x-python",
  955. "name": "python",
  956. "nbconvert_exporter": "python",
  957. "pygments_lexer": "ipython3",
  958. "version": "3.6.9"
  959. }
  960. },
  961. "nbformat": 4,
  962. "nbformat_minor": 2
  963. }
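
The exercise in the notebook asks how to match cluster labels to ground-truth labels in order to compute an accuracy. One simple option, sketched below, is a majority vote: each cluster is mapped to the true label that occurs most often among its members. The function name `cluster_accuracy` and the toy labels are assumptions made for illustration; they are not part of the notebook, and this is only one possible answer to the exercise.

```python
import numpy as np

def cluster_accuracy(y_true, cluster_labels):
    """Accuracy after mapping each cluster to its most frequent true label.

    Hypothetical helper for illustration; not part of the notebook.
    """
    y_true = np.asarray(y_true)
    cluster_labels = np.asarray(cluster_labels)
    y_pred = np.empty_like(y_true)
    mapping = {}
    for c in np.unique(cluster_labels):
        members = cluster_labels == c
        # Majority vote: the true label that occurs most often in cluster c.
        values, counts = np.unique(y_true[members], return_counts=True)
        mapping[int(c)] = int(values[np.argmax(counts)])
        y_pred[members] = mapping[int(c)]
    return float(np.mean(y_pred == y_true)), mapping

# Toy example with made-up labels (not the notebook's digits data).
y_true = np.array([0, 0, 0, 1, 1, 1, 2, 2, 2])
clusters = np.array([2, 2, 2, 0, 0, 1, 1, 1, 1])
acc, mapping = cluster_accuracy(y_true, clusters)
print("accuracy = %f, mapping = %s" % (acc, mapping))
```

With the notebook's variables, the call would presumably be `cluster_accuracy(y_train, kmeans.labels_)`. Note that a majority vote can map two clusters to the same digit, which is one way the "uncertainty" in the second exercise question shows up; a one-to-one assignment would need a different matching strategy.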

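
The four steps the notebook lists for the silhouette coefficient can be written out directly in NumPy. The sketch below assumes Euclidean distance and at least two clusters, and it assigns a score of 0 to any sample that is alone in its cluster (a common convention). The function name `silhouette_by_hand` and the four toy points are assumptions made for illustration; the notebook itself uses `sklearn.metrics.silhouette_score`.

```python
import numpy as np

def silhouette_by_hand(X, labels):
    """Mean silhouette coefficient, following the four steps in the text.

    Hypothetical helper for illustration; assumes at least two clusters.
    """
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    # Pairwise Euclidean distances between all samples.
    dist = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2))
    scores = np.zeros(len(X))
    for i in range(len(X)):
        same = labels == labels[i]
        same[i] = False  # exclude x_i itself
        if not same.any():
            continue  # x_i is alone in its cluster: keep the score at 0
        # Step 1: mean distance to the other members of the same cluster.
        a_i = dist[i, same].mean()
        # Step 2: smallest mean distance to any other cluster.
        b_i = min(dist[i, labels == c].mean()
                  for c in np.unique(labels) if c != labels[i])
        # Step 3: silhouette coefficient of sample i.
        scores[i] = (b_i - a_i) / max(a_i, b_i)
    # Step 4: average over all samples.
    return float(scores.mean())

# Toy data: two well-separated pairs of points.
X_toy = [[0, 0], [0, 1], [4, 0], [4, 1]]
labels_toy = [0, 0, 1, 1]
print("silhouette = %f" % silhouette_by_hand(X_toy, labels_toy))
```

On small inputs the result can be compared against `silhouette_score(X, labels, metric='euclidean')` as a sanity check. The pairwise distance matrix uses memory quadratic in the number of samples, so this sketch is only suitable for small data sets.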
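Machine learning is increasingly applied in fields such as aircraft and robotics, where the goal is to use computers to achieve human-like intelligence and thereby make equipment intelligent and unmanned. This course aims to guide students in mastering the basic knowledge, typical methods, and techniques of machine learning, to spark their interest in the subject through concrete application cases, and to encourage them to analyze and solve the problems and challenges facing aircraft and robots from the perspective of artificial intelligence. The course covers Python programming basics, machine learning models, and the fundamentals and implementation of unsupervised learning, supervised learning, and deep learning. It also teaches how to apply machine learning to practical problems, thereby comprehensively improving students' overall ability.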