
add changes to logistic regression

pull/1/MERGE
Geoff · 4 years ago · commit 16e70e29f6
4 changed files with 65 additions and 60 deletions
  1. 3_kmeans/2-kmeans-color-vq.ipynb (+12, −12)
  2. 3_kmeans/3-ClusteringAlgorithms.ipynb (+7, −6)
  3. 4_logistic_regression/1-Least_squares.ipynb (+1, −1)
  4. 4_logistic_regression/1-Least_squares_EN.ipynb (+45, −41)

3_kmeans/2-kmeans-color-vq.ipynb (+12, −12)

@@ -4,12 +4,12 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"# Color Quantization by K-Means\n",
"# 用K-means进行颜色量化\n",
"\n",
"Performs a pixel-wise **Vector Quantization (VQ)** of an image of the summer palace (China), reducing the number of colors required to show the image from 96,615 unique colors to 64, while preserving the overall appearance quality.\n",
"对圆明园的图像进行**像素矢量量化(VQ)**,将显示图像所需的颜色从96,615种减少到64种,同时保持整体外观质量。\n",
"\n",
"In this example, pixels are represented in a 3D-space and K-means is used to find 64 color clusters. In the image processing literature, the codebook obtained from K-means (the cluster centers) is called the color palette. Using a single byte, up to 256 colors can be addressed, whereas an RGB encoding requires 3 bytes per pixel. The GIF file format, for example, uses such a palette.\n",
"\n",
"在本例中,像素在3d空间中表示,使用K-means找到64个颜色簇。在图像处理文献中,由K-means(聚类中心)得到的码本称为调色板。使用单个字节,最多可以寻址256种颜色,而RGB编码需要每个像素3个字节。例如,GIF文件格式就使用了这样一个调色板。\n",
"\n"
]
},
@@ -64,15 +64,15 @@
"source": [
"n_colors = 64\n",
"\n",
"# Load the Summer Palace photo\n",
"# 加载圆明园的图像\n",
"china = load_sample_image(\"china.jpg\")\n",
"\n",
"# Convert to floats instead of the default 8 bits integer coding. Dividing by\n",
"# 255 is important so that plt.imshow behaves works well on float data (need to\n",
"# be in the range [0-1])\n",
"# 转化为浮点数而不是默认的8位整数编码。\n",
"# 除以255是重要的这样plt.imshow在浮点数(需要在[0-1]的范围)上的表现会很好 \n",
"\n",
"china = np.array(china, dtype=np.float64) / 255\n",
"\n",
"# Load Image and transform to a 2D numpy array.\n",
"# 加载图像并转化成2D的numpy数组。\n",
"w, h, d = original_shape = tuple(china.shape)\n",
"assert d == 3\n",
"image_array = np.reshape(china, (w * h, d))\n",
@@ -83,7 +83,7 @@
"kmeans = KMeans(n_clusters=n_colors, random_state=0).fit(image_array_sample)\n",
"print(\" done in %0.3fs.\" % (time() - t0))\n",
"\n",
"# Get labels for all points\n",
"# 获得所有点的标签\n",
"print(\"Predicting color indices on the full image (k-means)\")\n",
"t0 = time()\n",
"labels = kmeans.predict(image_array)\n",
@@ -140,7 +140,7 @@
}
],
"source": [
"# draw original image\n",
"# 画出原始图像\n",
"plt.figure(1)\n",
"plt.clf()\n",
"ax = plt.axes([0, 0, 1, 1])\n",
@@ -178,7 +178,7 @@
}
],
"source": [
"# 64 VQ image\n",
"# 64 VQ 图像\n",
"plt.figure(2)\n",
"plt.clf()\n",
"ax = plt.axes([0, 0, 1, 1])\n",
@@ -206,7 +206,7 @@
}
],
"source": [
"# Random VQ image\n",
"# 随机VQ图像\n",
"plt.figure(3)\n",
"plt.clf()\n",
"ax = plt.axes([0, 0, 1, 1])\n",


3_kmeans/3-ClusteringAlgorithms.ipynb (+7, −6)

@@ -4,11 +4,12 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"# Comparing different clustering algorithms on toy datasets\n",
"# 在玩具数据集上比较不同的聚类算法\n",
"\n",
"This example shows characteristics of different clustering algorithms on datasets that are “interesting” but still in 2D. With the exception of the last dataset, the parameters of each of these dataset-algorithm pairs has been tuned to produce good clustering results. Some algorithms are more sensitive to parameter values than others.\n",
"The last dataset is an example of a ‘null’ situation for clustering: the data is homogeneous, and there is no good clustering. For this example, the null dataset uses the same parameters as the dataset in the row above it, which represents a mismatch in the parameter values and the data structure.\n",
"While these examples give some intuition about the algorithms, this intuition might not apply to very high dimensional data."
"这个例子展示了不同的聚类算法在数据集上的特征,这些数据集虽然有趣但仍然是2D的。除了最后一个数据集以外,每一个数据集的参数都经过调整为了产生更好的效果。一些算法对于参数值比其他算法更加敏感。\n",
"最后一个数据集是一个\"null\"情况下的聚类:数据是同构的,没有良好的聚类。对于本例,null数据集使用与上面一行中的数据集相同的参数,这表示参数值和数据结构不匹配。\n",
"这些例子\n",
"虽然这些例子给出了一些关于算法的直觉,但这种直觉可能不适用于非常高维的数据。"
]
},
{
@@ -44,8 +45,8 @@
"np.random.seed(0)\n",
"\n",
"# ============\n",
"# Generate datasets. We choose the size big enough to see the scalability\n",
"# of the algorithms, but not too big to avoid too long running times\n",
"# 生成数据集,我们选择足够大的数据来观察算法的可伸缩性,\n",
"# 但是不能太大以免需要较长的运行时间。\n",
"# ============\n",
"n_samples = 1500\n",
"noisy_circles = datasets.make_circles(n_samples=n_samples, factor=.5,\n",


4_logistic_regression/1-Least_squares.ipynb (+1, −1)

@@ -149,7 +149,7 @@
"A1 = np.array([[S_X2, S_X], \n",
" [S_X, N]])\n",
"B1 = np.array([S_XY, S_Y])\n",
"\n",
"# numpy.linalg模块包含线性代数的函数。使用这个模块,可以计算逆矩阵、求特征值、解线性方程组以及求解行列式等。\n",
"coeff = np.linalg.inv(A1).dot(B1)\n",
"\n",
"print('a = %f, b = %f' % (coeff[0], coeff[1]))\n",


4_logistic_regression/1-Least_squares_EN.ipynb (+45, −41)
File diff suppressed because it is too large

