|
|
@@ -13,18 +13,19 @@ |
|
|
|
"source": [ |
|
|
|
"## 方法\n", |
|
|
|
"\n", |
|
|
|
"由于具有出色的速度和良好的可扩展性,K-Means聚类算法算得上是最著名的聚类方法。***K-Means算法是一个重复移动类中心点的过程,把类的中心点,也称重心(centroids),移动到其包含成员的平均位置,然后重新划分其内部成员。***\n", |
|
|
|
"Because of the excellent speed and good expandability,K-Means cluster method is regarded as the most famous cluster method。***K-Means algorthms is a process which repeatly moves the center point,moving the center point of the class which is called centroids to the average position of all other members,and redivide the members of it.***\n", |
|
|
|
"\n", |
|
|
|
"K是算法计算出的超参数,表示类的数量;K-Means可以自动分配样本到不同的类,但是不能决定究竟要分几个类。\n", |
|
|
|
"K is the hyper-parameter which has been calculated, represents the numbers of class. K-means can distribute sample into different class automatically, without deciding the numbers of the class.\n", |
|
|
|
"\n", |
|
|
|
"K必须是一个比训练集样本数小的正整数。有时,类的数量是由问题内容指定的。例如,一个鞋厂有三种新款式,它想知道每种新款式都有哪些潜在客户,于是它调研客户,然后从数据里找出三类。也有一些问题没有指定聚类的数量,最优的聚类数量是不确定的。\n", |
|
|
|
"K must be a positive integer smaller than the number of samples in the training set. Sometimes, the number of classes is specified by the content of the question. For example, a shoe factory has three new styles. It wants to know which potential customers each new style has, so it investigates customers and then finds out three types from the data. \n", |
|
|
|
"\n", |
|
|
|
"K-Means的参数是类的重心位置和其内部观测值的位置。与广义线性模型和决策树类似,K-Means参数的最优解也是以成本函数最小化为目标。K-Means成本函数公式如下:\n", |
|
|
|
"The parameter of K-Means is the centriod positon of class and the position of its internal observation. Similar with generalized linear models and decision tree, the optimal solution of k-means parameter is also the goal of minimizing the cost function. The cost function of K-Means is\n", |
|
|
|
":\n", |
|
|
|
"$$\n", |
|
|
|
"J = \\sum_{k=1}^{K} \\sum_{i \\in C_k} | x_i - u_k|^2\n", |
|
|
|
"$$\n", |
|
|
|
"\n", |
|
|
|
"$u_k$是第$k$个类的重心位置,定义为:\n", |
|
|
|
"$u_k$is the centriod poisition$k$个类的重心位置,定义为:\n", |
|
|
|
"$$\n", |
|
|
|
"u_k = \\frac{1}{|C_k|} \\sum_{x \\in C_k} x\n", |
|
|
|
"$$\n", |
|
|
@@ -989,7 +990,7 @@ |
|
|
|
"name": "python", |
|
|
|
"nbconvert_exporter": "python", |
|
|
|
"pygments_lexer": "ipython3", |
|
|
|
"version": "3.6.8" |
|
|
|
"version": "3.6.5" |
|
|
|
} |
|
|
|
}, |
|
|
|
"nbformat": 4, |
|
|
|