|
|
@@ -25,6 +25,89 @@ |
|
|
|
] |
|
|
|
}, |
|
|
|
{ |
|
|
|
"cell_type": "markdown", |
|
|
|
"metadata": {}, |
|
|
|
"source": [ |
|
|
|
"### 逻辑回归表达式\n", |
|
|
|
"\n", |
|
|
|
"这个函数称为Logistic函数(logistic function),也称为Sigmoid函数(sigmoid function)。函数公式如下:\n", |
|
|
|
"\n", |
|
|
|
"$$\n", |
|
|
|
"g(z) = \\frac{1}{1+e^{-z}}\n", |
|
|
|
"$$\n", |
|
|
|
"\n", |
|
|
|
"Logistic函数当z趋近于无穷大时,g(z)趋近于1;当z趋近于无穷小时,g(z)趋近于0。Logistic函数的图形如上图所示。Logistic函数求导时有一个特性,这个特性将在下面的推导中用到,这个特性为:\n", |
|
|
|
"$$\n", |
|
|
|
"g'(z) = \\frac{d}{dz} \\frac{1}{1+e^{-z}} \\\\\n", |
|
|
|
" = \\frac{1}{(1+e^{-z})^2}(e^{-z}) \\\\\n", |
|
|
|
" = \\frac{1}{(1+e^{-z})} (1 - \\frac{1}{(1+e^{-z})}) \\\\\n", |
|
|
|
" = g(z)(1-g(z))\n", |
|
|
|
"$$\n", |
|
|
|
"\n" |
|
|
|
] |
|
|
|
}, |
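{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a quick sanity check, the next cell is a minimal sketch (assuming NumPy is available; the function name `sigmoid` and the test points are chosen here purely for illustration) that implements $g(z)$ and numerically verifies the derivative identity $g'(z) = g(z)(1-g(z))$ with a central finite difference."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"\n",
"def sigmoid(z):\n",
"    # Logistic (sigmoid) function: g(z) = 1 / (1 + exp(-z))\n",
"    return 1.0 / (1.0 + np.exp(-z))\n",
"\n",
"# Numerically check the derivative identity g'(z) = g(z) * (1 - g(z))\n",
"z = np.linspace(-5.0, 5.0, 11)\n",
"eps = 1e-6\n",
"numeric_grad = (sigmoid(z + eps) - sigmoid(z - eps)) / (2 * eps)  # central difference\n",
"analytic_grad = sigmoid(z) * (1 - sigmoid(z))\n",
"print(np.allclose(numeric_grad, analytic_grad, atol=1e-8))  # expected: True\n"
]
},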
|
|
|
{ |
|
|
|
"cell_type": "markdown", |
|
|
|
"metadata": {}, |
|
|
|
"source": [ |
|
|
|
"逻辑回归本质上是线性回归,只是在特征到结果的映射中加入了一层函数映射,即先把特征线性求和,然后使用函数$g(z)$将最为假设函数来预测。$g(z)$可以将连续值映射到0到1之间。线性回归模型的表达式带入$g(z)$,就得到逻辑回归的表达式:\n", |
|
|
|
"\n", |
|
|
|
"$$\n", |
|
|
|
"h_\\theta(x) = g(\\theta^T x) = \\frac{1}{1+e^{-\\theta^T x}}\n", |
|
|
|
"$$" |
|
|
|
] |
|
|
|
}, |
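{
"cell_type": "markdown",
"metadata": {},
"source": [
"The cell below is a small illustrative sketch (assuming NumPy; the design matrix `X` and parameter vector `theta` are made-up values) of evaluating the hypothesis $h_\\theta(x) = g(\\theta^T x)$ for several samples at once, with the first column of `X` acting as the intercept feature $x_0 = 1$."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"\n",
"# Toy data: 3 samples, first column is the intercept feature x_0 = 1\n",
"X = np.array([[1.0, 0.5],\n",
"              [1.0, 2.0],\n",
"              [1.0, -1.5]])\n",
"theta = np.array([0.1, 1.2])  # hypothetical parameter vector\n",
"\n",
"# h_theta(x) = g(theta^T x), computed for every sample at once\n",
"h = 1.0 / (1.0 + np.exp(-X @ theta))\n",
"print(h)  # every entry lies strictly between 0 and 1\n"
]
},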
|
|
|
{ |
|
|
|
"cell_type": "markdown", |
|
|
|
"metadata": {}, |
|
|
|
"source": [ |
|
|
|
"### 逻辑回归的软分类\n", |
|
|
|
"\n", |
|
|
|
"现在我们将y的取值$h_\\theta(x)$通过Logistic函数归一化到(0,1)间,$y$的取值有特殊的含义,它表示结果取1的概率,因此对于输入$x$分类结果为类别1和类别0的概率分别为:\n", |
|
|
|
"\n", |
|
|
|
"$$\n", |
|
|
|
"P(y=1|x,\\theta) = h_\\theta(x) \\\\\n", |
|
|
|
"P(y=0|x,\\theta) = 1 - h_\\theta(x)\n", |
|
|
|
"$$\n", |
|
|
|
"\n", |
|
|
|
"对上面的表达式合并一下就是:\n", |
|
|
|
"\n", |
|
|
|
"$$\n", |
|
|
|
"p(y|x,\\theta) = (h_\\theta(x))^y (1 - h_\\theta(x))^{1-y}\n", |
|
|
|
"$$\n", |
|
|
|
"\n" |
|
|
|
] |
|
|
|
}, |
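{
"cell_type": "markdown",
"metadata": {},
"source": [
"A minimal sketch (the value of `h` below is made up for illustration) confirming that the combined expression $(h_\\theta(x))^y (1 - h_\\theta(x))^{1-y}$ reduces to the two separate cases above when $y = 1$ and $y = 0$."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"h = 0.73  # hypothetical value of h_theta(x) for some input x\n",
"\n",
"def bernoulli_pmf(y, h):\n",
"    # p(y | x, theta) = h^y * (1 - h)^(1 - y) for y in {0, 1}\n",
"    return (h ** y) * ((1 - h) ** (1 - y))\n",
"\n",
"print(bernoulli_pmf(1, h), h)      # both equal h_theta(x)\n",
"print(bernoulli_pmf(0, h), 1 - h)  # both equal 1 - h_theta(x)\n"
]
},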
|
|
|
{ |
|
|
|
"cell_type": "markdown", |
|
|
|
"metadata": {}, |
|
|
|
"source": [ |
|
|
|
"### 梯度上升\n", |
|
|
|
"\n", |
|
|
|
"得到了逻辑回归的表达式,下一步跟线性回归类似,构建似然函数,然后最大似然估计,最终推导出$\\theta$的迭代更新表达式。只不过这里用的不是梯度下降,而是梯度上升,因为这里是最大化似然函数。\n", |
|
|
|
"\n", |
|
|
|
"我们假设训练样本相互独立,那么似然函数表达式为:\n", |
|
|
|
"\n", |
|
|
|
"\n", |
|
|
|
"同样对似然函数取log,转换为:\n", |
|
|
|
"\n", |
|
|
|
"\n", |
|
|
|
"转换后的似然函数对$\\theta$求偏导,在这里我们以只有一个训练样本的情况为例:\n", |
|
|
|
"\n", |
|
|
|
"\n", |
|
|
|
"这个求偏导过程中:\n", |
|
|
|
"* 第一步是对$\\theta$偏导的转化,依据偏导公式:$y=lnx$, $y'=1/x$。\n", |
|
|
|
"* 第二步是根据g(z)求导的特性g'(z) = g(z)(1 - g(z)) 。\n", |
|
|
|
"* 第三步就是普通的变换。\n", |
|
|
|
"\n", |
|
|
|
"这样我们就得到了梯度上升每次迭代的更新方向,那么$\\theta$的迭代表达式为:\n", |
|
|
|
"$$\n", |
|
|
|
"\\theta_j := \\theta_j + \\alpha(y^i - h_\\theta(x^i)) x_j^i\n", |
|
|
|
"$$\n", |
|
|
|
"\n" |
|
|
|
] |
|
|
|
}, |
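{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a sanity check on the derivation, the following sketch (assuming NumPy; the sample `x`, label `y`, and parameters `theta` are made-up values) compares the analytic gradient $(y - h_\\theta(x))\\,x_j$ for a single training sample against a central finite difference of that sample's log-likelihood."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"\n",
"def sigmoid(z):\n",
"    return 1.0 / (1.0 + np.exp(-z))\n",
"\n",
"def log_likelihood(theta, x, y):\n",
"    # log p(y | x, theta) = y * log(h) + (1 - y) * log(1 - h) for one sample\n",
"    h = sigmoid(x @ theta)\n",
"    return y * np.log(h) + (1 - y) * np.log(1 - h)\n",
"\n",
"x = np.array([1.0, 0.5, -1.2])      # one sample, first feature is the intercept\n",
"y = 1.0\n",
"theta = np.array([0.2, -0.4, 0.3])  # hypothetical parameters\n",
"\n",
"analytic = (y - sigmoid(x @ theta)) * x  # (y - h_theta(x)) * x_j for every j\n",
"\n",
"# Central finite difference of the log-likelihood, one coordinate at a time\n",
"eps = 1e-6\n",
"numeric = np.zeros_like(theta)\n",
"for j in range(len(theta)):\n",
"    step = np.zeros_like(theta)\n",
"    step[j] = eps\n",
"    numeric[j] = (log_likelihood(theta + step, x, y) - log_likelihood(theta - step, x, y)) / (2 * eps)\n",
"\n",
"print(np.allclose(analytic, numeric, atol=1e-8))  # expected: True\n"
]
},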
|
|
|
{ |
|
|
|
"cell_type": "code", |
|
|
|
"execution_count": 2, |
|
|
|
"metadata": {}, |
|
|
|