diff --git a/1_logistic_regression/Logistic_regression.ipynb b/1_logistic_regression/Logistic_regression.ipynb
index 2e6c35f..53b24cf 100644
--- a/1_logistic_regression/Logistic_regression.ipynb
+++ b/1_logistic_regression/Logistic_regression.ipynb
@@ -25,6 +25,89 @@
    ]
   },
   {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### The logistic regression expression\n",
+    "\n",
+    "The function used here is the logistic function, also known as the sigmoid function. Its formula is:\n",
+    "\n",
+    "$$\n",
+    "g(z) = \\frac{1}{1+e^{-z}}\n",
+    "$$\n",
+    "\n",
+    "As $z$ approaches positive infinity, $g(z)$ approaches 1; as $z$ approaches negative infinity, $g(z)$ approaches 0. The graph of the logistic function is shown in the figure above. Its derivative has a property that will be used in the derivation below:\n",
+    "$$\n",
+    "g'(z) = \\frac{d}{dz} \\frac{1}{1+e^{-z}} \\\\\n",
+    "      = \\frac{1}{(1+e^{-z})^2}(e^{-z}) \\\\\n",
+    "      = \\frac{1}{(1+e^{-z})} (1 - \\frac{1}{(1+e^{-z})}) \\\\\n",
+    "      = g(z)(1-g(z))\n",
+    "$$\n",
+    "\n"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Logistic regression is essentially linear regression with one extra mapping from features to output: we first take a linear combination of the features and then use the function $g(z)$ as the hypothesis function for prediction. $g(z)$ maps any real value into the interval (0, 1). Substituting the linear regression expression into $g(z)$ gives the logistic regression hypothesis:\n",
+    "\n",
+    "$$\n",
+    "h_\\theta(x) = g(\\theta^T x) = \\frac{1}{1+e^{-\\theta^T x}}\n",
+    "$$"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### Soft classification with logistic regression\n",
+    "\n",
+    "The logistic function squashes the output $h_\\theta(x)$ into (0, 1), and this value has a special meaning: it is the probability that the label is 1. For an input $x$, the probabilities of class 1 and class 0 are therefore:\n",
+    "\n",
+    "$$\n",
+    "P(y=1|x,\\theta) = h_\\theta(x) \\\\\n",
+    "P(y=0|x,\\theta) = 1 - h_\\theta(x)\n",
+    "$$\n",
+    "\n",
+    "Combining the two expressions into one:\n",
+    "\n",
+    "$$\n",
+    "P(y|x,\\theta) = (h_\\theta(x))^y (1 - h_\\theta(x))^{1-y}\n",
+    "$$\n",
+    "\n"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### Gradient ascent\n",
+    "\n",
+    "With the logistic regression expression in hand, the next step mirrors linear regression: construct the likelihood function, apply maximum likelihood estimation, and derive the iterative update rule for $\\theta$. The only difference is that we use gradient ascent rather than gradient descent, because here the likelihood is being maximized.\n",
+    "\n",
+    "Assuming the training samples are mutually independent, the likelihood function is:\n",
+    "![Loss](images/eq_loss.png)\n",
+    "\n",
+    "Taking the log of the likelihood, as before, turns it into:\n",
+    "![LogLoss](images/eq_logloss.png)\n",
+    "\n",
+    "We then take the partial derivative of the log-likelihood with respect to $\\theta$, using a single training sample as an example:\n",
+    "![LogLossDiff](images/eq_logloss_diff.png)\n",
+    "\n",
+    "In this derivation:\n",
+    "* The first step rewrites the partial derivative with respect to $\\theta$ using the rule that if $y=\\ln x$ then $y'=1/x$.\n",
+    "* The second step applies the derivative property of the sigmoid, $g'(z) = g(z)(1 - g(z))$.\n",
+    "* The third step is ordinary algebraic rearrangement.\n",
+    "\n",
+    "This gives the update direction for each iteration of gradient ascent, so the update rule for $\\theta$ is:\n",
+    "$$\n",
+    "\\theta_j := \\theta_j + \\alpha(y^i - h_\\theta(x^i)) x_j^i\n",
+    "$$\n",
+    "\n"
+   ]
+  },
+  {
    "cell_type": "code",
    "execution_count": 2,
    "metadata": {},
diff --git a/1_logistic_regression/Logistic_regression.py b/1_logistic_regression/Logistic_regression.py
index eabdefb..971b871 100644
--- a/1_logistic_regression/Logistic_regression.py
+++ b/1_logistic_regression/Logistic_regression.py
@@ -38,6 +38,72 @@
 #
 #
+# ### The logistic regression expression
+#
+# The function used here is the logistic function, also known as the sigmoid function. Its formula is:
+#
+# $$
+# g(z) = \frac{1}{1+e^{-z}}
+# $$
+#
+# As $z$ approaches positive infinity, $g(z)$ approaches 1; as $z$ approaches negative infinity, $g(z)$ approaches 0. The graph of the logistic function is shown in the figure above. Its derivative has a property that will be used in the derivation below:
+# $$
+# g'(z) = \frac{d}{dz} \frac{1}{1+e^{-z}} \\
+#       = \frac{1}{(1+e^{-z})^2}(e^{-z}) \\
+#       = \frac{1}{(1+e^{-z})} (1 - \frac{1}{(1+e^{-z})}) \\
+#       = g(z)(1-g(z))
+# $$
+#
+#
+
+# Logistic regression is essentially linear regression with one extra mapping from features to output: we first take a linear combination of the features and then use the function $g(z)$ as the hypothesis function for prediction. $g(z)$ maps any real value into the interval (0, 1). Substituting the linear regression expression into $g(z)$ gives the logistic regression hypothesis:
+#
+# $$
+# h_\theta(x) = g(\theta^T x) = \frac{1}{1+e^{-\theta^T x}}
+# $$
+
+# ### Soft classification with logistic regression
+#
+# The logistic function squashes the output $h_\theta(x)$ into (0, 1), and this value has a special meaning: it is the probability that the label is 1. For an input $x$, the probabilities of class 1 and class 0 are therefore:
+#
+# $$
+# P(y=1|x,\theta) = h_\theta(x) \\
+# P(y=0|x,\theta) = 1 - h_\theta(x)
+# $$
+#
+# Combining the two expressions into one:
+#
+# $$
+# P(y|x,\theta) = (h_\theta(x))^y (1 - h_\theta(x))^{1-y}
+# $$
+#
+#
+
+# ### Gradient ascent
+#
+# With the logistic regression expression in hand, the next step mirrors linear regression: construct the likelihood function, apply maximum likelihood estimation, and derive the iterative update rule for $\theta$. The only difference is that we use gradient ascent rather than gradient descent, because here the likelihood is being maximized.
+#
+# Assuming the training samples are mutually independent, the likelihood function is:
+# ![Loss](images/eq_loss.png)
+#
+# Taking the log of the likelihood, as before, turns it into:
+# ![LogLoss](images/eq_logloss.png)
+#
+# We then take the partial derivative of the log-likelihood with respect to $\theta$, using a single training sample as an example:
+# ![LogLossDiff](images/eq_logloss_diff.png)
+#
+# In this derivation:
+# * The first step rewrites the partial derivative with respect to $\theta$ using the rule that if $y=\ln x$ then $y'=1/x$.
+# * The second step applies the derivative property of the sigmoid, $g'(z) = g(z)(1 - g(z))$.
+# * The third step is ordinary algebraic rearrangement.
+#
+# This gives the update direction for each iteration of gradient ascent, so the update rule for $\theta$ is:
+# $$
+# \theta_j := \theta_j + \alpha(y^i - h_\theta(x^i)) x_j^i
+# $$
+#
+#
+

 # +

 # %matplotlib inline
diff --git a/1_logistic_regression/images/eq_logloss.png b/1_logistic_regression/images/eq_logloss.png
new file mode 100644
index 0000000..a802d44
Binary files /dev/null and b/1_logistic_regression/images/eq_logloss.png differ
diff --git a/1_logistic_regression/images/eq_logloss_diff.png b/1_logistic_regression/images/eq_logloss_diff.png
new file mode 100644
index 0000000..337f9c5
Binary files /dev/null and b/1_logistic_regression/images/eq_logloss_diff.png differ
diff --git a/1_logistic_regression/images/eq_loss.png b/1_logistic_regression/images/eq_loss.png
new file mode 100644
index 0000000..8e1bd6b
Binary files /dev/null and b/1_logistic_regression/images/eq_loss.png differ
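For reference, a minimal NumPy sketch of the expressions the added cells describe: the sigmoid $g(z)$, the hypothesis $h_\theta(x) = g(\theta^T x)$, and the gradient-ascent update on the log-likelihood. It is not part of the patch above; the function names, the toy two-blob dataset, the learning rate `alpha`, and the iteration count `n_iters` are illustrative assumptions, and the batch form used here simply sums the per-sample term $(y^i - h_\theta(x^i))\, x^i$ from the derivation over the training set.

```python
import numpy as np


def sigmoid(z):
    """Logistic (sigmoid) function g(z) = 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + np.exp(-z))


def gradient_ascent(X, y, alpha=0.1, n_iters=1000):
    """Maximize the logistic regression log-likelihood by batch gradient ascent.

    X is (n_samples, n_features) and is assumed to already contain a bias
    column of ones; y is a 0/1 label vector.
    """
    theta = np.zeros(X.shape[1])
    for _ in range(n_iters):
        h = sigmoid(X @ theta)               # h_theta(x) for every sample
        gradient = X.T @ (y - h)             # sum_i (y^i - h_theta(x^i)) * x^i
        theta += alpha * gradient / len(y)   # averaged step; a scaling choice
    return theta


if __name__ == "__main__":
    # Two Gaussian blobs as a linearly separable toy problem (illustrative data).
    rng = np.random.default_rng(0)
    X_pos = rng.normal(loc=2.0, scale=1.0, size=(50, 2))
    X_neg = rng.normal(loc=-2.0, scale=1.0, size=(50, 2))
    X = np.vstack([X_pos, X_neg])
    X = np.hstack([np.ones((X.shape[0], 1)), X])  # prepend the bias column
    y = np.concatenate([np.ones(50), np.zeros(50)])

    theta = gradient_ascent(X, y)
    preds = (sigmoid(X @ theta) >= 0.5).astype(int)
    print("theta:", theta)
    print("training accuracy:", (preds == y).mean())
```

Dividing the summed gradient by the number of samples keeps the step size comparable across dataset sizes; dropping that division recovers the exact per-sample update written in the derivation.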