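{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Linear Models and Gradient Descent\n", "\n", "In this section we briefly review the linear regression model and show how to use PyTorch to build one and compute its parameters." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 1. Univariate Linear Regression\n", "The univariate linear model is very simple. Suppose we have inputs $x_i$ and targets $y_i$, where each $i$ corresponds to one data point, and we want to build the model\n", "\n", "$$\n", "\\hat{y}_i = w x_i + b\n", "$$\n", "\n", "$\\hat{y}_i$ is our prediction, and we want $\\hat{y}_i$ to fit the target $y_i$. Put simply, we look for the function that fits $y_i$ with the smallest error, that is, we minimize\n", "\n", "$$\n", "\\frac{1}{n} \\sum_{i=1}^n(\\hat{y}_i - y_i)^2\n", "$$" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "So how do we minimize this error?\n", "\n", "This is where **gradient descent** comes in. It is the first optimization algorithm we meet: very simple yet very powerful, and used heavily in deep learning. Let us start from a simple example to understand how it works" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 2. Gradient Descent\n", "To understand gradient descent, we first need to be clear about what a gradient is, and then see how to use it to descend." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 2.1 The Gradient\n", "For a function of one variable, the gradient is simply the derivative; for a function of several variables, it is the vector of partial derivatives. For example, the gradient of a function f(x, y) is\n", "\n", "$$\n", "(\\frac{\\partial f}{\\partial x},\\ \\frac{\\partial f}{\\partial y})\n", "$$\n", "\n", "which is written grad f(x, y) or $\\nabla f(x, y)$. The gradient at a specific point $(x_0,\\ y_0)$ is $\\nabla f(x_0,\\ y_0)$.\n", "\n", "The figure below shows the gradient of $f(x) = x^2$ at x=1\n", "\n", "![](https://ws3.sinaimg.cn/large/006tNc79ly1fmarbuh2j3j30ba0b80sy.jpg)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "What does the gradient mean? Geometrically, the gradient at a point gives the direction in which the function changes fastest. Specifically, for a function f(x, y) at the point $(x_0, y_0)$, the function increases fastest along the direction of the gradient $\\nabla f(x_0,\\ y_0)$. In other words, moving along the gradient is the quickest way toward a (local) maximum of the function, and moving against the gradient is the quickest way toward a (local) minimum." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 2.2 The Gradient Descent Method\n", "With this understanding of the gradient, we can see how gradient descent works. We want to minimize the error above, that is, find the point where the error is smallest, and we can move toward that point by following the negative gradient.\n", "\n", "Here is an intuitive picture. Suppose we are somewhere on a large mountain and do not know the way down, so we decide to take it one step at a time: at each position we compute the gradient there and take one step along the negative gradient, which is the steepest way down from where we stand. Then we compute the gradient at the new position and again step in the steepest downhill direction. We keep going like this until we think we have reached the foot of the mountain. Of course, we may not actually reach the foot; we may end up in a local low point among the peaks instead.\n", "\n", "In our problem, this means repeatedly changing w and b along the negative gradient until we find the pair of w and b that makes the error smallest.\n", "\n", "At each update we have to decide how far to move, like the length of each downhill step in the mountain example. This length is called the learning rate, denoted $\\eta$. The learning rate matters a great deal, and different values lead to different results: if it is too small, descent is very slow; if it is too large, the updates jump around noticeably. See the example below\n", "\n", "![](https://ws2.sinaimg.cn/large/006tNc79ly1fmgn23lnzjg30980gogso.gif)\n", "\n", "The learning rate in the upper plot is about right, while the one in the lower plot is too large and keeps the updates jumping back and forth\n", "\n", "Our update rule is then\n", "\n", "$$\n", "w := w - \\eta \\frac{\\partial f(w,\\ b)}{\\partial w} \\\\\n", "b := b - \\eta \\frac{\\partial f(w,\\ b)}{\\partial b}\n", "$$\n", "\n", "By iterating these updates with a suitable learning rate, we eventually arrive at an optimal pair of w and b. This is the principle behind gradient descent.\n", "\n", "Finally, the figure below illustrates the method\n", "\n", "![](https://ws3.sinaimg.cn/large/006tNc79ly1fmarxsltfqj30gx091gn4.jpg)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 2.3 PyTorch Implementation\n", "\n", "That covers the theory. Next we work through an example to learn more about the linear model" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "" ] }, "execution_count": 1, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import torch\n", "import numpy as np\n", "from torch.autograd import Variable\n", "\n", "torch.manual_seed(2021)" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "[]" ] }, "execution_count": 2, "metadata": {}, "output_type": "execute_result" }, { "data": { "image/png": 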
"iVBORw0KGgoAAAANSUhEUgAAAWoAAAD4CAYAAADFAawfAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjQuMywgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy/MnkTPAAAACXBIWXMAAAsTAAALEwEAmpwYAAAPB0lEQVR4nO3df4xsZ13H8fd3uam6TQXSe2tM6e5CBKS5Biibpv5BlVRJbUybKGrJIoKVDWCq6F8k+4dGc/8gURNNiLrB366ILWJugjb1B9hIaHEuLbSUQNp699JS6aK0GjfQln7948z23m5m75y5O+fMc+a8X8lmZs6cO/t9ZrafPnPO8zwnMhNJUrkWZl2AJOn8DGpJKpxBLUmFM6glqXAGtSQV7kgTL3r06NFcWVlp4qUlaS6dOnXq65l5bNRzjQT1ysoKg8GgiZeWpLkUEdsHPeehD0kqnEEtSYUzqCWpcAa1JBXOoJakwhnUknRIW1uwsgILC9Xt1tZ0X7+R4XmS1BdbW7C+Dru71ePt7eoxwNradH6HPWpJOoSNjbMhvWd3t9o+LQa1JB3CmTOTbb8QBrUkHcLS0mTbL4RBLUmHcOIELC6+cNviYrV9WgxqSTqEtTXY3ITlZYiobjc3p3ciERz1IUmHtrY23WDezx61JBXOoJakwhnUklQ4g1qSCmdQS1LhDGpJKpxBLUmFM6glqXAGtSQVzqCWpMIZ1JJUOINakgpnUEtS4QxqSSqcQS1JhTOoJalwBrUkFc6glqTCGdSSVDiDWpIKZ1BLUuEMakkqnEEtSYWrFdQR8csR8UBEfCEi3tdwTZKkc4wN6og4DrwLuBp4LfDjEfF9TRcmSarU6VG/BrgnM3cz81ngX4GfaLYsSdKeOkH9APDGiLg0IhaBG4Ar9u8UEesRMYiIwc7OzrTrlKTeGhvUmflF4APAncAdwH3At0fst5mZq5m5euzYsWnXKUm9VetkYmb+UWa+ITOvBb4BfLnZsiRJe47U2SkiLsvMJyJiier49DXNliVJ2lMrqIGPRsSlwDPAL2bmk82VJEk6V62gzsw3Nl2IJGk0ZyZKUuEMakkqnEEtSYUzqCWpcAa1JBXOoJakwhnUklQ4g1qaI1tbsLICCwvV7dbWrCvSNNSdmSipcFtbsL4Ou7vV4+3t6jHA2trs6tLh2aOW5sTGxtmQ3rO7W21XtxnU0pw4c2ay7eoOg1qaE0tLk21XdxjU0pw4cQIWF1+4bXGx2q5uM6ilObG2BpubsLwMEdXt5qYnEueBoz6kObK2ZjDPI3vUklQ4g1qSCmdQS1LhDGpJKpxBLUmFM6glqXAGtSQVzqCWpMIZ1JJUOINakgpnUEtS4QxqSSqcQS1JhTOoJalwBrVUoJKuJl5SLX1lUKu3Sg2gvauJb29D5tmric+ivmnWUur73QWRmVN/0dXV1RwMBlN/XWla9gLo3Kt2Ly6WcUWUlZUqEPdbXobTp7tZS8nvdyki4lRmro58rk5QR8SvAL8AJHA/8M7M/OZB+xvUKl1JYbjfwkLVe90vAp57rpu1lPx+l+J8QT320EdEXA78ErCamceBFwE3T7dEqV1nzky2vU0lXU18WrWU/H53Qd1j1EeA74qII8Ai8NXmSpKaV1IY7lfS1cSnVUvJ73cXjA3qzHwM+C3gDPA48FRm3rl/v4hYj4hBRAx2dnamX6k0RSWF4X4lXU18WrWU/H53wdhj1BHxUuCjwM8ATwK3Abdn5l8e9G88Rq0u2NqCjY3q6/fSUhUanthqju/3+R3qZGJE/BRwfWbeMnz8duCazHzvQf/GoJakyRzqZCLVIY9rImIxIgK4DvjiNAuUJB2szjHqe4Dbgc9SDc1bADYbrkuSNFRr1Edm/lpmfn9mHs/Mn83MbzVdmKRuceZhc47MugBJ3bd/5uHeVHPwhOE0uNaHWmWvaz5tbLxwejhUjzc2ZlPPvLFHrdbY65pfzjxslj1qtabvva55/jbhzMNmGdRqTZ9
7XSUtXdoEZx42y6BWa/rc65r3bxMlTXufRwa1WtPnXlcfvk2srVVLlj73XHVrSE+PQa3W9LnX1edvEzo8g1qt6muvq8/fJnR4BrXUgj5/m9DhOY5aasnamsGsC2OPWpIKZ1BLOtA8T9LpEg99SBrJKf/lsEctaaR5n6TTJQa1pJH6MEmnKwxqSSM5SaccBrWkkZykUw6DWtJITtIph0EtdVBbw+b6OuW/NA7PkzrGYXP9Y49a6hiHzfWPQS11jMPm+segljrGYXP9Y1BLHeOwuf4xqKWOcdhc/zjqQ+og17buF3vUklQ4g1qSCmdQS1LhDGpJKpxBLUmFGxvUEfHqiLjvnJ//iYj3tVCbJIkaQZ2ZX8rM12Xm64A3ALvAx5ouTNLseFHbskw6jvo64OHM3G6iGEmz5+p85Zn0GPXNwIebKERSGVydrzy1gzoiLgJuBG474Pn1iBhExGBnZ2da9UlqmavzlWeSHvWPAZ/NzK+NejIzNzNzNTNXjx07Np3qJLXO1fnKM0lQv5UGD3t48kIqg6vzladWUEfExcCPAn/bRBF7Jy+2tyHz7MkLw1pqn6vzlScyc+ovurq6moPBoPb+KytVOO+3vFxdUFOS5l1EnMrM1VHPFTEz0ZMXknSwIoLakxeSdLAigtqTF/V50lXqnyKC2pMX9XjSVeqnIk4mqh5Pukrzq/iTiarHk65SPxnUHeJJV6mfDOoO8aSr1E8GdYd40lXqp0nXo9aMra0ZzFLf2KOWpMIZ1JJUOINakgpnUEtS4QxqSSqcQS1JhTOoJalwBnUPuVSq1C1OeOmZvaVSd3erx3tLpYITaaRS2aPumY2NsyG9Z3e32i6pTAZ1z0xjqVQPnUjtMqh75rBLpXqVGal9BnXPHHapVA+dSO0zqHvmsEulepUZqX2O+uihwyyVurQ0+rqNXmVGao49ak3Eq8xI7TOoO2wWoy+8yozUPg99dNQsJ654lRmpXfaoO8rRF1J/GNQd5egLqT8M6o467MQVSd1hUHeUoy+k/jCoO8rRF1J/1Br1EREvAT4EHAcS+PnM/HSDdakGR19I/VB3eN7vAndk5lsi4iJgcdw/kCRNx9igjogXA9cC7wDIzKeBp5stS5K0p84x6pcDO8CfRMS9EfGhiLh4/04RsR4Rg4gY7OzsTL1QSeqrOkF9BLgK+P3MfD3wf8D79++UmZuZuZqZq8eOHZtymZLUX3WC+lHg0cy8Z/j4dqrgliS1YGxQZ+Z/Al+JiFcPN10HPNhoVZKk59Ud9XErsDUc8fEI8M7mSpIknatWUGfmfcBqs6VIkkbp1cxEr54tqYt6sx71LNdvlqTD6E2P2vWbJXVVb4La9ZsldVVvgtr1myV1VW+C2vWbJXVVb4La9ZsldVVvRn2A6zdL6qbe9KglqasM6gI4EUfS+fTq0EeJnIgjaRx71DPmRBxJ4xjUM+ZEHEnjGNQz5kQcSeMY1DPmRBxJ4xjUM+ZEHEnjOOqjAE7EkXQ+9qglqXAGtSQVzqCWpMIZ1JJUOINakgpnUEtS4QxqSSqcQS1JhTOoJalwBrUkFc6glqTCGdSSVDiDWpIKZ1BLUuEMakkqXK31qCPiNPC/wLeBZzNztcmiJElnTXLhgDdl5tcbq0SSNNLcHPrY2oKVFVhYqG63tmZdkSRNR92gTuDOiDgVEeujdoiI9YgYRMRgZ2dnehXWsLUF6+uwvQ2Z1e36umEtaT5EZo7fKeLyzHwsIi4D/hG4NTPvOmj/1dXVHAwGUyzz/FZWqnDeb3kZTp9urQxJumARceqg83+1etSZ+djw9gngY8DV0yvv8M6cmWy7JHXJ2KCOiIsj4pK9+8CbgQeaLmwSS0uTbZekLqnTo/4e4N8i4nPAZ4CPZ+YdzZY1mRMnYHHxhdsWF6vtktR1Y4fnZeYjwGtbqOWCra1Vtxsb1eGOpaUqpPe2S1KXTTKOumhrawazpPk0N+OoJWleGdSSVDiDWpIKZ1BLUuEMakkqXK0
p5BO/aMQOMGJS9/OOAn1dic+291Nf297XdsPkbV/OzGOjnmgkqMeJiEFf17S27ba9T/rabphu2z30IUmFM6glqXCzCurNGf3eEtj2fupr2/vabphi22dyjFqSVJ+HPiSpcAa1JBWu0aCOiOsj4ksR8VBEvH/E898RER8ZPn9PRKw0WU+barT9VyPiwYj4fET8c0Qsz6LOJoxr+zn7/WREZETMxfCtOu2OiJ8efu5fiIi/arvGptT4e1+KiE9ExL3Dv/kbZlHntEXEH0fEExEx8mIqUfm94fvy+Yi46oJ+UWY28gO8CHgYeAVwEfA54Mp9+7wX+IPh/ZuBjzRVT5s/Ndv+JmBxeP89fWr7cL9LgLuAu4HVWdfd0mf+SuBe4KXDx5fNuu4W274JvGd4/0rg9KzrnlLbrwWuAh444PkbgH8AArgGuOdCfk+TPeqrgYcy85HMfBr4a+CmffvcBPzZ8P7twHUREQ3W1Jaxbc/MT2Tm7vDh3cDLWq6xKXU+d4DfBD4AfLPN4hpUp93vAj6Ymd+A569BOg/qtD2B7x7efzHw1Rbra0xWF/n+7/PschPw51m5G3hJRHzvpL+nyaC+HPjKOY8fHW4buU9mPgs8BVzaYE1tqdP2c91C9X/deTC27cOvf1dk5sfbLKxhdT7zVwGviohPRcTdEXF9a9U1q07bfx14W0Q8Cvw9cGs7pc3cpFkw0txc4aWrIuJtwCrwQ7OupQ0RsQD8DvCOGZcyC0eoDn/8MNU3qLsi4gcy88lZFtWStwJ/mpm/HRE/CPxFRBzPzOdmXVgXNNmjfgy44pzHLxtuG7lPRByh+kr0Xw3W1JY6bScifgTYAG7MzG+1VFvTxrX9EuA48MmIOE113O7kHJxQrPOZPwqczMxnMvM/gC9TBXfX1Wn7LcDfAGTmp4HvpFq0aN7VyoJxmgzqfwdeGREvj4iLqE4Wnty3z0ng54b33wL8Sw6PwHfc2LZHxOuBP6QK6Xk5Vglj2p6ZT2Xm0cxcycwVquPzN2bmYDblTk2dv/e/o+pNExFHqQ6FPNJijU2p0/YzwHUAEfEaqqDeabXK2TgJvH04+uMa4KnMfHziV2n4jOgNVL2Gh4GN4bbfoPoPE6oP6zbgIeAzwCtmfRa3xbb/E/A14L7hz8lZ19xW2/ft+0nmYNRHzc88qA77PAjcD9w865pbbPuVwKeoRoTcB7x51jVPqd0fBh4HnqH6xnQL8G7g3ed85h8cvi/3X+jfulPIJalwzkyUpMIZ1JJUOINakgpnUEtS4QxqSSqcQS1JhTOoJalw/w9HECtz8n/B+wAAAABJRU5ErkJggg==\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "# 生层测试数据\n", "x_train = np.random.rand(20, 1)\n", "y_train = x_train * 3 + 4 + 3*np.random.rand(20,1)\n", "\n", "# 画出图像\n", "import matplotlib.pyplot as plt\n", "%matplotlib inline\n", "\n", "plt.plot(x_train, y_train, 'bo')" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [], "source": [ "# 转换成 Tensor\n", "x_train = torch.from_numpy(x_train)\n", "y_train = torch.from_numpy(y_train)\n", "\n", "# 定义参数 w 和 b\n", "w = Variable(torch.randn(1), requires_grad=True) # 随机初始化\n", "b = Variable(torch.zeros(1), requires_grad=True) # 使用 0 进行初始化" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [], "source": [ "# 构建线性回归模型\n", "x_train = Variable(x_train)\n", "y_train = Variable(y_train)\n", "\n", "def linear_model(x):\n", " return x * w + b\n", "\n", "def logistc_regression(x):\n", " return torch.sigmoid(x*w+b) " ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [], "source": [ "y_ = linear_model(x_train)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "经过上面的步骤我们就定义好了模型,在进行参数更新之前,我们可以先看看模型的输出结果长什么样" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" }, { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "plt.plot(x_train.data.numpy(), y_train.data.numpy(), 'bo', label='real')\n", "plt.plot(x_train.data.numpy(), y_.data.numpy(), 'ro', label='estimated')\n", "plt.legend()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**思考:红色的点表示预测值,似乎排列成一条直线,请思考一下这些点是否在一条直线上?**" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "这个时候需要计算我们的误差函数,也就是\n", "\n", "$$\n", "E = \\sum_{i=1}^n(\\hat{y}_i - y_i)^2\n", "$$" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [], "source": [ "# 计算误差\n", "def get_loss(y_, y):\n", " return torch.sum((y_ - y) ** 2)\n", "\n", "loss = get_loss(y_, y_train)" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "tensor(748.8935, dtype=torch.float64, grad_fn=)\n" ] } ], "source": [ "# 打印一下看看 loss 的大小\n", "print(loss)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "定义好了误差函数,接下来我们需要计算 w 和 b 的梯度了,这时得益于 PyTorch 的自动求导,我们不需要手动去算梯度,有兴趣的同学可以手动计算一下,w 和 b 的梯度分别是\n", "\n", "$$\n", "\\frac{\\partial}{\\partial w} = \\frac{2}{n} \\sum_{i=1}^n x_i(w x_i + b - y_i) \\\\\n", "\\frac{\\partial}{\\partial b} = \\frac{2}{n} \\sum_{i=1}^n (w x_i + b - y_i)\n", "$$" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [], "source": [ "# 自动求导\n", "loss.backward()" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "tensor([-125.1102])\n", "tensor([-243.2102])\n" ] } ], "source": [ "# 查看 w 和 b 的梯度\n", "print(w.grad)\n", "print(b.grad)" ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [], "source": [ "# 更新一次参数\n", "w.data = w.data - 1e-2 * w.grad.data\n", "b.data = b.data - 1e-2 * b.grad.data" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "更新完成参数之后,我们再一次看看模型输出的结果" ] }, { "cell_type": 
"code", "execution_count": 12, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "" ] }, "execution_count": 12, "metadata": {}, "output_type": "execute_result" }, { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "y_ = linear_model(x_train)\n", "plt.plot(x_train.data.numpy(), y_train.data.numpy(), 'bo', label='real')\n", "plt.plot(x_train.data.numpy(), y_.data.numpy(), 'ro', label='estimated')\n", "plt.legend()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "从上面的例子可以看到,更新之后红色的线跑到了蓝色的线下面,没有特别好的拟合蓝色的真实值,所以我们需要在进行几次更新" ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "epoch: 19, loss: 9.138844332292493\n", "epoch: 39, loss: 8.31670591484358\n", "epoch: 59, loss: 8.010376750480548\n", "epoch: 79, loss: 7.896237967760094\n", "epoch: 99, loss: 7.853709612500179\n" ] } ], "source": [ "for e in range(100): # 进行 100 次更新\n", " y_ = linear_model(x_train)\n", " loss = get_loss(y_, y_train)\n", " \n", " w.grad.zero_() # 记得归零梯度\n", " b.grad.zero_() # 记得归零梯度\n", " loss.backward()\n", " \n", " w.data = w.data - 1e-2 * w.grad.data # 更新 w\n", " b.data = b.data - 1e-2 * b.grad.data # 更新 b \n", " if (e + 1) % 20 == 0:\n", " print('epoch: {}, loss: {}'.format(e, loss.item()))" ] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "" ] }, "execution_count": 14, "metadata": {}, "output_type": "execute_result" }, { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "y_ = linear_model(x_train)\n", "plt.plot(x_train.data.numpy(), y_train.data.numpy(), 'bo', label='real')\n", "plt.plot(x_train.data.numpy(), y_.data.numpy(), 'ro', label='estimated')\n", "plt.legend()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "经过 100 次更新,我们发现红色的预测结果已经比较好的拟合了蓝色的真实值。\n", "\n", "现在你已经学会了你的第一个机器学习模型了,再接再厉,完成下面的小练习。" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 2.4 练习题\n", "\n", "重启 notebook 运行上面的线性回归模型,但是改变训练次数以及不同的学习率进行尝试得到不同的结果" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 3. 多项式回归模型" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "下面我们更进一步,讲一讲多项式回归。什么是多项式回归呢?非常简单,根据上面的线性回归模型\n", "\n", "$$\n", "\\hat{y} = w x + b\n", "$$\n", "\n", "这里是关于 x 的一个一次多项式,这个模型比较简单,没有办法拟合比较复杂的模型,所以我们可以使用更高次的模型,比如\n", "\n", "$$\n", "\\hat{y} = w_0 + w_1 x + w_2 x^2 + w_3 x^3 + \\cdots\n", "$$\n", "\n", "这样就能够拟合更加复杂的模型,这就是多项式模型,这里使用了 x 的更高次,同理还有多元回归模型,形式也是一样的,只是出了使用 x,还是更多的变量,比如 y、z 等等,同时他们的 loss 函数和简单的线性回归模型是一致的。" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "首先我们可以先定义一个需要拟合的目标函数,这个函数是个三次的多项式" ] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "y = 0.90 + 0.50 * x + 3.00 * x^2 + 2.40 * x^3\n" ] } ], "source": [ "# 定义一个多变量函数\n", "\n", "w_target = np.array([0.5, 3, 2.4]) # 定义参数\n", "b_target = np.array([0.9]) # 定义参数\n", "\n", "f_des = 'y = {:.2f} + {:.2f} * x + {:.2f} * x^2 + {:.2f} * x^3'.format(\n", " b_target[0], w_target[0], w_target[1], w_target[2]) # 打印出函数的式子\n", "\n", "print(f_des)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "我们可以先画出这个多项式的图像" ] }, { "cell_type": "code", "execution_count": 16, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "" ] }, "execution_count": 16, "metadata": {}, "output_type": "execute_result" 
}, { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "# 画出这个函数的曲线\n", "x_sample = np.arange(-3, 3.1, 0.1)\n", "y_sample = b_target[0] + w_target[0] * x_sample + w_target[1] * x_sample ** 2 + w_target[2] * x_sample ** 3\n", "\n", "plt.plot(x_sample, y_sample, label='real curve')\n", "plt.legend()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "接着我们可以构建数据集,需要 x 和 y,同时是一个三次多项式,所以我们取了 $x,\\ x^2, x^3$" ] }, { "cell_type": "code", "execution_count": 17, "metadata": {}, "outputs": [], "source": [ "# 构建数据 x 和 y\n", "# x 是一个如下矩阵 [x, x^2, x^3]\n", "# y 是函数的结果 [y]\n", "\n", "x_train = np.stack([x_sample ** i for i in range(1, 4)], axis=1)\n", "x_train = torch.from_numpy(x_train).float() # 转换成 float tensor\n", "\n", "y_train = torch.from_numpy(y_sample).float().unsqueeze(1) # 转化成 float tensor " ] }, { "cell_type": "code", "execution_count": 18, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "torch.Size([61, 3])\n" ] } ], "source": [ "print(x_train.size())" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "接着我们可以定义需要优化的参数,就是前面这个函数里面的 $w_i$" ] }, { "cell_type": "code", "execution_count": 20, "metadata": {}, "outputs": [], "source": [ "# 定义参数和模型\n", "w = Variable(torch.randn(3, 1), requires_grad=True)\n", "b = Variable(torch.zeros(1), requires_grad=True)\n", "\n", "# 将 x 和 y 转换成 Variable\n", "x_train = Variable(x_train)\n", "y_train = Variable(y_train)\n", "\n", "def multi_linear(x):\n", " return torch.mm(x, w) + b\n", "\n", "def get_loss(y_, y):\n", " return torch.mean((y_ - y) ** 2)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "我们可以画出没有更新之前的模型和真实的模型之间的对比" ] }, { "cell_type": "code", "execution_count": 21, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "" ] }, "execution_count": 21, "metadata": {}, "output_type": "execute_result" }, { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "# 画出更新之前的模型\n", "y_pred = multi_linear(x_train)\n", "\n", "plt.plot(x_train.data.numpy()[:, 0], y_pred.data.numpy(), label='fitting curve', color='r')\n", "plt.plot(x_train.data.numpy()[:, 0], y_sample, label='real curve', color='b')\n", "plt.legend()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "可以发现,这两条曲线之间存在差异,我们计算一下他们之间的误差" ] }, { "cell_type": "code", "execution_count": 22, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "tensor(1144.2655, grad_fn=)\n" ] } ], "source": [ "# 计算误差,这里的误差和一元的线性模型的误差是相同的,前面已经定义过了 get_loss\n", "loss = get_loss(y_pred, y_train)\n", "print(loss)" ] }, { "cell_type": "code", "execution_count": 23, "metadata": {}, "outputs": [], "source": [ "# 自动求导\n", "loss.backward()" ] }, { "cell_type": "code", "execution_count": 24, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "tensor([[ -94.7455],\n", " [-139.1247],\n", " [-629.8584]])\n", "tensor([-25.7413])\n" ] } ], "source": [ "# 查看一下 w 和 b 的梯度\n", "print(w.grad)\n", "print(b.grad)" ] }, { "cell_type": "code", "execution_count": 25, "metadata": {}, "outputs": [], "source": [ "# 更新一下参数\n", "w.data = w.data - 0.001 * w.grad.data\n", "b.data = b.data - 0.001 * b.grad.data" ] }, { "cell_type": "code", "execution_count": 26, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "" ] }, "execution_count": 26, "metadata": {}, "output_type": "execute_result" }, { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "# 画出更新一次之后的模型\n", "y_pred = multi_linear(x_train)\n", "\n", "plt.plot(x_train.data.numpy()[:, 0], y_pred.data.numpy(), label='fitting curve', color='r')\n", "plt.plot(x_train.data.numpy()[:, 0], y_sample, label='real curve', color='b')\n", "plt.legend()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "因为只更新了一次,所以两条曲线之间的差异仍然存在,我们进行 100 次迭代" ] }, { "cell_type": "code", "execution_count": 27, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "epoch 20, Loss: 65.56586\n", "epoch 40, Loss: 15.41177\n", "epoch 60, Loss: 3.70702\n", "epoch 80, Loss: 0.97122\n", "epoch 100, Loss: 0.32874\n" ] } ], "source": [ "# 进行 100 次参数更新\n", "for e in range(100):\n", " y_pred = multi_linear(x_train)\n", " loss = get_loss(y_pred, y_train)\n", " \n", " w.grad.data.zero_()\n", " b.grad.data.zero_()\n", " loss.backward()\n", " \n", " # 更新参数\n", " w.data = w.data - 0.001 * w.grad.data\n", " b.data = b.data - 0.001 * b.grad.data\n", " if (e + 1) % 20 == 0:\n", " print('epoch {}, Loss: {:.5f}'.format(e+1, loss.data.item()))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "可以看到更新完成之后 loss 已经非常小了,我们画出更新之后的曲线对比" ] }, { "cell_type": "code", "execution_count": 28, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "" ] }, "execution_count": 28, "metadata": {}, "output_type": "execute_result" }, { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "# 画出更新之后的结果\n", "y_pred = multi_linear(x_train)\n", "\n", "plt.plot(x_train.data.numpy()[:, 0], y_pred.data.numpy(), label='fitting curve', color='r')\n", "plt.plot(x_train.data.numpy()[:, 0], y_sample, label='real curve', color='b')\n", "plt.legend()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "可以看到,经过 100 次更新之后,可以看到拟合的线和真实的线已经完全重合了" ] }, { "cell_type": "markdown", "metadata": { "collapsed": true }, "source": [ "## 4. 练习题\n", "\n", "上面的例子是一个三次的多项式,尝试使用二次的多项式去拟合它,看看最后能做到多好\n", "\n", "**提示:参数 `w = torch.randn(2, 1)`,同时重新构建 x 数据集**" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.9.7" } }, "nbformat": 4, "nbformat_minor": 2 }