2-autograd.ipynb

{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Automatic Differentiation\n",
"\n",
"Automatic differentiation is a key feature of PyTorch: it spares us from working out complicated derivatives by hand, which greatly reduces the time needed to build models. PyTorch's autograd module implements the backpropagation of derivatives used in deep learning algorithms; for every operation on a tensor (the Tensor class), autograd provides the derivative automatically, removing the tedious manual calculation.\n",
"\n",
"Before version 0.4, PyTorch used the `Variable` class to compute all gradients automatically. A `Variable` has three main attributes:\n",
"* data: the Tensor wrapped by the Variable;\n",
"* grad: the gradient of data; grad is itself a Variable rather than a Tensor, and has the same shape as data;\n",
"* grad_fn: a reference to a Function object that computes the gradients of the inputs during backpropagation.\n",
"\n",
"Since PyTorch 0.4, `Variable` has been merged into the `Tensor` class, and the automatic differentiation once provided by wrapping tensors in `Variable` is built into `Tensor` itself. For backward compatibility you can still write `Variable(tensor)`, but the wrapper does nothing.\n",
"\n",
"**New code should operate on `Tensor` directly, since the official documentation has deprecated the `Variable` module.**"
]
},
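{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a quick check (a minimal sketch using a throwaway tensor `t`), the three attributes listed above are now available directly on `Tensor`:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import torch\n",
"\n",
"# a tensor with requires_grad=True tracks operations for autograd,\n",
"# just as Variable did before PyTorch 0.4\n",
"t = torch.tensor([1.0], requires_grad=True)\n",
"out = (t * 3).sum()\n",
"out.backward()\n",
"\n",
"print(t.data)      # the underlying data, formerly Variable.data\n",
"print(t.grad)      # the accumulated gradient, formerly Variable.grad\n",
"print(out.grad_fn) # the Function that produced out, formerly Variable.grad_fn"
]
},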
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 1. Automatic Differentiation in the Simple Case\n",
"\n",
"Below are some simple cases of automatic differentiation. \"Simple\" means that the result of the computation is a scalar, i.e. a single number, and we differentiate this scalar automatically."
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"tensor([19.], grad_fn=<AddBackward0>)\n"
]
}
],
"source": [
"import torch\n",
"\n",
"x = torch.tensor([2.0], requires_grad=True)\n",
"y = x + 2\n",
"z = y ** 2 + 3\n",
"print(z)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Through the sequence of operations above we obtain the final result z from x, which we can write as the formula\n",
"\n",
"$$\n",
"z = (x + 2)^2 + 3\n",
"$$\n",
"\n",
"so the derivative of $z$ with respect to $x$ (at $x=2$) is\n",
"\n",
"$$\n",
"\\frac{\\partial z}{\\partial x} = 2 (x + 2) = 2 (2 + 2) = 8\n",
"$$\n",
"\n",
">If differentiation is unfamiliar, see this [introduction to derivatives](https://baike.baidu.com/item/%E5%AF%BC%E6%95%B0#1) for a refresher."
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"tensor([8.])\n"
]
}
],
"source": [
"# run automatic differentiation\n",
"z.backward()\n",
"print(x.grad)"
]
},
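{
"cell_type": "markdown",
"metadata": {},
"source": [
"We can also sanity-check this result numerically. The sketch below approximates $\\frac{\\partial z}{\\partial x}$ at $x=2$ with a central finite difference (the step size `eps` is an arbitrary choice for this illustration):"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# central difference: (f(x + eps) - f(x - eps)) / (2 * eps)\n",
"eps = 1e-4\n",
"f = lambda v: (v + 2) ** 2 + 3\n",
"numeric = (f(2.0 + eps) - f(2.0 - eps)) / (2 * eps)\n",
"print(numeric)  # should be very close to the autograd result 8"
]
},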
{
"cell_type": "markdown",
"metadata": {},
"source": [
"This simple example confirms that automatic differentiation works. Notice how convenient it is: we do not need to track the state of any intermediate variable. For a more complex computation, differentiating by hand can be very tedious, so the autograd mechanism spares us the laborious derivation. A more complex example follows."
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"tensor([[1., 2.],\n",
"        [3., 4.]], requires_grad=True)\n"
]
}
],
"source": [
"# define the variables\n",
"x = torch.tensor([1,2], dtype=torch.float, requires_grad=False)\n",
"b = torch.tensor([5,6], dtype=torch.float, requires_grad=False)\n",
"w = torch.tensor([[1,2],[3,4]], dtype=torch.float, requires_grad=True)\n",
"print(w)"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"z = torch.mean(torch.matmul(w, x) + b) # torch.matmul performs matrix multiplication\n",
"z.backward()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"> If matrix multiplication is unfamiliar, see this [explanation of matrix multiplication](https://baike.baidu.com/item/%E7%9F%A9%E9%98%B5%E4%B9%98%E6%B3%95/5446029?fr=aladdin) for a refresher."
]
},
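{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a quick check of the product above (a small sketch reusing the same `w`, `x`, and `b`):"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# matmul(w, x) = [1*1 + 2*2, 3*1 + 4*2] = [5, 11]\n",
"print(torch.matmul(w, x))\n",
"# adding the bias gives [10, 17], whose mean is 13.5\n",
"print(torch.matmul(w, x) + b)\n",
"print(torch.mean(torch.matmul(w, x) + b))"
]
},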
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"tensor([[0.5000, 1.0000],\n",
"        [0.5000, 1.0000]])\n"
]
}
],
"source": [
"# the gradient of w\n",
"print(w.grad)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Written out, the computation is:\n",
"$$\n",
"z_1 = w_{11} x_1 + w_{12} x_2 + b_1 \\\\\n",
"z_2 = w_{21} x_1 + w_{22} x_2 + b_2 \\\\\n",
"z = \\frac{1}{2} (z_1 + z_2)\n",
"$$"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The resulting partial derivatives are:\n",
"$$\n",
"\\frac{\\partial z}{\\partial w_{11}} = \\frac{1}{2} x_1 \\\\\n",
"\\frac{\\partial z}{\\partial w_{12}} = \\frac{1}{2} x_2 \\\\\n",
"\\frac{\\partial z}{\\partial w_{21}} = \\frac{1}{2} x_1 \\\\\n",
"\\frac{\\partial z}{\\partial w_{22}} = \\frac{1}{2} x_2\n",
"$$"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"In words: z is the mean over all elements of the matrix-vector product plus the bias, so each entry of w contributes through exactly one element, weighted by the corresponding entry of x and by the factor 1/2 from the mean. With PyTorch's automatic differentiation we obtain the derivative with respect to `w` effortlessly. Deep learning is full of large matrix computations, and working out these derivatives by hand costs time and effort, so automatic differentiation makes updating the network very convenient."
]
},
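{
"cell_type": "markdown",
"metadata": {},
"source": [
"The formulas above say that column $j$ of the gradient equals $x_j / 2$, so every row of `w.grad` should be `x / 2`. A minimal sketch checking this with `torch.allclose`:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# every row of the expected gradient is x / 2, since dz/dw_ij = x_j / 2\n",
"expected = x.expand(2, 2) / 2\n",
"print(expected)\n",
"print(torch.allclose(w.grad, expected))  # True"
]
},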
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 2. Automatic Differentiation in the Complex Case\n",
"\n",
"The cases above were simple: we always differentiated a scalar. How do we automatically differentiate a vector or a matrix?"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"tensor([[2., 3.]], requires_grad=True)\n",
"tensor([[0., 0.]])\n"
]
}
],
"source": [
"m = torch.tensor([[2, 3]], dtype=torch.float, requires_grad=True) # build a 1 x 2 matrix\n",
"n = torch.zeros(1, 2) # build a zero matrix of the same size\n",
"print(m)\n",
"print(n)"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"tensor(2., grad_fn=<SelectBackward0>)\n",
"tensor([[ 4., 27.]], grad_fn=<CopySlices>)\n"
]
}
],
"source": [
"# fill n with values computed from m\n",
"print(m[0,0])\n",
"n[0, 0] = m[0, 0] ** 2\n",
"n[0, 1] = m[0, 1] ** 3\n",
"print(n)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Writing the assignments above as a formula, we get\n",
"$$\n",
"n = (n_0,\\ n_1) = (m_0^2,\\ m_1^3) = (2^2,\\ 3^3) \n",
"$$"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Next we backpropagate through `n` directly, that is, we compute the derivative of `n` with respect to `m`.\n",
"\n",
"To do so we must be precise about what this derivative means, i.e. how to define\n",
"\n",
"$$\n",
"\\frac{\\partial n}{\\partial m} = \\frac{\\partial (n_0,\\ n_1)}{\\partial (m_0,\\ m_1)}\n",
"$$\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"In PyTorch, invoking automatic differentiation on a non-scalar requires passing an argument to `backward()` with the same shape as n, say $(w_0,\\ w_1)$; the result of automatic differentiation is then:\n",
"$$\n",
"\\frac{\\partial n}{\\partial m_0} = w_0 \\frac{\\partial n_0}{\\partial m_0} + w_1 \\frac{\\partial n_1}{\\partial m_0}\n",
"$$\n",
"$$\n",
"\\frac{\\partial n}{\\partial m_1} = w_0 \\frac{\\partial n_0}{\\partial m_1} + w_1 \\frac{\\partial n_1}{\\partial m_1}\n",
"$$"
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"n.backward(torch.ones_like(n)) # take (w0, w1) = (1, 1)"
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"tensor([[ 4., 27.]])\n"
]
}
],
"source": [
"print(m.grad)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Automatic differentiation gives the gradients 4 and 27, which we can verify by hand:\n",
"$$\n",
"\\frac{\\partial n}{\\partial m_0} = w_0 \\frac{\\partial n_0}{\\partial m_0} + w_1 \\frac{\\partial n_1}{\\partial m_0} = 2 m_0 + 0 = 2 \\times 2 = 4\n",
"$$\n",
"$$\n",
"\\frac{\\partial n}{\\partial m_1} = w_0 \\frac{\\partial n_0}{\\partial m_1} + w_1 \\frac{\\partial n_1}{\\partial m_1} = 0 + 3 m_1^2 = 3 \\times 3^2 = 27\n",
"$$\n",
"The manual calculation matches the autograd result."
]
},
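{
"cell_type": "markdown",
"metadata": {},
"source": [
"To see the role of the weight vector more clearly, the sketch below rebuilds the same computation (with fresh names `m2` and `n2`, since the earlier graph has already been freed) and passes $(w_0,\\ w_1) = (2,\\ 1)$ instead of all ones; the first gradient doubles while the second is unchanged:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# rebuild the graph: the previous one was freed by backward\n",
"m2 = torch.tensor([[2, 3]], dtype=torch.float, requires_grad=True)\n",
"n2 = torch.zeros(1, 2)\n",
"n2[0, 0] = m2[0, 0] ** 2\n",
"n2[0, 1] = m2[0, 1] ** 3\n",
"\n",
"n2.backward(torch.tensor([[2.0, 1.0]])) # (w0, w1) = (2, 1)\n",
"print(m2.grad)  # 2 * 2*m0 = 8, 1 * 3*m1^2 = 27"
]
},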
{
"cell_type": "markdown",
"metadata": {},
"source": [
"\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 3. Backpropagating More than Once\n",
"Calling backward performs automatic differentiation once. If we call backward again, the program raises an error: by default PyTorch discards the computation graph after one backward pass, so differentiating twice requires setting a flag explicitly. The small example below shows how."
]
},
{
"cell_type": "code",
"execution_count": 17,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"tensor([18.], grad_fn=<AddBackward0>)\n"
]
}
],
"source": [
"x = torch.tensor([3], dtype=torch.float, requires_grad=True)\n",
"y = x * 2 + x ** 2 + 3\n",
"print(y)"
]
},
{
"cell_type": "code",
"execution_count": 18,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"y.backward(retain_graph=True) # set retain_graph=True to keep the computation graph"
]
},
{
"cell_type": "code",
"execution_count": 19,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"tensor([8.])\n"
]
}
],
"source": [
"print(x.grad)"
]
},
{
"cell_type": "code",
"execution_count": 20,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"y.backward() # differentiate once more; this time the graph is not retained"
]
},
{
"cell_type": "code",
"execution_count": 21,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"tensor([16.])\n"
]
}
],
"source": [
"print(x.grad)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The gradient of x has become 16: backward ran twice and gradients accumulate, so the first gradient of 8 and the second gradient of 8 were added together, giving 16."
]
},
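{
"cell_type": "markdown",
"metadata": {},
"source": [
"In practice we usually reset the gradient between backward passes so the results do not mix. A minimal sketch of the standard pattern: clear the accumulated gradient in place with `x.grad.zero_()` before the next pass, and note that a `backward()` call on a freed graph raises a RuntimeError (caught below only to show the message):"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"x = torch.tensor([3], dtype=torch.float, requires_grad=True)\n",
"y = x * 2 + x ** 2 + 3\n",
"\n",
"y.backward(retain_graph=True)\n",
"print(x.grad)   # tensor([8.])\n",
"\n",
"x.grad.zero_()  # clear the accumulated gradient in place\n",
"y.backward()    # second pass; the graph is freed afterwards\n",
"print(x.grad)   # tensor([8.]) again, instead of 16\n",
"\n",
"try:\n",
"    y.backward()  # no graph left, so this fails\n",
"except RuntimeError as e:\n",
"    print('RuntimeError:', e)"
]
},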
{
"cell_type": "markdown",
"metadata": {},
"source": [
"\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 4. Exercise\n",
"\n",
"Define\n",
"\n",
"$$\n",
"x = \n",
"\\left[\n",
"\\begin{matrix}\n",
"x_0 \\\\\n",
"x_1\n",
"\\end{matrix}\n",
"\\right] = \n",
"\\left[\n",
"\\begin{matrix}\n",
"2 \\\\\n",
"3\n",
"\\end{matrix}\n",
"\\right]\n",
"$$\n",
"\n",
"$$\n",
"k = (k_0,\\ k_1) = (x_0^2 + 3 x_1,\\ 2 x_0 + x_1^2)\n",
"$$\n",
"\n",
"and compute the Jacobian\n",
"\n",
"$$\n",
"j = \\left[\n",
"\\begin{matrix}\n",
"\\frac{\\partial k_0}{\\partial x_0} & \\frac{\\partial k_0}{\\partial x_1} \\\\\n",
"\\frac{\\partial k_1}{\\partial x_0} & \\frac{\\partial k_1}{\\partial x_1}\n",
"\\end{matrix}\n",
"\\right]\n",
"$$\n"
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"x = torch.tensor([2, 3], dtype=torch.float, requires_grad=True)\n",
"k = torch.zeros(2)\n",
"\n",
"k[0] = x[0] ** 2 + 3 * x[1]\n",
"k[1] = x[1] ** 2 + 2 * x[0]"
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"tensor([13., 13.], grad_fn=<CopySlices>)\n",
"tensor([4., 3.])\n",
"tensor([2., 6.])\n"
]
}
],
"source": [
"# gradient of k_0 with respect to (x_0, x_1)\n",
"j = torch.zeros(2, 2)\n",
"k.backward(torch.FloatTensor([1, 0]), retain_graph=True)\n",
"print(k)\n",
"j[0] = x.grad.data\n",
"print(x.grad.data)\n",
"\n",
"x.grad.data.zero_() # zero out the previously accumulated gradient\n",
"\n",
"# gradient of k_1 with respect to (x_0, x_1)\n",
"k.backward(torch.FloatTensor([0, 1]))\n",
"j[1] = x.grad.data\n",
"print(x.grad.data)\n"
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {
"scrolled": true
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"tensor([[4., 3.],\n",
"        [2., 6.]])\n"
]
}
],
"source": [
"print(j)"
]
},
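{
"cell_type": "markdown",
"metadata": {},
"source": [
"Recent PyTorch releases (1.5 and later) also provide `torch.autograd.functional.jacobian`, which computes the whole Jacobian in one call. A sketch cross-checking the result above, assuming such a version is installed:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from torch.autograd.functional import jacobian\n",
"\n",
"def f(v):\n",
"    # the same mapping as above: k = (v0^2 + 3*v1, 2*v0 + v1^2)\n",
"    return torch.stack([v[0] ** 2 + 3 * v[1], 2 * v[0] + v[1] ** 2])\n",
"\n",
"print(jacobian(f, torch.tensor([2.0, 3.0])))  # tensor([[4., 3.], [2., 6.]])"
]
},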
{
"cell_type": "code",
"execution_count": 25,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"tensor([2., 3., 4.], requires_grad=True)\n",
"tensor([2., 0., 0.])\n"
]
}
],
"source": [
"# demo to show how to use `.backward`\n",
"x = torch.tensor([2,3,4], dtype=torch.float, requires_grad=True)\n",
"print(x)\n",
"y = x*2\n",
"\n",
"y.backward(torch.tensor([1, 0, 0], dtype=torch.float))\n",
"print(x.grad)"
]
},
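{
"cell_type": "markdown",
"metadata": {},
"source": [
"`torch.autograd.grad` is a functional alternative to `.backward()`: it returns the gradient directly instead of accumulating it into `.grad`, which avoids the manual zeroing used above. A short sketch on the same `y = 2x`:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"x = torch.tensor([2, 3, 4], dtype=torch.float, requires_grad=True)\n",
"y = x * 2\n",
"\n",
"# returns a tuple of gradients, one per input, without touching x.grad\n",
"(g,) = torch.autograd.grad(y, x, grad_outputs=torch.tensor([1.0, 0.0, 0.0]))\n",
"print(g)       # tensor([2., 0., 0.])\n",
"print(x.grad)  # None: nothing was accumulated"
]
},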
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## References\n",
"* [PyTorch's Autograd](https://zhuanlan.zhihu.com/p/69294347)\n",
"* [PyTorch study notes: automatic differentiation (AutoGrad)](https://zhuanlan.zhihu.com/p/102942725)\n",
"* [PyTorch Autograd (the automatic differentiation mechanism)](https://www.cnblogs.com/wangqinze/p/13418291.html)"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.5.4"
}
},
"nbformat": 4,
"nbformat_minor": 2
}
