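{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Automatic Differentiation\n", "\n", "Automatic differentiation is one of PyTorch's most important features. It spares us from deriving complicated derivatives by hand, which greatly reduces the time needed to build a model. PyTorch's Autograd module implements the backpropagation used by deep learning algorithms: for every operation on a tensor (the `Tensor` class), Autograd can compute the derivative automatically.\n", "\n", "Before PyTorch 0.4, gradients were computed through the `Variable` class, which had three main attributes:\n", "* data: the Tensor wrapped by the Variable;\n", "* grad: the gradient of data; grad is itself a Variable rather than a Tensor, and has the same shape as data;\n", "* grad_fn: a reference to a Function object, which is used during backpropagation to compute the gradient with respect to the inputs.\n", "\n", "Since PyTorch 0.4, `Variable` has been merged into the `Tensor` class, so the automatic differentiation that used to require wrapping a tensor in a `Variable` is now built into `Tensor`. For backward compatibility, wrapping with `Variable(tensor)` still works, but it no longer does anything.\n", "\n", "**New code should use the `Tensor` class directly, because the official documentation marks `Variable` as deprecated.**" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "## 1. Automatic differentiation in simple cases\n", "\n", "The examples below are \"simple\" in the sense that the final result is a scalar, that is, a single number, and we differentiate that scalar automatically." ] },
{ "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "tensor([19.], grad_fn=<AddBackward0>)\n" ] } ], "source": [ "import torch\n", "\n", "x = torch.tensor([2.0], requires_grad=True)\n", "y = x + 2\n", "z = y ** 2 + 3\n", "print(z)" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "Through the series of operations above we computed the final result `z` from `x`. Written as a formula:\n", "\n", "$$\n", "z = (x + 2)^2 + 3\n", "$$\n", "\n", "The derivative of $z$ with respect to $x$, evaluated at $x=2$, is\n", "\n", "$$\n", "\\frac{\\partial z}{\\partial x} = 2 (x + 2) = 2 (2 + 2) = 8\n", "$$\n", "\n", "> If you are not familiar with derivatives, see [this introduction to derivatives](https://baike.baidu.com/item/%E5%AF%BC%E6%95%B0#1) for a refresher." ] },
{ "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "tensor([8.])\n" ] } ], "source": [ "# Use automatic differentiation\n", "z.backward()\n", "print(x.grad)" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "This simple example confirms that automatic differentiation works, and shows how convenient it is: we do not need to keep track of the intermediate variables. For a more complicated function, differentiating by hand can be very tedious, so automatic differentiation saves us from lengthy derivations. A more complex example follows." ] },
{ "cell_type": "code", "execution_count": 3, "metadata": 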
{}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "tensor([[1., 2.],\n", "        [3., 4.]], requires_grad=True)\n" ] } ], "source": [ "# Define the variables\n", "x = torch.tensor([1,2], dtype=torch.float, requires_grad=False)\n", "b = torch.tensor([5,6], dtype=torch.float, requires_grad=False)\n", "w = torch.tensor([[1,2],[3,4]], dtype=torch.float, requires_grad=True)\n", "print(w)" ] },
{ "cell_type": "code", "execution_count": 4, "metadata": { "collapsed": true }, "outputs": [], "source": [ "z = torch.mean(torch.matmul(w, x) + b) # torch.matmul performs matrix multiplication\n", "z.backward()" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "> If you are not familiar with matrix multiplication, see [this explanation of matrix multiplication](https://baike.baidu.com/item/%E7%9F%A9%E9%98%B5%E4%B9%98%E6%B3%95/5446029?fr=aladdin) for a refresher." ] },
{ "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "tensor([[0.5000, 1.0000],\n", "        [0.5000, 1.0000]])\n" ] } ], "source": [ "# Gradient of z with respect to w\n", "print(w.grad)" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "The computation is:\n", "$$\n", "z_1 = w_{11}*x_1 + w_{12}*x_2 + b_1 \\\\\n", "z_2 = w_{21}*x_1 + w_{22}*x_2 + b_2 \\\\\n", "z = \\frac{1}{2} (z_1 + z_2)\n", "$$" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "The partial derivatives are then:\n", "$$\n", "\\frac{\\partial z}{\\partial w_{11}} = \\frac{1}{2} x_1 \\\\\n", "\\frac{\\partial z}{\\partial w_{12}} = \\frac{1}{2} x_2 \\\\\n", "\\frac{\\partial z}{\\partial w_{21}} = \\frac{1}{2} x_1 \\\\\n", "\\frac{\\partial z}{\\partial w_{22}} = \\frac{1}{2} x_2\n", "$$" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "The formulas above say: multiply the matrix `w` by the vector `x`, add `b` element-wise, and then take the mean of all elements. With $x = (1, 2)$ the partial derivatives evaluate to 0.5 and 1 in each row, which matches the printed `w.grad`. With PyTorch's automatic differentiation the derivative with respect to `w` is easy to obtain. Deep learning is full of matrix operations, and deriving these gradients by hand takes a lot of time and effort; automatic differentiation supplies the gradients needed to update a network's parameters." ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "## 2. Automatic differentiation in more complex cases\n", "\n", "The examples above all differentiated a scalar. How do we differentiate a vector or a matrix?" 
] },
{ "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "tensor([[2., 3.]], requires_grad=True)\n", "tensor([[0., 0.]])\n" ] } ], "source": [ "m = torch.tensor([[2, 3]], dtype=torch.float, requires_grad=True) # build a 1 x 2 matrix\n", "n = torch.zeros(1, 2) # build a zero matrix of the same size\n", "print(m)\n", "print(n)" ] },
{ "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "tensor(2., grad_fn=<SelectBackward0>)\n", "tensor([[ 4., 27.]], grad_fn=<CopySlices>)\n" ] } ], "source": [ "# Compute the new values of n from the values in m\n", "print(m[0,0])\n", "n[0, 0] = m[0, 0] ** 2\n", "n[0, 1] = m[0, 1] ** 3\n", "print(n)" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "Written as a formula:\n", "$$\n", "n = (n_0,\\ n_1) = (m_0^2,\\ m_1^3) = (2^2,\\ 3^3) \n", "$$" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "Next we backpropagate through `n` directly, that is, we compute the derivative of `n` with respect to `m`.\n", "\n", "First we need to be clear about what this derivative means, that is, how to define\n", "\n", "$$\n", "\\frac{\\partial n}{\\partial m} = \\frac{\\partial (n_0,\\ n_1)}{\\partial (m_0,\\ m_1)}\n", "$$\n" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "In PyTorch, when the output is not a scalar, `backward()` must be given an argument with the same shape as `n`, say $(w_0,\\ w_1)$. The result of automatic differentiation is then the weighted sum:\n", "$$\n", "\\frac{\\partial n}{\\partial m_0} = w_0 \\frac{\\partial n_0}{\\partial m_0} + w_1 \\frac{\\partial n_1}{\\partial m_0}\n", "$$\n", "$$\n", "\\frac{\\partial n}{\\partial m_1} = w_0 \\frac{\\partial n_0}{\\partial m_1} + w_1 \\frac{\\partial n_1}{\\partial m_1}\n", "$$" ] },
{ "cell_type": "code", "execution_count": 8, "metadata": { "collapsed": true }, "outputs": [], "source": [ "n.backward(torch.ones_like(n)) # take (w0, w1) to be (1, 1)" ] },
{ "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "tensor([[ 4., 27.]])\n" ] } ], "source": [ "print(m.grad)" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "Automatic differentiation gives gradients of 4 and 27, which we can verify by hand with $w_0 = w_1 = 1$:\n", "$$\n", "\\frac{\\partial n}{\\partial m_0} = w_0 \\frac{\\partial n_0}{\\partial m_0} + w_1 \\frac{\\partial n_1}{\\partial m_0} = 2 m_0 + 0 = 2 \\times 2 = 4\n", "$$\n", "$$\n", "\\frac{\\partial n}{\\partial m_1} = w_0 \\frac{\\partial n_0}{\\partial m_1} + w_1 \\frac{\\partial n_1}{\\partial m_1} = 0 + 3 m_1^2 = 3 \\times 3^2 = 27\n", "$$\n", "The hand calculation gives the same result." ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "\n" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "## 3. Repeated automatic differentiation\n", "Calling `backward` performs automatic differentiation once. If we call `backward` a second time, the program raises an error, because by default PyTorch discards the computation graph after one backward pass. To differentiate twice, the first call must be told to keep the graph with `retain_graph=True`, as the small example below shows." ] },
{ "cell_type": "code", "execution_count": 17, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "tensor([18.], grad_fn=<AddBackward0>)\n" ] } ], "source": [ "x = torch.tensor([3], dtype=torch.float, requires_grad=True)\n", "y = x * 2 + x ** 2 + 3\n", "print(y)" ] },
{ "cell_type": "code", "execution_count": 18, "metadata": { "collapsed": true }, "outputs": [], "source": [ "y.backward(retain_graph=True) # set retain_graph to True to keep the computation graph" ] },
{ "cell_type": "code", "execution_count": 19, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "tensor([8.])\n" ] } ], "source": [ "print(x.grad)" ] },
{ "cell_type": "code", "execution_count": 20, "metadata": { "collapsed": true }, "outputs": [], "source": [ "y.backward() # differentiate again; this time the graph is not retained" ] },
{ "cell_type": "code", "execution_count": 21, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "tensor([16.])\n" ] } ], "source": [ "print(x.grad)" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "The gradient of `x` is now 16. We called `backward` twice and PyTorch accumulates gradients in `x.grad`, so the first gradient of 8 and the second gradient of 8 add up to 16." ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "\n" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "## 4. Exercise\n", "\n", "Define\n", "\n", "$$\n", "x = \n", "\\left[\n", "\\begin{matrix}\n", "x_0 \\\\\n", "x_1\n", "\\end{matrix}\n", "\\right] = \n", "\\left[\n", "\\begin{matrix}\n", "2 \\\\\n", "3\n", "\\end{matrix}\n", "\\right]\n", "$$\n", "\n", "$$\n", "k = (k_0,\\ k_1) = (x_0^2 + 3 x_1,\\ 2 x_0 + x_1^2)\n", "$$\n", "\n", "We want to compute the Jacobian\n", "\n", "$$\n", "j = \\left[\n", "\\begin{matrix}\n", "\\frac{\\partial k_0}{\\partial x_0} & \\frac{\\partial k_0}{\\partial x_1} \\\\\n", "\\frac{\\partial k_1}{\\partial x_0} & \\frac{\\partial k_1}{\\partial x_1}\n", "\\end{matrix}\n", "\\right]\n", "$$\n" ] },
{ "cell_type": "code", "execution_count": 10, "metadata": { "collapsed": true }, "outputs": [], "source": [ "x = torch.tensor([2, 3], dtype=torch.float, requires_grad=True)\n", "k = torch.zeros(2)\n", "\n", "k[0] = x[0] ** 2 + 3 * x[1]\n", "k[1] = x[1] ** 2 + 2 * x[0]" ] },
{ "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "tensor([13., 13.], grad_fn=<CopySlices>)\n", "tensor([4., 3.])\n", "tensor([2., 6.])\n" ] } ], "source": [ "# Gradient of k_0 with respect to (x_0, x_1): first row of the Jacobian\n", "j = torch.zeros(2, 2)\n", "k.backward(torch.tensor([1.0, 0.0]), retain_graph=True)\n", "print(k)\n", "j[0] = x.grad\n", "print(x.grad)\n", "\n", "x.grad.zero_() # reset the accumulated gradient\n", "\n", "# Gradient of k_1 with respect to (x_0, x_1): second row of the Jacobian\n", "k.backward(torch.tensor([0.0, 1.0]))\n", "j[1] = x.grad\n", "print(x.grad)\n" ] },
{ "cell_type": "code", "execution_count": 12, "metadata": { "scrolled": true }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "tensor([[4., 3.],\n", "        [2., 6.]])\n" ] } ], "source": [ "print(j)" ] },
{ "cell_type": "code", "execution_count": 25, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "tensor([2., 3., 4.], requires_grad=True)\n", "tensor([2., 0., 0.])\n" ] } ], "source": [ "# Demo of how the argument passed to `.backward` weights the outputs\n", "x = torch.tensor([2,3,4], dtype=torch.float, requires_grad=True)\n", "print(x)\n", "y = x*2\n", "\n", "y.backward(torch.tensor([1, 0, 0], dtype=torch.float))\n", "print(x.grad)" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "## References\n", "* [PyTorch 的 Autograd](https://zhuanlan.zhihu.com/p/69294347)\n", "* [PyTorch学习笔记之自动求导(AutoGrad)](https://zhuanlan.zhihu.com/p/102942725)\n", "* [Pytorch Autograd (自动求导机制)](https://www.cnblogs.com/wangqinze/p/13418291.html)" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.5.4" } }, "nbformat": 4, "nbformat_minor": 2 }