{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# 深层神经网络\n", "\n", "前一节简要介绍了PyTorch的神经网络实现,同时示范了如何用神经网络构建一个复杂的非线性二分类器。针对图像分类的问题,下面用深度学习的入门级数据集 MNIST 手写体分类来说明深层神经网络的优良表现。\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 1. MNIST 数据集\n", "\n", "MNIS数据集是一个非常出名的数据集,基本上很多网络都将其作为一个测试的标准,其来自美国国家标准与技术研究所, National Institute of Standards and Technology (NIST)。 训练集 (training set) 由来自 250 个不同人手写的数字构成, 其中 50% 是高中学生, 50% 来自人口普查局 (the Census Bureau) 的工作人员,一共有 60000 张图片。 测试集(test set) 也是同样比例的手写数字数据,一共有 10000 张图片。\n", "\n", "每张图片大小是 28 x 28 的灰度图,如下\n", "\n", "![MNIS](imgs/MNIST.jpeg)\n", "\n", "任务就是给出一张图片,希望区别出其到底属于 0 到 9 这 10 个数字中的哪一个。\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 2. 多分类问题\n", "\n", "前面讲过二分类问题,现在处理的问题更加复杂,是一个 10 分类问题,统称为多分类问题,对于多分类问题, loss 函数使用一个更加复杂的函数,叫交叉熵。\n", "\n", "### 2.1 softmax\n", "提到交叉熵,先讲一下 softmax 函数,前面我们见过了 sigmoid 函数,如下\n", "\n", "$$s(x) = \\frac{1}{1 + e^{-x}}$$\n", "\n", "可以将任何一个值转换到 0 ~ 1 之间,当然对于一个二分类问题,这样就足够了,因为对于二分类问题,如果不属于第一类,那么必定属于第二类,所以只需要用一个值来表示其属于其中一类概率,但是对于多分类问题,这样并不行,需要知道其属于每一类的概率,这个时候就需要 softmax 函数了。\n", "\n", "softmax 函数示例如下\n", "\n", "![softmax](imgs/softmax.jpeg)\n", "\n", "对于网络的输出 $z_1, z_2, \\cdots z_k$,我们首先对他们每个都取指数变成 $e^{z_1}, e^{z_2}, \\cdots, e^{z_k}$,那么每一项都除以他们的求和,也就是\n", "\n", "$$\n", "z_i \\rightarrow \\frac{e^{z_i}}{\\sum_{j=1}^{k} e^{z_j}}\n", "$$\n", "\n", "如果对经过 softmax 函数的所有项求和就等于 1,所以他们每一项都分别表示属于其中某一类的概率。" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 2.2 交叉熵\n", "\n", "交叉熵衡量两个分布相似性的一种度量方式,前面讲的二分类问题的 loss 函数就是交叉熵的一种特殊情况,交叉熵的一般公式为\n", "\n", "$$\n", "cross\\_entropy(p, q) = E_{p}[-\\log q] = - \\frac{1}{m} \\sum_{x} p(x) \\log q(x)\n", "$$\n", "\n", "对于二分类问题我们可以写成\n", "\n", "$$\n", "-\\frac{1}{m} \\sum_{i=1}^m (y^{i} \\log sigmoid(x^{i}) + (1 - y^{i}) \\log (1 - sigmoid(x^{i}))\n", "$$\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 2.3 示例程序" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], 
"source": [ "import numpy as np\n", "import torch\n", "from torchvision.datasets import mnist # 导入 pytorch 内置的 mnist 数据\n", "\n", "from torch import nn\n", "from torch.autograd import Variable" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [], "source": [ "# 使用内置函数下载 mnist 数据集\n", "train_set = mnist.MNIST('../data/mnist', train=True, download=True)\n", "test_set = mnist.MNIST('../data/mnist', train=False, download=True)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "我们可以看看其中的一个数据是什么样子的" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [], "source": [ "a_data, a_label = train_set[0]" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "data": { "image/png": "iVBORw0KGgoAAAANSUhEUgAAABwAAAAcCAAAAABXZoBIAAABAElEQVR4nGNgGMyAWUhIqK5jvdSy/9/rGRgYGFhgEnJsVjYCwQwMDAxPJgV+vniQgYGBgREqZ7iXH8r6l/SV4dn7m8gmCt3++/fv37/Htn3/iMW+gDnZf/+e5WbQnoXNNXyMs/5GoQoxwVmf/n9kSGFiwAW49/11wynJoPzx4YIcRlyygR/+/i2XxCWru+vv32nSuGQFYv/83Y3b4p9/fzpAmSyoMnohpiwM1w5h06Q+5enfv39/bcMiJVF09+/fv39P+mFKiTtd/fv3799jgZiBJLT69t+/f/8eDuDEkDJf8+jv379/v7Ryo4qzMDAwMAQGMjBc3/y35wM2V1IfAABFF16Aa0wAOwAAAABJRU5ErkJggg==\n", "text/plain": [ "" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "a_data" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "5" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "a_label" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "这里的读入的数据是 PIL 库中的格式,我们可以非常方便地将其转换为 numpy array" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "(28, 28)\n" ] } ], "source": [ "a_data = np.array(a_data, dtype='float32')\n", "print(a_data.shape)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "这里我们可以看到这种图片的大小是 28 x 28" ] }, { "cell_type": "code", 
"execution_count": 8, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[[ 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.\n", " 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]\n", " [ 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.\n", " 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]\n", " [ 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.\n", " 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]\n", " [ 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.\n", " 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]\n", " [ 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.\n", " 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]\n", " [ 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 3. 18.\n", " 18. 18. 126. 136. 175. 26. 166. 255. 247. 127. 0. 0. 0. 0.]\n", " [ 0. 0. 0. 0. 0. 0. 0. 0. 30. 36. 94. 154. 170. 253.\n", " 253. 253. 253. 253. 225. 172. 253. 242. 195. 64. 0. 0. 0. 0.]\n", " [ 0. 0. 0. 0. 0. 0. 0. 49. 238. 253. 253. 253. 253. 253.\n", " 253. 253. 253. 251. 93. 82. 82. 56. 39. 0. 0. 0. 0. 0.]\n", " [ 0. 0. 0. 0. 0. 0. 0. 18. 219. 253. 253. 253. 253. 253.\n", " 198. 182. 247. 241. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]\n", " [ 0. 0. 0. 0. 0. 0. 0. 0. 80. 156. 107. 253. 253. 205.\n", " 11. 0. 43. 154. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]\n", " [ 0. 0. 0. 0. 0. 0. 0. 0. 0. 14. 1. 154. 253. 90.\n", " 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]\n", " [ 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 139. 253. 190.\n", " 2. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]\n", " [ 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 11. 190. 253.\n", " 70. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]\n", " [ 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 35. 241.\n", " 225. 160. 108. 1. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]\n", " [ 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 81.\n", " 240. 253. 253. 119. 25. 0. 0. 0. 0. 0. 0. 0. 0. 0.]\n", " [ 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.\n", " 45. 186. 253. 253. 150. 27. 0. 0. 0. 0. 0. 0. 0. 0.]\n", " [ 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.\n", " 0. 16. 93. 252. 253. 187. 0. 0. 0. 0. 0. 0. 0. 0.]\n", " [ 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 
0.\n", " 0. 0. 0. 249. 253. 249. 64. 0. 0. 0. 0. 0. 0. 0.]\n", " [ 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.\n", " 46. 130. 183. 253. 253. 207. 2. 0. 0. 0. 0. 0. 0. 0.]\n", " [ 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 39. 148.\n", " 229. 253. 253. 253. 250. 182. 0. 0. 0. 0. 0. 0. 0. 0.]\n", " [ 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 24. 114. 221. 253.\n", " 253. 253. 253. 201. 78. 0. 0. 0. 0. 0. 0. 0. 0. 0.]\n", " [ 0. 0. 0. 0. 0. 0. 0. 0. 23. 66. 213. 253. 253. 253.\n", " 253. 198. 81. 2. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]\n", " [ 0. 0. 0. 0. 0. 0. 18. 171. 219. 253. 253. 253. 253. 195.\n", " 80. 9. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]\n", " [ 0. 0. 0. 0. 55. 172. 226. 253. 253. 253. 253. 244. 133. 11.\n", " 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]\n", " [ 0. 0. 0. 0. 136. 253. 253. 253. 212. 135. 132. 16. 0. 0.\n", " 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]\n", " [ 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.\n", " 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]\n", " [ 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.\n", " 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]\n", " [ 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.\n", " 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 
0.]]\n" ] } ], "source": [ "print(a_data)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can display the array; 0 means black and 255 means white\n", "\n", "The first layer of our network takes 28 x 28 = 784 inputs, so we must transform the data, using reshape to flatten each image into a one-dimensional vector" ] }, { "cell_type": "code", "execution_count": 22, "metadata": {}, "outputs": [], "source": [ "def data_tf(x):\n", "    x = np.array(x, dtype='float32') / 255\n", "    x = (x - 0.5) / 0.5 # standardize to [-1, 1]; this technique is covered later\n", "    x = x.reshape((-1,)) # flatten into a one-dimensional vector\n", "    x = torch.from_numpy(x)\n", "    return x\n", "\n", "train_set = mnist.MNIST('../data/mnist', train=True, transform=data_tf, download=True) # reload the dataset, passing the transform defined above\n", "test_set = mnist.MNIST('../data/mnist', train=False, transform=data_tf, download=True)" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "torch.Size([784])\n", "5\n" ] } ], "source": [ "a, a_label = train_set[0]\n", "print(a.shape)\n", "print(a_label)" ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [], "source": [ "from torch.utils.data import DataLoader\n", "\n", "# define data iterators with PyTorch's built-in DataLoader\n", "train_data = DataLoader(train_set, batch_size=64, shuffle=True)\n", "test_data = DataLoader(test_set, batch_size=128, shuffle=False)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Using a data iterator like this is essential: when the dataset is too large to load into memory all at once, we need a Python iterator that yields one batch of data at a time" ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [], "source": [ "a, a_label = next(iter(train_data))" ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "torch.Size([64, 784])\n", "torch.Size([64])\n" ] } ], "source": [ "# print the size of one batch of data\n", "print(a.shape)\n", "print(a_label.shape)" ] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [], "source": [ "# define a 4-layer neural network with Sequential\n", "net = nn.Sequential(\n", "    nn.Linear(784, 400),\n", " 
nn.ReLU(),\n", "    nn.Linear(400, 200),\n", "    nn.ReLU(),\n", "    nn.Linear(200, 100),\n", "    nn.ReLU(),\n", "    nn.Linear(100, 10)\n", ")" ] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "Sequential(\n", "  (0): Linear(in_features=784, out_features=400, bias=True)\n", "  (1): ReLU()\n", "  (2): Linear(in_features=400, out_features=200, bias=True)\n", "  (3): ReLU()\n", "  (4): Linear(in_features=200, out_features=100, bias=True)\n", "  (5): ReLU()\n", "  (6): Linear(in_features=100, out_features=10, bias=True)\n", ")" ] }, "execution_count": 15, "metadata": {}, "output_type": "execute_result" } ], "source": [ "net" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Cross-entropy is built into PyTorch. A hand-written softmax followed by a logarithm tends to be numerically unstable, and the built-in function already takes care of this. `nn.CrossEntropyLoss` expects the raw network outputs (logits) and applies log-softmax internally, which is why the network above has no softmax layer" ] }, { "cell_type": "code", "execution_count": 16, "metadata": {}, "outputs": [], "source": [ "# define the loss function\n", "criterion = nn.CrossEntropyLoss()\n", "optimizer = torch.optim.SGD(net.parameters(), 1e-1) # stochastic gradient descent with learning rate 0.1" ] }, { "cell_type": "code", "execution_count": 17, "metadata": { "scrolled": true }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "epoch: 0, Train Loss: 0.515279, Train Acc: 0.833889, Eval Loss: 0.162182, Eval Acc: 0.949367\n", "epoch: 1, Train Loss: 0.164546, Train Acc: 0.948244, Eval Loss: 0.121298, Eval Acc: 0.962025\n", "epoch: 2, Train Loss: 0.116251, Train Acc: 0.963669, Eval Loss: 0.160981, Eval Acc: 0.951543\n", "epoch: 3, Train Loss: 0.091204, Train Acc: 0.971149, Eval Loss: 0.098640, Eval Acc: 0.970036\n", "epoch: 4, Train Loss: 0.075570, Train Acc: 0.975796, Eval Loss: 0.125001, Eval Acc: 0.960839\n", "epoch: 5, Train Loss: 0.058536, Train Acc: 0.981710, Eval Loss: 0.072245, Eval Acc: 0.975475\n", "epoch: 6, Train Loss: 0.052349, Train Acc: 0.982743, Eval Loss: 0.082497, Eval Acc: 0.974782\n", "epoch: 7, Train Loss: 0.051543, Train Acc: 0.984125, Eval Loss: 0.065229, Eval Acc: 0.979727\n", "epoch: 8, Train Loss: 0.039741, Train Acc: 0.987257, 
Eval Loss: 0.116367, Eval Acc: 0.964893\n", "epoch: 9, Train Loss: 0.033266, Train Acc: 0.989489, Eval Loss: 0.071046, Eval Acc: 0.978441\n", "epoch: 10, Train Loss: 0.029305, Train Acc: 0.990039, Eval Loss: 0.087192, Eval Acc: 0.975771\n", "epoch: 11, Train Loss: 0.026703, Train Acc: 0.991388, Eval Loss: 0.067075, Eval Acc: 0.980617\n", "epoch: 12, Train Loss: 0.021403, Train Acc: 0.992970, Eval Loss: 0.063208, Eval Acc: 0.982002\n", "epoch: 13, Train Loss: 0.238340, Train Acc: 0.962787, Eval Loss: 0.122586, Eval Acc: 0.962124\n", "epoch: 14, Train Loss: 0.070087, Train Acc: 0.977046, Eval Loss: 0.134682, Eval Acc: 0.961432\n", "epoch: 15, Train Loss: 0.049751, Train Acc: 0.983575, Eval Loss: 0.078269, Eval Acc: 0.977650\n", "epoch: 16, Train Loss: 0.040535, Train Acc: 0.986657, Eval Loss: 0.069318, Eval Acc: 0.980914\n", "epoch: 17, Train Loss: 0.033759, Train Acc: 0.988739, Eval Loss: 0.075110, Eval Acc: 0.979035\n", "epoch: 18, Train Loss: 0.028471, Train Acc: 0.990672, Eval Loss: 0.079602, Eval Acc: 0.977551\n", "epoch: 19, Train Loss: 0.027123, Train Acc: 0.991021, Eval Loss: 0.078461, Eval Acc: 0.979233\n" ] } ], "source": [ "# start training\n", "losses = []\n", "acces = []\n", "eval_losses = []\n", "eval_acces = []\n", "\n", "for e in range(20):\n", "    train_loss = 0\n", "    train_acc = 0\n", "    net.train()\n", "    for im, label in train_data:\n", "        im = Variable(im) # Variable is a no-op since PyTorch 0.4\n", "        label = Variable(label)\n", "        # forward pass\n", "        out = net(im)\n", "        loss = criterion(out, label)\n", "        # backward pass\n", "        optimizer.zero_grad()\n", "        loss.backward()\n", "        optimizer.step()\n", "        # accumulate the loss\n", "        train_loss += loss.item()\n", "        # compute the classification accuracy\n", "        _, pred = out.max(1)\n", "        num_correct = float((pred == label).sum().item())\n", "        acc = num_correct / im.shape[0]\n", "        train_acc += acc\n", "\n", "    losses.append(train_loss / len(train_data))\n", "    acces.append(train_acc / len(train_data))\n", "    # evaluate on the test set\n", "    eval_loss = 0\n", "    eval_acc = 0\n", "    net.eval() # switch the model to evaluation mode\n", "    for im, label in 
test_data:\n", "        im = Variable(im)\n", "        label = Variable(label)\n", "        out = net(im)\n", "        loss = criterion(out, label)\n", "        # accumulate the loss\n", "        eval_loss += loss.item()\n", "        # accumulate the accuracy\n", "        _, pred = out.max(1)\n", "        num_correct = float((pred == label).sum().item())\n", "        acc = num_correct / im.shape[0]\n", "        eval_acc += acc\n", "\n", "    eval_losses.append(eval_loss / len(test_data))\n", "    eval_acces.append(eval_acc / len(test_data))\n", "    print('epoch: {}, Train Loss: {:.6f}, Train Acc: {:.6f}, Eval Loss: {:.6f}, Eval Acc: {:.6f}'\n", "          .format(e, train_loss / len(train_data), train_acc / len(train_data),\n", "                  eval_loss / len(test_data), eval_acc / len(test_data)))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Plot the loss curve and the accuracy curve" ] }, { "cell_type": "code", "execution_count": 18, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "[]" ] }, "execution_count": 18, "metadata": {}, "output_type": "execute_result" }, { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "import matplotlib.pyplot as plt\n", "%matplotlib inline\n", "\n", "plt.title('train loss')\n", "plt.plot(np.arange(len(losses)), losses)" ] }, { "cell_type": "code", "execution_count": 19, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "Text(0.5, 1.0, 'train acc')" ] }, "execution_count": 19, "metadata": {}, "output_type": "execute_result" }, { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "plt.plot(np.arange(len(acces)), acces)\n", "plt.title('train acc')" ] }, { "cell_type": "code", "execution_count": 21, "metadata": {}, "outputs": [ { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" }, { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "plt.plot(np.arange(len(eval_losses)), eval_losses)\n", "plt.title('test loss')\n", "plt.show()\n", "\n", "plt.plot(np.arange(len(eval_acces)), eval_acces)\n", "plt.title('test acc')\n", "plt.show()\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 练习\n", "\n", "* 看一看上面的训练过程,看一下准确率是怎么计算出来的,特别注意 max 这个函数\n", "* 自己重新实现一个新的网络,试试改变隐藏层的数目和激活函数,看看有什么新的结果" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 参考\n", "* [损失函数:交叉熵详解](https://zhuanlan.zhihu.com/p/115277553)" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.9.7" } }, "nbformat": 4, "nbformat_minor": 2 }