- {
- "cells": [
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "# ResNet\n",
- "\n",
- "当大家还在惊叹 GoogLeNet 的 Inception 结构的时候,微软亚洲研究院的研究员已经在设计更深但结构更加简单的网络 ResNet,并且凭借这个网络子在 2015 年 ImageNet 比赛上大获全胜。\n",
- "\n",
- "ResNet 有效地解决了深度神经网络难以训练的问题,可以训练高达 1000 层的卷积网络。网络之所以难以训练,是因为存在着梯度消失的问题,离 loss 函数越远的层,在反向传播的时候,梯度越小,就越难以更新,随着层数的增加,这个现象越严重。之前有两种常见的方案来解决这个问题:\n",
- "\n",
- "1. 按层训练,先训练比较浅的层,然后在不断增加层数,但是这种方法效果不是特别好,而且比较麻烦\n",
- "2. 使用更宽的层,或者增加输出通道,而不加深网络的层数,这种结构往往得到的效果又不好\n",
- "\n",
- "ResNet 通过引入了跨层链接解决了梯度回传消失的问题。\n",
- "\n",
- ""
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "这就普通的网络连接跟跨层残差连接的对比图,使用普通的连接(左图),上层的梯度必须要一层一层传回来;而是用残差连接(右图),相当于中间有了一条更短的路,梯度能够从这条更短的路传回来,避免了梯度过小的情况。\n",
- "\n",
- "假设某层的输入是 $x$,期望输出是 $H(x)$\n",
- "* 如果我们直接把输入 $x$ 传到输出作为初始结果,这就是一个更浅层的网络,更容易训练\n",
- "* 而这个网络没有学习的部分,我们可以使用更深的网络 $F(x)$ 去训练它,使得训练更加容易\n",
- "* 最后希望拟合的结果就是 $F(x) = H(x) - x$,这就是一个残差的结构\n",
- "\n"
- ]
- },
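- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "To make the residual idea concrete, here is a tiny numerical sketch (an addition for illustration, not part of the original notebook): we build $H(x) = F(x) + x$ by hand, using a single 3x3 convolution as a hypothetical stand-in for the branch $F$. With all-zero weights, $F(x) = 0$ and the block reduces to the identity mapping, which is exactly the easy-to-train starting point described above."
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {},
- "outputs": [],
- "source": [
- "# A minimal sketch of the residual identity H(x) = F(x) + x.\n",
- "# `f` is a hypothetical stand-in for the residual branch F,\n",
- "# used only for this illustration.\n",
- "import torch\n",
- "from torch import nn\n",
- "\n",
- "x = torch.randn(1, 8, 4, 4)\n",
- "f = nn.Conv2d(8, 8, 3, padding=1, bias=False)\n",
- "nn.init.zeros_(f.weight)  # make F(x) = 0\n",
- "\n",
- "h = f(x) + x  # the residual block's output\n",
- "print(torch.equal(h, x))  # True: with F(x) = 0 the block is the identity"
- ]
- },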
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "## 1. ResidualBlock\n",
- "\n",
- "残差网络的结构就是上面这种残差块的堆叠,下面让我们来实现一个 residual block"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": 11,
- "metadata": {
- "ExecuteTime": {
- "end_time": "2017-12-22T12:56:06.772059Z",
- "start_time": "2017-12-22T12:56:06.766027Z"
- }
- },
- "outputs": [],
- "source": [
- "import numpy as np\n",
- "import torch\n",
- "from torch import nn\n",
- "import torch.nn.functional as F\n",
- "from torch.autograd import Variable\n",
- "from torchvision.datasets import CIFAR10\n",
- "from torchvision import transforms as tfs"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": 2,
- "metadata": {
- "ExecuteTime": {
- "end_time": "2017-12-22T12:47:49.222432Z",
- "start_time": "2017-12-22T12:47:49.217940Z"
- }
- },
- "outputs": [],
- "source": [
- "def conv3x3(in_channel, out_channel, stride=1):\n",
- " return nn.Conv2d(in_channel, out_channel, 3, \n",
- " stride=stride, padding=1, bias=False)"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": 3,
- "metadata": {
- "ExecuteTime": {
- "end_time": "2017-12-22T13:14:02.429145Z",
- "start_time": "2017-12-22T13:14:02.383322Z"
- }
- },
- "outputs": [],
- "source": [
- "class Residual_Block(nn.Module):\n",
- " def __init__(self, in_channel, out_channel, same_shape=True):\n",
- " super(Residual_Block, self).__init__()\n",
- " self.same_shape = same_shape\n",
- " stride=1 if self.same_shape else 2\n",
- " \n",
- " self.conv1 = conv3x3(in_channel, out_channel, stride=stride)\n",
- " self.bn1 = nn.BatchNorm2d(out_channel)\n",
- " \n",
- " self.conv2 = conv3x3(out_channel, out_channel)\n",
- " self.bn2 = nn.BatchNorm2d(out_channel)\n",
- " if not self.same_shape:\n",
- " self.conv3 = nn.Conv2d(in_channel, out_channel, 1, \n",
- " stride=stride)\n",
- " \n",
- " def forward(self, x):\n",
- " out = self.conv1(x)\n",
- " out = F.relu(self.bn1(out), True)\n",
- " out = self.conv2(out)\n",
- " out = F.relu(self.bn2(out), True)\n",
- " \n",
- " if not self.same_shape:\n",
- " x = self.conv3(x)\n",
- " return F.relu(x+out, True)"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "我们测试一下一个 residual block 的输入和输出"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": 4,
- "metadata": {
- "ExecuteTime": {
- "end_time": "2017-12-22T13:14:05.793185Z",
- "start_time": "2017-12-22T13:14:05.763382Z"
- }
- },
- "outputs": [
- {
- "name": "stdout",
- "output_type": "stream",
- "text": [
- "input: torch.Size([1, 32, 96, 96])\n",
- "output: torch.Size([1, 32, 96, 96])\n"
- ]
- }
- ],
- "source": [
- "# 输入输出形状相同\n",
- "test_net = Residual_Block(32, 32)\n",
- "test_x = Variable(torch.zeros(1, 32, 96, 96))\n",
- "print('input: {}'.format(test_x.shape))\n",
- "test_y = test_net(test_x)\n",
- "print('output: {}'.format(test_y.shape))"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": 5,
- "metadata": {
- "ExecuteTime": {
- "end_time": "2017-12-22T13:14:11.929120Z",
- "start_time": "2017-12-22T13:14:11.914604Z"
- }
- },
- "outputs": [
- {
- "name": "stdout",
- "output_type": "stream",
- "text": [
- "input: torch.Size([1, 3, 96, 96])\n",
- "output: torch.Size([1, 32, 48, 48])\n"
- ]
- }
- ],
- "source": [
- "# 输入输出形状不同\n",
- "test_net = Residual_Block(3, 32, False)\n",
- "test_x = Variable(torch.zeros(1, 3, 96, 96))\n",
- "print('input: {}'.format(test_x.shape))\n",
- "test_y = test_net(test_x)\n",
- "print('output: {}'.format(test_y.shape))"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "一个Residual_Block的结构如下图所示\n",
- "\n",
- ""
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "## 2. ResNet的网络实现\n",
- "\n",
- "下面实现一个 ResNet,它就是 residual block 模块的堆叠"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": 6,
- "metadata": {
- "ExecuteTime": {
- "end_time": "2017-12-22T13:27:46.099404Z",
- "start_time": "2017-12-22T13:27:45.986235Z"
- }
- },
- "outputs": [],
- "source": [
- "class ResNet(nn.Module):\n",
- " def __init__(self, in_channel, num_classes, verbose=False):\n",
- " super(ResNet, self).__init__()\n",
- " self.verbose = verbose\n",
- " \n",
- " self.block1 = nn.Conv2d(in_channel, 64, 7, 2)\n",
- " \n",
- " self.block2 = nn.Sequential(\n",
- " nn.MaxPool2d(3, 2),\n",
- " Residual_Block(64, 64),\n",
- " Residual_Block(64, 64)\n",
- " )\n",
- " \n",
- " self.block3 = nn.Sequential(\n",
- " Residual_Block(64, 128, False),\n",
- " Residual_Block(128, 128)\n",
- " )\n",
- " \n",
- " self.block4 = nn.Sequential(\n",
- " Residual_Block(128, 256, False),\n",
- " Residual_Block(256, 256)\n",
- " )\n",
- " \n",
- " self.block5 = nn.Sequential(\n",
- " Residual_Block(256, 512, False),\n",
- " Residual_Block(512, 512),\n",
- " nn.AvgPool2d(3)\n",
- " )\n",
- " \n",
- " self.classifier = nn.Linear(512, num_classes)\n",
- " \n",
- " def forward(self, x):\n",
- " x = self.block1(x)\n",
- " if self.verbose:\n",
- " print('block 1 output: {}'.format(x.shape))\n",
- " x = self.block2(x)\n",
- " if self.verbose:\n",
- " print('block 2 output: {}'.format(x.shape))\n",
- " x = self.block3(x)\n",
- " if self.verbose:\n",
- " print('block 3 output: {}'.format(x.shape))\n",
- " x = self.block4(x)\n",
- " if self.verbose:\n",
- " print('block 4 output: {}'.format(x.shape))\n",
- " x = self.block5(x)\n",
- " if self.verbose:\n",
- " print('block 5 output: {}'.format(x.shape))\n",
- " x = x.view(x.shape[0], -1)\n",
- " x = self.classifier(x)\n",
- " return x"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "输出一下每个 block 之后的大小"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": 8,
- "metadata": {
- "ExecuteTime": {
- "end_time": "2017-12-22T13:28:00.597030Z",
- "start_time": "2017-12-22T13:28:00.417746Z"
- }
- },
- "outputs": [
- {
- "name": "stdout",
- "output_type": "stream",
- "text": [
- "block 1 output: torch.Size([1, 64, 45, 45])\n",
- "block 2 output: torch.Size([1, 64, 22, 22])\n",
- "block 3 output: torch.Size([1, 128, 11, 11])\n",
- "block 4 output: torch.Size([1, 256, 6, 6])\n",
- "block 5 output: torch.Size([1, 512, 1, 1])\n",
- "output: torch.Size([1, 10])\n"
- ]
- }
- ],
- "source": [
- "test_net = ResNet(3, 10, True)\n",
- "test_x = Variable(torch.zeros(1, 3, 96, 96))\n",
- "test_y = test_net(test_x)\n",
- "print('output: {}'.format(test_y.shape))"
- ]
- },
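- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "As an extra sanity check (this cell is an addition, not part of the original notebook), we can count the model's trainable parameters by summing `numel()` over `parameters()`; the exact number is simply whatever this configuration yields."
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {},
- "outputs": [],
- "source": [
- "# Count the trainable parameters of the ResNet defined above --\n",
- "# a quick sanity-check sketch, not part of the original notebook.\n",
- "num_params = sum(p.numel() for p in test_net.parameters() if p.requires_grad)\n",
- "print('trainable parameters: {}'.format(num_params))"
- ]
- },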
- {
- "cell_type": "code",
- "execution_count": 12,
- "metadata": {
- "ExecuteTime": {
- "end_time": "2017-12-22T13:29:01.484172Z",
- "start_time": "2017-12-22T13:29:00.095952Z"
- }
- },
- "outputs": [],
- "source": [
- "from utils import train\n",
- "\n",
- "def data_tf(x):\n",
- " im_aug = tfs.Compose([\n",
- " tfs.Resize(96),\n",
- " tfs.ToTensor(),\n",
- " tfs.Normalize([0.5, 0.5, 0.5], [0.5, 0.5, 0.5])\n",
- " ])\n",
- " x = im_aug(x)\n",
- " return x\n",
- " \n",
- "train_set = CIFAR10('../../data', train=True, transform=data_tf)\n",
- "train_data = torch.utils.data.DataLoader(train_set, batch_size=64, shuffle=True)\n",
- "test_set = CIFAR10('../../data', train=False, transform=data_tf)\n",
- "test_data = torch.utils.data.DataLoader(test_set, batch_size=128, shuffle=False)\n",
- "\n",
- "net = ResNet(3, 10)\n",
- "optimizer = torch.optim.Adam(net.parameters(), lr=1e-3)\n",
- "criterion = nn.CrossEntropyLoss()"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {
- "ExecuteTime": {
- "end_time": "2017-12-22T13:45:00.783186Z",
- "start_time": "2017-12-22T13:29:09.214453Z"
- }
- },
- "outputs": [
- {
- "name": "stdout",
- "output_type": "stream",
- "text": [
- "[ 0] Train:(L=1.506980, Acc=0.449868), Valid:(L=1.119623, Acc=0.598596), T: 00:00:48\n",
- "[ 1] Train:(L=1.022635, Acc=0.641504), Valid:(L=0.942414, Acc=0.669600), T: 00:00:47\n",
- "[ 2] Train:(L=0.806174, Acc=0.717551), Valid:(L=0.921687, Acc=0.682061), T: 00:00:47\n",
- "[ 3] Train:(L=0.638939, Acc=0.775555), Valid:(L=0.802450, Acc=0.729727), T: 00:00:47\n",
- "[ 4] Train:(L=0.497571, Acc=0.826606), Valid:(L=0.658700, Acc=0.775316), T: 00:00:47\n",
- "[ 5] Train:(L=0.364864, Acc=0.872442), Valid:(L=0.717290, Acc=0.768888), T: 00:00:47\n",
- "[ 6] Train:(L=0.263076, Acc=0.907888), Valid:(L=0.832575, Acc=0.750000), T: 00:00:47\n",
- "[ 7] Train:(L=0.181254, Acc=0.935782), Valid:(L=0.818366, Acc=0.764933), T: 00:00:47\n",
- "[ 8] Train:(L=0.124111, Acc=0.957820), Valid:(L=0.883527, Acc=0.778184), T: 00:00:47\n",
- "[ 9] Train:(L=0.108587, Acc=0.961657), Valid:(L=0.899127, Acc=0.780756), T: 00:00:47\n",
- "[10] Train:(L=0.091386, Acc=0.968670), Valid:(L=0.975022, Acc=0.781448), T: 00:00:47\n",
- "[11] Train:(L=0.079259, Acc=0.972287), Valid:(L=1.061239, Acc=0.770075), T: 00:00:47\n",
- "[12] Train:(L=0.067858, Acc=0.976123), Valid:(L=1.025909, Acc=0.782140), T: 00:00:47\n",
- "[13] Train:(L=0.064745, Acc=0.977701), Valid:(L=0.987410, Acc=0.789062), T: 00:00:47\n",
- "[14] Train:(L=0.056921, Acc=0.979779), Valid:(L=1.165746, Acc=0.773438), T: 00:00:47\n",
- "[15] Train:(L=0.058128, Acc=0.980039), Valid:(L=1.057119, Acc=0.782437), T: 00:00:47\n",
- "[16] Train:(L=0.050794, Acc=0.982257), Valid:(L=1.098127, Acc=0.779074), T: 00:00:47\n",
- "[17] Train:(L=0.046720, Acc=0.984415), Valid:(L=1.066124, Acc=0.787184), T: 00:00:47\n",
- "[18] Train:(L=0.044737, Acc=0.984375), Valid:(L=1.053032, Acc=0.792029), T: 00:00:47\n"
- ]
- }
- ],
- "source": [
- "res = train(net, train_data, test_data, 20, optimizer, criterion)"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {},
- "outputs": [],
- "source": [
- "import matplotlib.pyplot as plt\n",
- "%matplotlib inline\n",
- "\n",
- "plt.plot(res[0], label='train')\n",
- "plt.plot(res[2], label='valid')\n",
- "plt.xlabel('epoch')\n",
- "plt.ylabel('Loss')\n",
- "plt.legend(loc='best')\n",
- "plt.savefig('fig-res-resnet-train-validate-loss.pdf')\n",
- "plt.show()\n",
- "\n",
- "plt.plot(res[1], label='train')\n",
- "plt.plot(res[3], label='valid')\n",
- "plt.xlabel('epoch')\n",
- "plt.ylabel('Acc')\n",
- "plt.legend(loc='best')\n",
- "plt.savefig('fig-res-resnet-train-validate-acc.pdf')\n",
- "plt.show()\n",
- "\n",
- "# save raw data\n",
- "import numpy\n",
- "numpy.save('fig-res-resnet_data.npy', res)"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "ResNet 使用跨层通道使得训练非常深的卷积神经网络成为可能。同样它使用很简单的卷积层配置,使得其拓展更加简单。\n",
- "\n"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "## 练习\n",
- "\n",
- "* 尝试一下论文中提出的 bottleneck 的结构 \n",
- "* 尝试改变 conv -> bn -> relu 的顺序为 bn -> relu -> conv,看看精度会不会提高\n",
- "* 在Residual_Block加入1x1卷积,并尝试结果的差别"
- ]
- },
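- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "For the first exercise, here is a minimal starting-point sketch of a bottleneck block (1x1 reduce -> 3x3 -> 1x1 expand), written in the same conv -> bn -> relu style as `Residual_Block` above. The channel reduction factor of 4 follows the original paper; the class name `Bottleneck_Block` and the other details are assumptions of this sketch, not the paper's exact configuration."
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {},
- "outputs": [],
- "source": [
- "# A hypothetical bottleneck residual block -- a sketch for the exercise,\n",
- "# not code from the original notebook.\n",
- "class Bottleneck_Block(nn.Module):\n",
- "    def __init__(self, in_channel, out_channel, same_shape=True):\n",
- "        super(Bottleneck_Block, self).__init__()\n",
- "        self.same_shape = same_shape\n",
- "        stride = 1 if self.same_shape else 2\n",
- "        mid_channel = out_channel // 4  # reduction factor from the paper\n",
- "        \n",
- "        self.conv1 = nn.Conv2d(in_channel, mid_channel, 1, bias=False)\n",
- "        self.bn1 = nn.BatchNorm2d(mid_channel)\n",
- "        self.conv2 = conv3x3(mid_channel, mid_channel, stride=stride)\n",
- "        self.bn2 = nn.BatchNorm2d(mid_channel)\n",
- "        self.conv3 = nn.Conv2d(mid_channel, out_channel, 1, bias=False)\n",
- "        self.bn3 = nn.BatchNorm2d(out_channel)\n",
- "        if not self.same_shape or in_channel != out_channel:\n",
- "            # 1x1 convolution on the shortcut path to match channels and stride\n",
- "            self.shortcut = nn.Conv2d(in_channel, out_channel, 1, stride=stride)\n",
- "        else:\n",
- "            self.shortcut = None\n",
- "        \n",
- "    def forward(self, x):\n",
- "        out = F.relu(self.bn1(self.conv1(x)), True)\n",
- "        out = F.relu(self.bn2(self.conv2(out)), True)\n",
- "        out = self.bn3(self.conv3(out))\n",
- "        if self.shortcut is not None:\n",
- "            x = self.shortcut(x)\n",
- "        return F.relu(x + out, True)\n",
- "\n",
- "# quick shape check, mirroring the tests above\n",
- "test_y = Bottleneck_Block(64, 256, False)(Variable(torch.zeros(1, 64, 96, 96)))\n",
- "print('output: {}'.format(test_y.shape))  # expect [1, 256, 48, 48]"
- ]
- },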
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "## 参考资料\n",
- "* [Residual Networks (ResNet)](https://d2l.ai/chapter_convolutional-modern/resnet.html)\n",
- "* [An Overview of ResNet and its Variants](https://towardsdatascience.com/an-overview-of-resnet-and-its-variants-5281e2f56035)"
- ]
- }
- ],
- "metadata": {
- "kernelspec": {
- "display_name": "Python 3",
- "language": "python",
- "name": "python3"
- },
- "language_info": {
- "codemirror_mode": {
- "name": "ipython",
- "version": 3
- },
- "file_extension": ".py",
- "mimetype": "text/x-python",
- "name": "python",
- "nbconvert_exporter": "python",
- "pygments_lexer": "ipython3",
- "version": "3.7.9"
- }
- },
- "nbformat": 4,
- "nbformat_minor": 2
- }