7-googlenet.ipynb
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# GoogLeNet\n",
"The VGG network we covered earlier was the runner-up of the 2014 ImageNet competition, so who was the champion? It was GoogLeNet, the network we are about to cover, proposed by researchers at Google. It made a huge impact at the time because its structure was unlike anything before it: it broke with the fixed habit of simply chaining convolutional layers in series and instead used a highly effective inception module. The result is a network deeper than VGG yet with far fewer parameters, since the fully connected layers at the end are removed, which also gives it high computational efficiency.\n",
"\n",
"![](https://ws2.sinaimg.cn/large/006tNc79ly1fmprhdocouj30qb08vac3.jpg)\n",
"\n",
"This is a schematic of the GoogLeNet architecture; below we introduce its key innovation, the inception module."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## The Inception module\n",
"In the network above we can see several layers made up of four parallel convolution branches. Each such group of four parallel branches is an inception module, visualized below.\n",
"\n",
"![](https://ws4.sinaimg.cn/large/006tNc79gy1fmprivb2hxj30dn09dwef.jpg)\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The four parallel paths of an inception module are:\n",
"1. a 1 x 1 convolution, which extracts features over a small receptive field\n",
"2. a 1 x 1 convolution followed by a 3 x 3 convolution; the 1 x 1 convolution reduces the number of input channels to cut parameters and computation, and the 3 x 3 convolution then convolves over a larger receptive field\n",
"3. a 1 x 1 convolution followed by a 5 x 5 convolution, which serves the same purpose as the second path\n",
"4. a 3 x 3 max pooling followed by a 1 x 1 convolution; the max pooling rearranges the input features and the 1 x 1 convolution then extracts features\n",
"\n",
"Finally, the features from the four parallel paths are concatenated along the channel dimension. Let's implement this below."
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {
"ExecuteTime": {
"end_time": "2017-12-22T12:51:05.427292Z",
"start_time": "2017-12-22T12:51:04.924747Z"
},
"collapsed": true
},
"outputs": [],
"source": [
"import sys\n",
"sys.path.append('..')\n",
"\n",
"import numpy as np\n",
"import torch\n",
"from torch import nn\n",
"from torch.autograd import Variable\n",
"from torchvision.datasets import CIFAR10"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {
"ExecuteTime": {
"end_time": "2017-12-22T12:51:08.890890Z",
"start_time": "2017-12-22T12:51:08.876313Z"
},
"collapsed": true
},
"outputs": [],
"source": [
"# define a basic building block: a convolution followed by batchnorm and a relu activation\n",
"def conv_relu(in_channel, out_channel, kernel, stride=1, padding=0):\n",
"    layer = nn.Sequential(\n",
"        nn.Conv2d(in_channel, out_channel, kernel, stride, padding),\n",
"        nn.BatchNorm2d(out_channel, eps=1e-3),\n",
"        nn.ReLU(True)\n",
"    )\n",
"    return layer"
]
},
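{
"cell_type": "markdown",
"metadata": {},
"source": [
"A quick shape check of `conv_relu` (a small sanity check, using the same zero-input trick as the tests below): with a 3 x 3 kernel and padding 1, the spatial size is preserved and only the channel count changes."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"layer = conv_relu(3, 16, 3, padding=1)\n",
"x = Variable(torch.zeros(1, 3, 32, 32))\n",
"print(layer(x).shape) # expect torch.Size([1, 16, 32, 32])"
]
},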
{
"cell_type": "code",
"execution_count": 4,
"metadata": {
"ExecuteTime": {
"end_time": "2017-12-22T12:51:09.671474Z",
"start_time": "2017-12-22T12:51:09.587337Z"
},
"collapsed": true
},
"outputs": [],
"source": [
"class inception(nn.Module):\n",
"    def __init__(self, in_channel, out1_1, out2_1, out2_3, out3_1, out3_5, out4_1):\n",
"        super(inception, self).__init__()\n",
"        # first path: a single 1 x 1 convolution\n",
"        self.branch1x1 = conv_relu(in_channel, out1_1, 1)\n",
"        \n",
"        # second path: 1 x 1 reduction followed by a 3 x 3 convolution\n",
"        self.branch3x3 = nn.Sequential(\n",
"            conv_relu(in_channel, out2_1, 1),\n",
"            conv_relu(out2_1, out2_3, 3, padding=1)\n",
"        )\n",
"        \n",
"        # third path: 1 x 1 reduction followed by a 5 x 5 convolution\n",
"        self.branch5x5 = nn.Sequential(\n",
"            conv_relu(in_channel, out3_1, 1),\n",
"            conv_relu(out3_1, out3_5, 5, padding=2)\n",
"        )\n",
"        \n",
"        # fourth path: 3 x 3 max pooling followed by a 1 x 1 convolution\n",
"        self.branch_pool = nn.Sequential(\n",
"            nn.MaxPool2d(3, stride=1, padding=1),\n",
"            conv_relu(in_channel, out4_1, 1)\n",
"        )\n",
"        \n",
"    def forward(self, x):\n",
"        f1 = self.branch1x1(x)\n",
"        f2 = self.branch3x3(x)\n",
"        f3 = self.branch5x5(x)\n",
"        f4 = self.branch_pool(x)\n",
"        # concatenate the four feature maps along the channel dimension\n",
"        output = torch.cat((f1, f2, f3, f4), dim=1)\n",
"        return output"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {
"ExecuteTime": {
"end_time": "2017-12-22T12:51:10.948630Z",
"start_time": "2017-12-22T12:51:10.757903Z"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"input shape: 3 x 96 x 96\n",
"output shape: 256 x 96 x 96\n"
]
}
],
"source": [
"test_net = inception(3, 64, 48, 64, 64, 96, 32)\n",
"test_x = Variable(torch.zeros(1, 3, 96, 96))\n",
"print('input shape: {} x {} x {}'.format(test_x.shape[1], test_x.shape[2], test_x.shape[3]))\n",
"test_y = test_net(test_x)\n",
"print('output shape: {} x {} x {}'.format(test_y.shape[1], test_y.shape[2], test_y.shape[3]))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We can see that after the input passes through the inception module, the spatial size is unchanged while the channel dimension has grown."
]
},
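{
"cell_type": "markdown",
"metadata": {},
"source": [
"The number of output channels is simply the sum of the four branches' output channels; the 1 x 1 reduction layers inside the second and third paths do not appear in the output. A quick check with the arguments we passed to `inception` above:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# out1_1 + out2_3 + out3_5 + out4_1 from inception(3, 64, 48, 64, 64, 96, 32)\n",
"print(64 + 64 + 96 + 32) # 256, matching the output shape above"
]
},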
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Next we define GoogLeNet itself, which can be seen as many inception modules chained in series. Note that the original paper attaches several auxiliary outputs to fight the vanishing gradient problem; here we only define a simplified GoogLeNet with a single output."
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {
"ExecuteTime": {
"end_time": "2017-12-22T12:51:13.149380Z",
"start_time": "2017-12-22T12:51:12.934110Z"
},
"collapsed": true
},
"outputs": [],
"source": [
"class googlenet(nn.Module):\n",
"    def __init__(self, in_channel, num_classes, verbose=False):\n",
"        super(googlenet, self).__init__()\n",
"        self.verbose = verbose\n",
"        \n",
"        # block 1: a large 7 x 7 convolution followed by max pooling\n",
"        self.block1 = nn.Sequential(\n",
"            conv_relu(in_channel, out_channel=64, kernel=7, stride=2, padding=3),\n",
"            nn.MaxPool2d(3, 2)\n",
"        )\n",
"        \n",
"        # block 2: 1 x 1 reduction, 3 x 3 convolution, max pooling\n",
"        self.block2 = nn.Sequential(\n",
"            conv_relu(64, 64, kernel=1),\n",
"            conv_relu(64, 192, kernel=3, padding=1),\n",
"            nn.MaxPool2d(3, 2)\n",
"        )\n",
"        \n",
"        # blocks 3-5: stacks of inception modules separated by pooling\n",
"        self.block3 = nn.Sequential(\n",
"            inception(192, 64, 96, 128, 16, 32, 32),\n",
"            inception(256, 128, 128, 192, 32, 96, 64),\n",
"            nn.MaxPool2d(3, 2)\n",
"        )\n",
"        \n",
"        self.block4 = nn.Sequential(\n",
"            inception(480, 192, 96, 208, 16, 48, 64),\n",
"            inception(512, 160, 112, 224, 24, 64, 64),\n",
"            inception(512, 128, 128, 256, 24, 64, 64),\n",
"            inception(512, 112, 144, 288, 32, 64, 64),\n",
"            inception(528, 256, 160, 320, 32, 128, 128),\n",
"            nn.MaxPool2d(3, 2)\n",
"        )\n",
"        \n",
"        self.block5 = nn.Sequential(\n",
"            inception(832, 256, 160, 320, 32, 128, 128),\n",
"            inception(832, 384, 192, 384, 48, 128, 128),\n",
"            nn.AvgPool2d(2)\n",
"        )\n",
"        \n",
"        self.classifier = nn.Linear(1024, num_classes)\n",
"        \n",
"    def forward(self, x):\n",
"        x = self.block1(x)\n",
"        if self.verbose:\n",
"            print('block 1 output: {}'.format(x.shape))\n",
"        x = self.block2(x)\n",
"        if self.verbose:\n",
"            print('block 2 output: {}'.format(x.shape))\n",
"        x = self.block3(x)\n",
"        if self.verbose:\n",
"            print('block 3 output: {}'.format(x.shape))\n",
"        x = self.block4(x)\n",
"        if self.verbose:\n",
"            print('block 4 output: {}'.format(x.shape))\n",
"        x = self.block5(x)\n",
"        if self.verbose:\n",
"            print('block 5 output: {}'.format(x.shape))\n",
"        # flatten and classify\n",
"        x = x.view(x.shape[0], -1)\n",
"        x = self.classifier(x)\n",
"        return x"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {
"ExecuteTime": {
"end_time": "2017-12-22T12:51:13.614936Z",
"start_time": "2017-12-22T12:51:13.428383Z"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"block 1 output: torch.Size([1, 64, 23, 23])\n",
"block 2 output: torch.Size([1, 192, 11, 11])\n",
"block 3 output: torch.Size([1, 480, 5, 5])\n",
"block 4 output: torch.Size([1, 832, 2, 2])\n",
"block 5 output: torch.Size([1, 1024, 1, 1])\n",
"output: torch.Size([1, 10])\n"
]
}
],
"source": [
"test_net = googlenet(3, 10, True)\n",
"test_x = Variable(torch.zeros(1, 3, 96, 96))\n",
"test_y = test_net(test_x)\n",
"print('output: {}'.format(test_y.shape))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We can see that the spatial size of the input keeps shrinking while the channel dimension keeps growing."
]
},
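{
"cell_type": "markdown",
"metadata": {},
"source": [
"The introduction claimed that GoogLeNet has far fewer parameters than VGG because it drops the large fully connected layers at the end. As a quick check (using the `test_net` we just built), we can count the trainable parameters directly:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# total number of trainable parameters in our simplified GoogLeNet\n",
"n_params = sum(p.numel() for p in test_net.parameters())\n",
"print('{:.1f} M parameters'.format(n_params / 1e6))"
]
},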
{
"cell_type": "code",
"execution_count": 8,
"metadata": {
"ExecuteTime": {
"end_time": "2017-12-22T12:51:16.387778Z",
"start_time": "2017-12-22T12:51:15.121350Z"
},
"collapsed": true
},
"outputs": [],
"source": [
"from utils import train\n",
"\n",
"def data_tf(x):\n",
"    x = x.resize((96, 96), 2) # enlarge the image to 96 x 96 (2 = PIL bilinear resampling)\n",
"    x = np.array(x, dtype='float32') / 255\n",
"    x = (x - 0.5) / 0.5 # normalize to [-1, 1]; we will cover this trick later\n",
"    x = x.transpose((2, 0, 1)) # move the channel axis to the first dimension, as pytorch expects\n",
"    x = torch.from_numpy(x)\n",
"    return x\n",
"\n",
"train_set = CIFAR10('./data', train=True, transform=data_tf)\n",
"train_data = torch.utils.data.DataLoader(train_set, batch_size=64, shuffle=True)\n",
"test_set = CIFAR10('./data', train=False, transform=data_tf)\n",
"test_data = torch.utils.data.DataLoader(test_set, batch_size=128, shuffle=False)\n",
"\n",
"net = googlenet(3, 10)\n",
"optimizer = torch.optim.SGD(net.parameters(), lr=0.01)\n",
"criterion = nn.CrossEntropyLoss()"
]
},
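{
"cell_type": "markdown",
"metadata": {},
"source": [
"A quick look at what `data_tf` produces (assuming the CIFAR10 data is already available in `./data`): each image becomes a 3 x 96 x 96 float tensor with values in [-1, 1]."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"im, label = train_set[0]\n",
"print(im.shape, im.min(), im.max())"
]
},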
{
"cell_type": "code",
"execution_count": 9,
"metadata": {
"ExecuteTime": {
"end_time": "2017-12-22T13:17:25.310685Z",
"start_time": "2017-12-22T12:51:16.389607Z"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Epoch 0. Train Loss: 1.504840, Train Acc: 0.452605, Valid Loss: 1.372426, Valid Acc: 0.514339, Time 00:01:25\n",
"Epoch 1. Train Loss: 1.046663, Train Acc: 0.630734, Valid Loss: 1.147823, Valid Acc: 0.606309, Time 00:01:02\n",
"Epoch 2. Train Loss: 0.833869, Train Acc: 0.710618, Valid Loss: 1.017181, Valid Acc: 0.644284, Time 00:00:54\n",
"Epoch 3. Train Loss: 0.688739, Train Acc: 0.760670, Valid Loss: 0.847099, Valid Acc: 0.712520, Time 00:00:58\n",
"Epoch 4. Train Loss: 0.576516, Train Acc: 0.801111, Valid Loss: 0.850494, Valid Acc: 0.706487, Time 00:01:01\n",
"Epoch 5. Train Loss: 0.483854, Train Acc: 0.832241, Valid Loss: 0.802392, Valid Acc: 0.726958, Time 00:01:08\n",
"Epoch 6. Train Loss: 0.410416, Train Acc: 0.857657, Valid Loss: 0.865246, Valid Acc: 0.721618, Time 00:01:23\n",
"Epoch 7. Train Loss: 0.346010, Train Acc: 0.881813, Valid Loss: 0.850472, Valid Acc: 0.729430, Time 00:01:28\n",
"Epoch 8. Train Loss: 0.289854, Train Acc: 0.900815, Valid Loss: 1.313582, Valid Acc: 0.650712, Time 00:01:22\n",
"Epoch 9. Train Loss: 0.239552, Train Acc: 0.918378, Valid Loss: 0.970173, Valid Acc: 0.726661, Time 00:01:30\n",
"Epoch 10. Train Loss: 0.212439, Train Acc: 0.927270, Valid Loss: 1.188284, Valid Acc: 0.665843, Time 00:01:29\n",
"Epoch 11. Train Loss: 0.175206, Train Acc: 0.939758, Valid Loss: 0.736437, Valid Acc: 0.790051, Time 00:01:29\n",
"Epoch 12. Train Loss: 0.140491, Train Acc: 0.952366, Valid Loss: 0.878171, Valid Acc: 0.764241, Time 00:01:14\n",
"Epoch 13. Train Loss: 0.127249, Train Acc: 0.956981, Valid Loss: 1.159881, Valid Acc: 0.731309, Time 00:01:00\n",
"Epoch 14. Train Loss: 0.108748, Train Acc: 0.962836, Valid Loss: 1.234320, Valid Acc: 0.716377, Time 00:01:23\n",
"Epoch 15. Train Loss: 0.091655, Train Acc: 0.969030, Valid Loss: 0.822575, Valid Acc: 0.790348, Time 00:01:28\n",
"Epoch 16. Train Loss: 0.086218, Train Acc: 0.970309, Valid Loss: 0.943607, Valid Acc: 0.767306, Time 00:01:24\n",
"Epoch 17. Train Loss: 0.069979, Train Acc: 0.976822, Valid Loss: 1.038973, Valid Acc: 0.755340, Time 00:01:22\n",
"Epoch 18. Train Loss: 0.066750, Train Acc: 0.977322, Valid Loss: 0.838827, Valid Acc: 0.801226, Time 00:01:23\n",
"Epoch 19. Train Loss: 0.052757, Train Acc: 0.982577, Valid Loss: 0.876127, Valid Acc: 0.796479, Time 00:01:25\n"
]
}
],
"source": [
"train(net, train_data, test_data, 20, optimizer, criterion)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"By introducing the more structured inception block, GoogLeNet lets us use wider channels and more layers while still keeping the amount of computation under control.\n",
"\n",
"**Exercise: GoogLeNet has many follow-up versions. Try reading the papers, see what changed, and implement it:** \n",
"v1: the original version \n",
"v2: adds batch normalization to speed up training \n",
"v3: reworks the inception module (a small starting sketch follows below) \n",
"v4: combines inception with ResNet-style residual connections (Inception-ResNet)"
]
}
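,
{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a starting point for the v3 exercise, here is a minimal sketch (our own illustration, not taken from the paper) of one idea used in Inception v3: replacing the 5 x 5 convolution branch with two stacked 3 x 3 convolutions, which cover the same receptive field with fewer weights (2 x 3 x 3 = 18 vs 5 x 5 = 25 per input-output channel pair). The channel sizes below mirror the first inception module of block 3."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# v3-style factorization of the 5 x 5 branch into two 3 x 3 convolutions\n",
"branch5x5_factorized = nn.Sequential(\n",
"    conv_relu(192, 16, 1),\n",
"    conv_relu(16, 32, 3, padding=1),\n",
"    conv_relu(32, 32, 3, padding=1)\n",
")\n",
"\n",
"x = Variable(torch.zeros(1, 192, 24, 24))\n",
"print(branch5x5_factorized(x).shape) # expect torch.Size([1, 32, 24, 24])"
]
}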
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.8"
}
},
"nbformat": 4,
"nbformat_minor": 2
}
