|
|
- {
- "cells": [
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "# GoogLeNet\n",
- "\n",
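- "VGG, covered earlier, was the runner-up of the 2014 ImageNet competition. The winner was GoogLeNet, the subject of this section. Proposed by researchers at Google, it was highly influential at the time because its structure broke with the established practice of simply stacking convolutional layers in series. It is built from a very effective component, the Inception module, which yields a network deeper than VGG yet with far fewer parameters, largely because the fully connected layers at the end are removed. This also makes it computationally efficient.\n",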
- "\n",
- "\n",
- "\n",
- "This is a schematic of the GoogLeNet network. Next we introduce its main innovation, the Inception module."
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "## Inception module\n",
- "\n",
- "In the network above there are many layers made of four parallel branches. Each such group of four parallel branches is an Inception module.\n"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
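- "The four parallel branches of an Inception module are:\n",
- "1. A 1 x 1 convolution, which extracts features with a small receptive field.\n",
- "2. A 1 x 1 convolution followed by a 3 x 3 convolution. The 1 x 1 convolution reduces the number of input channels, cutting parameters and computation, and the 3 x 3 convolution then covers a larger receptive field.\n",
- "3. A 1 x 1 convolution followed by a 5 x 5 convolution, which plays the same role as the second branch.\n",
- "4. A 3 x 3 max pooling followed by a 1 x 1 convolution. The max pooling rearranges the input features, and the 1 x 1 convolution extracts features from the result.\n",
- "\n",
- "Finally, the features from the four branches are concatenated along the channel dimension. A PyTorch implementation follows."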
- ]
- },
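- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "As a rough illustration of why the 1 x 1 convolution in the third branch saves parameters, the sketch below counts convolution weights with and without the reduction step. It is only an estimate under stated assumptions: biases and batch normalization parameters are ignored, `conv_weights` is a helper introduced here for illustration, and the channel sizes (192 input channels, 16 reduced channels, 32 output channels) are borrowed from the first Inception module of the GoogLeNet defined later in this notebook."
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {},
- "outputs": [],
- "source": [
- "def conv_weights(in_channel, out_channel, kernel):\n",
- "    # Weight count of a square convolution, ignoring bias and batch norm (assumption)\n",
- "    return in_channel * out_channel * kernel * kernel\n",
- "\n",
- "in_channel, reduce_channel, out_channel = 192, 16, 32\n",
- "\n",
- "# A 5 x 5 convolution applied directly to the input: 153600 weights\n",
- "direct = conv_weights(in_channel, out_channel, 5)\n",
- "\n",
- "# A 1 x 1 reduction followed by the 5 x 5 convolution: 3072 + 12800 = 15872 weights\n",
- "reduced = conv_weights(in_channel, reduce_channel, 1) + conv_weights(reduce_channel, out_channel, 5)\n",
- "\n",
- "print('direct 5 x 5: {}'.format(direct))\n",
- "print('1 x 1 then 5 x 5: {}'.format(reduced))"
- ]
- },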
- {
- "cell_type": "code",
- "execution_count": 9,
- "metadata": {
- "ExecuteTime": {
- "end_time": "2017-12-22T12:51:05.427292Z",
- "start_time": "2017-12-22T12:51:04.924747Z"
- },
- "collapsed": true
- },
- "outputs": [],
- "source": [
- "import numpy as np\n",
- "import torch\n",
- "from torch import nn\n",
- "from torch.autograd import Variable\n",
- "from torchvision.datasets import CIFAR10\n",
- "from torchvision import transforms as tfs"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": 2,
- "metadata": {
- "ExecuteTime": {
- "end_time": "2017-12-22T12:51:08.890890Z",
- "start_time": "2017-12-22T12:51:08.876313Z"
- },
- "collapsed": true
- },
- "outputs": [],
- "source": [
- "# Basic layer: a convolution, then batch normalization, then a ReLU activation\n",
- "def Conv_ReLU(in_channel, out_channel, kernel, stride=1, padding=0):\n",
- "    layer = nn.Sequential(\n",
- "        nn.Conv2d(in_channel, out_channel, kernel, stride, padding),\n",
- "        nn.BatchNorm2d(out_channel, eps=1e-3),\n",
- "        nn.ReLU(True)\n",
- "    )\n",
- "    return layer"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": 3,
- "metadata": {
- "ExecuteTime": {
- "end_time": "2017-12-22T12:51:09.671474Z",
- "start_time": "2017-12-22T12:51:09.587337Z"
- },
- "collapsed": true
- },
- "outputs": [],
- "source": [
- "class Inception(nn.Module):\n",
- "    def __init__(self, in_channel, out1_1, out2_1, out2_3, out3_1, out3_5, out4_1):\n",
- "        super(Inception, self).__init__()\n",
- "        # Branch 1: 1 x 1 convolution\n",
- "        self.branch1x1 = Conv_ReLU(in_channel, out1_1, 1)\n",
- "\n",
- "        # Branch 2: 1 x 1 convolution, then 3 x 3 convolution\n",
- "        self.branch3x3 = nn.Sequential(\n",
- "            Conv_ReLU(in_channel, out2_1, 1),\n",
- "            Conv_ReLU(out2_1, out2_3, 3, padding=1)\n",
- "        )\n",
- "\n",
- "        # Branch 3: 1 x 1 convolution, then 5 x 5 convolution\n",
- "        self.branch5x5 = nn.Sequential(\n",
- "            Conv_ReLU(in_channel, out3_1, 1),\n",
- "            Conv_ReLU(out3_1, out3_5, 5, padding=2)\n",
- "        )\n",
- "\n",
- "        # Branch 4: 3 x 3 max pooling, then 1 x 1 convolution\n",
- "        self.branch_pool = nn.Sequential(\n",
- "            nn.MaxPool2d(3, stride=1, padding=1),\n",
- "            Conv_ReLU(in_channel, out4_1, 1)\n",
- "        )\n",
- "\n",
- "    def forward(self, x):\n",
- "        f1 = self.branch1x1(x)\n",
- "        f2 = self.branch3x3(x)\n",
- "        f3 = self.branch5x5(x)\n",
- "        f4 = self.branch_pool(x)\n",
- "        # Concatenate the four branches along the channel dimension\n",
- "        output = torch.cat((f1, f2, f3, f4), dim=1)\n",
- "        return output"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": 4,
- "metadata": {
- "ExecuteTime": {
- "end_time": "2017-12-22T12:51:10.948630Z",
- "start_time": "2017-12-22T12:51:10.757903Z"
- }
- },
- "outputs": [
- {
- "name": "stdout",
- "output_type": "stream",
- "text": [
- "input shape: 3 x 96 x 96\n",
- "output shape: 256 x 96 x 96\n"
- ]
- }
- ],
- "source": [
- "test_net = Inception(3, 64, 48, 64, 64, 96, 32)\n",
- "test_x = Variable(torch.zeros(1, 3, 96, 96))\n",
- "print('input shape: {} x {} x {}'.format(test_x.shape[1], test_x.shape[2], test_x.shape[3]))\n",
- "test_y = test_net(test_x)\n",
- "print('output shape: {} x {} x {}'.format(test_y.shape[1], test_y.shape[2], test_y.shape[3]))"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "After passing through the Inception module, the spatial size of the input is unchanged while the number of channels increases."
- ]
- },
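- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "The printed shape can also be checked by hand. The sketch below assumes the usual stride 1 output-size rule (input size + 2 * padding - kernel + 1) and uses the arguments of the `Inception(3, 64, 48, 64, 64, 96, 32)` test above. The names `branches`, `sizes` and `channels` are introduced here only for illustration."
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {},
- "outputs": [],
- "source": [
- "# (kernel, padding) of the layer with the largest kernel in each branch\n",
- "branches = {'1x1': (1, 0), '3x3': (3, 1), '5x5': (5, 2), 'pool': (3, 1)}\n",
- "\n",
- "# With stride 1, output size = input size + 2 * padding - kernel + 1\n",
- "sizes = {name: 96 + 2 * p - k + 1 for name, (k, p) in branches.items()}\n",
- "\n",
- "# Concatenation adds up the output channels: out1_1 + out2_3 + out3_5 + out4_1\n",
- "channels = 64 + 64 + 96 + 32\n",
- "\n",
- "print(sizes)\n",
- "print('output channels: {}'.format(channels))"
- ]
- },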
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
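- "Next we define GoogLeNet, which can be viewed as many Inception modules connected in series. Note that the original paper uses several auxiliary outputs to mitigate the vanishing gradient problem. Here we define only a simplified GoogLeNet with a single output."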
- ]
- },
- {
- "cell_type": "code",
- "execution_count": 5,
- "metadata": {
- "ExecuteTime": {
- "end_time": "2017-12-22T12:51:13.149380Z",
- "start_time": "2017-12-22T12:51:12.934110Z"
- },
- "collapsed": true
- },
- "outputs": [],
- "source": [
- "class GoogLeNet(nn.Module):\n",
- "    def __init__(self, in_channel, num_classes, verbose=False):\n",
- "        super(GoogLeNet, self).__init__()\n",
- "        self.verbose = verbose\n",
- "\n",
- "        # Stem: 7 x 7 convolution followed by max pooling\n",
- "        self.block1 = nn.Sequential(\n",
- "            Conv_ReLU(in_channel, out_channel=64, kernel=7, stride=2, padding=3),\n",
- "            nn.MaxPool2d(kernel_size=3, stride=2)\n",
- "        )\n",
- "\n",
- "        self.block2 = nn.Sequential(\n",
- "            Conv_ReLU(64, 64, kernel=1),\n",
- "            Conv_ReLU(64, 192, kernel=3, padding=1),\n",
- "            nn.MaxPool2d(kernel_size=3, stride=2)\n",
- "        )\n",
- "\n",
- "        # Each Inception outputs out1_1 + out2_3 + out3_5 + out4_1 channels\n",
- "        self.block3 = nn.Sequential(\n",
- "            Inception(192, 64, 96, 128, 16, 32, 32),\n",
- "            Inception(256, 128, 128, 192, 32, 96, 64),\n",
- "            nn.MaxPool2d(kernel_size=3, stride=2)\n",
- "        )\n",
- "\n",
- "        self.block4 = nn.Sequential(\n",
- "            Inception(480, 192, 96, 208, 16, 48, 64),\n",
- "            Inception(512, 160, 112, 224, 24, 64, 64),\n",
- "            Inception(512, 128, 128, 256, 24, 64, 64),\n",
- "            Inception(512, 112, 144, 288, 32, 64, 64),\n",
- "            Inception(528, 256, 160, 320, 32, 128, 128),\n",
- "            nn.MaxPool2d(kernel_size=3, stride=2)\n",
- "        )\n",
- "\n",
- "        self.block5 = nn.Sequential(\n",
- "            Inception(832, 256, 160, 320, 32, 128, 128),\n",
- "            # The 3 x 3 reduction uses 192 channels, as in the paper (previously typed as 182)\n",
- "            Inception(832, 384, 192, 384, 48, 128, 128),\n",
- "            nn.AvgPool2d(kernel_size=2)\n",
- "        )\n",
- "\n",
- "        self.classifier = nn.Linear(1024, num_classes)\n",
- "\n",
- "    def forward(self, x):\n",
- "        x = self.block1(x)\n",
- "        if self.verbose:\n",
- "            print('block 1 output: {}'.format(x.shape))\n",
- "\n",
- "        x = self.block2(x)\n",
- "        if self.verbose:\n",
- "            print('block 2 output: {}'.format(x.shape))\n",
- "\n",
- "        x = self.block3(x)\n",
- "        if self.verbose:\n",
- "            print('block 3 output: {}'.format(x.shape))\n",
- "\n",
- "        x = self.block4(x)\n",
- "        if self.verbose:\n",
- "            print('block 4 output: {}'.format(x.shape))\n",
- "\n",
- "        x = self.block5(x)\n",
- "        if self.verbose:\n",
- "            print('block 5 output: {}'.format(x.shape))\n",
- "\n",
- "        # Flatten to (batch, 1024) before the linear classifier\n",
- "        x = x.view(x.shape[0], -1)\n",
- "        x = self.classifier(x)\n",
- "        return x"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": 6,
- "metadata": {
- "ExecuteTime": {
- "end_time": "2017-12-22T12:51:13.614936Z",
- "start_time": "2017-12-22T12:51:13.428383Z"
- }
- },
- "outputs": [
- {
- "name": "stdout",
- "output_type": "stream",
- "text": [
- "block 1 output: torch.Size([1, 64, 23, 23])\n",
- "block 2 output: torch.Size([1, 192, 11, 11])\n",
- "block 3 output: torch.Size([1, 480, 5, 5])\n",
- "block 4 output: torch.Size([1, 832, 2, 2])\n",
- "block 5 output: torch.Size([1, 1024, 1, 1])\n",
- "output: torch.Size([1, 10])\n"
- ]
- }
- ],
- "source": [
- "test_net = GoogLeNet(3, 10, True)\n",
- "test_x = Variable(torch.zeros(1, 3, 96, 96))\n",
- "test_y = test_net(test_x)\n",
- "print('output: {}'.format(test_y.shape))"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "The spatial size of the input keeps shrinking while the number of channels keeps growing."
- ]
- },
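- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "The spatial sizes printed above can be reproduced with a few lines of arithmetic. This is a minimal sketch assuming the standard output-size formula for convolution and pooling layers, floor((size + 2 * padding - kernel) / stride) + 1. The helper `out_size` and the list `trace` are hypothetical names used only for this check. The Inception modules and the padded 1 x 1 and 3 x 3 convolutions keep the size unchanged, so only the strided layers are traced."
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {},
- "outputs": [],
- "source": [
- "def out_size(size, kernel, stride=1, padding=0):\n",
- "    # Output size of a convolution or pooling layer along one spatial dimension\n",
- "    return (size + 2 * padding - kernel) // stride + 1\n",
- "\n",
- "size = 96\n",
- "size = out_size(size, 7, stride=2, padding=3)  # block1: 7 x 7 convolution, stride 2\n",
- "size = out_size(size, 3, stride=2)  # block1: max pooling\n",
- "trace = [size]\n",
- "\n",
- "# The max pooling layers that close block2, block3 and block4\n",
- "for _ in range(3):\n",
- "    size = out_size(size, 3, stride=2)\n",
- "    trace.append(size)\n",
- "\n",
- "# block5: average pooling with kernel 2 (its stride defaults to the kernel size)\n",
- "trace.append(out_size(size, 2, stride=2))\n",
- "\n",
- "print(trace)"
- ]
- },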
- {
- "cell_type": "code",
- "execution_count": 10,
- "metadata": {
- "ExecuteTime": {
- "end_time": "2017-12-22T12:51:16.387778Z",
- "start_time": "2017-12-22T12:51:15.121350Z"
- },
- "collapsed": true
- },
- "outputs": [],
- "source": [
- "from utils import train\n",
- "\n",
- "def data_tf(x):\n",
- "    # Resize CIFAR10 images to 96 x 96 and normalize each channel to [-1, 1]\n",
- "    im_aug = tfs.Compose([\n",
- "        tfs.Resize(96),\n",
- "        tfs.ToTensor(),\n",
- "        tfs.Normalize([0.5, 0.5, 0.5], [0.5, 0.5, 0.5])\n",
- "    ])\n",
- "    x = im_aug(x)\n",
- "    return x\n",
- " \n",
- "train_set = CIFAR10('../../data', train=True, transform=data_tf)\n",
- "train_data = torch.utils.data.DataLoader(train_set, batch_size=64, shuffle=True)\n",
- "test_set = CIFAR10('../../data', train=False, transform=data_tf)\n",
- "test_data = torch.utils.data.DataLoader(test_set, batch_size=128, shuffle=False)\n",
- "\n",
- "net = GoogLeNet(3, 10)\n",
- "optimizer = torch.optim.Adam(net.parameters(), lr=1e-3)\n",
- "criterion = nn.CrossEntropyLoss()"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": 11,
- "metadata": {
- "ExecuteTime": {
- "end_time": "2017-12-22T13:17:25.310685Z",
- "start_time": "2017-12-22T12:51:16.389607Z"
- },
- "scrolled": false
- },
- "outputs": [
- {
- "name": "stdout",
- "output_type": "stream",
- "text": [
- "[ 0] Train:(L=1.329815, Acc=0.523318), Valid:(L=1.289094, Acc=0.566555), Time 00:01:15\n",
- "[ 1] Train:(L=0.868416, Acc=0.699808), Valid:(L=0.834760, Acc=0.715190), Time 00:01:15\n",
- "[ 2] Train:(L=0.661615, Acc=0.772998), Valid:(L=0.681946, Acc=0.765131), Time 00:01:15\n",
- "[ 3] Train:(L=0.538752, Acc=0.817315), Valid:(L=0.604022, Acc=0.794699), Time 00:01:15\n",
- "[ 4] Train:(L=0.443314, Acc=0.850264), Valid:(L=0.628162, Acc=0.788370), Time 00:01:15\n",
- "[ 5] Train:(L=0.377100, Acc=0.872462), Valid:(L=0.527649, Acc=0.825752), Time 00:01:14\n",
- "[ 6] Train:(L=0.310084, Acc=0.894981), Valid:(L=0.520545, Acc=0.833267), Time 00:01:15\n",
- "[ 7] Train:(L=0.263667, Acc=0.908628), Valid:(L=0.530805, Acc=0.839399), Time 00:01:14\n",
- "[ 8] Train:(L=0.214284, Acc=0.925831), Valid:(L=0.492261, Acc=0.850672), Time 00:01:14\n",
- "[ 9] Train:(L=0.178758, Acc=0.938679), Valid:(L=0.543371, Acc=0.843948), Time 00:01:14\n",
- "[10] Train:(L=0.154360, Acc=0.945213), Valid:(L=0.560078, Acc=0.839794), Time 00:01:14\n",
- "[11] Train:(L=0.127252, Acc=0.957121), Valid:(L=0.607742, Acc=0.833267), Time 00:01:14\n",
- "[12] Train:(L=0.122219, Acc=0.957980), Valid:(L=0.579313, Acc=0.842959), Time 00:01:14\n",
- "[13] Train:(L=0.100576, Acc=0.964734), Valid:(L=0.551588, Acc=0.856507), Time 00:01:14\n",
- "[14] Train:(L=0.085722, Acc=0.969969), Valid:(L=0.571536, Acc=0.851266), Time 00:01:14\n",
- "[15] Train:(L=0.078888, Acc=0.972746), Valid:(L=0.649491, Acc=0.847409), Time 00:01:14\n",
- "[16] Train:(L=0.079078, Acc=0.973026), Valid:(L=0.681464, Acc=0.840487), Time 00:01:14\n",
- "[17] Train:(L=0.069273, Acc=0.976582), Valid:(L=0.615183, Acc=0.848991), Time 00:01:15\n",
- "[18] Train:(L=0.062320, Acc=0.978780), Valid:(L=0.618147, Acc=0.858584), Time 00:01:16\n",
- "[19] Train:(L=0.060656, Acc=0.979220), Valid:(L=0.613905, Acc=0.857002), Time 00:01:17\n"
- ]
- }
- ],
- "source": [
- "res = train(net, train_data, test_data, 20, optimizer, criterion)"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": 12,
- "metadata": {},
- "outputs": [
- {
- "data": {
- "image/png": "\n",
- "text/plain": [
- "<Figure size 432x288 with 1 Axes>"
- ]
- },
- "metadata": {
- "needs_background": "light"
- },
- "output_type": "display_data"
- },
- {
- "data": {
- "image/png": "\n",
- "text/plain": [
- "<Figure size 432x288 with 1 Axes>"
- ]
- },
- "metadata": {
- "needs_background": "light"
- },
- "output_type": "display_data"
- }
- ],
- "source": [
- "import matplotlib.pyplot as plt\n",
- "%matplotlib inline\n",
- "\n",
- "plt.plot(res[0], label='train')\n",
- "plt.plot(res[2], label='valid')\n",
- "plt.xlabel('epoch')\n",
- "plt.ylabel('Loss')\n",
- "plt.legend(loc='best')\n",
- "plt.savefig('fig-res-googlenet-train-validate-loss.pdf')\n",
- "plt.show()\n",
- "\n",
- "plt.plot(res[1], label='train')\n",
- "plt.plot(res[3], label='valid')\n",
- "plt.xlabel('epoch')\n",
- "plt.ylabel('Acc')\n",
- "plt.legend(loc='best')\n",
- "plt.savefig('fig-res-googlenet-train-validate-acc.pdf')\n",
- "plt.show()\n",
- "\n",
- "# save raw data\n",
- "import numpy\n",
- "numpy.save('fig-res-googlenet_data.npy', res)"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "By introducing the more structured Inception block, GoogLeNet lets us use more channels and more layers while keeping the amount of computation under control.\n",
- "\n"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
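- "## Exercises\n",
- "GoogLeNet has several later versions. Try reading the papers and implementing them yourself to see how they differ.\n",
- "* v1: the original version\n",
- "* v2: adds batch normalization to speed up training\n",
- "* v3: adjusts the Inception module\n",
- "* v4: adds residual connections, following ResNet\n"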
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "## References\n",
- "* [Understanding the GoogLeNet architecture in depth (in Chinese)](https://zhuanlan.zhihu.com/p/32702031)"
- ]
- }
- ],
- "metadata": {
- "kernelspec": {
- "display_name": "Python 3",
- "language": "python",
- "name": "python3"
- },
- "language_info": {
- "codemirror_mode": {
- "name": "ipython",
- "version": 3
- },
- "file_extension": ".py",
- "mimetype": "text/x-python",
- "name": "python",
- "nbconvert_exporter": "python",
- "pygments_lexer": "ipython3",
- "version": "3.5.4"
- }
- },
- "nbformat": 4,
- "nbformat_minor": 2
- }
|