7-googlenet.ipynb
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# GoogLeNet\n",
"The VGG network we covered earlier was the runner-up of the 2014 ImageNet competition, so who was the champion? It was GoogLeNet, the network we are about to cover, proposed by researchers at Google. It made a huge impact at the time because its structure was unlike anything before it: it broke with the fixed habit of simply chaining convolutional layers in series and instead used a highly effective inception module. The result is a network deeper than VGG yet with far fewer parameters, since the fully connected layers at the end are removed, which also gives it high computational efficiency.\n",
"\n",
"![](https://ws2.sinaimg.cn/large/006tNc79ly1fmprhdocouj30qb08vac3.jpg)\n",
"\n",
"This is a schematic of the GoogLeNet architecture; below we introduce its key innovation, the inception module."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## The Inception module\n",
"In the network above we can see several layers made up of four parallel convolution branches. Each such group of four parallel branches is an inception module, visualized below.\n",
"\n",
"![](https://ws4.sinaimg.cn/large/006tNc79gy1fmprivb2hxj30dn09dwef.jpg)\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The four parallel paths of an inception module are:\n",
"1. a 1 x 1 convolution, which extracts features over a small receptive field\n",
"2. a 1 x 1 convolution followed by a 3 x 3 convolution; the 1 x 1 convolution reduces the number of input channels to cut parameters and computation, and the 3 x 3 convolution then convolves over a larger receptive field\n",
"3. a 1 x 1 convolution followed by a 5 x 5 convolution, which serves the same purpose as the second path\n",
"4. a 3 x 3 max pooling followed by a 1 x 1 convolution; the max pooling rearranges the input features and the 1 x 1 convolution then extracts features\n",
"\n",
"Finally, the features from the four parallel paths are concatenated along the channel dimension. Let's implement this below."
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {
"ExecuteTime": {
"end_time": "2017-12-22T12:51:05.427292Z",
"start_time": "2017-12-22T12:51:04.924747Z"
},
"collapsed": true
},
"outputs": [],
"source": [
"import sys\n",
"sys.path.append('..')\n",
"\n",
"import numpy as np\n",
"import torch\n",
"from torch import nn\n",
"from torch.autograd import Variable\n",
"from torchvision.datasets import CIFAR10"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {
"ExecuteTime": {
"end_time": "2017-12-22T12:51:08.890890Z",
"start_time": "2017-12-22T12:51:08.876313Z"
},
"collapsed": true
},
"outputs": [],
"source": [
"# define a basic building block: a convolution followed by batchnorm and a relu activation\n",
"def conv_relu(in_channel, out_channel, kernel, stride=1, padding=0):\n",
"    layer = nn.Sequential(\n",
"        nn.Conv2d(in_channel, out_channel, kernel, stride, padding),\n",
"        nn.BatchNorm2d(out_channel, eps=1e-3),\n",
"        nn.ReLU(True)\n",
"    )\n",
"    return layer"
]
},
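{
"cell_type": "markdown",
"metadata": {},
"source": [
"A quick shape check of `conv_relu` (a small sanity check, using the same zero-input trick as the tests below): with a 3 x 3 kernel and padding 1, the spatial size is preserved and only the channel count changes."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"layer = conv_relu(3, 16, 3, padding=1)\n",
"x = Variable(torch.zeros(1, 3, 32, 32))\n",
"print(layer(x).shape) # expect torch.Size([1, 16, 32, 32])"
]
},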
{
"cell_type": "code",
"execution_count": 4,
"metadata": {
"ExecuteTime": {
"end_time": "2017-12-22T12:51:09.671474Z",
"start_time": "2017-12-22T12:51:09.587337Z"
},
"collapsed": true
},
"outputs": [],
"source": [
"class inception(nn.Module):\n",
"    def __init__(self, in_channel, out1_1, out2_1, out2_3, out3_1, out3_5, out4_1):\n",
"        super(inception, self).__init__()\n",
"        # first path: a single 1 x 1 convolution\n",
"        self.branch1x1 = conv_relu(in_channel, out1_1, 1)\n",
"        \n",
"        # second path: 1 x 1 reduction followed by a 3 x 3 convolution\n",
"        self.branch3x3 = nn.Sequential(\n",
"            conv_relu(in_channel, out2_1, 1),\n",
"            conv_relu(out2_1, out2_3, 3, padding=1)\n",
"        )\n",
"        \n",
"        # third path: 1 x 1 reduction followed by a 5 x 5 convolution\n",
"        self.branch5x5 = nn.Sequential(\n",
"            conv_relu(in_channel, out3_1, 1),\n",
"            conv_relu(out3_1, out3_5, 5, padding=2)\n",
"        )\n",
"        \n",
"        # fourth path: 3 x 3 max pooling followed by a 1 x 1 convolution\n",
"        self.branch_pool = nn.Sequential(\n",
"            nn.MaxPool2d(3, stride=1, padding=1),\n",
"            conv_relu(in_channel, out4_1, 1)\n",
"        )\n",
"        \n",
"    def forward(self, x):\n",
"        f1 = self.branch1x1(x)\n",
"        f2 = self.branch3x3(x)\n",
"        f3 = self.branch5x5(x)\n",
"        f4 = self.branch_pool(x)\n",
"        # concatenate the four feature maps along the channel dimension\n",
"        output = torch.cat((f1, f2, f3, f4), dim=1)\n",
"        return output"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {
"ExecuteTime": {
"end_time": "2017-12-22T12:51:10.948630Z",
"start_time": "2017-12-22T12:51:10.757903Z"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"input shape: 3 x 96 x 96\n",
"output shape: 256 x 96 x 96\n"
]
}
],
"source": [
"test_net = inception(3, 64, 48, 64, 64, 96, 32)\n",
"test_x = Variable(torch.zeros(1, 3, 96, 96))\n",
"print('input shape: {} x {} x {}'.format(test_x.shape[1], test_x.shape[2], test_x.shape[3]))\n",
"test_y = test_net(test_x)\n",
"print('output shape: {} x {} x {}'.format(test_y.shape[1], test_y.shape[2], test_y.shape[3]))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We can see that after the input passes through the inception module, the spatial size is unchanged while the channel dimension has grown."
]
},
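{
"cell_type": "markdown",
"metadata": {},
"source": [
"The number of output channels is simply the sum of the four branches' output channels; the 1 x 1 reduction layers inside the second and third paths do not appear in the output. A quick check with the arguments we passed to `inception` above:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# out1_1 + out2_3 + out3_5 + out4_1 from inception(3, 64, 48, 64, 64, 96, 32)\n",
"print(64 + 64 + 96 + 32) # 256, matching the output shape above"
]
},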
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Next we define GoogLeNet itself, which can be seen as many inception modules chained in series. Note that the original paper attaches several auxiliary outputs to fight the vanishing gradient problem; here we only define a simplified GoogLeNet with a single output."
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {
"ExecuteTime": {
"end_time": "2017-12-22T12:51:13.149380Z",
"start_time": "2017-12-22T12:51:12.934110Z"
},
"collapsed": true
},
"outputs": [],
"source": [
"class googlenet(nn.Module):\n",
"    def __init__(self, in_channel, num_classes, verbose=False):\n",
"        super(googlenet, self).__init__()\n",
"        self.verbose = verbose\n",
"        \n",
"        # block 1: a large 7 x 7 convolution followed by max pooling\n",
"        self.block1 = nn.Sequential(\n",
"            conv_relu(in_channel, out_channel=64, kernel=7, stride=2, padding=3),\n",
"            nn.MaxPool2d(3, 2)\n",
"        )\n",
"        \n",
"        # block 2: 1 x 1 reduction, 3 x 3 convolution, max pooling\n",
"        self.block2 = nn.Sequential(\n",
"            conv_relu(64, 64, kernel=1),\n",
"            conv_relu(64, 192, kernel=3, padding=1),\n",
"            nn.MaxPool2d(3, 2)\n",
"        )\n",
"        \n",
"        # blocks 3-5: stacks of inception modules separated by pooling\n",
"        self.block3 = nn.Sequential(\n",
"            inception(192, 64, 96, 128, 16, 32, 32),\n",
"            inception(256, 128, 128, 192, 32, 96, 64),\n",
"            nn.MaxPool2d(3, 2)\n",
"        )\n",
"        \n",
"        self.block4 = nn.Sequential(\n",
"            inception(480, 192, 96, 208, 16, 48, 64),\n",
"            inception(512, 160, 112, 224, 24, 64, 64),\n",
"            inception(512, 128, 128, 256, 24, 64, 64),\n",
"            inception(512, 112, 144, 288, 32, 64, 64),\n",
"            inception(528, 256, 160, 320, 32, 128, 128),\n",
"            nn.MaxPool2d(3, 2)\n",
"        )\n",
"        \n",
"        self.block5 = nn.Sequential(\n",
"            inception(832, 256, 160, 320, 32, 128, 128),\n",
"            inception(832, 384, 192, 384, 48, 128, 128),\n",
"            nn.AvgPool2d(2)\n",
"        )\n",
"        \n",
"        self.classifier = nn.Linear(1024, num_classes)\n",
"        \n",
"    def forward(self, x):\n",
"        x = self.block1(x)\n",
"        if self.verbose:\n",
"            print('block 1 output: {}'.format(x.shape))\n",
"        x = self.block2(x)\n",
"        if self.verbose:\n",
"            print('block 2 output: {}'.format(x.shape))\n",
"        x = self.block3(x)\n",
"        if self.verbose:\n",
"            print('block 3 output: {}'.format(x.shape))\n",
"        x = self.block4(x)\n",
"        if self.verbose:\n",
"            print('block 4 output: {}'.format(x.shape))\n",
"        x = self.block5(x)\n",
"        if self.verbose:\n",
"            print('block 5 output: {}'.format(x.shape))\n",
"        # flatten and classify\n",
"        x = x.view(x.shape[0], -1)\n",
"        x = self.classifier(x)\n",
"        return x"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {
"ExecuteTime": {
"end_time": "2017-12-22T12:51:13.614936Z",
"start_time": "2017-12-22T12:51:13.428383Z"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"block 1 output: torch.Size([1, 64, 23, 23])\n",
"block 2 output: torch.Size([1, 192, 11, 11])\n",
"block 3 output: torch.Size([1, 480, 5, 5])\n",
"block 4 output: torch.Size([1, 832, 2, 2])\n",
"block 5 output: torch.Size([1, 1024, 1, 1])\n",
"output: torch.Size([1, 10])\n"
]
}
],
"source": [
"test_net = googlenet(3, 10, True)\n",
"test_x = Variable(torch.zeros(1, 3, 96, 96))\n",
"test_y = test_net(test_x)\n",
"print('output: {}'.format(test_y.shape))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We can see that the spatial size of the input keeps shrinking while the channel dimension keeps growing."
]
},
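{
"cell_type": "markdown",
"metadata": {},
"source": [
"The introduction claimed that GoogLeNet has far fewer parameters than VGG because it drops the large fully connected layers at the end. As a quick check (using the `test_net` we just built), we can count the trainable parameters directly:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# total number of trainable parameters in our simplified GoogLeNet\n",
"n_params = sum(p.numel() for p in test_net.parameters())\n",
"print('{:.1f} M parameters'.format(n_params / 1e6))"
]
},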
{
"cell_type": "code",
"execution_count": 8,
"metadata": {
"ExecuteTime": {
"end_time": "2017-12-22T12:51:16.387778Z",
"start_time": "2017-12-22T12:51:15.121350Z"
},
"collapsed": true
},
"outputs": [],
"source": [
"from utils import train\n",
"\n",
"def data_tf(x):\n",
"    x = x.resize((96, 96), 2) # enlarge the image to 96 x 96 (2 = PIL bilinear resampling)\n",
"    x = np.array(x, dtype='float32') / 255\n",
"    x = (x - 0.5) / 0.5 # normalize to [-1, 1]; we will cover this trick later\n",
"    x = x.transpose((2, 0, 1)) # move the channel axis to the first dimension, as pytorch expects\n",
"    x = torch.from_numpy(x)\n",
"    return x\n",
"\n",
"train_set = CIFAR10('./data', train=True, transform=data_tf)\n",
"train_data = torch.utils.data.DataLoader(train_set, batch_size=64, shuffle=True)\n",
"test_set = CIFAR10('./data', train=False, transform=data_tf)\n",
"test_data = torch.utils.data.DataLoader(test_set, batch_size=128, shuffle=False)\n",
"\n",
"net = googlenet(3, 10)\n",
"optimizer = torch.optim.SGD(net.parameters(), lr=0.01)\n",
"criterion = nn.CrossEntropyLoss()"
]
},
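{
"cell_type": "markdown",
"metadata": {},
"source": [
"A quick look at what `data_tf` produces (assuming the CIFAR10 data is already available in `./data`): each image becomes a 3 x 96 x 96 float tensor with values in [-1, 1]."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"im, label = train_set[0]\n",
"print(im.shape, im.min(), im.max())"
]
},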
{
"cell_type": "code",
"execution_count": 9,
"metadata": {
"ExecuteTime": {
"end_time": "2017-12-22T13:17:25.310685Z",
"start_time": "2017-12-22T12:51:16.389607Z"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Epoch 0. Train Loss: 1.504840, Train Acc: 0.452605, Valid Loss: 1.372426, Valid Acc: 0.514339, Time 00:01:25\n",
"Epoch 1. Train Loss: 1.046663, Train Acc: 0.630734, Valid Loss: 1.147823, Valid Acc: 0.606309, Time 00:01:02\n",
"Epoch 2. Train Loss: 0.833869, Train Acc: 0.710618, Valid Loss: 1.017181, Valid Acc: 0.644284, Time 00:00:54\n",
"Epoch 3. Train Loss: 0.688739, Train Acc: 0.760670, Valid Loss: 0.847099, Valid Acc: 0.712520, Time 00:00:58\n",
"Epoch 4. Train Loss: 0.576516, Train Acc: 0.801111, Valid Loss: 0.850494, Valid Acc: 0.706487, Time 00:01:01\n",
"Epoch 5. Train Loss: 0.483854, Train Acc: 0.832241, Valid Loss: 0.802392, Valid Acc: 0.726958, Time 00:01:08\n",
"Epoch 6. Train Loss: 0.410416, Train Acc: 0.857657, Valid Loss: 0.865246, Valid Acc: 0.721618, Time 00:01:23\n",
"Epoch 7. Train Loss: 0.346010, Train Acc: 0.881813, Valid Loss: 0.850472, Valid Acc: 0.729430, Time 00:01:28\n",
"Epoch 8. Train Loss: 0.289854, Train Acc: 0.900815, Valid Loss: 1.313582, Valid Acc: 0.650712, Time 00:01:22\n",
"Epoch 9. Train Loss: 0.239552, Train Acc: 0.918378, Valid Loss: 0.970173, Valid Acc: 0.726661, Time 00:01:30\n",
"Epoch 10. Train Loss: 0.212439, Train Acc: 0.927270, Valid Loss: 1.188284, Valid Acc: 0.665843, Time 00:01:29\n",
"Epoch 11. Train Loss: 0.175206, Train Acc: 0.939758, Valid Loss: 0.736437, Valid Acc: 0.790051, Time 00:01:29\n",
"Epoch 12. Train Loss: 0.140491, Train Acc: 0.952366, Valid Loss: 0.878171, Valid Acc: 0.764241, Time 00:01:14\n",
"Epoch 13. Train Loss: 0.127249, Train Acc: 0.956981, Valid Loss: 1.159881, Valid Acc: 0.731309, Time 00:01:00\n",
"Epoch 14. Train Loss: 0.108748, Train Acc: 0.962836, Valid Loss: 1.234320, Valid Acc: 0.716377, Time 00:01:23\n",
"Epoch 15. Train Loss: 0.091655, Train Acc: 0.969030, Valid Loss: 0.822575, Valid Acc: 0.790348, Time 00:01:28\n",
"Epoch 16. Train Loss: 0.086218, Train Acc: 0.970309, Valid Loss: 0.943607, Valid Acc: 0.767306, Time 00:01:24\n",
"Epoch 17. Train Loss: 0.069979, Train Acc: 0.976822, Valid Loss: 1.038973, Valid Acc: 0.755340, Time 00:01:22\n",
"Epoch 18. Train Loss: 0.066750, Train Acc: 0.977322, Valid Loss: 0.838827, Valid Acc: 0.801226, Time 00:01:23\n",
"Epoch 19. Train Loss: 0.052757, Train Acc: 0.982577, Valid Loss: 0.876127, Valid Acc: 0.796479, Time 00:01:25\n"
]
}
],
"source": [
"train(net, train_data, test_data, 20, optimizer, criterion)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"By introducing the more structured inception block, GoogLeNet lets us use wider channels and more layers while still keeping the amount of computation under control.\n",
"\n",
"**Exercise: GoogLeNet has many follow-up versions. Try reading the papers, see what changed, and implement it:** \n",
"v1: the original version \n",
"v2: adds batch normalization to speed up training \n",
"v3: reworks the inception module (a small starting sketch follows below) \n",
"v4: combines inception with ResNet-style residual connections (Inception-ResNet)"
]
}
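,
{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a starting point for the v3 exercise, here is a minimal sketch (our own illustration, not taken from the paper) of one idea used in Inception v3: replacing the 5 x 5 convolution branch with two stacked 3 x 3 convolutions, which cover the same receptive field with fewer weights (2 x 3 x 3 = 18 vs 5 x 5 = 25 per input-output channel pair). The channel sizes below mirror the first inception module of block 3."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# v3-style factorization of the 5 x 5 branch into two 3 x 3 convolutions\n",
"branch5x5_factorized = nn.Sequential(\n",
"    conv_relu(192, 16, 1),\n",
"    conv_relu(16, 32, 3, padding=1),\n",
"    conv_relu(32, 32, 3, padding=1)\n",
")\n",
"\n",
"x = Variable(torch.zeros(1, 192, 24, 24))\n",
"print(branch5x5_factorized(x).shape) # expect torch.Size([1, 32, 24, 24])"
]
}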
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.8"
}
},
"nbformat": 4,
"nbformat_minor": 2
}
