
6-param_initialize.ipynb 15 kB

  1. {
  2. "cells": [
  3. {
  4. "cell_type": "markdown",
  5. "metadata": {},
  6. "source": [
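  7. "# Parameter Initialization\n",
  8. "Parameter initialization has a large effect on a model: different initialization schemes can lead to very different results. Fortunately, the pioneers of deep learning have already explored many initialization schemes for us, so we only need to learn how to assign initial values to a model's parameters."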
  9. ]
  10. },
  11. {
  12. "cell_type": "markdown",
  13. "metadata": {},
  14. "source": [
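  15. "How to initialize parameters in PyTorch is not entirely obvious. If you build a model in the most primitive way, you have to define every parameter yourself. That makes it easy to choose how each variable is initialized, but it does not scale to complex models. Since we recommend defining models with Sequential and Module, we need to know how to customize initialization for them."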
  16. ]
  17. },
  18. {
  19. "cell_type": "markdown",
  20. "metadata": {},
  21. "source": [
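  22. "## Initializing with NumPy\n",
  23. "Because PyTorch is a very flexible framework that can, in principle, operate on any Tensor, we can initialize a parameter by defining a new Tensor. See the example below."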
  24. ]
  25. },
  26. {
  27. "cell_type": "code",
  28. "execution_count": 1,
  29. "metadata": {
  30. "collapsed": true
  31. },
  32. "outputs": [],
  33. "source": [
  34. "import numpy as np\n",
  35. "import torch\n",
  36. "from torch import nn"
  37. ]
  38. },
  39. {
  40. "cell_type": "code",
  41. "execution_count": 2,
  42. "metadata": {
  43. "collapsed": true
  44. },
  45. "outputs": [],
  46. "source": [
  47. "# Define a Sequential model\n",
  48. "net1 = nn.Sequential(\n",
  49. "    nn.Linear(30, 40),\n",
  50. "    nn.ReLU(),\n",
  51. "    nn.Linear(40, 50),\n",
  52. "    nn.ReLU(),\n",
  53. "    nn.Linear(50, 10)\n",
  54. ")"
  55. ]
  56. },
  57. {
  58. "cell_type": "code",
  59. "execution_count": 3,
  60. "metadata": {
  61. "collapsed": true
  62. },
  63. "outputs": [],
  64. "source": [
  65. "# Access the parameters of the first layer\n",
  66. "w1 = net1[0].weight\n",
  67. "b1 = net1[0].bias"
  68. ]
  69. },
  70. {
  71. "cell_type": "code",
  72. "execution_count": 4,
  73. "metadata": {},
  74. "outputs": [
  75. {
  76. "name": "stdout",
  77. "output_type": "stream",
  78. "text": [
  79. "Parameter containing:\n",
  80. "tensor([[-0.0784, 0.1559, 0.0451, ..., 0.0432, 0.0325, -0.0626],\n",
  81. " [ 0.0436, 0.0976, 0.1529, ..., -0.1601, -0.1227, -0.0831],\n",
  82. " [ 0.0890, 0.0343, 0.1744, ..., -0.0332, 0.0897, 0.0002],\n",
  83. " ...,\n",
  84. " [-0.1447, -0.0411, -0.0851, ..., 0.0117, 0.1457, 0.0585],\n",
  85. " [ 0.1642, 0.0744, -0.1118, ..., 0.0623, -0.0591, 0.0512],\n",
  86. " [-0.1610, 0.0070, 0.0184, ..., -0.1529, -0.0314, 0.1748]],\n",
  87. " requires_grad=True)\n"
  88. ]
  89. }
  90. ],
  91. "source": [
  92. "print(w1)"
  93. ]
  94. },
  95. {
  96. "cell_type": "markdown",
  97. "metadata": {},
  98. "source": [
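  99. "Note that this is a Parameter, which is a special kind of Tensor (a Tensor subclass that a Module registers as trainable). We can read its underlying data through the `.data` attribute and replace it directly with a new Tensor. PyTorch's own random generators, such as `torch.randn`, can be used for this; for distributions that PyTorch does not provide, NumPy can be used instead."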
  100. ]
  101. },
  102. {
  103. "cell_type": "code",
  104. "execution_count": 5,
  105. "metadata": {
  106. "collapsed": true
  107. },
  108. "outputs": [],
  109. "source": [
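  110. "# Define a Tensor and replace the weight directly (np.random.uniform returns float64, so the weight becomes float64; append .float() to keep float32)\n",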
  111. "net1[0].weight.data = torch.from_numpy(np.random.uniform(3, 5, size=(40, 30)))"
  112. ]
  113. },
  114. {
  115. "cell_type": "code",
  116. "execution_count": 6,
  117. "metadata": {},
  118. "outputs": [
  119. {
  120. "name": "stdout",
  121. "output_type": "stream",
  122. "text": [
  123. "Parameter containing:\n",
  124. "tensor([[3.5493, 3.2984, 4.3041, ..., 4.5181, 3.7561, 4.5633],\n",
  125. " [4.4523, 3.7956, 3.7448, ..., 3.5031, 3.9477, 4.8617],\n",
  126. " [3.5174, 4.1082, 4.6358, ..., 3.5759, 4.5291, 3.9545],\n",
  127. " ...,\n",
  128. " [3.6757, 4.2100, 3.9763, ..., 3.2017, 3.4422, 4.0191],\n",
  129. " [3.0283, 3.8147, 3.1705, ..., 3.9442, 4.1054, 4.9491],\n",
  130. " [3.5879, 3.7237, 4.0656, ..., 3.2279, 3.1818, 4.7489]],\n",
  131. " dtype=torch.float64, requires_grad=True)\n"
  132. ]
  133. }
  134. ],
  135. "source": [
  136. "print(net1[0].weight)"
  137. ]
  138. },
  139. {
  140. "cell_type": "markdown",
  141. "metadata": {},
  142. "source": [
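  143. "The parameter values have changed, that is, the weight now follows the initialization we asked for. This direct access works well when a single layer needs to be modified by hand. More often, though, every layer of the same type should be initialized the same way, and in that case it is more efficient to loop over the layers, for example:"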
  144. ]
  145. },
  146. {
  147. "cell_type": "code",
  148. "execution_count": 7,
  149. "metadata": {
  150. "collapsed": true
  151. },
  152. "outputs": [],
  153. "source": [
  154. "for layer in net1:\n",
  155. "    if isinstance(layer, nn.Linear):  # check whether this is a linear layer\n",
  156. "        param_shape = layer.weight.shape\n",
  157. "        layer.weight.data = torch.from_numpy(np.random.normal(0, 0.5, size=param_shape))\n",
  158. "        # normal distribution with mean 0 and standard deviation 0.5"
  159. ]
  160. },
  161. {
  162. "cell_type": "markdown",
  163. "metadata": {},
  164. "source": [
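  165. "**Exercise: a very popular initialization scheme is Xavier initialization, which comes from the 2010 paper [Understanding the difficulty of training deep feedforward neural networks](http://proceedings.mlr.press/v9/glorot10a.html). Through a mathematical derivation, the paper shows that this scheme keeps the variance of each layer's output as close to equal as possible. Interested readers can consult the paper.**\n",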
  166. "\n",
  167. "The formula for this initialization is\n",
  168. "\n",
  169. "$$\n",
  170. "w \\sim \\mathrm{Uniform}\\left[-\\frac{\\sqrt{6}}{\\sqrt{n_j + n_{j+1}}},\\ \\frac{\\sqrt{6}}{\\sqrt{n_j + n_{j+1}}}\\right]\n",
  171. "$$\n",
  172. "\n",
  173. "where $n_j$ and $n_{j+1}$ are the numbers of inputs and outputs of the layer. Try implementing this initialization yourself."
  174. ]
  175. },
  176. {
  177. "cell_type": "markdown",
  178. "metadata": {},
  179. "source": [
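  180. "Initializing the parameters of a Module is just as simple. To initialize a particular layer, redefine its Tensor directly, exactly as with Sequential. The only difference arises when looping over the layers, which requires two methods, `children()` and `modules()`. The example below illustrates them."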
  181. ]
  182. },
  183. {
  184. "cell_type": "code",
  185. "execution_count": 8,
  186. "metadata": {
  187. "collapsed": true
  188. },
  189. "outputs": [],
  190. "source": [
  191. "class sim_net(nn.Module):\n",
  192. "    def __init__(self):\n",
  193. "        super(sim_net, self).__init__()\n",
  194. "        self.l1 = nn.Sequential(\n",
  195. "            nn.Linear(30, 40),\n",
  196. "            nn.ReLU()\n",
  197. "        )\n",
  198. "\n",
  199. "        self.l1[0].weight.data = torch.randn(40, 30)  # initialize one layer directly\n",
  200. "\n",
  201. "        self.l2 = nn.Sequential(\n",
  202. "            nn.Linear(40, 50),\n",
  203. "            nn.ReLU()\n",
  204. "        )\n",
  205. "\n",
  206. "        self.l3 = nn.Sequential(\n",
  207. "            nn.Linear(50, 10),\n",
  208. "            nn.ReLU()\n",
  209. "        )\n",
  210. "\n",
  211. "    def forward(self, x):\n",
  212. "        x = self.l1(x)\n",
  213. "        x = self.l2(x)\n",
  214. "        x = self.l3(x)\n",
  215. "        return x"
  216. ]
  217. },
  218. {
  219. "cell_type": "code",
  220. "execution_count": 9,
  221. "metadata": {
  222. "collapsed": true
  223. },
  224. "outputs": [],
  225. "source": [
  226. "net2 = sim_net()"
  227. ]
  228. },
  229. {
  230. "cell_type": "code",
  231. "execution_count": 10,
  232. "metadata": {},
  233. "outputs": [
  234. {
  235. "name": "stdout",
  236. "output_type": "stream",
  237. "text": [
  238. "Sequential(\n",
  239. " (0): Linear(in_features=30, out_features=40, bias=True)\n",
  240. " (1): ReLU()\n",
  241. ")\n",
  242. "Sequential(\n",
  243. " (0): Linear(in_features=40, out_features=50, bias=True)\n",
  244. " (1): ReLU()\n",
  245. ")\n",
  246. "Sequential(\n",
  247. " (0): Linear(in_features=50, out_features=10, bias=True)\n",
  248. " (1): ReLU()\n",
  249. ")\n"
  250. ]
  251. }
  252. ],
  253. "source": [
  254. "# Iterate over children\n",
  255. "for i in net2.children():\n",
  256. "    print(i)"
  257. ]
  258. },
  259. {
  260. "cell_type": "code",
  261. "execution_count": 11,
  262. "metadata": {},
  263. "outputs": [
  264. {
  265. "name": "stdout",
  266. "output_type": "stream",
  267. "text": [
  268. "sim_net(\n",
  269. " (l1): Sequential(\n",
  270. " (0): Linear(in_features=30, out_features=40, bias=True)\n",
  271. " (1): ReLU()\n",
  272. " )\n",
  273. " (l2): Sequential(\n",
  274. " (0): Linear(in_features=40, out_features=50, bias=True)\n",
  275. " (1): ReLU()\n",
  276. " )\n",
  277. " (l3): Sequential(\n",
  278. " (0): Linear(in_features=50, out_features=10, bias=True)\n",
  279. " (1): ReLU()\n",
  280. " )\n",
  281. ")\n",
  282. "Sequential(\n",
  283. " (0): Linear(in_features=30, out_features=40, bias=True)\n",
  284. " (1): ReLU()\n",
  285. ")\n",
  286. "Linear(in_features=30, out_features=40, bias=True)\n",
  287. "ReLU()\n",
  288. "Sequential(\n",
  289. " (0): Linear(in_features=40, out_features=50, bias=True)\n",
  290. " (1): ReLU()\n",
  291. ")\n",
  292. "Linear(in_features=40, out_features=50, bias=True)\n",
  293. "ReLU()\n",
  294. "Sequential(\n",
  295. " (0): Linear(in_features=50, out_features=10, bias=True)\n",
  296. " (1): ReLU()\n",
  297. ")\n",
  298. "Linear(in_features=50, out_features=10, bias=True)\n",
  299. "ReLU()\n"
  300. ]
  301. }
  302. ],
  303. "source": [
  304. "# Iterate over modules\n",
  305. "for i in net2.modules():\n",
  306. "    print(i)"
  307. ]
  308. },
  309. {
  310. "cell_type": "markdown",
  311. "metadata": {},
  312. "source": [
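  313. "Do you see the difference in the examples above?\n",
  314. "\n",
  315. "`children()` only visits the direct submodules of the model. The model above defines three Sequential blocks, so `children()` yields exactly those three. `modules()` walks the whole hierarchy recursively: it yields the model itself, each Sequential, and every layer inside each Sequential. That makes initialization very convenient, for example:"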
  316. ]
  317. },
  318. {
  319. "cell_type": "code",
  320. "execution_count": 12,
  321. "metadata": {
  322. "collapsed": true
  323. },
  324. "outputs": [],
  325. "source": [
  326. "for layer in net2.modules():\n",
  327. "    if isinstance(layer, nn.Linear):\n",
  328. "        param_shape = layer.weight.shape\n",
  329. "        layer.weight.data = torch.from_numpy(np.random.normal(0, 0.5, size=param_shape))"
  330. ]
  331. },
  332. {
  333. "cell_type": "markdown",
  334. "metadata": {},
  335. "source": [
  336. "This performs the same initialization as in the Sequential case, and it is just as simple."
  337. ]
  338. },
  339. {
  340. "cell_type": "markdown",
  341. "metadata": {},
  342. "source": [
  343. "## torch.nn.init\n",
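  344. "Thanks to PyTorch's flexibility, we can initialize parameters by operating on Tensors directly. PyTorch also provides functions for quick initialization in `torch.nn.init`, which likewise operate on Tensors. The examples below illustrate this."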
  345. ]
  346. },
  347. {
  348. "cell_type": "code",
  349. "execution_count": 13,
  350. "metadata": {
  351. "collapsed": true
  352. },
  353. "outputs": [],
  354. "source": [
  355. "from torch.nn import init"
  356. ]
  357. },
  358. {
  359. "cell_type": "code",
  360. "execution_count": 14,
  361. "metadata": {},
  362. "outputs": [
  363. {
  364. "name": "stdout",
  365. "output_type": "stream",
  366. "text": [
  367. "Parameter containing:\n",
  368. "tensor([[ 0.2725, -0.2262, -0.4229, ..., -0.2451, 0.2344, 0.1583],\n",
  369. " [ 0.1886, 0.3226, -0.5023, ..., -0.2228, 0.5089, -0.6994],\n",
  370. " [-0.4689, 0.2612, 0.3464, ..., -0.0423, -0.2999, -0.5813],\n",
  371. " ...,\n",
  372. " [ 0.4200, 0.2091, -0.3690, ..., 0.4142, 0.1120, 0.0771],\n",
  373. " [ 0.6540, 0.0475, 0.0594, ..., 0.1726, -0.2264, 0.1510],\n",
  374. " [-1.0729, -0.2862, 0.4953, ..., 0.4702, 0.5555, -0.2246]],\n",
  375. " dtype=torch.float64, requires_grad=True)\n"
  376. ]
  377. }
  378. ],
  379. "source": [
  380. "print(net1[0].weight)"
  381. ]
  382. },
  383. {
  384. "cell_type": "code",
  385. "execution_count": 16,
  386. "metadata": {},
  387. "outputs": [
  388. {
  389. "data": {
  390. "text/plain": [
  391. "Parameter containing:\n",
  392. "tensor([[ 0.1173, -0.0864, 0.1008, ..., -0.1053, 0.2642, -0.1045],\n",
  393. " [-0.0244, 0.1722, 0.1330, ..., 0.2443, -0.2385, 0.1613],\n",
  394. " [-0.1767, 0.0678, 0.1282, ..., 0.1033, -0.2423, -0.0864],\n",
  395. " ...,\n",
  396. " [-0.1673, -0.1338, -0.0839, ..., 0.0267, 0.1693, -0.2911],\n",
  397. " [ 0.2146, 0.0194, 0.2873, ..., 0.1486, 0.2775, 0.2740],\n",
  398. " [-0.0400, 0.2231, 0.0800, ..., 0.2804, 0.2121, 0.2764]],\n",
  399. " dtype=torch.float64, requires_grad=True)"
  400. ]
  401. },
  402. "execution_count": 16,
  403. "metadata": {},
  404. "output_type": "execute_result"
  405. }
  406. ],
  407. "source": [
  408. "init.xavier_uniform_(net1[0].weight)  # the Xavier initialization described above; PyTorch provides a built-in implementation"
  409. ]
  410. },
  411. {
  412. "cell_type": "code",
  413. "execution_count": 17,
  414. "metadata": {},
  415. "outputs": [
  416. {
  417. "name": "stdout",
  418. "output_type": "stream",
  419. "text": [
  420. "Parameter containing:\n",
  421. "tensor([[ 0.1173, -0.0864, 0.1008, ..., -0.1053, 0.2642, -0.1045],\n",
  422. " [-0.0244, 0.1722, 0.1330, ..., 0.2443, -0.2385, 0.1613],\n",
  423. " [-0.1767, 0.0678, 0.1282, ..., 0.1033, -0.2423, -0.0864],\n",
  424. " ...,\n",
  425. " [-0.1673, -0.1338, -0.0839, ..., 0.0267, 0.1693, -0.2911],\n",
  426. " [ 0.2146, 0.0194, 0.2873, ..., 0.1486, 0.2775, 0.2740],\n",
  427. " [-0.0400, 0.2231, 0.0800, ..., 0.2804, 0.2121, 0.2764]],\n",
  428. " dtype=torch.float64, requires_grad=True)\n"
  429. ]
  430. }
  431. ],
  432. "source": [
  433. "print(net1[0].weight)"
  434. ]
  435. },
  436. {
  437. "cell_type": "markdown",
  438. "metadata": {},
  439. "source": [
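  440. "As the output shows, the parameter has been modified.\n",
  441. "\n",
  442. "`torch.nn.init` provides many more built-in initialization schemes, which saves us from re-implementing the same operations."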
  443. ]
  444. },
  445. {
  446. "cell_type": "markdown",
  447. "metadata": {},
  448. "source": [
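  449. "We have covered two ways to initialize parameters. They are essentially the same: both modify the actual values of a layer's parameters. `torch.nn.init` additionally offers many well-established initialization schemes for deep learning, which is very convenient.\n",
  450. "\n",
  451. "In the next lesson, we will cover the popular gradient-based optimization algorithms."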
  452. ]
  453. }
  454. ],
  455. "metadata": {
  456. "kernelspec": {
  457. "display_name": "Python 3",
  458. "language": "python",
  459. "name": "python3"
  460. },
  461. "language_info": {
  462. "codemirror_mode": {
  463. "name": "ipython",
  464. "version": 3
  465. },
  466. "file_extension": ".py",
  467. "mimetype": "text/x-python",
  468. "name": "python",
  469. "nbconvert_exporter": "python",
  470. "pygments_lexer": "ipython3",
  471. "version": "3.5.4"
  472. }
  473. },
  474. "nbformat": 4,
  475. "nbformat_minor": 2
  476. }
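
The Xavier exercise posed in the notebook could be approached along the following lines. This is a minimal sketch, not the notebook author's solution: the helper name `xavier_uniform_manual` is hypothetical, and it relies on `nn.Linear` storing its weight with shape `(out_features, in_features)`. It samples with the tensor's own in-place `uniform_` method, so the weight stays float32 rather than becoming float64 as in the NumPy-based cells.

```python
import math

import torch
from torch import nn


def xavier_uniform_manual(layer: nn.Linear) -> float:
    """Fill layer.weight in place from Uniform[-bound, bound] and return bound.

    Hypothetical helper written for illustration; it is not part of PyTorch.
    """
    n_out, n_in = layer.weight.shape  # nn.Linear weight is (out_features, in_features)
    bound = math.sqrt(6.0) / math.sqrt(n_in + n_out)
    with torch.no_grad():  # initialization should not be recorded by autograd
        layer.weight.uniform_(-bound, bound)
    return bound


layer = nn.Linear(30, 40)
bound = xavier_uniform_manual(layer)
print(bound)               # sqrt(6 / 70), roughly 0.2928
print(layer.weight.dtype)  # the in-place fill keeps the original dtype
```

For the 30-to-40 layer the bound is about 0.2928, which is consistent with the range of values printed after `init.xavier_uniform_` in the notebook.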

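The difference between `children()` and `modules()`, and the use of `torch.nn.init` inside a loop, can be combined in one short sketch. The class `SimNet` below is an assumption made for self-containment: it re-declares the same three-block layout as the notebook's `sim_net` under a different name. Zeroing the biases with `init.zeros_` is an illustrative choice that the notebook itself does not make.

```python
import torch
from torch import nn
from torch.nn import init


class SimNet(nn.Module):
    """Same layout as the notebook's sim_net, re-declared for a self-contained sketch."""

    def __init__(self):
        super().__init__()
        self.l1 = nn.Sequential(nn.Linear(30, 40), nn.ReLU())
        self.l2 = nn.Sequential(nn.Linear(40, 50), nn.ReLU())
        self.l3 = nn.Sequential(nn.Linear(50, 10), nn.ReLU())

    def forward(self, x):
        return self.l3(self.l2(self.l1(x)))


net = SimNet()

# children() yields only the three direct Sequential blocks;
# modules() also yields the model itself and every nested layer.
n_children = len(list(net.children()))
n_modules = len(list(net.modules()))
print(n_children, n_modules)  # 3 10

for layer in net.modules():
    if isinstance(layer, nn.Linear):
        init.xavier_uniform_(layer.weight)  # in place; the dtype stays float32
        init.zeros_(layer.bias)             # illustrative choice, not from the notebook

# A forward pass works because weights, biases, and input are all float32.
out = net(torch.randn(4, 30))
print(out.shape)  # torch.Size([4, 10])
```

Because the `init` functions modify the existing parameter in place, the dtype is preserved, whereas assigning `torch.from_numpy(...)` of a float64 array to `.data` changes the weight to float64 and would need a `.float()` conversion before a float32 forward pass.
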
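Machine learning is increasingly applied in fields such as aircraft and robotics, with the aim of using computers to achieve human-like intelligence and thereby make equipment intelligent and unmanned. This course guides students to master the basic knowledge, typical methods, and techniques of machine learning, sparks their interest in the subject through concrete application cases, and encourages them to analyze and solve the problems and challenges facing aircraft and robots from the perspective of artificial intelligence. The course mainly covers Python programming basics, machine learning models, and the fundamentals and implementation of unsupervised learning, supervised learning, and deep learning. It also teaches how to apply machine learning to practical problems, so that students comprehensively improve their overall ability.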