3a - Linear regression 1D.ipynb

{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Linear regression \n",
    "\n",
    "\n",
    "We load the 'diabetes' dataset with scikit-learn's `load_diabetes` function:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {},
   "outputs": [],
   "source": [
    "# matplotlib inline\n",
    "from sklearn import datasets\n",
    "\n",
    "# Load the diabetes dataset\n",
    "diabetes = datasets.load_diabetes()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The dataset consists of data and targets. The target gives the desired output for each example in the data:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "(442, 10)\n",
      "(442,)\n"
     ]
    }
   ],
   "source": [
    "X = diabetes.data\n",
    "y = diabetes.target\n",
    "print(X.shape)\n",
    "print(y.shape)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Splitting the data\n",
    "==================\n",
    "We want to split the data into a train set and a test set. We fit the linear model on the train set and then show that it also performs well on the test set. \n",
    "\n",
    "Before splitting the data, we shuffle (mix) the examples, because in some datasets the examples are ordered. \n",
    "\n",
    "If we did not shuffle, the train set and the test set could come from very different parts of the data, so a linear model fitted on the train set might not be valid on the test set.\n",
    "Now we shuffle `X` and `y` together, so that each example stays paired with its target:\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "(442, 10)\n",
      "(442,)\n"
     ]
    }
   ],
   "source": [
    "from sklearn.utils import shuffle\n",
    "X, y = shuffle(X, y, random_state=1)\n",
    "print(X.shape)\n",
    "print(y.shape)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Each example in the data has 10 columns (features) in total.\n",
    "\n",
    "We want to work with 1-dimensional data because it is easy to visualize. Therefore we select only one column, e.g. column 2 (the `bmi` feature), and fit a linear model on it. Slicing with `2:3` rather than `2` keeps `X` two-dimensional, with shape `(442, 1)`, which is the shape scikit-learn estimators expect:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "(442, 10)\n",
      "(442, 1)\n"
     ]
    }
   ],
   "source": [
    "# Use only one column (column 2) from the data; 2:3 keeps X two-dimensional\n",
    "print(X.shape)\n",
    "X = X[:, 2:3]\n",
    "print(X.shape)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Split the data into training/testing sets"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "(250, 1)\n",
      "(192, 1)\n"
     ]
    }
   ],
   "source": [
    "train_set_size = 250\n",
    "X_train = X[:train_set_size] # selects the first 250 rows (examples) for the train set\n",
    "X_test = X[train_set_size:] # selects rows from index 250 to the end for the test set\n",
    "print(X_train.shape)\n",
    "print(X_test.shape)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Split the targets into training/testing sets"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "(250,)\n",
      "(192,)\n"
     ]
    }
   ],
   "source": [
    "y_train = y[:train_set_size] # selects the first 250 targets for the train set\n",
    "y_test = y[train_set_size:] # selects targets from index 250 to the end for the test set\n",
    "print(y_train.shape)\n",
    "print(y_test.shape)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now we can look at the data: the train set and the test set are plotted as two differently colored scatter series. The target appears to grow roughly linearly with the selected feature, although with considerable spread. \n",
    "\n",
    "Therefore, a linear regression model is a reasonable choice for predicting the target from this feature.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "image/png": "\n",
      "text/plain": [
       "<Figure size 432x288 with 1 Axes>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "import matplotlib.pyplot as plt\n",
    "\n",
    "plt.scatter(X_train, y_train, label='train')\n",
    "plt.scatter(X_test, y_test, label='test')\n",
    "plt.xlabel('Data')\n",
    "plt.ylabel('Target')\n",
    "plt.legend();"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Linear regression\n",
    "=================\n",
    "Create a linear regression object; we fit it to the data in the next step."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 20,
   "metadata": {},
   "outputs": [],
   "source": [
    "from sklearn import linear_model\n",
    "import numpy as np\n",
    "regr = linear_model.LinearRegression()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Fit the model using the training set:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 21,
   "metadata": {},
   "outputs": [],
   "source": [
    "regr.fit(X_train, y_train);"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The fitted model stores the coefficient (the slope) in `coef_` and the bias (the intercept) in `intercept_`:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 22,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[988.07836941]\n",
      "150.80798145969447\n"
     ]
    }
   ],
   "source": [
    "print(regr.coef_)\n",
    "print(regr.intercept_)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now we calculate the mean squared error (MSE) on the training set and on the test set:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 23,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Training error: 3960.4058766864073\n",
      "Test error: 3811.1989929980004\n"
     ]
    }
   ],
   "source": [
    "# The mean squared error\n",
    "print(\"Training error: \", np.mean((regr.predict(X_train) - y_train) ** 2))\n",
    "print(\"Test error: \", np.mean((regr.predict(X_test) - y_test) ** 2))\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Plotting data and linear model\n",
    "==============================\n",
    "Now we plot the training examples against their targets (marked as dots). \n",
    "\n",
    "The line shows the model's predictions, i.e. the linear model that we found:\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 24,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "Text(0,0.5,'Target')"
      ]
     },
     "execution_count": 24,
     "metadata": {},
     "output_type": "execute_result"
    },
    {
     "data": {
      "image/png": "\n",
      "text/plain": [
       "<Figure size 432x288 with 1 Axes>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "# Draw one dot per training example: its feature value against its target\n",
    "plt.scatter(X_train, y_train, color='black')\n",
    "# Plot the fitted linear model\n",
    "plt.plot(X_train, regr.predict(X_train), color='blue', linewidth=3)\n",
    "plt.xlabel('Data')\n",
    "plt.ylabel('Target')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We do the same with the test data, to check that the linear model is also valid on the test set:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 26,
   "metadata": {
    "scrolled": true
   },
   "outputs": [
    {
     "data": {
      "image/png": "\n",
      "text/plain": [
       "<Figure size 432x288 with 1 Axes>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "# Draw one dot per test example: its feature value against its target\n",
    "plt.scatter(X_test, y_test, color='black')\n",
    "# Plot the fitted linear model\n",
    "plt.plot(X_test, regr.predict(X_test), color='blue', linewidth=3)\n",
    "plt.xlabel('Data')\n",
    "plt.ylabel('Target');"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 90,
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.5.2"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 1
}
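The notebook's shuffle-then-slice steps can be reproduced with plain NumPy, which makes the pairing of examples and targets explicit. This is a minimal sketch under stated assumptions, not the notebook's code: `shuffle_and_split` is a hypothetical helper, the arrays are a synthetic stand-in with the diabetes shapes (442 rows, 10 columns), and `np.random.default_rng(seed)` will not reproduce the row order that `sklearn.utils.shuffle(X, y, random_state=1)` produces.

```python
import numpy as np


def shuffle_and_split(X, y, train_size, seed=1):
    """Hypothetical helper: shuffle X and y with one shared permutation, then split.

    Applying the same permutation to both arrays keeps every example paired
    with its target, as sklearn.utils.shuffle(X, y) does.
    """
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(X))  # one random ordering of the row indices
    X_shuffled, y_shuffled = X[order], y[order]
    return (X_shuffled[:train_size], X_shuffled[train_size:],
            y_shuffled[:train_size], y_shuffled[train_size:])


# Synthetic stand-in with the diabetes shapes: 442 examples, 10 columns.
# Rows are sorted on purpose, mimicking an "ordered" dataset.
X = np.arange(442 * 10, dtype=float).reshape(442, 10)
y = X[:, 0].copy()  # each target is tied to its row, so pairing can be checked

# Keep one column as a 2-D array, like X[:, 2:3] in the notebook, then split 250/192.
X_train, X_test, y_train, y_test = shuffle_and_split(X[:, 2:3], y, train_size=250)
print(X_train.shape, X_test.shape)  # (250, 1) (192, 1)

# Without shuffling, every training target would be smaller than every test target.
print(y[:250].max() < y[250:].min())  # True
```

In this stand-in, `y_train` still equals `X_train[:, 0] - 2` row by row after the shared permutation, which is the pairing the notebook relies on when it shuffles `X` and `y` in a single call.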

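Machine learning is increasingly applied in fields such as aircraft and robotics. Its goal is to use computers to achieve human-like intelligence and thereby make equipment intelligent and unmanned. This course aims to help students master the basic knowledge, typical methods, and techniques of machine learning, to spark their interest in the discipline through concrete application cases, and to encourage them to analyze and solve the problems and challenges faced by aircraft and robots from the perspective of artificial intelligence. The main topics of the course are Python programming basics, machine learning models, the fundamentals and implementation of unsupervised learning, supervised learning, and deep learning, and how to use machine learning to solve practical problems, so as to comprehensively improve students' overall abilities.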
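The notebook obtains `regr.coef_` and `regr.intercept_` from `regr.fit` without showing the arithmetic. For a single feature, ordinary least squares has a closed form: slope = Σ(x - x̄)(y - ȳ) / Σ(x - x̄)² and intercept = ȳ - slope · x̄. The minimal sketch below computes both, together with the mean squared error used in the notebook. The helpers `fit_line` and `mse` and the noisy synthetic data (y = 3x + 2 plus Gaussian noise) are hypothetical. Applied to the notebook's `X_train` and `y_train`, `fit_line` should give the same slope and intercept as `LinearRegression`, up to floating-point rounding.

```python
import numpy as np


def fit_line(x, y):
    """Hypothetical helper: closed-form least squares for one feature, returns (slope, intercept)."""
    x = np.asarray(x, dtype=float).ravel()  # accept shape (n,) or (n, 1), like X[:, 2:3]
    y = np.asarray(y, dtype=float)
    x_mean, y_mean = x.mean(), y.mean()
    # slope = sum((x - x_mean) * (y - y_mean)) / sum((x - x_mean) ** 2)
    slope = np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean) ** 2)
    intercept = y_mean - slope * x_mean  # the fitted line passes through (x_mean, y_mean)
    return slope, intercept


def mse(y_true, y_pred):
    """Hypothetical helper: mean squared error, the quantity the notebook computes with np.mean."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return np.mean((y_pred - y_true) ** 2)


# Synthetic data (an assumption for illustration): y = 3x + 2 plus noise with std 0.1.
rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, size=(200, 1))  # column vector, like the notebook's X_train
y = 3.0 * x[:, 0] + 2.0 + rng.normal(0.0, 0.1, size=200)

slope, intercept = fit_line(x, y)
predictions = slope * x[:, 0] + intercept
print(slope, intercept, mse(y, predictions))
```

The slope and intercept come out close to the true values 3 and 2, and the MSE is close to the noise variance (0.01), because once the line is fitted only the noise remains in the residuals.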