# -*- coding: utf-8 -*-
# ---
# jupyter:
#   jupytext_format_version: '1.2'
#   kernelspec:
#     display_name: Python 3
#     language: python
#     name: python3
#   language_info:
#     codemirror_mode:
#       name: ipython
#       version: 3
#     file_extension: .py
#     mimetype: text/x-python
#     name: python
#     nbconvert_exporter: python
#     pygments_lexer: ipython3
#     version: 3.5.2
# ---
# # Least squares
#
# Least squares is a mathematical procedure for finding the best-fitting curve to a given set of points by minimizing the sum of the squares of the offsets ("the residuals") of the points from the curve. The sum of the squares of the offsets is used instead of the offset absolute values because this allows the residuals to be treated as a continuous differentiable quantity. However, because squares of the offsets are used, outlying points can have a disproportionate effect on the fit, a property which may or may not be desirable depending on the problem at hand.
#
# ### Show the data
#
# +
# %matplotlib inline
import matplotlib.pyplot as plt
import numpy as np
import sklearn
from sklearn import datasets

# load data
d = datasets.load_diabetes()
X = d.data[:, 2]
Y = d.target

# draw original data
plt.scatter(X, Y)
plt.xlabel("X")
plt.ylabel("Y")
plt.show()
# -
# ### Theory
# Given $N$ observations
# $$
# \mathbf{X} = \{x_1, x_2, ..., x_N \} \\
# \mathbf{Y} = \{y_1, y_2, ..., y_N \}
# $$
#
# we want to find a model that predicts the data. The simplest model is a linear one, which has the form
# $$
# y = ax + b
# $$
#
# The goal is to find the parameters $a, b$ that best fit the model to the observed data.
#
# We use the sum of squares to measure the difference (the loss function) between the model's predictions and the observed data:
# $$
# L = \sum_{i=1}^{N} (y_i - a x_i - b)^2
# $$
#
# To minimize the loss function, we take the partial derivatives with respect to the parameters:
# $$
# \frac{\partial L}{\partial a} = -2 \sum_{i=1}^{N} (y_i - a x_i - b) x_i \\
# \frac{\partial L}{\partial b} = -2 \sum_{i=1}^{N} (y_i - a x_i - b)
# $$
# At the minimum of the loss, the partial derivatives are zero, so:
# $$
# -2 \sum_{i=1}^{N} (y_i - a x_i - b) x_i = 0 \\
# -2 \sum_{i=1}^{N} (y_i - a x_i - b) = 0
# $$
#
# Rearranging the terms gives the normal equations:
# $$
# a \sum x_i^2 + b \sum x_i = \sum y_i x_i \\
# a \sum x_i + b N = \sum y_i
# $$
# Solving this linear system yields the model parameters.
# ### Program
# +
N = X.shape[0]
S_X2 = np.sum(X*X)
S_X = np.sum(X)
S_XY = np.sum(X*Y)
S_Y = np.sum(Y)

A1 = np.array([[S_X2, S_X],
               [S_X,  N]])
B1 = np.array([S_XY, S_Y])
coeff = np.linalg.inv(A1).dot(B1)
print('a = %f, b = %f' % (coeff[0], coeff[1]))

x_min = np.min(X)
x_max = np.max(X)
y_min = coeff[0] * x_min + coeff[1]
y_max = coeff[0] * x_max + coeff[1]
plt.scatter(X, Y, label='original data')
plt.plot([x_min, x_max], [y_min, y_max], 'r', label='model')
plt.legend()
plt.show()
# -
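#
# As a quick cross-check (an addition, not part of the original notebook), `np.polyfit` with degree 1 solves the same least-squares problem and should return the same $a, b$:
# +
a_chk, b_chk = np.polyfit(X, Y, 1)   # returns [slope, intercept] for a degree-1 fit
print("a = %f, b = %f" % (a_chk, b_chk))
# -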
# ## How to find the model parameters iteratively
#
# When there is a large amount of data, or the model is complex, the model parameters cannot be obtained directly in an analytical way. A more common approach is therefore to approach the model parameters step by step through iteration.
#
# ### Gradient descent
# In machine learning, many supervised models are built by first constructing a loss function for the model and then optimizing that loss function with an optimization algorithm to find the best parameters. Among the optimization algorithms used to solve for machine learning parameters, gradient-descent-based optimization (Gradient Descent, GD) is one of the most widely used.
#
# Gradient descent has many advantages. In particular, it only requires the first derivative of the loss function, so the computational cost per step is small, which allows gradient descent to be applied to many large-scale datasets. The idea of gradient descent is to find the next iterate by following the gradient direction at the current point.
#
# The basic idea of gradient descent can be illustrated by the process of walking down a mountain. Imagine a person trapped on a mountain who needs to get down (i.e. to find the lowest point of the mountain, the valley). However, heavy fog keeps visibility low, so the path down cannot be seen, and the person must use local information to find it. Gradient descent can help here: taking the current position as the reference, find the steepest direction at that spot and step towards where the height of the mountain decreases. (Likewise, if the goal were to climb to the summit, one should step in the steepest upward direction.) Repeating the same procedure after every stretch of walking, the person finally reaches the valley.
#
#
# We can further assume that the steepest direction cannot be observed at a glance but requires a sophisticated instrument to measure, and that the person happens to carry such an instrument. Each time the person has walked a stretch, measuring the steepest direction at the new position takes a while, so to reach the bottom before sunset the number of measurements should be kept as small as possible. This is a dilemma: measuring frequently keeps the descent direction correct but costs a lot of time, while measuring too rarely risks drifting off course. One must therefore find a suitable measurement frequency that keeps the descent on track without costing too much time!
#
#
# ![gradient_descent](images/gradient_descent.png)
#
# As the figure above shows, gradient descent reaches a local optimum. The x and y axes represent $\theta_0$ and $\theta_1$, and the z axis represents the cost function. Clearly, different starting points can lead to different convergence points. Of course, if the surface is bowl-shaped, the convergence point is the same.
#
# For a loss function
# $$
# L = \sum_{i=1}^{N} (y_i - a x_i - b)^2
# $$
#
# the update strategy is:
# $$
# \theta^1 = \theta^0 - \alpha \nabla L(\theta)
# $$
# where $\theta$ denotes a parameter of the model, e.g. $a$ or $b$.
#
# The meaning of this formula is: $L$ is a function of $\theta$; our current position is the point $\theta^0$, and we want to walk from this point to the minimum of $L$, i.e. the bottom of the mountain. First we determine the direction to walk, namely the reverse of the gradient; then we walk a step whose length is set by $\alpha$, and after that step we arrive at the point $\theta^1$!
#
# Some common questions about this formula:
#
# * **What does $\alpha$ mean?**
# In gradient descent, $\alpha$ is called the learning rate or step size. It lets us control the distance covered by each step, making sure we do not stride too far and miss the lowest point, while also not walking so slowly that the sun has set before we reach the bottom. The choice of $\alpha$ is therefore often very important in gradient descent: $\alpha$ must be neither too large nor too small. If it is too small, we may take forever to reach the lowest point; if it is too large, we will overshoot it! (See the sketch after this list.)
# ![gd_stepsize](images/gd_stepsize.png)
#
# * **Why is the gradient multiplied by a minus sign?**
# Putting a minus sign before the gradient means moving in the direction opposite to the gradient! As mentioned above, the gradient direction is actually the direction in which the function rises fastest at that point, while we need to walk in the direction of fastest descent, which is naturally the negative gradient direction, hence the minus sign.
#
#
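#
# The following toy example (an addition, not from the original text) makes the step-size trade-off concrete on the simple quadratic loss $L(\theta) = \theta^2$, whose gradient is $2\theta$; the specific values of $\alpha$ are illustrative assumptions:
# +
def gd_demo(alpha, theta=5.0, n_steps=10):
    # repeatedly step against the gradient of L(theta) = theta**2
    for _ in range(n_steps):
        theta = theta - alpha * 2 * theta   # dL/dtheta = 2*theta
    return theta

for alpha in [0.01, 0.1, 1.1]:
    print("alpha = %.2f -> theta after 10 steps = %g" % (alpha, gd_demo(alpha)))
# alpha = 0.01 creeps towards 0, alpha = 0.1 converges quickly,
# and alpha = 1.1 overshoots and diverges
# -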
# ### Program
# +
n_epoch = 3000     # epoch count
a, b = 1, 1        # initial parameters
epsilon = 0.001    # learning rate

for i in range(n_epoch):
    # update the parameters one sample at a time
    for j in range(N):
        a = a + epsilon*2*(Y[j] - a*X[j] - b)*X[j]
        b = b + epsilon*2*(Y[j] - a*X[j] - b)

    # compute the loss over all samples
    L = 0
    for j in range(N):
        L = L + (Y[j] - a*X[j] - b)**2
    print("epoch %4d: loss = %f, a = %f, b = %f" % (i, L, a, b))

x_min = np.min(X)
x_max = np.max(X)
y_min = a * x_min + b
y_max = a * x_max + b
plt.scatter(X, Y, label='original data')
plt.plot([x_min, x_max], [y_min, y_max], 'r', label='model')
plt.legend()
plt.show()
# -
# ## How to show the iterative process
# +
# %matplotlib nbagg
import matplotlib.pyplot as plt
import matplotlib.animation as animation

n_epoch = 3000     # epoch count
a, b = 1, 1        # initial parameters
epsilon = 0.001    # learning rate

fig = plt.figure()
imgs = []

for i in range(n_epoch):
    for j in range(N):
        a = a + epsilon*2*(Y[j] - a*X[j] - b)*X[j]
        b = b + epsilon*2*(Y[j] - a*X[j] - b)

    L = 0
    for j in range(N):
        L = L + (Y[j] - a*X[j] - b)**2
    #print("epoch %4d: loss = %f, a = %f, b = %f" % (i, L, a, b))

    # record a frame every 50 epochs; collect both artists so each
    # animation frame contains the scatter and the fitted line
    if i % 50 == 0:
        x_min = np.min(X)
        x_max = np.max(X)
        y_min = a * x_min + b
        y_max = a * x_max + b
        scat = plt.scatter(X, Y, label='original data')
        line, = plt.plot([x_min, x_max], [y_min, y_max], 'r', label='model')
        imgs.append([scat, line])

ani = animation.ArtistAnimation(fig, imgs)
plt.show()
# -
# ## How to use the batch update method?
#
# If some samples are outliers, updating the parameters from one sample at a time can make learning inaccurate and slow. A common remedy is to average the gradient over a batch of samples before each update, as sketched below.
#
#
# * [Several forms of gradient descent (in Chinese)](https://blog.csdn.net/u010402786/article/details/51188876)
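#
# A minimal mini-batch gradient descent sketch (an addition, not from the original text; the batch size, epoch count and learning rate are illustrative assumptions):
# +
batch_size = 32    # assumed batch size
n_epoch = 500      # assumed epoch count
a, b = 1, 1
epsilon = 0.01     # assumed learning rate

for i in range(n_epoch):
    # visit the samples in a fresh random order each epoch
    idx = np.random.permutation(N)
    for s in range(0, N, batch_size):
        batch = idx[s:s+batch_size]
        err = Y[batch] - a*X[batch] - b
        # average the gradient over the batch before updating
        a = a + epsilon*2*np.mean(err*X[batch])
        b = b + epsilon*2*np.mean(err)

print("a = %f, b = %f" % (a, b))
# -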
# ## How to fit a polynomial function?
#
# If we observe a missile at several moments in time, how can we estimate its trajectory? According to physics, the trajectory can be formulated as:
# $$
# y = at^2 + bt + c
# $$
# Then we need at least three data points to compute the parameters $a, b, c$.
#
#
# +
t = np.array([2, 4, 6, 8])
#t = np.linspace(0, 10)
pa = -20
pb = 90
pc = 800
y = pa*t**2 + pb*t + pc

plt.scatter(t, y)
plt.show()
# -
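#
# One way to recover the parameters from these samples (an addition, not part of the original text) is `np.polyfit`, which solves the corresponding least-squares problem for a degree-2 polynomial:
# +
coeffs = np.polyfit(t, y, 2)   # returns [a, b, c], highest degree first
print("a = %g, b = %g, c = %g" % tuple(coeffs))
# -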
# ## How to use sklearn to solve a linear problem?
#
#
# +
from sklearn import linear_model

# load data
d = datasets.load_diabetes()
X = d.data[:, np.newaxis, 2]
Y = d.target

# create and fit the regression model
regr = linear_model.LinearRegression()
regr.fit(X, Y)
a, b = regr.coef_[0], regr.intercept_
print("a = %f, b = %f" % (a, b))

x_min = np.min(X)
x_max = np.max(X)
y_min = a * x_min + b
y_max = a * x_max + b
plt.scatter(X, Y)
plt.plot([x_min, x_max], [y_min, y_max], 'r')
plt.show()
# -
# ## How to use sklearn to fit a polynomial function?
# +
# Fit a polynomial by expanding the input with PolynomialFeatures
# and then applying LinearRegression in a pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline

t = np.array([2, 4, 6, 8])
pa = -20
pb = 90
pc = 800
y = pa*t**2 + pb*t + pc

model = Pipeline([('poly', PolynomialFeatures(degree=2)),
                  ('linear', LinearRegression(fit_intercept=False))])
model = model.fit(t[:, np.newaxis], y)
print(model.named_steps['linear'].coef_)
# -
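#
# Note that with `fit_intercept=False` the recovered coefficients come back constant term first, i.e. $[c, b, a]$, matching the column order $[1, t, t^2]$ produced by `PolynomialFeatures`.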
# ## How to estimate missing values with the model?
#
# +
# load data and hold out the last 10% as a test set
d = datasets.load_diabetes()
N = d.target.shape[0]
N_train = int(N*0.9)
N_test = N - N_train

X = d.data[:N_train, np.newaxis, 2]
Y = d.target[:N_train]
X_test = d.data[N_train:, np.newaxis, 2]
Y_test = d.target[N_train:]

# create and fit the regression model
regr = linear_model.LinearRegression()
regr.fit(X, Y)

# predict the held-out values
Y_est = regr.predict(X_test)
print("Y_est  = ", Y_est)
print("Y_test = ", Y_test)

# compare a manually computed RMSE against sklearn's MSE
from sklearn import metrics
err = (Y_est - Y_test)**2
err2 = metrics.mean_squared_error(Y_test, Y_est)
score = regr.score(X_test, Y_test)
print("err = %f (%f), score = %f" % (np.sqrt(np.sum(err)/N_test), np.sqrt(err2), score))

# plot data
a, b = regr.coef_[0], regr.intercept_
print("a = %f, b = %f" % (a, b))
x_min = np.min(X)
x_max = np.max(X)
y_min = a * x_min + b
y_max = a * x_max + b
plt.scatter(X, Y, label='train data')
plt.scatter(X_test, Y_test, label='test data')
plt.plot([x_min, x_max], [y_min, y_max], 'r', label='model')
plt.legend()
plt.show()
# -

Machine learning is increasingly applied to aircraft, robotics and related fields, with the goal of using computers to achieve human-like intelligence and thereby make equipment intelligent and autonomous. This course guides students through the fundamentals, typical methods and techniques of machine learning, uses concrete application cases to spark interest in the discipline, and encourages students to analyze and solve the problems and challenges faced by aircraft and robots from the perspective of artificial intelligence. The main contents include Python programming fundamentals, machine learning models, the fundamentals and implementation of unsupervised learning, supervised learning and deep learning, and how to apply machine learning to practical problems, thereby comprehensively improving one's overall abilities.