You can not select more than 25 topics Topics must start with a chinese character,a letter or number, can include dashes ('-') and can be up to 35 characters long.

results.md 13 kB

123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354555657585960616263646566676869707172737475767778798081828384858687888990919293949596979899100101102103104105106107108109110111112113114115116117118119120121122123124125126127128129130131132133134135136137138139140141142143144145146147148149150151152153154155156157158159160161162163164165166167168169170171172173174175176177178179180181182183184185186187188189190191192193194195196197198199200201202203204205206207208209210211212213214215216217218219220221222223
  1. # Results with minimal test RMSE for each kernel on dataset Asyclic
  2. All kernels expect for Cyclic pattern kernel are tested on dataset Asyclic, which consists of 185 molecules (graphs). (Cyclic pattern kernel is tested on dataset MAO and PAH.)
  3. The criteria used for prediction are SVM for classification and kernel Ridge regression for regression.
  4. For predition we randomly divide the data in train and test subset, where 90% of entire dataset is for training and rest for testing. 10 splits are performed. For each split, we first train on the train data, then evaluate the performance on the test set. We choose the optimal parameters for the test set and finally provide the corresponding performance. The final results correspond to the average of the performances on the test sets.
  5. All the results were run under Python 3.5.2, in a machine of 64 bit with one Intel(R) Core(TM) i7-7920HQ CPU @ 3.10GHz, Memory of 32GB, and Ubuntu 16.04.3 LTS OS.
  6. ## Summary
  7. | Kernels | RMSE(℃) | STD(℃) | Parameter | k_time |
  8. |------------------|:-------:|:------:|------------------:|-------:|
  9. | Shortest path | 35.19 | 4.50 | - | 14.58" |
  10. | Marginalized | 18.02 | 6.29 | p_quit = 0.1 | 4'19" |
  11. | Path | 14.00 | 6.94 | - | 37.58" |
  12. | WL subtree | 7.55 | 2.33 | height = 1 | 0.84" |
  13. | WL shortest path | 35.16 | 4.50 | height = 2 | 40.24" |
  14. | WL edge | 33.41 | 4.73 | height = 5 | 5.66" |
  15. | Treelet | 8.31 | 3.38 | - | 0.50" |
  16. | Path up to d | 7.43 | 2.69 | depth = 2 | 0.52" |
  17. | Tree pattern | 7.27 | 2.21 | lamda = 1, h = 2 | 37.24" |
  18. | Cyclic pattern | 0.9 | 0.11 | cycle bound = 100 | 0.31" |
  19. * RMSE stands for arithmetic mean of the root mean squared errors on all splits.
  20. * STD stands for standard deviation of the root mean squared errors on all splits.
  21. * Paremeter is the one with which the kenrel achieves the best results.
  22. * k_time is the time spent on building the kernel matrix.
  23. * The targets of training data are normalized before calculating *path kernel* and *treelet kernel*.
  24. ## Detailed results of each kernel
  25. In each table below:
  26. * The unit of the *RMSEs* and *stds* is *℃*, The unit of the *k_time* is *s*.
  27. * k_time is the time spent on building the kernel matrix.
  28. ### shortest path kernel
  29. ```
  30. RMSE_test std_test RMSE_train std_train k_time
  31. ----------- ---------- ------------ ----------- --------
  32. 35.192 4.49577 28.3604 1.35718 14.5768
  33. ```
  34. ### Marginalized kernel
  35. The table below shows the results of the marginalized under different termimation probability.
  36. ```
  37. p_quit RMSE_test std_test RMSE_train std_train k_time
  38. -------- ----------- ---------- ------------ ----------- --------
  39. 0.1 18.0243 6.29247 12.1863 7.03899 258.77
  40. 0.2 18.3376 5.85454 13.9554 7.54407 256.327
  41. 0.3 18.496 5.73492 13.9391 7.95812 255.614
  42. 0.4 19.4491 5.3713 16.2593 6.69358 254.897
  43. 0.5 19.7857 5.55054 17.0181 6.84437 256.757
  44. 0.6 20.1922 5.59122 17.6618 6.56718 256.557
  45. 0.7 21.6614 6.02685 20.5882 5.74601 254.953
  46. 0.8 22.996 6.08347 23.5943 3.80637 252.804
  47. 0.9 24.4241 4.95119 25.8082 3.31207 256.738
  48. ```
  49. ### Path kernel
  50. **The targets of training data are normalized before calculating the kernel.**
  51. ```
  52. RMSE_test std_test RMSE_train std_train k_time
  53. ----------- ---------- ------------ ----------- --------
  54. 14.0015 6.93602 3.76191 0.702594 37.5759
  55. ```
  56. ### Weisfeiler-Lehman subtree kernel
  57. The table below shows the results of the WL subtree under different subtree heights.
  58. ```
  59. height RMSE_test std_test RMSE_train std_train k_time
  60. -------- ----------- ---------- ------------ ----------- --------
  61. 0 15.6859 4.1392 17.6816 0.713183 0.360443
  62. 1 7.55046 2.33179 6.27001 0.654734 0.837389
  63. 2 9.72847 2.05767 4.45068 0.882129 1.25317
  64. 3 11.2961 2.79994 2.27059 0.481516 1.79971
  65. 4 12.8083 3.44694 1.07403 0.637823 2.35346
  66. 5 14.0179 3.67504 0.700602 0.57264 2.78285
  67. 6 14.9184 3.80535 0.691515 0.56462 3.20764
  68. 7 15.6295 3.86539 0.691516 0.56462 3.71648
  69. 8 16.2144 3.92876 0.691515 0.56462 3.99213
  70. 9 16.7257 3.9931 0.691515 0.56462 4.26315
  71. 10 17.1864 4.05672 0.691516 0.564621 5.00918
  72. ```
  73. ### Weisfeiler-Lehman shortest path kernel
  74. The table below shows the results of the WL subtree under different subtree heights.
  75. ```
  76. height rmse_test std_test rmse_train std_train k_time
  77. -------- ----------- ---------- ------------ ----------- --------
  78. 0 35.192 4.49577 28.3604 1.35718 13.5041
  79. 1 35.1808 4.50045 27.9335 1.44836 26.8292
  80. 2 35.1632 4.50205 28.1113 1.50891 40.2356
  81. 3 35.1946 4.49801 28.3903 1.36571 54.6704
  82. 4 35.1753 4.50111 27.9746 1.46222 67.1522
  83. 5 35.1997 4.5071 28.0184 1.45564 80.0881
  84. 6 35.1645 4.49849 28.3731 1.60057 92.1925
  85. 7 35.1771 4.5009 27.9604 1.45742 105.812
  86. 8 35.1968 4.50526 28.1991 1.5149 119.022
  87. 9 35.1956 4.50197 28.2665 1.30769 131.228
  88. 10 35.1676 4.49723 28.4163 1.61596 144.964
  89. ```
  90. ### Weisfeiler-Lehman edge kernel
  91. The table below shows the results of the WL subtree under different subtree heights.
  92. ```
  93. height rmse_test std_test rmse_train std_train k_time
  94. -------- ----------- ---------- ------------ ----------- ---------
  95. 0 33.4077 4.73272 29.9975 0.90234 0.853002
  96. 1 33.4235 4.72131 30.1603 1.09423 1.71751
  97. 2 33.433 4.72441 29.9286 0.787941 2.66032
  98. 3 33.4073 4.73243 30.0114 0.909674 3.47763
  99. 4 33.4256 4.72166 30.1842 1.1089 4.54367
  100. 5 33.4067 4.72641 30.0411 1.01845 5.66178
  101. 6 33.419 4.73075 29.9056 0.782179 6.14803
  102. 7 33.4248 4.72155 30.1759 1.10382 7.60354
  103. 8 33.4122 4.71554 30.1365 1.07485 7.97222
  104. 9 33.4071 4.73193 30.0329 0.921065 9.07084
  105. 10 33.4165 4.73169 29.9242 0.790843 10.0254
  106. ```
  107. ### Treelet kernel
  108. **The targets of training data are normalized before calculating the kernel.**
  109. ```
  110. RMSE_test std_test RMSE_train std_train k_time
  111. ----------- ---------- ------------ ----------- --------
  112. 8.3079 3.37838 2.90887 1.2679 0.500302
  113. ```
  114. ### Path kernel up to depth *d*
  115. The table below shows the results of the path kernel up to different depth *d*.
  116. The first table is the results using *Tanimoto kernel*, where **The targets of training data are normalized before calculating the kernel.**.
  117. ```
  118. depth rmse_test std_test rmse_train std_train k_time
  119. ------- ----------- ---------- ------------ ----------- ---------
  120. 0 41.6202 6.453 43.6169 2.13212 0.0904737
  121. 1 38.8446 6.44648 40.8329 3.44147 0.175414
  122. 2 35.2915 4.7813 35.7461 1.61134 0.344896
  123. 3 29.4845 3.90351 28.4646 3.00137 0.553939
  124. 4 22.6693 6.28053 19.2517 3.42893 0.770649
  125. 5 21.7956 5.5225 16.886 2.60519 1.01558
  126. 6 20.6049 5.49983 13.1097 2.58431 1.33302
  127. 7 20.3479 5.17631 12.0152 2.5928 1.60266
  128. 8 19.8228 5.13769 10.7981 2.13082 1.81218
  129. 9 19.8734 5.10369 10.7997 2.09549 2.21726
  130. 10 19.8708 5.09217 10.7787 2.10002 2.41006
  131. ```
  132. The second table is the results using *MinMax kernel*.
  133. ```
  134. depth rmse_test std_test rmse_train std_train k_time
  135. ------- ----------- ---------- ------------ ----------- --------
  136. 0 12.58 2.73235 12.1209 0.500467 0.377576
  137. 1 12.6215 2.18866 10.2243 0.734261 0.456332
  138. 2 7.42903 2.69395 2.71885 0.732922 0.585278
  139. 3 9.02468 2.50808 1.54 1.13813 0.706556
  140. 4 10.0811 3.6477 1.36029 1.42399 0.847957
  141. 5 11.3005 4.44163 1.08518 1.06206 1.00086
  142. 6 12.186 4.88816 1.06443 1.00191 1.19792
  143. 7 12.7534 5.14529 1.19912 1.34031 1.4372
  144. 8 13.0471 5.27184 1.35822 1.84315 1.68449
  145. 9 13.1789 5.27707 1.36002 1.84834 1.96545
  146. 10 13.2538 5.26425 1.36208 1.85426 2.24943
  147. ```
  148. ### Tree pattern kernel
  149. Until N kernel when h = 2:
  150. ```
  151. lmda rmse_test std_test rmse_train std_train k_time
  152. ----------- ----------- ---------- ------------ ----------- --------
  153. 1e-10 7.46524 1.71862 5.99486 0.356634 38.1447
  154. 1e-09 7.37326 1.77195 5.96155 0.374395 37.4921
  155. 1e-08 7.35105 1.78349 5.96481 0.378047 37.9971
  156. 1e-07 7.35213 1.77903 5.96728 0.382251 38.3182
  157. 1e-06 7.3524 1.77992 5.9696 0.3863 39.6428
  158. 1e-05 7.34958 1.78141 5.97114 0.39017 37.3711
  159. 0.0001 7.3513 1.78136 5.94251 0.331843 37.3967
  160. 0.001 7.35822 1.78119 5.9326 0.32534 36.7357
  161. 0.01 7.37552 1.79037 5.94089 0.34763 36.8864
  162. 0.1 7.32951 1.91346 6.42634 1.29405 36.8382
  163. 1 7.27134 2.20774 6.62425 1.2242 37.2425
  164. 10 7.49787 2.36815 6.81697 1.50182 37.8286
  165. 100 7.42887 2.64789 6.68766 1.34809 36.3701
  166. 1000 7.24914 2.65554 6.81906 1.41008 36.1695
  167. 10000 7.08183 2.6248 6.93431 1.38441 37.5723
  168. 100000 8.021 3.43694 8.69813 0.909839 37.8158
  169. 1e+06 8.49625 3.6332 9.59333 0.96626 38.4688
  170. 1e+07 10.9067 3.17593 11.5642 2.07792 36.9926
  171. 1e+08 61.1524 10.4355 65.3527 13.9538 37.1321
  172. 1e+09 99.943 13.6994 98.8848 5.27014 36.7443
  173. 1e+10 100.083 13.8503 97.9168 3.22768 37.096
  174. ```
  175. ### Cyclic pattern kernel
  176. **This kernel is not tested on dataset Acyclic**
  177. Results on dataset MAO:
  178. ```
  179. cycle_bound accur_test std_test accur_train std_train k_time
  180. ------------- ------------ ---------- ------------- ----------- --------
  181. 0 0.642857 0.146385 0.54918 0.0167983 0.187052
  182. 50 0.871429 0.1 0.698361 0.116889 0.300629
  183. 100 0.9 0.111575 0.732787 0.0826366 0.309837
  184. 150 0.9 0.111575 0.732787 0.0826366 0.31808
  185. 200 0.9 0.111575 0.732787 0.0826366 0.317575
  186. ```
  187. Results on dataset PAH:
  188. ```
  189. cycle_bound accur_test std_test accur_train std_train k_time
  190. ------------- ------------ ---------- ------------- ----------- --------
  191. 0 0.61 0.113578 0.629762 0.0135212 0.521801
  192. 10 0.61 0.113578 0.629762 0.0135212 0.52589
  193. 20 0.61 0.113578 0.629762 0.0135212 0.548528
  194. 30 0.64 0.111355 0.633333 0.0157935 0.535311
  195. 40 0.64 0.111355 0.633333 0.0157935 0.61764
  196. 50 0.67 0.09 0.658333 0.0345238 0.733868
  197. 60 0.68 0.107703 0.671429 0.0365769 0.871147
  198. 70 0.67 0.100499 0.666667 0.0380208 1.12625
  199. 80 0.78 0.107703 0.709524 0.0588534 1.19828
  200. 90 0.78 0.107703 0.709524 0.0588534 1.21182
  201. ```

A Python package for graph kernels, graph edit distances and graph pre-image problem.