# py-graph

A Python package for graph kernels.

## Requirements

* numpy - 1.13.3
* scipy - 1.0.0
* matplotlib - 2.1.0
* networkx - 2.0
* sklearn - 0.19.1
* tabulate - 0.8.2
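
To compare a local environment against the list above, a minimal sketch such as the following may help. It assumes Python 3.8+ (for `importlib.metadata`) and that `sklearn` is installed under the distribution name `scikit-learn`. The helper name `installed_versions` is hypothetical and not part of py-graph. The README does not say whether the listed versions are exact pins or minimums, so the sketch only reports what it finds.

```python
from importlib import metadata

# Versions listed in the Requirements section. The distribution names are an
# assumption: "sklearn" is published on PyPI as "scikit-learn".
REQUIRED = {
    "numpy": "1.13.3",
    "scipy": "1.0.0",
    "matplotlib": "2.1.0",
    "networkx": "2.0",
    "scikit-learn": "0.19.1",
    "tabulate": "0.8.2",
}


def installed_versions(required=REQUIRED):
    """Map each required distribution to its installed version, or None."""
    found = {}
    for name in required:
        try:
            found[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            found[name] = None  # distribution is not installed
    return found


if __name__ == "__main__":
    for name, version in installed_versions().items():
        print(f"{name}: listed {REQUIRED[name]}, installed {version or 'missing'}")
```
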

## Results with minimal test RMSE for each kernel on the Acyclic dataset

All kernels except the cyclic pattern kernel are tested on the Acyclic dataset, which consists of 185 molecules (graphs). (The cyclic pattern kernel is tested on the MAO and PAH datasets.)

The prediction methods are SVM for classification and kernel ridge regression for regression.

For prediction, we randomly divide the data into train and test subsets, with 90% of the dataset used for training and the rest for testing. We perform 30 such splits. For each split, we first train on the training data, then evaluate the performance on the test set. We choose the optimal parameters for the test set and report the corresponding performance. The final results are the average of the performances over the test sets.
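
The split-and-average procedure described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the package's own code: the function name `kernel_ridge_rmse` is hypothetical, the closed-form kernel ridge solve is written with plain NumPy, and the spread is reported as a standard deviation, which may differ from how the ± values in this README were obtained. Parameter selection is left out; `alpha` is passed in directly.

```python
import numpy as np


def kernel_ridge_rmse(gram, y, alpha, n_splits=30, test_size=0.1, seed=0):
    """Average train/test RMSE of kernel ridge regression over random splits.

    `gram` is a precomputed (n x n) kernel matrix and `y` the n targets.
    """
    rng = np.random.RandomState(seed)
    n = len(y)
    n_test = max(1, int(round(test_size * n)))
    train_rmse, test_rmse = [], []
    for _ in range(n_splits):
        perm = rng.permutation(n)
        test, train = perm[:n_test], perm[n_test:]
        # Closed-form kernel ridge fit: solve (K_train + alpha * I) c = y_train.
        k_train = gram[np.ix_(train, train)]
        coef = np.linalg.solve(k_train + alpha * np.eye(len(train)), y[train])
        # Predictions use kernel values between evaluated and training graphs.
        pred_train = k_train @ coef
        pred_test = gram[np.ix_(test, train)] @ coef
        train_rmse.append(np.sqrt(np.mean((pred_train - y[train]) ** 2)))
        test_rmse.append(np.sqrt(np.mean((pred_test - y[test]) ** 2)))
    return (
        (np.mean(train_rmse), np.std(train_rmse)),
        (np.mean(test_rmse), np.std(test_rmse)),
    )


if __name__ == "__main__":
    # Toy data standing in for a real gram matrix: a linear kernel on
    # random feature vectors, with targets that are linear in the features.
    rng = np.random.RandomState(1)
    features = rng.randn(40, 3)
    targets = features @ np.array([1.0, -2.0, 0.5])
    train, test = kernel_ridge_rmse(features @ features.T, targets, alpha=1e-3)
    print("train RMSE: %.4f±%.4f" % train)
    print("test RMSE:  %.4f±%.4f" % test)
```

The same fit can be done with scikit-learn's `KernelRidge(kernel="precomputed")`, which accepts the train-by-train gram matrix in `fit` and the test-by-train block in `predict`.
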

| Kernels | train_perf | valid_perf | test_perf | Parameters | gram_matrix_time |
|---------------------------|------------|------------|------------|--------------------------------------------------|---------------------------|
| Shortest path | 28.77±0.60 | 38.31±0.92 | 39.40±6.32 | 'alpha': '1.00' | 13.54" |
| Marginalized | 12.95±0.37 | 19.02±1.73 | 18.24±5.00 | 'p_quit': 0.2, 'alpha': '1.00e-04' | 437.04"/447.44"±5.32" |
| Extension of Marginalized | 20.65±0.44 | 26.06±1.83 | 26.84±4.81 | 'p_quit': 0.1, 'alpha': '5.62e-04' | 6388.50"/6266.67"±149.16" |
| Path | 8.71±0.63 | 19.28±1.75 | 17.42±6.57 | 'alpha': '2.82e-02' | 21.94" |
| WL subtree | 13.90±0.35 | 18.47±1.36 | 18.08±4.70 | 'height': 1.0, 'alpha': '1.50e-03' | 0.79"/1.32"±0.76" |
| WL shortest path | 28.74±0.60 | 38.20±0.62 | 39.02±6.09 | 'height': 10.0, 'alpha': '1.00' | 146.83"/80.63"±45.04" |
| WL edge | 30.21±0.64 | 36.53±1.02 | 38.42±6.42 | 'height': 5.0, 'alpha': '6.31e-01' | 5.24"/5.15"±2.83" |
| Treelet | 7.33±0.64 | 13.86±0.80 | 15.38±3.56 | 'alpha': '1.12e+01' | 0.48" |
| Path up to d | 5.76±0.27 | 9.89±0.87 | 10.21±4.16 | 'depth': 2.0, 'k_func': 'MinMax', 'alpha': '0.1' | 0.56"/1.16"±0.75" |
| Cyclic pattern | | | | | |
| Walk up to n | 20.88±0.74 | 23.34±1.11 | 24.46±6.57 | 'n': 2.0, 'alpha': '1.00e-03' | 0.56"/331.70"±753.44" |


In the table above, the last column is the time taken to compute the gram matrix, in seconds ("). For kernels whose gram matrix depends on hyper-parameters that must be tuned, the value after "/" is the average time and its confidence over the hyper-parameter grid, and the value before "/" is the time spent building the gram matrix that gives the best test performance.

* See detailed results in [results.md](pygraph/kernels/results.md).
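
As an illustration of how a `gram_matrix_time` cell of this form could be produced, here is a small sketch using only the standard library. Every name in it (`toy_gram_matrix`, `time_gram_matrices`, `format_time_cell`) is hypothetical, the "kernel" is a placeholder rather than one of the kernels above, and the spread is computed as a sample standard deviation, which is an assumption about what "confidence" means here.

```python
import statistics
import time


def toy_gram_matrix(graphs, height):
    """Placeholder for a kernel whose gram matrix depends on `height`.

    Not a real graph kernel: it only gives the timer something to measure.
    """
    sizes = [len(g) for g in graphs]
    return [[min(a, b) * height for b in sizes] for a in sizes]


def time_gram_matrices(graphs, grid, compute=toy_gram_matrix):
    """Return {parameter value: seconds spent building its gram matrix}."""
    timings = {}
    for value in grid:
        start = time.perf_counter()
        compute(graphs, value)
        timings[value] = time.perf_counter() - start
    return timings


def format_time_cell(timings, best_value):
    """Format as: time for the best value / mean ± spread over the grid."""
    times = list(timings.values())
    mean = statistics.mean(times)
    spread = statistics.stdev(times) if len(times) > 1 else 0.0
    return f'{timings[best_value]:.2f}"/{mean:.2f}"±{spread:.2f}"'


if __name__ == "__main__":
    graphs = [[0, 1], [0, 1, 2], [0]]  # toy graphs given as node lists
    timings = time_gram_matrices(graphs, grid=[1, 2, 3])
    print(format_time_cell(timings, best_value=2))
```

Here `best_value` stands for the hyper-parameter value that gave the best test performance, so the first number in the cell is the time for that single gram matrix.
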

## References

[1] K. M. Borgwardt and H.-P. Kriegel. Shortest-path kernels on graphs. In Proceedings of the International Conference on Data Mining, pages 74-81, 2005.

[2] H. Kashima, K. Tsuda, and A. Inokuchi. Marginalized kernels between labeled graphs. In Proceedings of the 20th International Conference on Machine Learning, Washington, DC, United States, 2003.

[3] F. Suard, A. Rakotomamonjy, and A. Bensrhair. Kernel on bag of paths for measuring similarity of shapes. In ESANN, pages 355-360, 2007.

[4] N. Shervashidze, P. Schweitzer, E. J. van Leeuwen, K. Mehlhorn, and K. M. Borgwardt. Weisfeiler-Lehman graph kernels. Journal of Machine Learning Research, 12:2539-2561, 2011.

[5] B. Gaüzère, L. Brun, and D. Villemin. Two new graphs kernels in chemoinformatics. Pattern Recognition Letters, 33(15):2038-2047, 2012.

[6] L. Ralaivola, S. J. Swamidass, H. Saigo, and P. Baldi. Graph kernels for chemical informatics. Neural Networks, 18(8):1093-1110, 2005.

[7] P. Mahé and J.-P. Vert. Graph kernels based on tree patterns for molecules. Machine Learning, 75(1):3-35, 2009.

[8] T. Horváth, T. Gärtner, and S. Wrobel. Cyclic pattern kernels for predictive graph mining. In Proceedings of the Tenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 158-167. ACM, 2004.

[9] T. Gärtner, P. Flach, and S. Wrobel. On graph kernels: Hardness results and efficient alternatives. Learning Theory and Kernel Machines, pages 129-143, 2003.

## Updates

### 2018.02.28
* ADD *walk kernel up to n* and its result on the Acyclic dataset.
* MOD training process: use nested cross-validation for model selection. Recalculate the performance of all kernels.

### 2018.02.08
* ADD *tree pattern kernel* and its result on the Acyclic dataset.
* ADD *cyclic pattern kernel* and its result on classification datasets.

### 2018.01.24
* ADD *path kernel up to depth d* and its result on the Acyclic dataset.
* MOD treelet kernel: retrieve canonkeys of all graphs before calculating kernels, which speeds it up considerably.

### 2018.01.17
* ADD comments to the code of the treelet kernel.

### 2018.01.16
* ADD *treelet kernel* and its result on the Acyclic dataset.
* MOD the calculation of the WL subtree kernel and correct its results.
* ADD *kernel_train_test* and *split_train_test* to wrap the training and testing process.
* MOD readme.md file, add detailed results of each kernel. - linlin

### 2017.12.22
* ADD calculation of the time spent computing kernel matrices for each kernel.
* MOD floydTransformation function: calculate shortest paths taking user-defined edge weights into account.
* MOD implementation of generic node and edge attributes for all kernels.
* ADD detailed results file results.md.

### 2017.12.21
* MOD Weisfeiler-Lehman subtree kernel and the test code.

### 2017.12.20
* ADD *Weisfeiler-Lehman subtree kernel* and its result on the Acyclic dataset.

### 2017.12.07
* ADD *mean average path kernel* and its result on the Acyclic dataset.
* ADD delta kernel. - linlin
* MOD restructure the code of the marginalized kernel.

### 2017.12.05
* ADD *marginalized kernel* and its result. - linlin
* ADD list of required Python packages to README.md.

### 2017.11.24
* ADD *shortest path kernel* and its result.

A Python package for graph kernels, graph edit distances and graph pre-image problem.