

----------------------------------
--- Python interface of LIBSVM ---
----------------------------------

Table of Contents
=================

- Introduction
- Installation
- Quick Start
- Design Description
- Data Structures
- Utility Functions
- Additional Information

Introduction
============

Python (http://www.python.org/) is a programming language suitable for rapid
development. This tool provides a simple Python interface to LIBSVM, a library
for support vector machines (http://www.csie.ntu.edu.tw/~cjlin/libsvm). The
interface is very easy to use as the usage is the same as that of LIBSVM. The
interface is developed with the built-in Python library "ctypes."
Installation
============

On Unix systems, type

> make

The interface needs only the LIBSVM shared library, which is generated by
the above command. We assume that the shared library is in the LIBSVM
main directory or in the system path.

For Windows, the shared library libsvm.dll for 32-bit Python is ready
in the directory `..\windows'. You can also copy it to the system
directory (e.g., `C:\WINDOWS\system32\' for Windows XP). To regenerate
the shared library, please follow the instructions for building Windows
binaries in the LIBSVM README.
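
If the shared library sits in a non-standard location, it can also be loaded
explicitly through ctypes. Below is a minimal sketch; the file names and
relative paths are illustrative only and may differ from how svm.py actually
locates the library on your platform or LIBSVM version.

>>> from ctypes import CDLL
>>> libsvm = CDLL('../libsvm.so.2')            # Unix: shared library built by `make` (name may differ)
>>> # libsvm = CDLL(r'..\windows\libsvm.dll')  # Windows: prebuilt 32-bit DLL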
Quick Start
===========

There are two levels of usage. The high-level one uses utility functions
in svmutil.py and the usage is the same as the LIBSVM MATLAB interface.

>>> from svmutil import *
# Read data in LIBSVM format
>>> y, x = svm_read_problem('../heart_scale')
>>> m = svm_train(y[:200], x[:200], '-c 4')
>>> p_label, p_acc, p_val = svm_predict(y[200:], x[200:], m)

# Construct problem in python format
# Dense data
>>> y, x = [1,-1], [[1,0,1], [-1,0,-1]]
# Sparse data
>>> y, x = [1,-1], [{1:1, 3:1}, {1:-1,3:-1}]
>>> prob = svm_problem(y, x)
>>> param = svm_parameter('-t 0 -c 4 -b 1')
>>> m = svm_train(prob, param)

# Precomputed kernel data (-t 4)
# Dense data
>>> y, x = [1,-1], [[1, 2, -2], [2, -2, 2]]
# Sparse data
>>> y, x = [1,-1], [{0:1, 1:2, 2:-2}, {0:2, 1:-2, 2:2}]
# isKernel=True must be set for precomputed kernel
>>> prob = svm_problem(y, x, isKernel=True)
>>> param = svm_parameter('-t 4 -c 4 -b 1')
>>> m = svm_train(prob, param)
# For the format of precomputed kernel, please read LIBSVM README.

# Other utility functions
>>> svm_save_model('heart_scale.model', m)
>>> m = svm_load_model('heart_scale.model')
>>> p_label, p_acc, p_val = svm_predict(y, x, m, '-b 1')
>>> ACC, MSE, SCC = evaluations(y, p_label)

# Getting online help
>>> help(svm_train)
The low-level usage directly calls C interfaces imported by svm.py. Note that
all arguments and return values are in ctypes format. You need to handle them
carefully.

>>> from svm import *
>>> prob = svm_problem([1,-1], [{1:1, 3:1}, {1:-1,3:-1}])
>>> param = svm_parameter('-c 4')
>>> m = libsvm.svm_train(prob, param) # m is a ctype pointer to an svm_model
# Convert a Python-format instance to svm_nodearray, a ctypes structure
>>> x0, max_idx = gen_svm_nodearray({1:1, 3:1})
>>> label = libsvm.svm_predict(m, x0)
Design Description
==================

There are two files, svm.py and svmutil.py, which respectively correspond to
low-level and high-level usage of the interface.

In svm.py, we adopt the Python built-in library "ctypes," so that
Python can directly access C structures and interface functions defined
in svm.h.

While advanced users can use the structures/functions in svm.py directly,
svmutil.py provides some easy-to-use functions so that ctypes structures
need not be handled by hand. The usage is similar to that of the LIBSVM
MATLAB interface.
Data Structures
===============

Four data structures derived from svm.h are svm_node, svm_problem, svm_parameter,
and svm_model. They all contain fields with the same names as in svm.h. Access
these fields carefully because you are directly using a C structure instead of a
Python object. For svm_model, accessing the fields directly is not recommended.
Programmers should use the interface functions or the methods of the svm_model
class in Python to get the values. The following description introduces
additional fields and methods.

Before using the data structures, execute the following command to load the
LIBSVM shared library:

>>> from svm import *
- class svm_node:

    Construct an svm_node.

    >>> node = svm_node(idx, val)

    idx: an integer indicating the feature index.
    val: a float indicating the feature value.

    Show the index and the value of a node.

    >>> print(node)
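
    For example (the values are illustrative only):

    >>> node = svm_node(1, 0.5)
    >>> print(node)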
- Function: gen_svm_nodearray(xi [,feature_max=None [,isKernel=False]])

    Generate a feature vector from a Python list/tuple or a dictionary:

    >>> xi, max_idx = gen_svm_nodearray({1:1, 3:1, 5:-2})

    xi: the returned svm_nodearray (a ctypes structure)

    max_idx: the maximal feature index of xi

    feature_max: if feature_max is assigned, features with indices larger than
                 feature_max are removed.

    isKernel: if isKernel == True, the list index starts from 0 for precomputed
              kernel. Otherwise, the list index starts from 1. The default
              value is False.
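
    For example, based on the options above (a brief sketch; the values are
    illustrative only):

    >>> xi, max_idx = gen_svm_nodearray({1:1, 3:1, 5:-2}, feature_max=3)  # feature 5 is removed
    >>> xi, max_idx = gen_svm_nodearray([1, 2, -2], isKernel=True)        # list indices start from 0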
- class svm_problem:

    Construct an svm_problem instance.

    >>> prob = svm_problem(y, x)

    y: a Python list/tuple of l labels (type must be int/double).

    x: a Python list/tuple of l data instances. Each element of x must be
       an instance of list/tuple/dictionary type.

    Note that if your x contains sparse data (i.e., dictionary), the internal
    ctypes data format is still sparse.

    For a precomputed kernel, the isKernel flag should be set to True:

    >>> prob = svm_problem(y, x, isKernel=True)

    Please read the LIBSVM README for more details of precomputed kernels.
- class svm_parameter:

    Construct an svm_parameter instance.

    >>> param = svm_parameter('training_options')

    If 'training_options' is empty, LIBSVM default values are applied.

    Set param to LIBSVM default values.

    >>> param.set_to_default_values()

    Parse a string of options.

    >>> param.parse_options('training_options')

    Show values of parameters.

    >>> print(param)
- class svm_model:

    There are two ways to obtain an instance of svm_model:

    >>> model = svm_train(y, x)
    >>> model = svm_load_model('model_file_name')

    Note that the structures returned by the interface functions
    libsvm.svm_train and libsvm.svm_load_model are ctypes pointers to
    svm_model, which are different from the svm_model objects returned
    by svm_train and svm_load_model in svmutil.py. We provide a
    function toPyModel for the conversion:

    >>> model_ptr = libsvm.svm_train(prob, param)
    >>> model = toPyModel(model_ptr)

    If you obtain a model in a way other than the above approaches,
    handle it carefully to avoid memory leaks or segmentation faults.
    Some interface functions to access LIBSVM models are wrapped as
    members of the class svm_model:

    >>> svm_type = model.get_svm_type()
    >>> nr_class = model.get_nr_class()
    >>> svr_probability = model.get_svr_probability()
    >>> class_labels = model.get_labels()
    >>> sv_indices = model.get_sv_indices()
    >>> nr_sv = model.get_nr_sv()
    >>> is_prob_model = model.is_probability_model()
    >>> support_vector_coefficients = model.get_sv_coef()
    >>> support_vectors = model.get_SV()
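
    For example, with a model trained through svm_train in svmutil.py (a small
    sketch; the toy data and the commented values are illustrative only):

    >>> m = svm_train([1,-1], [{1:1, 3:1}, {1:-1, 3:-1}], '-c 4')
    >>> nr_class = m.get_nr_class()   # 2 for this toy problem
    >>> labels = m.get_labels()       # e.g., [1, -1]
    >>> nr_sv = m.get_nr_sv()         # total number of support vectors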
Utility Functions
=================

To use utility functions, type

>>> from svmutil import *

The above command loads
    svm_train()        : train an SVM model
    svm_predict()      : predict testing data
    svm_read_problem() : read the data from a LIBSVM-format file
    svm_load_model()   : load a LIBSVM model
    svm_save_model()   : save model to a file
    evaluations()      : evaluate prediction results
- Function: svm_train

    There are three ways to call svm_train():

    >>> model = svm_train(y, x [, 'training_options'])
    >>> model = svm_train(prob [, 'training_options'])
    >>> model = svm_train(prob, param)

    y: a list/tuple of l training labels (type must be int/double).

    x: a list/tuple of l training instances. The feature vector of
       each training instance is an instance of list/tuple or dictionary.

    training_options: a string in the same form as that for LIBSVM command
                      line mode.

    prob: an svm_problem instance generated by calling
          svm_problem(y, x).
          For a precomputed kernel, you should use
          svm_problem(y, x, isKernel=True).

    param: an svm_parameter instance generated by calling
           svm_parameter('training_options')

    model: the returned svm_model instance. See svm.h for details of this
           structure. If '-v' is specified, cross validation is
           conducted and the returned model is just a scalar: cross-validation
           accuracy for classification and mean-squared error for regression.

    To train the same data many times with different parameters,
    the second and the third ways should be faster.

    Examples:

    >>> y, x = svm_read_problem('../heart_scale')
    >>> prob = svm_problem(y, x)
    >>> param = svm_parameter('-s 3 -c 5 -h 0')
    >>> m = svm_train(y, x, '-c 5')
    >>> m = svm_train(prob, '-t 2 -c 5')
    >>> m = svm_train(prob, param)
    >>> CV_ACC = svm_train(y, x, '-v 3')
- Function: svm_predict

    To predict testing data with a model, use

    >>> p_labs, p_acc, p_vals = svm_predict(y, x, model [,'predicting_options'])

    y: a list/tuple of l true labels (type must be int/double). It is used
       for calculating the accuracy. Use [0]*len(x) if true labels are
       unavailable.

    x: a list/tuple of l predicting instances. The feature vector of
       each predicting instance is an instance of list/tuple or dictionary.

    predicting_options: a string of predicting options in the same format as
                        that of LIBSVM.

    model: an svm_model instance.

    p_labs: a list of predicted labels.

    p_acc: a tuple including accuracy (for classification), mean
           squared error, and squared correlation coefficient (for
           regression).

    p_vals: a list of decision values or probability estimates (if '-b 1'
            is specified). If k is the number of classes in the training data,
            then for decision values, each element includes the results of
            predicting k(k-1)/2 binary-class SVMs. For classification, k = 1
            is a special case: decision value [+1] is returned for each
            testing instance, instead of an empty list.
            For probabilities, each element contains k values indicating
            the probability that the testing instance is in each class.
            Note that the order of classes is the same as the 'model.label'
            field in the model structure.
    Example:

    >>> m = svm_train(y, x, '-c 5')
    >>> p_labels, p_acc, p_vals = svm_predict(y, x, m)
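
    If the true labels are unavailable, a placeholder list can be passed as
    described above; the reported accuracy is then meaningless and only the
    predicted labels and values are of interest (a brief sketch):

    >>> p_labels, p_acc, p_vals = svm_predict([0]*len(x), x, m)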
- Functions: svm_read_problem/svm_load_model/svm_save_model

    See the usage by examples:

    >>> y, x = svm_read_problem('data.txt')
    >>> m = svm_load_model('model_file')
    >>> svm_save_model('model_file', m)
- Function: evaluations

    Calculate some evaluations using the true values (ty) and the predicted
    values (pv):

    >>> (ACC, MSE, SCC) = evaluations(ty, pv)

    ty: a list of true values.
    pv: a list of predicted values.

    ACC: accuracy.
    MSE: mean squared error.
    SCC: squared correlation coefficient.
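
    For example (illustrative values only):

    >>> ty = [1, -1, 1, -1]
    >>> pv = [1, 1, 1, -1]
    >>> ACC, MSE, SCC = evaluations(ty, pv)   # 3 of the 4 predictions are correct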
Additional Information
======================

This interface was written by Hsiang-Fu Yu from the Department of Computer
Science, National Taiwan University. If you find this tool useful, please
cite LIBSVM as follows:

Chih-Chung Chang and Chih-Jen Lin, LIBSVM : a library for support
vector machines. ACM Transactions on Intelligent Systems and
Technology, 2:27:1--27:27, 2011. Software available at
http://www.csie.ntu.edu.tw/~cjlin/libsvm

For any questions, please contact Chih-Jen Lin <cjlin@csie.ntu.edu.tw>,
or check the FAQ page:

http://www.csie.ntu.edu.tw/~cjlin/libsvm/faq.html
