|
- ----------------------------------
- --- Python interface of LIBSVM ---
- ----------------------------------
-
- Table of Contents
- =================
-
- - Introduction
- - Installation
- - Quick Start
- - Design Description
- - Data Structures
- - Utility Functions
- - Additional Information
-
- Introduction
- ============
-
- Python (http://www.python.org/) is a programming language suitable for rapid
- development. This tool provides a simple Python interface to LIBSVM, a library
- for support vector machines (http://www.csie.ntu.edu.tw/~cjlin/libsvm). The
- interface is easy to use because its usage is the same as that of LIBSVM. It
- is built on the Python built-in library "ctypes."
-
- Installation
- ============
-
- On Unix systems, type
-
- > make
-
- The interface needs only the LIBSVM shared library, which is generated by
- the above command. We assume that the shared library is in the LIBSVM
- main directory or in the system path.
-
- On Windows, the shared library libsvm.dll for 32-bit Python is provided
- in the directory `..\windows'. You can also copy it to the system
- directory (e.g., `C:\WINDOWS\system32\' for Windows XP). To regenerate
- the shared library, please follow the instructions for building Windows
- binaries in the LIBSVM README.
-
- Quick Start
- ===========
-
- There are two levels of usage. The high-level one uses utility functions
- in svmutil.py; its usage is the same as that of the LIBSVM MATLAB interface.
-
- >>> from svmutil import *
- # Read data in LIBSVM format
- >>> y, x = svm_read_problem('../heart_scale')
- >>> m = svm_train(y[:200], x[:200], '-c 4')
- >>> p_label, p_acc, p_val = svm_predict(y[200:], x[200:], m)
-
- # Construct problem in python format
- # Dense data
- >>> y, x = [1,-1], [[1,0,1], [-1,0,-1]]
- # Sparse data
- >>> y, x = [1,-1], [{1:1, 3:1}, {1:-1,3:-1}]
- >>> prob = svm_problem(y, x)
- >>> param = svm_parameter('-t 0 -c 4 -b 1')
- >>> m = svm_train(prob, param)
-
- # Precomputed kernel data (-t 4)
- # Dense data
- >>> y, x = [1,-1], [[1, 2, -2], [2, -2, 2]]
- # Sparse data
- >>> y, x = [1,-1], [{0:1, 1:2, 2:-2}, {0:2, 1:-2, 2:2}]
- # isKernel=True must be set for precomputed kernel
- >>> prob = svm_problem(y, x, isKernel=True)
- >>> param = svm_parameter('-t 4 -c 4 -b 1')
- >>> m = svm_train(prob, param)
- # For the format of precomputed kernel, please read LIBSVM README.
-
-
- # Other utility functions
- >>> svm_save_model('heart_scale.model', m)
- >>> m = svm_load_model('heart_scale.model')
- >>> p_label, p_acc, p_val = svm_predict(y, x, m, '-b 1')
- >>> ACC, MSE, SCC = evaluations(y, p_label)
-
- # Getting online help
- >>> help(svm_train)
-
- The low-level use directly calls C interfaces imported by svm.py. Note that
- all arguments and return values are in ctypes format. You need to handle them
- carefully.
-
- >>> from svm import *
- >>> prob = svm_problem([1,-1], [{1:1, 3:1}, {1:-1,3:-1}])
- >>> param = svm_parameter('-c 4')
- >>> m = libsvm.svm_train(prob, param) # m is a ctypes pointer to an svm_model
- # Convert a Python-format instance to svm_nodearray, a ctypes structure
- >>> x0, max_idx = gen_svm_nodearray({1:1, 3:1})
- >>> label = libsvm.svm_predict(m, x0)
-
- Design Description
- ==================
-
- There are two files svm.py and svmutil.py, which respectively correspond to
- low-level and high-level use of the interface.
-
- In svm.py, we adopt the Python built-in library "ctypes," so that
- Python can directly access C structures and interface functions defined
- in svm.h.
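As a rough illustration of how ctypes mirrors a C structure, the sketch below
defines a stand-in for svm_node and packs a sparse instance into a C array.
The names SketchNode and to_node_array are hypothetical and are not part of
svm.py; the field layout (int index, double value) and the terminating
index of -1 are assumed to follow svm.h.

```python
import ctypes


class SketchNode(ctypes.Structure):
    # Assumed to mirror `struct svm_node { int index; double value; }` in svm.h.
    _fields_ = [("index", ctypes.c_int), ("value", ctypes.c_double)]


def to_node_array(features):
    """Pack a {index: value} dict into a C array that ends with index -1."""
    items = sorted(features.items())
    arr = (SketchNode * (len(items) + 1))()
    for i, (idx, val) in enumerate(items):
        arr[i].index = idx
        arr[i].value = val
    arr[len(items)].index = -1  # sentinel marking the end of the instance
    return arr


nodes = to_node_array({3: 1.0, 1: 0.5})
pairs = [(n.index, n.value) for n in nodes]
```

In practice gen_svm_nodearray in svm.py should be used instead; the sketch
only shows why fields of such objects behave like C data rather than
ordinary Python attributes.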
-
- Advanced users can use the structures/functions in svm.py directly. To
- avoid handling ctypes structures, svmutil.py provides some easy-to-use
- functions. The usage is similar to that of the LIBSVM MATLAB interface.
-
- Data Structures
- ===============
-
- Four data structures derived from svm.h are svm_node, svm_problem, svm_parameter,
- and svm_model. They all contain fields with the same names as in svm.h. Access
- these fields carefully because you directly use a C structure instead of a
- Python object. For svm_model, accessing the fields directly is not recommended.
- Programmers should use the interface functions or methods of the svm_model class
- in Python to get the values. The following description introduces additional
- fields and methods.
-
- Before using the data structures, execute the following command to load the
- LIBSVM shared library:
-
- >>> from svm import *
-
- - class svm_node:
-
- Construct an svm_node.
-
- >>> node = svm_node(idx, val)
-
- idx: an integer indicating the feature index.
-
- val: a float indicating the feature value.
-
- Show the index and the value of a node.
-
- >>> print(node)
-
- - Function: gen_svm_nodearray(xi [,feature_max=None [,isKernel=False]])
-
- Generate a feature vector from a Python list/tuple or a dictionary:
-
- >>> xi, max_idx = gen_svm_nodearray({1:1, 3:1, 5:-2})
-
- xi: the returned svm_nodearray (a ctypes structure)
-
- max_idx: the maximal feature index of xi
-
- feature_max: if feature_max is assigned, features with indices larger than
- feature_max are removed.
-
- isKernel: if isKernel == True, the list index starts from 0 for precomputed
- kernel. Otherwise, the list index starts from 1. The default
- value is False.
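The index rules for feature_max and isKernel can be sketched in pure Python.
The helper sketch_index_pairs below is hypothetical and only returns
(index, value) pairs; the real gen_svm_nodearray returns a ctypes structure
and may treat details such as zero-valued entries differently.

```python
def sketch_index_pairs(xi, feature_max=None, is_kernel=False):
    """Return (index, value) pairs following the indexing rules described above."""
    if isinstance(xi, dict):
        # Sparse input: the dictionary keys are the feature indices.
        pairs = sorted(xi.items())
    else:
        # Dense input: indices start from 0 for precomputed kernels, else from 1.
        start = 0 if is_kernel else 1
        pairs = list(enumerate(xi, start))
    if feature_max is not None:
        # Drop features whose index is larger than feature_max.
        pairs = [(i, v) for i, v in pairs if i <= feature_max]
    return pairs


dense = sketch_index_pairs([1, 0, 1])
kernel = sketch_index_pairs([1, 2, -2], is_kernel=True)
capped = sketch_index_pairs({1: 1, 3: 1, 5: -2}, feature_max=3)
```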
-
- - class svm_problem:
-
- Construct an svm_problem instance
-
- >>> prob = svm_problem(y, x)
-
- y: a Python list/tuple of l labels (type must be int/double).
-
- x: a Python list/tuple of l data instances. Each element of x must be
- an instance of list/tuple/dictionary type.
-
- Note that if your x contains sparse data (i.e., dictionary), the internal
- ctypes data format is still sparse.
-
- For pre-computed kernel, the isKernel flag should be set to True:
-
- >>> prob = svm_problem(y, x, isKernel=True)
-
- Please read LIBSVM README for more details of pre-computed kernel.
-
- - class svm_parameter:
-
- Construct an svm_parameter instance
-
- >>> param = svm_parameter('training_options')
-
- If 'training_options' is empty, LIBSVM default values are applied.
-
- Set param to LIBSVM default values.
-
- >>> param.set_to_default_values()
-
- Parse a string of options.
-
- >>> param.parse_options('training_options')
-
- Show values of parameters.
-
- >>> print(param)
-
- - class svm_model:
-
- There are two ways to obtain an instance of svm_model:
-
- >>> model = svm_train(y, x)
- >>> model = svm_load_model('model_file_name')
-
- Note that the interface functions libsvm.svm_train and
- libsvm.svm_load_model return a ctypes pointer to an svm_model,
- which is different from the svm_model object returned
- by svm_train and svm_load_model in svmutil.py. We provide a
- function toPyModel for the conversion:
-
- >>> model_ptr = libsvm.svm_train(prob, param)
- >>> model = toPyModel(model_ptr)
-
- If you obtain a model in a way other than the above approaches,
- handle it carefully to avoid memory leaks or segmentation faults.
-
- Some interface functions to access LIBSVM models are wrapped as
- members of the class svm_model:
-
- >>> svm_type = model.get_svm_type()
- >>> nr_class = model.get_nr_class()
- >>> svr_probability = model.get_svr_probability()
- >>> class_labels = model.get_labels()
- >>> sv_indices = model.get_sv_indices()
- >>> nr_sv = model.get_nr_sv()
- >>> is_prob_model = model.is_probability_model()
- >>> support_vector_coefficients = model.get_sv_coef()
- >>> support_vectors = model.get_SV()
-
- Utility Functions
- =================
-
- To use utility functions, type
-
- >>> from svmutil import *
-
- The above command loads
- svm_train() : train an SVM model
- svm_predict() : predict testing data
- svm_read_problem() : read the data from a LIBSVM-format file.
- svm_load_model() : load a LIBSVM model.
- svm_save_model() : save model to a file.
- evaluations() : evaluate prediction results.
-
- - Function: svm_train
-
- There are three ways to call svm_train()
-
- >>> model = svm_train(y, x [, 'training_options'])
- >>> model = svm_train(prob [, 'training_options'])
- >>> model = svm_train(prob, param)
-
- y: a list/tuple of l training labels (type must be int/double).
-
- x: a list/tuple of l training instances. The feature vector of
- each training instance is an instance of list/tuple or dictionary.
-
- training_options: a string in the same form as that for LIBSVM command
- mode.
-
- prob: an svm_problem instance generated by calling
- svm_problem(y, x).
- For pre-computed kernel, you should use
- svm_problem(y, x, isKernel=True)
-
- param: an svm_parameter instance generated by calling
- svm_parameter('training_options')
-
- model: the returned svm_model instance. See svm.h for details of this
- structure. If '-v' is specified, cross validation is
- conducted and the returned model is just a scalar: cross-validation
- accuracy for classification and mean-squared error for regression.
-
- To train on the same data many times with different
- parameters, the second and the third ways should be faster.
-
- Examples:
-
- >>> y, x = svm_read_problem('../heart_scale')
- >>> prob = svm_problem(y, x)
- >>> param = svm_parameter('-s 3 -c 5 -h 0')
- >>> m = svm_train(y, x, '-c 5')
- >>> m = svm_train(prob, '-t 2 -c 5')
- >>> m = svm_train(prob, param)
- >>> CV_ACC = svm_train(y, x, '-v 3')
-
- - Function: svm_predict
-
- To predict testing data with a model, use
-
- >>> p_labels, p_acc, p_vals = svm_predict(y, x, model [,'predicting_options'])
-
- y: a list/tuple of l true labels (type must be int/double). It is used
- for calculating the accuracy. Use [0]*len(x) if true labels are
- unavailable.
-
- x: a list/tuple of l predicting instances. The feature vector of
- each predicting instance is an instance of list/tuple or dictionary.
-
- predicting_options: a string of predicting options in the same format as
- that of LIBSVM.
-
- model: an svm_model instance.
-
- p_labels: a list of predicted labels
-
- p_acc: a tuple including accuracy (for classification), mean
- squared error, and squared correlation coefficient (for
- regression).
-
- p_vals: a list of decision values or probability estimates (if '-b 1'
- is specified). If k is the number of classes in training data,
- for decision values, each element includes results of predicting
- k(k-1)/2 binary-class SVMs. For classification, k = 1 is a
- special case. Decision value [+1] is returned for each testing
- instance, instead of an empty list.
- For probabilities, each element contains k values indicating
- the probability that the testing instance is in each class.
- Note that the order of classes is the same as the 'model.label'
- field in the model structure.
-
- Example:
-
- >>> m = svm_train(y, x, '-c 5')
- >>> p_labels, p_acc, p_vals = svm_predict(y, x, m)
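The k(k-1)/2 count of decision values mentioned above comes from training
one binary SVM per pair of classes. The sketch below only enumerates such
pairs for a hypothetical label list; sketch_pairwise_problems is not part of
the interface, and the order shown is simply what itertools produces, so
consult the LIBSVM README for the actual order of decision values.

```python
from itertools import combinations


def sketch_pairwise_problems(labels):
    """List every pair of class labels, one pair per binary-class SVM."""
    return list(combinations(labels, 2))


labels = [1, 2, 3, 4]          # k = 4 classes (illustrative values)
pairs = sketch_pairwise_problems(labels)
k = len(labels)
expected_count = k * (k - 1) // 2
```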
-
- - Functions: svm_read_problem/svm_load_model/svm_save_model
-
- See the usage by examples:
-
- >>> y, x = svm_read_problem('data.txt')
- >>> m = svm_load_model('model_file')
- >>> svm_save_model('model_file', m)
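For readers unfamiliar with the LIBSVM data format, the following sketch
parses lines of the form "label index:value index:value ..." into the (y, x)
lists used throughout this document. sketch_read_problem is a hypothetical
stand-in that works on a list of strings; it is not the implementation of
svm_read_problem and skips any error handling.

```python
def sketch_read_problem(lines):
    """Parse 'label idx:val ...' lines into labels y and sparse instances x."""
    y, x = [], []
    for line in lines:
        parts = line.split()
        if not parts:
            continue  # ignore blank lines
        y.append(float(parts[0]))
        xi = {}
        for token in parts[1:]:
            idx, val = token.split(":")
            xi[int(idx)] = float(val)
        x.append(xi)
    return y, x


y, x = sketch_read_problem(["+1 1:0.5 3:-1", "-1 2:1"])
```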
-
- - Function: evaluations
-
- Calculate some evaluations using the true values (ty) and predicted
- values (pv):
-
- >>> (ACC, MSE, SCC) = evaluations(ty, pv)
-
- ty: a list of true values.
-
- pv: a list of predicted values.
-
- ACC: accuracy.
-
- MSE: mean squared error.
-
- SCC: squared correlation coefficient.
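The three measures can be sketched in pure Python as follows. The function
sketch_evaluations is hypothetical; it assumes that accuracy is reported as
a percentage and that SCC is the squared Pearson correlation between ty and
pv, which may differ in detail from the evaluations() function in svmutil.py.

```python
def sketch_evaluations(ty, pv):
    """Return (ACC, MSE, SCC) for true values ty and predicted values pv."""
    if len(ty) != len(pv):
        raise ValueError("len(ty) must equal len(pv)")
    n = len(ty)
    correct = sum(1 for t, p in zip(ty, pv) if t == p)
    sq_err = sum((p - t) ** 2 for t, p in zip(ty, pv))
    sum_t, sum_p = sum(ty), sum(pv)
    sum_tt = sum(t * t for t in ty)
    sum_pp = sum(p * p for p in pv)
    sum_tp = sum(t * p for t, p in zip(ty, pv))
    acc = 100.0 * correct / n          # accuracy in percent (assumption)
    mse = sq_err / n                   # mean squared error
    num = (n * sum_tp - sum_t * sum_p) ** 2
    den = (n * sum_tt - sum_t ** 2) * (n * sum_pp - sum_p ** 2)
    scc = num / den if den else float("nan")  # undefined for constant inputs
    return acc, mse, scc


acc, mse, scc = sketch_evaluations([1, -1, 1, -1], [1, -1, -1, -1])
```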
-
-
- Additional Information
- ======================
-
- This interface was written by Hsiang-Fu Yu from the Department of Computer
- Science, National Taiwan University. If you find this tool useful, please
- cite LIBSVM as follows:
-
- Chih-Chung Chang and Chih-Jen Lin, LIBSVM : a library for support
- vector machines. ACM Transactions on Intelligent Systems and
- Technology, 2:27:1--27:27, 2011. Software available at
- http://www.csie.ntu.edu.tw/~cjlin/libsvm
-
- For any questions, please contact Chih-Jen Lin <cjlin@csie.ntu.edu.tw>,
- or check the FAQ page:
-
- http://www.csie.ntu.edu.tw/~cjlin/libsvm/faq.html
|