From ad0788062d8b563e4b14fdf013787cd1217f70cb Mon Sep 17 00:00:00 2001
From: linlin
Date: Mon, 5 Oct 2020 16:34:59 +0200
Subject: [PATCH] New translations README (French)

---
 .../fr/gklearn/gedlib/lib/libsvm.3.22/tools/README | 210 +++++++++++++++++++++
 1 file changed, 210 insertions(+)
 create mode 100644 lang/fr/gklearn/gedlib/lib/libsvm.3.22/tools/README

diff --git a/lang/fr/gklearn/gedlib/lib/libsvm.3.22/tools/README b/lang/fr/gklearn/gedlib/lib/libsvm.3.22/tools/README
new file mode 100644
index 0000000..172f4af
--- /dev/null
+++ b/lang/fr/gklearn/gedlib/lib/libsvm.3.22/tools/README
@@ -0,0 +1,210 @@
+This directory includes some useful tools:
+
+1. subset selection tools.
+2. parameter selection tools.
+3. LIBSVM format checking tools.
+
+Part I: Subset selection tools
+
+Introduction
+============
+
+Training on large data sets is time-consuming. Sometimes one should work on a
+smaller subset first. The python script subset.py randomly selects a
+specified number of samples. For classification data, we provide a
+stratified selection to ensure the same class distribution in the
+subset.
+
+Usage: subset.py [options] dataset number [output1] [output2]
+
+This script selects a subset of the given data set.
+
+options:
+-s method : method of selection (default 0)
+    0 -- stratified selection (classification only)
+    1 -- random selection
+
+output1 : the subset (optional)
+output2 : the rest of data (optional)
+
+If output1 is omitted, the subset will be printed on the screen.
+
+Example
+=======
+
+> python subset.py heart_scale 100 file1 file2
+
+From heart_scale 100 samples are randomly selected and stored in
+file1. All remaining instances are stored in file2.
+
+
+Part II: Parameter Selection Tools
+
+Introduction
+============
+
+grid.py is a parameter selection tool for C-SVM classification using
+the RBF (radial basis function) kernel. It uses the cross validation (CV)
+technique to estimate the accuracy of each parameter combination in
+the specified range and helps you to decide the best parameters for
+your problem.
+
+grid.py directly executes libsvm binaries (so no python binding is needed)
+for cross validation and then draws a contour of CV accuracy using gnuplot.
+You must have libsvm and gnuplot installed before using it. The package
+gnuplot is available at http://www.gnuplot.info/
+
+On Mac OS X, the precompiled gnuplot file needs the library AquaTerm,
+which thus must be installed as well. In addition, this version of
+gnuplot does not support png, so you need to change "set term png
+transparent small" and use other image formats. For example, you may
+have "set term pbm small color".
+
+Usage: grid.py [grid_options] [svm_options] dataset
+
+grid_options :
+-log2c {begin,end,step | "null"} : set the range of c (default -5,15,2)
+    begin,end,step -- c_range = 2^{begin,...,begin+k*step,...,end}
+    "null" -- do not grid with c
+-log2g {begin,end,step | "null"} : set the range of g (default 3,-15,-2)
+    begin,end,step -- g_range = 2^{begin,...,begin+k*step,...,end}
+    "null" -- do not grid with g
+-v n : n-fold cross validation (default 5)
+-svmtrain pathname : set svm executable path and name
+-gnuplot {pathname | "null"} :
+    pathname -- set gnuplot executable path and name
+    "null" -- do not plot
+-out {pathname | "null"} : (default dataset.out)
+    pathname -- set output file path and name
+    "null" -- do not output file
+-png pathname : set graphic output file path and name (default dataset.png)
+-resume [pathname] : resume the grid task using an existing output file (default pathname is dataset.out)
+    Use this option only if some parameters have been checked for the SAME data.
+
+svm_options : additional options for svm-train
+
+The program conducts v-fold cross validation using parameter C (and gamma)
+= 2^begin, 2^(begin+step), ..., 2^end.
+
+You can specify where the libsvm executable and gnuplot are using the
+-svmtrain and -gnuplot parameters.
+
+Windows users should use pgnuplot.exe. If you are using gnuplot
+3.7.1, please upgrade to version 3.7.3 or higher. The version 3.7.1
+has a bug. If you use cygwin on Windows, please use gnuplot-x11.
+
+If the task is terminated accidentally or you would like to change the
+range of parameters, you can apply '-resume' to save time by re-using
+previous results. You may specify the output file of a previous run
+or use the default (i.e., dataset.out) without giving a name. Please
+note that the same conditions must be used in both runs. For example,
+you cannot use '-v 10' earlier and resume the task with '-v 5'.
+
+The value of some options can be "null". For example, `-log2c -1,0,1
+-log2g "null"' means that C=2^-1,2^0,2^1 and g=LIBSVM's default gamma
+value. That is, you do not conduct parameter selection on gamma.
+
+Example
+=======
+
+> python grid.py -log2c -5,5,1 -log2g -4,0,1 -v 5 -m 300 heart_scale
+
+Users (in particular MS Windows users) may need to specify the path of
+executable files. You can either change paths in the beginning of
+grid.py or specify them in the command line. For example,
+
+> grid.py -log2c -5,5,1 -svmtrain "c:\Program Files\libsvm\windows\svm-train.exe" -gnuplot c:\tmp\gnuplot\binary\pgnuplot.exe -v 10 heart_scale
+
+Output: two files
+dataset.png: the CV accuracy contour plot generated by gnuplot
+dataset.out: the CV accuracy at each (log2(C),log2(gamma))
+
+The following example saves running time by loading the output file of a previous run.
+
+> python grid.py -log2c -7,7,1 -log2g -5,2,1 -v 5 -resume heart_scale.out heart_scale
+
+Parallel grid search
+====================
+
+You can conduct a parallel grid search by dispatching jobs to a
+cluster of computers which share the same file system. First, you add
+machine names in grid.py:
+
+ssh_workers = ["linux1", "linux5", "linux5"]
+
+and then set up your ssh so that the authentication works without
+asking for a password.
+
+The same machine (e.g., linux5 here) can be listed more than once if
+it has multiple CPUs or has more RAM. If the local machine is the
+best, you can also increase nr_local_worker. For example:
+
+nr_local_worker = 2
+
+Example:
+
+> python grid.py heart_scale
+[local] -1 -1 78.8889 (best c=0.5, g=0.5, rate=78.8889)
+[linux5] -1 -7 83.3333 (best c=0.5, g=0.0078125, rate=83.3333)
+[linux5] 5 -1 77.037 (best c=0.5, g=0.0078125, rate=83.3333)
+[linux1] 5 -7 83.3333 (best c=0.5, g=0.0078125, rate=83.3333)
+.
+.
+.
+
+If -log2c, -log2g, or -v is not specified, default values are used.
+
+If your system uses telnet instead of ssh, you list the computer names
+in telnet_workers.
+
+Calling grid in Python
+======================
+
+In addition to using grid.py as a command-line tool, you can use it as a
+Python module.
+
+>>> rate, param = find_parameters(dataset, options)
+
+You need to specify `dataset' and `options' (default ''). See the following example.
+
+> python
+
+>>> from grid import *
+>>> rate, param = find_parameters('../heart_scale', '-log2c -1,1,1 -log2g -1,1,1')
+[local] 0.0 0.0 rate=74.8148 (best c=1.0, g=1.0, rate=74.8148)
+[local] 0.0 -1.0 rate=77.037 (best c=1.0, g=0.5, rate=77.037)
+.
+.
+[local] -1.0 -1.0 rate=78.8889 (best c=0.5, g=0.5, rate=78.8889)
+.
+.
+>>> rate
+78.8889
+>>> param
+{'c': 0.5, 'g': 0.5}
+
+
+Part III: LIBSVM format checking tools
+
+Introduction
+============
+
+`svm-train' conducts only a simple check of the input data. To do a
+detailed check, we provide a python script `checkdata.py'.
+
+Usage: checkdata.py dataset
+
+Exit status (returned value): 1 if there are errors, 0 otherwise.
+
+This tool was written by Rong-En Fan at National Taiwan University.
+
+Example
+=======
+
+> cat bad_data
+1 3:1 2:4
+> python checkdata.py bad_data
+line 1: feature indices must be in an ascending order, previous/current features 3:1 2:4
+Found 1 lines with error.
+
+
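
Note on the checkdata.py example above: the ordering rule it reports can be
illustrated with a minimal sketch. This is not the code of checkdata.py; the
function name find_unordered_lines is hypothetical, it covers only the
ascending-index rule, and treating a repeated index as an error is an
assumption.

```python
def find_unordered_lines(lines):
    """Return 1-based numbers of lines whose feature indices do not ascend.

    Each line is expected to look like "label index:value index:value ...".
    Hypothetical helper for illustration; not part of LIBSVM.
    """
    bad = []
    for lineno, line in enumerate(lines, start=1):
        tokens = line.split()
        prev = None
        for token in tokens[1:]:  # tokens[0] is the label
            index = int(token.split(":", 1)[0])
            # Assumption: indices must be strictly increasing.
            if prev is not None and index <= prev:
                bad.append(lineno)
                break
            prev = index
    return bad


if __name__ == "__main__":
    # The first line repeats the bad_data example; the second is well ordered.
    print(find_unordered_lines(["1 3:1 2:4", "-1 1:0.5 4:1"]))
```

On the bad_data line from the example (indices 3 then 2), the sketch flags
line 1, mirroring the "previous/current features 3:1 2:4" message.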