@@ -0,0 +1,165 @@
# graphkit-learn

[](https://travis-ci.org/jajupmochi/graphkit-learn) [](https://ci.appveyor.com/project/jajupmochi/graphkit-learn) [](https://codecov.io/gh/jajupmochi/graphkit-learn) [](https://graphkit-learn.readthedocs.io/en/master/?badge=master) [](https://badge.fury.io/py/graphkit-learn)

A Python package for graph kernels, graph edit distances, and the graph pre-image problem.
## Requirements

* python>=3.6
* numpy>=1.16.2
* scipy>=1.1.0
* matplotlib>=3.1.0
* networkx>=2.2
* scikit-learn>=0.20.0
* tabulate>=0.8.2
* tqdm>=4.26.0
* control>=0.8.2 (for generalized random walk kernels only)
* slycot==0.3.3 (for generalized random walk kernels only; requires a Fortran compiler such as gfortran)
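The optional dependencies for the generalized random walk kernels can be pulled in with a single pip command, for example (this assumes gfortran is already installed on your system, since slycot builds from source):
```
$ pip install "control>=0.8.2" "slycot==0.3.3"
```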
## How to use?

### Install the library

* Install the stable version from PyPI (may not be up-to-date):
```
$ pip install graphkit-learn
```
* Install the latest version from GitHub:
```
$ git clone https://github.com/jajupmochi/graphkit-learn.git
$ cd graphkit-learn/
$ python setup.py install
```
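A quick smoke test that the installation succeeded is simply to import the package:
```
$ python -c "import gklearn; print(gklearn.__name__)"
```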
### Run the test

A series of [tests](https://github.com/jajupmochi/graphkit-learn/tree/master/gklearn/tests) can be run to check if the library works correctly:
```
$ pip install -U pip pytest codecov coverage pytest-cov
$ pytest -v --cov-config=.coveragerc --cov-report term --cov=gklearn gklearn/tests/
```
### Check examples

A series of demos showing how to use the library can be found on [Google Colab](https://drive.google.com/drive/folders/1r2gtPuFzIys2_MZw1wXqE2w3oCoVoQUG?usp=sharing) and in the [`example`](https://github.com/jajupmochi/graphkit-learn/tree/master/gklearn/examples) folder.

### Other demos

Check the [`notebooks`](https://github.com/jajupmochi/graphkit-learn/tree/master/notebooks) directory for more demos:

* the [`notebooks`](https://github.com/jajupmochi/graphkit-learn/tree/master/notebooks) directory itself includes test code for graph kernels based on linear patterns;
* the [`notebooks/tests`](https://github.com/jajupmochi/graphkit-learn/tree/master/notebooks/tests) directory includes code that tests some libraries and functions;
* the [`notebooks/utils`](https://github.com/jajupmochi/graphkit-learn/tree/master/notebooks/utils) directory includes some useful tools, such as a Gram matrix checker and a function to get properties of datasets;
* the [`notebooks/else`](https://github.com/jajupmochi/graphkit-learn/tree/master/notebooks/else) directory includes other code that we used for experiments.

### Documentation

The docs of the library can be found [here](https://graphkit-learn.readthedocs.io/en/master/?badge=master).
## Main contents

### 1 List of graph kernels

* Based on walks
  * [The common walk kernel](https://github.com/jajupmochi/graphkit-learn/tree/master/gklearn/kernels/common_walk.py) [1]
    * Exponential
    * Geometric
  * [The marginalized kernel](https://github.com/jajupmochi/graphkit-learn/tree/master/gklearn/kernels/marginalized.py)
    * With tottering [2]
    * Without tottering [7]
  * [The generalized random walk kernel](https://github.com/jajupmochi/graphkit-learn/tree/master/gklearn/kernels/random_walk.py) [3]
    * [Sylvester equation](https://github.com/jajupmochi/graphkit-learn/tree/master/gklearn/kernels/sylvester_equation.py)
    * Conjugate gradient
    * Fixed-point iterations
    * [Spectral decomposition](https://github.com/jajupmochi/graphkit-learn/tree/master/gklearn/kernels/spectral_decomposition.py)
* Based on paths
  * [The shortest path kernel](https://github.com/jajupmochi/graphkit-learn/tree/master/gklearn/kernels/shortest_path.py) [4]
  * [The structural shortest path kernel](https://github.com/jajupmochi/graphkit-learn/tree/master/gklearn/kernels/structural_sp.py) [5]
  * [The path kernel up to length h](https://github.com/jajupmochi/graphkit-learn/tree/master/gklearn/kernels/path_up_to_h.py) [6]
    * The Tanimoto kernel
    * The MinMax kernel
* Non-linear kernels
  * [The treelet kernel](https://github.com/jajupmochi/graphkit-learn/tree/master/gklearn/kernels/treelet.py) [10]
  * [The Weisfeiler-Lehman kernel](https://github.com/jajupmochi/graphkit-learn/tree/master/gklearn/kernels/weisfeiler_lehman.py) [11]
    * [Subtree](https://github.com/jajupmochi/graphkit-learn/tree/master/gklearn/kernels/weisfeiler_lehman.py#L479)

A demo of computing graph kernels can be found on [Google Colab](https://colab.research.google.com/drive/17Q2QCl9CAtDweGF8LiWnWoN2laeJqT0u?usp=sharing) and in the [`examples`](https://github.com/jajupmochi/graphkit-learn/blob/master/gklearn/examples/compute_graph_kernel.py) folder.
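For intuition, the snippet below is a minimal, self-contained illustration of the idea behind the shortest path kernel [4]: two graphs are compared by counting pairs of shortest paths of equal length. It is a deliberately simplified sketch (unlabeled graphs, a delta kernel on path lengths), not the library's own implementation; see the linked example for the real API.

```python
from collections import Counter

import networkx as nx
import numpy as np


def sp_histogram(g):
    """Histogram of shortest-path lengths over all ordered vertex pairs of g."""
    lengths = dict(nx.all_pairs_shortest_path_length(g))
    return Counter(d for u, dist in lengths.items() for v, d in dist.items() if u != v)


def sp_kernel(g1, g2):
    """Count pairs of shortest paths (one from each graph) with equal length."""
    h1, h2 = sp_histogram(g1), sp_histogram(g2)
    return sum(h1[length] * h2[length] for length in h1.keys() & h2.keys())


graphs = [nx.cycle_graph(4), nx.path_graph(4), nx.complete_graph(4)]
gram = np.array([[sp_kernel(a, b) for b in graphs] for a in graphs])
print(gram)  # a symmetric 3x3 Gram matrix
```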
### 2 Graph Edit Distances

### 3 Graph preimage methods

A demo of generating graph preimages can be found on [Google Colab](https://colab.research.google.com/drive/1PIDvHOcmiLEQ5Np3bgBDdu0kLOquOMQK?usp=sharing) and in the [`examples`](https://github.com/jajupmochi/graphkit-learn/blob/master/gklearn/examples/median_preimege_generator.py) folder.

### 4 Interface to `GEDLIB`

[`GEDLIB`](https://github.com/dbblumenthal/gedlib) is an easily extensible C++ library for (suboptimally) computing the graph edit distance between attributed graphs. [A Python interface](https://github.com/jajupmochi/graphkit-learn/tree/master/gklearn/gedlib) for `GEDLIB` is integrated in this library, based on the [`gedlibpy`](https://github.com/Ryurin/gedlibpy) library.
### 5 Computation optimization methods

* Python's `multiprocessing.Pool` module is used to **parallelize** the computation of all kernels as well as the model selection.
* **The Fast Computation of Shortest Path Kernel (FCSP) method** [8] is implemented in *the random walk kernel*, *the shortest path kernel*, and *the structural shortest path kernel*, where FCSP is applied to both vertex and edge kernels.
* **The trie data structure** [9] is employed in *the path kernel up to length h* to store paths in graphs compactly (a minimal sketch follows this list).
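To show why a trie helps here, the toy sketch below stores label sequences so that paths sharing a prefix share storage. This is an illustration of the general idea only, not the library's internal data structure:

```python
class PathTrie:
    """Store label sequences (paths) so that common prefixes are shared."""

    def __init__(self):
        self.children = {}  # label -> child PathTrie
        self.count = 0      # number of stored paths ending at this node

    def add(self, path):
        node = self
        for label in path:
            node = node.children.setdefault(label, PathTrie())
        node.count += 1

    def num_nodes(self):
        return 1 + sum(c.num_nodes() for c in self.children.values())


trie = PathTrie()
for p in [("C", "C", "O"), ("C", "C", "N"), ("C", "O")]:
    trie.add(p)
print(trie.num_nodes())  # 6 nodes (5 labeled edges) instead of 8 labels stored naively
```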
## Issues

* This library uses the `multiprocessing.Pool.imap_unordered` function for parallelization, which may not run correctly on Windows (Windows uses the `spawn` start method, which imposes stricter requirements on parallelized code). For now, Windows users may need to comment out the parallel code and uncomment the serial code below it. We will consider adding a parameter to switch between serial and parallel computation as needed.
* Some modules (such as `numpy`, `scipy`, and `scikit-learn`) use [`OpenBLAS`](https://www.openblas.net/) to perform parallel computation by default, which conflicts with other parallelization modules such as `multiprocessing.Pool` and greatly increases computing time. Limiting `OpenBLAS` to a single thread/CPU avoids this conflict. For now, this has to be done manually. Under Linux, type this command in a terminal before running the code:
```
$ export OPENBLAS_NUM_THREADS=1
```
Or add `export OPENBLAS_NUM_THREADS=1` at the end of your `~/.bashrc` file, then run
```
$ source ~/.bashrc
```
to make the setting permanent.
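Alternatively, the variable can be set from inside a Python script, as long as this happens before `numpy` (and anything that imports it) is first loaded:

```python
import os

# Must be set before numpy/scipy are imported for the first time.
os.environ['OPENBLAS_NUM_THREADS'] = '1'

import numpy as np  # OpenBLAS is now restricted to a single thread
```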
## Results

Check this paper for a detailed description of the graph kernels and the experimental results:

Linlin Jia, Benoit Gaüzère, and Paul Honeine. Graph Kernels Based on Linear Patterns: Theoretical and Experimental Comparisons. Working paper or preprint, March 2019. URL https://hal-normandie-univ.archives-ouvertes.fr/hal-02053946.

A comparison of the performance of graph kernels on benchmark datasets can be found [here](https://graphkit-learn.readthedocs.io/en/master/experiments.html).

## How to contribute

Fork the library and open a pull request! Make your own contribution to the community!
## Authors

* [Linlin Jia](https://jajupmochi.github.io/), LITIS, INSA Rouen Normandie
* [Benoit Gaüzère](http://pagesperso.litislab.fr/~bgauzere/#contact_en), LITIS, INSA Rouen Normandie
* [Paul Honeine](http://honeine.fr/paul/Welcome.html), LITIS, Université de Rouen Normandie
## Citation

Still waiting...

## Acknowledgments

This research was supported by CSC (China Scholarship Council) and the French national research agency (ANR) under the grant APi (ANR-18-CE23-0014). The authors would like to thank the CRIANN (Le Centre Régional Informatique et d'Applications Numériques de Normandie) for providing computational resources.
## References

[1] Thomas Gärtner, Peter Flach, and Stefan Wrobel. On graph kernels: Hardness results and efficient alternatives. Learning Theory and Kernel Machines, pages 129–143, 2003.

[2] H. Kashima, K. Tsuda, and A. Inokuchi. Marginalized kernels between labeled graphs. In Proceedings of the 20th International Conference on Machine Learning, Washington, DC, United States, 2003.

[3] S.V.N. Vishwanathan, N.N. Schraudolph, R. Kondor, and K.M. Borgwardt. Graph kernels. Journal of Machine Learning Research, 11:1201–1242, 2010.

[4] K. M. Borgwardt and H.-P. Kriegel. Shortest-path kernels on graphs. In Proceedings of the International Conference on Data Mining, pages 74–81, 2005.

[5] Liva Ralaivola, Sanjay J. Swamidass, Hiroto Saigo, and Pierre Baldi. Graph kernels for chemical informatics. Neural Networks, 18(8):1093–1110, 2005.

[6] F. Suard, A. Rakotomamonjy, and A. Bensrhair. Kernel on bag of paths for measuring similarity of shapes. In ESANN, pages 355–360, April 2007.

[7] P. Mahé, N. Ueda, T. Akutsu, J.L. Perret, and J.P. Vert. Extensions of marginalized graph kernels. In Proceedings of the Twenty-First International Conference on Machine Learning, page 70. ACM, 2004.

[8] Lifan Xu, Wei Wang, M. Alvarez, John Cavazos, and Dongping Zhang. Parallelization of shortest path graph kernels on multi-core CPUs and GPUs. In Proceedings of the Programmability Issues for Heterogeneous Multicores (MultiProg), Vienna, Austria, 2014.

[9] Edward Fredkin. Trie memory. Communications of the ACM, 3(9):490–499, 1960.

[10] B. Gaüzère, L. Brun, and D. Villemin. Two new graphs kernels in chemoinformatics. Pattern Recognition Letters, 33:2038–2047, 2012.

[11] N. Shervashidze, P. Schweitzer, E.J. van Leeuwen, K. Mehlhorn, and K.M. Borgwardt. Weisfeiler-Lehman graph kernels. Journal of Machine Learning Research, 12:2539–2561, 2011.
@@ -0,0 +1,29 @@
---
environment:
  matrix:
    - PYTHON: "C:\\Python36"
    - PYTHON: "C:\\Python36-x64"
    - PYTHON: "C:\\Python37"
    - PYTHON: "C:\\Python37-x64"
    - PYTHON: "C:\\Python38"
    - PYTHON: "C:\\Python38-x64"

#skip_commits:
#  files:
#    - "*.yml"
#    - "*.rst"
#    - "LICENSE"

install:
  - "%PYTHON%\\python.exe -m pip install -U pip"
  - "%PYTHON%\\python.exe -m pip install wheel"
  - "%PYTHON%\\python.exe -m pip install -r requirements.txt"
  - "%PYTHON%\\python.exe -m pip install -U pytest"

build: false

test_script:
  - "%PYTHON%\\python.exe setup.py bdist_wheel"
  - "%PYTHON%\\python.exe -m pytest -v gklearn/tests/ --ignore=gklearn/tests/test_median_preimage_generator.py"
@@ -0,0 +1,4 @@ | |||||
[run] | |||||
omit = | |||||
gklearn/tests/* | |||||
gklearn/examples/* |
@@ -0,0 +1,81 @@ | |||||
# Jupyter Notebook | |||||
.ipynb_checkpoints | |||||
datasets/* | |||||
!datasets/ds.py | |||||
!datasets/Alkane/ | |||||
!datasets/acyclic/ | |||||
!datasets/Acyclic/ | |||||
!datasets/MAO/ | |||||
!datasets/PAH/ | |||||
!datasets/MUTAG/ | |||||
!datasets/Letter-med/ | |||||
!datasets/ENZYMES_txt/ | |||||
!datasets/DD/ | |||||
!datasets/NCI1/ | |||||
!datasets/NCI109/ | |||||
!datasets/AIDS/ | |||||
!datasets/monoterpenoides/ | |||||
!datasets/Monoterpenoides/ | |||||
!datasets/Fingerprint/*.txt | |||||
!datasets/Cuneiform/*.txt | |||||
notebooks/results/* | |||||
notebooks/check_gm/* | |||||
notebooks/test_parallel/* | |||||
requirements/* | |||||
gklearn/model.py | |||||
gklearn/kernels/*_sym.py | |||||
*.npy | |||||
*.eps | |||||
*.dat | |||||
*.pyc | |||||
gklearn/preimage/* | |||||
!gklearn/preimage/*.py | |||||
!gklearn/preimage/experiments/*.py | |||||
!gklearn/preimage/experiments/tools/*.py | |||||
__pycache__ | |||||
##*# | |||||
docs/build/* | |||||
!docs/build/latex/*.pdf | |||||
docs/log* | |||||
*.egg-info | |||||
dist/ | |||||
build/ | |||||
.coverage | |||||
htmlcov | |||||
virtualenv | |||||
.vscode/ | |||||
# gedlibpy | |||||
gklearn/gedlib/build/ | |||||
gklearn/gedlib/build/__pycache__/ | |||||
gklearn/gedlib/collections/ | |||||
gklearn/gedlib/Median_Example/ | |||||
gklearn/gedlib/build/include/gedlib-master/median/collections/ | |||||
gklearn/gedlib/include/ | |||||
gklearn/gedlib/libgxlgedlib.so | |||||
# misc | |||||
notebooks/preimage/ | |||||
notebooks/unfinished | |||||
gklearn/kernels/else/ | |||||
gklearn/kernels/unfinished/ | |||||
gklearn/kernels/.tags | |||||
# pyenv | |||||
.python-version | |||||
# docker travis debug. | |||||
ci.sh | |||||
# outputs. | |||||
outputs/ | |||||
# pyCharm. | |||||
.idea/ |
@@ -0,0 +1,27 @@ | |||||
--- | |||||
#.readthedocs.yml | |||||
#Read the Docs configuration file | |||||
#See https://docs.readthedocs.io/en/stable/config-file/v2.html for details | |||||
#Required | |||||
version: 2 | |||||
#Build documentation in the docs/ directory with Sphinx | |||||
sphinx: | |||||
configuration: docs/source/conf.py | |||||
#Build documentation with MkDocs | |||||
#mkdocs: | |||||
#configuration: mkdocs.yml | |||||
#Optionally build your docs in additional formats such as PDF and ePub | |||||
formats: all | |||||
#Optionally set the version of Python and requirements required to build your docs | |||||
python: | |||||
version: 3.6 | |||||
install: | |||||
- | |||||
requirements: docs/requirements.txt | |||||
- | |||||
requirements: requirements.txt | |||||
- | |||||
method: pip | |||||
path: . | |||||
extra_requirements: | |||||
- docs |
@@ -0,0 +1,22 @@ | |||||
--- | |||||
language: python | |||||
python: | |||||
- '3.6' | |||||
- '3.7' | |||||
- '3.8' | |||||
before_install: | |||||
- python --version | |||||
- pip install -U pip | |||||
- pip install -U pytest | |||||
- pip install codecov | |||||
- pip install coverage | |||||
- pip install pytest-cov | |||||
- sudo apt-get -y install gfortran | |||||
install: | |||||
- pip install -r requirements.txt | |||||
- pip install wheel | |||||
script: | |||||
- python setup.py bdist_wheel | |||||
- if [ $TRAVIS_PYTHON_VERSION == 3.6 ]; then pytest -v --cov-config=.coveragerc --cov-report term --cov=gklearn gklearn/tests/; else pytest -v --cov-config=.coveragerc --cov-report term --cov=gklearn gklearn/tests/ --ignore=gklearn/tests/test_median_preimage_generator.py; fi | |||||
after_success: | |||||
- codecov |
@@ -0,0 +1,674 @@ | |||||
GNU GENERAL PUBLIC LICENSE | |||||
Version 3, 29 June 2007 | |||||
Copyright (C) 2007 Free Software Foundation, Inc. <http://fsf.org/> | |||||
Everyone is permitted to copy and distribute verbatim copies | |||||
of this license document, but changing it is not allowed. | |||||
Preamble | |||||
The GNU General Public License is a free, copyleft license for | |||||
software and other kinds of works. | |||||
The licenses for most software and other practical works are designed | |||||
to take away your freedom to share and change the works. By contrast, | |||||
the GNU General Public License is intended to guarantee your freedom to | |||||
share and change all versions of a program--to make sure it remains free | |||||
software for all its users. We, the Free Software Foundation, use the | |||||
GNU General Public License for most of our software; it applies also to | |||||
any other work released this way by its authors. You can apply it to | |||||
your programs, too. | |||||
When we speak of free software, we are referring to freedom, not | |||||
price. Our General Public Licenses are designed to make sure that you | |||||
have the freedom to distribute copies of free software (and charge for | |||||
them if you wish), that you receive source code or can get it if you | |||||
want it, that you can change the software or use pieces of it in new | |||||
free programs, and that you know you can do these things. | |||||
To protect your rights, we need to prevent others from denying you | |||||
these rights or asking you to surrender the rights. Therefore, you have | |||||
certain responsibilities if you distribute copies of the software, or if | |||||
you modify it: responsibilities to respect the freedom of others. | |||||
For example, if you distribute copies of such a program, whether | |||||
gratis or for a fee, you must pass on to the recipients the same | |||||
freedoms that you received. You must make sure that they, too, receive | |||||
or can get the source code. And you must show them these terms so they | |||||
know their rights. | |||||
Developers that use the GNU GPL protect your rights with two steps: | |||||
(1) assert copyright on the software, and (2) offer you this License | |||||
giving you legal permission to copy, distribute and/or modify it. | |||||
For the developers' and authors' protection, the GPL clearly explains | |||||
that there is no warranty for this free software. For both users' and | |||||
authors' sake, the GPL requires that modified versions be marked as | |||||
changed, so that their problems will not be attributed erroneously to | |||||
authors of previous versions. | |||||
Some devices are designed to deny users access to install or run | |||||
modified versions of the software inside them, although the manufacturer | |||||
can do so. This is fundamentally incompatible with the aim of | |||||
protecting users' freedom to change the software. The systematic | |||||
pattern of such abuse occurs in the area of products for individuals to | |||||
use, which is precisely where it is most unacceptable. Therefore, we | |||||
have designed this version of the GPL to prohibit the practice for those | |||||
products. If such problems arise substantially in other domains, we | |||||
stand ready to extend this provision to those domains in future versions | |||||
of the GPL, as needed to protect the freedom of users. | |||||
Finally, every program is threatened constantly by software patents. | |||||
States should not allow patents to restrict development and use of | |||||
software on general-purpose computers, but in those that do, we wish to | |||||
avoid the special danger that patents applied to a free program could | |||||
make it effectively proprietary. To prevent this, the GPL assures that | |||||
patents cannot be used to render the program non-free. | |||||
The precise terms and conditions for copying, distribution and | |||||
modification follow. | |||||
TERMS AND CONDITIONS | |||||
0. Definitions. | |||||
"This License" refers to version 3 of the GNU General Public License. | |||||
"Copyright" also means copyright-like laws that apply to other kinds of | |||||
works, such as semiconductor masks. | |||||
"The Program" refers to any copyrightable work licensed under this | |||||
License. Each licensee is addressed as "you". "Licensees" and | |||||
"recipients" may be individuals or organizations. | |||||
To "modify" a work means to copy from or adapt all or part of the work | |||||
in a fashion requiring copyright permission, other than the making of an | |||||
exact copy. The resulting work is called a "modified version" of the | |||||
earlier work or a work "based on" the earlier work. | |||||
A "covered work" means either the unmodified Program or a work based | |||||
on the Program. | |||||
To "propagate" a work means to do anything with it that, without | |||||
permission, would make you directly or secondarily liable for | |||||
infringement under applicable copyright law, except executing it on a | |||||
computer or modifying a private copy. Propagation includes copying, | |||||
distribution (with or without modification), making available to the | |||||
public, and in some countries other activities as well. | |||||
To "convey" a work means any kind of propagation that enables other | |||||
parties to make or receive copies. Mere interaction with a user through | |||||
a computer network, with no transfer of a copy, is not conveying. | |||||
An interactive user interface displays "Appropriate Legal Notices" | |||||
to the extent that it includes a convenient and prominently visible | |||||
feature that (1) displays an appropriate copyright notice, and (2) | |||||
tells the user that there is no warranty for the work (except to the | |||||
extent that warranties are provided), that licensees may convey the | |||||
work under this License, and how to view a copy of this License. If | |||||
the interface presents a list of user commands or options, such as a | |||||
menu, a prominent item in the list meets this criterion. | |||||
1. Source Code. | |||||
The "source code" for a work means the preferred form of the work | |||||
for making modifications to it. "Object code" means any non-source | |||||
form of a work. | |||||
A "Standard Interface" means an interface that either is an official | |||||
standard defined by a recognized standards body, or, in the case of | |||||
interfaces specified for a particular programming language, one that | |||||
is widely used among developers working in that language. | |||||
The "System Libraries" of an executable work include anything, other | |||||
than the work as a whole, that (a) is included in the normal form of | |||||
packaging a Major Component, but which is not part of that Major | |||||
Component, and (b) serves only to enable use of the work with that | |||||
Major Component, or to implement a Standard Interface for which an | |||||
implementation is available to the public in source code form. A | |||||
"Major Component", in this context, means a major essential component | |||||
(kernel, window system, and so on) of the specific operating system | |||||
(if any) on which the executable work runs, or a compiler used to | |||||
produce the work, or an object code interpreter used to run it. | |||||
The "Corresponding Source" for a work in object code form means all | |||||
the source code needed to generate, install, and (for an executable | |||||
work) run the object code and to modify the work, including scripts to | |||||
control those activities. However, it does not include the work's | |||||
System Libraries, or general-purpose tools or generally available free | |||||
programs which are used unmodified in performing those activities but | |||||
which are not part of the work. For example, Corresponding Source | |||||
includes interface definition files associated with source files for | |||||
the work, and the source code for shared libraries and dynamically | |||||
linked subprograms that the work is specifically designed to require, | |||||
such as by intimate data communication or control flow between those | |||||
subprograms and other parts of the work. | |||||
The Corresponding Source need not include anything that users | |||||
can regenerate automatically from other parts of the Corresponding | |||||
Source. | |||||
The Corresponding Source for a work in source code form is that | |||||
same work. | |||||
2. Basic Permissions. | |||||
All rights granted under this License are granted for the term of | |||||
copyright on the Program, and are irrevocable provided the stated | |||||
conditions are met. This License explicitly affirms your unlimited | |||||
permission to run the unmodified Program. The output from running a | |||||
covered work is covered by this License only if the output, given its | |||||
content, constitutes a covered work. This License acknowledges your | |||||
rights of fair use or other equivalent, as provided by copyright law. | |||||
You may make, run and propagate covered works that you do not | |||||
convey, without conditions so long as your license otherwise remains | |||||
in force. You may convey covered works to others for the sole purpose | |||||
of having them make modifications exclusively for you, or provide you | |||||
with facilities for running those works, provided that you comply with | |||||
the terms of this License in conveying all material for which you do | |||||
not control copyright. Those thus making or running the covered works | |||||
for you must do so exclusively on your behalf, under your direction | |||||
and control, on terms that prohibit them from making any copies of | |||||
your copyrighted material outside their relationship with you. | |||||
Conveying under any other circumstances is permitted solely under | |||||
the conditions stated below. Sublicensing is not allowed; section 10 | |||||
makes it unnecessary. | |||||
3. Protecting Users' Legal Rights From Anti-Circumvention Law. | |||||
No covered work shall be deemed part of an effective technological | |||||
measure under any applicable law fulfilling obligations under article | |||||
11 of the WIPO copyright treaty adopted on 20 December 1996, or | |||||
similar laws prohibiting or restricting circumvention of such | |||||
measures. | |||||
When you convey a covered work, you waive any legal power to forbid | |||||
circumvention of technological measures to the extent such circumvention | |||||
is effected by exercising rights under this License with respect to | |||||
the covered work, and you disclaim any intention to limit operation or | |||||
modification of the work as a means of enforcing, against the work's | |||||
users, your or third parties' legal rights to forbid circumvention of | |||||
technological measures. | |||||
4. Conveying Verbatim Copies. | |||||
You may convey verbatim copies of the Program's source code as you | |||||
receive it, in any medium, provided that you conspicuously and | |||||
appropriately publish on each copy an appropriate copyright notice; | |||||
keep intact all notices stating that this License and any | |||||
non-permissive terms added in accord with section 7 apply to the code; | |||||
keep intact all notices of the absence of any warranty; and give all | |||||
recipients a copy of this License along with the Program. | |||||
You may charge any price or no price for each copy that you convey, | |||||
and you may offer support or warranty protection for a fee. | |||||
5. Conveying Modified Source Versions. | |||||
You may convey a work based on the Program, or the modifications to | |||||
produce it from the Program, in the form of source code under the | |||||
terms of section 4, provided that you also meet all of these conditions: | |||||
a) The work must carry prominent notices stating that you modified | |||||
it, and giving a relevant date. | |||||
b) The work must carry prominent notices stating that it is | |||||
released under this License and any conditions added under section | |||||
7. This requirement modifies the requirement in section 4 to | |||||
"keep intact all notices". | |||||
c) You must license the entire work, as a whole, under this | |||||
License to anyone who comes into possession of a copy. This | |||||
License will therefore apply, along with any applicable section 7 | |||||
additional terms, to the whole of the work, and all its parts, | |||||
regardless of how they are packaged. This License gives no | |||||
permission to license the work in any other way, but it does not | |||||
invalidate such permission if you have separately received it. | |||||
d) If the work has interactive user interfaces, each must display | |||||
Appropriate Legal Notices; however, if the Program has interactive | |||||
interfaces that do not display Appropriate Legal Notices, your | |||||
work need not make them do so. | |||||
A compilation of a covered work with other separate and independent | |||||
works, which are not by their nature extensions of the covered work, | |||||
and which are not combined with it such as to form a larger program, | |||||
in or on a volume of a storage or distribution medium, is called an | |||||
"aggregate" if the compilation and its resulting copyright are not | |||||
used to limit the access or legal rights of the compilation's users | |||||
beyond what the individual works permit. Inclusion of a covered work | |||||
in an aggregate does not cause this License to apply to the other | |||||
parts of the aggregate. | |||||
6. Conveying Non-Source Forms. | |||||
You may convey a covered work in object code form under the terms | |||||
of sections 4 and 5, provided that you also convey the | |||||
machine-readable Corresponding Source under the terms of this License, | |||||
in one of these ways: | |||||
a) Convey the object code in, or embodied in, a physical product | |||||
(including a physical distribution medium), accompanied by the | |||||
Corresponding Source fixed on a durable physical medium | |||||
customarily used for software interchange. | |||||
b) Convey the object code in, or embodied in, a physical product | |||||
(including a physical distribution medium), accompanied by a | |||||
written offer, valid for at least three years and valid for as | |||||
long as you offer spare parts or customer support for that product | |||||
model, to give anyone who possesses the object code either (1) a | |||||
copy of the Corresponding Source for all the software in the | |||||
product that is covered by this License, on a durable physical | |||||
medium customarily used for software interchange, for a price no | |||||
more than your reasonable cost of physically performing this | |||||
conveying of source, or (2) access to copy the | |||||
Corresponding Source from a network server at no charge. | |||||
c) Convey individual copies of the object code with a copy of the | |||||
written offer to provide the Corresponding Source. This | |||||
alternative is allowed only occasionally and noncommercially, and | |||||
only if you received the object code with such an offer, in accord | |||||
with subsection 6b. | |||||
d) Convey the object code by offering access from a designated | |||||
place (gratis or for a charge), and offer equivalent access to the | |||||
Corresponding Source in the same way through the same place at no | |||||
further charge. You need not require recipients to copy the | |||||
Corresponding Source along with the object code. If the place to | |||||
copy the object code is a network server, the Corresponding Source | |||||
may be on a different server (operated by you or a third party) | |||||
that supports equivalent copying facilities, provided you maintain | |||||
clear directions next to the object code saying where to find the | |||||
Corresponding Source. Regardless of what server hosts the | |||||
Corresponding Source, you remain obligated to ensure that it is | |||||
available for as long as needed to satisfy these requirements. | |||||
e) Convey the object code using peer-to-peer transmission, provided | |||||
you inform other peers where the object code and Corresponding | |||||
Source of the work are being offered to the general public at no | |||||
charge under subsection 6d. | |||||
A separable portion of the object code, whose source code is excluded | |||||
from the Corresponding Source as a System Library, need not be | |||||
included in conveying the object code work. | |||||
A "User Product" is either (1) a "consumer product", which means any | |||||
tangible personal property which is normally used for personal, family, | |||||
or household purposes, or (2) anything designed or sold for incorporation | |||||
into a dwelling. In determining whether a product is a consumer product, | |||||
doubtful cases shall be resolved in favor of coverage. For a particular | |||||
product received by a particular user, "normally used" refers to a | |||||
typical or common use of that class of product, regardless of the status | |||||
of the particular user or of the way in which the particular user | |||||
actually uses, or expects or is expected to use, the product. A product | |||||
is a consumer product regardless of whether the product has substantial | |||||
commercial, industrial or non-consumer uses, unless such uses represent | |||||
the only significant mode of use of the product. | |||||
"Installation Information" for a User Product means any methods, | |||||
procedures, authorization keys, or other information required to install | |||||
and execute modified versions of a covered work in that User Product from | |||||
a modified version of its Corresponding Source. The information must | |||||
suffice to ensure that the continued functioning of the modified object | |||||
code is in no case prevented or interfered with solely because | |||||
modification has been made. | |||||
If you convey an object code work under this section in, or with, or | |||||
specifically for use in, a User Product, and the conveying occurs as | |||||
part of a transaction in which the right of possession and use of the | |||||
User Product is transferred to the recipient in perpetuity or for a | |||||
fixed term (regardless of how the transaction is characterized), the | |||||
Corresponding Source conveyed under this section must be accompanied | |||||
by the Installation Information. But this requirement does not apply | |||||
if neither you nor any third party retains the ability to install | |||||
modified object code on the User Product (for example, the work has | |||||
been installed in ROM). | |||||
The requirement to provide Installation Information does not include a | |||||
requirement to continue to provide support service, warranty, or updates | |||||
for a work that has been modified or installed by the recipient, or for | |||||
the User Product in which it has been modified or installed. Access to a | |||||
network may be denied when the modification itself materially and | |||||
adversely affects the operation of the network or violates the rules and | |||||
protocols for communication across the network. | |||||
Corresponding Source conveyed, and Installation Information provided, | |||||
in accord with this section must be in a format that is publicly | |||||
documented (and with an implementation available to the public in | |||||
source code form), and must require no special password or key for | |||||
unpacking, reading or copying. | |||||
7. Additional Terms. | |||||
"Additional permissions" are terms that supplement the terms of this | |||||
License by making exceptions from one or more of its conditions. | |||||
Additional permissions that are applicable to the entire Program shall | |||||
be treated as though they were included in this License, to the extent | |||||
that they are valid under applicable law. If additional permissions | |||||
apply only to part of the Program, that part may be used separately | |||||
under those permissions, but the entire Program remains governed by | |||||
this License without regard to the additional permissions. | |||||
When you convey a copy of a covered work, you may at your option | |||||
remove any additional permissions from that copy, or from any part of | |||||
it. (Additional permissions may be written to require their own | |||||
removal in certain cases when you modify the work.) You may place | |||||
additional permissions on material, added by you to a covered work, | |||||
for which you have or can give appropriate copyright permission. | |||||
Notwithstanding any other provision of this License, for material you | |||||
add to a covered work, you may (if authorized by the copyright holders of | |||||
that material) supplement the terms of this License with terms: | |||||
a) Disclaiming warranty or limiting liability differently from the | |||||
terms of sections 15 and 16 of this License; or | |||||
b) Requiring preservation of specified reasonable legal notices or | |||||
author attributions in that material or in the Appropriate Legal | |||||
Notices displayed by works containing it; or | |||||
c) Prohibiting misrepresentation of the origin of that material, or | |||||
requiring that modified versions of such material be marked in | |||||
reasonable ways as different from the original version; or | |||||
d) Limiting the use for publicity purposes of names of licensors or | |||||
authors of the material; or | |||||
e) Declining to grant rights under trademark law for use of some | |||||
trade names, trademarks, or service marks; or | |||||
f) Requiring indemnification of licensors and authors of that | |||||
material by anyone who conveys the material (or modified versions of | |||||
it) with contractual assumptions of liability to the recipient, for | |||||
any liability that these contractual assumptions directly impose on | |||||
those licensors and authors. | |||||
All other non-permissive additional terms are considered "further | |||||
restrictions" within the meaning of section 10. If the Program as you | |||||
received it, or any part of it, contains a notice stating that it is | |||||
governed by this License along with a term that is a further | |||||
restriction, you may remove that term. If a license document contains | |||||
a further restriction but permits relicensing or conveying under this | |||||
License, you may add to a covered work material governed by the terms | |||||
of that license document, provided that the further restriction does | |||||
not survive such relicensing or conveying. | |||||
If you add terms to a covered work in accord with this section, you | |||||
must place, in the relevant source files, a statement of the | |||||
additional terms that apply to those files, or a notice indicating | |||||
where to find the applicable terms. | |||||
Additional terms, permissive or non-permissive, may be stated in the | |||||
form of a separately written license, or stated as exceptions; | |||||
the above requirements apply either way. | |||||
8. Termination. | |||||
You may not propagate or modify a covered work except as expressly | |||||
provided under this License. Any attempt otherwise to propagate or | |||||
modify it is void, and will automatically terminate your rights under | |||||
this License (including any patent licenses granted under the third | |||||
paragraph of section 11). | |||||
However, if you cease all violation of this License, then your | |||||
license from a particular copyright holder is reinstated (a) | |||||
provisionally, unless and until the copyright holder explicitly and | |||||
finally terminates your license, and (b) permanently, if the copyright | |||||
holder fails to notify you of the violation by some reasonable means | |||||
prior to 60 days after the cessation. | |||||
Moreover, your license from a particular copyright holder is | |||||
reinstated permanently if the copyright holder notifies you of the | |||||
violation by some reasonable means, this is the first time you have | |||||
received notice of violation of this License (for any work) from that | |||||
copyright holder, and you cure the violation prior to 30 days after | |||||
your receipt of the notice. | |||||
Termination of your rights under this section does not terminate the | |||||
licenses of parties who have received copies or rights from you under | |||||
this License. If your rights have been terminated and not permanently | |||||
reinstated, you do not qualify to receive new licenses for the same | |||||
material under section 10. | |||||
9. Acceptance Not Required for Having Copies. | |||||
You are not required to accept this License in order to receive or | |||||
run a copy of the Program. Ancillary propagation of a covered work | |||||
occurring solely as a consequence of using peer-to-peer transmission | |||||
to receive a copy likewise does not require acceptance. However, | |||||
nothing other than this License grants you permission to propagate or | |||||
modify any covered work. These actions infringe copyright if you do | |||||
not accept this License. Therefore, by modifying or propagating a | |||||
covered work, you indicate your acceptance of this License to do so. | |||||
10. Automatic Licensing of Downstream Recipients. | |||||
Each time you convey a covered work, the recipient automatically | |||||
receives a license from the original licensors, to run, modify and | |||||
propagate that work, subject to this License. You are not responsible | |||||
for enforcing compliance by third parties with this License. | |||||
An "entity transaction" is a transaction transferring control of an | |||||
organization, or substantially all assets of one, or subdividing an | |||||
organization, or merging organizations. If propagation of a covered | |||||
work results from an entity transaction, each party to that | |||||
transaction who receives a copy of the work also receives whatever | |||||
licenses to the work the party's predecessor in interest had or could | |||||
give under the previous paragraph, plus a right to possession of the | |||||
Corresponding Source of the work from the predecessor in interest, if | |||||
the predecessor has it or can get it with reasonable efforts. | |||||
You may not impose any further restrictions on the exercise of the | |||||
rights granted or affirmed under this License. For example, you may | |||||
not impose a license fee, royalty, or other charge for exercise of | |||||
rights granted under this License, and you may not initiate litigation | |||||
(including a cross-claim or counterclaim in a lawsuit) alleging that | |||||
any patent claim is infringed by making, using, selling, offering for | |||||
sale, or importing the Program or any portion of it. | |||||
11. Patents. | |||||
A "contributor" is a copyright holder who authorizes use under this | |||||
License of the Program or a work on which the Program is based. The | |||||
work thus licensed is called the contributor's "contributor version". | |||||
A contributor's "essential patent claims" are all patent claims | |||||
owned or controlled by the contributor, whether already acquired or | |||||
hereafter acquired, that would be infringed by some manner, permitted | |||||
by this License, of making, using, or selling its contributor version, | |||||
but do not include claims that would be infringed only as a | |||||
consequence of further modification of the contributor version. For | |||||
purposes of this definition, "control" includes the right to grant | |||||
patent sublicenses in a manner consistent with the requirements of | |||||
this License. | |||||
Each contributor grants you a non-exclusive, worldwide, royalty-free | |||||
patent license under the contributor's essential patent claims, to | |||||
make, use, sell, offer for sale, import and otherwise run, modify and | |||||
propagate the contents of its contributor version. | |||||
In the following three paragraphs, a "patent license" is any express | |||||
agreement or commitment, however denominated, not to enforce a patent | |||||
(such as an express permission to practice a patent or covenant not to | |||||
sue for patent infringement). To "grant" such a patent license to a | |||||
party means to make such an agreement or commitment not to enforce a | |||||
patent against the party. | |||||
If you convey a covered work, knowingly relying on a patent license, | |||||
and the Corresponding Source of the work is not available for anyone | |||||
to copy, free of charge and under the terms of this License, through a | |||||
publicly available network server or other readily accessible means, | |||||
then you must either (1) cause the Corresponding Source to be so | |||||
available, or (2) arrange to deprive yourself of the benefit of the | |||||
patent license for this particular work, or (3) arrange, in a manner | |||||
consistent with the requirements of this License, to extend the patent | |||||
license to downstream recipients. "Knowingly relying" means you have | |||||
actual knowledge that, but for the patent license, your conveying the | |||||
covered work in a country, or your recipient's use of the covered work | |||||
in a country, would infringe one or more identifiable patents in that | |||||
country that you have reason to believe are valid. | |||||
If, pursuant to or in connection with a single transaction or | |||||
arrangement, you convey, or propagate by procuring conveyance of, a | |||||
covered work, and grant a patent license to some of the parties | |||||
receiving the covered work authorizing them to use, propagate, modify | |||||
or convey a specific copy of the covered work, then the patent license | |||||
you grant is automatically extended to all recipients of the covered | |||||
work and works based on it. | |||||
A patent license is "discriminatory" if it does not include within | |||||
the scope of its coverage, prohibits the exercise of, or is | |||||
conditioned on the non-exercise of one or more of the rights that are | |||||
specifically granted under this License. You may not convey a covered | |||||
work if you are a party to an arrangement with a third party that is | |||||
in the business of distributing software, under which you make payment | |||||
to the third party based on the extent of your activity of conveying | |||||
the work, and under which the third party grants, to any of the | |||||
parties who would receive the covered work from you, a discriminatory | |||||
patent license (a) in connection with copies of the covered work | |||||
conveyed by you (or copies made from those copies), or (b) primarily | |||||
for and in connection with specific products or compilations that | |||||
contain the covered work, unless you entered into that arrangement, | |||||
or that patent license was granted, prior to 28 March 2007. | |||||
Nothing in this License shall be construed as excluding or limiting | |||||
any implied license or other defenses to infringement that may | |||||
otherwise be available to you under applicable patent law. | |||||
12. No Surrender of Others' Freedom. | |||||
If conditions are imposed on you (whether by court order, agreement or | |||||
otherwise) that contradict the conditions of this License, they do not | |||||
excuse you from the conditions of this License. If you cannot convey a | |||||
covered work so as to satisfy simultaneously your obligations under this | |||||
License and any other pertinent obligations, then as a consequence you may | |||||
not convey it at all. For example, if you agree to terms that obligate you | |||||
to collect a royalty for further conveying from those to whom you convey | |||||
the Program, the only way you could satisfy both those terms and this | |||||
License would be to refrain entirely from conveying the Program. | |||||
13. Use with the GNU Affero General Public License. | |||||
Notwithstanding any other provision of this License, you have | |||||
permission to link or combine any covered work with a work licensed | |||||
under version 3 of the GNU Affero General Public License into a single | |||||
combined work, and to convey the resulting work. The terms of this | |||||
License will continue to apply to the part which is the covered work, | |||||
but the special requirements of the GNU Affero General Public License, | |||||
section 13, concerning interaction through a network will apply to the | |||||
combination as such. | |||||
14. Revised Versions of this License. | |||||
The Free Software Foundation may publish revised and/or new versions of | |||||
the GNU General Public License from time to time. Such new versions will | |||||
be similar in spirit to the present version, but may differ in detail to | |||||
address new problems or concerns. | |||||
Each version is given a distinguishing version number. If the | |||||
Program specifies that a certain numbered version of the GNU General | |||||
Public License "or any later version" applies to it, you have the | |||||
option of following the terms and conditions either of that numbered | |||||
version or of any later version published by the Free Software | |||||
Foundation. If the Program does not specify a version number of the | |||||
GNU General Public License, you may choose any version ever published | |||||
by the Free Software Foundation. | |||||
If the Program specifies that a proxy can decide which future | |||||
versions of the GNU General Public License can be used, that proxy's | |||||
public statement of acceptance of a version permanently authorizes you | |||||
to choose that version for the Program. | |||||
Later license versions may give you additional or different | |||||
permissions. However, no additional obligations are imposed on any | |||||
author or copyright holder as a result of your choosing to follow a | |||||
later version. | |||||
15. Disclaimer of Warranty. | |||||
THERE IS NO WARRANTY FOR THE PROGRAM, TO THE EXTENT PERMITTED BY | |||||
APPLICABLE LAW. EXCEPT WHEN OTHERWISE STATED IN WRITING THE COPYRIGHT | |||||
HOLDERS AND/OR OTHER PARTIES PROVIDE THE PROGRAM "AS IS" WITHOUT WARRANTY | |||||
OF ANY KIND, EITHER EXPRESSED OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, | |||||
THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR | |||||
PURPOSE. THE ENTIRE RISK AS TO THE QUALITY AND PERFORMANCE OF THE PROGRAM | |||||
IS WITH YOU. SHOULD THE PROGRAM PROVE DEFECTIVE, YOU ASSUME THE COST OF | |||||
ALL NECESSARY SERVICING, REPAIR OR CORRECTION. | |||||
16. Limitation of Liability. | |||||
IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW OR AGREED TO IN WRITING | |||||
WILL ANY COPYRIGHT HOLDER, OR ANY OTHER PARTY WHO MODIFIES AND/OR CONVEYS | |||||
THE PROGRAM AS PERMITTED ABOVE, BE LIABLE TO YOU FOR DAMAGES, INCLUDING ANY | |||||
GENERAL, SPECIAL, INCIDENTAL OR CONSEQUENTIAL DAMAGES ARISING OUT OF THE | |||||
USE OR INABILITY TO USE THE PROGRAM (INCLUDING BUT NOT LIMITED TO LOSS OF | |||||
DATA OR DATA BEING RENDERED INACCURATE OR LOSSES SUSTAINED BY YOU OR THIRD | |||||
PARTIES OR A FAILURE OF THE PROGRAM TO OPERATE WITH ANY OTHER PROGRAMS), | |||||
EVEN IF SUCH HOLDER OR OTHER PARTY HAS BEEN ADVISED OF THE POSSIBILITY OF | |||||
SUCH DAMAGES. | |||||
17. Interpretation of Sections 15 and 16. | |||||
If the disclaimer of warranty and limitation of liability provided | |||||
above cannot be given local legal effect according to their terms, | |||||
reviewing courts shall apply local law that most closely approximates | |||||
an absolute waiver of all civil liability in connection with the | |||||
Program, unless a warranty or assumption of liability accompanies a | |||||
copy of the Program in return for a fee. | |||||
END OF TERMS AND CONDITIONS | |||||
How to Apply These Terms to Your New Programs | |||||
If you develop a new program, and you want it to be of the greatest | |||||
possible use to the public, the best way to achieve this is to make it | |||||
free software which everyone can redistribute and change under these terms. | |||||
To do so, attach the following notices to the program. It is safest | |||||
to attach them to the start of each source file to most effectively | |||||
state the exclusion of warranty; and each file should have at least | |||||
the "copyright" line and a pointer to where the full notice is found. | |||||
<one line to give the program's name and a brief idea of what it does.> | |||||
Copyright (C) <year> <name of author> | |||||
This program is free software: you can redistribute it and/or modify | |||||
it under the terms of the GNU General Public License as published by | |||||
the Free Software Foundation, either version 3 of the License, or | |||||
(at your option) any later version. | |||||
This program is distributed in the hope that it will be useful, | |||||
but WITHOUT ANY WARRANTY; without even the implied warranty of | |||||
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the | |||||
GNU General Public License for more details. | |||||
You should have received a copy of the GNU General Public License | |||||
along with this program. If not, see <http://www.gnu.org/licenses/>. | |||||
Also add information on how to contact you by electronic and paper mail. | |||||
If the program does terminal interaction, make it output a short | |||||
notice like this when it starts in an interactive mode: | |||||
<program> Copyright (C) <year> <name of author> | |||||
This program comes with ABSOLUTELY NO WARRANTY; for details type `show w'. | |||||
This is free software, and you are welcome to redistribute it | |||||
under certain conditions; type `show c' for details. | |||||
The hypothetical commands `show w' and `show c' should show the appropriate | |||||
parts of the General Public License. Of course, your program's commands | |||||
might be different; for a GUI interface, you would use an "about box". | |||||
You should also get your employer (if you work as a programmer) or school, | |||||
if any, to sign a "copyright disclaimer" for the program, if necessary. | |||||
For more information on this, and how to apply and follow the GNU GPL, see | |||||
<http://www.gnu.org/licenses/>. | |||||
The GNU General Public License does not permit incorporating your program | |||||
into proprietary programs. If your program is a subroutine library, you | |||||
may consider it more useful to permit linking proprietary applications with | |||||
the library. If this is what you want to do, use the GNU Lesser General | |||||
Public License instead of this License. But first, please read | |||||
<http://www.gnu.org/philosophy/why-not-lgpl.html>. |
@@ -0,0 +1,23 @@
# About graph kernels.

## (Random walk) Sylvester equation kernel.

### ImportError: cannot import name 'frange' from 'matplotlib.mlab'

You are using an outdated `control` package with a recent `matplotlib`: `mlab.frange` was removed in `matplotlib-3.1.0`, and `control` stopped calling it as of `control-0.8.2`. Update your `control` package to 0.8.2 or later.
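For instance, upgrading in place with pip should be enough:
```
$ pip install -U "control>=0.8.2"
```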
### Intel MKL FATAL ERROR: Cannot load libmkl_avx2.so or libmkl_def.so.

The Intel Math Kernel Library (MKL) is missing or not properly set up. I assume MKL is required by the `control` module.

Install MKL. Then add the following to your environment (adjust the paths to match your MKL installation):
```
export PATH=/opt/intel/bin:$PATH
export LD_LIBRARY_PATH=/opt/intel/lib/intel64:/opt/intel/mkl/lib/intel64:$LD_LIBRARY_PATH
export LD_PRELOAD=/opt/intel/mkl/lib/intel64/libmkl_def.so:/opt/intel/mkl/lib/intel64/libmkl_avx2.so:/opt/intel/mkl/lib/intel64/libmkl_core.so:/opt/intel/mkl/lib/intel64/libmkl_intel_lp64.so:/opt/intel/mkl/lib/intel64/libmkl_intel_thread.so:/opt/intel/lib/intel64_lin/libiomp5.so
```
@@ -0,0 +1,165 @@ | |||||
# graphkit-learn | |||||
[](https://travis-ci.org/jajupmochi/graphkit-learn) [](https://ci.appveyor.com/project/jajupmochi/graphkit-learn) [](https://codecov.io/gh/jajupmochi/graphkit-learn) [](https://graphkit-learn.readthedocs.io/en/master/?badge=master) [](https://badge.fury.io/py/graphkit-learn) | |||||
A Python package for graph kernels, graph edit distances and graph pre-image problem. | |||||
## Requirements | |||||
* python>=3.6 | |||||
* numpy>=1.16.2 | |||||
* scipy>=1.1.0 | |||||
* matplotlib>=3.1.0 | |||||
* networkx>=2.2 | |||||
* scikit-learn>=0.20.0 | |||||
* tabulate>=0.8.2 | |||||
* tqdm>=4.26.0 | |||||
* control>=0.8.2 (for generalized random walk kernels only) | |||||
* slycot>0.4.0 (for generalized random walk kernels only, which requires a fortran compiler, gfortran for example) | |||||
## How to use? | |||||
### Install the library | |||||
* Install stable version from PyPI (may not be up-to-date): | |||||
``` | |||||
$ pip install graphkit-learn | |||||
``` | |||||
* Install latest version from GitHub: | |||||
``` | |||||
$ git clone https://github.com/jajupmochi/graphkit-learn.git | |||||
$ cd graphkit-learn/ | |||||
$ python setup.py install | |||||
``` | |||||
### Run the test | |||||
A series of [tests](https://github.com/jajupmochi/graphkit-learn/tree/master/gklearn/tests) can be run to check if the library works correctly: | |||||
``` | |||||
$ pip install -U pip pytest codecov coverage pytest-cov | |||||
$ pytest -v --cov-config=.coveragerc --cov-report term --cov=gklearn gklearn/tests/ | |||||
``` | |||||
### Check examples | |||||
A series of demos of using the library can be found on [Google Colab](https://drive.google.com/drive/folders/1r2gtPuFzIys2_MZw1wXqE2w3oCoVoQUG?usp=sharing) and in the [`example`](https://github.com/jajupmochi/graphkit-learn/tree/master/gklearn/examples) folder. | |||||
### Other demos | |||||
Check the [`notebooks`](https://github.com/jajupmochi/graphkit-learn/tree/master/notebooks) directory for more demos:
* [`notebooks`](https://github.com/jajupmochi/graphkit-learn/tree/master/notebooks) directory includes test codes of graph kernels based on linear patterns; | |||||
* [`notebooks/tests`](https://github.com/jajupmochi/graphkit-learn/tree/master/notebooks/tests) directory includes codes that test some libraries and functions; | |||||
* [`notebooks/utils`](https://github.com/jajupmochi/graphkit-learn/tree/master/notebooks/utils) directory includes some useful tools, such as a Gram matrix checker and a function to get properties of datasets; | |||||
* [`notebooks/else`](https://github.com/jajupmochi/graphkit-learn/tree/master/notebooks/else) directory includes other codes that we used for experiments. | |||||
### Documentation | |||||
The docs of the library can be found [here](https://graphkit-learn.readthedocs.io/en/master/?badge=master). | |||||
## Main contents | |||||
### 1 List of graph kernels | |||||
* Based on walks | |||||
* [The common walk kernel](https://github.com/jajupmochi/graphkit-learn/tree/master/gklearn/kernels/common_walk.py) [1] | |||||
* Exponential | |||||
* Geometric | |||||
* [The marginalized kernel](https://github.com/jajupmochi/graphkit-learn/tree/master/gklearn/kernels/marginalized.py)
* With tottering [2] | |||||
* Without tottering [7] | |||||
* [The generalized random walk kernel](https://github.com/jajupmochi/graphkit-learn/tree/master/gklearn/kernels/random_walk.py) [3] | |||||
* [Sylvester equation](https://github.com/jajupmochi/graphkit-learn/tree/master/gklearn/kernels/sylvester_equation.py) | |||||
* Conjugate gradient | |||||
* Fixed-point iterations | |||||
* [Spectral decomposition](https://github.com/jajupmochi/graphkit-learn/tree/master/gklearn/kernels/spectral_decomposition.py) | |||||
* Based on paths | |||||
* [The shortest path kernel](https://github.com/jajupmochi/graphkit-learn/tree/master/gklearn/kernels/shortest_path.py) [4] | |||||
* [The structural shortest path kernel](https://github.com/jajupmochi/graphkit-learn/tree/master/gklearn/kernels/structural_sp.py) [5] | |||||
* [The path kernel up to length h](https://github.com/jajupmochi/graphkit-learn/tree/master/gklearn/kernels/path_up_to_h.py) [6] | |||||
* The Tanimoto kernel | |||||
* The MinMax kernel | |||||
* Non-linear kernels | |||||
* [The treelet kernel](https://github.com/jajupmochi/graphkit-learn/tree/master/gklearn/kernels/treelet.py) [10] | |||||
* [Weisfeiler-Lehman kernel](https://github.com/jajupmochi/graphkit-learn/tree/master/gklearn/kernels/weisfeiler_lehman.py) [11] | |||||
* [Subtree](https://github.com/jajupmochi/graphkit-learn/tree/master/gklearn/kernels/weisfeiler_lehman.py#L479) | |||||
A demo of computing graph kernels can be found on [Google Colab](https://colab.research.google.com/drive/17Q2QCl9CAtDweGF8LiWnWoN2laeJqT0u?usp=sharing) and in the [`examples`](https://github.com/jajupmochi/graphkit-learn/blob/master/gklearn/examples/compute_graph_kernel.py) folder. | |||||
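For a quick feel of the API, here is a minimal sketch mirroring the linked demo (the parameter values are illustrative):
```
import multiprocessing
from gklearn.utils import Dataset
from gklearn.kernels import PathUpToH

# Load the predefined dataset "MUTAG".
dataset = Dataset()
dataset.load_predefined_dataset('MUTAG')

# Initialize the path kernel up to length h and compute the Gram matrix.
graph_kernel = PathUpToH(node_labels=dataset.node_labels,
                         edge_labels=dataset.edge_labels,
                         ds_infos=dataset.get_dataset_infos(keys=['directed']),
                         depth=3, k_func='MinMax', compute_method='trie')
gram_matrix, run_time = graph_kernel.compute(dataset.graphs,
                                             parallel='imap_unordered',
                                             n_jobs=multiprocessing.cpu_count(),
                                             normalize=True,
                                             verbose=2)
```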
### 2 Graph Edit Distances | |||||
### 3 Graph preimage methods | |||||
A demo of generating graph preimages can be found on [Google Colab](https://colab.research.google.com/drive/1PIDvHOcmiLEQ5Np3bgBDdu0kLOquOMQK?usp=sharing) and in the [`examples`](https://github.com/jajupmochi/graphkit-learn/blob/master/gklearn/examples/median_preimege_generator.py) folder. | |||||
### 4 Interface to `GEDLIB` | |||||
[`GEDLIB`](https://github.com/dbblumenthal/gedlib) is an easily extensible C++ library for (suboptimally) computing the graph edit distance between attributed graphs. [A Python interface](https://github.com/jajupmochi/graphkit-learn/tree/master/gklearn/gedlib) for `GEDLIB` is integrated into this library, based on the [`gedlibpy`](https://github.com/Ryurin/gedlibpy) library.
### 5 Computation optimization methods | |||||
* Python’s `multiprocessing.Pool` module is applied to **parallelize** the computation of all kernels as well as the model selection.
* **The Fast Computation of Shortest Path Kernel (FCSP) method** [8] is implemented in *the random walk kernel*, *the shortest path kernel*, and *the structural shortest path kernel*, where FCSP is applied to both vertex and edge kernels.
* **The trie data structure** [9] is employed in *the path kernel up to length h* to store paths in graphs. | |||||
## Issues | |||||
* This library uses the `multiprocessing.Pool.imap_unordered` function for parallelization, which may not run correctly on Windows. For now, Windows users may need to comment out the parallel code and uncomment the serial code below it. We will consider adding a parameter to switch between serial and parallel computation as needed.
* Some modules (such as `NumPy`, `SciPy`, `scikit-learn`) use [`OpenBLAS`](https://www.openblas.net/) to perform parallel computation by default, which conflicts with other parallelization modules such as `multiprocessing.Pool` and greatly increases the computing time. Setting its thread count to 1 forces `OpenBLAS` to use a single thread/CPU, which avoids the conflicts. For now, this has to be done manually. Under Linux, type this command in a terminal before running the code:
``` | |||||
$ export OPENBLAS_NUM_THREADS=1 | |||||
``` | |||||
Or add `export OPENBLAS_NUM_THREADS=1` at the end of your `~/.bashrc` file, then run | |||||
``` | |||||
$ source ~/.bashrc | |||||
``` | |||||
to make the setting permanent.
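Alternatively, the variable can be set at the top of a Python script, before `numpy`/`scipy` are imported (a minimal sketch; the key point is that it must happen before the first import):
```
import os
os.environ['OPENBLAS_NUM_THREADS'] = '1'  # must precede the numpy/scipy imports

import numpy as np  # OpenBLAS now runs single-threaded
```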
## Results | |||||
Check this paper for a detailed description of the graph kernels and the experimental results:
Linlin Jia, Benoit Gaüzère, and Paul Honeine. Graph Kernels Based on Linear Patterns: Theoretical and Experimental Comparisons. working paper or preprint, March 2019. URL https://hal-normandie-univ.archives-ouvertes.fr/hal-02053946. | |||||
A comparison of performances of graph kernels on benchmark datasets can be found [here](https://graphkit-learn.readthedocs.io/en/master/experiments.html). | |||||
## How to contribute | |||||
Fork the library and open a pull request! Make your own contribution to the community!
## Authors | |||||
* [Linlin Jia](https://jajupmochi.github.io/), LITIS, INSA Rouen Normandie | |||||
* [Benoit Gaüzère](http://pagesperso.litislab.fr/~bgauzere/#contact_en), LITIS, INSA Rouen Normandie | |||||
* [Paul Honeine](http://honeine.fr/paul/Welcome.html), LITIS, Université de Rouen Normandie | |||||
## Citation | |||||
Still waiting... | |||||
## Acknowledgments | |||||
This research was supported by CSC (China Scholarship Council) and the French national research agency (ANR) under the grant APi (ANR-18-CE23-0014). The authors would like to thank the CRIANN (Le Centre Régional Informatique et d’Applications Numériques de Normandie) for providing computational resources. | |||||
## References | |||||
[1] Thomas Gärtner, Peter Flach, and Stefan Wrobel. On graph kernels: Hardness results and efficient alternatives. Learning Theory and Kernel Machines, pages 129–143, 2003. | |||||
[2] H. Kashima, K. Tsuda, and A. Inokuchi. Marginalized kernels between labeled graphs. In Proceedings of the 20th International Conference on Machine Learning, Washington, DC, United States, 2003. | |||||
[3] Vishwanathan, S.V.N., Schraudolph, N.N., Kondor, R., Borgwardt, K.M., 2010. Graph kernels. Journal of Machine Learning Research 11, 1201–1242. | |||||
[4] K. M. Borgwardt and H.-P. Kriegel. Shortest-path kernels on graphs. In Proceedings of the International Conference on Data Mining, pages 74-81, 2005. | |||||
[5] Liva Ralaivola, Sanjay J Swamidass, Hiroto Saigo, and Pierre Baldi. Graph kernels for chemical informatics. Neural networks, 18(8):1093–1110, 2005. | |||||
[6] Suard F, Rakotomamonjy A, Bensrhair A. Kernel on Bag of Paths For Measuring Similarity of Shapes. InESANN 2007 Apr 25 (pp. 355-360). | |||||
[7] Mahé, P., Ueda, N., Akutsu, T., Perret, J.L., Vert, J.P., 2004. Extensions of marginalized graph kernels, in: Proc. the twenty-first international conference on Machine learning, ACM. p. 70. | |||||
[8] Lifan Xu, Wei Wang, M Alvarez, John Cavazos, and Dongping Zhang. Parallelization of shortest path graph kernels on multi-core cpus and gpus. Proceedings of the Programmability Issues for Heterogeneous Multicores (MultiProg), Vienna, Austria, 2014. | |||||
[9] Edward Fredkin. Trie memory. Communications of the ACM, 3(9):490–499, 1960. | |||||
[10] Gaüzere, B., Brun, L., Villemin, D., 2012. Two new graphs kernels in chemoinformatics. Pattern Recognition Letters 33, 2038–2047. | |||||
[11] Shervashidze, N., Schweitzer, P., Leeuwen, E.J.v., Mehlhorn, K., Borgwardt, K.M., 2011. Weisfeiler-lehman graph kernels. Journal of Machine Learning Research 12, 2539–2561. |
@@ -0,0 +1,19 @@ | |||||
# Minimal makefile for Sphinx documentation | |||||
# | |||||
# You can set these variables from the command line. | |||||
SPHINXOPTS = | |||||
SPHINXBUILD = sphinx-build | |||||
SOURCEDIR = source | |||||
BUILDDIR = build | |||||
# Put it first so that "make" without argument is like "make help". | |||||
help: | |||||
@$(SPHINXBUILD) -M help "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O) | |||||
.PHONY: help Makefile | |||||
# Catch-all target: route all unknown targets to Sphinx using the new | |||||
# "make mode" option. $(O) is meant as a shortcut for $(SPHINXOPTS). | |||||
%: Makefile | |||||
@$(SPHINXBUILD) -M $@ "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O) |
@@ -0,0 +1,5 @@ | |||||
sphinx-apidoc -o docs/ gklearn/ --separate | |||||
sphinx-apidoc -o source/ ../gklearn/ --separate --force --module-first --no-toc | |||||
make html |
@@ -0,0 +1,35 @@ | |||||
@ECHO OFF | |||||
pushd %~dp0 | |||||
REM Command file for Sphinx documentation | |||||
if "%SPHINXBUILD%" == "" ( | |||||
set SPHINXBUILD=sphinx-build | |||||
) | |||||
set SOURCEDIR=source | |||||
set BUILDDIR=build | |||||
if "%1" == "" goto help | |||||
%SPHINXBUILD% >NUL 2>NUL | |||||
if errorlevel 9009 ( | |||||
echo. | |||||
echo.The 'sphinx-build' command was not found. Make sure you have Sphinx | |||||
echo.installed, then set the SPHINXBUILD environment variable to point | |||||
echo.to the full path of the 'sphinx-build' executable. Alternatively you | |||||
echo.may add the Sphinx directory to PATH. | |||||
echo. | |||||
echo.If you don't have Sphinx installed, grab it from | |||||
echo.http://sphinx-doc.org/ | |||||
exit /b 1 | |||||
) | |||||
%SPHINXBUILD% -M %1 %SOURCEDIR% %BUILDDIR% %SPHINXOPTS% | |||||
goto end | |||||
:help | |||||
%SPHINXBUILD% -M help %SOURCEDIR% %BUILDDIR% %SPHINXOPTS% | |||||
:end | |||||
popd |
@@ -0,0 +1,4 @@ | |||||
sphinx | |||||
m2r | |||||
nbsphinx | |||||
ipykernel |
@@ -0,0 +1,194 @@ | |||||
# -*- coding: utf-8 -*- | |||||
# | |||||
# Configuration file for the Sphinx documentation builder. | |||||
# | |||||
# This file does only contain a selection of the most common options. For a | |||||
# full list see the documentation: | |||||
# http://www.sphinx-doc.org/en/master/config | |||||
# -- Path setup -------------------------------------------------------------- | |||||
# If extensions (or modules to document with autodoc) are in another directory, | |||||
# add these directories to sys.path here. If the directory is relative to the | |||||
# documentation root, use os.path.abspath to make it absolute, like shown here. | |||||
# | |||||
import os | |||||
import sys | |||||
sys.path.insert(0, os.path.abspath('.')) | |||||
# sys.path.insert(0, os.path.abspath('..')) | |||||
sys.path.insert(0, '../') | |||||
sys.path.insert(0, '../../') | |||||
# -- Project information ----------------------------------------------------- | |||||
project = 'graphkit-learn' | |||||
copyright = '2020, Linlin Jia' | |||||
author = 'Linlin Jia' | |||||
# The short X.Y version | |||||
version = '' | |||||
# The full version, including alpha/beta/rc tags | |||||
release = '1.0.0' | |||||
# -- General configuration --------------------------------------------------- | |||||
# If your documentation needs a minimal Sphinx version, state it here. | |||||
# | |||||
# needs_sphinx = '1.0' | |||||
# Add any Sphinx extension module names here, as strings. They can be | |||||
# extensions coming with Sphinx (named 'sphinx.ext.*') or your custom | |||||
# ones. | |||||
extensions = [ | |||||
'sphinx.ext.autodoc', | |||||
'sphinx.ext.doctest', | |||||
'sphinx.ext.todo', | |||||
'sphinx.ext.coverage', | |||||
'sphinx.ext.mathjax', | |||||
'sphinx.ext.ifconfig', | |||||
'sphinx.ext.viewcode', | |||||
'm2r', | |||||
] | |||||
# Add any paths that contain templates here, relative to this directory. | |||||
templates_path = ['_templates'] | |||||
# The suffix(es) of source filenames. | |||||
# You can specify multiple suffix as a list of string: | |||||
# | |||||
source_suffix = ['.rst', '.md'] | |||||
# source_suffix = '.rst' | |||||
# The master toctree document. | |||||
master_doc = 'index' | |||||
# The language for content autogenerated by Sphinx. Refer to documentation | |||||
# for a list of supported languages. | |||||
# | |||||
# This is also used if you do content translation via gettext catalogs. | |||||
# Usually you set "language" from the command line for these cases. | |||||
language = None | |||||
# List of patterns, relative to source directory, that match files and | |||||
# directories to ignore when looking for source files. | |||||
# This pattern also affects html_static_path and html_extra_path. | |||||
exclude_patterns = [] | |||||
# The name of the Pygments (syntax highlighting) style to use. | |||||
pygments_style = None | |||||
# -- Options for HTML output ------------------------------------------------- | |||||
# The theme to use for HTML and HTML Help pages. See the documentation for | |||||
# a list of builtin themes. | |||||
# | |||||
# html_theme = 'alabaster' | |||||
html_theme = 'sphinx_rtd_theme' | |||||
# Theme options are theme-specific and customize the look and feel of a theme | |||||
# further. For a list of options available for each theme, see the | |||||
# documentation. | |||||
# | |||||
# html_theme_options = {} | |||||
# Add any paths that contain custom static files (such as style sheets) here, | |||||
# relative to this directory. They are copied after the builtin static files, | |||||
# so a file named "default.css" will overwrite the builtin "default.css". | |||||
html_static_path = ['_static'] | |||||
# Custom sidebar templates, must be a dictionary that maps document names | |||||
# to template names. | |||||
# | |||||
# The default sidebars (for documents that don't match any pattern) are | |||||
# defined by theme itself. Builtin themes are using these templates by | |||||
# default: ``['localtoc.html', 'relations.html', 'sourcelink.html', | |||||
# 'searchbox.html']``. | |||||
# | |||||
# html_sidebars = {} | |||||
# -- Options for HTMLHelp output --------------------------------------------- | |||||
# Output file base name for HTML help builder. | |||||
htmlhelp_basename = 'graphkit-learndoc' | |||||
# -- Options for LaTeX output ------------------------------------------------ | |||||
latex_elements = { | |||||
# The paper size ('letterpaper' or 'a4paper'). | |||||
# | |||||
# 'papersize': 'letterpaper', | |||||
# The font size ('10pt', '11pt' or '12pt'). | |||||
# | |||||
# 'pointsize': '10pt', | |||||
# Additional stuff for the LaTeX preamble. | |||||
# | |||||
# 'preamble': '', | |||||
# Latex figure (float) alignment | |||||
# | |||||
# 'figure_align': 'htbp', | |||||
} | |||||
# Grouping the document tree into LaTeX files. List of tuples | |||||
# (source start file, target name, title, | |||||
# author, documentclass [howto, manual, or own class]). | |||||
latex_documents = [ | |||||
(master_doc, 'graphkit-learn.tex', 'graphkit-learn Documentation', | |||||
'Linlin Jia', 'manual'), | |||||
] | |||||
# -- Options for manual page output ------------------------------------------ | |||||
# One entry per manual page. List of tuples | |||||
# (source start file, name, description, authors, manual section). | |||||
man_pages = [ | |||||
(master_doc, 'graphkit-learn', 'graphkit-learn Documentation', | |||||
[author], 1) | |||||
] | |||||
# -- Options for Texinfo output ---------------------------------------------- | |||||
# Grouping the document tree into Texinfo files. List of tuples | |||||
# (source start file, target name, title, author, | |||||
# dir menu entry, description, category) | |||||
texinfo_documents = [ | |||||
(master_doc, 'graphkit-learn', 'graphkit-learn Documentation', | |||||
author, 'graphkit-learn', 'One line description of project.', | |||||
'Miscellaneous'), | |||||
] | |||||
# -- Options for Epub output ------------------------------------------------- | |||||
# Bibliographic Dublin Core info. | |||||
epub_title = project | |||||
# The unique identifier of the text. This can be a ISBN number | |||||
# or the project homepage. | |||||
# | |||||
# epub_identifier = '' | |||||
# A unique identification for the text. | |||||
# | |||||
# epub_uid = '' | |||||
# A list of files that should not be packed into the epub file. | |||||
epub_exclude_files = ['search.html'] | |||||
# -- Extension configuration ------------------------------------------------- | |||||
# -- Options for todo extension ---------------------------------------------- | |||||
# If true, `todo` and `todoList` produce output, else they produce nothing. | |||||
todo_include_todos = True | |||||
add_module_names = False |
@@ -0,0 +1,22 @@ | |||||
Experiments | |||||
=========== | |||||
To exhibit the effectiveness and practicability of the `graphkit-learn` library, we tested it on several benchmark datasets. See `(Kersting et al., 2016) <http://graphkernels.cs.tu-dortmund.de>`__ for details on these datasets.
A two-layer nested cross-validation (CV) is applied to select and evaluate models: the outer CV randomly splits the dataset into 10 folds, using 9 of them as the validation set; the inner CV then randomly splits the validation set into 10 folds, using 9 of them as the training set. The whole procedure is performed 30 times, and the average performance is computed over these trials. Possible parameters of each graph kernel are also tuned during this procedure.
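The sketch below illustrates this nested scheme with scikit-learn on a precomputed Gram matrix; the toy Gram matrix, splitters and parameter grid are illustrative assumptions, not the exact implementation used in the experiments.

.. code-block:: python

    import numpy as np
    from sklearn.model_selection import GridSearchCV, ShuffleSplit, cross_val_score
    from sklearn.svm import SVC

    # Toy stand-ins for a real Gram matrix and classification targets.
    rng = np.random.RandomState(0)
    features = rng.rand(100, 5)
    gram_matrix = features @ features.T  # precomputed kernel, PSD by construction
    y = rng.randint(0, 2, size=100)

    # Inner CV tunes the SVM penalty C; outer CV evaluates the tuned model.
    inner_cv = ShuffleSplit(n_splits=10, test_size=0.1, random_state=0)
    outer_cv = ShuffleSplit(n_splits=10, test_size=0.1, random_state=0)
    clf = GridSearchCV(SVC(kernel='precomputed'),
                       {'C': np.logspace(-10, 10, num=41, base=10)},
                       cv=inner_cv)
    print(cross_val_score(clf, gram_matrix, y, cv=outer_cv).mean())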
The machine used to execute the experiments is a cluster with 28 CPU cores of Intel(R) Xeon(R) E5-2680 v4 @ 2.40GHz, 252GB memory, and the 64-bit operating system CentOS Linux release 7.3.1611. All results were obtained with Python 3.5.2.
The figure below exhibits the accuracies achieved by the graph kernels implemented in the `graphkit-learn` library, in terms of regression error (the upper table) and classification rate (the lower table). Red indicates the worst results and dark green the best. Gray cells with the “inf” marker indicate that the computation of the graph kernel on the dataset was omitted because it consumes far more computational resources than the other kernels.
.. image:: figures/all_test_accuracy.svg | |||||
:width: 600 | |||||
:alt: accuracies | |||||
The figure below displays the computational time consumed to compute the Gram matrix of each graph
kernel (in :math:`\log_{10}` of seconds) on each dataset. Color legends have the same meaning as in the figure above.
.. image:: figures/all_ave_gm_times.svg | |||||
:width: 600 | |||||
:alt: computational time | |||||
@@ -0,0 +1,7 @@ | |||||
gklearn.kernels.commonWalkKernel | |||||
================================ | |||||
.. automodule:: gklearn.kernels.commonWalkKernel | |||||
:members: | |||||
:undoc-members: | |||||
:show-inheritance: |
@@ -0,0 +1,7 @@ | |||||
gklearn.kernels.marginalizedKernel | |||||
================================== | |||||
.. automodule:: gklearn.kernels.marginalizedKernel | |||||
:members: | |||||
:undoc-members: | |||||
:show-inheritance: |
@@ -0,0 +1,7 @@ | |||||
gklearn.kernels.randomWalkKernel | |||||
================================ | |||||
.. automodule:: gklearn.kernels.randomWalkKernel | |||||
:members: | |||||
:undoc-members: | |||||
:show-inheritance: |
@@ -0,0 +1,19 @@ | |||||
gklearn.kernels | |||||
=============== | |||||
.. automodule:: gklearn.kernels | |||||
:members: | |||||
:undoc-members: | |||||
:show-inheritance: | |||||
.. toctree:: | |||||
gklearn.kernels.commonWalkKernel | |||||
gklearn.kernels.marginalizedKernel | |||||
gklearn.kernels.randomWalkKernel | |||||
gklearn.kernels.spKernel | |||||
gklearn.kernels.structuralspKernel | |||||
gklearn.kernels.treeletKernel | |||||
gklearn.kernels.untilHPathKernel | |||||
gklearn.kernels.weisfeilerLehmanKernel | |||||
@@ -0,0 +1,7 @@ | |||||
gklearn.kernels.spKernel | |||||
======================== | |||||
.. automodule:: gklearn.kernels.spKernel | |||||
:members: | |||||
:undoc-members: | |||||
:show-inheritance: |
@@ -0,0 +1,7 @@ | |||||
gklearn.kernels.structuralspKernel | |||||
================================== | |||||
.. automodule:: gklearn.kernels.structuralspKernel | |||||
:members: | |||||
:undoc-members: | |||||
:show-inheritance: |
@@ -0,0 +1,7 @@ | |||||
gklearn.kernels.treeletKernel | |||||
============================= | |||||
.. automodule:: gklearn.kernels.treeletKernel | |||||
:members: | |||||
:undoc-members: | |||||
:show-inheritance: |
@@ -0,0 +1,7 @@ | |||||
gklearn.kernels.untilHPathKernel | |||||
================================ | |||||
.. automodule:: gklearn.kernels.untilHPathKernel | |||||
:members: | |||||
:undoc-members: | |||||
:show-inheritance: |
@@ -0,0 +1,7 @@ | |||||
gklearn.kernels.weisfeilerLehmanKernel | |||||
====================================== | |||||
.. automodule:: gklearn.kernels.weisfeilerLehmanKernel | |||||
:members: | |||||
:undoc-members: | |||||
:show-inheritance: |
@@ -0,0 +1,13 @@ | |||||
gklearn | |||||
======= | |||||
.. automodule:: gklearn | |||||
:members: | |||||
:undoc-members: | |||||
:show-inheritance: | |||||
.. toctree:: | |||||
gklearn.kernels | |||||
gklearn.utils | |||||
@@ -0,0 +1,7 @@ | |||||
gklearn.utils.graphdataset | |||||
========================== | |||||
.. automodule:: gklearn.utils.graphdataset | |||||
:members: | |||||
:undoc-members: | |||||
:show-inheritance: |
@@ -0,0 +1,7 @@ | |||||
gklearn.utils.graphfiles | |||||
======================== | |||||
.. automodule:: gklearn.utils.graphfiles | |||||
:members: | |||||
:undoc-members: | |||||
:show-inheritance: |
@@ -0,0 +1,7 @@ | |||||
gklearn.utils.kernels | |||||
===================== | |||||
.. automodule:: gklearn.utils.kernels | |||||
:members: | |||||
:undoc-members: | |||||
:show-inheritance: |
@@ -0,0 +1,7 @@ | |||||
gklearn.utils.model\_selection\_precomputed | |||||
=========================================== | |||||
.. automodule:: gklearn.utils.model_selection_precomputed | |||||
:members: | |||||
:undoc-members: | |||||
:show-inheritance: |
@@ -0,0 +1,7 @@ | |||||
gklearn.utils.parallel | |||||
====================== | |||||
.. automodule:: gklearn.utils.parallel | |||||
:members: | |||||
:undoc-members: | |||||
:show-inheritance: |
@@ -0,0 +1,19 @@ | |||||
gklearn.utils | |||||
============= | |||||
.. automodule:: gklearn.utils | |||||
:members: | |||||
:undoc-members: | |||||
:show-inheritance: | |||||
.. toctree:: | |||||
gklearn.utils.graphdataset | |||||
gklearn.utils.graphfiles | |||||
gklearn.utils.kernels | |||||
gklearn.utils.model_selection_precomputed | |||||
gklearn.utils.parallel | |||||
gklearn.utils.trie | |||||
gklearn.utils.utils | |||||
@@ -0,0 +1,7 @@ | |||||
gklearn.utils.trie | |||||
================== | |||||
.. automodule:: gklearn.utils.trie | |||||
:members: | |||||
:undoc-members: | |||||
:show-inheritance: |
@@ -0,0 +1,7 @@ | |||||
gklearn.utils.utils | |||||
=================== | |||||
.. automodule:: gklearn.utils.utils | |||||
:members: | |||||
:undoc-members: | |||||
:show-inheritance: |
@@ -0,0 +1,24 @@ | |||||
.. graphkit-learn documentation master file, created by | |||||
sphinx-quickstart on Wed Feb 12 15:06:37 2020. | |||||
You can adapt this file completely to your liking, but it should at least | |||||
contain the root `toctree` directive. | |||||
.. mdinclude:: ../../README.md | |||||
Documentation | |||||
------------- | |||||
.. toctree:: | |||||
:maxdepth: 1 | |||||
modules.rst | |||||
experiments.rst | |||||
Indices and tables | |||||
------------------ | |||||
* :ref:`genindex` | |||||
* :ref:`modindex` | |||||
* :ref:`search` |
@@ -0,0 +1,7 @@ | |||||
Modules | |||||
======= | |||||
.. toctree:: | |||||
:maxdepth: 4 | |||||
gklearn |
@@ -0,0 +1,21 @@ | |||||
# -*-coding:utf-8 -*- | |||||
""" | |||||
gklearn | |||||
This package contains the following subpackages:
* c_ext : bindings to C++ code
* ged : allows computing graph edit distances between networkX graphs
* kernels : computation of graph kernels, i.e. graph similarity measures compatible with SVMs
* notebooks : examples of code using this library
* utils : diverse computations on graphs
""" | |||||
# info | |||||
__version__ = "0.1" | |||||
__author__ = "Benoit Gaüzère" | |||||
__date__ = "November 2017" | |||||
# import sub modules | |||||
# from gklearn import c_ext | |||||
# from gklearn import ged | |||||
# from gklearn import utils |
@@ -0,0 +1,58 @@ | |||||
# -*- coding: utf-8 -*- | |||||
"""compute_graph_edit_distance.ipynb | |||||
Automatically generated by Colaboratory. | |||||
Original file is located at | |||||
https://colab.research.google.com/drive/1Wfgn7WVuyOQQgwOvdUQBz0BzEVdp0YM3 | |||||
**This script demonstrates how to compute a graph edit distance.** | |||||
--- | |||||
**0. Install `graphkit-learn`.** | |||||
""" | |||||
"""**1. Get dataset.**""" | |||||
from gklearn.utils import Dataset | |||||
# Predefined dataset name, use dataset "MUTAG". | |||||
ds_name = 'MUTAG' | |||||
# Initialize a Dataset. | |||||
dataset = Dataset() | |||||
# Load predefined dataset "MUTAG". | |||||
dataset.load_predefined_dataset(ds_name) | |||||
graph1 = dataset.graphs[0] | |||||
graph2 = dataset.graphs[1] | |||||
print(graph1, graph2) | |||||
"""**2. Compute graph edit distance.**""" | |||||
from gklearn.ged.env import GEDEnv | |||||
ged_env = GEDEnv() # initialize GED environment.
ged_env.set_edit_cost('CONSTANT', # GED cost type. | |||||
edit_cost_constants=[3, 3, 1, 3, 3, 1] # edit costs. | |||||
) | |||||
ged_env.add_nx_graph(graph1, '') # add graph1 | |||||
ged_env.add_nx_graph(graph2, '') # add graph2 | |||||
listID = ged_env.get_all_graph_ids() # get list IDs of graphs | |||||
ged_env.init(init_type='LAZY_WITHOUT_SHUFFLED_COPIES') # initialize GED environment. | |||||
options = {'initialization_method': 'RANDOM', # or 'NODE', etc. | |||||
'threads': 1 # parallel threads. | |||||
} | |||||
ged_env.set_method('BIPARTITE', # GED method. | |||||
options # options for GED method. | |||||
) | |||||
ged_env.init_method() # initialize GED method. | |||||
ged_env.run_method(listID[0], listID[1]) # run. | |||||
pi_forward = ged_env.get_forward_map(listID[0], listID[1]) # forward map. | |||||
pi_backward = ged_env.get_backward_map(listID[0], listID[1]) # backward map. | |||||
dis = ged_env.get_upper_bound(listID[0], listID[1]) # GED between two graphs.
print(pi_forward) | |||||
print(pi_backward) | |||||
print(dis) |
@@ -0,0 +1,73 @@ | |||||
# -*- coding: utf-8 -*- | |||||
"""compute_distance_in_kernel_space.ipynb | |||||
Automatically generated by Colaboratory. | |||||
Original file is located at | |||||
https://colab.research.google.com/drive/17tZP6IrineQmzo9sRtfZOnHpHx6HnlMA | |||||
**This script demonstrates how to compute distance in kernel space between the image of a graph and the mean of images of a group of graphs.** | |||||
--- | |||||
**0. Install `graphkit-learn`.** | |||||
""" | |||||
"""**1. Get dataset.**""" | |||||
from gklearn.utils import Dataset | |||||
# Predefined dataset name, use dataset "MUTAG". | |||||
ds_name = 'MUTAG' | |||||
# Initialize a Dataset. | |||||
dataset = Dataset() | |||||
# Load predefined dataset "MUTAG". | |||||
dataset.load_predefined_dataset(ds_name) | |||||
len(dataset.graphs) | |||||
"""**2. Compute graph kernel.**""" | |||||
from gklearn.kernels import PathUpToH | |||||
import multiprocessing | |||||
# Initialize parameters for graph kernel computation.
kernel_options = {'depth': 3, | |||||
'k_func': 'MinMax', | |||||
'compute_method': 'trie' | |||||
} | |||||
# Initialize graph kernel. | |||||
graph_kernel = PathUpToH(node_labels=dataset.node_labels, # list of node label names. | |||||
edge_labels=dataset.edge_labels, # list of edge label names. | |||||
ds_infos=dataset.get_dataset_infos(keys=['directed']), # dataset information required for computation. | |||||
**kernel_options, # options for computation. | |||||
) | |||||
# Compute Gram matrix. | |||||
gram_matrix, run_time = graph_kernel.compute(dataset.graphs, | |||||
parallel='imap_unordered', # or None. | |||||
n_jobs=multiprocessing.cpu_count(), # number of parallel jobs. | |||||
normalize=True, # whether to return normalized Gram matrix. | |||||
verbose=2 # whether to print out results. | |||||
) | |||||
"""**3. Compute distance in kernel space.** | |||||
Given a dataset $\mathcal{G}_N$, compute the distance in kernel space between the image of $G_1 \in \mathcal{G}_N$ and the mean of images of $\mathcal{G}_k \subset \mathcal{G}_N$. | |||||
""" | |||||
from gklearn.preimage.utils import compute_k_dis | |||||
# Index of $G_1$. | |||||
idx_1 = 10 | |||||
# Indices of graphs in $\mathcal{G}_k$. | |||||
idx_graphs = range(0, 10) | |||||
# Compute the distance in kernel space. | |||||
dis_k = compute_k_dis(idx_1, | |||||
idx_graphs, | |||||
[1 / len(idx_graphs)] * len(idx_graphs), # weights for images of graphs in $\mathcal{G}_k$; all equal when computing the mean. | |||||
gram_matrix, # Gram matrix of all graphs.
withterm3=False | |||||
) | |||||
print(dis_k) |
@@ -0,0 +1,87 @@ | |||||
# -*- coding: utf-8 -*- | |||||
"""compute_graph_kernel.ipynb | |||||
Automatically generated by Colaboratory. | |||||
Original file is located at | |||||
https://colab.research.google.com/drive/17Q2QCl9CAtDweGF8LiWnWoN2laeJqT0u | |||||
**This script demonstrates how to compute a graph kernel.** | |||||
--- | |||||
**0. Install `graphkit-learn`.** | |||||
""" | |||||
"""**1. Get dataset.**""" | |||||
from gklearn.utils import Dataset | |||||
# Predefined dataset name, use dataset "MUTAG". | |||||
ds_name = 'MUTAG' | |||||
# Initialize a Dataset. | |||||
dataset = Dataset() | |||||
# Load predefined dataset "MUTAG". | |||||
dataset.load_predefined_dataset(ds_name) | |||||
len(dataset.graphs) | |||||
"""**2. Compute graph kernel.**""" | |||||
from gklearn.kernels import PathUpToH | |||||
# Initialize parameters for graph kernel computation.
kernel_options = {'depth': 3, | |||||
'k_func': 'MinMax', | |||||
'compute_method': 'trie' | |||||
} | |||||
# Initialize graph kernel. | |||||
graph_kernel = PathUpToH(node_labels=dataset.node_labels, # list of node label names. | |||||
edge_labels=dataset.edge_labels, # list of edge label names. | |||||
ds_infos=dataset.get_dataset_infos(keys=['directed']), # dataset information required for computation. | |||||
**kernel_options, # options for computation. | |||||
) | |||||
print('done.') | |||||
import multiprocessing | |||||
import matplotlib.pyplot as plt | |||||
# Compute Gram matrix. | |||||
gram_matrix, run_time = graph_kernel.compute(dataset.graphs, | |||||
parallel='imap_unordered', # or None. | |||||
n_jobs=multiprocessing.cpu_count(), # number of parallel jobs. | |||||
normalize=True, # whether to return normalized Gram matrix. | |||||
verbose=2 # whether to print out results. | |||||
) | |||||
# Print results. | |||||
print() | |||||
print(gram_matrix) | |||||
print(run_time) | |||||
plt.imshow(gram_matrix) | |||||
import multiprocessing | |||||
# Compute graph kernels between a graph and a list of graphs.
kernel_list, run_time = graph_kernel.compute(dataset.graphs, # a list of graphs. | |||||
dataset.graphs[0], # a single graph. | |||||
parallel='imap_unordered', # or None. | |||||
n_jobs=multiprocessing.cpu_count(), # number of parallel jobs. | |||||
verbose=2 # whether to print out results. | |||||
) | |||||
# Print results. | |||||
print() | |||||
print(kernel_list) | |||||
print(run_time) | |||||
import multiprocessing | |||||
# Compute a graph kernel between two graphs.
kernel, run_time = graph_kernel.compute(dataset.graphs[0], # a single graph. | |||||
dataset.graphs[1], # another single graph. | |||||
verbose=2 # whether to print out results. | |||||
) | |||||
# Print results. | |||||
print() | |||||
print(kernel) | |||||
print(run_time) |
@@ -0,0 +1,31 @@ | |||||
# -*- coding: utf-8 -*- | |||||
"""compute_graph_kernel_v0.1.ipynb | |||||
Automatically generated by Colaboratory. | |||||
Original file is located at | |||||
https://colab.research.google.com/drive/10jUz7-ahPiE_T1qvFrh2NvCVs1e47noj | |||||
**This script demonstrates how to compute a graph kernel.** | |||||
--- | |||||
**0. Install `graphkit-learn`.** | |||||
""" | |||||
"""**1. Get dataset.**""" | |||||
from gklearn.utils.graphfiles import loadDataset | |||||
graphs, targets = loadDataset('../../../datasets/MUTAG/MUTAG_A.txt') | |||||
"""**2. Compute graph kernel.**""" | |||||
from gklearn.kernels import untilhpathkernel | |||||
gram_matrix, run_time = untilhpathkernel( | |||||
graphs, # The list of input graphs. | |||||
depth=5, # The longest length of paths. | |||||
k_func='MinMax', # Or 'tanimoto'. | |||||
compute_method='trie', # Or 'naive'. | |||||
n_jobs=1, # The number of jobs to run in parallel. | |||||
verbose=True) |
@@ -0,0 +1,38 @@ | |||||
# -*- coding: utf-8 -*- | |||||
"""model_selection_old.ipynb | |||||
Automatically generated by Colaboratory. | |||||
Original file is located at | |||||
https://colab.research.google.com/drive/1uVkl7scNgEPrimX8ks6iEC5ijuhB8L_D | |||||
**This script demonstrates how to compute a graph kernel.** | |||||
--- | |||||
**0. Install `graphkit-learn`.** | |||||
""" | |||||
"""**1. Perform model seletion and classification.**""" | |||||
from gklearn.utils import model_selection_for_precomputed_kernel | |||||
from gklearn.kernels import untilhpathkernel | |||||
import numpy as np | |||||
# Set parameters. | |||||
datafile = '../../../datasets/MUTAG/MUTAG_A.txt' | |||||
param_grid_precomputed = {'depth': np.linspace(1, 10, 10), | |||||
'k_func': ['MinMax', 'tanimoto'], | |||||
'compute_method': ['trie']} | |||||
param_grid = {'C': np.logspace(-10, 10, num=41, base=10)} | |||||
# Perform model selection and classification. | |||||
model_selection_for_precomputed_kernel( | |||||
datafile, # The path of dataset file. | |||||
untilhpathkernel, # The graph kernel used for estimation. | |||||
param_grid_precomputed, # The parameters used to compute Gram matrices.
param_grid, # The penalty parameters used for the penalty term (SVM C).
'classification', # Or 'regression'. | |||||
NUM_TRIALS=30, # The number of the random trials of the outer CV loop. | |||||
ds_name='MUTAG', # The name of the dataset. | |||||
n_jobs=1, | |||||
verbose=True) |
@@ -0,0 +1,115 @@ | |||||
# -*- coding: utf-8 -*- | |||||
"""example_median_preimege_generator.ipynb | |||||
Automatically generated by Colaboratory. | |||||
Original file is located at | |||||
https://colab.research.google.com/drive/1PIDvHOcmiLEQ5Np3bgBDdu0kLOquOMQK | |||||
**This script demonstrates how to generate a graph preimage using Boria's method.** | |||||
--- | |||||
""" | |||||
"""**1. Get dataset.**""" | |||||
from gklearn.utils import Dataset, split_dataset_by_target | |||||
# Predefined dataset name, use dataset "MAO". | |||||
ds_name = 'MAO' | |||||
# The node/edge labels that will not be used in the computation. | |||||
irrelevant_labels = {'node_attrs': ['x', 'y', 'z'], 'edge_labels': ['bond_stereo']} | |||||
# Initialize a Dataset. | |||||
dataset_all = Dataset() | |||||
# Load predefined dataset "MAO". | |||||
dataset_all.load_predefined_dataset(ds_name) | |||||
# Remove irrelevant labels. | |||||
dataset_all.remove_labels(**irrelevant_labels) | |||||
# Split the whole dataset according to the classification targets. | |||||
datasets = split_dataset_by_target(dataset_all) | |||||
# Get the first class of graphs, whose median preimage will be computed. | |||||
dataset = datasets[0] | |||||
len(dataset.graphs) | |||||
"""**2. Set parameters.**""" | |||||
import multiprocessing | |||||
# Parameters for MedianPreimageGenerator (our method). | |||||
mpg_options = {'fit_method': 'k-graphs', # how to fit edit costs. "k-graphs" means use all graphs in median set when fitting. | |||||
'init_ecc': [4, 4, 2, 1, 1, 1], # initial edit costs. | |||||
'ds_name': ds_name, # name of the dataset. | |||||
'parallel': True, # whether the parallel scheme is to be used. | |||||
'time_limit_in_sec': 0, # maximum time limit to compute the preimage. If set to 0 then no limit. | |||||
'max_itrs': 100, # maximum iteration limit to optimize edit costs. If set to 0 then no limit. | |||||
'max_itrs_without_update': 3, # the optimization stops if the edit costs are not updated for more than this number of consecutive iterations.
'epsilon_residual': 0.01, # In optimization, the residual is only considered changed if the change is bigger than this number. | |||||
'epsilon_ec': 0.1, # In optimization, the edit costs are only considered changed if the changes are bigger than this number. | |||||
'verbose': 2 # whether to print out results. | |||||
} | |||||
# Parameters for graph kernel computation. | |||||
kernel_options = {'name': 'PathUpToH', # use path kernel up to length h. | |||||
'depth': 9, | |||||
'k_func': 'MinMax', | |||||
'compute_method': 'trie', | |||||
'parallel': 'imap_unordered', # or None | |||||
'n_jobs': multiprocessing.cpu_count(), | |||||
'normalize': True, # whether to use normalized Gram matrix to optimize edit costs. | |||||
'verbose': 2 # whether to print out results. | |||||
} | |||||
# Parameters for GED computation. | |||||
ged_options = {'method': 'IPFP', # use the IPFP heuristic.
'initialization_method': 'RANDOM', # or 'NODE', etc. | |||||
'initial_solutions': 10, # when bigger than 1, then the method is considered mIPFP. | |||||
'edit_cost': 'CONSTANT', # use CONSTANT cost. | |||||
'attr_distance': 'euclidean', # the distance between non-symbolic node/edge labels is computed by euclidean distance. | |||||
'ratio_runs_from_initial_solutions': 1, | |||||
'threads': multiprocessing.cpu_count(), # parallel threads. Does not work if mpg_options['parallel'] = False.
'init_option': 'EAGER_WITHOUT_SHUFFLED_COPIES' | |||||
} | |||||
# Parameters for MedianGraphEstimator (Boria's method). | |||||
mge_options = {'init_type': 'MEDOID', # how to initialize the median (compute the set-median). "MEDOID" uses the graph with the smallest SOD.
'random_inits': 10, # number of random initialization when 'init_type' = 'RANDOM'. | |||||
'time_limit': 600, # maximum time limit to compute the generalized median. If set to 0 then no limit. | |||||
'verbose': 2, # whether to print out results. | |||||
'refine': False # whether to refine the final SODs or not. | |||||
} | |||||
print('done.') | |||||
"""**3. Run median preimage generator.**""" | |||||
from gklearn.preimage import MedianPreimageGenerator | |||||
# Create median preimage generator instance. | |||||
mpg = MedianPreimageGenerator() | |||||
# Add dataset. | |||||
mpg.dataset = dataset | |||||
# Set parameters. | |||||
mpg.set_options(**mpg_options.copy()) | |||||
mpg.kernel_options = kernel_options.copy() | |||||
mpg.ged_options = ged_options.copy() | |||||
mpg.mge_options = mge_options.copy() | |||||
# Run. | |||||
mpg.run() | |||||
"""**4. Get results.**""" | |||||
# Get results. | |||||
import pprint | |||||
pp = pprint.PrettyPrinter(indent=4) # pretty print | |||||
results = mpg.get_results() | |||||
pp.pprint(results) | |||||
# Draw generated graphs. | |||||
def draw_graph(graph): | |||||
import matplotlib.pyplot as plt | |||||
import networkx as nx | |||||
plt.figure() | |||||
pos = nx.spring_layout(graph) | |||||
nx.draw(graph, pos, node_size=500, labels=nx.get_node_attributes(graph, 'atom_symbol'), font_color='w', width=3, with_labels=True) | |||||
plt.show() | |||||
plt.clf() | |||||
plt.close() | |||||
draw_graph(mpg.set_median) | |||||
draw_graph(mpg.gen_median) |
@@ -0,0 +1,113 @@ | |||||
#!/usr/bin/env python3 | |||||
# -*- coding: utf-8 -*- | |||||
""" | |||||
Created on Tue Jun 16 15:41:26 2020 | |||||
@author: ljia | |||||
**This script demonstrates how to generate a graph preimage using Boria's method with cost matrices learning.** | |||||
""" | |||||
"""**1. Get dataset.**""" | |||||
from gklearn.utils import Dataset, split_dataset_by_target | |||||
# Predefined dataset name, use dataset "MAO". | |||||
ds_name = 'MAO' | |||||
# The node/edge labels that will not be used in the computation. | |||||
irrelevant_labels = {'node_attrs': ['x', 'y', 'z'], 'edge_labels': ['bond_stereo']} | |||||
# Initialize a Dataset. | |||||
dataset_all = Dataset() | |||||
# Load predefined dataset "MAO". | |||||
dataset_all.load_predefined_dataset(ds_name) | |||||
# Remove irrelevant labels. | |||||
dataset_all.remove_labels(**irrelevant_labels) | |||||
# Split the whole dataset according to the classification targets. | |||||
datasets = split_dataset_by_target(dataset_all) | |||||
# Get the first class of graphs, whose median preimage will be computed. | |||||
dataset = datasets[0] | |||||
len(dataset.graphs) | |||||
"""**2. Set parameters.**""" | |||||
import multiprocessing | |||||
# Parameters for MedianPreimageGenerator (our method). | |||||
mpg_options = {'init_method': 'random', # how to initialize node label cost vector. "random" means to initialize randomly. | |||||
'init_ecc': [4, 4, 2, 1, 1, 1], # initial edit costs. | |||||
'ds_name': ds_name, # name of the dataset. | |||||
'parallel': True, # @todo: whether the parallel scheme is to be used. | |||||
'time_limit_in_sec': 0, # maximum time limit to compute the preimage. If set to 0 then no limit. | |||||
'max_itrs': 3, # maximum iteration limit to optimize edit costs. If set to 0 then no limit. | |||||
'max_itrs_without_update': 3, # the optimization stops if the edit costs are not updated for more than this number of consecutive iterations.
'epsilon_residual': 0.01, # In optimization, the residual is only considered changed if the change is bigger than this number. | |||||
'epsilon_ec': 0.1, # In optimization, the edit costs are only considered changed if the changes are bigger than this number. | |||||
'verbose': 2 # whether to print out results. | |||||
} | |||||
# Parameters for graph kernel computation. | |||||
kernel_options = {'name': 'PathUpToH', # use path kernel up to length h. | |||||
'depth': 9, | |||||
'k_func': 'MinMax', | |||||
'compute_method': 'trie', | |||||
'parallel': 'imap_unordered', # or None | |||||
'n_jobs': multiprocessing.cpu_count(), | |||||
'normalize': True, # whether to use normalized Gram matrix to optimize edit costs. | |||||
'verbose': 2 # whether to print out results. | |||||
} | |||||
# Parameters for GED computation. | |||||
ged_options = {'method': 'BIPARTITE', # use the BIPARTITE heuristic.
'initialization_method': 'RANDOM', # or 'NODE', etc. | |||||
'initial_solutions': 10, # when bigger than 1, then the method is considered mIPFP. | |||||
'edit_cost': 'CONSTANT', # @todo: not needed. use CONSTANT cost. | |||||
'attr_distance': 'euclidean', # @todo: not needed. the distance between non-symbolic node/edge labels is computed by euclidean distance. | |||||
'ratio_runs_from_initial_solutions': 1, | |||||
'threads': multiprocessing.cpu_count(), # parallel threads. Does not work if mpg_options['parallel'] = False.
'init_option': 'LAZY_WITHOUT_SHUFFLED_COPIES' # 'EAGER_WITHOUT_SHUFFLED_COPIES' | |||||
} | |||||
# Parameters for MedianGraphEstimator (Boria's method). | |||||
mge_options = {'init_type': 'MEDOID', # how to initialize the median (compute the set-median). "MEDOID" uses the graph with the smallest SOD.
'random_inits': 10, # number of random initialization when 'init_type' = 'RANDOM'. | |||||
'time_limit': 600, # maximum time limit to compute the generalized median. If set to 0 then no limit. | |||||
'verbose': 2, # whether to print out results. | |||||
'refine': False # whether to refine the final SODs or not. | |||||
} | |||||
print('done.') | |||||
"""**3. Run median preimage generator.**""" | |||||
from gklearn.preimage import MedianPreimageGeneratorCML | |||||
# Create median preimage generator instance. | |||||
mpg = MedianPreimageGeneratorCML() | |||||
# Add dataset. | |||||
mpg.dataset = dataset | |||||
# Set parameters. | |||||
mpg.set_options(**mpg_options.copy()) | |||||
mpg.kernel_options = kernel_options.copy() | |||||
mpg.ged_options = ged_options.copy() | |||||
mpg.mge_options = mge_options.copy() | |||||
# Run. | |||||
mpg.run() | |||||
"""**4. Get results.**""" | |||||
# Get results. | |||||
import pprint | |||||
pp = pprint.PrettyPrinter(indent=4) # pretty print | |||||
results = mpg.get_results() | |||||
pp.pprint(results) | |||||
# Draw generated graphs. | |||||
def draw_graph(graph): | |||||
import matplotlib.pyplot as plt | |||||
import networkx as nx | |||||
plt.figure() | |||||
pos = nx.spring_layout(graph) | |||||
nx.draw(graph, pos, node_size=500, labels=nx.get_node_attributes(graph, 'atom_symbol'), font_color='w', width=3, with_labels=True) | |||||
plt.show() | |||||
plt.clf() | |||||
plt.close() | |||||
draw_graph(mpg.set_median) | |||||
draw_graph(mpg.gen_median) |
@@ -0,0 +1,114 @@ | |||||
#!/usr/bin/env python3 | |||||
# -*- coding: utf-8 -*- | |||||
""" | |||||
Created on Tue Jun 16 15:41:26 2020 | |||||
@author: ljia | |||||
**This script demonstrates how to generate a graph preimage using Boria's method with cost matrices learning.** | |||||
""" | |||||
"""**1. Get dataset.**""" | |||||
from gklearn.utils import Dataset, split_dataset_by_target | |||||
# Predefined dataset name, use dataset "MAO". | |||||
ds_name = 'MAO' | |||||
# The node/edge labels that will not be used in the computation. | |||||
irrelevant_labels = {'node_attrs': ['x', 'y', 'z'], 'edge_labels': ['bond_stereo']} | |||||
# Initialize a Dataset. | |||||
dataset_all = Dataset() | |||||
# Load predefined dataset "MAO". | |||||
dataset_all.load_predefined_dataset(ds_name) | |||||
# Remove irrelevant labels. | |||||
dataset_all.remove_labels(**irrelevant_labels) | |||||
# Split the whole dataset according to the classification targets. | |||||
datasets = split_dataset_by_target(dataset_all) | |||||
# Get the first class of graphs, whose median preimage will be computed. | |||||
dataset = datasets[0] | |||||
# dataset.cut_graphs(range(0, 10)) | |||||
len(dataset.graphs) | |||||
"""**2. Set parameters.**""" | |||||
import multiprocessing | |||||
# Parameters for MedianPreimageGenerator (our method). | |||||
mpg_options = {'fit_method': 'k-graphs', # how to fit edit costs. "k-graphs" means use all graphs in median set when fitting. | |||||
'init_ecc': [4, 4, 2, 1, 1, 1], # initial edit costs. | |||||
'ds_name': ds_name, # name of the dataset. | |||||
'parallel': True, # @todo: whether the parallel scheme is to be used. | |||||
'time_limit_in_sec': 0, # maximum time limit to compute the preimage. If set to 0 then no limit. | |||||
'max_itrs': 100, # maximum iteration limit to optimize edit costs. If set to 0 then no limit. | |||||
'max_itrs_without_update': 3, # the optimization stops if the edit costs are not updated for more than this number of consecutive iterations.
'epsilon_residual': 0.01, # In optimization, the residual is only considered changed if the change is bigger than this number. | |||||
'epsilon_ec': 0.1, # In optimization, the edit costs are only considered changed if the changes are bigger than this number. | |||||
'verbose': 2 # whether to print out results. | |||||
} | |||||
# Parameters for graph kernel computation. | |||||
kernel_options = {'name': 'PathUpToH', # use path kernel up to length h. | |||||
'depth': 9, | |||||
'k_func': 'MinMax', | |||||
'compute_method': 'trie', | |||||
'parallel': 'imap_unordered', # or None | |||||
'n_jobs': multiprocessing.cpu_count(), | |||||
'normalize': True, # whether to use normalized Gram matrix to optimize edit costs. | |||||
'verbose': 2 # whether to print out results. | |||||
} | |||||
# Parameters for GED computation. | |||||
ged_options = {'method': 'BIPARTITE', # use the BIPARTITE heuristic.
'initialization_method': 'RANDOM', # or 'NODE', etc. | |||||
'initial_solutions': 10, # when bigger than 1, then the method is considered mIPFP. | |||||
'edit_cost': 'CONSTANT', # use CONSTANT cost. | |||||
'attr_distance': 'euclidean', # the distance between non-symbolic node/edge labels is computed by euclidean distance. | |||||
'ratio_runs_from_initial_solutions': 1, | |||||
'threads': multiprocessing.cpu_count(), # parallel threads. Does not work if mpg_options['parallel'] = False.
'init_option': 'LAZY_WITHOUT_SHUFFLED_COPIES' # 'EAGER_WITHOUT_SHUFFLED_COPIES' | |||||
} | |||||
# Parameters for MedianGraphEstimator (Boria's method). | |||||
mge_options = {'init_type': 'MEDOID', # how to initialize the median (compute the set-median). "MEDOID" uses the graph with the smallest SOD.
'random_inits': 10, # number of random initialization when 'init_type' = 'RANDOM'. | |||||
'time_limit': 600, # maximum time limit to compute the generalized median. If set to 0 then no limit. | |||||
'verbose': 2, # whether to print out results. | |||||
'refine': False # whether to refine the final SODs or not. | |||||
} | |||||
print('done.') | |||||
"""**3. Run median preimage generator.**""" | |||||
from gklearn.preimage import MedianPreimageGeneratorPy | |||||
# Create median preimage generator instance. | |||||
mpg = MedianPreimageGeneratorPy() | |||||
# Add dataset. | |||||
mpg.dataset = dataset | |||||
# Set parameters. | |||||
mpg.set_options(**mpg_options.copy()) | |||||
mpg.kernel_options = kernel_options.copy() | |||||
mpg.ged_options = ged_options.copy() | |||||
mpg.mge_options = mge_options.copy() | |||||
# Run. | |||||
mpg.run() | |||||
"""**4. Get results.**""" | |||||
# Get results. | |||||
import pprint | |||||
pp = pprint.PrettyPrinter(indent=4) # pretty print | |||||
results = mpg.get_results() | |||||
pp.pprint(results) | |||||
# Draw generated graphs. | |||||
def draw_graph(graph): | |||||
import matplotlib.pyplot as plt | |||||
import networkx as nx | |||||
plt.figure() | |||||
pos = nx.spring_layout(graph) | |||||
nx.draw(graph, pos, node_size=500, labels=nx.get_node_attributes(graph, 'atom_symbol'), font_color='w', width=3, with_labels=True) | |||||
plt.show() | |||||
plt.clf() | |||||
plt.close() | |||||
draw_graph(mpg.set_median) | |||||
draw_graph(mpg.gen_median) |
@@ -0,0 +1,126 @@ | |||||
#!/usr/bin/env python3 | |||||
# -*- coding: utf-8 -*- | |||||
""" | |||||
Created on Thu Jun 25 11:31:46 2020 | |||||
@author: ljia | |||||
""" | |||||
def xp_check_results_of_GEDEnv(): | |||||
"""Compare results of GEDEnv to GEDLIB. | |||||
""" | |||||
"""**1. Get dataset.**""" | |||||
from gklearn.utils import Dataset | |||||
# Predefined dataset name, use dataset "MUTAG". | |||||
ds_name = 'MUTAG' | |||||
# Initialize a Dataset. | |||||
dataset = Dataset() | |||||
# Load predefined dataset "MUTAG". | |||||
dataset.load_predefined_dataset(ds_name) | |||||
results1 = compute_geds_by_GEDEnv(dataset) | |||||
results2 = compute_geds_by_GEDLIB(dataset) | |||||
# Show results. | |||||
import pprint | |||||
pp = pprint.PrettyPrinter(indent=4) # pretty print | |||||
print('Results using GEDEnv:')
pp.pprint(results1) | |||||
print() | |||||
print('Results using GEDLIB:')
pp.pprint(results2) | |||||
return results1, results2 | |||||
def compute_geds_by_GEDEnv(dataset): | |||||
from gklearn.ged.env import GEDEnv | |||||
import numpy as np | |||||
graph1 = dataset.graphs[0] | |||||
graph2 = dataset.graphs[1] | |||||
ged_env = GEDEnv() # initialize GED environment.
ged_env.set_edit_cost('CONSTANT', # GED cost type. | |||||
edit_cost_constants=[3, 3, 1, 3, 3, 1] # edit costs. | |||||
) | |||||
for g in dataset.graphs[0:10]: | |||||
ged_env.add_nx_graph(g, '') | |||||
# ged_env.add_nx_graph(graph1, '') # add graph1 | |||||
# ged_env.add_nx_graph(graph2, '') # add graph2 | |||||
listID = ged_env.get_all_graph_ids() # get list IDs of graphs | |||||
ged_env.init(init_type='LAZY_WITHOUT_SHUFFLED_COPIES') # initialize GED environment. | |||||
options = {'threads': 1 # parallel threads. | |||||
} | |||||
ged_env.set_method('BIPARTITE', # GED method. | |||||
options # options for GED method. | |||||
) | |||||
ged_env.init_method() # initialize GED method. | |||||
ged_mat = np.empty((10, 10)) | |||||
for i in range(0, 10): | |||||
for j in range(i, 10): | |||||
ged_env.run_method(i, j) # run. | |||||
ged_mat[i, j] = ged_env.get_upper_bound(i, j) | |||||
ged_mat[j, i] = ged_mat[i, j] | |||||
results = {} | |||||
results['pi_forward'] = ged_env.get_forward_map(listID[0], listID[1]) # forward map. | |||||
results['pi_backward'] = ged_env.get_backward_map(listID[0], listID[1]) # backward map. | |||||
results['upper_bound'] = ged_env.get_upper_bound(listID[0], listID[1]) # GED between two graphs.
results['runtime'] = ged_env.get_runtime(listID[0], listID[1]) | |||||
results['init_time'] = ged_env.get_init_time() | |||||
results['ged_mat'] = ged_mat | |||||
return results | |||||
def compute_geds_by_GEDLIB(dataset): | |||||
from gklearn.gedlib import librariesImport, gedlibpy | |||||
from gklearn.ged.util import ged_options_to_string | |||||
import numpy as np | |||||
graph1 = dataset.graphs[5] | |||||
graph2 = dataset.graphs[6] | |||||
ged_env = gedlibpy.GEDEnv() # initialize GED environment.
ged_env.set_edit_cost('CONSTANT', # GED cost type. | |||||
edit_cost_constant=[3, 3, 1, 3, 3, 1] # edit costs. | |||||
) | |||||
# ged_env.add_nx_graph(graph1, '') # add graph1 | |||||
# ged_env.add_nx_graph(graph2, '') # add graph2 | |||||
for g in dataset.graphs[0:10]: | |||||
ged_env.add_nx_graph(g, '') | |||||
listID = ged_env.get_all_graph_ids() # get list IDs of graphs | |||||
ged_env.init(init_option='LAZY_WITHOUT_SHUFFLED_COPIES') # initialize GED environment. | |||||
options = {'initialization-method': 'RANDOM', # or 'NODE', etc. | |||||
'threads': 1 # parallel threads. | |||||
} | |||||
ged_env.set_method('BIPARTITE', # GED method. | |||||
ged_options_to_string(options) # options for GED method. | |||||
) | |||||
ged_env.init_method() # initialize GED method. | |||||
ged_mat = np.empty((10, 10)) | |||||
for i in range(0, 10): | |||||
for j in range(i, 10): | |||||
ged_env.run_method(i, j) # run. | |||||
ged_mat[i, j] = ged_env.get_upper_bound(i, j) | |||||
ged_mat[j, i] = ged_mat[i, j] | |||||
results = {} | |||||
results['pi_forward'] = ged_env.get_forward_map(listID[0], listID[1]) # forward map. | |||||
results['pi_backward'] = ged_env.get_backward_map(listID[0], listID[1]) # backward map. | |||||
results['upper_bound'] = ged_env.get_upper_bound(listID[0], listID[1]) # GED between two graphs.
results['runtime'] = ged_env.get_runtime(listID[0], listID[1]) | |||||
results['init_time'] = ged_env.get_init_time() | |||||
results['ged_mat'] = ged_mat | |||||
return results | |||||
if __name__ == '__main__': | |||||
results1, results2 = xp_check_results_of_GEDEnv() |
@@ -0,0 +1,196 @@ | |||||
#!/usr/bin/env python3 | |||||
# -*- coding: utf-8 -*- | |||||
""" | |||||
Created on Mon Oct 5 16:08:33 2020 | |||||
@author: ljia | |||||
This script computes the classification accuracy of each graph kernel on datasets
with different entropies of the degree distribution.
""" | |||||
from utils import Graph_Kernel_List, cross_validate | |||||
import numpy as np | |||||
import logging | |||||
num_nodes = 40 | |||||
half_num_graphs = 100 | |||||
def generate_graphs(): | |||||
# from gklearn.utils.graph_synthesizer import GraphSynthesizer | |||||
# gsyzer = GraphSynthesizer() | |||||
# graphs = gsyzer.unified_graphs(num_graphs=1000, num_nodes=20, num_edges=40, num_node_labels=0, num_edge_labels=0, seed=None, directed=False) | |||||
# return graphs | |||||
import networkx as nx | |||||
degrees11 = [5] * num_nodes | |||||
# degrees12 = [2] * num_nodes | |||||
degrees12 = [5] * num_nodes | |||||
degrees21 = list(range(1, 11)) * 6 | |||||
# degrees22 = [5 * i for i in list(range(1, 11)) * 6] | |||||
degrees22 = list(range(1, 11)) * 6 | |||||
# method 1 | |||||
graphs11 = [nx.configuration_model(degrees11, create_using=nx.Graph) for i in range(half_num_graphs)] | |||||
graphs12 = [nx.configuration_model(degrees12, create_using=nx.Graph) for i in range(half_num_graphs)] | |||||
for g in graphs11: | |||||
g.remove_edges_from(nx.selfloop_edges(g)) | |||||
for g in graphs12: | |||||
g.remove_edges_from(nx.selfloop_edges(g)) | |||||
# method 2: can easily generate isomorphic graphs. | |||||
# graphs11 = [nx.random_regular_graph(2, num_nodes, seed=None) for i in range(half_num_graphs)] | |||||
# graphs12 = [nx.random_regular_graph(10, num_nodes, seed=None) for i in range(half_num_graphs)] | |||||
# Add node labels. | |||||
for g in graphs11: | |||||
for n in g.nodes(): | |||||
g.nodes[n]['atom'] = 0 | |||||
for g in graphs12: | |||||
for n in g.nodes(): | |||||
g.nodes[n]['atom'] = 1 | |||||
graphs1 = graphs11 + graphs12 | |||||
# method 1: the entropy of the two classes is not the same.
graphs21 = [nx.configuration_model(degrees21, create_using=nx.Graph) for i in range(half_num_graphs)] | |||||
graphs22 = [nx.configuration_model(degrees22, create_using=nx.Graph) for i in range(half_num_graphs)] | |||||
for g in graphs21: | |||||
g.remove_edges_from(nx.selfloop_edges(g)) | |||||
for g in graphs22: | |||||
g.remove_edges_from(nx.selfloop_edges(g)) | |||||
# # method 2: too slow, and may fail.
# graphs21 = [nx.random_degree_sequence_graph(degrees21, seed=None, tries=100) for i in range(half_num_graphs)] | |||||
# graphs22 = [nx.random_degree_sequence_graph(degrees22, seed=None, tries=100) for i in range(half_num_graphs)] | |||||
# # method 3: no randomness. | |||||
# graphs21 = [nx.havel_hakimi_graph(degrees21, create_using=None) for i in range(half_num_graphs)] | |||||
# graphs22 = [nx.havel_hakimi_graph(degrees22, create_using=None) for i in range(half_num_graphs)] | |||||
# # method 4: | |||||
# graphs21 = [nx.configuration_model(degrees21, create_using=nx.Graph) for i in range(half_num_graphs)] | |||||
# graphs22 = [nx.degree_sequence_tree(degrees21, create_using=nx.Graph) for i in range(half_num_graphs)] | |||||
# # method 5: the entropy of the two classes is not the same.
# graphs21 = [nx.expected_degree_graph(degrees21, seed=None, selfloops=False) for i in range(half_num_graphs)] | |||||
# graphs22 = [nx.expected_degree_graph(degrees22, seed=None, selfloops=False) for i in range(half_num_graphs)] | |||||
# # method 6: seems there is no randomness.
# graphs21 = [nx.random_powerlaw_tree(num_nodes, gamma=3, seed=None, tries=10000) for i in range(half_num_graphs)] | |||||
# graphs22 = [nx.random_powerlaw_tree(num_nodes, gamma=3, seed=None, tries=10000) for i in range(half_num_graphs)] | |||||
# Add node labels. | |||||
for g in graphs21: | |||||
for n in g.nodes(): | |||||
g.nodes[n]['atom'] = 0 | |||||
for g in graphs22: | |||||
for n in g.nodes(): | |||||
g.nodes[n]['atom'] = 1 | |||||
graphs2 = graphs21 + graphs22 | |||||
# # check for isomorphism. | |||||
# iso_mat1 = np.zeros((len(graphs1), len(graphs1))) | |||||
# num1 = 0 | |||||
# num2 = 0 | |||||
# for i in range(len(graphs1)): | |||||
# for j in range(i + 1, len(graphs1)): | |||||
# if nx.is_isomorphic(graphs1[i], graphs1[j]): | |||||
# iso_mat1[i, j] = 1 | |||||
# iso_mat1[j, i] = 1 | |||||
# num1 += 1 | |||||
# print('iso:', num1, ':', i, ',', j) | |||||
# else: | |||||
# num2 += 1 | |||||
# print('not iso:', num2, ':', i, ',', j) | |||||
# | |||||
# iso_mat2 = np.zeros((len(graphs2), len(graphs2))) | |||||
# num1 = 0 | |||||
# num2 = 0 | |||||
# for i in range(len(graphs2)): | |||||
# for j in range(i + 1, len(graphs2)): | |||||
# if nx.is_isomorphic(graphs2[i], graphs2[j]): | |||||
# iso_mat2[i, j] = 1 | |||||
# iso_mat2[j, i] = 1 | |||||
# num1 += 1 | |||||
# print('iso:', num1, ':', i, ',', j) | |||||
# else: | |||||
# num2 += 1 | |||||
# print('not iso:', num2, ':', i, ',', j) | |||||
return graphs1, graphs2 | |||||
def get_infos(graph): | |||||
from gklearn.utils import Dataset | |||||
ds = Dataset() | |||||
ds.load_graphs(graph) | |||||
infos = ds.get_dataset_infos(keys=['all_degree_entropy', 'ave_node_degree']) | |||||
infos['ave_degree_entropy'] = np.mean(infos['all_degree_entropy']) | |||||
print(infos['ave_degree_entropy'], ',', infos['ave_node_degree']) | |||||
return infos | |||||
def xp_accuracy_diff_entropy(): | |||||
# Generate graphs. | |||||
graphs1, graphs2 = generate_graphs() | |||||
# Compute entropy of degree distribution of the generated graphs. | |||||
info11 = get_infos(graphs1[0:half_num_graphs]) | |||||
info12 = get_infos(graphs1[half_num_graphs:]) | |||||
info21 = get_infos(graphs2[0:half_num_graphs]) | |||||
info22 = get_infos(graphs2[half_num_graphs:]) | |||||
# Run and save. | |||||
import pickle | |||||
import os | |||||
save_dir = 'outputs/accuracy_diff_entropy/' | |||||
if not os.path.exists(save_dir): | |||||
os.makedirs(save_dir) | |||||
accuracies = {} | |||||
confidences = {} | |||||
for kernel_name in Graph_Kernel_List: | |||||
print() | |||||
print('Kernel:', kernel_name) | |||||
accuracies[kernel_name] = [] | |||||
confidences[kernel_name] = [] | |||||
for set_i, graphs in enumerate([graphs1, graphs2]): | |||||
print() | |||||
print('Graph set', set_i) | |||||
tmp_graphs = [g.copy() for g in graphs] | |||||
targets = [0] * half_num_graphs + [1] * half_num_graphs | |||||
accuracy = 'error' | |||||
confidence = 'error' | |||||
try: | |||||
accuracy, confidence = cross_validate(tmp_graphs, targets, kernel_name, ds_name=str(set_i), output_dir=save_dir) #, n_jobs=1) | |||||
except Exception as exp: | |||||
print('An exception occurred when running this experiment:')
LOG_FILENAME = save_dir + 'error.txt' | |||||
logging.basicConfig(filename=LOG_FILENAME, level=logging.DEBUG) | |||||
logging.exception('\n' + kernel_name + ', ' + str(set_i) + ':') | |||||
print(repr(exp)) | |||||
accuracies[kernel_name].append(accuracy) | |||||
confidences[kernel_name].append(confidence) | |||||
pickle.dump(accuracy, open(save_dir + 'accuracy.' + kernel_name + '.' + str(set_i) + '.pkl', 'wb')) | |||||
pickle.dump(confidence, open(save_dir + 'confidence.' + kernel_name + '.' + str(set_i) + '.pkl', 'wb')) | |||||
# Save all. | |||||
pickle.dump(accuracies, open(save_dir + 'accuracies.pkl', 'wb')) | |||||
pickle.dump(confidences, open(save_dir + 'confidences.pkl', 'wb')) | |||||
return | |||||
if __name__ == '__main__': | |||||
xp_accuracy_diff_entropy() |
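# --- A standalone sketch (an assumption about what 'all_degree_entropy'
# measures, not the library's implementation): the Shannon entropy of a
# graph's degree distribution. Handy for sanity-checking the two graph sets
# generated above.
import numpy as np

def degree_entropy_sketch(g):
    """Shannon entropy (base 2) of the degree distribution of graph g."""
    degrees = np.array([d for _, d in g.degree()])
    _, counts = np.unique(degrees, return_counts=True)
    p = counts / counts.sum()
    return float(-np.sum(p * np.log2(p)))

# A 5-regular graph (degrees11 above) has entropy 0, while the mixed degree
# sequence list(range(1, 11)) * 6 spreads mass over ten degree values:
# degree_entropy_sketch(nx.random_regular_graph(5, 40)) -> 0.0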
@@ -0,0 +1,57 @@ | |||||
#!/usr/bin/env python3 | |||||
# -*- coding: utf-8 -*- | |||||
""" | |||||
Created on Mon Sep 21 10:34:26 2020 | |||||
@author: ljia | |||||
""" | |||||
from utils import Graph_Kernel_List, Dataset_List, compute_graph_kernel | |||||
from gklearn.utils.graphdataset import load_predefined_dataset | |||||
import logging | |||||
def xp_runtimes_of_all_28cores(): | |||||
# Run and save. | |||||
import pickle | |||||
import os | |||||
save_dir = 'outputs/runtimes_of_all_28cores/' | |||||
if not os.path.exists(save_dir): | |||||
os.makedirs(save_dir) | |||||
run_times = {} | |||||
for ds_name in Dataset_List: | |||||
print() | |||||
print('Dataset:', ds_name) | |||||
run_times[ds_name] = [] | |||||
for kernel_name in Graph_Kernel_List: | |||||
print() | |||||
print('Kernel:', kernel_name) | |||||
# get graphs. | |||||
graphs, _ = load_predefined_dataset(ds_name) | |||||
# Compute Gram matrix. | |||||
run_time = 'error' | |||||
try: | |||||
gram_matrix, run_time = compute_graph_kernel(graphs, kernel_name, n_jobs=28) | |||||
except Exception as exp: | |||||
print('An exception occurred when running this experiment:')
LOG_FILENAME = save_dir + 'error.txt' | |||||
logging.basicConfig(filename=LOG_FILENAME, level=logging.DEBUG) | |||||
logging.exception('') | |||||
print(repr(exp)) | |||||
run_times[ds_name].append(run_time) | |||||
pickle.dump(run_time, open(save_dir + 'run_time.' + kernel_name + '.' + ds_name + '.pkl', 'wb')) | |||||
# Save all. | |||||
pickle.dump(run_times, open(save_dir + 'run_times.pkl', 'wb')) | |||||
return | |||||
if __name__ == '__main__': | |||||
xp_runtimes_of_all_28cores() |
@@ -0,0 +1,62 @@ | |||||
#!/usr/bin/env python3 | |||||
# -*- coding: utf-8 -*- | |||||
""" | |||||
Created on Mon Sep 21 10:34:26 2020 | |||||
@author: ljia | |||||
""" | |||||
from utils import Graph_Kernel_List, Dataset_List, compute_graph_kernel | |||||
from gklearn.utils.graphdataset import load_predefined_dataset | |||||
import logging | |||||
def xp_runtimes_diff_chunksizes(): | |||||
# Run and save. | |||||
import pickle | |||||
import os | |||||
save_dir = 'outputs/runtimes_diff_chunksizes/' | |||||
if not os.path.exists(save_dir): | |||||
os.makedirs(save_dir) | |||||
run_times = {} | |||||
for ds_name in Dataset_List: | |||||
print() | |||||
print('Dataset:', ds_name) | |||||
run_times[ds_name] = [] | |||||
for kernel_name in Graph_Kernel_List: | |||||
print() | |||||
print('Kernel:', kernel_name) | |||||
run_times[ds_name].append([]) | |||||
for chunksize in [1, 5, 10, 50, 100, 500, 1000, 5000, 10000, 50000, 100000]: | |||||
print() | |||||
print('Chunksize:', chunksize) | |||||
# get graphs. | |||||
graphs, _ = load_predefined_dataset(ds_name) | |||||
# Compute Gram matrix. | |||||
run_time = 'error' | |||||
try: | |||||
gram_matrix, run_time = compute_graph_kernel(graphs, kernel_name, chunksize=chunksize) | |||||
except Exception as exp: | |||||
print('An exception occurred when running this experiment:')
LOG_FILENAME = save_dir + 'error.txt' | |||||
logging.basicConfig(filename=LOG_FILENAME, level=logging.DEBUG) | |||||
logging.exception('') | |||||
print(repr(exp)) | |||||
run_times[ds_name][-1].append(run_time) | |||||
pickle.dump(run_time, open(save_dir + 'run_time.' + kernel_name + '.' + ds_name + '.' + str(chunksize) + '.pkl', 'wb')) | |||||
# Save all. | |||||
pickle.dump(run_times, open(save_dir + 'run_times.pkl', 'wb')) | |||||
return | |||||
if __name__ == '__main__': | |||||
xp_runtimes_diff_chunksizes() |
@@ -0,0 +1,64 @@ | |||||
#!/usr/bin/env python3 | |||||
# -*- coding: utf-8 -*- | |||||
""" | |||||
Created on Mon Sep 21 10:34:26 2020 | |||||
@author: ljia | |||||
""" | |||||
from utils import Graph_Kernel_List, compute_graph_kernel | |||||
import logging | |||||
def generate_graphs(): | |||||
from gklearn.utils.graph_synthesizer import GraphSynthesizer | |||||
gsyzer = GraphSynthesizer() | |||||
graphs = gsyzer.unified_graphs(num_graphs=1000, num_nodes=20, num_edges=40, num_node_labels=0, num_edge_labels=0, seed=None, directed=False) | |||||
return graphs | |||||
def xp_synthesized_graphs_dataset_size(): | |||||
# Generate graphs. | |||||
graphs = generate_graphs() | |||||
# Run and save. | |||||
import pickle | |||||
import os | |||||
save_dir = 'outputs/synthesized_graphs_N/' | |||||
if not os.path.exists(save_dir): | |||||
os.makedirs(save_dir) | |||||
run_times = {} | |||||
for kernel_name in Graph_Kernel_List: | |||||
print() | |||||
print('Kernel:', kernel_name) | |||||
run_times[kernel_name] = [] | |||||
for num_graphs in [100, 200, 300, 400, 500, 600, 700, 800, 900, 1000]: | |||||
print() | |||||
print('Number of graphs:', num_graphs) | |||||
sub_graphs = [g.copy() for g in graphs[0:num_graphs]] | |||||
run_time = 'error' | |||||
try: | |||||
gram_matrix, run_time = compute_graph_kernel(sub_graphs, kernel_name) | |||||
except Exception as exp: | |||||
print('An exception occurred when running this experiment:')
LOG_FILENAME = save_dir + 'error.txt' | |||||
logging.basicConfig(filename=LOG_FILENAME, level=logging.DEBUG) | |||||
logging.exception('') | |||||
print(repr(exp)) | |||||
run_times[kernel_name].append(run_time) | |||||
pickle.dump(run_time, open(save_dir + 'run_time.' + kernel_name + '.' + str(num_graphs) + '.pkl', 'wb')) | |||||
# Save all. | |||||
pickle.dump(run_times, open(save_dir + 'run_times.pkl', 'wb')) | |||||
return | |||||
if __name__ == '__main__': | |||||
xp_synthesized_graphs_dataset_size() |
@@ -0,0 +1,63 @@ | |||||
#!/usr/bin/env python3 | |||||
# -*- coding: utf-8 -*- | |||||
""" | |||||
Created on Mon Sep 21 10:34:26 2020 | |||||
@author: ljia | |||||
""" | |||||
from utils import Graph_Kernel_List, compute_graph_kernel | |||||
import logging | |||||
def generate_graphs(degree): | |||||
from gklearn.utils.graph_synthesizer import GraphSynthesizer | |||||
gsyzer = GraphSynthesizer() | |||||
graphs = gsyzer.unified_graphs(num_graphs=100, num_nodes=20, num_edges=int(10*degree), num_node_labels=0, num_edge_labels=0, seed=None, directed=False) | |||||
return graphs | |||||
def xp_synthesized_graphs_degrees(): | |||||
# Run and save. | |||||
import pickle | |||||
import os | |||||
save_dir = 'outputs/synthesized_graphs_degrees/' | |||||
if not os.path.exists(save_dir): | |||||
os.makedirs(save_dir) | |||||
run_times = {} | |||||
for kernel_name in Graph_Kernel_List: | |||||
print() | |||||
print('Kernel:', kernel_name) | |||||
run_times[kernel_name] = [] | |||||
for degree in [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]: | |||||
print() | |||||
print('Degree:', degree) | |||||
# Generate graphs. | |||||
graphs = generate_graphs(degree) | |||||
# Compute Gram matrix. | |||||
run_time = 'error' | |||||
try: | |||||
gram_matrix, run_time = compute_graph_kernel(graphs, kernel_name) | |||||
except Exception as exp: | |||||
print('An exception occurred when running this experiment:')
LOG_FILENAME = save_dir + 'error.txt' | |||||
logging.basicConfig(filename=LOG_FILENAME, level=logging.DEBUG) | |||||
logging.exception('') | |||||
print(repr(exp)) | |||||
run_times[kernel_name].append(run_time) | |||||
pickle.dump(run_time, open(save_dir + 'run_time.' + kernel_name + '.' + str(degree) + '.pkl', 'wb')) | |||||
# Save all. | |||||
pickle.dump(run_times, open(save_dir + 'run_times.pkl', 'wb')) | |||||
return | |||||
if __name__ == '__main__': | |||||
xp_synthesized_graphs_degrees() |
@@ -0,0 +1,63 @@ | |||||
#!/usr/bin/env python3 | |||||
# -*- coding: utf-8 -*- | |||||
""" | |||||
Created on Mon Sep 21 10:34:26 2020 | |||||
@author: ljia | |||||
""" | |||||
from utils import Graph_Kernel_List_ESym, compute_graph_kernel | |||||
import logging | |||||
def generate_graphs(num_el_alp): | |||||
from gklearn.utils.graph_synthesizer import GraphSynthesizer | |||||
gsyzer = GraphSynthesizer() | |||||
graphs = gsyzer.unified_graphs(num_graphs=100, num_nodes=20, num_edges=40, num_node_labels=0, num_edge_labels=num_el_alp, seed=None, directed=False) | |||||
return graphs | |||||
def xp_synthesized_graphs_num_edge_label_alphabet(): | |||||
# Run and save. | |||||
import pickle | |||||
import os | |||||
save_dir = 'outputs/synthesized_graphs_num_edge_label_alphabet/' | |||||
if not os.path.exists(save_dir): | |||||
os.makedirs(save_dir) | |||||
run_times = {} | |||||
for kernel_name in Graph_Kernel_List_ESym: | |||||
print() | |||||
print('Kernel:', kernel_name) | |||||
run_times[kernel_name] = [] | |||||
for num_el_alp in [0, 4, 8, 12, 16, 20, 24, 28, 32, 36, 40]: | |||||
print() | |||||
print('Size of the edge label alphabet:', num_el_alp)
# Generate graphs. | |||||
graphs = generate_graphs(num_el_alp) | |||||
# Compute Gram matrix. | |||||
run_time = 'error' | |||||
try: | |||||
gram_matrix, run_time = compute_graph_kernel(graphs, kernel_name) | |||||
except Exception as exp: | |||||
print('An exception occurred when running this experiment:')
LOG_FILENAME = save_dir + 'error.txt' | |||||
logging.basicConfig(filename=LOG_FILENAME, level=logging.DEBUG) | |||||
logging.exception('') | |||||
print(repr(exp)) | |||||
run_times[kernel_name].append(run_time) | |||||
pickle.dump(run_time, open(save_dir + 'run_time.' + kernel_name + '.' + str(num_el_alp) + '.pkl', 'wb')) | |||||
# Save all. | |||||
pickle.dump(run_times, open(save_dir + 'run_times.pkl', 'wb')) | |||||
return | |||||
if __name__ == '__main__': | |||||
xp_synthesized_graphs_num_edge_label_alphabet() |
@@ -0,0 +1,64 @@ | |||||
#!/usr/bin/env python3 | |||||
# -*- coding: utf-8 -*- | |||||
""" | |||||
Created on Mon Sep 21 10:34:26 2020 | |||||
@author: ljia | |||||
""" | |||||
from utils import Graph_Kernel_List_VSym, compute_graph_kernel | |||||
import logging | |||||
def generate_graphs(num_nl_alp): | |||||
from gklearn.utils.graph_synthesizer import GraphSynthesizer | |||||
gsyzer = GraphSynthesizer() | |||||
graphs = gsyzer.unified_graphs(num_graphs=100, num_nodes=20, num_edges=40, num_node_labels=num_nl_alp, num_edge_labels=0, seed=None, directed=False) | |||||
return graphs | |||||
def xp_synthesized_graphs_num_node_label_alphabet(): | |||||
# Run and save. | |||||
import pickle | |||||
import os | |||||
save_dir = 'outputs/synthesized_graphs_num_node_label_alphabet/' | |||||
if not os.path.exists(save_dir): | |||||
os.makedirs(save_dir) | |||||
run_times = {} | |||||
for kernel_name in Graph_Kernel_List_VSym: | |||||
print() | |||||
print('Kernel:', kernel_name) | |||||
run_times[kernel_name] = [] | |||||
for num_nl_alp in [0, 2, 4, 6, 8, 10, 12, 14, 16, 18, 20]: | |||||
print() | |||||
print('Size of the node label alphabet:', num_nl_alp)
# Generate graphs. | |||||
graphs = generate_graphs(num_nl_alp) | |||||
# Compute Gram matrix. | |||||
run_time = 'error' | |||||
try: | |||||
gram_matrix, run_time = compute_graph_kernel(graphs, kernel_name) | |||||
except Exception as exp: | |||||
print('An exception occurred when running this experiment:')
LOG_FILENAME = save_dir + 'error.txt' | |||||
logging.basicConfig(filename=LOG_FILENAME, level=logging.DEBUG) | |||||
logging.exception('') | |||||
print(repr(exp)) | |||||
run_times[kernel_name].append(run_time) | |||||
pickle.dump(run_time, open(save_dir + 'run_time.' + kernel_name + '.' + str(num_nl_alp) + '.pkl', 'wb')) | |||||
# Save all. | |||||
pickle.dump(run_times, open(save_dir + 'run_times.pkl', 'wb')) | |||||
return | |||||
if __name__ == '__main__': | |||||
xp_synthesized_graphs_num_node_label_alphabet() |
@@ -0,0 +1,64 @@ | |||||
#!/usr/bin/env python3 | |||||
# -*- coding: utf-8 -*- | |||||
""" | |||||
Created on Mon Sep 21 10:34:26 2020 | |||||
@author: ljia | |||||
""" | |||||
from utils import Graph_Kernel_List, compute_graph_kernel | |||||
import logging | |||||
def generate_graphs(num_nodes): | |||||
from gklearn.utils.graph_synthesizer import GraphSynthesizer | |||||
gsyzer = GraphSynthesizer() | |||||
graphs = gsyzer.unified_graphs(num_graphs=100, num_nodes=num_nodes, num_edges=int(num_nodes*2), num_node_labels=0, num_edge_labels=0, seed=None, directed=False) | |||||
return graphs | |||||
def xp_synthesized_graphs_num_nodes(): | |||||
# Run and save. | |||||
import pickle | |||||
import os | |||||
save_dir = 'outputs/synthesized_graphs_num_nodes/' | |||||
if not os.path.exists(save_dir): | |||||
os.makedirs(save_dir) | |||||
run_times = {} | |||||
for kernel_name in Graph_Kernel_List: | |||||
print() | |||||
print('Kernel:', kernel_name) | |||||
run_times[kernel_name] = [] | |||||
for num_nodes in [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]: | |||||
print() | |||||
print('Number of nodes:', num_nodes) | |||||
# Generate graphs. | |||||
graphs = generate_graphs(num_nodes) | |||||
# Compute Gram matrix. | |||||
run_time = 'error' | |||||
try: | |||||
gram_matrix, run_time = compute_graph_kernel(graphs, kernel_name) | |||||
except Exception as exp: | |||||
print('An exception occurred when running this experiment:')
LOG_FILENAME = save_dir + 'error.txt' | |||||
logging.basicConfig(filename=LOG_FILENAME, level=logging.DEBUG) | |||||
logging.exception('') | |||||
print(repr(exp)) | |||||
run_times[kernel_name].append(run_time) | |||||
pickle.dump(run_time, open(save_dir + 'run_time.' + kernel_name + '.' + str(num_nodes) + '.pkl', 'wb')) | |||||
# Save all. | |||||
pickle.dump(run_times, open(save_dir + 'run_times.pkl', 'wb')) | |||||
return | |||||
if __name__ == '__main__': | |||||
xp_synthesized_graphs_num_nodes() |
@@ -0,0 +1,236 @@ | |||||
#!/usr/bin/env python3 | |||||
# -*- coding: utf-8 -*- | |||||
""" | |||||
Created on Tue Sep 22 11:33:28 2020 | |||||
@author: ljia | |||||
""" | |||||
import multiprocessing | |||||
import numpy as np | |||||
from gklearn.utils import model_selection_for_precomputed_kernel | |||||
Graph_Kernel_List = ['PathUpToH', 'WLSubtree', 'SylvesterEquation', 'Marginalized', 'ShortestPath', 'Treelet', 'ConjugateGradient', 'FixedPoint', 'SpectralDecomposition', 'StructuralSP', 'CommonWalk'] | |||||
# Graph_Kernel_List = ['CommonWalk', 'Marginalized', 'SylvesterEquation', 'ConjugateGradient', 'FixedPoint', 'SpectralDecomposition', 'ShortestPath', 'StructuralSP', 'PathUpToH', 'Treelet', 'WLSubtree'] | |||||
Graph_Kernel_List_VSym = ['PathUpToH', 'WLSubtree', 'Marginalized', 'ShortestPath', 'Treelet', 'ConjugateGradient', 'FixedPoint', 'StructuralSP', 'CommonWalk'] | |||||
Graph_Kernel_List_ESym = ['PathUpToH', 'Marginalized', 'Treelet', 'ConjugateGradient', 'FixedPoint', 'StructuralSP', 'CommonWalk'] | |||||
Graph_Kernel_List_VCon = ['ShortestPath', 'ConjugateGradient', 'FixedPoint', 'StructuralSP'] | |||||
Graph_Kernel_List_ECon = ['ConjugateGradient', 'FixedPoint', 'StructuralSP'] | |||||
Dataset_List = ['Alkane', 'Acyclic', 'MAO', 'PAH', 'MUTAG', 'Letter-med', 'ENZYMES', 'AIDS', 'NCI1', 'NCI109', 'DD'] | |||||
def compute_graph_kernel(graphs, kernel_name, n_jobs=multiprocessing.cpu_count(), chunksize=None): | |||||
if kernel_name == 'CommonWalk': | |||||
from gklearn.kernels.commonWalkKernel import commonwalkkernel | |||||
estimator = commonwalkkernel | |||||
params = {'compute_method': 'geo', 'weight': 0.1} | |||||
elif kernel_name == 'Marginalized': | |||||
from gklearn.kernels.marginalizedKernel import marginalizedkernel | |||||
estimator = marginalizedkernel | |||||
params = {'p_quit': 0.5, 'n_iteration': 5, 'remove_totters': False} | |||||
elif kernel_name == 'SylvesterEquation': | |||||
from gklearn.kernels.randomWalkKernel import randomwalkkernel | |||||
estimator = randomwalkkernel | |||||
params = {'compute_method': 'sylvester', 'weight': 0.1} | |||||
elif kernel_name == 'ConjugateGradient': | |||||
from gklearn.kernels.randomWalkKernel import randomwalkkernel | |||||
estimator = randomwalkkernel | |||||
from gklearn.utils.kernels import deltakernel, gaussiankernel, kernelproduct | |||||
import functools | |||||
mixkernel = functools.partial(kernelproduct, deltakernel, gaussiankernel) | |||||
sub_kernel = {'symb': deltakernel, 'nsymb': gaussiankernel, 'mix': mixkernel} | |||||
params = {'compute_method': 'conjugate', 'weight': 0.1, 'node_kernels': sub_kernel, 'edge_kernels': sub_kernel} | |||||
elif kernel_name == 'FixedPoint': | |||||
from gklearn.kernels.randomWalkKernel import randomwalkkernel | |||||
estimator = randomwalkkernel | |||||
from gklearn.utils.kernels import deltakernel, gaussiankernel, kernelproduct | |||||
import functools | |||||
mixkernel = functools.partial(kernelproduct, deltakernel, gaussiankernel) | |||||
sub_kernel = {'symb': deltakernel, 'nsymb': gaussiankernel, 'mix': mixkernel} | |||||
params = {'compute_method': 'fp', 'weight': 1e-4, 'node_kernels': sub_kernel, 'edge_kernels': sub_kernel} | |||||
elif kernel_name == 'SpectralDecomposition': | |||||
from gklearn.kernels.randomWalkKernel import randomwalkkernel | |||||
estimator = randomwalkkernel | |||||
params = {'compute_method': 'spectral', 'sub_kernel': 'geo', 'weight': 0.1} | |||||
elif kernel_name == 'ShortestPath': | |||||
from gklearn.kernels.spKernel import spkernel | |||||
estimator = spkernel | |||||
from gklearn.utils.kernels import deltakernel, gaussiankernel, kernelproduct | |||||
import functools | |||||
mixkernel = functools.partial(kernelproduct, deltakernel, gaussiankernel) | |||||
sub_kernel = {'symb': deltakernel, 'nsymb': gaussiankernel, 'mix': mixkernel} | |||||
params = {'node_kernels': sub_kernel} | |||||
elif kernel_name == 'StructuralSP': | |||||
from gklearn.kernels.structuralspKernel import structuralspkernel | |||||
estimator = structuralspkernel | |||||
from gklearn.utils.kernels import deltakernel, gaussiankernel, kernelproduct | |||||
import functools | |||||
mixkernel = functools.partial(kernelproduct, deltakernel, gaussiankernel) | |||||
sub_kernel = {'symb': deltakernel, 'nsymb': gaussiankernel, 'mix': mixkernel} | |||||
params = {'node_kernels': sub_kernel, 'edge_kernels': sub_kernel} | |||||
elif kernel_name == 'PathUpToH': | |||||
from gklearn.kernels.untilHPathKernel import untilhpathkernel | |||||
estimator = untilhpathkernel | |||||
params = {'depth': 5, 'k_func': 'MinMax', 'compute_method': 'trie'} | |||||
elif kernel_name == 'Treelet': | |||||
from gklearn.kernels.treeletKernel import treeletkernel | |||||
estimator = treeletkernel | |||||
from gklearn.utils.kernels import polynomialkernel | |||||
import functools | |||||
sub_kernel = functools.partial(polynomialkernel, d=4, c=1e+8) | |||||
params = {'sub_kernel': sub_kernel} | |||||
elif kernel_name == 'WLSubtree': | |||||
from gklearn.kernels.weisfeilerLehmanKernel import weisfeilerlehmankernel | |||||
estimator = weisfeilerlehmankernel | |||||
params = {'base_kernel': 'subtree', 'height': 5} | |||||
# params['parallel'] = None | |||||
params['n_jobs'] = n_jobs | |||||
params['chunksize'] = chunksize | |||||
params['verbose'] = True | |||||
results = estimator(graphs, **params) | |||||
return results[0], results[1] | |||||
def cross_validate(graphs, targets, kernel_name, output_dir='outputs/', ds_name='synthesized', n_jobs=multiprocessing.cpu_count()): | |||||
param_grid = None | |||||
if kernel_name == 'CommonWalk': | |||||
from gklearn.kernels.commonWalkKernel import commonwalkkernel | |||||
estimator = commonwalkkernel | |||||
param_grid_precomputed = [{'compute_method': ['geo'], | |||||
'weight': np.linspace(0.01, 0.15, 15)}] | |||||
elif kernel_name == 'Marginalized': | |||||
from gklearn.kernels.marginalizedKernel import marginalizedkernel | |||||
estimator = marginalizedkernel | |||||
param_grid_precomputed = {'p_quit': np.linspace(0.1, 0.9, 9), | |||||
'n_iteration': np.linspace(1, 19, 7), | |||||
'remove_totters': [False]} | |||||
elif kernel_name == 'SylvesterEquation': | |||||
from gklearn.kernels.randomWalkKernel import randomwalkkernel | |||||
estimator = randomwalkkernel | |||||
param_grid_precomputed = {'compute_method': ['sylvester'], | |||||
# 'weight': np.linspace(0.01, 0.10, 10)} | |||||
'weight': np.logspace(-1, -10, num=10, base=10)} | |||||
elif kernel_name == 'ConjugateGradient': | |||||
from gklearn.kernels.randomWalkKernel import randomwalkkernel | |||||
estimator = randomwalkkernel | |||||
from gklearn.utils.kernels import deltakernel, gaussiankernel, kernelproduct | |||||
import functools | |||||
mixkernel = functools.partial(kernelproduct, deltakernel, gaussiankernel) | |||||
sub_kernel = {'symb': deltakernel, 'nsymb': gaussiankernel, 'mix': mixkernel} | |||||
param_grid_precomputed = {'compute_method': ['conjugate'], | |||||
'node_kernels': [sub_kernel], 'edge_kernels': [sub_kernel], | |||||
'weight': np.logspace(-1, -10, num=10, base=10)} | |||||
elif kernel_name == 'FixedPoint': | |||||
from gklearn.kernels.randomWalkKernel import randomwalkkernel | |||||
estimator = randomwalkkernel | |||||
from gklearn.utils.kernels import deltakernel, gaussiankernel, kernelproduct | |||||
import functools | |||||
mixkernel = functools.partial(kernelproduct, deltakernel, gaussiankernel) | |||||
sub_kernel = {'symb': deltakernel, 'nsymb': gaussiankernel, 'mix': mixkernel} | |||||
param_grid_precomputed = {'compute_method': ['fp'], | |||||
'node_kernels': [sub_kernel], 'edge_kernels': [sub_kernel], | |||||
'weight': np.logspace(-4, -10, num=7, base=10)} | |||||
elif kernel_name == 'SpectralDecomposition': | |||||
from gklearn.kernels.randomWalkKernel import randomwalkkernel | |||||
estimator = randomwalkkernel | |||||
param_grid_precomputed = {'compute_method': ['spectral'], | |||||
'weight': np.logspace(-1, -10, num=10, base=10), | |||||
'sub_kernel': ['geo', 'exp']} | |||||
elif kernel_name == 'ShortestPath': | |||||
from gklearn.kernels.spKernel import spkernel | |||||
estimator = spkernel | |||||
from gklearn.utils.kernels import deltakernel, gaussiankernel, kernelproduct | |||||
import functools | |||||
mixkernel = functools.partial(kernelproduct, deltakernel, gaussiankernel) | |||||
sub_kernel = {'symb': deltakernel, 'nsymb': gaussiankernel, 'mix': mixkernel} | |||||
param_grid_precomputed = {'node_kernels': [sub_kernel]} | |||||
elif kernel_name == 'StructuralSP': | |||||
from gklearn.kernels.structuralspKernel import structuralspkernel | |||||
estimator = structuralspkernel | |||||
from gklearn.utils.kernels import deltakernel, gaussiankernel, kernelproduct | |||||
import functools | |||||
mixkernel = functools.partial(kernelproduct, deltakernel, gaussiankernel) | |||||
sub_kernel = {'symb': deltakernel, 'nsymb': gaussiankernel, 'mix': mixkernel} | |||||
param_grid_precomputed = {'node_kernels': [sub_kernel], 'edge_kernels': [sub_kernel], | |||||
'compute_method': ['naive']} | |||||
elif kernel_name == 'PathUpToH': | |||||
from gklearn.kernels.untilHPathKernel import untilhpathkernel | |||||
estimator = untilhpathkernel | |||||
param_grid_precomputed = {'depth': np.linspace(1, 10, 10), # [2], | |||||
'k_func': ['MinMax', 'tanimoto'], # ['MinMax'], # | |||||
'compute_method': ['trie']} # ['MinMax']} | |||||
elif kernel_name == 'Treelet': | |||||
from gklearn.kernels.treeletKernel import treeletkernel | |||||
estimator = treeletkernel | |||||
from gklearn.utils.kernels import gaussiankernel, polynomialkernel | |||||
import functools | |||||
gkernels = [functools.partial(gaussiankernel, gamma=1 / ga) | |||||
# for ga in np.linspace(1, 10, 10)] | |||||
for ga in np.logspace(0, 10, num=11, base=10)] | |||||
pkernels = [functools.partial(polynomialkernel, d=d, c=c) for d in range(1, 11) | |||||
for c in np.logspace(0, 10, num=11, base=10)] | |||||
# pkernels = [functools.partial(polynomialkernel, d=1, c=1)] | |||||
param_grid_precomputed = {'sub_kernel': pkernels + gkernels} | |||||
# 'parallel': [None]} | |||||
elif kernel_name == 'WLSubtree': | |||||
from gklearn.kernels.weisfeilerLehmanKernel import weisfeilerlehmankernel | |||||
estimator = weisfeilerlehmankernel | |||||
param_grid_precomputed = {'base_kernel': ['subtree'], | |||||
'height': np.linspace(0, 10, 11)} | |||||
param_grid = {'C': np.logspace(-10, 4, num=29, base=10)} | |||||
if param_grid is None: | |||||
param_grid = {'C': np.logspace(-10, 10, num=41, base=10)} | |||||
results = model_selection_for_precomputed_kernel( | |||||
graphs, | |||||
estimator, | |||||
param_grid_precomputed, | |||||
param_grid, | |||||
'classification', | |||||
NUM_TRIALS=28, | |||||
datafile_y=targets, | |||||
extra_params=None, | |||||
ds_name=ds_name, | |||||
output_dir=output_dir, | |||||
n_jobs=n_jobs, | |||||
read_gm_from_file=False, | |||||
verbose=True) | |||||
return results[0], results[1] |
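# --- A minimal usage sketch of the two helpers above (parameter values are
# illustrative assumptions, not a definitive recipe):
if __name__ == '__main__':
    from gklearn.utils.graph_synthesizer import GraphSynthesizer
    graphs = GraphSynthesizer().unified_graphs(num_graphs=50, num_nodes=20,
        num_edges=40, num_node_labels=0, num_edge_labels=0, seed=None,
        directed=False)
    # Gram matrix of the path kernel up to length 5, on 4 workers.
    gram_matrix, run_time = compute_graph_kernel(graphs, 'PathUpToH', n_jobs=4)
    # Two toy classes; cross_validate grid-searches kernel parameters and SVM C.
    targets = [0] * 25 + [1] * 25
    accuracy, confidence = cross_validate(graphs, targets, 'PathUpToH',
        ds_name='demo', output_dir='outputs/')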
@@ -0,0 +1,2 @@ | |||||
from gklearn.ged.edit_costs.edit_cost import EditCost | |||||
from gklearn.ged.edit_costs.constant import Constant |
@@ -0,0 +1,50 @@ | |||||
#!/usr/bin/env python3 | |||||
# -*- coding: utf-8 -*- | |||||
""" | |||||
Created on Wed Jun 17 17:52:23 2020 | |||||
@author: ljia | |||||
""" | |||||
from gklearn.ged.edit_costs import EditCost | |||||
class Constant(EditCost): | |||||
"""Implements constant edit cost functions. | |||||
""" | |||||
def __init__(self, node_ins_cost=1, node_del_cost=1, node_rel_cost=1, edge_ins_cost=1, edge_del_cost=1, edge_rel_cost=1): | |||||
self._node_ins_cost = node_ins_cost | |||||
self._node_del_cost = node_del_cost | |||||
self._node_rel_cost = node_rel_cost | |||||
self._edge_ins_cost = edge_ins_cost | |||||
self._edge_del_cost = edge_del_cost | |||||
self._edge_rel_cost = edge_rel_cost | |||||
def node_ins_cost_fun(self, node_label): | |||||
return self._node_ins_cost | |||||
def node_del_cost_fun(self, node_label): | |||||
return self._node_del_cost | |||||
def node_rel_cost_fun(self, node_label_1, node_label_2): | |||||
if node_label_1 != node_label_2: | |||||
return self._node_rel_cost | |||||
return 0 | |||||
def edge_ins_cost_fun(self, edge_label): | |||||
return self._edge_ins_cost | |||||
def edge_del_cost_fun(self, edge_label): | |||||
return self._edge_del_cost | |||||
def edge_rel_cost_fun(self, edge_label_1, edge_label_2): | |||||
if edge_label_1 != edge_label_2: | |||||
return self._edge_rel_cost | |||||
return 0 |
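# --- A short sketch of how the Constant cost functions behave (grounded in
# the class above; the constants mirror the [3, 3, 1, 3, 3, 1] passed to the
# 'CONSTANT' edit costs elsewhere in this document):
if __name__ == '__main__':
    costs = Constant(node_ins_cost=3, node_del_cost=3, node_rel_cost=1,
                     edge_ins_cost=3, edge_del_cost=3, edge_rel_cost=1)
    assert costs.node_ins_cost_fun('C') == 3       # constant; the label is ignored
    assert costs.node_rel_cost_fun('C', 'C') == 0  # identical labels cost nothing
    assert costs.node_rel_cost_fun('C', 'N') == 1  # any change costs node_rel_cost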
@@ -0,0 +1,88 @@ | |||||
#!/usr/bin/env python3 | |||||
# -*- coding: utf-8 -*- | |||||
""" | |||||
Created on Wed Jun 17 17:49:24 2020 | |||||
@author: ljia | |||||
""" | |||||
class EditCost(object): | |||||
def __init__(self): | |||||
pass | |||||
def node_ins_cost_fun(self, node_label): | |||||
""" | |||||
/*! | |||||
* @brief Node insertion cost function.
* @param[in] node_label A node label. | |||||
* @return The cost of inserting a node with label @p node_label. | |||||
* @note Must be implemented by derived classes of ged::EditCosts. | |||||
*/ | |||||
""" | |||||
return 0 | |||||
def node_del_cost_fun(self, node_label): | |||||
""" | |||||
/*! | |||||
* @brief Node deletion cost function. | |||||
* @param[in] node_label A node label. | |||||
* @return The cost of deleting a node with label @p node_label. | |||||
* @note Must be implemented by derived classes of ged::EditCosts. | |||||
*/ | |||||
""" | |||||
return 0 | |||||
def node_rel_cost_fun(self, node_label_1, node_label_2): | |||||
""" | |||||
/*! | |||||
* @brief Node relabeling cost function. | |||||
* @param[in] node_label_1 A node label. | |||||
* @param[in] node_label_2 A node label. | |||||
* @return The cost of changing a node's label from @p node_label_1 to @p node_label_2. | |||||
* @note Must be implemented by derived classes of ged::EditCosts. | |||||
*/ | |||||
""" | |||||
return 0 | |||||
def edge_ins_cost_fun(self, edge_label): | |||||
""" | |||||
/*! | |||||
* @brief Edge insertion cost function. | |||||
* @param[in] edge_label An edge label. | |||||
* @return The cost of inserting an edge with label @p edge_label. | |||||
* @note Must be implemented by derived classes of ged::EditCosts. | |||||
*/ | |||||
""" | |||||
return 0 | |||||
def edge_del_cost_fun(self, edge_label): | |||||
""" | |||||
/*! | |||||
* @brief Edge deletion cost function. | |||||
* @param[in] edge_label An edge label. | |||||
* @return The cost of deleting an edge with label @p edge_label. | |||||
* @note Must be implemented by derived classes of ged::EditCosts. | |||||
*/ | |||||
""" | |||||
return 0 | |||||
def edge_rel_cost_fun(self, edge_label_1, edge_label_2): | |||||
""" | |||||
/*! | |||||
* @brief Edge relabeling cost function. | |||||
* @param[in] edge_label_1 An edge label. | |||||
* @param[in] edge_label_2 An edge label. | |||||
* @return The cost of changing an edge's label from @p edge_label_1 to @p edge_label_2. | |||||
* @note Must be implemented by derived classes of ged::EditCosts. | |||||
*/ | |||||
""" | |||||
return 0 |
@@ -0,0 +1,4 @@ | |||||
from gklearn.ged.env.common_types import Options, OptionsStringMap, AlgorithmState | |||||
from gklearn.ged.env.ged_data import GEDData | |||||
from gklearn.ged.env.ged_env import GEDEnv | |||||
from gklearn.ged.env.node_map import NodeMap |
@@ -0,0 +1,159 @@ | |||||
#!/usr/bin/env python3 | |||||
# -*- coding: utf-8 -*- | |||||
""" | |||||
Created on Thu Mar 19 18:17:38 2020 | |||||
@author: ljia | |||||
""" | |||||
from enum import Enum, unique | |||||
class Options(object): | |||||
"""Contains enums for options employed by ged::GEDEnv. | |||||
""" | |||||
@unique | |||||
class GEDMethod(Enum): | |||||
"""Selects the method. | |||||
""" | |||||
# @todo: what is this? #ifdef GUROBI | |||||
F1 = 1 # Selects ged::F1. | |||||
F2 = 2 # Selects ged::F2. | |||||
COMPACT_MIP = 3 # Selects ged::CompactMIP. | |||||
BLP_NO_EDGE_LABELS = 4 # Selects ged::BLPNoEdgeLabels. | |||||
#endif /* GUROBI */ | |||||
BRANCH = 5 # Selects ged::Branch. | |||||
BRANCH_FAST = 6 # Selects ged::BranchFast. | |||||
BRANCH_TIGHT = 7 # Selects ged::BranchTight. | |||||
BRANCH_UNIFORM = 8 # Selects ged::BranchUniform. | |||||
BRANCH_COMPACT = 9 # Selects ged::BranchCompact. | |||||
PARTITION = 10 # Selects ged::Partition. | |||||
HYBRID = 11 # Selects ged::Hybrid. | |||||
RING = 12 # Selects ged::Ring. | |||||
ANCHOR_AWARE_GED = 13 # Selects ged::AnchorAwareGED. | |||||
WALKS = 14 # Selects ged::Walks. | |||||
IPFP = 15 # Selects ged::IPFP | |||||
BIPARTITE = 16 # Selects ged::Bipartite. | |||||
SUBGRAPH = 17 # Selects ged::Subgraph. | |||||
NODE = 18 # Selects ged::Node. | |||||
RING_ML = 19 # Selects ged::RingML. | |||||
BIPARTITE_ML = 20 # Selects ged::BipartiteML. | |||||
REFINE = 21 # Selects ged::Refine. | |||||
BP_BEAM = 22 # Selects ged::BPBeam. | |||||
SIMULATED_ANNEALING = 23 # Selects ged::SimulatedAnnealing. | |||||
HED = 24 # Selects ged::HED. | |||||
STAR = 25 # Selects ged::Star. | |||||
@unique | |||||
class EditCosts(Enum): | |||||
"""Selects the edit costs. | |||||
""" | |||||
CHEM_1 = 1 # Selects ged::CHEM1. | |||||
CHEM_2 = 2 # Selects ged::CHEM2. | |||||
CMU = 3 # Selects ged::CMU. | |||||
GREC_1 = 4 # Selects ged::GREC1. | |||||
GREC_2 = 5 # Selects ged::GREC2. | |||||
PROTEIN = 6 # Selects ged::Protein. | |||||
FINGERPRINT = 7 # Selects ged::Fingerprint. | |||||
LETTER = 8 # Selects ged::Letter. | |||||
LETTER2 = 9 # Selects ged:Letter2. | |||||
NON_SYMBOLIC = 10 # Selects ged:NonSymbolic. | |||||
CONSTANT = 11 # Selects ged::Constant. | |||||
@unique | |||||
class InitType(Enum): | |||||
"""@brief Selects the initialization type of the environment. | |||||
* @details If eager initialization is selected, all edit costs are pre-computed when initializing the environment. | |||||
* Otherwise, they are computed at runtime. If initialization with shuffled copies is selected, shuffled copies of | |||||
* all graphs are created. These copies are used when calling ged::GEDEnv::run_method() with two identical graph IDs. | |||||
* In this case, one of the IDs is internally replaced by the ID of the shuffled copy and the graph is hence | |||||
* compared to an isomorphic but non-identical graph. If initialization without shuffled copies is selected, no shuffled copies | |||||
* are created and calling ged::GEDEnv::run_method() with two identical graph IDs amounts to comparing a graph to itself. | |||||
""" | |||||
LAZY_WITHOUT_SHUFFLED_COPIES = 1 # Lazy initialization, no shuffled graph copies are constructed. | |||||
EAGER_WITHOUT_SHUFFLED_COPIES = 2 # Eager initialization, no shuffled graph copies are constructed. | |||||
LAZY_WITH_SHUFFLED_COPIES = 3 # Lazy initialization, shuffled graph copies are constructed. | |||||
EAGER_WITH_SHUFFLED_COPIES = 4 # Eager initialization, shuffled graph copies are constructed. | |||||
@unique | |||||
class AlgorithmState(Enum): | |||||
"""can be used to specify the state of an algorithm. | |||||
""" | |||||
CALLED = 1 # The algorithm has been called. | |||||
INITIALIZED = 2 # The algorithm has been initialized. | |||||
CONVERGED = 3 # The algorithm has converged. | |||||
TERMINATED = 4 # The algorithm has terminated. | |||||
class OptionsStringMap(object): | |||||
# Map of available computation methods between enum type and string. | |||||
GEDMethod = { | |||||
"BRANCH": Options.GEDMethod.BRANCH, | |||||
"BRANCH_FAST": Options.GEDMethod.BRANCH_FAST, | |||||
"BRANCH_TIGHT": Options.GEDMethod.BRANCH_TIGHT, | |||||
"BRANCH_UNIFORM": Options.GEDMethod.BRANCH_UNIFORM, | |||||
"BRANCH_COMPACT": Options.GEDMethod.BRANCH_COMPACT, | |||||
"PARTITION": Options.GEDMethod.PARTITION, | |||||
"HYBRID": Options.GEDMethod.HYBRID, | |||||
"RING": Options.GEDMethod.RING, | |||||
"ANCHOR_AWARE_GED": Options.GEDMethod.ANCHOR_AWARE_GED, | |||||
"WALKS": Options.GEDMethod.WALKS, | |||||
"IPFP": Options.GEDMethod.IPFP, | |||||
"BIPARTITE": Options.GEDMethod.BIPARTITE, | |||||
"SUBGRAPH": Options.GEDMethod.SUBGRAPH, | |||||
"NODE": Options.GEDMethod.NODE, | |||||
"RING_ML": Options.GEDMethod.RING_ML, | |||||
"BIPARTITE_ML": Options.GEDMethod.BIPARTITE_ML, | |||||
"REFINE": Options.GEDMethod.REFINE, | |||||
"BP_BEAM": Options.GEDMethod.BP_BEAM, | |||||
"SIMULATED_ANNEALING": Options.GEDMethod.SIMULATED_ANNEALING, | |||||
"HED": Options.GEDMethod.HED, | |||||
"STAR": Options.GEDMethod.STAR, | |||||
# ifdef GUROBI | |||||
"F1": Options.GEDMethod.F1, | |||||
"F2": Options.GEDMethod.F2, | |||||
"COMPACT_MIP": Options.GEDMethod.COMPACT_MIP, | |||||
"BLP_NO_EDGE_LABELS": Options.GEDMethod.BLP_NO_EDGE_LABELS | |||||
} | |||||
# Map of available edit cost functions between enum type and string. | |||||
EditCosts = { | |||||
"CHEM_1": Options.EditCosts.CHEM_1, | |||||
"CHEM_2": Options.EditCosts.CHEM_2, | |||||
"CMU": Options.EditCosts.CMU, | |||||
"GREC_1": Options.EditCosts.GREC_1, | |||||
"GREC_2": Options.EditCosts.GREC_2, | |||||
"LETTER": Options.EditCosts.LETTER, | |||||
"LETTER2": Options.EditCosts.LETTER2, | |||||
"NON_SYMBOLIC": Options.EditCosts.NON_SYMBOLIC, | |||||
"FINGERPRINT": Options.EditCosts.FINGERPRINT, | |||||
"PROTEIN": Options.EditCosts.PROTEIN, | |||||
"CONSTANT": Options.EditCosts.CONSTANT | |||||
} | |||||
# Map of available initialization types of the environment between enum type and string. | |||||
InitType = { | |||||
"LAZY_WITHOUT_SHUFFLED_COPIES": Options.InitType.LAZY_WITHOUT_SHUFFLED_COPIES, | |||||
"EAGER_WITHOUT_SHUFFLED_COPIES": Options.InitType.EAGER_WITHOUT_SHUFFLED_COPIES, | |||||
"LAZY_WITH_SHUFFLED_COPIES": Options.InitType.LAZY_WITH_SHUFFLED_COPIES, | |||||
"LAZY_WITH_SHUFFLED_COPIES": Options.InitType.LAZY_WITH_SHUFFLED_COPIES | |||||
} | |||||
@unique | |||||
class AlgorithmState(Enum): | |||||
"""can be used to specify the state of an algorithm. | |||||
""" | |||||
CALLED = 1 # The algorithm has been called. | |||||
INITIALIZED = 2 # The algorithm has been initialized. | |||||
CONVERGED = 3 # The algorithm has converged. | |||||
TERMINATED = 4 # The algorithm has terminated. | |||||
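# --- A small sketch (derived from the maps above) of how string options are
# resolved to enums, e.g. when GEDEnv.init() receives a string:
if __name__ == '__main__':
    init_type = OptionsStringMap.InitType['LAZY_WITHOUT_SHUFFLED_COPIES']
    assert init_type == Options.InitType.LAZY_WITHOUT_SHUFFLED_COPIES
    # With a *_WITH_SHUFFLED_COPIES type, run_method(i, i) compares a graph
    # to an isomorphic shuffled copy instead of to itself (see InitType docs).
    assert OptionsStringMap.GEDMethod['BIPARTITE'] == Options.GEDMethod.BIPARTITE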
@@ -0,0 +1,249 @@ | |||||
#!/usr/bin/env python3 | |||||
# -*- coding: utf-8 -*- | |||||
""" | |||||
Created on Wed Jun 17 15:05:01 2020 | |||||
@author: ljia | |||||
""" | |||||
from gklearn.ged.env import Options, OptionsStringMap | |||||
from gklearn.ged.edit_costs import Constant # note: CHEM1, Letter, Letter2 and NonSymbolic, referenced in _set_edit_cost() below, must also be importable from gklearn.ged.edit_costs before those cost types can be selected.
from gklearn.utils import SpecialLabel, dummy_node | |||||
class GEDData(object): | |||||
def __init__(self): | |||||
self._graphs = [] | |||||
self._graph_names = [] | |||||
self._graph_classes = [] | |||||
self._num_graphs_without_shuffled_copies = 0 | |||||
self._strings_to_internal_node_ids = [] | |||||
self._internal_node_ids_to_strings = [] | |||||
self._edit_cost = None | |||||
self._node_costs = None | |||||
self._edge_costs = None | |||||
self._node_label_costs = None | |||||
self._edge_label_costs = None | |||||
self._node_labels = [] | |||||
self._edge_labels = [] | |||||
self._init_type = Options.InitType.EAGER_WITHOUT_SHUFFLED_COPIES | |||||
self._delete_edit_cost = True | |||||
self._max_num_nodes = 0 | |||||
self._max_num_edges = 0 | |||||
def num_graphs(self): | |||||
""" | |||||
/*! | |||||
* @brief Returns the number of graphs. | |||||
* @return Number of graphs in the instance. | |||||
*/ | |||||
""" | |||||
return len(self._graphs) | |||||
def graph(self, graph_id): | |||||
""" | |||||
/*! | |||||
* @brief Provides access to a graph. | |||||
* @param[in] graph_id The ID of the graph. | |||||
* @return Constant reference to the graph with ID @p graph_id. | |||||
*/ | |||||
""" | |||||
return self._graphs[graph_id] | |||||
def shuffled_graph_copies_available(self): | |||||
""" | |||||
/*! | |||||
* @brief Checks if shuffled graph copies are available. | |||||
* @return Boolean @p true if shuffled graph copies are available. | |||||
*/ | |||||
""" | |||||
return (self._init_type == Options.InitType.EAGER_WITH_SHUFFLED_COPIES or self._init_type == Options.InitType.LAZY_WITH_SHUFFLED_COPIES) | |||||
def num_graphs_without_shuffled_copies(self): | |||||
""" | |||||
/*! | |||||
* @brief Returns the number of graphs in the instance without the shuffled copies. | |||||
* @return Number of graphs without shuffled copies contained in the instance. | |||||
*/ | |||||
""" | |||||
return self._num_graphs_without_shuffled_copies | |||||
def node_cost(self, label1, label2): | |||||
""" | |||||
/*! | |||||
* @brief Returns node relabeling, insertion, or deletion cost. | |||||
* @param[in] label1 First node label. | |||||
* @param[in] label2 Second node label. | |||||
* @return Node relabeling cost if @p label1 and @p label2 are both different from ged::dummy_label(), | |||||
* node insertion cost if @p label1 equals ged::dummy_label and @p label2 does not, | |||||
* node deletion cost if @p label1 does not equal ged::dummy_label and @p label2 does, | |||||
* and 0 otherwise. | |||||
*/ | |||||
""" | |||||
if self._node_label_costs is None: | |||||
if self._eager_init(): # @todo: check if correct | |||||
return self._node_costs[label1, label2] | |||||
if label1 == label2: | |||||
return 0 | |||||
if label1 == SpecialLabel.DUMMY: # @todo: check dummy | |||||
return self._edit_cost.node_ins_cost_fun(label2) # self._node_labels[label2 - 1]) # @todo: check | |||||
if label2 == SpecialLabel.DUMMY: # @todo: check dummy | |||||
return self._edit_cost.node_del_cost_fun(label1) # self._node_labels[label1 - 1]) | |||||
return self._edit_cost.node_rel_cost_fun(label1, label2) # self._node_labels[label1 - 1], self._node_labels[label2 - 1]) | |||||
# use pre-computed node label costs. | |||||
else: | |||||
id1 = 0 if label1 == SpecialLabel.DUMMY else self._node_label_to_id(label1) # @todo: this is slow. | |||||
id2 = 0 if label2 == SpecialLabel.DUMMY else self._node_label_to_id(label2) | |||||
return self._node_label_costs[id1, id2] | |||||
def edge_cost(self, label1, label2): | |||||
""" | |||||
/*! | |||||
* @brief Returns edge relabeling, insertion, or deletion cost. | |||||
* @param[in] label1 First edge label. | |||||
* @param[in] label2 Second edge label. | |||||
* @return Edge relabeling cost if @p label1 and @p label2 are both different from ged::dummy_label(), | |||||
* edge insertion cost if @p label1 equals ged::dummy_label and @p label2 does not, | |||||
* edge deletion cost if @p label1 does not equal ged::dummy_label and @p label2 does, | |||||
* and 0 otherwise. | |||||
*/ | |||||
""" | |||||
if self._edge_label_costs is None: | |||||
if self._eager_init(): # @todo: check if correct | |||||
return self._edge_costs[label1, label2]
if label1 == label2: | |||||
return 0 | |||||
if label1 == SpecialLabel.DUMMY: | |||||
return self._edit_cost.edge_ins_cost_fun(label2) # self._edge_labels[label2 - 1]) | |||||
if label2 == SpecialLabel.DUMMY: | |||||
return self._edit_cost.edge_del_cost_fun(label1) # self._edge_labels[label1 - 1]) | |||||
return self._edit_cost.edge_rel_cost_fun(label1, label2) # self._edge_labels[label1 - 1], self._edge_labels[label2 - 1]) | |||||
# use pre-computed edge label costs. | |||||
else: | |||||
id1 = 0 if label1 == SpecialLabel.DUMMY else self._edge_label_to_id(label1) # @todo: this is slow. | |||||
id2 = 0 if label2 == SpecialLabel.DUMMY else self._edge_label_to_id(label2) | |||||
return self._edge_label_costs[id1, id2] | |||||
def compute_induced_cost(self, g, h, node_map): | |||||
""" | |||||
/*! | |||||
* @brief Computes the edit cost between two graphs induced by a node map. | |||||
* @param[in] g Input graph. | |||||
* @param[in] h Input graph. | |||||
* @param[in,out] node_map Node map whose induced edit cost is to be computed. | |||||
*/ | |||||
""" | |||||
cost = 0 | |||||
# collect node costs | |||||
for node in g.nodes(): | |||||
image = node_map.image(node) | |||||
label2 = (SpecialLabel.DUMMY if image == dummy_node() else h.nodes[image]['label']) | |||||
cost += self.node_cost(g.nodes[node]['label'], label2) | |||||
for node in h.nodes(): | |||||
pre_image = node_map.pre_image(node) | |||||
if pre_image == dummy_node(): | |||||
cost += self.node_cost(SpecialLabel.DUMMY, h.nodes[node]['label']) | |||||
# collect edge costs | |||||
for (n1, n2) in g.edges(): | |||||
image1 = node_map.image(n1) | |||||
image2 = node_map.image(n2) | |||||
label2 = (h.edges[(image2, image1)]['label'] if h.has_edge(image2, image1) else SpecialLabel.DUMMY) | |||||
cost += self.edge_cost(g.edges[(n1, n2)]['label'], label2) | |||||
for (n1, n2) in h.edges(): | |||||
if not g.has_edge(node_map.pre_image(n2), node_map.pre_image(n1)): | |||||
cost += self.edge_cost(SpecialLabel.DUMMY, h.edges[(n1, n2)]['label']) | |||||
node_map.set_induced_cost(cost) | |||||
def _set_edit_cost(self, edit_cost, edit_cost_constants): | |||||
if self._delete_edit_cost: | |||||
self._edit_cost = None | |||||
if isinstance(edit_cost, str): | |||||
edit_cost = OptionsStringMap.EditCosts[edit_cost] | |||||
if edit_cost == Options.EditCosts.CHEM_1: | |||||
if len(edit_cost_constants) == 4: | |||||
self._edit_cost = CHEM1(edit_cost_constants[0], edit_cost_constants[1], edit_cost_constants[2], edit_cost_constants[3]) | |||||
elif len(edit_cost_constants) == 0: | |||||
self._edit_cost = CHEM1() | |||||
else: | |||||
raise Exception('Wrong number of constants for selected edit costs Options::EditCosts::CHEM_1. Expected: 4 or 0; actual:', len(edit_cost_constants), '.') | |||||
elif edit_cost == Options.EditCosts.LETTER: | |||||
if len(edit_cost_constants) == 3: | |||||
self._edit_cost = Letter(edit_cost_constants[0], edit_cost_constants[1], edit_cost_constants[2]) | |||||
elif len(edit_cost_constants) == 0: | |||||
self._edit_cost = Letter() | |||||
else: | |||||
raise Exception('Wrong number of constants for selected edit costs Options::EditCosts::LETTER. Expected: 3 or 0; actual:', len(edit_cost_constants), '.') | |||||
elif edit_cost == Options.EditCosts.LETTER2: | |||||
if len(edit_cost_constants) == 5: | |||||
self._edit_cost = Letter2(edit_cost_constants[0], edit_cost_constants[1], edit_cost_constants[2], edit_cost_constants[3], edit_cost_constants[4]) | |||||
elif len(edit_cost_constants) == 0: | |||||
self._edit_cost = Letter2() | |||||
else: | |||||
raise Exception('Wrong number of constants for selected edit costs Options::EditCosts::LETTER2. Expected: 5 or 0; actual:', len(edit_cost_constants), '.') | |||||
elif edit_cost == Options.EditCosts.NON_SYMBOLIC: | |||||
if len(edit_cost_constants) == 6: | |||||
self._edit_cost = NonSymbolic(edit_cost_constants[0], edit_cost_constants[1], edit_cost_constants[2], edit_cost_constants[3], edit_cost_constants[4], edit_cost_constants[5]) | |||||
elif len(edit_cost_constants) == 0: | |||||
self._edit_cost = NonSymbolic() | |||||
else: | |||||
raise Exception('Wrong number of constants for selected edit costs Options::EditCosts::NON_SYMBOLIC. Expected: 6 or 0; actual:', len(edit_cost_constants), '.') | |||||
elif edit_cost == Options.EditCosts.CONSTANT: | |||||
if len(edit_cost_constants) == 6: | |||||
self._edit_cost = Constant(edit_cost_constants[0], edit_cost_constants[1], edit_cost_constants[2], edit_cost_constants[3], edit_cost_constants[4], edit_cost_constants[5]) | |||||
elif len(edit_cost_constants) == 0: | |||||
self._edit_cost = Constant() | |||||
else: | |||||
raise Exception('Wrong number of constants for selected edit costs Options::EditCosts::CONSTANT. Expected: 6 or 0; actual:', len(edit_cost_constants), '.') | |||||
self._delete_edit_cost = True | |||||
def id_to_node_label(self, label_id): | |||||
if label_id > len(self._node_labels) or label_id == 0: | |||||
raise Exception('Invalid node label ID', str(label_id), '.') | |||||
return self._node_labels[label_id - 1] | |||||
def _node_label_to_id(self, node_label): | |||||
n_id = 0 | |||||
for n_l in self._node_labels: | |||||
if n_l == node_label: | |||||
return n_id + 1 | |||||
n_id += 1 | |||||
self._node_labels.append(node_label) | |||||
return n_id + 1 | |||||
def id_to_edge_label(self, label_id): | |||||
if label_id > len(self._edge_labels) or label_id == 0: | |||||
raise Exception('Invalid edge label ID', str(label_id), '.') | |||||
return self._edge_labels[label_id - 1] | |||||
def _edge_label_to_id(self, edge_label): | |||||
e_id = 0 | |||||
for e_l in self._edge_labels: | |||||
if e_l == edge_label: | |||||
return e_id + 1 | |||||
e_id += 1 | |||||
self._edge_labels.append(edge_label) | |||||
return e_id + 1 | |||||
def _eager_init(self): | |||||
return (self._init_type == Options.InitType.EAGER_WITHOUT_SHUFFLED_COPIES or self._init_type == Options.InitType.EAGER_WITH_SHUFFLED_COPIES) |
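# --- A sketch of the node_cost() convention above, using the Constant costs
# from elsewhere in this document (private attributes are touched directly
# here purely for illustration): DUMMY as the first label means insertion,
# DUMMY as the second means deletion, anything else is a relabeling.
# compute_induced_cost() sums exactly these terms over a node map.
if __name__ == '__main__':
    data = GEDData()
    data._init_type = Options.InitType.LAZY_WITHOUT_SHUFFLED_COPIES  # avoid the eager branch
    data._set_edit_cost('CONSTANT', [3, 3, 1, 3, 3, 1])
    label_a = (('atom', 0),)
    label_b = (('atom', 1),)
    assert data.node_cost(SpecialLabel.DUMMY, label_a) == 3  # insertion
    assert data.node_cost(label_a, SpecialLabel.DUMMY) == 3  # deletion
    assert data.node_cost(label_a, label_b) == 1             # relabeling
    assert data.node_cost(label_a, label_a) == 0             # identical labels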
@@ -0,0 +1,733 @@ | |||||
#!/usr/bin/env python3 | |||||
# -*- coding: utf-8 -*- | |||||
""" | |||||
Created on Wed Jun 17 12:02:36 2020 | |||||
@author: ljia | |||||
""" | |||||
import numpy as np | |||||
import networkx as nx | |||||
from gklearn.ged.env import Options, OptionsStringMap | |||||
from gklearn.ged.env import GEDData | |||||
class GEDEnv(object): | |||||
def __init__(self): | |||||
self._initialized = False | |||||
self._new_graph_ids = [] | |||||
self._ged_data = GEDData() | |||||
# Variables needed for approximating ged_instance_. | |||||
self._lower_bounds = {} | |||||
self._upper_bounds = {} | |||||
self._runtimes = {} | |||||
self._node_maps = {} | |||||
self._original_to_internal_node_ids = [] | |||||
self._internal_to_original_node_ids = [] | |||||
self._ged_method = None | |||||
def set_edit_cost(self, edit_cost, edit_cost_constants=[]): | |||||
""" | |||||
/*! | |||||
* @brief Sets the edit costs to one of the predefined edit costs. | |||||
* @param[in] edit_costs Select one of the predefined edit costs. | |||||
* @param[in] edit_cost_constants Constants passed to the constructor of the edit cost class selected by @p edit_costs. | |||||
*/ | |||||
""" | |||||
self._ged_data._set_edit_cost(edit_cost, edit_cost_constants) | |||||
def add_graph(self, graph_name='', graph_class=''): | |||||
""" | |||||
/*! | |||||
* @brief Adds a new uninitialized graph to the environment. Call init() after calling this method. | |||||
* @param[in] graph_name The name of the added graph. Empty if not specified. | |||||
* @param[in] graph_class The class of the added graph. Empty if not specified. | |||||
* @return The ID of the newly added graph. | |||||
*/ | |||||
""" | |||||
# @todo: graphs are not uninitialized. | |||||
self._initialized = False | |||||
graph_id = self._ged_data._num_graphs_without_shuffled_copies | |||||
self._ged_data._num_graphs_without_shuffled_copies += 1 | |||||
self._new_graph_ids.append(graph_id) | |||||
self._ged_data._graphs.append(nx.Graph()) | |||||
self._ged_data._graph_names.append(graph_name) | |||||
self._ged_data._graph_classes.append(graph_class) | |||||
self._original_to_internal_node_ids.append({}) | |||||
self._internal_to_original_node_ids.append({}) | |||||
self._ged_data._strings_to_internal_node_ids.append({}) | |||||
self._ged_data._internal_node_ids_to_strings.append({}) | |||||
return graph_id | |||||
def clear_graph(self, graph_id): | |||||
""" | |||||
/*! | |||||
* @brief Clears and de-initializes a graph that has previously been added to the environment. Call init() after calling this method. | |||||
* @param[in] graph_id ID of graph that has to be cleared. | |||||
*/ | |||||
""" | |||||
if graph_id >= self._ged_data.num_graphs_without_shuffled_copies():
raise Exception('The graph', self.get_graph_name(graph_id), 'has not been added to the environment.') | |||||
self._ged_data._graphs[graph_id].clear() | |||||
self._original_to_internal_node_ids[graph_id].clear() | |||||
self._internal_to_original_node_ids[graph_id].clear() | |||||
self._ged_data._strings_to_internal_node_ids[graph_id].clear() | |||||
self._ged_data._internal_node_ids_to_strings[graph_id].clear() | |||||
self._initialized = False | |||||
def add_node(self, graph_id, node_id, node_label): | |||||
""" | |||||
/*! | |||||
* @brief Adds a labeled node. | |||||
* @param[in] graph_id ID of graph that has been added to the environment. | |||||
* @param[in] node_id The user-specific ID of the vertex that has to be added. | |||||
* @param[in] node_label The label of the vertex that has to be added. Set to ged::NoLabel() if template parameter @p UserNodeLabel equals ged::NoLabel. | |||||
*/ | |||||
""" | |||||
# @todo: check ids. | |||||
self._initialized = False | |||||
internal_node_id = nx.number_of_nodes(self._ged_data._graphs[graph_id]) | |||||
self._ged_data._graphs[graph_id].add_node(internal_node_id, label=node_label) | |||||
self._original_to_internal_node_ids[graph_id][node_id] = internal_node_id | |||||
self._internal_to_original_node_ids[graph_id][internal_node_id] = node_id | |||||
self._ged_data._strings_to_internal_node_ids[graph_id][str(node_id)] = internal_node_id | |||||
self._ged_data._internal_node_ids_to_strings[graph_id][internal_node_id] = str(node_id) | |||||
label_id = self._ged_data._node_label_to_id(node_label)
# @todo: ged_data_.graphs_[graph_id].set_label | |||||
def add_edge(self, graph_id, nd_from, nd_to, edge_label, ignore_duplicates=True): | |||||
""" | |||||
/*! | |||||
* @brief Adds a labeled edge. | |||||
* @param[in] graph_id ID of graph that has been added to the environment. | |||||
* @param[in] tail The user-specific ID of the tail of the edge that has to be added. | |||||
* @param[in] head The user-specific ID of the head of the edge that has to be added. | |||||
* @param[in] edge_label The label of the edge that has to be added. Set to ged::NoLabel() if template parameter @p UserEdgeLabel equals ged::NoLabel.
* @param[in] ignore_duplicates If @p true, duplicate edges are ignored. Otherwise, an exception is thrown if an existing edge is added to the graph.
*/ | |||||
""" | |||||
# @todo: check everything. | |||||
self._initialized = False | |||||
# @todo: check ignore_duplicates. | |||||
self._ged_data._graphs[graph_id].add_edge(self._original_to_internal_node_ids[graph_id][nd_from], self._original_to_internal_node_ids[graph_id][nd_to], label=edge_label) | |||||
label_id = self._ged_data._edge_label_to_id(edge_label) | |||||
# @todo: ged_data_.graphs_[graph_id].set_label | |||||
def add_nx_graph(self, g, classe, ignore_duplicates=True):
"""
Adds a NetworkX graph to the environment. Be careful to respect the same format as GXL graphs for labelling nodes and edges.
:param g: The graph to add (networkx graph)
:param classe: The class of the added graph
:param ignore_duplicates: If True, duplicate edges are ignored; otherwise an error is raised when an existing edge is added. True by default
:type g: networkx.graph
:type classe: string
:type ignore_duplicates: bool
:return: The ID of the newly added graph
:rtype: size_t
.. note:: The NX graph must respect the GXL structure. Please see how a GXL graph is constructed.
""" | |||||
graph_id = self.add_graph(g.name, classe) # check if the graph name already exists. | |||||
for node in g.nodes: # @todo: if the keys of labels include int and str at the same time. | |||||
self.add_node(graph_id, node, tuple(sorted(g.nodes[node].items(), key=lambda kv: kv[0]))) | |||||
for edge in g.edges: | |||||
self.add_edge(graph_id, edge[0], edge[1], tuple(sorted(g.edges[(edge[0], edge[1])].items(), key=lambda kv: kv[0])), ignore_duplicates) | |||||
return graph_id | |||||
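# Note on the label format used above (illustrative example): node and edge
# attribute dicts are converted to tuples of sorted (key, value) pairs, so a
# node with attributes {'y': '2.3', 'x': '1.5'} is stored with the label
# (('x', '1.5'), ('y', '2.3')). This makes equal label dicts compare equal
# regardless of key order.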
def load_nx_graph(self, nx_graph, graph_id, graph_name='', graph_class=''): | |||||
""" | |||||
Loads NetworkX Graph into the GED environment. | |||||
Parameters | |||||
---------- | |||||
nx_graph : NetworkX Graph object | |||||
The graph that should be loaded. | |||||
graph_id : int or None
The ID of a graph contained in the environment (the existing graph will be overwritten), or `None` to add a new graph.
graph_name : string, optional
The name of the newly added graph. The default is ''. Has no effect unless `graph_id` is `None`.
graph_class : string, optional
The class of the newly added graph. The default is ''. Has no effect unless `graph_id` is `None`.
Returns | |||||
------- | |||||
int | |||||
The ID of the newly loaded graph. | |||||
""" | |||||
if graph_id is None: # @todo: undefined. | |||||
graph_id = self.add_graph(graph_name, graph_class) | |||||
else: | |||||
self.clear_graph(graph_id) | |||||
for node in nx_graph.nodes: | |||||
self.add_node(graph_id, node, tuple(sorted(nx_graph.nodes[node].items(), key=lambda kv: kv[0]))) | |||||
for edge in nx_graph.edges: | |||||
self.add_edge(graph_id, edge[0], edge[1], tuple(sorted(nx_graph.edges[(edge[0], edge[1])].items(), key=lambda kv: kv[0]))) | |||||
return graph_id | |||||
def init(self, init_type=Options.InitType.EAGER_WITHOUT_SHUFFLED_COPIES, print_to_stdout=False): | |||||
if isinstance(init_type, str): | |||||
init_type = OptionsStringMap.InitType[init_type] | |||||
# Throw an exception if no edit costs have been selected. | |||||
if self._ged_data._edit_cost is None: | |||||
raise Exception('No edit costs have been selected. Call set_edit_cost() before calling init().') | |||||
# Return if the environment is initialized. | |||||
if self._initialized: | |||||
return | |||||
# Set initialization type. | |||||
self._ged_data._init_type = init_type | |||||
# @todo: Construct shuffled graph copies if necessary. | |||||
# Re-initialize adjacency matrices (also previously initialized graphs must be re-initialized because of possible re-allocation). | |||||
# @todo: setup_adjacency_matrix, don't know if neccessary. | |||||
self._ged_data._max_num_nodes = np.max([nx.number_of_nodes(g) for g in self._ged_data._graphs]) | |||||
self._ged_data._max_num_edges = np.max([nx.number_of_edges(g) for g in self._ged_data._graphs]) | |||||
# Initialize cost matrices if necessary. | |||||
if self._ged_data._eager_init(): | |||||
pass # @todo: init_cost_matrices_: 1. Update node cost matrix if new node labels have been added to the environment; 2. Update edge cost matrix if new edge labels have been added to the environment. | |||||
# Mark environment as initialized. | |||||
self._initialized = True | |||||
self._new_graph_ids.clear() | |||||
def is_initialized(self): | |||||
""" | |||||
/*! | |||||
* @brief Check if the environment is initialized. | |||||
* @return True if the environment is initialized. | |||||
*/ | |||||
""" | |||||
return self._initialized | |||||
def get_init_type(self): | |||||
""" | |||||
/*! | |||||
* @brief Returns the initialization type of the last initialization. | |||||
* @return Initialization type. | |||||
*/ | |||||
""" | |||||
return self._ged_data._init_type | |||||
def set_label_costs(self, node_label_costs=None, edge_label_costs=None): | |||||
"""Set the costs between labels. | |||||
""" | |||||
if node_label_costs is not None: | |||||
self._ged_data._node_label_costs = node_label_costs | |||||
if edge_label_costs is not None: | |||||
self._ged_data._edge_label_costs = edge_label_costs | |||||
def set_method(self, method, options=''): | |||||
""" | |||||
/*! | |||||
* @brief Sets the GEDMethod to be used by run_method(). | |||||
* @param[in] method Select the method that is to be used. | |||||
* @param[in] options An options string of the form @"[--@<option@> @<arg@>] [...]@" passed to the selected method. | |||||
*/ | |||||
""" | |||||
del self._ged_method | |||||
if isinstance(method, str): | |||||
method = OptionsStringMap.GEDMethod[method] | |||||
if method == Options.GEDMethod.BRANCH: | |||||
self._ged_method = Branch(self._ged_data) | |||||
elif method == Options.GEDMethod.BRANCH_FAST: | |||||
self._ged_method = BranchFast(self._ged_data) | |||||
elif method == Options.GEDMethod.BRANCH_TIGHT: | |||||
self._ged_method = BranchTight(self._ged_data) | |||||
elif method == Options.GEDMethod.BRANCH_UNIFORM: | |||||
self._ged_method = BranchUniform(self._ged_data) | |||||
elif method == Options.GEDMethod.BRANCH_COMPACT: | |||||
self._ged_method = BranchCompact(self._ged_data) | |||||
elif method == Options.GEDMethod.PARTITION: | |||||
self._ged_method = Partition(self._ged_data) | |||||
elif method == Options.GEDMethod.HYBRID: | |||||
self._ged_method = Hybrid(self._ged_data) | |||||
elif method == Options.GEDMethod.RING: | |||||
self._ged_method = Ring(self._ged_data) | |||||
elif method == Options.GEDMethod.ANCHOR_AWARE_GED: | |||||
self._ged_method = AnchorAwareGED(self._ged_data) | |||||
elif method == Options.GEDMethod.WALKS: | |||||
self._ged_method = Walks(self._ged_data) | |||||
elif method == Options.GEDMethod.IPFP: | |||||
self._ged_method = IPFP(self._ged_data) | |||||
elif method == Options.GEDMethod.BIPARTITE: | |||||
from gklearn.ged.methods import Bipartite | |||||
self._ged_method = Bipartite(self._ged_data) | |||||
elif method == Options.GEDMethod.SUBGRAPH: | |||||
self._ged_method = Subgraph(self._ged_data) | |||||
elif method == Options.GEDMethod.NODE: | |||||
self._ged_method = Node(self._ged_data) | |||||
elif method == Options.GEDMethod.RING_ML: | |||||
self._ged_method = RingML(self._ged_data) | |||||
elif method == Options.GEDMethod.BIPARTITE_ML: | |||||
self._ged_method = BipartiteML(self._ged_data) | |||||
elif method == Options.GEDMethod.REFINE: | |||||
self._ged_method = Refine(self._ged_data) | |||||
elif method == Options.GEDMethod.BP_BEAM: | |||||
self._ged_method = BPBeam(self._ged_data) | |||||
elif method == Options.GEDMethod.SIMULATED_ANNEALING: | |||||
self._ged_method = SimulatedAnnealing(self._ged_data) | |||||
elif method == Options.GEDMethod.HED: | |||||
self._ged_method = HED(self._ged_data) | |||||
elif method == Options.GEDMethod.STAR: | |||||
self._ged_method = STAR(self._ged_data) | |||||
# #ifdef GUROBI | |||||
elif method == Options.GEDMethod.F1: | |||||
self._ged_method = F1(self._ged_data) | |||||
elif method == Options.GEDMethod.F2: | |||||
self._ged_method = F2(self._ged_data) | |||||
elif method == Options.GEDMethod.COMPACT_MIP: | |||||
self._ged_method = CompactMIP(self._ged_data) | |||||
elif method == Options.GEDMethod.BLP_NO_EDGE_LABELS: | |||||
self._ged_method = BLPNoEdgeLabels(self._ged_data) | |||||
self._ged_method.set_options(options) | |||||
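# Usage note (illustrative): set_method() accepts either the enum member or
# its string name, so env.set_method('BIPARTITE', '') is equivalent to
# env.set_method(Options.GEDMethod.BIPARTITE, '').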
def run_method(self, g_id, h_id): | |||||
""" | |||||
/*! | |||||
* @brief Runs the GED method specified by call to set_method() between the graphs with IDs @p g_id and @p h_id. | |||||
* @param[in] g_id ID of an input graph that has been added to the environment. | |||||
* @param[in] h_id ID of an input graph that has been added to the environment. | |||||
*/ | |||||
""" | |||||
if g_id >= self._ged_data.num_graphs(): | |||||
raise Exception('The graph with ID ' + str(g_id) + ' has not been added to the environment.')
if h_id >= self._ged_data.num_graphs():
raise Exception('The graph with ID ' + str(h_id) + ' has not been added to the environment.')
if not self._initialized: | |||||
raise Exception('The environment is uninitialized. Call init() after adding all graphs to the environment.') | |||||
if self._ged_method is None: | |||||
raise Exception('No method has been set. Call set_method() before calling run().') | |||||
# Call selected GEDMethod and store results. | |||||
if self._ged_data.shuffled_graph_copies_available() and (g_id == h_id): | |||||
self._ged_method.run(g_id, self._ged_data.id_shuffled_graph_copy(h_id)) # @todo: why shuffle? | |||||
else: | |||||
self._ged_method.run(g_id, h_id) | |||||
self._lower_bounds[(g_id, h_id)] = self._ged_method.get_lower_bound() | |||||
self._upper_bounds[(g_id, h_id)] = self._ged_method.get_upper_bound() | |||||
self._runtimes[(g_id, h_id)] = self._ged_method.get_runtime() | |||||
self._node_maps[(g_id, h_id)] = self._ged_method.get_node_map() | |||||
def init_method(self): | |||||
"""Initializes the method specified by call to set_method(). | |||||
""" | |||||
if not self._initialized: | |||||
raise Exception('The environment is uninitialized. Call init() before calling init_method().') | |||||
if self._ged_method is None: | |||||
raise Exception('No method has been set. Call set_method() before calling init_method().') | |||||
self._ged_method.init() | |||||
def get_num_node_labels(self): | |||||
""" | |||||
/*! | |||||
* @brief Returns the number of node labels. | |||||
* @return Number of pairwise different node labels contained in the environment. | |||||
* @note If @p 1 is returned, the nodes are unlabeled. | |||||
*/ | |||||
""" | |||||
return len(self._ged_data._node_labels) | |||||
def get_all_node_labels(self): | |||||
""" | |||||
/*! | |||||
* @brief Returns the list of all node labels. | |||||
* @return List of pairwise different node labels contained in the environment. | |||||
* @note If the list contains only one label, the nodes are unlabeled.
*/ | |||||
""" | |||||
return self._ged_data._node_labels | |||||
def get_node_label(self, label_id, to_dict=True): | |||||
""" | |||||
/*! | |||||
* @brief Returns node label. | |||||
* @param[in] label_id ID of node label that should be returned. Must be between 1 and num_node_labels(). | |||||
* @return Node label for selected label ID. | |||||
*/ | |||||
""" | |||||
if label_id < 1 or label_id > self.get_num_node_labels(): | |||||
raise Exception('The environment does not contain a node label with ID ' + str(label_id) + '.')
if to_dict: | |||||
return dict(self._ged_data._node_labels[label_id - 1]) | |||||
return self._ged_data._node_labels[label_id - 1] | |||||
def get_num_edge_labels(self): | |||||
""" | |||||
/*! | |||||
* @brief Returns the number of edge labels. | |||||
* @return Number of pairwise different edge labels contained in the environment. | |||||
* @note If @p 1 is returned, the edges are unlabeled. | |||||
*/ | |||||
""" | |||||
return len(self._ged_data._edge_labels) | |||||
def get_all_edge_labels(self): | |||||
""" | |||||
/*! | |||||
* @brief Returns the list of all edge labels. | |||||
* @return List of pairwise different edge labels contained in the environment. | |||||
* @note If the list contains only one label, the edges are unlabeled.
*/ | |||||
""" | |||||
return self._ged_data._edge_labels | |||||
def get_edge_label(self, label_id, to_dict=True): | |||||
""" | |||||
/*! | |||||
* @brief Returns edge label. | |||||
* @param[in] label_id ID of edge label that should be returned. Must be between 1 and num_edge_labels().
* @return Edge label for selected label ID. | |||||
*/ | |||||
""" | |||||
if label_id < 1 or label_id > self.get_num_edge_labels(): | |||||
raise Exception('The environment does not contain an edge label with ID ' + str(label_id) + '.')
if to_dict: | |||||
return dict(self._ged_data._edge_labels[label_id - 1]) | |||||
return self._ged_data._edge_labels[label_id - 1] | |||||
def get_upper_bound(self, g_id, h_id): | |||||
""" | |||||
/*! | |||||
* @brief Returns upper bound for edit distance between the input graphs. | |||||
* @param[in] g_id ID of an input graph that has been added to the environment. | |||||
* @param[in] h_id ID of an input graph that has been added to the environment. | |||||
* @return Upper bound computed by the last call to run_method() with arguments @p g_id and @p h_id. | |||||
*/ | |||||
""" | |||||
if (g_id, h_id) not in self._upper_bounds: | |||||
raise Exception('Call run(' + str(g_id) + ',' + str(h_id) + ') before calling get_upper_bound(' + str(g_id) + ',' + str(h_id) + ').') | |||||
return self._upper_bounds[(g_id, h_id)] | |||||
def get_lower_bound(self, g_id, h_id): | |||||
""" | |||||
/*! | |||||
* @brief Returns lower bound for edit distance between the input graphs. | |||||
* @param[in] g_id ID of an input graph that has been added to the environment. | |||||
* @param[in] h_id ID of an input graph that has been added to the environment. | |||||
* @return Lower bound computed by the last call to run_method() with arguments @p g_id and @p h_id. | |||||
*/ | |||||
""" | |||||
if (g_id, h_id) not in self._lower_bounds: | |||||
raise Exception('Call run(' + str(g_id) + ',' + str(h_id) + ') before calling get_lower_bound(' + str(g_id) + ',' + str(h_id) + ').') | |||||
return self._lower_bounds[(g_id, h_id)] | |||||
def get_runtime(self, g_id, h_id): | |||||
""" | |||||
/*! | |||||
* @brief Returns runtime. | |||||
* @param[in] g_id ID of an input graph that has been added to the environment. | |||||
* @param[in] h_id ID of an input graph that has been added to the environment. | |||||
* @return Runtime of last call to run_method() with arguments @p g_id and @p h_id. | |||||
*/ | |||||
""" | |||||
if (g_id, h_id) not in self._runtimes: | |||||
raise Exception('Call run(' + str(g_id) + ',' + str(h_id) + ') before calling get_runtime(' + str(g_id) + ',' + str(h_id) + ').') | |||||
return self._runtimes[(g_id, h_id)] | |||||
def get_init_time(self): | |||||
""" | |||||
/*! | |||||
* @brief Returns initialization time. | |||||
* @return Runtime of the last call to init_method(). | |||||
*/ | |||||
""" | |||||
return self._ged_method.get_init_time() | |||||
def get_node_map(self, g_id, h_id): | |||||
""" | |||||
/*! | |||||
* @brief Returns node map between the input graphs. | |||||
* @param[in] g_id ID of an input graph that has been added to the environment. | |||||
* @param[in] h_id ID of an input graph that has been added to the environment. | |||||
* @return Node map computed by the last call to run_method() with arguments @p g_id and @p h_id. | |||||
*/ | |||||
""" | |||||
if (g_id, h_id) not in self._node_maps: | |||||
raise Exception('Call run(' + str(g_id) + ',' + str(h_id) + ') before calling get_node_map(' + str(g_id) + ',' + str(h_id) + ').') | |||||
return self._node_maps[(g_id, h_id)] | |||||
def get_forward_map(self, g_id, h_id) : | |||||
""" | |||||
Returns the forward map (i.e., half of the node assignment) between nodes of the two indicated graphs.
:param g_id: The ID of the first compared graph
:param h_id: The ID of the second compared graph
:type g_id: size_t
:type h_id: size_t
:return: The forward map between the nodes of the two graphs
:rtype: list[npy_uint32]
.. seealso:: run_method(), get_upper_bound(), get_lower_bound(), get_backward_map(), get_runtime(), quasimetric_cost(), get_node_map(), get_assignment_matrix()
.. warning:: run_method() between the same two graphs must be called before this function.
.. note:: It is currently unclear how to combine the two maps to reconstruct the full assignment matrix.
""" | |||||
return self.get_node_map(g_id, h_id).forward_map | |||||
def get_backward_map(self, g_id, h_id) : | |||||
""" | |||||
Returns the backward map (i.e., half of the node assignment) between nodes of the two indicated graphs.
:param g_id: The ID of the first compared graph
:param h_id: The ID of the second compared graph
:type g_id: size_t
:type h_id: size_t
:return: The backward map between the nodes of the two graphs
:rtype: list[npy_uint32]
.. seealso:: run_method(), get_upper_bound(), get_lower_bound(), get_forward_map(), get_runtime(), quasimetric_cost(), get_node_map(), get_assignment_matrix()
.. warning:: run_method() between the same two graphs must be called before this function.
.. note:: It is currently unclear how to combine the two maps to reconstruct the full assignment matrix.
""" | |||||
return self.get_node_map(g_id, h_id).backward_map | |||||
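# Relationship between the two maps (illustrative): forward_map[i] gives the
# target node assigned to source node i, while backward_map[k] gives the
# source node assigned to target node k; for a plain substitution,
# backward_map[forward_map[i]] == i. Deletions and insertions are marked
# with the dummy node instead (see NodeMap.add_assignment()).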
def compute_induced_cost(self, g_id, h_id, node_map): | |||||
""" | |||||
/*! | |||||
* @brief Computes the edit cost between two graphs induced by a node map. | |||||
* @param[in] g_id ID of input graph. | |||||
* @param[in] h_id ID of input graph. | |||||
* @param[in,out] node_map Node map whose induced edit cost is to be computed. | |||||
*/ | |||||
""" | |||||
self._ged_data.compute_induced_cost(self._ged_data._graphs[g_id], self._ged_data._graphs[h_id], node_map) | |||||
def get_nx_graph(self, graph_id): | |||||
""" | |||||
* @brief Returns NetworkX.Graph() representation. | |||||
* @param[in] graph_id ID of the selected graph. | |||||
""" | |||||
graph = nx.Graph() # @todo: add graph attributes. | |||||
graph.graph['id'] = graph_id | |||||
nb_nodes = self.get_graph_num_nodes(graph_id) | |||||
original_node_ids = self.get_original_node_ids(graph_id) | |||||
node_labels = self.get_graph_node_labels(graph_id, to_dict=True) | |||||
graph.graph['original_node_ids'] = original_node_ids | |||||
for node_id in range(0, nb_nodes): | |||||
graph.add_node(node_id, **node_labels[node_id]) | |||||
edges = self.get_graph_edges(graph_id, to_dict=True) | |||||
for (head, tail), labels in edges.items(): | |||||
graph.add_edge(head, tail, **labels) | |||||
return graph | |||||
def get_graph_node_labels(self, graph_id, to_dict=True): | |||||
""" | |||||
Searches and returns all the node labels of a graph, selected by its ID.
:param graph_id: The ID of the wanted graph | |||||
:type graph_id: size_t | |||||
:return: The list of nodes' labels on the selected graph | |||||
:rtype: list[dict{string : string}] | |||||
.. seealso:: get_graph_internal_id(), get_graph_num_nodes(), get_graph_num_edges(), get_original_node_ids(), get_graph_edges(), get_graph_adjacence_matrix() | |||||
.. note:: These functions allow collecting all the information on a graph.
""" | |||||
graph = self._ged_data.graph(graph_id) | |||||
node_labels = [] | |||||
for n in graph.nodes(): | |||||
node_labels.append(graph.nodes[n]['label']) | |||||
if to_dict: | |||||
return [dict(i) for i in node_labels] | |||||
return node_labels | |||||
def get_graph_edges(self, graph_id, to_dict=True): | |||||
""" | |||||
Searches and returns all the edges of a graph, selected by its ID.
:param graph_id: The ID of the wanted graph | |||||
:type graph_id: size_t | |||||
:return: The list of edges on the selected graph | |||||
:rtype: dict{tuple(size_t, size_t) : dict{string : string}} | |||||
.. seealso:: get_graph_internal_id(), get_graph_num_nodes(), get_graph_num_edges(), get_original_node_ids(), get_graph_node_labels(), get_graph_adjacence_matrix()
.. note:: These functions allow collecting all the information on a graph.
""" | |||||
graph = self._ged_data.graph(graph_id) | |||||
if to_dict: | |||||
edges = {} | |||||
for n1, n2, attr in graph.edges(data=True): | |||||
edges[(n1, n2)] = dict(attr['label']) | |||||
return edges | |||||
return {(n1, n2): attr['label'] for n1, n2, attr in graph.edges(data=True)} | |||||
def get_graph_name(self, graph_id): | |||||
""" | |||||
/*! | |||||
* @brief Returns the graph name. | |||||
* @param[in] graph_id ID of an input graph that has been added to the environment. | |||||
* @return Name of the input graph. | |||||
*/ | |||||
""" | |||||
return self._ged_data._graph_names[graph_id] | |||||
def get_graph_num_nodes(self, graph_id): | |||||
""" | |||||
/*! | |||||
* @brief Returns the number of nodes. | |||||
* @param[in] graph_id ID of an input graph that has been added to the environment. | |||||
* @return Number of nodes in the graph. | |||||
*/ | |||||
""" | |||||
return nx.number_of_nodes(self._ged_data.graph(graph_id)) | |||||
def get_original_node_ids(self, graph_id): | |||||
""" | |||||
Searches and returns all the IDs of the nodes of a graph, selected by its ID.
:param graph_id: The ID of the wanted graph | |||||
:type graph_id: size_t | |||||
:return: The list of node IDs of the selected graph
:rtype: list[string] | |||||
.. seealso:: get_graph_internal_id(), get_graph_num_nodes(), get_graph_num_edges(), get_graph_node_labels(), get_graph_edges(), get_graph_adjacence_matrix()
.. note:: These functions allow collecting all the information on a graph.
""" | |||||
return [i for i in self._internal_to_original_node_ids[graph_id].values()] | |||||
def get_node_cost(self, node_label_1, node_label_2): | |||||
return self._ged_data.node_cost(node_label_1, node_label_2) | |||||
def get_node_rel_cost(self, node_label_1, node_label_2): | |||||
""" | |||||
/*! | |||||
* @brief Returns node relabeling cost. | |||||
* @param[in] node_label_1 First node label. | |||||
* @param[in] node_label_2 Second node label. | |||||
* @return Node relabeling cost for the given node labels. | |||||
*/ | |||||
""" | |||||
if isinstance(node_label_1, dict): | |||||
node_label_1 = tuple(sorted(node_label_1.items(), key=lambda kv: kv[0])) | |||||
if isinstance(node_label_2, dict): | |||||
node_label_2 = tuple(sorted(node_label_2.items(), key=lambda kv: kv[0])) | |||||
return self._ged_data._edit_cost.node_rel_cost_fun(node_label_1, node_label_2) # @todo: may need to use node_cost() instead (or change node_cost() and modify ged_method for pre-defined cost matrices.) | |||||
def get_node_del_cost(self, node_label): | |||||
""" | |||||
/*! | |||||
* @brief Returns node deletion cost. | |||||
* @param[in] node_label Node label. | |||||
* @return Cost of deleting node with given label. | |||||
*/ | |||||
""" | |||||
if isinstance(node_label, dict): | |||||
node_label = tuple(sorted(node_label.items(), key=lambda kv: kv[0])) | |||||
return self._ged_data._edit_cost.node_del_cost_fun(node_label) | |||||
def get_node_ins_cost(self, node_label): | |||||
""" | |||||
/*! | |||||
* @brief Returns node insertion cost. | |||||
* @param[in] node_label Node label. | |||||
* @return Cost of inserting node with given label. | |||||
*/ | |||||
""" | |||||
if isinstance(node_label, dict): | |||||
node_label = tuple(sorted(node_label.items(), key=lambda kv: kv[0])) | |||||
return self._ged_data._edit_cost.node_ins_cost_fun(node_label) | |||||
def get_edge_cost(self, edge_label_1, edge_label_2): | |||||
return self._ged_data.edge_cost(edge_label_1, edge_label_2) | |||||
def get_edge_rel_cost(self, edge_label_1, edge_label_2): | |||||
""" | |||||
/*! | |||||
* @brief Returns edge relabeling cost. | |||||
* @param[in] edge_label_1 First edge label. | |||||
* @param[in] edge_label_2 Second edge label. | |||||
* @return Edge relabeling cost for the given edge labels. | |||||
*/ | |||||
""" | |||||
if isinstance(edge_label_1, dict): | |||||
edge_label_1 = tuple(sorted(edge_label_1.items(), key=lambda kv: kv[0])) | |||||
if isinstance(edge_label_2, dict): | |||||
edge_label_2 = tuple(sorted(edge_label_2.items(), key=lambda kv: kv[0])) | |||||
return self._ged_data._edit_cost.edge_rel_cost_fun(edge_label_1, edge_label_2) | |||||
def get_edge_del_cost(self, edge_label): | |||||
""" | |||||
/*! | |||||
* @brief Returns edge deletion cost. | |||||
* @param[in] edge_label Edge label. | |||||
* @return Cost of deleting edge with given label. | |||||
*/ | |||||
""" | |||||
if isinstance(edge_label, dict): | |||||
edge_label = tuple(sorted(edge_label.items(), key=lambda kv: kv[0])) | |||||
return self._ged_data._edit_cost.edge_del_cost_fun(edge_label) | |||||
def get_edge_ins_cost(self, edge_label): | |||||
""" | |||||
/*! | |||||
* @brief Returns edge insertion cost. | |||||
* @param[in] edge_label Edge label. | |||||
* @return Cost of inserting edge with given label. | |||||
*/ | |||||
""" | |||||
if isinstance(edge_label, dict): | |||||
edge_label = tuple(sorted(edge_label.items(), key=lambda kv: kv[0])) | |||||
return self._ged_data._edit_cost.edge_ins_cost_fun(edge_label) | |||||
def get_all_graph_ids(self): | |||||
return [i for i in range(0, self._ged_data._num_graphs_without_shuffled_copies)] |
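# A minimal end-to-end usage sketch (illustrative; g1 and g2 are NetworkX
# graphs, and the exact keyword name of set_edit_cost() is an assumption):
#
# env = GEDEnv()
# env.set_edit_cost('CONSTANT', edit_cost_constants=[4, 4, 2, 1, 1, 1])  # kwarg name assumed
# g_id = env.add_nx_graph(g1, '')
# h_id = env.add_nx_graph(g2, '')
# env.init(init_type='EAGER_WITHOUT_SHUFFLED_COPIES')
# env.set_method('BIPARTITE', '')
# env.init_method()
# env.run_method(g_id, h_id)
# print(env.get_upper_bound(g_id, h_id))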
@@ -0,0 +1,102 @@ | |||||
#!/usr/bin/env python3 | |||||
# -*- coding: utf-8 -*- | |||||
""" | |||||
Created on Wed Apr 22 11:31:26 2020 | |||||
@author: ljia | |||||
""" | |||||
import numpy as np | |||||
from gklearn.utils import dummy_node, undefined_node | |||||
class NodeMap(object): | |||||
def __init__(self, num_nodes_g, num_nodes_h): | |||||
self._forward_map = [undefined_node()] * num_nodes_g | |||||
self._backward_map = [undefined_node()] * num_nodes_h | |||||
self._induced_cost = np.inf | |||||
def clear(self): | |||||
""" | |||||
/*! | |||||
* @brief Clears the node map. | |||||
*/ | |||||
""" | |||||
self._forward_map = [undefined_node() for i in range(len(self._forward_map))] | |||||
self._backward_map = [undefined_node() for i in range(len(self._backward_map))] | |||||
def num_source_nodes(self): | |||||
return len(self._forward_map) | |||||
def num_target_nodes(self): | |||||
return len(self._backward_map) | |||||
def image(self, node): | |||||
if node < len(self._forward_map): | |||||
return self._forward_map[node] | |||||
else: | |||||
raise Exception('The node with ID ' + str(node) + ' is not contained in the source nodes of the node map.')
def pre_image(self, node): | |||||
if node < len(self._backward_map): | |||||
return self._backward_map[node] | |||||
else: | |||||
raise Exception('The node with ID ' + str(node) + ' is not contained in the target nodes of the node map.')
def as_relation(self, relation): | |||||
relation.clear() | |||||
for i in range(0, len(self._forward_map)): | |||||
k = self._forward_map[i] | |||||
if k != undefined_node(): | |||||
relation.append(tuple((i, k))) | |||||
for k in range(0, len(self._backward_map)): | |||||
i = self._backward_map[k] | |||||
if i == dummy_node(): | |||||
relation.append(tuple((i, k))) | |||||
def add_assignment(self, i, k): | |||||
if i != dummy_node(): | |||||
if i < len(self._forward_map): | |||||
self._forward_map[i] = k | |||||
else: | |||||
raise Exception('The node with ID ' + str(i) + ' is not contained in the source nodes of the node map.')
if k != dummy_node(): | |||||
if k < len(self._backward_map): | |||||
self._backward_map[k] = i | |||||
else: | |||||
raise Exception('The node with ID ' + str(k) + ' is not contained in the target nodes of the node map.')
def set_induced_cost(self, induced_cost): | |||||
self._induced_cost = induced_cost | |||||
def induced_cost(self): | |||||
return self._induced_cost | |||||
@property | |||||
def forward_map(self): | |||||
return self._forward_map | |||||
@forward_map.setter | |||||
def forward_map(self, value): | |||||
self._forward_map = value | |||||
@property | |||||
def backward_map(self): | |||||
return self._backward_map | |||||
@backward_map.setter | |||||
def backward_map(self, value): | |||||
self._backward_map = value |
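# A minimal usage sketch (illustrative): substitute source node 0 with target
# node 1 and delete source node 1; dummy_node() marks deletions/insertions.
#
# nm = NodeMap(2, 2)
# nm.add_assignment(0, 1)             # substitution: source 0 -> target 1
# nm.add_assignment(1, dummy_node())  # deletion of source node 1
# assert nm.image(0) == 1 and nm.pre_image(1) == 0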
@@ -0,0 +1,9 @@ | |||||
#!/usr/bin/env python3 | |||||
# -*- coding: utf-8 -*- | |||||
""" | |||||
Created on Tue Jul 7 16:07:25 2020 | |||||
@author: ljia | |||||
""" | |||||
from gklearn.ged.learning.cost_matrices_learner import CostMatricesLearner |
@@ -0,0 +1,148 @@ | |||||
#!/usr/bin/env python3 | |||||
# -*- coding: utf-8 -*- | |||||
""" | |||||
Created on Tue Jul 7 11:42:48 2020 | |||||
@author: ljia | |||||
""" | |||||
import numpy as np | |||||
import cvxpy as cp | |||||
import time | |||||
from gklearn.ged.learning.costs_learner import CostsLearner | |||||
from gklearn.ged.util import compute_geds_cml | |||||
class CostMatricesLearner(CostsLearner): | |||||
def __init__(self, edit_cost='CONSTANT', triangle_rule=False, allow_zeros=True, parallel=False, verbose=2): | |||||
super().__init__(parallel, verbose) | |||||
self._edit_cost = edit_cost | |||||
self._triangle_rule = triangle_rule | |||||
self._allow_zeros = allow_zeros | |||||
def fit(self, X, y): | |||||
if self._edit_cost == 'LETTER': | |||||
raise Exception('Cannot compute for cost "LETTER".') | |||||
elif self._edit_cost == 'LETTER2': | |||||
raise Exception('Cannot compute for cost "LETTER2".') | |||||
elif self._edit_cost == 'NON_SYMBOLIC': | |||||
raise Exception('Cannot compute for cost "NON_SYMBOLIC".') | |||||
elif self._edit_cost == 'CONSTANT': # @todo: node/edge may not labeled. | |||||
if not self._triangle_rule and self._allow_zeros: | |||||
w = cp.Variable(X.shape[1]) | |||||
cost_fun = cp.sum_squares(X @ w - y) | |||||
constraints = [w >= [0.0 for i in range(X.shape[1])]] | |||||
prob = cp.Problem(cp.Minimize(cost_fun), constraints) | |||||
self.execute_cvx(prob) | |||||
edit_costs_new = w.value | |||||
residual = np.sqrt(prob.value) | |||||
elif self._triangle_rule and self._allow_zeros: # @todo | |||||
x = cp.Variable(X.shape[1])
cost_fun = cp.sum_squares(X @ x - y)
constraints = [x >= [0.0 for i in range(X.shape[1])],
np.array([1.0, 0.0, 0.0, 0.0, 0.0, 0.0]).T@x >= 0.01, | |||||
np.array([0.0, 1.0, 0.0, 0.0, 0.0, 0.0]).T@x >= 0.01, | |||||
np.array([0.0, 0.0, 0.0, 1.0, 0.0, 0.0]).T@x >= 0.01, | |||||
np.array([0.0, 0.0, 0.0, 0.0, 1.0, 0.0]).T@x >= 0.01, | |||||
np.array([1.0, 1.0, -1.0, 0.0, 0.0, 0.0]).T@x >= 0.0, | |||||
np.array([0.0, 0.0, 0.0, 1.0, 1.0, -1.0]).T@x >= 0.0] | |||||
prob = cp.Problem(cp.Minimize(cost_fun), constraints) | |||||
self.execute_cvx(prob)
edit_costs_new = x.value | |||||
residual = np.sqrt(prob.value) | |||||
elif not self._triangle_rule and not self._allow_zeros: # @todo | |||||
x = cp.Variable(X.shape[1])
cost_fun = cp.sum_squares(X @ x - y)
constraints = [x >= [0.01 for i in range(X.shape[1])]]
prob = cp.Problem(cp.Minimize(cost_fun), constraints) | |||||
self.execute_cvx(prob)
edit_costs_new = x.value | |||||
residual = np.sqrt(prob.value) | |||||
elif self._triangle_rule and not self._allow_zeros: # @todo | |||||
x = cp.Variable(X.shape[1])
cost_fun = cp.sum_squares(X @ x - y)
constraints = [x >= [0.01 for i in range(X.shape[1])],
np.array([1.0, 1.0, -1.0, 0.0, 0.0, 0.0]).T@x >= 0.0, | |||||
np.array([0.0, 0.0, 0.0, 1.0, 1.0, -1.0]).T@x >= 0.0] | |||||
prob = cp.Problem(cp.Minimize(cost_fun), constraints) | |||||
self.execute_cvx(prob)
edit_costs_new = x.value | |||||
residual = np.sqrt(prob.value) | |||||
else: | |||||
raise Exception('The edit cost "' + self._edit_cost + '" is not supported for the update process.')
self._cost_list.append(edit_costs_new) | |||||
def init_geds_and_nb_eo(self, y, graphs): | |||||
time0 = time.time() | |||||
self._cost_list.append(np.concatenate((self._ged_options['node_label_costs'], | |||||
self._ged_options['edge_label_costs']))) | |||||
ged_vec, self._nb_eo = self.compute_geds_and_nb_eo(graphs) | |||||
self._residual_list.append(np.sqrt(np.sum(np.square(np.array(ged_vec) - y)))) | |||||
self._runtime_list.append(time.time() - time0) | |||||
if self._verbose >= 2: | |||||
print('Current node label costs:', self._cost_list[-1][0:len(self._ged_options['node_label_costs'])]) | |||||
print('Current edge label costs:', self._cost_list[-1][len(self._ged_options['node_label_costs']):]) | |||||
print('Residual list:', self._residual_list) | |||||
def update_geds_and_nb_eo(self, y, graphs, time0): | |||||
self._ged_options['node_label_costs'] = self._cost_list[-1][0:len(self._ged_options['node_label_costs'])] | |||||
self._ged_options['edge_label_costs'] = self._cost_list[-1][len(self._ged_options['node_label_costs']):] | |||||
ged_vec, self._nb_eo = self.compute_geds_and_nb_eo(graphs) | |||||
self._residual_list.append(np.sqrt(np.sum(np.square(np.array(ged_vec) - y)))) | |||||
self._runtime_list.append(time.time() - time0) | |||||
def compute_geds_and_nb_eo(self, graphs): | |||||
ged_vec, ged_mat, n_edit_operations = compute_geds_cml(graphs, options=self._ged_options, parallel=self._parallel, verbose=(self._verbose > 1)) | |||||
return ged_vec, np.array(n_edit_operations) | |||||
def check_convergency(self): | |||||
self._ec_changed = False | |||||
for i, cost in enumerate(self._cost_list[-1]): | |||||
if cost == 0: | |||||
if self._cost_list[-2][i] > self._epsilon_ec: | |||||
self._ec_changed = True | |||||
break | |||||
elif abs(cost - self._cost_list[-2][i]) / cost > self._epsilon_ec: | |||||
self._ec_changed = True | |||||
break | |||||
# if abs(cost - edit_cost_list[-2][i]) > self._epsilon_ec: | |||||
# ec_changed = True | |||||
# break | |||||
self._residual_changed = False | |||||
if self._residual_list[-1] == 0: | |||||
if self._residual_list[-2] > self._epsilon_residual: | |||||
self._residual_changed = True | |||||
elif abs(self._residual_list[-1] - self._residual_list[-2]) / self._residual_list[-1] > self._epsilon_residual: | |||||
self._residual_changed = True | |||||
self._converged = not (self._ec_changed or self._residual_changed) | |||||
if self._converged: | |||||
self._itrs_without_update += 1 | |||||
else: | |||||
self._itrs_without_update = 0 | |||||
self._num_updates_ecs += 1 | |||||
def print_current_states(self): | |||||
print() | |||||
print('-------------------------------------------------------------------------') | |||||
print('States of iteration', self._itrs + 1) | |||||
print('-------------------------------------------------------------------------') | |||||
# print('Time spend:', self._runtime_optimize_ec) | |||||
print('Total number of iterations for optimizing:', self._itrs + 1) | |||||
print('Total number of edit cost updates:', self._num_updates_ecs)
print('Has the optimization of edit costs converged:', self._converged)
print('Did edit costs change:', self._ec_changed)
print('Did residual change:', self._residual_changed)
print('Iterations without update:', self._itrs_without_update) | |||||
print('Current node label costs:', self._cost_list[-1][0:len(self._ged_options['node_label_costs'])]) | |||||
print('Current edge label costs:', self._cost_list[-1][len(self._ged_options['node_label_costs']):]) | |||||
print('Residual list:', self._residual_list) | |||||
print('-------------------------------------------------------------------------') |
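# A standalone sketch of the optimization solved by fit() in the 'CONSTANT'
# case without the triangle rule (illustrative, with random data):
#
# import numpy as np
# import cvxpy as cp
# X = np.random.rand(20, 6)  # numbers of edit operations per graph pair
# y = np.random.rand(20)     # target distances
# w = cp.Variable(X.shape[1])
# prob = cp.Problem(cp.Minimize(cp.sum_squares(X @ w - y)), [w >= 0.0])
# prob.solve()
# print(w.value, np.sqrt(prob.value))  # fitted non-negative costs, residual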
@@ -0,0 +1,175 @@ | |||||
#!/usr/bin/env python3 | |||||
# -*- coding: utf-8 -*- | |||||
""" | |||||
Created on Tue Jul 7 11:30:31 2020 | |||||
@author: ljia | |||||
""" | |||||
import numpy as np | |||||
import cvxpy as cp | |||||
import time | |||||
from gklearn.utils import Timer | |||||
class CostsLearner(object): | |||||
def __init__(self, parallel, verbose): | |||||
### To set. | |||||
self._parallel = parallel | |||||
self._verbose = verbose | |||||
# For update(). | |||||
self._time_limit_in_sec = 0 | |||||
self._max_itrs = 100 | |||||
self._max_itrs_without_update = 3 | |||||
self._epsilon_residual = 0.01 | |||||
self._epsilon_ec = 0.1 | |||||
### To compute. | |||||
self._residual_list = [] | |||||
self._runtime_list = [] | |||||
self._cost_list = [] | |||||
self._nb_eo = None | |||||
# For update(). | |||||
self._itrs = 0 | |||||
self._converged = False | |||||
self._num_updates_ecs = 0 | |||||
self._ec_changed = None | |||||
self._residual_changed = None | |||||
self._itrs_without_update = 0 | |||||
### Both set and get. | |||||
self._ged_options = None | |||||
def fit(self, X, y): | |||||
pass | |||||
def preprocess(self): | |||||
pass # @todo: remove the zero numbers of edit costs. | |||||
def postprocess(self): | |||||
for i in range(len(self._cost_list[-1])): | |||||
if -1e-9 <= self._cost_list[-1][i] <= 1e-9: | |||||
self._cost_list[-1][i] = 0 | |||||
if self._cost_list[-1][i] < 0: | |||||
raise ValueError('The edit cost is negative.') | |||||
def set_update_params(self, **kwargs): | |||||
self._time_limit_in_sec = kwargs.get('time_limit_in_sec', self._time_limit_in_sec) | |||||
self._max_itrs = kwargs.get('max_itrs', self._max_itrs) | |||||
self._max_itrs_without_update = kwargs.get('max_itrs_without_update', self._max_itrs_without_update) | |||||
self._epsilon_residual = kwargs.get('epsilon_residual', self._epsilon_residual) | |||||
self._epsilon_ec = kwargs.get('epsilon_ec', self._epsilon_ec) | |||||
def update(self, y, graphs, ged_options, **kwargs): | |||||
# Set parameters. | |||||
self._ged_options = ged_options | |||||
if kwargs != {}: | |||||
self.set_update_params(**kwargs) | |||||
# The initial iteration. | |||||
if self._verbose >= 2: | |||||
print('\ninitial:') | |||||
self.init_geds_and_nb_eo(y, graphs) | |||||
self._converged = False | |||||
self._itrs_without_update = 0 | |||||
self._itrs = 0 | |||||
self._num_updates_ecs = 0 | |||||
timer = Timer(self._time_limit_in_sec) | |||||
# Run iterations from initial edit costs. | |||||
while not self.termination_criterion_met(self._converged, timer, self._itrs, self._itrs_without_update): | |||||
if self._verbose >= 2: | |||||
print('\niteration', self._itrs + 1) | |||||
time0 = time.time() | |||||
# Fit GED space to the target space. | |||||
self.preprocess() | |||||
self.fit(self._nb_eo, y) | |||||
self.postprocess() | |||||
# Compute new GEDs and numbers of edit operations. | |||||
self.update_geds_and_nb_eo(y, graphs, time0) | |||||
# Check convergency. | |||||
self.check_convergency() | |||||
# Print current states. | |||||
if self._verbose >= 2: | |||||
self.print_current_states() | |||||
self._itrs += 1 | |||||
def init_geds_and_nb_eo(self, y, graphs): | |||||
pass | |||||
def update_geds_and_nb_eo(self, y, graphs, time0): | |||||
pass | |||||
def compute_geds_and_nb_eo(self, graphs): | |||||
pass | |||||
def check_convergency(self): | |||||
pass | |||||
def print_current_states(self): | |||||
pass | |||||
def termination_criterion_met(self, converged, timer, itr, itrs_without_update): | |||||
if timer.expired() or (itr >= self._max_itrs if self._max_itrs >= 0 else False): | |||||
# if self._state == AlgorithmState.TERMINATED: | |||||
# self._state = AlgorithmState.INITIALIZED | |||||
return True | |||||
return converged or (itrs_without_update > self._max_itrs_without_update if self._max_itrs_without_update >= 0 else False) | |||||
def execute_cvx(self, prob): | |||||
try: | |||||
prob.solve(verbose=(self._verbose>=2)) | |||||
except MemoryError as error0: | |||||
if self._verbose >= 2: | |||||
print('\nUsing solver "OSQP" caused a memory error.') | |||||
print('the original error message is\n', error0) | |||||
print('solver status: ', prob.status) | |||||
print('trying solver "CVXOPT" instead...\n') | |||||
try: | |||||
prob.solve(solver=cp.CVXOPT, verbose=(self._verbose>=2)) | |||||
except Exception as error1: | |||||
if self._verbose >= 2: | |||||
print('\nAn error occurred when using solver "CVXOPT".')
print('the original error message is\n', error1)
print('solver status: ', prob.status)
print('trying solver "MOSEK" instead. Note that this solver is commercial and a license is required.\n')
prob.solve(solver=cp.MOSEK, verbose=(self._verbose>=2)) | |||||
else: | |||||
if self._verbose >= 2: | |||||
print('solver status: ', prob.status) | |||||
else: | |||||
if self._verbose >= 2: | |||||
print('solver status: ', prob.status) | |||||
if self._verbose >= 2: | |||||
print() | |||||
def get_results(self): | |||||
results = {} | |||||
results['residual_list'] = self._residual_list | |||||
results['runtime_list'] = self._runtime_list | |||||
results['cost_list'] = self._cost_list | |||||
results['nb_eo'] = self._nb_eo | |||||
results['itrs'] = self._itrs | |||||
results['converged'] = self._converged | |||||
results['num_updates_ecs'] = self._num_updates_ecs | |||||
results['ec_changed'] = self._ec_changed | |||||
results['residual_changed'] = self._residual_changed | |||||
results['itrs_without_update'] = self._itrs_without_update | |||||
return results |
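# Usage sketch (illustrative): a concrete subclass such as CostMatricesLearner
# is driven through update(); keyword names follow set_update_params(), and
# the contents of ged_options are an assumption here.
#
# learner = CostMatricesLearner(edit_cost='CONSTANT', verbose=2)
# learner.set_update_params(max_itrs=50, epsilon_ec=0.05)
# learner.update(y, graphs, ged_options)
# results = learner.get_results()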
@@ -0,0 +1,4 @@ | |||||
from gklearn.ged.median.median_graph_estimator import MedianGraphEstimator | |||||
from gklearn.ged.median.median_graph_estimator_py import MedianGraphEstimatorPy | |||||
from gklearn.ged.median.median_graph_estimator_cml import MedianGraphEstimatorCML | |||||
from gklearn.ged.median.utils import constant_node_costs, mge_options_to_string |
@@ -0,0 +1,159 @@ | |||||
#!/usr/bin/env python3 | |||||
# -*- coding: utf-8 -*- | |||||
""" | |||||
Created on Mon Mar 16 17:26:40 2020 | |||||
@author: ljia | |||||
""" | |||||
def test_median_graph_estimator(): | |||||
from gklearn.utils import load_dataset | |||||
from gklearn.ged.median import MedianGraphEstimator, constant_node_costs | |||||
from gklearn.gedlib import librariesImport, gedlibpy | |||||
from gklearn.preimage.utils import get_same_item_indices | |||||
import multiprocessing | |||||
# estimator parameters. | |||||
init_type = 'MEDOID' | |||||
num_inits = 1 | |||||
threads = multiprocessing.cpu_count() | |||||
time_limit = 60000 | |||||
# algorithm parameters. | |||||
algo = 'IPFP' | |||||
initial_solutions = 1 | |||||
algo_options_suffix = ' --initial-solutions ' + str(initial_solutions) + ' --ratio-runs-from-initial-solutions 1 --initialization-method NODE ' | |||||
edit_cost_name = 'LETTER2' | |||||
edit_cost_constants = [0.02987291, 0.0178211, 0.01431966, 0.001, 0.001] | |||||
ds_name = 'Letter_high' | |||||
# Load dataset. | |||||
# dataset = '../../datasets/COIL-DEL/COIL-DEL_A.txt' | |||||
dataset = '../../../datasets/Letter-high/Letter-high_A.txt' | |||||
Gn, y_all, label_names = load_dataset(dataset) | |||||
y_idx = get_same_item_indices(y_all) | |||||
for i, (y, values) in enumerate(y_idx.items()): | |||||
Gn_i = [Gn[val] for val in values] | |||||
break | |||||
# Set up the environment. | |||||
ged_env = gedlibpy.GEDEnv() | |||||
# gedlibpy.restart_env() | |||||
ged_env.set_edit_cost(edit_cost_name, edit_cost_constant=edit_cost_constants) | |||||
for G in Gn_i: | |||||
ged_env.add_nx_graph(G, '') | |||||
graph_ids = ged_env.get_all_graph_ids() | |||||
set_median_id = ged_env.add_graph('set_median') | |||||
gen_median_id = ged_env.add_graph('gen_median') | |||||
ged_env.init(init_option='EAGER_WITHOUT_SHUFFLED_COPIES') | |||||
# Set up the estimator. | |||||
mge = MedianGraphEstimator(ged_env, constant_node_costs(edit_cost_name)) | |||||
mge.set_refine_method(algo, '--threads ' + str(threads) + ' --initial-solutions ' + str(initial_solutions) + ' --ratio-runs-from-initial-solutions 1') | |||||
mge_options = '--time-limit ' + str(time_limit) + ' --stdout 2 --init-type ' + init_type | |||||
mge_options += ' --random-inits ' + str(num_inits) + ' --seed ' + '1' + ' --update-order TRUE --refine FALSE --randomness PSEUDO --parallel TRUE ' # @todo: std::to_string(rng())
# Select the GED algorithm. | |||||
algo_options = '--threads ' + str(threads) + algo_options_suffix | |||||
mge.set_options(mge_options) | |||||
mge.set_label_names(node_labels=label_names['node_labels'], | |||||
edge_labels=label_names['edge_labels'], | |||||
node_attrs=label_names['node_attrs'], | |||||
edge_attrs=label_names['edge_attrs']) | |||||
mge.set_init_method(algo, algo_options) | |||||
mge.set_descent_method(algo, algo_options) | |||||
# Run the estimator. | |||||
mge.run(graph_ids, set_median_id, gen_median_id) | |||||
# Get SODs. | |||||
sod_sm = mge.get_sum_of_distances('initialized') | |||||
sod_gm = mge.get_sum_of_distances('converged') | |||||
print('sod_sm, sod_gm: ', sod_sm, sod_gm) | |||||
# Get median graphs. | |||||
set_median = ged_env.get_nx_graph(set_median_id) | |||||
gen_median = ged_env.get_nx_graph(gen_median_id) | |||||
return set_median, gen_median | |||||
def test_median_graph_estimator_symb(): | |||||
from gklearn.utils import load_dataset | |||||
from gklearn.ged.median import MedianGraphEstimator, constant_node_costs | |||||
from gklearn.gedlib import librariesImport, gedlibpy | |||||
from gklearn.preimage.utils import get_same_item_indices | |||||
import multiprocessing | |||||
# estimator parameters. | |||||
init_type = 'MEDOID' | |||||
num_inits = 1 | |||||
threads = multiprocessing.cpu_count() | |||||
time_limit = 60000 | |||||
# algorithm parameters. | |||||
algo = 'IPFP' | |||||
initial_solutions = 1 | |||||
algo_options_suffix = ' --initial-solutions ' + str(initial_solutions) + ' --ratio-runs-from-initial-solutions 1 --initialization-method NODE ' | |||||
edit_cost_name = 'CONSTANT' | |||||
edit_cost_constants = [4, 4, 2, 1, 1, 1] | |||||
ds_name = 'MUTAG' | |||||
# Load dataset. | |||||
dataset = '../../../datasets/MUTAG/MUTAG_A.txt' | |||||
Gn, y_all, label_names = load_dataset(dataset) | |||||
y_idx = get_same_item_indices(y_all) | |||||
for i, (y, values) in enumerate(y_idx.items()): | |||||
Gn_i = [Gn[val] for val in values] | |||||
break | |||||
Gn_i = Gn_i[0:10] | |||||
# Set up the environment. | |||||
ged_env = gedlibpy.GEDEnv() | |||||
# gedlibpy.restart_env() | |||||
ged_env.set_edit_cost(edit_cost_name, edit_cost_constant=edit_cost_constants) | |||||
for G in Gn_i: | |||||
ged_env.add_nx_graph(G, '') | |||||
graph_ids = ged_env.get_all_graph_ids() | |||||
set_median_id = ged_env.add_graph('set_median') | |||||
gen_median_id = ged_env.add_graph('gen_median') | |||||
ged_env.init(init_option='EAGER_WITHOUT_SHUFFLED_COPIES') | |||||
# Set up the estimator. | |||||
mge = MedianGraphEstimator(ged_env, constant_node_costs(edit_cost_name)) | |||||
mge.set_refine_method(algo, '--threads ' + str(threads) + ' --initial-solutions ' + str(initial_solutions) + ' --ratio-runs-from-initial-solutions 1') | |||||
mge_options = '--time-limit ' + str(time_limit) + ' --stdout 2 --init-type ' + init_type | |||||
mge_options += ' --random-inits ' + str(num_inits) + ' --seed ' + '1' + ' --update-order TRUE --refine FALSE --randomness PSEUDO --parallel TRUE ' # @todo: std::to_string(rng())
# Select the GED algorithm. | |||||
algo_options = '--threads ' + str(threads) + algo_options_suffix | |||||
mge.set_options(mge_options) | |||||
mge.set_label_names(node_labels=label_names['node_labels'], | |||||
edge_labels=label_names['edge_labels'], | |||||
node_attrs=label_names['node_attrs'], | |||||
edge_attrs=label_names['edge_attrs']) | |||||
mge.set_init_method(algo, algo_options) | |||||
mge.set_descent_method(algo, algo_options) | |||||
# Run the estimator. | |||||
mge.run(graph_ids, set_median_id, gen_median_id) | |||||
# Get SODs. | |||||
sod_sm = mge.get_sum_of_distances('initialized') | |||||
sod_gm = mge.get_sum_of_distances('converged') | |||||
print('sod_sm, sod_gm: ', sod_sm, sod_gm) | |||||
# Get median graphs. | |||||
set_median = ged_env.get_nx_graph(set_median_id) | |||||
gen_median = ged_env.get_nx_graph(gen_median_id) | |||||
return set_median, gen_median | |||||
if __name__ == '__main__': | |||||
# set_median, gen_median = test_median_graph_estimator() | |||||
set_median, gen_median = test_median_graph_estimator_symb() |
@@ -0,0 +1,63 @@ | |||||
#!/usr/bin/env python3 | |||||
# -*- coding: utf-8 -*- | |||||
""" | |||||
Created on Wed Apr 1 15:12:31 2020 | |||||
@author: ljia | |||||
""" | |||||
def constant_node_costs(edit_cost_name): | |||||
if edit_cost_name == 'NON_SYMBOLIC' or edit_cost_name == 'LETTER2' or edit_cost_name == 'LETTER': | |||||
return False | |||||
elif edit_cost_name == 'CONSTANT': | |||||
return True | |||||
else: | |||||
raise Exception('Cannot recognize the given edit cost. Possible edit costs include: "NON_SYMBOLIC", "LETTER", "LETTER2", "CONSTANT".')
# elif edit_cost_name != '': | |||||
# # throw ged::Error("Invalid dataset " + dataset + ". Usage: ./median_tests <AIDS|Mutagenicity|Letter-high|Letter-med|Letter-low|monoterpenoides|SYNTHETICnew|Fingerprint|COIL-DEL>"); | |||||
# return False | |||||
# return True | |||||
def mge_options_to_string(options): | |||||
opt_str = ' ' | |||||
for key, val in options.items(): | |||||
if key == 'init_type': | |||||
opt_str += '--init-type ' + str(val) + ' ' | |||||
elif key == 'random_inits': | |||||
opt_str += '--random-inits ' + str(val) + ' ' | |||||
elif key == 'randomness': | |||||
opt_str += '--randomness ' + str(val) + ' ' | |||||
elif key == 'verbose': | |||||
opt_str += '--stdout ' + str(val) + ' ' | |||||
elif key == 'parallel': | |||||
opt_str += '--parallel ' + ('TRUE' if val else 'FALSE') + ' ' | |||||
elif key == 'update_order': | |||||
opt_str += '--update-order ' + ('TRUE' if val else 'FALSE') + ' ' | |||||
elif key == 'sort_graphs': | |||||
opt_str += '--sort-graphs ' + ('TRUE' if val else 'FALSE') + ' ' | |||||
elif key == 'refine': | |||||
opt_str += '--refine ' + ('TRUE' if val else 'FALSE') + ' ' | |||||
elif key == 'time_limit': | |||||
opt_str += '--time-limit ' + str(val) + ' ' | |||||
elif key == 'max_itrs': | |||||
opt_str += '--max-itrs ' + str(val) + ' ' | |||||
elif key == 'max_itrs_without_update': | |||||
opt_str += '--max-itrs-without-update ' + str(val) + ' ' | |||||
elif key == 'seed': | |||||
opt_str += '--seed ' + str(val) + ' ' | |||||
elif key == 'epsilon': | |||||
opt_str += '--epsilon ' + str(val) + ' ' | |||||
elif key == 'inits_increase_order': | |||||
opt_str += '--inits-increase-order ' + str(val) + ' ' | |||||
elif key == 'init_type_increase_order': | |||||
opt_str += '--init-type-increase-order ' + str(val) + ' ' | |||||
elif key == 'max_itrs_increase_order': | |||||
opt_str += '--max-itrs-increase-order ' + str(val) + ' ' | |||||
# else: | |||||
# valid_options = '[--init-type <arg>] [--random_inits <arg>] [--randomness <arg>] [--seed <arg>] [--verbose <arg>] ' | |||||
# valid_options += '[--time_limit <arg>] [--max_itrs <arg>] [--epsilon <arg>] ' | |||||
# valid_options += '[--inits_increase_order <arg>] [--init_type_increase_order <arg>] [--max_itrs_increase_order <arg>]' | |||||
# raise Exception('Invalid option "' + key + '". Options available = "' + valid_options + '"') | |||||
return opt_str |
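# Example (illustrative): dict options are rendered in insertion order, e.g.
#
# mge_options_to_string({'init_type': 'MEDOID', 'random_inits': 10,
#                        'parallel': True, 'time_limit': 600})
# # -> ' --init-type MEDOID --random-inits 10 --parallel TRUE --time-limit 600 '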
@@ -0,0 +1,3 @@ | |||||
from gklearn.ged.methods.ged_method import GEDMethod | |||||
from gklearn.ged.methods.lsape_based_method import LSAPEBasedMethod | |||||
from gklearn.ged.methods.bipartite import Bipartite |
@@ -0,0 +1,117 @@ | |||||
#!/usr/bin/env python3 | |||||
# -*- coding: utf-8 -*- | |||||
""" | |||||
Created on Thu Jun 18 16:09:29 2020 | |||||
@author: ljia | |||||
""" | |||||
import numpy as np | |||||
import networkx as nx | |||||
from gklearn.ged.methods import LSAPEBasedMethod | |||||
from gklearn.ged.util import LSAPESolver | |||||
from gklearn.utils import SpecialLabel | |||||
class Bipartite(LSAPEBasedMethod): | |||||
def __init__(self, ged_data): | |||||
super().__init__(ged_data) | |||||
self._compute_lower_bound = False | |||||
########################################################################### | |||||
# Inherited member functions from LSAPEBasedMethod. | |||||
########################################################################### | |||||
def _lsape_populate_instance(self, g, h, master_problem): | |||||
# #ifdef _OPENMP | |||||
for row_in_master in range(0, nx.number_of_nodes(g)): | |||||
for col_in_master in range(0, nx.number_of_nodes(h)): | |||||
master_problem[row_in_master, col_in_master] = self._compute_substitution_cost(g, h, row_in_master, col_in_master) | |||||
for row_in_master in range(0, nx.number_of_nodes(g)): | |||||
master_problem[row_in_master, nx.number_of_nodes(h) + row_in_master] = self._compute_deletion_cost(g, row_in_master) | |||||
for col_in_master in range(0, nx.number_of_nodes(h)): | |||||
master_problem[nx.number_of_nodes(g) + col_in_master, col_in_master] = self._compute_insertion_cost(h, col_in_master) | |||||
# for row_in_master in range(0, master_problem.shape[0]): | |||||
# for col_in_master in range(0, master_problem.shape[1]): | |||||
# if row_in_master < nx.number_of_nodes(g) and col_in_master < nx.number_of_nodes(h): | |||||
# master_problem[row_in_master, col_in_master] = self._compute_substitution_cost(g, h, row_in_master, col_in_master) | |||||
# elif row_in_master < nx.number_of_nodes(g): | |||||
# master_problem[row_in_master, nx.number_of_nodes(h)] = self._compute_deletion_cost(g, row_in_master) | |||||
# elif col_in_master < nx.number_of_nodes(h): | |||||
# master_problem[nx.number_of_nodes(g), col_in_master] = self._compute_insertion_cost(h, col_in_master) | |||||
########################################################################### | |||||
# Helper member functions. | |||||
########################################################################### | |||||
def _compute_substitution_cost(self, g, h, u, v): | |||||
# Collect node substitution costs. | |||||
cost = self._ged_data.node_cost(g.nodes[u]['label'], h.nodes[v]['label']) | |||||
# Initialize subproblem. | |||||
d1, d2 = g.degree[u], h.degree[v] | |||||
subproblem = np.ones((d1 + d2, d1 + d2)) * np.inf | |||||
subproblem[d1:, d2:] = 0 | |||||
# subproblem = np.empty((g.degree[u] + 1, h.degree[v] + 1)) | |||||
# Collect edge deletion costs. | |||||
i = 0 # @todo: should directed graphs be considered? | |||||
for label in g[u].values(): # iterate over all edges incident to u
subproblem[i, d2 + i] = self._ged_data.edge_cost(label['label'], SpecialLabel.DUMMY) | |||||
# subproblem[i, h.degree[v]] = self._ged_data.edge_cost(label['label'], SpecialLabel.DUMMY) | |||||
i += 1 | |||||
# Collect edge insertion costs. | |||||
i = 0 # @todo: should directed graphs be considered? | |||||
for label in h[v].values(): # iterate over all edges incident to v
subproblem[d1 + i, i] = self._ged_data.edge_cost(SpecialLabel.DUMMY, label['label']) | |||||
# subproblem[g.degree[u], i] = self._ged_data.edge_cost(SpecialLabel.DUMMY, label['label']) | |||||
i += 1 | |||||
# Collect edge relabelling costs. | |||||
i = 0 | |||||
for label1 in g[u].values(): | |||||
j = 0 | |||||
for label2 in h[v].values(): | |||||
subproblem[i, j] = self._ged_data.edge_cost(label1['label'], label2['label']) | |||||
j += 1 | |||||
i += 1 | |||||
# Solve subproblem. | |||||
subproblem_solver = LSAPESolver(subproblem) | |||||
subproblem_solver.set_model(self._lsape_model) | |||||
subproblem_solver.solve() | |||||
# Update and return overall substitution cost. | |||||
cost += subproblem_solver.minimal_cost() | |||||
return cost | |||||
def _compute_deletion_cost(self, g, v): | |||||
# Collect node deletion cost. | |||||
cost = self._ged_data.node_cost(g.nodes[v]['label'], SpecialLabel.DUMMY) | |||||
# Collect edge deletion costs. | |||||
for label in g[v].values(): | |||||
cost += self._ged_data.edge_cost(label['label'], SpecialLabel.DUMMY) | |||||
# Return overall deletion cost. | |||||
return cost | |||||
def _compute_insertion_cost(self, g, v): | |||||
# Collect node insertion cost. | |||||
cost = self._ged_data.node_cost(SpecialLabel.DUMMY, g.nodes[v]['label']) | |||||
# Collect edge insertion costs. | |||||
for label in g[v].values(): | |||||
cost += self._ged_data.edge_cost(SpecialLabel.DUMMY, label['label']) | |||||
# Return overall insertion cost. | |||||
return cost |
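# Layout of the LSAPE subproblem built in _compute_substitution_cost()
# (illustrative), with d1 = deg(u) and d2 = deg(v):
#
#   rows 0..d1-1,    cols 0..d2-1      : edge relabelling costs
#   rows 0..d1-1,    cols d2..d2+d1-1  : edge deletion costs (on the diagonal)
#   rows d1..d1+d2-1, cols 0..d2-1     : edge insertion costs (on the diagonal)
#   rows d1..d1+d2-1, cols d2..d2+d1-1 : zeros
#
# Off-diagonal entries of the deletion/insertion blocks stay at infinity, so
# each edge can only be deleted or inserted at its own position.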
@@ -0,0 +1,195 @@ | |||||
#!/usr/bin/env python3 | |||||
# -*- coding: utf-8 -*- | |||||
""" | |||||
Created on Thu Jun 18 15:52:35 2020 | |||||
@author: ljia | |||||
""" | |||||
import numpy as np | |||||
import time | |||||
import networkx as nx | |||||
class GEDMethod(object): | |||||
def __init__(self, ged_data): | |||||
self._initialized = False | |||||
self._ged_data = ged_data | |||||
self._options = None | |||||
self._lower_bound = 0 | |||||
self._upper_bound = np.inf | |||||
self._node_map = [0, 0] # @todo | |||||
self._runtime = None | |||||
self._init_time = None | |||||
def init(self): | |||||
"""Initializes the method with options specified by set_options(). | |||||
""" | |||||
start = time.time() | |||||
self._ged_init() | |||||
end = time.time() | |||||
self._init_time = end - start | |||||
self._initialized = True | |||||
def set_options(self, options): | |||||
""" | |||||
/*! | |||||
* @brief Sets the options of the method. | |||||
* @param[in] options String of the form <tt>[--@<option@> @<arg@>] [...]</tt>, where @p option contains neither spaces nor single quotes, | |||||
* and @p arg contains neither spaces nor single quotes or is of the form <tt>'[--@<sub-option@> @<sub-arg@>] [...]'</tt>, | |||||
* where both @p sub-option and @p sub-arg contain neither spaces nor single quotes. | |||||
*/ | |||||
""" | |||||
self._ged_set_default_options() | |||||
for key, val in options.items(): | |||||
if not self._ged_parse_option(key, val): | |||||
raise Exception('Invalid option "' + key + '". Usage: options = "' + self._ged_valid_options_string() + '".') # @todo: not implemented.
self._initialized = False | |||||
def run(self, g_id, h_id): | |||||
""" | |||||
/*! | |||||
* @brief Runs the method with options specified by set_options(). | |||||
* @param[in] g_id ID of input graph. | |||||
* @param[in] h_id ID of input graph. | |||||
*/ | |||||
""" | |||||
start = time.time() | |||||
result = self.run_as_util(self._ged_data._graphs[g_id], self._ged_data._graphs[h_id]) | |||||
end = time.time() | |||||
self._lower_bound = result['lower_bound'] | |||||
self._upper_bound = result['upper_bound'] | |||||
if len(result['node_maps']) > 0: | |||||
self._node_map = result['node_maps'][0] | |||||
self._runtime = end - start | |||||
def run_as_util(self, g, h): | |||||
""" | |||||
/*! | |||||
* @brief Runs the method with options specified by set_options(). | |||||
* @param[in] g Input graph. | |||||
* @param[in] h Input graph. | |||||
* @param[out] result Result variable. | |||||
*/ | |||||
""" | |||||
# Compute optimal solution and return if at least one of the two graphs is empty. | |||||
if nx.number_of_nodes(g) == 0 or nx.number_of_nodes(h) == 0: | |||||
print('This is not implemented.') | |||||
pass # @todo: | |||||
# Run the method. | |||||
return self._ged_run(g, h) | |||||
def get_upper_bound(self): | |||||
""" | |||||
/*! | |||||
* @brief Returns an upper bound. | |||||
* @return Upper bound for graph edit distance provided by last call to run() or -1 if the method does not yield an upper bound. | |||||
*/ | |||||
""" | |||||
return self._upper_bound | |||||
def get_lower_bound(self): | |||||
""" | |||||
/*! | |||||
* @brief Returns a lower bound. | |||||
* @return Lower bound for graph edit distance provided by last call to run() or -1 if the method does not yield a lower bound. | |||||
*/ | |||||
""" | |||||
return self._lower_bound | |||||
def get_runtime(self): | |||||
""" | |||||
/*! | |||||
* @brief Returns the runtime. | |||||
* @return Runtime of last call to run() in seconds. | |||||
*/ | |||||
""" | |||||
return self._runtime | |||||
def get_init_time(self): | |||||
""" | |||||
/*! | |||||
* @brief Returns the initialization time. | |||||
* @return Runtime of last call to init() in seconds. | |||||
*/ | |||||
""" | |||||
return self._init_time | |||||
def get_node_map(self): | |||||
""" | |||||
/*! | |||||
* @brief Returns a graph matching. | |||||
* @return Constant reference to graph matching provided by last call to run() or to an empty matching if the method does not yield a matching. | |||||
*/ | |||||
""" | |||||
return self._node_map | |||||
def _ged_init(self): | |||||
""" | |||||
/*! | |||||
* @brief Initializes the method. | |||||
* @note Must be overridden by derived classes that require initialization. | |||||
*/ | |||||
""" | |||||
pass | |||||
def _ged_parse_option(self, option, arg): | |||||
""" | |||||
/*! | |||||
* @brief Parses one option. | |||||
* @param[in] option The name of the option. | |||||
* @param[in] arg The argument of the option. | |||||
* @return Boolean @p true if @p option is a valid option name for the method and @p false otherwise. | |||||
* @note Must be overridden by derived classes that have options. | |||||
*/ | |||||
""" | |||||
return False | |||||
def _ged_run(self, g, h): | |||||
""" | |||||
/*! | |||||
* @brief Runs the method with options specified by set_options(). | |||||
* @param[in] g Input graph. | |||||
* @param[in] h Input graph. | |||||
* @param[out] result Result variable. | |||||
* @note Must be overridden by derived classes. | |||||
*/ | |||||
""" | |||||
return {} | |||||
def _ged_valid_options_string(self): | |||||
""" | |||||
/*! | |||||
* @brief Returns string of all valid options. | |||||
* @return String of the form <tt>[--@<option@> @<arg@>] [...]</tt>. | |||||
* @note Must be overridden by derived classes that have options. | |||||
*/ | |||||
""" | |||||
return '' | |||||
def _ged_set_default_options(self): | |||||
""" | |||||
/*! | |||||
* @brief Sets all options to default values. | |||||
* @note Must be overridden by derived classes that have options. | |||||
*/ | |||||
""" | |||||
pass | |||||
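# A minimal sketch (illustrative only, not part of the original file) of how
# a concrete method plugs into GEDMethod: overriding _ged_run() is enough for
# run() to work. The trivial bound below simply deletes all of g and inserts
# all of h, with hypothetical unit costs.
class _TrivialGEDMethod(GEDMethod):

    def _ged_run(self, g, h):
        upper = nx.number_of_nodes(g) + nx.number_of_nodes(h) \
            + nx.number_of_edges(g) + nx.number_of_edges(h)
        return {'node_maps': [], 'lower_bound': 0, 'upper_bound': upper}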
@@ -0,0 +1,254 @@ | |||||
#!/usr/bin/env python3 | |||||
# -*- coding: utf-8 -*- | |||||
""" | |||||
Created on Thu Jun 18 16:01:24 2020 | |||||
@author: ljia | |||||
""" | |||||
import numpy as np | |||||
import networkx as nx | |||||
from gklearn.ged.methods import GEDMethod | |||||
from gklearn.ged.util import LSAPESolver, misc | |||||
from gklearn.ged.env import NodeMap | |||||
class LSAPEBasedMethod(GEDMethod): | |||||
def __init__(self, ged_data): | |||||
super().__init__(ged_data) | |||||
self._lsape_model = None # @todo: LSAPESolver::ECBP | |||||
self._greedy_method = None # @todo: LSAPESolver::BASIC | |||||
self._compute_lower_bound = True | |||||
self._solve_optimally = True | |||||
self._num_threads = 1 | |||||
self._centrality_method = 'NODE' # @todo | |||||
self._centrality_weight = 0.7 | |||||
self._centralities = {} | |||||
self._max_num_solutions = 1 | |||||
def populate_instance_and_run_as_util(self, g, h): #, lsape_instance): | |||||
""" | |||||
/*! | |||||
* @brief Runs the method with options specified by set_options() and provides access to constructed LSAPE instance. | |||||
* @param[in] g Input graph. | |||||
* @param[in] h Input graph. | |||||
* @param[out] result Result variable. | |||||
* @param[out] lsape_instance LSAPE instance. | |||||
*/ | |||||
""" | |||||
result = {'node_maps': [], 'lower_bound': 0, 'upper_bound': np.inf} | |||||
# Populate the LSAPE instance and set up the solver. | |||||
nb1, nb2 = nx.number_of_nodes(g), nx.number_of_nodes(h) | |||||
lsape_instance = np.ones((nb1 + nb2, nb1 + nb2)) * np.inf | |||||
# lsape_instance = np.empty((nx.number_of_nodes(g) + 1, nx.number_of_nodes(h) + 1)) | |||||
self.populate_instance(g, h, lsape_instance) | |||||
# nb1, nb2 = nx.number_of_nodes(g), nx.number_of_nodes(h) | |||||
# lsape_instance_new = np.empty((nb1 + nb2, nb1 + nb2)) * np.inf | |||||
# lsape_instance_new[nb1:, nb2:] = 0 | |||||
# lsape_instance_new[0:nb1, 0:nb2] = lsape_instance[0:nb1, 0:nb2] | |||||
# for i in range(nb1): # all u's neighbor | |||||
# lsape_instance_new[i, nb2 + i] = lsape_instance[i, nb2] | |||||
# for i in range(nb2): # all u's neighbor | |||||
# lsape_instance_new[nb1 + i, i] = lsape_instance[nb2, i] | |||||
# lsape_solver = LSAPESolver(lsape_instance_new) | |||||
lsape_solver = LSAPESolver(lsape_instance) | |||||
# Solve the LSAPE instance. | |||||
if self._solve_optimally: | |||||
lsape_solver.set_model(self._lsape_model) | |||||
else: | |||||
lsape_solver.set_greedy_method(self._greedy_method) | |||||
lsape_solver.solve(self._max_num_solutions) | |||||
# Compute and store lower and upper bound. | |||||
if self._compute_lower_bound and self._solve_optimally: | |||||
result['lower_bound'] = lsape_solver.minimal_cost() * self._lsape_lower_bound_scaling_factor(g, h) # @todo: test | |||||
for solution_id in range(0, lsape_solver.num_solutions()): | |||||
result['node_maps'].append(NodeMap(nx.number_of_nodes(g), nx.number_of_nodes(h))) | |||||
misc.construct_node_map_from_solver(lsape_solver, result['node_maps'][-1], solution_id) | |||||
self._ged_data.compute_induced_cost(g, h, result['node_maps'][-1]) | |||||
# Add centralities and reoptimize. | |||||
if self._centrality_weight > 0 and self._centrality_method != 'NODE': | |||||
print('This is not implemented.') | |||||
pass # @todo | |||||
# Sort the node maps and set the upper bound. | |||||
if len(result['node_maps']) > 1 or len(result['node_maps']) > self._max_num_solutions: | |||||
print('This is not implemented.') # @todo: | |||||
pass | |||||
if len(result['node_maps']) == 0: | |||||
result['upper_bound'] = np.inf | |||||
else: | |||||
result['upper_bound'] = result['node_maps'][0].induced_cost() | |||||
return result | |||||
def populate_instance(self, g, h, lsape_instance): | |||||
""" | |||||
/*! | |||||
* @brief Populates the LSAPE instance. | |||||
* @param[in] g Input graph. | |||||
* @param[in] h Input graph. | |||||
* @param[out] lsape_instance LSAPE instance. | |||||
*/ | |||||
""" | |||||
if not self._initialized: | |||||
pass | |||||
# @todo: if (not this->initialized_) { | |||||
self._lsape_populate_instance(g, h, lsape_instance) | |||||
lsape_instance[nx.number_of_nodes(g):, nx.number_of_nodes(h):] = 0 | |||||
# lsape_instance[nx.number_of_nodes(g), nx.number_of_nodes(h)] = 0 | |||||
########################################################################### | |||||
# Member functions inherited from GEDMethod. | |||||
########################################################################### | |||||
def _ged_init(self): | |||||
self._lsape_pre_graph_init(False) | |||||
for graph in self._ged_data._graphs: | |||||
self._init_graph(graph) | |||||
self._lsape_init() | |||||
def _ged_run(self, g, h): | |||||
# lsape_instance = np.empty((0, 0)) | |||||
result = self.populate_instance_and_run_as_util(g, h) # , lsape_instance) | |||||
return result | |||||
def _ged_parse_option(self, option, arg): | |||||
is_valid_option = False | |||||
if option == 'threads': # @todo: try.. catch... | |||||
self._num_threads = arg | |||||
is_valid_option = True | |||||
elif option == 'lsape_model': | |||||
self._lsape_model = arg # @todo | |||||
is_valid_option = True | |||||
elif option == 'greedy_method': | |||||
self._greedy_method = arg # @todo | |||||
is_valid_option = True | |||||
elif option == 'optimal': | |||||
self._solve_optimally = arg # @todo | |||||
is_valid_option = True | |||||
elif option == 'centrality_method': | |||||
self._centrality_method = arg # @todo | |||||
is_valid_option = True | |||||
elif option == 'centrality_weight': | |||||
self._centrality_weight = arg # @todo | |||||
is_valid_option = True | |||||
elif option == 'max_num_solutions': | |||||
if arg == 'ALL': | |||||
self._max_num_solutions = -1 | |||||
else: | |||||
self._max_num_solutions = arg # @todo | |||||
is_valid_option = True | |||||
is_valid_option = is_valid_option or self._lsape_parse_option(option, arg) | |||||
is_valid_option = True # @todo: this is not in the C++ code. | |||||
return is_valid_option | |||||
def _ged_set_default_options(self): | |||||
self._lsape_model = None # @todo: LSAPESolver::ECBP | |||||
self._greedy_method = None # @todo: LSAPESolver::BASIC | |||||
self._solve_optimally = True | |||||
self._num_threads = 1 | |||||
self._centrality_method = 'NODE' # @todo | |||||
self._centrality_weight = 0.7 | |||||
self._max_num_solutions = 1 | |||||
########################################################################### | |||||
# Private helper member functions. | |||||
########################################################################### | |||||
def _init_graph(self, graph): | |||||
if self._centrality_method != 'NODE': | |||||
self._init_centralities(graph) # @todo | |||||
self._lsape_init_graph(graph) | |||||
########################################################################### | |||||
# Virtual member functions to be overridden by derived classes. | |||||
########################################################################### | |||||
def _lsape_init(self): | |||||
""" | |||||
/*! | |||||
* @brief Initializes the method after initializing the global variables for the graphs. | |||||
* @note Must be overridden by derived classes of ged::LSAPEBasedMethod that require custom initialization. | |||||
*/ | |||||
""" | |||||
pass | |||||
def _lsape_parse_option(self, option, arg): | |||||
""" | |||||
/*! | |||||
* @brief Parses one option that is not among the ones shared by all derived classes of ged::LSAPEBasedMethod. | |||||
* @param[in] option The name of the option. | |||||
* @param[in] arg The argument of the option. | |||||
* @return Returns true if @p option is a valid option name for the method and false otherwise. | |||||
* @note Must be overridden by derived classes of ged::LSAPEBasedMethod that have options that are not among the ones shared by all derived classes of ged::LSAPEBasedMethod. | |||||
*/ | |||||
""" | |||||
return False | |||||
def _lsape_set_default_options(self): | |||||
""" | |||||
/*! | |||||
* @brief Sets all options that are not among the ones shared by all derived classes of ged::LSAPEBasedMethod to default values. | |||||
* @note Must be overridden by derived classes of ged::LSAPEBasedMethod that have options that are not among the ones shared by all derived classes of ged::LSAPEBasedMethod. | |||||
*/ | |||||
""" | |||||
pass | |||||
def _lsape_populate_instance(self, g, h, lsape_instance): | |||||
""" | |||||
/*! | |||||
* @brief Populates the LSAPE instance. | |||||
* @param[in] g Input graph. | |||||
* @param[in] h Input graph. | |||||
* @param[out] lsape_instance LSAPE instance of size (n + 1) x (m + 1), where n and m are the number of nodes in @p g and @p h. The last row and the last column represent insertion and deletion. | |||||
* @note Must be overridden by derived classes of ged::LSAPEBasedMethod. | |||||
*/ | |||||
""" | |||||
pass | |||||
def _lsape_init_graph(self, graph): | |||||
""" | |||||
/*! | |||||
* @brief Initializes global variables for one graph. | |||||
* @param[in] graph Graph for which the global variables have to be initialized. | |||||
* @note Must be overridden by derived classes of ged::LSAPEBasedMethod that require to initialize custom global variables. | |||||
*/ | |||||
""" | |||||
pass | |||||
def _lsape_pre_graph_init(self, called_at_runtime): | |||||
""" | |||||
/*! | |||||
* @brief Initializes the method at runtime or during initialization before initializing the global variables for the graphs. | |||||
* @param[in] called_at_runtime Equals @p true if called at runtime and @p false if called during initialization. | |||||
* @brief Must be overridden by derived classes of ged::LSAPEBasedMethod that require default initialization at runtime before initializing the global variables for the graphs. | |||||
*/ | |||||
""" | |||||
pass |
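# A sketch (an assumed helper, not part of the original file) of the padding
# hinted at by the commented-out block in populate_instance_and_run_as_util():
# expanding an (n + 1) x (m + 1) LSAPE matrix, whose last row and column hold
# insertion and deletion costs, into the (n + m) x (n + m) square form that
# LSAPESolver consumes.
def _expand_lsape_instance(compact):
    n, m = compact.shape[0] - 1, compact.shape[1] - 1
    square = np.ones((n + m, n + m)) * np.inf
    square[n:, m:] = 0  # dummy-to-dummy assignments cost nothing
    square[0:n, 0:m] = compact[0:n, 0:m]  # substitutions
    for i in range(n):  # node deletions on a diagonal
        square[i, m + i] = compact[i, m]
    for j in range(m):  # node insertions on a diagonal
        square[n + j, j] = compact[n, j]
    return square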
@@ -0,0 +1,3 @@ | |||||
from gklearn.ged.util.lsape_solver import LSAPESolver | |||||
from gklearn.ged.util.util import compute_geds, ged_options_to_string | |||||
from gklearn.ged.util.util import compute_geds_cml, label_costs_to_matrix |
@@ -0,0 +1,134 @@ | |||||
#!/usr/bin/env python3 | |||||
# -*- coding: utf-8 -*- | |||||
""" | |||||
Created on Fri Mar 20 11:09:04 2020 | |||||
@author: ljia | |||||
""" | |||||
import re | |||||
def convert_function(cpp_code): | |||||
# f_cpp = open('cpp_code.cpp', 'r') | |||||
# # f_cpp = open('cpp_ext/src/median_graph_estimator.ipp', 'r') | |||||
# cpp_code = f_cpp.read() | |||||
python_code = cpp_code.replace('else if (', 'elif ') | |||||
python_code = python_code.replace('if (', 'if ') | |||||
python_code = python_code.replace('else {', 'else:') | |||||
python_code = python_code.replace(') {', ':') | |||||
python_code = python_code.replace(';\n', '\n') | |||||
python_code = re.sub('\n(.*)}\n', '\n\n', python_code) | |||||
# python_code = python_code.replace('}\n', '') | |||||
python_code = python_code.replace('throw', 'raise') | |||||
python_code = python_code.replace('error', 'Exception') | |||||
python_code = python_code.replace('"', '\'') | |||||
python_code = python_code.replace('\\\'', '"') | |||||
python_code = python_code.replace('try {', 'try:') | |||||
python_code = python_code.replace('true', 'True') | |||||
python_code = python_code.replace('false', 'False') | |||||
python_code = python_code.replace('catch (...', 'except') | |||||
# python_code = re.sub('std::string\(\'(.*)\'\)', '$1', python_code) | |||||
return python_code | |||||
# # python_code = python_code.replace('}\n', '') | |||||
# python_code = python_code.replace('option.first', 'opt_name') | |||||
# python_code = python_code.replace('option.second', 'opt_val') | |||||
# python_code = python_code.replace('ged::Error', 'Exception') | |||||
# python_code = python_code.replace('std::string(\'Invalid argument "\')', '\'Invalid argument "\'') | |||||
# f_cpp.close() | |||||
# f_python = open('python_code.py', 'w') | |||||
# f_python.write(python_code) | |||||
# f_python.close() | |||||
def convert_function_comment(cpp_fun_cmt, param_types): | |||||
cpp_fun_cmt = cpp_fun_cmt.replace('\t', '') | |||||
cpp_fun_cmt = cpp_fun_cmt.replace('\n * ', ' ') | |||||
# split the input comment according to key words. | |||||
param_split = None | |||||
note = None | |||||
cmt_split = cpp_fun_cmt.split('@brief')[1] | |||||
brief = cmt_split | |||||
if '@param' in cmt_split: | |||||
cmt_split = cmt_split.split('@param') | |||||
brief = cmt_split[0] | |||||
param_split = cmt_split[1:] | |||||
if '@note' in cmt_split[-1]: | |||||
note_split = cmt_split[-1].split('@note') | |||||
if param_split is not None: | |||||
param_split.pop() | |||||
param_split.append(note_split[0]) | |||||
else: | |||||
brief = note_split[0] | |||||
note = note_split[1] | |||||
# get parameters. | |||||
if param_split is not None: | |||||
for idx, param in enumerate(param_split): | |||||
_, param_name, param_desc = param.split(' ', 2) | |||||
param_name = function_comment_strip(param_name, ' *\n\t/') | |||||
param_desc = function_comment_strip(param_desc, ' *\n\t/') | |||||
param_split[idx] = (param_name, param_desc) | |||||
# strip comments. | |||||
brief = function_comment_strip(brief, ' *\n\t/') | |||||
if note is not None: | |||||
note = function_comment_strip(note, ' *\n\t/') | |||||
# construct the Python function comment. | |||||
python_fun_cmt = '"""' | |||||
python_fun_cmt += brief + '\n' | |||||
if param_split is not None and len(param_split) > 0: | |||||
python_fun_cmt += '\nParameters\n----------' | |||||
for idx, param in enumerate(param_split): | |||||
python_fun_cmt += '\n' + param[0] + ' : ' + param_types[idx] | |||||
python_fun_cmt += '\n\t' + param[1] + '\n' | |||||
if note is not None: | |||||
python_fun_cmt += '\nNote\n----\n' + note + '\n' | |||||
python_fun_cmt += '"""' | |||||
return python_fun_cmt | |||||
def function_comment_strip(comment, bad_chars): | |||||
head_removed, tail_removed = False, False | |||||
while not head_removed or not tail_removed: | |||||
if comment[0] in bad_chars: | |||||
comment = comment[1:] | |||||
head_removed = False | |||||
else: | |||||
head_removed = True | |||||
if comment[-1] in bad_chars: | |||||
comment = comment[:-1] | |||||
tail_removed = False | |||||
else: | |||||
tail_removed = True | |||||
return comment | |||||
if __name__ == '__main__': | |||||
# python_code = convert_function(""" | |||||
# if (print_to_stdout_ == 2) { | |||||
# std::cout << "\n===========================================================\n"; | |||||
# std::cout << "Block gradient descent for initial median " << median_pos + 1 << " of " << medians.size() << ".\n"; | |||||
# std::cout << "-----------------------------------------------------------\n"; | |||||
# } | |||||
# """) | |||||
python_fun_cmt = convert_function_comment(""" | |||||
/*! | |||||
* @brief Returns the sum of distances. | |||||
* @param[in] state The state of the estimator. | |||||
* @return The sum of distances of the median when the estimator was in the state @p state during the last call to run(). | |||||
*/ | |||||
""", ['string', 'string']) |
@@ -0,0 +1,122 @@ | |||||
else if (option.first == "random-inits") { | |||||
try { | |||||
num_random_inits_ = std::stoul(option.second); | |||||
desired_num_random_inits_ = num_random_inits_; | |||||
} | |||||
catch (...) { | |||||
throw Error(std::string("Invalid argument \"") + option.second + "\" for option random-inits. Usage: options = \"[--random-inits <convertible to int greater 0>]\""); | |||||
} | |||||
if (num_random_inits_ <= 0) { | |||||
throw Error(std::string("Invalid argument \"") + option.second + "\" for option random-inits. Usage: options = \"[--random-inits <convertible to int greater 0>]\""); | |||||
} | |||||
} | |||||
else if (option.first == "randomness") { | |||||
if (option.second == "PSEUDO") { | |||||
use_real_randomness_ = false; | |||||
} | |||||
else if (option.second == "REAL") { | |||||
use_real_randomness_ = true; | |||||
} | |||||
else { | |||||
throw Error(std::string("Invalid argument \"") + option.second + "\" for option randomness. Usage: options = \"[--randomness REAL|PSEUDO] [...]\""); | |||||
} | |||||
} | |||||
else if (option.first == "stdout") { | |||||
if (option.second == "0") { | |||||
print_to_stdout_ = 0; | |||||
} | |||||
else if (option.second == "1") { | |||||
print_to_stdout_ = 1; | |||||
} | |||||
else if (option.second == "2") { | |||||
print_to_stdout_ = 2; | |||||
} | |||||
else { | |||||
throw Error(std::string("Invalid argument \"") + option.second + "\" for option stdout. Usage: options = \"[--stdout 0|1|2] [...]\""); | |||||
} | |||||
} | |||||
else if (option.first == "refine") { | |||||
if (option.second == "TRUE") { | |||||
refine_ = true; | |||||
} | |||||
else if (option.second == "FALSE") { | |||||
refine_ = false; | |||||
} | |||||
else { | |||||
throw Error(std::string("Invalid argument \"") + option.second + "\" for option refine. Usage: options = \"[--refine TRUE|FALSE] [...]\""); | |||||
} | |||||
} | |||||
else if (option.first == "time-limit") { | |||||
try { | |||||
time_limit_in_sec_ = std::stod(option.second); | |||||
} | |||||
catch (...) { | |||||
throw Error(std::string("Invalid argument \"") + option.second + "\" for option time-limit. Usage: options = \"[--time-limit <convertible to double>] [...]"); | |||||
} | |||||
} | |||||
else if (option.first == "max-itrs") { | |||||
try { | |||||
max_itrs_ = std::stoi(option.second); | |||||
} | |||||
catch (...) { | |||||
throw Error(std::string("Invalid argument \"") + option.second + "\" for option max-itrs. Usage: options = \"[--max-itrs <convertible to int>] [...]"); | |||||
} | |||||
} | |||||
else if (option.first == "max-itrs-without-update") { | |||||
try { | |||||
max_itrs_without_update_ = std::stoi(option.second); | |||||
} | |||||
catch (...) { | |||||
throw Error(std::string("Invalid argument \"") + option.second + "\" for option max-itrs-without-update. Usage: options = \"[--max-itrs-without-update <convertible to int>] [...]"); | |||||
} | |||||
} | |||||
else if (option.first == "seed") { | |||||
try { | |||||
seed_ = std::stoul(option.second); | |||||
} | |||||
catch (...) { | |||||
throw Error(std::string("Invalid argument \"") + option.second + "\" for option seed. Usage: options = \"[--seed <convertible to int greater equal 0>] [...]"); | |||||
} | |||||
} | |||||
else if (option.first == "epsilon") { | |||||
try { | |||||
epsilon_ = std::stod(option.second); | |||||
} | |||||
catch (...) { | |||||
throw Error(std::string("Invalid argument \"") + option.second + "\" for option epsilon. Usage: options = \"[--epsilon <convertible to double greater 0>] [...]"); | |||||
} | |||||
if (epsilon_ <= 0) { | |||||
throw Error(std::string("Invalid argument \"") + option.second + "\" for option epsilon. Usage: options = \"[--epsilon <convertible to double greater 0>] [...]"); | |||||
} | |||||
} | |||||
else if (option.first == "inits-increase-order") { | |||||
try { | |||||
num_inits_increase_order_ = std::stoul(option.second); | |||||
} | |||||
catch (...) { | |||||
throw Error(std::string("Invalid argument \"") + option.second + "\" for option inits-increase-order. Usage: options = \"[--inits-increase-order <convertible to int greater 0>]\""); | |||||
} | |||||
if (num_inits_increase_order_ <= 0) { | |||||
throw Error(std::string("Invalid argument \"") + option.second + "\" for option inits-increase-order. Usage: options = \"[--inits-increase-order <convertible to int greater 0>]\""); | |||||
} | |||||
} | |||||
else if (option.first == "init-type-increase-order") { | |||||
init_type_increase_order_ = option.second; | |||||
if (option.second != "CLUSTERS" and option.second != "K-MEANS++") { | |||||
throw ged::Error(std::string("Invalid argument ") + option.second + " for option init-type-increase-order. Usage: options = \"[--init-type-increase-order CLUSTERS|K-MEANS++] [...]\""); | |||||
} | |||||
} | |||||
else if (option.first == "max-itrs-increase-order") { | |||||
try { | |||||
max_itrs_increase_order_ = std::stoi(option.second); | |||||
} | |||||
catch (...) { | |||||
throw Error(std::string("Invalid argument \"") + option.second + "\" for option max-itrs-increase-order. Usage: options = \"[--max-itrs-increase-order <convertible to int>] [...]"); | |||||
} | |||||
} | |||||
else { | |||||
std::string valid_options("[--init-type <arg>] [--random-inits <arg>] [--randomness <arg>] [--seed <arg>] [--stdout <arg>] "); | |||||
valid_options += "[--time-limit <arg>] [--max-itrs <arg>] [--epsilon <arg>] "; | |||||
valid_options += "[--inits-increase-order <arg>] [--init-type-increase-order <arg>] [--max-itrs-increase-order <arg>]"; | |||||
throw Error(std::string("Invalid option \"") + option.first + "\". Usage: options = \"" + valid_options + "\""); | |||||
} |
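# For reference, a hand-converted sketch (with a hypothetical function name,
# not part of the original files) of the "time-limit" branch above, in the
# Python form this port targets:
def _parse_time_limit(arg):
    try:
        return float(arg)
    except ValueError:
        raise Exception('Invalid argument "' + str(arg) + '" for option time-limit. Usage: options = "[--time-limit <convertible to double>] [...]"')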
@@ -0,0 +1,122 @@ | |||||
#!/usr/bin/env python3 | |||||
# -*- coding: utf-8 -*- | |||||
""" | |||||
Created on Mon Jun 22 15:37:36 2020 | |||||
@author: ljia | |||||
""" | |||||
import numpy as np | |||||
from scipy.optimize import linear_sum_assignment | |||||
class LSAPESolver(object): | |||||
def __init__(self, cost_matrix=None): | |||||
""" | |||||
/*! | |||||
* @brief Constructs solver for LSAPE problem instance. | |||||
* @param[in] cost_matrix Pointer to the LSAPE problem instance that should be solved. | |||||
*/ | |||||
""" | |||||
self._cost_matrix = cost_matrix | |||||
self._model = 'ECBP' | |||||
self._greedy_method = 'BASIC' | |||||
self._solve_optimally = True | |||||
self._minimal_cost = 0 | |||||
self._row_to_col_assignments = [] | |||||
self._col_to_row_assignments = [] | |||||
self._dual_var_rows = [] # @todo | |||||
self._dual_var_cols = [] # @todo | |||||
def clear_solution(self): | |||||
"""Clears a previously computed solution. | |||||
""" | |||||
self._minimal_cost = 0 | |||||
self._row_to_col_assignments.clear() | |||||
self._col_to_row_assignments.clear() | |||||
self._row_to_col_assignments.append([]) # @todo | |||||
self._col_to_row_assignments.append([]) | |||||
self._dual_var_rows = [] # @todo | |||||
self._dual_var_cols = [] # @todo | |||||
def set_model(self, model): | |||||
""" | |||||
/*! | |||||
* @brief Makes the solver use a specific model for optimal solving. | |||||
* @param[in] model The model that should be used. | |||||
*/ | |||||
""" | |||||
self._solve_optimally = True | |||||
self._model = model | |||||
def solve(self, num_solutions=1): | |||||
""" | |||||
/*! | |||||
* @brief Solves the LSAPE problem instance. | |||||
* @param[in] num_solutions The maximal number of solutions that should be computed. | |||||
*/ | |||||
""" | |||||
self.clear_solution() | |||||
if self._solve_optimally: | |||||
row_ind, col_ind = linear_sum_assignment(self._cost_matrix) # @todo: only hungarianLSAPE ('ECBP') can be used. | |||||
self._row_to_col_assignments[0] = col_ind | |||||
self._col_to_row_assignments[0] = np.argsort(col_ind) # @todo: might be slow, can use row_ind | |||||
self._compute_cost_from_assignments() | |||||
if num_solutions > 1: | |||||
pass # @todo: | |||||
else: | |||||
print('This is not implemented.')
pass # @todo: greedy. | |||||
# self._ | |||||
def minimal_cost(self): | |||||
""" | |||||
/*! | |||||
* @brief Returns the cost of the computed solutions. | |||||
* @return Cost of computed solutions. | |||||
*/ | |||||
""" | |||||
return self._minimal_cost | |||||
def get_assigned_col(self, row, solution_id=0): | |||||
""" | |||||
/*! | |||||
* @brief Returns the assigned column. | |||||
* @param[in] row Row whose assigned column should be returned. | |||||
* @param[in] solution_id ID of the solution where the assignment should be looked up. | |||||
* @returns Column to which @p row is assigned to in solution with ID @p solution_id or ged::undefined() if @p row is not assigned to any column. | |||||
*/ | |||||
""" | |||||
return self._row_to_col_assignments[solution_id][row] | |||||
def get_assigned_row(self, col, solution_id=0): | |||||
""" | |||||
/*! | |||||
* @brief Returns the assigned row. | |||||
* @param[in] col Column whose assigned row should be returned. | |||||
* @param[in] solution_id ID of the solution where the assignment should be looked up. | |||||
* @returns Row to which @p col is assigned to in solution with ID @p solution_id or ged::undefined() if @p col is not assigned to any row. | |||||
*/ | |||||
""" | |||||
return self._col_to_row_assignments[solution_id][col] | |||||
def num_solutions(self): | |||||
""" | |||||
/*! | |||||
* @brief Returns the number of solutions. | |||||
* @returns Actual number of solutions computed by solve(). Might be smaller than @p num_solutions. | |||||
*/ | |||||
""" | |||||
return len(self._row_to_col_assignments) | |||||
def _compute_cost_from_assignments(self): # @todo | |||||
self._minimal_cost = np.sum(self._cost_matrix[range(0, len(self._row_to_col_assignments[0])), self._row_to_col_assignments[0]]) |
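# A minimal usage sketch (not part of the original file): solve a small
# square instance and read back the optimal matching and its cost.
if __name__ == '__main__':
    solver = LSAPESolver(np.array([[1.0, 4.0], [2.0, 1.0]]))
    solver.set_model('ECBP')  # scipy's Hungarian solver is used either way
    solver.solve()
    print(solver.minimal_cost())  # 2.0: each row keeps its own column
    print(solver.get_assigned_col(0))  # 0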
@@ -0,0 +1,129 @@ | |||||
#!/usr/bin/env python3 | |||||
# -*- coding: utf-8 -*- | |||||
""" | |||||
Created on Thu Mar 19 18:13:56 2020 | |||||
@author: ljia | |||||
""" | |||||
from gklearn.utils import dummy_node | |||||
def construct_node_map_from_solver(solver, node_map, solution_id): | |||||
node_map.clear() | |||||
num_nodes_g = node_map.num_source_nodes() | |||||
num_nodes_h = node_map.num_target_nodes() | |||||
# add deletions and substitutions | |||||
for row in range(0, num_nodes_g): | |||||
col = solver.get_assigned_col(row, solution_id) | |||||
if col >= num_nodes_h: | |||||
node_map.add_assignment(row, dummy_node()) | |||||
else: | |||||
node_map.add_assignment(row, col) | |||||
# insertions. | |||||
for col in range(0, num_nodes_h): | |||||
if solver.get_assigned_row(col, solution_id) >= num_nodes_g: | |||||
node_map.add_assignment(dummy_node(), col) | |||||
def options_string_to_options_map(options_string): | |||||
"""Transforms an options string into an options map. | |||||
Parameters | |||||
---------- | |||||
options_string : string | |||||
Options string of the form "[--<option> <arg>] [...]". | |||||
Returns
-------
options_map : dict{string : string} | |||||
Map with one key-value pair (<option>, <arg>) for each option contained in the string. | |||||
""" | |||||
if options_string == '': | |||||
return {}  # return an empty map for an empty options string
options_map = {} | |||||
words = [] | |||||
tokenize(options_string, ' ', words) | |||||
expect_option_name = True | |||||
for word in words: | |||||
if expect_option_name: | |||||
is_opt_name, word = is_option_name(word) | |||||
if is_opt_name: | |||||
option_name = word | |||||
if option_name in options_map: | |||||
raise Exception('Multiple specification of option "' + option_name + '".') | |||||
options_map[option_name] = '' | |||||
else: | |||||
raise Exception('Invalid options "' + options_string + '". Usage: options = "[--<option> <arg>] [...]"') | |||||
else: | |||||
is_opt_name, word = is_option_name(word) | |||||
if is_opt_name: | |||||
raise Exception('Invalid options "' + options_string + '". Usage: options = "[--<option> <arg>] [...]"') | |||||
else: | |||||
options_map[option_name] = word | |||||
expect_option_name = not expect_option_name | |||||
return options_map | |||||
def tokenize(sentence, sep, words): | |||||
"""Separates a sentence into words separated by sep (unless contained in single quotes). | |||||
Parameters | |||||
---------- | |||||
sentence : string | |||||
The sentence that should be tokenized. | |||||
sep : string | |||||
The separator. Must be different from "'". | |||||
words : list[string] | |||||
The obtained words. | |||||
""" | |||||
outside_quotes = True | |||||
word_length = 0 | |||||
pos_word_start = 0 | |||||
for pos in range(0, len(sentence)): | |||||
if sentence[pos] == '\'': | |||||
if not outside_quotes and pos < len(sentence) - 1: | |||||
if sentence[pos + 1] != sep: | |||||
raise Exception('Sentence contains closing single quote which is followed by a char different from ' + sep + '.') | |||||
word_length += 1 | |||||
outside_quotes = not outside_quotes | |||||
elif outside_quotes and sentence[pos] == sep: | |||||
if word_length > 0: | |||||
words.append(sentence[pos_word_start:pos_word_start + word_length]) | |||||
pos_word_start = pos + 1 | |||||
word_length = 0 | |||||
else: | |||||
word_length += 1 | |||||
if not outside_quotes: | |||||
raise Exception('Sentence contains unbalanced single quotes.') | |||||
if word_length > 0: | |||||
words.append(sentence[pos_word_start:pos_word_start + word_length]) | |||||
def is_option_name(word): | |||||
"""Checks whether a word is an option name and, if so, removes the leading dashes. | |||||
Parameters | |||||
---------- | |||||
word : string | |||||
Word. | |||||
Returns
-------
True if word is of the form "--<option>". | |||||
word : string | |||||
The word without the leading dashes. | |||||
""" | |||||
if word[0] == '\'': | |||||
word = word[1:-1] # strip the leading and trailing single quote
return False, word | |||||
if len(word) < 3: | |||||
return False, word | |||||
if word[0] == '-' and word[1] == '-' and word[2] != '-': | |||||
word = word[2:] | |||||
return True, word | |||||
return False, word |
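# A small usage sketch (not part of the original file) of the helpers above;
# note how a quoted argument survives tokenization in one piece.
if __name__ == '__main__':
    print(options_string_to_options_map('--threads 6 --initial-solutions 4'))
    # {'threads': '6', 'initial-solutions': '4'}
    words = []
    tokenize("--init-options '--threads 3'", ' ', words)
    print(words)  # ['--init-options', "'--threads 3'"]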
@@ -0,0 +1,620 @@ | |||||
#!/usr/bin/env python3 | |||||
# -*- coding: utf-8 -*- | |||||
""" | |||||
Created on Tue Mar 31 17:06:22 2020 | |||||
@author: ljia | |||||
""" | |||||
import numpy as np | |||||
from itertools import combinations | |||||
import multiprocessing | |||||
from multiprocessing import Pool | |||||
from functools import partial | |||||
import sys | |||||
from tqdm import tqdm | |||||
import networkx as nx | |||||
from gklearn.ged.env import GEDEnv | |||||
def compute_ged(g1, g2, options): | |||||
from gklearn.gedlib import librariesImport, gedlibpy | |||||
ged_env = gedlibpy.GEDEnv() | |||||
ged_env.set_edit_cost(options['edit_cost'], edit_cost_constant=options['edit_cost_constants']) | |||||
ged_env.add_nx_graph(g1, '') | |||||
ged_env.add_nx_graph(g2, '') | |||||
listID = ged_env.get_all_graph_ids() | |||||
ged_env.init(init_type=options['init_option']) | |||||
ged_env.set_method(options['method'], ged_options_to_string(options)) | |||||
ged_env.init_method() | |||||
g = listID[0] | |||||
h = listID[1] | |||||
ged_env.run_method(g, h) | |||||
pi_forward = ged_env.get_forward_map(g, h) | |||||
pi_backward = ged_env.get_backward_map(g, h) | |||||
upper = ged_env.get_upper_bound(g, h) | |||||
dis = upper | |||||
# convert the maps to original node IDs (removed nodes are mapped to np.inf)
nodes1 = [n for n in g1.nodes()] | |||||
nodes2 = [n for n in g2.nodes()] | |||||
nb1 = nx.number_of_nodes(g1) | |||||
nb2 = nx.number_of_nodes(g2) | |||||
pi_forward = [nodes2[pi] if pi < nb2 else np.inf for pi in pi_forward] | |||||
pi_backward = [nodes1[pi] if pi < nb1 else np.inf for pi in pi_backward] | |||||
# print(pi_forward) | |||||
return dis, pi_forward, pi_backward | |||||
def compute_geds_cml(graphs, options={}, sort=True, parallel=False, verbose=True): | |||||
# initialize ged env. | |||||
ged_env = GEDEnv() | |||||
ged_env.set_edit_cost(options['edit_cost'], edit_cost_constants=options['edit_cost_constants']) | |||||
for g in graphs: | |||||
ged_env.add_nx_graph(g, '') | |||||
listID = ged_env.get_all_graph_ids() | |||||
node_labels = ged_env.get_all_node_labels() | |||||
edge_labels = ged_env.get_all_edge_labels() | |||||
node_label_costs = label_costs_to_matrix(options['node_label_costs'], len(node_labels)) if 'node_label_costs' in options else None | |||||
edge_label_costs = label_costs_to_matrix(options['edge_label_costs'], len(edge_labels)) if 'edge_label_costs' in options else None | |||||
ged_env.set_label_costs(node_label_costs, edge_label_costs) | |||||
ged_env.init(init_type=options['init_option']) | |||||
if parallel: | |||||
options['threads'] = 1 | |||||
ged_env.set_method(options['method'], options) | |||||
ged_env.init_method() | |||||
# compute ged. | |||||
# options used to compute numbers of edit operations. | |||||
if node_label_costs is None and edge_label_costs is None: | |||||
neo_options = {'edit_cost': options['edit_cost'], | |||||
'is_cml': False, | |||||
'node_labels': options['node_labels'], 'edge_labels': options['edge_labels'], | |||||
'node_attrs': options['node_attrs'], 'edge_attrs': options['edge_attrs']} | |||||
else: | |||||
neo_options = {'edit_cost': options['edit_cost'], | |||||
'is_cml': True, | |||||
'node_labels': node_labels, | |||||
'edge_labels': edge_labels} | |||||
ged_mat = np.zeros((len(graphs), len(graphs))) | |||||
if parallel: | |||||
len_itr = int(len(graphs) * (len(graphs) - 1) / 2) | |||||
ged_vec = [0 for i in range(len_itr)] | |||||
n_edit_operations = [0 for i in range(len_itr)] | |||||
itr = combinations(range(0, len(graphs)), 2) | |||||
n_jobs = multiprocessing.cpu_count() | |||||
if len_itr < 100 * n_jobs: | |||||
chunksize = int(len_itr / n_jobs) + 1 | |||||
else: | |||||
chunksize = 100 | |||||
def init_worker(graphs_toshare, ged_env_toshare, listID_toshare): | |||||
global G_graphs, G_ged_env, G_listID | |||||
G_graphs = graphs_toshare | |||||
G_ged_env = ged_env_toshare | |||||
G_listID = listID_toshare | |||||
do_partial = partial(_wrapper_compute_ged_parallel, neo_options, sort) | |||||
pool = Pool(processes=n_jobs, initializer=init_worker, initargs=(graphs, ged_env, listID)) | |||||
if verbose: | |||||
iterator = tqdm(pool.imap_unordered(do_partial, itr, chunksize), | |||||
desc='computing GEDs', file=sys.stdout) | |||||
else: | |||||
iterator = pool.imap_unordered(do_partial, itr, chunksize) | |||||
# iterator = pool.imap_unordered(do_partial, itr, chunksize) | |||||
for i, j, dis, n_eo_tmp in iterator: | |||||
idx_itr = int(len(graphs) * i + j - (i + 1) * (i + 2) / 2) | |||||
ged_vec[idx_itr] = dis | |||||
ged_mat[i][j] = dis | |||||
ged_mat[j][i] = dis | |||||
n_edit_operations[idx_itr] = n_eo_tmp | |||||
# print('\n-------------------------------------------') | |||||
# print(i, j, idx_itr, dis) | |||||
pool.close() | |||||
pool.join() | |||||
else: | |||||
ged_vec = [] | |||||
n_edit_operations = [] | |||||
if verbose: | |||||
iterator = tqdm(range(len(graphs)), desc='computing GEDs', file=sys.stdout) | |||||
else: | |||||
iterator = range(len(graphs)) | |||||
for i in iterator: | |||||
# for i in range(len(graphs)): | |||||
for j in range(i + 1, len(graphs)): | |||||
if nx.number_of_nodes(graphs[i]) <= nx.number_of_nodes(graphs[j]) or not sort: | |||||
dis, pi_forward, pi_backward = _compute_ged(ged_env, listID[i], listID[j], graphs[i], graphs[j]) | |||||
else: | |||||
dis, pi_backward, pi_forward = _compute_ged(ged_env, listID[j], listID[i], graphs[j], graphs[i]) | |||||
ged_vec.append(dis) | |||||
ged_mat[i][j] = dis | |||||
ged_mat[j][i] = dis | |||||
n_eo_tmp = get_nb_edit_operations(graphs[i], graphs[j], pi_forward, pi_backward, **neo_options) | |||||
n_edit_operations.append(n_eo_tmp) | |||||
return ged_vec, ged_mat, n_edit_operations | |||||
def compute_geds(graphs, options={}, sort=True, parallel=False, verbose=True): | |||||
from gklearn.gedlib import librariesImport, gedlibpy | |||||
# initialize ged env. | |||||
ged_env = gedlibpy.GEDEnv() | |||||
ged_env.set_edit_cost(options['edit_cost'], edit_cost_constant=options['edit_cost_constants']) | |||||
for g in graphs: | |||||
ged_env.add_nx_graph(g, '') | |||||
listID = ged_env.get_all_graph_ids() | |||||
ged_env.init() | |||||
if parallel: | |||||
options['threads'] = 1 | |||||
ged_env.set_method(options['method'], ged_options_to_string(options)) | |||||
ged_env.init_method() | |||||
# compute ged. | |||||
neo_options = {'edit_cost': options['edit_cost'], | |||||
'node_labels': options['node_labels'], 'edge_labels': options['edge_labels'], | |||||
'node_attrs': options['node_attrs'], 'edge_attrs': options['edge_attrs']} | |||||
ged_mat = np.zeros((len(graphs), len(graphs))) | |||||
if parallel: | |||||
len_itr = int(len(graphs) * (len(graphs) - 1) / 2) | |||||
ged_vec = [0 for i in range(len_itr)] | |||||
n_edit_operations = [0 for i in range(len_itr)] | |||||
itr = combinations(range(0, len(graphs)), 2) | |||||
n_jobs = multiprocessing.cpu_count() | |||||
if len_itr < 100 * n_jobs: | |||||
chunksize = int(len_itr / n_jobs) + 1 | |||||
else: | |||||
chunksize = 100 | |||||
def init_worker(graphs_toshare, ged_env_toshare, listID_toshare): | |||||
global G_graphs, G_ged_env, G_listID | |||||
G_graphs = graphs_toshare | |||||
G_ged_env = ged_env_toshare | |||||
G_listID = listID_toshare | |||||
do_partial = partial(_wrapper_compute_ged_parallel, neo_options, sort) | |||||
pool = Pool(processes=n_jobs, initializer=init_worker, initargs=(graphs, ged_env, listID)) | |||||
if verbose: | |||||
iterator = tqdm(pool.imap_unordered(do_partial, itr, chunksize), | |||||
desc='computing GEDs', file=sys.stdout) | |||||
else: | |||||
iterator = pool.imap_unordered(do_partial, itr, chunksize) | |||||
# iterator = pool.imap_unordered(do_partial, itr, chunksize) | |||||
for i, j, dis, n_eo_tmp in iterator: | |||||
idx_itr = int(len(graphs) * i + j - (i + 1) * (i + 2) / 2) | |||||
ged_vec[idx_itr] = dis | |||||
ged_mat[i][j] = dis | |||||
ged_mat[j][i] = dis | |||||
n_edit_operations[idx_itr] = n_eo_tmp | |||||
# print('\n-------------------------------------------') | |||||
# print(i, j, idx_itr, dis) | |||||
pool.close() | |||||
pool.join() | |||||
else: | |||||
ged_vec = [] | |||||
n_edit_operations = [] | |||||
if verbose: | |||||
iterator = tqdm(range(len(graphs)), desc='computing GEDs', file=sys.stdout) | |||||
else: | |||||
iterator = range(len(graphs)) | |||||
for i in iterator: | |||||
# for i in range(len(graphs)): | |||||
for j in range(i + 1, len(graphs)): | |||||
if nx.number_of_nodes(graphs[i]) <= nx.number_of_nodes(graphs[j]) or not sort: | |||||
dis, pi_forward, pi_backward = _compute_ged(ged_env, listID[i], listID[j], graphs[i], graphs[j]) | |||||
else: | |||||
dis, pi_backward, pi_forward = _compute_ged(ged_env, listID[j], listID[i], graphs[j], graphs[i]) | |||||
ged_vec.append(dis) | |||||
ged_mat[i][j] = dis | |||||
ged_mat[j][i] = dis | |||||
n_eo_tmp = get_nb_edit_operations(graphs[i], graphs[j], pi_forward, pi_backward, **neo_options) | |||||
n_edit_operations.append(n_eo_tmp) | |||||
return ged_vec, ged_mat, n_edit_operations | |||||
def _wrapper_compute_ged_parallel(options, sort, itr): | |||||
i = itr[0] | |||||
j = itr[1] | |||||
dis, n_eo_tmp = _compute_ged_parallel(G_ged_env, G_listID[i], G_listID[j], G_graphs[i], G_graphs[j], options, sort) | |||||
return i, j, dis, n_eo_tmp | |||||
def _compute_ged_parallel(env, gid1, gid2, g1, g2, options, sort): | |||||
if nx.number_of_nodes(g1) <= nx.number_of_nodes(g2) or not sort: | |||||
dis, pi_forward, pi_backward = _compute_ged(env, gid1, gid2, g1, g2) | |||||
else: | |||||
dis, pi_backward, pi_forward = _compute_ged(env, gid2, gid1, g2, g1) | |||||
n_eo_tmp = get_nb_edit_operations(g1, g2, pi_forward, pi_backward, **options) # [0,0,0,0,0,0] | |||||
return dis, n_eo_tmp | |||||
def _compute_ged(env, gid1, gid2, g1, g2): | |||||
env.run_method(gid1, gid2) | |||||
pi_forward = env.get_forward_map(gid1, gid2) | |||||
pi_backward = env.get_backward_map(gid1, gid2) | |||||
upper = env.get_upper_bound(gid1, gid2) | |||||
dis = upper | |||||
# convert the maps to original node IDs (removed nodes are mapped to np.inf)
nodes1 = [n for n in g1.nodes()] | |||||
nodes2 = [n for n in g2.nodes()] | |||||
nb1 = nx.number_of_nodes(g1) | |||||
nb2 = nx.number_of_nodes(g2) | |||||
pi_forward = [nodes2[pi] if pi < nb2 else np.inf for pi in pi_forward] | |||||
pi_backward = [nodes1[pi] if pi < nb1 else np.inf for pi in pi_backward] | |||||
return dis, pi_forward, pi_backward | |||||
def label_costs_to_matrix(costs, nb_labels): | |||||
"""Reform a label cost vector to a matrix. | |||||
Parameters | |||||
---------- | |||||
costs : numpy.array | |||||
The vector containing costs between labels, in the order of node insertion costs, node deletion costs, node substitution costs, edge insertion costs, edge deletion costs, edge substitution costs.
nb_labels : integer | |||||
Number of labels. | |||||
Returns | |||||
------- | |||||
cost_matrix : numpy.array
The reformed label cost matrix of size (nb_labels, nb_labels). Each row/column of cost_matrix corresponds to a label, and the first label is the dummy label. This is the same setting as in GEDData. | |||||
""" | |||||
# Initialize label cost matrix. | |||||
cost_matrix = np.zeros((nb_labels + 1, nb_labels + 1)) | |||||
i = 0 | |||||
# Costs of insertions. | |||||
for col in range(1, nb_labels + 1): | |||||
cost_matrix[0, col] = costs[i] | |||||
i += 1 | |||||
# Costs of deletions. | |||||
for row in range(1, nb_labels + 1): | |||||
cost_matrix[row, 0] = costs[i] | |||||
i += 1 | |||||
# Costs of substitutions. | |||||
for row in range(1, nb_labels + 1): | |||||
for col in range(row + 1, nb_labels + 1): | |||||
cost_matrix[row, col] = costs[i] | |||||
cost_matrix[col, row] = costs[i] | |||||
i += 1 | |||||
return cost_matrix | |||||
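# A worked example of the layout above (illustrative): with nb_labels = 2 the
# cost vector is ordered [ins_1, ins_2, del_1, del_2, sub_12], so
# label_costs_to_matrix(np.array([1., 2., 3., 4., 5.]), 2) returns
# [[0. 1. 2.]
#  [3. 0. 5.]
#  [4. 5. 0.]]
# with row/column 0 reserved for the dummy label.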
def get_nb_edit_operations(g1, g2, forward_map, backward_map, edit_cost=None, is_cml=False, **kwargs): | |||||
if is_cml: | |||||
if edit_cost == 'CONSTANT': | |||||
node_labels = kwargs.get('node_labels', []) | |||||
edge_labels = kwargs.get('edge_labels', []) | |||||
return get_nb_edit_operations_symbolic_cml(g1, g2, forward_map, backward_map, | |||||
node_labels=node_labels, edge_labels=edge_labels) | |||||
else: | |||||
raise Exception('Edit cost "' + str(edit_cost) + '" is not supported.')
else: | |||||
if edit_cost == 'LETTER' or edit_cost == 'LETTER2': | |||||
return get_nb_edit_operations_letter(g1, g2, forward_map, backward_map) | |||||
elif edit_cost == 'NON_SYMBOLIC': | |||||
node_attrs = kwargs.get('node_attrs', []) | |||||
edge_attrs = kwargs.get('edge_attrs', []) | |||||
return get_nb_edit_operations_nonsymbolic(g1, g2, forward_map, backward_map, | |||||
node_attrs=node_attrs, edge_attrs=edge_attrs) | |||||
elif edit_cost == 'CONSTANT': | |||||
node_labels = kwargs.get('node_labels', []) | |||||
edge_labels = kwargs.get('edge_labels', []) | |||||
return get_nb_edit_operations_symbolic(g1, g2, forward_map, backward_map, | |||||
node_labels=node_labels, edge_labels=edge_labels) | |||||
else: | |||||
return get_nb_edit_operations_symbolic(g1, g2, forward_map, backward_map) | |||||
def get_nb_edit_operations_symbolic_cml(g1, g2, forward_map, backward_map, | |||||
node_labels=[], edge_labels=[]): | |||||
"""Compute times that edit operations are used in an edit path for symbolic-labeled graphs, where the costs are different for each pair of nodes. | |||||
Returns | |||||
------- | |||||
list | |||||
A vector of the numbers of times that costs between labels are used in an edit path, in the order of node insertion costs, node deletion costs, node substitution costs, edge insertion costs, edge deletion costs, edge substitution costs. The dummy label is the first label, and the self label costs are not included.
""" | |||||
# Initialize. | |||||
nb_ops_node = np.zeros((1 + len(node_labels), 1 + len(node_labels))) | |||||
nb_ops_edge = np.zeros((1 + len(edge_labels), 1 + len(edge_labels))) | |||||
# For nodes. | |||||
nodes1 = [n for n in g1.nodes()] | |||||
for i, map_i in enumerate(forward_map): | |||||
label1 = tuple(g1.nodes[nodes1[i]].items()) # @todo: order and faster | |||||
idx_label1 = node_labels.index(label1) # @todo: faster | |||||
if map_i == np.inf: # deletions. | |||||
nb_ops_node[idx_label1 + 1, 0] += 1 | |||||
else: # substitutions. | |||||
label2 = tuple(g2.nodes[map_i].items()) | |||||
if label1 != label2: | |||||
idx_label2 = node_labels.index(label2) # @todo: faster | |||||
nb_ops_node[idx_label1 + 1, idx_label2 + 1] += 1 | |||||
# insertions. | |||||
nodes2 = [n for n in g2.nodes()] | |||||
for i, map_i in enumerate(backward_map): | |||||
if map_i == np.inf: | |||||
label = tuple(g2.nodes[nodes2[i]].items()) | |||||
idx_label = node_labels.index(label) # @todo: faster | |||||
nb_ops_node[0, idx_label + 1] += 1 | |||||
# For edges. | |||||
edges1 = [e for e in g1.edges()] | |||||
edges2_marked = [] | |||||
for nf1, nt1 in edges1: | |||||
label1 = tuple(g1.edges[(nf1, nt1)].items()) | |||||
idx_label1 = edge_labels.index(label1) # @todo: faster | |||||
idxf1 = nodes1.index(nf1) # @todo: faster | |||||
idxt1 = nodes1.index(nt1) # @todo: faster | |||||
# At least one of the nodes is removed, thus the edge is removed. | |||||
if forward_map[idxf1] == np.inf or forward_map[idxt1] == np.inf: | |||||
nb_ops_edge[idx_label1 + 1, 0] += 1 | |||||
# corresponding edge is in g2. | |||||
else: | |||||
nf2, nt2 = forward_map[idxf1], forward_map[idxt1] | |||||
if (nf2, nt2) in g2.edges(): | |||||
edges2_marked.append((nf2, nt2)) | |||||
# If edge labels are different. | |||||
label2 = tuple(g2.edges[(nf2, nt2)].items()) | |||||
if label1 != label2: | |||||
idx_label2 = edge_labels.index(label2) # @todo: faster | |||||
nb_ops_edge[idx_label1 + 1, idx_label2 + 1] += 1 | |||||
# Switch nf2 and nt2, for directed graphs. | |||||
elif (nt2, nf2) in g2.edges(): | |||||
edges2_marked.append((nt2, nf2)) | |||||
# If edge labels are different. | |||||
label2 = tuple(g2.edges[(nt2, nf2)].items()) | |||||
if label1 != label2: | |||||
idx_label2 = edge_labels.index(label2) # @todo: faster | |||||
nb_ops_edge[idx_label1 + 1, idx_label2 + 1] += 1 | |||||
# Corresponding nodes are in g2, however the edge is removed. | |||||
else: | |||||
nb_ops_edge[idx_label1 + 1, 0] += 1 | |||||
# insertions. | |||||
for nt, nf in g2.edges(): | |||||
if (nt, nf) not in edges2_marked and (nf, nt) not in edges2_marked: # @todo: for directed. | |||||
label = tuple(g2.edges[(nt, nf)].items()) | |||||
idx_label = edge_labels.index(label) # @todo: faster | |||||
nb_ops_edge[0, idx_label + 1] += 1 | |||||
# Reform the numbers of edit operations into a vector.
nb_eo_vector = [] | |||||
# node insertion. | |||||
for i in range(1, len(nb_ops_node)): | |||||
nb_eo_vector.append(nb_ops_node[0, i]) | |||||
# node deletion. | |||||
for i in range(1, len(nb_ops_node)): | |||||
nb_eo_vector.append(nb_ops_node[i, 0]) | |||||
# node substitution. | |||||
for i in range(1, len(nb_ops_node)): | |||||
for j in range(i + 1, len(nb_ops_node)): | |||||
nb_eo_vector.append(nb_ops_node[i, j]) | |||||
# edge insertion. | |||||
for i in range(1, len(nb_ops_edge)): | |||||
nb_eo_vector.append(nb_ops_edge[0, i]) | |||||
# edge deletion. | |||||
for i in range(1, len(nb_ops_edge)): | |||||
nb_eo_vector.append(nb_ops_edge[i, 0]) | |||||
# edge substitution. | |||||
for i in range(1, len(nb_ops_edge)): | |||||
for j in range(i + 1, len(nb_ops_edge)): | |||||
nb_eo_vector.append(nb_ops_edge[i, j]) | |||||
return nb_eo_vector | |||||
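# Sketch of the vector length (illustrative): with n node labels and m edge
# labels, the vector above holds n insertions + n deletions + n * (n - 1) / 2
# substitutions for nodes, then the same pattern for edges; e.g. n = m = 2
# gives 2 + 2 + 1 = 5 node entries followed by 5 edge entries.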
def get_nb_edit_operations_symbolic(g1, g2, forward_map, backward_map, | |||||
node_labels=[], edge_labels=[]): | |||||
"""Compute the number of each edit operations for symbolic-labeled graphs. | |||||
""" | |||||
n_vi = 0 | |||||
n_vr = 0 | |||||
n_vs = 0 | |||||
n_ei = 0 | |||||
n_er = 0 | |||||
n_es = 0 | |||||
nodes1 = [n for n in g1.nodes()] | |||||
for i, map_i in enumerate(forward_map): | |||||
if map_i == np.inf: | |||||
n_vr += 1 | |||||
else: | |||||
for nl in node_labels: | |||||
label1 = g1.nodes[nodes1[i]][nl] | |||||
label2 = g2.nodes[map_i][nl] | |||||
if label1 != label2: | |||||
n_vs += 1 | |||||
break | |||||
for map_i in backward_map: | |||||
if map_i == np.inf: | |||||
n_vi += 1 | |||||
# idx_nodes1 = range(0, len(node1)) | |||||
edges1 = [e for e in g1.edges()] | |||||
nb_edges2_cnted = 0 | |||||
for n1, n2 in edges1: | |||||
idx1 = nodes1.index(n1) | |||||
idx2 = nodes1.index(n2) | |||||
# one of the nodes is removed, thus the edge is removed. | |||||
if forward_map[idx1] == np.inf or forward_map[idx2] == np.inf: | |||||
n_er += 1 | |||||
# corresponding edge is in g2. | |||||
elif (forward_map[idx1], forward_map[idx2]) in g2.edges(): | |||||
nb_edges2_cnted += 1 | |||||
# edge labels are different. | |||||
for el in edge_labels: | |||||
label1 = g2.edges[((forward_map[idx1], forward_map[idx2]))][el] | |||||
label2 = g1.edges[(n1, n2)][el] | |||||
if label1 != label2: | |||||
n_es += 1 | |||||
break | |||||
elif (forward_map[idx2], forward_map[idx1]) in g2.edges(): | |||||
nb_edges2_cnted += 1 | |||||
# edge labels are different. | |||||
for el in edge_labels: | |||||
label1 = g2.edges[((forward_map[idx2], forward_map[idx1]))][el] | |||||
label2 = g1.edges[(n1, n2)][el] | |||||
if label1 != label2: | |||||
n_es += 1 | |||||
break | |||||
# corresponding nodes are in g2, however the edge is removed. | |||||
else: | |||||
n_er += 1 | |||||
n_ei = nx.number_of_edges(g2) - nb_edges2_cnted | |||||
return n_vi, n_vr, n_vs, n_ei, n_er, n_es | |||||
def get_nb_edit_operations_letter(g1, g2, forward_map, backward_map): | |||||
"""Compute the number of each edit operations. | |||||
""" | |||||
n_vi = 0 | |||||
n_vr = 0 | |||||
n_vs = 0 | |||||
sod_vs = 0 | |||||
n_ei = 0 | |||||
n_er = 0 | |||||
nodes1 = [n for n in g1.nodes()] | |||||
for i, map_i in enumerate(forward_map): | |||||
if map_i == np.inf: | |||||
n_vr += 1 | |||||
else: | |||||
n_vs += 1 | |||||
diff_x = float(g1.nodes[nodes1[i]]['x']) - float(g2.nodes[map_i]['x']) | |||||
diff_y = float(g1.nodes[nodes1[i]]['y']) - float(g2.nodes[map_i]['y']) | |||||
sod_vs += np.sqrt(np.square(diff_x) + np.square(diff_y)) | |||||
for map_i in backward_map: | |||||
if map_i == np.inf: | |||||
n_vi += 1 | |||||
# idx_nodes1 = range(0, len(node1)) | |||||
edges1 = [e for e in g1.edges()] | |||||
nb_edges2_cnted = 0 | |||||
for n1, n2 in edges1: | |||||
idx1 = nodes1.index(n1) | |||||
idx2 = nodes1.index(n2) | |||||
# one of the nodes is removed, thus the edge is removed. | |||||
if forward_map[idx1] == np.inf or forward_map[idx2] == np.inf: | |||||
n_er += 1 | |||||
# corresponding edge is in g2. Edge label is not considered. | |||||
elif (forward_map[idx1], forward_map[idx2]) in g2.edges() or \ | |||||
(forward_map[idx2], forward_map[idx1]) in g2.edges(): | |||||
nb_edges2_cnted += 1 | |||||
# corresponding nodes are in g2, however the edge is removed. | |||||
else: | |||||
n_er += 1 | |||||
n_ei = nx.number_of_edges(g2) - nb_edges2_cnted | |||||
return n_vi, n_vr, n_vs, sod_vs, n_ei, n_er | |||||
def get_nb_edit_operations_nonsymbolic(g1, g2, forward_map, backward_map, | |||||
node_attrs=[], edge_attrs=[]): | |||||
"""Compute the number of each edit operations. | |||||
""" | |||||
n_vi = 0 | |||||
n_vr = 0 | |||||
n_vs = 0 | |||||
sod_vs = 0 | |||||
n_ei = 0 | |||||
n_er = 0 | |||||
n_es = 0 | |||||
sod_es = 0 | |||||
nodes1 = [n for n in g1.nodes()] | |||||
for i, map_i in enumerate(forward_map): | |||||
if map_i == np.inf: | |||||
n_vr += 1 | |||||
else: | |||||
n_vs += 1 | |||||
sum_squares = 0 | |||||
for a_name in node_attrs: | |||||
diff = float(g1.nodes[nodes1[i]][a_name]) - float(g2.nodes[map_i][a_name]) | |||||
sum_squares += np.square(diff) | |||||
sod_vs += np.sqrt(sum_squares) | |||||
for map_i in backward_map: | |||||
if map_i == np.inf: | |||||
n_vi += 1 | |||||
# idx_nodes1 = range(0, len(node1)) | |||||
edges1 = [e for e in g1.edges()] | |||||
for n1, n2 in edges1: | |||||
idx1 = nodes1.index(n1) | |||||
idx2 = nodes1.index(n2) | |||||
n1_g2 = forward_map[idx1] | |||||
n2_g2 = forward_map[idx2] | |||||
# one of the nodes is removed, thus the edge is removed. | |||||
if n1_g2 == np.inf or n2_g2 == np.inf: | |||||
n_er += 1 | |||||
# corresponding edge is in g2. | |||||
elif (n1_g2, n2_g2) in g2.edges(): | |||||
n_es += 1 | |||||
sum_squares = 0 | |||||
for a_name in edge_attrs: | |||||
diff = float(g1.edges[n1, n2][a_name]) - float(g2.edges[n1_g2, n2_g2][a_name]) | |||||
sum_squares += np.square(diff) | |||||
sod_es += np.sqrt(sum_squares) | |||||
elif (n2_g2, n1_g2) in g2.edges(): | |||||
n_es += 1 | |||||
sum_squares = 0 | |||||
for a_name in edge_attrs: | |||||
diff = float(g1.edges[n2, n1][a_name]) - float(g2.edges[n2_g2, n1_g2][a_name]) | |||||
sum_squares += np.square(diff) | |||||
sod_es += np.sqrt(sum_squares) | |||||
		# both endpoints are mapped into g2, but the edge itself is removed.
else: | |||||
n_er += 1 | |||||
n_ei = nx.number_of_edges(g2) - n_es | |||||
return n_vi, n_vr, sod_vs, n_ei, n_er, sod_es | |||||
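# Usage sketch (illustrative): same map conventions as in the letter variant
# above, but the attributes to compare are passed explicitly, e.g.
#
#   get_nb_edit_operations_nonsymbolic(g1, g2, forward_map, backward_map,
#                                      node_attrs=['x', 'y'],
#                                      edge_attrs=['weight'])
#
# Note that n_vs and n_es are counted internally, but the returned tuple
# carries the substitution cost sums sod_vs and sod_es in their place.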
def ged_options_to_string(options): | |||||
opt_str = ' ' | |||||
for key, val in options.items(): | |||||
if key == 'initialization_method': | |||||
opt_str += '--initialization-method ' + str(val) + ' ' | |||||
elif key == 'initialization_options': | |||||
opt_str += '--initialization-options ' + str(val) + ' ' | |||||
elif key == 'lower_bound_method': | |||||
opt_str += '--lower-bound-method ' + str(val) + ' ' | |||||
elif key == 'random_substitution_ratio': | |||||
opt_str += '--random-substitution-ratio ' + str(val) + ' ' | |||||
elif key == 'initial_solutions': | |||||
opt_str += '--initial-solutions ' + str(val) + ' ' | |||||
elif key == 'ratio_runs_from_initial_solutions': | |||||
opt_str += '--ratio-runs-from-initial-solutions ' + str(val) + ' ' | |||||
elif key == 'threads': | |||||
opt_str += '--threads ' + str(val) + ' ' | |||||
elif key == 'num_randpost_loops': | |||||
opt_str += '--num-randpost-loops ' + str(val) + ' ' | |||||
elif key == 'max_randpost_retrials': | |||||
			opt_str += '--max-randpost-retrials ' + str(val) + ' '
elif key == 'randpost_penalty': | |||||
opt_str += '--randpost-penalty ' + str(val) + ' ' | |||||
elif key == 'randpost_decay': | |||||
opt_str += '--randpost-decay ' + str(val) + ' ' | |||||
elif key == 'log': | |||||
opt_str += '--log ' + str(val) + ' ' | |||||
elif key == 'randomness': | |||||
opt_str += '--randomness ' + str(val) + ' ' | |||||
return opt_str |
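# Usage sketch (illustrative values; keys without a branch above are ignored):
#
#   ged_options_to_string({'initialization_method': 'RANDOM',
#                          'threads': 4,
#                          'num_randpost_loops': 0})
#
# returns ' --initialization-method RANDOM --threads 4 --num-randpost-loops 0 '.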
@@ -0,0 +1,97 @@ | |||||
GEDLIBPY | |||||
==================================== | |||||
Please read https://dbblumenthal.github.io/gedlib/ before using the Python code.
You can also find this module's documentation in the documentation/build/html folder.
Make sure you have numpy installed (and Cython if you have to recompile the library). You can use pip for this.
Running the script | |||||
------------------- | |||||
After downloading the entire folder, you can run test.py to check that the library works.
For your code, you have to make two imports:: | |||||
import librariesImport | |||||
import gedlibpy | |||||
With these imports you can call every function in the library. Don't move any folders or files inside the library; make sure the directory structure remains the same.
This library is compiled for Python 3 only. If you want to use it with Python 2, you have to recompile it with setup.py, using this command in your favorite shell::
python setup.py build_ext --inplace | |||||
After this step, the same import lines as for Python 3 will work. Check the documentation inside the documentation/build/html folder before using a function. You can also copy the test examples for basic use.
A problem with the library?
------------------------------- | |||||
If the library isn't found, recompile the Python library, since your Linux environment may differ from the one it was built on. Delete gedlibpy.so, gedlibpy.cpp and the build folder, then use this command in a Linux shell::
python3 setup.py build_ext --inplace | |||||
You can also do this with Python 2, but make sure the version used for compilation matches the one running your code.
If it still doesn't work, the version of GedLib or of another library may be the problem. In that case, re-install GedLib on your computer; you can download it here: https://dbblumenthal.github.io/gedlib/
Then install GedLib with its Python installer.
Just call:: | |||||
python3 install.py | |||||
Create the links as indicated in the documentation. Use the same directory structure as this library, only substituting the .so files and folders from your installation. After that, you can recompile the Python library with the setup command.
If you use Mac OS, you have to follow all the steps above and install the external libraries with this command::
install_name_tool -change <mylib> <path>/<to>/<mylib> <myexec> | |||||
For example, you would write these lines::
install_name_tool -change libdoublefann.2.dylib lib/fann/libdoublefann.2.dylib gedlibpy.so | |||||
install_name_tool -change libsvm.so lib/libsvm.3.22/libsvm.so gedlibpy.so | |||||
install_name_tool -change libnomad.so lib/nomad/libnomad.so gedlibpy.so | |||||
install_name_tool -change libsgtelib.so lib/nomad/libsgtelib.so gedlibpy.so | |||||
The name of the gedlibpy library file can differ if you use Python 3 (the compiled .so may carry a version tag in its name).
If your problem persists, you can contact me at natacha.lambert@unicaen.fr.
How to use this library | |||||
------------------------- | |||||
This library allows you to compute the edit distance between two graphs. Follow these steps to use it:
- Add your graphs (GXL files, NX structures or your own structure; make sure the internal types are consistent)
- Choose your cost function
- Initialize your environment (after that, the cost function and your graphs can't be modified)
- Choose your computation method
- Run the computation with the IDs of the two graphs. You get an ID when you add a graph, or via dedicated functions
- Retrieve the results with various functions (node map, edit distance, etc.)
Here is an example of code with GXL graphs:: | |||||
gedlibpy.load_GXL_graphs('include/gedlib-master/data/datasets/Mutagenicity/data/', 'collections/MUTA_10.xml') | |||||
listID = gedlibpy.get_all_graph_ids() | |||||
gedlibpy.set_edit_cost("CHEM_1") | |||||
gedlibpy.init() | |||||
gedlibpy.set_method("IPFP", "") | |||||
gedlibpy.init_method() | |||||
g = listID[0] | |||||
h = listID[1] | |||||
gedlibpy.run_method(g,h) | |||||
print("Node Map : ", gedlibpy.get_node_map(g,h)) | |||||
print ("Upper Bound = " + str(gedlibpy.get_upper_bound(g,h)) + ", Lower Bound = " + str(gedlibpy.get_lower_bound(g,h)) + ", Runtime = " + str(gedlibpy.get_runtime(g,h))) | |||||
Please read the documentation for more examples and functions. | |||||
Advice if you don't code in a shell
---------------------------------------
The Python library doesn't report every C++ error. If your program restarts because of an error in your code, please run it from a Linux shell to see the C++ error messages.
@@ -0,0 +1,10 @@ | |||||
# -*- coding: utf-8 -*-
""" | |||||
gedlib | |||||
""" | |||||
# info | |||||
__version__ = "0.1" | |||||
__author__ = "Linlin Jia" | |||||
__date__ = "March 2020" |
@@ -0,0 +1,20 @@ | |||||
# Minimal makefile for Sphinx documentation | |||||
# | |||||
# You can set these variables from the command line. | |||||
SPHINXOPTS = | |||||
SPHINXBUILD = sphinx-build | |||||
SPHINXPROJ = Cython_GedLib | |||||
SOURCEDIR = source | |||||
BUILDDIR = build | |||||
# Put it first so that "make" without argument is like "make help". | |||||
help: | |||||
@$(SPHINXBUILD) -M help "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O) | |||||
.PHONY: help Makefile | |||||
# Catch-all target: route all unknown targets to Sphinx using the new | |||||
# "make mode" option. $(O) is meant as a shortcut for $(SPHINXOPTS). | |||||
%: Makefile | |||||
@$(SPHINXBUILD) -M $@ "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O) |
@@ -0,0 +1,36 @@ | |||||
@ECHO OFF | |||||
pushd %~dp0 | |||||
REM Command file for Sphinx documentation | |||||
if "%SPHINXBUILD%" == "" ( | |||||
set SPHINXBUILD=sphinx-build | |||||
) | |||||
set SOURCEDIR=source | |||||
set BUILDDIR=build | |||||
set SPHINXPROJ=Cython_GedLib | |||||
if "%1" == "" goto help | |||||
%SPHINXBUILD% >NUL 2>NUL | |||||
if errorlevel 9009 ( | |||||
echo. | |||||
echo.The 'sphinx-build' command was not found. Make sure you have Sphinx | |||||
echo.installed, then set the SPHINXBUILD environment variable to point | |||||
echo.to the full path of the 'sphinx-build' executable. Alternatively you | |||||
echo.may add the Sphinx directory to PATH. | |||||
echo. | |||||
echo.If you don't have Sphinx installed, grab it from | |||||
echo.http://sphinx-doc.org/ | |||||
exit /b 1 | |||||
) | |||||
%SPHINXBUILD% -M %1 %SOURCEDIR% %BUILDDIR% %SPHINXOPTS% | |||||
goto end | |||||
:help | |||||
%SPHINXBUILD% -M help %SOURCEDIR% %BUILDDIR% %SPHINXOPTS% | |||||
:end | |||||
popd |
@@ -0,0 +1,199 @@ | |||||
# -*- coding: utf-8 -*- | |||||
# | |||||
# Python_GedLib documentation build configuration file, created by | |||||
# sphinx-quickstart on Thu Jun 13 16:10:06 2019. | |||||
# | |||||
# This file is execfile()d with the current directory set to its | |||||
# containing dir. | |||||
# | |||||
# Note that not all possible configuration values are present in this | |||||
# autogenerated file. | |||||
# | |||||
# All configuration values have a default; values that are commented out | |||||
# serve to show the default. | |||||
# If extensions (or modules to document with autodoc) are in another directory, | |||||
# add these directories to sys.path here. If the directory is relative to the | |||||
# documentation root, use os.path.abspath to make it absolute, like shown here. | |||||
# | |||||
import os | |||||
import sys | |||||
#sys.path.insert(0, os.path.abspath('.')) | |||||
sys.path.insert(0, os.path.abspath('../../')) | |||||
sys.path.append("../../lib/fann") | |||||
#,"lib/gedlib", "lib/libsvm.3.22","lib/nomad" | |||||
# -- General configuration ------------------------------------------------ | |||||
# If your documentation needs a minimal Sphinx version, state it here. | |||||
# | |||||
# needs_sphinx = '1.0' | |||||
# Add any Sphinx extension module names here, as strings. They can be | |||||
# extensions coming with Sphinx (named 'sphinx.ext.*') or your custom | |||||
# ones. | |||||
extensions = ['sphinx.ext.autodoc', | |||||
'sphinx.ext.intersphinx', | |||||
'sphinx.ext.coverage', | |||||
'sphinx.ext.mathjax', | |||||
'sphinx.ext.githubpages'] | |||||
# Add any paths that contain templates here, relative to this directory. | |||||
templates_path = ['_templates'] | |||||
# The suffix(es) of source filenames. | |||||
# You can specify multiple suffix as a list of string: | |||||
# | |||||
# source_suffix = ['.rst', '.md'] | |||||
source_suffix = '.rst' | |||||
# The master toctree document. | |||||
master_doc = 'index' | |||||
# General information about the project. | |||||
project = u'GeDLiBPy' | |||||
copyright = u'2019, Natacha Lambert' | |||||
author = u'Natacha Lambert' | |||||
# The version info for the project you're documenting, acts as replacement for | |||||
# |version| and |release|, also used in various other places throughout the | |||||
# built documents. | |||||
# | |||||
# The short X.Y version. | |||||
version = u'1.0' | |||||
# The full version, including alpha/beta/rc tags. | |||||
release = u'1.0' | |||||
# The language for content autogenerated by Sphinx. Refer to documentation | |||||
# for a list of supported languages. | |||||
# | |||||
# This is also used if you do content translation via gettext catalogs. | |||||
# Usually you set "language" from the command line for these cases. | |||||
language = None | |||||
# List of patterns, relative to source directory, that match files and | |||||
# directories to ignore when looking for source files. | |||||
# This patterns also effect to html_static_path and html_extra_path | |||||
exclude_patterns = [] | |||||
# The name of the Pygments (syntax highlighting) style to use. | |||||
pygments_style = 'sphinx' | |||||
# If true, `todo` and `todoList` produce output, else they produce nothing. | |||||
todo_include_todos = False | |||||
# -- Options for HTML output ---------------------------------------------- | |||||
# The theme to use for HTML and HTML Help pages. See the documentation for | |||||
# a list of builtin themes. | |||||
# | |||||
html_theme = 'alabaster' | |||||
# Theme options are theme-specific and customize the look and feel of a theme | |||||
# further. For a list of options available for each theme, see the | |||||
# documentation. | |||||
# | |||||
# html_theme_options = {} | |||||
# Add any paths that contain custom static files (such as style sheets) here, | |||||
# relative to this directory. They are copied after the builtin static files, | |||||
# so a file named "default.css" will overwrite the builtin "default.css". | |||||
html_static_path = ['_static'] | |||||
# Custom sidebar templates, must be a dictionary that maps document names | |||||
# to template names. | |||||
# | |||||
# This is required for the alabaster theme | |||||
# refs: http://alabaster.readthedocs.io/en/latest/installation.html#sidebars | |||||
html_sidebars = { | |||||
'**': [ | |||||
'relations.html', # needs 'show_related': True theme option to display | |||||
'searchbox.html', | |||||
] | |||||
} | |||||
# -- Options for HTMLHelp output ------------------------------------------ | |||||
# Output file base name for HTML help builder. | |||||
htmlhelp_basename = 'gedlibpydoc' | |||||
# -- Options for LaTeX output --------------------------------------------- | |||||
latex_elements = { | |||||
# The paper size ('letterpaper' or 'a4paper'). | |||||
# | |||||
# 'papersize': 'letterpaper', | |||||
# The font size ('10pt', '11pt' or '12pt'). | |||||
# | |||||
# 'pointsize': '10pt', | |||||
# Additional stuff for the LaTeX preamble. | |||||
# | |||||
# 'preamble': '', | |||||
# Latex figure (float) alignment | |||||
# | |||||
# 'figure_align': 'htbp', | |||||
} | |||||
# Grouping the document tree into LaTeX files. List of tuples | |||||
# (source start file, target name, title, | |||||
# author, documentclass [howto, manual, or own class]). | |||||
latex_documents = [ | |||||
(master_doc, 'gedlibpy.tex', u'gedlibpy Documentation', | |||||
u'Natacha Lambert', 'manual'), | |||||
] | |||||
# -- Options for manual page output --------------------------------------- | |||||
# One entry per manual page. List of tuples | |||||
# (source start file, name, description, authors, manual section). | |||||
man_pages = [ | |||||
(master_doc, 'gedlibpy', u'gedlibpy Documentation', | |||||
[author], 1) | |||||
] | |||||
# -- Options for Texinfo output ------------------------------------------- | |||||
# Grouping the document tree into Texinfo files. List of tuples | |||||
# (source start file, target name, title, author, | |||||
# dir menu entry, description, category) | |||||
texinfo_documents = [ | |||||
(master_doc, 'gedlibpy', u'gedlibpy Documentation', | |||||
author, 'gedlibpy', 'One line description of project.', | |||||
'Miscellaneous'), | |||||
] | |||||
# -- Options for Epub output ---------------------------------------------- | |||||
# Bibliographic Dublin Core info. | |||||
epub_title = project | |||||
epub_author = author | |||||
epub_publisher = author | |||||
epub_copyright = copyright | |||||
# The unique identifier of the text. This can be a ISBN number | |||||
# or the project homepage. | |||||
# | |||||
# epub_identifier = '' | |||||
# A unique identification for the text. | |||||
# | |||||
# epub_uid = '' | |||||
# A list of files that should not be packed into the epub file. | |||||
epub_exclude_files = ['search.html'] | |||||
# Example configuration for intersphinx: refer to the Python standard library. | |||||
intersphinx_mapping = {'https://docs.python.org/': None} |
@@ -0,0 +1,2 @@ | |||||
.. automodule:: gedlibpy | |||||
:members: |
@@ -0,0 +1,42 @@ | |||||
How to add your own editCost class | |||||
========================================= | |||||
When you choose a cost function, you can set some parameters to personalize it. But if the type of your graphs doesn't correspond to any of the available choices, you can create your own edit cost function.
For this, you have to write it in C++. | |||||
C++ side | |||||
------------- | |||||
Your class must inherit from the EditCost class, which is an abstract class. You can find it here: include/gedlib-master/src/edit_costs
You can take inspiration from the existing cost classes to understand how they work. You have to override these functions:
- virtual double node_ins_cost_fun(const UserNodeLabel & node_label) const final; | |||||
- virtual double node_del_cost_fun(const UserNodeLabel & node_label) const final; | |||||
- virtual double node_rel_cost_fun(const UserNodeLabel & node_label_1, const UserNodeLabel & node_label_2) const final; | |||||
- virtual double edge_ins_cost_fun(const UserEdgeLabel & edge_label) const final; | |||||
- virtual double edge_del_cost_fun(const UserEdgeLabel & edge_label) const final; | |||||
- virtual double edge_rel_cost_fun(const UserEdgeLabel & edge_label_1, const UserEdgeLabel & edge_label_2) const final; | |||||
You can add attributes for parameters, or extra functions, but these overrides are mandatory.
When your class is ready, go to the C++ binding in src/GedLibBind.cpp. The function is:
void setPersonalEditCost(std::vector<double> editCostConstants){env.set_edit_costs(Your EditCost Class(editCostConstants));} | |||||
You just have to instantiate your own class there. Parameters aren't mandatory and are empty by default, so if your class doesn't take any, you can skip them. After that, you have to recompile the project.
Python side | |||||
---------------- | |||||
For this, use setup.py with this command in a Linux shell::
python3 setup.py build_ext --inplace | |||||
You can also do this with Python 2.
Now you can use your edit cost function with the Python function set_personal_edit_cost(edit_cost_constant).
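For example, a minimal sketch of the Python side (the two constants are illustrative; their number and meaning depend entirely on your C++ class):

.. code-block:: python

    gedlibpy.set_personal_edit_cost([3.0, 1.5])
    gedlibpy.init()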
If you want more information on the C++ side, you can check the documentation of the original library here: https://github.com/dbblumenthal/gedlib
@@ -0,0 +1,165 @@ | |||||
Examples | |||||
============== | |||||
Before using each example, please make sure to put these lines at the beginning of your code:
.. code-block:: python | |||||
import librariesImport | |||||
import gedlibpy | |||||
Use your own path to access it, without changing the library's directory structure. After that, you are ready to use the library.
When you want to start a new computation, use this function:
.. code-block:: python | |||||
gedlibpy.restart_env() | |||||
All graphs and results will be deleted, so make sure you no longer need them.
Classic case with GXL graphs
------------------------------------
.. code-block:: python | |||||
gedlibpy.load_GXL_graphs('include/gedlib-master/data/datasets/Mutagenicity/data/', 'collections/MUTA_10.xml') | |||||
listID = gedlibpy.get_all_graph_ids() | |||||
gedlibpy.set_edit_cost("CHEM_1") | |||||
gedlibpy.init() | |||||
gedlibpy.set_method("IPFP", "") | |||||
gedlibpy.init_method() | |||||
g = listID[0] | |||||
h = listID[1] | |||||
gedlibpy.run_method(g,h) | |||||
print("Node Map : ", gedlibpy.get_node_map(g,h)) | |||||
print ("Upper Bound = " + str(gedlibpy.get_upper_bound(g,h)) + ", Lower Bound = " + str(gedlibpy.get_lower_bound(g,h)) + ", Runtime = " + str(gedlibpy.get_runtime(g,h))) | |||||
You can also use this function:
.. code-block:: python | |||||
compute_edit_distance_on_GXl_graphs(path_folder, path_XML, edit_cost, method, options="", init_option = "EAGER_WITHOUT_SHUFFLED_COPIES") | |||||
This function computes the edit distance between all pairs of graphs, including each graph with itself. You can inspect the results via helper functions and the graph IDs. Please see the function's documentation for more information.
Classic case with NX graphs
------------------------------------ | |||||
.. code-block:: python | |||||
for graph in dataset : | |||||
gedlibpy.add_nx_graph(graph, classe) | |||||
listID = gedlibpy.get_all_graph_ids() | |||||
gedlibpy.set_edit_cost("CHEM_1") | |||||
gedlibpy.init() | |||||
gedlibpy.set_method("IPFP", "") | |||||
gedlibpy.init_method() | |||||
g = listID[0] | |||||
h = listID[1] | |||||
gedlibpy.run_method(g,h) | |||||
print("Node Map : ", gedlibpy.get_node_map(g,h)) | |||||
print ("Upper Bound = " + str(gedlibpy.get_upper_bound(g,h)) + ", Lower Bound = " + str(gedlibpy.get_lower_bound(g,h)) + ", Runtime = " + str(gedlibpy.get_runtime(g,h))) | |||||
You can also use this function:
.. code-block:: python | |||||
compute_edit_distance_on_nx_graphs(dataset, classes, edit_cost, method, options, init_option = "EAGER_WITHOUT_SHUFFLED_COPIES") | |||||
This function computes the edit distance between all pairs of graphs, including each graph with itself. You can see the results in the return value, and via helper functions with the graph IDs. Please see the function's documentation for more information.
Or this function:
.. code-block:: python | |||||
compute_ged_on_two_graphs(g1,g2, edit_cost, method, options, init_option = "EAGER_WITHOUT_SHUFFLED_COPIES") | |||||
This function computes the edit distance between just two graphs. Please see the function's documentation for more information.
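For instance, a minimal sketch (assuming g1 and g2 are NetworkX graphs you have already built):

.. code-block:: python

    result = compute_ged_on_two_graphs(g1, g2, "CHEM_1", "IPFP", "")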
Add a graph from scratch | |||||
------------------------------------ | |||||
.. code-block:: python | |||||
currentID = gedlibpy.add_graph() | |||||
gedlibpy.add_node(currentID, "_1", {"chem" : "C"}) | |||||
gedlibpy.add_node(currentID, "_2", {"chem" : "O"}) | |||||
gedlibpy.add_edge(currentID,"_1", "_2", {"valence": "1"} ) | |||||
Please make sure the types are right (strings for IDs and a dictionary for labels). If you want a symmetric graph, you can use this function to ensure symmetry:
.. code-block:: python | |||||
add_symmetrical_edge(graph_id, tail, head, edge_label) | |||||
If you have an NX structure, you can directly use this function:
.. code-block:: python | |||||
add_nx_graph(g, classe, ignore_duplicates=True) | |||||
If your graphs are in some other structure, you can use this function:
.. code-block:: python | |||||
add_random_graph(name, classe, list_of_nodes, list_of_edges, ignore_duplicates=True) | |||||
Please read the documentation before use, and respect the expected types.
Median computation | |||||
------------------------------------ | |||||
An example is available in the Median_Example folder. It contains everything needed to compute a median graph. You can launch xp-letter-gbr.py to compute the median graph for every letter in the dataset, or median.py for the letter Z alone.
To summarize its use, you can follow this example:
.. code-block:: python | |||||
import pygraph #Available with the median example | |||||
from median import draw_Letter_graph, compute_median, compute_median_set | |||||
gedlibpy.load_GXL_graphs('../include/gedlib-master/data/datasets/Letter/HIGH/', '../include/gedlib-master/data/collections/Letter_Z.xml') | |||||
gedlibpy.set_edit_cost("LETTER") | |||||
gedlibpy.init() | |||||
gedlibpy.set_method("IPFP", "") | |||||
gedlibpy.init_method() | |||||
listID = gedlibpy.get_all_graph_ids() | |||||
dataset,my_y = pygraph.utils.graphfiles.loadDataset("../include/gedlib-master/data/datasets/Letter/HIGH/Letter_Z.cxl") | |||||
median, sod, sods_path,set_median = compute_median(gedlibpy,listID,dataset,verbose=True) | |||||
draw_Letter_graph(median) | |||||
Please use the functions in median.py to simplify your work; you can adapt this example to your own case. Some functions in the PythonGedLib module can also make the work easier. Ask Benoît Gauzere if you want more information.
Hungarian algorithm | |||||
------------------------------------ | |||||
LSAPE | |||||
~~~~~~ | |||||
.. code-block:: python | |||||
result = gedlibpy.hungarian_LSAPE(matrixCost) | |||||
print("Rho = ", result[0], " Varrho = ", result[1], " u = ", result[2], " v = ", result[3]) | |||||
LSAP | |||||
~~~~~~ | |||||
.. code-block:: python | |||||
result = gedlibpy.hungarian_LSAP(matrixCost) | |||||
print("Rho = ", result[0], " Varrho = ", result[1], " u = ", result[2], " v = ", result[3]) | |||||
@@ -0,0 +1,36 @@ | |||||
.. Python_GedLib documentation master file, created by | |||||
sphinx-quickstart on Thu Jun 13 16:10:06 2019. | |||||
You can adapt this file completely to your liking, but it should at least | |||||
contain the root `toctree` directive. | |||||
Welcome to GEDLIBPY's documentation! | |||||
========================================= | |||||
This module allows you to use a C++ graph edit distance library (GedLib) from Python.
Before using it, please read the first section to ensure a good start with the library. Then you can follow the examples or look up the details of each method.
.. toctree:: | |||||
:maxdepth: 2 | |||||
:caption: Contents: | |||||
readme | |||||
editcost | |||||
examples | |||||
doc | |||||
Authors | |||||
~~~~~~~~ | |||||
* David Blumenthal for the C++ module
* Natacha Lambert for the Python module
Copyright (C) 2019 by all the authors | |||||
Indices and tables | |||||
~~~~~~~~~~~~~~~~~~~~~ | |||||
* :ref:`genindex` | |||||
* :ref:`modindex` | |||||
* :ref:`search` |
@@ -0,0 +1,97 @@ | |||||
How to install this library | |||||
==================================== | |||||
Please read https://dbblumenthal.github.io/gedlib/ before using the Python code.
You can also find this module's documentation in the documentation/build/html folder.
Make sure you have numpy installed (and Cython if you have to recompile the library). You can use pip for this.
Running the script | |||||
------------------- | |||||
After downloading the entire folder, you can run test.py to check that the library works.
For your code, you have to make two imports:: | |||||
import librariesImport | |||||
import gedlibpy | |||||
With these imports you can call every function in the library. Don't move any folders or files inside the library; make sure the directory structure remains the same.
This library is compiled for Python 3 only. If you want to use it with Python 2, you have to recompile it with setup.py, using this command in your favorite shell::
python setup.py build_ext --inplace | |||||
After this step, the same import lines as for Python 3 will work. Check the documentation inside the documentation/build/html folder before using a function. You can also copy the test examples for basic use.
A problem with the library?
------------------------------- | |||||
If the library isn't found, recompile the Python library, since your Linux environment may differ from the one it was built on. Delete gedlibpy.so, gedlibpy.cpp and the build folder, then use this command in a Linux shell::
python3 setup.py build_ext --inplace | |||||
You can also do this with Python 2, but make sure the version used for compilation matches the one running your code.
If it still doesn't work, the version of GedLib or of another library may be the problem. In that case, re-install GedLib on your computer; you can download it here: https://dbblumenthal.github.io/gedlib/
Then install GedLib with its Python installer.
Just call:: | |||||
python3 install.py | |||||
Create the links as indicated in the documentation. Use the same directory structure as this library, only substituting the .so files and folders from your installation. After that, you can recompile the Python library with the setup command.
If you use Mac OS, you have to follow all the steps above and install the external libraries with this command::
install_name_tool -change <mylib> <path>/<to>/<mylib> <myexec> | |||||
For example, you would write these lines::
install_name_tool -change libdoublefann.2.dylib lib/fann/libdoublefann.2.dylib gedlibpy.so | |||||
install_name_tool -change libsvm.so lib/libsvm.3.22/libsvm.so gedlibpy.so | |||||
install_name_tool -change libnomad.so lib/nomad/libnomad.so gedlibpy.so | |||||
install_name_tool -change libsgtelib.so lib/nomad/libsgtelib.so gedlibpy.so | |||||
The name of the gedlibpy library file can differ if you use Python 3 (the compiled .so may carry a version tag in its name).
If your problem persists, you can contact me at natacha.lambert@unicaen.fr.
How to use this library | |||||
------------------------- | |||||
This library allows you to compute the edit distance between two graphs. Follow these steps to use it:
- Add your graphs (GXL files, NX structures or your own structure; make sure the internal types are consistent)
- Choose your cost function
- Initialize your environment (after that, the cost function and your graphs can't be modified)
- Choose your computation method
- Run the computation with the IDs of the two graphs. You get an ID when you add a graph, or via dedicated functions
- Retrieve the results with various functions (node map, edit distance, etc.)
Here is an example of code with GXL graphs:: | |||||
gedlibpy.load_GXL_graphs('include/gedlib-master/data/datasets/Mutagenicity/data/', 'collections/MUTA_10.xml') | |||||
listID = gedlibpy.get_all_graph_ids() | |||||
gedlibpy.set_edit_cost("CHEM_1") | |||||
gedlibpy.init() | |||||
gedlibpy.set_method("IPFP", "") | |||||
gedlibpy.init_method() | |||||
g = listID[0] | |||||
h = listID[1] | |||||
gedlibpy.run_method(g,h) | |||||
print("Node Map : ", gedlibpy.get_node_map(g,h)) | |||||
print ("Upper Bound = " + str(gedlibpy.get_upper_bound(g,h)) + ", Lower Bound = " + str(gedlibpy.get_lower_bound(g,h)) + ", Runtime = " + str(gedlibpy.get_runtime(g,h))) | |||||
Please read the documentation for more examples and functions. | |||||
Advice if you don't code in a shell
---------------------------------------
The Python library doesn't report every C++ error. If your program restarts because of an error in your code, please run it from a Linux shell to see the C++ error messages.
@@ -0,0 +1 @@ | |||||
libdoublefann.so.2 |
@@ -0,0 +1 @@ | |||||
libdoublefann.so.2.2.0 |