
Modify function comments of graph kernels.

v0.1
jajupmochi 6 years ago
parent commit 344a6f8d4b
46 changed files with 3613565 additions and 226 deletions
  1. +64780 -0  datasets/AIDS/AIDS_A.txt
  2. +64780 -0  datasets/AIDS/AIDS_edge_labels.txt
  3. +31385 -0  datasets/AIDS/AIDS_graph_indicator.txt
  4. +2000 -0  datasets/AIDS/AIDS_graph_labels.txt
  5. +65 -0  datasets/AIDS/AIDS_label_readme.txt
  6. +31385 -0  datasets/AIDS/AIDS_node_attributes.txt
  7. +31385 -0  datasets/AIDS/AIDS_node_labels.txt
  8. BIN  datasets/DD/1-s2.0-S0022283603006284-main.pdf
  9. BIN  datasets/DD/DD.zip
  10. +1686092 -0  datasets/DD/DD_A.txt
  11. +334925 -0  datasets/DD/DD_graph_indicator.txt
  12. +1178 -0  datasets/DD/DD_graph_labels.txt
  13. +334925 -0  datasets/DD/DD_node_labels.txt
  14. +75 -0  datasets/DD/README.txt
  15. BIN  datasets/NCI1/NCI1.zip
  16. +265506 -0  datasets/NCI1/NCI1_A.txt
  17. +122747 -0  datasets/NCI1/NCI1_graph_indicator.txt
  18. +4110 -0  datasets/NCI1/NCI1_graph_labels.txt
  19. +122747 -0  datasets/NCI1/NCI1_node_labels.txt
  20. +70 -0  datasets/NCI1/README.txt
  21. BIN  datasets/NCI109/NCI109.zip
  22. +265208 -0  datasets/NCI109/NCI109_A.txt
  23. +122494 -0  datasets/NCI109/NCI109_graph_indicator.txt
  24. +4127 -0  datasets/NCI109/NCI109_graph_labels.txt
  25. +122494 -0  datasets/NCI109/NCI109_node_labels.txt
  26. +70 -0  datasets/NCI109/README.txt
  27. +13 -13  notebooks/run_commonwalkkernel.py
  28. +13 -13  notebooks/run_marginalizedkernel.py
  29. +16 -15  notebooks/run_randomwalkkernel.py
  30. +8 -8  notebooks/run_spkernel.py
  31. +14 -14  notebooks/run_structuralspkernel.py
  32. +7 -7  notebooks/run_treeletkernel.py
  33. +15 -15  notebooks/run_untilhpathkernel.py
  34. +13 -13  notebooks/run_weisfeilerlehmankernel.py
  35. +53 -31  preimage/gk_iam.py
  36. +80 -44  preimage/iam.py
  37. +218 -0  preimage/median.py
  38. +423 -0  preimage/run_gk_iam.py
  39. +15 -13  pygraph/kernels/commonWalkKernel.py
  40. +9 -6  pygraph/kernels/marginalizedKernel.py
  41. +59 -12  pygraph/kernels/randomWalkKernel.py
  42. +5 -3  pygraph/kernels/spKernel.py
  43. +14 -6  pygraph/kernels/structuralspKernel.py
  44. +11 -3  pygraph/kernels/treeletKernel.py
  45. +13 -5  pygraph/kernels/untilHPathKernel.py
  46. +18 -5  pygraph/kernels/weisfeilerLehmanKernel.py

+ 64780 - 0  datasets/AIDS/AIDS_A.txt  (diff suppressed: file too large)
+ 64780 - 0  datasets/AIDS/AIDS_edge_labels.txt  (diff suppressed: file too large)
+ 31385 - 0  datasets/AIDS/AIDS_graph_indicator.txt  (diff suppressed: file too large)
+ 2000 - 0  datasets/AIDS/AIDS_graph_labels.txt  (diff suppressed: file too large)


+ 65 - 0  datasets/AIDS/AIDS_label_readme.txt

@@ -0,0 +1,65 @@
Node labels: [symbol]

Node attributes: [chem, charge, x, y]

Edge labels: [valence]

Node labels were converted to integer values using this map:

Component 0:
0 C
1 O
2 N
3 Cl
4 F
5 S
6 Se
7 P
8 Na
9 I
10 Co
11 Br
12 Li
13 Si
14 Mg
15 Cu
16 As
17 B
18 Pt
19 Ru
20 K
21 Pd
22 Au
23 Te
24 W
25 Rh
26 Zn
27 Bi
28 Pb
29 Ge
30 Sb
31 Sn
32 Ga
33 Hg
34 Ho
35 Tl
36 Ni
37 Tb



Edge labels were converted to integer values using this map:

Component 0:
0 1
1 2
2 3



Class labels were converted to integer values using this map:

0 a
1 i



+ 31385 - 0  datasets/AIDS/AIDS_node_attributes.txt  (diff suppressed: file too large)
+ 31385 - 0  datasets/AIDS/AIDS_node_labels.txt  (diff suppressed: file too large)


BIN  datasets/DD/1-s2.0-S0022283603006284-main.pdf
BIN  datasets/DD/DD.zip


+ 1686092 - 0  datasets/DD/DD_A.txt  (diff suppressed: file too large)
+ 334925 - 0  datasets/DD/DD_graph_indicator.txt  (diff suppressed: file too large)
+ 1178 - 0  datasets/DD/DD_graph_labels.txt  (diff suppressed: file too large)
+ 334925 - 0  datasets/DD/DD_node_labels.txt  (diff suppressed: file too large)


+ 75 - 0  datasets/DD/README.txt

@@ -0,0 +1,75 @@
README for dataset DD


=== Usage ===

This folder contains the following comma separated text files
(replace DS by the name of the dataset):

n = total number of nodes
m = total number of edges
N = number of graphs

(1) DS_A.txt (m lines)
sparse (block diagonal) adjacency matrix for all graphs,
each line corresponds to (row, col) resp. (node_id, node_id)

(2) DS_graph_indicator.txt (n lines)
column vector of graph identifiers for all nodes of all graphs,
the value in the i-th line is the graph_id of the node with node_id i

(3) DS_graph_labels.txt (N lines)
class labels for all graphs in the dataset,
the value in the i-th line is the class label of the graph with graph_id i

(4) DS_node_labels.txt (n lines)
column vector of node labels,
the value in the i-th line corresponds to the node with node_id i

There are OPTIONAL files if the respective information is available:

(5) DS_edge_labels.txt (m lines; same size as DS_A.txt)
labels for the edges in DS_A.txt

(6) DS_edge_attributes.txt (m lines; same size as DS_A.txt)
attributes for the edges in DS_A.txt

(7) DS_node_attributes.txt (n lines)
matrix of node attributes,
the comma-separated values in the i-th line are the attribute vector of the node with node_id i

(8) DS_graph_attributes.txt (N lines)
regression values for all graphs in the dataset,
the value in the i-th line is the attribute of the graph with graph_id i
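
(For orientation, a minimal parsing sketch for this layout; the function name
is illustrative, node and graph ids are 1-based as described above:)

import networkx as nx

def load_tu_graphs(prefix):
    # DS_graph_indicator.txt: line i gives the graph_id of node i
    with open(prefix + '_graph_indicator.txt') as f:
        node2graph = [int(line) for line in f]
    graphs = {gid: nx.Graph() for gid in set(node2graph)}
    for node_id, gid in enumerate(node2graph, start=1):
        graphs[gid].add_node(node_id)
    # DS_A.txt: each line is one "row, col" adjacency entry
    with open(prefix + '_A.txt') as f:
        for line in f:
            u, v = (int(x) for x in line.split(','))
            graphs[node2graph[u - 1]].add_edge(u, v)
    return [graphs[gid] for gid in sorted(graphs)]

# e.g. graphs = load_tu_graphs('datasets/DD/DD')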


=== Description ===

D&D is a dataset of 1178 protein structures (Dobson and Doig, 2003). Each protein is
represented by a graph, in which the nodes are amino acids and two nodes are connected
by an edge if they are less than 6 Angstroms apart. The prediction task is to classify
the protein structures into enzymes and non-enzymes.


=== Previous Use of the Dataset ===

Neumann, M., Garnett R., Bauckhage Ch., Kersting K.: Propagation Kernels: Efficient Graph
Kernels from Propagated Information. Under review at MLJ.

Neumann, M., Patricia, N., Garnett, R., Kersting, K.: Efficient Graph Kernels by
Randomization. In: P.A. Flach, T.D. Bie, N. Cristianini (eds.) ECML/PKDD, Lecture Notes in
Computer Science, vol. 7523, pp. 378-393. Springer (2012).

Shervashidze, N., Schweitzer, P., van Leeuwen, E., Mehlhorn, K., Borgwardt, K.:
Weisfeiler-Lehman Graph Kernels. Journal of Machine Learning Research 12, 2539-2561 (2011)


=== References ===

P. D. Dobson and A. J. Doig. Distinguishing enzyme structures from non-enzymes without
alignments. J. Mol. Biol., 330(4):771–783, Jul 2003.






BIN  datasets/NCI1/NCI1.zip


+ 265506 - 0  datasets/NCI1/NCI1_A.txt  (diff suppressed: file too large)
+ 122747 - 0  datasets/NCI1/NCI1_graph_indicator.txt  (diff suppressed: file too large)
+ 4110 - 0  datasets/NCI1/NCI1_graph_labels.txt  (diff suppressed: file too large)
+ 122747 - 0  datasets/NCI1/NCI1_node_labels.txt  (diff suppressed: file too large)


+ 70 - 0  datasets/NCI1/README.txt

@@ -0,0 +1,70 @@
README for dataset NCI1


=== Usage ===

This folder contains the following comma separated text files
(replace DS by the name of the dataset):

n = total number of nodes
m = total number of edges
N = number of graphs

(1) DS_A.txt (m lines)
sparse (block diagonal) adjacency matrix for all graphs,
each line corresponds to (row, col) resp. (node_id, node_id)

(2) DS_graph_indicator.txt (n lines)
column vector of graph identifiers for all nodes of all graphs,
the value in the i-th line is the graph_id of the node with node_id i

(3) DS_graph_labels.txt (N lines)
class labels for all graphs in the dataset,
the value in the i-th line is the class label of the graph with graph_id i

(4) DS_node_labels.txt (n lines)
column vector of node labels,
the value in the i-th line corresponds to the node with node_id i

There are OPTIONAL files if the respective information is available:

(5) DS_edge_labels.txt (m lines; same size as DS_A.txt)
labels for the edges in DS_A.txt

(6) DS_edge_attributes.txt (m lines; same size as DS_A.txt)
attributes for the edges in DS_A.txt

(7) DS_node_attributes.txt (n lines)
matrix of node attributes,
the comma-separated values in the i-th line are the attribute vector of the node with node_id i

(8) DS_graph_attributes.txt (N lines)
regression values for all graphs in the dataset,
the value in the i-th line is the attribute of the graph with graph_id i


=== Description ===

NCI1 and NCI109 represent two balanced subsets of datasets of chemical compounds screened
for activity against non-small cell lung cancer and ovarian cancer cell lines respectively
(Wale and Karypis (2006) and http://pubchem.ncbi.nlm.nih.gov).


=== Previous Use of the Dataset ===

Neumann, M., Garnett R., Bauckhage Ch., Kersting K.: Propagation Kernels: Efficient Graph
Kernels from Propagated Information. Under review at MLJ.

Neumann, M., Patricia, N., Garnett, R., Kersting, K.: Efficient Graph Kernels by
Randomization. In: P.A. Flach, T.D. Bie, N. Cristianini (eds.) ECML/PKDD, Lecture Notes in
Computer Science, vol. 7523, pp. 378-393. Springer (2012).

Shervashidze, N., Schweitzer, P., van Leeuwen, E., Mehlhorn, K., Borgwardt, K.:
Weisfeiler-Lehman Graph Kernels. Journal of Machine Learning Research 12, 2539-2561 (2011)


=== References ===

N. Wale and G. Karypis. Comparison of descriptor spaces for chemical compound retrieval and
classification. In Proc. of ICDM, pages 678–689, Hong Kong, 2006.


BIN  datasets/NCI109/NCI109.zip


+ 265208 - 0  datasets/NCI109/NCI109_A.txt  (diff suppressed: file too large)
+ 122494 - 0  datasets/NCI109/NCI109_graph_indicator.txt  (diff suppressed: file too large)
+ 4127 - 0  datasets/NCI109/NCI109_graph_labels.txt  (diff suppressed: file too large)
+ 122494 - 0  datasets/NCI109/NCI109_node_labels.txt  (diff suppressed: file too large)


+ 70 - 0  datasets/NCI109/README.txt

@@ -0,0 +1,70 @@
README for dataset NCI109


=== Usage ===

This folder contains the following comma separated text files
(replace DS by the name of the dataset):

n = total number of nodes
m = total number of edges
N = number of graphs

(1) DS_A.txt (m lines)
sparse (block diagonal) adjacency matrix for all graphs,
each line corresponds to (row, col) resp. (node_id, node_id)

(2) DS_graph_indicator.txt (n lines)
column vector of graph identifiers for all nodes of all graphs,
the value in the i-th line is the graph_id of the node with node_id i

(3) DS_graph_labels.txt (N lines)
class labels for all graphs in the dataset,
the value in the i-th line is the class label of the graph with graph_id i

(4) DS_node_labels.txt (n lines)
column vector of node labels,
the value in the i-th line corresponds to the node with node_id i

There are OPTIONAL files if the respective information is available:

(5) DS_edge_labels.txt (m lines; same size as DS_A.txt)
labels for the edges in DS_A.txt

(6) DS_edge_attributes.txt (m lines; same size as DS_A.txt)
attributes for the edges in DS_A.txt

(7) DS_node_attributes.txt (n lines)
matrix of node attributes,
the comma-separated values in the i-th line are the attribute vector of the node with node_id i

(8) DS_graph_attributes.txt (N lines)
regression values for all graphs in the dataset,
the value in the i-th line is the attribute of the graph with graph_id i


=== Description ===

NCI1 and NCI109 represent two balanced subsets of datasets of chemical compounds screened
for activity against non-small cell lung cancer and ovarian cancer cell lines respectively
(Wale and Karypis (2006) and http://pubchem.ncbi.nlm.nih.gov).


=== Previous Use of the Dataset ===

Neumann, M., Garnett R., Bauckhage Ch., Kersting K.: Propagation Kernels: Efficient Graph
Kernels from Propagated Information. Under review at MLJ.

Neumann, M., Patricia, N., Garnett, R., Kersting, K.: Efficient Graph Kernels by
Randomization. In: P.A. Flach, T.D. Bie, N. Cristianini (eds.) ECML/PKDD, Lecture Notes in
Computer Science, vol. 7523, pp. 378-393. Springer (2012).

Shervashidze, N., Schweitzer, P., van Leeuwen, E., Mehlhorn, K., Borgwardt, K.:
Weisfeiler-Lehman Graph Kernels. Journal of Machine Learning Research 12, 2539-2561 (2011)


=== References ===

N. Wale and G. Karypis. Comparison of descriptor spaces for chemical compound retrieval and
classification. In Proc. of ICDM, pages 678–689, Hong Kong, 2006.


+ 13 - 13  notebooks/run_commonwalkkernel.py

@@ -12,21 +12,21 @@ import multiprocessing
from pygraph.kernels.commonWalkKernel import commonwalkkernel

dslist = [
# {'name': 'Acyclic', 'dataset': '../datasets/acyclic/dataset_bps.ds',
# 'task': 'regression'}, # node symb
# {'name': 'Alkane', 'dataset': '../datasets/Alkane/dataset.ds', 'task': 'regression',
# 'dataset_y': '../datasets/Alkane/dataset_boiling_point_names.txt'},
# # contains single node graph, node symb
# {'name': 'MAO', 'dataset': '../datasets/MAO/dataset.ds'}, # node/edge symb
# {'name': 'PAH', 'dataset': '../datasets/PAH/dataset.ds'}, # unlabeled
# {'name': 'MUTAG', 'dataset': '../datasets/MUTAG/MUTAG_A.txt'}, # node/edge symb
# {'name': 'Letter-med', 'dataset': '../datasets/Letter-med/Letter-med_A.txt'},
# # node nsymb
# {'name': 'ENZYMES', 'dataset': '../datasets/ENZYMES_txt/ENZYMES_A_sparse.txt'},
# # node symb/nsymb
{'name': 'Alkane', 'dataset': '../datasets/Alkane/dataset.ds', 'task': 'regression',
'dataset_y': '../datasets/Alkane/dataset_boiling_point_names.txt'},
# contains single node graph, node symb
{'name': 'Acyclic', 'dataset': '../datasets/acyclic/dataset_bps.ds',
'task': 'regression'}, # node symb
{'name': 'MAO', 'dataset': '../datasets/MAO/dataset.ds'}, # node/edge symb
{'name': 'PAH', 'dataset': '../datasets/PAH/dataset.ds'}, # unlabeled
{'name': 'MUTAG', 'dataset': '../datasets/MUTAG/MUTAG_A.txt'}, # node/edge symb
{'name': 'Letter-med', 'dataset': '../datasets/Letter-med/Letter-med_A.txt'},
# node nsymb
{'name': 'AIDS', 'dataset': '../datasets/AIDS/AIDS_A.txt'}, # node symb/nsymb, edge symb
{'name': 'ENZYMES', 'dataset': '../datasets/ENZYMES_txt/ENZYMES_A_sparse.txt'},
# node symb/nsymb
# {'name': 'NCI1', 'dataset': '../datasets/NCI1/NCI1_A.txt'}, # node symb
# {'name': 'NCI109', 'dataset': '../datasets/NCI109/NCI109_A.txt'}, # node symb
{'name': 'AIDS', 'dataset': '../datasets/AIDS/AIDS_A.txt'}, # node symb/nsymb, edge symb
# {'name': 'D&D', 'dataset': '../datasets/DD/DD_A.txt'}, # node symb
#
# {'name': 'Mutagenicity', 'dataset': '../datasets/Mutagenicity/Mutagenicity_A.txt'},


+ 13 - 13  notebooks/run_marginalizedkernel.py

@@ -12,22 +12,22 @@ import multiprocessing
from pygraph.kernels.marginalizedKernel import marginalizedkernel

dslist = [
# {'name': 'Acyclic', 'dataset': '../datasets/acyclic/dataset_bps.ds',
# 'task': 'regression'}, # node symb
# {'name': 'Alkane', 'dataset': '../datasets/Alkane/dataset.ds', 'task': 'regression',
# 'dataset_y': '../datasets/Alkane/dataset_boiling_point_names.txt'},
# # contains single node graph, node symb
# {'name': 'MAO', 'dataset': '../datasets/MAO/dataset.ds'}, # node/edge symb
# {'name': 'PAH', 'dataset': '../datasets/PAH/dataset.ds'}, # unlabeled
# {'name': 'MUTAG', 'dataset': '../datasets/MUTAG/MUTAG_A.txt'}, # node/edge symb
# {'name': 'Letter-med', 'dataset': '../datasets/Letter-med/Letter-med_A.txt'},
# # node nsymb
# {'name': 'ENZYMES', 'dataset': '../datasets/ENZYMES_txt/ENZYMES_A_sparse.txt'},
# # node symb/nsymb
{'name': 'Alkane', 'dataset': '../datasets/Alkane/dataset.ds', 'task': 'regression',
'dataset_y': '../datasets/Alkane/dataset_boiling_point_names.txt'},
# contains single node graph, node symb
{'name': 'Acyclic', 'dataset': '../datasets/acyclic/dataset_bps.ds',
'task': 'regression'}, # node symb
{'name': 'MAO', 'dataset': '../datasets/MAO/dataset.ds'}, # node/edge symb
{'name': 'PAH', 'dataset': '../datasets/PAH/dataset.ds'}, # unlabeled
{'name': 'MUTAG', 'dataset': '../datasets/MUTAG/MUTAG_A.txt'}, # node/edge symb
{'name': 'Letter-med', 'dataset': '../datasets/Letter-med/Letter-med_A.txt'},
# node nsymb
{'name': 'ENZYMES', 'dataset': '../datasets/ENZYMES_txt/ENZYMES_A_sparse.txt'},
# node symb/nsymb
# {'name': 'AIDS', 'dataset': '../datasets/AIDS/AIDS_A.txt'}, # node symb/nsymb, edge symb
# {'name': 'NCI1', 'dataset': '../datasets/NCI1/NCI1_A.txt'}, # node symb
# {'name': 'NCI109', 'dataset': '../datasets/NCI109/NCI109_A.txt'}, # node symb
{'name': 'D&D', 'dataset': '../datasets/DD/DD_A.txt'}, # node symb
# {'name': 'D&D', 'dataset': '../datasets/DD/DD_A.txt'}, # node symb
#
# {'name': 'Mutagenicity', 'dataset': '../datasets/Mutagenicity/Mutagenicity_A.txt'},
# # node/edge symb


+ 16 - 15  notebooks/run_randomwalkkernel.py

@@ -17,22 +17,23 @@ import numpy as np


dslist = [
# {'name': 'Acyclic', 'dataset': '../datasets/acyclic/dataset_bps.ds',
# 'task': 'regression'}, # node symb
# {'name': 'Alkane', 'dataset': '../datasets/Alkane/dataset.ds', 'task': 'regression',
# 'dataset_y': '../datasets/Alkane/dataset_boiling_point_names.txt'},
# # contains single node graph, node symb
# {'name': 'MAO', 'dataset': '../datasets/MAO/dataset.ds'}, # node/edge symb
# {'name': 'PAH', 'dataset': '../datasets/PAH/dataset.ds'}, # unlabeled
# {'name': 'MUTAG', 'dataset': '../datasets/MUTAG/MUTAG_A.txt'}, # node/edge symb
# {'name': 'ENZYMES', 'dataset': '../datasets/ENZYMES_txt/ENZYMES_A_sparse.txt'},
# # node symb/nsymb
# {'name': 'NCI1', 'dataset': '../datasets/NCI1/NCI1_A.txt'}, # node symb
# {'name': 'NCI109', 'dataset': '../datasets/NCI109/NCI109_A.txt'}, # node symb
{'name': 'Alkane', 'dataset': '../datasets/Alkane/dataset.ds', 'task': 'regression',
'dataset_y': '../datasets/Alkane/dataset_boiling_point_names.txt'},
# contains single node graph, node symb
{'name': 'Acyclic', 'dataset': '../datasets/acyclic/dataset_bps.ds',
'task': 'regression'}, # node symb
{'name': 'MAO', 'dataset': '../datasets/MAO/dataset.ds'}, # node/edge symb
{'name': 'PAH', 'dataset': '../datasets/PAH/dataset.ds'}, # unlabeled
{'name': 'MUTAG', 'dataset': '../datasets/MUTAG/MUTAG_A.txt'}, # node/edge symb
{'name': 'Letter-med', 'dataset': '../datasets/Letter-med/Letter-med_A.txt'},
# node nsymb
{'name': 'ENZYMES', 'dataset': '../datasets/ENZYMES_txt/ENZYMES_A_sparse.txt'},
# node symb/nsymb
{'name': 'AIDS', 'dataset': '../datasets/AIDS/AIDS_A.txt'}, # node symb/nsymb, edge symb
# {'name': 'D&D', 'dataset': '../datasets/DD/DD_A.txt'}, # node symb
# {'name': 'Letter-med', 'dataset': '../datasets/Letter-med/Letter-med_A.txt'},
# # node nsymb
{'name': 'NCI1', 'dataset': '../datasets/NCI1/NCI1_A.txt'}, # node symb
{'name': 'NCI109', 'dataset': '../datasets/NCI109/NCI109_A.txt'}, # node symb
{'name': 'D&D', 'dataset': '../datasets/DD/DD_A.txt'}, # node symb

#
# {'name': 'Mutagenicity', 'dataset': '../datasets/Mutagenicity/Mutagenicity_A.txt'},
# # node/edge symb


+ 8 - 8  notebooks/run_spkernel.py

@@ -8,14 +8,14 @@ from pygraph.utils.kernels import deltakernel, gaussiankernel, kernelproduct

# datasets
dslist = [
# {'name': 'Acyclic', 'dataset': '../datasets/acyclic/dataset_bps.ds',
# 'task': 'regression'}, # node symb
# {'name': 'Alkane', 'dataset': '../datasets/Alkane/dataset.ds', 'task': 'regression',
# 'dataset_y': '../datasets/Alkane/dataset_boiling_point_names.txt'},
# # contains single node graph, node symb
# {'name': 'MAO', 'dataset': '../datasets/MAO/dataset.ds'}, # node/edge symb
# {'name': 'PAH', 'dataset': '../datasets/PAH/dataset.ds'}, # unlabeled
# {'name': 'MUTAG', 'dataset': '../datasets/MUTAG/MUTAG_A.txt'}, # node/edge symb
{'name': 'Alkane', 'dataset': '../datasets/Alkane/dataset.ds', 'task': 'regression',
'dataset_y': '../datasets/Alkane/dataset_boiling_point_names.txt'},
# contains single node graph, node symb
{'name': 'Acyclic', 'dataset': '../datasets/acyclic/dataset_bps.ds',
'task': 'regression'}, # node symb
{'name': 'MAO', 'dataset': '../datasets/MAO/dataset.ds'}, # node/edge symb
{'name': 'PAH', 'dataset': '../datasets/PAH/dataset.ds'}, # unlabeled
{'name': 'MUTAG', 'dataset': '../datasets/MUTAG/MUTAG_A.txt'}, # node/edge symb
{'name': 'Letter-med', 'dataset': '../datasets/Letter-med/Letter-med_A.txt'},
# node nsymb
{'name': 'ENZYMES', 'dataset': '../datasets/ENZYMES_txt/ENZYMES_A_sparse.txt'},


+ 14 - 14  notebooks/run_structuralspkernel.py

@@ -14,22 +14,22 @@ from pygraph.kernels.structuralspKernel import structuralspkernel
from pygraph.utils.kernels import deltakernel, gaussiankernel, kernelproduct

dslist = [
# {'name': 'Acyclic', 'dataset': '../datasets/acyclic/dataset_bps.ds',
# 'task': 'regression'}, # node symb
# {'name': 'Alkane', 'dataset': '../datasets/Alkane/dataset.ds', 'task': 'regression',
# 'dataset_y': '../datasets/Alkane/dataset_boiling_point_names.txt'},
# # contains single node graph, node symb
# {'name': 'MAO', 'dataset': '../datasets/MAO/dataset.ds'}, # node/edge symb
# {'name': 'PAH', 'dataset': '../datasets/PAH/dataset.ds'}, # unlabeled
# {'name': 'MUTAG', 'dataset': '../datasets/MUTAG/MUTAG_A.txt'}, # node/edge symb
# {'name': 'Letter-med', 'dataset': '../datasets/Letter-med/Letter-med_A.txt'},
# # node nsymb
# {'name': 'AIDS', 'dataset': '../datasets/AIDS/AIDS_A.txt'}, # node symb/nsymb, edge symb
# {'name': 'NCI1', 'dataset': '../datasets/NCI1/NCI1_A.txt'}, # node symb
# {'name': 'NCI109', 'dataset': '../datasets/NCI109/NCI109_A.txt'}, # node symb
{'name': 'Alkane', 'dataset': '../datasets/Alkane/dataset.ds', 'task': 'regression',
'dataset_y': '../datasets/Alkane/dataset_boiling_point_names.txt'},
# contains single node graph, node symb
{'name': 'Acyclic', 'dataset': '../datasets/acyclic/dataset_bps.ds',
'task': 'regression'}, # node symb
{'name': 'MAO', 'dataset': '../datasets/MAO/dataset.ds'}, # node/edge symb
{'name': 'PAH', 'dataset': '../datasets/PAH/dataset.ds'}, # unlabeled
{'name': 'MUTAG', 'dataset': '../datasets/MUTAG/MUTAG_A.txt'}, # node/edge symb
{'name': 'Letter-med', 'dataset': '../datasets/Letter-med/Letter-med_A.txt'},
# node nsymb
{'name': 'AIDS', 'dataset': '../datasets/AIDS/AIDS_A.txt'}, # node symb/nsymb, edge symb
{'name': 'NCI1', 'dataset': '../datasets/NCI1/NCI1_A.txt'}, # node symb
{'name': 'NCI109', 'dataset': '../datasets/NCI109/NCI109_A.txt'}, # node symb
# {'name': 'ENZYMES', 'dataset': '../datasets/ENZYMES_txt/ENZYMES_A_sparse.txt'},
# # node symb/nsymb
{'name': 'D&D', 'dataset': '../datasets/DD/DD_A.txt'}, # node symb
# {'name': 'D&D', 'dataset': '../datasets/DD/DD_A.txt'}, # node symb
#
# {'name': 'Mutagenicity', 'dataset': '../datasets/Mutagenicity/Mutagenicity_A.txt'},
# # node/edge symb


+ 7 - 7  notebooks/run_treeletkernel.py

@@ -14,22 +14,22 @@ from pygraph.kernels.treeletKernel import treeletkernel
from pygraph.utils.kernels import gaussiankernel, polynomialkernel

dslist = [
# {'name': 'Acyclic', 'dataset': '../datasets/acyclic/dataset_bps.ds',
# 'task': 'regression'}, # node symb
{'name': 'Alkane', 'dataset': '../datasets/Alkane/dataset.ds', 'task': 'regression',
'dataset_y': '../datasets/Alkane/dataset_boiling_point_names.txt'},
# contains single node graph, node symb
{'name': 'Acyclic', 'dataset': '../datasets/acyclic/dataset_bps.ds',
'task': 'regression'}, # node symb
{'name': 'MAO', 'dataset': '../datasets/MAO/dataset.ds'}, # node/edge symb
{'name': 'PAH', 'dataset': '../datasets/PAH/dataset.ds'}, # unlabeled
{'name': 'MUTAG', 'dataset': '../datasets/MUTAG/MUTAG_A.txt'}, # node/edge symb
{'name': 'ENZYMES', 'dataset': '../datasets/ENZYMES_txt/ENZYMES_A_sparse.txt'},
# node symb/nsymb
{'name': 'NCI1', 'dataset': '../datasets/NCI1/NCI1_A.txt'}, # node symb
{'name': 'NCI109', 'dataset': '../datasets/NCI109/NCI109_A.txt'}, # node symb
{'name': 'AIDS', 'dataset': '../datasets/AIDS/AIDS_A.txt'}, # node symb/nsymb, edge symb
{'name': 'Letter-med', 'dataset': '../datasets/Letter-med/Letter-med_A.txt'},
# node nsymb
{'name': 'D&D', 'dataset': '../datasets/DD/DD_A.txt'}, # node symb
{'name': 'ENZYMES', 'dataset': '../datasets/ENZYMES_txt/ENZYMES_A_sparse.txt'},
# node symb/nsymb
# {'name': 'D&D', 'dataset': '../datasets/DD/DD_A.txt'}, # node symb
# {'name': 'Letter-med', 'dataset': '../datasets/Letter-med/Letter-med_A.txt'},
# # node nsymb
#
# {'name': 'Mutagenicity', 'dataset': '../datasets/Mutagenicity/Mutagenicity_A.txt'},
# # node/edge symb


+ 15 - 15  notebooks/run_untilhpathkernel.py

@@ -12,21 +12,21 @@ import multiprocessing
from pygraph.kernels.untilHPathKernel import untilhpathkernel

dslist = [
# {'name': 'Acyclic', 'dataset': '../datasets/acyclic/dataset_bps.ds',
# 'task': 'regression'}, # node symb
# {'name': 'Alkane', 'dataset': '../datasets/Alkane/dataset.ds', 'task': 'regression',
# 'dataset_y': '../datasets/Alkane/dataset_boiling_point_names.txt'},
# # contains single node graph, node symb
# {'name': 'MAO', 'dataset': '../datasets/MAO/dataset.ds'}, # node/edge symb
# {'name': 'PAH', 'dataset': '../datasets/PAH/dataset.ds'}, # unlabeled
# {'name': 'MUTAG', 'dataset': '../datasets/MUTAG/MUTAG_A.txt'}, # node/edge symb
# {'name': 'Letter-med', 'dataset': '../datasets/Letter-med/Letter-med_A.txt'},
# # node nsymb
# {'name': 'ENZYMES', 'dataset': '../datasets/ENZYMES_txt/ENZYMES_A_sparse.txt'},
# # node symb/nsymb
# {'name': 'NCI1', 'dataset': '../datasets/NCI1/NCI1_A.txt'}, # node symb
# {'name': 'NCI109', 'dataset': '../datasets/NCI109/NCI109_A.txt'}, # node symb
# {'name': 'AIDS', 'dataset': '../datasets/AIDS/AIDS_A.txt'}, # node symb/nsymb, edge symb
{'name': 'Alkane', 'dataset': '../datasets/Alkane/dataset.ds', 'task': 'regression',
'dataset_y': '../datasets/Alkane/dataset_boiling_point_names.txt'},
# contains single node graph, node symb
{'name': 'Acyclic', 'dataset': '../datasets/acyclic/dataset_bps.ds',
'task': 'regression'}, # node symb
{'name': 'MAO', 'dataset': '../datasets/MAO/dataset.ds'}, # node/edge symb
{'name': 'PAH', 'dataset': '../datasets/PAH/dataset.ds'}, # unlabeled
{'name': 'MUTAG', 'dataset': '../datasets/MUTAG/MUTAG_A.txt'}, # node/edge symb
{'name': 'Letter-med', 'dataset': '../datasets/Letter-med/Letter-med_A.txt'},
# node nsymb
{'name': 'ENZYMES', 'dataset': '../datasets/ENZYMES_txt/ENZYMES_A_sparse.txt'},
# node symb/nsymb
{'name': 'AIDS', 'dataset': '../datasets/AIDS/AIDS_A.txt'}, # node symb/nsymb, edge symb
{'name': 'NCI1', 'dataset': '../datasets/NCI1/NCI1_A.txt'}, # node symb
{'name': 'NCI109', 'dataset': '../datasets/NCI109/NCI109_A.txt'}, # node symb
{'name': 'D&D', 'dataset': '../datasets/DD/DD_A.txt'}, # node symb
#
# {'name': 'Mutagenicity', 'dataset': '../datasets/Mutagenicity/Mutagenicity_A.txt'},


+ 13 - 13  notebooks/run_weisfeilerlehmankernel.py

@@ -14,22 +14,22 @@ from pygraph.kernels.weisfeilerLehmanKernel import weisfeilerlehmankernel


dslist = [
# {'name': 'Acyclic', 'dataset': '../datasets/acyclic/dataset_bps.ds',
# 'task': 'regression'}, # node symb
# {'name': 'Alkane', 'dataset': '../datasets/Alkane/dataset.ds', 'task': 'regression',
# 'dataset_y': '../datasets/Alkane/dataset_boiling_point_names.txt'},
# # contains single node graph, node symb
# {'name': 'MAO', 'dataset': '../datasets/MAO/dataset.ds'}, # node/edge symb
# {'name': 'PAH', 'dataset': '../datasets/PAH/dataset.ds'}, # unlabeled
# {'name': 'MUTAG', 'dataset': '../datasets/MUTAG/MUTAG_A.txt'}, # node/edge symb
{'name': 'Alkane', 'dataset': '../datasets/Alkane/dataset.ds', 'task': 'regression',
'dataset_y': '../datasets/Alkane/dataset_boiling_point_names.txt'},
# contains single node graph, node symb
{'name': 'Acyclic', 'dataset': '../datasets/acyclic/dataset_bps.ds',
'task': 'regression'}, # node symb
{'name': 'MAO', 'dataset': '../datasets/MAO/dataset.ds'}, # node/edge symb
{'name': 'PAH', 'dataset': '../datasets/PAH/dataset.ds'}, # unlabeled
{'name': 'MUTAG', 'dataset': '../datasets/MUTAG/MUTAG_A.txt'}, # node/edge symb
# {'name': 'Letter-med', 'dataset': '../datasets/Letter-med/Letter-med_A.txt'},
# # node nsymb
# {'name': 'ENZYMES', 'dataset': '../datasets/ENZYMES_txt/ENZYMES_A_sparse.txt'},
# # node symb/nsymb
# {'name': 'NCI1', 'dataset': '../datasets/NCI1/NCI1_A.txt'}, # node symb
# {'name': 'NCI109', 'dataset': '../datasets/NCI109/NCI109_A.txt'}, # node symb
# {'name': 'D&D', 'dataset': '../datasets/DD/DD_A.txt'}, # node symb
{'name': 'AIDS', 'dataset': '../datasets/AIDS/AIDS_A.txt'}, # node symb/nsymb, edge symb
{'name': 'ENZYMES', 'dataset': '../datasets/ENZYMES_txt/ENZYMES_A_sparse.txt'},
# node symb/nsymb
{'name': 'NCI1', 'dataset': '../datasets/NCI1/NCI1_A.txt'}, # node symb
{'name': 'NCI109', 'dataset': '../datasets/NCI109/NCI109_A.txt'}, # node symb
{'name': 'D&D', 'dataset': '../datasets/DD/DD_A.txt'}, # node symb
#
# {'name': 'Mutagenicity', 'dataset': '../datasets/Mutagenicity/Mutagenicity_A.txt'},
# # node/edge symb


+ 53 - 31  preimage/gk_iam.py

@@ -277,7 +277,8 @@ def gk_iam_nearest(Gn, alpha, idx_gi, Kmatrix, k, r_max):
# return dhat, ghat_list


def gk_iam_nearest_multi(Gn_init, Gn_median, alpha, idx_gi, Kmatrix, k, r_max, gkernel):
def gk_iam_nearest_multi(Gn_init, Gn_median, alpha, idx_gi, Kmatrix, k, r_max,
gkernel, c_ei=1, c_er=1, c_es=1, epsilon=0.001):
"""This function constructs graph pre-image by the iterative pre-image
framework in reference [1], algorithm 1, where the step of generating new
graphs randomly is replaced by the IAM algorithm in reference [2].
@@ -312,37 +313,44 @@ def gk_iam_nearest_multi(Gn_init, Gn_median, alpha, idx_gi, Kmatrix, k, r_max, g
return 0, g0hat_list
dhat = dis_gs[0] # the nearest distance
ghat_list = [g.copy() for g in g0hat_list]
for g in ghat_list:
draw_Letter_graph(g)
# for g in ghat_list:
# draw_Letter_graph(g)
# nx.draw_networkx(g)
# plt.show()
print(g.nodes(data=True))
print(g.edges(data=True))
# print(g.nodes(data=True))
# print(g.edges(data=True))
Gk = [Gn_init[ig].copy() for ig in sort_idx[0:k]] # the k nearest neighbors
for gi in Gk:
# nx.draw_networkx(gi)
# plt.show()
draw_Letter_graph(g)
print(gi.nodes(data=True))
print(gi.edges(data=True))
# for gi in Gk:
## nx.draw_networkx(gi)
## plt.show()
# draw_Letter_graph(g)
# print(gi.nodes(data=True))
# print(gi.edges(data=True))
Gs_nearest = Gk.copy()
# gihat_list = []
# i = 1
r = 1
while r < r_max:
print('r =', r)
# found = False
r = 0
itr = 0
# cur_sod = dhat
# old_sod = cur_sod * 2
sod_list = [dhat]
found = False
nb_updated = 0
while r < r_max:# and not found: # @todo: if not found?# and np.abs(old_sod - cur_sod) > epsilon:
print('\nr =', r)
print('itr for gk =', itr, '\n')
found = False
# Gs_nearest = Gk + gihat_list
# g_tmp = iam(Gs_nearest)
g_tmp_list = test_iam_moreGraphsAsInit_tryAllPossibleBestGraphs_deleteNodesInIterations(
Gn_median, Gs_nearest, c_ei=1, c_er=1, c_es=1)
for g in g_tmp_list:
g_tmp_list, _ = test_iam_moreGraphsAsInit_tryAllPossibleBestGraphs_deleteNodesInIterations(
Gn_median, Gs_nearest, c_ei=c_ei, c_er=c_er, c_es=c_es)
# for g in g_tmp_list:
# nx.draw_networkx(g)
# plt.show()
draw_Letter_graph(g)
print(g.nodes(data=True))
print(g.edges(data=True))
# draw_Letter_graph(g)
# print(g.nodes(data=True))
# print(g.edges(data=True))
# compute distance between phi and the new generated graphs.
knew = compute_kernel(g_tmp_list + Gn_median, gkernel, False)
@@ -358,6 +366,7 @@ def gk_iam_nearest_multi(Gn_init, Gn_median, alpha, idx_gi, Kmatrix, k, r_max, g
# k_g1_list[1] + alpha[1] * alpha[1] * k_list[1])
# find the new k nearest graphs.
dnew_best = min(dnew_list)
dis_gs = dnew_list + dis_gs # add the new nearest distances.
Gs_nearest = [g.copy() for g in g_tmp_list] + Gs_nearest # add the corresponding graphs.
sort_idx = np.argsort(dis_gs)
@@ -367,21 +376,34 @@ def gk_iam_nearest_multi(Gn_init, Gn_median, alpha, idx_gi, Kmatrix, k, r_max, g
print(dis_gs[-1])
Gs_nearest = [Gs_nearest[idx] for idx in sort_idx[0:k]]
nb_best = len(np.argwhere(dis_gs == dis_gs[0]).flatten().tolist())
if len([i for i in sort_idx[0:nb_best] if i < len(dnew_list)]) > 0:
print('I have smaller or equal distance!')
if dnew_best < dhat and np.abs(dnew_best - dhat) > epsilon:
print('I have smaller distance!')
print(str(dhat) + '->' + str(dis_gs[0]))
dhat = dis_gs[0]
idx_best_list = np.argwhere(dnew_list == dhat).flatten().tolist()
ghat_list = [g_tmp_list[idx].copy() for idx in idx_best_list]
for g in ghat_list:
# nx.draw_networkx(g)
# plt.show()
draw_Letter_graph(g)
print(g.nodes(data=True))
print(g.edges(data=True))
r = 0
else:
# for g in ghat_list:
## nx.draw_networkx(g)
## plt.show()
# draw_Letter_graph(g)
# print(g.nodes(data=True))
# print(g.edges(data=True))
r = 0
found = True
nb_updated += 1
elif np.abs(dnew_best - dhat) < epsilon:
print('I have almost equal distance!')
print(str(dhat) + '->' + str(dnew_best))
if not found:
r += 1
# old_sod = cur_sod
# cur_sod = dnew_best
sod_list.append(dhat)
itr += 1
print('\nthe graph is updated', nb_updated, 'times.')
print('sods in kernel space:', sod_list, '\n')
return dhat, ghat_list
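
(For reference, a sketch of the kernel-space distance dis_gstar used above;
the actual helper lives in gk_iam.py but is not shown in this hunk, so its
signature is inferred from the calls in run_gk_iam.py below:)

import numpy as np

def dis_gstar(idx_g, idx_gi, alpha, Kmatrix, withterm3=True):
    # distance between graph idx_g and the weighted mean of the graphs
    # idx_gi in kernel feature space:
    # d^2 = k(g, g) - 2 * sum_i alpha_i * k(g, g_i)
    #       + sum_i sum_j alpha_i * alpha_j * k(g_i, g_j)
    term1 = Kmatrix[idx_g, idx_g]
    term2 = 2 * sum(a * Kmatrix[idx_g, i] for a, i in zip(alpha, idx_gi))
    term3 = 0
    if withterm3:
        term3 = sum(a1 * a2 * Kmatrix[i1, i2]
                    for a1, i1 in zip(alpha, idx_gi)
                    for a2, i2 in zip(alpha, idx_gi))
    return np.sqrt(max(term1 - term2 + term3, 0))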



+ 80 - 44  preimage/iam.py

@@ -9,6 +9,7 @@ Iterative alternate minimizations using GED.
import numpy as np
import random
import networkx as nx
from tqdm import tqdm

import sys
#from Cython_GedLib_2 import librariesImport, script
@@ -181,13 +182,27 @@ def GED(g1, g2, lib='gedlib'):
return dis, pi_forward, pi_backward


def median_distance(Gn, Gn_median, measure='ged', verbose=False):
dis_list = []
pi_forward_list = []
for idx, G in tqdm(enumerate(Gn), desc='computing median distances',
file=sys.stdout) if verbose else enumerate(Gn):
dis_sum = 0
pi_forward_list.append([])
for G_p in Gn_median:
dis_tmp, pi_tmp_forward, pi_tmp_backward = GED(G, G_p)
pi_forward_list[idx].append(pi_tmp_forward)
dis_sum += dis_tmp
dis_list.append(dis_sum)
return dis_list, pi_forward_list


# --------------------------- These are tests --------------------------------#
def test_iam_with_more_graphs_as_init(Gn, G_candidate, c_ei=3, c_er=3, c_es=1,
node_label='atom', edge_label='bond_type'):
"""See my name, then you know what I do.
"""
from tqdm import tqdm
# Gn = Gn[0:10]
Gn = [nx.convert_node_labels_to_integers(g) for g in Gn]
@@ -321,7 +336,7 @@ def test_iam_with_more_graphs_as_init(Gn, G_candidate, c_ei=3, c_er=3, c_es=1,

def test_iam_moreGraphsAsInit_tryAllPossibleBestGraphs_deleteNodesInIterations(
Gn_median, Gn_candidate, c_ei=3, c_er=3, c_es=1, node_label='atom',
edge_label='bond_type', connected=True):
edge_label='bond_type', connected=False):
"""See my name, then you know what I do.
"""
from tqdm import tqdm
@@ -330,8 +345,11 @@ def test_iam_moreGraphsAsInit_tryAllPossibleBestGraphs_deleteNodesInIterations(
node_ir = np.inf # corresponding to the node remove and insertion.
label_r = 'thanksdanny' # the label for node remove. # @todo: make this label unrepeatable.
ds_attrs = get_dataset_attributes(Gn_median + Gn_candidate,
attr_names=['edge_labeled', 'node_attr_dim'],
attr_names=['edge_labeled', 'node_attr_dim', 'edge_attr_dim'],
edge_label=edge_label)
ite_max = 50
epsilon = 0.001

def generate_graph(G, pi_p_forward, label_set):
@@ -460,13 +478,15 @@ def test_iam_moreGraphsAsInit_tryAllPossibleBestGraphs_deleteNodesInIterations(
g_tmp.remove_edge(nd1, nd2)
# do not change anything when equal.
# find the best graph generated in this iteration and update pi_p.
# # find the best graph generated in this iteration and update pi_p.
# @todo: should we update all graphs generated or just the best ones?
dis_list, pi_forward_list = median_distance(G_new_list, Gn_median)
# @todo: should we remove the identical and connectivity check?
# Don't know which is faster.
G_new_list, idx_list = remove_duplicates(G_new_list)
pi_forward_list = [pi_forward_list[idx] for idx in idx_list]
if ds_attrs['node_attr_dim'] == 0 and ds_attrs['edge_attr_dim'] == 0:
G_new_list, idx_list = remove_duplicates(G_new_list)
pi_forward_list = [pi_forward_list[idx] for idx in idx_list]
dis_list = [dis_list[idx] for idx in idx_list]
# if connected == True:
# G_new_list, idx_list = remove_disconnected(G_new_list)
# pi_forward_list = [pi_forward_list[idx] for idx in idx_list]
@@ -482,25 +502,10 @@ def test_iam_moreGraphsAsInit_tryAllPossibleBestGraphs_deleteNodesInIterations(
# print(g.nodes(data=True))
# print(g.edges(data=True))
return G_new_list, pi_forward_list
return G_new_list, pi_forward_list, dis_list
def median_distance(Gn, Gn_median, measure='ged', verbose=False):
dis_list = []
pi_forward_list = []
for idx, G in tqdm(enumerate(Gn), desc='computing median distances',
file=sys.stdout) if verbose else enumerate(Gn):
dis_sum = 0
pi_forward_list.append([])
for G_p in Gn_median:
dis_tmp, pi_tmp_forward, pi_tmp_backward = GED(G, G_p)
pi_forward_list[idx].append(pi_tmp_forward)
dis_sum += dis_tmp
dis_list.append(dis_sum)
return dis_list, pi_forward_list
def best_median_graphs(Gn_candidate, dis_all, pi_all_forward):
def best_median_graphs(Gn_candidate, pi_all_forward, dis_all):
idx_min_list = np.argwhere(dis_all == np.min(dis_all)).flatten().tolist()
dis_min = dis_all[idx_min_list[0]]
pi_forward_min_list = [pi_all_forward[idx] for idx in idx_min_list]
@@ -508,25 +513,45 @@ def test_iam_moreGraphsAsInit_tryAllPossibleBestGraphs_deleteNodesInIterations(
return G_min_list, pi_forward_min_list, dis_min
def iteration_proc(G, pi_p_forward):
def iteration_proc(G, pi_p_forward, cur_sod):
G_list = [G]
pi_forward_list = [pi_p_forward]
old_sod = cur_sod * 2
sod_list = [cur_sod]
# iterations.
for itr in range(0, 5): # @todo: the convergence condition?
# print('itr is', itr)
itr = 0
while itr < ite_max and np.abs(old_sod - cur_sod) > epsilon:
# for itr in range(0, 5): # the convergence condition?
print('itr is', itr)
G_new_list = []
pi_forward_new_list = []
dis_new_list = []
for idx, G in enumerate(G_list):
label_set = get_node_labels(Gn_median + [G], node_label)
G_tmp_list, pi_forward_tmp_list = generate_graph(
G_tmp_list, pi_forward_tmp_list, dis_tmp_list = generate_graph(
G, pi_forward_list[idx], label_set)
G_new_list += G_tmp_list
pi_forward_new_list += pi_forward_tmp_list
dis_new_list += dis_tmp_list
G_list = G_new_list[:]
pi_forward_list = pi_forward_new_list[:]
dis_list = dis_new_list[:]
old_sod = cur_sod
cur_sod = np.min(dis_list)
sod_list.append(cur_sod)
itr += 1
G_list, idx_list = remove_duplicates(G_list)
pi_forward_list = [pi_forward_list[idx] for idx in idx_list]
# @todo: do we return all graphs or the best ones?
# get the best ones of the generated graphs.
G_list, pi_forward_list, dis_min = best_median_graphs(
G_list, pi_forward_list, dis_list)
if ds_attrs['node_attr_dim'] == 0 and ds_attrs['edge_attr_dim'] == 0:
G_list, idx_list = remove_duplicates(G_list)
pi_forward_list = [pi_forward_list[idx] for idx in idx_list]
# dis_list = [dis_list[idx] for idx in idx_list]
# import matplotlib.pyplot as plt
# for g in G_list:
@@ -535,7 +560,9 @@ def test_iam_moreGraphsAsInit_tryAllPossibleBestGraphs_deleteNodesInIterations(
# print(g.nodes(data=True))
# print(g.edges(data=True))
return G_list, pi_forward_list # do we return all graphs or the best ones?
print('\nsods:', sod_list, '\n')
return G_list, pi_forward_list, dis_min
def remove_duplicates(Gn):
@@ -570,28 +597,37 @@ def test_iam_moreGraphsAsInit_tryAllPossibleBestGraphs_deleteNodesInIterations(
# phase 1: initilize.
# compute set-median.
dis_min = np.inf
dis_all, pi_all_forward = median_distance(Gn_candidate, Gn_median)
dis_list, pi_forward_all = median_distance(Gn_candidate, Gn_median)
# find all smallest distances.
idx_min_list = np.argwhere(dis_all == np.min(dis_all)).flatten().tolist()
dis_min = dis_all[idx_min_list[0]]
idx_min_list = np.argwhere(dis_list == np.min(dis_list)).flatten().tolist()
dis_min = dis_list[idx_min_list[0]]
# phase 2: iteration.
G_list = []
for idx_min in idx_min_list[::-1]:
dis_list = []
pi_forward_list = []
for idx_min in idx_min_list:
# print('idx_min is', idx_min)
G = Gn_candidate[idx_min].copy()
# list of edit operations.
pi_p_forward = pi_all_forward[idx_min]
pi_p_forward = pi_forward_all[idx_min]
# pi_p_backward = pi_all_backward[idx_min]
Gi_list, pi_i_forward_list = iteration_proc(G, pi_p_forward)
Gi_list, pi_i_forward_list, dis_i_min = iteration_proc(G, pi_p_forward, dis_min)
G_list += Gi_list
dis_list.append(dis_i_min)
pi_forward_list += pi_i_forward_list
G_list, _ = remove_duplicates(G_list)
if ds_attrs['node_attr_dim'] == 0 and ds_attrs['edge_attr_dim'] == 0:
G_list, idx_list = remove_duplicates(G_list)
dis_list = [dis_list[idx] for idx in idx_list]
pi_forward_list = [pi_forward_list[idx] for idx in idx_list]
if connected == True:
G_list_con, _ = remove_disconnected(G_list)
# if there is no connected graphs at all, then remain the disconnected ones.
if len(G_list_con) > 0: # @todo: ??????????????????????????
G_list = G_list_con
G_list_con, idx_list = remove_disconnected(G_list)
# if there is no connected graphs at all, then remain the disconnected ones.
if len(G_list_con) > 0: # @todo: ??????????????????????????
G_list = G_list_con
dis_list = [dis_list[idx] for idx in idx_list]
pi_forward_list = [pi_forward_list[idx] for idx in idx_list]

# import matplotlib.pyplot as plt
# for g in G_list:
@@ -601,15 +637,15 @@ def test_iam_moreGraphsAsInit_tryAllPossibleBestGraphs_deleteNodesInIterations(
# print(g.edges(data=True))
# get the best median graphs
dis_all, pi_all_forward = median_distance(G_list, Gn_median)
# dis_list, pi_forward_list = median_distance(G_list, Gn_median)
G_min_list, pi_forward_min_list, dis_min = best_median_graphs(
G_list, dis_all, pi_all_forward)
G_list, pi_forward_list, dis_list)
# for g in G_min_list:
# nx.draw_networkx(g)
# plt.show()
# print(g.nodes(data=True))
# print(g.edges(data=True))
return G_min_list
return G_min_list, dis_min
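
(The quantity these iterations minimize is the sum of distances, SOD, of a
candidate median g to the median set; a one-line sketch in terms of the GED
helper defined earlier in this file:)

def sod(g, Gn_median):
    # GED() returns (distance, pi_forward, pi_backward); sum the distances
    return sum(GED(g, g_p)[0] for g_p in Gn_median)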


if __name__ == '__main__':


+ 218 - 0  preimage/median.py

@@ -0,0 +1,218 @@
import sys
sys.path.insert(0, "../")
#import pathlib
import numpy as np
import networkx as nx
import time
#import librariesImport
#import script
#sys.path.insert(0, "/home/bgauzere/dev/optim-graphes/")
#import pygraph
from pygraph.utils.graphfiles import loadDataset
def replace_graph_in_env(script, graph, old_id, label='median'):
"""
Replace a graph in the environment of `script`.
If old_id is -1, add a new graph to the environment instead.
"""
if(old_id > -1):
script.PyClearGraph(old_id)
new_id = script.PyAddGraph(label)
for i in graph.nodes():
script.PyAddNode(new_id,str(i),graph.node[i]) # !! strings are required by gedlib
for e in graph.edges:
script.PyAddEdge(new_id, str(e[0]),str(e[1]), {})
script.PyInitEnv()
script.PySetMethod("IPFP", "")
script.PyInitMethod()
return new_id
# Draw the current median
def draw_Letter_graph(graph, savepath=''):
import numpy as np
import networkx as nx
import matplotlib.pyplot as plt
plt.figure()
pos = {}
for n in graph.nodes:
pos[n] = np.array([float(graph.node[n]['attributes'][0]),
float(graph.node[n]['attributes'][1])])
nx.draw_networkx(graph, pos)
if savepath != '':
plt.savefig(savepath + str(time.time()) + '.eps', format='eps', dpi=300)
plt.show()
plt.clf()
#compute new mappings
def update_mappings(script,median_id,listID):
med_distances = {}
med_mappings = {}
sod = 0
for i in range(0,len(listID)):
script.PyRunMethod(median_id,listID[i])
med_distances[i] = script.PyGetUpperBound(median_id,listID[i])
med_mappings[i] = script.PyGetForwardMap(median_id,listID[i])
sod += med_distances[i]
return med_distances, med_mappings, sod
def calcul_Sij(all_mappings, all_graphs,i,j):
s_ij = 0
for k in range(0,len(all_mappings)):
cur_graph = all_graphs[k]
cur_mapping = all_mappings[k]
size_graph = cur_graph.order()
if ((cur_mapping[i] < size_graph) and
(cur_mapping[j] < size_graph) and
(cur_graph.has_edge(cur_mapping[i], cur_mapping[j]) == True)):
s_ij += 1
return s_ij
# def update_median_nodes_L1(median,listIdSet,median_id,dataset, mappings):
# from scipy.stats.mstats import gmean
# for i in median.nodes():
# for k in listIdSet:
# vectors = [] #np.zeros((len(listIdSet),2))
# if(k != median_id):
# phi_i = mappings[k][i]
# if(phi_i < dataset[k].order()):
# vectors.append([float(dataset[k].node[phi_i]['x']),float(dataset[k].node[phi_i]['y'])])
# new_labels = gmean(vectors)
# median.node[i]['x'] = str(new_labels[0])
# median.node[i]['y'] = str(new_labels[1])
# return median
def update_median_nodes(median,dataset,mappings):
#update node attributes
for i in median.nodes():
nb_sub=0
mean_label = {'x' : 0, 'y' : 0}
for k in range(0,len(mappings)):
phi_i = mappings[k][i]
if ( phi_i < dataset[k].order() ):
nb_sub += 1
mean_label['x'] += 0.75*float(dataset[k].node[phi_i]['x'])
mean_label['y'] += 0.75*float(dataset[k].node[phi_i]['y'])
median.node[i]['x'] = str((1/0.75)*(mean_label['x']/nb_sub))
median.node[i]['y'] = str((1/0.75)*(mean_label['y']/nb_sub))
return median
def update_median_edges(dataset, mappings, median, cei=0.425,cer=0.425):
# for Letter-high, cei = cer = 1.7, alpha = 0.75
size_dataset = len(dataset)
ratio_cei_cer = cer/(cei + cer)
threshold = size_dataset*ratio_cei_cer
order_graph_median = median.order()
for i in range(0,order_graph_median):
for j in range(i+1,order_graph_median):
s_ij = calcul_Sij(mappings,dataset,i,j)
if(s_ij > threshold):
median.add_edge(i,j)
else:
if(median.has_edge(i,j)):
median.remove_edge(i,j)
return median
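# Intuition for the threshold above: with the defaults cei = cer = 0.425 the
# ratio cer / (cei + cer) is 0.5, so for N graphs the threshold is N / 2 and
# an edge (i, j) is kept in the median exactly when its mapped images appear
# in more than half of the graphs -- a majority vote.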
def compute_median(script, listID, dataset, verbose=False):
"""Compute a graph median of a dataset according to a gedlib environment
Parameters
script : a gedlib-initialized environment
listID (list): a list of graph IDs in script; encodes the dataset
dataset (list): the corresponding graphs in networkX format. We assume that
graph listID[i] corresponds to dataset[i]
Returns:
A networkX graph, which is the median, with the corresponding sod
"""
print(len(listID))
median_set_index, median_set_sod = compute_median_set(script, listID)
print(median_set_index)
print(median_set_sod)
sods = []
# Add the median to the environment
set_median = dataset[median_set_index].copy()
median = dataset[median_set_index].copy()
cur_med_id = replace_graph_in_env(script,median,-1)
med_distances, med_mappings, cur_sod = update_mappings(script,cur_med_id,listID)
sods.append(cur_sod)
if(verbose):
print(cur_sod)
ite_max = 50
old_sod = cur_sod * 2
ite = 0
epsilon = 0.001
while((ite < ite_max) and (np.abs(old_sod - cur_sod) > epsilon )):
median = update_median_nodes(median,dataset, med_mappings)
median = update_median_edges(dataset,med_mappings,median)
cur_med_id = replace_graph_in_env(script,median,cur_med_id)
med_distances, med_mappings, cur_sod = update_mappings(script,cur_med_id,listID)
sods.append(cur_sod)
if(verbose):
print(cur_sod)
ite += 1
return median, cur_sod, sods, set_median
# draw_Letter_graph(median)  # unreachable after the return above
def compute_median_set(script,listID):
'Returns the id in listID corresponding to median set'
# Compute the median set
N=len(listID)
map_id_to_index = {}
map_index_to_id = {}
for i in range(0,len(listID)):
map_id_to_index[listID[i]] = i
map_index_to_id[i] = listID[i]
distances = np.zeros((N,N))
for i in listID:
for j in listID:
script.PyRunMethod(i,j)
distances[map_id_to_index[i],map_id_to_index[j]] = script.PyGetUpperBound(i,j)
median_set_index = np.argmin(np.sum(distances,0))
sod = np.min(np.sum(distances,0))
return median_set_index, sod
#if __name__ == "__main__":
# # Load the dataset
# script.PyLoadGXLGraph('/home/bgauzere/dev/gedlib/data/datasets/Letter/HIGH/', '/home/bgauzere/dev/gedlib/data/collections/Letter_Z.xml')
# script.PySetEditCost("LETTER")
# script.PyInitEnv()
# script.PySetMethod("IPFP", "")
# script.PyInitMethod()
#
# dataset,my_y = pygraph.utils.graphfiles.loadDataset("/home/bgauzere/dev/gedlib/data/datasets/Letter/HIGH/Letter_Z.cxl")
#
# listID = script.PyGetAllGraphIds()
# median, sod = compute_median(script,listID,dataset,verbose=True)
#
# print(sod)
# draw_Letter_graph(median)
if __name__ == '__main__':
# test draw_Letter_graph
ds = {'name': 'Letter-high', 'dataset': '../datasets/Letter-high/Letter-high_A.txt',
'extra_params': {}} # node nsymb
Gn, y_all = loadDataset(ds['dataset'], extra_params=ds['extra_params'])
print(y_all)
for g in Gn:
draw_Letter_graph(g)

+ 423 - 0  preimage/run_gk_iam.py

@@ -0,0 +1,423 @@
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
"""
Created on Thu Jul 4 12:20:16 2019

@author: ljia
"""
import numpy as np
import networkx as nx
import matplotlib.pyplot as plt
import time
from tqdm import tqdm

import sys
sys.path.insert(0, "../")
from pygraph.utils.graphfiles import loadDataset
from median import draw_Letter_graph


# --------------------------- These are tests --------------------------------#
def test_who_is_the_closest_in_kernel_space(Gn):
from gk_iam import compute_kernel, gram2distances  # needed below; assumed to live in gk_iam
idx_gi = [0, 6]
g1 = Gn[idx_gi[0]]
g2 = Gn[idx_gi[1]]
# create the "median" graph.
gnew = g2.copy()
gnew.remove_node(0)
nx.draw_networkx(gnew)
plt.show()
print(gnew.nodes(data=True))
Gn = [gnew] + Gn
# compute gram matrix
Kmatrix = compute_kernel(Gn, 'untilhpathkernel', True)
# the distance matrix
dmatrix = gram2distances(Kmatrix)
print(np.sort(dmatrix[idx_gi[0] + 1]))
print(np.argsort(dmatrix[idx_gi[0] + 1]))
print(np.sort(dmatrix[idx_gi[1] + 1]))
print(np.argsort(dmatrix[idx_gi[1] + 1]))
# for all g in Gn, compute (d(g1, g) + d(g2, g)) / 2
dis_median = [(dmatrix[i, idx_gi[0] + 1] + dmatrix[i, idx_gi[1] + 1]) / 2 for i in range(len(Gn))]
print(np.sort(dis_median))
print(np.argsort(dis_median))
return
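# (For reference: gram2distances is assumed to implement the standard
# kernel-induced distance d(i, j)^2 = K[i, i] + K[j, j] - 2 * K[i, j],
# e.g. as a sketch:
#     d = np.diag(K)
#     dmatrix = np.sqrt(np.maximum(d[:, None] + d[None, :] - 2 * K, 0)))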


def test_who_is_the_closest_in_GED_space(Gn):
from iam import GED
idx_gi = [0, 6]
g1 = Gn[idx_gi[0]]
g2 = Gn[idx_gi[1]]
# create the "median" graph.
gnew = g2.copy()
gnew.remove_node(0)
nx.draw_networkx(gnew)
plt.show()
print(gnew.nodes(data=True))
Gn = [gnew] + Gn
# compute GEDs
ged_matrix = np.zeros((len(Gn), len(Gn)))
for i1 in tqdm(range(len(Gn)), desc='computing GEDs', file=sys.stdout):
for i2 in range(len(Gn)):
dis, _, _ = GED(Gn[i1], Gn[i2], lib='gedlib')
ged_matrix[i1, i2] = dis
print(np.sort(ged_matrix[idx_gi[0] + 1]))
print(np.argsort(ged_matrix[idx_gi[0] + 1]))
print(np.sort(ged_matrix[idx_gi[1] + 1]))
print(np.argsort(ged_matrix[idx_gi[1] + 1]))
# for all g in Gn, compute (GED(g1, g) + GED(g2, g)) / 2
dis_median = [(ged_matrix[i, idx_gi[0] + 1] + ged_matrix[i, idx_gi[1] + 1]) / 2 for i in range(len(Gn))]
print(np.sort(dis_median))
print(np.argsort(dis_median))
return


def test_will_IAM_give_the_median_graph_we_wanted(Gn):
idx_gi = [0, 6]
g1 = Gn[idx_gi[0]].copy()
g2 = Gn[idx_gi[1]].copy()
# del Gn[idx_gi[0]]
# del Gn[idx_gi[1] - 1]
g_median = test_iam_with_more_graphs_as_init([g1, g2], [g1, g2], c_ei=1, c_er=1, c_es=1)
# g_median = test_iam_with_more_graphs_as_init(Gn, Gn, c_ei=1, c_er=1, c_es=1)
nx.draw_networkx(g_median)
plt.show()
print(g_median.nodes(data=True))
print(g_median.edges(data=True))
def test_new_IAM_allGraph_deleteNodes(Gn):
idx_gi = [0, 6]
# g1 = Gn[idx_gi[0]].copy()
# g2 = Gn[idx_gi[1]].copy()

# g1 = nx.Graph(name='haha')
# g1.add_nodes_from([(0, {'atom': 'C'}), (1, {'atom': 'O'}), (2, {'atom': 'C'})])
# g1.add_edges_from([(0, 1, {'bond_type': '1'}), (1, 2, {'bond_type': '1'})])
# g2 = nx.Graph(name='hahaha')
# g2.add_nodes_from([(0, {'atom': 'C'}), (1, {'atom': 'O'}), (2, {'atom': 'C'}),
# (3, {'atom': 'O'}), (4, {'atom': 'C'})])
# g2.add_edges_from([(0, 1, {'bond_type': '1'}), (1, 2, {'bond_type': '1'}),
# (2, 3, {'bond_type': '1'}), (3, 4, {'bond_type': '1'})])
g1 = nx.Graph(name='haha')
g1.add_nodes_from([(0, {'atom': 'C'}), (1, {'atom': 'C'}), (2, {'atom': 'C'}),
(3, {'atom': 'S'}), (4, {'atom': 'S'})])
g1.add_edges_from([(0, 1, {'bond_type': '1'}), (1, 2, {'bond_type': '1'}),
(2, 3, {'bond_type': '1'}), (2, 4, {'bond_type': '1'})])
g2 = nx.Graph(name='hahaha')
g2.add_nodes_from([(0, {'atom': 'C'}), (1, {'atom': 'C'}), (2, {'atom': 'C'}),
(3, {'atom': 'O'}), (4, {'atom': 'O'})])
g2.add_edges_from([(0, 1, {'bond_type': '1'}), (1, 2, {'bond_type': '1'}),
(2, 3, {'bond_type': '1'}), (2, 4, {'bond_type': '1'})])

# g2 = g1.copy()
# g2.add_nodes_from([(3, {'atom': 'O'})])
# g2.add_nodes_from([(4, {'atom': 'C'})])
# g2.add_edges_from([(1, 3, {'bond_type': '1'})])
# g2.add_edges_from([(3, 4, {'bond_type': '1'})])

# del Gn[idx_gi[0]]
# del Gn[idx_gi[1] - 1]
nx.draw_networkx(g1)
plt.show()
print(g1.nodes(data=True))
print(g1.edges(data=True))
nx.draw_networkx(g2)
plt.show()
print(g2.nodes(data=True))
print(g2.edges(data=True))
g_median = test_iam_moreGraphsAsInit_tryAllPossibleBestGraphs_deleteNodesInIterations([g1, g2], [g1, g2], c_ei=1, c_er=1, c_es=1)
# g_median = test_iam_moreGraphsAsInit_tryAllPossibleBestGraphs_deleteNodesInIterations(Gn, Gn, c_ei=1, c_er=1, c_es=1)
nx.draw_networkx(g_median)
plt.show()
print(g_median.nodes(data=True))
print(g_median.edges(data=True))
def test_the_simple_two(Gn, gkernel):
from gk_iam import gk_iam_nearest_multi, compute_kernel
lmbda = 0.03 # termination probability
r_max = 10 # recursions
l = 500
alpha_range = np.linspace(0.5, 0.5, 1)
k = 2 # k nearest neighbors
# randomly select two molecules
np.random.seed(1)
idx_gi = [0, 6] # np.random.randint(0, len(Gn), 2)
g1 = Gn[idx_gi[0]]
g2 = Gn[idx_gi[1]]
Gn_mix = [g.copy() for g in Gn]
Gn_mix.append(g1.copy())
Gn_mix.append(g2.copy())
# g_tmp = iam([g1, g2])
# nx.draw_networkx(g_tmp)
# plt.show()
# compute
# k_list = [] # kernel between each graph and itself.
# k_g1_list = [] # kernel between each graph and g1
# k_g2_list = [] # kernel between each graph and g2
# for ig, g in tqdm(enumerate(Gn), desc='computing self kernels', file=sys.stdout):
# ktemp = compute_kernel([g, g1, g2], 'marginalizedkernel', False)
# k_list.append(ktemp[0][0, 0])
# k_g1_list.append(ktemp[0][0, 1])
# k_g2_list.append(ktemp[0][0, 2])
km = compute_kernel(Gn_mix, gkernel, True)
# k_list = np.diag(km) # kernel between each graph and itself.
# k_g1_list = km[idx_gi[0]] # kernel between each graph and g1
# k_g2_list = km[idx_gi[1]] # kernel between each graph and g2

g_best = []
dis_best = []
# for each alpha
for alpha in alpha_range:
print('alpha =', alpha)
dhat, ghat_list = gk_iam_nearest_multi(Gn, [g1, g2], [alpha, 1 - alpha],
range(len(Gn), len(Gn) + 2), km,
k, r_max,gkernel)
dis_best.append(dhat)
g_best.append(ghat_list)
for idx, item in enumerate(alpha_range):
print('when alpha is', item, 'the shortest distance is', dis_best[idx])
print('the corresponding pre-images are')
for g in g_best[idx]:
nx.draw_networkx(g)
plt.show()
print(g.nodes(data=True))
print(g.edges(data=True))
def test_remove_bests(Gn, gkernel):
from gk_iam import gk_iam_nearest_multi, compute_kernel
lmbda = 0.03 # termination probability
r_max = 10 # recursions
l = 500
alpha_range = np.linspace(0.5, 0.5, 1)
k = 20 # k nearest neighbors
# randomly select two molecules
np.random.seed(1)
idx_gi = [0, 6] # np.random.randint(0, len(Gn), 2)
g1 = Gn[idx_gi[0]]
g2 = Gn[idx_gi[1]]
# remove the best 2 graphs.
del Gn[idx_gi[0]]
del Gn[idx_gi[1] - 1]
# del Gn[8]
Gn_mix = [g.copy() for g in Gn]
Gn_mix.append(g1.copy())
Gn_mix.append(g2.copy())

# compute
km = compute_kernel(Gn_mix, gkernel, True)
g_best = []
dis_best = []
# for each alpha
for alpha in alpha_range:
print('alpha =', alpha)
dhat, ghat_list = gk_iam_nearest_multi(Gn, [g1, g2], [alpha, 1 - alpha],
range(len(Gn), len(Gn) + 2), km,
k, r_max, gkernel)
dis_best.append(dhat)
g_best.append(ghat_list)
for idx, item in enumerate(alpha_range):
print('when alpha is', item, 'the shortest distance is', dis_best[idx])
print('the corresponding pre-images are')
for g in g_best[idx]:
draw_Letter_graph(g)
# nx.draw_networkx(g)
# plt.show()
print(g.nodes(data=True))
print(g.edges(data=True))
def test_gkiam_letter_h():
from gk_iam import gk_iam_nearest_multi, compute_kernel
from iam import median_distance
ds = {'name': 'Letter-high', 'dataset': '../datasets/Letter-high/Letter-high_A.txt',
'extra_params': {}} # node nsymb
# ds = {'name': 'Letter-med', 'dataset': '../datasets/Letter-med/Letter-med_A.txt',
# 'extra_params': {}} # node nsymb
Gn, y_all = loadDataset(ds['dataset'], extra_params=ds['extra_params'])
gkernel = 'structuralspkernel'
lmbda = 0.03 # termination probability
r_max = 3 # recursions
# alpha_range = np.linspace(0.5, 0.5, 1)
k = 10 # k nearest neighbors
# classify graphs according to letters.
idx_dict = get_same_item_indices(y_all)
time_list = []
sod_list = []
sod_min_list = []
for letter in idx_dict:
print('\n-------------------------------------------------------\n')
Gn_let = [Gn[i].copy() for i in idx_dict[letter]]
Gn_mix = Gn_let + [g.copy() for g in Gn_let]
alpha_range = np.linspace(1 / len(Gn_let), 1 / len(Gn_let), 1)
# compute
time0 = time.time()
km = compute_kernel(Gn_mix, gkernel, True)
g_best = []
dis_best = []
# for each alpha
for alpha in alpha_range:
print('alpha =', alpha)
dhat, ghat_list = gk_iam_nearest_multi(Gn_let, Gn_let, [alpha] * len(Gn_let),
range(len(Gn_let), len(Gn_mix)), km,
k, r_max, gkernel, c_ei=1.7,
c_er=1.7, c_es=1.7)
dis_best.append(dhat)
g_best.append(ghat_list)
time_list.append(time.time() - time0)
# show best graphs and save them to file.
for idx, item in enumerate(alpha_range):
print('when alpha is', item, 'the shortest distance is', dis_best[idx])
print('the corresponding pre-images are')
for g in g_best[idx]:
draw_Letter_graph(g, savepath='results/gk_iam/')
# nx.draw_networkx(g)
# plt.show()
print(g.nodes(data=True))
print(g.edges(data=True))
# compute the corresponding sod in graph space. (alpha range not considered.)
sod_tmp, _ = median_distance(g_best[0], Gn_let)
sod_list.append(sod_tmp)
sod_min_list.append(np.min(sod_tmp))
print('\nsods in graph space: ', sod_list)
print('\nsmallest sod in graph space for each letter: ', sod_min_list)
print('\ntimes:', time_list)


def get_same_item_indices(ls):
"""Get the indices of the same items in a list. Return a dict keyed by items.
"""
idx_dict = {}
for idx, item in enumerate(ls):
if item in idx_dict:
idx_dict[item].append(idx)
else:
idx_dict[item] = [idx]
return idx_dict
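
# Example (values illustrative): get_same_item_indices(['A', 'B', 'A'])
# returns {'A': [0, 2], 'B': [1]}; it is used here to group graphs of the
# same letter class by their label in y_all.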


#def compute_letter_median_by_average(Gn):
# return g_median

def test_iam_letter_h():
from iam import test_iam_moreGraphsAsInit_tryAllPossibleBestGraphs_deleteNodesInIterations
from gk_iam import dis_gstar, compute_kernel
ds = {'name': 'Letter-high', 'dataset': '../datasets/Letter-high/Letter-high_A.txt',
'extra_params': {}} # node nsymb
# ds = {'name': 'Letter-med', 'dataset': '../datasets/Letter-med/Letter-med_A.txt',
# 'extra_params': {}} # node nsymb
Gn, y_all = loadDataset(ds['dataset'], extra_params=ds['extra_params'])
lmbda = 0.03 # termination probability
# alpha_range = np.linspace(0.5, 0.5, 1)
# classify graphs according to letters.
idx_dict = get_same_item_indices(y_all)
time_list = []
sod_list = []
sod_min_list = []
for letter in idx_dict:
Gn_let = [Gn[i].copy() for i in idx_dict[letter]]
alpha_range = np.linspace(1 / len(Gn_let), 1 / len(Gn_let), 1)
# compute
g_best = []
dis_best = []
time0 = time.time()
# for each alpha
for alpha in alpha_range:
print('alpha =', alpha)
ghat_list, dhat = test_iam_moreGraphsAsInit_tryAllPossibleBestGraphs_deleteNodesInIterations(
Gn_let, Gn_let, c_ei=1.7, c_er=1.7, c_es=1.7)
dis_best.append(dhat)
g_best.append(ghat_list)
time_list.append(time.time() - time0)
# show best graphs and save them to file.
for idx, item in enumerate(alpha_range):
print('when alpha is', item, 'the shortest distance is', dis_best[idx])
print('the corresponding pre-images are')
for g in g_best[idx]:
draw_Letter_graph(g, savepath='results/iam/')
# nx.draw_networkx(g)
# plt.show()
print(g.nodes(data=True))
print(g.edges(data=True))
# compute the corresponding sod in kernel space. (alpha range not considered.)
gkernel = 'structuralspkernel'
sod_tmp = []
Gn_mix = g_best[0] + Gn_let
km = compute_kernel(Gn_mix, gkernel, True)
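# dis_gstar presumably computes the kernel-space distance between candidate g
# and the weighted mean of the letter class's embeddings: d(g)^2 = k(g, g)
# - 2 * sum_i alpha_i * k(g, g_i) + sum_{i,j} alpha_i * alpha_j * k(g_i, g_j);
# withterm3=False drops the last term, which is constant over all candidates.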
for ig, g in tqdm(enumerate(g_best[0]), desc='computing kernel sod', file=sys.stdout):
dtemp = dis_gstar(ig, range(len(g_best[0]), len(Gn_mix)),
[alpha_range[0]] * len(Gn_let), km, withterm3=False)
sod_tmp.append(dtemp)
sod_list.append(sod_tmp)
sod_min_list.append(np.min(sod_tmp))
print('\nsods in kernel space: ', sod_list)
print('\nsmallest sod in kernel space for each letter: ', sod_min_list)
print('\ntimes:', time_list)

if __name__ == '__main__':
# ds = {'name': 'MUTAG', 'dataset': '../datasets/MUTAG/MUTAG_A.txt',
# 'extra_params': {}} # node/edge symb
ds = {'name': 'Letter-high', 'dataset': '../datasets/Letter-high/Letter-high_A.txt',
'extra_params': {}} # node nsymb
# ds = {'name': 'Acyclic', 'dataset': '../datasets/monoterpenoides/trainset_9.ds',
# 'extra_params': {}}
# ds = {'name': 'Acyclic', 'dataset': '../datasets/acyclic/dataset_bps.ds',
# 'extra_params': {}} # node symb
Gn, y_all = loadDataset(ds['dataset'], extra_params=ds['extra_params'])
# Gn = Gn[0:20]
# import networkx.algorithms.isomorphism as iso
# G1 = nx.MultiDiGraph()
# G2 = nx.MultiDiGraph()
# G1.add_nodes_from([1,2,3], fill='red')
# G2.add_nodes_from([10,20,30,40], fill='red')
# nx.add_path(G1, [1,2,3,4], weight=3, linewidth=2.5)
# nx.add_path(G2, [10,20,30,40], weight=3)
# nm = iso.categorical_node_match('fill', 'red')
# print(nx.is_isomorphic(G1, G2, node_match=nm))
#
# test_new_IAM_allGraph_deleteNodes(Gn)
# test_will_IAM_give_the_median_graph_we_wanted(Gn)
# test_who_is_the_closest_in_GED_space(Gn)
# test_who_is_the_closest_in_kernel_space(Gn)
# test_the_simple_two(Gn, 'untilhpathkernel')
# test_remove_bests(Gn, 'untilhpathkernel')
test_gkiam_letter_h()
# test_iam_letter_h()

+ 15 - 13  pygraph/kernels/commonWalkKernel.py  View File

@@ -23,7 +23,7 @@ from pygraph.utils.parallel import parallel_gm
def commonwalkkernel(*args,
node_label='atom',
edge_label='bond_type',
- n=None,
+ # n=None,
weight=1,
compute_method=None,
n_jobs=None,
@@ -35,26 +35,28 @@ def commonwalkkernel(*args,
List of graphs between which the kernels are calculated.
/
G1, G2 : NetworkX graphs
- 2 graphs between which the kernel is calculated.
+ Two graphs between which the kernel is calculated.
node_label : string
- node attribute used as label. The default node label is atom.
+ Node attribute used as symbolic label. The default node label is 'atom'.
edge_label : string
- edge attribute used as label. The default edge label is bond_type.
- n : integer
- Longest length of walks. Only useful when applying the 'brute' method.
+ Edge attribute used as symbolic label. The default edge label is 'bond_type'.
+ # n : integer
+ # Longest length of walks. Only useful when applying the 'brute' method.
weight: integer
Weight coefficient of different lengths of walks, which represents beta
in 'exp' method and gamma in 'geo'.
compute_method : string
Method used to compute walk kernel. The following choices are
available:
- 'exp' : exponential serial method applied on the direct product graph,
- as shown in reference [1]. The time complexity is O(n^6) for graphs
- with n vertices.
- 'geo' : geometric serial method applied on the direct product graph, as
- shown in reference [1]. The time complexity is O(n^6) for graphs with n
- vertices.
- 'brute' : brute force, simply search for all walks and compare them.
+ 'exp': method based on exponential series applied on the direct
+ product graph, as shown in reference [1]. The time complexity is O(n^6)
+ for graphs with n vertices.
+ 'geo': method based on geometric series applied on the direct product
+ graph, as shown in reference [1]. The time complexity is O(n^6) for
+ graphs with n vertices.
+ # 'brute': brute force, simply search for all walks and compare them.
n_jobs : int
Number of jobs for parallelization.

Return
------
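
A minimal sketch of the geometric variant described above (illustrative code,
not this library's implementation): with gamma small enough, the weighted count
of common walks on the label-consistent direct product graph closes to the
matrix series sum_k (gamma*A)^k = (I - gamma*A)^(-1).

import numpy as np
import networkx as nx

def geo_common_walk_sketch(G1, G2, gamma=0.01, node_label='atom'):
    # direct (tensor) product graph, restricted to vertex pairs whose
    # node labels agree
    Gp = nx.tensor_product(G1, G2)
    keep = [v for v in Gp.nodes
            if G1.nodes[v[0]][node_label] == G2.nodes[v[1]][node_label]]
    A = nx.to_numpy_array(Gp.subgraph(keep))
    # geometric series over walk lengths; converges when
    # gamma < 1 / spectral_radius(A)
    S = np.linalg.inv(np.eye(A.shape[0]) - gamma * A)
    return S.sum()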


+ 9 - 6  pygraph/kernels/marginalizedKernel.py  View File

@@ -44,17 +44,20 @@ def marginalizedkernel(*args,
List of graphs between which the kernels are calculated.
/
G1, G2 : NetworkX graphs
- 2 graphs between which the kernel is calculated.
+ Two graphs between which the kernel is calculated.
node_label : string
- node attribute used as label. The default node label is atom.
+ Node attribute used as symbolic label. The default node label is 'atom'.
edge_label : string
- edge attribute used as label. The default edge label is bond_type.
+ Edge attribute used as symbolic label. The default edge label is 'bond_type'.
p_quit : integer
- the termination probability in the random walks generating step
+ The termination probability in the random walks generating step.
n_iteration : integer
- time of iterations to calculate R_inf
+ Number of iterations used to calculate R_inf.
remove_totters : boolean
- whether to remove totters. The default value is True.
+ Whether to remove totterings by the method introduced in [2]. The default
+ value is False.
+ n_jobs : int
+ Number of jobs for parallelization.

Return
------
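
A Monte-Carlo toy illustrating what p_quit controls (a sketch under the
standard marginalized-walk model, not this library's algorithm; edge labels
are ignored): each walk stops at a node with probability p_quit, and the
kernel is the probability that random walks on the two graphs emit the same
node-label sequence.

import random

def sample_walk_labels(G, p_quit, node_label='atom'):
    # random walk from a uniform start node; quits with probability p_quit
    # at each step, or when stuck at a node without neighbours
    v = random.choice(list(G.nodes))
    seq = [G.nodes[v][node_label]]
    while random.random() > p_quit:
        nbrs = list(G.neighbors(v))
        if not nbrs:
            break
        v = random.choice(nbrs)
        seq.append(G.nodes[v][node_label])
    return tuple(seq)

def mc_marginalized(G1, G2, p_quit=0.5, n_samples=100000):
    # E[1{seq1 == seq2}] = sum_s P1(s) * P2(s): the marginalized kernel with
    # a delta kernel on node labels
    hits = sum(sample_walk_labels(G1, p_quit) == sample_walk_labels(G2, p_quit)
               for _ in range(n_samples))
    return hits / n_samples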


+ 59 - 12  pygraph/kernels/randomWalkKernel.py  View File

@@ -41,15 +41,62 @@ def randomwalkkernel(*args,
List of graphs between which the kernels are calculated.
/
G1, G2 : NetworkX graphs
- 2 graphs between which the kernel is calculated.
- node_label : string
- node attribute used as label. The default node label is atom.
+ Two graphs between which the kernel is calculated.
+ compute_method : string
+ Method used to compute the kernel. The following choices are
+ available:
+ 'sylvester' - Sylvester equation method.
+ 'conjugate' - conjugate gradient method.
+ 'fp' - fixed-point iterations.
+ 'spectral' - spectral decomposition.
+ weight : float
+ A constant weight set for random walks of length h.
+ p : None
+ Initial probability distribution on the unlabeled direct product graph
+ of two graphs. It is set to be uniform over all vertices in the direct
+ product graph.
+ q : None
+ Stopping probability distribution on the unlabeled direct product graph
+ of two graphs. It is set to be uniform over all vertices in the direct
+ product graph.
+ edge_weight : float
+ Edge attribute name corresponding to the edge weight.
+ node_kernels : dict
+ A dictionary of kernel functions for nodes, including 3 items: 'symb'
+ for symbolic node labels, 'nsymb' for non-symbolic node labels, 'mix'
+ for both labels. The first 2 functions take two node labels as
+ parameters, and the 'mix' function takes 4 parameters, a symbolic and a
+ non-symbolic label for each of the two nodes. Each label is in the form
+ of a 2-D array (n_samples, n_features). Each function returns a number
+ as the kernel value. Ignored when nodes are unlabeled. This argument
+ applies to the conjugate gradient method and fixed-point iterations.
+ edge_kernels : dict
+ A dictionary of kernel functions for edges, including 3 items: 'symb'
+ for symbolic edge labels, 'nsymb' for non-symbolic edge labels, 'mix'
+ for both labels. The first 2 functions take two edge labels as
+ parameters, and the 'mix' function takes 4 parameters, a symbolic and a
+ non-symbolic label for each of the two edges. Each label is in the form
+ of a 2-D array (n_samples, n_features). Each function returns a number
+ as the kernel value. Ignored when edges are unlabeled. This argument
+ applies to the conjugate gradient method and fixed-point iterations.
+ node_label : string
+ Node attribute used as label. The default node label is atom. This
+ argument applies to the conjugate gradient method and fixed-point
+ iterations.
edge_label : string
- edge attribute used as label. The default edge label is bond_type.
- h : integer
- Longest length of walks.
- method : string
- Method used to compute the random walk kernel. Available methods are 'sylvester', 'conjugate', 'fp', 'spectral' and 'kron'.
+ Edge attribute used as label. The default edge label is bond_type. This
+ argument applies to the conjugate gradient method and fixed-point
+ iterations.
+ sub_kernel : string
+ Method used to compute the walk kernel. The following choices are
+ available:
+ 'exp' : method based on exponential series.
+ 'geo' : method based on geometric series.
+ n_jobs : int
+ Number of jobs for parallelization.

Return
------
@@ -168,7 +215,7 @@ def _sylvester_equation(Gn, lmda, p, q, eweight, n_jobs, verbose=True):

if q == None:
# don't normalize adjacency matrices if q is a uniform vector. Note
- # A_wave_list accually contains the transposes of the adjacency matrices.
+ # A_wave_list actually contains the transposes of the adjacency matrices.
A_wave_list = [
nx.adjacency_matrix(G, eweight).todense().transpose() for G in
(tqdm(Gn, desc='compute adjacency matrices', file=sys.stdout) if
@@ -259,7 +306,7 @@ def _conjugate_gradient(Gn, lmda, p, q, ds_attrs, node_kernels, edge_kernels,
# # this is faster from unlabeled graphs. @todo: why?
# if q == None:
# # don't normalize adjacency matrices if q is a uniform vector. Note
- # # A_wave_list accually contains the transposes of the adjacency matrices.
+ # # A_wave_list actually contains the transposes of the adjacency matrices.
# A_wave_list = [
# nx.adjacency_matrix(G, eweight).todense().transpose() for G in
# tqdm(Gn, desc='compute adjacency matrices', file=sys.stdout)
@@ -376,7 +423,7 @@ def _fixed_point(Gn, lmda, p, q, ds_attrs, node_kernels, edge_kernels,
# # this is faster from unlabeled graphs. @todo: why?
# if q == None:
# # don't normalize adjacency matrices if q is a uniform vector. Note
- # # A_wave_list accually contains the transposes of the adjacency matrices.
+ # # A_wave_list actually contains the transposes of the adjacency matrices.
# A_wave_list = [
# nx.adjacency_matrix(G, eweight).todense().transpose() for G in
# tqdm(Gn, desc='compute adjacency matrices', file=sys.stdout)
@@ -481,7 +528,7 @@ def _spectral_decomposition(Gn, weight, p, q, sub_kernel, eweight, n_jobs, verbo
for G in (tqdm(Gn, desc='spectral decompose', file=sys.stdout) if
verbose else Gn):
# don't normalize adjacency matrices if q is a uniform vector. Note
- # A accually is the transpose of the adjacency matrix.
+ # A actually is the transpose of the adjacency matrix.
A = nx.adjacency_matrix(G, eweight).todense().transpose()
ew, ev = np.linalg.eig(A)
D_list.append(ew)
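
A minimal sketch of the node_kernels / edge_kernels dictionaries documented
above (function names are illustrative, not this library's API; for clarity
each function here compares one pair of labels, whereas the docstring
describes a vectorized 2-D form):

import numpy as np

def delta_kernel(x, y):
    # symbolic labels: 1 if equal, 0 otherwise
    return float(x == y)

def gaussian_kernel(x, y, gamma=1.0):
    # non-symbolic (vector-valued) labels
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    return np.exp(-gamma * np.sum((x - y) ** 2))

def mix_kernel(x_symb, y_symb, x_nsymb, y_nsymb):
    # both label kinds: product of the two sub-kernels
    return delta_kernel(x_symb, y_symb) * gaussian_kernel(x_nsymb, y_nsymb)

node_kernels = {'symb': delta_kernel, 'nsymb': gaussian_kernel, 'mix': mix_kernel}
edge_kernels = {'symb': delta_kernel, 'nsymb': gaussian_kernel, 'mix': mix_kernel}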


+ 5 - 3  pygraph/kernels/spKernel.py  View File

@@ -33,12 +33,12 @@ def spkernel(*args,
List of graphs between which the kernels are calculated.
/
G1, G2 : NetworkX graphs
- 2 graphs between which the kernel is calculated.
+ Two graphs between which the kernel is calculated.
node_label : string
- node attribute used as label. The default node label is atom.
+ Node attribute used as label. The default node label is atom.
edge_weight : string
Edge attribute name corresponding to the edge weight.
- node_kernels: dict
+ node_kernels : dict
A dictionary of kernel functions for nodes, including 3 items: 'symb'
for symbolic node labels, 'nsymb' for non-symbolic node labels, 'mix'
for both labels. The first 2 functions take two node labels as
@@ -46,6 +46,8 @@ def spkernel(*args,
non-symbolic label for each of the two nodes. Each label is in the form of a
2-D array (n_samples, n_features). Each function returns a
number as the kernel value. Ignored when nodes are unlabeled.
+ n_jobs : int
+ Number of jobs for parallelization.

Return
------


+ 14 - 6  pygraph/kernels/structuralspKernel.py  View File

@@ -42,14 +42,15 @@ def structuralspkernel(*args,
List of graphs between which the kernels are calculated.
/
G1, G2 : NetworkX graphs
- 2 graphs between which the kernel is calculated.
+ Two graphs between which the kernel is calculated.
node_label : string
- node attribute used as label. The default node label is atom.
+ Node attribute used as label. The default node label is atom.
edge_weight : string
- Edge attribute name corresponding to the edge weight.
+ Edge attribute name corresponding to the edge weight. Applied for the
+ computation of the shortest paths.
edge_label : string
- edge attribute used as label. The default edge label is bond_type.
- node_kernels: dict
+ Edge attribute used as label. The default edge label is bond_type.
+ node_kernels : dict
A dictionary of kernel functions for nodes, including 3 items: 'symb'
for symbolic node labels, 'nsymb' for non-symbolic node labels, 'mix'
for both labels. The first 2 functions take two node labels as
@@ -57,7 +58,7 @@ def structuralspkernel(*args,
non-symbolic label for each of the two nodes. Each label is in the form of a
2-D array (n_samples, n_features). Each function returns a number
as the kernel value. Ignored when nodes are unlabeled.
- edge_kernels: dict
+ edge_kernels : dict
A dictionary of kernel functions for edges, including 3 items: 'symb'
for symbolic edge labels, 'nsymb' for non-symbolic edge labels, 'mix'
for both labels. The first 2 functions take two edge labels as
@@ -65,6 +66,13 @@ def structuralspkernel(*args,
non-symbolic label for each of the two edges. Each label is in the form of a
2-D array (n_samples, n_features). Each function returns a number
as the kernel value. Ignored when edges are unlabeled.
+ compute_method : string
+ Computation method to store the shortest paths and compute the graph
+ kernel. The following choices are available:
+ 'trie': store paths as tries.
+ 'naive': store paths in lists.
+ n_jobs : int
+ Number of jobs for parallelization.

Return
------
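
A tiny illustration of the two storage strategies named above (illustrative
code, not this library's data structure): 'naive' keeps every path as a flat
list, while 'trie' merges shared prefixes into a nested-dict trie, which
saves space when many shortest paths overlap.

def paths_to_trie(paths):
    # nested-dict trie; the '#end#' key marks where a stored path terminates
    trie = {}
    for path in paths:
        node = trie
        for label in path:
            node = node.setdefault(label, {})
        node['#end#'] = True
    return trie

paths = [['C', 'C', 'O'], ['C', 'C', 'N']]   # 'naive': store the lists as-is
trie = paths_to_trie(paths)                  # 'trie': the 'C'->'C' prefix is shared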


+ 11 - 3  pygraph/kernels/treeletKernel.py  View File

@@ -40,11 +40,19 @@ def treeletkernel(*args,
The sub-kernel between 2 real number vectors. Each vector counts the
numbers of isomorphic treelets in a graph.
node_label : string
- Node attribute used as label. The default node label is atom.
+ Node attribute used as label. The default node label is atom.
edge_label : string
Edge attribute used as label. The default edge label is bond_type.
- labeled : boolean
- Whether the graphs are labeled. The default is True.
+ parallel : string/None
+ Which parallelization method is applied to compute the kernel. The
+ following choices are available:
+ 'imap_unordered': use Python's multiprocessing.Pool.imap_unordered
+ method.
+ None: no parallelization is applied.
+ n_jobs : int
+ Number of jobs for parallelization. The default is to use all
+ computational cores. This argument is only valid when a
+ parallelization method is applied.

Return
------
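
A hedged example of the sub_kernel argument described above (the function
name is illustrative, not this library's API): a Gaussian kernel applied to
two treelet-count vectors.

import numpy as np

def gaussian_subkernel(x, y, gamma=1.0):
    # x, y: vectors counting isomorphic treelets in each graph
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    return np.exp(-gamma * np.sum((x - y) ** 2))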


+ 13 - 5  pygraph/kernels/untilHPathKernel.py  View File

@@ -26,7 +26,7 @@ def untilhpathkernel(*args,
node_label='atom',
edge_label='bond_type',
depth=10,
- k_func='tanimoto',
+ k_func='MinMax',
compute_method='trie',
n_jobs=None,
verbose=True):
@@ -38,7 +38,7 @@ def untilhpathkernel(*args,
List of graphs between which the kernels are calculated.
/
G1, G2 : NetworkX graphs
- 2 graphs between which the kernel is calculated.
+ Two graphs between which the kernel is calculated.
node_label : string
Node attribute used as label. The default node label is atom.
edge_label : string
@@ -47,9 +47,17 @@ def untilhpathkernel(*args,
Depth of search. Longest length of paths.
k_func : function
A kernel function applied using different notions of fingerprint
- similarity.
- compute_method: string
- Computation method, 'trie' or 'naive'.
+ similarity, defining the type of feature map and normalization method
+ applied for the graph kernel. The following choices are available:
+ 'MinMax': use the MinMax kernel and counting feature map.
+ 'tanimoto': use the Tanimoto kernel and binary feature map.
+ compute_method : string
+ Computation method to store paths and compute the graph kernel. The
+ following choices are available:
+ 'trie': store paths as tries.
+ 'naive': store paths in lists.
+ n_jobs : int
+ Number of jobs for parallelization.

Return
------
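
A hedged sketch of the two fingerprint similarities named above, on toy
path-count vectors (the real kernel builds such vectors from all paths up to
length 'depth'):

import numpy as np

def minmax_kernel(x, y):
    # counting feature map: per-feature minima over per-feature maxima
    return np.sum(np.minimum(x, y)) / np.sum(np.maximum(x, y))

def tanimoto_kernel(x, y):
    # binary feature map: intersection over union of the present features
    xb, yb = x > 0, y > 0
    return np.sum(xb & yb) / np.sum(xb | yb)

x = np.array([2, 0, 1])          # toy path counts in graph 1
y = np.array([1, 1, 1])          # toy path counts in graph 2
print(minmax_kernel(x, y))       # (1+0+1) / (2+1+1) = 0.5
print(tanimoto_kernel(x, y))     # 2 shared features / 3 total = 0.666...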


+ 18 - 5  pygraph/kernels/weisfeilerLehmanKernel.py  View File

@@ -38,15 +38,28 @@ def weisfeilerlehmankernel(*args,
List of graphs between which the kernels are calculated.
/
G1, G2 : NetworkX graphs
- 2 graphs between which the kernel is calculated.
+ Two graphs between which the kernel is calculated.
node_label : string
- node attribute used as label. The default node label is atom.
+ Node attribute used as label. The default node label is atom.
edge_label : string
- edge attribute used as label. The default edge label is bond_type.
+ Edge attribute used as label. The default edge label is bond_type.
height : int
- subtree height
+ Subtree height.
base_kernel : string
- base kernel used in each iteration of WL kernel. The default base kernel is subtree kernel. For user-defined kernel, base_kernel is the name of the base kernel function used in each iteration of WL kernel. This function returns a Numpy matrix, each element of which is the user-defined Weisfeiler-Lehman kernel between 2 praphs.
+ Base kernel used in each iteration of the WL kernel. Only the default
+ 'subtree' kernel can be applied for now.
+ # The default base
+ # kernel is subtree kernel. For user-defined kernel, base_kernel is the
+ # name of the base kernel function used in each iteration of WL kernel.
+ # This function returns a Numpy matrix, each element of which is the
+ # user-defined Weisfeiler-Lehman kernel between 2 graphs.
+ parallel : None
+ Which parallelization method is applied to compute the kernel. No
+ parallelization can be applied for now.
+ n_jobs : int
+ Number of jobs for parallelization. The default is to use all
+ computational cores. This argument is only valid when a
+ parallelization method is applied, and can be ignored for now.

Return
------
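
A minimal sketch of one WL-subtree relabeling round (illustrative, not this
library's implementation); the 'height' parameter above is the number of such
rounds applied before per-label feature counts are compared:

import networkx as nx

def wl_relabel_once(G, node_label='atom'):
    # each node's new label compresses its own label together with the
    # sorted multiset of its neighbours' labels
    new_labels = {}
    for v in G.nodes:
        neigh = tuple(sorted(str(G.nodes[u][node_label]) for u in G.neighbors(v)))
        new_labels[v] = str((G.nodes[v][node_label], neigh))
    nx.set_node_attributes(G, new_labels, node_label)
    return G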

