@@ -0,0 +1,65 @@
Node labels: [symbol]
Node attributes: [chem, charge, x, y]
Edge labels: [valence]

Node labels were converted to integer values using this map:

Component 0:
	0	C
	1	O
	2	N
	3	Cl
	4	F
	5	S
	6	Se
	7	P
	8	Na
	9	I
	10	Co
	11	Br
	12	Li
	13	Si
	14	Mg
	15	Cu
	16	As
	17	B
	18	Pt
	19	Ru
	20	K
	21	Pd
	22	Au
	23	Te
	24	W
	25	Rh
	26	Zn
	27	Bi
	28	Pb
	29	Ge
	30	Sb
	31	Sn
	32	Ga
	33	Hg
	34	Ho
	35	Tl
	36	Ni
	37	Tb

Edge labels were converted to integer values using this map:

Component 0:
	0	1
	1	2
	2	3

Class labels were converted to integer values using this map:

	0	a
	1	i
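Working with these files means mapping the integers back to symbols at analysis time. A minimal sketch of the inverse lookup, with the dictionaries transcribed from the tables above (the node map is truncated for brevity; `decode_nodes` is an illustrative helper name, not part of the dataset tooling):

```python
# Transcribed from the conversion tables above (node map truncated for brevity).
NODE_SYMBOL = {0: 'C', 1: 'O', 2: 'N', 3: 'Cl', 4: 'F', 5: 'S', 6: 'Se', 7: 'P'}
EDGE_VALENCE = {0: 1, 1: 2, 2: 3}   # integer edge label -> bond valence
CLASS_LABEL = {0: 'a', 1: 'i'}

def decode_nodes(int_labels):
    """Map integer-encoded node labels back to element symbols."""
    return [NODE_SYMBOL[i] for i in int_labels]
```

The same pattern applies to edge and class labels; only the lookup table changes.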
@@ -0,0 +1,75 @@
README for dataset DD

=== Usage ===

This folder contains the following comma-separated text files
(replace DS by the name of the dataset):

n = total number of nodes
m = total number of edges
N = number of graphs

(1) DS_A.txt (m lines)
    sparse (block diagonal) adjacency matrix for all graphs;
    each line corresponds to (row, col), i.e. (node_id, node_id)

(2) DS_graph_indicator.txt (n lines)
    column vector of graph identifiers for all nodes of all graphs;
    the value in the i-th line is the graph_id of the node with node_id i

(3) DS_graph_labels.txt (N lines)
    class labels for all graphs in the dataset;
    the value in the i-th line is the class label of the graph with graph_id i

(4) DS_node_labels.txt (n lines)
    column vector of node labels;
    the value in the i-th line corresponds to the node with node_id i

There are OPTIONAL files if the respective information is available:

(5) DS_edge_labels.txt (m lines; same size as DS_A.txt)
    labels for the edges in DS_A.txt

(6) DS_edge_attributes.txt (m lines; same size as DS_A.txt)
    attributes for the edges in DS_A.txt

(7) DS_node_attributes.txt (n lines)
    matrix of node attributes;
    the comma-separated values in the i-th line form the attribute vector of the node with node_id i

(8) DS_graph_attributes.txt (N lines)
    regression values for all graphs in the dataset;
    the value in the i-th line is the attribute of the graph with graph_id i

=== Description ===

D&D is a dataset of 1178 protein structures (Dobson and Doig, 2003). Each protein is
represented by a graph, in which the nodes are amino acids and two nodes are connected
by an edge if they are less than 6 Angstroms apart. The prediction task is to classify
the protein structures into enzymes and non-enzymes.

=== Previous Use of the Dataset ===

Neumann, M., Garnett, R., Bauckhage, C., Kersting, K.: Propagation Kernels: Efficient Graph
Kernels from Propagated Information. Under review at MLJ.

Neumann, M., Patricia, N., Garnett, R., Kersting, K.: Efficient Graph Kernels by
Randomization. In: P.A. Flach, T.D. Bie, N. Cristianini (eds.) ECML/PKDD, Lecture Notes in
Computer Science, vol. 7523, pp. 378-393. Springer (2012).

Shervashidze, N., Schweitzer, P., van Leeuwen, E., Mehlhorn, K., Borgwardt, K.:
Weisfeiler-Lehman Graph Kernels. Journal of Machine Learning Research 12, 2539-2561 (2011).

=== References ===

P. D. Dobson and A. J. Doig. Distinguishing enzyme structures from non-enzymes without
alignments. J. Mol. Biol., 330(4):771–783, Jul 2003.
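The Usage section above maps directly onto a small parser. A sketch under the stated format (1-based node and graph IDs, one comma-separated `(row, col)` pair per adjacency line); `load_tu_dataset` is an illustrative name, and the plain-dict representation is a stand-in for whatever graph library is actually used:

```python
from collections import defaultdict

def load_tu_dataset(prefix):
    """Parse the text format described above into per-graph node/edge lists.

    Returns a dict: graph_id -> {"nodes": [...], "edges": [...], "node_labels": {...}}.
    Node and graph IDs are 1-based, as in the files themselves.
    """
    # DS_graph_indicator.txt: line i holds the graph_id of the node with node_id i
    with open(prefix + "_graph_indicator.txt") as f:
        graph_of_node = {i: int(line) for i, line in enumerate(f, start=1)}

    graphs = defaultdict(lambda: {"nodes": [], "edges": [], "node_labels": {}})
    for node_id, gid in graph_of_node.items():
        graphs[gid]["nodes"].append(node_id)

    # DS_A.txt: one "row, col" pair per edge; both endpoints lie in the same graph
    with open(prefix + "_A.txt") as f:
        for line in f:
            u, v = (int(x) for x in line.replace(",", " ").split())
            graphs[graph_of_node[u]]["edges"].append((u, v))

    # DS_node_labels.txt: line i holds the (integer) label of node i
    with open(prefix + "_node_labels.txt") as f:
        for node_id, line in enumerate(f, start=1):
            graphs[graph_of_node[node_id]]["node_labels"][node_id] = int(line)

    return dict(graphs)
```

The optional files (edge labels, node/edge attributes, graph attributes) follow the same line-per-entry scheme and can be folded in the same way.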
@@ -0,0 +1,70 @@
README for dataset NCI1

=== Usage ===

This folder contains the following comma-separated text files
(replace DS by the name of the dataset):

n = total number of nodes
m = total number of edges
N = number of graphs

(1) DS_A.txt (m lines)
    sparse (block diagonal) adjacency matrix for all graphs;
    each line corresponds to (row, col), i.e. (node_id, node_id)

(2) DS_graph_indicator.txt (n lines)
    column vector of graph identifiers for all nodes of all graphs;
    the value in the i-th line is the graph_id of the node with node_id i

(3) DS_graph_labels.txt (N lines)
    class labels for all graphs in the dataset;
    the value in the i-th line is the class label of the graph with graph_id i

(4) DS_node_labels.txt (n lines)
    column vector of node labels;
    the value in the i-th line corresponds to the node with node_id i

There are OPTIONAL files if the respective information is available:

(5) DS_edge_labels.txt (m lines; same size as DS_A.txt)
    labels for the edges in DS_A.txt

(6) DS_edge_attributes.txt (m lines; same size as DS_A.txt)
    attributes for the edges in DS_A.txt

(7) DS_node_attributes.txt (n lines)
    matrix of node attributes;
    the comma-separated values in the i-th line form the attribute vector of the node with node_id i

(8) DS_graph_attributes.txt (N lines)
    regression values for all graphs in the dataset;
    the value in the i-th line is the attribute of the graph with graph_id i

=== Description ===

NCI1 and NCI109 represent two balanced subsets of datasets of chemical compounds screened
for activity against non-small cell lung cancer and ovarian cancer cell lines, respectively
(Wale and Karypis, 2006; http://pubchem.ncbi.nlm.nih.gov).

=== Previous Use of the Dataset ===

Neumann, M., Garnett, R., Bauckhage, C., Kersting, K.: Propagation Kernels: Efficient Graph
Kernels from Propagated Information. Under review at MLJ.

Neumann, M., Patricia, N., Garnett, R., Kersting, K.: Efficient Graph Kernels by
Randomization. In: P.A. Flach, T.D. Bie, N. Cristianini (eds.) ECML/PKDD, Lecture Notes in
Computer Science, vol. 7523, pp. 378-393. Springer (2012).

Shervashidze, N., Schweitzer, P., van Leeuwen, E., Mehlhorn, K., Borgwardt, K.:
Weisfeiler-Lehman Graph Kernels. Journal of Machine Learning Research 12, 2539-2561 (2011).

=== References ===

N. Wale and G. Karypis. Comparison of descriptor spaces for chemical compound retrieval and
classification. In Proc. of ICDM, pages 678–689, Hong Kong, 2006.
@@ -0,0 +1,70 @@
README for dataset NCI109

=== Usage ===

This folder contains the following comma-separated text files
(replace DS by the name of the dataset):

n = total number of nodes
m = total number of edges
N = number of graphs

(1) DS_A.txt (m lines)
    sparse (block diagonal) adjacency matrix for all graphs;
    each line corresponds to (row, col), i.e. (node_id, node_id)

(2) DS_graph_indicator.txt (n lines)
    column vector of graph identifiers for all nodes of all graphs;
    the value in the i-th line is the graph_id of the node with node_id i

(3) DS_graph_labels.txt (N lines)
    class labels for all graphs in the dataset;
    the value in the i-th line is the class label of the graph with graph_id i

(4) DS_node_labels.txt (n lines)
    column vector of node labels;
    the value in the i-th line corresponds to the node with node_id i

There are OPTIONAL files if the respective information is available:

(5) DS_edge_labels.txt (m lines; same size as DS_A.txt)
    labels for the edges in DS_A.txt

(6) DS_edge_attributes.txt (m lines; same size as DS_A.txt)
    attributes for the edges in DS_A.txt

(7) DS_node_attributes.txt (n lines)
    matrix of node attributes;
    the comma-separated values in the i-th line form the attribute vector of the node with node_id i

(8) DS_graph_attributes.txt (N lines)
    regression values for all graphs in the dataset;
    the value in the i-th line is the attribute of the graph with graph_id i

=== Description ===

NCI1 and NCI109 represent two balanced subsets of datasets of chemical compounds screened
for activity against non-small cell lung cancer and ovarian cancer cell lines, respectively
(Wale and Karypis, 2006; http://pubchem.ncbi.nlm.nih.gov).

=== Previous Use of the Dataset ===

Neumann, M., Garnett, R., Bauckhage, C., Kersting, K.: Propagation Kernels: Efficient Graph
Kernels from Propagated Information. Under review at MLJ.

Neumann, M., Patricia, N., Garnett, R., Kersting, K.: Efficient Graph Kernels by
Randomization. In: P.A. Flach, T.D. Bie, N. Cristianini (eds.) ECML/PKDD, Lecture Notes in
Computer Science, vol. 7523, pp. 378-393. Springer (2012).

Shervashidze, N., Schweitzer, P., van Leeuwen, E., Mehlhorn, K., Borgwardt, K.:
Weisfeiler-Lehman Graph Kernels. Journal of Machine Learning Research 12, 2539-2561 (2011).

=== References ===

N. Wale and G. Karypis. Comparison of descriptor spaces for chemical compound retrieval and
classification. In Proc. of ICDM, pages 678–689, Hong Kong, 2006.
@@ -12,21 +12,21 @@ import multiprocessing
 from pygraph.kernels.commonWalkKernel import commonwalkkernel
 dslist = [
-# {'name': 'Acyclic', 'dataset': '../datasets/acyclic/dataset_bps.ds',
-# 'task': 'regression'}, # node symb
-# {'name': 'Alkane', 'dataset': '../datasets/Alkane/dataset.ds', 'task': 'regression',
-# 'dataset_y': '../datasets/Alkane/dataset_boiling_point_names.txt'},
-# # contains single node graph, node symb
-# {'name': 'MAO', 'dataset': '../datasets/MAO/dataset.ds'}, # node/edge symb
-# {'name': 'PAH', 'dataset': '../datasets/PAH/dataset.ds'}, # unlabeled
-# {'name': 'MUTAG', 'dataset': '../datasets/MUTAG/MUTAG_A.txt'}, # node/edge symb
-# {'name': 'Letter-med', 'dataset': '../datasets/Letter-med/Letter-med_A.txt'},
-# # node nsymb
-# {'name': 'ENZYMES', 'dataset': '../datasets/ENZYMES_txt/ENZYMES_A_sparse.txt'},
-# # node symb/nsymb
+{'name': 'Alkane', 'dataset': '../datasets/Alkane/dataset.ds', 'task': 'regression',
+ 'dataset_y': '../datasets/Alkane/dataset_boiling_point_names.txt'},
+# contains single node graph, node symb
+{'name': 'Acyclic', 'dataset': '../datasets/acyclic/dataset_bps.ds',
+ 'task': 'regression'}, # node symb
+{'name': 'MAO', 'dataset': '../datasets/MAO/dataset.ds'}, # node/edge symb
+{'name': 'PAH', 'dataset': '../datasets/PAH/dataset.ds'}, # unlabeled
+{'name': 'MUTAG', 'dataset': '../datasets/MUTAG/MUTAG_A.txt'}, # node/edge symb
+{'name': 'Letter-med', 'dataset': '../datasets/Letter-med/Letter-med_A.txt'},
+# node nsymb
+{'name': 'AIDS', 'dataset': '../datasets/AIDS/AIDS_A.txt'}, # node symb/nsymb, edge symb
+{'name': 'ENZYMES', 'dataset': '../datasets/ENZYMES_txt/ENZYMES_A_sparse.txt'},
+# node symb/nsymb
 # {'name': 'NCI1', 'dataset': '../datasets/NCI1/NCI1_A.txt'}, # node symb
 # {'name': 'NCI109', 'dataset': '../datasets/NCI109/NCI109_A.txt'}, # node symb
-{'name': 'AIDS', 'dataset': '../datasets/AIDS/AIDS_A.txt'}, # node symb/nsymb, edge symb
 # {'name': 'D&D', 'dataset': '../datasets/DD/DD_A.txt'}, # node symb
 #
 # {'name': 'Mutagenicity', 'dataset': '../datasets/Mutagenicity/Mutagenicity_A.txt'},
@@ -12,22 +12,22 @@ import multiprocessing
 from pygraph.kernels.marginalizedKernel import marginalizedkernel
 dslist = [
-# {'name': 'Acyclic', 'dataset': '../datasets/acyclic/dataset_bps.ds',
-# 'task': 'regression'}, # node symb
-# {'name': 'Alkane', 'dataset': '../datasets/Alkane/dataset.ds', 'task': 'regression',
-# 'dataset_y': '../datasets/Alkane/dataset_boiling_point_names.txt'},
-# # contains single node graph, node symb
-# {'name': 'MAO', 'dataset': '../datasets/MAO/dataset.ds'}, # node/edge symb
-# {'name': 'PAH', 'dataset': '../datasets/PAH/dataset.ds'}, # unlabeled
-# {'name': 'MUTAG', 'dataset': '../datasets/MUTAG/MUTAG_A.txt'}, # node/edge symb
-# {'name': 'Letter-med', 'dataset': '../datasets/Letter-med/Letter-med_A.txt'},
-# # node nsymb
-# {'name': 'ENZYMES', 'dataset': '../datasets/ENZYMES_txt/ENZYMES_A_sparse.txt'},
-# # node symb/nsymb
+{'name': 'Alkane', 'dataset': '../datasets/Alkane/dataset.ds', 'task': 'regression',
+ 'dataset_y': '../datasets/Alkane/dataset_boiling_point_names.txt'},
+# contains single node graph, node symb
+{'name': 'Acyclic', 'dataset': '../datasets/acyclic/dataset_bps.ds',
+ 'task': 'regression'}, # node symb
+{'name': 'MAO', 'dataset': '../datasets/MAO/dataset.ds'}, # node/edge symb
+{'name': 'PAH', 'dataset': '../datasets/PAH/dataset.ds'}, # unlabeled
+{'name': 'MUTAG', 'dataset': '../datasets/MUTAG/MUTAG_A.txt'}, # node/edge symb
+{'name': 'Letter-med', 'dataset': '../datasets/Letter-med/Letter-med_A.txt'},
+# node nsymb
+{'name': 'ENZYMES', 'dataset': '../datasets/ENZYMES_txt/ENZYMES_A_sparse.txt'},
+# node symb/nsymb
 # {'name': 'AIDS', 'dataset': '../datasets/AIDS/AIDS_A.txt'}, # node symb/nsymb, edge symb
 # {'name': 'NCI1', 'dataset': '../datasets/NCI1/NCI1_A.txt'}, # node symb
 # {'name': 'NCI109', 'dataset': '../datasets/NCI109/NCI109_A.txt'}, # node symb
-{'name': 'D&D', 'dataset': '../datasets/DD/DD_A.txt'}, # node symb
+# {'name': 'D&D', 'dataset': '../datasets/DD/DD_A.txt'}, # node symb
 #
 # {'name': 'Mutagenicity', 'dataset': '../datasets/Mutagenicity/Mutagenicity_A.txt'},
 # # node/edge symb
@@ -17,22 +17,23 @@ import numpy as np
 dslist = [
-# {'name': 'Acyclic', 'dataset': '../datasets/acyclic/dataset_bps.ds',
-# 'task': 'regression'}, # node symb
-# {'name': 'Alkane', 'dataset': '../datasets/Alkane/dataset.ds', 'task': 'regression',
-# 'dataset_y': '../datasets/Alkane/dataset_boiling_point_names.txt'},
-# # contains single node graph, node symb
-# {'name': 'MAO', 'dataset': '../datasets/MAO/dataset.ds'}, # node/edge symb
-# {'name': 'PAH', 'dataset': '../datasets/PAH/dataset.ds'}, # unlabeled
-# {'name': 'MUTAG', 'dataset': '../datasets/MUTAG/MUTAG_A.txt'}, # node/edge symb
-# {'name': 'ENZYMES', 'dataset': '../datasets/ENZYMES_txt/ENZYMES_A_sparse.txt'},
-# # node symb/nsymb
-# {'name': 'NCI1', 'dataset': '../datasets/NCI1/NCI1_A.txt'}, # node symb
-# {'name': 'NCI109', 'dataset': '../datasets/NCI109/NCI109_A.txt'}, # node symb
+{'name': 'Alkane', 'dataset': '../datasets/Alkane/dataset.ds', 'task': 'regression',
+ 'dataset_y': '../datasets/Alkane/dataset_boiling_point_names.txt'},
+# contains single node graph, node symb
+{'name': 'Acyclic', 'dataset': '../datasets/acyclic/dataset_bps.ds',
+ 'task': 'regression'}, # node symb
+{'name': 'MAO', 'dataset': '../datasets/MAO/dataset.ds'}, # node/edge symb
+{'name': 'PAH', 'dataset': '../datasets/PAH/dataset.ds'}, # unlabeled
+{'name': 'MUTAG', 'dataset': '../datasets/MUTAG/MUTAG_A.txt'}, # node/edge symb
+{'name': 'Letter-med', 'dataset': '../datasets/Letter-med/Letter-med_A.txt'},
+# node nsymb
+{'name': 'ENZYMES', 'dataset': '../datasets/ENZYMES_txt/ENZYMES_A_sparse.txt'},
+# node symb/nsymb
 {'name': 'AIDS', 'dataset': '../datasets/AIDS/AIDS_A.txt'}, # node symb/nsymb, edge symb
-# {'name': 'D&D', 'dataset': '../datasets/DD/DD_A.txt'}, # node symb
-# {'name': 'Letter-med', 'dataset': '../datasets/Letter-med/Letter-med_A.txt'},
-# # node nsymb
+{'name': 'NCI1', 'dataset': '../datasets/NCI1/NCI1_A.txt'}, # node symb
+{'name': 'NCI109', 'dataset': '../datasets/NCI109/NCI109_A.txt'}, # node symb
+{'name': 'D&D', 'dataset': '../datasets/DD/DD_A.txt'}, # node symb
 #
 # {'name': 'Mutagenicity', 'dataset': '../datasets/Mutagenicity/Mutagenicity_A.txt'},
 # # node/edge symb
@@ -8,14 +8,14 @@ from pygraph.utils.kernels import deltakernel, gaussiankernel, kernelproduct
 # datasets
 dslist = [
-# {'name': 'Acyclic', 'dataset': '../datasets/acyclic/dataset_bps.ds',
-# 'task': 'regression'}, # node symb
-# {'name': 'Alkane', 'dataset': '../datasets/Alkane/dataset.ds', 'task': 'regression',
-# 'dataset_y': '../datasets/Alkane/dataset_boiling_point_names.txt'},
-# # contains single node graph, node symb
-# {'name': 'MAO', 'dataset': '../datasets/MAO/dataset.ds'}, # node/edge symb
-# {'name': 'PAH', 'dataset': '../datasets/PAH/dataset.ds'}, # unlabeled
-# {'name': 'MUTAG', 'dataset': '../datasets/MUTAG/MUTAG_A.txt'}, # node/edge symb
+{'name': 'Alkane', 'dataset': '../datasets/Alkane/dataset.ds', 'task': 'regression',
+ 'dataset_y': '../datasets/Alkane/dataset_boiling_point_names.txt'},
+# contains single node graph, node symb
+{'name': 'Acyclic', 'dataset': '../datasets/acyclic/dataset_bps.ds',
+ 'task': 'regression'}, # node symb
+{'name': 'MAO', 'dataset': '../datasets/MAO/dataset.ds'}, # node/edge symb
+{'name': 'PAH', 'dataset': '../datasets/PAH/dataset.ds'}, # unlabeled
+{'name': 'MUTAG', 'dataset': '../datasets/MUTAG/MUTAG_A.txt'}, # node/edge symb
 {'name': 'Letter-med', 'dataset': '../datasets/Letter-med/Letter-med_A.txt'},
 # node nsymb
 {'name': 'ENZYMES', 'dataset': '../datasets/ENZYMES_txt/ENZYMES_A_sparse.txt'},
@@ -14,22 +14,22 @@ from pygraph.kernels.structuralspKernel import structuralspkernel
 from pygraph.utils.kernels import deltakernel, gaussiankernel, kernelproduct
 dslist = [
-# {'name': 'Acyclic', 'dataset': '../datasets/acyclic/dataset_bps.ds',
-# 'task': 'regression'}, # node symb
-# {'name': 'Alkane', 'dataset': '../datasets/Alkane/dataset.ds', 'task': 'regression',
-# 'dataset_y': '../datasets/Alkane/dataset_boiling_point_names.txt'},
-# # contains single node graph, node symb
-# {'name': 'MAO', 'dataset': '../datasets/MAO/dataset.ds'}, # node/edge symb
-# {'name': 'PAH', 'dataset': '../datasets/PAH/dataset.ds'}, # unlabeled
-# {'name': 'MUTAG', 'dataset': '../datasets/MUTAG/MUTAG_A.txt'}, # node/edge symb
-# {'name': 'Letter-med', 'dataset': '../datasets/Letter-med/Letter-med_A.txt'},
-# # node nsymb
-# {'name': 'AIDS', 'dataset': '../datasets/AIDS/AIDS_A.txt'}, # node symb/nsymb, edge symb
-# {'name': 'NCI1', 'dataset': '../datasets/NCI1/NCI1_A.txt'}, # node symb
-# {'name': 'NCI109', 'dataset': '../datasets/NCI109/NCI109_A.txt'}, # node symb
+{'name': 'Alkane', 'dataset': '../datasets/Alkane/dataset.ds', 'task': 'regression',
+ 'dataset_y': '../datasets/Alkane/dataset_boiling_point_names.txt'},
+# contains single node graph, node symb
+{'name': 'Acyclic', 'dataset': '../datasets/acyclic/dataset_bps.ds',
+ 'task': 'regression'}, # node symb
+{'name': 'MAO', 'dataset': '../datasets/MAO/dataset.ds'}, # node/edge symb
+{'name': 'PAH', 'dataset': '../datasets/PAH/dataset.ds'}, # unlabeled
+{'name': 'MUTAG', 'dataset': '../datasets/MUTAG/MUTAG_A.txt'}, # node/edge symb
+{'name': 'Letter-med', 'dataset': '../datasets/Letter-med/Letter-med_A.txt'},
+# node nsymb
+{'name': 'AIDS', 'dataset': '../datasets/AIDS/AIDS_A.txt'}, # node symb/nsymb, edge symb
+{'name': 'NCI1', 'dataset': '../datasets/NCI1/NCI1_A.txt'}, # node symb
+{'name': 'NCI109', 'dataset': '../datasets/NCI109/NCI109_A.txt'}, # node symb
 # {'name': 'ENZYMES', 'dataset': '../datasets/ENZYMES_txt/ENZYMES_A_sparse.txt'},
 # # node symb/nsymb
-{'name': 'D&D', 'dataset': '../datasets/DD/DD_A.txt'}, # node symb
+# {'name': 'D&D', 'dataset': '../datasets/DD/DD_A.txt'}, # node symb
 #
 # {'name': 'Mutagenicity', 'dataset': '../datasets/Mutagenicity/Mutagenicity_A.txt'},
 # # node/edge symb
@@ -14,22 +14,22 @@ from pygraph.kernels.treeletKernel import treeletkernel
 from pygraph.utils.kernels import gaussiankernel, polynomialkernel
 dslist = [
-# {'name': 'Acyclic', 'dataset': '../datasets/acyclic/dataset_bps.ds',
-# 'task': 'regression'}, # node symb
 {'name': 'Alkane', 'dataset': '../datasets/Alkane/dataset.ds', 'task': 'regression',
  'dataset_y': '../datasets/Alkane/dataset_boiling_point_names.txt'},
 # contains single node graph, node symb
+{'name': 'Acyclic', 'dataset': '../datasets/acyclic/dataset_bps.ds',
+ 'task': 'regression'}, # node symb
 {'name': 'MAO', 'dataset': '../datasets/MAO/dataset.ds'}, # node/edge symb
 {'name': 'PAH', 'dataset': '../datasets/PAH/dataset.ds'}, # unlabeled
 {'name': 'MUTAG', 'dataset': '../datasets/MUTAG/MUTAG_A.txt'}, # node/edge symb
-{'name': 'ENZYMES', 'dataset': '../datasets/ENZYMES_txt/ENZYMES_A_sparse.txt'},
-# node symb/nsymb
 {'name': 'NCI1', 'dataset': '../datasets/NCI1/NCI1_A.txt'}, # node symb
 {'name': 'NCI109', 'dataset': '../datasets/NCI109/NCI109_A.txt'}, # node symb
 {'name': 'AIDS', 'dataset': '../datasets/AIDS/AIDS_A.txt'}, # node symb/nsymb, edge symb
+{'name': 'Letter-med', 'dataset': '../datasets/Letter-med/Letter-med_A.txt'},
+# node nsymb
+{'name': 'D&D', 'dataset': '../datasets/DD/DD_A.txt'}, # node symb
+{'name': 'ENZYMES', 'dataset': '../datasets/ENZYMES_txt/ENZYMES_A_sparse.txt'},
+# node symb/nsymb
-# {'name': 'D&D', 'dataset': '../datasets/DD/DD_A.txt'}, # node symb
-# {'name': 'Letter-med', 'dataset': '../datasets/Letter-med/Letter-med_A.txt'},
-# # node nsymb
 #
 # {'name': 'Mutagenicity', 'dataset': '../datasets/Mutagenicity/Mutagenicity_A.txt'},
 # # node/edge symb
@@ -12,21 +12,21 @@ import multiprocessing
 from pygraph.kernels.untilHPathKernel import untilhpathkernel
 dslist = [
-# {'name': 'Acyclic', 'dataset': '../datasets/acyclic/dataset_bps.ds',
-# 'task': 'regression'}, # node symb
-# {'name': 'Alkane', 'dataset': '../datasets/Alkane/dataset.ds', 'task': 'regression',
-# 'dataset_y': '../datasets/Alkane/dataset_boiling_point_names.txt'},
-# # contains single node graph, node symb
-# {'name': 'MAO', 'dataset': '../datasets/MAO/dataset.ds'}, # node/edge symb
-# {'name': 'PAH', 'dataset': '../datasets/PAH/dataset.ds'}, # unlabeled
-# {'name': 'MUTAG', 'dataset': '../datasets/MUTAG/MUTAG_A.txt'}, # node/edge symb
-# {'name': 'Letter-med', 'dataset': '../datasets/Letter-med/Letter-med_A.txt'},
-# # node nsymb
-# {'name': 'ENZYMES', 'dataset': '../datasets/ENZYMES_txt/ENZYMES_A_sparse.txt'},
-# # node symb/nsymb
-# {'name': 'NCI1', 'dataset': '../datasets/NCI1/NCI1_A.txt'}, # node symb
-# {'name': 'NCI109', 'dataset': '../datasets/NCI109/NCI109_A.txt'}, # node symb
-# {'name': 'AIDS', 'dataset': '../datasets/AIDS/AIDS_A.txt'}, # node symb/nsymb, edge symb
+{'name': 'Alkane', 'dataset': '../datasets/Alkane/dataset.ds', 'task': 'regression',
+ 'dataset_y': '../datasets/Alkane/dataset_boiling_point_names.txt'},
+# contains single node graph, node symb
+{'name': 'Acyclic', 'dataset': '../datasets/acyclic/dataset_bps.ds',
+ 'task': 'regression'}, # node symb
+{'name': 'MAO', 'dataset': '../datasets/MAO/dataset.ds'}, # node/edge symb
+{'name': 'PAH', 'dataset': '../datasets/PAH/dataset.ds'}, # unlabeled
+{'name': 'MUTAG', 'dataset': '../datasets/MUTAG/MUTAG_A.txt'}, # node/edge symb
+{'name': 'Letter-med', 'dataset': '../datasets/Letter-med/Letter-med_A.txt'},
+# node nsymb
+{'name': 'ENZYMES', 'dataset': '../datasets/ENZYMES_txt/ENZYMES_A_sparse.txt'},
+# node symb/nsymb
+{'name': 'AIDS', 'dataset': '../datasets/AIDS/AIDS_A.txt'}, # node symb/nsymb, edge symb
+{'name': 'NCI1', 'dataset': '../datasets/NCI1/NCI1_A.txt'}, # node symb
+{'name': 'NCI109', 'dataset': '../datasets/NCI109/NCI109_A.txt'}, # node symb
 {'name': 'D&D', 'dataset': '../datasets/DD/DD_A.txt'}, # node symb
 #
 # {'name': 'Mutagenicity', 'dataset': '../datasets/Mutagenicity/Mutagenicity_A.txt'},
@@ -14,22 +14,22 @@ from pygraph.kernels.weisfeilerLehmanKernel import weisfeilerlehmankernel
 dslist = [
-# {'name': 'Acyclic', 'dataset': '../datasets/acyclic/dataset_bps.ds',
-# 'task': 'regression'}, # node symb
-# {'name': 'Alkane', 'dataset': '../datasets/Alkane/dataset.ds', 'task': 'regression',
-# 'dataset_y': '../datasets/Alkane/dataset_boiling_point_names.txt'},
-# # contains single node graph, node symb
-# {'name': 'MAO', 'dataset': '../datasets/MAO/dataset.ds'}, # node/edge symb
-# {'name': 'PAH', 'dataset': '../datasets/PAH/dataset.ds'}, # unlabeled
-# {'name': 'MUTAG', 'dataset': '../datasets/MUTAG/MUTAG_A.txt'}, # node/edge symb
+{'name': 'Alkane', 'dataset': '../datasets/Alkane/dataset.ds', 'task': 'regression',
+ 'dataset_y': '../datasets/Alkane/dataset_boiling_point_names.txt'},
+# contains single node graph, node symb
+{'name': 'Acyclic', 'dataset': '../datasets/acyclic/dataset_bps.ds',
+ 'task': 'regression'}, # node symb
+{'name': 'MAO', 'dataset': '../datasets/MAO/dataset.ds'}, # node/edge symb
+{'name': 'PAH', 'dataset': '../datasets/PAH/dataset.ds'}, # unlabeled
+{'name': 'MUTAG', 'dataset': '../datasets/MUTAG/MUTAG_A.txt'}, # node/edge symb
 # {'name': 'Letter-med', 'dataset': '../datasets/Letter-med/Letter-med_A.txt'},
 # # node nsymb
-# {'name': 'ENZYMES', 'dataset': '../datasets/ENZYMES_txt/ENZYMES_A_sparse.txt'},
-# # node symb/nsymb
-# {'name': 'NCI1', 'dataset': '../datasets/NCI1/NCI1_A.txt'}, # node symb
-# {'name': 'NCI109', 'dataset': '../datasets/NCI109/NCI109_A.txt'}, # node symb
-# {'name': 'D&D', 'dataset': '../datasets/DD/DD_A.txt'}, # node symb
 {'name': 'AIDS', 'dataset': '../datasets/AIDS/AIDS_A.txt'}, # node symb/nsymb, edge symb
+{'name': 'ENZYMES', 'dataset': '../datasets/ENZYMES_txt/ENZYMES_A_sparse.txt'},
+# node symb/nsymb
+{'name': 'NCI1', 'dataset': '../datasets/NCI1/NCI1_A.txt'}, # node symb
+{'name': 'NCI109', 'dataset': '../datasets/NCI109/NCI109_A.txt'}, # node symb
+{'name': 'D&D', 'dataset': '../datasets/DD/DD_A.txt'}, # node symb
 #
 # {'name': 'Mutagenicity', 'dataset': '../datasets/Mutagenicity/Mutagenicity_A.txt'},
 # # node/edge symb
@@ -277,7 +277,8 @@ def gk_iam_nearest(Gn, alpha, idx_gi, Kmatrix, k, r_max):
 # return dhat, ghat_list
-def gk_iam_nearest_multi(Gn_init, Gn_median, alpha, idx_gi, Kmatrix, k, r_max, gkernel):
+def gk_iam_nearest_multi(Gn_init, Gn_median, alpha, idx_gi, Kmatrix, k, r_max,
+                         gkernel, c_ei=1, c_er=1, c_es=1, epsilon=0.001):
     """This function constructs graph pre-image by the iterative pre-image
     framework in reference [1], algorithm 1, where the step of generating new
     graphs randomly is replaced by the IAM algorithm in reference [2].
@@ -312,37 +313,44 @@ def gk_iam_nearest_multi(Gn_init, Gn_median, alpha, idx_gi, Kmatrix, k, r_max, g | |||||
return 0, g0hat_list | return 0, g0hat_list | ||||
dhat = dis_gs[0] # the nearest distance | dhat = dis_gs[0] # the nearest distance | ||||
ghat_list = [g.copy() for g in g0hat_list] | ghat_list = [g.copy() for g in g0hat_list] | ||||
for g in ghat_list: | |||||
draw_Letter_graph(g) | |||||
# for g in ghat_list: | |||||
# draw_Letter_graph(g) | |||||
# nx.draw_networkx(g) | # nx.draw_networkx(g) | ||||
# plt.show() | # plt.show() | ||||
print(g.nodes(data=True)) | |||||
print(g.edges(data=True)) | |||||
# print(g.nodes(data=True)) | |||||
# print(g.edges(data=True)) | |||||
Gk = [Gn_init[ig].copy() for ig in sort_idx[0:k]] # the k nearest neighbors | Gk = [Gn_init[ig].copy() for ig in sort_idx[0:k]] # the k nearest neighbors | ||||
for gi in Gk: | |||||
# nx.draw_networkx(gi) | |||||
# plt.show() | |||||
draw_Letter_graph(g) | |||||
print(gi.nodes(data=True)) | |||||
print(gi.edges(data=True)) | |||||
# for gi in Gk: | |||||
## nx.draw_networkx(gi) | |||||
## plt.show() | |||||
# draw_Letter_graph(g) | |||||
# print(gi.nodes(data=True)) | |||||
# print(gi.edges(data=True)) | |||||
Gs_nearest = Gk.copy() | Gs_nearest = Gk.copy() | ||||
# gihat_list = [] | # gihat_list = [] | ||||
# i = 1 | # i = 1 | ||||
r = 1 | |||||
while r < r_max: | |||||
print('r =', r) | |||||
# found = False | |||||
r = 0 | |||||
itr = 0 | |||||
# cur_sod = dhat | |||||
# old_sod = cur_sod * 2 | |||||
sod_list = [dhat] | |||||
found = False | |||||
nb_updated = 0 | |||||
while r < r_max:  # and not found:  # @todo: if not found?  # and np.abs(old_sod - cur_sod) > epsilon: | |||||
print('\nr =', r) | |||||
print('itr for gk =', itr, '\n') | |||||
found = False | |||||
# Gs_nearest = Gk + gihat_list | # Gs_nearest = Gk + gihat_list | ||||
# g_tmp = iam(Gs_nearest) | # g_tmp = iam(Gs_nearest) | ||||
g_tmp_list = test_iam_moreGraphsAsInit_tryAllPossibleBestGraphs_deleteNodesInIterations( | |||||
Gn_median, Gs_nearest, c_ei=1, c_er=1, c_es=1) | |||||
for g in g_tmp_list: | |||||
g_tmp_list, _ = test_iam_moreGraphsAsInit_tryAllPossibleBestGraphs_deleteNodesInIterations( | |||||
Gn_median, Gs_nearest, c_ei=c_ei, c_er=c_er, c_es=c_es) | |||||
# for g in g_tmp_list: | |||||
# nx.draw_networkx(g) | # nx.draw_networkx(g) | ||||
# plt.show() | # plt.show() | ||||
draw_Letter_graph(g) | |||||
print(g.nodes(data=True)) | |||||
print(g.edges(data=True)) | |||||
# draw_Letter_graph(g) | |||||
# print(g.nodes(data=True)) | |||||
# print(g.edges(data=True)) | |||||
# compute distance between phi and the new generated graphs. | # compute distance between phi and the new generated graphs. | ||||
knew = compute_kernel(g_tmp_list + Gn_median, gkernel, False) | knew = compute_kernel(g_tmp_list + Gn_median, gkernel, False) | ||||
@@ -358,6 +366,7 @@ def gk_iam_nearest_multi(Gn_init, Gn_median, alpha, idx_gi, Kmatrix, k, r_max, g | |||||
# k_g1_list[1] + alpha[1] * alpha[1] * k_list[1]) | # k_g1_list[1] + alpha[1] * alpha[1] * k_list[1]) | ||||
# find the new k nearest graphs. | # find the new k nearest graphs. | ||||
dnew_best = min(dnew_list) | |||||
dis_gs = dnew_list + dis_gs # add the new nearest distances. | dis_gs = dnew_list + dis_gs # add the new nearest distances. | ||||
Gs_nearest = [g.copy() for g in g_tmp_list] + Gs_nearest # add the corresponding graphs. | Gs_nearest = [g.copy() for g in g_tmp_list] + Gs_nearest # add the corresponding graphs. | ||||
sort_idx = np.argsort(dis_gs) | sort_idx = np.argsort(dis_gs) | ||||
@@ -367,21 +376,34 @@ def gk_iam_nearest_multi(Gn_init, Gn_median, alpha, idx_gi, Kmatrix, k, r_max, g | |||||
print(dis_gs[-1]) | print(dis_gs[-1]) | ||||
Gs_nearest = [Gs_nearest[idx] for idx in sort_idx[0:k]] | Gs_nearest = [Gs_nearest[idx] for idx in sort_idx[0:k]] | ||||
nb_best = len(np.argwhere(dis_gs == dis_gs[0]).flatten().tolist()) | nb_best = len(np.argwhere(dis_gs == dis_gs[0]).flatten().tolist()) | ||||
if len([i for i in sort_idx[0:nb_best] if i < len(dnew_list)]) > 0: | |||||
print('I have smaller or equal distance!') | |||||
if dnew_best < dhat and np.abs(dnew_best - dhat) > epsilon: | |||||
print('I have smaller distance!') | |||||
print(str(dhat) + '->' + str(dis_gs[0])) | print(str(dhat) + '->' + str(dis_gs[0])) | ||||
dhat = dis_gs[0] | dhat = dis_gs[0] | ||||
idx_best_list = np.argwhere(dnew_list == dhat).flatten().tolist() | idx_best_list = np.argwhere(dnew_list == dhat).flatten().tolist() | ||||
ghat_list = [g_tmp_list[idx].copy() for idx in idx_best_list] | ghat_list = [g_tmp_list[idx].copy() for idx in idx_best_list] | ||||
for g in ghat_list: | |||||
# nx.draw_networkx(g) | |||||
# plt.show() | |||||
draw_Letter_graph(g) | |||||
print(g.nodes(data=True)) | |||||
print(g.edges(data=True)) | |||||
r = 0 | |||||
else: | |||||
# for g in ghat_list: | |||||
## nx.draw_networkx(g) | |||||
## plt.show() | |||||
# draw_Letter_graph(g) | |||||
# print(g.nodes(data=True)) | |||||
# print(g.edges(data=True)) | |||||
r = 0 | |||||
found = True | |||||
nb_updated += 1 | |||||
elif np.abs(dnew_best - dhat) < epsilon: | |||||
print('I have almost equal distance!') | |||||
print(str(dhat) + '->' + str(dnew_best)) | |||||
if not found: | |||||
r += 1 | r += 1 | ||||
# old_sod = cur_sod | |||||
# cur_sod = dnew_best | |||||
sod_list.append(dhat) | |||||
itr += 1 | |||||
print('\nthe graph is updated', nb_updated, 'times.') | |||||
print('sods in kernel space:', sod_list, '\n') | |||||
return dhat, ghat_list | return dhat, ghat_list | ||||
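The `dnew_list` computed from `knew` above realizes the usual kernel-space distance between a candidate graph g and the weighted median point sum_i alpha_i * phi(g_i). A minimal numpy sketch of that computation (the function name and index conventions are illustrative, not this module's API):

```python
import numpy as np

def dis_to_median_point(km, alpha, idx_candidates, idx_medians):
    """Kernel-space distance from each candidate graph to the weighted
    median point sum_i alpha_i * phi(g_i), given a Gram matrix km."""
    dis = []
    for ic in idx_candidates:
        term1 = km[ic, ic]  # k(g, g)
        term2 = sum(a * km[ic, im]  # sum_i alpha_i * k(g, g_i)
                    for a, im in zip(alpha, idx_medians))
        term3 = sum(a1 * a2 * km[i1, i2]  # sum_ij alpha_i alpha_j * k(g_i, g_j)
                    for a1, i1 in zip(alpha, idx_medians)
                    for a2, i2 in zip(alpha, idx_medians))
        d2 = term1 - 2 * term2 + term3
        dis.append(np.sqrt(max(d2, 0.0)))  # clip tiny negative rounding errors
    return dis
```

For instance, with a linear kernel over scalar stand-ins x = (0, 1, 2) and alpha = (0.5, 0.5) over the last two, the median point sits at 1.5, so the first candidate lies at distance 1.5.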
@@ -9,6 +9,7 @@ Iterative alternate minimizations using GED. | |||||
import numpy as np | import numpy as np | ||||
import random | import random | ||||
import networkx as nx | import networkx as nx | ||||
from tqdm import tqdm | |||||
import sys | import sys | ||||
#from Cython_GedLib_2 import librariesImport, script | #from Cython_GedLib_2 import librariesImport, script | ||||
@@ -181,13 +182,27 @@ def GED(g1, g2, lib='gedlib'): | |||||
return dis, pi_forward, pi_backward | return dis, pi_forward, pi_backward | ||||
def median_distance(Gn, Gn_median, measure='ged', verbose=False): | |||||
dis_list = [] | |||||
pi_forward_list = [] | |||||
for idx, G in tqdm(enumerate(Gn), desc='computing median distances', | |||||
file=sys.stdout) if verbose else enumerate(Gn): | |||||
dis_sum = 0 | |||||
pi_forward_list.append([]) | |||||
for G_p in Gn_median: | |||||
dis_tmp, pi_tmp_forward, pi_tmp_backward = GED(G, G_p) | |||||
pi_forward_list[idx].append(pi_tmp_forward) | |||||
dis_sum += dis_tmp | |||||
dis_list.append(dis_sum) | |||||
return dis_list, pi_forward_list | |||||
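median_distance above accumulates, for each graph, its sum of distances (SOD) to all median graphs. The bookkeeping can be illustrated with a hypothetical dist callable standing in for GED (a sketch, not the gedlib-backed implementation):

```python
def sum_of_distances(Gn, Gn_median, dist):
    """SOD of each graph in Gn: its summed distance to every median graph."""
    return [sum(dist(G, G_p) for G_p in Gn_median) for G in Gn]
```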
# --------------------------- These are tests --------------------------------# | # --------------------------- These are tests --------------------------------# | ||||
def test_iam_with_more_graphs_as_init(Gn, G_candidate, c_ei=3, c_er=3, c_es=1, | def test_iam_with_more_graphs_as_init(Gn, G_candidate, c_ei=3, c_er=3, c_es=1, | ||||
node_label='atom', edge_label='bond_type'): | node_label='atom', edge_label='bond_type'): | ||||
"""See my name, then you know what I do. | """See my name, then you know what I do. | ||||
""" | """ | ||||
from tqdm import tqdm | |||||
# Gn = Gn[0:10] | # Gn = Gn[0:10] | ||||
Gn = [nx.convert_node_labels_to_integers(g) for g in Gn] | Gn = [nx.convert_node_labels_to_integers(g) for g in Gn] | ||||
@@ -321,7 +336,7 @@ def test_iam_with_more_graphs_as_init(Gn, G_candidate, c_ei=3, c_er=3, c_es=1, | |||||
def test_iam_moreGraphsAsInit_tryAllPossibleBestGraphs_deleteNodesInIterations( | def test_iam_moreGraphsAsInit_tryAllPossibleBestGraphs_deleteNodesInIterations( | ||||
Gn_median, Gn_candidate, c_ei=3, c_er=3, c_es=1, node_label='atom', | Gn_median, Gn_candidate, c_ei=3, c_er=3, c_es=1, node_label='atom', | ||||
edge_label='bond_type', connected=True): | |||||
edge_label='bond_type', connected=False): | |||||
"""See my name, then you know what I do. | """See my name, then you know what I do. | ||||
""" | """ | ||||
from tqdm import tqdm | from tqdm import tqdm | ||||
@@ -330,8 +345,11 @@ def test_iam_moreGraphsAsInit_tryAllPossibleBestGraphs_deleteNodesInIterations( | |||||
node_ir = np.inf # corresponding to node removal and insertion. | node_ir = np.inf # corresponding to node removal and insertion. | ||||
label_r = 'thanksdanny' # the label for node removal. # @todo: make sure this label cannot collide with real node labels. | label_r = 'thanksdanny' # the label for node removal. # @todo: make sure this label cannot collide with real node labels. | ||||
ds_attrs = get_dataset_attributes(Gn_median + Gn_candidate, | ds_attrs = get_dataset_attributes(Gn_median + Gn_candidate, | ||||
attr_names=['edge_labeled', 'node_attr_dim'], | |||||
attr_names=['edge_labeled', 'node_attr_dim', 'edge_attr_dim'], | |||||
edge_label=edge_label) | edge_label=edge_label) | ||||
ite_max = 50 | |||||
epsilon = 0.001 | |||||
def generate_graph(G, pi_p_forward, label_set): | def generate_graph(G, pi_p_forward, label_set): | ||||
@@ -460,13 +478,15 @@ def test_iam_moreGraphsAsInit_tryAllPossibleBestGraphs_deleteNodesInIterations( | |||||
g_tmp.remove_edge(nd1, nd2) | g_tmp.remove_edge(nd1, nd2) | ||||
# do not change anything when equal. | # do not change anything when equal. | ||||
# find the best graph generated in this iteration and update pi_p. | |||||
# # find the best graph generated in this iteration and update pi_p. | |||||
# @todo: should we update all graphs generated or just the best ones? | # @todo: should we update all graphs generated or just the best ones? | ||||
dis_list, pi_forward_list = median_distance(G_new_list, Gn_median) | dis_list, pi_forward_list = median_distance(G_new_list, Gn_median) | ||||
# @todo: should we remove the identical and connectivity check? | # @todo: should we remove the identical and connectivity check? | ||||
# Don't know which is faster. | # Don't know which is faster. | ||||
G_new_list, idx_list = remove_duplicates(G_new_list) | |||||
pi_forward_list = [pi_forward_list[idx] for idx in idx_list] | |||||
if ds_attrs['node_attr_dim'] == 0 and ds_attrs['edge_attr_dim'] == 0: | |||||
G_new_list, idx_list = remove_duplicates(G_new_list) | |||||
pi_forward_list = [pi_forward_list[idx] for idx in idx_list] | |||||
dis_list = [dis_list[idx] for idx in idx_list] | |||||
# if connected == True: | # if connected == True: | ||||
# G_new_list, idx_list = remove_disconnected(G_new_list) | # G_new_list, idx_list = remove_disconnected(G_new_list) | ||||
# pi_forward_list = [pi_forward_list[idx] for idx in idx_list] | # pi_forward_list = [pi_forward_list[idx] for idx in idx_list] | ||||
@@ -482,25 +502,10 @@ def test_iam_moreGraphsAsInit_tryAllPossibleBestGraphs_deleteNodesInIterations( | |||||
# print(g.nodes(data=True)) | # print(g.nodes(data=True)) | ||||
# print(g.edges(data=True)) | # print(g.edges(data=True)) | ||||
return G_new_list, pi_forward_list | |||||
return G_new_list, pi_forward_list, dis_list | |||||
def median_distance(Gn, Gn_median, measure='ged', verbose=False): | |||||
dis_list = [] | |||||
pi_forward_list = [] | |||||
for idx, G in tqdm(enumerate(Gn), desc='computing median distances', | |||||
file=sys.stdout) if verbose else enumerate(Gn): | |||||
dis_sum = 0 | |||||
pi_forward_list.append([]) | |||||
for G_p in Gn_median: | |||||
dis_tmp, pi_tmp_forward, pi_tmp_backward = GED(G, G_p) | |||||
pi_forward_list[idx].append(pi_tmp_forward) | |||||
dis_sum += dis_tmp | |||||
dis_list.append(dis_sum) | |||||
return dis_list, pi_forward_list | |||||
def best_median_graphs(Gn_candidate, dis_all, pi_all_forward): | |||||
def best_median_graphs(Gn_candidate, pi_all_forward, dis_all): | |||||
idx_min_list = np.argwhere(dis_all == np.min(dis_all)).flatten().tolist() | idx_min_list = np.argwhere(dis_all == np.min(dis_all)).flatten().tolist() | ||||
dis_min = dis_all[idx_min_list[0]] | dis_min = dis_all[idx_min_list[0]] | ||||
pi_forward_min_list = [pi_all_forward[idx] for idx in idx_min_list] | pi_forward_min_list = [pi_all_forward[idx] for idx in idx_min_list] | ||||
@@ -508,25 +513,45 @@ def test_iam_moreGraphsAsInit_tryAllPossibleBestGraphs_deleteNodesInIterations( | |||||
return G_min_list, pi_forward_min_list, dis_min | return G_min_list, pi_forward_min_list, dis_min | ||||
def iteration_proc(G, pi_p_forward): | |||||
def iteration_proc(G, pi_p_forward, cur_sod): | |||||
G_list = [G] | G_list = [G] | ||||
pi_forward_list = [pi_p_forward] | pi_forward_list = [pi_p_forward] | ||||
old_sod = cur_sod * 2 | |||||
sod_list = [cur_sod] | |||||
# iterations. | # iterations. | ||||
for itr in range(0, 5): # @todo: the convergence condition? | |||||
# print('itr is', itr) | |||||
itr = 0 | |||||
while itr < ite_max and np.abs(old_sod - cur_sod) > epsilon: | |||||
# for itr in range(0, 5): # the convergence condition? | |||||
print('itr is', itr) | |||||
G_new_list = [] | G_new_list = [] | ||||
pi_forward_new_list = [] | pi_forward_new_list = [] | ||||
dis_new_list = [] | |||||
for idx, G in enumerate(G_list): | for idx, G in enumerate(G_list): | ||||
label_set = get_node_labels(Gn_median + [G], node_label) | label_set = get_node_labels(Gn_median + [G], node_label) | ||||
G_tmp_list, pi_forward_tmp_list = generate_graph( | |||||
G_tmp_list, pi_forward_tmp_list, dis_tmp_list = generate_graph( | |||||
G, pi_forward_list[idx], label_set) | G, pi_forward_list[idx], label_set) | ||||
G_new_list += G_tmp_list | G_new_list += G_tmp_list | ||||
pi_forward_new_list += pi_forward_tmp_list | pi_forward_new_list += pi_forward_tmp_list | ||||
dis_new_list += dis_tmp_list | |||||
G_list = G_new_list[:] | G_list = G_new_list[:] | ||||
pi_forward_list = pi_forward_new_list[:] | pi_forward_list = pi_forward_new_list[:] | ||||
dis_list = dis_new_list[:] | |||||
old_sod = cur_sod | |||||
cur_sod = np.min(dis_list) | |||||
sod_list.append(cur_sod) | |||||
itr += 1 | |||||
G_list, idx_list = remove_duplicates(G_list) | |||||
pi_forward_list = [pi_forward_list[idx] for idx in idx_list] | |||||
# @todo: do we return all graphs or the best ones? | |||||
# get the best ones of the generated graphs. | |||||
G_list, pi_forward_list, dis_min = best_median_graphs( | |||||
G_list, pi_forward_list, dis_list) | |||||
if ds_attrs['node_attr_dim'] == 0 and ds_attrs['edge_attr_dim'] == 0: | |||||
G_list, idx_list = remove_duplicates(G_list) | |||||
pi_forward_list = [pi_forward_list[idx] for idx in idx_list] | |||||
# dis_list = [dis_list[idx] for idx in idx_list] | |||||
# import matplotlib.pyplot as plt | # import matplotlib.pyplot as plt | ||||
# for g in G_list: | # for g in G_list: | ||||
@@ -535,7 +560,9 @@ def test_iam_moreGraphsAsInit_tryAllPossibleBestGraphs_deleteNodesInIterations( | |||||
# print(g.nodes(data=True)) | # print(g.nodes(data=True)) | ||||
# print(g.edges(data=True)) | # print(g.edges(data=True)) | ||||
return G_list, pi_forward_list # do we return all graphs or the best ones? | |||||
print('\nsods:', sod_list, '\n') | |||||
return G_list, pi_forward_list, dis_min | |||||
def remove_duplicates(Gn): | def remove_duplicates(Gn): | ||||
@@ -570,28 +597,37 @@ def test_iam_moreGraphsAsInit_tryAllPossibleBestGraphs_deleteNodesInIterations( | |||||
# phase 1: initialize. | # phase 1: initialize. | ||||
# compute set-median. | # compute set-median. | ||||
dis_min = np.inf | dis_min = np.inf | ||||
dis_all, pi_all_forward = median_distance(Gn_candidate, Gn_median) | |||||
dis_list, pi_forward_all = median_distance(Gn_candidate, Gn_median) | |||||
# find all smallest distances. | # find all smallest distances. | ||||
idx_min_list = np.argwhere(dis_all == np.min(dis_all)).flatten().tolist() | |||||
dis_min = dis_all[idx_min_list[0]] | |||||
idx_min_list = np.argwhere(dis_list == np.min(dis_list)).flatten().tolist() | |||||
dis_min = dis_list[idx_min_list[0]] | |||||
# phase 2: iteration. | # phase 2: iteration. | ||||
G_list = [] | G_list = [] | ||||
for idx_min in idx_min_list[::-1]: | |||||
dis_list = [] | |||||
pi_forward_list = [] | |||||
for idx_min in idx_min_list: | |||||
# print('idx_min is', idx_min) | # print('idx_min is', idx_min) | ||||
G = Gn_candidate[idx_min].copy() | G = Gn_candidate[idx_min].copy() | ||||
# list of edit operations. | # list of edit operations. | ||||
pi_p_forward = pi_all_forward[idx_min] | |||||
pi_p_forward = pi_forward_all[idx_min] | |||||
# pi_p_backward = pi_all_backward[idx_min] | # pi_p_backward = pi_all_backward[idx_min] | ||||
Gi_list, pi_i_forward_list = iteration_proc(G, pi_p_forward) | |||||
Gi_list, pi_i_forward_list, dis_i_min = iteration_proc(G, pi_p_forward, dis_min) | |||||
G_list += Gi_list | G_list += Gi_list | ||||
dis_list.append(dis_i_min) | |||||
pi_forward_list += pi_i_forward_list | |||||
G_list, _ = remove_duplicates(G_list) | |||||
if ds_attrs['node_attr_dim'] == 0 and ds_attrs['edge_attr_dim'] == 0: | |||||
G_list, idx_list = remove_duplicates(G_list) | |||||
dis_list = [dis_list[idx] for idx in idx_list] | |||||
pi_forward_list = [pi_forward_list[idx] for idx in idx_list] | |||||
if connected == True: | if connected == True: | ||||
G_list_con, _ = remove_disconnected(G_list) | |||||
# if there are no connected graphs at all, keep the disconnected ones. | |||||
if len(G_list_con) > 0: # @todo: ?????????????????????????? | |||||
G_list = G_list_con | |||||
G_list_con, idx_list = remove_disconnected(G_list) | |||||
# if there are no connected graphs at all, keep the disconnected ones. | |||||
if len(G_list_con) > 0: # @todo: ?????????????????????????? | |||||
G_list = G_list_con | |||||
dis_list = [dis_list[idx] for idx in idx_list] | |||||
pi_forward_list = [pi_forward_list[idx] for idx in idx_list] | |||||
# import matplotlib.pyplot as plt | # import matplotlib.pyplot as plt | ||||
# for g in G_list: | # for g in G_list: | ||||
@@ -601,15 +637,15 @@ def test_iam_moreGraphsAsInit_tryAllPossibleBestGraphs_deleteNodesInIterations( | |||||
# print(g.edges(data=True)) | # print(g.edges(data=True)) | ||||
# get the best median graphs | # get the best median graphs | ||||
dis_all, pi_all_forward = median_distance(G_list, Gn_median) | |||||
# dis_list, pi_forward_list = median_distance(G_list, Gn_median) | |||||
G_min_list, pi_forward_min_list, dis_min = best_median_graphs( | G_min_list, pi_forward_min_list, dis_min = best_median_graphs( | ||||
G_list, dis_all, pi_all_forward) | |||||
G_list, pi_forward_list, dis_list) | |||||
# for g in G_min_list: | # for g in G_min_list: | ||||
# nx.draw_networkx(g) | # nx.draw_networkx(g) | ||||
# plt.show() | # plt.show() | ||||
# print(g.nodes(data=True)) | # print(g.nodes(data=True)) | ||||
# print(g.edges(data=True)) | # print(g.edges(data=True)) | ||||
return G_min_list | |||||
return G_min_list, dis_min | |||||
if __name__ == '__main__': | if __name__ == '__main__': | ||||
@@ -0,0 +1,218 @@ | |||||
import sys | |||||
sys.path.insert(0, "../") | |||||
#import pathlib | |||||
import numpy as np | |||||
import networkx as nx | |||||
import time | |||||
#import librariesImport | |||||
#import script | |||||
#sys.path.insert(0, "/home/bgauzere/dev/optim-graphes/") | |||||
#import pygraph | |||||
from pygraph.utils.graphfiles import loadDataset | |||||
def replace_graph_in_env(script, graph, old_id, label='median'): | |||||
""" | |||||
Replace a graph in script | |||||
If old_id is -1, add a new graph to the environment | |||||
""" | |||||
if(old_id > -1): | |||||
script.PyClearGraph(old_id) | |||||
new_id = script.PyAddGraph(label) | |||||
for i in graph.nodes(): | |||||
script.PyAddNode(new_id,str(i),graph.node[i]) # !! strings are required by gedlib | |||||
for e in graph.edges: | |||||
script.PyAddEdge(new_id, str(e[0]),str(e[1]), {}) | |||||
script.PyInitEnv() | |||||
script.PySetMethod("IPFP", "") | |||||
script.PyInitMethod() | |||||
return new_id | |||||
# Draw the current median | |||||
def draw_Letter_graph(graph, savepath=''): | |||||
import numpy as np | |||||
import networkx as nx | |||||
import matplotlib.pyplot as plt | |||||
plt.figure() | |||||
pos = {} | |||||
for n in graph.nodes: | |||||
pos[n] = np.array([float(graph.node[n]['attributes'][0]), | |||||
float(graph.node[n]['attributes'][1])]) | |||||
nx.draw_networkx(graph, pos) | |||||
if savepath != '': | |||||
plt.savefig(savepath + str(time.time()) + '.eps', format='eps', dpi=300) | |||||
plt.show() | |||||
plt.clf() | |||||
#compute new mappings | |||||
def update_mappings(script,median_id,listID): | |||||
med_distances = {} | |||||
med_mappings = {} | |||||
sod = 0 | |||||
for i in range(0,len(listID)): | |||||
script.PyRunMethod(median_id,listID[i]) | |||||
med_distances[i] = script.PyGetUpperBound(median_id,listID[i]) | |||||
med_mappings[i] = script.PyGetForwardMap(median_id,listID[i]) | |||||
sod += med_distances[i] | |||||
return med_distances, med_mappings, sod | |||||
def calcul_Sij(all_mappings, all_graphs,i,j): | |||||
s_ij = 0 | |||||
for k in range(0,len(all_mappings)): | |||||
cur_graph = all_graphs[k] | |||||
cur_mapping = all_mappings[k] | |||||
size_graph = cur_graph.order() | |||||
if ((cur_mapping[i] < size_graph) and | |||||
(cur_mapping[j] < size_graph) and | |||||
(cur_graph.has_edge(cur_mapping[i], cur_mapping[j]) == True)): | |||||
s_ij += 1 | |||||
return s_ij | |||||
# def update_median_nodes_L1(median,listIdSet,median_id,dataset, mappings): | |||||
# from scipy.stats.mstats import gmean | |||||
# for i in median.nodes(): | |||||
# for k in listIdSet: | |||||
# vectors = [] #np.zeros((len(listIdSet),2)) | |||||
# if(k != median_id): | |||||
# phi_i = mappings[k][i] | |||||
# if(phi_i < dataset[k].order()): | |||||
# vectors.append([float(dataset[k].node[phi_i]['x']),float(dataset[k].node[phi_i]['y'])]) | |||||
# new_labels = gmean(vectors) | |||||
# median.node[i]['x'] = str(new_labels[0]) | |||||
# median.node[i]['y'] = str(new_labels[1]) | |||||
# return median | |||||
def update_median_nodes(median,dataset,mappings): | |||||
#update node attributes | |||||
for i in median.nodes(): | |||||
nb_sub=0 | |||||
mean_label = {'x' : 0, 'y' : 0} | |||||
for k in range(0,len(mappings)): | |||||
phi_i = mappings[k][i] | |||||
if ( phi_i < dataset[k].order() ): | |||||
nb_sub += 1 | |||||
mean_label['x'] += 0.75*float(dataset[k].node[phi_i]['x']) | |||||
mean_label['y'] += 0.75*float(dataset[k].node[phi_i]['y']) | |||||
median.node[i]['x'] = str((1/0.75)*(mean_label['x']/nb_sub)) | |||||
median.node[i]['y'] = str((1/0.75)*(mean_label['y']/nb_sub)) | |||||
return median | |||||
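In update_median_nodes the 0.75 weighting cancels with the 1/0.75 rescaling, so each median node ends up at the plain arithmetic mean of the coordinates of its substituted images. A sketch of that net effect (hypothetical helper, not part of the module):

```python
def mean_position(coords):
    """Arithmetic mean of the (x, y) positions a median node is mapped to."""
    n = len(coords)
    return (sum(c[0] for c in coords) / n,
            sum(c[1] for c in coords) / n)
```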
def update_median_edges(dataset, mappings, median, cei=0.425,cer=0.425): | |||||
#for letter high, ceir = 1.7, alpha = 0.75 | |||||
size_dataset = len(dataset) | |||||
ratio_cei_cer = cer/(cei + cer) | |||||
threshold = size_dataset*ratio_cei_cer | |||||
order_graph_median = median.order() | |||||
for i in range(0,order_graph_median): | |||||
for j in range(i+1,order_graph_median): | |||||
s_ij = calcul_Sij(mappings,dataset,i,j) | |||||
if(s_ij > threshold): | |||||
median.add_edge(i,j) | |||||
else: | |||||
if(median.has_edge(i,j)): | |||||
median.remove_edge(i,j) | |||||
return median | |||||
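The edge update above is a weighted majority vote: edge (i, j) survives iff s_ij, the number of graphs whose mapping of i and j lands on an existing edge, exceeds N * cer / (cei + cer). The decision in isolation (a hedged sketch; keep_edge is illustrative, not part of the pipeline):

```python
def keep_edge(s_ij, n_graphs, cei=0.425, cer=0.425):
    """Weighted majority vote: keep edge (i, j) in the median iff enough
    graphs map it onto an existing edge."""
    threshold = n_graphs * cer / (cei + cer)
    return s_ij > threshold
```

With cei = cer (the Letter-high setting used above), the threshold is simply half the dataset size.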
def compute_median(script, listID, dataset,verbose=False): | |||||
"""Compute a graph median of a dataset according to an environment | |||||
Parameters | |||||
script : a gedlib-initialized environment | |||||
listID (list): a list of graph IDs in script; encodes the dataset | |||||
dataset (list): corresponding graphs in networkX format. We assume that graph | |||||
listID[i] corresponds to dataset[i] | |||||
Returns: | |||||
A networkX graph which is the median, along with the corresponding SOD | |||||
""" | |||||
print(len(listID)) | |||||
median_set_index, median_set_sod = compute_median_set(script, listID) | |||||
print(median_set_index) | |||||
print(median_set_sod) | |||||
sods = [] | |||||
# Add the median to the environment | |||||
set_median = dataset[median_set_index].copy() | |||||
median = dataset[median_set_index].copy() | |||||
cur_med_id = replace_graph_in_env(script,median,-1) | |||||
med_distances, med_mappings, cur_sod = update_mappings(script,cur_med_id,listID) | |||||
sods.append(cur_sod) | |||||
if(verbose): | |||||
print(cur_sod) | |||||
ite_max = 50 | |||||
old_sod = cur_sod * 2 | |||||
ite = 0 | |||||
epsilon = 0.001 | |||||
while((ite < ite_max) and (np.abs(old_sod - cur_sod) > epsilon )): | |||||
median = update_median_nodes(median,dataset, med_mappings) | |||||
median = update_median_edges(dataset,med_mappings,median) | |||||
cur_med_id = replace_graph_in_env(script,median,cur_med_id) | |||||
med_distances, med_mappings, cur_sod = update_mappings(script,cur_med_id,listID) | |||||
sods.append(cur_sod) | |||||
if(verbose): | |||||
print(cur_sod) | |||||
ite += 1 | |||||
# draw_Letter_graph(median) # this call was unreachable after the return | |||||
return median, cur_sod, sods, set_median | |||||
def compute_median_set(script,listID): | |||||
'Returns the index of the set median in listID, and its SOD' | |||||
# Compute the set median | |||||
N=len(listID) | |||||
map_id_to_index = {} | |||||
map_index_to_id = {} | |||||
for i in range(0,len(listID)): | |||||
map_id_to_index[listID[i]] = i | |||||
map_index_to_id[i] = listID[i] | |||||
distances = np.zeros((N,N)) | |||||
for i in listID: | |||||
for j in listID: | |||||
script.PyRunMethod(i,j) | |||||
distances[map_id_to_index[i],map_id_to_index[j]] = script.PyGetUpperBound(i,j) | |||||
median_set_index = np.argmin(np.sum(distances,0)) | |||||
sod = np.min(np.sum(distances,0)) | |||||
return median_set_index, sod | |||||
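compute_median_set selects as set median the graph minimizing the column sum of the pairwise distance matrix. With the distances already in a numpy array, the selection reduces to the following (illustrative helper; in the code above the distances come from script.PyGetUpperBound):

```python
import numpy as np

def set_median_index(distances):
    """Return (index, SOD) of the set median: the graph whose summed
    distance to all others is smallest."""
    sods = np.sum(distances, axis=0)
    idx = int(np.argmin(sods))
    return idx, float(sods[idx])
```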
#if __name__ == "__main__": | |||||
# # Load the dataset | |||||
# script.PyLoadGXLGraph('/home/bgauzere/dev/gedlib/data/datasets/Letter/HIGH/', '/home/bgauzere/dev/gedlib/data/collections/Letter_Z.xml') | |||||
# script.PySetEditCost("LETTER") | |||||
# script.PyInitEnv() | |||||
# script.PySetMethod("IPFP", "") | |||||
# script.PyInitMethod() | |||||
# | |||||
# dataset,my_y = pygraph.utils.graphfiles.loadDataset("/home/bgauzere/dev/gedlib/data/datasets/Letter/HIGH/Letter_Z.cxl") | |||||
# | |||||
# listID = script.PyGetAllGraphIds() | |||||
# median, sod = compute_median(script,listID,dataset,verbose=True) | |||||
# | |||||
# print(sod) | |||||
# draw_Letter_graph(median) | |||||
if __name__ == '__main__': | |||||
# test draw_Letter_graph | |||||
ds = {'name': 'Letter-high', 'dataset': '../datasets/Letter-high/Letter-high_A.txt', | |||||
'extra_params': {}} # node nsymb | |||||
Gn, y_all = loadDataset(ds['dataset'], extra_params=ds['extra_params']) | |||||
print(y_all) | |||||
for g in Gn: | |||||
draw_Letter_graph(g) |
@@ -0,0 +1,423 @@ | |||||
#!/usr/bin/env python3 | |||||
# -*- coding: utf-8 -*- | |||||
""" | |||||
Created on Thu Jul 4 12:20:16 2019 | |||||
@author: ljia | |||||
""" | |||||
import numpy as np | |||||
import networkx as nx | |||||
import matplotlib.pyplot as plt | |||||
import time | |||||
from tqdm import tqdm | |||||
import sys | |||||
sys.path.insert(0, "../") | |||||
from pygraph.utils.graphfiles import loadDataset | |||||
from median import draw_Letter_graph | |||||
# --------------------------- These are tests --------------------------------# | |||||
def test_who_is_the_closest_in_kernel_space(Gn): | |||||
idx_gi = [0, 6] | |||||
g1 = Gn[idx_gi[0]] | |||||
g2 = Gn[idx_gi[1]] | |||||
# create the "median" graph. | |||||
gnew = g2.copy() | |||||
gnew.remove_node(0) | |||||
nx.draw_networkx(gnew) | |||||
plt.show() | |||||
print(gnew.nodes(data=True)) | |||||
Gn = [gnew] + Gn | |||||
# compute gram matrix | |||||
Kmatrix = compute_kernel(Gn, 'untilhpathkernel', True) | |||||
# the distance matrix | |||||
dmatrix = gram2distances(Kmatrix) | |||||
print(np.sort(dmatrix[idx_gi[0] + 1])) | |||||
print(np.argsort(dmatrix[idx_gi[0] + 1])) | |||||
print(np.sort(dmatrix[idx_gi[1] + 1])) | |||||
print(np.argsort(dmatrix[idx_gi[1] + 1])) | |||||
# for all g in Gn, compute (d(g1, g) + d(g2, g)) / 2 | |||||
dis_median = [(dmatrix[i, idx_gi[0] + 1] + dmatrix[i, idx_gi[1] + 1]) / 2 for i in range(len(Gn))] | |||||
print(np.sort(dis_median)) | |||||
print(np.argsort(dis_median)) | |||||
return | |||||
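gram2distances, imported with compute_kernel elsewhere in the project, presumably applies the usual kernel-to-metric conversion d(i, j) = sqrt(k(i,i) + k(j,j) - 2*k(i,j)). A sketch under that assumption (the name gram_to_distances is illustrative):

```python
import numpy as np

def gram_to_distances(K):
    """Pairwise kernel-induced distances from a Gram matrix K."""
    diag = np.diag(K)
    d2 = diag[:, None] + diag[None, :] - 2 * K
    return np.sqrt(np.maximum(d2, 0))  # clip tiny negatives from rounding
```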
def test_who_is_the_closest_in_GED_space(Gn): | |||||
from iam import GED | |||||
idx_gi = [0, 6] | |||||
g1 = Gn[idx_gi[0]] | |||||
g2 = Gn[idx_gi[1]] | |||||
# create the "median" graph. | |||||
gnew = g2.copy() | |||||
gnew.remove_node(0) | |||||
nx.draw_networkx(gnew) | |||||
plt.show() | |||||
print(gnew.nodes(data=True)) | |||||
Gn = [gnew] + Gn | |||||
# compute GEDs | |||||
ged_matrix = np.zeros((len(Gn), len(Gn))) | |||||
for i1 in tqdm(range(len(Gn)), desc='computing GEDs', file=sys.stdout): | |||||
for i2 in range(len(Gn)): | |||||
dis, _, _ = GED(Gn[i1], Gn[i2], lib='gedlib') | |||||
ged_matrix[i1, i2] = dis | |||||
print(np.sort(ged_matrix[idx_gi[0] + 1])) | |||||
print(np.argsort(ged_matrix[idx_gi[0] + 1])) | |||||
print(np.sort(ged_matrix[idx_gi[1] + 1])) | |||||
print(np.argsort(ged_matrix[idx_gi[1] + 1])) | |||||
# for all g in Gn, compute (GED(g1, g) + GED(g2, g)) / 2 | |||||
dis_median = [(ged_matrix[i, idx_gi[0] + 1] + ged_matrix[i, idx_gi[1] + 1]) / 2 for i in range(len(Gn))] | |||||
print(np.sort(dis_median)) | |||||
print(np.argsort(dis_median)) | |||||
return | |||||
def test_will_IAM_give_the_median_graph_we_wanted(Gn): | |||||
idx_gi = [0, 6] | |||||
g1 = Gn[idx_gi[0]].copy() | |||||
g2 = Gn[idx_gi[1]].copy() | |||||
# del Gn[idx_gi[0]] | |||||
# del Gn[idx_gi[1] - 1] | |||||
g_median = test_iam_with_more_graphs_as_init([g1, g2], [g1, g2], c_ei=1, c_er=1, c_es=1) | |||||
# g_median = test_iam_with_more_graphs_as_init(Gn, Gn, c_ei=1, c_er=1, c_es=1) | |||||
nx.draw_networkx(g_median) | |||||
plt.show() | |||||
print(g_median.nodes(data=True)) | |||||
print(g_median.edges(data=True)) | |||||
def test_new_IAM_allGraph_deleteNodes(Gn): | |||||
idx_gi = [0, 6] | |||||
# g1 = Gn[idx_gi[0]].copy() | |||||
# g2 = Gn[idx_gi[1]].copy() | |||||
# g1 = nx.Graph(name='haha') | |||||
# g1.add_nodes_from([(0, {'atom': 'C'}), (1, {'atom': 'O'}), (2, {'atom': 'C'})]) | |||||
# g1.add_edges_from([(0, 1, {'bond_type': '1'}), (1, 2, {'bond_type': '1'})]) | |||||
# g2 = nx.Graph(name='hahaha') | |||||
# g2.add_nodes_from([(0, {'atom': 'C'}), (1, {'atom': 'O'}), (2, {'atom': 'C'}), | |||||
# (3, {'atom': 'O'}), (4, {'atom': 'C'})]) | |||||
# g2.add_edges_from([(0, 1, {'bond_type': '1'}), (1, 2, {'bond_type': '1'}), | |||||
# (2, 3, {'bond_type': '1'}), (3, 4, {'bond_type': '1'})]) | |||||
g1 = nx.Graph(name='haha') | |||||
g1.add_nodes_from([(0, {'atom': 'C'}), (1, {'atom': 'C'}), (2, {'atom': 'C'}), | |||||
(3, {'atom': 'S'}), (4, {'atom': 'S'})]) | |||||
g1.add_edges_from([(0, 1, {'bond_type': '1'}), (1, 2, {'bond_type': '1'}), | |||||
(2, 3, {'bond_type': '1'}), (2, 4, {'bond_type': '1'})]) | |||||
g2 = nx.Graph(name='hahaha') | |||||
g2.add_nodes_from([(0, {'atom': 'C'}), (1, {'atom': 'C'}), (2, {'atom': 'C'}), | |||||
(3, {'atom': 'O'}), (4, {'atom': 'O'})]) | |||||
g2.add_edges_from([(0, 1, {'bond_type': '1'}), (1, 2, {'bond_type': '1'}), | |||||
(2, 3, {'bond_type': '1'}), (2, 4, {'bond_type': '1'})]) | |||||
# g2 = g1.copy() | |||||
# g2.add_nodes_from([(3, {'atom': 'O'})]) | |||||
# g2.add_nodes_from([(4, {'atom': 'C'})]) | |||||
# g2.add_edges_from([(1, 3, {'bond_type': '1'})]) | |||||
# g2.add_edges_from([(3, 4, {'bond_type': '1'})]) | |||||
# del Gn[idx_gi[0]] | |||||
# del Gn[idx_gi[1] - 1] | |||||
nx.draw_networkx(g1) | |||||
plt.show() | |||||
print(g1.nodes(data=True)) | |||||
print(g1.edges(data=True)) | |||||
nx.draw_networkx(g2) | |||||
plt.show() | |||||
print(g2.nodes(data=True)) | |||||
print(g2.edges(data=True)) | |||||
g_median = test_iam_moreGraphsAsInit_tryAllPossibleBestGraphs_deleteNodesInIterations([g1, g2], [g1, g2], c_ei=1, c_er=1, c_es=1) | |||||
# g_median = test_iam_moreGraphsAsInit_tryAllPossibleBestGraphs_deleteNodesInIterations(Gn, Gn, c_ei=1, c_er=1, c_es=1) | |||||
nx.draw_networkx(g_median) | |||||
plt.show() | |||||
print(g_median.nodes(data=True)) | |||||
print(g_median.edges(data=True)) | |||||
def test_the_simple_two(Gn, gkernel): | |||||
from gk_iam import gk_iam_nearest_multi, compute_kernel | |||||
lmbda = 0.03 # termination probability | |||||
r_max = 10 # recursions | |||||
l = 500 | |||||
alpha_range = np.linspace(0.5, 0.5, 1) | |||||
k = 2 # k nearest neighbors | |||||
# randomly select two molecules | |||||
np.random.seed(1) | |||||
idx_gi = [0, 6] # np.random.randint(0, len(Gn), 2) | |||||
g1 = Gn[idx_gi[0]] | |||||
g2 = Gn[idx_gi[1]] | |||||
Gn_mix = [g.copy() for g in Gn] | |||||
Gn_mix.append(g1.copy()) | |||||
Gn_mix.append(g2.copy()) | |||||
# g_tmp = iam([g1, g2]) | |||||
# nx.draw_networkx(g_tmp) | |||||
# plt.show() | |||||
# compute | |||||
# k_list = [] # kernel between each graph and itself. | |||||
# k_g1_list = [] # kernel between each graph and g1 | |||||
# k_g2_list = [] # kernel between each graph and g2 | |||||
# for ig, g in tqdm(enumerate(Gn), desc='computing self kernels', file=sys.stdout): | |||||
# ktemp = compute_kernel([g, g1, g2], 'marginalizedkernel', False) | |||||
# k_list.append(ktemp[0][0, 0]) | |||||
# k_g1_list.append(ktemp[0][0, 1]) | |||||
# k_g2_list.append(ktemp[0][0, 2]) | |||||
km = compute_kernel(Gn_mix, gkernel, True) | |||||
# k_list = np.diag(km) # kernel between each graph and itself. | |||||
# k_g1_list = km[idx_gi[0]] # kernel between each graph and g1 | |||||
# k_g2_list = km[idx_gi[1]] # kernel between each graph and g2 | |||||
g_best = [] | |||||
dis_best = [] | |||||
# for each alpha | |||||
for alpha in alpha_range: | |||||
print('alpha =', alpha) | |||||
dhat, ghat_list = gk_iam_nearest_multi(Gn, [g1, g2], [alpha, 1 - alpha], | |||||
range(len(Gn), len(Gn) + 2), km, | |||||
k, r_max,gkernel) | |||||
dis_best.append(dhat) | |||||
g_best.append(ghat_list) | |||||
for idx, item in enumerate(alpha_range): | |||||
print('when alpha is', item, 'the shortest distance is', dis_best[idx]) | |||||
print('the corresponding pre-images are') | |||||
for g in g_best[idx]: | |||||
nx.draw_networkx(g) | |||||
plt.show() | |||||
print(g.nodes(data=True)) | |||||
print(g.edges(data=True)) | |||||
def test_remove_bests(Gn, gkernel):
    from gk_iam import gk_iam_nearest_multi, compute_kernel
    lmbda = 0.03  # termination probability
    r_max = 10  # recursions
    l = 500
    alpha_range = np.linspace(0.5, 0.5, 1)
    k = 20  # k nearest neighbors
    # randomly select two molecules
    np.random.seed(1)
    idx_gi = [0, 6]  # np.random.randint(0, len(Gn), 2)
    g1 = Gn[idx_gi[0]]
    g2 = Gn[idx_gi[1]]
    # remove the best 2 graphs.
    del Gn[idx_gi[0]]
    del Gn[idx_gi[1] - 1]
    # del Gn[8]
    Gn_mix = [g.copy() for g in Gn]
    Gn_mix.append(g1.copy())
    Gn_mix.append(g2.copy())
    # compute
    km = compute_kernel(Gn_mix, gkernel, True)
    g_best = []
    dis_best = []
    # for each alpha
    for alpha in alpha_range:
        print('alpha =', alpha)
        dhat, ghat_list = gk_iam_nearest_multi(Gn, [g1, g2], [alpha, 1 - alpha],
                                               range(len(Gn), len(Gn) + 2), km,
                                               k, r_max, gkernel)
        dis_best.append(dhat)
        g_best.append(ghat_list)
    for idx, item in enumerate(alpha_range):
        print('when alpha is', item, 'the shortest distance is', dis_best[idx])
        print('the corresponding pre-images are')
        for g in g_best[idx]:
            draw_Letter_graph(g)
            # nx.draw_networkx(g)
            # plt.show()
            print(g.nodes(data=True))
            print(g.edges(data=True))
def test_gkiam_letter_h():
    from gk_iam import gk_iam_nearest_multi, compute_kernel
    from iam import median_distance
    ds = {'name': 'Letter-high', 'dataset': '../datasets/Letter-high/Letter-high_A.txt',
          'extra_params': {}}  # node nsymb
    # ds = {'name': 'Letter-med', 'dataset': '../datasets/Letter-med/Letter-med_A.txt',
    #       'extra_params': {}}  # node nsymb
    Gn, y_all = loadDataset(ds['dataset'], extra_params=ds['extra_params'])
    gkernel = 'structuralspkernel'
    lmbda = 0.03  # termination probability
    r_max = 3  # recursions
    # alpha_range = np.linspace(0.5, 0.5, 1)
    k = 10  # k nearest neighbors
    # classify graphs according to letters.
    idx_dict = get_same_item_indices(y_all)
    time_list = []
    sod_list = []
    sod_min_list = []
    for letter in idx_dict:
        print('\n-------------------------------------------------------\n')
        Gn_let = [Gn[i].copy() for i in idx_dict[letter]]
        Gn_mix = Gn_let + [g.copy() for g in Gn_let]
        alpha_range = np.linspace(1 / len(Gn_let), 1 / len(Gn_let), 1)
        # compute
        time0 = time.time()
        km = compute_kernel(Gn_mix, gkernel, True)
        g_best = []
        dis_best = []
        # for each alpha
        for alpha in alpha_range:
            print('alpha =', alpha)
            dhat, ghat_list = gk_iam_nearest_multi(Gn_let, Gn_let, [alpha] * len(Gn_let),
                                                   range(len(Gn_let), len(Gn_mix)), km,
                                                   k, r_max, gkernel, c_ei=1.7,
                                                   c_er=1.7, c_es=1.7)
            dis_best.append(dhat)
            g_best.append(ghat_list)
        time_list.append(time.time() - time0)
        # show best graphs and save them to file.
        for idx, item in enumerate(alpha_range):
            print('when alpha is', item, 'the shortest distance is', dis_best[idx])
            print('the corresponding pre-images are')
            for g in g_best[idx]:
                draw_Letter_graph(g, savepath='results/gk_iam/')
                # nx.draw_networkx(g)
                # plt.show()
                print(g.nodes(data=True))
                print(g.edges(data=True))
        # compute the corresponding sod in graph space. (alpha range not considered.)
        sod_tmp, _ = median_distance(g_best[0], Gn_let)
        sod_list.append(sod_tmp)
        sod_min_list.append(np.min(sod_tmp))
    print('\nsods in graph space: ', sod_list)
    print('\nsmallest sod in graph space for each letter: ', sod_min_list)
    print('\ntimes:', time_list)
def get_same_item_indices(ls):
    """Get the indices of the same items in a list. Return a dict keyed by items.
    """
    idx_dict = {}
    for idx, item in enumerate(ls):
        if item in idx_dict:
            idx_dict[item].append(idx)
        else:
            idx_dict[item] = [idx]
    return idx_dict
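The grouping above behaves like this minimal equivalent (a throwaway sketch using `dict.setdefault`; `group_indices` is a hypothetical name, not part of the module):

```python
# Minimal equivalent of get_same_item_indices: group the positions of
# equal items into a dict keyed by item, using dict.setdefault.
def group_indices(ls):
    idx_dict = {}
    for idx, item in enumerate(ls):
        idx_dict.setdefault(item, []).append(idx)
    return idx_dict

print(group_indices(['A', 'E', 'A', 'F', 'E']))
# → {'A': [0, 2], 'E': [1, 4], 'F': [3]}
```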
#def compute_letter_median_by_average(Gn): | |||||
# return g_median | |||||
def test_iam_letter_h():
    from iam import test_iam_moreGraphsAsInit_tryAllPossibleBestGraphs_deleteNodesInIterations
    from gk_iam import dis_gstar, compute_kernel
    ds = {'name': 'Letter-high', 'dataset': '../datasets/Letter-high/Letter-high_A.txt',
          'extra_params': {}}  # node nsymb
    # ds = {'name': 'Letter-med', 'dataset': '../datasets/Letter-med/Letter-med_A.txt',
    #       'extra_params': {}}  # node nsymb
    Gn, y_all = loadDataset(ds['dataset'], extra_params=ds['extra_params'])
    lmbda = 0.03  # termination probability
    # alpha_range = np.linspace(0.5, 0.5, 1)
    # classify graphs according to letters.
    idx_dict = get_same_item_indices(y_all)
    time_list = []
    sod_list = []
    sod_min_list = []
    for letter in idx_dict:
        Gn_let = [Gn[i].copy() for i in idx_dict[letter]]
        alpha_range = np.linspace(1 / len(Gn_let), 1 / len(Gn_let), 1)
        # compute
        g_best = []
        dis_best = []
        time0 = time.time()
        # for each alpha
        for alpha in alpha_range:
            print('alpha =', alpha)
            ghat_list, dhat = test_iam_moreGraphsAsInit_tryAllPossibleBestGraphs_deleteNodesInIterations(
                Gn_let, Gn_let, c_ei=1.7, c_er=1.7, c_es=1.7)
            dis_best.append(dhat)
            g_best.append(ghat_list)
        time_list.append(time.time() - time0)
        # show best graphs and save them to file.
        for idx, item in enumerate(alpha_range):
            print('when alpha is', item, 'the shortest distance is', dis_best[idx])
            print('the corresponding pre-images are')
            for g in g_best[idx]:
                draw_Letter_graph(g, savepath='results/iam/')
                # nx.draw_networkx(g)
                # plt.show()
                print(g.nodes(data=True))
                print(g.edges(data=True))
        # compute the corresponding sod in kernel space. (alpha range not considered.)
        gkernel = 'structuralspkernel'
        sod_tmp = []
        Gn_mix = g_best[0] + Gn_let
        km = compute_kernel(Gn_mix, gkernel, True)
        for ig, g in tqdm(enumerate(g_best[0]), desc='computing kernel sod', file=sys.stdout):
            dtemp = dis_gstar(ig, range(len(g_best[0]), len(Gn_mix)),
                              [alpha_range[0]] * len(Gn_let), km, withterm3=False)
            sod_tmp.append(dtemp)
        sod_list.append(sod_tmp)
        sod_min_list.append(np.min(sod_tmp))
    print('\nsods in kernel space: ', sod_list)
    print('\nsmallest sod in kernel space for each letter: ', sod_min_list)
    print('\ntimes:', time_list)
if __name__ == '__main__':
    # ds = {'name': 'MUTAG', 'dataset': '../datasets/MUTAG/MUTAG_A.txt',
    #       'extra_params': {}}  # node/edge symb
    ds = {'name': 'Letter-high', 'dataset': '../datasets/Letter-high/Letter-high_A.txt',
          'extra_params': {}}  # node nsymb
    # ds = {'name': 'Acyclic', 'dataset': '../datasets/monoterpenoides/trainset_9.ds',
    #       'extra_params': {}}
    # ds = {'name': 'Acyclic', 'dataset': '../datasets/acyclic/dataset_bps.ds',
    #       'extra_params': {}}  # node symb
    Gn, y_all = loadDataset(ds['dataset'], extra_params=ds['extra_params'])
    # Gn = Gn[0:20]
    # import networkx.algorithms.isomorphism as iso
    # G1 = nx.MultiDiGraph()
    # G2 = nx.MultiDiGraph()
    # G1.add_nodes_from([1, 2, 3], fill='red')
    # G2.add_nodes_from([10, 20, 30, 40], fill='red')
    # nx.add_path(G1, [1, 2, 3, 4], weight=3, linewidth=2.5)
    # nx.add_path(G2, [10, 20, 30, 40], weight=3)
    # nm = iso.categorical_node_match('fill', 'red')
    # print(nx.is_isomorphic(G1, G2, node_match=nm))
    #
    # test_new_IAM_allGraph_deleteNodes(Gn)
    # test_will_IAM_give_the_median_graph_we_wanted(Gn)
    # test_who_is_the_closest_in_GED_space(Gn)
    # test_who_is_the_closest_in_kernel_space(Gn)
    # test_the_simple_two(Gn, 'untilhpathkernel')
    # test_remove_bests(Gn, 'untilhpathkernel')
    test_gkiam_letter_h()
    # test_iam_letter_h()
@@ -23,7 +23,7 @@ from pygraph.utils.parallel import parallel_gm
 def commonwalkkernel(*args,
                      node_label='atom',
                      edge_label='bond_type',
-                     n=None,
+                     # n=None,
                      weight=1,
                      compute_method=None,
                      n_jobs=None,
@@ -35,26 +35,28 @@ def commonwalkkernel(*args,
         List of graphs between which the kernels are calculated.
     /
     G1, G2 : NetworkX graphs
-        2 graphs between which the kernel is calculated.
+        Two graphs between which the kernel is calculated.
     node_label : string
-        node attribute used as label. The default node label is atom.
+        Node attribute used as symbolic label. The default node label is 'atom'.
     edge_label : string
-        edge attribute used as label. The default edge label is bond_type.
-    n : integer
-        Longest length of walks. Only useful when applying the 'brute' method.
+        Edge attribute used as symbolic label. The default edge label is 'bond_type'.
+    # n : integer
+    #     Longest length of walks. Only useful when applying the 'brute' method.
     weight : integer
         Weight coefficient of different lengths of walks, which represents beta
         in the 'exp' method and gamma in 'geo'.
     compute_method : string
         Method used to compute the walk kernel. The following choices are
         available:
-        'exp' : exponential serial method applied on the direct product graph,
-        as shown in reference [1]. The time complexity is O(n^6) for graphs
-        with n vertices.
-        'geo' : geometric serial method applied on the direct product graph, as
-        shown in reference [1]. The time complexity is O(n^6) for graphs with n
-        vertices.
-        'brute' : brute force, simply search for all walks and compare them.
+        'exp': method based on exponential series applied on the direct
+        product graph, as shown in reference [1]. The time complexity is O(n^6)
+        for graphs with n vertices.
+        'geo': method based on geometric series applied on the direct product
+        graph, as shown in reference [1]. The time complexity is O(n^6) for
+        graphs with n vertices.
+        # 'brute': brute force, simply search for all walks and compare them.
+    n_jobs : int
+        Number of jobs for parallelization.
     Return
     ------
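For unlabeled graphs, the geometric variant described above can be sketched in a few lines of NumPy: the direct product graph's adjacency matrix reduces to a Kronecker product, and the geometric series over walk lengths has the closed form (I − γW)⁻¹ − I. This is an illustration only, not the library's implementation (labels would restrict the product graph further):

```python
import numpy as np

# Geometric common walk kernel sketch on the direct product graph.
# sum_{k>=1} gamma^k * (#common walks of length k) = 1^T ((I - gamma*W)^-1 - I) 1,
# valid when gamma < 1 / spectral_radius(W).
def common_walk_geo(A1, A2, gamma):
    W = np.kron(A1, A2)                       # adjacency of the direct product graph
    n = W.shape[0]
    S = np.linalg.inv(np.eye(n) - gamma * W) - np.eye(n)
    return S.sum()                            # 1^T S 1

A_path = np.array([[0, 1], [1, 0]])                    # a single edge
A_tri = np.array([[0, 1, 1], [1, 0, 1], [1, 1, 0]])    # a triangle
print(common_walk_geo(A_path, A_tri, 0.1))  # → 1.5
```

With γ = 0.1, walks of length k contribute 0.1ᵏ · 2 · 3·2ᵏ, so the sum is 6·(0.2/0.8) = 1.5, which the closed form reproduces.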
@@ -44,17 +44,20 @@ def marginalizedkernel(*args,
         List of graphs between which the kernels are calculated.
     /
     G1, G2 : NetworkX graphs
-        2 graphs between which the kernel is calculated.
+        Two graphs between which the kernel is calculated.
     node_label : string
-        node attribute used as label. The default node label is atom.
+        Node attribute used as symbolic label. The default node label is 'atom'.
     edge_label : string
-        edge attribute used as label. The default edge label is bond_type.
+        Edge attribute used as symbolic label. The default edge label is 'bond_type'.
     p_quit : integer
-        the termination probability in the random walks generating step
+        The termination probability in the random walk generating step.
     n_iteration : integer
-        time of iterations to calculate R_inf
+        Number of iterations to calculate R_inf.
     remove_totters : boolean
-        whether to remove totters. The default value is True.
+        Whether to remove totterings by the method introduced in [2]. The
+        default value is False.
+    n_jobs : int
+        Number of jobs for parallelization.
     Return
     ------
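The role of `p_quit` can be illustrated with a toy Monte-Carlo estimate of the quantity the marginalized kernel computes exactly: the probability that two random walks, one per graph, each stopping with probability `p_quit` at every step, emit identical label sequences. This is a hedged sketch on hypothetical adjacency-list graphs, not the library's iterative R_inf computation:

```python
import random

# Sample the label sequence of one random walk: uniform start node,
# continue with probability (1 - p_quit), uniform neighbor at each step.
def sample_walk_labels(adj, labels, p_quit, rng):
    v = rng.randrange(len(adj))
    seq = [labels[v]]
    while rng.random() > p_quit:
        v = rng.choice(adj[v])
        seq.append(labels[v])
    return tuple(seq)

# Monte-Carlo estimate of P(two independent walks emit the same sequence).
def marginalized_mc(adj1, lab1, adj2, lab2, p_quit=0.3, n=5000, seed=1):
    rng = random.Random(seed)
    hits = sum(sample_walk_labels(adj1, lab1, p_quit, rng) ==
               sample_walk_labels(adj2, lab2, p_quit, rng)
               for _ in range(n))
    return hits / n

adj = [[1], [0, 2], [1]]      # tiny molecule-like path C-O-C
lab = ['C', 'O', 'C']
print(marginalized_mc(adj, lab, adj, lab))  # self-similarity, in (0, 1]
```

Larger `p_quit` shortens the walks, so short common substructures dominate the estimate.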
@@ -41,15 +41,62 @@ def randomwalkkernel(*args,
         List of graphs between which the kernels are calculated.
     /
     G1, G2 : NetworkX graphs
-        2 graphs between which the kernel is calculated.
-    node_label : string
-        node attribute used as label. The default node label is atom.
+        Two graphs between which the kernel is calculated.
+    compute_method : string
+        Method used to compute the kernel. The following choices are
+        available:
+        'sylvester' - Sylvester equation method.
+        'conjugate' - conjugate gradient method.
+        'fp' - fixed-point iterations.
+        'spectral' - spectral decomposition.
+    weight : float
+        A constant weight set for random walks of length h.
+    p : None
+        Initial probability distribution on the unlabeled direct product graph
+        of two graphs. It is set to be uniform over all vertices in the direct
+        product graph.
+    q : None
+        Stopping probability distribution on the unlabeled direct product graph
+        of two graphs. It is set to be uniform over all vertices in the direct
+        product graph.
+    edge_weight : float
+        Edge attribute name corresponding to the edge weight.
+    node_kernels : dict
+        A dictionary of kernel functions for nodes, including 3 items: 'symb'
+        for symbolic node labels, 'nsymb' for non-symbolic node labels, 'mix'
+        for both labels. The first 2 functions take two node labels as
+        parameters, and the 'mix' function takes 4 parameters, a symbolic and a
+        non-symbolic label for each of the two nodes. Each label is in the form
+        of a 2-D array (n_samples, n_features). Each function returns a number
+        as the kernel value. Ignored when nodes are unlabeled. This argument
+        is designated to the conjugate gradient method and fixed-point
+        iterations.
+    edge_kernels : dict
+        A dictionary of kernel functions for edges, including 3 items: 'symb'
+        for symbolic edge labels, 'nsymb' for non-symbolic edge labels, 'mix'
+        for both labels. The first 2 functions take two edge labels as
+        parameters, and the 'mix' function takes 4 parameters, a symbolic and a
+        non-symbolic label for each of the two edges. Each label is in the form
+        of a 2-D array (n_samples, n_features). Each function returns a number
+        as the kernel value. Ignored when edges are unlabeled. This argument
+        is designated to the conjugate gradient method and fixed-point
+        iterations.
+    node_label : string
+        Node attribute used as label. The default node label is 'atom'. This
+        argument is designated to the conjugate gradient method and fixed-point
+        iterations.
     edge_label : string
-        edge attribute used as label. The default edge label is bond_type.
-    h : integer
-        Longest length of walks.
-    method : string
-        Method used to compute the random walk kernel. Available methods are
-        'sylvester', 'conjugate', 'fp', 'spectral' and 'kron'.
+        Edge attribute used as label. The default edge label is 'bond_type'.
+        This argument is designated to the conjugate gradient method and
+        fixed-point iterations.
+    sub_kernel : string
+        Method used to compute the walk kernel. The following choices are
+        available:
+        'exp' : method based on exponential series.
+        'geo' : method based on geometric series.
+    n_jobs : int
+        Number of jobs for parallelization.
     Return
     ------
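For unlabeled graphs with uniform p and q, the conjugate-gradient and fixed-point methods above both solve the same linear system, which a dense sketch makes explicit. This is an illustration only; the whole point of those methods is to avoid materializing the Kronecker product as done here:

```python
import numpy as np

# k(G1, G2) = q^T (I - lambda * W)^-1 p on the direct product graph,
# with W the Kronecker product of the adjacency matrices and
# p, q uniform start/stop distributions (as the docstring describes).
def random_walk_kernel(A1, A2, lmbda):
    W = np.kron(A1, A2)
    n = W.shape[0]
    p = np.full(n, 1.0 / n)   # uniform initial distribution
    q = np.full(n, 1.0 / n)   # uniform stopping distribution
    x = np.linalg.solve(np.eye(n) - lmbda * W, p)   # (I - lambda*W) x = p
    return q @ x

A1 = np.array([[0, 1], [1, 0]])                      # edge
A2 = np.array([[0, 1, 1], [1, 0, 1], [1, 1, 0]])     # triangle
print(random_walk_kernel(A1, A2, 0.1))  # ≈ 0.20833
```

Iterative solvers (conjugate gradient, fixed-point) apply the operator x ↦ p + λWx without forming W, which is what makes the kernel tractable on larger graphs.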
@@ -168,7 +215,7 @@ def _sylvester_equation(Gn, lmda, p, q, eweight, n_jobs, verbose=True):
     if q == None:
         # don't normalize adjacency matrices if q is a uniform vector. Note
-        # A_wave_list accually contains the transposes of the adjacency matrices.
+        # A_wave_list actually contains the transposes of the adjacency matrices.
         A_wave_list = [
             nx.adjacency_matrix(G, eweight).todense().transpose() for G in
             (tqdm(Gn, desc='compute adjacency matrices', file=sys.stdout) if
@@ -259,7 +306,7 @@ def _conjugate_gradient(Gn, lmda, p, q, ds_attrs, node_kernels, edge_kernels,
 #    # this is faster from unlabeled graphs. @todo: why?
 #    if q == None:
 #        # don't normalize adjacency matrices if q is a uniform vector. Note
-#        # A_wave_list accually contains the transposes of the adjacency matrices.
+#        # A_wave_list actually contains the transposes of the adjacency matrices.
 #        A_wave_list = [
 #            nx.adjacency_matrix(G, eweight).todense().transpose() for G in
 #            tqdm(Gn, desc='compute adjacency matrices', file=sys.stdout)
@@ -376,7 +423,7 @@ def _fixed_point(Gn, lmda, p, q, ds_attrs, node_kernels, edge_kernels,
 #    # this is faster from unlabeled graphs. @todo: why?
 #    if q == None:
 #        # don't normalize adjacency matrices if q is a uniform vector. Note
-#        # A_wave_list accually contains the transposes of the adjacency matrices.
+#        # A_wave_list actually contains the transposes of the adjacency matrices.
 #        A_wave_list = [
 #            nx.adjacency_matrix(G, eweight).todense().transpose() for G in
 #            tqdm(Gn, desc='compute adjacency matrices', file=sys.stdout)
@@ -481,7 +528,7 @@ def _spectral_decomposition(Gn, weight, p, q, sub_kernel, eweight, n_jobs, verbo
     for G in (tqdm(Gn, desc='spectral decompose', file=sys.stdout) if
               verbose else Gn):
         # don't normalize adjacency matrices if q is a uniform vector. Note
-        # A accually is the transpose of the adjacency matrix.
+        # A actually is the transpose of the adjacency matrix.
         A = nx.adjacency_matrix(G, eweight).todense().transpose()
         ew, ev = np.linalg.eig(A)
         D_list.append(ew)
@@ -33,12 +33,12 @@ def spkernel(*args,
         List of graphs between which the kernels are calculated.
     /
     G1, G2 : NetworkX graphs
-        2 graphs between which the kernel is calculated.
+        Two graphs between which the kernel is calculated.
     node_label : string
-        node attribute used as label. The default node label is atom.
+        Node attribute used as label. The default node label is 'atom'.
     edge_weight : string
         Edge attribute name corresponding to the edge weight.
-    node_kernels: dict
+    node_kernels : dict
         A dictionary of kernel functions for nodes, including 3 items: 'symb'
         for symbolic node labels, 'nsymb' for non-symbolic node labels, 'mix'
         for both labels. The first 2 functions take two node labels as
@@ -46,6 +46,8 @@ def spkernel(*args,
         non-symbolic label for each of the two nodes. Each label is in the form
         of a 2-D array (n_samples, n_features). Each function returns a
         number as the kernel value. Ignored when nodes are unlabeled.
+    n_jobs : int
+        Number of jobs for parallelization.
     Return
     ------
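The `node_kernels` dict the docstring describes can be sketched concretely. The specific sub-kernels here (a Kronecker delta for symbolic labels, a Gaussian for non-symbolic vector labels, their product for 'mix') are common choices rather than mandated ones, and the 'mix' argument order is an assumption:

```python
import numpy as np

# 'symb': kernel between two symbolic labels (e.g. atom types).
def kronecker_delta(x, y):
    return 1.0 if x == y else 0.0

# 'nsymb': kernel between two non-symbolic (vector) labels.
def gaussian(x, y, gamma=1.0):
    x, y = np.asarray(x, float), np.asarray(y, float)
    return float(np.exp(-gamma * np.sum((x - y) ** 2)))

# 'mix': takes a symbolic and a non-symbolic label for each node
# (argument order assumed: symb1, nsymb1, symb2, nsymb2).
def mix_kernel(s1, v1, s2, v2):
    return kronecker_delta(s1, s2) * gaussian(v1, v2)

node_kernels = {'symb': kronecker_delta, 'nsymb': gaussian, 'mix': mix_kernel}
print(node_kernels['mix']('C', [0.5, 1.0], 'C', [0.5, 1.0]))  # → 1.0
```

The same three-key structure applies to the `edge_kernels` argument of the other kernels.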
@@ -42,14 +42,15 @@ def structuralspkernel(*args,
         List of graphs between which the kernels are calculated.
     /
     G1, G2 : NetworkX graphs
-        2 graphs between which the kernel is calculated.
+        Two graphs between which the kernel is calculated.
     node_label : string
-        node attribute used as label. The default node label is atom.
+        Node attribute used as label. The default node label is 'atom'.
     edge_weight : string
-        Edge attribute name corresponding to the edge weight.
+        Edge attribute name corresponding to the edge weight. Applied for the
+        computation of the shortest paths.
     edge_label : string
-        edge attribute used as label. The default edge label is bond_type.
-    node_kernels: dict
+        Edge attribute used as label. The default edge label is 'bond_type'.
+    node_kernels : dict
         A dictionary of kernel functions for nodes, including 3 items: 'symb'
         for symbolic node labels, 'nsymb' for non-symbolic node labels, 'mix'
         for both labels. The first 2 functions take two node labels as
@@ -57,7 +58,7 @@ def structuralspkernel(*args,
         non-symbolic label for each of the two nodes. Each label is in the form
         of a 2-D array (n_samples, n_features). Each function returns a number
         as the kernel value. Ignored when nodes are unlabeled.
-    edge_kernels: dict
+    edge_kernels : dict
         A dictionary of kernel functions for edges, including 3 items: 'symb'
         for symbolic edge labels, 'nsymb' for non-symbolic edge labels, 'mix'
         for both labels. The first 2 functions take two edge labels as
@@ -65,6 +66,13 @@ def structuralspkernel(*args,
         non-symbolic label for each of the two edges. Each label is in the form
         of a 2-D array (n_samples, n_features). Each function returns a number
         as the kernel value. Ignored when edges are unlabeled.
+    compute_method : string
+        Computation method to store the shortest paths and compute the graph
+        kernel. The following choices are available:
+        'trie': store paths as tries.
+        'naive': store paths in lists.
+    n_jobs : int
+        Number of jobs for parallelization.
     Return
     ------
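The difference between the 'trie' and 'naive' storage options can be sketched with plain dicts: the naive variant keeps every label path in a flat list, while a trie shares common prefixes in nested dicts. This is a toy illustration; the library's trie class is more involved:

```python
# Insert one label path into a nested-dict trie; '#' marks a stored path end.
def insert_path(trie, path):
    node = trie
    for label in path:
        node = node.setdefault(label, {})
    node['#'] = True

paths = [('C', 'O'), ('C', 'O', 'C'), ('C', 'N')]
naive = list(paths)   # 'naive' storage: one entry per path
trie = {}             # 'trie' storage: the three paths share the 'C' prefix
for p in paths:
    insert_path(trie, p)
print(trie)
# → {'C': {'O': {'#': True, 'C': {'#': True}}, 'N': {'#': True}}}
```

With many overlapping shortest paths per graph pair, prefix sharing is what makes the 'trie' option pay off in memory and comparison time.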
@@ -40,11 +40,19 @@ def treeletkernel(*args,
         The sub-kernel between 2 real number vectors. Each vector counts the
         numbers of isomorphic treelets in a graph.
     node_label : string
         Node attribute used as label. The default node label is 'atom'.
     edge_label : string
         Edge attribute used as label. The default edge label is 'bond_type'.
-    labeled : boolean
-        Whether the graphs are labeled. The default is True.
+    parallel : string/None
+        Which parallelization method is applied to compute the kernel. The
+        following choices are available:
+        'imap_unordered': use Python's multiprocessing.Pool.imap_unordered
+        method.
+        None: no parallelization is applied.
+    n_jobs : int
+        Number of jobs for parallelization. The default is to use all
+        computational cores. This argument is only valid when one of the
+        parallelization methods is applied.
     Return
     ------
@@ -26,7 +26,7 @@ def untilhpathkernel(*args,
                      node_label='atom',
                      edge_label='bond_type',
                      depth=10,
-                     k_func='tanimoto',
+                     k_func='MinMax',
                      compute_method='trie',
                      n_jobs=None,
                      verbose=True):
@@ -38,7 +38,7 @@ def untilhpathkernel(*args,
         List of graphs between which the kernels are calculated.
     /
     G1, G2 : NetworkX graphs
-        2 graphs between which the kernel is calculated.
+        Two graphs between which the kernel is calculated.
     node_label : string
         Node attribute used as label. The default node label is 'atom'.
     edge_label : string
@@ -47,9 +47,17 @@ def untilhpathkernel(*args,
         Depth of search. Longest length of paths.
     k_func : function
         A kernel function applied using different notions of fingerprint
-        similarity.
-    compute_method: string
-        Computation method, 'trie' or 'naive'.
+        similarity, defining the type of feature map and normalization method
+        applied for the graph kernel. The following choices are available:
+        'MinMax': use the MinMax kernel and counting feature map.
+        'tanimoto': use the Tanimoto kernel and binary feature map.
+    compute_method : string
+        Computation method to store paths and compute the graph kernel. The
+        following choices are available:
+        'trie': store paths as tries.
+        'naive': store paths in lists.
+    n_jobs : int
+        Number of jobs for parallelization.
     Return
     ------
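The two `k_func` choices reduce to simple formulas on fingerprint vectors, sketched below (`counts_*` and `bits_*` are hypothetical path-count and binary fingerprints, not taken from the library):

```python
import numpy as np

# MinMax kernel on counting feature maps: sum of mins over sum of maxes.
def minmax_kernel(x, y):
    x, y = np.asarray(x, float), np.asarray(y, float)
    return np.minimum(x, y).sum() / np.maximum(x, y).sum()

# Tanimoto kernel on binary feature maps: |x & y| / |x | y| in dot-product form.
def tanimoto_kernel(x, y):
    x, y = np.asarray(x, float), np.asarray(y, float)
    dot = x @ y
    return dot / (x @ x + y @ y - dot)

counts_1, counts_2 = [2, 0, 1], [1, 1, 1]   # path counts per path label
bits_1, bits_2 = [1, 0, 1], [1, 1, 1]       # path presence/absence
print(minmax_kernel(counts_1, counts_2))    # → 0.5  (min-sum 2 / max-sum 4)
print(tanimoto_kernel(bits_1, bits_2))      # → 2/3  (2 / (2 + 3 - 2))
```

On binary vectors the two coincide; the MinMax kernel generalizes Tanimoto to count fingerprints, which is why it pairs with the counting feature map.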
@@ -38,15 +38,28 @@ def weisfeilerlehmankernel(*args,
         List of graphs between which the kernels are calculated.
     /
     G1, G2 : NetworkX graphs
-        2 graphs between which the kernel is calculated.
+        Two graphs between which the kernel is calculated.
     node_label : string
-        node attribute used as label. The default node label is atom.
+        Node attribute used as label. The default node label is 'atom'.
     edge_label : string
-        edge attribute used as label. The default edge label is bond_type.
+        Edge attribute used as label. The default edge label is 'bond_type'.
     height : int
-        subtree height
+        Subtree height.
     base_kernel : string
-        base kernel used in each iteration of WL kernel. The default base
-        kernel is subtree kernel. For user-defined kernel, base_kernel is the
-        name of the base kernel function used in each iteration of WL kernel.
-        This function returns a Numpy matrix, each element of which is the
-        user-defined Weisfeiler-Lehman kernel between 2 graphs.
+        Base kernel used in each iteration of the WL kernel. Only the default
+        'subtree' kernel can be applied for now.
+        # The default base
+        # kernel is subtree kernel. For user-defined kernel, base_kernel is the
+        # name of the base kernel function used in each iteration of WL kernel.
+        # This function returns a Numpy matrix, each element of which is the
+        # user-defined Weisfeiler-Lehman kernel between 2 graphs.
+    parallel : None
+        Which parallelization method is applied to compute the kernel. No
+        parallelization can be applied for now.
+    n_jobs : int
+        Number of jobs for parallelization. The default is to use all
+        computational cores. This argument is only valid when one of the
+        parallelization methods is applied and can be ignored for now.
     Return
     ------
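One relabeling round of the 'subtree' base kernel can be sketched as follows: each node's new label compresses its old label together with the sorted multiset of its neighbors' labels, and the kernel then counts matching labels across graphs at every height. A minimal illustration of the compression step, not the library's implementation:

```python
# One Weisfeiler-Lehman relabeling iteration on an adjacency-list graph.
def wl_iteration(adj, labels):
    # signature = (own label, sorted multiset of neighbor labels)
    signatures = [(labels[v], tuple(sorted(labels[u] for u in adj[v])))
                  for v in range(len(adj))]
    # compress each distinct signature to a fresh integer label
    compress = {sig: i for i, sig in enumerate(sorted(set(signatures)))}
    return [compress[sig] for sig in signatures]

adj = [[1], [0, 2], [1]]      # path a-b-a
labels = ['a', 'b', 'a']
print(wl_iteration(adj, labels))  # → [0, 1, 0]
```

The two end nodes get the same compressed label (same label, same neighborhood), while the center node gets a distinct one; `height` in the docstring is the number of such rounds.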