pygraph.utils package

Submodules

pygraph.utils.graphdataset module

Obtain all kinds of attributes of a graph dataset.

pygraph.utils.graphdataset.get_dataset_attributes(Gn, target=None, attr_names=[], node_label=None, edge_label=None)[source]

Returns the structure and property information of the graph dataset Gn.

Gn : List of NetworkX graph
List of graphs whose information will be returned.
target : list
The list of classification targets corresponding to Gn. Only works for classification problems.
attr_names : list
List of strings indicating which information will be returned. The possible choices include:
‘substructures’: sub-structures Gn contains, including ‘linear’, ‘non linear’ and ‘cyclic’.
‘node_labeled’: whether vertices have symbolic labels.
‘edge_labeled’: whether edges have symbolic labels.
‘is_directed’: whether graphs in Gn are directed.
‘dataset_size’: number of graphs in Gn.
‘ave_node_num’: average number of vertices of graphs in Gn.
‘min_node_num’: minimum number of vertices of graphs in Gn.
‘max_node_num’: maximum number of vertices of graphs in Gn.
‘ave_edge_num’: average number of edges of graphs in Gn.
‘min_edge_num’: minimum number of edges of graphs in Gn.
‘max_edge_num’: maximum number of edges of graphs in Gn.
‘ave_node_degree’: average vertex degree of graphs in Gn.
‘min_node_degree’: minimum vertex degree of graphs in Gn.
‘max_node_degree’: maximum vertex degree of graphs in Gn.
‘ave_fill_factor’: average fill factor (number_of_edges / (number_of_nodes ** 2)) of graphs in Gn.
‘min_fill_factor’: minimum fill factor of graphs in Gn.
‘max_fill_factor’: maximum fill factor of graphs in Gn.
‘node_label_num’: number of symbolic vertex labels.
‘edge_label_num’: number of symbolic edge labels.
‘node_attr_dim’: number of dimensions of non-symbolic vertex labels. Extracted from the ‘attributes’ attribute of graph nodes.
‘edge_attr_dim’: number of dimensions of non-symbolic edge labels. Extracted from the ‘attributes’ attribute of graph edges.
‘class_number’: number of classes. Only available for classification problems.
node_label : string
Node attribute used as label. The default node label is atom. Mandatory when ‘node_labeled’ or ‘node_label_num’ is required.
edge_label : string
Edge attribute used as label. The default edge label is bond_type. Mandatory when ‘edge_labeled’ or ‘edge_label_num’ is required.
attrs : dict
Value for each property.
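
As a rough illustration of how a few of these attributes can be derived, the sketch below computes size statistics and the fill factor number_of_edges / (number_of_nodes ** 2). It is not the library's implementation: the function name basic_dataset_attributes and the (nodes, edges) tuple representation are assumptions used here in place of NetworkX graphs.

```python
from statistics import mean


def basic_dataset_attributes(graphs):
    """Compute a few size statistics for a list of graphs.

    Hypothetical helper: each graph is a (nodes, edges) pair, i.e. a list
    of node ids and a list of (u, v) tuples, standing in for NetworkX graphs.
    """
    node_nums = [len(nodes) for nodes, _ in graphs]
    edge_nums = [len(edges) for _, edges in graphs]
    # Fill factor as documented: number_of_edges / (number_of_nodes ** 2).
    fill_factors = [m / (n ** 2) for n, m in zip(node_nums, edge_nums)]
    return {
        'dataset_size': len(graphs),
        'ave_node_num': mean(node_nums),
        'min_node_num': min(node_nums),
        'max_node_num': max(node_nums),
        'ave_edge_num': mean(edge_nums),
        'ave_fill_factor': mean(fill_factors),
    }


toy_dataset = [
    ([0, 1, 2], [(0, 1), (1, 2)]),                     # path on 3 nodes
    ([0, 1, 2, 3], [(0, 1), (1, 2), (2, 3), (3, 0)]),  # cycle on 4 nodes
]
attrs = basic_dataset_attributes(toy_dataset)
```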

pygraph.utils.graphfiles module

Utility functions to manage graph files.

pygraph.utils.graphfiles.loadCT(filename)[source]

Load data from a Chemical Table (.ct) file.

A typical example of data in a .ct file looks like this:

3 2 <- number of nodes and edges
0.0000 0.0000 0.0000 C <- each line describes a node (x, y, z + label)
0.0000 0.0000 0.0000 C
0.0000 0.0000 0.0000 O
1 3 1 1 <- each line describes an edge: to, from, bond type, bond stereo
2 3 1 1

Check https://www.daylight.com/meetings/mug05/Kappler/ctfile.pdf for a detailed format description.
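
The layout in the example above can be parsed with a few lines of plain Python. The sketch below is an illustration only, not the code of loadCT: the function name parse_ct, the 1-based node numbering, and the returned dict/list structures are assumptions, and the explanatory "<-" annotations are not expected in the input.

```python
def parse_ct(text):
    """Parse the simplified .ct layout (counts line, node lines, edge lines).

    Hypothetical helper; nodes are numbered from 1 in file order, which is
    an assumption consistent with the edge lines of the example.
    """
    lines = [ln.split() for ln in text.strip().splitlines() if ln.strip()]
    n_nodes, n_edges = int(lines[0][0]), int(lines[0][1])

    nodes = {}
    for i, fields in enumerate(lines[1:1 + n_nodes], start=1):
        x, y, z, label = fields[:4]
        nodes[i] = {'label': label, 'pos': (float(x), float(y), float(z))}

    edges = []
    for fields in lines[1 + n_nodes:1 + n_nodes + n_edges]:
        u, v, bond_type, bond_stereo = (int(f) for f in fields[:4])
        edges.append((u, v, {'bond_type': bond_type,
                             'bond_stereo': bond_stereo}))
    return nodes, edges


example = """
3 2
0.0000 0.0000 0.0000 C
0.0000 0.0000 0.0000 C
0.0000 0.0000 0.0000 O
1 3 1 1
2 3 1 1
"""
nodes, edges = parse_ct(example)
```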

pygraph.utils.graphfiles.loadDataset(filename, filename_y=None, extra_params=None)[source]

Read graph data from filename and load them as NetworkX graphs.

filename : string
The name of the file from where the dataset is read.
filename_y : string
The name of the file containing the targets corresponding to the graphs.
extra_params : dict
Extra parameters only designated to ‘.mat’ format.

data : List of NetworkX graph.
y : List
Targets corresponding to graphs.

This function supports the following graph dataset formats:
‘ds’: load data from .ds file. See comments of function loadFromDS for an example.
‘cxl’: load data from Graph eXchange Language file (.cxl file).
‘sdf’: load data from structured data file (.sdf file). See http://www.nonlinear.com/progenesis/sdf-studio/v0.9/faq/sdf-file-format-guidance.aspx, 2018 for details.
‘mat’: load graph data from a MATLAB (up to version 7.1) .mat file. See README in downloadable file in http://mlcb.is.tuebingen.mpg.de/Mitarbeiter/Nino/WL/, 2018 for details.
‘txt’: load graph data from a special .txt file. See https://ls11-www.cs.tu-dortmund.de/staff/morris/graphkerneldatasets, 2019 for details. Note that here filename is the name of any of the .txt files in the dataset directory.
pygraph.utils.graphfiles.loadFromDS(filename, filename_y)[source]

Load data from .ds file. Possible graph formats include:

‘.ct’: see function loadCT for detail.
‘.gxl’: see function loadGXL for detail.

Note these graph formats are checked automatically by the extensions of graph files.

pygraph.utils.graphfiles.loadFromXML(filename, extra_params)[source]
pygraph.utils.graphfiles.loadGXL(filename)[source]
pygraph.utils.graphfiles.loadMAT(filename, extra_params)[source]

Load graph data from a MATLAB (up to version 7.1) .mat file.

A MAT file contains a struct array containing graphs, and a column vector lx containing a class label for each graph. Check README in downloadable file in http://mlcb.is.tuebingen.mpg.de/Mitarbeiter/Nino/WL/, 2018 for detailed structure.

pygraph.utils.graphfiles.loadSDF(filename)[source]

Load data from a structured data file (.sdf file).

An SDF file contains a group of molecules, represented in a similar way as in the MOL format. Check http://www.nonlinear.com/progenesis/sdf-studio/v0.9/faq/sdf-file-format-guidance.aspx, 2018 for detailed structure.

pygraph.utils.graphfiles.loadTXT(dirname_dataset)[source]

Load graph data from a .txt file.

The graph data is loaded from separate files. Check README in downloadable file http://tiny.cc/PK_MLJ_data, 2018 for detailed structure.

pygraph.utils.graphfiles.saveDataset(Gn, y, gformat='gxl', group=None, filename='gfile', xparams=None)[source]

Save list of graphs.

pygraph.utils.graphfiles.saveGXL(graph, filename, method='benoit')[source]

pygraph.utils.ipython_log module

pygraph.utils.isNotebook module

Functions for the Python system.

pygraph.utils.isNotebook.isNotebook()[source]

Check whether the code is executed in an IPython notebook.
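
A common way to perform such a check is to look at the class of the active IPython shell. The sketch below shows that idiom; whether isNotebook() is implemented this way is not stated here, and the function name is_notebook is chosen only for illustration.

```python
def is_notebook():
    """Return True when running inside a Jupyter notebook or qtconsole."""
    try:
        # get_ipython is injected into the namespace only when IPython runs.
        shell = get_ipython().__class__.__name__
    except NameError:
        return False  # standard Python interpreter
    # 'ZMQInteractiveShell' is the shell class used by Jupyter kernels;
    # a terminal IPython session reports 'TerminalInteractiveShell'.
    return shell == 'ZMQInteractiveShell'


running_in_notebook = is_notebook()
```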

pygraph.utils.kernels module

Kernels that are not graph kernels. They can be kernels for nodes or edges. These kernels are defined between pairs of vectors.

pygraph.utils.kernels.deltakernel(x, y)[source]

Delta kernel. Return 1 if x == y, 0 otherwise.

x, y : any
Two parts to compare.
kernel : integer
Delta kernel.

[1] H. Kashima, K. Tsuda, and A. Inokuchi. Marginalized kernels between labeled graphs. In Proceedings of the 20th International Conference on Machine Learning, Washington, DC, United States, 2003.
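
A minimal sketch of the delta kernel as described, applied to symbolic node labels. The name delta_kernel and the toy label list are illustrative assumptions.

```python
def delta_kernel(x, y):
    """Return 1 if x == y, 0 otherwise."""
    return 1 if x == y else 0


# Pairwise comparison of symbolic labels, e.g. atom types.
labels = ['C', 'O', 'C']
gram = [[delta_kernel(a, b) for b in labels] for a in labels]
```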

pygraph.utils.kernels.gaussiankernel(x, y, gamma=None)[source]

Gaussian kernel. Compute the rbf (gaussian) kernel between x and y:

K(x, y) = exp(-gamma ||x-y||^2).

Read more in the User Guide.

x, y : array

gamma : float, default None
If None, defaults to 1.0 / n_features

kernel : float
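
The formula above can be written out in plain Python as follows. This is a sketch under the stated definition, not the library's code; the name gaussian_kernel is an assumption, and n_features is taken to be the length of x.

```python
import math


def gaussian_kernel(x, y, gamma=None):
    """K(x, y) = exp(-gamma * ||x - y||^2) for two equal-length sequences."""
    if gamma is None:
        gamma = 1.0 / len(x)  # default: 1.0 / n_features
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)


k_same = gaussian_kernel([1.0, 2.0], [1.0, 2.0])  # identical inputs
k_diff = gaussian_kernel([0.0, 0.0], [1.0, 1.0])  # gamma = 0.5, ||x-y||^2 = 2
```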

pygraph.utils.kernels.kernelproduct(k1, k2, d11, d12, d21=None, d22=None, lamda=1)[source]

Product of a pair of kernels.

k = lamda * k1(d11, d12) * k2(d21, d22)

k1, k2 : function
A pair of kernel functions.
d11, d12:
Inputs of k1. If d21 or d22 is None, apply d11, d12 to both k1 and k2.
d21, d22:
Inputs of k2.
lamda: float
Coefficient of the product.

kernel : integer

pygraph.utils.kernels.kernelsum(k1, k2, d11, d12, d21=None, d22=None, lamda1=1, lamda2=1)[source]

Sum of a pair of kernels.

k = lamda1 * k1(d11, d12) + lamda2 * k2(d21, d22)

k1, k2 : function
A pair of kernel functions.
d11, d12:
Inputs of k1. If d21 or d22 is None, apply d11, d12 to both k1 and k2.
d21, d22:
Inputs of k2.
lamda1, lamda2: float
Coefficients of the sum.

kernel : integer
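
The two combinators, kernelproduct and kernelsum, can be sketched directly from their formulas. The code below is an illustration written from the documented behaviour (including the fallback to d11, d12 when d21 or d22 is None), not a copy of the library source; the helper kernels delta and dot are assumptions.

```python
def kernel_product(k1, k2, d11, d12, d21=None, d22=None, lamda=1):
    """k = lamda * k1(d11, d12) * k2(d21, d22)."""
    if d21 is None or d22 is None:
        d21, d22 = d11, d12  # reuse the inputs of k1 for k2
    return lamda * k1(d11, d12) * k2(d21, d22)


def kernel_sum(k1, k2, d11, d12, d21=None, d22=None, lamda1=1, lamda2=1):
    """k = lamda1 * k1(d11, d12) + lamda2 * k2(d21, d22)."""
    if d21 is None or d22 is None:
        d21, d22 = d11, d12  # reuse the inputs of k1 for k2
    return lamda1 * k1(d11, d12) + lamda2 * k2(d21, d22)


def delta(x, y):
    """Hypothetical kernel on symbolic labels."""
    return 1 if x == y else 0


def dot(x, y):
    """Hypothetical kernel on numeric attribute vectors."""
    return sum(a * b for a, b in zip(x, y))


# Compare two nodes by symbolic label (delta) and numeric attributes (dot).
p = kernel_product(delta, dot, 'C', 'C', [1.0, 2.0], [3.0, 4.0])
s = kernel_sum(delta, dot, 'C', 'O', [1.0, 2.0], [3.0, 4.0], lamda2=0.5)
```

Here p combines a matching label with an inner product of 11.0, while s adds a zero label term to half of that inner product.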

pygraph.utils.kernels.linearkernel(x, y)[source]

Linear kernel. Compute the linear kernel between x and y:

K(x, y) = <x, y>.

x, y : array

kernel : float

pygraph.utils.kernels.polynomialkernel(x, y, d=1, c=0)[source]

Polynomial kernel. Compute the polynomial kernel between x and y:

K(x, y) = <x, y> ^d + c.

x, y : array

d : integer, default 1

c : float, default 0

kernel : float
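
Written out from the formula as documented, K(x, y) = <x, y> ^d + c, the kernel looks like the sketch below. The name polynomial_kernel is an assumption. Note that in this form c is added after exponentiation, which differs from the also common definition (<x, y> + c)^d.

```python
def polynomial_kernel(x, y, d=1, c=0):
    """K(x, y) = <x, y> ** d + c for two equal-length sequences."""
    inner = sum(a * b for a, b in zip(x, y))
    return inner ** d + c


k_linear = polynomial_kernel([1, 2], [3, 4])          # d=1, c=0: plain <x, y>
k_quad = polynomial_kernel([1, 2], [3, 4], d=2, c=1)  # 11 ** 2 + 1
```

With the defaults d=1 and c=0 the result reduces to the inner product of x and y.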

pygraph.utils.logger2file module

Created on Fri Nov 8 14:21:25 2019

@author: ljia

class pygraph.utils.logger2file.Logger[source]

Bases: object

flush()[source]
write(message)[source]

pygraph.utils.model_selection_precomputed module

pygraph.utils.model_selection_precomputed.compute_gram_matrices(dataset, y, estimator, param_list_precomputed, results_dir, ds_name, n_jobs=1, str_fw='', verbose=True)[source]
pygraph.utils.model_selection_precomputed.model_selection_for_precomputed_kernel(datafile, estimator, param_grid_precomputed, param_grid, model_type, NUM_TRIALS=30, datafile_y=None, extra_params=None, ds_name='ds-unknown', n_jobs=1, read_gm_from_file=False, verbose=True)[source]

Perform model selection, fitting and testing for precomputed kernels using nested CV. Print out necessary data during the process, then finally the results.

datafile : string
Path of dataset file.
estimator : function
Kernel function used as the estimator. This function needs to return a gram matrix.
param_grid_precomputed : dictionary
Dictionary with names (string) of parameters used to calculate gram matrices as keys and lists of parameter settings to try as values. This enables searching over any sequence of parameter settings. Params with length 1 will be omitted.
param_grid : dictionary
Dictionary with names (string) of parameters used as penalties as keys and lists of parameter settings to try as values. This enables searching over any sequence of parameter settings. Params with length 1 will be omitted.
model_type : string
Type of the problem, can be ‘regression’ or ‘classification’.
NUM_TRIALS : integer
Number of random trials of outer cv loop. The default is 30.
datafile_y : string
Path of file storing y data. This parameter is optional depending on the given dataset file.
extra_params : dict
Extra parameters for loading dataset. See function pygraph.utils.graphfiles.loadDataset for detail.
ds_name : string
Name of the dataset.
n_jobs : int
Number of jobs for parallelization.
read_gm_from_file : boolean
Whether gram matrices are loaded from a file.
>>> import numpy as np
>>> import sys
>>> sys.path.insert(0, "../")
>>> from pygraph.utils.model_selection_precomputed import model_selection_for_precomputed_kernel
>>> from pygraph.kernels.untilHPathKernel import untilhpathkernel
>>>
>>> datafile = '../datasets/MUTAG/MUTAG_A.txt'
>>> estimator = untilhpathkernel
>>> param_grid_precomputed = {'depth': np.linspace(1, 10, 10), 'k_func':
...     ['MinMax', 'tanimoto'], 'compute_method': ['trie']}
>>> # 'C' for classification problems and 'alpha' for regression problems.
>>> param_grid = [{'C': np.logspace(-10, 10, num=41, base=10)}, {'alpha':
...     np.logspace(-10, 10, num=41, base=10)}]
>>>
>>> model_selection_for_precomputed_kernel(datafile, estimator,
...     param_grid_precomputed, param_grid[0], 'classification', ds_name='MUTAG')
pygraph.utils.model_selection_precomputed.parallel_trial_do(param_list_pre_revised, param_list, y, model_type, trial)[source]
pygraph.utils.model_selection_precomputed.printResultsInTable(param_list, param_list_pre_revised, average_val_scores, std_val_scores, average_perf_scores, std_perf_scores, average_train_scores, std_train_scores, gram_matrix_time, model_type, verbose)[source]
pygraph.utils.model_selection_precomputed.read_gram_matrices_from_file(results_dir, ds_name)[source]
pygraph.utils.model_selection_precomputed.trial_do(param_list_pre_revised, param_list, gram_matrices, y, model_type, trial)[source]

pygraph.utils.parallel module

Created on Tue Dec 11 11:39:46 2018

Parallel aid functions.

@author: ljia

pygraph.utils.parallel.parallel_gm(func, Kmatrix, Gn, init_worker=None, glbv=None, method='imap_unordered', n_jobs=None, chunksize=None, verbose=True)[source]
pygraph.utils.parallel.parallel_me(func, func_assign, var_to_assign, itr, len_itr=None, init_worker=None, glbv=None, method=None, n_jobs=None, chunksize=None, itr_desc='', verbose=True)[source]

pygraph.utils.trie module

Created on Wed Jan 30 10:48:49 2019

Trie (prefix tree).

@author: ljia

class pygraph.utils.trie.Trie[source]

Bases: object

deleteWord(word)[source]
getNode()[source]
insertWord(word)[source]
load_from_json(file_name)[source]
load_from_pickle(file_name)[source]
save_to_json(file_name)[source]
save_to_pickle(file_name)[source]
searchWord(word)[source]
searchWordPrefix(word)[source]
to_json()[source]

pygraph.utils.utils module

pygraph.utils.utils.direct_product(G1, G2, node_label, edge_label)[source]

Return the direct/tensor product of directed graphs G1 and G2.

G1, G2 : NetworkX graph
The original graphs.
node_label : string
node attribute used as label. The default node label is ‘atom’.
edge_label : string
edge attribute used as label. The default edge label is ‘bond_type’.
gt : NetworkX graph
The direct product graph of G1 and G2.

This method differs from networkx.tensor_product in that this method only adds nodes and edges in G1 and G2 that have the same labels to the direct product graph.

[1] Thomas Gärtner, Peter Flach, and Stefan Wrobel. On graph kernels: Hardness results and efficient alternatives. Learning Theory and Kernel Machines, pages 129–143, 2003.
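
To make the label-matching rule concrete, the sketch below builds a direct product that keeps only node pairs and edge pairs with equal labels. It uses plain dictionaries instead of NetworkX graphs, and the function name labeled_direct_product and the data layout are assumptions, so it should be read as an illustration of the idea rather than as the code of direct_product.

```python
from itertools import product


def labeled_direct_product(nodes1, edges1, nodes2, edges2):
    """Direct product of two directed labeled graphs.

    nodes*: dict mapping node -> label.
    edges*: dict mapping (source, target) -> label.
    Only pairs with identical labels enter the product graph.
    """
    prod_nodes = {
        (u1, u2): l1
        for (u1, l1), (u2, l2) in product(nodes1.items(), nodes2.items())
        if l1 == l2
    }
    prod_edges = {}
    for ((u1, v1), e1), ((u2, v2), e2) in product(edges1.items(),
                                                  edges2.items()):
        # Both endpoints must exist in the product and edge labels must match.
        if e1 == e2 and (u1, u2) in prod_nodes and (v1, v2) in prod_nodes:
            prod_edges[((u1, u2), (v1, v2))] = e1
    return prod_nodes, prod_edges


nodes1 = {0: 'C', 1: 'O'}
edges1 = {(0, 1): 1}
nodes2 = {'a': 'C', 'b': 'O', 'c': 'C'}
edges2 = {('a', 'b'): 1, ('c', 'b'): 2}
prod_nodes, prod_edges = labeled_direct_product(nodes1, edges1, nodes2, edges2)
```

In this toy input the edge ('c', 'b') has a different label from (0, 1), so only one edge survives in the product even though three node pairs match.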

pygraph.utils.utils.floydTransformation(G, edge_weight=None)[source]

Transform graph G to its corresponding shortest-paths graph using Floyd-transformation.

G : NetworkX graph
The graph to be transformed.
edge_weight : string
Edge attribute corresponding to the edge weight. The default edge weight is bond_type.
S : NetworkX graph
The shortest-paths graph corresponding to G.

[1] Borgwardt KM, Kriegel HP. Shortest-path kernels on graphs. In Data Mining, Fifth IEEE International Conference on 2005 Nov 27 (pp. 8-pp). IEEE.

pygraph.utils.utils.getSPGraph(G, edge_weight=None)[source]

Transform graph G to its corresponding shortest-paths graph.

G : NetworkX graph
The graph to be transformed.
edge_weight : string
Edge attribute corresponding to the edge weight.
S : NetworkX graph
The shortest-paths graph corresponding to G.

For an input graph G, its corresponding shortest-paths graph S contains the same set of nodes as G, while there exists an edge between all nodes in S which are connected by a walk in G. Every edge in S between two nodes is labeled by the shortest distance between these two nodes.

[1] Borgwardt KM, Kriegel HP. Shortest-path kernels on graphs. In Data Mining, Fifth IEEE International Conference on 2005 Nov 27 (pp. 8-pp). IEEE.
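
The construction described above can be sketched with the Floyd-Warshall algorithm on a plain-Python graph. The function name shortest_paths_graph, the undirected edge dictionary, and the returned dictionary are assumptions for illustration; the library functions operate on NetworkX graphs instead.

```python
import math


def shortest_paths_graph(nodes, edges):
    """Return the edges of the shortest-paths graph of an undirected graph.

    nodes: list of node ids.
    edges: dict mapping (u, v) -> non-negative weight.
    Result: dict mapping (u, v) -> shortest distance, for every pair of
    distinct nodes connected by a walk in the input graph.
    """
    dist = {u: {v: (0 if u == v else math.inf) for v in nodes} for u in nodes}
    for (u, v), w in edges.items():
        dist[u][v] = min(dist[u][v], w)
        dist[v][u] = min(dist[v][u], w)

    # Floyd-Warshall: allow each node k in turn as an intermediate stop.
    for k in nodes:
        for i in nodes:
            for j in nodes:
                if dist[i][k] + dist[k][j] < dist[i][j]:
                    dist[i][j] = dist[i][k] + dist[k][j]

    sp_edges = {}
    for idx, u in enumerate(nodes):
        for v in nodes[idx + 1:]:
            if dist[u][v] < math.inf:  # skip unreachable pairs
                sp_edges[(u, v)] = dist[u][v]
    return sp_edges


nodes = [0, 1, 2, 3]
edges = {(0, 1): 1, (1, 2): 1}  # node 3 is isolated
sp = shortest_paths_graph(nodes, edges)
```

Nodes 0 and 2 are not adjacent in the input but are joined in the result with distance 2, while the isolated node 3 keeps no edges.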

pygraph.utils.utils.getSPLengths(G1)[source]
pygraph.utils.utils.get_edge_labels(Gn, edge_label)[source]

Get edge labels of dataset Gn.

pygraph.utils.utils.get_node_labels(Gn, node_label)[source]

Get node labels of dataset Gn.

pygraph.utils.utils.graph_deepcopy(G)[source]

Deep copy a graph, including deep copy of all nodes, edges and attributes of the graph, nodes and edges.

It is the same as the NetworkX function graph.copy(), as far as I know.

pygraph.utils.utils.graph_isIdentical(G1, G2)[source]

Check if two graphs are identical, including: same nodes, edges, node labels/attributes, edge labels/attributes.

  1. The type of graphs has to be the same.
  2. Global/Graph attributes are neglected as they may contain names for graphs.
pygraph.utils.utils.untotterTransformation(G, node_label, edge_label)[source]

Transform graph G according to Mahé et al.’s work to filter out tottering patterns of marginalized kernel and tree pattern kernel.

G : NetworkX graph
The graph to be transformed.
node_label : string
node attribute used as label. The default node label is ‘atom’.
edge_label : string
edge attribute used as label. The default edge label is ‘bond_type’.
gt : NetworkX graph
The transformed graph corresponding to G.

[1] Pierre Mahé, Nobuhisa Ueda, Tatsuya Akutsu, Jean-Luc Perret, and Jean-Philippe Vert. Extensions of marginalized graph kernels. In Proceedings of the twenty-first international conference on Machine learning, page 70. ACM, 2004.

Module contents

Pygraph - utils module

Implement some methods to manage graphs:
graphfiles.py : load .gxl and .ct files
utils.py : compute some properties on NetworkX graphs