pygraph.utils package¶

Submodules¶

pygraph.utils.graphdataset module¶

Obtain all kinds of attributes of a graph dataset.

pygraph.utils.graphdataset.get_dataset_attributes(Gn, target=None, attr_names=[], node_label=None, edge_label=None)[source]¶

Returns the structure and property information of the graph dataset Gn.

Gn : List of NetworkX graph

List of graphs whose information will be returned.

target : list

The list of classification targets corresponding to Gn. Only works for classification problems.

attr_names : list

List of strings indicating which information will be returned. The possible choices include:

‘substructures’: sub-structures Gn contains, including ‘linear’, ‘non linear’ and ‘cyclic’.
‘node_labeled’: whether vertices have symbolic labels.
‘edge_labeled’: whether edges have symbolic labels.
‘is_directed’: whether graphs in Gn are directed.
‘dataset_size’: number of graphs in Gn.
‘ave_node_num’: average number of vertices of graphs in Gn.
‘min_node_num’: minimum number of vertices of graphs in Gn.
‘max_node_num’: maximum number of vertices of graphs in Gn.
‘ave_edge_num’: average number of edges of graphs in Gn.
‘min_edge_num’: minimum number of edges of graphs in Gn.
‘max_edge_num’: maximum number of edges of graphs in Gn.
‘ave_node_degree’: average vertex degree of graphs in Gn.
‘min_node_degree’: minimum vertex degree of graphs in Gn.
‘max_node_degree’: maximum vertex degree of graphs in Gn.
‘ave_fill_factor’: average fill factor (number_of_edges / (number_of_nodes ** 2)) of graphs in Gn.
‘min_fill_factor’: minimum fill factor of graphs in Gn.
‘max_fill_factor’: maximum fill factor of graphs in Gn.
‘node_label_num’: number of symbolic vertex labels.
‘edge_label_num’: number of symbolic edge labels.
‘node_attr_dim’: number of dimensions of non-symbolic vertex labels. Extracted from the ‘attributes’ attribute of graph nodes.
‘edge_attr_dim’: number of dimensions of non-symbolic edge labels. Extracted from the ‘attributes’ attribute of graph edges.
‘class_number’: number of classes. Only available for classification problems.

node_label : string

Node attribute used as label. The default node label is atom. Mandatory when ‘node_labeled’ or ‘node_label_num’ is required.

edge_label : string

Edge attribute used as label. The default edge label is bond_type. Mandatory when ‘edge_labeled’ or ‘edge_label_num’ is required.

attrs : dict: Value for each property.
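As a rough illustration of how a few of these statistics are defined, the sketch below computes them from plain (number_of_nodes, number_of_edges) pairs instead of NetworkX graphs. The helper name `summarize_sizes` and its input format are assumptions made for this example; it is not the pygraph implementation.

```python
from statistics import mean


def summarize_sizes(sizes):
    """Compute size statistics from (number_of_nodes, number_of_edges) pairs.

    Hypothetical helper: it mirrors the definitions listed above but is not
    the pygraph implementation.
    """
    node_nums = [n for n, _ in sizes]
    edge_nums = [m for _, m in sizes]
    # Fill factor as defined above: number_of_edges / number_of_nodes ** 2.
    fill_factors = [m / (n ** 2) for n, m in sizes]
    return {
        'dataset_size': len(sizes),
        'ave_node_num': mean(node_nums),
        'min_node_num': min(node_nums),
        'max_node_num': max(node_nums),
        'ave_edge_num': mean(edge_nums),
        'ave_fill_factor': mean(fill_factors),
        'min_fill_factor': min(fill_factors),
        'max_fill_factor': max(fill_factors),
    }


# Three toy graphs: a 3-node path, a 4-node cycle, a single 2-node edge.
attrs = summarize_sizes([(3, 2), (4, 4), (2, 1)])
```

The returned dictionary uses the same keys as the corresponding entries of attr_names, so it can be read the same way as the attrs value described above.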

pygraph.utils.graphfiles module¶

Utility functions to manage graph files.

pygraph.utils.graphfiles.loadCT(filename)[source]¶

Load data from a Chemical Table (.ct) file.

A typical example of data in a .ct file looks like this:

3 2 <- number of nodes and edges
0.0000 0.0000 0.0000 C <- each line describes a node (x, y, z + label)
0.0000 0.0000 0.0000 C
0.0000 0.0000 0.0000 O
1 3 1 1 <- each line describes an edge: to, from, bond type, bond stereo
2 3 1 1

See https://www.daylight.com/meetings/mug05/Kappler/ctfile.pdf for a detailed format description.
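The layout in the example above can be read with a few lines of plain Python. This is only a sketch under the assumption that the first non-empty line holds the node and edge counts, exactly as in the example; real .ct files may carry additional header lines or fields, and `parse_ct_text` is a hypothetical name, not the loadCT implementation.

```python
def parse_ct_text(text):
    """Parse the simplified .ct layout shown in the example.

    Hypothetical helper, assuming the first non-empty line holds the node and
    edge counts. Real .ct files may carry extra header lines or fields.
    """
    lines = [ln for ln in text.splitlines() if ln.strip()]
    n_nodes, n_edges = (int(tok) for tok in lines[0].split()[:2])
    nodes = []
    for ln in lines[1:1 + n_nodes]:
        # Each node line: x, y, z coordinates followed by a symbolic label.
        x, y, z, label = ln.split()[:4]
        nodes.append({'x': float(x), 'y': float(y), 'z': float(z),
                      'label': label})
    edges = []
    for ln in lines[1 + n_nodes:1 + n_nodes + n_edges]:
        # Each edge line: two node indices, bond type, bond stereo.
        a, b, bond_type, bond_stereo = (int(tok) for tok in ln.split()[:4])
        edges.append({'nodes': (a, b), 'bond_type': bond_type,
                      'bond_stereo': bond_stereo})
    return nodes, edges


sample = """3 2
0.0000 0.0000 0.0000 C
0.0000 0.0000 0.0000 C
0.0000 0.0000 0.0000 O
1 3 1 1
2 3 1 1
"""
nodes, edges = parse_ct_text(sample)
```

Node indices in the edge lines are 1-based in the example, so a caller building a graph from this output would need to decide whether to keep them or shift them to 0-based indices.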

pygraph.utils.graphfiles.loadDataset(filename, filename_y=None, extra_params=None)[source]¶

Read graph data from filename and load them as NetworkX graphs.

filename : string: The name of the file from where the dataset is read.
filename_y : string: The name of file of the targets corresponding to graphs.
extra_params : dict: Extra parameters only designated to ‘.mat’ format.

data : List of NetworkX graph.
y : List: Targets corresponding to graphs.

This function supports the following graph dataset formats:

‘ds’: load data from a .ds file. See the comments of function loadFromDS for an example.
‘cxl’: load data from a Graph eXchange Language file (.cxl file). See http://www.gupro.de/GXL/Introduction/background.html, 2019 for details.
‘sdf’: load data from a structured data file (.sdf file). See http://www.nonlinear.com/progenesis/sdf-studio/v0.9/faq/sdf-file-format-guidance.aspx, 2018 for details.
‘mat’: load graph data from a MATLAB (up to version 7.1) .mat file. See the README in the downloadable file at http://mlcb.is.tuebingen.mpg.de/Mitarbeiter/Nino/WL/, 2018 for details.
‘txt’: load graph data from a special .txt file. See https://ls11-www.cs.tu-dortmund.de/staff/morris/graphkerneldatasets, 2019 for details. Note that here filename is the name of any one of the .txt files in the dataset directory.

pygraph.utils.graphfiles.loadFromDS(filename, filename_y)[source]¶

Load data from a .ds file. Possible graph formats include:

‘.ct’: see function loadCT for details.
‘.gxl’: see function loadGXL for details.

Note that these graph formats are detected automatically from the extensions of the graph files.

pygraph.utils.graphfiles.loadFromXML(filename, extra_params)[source]¶

pygraph.utils.graphfiles.loadGXL(filename)[source]¶

pygraph.utils.graphfiles.loadMAT(filename, extra_params)[source]¶

Load graph data from a MATLAB (up to version 7.1) .mat file.

A MAT file contains a struct array containing graphs, and a column vector lx containing a class label for each graph. Check README in downloadable file in http://mlcb.is.tuebingen.mpg.de/Mitarbeiter/Nino/WL/, 2018 for detailed structure.

pygraph.utils.graphfiles.loadSDF(filename)[source]¶

Load data from a structured data file (.sdf file).

An SDF file contains a group of molecules, represented in a way similar to the MOL format. See http://www.nonlinear.com/progenesis/sdf-studio/v0.9/faq/sdf-file-format-guidance.aspx, 2018 for the detailed structure.

pygraph.utils.graphfiles.loadTXT(dirname_dataset)[source]¶

Load graph data from a .txt file.

The graph data is loaded from separate files. Check README in downloadable file http://tiny.cc/PK_MLJ_data, 2018 for detailed structure.

pygraph.utils.graphfiles.saveDataset(Gn, y, gformat='gxl', group=None, filename='gfile', xparams=None)[source]¶

Save list of graphs.

pygraph.utils.graphfiles.saveGXL(graph, filename, method='benoit')[source]¶

pygraph.utils.ipython_log module¶

pygraph.utils.isNotebook module¶

Functions for the Python system.

pygraph.utils.isNotebook.isNotebook()[source]¶

Check if code is executed in the IPython notebook.

pygraph.utils.kernels module¶

Kernels that are not graph kernels. They can be kernels for nodes or edges. These kernels are defined between pairs of vectors.

pygraph.utils.kernels.deltakernel(x, y)[source]¶

Delta kernel. Return 1 if x == y, 0 otherwise.

x, y : any: Two parts to compare.

kernel : integer: Delta kernel.

[1] H. Kashima, K. Tsuda, and A. Inokuchi. Marginalized kernels between labeled graphs. In Proceedings of the 20th International Conference on Machine Learning, Washington, DC, United States, 2003.
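A minimal sketch of the delta kernel as described above, applied to symbolic node labels. The function name `delta_kernel` and the toy label lists are assumptions for illustration only.

```python
def delta_kernel(x, y):
    """Return 1 if x == y, 0 otherwise."""
    return 1 if x == y else 0


# Count matching label pairs between the nodes of two toy graphs.
labels_g1 = ['C', 'C', 'O']
labels_g2 = ['C', 'N']
matches = sum(delta_kernel(a, b) for a in labels_g1 for b in labels_g2)
```

Because the comparison is plain equality, the same function works for any hashable or comparable label type, such as strings, integers, or tuples.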

pygraph.utils.kernels.gaussiankernel(x, y, gamma=None)[source]¶

Gaussian kernel. Compute the RBF (Gaussian) kernel between x and y:

K(x, y) = exp(-gamma ||x-y||^2).
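The formula can be sketched with the standard library alone. The fallback used when gamma is None (one over the number of dimensions) is an assumption made for this example and is not stated in the text above; `gaussian_kernel` is likewise a hypothetical name.

```python
import math


def gaussian_kernel(x, y, gamma=None):
    """K(x, y) = exp(-gamma * ||x - y||^2) for two equal-length sequences."""
    if len(x) != len(y):
        raise ValueError('x and y must have the same length')
    if gamma is None:
        # Assumed default (1 / number of dimensions); not confirmed above.
        gamma = 1.0 / len(x)
    squared_distance = sum((float(a) - float(b)) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * squared_distance)


# Identical vectors have squared distance 0.
k_same = gaussian_kernel([1.0, 2.0], [1.0, 2.0])
# Here ||x - y||^2 = 2, so the exponent is -0.5 * 2 = -1.
k_diff = gaussian_kernel([0.0, 0.0], [1.0, 1.0], gamma=0.5)
```

The kernel value is 1 for identical inputs and decays toward 0 as the distance grows, with gamma controlling how fast.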

pygraph.utils.logger2file module¶

Created on Fri Nov 8 14:21:25 2019

@author: ljia

class pygraph.utils.logger2file.Logger[source]¶

Bases: object

flush()[source]¶

write(message)[source]¶

pygraph.utils.model_selection_precomputed module¶

pygraph.utils.model_selection_precomputed.compute_gram_matrices(dataset, y, estimator, param_list_precomputed, results_dir, ds_name, n_jobs=1, str_fw='', verbose=True)[source]¶

pygraph.utils.model_selection_precomputed.model_selection_for_precomputed_kernel(datafile, estimator, param_grid_precomputed, param_grid, model_type, NUM_TRIALS=30, datafile_y=None, extra_params=None, ds_name='ds-unknown', n_jobs=1, read_gm_from_file=False, verbose=True)[source]¶

Perform model selection, fitting and testing for precomputed kernels using nested CV. Print out necessary data during the process, then finally the results.

datafile : string: Path of dataset file.
estimator : function: kernel function used to estimate. This function needs to return a gram matrix.
param_grid_precomputed : dictionary: Dictionary with names (string) of parameters used to calculate gram matrices as keys and lists of parameter settings to try as values. This enables searching over any sequence of parameter settings. Params with length 1 will be omitted.
param_grid : dictionary: Dictionary with names (string) of parameters used as penalties as keys and lists of parameter settings to try as values. This enables searching over any sequence of parameter settings. Params with length 1 will be omitted.
model_type : string: Type of the problem, can be ‘regression’ or ‘classification’.
NUM_TRIALS : integer: Number of random trials of outer cv loop. The default is 30.
datafile_y : string: Path of file storing y data. This parameter is optional depending on the given dataset file.
extra_params : dict: Extra parameters for loading dataset. See function pygraph.utils.graphfiles.loadDataset for details.
ds_name : string: Name of the dataset.
n_jobs : int: Number of jobs for parallelization.
read_gm_from_file : boolean: Whether gram matrices are loaded from a file.

>>> import numpy as np
>>> import sys
>>> sys.path.insert(0, "../")
>>> from pygraph.utils.model_selection_precomputed import model_selection_for_precomputed_kernel
>>> from pygraph.kernels.untilHPathKernel import untilhpathkernel
>>>
>>> datafile = '../datasets/MUTAG/MUTAG_A.txt'
>>> estimator = untilhpathkernel
>>> param_grid_precomputed = {'depth': np.linspace(1, 10, 10), 'k_func':
...     ['MinMax', 'tanimoto'], 'compute_method': ['trie']}
>>> # 'C' for classification problems and 'alpha' for regression problems.
>>> param_grid = [{'C': np.logspace(-10, 10, num=41, base=10)}, {'alpha':
...     np.logspace(-10, 10, num=41, base=10)}]
>>>
>>> model_selection_for_precomputed_kernel(datafile, estimator,
...     param_grid_precomputed, param_grid[0], 'classification', ds_name='MUTAG')
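To make the role of the parameter grids concrete, the sketch below expands a dictionary of the same shape as param_grid_precomputed into the individual settings that a grid search would try. `expand_grid` is a hypothetical helper written for this illustration; how pygraph enumerates the grid internally is not described here.

```python
from itertools import product


def expand_grid(param_grid):
    """List every combination of settings in a {name: [values]} dictionary."""
    names = list(param_grid)
    return [dict(zip(names, values))
            for values in product(*(param_grid[name] for name in names))]


grid = {'depth': [1, 2, 3],
        'k_func': ['MinMax', 'tanimoto'],
        'compute_method': ['trie']}
combos = expand_grid(grid)
```

With three depths, two values of k_func and one compute method, the grid contains 3 * 2 * 1 = 6 settings, each of which would require its own Gram matrix.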

pygraph.utils.model_selection_precomputed.parallel_trial_do(param_list_pre_revised, param_list, y, model_type, trial)[source]¶

pygraph.utils.model_selection_precomputed.printResultsInTable(param_list, param_list_pre_revised, average_val_scores, std_val_scores, average_perf_scores, std_perf_scores, average_train_scores, std_train_scores, gram_matrix_time, model_type, verbose)[source]¶

pygraph.utils.model_selection_precomputed.read_gram_matrices_from_file(results_dir, ds_name)[source]¶

pygraph.utils.model_selection_precomputed.trial_do(param_list_pre_revised, param_list, gram_matrices, y, model_type, trial)[source]¶

pygraph.utils.parallel module¶

Parallel aid functions.

Created on Tue Dec 11 11:39:46 2018

@author: ljia

pygraph.utils.parallel.parallel_gm(func, Kmatrix, Gn, init_worker=None, glbv=None, method='imap_unordered', n_jobs=None, chunksize=None, verbose=True)[source]¶

pygraph.utils.parallel.parallel_me(func, func_assign, var_to_assign, itr, len_itr=None, init_worker=None, glbv=None, method=None, n_jobs=None, chunksize=None, itr_desc='', verbose=True)[source]¶

pygraph.utils.trie module¶

Created on Wed Jan 30 10:48:49 2019

Trie (prefix tree)

@author: ljia

@references: https://viblo.asia/p/nlp-build-a-trie-data-structure-from-scratch-with-python-3P0lPzroKox, 2019.1

class pygraph.utils.trie.Trie[source]¶

Bases: object

deleteWord(word)[source]¶

getNode()[source]¶

insertWord(word)[source]¶

load_from_json(file_name)[source]¶

load_from_pickle(file_name)[source]¶

save_to_json(file_name)[source]¶

save_to_pickle(file_name)[source]¶

searchWord(word)[source]¶

searchWordPrefix(word)[source]¶

to_json()[source]¶
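The method list above gives no descriptions, so the following is only a generic prefix-tree sketch showing what insert, exact search and prefix search typically do. `SimpleTrie` and its method names are hypothetical and are not pygraph's Trie; the nested-dictionary layout is one possible choice among several.

```python
class SimpleTrie:
    """Minimal prefix tree; a hypothetical stand-in, not pygraph's Trie."""

    def __init__(self):
        self.root = {'children': {}, 'is_end': False}

    def insert_word(self, word):
        """Insert a sequence of symbols, creating nodes as needed."""
        node = self.root
        for symbol in word:
            node = node['children'].setdefault(
                symbol, {'children': {}, 'is_end': False})
        node['is_end'] = True

    def _walk(self, word):
        """Follow word from the root; return the last node or None."""
        node = self.root
        for symbol in word:
            node = node['children'].get(symbol)
            if node is None:
                return None
        return node

    def search_word(self, word):
        """True only if word was inserted as a complete word."""
        node = self._walk(word)
        return node is not None and node['is_end']

    def search_prefix(self, prefix):
        """True if any inserted word starts with prefix."""
        return self._walk(prefix) is not None


trie = SimpleTrie()
trie.insert_word(('C', 'C', 'O'))  # a label sequence along a path
trie.insert_word(('C', 'N'))
```

Words here are tuples of labels rather than strings, since any sequence of hashable symbols can be stored in a trie this way.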

pygraph.utils.utils module¶

pygraph.utils.utils.direct_product(G1, G2, node_label, edge_label)[source]¶

Return the direct/tensor product of directed graphs G1 and G2.

G1, G2 : NetworkX graph: The original graphs.
node_label : string: node attribute used as label. The default node label is ‘atom’.
edge_label : string: edge attribute used as label. The default edge label is ‘bond_type’.

gt : NetworkX graph: The direct product graph of G1 and G2.

This method differs from networkx.tensor_product in that this method only adds nodes and edges in G1 and G2 that have the same labels to the direct product graph.

[1] Thomas Gärtner, Peter Flach, and Stefan Wrobel. On graph kernels: Hardness results and efficient alternatives. Learning Theory and Kernel Machines, pages 129–143, 2003.
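The label-matching rule described above can be sketched with plain dictionaries in place of NetworkX graphs. The function name `labeled_direct_product` and the input layout ({node: label} and {(u, v): label}) are assumptions for this example; it is not the pygraph implementation.

```python
def labeled_direct_product(nodes1, edges1, nodes2, edges2):
    """Direct product restricted to matching labels.

    nodes*: {node: label}; edges*: {(u, v): label} for directed edges.
    Hypothetical plain-dict sketch, not the pygraph implementation.
    """
    # Keep only node pairs whose labels agree.
    prod_nodes = {(u1, u2): l1
                  for u1, l1 in nodes1.items()
                  for u2, l2 in nodes2.items() if l1 == l2}
    # Keep only edge pairs whose labels agree and whose end points survived.
    prod_edges = {}
    for (u1, v1), e1 in edges1.items():
        for (u2, v2), e2 in edges2.items():
            if e1 == e2 and (u1, u2) in prod_nodes and (v1, v2) in prod_nodes:
                prod_edges[((u1, u2), (v1, v2))] = e1
    return prod_nodes, prod_edges


nodes1 = {0: 'C', 1: 'O'}
edges1 = {(0, 1): 1}
nodes2 = {'a': 'C', 'b': 'O', 'c': 'C'}
edges2 = {('a', 'b'): 1, ('c', 'b'): 2}
prod_nodes, prod_edges = labeled_direct_product(nodes1, edges1, nodes2, edges2)
```

In this toy input the edge ('c', 'b') has a different label from (0, 1), so it contributes no edge to the product even though both of its end points pair up with nodes of the first graph.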

pygraph.utils.utils.floydTransformation(G, edge_weight=None)[source]¶

Transform graph G to its corresponding shortest-paths graph using Floyd-transformation.

G : NetworkX graph: The graph to be transformed.
edge_weight : string: edge attribute corresponding to the edge weight. The default edge weight is bond_type.

S : NetworkX graph: The shortest-paths graph corresponding to G.

[1] Borgwardt KM, Kriegel HP. Shortest-path kernels on graphs. In Data Mining, Fifth IEEE International Conference on 2005 Nov 27 (pp. 8-pp). IEEE.

pygraph.utils.utils.getSPGraph(G, edge_weight=None)[source]¶

Transform graph G to its corresponding shortest-paths graph.

G : NetworkX graph: The graph to be transformed.
edge_weight : string: edge attribute corresponding to the edge weight.

S : NetworkX graph: The shortest-paths graph corresponding to G.

For an input graph G, its corresponding shortest-paths graph S contains the same set of nodes as G, while there exists an edge between all nodes in S which are connected by a walk in G. Every edge in S between two nodes is labeled by the shortest distance between these two nodes.

[1] Borgwardt KM, Kriegel HP. Shortest-path kernels on graphs. In Data Mining, Fifth IEEE International Conference on 2005 Nov 27 (pp. 8-pp). IEEE.
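The construction of a shortest-paths graph can be sketched with the Floyd-Warshall algorithm on plain dictionaries. This is an illustration under assumptions: the graph is taken to be undirected, the input layout and the name `shortest_path_graph` are hypothetical, and the actual pygraph functions operate on NetworkX graphs.

```python
def shortest_path_graph(nodes, weighted_edges):
    """Return {(u, v): distance} for every connected pair of distinct nodes.

    nodes: iterable of node ids; weighted_edges: {(u, v): weight} for an
    undirected graph. Hypothetical plain-dict sketch using Floyd-Warshall.
    """
    nodes = list(nodes)
    inf = float('inf')
    dist = {(u, v): (0 if u == v else inf) for u in nodes for v in nodes}
    for (u, v), w in weighted_edges.items():
        dist[(u, v)] = min(dist[(u, v)], w)
        dist[(v, u)] = min(dist[(v, u)], w)
    # Floyd-Warshall relaxation over every intermediate node k.
    for k in nodes:
        for i in nodes:
            for j in nodes:
                if dist[(i, k)] + dist[(k, j)] < dist[(i, j)]:
                    dist[(i, j)] = dist[(i, k)] + dist[(k, j)]
    # Keep one entry per unordered pair that is actually connected.
    return {(u, v): dist[(u, v)]
            for idx, u in enumerate(nodes) for v in nodes[idx + 1:]
            if dist[(u, v)] < inf}


# Path 0 - 1 - 2 plus an isolated node 3, all edges of weight 1.
sp_edges = shortest_path_graph([0, 1, 2, 3], {(0, 1): 1, (1, 2): 1})
```

Nodes 0 and 2 are not adjacent in the input but are joined by a walk, so the result contains an edge between them labeled with distance 2, while the isolated node 3 receives no edges.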

pygraph.utils.utils.getSPLengths(G1)[source]¶

pygraph.utils.utils.get_edge_labels(Gn, edge_label)[source]¶

Get edge labels of dataset Gn.

pygraph.utils.utils.get_node_labels(Gn, node_label)[source]¶

Get node labels of dataset Gn.

pygraph.utils.utils.graph_deepcopy(G)[source]¶

Deep copy a graph, including deep copy of all nodes, edges and attributes of the graph, nodes and edges.

It is the same as the NetworkX function graph.copy(), as far as I know.

pygraph.utils.utils.graph_isIdentical(G1, G2)[source]¶

Check if two graphs are identical, including: same nodes, edges, node labels/attributes, edge labels/attributes.

The type of graphs has to be the same.
Global/Graph attributes are neglected as they may contain names for graphs.

pygraph.utils.utils.untotterTransformation(G, node_label, edge_label)[source]¶

Transform graph G according to Mahé et al.’s work to filter out tottering patterns of the marginalized kernel and the tree pattern kernel.

G : NetworkX graph: The graph to be transformed.
node_label : string: node attribute used as label. The default node label is ‘atom’.
edge_label : string: edge attribute used as label. The default edge label is ‘bond_type’.

gt : NetworkX graph: The transformed graph corresponding to G.

[1] Pierre Mahé, Nobuhisa Ueda, Tatsuya Akutsu, Jean-Luc Perret, and Jean-Philippe Vert. Extensions of marginalized graph kernels. In Proceedings of the twenty-first international conference on Machine learning, page 70. ACM, 2004.

Module contents¶

Pygraph - utils module

Implements some methods to manage graphs:

graphfiles.py : load .gxl and .ct files
utils.py : compute some properties on NetworkX graphs