README for dataset NCI1


=== Usage ===

This folder contains the following comma-separated text files
(replace DS by the name of the dataset):

n = total number of nodes
m = total number of edges
N = number of graphs

(1) DS_A.txt (m lines)
    sparse (block diagonal) adjacency matrix for all graphs,
    each line is a (row, col) entry, i.e. a (node_id, node_id) pair

(2) DS_graph_indicator.txt (n lines)
    column vector of graph identifiers for all nodes of all graphs,
    the value in the i-th line is the graph_id of the node with node_id i

(3) DS_graph_labels.txt (N lines)
    class labels for all graphs in the dataset,
    the value in the i-th line is the class label of the graph with graph_id i

(4) DS_node_labels.txt (n lines)
    column vector of node labels,
    the value in the i-th line corresponds to the node with node_id i
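
The four required files can be combined into per-graph records. The following is a minimal sketch, not part of the dataset distribution. It assumes that node_id and graph_id both start at 1 (the "i-th line" convention above), that all labels are integers, and that the function names read_int_column, read_edges and load_required are hypothetical helpers introduced here for illustration.

```python
from pathlib import Path


def read_int_column(path):
    """Read a file with one integer per line, skipping blank lines."""
    return [int(line) for line in Path(path).read_text().splitlines() if line.strip()]


def read_edges(path):
    """Read DS_A.txt: one 'row, col' pair of node ids per line."""
    edges = []
    for line in Path(path).read_text().splitlines():
        if line.strip():
            row, col = line.split(",")
            edges.append((int(row), int(col)))
    return edges


def load_required(folder, ds):
    """Group node labels and edges by graph_id for the dataset prefix ds."""
    folder = Path(folder)
    edges = read_edges(folder / f"{ds}_A.txt")
    indicator = read_int_column(folder / f"{ds}_graph_indicator.txt")
    graph_labels = read_int_column(folder / f"{ds}_graph_labels.txt")
    node_labels = read_int_column(folder / f"{ds}_node_labels.txt")

    # Assumption: graph ids are 1-based, so line i holds the label of graph i.
    graphs = {
        gid: {"label": label, "nodes": {}, "edges": []}
        for gid, label in enumerate(graph_labels, start=1)
    }
    # Assumption: node ids are 1-based, so line i describes node_id i.
    for node_id, (gid, nlabel) in enumerate(zip(indicator, node_labels), start=1):
        graphs[gid]["nodes"][node_id] = nlabel
    # The adjacency matrix is block diagonal, so both endpoints of an edge
    # belong to the same graph; the row node decides which one.
    for row, col in edges:
        graphs[indicator[row - 1]]["edges"].append((row, col))
    return graphs
```

Calling load_required("path/to/NCI1", "NCI1") would return a dictionary keyed by graph_id, where each entry holds the graph's class label, a node_id-to-label mapping, and the list of (row, col) entries from DS_A.txt that fall inside that graph.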

There are OPTIONAL files if the respective information is available:

(5) DS_edge_labels.txt (m lines; same size as DS_A.txt)
    labels for the edges in DS_A.txt

(6) DS_edge_attributes.txt (m lines; same size as DS_A.txt)
    attributes for the edges in DS_A.txt

(7) DS_node_attributes.txt (n lines)
    matrix of node attributes,
    the comma-separated values in the i-th line are the attribute vector of the node with node_id i

(8) DS_graph_attributes.txt (N lines)
    regression values for all graphs in the dataset,
    the value in the i-th line is the attribute of the graph with graph_id i
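
Because files (5) to (8) may be absent, a loader has to probe for them rather than assume they exist. The sketch below is one hedged way to do that. The names OPTIONAL_SUFFIXES, read_rows and load_optional are hypothetical, and parsing every value as a float is a simplifying assumption (edge labels, for instance, may be better kept as integers).

```python
from pathlib import Path

# File name suffixes of the optional files (5) to (8) listed above.
OPTIONAL_SUFFIXES = (
    "edge_labels",
    "edge_attributes",
    "node_attributes",
    "graph_attributes",
)


def read_rows(path):
    """Parse each non-empty line into a list of floats (comma-separated)."""
    rows = []
    for line in Path(path).read_text().splitlines():
        if line.strip():
            rows.append([float(value) for value in line.split(",")])
    return rows


def load_optional(folder, ds):
    """Return {suffix: rows} for every optional file that exists in folder."""
    folder = Path(folder)
    found = {}
    for suffix in OPTIONAL_SUFFIXES:
        path = folder / f"{ds}_{suffix}.txt"
        if path.is_file():
            found[suffix] = read_rows(path)
    return found
```

The row order of each returned table follows the file it came from: edge files line up with the lines of DS_A.txt, node files with node_id, and graph files with graph_id.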

=== Description ===

NCI1 and NCI109 represent two balanced subsets of datasets of chemical compounds screened
for activity against non-small cell lung cancer and ovarian cancer cell lines, respectively
(Wale and Karypis (2006) and http://pubchem.ncbi.nlm.nih.gov).

=== Previous Use of the Dataset ===

Neumann, M., Garnett, R., Bauckhage, Ch., Kersting, K.: Propagation Kernels: Efficient Graph
Kernels from Propagated Information. Under review at MLJ.

Neumann, M., Patricia, N., Garnett, R., Kersting, K.: Efficient Graph Kernels by
Randomization. In: P.A. Flach, T. De Bie, N. Cristianini (eds.) ECML/PKDD, Lecture Notes in
Computer Science, vol. 7523, pp. 378-393. Springer (2012).

Shervashidze, N., Schweitzer, P., van Leeuwen, E., Mehlhorn, K., Borgwardt, K.:
Weisfeiler-Lehman Graph Kernels. Journal of Machine Learning Research 12, 2539-2561 (2011).

=== References ===

N. Wale and G. Karypis. Comparison of descriptor spaces for chemical compound retrieval and
classification. In Proc. of ICDM, pages 678-689, Hong Kong, 2006.