
update readme

lhenry15, commit 75c7f477cc (master): README.md, +1, -57

# Time-series Outlier Detection System
TODS is a full-stack automated machine learning system for outlier detection on multivariate time-series data. TODS provides exhaustive modules for building machine learning-based outlier detection systems, including data processing, time series processing, feature analysis (extraction), detection algorithms, and a reinforcement module. The functionalities provided by these modules include data preprocessing for general purposes, time series data smoothing/transformation, extracting features from time/frequency domains, various detection algorithms, and involving human expertise to calibrate the system. Three common outlier detection scenarios on time-series data can be performed: point-wise detection (time points as outliers), pattern-wise detection (subsequences as outliers), and system-wise detection (sets of time series as outliers), and a wide range of corresponding algorithms is provided in TODS. This package is developed by [DATA Lab @ Texas A&M University](https://people.engr.tamu.edu/xiahu/index.html).

TODS is featured for:
* **Full Stack Machine Learning System** which supports exhaustive components from preprocessing, feature extraction, and detection algorithms, as well as a human-in-the-loop interface.
```
python3 -m d3m runtime fit-produce -p pipeline.yml -r datasets/anomaly/yahoo_sub
```
The above commands will generate two files, `results.csv` and `pipline_run.yml`.

# How to add a new primitive

For new primitives, put them in `/anomaly_primitives`. There is an example for isolation forest (however, it is essentially a RandomForest even though the name is IsolationForest; more effort is needed to change it to a real IsolationForest).

In addition to adding a new file, you need to register the primitive in `anomaly-primitives/setup.py` and rerun `pip install`.
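
For illustration, here is a minimal sketch of what that registration typically looks like (the entry name, module, and class below are assumptions, not necessarily the actual contents of the repo's `setup.py`). d3m discovers primitives through setuptools entry points in the `d3m.primitives` group:

```python
# anomaly-primitives/setup.py (sketch; package name, entry name, and class are illustrative)
from setuptools import setup, find_packages

setup(
    name='anomaly-primitives',
    version='0.0.1',
    packages=find_packages(),
    entry_points={
        # d3m discovers primitives registered under the 'd3m.primitives' group;
        # the key is the python_path without the leading 'd3m.primitives.' prefix.
        'd3m.primitives': [
            'anomaly_detection.isolation_forest.Algorithm = anomaly_primitives.SKIsolationForest:SKIsolationForest',
        ],
    },
)
```

After editing `setup.py`, reinstall the package (e.g. `pip install -e anomaly-primitives`, or your usual install command) so the new entry point is picked up.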

Use the following command to check whether your new primitives are registered:
```
python3 -m d3m index search
```
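
The same check can also be done from Python (a small sketch; the `python_path` below is an assumed example):

```python
# Sketch: confirm a primitive is discoverable through its entry point.
from d3m import index

python_path = 'd3m.primitives.anomaly_detection.isolation_forest.Algorithm'  # assumed example

registered = index.search()  # python paths of all discoverable primitives
print(python_path in registered)

primitive_class = index.get_primitive(python_path)  # also verifies the class loads
print(primitive_class)
```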

Test the new primitives:
```
python3 examples/build_iforest_pipline.py
```
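
The example script itself is not reproduced here; as a rough sketch of what such a script does (the primitive paths and step layout are assumptions), a pipeline description is assembled with the d3m API and written out for the runtime commands above:

```python
# Sketch of a pipeline-building script; the chosen primitives are illustrative.
from d3m import index
from d3m.metadata.base import ArgumentType
from d3m.metadata.pipeline import Pipeline, PrimitiveStep

pipeline = Pipeline()
pipeline.add_input(name='inputs')

# Step 0: convert the input Dataset into a DataFrame.
step_0 = PrimitiveStep(primitive=index.get_primitive(
    'd3m.primitives.data_transformation.dataset_to_dataframe.Common'))
step_0.add_argument(name='inputs', argument_type=ArgumentType.CONTAINER, data_reference='inputs.0')
step_0.add_output('produce')
pipeline.add_step(step_0)

# Step 1: the new detection primitive (this python_path is an assumption).
step_1 = PrimitiveStep(primitive=index.get_primitive(
    'd3m.primitives.anomaly_detection.isolation_forest.Algorithm'))
step_1.add_argument(name='inputs', argument_type=ArgumentType.CONTAINER, data_reference='steps.0.produce')
step_1.add_output('produce')
pipeline.add_step(step_1)

pipeline.add_output(name='output', data_reference='steps.1.produce')

# Save the description so it can be passed to `python3 -m d3m runtime fit-produce -p ...`.
with open('pipeline.yml', 'w') as f:
    f.write(pipeline.to_yaml())
```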

# Template for meta-data in primitives

* `__author__`: `DATA Lab at Texas A&M University`
* `name`: Just a name. Name your primitive with a few words
* `python_path`: This path should have **5** segments. The first two segments should be `d3m.primitives`. The third segment should be `anomaly_detection`, `data_preprocessing` or `feature_construction` (it should match `primitive_family`). The fourth segment should be your algorithm name, e.g., `isolation_forest`. Note that this name should also be added to [this file](d3m/d3m/metadata/primitive_names.py). The last segment should be one of `Preprocessing`, `Feature`, `Algorithm` (for now).
* `source`: `name` should be `DATA Lab at Texas A&M University`, `contact` should be `mailto:khlai037@tamu.edu`, `uris` should have `https://gitlab.com/lhenry15/tods.git` and the path to your .py file.
* `algorithm_types`: Name the algorithm type yourself and add it to [this file](d3m/d3m/metadata/schemas/v0/definitions.json#L1957). **Then reinstall d3m.** Fill this field with `metadata_base.PrimitiveAlgorithmType.YOUR_NAME`.
* `primitive_family`: For preprocessing primitives, use `metadata_base.PrimitiveFamily.DATA_PREPROCESSING`. For feature analysis primitives, use `metadata_base.PrimitiveFamily.FEATURE_CONSTRUCTION`. For anomaly detection primitives, use `metadata_base.PrimitiveFamily.ANOMALY_DETECTION`.
* `id`: Randomly generate one with `import uuid; uuid.uuid4()`
* `hyperparams_to_tune`: Specify which hyperparameters can be tuned in your primitive.
* `version`: `0.0.1`
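
Putting the fields above together, here is a minimal sketch of the metadata block as it might appear inside a primitive class (the id, name, hyperparameter, and algorithm type below are placeholders, not the actual SKIsolationForest values):

```python
# Sketch only: placeholder values. In a real primitive this dict is assigned as a
# class attribute, e.g. `metadata = metadata_base.PrimitiveMetadata({...})` in the class body.
import uuid

from d3m.metadata import base as metadata_base

metadata = metadata_base.PrimitiveMetadata({
    # In practice, generate the UUID once (`import uuid; uuid.uuid4()`) and hard-code the string.
    'id': str(uuid.uuid4()),
    'version': '0.0.1',
    'name': 'Fake Isolation Forest outlier detector',
    # 5 segments: d3m.primitives.<family>.<algorithm_name>.<Preprocessing|Feature|Algorithm>
    'python_path': 'd3m.primitives.anomaly_detection.isolation_forest.Algorithm',
    'source': {
        'name': 'DATA Lab at Texas A&M University',
        'contact': 'mailto:khlai037@tamu.edu',
        'uris': [
            'https://gitlab.com/lhenry15/tods.git',
            # plus the URI of your primitive's .py file
        ],
    },
    # Use the algorithm type you registered in definitions.json; RANDOM_FOREST is a
    # standard value used here only as a placeholder.
    'algorithm_types': [metadata_base.PrimitiveAlgorithmType.RANDOM_FOREST],
    'primitive_family': metadata_base.PrimitiveFamily.ANOMALY_DETECTION,
    # Hypothetical tunable hyperparameter name.
    'hyperparams_to_tune': ['n_estimators'],
})
```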

Notes:

1. `installation` is not required; we removed it.

2. Try to reinstall everything if it does not work.

3. An example of the fake Isolation Forest is [here](anomaly-primitives/anomaly_primitives/SKIsolationForest.py#L294).


## Resources of D3M

If you still have questions, you may refer to the following resources.

Dataset format: [https://gitlab.com/datadrivendiscovery/data-supply](https://gitlab.com/datadrivendiscovery/data-supply)

Instructions for creating primitives: [https://docs.datadrivendiscovery.org/v2020.1.9/interfaces.html](https://docs.datadrivendiscovery.org/v2020.1.9/interfaces.html)

We use a stable version of the d3m core package at [https://gitlab.com/datadrivendiscovery/d3m/-/tree/v2020.1.9](https://gitlab.com/datadrivendiscovery/d3m/-/tree/v2020.1.9).

The documentation is at [https://docs.datadrivendiscovery.org/](https://docs.datadrivendiscovery.org/).

The core package documentation is at [https://docs.datadrivendiscovery.org/v2020.1.9/index.html](https://docs.datadrivendiscovery.org/v2020.1.9/index.html).

The common-primitives package is at v0.8.0: [https://gitlab.com/datadrivendiscovery/common-primitives/-/tree/v0.8.0/common_primitives](https://gitlab.com/datadrivendiscovery/common-primitives/-/tree/v0.8.0/common_primitives)

The sklearn-wrap package uses the dist branch: [https://gitlab.com/datadrivendiscovery/sklearn-wrap/-/tree/dist](https://gitlab.com/datadrivendiscovery/sklearn-wrap/-/tree/dist)

There are other primitives developed by many universities, but they are not used in this repo. See [https://gitlab.com/datadrivendiscovery/primitives](https://gitlab.com/datadrivendiscovery/primitives).
