add description of package

master
lhenry15 4 years ago
commit d9d117387b
1 changed file with 1 addition and 33 deletions

README.md (+1, -33)

@@ -1,5 +1,5 @@
# Time-series Outlier Detection System
-TODS is a full-stack automated machine learning system for outlier detection on multivariate time-series data. TODS provides exhaustive modules for building machine learning-based outlier detection systems, including data processing, time series processing, feature analysis (extraction), detection algorithms, and a reinforcement module. The functionality provided by these modules includes general-purpose data preprocessing, time series smoothing and transformation, feature extraction from the time and frequency domains, various detection algorithms, and the involvement of human expertise to calibrate the system. Specifically, three application scenarios on time-series outlier detection are provided: point-wise detection (time points as outliers), pattern-wise detection (subsequences as outliers), and system-wise detection (sets of time series as outliers), and a wide range of corresponding algorithms is provided in TODS. This package is developed by [DATA Lab @ Texas A&M University](https://people.engr.tamu.edu/xiahu/index.html).
+TODS is a full-stack automated machine learning system for outlier detection on multivariate time-series data. TODS provides exhaustive modules for building machine learning-based outlier detection systems, including data processing, time series processing, feature analysis (extraction), detection algorithms, and a reinforcement module. The functionality provided by these modules includes general-purpose data preprocessing, time series smoothing and transformation, feature extraction from the time and frequency domains, various detection algorithms, and the involvement of human expertise to calibrate the system. Three common outlier detection scenarios on time-series data can be performed: point-wise detection (time points as outliers), pattern-wise detection (subsequences as outliers), and system-wise detection (sets of time series as outliers), and a wide range of corresponding algorithms is provided in TODS. This package is developed by [DATA Lab @ Texas A&M University](https://people.engr.tamu.edu/xiahu/index.html).

TODS features:
* **Full Stack Machine Learning System** which supports exhaustive components, from preprocessing and feature extraction to detection algorithms and a human-in-the-loop interface.
@@ -9,13 +9,6 @@ TODS is featured for:
* **Automated Machine Learning** aims to provide a knowledge-free process that constructs an optimal pipeline for the given data by automatically searching for the best combination of the existing modules.


-## Axolotl
-Running pre-defined pipeline
-```
-python examples/build_AutoEncoder_pipeline.py
-python examples/run_predefined_pipeline.py
-```
-
## Installation

This package works with **Python 3.6** and pip 19+. You need to have the following packages installed on the system (for Debian/Ubuntu):
@@ -115,31 +108,6 @@ best_output = best_pipeline_result.output
# Evaluate the best pipeline
best_scores = search.evaluate(best_pipeline).scores
```

-# Dataset
-Datasets are located in `datasets/anomaly`. `raw_data` is the raw time series data. `transform.py` is a script that transforms the raw data to D3M format. `template` includes some templates for generating D3M data. If you run `transform.py`, the script will load the raw `kpi` data and create a folder named `kpi` in D3M format.
-
-The generated CSV file will have the following columns: `d3mIndex`, `timestamp`, `value`, `ground_truth`. In the example kpi dataset, there is only one value. For other datasets there could be multiple values. The goal of the pipeline is to predict `ground_truth` based on `timestamp` and the value(s).
-
-There is a script to check whether the dataset is in the right format. Run
-```
-python3 datasets/validate.py datasets/anomaly/kpi/
-```
-The expected output is as follows:
-```
-Validating problem '/home/grads/d/daochen/tods/tods/datasets/anomaly/kpi/SCORE/problem_TEST/problemDoc.json'.
-Validating dataset '/home/grads/d/daochen/tods/tods/datasets/anomaly/kpi/SCORE/dataset_TEST/datasetDoc.json'.
-Validating problem '/home/grads/d/daochen/tods/tods/datasets/anomaly/kpi/kpi_problem/problemDoc.json'.
-Validating problem '/home/grads/d/daochen/tods/tods/datasets/anomaly/kpi/TEST/problem_TEST/problemDoc.json'.
-Validating dataset '/home/grads/d/daochen/tods/tods/datasets/anomaly/kpi/TEST/dataset_TEST/datasetDoc.json'.
-Validating dataset '/home/grads/d/daochen/tods/tods/datasets/anomaly/kpi/kpi_dataset/datasetDoc.json'.
-Validating dataset '/home/grads/d/daochen/tods/tods/datasets/anomaly/kpi/TRAIN/dataset_TRAIN/datasetDoc.json'.
-Validating problem '/home/grads/d/daochen/tods/tods/datasets/anomaly/kpi/TRAIN/problem_TRAIN/problemDoc.json'.
-Validating all datasets and problems.
-There are no errors.
-```
-Of course, you can also create other datasets with `transform.py`. But for now, we can focus on this example dataset since other datasets are usually in the same format.
-
# Example
In D3M, our goal is to provide a **solution** to a **problem** on a **dataset**. Here, a solution is a pipeline that consists of data processing, classifiers, etc.


