
Upgrade to Python 3.7 and fix TensorFlow function conflicts

Commit 833d5b0314 by lhenry15, 3 years ago (branch: master)
16 changed files with 165 additions and 56 deletions
  1. README.md  +13 -8
  2. README.zh-CN.md  +113 -0
  3. examples/axolotl_interface/run_pipeline.py  +3 -3
  4. examples/axolotl_interface/run_search.py  +1 -1
  5. primitive_tests/test.sh  +2 -3
  6. replace.sh  +0 -9
  7. setup.py  +5 -4
  8. tods/detection_algorithm/DeepLog.py  +4 -4
  9. tods/detection_algorithm/LSTMODetect.py  +1 -1
  10. tods/detection_algorithm/PyodAE.py  +1 -1
  11. tods/detection_algorithm/Telemanom.py  +5 -5
  12. tods/detection_algorithm/core/LSTMOD.py  +2 -2
  13. tods/detection_algorithm/core/utils/modeling.py  +5 -5
  14. tods/reinforcement/RuleBasedFilter.py  +2 -2
  15. tods/searcher/brute_force_search.py  +1 -1
  16. tods/tests/run_tests.py  +7 -7

README.md (+13 -8)

@@ -4,6 +4,8 @@

[![Build Status](https://travis-ci.org/datamllab/tods.svg?branch=master)](https://travis-ci.org/datamllab/tods)

+ [中文文档](README.zh-CN.md)

TODS is a full-stack automated machine learning system for outlier detection on multivariate time-series data. TODS provides exhaustive modules for building machine learning-based outlier detection systems, including: data processing, time series processing, feature analysis (extraction), detection algorithms, and a reinforcement module. The functionalities provided via these modules include data preprocessing for general purposes, time series data smoothing/transformation, extracting features from time/frequency domains, various detection algorithms, and involving human expertise to calibrate the system. Three common outlier detection scenarios on time-series data can be performed: point-wise detection (time points as outliers), pattern-wise detection (subsequences as outliers), and system-wise detection (sets of time series as outliers), and a wide range of corresponding algorithms are provided in TODS. This package is developed by [DATA Lab @ Texas A&M University](https://people.engr.tamu.edu/xiahu/index.html).

TODS is featured for:
@@ -16,18 +18,21 @@ TODS is featured for:
## Resources
* API Documentations: [http://tods-doc.github.io](http://tods-doc.github.io)
* Paper: [https://arxiv.org/abs/2009.09822](https://arxiv.org/abs/2009.09822)
+ * Related Project: [AutoVideo: An Automated Video Action Recognition System](https://github.com/datamllab/autovideo)

## Cite this Work:
If you find this work useful, you may cite this work:
```
- @misc{lai2020tods,
- title={TODS: An Automated Time Series Outlier Detection System},
- author={Kwei-Harng Lai and Daochen Zha and Guanchu Wang and Junjie Xu and Yue Zhao and Devesh Kumar and Yile Chen and Purav Zumkhawaka and Minyang Wan and Diego Martinez and Xia Hu},
- year={2020},
- eprint={2009.09822},
- archivePrefix={arXiv},
- primaryClass={cs.DB}
+ @article{Lai_Zha_Wang_Xu_Zhao_Kumar_Chen_Zumkhawaka_Wan_Martinez_Hu_2021,
+ title={TODS: An Automated Time Series Outlier Detection System},
+ volume={35},
+ number={18},
+ journal={Proceedings of the AAAI Conference on Artificial Intelligence},
+ author={Lai, Kwei-Herng and Zha, Daochen and Wang, Guanchu and Xu, Junjie and Zhao, Yue and Kumar, Devesh and Chen, Yile and Zumkhawaka, Purav and Wan, Minyang and Martinez, Diego and Hu, Xia},
+ year={2021}, month={May},
+ pages={16060-16062}
}

```

## Installation
@@ -37,7 +42,7 @@ This package works with **Python 3.6** and pip 19+. You need to have the followi
sudo apt-get install libssl-dev libcurl4-openssl-dev libyaml-dev build-essential libopenblas-dev libcap-dev ffmpeg
```

- Clone the repository:
+ Clone the repository (if you are in China and GitHub is slow, you can use the mirror in [Gitee](https://gitee.com/daochenzha/tods)):
```
git clone https://github.com/datamllab/tods.git
```


README.zh-CN.md (+113 -0)

@@ -0,0 +1,113 @@

# TODS: Automated Time-series Outlier Detection System
<img width="500" src="./docs/img/tods_logo.png" alt="Logo" />

[![Build Status](https://travis-ci.org/datamllab/tods.svg?branch=master)](https://travis-ci.org/datamllab/tods)

[English README](README.md)

TODS is a full-stack automated machine learning system for outlier detection on multivariate time-series data. TODS provides exhaustive modules for building machine learning-based outlier detection systems, including: data processing, time series processing, feature analysis, detection algorithms, and a reinforcement module. The functionalities provided via these modules include general-purpose data preprocessing, time series smoothing or transformation, extracting features from time or frequency domains, a wide variety of detection algorithms, and bringing in human expertise to calibrate the system. The system handles three common outlier detection scenarios on time-series data: point-wise detection (time points as outliers), pattern-wise detection (subsequences as outliers), and system-wise detection (sets of time series as outliers), and TODS provides a corresponding family of algorithms for each. This package is developed by [DATA Lab @ Texas A&M University](https://people.engr.tamu.edu/xiahu/index.html).

TODS is featured for:
* **Full-Stack Machine Learning System**: supports every step from data preprocessing and feature extraction to detection algorithms and human-in-the-loop rules, and provides corresponding interfaces for each.

* **Broad Algorithm Support**: includes point-wise outlier detection algorithms from [PyOD](https://github.com/yzhao062/pyod), state-of-the-art pattern-wise (subsequence) detection algorithms such as [DeepLog](https://www.cs.utah.edu/~lifeifei/papers/deeplog.pdf) and [Telemanom](https://arxiv.org/pdf/1802.04431.pdf), and ensemble algorithms for system-wise detection.

* **Automated Machine Learning**: aims at a knowledge-free process that constructs the optimal pipeline for the given data by automatically searching for the best combination across all existing modules.

## Resources
* API documentation: [http://tods-doc.github.io](http://tods-doc.github.io)
* Paper: [https://arxiv.org/abs/2009.09822](https://arxiv.org/abs/2009.09822)
* Related project: [AutoVideo: An Automated Video Action Recognition System](https://github.com/datamllab/autovideo)

## Cite this Work:
If you find this work useful, you may cite it:
```
@misc{lai2020tods,
title={TODS: An Automated Time Series Outlier Detection System},
author={Kwei-Herng Lai and Daochen Zha and Guanchu Wang and Junjie Xu and Yue Zhao and Devesh Kumar and Yile Chen and Purav Zumkhawaka and Minyang Wan and Diego Martinez and Xia Hu},
year={2020},
eprint={2009.09822},
archivePrefix={arXiv},
primaryClass={cs.DB}
}
```

## Installation

This package works with **Python 3.6** and pip 19+. For Debian and Ubuntu users, the following packages need to be installed on your system:
```
sudo apt-get install libssl-dev libcurl4-openssl-dev libyaml-dev build-essential libopenblas-dev libcap-dev ffmpeg
```

Clone the repository (for users in China who find GitHub slow, the [Gitee mirror](https://gitee.com/daochenzha/tods) is available):
```
git clone https://github.com/datamllab/tods.git
```
Install locally with `pip`:
```
cd tods
pip install -e .
```

# Examples
Examples are available in [/examples](examples/). For basic usage, you can evaluate a pipeline on a given dataset. The example below shows how to load the default pipeline and evaluate it on a subset of the Yahoo dataset.
```python
import pandas as pd

from tods import schemas as schemas_utils
from tods import generate_dataset, evaluate_pipeline

table_path = 'datasets/anomaly/raw_data/yahoo_sub_5.csv'
target_index = 6 # what column is the target
metric = 'F1_MACRO' # F1 on both label 0 and 1

# Read data and generate dataset
df = pd.read_csv(table_path)
dataset = generate_dataset(df, target_index)

# Load the default pipeline
pipeline = schemas_utils.load_default_pipeline()

# Run the pipeline
pipeline_result = evaluate_pipeline(dataset, pipeline, metric)
print(pipeline_result)
```
We also provide AutoML support to help you automatically find the pipeline best suited to your data.
```python
import pandas as pd

from axolotl.backend.simple import SimpleRunner

from tods import generate_dataset, generate_problem
from tods.searcher import BruteForceSearch

# Some information
table_path = 'datasets/yahoo_sub_5.csv'
target_index = 6 # what column is the target
time_limit = 30 # How many seconds you wanna search
metric = 'F1_MACRO' # F1 on both label 0 and 1

# Read data and generate dataset and problem
df = pd.read_csv(table_path)
dataset = generate_dataset(df, target_index=target_index)
problem_description = generate_problem(dataset, metric)

# Start backend
backend = SimpleRunner(random_seed=0)

# Start search algorithm
search = BruteForceSearch(problem_description=problem_description,
                          backend=backend)

# Find the best pipeline
best_runtime, best_pipeline_result = search.search_fit(input_data=[dataset], time_limit=time_limit)
best_pipeline = best_runtime.pipeline
best_output = best_pipeline_result.output

# Evaluate the best pipeline
best_scores = search.evaluate(best_pipeline).scores
```
# Acknowledgement
We gratefully acknowledge the Data Driven Discovery of Models (D3M) program of DARPA.


examples/axolotl_interface/run_pipeline.py (+3 -3)

@@ -13,8 +13,8 @@ parser.add_argument('--table_path', type=str, default=default_data_path,
help='Input the path of the input data table')
parser.add_argument('--target_index', type=int, default=6,
help='Index of the ground truth (for evaluation)')
- parser.add_argument('--metric',type=str, default='F1_MACRO',
-                     help='Evaluation Metric (F1, F1_MACRO)')
+ parser.add_argument('--metric',type=str, default='ALL',
+                     help='Evaluation Metric (F1, F1_MACRO, RECALL, PRECISION, ALL)')
parser.add_argument('--pipeline_path',
default=os.path.join(this_path, './example_pipelines/autoencoder_pipeline.json'),
help='Input the path of the pre-built pipeline description')
@@ -35,6 +35,6 @@ pipeline = load_pipeline(pipeline_path)

# Run the pipeline
pipeline_result = evaluate_pipeline(dataset, pipeline, metric)
- print(pipeline_result)
+ print(pipeline_result.scores)
#raise pipeline_result.error[0]
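For context, a minimal sketch (not part of this commit) of the call run_pipeline.py makes, using the API from the README examples; the dataset path and target index are the script's defaults, and the reading that metric 'ALL' reports every listed score at once is inferred from the new help text:

```python
import pandas as pd

from tods import schemas as schemas_utils
from tods import generate_dataset, evaluate_pipeline

# Defaults mirrored from run_pipeline.py; adjust the path to your checkout.
df = pd.read_csv('datasets/anomaly/raw_data/yahoo_sub_5.csv')
dataset = generate_dataset(df, target_index=6)
pipeline = schemas_utils.load_default_pipeline()

# With 'ALL', .scores presumably carries F1, F1_MACRO, RECALL and
# PRECISION together, matching the updated --metric help string.
pipeline_result = evaluate_pipeline(dataset, pipeline, 'ALL')
print(pipeline_result.scores)
```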


examples/axolotl_interface/run_search.py (+1 -1)

@@ -9,7 +9,7 @@ from tods.searcher import BruteForceSearch
#table_path = 'datasets/NAB/realTweets/labeled_Twitter_volume_GOOG.csv' # The path of the dataset
#target_index = 2 # what column is the target

- table_path = 'datasets/yahoo_sub_5.csv'
+ table_path = '../../datasets/anomaly/raw_data/yahoo_sub_5.csv'
target_index = 6 # what column is the target
#table_path = 'datasets/NAB/realTweets/labeled_Twitter_volume_IBM.csv' # The path of the dataset
time_limit = 30 # How many seconds you wanna search


primitive_tests/test.sh (+2 -3)

@@ -1,9 +1,8 @@
#!/bin/bash

- #modules="data_processing timeseries_processing feature_analysis detection_algorithms reinforcement"
+ modules="data_processing timeseries_processing feature_analysis detection_algorithm reinforcement"
#modules="data_processing timeseries_processing"
+ modules="detection_algorithm"
- #test_scripts=$(ls primitive_tests | grep -v -f tested_file.txt)
- #modules="detection_algorithm"

for module in $modules
do


replace.sh (+0 -9)

@@ -1,9 +0,0 @@
- # !/bin/bash
-
- files=$(ls primitive_tests)
- for f in $files
- do
- f_path="./primitive_tests/"$f
- save_path="./new_tests/"$f
- cat $f_path | sed 's/d3m.primitives.data_transformation.dataset_to_dataframe.Common/d3m.primitives.tods.data_processing.dataset_to_dataframe/g'| sed 's/d3m.primitives.data_transformation.column_parser.Common/d3m.primitives.tods.data_processing.column_parser/g' | sed 's/d3m.primitives.data_transformation.extract_columns_by_semantic_types.Common/d3m.primitives.tods.data_processing.extract_columns_by_semantic_types/g' | sed 's/d3m.primitives.data_transformation.construct_predictions.Common/d3m.primitives.tods.data_processing.construct_predictions/g' > $save_path
- done

setup.py (+5 -4)

@@ -35,13 +35,14 @@ setup(
]
},
install_requires=[
- 'tamu_d3m',
- 'tamu_axolotl',
- 'Jinja2',
+ #'tamu_d3m',
+ #'tamu_axolotl',
+ #'Jinja2',
'numpy==1.18.2',
'combo',
'simplejson==3.12.0',
- 'scikit-learn==0.22.0',
+ #'scikit-learn==0.22.0',
+ 'scikit-learn',
'statsmodels==0.11.1',
'PyWavelets>=1.1.1',
'pillow==7.1.2',


tods/detection_algorithm/DeepLog.py (+4 -4)

@@ -7,10 +7,10 @@ import sklearn
import numpy
import typing
import numpy as np
- from keras.models import Sequential
- from keras.layers import Dense, Dropout , LSTM
- from keras.regularizers import l2
- from keras.losses import mean_squared_error
+ from tensorflow.keras.models import Sequential
+ from tensorflow.keras.layers import Dense, Dropout , LSTM
+ from tensorflow.keras.regularizers import l2
+ from tensorflow.keras.losses import mean_squared_error
from sklearn.preprocessing import StandardScaler
from sklearn.utils import check_array
from sklearn.utils.validation import check_is_fitted
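The same keras → tensorflow.keras swap recurs in Telemanom.py, core/LSTMOD.py, and core/utils/modeling.py below. As a hedged illustration (not code from the repo), here is how the migrated imports fit together; the layer sizes and window shape are invented for the example:

```python
import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout, LSTM
from tensorflow.keras.regularizers import l2

# A tiny stacked-LSTM regressor in the spirit of DeepLog's networks.
# Importing through tensorflow.keras avoids mixing the standalone keras
# package with the copy bundled in TensorFlow 2.x, which is what caused
# the function conflicts this commit fixes.
model = Sequential([
    LSTM(8, input_shape=(10, 1), return_sequences=True),
    Dropout(0.2),
    LSTM(8),
    Dense(1, kernel_regularizer=l2(0.01)),
])
model.compile(optimizer='adam', loss='mse')

X = np.random.rand(32, 10, 1)  # 32 windows of 10 time steps, 1 feature
y = np.random.rand(32, 1)
model.fit(X, y, epochs=1, verbose=0)
```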


tods/detection_algorithm/LSTMODetect.py (+1 -1)

@@ -196,7 +196,7 @@ class LSTMODetectorPrimitive(UnsupervisedOutlierDetectorBase[Inputs, Outputs, Pa
"python_path": "d3m.primitives.tods.detection_algorithm.LSTMODetector",
"source": {'name': "DATALAB @Taxes A&M University", 'contact': 'mailto:khlai037@tamu.edu',
'uris': ['https://gitlab.com/lhenry15/tods.git', 'https://gitlab.com/lhenry15/tods/-/blob/Junjie/anomaly-primitives/anomaly_primitives/LSTMOD.py']},
"algorithm_types": [metadata_base.PrimitiveAlgorithmType.ISOLATION_FOREST, ], # up to update
"algorithm_types": [metadata_base.PrimitiveAlgorithmType.TODS_PRIMITIVE ], # up to update
"primitive_family": metadata_base.PrimitiveFamily.ANOMALY_DETECTION,
"version": "0.0.1",
"hyperparams_to_tune": ['contamination', 'train_contamination', 'min_attack_time',


tods/detection_algorithm/PyodAE.py (+1 -1)

@@ -160,7 +160,7 @@ class Hyperparams(Hyperparams_ODBase):
contamination = hyperparams.Uniform(
lower=0.,
upper=0.5,
- default=0.1,
+ default=0.01,
description='The amount of contamination of the data set, i.e. the proportion of outliers in the data set. ',
semantic_types=['https://metadata.datadrivendiscovery.org/types/TuningParameter']
)
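The lowered default tightens the assumed outlier ratio from 10% to 1%. A short sketch of what `contamination` controls, assuming PyodAE wraps PyOD's AutoEncoder (the hyperparameter description matches PyOD's); the data and network sizes are illustrative:

```python
import numpy as np
from pyod.models.auto_encoder import AutoEncoder  # assumed underlying model

X = np.random.randn(500, 6).astype(np.float32)

# contamination fixes the decision threshold so that roughly this
# fraction of the training data is labeled as outliers.
clf = AutoEncoder(hidden_neurons=[4, 2, 2, 4], contamination=0.01,
                  epochs=5, verbose=0)
clf.fit(X)
print(clf.labels_.mean())  # close to 0.01
```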


tods/detection_algorithm/Telemanom.py (+5 -5)

@@ -9,11 +9,11 @@ import typing
import pandas as pd


- from keras.models import Sequential, load_model
- from keras.callbacks import History, EarlyStopping, Callback
- from keras.layers.recurrent import LSTM
- from keras.layers.core import Dense, Activation, Dropout
- from keras.layers import Flatten
+ from tensorflow.keras.models import Sequential, load_model
+ from tensorflow.keras.callbacks import History, EarlyStopping, Callback
+ from tensorflow.keras.layers import LSTM
+ from tensorflow.keras.layers import Dense, Activation, Dropout
+ from tensorflow.keras.layers import Flatten

from d3m import container, utils
from d3m.base import utils as base_ut


tods/detection_algorithm/core/LSTMOD.py (+2 -2)

@@ -11,8 +11,8 @@ from .CollectiveBase import CollectiveBaseDetector

# from tod.utility import get_sub_matrices

- from keras.layers import Dense, LSTM
- from keras.models import Sequential
+ from tensorflow.keras.layers import Dense, LSTM
+ from tensorflow.keras.models import Sequential

class LSTMOutlierDetector(CollectiveBaseDetector):



tods/detection_algorithm/core/utils/modeling.py (+5 -5)

@@ -1,8 +1,8 @@
- from keras.models import Sequential, load_model
- from keras.callbacks import History, EarlyStopping, Callback
- from keras.layers.recurrent import LSTM
- from keras.layers.core import Dense, Activation, Dropout
- from keras.layers import Flatten
+ from tensorflow.keras.models import Sequential, load_model
+ from tensorflow.keras.callbacks import History, EarlyStopping, Callback
+ from tensorflow.keras.layers import LSTM
+ from tensorflow.keras.layers import Dense, Activation, Dropout
+ from tensorflow.keras.layers import Flatten
import numpy as np
import os
import logging


tods/reinforcement/RuleBasedFilter.py (+2 -2)

@@ -115,8 +115,8 @@ class RuleBasedFilter(transformer.TransformerPrimitiveBase[Inputs, Outputs, Hype
"python_path": "d3m.primitives.tods.reinforcement.rule_filter",
"source": {'name': 'DATA Lab at Texas A&M University', 'contact': 'mailto:khlai037@tamu.edu',
'uris': ['https://gitlab.com/lhenry15/tods.git', ]},
"algorithm_types": [metadata_base.PrimitiveAlgorithmType.RULE_BASED_FILTER,],
"primitive_family": metadata_base.PrimitiveFamily.REINFORCEMENT,
"algorithm_types": [metadata_base.PrimitiveAlgorithmType.TODS_PRIMITIVE,],
"primitive_family": metadata_base.PrimitiveFamily.ANOMALY_DETECTION,
"id": "42744c37-8879-4785-9f18-6de9d612ea93",
"hyperparams_to_tune": ['rule',],
"version": "0.0.1",


tods/searcher/brute_force_search.py (+1 -1)

@@ -291,4 +291,4 @@ def _generate_pipelines(primitive_python_paths, cpu_count=40): # pragma: no cove
#for p in results:
# piplines.extend(p.get())

- return piplines
+ return piplines

tods/tests/run_tests.py (+7 -7)

@@ -4,11 +4,11 @@ import sys
import unittest

runner = unittest.TextTestRunner(verbosity=1)
- tests = unittest.TestLoader().discover('./')
- if not runner.run(tests).wasSuccessful():
-     sys.exit(1)
+ #tests = unittest.TestLoader().discover('./')
+ #if not runner.run(tests).wasSuccessful():
+ #    sys.exit(1)

- #for each in ['data_processing', 'timeseries_processing', 'feature_analysis', 'detection_algorithm']:
- #    tests = unittest.TestLoader().discover(each)
- #    if not runner.run(tests).wasSuccessful():
- #        sys.exit(1)
+ for each in ['data_processing', 'timeseries_processing', 'feature_analysis', 'detection_algorithm']:
+     tests = unittest.TestLoader().discover(each)
+     if not runner.run(tests).wasSuccessful():
+         sys.exit(1)
