@@ -4,6 +4,8 @@
[![Build Status](https://travis-ci.org/datamllab/tods.svg?branch=master)](https://travis-ci.org/datamllab/tods)
[中文文档 (Chinese README)](README.zh-CN.md)
TODS is a full-stack automated machine learning system for outlier detection on multivariate time-series data. TODS provides comprehensive modules for building machine-learning-based outlier detection systems, including data processing, time series processing, feature analysis (extraction), detection algorithms, and a reinforcement module. Together these modules cover general-purpose data preprocessing, time series smoothing and transformation, feature extraction from the time and frequency domains, a variety of detection algorithms, and the incorporation of human expertise to calibrate the system. TODS supports three common outlier detection scenarios on time-series data: point-wise detection (time points as outliers), pattern-wise detection (subsequences as outliers), and system-wise detection (sets of time series as outliers), and provides a wide range of corresponding algorithms. This package is developed by [DATA Lab @ Texas A&M University](https://people.engr.tamu.edu/xiahu/index.html).
TODS features:
@@ -16,18 +18,21 @@ TODS features:
## Resources
* API documentation: [http://tods-doc.github.io](http://tods-doc.github.io)
* Paper: [https://arxiv.org/abs/2009.09822](https://arxiv.org/abs/2009.09822)
* Related project: [AutoVideo: An Automated Video Action Recognition System](https://github.com/datamllab/autovideo)
## Cite this Work:
If you find this work useful, you may cite it as follows:
```
@misc{lai2020tods,
  title={TODS: An Automated Time Series Outlier Detection System},
  author={Kwei-Herng Lai and Daochen Zha and Guanchu Wang and Junjie Xu and Yue Zhao and Devesh Kumar and Yile Chen and Purav Zumkhawaka and Minyang Wan and Diego Martinez and Xia Hu},
  year={2020},
  eprint={2009.09822},
  archivePrefix={arXiv},
  primaryClass={cs.DB}
@article{Lai_Zha_Wang_Xu_Zhao_Kumar_Chen_Zumkhawaka_Wan_Martinez_Hu_2021,
  title={TODS: An Automated Time Series Outlier Detection System},
  volume={35},
  number={18},
  journal={Proceedings of the AAAI Conference on Artificial Intelligence},
  author={Lai, Kwei-Herng and Zha, Daochen and Wang, Guanchu and Xu, Junjie and Zhao, Yue and Kumar, Devesh and Chen, Yile and Zumkhawaka, Purav and Wan, Minyang and Martinez, Diego and Hu, Xia},
  year={2021}, month={May},
  pages={16060-16062}
}
```
## Installation
@@ -37,7 +42,7 @@ This package works with **Python 3.6** and pip 19+. You need to have the following packages installed on the system (for Debian and Ubuntu):
```
sudo apt-get install libssl-dev libcurl4-openssl-dev libyaml-dev build-essential libopenblas-dev libcap-dev ffmpeg
```
Clone the repository:
Clone the repository (if you are in China and GitHub is slow, you can use the mirror on [Gitee](https://gitee.com/daochenzha/tods)):
```
git clone https://github.com/datamllab/tods.git
```
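Then install the package locally with `pip`:
```
cd tods
pip install -e .
```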
@@ -0,0 +1,113 @@
# TODS: Automated Time-series Outlier Detection System
<img width="500" src="./docs/img/tods_logo.png" alt="Logo" />
[![Build Status](https://travis-ci.org/datamllab/tods.svg?branch=master)](https://travis-ci.org/datamllab/tods)
[English README](README.md)
TODS is a full-stack automated machine learning system for outlier detection on multivariate time-series data. TODS provides comprehensive modules for building machine-learning-based outlier detection systems: data processing, time series processing, feature analysis, detection algorithms, and a reinforcement module. Together these modules cover common data preprocessing, time series smoothing and transformation, feature extraction from the time and frequency domains, a variety of detection algorithms, and calibration of the system by human experts. The system handles three common outlier detection scenarios on time-series data: point-wise detection (time points as outliers), pattern-wise detection (subsequences as outliers), and system-wise detection (sets of time series as outliers), and TODS provides a family of corresponding algorithms. The package is developed by [DATA Lab @ Texas A&M University](https://people.engr.tamu.edu/xiahu/index.html).
TODS features:
* **Full-stack machine learning system**: supports every step from data preprocessing and feature extraction to detection algorithms and human-defined rules, with corresponding interfaces for each.
* **Broad algorithm support**: includes point-wise outlier detection algorithms from [PyOD](https://github.com/yzhao062/pyod), state-of-the-art pattern-wise detection algorithms (e.g. [DeepLog](https://www.cs.utah.edu/~lifeifei/papers/deeplog.pdf), [Telemanom](https://arxiv.org/pdf/1802.04431.pdf)), and ensemble algorithms for system-wise detection.
* **Automated machine learning**: aims at a knowledge-free process that constructs the optimal pipeline for the given data by automatically searching for the best combination across all existing modules.
## Resources
* API documentation: [http://tods-doc.github.io](http://tods-doc.github.io)
* Paper: [https://arxiv.org/abs/2009.09822](https://arxiv.org/abs/2009.09822)
* Related project: [AutoVideo: An Automated Video Action Recognition System](https://github.com/datamllab/autovideo)
## Cite this Work:
If you find this work useful, you may cite it as follows:
```
@misc{lai2020tods,
  title={TODS: An Automated Time Series Outlier Detection System},
  author={Kwei-Herng Lai and Daochen Zha and Guanchu Wang and Junjie Xu and Yue Zhao and Devesh Kumar and Yile Chen and Purav Zumkhawaka and Minyang Wan and Diego Martinez and Xia Hu},
  year={2020},
  eprint={2009.09822},
  archivePrefix={arXiv},
  primaryClass={cs.DB}
}
```
## Installation
This package works with **Python 3.6** and pip 19+. Debian and Ubuntu users need to install the following system packages first:
```
sudo apt-get install libssl-dev libcurl4-openssl-dev libyaml-dev build-essential libopenblas-dev libcap-dev ffmpeg
```
Clone the repository (if GitHub is slow for you, users in mainland China can use the [Gitee mirror](https://gitee.com/daochenzha/tods)):
```
git clone https://github.com/datamllab/tods.git
```
Install locally with `pip`:
```
cd tods
pip install -e .
```
# Examples
Examples are available in [/examples](examples/). For basic usage, you can evaluate a pipeline on a given dataset. The example below shows how to load the default pipeline and evaluate it on a subset of the Yahoo dataset.
```python
import pandas as pd

from tods import schemas as schemas_utils
from tods import generate_dataset, evaluate_pipeline

table_path = 'datasets/anomaly/raw_data/yahoo_sub_5.csv'
target_index = 6  # which column is the target
metric = 'F1_MACRO'  # macro-averaged F1 over labels 0 and 1

# Read data and generate dataset
df = pd.read_csv(table_path)
dataset = generate_dataset(df, target_index)

# Load the default pipeline
pipeline = schemas_utils.load_default_pipeline()

# Run the pipeline
pipeline_result = evaluate_pipeline(dataset, pipeline, metric)
print(pipeline_result)
```
We also provide AutoML support to automatically find the pipeline best suited to your data.
```python
import pandas as pd

from axolotl.backend.simple import SimpleRunner

from tods import generate_dataset, generate_problem
from tods.searcher import BruteForceSearch

# Some information
table_path = 'datasets/yahoo_sub_5.csv'
target_index = 6  # which column is the target
time_limit = 30  # how many seconds to search
metric = 'F1_MACRO'  # macro-averaged F1 over labels 0 and 1

# Read data and generate dataset and problem
df = pd.read_csv(table_path)
dataset = generate_dataset(df, target_index=target_index)
problem_description = generate_problem(dataset, metric)

# Start backend
backend = SimpleRunner(random_seed=0)

# Start search algorithm
search = BruteForceSearch(problem_description=problem_description,
                          backend=backend)

# Find the best pipeline
best_runtime, best_pipeline_result = search.search_fit(input_data=[dataset], time_limit=time_limit)
best_pipeline = best_runtime.pipeline
best_output = best_pipeline_result.output

# Evaluate the best pipeline
best_scores = search.evaluate(best_pipeline).scores
```
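The discovered pipeline can also be re-scored with the stand-alone `evaluate_pipeline` helper from the first example; a minimal sketch, reusing the `dataset` and `metric` objects defined above:
```python
# Optional cross-check: evaluate the best pipeline found by the search.
from tods import evaluate_pipeline

best_pipeline_result = evaluate_pipeline(dataset, best_pipeline, metric)
print(best_pipeline_result.scores)
```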
# Acknowledgements
We gratefully acknowledge DARPA's Data Driven Discovery of Models (D3M) program.
@@ -13,8 +13,8 @@ parser.add_argument('--table_path', type=str, default=default_data_path,
                    help='Input the path of the input data table')
parser.add_argument('--target_index', type=int, default=6,
                    help='Index of the ground truth (for evaluation)')
parser.add_argument('--metric', type=str, default='F1_MACRO',
                    help='Evaluation Metric (F1, F1_MACRO)')
parser.add_argument('--metric', type=str, default='ALL',
                    help='Evaluation Metric (F1, F1_MACRO, RECALL, PRECISION, ALL)')
parser.add_argument('--pipeline_path',
                    default=os.path.join(this_path, './example_pipelines/autoencoder_pipeline.json'),
                    help='Input the path of the pre-built pipeline description')
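With this change, the example script reports all metrics by default. A hypothetical invocation (the script name `run_pipeline.py` is an assumption; the flags come from the argparse definitions above):
```
python run_pipeline.py --table_path datasets/anomaly/raw_data/yahoo_sub_5.csv --metric ALL
```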
@@ -35,6 +35,6 @@ pipeline = load_pipeline(pipeline_path)
# Run the pipeline
pipeline_result = evaluate_pipeline(dataset, pipeline, metric)
print(pipeline_result)
print(pipeline_result.scores)
#raise pipeline_result.error[0]
@@ -9,7 +9,7 @@ from tods.searcher import BruteForceSearch
#table_path = 'datasets/NAB/realTweets/labeled_Twitter_volume_GOOG.csv'  # The path of the dataset
#target_index = 2  # which column is the target
table_path = 'datasets/yahoo_sub_5.csv'
table_path = '../../datasets/anomaly/raw_data/yahoo_sub_5.csv'
target_index = 6  # which column is the target
#table_path = 'datasets/NAB/realTweets/labeled_Twitter_volume_IBM.csv'  # The path of the dataset
time_limit = 30  # how many seconds to search
@@ -1,9 +1,8 @@
#!/bin/bash
#modules="data_processing timeseries_processing feature_analysis detection_algorithms reinforcement"
modules="data_processing timeseries_processing feature_analysis detection_algorithm reinforcement"
#modules="data_processing timeseries_processing"
modules="detection_algorithm"
#test_scripts=$(ls primitive_tests | grep -v -f tested_file.txt)
#modules="detection_algorithm"
for module in $modules
do
@@ -1,9 +0,0 @@
# !/bin/bash
files=$(ls primitive_tests)
for f in $files
do
    f_path="./primitive_tests/"$f
    save_path="./new_tests/"$f
    cat $f_path | sed 's/d3m.primitives.data_transformation.dataset_to_dataframe.Common/d3m.primitives.tods.data_processing.dataset_to_dataframe/g' | sed 's/d3m.primitives.data_transformation.column_parser.Common/d3m.primitives.tods.data_processing.column_parser/g' | sed 's/d3m.primitives.data_transformation.extract_columns_by_semantic_types.Common/d3m.primitives.tods.data_processing.extract_columns_by_semantic_types/g' | sed 's/d3m.primitives.data_transformation.construct_predictions.Common/d3m.primitives.tods.data_processing.construct_predictions/g' > $save_path
done
@@ -35,13 +35,14 @@ setup(
        ]
    },
    install_requires=[
        'tamu_d3m',
        'tamu_axolotl',
        'Jinja2',
        #'tamu_d3m',
        #'tamu_axolotl',
        #'Jinja2',
        'numpy==1.18.2',
        'combo',
        'simplejson==3.12.0',
        'scikit-learn==0.22.0',
        #'scikit-learn==0.22.0',
        'scikit-learn',
        'statsmodels==0.11.1',
        'PyWavelets>=1.1.1',
        'pillow==7.1.2',
@@ -7,10 +7,10 @@ import sklearn
import numpy
import typing
import numpy as np
from keras.models import Sequential
from keras.layers import Dense, Dropout, LSTM
from keras.regularizers import l2
from keras.losses import mean_squared_error
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout, LSTM
from tensorflow.keras.regularizers import l2
from tensorflow.keras.losses import mean_squared_error
from sklearn.preprocessing import StandardScaler
from sklearn.utils import check_array
from sklearn.utils.validation import check_is_fitted
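These hunks migrate standalone `keras` imports to the `tensorflow.keras` namespace. A minimal sketch (assuming TensorFlow 2.x is installed; the layer sizes are illustrative, not taken from the diff) showing the migrated imports in use:
```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout, LSTM

# Tiny LSTM regressor: 10 time steps, 1 feature per step.
model = Sequential([
    LSTM(32, input_shape=(10, 1)),
    Dropout(0.2),
    Dense(1),
])
model.compile(optimizer='adam', loss='mean_squared_error')
model.summary()
```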
@@ -196,7 +196,7 @@ class LSTMODetectorPrimitive(UnsupervisedOutlierDetectorBase[Inputs, Outputs, Params, Hyperparams]):
    "python_path": "d3m.primitives.tods.detection_algorithm.LSTMODetector",
    "source": {'name': "DATALAB @ Texas A&M University", 'contact': 'mailto:khlai037@tamu.edu',
               'uris': ['https://gitlab.com/lhenry15/tods.git', 'https://gitlab.com/lhenry15/tods/-/blob/Junjie/anomaly-primitives/anomaly_primitives/LSTMOD.py']},
    "algorithm_types": [metadata_base.PrimitiveAlgorithmType.ISOLATION_FOREST, ],  # to be updated
    "algorithm_types": [metadata_base.PrimitiveAlgorithmType.TODS_PRIMITIVE, ],  # to be updated
    "primitive_family": metadata_base.PrimitiveFamily.ANOMALY_DETECTION,
    "version": "0.0.1",
    "hyperparams_to_tune": ['contamination', 'train_contamination', 'min_attack_time',
@@ -160,7 +160,7 @@ class Hyperparams(Hyperparams_ODBase):
    contamination = hyperparams.Uniform(
        lower=0.,
        upper=0.5,
        default=0.1,
        default=0.01,
        description='The amount of contamination of the data set, i.e. the proportion of outliers in the data set.',
        semantic_types=['https://metadata.datadrivendiscovery.org/types/TuningParameter']
    )
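For context on the new `default=0.01`: in PyOD-style detectors, contamination fixes the decision threshold at the score percentile that flags roughly that fraction of training points as outliers. A generic illustration (a sketch of the convention, not TODS internals):
```python
import numpy as np

# Anomaly scores from some detector; higher means more anomalous.
rng = np.random.default_rng(0)
scores = rng.normal(size=1000)

contamination = 0.01  # flag roughly 1% of points
threshold = np.percentile(scores, 100 * (1 - contamination))
labels = (scores > threshold).astype(int)
print(labels.sum())  # ~10 of 1000 points flagged as outliers
```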
@@ -9,11 +9,11 @@ import typing
import pandas as pd
from keras.models import Sequential, load_model
from keras.callbacks import History, EarlyStopping, Callback
from keras.layers.recurrent import LSTM
from keras.layers.core import Dense, Activation, Dropout
from keras.layers import Flatten
from tensorflow.keras.models import Sequential, load_model
from tensorflow.keras.callbacks import History, EarlyStopping, Callback
from tensorflow.keras.layers import LSTM
from tensorflow.keras.layers import Dense, Activation, Dropout
from tensorflow.keras.layers import Flatten
from d3m import container, utils
from d3m.base import utils as base_ut
@@ -11,8 +11,8 @@ from .CollectiveBase import CollectiveBaseDetector
# from tod.utility import get_sub_matrices
from keras.layers import Dense, LSTM
from keras.models import Sequential
from tensorflow.keras.layers import Dense, LSTM
from tensorflow.keras.models import Sequential

class LSTMOutlierDetector(CollectiveBaseDetector):
@@ -1,8 +1,8 @@
from keras.models import Sequential, load_model
from keras.callbacks import History, EarlyStopping, Callback
from keras.layers.recurrent import LSTM
from keras.layers.core import Dense, Activation, Dropout
from keras.layers import Flatten
from tensorflow.keras.models import Sequential, load_model
from tensorflow.keras.callbacks import History, EarlyStopping, Callback
from tensorflow.keras.layers import LSTM
from tensorflow.keras.layers import Dense, Activation, Dropout
from tensorflow.keras.layers import Flatten
import numpy as np
import os
import logging
@@ -115,8 +115,8 @@ class RuleBasedFilter(transformer.TransformerPrimitiveBase[Inputs, Outputs, Hyperparams]):
    "python_path": "d3m.primitives.tods.reinforcement.rule_filter",
    "source": {'name': 'DATA Lab at Texas A&M University', 'contact': 'mailto:khlai037@tamu.edu',
               'uris': ['https://gitlab.com/lhenry15/tods.git', ]},
    "algorithm_types": [metadata_base.PrimitiveAlgorithmType.RULE_BASED_FILTER, ],
    "primitive_family": metadata_base.PrimitiveFamily.REINFORCEMENT,
    "algorithm_types": [metadata_base.PrimitiveAlgorithmType.TODS_PRIMITIVE, ],
    "primitive_family": metadata_base.PrimitiveFamily.ANOMALY_DETECTION,
    "id": "42744c37-8879-4785-9f18-6de9d612ea93",
    "hyperparams_to_tune": ['rule', ],
    "version": "0.0.1",
@@ -291,4 +291,4 @@ def _generate_pipelines(primitive_python_paths, cpu_count=40):  # pragma: no cover
    #for p in results:
    #    piplines.extend(p.get())
    return piplines
@@ -4,11 +4,11 @@ import sys
import unittest

runner = unittest.TextTestRunner(verbosity=1)
tests = unittest.TestLoader().discover('./')
if not runner.run(tests).wasSuccessful():
    sys.exit(1)
#tests = unittest.TestLoader().discover('./')
#if not runner.run(tests).wasSuccessful():
#    sys.exit(1)
#for each in ['data_processing', 'timeseries_processing', 'feature_analysis', 'detection_algorithm']:
#    tests = unittest.TestLoader().discover(each)
#    if not runner.run(tests).wasSuccessful():
#        sys.exit(1)
for each in ['data_processing', 'timeseries_processing', 'feature_analysis', 'detection_algorithm']:
    tests = unittest.TestLoader().discover(each)
    if not runner.run(tests).wasSuccessful():
        sys.exit(1)
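The runner now discovers and executes tests module by module. A minimal sketch (assuming the same directory layout) of running a single module's tests in isolation:
```python
# Run only the detection_algorithm tests; directory name taken from the
# module list above.
import sys
import unittest

runner = unittest.TextTestRunner(verbosity=1)
tests = unittest.TestLoader().discover('detection_algorithm')
if not runner.run(tests).wasSuccessful():
    sys.exit(1)
```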