gecco.py 1.1 kB

import os

import pandas as pd
import requests


def preprocess_gecco():
    def get_data():
        # Download the GECCO 2018 archive from figshare and unpack it into ./raw_data.
        link = "https://ndownloader.figshare.com/articles/12451142/versions/1"
        os.makedirs('./raw_data', exist_ok=True)  # the target directory must exist before writing
        r = requests.get(link)
        r.raise_for_status()  # fail early on a bad download instead of writing an error page
        with open('./raw_data/gecco.zip', 'wb') as f:
            f.write(r.content)
        os.system("unzip ./raw_data/gecco.zip -d ./raw_data")
        os.system("rm ./raw_data/*.pdf ./raw_data/4_ResourcePackage_GECCO_Industrial_Challenge_2018.zip")

    get_data()
    df = pd.read_csv("./raw_data/1_gecco2018_water_quality.csv")
    # drop rows with NaN values, then the 'Time' string column and the first column
    df = df.dropna()
    df = df.drop(columns=['Time', df.columns[0]])
    # move the last column (EVENT) to the front
    cols = df.columns.tolist()
    cols = cols[-1:] + cols[:-1]
    df = df[cols]
    # encode the boolean EVENT flag as the strings "0"/"1" and rename it to 'label'
    df['EVENT'] = df['EVENT'].map({False: "0", True: "1"})
    df = df.rename(columns={"EVENT": "label"})
    #df['Class'] = df['Class'].map({0:"nominal", 1: "anomaly"})
    #df = df.sample(frac=0.025, replace=False, random_state=1)
    #df = df.sort_values(by=['Time'])
    #df = df.drop(columns=['Time'])
    df.to_csv("../water_quality.csv", index=False, encoding='utf-8')


if __name__ == "__main__":
    preprocess_gecco()
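
The in-memory part of `preprocess_gecco()` can be tried without downloading anything. The sketch below applies the same steps (drop NaN rows, drop `Time` and the first column, move `EVENT` to the front, map it to `"0"`/`"1"`, rename it to `label`) to a small hand-made frame. The helper name `to_labeled_frame` and the toy column names and values are assumptions made for illustration; they are not part of the script and are not taken from the real CSV.

```python
import pandas as pd


def to_labeled_frame(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: the same DataFrame steps as preprocess_gecco(), minus file I/O."""
    df = df.dropna()                                   # drop rows with missing values
    df = df.drop(columns=["Time", df.columns[0]])      # drop the timestamp and the first column
    cols = df.columns.tolist()
    df = df[cols[-1:] + cols[:-1]]                     # move the last column (EVENT) to the front
    df = df.assign(EVENT=df["EVENT"].map({False: "0", True: "1"}))
    return df.rename(columns={"EVENT": "label"})


# Toy data (assumed layout): an index-like first column, a Time column,
# two sensor columns, and a boolean EVENT column in last position.
toy = pd.DataFrame({
    "Unnamed: 0": [1, 2, 3],
    "Time": ["2017-07-01 00:00:00", "2017-07-01 00:01:00", "2017-07-01 00:02:00"],
    "Tp": [6.5, None, 6.7],
    "pH": [8.3, 8.4, 8.2],
    "EVENT": [False, False, True],
})

out = to_labeled_frame(toy)
print(out)
```

The second toy row has a missing `Tp` value, so `dropna()` removes it and two rows remain, with `label` as the first column.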

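A full-stack automated machine learning system, aimed mainly at anomaly detection on multivariate time-series data. TODS provides exhaustive modules for building machine-learning-based anomaly detection systems, including: data processing, time series processing, feature analysis, detection algorithms, and a reinforcement module. The functionality these modules provide includes common data preprocessing, smoothing or transformation of time-series data, feature extraction from the time or frequency domain, and a wide variety of detection algorithms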