
creditcard.py 926 B

import os

import pandas as pd
import requests


def preprocess_creditcard():
    def get_data():
        # Download the raw credit card CSV from OpenML into ./raw_data
        link = "https://www.openml.org/data/get_csv/1673544/phpKo8OWT"
        r = requests.get(link)
        r.raise_for_status()  # stop early on a failed download
        os.makedirs('./raw_data', exist_ok=True)
        with open('./raw_data/openml_creditcard.csv', 'wb') as f:
            f.write(r.content)

    get_data()
    df = pd.read_csv("./raw_data/openml_creditcard.csv")
    # drop rows that contain NaN values
    df = df.dropna()
    #df = df.drop(columns=['Time'])
    # move the last column (Class) to the front
    cols = df.columns.tolist()
    cols = cols[-1:] + cols[:-1]
    df = df[cols]
    #df['Class'] = df['Class'].map({0:"nominal", 1: "anomaly"})
    #df = df.sample(frac=0.025, replace=False, random_state=1)
    # order rows chronologically, then discard the Time column
    df = df.sort_values(by=['Time'])
    df = df.drop(columns=['Time'])
    # strip the single quotes around the labels and convert them to int
    df['Class'] = df['Class'].astype(str).str.replace("'", "", regex=False).astype(int)
    df.to_csv("../creditcard.csv", index=False, encoding='utf-8')


if __name__ == "__main__":
    preprocess_creditcard()
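
The in-memory steps of the script can be tried without downloading anything. The following is a minimal sketch on a toy DataFrame; the helper name `reorder_and_clean`, the toy values, and the assumption that labels arrive wrapped in single quotes (as the quote-stripping step in the script suggests) are illustrative and not taken from the repository.

```python
import pandas as pd


def reorder_and_clean(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: the script's transformations without file I/O."""
    # Drop rows that contain NaN values
    df = df.dropna()
    # Move the last column (the label) to the front
    cols = df.columns.tolist()
    df = df[cols[-1:] + cols[:-1]]
    # Order rows chronologically, then discard the Time column
    df = df.sort_values(by=["Time"]).drop(columns=["Time"])
    # Strip single quotes around the labels and convert them to int
    df = df.copy()
    df["Class"] = (
        df["Class"].astype(str).str.replace("'", "", regex=False).astype(int)
    )
    return df


# Toy data (assumed shape): quoted labels, one row with a missing value
toy = pd.DataFrame({
    "Time": [2.0, 0.0, 1.0, 3.0],
    "V1": [0.5, -1.2, None, 0.3],
    "Amount": [10.0, 20.0, 30.0, 40.0],
    "Class": ["'0'", "'1'", "'0'", "'0'"],
})

result = reorder_and_clean(toy)
print(result)
```

The row with the missing `V1` value is dropped, the remaining rows are ordered by `Time`, and `Class` ends up as the first column with integer labels.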

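A full-stack automated machine learning system, aimed primarily at anomaly detection on multivariate time series data. TODS provides a comprehensive set of modules for building machine-learning-based anomaly detection systems, including data processing, time series processing, feature analysis, detection algorithms, and a reinforcement module. The functionality these modules provide includes common data preprocessing, smoothing or transformation of time series data, feature extraction from the time or frequency domain, and a wide variety of detection algorithms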