
index.rst
.. Time Series Outlier Detection System documentation master file, created by
   sphinx-quickstart on Wed Sep 9 22:52:15 2020.
   You can adapt this file completely to your liking, but it should at least
   contain the root `toctree` directive.

Welcome to TODS's documentation!
================================

TODS is a full-stack automated machine learning system for outlier detection on multivariate time-series data. TODS provides exhaustive modules for building machine learning based outlier detection systems, including data processing, time series processing, feature analysis (extraction), detection algorithms, and a reinforcement module. The functionalities provided by these modules include data preprocessing for general purposes, time series smoothing and transformation, feature extraction from the time and frequency domains, a variety of detection algorithms, and human expertise to calibrate the system. Three common outlier detection scenarios on time-series data can be performed: point-wise detection (time points as outliers), pattern-wise detection (subsequences as outliers), and system-wise detection (sets of time series as outliers), and a wide range of corresponding algorithms is provided in TODS. This package is developed by `DATA Lab @ Texas A&M University <https://people.engr.tamu.edu/xiahu/index.html>`__.
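
The three scenarios differ mainly in the granularity of the labels. The toy sketch below (plain NumPy/pandas, not the TODS API; every name in it is illustrative) shows what the labels look like in each case.

.. code:: python

    import numpy as np
    import pandas as pd

    # Toy multivariate time series: 100 time steps, 3 variables.
    rng = np.random.default_rng(0)
    series = pd.DataFrame(rng.normal(size=(100, 3)), columns=['v1', 'v2', 'v3'])

    # Point-wise detection: one binary label per time step.
    point_labels = np.zeros(len(series), dtype=int)
    point_labels[[17, 58]] = 1        # two individual time points are outliers

    # Pattern-wise detection: a whole subsequence is flagged as anomalous.
    pattern_labels = np.zeros(len(series), dtype=int)
    pattern_labels[40:50] = 1         # time steps 40-49 form an outlying pattern

    # System-wise detection: one label per series in a set of time series.
    system_labels = {'series_a': 0, 'series_b': 1, 'series_c': 0}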

TODS is featured for:

* **Full-Stack Machine Learning System**, which supports exhaustive components from preprocessing and feature extraction to detection algorithms and a human-in-the-loop interface.
* **Wide Range of Algorithms**, including all of the point-wise detection algorithms supported by `PyOD <https://github.com/yzhao062/pyod>`__, state-of-the-art pattern-wise (collective) detection algorithms such as `DeepLog <https://www.cs.utah.edu/~lifeifei/papers/deeplog.pdf>`__ and `Telemanom <https://arxiv.org/pdf/1802.04431.pdf>`__ (see the standalone sketch after this list), and various ensemble algorithms for system-wise detection.
* **Automated Machine Learning**, which aims to provide a knowledge-free process that constructs an optimal pipeline for the given data by automatically searching for the best combination among all of the existing modules.
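
Since the point-wise detectors come from PyOD, a direct PyOD call gives a feel for what these algorithms do. The sketch below is standalone and does not use the TODS API; the detector choice and parameters are illustrative.

.. code:: python

    import numpy as np
    from pyod.models.iforest import IForest   # one of the point-wise detectors wrapped by TODS

    # Toy univariate series with two injected point outliers.
    rng = np.random.default_rng(42)
    values = rng.normal(size=200)
    values[[50, 150]] = 8.0

    # PyOD models expect a 2-D feature matrix.
    X = values.reshape(-1, 1)

    detector = IForest(contamination=0.01)
    detector.fit(X)

    print(np.where(detector.labels_ == 1)[0])   # indices flagged as point-wise outliers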

Installation
------------

This package works with **Python 3.6** and pip 19+. You need to have the following packages installed on the system (for Debian/Ubuntu):

::

    sudo apt-get install libssl-dev libcurl4-openssl-dev libyaml-dev build-essential libopenblas-dev libcap-dev ffmpeg

Then execute ``python setup.py install``; the script will install all of the packages needed to build up TODS.
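
A minimal way to check the installation is to import the entry points used in the examples below; the snippet only verifies that the modules resolve.

.. code:: python

    # Smoke test: these are exactly the imports used in the examples below.
    from tods import schemas as schemas_utils
    from tods.utils import generate_dataset_problem, evaluate_pipeline

    print('TODS imported successfully')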

.. toctree::
   :maxdepth: 4
   :caption: Contents:

Examples
--------

Examples are available in `examples <https://github.com/datamllab/tods/tree/master/examples>`__. For basic usage, you can evaluate a pipeline on a given dataset. Here, we provide an example that loads our default pipeline and evaluates it on a subset of the Yahoo dataset.

.. code:: python

    import pandas as pd

    from tods import schemas as schemas_utils
    from tods.utils import generate_dataset_problem, evaluate_pipeline

    table_path = 'datasets/yahoo_sub_5.csv'
    target_index = 6  # which column is the target
    #table_path = 'datasets/NAB/realTweets/labeled_Twitter_volume_IBM.csv' # The path of the dataset
    time_limit = 30  # how many seconds to spend searching
    #metric = 'F1' # F1 on label 1
    metric = 'F1_MACRO'  # F1 on both label 0 and 1

    # Read data and generate dataset and problem
    df = pd.read_csv(table_path)
    dataset, problem_description = generate_dataset_problem(df, target_index=target_index, metric=metric)

    # Load the default pipeline
    pipeline = schemas_utils.load_default_pipeline()

    # Run the pipeline
    pipeline_result = evaluate_pipeline(problem_description, dataset, pipeline)
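
``generate_dataset_problem`` wraps the raw DataFrame into the dataset and problem representation that the pipeline runtime consumes, so the same ``dataset`` and ``problem_description`` objects can be reused with the AutoML search shown next.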

We also provide AutoML support to help you automatically find a good pipeline for your data.

.. code:: python

    import pandas as pd

    from axolotl.backend.simple import SimpleRunner

    from tods.utils import generate_dataset_problem
    from tods.search import BruteForceSearch

    # Some information
    #table_path = 'datasets/NAB/realTweets/labeled_Twitter_volume_GOOG.csv' # The path of the dataset
    #target_index = 2 # what column is the target
    table_path = 'datasets/yahoo_sub_5.csv'
    target_index = 6  # which column is the target
    #table_path = 'datasets/NAB/realTweets/labeled_Twitter_volume_IBM.csv' # The path of the dataset
    time_limit = 30  # how many seconds to spend searching
    #metric = 'F1' # F1 on label 1
    metric = 'F1_MACRO'  # F1 on both label 0 and 1

    # Read data and generate dataset and problem
    df = pd.read_csv(table_path)
    dataset, problem_description = generate_dataset_problem(df, target_index=target_index, metric=metric)

    # Start backend
    backend = SimpleRunner(random_seed=0)

    # Start search algorithm
    search = BruteForceSearch(problem_description=problem_description, backend=backend)

    # Find the best pipeline
    best_runtime, best_pipeline_result = search.search_fit(input_data=[dataset], time_limit=time_limit)
    best_pipeline = best_runtime.pipeline
    best_output = best_pipeline_result.output

    # Evaluate the best pipeline
    best_scores = search.evaluate(best_pipeline).scores
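
The objects returned by the search can be inspected directly. The short sketch below assumes that ``best_scores`` prints as a table of metric values and that the pipeline object carries an ``id`` attribute, as d3m pipelines do; both are assumptions beyond what the example above guarantees.

.. code:: python

    # Inspect the outcome of the search. The attributes used here
    # (``best_pipeline.id`` and a printable ``best_scores``) are assumptions.
    print('Best pipeline id:', best_pipeline.id)
    print(best_scores)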
