TODS Notebook Master-Branch.ipynb 117 kB

  1. {
  2. "cells": [
  3. {
  4. "cell_type": "markdown",
  5. "metadata": {},
  6. "source": [
  7. "# TODS"
  8. ]
  9. },
  10. {
  11. "cell_type": "markdown",
  12. "metadata": {},
  13. "source": [
  14. "## Introduction Summary"
  15. ]
  16. },
  17. {
  18. "cell_type": "markdown",
  19. "metadata": {},
  20. "source": [
  21. "TODS is a full-stack automated machine learning system for outlier detection on multivariate time-series data. TODS provides exhaustive modules for building machine learning-based outlier detection systems, including data processing, time series processing, feature analysis (extraction), detection algorithms, and a reinforcement module. The functionalities provided via these modules include general-purpose data preprocessing, time series data smoothing/transformation, feature extraction from the time and frequency domains, various detection algorithms, and the involvement of human expertise to calibrate the system. Three common outlier detection scenarios on time-series data can be performed: point-wise detection (time points as outliers), pattern-wise detection (subsequences as outliers), and system-wise detection (sets of time series as outliers), and a wide range of corresponding algorithms is provided in TODS. This package is developed by DATA Lab @ Texas A&M University."
  22. ]
  23. },
  24. {
  25. "cell_type": "markdown",
  26. "metadata": {},
  27. "source": [
  28. "## Packages"
  29. ]
  30. },
  31. {
  32. "cell_type": "code",
  33. "execution_count": 1,
  34. "metadata": {},
  35. "outputs": [
  36. {
  37. "name": "stdout",
  38. "output_type": "stream",
  39. "text": [
  40. "Python 3.6.10 :: Anaconda, Inc.\r\n"
  41. ]
  42. }
  43. ],
  44. "source": [
  45. "!python -V\n",
  46. "# Make sure python version is 3.6"
  47. ]
  48. },
  49. {
  50. "cell_type": "code",
  51. "execution_count": 2,
  52. "metadata": {},
  53. "outputs": [
  54. {
  55. "data": {
  56. "text/plain": [
  57. "'1.4.1'"
  58. ]
  59. },
  60. "execution_count": 2,
  61. "metadata": {},
  62. "output_type": "execute_result"
  63. }
  64. ],
  65. "source": [
  66. "import scipy\n",
  67. "scipy.__version__"
  68. ]
  69. },
  70. {
  71. "cell_type": "code",
  72. "execution_count": 3,
  73. "metadata": {},
  74. "outputs": [
  75. {
  76. "name": "stdout",
  77. "output_type": "stream",
  78. "text": [
  79. "TODS Notebook Master-Branch.ipynb TODS Official Demo Notebook.ipynb\r\n"
  80. ]
  81. }
  82. ],
  83. "source": [
  84. "!ls"
  85. ]
  86. },
  87. {
  88. "cell_type": "markdown",
  89. "metadata": {},
  90. "source": [
  91. "## Imports"
  92. ]
  93. },
  94. {
  95. "cell_type": "code",
  96. "execution_count": 4,
  97. "metadata": {},
  98. "outputs": [],
  99. "source": [
  100. "import warnings\n",
  101. "warnings.filterwarnings(\"ignore\")"
  102. ]
  103. },
  104. {
  105. "cell_type": "code",
  106. "execution_count": 5,
  107. "metadata": {},
  108. "outputs": [
  109. {
  110. "name": "stderr",
  111. "output_type": "stream",
  112. "text": [
  113. "d3m.primitives.tods.detection_algorithm.LSTMODetector: Primitive is not providing a description through its docstring.\n"
  114. ]
  115. }
  116. ],
  117. "source": [
  118. "import sys\n",
  119. "import argparse\n",
  120. "import os\n",
  121. "import numpy as np\n",
  122. "import pandas as pd\n",
  123. "from sklearn.metrics import precision_recall_curve\n",
  124. "from sklearn.metrics import accuracy_score\n",
  125. "from sklearn.metrics import confusion_matrix\n",
  126. "from sklearn.metrics import classification_report\n",
  127. "import matplotlib.pyplot as plt\n",
  128. "from sklearn import metrics\n",
  129. "from d3m import index\n",
  130. "from d3m.metadata.base import ArgumentType\n",
  131. "from d3m.metadata.pipeline import Pipeline, PrimitiveStep\n",
  132. "from axolotl.backend.simple import SimpleRunner\n",
  133. "from tods import generate_dataset, generate_problem\n",
  134. "from tods.searcher import BruteForceSearch\n",
  135. "from tods import load_pipeline, evaluate_pipeline\n",
  136. "from tods.sk_interface.detection_algorithm.DeepLog_skinterface import DeepLogSKI\n",
  137. "from tods.sk_interface.detection_algorithm.Telemanom_skinterface import TelemanomSKI\n"
  138. ]
  139. },
  140. {
  141. "cell_type": "markdown",
  142. "metadata": {},
  143. "source": [
  144. "## Dataset"
  145. ]
  146. },
  147. {
  148. "cell_type": "markdown",
  149. "metadata": {},
  150. "source": [
  151. "### UCR Dataset"
  152. ]
  153. },
  154. {
  155. "cell_type": "code",
  156. "execution_count": 6,
  157. "metadata": {},
  158. "outputs": [],
  159. "source": [
  160. "data_UCR = np.loadtxt(\"../../datasets/anomaly/raw_data/500_UCR_Anomaly_robotDOG1_10000_19280_19360.txt\")"
  161. ]
  162. },
  163. {
  164. "cell_type": "code",
  165. "execution_count": 7,
  166. "metadata": {},
  167. "outputs": [
  168. {
  169. "name": "stdout",
  170. "output_type": "stream",
  171. "text": [
  172. "shape: (20000,)\n",
  173. "datatype of data: float64\n",
  174. "First 5 rows:\n",
  175. " [0.145299 0.128205 0.094017 0.076923 0.111111]\n"
  176. ]
  177. }
  178. ],
  179. "source": [
  180. "print(\"shape:\", data_UCR.shape)\n",
  181. "print(\"datatype of data:\",data_UCR.dtype)\n",
  182. "print(\"First 5 rows:\\n\", data_UCR[:5])"
  183. ]
  184. },
  185. {
  186. "cell_type": "code",
  187. "execution_count": 8,
  188. "metadata": {},
  189. "outputs": [],
  190. "source": [
  191. "X_train = np.expand_dims(data_UCR[:10000], axis=1)\n",
  192. "X_test = np.expand_dims(data_UCR[10000:], axis=1)"
  193. ]
  194. },
  195. {
  196. "cell_type": "code",
  197. "execution_count": 9,
  198. "metadata": {},
  199. "outputs": [
  200. {
  201. "name": "stdout",
  202. "output_type": "stream",
  203. "text": [
  204. "First 5 rows train:\n",
  205. " [[0.145299]\n",
  206. " [0.128205]\n",
  207. " [0.094017]\n",
  208. " [0.076923]\n",
  209. " [0.111111]]\n",
  210. "First 5 rows test:\n",
  211. " [[0.076923]\n",
  212. " [0.076923]\n",
  213. " [0.076923]\n",
  214. " [0.094017]\n",
  215. " [0.145299]]\n"
  216. ]
  217. }
  218. ],
  219. "source": [
  220. "print(\"First 5 rows train:\\n\", X_train[:5])\n",
  221. "print(\"First 5 rows test:\\n\", X_test[:5])"
  222. ]
  223. },
  224. {
  225. "cell_type": "markdown",
  226. "metadata": {},
  227. "source": [
  228. "### Yahoo Dataset"
  229. ]
  230. },
  231. {
  232. "cell_type": "code",
  233. "execution_count": 10,
  234. "metadata": {},
  235. "outputs": [],
  236. "source": [
  237. "data_yahoo = pd.read_csv('../../datasets/anomaly/raw_data/yahoo_sub_5.csv')"
  238. ]
  239. },
  240. {
  241. "cell_type": "code",
  242. "execution_count": 11,
  243. "metadata": {},
  244. "outputs": [
  245. {
  246. "name": "stdout",
  247. "output_type": "stream",
  248. "text": [
  249. "shape: (1400, 7)\n",
  250. "First 5 rows:\n",
  251. " timestamp value_0 value_1 value_2 value_3 value_4 anomaly\n",
  252. "0 1 12183 0.000000 3.716667 5 2109 0\n",
  253. "1 2 12715 0.091758 3.610833 60 3229 0\n",
  254. "2 3 12736 0.172297 3.481389 88 3637 0\n",
  255. "3 4 12716 0.226219 3.380278 84 1982 0\n",
  256. "4 5 12739 0.176358 3.193333 111 2751 0\n"
  257. ]
  258. }
  259. ],
  260. "source": [
  261. "print(\"shape:\", data_yahoo.shape)\n",
  262. "print(\"First 5 rows:\\n\", data_yahoo[:5])"
  263. ]
  264. },
  265. {
  266. "cell_type": "markdown",
  267. "metadata": {},
  268. "source": [
  269. "## SK Example 1: DeepLog"
  270. ]
  271. },
  272. {
  273. "cell_type": "code",
  274. "execution_count": 12,
  275. "metadata": {},
  276. "outputs": [
  277. {
  278. "name": "stdout",
  279. "output_type": "stream",
  280. "text": [
  281. "Epoch 1/10\n",
  282. "282/282 [==============================] - 1s 5ms/step - loss: 0.4255 - val_loss: 0.2777\n",
  283. "Epoch 2/10\n",
  284. "282/282 [==============================] - 1s 2ms/step - loss: 0.3367 - val_loss: 0.2802\n",
  285. "Epoch 3/10\n",
  286. "282/282 [==============================] - 1s 2ms/step - loss: 0.3545 - val_loss: 0.2595\n",
  287. "Epoch 4/10\n",
  288. "282/282 [==============================] - 1s 2ms/step - loss: 0.3572 - val_loss: 0.2674\n",
  289. "Epoch 5/10\n",
  290. "282/282 [==============================] - 1s 2ms/step - loss: 0.3457 - val_loss: 0.2880\n",
  291. "Epoch 6/10\n",
  292. "282/282 [==============================] - 1s 2ms/step - loss: 0.3503 - val_loss: 0.2619\n",
  293. "Epoch 7/10\n",
  294. "282/282 [==============================] - 1s 2ms/step - loss: 0.3559 - val_loss: 0.2818\n",
  295. "Epoch 8/10\n",
  296. "282/282 [==============================] - 1s 2ms/step - loss: 0.3439 - val_loss: 0.2620\n",
  297. "Epoch 9/10\n",
  298. "282/282 [==============================] - 1s 2ms/step - loss: 0.3390 - val_loss: 0.2690\n",
  299. "Epoch 10/10\n",
  300. "282/282 [==============================] - 1s 2ms/step - loss: 0.3425 - val_loss: 0.2683\n"
  301. ]
  302. }
  303. ],
  304. "source": [
  305. "transformer = DeepLogSKI()\n",
  306. "transformer.fit(X_train)\n",
  307. "prediction_labels_train = transformer.predict(X_train)\n",
  308. "prediction_labels_test = transformer.predict(X_test)\n",
  309. "prediction_score = transformer.predict_score(X_test)"
  310. ]
  311. },
  312. {
  313. "cell_type": "code",
  314. "execution_count": 13,
  315. "metadata": {},
  316. "outputs": [
  317. {
  318. "name": "stdout",
  319. "output_type": "stream",
  320. "text": [
  321. "Prediction Labels\n",
  322. " [[0]\n",
  323. " [0]\n",
  324. " [0]\n",
  325. " ...\n",
  326. " [0]\n",
  327. " [0]\n",
  328. " [0]]\n",
  329. "Prediction Score\n",
  330. " [[0]\n",
  331. " [0]\n",
  332. " [0]\n",
  333. " ...\n",
  334. " [0]\n",
  335. " [0]\n",
  336. " [0]]\n"
  337. ]
  338. }
  339. ],
  340. "source": [
  341. "print(\"Prediction Labels\\n\", prediction_labels_test)\n",
  342. "print(\"Prediction Score\\n\", prediction_score)"
  343. ]
  344. },
  345. {
  346. "cell_type": "code",
  347. "execution_count": 14,
  348. "metadata": {},
  349. "outputs": [],
  350. "source": [
  351. "y_true = prediction_labels_train  # NOTE: predicted labels for X_train, not ground-truth labels\n",
  352. "y_pred = prediction_labels_test\n",
  353. "precision, recall, thresholds = precision_recall_curve(y_true, y_pred)\n",
  354. "f1_scores = 2*recall*precision/(recall+precision)\n",
  355. "fpr, tpr, threshold = metrics.roc_curve(y_true, y_pred)\n",
  356. "roc_auc = metrics.auc(fpr, tpr)"
  357. ]
  358. },
  359. {
  360. "cell_type": "code",
  361. "execution_count": 15,
  362. "metadata": {},
  363. "outputs": [
  364. {
  365. "name": "stdout",
  366. "output_type": "stream",
  367. "text": [
  368. "Accuracy Score: 0.9042\n"
  369. ]
  370. }
  371. ],
  372. "source": [
  373. "print('Accuracy Score: ', accuracy_score(y_true, y_pred))"
  374. ]
  375. },
  376. {
  377. "cell_type": "code",
  378. "execution_count": 16,
  379. "metadata": {},
  380. "outputs": [
  381. {
  382. "name": "stdout",
  383. "output_type": "stream",
  384. "text": [
  385. " precision recall f1-score support\n",
  386. "\n",
  387. " 0 0.93 0.97 0.95 9004\n",
  388. " 1 0.53 0.35 0.42 996\n",
  389. "\n",
  390. " accuracy 0.90 10000\n",
  391. " macro avg 0.73 0.66 0.68 10000\n",
  392. "weighted avg 0.89 0.90 0.90 10000\n",
  393. "\n"
  394. ]
  395. }
  396. ],
  397. "source": [
  398. "print(classification_report(y_true, y_pred))"
  399. ]
  400. },
  401. {
  402. "cell_type": "code",
  403. "execution_count": 17,
  404. "metadata": {},
  405. "outputs": [
  406. {
  407. "name": "stdout",
  408. "output_type": "stream",
  409. "text": [
  410. "Best threshold: 1\n",
  411. "Best F1-Score: 0.42219541616405304\n"
  412. ]
  413. }
  414. ],
  415. "source": [
  416. "print('Best threshold: ', thresholds[np.argmax(f1_scores)])\n",
  417. "print('Best F1-Score: ', np.max(f1_scores))"
  418. ]
  419. },
  420. {
  421. "cell_type": "code",
  422. "execution_count": 18,
  423. "metadata": {},
  424. "outputs": [
  425. {
  426. "data": {
  427. "image/png": "\n",
  428. "text/plain": [
  429. "<Figure size 432x288 with 1 Axes>"
  430. ]
  431. },
  432. "metadata": {
  433. "needs_background": "light"
  434. },
  435. "output_type": "display_data"
  436. }
  437. ],
  438. "source": [
  439. "plt.title('ROC')\n",
  440. "plt.plot(fpr, tpr, 'b', label = 'AUC = %0.2f' % roc_auc)\n",
  441. "plt.legend(loc = 'lower right')\n",
  442. "plt.ylabel('True Positive Rate')\n",
  443. "plt.xlabel('False Positive Rate')\n",
  444. "plt.show()"
  445. ]
  446. },
  447. {
  448. "cell_type": "markdown",
  449. "metadata": {},
  450. "source": [
  451. "## SK Example 2: Telemanom"
  452. ]
  453. },
  454. {
  455. "cell_type": "code",
  456. "execution_count": 19,
  457. "metadata": {},
  458. "outputs": [
  459. {
  460. "name": "stdout",
  461. "output_type": "stream",
  462. "text": [
  463. "125/125 [==============================] - 1s 7ms/step - loss: 0.0134 - val_loss: 0.0051\n"
  464. ]
  465. }
  466. ],
  467. "source": [
  468. "transformer = TelemanomSKI(l_s=2, n_predictions=1)\n",
  469. "transformer.fit(X_train)\n",
  470. "prediction_labels_train = transformer.predict(X_train)\n",
  471. "prediction_labels_test = transformer.predict(X_test)\n",
  472. "prediction_score = transformer.predict_score(X_test)"
  473. ]
  474. },
  475. {
  476. "cell_type": "code",
  477. "execution_count": 20,
  478. "metadata": {},
  479. "outputs": [
  480. {
  481. "name": "stdout",
  482. "output_type": "stream",
  483. "text": [
  484. "Prediction Labels\n",
  485. " [[1]\n",
  486. " [1]\n",
  487. " [1]\n",
  488. " ...\n",
  489. " [1]\n",
  490. " [1]\n",
  491. " [1]]\n",
  492. "Prediction Score\n",
  493. " [[1]\n",
  494. " [1]\n",
  495. " [1]\n",
  496. " ...\n",
  497. " [1]\n",
  498. " [1]\n",
  499. " [1]]\n"
  500. ]
  501. }
  502. ],
  503. "source": [
  504. "print(\"Prediction Labels\\n\", prediction_labels_test)\n",
  505. "print(\"Prediction Score\\n\", prediction_score)"
  506. ]
  507. },
  508. {
  509. "cell_type": "code",
  510. "execution_count": 21,
  511. "metadata": {},
  512. "outputs": [],
  513. "source": [
  514. "y_true = prediction_labels_train  # NOTE: predicted labels for X_train, not ground-truth labels\n",
  515. "y_pred = prediction_labels_test\n",
  516. "precision, recall, thresholds = precision_recall_curve(y_true, y_pred)\n",
  517. "f1_scores = 2*recall*precision/(recall+precision)\n",
  518. "fpr, tpr, threshold = metrics.roc_curve(y_true, y_pred)\n",
  519. "roc_auc = metrics.auc(fpr, tpr)"
  520. ]
  521. },
  522. {
  523. "cell_type": "code",
  524. "execution_count": 22,
  525. "metadata": {},
  526. "outputs": [
  527. {
  528. "name": "stdout",
  529. "output_type": "stream",
  530. "text": [
  531. "Accuracy Score: 0.18055416624987497\n"
  532. ]
  533. }
  534. ],
  535. "source": [
  536. "print('Accuracy Score: ', accuracy_score(y_true, y_pred))"
  537. ]
  538. },
  539. {
  540. "cell_type": "code",
  541. "execution_count": 23,
  542. "metadata": {},
  543. "outputs": [
  544. {
  545. "data": {
  546. "text/plain": [
  547. "array([[ 958, 8039],\n",
  548. " [ 153, 847]])"
  549. ]
  550. },
  551. "execution_count": 23,
  552. "metadata": {},
  553. "output_type": "execute_result"
  554. }
  555. ],
  556. "source": [
  557. "confusion_matrix(y_true, y_pred)"
  558. ]
  559. },
  560. {
  561. "cell_type": "code",
  562. "execution_count": 24,
  563. "metadata": {},
  564. "outputs": [
  565. {
  566. "name": "stdout",
  567. "output_type": "stream",
  568. "text": [
  569. " precision recall f1-score support\n",
  570. "\n",
  571. " 0 0.86 0.11 0.19 8997\n",
  572. " 1 0.10 0.85 0.17 1000\n",
  573. "\n",
  574. " accuracy 0.18 9997\n",
  575. " macro avg 0.48 0.48 0.18 9997\n",
  576. "weighted avg 0.79 0.18 0.19 9997\n",
  577. "\n"
  578. ]
  579. }
  580. ],
  581. "source": [
  582. "print(classification_report(y_true, y_pred))"
  583. ]
  584. },
  585. {
  586. "cell_type": "code",
  587. "execution_count": 25,
  588. "metadata": {},
  589. "outputs": [
  590. {
  591. "name": "stdout",
  592. "output_type": "stream",
  593. "text": [
  594. "Best threshold: 0\n",
  595. "Best F1-Score: 0.18186778212239701\n"
  596. ]
  597. }
  598. ],
  599. "source": [
  600. "print('Best threshold: ', thresholds[np.argmax(f1_scores)])\n",
  601. "print('Best F1-Score: ', np.max(f1_scores))"
  602. ]
  603. },
  604. {
  605. "cell_type": "code",
  606. "execution_count": 26,
  607. "metadata": {},
  608. "outputs": [
  609. {
  610. "data": {
  611. "image/png": "\n",
  612. "text/plain": [
  613. "<Figure size 432x288 with 1 Axes>"
  614. ]
  615. },
  616. "metadata": {
  617. "needs_background": "light"
  618. },
  619. "output_type": "display_data"
  620. }
  621. ],
  622. "source": [
  623. "plt.title('ROC')\n",
  624. "plt.plot(fpr, tpr, 'b', label = 'AUC = %0.2f' % roc_auc)\n",
  625. "plt.legend(loc = 'lower right')\n",
  626. "plt.ylabel('True Positive Rate')\n",
  627. "plt.xlabel('False Positive Rate')\n",
  628. "plt.show()"
  629. ]
  630. },
  631. {
  632. "cell_type": "markdown",
  633. "metadata": {},
  634. "source": [
  635. "## Pipeline Example: AutoEncoder"
  636. ]
  637. },
  638. {
  639. "cell_type": "markdown",
  640. "metadata": {},
  641. "source": [
  642. "### Build Pipeline"
  643. ]
  644. },
  645. {
  646. "cell_type": "code",
  647. "execution_count": 27,
  648. "metadata": {},
  649. "outputs": [
  650. {
  651. "data": {
  652. "text/plain": [
  653. "'inputs.0'"
  654. ]
  655. },
  656. "execution_count": 27,
  657. "metadata": {},
  658. "output_type": "execute_result"
  659. }
  660. ],
  661. "source": [
  662. "# Creating pipeline\n",
  663. "pipeline_description = Pipeline()\n",
  664. "pipeline_description.add_input(name='inputs')"
  665. ]
  666. },
  667. {
  668. "cell_type": "code",
  669. "execution_count": 28,
  670. "metadata": {},
  671. "outputs": [
  672. {
  673. "name": "stderr",
  674. "output_type": "stream",
  675. "text": [
  676. "While loading primitive 'tods.data_processing.dataset_to_dataframe', an error has been detected: (scikit-learn 0.22.2.post1 (/Users/wangyanghe/anaconda3/envs/tods2/lib/python3.6/site-packages), Requirement.parse('scikit-learn==0.22.0'))\n",
  677. "Attempting to load primitive 'tods.data_processing.dataset_to_dataframe' without checking requirements.\n"
  678. ]
  679. }
  680. ],
  681. "source": [
  682. "# Step 0: dataset_to_dataframe\n",
  683. "step_0 = PrimitiveStep(primitive=index.get_primitive('d3m.primitives.tods.data_processing.dataset_to_dataframe'))\n",
  684. "step_0.add_argument(name='inputs', argument_type=ArgumentType.CONTAINER, data_reference='inputs.0')\n",
  685. "step_0.add_output('produce')\n",
  686. "pipeline_description.add_step(step_0)"
  687. ]
  688. },
  689. {
  690. "cell_type": "code",
  691. "execution_count": 29,
  692. "metadata": {},
  693. "outputs": [
  694. {
  695. "name": "stderr",
  696. "output_type": "stream",
  697. "text": [
  698. "While loading primitive 'tods.data_processing.column_parser', an error has been detected: (scikit-learn 0.22.2.post1 (/Users/wangyanghe/anaconda3/envs/tods2/lib/python3.6/site-packages), Requirement.parse('scikit-learn==0.22.0'))\n",
  699. "Attempting to load primitive 'tods.data_processing.column_parser' without checking requirements.\n"
  700. ]
  701. }
  702. ],
  703. "source": [
  704. "# Step 1: column_parser\n",
  705. "step_1 = PrimitiveStep(primitive=index.get_primitive('d3m.primitives.tods.data_processing.column_parser'))\n",
  706. "step_1.add_argument(name='inputs', argument_type=ArgumentType.CONTAINER, data_reference='steps.0.produce')\n",
  707. "step_1.add_output('produce')\n",
  708. "pipeline_description.add_step(step_1)"
  709. ]
  710. },
  711. {
  712. "cell_type": "code",
  713. "execution_count": 30,
  714. "metadata": {},
  715. "outputs": [
  716. {
  717. "name": "stderr",
  718. "output_type": "stream",
  719. "text": [
  720. "While loading primitive 'tods.data_processing.extract_columns_by_semantic_types', an error has been detected: (scikit-learn 0.22.2.post1 (/Users/wangyanghe/anaconda3/envs/tods2/lib/python3.6/site-packages), Requirement.parse('scikit-learn==0.22.0'))\n",
  721. "Attempting to load primitive 'tods.data_processing.extract_columns_by_semantic_types' without checking requirements.\n"
  722. ]
  723. }
  724. ],
  725. "source": [
  726. "# Step 2: extract_columns_by_semantic_types(attributes)\n",
  727. "step_2 = PrimitiveStep(primitive=index.get_primitive('d3m.primitives.tods.data_processing.extract_columns_by_semantic_types'))\n",
  728. "step_2.add_argument(name='inputs', argument_type=ArgumentType.CONTAINER, data_reference='steps.1.produce')\n",
  729. "step_2.add_output('produce')\n",
  730. "step_2.add_hyperparameter(name='semantic_types', argument_type=ArgumentType.VALUE,\n",
  731. "                          data=['https://metadata.datadrivendiscovery.org/types/Attribute'])\n",
  732. "pipeline_description.add_step(step_2)"
  733. ]
  734. },
  735. {
  736. "cell_type": "code",
  737. "execution_count": 31,
  738. "metadata": {},
  739. "outputs": [],
  740. "source": [
  741. "# Step 3: extract_columns_by_semantic_types(targets)\n",
  742. "step_3 = PrimitiveStep(primitive=index.get_primitive('d3m.primitives.tods.data_processing.extract_columns_by_semantic_types'))\n",
  743. "step_3.add_argument(name='inputs', argument_type=ArgumentType.CONTAINER, data_reference='steps.0.produce')\n",
  744. "step_3.add_output('produce')\n",
  745. "step_3.add_hyperparameter(name='semantic_types', argument_type=ArgumentType.VALUE,\n",
  746. "                          data=['https://metadata.datadrivendiscovery.org/types/TrueTarget'])\n",
  747. "pipeline_description.add_step(step_3)"
  748. ]
  749. },
  750. {
  751. "cell_type": "code",
  752. "execution_count": 32,
  753. "metadata": {},
  754. "outputs": [],
  755. "source": [
  756. "attributes = 'steps.2.produce'\n",
  757. "targets = 'steps.3.produce'"
  758. ]
  759. },
  760. {
  761. "cell_type": "code",
  762. "execution_count": 33,
  763. "metadata": {},
  764. "outputs": [
  765. {
  766. "name": "stderr",
  767. "output_type": "stream",
  768. "text": [
  769. "While loading primitive 'tods.feature_analysis.statistical_maximum', an error has been detected: (scikit-learn 0.22.2.post1 (/Users/wangyanghe/anaconda3/envs/tods2/lib/python3.6/site-packages), Requirement.parse('scikit-learn==0.22.0'))\n",
  770. "Attempting to load primitive 'tods.feature_analysis.statistical_maximum' without checking requirements.\n"
  771. ]
  772. }
  773. ],
  774. "source": [
  775. "# Step 4: feature analysis (statistical_maximum)\n",
  776. "step_4 = PrimitiveStep(primitive=index.get_primitive('d3m.primitives.tods.feature_analysis.statistical_maximum'))\n",
  777. "step_4.add_argument(name='inputs', argument_type=ArgumentType.CONTAINER, data_reference=attributes)\n",
  778. "step_4.add_output('produce')\n",
  779. "pipeline_description.add_step(step_4)"
  780. ]
  781. },
  782. {
  783. "cell_type": "code",
  784. "execution_count": 34,
  785. "metadata": {},
  786. "outputs": [
  787. {
  788. "name": "stderr",
  789. "output_type": "stream",
  790. "text": [
  791. "While loading primitive 'tods.detection_algorithm.pyod_ae', an error has been detected: (scikit-learn 0.22.2.post1 (/Users/wangyanghe/anaconda3/envs/tods2/lib/python3.6/site-packages), Requirement.parse('scikit-learn==0.22.0'))\n",
  792. "Attempting to load primitive 'tods.detection_algorithm.pyod_ae' without checking requirements.\n"
  793. ]
  794. }
  795. ],
  796. "source": [
  797. "# Step 5: algorithm\n",
  798. "step_5 = PrimitiveStep(primitive=index.get_primitive('d3m.primitives.tods.detection_algorithm.pyod_ae'))\n",
  799. "step_5.add_argument(name='inputs', argument_type=ArgumentType.CONTAINER, data_reference='steps.4.produce')\n",
  800. "step_5.add_output('produce')\n",
  801. "pipeline_description.add_step(step_5)"
  802. ]
  803. },
  804. {
  805. "cell_type": "code",
  806. "execution_count": 35,
  807. "metadata": {},
  808. "outputs": [
  809. {
  810. "name": "stderr",
  811. "output_type": "stream",
  812. "text": [
  813. "While loading primitive 'tods.data_processing.construct_predictions', an error has been detected: (scikit-learn 0.22.2.post1 (/Users/wangyanghe/anaconda3/envs/tods2/lib/python3.6/site-packages), Requirement.parse('scikit-learn==0.22.0'))\n",
  814. "Attempting to load primitive 'tods.data_processing.construct_predictions' without checking requirements.\n"
  815. ]
  816. }
  817. ],
  818. "source": [
  819. "# Step 6: Predictions\n",
  820. "step_6 = PrimitiveStep(primitive=index.get_primitive('d3m.primitives.tods.data_processing.construct_predictions'))\n",
  821. "step_6.add_argument(name='inputs', argument_type=ArgumentType.CONTAINER, data_reference='steps.5.produce')\n",
  822. "step_6.add_argument(name='reference', argument_type=ArgumentType.CONTAINER, data_reference='steps.1.produce')\n",
  823. "step_6.add_output('produce')\n",
  824. "pipeline_description.add_step(step_6)"
  825. ]
  826. },
  827. {
  828. "cell_type": "code",
  829. "execution_count": 36,
  830. "metadata": {},
  831. "outputs": [
  832. {
  833. "data": {
  834. "text/plain": [
  835. "'outputs.0'"
  836. ]
  837. },
  838. "execution_count": 36,
  839. "metadata": {},
  840. "output_type": "execute_result"
  841. }
  842. ],
  843. "source": [
  844. "# Final Output\n",
  845. "pipeline_description.add_output(name='output predictions', data_reference='steps.6.produce')"
  846. ]
  847. },
  848. {
  849. "cell_type": "code",
  850. "execution_count": 37,
  851. "metadata": {},
  852. "outputs": [
  853. {
  854. "name": "stdout",
  855. "output_type": "stream",
  856. "text": [
  857. "{\"id\": \"44caca5f-ed2a-42d6-bede-777fd96e5a90\", \"schema\": \"https://metadata.datadrivendiscovery.org/schemas/v0/pipeline.json\", \"created\": \"2021-06-29T04:06:32.108192Z\", \"inputs\": [{\"name\": \"inputs\"}], \"outputs\": [{\"data\": \"steps.6.produce\", \"name\": \"output predictions\"}], \"steps\": [{\"type\": \"PRIMITIVE\", \"primitive\": {\"id\": \"c78138d9-9377-31dc-aee8-83d9df049c60\", \"version\": \"0.3.0\", \"python_path\": \"d3m.primitives.tods.data_processing.dataset_to_dataframe\", \"name\": \"Extract a DataFrame from a Dataset\"}, \"arguments\": {\"inputs\": {\"type\": \"CONTAINER\", \"data\": \"inputs.0\"}}, \"outputs\": [{\"id\": \"produce\"}]}, {\"type\": \"PRIMITIVE\", \"primitive\": {\"id\": \"81235c29-aeb9-3828-911a-1b25319b6998\", \"version\": \"0.6.0\", \"python_path\": \"d3m.primitives.tods.data_processing.column_parser\", \"name\": \"Parses strings into their types\"}, \"arguments\": {\"inputs\": {\"type\": \"CONTAINER\", \"data\": \"steps.0.produce\"}}, \"outputs\": [{\"id\": \"produce\"}]}, {\"type\": \"PRIMITIVE\", \"primitive\": {\"id\": \"a996cd89-ddf0-367f-8e7f-8c013cbc2891\", \"version\": \"0.4.0\", \"python_path\": \"d3m.primitives.tods.data_processing.extract_columns_by_semantic_types\", \"name\": \"Extracts columns by semantic type\"}, \"arguments\": {\"inputs\": {\"type\": \"CONTAINER\", \"data\": \"steps.1.produce\"}}, \"outputs\": [{\"id\": \"produce\"}], \"hyperparams\": {\"semantic_types\": {\"type\": \"VALUE\", \"data\": [\"https://metadata.datadrivendiscovery.org/types/Attribute\"]}}}, {\"type\": \"PRIMITIVE\", \"primitive\": {\"id\": \"a996cd89-ddf0-367f-8e7f-8c013cbc2891\", \"version\": \"0.4.0\", \"python_path\": \"d3m.primitives.tods.data_processing.extract_columns_by_semantic_types\", \"name\": \"Extracts columns by semantic type\"}, \"arguments\": {\"inputs\": {\"type\": \"CONTAINER\", \"data\": \"steps.0.produce\"}}, \"outputs\": [{\"id\": \"produce\"}], \"hyperparams\": {\"semantic_types\": {\"type\": \"VALUE\", \"data\": [\"https://metadata.datadrivendiscovery.org/types/TrueTarget\"]}}}, {\"type\": \"PRIMITIVE\", \"primitive\": {\"id\": \"f07ce875-bbc7-36c5-9cc1-ba4bfb7cf48e\", \"version\": \"0.1.0\", \"python_path\": \"d3m.primitives.tods.feature_analysis.statistical_maximum\", \"name\": \"Time Series Decompostional\"}, \"arguments\": {\"inputs\": {\"type\": \"CONTAINER\", \"data\": \"steps.2.produce\"}}, \"outputs\": [{\"id\": \"produce\"}]}, {\"type\": \"PRIMITIVE\", \"primitive\": {\"id\": \"67e7fcdf-d645-3417-9aa4-85cd369487d9\", \"version\": \"0.0.1\", \"python_path\": \"d3m.primitives.tods.detection_algorithm.pyod_ae\", \"name\": \"TODS.anomaly_detection_primitives.AutoEncoder\"}, \"arguments\": {\"inputs\": {\"type\": \"CONTAINER\", \"data\": \"steps.4.produce\"}}, \"outputs\": [{\"id\": \"produce\"}]}, {\"type\": \"PRIMITIVE\", \"primitive\": {\"id\": \"2530840a-07d4-3874-b7d8-9eb5e4ae2bf3\", \"version\": \"0.3.0\", \"python_path\": \"d3m.primitives.tods.data_processing.construct_predictions\", \"name\": \"Construct pipeline predictions output\"}, \"arguments\": {\"inputs\": {\"type\": \"CONTAINER\", \"data\": \"steps.5.produce\"}, \"reference\": {\"type\": \"CONTAINER\", \"data\": \"steps.1.produce\"}}, \"outputs\": [{\"id\": \"produce\"}]}], \"digest\": \"3f4eb364201fc5fc403cc66c847ce1597fe8b0a91d130bded4d21c2a6ef2eef2\"}\n"
  858. ]
  859. }
  860. ],
  861. "source": [
  862. "# Output to json\n",
  863. "data = pipeline_description.to_json()\n",
  864. "with open('autoencoder_pipeline.json', 'w') as f:\n",
  865. "    f.write(data)\n",
  866. "    print(data)"
  867. ]
  868. },
  869. {
  870. "cell_type": "markdown",
  871. "metadata": {},
  872. "source": [
  873. "### Run Pipeline"
  874. ]
  875. },
  876. {
  877. "cell_type": "code",
  878. "execution_count": 38,
  879. "metadata": {},
  880. "outputs": [],
  881. "source": [
  882. "this_path = os.getcwd()  # notebooks do not define __file__, so use the working directory\n",
  883. "default_data_path = os.path.join(this_path, '../../datasets/anomaly/raw_data/yahoo_sub_5.csv')"
  884. ]
  885. },
  886. {
  887. "cell_type": "code",
  888. "execution_count": 39,
  889. "metadata": {},
  890. "outputs": [
  891. {
  892. "data": {
  893. "text/plain": [
  894. "_StoreAction(option_strings=['--pipeline_path'], dest='pipeline_path', nargs=None, const=None, default='/Users/wangyanghe/Desktop/Research/tods/examples/Demo Notebook/autoencoder_pipeline.json', type=None, choices=None, help='Input the path of the pre-built pipeline description', metavar=None)"
  895. ]
  896. },
  897. "execution_count": 39,
  898. "metadata": {},
  899. "output_type": "execute_result"
  900. }
  901. ],
  902. "source": [
  903. "parser = argparse.ArgumentParser(description='Arguments for running a predefined pipeline.')\n",
  904. "parser.add_argument('--table_path', type=str, default=default_data_path,\n",
  905. "                    help='Input the path of the input data table')\n",
  906. "parser.add_argument('--target_index', type=int, default=6,\n",
  907. "                    help='Index of the ground truth (for evaluation)')\n",
  908. "parser.add_argument('--metric', type=str, default='F1_MACRO',\n",
  909. "                    help='Evaluation Metric (F1, F1_MACRO)')\n",
  910. "parser.add_argument('--pipeline_path',\n",
  911. "                    default=os.path.join(this_path, 'autoencoder_pipeline.json'),\n",
  912. "                    help='Input the path of the pre-built pipeline description')"
  913. ]
  914. },
  915. {
  916. "cell_type": "code",
  917. "execution_count": 40,
  918. "metadata": {},
  919. "outputs": [],
  920. "source": [
  921. "args, unknown = parser.parse_known_args()\n",
  922. "table_path = args.table_path\n",
  923. "target_index = args.target_index  # index of the target (ground-truth) column\n",
  924. "pipeline_path = args.pipeline_path\n",
  925. "metric = args.metric  # F1 on both label 0 and 1"
  926. ]
  927. },
  928. {
  929. "cell_type": "code",
  930. "execution_count": 41,
  931. "metadata": {},
  932. "outputs": [],
  933. "source": [
  934. "# Read data and generate dataset\n",
  935. "df = pd.read_csv(table_path)\n",
  936. "dataset = generate_dataset(df, target_index)"
  937. ]
  938. },
  939. {
  940. "cell_type": "code",
  941. "execution_count": 42,
  942. "metadata": {},
  943. "outputs": [],
  944. "source": [
  945. "# Load the default pipeline\n",
  946. "pipeline = load_pipeline(pipeline_path)"
  947. ]
  948. },
  949. {
  950. "cell_type": "code",
  951. "execution_count": 43,
  952. "metadata": {},
  953. "outputs": [
  954. {
  955. "name": "stderr",
  956. "output_type": "stream",
  957. "text": [
  958. "Not all provided hyper-parameters for the data preparation pipeline 79ce71bd-db96-494b-a455-14f2e2ac5040 were used: ['method', 'number_of_folds', 'randomSeed', 'shuffle', 'stratified']\n"
  959. ]
  960. },
  961. {
  962. "name": "stdout",
  963. "output_type": "stream",
  964. "text": [
  965. "Model: \"sequential_2\"\n",
  966. "_________________________________________________________________\n",
  967. "Layer (type) Output Shape Param # \n",
  968. "=================================================================\n",
  969. "dense_2 (Dense) (None, 12) 156 \n",
  970. "_________________________________________________________________\n",
  971. "dropout_2 (Dropout) (None, 12) 0 \n",
  972. "_________________________________________________________________\n",
  973. "dense_3 (Dense) (None, 12) 156 \n",
  974. "_________________________________________________________________\n",
  975. "dropout_3 (Dropout) (None, 12) 0 \n",
  976. "_________________________________________________________________\n",
  977. "dense_4 (Dense) (None, 1) 13 \n",
  978. "_________________________________________________________________\n",
  979. "dropout_4 (Dropout) (None, 1) 0 \n",
  980. "_________________________________________________________________\n",
  981. "dense_5 (Dense) (None, 4) 8 \n",
  982. "_________________________________________________________________\n",
  983. "dropout_5 (Dropout) (None, 4) 0 \n",
  984. "_________________________________________________________________\n",
  985. "dense_6 (Dense) (None, 1) 5 \n",
  986. "_________________________________________________________________\n",
  987. "dropout_6 (Dropout) (None, 1) 0 \n",
  988. "_________________________________________________________________\n",
  989. "dense_7 (Dense) (None, 12) 24 \n",
  990. "=================================================================\n",
  991. "Total params: 362\n",
  992. "Trainable params: 362\n",
  993. "Non-trainable params: 0\n",
  994. "_________________________________________________________________\n",
  995. "None\n",
  996. "Epoch 1/20\n",
  997. "40/40 [==============================] - 0s 4ms/step - loss: 1.8796 - val_loss: 1.4306\n",
  998. "Epoch 2/20\n",
  999. "40/40 [==============================] - 0s 1ms/step - loss: 1.7280 - val_loss: 1.3324\n",
  1000. "Epoch 3/20\n",
  1001. "40/40 [==============================] - 0s 1ms/step - loss: 1.6184 - val_loss: 1.2660\n",
  1002. "Epoch 4/20\n",
  1003. "40/40 [==============================] - 0s 1ms/step - loss: 1.5448 - val_loss: 1.2157\n",
  1004. "Epoch 5/20\n",
  1005. "40/40 [==============================] - 0s 1ms/step - loss: 1.4950 - val_loss: 1.1736\n",
  1006. "Epoch 6/20\n",
  1007. "40/40 [==============================] - 0s 1ms/step - loss: 1.4282 - val_loss: 1.1391\n",
  1008. "Epoch 7/20\n",
  1009. "40/40 [==============================] - 0s 1ms/step - loss: 1.3967 - val_loss: 1.1090\n",
  1010. "Epoch 8/20\n",
  1011. "40/40 [==============================] - 0s 1ms/step - loss: 1.3643 - val_loss: 1.0819\n",
  1012. "Epoch 9/20\n",
  1013. "40/40 [==============================] - 0s 2ms/step - loss: 1.3212 - val_loss: 1.0579\n",
  1014. "Epoch 10/20\n",
  1015. "40/40 [==============================] - 0s 2ms/step - loss: 1.2965 - val_loss: 1.0358\n",
  1016. "Epoch 11/20\n",
  1017. "40/40 [==============================] - 0s 1ms/step - loss: 1.2677 - val_loss: 1.0152\n",
  1018. "Epoch 12/20\n",
  1019. "40/40 [==============================] - 0s 1ms/step - loss: 1.2449 - val_loss: 0.9960\n",
  1020. "Epoch 13/20\n",
  1021. "40/40 [==============================] - 0s 1ms/step - loss: 1.2246 - val_loss: 0.9778\n",
  1022. "Epoch 14/20\n",
  1023. "40/40 [==============================] - 0s 1ms/step - loss: 1.2096 - val_loss: 0.9606\n",
  1024. "Epoch 15/20\n",
  1025. "40/40 [==============================] - 0s 1ms/step - loss: 1.1837 - val_loss: 0.9444\n",
  1026. "Epoch 16/20\n",
  1027. "40/40 [==============================] - 0s 2ms/step - loss: 1.1703 - val_loss: 0.9288\n",
  1028. "Epoch 17/20\n",
  1029. "40/40 [==============================] - 0s 1ms/step - loss: 1.1430 - val_loss: 0.9140\n",
  1030. "Epoch 18/20\n",
  1031. "40/40 [==============================] - 0s 2ms/step - loss: 1.1249 - val_loss: 0.8997\n",
  1032. "Epoch 19/20\n",
  1033. "40/40 [==============================] - 0s 2ms/step - loss: 1.1178 - val_loss: 0.8861\n",
  1034. "Epoch 20/20\n",
  1035. "40/40 [==============================] - 0s 2ms/step - loss: 1.0976 - val_loss: 0.8732\n",
  1036. "{'method_called': 'evaluate',\n",
  1037. " 'outputs': \"[{'outputs.0': d3mIndex anomaly\"\n",
  1038. " '0 0 1'\n",
  1039. " '1 1 0'\n",
  1040. " '2 2 0'\n",
  1041. " '3 3 1'\n",
  1042. " '4 4 0'\n",
  1043. " '... ... ...'\n",
  1044. " '1395 1395 1'\n",
  1045. " '1396 1396 0'\n",
  1046. " '1397 1397 1'\n",
  1047. " '1398 1398 1'\n",
  1048. " '1399 1399 1'\n",
  1049. " ''\n",
  1050. " \"[1400 rows x 2 columns]}, {'outputs.0': d3mIndex anomaly\"\n",
  1051. " '0 0 1'\n",
  1052. " '1 1 0'\n",
  1053. " '2 2 0'\n",
  1054. " '3 3 1'\n",
  1055. " '4 4 0'\n",
  1056. " '... ... ...'\n",
  1057. " '1395 1395 1'\n",
  1058. " '1396 1396 0'\n",
  1059. " '1397 1397 1'\n",
  1060. " '1398 1398 1'\n",
  1061. " '1399 1399 1'\n",
  1062. " ''\n",
  1063. " '[1400 rows x 2 columns]}]',\n",
  1064. " 'pipeline': '<d3m.metadata.pipeline.Pipeline object at 0x1551d0ba8>',\n",
  1065. " 'scores': ' metric value normalized randomSeed fold'\n",
  1066. " '0 F1_MACRO 0.509059 0.509059 0 0',\n",
  1067. " 'status': 'COMPLETED'}\n"
  1068. ]
  1069. }
  1070. ],
  1071. "source": [
  1072. "# Run the pipeline\n",
  1073. "pipeline_result = evaluate_pipeline(dataset, pipeline, metric)\n",
  1074. "print(pipeline_result)"
  1075. ]
  1076. },
  1077. {
  1078. "cell_type": "markdown",
  1079. "metadata": {},
  1080. "source": [
  1081. "## Searcher Example:"
  1082. ]
  1083. },
  1084. {
  1085. "cell_type": "code",
  1086. "execution_count": 44,
  1087. "metadata": {},
  1088. "outputs": [],
  1089. "source": [
  1090. "table_path = '../../datasets/anomaly/raw_data/yahoo_sub_5.csv'\n",
  1091. "target_index = 6 # column of the target label\n",
  1092. "time_limit = 30 # search time limit in seconds"
  1093. ]
  1094. },
  1095. {
  1096. "cell_type": "code",
  1097. "execution_count": 45,
  1098. "metadata": {},
  1099. "outputs": [],
  1100. "source": [
  1101. "metric = 'F1_MACRO' # F1 averaged over labels 0 and 1"
  1102. ]
  1103. },
  1104. {
  1105. "cell_type": "code",
  1106. "execution_count": 46,
  1107. "metadata": {},
  1108. "outputs": [],
  1109. "source": [
  1110. "# Read data and generate dataset and problem\n",
  1111. "df = pd.read_csv(table_path)\n",
  1112. "dataset = generate_dataset(df, target_index=target_index)\n",
  1113. "problem_description = generate_problem(dataset, metric)"
  1114. ]
  1115. },
  1116. {
  1117. "cell_type": "code",
  1118. "execution_count": 47,
  1119. "metadata": {},
  1120. "outputs": [],
  1121. "source": [
  1122. "# Start backend\n",
  1123. "backend = SimpleRunner(random_seed=0)"
  1124. ]
  1125. },
  1126. {
  1127. "cell_type": "code",
  1128. "execution_count": 48,
  1129. "metadata": {},
  1130. "outputs": [],
  1131. "source": [
  1132. "# Start search algorithm\n",
  1133. "search = BruteForceSearch(problem_description=problem_description,\n",
  1134. " backend=backend)"
  1135. ]
  1136. },
  1137. {
  1138. "cell_type": "code",
  1139. "execution_count": 49,
  1140. "metadata": {},
  1141. "outputs": [
  1142. {
  1143. "name": "stderr",
  1144. "output_type": "stream",
  1145. "text": [
  1146. "Not all provided hyper-parameters for the data preparation pipeline 79ce71bd-db96-494b-a455-14f2e2ac5040 were used: ['method', 'number_of_folds', 'randomSeed', 'shuffle', 'stratified']\n"
  1147. ]
  1148. },
  1149. {
  1150. "name": "stdout",
  1151. "output_type": "stream",
  1152. "text": [
  1153. "Model: \"sequential_3\"\n",
  1154. "_________________________________________________________________\n",
  1155. "Layer (type) Output Shape Param # \n",
  1156. "=================================================================\n",
  1157. "dense_8 (Dense) (None, 12) 156 \n",
  1158. "_________________________________________________________________\n",
  1159. "dropout_7 (Dropout) (None, 12) 0 \n",
  1160. "_________________________________________________________________\n",
  1161. "dense_9 (Dense) (None, 12) 156 \n",
  1162. "_________________________________________________________________\n",
  1163. "dropout_8 (Dropout) (None, 12) 0 \n",
  1164. "_________________________________________________________________\n",
  1165. "dense_10 (Dense) (None, 1) 13 \n",
  1166. "_________________________________________________________________\n",
  1167. "dropout_9 (Dropout) (None, 1) 0 \n",
  1168. "_________________________________________________________________\n",
  1169. "dense_11 (Dense) (None, 4) 8 \n",
  1170. "_________________________________________________________________\n",
  1171. "dropout_10 (Dropout) (None, 4) 0 \n",
  1172. "_________________________________________________________________\n",
  1173. "dense_12 (Dense) (None, 1) 5 \n",
  1174. "_________________________________________________________________\n",
  1175. "dropout_11 (Dropout) (None, 1) 0 \n",
  1176. "_________________________________________________________________\n",
  1177. "dense_13 (Dense) (None, 12) 24 \n",
  1178. "=================================================================\n",
  1179. "Total params: 362\n",
  1180. "Trainable params: 362\n",
  1181. "Non-trainable params: 0\n",
  1182. "_________________________________________________________________\n",
  1183. "None\n",
  1184. "Epoch 1/20\n",
  1185. "40/40 [==============================] - 0s 4ms/step - loss: 1.4187 - val_loss: 1.0009\n",
  1186. "Epoch 2/20\n",
  1187. "40/40 [==============================] - 0s 1ms/step - loss: 1.2895 - val_loss: 0.9167\n",
  1188. "Epoch 3/20\n",
  1189. "40/40 [==============================] - 0s 2ms/step - loss: 1.2010 - val_loss: 0.8517\n",
  1190. "Epoch 4/20\n",
  1191. "40/40 [==============================] - 0s 1ms/step - loss: 1.1463 - val_loss: 0.7988\n",
  1192. "Epoch 5/20\n",
  1193. "40/40 [==============================] - 0s 1ms/step - loss: 1.0777 - val_loss: 0.7531\n",
  1194. "Epoch 6/20\n",
  1195. "40/40 [==============================] - 0s 1ms/step - loss: 1.0281 - val_loss: 0.7135\n",
  1196. "Epoch 7/20\n",
  1197. "40/40 [==============================] - 0s 1ms/step - loss: 0.9993 - val_loss: 0.6791\n",
  1198. "Epoch 8/20\n",
  1199. "40/40 [==============================] - 0s 1ms/step - loss: 0.9634 - val_loss: 0.6496\n",
  1200. "Epoch 9/20\n",
  1201. "40/40 [==============================] - 0s 1ms/step - loss: 0.9320 - val_loss: 0.6239\n",
  1202. "Epoch 10/20\n",
  1203. "40/40 [==============================] - 0s 1ms/step - loss: 0.8982 - val_loss: 0.6019\n",
  1204. "Epoch 11/20\n",
  1205. "40/40 [==============================] - 0s 1ms/step - loss: 0.8760 - val_loss: 0.5825\n",
  1206. "Epoch 12/20\n",
  1207. "40/40 [==============================] - 0s 2ms/step - loss: 0.8527 - val_loss: 0.5652\n",
  1208. "Epoch 13/20\n",
  1209. "40/40 [==============================] - 0s 2ms/step - loss: 0.8399 - val_loss: 0.5510\n",
  1210. "Epoch 14/20\n",
  1211. "40/40 [==============================] - 0s 2ms/step - loss: 0.8218 - val_loss: 0.5378\n",
  1212. "Epoch 15/20\n",
  1213. "40/40 [==============================] - 0s 2ms/step - loss: 0.8096 - val_loss: 0.5263\n",
  1214. "Epoch 16/20\n",
  1215. "40/40 [==============================] - 0s 2ms/step - loss: 0.7945 - val_loss: 0.5162\n",
  1216. "Epoch 17/20\n",
  1217. "40/40 [==============================] - 0s 2ms/step - loss: 0.7836 - val_loss: 0.5069\n",
  1218. "Epoch 18/20\n",
  1219. "40/40 [==============================] - 0s 2ms/step - loss: 0.7713 - val_loss: 0.4988\n",
  1220. "Epoch 19/20\n",
  1221. "40/40 [==============================] - 0s 2ms/step - loss: 0.7561 - val_loss: 0.4908\n",
  1222. "Epoch 20/20\n",
  1223. "40/40 [==============================] - 0s 1ms/step - loss: 0.7538 - val_loss: 0.4840\n"
  1224. ]
  1225. },
  1226. {
  1227. "name": "stderr",
  1228. "output_type": "stream",
  1229. "text": [
  1230. "Traceback (most recent call last):\n",
  1231. " File \"/Users/wangyanghe/Desktop/Research/tods/tods/searcher/brute_force_search.py\", line 62, in _search\n",
  1232. " for error in pipeline_result.error:\n",
  1233. "TypeError: 'NoneType' object is not iterable\n",
  1234. "Not all provided hyper-parameters for the data preparation pipeline 79ce71bd-db96-494b-a455-14f2e2ac5040 were used: ['method', 'number_of_folds', 'randomSeed', 'shuffle', 'stratified']\n"
  1235. ]
  1236. },
  1237. {
  1238. "name": "stdout",
  1239. "output_type": "stream",
  1240. "text": [
  1241. "Model: \"sequential_4\"\n",
  1242. "_________________________________________________________________\n",
  1243. "Layer (type) Output Shape Param # \n",
  1244. "=================================================================\n",
  1245. "dense_14 (Dense) (None, 12) 156 \n",
  1246. "_________________________________________________________________\n",
  1247. "dropout_12 (Dropout) (None, 12) 0 \n",
  1248. "_________________________________________________________________\n",
  1249. "dense_15 (Dense) (None, 12) 156 \n",
  1250. "_________________________________________________________________\n",
  1251. "dropout_13 (Dropout) (None, 12) 0 \n",
  1252. "_________________________________________________________________\n",
  1253. "dense_16 (Dense) (None, 1) 13 \n",
  1254. "_________________________________________________________________\n",
  1255. "dropout_14 (Dropout) (None, 1) 0 \n",
  1256. "_________________________________________________________________\n",
  1257. "dense_17 (Dense) (None, 4) 8 \n",
  1258. "_________________________________________________________________\n",
  1259. "dropout_15 (Dropout) (None, 4) 0 \n",
  1260. "_________________________________________________________________\n",
  1261. "dense_18 (Dense) (None, 1) 5 \n",
  1262. "_________________________________________________________________\n",
  1263. "dropout_16 (Dropout) (None, 1) 0 \n",
  1264. "_________________________________________________________________\n",
  1265. "dense_19 (Dense) (None, 12) 24 \n",
  1266. "=================================================================\n",
  1267. "Total params: 362\n",
  1268. "Trainable params: 362\n",
  1269. "Non-trainable params: 0\n",
  1270. "_________________________________________________________________\n",
  1271. "None\n",
  1272. "Epoch 1/20\n",
  1273. "40/40 [==============================] - 0s 4ms/step - loss: 1.4226 - val_loss: 1.0312\n",
  1274. "Epoch 2/20\n",
  1275. "40/40 [==============================] - 0s 1ms/step - loss: 1.3035 - val_loss: 0.9579\n",
  1276. "Epoch 3/20\n",
  1277. "40/40 [==============================] - 0s 1ms/step - loss: 1.2140 - val_loss: 0.9087\n",
  1278. "Epoch 4/20\n",
  1279. "40/40 [==============================] - 0s 1ms/step - loss: 1.1662 - val_loss: 0.8710\n",
  1280. "Epoch 5/20\n",
  1281. "40/40 [==============================] - 0s 1ms/step - loss: 1.1229 - val_loss: 0.8401\n",
  1282. "Epoch 6/20\n",
  1283. "40/40 [==============================] - 0s 1ms/step - loss: 1.0874 - val_loss: 0.8141\n",
  1284. "Epoch 7/20\n",
  1285. "40/40 [==============================] - 0s 1ms/step - loss: 1.0573 - val_loss: 0.7913\n",
  1286. "Epoch 8/20\n",
  1287. "40/40 [==============================] - 0s 1ms/step - loss: 1.0292 - val_loss: 0.7709\n",
  1288. "Epoch 9/20\n",
  1289. "40/40 [==============================] - 0s 1ms/step - loss: 1.0038 - val_loss: 0.7525\n",
  1290. "Epoch 10/20\n",
  1291. "40/40 [==============================] - 0s 1ms/step - loss: 0.9837 - val_loss: 0.7353\n",
  1292. "Epoch 11/20\n",
  1293. "40/40 [==============================] - 0s 1ms/step - loss: 0.9654 - val_loss: 0.7190\n",
  1294. "Epoch 12/20\n",
  1295. "40/40 [==============================] - 0s 1ms/step - loss: 0.9444 - val_loss: 0.7040\n",
  1296. "Epoch 13/20\n",
  1297. "40/40 [==============================] - 0s 1ms/step - loss: 0.9280 - val_loss: 0.6898\n",
  1298. "Epoch 14/20\n",
  1299. "40/40 [==============================] - 0s 1ms/step - loss: 0.9117 - val_loss: 0.6762\n",
  1300. "Epoch 15/20\n",
  1301. "40/40 [==============================] - 0s 1ms/step - loss: 0.8950 - val_loss: 0.6634\n",
  1302. "Epoch 16/20\n",
  1303. "40/40 [==============================] - 0s 1ms/step - loss: 0.8789 - val_loss: 0.6515\n",
  1304. "Epoch 17/20\n",
  1305. "40/40 [==============================] - 0s 1ms/step - loss: 0.8663 - val_loss: 0.6400\n",
  1306. "Epoch 18/20\n",
  1307. "40/40 [==============================] - 0s 1ms/step - loss: 0.8541 - val_loss: 0.6290\n",
  1308. "Epoch 19/20\n",
  1309. "40/40 [==============================] - 0s 1ms/step - loss: 0.8419 - val_loss: 0.6187\n",
  1310. "Epoch 20/20\n",
  1311. "40/40 [==============================] - 0s 2ms/step - loss: 0.8293 - val_loss: 0.6088\n"
  1312. ]
  1313. },
  1314. {
  1315. "name": "stderr",
  1316. "output_type": "stream",
  1317. "text": [
  1318. "Traceback (most recent call last):\n",
  1319. " File \"/Users/wangyanghe/Desktop/Research/tods/tods/searcher/brute_force_search.py\", line 62, in _search\n",
  1320. " for error in pipeline_result.error:\n",
  1321. "TypeError: 'NoneType' object is not iterable\n",
  1322. "Not all provided hyper-parameters for the data preparation pipeline 79ce71bd-db96-494b-a455-14f2e2ac5040 were used: ['method', 'number_of_folds', 'randomSeed', 'shuffle', 'stratified']\n"
  1323. ]
  1324. },
  1325. {
  1326. "name": "stdout",
  1327. "output_type": "stream",
  1328. "text": [
  1329. "Model: \"sequential_5\"\n",
  1330. "_________________________________________________________________\n",
  1331. "Layer (type) Output Shape Param # \n",
  1332. "=================================================================\n",
  1333. "dense_20 (Dense) (None, 12) 156 \n",
  1334. "_________________________________________________________________\n",
  1335. "dropout_17 (Dropout) (None, 12) 0 \n",
  1336. "_________________________________________________________________\n",
  1337. "dense_21 (Dense) (None, 12) 156 \n",
  1338. "_________________________________________________________________\n",
  1339. "dropout_18 (Dropout) (None, 12) 0 \n",
  1340. "_________________________________________________________________\n",
  1341. "dense_22 (Dense) (None, 1) 13 \n",
  1342. "_________________________________________________________________\n",
  1343. "dropout_19 (Dropout) (None, 1) 0 \n",
  1344. "_________________________________________________________________\n",
  1345. "dense_23 (Dense) (None, 4) 8 \n",
  1346. "_________________________________________________________________\n",
  1347. "dropout_20 (Dropout) (None, 4) 0 \n",
  1348. "_________________________________________________________________\n",
  1349. "dense_24 (Dense) (None, 1) 5 \n",
  1350. "_________________________________________________________________\n",
  1351. "dropout_21 (Dropout) (None, 1) 0 \n",
  1352. "_________________________________________________________________\n",
  1353. "dense_25 (Dense) (None, 12) 24 \n",
  1354. "=================================================================\n",
  1355. "Total params: 362\n",
  1356. "Trainable params: 362\n",
  1357. "Non-trainable params: 0\n",
  1358. "_________________________________________________________________\n",
  1359. "None\n",
  1360. "Epoch 1/20\n",
  1361. "40/40 [==============================] - 0s 4ms/step - loss: 1.5037 - val_loss: 0.9548\n",
  1362. "Epoch 2/20\n",
  1363. "40/40 [==============================] - 0s 1ms/step - loss: 1.3846 - val_loss: 0.8790\n",
  1364. "Epoch 3/20\n",
  1365. "40/40 [==============================] - 0s 1ms/step - loss: 1.2879 - val_loss: 0.8227\n",
  1366. "Epoch 4/20\n",
  1367. "40/40 [==============================] - 0s 1ms/step - loss: 1.2069 - val_loss: 0.7754\n",
  1368. "Epoch 5/20\n",
  1369. "40/40 [==============================] - 0s 1ms/step - loss: 1.1539 - val_loss: 0.7350\n",
  1370. "Epoch 6/20\n",
  1371. "40/40 [==============================] - 0s 1ms/step - loss: 1.1092 - val_loss: 0.6988\n",
  1372. "Epoch 7/20\n",
  1373. "40/40 [==============================] - 0s 2ms/step - loss: 1.0573 - val_loss: 0.6661\n",
  1374. "Epoch 8/20\n",
  1375. "40/40 [==============================] - 0s 2ms/step - loss: 1.0250 - val_loss: 0.6363\n",
  1376. "Epoch 9/20\n",
  1377. "40/40 [==============================] - 0s 1ms/step - loss: 0.9841 - val_loss: 0.6097\n",
  1378. "Epoch 10/20\n",
  1379. "40/40 [==============================] - 0s 1ms/step - loss: 0.9457 - val_loss: 0.5857\n",
  1380. "Epoch 11/20\n",
  1381. "40/40 [==============================] - 0s 1ms/step - loss: 0.9316 - val_loss: 0.5643\n",
  1382. "Epoch 12/20\n",
  1383. "40/40 [==============================] - 0s 1ms/step - loss: 0.9055 - val_loss: 0.5456\n",
  1384. "Epoch 13/20\n",
  1385. "40/40 [==============================] - 0s 1ms/step - loss: 0.8854 - val_loss: 0.5292\n",
  1386. "Epoch 14/20\n",
  1387. "40/40 [==============================] - 0s 1ms/step - loss: 0.8633 - val_loss: 0.5146\n",
  1388. "Epoch 15/20\n",
  1389. "40/40 [==============================] - 0s 2ms/step - loss: 0.8412 - val_loss: 0.5022\n",
  1390. "Epoch 16/20\n",
  1391. "40/40 [==============================] - 0s 2ms/step - loss: 0.8335 - val_loss: 0.4911\n",
  1392. "Epoch 17/20\n",
  1393. "40/40 [==============================] - 0s 1ms/step - loss: 0.8192 - val_loss: 0.4814\n",
  1394. "Epoch 18/20\n",
  1395. "40/40 [==============================] - 0s 1ms/step - loss: 0.8071 - val_loss: 0.4726\n",
  1396. "Epoch 19/20\n",
  1397. "40/40 [==============================] - 0s 1ms/step - loss: 0.7888 - val_loss: 0.4646\n",
  1398. "Epoch 20/20\n",
  1399. "40/40 [==============================] - 0s 1ms/step - loss: 0.7846 - val_loss: 0.4576\n"
  1400. ]
  1401. },
  1402. {
  1403. "name": "stderr",
  1404. "output_type": "stream",
  1405. "text": [
  1406. "Traceback (most recent call last):\n",
  1407. " File \"/Users/wangyanghe/Desktop/Research/tods/tods/searcher/brute_force_search.py\", line 62, in _search\n",
  1408. " for error in pipeline_result.error:\n",
  1409. "TypeError: 'NoneType' object is not iterable\n",
  1410. "Not all provided hyper-parameters for the data preparation pipeline 79ce71bd-db96-494b-a455-14f2e2ac5040 were used: ['method', 'number_of_folds', 'randomSeed', 'shuffle', 'stratified']\n"
  1411. ]
  1412. },
  1413. {
  1414. "name": "stdout",
  1415. "output_type": "stream",
  1416. "text": [
  1417. "Model: \"sequential_6\"\n",
  1418. "_________________________________________________________________\n",
  1419. "Layer (type) Output Shape Param # \n",
  1420. "=================================================================\n",
  1421. "dense_26 (Dense) (None, 12) 156 \n",
  1422. "_________________________________________________________________\n",
  1423. "dropout_22 (Dropout) (None, 12) 0 \n",
  1424. "_________________________________________________________________\n",
  1425. "dense_27 (Dense) (None, 12) 156 \n",
  1426. "_________________________________________________________________\n",
  1427. "dropout_23 (Dropout) (None, 12) 0 \n",
  1428. "_________________________________________________________________\n",
  1429. "dense_28 (Dense) (None, 1) 13 \n",
  1430. "_________________________________________________________________\n",
  1431. "dropout_24 (Dropout) (None, 1) 0 \n",
  1432. "_________________________________________________________________\n",
  1433. "dense_29 (Dense) (None, 4) 8 \n",
  1434. "_________________________________________________________________\n",
  1435. "dropout_25 (Dropout) (None, 4) 0 \n",
  1436. "_________________________________________________________________\n",
  1437. "dense_30 (Dense) (None, 1) 5 \n",
  1438. "_________________________________________________________________\n",
  1439. "dropout_26 (Dropout) (None, 1) 0 \n",
  1440. "_________________________________________________________________\n",
  1441. "dense_31 (Dense) (None, 12) 24 \n",
  1442. "=================================================================\n",
  1443. "Total params: 362\n",
  1444. "Trainable params: 362\n",
  1445. "Non-trainable params: 0\n",
  1446. "_________________________________________________________________\n",
  1447. "None\n",
  1448. "Epoch 1/20\n",
  1449. "40/40 [==============================] - 0s 6ms/step - loss: 1.5385 - val_loss: 0.9377\n",
  1450. "Epoch 2/20\n",
  1451. "40/40 [==============================] - 0s 2ms/step - loss: 1.4001 - val_loss: 0.8741\n",
  1452. "Epoch 3/20\n",
  1453. "40/40 [==============================] - 0s 1ms/step - loss: 1.3243 - val_loss: 0.8293\n",
  1454. "Epoch 4/20\n",
  1455. "40/40 [==============================] - 0s 1ms/step - loss: 1.2663 - val_loss: 0.7954\n",
  1456. "Epoch 5/20\n",
  1457. "40/40 [==============================] - 0s 1ms/step - loss: 1.2048 - val_loss: 0.7677\n",
  1458. "Epoch 6/20\n",
  1459. "40/40 [==============================] - 0s 1ms/step - loss: 1.1459 - val_loss: 0.7439\n",
  1460. "Epoch 7/20\n",
  1461. "40/40 [==============================] - 0s 1ms/step - loss: 1.1224 - val_loss: 0.7230\n",
  1462. "Epoch 8/20\n",
  1463. "40/40 [==============================] - 0s 1ms/step - loss: 1.0832 - val_loss: 0.7042\n",
  1464. "Epoch 9/20\n",
  1465. "40/40 [==============================] - 0s 1ms/step - loss: 1.0554 - val_loss: 0.6868\n",
  1466. "Epoch 10/20\n",
  1467. "40/40 [==============================] - 0s 2ms/step - loss: 1.0283 - val_loss: 0.6708\n",
  1468. "Epoch 11/20\n",
  1469. "40/40 [==============================] - 0s 2ms/step - loss: 1.0062 - val_loss: 0.6558\n",
  1470. "Epoch 12/20\n",
  1471. "40/40 [==============================] - 0s 2ms/step - loss: 0.9902 - val_loss: 0.6417\n",
  1472. "Epoch 13/20\n",
  1473. "40/40 [==============================] - 0s 2ms/step - loss: 0.9738 - val_loss: 0.6284\n",
  1474. "Epoch 14/20\n",
  1475. "40/40 [==============================] - 0s 2ms/step - loss: 0.9461 - val_loss: 0.6158\n",
  1476. "Epoch 15/20\n",
  1477. "40/40 [==============================] - 0s 2ms/step - loss: 0.9323 - val_loss: 0.6038\n",
  1478. "Epoch 16/20\n",
  1479. "40/40 [==============================] - 0s 2ms/step - loss: 0.9126 - val_loss: 0.5925\n",
  1480. "Epoch 17/20\n",
  1481. "40/40 [==============================] - 0s 1ms/step - loss: 0.9007 - val_loss: 0.5817\n",
  1482. "Epoch 18/20\n",
  1483. "40/40 [==============================] - 0s 1ms/step - loss: 0.8846 - val_loss: 0.5715\n",
  1484. "Epoch 19/20\n",
  1485. "40/40 [==============================] - 0s 1ms/step - loss: 0.8657 - val_loss: 0.5617\n",
  1486. "Epoch 20/20\n",
  1487. "40/40 [==============================] - 0s 1ms/step - loss: 0.8551 - val_loss: 0.5524\n"
  1488. ]
  1489. },
  1490. {
  1491. "name": "stderr",
  1492. "output_type": "stream",
  1493. "text": [
  1494. "Traceback (most recent call last):\n",
  1495. " File \"/Users/wangyanghe/Desktop/Research/tods/tods/searcher/brute_force_search.py\", line 62, in _search\n",
  1496. " for error in pipeline_result.error:\n",
  1497. "TypeError: 'NoneType' object is not iterable\n",
  1498. "Not all provided hyper-parameters for the data preparation pipeline 79ce71bd-db96-494b-a455-14f2e2ac5040 were used: ['method', 'number_of_folds', 'randomSeed', 'shuffle', 'stratified']\n"
  1499. ]
  1500. },
  1501. {
  1502. "name": "stdout",
  1503. "output_type": "stream",
  1504. "text": [
  1505. "Model: \"sequential_7\"\n",
  1506. "_________________________________________________________________\n",
  1507. "Layer (type) Output Shape Param # \n",
  1508. "=================================================================\n",
  1509. "dense_32 (Dense) (None, 12) 156 \n",
  1510. "_________________________________________________________________\n",
  1511. "dropout_27 (Dropout) (None, 12) 0 \n",
  1512. "_________________________________________________________________\n",
  1513. "dense_33 (Dense) (None, 12) 156 \n",
  1514. "_________________________________________________________________\n",
  1515. "dropout_28 (Dropout) (None, 12) 0 \n",
  1516. "_________________________________________________________________\n",
  1517. "dense_34 (Dense) (None, 1) 13 \n",
  1518. "_________________________________________________________________\n",
  1519. "dropout_29 (Dropout) (None, 1) 0 \n",
  1520. "_________________________________________________________________\n",
  1521. "dense_35 (Dense) (None, 4) 8 \n",
  1522. "_________________________________________________________________\n",
  1523. "dropout_30 (Dropout) (None, 4) 0 \n",
  1524. "_________________________________________________________________\n",
  1525. "dense_36 (Dense) (None, 1) 5 \n",
  1526. "_________________________________________________________________\n",
  1527. "dropout_31 (Dropout) (None, 1) 0 \n",
  1528. "_________________________________________________________________\n",
  1529. "dense_37 (Dense) (None, 12) 24 \n",
  1530. "=================================================================\n",
  1531. "Total params: 362\n",
  1532. "Trainable params: 362\n",
  1533. "Non-trainable params: 0\n",
  1534. "_________________________________________________________________\n",
  1535. "None\n",
  1536. "Epoch 1/20\n",
  1537. "40/40 [==============================] - 0s 5ms/step - loss: 1.4187 - val_loss: 1.0796\n",
  1538. "Epoch 2/20\n",
  1539. "40/40 [==============================] - 0s 2ms/step - loss: 1.2828 - val_loss: 0.9882\n",
  1540. "Epoch 3/20\n",
  1541. "40/40 [==============================] - 0s 2ms/step - loss: 1.1966 - val_loss: 0.9252\n",
  1542. "Epoch 4/20\n",
  1543. "40/40 [==============================] - 0s 2ms/step - loss: 1.1363 - val_loss: 0.8790\n",
  1544. "Epoch 5/20\n",
  1545. "40/40 [==============================] - 0s 1ms/step - loss: 1.0851 - val_loss: 0.8430\n",
  1546. "Epoch 6/20\n",
  1547. "40/40 [==============================] - 0s 1ms/step - loss: 1.0490 - val_loss: 0.8141\n",
  1548. "Epoch 7/20\n",
  1549. "40/40 [==============================] - 0s 1ms/step - loss: 1.0228 - val_loss: 0.7893\n",
  1550. "Epoch 8/20\n",
  1551. "40/40 [==============================] - 0s 1ms/step - loss: 0.9927 - val_loss: 0.7679\n",
  1552. "Epoch 9/20\n",
  1553. "40/40 [==============================] - 0s 2ms/step - loss: 0.9690 - val_loss: 0.7490\n",
  1554. "Epoch 10/20\n",
  1555. "40/40 [==============================] - 0s 2ms/step - loss: 0.9507 - val_loss: 0.7316\n",
  1556. "Epoch 11/20\n",
  1557. "40/40 [==============================] - 0s 2ms/step - loss: 0.9298 - val_loss: 0.7158\n",
  1558. "Epoch 12/20\n",
  1559. "40/40 [==============================] - 0s 1ms/step - loss: 0.9132 - val_loss: 0.7011\n",
  1560. "Epoch 13/20\n",
  1561. "40/40 [==============================] - 0s 1ms/step - loss: 0.8952 - val_loss: 0.6873\n",
  1562. "Epoch 14/20\n",
  1563. "40/40 [==============================] - 0s 1ms/step - loss: 0.8818 - val_loss: 0.6743\n",
  1564. "Epoch 15/20\n",
  1565. "40/40 [==============================] - 0s 1ms/step - loss: 0.8682 - val_loss: 0.6620\n",
  1566. "Epoch 16/20\n",
  1567. "40/40 [==============================] - 0s 1ms/step - loss: 0.8537 - val_loss: 0.6504\n",
  1568. "Epoch 17/20\n",
  1569. "40/40 [==============================] - 0s 1ms/step - loss: 0.8404 - val_loss: 0.6394\n",
  1570. "Epoch 18/20\n",
  1571. "40/40 [==============================] - 0s 1ms/step - loss: 0.8283 - val_loss: 0.6289\n",
  1572. "Epoch 19/20\n",
  1573. "40/40 [==============================] - 0s 1ms/step - loss: 0.8162 - val_loss: 0.6190\n",
  1574. "Epoch 20/20\n",
  1575. "40/40 [==============================] - 0s 1ms/step - loss: 0.8059 - val_loss: 0.6095\n"
  1576. ]
  1577. },
  1578. {
  1579. "name": "stderr",
  1580. "output_type": "stream",
  1581. "text": [
  1582. "Traceback (most recent call last):\n",
  1583. "  File \"/Users/wangyanghe/Desktop/Research/tods/tods/searcher/brute_force_search.py\", line 62, in _search\n",
  1584. "    for error in pipeline_result.error:\n",
  1585. "TypeError: 'NoneType' object is not iterable\n",
  1586. "Not all provided hyper-parameters for the data preparation pipeline 79ce71bd-db96-494b-a455-14f2e2ac5040 were used: ['method', 'number_of_folds', 'randomSeed', 'shuffle', 'stratified']\n"
  1587. ]
  1588. },
  1589. {
  1590. "name": "stdout",
  1591. "output_type": "stream",
  1592. "text": [
  1593. "Model: \"sequential_8\"\n",
  1594. "_________________________________________________________________\n",
  1595. "Layer (type)                 Output Shape              Param #   \n",
  1596. "=================================================================\n",
  1597. "dense_38 (Dense)             (None, 12)                156       \n",
  1598. "_________________________________________________________________\n",
  1599. "dropout_32 (Dropout)         (None, 12)                0         \n",
  1600. "_________________________________________________________________\n",
  1601. "dense_39 (Dense)             (None, 12)                156       \n",
  1602. "_________________________________________________________________\n",
  1603. "dropout_33 (Dropout)         (None, 12)                0         \n",
  1604. "_________________________________________________________________\n",
  1605. "dense_40 (Dense)             (None, 1)                 13        \n",
  1606. "_________________________________________________________________\n",
  1607. "dropout_34 (Dropout)         (None, 1)                 0         \n",
  1608. "_________________________________________________________________\n",
  1609. "dense_41 (Dense)             (None, 4)                 8         \n",
  1610. "_________________________________________________________________\n",
  1611. "dropout_35 (Dropout)         (None, 4)                 0         \n",
  1612. "_________________________________________________________________\n",
  1613. "dense_42 (Dense)             (None, 1)                 5         \n",
  1614. "_________________________________________________________________\n",
  1615. "dropout_36 (Dropout)         (None, 1)                 0         \n",
  1616. "_________________________________________________________________\n",
  1617. "dense_43 (Dense)             (None, 12)                24        \n",
  1618. "=================================================================\n",
  1619. "Total params: 362\n",
  1620. "Trainable params: 362\n",
  1621. "Non-trainable params: 0\n",
  1622. "_________________________________________________________________\n",
  1623. "None\n",
  1624. "Epoch 1/20\n",
  1625. "40/40 [==============================] - 0s 4ms/step - loss: 1.5237 - val_loss: 1.0177\n",
  1626. "Epoch 2/20\n",
  1627. "40/40 [==============================] - 0s 1ms/step - loss: 1.3755 - val_loss: 0.9350\n",
  1628. "Epoch 3/20\n",
  1629. "40/40 [==============================] - 0s 1ms/step - loss: 1.2795 - val_loss: 0.8795\n",
  1630. "Epoch 4/20\n",
  1631. "40/40 [==============================] - 0s 1ms/step - loss: 1.2211 - val_loss: 0.8374\n",
  1632. "Epoch 5/20\n",
  1633. "40/40 [==============================] - 0s 2ms/step - loss: 1.1686 - val_loss: 0.8039\n",
  1634. "Epoch 6/20\n",
  1635. "40/40 [==============================] - 0s 2ms/step - loss: 1.1298 - val_loss: 0.7758\n",
  1636. "Epoch 7/20\n",
  1637. "40/40 [==============================] - 0s 2ms/step - loss: 1.0982 - val_loss: 0.7514\n",
  1638. "Epoch 8/20\n",
  1639. "40/40 [==============================] - 0s 2ms/step - loss: 1.0670 - val_loss: 0.7298\n",
  1640. "Epoch 9/20\n",
  1641. "40/40 [==============================] - 0s 2ms/step - loss: 1.0382 - val_loss: 0.7106\n",
  1642. "Epoch 10/20\n",
  1643. "40/40 [==============================] - 0s 2ms/step - loss: 1.0145 - val_loss: 0.6931\n",
  1644. "Epoch 11/20\n",
  1645. "40/40 [==============================] - 0s 2ms/step - loss: 0.9928 - val_loss: 0.6770\n",
  1646. "Epoch 12/20\n",
  1647. "40/40 [==============================] - 0s 1ms/step - loss: 0.9743 - val_loss: 0.6621\n",
  1648. "Epoch 13/20\n",
  1649. "40/40 [==============================] - 0s 2ms/step - loss: 0.9573 - val_loss: 0.6483\n",
  1650. "Epoch 14/20\n",
  1651. "40/40 [==============================] - 0s 2ms/step - loss: 0.9370 - val_loss: 0.6353\n",
  1652. "Epoch 15/20\n",
  1653. "40/40 [==============================] - 0s 1ms/step - loss: 0.9207 - val_loss: 0.6231\n",
  1654. "Epoch 16/20\n",
  1655. "40/40 [==============================] - 0s 1ms/step - loss: 0.9041 - val_loss: 0.6116\n",
  1656. "Epoch 17/20\n",
  1657. "40/40 [==============================] - 0s 1ms/step - loss: 0.8930 - val_loss: 0.6007\n",
  1658. "Epoch 18/20\n",
  1659. "40/40 [==============================] - 0s 1ms/step - loss: 0.8765 - val_loss: 0.5904\n",
  1660. "Epoch 19/20\n",
  1661. "40/40 [==============================] - 0s 1ms/step - loss: 0.8633 - val_loss: 0.5806\n",
  1662. "Epoch 20/20\n",
  1663. "40/40 [==============================] - 0s 1ms/step - loss: 0.8528 - val_loss: 0.5713\n"
  1664. ]
  1665. },
  1666. {
  1667. "name": "stderr",
  1668. "output_type": "stream",
  1669. "text": [
  1670. "Traceback (most recent call last):\n",
  1671. "  File \"/Users/wangyanghe/Desktop/Research/tods/tods/searcher/brute_force_search.py\", line 62, in _search\n",
  1672. "    for error in pipeline_result.error:\n",
  1673. "TypeError: 'NoneType' object is not iterable\n"
  1674. ]
  1675. },
  1676. {
  1677. "name": "stdout",
  1678. "output_type": "stream",
  1679. "text": [
  1680. "Model: \"sequential_9\"\n",
  1681. "_________________________________________________________________\n",
  1682. "Layer (type)                 Output Shape              Param #   \n",
  1683. "=================================================================\n",
  1684. "dense_44 (Dense)             (None, 12)                156       \n",
  1685. "_________________________________________________________________\n",
  1686. "dropout_37 (Dropout)         (None, 12)                0         \n",
  1687. "_________________________________________________________________\n",
  1688. "dense_45 (Dense)             (None, 12)                156       \n",
  1689. "_________________________________________________________________\n",
  1690. "dropout_38 (Dropout)         (None, 12)                0         \n",
  1691. "_________________________________________________________________\n",
  1692. "dense_46 (Dense)             (None, 1)                 13        \n",
  1693. "_________________________________________________________________\n",
  1694. "dropout_39 (Dropout)         (None, 1)                 0         \n",
  1695. "_________________________________________________________________\n",
  1696. "dense_47 (Dense)             (None, 4)                 8         \n",
  1697. "_________________________________________________________________\n",
  1698. "dropout_40 (Dropout)         (None, 4)                 0         \n",
  1699. "_________________________________________________________________\n",
  1700. "dense_48 (Dense)             (None, 1)                 5         \n",
  1701. "_________________________________________________________________\n",
  1702. "dropout_41 (Dropout)         (None, 1)                 0         \n",
  1703. "_________________________________________________________________\n",
  1704. "dense_49 (Dense)             (None, 12)                24        \n",
  1705. "=================================================================\n",
  1706. "Total params: 362\n",
  1707. "Trainable params: 362\n",
  1708. "Non-trainable params: 0\n",
  1709. "_________________________________________________________________\n",
  1710. "None\n",
  1711. "Epoch 1/20\n",
  1712. "40/40 [==============================] - 0s 4ms/step - loss: 1.5013 - val_loss: 1.5361\n",
  1713. "Epoch 2/20\n",
  1714. "40/40 [==============================] - 0s 1ms/step - loss: 1.3749 - val_loss: 1.4108\n",
  1715. "Epoch 3/20\n",
  1716. "40/40 [==============================] - 0s 1ms/step - loss: 1.2565 - val_loss: 1.3262\n",
  1717. "Epoch 4/20\n",
  1718. "40/40 [==============================] - 0s 1ms/step - loss: 1.1685 - val_loss: 1.2589\n",
  1719. "Epoch 5/20\n",
  1720. "40/40 [==============================] - 0s 1ms/step - loss: 1.1140 - val_loss: 1.2080\n",
  1721. "Epoch 6/20\n",
  1722. "40/40 [==============================] - 0s 1ms/step - loss: 1.0896 - val_loss: 1.1662\n",
  1723. "Epoch 7/20\n",
  1724. "40/40 [==============================] - 0s 2ms/step - loss: 1.0621 - val_loss: 1.1308\n",
  1725. "Epoch 8/20\n",
  1726. "40/40 [==============================] - 0s 2ms/step - loss: 1.0299 - val_loss: 1.0962\n",
  1727. "Epoch 9/20\n",
  1728. "40/40 [==============================] - 0s 2ms/step - loss: 0.9957 - val_loss: 1.0679\n",
  1729. "Epoch 10/20\n",
  1730. "40/40 [==============================] - 0s 1ms/step - loss: 0.9738 - val_loss: 1.0435\n",
  1731. "Epoch 11/20\n",
  1732. "40/40 [==============================] - 0s 1ms/step - loss: 0.9496 - val_loss: 1.0196\n",
  1733. "Epoch 12/20\n",
  1734. "40/40 [==============================] - 0s 1ms/step - loss: 0.9224 - val_loss: 1.0000\n",
  1735. "Epoch 13/20\n",
  1736. "40/40 [==============================] - 0s 1ms/step - loss: 0.9094 - val_loss: 0.9790\n",
  1737. "Epoch 14/20\n",
  1738. "40/40 [==============================] - 0s 1ms/step - loss: 0.8959 - val_loss: 0.9610\n",
  1739. "Epoch 15/20\n",
  1740. "40/40 [==============================] - 0s 1ms/step - loss: 0.8703 - val_loss: 0.9438\n",
  1741. "Epoch 16/20\n",
  1742. "40/40 [==============================] - 0s 2ms/step - loss: 0.8584 - val_loss: 0.9280\n",
  1743. "Epoch 17/20\n",
  1744. "40/40 [==============================] - 0s 2ms/step - loss: 0.8485 - val_loss: 0.9134\n",
  1745. "Epoch 18/20\n",
  1746. "40/40 [==============================] - 0s 2ms/step - loss: 0.8315 - val_loss: 0.8994\n",
  1747. "Epoch 19/20\n",
  1748. "40/40 [==============================] - 0s 2ms/step - loss: 0.8147 - val_loss: 0.8818\n",
  1749. "Epoch 20/20\n",
  1750. "40/40 [==============================] - 0s 2ms/step - loss: 0.8000 - val_loss: 0.8617\n"
  1751. ]
  1752. }
  1753. ],
  1754. "source": [
  1755. "# Find the best pipeline\n",
  1756. "best_runtime, best_pipeline_result = search.search_fit(input_data=[dataset], time_limit=time_limit)\n",
  1757. "best_pipeline = best_runtime.pipeline\n",
  1758. "best_output = best_pipeline_result.output"
  1759. ]
  1760. },
  1761. {
  1762. "cell_type": "code",
  1763. "execution_count": 50,
  1764. "metadata": {},
  1765. "outputs": [
  1766. {
  1767. "name": "stderr",
  1768. "output_type": "stream",
  1769. "text": [
  1770. "Not all provided hyper-parameters for the data preparation pipeline 79ce71bd-db96-494b-a455-14f2e2ac5040 were used: ['method', 'number_of_folds', 'randomSeed', 'shuffle', 'stratified']\n"
  1771. ]
  1772. },
  1773. {
  1774. "name": "stdout",
  1775. "output_type": "stream",
  1776. "text": [
  1777. "Model: \"sequential_10\"\n",
  1778. "_________________________________________________________________\n",
  1779. "Layer (type)                 Output Shape              Param #   \n",
  1780. "=================================================================\n",
  1781. "dense_50 (Dense)             (None, 12)                156       \n",
  1782. "_________________________________________________________________\n",
  1783. "dropout_42 (Dropout)         (None, 12)                0         \n",
  1784. "_________________________________________________________________\n",
  1785. "dense_51 (Dense)             (None, 12)                156       \n",
  1786. "_________________________________________________________________\n",
  1787. "dropout_43 (Dropout)         (None, 12)                0         \n",
  1788. "_________________________________________________________________\n",
  1789. "dense_52 (Dense)             (None, 1)                 13        \n",
  1790. "_________________________________________________________________\n",
  1791. "dropout_44 (Dropout)         (None, 1)                 0         \n",
  1792. "_________________________________________________________________\n",
  1793. "dense_53 (Dense)             (None, 4)                 8         \n",
  1794. "_________________________________________________________________\n",
  1795. "dropout_45 (Dropout)         (None, 4)                 0         \n",
  1796. "_________________________________________________________________\n",
  1797. "dense_54 (Dense)             (None, 1)                 5         \n",
  1798. "_________________________________________________________________\n",
  1799. "dropout_46 (Dropout)         (None, 1)                 0         \n",
  1800. "_________________________________________________________________\n",
  1801. "dense_55 (Dense)             (None, 12)                24        \n",
  1802. "=================================================================\n",
  1803. "Total params: 362\n",
  1804. "Trainable params: 362\n",
  1805. "Non-trainable params: 0\n",
  1806. "_________________________________________________________________\n",
  1807. "None\n",
  1808. "Epoch 1/20\n",
  1809. "40/40 [==============================] - 0s 4ms/step - loss: 1.3728 - val_loss: 2.2095\n",
  1810. "Epoch 2/20\n",
  1811. "40/40 [==============================] - 0s 1ms/step - loss: 1.2507 - val_loss: 2.0598\n",
  1812. "Epoch 3/20\n",
  1813. "40/40 [==============================] - 0s 1ms/step - loss: 1.1434 - val_loss: 1.9599\n",
  1814. "Epoch 4/20\n",
  1815. "40/40 [==============================] - 0s 1ms/step - loss: 1.0993 - val_loss: 1.8894\n",
  1816. "Epoch 5/20\n",
  1817. "40/40 [==============================] - 0s 2ms/step - loss: 1.0486 - val_loss: 1.8368\n",
  1818. "Epoch 6/20\n",
  1819. "40/40 [==============================] - 0s 2ms/step - loss: 1.0115 - val_loss: 1.7958\n",
  1820. "Epoch 7/20\n",
  1821. "40/40 [==============================] - 0s 2ms/step - loss: 0.9806 - val_loss: 1.7621\n",
  1822. "Epoch 8/20\n",
  1823. "40/40 [==============================] - 0s 2ms/step - loss: 0.9481 - val_loss: 1.7337\n",
  1824. "Epoch 9/20\n",
  1825. "40/40 [==============================] - 0s 1ms/step - loss: 0.9264 - val_loss: 1.6992\n",
  1826. "Epoch 10/20\n",
  1827. "40/40 [==============================] - 0s 2ms/step - loss: 0.8929 - val_loss: 1.6732\n",
  1828. "Epoch 11/20\n",
  1829. "40/40 [==============================] - 0s 2ms/step - loss: 0.8834 - val_loss: 1.6493\n",
  1830. "Epoch 12/20\n",
  1831. "40/40 [==============================] - 0s 1ms/step - loss: 0.8608 - val_loss: 1.6288\n",
  1832. "Epoch 13/20\n",
  1833. "40/40 [==============================] - 0s 2ms/step - loss: 0.8382 - val_loss: 1.6080\n",
  1834. "Epoch 14/20\n",
  1835. "40/40 [==============================] - 0s 1ms/step - loss: 0.8256 - val_loss: 1.5866\n",
  1836. "Epoch 15/20\n",
  1837. "40/40 [==============================] - 0s 1ms/step - loss: 0.8124 - val_loss: 1.5684\n",
  1838. "Epoch 16/20\n",
  1839. "40/40 [==============================] - 0s 2ms/step - loss: 0.7965 - val_loss: 1.5524\n",
  1840. "Epoch 17/20\n",
  1841. "40/40 [==============================] - 0s 2ms/step - loss: 0.7840 - val_loss: 1.5353\n",
  1842. "Epoch 18/20\n",
  1843. "40/40 [==============================] - 0s 2ms/step - loss: 0.7678 - val_loss: 1.5211\n",
  1844. "Epoch 19/20\n",
  1845. "40/40 [==============================] - 0s 2ms/step - loss: 0.7594 - val_loss: 1.5052\n",
  1846. "Epoch 20/20\n",
  1847. "40/40 [==============================] - 0s 2ms/step - loss: 0.7455 - val_loss: 1.4914\n"
  1848. ]
  1849. }
  1850. ],
  1851. "source": [
  1852. "# Evaluate the best pipeline\n",
  1853. "best_scores = search.evaluate(best_pipeline).scores"
  1854. ]
  1855. },
  1856. {
  1857. "cell_type": "code",
  1858. "execution_count": 51,
  1859. "metadata": {},
  1860. "outputs": [
  1861. {
  1862. "name": "stdout",
  1863. "output_type": "stream",
  1864. "text": [
  1865. "Search History:\n",
  1866. "----------------------------------------------------\n",
  1867. "Pipeline id: f6665410-4d1d-4695-9f00-5d5f457ef95d\n",
  1868. "     metric     value  normalized  randomSeed  fold\n",
  1869. "0  F1_MACRO  0.708549    0.708549           0     0\n",
  1870. "----------------------------------------------------\n",
  1871. "Pipeline id: 34ae48fe-fb5c-4dbd-940b-00098300cb9f\n",
  1872. "     metric     value  normalized  randomSeed  fold\n",
  1873. "0  F1_MACRO  0.616695    0.616695           0     0\n",
  1874. "----------------------------------------------------\n",
  1875. "Pipeline id: fc287cdb-2958-4117-8e20-ba7645caa23c\n",
  1876. "     metric    value  normalized  randomSeed  fold\n",
  1877. "0  F1_MACRO  0.55474     0.55474           0     0\n",
  1878. "----------------------------------------------------\n",
  1879. "Pipeline id: e510c088-369b-4b04-8b25-a320b4a86530\n",
  1880. "     metric     value  normalized  randomSeed  fold\n",
  1881. "0  F1_MACRO  0.531302    0.531302           0     0\n",
  1882. "----------------------------------------------------\n",
  1883. "Pipeline id: b42e188a-ea92-4dc0-b7d3-8983b0e659e9\n",
  1884. "     metric     value  normalized  randomSeed  fold\n",
  1885. "0  F1_MACRO  0.509059    0.509059           0     0\n",
  1886. "----------------------------------------------------\n",
  1887. "Pipeline id: 5e641e81-9e0e-46f3-b487-c37ccd1b9573\n",
  1888. "     metric     value  normalized  randomSeed  fold\n",
  1889. "0  F1_MACRO  0.483604    0.483604           0     0\n"
  1890. ]
  1891. }
  1892. ],
  1893. "source": [
  1894. "print('Search History:')\n",
  1895. "for pipeline_result in search.history:\n",
  1896. "    print('-' * 52)\n",
  1897. "    print('Pipeline id:', pipeline_result.pipeline.id)\n",
  1898. "    print(pipeline_result.scores)"
  1899. ]
  1900. },
  1901. {
  1902. "cell_type": "code",
  1903. "execution_count": 52,
  1904. "metadata": {},
  1905. "outputs": [
  1906. {
  1907. "name": "stdout",
  1908. "output_type": "stream",
  1909. "text": [
  1910. "Best pipeline:\n",
  1911. "----------------------------------------------------\n",
  1912. "Pipeline id: f6665410-4d1d-4695-9f00-5d5f457ef95d\n",
  1913. "Pipeline json: {\"id\": \"f6665410-4d1d-4695-9f00-5d5f457ef95d\", \"schema\": \"https://metadata.datadrivendiscovery.org/schemas/v0/pipeline.json\", \"created\": \"2021-06-29T04:06:53.685353Z\", \"inputs\": [{\"name\": \"inputs\"}], \"outputs\": [{\"data\": \"steps.7.produce\", \"name\": \"output predictions\"}], \"steps\": [{\"type\": \"PRIMITIVE\", \"primitive\": {\"id\": \"c78138d9-9377-31dc-aee8-83d9df049c60\", \"version\": \"0.3.0\", \"python_path\": \"d3m.primitives.tods.data_processing.dataset_to_dataframe\", \"name\": \"Extract a DataFrame from a Dataset\"}, \"arguments\": {\"inputs\": {\"type\": \"CONTAINER\", \"data\": \"inputs.0\"}}, \"outputs\": [{\"id\": \"produce\"}]}, {\"type\": \"PRIMITIVE\", \"primitive\": {\"id\": \"81235c29-aeb9-3828-911a-1b25319b6998\", \"version\": \"0.6.0\", \"python_path\": \"d3m.primitives.tods.data_processing.column_parser\", \"name\": \"Parses strings into their types\"}, \"arguments\": {\"inputs\": {\"type\": \"CONTAINER\", \"data\": \"steps.0.produce\"}}, \"outputs\": [{\"id\": \"produce\"}]}, {\"type\": \"PRIMITIVE\", \"primitive\": {\"id\": \"a996cd89-ddf0-367f-8e7f-8c013cbc2891\", \"version\": \"0.4.0\", \"python_path\": \"d3m.primitives.tods.data_processing.extract_columns_by_semantic_types\", \"name\": \"Extracts columns by semantic type\"}, \"arguments\": {\"inputs\": {\"type\": \"CONTAINER\", \"data\": \"steps.1.produce\"}}, \"outputs\": [{\"id\": \"produce\"}], \"hyperparams\": {\"semantic_types\": {\"type\": \"VALUE\", \"data\": [\"https://metadata.datadrivendiscovery.org/types/Attribute\"]}}}, {\"type\": \"PRIMITIVE\", \"primitive\": {\"id\": \"a996cd89-ddf0-367f-8e7f-8c013cbc2891\", \"version\": \"0.4.0\", \"python_path\": \"d3m.primitives.tods.data_processing.extract_columns_by_semantic_types\", \"name\": \"Extracts columns by semantic type\"}, \"arguments\": {\"inputs\": {\"type\": \"CONTAINER\", \"data\": \"steps.0.produce\"}}, \"outputs\": [{\"id\": \"produce\"}], \"hyperparams\": 
{\"semantic_types\": {\"type\": \"VALUE\", \"data\": [\"https://metadata.datadrivendiscovery.org/types/TrueTarget\"]}}}, {\"type\": \"PRIMITIVE\", \"primitive\": {\"id\": \"642de2e7-5590-3cab-9266-2a53c326c461\", \"version\": \"0.0.1\", \"python_path\": \"d3m.primitives.tods.timeseries_processing.transformation.axiswise_scaler\", \"name\": \"Axis_wise_scale\"}, \"arguments\": {\"inputs\": {\"type\": \"CONTAINER\", \"data\": \"steps.2.produce\"}}, \"outputs\": [{\"id\": \"produce\"}]}, {\"type\": \"PRIMITIVE\", \"primitive\": {\"id\": \"30bc7cec-2ccc-34bc-9df8-2095bf3b1ae2\", \"version\": \"0.1.0\", \"python_path\": \"d3m.primitives.tods.feature_analysis.statistical_mean\", \"name\": \"Time Series Decompostional\"}, \"arguments\": {\"inputs\": {\"type\": \"CONTAINER\", \"data\": \"steps.4.produce\"}}, \"outputs\": [{\"id\": \"produce\"}]}, {\"type\": \"PRIMITIVE\", \"primitive\": {\"id\": \"67e7fcdf-d645-3417-9aa4-85cd369487d9\", \"version\": \"0.0.1\", \"python_path\": \"d3m.primitives.tods.detection_algorithm.pyod_ae\", \"name\": \"TODS.anomaly_detection_primitives.AutoEncoder\"}, \"arguments\": {\"inputs\": {\"type\": \"CONTAINER\", \"data\": \"steps.5.produce\"}}, \"outputs\": [{\"id\": \"produce\"}], \"hyperparams\": {\"contamination\": {\"type\": \"VALUE\", \"data\": 0.01}}}, {\"type\": \"PRIMITIVE\", \"primitive\": {\"id\": \"2530840a-07d4-3874-b7d8-9eb5e4ae2bf3\", \"version\": \"0.3.0\", \"python_path\": \"d3m.primitives.tods.data_processing.construct_predictions\", \"name\": \"Construct pipeline predictions output\"}, \"arguments\": {\"inputs\": {\"type\": \"CONTAINER\", \"data\": \"steps.6.produce\"}, \"reference\": {\"type\": \"CONTAINER\", \"data\": \"steps.1.produce\"}}, \"outputs\": [{\"id\": \"produce\"}]}], \"digest\": \"af9c67055b7d50ecf3dce829af31051781b757f648d138836d827b6dfb699e8e\"}\n",
  1914. "Output:\n",
  1915. "      d3mIndex  anomaly\n",
  1916. "0            0        0\n",
  1917. "1            1        0\n",
  1918. "2            2        0\n",
  1919. "3            3        0\n",
  1920. "4            4        0\n",
  1921. "...        ...      ...\n",
  1922. "1395      1395        0\n",
  1923. "1396      1396        0\n",
  1924. "1397      1397        1\n",
  1925. "1398      1398        1\n",
  1926. "1399      1399        0\n",
  1927. "\n",
  1928. "[1400 rows x 2 columns]\n",
  1929. "Scores:\n",
  1930. "     metric     value  normalized  randomSeed  fold\n",
  1931. "0  F1_MACRO  0.708549    0.708549           0     0\n"
  1932. ]
  1933. }
  1934. ],
  1935. "source": [
  1936. "print('Best pipeline:')\n",
  1937. "print('-' * 52)\n",
  1938. "print('Pipeline id:', best_pipeline.id)\n",
  1939. "print('Pipeline json:', best_pipeline.to_json())\n",
  1940. "print('Output:')\n",
  1941. "print(best_output)\n",
  1942. "print('Scores:')\n",
  1943. "print(best_scores)"
  1944. ]
  1945. },
  1946. {
  1947. "cell_type": "code",
  1948. "execution_count": null,
  1949. "metadata": {},
  1950. "outputs": [],
  1951. "source": []
  1952. }
  1953. ],
  1954. "metadata": {
  1955. "kernelspec": {
  1956. "display_name": "Python 3",
  1957. "language": "python",
  1958. "name": "python3"
  1959. },
  1960. "language_info": {
  1961. "codemirror_mode": {
  1962. "name": "ipython",
  1963. "version": 3
  1964. },
  1965. "file_extension": ".py",
  1966. "mimetype": "text/x-python",
  1967. "name": "python",
  1968. "nbconvert_exporter": "python",
  1969. "pygments_lexer": "ipython3",
  1970. "version": "3.6.10"
  1971. }
  1972. },
  1973. "nbformat": 4,
  1974. "nbformat_minor": 4
  1975. }
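
The stderr output in the search cell shows the brute-force search raising `TypeError: 'NoneType' object is not iterable` while looping over `pipeline_result.error`. The following is a minimal sketch of one way to tolerate a missing error list and then rank results by F1_MACRO. It assumes a simplified record type: `RunRecord`, `collect_errors`, and `best_record` are hypothetical names used for illustration and are not part of TODS. The first two IDs and scores are copied from the search history printed in the notebook; the failed record is invented for the example.

```python
from typing import List, NamedTuple, Optional


class RunRecord(NamedTuple):
    """Hypothetical stand-in for one search result (not the TODS class)."""
    pipeline_id: str
    f1_macro: float
    error: Optional[List[str]] = None  # None when no error was recorded


def collect_errors(records):
    """Gather error messages, treating a missing (None) error list as empty."""
    messages = []
    for record in records:
        # `or []` avoids iterating over None, the cause of the TypeError above.
        for error in record.error or []:
            messages.append(f"{record.pipeline_id}: {error}")
    return messages


def best_record(records):
    """Return the record with the highest F1_MACRO value."""
    return max(records, key=lambda record: record.f1_macro)


history = [
    # IDs and scores taken from the notebook's printed search history.
    RunRecord("f6665410-4d1d-4695-9f00-5d5f457ef95d", 0.708549),
    RunRecord("34ae48fe-fb5c-4dbd-940b-00098300cb9f", 0.616695),
    # Invented record, only to exercise the error path.
    RunRecord("hypothetical-failed-run", 0.0, error=["example failure"]),
]

print(collect_errors(history))
print(best_record(history).pipeline_id)
```

Whether a missing error list should be skipped silently or logged is a design choice; the sketch skips it so that one incomplete result does not interrupt the rest of the search.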

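A full-stack automated machine learning system for outlier detection on multivariate time-series data. TODS provides exhaustive modules for building machine learning-based outlier detection systems, including data processing, time series processing, feature analysis, detection algorithms, and a reinforcement module. The functionality these modules provide includes general-purpose data preprocessing, smoothing or transformation of time-series data, feature extraction from the time or frequency domain, a variety of detection algorithms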