# Regression test
* [How to run](#how-to-run)
* [Correctness](#correctness)
* [Performance](#performance)
* [Debug tools](#debug-tools)
* [To do list](#to-do-list)

## How to run
1. Run the correctness regression test:
   ```
   rlaunch --cpu=4 --memory=15000 --gpu=1 -- python3 verify_correctness.py
   ```
2. Run the performance regression test:
   ```
   rlaunch --cpu=4 --memory=15000 --gpu=1 -- python3 run_resnet50_perf.py
   ```
   Compare with the [reference result](#performance) to verify the performance change.
3. [Temporary] Run the dynamic graph test:
   ```
   cd python_module/megengine/examples/cifar10/resnet_example
   rlaunch --cpu=4 --memory=15000 --gpu=1 -- MGE_DISABLE_TRACE=1 python3 main.py --mode train --backend megengine-dynamic
   ```
   Be sure to run a few epochs to verify that CPU/GPU memory usage stays reasonable and that the loss tends to converge. The complete run takes around 2 hours.
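"Tends to converge" can also be checked mechanically by comparing the loss at the start and end of the run. A minimal sketch of such a heuristic (the function name and the loss values are illustrative, not part of the actual test scripts):

```python
def is_converging(losses, window=3):
    """Heuristic convergence check: the mean loss over the last `window`
    epochs should be lower than over the first `window` epochs."""
    assert len(losses) >= 2 * window, "need at least two full windows"
    head = sum(losses[:window]) / window
    tail = sum(losses[-window:]) / window
    return tail < head

# Illustrative per-epoch training losses, not real run data.
epoch_losses = [2.30, 1.95, 1.70, 1.52, 1.41, 1.33]
print(is_converging(epoch_losses))  # -> True
```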
## Correctness

A ResNet-18 model pre-trained on the CIFAR-10 dataset is used.
The test set contains:
* forward run with static graph
* forward run with dynamic graph
* forward + backward + parameter update with static graph
* forward + backward + parameter update with dynamic graph

Sample output:
```
Running fwd static ...
Success
Running fwd dynamic ...
Success
Running train static ...
Success
Running train dynamic ...
Failed!!!
import megengine operator
[INFO] load /home/zhangfan/.local/lib/python3.6/site-packages/megengine/examples/cifar10/resnet_example/checkpoint/pytorch_init.pth done
calculated loss: [2.3731833, 34.4626]
expect: [ 2.3731833 34.460594 ]
```
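The pass/fail decision above comes down to comparing the calculated losses against stored expected values within a numerical tolerance. A minimal sketch of such a check (function name and tolerances are assumptions, not the actual test code):

```python
import numpy as np

def check_loss(calculated, expected, rtol=1e-5, atol=1e-6):
    """True when every calculated loss matches the stored expected
    value within the given relative/absolute tolerance."""
    return bool(np.allclose(calculated, expected, rtol=rtol, atol=atol))

# Values from the sample output above: the first entry matches,
# the second differs by ~2e-3 and fails a tight tolerance.
print(check_loss([2.3731833, 34.4626], [2.3731833, 34.460594]))  # -> False
```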
## Performance

Test cases run ResNet-50 training with batch size = 64.
Run `python3 resnet50_perf.py --help` for valid options.
Example script:
* Run `python3 run_resnet50_perf.py`
* You may want to submit the job to a remote server with `rlaunch --cpu=16 --memory=100384 --gpu=8 -- python3 run_resnet50_perf.py`
* Sample output:
```
**************************************
Run ResNet 50 performance test with batch size = 64
**************************************
Run static graph with default opt level
Finish with GPU Usage 6710MiB
Wall time per iter 283 ms
Run status: finished
**************************************
Run static graph with conv fastrun
Finish with GPU Usage 6540MiB
Wall time per iter 265 ms
Run status: finished
**************************************
Run static graph with conv fastrun and JIT
Finish with GPU Usage 6540MiB
Wall time per iter 267 ms
Run status: finished
**************************************
Run static graph with JIT, conv fastrun and without running step
Finish with GPU Usage 6540MiB
Wall time per iter 223 ms
Run status: finished
**************************************
```
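The "Wall time per iter" figures above follow the usual pattern of timing a steady-state loop after a few warmup iterations. A sketch of that measurement, with a sleep standing in for a training step (the helper name is an assumption, not the perf script's API):

```python
import time

def wall_time_per_iter(step, n_iters=10, warmup=2):
    """Average wall-clock seconds per call to `step`, skipping warmup
    iterations (the first iterations pay graph-compilation cost)."""
    for _ in range(warmup):
        step()
    start = time.perf_counter()
    for _ in range(n_iters):
        step()
    return (time.perf_counter() - start) / n_iters

# A sleep stands in for one training iteration here.
t = wall_time_per_iter(lambda: time.sleep(0.001), n_iters=5)
print(f"Wall time per iter {t * 1000:.0f} ms")
```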
## Debug tools

You can pass `--run-debug-tool` to the `run_resnet50_perf.py` script to invoke opr-level profiling and valgrind.
### How much overhead does the profiler add?
Compare the same job with and without the profiler. The timing statistics reported by the profiler do not include the profiler's own overhead.
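The with/without comparison can be scripted generically. The sketch below uses Python's built-in `cProfile` as a stand-in profiler to show the idea; it is not the MegEngine profiler:

```python
import cProfile
import time

def workload():
    # A stand-in for one profiled job.
    return sum(i * i for i in range(200_000))

def timed(fn):
    start = time.perf_counter()
    fn()
    return time.perf_counter() - start

plain = timed(workload)                               # without profiler
profiler = cProfile.Profile()
profiled = timed(lambda: profiler.runcall(workload))  # under profiler

print(f"without profiler: {plain:.4f} s, with profiler: {profiled:.4f} s")
```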
### How can I get more information from the profiler?
Refer to the main function in `megengine.utils.profile_analyze`.
### How can I profile main memory usage?
The Valgrind massif tool is used. The script also prints a memory usage summary on screen:
```
GB
1.836^ #
| @@#::::::@:::
| @@@ #::::::@:::
| ::::::::::::@:::::::::@:@@@ #::::::@:::
| ::::: :::::: @ ::: ::: @:@@@ #::::::@:::
| @@::::: :::::: @ ::: ::: @:@@@ #::::::@:::
| ::@@::::: :::::: @ ::: ::: @:@@@ #::::::@:::
| @:: @@::::: :::::: @ ::: ::: @:@@@ #::::::@:::
| @@@:: @@::::: :::::: @ ::: ::: @:@@@ #::::::@:::
| :@@@:: @@::::: :::::: @ ::: ::: @:@@@ #::::::@:::
| @::@@@:: @@::::: :::::: @ ::: ::: @:@@@ #::::::@:::
| @@::@@@:: @@::::: :::::: @ ::: ::: @:@@@ #::::::@:::
| @:@@::@@@:: @@::::: :::::: @ ::: ::: @:@@@ #::::::@:::
| :@ @@::@@@:: @@::::: :::::: @ ::: ::: @:@@@ #::::::@:::
| ::::@ @@::@@@:: @@::::: :::::: @ ::: ::: @:@@@ #::::::@:::
| :::: :@ @@::@@@:: @@::::: :::::: @ ::: ::: @:@@@ #::::::@:::
| :@: :: :@ @@::@@@:: @@::::: :::::: @ ::: ::: @:@@@ #::::::@:::
| :@@: :: :@ @@::@@@:: @@::::: :::::: @ ::: ::: @:@@@ #::::::@:::
| @@:@@: :: :@ @@::@@@:: @@::::: :::::: @ ::: ::: @:@@@ #::::::@:::
| @@ :@@: :: :@ @@::@@@:: @@::::: :::::: @ ::: ::: @:@@@ #::::::@:::
0 +----------------------------------------------------------------------->Gi
0 19.39
```
You can change the `--run-iter` value to adjust the number of iterations to profile.
The detailed profiling output is written to `massif.out.ms_print`.
### How can I understand the profiler result?
The dumped profiling file `prof.json` can be analyzed with [megengine/utils/profile_analyze.py](../../utils/profile_analyze.py).
The profiler prints the following information:
```
----------------- --------
total device time 0.318062
total host time 0.275643
----------------- --------
╒════════════════════╤══════════════╤═══════════════════════════╤═══════════════╤═════════╤══════════╤═════════════╤═════════════╤══════════════╕
│ device self time │ cumulative │ operator info │ computation │ FLOPS │ memory │ bandwidth │ in_shapes │ out_shapes │
╞════════════════════╪══════════════╪═══════════════════════════╪═══════════════╪═════════╪══════════╪═════════════╪═════════════╪══════════════╡
│ #0 │ 0.114 │ Elemwise │ 6.53 │ 57.40 │ 51.63 │ 454.02 │ None │ None │
│ 0.114 │ 35.8% │ 1481 │ GFLO │ GFLOPS │ GiB │ GiB/s │ │ │
│ 35.8% │ │ N/A │ │ │ │ │ │ │
├────────────────────┼──────────────┼───────────────────────────┼───────────────┼─────────┼──────────┼─────────────┼─────────────┼──────────────┤
│ #1 │ 0.176 │ ConvolutionBackwardFilter │ 523.15 │ 8.35 │ 5.28 │ 84.24 │ None │ None │
│ 0.0627 │ 55.5% │ 53 │ GFLO │ TFLOPS │ GiB │ GiB/s │ │ │
│ 19.7% │ │ N/A │ │ │ │ │ │ │
├────────────────────┼──────────────┼───────────────────────────┼───────────────┼─────────┼──────────┼─────────────┼─────────────┼──────────────┤
│ #2 │ 0.221 │ ConvolutionBackwardData │ 508.05 │ 11.31 │ 5.05 │ 112.42 │ None │ None │
│ 0.0449 │ 69.6% │ 52 │ GFLO │ TFLOPS │ GiB │ GiB/s │ │ │
│ 14.1% │ │ N/A │ │ │ │ │ │ │
├────────────────────┼──────────────┼───────────────────────────┼───────────────┼─────────┼──────────┼─────────────┼─────────────┼──────────────┤
```
See [megengine/utils/profile_analyze.py](../../utils/profile_analyze.py) for more usage examples.
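The "cumulative" column in the table above is the running sum of per-operator self device time expressed as a share of the total. A sketch of how such a summary could be derived (the data layout here is an assumption, not the real `prof.json` schema):

```python
def summarize(op_times):
    """Sort (name, self_device_time) pairs by time, descending, and
    attach each operator's cumulative share of the listed total,
    mirroring the 'cumulative' column in the table above."""
    total = sum(t for _, t in op_times)
    rows, running = [], 0.0
    for name, t in sorted(op_times, key=lambda item: -item[1]):
        running += t
        rows.append((name, t, running / total))
    return rows

# Self device times loosely based on the sample table above.
ops = [("ConvolutionBackwardData", 0.0449),
       ("Elemwise", 0.114),
       ("ConvolutionBackwardFilter", 0.0627)]
for name, t, cum in summarize(ops):
    print(f"{name:<27}{t:.4f}  {cum:6.1%}")
```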
## To do list
* Change numerical tolerance after XPU-280 is done
* Add scripts to facilitate log analysis
* Profile GPU memory
* Integrate with the QA system
* Add more regression tests
