# Regression test

* [How to run](#how-to-run)
* [Correctness](#correctness)
* [Performance](#performance)
* [Debug tools](#debug-tools)
* [To do list](#to-do-list)

## How to run

1. Run the correctness regression test:

   ```
   rlaunch --cpu=4 --memory=15000 --gpu=1 -- python3 verify_correctness.py
   ```

2. Run the performance regression test:

   ```
   rlaunch --cpu=4 --memory=15000 --gpu=1 -- python3 run_resnet50_perf.py
   ```

   Compare with the [reference result](#performance) to verify the performance change.

3. [Temporary] Run the dynamic graph test:

   ```
   cd python_module/megengine/examples/cifar10/resnet_example
   rlaunch --cpu=4 --memory=15000 --gpu=1 -- MGE_DISABLE_TRACE=1 python3 main.py --mode train --backend megengine-dynamic
   ```

   Run at least a few epochs to verify that CPU/GPU memory usage stays stable and that the loss tends to converge. A complete run takes around 2 hours.

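When comparing a measured wall time against the reference numbers in the Performance section, a relative tolerance avoids flagging normal run-to-run jitter as a regression. A minimal sketch; the helper name and the 5% threshold are illustrative, not part of the test scripts:

```python
def within_tolerance(measured_ms, reference_ms, rel_tol=0.05):
    """Return True if the measured wall time is acceptable.

    Only slowdowns beyond rel_tol count as regressions; speedups
    always pass. The 5% threshold is an illustrative choice.
    """
    return measured_ms <= reference_ms * (1.0 + rel_tol)

# Against the 283 ms/iter reference (static graph, default opt level):
print(within_tolerance(290, 283))  # True: within 5% jitter
print(within_tolerance(310, 283))  # False: >5% slower, flag it
```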
## Correctness

A pre-trained ResNet-18 model on the CIFAR-10 dataset is used.

The test suite covers:
* forward run with static graph
* forward run with dynamic graph
* forward + backward + parameter update with static graph
* forward + backward + parameter update with dynamic graph

Sample output:

```
Running fwd static ...
Success
Running fwd dynamic ...
Success
Running train static ...
Success
Running train dynamic ...
Failed!!!
import megengine operator
[INFO] load /home/zhangfan/.local/lib/python3.6/site-packages/megengine/examples/cifar10/resnet_example/checkpoint/pytorch_init.pth done
calculated loss: [2.3731833, 34.4626]
expect: [ 2.3731833 34.460594 ]
```

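The failure above is a numerical-tolerance check: the calculated loss is compared elementwise against stored expectations. A minimal sketch of such a comparison; the tolerance values here are illustrative, the real thresholds live in `verify_correctness.py`:

```python
import math

def check_loss(calculated, expected, rel_tol=1e-5, abs_tol=1e-6):
    """Elementwise comparison of loss values against expectations.

    The tolerances are illustrative placeholders, not the ones used
    by verify_correctness.py.
    """
    return all(
        math.isclose(c, e, rel_tol=rel_tol, abs_tol=abs_tol)
        for c, e in zip(calculated, expected)
    )

# Values from the sample output: the second element is off by ~2e-3,
# well outside tolerance, hence the "Failed!!!" verdict.
print(check_loss([2.3731833, 34.4626], [2.3731833, 34.460594]))  # False
```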
## Performance

The test cases run ResNet-50 training with batch size 64.

Run `python3 resnet50_perf.py --help` to list the valid options.

Example:

* Run `python3 run_resnet50_perf.py`
* You may want to submit the job to a remote server: `rlaunch --cpu=16 --memory=100384 --gpu=8 -- python3 run_resnet50_perf.py`
* Sample output:
  ```
  **************************************
  Run ResNet 50 performance test with batch size = 64
  **************************************
  Run static graph with default opt level
  Finish with GPU Usage 6710MiB
  Wall time per iter 283 ms
  Run status: finished
  **************************************
  Run static graph with conv fastrun
  Finish with GPU Usage 6540MiB
  Wall time per iter 265 ms
  Run status: finished
  **************************************
  Run static graph with conv fastrun and JIT
  Finish with GPU Usage 6540MiB
  Wall time per iter 267 ms
  Run status: finished
  **************************************
  Run static graph with JIT, conv fastrun and without running step
  Finish with GPU Usage 6540MiB
  Wall time per iter 223 ms
  Run status: finished
  **************************************
  ```

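A "wall time per iter" figure is typically an average over repeated iterations, with warm-up iterations excluded so that one-off costs (graph compilation, conv algorithm selection by fastrun) do not skew the result. A sketch of such a measurement loop, assuming `step` stands in for one training iteration of the real script:

```python
import time

def time_per_iter(step, n_warmup=10, n_iters=50):
    """Average wall time per iteration in milliseconds.

    Warm-up iterations run first and are excluded from timing so
    one-off startup costs do not distort the average. `step` is a
    stand-in for one training iteration.
    """
    for _ in range(n_warmup):
        step()
    start = time.perf_counter()
    for _ in range(n_iters):
        step()
    return (time.perf_counter() - start) / n_iters * 1000.0

# Usage with a dummy workload:
ms = time_per_iter(lambda: sum(range(10000)), n_warmup=2, n_iters=10)
print(f"{ms:.3f} ms/iter")
```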
## Debug tools

Pass `--run-debug-tool` to `run_resnet50_perf.py` to enable operator-level profiling and invoke Valgrind.

### How much overhead does the profiler add?

Compare the same job run with and without the profiler. The timing statistics reported by the profiler do not include the profiler's own overhead.

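The with/without comparison reduces to a single relative figure. A small sketch; the timings below are made-up examples, not measurements:

```python
def profiler_overhead_pct(with_profiler_ms, plain_ms):
    """Relative wall-clock overhead of the profiler, in percent.

    Both arguments are wall time per iteration in milliseconds;
    the example numbers are invented for illustration.
    """
    return (with_profiler_ms - plain_ms) / plain_ms * 100.0

print(round(profiler_overhead_pct(310.0, 283.0), 1))  # 9.5
```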
### How can I get more information from the profiler?

Refer to the main function in `megengine.utils.profile_analyze`.

### How can I profile main memory usage?

The Valgrind massif tool is used. The script also prints a memory usage summary on screen:

```

    GB
1.836^                                                            #
     |                                                          @@#::::::@:::
     |                                                        @@@ #::::::@:::
     |                                ::::::::::::@:::::::::@:@@@ #::::::@:::
     |                               ::::: :::::: @ ::: ::: @:@@@ #::::::@:::
     |                             @@::::: :::::: @ ::: ::: @:@@@ #::::::@:::
     |                           ::@@::::: :::::: @ ::: ::: @:@@@ #::::::@:::
     |                         @:: @@::::: :::::: @ ::: ::: @:@@@ #::::::@:::
     |                       @@@:: @@::::: :::::: @ ::: ::: @:@@@ #::::::@:::
     |                      :@@@:: @@::::: :::::: @ ::: ::: @:@@@ #::::::@:::
     |                    @::@@@:: @@::::: :::::: @ ::: ::: @:@@@ #::::::@:::
     |                   @@::@@@:: @@::::: :::::: @ ::: ::: @:@@@ #::::::@:::
     |                 @:@@::@@@:: @@::::: :::::: @ ::: ::: @:@@@ #::::::@:::
     |                :@ @@::@@@:: @@::::: :::::: @ ::: ::: @:@@@ #::::::@:::
     |             ::::@ @@::@@@:: @@::::: :::::: @ ::: ::: @:@@@ #::::::@:::
     |           :::: :@ @@::@@@:: @@::::: :::::: @ ::: ::: @:@@@ #::::::@:::
     |         :@: :: :@ @@::@@@:: @@::::: :::::: @ ::: ::: @:@@@ #::::::@:::
     |        :@@: :: :@ @@::@@@:: @@::::: :::::: @ ::: ::: @:@@@ #::::::@:::
     |      @@:@@: :: :@ @@::@@@:: @@::::: :::::: @ ::: ::: @:@@@ #::::::@:::
     |    @@ :@@: :: :@ @@::@@@:: @@::::: :::::: @ ::: ::: @:@@@ #::::::@:::
   0 +---------------------------------------------------------------------->Gi
     0                                                                  19.39

```
Adjust the `--run-iter` value to change the number of iterations profiled.
The detailed profiling result is written to `massif.out.ms_print`.

### How can I understand the profiler result?

The dumped profiling file `prof.json` can be analyzed with [megengine/utils/profile_analyze.py](../../utils/profile_analyze.py).
The profiler prints information such as:

```
-----------------  --------
total device time  0.318062
total host time    0.275643
-----------------  --------

╒══════════════════╤════════════╤═══════════════════════════╤═════════════╤════════╤════════╤═══════════╤═══════════╤════════════╕
│ device self time │ cumulative │ operator info             │ computation │ FLOPS  │ memory │ bandwidth │ in_shapes │ out_shapes │
╞══════════════════╪════════════╪═══════════════════════════╪═════════════╪════════╪════════╪═══════════╪═══════════╪════════════╡
│ #0               │ 0.114      │ Elemwise                  │ 6.53        │ 57.40  │ 51.63  │ 454.02    │ None      │ None       │
│ 0.114            │ 35.8%      │ 1481                      │ GFLO        │ GFLOPS │ GiB    │ GiB/s     │           │            │
│ 35.8%            │            │ N/A                       │             │        │        │           │           │            │
├──────────────────┼────────────┼───────────────────────────┼─────────────┼────────┼────────┼───────────┼───────────┼────────────┤
│ #1               │ 0.176      │ ConvolutionBackwardFilter │ 523.15      │ 8.35   │ 5.28   │ 84.24     │ None      │ None       │
│ 0.0627           │ 55.5%      │ 53                        │ GFLO        │ TFLOPS │ GiB    │ GiB/s     │           │            │
│ 19.7%            │            │ N/A                       │             │        │        │           │           │            │
├──────────────────┼────────────┼───────────────────────────┼─────────────┼────────┼────────┼───────────┼───────────┼────────────┤
│ #2               │ 0.221      │ ConvolutionBackwardData   │ 508.05      │ 11.31  │ 5.05   │ 112.42    │ None      │ None       │
│ 0.0449           │ 69.6%      │ 52                        │ GFLO        │ TFLOPS │ GiB    │ GiB/s     │           │            │
│ 14.1%            │            │ N/A                       │             │        │        │           │           │            │
├──────────────────┼────────────┼───────────────────────────┼─────────────┼────────┼────────┼───────────┼───────────┼────────────┤
```
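Conceptually, `prof.json` holds per-operator timing records that get aggregated into a table like the one above. A rough sketch of such an aggregation; the field names used here (`opr_type`, `device_time`) are assumptions, so consult `profile_analyze.py` for the real schema:

```python
from collections import defaultdict

def device_time_by_opr(records):
    """Total device self time per operator type, largest first.

    `records` would come from json.load(open("prof.json")); the
    field names "opr_type" and "device_time" are assumed here and
    may differ from the actual prof.json schema.
    """
    totals = defaultdict(float)
    for rec in records:
        totals[rec["opr_type"]] += rec["device_time"]
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)

# Hypothetical records mirroring the table above:
sample = [
    {"opr_type": "Elemwise", "device_time": 0.114},
    {"opr_type": "ConvolutionBackwardFilter", "device_time": 0.0627},
    {"opr_type": "Elemwise", "device_time": 0.010},
]
print(device_time_by_opr(sample))
```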
Please read [megengine/utils/profile_analyze.py](../../utils/profile_analyze.py) for more usage details.

## To do list

* Change the numerical tolerance after XPU-280 is done
* Add scripts to facilitate log analysis
* Profile GPU memory
* Integrate with the QA system
* Add more regression tests