# Regression test

* [How to run](#how-to-run)
* [Correctness](#correctness)
* [Performance](#performance)
* [Debug tools](#debug-tools)
* [To do list](#to-do-list)

## How to run

1. Run the correctness regression test:

    ```
    rlaunch --cpu=4 --memory=15000 --gpu=1 -- python3 verify_correctness.py
    ```

2. Run the performance regression test:

    ```
    rlaunch --cpu=4 --memory=15000 --gpu=1 -- python3 run_resnet50_perf.py
    ```

    Compare the output with the [reference result](#performance) to check for performance changes.

3. [Temporary] Run the dynamic graph test:

    ```
    cd python_module/megengine/examples/cifar10/resnet_example
    rlaunch --cpu=4 --memory=15000 --gpu=1 -- MGE_DISABLE_TRACE=1 python3 main.py --mode train --backend megengine-dynamic
    ```

    Run at least a few epochs to check the CPU/GPU memory usage and to confirm that the result tends to converge. A complete run takes around 2 hours.

## Correctness

A ResNet18 model pre-trained on the CIFAR-10 dataset is used. The test set contains:

* forward run with static graph
* forward run with dynamic graph
* forward + backward + parameter update with static graph
* forward + backward + parameter update with dynamic graph

Sample output:

```
Running fwd static ... Success
Running fwd dynamic ... Success
Running train static ... Success
Running train dynamic ... Failed!!!
import megengine operator
[INFO] load /home/zhangfan/.local/lib/python3.6/site-packages/megengine/examples/cifar10/resnet_example/checkpoint/pytorch_init.pth done
calculated loss: [2.3731833, 34.4626]
expect: [ 2.3731833 34.460594 ]
```

## Performance

The test cases run ResNet50 training with batch size = 64. Run `python3 resnet50_perf.py --help` for valid options.

Example script:

* Run `python3 run_resnet50_perf.py`
* You may want to submit the job to a remote server with `rlaunch --cpu=16 --memory=100384 --gpu=8 -- python3 run_resnet50_perf.py`
* Sample output

    ```
    **************************************
    Run ResNet 50 performance test with batch size = 64
    **************************************
    Run static graph with default opt level
    Finish with GPU Usage 6710MiB
    Wall time per iter 283 ms
    Run status: finished
    **************************************
    Run static graph with conv fastrun
    Finish with GPU Usage 6540MiB
    Wall time per iter 265 ms
    Run status: finished
    **************************************
    Run static graph with conv fastrun and JIT
    Finish with GPU Usage 6540MiB
    Wall time per iter 267 ms
    Run status: finished
    **************************************
    Run static graph with JIT, conv fastrun and without running step
    Finish with GPU Usage 6540MiB
    Wall time per iter 223 ms
    Run status: finished
    **************************************
    ```

## Debug tools

You can pass `--run-debug-tool` to the script `run_resnet50_perf.py`. This enables operator-level profiling and runs the job under valgrind.

### How much overhead does the profiler add?

Compare the same job with and without the profiler. The timing statistics reported by the profiler do not include the profiler's own overhead.

### How can I get more information from the profiler?

Refer to the main function in `megengine.utils.profile_analyze`.

### How can I profile main memory usage?

The Valgrind massif tool can be used.
The script also prints a memory usage summary on screen:

```
    GB
1.836^                                                             #
     |                                                           @@#::::::@:::
     |                                                         @@@ #::::::@:::
     |                                 ::::::::::::@:::::::::@:@@@ #::::::@:::
     |                                ::::: :::::: @ ::: ::: @:@@@ #::::::@:::
     |                              @@::::: :::::: @ ::: ::: @:@@@ #::::::@:::
     |                            ::@@::::: :::::: @ ::: ::: @:@@@ #::::::@:::
     |                          @:: @@::::: :::::: @ ::: ::: @:@@@ #::::::@:::
     |                        @@@:: @@::::: :::::: @ ::: ::: @:@@@ #::::::@:::
     |                       :@@@:: @@::::: :::::: @ ::: ::: @:@@@ #::::::@:::
     |                     @::@@@:: @@::::: :::::: @ ::: ::: @:@@@ #::::::@:::
     |                    @@::@@@:: @@::::: :::::: @ ::: ::: @:@@@ #::::::@:::
     |                  @:@@::@@@:: @@::::: :::::: @ ::: ::: @:@@@ #::::::@:::
     |                 :@ @@::@@@:: @@::::: :::::: @ ::: ::: @:@@@ #::::::@:::
     |              ::::@ @@::@@@:: @@::::: :::::: @ ::: ::: @:@@@ #::::::@:::
     |            :::: :@ @@::@@@:: @@::::: :::::: @ ::: ::: @:@@@ #::::::@:::
     |          :@: :: :@ @@::@@@:: @@::::: :::::: @ ::: ::: @:@@@ #::::::@:::
     |         :@@: :: :@ @@::@@@:: @@::::: :::::: @ ::: ::: @:@@@ #::::::@:::
     |       @@:@@: :: :@ @@::@@@:: @@::::: :::::: @ ::: ::: @:@@@ #::::::@:::
     |      @@ :@@: :: :@ @@::@@@:: @@::::: :::::: @ ::: ::: @:@@@ #::::::@:::
   0 +----------------------------------------------------------------------->Gi
     0                                                                   19.39
```

You can change the `--run-iter` value to adjust the number of iterations to profile. The detailed profiling result is written to `massif.out.ms_print`.

### How can I understand the profiler result?

The dumped profiling file `prof.json` can be interpreted by [megengine/utils/profile_analyze.py](../../utils/profile_analyze.py).
The following information is printed from the profiler:

```
-----------------  --------
total device time  0.318062
total host time    0.275643
-----------------  --------

╒════════════════════╤══════════════╤═══════════════════════════╤═══════════════╤═════════╤══════════╤═════════════╤═════════════╤══════════════╕
│ device self time   │ cumulative   │ operator info             │ computation   │ FLOPS   │ memory   │ bandwidth   │ in_shapes   │ out_shapes   │
╞════════════════════╪══════════════╪═══════════════════════════╪═══════════════╪═════════╪══════════╪═════════════╪═════════════╪══════════════╡
│ #0                 │ 0.114        │ Elemwise                  │ 6.53          │ 57.40   │ 51.63    │ 454.02      │ None        │ None         │
│ 0.114              │ 35.8%        │ 1481                      │ GFLO          │ GFLOPS  │ GiB      │ GiB/s       │             │              │
│ 35.8%              │              │ N/A                       │               │         │          │             │             │              │
├────────────────────┼──────────────┼───────────────────────────┼───────────────┼─────────┼──────────┼─────────────┼─────────────┼──────────────┤
│ #1                 │ 0.176        │ ConvolutionBackwardFilter │ 523.15        │ 8.35    │ 5.28     │ 84.24       │ None        │ None         │
│ 0.0627             │ 55.5%        │ 53                        │ GFLO          │ TFLOPS  │ GiB      │ GiB/s       │             │              │
│ 19.7%              │              │ N/A                       │               │         │          │             │             │              │
├────────────────────┼──────────────┼───────────────────────────┼───────────────┼─────────┼──────────┼─────────────┼─────────────┼──────────────┤
│ #2                 │ 0.221        │ ConvolutionBackwardData   │ 508.05        │ 11.31   │ 5.05     │ 112.42      │ None        │ None         │
│ 0.0449             │ 69.6%        │ 52                        │ GFLO          │ TFLOPS  │ GiB      │ GiB/s       │             │              │
│ 14.1%              │              │ N/A                       │               │         │          │             │             │              │
├────────────────────┼──────────────┼───────────────────────────┼───────────────┼─────────┼──────────┼─────────────┼─────────────┼──────────────┤
```

See [megengine/utils/profile_analyze.py](../../utils/profile_analyze.py) for more usage details.

## To do list

* Change numerical tolerance after XPU-280 is done
* Add scripts to facilitate log analysis
* Profile GPU memory
* Integrate with the QA system
* Add more regression tests
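
As a possible starting point for the log-analysis item in the list above, here is a minimal sketch that parses the performance test output and flags cases that are slower than a reference run. It assumes the log follows the sample output shown in the Performance section. The names `parse_perf_log` and `find_regressions` and the 5% tolerance are hypothetical and not part of the test suite.

```python
import re

# Hypothetical helper, not part of the test suite. The pattern assumes the
# log format shown in the sample output of the Performance section.
CASE_RE = re.compile(
    r"Run (?P<name>[^\n*]+?)\s+"
    r"Finish with GPU Usage (?P<gpu_mib>\d+)MiB\s+"
    r"Wall time per iter (?P<wall_ms>\d+) ms\s+"
    r"Run status: (?P<status>\w+)"
)


def parse_perf_log(text):
    """Return one record per test case found in the log text."""
    return [
        {
            "name": m["name"],
            "gpu_mib": int(m["gpu_mib"]),
            "wall_ms": int(m["wall_ms"]),
            "status": m["status"],
        }
        for m in CASE_RE.finditer(text)
    ]


def find_regressions(current, reference, rel_tol=0.05):
    """Return names of cases whose wall time exceeds the reference
    by more than rel_tol (an assumed threshold; tune as needed)."""
    ref_ms = {case["name"]: case["wall_ms"] for case in reference}
    return [
        case["name"]
        for case in current
        if case["name"] in ref_ms
        and case["wall_ms"] > ref_ms[case["name"]] * (1 + rel_tol)
    ]
```

A typical use would be to save the output of a known-good run as the reference, parse both logs with `parse_perf_log`, and pass the two lists to `find_regressions`. Cases that appear in only one of the logs are ignored by this sketch.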