# Extern-C-Opr with MACE
### Build MegEngine `load_and_run` for arm64-v8a
NOTICE: the build depends on the [NDK](https://developer.android.com/ndk/downloads).
After downloading it, configure the environment:
```bash
export NDK_ROOT=path/to/ndk
export ANDROID_NDK_HOME=${NDK_ROOT}
export PATH=${NDK_ROOT}/toolchains/llvm/prebuilt/linux-x86_64/bin/:$PATH
```
```bash
cd $MEGENGINE_HOME
git checkout v1.0.0  # only v1.0.0 has been tested
./scripts/cmake-build/cross_build_android_arm_inference.sh -a arm64-v8a -r
```
After a successful build:
* `load_and_run` should be in `$MEGENGINE_HOME/build_dir/android/arm64-v8a/Release/install/bin`
* `libmegengine.so` should be in `$MEGENGINE_HOME/build_dir/android/arm64-v8a/Release/install/lib`
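
The two locations above can be checked with a small script before moving on. This is only a sketch: `check_megengine_build` is a hypothetical helper name, and it assumes the install layout is exactly the one listed above.

```bash
# Hypothetical helper: confirm that the two build outputs listed above exist.
# Argument: the MegEngine source directory (the value of $MEGENGINE_HOME).
check_megengine_build() {
    local install_dir="$1/build_dir/android/arm64-v8a/Release/install"
    local f missing=0
    for f in bin/load_and_run lib/libmegengine.so; do
        if [ ! -f "${install_dir}/${f}" ]; then
            echo "missing: ${install_dir}/${f}" >&2
            missing=1
        fi
    done
    # Returns 0 when both files are present, 1 otherwise.
    return "${missing}"
}
```

For example, `check_megengine_build "$MEGENGINE_HOME"` prints the missing paths, if any, to stderr.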
### Build MACE libraries for arm64-v8a with GPU runtime
```bash
cd $MACE_HOME
RUNTIME=GPU bash tools/cmake/cmake-build-arm64-v8a.sh
export SDKPATH=${MACE_HOME}/build/cmake-build/arm64-v8a/install
```
After a successful build, `libmace.so` should be at `$MACE_HOME/build/cmake-build/arm64-v8a/install/lib/libmace.so`
### Build MACE loader for MegEngine
If `SDKPATH` is not set, it defaults to `./arm64-v8a`.

You can enable debug mode by adding `DEBUG=1` to the `make` command, which shows more runtime information.
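
As a rough illustration of how the two settings combine, the wrapper below passes `SDKPATH` through the environment and optionally adds `DEBUG=1`. It is a sketch under assumptions: `build_mace_loader` is a hypothetical name, and it presumes the function is called from the directory that contains the loader's Makefile and that the Makefile picks `SDKPATH` up from the environment.

```bash
# Hypothetical wrapper around the loader build.
# Assumption: called from the directory containing the loader's Makefile.
build_mace_loader() {
    # Fall back to the documented default when SDKPATH is not set.
    local sdk="${SDKPATH:-./arm64-v8a}"
    if [ "${1:-}" = "debug" ]; then
        # Debug mode: the loader prints more runtime information.
        SDKPATH="${sdk}" make DEBUG=1
    else
        SDKPATH="${sdk}" make
    fi
}
```

Calling `build_mace_loader debug` builds with `DEBUG=1`; calling it without arguments does a regular build.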
### Prepare a MACE model (for example, resnet_50) and wrap it with the MegEngine extern C opr
```bash
python3 dump_model.py --input path/to/resnet_50.pb --param path/to/resnet_50.data --output resnet_50.mdl --config path/to/resnet_50.yml
```
The `*.pb` file holds the model structure, and the `*.data` file holds the model parameters.

Check [here](https://github.com/XiaoMi/mace-models) to learn how to write yml files for MACE.
### Run with load-and-run
First, send the following files to the target device (for example, to `/data/local/tmp/test/`):
- load_and_run
- resnet_50.mdl
- libmace_loader.so
- libmegengine.so
- libmace.so

MACE is built with `c++_shared` by default, but old AOSP devices do not ship `libc++_shared.so`. On such devices you also need to send that library, which can typically be found at `${NDK_ROOT}/sources/cxx-stl/llvm-libc++/libs/arm64-v8a/libc++_shared.so`.
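
One way to send the files is with `adb`. The helper below is a sketch, not part of this repository: `push_to_device` is a hypothetical name, and it assumes `adb` is installed, a single device is connected, and the files sit in the current directory.

```bash
# Hypothetical helper: copy files to a directory on the device with adb.
# Usage: push_to_device <remote_dir> <file>...
push_to_device() {
    local remote_dir="$1"
    shift
    # Create the target directory on the device if it does not exist yet.
    adb shell mkdir -p "${remote_dir}" || return 1
    local f
    for f in "$@"; do
        # Push one file at a time and stop at the first failure.
        adb push "${f}" "${remote_dir}/" || return 1
    done
}

# Example (not executed here):
# push_to_device /data/local/tmp/test load_and_run resnet_50.mdl \
#     libmace_loader.so libmegengine.so libmace.so
```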
```bash
# log in to the device first, then:
cd /path/to/  # for example: /data/local/tmp/test/
MGB_MACE_RUNTIME=GPU MGB_MACE_OPENCL_CACHE_PATH=./ MGB_MACE_LOADER_FORMAT=NCHW LD_LIBRARY_PATH=. ./load_and_run resnet_50.mdl --c-opr-lib libmace_loader.so --input input-bs1.npy
```
`MGB_MACE_RUNTIME` candidates:
- CPU
- GPU

`MGB_MACE_OPENCL_CACHE_PATH` is the directory that the OpenCL binary cache is written to (the cache file name is always `mace_cl_compiled_program.bin`). If the cache file does not exist, it is created.

We mainly use the NCHW data format. If you have an NHWC model, set the environment variable `MGB_MACE_LOADER_FORMAT=NHWC`.

The CPU runtime uses 1 thread by default; set `MGB_MACE_NR_THREADS=n` to change it.

Running with the HEXAGON runtime takes more effort; please check [here](https://mace.readthedocs.io/en/latest/faq.html#why-is-mace-not-working-on-dsp).
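
The environment variables described above can be bundled into a small wrapper so that typos in the runtime or format are caught before the run starts. This is a sketch under assumptions: `run_mace` and the `LOAD_AND_RUN` override are hypothetical names introduced here, and the wrapper presumes it is run on the device in the directory that holds the files.

```bash
# Hypothetical wrapper around load_and_run.
# Usage: run_mace <CPU|GPU> <NCHW|NHWC> <model.mdl> <input.npy> [threads]
# LOAD_AND_RUN is a hypothetical override for the binary path.
run_mace() {
    local runtime="$1" format="$2" model="$3" input="$4" threads="${5:-1}"
    case "${runtime}" in
        CPU|GPU) ;;
        *) echo "unsupported runtime: ${runtime}" >&2; return 1 ;;
    esac
    case "${format}" in
        NCHW|NHWC) ;;
        *) echo "unsupported format: ${format}" >&2; return 1 ;;
    esac
    (
        # The subshell keeps the exported variables out of the caller's environment.
        export MGB_MACE_RUNTIME="${runtime}"
        export MGB_MACE_LOADER_FORMAT="${format}"
        export LD_LIBRARY_PATH=.
        if [ "${runtime}" = "CPU" ]; then
            export MGB_MACE_NR_THREADS="${threads}"  # thread count for the CPU runtime
        else
            export MGB_MACE_OPENCL_CACHE_PATH=./     # OpenCL binary cache directory
        fi
        "${LOAD_AND_RUN:-./load_and_run}" "${model}" \
            --c-opr-lib libmace_loader.so --input "${input}"
    )
}
```

For example, `run_mace CPU NHWC resnet_50.mdl input-bs1.npy 4` would run an NHWC model on the CPU runtime with 4 threads.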
### Tuning on a specific OpenCL device
MACE supports tuning on a specific SoC to optimize GPU performance; see the [doc](https://mace.readthedocs.io/en/latest/user_guide/advanced_usage.html#tuning-for-specific-soc-s-gpu).

To enable this feature, set the `MGB_MACE_TUNING_PARAM_PATH` environment variable to the path of the tuning param file.

To generate the tuning param file, set `MACE_TUNING=1` and set `MACE_RUN_PARAMETER_PATH` to the file name you want.
```bash
# search for tuning param
MACE_TUNING=1 MACE_RUN_PARAMETER_PATH=opencl/vgg16.tune_param MGB_MACE_RUNTIME=GPU MGB_MACE_OPENCL_PATH=opencl MGB_MACE_LOADER_FORMAT=NCHW ./load_and_run mace/vgg16.mdl --c-opr-lib libmace_loader.so --input 4d.npy
# then run test using the param
MGB_MACE_TUNING_PARAM_PATH=opencl/vgg16.tune_param MGB_MACE_RUNTIME=GPU MGB_MACE_OPENCL_PATH=opencl MGB_MACE_LOADER_FORMAT=NCHW ./load_and_run mace/vgg16.mdl --c-opr-lib libmace_loader.so --input 4d.npy
```
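
The two steps can also be chained so that the parameter search only happens when the tuning param file is missing. The function below is a sketch: `tune_then_run` is a hypothetical name, and it assumes the tuning pass writes its result to the file named by `MACE_RUN_PARAMETER_PATH`, as described above.

```bash
# Hypothetical helper: tune once, then run with the tuned parameters.
# Usage: tune_then_run <param_file> <command> [args...]
tune_then_run() {
    local param="$1"
    shift
    if [ ! -f "${param}" ]; then
        # First pass: search for tuning parameters and store them in ${param}.
        ( export MACE_TUNING=1 MACE_RUN_PARAMETER_PATH="${param}"; "$@" ) || return 1
    fi
    # Second pass: run the same command using the tuning param file.
    ( export MGB_MACE_TUNING_PARAM_PATH="${param}"; "$@" )
}

# Example (not executed here); the other variables are exported beforehand:
# export MGB_MACE_RUNTIME=GPU MGB_MACE_OPENCL_PATH=opencl MGB_MACE_LOADER_FORMAT=NCHW
# tune_then_run opencl/vgg16.tune_param ./load_and_run mace/vgg16.mdl \
#     --c-opr-lib libmace_loader.so --input 4d.npy
```

Running each pass in a subshell keeps the tuning variables from leaking into the second run or into the calling shell.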