# Super Fast and Accurate 3D Object Detection based on 3D LiDAR Point Clouds
---
Technical details of the implementation
## 1. Network architecture
- The model is a **ResNet-based Keypoint Feature Pyramid Network** (KFPN), as proposed in the [RTM3D paper](https://arxiv.org/pdf/2001.03343.pdf).
  An unofficial PyTorch implementation of the RTM3D paper is available [here](https://github.com/maudzung/RTM3D).
- **Input**:
  - The model takes a bird's-eye-view (BEV) map as input.
  - The BEV map encodes the height, intensity, and density of the 3D LiDAR point cloud. Let the size of the BEV input be `(H, W, 3)`.
- **Outputs**:
  - Heatmap of main centers: `(H/S, W/S, C)`, where `S=4` _(the down-sample ratio)_ and `C=3` _(the number of classes)_.
  - Center offset: `(H/S, W/S, 2)`.
  - Heading angle _(yaw)_: `(H/S, W/S, 2)`. The model estimates the **im**aginary and **re**al fractions, i.e. the `sin(yaw)` and `cos(yaw)` values.
  - Dimensions _(h, w, l)_: `(H/S, W/S, 3)`.
  - `z` coordinate: `(H/S, W/S, 1)`.
- **Targets**: **7 degrees of freedom** _(7-DOF)_ of each object: `(cx, cy, cz, l, w, h, θ)`
  - `cx, cy, cz`: the center coordinates.
  - `l, w, h`: the length, width, and height of the bounding box.
  - `θ`: the heading angle of the bounding box, in radians.
- **Objects**: Cars, Pedestrians, Cyclists.
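The input encoding and the per-object regression targets described above can be illustrated with a minimal NumPy sketch. It is not the project's code. The following are all assumptions made for illustration:

- the detection area (`X_RANGE`, `Y_RANGE`, `Z_RANGE`) and the 608 × 608 BEV size;
- the mapping of `x` to rows and `y` to columns;
- the per-cell max aggregation and the `log(n + 1) / log(64)` density normalization;
- the helper names `make_bev_map` and `encode_target`.

```python
import math
import numpy as np

# Hypothetical detection area (metres) and BEV size; illustrative placeholders only.
X_RANGE = (0.0, 50.0)    # forward axis -> BEV rows
Y_RANGE = (-25.0, 25.0)  # lateral axis -> BEV columns
Z_RANGE = (-2.73, 1.27)  # height window used to normalize z to [0, 1]
BEV_H, BEV_W = 608, 608
DOWN_RATIO = 4           # S in the text


def make_bev_map(points, H=BEV_H, W=BEV_W):
    """Encode an (N, 4) array of [x, y, z, intensity] into an (H, W, 3) BEV map.

    Channel 0: max normalized height, channel 1: max intensity,
    channel 2: log-normalized point density (assumed scheme).
    """
    x, y, z, r = points.T
    keep = ((x >= X_RANGE[0]) & (x < X_RANGE[1]) &
            (y >= Y_RANGE[0]) & (y < Y_RANGE[1]) &
            (z >= Z_RANGE[0]) & (z <= Z_RANGE[1]))
    x, y, z, r = x[keep], y[keep], z[keep], r[keep]
    rows = np.clip(((x - X_RANGE[0]) / (X_RANGE[1] - X_RANGE[0]) * H).astype(int), 0, H - 1)
    cols = np.clip(((y - Y_RANGE[0]) / (Y_RANGE[1] - Y_RANGE[0]) * W).astype(int), 0, W - 1)

    height = np.zeros((H, W))
    intensity = np.zeros((H, W))
    counts = np.zeros((H, W))
    # ufunc.at accumulates correctly when several points fall into the same cell.
    np.maximum.at(height, (rows, cols), (z - Z_RANGE[0]) / (Z_RANGE[1] - Z_RANGE[0]))
    np.maximum.at(intensity, (rows, cols), r)
    np.add.at(counts, (rows, cols), 1)
    density = np.minimum(1.0, np.log(counts + 1) / np.log(64))  # saturates at 63 points
    return np.stack([height, intensity, density], axis=-1).astype(np.float32)


def encode_target(box, H=BEV_H, W=BEV_W, down_ratio=DOWN_RATIO):
    """Turn one 7-DOF box (cx, cy, cz, l, w, h, yaw) into per-cell regression targets."""
    cx, cy, cz, l, w, h, yaw = box
    out_h, out_w = H // down_ratio, W // down_ratio
    # Continuous position of the center on the down-sampled output grid.
    r = (cx - X_RANGE[0]) / (X_RANGE[1] - X_RANGE[0]) * out_h
    c = (cy - Y_RANGE[0]) / (Y_RANGE[1] - Y_RANGE[0]) * out_w
    cell = (int(r), int(c))  # where the heatmap peak goes
    return {
        "cell": cell,
        "offset": (r - cell[0], c - cell[1]),         # sub-cell part lost by quantization
        "direction": (math.sin(yaw), math.cos(yaw)),  # (im, re) fractions
        "z": cz,
        "dim": (h, w, l),
    }
```

With these placeholder values, `encode_target` places a box centered 25 m ahead on the center line at cell `(76, 76)` of the 152 × 152 output grid with a zero offset. The offset channel exists to recover the sub-cell position that is lost when the center is divided by `S` and truncated.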
## 2. Loss functions
- For the main center heatmap: `focal loss`.
- For the heading angle _(yaw)_: the `im` and `re` fractions are regressed directly with `l1_loss`.
- For the `z` coordinate and the three dimensions (height, width, length), I used the `balanced l1 loss` proposed in the paper
  [Libra R-CNN: Towards Balanced Learning for Object Detection](https://arxiv.org/pdf/1904.02701.pdf).
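The three loss terms can be sketched in NumPy as below.

- The text only names `focal loss`, so using the penalty-reduced variant from CenterNet-style keypoint detectors is an assumption. That variant uses `alpha=2` and `beta=4`, and ground-truth values below 1 around each center soften the penalty for near misses.
- `balanced_l1_loss` follows the Libra R-CNN formulation with that paper's defaults, `alpha=0.5` and `gamma=1.5`.
- Function names are illustrative.

```python
import numpy as np


def center_focal_loss(pred, gt, alpha=2.0, beta=4.0, eps=1e-6):
    """Penalty-reduced focal loss on a center heatmap (CenterNet-style variant, assumed).

    pred, gt: arrays of the same shape with values in [0, 1]; gt == 1 marks object centers.
    """
    pred = np.clip(pred, eps, 1.0 - eps)
    pos = gt == 1
    pos_loss = np.log(pred[pos]) * (1 - pred[pos]) ** alpha
    neg_loss = np.log(1 - pred[~pos]) * pred[~pos] ** alpha * (1 - gt[~pos]) ** beta
    num_pos = max(int(pos.sum()), 1)  # normalize by the number of centers
    return -(pos_loss.sum() + neg_loss.sum()) / num_pos


def l1_loss(pred, target):
    """Mean absolute error, e.g. on the (im, re) = (sin, cos) yaw channels."""
    return np.abs(pred - target).mean()


def balanced_l1_loss(pred, target, alpha=0.5, gamma=1.5):
    """Balanced L1 loss (Libra R-CNN), e.g. for the z coordinate and box dimensions."""
    x = np.abs(pred - target)
    b = np.exp(gamma / alpha) - 1  # chosen so that alpha * ln(b + 1) == gamma
    loss = np.where(
        x < 1,
        alpha / b * (b * x + 1) * np.log(b * x + 1) - alpha * x,
        gamma * x + gamma / b - alpha,  # constant keeps the loss continuous at |x| = 1
    )
    return loss.mean()
```

The gradient of the balanced L1 term is `alpha * ln(b * |x| + 1)` for small errors and the constant `gamma` for large ones. Choosing `b` as above makes both gradients equal `gamma` at `|x| = 1`.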
## 3. Training details
- Uniform weights (`1.0`) are applied to all of the loss components above.
- Number of epochs: 300.
- Learning rate scheduler: [`cosine`](https://arxiv.org/pdf/1812.01187.pdf), with an initial learning rate of 0.001.
- Batch size: `16` (on a single GTX 1080Ti).
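A sketch of the cosine schedule and the uniformly weighted total loss follows. It rests on two assumptions: the rate is updated once per epoch, and it decays to `lr_min = 0`. The function names and loss-component keys are hypothetical.

```python
import math


def cosine_lr(epoch, total_epochs=300, lr_init=1e-3, lr_min=0.0):
    """Cosine-annealed learning rate for a given 0-based epoch."""
    progress = min(epoch, total_epochs) / total_epochs
    return lr_min + 0.5 * (lr_init - lr_min) * (1 + math.cos(math.pi * progress))


def total_loss(losses, weights=None):
    """Weighted sum of loss components; every weight defaults to 1.0 (uniform)."""
    weights = weights or {name: 1.0 for name in losses}
    return sum(weights[name] * value for name, value in losses.items())
```

With the settings above, the learning rate is 0.001 at epoch 0 and about 0.0005 at epoch 150. It reaches 0 at epoch 300.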
## 4. Inference
- A `3 × 3` max-pooling operation is applied to the center heatmap to keep only local peaks. Then only the (at most) `50` highest-scoring predictions
  whose center confidences are larger than 0.2 are kept.
- The heading angle _(yaw)_ is recovered as `arctan2`(_imaginary fraction_, _real fraction_), the quadrant-aware form of `arctan`(_imaginary fraction_ / _real fraction_).
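The decoding step can be sketched in NumPy as follows. The sketch makes these assumptions:

- The heatmap is channel-last, `(H/S, W/S, C)`, as in the output description.
- A cell counts as a peak when it equals its 3 × 3 max-pooled value, the usual CenterNet-style substitute for NMS.
- The function names are hypothetical.

Mapping the kept cells back to metric coordinates with the offset, `z`, and dimension maps is omitted.

```python
import numpy as np


def max_pool_3x3(heat):
    """3 x 3 max pooling, stride 1, 'same' padding, over a float (h, w, C) heatmap."""
    h, w = heat.shape[:2]
    padded = np.pad(heat, ((1, 1), (1, 1), (0, 0)), constant_values=-np.inf)
    pooled = np.full_like(heat, -np.inf)
    for dr in range(3):
        for dc in range(3):
            pooled = np.maximum(pooled, padded[dr:dr + h, dc:dc + w, :])
    return pooled


def decode_centers(heat, k=50, thresh=0.2):
    """Return up to k (class, row, col, score) peaks whose score exceeds thresh."""
    peaks = np.where(heat == max_pool_3x3(heat), heat, 0.0)  # keep local maxima only
    flat = peaks.ravel()
    k = min(k, flat.size)
    top = np.argpartition(-flat, k - 1)[:k]   # indices of the k largest scores
    top = top[np.argsort(-flat[top])]         # sort them by descending score
    top = top[flat[top] > thresh]             # confidence threshold
    rows, cols, classes = np.unravel_index(top, heat.shape)
    return [(int(cls), int(r), int(c), float(flat[i]))
            for cls, r, c, i in zip(classes, rows, cols, top)]


def decode_yaw(im, re):
    """Recover yaw from the predicted (sin, cos) pair, keeping the correct quadrant."""
    return np.arctan2(im, re)
```

`np.arctan2` is used for the yaw because a single-argument `np.arctan(im / re)` cannot tell opposite headings apart. For a yaw of -2.5 rad, it would return about 0.64 rad.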
## 5. How to expand the work
- The model could be trained with more classes and a larger detection area by modifying the configuration in the
  [config/kitti_config.py](https://github.com/maudzung/Super-Fast-Accurate-3D-Object-Detection/blob/master/src/config/kitti_config.py) file.
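Enlarging the detected area also enlarges the BEV input. Every output map is `S = 4` times smaller than the input in each dimension. For `H/S` and `W/S` to be whole numbers, `H` and `W` should therefore at least be divisible by `S`. The backbone's deeper down-sampling stages may impose a stricter multiple, which can be passed as `down_ratio`. The hypothetical helper below checks this for a given area and cell size. Its name, the `cell_size` parameter, and the example ranges are assumptions for illustration, not settings taken from the config file.

```python
def bev_and_output_sizes(x_range, y_range, cell_size, down_ratio=4):
    """Return ((H, W), (H/S, W/S)) for a detection area at a given BEV cell size in metres."""
    H = round((x_range[1] - x_range[0]) / cell_size)
    W = round((y_range[1] - y_range[0]) / cell_size)
    if H % down_ratio or W % down_ratio:
        raise ValueError(f"BEV size {H}x{W} is not divisible by the down-sample ratio {down_ratio}")
    return (H, W), (H // down_ratio, W // down_ratio)
```

For example, an area of `(0, 70)` × `(-40, 40)` metres at 0.1 m per cell gives a 700 × 800 BEV map and 175 × 200 output maps.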
