# Super Fast and Accurate 3D Object Detection based on 3D LiDAR Point Clouds --- Technical details of the implementation ## 1. Network architecture - The **ResNet-based Keypoint Feature Pyramid Network** (KFPN) that was proposed in [RTM3D paper](https://arxiv.org/pdf/2001.03343.pdf). The unofficial implementation of the RTM3D paper by using PyTorch is [here](https://github.com/maudzung/RTM3D) - **Input**: - The model takes a birds-eye-view (BEV) map as input. - The BEV map is encoded by height, intensity, and density of 3D LiDAR point clouds. Assume that the size of the BEV input is `(H, W, 3)`. - **Outputs**: - Heatmap for main center with a size of `(H/S, W/S, C)` where `S=4` _(the down-sample ratio)_, and `C=3` _(the number of classes)_ - Center offset: `(H/S, W/S, 2)` - The heading angle _(yaw)_: `(H/S, W/S, 2)`. The model estimates the **im**aginary and the **re**al fraction (`sin(yaw)` and `cos(yaw)` values). - Dimension _(h, w, l)_: `(H/S, W/S, 3)` - `z` coordinate: `(H/S, W/S, 1)` - **Targets**: **7 degrees of freedom** _(7-DOF)_ of objects: `(cx, cy, cz, l, w, h, θ)` - `cx, cy, cz`: The center coordinates. - `l, w, h`: length, width, height of the bounding box. - `θ`: The heading angle in radians of the bounding box. - **Objects**: Cars, Pedestrians, Cyclists. ## 2. Losses function - For main center heatmap: Used `focal loss` - For heading angle _(yaw)_: The `im` and `re` fractions are directly regressed by using `l1_loss` - For `z coordinate` and `3 dimensions` (height, width, length), I used `balanced l1 loss` that was proposed by the paper [Libra R-CNN: Towards Balanced Learning for Object Detection](https://arxiv.org/pdf/1904.02701.pdf) ## 3. Training in details - Set uniform weights to the above components of losses. (`=1.0` for all) - Number of epochs: 300. - Learning rate scheduler: [`cosine`](https://arxiv.org/pdf/1812.01187.pdf), initial learning rate: 0.001. - Batch size: `16` (on a single GTX 1080Ti). ## 4. Inference - A `3 × 3` max-pooling operation was applied on the center heat map, then only `50` predictions whose center confidences are larger than 0.2 were kept. - The heading angle _(yaw)_ = `arctan`(_imaginary fraction_ / _real fraction_) ## 5. How to expand the work - The model could be trained with more classes and with a larger detected area by modifying configurations in the [config/kitti_dataset.py](https://github.com/maudzung/Super-Fast-Accurate-3D-Object-Detection/blob/master/src/config/kitti_config.py) file.