Camera-LiDAR Fusion for HD Map Generation
Offline camera-LiDAR fusion pipeline that automatically detects traffic signs and road markings, estimates their 3D pose, and annotates HD maps for autonomous vehicles. Achieved 97.5% recall over 14 km of real Italian roads.
Overview
This Master's thesis project, carried out within the AIDA (Artificial Intelligence Driving Autonomous) research group at Politecnico di Milano, tackles the challenge of automatically generating regulation-aware HD maps for autonomous driving in urban environments. The core problem: autonomous vehicles depend on pre-constructed HD maps for localization and planning, but manually annotating traffic regulations is slow and error-prone.
Pipeline Architecture
The system is composed of four main stages that process recorded drives:
- Detection: A YOLOv11-M detector trained on a custom dataset of 20,930 images (34,000+ annotations) spanning 31 traffic sign classes. Hyperparameters were tuned via Bayesian optimization (20 runs, 30 epochs each) targeting mAP50-95. The trained model reached an mAP50 of 0.917, a recall of 0.841, and a precision of 0.923.
- Tracking: BoT-SORT multi-object tracker with appearance re-identification and global motion compensation. A custom multi-camera mosaic approach fuses detections from 3 cameras with a selection layer to avoid duplicates.
- 3D Localization & Pose Estimation: LiDAR points are projected onto camera frames and filtered per bounding box. Background points are then removed via statistical filtering. Vertical signs go through 3D pose estimation (normal vector extraction), while horizontal markings go through side refinement with ROI extraction.
- HD Map Route Annotation: Detected signs are converted into regulatory Special Nodes (stop, yield, crosswalk, speed limit) placed at precise distances along the vehicle's route. A hierarchical fallback system prioritizes horizontal markings over vertical signs.
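To make the 3D localization stage more concrete, here is a minimal sketch of the projection, per-box filtering, background removal, and normal extraction steps for a vertical sign. It is not the thesis implementation. The names `T_cam_lidar` (4x4 LiDAR-to-camera extrinsics) and `K` (3x3 pinhole intrinsics), the median/MAD depth filter standing in for "statistical filtering", and the SVD plane fit used to obtain the normal are all assumptions chosen for illustration.

```python
import numpy as np

def project_points(points_lidar, T_cam_lidar, K):
    """Project Nx3 LiDAR points into the image plane.

    Returns pixel coordinates (Nx2), camera-frame points (Nx3), and a mask
    of points that lie in front of the camera.
    """
    homog = np.hstack([points_lidar, np.ones((len(points_lidar), 1))])
    pts_cam = (T_cam_lidar @ homog.T).T[:, :3]
    in_front = pts_cam[:, 2] > 0.1
    uvw = (K @ pts_cam.T).T
    # Use a dummy depth for points behind the camera to avoid dividing by ~0;
    # those points are discarded through the in_front mask anyway.
    z = np.where(in_front, uvw[:, 2], 1.0)
    pixels = uvw[:, :2] / z[:, None]
    return pixels, pts_cam, in_front

def points_in_box(pixels, in_front, box):
    """Mask of projected points that fall inside box = (x1, y1, x2, y2)."""
    x1, y1, x2, y2 = box
    u, v = pixels[:, 0], pixels[:, 1]
    return in_front & (u >= x1) & (u <= x2) & (v >= y1) & (v <= y2)

def remove_background(pts_cam, k=3.0, min_sigma=0.05):
    """Assumed filter: keep points within k robust sigmas of the median depth."""
    depth = pts_cam[:, 2]
    med = np.median(depth)
    mad = np.median(np.abs(depth - med))
    sigma = max(1.4826 * mad, min_sigma)  # floor avoids a zero-width band
    return pts_cam[np.abs(depth - med) <= k * sigma]

def plane_normal(pts):
    """Fit a plane by SVD; the normal is the direction of least variance."""
    centroid = pts.mean(axis=0)
    _, _, vt = np.linalg.svd(pts - centroid, full_matrices=False)
    normal = vt[-1]
    # Orient the normal so that it points back towards the camera origin.
    if np.dot(normal, centroid) > 0:
        normal = -normal
    return centroid, normal

def estimate_sign_pose(points_lidar, T_cam_lidar, K, box):
    """Return (centroid, normal) in the camera frame, or None if too few points."""
    pixels, pts_cam, in_front = project_points(points_lidar, T_cam_lidar, K)
    candidates = pts_cam[points_in_box(pixels, in_front, box)]
    if len(candidates) < 3:
        return None
    sign_pts = remove_background(candidates)
    if len(sign_pts) < 3:
        return None
    return plane_normal(sign_pts)
```

Calling `estimate_sign_pose(cloud, T_cam_lidar, K, box)` with one detection box gives the sign's position and facing direction in the camera frame. The median-based filter assumes that most LiDAR returns inside the box belong to the sign itself, which may not hold for small or distant signs.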
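The hierarchical fallback in the route annotation stage can be pictured with a small sketch. Everything here is hypothetical: the `Detection` record, the `merge_radius` grouping distance, and the grouping rule itself are illustrative assumptions, not the thesis design. The only behavior taken from the description above is that a horizontal marking wins over a vertical sign when both describe the same regulatory event.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Detection:
    kind: str       # e.g. "stop", "yield", "crosswalk", "speed_limit"
    source: str     # "horizontal" (road marking) or "vertical" (sign)
    route_s: float  # distance along the route, in metres

# Lower value means higher priority: horizontal markings come first.
PRIORITY = {"horizontal": 0, "vertical": 1}

def select_special_nodes(detections, merge_radius=15.0):
    """Group same-kind detections that are close along the route and keep
    the highest-priority detection of each group as the Special Node."""
    nodes = []
    for kind in sorted({d.kind for d in detections}):
        same_kind = sorted(
            (d for d in detections if d.kind == kind),
            key=lambda d: d.route_s,
        )
        group = []
        for det in same_kind:
            # Start a new group when the gap to the previous detection is large.
            if group and det.route_s - group[-1].route_s > merge_radius:
                nodes.append(min(group, key=lambda g: PRIORITY[g.source]))
                group = []
            group.append(det)
        if group:
            nodes.append(min(group, key=lambda g: PRIORITY[g.source]))
    return sorted(nodes, key=lambda n: n.route_s)

detections = [
    Detection("stop", "vertical", 100.0),
    Detection("stop", "horizontal", 108.0),
    Detection("yield", "vertical", 300.0),
]
nodes = select_special_nodes(detections)
```

In this example the stop line at 108 m replaces the stop sign at 100 m, while the yield sign at 300 m is kept because no horizontal marking is available for it.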
Results
Validated on 14 km of Italian roads. Recall ranges from 93% to 100% across classes. In the fallback configuration, the pipeline missed only 3 of 161 events and produced 13 false positives. Processing time: 5 minutes per kilometer.
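As a quick arithmetic check on the fallback-configuration counts, the sketch below derives recall, precision, and total processing time. It assumes that the 161 events are the ground-truth events and that the 13 false positives are counted against the same run. The helper name `recall_precision` is illustrative. These figures apply to the fallback configuration only, so they need not coincide with the 97.5% headline recall.

```python
def recall_precision(true_events, missed, false_positives):
    """Recall and precision from event counts (assumed ground-truth total)."""
    true_positives = true_events - missed
    recall = true_positives / true_events
    precision = true_positives / (true_positives + false_positives)
    return recall, precision

recall, precision = recall_precision(true_events=161, missed=3, false_positives=13)
total_minutes = 14 * 5  # 14 km at 5 minutes per kilometer

print(f"recall={recall:.3f}, precision={precision:.3f}, time={total_minutes} min")
# prints: recall=0.981, precision=0.924, time=70 min
```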
Upcoming publication: Camera-LiDAR Fusion for Traffic Sign and Road Marking Extraction for Autonomous Driving (Feb 2026, CCTA).