Synthmantic LiDAR: A Synthetic Dataset for Semantic Segmentation on LiDAR Imaging

Synthmantic LiDAR is a synthetic dataset for LiDAR semantic segmentation, generated with a custom modification of the CARLA simulator. I adapted the simulator output to match the SemanticKITTI label space, adding missing classes and correcting labels that were inconsistent with the target dataset. The added classes include car, motorcycle, motorcyclist, bicycle, bicyclist, and truck, which were previously grouped under broader vehicle and person categories.
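The label alignment described above can be pictured as a lookup from simulator tags to SemanticKITTI classes. The following is a minimal sketch, not the actual simulator modification: the `sim_*` tag names are hypothetical placeholders, and the training IDs (1 to 19, with 0 for unlabeled) are assumed to follow the usual SemanticKITTI class order.

```python
# SemanticKITTI's 19 evaluation classes, in benchmark order.
SEMANTICKITTI_CLASSES = [
    "car", "bicycle", "motorcycle", "truck", "other-vehicle", "person",
    "bicyclist", "motorcyclist", "road", "parking", "sidewalk",
    "other-ground", "building", "fence", "vegetation", "trunk", "terrain",
    "pole", "traffic-sign",
]

UNLABELED = 0  # assumption: ID 0 is reserved for unlabeled points
TRAIN_ID = {name: i + 1 for i, name in enumerate(SEMANTICKITTI_CLASSES)}

# Hypothetical fine-grained simulator tags mapped to SemanticKITTI classes.
# The real tag names and IDs used by the modified simulator are not given here.
SIM_TO_KITTI = {
    "sim_car": "car",
    "sim_truck": "truck",
    "sim_bicycle": "bicycle",
    "sim_motorcycle": "motorcycle",
    "sim_bicycle_rider": "bicyclist",
    "sim_motorcycle_rider": "motorcyclist",
    "sim_pedestrian": "person",
    "sim_road": "road",
}


def remap_labels(sim_tags):
    """Map per-point simulator tags to SemanticKITTI training IDs."""
    ids = []
    for tag in sim_tags:
        kitti_name = SIM_TO_KITTI.get(tag)  # None when the tag is unknown
        ids.append(TRAIN_ID.get(kitti_name, UNLABELED))
    return ids


print(remap_labels(["sim_car", "sim_motorcycle_rider", "sim_unknown"]))
```

Keeping riders (bicyclist, motorcyclist) separate from their vehicles and from pedestrians is the part that a coarse vehicle/person tagging cannot express, which is why the split has to happen at the simulator level.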

The dataset contains 8 sequences of 6,000 scans each, recorded across 7 maps, for a total of 48,000 scans. This is more than twice the number of labeled scans in the SemanticKITTI training subset. We also released a smaller variant, SynthmanticLiDAR-LT, built from the first 2,000 scans of each sequence (16,000 scans in total).
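The sizes above follow directly from the per-sequence counts. The short sketch below reproduces the arithmetic and shows one way to select the LT subset; the helper `lt_scan_indices` is a hypothetical name, and it assumes scans are indexed from 0 within each sequence.

```python
NUM_SEQUENCES = 8
SCANS_PER_SEQUENCE = 6_000
LT_SCANS_PER_SEQUENCE = 2_000

full_total = NUM_SEQUENCES * SCANS_PER_SEQUENCE    # full dataset
lt_total = NUM_SEQUENCES * LT_SCANS_PER_SEQUENCE   # LT variant


def lt_scan_indices(sequence_length=SCANS_PER_SEQUENCE,
                    keep=LT_SCANS_PER_SEQUENCE):
    """Indices of the scans kept for the LT variant of one sequence."""
    return range(min(keep, sequence_length))


print(full_total, lt_total, len(lt_scan_indices()))
```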

To validate the dataset, we trained SqueezeSegV3 and Cylinder3D with a simple transfer setup: pre-train on synthetic data, then fine-tune on real data. Models pre-trained on Synthmantic LiDAR outperformed models trained only on SemanticKITTI, showing that the synthetic scans provide useful supervision for LiDAR semantic segmentation.
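The two-stage schedule (pre-train on synthetic data, then fine-tune on real data) can be illustrated with a deliberately tiny stand-in. This sketch swaps the segmentation networks for a 1-D logistic model in plain Python, and the `synthetic` and `real` lists are made-up toy data; only the structure of the schedule, where the second stage starts from the weights of the first, mirrors the setup described above.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def mean_loss(w, b, data):
    """Mean binary cross-entropy of a 1-D logistic model on (x, y) pairs."""
    eps = 1e-12
    total = 0.0
    for x, y in data:
        p = sigmoid(w * x + b)
        total -= y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps)
    return total / len(data)


def train(w, b, data, epochs, lr):
    """Full-batch gradient descent, starting from the given weights."""
    for _ in range(epochs):
        gw = gb = 0.0
        for x, y in data:
            err = sigmoid(w * x + b) - y
            gw += err * x
            gb += err
        w -= lr * gw / len(data)
        b -= lr * gb / len(data)
    return w, b


# Toy stand-ins: plentiful "synthetic" samples and a small, shifted "real" set.
synthetic = [(x / 10.0, 1 if x > 0 else 0) for x in range(-10, 11) if x != 0]
real = [(-0.8, 0), (-0.3, 0), (0.1, 0), (0.4, 1), (0.9, 1)]

# Stage 1: pre-train from scratch on synthetic data.
w, b = train(0.0, 0.0, synthetic, epochs=200, lr=0.5)
pretrained_loss = mean_loss(w, b, real)

# Stage 2: fine-tune on real data, starting from the pre-trained weights.
w, b = train(w, b, real, epochs=50, lr=0.1)
finetuned_loss = mean_loss(w, b, real)

print(pretrained_loss, finetuned_loss)
```

In an actual experiment each call to `train` would be a full training run of the segmentation network, with the fine-tuning stage initialized from the pre-trained checkpoint rather than from random weights.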

Class               SPVCNN   SPVCNN-F  SPVCNN-LT       SSV3     SSV3-F    SSV3-LT
car                   95.7       95.8       95.7       81.4       84.2       84.4
bicycle               47.7       47.7       45.6       16.0       22.8       20.7
motorcycle            49.5       47.2       44.5       25.3       28.8       26.8
truck                 47.2       48.4       48.0        3.7        4.2        6.2
other-vehicle         48.3       49.0       47.6       13.3       15.6       17.1
person                64.1       63.2       62.6       34.0       38.2       35.5
bicyclist             66.7       69.7       68.6       33.1       33.4       32.4
motorcyclist          48.2       49.0       59.1       13.5        9.0       19.7
road                  88.5       88.9       88.8       88.8       88.1       87.9
parking               57.7       58.4       58.3       52.8       51.2       52.2
sidewalk              70.7       71.4       71.1       68.4       68.9       68.6
other-ground          23.2       24.0       26.8       21.9       21.8       19.9
building              90.1       89.9       90.3       76.1       76.7       77.5
fence                 63.9       63.6       64.7       43.3       44.6       45.2
vegetation            84.5       84.4       84.2       75.6       76.6       76.1
trunk                 67.7       67.2       66.6       44.1       44.9       42.0
terrain               69.0       68.7       68.1       59.9       61.9       62.6
pole                  53.1       54.0       53.2       30.3       31.0       31.7
traffic-sign          62.1       62.6       62.4       30.6       35.3       33.8
mIoU                  63.0       63.3       63.5       42.7       44.1       44.2
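As a consistency check on the results, the reported mIoU of each method should equal the unweighted mean of its 19 per-class IoU values, which is the usual definition for SemanticKITTI. The sketch below recomputes it from the numbers listed above; the dictionary name `PER_CLASS_IOU` and the helper `mean_iou` are illustrative only.

```python
# Per-class IoU values in class order (car ... traffic-sign), copied from the
# results above. Reported mIoU values are kept separately for comparison.
PER_CLASS_IOU = {
    "SPVCNN": [95.7, 47.7, 49.5, 47.2, 48.3, 64.1, 66.7, 48.2, 88.5, 57.7,
               70.7, 23.2, 90.1, 63.9, 84.5, 67.7, 69.0, 53.1, 62.1],
    "SPVCNN-F": [95.8, 47.7, 47.2, 48.4, 49.0, 63.2, 69.7, 49.0, 88.9, 58.4,
                 71.4, 24.0, 89.9, 63.6, 84.4, 67.2, 68.7, 54.0, 62.6],
    "SPVCNN-LT": [95.7, 45.6, 44.5, 48.0, 47.6, 62.6, 68.6, 59.1, 88.8, 58.3,
                  71.1, 26.8, 90.3, 64.7, 84.2, 66.6, 68.1, 53.2, 62.4],
    "SSV3": [81.4, 16.0, 25.3, 3.7, 13.3, 34.0, 33.1, 13.5, 88.8, 52.8,
             68.4, 21.9, 76.1, 43.3, 75.6, 44.1, 59.9, 30.3, 30.6],
    "SSV3-F": [84.2, 22.8, 28.8, 4.2, 15.6, 38.2, 33.4, 9.0, 88.1, 51.2,
               68.9, 21.8, 76.7, 44.6, 76.6, 44.9, 61.9, 31.0, 35.3],
    "SSV3-LT": [84.4, 20.7, 26.8, 6.2, 17.1, 35.5, 32.4, 19.7, 87.9, 52.2,
                68.6, 19.9, 77.5, 45.2, 76.1, 42.0, 62.6, 31.7, 33.8],
}

REPORTED_MIOU = {
    "SPVCNN": 63.0, "SPVCNN-F": 63.3, "SPVCNN-LT": 63.5,
    "SSV3": 42.7, "SSV3-F": 44.1, "SSV3-LT": 44.2,
}


def mean_iou(per_class):
    """Unweighted mean over classes, rounded to one decimal like the table."""
    return round(sum(per_class) / len(per_class), 1)


for method, values in PER_CLASS_IOU.items():
    print(method, mean_iou(values), REPORTED_MIOU[method])
```

Because the mean is unweighted, rare classes such as motorcyclist or other-ground move the mIoU as much as road or car do, which is worth keeping in mind when comparing rows.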

The work was accepted at the 2024 IEEE International Conference on Image Processing (ICIP). The dataset is intended to support research on synthetic data, domain adaptation, and LiDAR semantic segmentation.