Synthmantic LiDAR: A Synthetic Dataset for Semantic Segmentation on LiDAR Imaging

Synthmantic LiDAR is a synthetic dataset for LiDAR semantic segmentation, generated with a custom modification of the CARLA simulator. We adapted the simulator output to match the SemanticKITTI label space, adding missing classes and correcting labels that were inconsistent with the target dataset. The added classes include car, motorcycle, motorcyclist, bicycle, bicyclist, and truck, which the simulator previously grouped under broader vehicle and person categories.
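
As a rough illustration of what aligning labels to the SemanticKITTI label space involves, the sketch below remaps per-point labels through a lookup table. The target IDs follow the published SemanticKITTI label definition; the source tag numbers and the `remap_labels` helper are hypothetical placeholders, and this is not the project's own code.

```python
# SemanticKITTI raw label IDs for the classes named above.
SEMANTIC_KITTI_ID = {
    "unlabeled": 0,
    "car": 10,
    "bicycle": 11,
    "motorcycle": 15,
    "truck": 18,
    "person": 30,
    "bicyclist": 31,
    "motorcyclist": 32,
}

# Hypothetical fine-grained tags emitted by a modified simulator.
# These numbers are placeholders, not real CARLA tag values.
SIM_TAG_TO_CLASS = {
    100: "car",
    101: "truck",
    102: "motorcycle",
    103: "bicycle",
    200: "person",
    201: "motorcyclist",
    202: "bicyclist",
}


def remap_labels(sim_tags):
    """Map per-point simulator tags to SemanticKITTI IDs.

    Tags without an entry in SIM_TAG_TO_CLASS fall back to 'unlabeled' (0).
    """
    unlabeled = SEMANTIC_KITTI_ID["unlabeled"]
    return [
        SEMANTIC_KITTI_ID.get(SIM_TAG_TO_CLASS.get(tag), unlabeled)
        for tag in sim_tags
    ]
```

For example, `remap_labels([100, 202, 999])` maps a car point, a bicyclist point, and an unknown tag to the IDs 10, 31, and 0.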

The dataset contains 8 sequences of 6,000 scans each, recorded across 7 maps, for a total of 48,000 scans. This is more than twice the number of labeled scans in the SemanticKITTI training subset. We also released a smaller variant, SynthmanticLiDAR-LT, built from the first 2,000 scans of each sequence (16,000 scans in total).
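
The scan counts and the construction of the smaller variant can be sketched as follows. The directory layout, the `*.bin` file pattern, and the helper names `first_scans` and `total_scans` are assumptions made for illustration; the released dataset may be organized differently.

```python
from pathlib import Path


def total_scans(num_sequences, scans_per_sequence):
    """Total number of scans when every sequence has the same length."""
    return num_sequences * scans_per_sequence


def first_scans(sequence_dir, limit=2000, pattern="*.bin"):
    """Return the first `limit` scan files of one sequence, in filename order.

    Assumes scan filenames are zero-padded so that lexicographic order
    matches acquisition order (an assumption about the layout).
    """
    scans = sorted(Path(sequence_dir).glob(pattern))
    return scans[:limit]


FULL_SIZE = total_scans(8, 6000)  # 48,000 scans in the full dataset
LT_SIZE = total_scans(8, 2000)    # 16,000 scans in the LT variant
```

Applying `first_scans` to each sequence directory yields a list of files that could be copied or symlinked into a separate subset folder.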

To validate the dataset, we trained SqueezeSegV3 and Cylinder3D with a simple transfer setup: pre-train on synthetic data, then fine-tune on real data. Models pre-trained on Synthmantic LiDAR outperformed models trained only on SemanticKITTI, showing that the synthetic scans provide useful supervision for LiDAR semantic segmentation.

Method	car	bicycle	motorcycle	truck	other-vehicle	person	bicyclist	motorcyclist	road	parking	sidewalk	other-ground	building	fence	vegetation	trunk	terrain	pole	traffic-sign	mIoU
SPVCNN	95.7	47.7	49.5	47.2	48.3	64.1	66.7	48.2	88.5	57.7	70.7	23.2	90.1	63.9	84.5	67.7	69.0	53.1	62.1	63.0
SPVCNN-F	95.8	47.7	47.2	48.4	49.0	63.2	69.7	49.0	88.9	58.4	71.4	24.0	89.9	63.6	84.4	67.2	68.7	54.0	62.6	63.3
SPVCNN-LT	95.7	45.6	44.5	48.0	47.6	62.6	68.6	59.1	88.8	58.3	71.1	26.8	90.3	64.7	84.2	66.6	68.1	53.2	62.4	63.5

SSV3	81.4	16.0	25.3	3.7	13.3	34.0	33.1	13.5	88.8	52.8	68.4	21.9	76.1	43.3	75.6	44.1	59.9	30.3	30.6	42.7
SSV3-F	84.2	22.8	28.8	4.2	15.6	38.2	33.4	9.0	88.1	51.2	68.9	21.8	76.7	44.6	76.6	44.9	61.9	31.0	35.3	44.1
SSV3-LT	84.4	20.7	26.8	6.2	17.1	35.5	32.4	19.7	87.9	52.2	68.6	19.9	77.5	45.2	76.1	42.0	62.6	31.7	33.8	44.2
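
The mIoU column is the unweighted mean of the 19 per-class IoU values in each row. The short check below recomputes it from the rows of the table; the names `ROWS`, `REPORTED`, and `mean_iou` are introduced here for illustration only.

```python
# Per-class IoU values (%) copied from the table, in column order.
ROWS = {
    "SPVCNN": [95.7, 47.7, 49.5, 47.2, 48.3, 64.1, 66.7, 48.2, 88.5, 57.7,
               70.7, 23.2, 90.1, 63.9, 84.5, 67.7, 69.0, 53.1, 62.1],
    "SPVCNN-F": [95.8, 47.7, 47.2, 48.4, 49.0, 63.2, 69.7, 49.0, 88.9, 58.4,
                 71.4, 24.0, 89.9, 63.6, 84.4, 67.2, 68.7, 54.0, 62.6],
    "SPVCNN-LT": [95.7, 45.6, 44.5, 48.0, 47.6, 62.6, 68.6, 59.1, 88.8, 58.3,
                  71.1, 26.8, 90.3, 64.7, 84.2, 66.6, 68.1, 53.2, 62.4],
    "SSV3": [81.4, 16.0, 25.3, 3.7, 13.3, 34.0, 33.1, 13.5, 88.8, 52.8,
             68.4, 21.9, 76.1, 43.3, 75.6, 44.1, 59.9, 30.3, 30.6],
    "SSV3-F": [84.2, 22.8, 28.8, 4.2, 15.6, 38.2, 33.4, 9.0, 88.1, 51.2,
               68.9, 21.8, 76.7, 44.6, 76.6, 44.9, 61.9, 31.0, 35.3],
    "SSV3-LT": [84.4, 20.7, 26.8, 6.2, 17.1, 35.5, 32.4, 19.7, 87.9, 52.2,
                68.6, 19.9, 77.5, 45.2, 76.1, 42.0, 62.6, 31.7, 33.8],
}

# mIoU values (%) as reported in the last column of the table.
REPORTED = {
    "SPVCNN": 63.0, "SPVCNN-F": 63.3, "SPVCNN-LT": 63.5,
    "SSV3": 42.7, "SSV3-F": 44.1, "SSV3-LT": 44.2,
}


def mean_iou(per_class_iou):
    """Unweighted mean over classes, the usual definition of mIoU."""
    return sum(per_class_iou) / len(per_class_iou)


for name, values in ROWS.items():
    # Each recomputed mean should match the reported mIoU to one decimal.
    print(name, round(mean_iou(values), 1), REPORTED[name])
```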

The work was accepted at the 2024 IEEE International Conference on Image Processing (ICIP). The dataset is intended to support research on synthetic data, domain adaptation, and LiDAR semantic segmentation.