This repo contains the code for the 3D semantic scene generation method proposed in the paper: "Towards Generating Realistic 3D Semantic Training Data for Autonomous Driving"
In this paper we propose a 3D semantic scene generation method that requires neither image projections nor training multiple decoupled VAE and DDPM models. By training the VAE and the DDPM as a single model, we achieve more realistic scene generation than previous methods. In the paper we also show that training a semantic segmentation network on real data together with scenes generated by our method improves its performance on the semantic segmentation task.
- We released v2 of the model in the layers_sup branch
- We released an infinite city generation pipeline script
Install the Python package prerequisites (we used Python 3.9):
sudo apt install build-essential python3-dev libopenblas-dev
pip install -r requirements.txt
Installing MinkowskiEngine:
pip install -U MinkowskiEngine==0.5.4 --install-option="--blas=openblas" -v --no-deps
To set up the code, run the following command from the repository's main directory:
pip install -U -e .
You can also install the dependencies within a conda environment:
conda create --name 3diss python=3.9 && conda activate 3diss
Then, as before, install the Python package prerequisites:
sudo apt install build-essential python3-dev libopenblas-dev
pip install -r requirements.txt
And install MinkowskiEngine:
pip install -U git+https://github.com/NVIDIA/MinkowskiEngine -v --no-deps
NOTE: At the moment, MinkowskiEngine is not compatible with Python 3.10+; see this issue.
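To quickly verify the installation, you can check that PyTorch and MinkowskiEngine import correctly (a simple sanity check, not required by the repo):

```python
# Quick sanity check that the environment is set up (illustrative only).
import torch
import MinkowskiEngine as ME

print("torch:", torch.__version__, "| CUDA available:", torch.cuda.is_available())
print("MinkowskiEngine:", ME.__version__)
```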
The SemanticKITTI dataset has to be downloaded from the official website and extracted into the following structure:
./diss/
└── data/
    └── SemanticKITTI
        └── dataset
            └── sequences
                ├── 00/
                │   ├── velodyne/
                │   │   ├── 000000.bin
                │   │   ├── 000001.bin
                │   │   └── ...
                │   └── labels/
                │       ├── 000000.label
                │       ├── 000001.label
                │       └── ...
                ├── 08/ # for validation
                ├── 11/ # 11-21 for testing
                └── 21/
                    └── ...
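As a quick sanity check of the layout above, the snippet below reads a single scan and its labels (an illustrative sketch, not part of the repo; the sequence and scan IDs are just examples):

```python
# Illustrative check of the SemanticKITTI layout (not part of the repo).
import os
import numpy as np

root = "./diss/data/SemanticKITTI/dataset/sequences"
seq, scan_id = "00", "000000"  # example sequence and scan

# Each velodyne scan is a flat float32 array of (x, y, z, intensity) tuples.
scan = np.fromfile(os.path.join(root, seq, "velodyne", f"{scan_id}.bin"),
                   dtype=np.float32).reshape(-1, 4)

# Each label file stores one uint32 per point; the lower 16 bits encode the
# semantic class and the upper 16 bits the instance id.
labels = np.fromfile(os.path.join(root, seq, "labels", f"{scan_id}.label"),
                     dtype=np.uint32)
sem_labels = labels & 0xFFFF

assert scan.shape[0] == sem_labels.shape[0], "scan and labels must match"
print(scan.shape[0], "points,", np.unique(sem_labels).size, "semantic classes")
```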
For the poses, we use PIN-SLAM to compute them. You can download the poses from here and extract them to ./diss/data/SemanticKITTI/datasets/sequences/pin_slam_poses.
To generate the complete ground truth scenes, you can run the sem_map_from_scans.py script. This will use the dataset scans and poses to generate the sequence map used as ground truth during training:
python tools/sem_map_from_scans.py
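For intuition, the sketch below shows one way such a sequence map can be aggregated from the labeled scans and the poses. It is a simplified illustration, not the script's actual implementation (which may additionally handle calibration, moving objects, or voxel downsampling), and it assumes KITTI-style pose files with 12 values per line forming a 3x4 transformation matrix:

```python
# Simplified sketch of aggregating labeled scans into a sequence map.
# NOT the repo's implementation; sem_map_from_scans.py may differ.
import numpy as np

def load_poses(pose_file):
    """Assumes KITTI-style poses: 12 floats per line, i.e. a 3x4 matrix."""
    poses = []
    for line in open(pose_file):
        if not line.strip():
            continue
        T = np.eye(4)
        T[:3, :4] = np.array(line.split(), dtype=np.float64).reshape(3, 4)
        poses.append(T)
    return poses

def aggregate_map(scan_files, label_files, poses):
    points, labels = [], []
    for scan_f, label_f, T in zip(scan_files, label_files, poses):
        scan = np.fromfile(scan_f, dtype=np.float32).reshape(-1, 4)[:, :3]
        sem = np.fromfile(label_f, dtype=np.uint32) & 0xFFFF
        # transform the scan from the sensor frame into the map frame
        hom = np.hstack([scan, np.ones((len(scan), 1), dtype=np.float32)])
        points.append((hom @ T.T)[:, :3])
        labels.append(sem)
    return np.concatenate(points), np.concatenate(labels)
```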
Once the sequence map is generated, you can train the VAE and diffusion models.
To train the VAE you can run the following command:
python vae_train.py
By default, the config is set as used in the paper, training with batch size 2 on 6 NVIDIA A40 GPUs. If you want to change the VAE training config, you can edit the config/vae.yaml file.
After the VAE is trained, you can run the VAE refinement training with:
python vae_train.py --weights VAE_CKPT --config config/vae_refine.yaml
This will run the refinement training only on the VAE decoder weights.
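Conceptually, this stage freezes the encoder and only optimizes the decoder. A generic PyTorch sketch of that pattern is shown below; the encoder/decoder attribute names and the optimizer settings are illustrative assumptions, not the repo's actual API:

```python
# Generic decoder-only refinement pattern (illustrative; the attribute names
# `encoder`/`decoder` and the learning rate are assumptions).
import torch

def build_refine_optimizer(vae: torch.nn.Module) -> torch.optim.Optimizer:
    for p in vae.encoder.parameters():
        p.requires_grad = False               # keep the encoder fixed
    return torch.optim.Adam(vae.decoder.parameters(), lr=1e-4)
```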
After the VAE is trained, you can run the following command to train the unconditional DDPM:
python diff_train.py --vae_weights VAE_CKPT
By default, the diffusion model is trained as an unconditional DDPM with the configuration used in the paper, on 8 NVIDIA A40 GPUs. If you want to change the configuration, you can edit the config/diff.yaml file.
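For reference, a minimal, standard DDPM training step over the VAE latents looks roughly like the sketch below; the actual noise schedule, parameterization, and loss defined in config/diff.yaml may differ:

```python
# Standard DDPM training objective on latent codes, for intuition only.
import torch
import torch.nn.functional as F

T = 1000
betas = torch.linspace(1e-4, 2e-2, T)
alphas_bar = torch.cumprod(1.0 - betas, dim=0)

def ddpm_loss(denoiser, z0):
    """z0: clean latents from the (frozen) VAE encoder; denoiser(z_t, t) -> eps."""
    t = torch.randint(0, T, (z0.shape[0],), device=z0.device)
    eps = torch.randn_like(z0)
    a_bar = alphas_bar.to(z0.device)[t].view(-1, *([1] * (z0.dim() - 1)))
    z_t = a_bar.sqrt() * z0 + (1.0 - a_bar).sqrt() * eps  # forward noising q(z_t | z_0)
    return F.mse_loss(denoiser(z_t, t), eps)               # predict the added noise
```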
For the LiDAR scan conditioned training, you can run:
python diff_train.py --vae_weights VAE_CKPT --config config/diff_cond_config.yaml --condition single_scan
This will train the model conditioned on the dataset LiDAR point clouds.
You can download the trained model weights from the following links:
For the unconditional scene generation, we provide a pipeline that loads both the trained diffusion and VAE models and uses them to generate a novel scene. You can run the pipeline with:
python tools/diff_pipeline.py --diff DIFF_CKPT --vae VAE_REFINE_CKPT
To run the pipeline for the conditional scene generation you can run:
python tools/diff_pipeline.py --path PATH_TO_SCANS --diff DIFF_CKPT --vae VAE_REFINE_CKPT --condition single_scan
The generated point cloud will be saved in results/{EXPERIMENT}/diff_x0.
To visualize the generated point clouds, we provide a visualization tool that can be used as follows:
python tools/pcd_vis.py --path results/{EXPERIMENT}/diff_x0
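Alternatively, you can inspect the results directly with Open3D; the sketch below assumes the generated scenes are exported as .ply point clouds (check the actual file extension in the diff_x0 folder) and that EXPERIMENT is replaced by your experiment name:

```python
# Optional: browse the generated scenes with Open3D (assumes .ply output).
import glob
import open3d as o3d

for path in sorted(glob.glob("results/EXPERIMENT/diff_x0/*.ply")):
    pcd = o3d.io.read_point_cloud(path)
    print(path, "-", len(pcd.points), "points")
    o3d.visualization.draw_geometries([pcd], window_name=path)
```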
If you use this repo, please cite as:
@article{nunes2025arxiv,
  author  = {Lucas Nunes and Rodrigo Marcuzzi and Jens Behley and Cyrill Stachniss},
  title   = {{Towards Generating Realistic 3D Semantic Training Data for Autonomous Driving}},
  journal = {arXiv preprint},
  year    = {2025},
  volume  = {arXiv:2503.21449},
}