Learning from Synchronization: Self-Supervised Uncalibrated Multi-View Person Association in Challenging Scenes
Keqi Chen, Vinkle Srivastav, Didier Mutter, Nicolas Padoy, CVPR 2025
Sample outputs of Self-MVA on the WILDTRACK / MVOR / SOLDIERS datasets.
- Training and testing code for Self-MVA, a state-of-the-art self-supervised uncalibrated multi-view person association method.
- Trained models on the WILDTRACK / MVOR / SOLDIERS datasets.
- Clone this repo; we'll refer to the cloned directory as ${ROOT_DIR}.
- Install dependencies.
> conda create -n selfmva python=3.9
> conda activate selfmva
(selfmva)> conda install pytorch==1.13.1 torchvision==0.14.1 pytorch-cuda=11.7 -c pytorch -c nvidia
(selfmva)> pip install -r requirements.txt
- Install Torchreid following deep-person-reid (an optional environment sanity check is sketched below).
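Optionally, you can verify the environment with a minimal check. This is not part of the official setup; the expected versions in the comments are simply the ones installed above:

```python
# Optional sanity check for the selfmva environment.
import torch
import torchvision
import torchreid  # installed from deep-person-reid

print("torch:", torch.__version__)               # expected: 1.13.1
print("torchvision:", torchvision.__version__)   # expected: 0.14.1
print("torchreid:", torchreid.__version__)
print("CUDA available:", torch.cuda.is_available())
```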
- Download the WILDTRACK dataset and place it in `./data/` as shown below (an optional check of the provided detection file is sketched after the preprocessing command):
```
${ROOT_DIR}
|-- data
    |-- Wildtrack
        |-- sequence1
            |-- output
            |   |-- detections_train.json
            |-- src
            |   |-- annotations_positions
            |   |-- Image_subsets
```
- Run the preprocessing command:
```
python ./ssl/preprocess.py --dataset Wildtrack
```
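Optionally, you can peek at the provided detection file. This is an illustrative sketch that assumes `detections_train.json` follows the per-frame box format described in the custom-dataset section further below:

```python
# Illustrative: inspect the provided WILDTRACK detection file.
# Assumes the {frame_id: [[x1, y1, width, height, tracking_id, camera_id], ...], ...}
# layout described in the custom-dataset section below.
import json

with open("./data/Wildtrack/sequence1/output/detections_train.json") as f:
    detections = json.load(f)

print("number of frames:", len(detections))
first_frame = next(iter(detections))
print("example frame:", first_frame, "->", detections[first_frame][0])
```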
- Download the MVOR dataset and place it in `./data/` as:
```
${ROOT_DIR}
|-- data
    |-- MVOR
        |-- sequence1
            |-- output
            |   |-- detections_train.json
            |-- camma_mvor_dataset
                |-- day1
                |-- day2
                |-- day3
                |-- day4
                |-- camma_mvor_2018_v2.json
```
- Run the preprocessing command:
```
python ./ssl/preprocess.py --dataset MVOR
```
- The original SOLDIERS dataset consists of unprocessed videos. For convenience, we provide the processed frames and annotations. Download and unzip them in `./data/` as:
```
${ROOT_DIR}
|-- data
    |-- soldiers
        |-- output
            |-- frames
            |-- annotated_box_test.json
            |-- detections_train.json
```
- Download the pre-trained models as follows.
> wget https://s3.unistra.fr/camma_public/github/Self-MVA/weights/prompt_vit_h_4b8939.pth
> wget https://s3.unistra.fr/camma_public/github/Self-MVA/weights/osnet_ain_ms_d_c.pth.tar
- Download the trained models on the three datasets.

| Model | Model Weights |
| --- | --- |
| WILDTRACK | download |
| MVOR | download |
| SOLDIERS | download |
The directory tree should look like this:
```
${ROOT_DIR}
|-- weights
|   |-- mvor_edge_geometry_reid.pth.tar
|   |-- osnet_ain_ms_d_c.pth.tar
|   |-- prompt_vit_h_4b8939.pth
|   |-- soldiers_geometry.pth.tar
|   |-- wildtrack_edge_geometry_reid.pth.tar
```
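To quickly confirm that a checkpoint downloaded correctly, you can load it on CPU and list its top-level keys. This is only an illustrative inspection using a file name from the tree above; it does not assume any particular checkpoint layout:

```python
# Illustrative check: load a downloaded checkpoint on CPU and inspect it.
import torch

ckpt = torch.load("./weights/wildtrack_edge_geometry_reid.pth.tar", map_location="cpu")
print(type(ckpt))
if isinstance(ckpt, dict):
    print("top-level keys:", list(ckpt.keys()))
```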
- Train Self-MVA on WILDTRACK, MVOR, or SOLDIERS:
```
python ssl/main.py --cfg configs/wildtrack.yaml
python ssl/main.py --cfg configs/mvor.yaml
python ssl/main.py --cfg configs/soldiers.yaml
```
- Test the trained models:
```
python ssl/main.py --test --cfg configs/wildtrack.yaml
python ssl/main.py --test --cfg configs/mvor.yaml
python ssl/main.py --test --cfg configs/soldiers.yaml
```
Currently, we do not provide an API for custom dataset training. If you want to use Self-MVA on your own dataset, follow the steps below:
- Prepare multi-view images named as `{frame_id}_{camera_id}.jpg`.
- Generate human bounding boxes with any off-the-shelf detector of decent performance, and save the results as a json file in the format `{frame_id: [[x1, y1, width, height, tracking_id (-1), camera_id], ...], ...}` (a minimal writer sketch is given after this list).
- Manually register the dataset in `./ssl/dataset.py`.
- Write the config file.
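As an illustration of the expected detection file, the following minimal sketch writes detector outputs in the layout above. The `build_detection_file` helper, its input structure, and the dummy boxes are hypothetical, not part of Self-MVA:

```python
# Minimal sketch: write a detection file in the expected layout
# {frame_id: [[x1, y1, width, height, tracking_id (-1), camera_id], ...], ...}
import json

def build_detection_file(detections_per_view, out_path="detections_train.json"):
    """detections_per_view maps (frame_id, camera_id) -> list of (x1, y1, width, height) boxes."""
    result = {}
    for (frame_id, camera_id), boxes in detections_per_view.items():
        entries = result.setdefault(str(frame_id), [])  # json keys are strings
        for (x1, y1, w, h) in boxes:
            entries.append([x1, y1, w, h, -1, camera_id])  # -1: tracking id not available
    with open(out_path, "w") as f:
        json.dump(result, f)

# Usage with dummy boxes (replace with real detector outputs):
dummy = {
    (0, 0): [(10.0, 20.0, 50.0, 120.0)],
    (0, 1): [(300.0, 40.0, 60.0, 140.0)],
}
build_detection_file(dummy)
```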
If you use our code or models in your research, please cite with:
```
@InProceedings{Chen_2025_CVPR,
    author    = {Chen, Keqi and Srivastav, Vinkle and Mutter, Didier and Padoy, Nicolas},
    title     = {Learning from Synchronization: Self-Supervised Uncalibrated Multi-View Person Association in Challenging Scenes},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    month     = {June},
    year      = {2025}
}
```
The project uses segment-anything, ReST, and MvMHAT. We thank the authors for releasing their code.
This code and the models are available for non-commercial scientific research purposes as defined in CC BY-NC-SA 4.0. By downloading and using this code you agree to the terms in the LICENSE. Third-party code is subject to its respective license.