
Commit 369e340

Add README, clean-up and fix output dirs

Author: Guillem Braso Andilla (committed)
1 parent 71c6960 commit 369e340

13 files changed (+286, -104 lines)

README.md

Lines changed: 69 additions & 12 deletions
@@ -1,27 +1,84 @@
# MOTSynth Baselines
-This repository provides baseline implementations for object detection, segmentation and tracking on the MOTSynth dataset.
-
-
-Pretrained models and complete instructions will be released soon after the ECCV deadline (7th of March).
+This repository provides download instructions and helper code for the [MOTSynth dataset](https://arxiv.org/abs/2108.09518), as well as baseline implementations for object detection, segmentation and tracking.

+Check out our:
+- [ICCV 2021 paper](https://openaccess.thecvf.com/content/ICCV2021/html/Fabbri_MOTSynth_How_Can_Synthetic_Data_Help_Pedestrian_Detection_and_Tracking_ICCV_2021_paper.html)
+- [5 min. video](https://www.youtube.com/watch?v=dc_Z1iCceL4)
+- [Dataset page](https://motchallenge.net/data/MOTSynth-MOT-CVPR22/)
+- [Project Page](https://aimagelab.ing.unimore.it/imagelab/page.asp?IdPage=42)

> ![Method Visualization](teaser_github.png)


# Installation:
-TODO
+See [docs/INSTALL.md](docs/INSTALL.md)

-# Data Preparation:
-TODO
+# Dataset Download and Preparation:
+See [docs/DATA_PREPARATION.md](docs/DATA_PREPARATION.md)

-# Object Detection:
-TODO
+# Object Detection (and Instance Segmentation):
+We adapt [torchvision's detection reference code](https://github.com/pytorch/vision/tree/main/references/detection) to train [Mask R-CNN](https://arxiv.org/abs/1703.06870) on MOTSynth. To train Mask R-CNN with a ResNet50-FPN backbone, you can run the following:
+```
+NUM_GPUS=3
+PORT=1234
+python -m torch.distributed.launch --nproc_per_node=$NUM_GPUS --use_env --master_port=$PORT tools/train_detector.py \
+    --model maskrcnn_resnet50_fpn \
+    --batch-size 5 --world-size $NUM_GPUS --trainable-backbone-layers 1 --backbone resnet50 --train-dataset train --epochs 10
+```
+If you use a different number of GPUs (`$NUM_GPUS`), please adapt your learning rate or modify your batch size so that the overall batch size stays at 15 (3 GPUs with 5 images per GPU).

-# ReID:
-TODO
+Our trained model can be downloaded [here](https://vision.in.tum.de/webshare/u/brasoand/motsynth/maskrcnn_resnet50_fpn_epoch_10.pth).

# Multi-Object Tracking:
+We use our Mask R-CNN model trained on MOTSynth to test [Tracktor](https://arxiv.org/abs/1903.05625) for tracking on MOT17.
+
+To produce results for MOT17 train, you can run the following:
+```
+python tools/test_tracktor.py
+```
+This model should yield the following results:
TODO

# Multi-Object Tracking and Segmentation:
-TODO
+We provide a simple baseline for MOTS. We run Tracktor with our trained Mask R-CNN detector, and use Mask R-CNN's segmentation head to produce a segmentation mask for every output bounding box.
+
+To evaluate this model on MOTS20, you can run the following:
+```
+python tools/test_tracktor.py mots.do_mots=True mots.mots20_only=True
+```
+This model should yield the following results on MOT17 train:
+```
+         IDF1 IDP IDR Rcll Prcn GT MT PT ML FP FN IDs FM MOTA MOTP IDt IDa IDm
+MOT17-02 35.2% 51.7% 26.7% 38.9% 75.4% 62 8 27 27 2361 11353 99 152 25.7% 0.251 28 78 8
+MOT17-04 55.5% 65.9% 48.0% 63.2% 86.8% 83 29 33 21 4569 17524 93 245 53.3% 0.204 23 75 5
+MOT17-05 62.2% 78.4% 51.6% 59.0% 89.6% 133 30 71 32 473 2834 41 90 51.6% 0.242 29 27 16
+MOT17-09 47.4% 51.9% 43.6% 67.0% 79.8% 26 10 15 1 903 1757 51 69 49.1% 0.230 21 34 6
+MOT17-10 42.1% 60.1% 32.4% 49.1% 91.1% 57 12 23 22 614 6534 146 326 43.2% 0.240 13 129 4
+MOT17-11 57.7% 70.4% 48.9% 63.0% 90.7% 75 23 22 30 607 3491 31 43 56.2% 0.197 7 26 2
+MOT17-13 39.9% 64.7% 28.8% 38.4% 86.2% 110 17 47 46 717 7168 88 151 31.5% 0.253 42 67 23
+OVERALL  49.7% 63.7% 40.8% 54.9% 85.7% 546 129 238 179 10244 50661 549 1076 45.3% 0.220 163 436 64
+```
+
+# Person Re-Identification
+We treat MOTSynth and MOT17 as ReID datasets by sampling 1 in 60 frames and treating each pedestrian as a unique identity. We use the excellent [torchreid](https://github.com/KaiyangZhou/deep-person-reid/tree/master/torchreid) framework to train our models.
+
+You can train our baseline ReID model with a ResNet50 on MOTSynth (and evaluate it on MOT17 train) by running:
+```
+python tools/main_reid.py --config-file configs/r50_fc512_motsynth_train.yaml
+```
+The resulting checkpoint can be downloaded [here](https://vision.in.tum.de/webshare/u/brasoand/motsynth/resnet50_fc512_reid_epoch_19.pth).
+
+
+# Acknowledgements
+This codebase is built on top of several great works. Our detection code is minimally modified from [torchvision's detection reference code](https://github.com/pytorch/vision/tree/main/references/detection). For MOT, we directly use [Tracktor's codebase](https://github.com/phil-bergmann/tracking_wo_bnw), and for ReID, we use the great [torchreid](https://github.com/KaiyangZhou/deep-person-reid/tree/master/torchreid) framework. [Orçun Cetintas](https://github.com/ocetintas/) also helped with the MOTS postprocessing code. We thank all the authors of these codebases for their amazing work.
+
+# Citation:
+If you find MOTSynth useful in your research, please cite our publication:
+```
+@inproceedings{fabbri21iccv,
+  title = {MOTSynth: How Can Synthetic Data Help Pedestrian Detection and Tracking?},
+  author = {Matteo Fabbri and Guillem Bras{\'o} and Gianluca Maugeri and Aljo{\v{s}}a O{\v{s}}ep and Riccardo Gasparini and Orcun Cetintas and Simone Calderara and Laura Leal-Taix{\'e} and Rita Cucchiara},
+  booktitle = {International Conference on Computer Vision (ICCV)},
+  year = {2021}
+}
+```
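
As a side note to the Object Detection section in the diff above: the released `maskrcnn_resnet50_fpn_epoch_10.pth` checkpoint can be loaded back into torchvision for inference roughly as sketched below. This is a hedged sketch, not part of the repo: the `"model"` key and the 2-class head (background + pedestrian) are assumptions based on how torchvision's reference training script typically saves checkpoints, so adjust if the file is laid out differently.

```python
# Hedged sketch: load the released Mask R-CNN checkpoint for inference with
# torchvision. The "model" key and num_classes=2 (background + pedestrian)
# are assumptions, not guaranteed by this commit.
import torch
import torchvision

model = torchvision.models.detection.maskrcnn_resnet50_fpn(num_classes=2)
ckpt = torch.load("maskrcnn_resnet50_fpn_epoch_10.pth", map_location="cpu")
state_dict = ckpt["model"] if isinstance(ckpt, dict) and "model" in ckpt else ckpt
model.load_state_dict(state_dict)
model.eval()

# Run on a dummy image: the output dict holds boxes, labels, scores and masks
with torch.no_grad():
    out = model([torch.rand(3, 800, 800)])[0]
print(out["boxes"].shape, out["scores"].shape, out["masks"].shape)
```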

configs/r50_fc512_motsynth_train.yaml

Lines changed: 6 additions & 10 deletions
@@ -4,13 +4,13 @@ model:

data:
  type: 'image'
-  sources: ['motsynth_train_mini']
+  sources: ['motsynth_split_1_mini']
  targets: ['mot17']
  height: 256
  width: 128
  combineall: False
  transforms: ['random_flip']
-  save_dir: 'log/resnet50_fc512_motsynth_train_new_data'
+  save_dir: 'resnet50_fc512_motsynth_train'

loss:
  name: 'softmax'
@@ -19,17 +19,13 @@ loss:

train:
  optim: 'amsgrad'
-  #lr: 0.0006
-  lr: 0.0036
-  max_epoch: 120
-  #batch_size: 32
-  batch_size: 196
+  lr: 0.0009
+  max_epoch: 19
+  batch_size: 180
  fixbase_epoch: 5
-  #fixbase_epoch: 0
-  #open_layers: ['classifier']
  open_layers: ['fc', 'classifier']
  lr_scheduler: 'single_step'
-  stepsize: [60]
+  stepsize: [15]

test:
  batch_size: 224
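
For orientation, the fields in this config map onto torchreid's Python API roughly as in the sketch below. This is illustrative only: the repo's own entry point is `tools/main_reid.py`, which registers the custom `motsynth_split_1_mini` / `mot17` ReID datasets, so the sketch uses torchreid's built-in `market1501` dataset as a stand-in, and the exact wiring in this repo may differ.

```python
# Illustrative sketch: how the YAML fields above map onto torchreid's API.
# Uses market1501 as a stand-in for the repo's custom ReID datasets.
import torch
import torchreid

datamanager = torchreid.data.ImageDataManager(
    root="reid-data",
    sources="market1501",   # stand-in for 'motsynth_split_1_mini'
    targets="market1501",   # stand-in for 'mot17'
    height=256,
    width=128,
    batch_size_train=180,
    batch_size_test=224,
    transforms=["random_flip"],
    combineall=False,
)

model = torchreid.models.build_model(
    name="resnet50_fc512",  # ResNet50 backbone with a 512-d fc embedding head
    num_classes=datamanager.num_train_pids,
    loss="softmax",
    pretrained=True,
)
if torch.cuda.is_available():
    model = model.cuda()

optimizer = torchreid.optim.build_optimizer(model, optim="amsgrad", lr=0.0009)
scheduler = torchreid.optim.build_lr_scheduler(
    optimizer, lr_scheduler="single_step", stepsize=15
)

engine = torchreid.engine.ImageSoftmaxEngine(
    datamanager, model, optimizer=optimizer, scheduler=scheduler,
    use_gpu=torch.cuda.is_available(),
)
engine.run(
    save_dir="resnet50_fc512_motsynth_train",
    max_epoch=19,
    fixbase_epoch=5,                    # freeze the backbone for the first epochs...
    open_layers=["fc", "classifier"],   # ...training only these layers meanwhile
)
```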

configs/r50_fc512_motsynth_train_dflt.yaml

Lines changed: 0 additions & 42 deletions
This file was deleted.

configs/tracktor.yaml

Lines changed: 6 additions & 16 deletions
@@ -7,22 +7,12 @@ seed: 12345
network: fpn

mots:
-  do_mots: False
-  maskrcnn_model: /storage/user/brasoand/MOTSynth_train_1_trainable_layer_mini/model_4.pth
-  mots20_only: True
+  do_mots: False # determines whether segmentation masks are also generated during tracking
+  maskrcnn_model: maskrcnn_resnet50_fpn_epoch_10.pth # Mask RCNN checkpoint used to obtain masks. It is expected to be an absolute path or a rel path at ${OUTPUT_DIR}/models
+  mots20_only: True # if mots.do_mots is set to True, determines whether masks are generated for all sequences or only those in MOTS20

-
-# frcnn
-# obj_detect_weights: output/frcnn/res101/mot_2017_train/180k/res101_faster_rcnn_iter_180000.pth
-# obj_detect_config: output/frcnn/res101/mot_2017_train/180k/sacred_config.yaml
-
-# fpn
-obj_detect_models: /storage/user/brasoand/MOTSynth_train_1_trainable_layer_mini/model_4.pth
-# obj_detect_model: output/faster_rcnn_fpn/faster_rcnn_fpn_training_mot_20/model_epoch_27.model
-
-#reid_models: /usr/wiss/brasoand/motsynth-baselines/log/resnet50_fc512_motsyn4_softmax/model/model.pth.tar-5
-#reid_models: /storage/slurm/brasoand/motsynth_output/reid/r50_split_1_ep150.pth
-reid_models: /usr/wiss/brasoand/motsynth-baselines/log/resnet50_fc512_motsynth_split_3/model/model.pth.tar-95
+obj_detect_models: maskrcnn_resnet50_fpn_epoch_10.pth # Mask RCNN checkpoint used by Tracktor. It is expected to be an absolute path or rel path at ${OUTPUT_DIR}/models
+reid_models: resnet50_fc512_reid_epoch_19.pth # ReID model checkpoint used by Tracktor. It is expected to be at ${OUTPUT_DIR}/models

interpolate: False
# [False, 'debug', 'pretty']
@@ -41,7 +31,7 @@ frame_range:

tracker:
  # FRCNN score threshold for detections
-  detection_person_thresh: 0.5
+  detection_person_thresh: 0.95 # Only modification over the original config. A high threshold is needed to avoid FPs
  # FRCNN score threshold for keeping the track alive
  regression_person_thresh: 0.5
  # NMS threshold for detection
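
The two checkpoint fields added above follow the convention described in their comments: an absolute path is used as-is, anything else is looked up under `${OUTPUT_DIR}/models`. The following is a minimal sketch of that resolution rule only; the helper is hypothetical and `OUTPUT_DIR` is assumed to come from `configs/path_cfg.py`, which may differ from the repo's actual code.

```python
# Minimal sketch of the checkpoint-path convention described above.
# Hypothetical helper, not the repo's actual code.
import os

from configs.path_cfg import OUTPUT_DIR  # assumed to be defined there


def resolve_checkpoint(path: str) -> str:
    """Return an absolute checkpoint path, resolving bare names under ${OUTPUT_DIR}/models."""
    if os.path.isabs(path):
        return path
    return os.path.join(OUTPUT_DIR, "models", path)


# e.g. resolve_checkpoint("maskrcnn_resnet50_fpn_epoch_10.pth")
#   -> "<OUTPUT_DIR>/models/maskrcnn_resnet50_fpn_epoch_10.pth"
```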

docs/DATA_PREPARATION.md

Lines changed: 156 additions & 0 deletions
@@ -0,0 +1,156 @@

# Data preparation
## Setup
- You can optionally modify `MOTCHA_PATH`, `MOTSYNTH_PATH` and `OUTPUT_DIR` in `configs/path_cfg.py` to point to your MOT17 directory, your MOTSynth directory, and your train/eval output directory, respectively.
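
Since `configs/path_cfg.py` is referenced by the commands below but its contents are not shown in this commit, here is a minimal sketch of what such a module could look like. The download commands only rely on `MOTSYNTH_ROOT` and `MOTCHA_ROOT` being importable from it (the Setup note above calls the variables `MOTSYNTH_PATH`/`MOTCHA_PATH`); the actual names and defaults in the repo may differ.

```python
# configs/path_cfg.py -- illustrative sketch only; the file shipped with the
# repo may define these differently. The commands below only need
# MOTSYNTH_ROOT and MOTCHA_ROOT to be importable from this module.
import os

# Where the MOTSynth frames/annotations will live
MOTSYNTH_ROOT = os.environ.get("MOTSYNTH_ROOT", "/storage/datasets/MOTSynth")
# Where MOT17 (and its COCO-style annotations) will live
MOTCHA_ROOT = os.environ.get("MOTCHA_ROOT", "/storage/datasets/MOTChallenge")
# Where training/evaluation outputs (checkpoints, logs) are written
OUTPUT_DIR = os.environ.get("OUTPUT_DIR", "./output")
```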

## Downloading and preparing MOTSynth

1. Download and extract all MOTSynth videos. This will take a while...
```
MOTSYNTH_ROOT=$(python -c "from configs.path_cfg import MOTSYNTH_ROOT; print(MOTSYNTH_ROOT);")
wget -P $MOTSYNTH_ROOT https://motchallenge.net/data/MOTSynth_1.zip
wget -P $MOTSYNTH_ROOT https://motchallenge.net/data/MOTSynth_2.zip
wget -P $MOTSYNTH_ROOT https://motchallenge.net/data/MOTSynth_3.zip

unzip $MOTSYNTH_ROOT/MOTSynth_1.zip -d $MOTSYNTH_ROOT
unzip $MOTSYNTH_ROOT/MOTSynth_2.zip -d $MOTSYNTH_ROOT
unzip $MOTSYNTH_ROOT/MOTSynth_3.zip -d $MOTSYNTH_ROOT

rm $MOTSYNTH_ROOT/MOTSynth_1.zip
rm $MOTSYNTH_ROOT/MOTSynth_2.zip
rm $MOTSYNTH_ROOT/MOTSynth_3.zip
```
2. Extract frames from the videos you downloaded. Again, this will take a while (the sketch at the end of this section shows roughly what the extraction does).
```
python tools/anns/to_frames.py --motsynth-root $MOTSYNTH_ROOT

# You can now delete the videos
rm -r $MOTSYNTH_ROOT/MOTSynth_1
rm -r $MOTSYNTH_ROOT/MOTSynth_2
rm -r $MOTSYNTH_ROOT/MOTSynth_3
```
3. Download and extract the annotations (in several formats):
```
wget -P $MOTSYNTH_ROOT https://motchallenge.net/data/MOTSynth_coco_annotations.zip
wget -P $MOTSYNTH_ROOT https://motchallenge.net/data/MOTSynth_mot_annotations.zip
wget -P $MOTSYNTH_ROOT https://motchallenge.net/data/MOTSynth_mots_annotations.zip
# Merged annotation files for ReID and detection trainings
wget -P $MOTSYNTH_ROOT https://vision.in.tum.de/webshare/u/brasoand/motsynth/comb_annotations.zip

unzip $MOTSYNTH_ROOT/MOTSynth_coco_annotations.zip -d $MOTSYNTH_ROOT
unzip $MOTSYNTH_ROOT/MOTSynth_mot_annotations.zip -d $MOTSYNTH_ROOT
unzip $MOTSYNTH_ROOT/MOTSynth_mots_annotations.zip -d $MOTSYNTH_ROOT
unzip $MOTSYNTH_ROOT/comb_annotations.zip -d $MOTSYNTH_ROOT

rm $MOTSYNTH_ROOT/MOTSynth_coco_annotations.zip
rm $MOTSYNTH_ROOT/MOTSynth_mot_annotations.zip
rm $MOTSYNTH_ROOT/MOTSynth_mots_annotations.zip
rm $MOTSYNTH_ROOT/comb_annotations.zip
```
**Note**: You can generate the MOT, MOTS and combined annotation files yourself from the original COCO-format annotations with the scripts `tools/anns/generate_mot_format_files.py`, `tools/anns/generate_mots_format_files.py`, and `tools/anns/combine_anns.py`, respectively.

After running these steps, your `MOTSYNTH_ROOT` directory should look like this:
```text
$MOTSYNTH_ROOT
├── frames
│-- 000
│   │-- rgb
│   │   │-- 0000.jpg
│   │   │-- 0001.jpg
│   │   │-- ...
│-- ...
├── annotations
│-- 000.json
│-- 001.json
│-- ...
├── comb_annotations
│-- split_1.json
│-- split_2.json
│-- ...
├── mot_annotations
│-- 000
│   │-- gt
│   │   │-- gt.txt
│   │-- seqinfo.ini
│-- ...
├── mots_annotations
│-- 000
│   │-- gt
│   │   │-- gt.txt
│   │-- seqinfo.ini
│-- ...

```
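
For reference, the frame extraction in step 2 above amounts to decoding each sequence's video into `frames/<seq>/rgb/*.jpg`. The sketch below illustrates that idea with OpenCV; it is not the repo's `tools/anns/to_frames.py`, and the filenames, zero-padding and video layout inside the downloaded archives are assumptions.

```python
# Rough sketch of what the frame extraction in step 2 boils down to.
# This is NOT tools/anns/to_frames.py; naming and layout are assumptions.
import os

import cv2  # pip install opencv-python


def dump_frames(video_path: str, out_dir: str) -> None:
    """Decode a single MOTSynth video into JPEG frames under out_dir."""
    os.makedirs(out_dir, exist_ok=True)
    cap = cv2.VideoCapture(video_path)
    idx = 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        cv2.imwrite(os.path.join(out_dir, f"{idx:04d}.jpg"), frame)
        idx += 1
    cap.release()


# e.g. dump_frames(f"{MOTSYNTH_ROOT}/MOTSynth_1/000.mp4",
#                  f"{MOTSYNTH_ROOT}/frames/000/rgb")
```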


## Downloading and preparing MOT17
We will use MOT17 for both tracking and MOTS experiments, since the MOTS20 sequences are a subset of the MOT17 sequences. To download it, follow these steps:

1. Download and extract it under `$MOTCHA_ROOT`. E.g.:
```
MOTCHA_ROOT=$(python -c "from configs.path_cfg import MOTCHA_ROOT; print(MOTCHA_ROOT);")
wget -P $MOTCHA_ROOT https://motchallenge.net/data/MOT17.zip
unzip $MOTCHA_ROOT/MOT17.zip -d $MOTCHA_ROOT
rm $MOTCHA_ROOT/MOT17.zip
```
2. Download and extract the COCO-format MOT17 annotations (alternatively, you can generate them with `tools/anns/motcha_to_coco.py`). These are needed for evaluation during detection and ReID training.
```
wget -P $MOTCHA_ROOT https://vision.in.tum.de/webshare/u/brasoand/motsynth/motcha_coco_annotations.zip
unzip $MOTCHA_ROOT/motcha_coco_annotations.zip -d $MOTCHA_ROOT
rm $MOTCHA_ROOT/motcha_coco_annotations.zip
```

After running these steps, your `MOTCHA_ROOT` directory should look like this:
```
$MOTCHA_ROOT
├── MOT17
|   │-- train
|   │   │-- MOT17-02-DPM
|   │   │   │-- gt
|   │   │   │   |-- gt.txt
|   │   │   │-- det
|   │   │   │   |-- det.txt
|   │   │   |-- img1
|   │   │   │   |-- 000001.jpg
|   │   │   │   |-- 000002.jpg
|   │   │   │   |-- ...
|   │   │   │-- seqinfo.ini
|   |   |-- MOT17-02-FRCNN
|   │   │   │-- ...
|   |   |-- ...
|   │-- test
|   │   │-- MOT17-01-DPM
|   │   │-- ...
|
|-- motcha_coco_annotations
|   │-- MOT17-02.json
|   │-- ...
|   │-- MOT17-train.json
```
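
Both the per-sequence MOTSynth files in `annotations/` / `comb_annotations/` and the MOT17 files in `motcha_coco_annotations/` above are COCO-style JSON. A minimal sketch of inspecting one with `pycocotools` follows; it assumes the standard COCO keys, and the repo's files may carry additional fields (e.g. track/person ids).

```python
# Minimal sketch for inspecting one of the COCO-style annotation files above.
# Assumes standard COCO keys ("images", "annotations", "categories").
from pycocotools.coco import COCO

ann_file = "MOT17-02.json"  # e.g. $MOTCHA_ROOT/motcha_coco_annotations/MOT17-02.json
coco = COCO(ann_file)

img_ids = coco.getImgIds()
print(f"{len(img_ids)} images, {len(coco.getAnnIds())} annotations")

# Look at the boxes of the first image: COCO boxes are [x, y, width, height]
first = coco.loadImgs(img_ids[0])[0]
for ann in coco.loadAnns(coco.getAnnIds(imgIds=first["id"])):
    x, y, w, h = ann["bbox"]
    print(first["file_name"], ann["category_id"], x, y, w, h)
```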
## ReID data
**Note**: This is only needed if you want to train your own ReID model.

To train and evaluate ReID models, we store the bounding-box cropped images of pedestrians from every 60th frame of MOTSynth and MOT17. You can download these images here:
```
# For MOT17
MOTCHA_ROOT=$(python -c "from configs.path_cfg import MOTCHA_ROOT; print(MOTCHA_ROOT);")
wget -P $MOTCHA_ROOT https://vision.in.tum.de/webshare/u/brasoand/motsynth/motcha_reid_images.zip.zip
unzip $MOTCHA_ROOT/motcha_reid_images.zip -d $MOTCHA_ROOT
rm $MOTCHA_ROOT/motcha_reid_images.zip

# For MOTSynth
MOTSYNTH_ROOT=$(python -c "from configs.path_cfg import MOTSYNTH_ROOT; print(MOTSYNTH_ROOT);")
wget -P $MOTSYNTH_ROOT https://vision.in.tum.de/webshare/u/brasoand/motsynth/motsynth_reid_images.zip.zip
unzip $MOTSYNTH_ROOT/motsynth_reid_images.zip -d $MOTSYNTH_ROOT
rm $MOTSYNTH_ROOT/motsynth_reid_images.zip

```

Alternatively, you can directly generate these images locally by running:
```
# For MOT17
python tools/anns/store_reid_imgs.py --ann-path $MOTCHA_ROOT/motcha_coco_annotations/MOT17-train.json

# For MOTSynth
python tools/anns/store_reid_imgs.py --ann-path $MOTSYNTH_ROOT/comb_annotations/train_mini.json
```
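
Conceptually, the generated ReID data is simple: for every 60th frame, each annotated pedestrian box is cropped out and saved as its own image, with the pedestrian identity acting as the ReID label. The sketch below illustrates that idea only; it is not the repo's `tools/anns/store_reid_imgs.py`, and the output naming and sampling details are assumptions.

```python
# Illustrative sketch of the ReID preprocessing described above: crop every
# annotated pedestrian from every 60th frame and save it as its own image.
# NOT the repo's tools/anns/store_reid_imgs.py; paths and naming are assumptions.
import os

from PIL import Image
from pycocotools.coco import COCO


def store_reid_crops(ann_path: str, img_root: str, out_dir: str, step: int = 60) -> None:
    coco = COCO(ann_path)
    os.makedirs(out_dir, exist_ok=True)
    for i, img_id in enumerate(sorted(coco.getImgIds())):
        if i % step:  # keep only every `step`-th frame
            continue
        img_info = coco.loadImgs(img_id)[0]
        frame = Image.open(os.path.join(img_root, img_info["file_name"]))
        for ann in coco.loadAnns(coco.getAnnIds(imgIds=img_id)):
            x, y, w, h = ann["bbox"]  # COCO boxes are [x, y, width, height]
            crop = frame.crop((x, y, x + w, y + h))
            # one image per annotated instance; its pedestrian id is the ReID label
            crop.save(os.path.join(out_dir, f"{img_id}_{ann['id']}.jpg"))
```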
