This repository is an unofficial implementation of CamTrol: Training-free Camera Control for Video Generation, built on Stable Video Diffusion (SVD).
Some videos generated with SVD:
- `pip install -r requirement.txt`
- Download the SVD checkpoint `svd.safetensors` and set its path at `ckpt_path` in `sgm/svd.yaml` (see the sanity check after this list).
- Clone the depth estimation model:
  `git clone https://github.com/isl-org/ZoeDepth.git`

The code downloads stable-diffusion-inpainting and open-clip automatically; you can point it to your local paths if you have already downloaded them.
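
As a quick sanity check, assuming the `ckpt_path` key appears literally in the config as referenced above, you can confirm the checkpoint path is set:

```bash
# Verify that ckpt_path in sgm/svd.yaml points at the downloaded svd.safetensors
grep -n "ckpt_path" sgm/svd.yaml
```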
Then run sampling:

```bash
CUDA_VISIBLE_DEVICES=0 python3 sampling.py \
    --input_path "assets/images/street.jpg" \
    --prompt "a vivid anime street, wind blows." \
    --neg_prompt " " \
    --pcd_mode "hybrid default 14 out_left_up_down" \
    --add_index 12 \
    --seed 1 \
    --save_warps False \
    --load_warps None
```
- `pcd_mode`: camera motion for point cloud rendering, a string concatenating four elements: the first defines the camera motion, the second the moving distance or angle, the third the number of frames, and the last the moving direction (see the sketch after this list). You can load arbitrary camera extrinsic matrices in `complex` mode, and set a bigger `add_index` for better motion alignment.
- `prompt`, `neg_prompt`: as SVD doesn't support text input, these mainly serve the stable diffusion inpainting step.
- `add_index`: t_0 in the paper, balancing the trade-off between motion fidelity and video diversity. Set it between `0` and `num_frames`; the bigger it is, the more faithfully the video aligns to the camera motion.
- `save_warps`: whether to save the multi-view renderings. You can reload already-rendered images, since rendering might take some time; use low-res images to boost speed (see the reuse example after this list).
- `load_warps`: whether to load renderings saved by `save_warps`.
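
To make the `pcd_mode` format concrete, here is a small sketch splitting the example string above into its four elements (the variable names are illustrative, not the repository's API):

```bash
# Decompose a pcd_mode string into its four elements:
# camera motion, moving distance/angle, number of frames, moving direction
read -r motion distance num_frames direction <<< "hybrid default 14 out_left_up_down"
echo "motion=$motion, distance=$distance, frames=$num_frames, direction=$direction"
```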
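
And a hypothetical two-run workflow for reusing renderings; `path/to/warps` is a placeholder, and the exact value `load_warps` expects depends on `sampling.py`:

```bash
# First run: render the warped views once and save them
CUDA_VISIBLE_DEVICES=0 python3 sampling.py \
    --input_path "assets/images/street.jpg" \
    --pcd_mode "hybrid default 14 out_left_up_down" \
    --save_warps True

# Later runs: skip point cloud rendering by reloading the saved warps
CUDA_VISIBLE_DEVICES=0 python3 sampling.py \
    --input_path "assets/images/street.jpg" \
    --pcd_mode "hybrid default 14 out_left_up_down" \
    --load_warps path/to/warps
```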
I used SVD in this repository; you can apply the method to your own customized video diffusion model.
The code is largely built on SVD and LucidDreamer.