About LAMP

LAMP is the official implementation of a CVPR 2024 paper on few-shot text-to-video generation. It learns a specific motion pattern from a small set of 8 to 16 videos and applies it to generate new videos from text prompts. The framework uses Stable Diffusion v1.4 as its backbone and needs a single GPU with more than 15 GB of VRAM for training. It supports both video generation and video editing. The repository includes source code, pre-trained checkpoints, and training data for motion types such as birds flying, fireworks, helicopters, and horses running. Users can train on their own videos collected from public sources, or use the provided examples via Google Drive, Baidu Disk, or the Colab notebook. The listed environment is Python 3.8, PyTorch 1.12.1, and Ubuntu with CUDA 11.3.

Published by

rq-wu

[CVPR 2024] | LAMP: Learn A Motion Pattern for Few-Shot-Based Video Generation

This repository is the official implementation of LAMP.

LAMP: Learn A Motion Pattern for Few-Shot Video Generation
Ruiqi Wu, Liangyu Chen, Tong Yang, Chunle Guo, Chongyi Li, Xiangyu Zhang
( * indicates corresponding author)

[arXiv Paper] [Website Page] [Google Drive] [Baidu Disk (pwd: ffsp)] [Colab Notebook]

:rocket: LAMP is a few-shot-based method for text-to-video generation. You only need 8~16 videos and 1 GPU (> 15 GB VRAM) for training! You can then generate videos with the learned motion pattern.

News

[2024/02/27] Our paper was accepted by CVPR 2024!
[2023/11/15] The code for applying LAMP to video editing is released!
[2023/11/02] The Colab demo is released! Thanks to @ShashwatNigam99 for the PR.
[2023/10/21] We added a Google Drive link for our checkpoints and training data.
[2023/10/17] We released our checkpoints and arXiv paper.
[2023/10/16] Our code is publicly available.

Preparation

Dependencies and Installation
Ubuntu > 18.04
CUDA=11.3
Others:

# clone the repo
git clone https://github.com/RQ-Wu/LAMP.git
cd LAMP

# create virtual environment
conda create -n LAMP python=3.8
conda activate LAMP

# install packages
pip install torch==1.12.1+cu113 torchvision==0.13.1+cu113 torchaudio==0.12.1 --extra-index-url https://download.pytorch.org/whl/cu113
pip install -r requirements.txt
pip install xformers==0.0.13
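
After installing, it can help to confirm that PyTorch sees a GPU with enough memory before starting a training run. The script below is a minimal sketch, not part of the repository: the function names are hypothetical, and it assumes the "> 15 GB" requirement can be read as more than 15 GiB of total device memory.

```python
import sys

MIN_VRAM_GB = 15  # assumption: "> 15 GB VRAM" is interpreted as > 15 GiB


def has_enough_vram(total_bytes, min_gb=MIN_VRAM_GB):
    """Return True if total_bytes is strictly more than min_gb GiB."""
    return total_bytes > min_gb * 1024 ** 3


def describe_environment():
    """Collect human-readable lines about the Python / PyTorch / GPU setup."""
    lines = [f"python {sys.version_info.major}.{sys.version_info.minor}"]
    try:
        import torch  # imported lazily so the script still runs without it
    except ImportError:
        lines.append("torch is not installed")
        return lines
    lines.append(f"torch {torch.__version__}, built for CUDA {torch.version.cuda}")
    if not torch.cuda.is_available():
        lines.append("no CUDA device is visible")
        return lines
    # Device 0 is the first GPU exposed through CUDA_VISIBLE_DEVICES.
    props = torch.cuda.get_device_properties(0)
    verdict = "ok" if has_enough_vram(props.total_memory) else "below the suggested minimum"
    lines.append(f"{props.name}: {props.total_memory / 1024 ** 3:.1f} GiB ({verdict})")
    return lines


if __name__ == "__main__":
    print("\n".join(describe_environment()))
```

With the versions pinned above, the second line should report torch 1.12.1 built for CUDA 11.3; anything else suggests the wrong wheel was installed.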

Weights and Data

You can download pre-trained T2I diffusion models from Hugging Face. In our work, we use Stable Diffusion v1.4 as the backbone network. Clone the pre-trained weights with git-lfs and put them in ./checkpoints
Our checkpoints and training data are listed below. You can also collect video data on your own (suggested websites: Pexels, frozen-in-time) and put the .mp4 files in ./training_videos/[motion_name]/

[Update] You can find the training video for the video editing demo in assets/run.mp4

Motion Name	Checkpoint Link	Training data
Birds fly	Baidu Disk (pwd: jj0o)	Baidu Disk (pwd: w96b)
Firework	Baidu Disk (pwd: wj1p)	Baidu Disk (pwd: oamp)
Helicopter	Baidu Disk (pwd: egpe)	Baidu Disk (pwd: t4ba)
Horse run	Baidu Disk (pwd: 19ld)	Baidu Disk (pwd: mte7)
Play the guitar	Baidu Disk (pwd: l4dw)	Baidu Disk (pwd: js26)
Rain	Baidu Disk (pwd: jomu)	Baidu Disk (pwd: 31ug)
Turn to smile	Baidu Disk (pwd: 2bkl)	Baidu Disk (pwd: l984)
Waterfall	Baidu Disk (pwd: vpkk)	Baidu Disk (pwd: 2edp)
All	Baidu Disk (pwd: ifsm)	Baidu Disk (pwd: 2i2k)
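
If you collect your own clips, a quick check of the ./training_videos/[motion_name]/ layout described above can catch an empty or oversized folder before training. This is a minimal sketch using only the standard library; the helper names and the folder name `horse_run` are hypothetical, and the 8~16 range is simply the guideline quoted in this README, not a limit enforced by the code.

```python
from pathlib import Path


def list_training_videos(root, motion_name):
    """Return the sorted .mp4 files under <root>/<motion_name>/."""
    folder = Path(root) / motion_name
    # Only lowercase ".mp4" is matched; a missing folder yields an empty list.
    return sorted(p for p in folder.glob("*.mp4") if p.is_file())


def check_video_count(videos, low=8, high=16):
    """Return a short status string for the suggested 8~16 clip range."""
    n = len(videos)
    if n < low:
        return f"{n} clips: fewer than the suggested {low}"
    if n > high:
        return f"{n} clips: more than the suggested {high}"
    return f"{n} clips: within the suggested {low}~{high} range"


if __name__ == "__main__":
    # "horse_run" is a placeholder motion name, not a folder shipped with the repo.
    videos = list_training_videos("./training_videos", "horse_run")
    print(check_video_count(videos))
```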

Get Started

1. Training

# Train to learn a motion pattern (replace X with the index of the GPU to use)
CUDA_VISIBLE_DEVICES=X accelerate launch train_lamp.py config="configs/horse-run.yaml"

# Train for video editing (the training video is assets/run.mp4; replace X with the index of the GPU to use)
CUDA_VISIBLE_DEVICES=X accelerate launch train_lamp.py config="configs/run.yaml"

2. Inference

Here are example commands for inference:

# Motion Pattern
python inference_script.py --weight ./my_weight/turn_to_smile/unet --pretrain_weight ./checkpoints/stable-diffusion-v1-4 --first_frame_path ./benchmark/turn_to_smile/head_photo_of_a_cute_girl,_comic_style.png --prompt "head photo of a cute girl, comic style, turns to smile"

# Video Editing
python inference_script.py --weight ./outputs/run/unet --pretrain_weight ./checkpoints/stable-diffusion-v1-4 --first_frame_path ./benchmark/editing/a_girl_runs_beside_a_river,_Van_Gogh_style.png --length 24 --editing

#########################################################################################################
# --weight:           the path of our model
# --pretrain_weight:  the path of the pre-trained model (e.g. SDv1.4)
# --first_frame_path: the path of the first frame generated by T2I model (e.g. SD-XL)
# --prompt:           the input prompt, the default value is aligned with the filename of the first frame
# --output:           output path, default: ./results 
# --height:           video height, default: 320
# --width:            video width, default: 512
# --length:           video length, default: 16
# --cfg:              classifier-free guidance, default: 12.5
#########################################################################################################
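
The note that the default prompt is "aligned with the filename of the first frame" can be illustrated with a short sketch. It assumes the convention is simply the file stem with underscores turned into spaces, which matches the example filenames above but is not taken from the repository code. Both helper names are hypothetical, and the command builder uses only the flags that appear in the examples above.

```python
import shlex
from pathlib import Path


def prompt_from_first_frame(first_frame_path):
    """Derive a prompt from the file stem (assumed convention: '_' -> ' ')."""
    return Path(first_frame_path).stem.replace("_", " ")


def build_inference_command(weight, pretrain_weight, first_frame_path,
                            prompt=None, length=None, editing=False):
    """Assemble an argv list for inference_script.py from the documented flags."""
    cmd = [
        "python", "inference_script.py",
        "--weight", weight,
        "--pretrain_weight", pretrain_weight,
        "--first_frame_path", first_frame_path,
    ]
    if prompt is not None:  # omit to fall back to the script's own default
        cmd += ["--prompt", prompt]
    if length is not None:
        cmd += ["--length", str(length)]
    if editing:
        cmd.append("--editing")
    return cmd


if __name__ == "__main__":
    frame = "./benchmark/turn_to_smile/head_photo_of_a_cute_girl,_comic_style.png"
    # Append the motion phrase, as the motion-pattern example above does.
    prompt = prompt_from_first_frame(frame) + ", turns to smile"
    cmd = build_inference_command(
        "./my_weight/turn_to_smile/unet",
        "./checkpoints/stable-diffusion-v1-4",
        frame,
        prompt=prompt,
    )
    print(shlex.join(cmd))
```

The returned list can be passed to `subprocess.run(cmd, check=True)` to loop over several first frames without retyping the paths.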

Visual Examples

Few-Shot-Based Text-to-Video Generation

Horse run
	A horse runs in the universe.	A horse runs on the Mars.	A horse runs on the road.
Firework
	Fireworks in desert night.	Fireworks over the mountains.	Fireworks in the night city.
Play the guitar
	GTA5 poster, a man plays the guitar.	A woman plays the guitar.	An astronaut plays the guitar, photorealistic.
Birds fly
	Birds fly in the pink sky.	Birds fly in the sky, over the sea.	Many Birds fly over a plaza.

Video Editing

Origin Videos	Editing Result-1	Editing Result-2

	A girl in black runs on the road.	A man runs on the road.

	A man is dancing.	A girl in white is dancing.

Citation

If you find our repo useful for your research, please cite us:

@inproceedings{wu2024lamp,
      title={LAMP: Learn A Motion Pattern for Few-Shot Video Generation},
      author={Wu, Ruiqi and Chen, Liangyu and Yang, Tong and Guo, Chunle and Li, Chongyi and Zhang, Xiangyu},
      booktitle={IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
      year={2024}
}

License

Licensed under the Creative Commons Attribution-NonCommercial 4.0 International license, for non-commercial use only. Any commercial use requires formal permission first.

Acknowledgement

This repository is maintained by Ruiqi Wu. The code is built on Tune-A-Video. Thanks for the excellent open-source code!
