RAVE
RAVE is a zero-shot, lightweight, and fast framework for text-guided video editing, presented at CVPR 2024. It edits input videos according to text prompts using pre-trained text-to-image diffusion models, with no additional training. Its core contribution is a novel randomized noise shuffling strategy that exploits spatio-temporal interactions between frames. This strategy yields high visual quality and temporal consistency while editing substantially faster than existing methods. RAVE handles videos of any length and preserves the original motion and semantic structure of the source footage. It covers diverse editing tasks, from local attribute modifications to complex shape transformations, in dynamic scenes involving human activities, animals, or vehicles. The framework is memory-efficient and works with off-the-shelf models, including those from CivitAI. RAVE includes a standardized dataset for evaluating video editing methods and offers infere