StableVideo
StableVideo is a research project presented at ICCV 2023 for text-driven, consistency-aware diffusion video editing. It lets users modify video content with text prompts while keeping the edited frames temporally consistent. It builds on ControlNet for edge- and depth-guided generation and on methods derived from Text2LIVE for layered neural atlases. Key features include editing of specific foreground regions within a video sequence, several memory optimization modes for hardware with limited VRAM, and training on custom datasets. GPU memory requirements are significant and depend on the configuration, but CPU caching and xformers can be enabled to reduce them. Users run the application through a Python interface and receive the edited video as an MP4 file. Pre-trained models and example video datasets are available for download, so users can start experimenting immediately. It is