hkchengrex

Open Source

MMAudio

MMAudio is a video-to-audio synthesis model, presented at CVPR 2025, that generates high-quality, synchronized audio from video input and an optional text prompt. Developed by researchers from the University of Illinois Urbana-Champaign and Sony AI, it uses a multimodal joint training approach that lets it learn from both audio-visual and audio-text datasets. A dedicated synchronization module aligns the generated sound with the video frames. The system is designed to add realistic environmental sound to silent videos, including clips from generative video models such as Sora or Veo. It accepts either video alone or video paired with a text description that guides the generated audio. Pretrained models are available on Hugging Face, and the repository provides command-line and demo scripts for local inference, which requires PyTorch and a CUDA-compatible GPU. Inference is optimized for modern h

AI & Machine Learning, Audio Editing & DAW

2.2K GitHub Stars

Open Source

XMem

XMem is a deep learning model for long-term video object segmentation, published at ECCV 2022 by researchers from the University of Illinois Urbana-Champaign. It tackles the challenge of tracking objects over long videos by treating segmentation as a memory problem, with a design inspired by the Atkinson-Shiffrin model of human memory. The architecture uses three memory stores: a sensory memory updated every frame for immediate context, a working memory holding high-resolution features from recent frames, and a long-term memory holding compact, persistent object features. This multi-store design lets XMem handle videos longer than 10,000 frames while staying robust to occlusions and appearance changes, avoiding the problem of prior methods whose memory either grows without bound or is compressed so heavily that detail is lost. Key features include efficient GPU memory usage, inference at roughly 20 frames per second even on long sequences, and an integrated graphical user interface for interactive segmentation. The software sup

ML Frameworks, Video Editing

2K GitHub Stars
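
To make XMem's three-store memory design more concrete, here is a toy, dependency-free Python sketch. It is not XMem's implementation: the class and method names (ToyMemory, add_frame, read) are hypothetical, keys are short lists of floats rather than deep feature maps, sensory memory is simply overwritten instead of being updated by a GRU, read-out is a softmax over dot products, and consolidation keeps only the most-used of the oldest working-memory entries as long-term prototypes, skipping refinements such as memory potentiation.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Entry:
    key: list            # feature vector used for matching (toy: a list of floats)
    usage: float = 0.0   # accumulated attention weight, used to pick prototypes


@dataclass
class ToyMemory:
    """Toy sensory / working / long-term memory, loosely inspired by XMem.

    Hypothetical sketch: names, sizes, and the consolidation rule are
    illustrative assumptions, not XMem's actual code.
    """
    max_working: int = 4     # consolidate once working memory exceeds this size
    n_consolidate: int = 2   # number of oldest entries considered per consolidation
    k_prototypes: int = 1    # how many of those survive into long-term memory
    sensory: list = field(default_factory=list)
    working: list = field(default_factory=list)
    long_term: list = field(default_factory=list)

    def update_sensory(self, feature):
        # Sensory memory: short-lived state, simply overwritten every frame here.
        self.sensory = list(feature)

    def add_frame(self, key):
        # Working memory: append the newest memory frame, then consolidate if full.
        self.working.append(Entry(list(key)))
        if len(self.working) > self.max_working:
            self._consolidate()

    def _consolidate(self):
        # Keep the first (annotated) frame; consider the next-oldest entries.
        old = self.working[1:1 + self.n_consolidate]
        # Long-term memory: keep only the most-used entries as prototypes.
        ranked = sorted(old, key=lambda e: e.usage, reverse=True)
        self.long_term.extend(ranked[:self.k_prototypes])
        self.working = self.working[:1] + self.working[1 + self.n_consolidate:]

    def read(self, query):
        # Attend over long-term and working memory; return the weighted key average.
        entries = self.long_term + self.working
        if not entries:
            return [0.0] * len(query)
        scores = [sum(q * k for q, k in zip(query, e.key)) for e in entries]
        top = max(scores)
        weights = [math.exp(s - top) for s in scores]  # numerically stable softmax
        total = sum(weights)
        weights = [w / total for w in weights]
        for e, w in zip(entries, weights):
            e.usage += w  # frequently attended entries accumulate usage
        return [sum(w * e.key[i] for e, w in zip(entries, weights))
                for i in range(len(query))]
```

With the defaults, adding a fifth frame moves the more-used of frames 1 and 2 into long-term memory, drops the other, and leaves frame 0 in working memory, so working memory stays bounded and long-term memory grows by one prototype per consolidation rather than one entry per frame.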

Open Source

Cutie

Cutie is a state-of-the-art video object segmentation framework and the successor to XMem, recognized as a CVPR 2024 Highlight paper. Developed by researchers at the University of Illinois Urbana-Champaign and Adobe, it improves on XMem in consistency, robustness, and inference speed when tracking objects across video sequences. It offers two modes of operation: a Python interface for integration into custom pipelines, and an interactive graphical user interface for editing masks manually with precise object control. Notable features include a permanent memory mechanism adapted from XMem++, which improves long-term coherence and gives better control during interactive segmentation. The package ships with pretrained models and requires Python 3.8+ and PyTorch 1.12+ on Linux-based systems. Users can install it with pip or run the provided demo scripts to process image sequences and generate object masks

ML Frameworks, Video Editing

1.1K GitHub Stars

Open Source

CascadePSP

CascadePSP is a deep learning framework for class-agnostic refinement of very high-resolution semantic segmentation. Published at CVPR 2020, it uses a global-then-local refinement strategy to turn coarse segmentation masks into precise, high-resolution outputs. The implementation is built on PyTorch and includes both training and testing code. Because refinement does not require class-specific training, the model adapts to a variety of datasets, including the provided UHD BIG dataset and Relabeled PASCAL VOC 2012. The architecture consists of a global step that captures coarse context and a local step that adjusts boundary details, supported by a dedicated refinement module. A pip package named segmentation-refinement lets users refine masks with only a few lines of code. It runs on both CUDA and CPU devices, with adjustable parameters that trade memory usage against speed. Pretrained models are available for immediate inferen

ML Frameworks

884 GitHub Stars
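
CascadePSP's local step refines smaller crops at high resolution rather than processing the whole image at once. The dependency-free Python sketch below shows one generic way to handle that bookkeeping: cover a mask with overlapping square tiles, refine each tile, and average the overlapping predictions when stitching. The function names, the tile size and stride, and the refine_patch callback are hypothetical, and CascadePSP's own cropping and merging may differ; this only illustrates the overlapping-tile pattern.

```python
def window_starts(length, size, stride):
    """Start offsets of windows of width `size` that cover [0, length)."""
    if length <= size:
        return [0]  # one window covers everything
    starts = list(range(0, length - size + 1, stride))
    if starts[-1] != length - size:
        starts.append(length - size)  # make the last window end at the border
    return starts


def refine_by_tiles(mask, size, stride, refine_patch):
    """Refine a 2-D mask tile by tile and average the overlapping outputs.

    `refine_patch` is a hypothetical callback standing in for the network:
    it takes a tile (list of rows) and returns a tile of the same shape.
    """
    h, w = len(mask), len(mask[0])
    acc = [[0.0] * w for _ in range(h)]   # sum of predictions per pixel
    count = [[0] * w for _ in range(h)]   # number of tiles covering each pixel
    for y0 in window_starts(h, size, stride):
        for x0 in window_starts(w, size, stride):
            tile = [row[x0:x0 + size] for row in mask[y0:y0 + size]]
            for dy, row in enumerate(refine_patch(tile)):
                for dx, value in enumerate(row):
                    acc[y0 + dy][x0 + dx] += value
                    count[y0 + dy][x0 + dx] += 1
    return [[a / c for a, c in zip(acc_row, count_row)]
            for acc_row, count_row in zip(acc, count)]
```

Passing an identity callback such as `lambda tile: tile` returns the input values unchanged, a quick check that tiling and stitching round-trip; in a real pipeline the callback would refine the matching image and mask crops.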
