rlhf-v

Professional software vendor delivering innovative solutions on the Softono platform. Specialized in both open-source and proprietary software development.

Total Products
2

Software by rlhf-v

RLAIF-V
Open Source

RLAIF-V is an open-source framework for aligning Multimodal Large Language Models (MLLMs) to reach trustworthiness beyond that of GPT-4V. Presented as a highlight at CVPR 2025, it follows a fully open-source paradigm that combines high-quality AI-generated feedback data with online feedback learning algorithms. The project provides the RLAIF-V-7B and RLAIF-V-12B model weights along with the RLAIF-V-Dataset, which contains over 83,000 high-quality preference pairs generated across diverse tasks and from models such as LLaVA and MiniCPM-V. Key capabilities include substantially reducing hallucinations on both generative and discriminative tasks, inference-time rewards that scale so that performance improves as the inference budget increases, and LoRA training support for efficient fine-tuning. By relying on open-source feedback rather than closed-source models, the framework aims to improve model reliability and generalization.

AI & Machine Learning, ML Frameworks
455 GitHub Stars
RLHF-V
Open Source

RLHF-V is a framework for improving the trustworthiness of Multimodal Large Language Models (MLLMs) through behavior alignment from fine-grained correctional human feedback. Instead of collecting simple preference judgments, it asks human annotators to identify and correct the hallucinated segments within model responses, which yields a high-precision dataset. The approach is highly data-efficient: it reduces model hallucination rates by nearly 35 percent after only one hour of training on eight A100 GPUs. The framework has been validated on strong MLLMs such as Muffin and has powered significant research outcomes, including the OmniLMM-12B model, which achieved top rankings on benchmarks for open-source models and surpassed GPT-4V on object hallucination tasks. The project includes a publicly available dataset covering diverse model outputs and image types, alongside training code and model weights. It was presented at CVPR 2024 and serves as a foundational component for subsequent de

ML Frameworks
308 GitHub Stars