OPERA
OPERA is a decoding method that alleviates hallucination in Multi-Modal Large Language Models (MLLMs) without requiring additional training data, external knowledge, or model fine-tuning. Selected as a Highlight at CVPR 2024, it targets a common failure mode in which the model over-relies on a few summary tokens and neglects the visual information, producing inaccurate descriptions of image content. The approach has two core mechanisms: an Over-Trust Penalty and a Retrospection-Allocation strategy. During beam-search decoding, a penalty term is applied to the model logits to discourage excessive focus on summary tokens. In parallel, a rollback mechanism reviews previously generated tokens to detect over-trust patterns and, when one is found, re-allocates the token selection. The method operates as a nearly cost-free enhancement, which makes it a practical way to improve the precision and reliability of MLLMs in real-world applications. The implementation is available as