EgoLife
EgoLife is a CVPR 2025 research project on building an egocentric life assistant. Its central resource is a multimodal dataset that records the daily activities of six participants over one week. The recordings come from Meta Aria glasses, synchronized third-person cameras, and mmWave sensors. The dataset is intended to support research on long-term video understanding and on real-world AI applications.

The project introduces two core models. EgoGPT is an omni-modal vision-language model fine-tuned for egocentric scenarios. It continuously captions first-person video and audio streams, extracting key events, actions, and context. EgoRAG is a retrieval-augmented generation module for long-term reasoning and memory reconstruction. It maintains a hierarchical memory bank of hourly and daily summaries. For a given question, it retrieves the relevant time-stamped past events and uses them as context to answer.

Together, these components help users with memory support, habit tracking, event recall, and task management. EgoLife provides public access to the datas