TeleMem: Building Long-Term and Multimodal Memory for Agentic AI
If you find this project helpful, please give us a ⭐️ on GitHub for the latest update.
🤝 Contributions welcome! Feel free to open an issue or submit a pull request.
TeleMem is an agent memory management layer that can be used as a high-performance drop-in replacement for Mem0 with one line of code (import telemem as mem0), deeply optimized for complex scenarios involving multi-turn dialogues, character modeling, long-term information storage, and semantic retrieval.
Through its unique context-aware enhancement mechanism, TeleMem provides conversational AI with core infrastructure offering higher accuracy, faster performance, and stronger character memory capabilities.
Building upon this foundation, TeleMem implements video understanding, multimodal reasoning, and visual question answering capabilities. Through a complete pipeline of video frame extraction, caption generation, and vector database construction, AI Agents can effortlessly store, retrieve, and reason over video content just like handling text memories.
The ultimate goal of the TeleMem project is to use an agent's hindsight to improve its foresight.
TeleMem, where memory lives on and intelligence grows strong.
📢 Latest Updates
- [2026-01-28] TeleMem v1.3.0 has been released!
- [2026-01-22] TeleMem Tech Report has been updated to its 4th version!
- [2026-01-13] TeleMem Tech Report has been released on arXiv!
- [2026-01-09] TeleMem v1.2.0 has been released!
- [2025-12-31] TeleMem v1.1.0 has been released!
- [2025-12-05] TeleMem v1.0.0 has been released!
🔥 Research Highlights
- Significantly improved memory accuracy: Achieved 86.33% accuracy on the ZH-4O Chinese multi-character long-dialogue benchmark, about 16 percentage points above Mem0 (70.20%).
- Doubled speed performance: Millisecond-level semantic retrieval enabled by efficient buffering and batch writing.
- Greatly reduced token cost: Optimized token usage delivers the same performance with significantly lower LLM overhead.
- Precise character memory preservation: Automatically builds independent memory profiles for each character, eliminating confusion.
- Automated Video Processing Pipeline: From raw video → frame extraction → caption generation → vector database, fully automated
- ReAct-Style Video QA: Multi-step reasoning + tool calling for precise video content understanding
Table of Contents
- Project Introduction
- TeleMem vs Mem0: Core Advantages
- Experimental Results
- Quick Start
- Project Structure
- Core Functions
- Multimodal Extensions
- Data Storage Explanation
- Development and Contribution
- Acknowledgements
Project Introduction
TeleMem enables conversational AI to maintain stable, natural, and continuous worldviews and character settings during long-term interactions through a deeply optimized pipeline of character-aware summarization → semantic clustering deduplication → efficient storage → precise retrieval.
Features
- Automatic memory extraction: Extracts and structures key facts from dialogues.
- Semantic clustering & deduplication: Uses LLMs to semantically merge similar memories, reducing conflicts and improving consistency.
- Character-profiled memory management: Builds independent memory archives for each character in a dialogue, ensuring precise isolation and personalized management.
- Efficient asynchronous writing: Employs a buffer + batch-flush mechanism for high-performance, stable persistence.
- Precise semantic retrieval: Combines FAISS + JSON dual storage for fast recall and human-readable auditability.
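The buffer + batch-flush idea from the feature list can be illustrated with a minimal sketch. This is not TeleMem's implementation: the `BufferedWriter` class, its `batch_size` parameter, and the in-memory `batches` sink are hypothetical names chosen for illustration only.

```python
from typing import Callable, List


class BufferedWriter:
    """Collects items in memory and hands them to `flush_fn` in batches."""

    def __init__(self, flush_fn: Callable[[List[str]], None], batch_size: int = 3):
        self.flush_fn = flush_fn
        self.batch_size = batch_size
        self.buffer: List[str] = []

    def add(self, item: str) -> None:
        # Buffer the item; flush as soon as the batch size is reached.
        self.buffer.append(item)
        if len(self.buffer) >= self.batch_size:
            self.flush()

    def flush(self) -> None:
        # Pass a copy of the buffered items to the sink, then reset the buffer.
        if self.buffer:
            self.flush_fn(list(self.buffer))
            self.buffer.clear()


# Hypothetical sink: each flushed batch is recorded in a list instead of on disk.
batches: List[List[str]] = []
writer = BufferedWriter(batches.append, batch_size=3)
for i in range(7):
    writer.add(f"memory-{i}")
writer.flush()  # persist the final partial batch
```

Batching trades a little write latency for fewer, larger persistence calls; the explicit final `flush()` ensures a partial batch is not lost.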
Applicable Scenarios
- Multi-character virtual agent systems
- Long-memory AI assistants (e.g., customer service, companionship, creative co-pilots)
- Complex narrative/world-building in virtual environments
- Dialogue scenarios with strong contextual dependencies
- Video content QA and reasoning
- Multimodal agent memory management
- Long video understanding and information retrieval

TeleMem vs Mem0: Core Advantages
TeleMem deeply refactors Mem0 to address characterization, long-term memory, and high performance. Key differences:
| Capability Dimension | Mem0 | TeleMem |
|---|---|---|
| Multi-character separation | ❌ Not supported | ✅ Automatically creates independent memory profiles per character |
| Summary quality | Basic summarization | ✅ Context-aware + character-focused prompts covering key entities, actions, and timestamps |
| Deduplication mechanism | Vector similarity filtering | ✅ LLM-based semantic clustering: merges similar memories via LLM |
| Write performance | Streaming, single writes | ✅ Batch flush + concurrency: 2–3× faster writes |
| Storage format | SQLite / vector DB | ✅ FAISS + JSON metadata dual-write: fast retrieval + human-readable |
| Multimodal Capability | Single image to text only | ✅ Video Multimodal Memory: Full video processing pipeline + ReAct multi-step reasoning QA |
Experimental Results
Dataset
We evaluate on ZH-4O, the Chinese multi-character long-dialogue dataset constructed in the paper MOOM: Maintenance, Organization and Optimization of Memory in Ultra-Long Role-Playing Dialogues:
- Average dialogue length: 600 turns per conversation
- Scenarios: daily interactions, plot progression, evolving character relationships
Memory capability was assessed via QA benchmarks, e.g.:
{
"question": "What is Zhao Qi's nickname for Bai Yulan? A Xiaobai B Xiaoyu C Lanlan D Yuyu",
"answer": "A"
},
{
"question": "What is the relationship between Zhao Qi and Bai Yulan? A Classmates B Teacher and student C Enemies D Neighbors",
"answer": "B"
}
Experimental Configuration
- LLM: Qwen3-8B (thinking mode disabled)
- Embedding model: Qwen3-Embedding-8B
- Metric: QA accuracy

| Method | Overall (%) |
|---|---|
| RAG | 62.45 |
| Mem0 | 70.20 |
| MOOM | 72.60 |
| A-mem | 73.78 |
| Memobase | 76.78 |
| TeleMem | 86.33 |
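For readers who want to reproduce the metric, QA accuracy over multiple-choice items of the kind shown earlier can be computed roughly as follows. This is a sketch under assumptions: `extract_choice` and `qa_accuracy` are hypothetical helpers, not the project's evaluation script, and the regular expression simply takes the first standalone letter A-D in a reply.

```python
import re
from typing import Dict, List, Optional


def extract_choice(text: str) -> Optional[str]:
    """Return the first standalone option letter A-D found in a model reply."""
    match = re.search(r"\b([ABCD])\b", text)
    return match.group(1) if match else None


def qa_accuracy(items: List[Dict[str, str]], predictions: List[str]) -> float:
    """Percentage of items whose extracted choice equals the gold answer."""
    correct = sum(
        extract_choice(pred) == item["answer"]
        for item, pred in zip(items, predictions)
    )
    return 100.0 * correct / len(items)


# Two toy items with made-up predictions (one right, one wrong).
items = [
    {"question": "Nickname question", "answer": "A"},
    {"question": "Relationship question", "answer": "B"},
]
predictions = ["A", "The answer is C"]
score = qa_accuracy(items, predictions)
```

A reply that begins with the English article "A" would be misread as option A by this simple pattern, so a real harness would likely constrain the output format first.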
Quick Start
Environment Preparation
# Create and activate virtual environment
conda create -n telemem python=3.10
conda activate telemem
# Install dependencies
pip install -e .
Example
Set your OpenAI API key:
export OPENAI_API_KEY="your-openai-api-key"
# python examples/quickstart.py
import telemem as mem0
memory = mem0.Memory()
messages = [
{"role": "user", "content": "Jordan, did you take the subway to work again today?"},
{"role": "assistant", "content": "Yes, James. The subway is much faster than driving. I leave at 7 o'clock and it's just not crowded."},
{"role": "user", "content": "Jordan, I want to try taking the subway too. Can you tell me which station is closest?"},
{"role": "assistant", "content": "Of course, James. You take Line 2 to Civic Center Station, exit from Exit A, and walk 5 minutes to the company."}
]
memory.add(messages=messages, user_id="Jordan")
results = memory.search("What transportation did Jordan use to go to work today?", user_id="Jordan")
print(results)
Memory() uses the default provider settings inherited from mem0ai. To use the repository's local Qwen + FAISS configuration, load config/config.yaml explicitly:
from telemem.utils import load_config
import telemem as mem0
config = load_config("config/config.yaml")
memory = mem0.Memory(config=config)
The runnable examples also honor the same configuration through TELEMEM_CONFIG:
TELEMEM_CONFIG=config/config.yaml python examples/quickstart.py
Using MiniMax as the LLM Provider
TeleMem supports MiniMax as an LLM backend via its OpenAI-compatible API.
A ready-to-use example config is provided at config/config.minimax.yaml.
export MINIMAX_API_KEY="your-minimax-api-key"
export OPENAI_API_KEY="your-openai-api-key" # still needed for embeddings
from telemem.utils import load_config
import telemem as mem0
config = load_config("config/config.minimax.yaml")
memory = mem0.Memory(config=config)
Key points for MiniMax usage:
- LLM: MiniMax M3 (512K context, default) via https://api.minimax.io/v1; MiniMax M2.7 / M2.7-highspeed (204K context) remain available as alternatives
- Temperature: must be in (0.0, 1.0]; set it explicitly (e.g. 0.7) to avoid out-of-range errors
- Embeddings: MiniMax does not provide a public embedding API; configure a separate embedder (e.g. text-embedding-3-small) in the embedder section
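The temperature constraint is a half-open interval, so 0.0 is rejected while 1.0 is accepted. A small guard such as the one below can catch a bad value before a request is sent; `validate_temperature` is a hypothetical helper and is not part of TeleMem or the MiniMax SDK.

```python
def validate_temperature(temperature: float) -> float:
    """Accept values in the half-open interval (0.0, 1.0]; reject others."""
    if not (0.0 < temperature <= 1.0):
        raise ValueError(f"temperature must be in (0.0, 1.0], got {temperature}")
    return temperature


checked = validate_temperature(0.7)

# 0.0 lies outside the interval, so the guard raises.
try:
    validate_temperature(0.0)
    zero_rejected = False
except ValueError:
    zero_rejected = True
```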
Project Structure
telemem/
├── assets/                     # Documentation assets and figures
├── baselines/                  # Baseline implementations for comparative evaluation
│   ├── RAG                     # Retrieval-Augmented Generation baseline
│   ├── MemoBase                # MemoBase memory management system
│   ├── MOOM                    # MOOM dual-branch narrative memory framework
│   ├── A-mem                   # A-mem agent memory baseline
│   └── Mem0                    # Mem0 baseline implementation
├── config/
│   ├── config.yaml             # TeleMem default configuration
│   └── config.minimax.yaml     # MiniMax provider example configuration
├── data/                       # Small sample datasets for evaluation or demonstration
├── examples/                   # Code examples and tutorial demos
│   ├── quickstart.py           # Quick start
│   └── quickstart_mm.py        # Quick start (multimodal)
├── docs/
│   └── TeleMem_Tech_Report.pdf
├── telemem/                    # TeleMem code
├── tests/                      # TeleMem tests
├── README.md                   # English README
├── README-ZH.md                # Chinese README
└── pyproject.toml              # Python environment
Core Functions
Add Memory (add)
The add() method injects one or more dialogue turns into the memory system.
def add(
    self,
    messages,
    *,
    user_id: Optional[str] = None,
    agent_id: Optional[str] = None,
    run_id: Optional[str] = None,
    metadata: Optional[Dict[str, Any]] = None,
    infer: bool = True,
    memory_type: Optional[str] = None,
    prompt: Optional[str] = None,
)
Parameter Description
| Parameter | Type | Required | Description |
|---|---|---|---|
| messages | List[Dict[str, str]] | ✅ Yes | List of dialogue messages, each with role (user/assistant) and content |
| metadata | Dict[str, Any] | ✅ Yes | Must include: ・ sample_id: unique session ID ・ user: list of character names |
| user_id / agent_id / run_id | Optional[str] | ❌ No | Mem0-compatible parameters (ignored in TeleMem) |
| infer | bool | ❌ No | Whether to auto-generate memory summaries (default: True) |
| memory_type | Optional[str] | ❌ No | Memory category (auto-classified if omitted) |
| prompt | Optional[str] | ❌ No | Custom prompt for summarization (uses optimized default if omitted) |
Internal Workflow of add()
- Message preprocessing: Merge consecutive messages from the same speaker; normalize turn structure.
- Multi-perspective summarization:
  - Global event summary
  - Character 1's perspective (actions, preferences, relationships)
  - Character 2's perspective
- Vectorization & similarity search: Generate embeddings and retrieve existing similar memories.
- Batch processing: When buffer threshold is reached, invoke LLM to semantically merge similar memories.
- Persistence: Dual-write to FAISS (for retrieval) and JSON (for metadata).
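The first workflow step, merging consecutive messages from the same speaker, might look like the sketch below. It assumes the speaker is identified by the `role` field and that merged contents are joined with a newline; both are illustrative choices, and `merge_consecutive` is a hypothetical name rather than a TeleMem function.

```python
from typing import Dict, List


def merge_consecutive(messages: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Collapse runs of messages that share a role into one message."""
    merged: List[Dict[str, str]] = []
    for msg in messages:
        if merged and merged[-1]["role"] == msg["role"]:
            # Same speaker as the previous turn: append to that turn.
            merged[-1]["content"] += "\n" + msg["content"]
        else:
            # New speaker: start a new turn (copy so the input is untouched).
            merged.append(dict(msg))
    return merged


raw_turns = [
    {"role": "user", "content": "Did you take the subway?"},
    {"role": "user", "content": "Which station is closest?"},
    {"role": "assistant", "content": "Civic Center Station."},
]
turns = merge_consecutive(raw_turns)
```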
Search Memory (search)
Performs semantic vector-based retrieval of relevant memories with context-aware recall.
def search(
    self,
    query: str,
    *,
    user_id: Optional[str] = None,
    agent_id: Optional[str] = None,
    run_id: Optional[str] = None,
    limit: int = 5,
    filters: Optional[Dict[str, Any]] = None,
    threshold: Optional[float] = None,
    rerank: bool = True,
)
Parameter Description
| Parameter | Type | Required | Description |
|---|---|---|---|
| query | str | ✅ Yes | Natural language query |
| run_id | str | ✅ Yes | Session ID (must match sample_id used in add) |
| limit | int | ❌ No | Max number of results (default: 5) |
| threshold | float | ❌ No | Similarity threshold (0–1; auto-tuned if omitted) |
| filters | Dict[str, Any] | ❌ No | Custom filters (e.g., by character, time range) |
| rerank | bool | ❌ No | Whether to rerank results (default: True) |
| user_id / agent_id | Optional[str] | ❌ No | Mem0-compatible (no effect in TeleMem) |
Search is based on FAISS vector retrieval, supporting millisecond-level responses.
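To make the roles of limit and threshold concrete, the sketch below ranks stored vectors by cosine similarity with a brute-force loop. It deliberately swaps FAISS for plain Python so it runs without dependencies; `cosine`, `top_k`, and the toy two-dimensional vectors are assumptions for illustration, not TeleMem internals.

```python
import math
from typing import List, Optional, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(
    query: Sequence[float],
    memories: List[Tuple[str, Sequence[float]]],
    limit: int = 5,
    threshold: Optional[float] = None,
) -> List[Tuple[str, float]]:
    """Score every memory, drop those below `threshold`, keep the best `limit`."""
    scored = [(text, cosine(query, vec)) for text, vec in memories]
    if threshold is not None:
        scored = [pair for pair in scored if pair[1] >= threshold]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:limit]


memories = [
    ("takes the subway", [1.0, 0.0]),
    ("drives a car", [0.0, 1.0]),
    ("sometimes takes the bus", [1.0, 1.0]),
]
best_two = top_k([1.0, 0.0], memories, limit=2)
strict = top_k([1.0, 0.0], memories, threshold=0.9)
```

An index such as FAISS replaces the linear scan with an approximate or optimized nearest-neighbor lookup, but the limit and threshold semantics stay the same.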
Multimodal Extensions
Beyond text memory, TeleMem further extends multimodal capabilities. Drawing inspiration from Deep Video Discovery's Agentic Search and Tool Use approach, we implemented two core methods in the TeleMemory class to support intelligent storage and semantic retrieval of video content.
| Method | Description |
|---|---|
| add_mm() | Process video into retrievable memory (frame extraction → caption generation → vector database) |
| search_mm() | Query video content using natural language, supporting ReAct-style multi-step reasoning |
Add Multimodal Memory (add_mm)
def add_mm(
    self,
    video_path: str,
    output_dir: str,
    clip_secs: int | None = None,
    emb_dim: int | None = None,
    subtitle_path: str | None = None,
)
Parameter Description
| Parameter | Type | Required | Description |
|---|---|---|---|
| video_path | str | ✅ Yes | Source video file path, e.g., "video/3EQLFHRHpag.mp4" |
| output_dir | str | ✅ Yes | Root output directory. Artifacts are written under frames/, captions/, and vdb/ subdirectories |
| clip_secs | int | ❌ No | Reserved parameter; clip length is currently read from config.vlm["CLIP_SECS"] |
| emb_dim | int | ❌ No | Embedding dimension, reads from config by default |
| subtitle_path | str | ❌ No | Subtitle file path (.srt), optional |
add_mm() Internal Flow
- Frame Extraction (decode_video_to_frames): Decodes video to JPEG frames at configured FPS
- Caption Generation (process_video): Uses VLM (e.g., Qwen3-Omni) to generate detailed descriptions for each clip
- Vector Database Construction (init_single_video_db): Generates embeddings for semantic retrieval
💡 Smart Caching: If the target file for a stage already exists, that stage is automatically skipped to save computational resources.
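The skip-if-exists behavior can be sketched as a small wrapper around each stage. The `run_stage` helper, the `write_stub` builder, and the `log` list are hypothetical; the demo writes to a temporary directory rather than a real output_dir.

```python
import tempfile
from pathlib import Path
from typing import Callable, List


def run_stage(target: Path, build: Callable[[Path], None], log: List[str]) -> None:
    """Run `build` only when its target file is missing."""
    if target.exists():
        log.append(f"skip {target.name}")
        return
    target.parent.mkdir(parents=True, exist_ok=True)
    build(target)
    log.append(f"build {target.name}")


def write_stub(path: Path) -> None:
    # Stand-in for an expensive stage such as caption generation.
    path.write_text("{}")


log: List[str] = []
with tempfile.TemporaryDirectory() as tmp:
    captions = Path(tmp) / "captions" / "demo" / "captions.json"
    run_stage(captions, write_stub, log)  # first call builds the file
    run_stage(captions, write_stub, log)  # second call finds it and skips
```

Because the check only looks at file existence, deleting a stage's output file is enough to force that stage to run again.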
Return Value Example
{
"output_dir": "/abs/path/to/output_dir"
}
Search Multimodal Memory (search_mm)
def search_mm(
    self,
    question: str,
    output_dir: str,
    max_iterations: int = 15,
)
Parameter Description
| Parameter | Type | Required | Description |
|---|---|---|---|
| question | str | ✅ Yes | Question string (supports A/B/C/D multiple choice format) |
| output_dir | str | ✅ Yes | The same root output directory used by add_mm; it must contain exactly one captions/*/captions.json and one vdb/*/*_vdb.json |
| max_iterations | int | ❌ No | Maximum MMCoreAgent reasoning iterations (default 15) |
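Since output_dir must contain exactly one captions file and one VDB file, it can be useful to check this before calling search_mm. The sketch below is one way to do so with pathlib globbing; `locate_artifacts` is a hypothetical helper, and the demo directory only mimics the documented layout with empty JSON files.

```python
import tempfile
from pathlib import Path
from typing import Tuple


def locate_artifacts(output_dir: str) -> Tuple[Path, Path]:
    """Return (captions.json, *_vdb.json), requiring exactly one of each."""
    root = Path(output_dir)
    captions = sorted(root.glob("captions/*/captions.json"))
    vdbs = sorted(root.glob("vdb/*/*_vdb.json"))
    if len(captions) != 1 or len(vdbs) != 1:
        raise FileNotFoundError(
            f"expected exactly one captions.json and one *_vdb.json under {root}, "
            f"found {len(captions)} and {len(vdbs)}"
        )
    return captions[0], vdbs[0]


# Demo on a throwaway directory that mimics the layout written by add_mm.
with tempfile.TemporaryDirectory() as tmp:
    for rel in ("captions/demo/captions.json", "vdb/demo/demo_vdb.json"):
        path = Path(tmp) / rel
        path.parent.mkdir(parents=True)
        path.write_text("{}")
    captions_path, vdb_path = locate_artifacts(tmp)
    found = (captions_path.name, vdb_path.name)
```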
🛠️ ReAct-Style Reasoning Tools
search_mm internally uses MMCoreAgent, employing a THINK → ACTION → OBSERVATION loop with three specialized tools:
| Tool Name | Function |
|---|---|
| global_browse_tool | Get global overview of video events and themes |
| clip_search_tool | Search for specific content using semantic queries |
| frame_inspect_tool | Inspect frame details within a specific time range |
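The THINK → ACTION → OBSERVATION loop can be sketched generically as below. This is not MMCoreAgent: the `react_loop` function, the scripted policy standing in for the LLM, and the stub tool outputs are all made up for illustration; only the tool names come from the table above.

```python
from typing import Callable, Dict, List, Tuple

Tool = Callable[[str], str]
# (thought, tool name or "finish", tool argument or final answer)
Step = Tuple[str, str, str]


def react_loop(
    policy: Callable[[List[str]], Step],
    tools: Dict[str, Tool],
    max_iterations: int = 15,
) -> str:
    """Alternate between choosing an action and observing its result."""
    observations: List[str] = []
    for _ in range(max_iterations):
        thought, action, argument = policy(observations)  # THINK
        if action == "finish":
            return argument
        observations.append(tools[action](argument))  # ACTION -> OBSERVATION
    return "no answer within the iteration budget"


def scripted_policy(observations: List[str]) -> Step:
    # Stand-in for the LLM: browse first, then search, then answer.
    if not observations:
        return ("Get an overview first.", "global_browse_tool", "")
    if len(observations) == 1:
        return ("Look for the cause.", "clip_search_tool", "cause of the problems")
    return ("Enough evidence gathered.", "finish", "B")


stub_tools: Dict[str, Tool] = {
    "global_browse_tool": lambda _: "overview: stub summary of the video",
    "clip_search_tool": lambda query: f"stub clip matching '{query}'",
}
answer = react_loop(scripted_policy, stub_tools)
```

The max_iterations cap bounds cost when the policy never decides to finish, which mirrors the role of the max_iterations parameter of search_mm.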
Multimodal Example
Run the multimodal demo:
python examples/quickstart_mm.py
On the first run, frames, captions and VDB JSON will be generated under the chosen output_dir. The repository ships a small sample video; generating captions and the video database still requires configured VLM and embedding services unless you already have these artifacts locally.
Complete code example:
import telemem as mem0
from pathlib import Path
from telemem.mm_utils.core import extract_choice_from_msg
# Initialize
memory = mem0.Memory()
# Define paths
repo_root = Path(__file__).resolve().parents[1]
video_path = repo_root / "data" / "samples" / "video" / "3EQLFHRHpag.mp4"
video_name = video_path.stem
output_dir = video_path.parent
# Step 1: Add video to memory (auto-processing)
vdb_json_path = output_dir / "vdb" / video_name / f"{video_name}_vdb.json"
if not vdb_json_path.exists():
    result = memory.add_mm(
        video_path=str(video_path),
        output_dir=str(output_dir),
    )
    print(f"Video processing complete: {result}")
else:
    print(f"VDB already exists: {vdb_json_path}")
# Step 2: Query video content
question = """The problems people encounter in the video are caused by what?
(A) Catastrophic weather.
(B) Global warming.
(C) Financial crisis.
(D) Oil crisis.
"""
messages = memory.search_mm(
question=question,
output_dir=str(output_dir),
max_iterations=15,
)
# Extract final answer
answer = extract_choice_from_msg(messages)
print(f"Answer: ({answer})")
Data Storage
Text Memory Storage
TeleMem automatically creates a structured storage layout under ./faiss_db/, organized by session and character:
faiss_db/
├── session_001_events.index
├── session_001_events_meta.json
├── session_001_person_1.index
├── session_001_person_1_meta.json
├── session_001_person_2.index
└── session_001_person_2_meta.json
Metadata Example (_meta.json)
{
"summary": "Characters discussed the upcoming action plan.",
"sample_id": "session_001",
"round_index": 3,
"timestamp": "2024-01-01T00:00:00Z",
"user": "Jordan" // Only present in person_*_meta.json
}
All memories include summary, round number, timestamp, and character, facilitating auditing and debugging.
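The naming pattern in the layout above (one events pair plus one pair per character, each consisting of an .index file and a _meta.json file) can be expressed as a short helper. `session_files` is a hypothetical function that only reproduces the file names shown; it does not read or write any index.

```python
from pathlib import Path
from typing import List


def session_files(root: str, sample_id: str, num_characters: int) -> List[Path]:
    """List the index/metadata paths expected for one session."""
    stems = [f"{sample_id}_events"] + [
        f"{sample_id}_person_{i}" for i in range(1, num_characters + 1)
    ]
    files: List[Path] = []
    for stem in stems:
        files.append(Path(root) / f"{stem}.index")      # FAISS index
        files.append(Path(root) / f"{stem}_meta.json")  # JSON metadata
    return files


expected = [p.name for p in session_files("faiss_db", "session_001", 2)]
```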
Multimodal Memory Storage
TeleMem generates video-related storage files in the ./data/samples/video/ directory:
video/
├── frames/
│   └── <video_name>/
│       └── frames/
│           ├── frame_000001_n0.00.jpg
│           ├── frame_000002_n0.50.jpg
│           └── ...
├── captions/
│   └── <video_name>/
│       ├── captions.json       # Clip descriptions + subject registry
│       └── ckpt/               # Checkpoint for resume
│           ├── 0_10.json
│           └── 10_20.json
└── vdb/
    └── <video_name>/
        └── <video_name>_vdb.json   # Semantic retrieval vector database
captions.json Structure
{
"0_10": {
"caption": "The narrator discusses climate data, showing melting glaciers..."
},
"10_20": {
"caption": "Scene shifts to coastal communities affected by rising sea levels..."
},
"subject_registry": {
"narrator": {
"name": "narrator",
"appearance": ["professional attire"],
"identity": ["climate scientist"],
"first_seen": "00:00:00"
}
}
}
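A file with this structure mixes per-clip entries with the subject_registry entry, so a reader typically separates the two. The sketch below assumes clip keys have the form `<start>_<end>` with integer bounds (the unit is not stated here and is presumed to be seconds); `split_captions` is a hypothetical helper.

```python
import json
from typing import Dict, List, Tuple


def split_captions(raw: str) -> Tuple[List[Tuple[int, int, str]], Dict[str, dict]]:
    """Separate clip captions from the subject registry and sort clips by start."""
    data = json.loads(raw)
    registry = data.pop("subject_registry", {})
    clips: List[Tuple[int, int, str]] = []
    for key, value in data.items():
        start, end = (int(part) for part in key.split("_"))
        clips.append((start, end, value["caption"]))
    clips.sort()
    return clips, registry


sample = """
{
  "10_20": {"caption": "Scene shifts to coastal communities."},
  "0_10": {"caption": "The narrator discusses climate data."},
  "subject_registry": {"narrator": {"name": "narrator"}}
}
"""
clips, registry = split_captions(sample)
```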
Development and Contribution
- Issues and pull requests are welcome.
- Chinese documentation: README-ZH.md
License
Acknowledgements
TeleMem's development has been deeply inspired by open-source communities and cutting-edge research. We extend our sincere gratitude to the following projects and teams:
Made with ❤️ by Bloo-Mind AI Ltd and the Ubiquitous AGI team at TeleAI.
