hkuds

Professional software vendor delivering innovative solutions on the Softono platform. Specialized in both open-source and proprietary software development.

Visit Website

Total Products

Software by hkuds

Open Source

nanobot

![nanobot README cover](./images/readme-cover.png) <div align="center"> <p> <a href="https://nanobot.wiki/docs/latest/getting-started/nanobot-overview">English</a> | <a href="https://nanobot.wiki/cn/docs/latest/getting-started/nanobot-overview">简体中文</a> | <a href="https://nanobot.wiki/zh-Hant/docs/latest/getting-started/nanobot-overview">繁體中文</a> | <a href="https://nanobot.wiki/es/docs/latest/getting-started/nanobot-overview">Español</a> | <a href="https://nanobot.wiki/fr/docs/latest/getting-started/nanobot-overview">Français</a> | <a href="https://nanobot.wiki/id/docs/latest/getting-started/nanobot-overview">Bahasa Indonesia</a> | <a href="https://nanobot.wiki/ja/docs/latest/getting-started/nanobot-overview">日本語</a> | <a href="https://nanobot.wiki/ko/docs/latest/getting-started/nanobot-overview">한국어</a> | <a href="https://nanobot.wiki/ru/docs/latest/getting-started/nanobot-overview">Русский</a> | <a href="https://nanobot.wiki/vi/docs/latest/getting-started/nanobot-overview">Tiếng Việt</a> </p> <p> <a href="https://pypi.org/project/nanobot-ai/"><img src="https://img.shields.io/pypi/v/nanobot-ai" alt="PyPI"></a> <a href="https://pepy.tech/project/nanobot-ai"><img src="https://static.pepy.tech/badge/nanobot-ai" alt="Downloads"></a> <img src="https://img.shields.io/badge/python-≥3.11-blue" alt="Python"> <img src="https://img.shields.io/badge/license-MIT-green" alt="License"> <a href="https://github.com/HKUDS/nanobot/graphs/commit-activity" target="_blank"> <img alt="Commits last month" src="https://img.shields.io/github/commit-activity/m/HKUDS/nanobot?labelColor=%20%2332b583&color=%20%2312b76a"></a> <a href="https://github.com/HKUDS/nanobot/issues?q=is%3Aissue%20is%3Aclosed" target="_blank"> <img alt="Issues closed" src="https://img.shields.io/github/issues-search?query=repo%3AHKUDS%2Fnanobot%20is%3Aissue%20is%3Aclosed&label=issues%20closed&labelColor=%20%237d89b0&color=%20%235d6b98"></a> <a href="https://twitter.com/intent/follow?screen_name=nanobot_project" target="_blank"> <img src="https://img.shields.io/twitter/follow/nanobot_project?logo=X&color=%20%23f5f5f5" alt="follow on X(Twitter)"></a> <a href="https://nanobot.wiki/docs/latest/getting-started/nanobot-overview"><img src="https://img.shields.io/badge/Docs-nanobot.wiki-blue?style=flat&logo=readthedocs&logoColor=white" alt="Docs"></a> <a href="./COMMUNICATION.md"><img src="https://img.shields.io/badge/Feishu-Group-E9DBFC?style=flat&logo=feishu&logoColor=white" alt="Feishu"></a> <a href="./COMMUNICATION.md"><img src="https://img.shields.io/badge/WeChat-Group-C5EAB4?style=flat&logo=wechat&logoColor=white" alt="WeChat"></a> <a href="https://discord.gg/MnCvHqpUGB"><img src="https://img.shields.io/badge/Discord-Community-5865F2?style=flat&logo=discord&logoColor=white" alt="Discord"></a> </p> </div> 🐈 **nanobot** is an open-source, ultra-lightweight personal AI agent you can truly own. It keeps the agent core small and readable while giving you the practical pieces for real long-running work: WebUI, chat channels, tools, memory, MCP, model routing, automation, and deployment. ## Start Here | You want to... | Go to | |---|---| | Install nanobot with no terminal/config background | [Start Without Technical Background](./docs/start-without-technical-background.md) | | Install quickly and get one CLI reply | [Install](#-install) and [Quick Start](#-quick-start) | | Open the bundled browser UI after the CLI works | [WebUI](#-webui) | | Connect Telegram, Discord, WeChat, Slack, Email, or another chat app | [Chat Apps](./docs/chat-apps.md) | | Configure providers, fallback models, Langfuse, MCP, web tools, or security | [Docs](./docs/README.md) and [Configuration](./docs/configuration.md) | | Understand or extend the internals | [Architecture](./docs/architecture.md) and [Development](./docs/development.md) | ## 📢 News - **2026-06-01** 🚀 Released **v0.2.1** — **The Workbench Release** turns the packaged WebUI into a daily agent workbench: clearer Thought/response timelines, live file-edit activity, project workspaces, model and context controls, steadier sustained goals, CLI Apps + MCP extensions, and broader provider/channel support. Please see [release notes](https://github.com/HKUDS/nanobot/releases/tag/v0.2.1) for details. - **2026-05-30** 🔐 Safer Matrix verification, bounded media downloads, clearer WebUI model timeline. - **2026-05-29** 🧩 Extension registry, context-window tuning, document extraction controls. - **2026-05-28** 🗂️ Project workspaces, access controls, steadier goals and streaming. - **2026-05-27** ⏱️ Codex streams respect idle timeouts during long runs. - **2026-05-26** 📡 Telegram webhooks, refreshed Kagi search, cleaner transport errors. - **2026-05-25** 🔌 Unified CLI Apps and MCP, Step Plan support, steadier sustained goals. - **2026-05-24** 🧰 MCP presets, richer slash actions, configurable OpenAI-compatible requests. - **2026-05-23** 🖼️ Zhipu image generation, longer exec windows, cleaner transcription config. - **2026-05-22** 🛠️ CLI Apps, more image providers, safer web redirects and edits. <details> <summary>Earlier news</summary> - **2026-05-21** ⚡ Novita provider, faster sidebar, smoother coding tools and Weixin replies. - **2026-05-20** 📶 Signal channel, faster gateway startup, multilingual README links. - **2026-05-19** 🎨 Image provider registry, StepFun and Skywork, stronger WebUI controls. - **2026-05-18** 🖌️ Gemini and MiniMax images, Ant Ling, live file-edit activity. - **2026-05-17** 🌊 Smoother WebUI streaming, AutoCompact fixes, buffered CLI reasoning. - **2026-05-16** 🧠 Atomic Chat provider, goal-aware timeouts, safer exec URL handling. - **2026-05-15** 🚀 Released **v0.2.0** — **`/goal`** holds sustained objectives across turns, WebUI now ships inside the wheel, image generation end to end, 5 new providers with `fallback_models`, and a real agent-loop refactor. Please see [release notes](https://github.com/HKUDS/nanobot/releases/tag/v0.2.0) for details. - **2026-05-14** 🎯 **`/goal`** for long-term objectives, visible multi-step progress, long-horizon missions in chat. - **2026-05-13** 🧠 Streaming reasoning before answers, automatic backup models, smoother plug-in reconnects. - **2026-05-12** 🎛️ Saved model presets with WebUI badge, simpler plug-in tools, quieter Feishu topic threads. - **2026-05-11** 🖥️ NVIDIA NIM support, terminal bot name and icon, streamed reasoning and MiMo toggle clarity. - **2026-05-09** 🖼️ Sharper image replay, BYO web-search keys in Settings, Feishu threads routed cleanly. - **2026-05-08** ✨ Inline chat image, redesigned Settings and keys, Dream memory aligned with visible history. - **2026-05-07** 📜 Locale-aware slash palette in WebUI, LAN login, faithful HTTP streaming responses. - **2026-05-06** 🧩 Tunable tool hint, steadier voice and plug-in startups, schedules and reminders that stick. - **2026-05-05** 🛡️ Quiet deny for unknown Telegram chats, Dream cleanup, fuller automation summaries. - **2026-05-04** 🔐 Safer DingTalk outbound media links, durable cron persistence, DeepSeek polish. - **2026-05-03** ⚙️ Predictable shell allow-list behavior, isolated chats mid-reply, cleaner interactive retries. - **2026-05-02** 🐈 LongCat support, smarter token sizing hints, clearer bundled upgrade guidance. - **2026-05-01** ☁️ Native AWS Bedrock provider, tighter helper handoffs and scoped session files. - **2026-04-30** 💬 Feishu threads that honor replies and topics, WhatsApp bridge refresh on source edits. - **2026-04-29** 🚀 Released **v0.1.5.post3** — Smarter threads on Feishu, Discord, Slack, and Teams; **DeepSeek-V4**; Hugging Face & Olostep; choices, `/history`, and steadier long chats. Please see [release notes](https://github.com/HKUDS/nanobot/releases/tag/v0.1.5.post3) for details. - **2026-04-28** 🌐 Olostep web search, Hugging Face provider, safer workspace-tool interruptions. - **2026-04-27** 💬 `/history` command, smarter session replay caps, smoother Discord / Slack threads. - **2026-04-26** 🧭 Natural cron reminders, thread-aware restarts, safer local provider and shell behavior. - **2026-04-25** 🧩 `ask_user` choices, macOS LaunchAgent deployment, MSTeams stale-reference cleanup. - **2026-04-24** 🎥 Video attachments for channels, DeepSeek thinking control, faster document startup. - **2026-04-23** 🧵 Discord thread sessions, Telegram inline buttons, structured tool progress updates. - **2026-04-22** 🔎 GitHub Copilot GPT-5 / o-series support, configurable web fetch, WebUI image uploads. - **2026-04-21** 🚀 Released **v0.1.5.post2** — Windows & Python 3.14 support, Office document reading, SSE streaming for the OpenAI-compatible API, and stronger reliability across sessions, memory, and channels. Please see [release notes](https://github.com/HKUDS/nanobot/releases/tag/v0.1.5.post2) for details. - **2026-04-20** 🎨 Kimi K2.6 support, Telegram long-message split, WebUI typography & dark-mode polish. - **2026-04-19** 🌐 WebUI i18n locale switcher, atomic session writes with auto-repair. - **2026-04-18** 🧪 Initial WebUI chat, smarter setup wizard menus, WebSocket multi-chat multiplexing. - **2026-04-17** 🪟 Windows & Python 3.14 CI, Dream line-age memory, email self-loop guard. - **2026-04-16** 📡 SSE streaming for OpenAI-compatible API, Discord channel allow-list. - **2026-04-15** 🎛️ LM Studio & nullable API keys, MiniMax thinking endpoint, runtime SelfTool. - **2026-04-14** 🚀 Released **v0.1.5.post1** — Dream skill discovery, mid-turn follow-up injection, WebSocket channel, and deeper channel integrations. Please see [release notes](https://github.com/HKUDS/nanobot/releases/tag/v0.1.5.post1) for details. - **2026-04-13** 🛡️ Agent turn hardened — user messages persisted early, auto-compact skips active tasks. - **2026-04-12** 🔒 Lark global domain support, Dream learns discovered skills, shell sandbox tightened. - **2026-04-11** ⚡ Context compact shrinks sessions on the fly; Kagi web search; QQ & WeCom full media. - **2026-04-10** 📓 Multiple MCP servers, Feishu streaming & done-emoji. - **2026-04-09** 🔌 WebSocket channel, unified cross-channel session, `disabled_skills` config. - **2026-04-08** 📤 API file uploads, OpenAI reasoning auto-routing with Responses fallback. - **2026-04-07** 🧠 Anthropic adaptive thinking, MCP resources & prompts exposed as tools. - **2026-04-06** 🛰️ Langfuse observability, unified Whisper transcription, email attachments. - **2026-04-05** 🚀 Released **v0.1.5** — sturdier long-running tasks, Dream two-stage memory, production-ready sandboxing and programming Agent SDK. Please see [release notes](https://github.com/HKUDS/nanobot/releases/tag/v0.1.5) for details. - **2026-04-04** 🚀 Jinja2 response templates, Dream memory hardened, smarter retry handling. - **2026-04-03** 🧠 Xiaomi MiMo provider, chain-of-thought reasoning visible, Telegram UX polish. - **2026-04-02** 🧱 Long-running tasks run more reliably — core runtime hardening. - **2026-04-01** 🔑 GitHub Copilot auth restored; stricter workspace paths; OpenRouter Claude caching fix. - **2026-03-31** 🛰️ WeChat multimodal alignment, Discord/Matrix polish, Python SDK facade, MCP and tool fixes. - **2026-03-30** 🧩 OpenAI-compatible API tightened; composable agent lifecycle hooks. - **2026-03-29** 💬 WeChat voice, typing, QR/media resilience; fixed-session OpenAI-compatible API. - **2026-03-28** 📚 Provider docs refresh; skill template wording fix. - **2026-03-27** 🚀 Released **v0.1.4.post6** — architecture decoupling, litellm removal, end-to-end streaming, WeChat channel, and a security fix. Please see [release notes](https://github.com/HKUDS/nanobot/releases/tag/v0.1.4.post6) for details. - **2026-03-26** 🏗️ Agent runner extracted and lifecycle hooks unified; stream delta coalescing at boundaries. - **2026-03-25** 🌏 StepFun provider, configurable timezone, Gemini thought signatures. - **2026-03-24** 🔧 WeChat compatibility, Feishu CardKit streaming, test suite restructured. - **2026-03-23** 🔧 Command routing refactored for plugins, WhatsApp/WeChat media, unified channel login CLI. - **2026-03-22** ⚡ End-to-end streaming, WeChat channel, Anthropic cache optimization, `/status` command. - **2026-03-21** 🔒 Replace `litellm` with native `openai` + `anthropic` SDKs. Please see [commit](https://github.com/HKUDS/nanobot/commit/3dfdab7). - **2026-03-20** 🧙 Interactive setup wizard — pick your provider, model autocomplete, and you're good to go. - **2026-03-19** 💬 Telegram gets more resilient under load; Feishu now renders code blocks properly. - **2026-03-18** 📷 Telegram can now send media via URL. Cron schedules show human-readable details. - **2026-03-17** ✨ Feishu formatting glow-up, Slack reacts when done, custom endpoints support extra headers, and image handling is more reliable. - **2026-03-16** 🚀 Released **v0.1.4.post5** — a refinement-focused release with stronger reliability and channel support, and a more dependable day-to-day experience. Please see [release notes](https://github.com/HKUDS/nanobot/releases/tag/v0.1.4.post5) for details. - **2026-03-15** 🧩 DingTalk rich media, smarter built-in skills, and cleaner model compatibility. - **2026-03-14** 💬 Channel plugins, Feishu replies, and steadier MCP, QQ, and media handling. - **2026-03-13** 🌐 Multi-provider web search, LangSmith, and broader reliability improvements. - **2026-03-12** 🚀 VolcEngine support, Telegram reply context, `/restart`, and sturdier memory. - **2026-03-11** 🔌 WeCom, Ollama, cleaner discovery, and safer tool behavior. - **2026-03-10** 🧠 Token-based memory, shared retries, and cleaner gateway and Telegram behavior. - **2026-03-09** 💬 Slack thread polish and better Feishu audio compatibility. - **2026-03-08** 🚀 Released **v0.1.4.post4** — a reliability-packed release with safer defaults, better multi-instance support, sturdier MCP, and major channel and provider improvements. Please see [release notes](https://github.com/HKUDS/nanobot/releases/tag/v0.1.4.post4) for details. - **2026-03-07** 🚀 Azure OpenAI provider, WhatsApp media, QQ group chats, and more Telegram/Feishu polish. - **2026-03-06** 🪄 Lighter providers, smarter media handling, and sturdier memory and CLI compatibility. - **2026-03-05** ⚡️ Telegram draft streaming, MCP SSE support, and broader channel reliability fixes. - **2026-03-04** 🛠️ Dependency cleanup, safer file reads, and another round of test and Cron fixes. - **2026-03-03** 🧠 Cleaner user-message merging, safer multimodal saves, and stronger Cron guards. - **2026-03-02** 🛡️ Safer default access control, sturdier Cron reloads, and cleaner Matrix media handling. - **2026-03-01** 🌐 Web proxy support, smarter Cron reminders, and Feishu rich-text parsing improvements. - **2026-02-28** 🚀 Released **v0.1.4.post3** — cleaner context, hardened session history, and smarter agent. Please see [release notes](https://github.com/HKUDS/nanobot/releases/tag/v0.1.4.post3) for details. - **2026-02-27** 🧠 Experimental thinking mode support, DingTalk media messages, Feishu and QQ channel fixes. - **2026-02-26** 🛡️ Session poisoning fix, WhatsApp dedup, Windows path guard, Mistral compatibility. - **2026-02-25** 🧹 New Matrix channel, cleaner session context, auto workspace template sync. - **2026-02-24** 🚀 Released **v0.1.4.post2** — a reliability-focused release with a redesigned heartbeat, prompt cache optimization, and hardened provider & channel stability. See [release notes](https://github.com/HKUDS/nanobot/releases/tag/v0.1.4.post2) for details. - **2026-02-23** 🔧 Virtual tool-call heartbeat, prompt cache optimization, Slack mrkdwn fixes. - **2026-02-22** 🛡️ Slack thread isolation, Discord typing fix, agent reliability improvements. - **2026-02-21** 🎉 Released **v0.1.4.post1** — new providers, media support across channels, and major stability improvements. See [release notes](https://github.com/HKUDS/nanobot/releases/tag/v0.1.4.post1) for details. - **2026-02-20** 🐦 Feishu now receives multimodal files from users. More reliable memory under the hood. - **2026-02-19** ✨ Slack now sends files, Discord splits long messages, and subagents work in CLI mode. - **2026-02-18** ⚡️ nanobot now supports VolcEngine, MCP custom auth headers, and Anthropic prompt caching. - **2026-02-17** 🎉 Released **v0.1.4** — MCP support, progress streaming, new providers, and multiple channel improvements. Please see [release notes](https://github.com/HKUDS/nanobot/releases/tag/v0.1.4) for details. - **2026-02-16** 🦞 nanobot now integrates a [ClawHub](https://clawhub.ai) skill — search and install public agent skills. - **2026-02-15** 🔑 nanobot now supports OpenAI Codex provider with OAuth login support. - **2026-02-14** 🔌 nanobot now supports MCP! See [MCP section](./docs/configuration.md#mcp-model-context-protocol) for details. - **2026-02-13** 🎉 Released **v0.1.3.post7** — includes security hardening and multiple improvements. **Please upgrade to the latest version to address security issues**. See [release notes](https://github.com/HKUDS/nanobot/releases/tag/v0.1.3.post7) for more details. - **2026-02-12** 🧠 Redesigned memory system — Less code, more reliable. Join the [discussion](https://github.com/HKUDS/nanobot/discussions/566) about it! - **2026-02-11** ✨ Enhanced CLI experience and added MiniMax support! - **2026-02-10** 🎉 Released **v0.1.3.post6** with improvements! Check the updates [notes](https://github.com/HKUDS/nanobot/releases/tag/v0.1.3.post6) and our [roadmap](https://github.com/HKUDS/nanobot/discussions/431). - **2026-02-09** 💬 Added Slack, Email, and QQ support — nanobot now supports multiple chat platforms! - **2026-02-08** 🔧 Refactored Providers—adding a new LLM provider now takes just 2 simple steps! Check [here](./docs/configuration.md#providers). - **2026-02-07** 🚀 Released **v0.1.3.post5** with Qwen support & several key improvements! Check [here](https://github.com/HKUDS/nanobot/releases/tag/v0.1.3.post5) for details. - **2026-02-06** ✨ Added Moonshot/Kimi provider, Discord integration, and enhanced security hardening! - **2026-02-05** ✨ Added Feishu channel, DeepSeek provider, and enhanced scheduled tasks support! - **2026-02-04** 🚀 Released **v0.1.3.post4** with multi-provider & Docker support! Check [here](https://github.com/HKUDS/nanobot/releases/tag/v0.1.3.post4) for details. - **2026-02-03** ⚡ Integrated vLLM for local LLM support and improved natural language task scheduling! - **2026-02-02** 🎉 nanobot officially launched! Welcome to try 🐈 nanobot! </details> ## 💡 Why nanobot - **Persistent workflows**: goals, memory, tools, and chat context survive long-running work. - **Chat-native reach**: WebUI, API, Telegram, Feishu, Slack, Discord, Teams, and email. - **Model freedom**: OpenAI-compatible APIs, local LLMs, image generation, search, and fallbacks. - **Small core**: readable internals with MCP, memory, deployment, and automation built in. - **Own your stack**: inspect, customize, self-host, and extend without a giant platform. ## 📦 Install > [!IMPORTANT] > If you want the newest features and experiments, install from source. > > If you want the most stable day-to-day experience, install from PyPI or with `uv`. Pick **one** install method: Prerequisites: Python 3.11 or newer. Git is only needed for a source install; Node.js/Bun are only needed if you are developing the WebUI itself. If terminals, API keys, or config files are new to you, use the guided zero-background walkthrough in [Start Without Technical Background](./docs/start-without-technical-background.md) instead of this compact README path. **One-command setup** macOS / Linux: ```bash sh -c "$(curl -fsSL https://raw.githubusercontent.com/HKUDS/nanobot/main/scripts/install.sh)" ``` Windows PowerShell: ```powershell irm https://raw.githubusercontent.com/HKUDS/nanobot/main/scripts/install.ps1 | iex ``` The default command installs or upgrades `nanobot-ai` from PyPI, then starts `nanobot onboard --wizard`. If you finish the wizard and save the config, skip the manual initialize/configure steps below and go straight to **Test one message**. To preview the plan without changing your environment, pass `--dry-run`; combine it with `--dev` when you want to preview the main-branch install. ```bash sh -c "$(curl -fsSL https://raw.githubusercontent.com/HKUDS/nanobot/main/scripts/install.sh)" -- --dry-run ``` ```powershell & ([scriptblock]::Create((irm https://raw.githubusercontent.com/HKUDS/nanobot/main/scripts/install.ps1))) --dry-run ``` To install the current `main` branch instead, pass `--dev`: ```bash sh -c "$(curl -fsSL https://raw.githubusercontent.com/HKUDS/nanobot/main/scripts/install.sh)" -- --dev ``` ```powershell & ([scriptblock]::Create((irm https://raw.githubusercontent.com/HKUDS/nanobot/main/scripts/install.ps1))) --dev ``` If you prefer to inspect the script first, open [`scripts/install.sh`](./scripts/install.sh) or [`scripts/install.ps1`](./scripts/install.ps1). **Install from PyPI** ```bash python -m pip install nanobot-ai ``` **Install with `uv`** ```bash uv tool install nanobot-ai ``` **Install from source** ```bash git clone https://github.com/HKUDS/nanobot.git cd nanobot python -m pip install -e . ``` Verify the install: ```bash nanobot --version ``` ## 🚀 Quick Start **1. Initialize** Skip this step if the one-command setup already started the wizard and you saved the config there. ```bash nanobot onboard ``` Use `nanobot onboard --wizard` if you prefer an interactive setup. **2. Configure** (`~/.nanobot/config.json`) Skip this step if you already configured provider and model settings in the wizard. `nanobot onboard` creates `~/.nanobot/config.json` and `~/.nanobot/workspace/`. Configure these **two parts** in the config file. Add or merge the following blocks into the existing file instead of replacing the whole file. The example below uses [OpenRouter](https://openrouter.ai/keys) only so the JSON has concrete names. Provider examples are recipes, not rankings or endorsements. If you use another provider, replace the provider config key, API key, preset provider name, and model ID together. *Set your API key*: ```json { "providers": { "openrouter": { "apiKey": "sk-or-v1-xxx" } } } ``` *Set a model preset and make it active*: ```json { "modelPresets": { "primary": { "label": "Primary", "provider": "openrouter", "model": "anthropic/claude-opus-4.5", "maxTokens": 8192, "contextWindowTokens": 65536, "temperature": 0.1 } }, "agents": { "defaults": { "modelPreset": "primary" } } } ``` Direct `agents.defaults.provider` and `agents.defaults.model` still work for existing configs, but named presets are the recommended path because they also power `/model` switching and `fallbackModels`. For another provider, the same config shape still applies: | Replace | Where | |---|---| | Provider config key | `providers.<provider>` | | API key | `providers.<provider>.apiKey` | | Preset provider name | `modelPresets.primary.provider` | | Model ID | `modelPresets.primary.model` | | Endpoint URL, only when needed | `providers.<provider>.apiBase` | **3. Test one message** ```bash nanobot status nanobot agent -m "Hello!" ``` In `nanobot status`, it is normal for most providers to say `not set`. The active preset's provider should be configured, and `Config` plus `Workspace` should show check marks. If that works, start an interactive chat: ```bash nanobot agent ``` Need help with `PATH`, API keys, provider/model matching, or JSON errors? See the fuller [Install and Quick Start](./docs/quick-start.md) and [Troubleshooting](./docs/troubleshooting.md). - Want a pasteable provider setup? See [Provider Cookbook](./docs/provider-cookbook.md) - Want to understand provider/model matching? See [Providers and Models](./docs/providers.md) - Want web search, MCP, security settings, or more config options? See [Configuration](./docs/configuration.md) - Want to run locally? See [Ollama](./docs/providers.md#ollama), [vLLM or another local OpenAI-compatible server](./docs/providers.md#vllm-or-other-local-openai-compatible-server), and the full [provider reference](./docs/configuration.md#providers). - Want to run nanobot in chat apps like Telegram, Discord, WeChat or Feishu? See [Chat Apps](./docs/chat-apps.md) - Want Docker or Linux service deployment? See [Deployment](./docs/deployment.md) ## 🌐 WebUI The WebUI ships **inside the published wheel** — no extra build step. Just enable the WebSocket channel and open it in your browser. <p align="center"> <img src="images/nanobot_webui.png" alt="nanobot webui preview" width="900"> </p> **1. Enable the WebSocket channel in `~/.nanobot/config.json`** Merge this block into your existing config: ```json { "channels": { "websocket": { "enabled": true } } } ``` **2. Start the gateway** ```bash nanobot gateway ``` **3. Open the WebUI** Visit [`http://127.0.0.1:8765`](http://127.0.0.1:8765) in your browser. To open it from another device on your LAN, see [WebUI docs → LAN access](./webui/README.md#access-from-another-device-lan). The WebUI is served by the WebSocket channel on port `8765` by default. The gateway's `18790` port is for the health endpoint, not the browser UI. > [!TIP] > Working on the WebUI itself? Check out [`webui/README.md`](./webui/README.md) for the Vite dev server (HMR) workflow. ## 🏗️ Architecture <p align="center"> <img src="images/nanobot_arch.png" alt="nanobot architecture" width="800"> </p> 🐈 nanobot stays lightweight by centering everything around a small agent loop: messages come in from chat apps, the LLM decides when tools are needed, and memory or skills are pulled in only as context instead of becoming a heavy orchestration layer. That keeps the core path readable and easy to extend, while still letting you add channels, tools, memory, and deployment options without turning the system into a monolith. ## ✨ Features <table align="center"> <tr align="center"> <th><p align="center">📈 24/7 Real-Time Market Analysis</p></th> <th><p align="center">🚀 Full-Stack Software Engineer</p></th> <th><p align="center">📅 Smart Daily Routine Manager</p></th> <th><p align="center">📚 Personal Knowledge Assistant</p></th> </tr> <tr> <td align="center"><p align="center"><img src="case/search.gif" width="180" height="400"></p></td> <td align="center"><p align="center"><img src="case/code.gif" width="180" height="400"></p></td> <td align="center"><p align="center"><img src="case/schedule.gif" width="180" height="400"></p></td> <td align="center"><p align="center"><img src="case/memory.gif" width="180" height="400"></p></td> </tr> <tr> <td align="center">Discovery • Insights • Trends</td> <td align="center">Develop • Deploy • Scale</td> <td align="center">Schedule • Automate • Organize</td> <td align="center">Learn • Memory • Reasoning</td> </tr> </table> ## 📚 Docs Browse the [repo docs](./docs/README.md) for the latest features and GitHub development version, or visit [nanobot.wiki](https://nanobot.wiki/docs/latest/getting-started/nanobot-overview) for the stable release documentation. - Start with no technical background: [Start Without Technical Background](./docs/start-without-technical-background.md) - Start from zero with developer basics: [Install and Quick Start](./docs/quick-start.md) - Understand the runtime model: [Concepts](./docs/concepts.md) - Read the source-level map: [Architecture](./docs/architecture.md) - Choose a provider/model: [Providers and Models](./docs/providers.md) - Copy provider setup recipes: [Provider Cookbook](./docs/provider-cookbook.md) - Debug setup and runtime failures: [Troubleshooting](./docs/troubleshooting.md) - Talk to your nanobot with familiar chat apps: [Chat Apps](./docs/chat-apps.md) - Configure providers, web search, MCP, and runtime behavior: [Configuration](./docs/configuration.md) - Integrate nanobot with local tools and automations: [OpenAI-Compatible API](./docs/openai-api.md) · [Python SDK](./docs/python-sdk.md) - Run nanobot with Docker or as a Linux service: [Deployment](./docs/deployment.md) ## 🤝 Contribute & Roadmap PRs welcome! The codebase is intentionally small and readable. 🤗 ### Contribution Flow See [CONTRIBUTING.md](./CONTRIBUTING.md) for setup, review, and contribution guidelines. **Roadmap** — Pick an item and [open a PR](https://github.com/HKUDS/nanobot/pulls)! - **Multi-modal** — See and hear (images, voice, video) - **Long-term memory** — Never forget important context - **Better reasoning** — Multi-step planning and reflection - **More integrations** — Calendar and more - **Self-improvement** — Learn from feedback and mistakes ## Contact This project was started by [Xubin Ren](https://github.com/re-bin) as a personal open-source project and continues to be maintained in an individual capacity using personal resources, with contributions from the open-source community. Feel free to contact [[email protected]](mailto:[email protected]) for questions, ideas, or collaboration. ### Contributors <a href="https://github.com/HKUDS/nanobot/graphs/contributors"> <img src="https://contrib.rocks/image?repo=HKUDS/nanobot&max=100&columns=12&updated=20260210" alt="Contributors" /> </a> ## ⭐ Star History <div align="center"> <a href="https://star-history.com/#HKUDS/nanobot&Date"> <picture> <source media="(prefers-color-scheme: dark)" srcset="https://api.star-history.com/svg?repos=HKUDS/nanobot&type=Date&theme=dark" /> <source media="(prefers-color-scheme: light)" srcset="https://api.star-history.com/svg?repos=HKUDS/nanobot&type=Date" /> <img alt="Star History Chart" src="https://api.star-history.com/svg?repos=HKUDS/nanobot&type=Date" style="border-radius: 15px; box-shadow: 0 0 30px rgba(0, 217, 255, 0.3);" /> </picture> </a> </div> <p align="center"> <em> Thanks for visiting ✨ nanobot!</em><br><br> <img src="https://visitor-badge.laobi.icu/badge?page_id=HKUDS.nanobot&style=for-the-badge&color=00d4ff" alt="Views"> </p>

AI Agents

43.9K Github Stars

Open Source

LightRAG

LightRAG is a simple and fast Retrieval-Augmented Generation framework designed to simplify the deployment of advanced RAG systems. Developed by HKUDS, it enables efficient retrieval of information from large datasets to enhance large language model responses. The system supports multiple text chunking strategies including Fix, Recursive, Vector, and Paragraph methods to optimize data indexing. Recent updates integrate multimodal content parsing and extraction capabilities via MinerU and Docling services, allowing the processing of complex document formats beyond plain text. LightRAG offers role-specific Large Language Model configurations for distinct tasks such as extraction and query handling, ensuring tailored performance for different stages of the RAG pipeline. It is compatible with Python 3.10 and is available via PyPI. The project is actively maintained with a robust community presence, offering multilingual support including English and Chinese documentation. Its architecture focuses on balancing sim

ML Frameworks Knowledge Bases & RAG

36.3K Github Stars

Open Source

Vibe-Trading

<p align="center"> <b>English</b> | <a href="README_zh.md">中文</a> | <a href="README_ja.md">日本語</a> | <a href="README_ko.md">한국어</a> | <a href="README_ar.md">العربية</a> </p> <p align="center"> <img src="assets/icon.png" width="120" alt="Vibe-Trading Logo"/> </p> <h1 align="center">Vibe-Trading: Your Personal Trading Agent</h1> <p align="center"> <b>One Command to Empower Your Agent with Comprehensive Trading Capabilities</b> </p> <p align="center"> <a href="https://trendshift.io/repositories/25527" target="_blank"><img src="https://trendshift.io/api/badge/repositories/25527" alt="HKUDS%2FVibe-Trading | Trendshift" style="width: 250px; height: 55px;" width="250" height="55"/></a> </p> <p align="center"> <img src="https://img.shields.io/badge/Python-3.11%2B-3776AB?style=flat&logo=python&logoColor=white" alt="Python"> <img src="https://img.shields.io/badge/Backend-FastAPI-009688?style=flat" alt="FastAPI"> <img src="https://img.shields.io/badge/Frontend-React%2019-61DAFB?style=flat&logo=react&logoColor=white" alt="React"> <a href="https://pypi.org/project/vibe-trading-ai/"><img src="https://img.shields.io/pypi/v/vibe-trading-ai?style=flat&logo=pypi&logoColor=white" alt="PyPI"></a> <a href="LICENSE"><img src="https://img.shields.io/badge/License-MIT-yellow?style=flat" alt="License"></a> <br> <a href="https://github.com/HKUDS/.github/blob/main/profile/README.md"><img src="https://img.shields.io/badge/Feishu-Group-E9DBFC?style=flat-square&logo=feishu&logoColor=white" alt="Feishu"></a> <a href="https://github.com/HKUDS/.github/blob/main/profile/README.md"><img src="https://img.shields.io/badge/WeChat-Group-C5EAB4?style=flat-square&logo=wechat&logoColor=white" alt="WeChat"></a> <a href="https://discord.gg/2vDYc2w5"><img src="https://img.shields.io/badge/Discord-Join-7289DA?style=flat-square&logo=discord&logoColor=white" alt="Discord"></a> </p> <p align="center"> <a href="https://vibetrading.wiki/">Website</a>  ·  <a href="https://vibetrading.wiki/docs/">Docs</a>  ·  <a href="#-news">News</a>  ·  <a href="#-key-features">Features</a>  ·  <a href="#-shadow-account">Shadow Account</a>  ·  <a href="#-demo">Demo</a>  ·  <a href="#-quick-start">Quick Start</a>  ·  <a href="#-examples">Examples</a>  ·  <a href="#-api-server">API / MCP</a>  ·  <a href="#-roadmap">Roadmap</a>  ·  <a href="#-contributing">Contributing</a> </p> <p align="center"> <a href="#-quick-start"><img src="assets/pip-install.svg" height="45" alt="pip install vibe-trading-ai"></a> </p> --- ## 📰 News - **2026-06-09** 🔑 **Clearer error when the Web UI is opened from another machine**: Reaching the chat from a non-loopback client (another machine, a VM host, a phone on your LAN) without `API_AUTH_KEY` set returned `403` on every sensitive endpoint — sending a message, listing sessions, live status — but the chat only showed a generic "Failed to send message, please retry." The send path now surfaces the real reason — *"Remote API access requires an API key. Add it in Settings, or run the backend on localhost for local-only use."* — and the README's web-UI setup spells out the localhost-vs-LAN rule plus the three fixes (browse via `localhost` on the same machine; set `API_AUTH_KEY` and enter it once in Settings; or `VIBE_TRADING_TRUST_DOCKER_LOOPBACK=1` for Docker Desktop's host gateway) ([#191](https://github.com/HKUDS/Vibe-Trading/issues/191), thanks @mafia23). - **2026-06-08** 🔧 **Gemini 3.x multi-turn tool-calling fix**: This completes the Gemini 3.x thinking-model fix. The 6/05 round-trip ([#176](https://github.com/HKUDS/Vibe-Trading/pull/176)) only covered in-memory history, but the real agent loop replays history as OpenAI-format dicts where LangChain dropped the per-tool-call `thought_signature` before the request was built — so multi-turn tool calling still 400'd with `missing thought_signature`. It is now re-attached at the single `_convert_input` chokepoint both `invoke` and `stream` pass through (parallel calls, where only the first of N is signed, included) ([#184](https://github.com/HKUDS/Vibe-Trading/pull/184), thanks @ngoanpv). - **2026-06-07** 🐝 **Live swarm status in the chat timeline**: When the agent launches a multi-agent swarm (investment committee, quant desk, risk committee, …), the chat now renders an inline **status card** that streams each worker's state — waiting / running / done / failed / blocked / retrying — in real time, the same per-agent visibility the standalone swarm dashboard already had. Runtime events are bridged into the session SSE stream without changing the existing `/swarm/runs` API, and a finished card rehydrates from the final `run_swarm` result on reconnect or history replay ([#188](https://github.com/HKUDS/Vibe-Trading/pull/188), thanks @BillDin). Preset routing also got sharper: an explicitly named preset (e.g. `investment_committee`, with or without underscores) now wins over keyword scoring, and the bare `IV` derivatives keyword no longer false-matches inside ordinary words like "g**iv**en" ([#189](https://github.com/HKUDS/Vibe-Trading/pull/189), thanks @BillDin). <details> <summary>Earlier news</summary> - **2026-06-06** ⚖️ **Alpha compare — head-to-head across CLI, Web UI, REST & agent**: A new `alpha compare` benches a hand-picked shortlist of Alpha Zoo alphas against each other on a universe and period, then ranks them by IC mean/std, IR, IC-positive ratio or sample count — each with its gap to the leader. Unlike a full-zoo bench it evaluates **only the alphas you name** (a new `run_bench(only=…)` subset filter), so comparing three alphas no longer scores all 191 in their zoo. One shared core powers every surface: `vibe-trading alpha compare <id1> <id2> … --sort ir` (CLI), a **Compare view** in the Alpha Zoo Web UI (tick alphas in the catalogue → one-click compare with a streamed ranking table), `POST /alpha/compare` + SSE (REST), and a read-only `alpha_compare` agent tool (**47 tools** now). - **2026-06-05** 🇮🇳 **Dhan + Shoonya connectors (India) — 10 brokers total**: The connector-first trading layer adds **Dhan** and **Shoonya** for the Indian market (NSE/BSE equities + F&O), bringing the roster to ten brokers. Both are **paper + read-only** — like Longbridge, their APIs expose no runtime paper/live discriminator, so their `place_order` / `cancel_order` hard-refuse any non-paper config at the first line (the rule: a broker with no structural paper/live guard is capped at paper + read-only) ([#181](https://github.com/HKUDS/Vibe-Trading/pull/181), closes [#174](https://github.com/HKUDS/Vibe-Trading/issues/174)). This cycle also fixes **Gemini 2.5 / 3.x thinking models**: their per-tool-call `thoughtSignature` now round-trips through the OpenAI-compatible path, so multi-turn function calling no longer fails with `INVALID_ARGUMENT` ([#176](https://github.com/HKUDS/Vibe-Trading/pull/176), closes [#170](https://github.com/HKUDS/Vibe-Trading/issues/170), thanks @mvanhorn & @jliu6789). Chinese docstrings landed on all **452 Alpha Zoo factors** ([#180](https://github.com/HKUDS/Vibe-Trading/pull/180), thanks @LeeCQiang), and a **frontend test suite (197 vitest tests)** plus backend auth / path-traversal / CORS security tests joined CI ([#175](https://github.com/HKUDS/Vibe-Trading/pull/175), thanks @sambazhu). - **2026-06-04** 🗃️ **Opt-in local data cache for all 7 data sources**: A new `VIBE_TRADING_DATA_CACHE` switch lets every backtest loader — tushare, okx, ccxt, akshare, mootdx, yfinance, futu — cache settled historical bars under `~/.vibe-trading/cache` (user home, never the repo), so repeated and long-horizon / cross-market backtests skip the network and avoid provider rate limits. Off by default. Batch and connection loaders (yfinance, futu) skip the bulk download / FutuOpenD connection entirely on a full cache hit, a staleness guard never caches a range ending today (its last bar is still forming), and cached frames round-trip byte-identical to freshly fetched ones ([#177](https://github.com/HKUDS/Vibe-Trading/pull/177), thanks @mvanhorn). A new contributor guide for AI / automation-assisted PRs also landed, mapping safe local checks and high-risk broker/MCP/credential surfaces ([#173](https://github.com/HKUDS/Vibe-Trading/pull/173)). - **2026-06-03** 🧹 **Community triage + trace correlation**: Tool-call trace entries now carry the originating `call_id`, so a `tool_result` can be matched back to its `tool_call` when replaying a run trace — arg previews stay truncated to keep trace files small ([#168](https://github.com/HKUDS/Vibe-Trading/pull/168), thanks @zwrong). Source comments no longer point at an internal-only docs path that external contributors couldn't find ([#166](https://github.com/HKUDS/Vibe-Trading/issues/166), thanks @jaleelpersonal). Also clarified that the `langchain-community` resolver warning on install is a harmless leftover-package notice, not a failure ([#167](https://github.com/HKUDS/Vibe-Trading/issues/167)), and scoped Gemini 2.5/3.0 `thoughtSignature` round-tripping for function calls as a `help wanted` task with a full fix plan ([#170](https://github.com/HKUDS/Vibe-Trading/issues/170), thanks @jliu6789). - **2026-06-02** 🔌 **Six new broker connectors (Tiger / Longbridge / Alpaca / OKX / Binance / Futu)**: The connector-first trading layer gains a direct-SDK transport alongside IBKR (local) and Robinhood (MCP). Each connector exposes read-only account / positions / orders / quote / history **plus paper-account order placement** — test your strategies across these broker paper accounts. Five of them (Tiger, Alpaca, OKX, Binance, Futu) also support **bounded, mandate-gated order placement** behind the same safety model as Robinhood: a user-committed mandate (symbol universe / order size / exposure / leverage / daily cap), a filesystem kill switch, a fail-closed pre-trade gate, and a full audit ledger. **Longbridge is paper + read-only only** (its API exposes no runtime paper/live discriminator). Every paper/live distinction is a structural per-broker guard — account-id format, host separation, demo flag, or trade environment. New `trading_place_order` / `trading_cancel_order` tools; HK and A-share asset classes added to the mandate universe. Experimental / use at your own risk. - **2026-06-01** 🚀 **v0.1.9 released** (`pip install -U vibe-trading-ai`): Rolls up everything since 0.1.8. Connector-first broker profiles (IBKR local read-only TWS / IB Gateway + Robinhood Agentic Trading behind OAuth, a committed mandate, order guard, audit ledger, and instant halt). Research Goal runtime across CLI / REST / MCP / Web. A swarm pass — live reconcile + MCP keepalive, operator-configured worker MCP tools, a strict alpha-bench random control, and a new `retry_run` to relaunch failed/stale runs (**36 MCP tools** now). The `agent/cli/` package refactor with a refreshed terminal UI, the `mootdx` no-token A-share loader, and a robustness pass across backtest / agent loop / sessions. `--version` now always matches the installed package, fixing the 0.1.8 drift ([#156](https://github.com/HKUDS/Vibe-Trading/issues/156)). - **2026-05-31** 🔌 **Connector-first broker architecture (IBKR + Robinhood)**: Trading access now starts from a selectable connector profile instead of separate broker/live entry points. `vibe-trading connector list/use/check/account/positions/orders/quote/history` and the MCP `trading_*` tools share the same selected profile, where paper/live is an attribute of the connector. IBKR can be used immediately through a local read-only TWS / IB Gateway profile, while the official IBKR remote MCP path is seeded as an OAuth `mcp.read` probe until stable read tool names are available. Robinhood Agentic Trading remains the bounded live MCP connector behind OAuth, a committed mandate, order guard, audit ledger, and instant halt. - **2026-05-30** 🧰 **Robustness pass — backtest, agent loop, sessions**: LLM-generated signal engines now pass pre-flight interface validation before instantiation, catching circular self-imports, a missing `generate()`, non-defaulted `__init__` args, and wrong return types with actionable JSON errors instead of raw tracebacks ([#149](https://github.com/HKUDS/Vibe-Trading/pull/149)); a follow-up routes source-level AST validation errors through the same clean JSON envelope. The agent loop no longer burns all 50 iterations into a `failed` status with no output — it mirrors the swarm worker's wrap-up nudge at 80% of the iteration budget and drops tool definitions on the last iteration to force a final text answer ([#148](https://github.com/HKUDS/Vibe-Trading/pull/148)), guarded to fire only mid-run so it never displaces research-goal context. Session message writes now `flush + fsync` each append so expensive AI responses survive a mid-write crash, and the read path skips corrupted JSONL lines (logging the first 200 chars for recovery) instead of 500-ing the whole `/messages` endpoint ([#147](https://github.com/HKUDS/Vibe-Trading/pull/147)). The Web composer also fixes IME Enter handling so a composition-confirming Enter no longer submits mid-word ([#146](https://github.com/HKUDS/Vibe-Trading/pull/146)). - **2026-05-29** 🔐 **Robinhood Agentic Trading support (opt-in, bounded autonomy)**: Adds support for Robinhood Agentic Trading (remote MCP, OAuth). Off and read-only by default; the agent acts only inside a user-committed mandate (symbols / order size / exposure / leverage / daily cap), with a filesystem-level instant kill switch, preemptive flatten, mandate auto-expiry, a full audit ledger, and a persistent autonomous runner. No custody, no venue — the broker holds funds and executes; we only relay intent. Experimental / use at your own risk. - **2026-05-28** 🧪 **Swarm safety + strict alpha gate + worker MCP**: Swarm DAG blocks downstream tasks when upstream fails ([#145](https://github.com/HKUDS/Vibe-Trading/pull/145)). New `run_bench_strict()` adds a same-universe random control + OOS split to catch factors that just track market beta ([#143](https://github.com/HKUDS/Vibe-Trading/pull/143), thanks @Soli22de). Swarm workers can call operator-configured external MCP servers, with trust boundary pinned ([#142](https://github.com/HKUDS/Vibe-Trading/pull/142), thanks @shadowinlife). - **2026-05-27** 📊 **mootdx A-share data source + output polish**: New `mootdx` loader speaks the native 通达信 TCP protocol for A-share OHLCV (no auth, no IP rate-limit, daily + intraday with 25-page walk-back pagination), slotting between tushare and akshare in the fallback chain ([#107](https://github.com/HKUDS/Vibe-Trading/issues/107)). CCXT loader now reads `HTTP_PROXY/HTTPS_PROXY/ALL_PROXY` so Binance/OKX public data works from restricted networks ([#126](https://github.com/HKUDS/Vibe-Trading/pull/126), thanks @ruok808). Final-answer rendering also dropped the ugly full-width `---` horizontal separators on CLI and Web: the system prompt now nudges the agent toward markdown tables and `##` headings, the CLI renderer strips standalone HRs as defense-in-depth, and the chat bubble hides any `<hr>` that slips through ([#139](https://github.com/HKUDS/Vibe-Trading/issues/139), thanks @sdwxm188). - **2026-05-26** ✅ **Research Goal lifecycle closure**: Goal mode now behaves like a real task runner: Web UI goal creation creates or binds the session and immediately sends the kickoff turn; active goals can be continued, edited, cancelled, and completed across Web/API/CLI/MCP; and the agent advances from the current goal snapshot (criteria, evidence, claims, open items) instead of only the original prompt. Covered-but-still-active goals now enter an audit/status update instead of stopping silently, with regression coverage across backend, CLI, MCP, and frontend events. - **2026-05-25** 🧼 **Cleaner chat UI + composer workflow**: The Web UI keeps chat focused on the next action: upload, swarm, and research-goal modes now live behind the composer `+` menu instead of floating panels. Active context appears above the input as compact chips, and goal details expand inline only when needed. The UI also drops the old custom i18n layer in favor of direct English copy, gates Full Report cards to report-worthy runs, and hardens local dev startup/status reporting for reliable browser smoke tests. - **2026-05-24** 🎯 **Research Goal runtime**: Added a session-scoped Research Goal layer across backend, CLI, API/MCP, SSE, and Web UI. Goals persist claims, acceptance criteria, evidence rows, budgets, and completion policy; agent tools can create goals and attach evidence; `/goal` gives the CLI a direct entry point; REST/MCP expose goal snapshots and evidence writes; SSE keeps chat clients fresh. Follow-up audit fixes locked down verified evidence, blocked live-trading risk tiers through agent tools, wired CLI-created goals into later turns, cleaned goal ledgers on session deletion, enabled replay-all, and fixed cross-session frontend races. - **2026-05-23** 🖥️ **Interactive CLI refresh**: The terminal front door now opens with a larger Vibe-Trading banner, a cleaner prompt divider, prior-turn recap, post-run timing, and a Claude Code-style activity rail for live agent work. Tool calls, web/data fetches, shell-style actions, Markdown answers, and pipe tables render in a more readable transcript, while piped or non-TTY runs keep plain-text output for automation. Generated CLI screenshots are now treated as local artifacts instead of committed docs files, keeping the repository lighter. - **2026-05-22** 🧭 **Swarm recovery + MCP keepalive**: Swarm status now reconciles from live task files on every read, so API/MCP/SSE/list views recover crashed or stale runs instead of showing permanent `running` snapshots. `run_swarm` sends MCP progress heartbeats while it polls, with a fixed first frame of `swarm_started run_id=<id>` for clients that reconnect after transport drops; workers now heartbeat through LLM streaming, grounding fetches, and tool execution. The stale-run reaper uses per-run thresholds and derives terminal status from task states, `SwarmTool` no longer cancels a still-running team just because its wait budget elapsed, and MCP clients can call `reap_stale_runs()` for explicit cleanup. Today's DX pass also refreshed provider default models and aligned CI syntax checks with the new `agent/cli/` package. 22 new regressions cover hydration, terminal recovery, stale reaping, keepalive cadence, env parsing, and heartbeat wiring; the full swarm/MCP suite is at 169 passed, 4 skipped. - **2026-05-21** 🧱 **CLI package refactor**: `agent/cli.py` (3216 LOC) split into the `agent/cli/` package — interactive front door, slash router, Rich components, plus a `_legacy.py` shim that preserves every subcommand and re-exports every public symbol so `cli.cmd_*` / `cli._INIT_ENV_PATH` / `cli.Confirm` keep working. New FastAPI middleware serves the SPA shell when a browser opens `/runs/{id}` or `/correlation` directly; same narrowing landed in the Vite dev proxy. Version unified via `cli/_version.py` (no more drift between `--version` and the banner), `python -m cli` restored via `__main__.py`, and the chat-gate narrowed so `chat --help` / `chat extra` reach legacy argparse instead of being swallowed by the REPL. - **2026-05-20** 🔬 **Hypothesis Registry CLI**: Closes the CLI side of the Hypothesis Registry shipped backend-only on 2026-05-16. `vibe-trading hypothesis list` prints a Rich table or JSON (`--status` filter, `--limit`); `show <id>` renders a detail panel including linked run cards; `invalidate <id> --note "..."` flips status to `rejected` while preserving prior invalidation notes when `--note` is omitted. Honors the existing `VIBE_TRADING_HYPOTHESES_PATH` env override and adds a per-invocation `--path`. 22 new tests cover wiring, JSON output, status filter, limit, missing-id errors, and note persistence. - **2026-05-19** ✨ **Live tool feedback + graceful cancel**: Long-running tools (backtests, large PDFs, swarm workers) no longer look frozen. Each tool call now emits a 3-second heartbeat plus structured per-stage progress — `run_backtest` shows phase markers (`validate` / `simulate` / `finalize`), `read_document` ticks per page on PDF or per sheet on Excel, `read_url` marks `fetch` / `parse`. The CLI Rich Live dashboard renders a Unicode spinner, ASCII progress bar, ETA, and stacks up to 3 parallel tools keyed by name; the frontend chat ships a new `ToolProgressIndicator` with rAF-coalesced renders, ARIA `role="status"` + hidden native `<progress>` for screen readers, and a determinate `ProgressRing` SVG when total is known. First `Ctrl+C` during a CLI run now calls `agent.cancel()` for graceful exit (current step finishes, trace closes cleanly); a second within 2s force-quits. Reusable primitives extracted along the way: `ProgressBar.tsx` and `lib/tools.ts` (shared tool-name i18n). - **2026-05-18** 🧹 **Cleanup pass + three latent bug fixes**: `CompositeEngine` no longer misroutes bare Chinese-futures codes like `RB2410` to `GlobalFuturesEngine` — `_is_china_futures` moved into a shared `_market_hooks` module with a case-normalized product table and a non-CN exchange guard, plus 9 new regression cases. Session FTS5 indexes now persist timestamps so cross-session search can sort by date; the same path also fixed a re-upsert that was wall-clocking every session's `started_at`. The Vite dev-mode proxy gained the missing `/alpha` entry so the AlphaZoo page resolves on `npm run dev`. `tests/test_e2e_harness_v2.py` (real-LLM e2e suite) is now gated behind `VIBE_TRADING_RUN_LIVE_E2E=1` so CI no longer changes shape based on env-key presence. Ruff `per-file-ignores` added for the factor zoo (3783 → 0 F401 noise), frontend tsconfig enables `noUnusedLocals` / `noUnusedParameters` as regression guards, and 76 unused `vw = vwap(...)` boilerplate lines were dropped from `gtja191` alphas. Net **-918 LOC**. - **2026-05-17** 🧬 **Alpha Zoo v1 (0.1.8)**: 452 pre-built quant alphas across 4 zoos — `qlib158` (Microsoft Qlib, Apache-2 attribution), `alpha101` (Kakushadze 101 Formulaic Alphas, paper rewrite from arXiv:1601.00991), `gtja191` (Guotai Junan 2014 short-horizon factor report), and `academic` (Fama-French 5 + Carhart price-based proxies). One-line CLI to bench any zoo on your universe: `vibe-trading alpha bench --zoo gtja191 --universe csi300 --period 2018-2025`. Ships with AST purity gate, lookahead-guard test, `pytest-socket` network kill-switch, per-zoo LICENSE.md, and a Developer Certificate of Origin (DCO) workflow for community PRs. Auto-rendered Alpha Library at [vibetrading.wiki/alpha-library/](https://vibetrading.wiki/alpha-library/) + research-lab post [Which of the 191 GTJA alphas still work in 2026?](https://vibetrading.wiki/research-lab/posts/alpha-191-in-2026.html). - **2026-05-16** 🧪 **Research spine update**: Added a backend Hypothesis Registry with `create_hypothesis`, `update_hypothesis`, `link_backtest`, and `search_hypotheses`; external-content readers now attach warning-only `security_warnings`; and Shadow Account scanning now uses deterministic OHLCV feature evaluation instead of the old calendar-phase stub. - **2026-05-15** 🪪 The run detail page now surfaces the Trust Layer run card alongside metrics and artifacts, completing the UI side of the `run_card.json` work landed on 2026-05-12. `PersistentMemory.add()` was also hardened on length, empty/whitespace-only names, and C0/C1 control bytes from the #108/#109/#110 triage ([#112](https://github.com/HKUDS/Vibe-Trading/pull/112), thanks @Teerapat-Vatpitak). - **2026-05-14** 🌐 the public wiki is now live at [vibetrading.wiki](https://vibetrading.wiki/) with docs, tutorials, Research Lab, and Alpha Library sections deployed through Cloudflare Pages. Persistent memory is also inspectable from the CLI via `vibe-trading memory list/show/search/forget` ([#102](https://github.com/HKUDS/Vibe-Trading/pull/102), thanks @Teerapat-Vatpitak), and memory tokenization/slugs now support Thai, Arabic, Hebrew, and Cyrillic text ([#104](https://github.com/HKUDS/Vibe-Trading/pull/104)). - **2026-05-13** 🧭 Swarm runs now ground workers with fetched market data and cleaner persisted reports ([#93](https://github.com/HKUDS/Vibe-Trading/pull/93), [#84](https://github.com/HKUDS/Vibe-Trading/pull/84)). - **2026-05-12** 🧾 Backtests now emit `run_card.json` and `run_card.md` alongside artifacts for reproducible research runs. - **2026-05-11** 🧭 **Memory slugs, swarm accounting, and CLI preflight**: Persistent memory now preserves CJK characters when generating file slugs, preventing silent filename collisions for Chinese/Japanese/Korean notes ([#95](https://github.com/HKUDS/Vibe-Trading/pull/95), thanks @voidborne-d). Swarm run totals now prefer provider-reported token usage with the existing estimate fallback ([#94](https://github.com/HKUDS/Vibe-Trading/pull/94), thanks @Teerapat-Vatpitak), and the CLI run UI gained a startup preflight check for common environment issues ([#96](https://github.com/HKUDS/Vibe-Trading/pull/96), thanks @ykykj). - **2026-05-10** 🧱 **Regression guardrails + run metadata**: Memory recall now treats underscores as token boundaries, so snake_case saved memories such as `mcp_wiring_test` match natural-language queries like "mcp wiring" ([#87](https://github.com/HKUDS/Vibe-Trading/pull/87), thanks @hp083625). The MCP server has a subprocess smoke test covering initialize → `tools/list` → `tools/call` to guard the first-call deadlock path ([#86](https://github.com/HKUDS/Vibe-Trading/pull/86)), while low-risk hardening landed for Windows path-sensitive tests, API best-effort exception handling, backtest `run_dir` allowed-root validation, and SwarmRun provider/model metadata ([#88](https://github.com/HKUDS/Vibe-Trading/pull/88), [#90](https://github.com/HKUDS/Vibe-Trading/pull/90), [#91](https://github.com/HKUDS/Vibe-Trading/pull/91), [#92](https://github.com/HKUDS/Vibe-Trading/pull/92), thanks @Teerapat-Vatpitak). - **2026-05-09** 🛡️ **API path hardening + MCP server stability**: API run/session routes now validate path IDs before lookup, rejecting malformed newline-containing parameters and pinning the behavior in the auth/security regression suite ([#80](https://github.com/HKUDS/Vibe-Trading/pull/80), thanks @SJoon99). The MCP server now pre-warms the tool registry on the main thread before serving `tools/call`, avoiding a first-call deadlock in lazy tool discovery ([#85](https://github.com/HKUDS/Vibe-Trading/pull/85), thanks @Teerapat-Vatpitak). The Vite dev proxy also honors `VITE_API_URL` for non-default backend targets ([#82](https://github.com/HKUDS/Vibe-Trading/pull/82), thanks @voidborne-d). - **2026-05-08** 🧾 **Tushare statement fields in filters**: A-share daily backtests can now request PIT-safe financial statement fields through `fundamental_fields`, so signal engines can screen on `income_total_revenue`, `income_n_income`, `balancesheet_total_hldr_eqy_exc_min_int`, `fina_indicator_roe`, and similar table-prefixed columns after their announcement/disclosure dates ([#76](https://github.com/HKUDS/Vibe-Trading/pull/76), thanks @mrbob-git). Follow-up hardening makes explicit statement-field requests fail fast if Tushare enrichment cannot run, instead of silently falling back to raw price bars ([#77](https://github.com/HKUDS/Vibe-Trading/pull/77)). - **2026-05-07** 📈 **Tushare fundamentals + community triage**: Added a point-in-time `TushareFundamentalProvider` contract for fundamental research workflows, with regression coverage for the project `TUSHARE_TOKEN` environment path ([#74](https://github.com/HKUDS/Vibe-Trading/pull/74)). Community triage also clarified that Vibe-Trading keeps rapid iteration focused on one UI language for now, avoids adding redundant search dependencies while DuckDuckGo-backed `web_search` is already bundled, and treats unofficial hosted deployments as untrusted places for API keys or data-source tokens. - **2026-05-06** 🚀 **v0.1.7 released** ([Release notes](https://github.com/HKUDS/Vibe-Trading/releases/tag/v0.1.7), `pip install -U vibe-trading-ai`): Security-boundary hardening is now published on PyPI and ClawHub, covering safer API/read/upload/file/URL/generated-code/shell-tool/Docker defaults while keeping localhost CLI/Web UI workflows low-friction. This cycle also includes Web UI Settings, correlation heatmap, OpenAI Codex OAuth, A-share pre-ST filtering, interactive CLI UX, swarm preset inspection, dividend analysis, dev workflow polish, and audited frontend build-dependency floors. Thanks to the 0.1.7 contributors and to lemi9090 (S2W) for coordinated security validation. - **2026-05-05** 🛡️ **Security boundary follow-up**: Completes the remaining security-boundary hardening around explicit CORS origins, Settings credential indicators, web URL reading, and Shadow Account code generation, with regression tests added for each path. Normal localhost CLI/Web UI workflows stay the same; remote deployments should continue using `API_AUTH_KEY` and explicit trusted origins. - **2026-05-04** 🖥️ **Interactive CLI UX + CI cleanup**: Interactive mode now has a live bottom status bar showing provider/model, session duration, last-run latency, and cumulative tool-call stats, plus prompt history navigation and cursor editing with arrow keys via `prompt_toolkit` ([#69](https://github.com/HKUDS/Vibe-Trading/pull/69)). The CLI still falls back to Rich prompts when `prompt_toolkit` or a TTY is unavailable. CI path expectations were also aligned with the hardened file-import sandbox and cross-platform `/tmp` resolution, returning main to green ([`bb67dc7`](https://github.com/HKUDS/Vibe-Trading/commit/bb67dc7cfcc11553c57d8962bee56381dca43758)). - **2026-05-03** 🛡️ **Security hardening patch**: Tightens default API authentication for non-local deployments, protects sensitive run/session/swarm reads, restricts upload and local file-reading boundaries, gates shell-capable tools by entry point, validates generated strategy loading before import, and runs the Docker image as a non-root user with a localhost-only published port by default. Local CLI and localhost Web UI workflows remain low-friction; remote API/Web deployments should set `API_AUTH_KEY`. - **2026-05-02** 🧭 **Dividend analysis + sharper roadmap**: Added the `dividend-analysis` skill for income stocks, payout sustainability, dividend growth, shareholder yield, ex-dividend mechanics, and yield-trap checks, pinned by bundled-skill regression tests. The public roadmap now focuses on upcoming work: Research Autopilot, Data Bridge, Options Lab, Portfolio Studio, Alpha Zoo, Research Delivery, Trust Layer, and Community sharing. - **2026-05-01** 🔥 **Correlation heatmap + OpenAI Codex OAuth + A-share pre-ST filter**: New correlation dashboard/API computes rolling return correlations and renders an ECharts heatmap for portfolio and symbol analysis ([#64](https://github.com/HKUDS/Vibe-Trading/pull/64)). OpenAI Codex provider support now uses ChatGPT OAuth via `vibe-trading provider login openai-codex`, with Settings metadata and adapter regression tests ([#65](https://github.com/HKUDS/Vibe-Trading/pull/65)). Added and hardened the `ashare-pre-st-filter` skill for A-share ST/*ST risk screening, including Sina penalty relevance filtering so securities-account mentions do not inflate E2 counts ([#63](https://github.com/HKUDS/Vibe-Trading/pull/63)). - **2026-04-30** ⚙️ **Web UI Settings + validation CLI hardening**: New Settings page for LLM provider/model, base URL, reasoning effort, and data source credentials, backed by local/auth-protected settings APIs and data-driven provider metadata ([#57](https://github.com/HKUDS/Vibe-Trading/pull/57)). Also hardens `python -m backtest.validation <run_dir>` so missing, blank, malformed, non-existent, and non-directory inputs fail with clear operator-facing messages before validation starts ([#60](https://github.com/HKUDS/Vibe-Trading/pull/60)). - **2026-04-28** 🚀 **v0.1.6 released** (`pip install -U vibe-trading-ai`): Fixes `vibe-trading --swarm-presets` returning empty after `pip install` / `uv tool install` ([#55](https://github.com/HKUDS/Vibe-Trading/issues/55)) — preset YAMLs now bundled inside the `src.swarm` package and pinned by a 6-test regression suite. Plus AKShare loader correctly routes ETFs (`510300.SH`) and forex (`USDCNH`) to the right endpoints with hardened registry fallback. Rolls up everything since v0.1.5: benchmark comparison panel, `/upload` streaming + size limits, Futu loader (HK + A-share), vnpy export skill, security hardening, frontend lazy loading (688KB → 262KB). - **2026-04-27** 📊 **Benchmark panel + upload safety**: Backtest output now ships a benchmark comparison panel (ticker / benchmark return / excess return / information ratio) with yfinance-backed resolution for SPY, CSI 300, etc. ([#48](https://github.com/HKUDS/Vibe-Trading/issues/48)). Plus `/upload` streams the request body in 1 MB chunks and aborts past `MAX_UPLOAD_SIZE`, bounding memory under oversized/malformed clients ([#53](https://github.com/HKUDS/Vibe-Trading/pull/53)) — pinned by a 4-case regression suite. - **2026-04-22** 🛡️ **Hardening + new integrations**: Path containment enforced in `safe_path` + journal/shadow tool sandbox, `MANIFEST.in` ships `.env.example` / tests / Docker files in sdist, route-level lazy loading shrinks frontend initial bundle 688KB → 262KB. Plus Futu data loader for HK & A-share equities ([#47](https://github.com/HKUDS/Vibe-Trading/pull/47)) and vnpy CtaTemplate export skill ([#46](https://github.com/HKUDS/Vibe-Trading/pull/46)). - **2026-04-21** 🛡️ **Workspace + docs**: Relative `run_dir` normalized to active run dir ([#43](https://github.com/HKUDS/Vibe-Trading/pull/43)). README usage examples ([#45](https://github.com/HKUDS/Vibe-Trading/pull/45)). - **2026-04-20** 🔌 **Reasoning + Swarm**: `reasoning_content` preserved across all `ChatOpenAI` paths — Kimi / DeepSeek / Qwen thinking work end-to-end ([#39](https://github.com/HKUDS/Vibe-Trading/issues/39)). Swarm streaming + clean Ctrl+C ([#42](https://github.com/HKUDS/Vibe-Trading/issues/42)). - **2026-04-19** 📦 **v0.1.5**: Published to PyPI & ClawHub. `python-multipart` CVE floor bump, 5 new MCP tools wired (`analyze_trade_journal` + 4 shadow-account tools), `pattern_recognition` → `pattern` registry fix, Docker dep parity, SKILL manifest synced (22 MCP tools / 71 skills). - **2026-04-18** 👥 **Shadow Account**: Extract your strategy rules from a broker journal → backtest the shadow across markets → 8-section HTML/PDF report showing exactly how much you leave on the table (rule violations, early exits, missed signals, counterfactual trades). 4 new tools, 1 skill, 32 tools total. Trade Journal + Shadow Account samples now live in the web UI welcome screen. - **2026-04-17** 📊 **Trade Journal Analyzer + Universal File Reader**: Upload broker exports (同花顺/东财/富途/generic CSV) → auto trading profile (holding days, win rate, PnL ratio, drawdown) + 4 bias diagnostics (disposition effect, overtrading, chasing momentum, anchoring). `read_document` now dispatches PDF, Word, Excel, PowerPoint, images (OCR), and 40+ text formats behind one unified call. - **2026-04-16** 🧠 **Agent Harness**: Persistent cross-session memory, FTS5 session search, self-evolving skills (full CRUD), 5-layer context compression, read/write tool batching. 27 tools, 107 new tests. - **2026-04-15** 🤖 **Z.ai + MiniMax**: Z.ai provider ([#35](https://github.com/HKUDS/Vibe-Trading/pull/35)), MiniMax temperature fix + model update ([#33](https://github.com/HKUDS/Vibe-Trading/pull/33)). 13 providers. - **2026-04-14** 🔧 **MCP Stability**: Fixed backtest tool `Connection closed` error on stdio transport ([#32](https://github.com/HKUDS/Vibe-Trading/pull/32)). - **2026-04-13** 🌐 **Cross-Market Composite Backtest**: New `CompositeEngine` backtests mixed-market portfolios (e.g. A-shares + crypto) with shared capital pool and per-market rules. Also fixed swarm template variable fallback and frontend timeout. - **2026-04-12** 🌍 **Multi-Platform Export**: `/pine` exports strategies to TradingView (Pine Script v6), TDX (通达信/同花顺/东方财富), and MetaTrader 5 (MQL5) in one command. - **2026-04-11** 🛡️ **Reliability & DX**: `vibe-trading init` .env bootstrap ([#19](https://github.com/HKUDS/Vibe-Trading/pull/19)), preflight checks, runtime data-source fallback, hardened backtest engine. Multi-language README ([#21](https://github.com/HKUDS/Vibe-Trading/pull/21)). - **2026-04-10** 📦 **v0.1.4**: Docker fix ([#8](https://github.com/HKUDS/Vibe-Trading/issues/8)), `web_search` MCP tool, 12 LLM providers, `akshare`/`ccxt` deps. Published to PyPI and ClawHub. - **2026-04-09** 📊 **Backtest Wave 2**: ChinaFutures, GlobalFutures, Forex, Options v2 engines. Monte Carlo, Bootstrap CI, Walk-Forward validation. - **2026-04-08** 🔧 **Multi-market backtest** with per-market rules, Pine Script v6 export, 5 data sources with auto-fallback. </details> --- ## ✨ Key Features <div align="center"> <table align="center" width="94%" style="width:94%; margin-left:auto; margin-right:auto;"> <tr> <td align="center" width="50%" valign="top"> <img src="assets/feature-self-improving-trading-agent.png" height="130" alt="Self-improving trading agent"/><br> <h3>🔍 Self-Improving Trading Agent</h3> <div align="left"> • Natural-language market research<br> • Strategy drafts and file/web analysis<br> • Memory-backed workflows </div> </td> <td align="center" width="50%" valign="top"> <img src="assets/feature-multi-agent-trading-teams.png" height="130" alt="Multi-agent trading teams"/><br> <h3>🐝 Multi-Agent Trading Teams</h3> <div align="left"> • Investment, quant, crypto, and risk teams<br> • Streaming progress and persisted reports<br> • Workers grounded with fetched market data </div> </td> </tr> <tr> <td align="center" width="50%" valign="top"> <img src="assets/feature-cross-market-data-backtesting.png" height="130" alt="Cross-market data and backtesting"/><br> <h3>📊 Cross-Market Data & Backtesting</h3> <div align="left"> • A/HK/US equities, crypto, futures, and forex<br> • Data fallback and composite backtests<br> • PIT data, validation, and run cards </div> </td> <td align="center" width="50%" valign="top"> <img src="assets/feature-shadow-account.png" height="130" alt="Shadow Account"/><br> <h3>👥 Shadow Account</h3> <div align="left"> • Broker-journal behavior diagnostics<br> • Rule-based Shadow Account comparisons<br> • Exportable audit reports and strategy code </div> </td> </tr> </table> </div> ## 💡 What Is Vibe-Trading? Vibe-Trading is an open-source research workspace for turning finance questions into runnable analysis. It connects natural-language prompts to market-data loaders, strategy generation, backtest engines, reports, exports, and persistent research memory. It is designed for research, simulation, and backtesting — and, when you choose, autonomous trading through a broker you authorize yourself (e.g. Robinhood Agentic Trading). It holds no funds and never trades outside the limits you set, and you can halt it instantly. --- ## ✨ What You Can Do | Task | Output | |------|--------| | **Ask a trading question** | Market research with tools, data, documents, and reusable session context. | | **Backtest a strategy idea** | Strategy code, metrics, benchmark context, validation artifacts, and run cards. | | **Review your own trades** | Broker-journal parsing, behavior diagnostics, rule extraction, and Shadow Account comparisons. | | **Improve repeated research** | Persistent memory and editable skills turn useful routines into reusable workflows. | | **Run analyst teams** | Multi-agent research reviews for investment, quant, crypto, macro, and risk workflows. | | **Ship usable artifacts** | Reports, TradingView Pine Script, TDX, MetaTrader 5, MCP tools, and later research sessions. | | **Bench a pre-built alpha zoo** | One-line IC + alive/reversed/dead categorisation across 452 alphas (Qlib 158 + Kakushadze 101 + GTJA 191 + FF5 + Carhart) on your universe. | --- ## ⚡ Quick Example ```bash pip install vibe-trading-ai # Natural-language research vibe-trading run -p "Backtest a BTC-USDT 20/50 moving-average strategy for 2024, summarize return and drawdown, then export the report" # Bench a pre-built alpha zoo (one line) vibe-trading alpha bench --zoo gtja191 --universe csi300 --period 2018-2025 --top 20 ``` ```bash vibe-trading --upload trades_export.csv vibe-trading run -p "Analyze my trading behavior, extract my shadow strategy, and compare it with my actual trades" ``` --- ## 👥 Shadow Account Shadow Account starts from your own trading records instead of a generic strategy template. Upload a broker export, let the agent summarize your behavior, then compare the actual trading path with a rule-based shadow strategy. | Step | Agent output | |------|--------------| | **1. Read your journal** | Parses broker exports from 同花顺, 东方财富, 富途, and generic CSV formats. | | **2. Profile your behavior** | Holding days, win rate, PnL ratio, drawdown, disposition effect, overtrading, momentum chasing, and anchoring checks. | | **3. Extract your rules** | Turns recurring entries/exits into an explicit strategy profile instead of a hand-wavy summary. | | **4. Run the shadow** | Backtests the extracted rules and highlights rule breaks, early exits, missed signals, and alternative trade paths. | | **5. Deliver the report** | Produces an HTML/PDF report that can be inspected, archived, or refined in a later session. | ```bash vibe-trading --upload trades_export.csv vibe-trading run -p "Analyze my trading behavior, extract my shadow strategy, and compare it with my actual trades" ``` --- ## 🧪 Research Workflow Most runs follow the same evidence path: route the request, load the right market context, execute tools, validate outputs, and keep the artifacts inspectable. | Layer | What happens | |-------|--------------| | **Plan** | Selects the relevant finance skills, tools, data sources, and swarm preset when useful. | | **Ground** | Pulls A-shares, HK/US equities, crypto, futures, forex, documents, or web context through the available loaders. | | **Execute** | Generates testable strategy code, runs tools, and uses the matching backtest engine or analysis workflow. | | **Validate** | Adds metrics, benchmark comparison, Monte Carlo, Bootstrap, Walk-Forward, run cards, and warnings where applicable. | | **Deliver** | Returns reports, artifacts, tool traces, and exports for TradingView, TDX, MetaTrader 5, MCP clients, or later sessions. | --- ## 🔩 Detailed Capabilities Detailed inventories are folded below to keep the main README scannable. Open them when you want to inspect the available building blocks. <details> <summary><b>Finance Skill Library</b> <sub>77 skills across 8 categories</sub></summary> - 📊 77 specialized finance skills organized into 8 categories - 🌐 Complete coverage from traditional markets to crypto & DeFi - 🔬 Comprehensive capabilities spanning data sourcing to quantitative research | Category | Skills | Examples | |----------|--------|----------| | Data Source | 7 | `data-routing`, `tushare`, `yfinance`, `okx-market`, `akshare`, `mootdx`, `ccxt` | | Strategy | 17 | `strategy-generate`, `cross-market-strategy`, `technical-basic`, `candlestick`, `ichimoku`, `elliott-wave`, `smc`, `multi-factor`, `ml-strategy` | | Analysis | 17 | `factor-research`, `macro-analysis`, `global-macro`, `valuation-model`, `earnings-forecast`, `credit-analysis`, `dividend-analysis` | | Asset Class | 9 | `options-strategy`, `options-advanced`, `convertible-bond`, `etf-analysis`, `asset-allocation`, `sector-rotation` | | Crypto | 7 | `perp-funding-basis`, `liquidation-heatmap`, `stablecoin-flow`, `defi-yield`, `onchain-analysis` | | Flow | 7 | `hk-connect-flow`, `us-etf-flow`, `edgar-sec-filings`, `financial-statement`, `adr-hshare` | | Tool | 11 | `backtest-diagnose`, `report-generate`, `pine-script`, `doc-reader`, `web-reader`, `vnpy-export`, `alpha-zoo` | | Risk Analysis | 1 | `ashare-pre-st-filter` | </details> <details> <summary><b>Custom Data Source</b> <sub>register your own historical OHLCV loader</sub></summary> Need a market or vendor we don't ship a loader for? Add your own historical-bar loader and select it with `source="<name>"`. The steps edit package source, so run from a clone (`pip install -e .`). 1. **Write the loader** — create `agent/backtest/loaders/<name>_loader.py` with a class that satisfies `DataLoaderProtocol` (duck-typed, no base class needed) and is tagged with `@register`: ```python import pandas as pd from backtest.loaders.registry import register @register class DataLoader: name = "mysource" # the value you pass as source= markets = {"us_equity"} # a_share/us_equity/hk_equity/crypto/futures/fund/macro/forex requires_auth = False def is_available(self) -> bool: return True # token present? network reachable? def fetch(self, codes, start_date, end_date, *, interval="1D", fields=None): # return {symbol: DataFrame indexed by trade_date, # columns: open, high, low, close, volume} ... ``` 2. **Register the module** so `@register` fires — add `"backtest.loaders.<name>_loader"` to `_loader_modules` in `agent/backtest/loaders/registry.py`. 3. **Allow the name** through config validation — add `"mysource"` to `_VALID_SOURCES` in `agent/backtest/runner.py`. 4. *(Optional)* slot it into a market's `FALLBACK_CHAINS` in `registry.py` so `source="auto"` can reach it. 5. **Use it** — `source="mysource"` in a backtest config, or via the CLI / agent. > **Real-time ticks / order-book depth are out of scope for loaders** — the > loader layer is point-in-time historical bars only. Live market data flows > through the broker connectors instead: `okx` / `binance` / `ccxt` for crypto, > `futu` / `tiger` for equities. </details> <details> <summary><b>Preset Trading Teams</b> <sub>29 swarm presets</sub></summary> - 🏢 29 ready-to-use agent teams - ⚡ Pre-configured finance workflows - 🎯 Investment, trading & risk management presets | Preset | Workflow | |--------|----------| | `investment_committee` | Bull/bear debate → risk review → PM final call | | `global_equities_desk` | A-share + HK/US + crypto researcher → global strategist | | `crypto_trading_desk` | Funding/basis + liquidation + flow → risk manager | | `earnings_research_desk` | Fundamental + revision + options → earnings strategist | | `macro_rates_fx_desk` | Rates + FX + commodity → macro PM | | `quant_strategy_desk` | Screening + factor research → backtest → risk audit | | `technical_analysis_panel` | Classic TA + Ichimoku + harmonic + Elliott + SMC → consensus | | `risk_committee` | Drawdown + tail risk + regime review → sign-off | | `global_allocation_committee` | A-shares + crypto + HK/US → cross-market allocation | <sub>Plus 20+ additional specialist presets — run vibe-trading --swarm-presets to explore all. </sub> </details> <details> <summary><b>Alpha Zoo</b> <sub>452 pre-built quant alphas across 4 zoos</sub></summary> - 🧬 452 cross-sectional alphas, lookahead-banned at the operator layer - 📈 IC + IR + alive/reversed/dead categorisation in one CLI command - 🔬 AST purity gate + 300-row lookahead sentinel test + `pytest-socket` network kill-switch - 📦 Apache-2 attribution for Qlib; per-zoo `LICENSE.md` declaring formulas as mathematical content - 🤝 Developer Certificate of Origin (DCO) sign-off workflow for community PRs | Zoo | Count | Source | License | |-----|-------|--------|---------| | **qlib158** | 154 | Microsoft Qlib `Alpha158` (Apache-2.0, commit-pinned) | Apache-2.0 | | **alpha101** | 101 | Kakushadze (2015), "101 Formulaic Alphas", arXiv:1601.00991 | Formulas are mathematical content | | **gtja191** | 191 | Guotai Junan (2014), "191 Short-period Trading Alpha Factors" | Formulas are mathematical content | | **academic** | 6 | Fama-French 5 + Carhart momentum (price-based proxies) | Public academic literature | Run `vibe-trading alpha list` to browse, `vibe-trading alpha show <id>` for formulas + source, `vibe-trading alpha bench --zoo X --universe Y --period Z` to score a whole zoo. </details> ## 🎬 Demo <div align="center"> <table> <tr> <td width="50%"> https://github.com/user-attachments/assets/4e4dcb80-7358-4b9a-92f0-1e29612e6e86 </td> <td width="50%"> https://github.com/user-attachments/assets/3754a414-c3ee-464f-b1e8-78e1a74fbd30 </td> </tr> <tr> <td colspan="2" align="center"><sub>☝️ Natural-language backtest & multi-agent swarm debate — Web UI + CLI</sub></td> </tr> </table> </div> --- ## 🚀 Quick Start ### One-line install (PyPI) ```bash pip install vibe-trading-ai ``` Then run a first research task: ```bash vibe-trading init vibe-trading run -p "Backtest a BTC-USDT 20/50 moving-average strategy for 2024 and summarize return and drawdown" ``` > **Package name vs commands:** The PyPI package is `vibe-trading-ai`. Once installed, you get three commands: > > | Command | Purpose | > |---------|---------| > | `vibe-trading` | Interactive CLI / TUI | > | `vibe-trading serve` | Launch FastAPI web server | > | `vibe-trading-mcp` | Start MCP server (for Claude Desktop, OpenClaw, Cursor, etc.) | ```bash vibe-trading init # interactive .env setup vibe-trading # launch CLI vibe-trading serve --port 8899 # launch web UI vibe-trading-mcp # start MCP server (stdio) ``` ### Or choose a path | Path | Best for | Time | |------|----------|------| | **A. Docker** | Try it now, zero local setup | 2 min | | **B. Local install** | Development, full CLI access | 5 min | | **C. MCP plugin** | Plug into your existing agent | 3 min | | **D. ClawHub** | One command, no cloning | 1 min | ### Prerequisites - An **LLM API key** from any supported provider — or run locally with **Ollama** (no key needed) - **Python 3.11+** for Path B - **Docker** for Path A - OpenAI Codex can also be used with ChatGPT OAuth: set `LANGCHAIN_PROVIDER=openai-codex`, then run `vibe-trading provider login openai-codex`. This does not use `OPENAI_API_KEY`. > **Supported LLM providers:** OpenRouter, OpenAI, DeepSeek, Gemini, Groq, DashScope/Qwen, Zhipu, Moonshot/Kimi, MiniMax, Xiaomi MIMO, Z.ai, Ollama (local). See `.env.example` for config. > **Tip:** All markets work without any API keys thanks to automatic fallback. yfinance (HK/US), OKX (crypto), mootdx (A-shares, TCP-direct, no IP throttle), and AKShare (A-shares, US, HK, futures, forex) are all free. Tushare token is optional — mootdx is the preferred no-token A-share fallback, with AKShare as a broader backup. ### Path A: Docker (zero setup) ```bash git clone https://github.com/HKUDS/Vibe-Trading.git cd Vibe-Trading cp agent/.env.example agent/.env # Edit agent/.env — uncomment your LLM provider and set API key docker compose up --build ``` Open `http://localhost:8899`. Backend + frontend in one container. Docker publishes the backend on `127.0.0.1:8899` by default and runs the app as a non-root container user. If you intentionally expose the API beyond your own machine, set a strong `API_AUTH_KEY` and send `Authorization: Bearer <key>` from clients. ### Path B: Local install ```bash git clone https://github.com/HKUDS/Vibe-Trading.git cd Vibe-Trading python -m venv .venv # Activate source .venv/bin/activate # Linux / macOS # .venv\Scripts\Activate.ps1 # Windows PowerShell pip install -e . cp agent/.env.example agent/.env # Edit — set your LLM provider API key vibe-trading # Launch interactive TUI ``` <details> <summary><b>Start web UI (optional)</b></summary> ```bash # Terminal 1: API server vibe-trading serve --port 8899 # Terminal 2: Frontend dev server cd frontend && npm install && npm run dev ``` Open `http://localhost:5899`. The frontend proxies API calls to `localhost:8899`. **Production mode (single server):** ```bash cd frontend && npm run build && cd .. vibe-trading serve --port 8899 # FastAPI serves dist/ as static files ``` > [!NOTE] > `vibe-trading serve` binds `0.0.0.0` and is loopback-only by default: opening the UI on the **same machine** (`http://localhost:8899`) works with zero config. If you browse from **another machine, a VM host, or a phone on your LAN**, sensitive endpoints return `403` and the chat shows "Remote API access requires an API key" — set a strong `API_AUTH_KEY` in `agent/.env`, restart, and enter the same key once in **Settings**. (Docker Desktop's host gateway: set `VIBE_TRADING_TRUST_DOCKER_LOOPBACK=1` with the default `127.0.0.1` port bind.) </details> ### Path C: MCP plugin See [MCP Plugin](#-mcp-plugin) section below. ### Path D: ClawHub (one command) ```bash npx clawhub@latest install vibe-trading --force ``` The skill + MCP config is downloaded into your agent's skills directory. See [ClawHub install](#-mcp-plugin) for details. --- ## 🧠 Environment Variables Copy `agent/.env.example` to `agent/.env` and uncomment the provider block you want. Each provider needs 3-4 variables: | Variable | Required | Description | |----------|:--------:|-------------| | `LANGCHAIN_PROVIDER` | Yes | Provider name (`openrouter`, `deepseek`, `groq`, `ollama`, etc.) | | `<PROVIDER>_API_KEY` | Yes* | API key (`OPENROUTER_API_KEY`, `DEEPSEEK_API_KEY`, etc.) | | `<PROVIDER>_BASE_URL` | Yes | API endpoint URL | | `LANGCHAIN_MODEL_NAME` | Yes | Model name (e.g. `deepseek-v4-pro`) | | `TUSHARE_TOKEN` | No | Tushare Pro token for A-share data (falls back to AKShare) | | `TIMEOUT_SECONDS` | No | LLM call timeout, default 120s | | `API_AUTH_KEY` | Recommended for network deployments | Bearer token required when the API is reachable from non-local clients | | `VIBE_TRADING_ENABLE_SHELL_TOOLS` | No | Explicit opt-in for shell-capable tools in remote API/MCP-SSE style deployments | | `VIBE_TRADING_ALLOWED_FILE_ROOTS` | No | Extra comma-separated roots for document and broker-journal imports | | `VIBE_TRADING_ALLOWED_RUN_ROOTS` | No | Extra comma-separated roots for generated-code run directories | <sub>* Ollama does not require an API key. OpenAI Codex uses ChatGPT OAuth and stores tokens via `oauth-cli-kit`, not in `agent/.env`.</sub> **Free data (no key needed):** A-shares via AKShare, HK/US equities via yfinance, crypto via OKX, 100+ crypto exchanges via CCXT. The system automatically selects the best available source for each market. ### 🎯 Recommended Models Vibe-Trading is a tool-heavy agent — skills, backtests, memory, and swarms all flow through tool calls. Model choice directly decides whether the agent *uses* its tools or fabricates answers from training data. | Tier | Examples | When to use | |------|----------|-------------| | **Best** | `anthropic/claude-opus-4.7`, `anthropic/claude-sonnet-4.6`, `openai/gpt-5.5-pro`, `google/gemini-3.5-flash` | Complex swarms (3+ agents), long research sessions, paper-grade analysis | | **Sweet spot** (default) | `deepseek-v4-pro`, `deepseek/deepseek-v4-pro`, `x-ai/grok-4.20`, `z-ai/glm-5.1`, `moonshotai/kimi-k2.6`, `qwen/qwen3-max-thinking` | Daily driver — reliable tool-calling at ~1/10 the cost | | **Avoid for agent use** | `*-nano`, `*-flash-lite`, `*-coder-next`, small / distilled variants | Tool-calling is unreliable — the agent will appear to "answer from memory" instead of loading skills or running backtests | The default `agent/.env.example` ships with DeepSeek official API + `deepseek-v4-pro`; OpenRouter users can use `deepseek/deepseek-v4-pro`. --- ## 🖥 CLI Reference The interactive TUI (`vibe-trading`) now uses a terminal-native transcript: a startup banner, prompt rule, previous-turn recap, live activity rail, Markdown/table rendering, and run timing all stay in the CLI. Non-interactive invocations such as `vibe-trading run`, pipes, and `--json` remain script-friendly. ```bash vibe-trading # interactive TUI vibe-trading run -p "..." # single run vibe-trading serve # API server vibe-trading alpha list # browse 452 pre-built alphas; show / bench / compare / export-manifest sub-commands available ``` <details> <summary><b>Slash commands inside TUI</b></summary> | Command | Description | |---------|-------------| | `/help` | Show all commands | | `/skills` | List all 77 finance skills | | `/swarm` | List 29 swarm team presets | | `/swarm run <preset> [vars_json]` | Run a swarm team with live streaming | | `/swarm list` | Swarm run history | | `/swarm show <run_id>` | Swarm run details | | `/swarm cancel <run_id>` | Cancel a running swarm | | `/list` | Recent runs | | `/show <run_id>` | Run details + metrics | | `/code <run_id>` | Generated strategy code | | `/pine <run_id>` | Export indicators (TradingView + TDX + MT5) | | `/trace <run_id>` | Full execution replay | | `/continue <run_id> <prompt>` | Continue a run with new instructions | | `/sessions` | List chat sessions | | `/settings` | Show runtime config | | `/clear` | Clear screen | | `/quit` | Exit | </details> <details> <summary><b>Single run & flags</b></summary> ```bash vibe-trading run -p "Backtest BTC-USDT MACD strategy, last 30 days" vibe-trading run -p "Analyze AAPL momentum" --json vibe-trading run -f strategy.txt echo "Backtest 000001.SZ RSI" | vibe-trading run ``` ```bash vibe-trading -p "your prompt" vibe-trading --skills vibe-trading --swarm-presets vibe-trading --swarm-run investment_committee '{"topic":"BTC outlook"}' vibe-trading --list vibe-trading --show <run_id> vibe-trading --code <run_id> vibe-trading --pine <run_id> # Export indicators (TradingView + TDX + MT5) vibe-trading --trace <run_id> vibe-trading --continue <run_id> "refine the strategy" vibe-trading --upload report.pdf ``` ```bash vibe-trading alpha list --zoo gtja191 --limit 10 vibe-trading alpha show gtja191_171 vibe-trading alpha bench --zoo gtja191 --universe csi300 --period 2018-2025 --top 20 ``` </details> --- ## 💡 Examples ### Strategy & Backtesting ```bash # Moving average crossover on US equities vibe-trading run -p "Backtest a 20/50-day moving average crossover on AAPL for the past year, show Sharpe ratio and max drawdown" # RSI mean-reversion on crypto vibe-trading run -p "Test RSI(14) mean-reversion on BTC-USDT: buy below 30, sell above 70, last 6 months" # Multi-factor strategy on A-shares vibe-trading run -p "Backtest a momentum + value + quality multi-factor strategy on CSI 300 constituents over 2 years" # After backtesting, export to TradingView / TDX / MetaTrader 5 vibe-trading --pine <run_id> ``` **Bench a pre-built alpha zoo** (one line): ```bash vibe-trading alpha bench --zoo gtja191 --universe csi300 --period 2018-2025 --top 20 ``` **Browse the catalogue** and inspect a single alpha: ```bash vibe-trading alpha list --zoo gtja191 --theme reversal --limit 10 vibe-trading alpha show gtja191_171 ``` **Compose a multi-factor signal** from the zoo (Python): ```python from src.skills.multi_factor.zoo_signal_engine import ZooSignalEngine engine = ZooSignalEngine.from_zoo(["gtja191_171", "gtja191_111", "gtja191_163"]) panel = ... # your wide OHLCV panel signal = engine.compute_signal(panel) ``` ### Market Research ```bash # Equity deep-dive vibe-trading run -p "Research NVDA: earnings trend, analyst consensus, option flow, and key risks for next quarter" # Macro analysis vibe-trading run -p "Analyze the current Fed rate path, USD strength, and impact on EM equities and gold" # Crypto on-chain vibe-trading run -p "Deep dive BTC on-chain: whale flows, exchange balances, miner activity, and funding rates" ``` ### Swarm Workflows ```bash # Bull/bear debate on a stock vibe-trading --swarm-run investment_committee '{"topic": "Is TSLA a buy at current levels?"}' # Quant strategy from screening to backtest vibe-trading --swarm-run quant_strategy_desk '{"universe": "S&P 500", "horizon": "3 months"}' # Crypto desk: funding + liquidation + flow → risk manager vibe-trading --swarm-run crypto_trading_desk '{"asset": "ETH-USDT", "timeframe": "1w"}' # Global macro portfolio allocation vibe-trading --swarm-run macro_rates_fx_desk '{"focus": "Fed pivot impact on EM bonds"}' ``` ### Cross-Session Memory ```bash # Save your preferences once vibe-trading run -p "Remember: I prefer RSI-based strategies, max 10% drawdown, hold period 5–20 days" # The agent recalls them in future sessions automatically vibe-trading run -p "Build a crypto strategy that fits my risk profile" ``` ### Upload & Analyze Documents ```bash # Analyze a broker export or earnings report vibe-trading --upload trades_export.csv vibe-trading run -p "Profile my trading behavior and identify any biases" vibe-trading --upload NVDA_Q1_earnings.pdf vibe-trading run -p "Summarize the key risks and beats/misses from this earnings report" ``` --- ## 🌐 API Server ```bash vibe-trading serve --port 8899 ``` | Method | Endpoint | Description | |--------|----------|-------------| | `GET` | `/runs` | List runs | | `GET` | `/runs/{run_id}` | Run details | | `GET` | `/runs/{run_id}/pine` | Multi-platform indicator export | | `POST` | `/sessions` | Create session | | `POST` | `/sessions/{id}/messages` | Send message | | `GET` | `/sessions/{id}/events` | SSE event stream | | `POST` | `/upload` | Upload PDF/file | | `GET` | `/swarm/presets` | List swarm presets | | `POST` | `/swarm/runs` | Start swarm run | | `GET` | `/swarm/runs/{id}/events` | Swarm SSE stream | | `GET` | `/alpha/list` | List alphas (filter by zoo/theme/universe) | | `GET` | `/alpha/{alpha_id}` | Alpha metadata + source code | | `POST` | `/alpha/bench` | Start a bench job (returns `job_id`) | | `GET` | `/alpha/bench/{job_id}/stream` | SSE progress stream | | `GET` | `/settings/llm` | Read Web UI LLM settings | | `PUT` | `/settings/llm` | Update local LLM settings | | `GET` | `/settings/data-sources` | Read local data source settings | | `PUT` | `/settings/data-sources` | Update local data source settings | Interactive docs: `http://localhost:8899/docs` ### Security defaults For localhost development, `vibe-trading serve` keeps the browser workflow simple. For any non-local client, sensitive API endpoints require `API_AUTH_KEY`; use `Authorization: Bearer <key>` for JSON/upload requests. Browser EventSource streams are handled by the Web UI after you enter the same key once in Settings. Shell-capable tools are available to local CLI and trusted localhost workflows, but are not exposed to remote API sessions unless you explicitly set `VIBE_TRADING_ENABLE_SHELL_TOOLS=1`. Document and journal readers are limited to upload/import roots by default; place files under `agent/uploads`, `agent/runs`, `./uploads`, `./data`, `~/.vibe-trading/uploads`, or `~/.vibe-trading/imports`, or add a dedicated directory through `VIBE_TRADING_ALLOWED_FILE_ROOTS`. ### Web UI Settings The Web UI Settings page lets local users update the LLM provider/model, base URL, generation parameters, reasoning effort, and optional market data credentials such as the Tushare token. Settings are persisted to `agent/.env`; provider defaults are loaded from `agent/src/providers/llm_providers.json`. Settings reads are side-effect free: `GET /settings/llm` and `GET /settings/data-sources` never create `agent/.env`, and they only return project-relative paths. Settings reads and writes can expose credential state or update credentials/runtime environment, so they require `API_AUTH_KEY` when configured. If `API_AUTH_KEY` is unset for dev mode, settings access is accepted only from loopback clients. --- ## 🔌 MCP Plugin Vibe-Trading exposes 36 MCP tools for any MCP-compatible client. Runs as a stdio subprocess — no server setup needed. Core research tools work with zero API keys for HK/US/crypto; trading connector tools use the selected connector profile, and `run_swarm` needs an LLM key. <details> <summary><b>Claude Desktop</b></summary> Add to `claude_desktop_config.json`: ```json { "mcpServers": { "vibe-trading": { "command": "vibe-trading-mcp" } } } ``` </details> <details> <summary><b>OpenClaw</b></summary> Add to `~/.openclaw/config.yaml`: ```yaml skills: - name: vibe-trading command: vibe-trading-mcp ``` For a first research-only smoke test, confirm tool discovery and run a market data or backtest request before selecting a trading connector profile. Core research tools can run without broker credentials; connector-backed `trading_*` tools should be used only after you intentionally select and check a connector profile. `run_swarm` requires an LLM key. </details> <details> <summary><b>Cursor / Windsurf / other MCP clients</b></summary> ```bash vibe-trading-mcp # stdio (default) vibe-trading-mcp --transport sse # SSE for web clients ``` </details> **MCP tools exposed (36):** `list_skills`, `load_skill`, `start_research_goal`, `get_research_goal`, `add_goal_evidence`, `update_research_goal_status`, `backtest`, `factor_analysis`, `analyze_options`, `pattern_recognition`, `read_url`, `read_document`, `web_search`, `write_file`, `read_file`, `list_swarm_presets`, `run_swarm`, `get_market_data`, `get_swarm_status`, `get_run_result`, `list_runs`, `reap_stale_runs`, `retry_run`, `analyze_trade_journal`, `extract_shadow_strategy`, `run_shadow_backtest`, `render_shadow_report`, `scan_shadow_signals`, `trading_connections`, `trading_select_connection`, `trading_check`, `trading_account`, `trading_positions`, `trading_orders`, `trading_quote`, `trading_history`. ### SWARM external MCP tools `run_swarm` workers can call operator-approved tools from external MCP servers. Configure the server-side allowlist in `VIBE_TRADING_SWARM_AGENT_CONFIG`, `~/.vibe-trading/swarm-agent.json`, or the fallback `~/.vibe-trading/agent.json`; then list remote tools in a swarm preset using the local MCP wrapper name, such as `mcp_internal_kb_search`. Caller-provided `variables` stay template data only and cannot inject MCP URLs, commands, environment variables, or allowlist overrides. <details> <summary><b>Install from ClawHub (one command)</b></summary> ```bash npx clawhub@latest install vibe-trading --force ``` > `--force` is required because the skill references external APIs, which triggers VirusTotal's automated scan. The code is fully open-source and safe to inspect. This downloads the skill + MCP config into your agent's skills directory. No cloning needed. Browse on ClawHub: [clawhub.ai/skills/vibe-trading](https://clawhub.ai/skills/vibe-trading) </details> <details> <summary><b>OpenSpace — self-evolving skills</b></summary> All 77 finance skills are published on [open-space.cloud](https://open-space.cloud) and evolve autonomously through OpenSpace's self-evolution engine. To use with OpenSpace, add both MCP servers to your agent config: ```json { "mcpServers": { "openspace": { "command": "openspace-mcp", "toolTimeout": 600, "env": { "OPENSPACE_HOST_SKILL_DIRS": "/path/to/vibe-trading/agent/src/skills", "OPENSPACE_WORKSPACE": "/path/to/OpenSpace" } }, "vibe-trading": { "command": "vibe-trading-mcp" } } } ``` OpenSpace will auto-discover all 77 skills, enabling auto-fix, auto-improve, and community sharing. Search for Vibe-Trading skills via `search_skills("finance backtest")` in any OpenSpace-connected agent. </details> --- ## 🔌 Loading Tools from External MCP Servers (MCP Client Mode) > **This is the opposite direction from the MCP Plugin above.** > The MCP Plugin lets *other* agents call Vibe-Trading tools. > This section lets the *built-in* Vibe-Trading agent call tools from *your* external MCP servers. ### Quick start Create `~/.vibe-trading/agent.json`: ```json { "mcpServers": { "my-server": { "command": "uvx", "args": ["my-mcp-server"] } } } ``` Run any CLI command — tools from ordinary external servers are automatically injected into the agent's registry after local tools: ```bash vibe-trading run "use my-server to do X" ``` ### Official IBKR MCP read-only probe Vibe-Trading can connect directly to Interactive Brokers' official remote MCP endpoint in read-only mode. Add this to `~/.vibe-trading/agent.json`: ```json { "mcpServers": { "ibkr": { "type": "streamableHttp", "url": "https://api.ibkr.com/v1/api/mcp", "auth": { "type": "oauth", "scopes": ["mcp.read"], "clientName": "Vibe-Trading", "cacheDir": "~/.vibe-trading/live/ibkr/oauth" }, "enabledTools": ["*"] } } } ``` Then start the browser OAuth flow: ```bash vibe-trading connector authorize ibkr-live-official-mcp-readonly ``` The wildcard is accepted only for IBKR's `mcp.read` probe. Authorizing this profile confirms access to IBKR's official read scope; generic `trading_account` and `trading_positions` calls stay disabled until IBKR publishes stable read tool names that Vibe-Trading can map safely. A config that adds `mcp.write` must pin an explicit tool allowlist and still passes through the live order guard. If IBKR issues a pre-registered OAuth client, add `clientId` and `clientSecret` inside `auth`. ### Trading connectors: fastest path For users who cannot wait for IBKR OAuth client approval, connect to a local TWS or IB Gateway session. Credentials stay inside IBKR's desktop app; Vibe- Trading only connects to `127.0.0.1` and exposes it as a connector profile. Install the optional SDK: ```bash pip install "vibe-trading-ai[ibkr]" ``` Open TWS paper trading or IB Gateway paper, enable API socket clients, then run: ```bash vibe-trading connector list vibe-trading connector use ibkr-paper-local vibe-trading connector configure ibkr-paper-local --yes vibe-trading connector check vibe-trading connector account vibe-trading connector positions vibe-trading connector orders vibe-trading connector quote AAPL vibe-trading connector history AAPL --duration "30 D" --bar-size "1 day" ``` Default local ports: | App | Paper | Live read-only | |-----|-------|----------------| | TWS | `7497` | `7496` | | IB Gateway | `4002` | `4001` | The agent exposes connector-scoped tools named `trading_connections`, `trading_select_connection`, `trading_check`, `trading_account`, `trading_positions`, `trading_orders`, `trading_quote`, and `trading_history`. Live-broker raw MCP tools are not registered directly as `mcp_<broker>_*`. No IBKR order-placement tool is registered. ### Config reference | Field | Type | Default | Description | |-------|------|---------|-------------| | `type` | string | inferred for stdio; required for HTTP | Omit for stdio, or set to `sse` / `streamableHttp` for URL-based servers. | | `command` | string | required for stdio | Executable to spawn for stdio servers. Invalid for `sse` / `streamableHttp` servers. | | `args` | array | `[]` | Command-line arguments for stdio servers only. | | `env` | object | `{}` | Extra environment variables merged into the subprocess env for stdio servers only. | | `url` | string | required for `sse` / `streamableHttp` | Remote SSE / streamable HTTP endpoint URL. Not used for stdio servers. | | `headers` | object | `{}` | Extra HTTP headers for `sse` / `streamableHttp` servers only. | | `toolTimeout` | number | `30` | Per-tool call timeout in seconds | | `enabledTools` | array | `["*"]` | Tool allowlist. Use `["*"]` to expose all tools from the server | Config file location: `~/.vibe-trading/agent.json` (JSON or YAML). For URL-based transports, `type` is required. The agent no longer guesses between SSE and streamable HTTP from the URL suffix. ### Per-session overrides (API) When creating a session via the API you can pass `mcpServers` inside `session.config` to extend or override the global config for that session only: ```json { "config": { "mcpServers": { "research-server": { "command": "uvx", "args": ["research-mcp"], "enabledTools": ["search", "fetch"] } } } } ``` ### Tool naming Ordinary remote tools are exposed with stable names: `mcp_<server>_<tool>`. Live-broker MCP servers stay behind the `trading_*` connector surface. If two server names produce the same ASCII-safe local prefix (e.g. `foo-bar` and `foo_bar` both become `foo_bar`), a deterministic hash suffix is appended at the server-segment level so names remain unique. The operator receives a warning: ``` WARNING: Configured MCP server 'foo-bar' collides with another server after local name normalization. Using local tool prefix 'mcp_foo_bar_<hash>_<tool>' to keep generated tool names unique. Rename the server in agent config if you want a different prefix. ``` ### v1 limits | Limit | Detail | |-------|--------| | Transport | stdio, SSE, and streamable HTTP | | Execution | serial only — MCP tools never enter the parallel readonly path | | Surfaces | tools only (resources and prompts excluded in v1) | | Hot reload | not supported — restart the process to pick up config changes | | Swarm path | MCP tools are not available inside Swarm worker registries in v1 | --- ## 📁 Project Structure <details> <summary><b>Click to expand</b></summary> ``` Vibe-Trading/ ├── agent/ # Backend (Python) │ ├── cli/ # CLI package — interactive TUI + subcommands │ ├── api_server.py # FastAPI server — runs, sessions, upload, swarm, SSE │ ├── mcp_server.py # MCP server — 36 tools for OpenClaw / Claude Desktop │ │ │ ├── src/ │ │ ├── agent/ # ReAct agent core │ │ │ ├── loop.py # 5-layer compression + read/write tool batching │ │ │ ├── context.py # system prompt + auto-recall from persistent memory │ │ │ ├── skills.py # skill loader (77 bundled + user-created via CRUD) │ │ │ ├── tools.py # tool base class + registry │ │ │ ├── memory.py # lightweight workspace state per run │ │ │ ├── frontmatter.py # shared YAML frontmatter parser │ │ │ └── trace.py # execution trace writer │ │ │ │ │ ├── memory/ # Cross-session persistent memory │ │ │ └── persistent.py # file-based memory (~/.vibe-trading/memory/) │ │ │ │ │ ├── tools/ # 31 auto-discovered agent tools │ │ │ ├── backtest_tool.py # run backtests │ │ │ ├── remember_tool.py # cross-session memory (save/recall/forget) │ │ │ ├── skill_writer_tool.py # skill CRUD (save/patch/delete/file) │ │ │ ├── session_search_tool.py # FTS5 cross-session search │ │ │ ├── swarm_tool.py # launch swarm teams │ │ │ ├── web_search_tool.py # DuckDuckGo web search │ │ │ └── ... # bash, file I/O, factor analysis, options, alpha browser + bench, etc. │ │ │ │ │ ├── factors/ # Alpha Zoo — 452 alphas across 4 zoos │ │ │ ├── base.py # 19 operators (rank/scale/ts_*/delta/decay_linear/safe_div/vwap) │ │ │ ├── registry.py # AST-only metadata load + lazy compute + sanity gates │ │ │ ├── bench_runner.py # IC + alive/reversed/dead categorisation │ │ │ └── zoo/ # qlib158 (154) + alpha101 (101) + gtja191 (191) + academic (6) │ │ │ │ │ ├── api/ # FastAPI route modules │ │ │ └── alpha_routes.py # /alpha/list, /alpha/{id}, /alpha/bench, SSE stream │ │ │ │ │ ├── skills/ # 77 finance skills in 8 categories (SKILL.md each) │ │ ├── swarm/ # Swarm DAG execution engine │ │ │ └── presets/ # 29 swarm preset YAML definitions │ │ ├── session/ # Multi-turn chat + FTS5 session search │ │ └── providers/ # LLM provider abstraction │ │ │ └── backtest/ # Backtest engines │ ├── engines/ # 7 engines + composite cross-market engine + options_portfolio │ ├── loaders/ # 7 sources: tushare, okx, yfinance, akshare, mootdx, ccxt, futu │ │ ├── base.py # DataLoader Protocol │ │ └── registry.py # Registry + auto-fallback chains │ └── optimizers/ # MVO, equal vol, max div, risk parity │ ├── frontend/ # Web UI (React 19 + Vite + TypeScript) │ └── src/ │ ├── pages/ # Home, Agent, AlphaZoo, RunDetail, Compare, Correlation, Settings │ ├── components/ # chat, charts, layout │ └── stores/ # Zustand state management │ ├── Dockerfile # Multi-stage build ├── docker-compose.yml # One-command deploy ├── pyproject.toml # Package config + CLI entrypoint ├── tools/ # Repo-level CI helpers │ └── ci_grep_gates.sh # rejects yaml.load / trademark / per-stock-data leaks └── LICENSE # MIT ``` </details> --- ## 🏛 Ecosystem Vibe-Trading is part of the **[HKUDS](https://github.com/HKUDS)** agent ecosystem: <table> <tr> <td align="center" width="20%"> <a href="https://github.com/HKUDS/nanobot"><b>NanoBot</b></a><br> <sub>Ultra-Lightweight Personal AI Assistant</sub> </td> <td align="center" width="20%"> <a href="https://github.com/HKUDS/AI-Trader"><b>AI-Trader</b></a><br> <sub>Agent-Native Signal & Copy Trading Platform</sub> </td> <td align="center" width="20%"> <a href="https://github.com/HKUDS/CLI-Anything"><b>CLI-Anything</b></a><br> <sub>Making All Software Agent-Native</sub> </td> <td align="center" width="20%"> <a href="https://github.com/HKUDS/OpenSpace"><b>OpenSpace</b></a><br> <sub>Self-Evolving AI Agent Skills</sub> </td> <td align="center" width="20%"> <a href="https://github.com/HKUDS/ClawTeam"><b>ClawTeam</b></a><br> <sub>Agent Swarm Intelligence</sub> </td> </tr> </table> --- ## 🗺 Roadmap > We ship in phases. Items move to [Issues](https://github.com/HKUDS/Vibe-Trading/issues) when work begins. | Phase | Feature | Status | |-------|---------|--------| | **Trust Layer** | Reproducible run cards are emitted and shown in Run Detail; v1 adds tool traces and citations | v0 Shipped | | **Hypothesis Registry** | Durable research hypotheses with lifecycle status, data sources, skills, run-card links, and invalidation notes | Backend MVP Shipped | | **Research Autopilot** | Manual-first research loop: hypothesis → deterministic backtest → evidence report | Next | | **Data Bridge** | Bring-your-own data: local CSV/Parquet/SQL connectors with schema mapping | Planned | | **Options Lab** | Vol surface, Greeks dashboard, payoff/scenario explorer | Planned | | **Portfolio Studio** | Risk x-ray, constraints, turnover-aware optimizer, rebalance notes | Planned | | **Alpha Zoo** | 452 pre-built alphas (Qlib 158 + Kakushadze 101 + GTJA 191 + FF5 + Carhart) with one-line bench, agent integration, and Web UI | **Shipped 0.1.8** | | **Research Delivery** | Scheduled briefs to Slack / Telegram / email-style channels | Planned | | **Community** | Shareable skills, presets, and strategy cards | Exploring | --- ## Contributing We welcome contributions! See [CONTRIBUTING.md](CONTRIBUTING.md) for guidelines. **Good first issues** are tagged with [`good first issue`](https://github.com/HKUDS/Vibe-Trading/issues?q=is%3Aissue+is%3Aopen+label%3A%22good+first+issue%22) — pick one and get started. Want to contribute something bigger? Check the [Roadmap](#-roadmap) above and open an issue to discuss before starting. --- ## Contributors Thanks to everyone who has contributed to Vibe-Trading! Recent v0.1.9 cycle contributors and credits: - @toanalien — session JSONL crash-hardening (#147), graceful agent-loop exit at the iteration budget (#148), pre-flight validation for LLM-generated signal engines (#149), and cross-browser Full Report links (#150) - @ai7eam-dev — cross-market correlation timestamp alignment (#158) and the session running-status indicator + swarm retry (#159 → #160) - @shadowinlife — remote MCP servers over SSE/HTTP (#125) and operator-configured external MCP tools in swarm workers (#142) - @DoubleSky123 — configurable SSE idle timeout (#157) - @ArthurXi — IME Enter submission handling in the Web composer (#146) - @omcdecor-cyber — swarm DAG gating when an upstream task fails (#145) - @Soli22de — strict alpha-bench mode with a mandatory random control (#143) - @ruok808 — proxy-env support in the CCXT loader (#126) - @faizack — remote Ollama base-URL normalization (#129) - @fightZy — agent session history loading fix (#136) - @lcwSeven — short universe names in the alpha list endpoint (#137) - @Teerapat-Vatpitak — resolved .env-source logging (#124) - @warren618 / Haozhe Wu — connector-first broker profiles, the Robinhood Agentic Trading channel, Research Goal runtime, swarm reconcile + retry_run, the agent/cli refactor, the mootdx loader, and release integration <a href="https://github.com/HKUDS/Vibe-Trading/graphs/contributors"> <img src="https://contrib.rocks/image?repo=HKUDS/Vibe-Trading" /> </a> --- ## Disclaimer Vibe-Trading is research and trading software. It is not investment advice, holds no funds, and runs no execution venue. Trading through a broker channel you explicitly authorize (e.g. Robinhood Agentic Trading) happens only within the limits you set and which you can halt at any time. This broker-trading capability is experimental and not verified by us against a real broker account — use it at your own risk. Past performance does not guarantee future results. ## License MIT License — see [LICENSE](LICENSE) --- ## Star History [![Star History Chart](https://api.star-history.com/svg?repos=HKUDS/Vibe-Trading&type=Date)](https://star-history.com/#HKUDS/Vibe-Trading&Date) <p align="center"> ⭐ If <b>Vibe-Trading</b> helps your research, a star helps more people find it. </p> --- <p align="center"> Thanks for visiting <b>Vibe-Trading</b> ✨ </p> <p align="center"> <img src="https://visitor-badge.laobi.icu/badge?page_id=HKUDS.Vibe-Trading&style=flat" alt="visitors"/> </p>

Finance & Accounting AI Agents

11.4K Github Stars

Open Source

CatchMe

<p align="right"> <a href="assets/readme/README_zh.md">中文</a> · <a href="assets/readme/README_ja.md">日本語</a> · <a href="assets/readme/README_es.md">Español</a> · <b>English</b> </p> <p align="center"> <img src="assets/catchme-logo.png" width="360" alt="CatchMe Logo"/> </p> <h1 align="center">CatchMe: Make Your AI Agents Truly Personal</h1> <p align="center"> <b>Capture Your Entire Digital Footprint: Lightweight & Vectorless & Powerful.</b> </p> <p align="center"> <a href="LICENSE"><img src="https://img.shields.io/badge/License-Apache%202.0-blue?style=flat" alt="License"></a> <img src="https://img.shields.io/badge/Python-%E2%89%A53.11-3776AB?style=flat&logo=python&logoColor=white" alt="Python"> <img src="https://img.shields.io/badge/Platform-macOS%20%7C%20Windows%20%7C%20Linux-lightgrey?style=flat" alt="Platform"> <a href="https://hkuds.github.io/CatchMe"><img src="https://img.shields.io/badge/Blog-online-orange?style=flat" alt="Blog"></a> <img src="https://img.shields.io/badge/Report-coming%20soon-lightgrey?style=flat" alt="Report"> <br> <a href="./COMMUNICATION.md"><img src="https://img.shields.io/badge/Feishu-Group-E9DBFC?style=flat-square&logo=feishu&logoColor=white" alt="Feishu"></a> <a href="./COMMUNICATION.md"><img src="https://img.shields.io/badge/WeChat-Group-C5EAB4?style=flat-square&logo=wechat&logoColor=white" alt="WeChat"></a> <a href="https://discord.gg/2vDYc2w5"><img src="https://img.shields.io/badge/Discord-Join-7289DA?style=flat-square&logo=discord&logoColor=white" alt="Discord"></a> </p> <p align="center"> <a href="#-key-features">Features</a>  ·  <a href="#%EF%B8%8F-how-it-works">How It Works</a>  ·  <a href="#-llm-configuration">LLM Config</a>  ·  <a href="#-get-started">Get Started</a>  ·  <a href="#-cost--efficiency">Cost</a>  ·  <a href="#-community">Community</a> </p> <p align="center"><i>「 <b>Just do your thing. CatchMe captures everything else — stored locally to ensure privacy and security. </b> 」</i></p> <p align="center"> <img src="assets/terminal_demo.svg" alt="CatchMe Terminal Demo"/> </p> **🦞 Makes Your Agents Truly Personal**. CatchMe ships as an agent-compatible skill for CLI agents (OpenClaw, NanoBot, Claude, Cursor, etc.). Run CatchMe independently. Your agents query memories via CLI commands only. ## ## 🎯 Enrich Your Personal Digital Context <table width="100%"> <tr> <td align="center" width="25%" valign="top"> <img src="assets/usecase_coding.png" height="150" alt="Coding"/><br> <h3>💻 Personal Coding Assistant</h3> <b><i>"What was I coding in Claude Code today?"</i></b><br><br> <div align="left"> • Code session replay<br> • Recall your edited files<br> • Trace what you typed </div> </td> <td align="center" width="25%" valign="top"> <img src="assets/usecase_research.png" height="150" alt="Research"/><br> <h3>🔍 Personal Deep Research</h3> <b><i>"What was I reading about AI yesterday?"</i></b><br><br> <div align="left"> • Web/PDF viewed<br> • Search queries typed<br> • Reading info tracked </div> </td> <td align="center" width="25%" valign="top"> <img src="assets/usecase_files.png" height="150" alt="Files"/><br> <h3>📁 Personal Files Manager</h3> <b><i>"Which files did I change today?"</i></b><br><br> <div align="left"> • File changes tracked<br> • Docs accessed<br> • Edits reviewed </div> </td> <td align="center" width="25%" valign="top"> <img src="assets/usecase_digital_life.png" height="150" alt="Digital Life"/><br> <h3>🧩 Digital Life Overview</h3> <b><i>"How did I spend my afternoon?"</i></b><br><br> <div align="left"> • App usage tracked<br> • Workflows replayed<br> • Activities recalled </div> </td> </tr> </table> ## ✨ Key Features ### 📹 Always-On Event Capture - **Event-Driven Recording**: No timer or delays - catch mouse actions with crosshair annotation instantly. - **Comprehensive Context**: Five recorders track windows, keyboard, clipboard, notifications, and files around mouse actions. ### 🌲 Intelligent Memory Hierarchy - **Auto-Organization**: Raw streams structure into five tiers: Day → Session → App → Location → Action. - **Smart Summaries**: LLM summaries at each level, transforming logs into searchable knowledge trees. ### 🔍 Tree-Based Retrieval - **No Vector Complexity**: Skip embeddings and VDBs — our system uses tree-based reasoning for navigation. - **Top-Down Search**: LLM reads summaries, selects relevant branches, and drills down to evidence. ### 🤖 Zero-Config Agent Integration - **One-File Setup**: Drop a single skill file into any AI agent for instant integration. - **Immediate Access**: CLI-based screen history queries with zero configuration required. ### 🪶 Ultralight & Privacy-First - **Minimal Footprint**: ~0.2GB runtime RAM with efficient SQLite + FTS5 storage. - **Local & Offline**: All data stays on your machine with full offline mode via Ollama/vLLM/LM Studio. ### 🖥️ Rich Web Interface - **Visual Exploration**: Interactive timelines, memory tree navigation, and real-time system monitoring. - **Natural Conversation**: Chat with your complete digital footprint using natural language. <p align="center"> <img src="assets/web.png" width="100%" alt="CatchMe Web Dashboard"/> </p> ## 💡 CatchMe Architecture CatchMe transforms raw digital activity into structured, searchable memory through three concurrent stages: ### 🔄 Record → Organize → Reason: Turn digital chaos into queryable memory **Capture**. Six background recorders silently track your activity. They monitor window focus, keystrokes, mouse movement, screenshots, clipboard, and notifications. **Index**. Raw events auto-organize into a Hierarchical Activity Tree: Day → Session → App → Location → Action. Each node gets LLM-generated summaries. Fast, meaningful recall without vector embeddings. **Retrieve**. You ask a question. The LLM traverses your memory tree top-down. It selects relevant nodes and inspects raw data like screenshots or keystrokes. Then synthesizes a precise answer. <p align="center"> <img src="assets/catchme-pipe.png" width="680" alt="CatchMe Pipeline: Capturing → Indexing → Retrieving"/> </p> ### 🌲 Hierarchical Activity Tree The Activity Tree is CatchMe's memory core. It provides structured, multi-level views of your digital life. Browse high-level summaries or dive into granular details. <p align="center"> <img src="assets/fig1_activity_tree.png" width="800" alt="Hierarchical Activity Tree Structure"/> </p> ### 🔍 Intelligent Tree Retrieval CatchMe skips traditional vector search. Instead, the LLM directly navigates your Activity Tree. This enables complex, cross-day reasoning. Precise evidence gathering from raw activity history. <p align="center"> <img src="assets/fig2_retrieval.png" width="800" alt="Tree-based Retrieval Process"/> </p> **📖 Learn More**: Detailed design insights and technical deep-dive available in our [blog](https://hkuds.github.io/CatchMe/). ## 🧠 LLM Configuration ### **❗️ Data Privacy Notice** • **100% Local Storage**: All raw data (screenshots, keystrokes, activity trees) stays in ~/data/ and never leaves your machine. • **Offline-First Options**: Local LLMs (Ollama, vLLM, LM Studio) enable fully offline operation without any cloud dependency. • **⚠️Cloud Provider Caution**: If used, cloud APIs will be used to summarize your daily activities. **Untrusted endpoints may expose private data** — review data policies of your provider carefully. ### **📋 Requirements** • **Multimodal support**: Your model should be able to handle text + images. • **Context window**: Make sure the context window of your model exceed `max_tokens` limits in `config.json`. • **Cost control**: For *forced cost control*, set limits via `llm.max_calls` or increase `filter.mouse_cluster_gap` to reduce summarization frequency. CatchMe requires an LLM for background summarization and intelligent retrieval. Use **catchme init** (in <a href="#-get-started">Get Started</a>)for **guided setup** or follow the **manual configuration** steps below. For cloud API services: ```json { "llm": { "provider": "openrouter", "api_key": "sk-or-...", "api_url": null, "model": "google/gemini-3-flash-preview" } } ``` For local/offline operation: ```json { "llm": { "provider": "ollama", "api_key": null, "api_url": null, "model": "gemma3:4b" } } ``` <details> <summary><b>Supported LLM Providers</b></summary> | Provider | Config name | Default API URL | Get Key | | ------------------------- | ------------------------ | ------------------------------------------------------- | -------------------------------------------------------------------- | | **OpenRouter** (gateway) | `openrouter` | `https://openrouter.ai/api/v1` | [openrouter.ai/keys](https://openrouter.ai/keys) | | **AiHubMix** (gateway) | `aihubmix` | `https://aihubmix.com/v1` | [aihubmix.com](https://aihubmix.com) | | **SiliconFlow** (gateway) | `siliconflow` | `https://api.siliconflow.cn/v1` | [cloud.siliconflow.cn](https://cloud.siliconflow.cn) | | **OpenAI** | `openai` | `https://api.openai.com/v1` | [platform.openai.com](https://platform.openai.com/api-keys) | | **Anthropic** | `anthropic` | `https://api.anthropic.com/v1` | [console.anthropic.com](https://console.anthropic.com) | | **DeepSeek** | `deepseek` | `https://api.deepseek.com/v1` | [platform.deepseek.com](https://platform.deepseek.com/api_keys) | | **Gemini** | `gemini` | `https://generativelanguage.googleapis.com/v1beta` | [aistudio.google.com](https://aistudio.google.com/apikey) | | **Groq** | `groq` | `https://api.groq.com/openai/v1` | [console.groq.com](https://console.groq.com/keys) | | **Mistral** | `mistral` | `https://api.mistral.ai/v1` | [console.mistral.ai](https://console.mistral.ai) | | **Moonshot / Kimi** | `moonshot` | `https://api.moonshot.ai/v1` | [platform.moonshot.cn](https://platform.moonshot.cn) | | **MiniMax** | `minimax` | `https://api.minimax.io/v1` | [platform.minimaxi.com](https://platform.minimaxi.com) | | **Zhipu AI (GLM)** | `zhipu` | `https://open.bigmodel.cn/api/paas/v4` | [open.bigmodel.cn](https://open.bigmodel.cn) | | **DashScope (Qwen)** | `dashscope` | `https://dashscope.aliyuncs.com/compatible-mode/v1` | [dashscope.console.aliyun.com](https://dashscope.console.aliyun.com) | | **VolcEngine** | `volcengine` | `https://ark.cn-beijing.volces.com/api/v3` | [console.volcengine.com](https://console.volcengine.com) | | **VolcEngine Coding** | `volcengine_coding_plan` | `https://ark.cn-beijing.volces.com/api/coding/v3` | [console.volcengine.com](https://console.volcengine.com) | | **BytePlus** | `byteplus` | `https://ark.ap-southeast.bytepluses.com/api/v3` | [console.byteplus.com](https://console.byteplus.com) | | **BytePlus Coding** | `byteplus_coding_plan` | `https://ark.ap-southeast.bytepluses.com/api/coding/v3` | [console.byteplus.com](https://console.byteplus.com) | | **Ollama** (local) | `ollama` | `http://localhost:11434/v1` | — | | **vLLM** (local) | `vllm` | `http://localhost:8000/v1` | — | | **LM Studio** (local) | `lmstudio` | `http://localhost:1234/v1` | — | > Any OpenAI-compatible endpoint works — just set `api_url` and `api_key` directly. </details> <details> <summary><b>All Configuration Parameters</b></summary> | Section | Parameter | Default | Description | | ------------- | -------------------------- | ----------- | --------------------------------------------------- | | **web** | `host` | `127.0.0.1` | Dashboard bind address | | | `port` | `8765` | Dashboard port | | **llm** | `provider` | — | LLM provider name (see table above) | | | `api_key` | — | API key for the provider | | | `api_url` | *(auto)* | Custom endpoint; auto-set per provider if omitted | | | `model` | — | Model name (provider-specific) | | | `wire_api` | *(omit)* | Set to `"responses"` for providers that only expose `POST /v1/responses` instead of chat completions | | | `max_calls` | `0` | Max LLM calls per cycle (`0` = unlimited; set to limit costs) | | | `max_images_per_cluster` | `5` | Max screenshots sent per event cluster | | **filter** | `window_min_dwell` | `3.0` | Min window dwell time (sec) before recording | | | `keyboard_cluster_gap` | `3.0` | Keyboard event clustering gap (sec) | | | `mouse_cluster_gap` | `3.0` | Time gap (sec) to merge mouse events; **larger values reduce LLM summaries** | | **summarize** | `language` | `en` | Summary output language (`en`, `zh`, etc.) | | | `max_tokens_l0`–`l3` | `1200` | Max tokens per tree level (L0=Action … L3=Session) | | | `temperature` | `0.4` | LLM temperature for summarization | | | `max_workers` | `2` | Concurrent summarization workers | | | `debounce_sec` | `3.0` | Debounce before triggering summary | | | `save_interval_sec` | `5.0` | Tree auto-save interval | | **retrieve** | `max_prompt_chars` | `42000` | Max chars in retrieval prompt | | | `max_iterations` | `15` | Max tree traversal iterations | | | `max_file_chars` | `8000` | Max chars from extracted files | | | `max_select_nodes` | `7` | Max nodes selected per iteration | | | `max_tokens_step` | `4096` | Max tokens per retrieval step | | | `max_tokens_answer` | `8192` | Max tokens for final answer | | | `temperature_select` | `0.3` | Temperature for node selection | | | `temperature_answer` | `0.5` | Temperature for answer generation | | | `temperature_time_resolve` | `0.1` | Temperature for time resolution | | | `max_tokens_time_resolve` | `1000` | Max tokens for time resolution | </details> ## 🚀 Get Started ### 📦 Install ```bash git clone https://github.com/HKUDS/catchme.git && cd catchme conda create -n catchme python=3.11 -y && conda activate catchme pip install -e . ``` > **macOS** — grant *Accessibility*, *Input Monitoring*, *Screen Recording* in System Settings → Privacy & Security > **Windows** — run as Administrator for global input monitoring ### ⚡ Init ```bash catchme init # interactive setup: provider, API key, llm model ``` ### 🔥 Run ```bash catchme awake # start recording catchme web # visualize and chat # or through cli catchme ask -- "What am I doing today?" ``` <details> <summary><b>Full CLI Reference</b></summary> | Command | Description | | --------------------------- | ------------------------------------------------------ | | `catchme awake` | Start the recording daemon | | `catchme web [-p PORT]` | Launch web dashboard (default `http://127.0.0.1:8765`) | | `catchme ask -- "question"` | Query your activity in natural language | | `catchme cost` | Show LLM token usage (last 10 min / today / all time) | | `catchme disk` | Show storage breakdown & event count | | `catchme ram` | Show memory usage of running processes | | `catchme init` | Interactive setup: LLM provider, API key & model | </details> ## 🦞 CatchMe Makes Your Agents Truly Personal CatchMe ships as an agent-compatible skill for CLI agents (OpenClaw, NanoBot, Claude, Cursor, etc.). **🪶 Agent Integration:** Run CatchMe independently. Your agents query memories via CLI commands only. ```bash # 1. Start CatchMe yourself catchme awake # 2. Give the light skill to your agent cp CATCHME-light.md ~/.cursor/skills/catchme/SKILL.md ``` **Option B — Full Skill** (agent manages the full CatchMe lifecycle autonomously): ```bash cp CATCHME-full.md ~/.cursor/skills/catchme/SKILL.md ``` ### 🔧 Integrate into your current workflow ```python from catchme import CatchMe from catchme.pipelines.retrieve import retrieve # 1. One-line search — fast keyword lookup over all recorded activity with CatchMe() as mem: for e in mem.search("meeting notes"): print(e.timestamp, e.data) # 2. LLM-powered retrieval — natural language Q&A over your screen history for step in retrieve("What was I working on this morning?"): if step["type"] == "answer": print(step["content"]) ``` ## 📊 Cost & Efficiency *Benchmarked with **2 hours of intensive, continuous computer use** on MacBook Air M4.* | Metric | Value | | ----------------------------------------------- | ------------------------------------------------------------------------------- | | **Runtime RAM** | ~0.2 GB | | **Disk Usage** | ~ 200 MB | | **Token Throughput** | input ~ 6 M , output ~ 0.7 M | | | **LLM cost** — `qwen-3.5-plus` | ~ $0.42 via [Aliyun DashScope](https://home.console.aliyun.com/home/dashboard/) | | **LLM cost** — `gemini-3-flash-preview` | ~ $5.00 via [OpenRouter](https://openrouter.ai/models) | **Full Retrieval Speed** (depends on question) | 5 - 20s per query using `gemini-3-flash-preview` | ## 🚀 Roadmap CatchMe evolves with community input. Upcoming features include: **Multi-Device Recording**. Capture and unify GUI activities across all your machines via LAN synchronization. **Dynamic Clustering**. Adaptive clustering algorithms that better reflect your actual work patterns and flows, reducing unnecessary costs. **Enhanced Data Utilization**. Unlock deeper insights from screenshots and metadata beyond current processing pipelines. > 🌟 **Star this repo** to follow our future updates — your interest keeps us motivated! We welcome contributions of any kind - whether it's a comment, a bug report, a feature idea, or a pull request. See [CONTRIBUTING.md](CONTRIBUTING.md) to get started. ## 🤝 Community ### Acknowledgments ! CatchMe is inspired by these excellent open-source projects: | Project | Inspiration | | --------------------------------------------------------------- | ----------------------------------------------------- | | [ActivityWatch](https://github.com/ActivityWatch/activitywatch) | Pioneering open-source activity tracking | | [Screenpipe](https://github.com/mediar-ai/screenpipe) | Screen recording infrastructure for AI agents | | [Windrecorder](https://github.com/Antonoko/Windrecorder) | Personal screen recording & search on Windows | | [OpenRecall](https://github.com/openrecall/openrecall) | Open-source alternative to Windows Recall | | [Selfspy](https://github.com/selfspy/selfspy) | Classic daemon-style activity logging | | [PageIndex](https://github.com/HKUDS/PageIndex) | Tree-structured document retrieval without embeddings | | [MineContext](https://github.com/volcengine/MineContext) | Proactive context-aware AI partner & screen capture | ### 🏛️ Ecosystem CatchMe is part of the **[HKUDS](https://github.com/HKUDS)** agent ecosystem — building the infrastructure layer for personal AI agents: <table> <tr> <td align="center" width="25%"> <a href="https://github.com/HKUDS/nanobot"><b>NanoBot</b></a><br> <sub>Ultra-Lightweight Personal AI Assistant</sub> </td> <td align="center" width="25%"> <a href="https://github.com/HKUDS/CLI-Anything"><b>CLI-Anything</b></a><br> <sub>Making All Software Agent-Native</sub> </td> <td align="center" width="25%"> <a href="https://github.com/HKUDS/ClawWork"><b>ClawWork</b></a><br> <sub>AI Assistant → AI Coworker Evolution</sub> </td> <td align="center" width="25%"> <a href="https://github.com/HKUDS/ClawTeam"><b>ClawTeam</b></a><br> <sub>Agent Awarm Intelligence for Full Team Automation</sub> </td> </tr> </table> <br> <p align="center"> Thanks for visiting ✨ <b>CatchMe</b> </p> <p align="center"> <img src="https://visitor-badge.laobi.icu/badge?page_id=HKUDS.catchme" alt="visitors"/> </p>

Frontend Templates AI Agents

412 Github Stars

Open Source

DeepTutor

<div align="center"> <p align="center"><img src="assets/logo.png" alt="DeepTutor logo" height="56" style="vertical-align: middle;"> <img src="assets/banner.png" alt="DeepTutor" height="48" style="vertical-align: middle;"></p> # DeepTutor: Agent-Native Personalized Tutoring <p align="center"> <a href="https://deeptutor.info" target="_blank"><img alt="Docs — deeptutor.info" src="https://img.shields.io/badge/Docs-deeptutor.info%20%E2%86%97-0A0A0A?style=for-the-badge&labelColor=F5F5F4" height="36"></a> </p> <a href="https://trendshift.io/repositories/17099" target="_blank"><img src="https://trendshift.io/api/badge/repositories/17099" alt="HKUDS%2FDeepTutor | Trendshift" style="width: 250px; height: 55px;" width="250" height="55"/></a> <p align="center"> <a href="README.md"><img alt="English" height="40" src="https://img.shields.io/badge/English-BCDCF7"></a>  <a href="assets/README/README_CN.md"><img alt="简体中文" height="40" src="https://img.shields.io/badge/简体中文-CDCFD4"></a>  <a href="assets/README/README_JA.md"><img alt="日本語" height="40" src="https://img.shields.io/badge/日本語-CDCFD4"></a>  <a href="assets/README/README_ES.md"><img alt="Español" height="40" src="https://img.shields.io/badge/Español-CDCFD4"></a>  <a href="assets/README/README_FR.md"><img alt="Français" height="40" src="https://img.shields.io/badge/Français-CDCFD4"></a>  <a href="assets/README/README_AR.md"><img alt="Arabic" height="40" src="https://img.shields.io/badge/Arabic-CDCFD4"></a>  <a href="assets/README/README_RU.md"><img alt="Русский" height="40" src="https://img.shields.io/badge/Русский-CDCFD4"></a>  <a href="assets/README/README_HI.md"><img alt="Hindi" height="40" src="https://img.shields.io/badge/Hindi-CDCFD4"></a>  <a href="assets/README/README_PT.md"><img alt="Português" height="40" src="https://img.shields.io/badge/Português-CDCFD4"></a>  <a href="assets/README/README_TH.md"><img alt="Thai" height="40" src="https://img.shields.io/badge/Thai-CDCFD4"></a>  <a href="assets/README/README_PL.md"><img alt="Polski" height="40" src="https://img.shields.io/badge/Polski-CDCFD4"></a> </p> [![Python 3.11+](https://img.shields.io/badge/Python-3.11%2B-3776AB?style=flat-square&logo=python&logoColor=white)](https://www.python.org/downloads/) [![Next.js 16](https://img.shields.io/badge/Next.js-16-000000?style=flat-square&logo=next.js&logoColor=white)](https://nextjs.org/) [![License](https://img.shields.io/badge/License-Apache_2.0-blue?style=flat-square)](LICENSE) [![GitHub release](https://img.shields.io/github/v/release/HKUDS/DeepTutor?style=flat-square&color=brightgreen)](https://github.com/HKUDS/DeepTutor/releases) [![arXiv](https://img.shields.io/badge/arXiv-2604.26962-b31b1b?style=flat-square&logo=arxiv&logoColor=white)](https://arxiv.org/abs/2604.26962) [![Discord](https://img.shields.io/badge/Discord-Community-5865F2?style=flat-square&logo=discord&logoColor=white)](https://discord.gg/eRsjPgMU4t) [![Feishu](https://img.shields.io/badge/Feishu-Group-00D4AA?style=flat-square&logo=feishu&logoColor=white)](./Communication.md) [![WeChat](https://img.shields.io/badge/WeChat-Group-07C160?style=flat-square&logo=wechat&logoColor=white)](https://github.com/HKUDS/DeepTutor/issues/78) [Features](#-key-features) · [Get Started](#-get-started) · [Explore](#-explore-deeptutor) · [TutorBot](#-tutorbot--persistent-autonomous-ai-tutors) · [CLI](#%EF%B8%8F-deeptutor-cli--agent-native-interface) · [Multi-User](#-multi-user--shared-deployments-with-per-user-workspaces) · [Community](#-community--ecosystem) </div> --- > 🤝 **We welcome any kinds of contributing!** Vote on roadmap items or propose new ones at [`Roadmap`](https://github.com/HKUDS/DeepTutor/issues/498), and see our [Contributing Guide](CONTRIBUTING.md) for branching strategy, coding standards, and how to get started. ### 📦 Releases > **[2026.5.28]** [v1.4.2](https://github.com/HKUDS/DeepTutor/releases/tag/v1.4.2) — Stability + polish on v1.4.1: Gemini 2.5+ unblocked across Visualize and Chat, ContextVar auth-routing fix (#485), reasoning + native-tools label protocol hardened, smooth-streaming UX on every chat surface, new collapsible Recents sidebar, and Lemonade local-provider support. > **[2026.5.27]** [v1.4.1](https://github.com/HKUDS/DeepTutor/releases/tag/v1.4.1) — Security + stability patch: TutorBot tool sandbox locked down, per-user resource isolation, multimodal image fallback for vision-capable providers, an HTTP/SSE API for talking to a TutorBot, and a v1.4.0 chat regression fix. > **[2026.5.22]** [v1.4.0](https://github.com/HKUDS/DeepTutor/releases/tag/v1.4.0) — GA cut of v1.4: Auto Mode, three-layer Memory, agentic Deep Research / Solve / Question, LlamaIndex RAG refactor, Visualize/Animator merge, plus reasoning-effort normalization, tool-schema fallback, and restart-safe turn runtime. > **[2026.5.21]** [v1.4.0-beta](https://github.com/HKUDS/DeepTutor/releases/tag/v1.4.0-beta) — Three-layer Memory workbench (L1/L2/L3), every chat capability rebuilt on a single agentic engine, LlamaIndex-only RAG, and a unified Settings + Capabilities surface. <details> <summary><b>Past releases (more than 2 weeks ago)</b></summary> > **[2026.5.10]** [v1.3.10](https://github.com/HKUDS/DeepTutor/releases/tag/v1.3.10) — Remote Docker CORS recovery, `DISABLE_SSL_VERIFY` across SDK providers, safer code-block citations, and optional Matrix E2EE add-on. > **[2026.5.9]** [v1.3.9](https://github.com/HKUDS/DeepTutor/releases/tag/v1.3.9) — TutorBot Zulip and NVIDIA NIM support, safer thinking-model routing, `deeptutor start`, sidebar tooltips, and session-store parity. > **[2026.5.8]** [v1.3.8](https://github.com/HKUDS/DeepTutor/releases/tag/v1.3.8) — Optional multi-user deployments with isolated user workspaces, admin grants, auth routes, and scoped runtime access. > **[2026.5.4]** [v1.3.7](https://github.com/HKUDS/DeepTutor/releases/tag/v1.3.7) — Thinking-model/provider fixes, visible Knowledge index history, and safer Co-Writer clear/template editing. > **[2026.5.3]** [v1.3.6](https://github.com/HKUDS/DeepTutor/releases/tag/v1.3.6) — Catalog-based model selection for chat and TutorBot, safer RAG re-indexing, OpenAI Responses token-limit fixes, and Skills editor validation. > **[2026.5.2]** [v1.3.5](https://github.com/HKUDS/DeepTutor/releases/tag/v1.3.5) — Smoother local launch settings, safer RAG queries, cleaner local embedding auth, and Settings dark-mode polish. > **[2026.5.1]** [v1.3.4](https://github.com/HKUDS/DeepTutor/releases/tag/v1.3.4) — Book page chat persistence and rebuild flows, chat-to-book references, stronger language/reasoning handling, RAG document extraction hardening. > **[2026.4.30]** [v1.3.3](https://github.com/HKUDS/DeepTutor/releases/tag/v1.3.3) — NVIDIA NIM + Gemini embedding support, unified Space context for chat history/skills/memory, session snapshots, RAG re-index resilience. > **[2026.4.29]** [v1.3.2](https://github.com/HKUDS/DeepTutor/releases/tag/v1.3.2) — Transparent embedding endpoint URLs, RAG re-index resilience for invalid persisted vectors, memory cleanup for thinking-model output, Deep Solve runtime fix. > **[2026.4.28]** [v1.3.1](https://github.com/HKUDS/DeepTutor/releases/tag/v1.3.1) — Stability: safer RAG routing & embedding validation, Docker persistence, IME-safe input, Windows/GBK robustness. > **[2026.4.27]** [v1.3.0](https://github.com/HKUDS/DeepTutor/releases/tag/v1.3.0) — Versioned KB indexes with re-index workflow, rebuilt Knowledge workspace, embedding auto-discovery with new adapters, Space hub. > **[2026.4.25]** [v1.2.5](https://github.com/HKUDS/DeepTutor/releases/tag/v1.2.5) — Persistent chat attachments with file-preview drawer, attachment-aware capability pipelines, TutorBot Markdown export. > **[2026.4.25]** [v1.2.4](https://github.com/HKUDS/DeepTutor/releases/tag/v1.2.4) — Text/code/SVG attachments, one-command Setup Tour, Markdown chat export, compact KB management UI. > **[2026.4.24]** [v1.2.3](https://github.com/HKUDS/DeepTutor/releases/tag/v1.2.3) — Document attachments (PDF/DOCX/XLSX/PPTX), reasoning thinking-block display, Soul template editor, Co-Writer save-to-notebook. > **[2026.4.22]** [v1.2.2](https://github.com/HKUDS/DeepTutor/releases/tag/v1.2.2) — User-authored Skills system, chat input performance overhaul, TutorBot auto-start, Book Library UI, visualization fullscreen. > **[2026.4.21]** [v1.2.1](https://github.com/HKUDS/DeepTutor/releases/tag/v1.2.1) — Per-stage token limits, Regenerate response across all entry points, RAG & Gemma compatibility fixes. > **[2026.4.20]** [v1.2.0](https://github.com/HKUDS/DeepTutor/releases/tag/v1.2.0) — Book Engine "living book" compiler, multi-document Co-Writer, interactive HTML visualizations, Question Bank @-mention. > **[2026.4.18]** [v1.1.2](https://github.com/HKUDS/DeepTutor/releases/tag/v1.1.2) — Schema-driven Channels tab, RAG single-pipeline consolidation, externalized chat prompts. > **[2026.4.17]** [v1.1.1](https://github.com/HKUDS/DeepTutor/releases/tag/v1.1.1) — Universal "Answer now", Co-Writer scroll sync, unified settings panel, streaming Stop button. > **[2026.4.15]** [v1.1.0](https://github.com/HKUDS/DeepTutor/releases/tag/v1.1.0) — LaTeX block math overhaul, LLM diagnostic probe, Docker + local LLM guidance. > **[2026.4.14]** [v1.1.0-beta](https://github.com/HKUDS/DeepTutor/releases/tag/v1.1.0-beta) — Bookmarkable sessions, Snow theme, WebSocket heartbeat & auto-reconnect, embedding registry overhaul. > **[2026.4.13]** [v1.0.3](https://github.com/HKUDS/DeepTutor/releases/tag/v1.0.3) — Question Notebook with bookmarks & categories, Mermaid in Visualize, embedding mismatch detection, Qwen/vLLM compatibility, LM Studio & llama.cpp support, and Glass theme. > **[2026.4.11]** [v1.0.2](https://github.com/HKUDS/DeepTutor/releases/tag/v1.0.2) — Search consolidation with SearXNG fallback, provider switch fix, and frontend resource leak fixes. > **[2026.4.10]** [v1.0.1](https://github.com/HKUDS/DeepTutor/releases/tag/v1.0.1) — Visualize capability (Chart.js/SVG), quiz duplicate prevention, and o4-mini model support. > **[2026.4.10]** [v1.0.0-beta.4](https://github.com/HKUDS/DeepTutor/releases/tag/v1.0.0-beta.4) — Embedding progress tracking with rate-limit retry, cross-platform dependency fixes, and MIME validation fix. > **[2026.4.8]** [v1.0.0-beta.3](https://github.com/HKUDS/DeepTutor/releases/tag/v1.0.0-beta.3) — Native OpenAI/Anthropic SDK (drop litellm), Windows Math Animator support, robust JSON parsing, and full Chinese i18n. > **[2026.4.7]** [v1.0.0-beta.2](https://github.com/HKUDS/DeepTutor/releases/tag/v1.0.0-beta.2) — Hot settings reload, MinerU nested output, WebSocket fix, and Python 3.11+ minimum. > **[2026.4.4]** [v1.0.0-beta.1](https://github.com/HKUDS/DeepTutor/releases/tag/v1.0.0-beta.1) — Agent-native architecture rewrite (~200k lines): Tools + Capabilities plugin model, CLI & SDK, TutorBot, Co-Writer, Guided Learning, and persistent memory. > **[2026.1.23]** [v0.6.0](https://github.com/HKUDS/DeepTutor/releases/tag/v0.6.0) — Session persistence, incremental document upload, flexible RAG pipeline import, and full Chinese localization. > **[2026.1.18]** [v0.5.2](https://github.com/HKUDS/DeepTutor/releases/tag/v0.5.2) — Docling support for RAG-Anything, logging system optimization, and bug fixes. > **[2026.1.15]** [v0.5.0](https://github.com/HKUDS/DeepTutor/releases/tag/v0.5.0) — Unified service configuration, RAG pipeline selection per knowledge base, question generation overhaul, and sidebar customization. > **[2026.1.9]** [v0.4.0](https://github.com/HKUDS/DeepTutor/releases/tag/v0.4.0) — Multi-provider LLM & embedding support, new home page, RAG module decoupling, and environment variable refactor. > **[2026.1.5]** [v0.3.0](https://github.com/HKUDS/DeepTutor/releases/tag/v0.3.0) — Unified PromptManager architecture, GitHub Actions CI/CD, and pre-built Docker images on GHCR. > **[2026.1.2]** [v0.2.0](https://github.com/HKUDS/DeepTutor/releases/tag/v0.2.0) — Docker deployment, Next.js 16 & React 19 upgrade, WebSocket security hardening, and critical vulnerability fixes. </details> ### 📰 News > **[2026.5.22]** 🌐 Our official docs site is live at [**deeptutor.info**](https://deeptutor.info/) — guides, references, and capability tours all in one place. > **[2026.4.19]** 🎉 We've reached 20k stars after 111 days! Thank you for the incredible support — we're committed to continuous iteration toward truly personalized, intelligent tutoring for everyone. > **[2026.4.10]** 📄 Our paper is now live on arXiv! Read the [preprint](https://arxiv.org/abs/2604.26962) to learn more about the design and ideas behind DeepTutor. > **[2026.4.4]** Long time no see! ✨ DeepTutor v1.0.0 is finally here — an agent-native evolution featuring a ground-up architecture rewrite, TutorBot, and flexible mode switching under the Apache-2.0 license. A new chapter begins, and our story continues! > **[2026.2.6]** 🚀 We've reached 10k stars in just 39 days! A huge thank you to our incredible community for the support! > **[2026.1.1]** Happy New Year! Join our [Discord](https://discord.gg/eRsjPgMU4t), [WeChat](https://github.com/HKUDS/DeepTutor/issues/78), or [Discussions](https://github.com/HKUDS/DeepTutor/discussions) — let's shape the future of DeepTutor together! > **[2025.12.29]** DeepTutor is officially released! ## ✨ Key Features **Work surfaces** - Chat — Chat, Solve, Quiz, Research, and Visualize share one session, knowledge base, and citation history, so you can escalate mid-conversation without losing context. - Co-Writer — split-view Markdown workspace where any selection can be rewritten, expanded, or shortened, optionally grounded by your KB or the web. Drafts save straight to notebooks. - Book Engine — a multi-agent pipeline compiles your materials into interactive "living books" with 13 block types: quizzes, flash cards, timelines, concept graphs, an embedded GeoGebra viewer, animations, and more. Pages are KB-fingerprinted, so drift is detectable. **Your library** - Knowledge Bases — versioned RAG-ready collections, end-to-end on LlamaIndex. Every (re-)index is tracked, comparable, and rollback-able. - Space — a personal review library bundling chat history, notebooks, question bank, and user-authored skills (`SKILL.md`) that switch DeepTutor's persona. - Three-layer memory — append-only L1 traces, L2 per-surface curated facts with citations, and L3 cross-surface synthesis. An inspectable workbench and a memory graph let you audit *why* DeepTutor knows what it knows. **Extensibility & control** - Composable tools — RAG, web search, code execution, reasoning, brainstorming, paper search, GeoGebra analysis, and chat helpers (`ask_user`, `web_fetch`, `write_note`, `list_notebook`, `github_query`). MCP servers plug in alongside built-ins. - Personal TutorBots — persistent, autonomous tutors, each with its own workspace, soul, skills, and channels (Telegram, Discord, Slack, Matrix, Zulip, …). Built on [nanobot](https://github.com/HKUDS/nanobot). - Unified settings — one draft / Apply workbench for appearance, models, embeddings, search, capabilities, memory, MCP servers, and tools, with shared per-call cost tracking. - Agent-native CLI — every capability, KB, session, and TutorBot is one command away; rich output for humans, structured JSON for agents. Hand any tool-using LLM the [`SKILL.md`](SKILL.md) and it can drive DeepTutor on its own. - Optional authentication — off by default; opt in for multi-user deployments with bcrypt + JWT, an admin dashboard, and an optional PocketBase / OAuth sidecar. --- ## 🚀 Get Started DeepTutor ships four installation paths. They all share one workspace layout: settings live in `data/user/settings/` under the directory you launch from (or under `DEEPTUTOR_HOME` / `deeptutor start --home` if you set one explicitly). For the full app, the recommended flow is **pick a workspace directory → install → `deeptutor init` → `deeptutor start`**. > ✨ **v1.4.2 is live.** `pip install -U deeptutor` picks up the latest stable. Pre-releases (when available) opt in with `pip install --pre -U deeptutor`. ### Option 1 — Install From PyPI Full local Web app + CLI, no clone required. Needs **Python 3.11+** and a **Node.js 20+** runtime on PATH (the packaged Next.js standalone server is spawned by `deeptutor start`). ```bash mkdir -p my-deeptutor && cd my-deeptutor pip install -U deeptutor deeptutor init # prompts for ports + LLM provider + optional embedding deeptutor start # starts backend + frontend; keep the terminal open ``` `deeptutor init` prompts for backend port (default `8001`), frontend port (default `3782`), LLM provider / base URL / API key / model, and an optional embedding provider for Knowledge Base / RAG. After `deeptutor start`, open the frontend URL printed in the terminal — by default [http://127.0.0.1:3782](http://127.0.0.1:3782). Press `Ctrl+C` in that terminal to stop both backend and frontend. Skipping `deeptutor init` is fine for a quick trial; the app boots with default ports and empty model settings, configure them later in **Settings → Models**. ### Option 2 — Install From Source For development against a checkout. Use **Python 3.11+** and **Node.js 22 LTS** to match CI and Docker. ```bash git clone https://github.com/HKUDS/DeepTutor.git cd DeepTutor # Create a venv (macOS/Linux). Windows PowerShell: # py -3.11 -m venv .venv ; .\.venv\Scripts\Activate.ps1 python3 -m venv .venv && source .venv/bin/activate python -m pip install --upgrade pip # Install backend + frontend deps python -m pip install -e . ( cd web && npm ci --legacy-peer-deps ) deeptutor init deeptutor start ``` Source installs run Next.js in dev mode against the local `web/` directory; everything else (config layout, ports, stop with `Ctrl+C`) matches Option 1. <details> <summary><b>Conda environment</b> (instead of <code>venv</code>)</summary> ```bash conda create -n deeptutor python=3.11 conda activate deeptutor python -m pip install --upgrade pip ``` </details> <details> <summary><b>Optional install extras</b> — dev / tutorbot / matrix / math-animator</summary> ```bash pip install -e ".[dev]" # tests/lint tools pip install -e ".[tutorbot]" # TutorBot engine + channel SDKs pip install -e ".[matrix]" # Matrix channel without E2EE/libolm pip install -e ".[matrix-e2e]" # Matrix E2EE; requires libolm pip install -e ".[math-animator]" # Manim addon; requires LaTeX/ffmpeg/system libs ``` </details> <details> <summary><b>Frontend dependency tweaks & dev-server troubleshooting</b></summary> **Changing frontend dependencies:** run `npm install --legacy-peer-deps` to refresh `web/package-lock.json`, then commit both `web/package.json` and `web/package-lock.json`. **Stuck dev server:** if `deeptutor start` reports an existing frontend that isn't responding, stop the PID it prints. If no Next.js process is actually running, the lock files are stale — remove them and retry: ```bash rm -f web/.next/dev/lock web/.next/lock deeptutor start ``` </details> ### Option 3 — Docker One container for the full Web app. Images on GitHub Container Registry: - `ghcr.io/hkuds/deeptutor:latest` — stable release - `ghcr.io/hkuds/deeptutor:pre` — pre-release, when available ```bash docker run --rm --name deeptutor \ -p 127.0.0.1:3782:3782 \ -p 127.0.0.1:8001:8001 \ -v deeptutor-data:/app/data \ ghcr.io/hkuds/deeptutor:latest ``` > ⚠️ **Map both `3782` and `8001`.** `3782` serves the web UI; `8001` is the FastAPI backend that your browser calls directly — there is no in-container proxy. Skip the `8001` mapping and the page still loads, but **Settings** shows "Backend unreachable" and stays unusable. Open [http://127.0.0.1:3782](http://127.0.0.1:3782). The container creates `/app/data/user/settings/*.json` on first boot; configure model providers from the Web Settings page. Config, API keys, logs, workspace files, memory, and knowledge bases persist in the `deeptutor-data` volume. - **Different host ports:** change the left side of each `-p host:container` mapping (e.g. `-p 127.0.0.1:8088:3782`). If you change container-side ports in `/app/data/user/settings/system.json`, restart and update the right side of each mapping to match. - **Detached:** add `-d`, then `docker logs -f deeptutor` to follow, `docker stop deeptutor` to stop, `docker rm deeptutor` before reusing the name. The `deeptutor-data` volume keeps your settings and workspace across restarts. **Remote Docker / reverse proxy:** the Web UI runs in the browser, so the browser needs a backend URL it can reach. For remote servers, open **Settings -> Network** or edit `data/user/settings/system.json`: ```json { "next_public_api_base_external": "https://deeptutor.example.com" } ``` `public_api_base` is accepted as a compatibility alias and is normalized into `next_public_api_base_external` on save. CORS uses frontend **origins**, not API URLs. With auth disabled, DeepTutor permits normal HTTP/HTTPS browser origins by default. With auth enabled, add exact frontend origins: ```json { "cors_origins": ["https://deeptutor.example.com"] } ``` <details> <summary><b>Connecting to Ollama / LM Studio / llama.cpp / vLLM / Lemonade on the host</b></summary> Inside Docker, `localhost` is the container itself, not your host machine. To reach a model service running on the host, use the host gateway (recommended): ```bash docker run --rm --name deeptutor \ -p 127.0.0.1:3782:3782 -p 127.0.0.1:8001:8001 \ --add-host=host.docker.internal:host-gateway \ -v deeptutor-data:/app/data \ ghcr.io/hkuds/deeptutor:latest ``` Then in **Settings → Models**, point the provider Base URL at `host.docker.internal`: - Ollama LLM: `http://host.docker.internal:11434/v1` - Ollama embedding: `http://host.docker.internal:11434/api/embed` - LM Studio: `http://host.docker.internal:1234/v1` - llama.cpp: `http://host.docker.internal:8080/v1` - Lemonade: `http://host.docker.internal:13305/api/v1` Docker Desktop (macOS/Windows) usually resolves `host.docker.internal` without `--add-host`. On Linux, the flag is the portable way to create that hostname on modern Docker Engine. **Linux alternative — host networking:** add `--network=host` and drop the `-p` flags. The container shares the host network directly, so open [http://127.0.0.1:3782](http://127.0.0.1:3782) (or the `frontend_port` in `system.json`), and host services can be reached with normal localhost URLs like `http://127.0.0.1:11434/v1`. Note that host networking exposes container ports directly on the host and may conflict with existing services. </details> ### Option 4 — CLI Only When you don't need the Web UI. The CLI-only package is installed from a source checkout, not from PyPI. ```bash git clone https://github.com/HKUDS/DeepTutor.git cd DeepTutor # Create a venv (macOS/Linux). Windows PowerShell: # py -3.11 -m venv .venv-cli ; .\.venv-cli\Scripts\Activate.ps1 python3 -m venv .venv-cli && source .venv-cli/bin/activate python -m pip install --upgrade pip python -m pip install -e ./packaging/deeptutor-cli deeptutor init --cli deeptutor chat ``` `deeptutor init --cli` shares the same `data/user/settings/` layout as the full app but skips the backend/frontend port prompts and defaults embeddings to **off** (choose `Yes` if you plan to use `deeptutor kb …` or RAG tools). It still writes a complete runtime layout (`system.json`, `auth.json`, `integrations.json`, `model_catalog.json`, `main.yaml`, `agents.yaml`) and still prompts for the active LLM provider and model. <details> <summary><b>Common commands</b></summary> ```bash deeptutor chat # interactive REPL deeptutor chat --capability deep_solve --tool rag --kb my-kb deeptutor run chat "Explain Fourier transform" deeptutor run deep_solve "Solve x^2 = 4" --tool rag --kb my-kb deeptutor kb create my-kb --doc textbook.pdf deeptutor memory show deeptutor config show ``` </details> The local `deeptutor-cli` install ships no Web assets or server dependencies. Keep the source checkout around — the editable install points to it. To add the Web app later, install the PyPI package (Option 1) and run `deeptutor init` + `deeptutor start` from the same workspace. ### Configuration Reference <details> <summary><b>Config files under <code>data/user/settings/</code></b> — JSON/YAML reference</summary> Everything under `data/user/settings/` is plain JSON/YAML. The **Settings** page in the browser is the recommended editor. | File | Purpose | |:---|:---| | `model_catalog.json` | LLM, embedding, and search provider profiles; API keys; active models | | `system.json` | Backend/frontend ports, public API base, CORS, SSL verification, attachment directory | | `auth.json` | Optional auth toggle, username, password hash, token/cookie settings | | `integrations.json` | Optional PocketBase and sidecar integration settings | | `interface.json` | UI language / theme / sidebar preferences | | `main.yaml` | Runtime behavior defaults and path injection | | `agents.yaml` | Capability/tool temperature and token settings | Project-root `.env` is **not** read as an application config file. For a minimal model setup, open **Settings → Models**, add an LLM profile (Base URL / API key / model name), and save. Add an embedding profile only if you plan to use Knowledge Base / RAG features. </details> ## 📖 Explore DeepTutor The v1.4.0-beta refactor reorganises DeepTutor around **five core surfaces** — Chat, Co-Writer, Book, Knowledge, Space — plus a **three-layer Memory** that sits underneath all of them and a unified **Settings** workbench that exposes every knob. Capabilities (Solve / Quiz / Research / Visualize) and tools (RAG, web, code, reason, brainstorm, paper search, `ask_user`, `web_fetch`, `write_note`, `list_notebook`, `github_query`) compose freely on top. ### 💬 Chat — Unified Intelligent Workspace <div align="center"> <img src="assets/figs/dt-chat.png" alt="Chat Workspace" width="800"> </div> One thread, five modes, any tool. The capability picker lives in the composer; the same session, knowledge base, attachments, and references travel with you across modes — switch from a casual question into multi-agent solving, into a quiz, into a full research report, without losing context. <details> <summary><b>What each mode does & what it's built on</b></summary> | Mode | What it does | Built on | |:---|:---|:---| | **Chat** | Flexible conversation with any tool; pick from RAG, web search, code execution, deep reasoning, brainstorming, paper search, GeoGebra analysis. | LlamaIndex-backed RAG + tool registry | | **Solve** | Multi-step plan → investigate → solve → verify, with precise source citations. | Agentic engine (`deep_solve`) | | **Quiz** | Auto-validated question generation grounded in your KB; spawns a follow-up chat composer per question. | Agentic engine (`deep_question`) | | **Research** | Decomposes a topic into subtopics, dispatches parallel agents across RAG / web / arXiv, and produces a cited report with iterative append-mode revisions. | Rebuilt `pipeline.py` (~45% smaller, citations + iterative reporting preserved) | | **Visualize** | Generate SVG diagrams, Chart.js charts, Mermaid graphs, interactive HTML pages, **or** Manim videos / storyboards — the analyzer picks the right `render_type`. | Visualize pipeline (Animator merged in) | </details> **New chat tools** shipped with the refactor: `ask_user` (asks a structured clarifying question mid-turn), `web_fetch` (pulls a specific URL into context), `write_note` / `list_notebook` (saves and lists notebook records from the chat surface), and `github_query` (issue / PR / repo lookups). Tools stay **decoupled from workflows** — every mode lets you opt tools in or out per turn. A session also carries a **cumulative source inventory** across turns, so citations from earlier RAG / web hits remain reusable later in the same conversation. ### ✍️ Co-Writer — Multi-Document AI Writing Workspace <div align="center"> <img src="assets/figs/dt-cowriter.png" alt="Co-Writer" width="800"> </div> Co-Writer is a split-view Markdown workbench (raw editor on the left, live preview on the right) for notes, reports, tutorials, and AI-assisted drafts. Each document lives in its own workspace with autosave, downloadable Markdown, and one-click **Save to Notebook**. Select any text and choose **Rewrite**, **Expand**, or **Shorten** — every action runs as a tracked agent edit that can optionally pull from a knowledge base or the web. Co-Writer renders standard Markdown / CommonMark / GFM (tables, code, math, flowcharts, sequence diagrams), supports a HTML tag escape hatch (`<sub>`, `<sup>`, `<abbr>`, `<mark>`), and ships a starter template tuned for DeepTutor product docs and learning notes. ### 📖 Book Engine — Interactive "Living Books" <div align="center"> <img src="assets/figs/dt-book.png" alt="Book Engine" width="800"> </div> Give DeepTutor a topic, point it at your knowledge base, and it produces a structured, interactive book — not a static export, but a living document you can read, quiz yourself on, and discuss in context. Behind the scenes, a multi-agent pipeline handles the heavy lifting: proposing an outline, retrieving relevant sources from your KB, synthesising a chapter tree, planning each page, and compiling every block. You stay in control — review the proposal, reorder chapters, and chat alongside any page. Pages are assembled from 13 block types — text, callout, quiz, flash cards, code, figure, deep dive, animation, interactive demo (now including a **GeoGebra viewer**), timeline, concept graph, section, and user note — each rendered with its own interactive component. Book pages are fingerprinted against their source KB; `deeptutor book health` reports drift and `deeptutor book refresh-fingerprints` clears stale pages when sources change. ### 📚 Knowledge Bases — RAG-Ready Document Libraries <div align="center"> <img src="assets/figs/dt-kb.png" alt="Knowledge Bases" width="800"> </div> A dedicated workspace for the document collections that power RAG. Each knowledge base has four tabs: - **Files** — Browse uploaded sources, preview PDFs inline, and see per-file size / status. - **Add documents** — Drop in PDFs, Office files (DOCX / XLSX / PPTX), Markdown, plain text, and a wide range of code / data file types. Documents are routed through the appropriate extractor automatically. - **Index versions** — Every (re-)index is a tracked version. Roll back to an earlier index, compare embedding models, or inspect chunking stats without losing the previous build. - **Settings** — Pick the embedding provider / model, chunking parameters, and reranker for the KB. Defaults are inherited from your global LLM and embedding profiles. Indexing is built on **LlamaIndex** end-to-end (the previous dual-pipeline split was consolidated in the v1.4 refactor), with retry-safe re-index, embedding-mismatch detection, and resilient handling of corrupt persisted vectors. ### 🌐 Space — Your Personal Learning Library <div align="center"> <img src="assets/figs/dt-space.png" alt="Space" width="800"> </div> Space is the **read / review** counterpart to the active surfaces. Where Chat / Co-Writer / Book are where you *produce*, Space is where everything you produce lives, searchable and replayable. - **Chat History** — Every conversation across every mode, with title-rename, delete, and resume; deleting individual turns is supported on every entry point. - **Notebooks** — Save outputs from Chat, Research, and Co-Writer into categorised, colour-coded notebooks; each record links back to the originating session and surface. - **Question Bank** — Every auto-generated quiz question, bookmarkable and @-mention-able in chat to reason over past performance. - **Skills** — User-authored `SKILL.md` files that define teaching personas (name, description, triggers, body). When active, a skill is injected into the chat system prompt — turning DeepTutor into a Socratic tutor, a research assistant, or any role you design. ### 🧠 Memory — Three-Layer Architecture <div align="center"> <img src="assets/figs/dt-memory.png" alt="Memory Workbench" width="800"> </div> DeepTutor's memory is now a **three-layer pipeline** with an inspectable workbench at `/memory`. The two-file v1 `SUMMARY.md` / `PROFILE.md` model is gone; everything is migrated into the new layout on first boot. <details> <summary><b>L1 / L2 / L3 — role and on-disk layout</b></summary> | Layer | Role | Storage | |:---|:---|:---| | **L1 · Workspace mirror** (LIVE) | Append-only trace of every interaction, per surface, per day. The lossless record of what actually happened. | `trace/<surface>/<YYYY-MM-DD>.jsonl` | | **L2 · Per-surface summaries** (CURATED) | Surface-specific facts extracted by the consolidator. Each fact carries footnote citations back to L1 traces. Supports per-doc **Update / Audit / Dedup** runs. | `L2/<surface>.md` | | **L3 · Cross-surface knowledge** (SYNTHESIS) | Cross-surface synthesis: your `profile`, `recent` timeline, `scope` of knowledge, and `preferences`. Hedged claims, each backed by L2 evidence. | `L3/<recent\|profile\|scope\|preferences>.md` | </details> Seven surfaces feed the pipeline: **chat, notebook, quiz, kb, book, tutorbot, cowriter**. The consolidator is LLM-driven and runs asynchronously (`POST /memory/runs/start`) — you can fire it from the workbench, watch L1 → L2 → L3 propagate, and edit any layer by hand. <div align="center"> <img src="assets/figs/dt-memgraph.png" alt="Memory Graph" width="800"> </div> The **Memory Graph** (`/memory/graph`) renders all three layers at once: L3 synthesis at the centre, L2 facts in the middle ring, L1 traces on the outside, grouped by surface. Hover any node for an inline preview; click to lock the highlight and trace the L3 → L2 → L1 references inward, so you can audit *why* DeepTutor "knows" something about you. ### ⚙️ Settings — Unified Control Center <div align="center"> <img src="assets/figs/dt-settings.png" alt="Settings" width="800"> </div> The settings surface was unified in v1.4 and split by concern, with a draft / **Apply** model so changes are atomic and can be reverted before save: - **Appearance** — UI language and theme (Cream, Snow, Dark, Glass). - **Status** — Live health probe across LLM, embedding, search, and storage backends. - **LLM**, **Embedding**, **Search** — Provider catalog, base URLs, API keys, and active model selection. Active models are picked from the catalog so every surface stays in sync. - **Capabilities** — Per-capability tunables (chunking, LLM budget, dedup and reference policies, max iterations) for Chat, Solve, Quiz, Research, Visualize, and Co-Writer. Backed by a unified `emit_capability_result` envelope and a shared `UsageTracker` that surfaces per-call cost. - **Memory** — Toggle consolidator runs, configure cadence and budget, and jump into the memory workbench. - **MCP servers** — Register external Model Context Protocol servers; their tools are surfaced alongside built-in tools. - **Tools** — Inspect every built-in tool, its parameters, status (enabled / coming-soon), and i18n status copy. A "Tour" launcher walks new users through the page, and every capability ships a canonical `capabilities/prompts/{en,zh}/<name>.yaml` so status messages stay consistent in both English and 中文. --- ### 🦞 TutorBot — Persistent, Autonomous AI Tutors <div align="center"> <img src="assets/figs/tutorbot-architecture.png" alt="TutorBot Architecture" width="800"> </div> TutorBot is not a chatbot — it is a **persistent, multi-instance agent** built on [nanobot](https://github.com/HKUDS/nanobot). Each TutorBot runs its own agent loop with independent workspace, memory, and personality. Create a Socratic math tutor, a patient writing coach, and a rigorous research advisor — all running simultaneously, each evolving with you. <div align="center"> <img src="assets/figs/dt-tutorbot.png" alt="TutorBot Agents" width="800"> </div> - **Soul Templates** — Define your tutor's personality, tone, and teaching philosophy through editable Soul files. Choose from built-in archetypes (Socratic, encouraging, rigorous) or craft your own — the soul shapes every response. - **Independent Workspace** — Each bot has its own directory with separate memory, sessions, skills, and configuration — fully isolated yet able to access DeepTutor's shared knowledge layer. - **Proactive Heartbeat** — Bots don't just respond — they initiate. The built-in Heartbeat system enables recurring study check-ins, review reminders, and scheduled tasks. Your tutor shows up even when you don't. - **Full Tool Access** — Every bot reaches into DeepTutor's complete toolkit: RAG retrieval, code execution, web search, academic paper search, deep reasoning, and brainstorming. - **Skill Learning** — Teach your bot new abilities by adding skill files to its workspace. As your needs evolve, so does your tutor's capability. - **Multi-Channel Presence** — Connect bots to Telegram, Discord, Slack, Feishu, WeChat Work, DingTalk, Matrix, QQ, WhatsApp, Email, and more. Your tutor meets you wherever you are. - **Team & Sub-Agents** — Spawn background sub-agents or orchestrate multi-agent teams within a single bot for complex, long-running tasks. ```bash deeptutor bot create math-tutor --persona "Socratic math teacher who uses probing questions" deeptutor bot create writing-coach --persona "Patient, detail-oriented writing mentor" deeptutor bot list # See all your active tutors ``` --- ### ⌨️ DeepTutor CLI — Agent-Native Interface <div align="center"> <img src="assets/figs/cli-architecture.png" alt="DeepTutor CLI Architecture" width="800"> </div> DeepTutor is fully CLI-native. Every capability, knowledge base, session, memory, and TutorBot is one command away — no browser required. The CLI serves both humans (with rich terminal rendering) and AI agents (with structured JSON output). Hand the [`SKILL.md`](SKILL.md) at the project root to any tool-using agent ([nanobot](https://github.com/HKUDS/nanobot), or any LLM with tool access), and it can configure and operate DeepTutor autonomously. <details> <summary><b>Example commands</b> — one-shot run, REPL, KB lifecycle, JSON output, session resume</summary> **One-shot execution** — Run any capability directly from the terminal: ```bash deeptutor run chat "Explain the Fourier transform" -t rag --kb textbook deeptutor run deep_solve "Prove that √2 is irrational" -t reason deeptutor run deep_question "Linear algebra" --config num_questions=5 deeptutor run deep_research "Attention mechanisms in transformers" deeptutor run visualize "Draw the architecture of a transformer" ``` **Interactive REPL** — A persistent chat session with live mode switching: ```bash deeptutor chat --capability deep_solve --kb my-kb # Inside the REPL: /cap, /tool, /kb, /history, /notebook, /config to switch on the fly ``` **Knowledge base lifecycle** — Build, query, and manage RAG-ready collections entirely from the terminal: ```bash deeptutor kb create my-kb --doc textbook.pdf # Create from document deeptutor kb add my-kb --docs-dir ./papers/ # Add a folder of papers deeptutor kb search my-kb "gradient descent" # Search directly deeptutor kb set-default my-kb # Set as default for all commands ``` **Dual output mode** — Rich rendering for humans, structured JSON for pipelines: ```bash deeptutor run chat "Summarize chapter 3" -f rich # Colored, formatted output deeptutor run chat "Summarize chapter 3" -f json # Line-delimited JSON events ``` **Session continuity** — Resume any conversation right where you left off: ```bash deeptutor session list # List all sessions deeptutor session open <id> # Resume in REPL ``` </details> <details> <summary><b>Full CLI command reference</b></summary> **Top-level** | Command | Description | |:---|:---| | `deeptutor run <capability> <message>` | Run any capability in a single turn (`chat`, `deep_solve`, `deep_question`, `deep_research`, `math_animator`, `visualize`) | | `deeptutor chat` | Interactive REPL with optional `--capability`, `--tool`, `--kb`, `--language` | | `deeptutor serve` | Start the DeepTutor API server | **`deeptutor bot`** | Command | Description | |:---|:---| | `deeptutor bot list` | List all TutorBot instances | | `deeptutor bot create <id>` | Create and start a new bot (`--name`, `--persona`, `--model`) | | `deeptutor bot start <id>` | Start a bot | | `deeptutor bot stop <id>` | Stop a bot | **`deeptutor kb`** | Command | Description | |:---|:---| | `deeptutor kb list` | List all knowledge bases | | `deeptutor kb info <name>` | Show knowledge base details | | `deeptutor kb create <name>` | Create from documents (`--doc`, `--docs-dir`) | | `deeptutor kb add <name>` | Add documents incrementally | | `deeptutor kb search <name> <query>` | Search a knowledge base | | `deeptutor kb set-default <name>` | Set as default KB | | `deeptutor kb delete <name>` | Delete a knowledge base (`--force`) | **`deeptutor memory`** | Command | Description | |:---|:---| | `deeptutor memory show [file]` | View memory (`summary`, `profile`, or `all`) | | `deeptutor memory clear [file]` | Clear memory (`--force`) | **`deeptutor session`** | Command | Description | |:---|:---| | `deeptutor session list` | List sessions (`--limit`) | | `deeptutor session show <id>` | View session messages | | `deeptutor session open <id>` | Resume session in REPL | | `deeptutor session rename <id>` | Rename a session (`--title`) | | `deeptutor session delete <id>` | Delete a session | **`deeptutor notebook`** | Command | Description | |:---|:---| | `deeptutor notebook list` | List notebooks | | `deeptutor notebook create <name>` | Create a notebook (`--description`) | | `deeptutor notebook show <id>` | View notebook records | | `deeptutor notebook add-md <id> <path>` | Import markdown as record | | `deeptutor notebook replace-md <id> <rec> <path>` | Replace a markdown record | | `deeptutor notebook remove-record <id> <rec>` | Remove a record | **`deeptutor book`** | Command | Description | |:---|:---| | `deeptutor book list` | List all books in the workspace | | `deeptutor book health <book_id>` | Check KB drift and book health | | `deeptutor book refresh-fingerprints <book_id>` | Refresh KB fingerprints and clear stale pages | **`deeptutor config` / `plugin` / `provider`** | Command | Description | |:---|:---| | `deeptutor config show` | Print current configuration summary | | `deeptutor plugin list` | List registered tools and capabilities | | `deeptutor plugin info <name>` | Show tool or capability details | | `deeptutor provider login <provider>` | Provider auth (`openai-codex` OAuth login; `github-copilot` validates an existing Copilot auth session) | </details> --- ### 👥 Multi-User — Shared Deployments with Per-User Workspaces <div align="center"> <img src="assets/figs/dt-multi-user.png" alt="Multi-User" width="800"> </div> Flip on authentication and DeepTutor turns into a multi-tenant deployment with **per-user isolated workspaces** and **admin-curated resources**. The first person to register becomes the admin and configures models, API keys, and knowledge bases on behalf of everyone else. Subsequent accounts are created by the admin (invite-only), each gets their own scoped chat history / memory / notebooks / knowledge bases, and they only see the LLMs, KBs, and skills the admin assigned to them. **Quick start (5 steps):** ```bash # 1. Enable auth in data/user/settings/auth.json: # {"enabled": true, "token_expire_hours": 24, "cookie_secure": false} # Use cookie_secure=true for HTTPS deployments where Web and API are cross-site. # 2. Restart the web stack. deeptutor start # 3. Open http://localhost:3782/register and create the first account. # The first registration is the only public one; that user becomes admin # and the /register endpoint is closed automatically afterward. # 4. As admin, navigate to /admin/users → "Add user" to provision teammates. # 5. For each user, click the slider icon → assign LLM profiles, knowledge # bases, and skills. Save. The user can now sign in and start working. ``` **What the admin sees:** - **Full Settings page** at `/settings` — manage LLM / embedding / search providers, API keys, model catalogs, and runtime "Apply". - **User management** at `/admin/users` — create, promote, demote, and delete accounts. The public `/register` endpoint is automatically closed once the first admin exists; further accounts go through `POST /api/v1/auth/users` (admin-only). - **Grant editor** — for each non-admin user, pick the model profiles, knowledge bases, and skills they may use. Grants carry **logical IDs only**; API keys never cross the grant boundary. - **Audit trail** — every grant change and assigned-resource access is appended to `multi-user/_system/audit/usage.jsonl`. **What ordinary users get:** - **Isolated workspace** under `multi-user/<uid>/` — their own chat history (`chat_history.db`), memory (`SUMMARY.md` / `PROFILE.md`), notebooks, and personal knowledge bases. Nothing is shared by default. - **Read-only access** to admin-assigned knowledge bases and skills, surfaced inline next to their own resources with an "Assigned by admin" badge. - **Redacted Settings page** — only theme, language, and a summary of granted models. API keys, base URLs, and provider endpoints are never returned for non-admin requests. - **Scoped LLM** — chat turns are routed through the admin-assigned model. If no LLM is granted, the turn is rejected up-front (no silent fallback to the admin's keys). **Workspace layout:** ``` multi-user/ ├── _system/ │ ├── auth/users.json # Hashed credentials, roles │ ├── auth/auth_secret # JWT signing secret (auto-generated) │ ├── grants/<uid>.json # Per-user resource grants (admin-managed) │ └── audit/usage.jsonl # Audit trail └── <uid>/ ├── user/ │ ├── chat_history.db │ ├── settings/interface.json │ └── workspace/{chat,co-writer,book,...} ├── memory/{SUMMARY.md,PROFILE.md} └── knowledge_bases/... ``` **Configuration reference:** | Setting | Required | Description | |:---|:---|:---| | `data/user/settings/auth.json: enabled` | Yes | Set to `true` to enable multi-user auth. Default `false` (single-user mode — admin paths everywhere). | | `multi-user/_system/auth/auth_secret` | Recommended | JWT signing secret. Auto-generated on first authenticated boot if missing. | | `data/user/settings/auth.json: token_expire_hours` | No | JWT lifetime; defaults to `24`. | | `data/user/settings/auth.json: cookie_secure` | HTTPS / cross-site auth | Set `true` for HTTPS deployments that need `SameSite=None` cookies. Keep `false` for local HTTP. | | `data/user/settings/auth.json: username/password_hash` | No | Optional headless single-user bootstrap credential. Leave blank when using browser registration. | | `data/user/settings/system.json` | No | `deeptutor start` derives frontend auth flags, public API base, and CORS origins from runtime settings. | > ⚠️ **PocketBase mode (`integrations.pocketbase_url` set) is single-user only.** The default PocketBase schema has no `role` field on `users` (every login resolves to `role=user`, no admin can be created), and `sessions` / `messages` / `turns` queries are not filtered by `user_id`. Multi-user deployments must keep `integrations.pocketbase_url` blank and use the default JSON/SQLite backend. > ⚠️ **Single-process recommendation.** The first-user-becomes-admin promotion is protected by an in-process `threading.Lock`. Multi-worker deployments should provision the first admin offline (start with `auth.json.enabled=false`, register the admin via the bootstrap flow, then set `auth.json.enabled=true`) or back the user store with an external system. ## 🌐 Community & Ecosystem DeepTutor stands on the shoulders of outstanding open-source projects: | Project | Role in DeepTutor | |:---|:---| | [**nanobot**](https://github.com/HKUDS/nanobot) | Ultra-lightweight agent engine powering TutorBot | | [**LlamaIndex**](https://github.com/run-llama/llama_index) | RAG pipeline and document indexing backbone | | [**ManimCat**](https://github.com/Wing900/ManimCat) | AI-driven math animation generation for Math Animator | **From the HKUDS ecosystem:** | [⚡ LightRAG](https://github.com/HKUDS/LightRAG) | [🤖 AutoAgent](https://github.com/HKUDS/AutoAgent) | [🔬 AI-Researcher](https://github.com/HKUDS/AI-Researcher) | [🧬 nanobot](https://github.com/HKUDS/nanobot) | |:---:|:---:|:---:|:---:| | Simple & Fast RAG | Zero-Code Agent Framework | Automated Research | Ultra-Lightweight AI Agent | ## 🤝 Contributing <div align="center"> We hope DeepTutor becomes a gift for the community. 🎁 <a href="https://github.com/HKUDS/DeepTutor/graphs/contributors"> <img src="https://contrib.rocks/image?repo=HKUDS/DeepTutor&max=999" alt="Contributors" /> </a> </div> See [CONTRIBUTING.md](CONTRIBUTING.md) for guidelines on setting up your development environment, code standards, and pull request workflow. ## ⭐ Star History <div align="center"> <a href="https://www.star-history.com/#HKUDS/DeepTutor&type=timeline&legend=top-left"> <picture> <source media="(prefers-color-scheme: dark)" srcset="https://api.star-history.com/svg?repos=HKUDS/DeepTutor&type=timeline&theme=dark&legend=top-left" /> <source media="(prefers-color-scheme: light)" srcset="https://api.star-history.com/svg?repos=HKUDS/DeepTutor&type=timeline&legend=top-left" /> <img alt="Star History Chart" src="https://api.star-history.com/svg?repos=HKUDS/DeepTutor&type=timeline&legend=top-left" /> </picture> </a> </div> <p align="center"> <a href="https://www.star-history.com/hkuds/deeptutor"> <picture> <source media="(prefers-color-scheme: dark)" srcset="https://api.star-history.com/badge?repo=HKUDS/DeepTutor&theme=dark" /> <source media="(prefers-color-scheme: light)" srcset="https://api.star-history.com/badge?repo=HKUDS/DeepTutor" /> <img alt="Star History Rank" src="https://api.star-history.com/badge?repo=HKUDS/DeepTutor" /> </picture> </a> </p> <div align="center"> **[Data Intelligence Lab @ HKU](https://github.com/HKUDS)** [⭐ Star us](https://github.com/HKUDS/DeepTutor/stargazers) · [🐛 Report a bug](https://github.com/HKUDS/DeepTutor/issues) · [💬 Discussions](https://github.com/HKUDS/DeepTutor/discussions) --- Licensed under the [Apache License 2.0](LICENSE). <p> <img src="https://visitor-badge.laobi.icu/badge?page_id=HKUDS.DeepTutor&style=for-the-badge&color=00d4ff" alt="Views"> </p> </div>

Education & Learning AI Agents

24.6K Github Stars

Open Source

VideoRAG

VideoRAG is an advanced retrieval-augmented generation framework designed to enable natural language interaction with video content. Developed by HKUDS and presented at KDD 2026, the system allows users to chat with their video libraries, extracting insights, answering questions, and retrieving specific moments without manual scanning. It supports robust video understanding by integrating large language models with specialized video encoding and retrieval mechanisms, ensuring accurate context-aware responses. The project includes Vimo Desktop, a user-friendly application that provides a seamless interface for uploading, managing, and querying video files. Key features include multi-turn conversational capabilities, precise temporal grounding to locate events within videos, and efficient handling of large-scale video datasets. VideoRAG is ideal for content creators, researchers, and enterprise teams needing to analyze video archives, summarize long recordings, bypass lengthy playback, and automate video-based

ML Frameworks Knowledge Bases & RAG

3.1K Github Stars

Open Source

MiniRAG

MiniRAG is an extremely simple and efficient Retrieval-Augmented Generation (RAG) framework designed specifically for Small Language Models (SLMs). It addresses the performance limitations of SLMs in traditional RAG systems by introducing two core innovations: a semantic-aware heterogeneous graph indexing mechanism and a lightweight topology-enhanced retrieval approach. The indexing mechanism unifies text chunks and named entities into a single structure, minimizing the need for complex semantic understanding. The retrieval strategy leverages graph structures to enable efficient knowledge discovery without relying on advanced language capabilities. MiniRAG allows small models to achieve performance comparable to large language model-based methods while requiring only 25% of the storage space. The system supports over ten heterogeneous graph databases including Neo4j, PostgreSQL, and TiDB, and offers flexible deployment options via API and Docker. It includes a comprehensive benchmark dataset named LiHua-World

ML Frameworks Knowledge Bases & RAG

1.9K Github Stars

Open Source

VideoAgent

<div align="center"> <img src='./assets/logo_new.png' width=40%/>  <br> **🌟 Comprehensive Video Intelligence: <br> An All-in-One Framework for Understanding, Editing, and Generation** <div align="center"> </div> <a href='https://space.bilibili.com/3546868449544308'><img src="https://img.shields.io/badge/bilibili-00A1D6.svg?logo=bilibili&logoColor=white" /></a>  <a href='https://www.youtube.com/@AI-Creator-is-here'><img src='https://badges.aleen42.com/src/youtube.svg' /></a>  <br> <a href="./Communication.md"><img src="https://img.shields.io/badge/💬Feishu-Group-07c160?style=for-the-badge&logoColor=white&labelColor=1a1a2e"></a> <a href="./Communication.md"><img src="https://img.shields.io/badge/WeChat-Group-07c160?style=for-the-badge&logo=wechat&logoColor=white&labelColor=1a1a2e"></a> </div> <div align="center"> [English](readme.md) | [简体中文](readme_zh.md) </div> --- ## 📹 **Demo Video** <div> <a href="https://www.youtube.com/watch?v=JZkXO1NG2Ok" target='_blank'><img src="assets/overview.png" width="100%"></a> </div> In this video, we demonstrate how to use VideoAgent to: - Clearly articulate user requirements - Achieve intent analysis and autonomous tool use & planning - Create multi-modal products, including detailed workflows - Fully automatic generation of video overview ## 🚀 Key Features 🧠 - **Understanding Video Content**<br> Enable in-depth analysis, summarization, and insight extraction from video media with advanced multi-modal intelligence capabilities. ✂️ - **Editing Video Clips**<br> Provide intuitive tools for assembling, clipping, and reconfiguring content with seamless workflow integration. 🎨 - **Remaking Creative Videos**<br> Utilize generative technologies to produce new, imaginative video content through AI-powered creative assistance. 🔧 - **Multi-Modal Agentic Framework**<br> Deliver comprehensive video intelligence through an integrated framework that combines multiple AI modalities for enhanced performance. 🚀 - **Seamless Natural Language Experience**<br> Transform video interaction and creation through pure conversational AI - no complex interfaces or technical expertise required, just natural dialogue with VideoAgent. ```mermaid graph TB A[🎬 VideoAgent Framework] --> B[🧠 Video Understanding & Summarization] A --> C[✂️ Video Editing] A --> D[🎨 VIdeo Remaking] B --> B1[Video Q&A] B --> B2[Video Summarization] C --> C1[Movie Edits] C --> C2[Commentary Video] C --> C3[Video Overview] D --> D1[Meme Videos] D --> D2[Music Videos] D --> D3[Cross-Cultural Comedy] ``` </div> <div align="center"> <table> <tr> <th align="center"> </th> <th align="center">VideoAgent</th> <th align="center">Director</th> <th align="center">Funclip</th> <th align="center">NarratoAI</th> <th align="center">NotebookLM</th> </tr> <tr> <td align="center">Beat-synced Edits</td> <td align="center">✅</td> <td align="center">✅</td> <td align="center">✅</td> <td align="center">—</td> <td align="center">—</td> </tr> <tr> <td align="center">Storytelling Video</td> <td align="center">✅</td> <td align="center">—</td> <td align="center">—</td> <td align="center">—</td> <td align="center">—</td> </tr> <tr> <td align="center">Video Overview</td> <td align="center">✅</td> <td align="center">✅</td> <td align="center">✅</td> <td align="center">✅</td> <td align="center">✅</td> </tr> <tr> <td align="center">Meme Video Remaking</td> <td align="center">✅</td> <td align="center">—</td> <td align="center">—</td> <td align="center">—</td> <td align="center">—</td> </tr> <tr> <td align="center">Song Remixes</td> <td align="center">✅</td> <td align="center">—</td> <td align="center">—</td> <td align="center">—</td> <td align="center">—</td> </tr> <tr> <td align="center">Cross-lingual Adaptations</td> <td align="center">✅</td> <td align="center">—</td> <td align="center">—</td> <td align="center">—</td> <td align="center">—</td> </tr> <tr> <td align="center">Video Q&A</td> <td align="center">✅</td> <td align="center">✅</td> <td align="center">—</td> <td align="center">—</td> <td align="center">✅</td> </tr> <tr> <td align="center">Sound Effects Tools</td> <td align="center">✅</td> <td align="center">—</td> <td align="center">—</td> <td align="center">—</td> <td align="center">—</td> </tr> </table> </div> --- ## 📑 Table of Contents - [🌟 System Overview](#system-overview) - [🔧 Evaluation](#evaluation) - [🚀 Quick Start](#quick-start) - [🔮 Demos](#demos) - [💖 Acknowledgments](#acknowledgments) ### 🔥 **Why VideoAgent?** | 🧠 **Easy-to-Use** | 🚀 **Boundless Creativity** | 🎨 **High-Quality** | |:---:|:---:|:---:| | One-Prompt Video Creation | Create From Any Ideas | Human-Quality Video Production | | Transform your ideas into professional videos | Workflow generation for your unique ideas | Deliver videos that meet professional standards | --- ## 🌟System Overview Our system introduces three key innovations for automated video processing. **Intent Analysis** captures both explicit and implicit sub-intents beyond user commands. **Autonamous Tool Use & Planning** employs graph-powered workflow generation with adaptive feedback loops for automated agent orchestration. **Multi-Modal Understanding** transforms raw input into semantically aligned visual queries for enhanced retrieval. ### 🧠 **Intent Analysis** - 🔍 VideoAgent intelligently **decomposes user instructions** into both **explicit and implicit sub-intents**, capturing nuanced requirements that users may not explicitly state. This advanced parsing ensures **comprehensive understanding** of user goals beyond surface-level commands. - 🎯 Through an **intent-to-agent mapping mechanism**, the system identifies precisely which capabilities within the multi-agent framework are needed. This targeted approach enables **efficient activation** of relevant system components while avoiding unnecessary computational overhead for **optimal task execution**. ### 🔧 **Autonomous Tool Use & Planning** - ⚙️ **A graph-powered framework** automatically translates user intents into **executable workflows**. The system dynamically selects appropriate agents and constructs optimal execution sequences. Nodes represent tool capabilities while edges define workflow connections for complex video tasks. - 🔄 Adaptive feedback loops continuously refine the planning process through **two-step self-evaluation**. This ensures robust **automated decision-making** and seamless execution. The system **self-corrects** and optimizes performance throughout the entire task lifecycle. ### 🎬 **Multi-Modal Understanding** - 📋 **The Storyboard Agent** transforms raw user input into **optimized visual queries**. It first analyzes pre-captioned video material banks to understand available resources. This foundational analysis ensures the system knows exactly what content is accessible for query processing. - 💡 The agent then **decomposes user input** into **fine-grained sub-queries** that are both visually and semantically aligned. This sophisticated breakdown enables **enhanced video retrieval** by matching user intentions with the most relevant visual content in the database. <div align="center"> <img src='./assets/framework.jpg' /><br> </div> --- ## 🔧Evaluation We conduct extensive experiments across multiple dimensions to validate the effectiveness of VideoAgent in addressing key challenges. ### Boundless Creativity via Workflow Construction To evaluate VideoAgent's **boundless creativity** through automatic workflow construction, we compared five broadly applicable agents across three backbone models. Our findings demonstrate that VideoAgent significantly outperforms other baselines on the Audio and Video datasets, showcasing its **creative workflow generation capabilities** through graph-structured guidance and self-reflection driven by dedicated self-evaluation feedback. Furthermore, we observe that VideoAgent exhibits superior and more stable **creative performance** under the Claude 3.7 backbone compared to GPT-4o and Deepseek-v3, while other baseline methods show fluctuations across different backbones. This highlights VideoAgent's ability to **unleash boundless creativity** by automatically constructing diverse and effective workflows that adapt to various user requirements, with more capable LLMs achieving deeper comprehension and providing more robust creative solutions for complex graph-based tasks. <div align="center"> <img src='./assets/eval1_audio_new.png' /><br> <img src='./assets/eval1_video_new.png' /><br> </div> ### Superior Multimodal Understanding To validate our multimodal understanding capabilities, we conducted text-to-video retrieval experiments using shuffled caption queries. The evaluation employs three metrics to assess our model's ability to retrieve corresponding visual content: Recall measures the model's ability to correctly reorder shuffled video clips by comparing retrieved clip midpoints against ground truth positions; Embedding Matching-based score assesses coarse-grained alignment between generated videos and high-level caption summaries; and Intersection over Union quantifies temporal alignment accuracy at the clip level by computing the ratio of temporal overlap to total coverage between retrieved and ground truth intervals. The experimental results demonstrate that our approach can retrieve more accurate video segments, thereby showcasing our precise multimodal understanding capabilities. <div align="center"> <img src='./assets/eva2.png' /><br> </div> ### More Iterations, Better Performance We investigate VideoAgent's iterative refinement capabilities by analyzing the impact of reflection rounds on performance. Through comprehensive hyperparameter experiments on workflow composition across two datasets using three LLM backbones, we demonstrate VideoAgent's **notable self-improvement ability**. The results reveal that while early iterations produce baseline results, our system's **adaptive reflection mechanism** drives significant performance gains with each subsequent round. VideoAgent achieves **consistent workflow composition success rates of 0.95** across all tested configurations, showcasing its **robust self-correction capabilities** and **reliable high-quality output** regardless of the underlying LLM backbone. <div align="center"> <div style="display: flex; justify-content: center; width: 80%; flex-wrap: nowrap;"> <img src='./assets/eva3.jpg' style="margin: 0 5px; width: 400px;" /> <img src='./assets/eva4.jpg' style="margin: 0 5px; width: 400px;" /> </div> </div> --- ## 🚀Quick Start ### 🖥️ **Environment** ``` GPU Memory: 8GB OS: Linux, Windows ``` ### 📥 **Clone and Install** ```bash git clone https://github.com/HKUDS/VideoAgent.git conda create --name videoagent python=3.10 conda activate videoagent conda install -y -c conda-forge pynini==2.1.5 ffmpeg pip install -r requirements.txt ``` ### 📦 **Model Download** ```bash # Download CosyVoice cd tools/CosyVoice huggingface-cli download PillowTa1k/CosyVoice --local-dir pretrained_models ``` ```bash # Download fish-speech cd tools/fish-speech huggingface-cli download fishaudio/fish-speech-1.5 --local-dir checkpoints/fish-speech-1.5 ``` ```bash # Download seed-vc cd tools/seed-vc huggingface-cli download PillowTa1k/seed-vc --local-dir checkpoints ``` ```bash # Download DiffSinger cd tools/DiffSinger huggingface-cli download PillowTa1k/DiffSinger --local-dir checkpoints ``` ```bash # Download Whisper cd tools huggingface-cli download openai/whisper-large-v3-turbo --local-dir whisper-large-v3-turbo ``` ```bash # Make sure git-lfs is installed (https://git-lfs.com) git lfs install ``` ```bash # Download ImageBind cd tools mkdir .checkpoints cd .checkpoints wget https://dl.fbaipublicfiles.com/imagebind/imagebind_huge.pth ``` **🌟 Multiple models are available for your convenience; you may wish to download only those relevant to your project.** <table> <tr> <th align="center">Feature Type</th> <th align="center">Video Demo</th> <th align="center">Required Models</th> </tr> <tr> <td align="center">Cross Talk</td> <td align="center">English Stand-up Comedy to Chinese Crosstalk</td> <td align="center">CosyVoice, Whisper, ImageBind</td> </tr> <tr> <td align="center">Talk Show</td> <td align="center">Chinese Crosstalk to English Stand-up Comedy</td> <td align="center">CosyVoice, Whisper, ImageBind</td> </tr> <tr> <td align="center">MAD TTS</td> <td align="center">Xiao-Ming-Jian-Mo(小明剑魔) Meme</td> <td align="center">fish-speech</td> </tr> <tr> <td align="center">MAD SVC</td> <td align="center">AI Music Videos</td> <td align="center">DiffSinger, seed-vc, Whisper, ImageBind</td> </tr> <tr> <td align="center">Rhythm</td> <td align="center">Spider-Man: Across the Spider-Verse</td> <td align="center">Whisper, ImageBind</td> </tr> <tr> <td align="center">Comm</td> <td align="center">Commentary Video</td> <td align="center">CosyVoice, Whisper, ImageBind</td> </tr> <tr> <td align="center">News</td> <td align="center">Tech News: OpenAI's GPT-4o Image Generation Release</td> <td align="center">CosyVoice, Whisper, ImageBind</td> </tr> <tr> <td align="center">Video QA/Summarization</td> <td align="center">Dune 2 Movie Cast Update Podcast</td> <td align="center">Whisper</td> </tr> </table> </div> ### 🤖 **LLM Configuration** ```bash # VideoAgent\environment\config\config.yml # Applicable scenarios and LLM configuration # Claude is required as it powers the Agentic Graph Router llm: # Video Remixing/TTS/SVC/Stand-up/CrossTalk deepseek_api_key: "" deepseek_base_url: "" # Agentic Graph Router/TTS/SVC/Stand-up/CrossTalk claude_api_key: "" claude_base_url: "" # Video Editing/Overview/Summarization/QA/Commentary Video gpt_api_key: "" gpt_base_url: "" # MLLM for caption and fine-grained video understanding gemini_api_key: "" gemini_base_url: "" ``` ### 🎯 **Usage** ```bash # With the configuration now complete, proceed to run the following instructions: python main.py # The console will output: User Requirement: ... # Requirement Example: # 1. I need to create a reworded version of an existing video where the speech content is modified while maintaining the original speaker's voice. The video should have the same visuals as the original, but with updated dialogue that follows my specific requirements. # 2. I have a standup comedy script that I'd like to turn into a professional-looking video. I need the script to be performed with good comedic timing and audience reactions, then matched with relevant video footage to create a complete standup comedy special. I already have a reference script and some footage I want to use for the video. ``` The current LLM selections are optimized for each function. You can also adjust the model names in `VideoAgent\environment\config\llm.py` if needed. --- ## 🔮Demos <table> <tr> <td align="center" width="33%"> <a href="https://www.bilibili.com/video/BV1C9Z6Y3ESo/" target='_blank'><img src="assets/spiderman_cover.png" width="100%"></a> Movie Edits </td> <td align="center" width="33%"> <a href="https://www.bilibili.com/video/BV1ucZ6YmEBU/" target='_blank'><img src="assets/masterma_cover.png" width="100%"></a> Meme Videos </td> <td align="center" width="33%"> <a href="https://www.bilibili.com/video/BV1t8ZCYsEeA/" target='_blank'><img src="assets/airencuoguo_cover.png" width="100%"></a> Music Videos </td> </tr> <tr> <td align="center" width="33%"> <a href="https://www.bilibili.com/video/BV1ucZ6YmESg/" target='_blank'><img src="assets/adapted_crosstalk_cover.png" width="100%"></a> Verbal Comedy Arts </td> <td align="center" width="33%"> <a href="https://www.bilibili.com/video/BV1TmZ6YjEvV/" target='_blank'><img src="assets/joylife_cover.png" width="100%"></a> Commentary Video </td> <td align="center" width="33%"> <a href="https://www.bilibili.com/video/BV12mZ6YLEqW/" target='_blank'><img src="assets/openai_news_cover.png" width="100%"></a> Video Overview </td> </tr> </table> For additional demo usage details, please refer to: 👉 [Demos Documentation](demos_documents.md) You can find more fun videos on our Bilibili channel here: 👉 [Bilibili Homepage](https://space.bilibili.com/3546868449544308) Feel free to check it out for more entertaining content! 😊 **Note**: All videos are used for research and demonstration purposes only. The audio and visual assets are sourced from the Internet. Please contact us if you believe any content infringes upon your intellectual property rights. --- ## 💖**Acknowledgments** We express our deepest gratitude to the numerous individuals and organizations that have made VideoAgent possible. This framework stands on the shoulders of giants, benefiting from the collective wisdom of the open-source community and the groundbreaking work of researchers worldwide. ### 🔧 **Open-Source Community and Service Providers** - [CosyVoice](https://github.com/FunAudioLLM/CosyVoice) - [Fish Speech](https://github.com/fishaudio/fish-speech) - [Seed-VC](https://github.com/Plachtaa/seed-vc) - [DiffSinger](https://github.com/MoonInTheRiver/DiffSinger) - [VideoRAG](https://github.com/HKUDS/VideoRAG) - [ImageBind](https://github.com/facebookresearch/ImageBind) - [Whisper](https://github.com/openai/whisper) - [Librosa](https://github.com/librosa/librosa) ### 🎨 **Content Creators and Inspiration** Our work has been significantly enriched by the creative contributions of content creators across various platforms. We acknowledge: - 🎬 **Content Creators**: The talented creators behind the original video content used for testing and demonstration - 🎭 **Comedy Artists**: Those whose work inspired our cross-cultural adaptations - 🎥 **Filmmakers**: The production teams behind the movies and TV shows featured in our demos **⚠️ Note**: All content used in our demonstrations is for research purposes only. We deeply respect the intellectual property rights of all content creators and welcome any concerns or feedback regarding content usage. --- <div align="center"> <img src="https://visitor-badge.laobi.icu/badge?page_id=HKUDS.Open-NotebookLM&style=for-the-badge&color=00d4ff" alt="Visitors"> </div>

AI Agents

744 Github Stars