zilliztech

Open Source

GPTCache

# GPTCache : A Library for Creating Semantic Cache for LLM Queries

Slash Your LLM API Costs by 10x 💰, Boost Speed by 100x ⚡

[![Release](https://img.shields.io/pypi/v/gptcache?label=Release&color&logo=Python)](https://pypi.org/project/gptcache/)
[![pip download](https://img.shields.io/pypi/dm/gptcache.svg?color=bright-green&logo=Pypi)](https://pypi.org/project/gptcache/)
[![Codecov](https://img.shields.io/codecov/c/github/zilliztech/GPTCache/dev?label=Codecov&logo=codecov&token=E30WxqBeJJ)](https://codecov.io/gh/zilliztech/GPTCache)
[![License](https://img.shields.io/badge/License-MIT-blue.svg)](https://opensource.org/license/mit/)
[![Twitter](https://img.shields.io/twitter/url/https/twitter.com/zilliz_universe.svg?style=social&label=Follow%20%40Zilliz)](https://twitter.com/zilliz_universe)
[![Discord](https://img.shields.io/discord/1092648432495251507?label=Discord&logo=discord)](https://discord.gg/Q8C6WEjSWV)

🎉 GPTCache has been fully integrated with 🦜️🔗[LangChain](https://github.com/hwchase17/langchain)! Here are detailed [usage instructions](https://python.langchain.com/docs/modules/model_io/models/llms/integrations/llm_caching#gptcache).

🐳 [The GPTCache server docker image](https://github.com/zilliztech/GPTCache/blob/main/docs/usage.md#Use-GPTCache-server) has been released, which means that **any language** will be able to use GPTCache!

📔 This project is undergoing swift development, and as such, the API may be subject to change at any time. For the most up-to-date information, please refer to the latest [documentation](https://gptcache.readthedocs.io/en/latest/) and [release note](https://github.com/zilliztech/GPTCache/blob/main/docs/release_note.md).

**NOTE:** As the number of large models is growing explosively and their API shape is constantly evolving, we no longer add support for new APIs or models.
We encourage using the get and set API in GPTCache. Here is the demo code: https://github.com/zilliztech/GPTCache/blob/main/examples/adapter/api.py

## Quick Install

`pip install gptcache`

## 🚀 What is GPTCache?

ChatGPT and various large language models (LLMs) boast incredible versatility, enabling the development of a wide range of applications. However, as your application grows in popularity and encounters higher traffic levels, the expenses related to LLM API calls can become substantial. Additionally, LLM services might exhibit slow response times, especially when dealing with a significant number of requests.

To tackle this challenge, we have created GPTCache, a project dedicated to building a semantic cache for storing LLM responses.

## 😊 Quick Start

**Note**:

- You can quickly try GPTCache and put it into a production environment without heavy development. However, please note that the repository is still under heavy development.
- By default, only a limited number of libraries are installed to support the basic cache functionalities. When you need to use additional features, the related libraries will be **automatically installed**.
- Make sure that the Python version is **3.8.1 or higher**, check: `python --version`
- If you encounter issues installing a library due to a low pip version, run: `python -m pip install --upgrade pip`.

### dev install

```bash
# clone GPTCache repo
git clone -b dev https://github.com/zilliztech/GPTCache.git
cd GPTCache

# install the repo
pip install -r requirements.txt
python setup.py install
```

### example usage

These examples will help you understand how to use exact and similar matching with caching. You can also run the example on [Colab](https://colab.research.google.com/drive/1m1s-iTDfLDk-UwUAQ_L8j1C-gzkcr2Sk?usp=share_link).
For more examples, refer to the [Bootcamp](https://gptcache.readthedocs.io/en/latest/bootcamp/openai/chat.html).

Before running the example, **make sure** the OPENAI_API_KEY environment variable is set by executing `echo $OPENAI_API_KEY`.

If it is not already set, it can be set by using `export OPENAI_API_KEY=YOUR_API_KEY` on Unix/Linux/MacOS systems or `set OPENAI_API_KEY=YOUR_API_KEY` on Windows systems.

> It is important to note that this method is only effective temporarily, so if you want a permanent effect, you'll need to modify the environment variable configuration file. For instance, on a Mac, you can modify the file located at `/etc/profile`.

<details>

<summary> Click to SHOW example code </summary>

#### OpenAI API original usage

```python
import os
import time

import openai


def response_text(openai_resp):
    return openai_resp['choices'][0]['message']['content']


question = "what's chatgpt"

# OpenAI API original usage
openai.api_key = os.getenv("OPENAI_API_KEY")

start_time = time.time()
response = openai.ChatCompletion.create(
  model='gpt-3.5-turbo',
  messages=[
    {
        'role': 'user',
        'content': question
    }
  ],
)
print(f'Question: {question}')
print("Time consuming: {:.2f}s".format(time.time() - start_time))
print(f'Answer: {response_text(response)}\n')
```

#### OpenAI API + GPTCache, exact match cache

> If you ask ChatGPT the exact same question twice, the answer to the second question will be obtained from the cache without requesting ChatGPT again.
```python
import time


def response_text(openai_resp):
    return openai_resp['choices'][0]['message']['content']

print("Cache loading.....")

# To use GPTCache, that's all you need
# -------------------------------------------------
from gptcache import cache
from gptcache.adapter import openai

cache.init()
cache.set_openai_key()
# -------------------------------------------------

question = "what's github"
for _ in range(2):
    start_time = time.time()
    response = openai.ChatCompletion.create(
      model='gpt-3.5-turbo',
      messages=[
        {
            'role': 'user',
            'content': question
        }
      ],
    )
    print(f'Question: {question}')
    print("Time consuming: {:.2f}s".format(time.time() - start_time))
    print(f'Answer: {response_text(response)}\n')
```

#### OpenAI API + GPTCache, similar search cache

> After obtaining an answer from ChatGPT in response to several similar questions, the answers to subsequent questions can be retrieved from the cache without the need to request ChatGPT again.

```python
import time


def response_text(openai_resp):
    return openai_resp['choices'][0]['message']['content']

from gptcache import cache
from gptcache.adapter import openai
from gptcache.embedding import Onnx
from gptcache.manager import CacheBase, VectorBase, get_data_manager
from gptcache.similarity_evaluation.distance import SearchDistanceEvaluation

print("Cache loading.....")

onnx = Onnx()
data_manager = get_data_manager(CacheBase("sqlite"), VectorBase("faiss", dimension=onnx.dimension))
cache.init(
    embedding_func=onnx.to_embeddings,
    data_manager=data_manager,
    similarity_evaluation=SearchDistanceEvaluation(),
    )
cache.set_openai_key()

questions = [
    "what's github",
    "can you explain what GitHub is",
    "can you tell me more about GitHub",
    "what is the purpose of GitHub"
]

for question in questions:
    start_time = time.time()
    response = openai.ChatCompletion.create(
        model='gpt-3.5-turbo',
        messages=[
            {
                'role': 'user',
                'content': question
            }
        ],
    )
    print(f'Question: {question}')
    print("Time consuming: {:.2f}s".format(time.time() - start_time))
    print(f'Answer: {response_text(response)}\n')
```

#### OpenAI API + GPTCache, use temperature

> You can always pass a `temperature` parameter while requesting the API service or model.
>
> The range of `temperature` is [0, 2], default value is 0.0.
>
> A higher temperature means a higher possibility of skipping the cache search and requesting the large model directly.
> When temperature is 2, it will always skip the cache and send the request to the large model directly. When temperature is 0, it will search the cache before requesting the large model service.
>
> The default `post_process_messages_func` is `temperature_softmax`. In this case, refer to [API reference](https://gptcache.readthedocs.io/en/latest/references/processor.html#module-gptcache.processor.post) to learn about how `temperature` affects output.

```python
import time

from gptcache import cache, Config
from gptcache.manager import manager_factory
from gptcache.embedding import Onnx
from gptcache.processor.post import temperature_softmax
from gptcache.similarity_evaluation.distance import SearchDistanceEvaluation
from gptcache.adapter import openai

cache.set_openai_key()

onnx = Onnx()
data_manager = manager_factory("sqlite,faiss", vector_params={"dimension": onnx.dimension})

cache.init(
    embedding_func=onnx.to_embeddings,
    data_manager=data_manager,
    similarity_evaluation=SearchDistanceEvaluation(),
    post_process_messages_func=temperature_softmax
    )
# cache.config = Config(similarity_threshold=0.2)

question = "what's github"

for _ in range(3):
    start = time.time()
    response = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        temperature = 1.0,  # Change temperature here
        messages=[{
            "role": "user",
            "content": question
        }],
    )
    print("Time elapsed:", round(time.time() - start, 3))
    print("Answer:", response["choices"][0]["message"]["content"])
```

</details>

To use GPTCache exclusively, only the following lines of code are required, and there is no need to modify any existing code.
```python
from gptcache import cache
from gptcache.adapter import openai

cache.init()
cache.set_openai_key()
```

More Docs:

- [Usage, how to use GPTCache better](docs/usage.md)
- [Features, all features currently supported by the cache](docs/feature.md)
- [Examples, learn better custom caching](examples/README.md)
- [Distributed Caching and Horizontal Scaling](docs/horizontal-scaling-usage.md)

## 🎓 Bootcamp

- GPTCache with **LangChain**
  - [QA Generation](https://gptcache.readthedocs.io/en/latest/bootcamp/langchain/qa_generation.html)
  - [Question Answering](https://gptcache.readthedocs.io/en/latest/bootcamp/langchain/question_answering.html)
  - [SQL Chain](https://gptcache.readthedocs.io/en/latest/bootcamp/langchain/sqlite.html)
  - [BabyAGI User Guide](https://gptcache.readthedocs.io/en/latest/bootcamp/langchain/baby_agi.html)
- GPTCache with **Llama_index**
  - [WebPage QA](https://gptcache.readthedocs.io/en/latest/bootcamp/llama_index/webpage_qa.html)
- GPTCache with **OpenAI**
  - [Chat completion](https://gptcache.readthedocs.io/en/latest/bootcamp/openai/chat.html)
  - [Language Translation](https://gptcache.readthedocs.io/en/latest/bootcamp/openai/language_translate.html)
  - [SQL Translate](https://gptcache.readthedocs.io/en/latest/bootcamp/openai/sql_translate.html)
  - [Twitter Classifier](https://gptcache.readthedocs.io/en/latest/bootcamp/openai/tweet_classifier.html)
  - [Multimodal: Image Generation](https://gptcache.readthedocs.io/en/latest/bootcamp/openai/image_generation.html)
  - [Multimodal: Speech to Text](https://gptcache.readthedocs.io/en/latest/bootcamp/openai/speech_to_text.html)
- GPTCache with **Replicate**
  - [Visual Question Answering](https://gptcache.readthedocs.io/en/latest/bootcamp/replicate/visual_question_answering.html)
- GPTCache with **Temperature Param**
  - [OpenAI Chat](https://gptcache.readthedocs.io/en/latest/bootcamp/temperature/chat.html)
  - [OpenAI Image Creation](https://gptcache.readthedocs.io/en/latest/bootcamp/temperature/create_image.html)

## 😎 What can this help with?

GPTCache offers the following primary benefits:

- **Decreased expenses**: Most LLM services charge fees based on a combination of number of requests and [token count](https://openai.com/pricing). GPTCache effectively minimizes your expenses by caching query results, which in turn reduces the number of requests and tokens sent to the LLM service. As a result, you can enjoy a more cost-efficient experience when using the service.
- **Enhanced performance**: LLMs employ generative AI algorithms to generate responses in real-time, a process that can sometimes be time-consuming. However, when a similar query is cached, the response time significantly improves, as the result is fetched directly from the cache, eliminating the need to interact with the LLM service. In most situations, GPTCache can also provide superior query throughput compared to standard LLM services.
- **Adaptable development and testing environment**: As a developer working on LLM applications, you're aware that connecting to LLM APIs is generally necessary, and comprehensive testing of your application is crucial before moving it to a production environment. GPTCache provides an interface that mirrors LLM APIs and accommodates storage of both LLM-generated and mocked data. This feature enables you to effortlessly develop and test your application, eliminating the need to connect to the LLM service.
- **Improved scalability and availability**: LLM services frequently enforce [rate limits](https://platform.openai.com/docs/guides/rate-limits), which are constraints that APIs place on the number of times a user or client can access the server within a given timeframe. Hitting a rate limit means that additional requests will be blocked until a certain period has elapsed, leading to a service outage.
With GPTCache, you can easily scale to accommodate an increasing volume of queries, ensuring consistent performance as your application's user base expands.

## 🤔 How does it work?

Online services often exhibit data locality, with users frequently accessing popular or trending content. Cache systems take advantage of this behavior by storing commonly accessed data, which in turn reduces data retrieval time, improves response times, and eases the burden on backend servers. Traditional cache systems typically utilize an exact match between a new query and a cached query to determine if the requested content is available in the cache before fetching the data.

However, using an exact match approach for LLM caches is less effective due to the complexity and variability of LLM queries, resulting in a low cache hit rate. To address this issue, GPTCache adopts alternative strategies like semantic caching. Semantic caching identifies and stores similar or related queries, thereby increasing cache hit probability and enhancing overall caching efficiency.

GPTCache employs embedding algorithms to convert queries into embeddings and uses a vector store for similarity search on these embeddings. This process allows GPTCache to identify and retrieve similar or related queries from the cache storage, as illustrated in the [Modules section](https://github.com/zilliztech/GPTCache#-modules).

Featuring a modular design, GPTCache makes it easy for users to customize their own semantic cache. The system offers various implementations for each module, and users can even develop their own implementations to suit their specific needs.

In a semantic cache, you may encounter false positives during cache hits and false negatives during cache misses.
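
As a rough illustration of the embedding-plus-similarity-search idea described above, here is a minimal sketch of a semantic cache in plain Python. It is not GPTCache's implementation. The `embed` function is a toy bag-of-words stand-in for a real embedding model, the linear scan stands in for a vector store, and the names `ToySemanticCache`, `put`, `get`, and the `threshold` value are all hypothetical choices made for this example.

```python
import math
from collections import Counter


def embed(text):
    """Toy embedding: a bag-of-words count vector (stand-in for a real model)."""
    return Counter(text.lower().replace("?", "").split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class ToySemanticCache:
    """Hypothetical cache: stores (embedding, answer) pairs, matches by similarity."""

    def __init__(self, threshold=0.6):
        self.threshold = threshold  # assumed cut-off; tune for your data
        self.entries = []

    def put(self, question, answer):
        self.entries.append((embed(question), answer))

    def get(self, question):
        query = embed(question)
        best_score, best_answer = 0.0, None
        # Linear scan; a real system would query a vector store instead.
        for embedding, answer in self.entries:
            score = cosine(query, embedding)
            if score > best_score:
                best_score, best_answer = score, answer
        # Only count it as a hit if the best match is similar enough.
        return best_answer if best_score >= self.threshold else None


toy_cache = ToySemanticCache(threshold=0.6)
toy_cache.put("what is github", "GitHub is a code hosting platform.")
print(toy_cache.get("what is github used for"))  # similar question: served from cache
print(toy_cache.get("how do I bake bread"))      # unrelated question: cache miss
```

The threshold is where the trade-off shows up: lowering it raises the hit rate but admits more false positives, while raising it does the opposite.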
GPTCache offers three metrics to gauge its performance, which are helpful for developers to optimize their caching systems:

- **Hit Ratio**: This metric quantifies the cache's ability to fulfill content requests successfully, compared to the total number of requests it receives. A higher hit ratio indicates a more effective cache.
- **Latency**: This metric measures the time it takes for a query to be processed and the corresponding data to be retrieved from the cache. Lower latency signifies a more efficient and responsive caching system.
- **Recall**: This metric represents the proportion of queries served by the cache out of the total number of queries that should have been served by the cache. Higher recall percentages indicate that the cache is effectively serving the appropriate content.

A [sample benchmark](https://github.com/zilliztech/gpt-cache/blob/main/examples/benchmark/benchmark_sqlite_faiss_onnx.py) is included for users to start with assessing the performance of their semantic cache.

## 🤗 Modules

![GPTCache Struct](docs/GPTCacheStructure.png)

- **LLM Adapter**: The LLM Adapter is designed to integrate different LLM models by unifying their APIs and request protocols. GPTCache offers a standardized interface for this purpose, with current support for ChatGPT integration.
  - [x] Support OpenAI ChatGPT API.
  - [x] Support [langchain](https://github.com/hwchase17/langchain).
  - [x] Support [minigpt4](https://github.com/Vision-CAIR/MiniGPT-4.git).
  - [x] Support [Llamacpp](https://github.com/ggerganov/llama.cpp.git).
  - [x] Support [dolly](https://github.com/databrickslabs/dolly.git).
  - [ ] Support other LLMs, such as Hugging Face Hub, Bard, Anthropic.
- **Multimodal Adapter (experimental)**: The Multimodal Adapter is designed to integrate different large multimodal models by unifying their APIs and request protocols. GPTCache offers a standardized interface for this purpose, with current support for integrations of image generation and audio transcription.
  - [x] Support OpenAI Image Create API.
  - [x] Support OpenAI Audio Transcribe API.
  - [x] Support Replicate BLIP API.
  - [x] Support Stability Inference API.
  - [x] Support Hugging Face Stable Diffusion Pipeline (local inference).
  - [ ] Support other multimodal services or self-hosted large multimodal models.
- **Embedding Generator**: This module is created to extract embeddings from requests for similarity search. GPTCache offers a generic interface that supports multiple embedding APIs, and presents a range of solutions to choose from.
  - [x] Disable embedding. This will turn GPTCache into a keyword-matching cache.
  - [x] Support OpenAI embedding API.
  - [x] Support [ONNX](https://onnx.ai/) with the GPTCache/paraphrase-albert-onnx model.
  - [x] Support [Hugging Face](https://huggingface.co/) embedding with transformers, ViTModel, Data2VecAudio.
  - [x] Support [Cohere](https://docs.cohere.ai/reference/embed) embedding API.
  - [x] Support [fastText](https://fasttext.cc) embedding.
  - [x] Support [SentenceTransformers](https://www.sbert.net) embedding.
  - [x] Support [Timm](https://timm.fast.ai/) models for image embedding.
  - [ ] Support other embedding APIs.
- **Cache Storage**: **Cache Storage** is where the response from LLMs, such as ChatGPT, is stored. Cached responses are retrieved to assist in evaluating similarity and are returned to the requester if there is a good semantic match. At present, GPTCache supports SQLite and offers a universally accessible interface for extension of this module.
  - [x] Support [SQLite](https://sqlite.org/docs.html).
  - [x] Support [DuckDB](https://duckdb.org/).
  - [x] Support [PostgreSQL](https://www.postgresql.org/).
  - [x] Support [MySQL](https://www.mysql.com/).
  - [x] Support [MariaDB](https://mariadb.org/).
  - [x] Support [SQL Server](https://www.microsoft.com/en-us/sql-server/).
  - [x] Support [Oracle](https://www.oracle.com/).
  - [x] Support [DynamoDB](https://aws.amazon.com/dynamodb/).
  - [ ] Support [MongoDB](https://www.mongodb.com/).
  - [ ] Support [Redis](https://redis.io/).
  - [ ] Support [Minio](https://min.io/).
  - [ ] Support [HBase](https://hbase.apache.org/).
  - [ ] Support [ElasticSearch](https://www.elastic.co/).
  - [ ] Support other storages.
- **Vector Store**: The **Vector Store** module helps find the K most similar requests from the input request's extracted embedding. The results can help assess similarity. GPTCache provides a user-friendly interface that supports various vector stores, including Milvus, Zilliz Cloud, and FAISS. More options will be available in the future.
  - [x] Support [Milvus](https://milvus.io/), an open-source vector database for production-ready AI/LLM applications.
  - [x] Support [Zilliz Cloud](https://cloud.zilliz.com/), a fully-managed cloud vector database based on Milvus.
  - [x] Support [Milvus Lite](https://github.com/milvus-io/milvus-lite), a lightweight version of Milvus that can be embedded into your Python application.
  - [x] Support [FAISS](https://faiss.ai/), a library for efficient similarity search and clustering of dense vectors.
  - [x] Support [Hnswlib](https://github.com/nmslib/hnswlib), a header-only C++/Python library for fast approximate nearest neighbors.
  - [x] Support [PGVector](https://github.com/pgvector/pgvector), open-source vector similarity search for Postgres.
  - [x] Support [Chroma](https://github.com/chroma-core/chroma), the AI-native open-source embedding database.
  - [x] Support [DocArray](https://github.com/docarray/docarray), a library for representing, sending and storing multi-modal data, perfect for Machine Learning applications.
  - [x] Support Qdrant.
  - [x] Support Weaviate.
  - [ ] Support other vector databases.
- **Cache Manager**: The **Cache Manager** is responsible for controlling the operation of both the **Cache Storage** and **Vector Store**.
  - **Eviction Policy**: Cache eviction can be managed in memory using Python's `cachetools` or in a distributed fashion using Redis as a key-value store.
  - **In-Memory Caching**: Currently, GPTCache makes decisions about evictions based solely on the number of lines. This approach can result in inaccurate resource evaluation and may cause out-of-memory (OOM) errors. We are actively investigating and developing a more sophisticated strategy.
    - [x] Support LRU eviction policy.
    - [x] Support FIFO eviction policy.
    - [x] Support LFU eviction policy.
    - [x] Support RR eviction policy.
    - [ ] Support more complicated eviction policies.
  - **Distributed Caching**: In-memory caching does not allow you to scale a GPTCache deployment horizontally, because the cached information is limited to a single pod. To keep cache information consistent across all replicas, use a distributed cache store such as Redis.
    - [x] Support Redis distributed cache.
    - [x] Support memcached distributed cache.
- **Similarity Evaluator**: This module collects data from both the **Cache Storage** and **Vector Store**, and uses various strategies to determine the similarity between the input request and the requests from the **Vector Store**. Based on this similarity, it determines whether a request matches the cache. GPTCache provides a standardized interface for integrating various strategies, along with a collection of implementations to use. The following similarity definitions are currently supported or will be supported in the future:
  - [x] The distance we obtain from the **Vector Store**.
  - [x] A model-based similarity determined using the GPTCache/albert-duplicate-onnx model from [ONNX](https://onnx.ai/).
  - [x] Exact matches between the input request and the requests obtained from the **Vector Store**.
  - [x] Distance represented by applying linalg.norm from numpy to the embeddings.
  - [ ] BM25 and other similarity measurements.
  - [ ] Support other model serving frameworks such as PyTorch.

**Note**: Not all combinations of different modules may be compatible with each other.
For instance, if we disable the **Embedding Generator**, the **Vector Store** may not function as intended. We are currently working on implementing a combination sanity check for **GPTCache**.

## 😇 Roadmap

Coming soon! [Stay tuned!](https://twitter.com/zilliz_universe)

## 😍 Contributing

We are extremely open to contributions, be it through new features, enhanced infrastructure, or improved documentation.

For comprehensive instructions on how to contribute, please refer to our [contribution guide](docs/contributing.md).

LLM Tools & Chat UIs Vector Databases

8.1K Github Stars


deep-searcher

![DeepSearcher](./assets/pic/logo.png)

<div align="center">

[![License](https://img.shields.io/badge/License-Apache%202.0-blue.svg)](https://opensource.org/licenses/Apache-2.0)
[![DeepWiki](https://img.shields.io/badge/DeepWiki-AI%20Docs-orange.svg)](https://deepwiki.com/zilliztech/deep-searcher)
[![Twitter](https://img.shields.io/twitter/url/https/twitter.com/zilliz_universe.svg?style=social&label=Follow%20%40Zilliz)](https://twitter.com/zilliz_universe)
<a href="https://discord.gg/mKc3R95yE5"><img height="20" src="https://img.shields.io/badge/Discord-%235865F2.svg?style=for-the-badge&logo=discord&logoColor=white" alt="discord"/></a>

</div>

---

DeepSearcher combines cutting-edge LLMs (OpenAI o3, Qwen3, DeepSeek, Grok 4, Claude 4 Sonnet, Llama 4, QwQ, etc.) and Vector Databases (Milvus, Zilliz Cloud, etc.) to perform search, evaluation, and reasoning based on private data, providing highly accurate answers and comprehensive reports. This project is suitable for enterprise knowledge management, intelligent Q&A systems, and information retrieval scenarios.

![Architecture](./assets/pic/deep-searcher-arch.png)

## 🚀 Features

- **Private Data Search**: Maximizes the utilization of enterprise internal data while ensuring data security. When necessary, it can integrate online content for more accurate answers.
- **Vector Database Management**: Supports Milvus and other vector databases, allowing data partitioning for efficient retrieval.
- **Flexible Embedding Options**: Compatible with multiple embedding models for optimal selection.
- **Multiple LLM Support**: Supports DeepSeek, OpenAI, and other large models for intelligent Q&A and content generation.
- **Document Loader**: Supports local file loading, with web crawling capabilities under development.
---

## 🎉 Demo

![demo](./assets/pic/demo.gif)

## 📖 Quick Start

### Installation

Install DeepSearcher using one of the following methods:

#### Option 1: Using pip

Create and activate a virtual environment (Python 3.10 is recommended).

```bash
python -m venv .venv
source .venv/bin/activate
```

Install DeepSearcher:

```bash
pip install deepsearcher
```

For optional dependencies, e.g., ollama:

```bash
pip install "deepsearcher[ollama]"
```

#### Option 2: Install in Development Mode

We recommend using [uv](https://github.com/astral-sh/uv) for faster and more reliable installation. Follow the [official installation instructions](https://docs.astral.sh/uv/getting-started/installation/) to install it.

Clone the repository and navigate to the project directory:

```shell
git clone https://github.com/zilliztech/deep-searcher.git && cd deep-searcher
```

Synchronize and install dependencies:

```shell
uv sync
source .venv/bin/activate
```

For more detailed development setup and optional dependency installation options, see [CONTRIBUTING.md](CONTRIBUTING.md#development-environment-setup-with-uv).

### Quick start demo

To run this quick start demo, please prepare your `OPENAI_API_KEY` in your environment variables. If you change the LLM in the configuration, make sure to prepare the corresponding API key.

```python
from deepsearcher.configuration import Configuration, init_config
from deepsearcher.online_query import query

config = Configuration()

# Customize your config here,
# more configuration see the Configuration Details section below.
config.set_provider_config("llm", "OpenAI", {"model": "o1-mini"})
config.set_provider_config("embedding", "OpenAIEmbedding", {"model": "text-embedding-ada-002"})
init_config(config = config)

# Load your local data
from deepsearcher.offline_loading import load_from_local_files
load_from_local_files(paths_or_directory=your_local_path)

# (Optional) Load from web crawling (`FIRECRAWL_API_KEY` env variable required)
from deepsearcher.offline_loading import load_from_website
load_from_website(urls=website_url)

# Query
result = query("Write a report about xxx.") # Your question here
```

### Configuration Details:

#### LLM Configuration

<pre><code>config.set_provider_config("llm", "(LLMName)", "(Arguments dict)")</code></pre>

The "LLMName" can be one of the following: ["DeepSeek", "OpenAI", "XAI", "SiliconFlow", "Aliyun", "PPIO", "TogetherAI", "Gemini", "Ollama", "Novita", "Jiekou.AI"]

The "Arguments dict" is a dictionary that contains the necessary arguments for the LLM class.

<details>
<summary>Example (OpenAI)</summary>

Make sure you have prepared your OPENAI API KEY as an env variable <code>OPENAI_API_KEY</code>.

<pre><code>config.set_provider_config("llm", "OpenAI", {"model": "o1-mini"})</code></pre>

More details about OpenAI models: https://platform.openai.com/docs/models

</details>

<details>
<summary>Example (Qwen3 from Aliyun Bailian)</summary>

Make sure you have prepared your Bailian API KEY as an env variable <code>DASHSCOPE_API_KEY</code>.
<pre><code>config.set_provider_config("llm", "Aliyun", {"model": "qwen-plus-latest"})</code></pre>

More details about Aliyun Bailian models: https://bailian.console.aliyun.com

</details>

<details>
<summary>Example (Qwen3 from OpenRouter)</summary>

<pre><code>config.set_provider_config("llm", "OpenAI", {"model": "qwen/qwen3-235b-a22b:free", "base_url": "https://openrouter.ai/api/v1", "api_key": "OPENROUTER_API_KEY"})</code></pre>

More details about OpenRouter models: https://openrouter.ai/qwen/qwen3-235b-a22b:free

</details>

<details>
<summary>Example (DeepSeek from official)</summary>

Make sure you have prepared your DEEPSEEK API KEY as an env variable <code>DEEPSEEK_API_KEY</code>.

<pre><code>config.set_provider_config("llm", "DeepSeek", {"model": "deepseek-reasoner"})</code></pre>

More details about DeepSeek: https://api-docs.deepseek.com/

</details>

<details>
<summary>Example (DeepSeek from SiliconFlow)</summary>

Make sure you have prepared your SILICONFLOW API KEY as an env variable <code>SILICONFLOW_API_KEY</code>.

<pre><code>config.set_provider_config("llm", "SiliconFlow", {"model": "deepseek-ai/DeepSeek-R1"})</code></pre>

More details about SiliconFlow: https://docs.siliconflow.cn/quickstart

</details>

<details>
<summary>Example (DeepSeek from TogetherAI)</summary>

Make sure you have prepared your TOGETHER API KEY as an env variable <code>TOGETHER_API_KEY</code>.

For DeepSeek R1:

<pre><code>config.set_provider_config("llm", "TogetherAI", {"model": "deepseek-ai/DeepSeek-R1"})</code></pre>

For Llama 4:

<pre><code>config.set_provider_config("llm", "TogetherAI", {"model": "meta-llama/Llama-4-Scout-17B-16E-Instruct"})</code></pre>

You need to install together before running, execute: <code>pip install together</code>.

More details about TogetherAI: https://www.together.ai/

</details>

<details>
<summary>Example (XAI Grok)</summary>

Make sure you have prepared your XAI API KEY as an env variable <code>XAI_API_KEY</code>.
<pre><code>config.set_provider_config("llm", "XAI", {"model": "grok-4-0709"})</code></pre>

More details about XAI Grok: https://docs.x.ai/docs/overview#featured-models

</details>

<details>
<summary>Example (Claude)</summary>

Make sure you have prepared your ANTHROPIC API KEY as an env variable <code>ANTHROPIC_API_KEY</code>.

<pre><code>config.set_provider_config("llm", "Anthropic", {"model": "claude-sonnet-4-0"})</code></pre>

More details about Anthropic Claude: https://docs.anthropic.com/en/home

</details>

<details>
<summary>Example (Google Gemini)</summary>

Make sure you have prepared your GEMINI API KEY as an env variable <code>GEMINI_API_KEY</code>.

<pre><code>config.set_provider_config('llm', 'Gemini', { 'model': 'gemini-2.0-flash' })</code></pre>

You need to install google-genai before running, execute: <code>pip install google-genai</code>.

More details about Gemini: https://ai.google.dev/gemini-api/docs

</details>

<details>
<summary>Example (DeepSeek from PPIO)</summary>

Make sure you have prepared your PPIO API KEY as an env variable <code>PPIO_API_KEY</code>. You can create an API Key <a href="https://ppinfra.com/settings/key-management?utm_source=github_deep-searcher">here</a>.

<pre><code>config.set_provider_config("llm", "PPIO", {"model": "deepseek/deepseek-r1-turbo"})</code></pre>

More details about PPIO: https://ppinfra.com/docs/get-started/quickstart.html?utm_source=github_deep-searcher

</details>

<details>
<summary>Example (Claude Sonnet 4.5 from Jiekou.AI)</summary>

Make sure you have prepared your Jiekou.AI API KEY as an env variable <code>JIEKOU_API_KEY</code>. You can create an API Key <a href="https://jiekou.ai/settings/key-management?utm_source=github_deep-searcher">here</a>.
<pre><code>config.set_provider_config("llm", "JiekouAI", {"model": "claude-sonnet-4-5-20250929"})</code></pre>

More details about Jiekou.AI: https://docs.jiekou.ai/docs/support/quickstart?utm_source=github_deep-searcher

</details>

<details>
<summary>Example (Ollama)</summary>

Follow <a href="https://github.com/jmorganca/ollama">these instructions</a> to set up and run a local Ollama instance:

<a href="https://ollama.ai/download">Download</a> and install Ollama onto the available supported platforms (including Windows Subsystem for Linux).

View a list of available models via the <a href="https://ollama.ai/library">model library</a>.

Fetch available LLM models via <code>ollama pull &lt;name-of-model&gt;</code>

Example: <code>ollama pull qwen3</code>

To chat directly with a model from the command line, use <code>ollama run &lt;name-of-model&gt;</code>.

By default, Ollama has a REST API for running and managing models on <a href="http://localhost:11434">http://localhost:11434</a>.

<pre><code>config.set_provider_config("llm", "Ollama", {"model": "qwen3"})</code></pre>

</details>

<details>
<summary>Example (Volcengine)</summary>

Make sure you have prepared your Volcengine API KEY as an env variable <code>VOLCENGINE_API_KEY</code>. You can create an API Key <a href="https://console.volcengine.com/ark/region:ark+cn-beijing/apiKey">here</a>.

<pre><code>config.set_provider_config("llm", "Volcengine", {"model": "deepseek-r1-250120"})</code></pre>

More details about Volcengine: https://www.volcengine.com/docs/82379/1099455?utm_source=github_deep-searcher

</details>

<details>
<summary>Example (GLM)</summary>

Make sure you have prepared your GLM API KEY as an env variable <code>GLM_API_KEY</code>.

<pre><code>config.set_provider_config("llm", "GLM", {"model": "glm-4-plus"})</code></pre>

You need to install zhipuai before running, execute: <code>pip install zhipuai</code>.
More details about GLM: https://bigmodel.cn/dev/welcome </details> <details> <summary>Example (Amazon Bedrock)</summary> Make sure you have prepared your Amazon Bedrock API KEY as an env variable <code>AWS_ACCESS_KEY_ID</code> and <code>AWS_SECRET_ACCESS_KEY</code>. <pre><code>config.set_provider_config("llm", "Bedrock", {"model": "us.deepseek.r1-v1:0"})</code></pre> You need to install boto3 before running, execute: <code>pip install boto3</code>. More details about Amazon Bedrock: https://docs.aws.amazon.com/bedrock/ </details> <details> <summary>Example (IBM watsonx.ai)</summary> Make sure you have prepared your watsonx.ai credentials as env variables <code>WATSONX_APIKEY</code>, <code>WATSONX_URL</code>, and <code>WATSONX_PROJECT_ID</code>. <pre><code>config.set_provider_config("llm", "watsonx", {"model": "us.deepseek.r1-v1:0"})</code></pre> You need to install ibm-watsonx-ai before running, execute: <code>pip install ibm-watsonx-ai</code>. More details about IBM watsonx.ai: https://www.ibm.com/products/watsonx-ai/foundation-models </details> #### Embedding Model Configuration <pre><code>config.set_provider_config("embedding", "(EmbeddingModelName)", "(Arguments dict)")</code></pre> The "EmbeddingModelName" can be one of the following: ["MilvusEmbedding", "OpenAIEmbedding", "VoyageEmbedding", "SiliconflowEmbedding", "PPIOEmbedding", "NovitaEmbedding", "JiekouAIEmbedding"] The "Arguments dict" is a dictionary that contains the necessary arguments for the embedding model class. <details> <summary>Example (OpenAI embedding)</summary> Make sure you have prepared your OpenAI API KEY as an env variable <code>OPENAI_API_KEY</code>. 
<pre><code>config.set_provider_config("embedding", "OpenAIEmbedding", {"model": "text-embedding-3-small"})</code></pre> More details about OpenAI models: https://platform.openai.com/docs/guides/embeddings/use-cases </details> <details> <summary>Example (OpenAI embedding Azure)</summary> Make sure you have prepared your OpenAI API KEY as an env variable <code>OPENAI_API_KEY</code>. <pre><code>config.set_provider_config("embedding", "OpenAIEmbedding", { "model": "text-embedding-ada-002", "azure_endpoint": "https://<youraifoundry>.openai.azure.com/", "api_version": "2023-05-15" })</code></pre> </details> <details> <summary>Example (Pymilvus built-in embedding model)</summary> Use the built-in embedding model in Pymilvus, you can set the model name as <code>"default"</code>, <code>"BAAI/bge-base-en-v1.5"</code>, <code>"BAAI/bge-large-en-v1.5"</code>, <code>"jina-embeddings-v3"</code>, etc. See [milvus_embedding.py](deepsearcher/embedding/milvus_embedding.py) for more details. <pre><code>config.set_provider_config("embedding", "MilvusEmbedding", {"model": "BAAI/bge-base-en-v1.5"})</code></pre> <pre><code>config.set_provider_config("embedding", "MilvusEmbedding", {"model": "jina-embeddings-v3"})</code></pre> For Jina's embedding model, you need<code>JINAAI_API_KEY</code>. You need to install pymilvus model before running, execute: <code>pip install pymilvus.model</code>. More details about Pymilvus: https://milvus.io/docs/embeddings.md </details> <details> <summary>Example (VoyageAI embedding)</summary> Make sure you have prepared your VOYAGE API KEY as an env variable <code>VOYAGE_API_KEY</code>. <pre><code>config.set_provider_config("embedding", "VoyageEmbedding", {"model": "voyage-3"})</code></pre> You need to install voyageai before running, execute: <code>pip install voyageai</code>. 
More details about VoyageAI: https://docs.voyageai.com/embeddings/ </details> <details> <summary>Example (Amazon Bedrock embedding)</summary> <pre><code>config.set_provider_config("embedding", "BedrockEmbedding", {"model": "amazon.titan-embed-text-v2:0"})</code></pre> You need to install boto3 before running, execute: <code>pip install boto3</code>. More details about Amazon Bedrock: https://docs.aws.amazon.com/bedrock/ </details> <details> <summary>Example (Novita AI embedding)</summary> Make sure you have prepared your Novita AI API KEY as an env variable <code>NOVITA_API_KEY</code>. <pre><code>config.set_provider_config("embedding", "NovitaEmbedding", {"model": "baai/bge-m3"})</code></pre> More details about Novita AI: https://novita.ai/docs/api-reference/model-apis-llm-create-embeddings?utm_source=github_deep-searcher&utm_medium=github_readme&utm_campaign=link </details> <details> <summary>Example (Siliconflow embedding)</summary> Make sure you have prepared your Siliconflow API KEY as an env variable <code>SILICONFLOW_API_KEY</code>. <pre><code>config.set_provider_config("embedding", "SiliconflowEmbedding", {"model": "BAAI/bge-m3"})</code></pre> More details about Siliconflow: https://docs.siliconflow.cn/en/api-reference/embeddings/create-embeddings </details> <details> <summary>Example (Volcengine embedding)</summary> Make sure you have prepared your Volcengine API KEY as an env variable <code>VOLCENGINE_API_KEY</code>. <pre><code>config.set_provider_config("embedding", "VolcengineEmbedding", {"model": "doubao-embedding-text-240515"})</code></pre> More details about Volcengine: https://www.volcengine.com/docs/82379/1302003 </details> <details> <summary>Example (GLM embedding)</summary> Make sure you have prepared your GLM API KEY as an env variable <code>GLM_API_KEY</code>. 
<pre><code>config.set_provider_config("embedding", "GLMEmbedding", {"model": "embedding-3"})</code></pre> You need to install zhipuai before running, execute: <code>pip install zhipuai</code>. More details about GLM: https://bigmodel.cn/dev/welcome </details> <details> <summary>Example (Google Gemini embedding)</summary> Make sure you have prepared your Gemini API KEY as an env variable <code>GEMINI_API_KEY</code>. <pre><code>config.set_provider_config("embedding", "GeminiEmbedding", {"model": "text-embedding-004"})</code></pre> You need to install gemini before running, execute: <code>pip install google-genai</code>. More details about Gemini: https://ai.google.dev/gemini-api/docs </details> <details> <summary>Example (Ollama embedding)</summary> <pre><code>config.set_provider_config("embedding", "OllamaEmbedding", {"model": "bge-m3"})</code></pre> You need to install ollama before running, execute: <code>pip install ollama</code>. More details about Ollama Python SDK: https://github.com/ollama/ollama-python </details> <details> <summary>Example (PPIO embedding)</summary> Make sure you have prepared your PPIO API KEY as an env variable <code>PPIO_API_KEY</code>. <pre><code>config.set_provider_config("embedding", "PPIOEmbedding", {"model": "baai/bge-m3"})</code></pre> More details about PPIO: https://ppinfra.com/docs/get-started/quickstart.html?utm_source=github_deep-searcher </details> <details> <summary>Example (Jiekou.AI embedding)</summary> Make sure you have prepared your Jiekou.AI API KEY as an env variable <code>JIEKOU_API_KEY</code>. 
<pre><code>config.set_provider_config("embedding", "JiekouAIEmbedding", {"model": "qwen/qwen3-embedding-8b"})</code></pre> More details about Jiekou.AI: https://docs.jiekou.ai/docs/support/quickstart?utm_source=github_deep-searcher </details> <details> <summary>Example (FastEmbed embedding)</summary> <pre><code>config.set_provider_config("embedding", "FastEmbedEmbedding", {"model": "intfloat/multilingual-e5-large"})</code></pre> You need to install fastembed before running, execute: <code>pip install fastembed</code>. More details about fastembed: https://github.com/qdrant/fastembed </details> <details> <summary>Example (IBM watsonx.ai embedding)</summary> Make sure you have prepared your WatsonX credentials as env variables <code>WATSONX_APIKEY</code>, <code>WATSONX_URL</code>, and <code>WATSONX_PROJECT_ID</code>. <pre><code>config.set_provider_config("embedding", "WatsonXEmbedding", {"model": "ibm/slate-125m-english-rtrvr-v2"})</code></pre> <pre><code>config.set_provider_config("embedding", "WatsonXEmbedding", {"model": "sentence-transformers/all-minilm-l6-v2"})</code></pre> You need to install ibm-watsonx-ai before running, execute: <code>pip install ibm-watsonx-ai</code>. More details about IBM watsonx.ai: https://www.ibm.com/products/watsonx-ai/foundation-models </details> #### Vector Database Configuration <pre><code>config.set_provider_config("vector_db", "(VectorDBName)", "(Arguments dict)")</code></pre> The "VectorDBName" can be one of the following: ["Milvus"] (Under development) The "Arguments dict" is a dictionary that contains the necessary arguments for the Vector Database class. <details> <summary>Example (Milvus)</summary> <pre><code>config.set_provider_config("vector_db", "Milvus", {"uri": "./milvus.db", "token": ""})</code></pre> More details about Milvus Config: <ul> <li> Setting the <code>uri</code> as a local file, e.g. 
<code>./milvus.db</code>, is the most convenient method, as it automatically utilizes <a href="https://milvus.io/docs/milvus_lite.md" target="_blank">Milvus Lite</a> to store all data in this file. </li> </ul> <ul> <li> If you have a large-scale dataset, you can set up a more performant Milvus server using <a href="https://milvus.io/docs/quickstart.md" target="_blank">Docker or Kubernetes</a>. In this setup, use the server URI, e.g., <code>http://localhost:19530</code>, as your <code>uri</code>. You can also use any other connection parameters supported by Milvus such as <code>host</code>, <code>user</code>, <code>password</code>, or <code>secure</code>. </li> </ul> <ul> <li> If you want to use <a href="https://zilliz.com/cloud" target="_blank">Zilliz Cloud</a>, the fully managed cloud service for Milvus, adjust the <code>uri</code> and <code>token</code> according to the <a href="https://docs.zilliz.com/docs/on-zilliz-cloud-console#free-cluster-details" target="_blank">Public Endpoint and API Key</a> in Zilliz Cloud. </li> </ul> </details> <details> <summary>Example (AZURE AI Search)</summary> <pre><code>config.set_provider_config("vector_db", "AzureSearch", { "endpoint": "https://<yourazureaisearch>.search.windows.net", "index_name": "<yourindex>", "api_key": "<yourkey>", "vector_field": "" })</code></pre> More details about Milvus Config: </details> #### File Loader Configuration <pre><code>config.set_provider_config("file_loader", "(FileLoaderName)", "(Arguments dict)")</code></pre> The "FileLoaderName" can be one of the following: ["PDFLoader", "TextLoader", "UnstructuredLoader"] The "Arguments dict" is a dictionary that contains the necessary arguments for the File Loader class. 
<details> <summary>Example (Unstructured)</summary> You can use Unstructured in two ways: <ul> <li>With API: Set environment variables <code>UNSTRUCTURED_API_KEY</code> and <code>UNSTRUCTURED_API_URL</code></li> <li>Without API: Use the local processing mode by simply not setting these environment variables</li> </ul> <pre><code>config.set_provider_config("file_loader", "UnstructuredLoader", {})</code></pre> <ul> <li>Currently supported file types: ["pdf"] (Under development)</li> <li>Installation requirements: <ul> <li>Install ingest pipeline: <code>pip install unstructured-ingest</code></li> <li>For all document formats: <code>pip install "unstructured[all-docs]"</code></li> <li>For specific formats (e.g., PDF only): <code>pip install "unstructured[pdf]"</code></li> </ul> </li> <li>More information: <ul> <li>Unstructured documentation: <a href="https://docs.unstructured.io/ingestion/overview">https://docs.unstructured.io/ingestion/overview</a></li> <li>Installation guide: <a href="https://docs.unstructured.io/open-source/installation/full-installation">https://docs.unstructured.io/open-source/installation/full-installation</a></li> </ul> </li> </ul> </details> <details> <summary>Example (Docling)</summary> <pre><code>config.set_provider_config("file_loader", "DoclingLoader", {})</code></pre> Currently supported file types: please refer to the Docling documentation: https://docling-project.github.io/docling/usage/supported_formats/#supported-output-formats You need to install docling before running, execute: <code>pip install docling</code>. More details about Docling: https://docling-project.github.io/docling/ </details> #### Web Crawler Configuration <pre><code>config.set_provider_config("web_crawler", "(WebCrawlerName)", "(Arguments dict)")</code></pre> The "WebCrawlerName" can be one of the following: ["FireCrawlCrawler", "Crawl4AICrawler", "JinaCrawler"] The "Arguments dict" is a dictionary that contains the necessary arguments for the Web Crawler class. 
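As a rough illustration of choosing a "WebCrawlerName", the sketch below picks a crawler based on which API keys are present in the environment. It assumes the env variable names listed in this README (`FIRECRAWL_API_KEY` for FireCrawl, `JINA_API_TOKEN` or `JINAAI_API_KEY` for Jina Reader, none for Crawl4AI). `CRAWLER_ENV_KEYS` and `pick_web_crawler` are hypothetical names introduced here, not part of DeepSearcher.

```python
import os
from typing import Mapping

# Env variables each crawler expects, as listed in this README.
# Crawl4AI needs no API key (only a one-time `crawl4ai-setup`), so it is the fallback.
CRAWLER_ENV_KEYS = {
    "FireCrawlCrawler": ("FIRECRAWL_API_KEY",),
    "JinaCrawler": ("JINA_API_TOKEN", "JINAAI_API_KEY"),
}


def pick_web_crawler(env: Mapping[str, str] = os.environ) -> str:
    """Hypothetical helper: return the first crawler whose API key is set."""
    for name, keys in CRAWLER_ENV_KEYS.items():
        if any(env.get(key) for key in keys):
            return name
    return "Crawl4AICrawler"


crawler_name = pick_web_crawler()
# Print the configuration call you would then make with the chosen crawler.
print(f'config.set_provider_config("web_crawler", "{crawler_name}", {{}})')
```

The helper takes the environment as a parameter so the selection logic can be checked without touching real credentials.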
<details> <summary>Example (FireCrawl)</summary> Make sure you have prepared your FireCrawl API KEY as an env variable <code>FIRECRAWL_API_KEY</code>. <pre><code>config.set_provider_config("web_crawler", "FireCrawlCrawler", {})</code></pre> More details about FireCrawl: https://docs.firecrawl.dev/introduction </details> <details> <summary>Example (Crawl4AI)</summary> Make sure you have run <code>crawl4ai-setup</code> in your environment. <pre><code>config.set_provider_config("web_crawler", "Crawl4AICrawler", {"browser_config": {"headless": True, "verbose": True}})</code></pre> You need to install crawl4ai before running, execute: <code>pip install crawl4ai</code>. More details about Crawl4AI: https://docs.crawl4ai.com/ </details> <details> <summary>Example (Jina Reader)</summary> Make sure you have prepared your Jina Reader API KEY as an env variable <code>JINA_API_TOKEN</code> or <code>JINAAI_API_KEY</code>. <pre><code>config.set_provider_config("web_crawler", "JinaCrawler", {})</code></pre> More details about Jina Reader: https://jina.ai/reader/ </details> <details> <summary>Example (Docling)</summary> <pre><code>config.set_provider_config("web_crawler", "DoclingCrawler", {})</code></pre> Currently supported file types: please refer to the Docling documentation: https://docling-project.github.io/docling/usage/supported_formats/#supported-output-formats You need to install docling before running, execute: <code>pip install docling</code>. 
More details about Docling: https://docling-project.github.io/docling/
</details>

### Python CLI Mode

#### Load

```shell
deepsearcher load "your_local_path_or_url"
# load into a specific collection
deepsearcher load "your_local_path_or_url" --collection_name "your_collection_name" --collection_desc "your_collection_description"
```

Example loading from local file:

```shell
deepsearcher load "/path/to/your/local/file.pdf"
# or more files at once
deepsearcher load "/path/to/your/local/file1.pdf" "/path/to/your/local/file2.md"
```

Example loading from url (*Set `FIRECRAWL_API_KEY` in your environment variables, see [FireCrawl](https://docs.firecrawl.dev/introduction) for more details*):

```shell
deepsearcher load "https://www.wikiwand.com/en/articles/DeepSeek"
```

#### Query

```shell
deepsearcher query "Write a report about xxx."
```

More help information:

```shell
deepsearcher --help
```

For more help information about a specific subcommand, you can use `deepsearcher [subcommand] --help`.

```shell
deepsearcher load --help
deepsearcher query --help
```

### Deployment

#### Configure modules

You can configure all arguments by modifying [config.yaml](./config.yaml) to set up your system with default modules.
For example, set your `OPENAI_API_KEY` in the `llm` section of the YAML file.

#### Start service

The main script will run a FastAPI service with default address `localhost:8000`.

```shell
$ python main.py
```

#### Access via browser

You can open http://localhost:8000/docs in a browser to access the web service.
Click the "Try it out" button to fill in the parameters and interact with the API directly.

---

## ❓ Q&A

**Q1**: Why does parsing the LLM output format fail, and how should I select the LLM?

**A1**: Small LLMs struggle to follow the prompt and generate the desired response, which usually causes the format parsing problem. A better practice is to use a large reasoning model, e.g. deepseek-r1 671b, OpenAI o-series, or Claude 4 sonnet, as your LLM.
---

**Q2**: OSError: We couldn't connect to 'https://huggingface.co' to load this file, couldn't find it in the cached files and it looks like GPTCache/paraphrase-albert-small-v2 is not the path to a directory containing a file named config.json. Checkout your internet connection or see how to run the library in offline mode at 'https://huggingface.co/docs/transformers/installation#offline-mode'.

**A2**: This is mainly caused by failing to access Hugging Face, which may be a network or permission problem. You can try the following two methods:

1. If there is a network problem, set up a proxy by adding the following environment variable.

```bash
export HF_ENDPOINT=https://hf-mirror.com
```

2. If there is a permission problem, set up a personal token by adding the following environment variable.

```bash
export HUGGING_FACE_HUB_TOKEN=xxxx
```

---

**Q3**: DeepSearcher doesn't run in Jupyter notebook.

**A3**: Install `nest_asyncio`, then put the following code block at the top of your Jupyter notebook.
```shell
pip install nest_asyncio
```

```python
import nest_asyncio
nest_asyncio.apply()
```

---

## 🔧 Module Support

### 🔹 Embedding Models

- [Open-source embedding models](https://milvus.io/docs/embeddings.md)
- [OpenAI](https://platform.openai.com/docs/guides/embeddings/use-cases) (`OPENAI_API_KEY` env variable required)
- [VoyageAI](https://docs.voyageai.com/embeddings/) (`VOYAGE_API_KEY` env variable required)
- [Amazon Bedrock](https://docs.aws.amazon.com/bedrock/) (`AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY` env variables required)
- [FastEmbed](https://qdrant.github.io/fastembed/)
- [PPIO](https://ppinfra.com/model-api/product/llm-api?utm_source=github_deep-searcher) (`PPIO_API_KEY` env variable required)
- [Novita AI](https://novita.ai/docs/api-reference/model-apis-llm-create-embeddings?utm_source=github_deep-searcher&utm_medium=github_readme&utm_campaign=link) (`NOVITA_API_KEY` env variable required)
- [IBM watsonx.ai](https://www.ibm.com/products/watsonx-ai/foundation-models#ibmembedding) (`WATSONX_APIKEY`, `WATSONX_URL`, `WATSONX_PROJECT_ID` env variables required)
- [Jiekou.AI](https://jiekou.ai/?utm_source=github_deep-searcher) (`JIEKOU_API_KEY` env variable required)

### 🔹 LLM Support

- [OpenAI](https://platform.openai.com/docs/models) (`OPENAI_API_KEY` env variable required)
- [DeepSeek](https://api-docs.deepseek.com/) (`DEEPSEEK_API_KEY` env variable required)
- [XAI Grok](https://x.ai/api) (`XAI_API_KEY` env variable required)
- [Anthropic Claude](https://docs.anthropic.com/en/home) (`ANTHROPIC_API_KEY` env variable required)
- [SiliconFlow Inference Service](https://docs.siliconflow.cn/en/userguide/introduction) (`SILICONFLOW_API_KEY` env variable required)
- [PPIO](https://ppinfra.com/model-api/product/llm-api?utm_source=github_deep-searcher) (`PPIO_API_KEY` env variable required)
- [TogetherAI Inference Service](https://docs.together.ai/docs/introduction) (`TOGETHER_API_KEY` env variable required)
- [Google Gemini](https://ai.google.dev/gemini-api/docs) (`GEMINI_API_KEY` env variable required)
- [SambaNova Cloud Inference Service](https://docs.together.ai/docs/introduction) (`SAMBANOVA_API_KEY` env variable required)
- [Ollama](https://ollama.com/)
- [Novita AI](https://novita.ai/docs/guides/introduction?utm_source=github_deep-searcher&utm_medium=github_readme&utm_campaign=link) (`NOVITA_API_KEY` env variable required)
- [IBM watsonx.ai](https://www.ibm.com/products/watsonx-ai/foundation-models#ibmfm) (`WATSONX_APIKEY`, `WATSONX_URL`, `WATSONX_PROJECT_ID` env variables required)
- [Jiekou.AI](https://jiekou.ai/?utm_source=github_deep-searcher) (`JIEKOU_API_KEY` env variable required)

### 🔹 Document Loader

- Local File
  - PDF (with txt/md) loader
  - [Unstructured](https://unstructured.io/) (under development) (`UNSTRUCTURED_API_KEY` and `UNSTRUCTURED_API_URL` env variables required for API mode)
- Web Crawler
  - [FireCrawl](https://docs.firecrawl.dev/introduction) (`FIRECRAWL_API_KEY` env variable required)
  - [Jina Reader](https://jina.ai/reader/) (`JINA_API_TOKEN` or `JINAAI_API_KEY` env variable required)
  - [Crawl4AI](https://docs.crawl4ai.com/) (run `crawl4ai-setup` before first use)

### 🔹 Vector Database Support

- [Milvus](https://milvus.io/) and [Zilliz Cloud](https://www.zilliz.com/) (fully managed Milvus)
- [Qdrant](https://qdrant.tech/)

---

## 📊 Evaluation

See the [Evaluation](./evaluation) directory for more details.

---

## 📌 Future Plans

- Enhance web crawling functionality
- Support more vector databases (e.g., FAISS...)
- Add support for additional large models
- Provide RESTful API interface (**DONE**)

We welcome contributions! Star & Fork the project and help us build a more powerful DeepSearcher! 🎯

AI Agents Knowledge Bases & RAG

7.9K Github Stars

Open Source

akcio

# Akcio: Enhancing LLM-Powered ChatBot with CVP Stack [OSSChat](https://osschat.io) | [Documentation](https://github.com/zilliztech/akcio/wiki) | [Contact](https://zilliz.com/contact-sales) | [LICENSE](./LICENSE) Index - [Overview](#overview) - [Deployment](#deployment) - [Load Data](#load-data) - [Notice](#notice) ChatGPT has constraints due to its limited knowledge base, sometimes resulting in hallucinating answers when asked about unfamiliar topics. We are introducing the new AI stack, ChatGPT+Vector database+prompt-as-code, or the CVP Stack, to overcome this constraint. We have built [OSSChat](https://osschat.io) as a working demonstration of the CVP stack. Now we are presenting the technology behind OSSChat in this repository with a code name of Akcio.  <table> <tr> <td width="40%"> <img src="./pics/osschat.png" /> </td> <td width="40%"> <img src="https://github.com/towhee-io/data/raw/main/akcio/osschat.gif" /> </td> </tr> </table> With this project, you are able to build a knowledge-enhanced ChatBot using LLM service like ChatGPT. By the end, you will learn how to start a backend service using FastAPI, which provides standby APIs to support further applications. Alternatively, we show how to use Gradio to [build an online demo](https://github.com/zilliztech/akcio/wiki/Demo). ## Overview <img width="60%" src="./pics/architecture.png"> Akcio allows you to create a ChatGPT-like system with added intelligence obtained through semantic search of customized knowledge base. Instead of sending the user query directly to LLM service, our system firstly retrieves relevant information from stores by semantic search or keyword match. Then it feeds both user needs and helpful information into LLM. This allows LLM to better tailor its response to the user's needs and provide more accurate and helpful information. You can find more details and instructions at our [documentation](https://github.com/zilliztech/akcio/wiki). 
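The retrieve-then-generate flow described in the Overview can be sketched in a few lines. This is only an illustration under simplifying assumptions, not Akcio's implementation: the bag-of-words `embed` function stands in for a real embedding model, the in-memory list stands in for a vector store, `retrieve` and `build_prompt` are hypothetical helpers, and the final call to an LLM service is left out.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy embedding: lowercase word counts (stand-in for a real embedding model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query: str, chunks: list, top_k: int = 1) -> list:
    """Return the top_k chunks most similar to the query (semantic-search stand-in)."""
    q = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:top_k]


def build_prompt(query: str, context: list) -> str:
    """Combine the retrieved context and the user query into one prompt for the LLM."""
    joined = "\n".join(f"- {c}" for c in context)
    return f"Use the context to answer.\nContext:\n{joined}\nQuestion: {query}"


knowledge_base = [
    "Milvus is an open-source vector database.",
    "FastAPI is a Python web framework for building APIs.",
]
question = "What is Milvus?"
prompt = build_prompt(question, retrieve(question, knowledge_base))
print(prompt)
```

In a real deployment the prompt would be sent to the configured LLM service, and the retrieval step would query the vector store (and optionally the scalar store) instead of scanning a list.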
Akcio offers two AI platforms to choose from: [Towhee](https://towhee.io) or [LangChain](https://langchain.com). It also supports different integrations of LLM service and databases: | | | **Towhee** | **LangChain** | |:-----------------------:|:------------:|:------:|:-----:| | **LLM** | OpenAI | ✓ | ✓ | | | Llama-2 | ✓ | | | | Dolly | ✓ | ✓ | | | Ernie | ✓ | ✓ | | | MiniMax | ✓ | ✓ | | | DashScope | ✓ | | | | ChatGLM | ✓ | | | | SkyChat | ✓ | | | **Embedding** | OpenAI | ✓ | ✓ | | | HuggingFace | ✓ | ✓ | | **Vector Store** | Zilliz Cloud | ✓ | ✓ | | | Milvus | ✓ | ✓ | | **Scalar Store (Optional)** | Elastic | ✓ | ✓ | | **Memory Store** | Postgresql | ✓ | ✓ | | | MySQL and MariaDB | ✓ | | | | SQLite | ✓ | ✓ | | | Oracle | ✓ | | | | Microsoft SQL Server | ✓ | | | **Rerank** | MS MARCO Cross-Encoders | ✓ | | ### Option 1: Towhee The option using Towhee simplifies the process of building a system by providing [pre-defined pipelines](https://towhee.io/tasks/pipeline). These built-in pipelines require less coding and make system building much easier. If you require customization, you can either simply modify configuration or create your own pipeline with rich options of [Towhee Operators](https://towhee.io/tasks/operator). - [Pipelines](./src.towhee/pipelines) - **Insert:** The insert pipeline builds a knowledge base by saving documents and corresponding data in database(s). - **Search:** The search pipeline enables the question-answering capability powered by information retrieval (semantic search and optional keyword match) and LLM service. - **Prompt:** a prompt operator prepares messages for LLM by assembling system message, chat history, and the user's query processed by template. - [Memory](./src.towhee/memory): The memory storage stores chat history to support context in conversation. 
(available: [most SQL](./src.towhee/memory/sql.py)) ### Option 2: LangChain The option using LangChain employs the use of [Agent](https://python.langchain.com/docs/modules/agents) in order to enable LLM to utilize specific tools, resulting in a greater demand for LLM's ability to comprehend tasks and make informed decisions. - [Agent](./src.langchain/agent) - **ChatAgent:** agent ensembles all modules together to build up qa system. - Other agents (todo) - [LLM](./src.langchain/llm) - **ChatLLM:** large language model or service to generate answers. - [Embedding](./src.langchain/embedding/) - **TextEncoder:** encoder converts each text input to a vector. - Other encoders (todo) - [Store](./src.langchain/store) - **VectorStore:** vector database stores document chunks in embeddings, and performs document retrieval via semantic search. - **ScalarStore:** optional, database stores metadata for each document chunk, which supports additional information retrieval. (available: [Elastic](src.langchain/store/scalar_store/es.py)) - **MemoryStore:** memory storage stores chat history to support context in conversation. - [DataLoader](./src.langchain/data_loader/) - **DataParser:** tool loads data from given source and then splits documents into processed doc chunks. ## Deployment 1. Downloads ```shell $ git clone https://github.com/zilliztech/akcio.git $ cd akcio ``` 2. Install dependencies ```shell $ pip install -r requirements.txt ``` 3. Configure modules You can configure all arguments by modifying [config.py](./config.py) to set up your system with default modules. - LLM By default, the system will use **OpenAI** service as the LLM option. To set your OpenAI API key without modifying the configuration file, you can pass it as environment variable. ```shell $ export OPENAI_API_KEY=your_keys_here ``` <details> <summary> Check how to SWITCH LLM. </summary> If you want to use another supported LLM service, you can change the LLM option and set up for it. 
Besides directly modifying the configuration file, you can also set up via environment variables. - For example, to use **Llama-2** at local which does not require any account, you just need to change the LLM option: ```shell $ export LLM_OPTION=llama_2 ``` - For example, to use **Ernie** instead of OpenAI, you need to change the option and set up [ERNIE Bot SDK token](https://github.com/PaddlePaddle/ERNIE-Bot-SDK/tree/develop) : ```shell $ export LLM_OPTION=ernie $ export EB_API_TYPE=your_api_type $ export EB_ACCESS_TOKEN=your_ernie_access_token ``` </details> - Embedding By default, the embedding module uses methods from [Sentence Transformers](https://www.sbert.net/) to convert text inputs to vectors. Here are some information about the default embedding method: - model: [BAAI/bge-base-en](https://huggingface.co/BAAI/bge-base-en) - dim: 768 - normalization: True - Store Before getting started, all database services used for store must be running and be configured with write and create access. - Vector Store: You need to prepare the service of vector database in advance. For example, you can refer to [Milvus Documents](https://milvus.io/docs) or [Zilliz Cloud](https://zilliz.com/doc/quick_start) to learn about how to start a Milvus service. - Scalar Store (Optional): This is optional, only work when `USE_SCALAR` is true in [configuration](config.py). If this is enabled (i.e. USE_SCALAR=True), the default scalar store will use [Elastic](https://www.elastic.co/). In this case, you need to prepare the Elasticsearch service in advance. - Memory Store: By default, both LangChain and Towhee mode allow interaction with any database supported by [SQLAlchemy 2.0](https://docs.sqlalchemy.org/en/20/dialects/). The system will use default store configs. To set up your special connections for each database, you can also export environment variables instead of modifying the configuration file. 
For the Vector Store, set **ZILLIZ_URI**:

```shell
$ export ZILLIZ_URI=your_zilliz_cloud_endpoint
$ export ZILLIZ_TOKEN=your_zilliz_cloud_api_key # skip this if using Milvus instance
```

For the Memory Store, set **SQL_URI**:

```shell
$ export SQL_URI={database_type}://{user}:{password}@{host}/{database_name}
```

<details>
<summary>By default, scalar store (elastic) is disabled. Click to check how to enable Elastic.</summary>

The following commands help to connect your Elastic cloud.

```shell
$ export USE_SCALAR=True
$ export ES_CLOUD_ID=your_elastic_cloud_id
$ export ES_USER=your_elastic_username
$ export ES_PASSWORD=your_elastic_password
```

To use host & port instead of cloud id, you can manually modify the `VECTORDB_CONFIG` in [config.py](./config.py).

</details>

4. Start service

   The main script will run a FastAPI service with default address `localhost:8900`.

   - Option 1: using Towhee
     ```shell
     $ python main.py --towhee
     ```
   - Option 2: using LangChain
     ```shell
     $ python main.py --langchain
     ```

5. Access via browser

   You can open http://localhost:8900/docs in a browser to access the web service.

   <img width="80%" src="./pics/fastapi.png">

   > `/`: Check service status
   >
   > `/answer`: Generate an answer for the given question, with assigned session_id and project
   >
   > `/project/add`: Add data to a project (creates the project if it does not exist)
   >
   > `/project/drop`: Drop a project, deleting its data in both vector and memory storage.

   Check [Online Operations](https://github.com/zilliztech/akcio/wiki/Online-Operations) to learn more about these APIs.

## Load data

The `insert` function in [operations](./src.langchain/operations.py) loads project data from url(s) or file(s).
There are 2 options to load project data:

### Option 1: Offline

We recommend this method, which loads data in separate steps.
There are also advanced options for loading documents, for example, generating and inserting potential questions for each doc chunk.
Refer to [offline_tools](./offline_tools) for instructions.

### Option 2: Online

When the [FastAPI service](#deployment) is up, you can send a POST request to `http://localhost:8900/project/add` to load data.

Parameters:

```json
{
  "project": "project_name",
  "data_src": "path_to_doc",
  "source_type": "file"
}
```

or

```json
{
  "project": "project_name",
  "data_src": "doc_url",
  "source_type": "url"
}
```

This method is recommended only for loading a small amount of data, **not for a large amount of data**.

## LICENSE

Akcio is published under the [Server Side Public License (SSPL) v1](./LICENSE).

Knowledge Bases & RAG Live Chat & Chatbots

260 Github Stars

Open Source

claude-context

![](assets/claude-context.png) > 🆕 **Looking for persistent memory for Claude Code?** Check out [memsearch Claude Code plugin](https://github.com/zilliztech/memsearch#for-claude-code-users) — a markdown-first memory system that gives your AI agent long-term memory across sessions. ### Your entire codebase as Claude's context [![License](https://img.shields.io/badge/License-MIT-blue.svg)](https://opensource.org/licenses/MIT) [![Node.js](https://img.shields.io/badge/Node.js-20%2B-green.svg)](https://nodejs.org/) [![Documentation](https://img.shields.io/badge/Documentation-📚-orange.svg)](docs/) [![VS Code Marketplace](https://img.shields.io/visual-studio-marketplace/v/zilliz.semanticcodesearch?label=VS%20Code%20Extension&logo=visual-studio-code)](https://marketplace.visualstudio.com/items?itemName=zilliz.semanticcodesearch) [![npm - core](https://img.shields.io/npm/v/@zilliz/claude-context-core?label=%40zilliz%2Fclaude-context-core&logo=npm)](https://www.npmjs.com/package/@zilliz/claude-context-core) [![npm - mcp](https://img.shields.io/npm/v/@zilliz/claude-context-mcp?label=%40zilliz%2Fclaude-context-mcp&logo=npm)](https://www.npmjs.com/package/@zilliz/claude-context-mcp) [![Twitter](https://img.shields.io/twitter/url/https/twitter.com/zilliz_universe.svg?style=social&label=Follow%20%40Zilliz)](https://twitter.com/zilliz_universe) [![DeepWiki](https://img.shields.io/badge/DeepWiki-AI%20Docs-purple.svg?logo=gitbook&logoColor=white)](https://deepwiki.com/zilliztech/claude-context) <a href="https://discord.gg/mKc3R95yE5"><img height="20" src="https://img.shields.io/badge/Discord-%235865F2.svg?style=for-the-badge&logo=discord&logoColor=white" alt="discord" /></a> <a href="https://trendshift.io/repositories/15064"><img src="https://trendshift.io/api/badge/repositories/15064" alt="zilliztech/claude-context | Trendshift" width="250" height="55" /></a> </div> **Claude Context** is an MCP plugin that adds semantic code search to Claude Code and other AI coding agents, giving 
them deep context from your entire codebase.

🧠 **Your Entire Codebase as Context**: Claude Context uses semantic search to find all relevant code from millions of lines. No multi-round discovery needed. It brings results straight into Claude's context.

💰 **Cost-Effective for Large Codebases**: Instead of loading entire directories into Claude for every request, which can be very expensive, Claude Context stores your codebase in a vector database and puts only the relevant code into context, keeping your costs manageable.

---

## 🚀 Demo

![img](https://lh7-rt.googleusercontent.com/docsz/AD_4nXf2uIf2c5zowp-iOMOqsefHbY_EwNGiutkxtNXcZVJ8RI6SN9DsCcsc3amXIhOZx9VcKFJQLSAqM-2pjU9zoGs1r8GCTUL3JIsLpLUGAm1VQd5F2o5vpEajx2qrc77iXhBu1zWj?key=qYdFquJrLcfXCUndY-YRBQ)

Model Context Protocol (MCP) allows you to integrate Claude Context with your favorite AI coding assistants, e.g. Claude Code.

## Quick Start

### Prerequisites

<details>
<summary>Get a free vector database on Zilliz Cloud 👈</summary>

Claude Context needs a vector database. You can [sign up](https://cloud.zilliz.com/signup?utm_source=github&utm_medium=referral&utm_campaign=2507-codecontext-readme) on Zilliz Cloud to get an API key.

![](assets/signup_and_get_apikey.png)

Copy your Personal Key to replace `your-zilliz-cloud-api-key` in the configuration examples.
</details>

<details>
<summary>Get OpenAI API Key for embedding model</summary>

You need an OpenAI API key for the embedding model. You can get one by signing up at [OpenAI](https://platform.openai.com/api-keys).

Your API key always starts with `sk-`.
Copy your key and use it in the configuration examples below as `your-openai-api-key`.
</details> ### Configure MCP for Claude Code **System Requirements:** - Node.js >= 20.0.0 #### Configuration Use the command line interface to add the Claude Context MCP server: ```bash claude mcp add claude-context \ -e OPENAI_API_KEY=sk-your-openai-api-key \ -e MILVUS_ADDRESS=your-zilliz-cloud-public-endpoint \ -e MILVUS_TOKEN=your-zilliz-cloud-api-key \ -- npx @zilliz/claude-context-mcp@latest ``` See the [Claude Code MCP documentation](https://docs.anthropic.com/en/docs/claude-code/mcp) for more details about MCP server management. ### Other MCP Client Configurations <details> <summary>OpenAI Codex CLI</summary> Codex CLI uses TOML configuration files: 1. Create or edit the `~/.codex/config.toml` file. 2. Add the following configuration: ```toml # IMPORTANT: the top-level key is `mcp_servers` rather than `mcpServers`. [mcp_servers.claude-context] command = "npx" args = ["@zilliz/claude-context-mcp@latest"] env = { "OPENAI_API_KEY" = "your-openai-api-key", "MILVUS_TOKEN" = "your-zilliz-cloud-api-key" } # Optional: override the default 10s startup timeout startup_timeout_ms = 20000 ``` 3. Save the file and restart Codex CLI to apply the changes. </details> <details> <summary>Gemini CLI</summary> Gemini CLI requires manual configuration through a JSON file: 1. Create or edit the `~/.gemini/settings.json` file. 2. Add the following configuration: ```json { "mcpServers": { "claude-context": { "command": "npx", "args": ["@zilliz/claude-context-mcp@latest"], "env": { "OPENAI_API_KEY": "your-openai-api-key", "MILVUS_TOKEN": "your-zilliz-cloud-api-key" } } } } ``` 3. Save the file and restart Gemini CLI to apply the changes. 
</details> <details> <summary>Qwen Code</summary> Create or edit the `~/.qwen/settings.json` file and add the following configuration: ```json { "mcpServers": { "claude-context": { "command": "npx", "args": ["@zilliz/claude-context-mcp@latest"], "env": { "OPENAI_API_KEY": "your-openai-api-key", "MILVUS_ADDRESS": "your-zilliz-cloud-public-endpoint", "MILVUS_TOKEN": "your-zilliz-cloud-api-key" } } } } ``` </details> <details> <summary>Cursor</summary> Go to: `Settings` -> `Cursor Settings` -> `MCP` -> `Add new global MCP server` Pasting the following configuration into your Cursor `~/.cursor/mcp.json` file is the recommended approach. You may also install in a specific project by creating `.cursor/mcp.json` in your project folder. See [Cursor MCP docs](https://cursor.com/docs/context/mcp) for more info. ```json { "mcpServers": { "claude-context": { "command": "npx", "args": ["-y", "@zilliz/claude-context-mcp@latest"], "env": { "OPENAI_API_KEY": "your-openai-api-key", "MILVUS_ADDRESS": "your-zilliz-cloud-public-endpoint", "MILVUS_TOKEN": "your-zilliz-cloud-api-key" } } } } ``` </details> <details> <summary>Void</summary> Go to: `Settings` -> `MCP` -> `Add MCP Server` Add the following configuration to your Void MCP settings: ```json { "mcpServers": { "code-context": { "command": "npx", "args": ["-y", "@zilliz/claude-context-mcp@latest"], "env": { "OPENAI_API_KEY": "your-openai-api-key", "MILVUS_ADDRESS": "your-zilliz-cloud-public-endpoint", "MILVUS_TOKEN": "your-zilliz-cloud-api-key" } } } } ``` </details> <details> <summary>Claude Desktop</summary> Add to your Claude Desktop configuration: ```json { "mcpServers": { "claude-context": { "command": "npx", "args": ["@zilliz/claude-context-mcp@latest"], "env": { "OPENAI_API_KEY": "your-openai-api-key", "MILVUS_ADDRESS": "your-zilliz-cloud-public-endpoint", "MILVUS_TOKEN": "your-zilliz-cloud-api-key" } } } } ``` </details> <details> <summary>Windsurf</summary> Windsurf supports MCP configuration through a JSON file. 
Add the following configuration to your Windsurf MCP settings: ```json { "mcpServers": { "claude-context": { "command": "npx", "args": ["-y", "@zilliz/claude-context-mcp@latest"], "env": { "OPENAI_API_KEY": "your-openai-api-key", "MILVUS_ADDRESS": "your-zilliz-cloud-public-endpoint", "MILVUS_TOKEN": "your-zilliz-cloud-api-key" } } } } ``` </details> <details> <summary>VS Code</summary> The Claude Context MCP server can be used with VS Code through MCP-compatible extensions. Add the following configuration to your VS Code MCP settings: ```json { "mcpServers": { "claude-context": { "command": "npx", "args": ["-y", "@zilliz/claude-context-mcp@latest"], "env": { "OPENAI_API_KEY": "your-openai-api-key", "MILVUS_ADDRESS": "your-zilliz-cloud-public-endpoint", "MILVUS_TOKEN": "your-zilliz-cloud-api-key" } } } } ``` </details> <details> <summary>Cherry Studio</summary> Cherry Studio allows for visual MCP server configuration through its settings interface. While it doesn't directly support manual JSON configuration, you can add a new server via the GUI: 1. Navigate to **Settings → MCP Servers → Add Server**. 2. Fill in the server details: - **Name**: `claude-context` - **Type**: `STDIO` - **Command**: `npx` - **Arguments**: `["-y", "@zilliz/claude-context-mcp@latest"]` - **Environment Variables**: - `OPENAI_API_KEY`: `your-openai-api-key` - `MILVUS_ADDRESS`: `your-zilliz-cloud-public-endpoint` - `MILVUS_TOKEN`: `your-zilliz-cloud-api-key` 3. Save the configuration to activate the server. </details> <details> <summary>Cline</summary> Cline uses a JSON configuration file to manage MCP servers. To integrate the provided MCP server configuration: 1. Open Cline and click on the **MCP Servers** icon in the top navigation bar. 2. Select the **Installed** tab, then click **Advanced MCP Settings**. 3. 
In the `cline_mcp_settings.json` file, add the following configuration: ```json { "mcpServers": { "claude-context": { "command": "npx", "args": ["@zilliz/claude-context-mcp@latest"], "env": { "OPENAI_API_KEY": "your-openai-api-key", "MILVUS_ADDRESS": "your-zilliz-cloud-public-endpoint", "MILVUS_TOKEN": "your-zilliz-cloud-api-key" } } } } ``` 4. Save the file. </details> <details> <summary>Augment</summary> To configure Claude Context MCP in Augment Code, you can use either the graphical interface or manual configuration. #### **A. Using the Augment Code UI** 1. Click the hamburger menu. 2. Select **Settings**. 3. Navigate to the **Tools** section. 4. Click the **+ Add MCP** button. 5. Enter the following command: ``` npx @zilliz/claude-context-mcp@latest ``` 6. Name the MCP: **Claude Context**. 7. Click the **Add** button. ------ #### **B. Manual Configuration** 1. Press Cmd/Ctrl Shift P or go to the hamburger menu in the Augment panel 2. Select Edit Settings 3. Under Advanced, click Edit in settings.json 4. Add the server configuration to the `mcpServers` array in the `augment.advanced` object ```json "augment.advanced": { "mcpServers": [ { "name": "claude-context", "command": "npx", "args": ["-y", "@zilliz/claude-context-mcp@latest"], "env": { "OPENAI_API_KEY": "your-openai-api-key", "MILVUS_ADDRESS": "your-zilliz-cloud-public-endpoint", "MILVUS_TOKEN": "your-zilliz-cloud-api-key" } } ] } ``` </details> <details> <summary>Roo Code</summary> Roo Code utilizes a JSON configuration file for MCP servers: 1. Open Roo Code and navigate to **Settings → MCP Servers → Edit Global Config**. 2. In the `mcp_settings.json` file, add the following configuration: ```json { "mcpServers": { "claude-context": { "command": "npx", "args": ["@zilliz/claude-context-mcp@latest"], "env": { "OPENAI_API_KEY": "your-openai-api-key", "MILVUS_ADDRESS": "your-zilliz-cloud-public-endpoint", "MILVUS_TOKEN": "your-zilliz-cloud-api-key" } } } } ``` 3. Save the file to activate the server. 
</details>

<details>
<summary>Zencoder</summary>

Zencoder supports MCP tools and servers in both its JetBrains and VS Code plugin versions.

1. Go to the Zencoder menu (...).
2. From the dropdown menu, select `Tools`.
3. Click `Add Custom MCP`.
4. Enter a name (e.g. `Claude Context`) and the server configuration below:

```json
{
    "command": "npx",
    "args": ["@zilliz/claude-context-mcp@latest"],
    "env": {
      "OPENAI_API_KEY": "your-openai-api-key",
      "MILVUS_ADDRESS": "your-zilliz-cloud-public-endpoint",
      "MILVUS_TOKEN": "your-zilliz-cloud-api-key"
    }
}
```

5. Save the server by hitting the `Install` button.

</details>

<details>
<summary>LangChain/LangGraph</summary>

For LangChain/LangGraph integration examples, see [this example](https://github.com/zilliztech/claude-context/blob/643796a0d30e706a2a0dff3d55621c9b5d831807/evaluation/retrieval/custom.py#L88).

</details>

<details>
<summary>Other MCP Clients</summary>

The server uses stdio transport and follows the standard MCP protocol. It can be integrated with any MCP-compatible client by running:

```bash
npx @zilliz/claude-context-mcp@latest
```

</details>

---

### Usage in Your Codebase

1. **Open Claude Code**

   ```
   cd your-project-directory
   claude
   ```

2. **Index your codebase**:

   ```
   Index this codebase
   ```

3. **Check indexing status**:

   ```
   Check the indexing status
   ```

4. **Start searching**:

   ```
   Find functions that handle user authentication
   ```

🎉 **That's it!** You now have semantic code search in Claude Code.

---

### Environment Variables Configuration

For more detailed MCP environment variable configuration, see our [Environment Variables Guide](docs/getting-started/environment-variables.md).
### Using Different Embedding Models

To configure custom embedding models (e.g., `text-embedding-3-large` for OpenAI, `voyage-code-3` for VoyageAI), see the [MCP Configuration Examples](packages/mcp/README.md#embedding-provider-configuration) for detailed setup instructions for each provider.

### File Inclusion & Exclusion Rules

For a detailed explanation of the file inclusion and exclusion rules, and how to customize them, see our [File Inclusion & Exclusion Rules](docs/dive-deep/file-inclusion-rules.md).

### Available Tools

#### 1. `index_codebase`

Index a codebase directory for hybrid search (BM25 + dense vector).

#### 2. `search_code`

Search the indexed codebase using natural language queries with hybrid search (BM25 + dense vector).

#### 3. `clear_index`

Clear the search index for a specific codebase.

#### 4. `get_indexing_status`

Get the current indexing status of a codebase. Shows the progress percentage for codebases that are still being indexed and the completion status for codebases that are already indexed.

---

## 📊 Evaluation

Our controlled evaluation demonstrates that Claude Context MCP achieves ~40% token reduction at equivalent retrieval quality. This translates to significant cost and time savings in production environments. It also means that, when the available context length is limited, using Claude Context yields better retrieval and answer results.

![MCP Efficiency Analysis](assets/mcp_efficiency_analysis_chart.png)

For detailed evaluation methodology and results, see the [evaluation directory](evaluation/).

---

## 🏗️ Architecture

![](assets/Architecture.png)

### 🔧 Implementation Details

- 🔍 **Hybrid Code Search**: Ask questions like *"find functions that handle user authentication"* and get relevant, context-rich code instantly using advanced hybrid search (BM25 + dense vector).
- 🧠 **Context-Aware**: Explore large codebases and understand how different parts of your codebase relate, even across millions of lines of code.
- ⚡ **Incremental Indexing**: Efficiently re-index only changed files using Merkle trees. - 🧩 **Intelligent Code Chunking**: Analyze code in Abstract Syntax Trees (AST) for chunking. - 🗄️ **Scalable**: Integrates with Zilliz Cloud for scalable vector search, no matter how large your codebase is. - 🛠️ **Customizable**: Configure file extensions, ignore patterns, and embedding models. ### Core Components Claude Context is a monorepo containing three main packages: - **`@zilliz/claude-context-core`**: Core indexing engine with embedding and vector database integration - **VSCode Extension**: Semantic Code Search extension for Visual Studio Code - **`@zilliz/claude-context-mcp`**: Model Context Protocol server for AI agent integration ### Supported Technologies - **Embedding Providers**: [OpenAI](https://openai.com), [VoyageAI](https://voyageai.com), [Ollama](https://ollama.com), [Gemini](https://gemini.google.com) - **Vector Databases**: [Milvus](https://milvus.io) or [Zilliz Cloud](https://zilliz.com/cloud)(fully managed vector database as a service) - **Code Splitters**: AST-based splitter (with automatic fallback), LangChain character-based splitter - **Languages**: TypeScript, JavaScript, Python, Java, C++, C#, Go, Rust, PHP, Ruby, Swift, Kotlin, Scala, Markdown - **Development Tools**: VSCode, Model Context Protocol --- ## 📦 Other Ways to Use Claude Context While MCP is the recommended way to use Claude Context with AI assistants, you can also use it directly or through the VSCode extension. ### Build Applications with Core Package The `@zilliz/claude-context-core` package provides the fundamental functionality for code indexing and semantic search. 
```typescript import { Context, MilvusVectorDatabase, OpenAIEmbedding } from '@zilliz/claude-context-core'; // Initialize embedding provider const embedding = new OpenAIEmbedding({ apiKey: process.env.OPENAI_API_KEY || 'your-openai-api-key', model: 'text-embedding-3-small' }); // Initialize vector database const vectorDatabase = new MilvusVectorDatabase({ address: process.env.MILVUS_ADDRESS || 'your-zilliz-cloud-public-endpoint', token: process.env.MILVUS_TOKEN || 'your-zilliz-cloud-api-key' }); // Create context instance const context = new Context({ embedding, vectorDatabase }); // Index your codebase with progress tracking const stats = await context.indexCodebase('./your-project', (progress) => { console.log(`${progress.phase} - ${progress.percentage}%`); }); console.log(`Indexed ${stats.indexedFiles} files, ${stats.totalChunks} chunks`); // Perform semantic search const results = await context.semanticSearch('./your-project', 'vector database operations', 5); results.forEach(result => { console.log(`File: ${result.relativePath}:${result.startLine}-${result.endLine}`); console.log(`Score: ${(result.score * 100).toFixed(2)}%`); console.log(`Content: ${result.content.substring(0, 100)}...`); }); ``` ### VSCode Extension Integrates Claude Context directly into your IDE. Provides an intuitive interface for semantic code search and navigation. 1. **Direct Link**: [Install from VS Code Marketplace](https://marketplace.visualstudio.com/items?itemName=zilliz.semanticcodesearch) 2. 
**Manual Search**: - Open Extensions view in VSCode (Ctrl+Shift+X or Cmd+Shift+X on Mac) - Search for "Semantic Code Search" - Click Install ![img](https://lh7-rt.googleusercontent.com/docsz/AD_4nXdtCtT9Qi6o5mGVoxzX50r8Nb6zDFcjvTQR7WZ-xMbEsHEPPhSYAFVJ7q4-rETzxJ8wy1cyZmU8CmtpNhAU8PGOqVnE2kc2HCn1etDg97Qsh7m89kBjG4ZT7XBgO4Dp7BfFZx7eow?key=qYdFquJrLcfXCUndY-YRBQ) --- ## 🛠️ Development ### Setup Development Environment #### Prerequisites - Node.js 20.x, 22.x, or 24.x - pnpm (recommended package manager) #### Cross-Platform Setup ```bash # Clone repository git clone https://github.com/zilliztech/claude-context.git cd claude-context # Install dependencies pnpm install # Build all packages pnpm build # Start development mode pnpm dev ``` #### Windows-Specific Setup On Windows, ensure you have: - **Git for Windows** with proper line ending configuration - **Node.js** installed via the official installer or package manager - **pnpm** installed globally: `npm install -g pnpm` ```powershell # Windows PowerShell/Command Prompt git clone https://github.com/zilliztech/claude-context.git cd claude-context # Configure git line endings (recommended) git config core.autocrlf false # Install dependencies pnpm install # Build all packages (uses cross-platform scripts) pnpm build # Start development mode pnpm dev ``` ### Building ```bash # Build all packages (cross-platform) pnpm build # Build specific package pnpm build:core pnpm build:vscode pnpm build:mcp # Performance benchmarking pnpm benchmark ``` #### Windows Build Notes - All build scripts are cross-platform compatible using rimraf - Build caching is enabled for faster subsequent builds - Use PowerShell or Command Prompt - both work equally well ### Running Examples ```bash # Development with file watching cd examples/basic-usage pnpm dev ``` --- ## 📖 Examples Check the `/examples` directory for complete usage examples: - **Basic Usage**: Simple indexing and search example --- ## ❓ FAQ **Common Questions:** - **[What files does 
Claude Context decide to embed?](docs/troubleshooting/faq.md#q-what-files-does-claude-context-decide-to-embed)** - **[Can I use a fully local deployment setup?](docs/troubleshooting/faq.md#q-can-i-use-a-fully-local-deployment-setup)** - **[Does it support multiple projects / codebases?](docs/troubleshooting/faq.md#q-does-it-support-multiple-projects--codebases)** - **[How does Claude Context compare to other coding tools?](docs/troubleshooting/faq.md#q-how-does-claude-context-compare-to-other-coding-tools-like-serena-context7-or-deepwiki)** ❓ For detailed answers and more troubleshooting tips, see our [FAQ Guide](docs/troubleshooting/faq.md). 🔧 **Encountering issues?** Visit our [Troubleshooting Guide](docs/troubleshooting/troubleshooting-guide.md) for step-by-step solutions. 📚 **Need more help?** Check out our [complete documentation](docs/) for detailed guides and troubleshooting tips. --- ## 🤝 Contributing We welcome contributions! Please see our [Contributing Guide](CONTRIBUTING.md) for details on how to get started. **Package-specific contributing guides:** - [Core Package Contributing](packages/core/CONTRIBUTING.md) - [MCP Server Contributing](packages/mcp/CONTRIBUTING.md) - [VSCode Extension Contributing](packages/vscode-extension/CONTRIBUTING.md) --- ## 🗺️ Roadmap - [x] AST-based code analysis for improved understanding - [x] Support for additional embedding providers - [ ] Agent-based interactive search mode - [x] Enhanced code chunking strategies - [ ] Search result ranking optimization - [ ] Robust Chrome Extension --- ## 📄 License This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details. --- ## 🔗 Links - [GitHub Repository](https://github.com/zilliztech/claude-context) - [VSCode Marketplace](https://marketplace.visualstudio.com/items?itemName=zilliz.semanticcodesearch) - [Milvus Documentation](https://milvus.io/docs) - [Zilliz Cloud](https://zilliz.com/cloud)

AI Tools AI Agents

11.8K Github Stars

Open Source

memsearch

<h1 align="center"> <img src="assets/logo-icon.jpg" alt="" width="100" valign="middle">   memsearch </h1> Cross-platform semantic memory for AI coding agents. <a href="https://pypi.org/project/memsearch/"><img src="https://img.shields.io/pypi/v/memsearch?style=flat-square&color=blue" alt="PyPI"></a> <a href="https://zilliztech.github.io/memsearch/platforms/claude-code/"><img src="https://img.shields.io/badge/Claude_Code-plugin-c97539?style=flat-square&logo=claude&logoColor=white" alt="Claude Code"></a> <a href="https://zilliztech.github.io/memsearch/platforms/openclaw/"><img src="https://img.shields.io/badge/OpenClaw-plugin-4a9eff?style=flat-square" alt="OpenClaw"></a> <a href="https://zilliztech.github.io/memsearch/platforms/opencode/"><img src="https://img.shields.io/badge/OpenCode-plugin-22c55e?style=flat-square" alt="OpenCode"></a> <a href="https://zilliztech.github.io/memsearch/platforms/codex/"><img src="https://img.shields.io/badge/Codex_CLI-plugin-ff6b35?style=flat-square" alt="Codex CLI"></a> <a href="https://pypi.org/project/memsearch/"><img src="https://img.shields.io/badge/python-%3E%3D3.10-blue?style=flat-square&logo=python&logoColor=white" alt="Python"></a> <a href="https://github.com/zilliztech/memsearch/blob/main/LICENSE"><img src="https://img.shields.io/github/license/zilliztech/memsearch?style=flat-square" alt="License"></a> <a href="https://github.com/zilliztech/memsearch/actions/workflows/test.yml"><img src="https://img.shields.io/github/actions/workflow/status/zilliztech/memsearch/test.yml?branch=main&style=flat-square" alt="Tests"></a> <a href="https://zilliztech.github.io/memsearch/"><img src="https://img.shields.io/badge/docs-memsearch-blue?style=flat-square" alt="Docs"></a> <a href="https://github.com/zilliztech/memsearch/stargazers"><img src="https://img.shields.io/github/stars/zilliztech/memsearch?style=flat-square" alt="Stars"></a> <a href="https://discord.com/invite/FG6hMJStWu"><img 
src="https://img.shields.io/badge/Discord-chat-7289da?style=flat-square&logo=discord&logoColor=white" alt="Discord"></a> <a href="https://x.com/zilliz_universe"><img src="https://img.shields.io/badge/follow-%40zilliz__universe-000000?style=flat-square&logo=x&logoColor=white" alt="X (Twitter)"></a> <img src="https://github.com/user-attachments/assets/427b7152-bc16-408c-a8b0-59a2b05fd1e0" alt="memsearch demo" width="800"> ### Why memsearch? - 🌐 **All Platforms, One Memory** — memories flow across [Claude Code](plugins/claude-code/README.md), [OpenClaw](plugins/openclaw/README.md), [OpenCode](plugins/opencode/README.md), and [Codex CLI](plugins/codex/README.md). A conversation in one agent becomes searchable context in all others — no extra setup - 👥 **For Agent Users**, install a plugin and get persistent memory with zero effort; **for Agent Developers**, use the full [CLI](https://zilliztech.github.io/memsearch/cli/) and [Python API](https://zilliztech.github.io/memsearch/python-api/) to build memory and harness engineering into your own agents - 📄 **Markdown is the source of truth** — inspired by [OpenClaw](https://github.com/openclaw/openclaw). Your memories are just `.md` files — human-readable, editable, version-controllable. Milvus is a "shadow index": a derived, rebuildable cache - 🔍 **Progressive retrieval, hybrid search, smart dedup, live sync** — 3-layer recall (search → expand → transcript); dense vector + BM25 sparse + RRF reranking; SHA-256 content hashing skips unchanged content; file watcher auto-indexes in real time --- ## 🧑‍💻 For Agent Users Pick your platform, install the plugin, and you're done. Each plugin captures conversations automatically and provides semantic recall with zero configuration. 
<details open> <summary><h3>For Claude Code Users</h3></summary> ```bash # Install /plugin marketplace add zilliztech/memsearch /plugin install memsearch # Restart Claude Code to activate the plugin ``` After restarting, just chat with Claude Code as usual. The plugin captures every conversation turn automatically. **Verify it's working** — after a few conversations, check your memory files: ```bash ls .memsearch/memory/ # you should see daily .md files cat .memsearch/memory/$(date +%Y-%m-%d).md ``` **Recall memories** — two ways to trigger: ``` /memory-recall what did we discuss about Redis? ``` Or just ask naturally — Claude auto-invokes the skill when it senses the question needs history: ``` We discussed Redis caching before, what was the TTL we chose? ``` > 📖 [Claude Code Plugin docs](https://zilliztech.github.io/memsearch/platforms/claude-code/) · [Troubleshooting](https://zilliztech.github.io/memsearch/platforms/claude-code/troubleshooting/) </details> <details open> <summary><h3>For Codex CLI Users</h3></summary> ```bash # Install git clone --depth 1 https://github.com/zilliztech/memsearch.git bash memsearch/plugins/codex/scripts/install.sh codex --yolo # needed for ONNX model network access ``` After installing, chat as usual. Hooks capture and summarize each turn. **Verify it's working:** ```bash ls .memsearch/memory/ ``` **Recall memories** — use the skill: ``` $memory-recall what did we discuss about deployment? ``` > 📖 [Codex CLI Plugin docs](https://zilliztech.github.io/memsearch/platforms/codex/) </details> <details> <summary><h3>For OpenClaw Users</h3></summary> ```bash # Install from ClawHub openclaw plugins install --force clawhub:memsearch openclaw config set plugins.entries.memsearch.hooks.allowConversationAccess true openclaw config set plugins.entries.memsearch.hooks.allowPromptInjection true openclaw gateway restart ``` After installing, chat in TUI as usual. The plugin captures each turn automatically. 
**Verify it's working** — memory files are stored in your agent's workspace: ```bash # For the main agent: ls ~/.openclaw/workspace/.memsearch/memory/ # For other agents (e.g. work): ls ~/.openclaw/workspace-work/.memsearch/memory/ ``` **Recall memories** — two ways to trigger: ``` /memory-recall what was the batch size limit we set? ``` Or just ask naturally — the LLM auto-invokes memory tools when it senses the question needs history: ``` We discussed batch size limits before, what did we decide? ``` > 📖 [OpenClaw Plugin docs](https://zilliztech.github.io/memsearch/platforms/openclaw/) · [Browse on ClawHub](https://clawhub.ai/plugins/memsearch) </details> <details> <summary><h3>For OpenCode Users</h3></summary> ```json // In ~/.config/opencode/opencode.json { "plugin": ["@zilliz/memsearch-opencode"] } ``` After installing, chat in TUI as usual. A background daemon captures conversations. **Verify it's working:** ```bash ls .memsearch/memory/ # daily .md files appear after a few conversations ``` **Recall memories** — two ways to trigger: ``` /memory-recall what did we discuss about authentication? ``` Or just ask naturally — the LLM auto-invokes memory tools when it senses the question needs history: ``` We discussed the authentication flow before, what was the approach? ``` > 📖 [OpenCode Plugin docs](https://zilliztech.github.io/memsearch/platforms/opencode/) </details> ### ⚙️ Configuration (all platforms) All plugins share the same memsearch backend. Configure once, works everywhere. #### Embedding Defaults to **ONNX bge-m3** — runs locally on CPU, no API key, no cost. On first launch the model (~558 MB) is downloaded from HuggingFace Hub. 
```bash memsearch config set embedding.provider onnx # default — local, free memsearch config set embedding.provider openai # needs OPENAI_API_KEY memsearch config set embedding.provider ollama # local, any model ``` > All providers and models: [Configuration — Embedding Provider](https://zilliztech.github.io/memsearch/home/configuration/#embedding-provider) #### Milvus Backend Just change `milvus_uri` (and optionally `milvus_token`) to switch between deployment modes: **Milvus Lite** (default) — zero config, single file. Great for getting started: ```bash # Works out of the box, no setup needed memsearch config get milvus.uri # → ~/.memsearch/milvus.db ``` ⭐ **Zilliz Cloud** (recommended) — fully managed, [free tier available](https://cloud.zilliz.com/signup?utm_source=github&utm_medium=referral&utm_campaign=memsearch-readme) — [sign up](https://cloud.zilliz.com/signup?utm_source=github&utm_medium=referral&utm_campaign=memsearch-readme) 👇: ```bash memsearch config set milvus.uri "https://in03-xxx.api.gcp-us-west1.zillizcloud.com" memsearch config set milvus.token "your-api-key" ``` <details> <summary>⭐ Sign up for a free Zilliz Cloud cluster</summary> You can [sign up](https://cloud.zilliz.com/signup?utm_source=github&utm_medium=referral&utm_campaign=memsearch-readme) on Zilliz Cloud to get a free cluster and API key. ![Sign up and get API key](https://raw.githubusercontent.com/zilliztech/claude-context/master/assets/signup_and_get_apikey.png) </details> <details> <summary>Self-hosted Milvus Server (Docker) — for advanced users</summary> For multi-user or team environments with a dedicated Milvus instance. Requires Docker. See the [official installation guide](https://milvus.io/docs/install_standalone-docker-compose.md). 
```bash memsearch config set milvus.uri http://localhost:19530 ``` </details> > 📖 Full configuration guide: [Configuration](https://zilliztech.github.io/memsearch/home/configuration/) · [Platform comparison](https://zilliztech.github.io/memsearch/platforms/) #### Capture Summarization Routing Each plugin keeps its native capture summarizer unless you override it explicitly: ```bash memsearch config set plugins.codex.summarize.model gpt-5.1-codex-mini memsearch config set plugins.opencode.summarize.model anthropic/claude-haiku ``` Advanced users can route plugin summarization through a memsearch-managed API provider: ```bash memsearch config set llm.providers.openai.type openai memsearch config set llm.providers.openai.model gpt-5-mini memsearch config set llm.providers.openai.api_key env:OPENAI_API_KEY memsearch config set plugins.codex.summarize.provider openai ``` Leave `plugins.<platform>.summarize.provider` empty or set it to `native` to preserve the default behavior. Plugin-specific summarize settings do not fall back to `llm.model`. You can also disable automatic capture for a project while keeping the plugin installed: ```bash memsearch config set plugins.codex.summarize.enabled false --project ``` #### Advanced Memory Maintenance Plugins can optionally maintain higher-level project and user notes in the background. These tasks are disabled by default and run only when a plugin wakes them after a session/turn, the journal input changed, and `min_interval_hours` has elapsed. 
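The run conditions above can be read as a simple gate. The sketch below is only an illustration of that logic, not memsearch's implementation: `TaskState`, `journal_digest`, and `should_run` are hypothetical names, and fingerprinting the journal with SHA-256 is an assumption about how "the journal input changed" might be detected.

```python
import hashlib
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class TaskState:
    """Hypothetical bookkeeping for one maintenance task (not memsearch's real schema)."""
    last_run: Optional[datetime] = None
    last_input_digest: Optional[str] = None


def journal_digest(journal_texts: list[str]) -> str:
    """Fingerprint the journal input so an unchanged journal can be detected."""
    h = hashlib.sha256()
    for text in journal_texts:
        h.update(text.encode("utf-8"))
        h.update(b"\0")  # separator so ["ab", "c"] and ["a", "bc"] differ
    return h.hexdigest()


def should_run(enabled: bool, min_interval_hours: float, state: TaskState,
               digest: str, now: datetime) -> bool:
    """Return True only if every condition from the paragraph above holds."""
    if not enabled:  # tasks are disabled by default
        return False
    if state.last_input_digest == digest:  # journal input has not changed
        return False
    if state.last_run is not None and now - state.last_run < timedelta(hours=min_interval_hours):
        return False  # min_interval_hours has not elapsed yet
    return True
```

In this reading, the plugin would call something like `should_run` when it wakes the task after a session or turn. The settings that feed such a gate are configured per project: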
```bash memsearch config set plugins.codex.project_review.enabled true --project memsearch config set plugins.codex.project_review.provider native --project memsearch config set plugins.codex.project_review.min_interval_hours 24 --project memsearch config set plugins.codex.project_review.output_file .memsearch/PROJECT.md --project memsearch config set plugins.codex.user_profile.enabled true --project memsearch config set plugins.codex.user_profile.output_file .memsearch/USER.md --project ``` `project_review` summarizes durable project state such as active threads, decisions, risks, and next steps. `user_profile` captures reusable user preferences, working style, recurring goals, and background context. Both read `.memsearch/memory` by default; set `input_dir` if your journal files live somewhere else. Use `provider = "native"` to reuse the current agent's own non-interactive model path, or point the task at a named `[llm.providers.<name>]` API provider. Custom prompt files can be configured with `prompts.project_review` and `prompts.user_profile`. The `memory-config` skill, installed with the plugins, can inspect the current setup, explain these options, and make safe project-scoped changes from natural-language requests. ### What can you use it for? - **Resume debugging threads** — ask how a similar Redis, Docker, database, or deployment issue was fixed last time. - **Recover decision rationale** — find why the project chose one architecture, library, migration path, or API design over another. - **Trace feature history** — understand how a feature evolved across sessions, including the files changed and tradeoffs discussed. - **Do code archaeology** — ask when and why a module, config, or workflow was changed before touching it again. - **Find the right session to resume** — ask which previous conversation covered a topic, recover the relevant context, and continue from there. 
- **Carry context across agents** — keep Claude Code, Codex CLI, OpenClaw, and OpenCode working from the same project memory.

---

## 🛠️ For Agent Developers

Beyond ready-to-use plugins, memsearch provides a complete **CLI and Python API** for building memory into your own agents. Whether you're adding persistent context to a custom agent, building a memory-augmented RAG pipeline, or doing harness engineering — the same core engine that powers the plugins is available as a library.

### 🏗️ Architecture Overview

```
┌──────────────────────────────────────────────────────────────┐
│  🧑‍💻 For Agent Users (Plugins)                                │
│  ┌──────────┐ ┌──────────┐ ┌──────────┐ ┌────────┐ ┌──────┐  │
│  │ Claude   │ │ OpenClaw │ │ OpenCode │ │ Codex  │ │ Your │  │
│  │ Code     │ │ Plugin   │ │ Plugin   │ │ Plugin │ │ App  │  │
│  └────┬─────┘ └────┬─────┘ └────┬─────┘ └───┬────┘ └──┬───┘  │
│       └─────────────┴────────────┴───────────┴────────┘      │
├────────────────────────────┬─────────────────────────────────┤
│  🛠️ For Agent Developers   │  Build your own with ↓          │
│  ┌─────────────────────────┴──────────────────────────────┐  │
│  │  memsearch CLI / Python API                            │  │
│  │  index · search · expand · watch · compact             │  │
│  └─────────────────────────┬──────────────────────────────┘  │
│  ┌─────────────────────────┴──────────────────────────────┐  │
│  │  Core: Chunker → Embedder → Milvus                     │  │
│  │  Hybrid Search (BM25 + Dense + RRF)                    │  │
│  └────────────────────────────────────────────────────────┘  │
├──────────────────────────────────────────────────────────────┤
│  📄 Markdown Files (Source of Truth)                         │
│  memory/2026-03-27.md · memory/2026-03-26.md · ...           │
└──────────────────────────────────────────────────────────────┘
```

Plugins sit on top of the CLI/API layer. The API handles indexing, searching, and Milvus sync. Markdown files are always the source of truth — Milvus is a rebuildable shadow index. Everything below the plugin layer is what you use as an agent developer.
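The core layer above lists hybrid search as BM25 + Dense + RRF. As a rough illustration of the last step, here is a minimal sketch of reciprocal rank fusion over two ranked lists. The function name `rrf_fuse`, the constant `k=60`, and the chunk ids are assumptions made for this example; this is not memsearch's or Milvus's actual implementation.

```python
from collections import defaultdict


def rrf_fuse(rankings, k=60):
    """Fuse several ranked lists of ids with reciprocal rank fusion.

    Each id scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1. k=60 is a commonly used constant and an
    assumption here, not a value taken from memsearch.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by id for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


bm25_hits = ["chunk_a", "chunk_b", "chunk_c"]   # hypothetical keyword ranking
dense_hits = ["chunk_b", "chunk_d", "chunk_a"]  # hypothetical vector ranking
fused = rrf_fuse([bm25_hits, dense_hits])
print(fused)
```

Because the fusion uses only rank positions, the BM25 and dense scores never need to be put on a common scale, which is the usual reason to pick RRF for combining keyword and vector results.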
### How Plugins Work (Claude Code as example)

**Capture — after each conversation turn:**

```
User asks question → Agent responds → Stop hook fires
                                          │
                     ┌────────────────────┘
                     ▼
             Parse last turn
                     │
                     ▼
         LLM summarizes (haiku)
         "- User asked about X."
         "- Claude did Y."
                     │
                     ▼
         Append to memory/2026-03-27.md
         with anchor
                     │
                     ▼
         memsearch index → Milvus
```

**Recall — 3-layer progressive search:**

```
User: "What did we discuss about batch size?"
                     │
                     ▼
  L1  memsearch search "batch size"      → ranked chunks
                     │ (need more?)
                     ▼
  L2  memsearch expand <chunk_hash>      → full .md section
                     │ (need original?)
                     ▼
  L3  parse-transcript <session.jsonl>   → raw dialogue
```

### 📄 Markdown as Source of Truth

```
  Plugins append ──→  .md files  ←── human editable
                          │
                          ▼
              memsearch watch (live watcher)
                          │
                  detects file change
                          │
                          ▼
              re-chunk changed .md
                          │
              hash each chunk (SHA-256)
                          │
              ┌───────────┴───────────┐
              ▼                       ▼
      hash unchanged?          hash is new/changed?
      → skip (no API call)     → embed → upsert to Milvus
              │                       │
              └───────────┬───────────┘
                          ▼
                ┌──────────────────┐
                │  Milvus (shadow) │
                │  always in sync  │
                │  rebuildable     │
                └──────────────────┘
```

### 📦 Installation

```bash
# Install as a global CLI tool — recommended when you mainly use the
# `memsearch` command or any of the agent plugins (Claude Code, Codex,
# OpenClaw, OpenCode), which all shell out to the CLI.
uv tool install memsearch          # via uv
pipx install memsearch             # via pipx
pip install memsearch              # plain pip

# Install as a project dependency — use this if you want to import
# `memsearch` from your own Python code (e.g. via the MemSearch class).
uv add memsearch                   # via uv, adds to pyproject.toml
pip install memsearch              # into an activated venv
```

<details>
<summary>Optional embedding providers</summary>

```bash
# As a CLI tool (recommended — local ONNX, no API key)
uv tool install "memsearch[onnx]"
pipx install "memsearch[onnx]"
pip install "memsearch[onnx]"

# As a project dependency
uv add "memsearch[onnx]"

# Other options: [openai], [google], [voyage], [jina], [mistral], [ollama], [local], [all]
```

</details>

### 🐍 Python API — Give Your Agent Memory

```python
import asyncio

from memsearch import MemSearch


async def main():
    mem = MemSearch(paths=["./memory"])

    await mem.index()                                      # index markdown files
    results = await mem.search("Redis config", top_k=3)    # semantic search
    scoped = await mem.search("pricing", top_k=3, source_prefix="./memory/product")
    print(results[0]["content"], results[0]["score"])      # content + similarity


# index() and search() are coroutines, so run them inside an event loop
asyncio.run(main())
```

<details>
<summary>Full example — agent with memory (OpenAI) — click to expand</summary>

```python
import asyncio
from datetime import date
from pathlib import Path

from openai import OpenAI

from memsearch import MemSearch

MEMORY_DIR = "./memory"
llm = OpenAI()                        # your LLM client
mem = MemSearch(paths=[MEMORY_DIR])   # memsearch handles the rest


def save_memory(content: str):
    """Append a note to today's memory log (OpenClaw-style daily markdown)."""
    p = Path(MEMORY_DIR) / f"{date.today()}.md"
    p.parent.mkdir(parents=True, exist_ok=True)
    with open(p, "a") as f:
        f.write(f"\n{content}\n")


async def agent_chat(user_input: str) -> str:
    # 1. Recall — search past memories for relevant context
    memories = await mem.search(user_input, top_k=3)
    context = "\n".join(f"- {m['content'][:200]}" for m in memories)

    # 2. Think — call LLM with memory context
    resp = llm.chat.completions.create(
        model="gpt-5-mini",
        messages=[
            {"role": "system", "content": f"You have these memories:\n{context}"},
            {"role": "user", "content": user_input},
        ],
    )
    answer = resp.choices[0].message.content

    # 3. Remember — save this exchange and index it
    save_memory(f"## {user_input}\n{answer}")
    await mem.index()
    return answer


async def main():
    # Seed some knowledge
    save_memory("## Team\n- Alice: frontend lead\n- Bob: backend lead")
    save_memory("## Decision\nWe chose Redis for caching over Memcached.")
    await mem.index()  # or mem.watch() to auto-index in the background

    # Agent can now recall those memories
    print(await agent_chat("Who is our frontend lead?"))
    print(await agent_chat("What caching solution did we pick?"))


asyncio.run(main())
```

</details>

<details>
<summary>Anthropic Claude example — click to expand</summary>

```bash
pip install memsearch anthropic
```

```python
import asyncio
from datetime import date
from pathlib import Path

from anthropic import Anthropic

from memsearch import MemSearch

MEMORY_DIR = "./memory"
llm = Anthropic()
mem = MemSearch(paths=[MEMORY_DIR])


def save_memory(content: str):
    p = Path(MEMORY_DIR) / f"{date.today()}.md"
    p.parent.mkdir(parents=True, exist_ok=True)
    with open(p, "a") as f:
        f.write(f"\n{content}\n")


async def agent_chat(user_input: str) -> str:
    # 1. Recall
    memories = await mem.search(user_input, top_k=3)
    context = "\n".join(f"- {m['content'][:200]}" for m in memories)

    # 2. Think — call Claude with memory context
    resp = llm.messages.create(
        model="claude-sonnet-4-6",
        max_tokens=1024,
        system=f"You have these memories:\n{context}",
        messages=[{"role": "user", "content": user_input}],
    )
    answer = resp.content[0].text

    # 3. Remember
    save_memory(f"## {user_input}\n{answer}")
    await mem.index()
    return answer


async def main():
    save_memory("## Team\n- Alice: frontend lead\n- Bob: backend lead")
    await mem.index()
    print(await agent_chat("Who is our frontend lead?"))


asyncio.run(main())
```

</details>

<details>
<summary>Ollama (fully local, no API key) — click to expand</summary>

```bash
pip install "memsearch[ollama]"
ollama pull nomic-embed-text   # embedding model
ollama pull llama3.2           # chat model
```

```python
import asyncio
from datetime import date
from pathlib import Path

from ollama import chat

from memsearch import MemSearch

MEMORY_DIR = "./memory"
mem = MemSearch(paths=[MEMORY_DIR], embedding_provider="ollama")


def save_memory(content: str):
    p = Path(MEMORY_DIR) / f"{date.today()}.md"
    p.parent.mkdir(parents=True, exist_ok=True)
    with open(p, "a") as f:
        f.write(f"\n{content}\n")


async def agent_chat(user_input: str) -> str:
    # 1. Recall
    memories = await mem.search(user_input, top_k=3)
    context = "\n".join(f"- {m['content'][:200]}" for m in memories)

    # 2. Think — call Ollama locally
    resp = chat(
        model="llama3.2",
        messages=[
            {"role": "system", "content": f"You have these memories:\n{context}"},
            {"role": "user", "content": user_input},
        ],
    )
    answer = resp.message.content

    # 3. Remember
    save_memory(f"## {user_input}\n{answer}")
    await mem.index()
    return answer


async def main():
    save_memory("## Team\n- Alice: frontend lead\n- Bob: backend lead")
    await mem.index()
    print(await agent_chat("Who is our frontend lead?"))


asyncio.run(main())
```

</details>

> 📖 Full Python API reference: [Python API docs](https://zilliztech.github.io/memsearch/python-api/)

### ⌨️ CLI Usage

**Setup:**

```bash
memsearch config init                                    # interactive setup wizard
memsearch config set embedding.provider onnx             # switch embedding provider
memsearch config set milvus.uri http://localhost:19530   # switch Milvus backend
```

**Index & Search:**

```bash
memsearch index ./memory/                                # index markdown files
memsearch index ./memory/ ./notes/ --force               # re-embed everything
memsearch search "Redis caching"                         # hybrid search (BM25 + vector)
memsearch search "auth flow" --top-k 10 --json-output    # JSON for scripting
memsearch expand <chunk_hash>                            # show full section around a chunk
```

**Live Sync & Maintenance:**

```bash
memsearch watch ./memory/                                # live file watcher (auto-index on change)
memsearch compact                                        # LLM-powered chunk summarization
memsearch stats                                          # show indexed chunk count
memsearch reset --yes                                    # drop all indexed data and rebuild
```

> 📖 Full CLI reference with all flags: [CLI docs](https://zilliztech.github.io/memsearch/cli/)

## ⚙️ Configuration

Embedding and Milvus backend settings → [Configuration (all platforms)](#️-configuration-all-platforms)

Settings priority: Built-in defaults → `~/.memsearch/config.toml` → `.memsearch.toml` → CLI flags.
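The priority chain above can be pictured as a layered merge. The sketch below assumes that each later layer overrides the earlier ones and that nested tables are merged key by key; `merge_layers` and the placeholder default values are hypothetical and only illustrate the idea, not memsearch's actual config loader.

```python
def merge_layers(*layers):
    """Merge config dicts left to right; later layers win on conflicts."""
    merged = {}
    for layer in layers:
        for key, value in layer.items():
            if isinstance(value, dict):
                # Merge nested tables key by key instead of replacing them.
                base = merged.get(key)
                merged[key] = merge_layers(base if isinstance(base, dict) else {}, value)
            else:
                merged[key] = value
    return merged


# Placeholder values; the real built-in defaults are not shown here.
defaults = {"embedding": {"provider": "placeholder-default"},
            "milvus": {"uri": "placeholder-default-uri"}}
global_cfg = {"embedding": {"provider": "onnx"}}             # ~/.memsearch/config.toml
project_cfg = {"milvus": {"uri": "http://localhost:19530"}}  # .memsearch.toml
cli_flags = {}                                               # nothing overridden on the CLI

effective = merge_layers(defaults, global_cfg, project_cfg, cli_flags)
print(effective)
```

Under these assumptions, a value set in the project file wins over the global file, and a CLI flag wins over both.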
> 📖 Full config guide: [Configuration](https://zilliztech.github.io/memsearch/home/configuration/)

## 🔗 Links

- 📖 [Documentation](https://zilliztech.github.io/memsearch/) — full guides, API reference, and architecture details
- 🔌 [Platform Plugins](https://zilliztech.github.io/memsearch/platforms/) — Claude Code, OpenClaw, OpenCode, Codex CLI
- 💡 [Design Philosophy](https://zilliztech.github.io/memsearch/design-philosophy/) — why markdown, why Milvus, competitor comparison
- 🦞 [OpenClaw](https://github.com/openclaw/openclaw) — the memory architecture that inspired memsearch
- 🗄️ [Milvus](https://milvus.io/) | [Zilliz Cloud](https://cloud.zilliz.com/signup?utm_source=github&utm_medium=referral&utm_campaign=memsearch-readme) — the vector database powering memsearch

## 🤝 Contributing

Bug reports, feature requests, and pull requests are welcome! See the [Contributing Guide](CONTRIBUTING.md) for development setup, testing, and plugin development instructions. For questions and discussions, join us on [Discord](https://discord.com/invite/FG6hMJStWu).

## 📄 License

[MIT](LICENSE)

Vector Databases Knowledge Bases & RAG

1.9K Github Stars

Open Source

milvus-skill

# milvus-skill

An agent skill that teaches LLMs how to use [pymilvus](https://github.com/milvus-io/pymilvus) to operate the [Milvus](https://milvus.io/) vector database.

## What's Included

- **SKILL.md** — Main skill definition with connection, collection management, vector operations, and index management
- **references/** — Detailed reference docs for each feature area:
  - `collection.md` — Data types, schema fields, collection operations
  - `vector.md` — Insert, search, hybrid search, full-text search, iterators, filters
  - `index.md` — Index types, metric types, create/manage indexes
  - `partition.md` — Partition CRUD
  - `database.md` — Database management
  - `user-role.md` — RBAC: users, roles, privileges
  - `patterns.md` — Common patterns (RAG, semantic search, hybrid search, full-text search)

## Install as Claude Code Skill

```bash
claude skill add --url https://github.com/zilliztech/milvus-skill
```

## Capabilities

- Connect to Milvus Lite, Standalone, Cluster, or Zilliz Cloud
- Create collections with quick or custom schemas
- Insert, upsert, search, query, get, delete vectors
- Hybrid search with RRF/Weighted reranking
- Full-text search with BM25
- Paginated iteration over large result sets
- Index management (AUTOINDEX, HNSW, IVF_FLAT, etc.)
- Partition, database, and RBAC management

## Requirements

- Python 3.8+
- `pymilvus` (`pip install pymilvus`)

AI Agents Design Systems & Tokens

27 Github Stars
