About conversational-agent-langchain

FastAPI Backend for a Conversational Agent using Cohere, (Azure) OpenAI, Langchain & Langgraph and Qdrant as VectorDB

m

Published by

mfmezger

Visit View Profile

README.md

View on GitHub

conv

Conversational RAG Agent

This is a Rest-Backend for a Conversational Agent, that allows you to embed Documents, search for them using Semantic Search, to QA based on Documents and do document processing with Large Language Models.

LLMs and Backend Providers

I have decided to stop creating different services for different provider and switching to LiteLLM which allows to use basically every provider you want.

Some providers i would recommend are:

Cohere Awesome models and great free tier.
Ollama If you want to keep your data your data.
Google AI Studio The Google Integration that is not really suited for enterprise but perfect for everybody else.

[!NOTE] The EmbeddingManagement class in src/agent/backend/services/embedding_management.py contains placeholders for Google and OpenAI embedding providers. These are intended as extension points for you to implement if you wish to use these specific providers directly.

Quickstart

To run the complete system with docker use this command:

git clone https://github.com/mfmezger/conversational-agent-langchain.git
cd conversational-agent-langchain

Create a .env file from the template.env and set the necessary API Keys. Absolutely necessary are:

GEMINI_API_KEY
COHERE_API_KEY

Then start the system with

  docker compose up -d

Service	URL
API Documentation	`http://127.0.0.1:8001/docs`
Frontend	`http://localhost:8501`
Qdrant Dashboard	`http://localhost:6333/dashboard`
Phoenix Dashboard	`http://localhost:6006`

Project Description

This project is a conversational rag agent that uses Google Gemini Large Language Models to generate responses to user queries. The agent also includes a vector database and a REST API built with FastAPI.

What is RAG?

Retrieval-Augmented Generation (RAG) is a technique that enhances Large Language Models (LLMs) by providing them with relevant information from an external knowledge base. Instead of relying solely on its pre-trained knowledge, the model retrieves specific documents related to the user's query and uses them as context to generate more accurate, up-to-date, and domain-specific responses. This approach reduces hallucinations and allows the model to answer questions about private or proprietary data.

Features

Uses Large Language Models to generate responses to user queries.
Includes a vector database to store and retrieve information.
Provides a REST API built with FastAPI for easy integration with other applications.
Has a basic GUI.
Includes Phoenix Tracing for observability.
Reranking support with Cohere and FlashRank providers.
Fast, Rust-native git hooks using prek for code quality checks.

Reranking Configuration

Reranking improves retrieval quality by re-scoring documents after initial retrieval. Two providers are supported:

Provider	Description	Requires API Key
`cohere`	Cohere Rerank API (cloud-based, high quality)	Yes (`COHERE_API_KEY`)
`flashrank`	FlashRank (local, fast, privacy-friendly)	No
`none`	Disabled (default)	No

Add these to your .env file:

# Reranking
RERANK_PROVIDER=cohere          # Options: "cohere", "flashrank", "none"
RERANK_TOP_K=5                  # Number of documents to keep after reranking

# Retrieval
RETRIEVAL_K=10                  # Documents to retrieve initially
RETRIEVAL_K_RETRY=20            # Documents to retrieve on retry

[!TIP] For best results, set RETRIEVAL_K higher than RERANK_TOP_K so the reranker has more candidates to choose from.

Tracing

This project uses Phoenix for tracing and observability. It allows you to monitor the execution of your RAG pipeline, inspect retrieved documents, and debug the generation process.

Tracing

Semantic Search

Semantic Search Architecture

Semantic search is an advanced search technique that aims to understand the meaning and context of a user's query, rather than matching keywords. It involves natural language processing (NLP) and machine learning algorithms to analyze and interpret user intent, synonyms, relationships between words, and the structure of content. By considering these factors, semantic search improves the accuracy and relevance of search results, providing a more intuitive and personalized user experience.

Hybrid Search

For Hybrid Search the BM25 FastEmbed from Qdrant is used.

Architecture

Semantic Search Architecture

Installation & Development Backend

On Linux or Mac you need to adjust your /etc/hosts file to include the following line:

127.0.0.1 qdrant

First install Python Dependencies:

You need to install uv if you want to use it for syncing the requirements.lock file. UV Installation.

uv sync

Load Demo Data

In src/agent/scripts use the load dummy data script to load some example data in the rag.

Start the complete system with:

docker compose up -d

To run the Qdrant Database local just run:

docker compose up qdrant

To run the Backend use this command in the root directory:

uv run uvicorn agent.api:app --reload


To run the tests you can use this command:

```bash
uv run coverage run -m pytest -o log_cli=true -vvv tests

Development Frontend

To run the Frontend use this command in the root directory:

uv run streamlit run frontend/assistant.py --theme.base="dark"

Qdrant API Key

To use the Qdrant API you need to set the correct parameters in the .env file. QDRANT_API_KEY is the API key for the Qdrant API. And you need to change it in the qdrant.yaml file in the config folder.

Testing the API

To Test the API i would recommend Bruno. The API Requests are store in ConvAgentBruno folder.

conversational-agent-langchain

About conversational-agent-langchain

Platforms

Languages

Links

README.md

Conversational RAG Agent

Table of Contents

LLMs and Backend Providers

Quickstart

Project Description

What is RAG?

Reranking Configuration

Tracing

Semantic Search

Hybrid Search

Architecture

Installation & Development Backend

Load Demo Data

Development Frontend

Qdrant API Key

Testing the API

Star History