
advanced-deep-research


About advanced-deep-research

Automated Deep Research with LLMs, web search, paper parsing, and didactic summarization.

Platforms

Web, Self-hosted

Languages

Python


🧠 Advanced Deep Research

Advanced Deep Research is an autonomous multi-agent research framework designed to emulate the way a human researcher investigates a topic in depth. It breaks complex queries down into actionable sub-questions, searches multiple sources in real time (the web, academic papers, and a local vector DB), and synthesizes the most relevant information into clear, didactic summaries.
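As a rough illustration of the decomposition step, the sketch below turns an LLM reply into a clean list of sub-questions. It assumes the model (Qwen 2.5 in this project) was prompted to answer with one sub-question per line, optionally numbered or bulleted; `parse_sub_questions` and its `max_questions` cap are hypothetical names, not code from the repository.

```python
import re

# Hypothetical pattern for list markers such as "1.", "2)", "-", "*" or "•".
_MARKER = re.compile(r"^\s*(?:\d+[.)]|[-*•])\s*")


def parse_sub_questions(llm_reply: str, max_questions: int = 5) -> list[str]:
    """Extract distinct sub-questions from an LLM reply (illustrative sketch).

    Assumes one sub-question per line; lines that do not end with "?"
    (such as a preamble) are ignored.
    """
    questions: list[str] = []
    seen: set[str] = set()
    for line in llm_reply.splitlines():
        text = _MARKER.sub("", line).strip()
        if not text.endswith("?"):
            continue  # skip blank lines and non-question chatter
        key = text.lower()
        if key not in seen:  # drop case-insensitive duplicates
            seen.add(key)
            questions.append(text)
        if len(questions) == max_questions:
            break
    return questions


reply = """Here are the sub-questions:
1. What is retrieval-augmented generation?
2) How are retrieved passages ranked?
- What is retrieval-augmented generation?
"""
print(parse_sub_questions(reply))
```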


🚀 Features

  • 🔍 Sub-question generation using a local LLM (Qwen 2.5)
  • 🌐 Web search via Brave Search, Google, or SerpAPI
  • 📄 Advanced content extraction from HTML and PDFs (with pymupdf4llm)
  • ✍️ Chunked summarization using facebook/bart-large-cnn (fine-tuned)
  • 🎯 Relevance filtering via jina-reranker-v2-base-multilingual (threshold: 0.5)
  • 🗂 Knowledge storage in a local vector DB (Qdrant)
  • 🤖 Reflective agent to determine when to stop searching
  • 📘 Final summarizer agent for clear, didactic answers
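The threshold-based relevance filter can be pictured roughly as follows. This minimal sketch assumes the reranker (jina-reranker-v2-base-multilingual in this project) has already assigned one relevance score per passage; `Scored`, `filter_relevant`, and the optional `top_k` cap are hypothetical, and whether the project compares with `>=` or `>` is an assumption. Only the 0.5 threshold comes from the feature list.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Scored:
    """A passage paired with the relevance score a reranker gave it (hypothetical type)."""
    text: str
    score: float


def filter_relevant(
    passages: list[Scored], threshold: float = 0.5, top_k: Optional[int] = None
) -> list[Scored]:
    """Keep passages scoring at or above `threshold`, best first."""
    kept = sorted(
        (p for p in passages if p.score >= threshold),
        key=lambda p: p.score,
        reverse=True,
    )
    return kept if top_k is None else kept[:top_k]


scored = [
    Scored("BART is a sequence-to-sequence model.", 0.91),
    Scored("An unrelated cooking tip.", 0.12),
    Scored("Qdrant stores embedding vectors.", 0.55),
]
for p in filter_relevant(scored):
    print(f"{p.score:.2f}  {p.text}")
```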

βš™οΈ Tech Stack

Component Technology/Model
LLM (main) Qwen 2.5 via vLLM (OpenAI-compatible API)
Embeddings jinaai/jina-embeddings-v3
Summarization facebook/bart-large-cnn
Re-ranker jinaai/jina-reranker-v2-base-multilingual
Vector Storage Qdrant
PDF Parsing pymupdf4llm
Web Search Brave API, Google (local), SerpAPI, Tavily
Backend FastAPI + Transformers(Hugging Face)
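facebook/bart-large-cnn accepts at most 1,024 input tokens, which is why long pages and PDFs have to be summarized in chunks. The sketch below shows one common map-reduce approach (summarize each chunk, then summarize the joined partial summaries). It approximates tokens with whitespace-separated words and takes the summarizer as a plain callable, so `chunk_words`, `summarize_long`, and the default window and overlap sizes are assumptions, not the project's actual code.

```python
from typing import Callable


def chunk_words(text: str, max_words: int = 700, overlap: int = 50) -> list[str]:
    """Split text into overlapping windows of at most `max_words` words.

    Words are only a rough proxy for BART's subword tokens, so the default
    stays well below the 1,024-token limit (an assumption).
    """
    if overlap >= max_words:
        raise ValueError("overlap must be smaller than max_words")
    words = text.split()
    step = max_words - overlap
    chunks: list[str] = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_words]))
        if start + max_words >= len(words):
            break  # this window already reaches the end of the text
    return chunks


def summarize_long(
    text: str,
    summarize: Callable[[str], str],
    max_words: int = 700,
    overlap: int = 50,
) -> str:
    """Map-reduce summarization: summarize each chunk, then the joined partials."""
    chunks = chunk_words(text, max_words, overlap)
    if not chunks:
        return ""
    if len(chunks) == 1:
        return summarize(chunks[0])
    partials = [summarize(chunk) for chunk in chunks]
    # Reduce step; assumes the partial summaries fit in one window.
    return summarize(" ".join(partials))
```

With Hugging Face Transformers installed, `summarize` could wrap `pipeline("summarization", model="facebook/bart-large-cnn")`, for example `lambda s: summarizer(s)[0]["summary_text"]`.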

📂 Project Structure

resumidor/
├── cache/                   # Caching utilities
├── config/                  # Configuration and environment handling
├── databases/               # DB integrations (e.g., Qdrant)
├── deep_searcher/           # Core loop for deep search
├── dockers/                 # Docker configurations
├── factory/                 # Model and service instantiation
├── llm/                     # LLM interaction logic (Qwen, etc.) and tools
├── management/              # Process managers / controllers
├── models/                  # Model loading and handling
├── parsers/                 # Web & PDF content parsers
├── prompt_engineering/      # Prompt templates
├── researchers/             # Research engines
├── schemas/                 # Pydantic schemas
├── server/                  # FastAPI server logic
└── tests/                   # Unit and integration tests

🧰 Installation

1. Clone the repository

git clone https://github.com/prodesk98/advanced-deep-research.git
cd advanced-deep-research

2. Set up environment variables

Copy .env.example to .env and set your keys:

cp .env.example .env

Fill in your credentials:

GOOGLE_SEARCH_ENGINE=local,brave,serpapi
CRAWLER_ENGINE=local,firecrawl
BRAVE_API_KEY=your_key
SERPAPI_KEY=your_key
FIRECRAWL_API_KEY=your_key
HF_TOKEN=your_huggingface_token
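To sanity-check these settings before starting the app, a small script along the lines below could help. It is a hypothetical helper, not part of the repository, and it rests on assumptions worth checking against `.env.example`: that `GOOGLE_SEARCH_ENGINE` and `CRAWLER_ENGINE` hold comma-separated engine names (the project may expect a single value), that each hosted engine needs the API key whose name matches it, and that `local` engines need no key. It reads the process environment, so export the variables or load `.env` first (for example with python-dotenv's `load_dotenv()`).

```python
import os
from collections.abc import Mapping

# Assumed engine -> API-key variable mapping; "local" engines need no key.
REQUIRED_KEYS = {
    "brave": "BRAVE_API_KEY",
    "serpapi": "SERPAPI_KEY",
    "firecrawl": "FIRECRAWL_API_KEY",
}
# Allowed values taken from the example .env above.
ALLOWED = {
    "GOOGLE_SEARCH_ENGINE": {"local", "brave", "serpapi"},
    "CRAWLER_ENGINE": {"local", "firecrawl"},
}


def check_env(env: Mapping[str, str] = os.environ) -> list[str]:
    """Return human-readable problems found in the given environment."""
    problems: list[str] = []
    for var, allowed in ALLOWED.items():
        engines = [e.strip() for e in env.get(var, "").split(",") if e.strip()]
        if not engines:
            problems.append(f"{var} is not set")
        for engine in engines:
            if engine not in allowed:
                problems.append(f"{var}: unknown engine {engine!r}")
            elif engine in REQUIRED_KEYS and not env.get(REQUIRED_KEYS[engine]):
                problems.append(f"{var}={engine} requires {REQUIRED_KEYS[engine]}")
    if not env.get("HF_TOKEN"):
        problems.append("HF_TOKEN is not set")
    return problems


if __name__ == "__main__":
    for problem in check_env():
        print("warning:", problem)
```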

3. Install dependencies

Using Poetry:

pip install poetry
poetry install

4. Download models

poetry run python -m download_cli

🐳 Docker Deployment

docker compose up -d

App runs at: http://localhost:8501


🧠 Research Pipeline (Simplified)

graph TD
    UI[User Interface] --> Q[User Question] --> SQ[Sub-questions]
    SQ --> WS[Search: Brave / Google / ArXiv]
    WS --> XT[Extract Content]
    XT --> SM[Summarize]
    SM --> RK[Re-rank Relevant Info]
    RK --> RF[Reflect: Is Answer Complete?]
    RF -- No --> SQ
    RF -- Yes --> DS[Didactic Final Summary]
    DS --> DB[Store in Vector DB]
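The loop in the diagram can be summarized in code. The sketch below is a simplified, hypothetical rendering of the control flow only: every stage is an injected callable (`generate`, `search`, `extract`, `summarize`, `rerank`, `is_complete`, `final_summary`, `store`), none of which are the repository's actual classes, and the `max_rounds` cap is an added safeguard that the diagram does not show.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Stages:
    """Hypothetical bundle of pipeline stages; each one is a plain callable."""
    generate: Callable[[str, list[str]], list[str]]  # question, notes -> sub-questions
    search: Callable[[str], list[str]]               # sub-question -> URLs
    extract: Callable[[str], str]                    # URL -> page or PDF text
    summarize: Callable[[str], str]                  # text -> summary
    rerank: Callable[[str, list[str]], list[str]]    # question, summaries -> relevant ones
    is_complete: Callable[[str, list[str]], bool]    # reflection: is the answer complete?
    final_summary: Callable[[str, list[str]], str]   # notes -> didactic answer
    store: Callable[[str, str], None]                # persist, e.g. to a vector DB


def deep_research(question: str, s: Stages, max_rounds: int = 3) -> str:
    """Iterate sub-question -> search -> extract -> summarize -> rerank until reflection says stop."""
    notes: list[str] = []
    for _ in range(max_rounds):
        # New sub-questions may depend on what has been learned so far.
        for sub_q in s.generate(question, notes):
            summaries = [s.summarize(s.extract(url)) for url in s.search(sub_q)]
            notes.extend(s.rerank(question, summaries))
        # Reflect: stop early once the notes are judged sufficient.
        if s.is_complete(question, notes):
            break
    answer = s.final_summary(question, notes)
    s.store(question, answer)  # the diagram stores the final summary last
    return answer
```

Passing each stage in as a callable keeps the loop independent of any particular engine or model, so a Brave-based `search` could be swapped for a SerpAPI-based one, and the whole loop can be exercised with stubs in tests.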

📌 Roadmap

  • [x] Sub-questioning + multi-source search
  • [x] ArXiv PDF extraction
  • [x] Chunked summarization with BART
  • [x] Reranker filtering (threshold-based)
  • [x] Reflective agent for iterative research
  • [x] Final summarizer for clarity
  • [x] CLI / Web Interface
  • [ ] Export to Markdown / PDF
  • [ ] Chrome/Firefox extension for contextual search

📜 License

MIT License


🤝 Contributing

Open issues, submit pull requests, or suggest improvements!
All contributions are welcome.