π§ Advanced Deep Research
Advanced Deep Research is an autonomous multi-agent research framework designed to simulate a human-level deep researcher. It breaks down complex queries into actionable sub-questions, performs real-time searches across multiple sources (web, papers, and local vector DB), and synthesizes the most relevant information into clear, didactic summaries.
π Features
- π Sub-question generation using a local LLM (Qwen 2.5)
- π Web search via Brave Search, Google, or SerpAPI
- π Advanced content extraction from HTML and PDFs (with
pymupdf4llm) - βοΈ Chunked summarization using
facebook/bart-large-cnn(fine-tuned) - π― Relevance filtering via
jina-reranker-v2-base-multilingual(threshold: 0.5) - π Knowledge storage in a local vector DB (Qdrant)
- π€ Reflective agent to determine when to stop searching
- π Final summarizer agent for clear, didactic answers
βοΈ Tech Stack
| Component | Technology/Model |
|---|---|
| LLM (main) | Qwen 2.5 via vLLM (OpenAI-compatible API) |
| Embeddings | jinaai/jina-embeddings-v3 |
| Summarization | facebook/bart-large-cnn |
| Re-ranker | jinaai/jina-reranker-v2-base-multilingual |
| Vector Storage | Qdrant |
| PDF Parsing | pymupdf4llm |
| Web Search | Brave API, Google (local), SerpAPI, Tavily |
| Backend | FastAPI + Transformers(Hugging Face) |
π Project Structure
resumidor/
βββ cache/ # Caching utilities
βββ config/ # Configuration and environment handling
βββ databases/ # DB integrations (e.g., Qdrant)
βββ deep_searcher/ # Core loop for deep search
βββ dockers/ # Docker configurations
βββ factory/ # Model and service instantiation
βββ llm/ # LLM interaction logic (Qwen, etc.) and tools
βββ management/ # Process managers / controllers
βββ models/ # Model loading and handling
βββ parsers/ # Web & PDF content parsers
βββ prompt_engineering/ # Prompt templates
βββ researchers/ # Research engines
βββ schemas/ # Pydantic schemas
βββ server/ # FastAPI server logic
βββ tests/ # Unit and integration tests
π§° Installation
1. Clone the repository
git clone https://github.com/prodesk98/advanced-deep-research.git
cd advanced-deep-research
2. Setup environment variables
Copy .env.example to .env and set your keys:
cp .env.example .env
Fill in your credentials:
GOOGLE_SEARCH_ENGINE=local,brave,serpapi
CRAWLER_ENGINE=local,firecrawl
BRAVE_API_KEY=your_key
SERPAPI_KEY=your_key
FIRECRAWL_API_KEY=your_key
HF_TOKEN=your_huggingface_token
3. Install dependencies
Using Poetry:
pip install poetry
poetry install
4. Download models
poetry run python -m download_cli.py
π³ Docker Deployment
docker compose up -d
App runs at: http://localhost:8501
π§ Research Pipeline (Simplified)
graph TD
UI[User Interface] --> Q[User Question] --> SQ[Sub-questions]
SQ --> WS[Search: Brave / Google / ArXiv]
WS --> XT[Extract Content]
XT --> SM[Summarize]
SM --> RK[Re-rank Relevant Info]
RK --> RF[Reflect: Is Answer Complete?]
RF -- No --> SQ
RF -- Yes --> DS[Didactic Final Summary]
DS --> DB[Store in Vector DB]
π Roadmap
- [x] Sub-questioning + multi-source search
- [x] ArXiv PDF extraction
- [x] Chunked summarization with BART
- [x] Reranker filtering (threshold-based)
- [x] Reflective agent for iterative research
- [x] Final summarizer for clarity
- [x] CLI / Web Interface
- [ ] Export to Markdown / PDF
- [ ] Chrome/Firefox extension for contextual search
π License
MIT License
π€ Contributing
Open issues, submit pull requests, or suggest improvements!
All contributions are welcome.