SynapseKit

About SynapseKit

Minimal, async-first Python framework for production LLM apps: 2 hard deps, no magic, no SaaS.

Published by

synapsekit

Website · Documentation · Quickstart · API Reference · Changelog · Discord · Report a Bug

Build production LLM apps with 2 dependencies. Async-native RAG, Agents, and Graph workflows — no magic, no SaaS, no bloat.

"LangChain for people who hate LangChain."

SynapseKit is the minimal, async-first Python framework for LLM applications. 33 providers · 48+ tools · 64 loaders · 22 vector stores. Every abstraction is plain Python you can read, debug, and extend. No hidden chains. No global state. No lock-in.

⚡ Async-native: Every API is `async/await` first. Sync wrappers for scripts and notebooks. No event loop surprises.
🌊 Streaming-first: Token-level streaming is the default, not an afterthought. Works across all providers.
🪶 Minimal footprint: 2 hard dependencies, `numpy` + `rank-bm25`. Everything else is optional. Install only what you use.
🔌 One interface: 33 LLM providers and 22 vector stores behind the same API. Swap without rewriting.
🧩 Composable: RAG pipelines, agents, and graph nodes are interchangeable. Wrap anything as anything.
🔍 Transparent: No hidden chains. Every step is plain Python you can read and override.
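
The async-first pattern with a sync wrapper can be illustrated with plain `asyncio`. This is a minimal sketch and not SynapseKit's code: `fake_stream`, `collect`, and `collect_sync` are hypothetical names, and the token source is a stand-in for a real provider.

```python
import asyncio
from typing import AsyncIterator


async def fake_stream(prompt: str) -> AsyncIterator[str]:
    """Hypothetical stand-in for a provider: yields one token at a time."""
    for token in prompt.split():
        await asyncio.sleep(0)  # hand control back to the loop, as a network read would
        yield token


async def collect(prompt: str) -> str:
    """Async-first entry point: consume the token stream and join the result."""
    tokens = [token async for token in fake_stream(prompt)]
    return " ".join(tokens)


def collect_sync(prompt: str) -> str:
    """Sync wrapper for scripts: runs the coroutine on a fresh event loop."""
    return asyncio.run(collect(prompt))


print(collect_sync("streaming is the default"))
```

`asyncio.run` raises `RuntimeError` when a loop is already running, as in Jupyter, so in a notebook the usual route is to `await collect(...)` directly.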

10-Line Agent Example

from synapsekit import agent, tool

@tool
def get_weather(city: str) -> str:
    """Get current weather for a city."""
    return f"Sunny, 22°C in {city}"

my_agent = agent(
    model="gpt-4o-mini",
    api_key="sk-...",
    tools=[get_weather],
)

print(my_agent.run("What's the weather in Tokyo?"))
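
The README does not describe what `@tool` does internally. One plausible ingredient is deriving a tool description from the function's name, docstring, and type hints. The sketch below shows that idea with the standard library only; `describe_tool` is a hypothetical helper, not part of SynapseKit.

```python
import inspect
from typing import Any, Callable


def describe_tool(fn: Callable[..., Any]) -> dict[str, Any]:
    """Build a tool description from a function's name, docstring, and signature."""
    signature = inspect.signature(fn)
    parameters = {
        # Use the annotation's class name when there is one, else its string form.
        name: getattr(param.annotation, "__name__", str(param.annotation))
        for name, param in signature.parameters.items()
    }
    return {
        "name": fn.__name__,
        "description": inspect.getdoc(fn) or "",
        "parameters": parameters,
    }


def get_weather(city: str) -> str:
    """Get current weather for a city."""
    return f"Sunny, 22°C in {city}"


spec = describe_tool(get_weather)
print(spec)
```

A description like this is what a function-calling model needs in order to choose a tool and fill in its arguments, which is why the docstring and type hints in the example above matter.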

SynapseKit vs LangChain vs LlamaIndex

	SynapseKit	LangChain	LlamaIndex
Hard dependencies	2	50+	20+
Install size	~5 MB	~200 MB+	~100 MB+
Async-native	✅ Default	⚠️ Partial	⚠️ Partial
Streaming	✅ Default	⚠️ Varies	⚠️ Varies
Cost tracking	✅ Built-in	❌ LangSmith (SaaS)	❌ No
Evaluation / EvalCI	✅ CLI + GitHub Action	❌ LangSmith (SaaS)	⚠️ Built-in
Graph workflows	✅ Built-in	⚠️ LangGraph (separate pkg)	❌ No
Agent federation	✅ Built-in	❌ No	❌ No
Reasoning LLMs	✅ Unified adapter	⚠️ Manual	⚠️ Manual
Structured output	✅ Provider-agnostic	⚠️ Provider-specific	⚠️ Provider-specific
Agent memory backends	✅ 4 built-in	⚠️ Community plugins	⚠️ Community plugins
Observability	✅ Prometheus + Grafana	❌ No	❌ No
Type safety	✅ Strict dataclasses	⚠️ Partial	⚠️ Partial
LLM providers	33	38+	20+
Stack traces	Your code	Framework internals	Framework internals
License	Apache 2.0	MIT	MIT

LangChain has more raw integrations and more tutorials. That's not what SynapseKit is optimizing for. SynapseKit is optimizing for the engineer who needs to ship, debug, and maintain an LLM feature in production — where readable code, predictable async behavior, and no surprise SaaS bills actually matter.

Who is it for?

SynapseKit is for Python developers who want to ship LLM features without fighting their framework.

Burned LangChain users — hit a wall with debugging, dependency hell, or version churn and want full control back
Async backend engineers — building FastAPI services where LangChain's sync-first model feels bolted on
Cost-conscious teams — startups and teams who don't want a LangSmith subscription for basic observability
ML engineers — building RAG or agent pipelines who need full control over retrieval, prompting, and tool use

What it covers

🗂 RAG Pipelines: Retrieval-augmented generation with streaming, BM25 reranking, conversation memory, and token tracing. Load from PDFs, URLs, CSVs, HTML, directories, and more.
🤖 Agents: ReAct loop (any LLM) and native function calling (OpenAI / Anthropic / Gemini / Mistral). 48 built-in tools including calculator, Python REPL, code interpreter, web search, SQL, HTTP, shell, Twilio, arXiv, PubMed, Wolfram Alpha, Wikipedia, and more. Fully extensible.
🔀 Graph Workflows: DAG-based async pipelines. Nodes run in waves: parallel nodes execute concurrently. Conditional routing, typed state with reducers, fan-out/fan-in, SSE streaming, event callbacks, human-in-the-loop, checkpointing, and Mermaid export.
🧠 LLM Providers: OpenAI, Anthropic, Ollama, Gemini, Cohere, Mistral, Bedrock, Azure OpenAI, Groq, DeepSeek, OpenRouter, Together, Fireworks, Cerebras, Cloudflare, Moonshot, Perplexity, Vertex AI, Zhipu, AI21 Labs, Databricks, Baidu ERNIE, llama.cpp, LM Studio, Minimax, Aleph Alpha, Hugging Face, SambaNova, xAI, NovitaAI, Writer, all behind one interface. Auto-detected from the model name. Swap without rewriting.
🗄 Vector Stores: InMemory (built-in, `.npz` persistence), ChromaDB, FAISS, Qdrant, Pinecone, Weaviate, PGVector, Milvus, LanceDB, SQLiteVec, MongoDB Atlas, Redis, Elasticsearch, OpenSearch, Supabase, Cassandra, DuckDB, ClickHouse, Marqo, Typesense, Vespa, Zilliz. One interface for all 22 backends.
🔧 Utilities: Output parsers (JSON, Pydantic, List), prompt templates (standard, chat, few-shot), token tracing with cost estimation.
🧠 Reasoning LLMs (new in v1.7.0): `ReasoningLLM` unified adapter for o1/o3, Claude thinking, Gemini thinking, DeepSeek R1, and Qwen QwQ. Returns `ReasoningResponse` with answer, thinking trace, and token breakdown. `stream()` yields `ReasoningStreamChunk` with `is_thinking` flag.
⚖️ Cost-Quality Routing (new in v1.7.0): `CostQualityRouter` explores candidates round-robin then exploits the cheapest model meeting your quality threshold. Tracks Pareto frontier of cost vs quality. Optional `budget_per_call_usd` hard cap.
🎯 Prompt Optimization (new in v1.7.0): `PromptOptimizer` scores prompt variants against an `@eval_case` suite and returns the best `PromptCandidate`. Supports LLM-generated variants or manual lists. Budget-aware early stopping.
🌐 Federated Retrieval (new in v1.7.0): `FederatedRetriever` fans out to multiple local retrievers and remote HTTP endpoints in parallel. RRF, normalised score fusion, or round-robin interleave. Near-duplicate dedup, per-source timeouts.
🧠 Smart Context Manager (new): `SmartContextManager` manages context windows hierarchically: static system prompt → running summary → search results → recent messages. Injects Anthropic `cache_control` tags on system and summary blocks automatically, cutting repeated-call costs by up to 80%. Sliding window prunes and summarises older turns via a cheap LLM. `pip install synapsekit[anthropic]`.
✅ Structured Output (new): `StructuredOutput` wraps any LLM and validates its response against a Pydantic v2 model. Retries with a corrective prompt on JSON or schema failures, with configurable backoff and optional fallback provider. Streaming support via `IncrementalJSONBuffer`, which detects complete JSON mid-stream and validates immediately.
🕸 Agent Federation (new): `AgentFederation` routes prompts across a registry of agents using round-robin, capacity-aware, or cost-aware strategies. `InMemoryAgentRegistry` and `RedisAgentRegistry` track agents with heartbeat-based health checks and stale pruning. Tag and tool-based discovery filters. `LocalAgentClient` for in-process agents, custom `AgentClient` for remote. `pip install synapsekit[redis]` for Redis registry.
🔁 Continuous Fine-Tuning Pipeline (new): `ContinuousTrainer` closes the loop from production feedback to a deployed fine-tuned model. `FeedbackCollector` batches samples async; `TrainingDataGenerator` exports JSONL and preference pairs; `OpenAIFineTuneProvider` / `AnthropicFineTuneProvider` submit and poll jobs; `ABTestRouter` sticky-routes traffic by SHA-256 bucket; `AutoRolloutManager` stages rollout with latency/cost/quality regression guards; `CostBenefitAnalyzer` projects ROI and payback days. `pip install synapsekit[training]`.
⚡ Performance suite (new in v1.7.0): `orjson` fast JSON across all hot paths · `uvloop` event loop · `xxhash` cache key hashing (5–10× faster) · pre-allocated vector buffer (O(1) amortised inserts) · vectorised MMR · `__slots__` on hot classes · optional Rust extension for chunking and hashing. Install with `pip install synapsekit[performance]`.
🧪 EvalCI (LLM Quality Gates): GitHub Action that runs `@eval_case` suites on every PR and blocks merge if quality drops. No infrastructure, 2-minute setup. Score, cost, and latency tracked per case. Works with any LLM provider. → GitHub Marketplace · Docs
📊 Agent Benchmarking: Evaluate your agents against industry-standard benchmarks like GAIA, SWE-bench, WebArena, and AgentBench directly from the CLI. Generate leaderboards to compare performance across tasks.
🧪 EvalHub Community Suites: Run shared community eval suites with `synapsekit bench` and compare aggregate score against baseline.
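
The wave-based execution described under Graph Workflows can be sketched with `asyncio.gather`. This is an illustration of the general technique under stated assumptions, not SynapseKit's scheduler: `run_in_waves`, the node functions, and the naive state merge are all hypothetical.

```python
import asyncio
from typing import Awaitable, Callable

State = dict[str, int]
Node = Callable[[State], Awaitable[State]]


async def run_in_waves(nodes: dict[str, Node], deps: dict[str, set[str]], state: State) -> list[list[str]]:
    """Run a DAG wave by wave; nodes whose dependencies are done run concurrently."""
    done: set[str] = set()
    waves: list[list[str]] = []
    while len(done) < len(nodes):
        ready = sorted(n for n in nodes if n not in done and deps.get(n, set()) <= done)
        if not ready:
            raise ValueError("cycle or missing dependency in graph")
        # Each node gets a snapshot of the state and returns a partial update.
        updates = await asyncio.gather(*(nodes[n](dict(state)) for n in ready))
        for update in updates:
            state.update(update)  # naive merge; a reducer would resolve key conflicts
        done.update(ready)
        waves.append(ready)
    return waves


async def load(state: State) -> State:
    return {"docs": 3}


async def embed(state: State) -> State:
    return {"vectors": state["docs"] * 2}


async def summarize(state: State) -> State:
    return {"summary_len": state["docs"] + 1}


async def merge(state: State) -> State:
    return {"total": state["vectors"] + state["summary_len"]}


NODES = {"load": load, "embed": embed, "summarize": summarize, "merge": merge}
DEPS = {"embed": {"load"}, "summarize": {"load"}, "merge": {"embed", "summarize"}}

state: State = {}
waves = asyncio.run(run_in_waves(NODES, DEPS, state))
print(waves, state["total"])
```

Here `embed` and `summarize` share a wave because both depend only on `load`, which is the fan-out/fan-in shape the feature list mentions.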

ReasoningAgent (automatic routing)

import asyncio

from synapsekit import ReasoningAgent, ReasoningAgentConfig
from synapsekit.agents.tools import CalculatorTool
from synapsekit.llm import LLMConfig, OpenAILLM, ReasoningLLM

fast = OpenAILLM(
    LLMConfig(model="gpt-4o-mini", api_key="sk-...", provider="openai")
)
reasoning = ReasoningLLM(model="o3", api_key="sk-...")

agent = ReasoningAgent(
    ReasoningAgentConfig(
        fast_llm=fast,
        reasoning_llm=reasoning,
        tools=[CalculatorTool()],
        agent_type="function_calling",
    )
)


async def main():
    answer = await agent.run("Solve: find the eigenvalues of [[2,1],[1,2]]")
    print(answer)


asyncio.run(main())

EvalHub quick usage

synapsekit bench --list
synapsekit bench --suite community/customer-support --model gpt-4o-mini
synapsekit bench --publish my_evals/ --name myorg/rag-finance

Docs: docs/evalhub.md

Integrations

One interface. 190+ integrations. Zero lock-in.

🧠 LLM Providers	🗄 Vector Stores	📂 Data Loaders	🔧 Agent Tools
33	22	64	48+

Every integration is pip install synapsekit[name] — nothing else. Swap providers, vector stores, or loaders without touching your application code.

Icons use Google Favicons for reliability across light and dark themes.

🧠 LLM Providers — 33 supported

Every provider implements the same BaseLLM interface. Auto-detected from model name — gpt-4o → OpenAI, claude-* → Anthropic, gemini-* → Google. Swap without rewriting.

OpenAI	Anthropic	Gemini	Azure OpenAI	AWS Bedrock	Vertex AI	Mistral	Cohere
Groq	Hugging Face	Cloudflare	Databricks	Perplexity	Replicate	xAI (Grok)	Baidu ERNIE
DeepSeek	Ollama	Together AI	OpenRouter	Fireworks AI	Cerebras	SambaNova	NovitaAI
Writer	AI21 Labs	Aleph Alpha	Minimax	Moonshot	Zhipu	LM Studio	llama.cpp
vLLM	GPT4All
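
Auto-detection from the model name can be pictured as a pattern lookup. The sketch below is an assumption about how such a lookup might work, not the library's code: only the three mappings named above come from the README, the `gpt-*` wildcard is a generalisation of the `gpt-4o` example, and `detect_provider` is a hypothetical function.

```python
from fnmatch import fnmatchcase

# Hypothetical pattern table built from the three examples in the text.
PROVIDER_PATTERNS = [
    ("gpt-*", "openai"),
    ("claude-*", "anthropic"),
    ("gemini-*", "google"),
]


def detect_provider(model: str) -> str:
    """Return the first provider whose pattern matches the model name."""
    for pattern, provider in PROVIDER_PATTERNS:
        if fnmatchcase(model.lower(), pattern):
            return provider
    raise ValueError(f"cannot infer provider for model {model!r}; pass it explicitly")


for name in ("gpt-4o", "claude-3-5-sonnet", "gemini-1.5-pro"):
    print(name, "->", detect_provider(name))
```

Raising on an unknown name instead of guessing keeps a typo from being silently routed to the wrong provider.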

🗄 Vector Stores — 22 backends

All implement VectorStore with add(), search(), search_mmr(), save(), and load(). Built-in InMemoryVectorStore needs zero extra deps. Everything else is pip install synapsekit[name].

ChromaDB	FAISS	Qdrant	Pinecone	Weaviate	Milvus	LanceDB	PGVector
SQLiteVec	MongoDB Atlas	Redis	Elasticsearch	OpenSearch	Supabase	Cassandra	DuckDB
ClickHouse	Marqo	Typesense	Vespa	Zilliz
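
`search_mmr()` presumably refers to maximal marginal relevance, which trades relevance to the query against redundancy with results already picked. The sketch below implements the textbook form in pure Python. It is not the library's vectorised implementation, and `mmr`, `cosine`, and `lambda_mult` are hypothetical names.

```python
import math
from typing import Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def mmr(query: Vector, docs: Sequence[Vector], k: int, lambda_mult: float = 0.5) -> list[int]:
    """Return indices of up to k docs chosen by maximal marginal relevance."""
    selected: list[int] = []
    candidates = list(range(len(docs)))
    while candidates and len(selected) < k:

        def score(i: int) -> float:
            relevance = cosine(query, docs[i])
            # Similarity to the closest already-selected doc (0.0 for the first pick).
            redundancy = max((cosine(docs[i], docs[j]) for j in selected), default=0.0)
            return lambda_mult * relevance - (1 - lambda_mult) * redundancy

        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return selected


query = [1.0, 1.0]
docs = [[1.0, 0.9], [1.0, 0.8], [0.2, 1.0]]  # docs 0 and 1 are near-duplicates
print(mmr(query, docs, k=2))
print(mmr(query, docs, k=2, lambda_mult=1.0))
```

With `lambda_mult=1.0` the redundancy term vanishes and the result is a plain relevance ranking; lower values favour diversity, so the near-duplicate is skipped.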

📂 Data Loaders — 64 sources

All return list[Document] with .text and .metadata. Every loader has a sync .load() and async .aload(). Load from disk, cloud, databases, or APIs — same interface everywhere.
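
The loader contract above (`list[Document]`, sync `.load()`, async `.aload()`) can be sketched without the library. Everything here is a stand-in: this `Document` dataclass only mirrors the `.text` and `.metadata` fields named above, and `TextFileLoader` is a hypothetical loader, not one of the 64.

```python
import asyncio
import tempfile
from dataclasses import dataclass, field
from pathlib import Path
from typing import Any


@dataclass
class Document:
    """Stand-in for the library's Document: text plus free-form metadata."""
    text: str
    metadata: dict[str, Any] = field(default_factory=dict)


class TextFileLoader:
    """Hypothetical loader: one Document per non-empty paragraph of a text file."""

    def __init__(self, path: str) -> None:
        self.path = Path(path)

    def load(self) -> list[Document]:
        raw = self.path.read_text(encoding="utf-8")
        paragraphs = [p.strip() for p in raw.split("\n\n") if p.strip()]
        return [
            Document(text=p, metadata={"source": str(self.path), "index": i})
            for i, p in enumerate(paragraphs)
        ]

    async def aload(self) -> list[Document]:
        # Run the blocking read in a worker thread so the event loop stays free.
        return await asyncio.to_thread(self.load)


with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "notes.txt"
    sample.write_text("first paragraph\n\nsecond paragraph\n", encoding="utf-8")
    docs = TextFileLoader(str(sample)).load()
    docs_async = asyncio.run(TextFileLoader(str(sample)).aload())

print([d.text for d in docs])
```

Delegating `aload()` to `load()` through `asyncio.to_thread` is one simple way to offer both entry points; a loader backed by an async client would more likely do the reverse.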

File Formats

PDF	Word (DOCX)	Excel (XLSX)	PowerPoint	HTML / XML	Markdown	LaTeX	YAML / JSON
Parquet	Audio (Whisper)	Video	RSS / Sitemap	Git Repo

Cloud Storage

AWS S3	Google Drive	Azure Blob	OneDrive	Dropbox	Google Cloud

Databases

PostgreSQL	MySQL	MongoDB	DynamoDB	Elasticsearch	Redis	BigQuery	Snowflake
SQLite	Supabase

APIs & Productivity

GitHub	Jira	Confluence	Notion	Slack	Discord	HubSpot	Salesforce
Airtable	YouTube	Reddit	Wikipedia	Obsidian	Google Sheets	Firebase	Twilio
arXiv	PubMed	Email (IMAP)

🔧 Agent Tools — 48+ built-in

All implement BaseTool with a single async run(). Pass any list of tools to ReActAgent or FunctionCallingAgent. Write your own in 5 lines.

DuckDuckGo	Google Search	Tavily	Wolfram Alpha	Wikipedia	YouTube	arXiv	PubMed
Slack	Discord	GitHub API	Jira	Notion	Linear	Stripe	Twilio
Google Calendar	AWS Lambda	Browser (Playwright)	SQL Query	Python REPL	Shell
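
A class-based tool with a single async `run()` might look roughly like the sketch below. The `BaseTool` here is a local stand-in written for this example; the real base class may require different attributes or a different `run()` signature. `WordCountTool` and `dispatch` are hypothetical.

```python
import asyncio
from abc import ABC, abstractmethod


class BaseTool(ABC):
    """Local stand-in for the library's base class; the real one may differ."""
    name: str
    description: str

    @abstractmethod
    async def run(self, **kwargs: str) -> str:
        """Execute the tool and return its result as text."""


class WordCountTool(BaseTool):
    name = "word_count"
    description = "Count the words in a piece of text."

    async def run(self, text: str = "", **kwargs: str) -> str:
        return str(len(text.split()))


async def dispatch(tools: list[BaseTool], tool_name: str, **kwargs: str) -> str:
    """Look up a tool by name and await it, as an agent loop would."""
    by_name = {tool.name: tool for tool in tools}
    if tool_name not in by_name:
        raise KeyError(f"unknown tool: {tool_name}")
    return await by_name[tool_name].run(**kwargs)


result = asyncio.run(dispatch([WordCountTool()], "word_count", text="ship debug maintain"))
print(result)
```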

🧠 Memory & Cache Backends

SQLite	Redis	PostgreSQL	DynamoDB	Memcached

📡 Observability

OpenTelemetry	Prometheus	Grafana

PrometheusMetrics records synapsekit_cost_usd_total, synapsekit_tokens_total, and synapsekit_latency_seconds per model/provider. Hooks into the existing observe span pipeline — no code changes needed. Helm chart for a Prometheus + Grafana stack ships in assets/helm/synapsekit-observability/. pip install synapsekit[observe].

Multi-Hop Knowledge Graph RAG

SynapseKit provides advanced retrieval modules, including vector search and multi-hop Knowledge Graph (KG) retrieval.

When to use which?

Vector Search (Semantic): Best for broad conceptual queries, finding similar passages, or answering questions whose answers sit within a single chunk of text.
Knowledge Graph (KG): Best for specific, multi-hop reasoning questions where the relationships span multiple documents (e.g., finding out who owns the parent company of a subsidiary).
Hybrid (Vector + KG): Combining both strategies captures semantic context while also following explicitly extracted entity relationships. Initialize the RAG facade with graph_store=NetworkXStore() or Neo4jStore(...) to enable this out of the box.
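
The README does not say how hybrid mode merges vector and graph results. Reciprocal rank fusion (RRF), which the feature list names as one option for `FederatedRetriever`, is a common way to combine ranked lists, so the sketch below uses it purely as an illustration. The function name, the document IDs, and the choice of `k=60` (a conventional default for RRF) are assumptions.

```python
from collections import defaultdict
from typing import Sequence


def reciprocal_rank_fusion(rankings: Sequence[Sequence[str]], k: int = 60) -> list[str]:
    """Fuse ranked lists: each doc scores sum(1 / (k + rank)) over the lists it appears in."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest score first; ties broken by doc id for a stable order.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


vector_hits = ["doc_a", "doc_b", "doc_c"]  # hypothetical semantic search results
graph_hits = ["doc_c", "doc_a", "doc_d"]   # hypothetical multi-hop KG results
fused = reciprocal_rank_fusion([vector_hits, graph_hits])
print(fused)
```

Documents found by both retrievers rise to the top because their reciprocal ranks add up, while documents seen by only one retriever still appear further down.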

Production RAG ROI

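import asyncio
import os

from synapsekit import RAG, RAGEvaluator, SlackWebhookAlertSink
from synapsekit.cli.ui_server import create_app
from synapsekit.llm import LLMConfig, OpenAILLM

# Assumption: the webhook URL is read from the environment.
SLACK_WEBHOOK_URL = os.environ["SLACK_WEBHOOK_URL"]
# Assumption: the judge is built like the OpenAILLM in the ReasoningAgent example.
judge_llm = OpenAILLM(
    LLMConfig(model="gpt-4o-mini", api_key="sk-...", provider="openai")
)

rag = RAG(
    model="gpt-4o-mini",
    api_key="sk-...",
    evaluator=RAGEvaluator(
        judge_llm=judge_llm,  # a cheaper judge model
        sample_rate=0.1,
        alert_sinks=[SlackWebhookAlertSink(webhook_url=SLACK_WEBHOOK_URL)],
    ),
)

app = create_app(tracer=rag.tracer, rag_evaluator=rag.evaluator)


async def main():
    answer = await rag.ask("What changed in the release notes?")
    # Let pending evaluations finish before reading the metrics.
    await rag.wait_for_evaluations()

    metrics = rag.tracer.summary()
    print(metrics["avg_rag_benefit_to_cost"])
    print(metrics["total_rag_alerts"])


asyncio.run(main())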

Don't see your stack? Every integration is built the same way — most take under an hour. Browse good first issue → · Contributing guide → · Discord →

We credit every contributor in the README and send a personal thank-you on Discord.

Install

pip

# Quotes keep shells such as zsh from treating the brackets as a glob pattern
pip install "synapsekit[openai]"       # OpenAI
pip install "synapsekit[anthropic]"    # Anthropic + prompt caching
pip install "synapsekit[ollama]"       # Ollama (local)
pip install "synapsekit[performance]"  # orjson + uvloop + xxhash (faster)
pip install "synapsekit[observe]"      # OpenTelemetry + Prometheus metrics
pip install "synapsekit[training]"     # Continuous fine-tuning pipeline
pip install "synapsekit[bench]"        # pytest-benchmark + ASV harness
pip install "synapsekit[redis]"        # Redis agent registry + memory backends
pip install "synapsekit[all]"          # Everything

uv

uv add "synapsekit[openai]"
uv add "synapsekit[all]"

Poetry

poetry add "synapsekit[openai]"
poetry add "synapsekit[all]"

Full installation options → docs

Observability guide → docs/observability.md

Documentation

Everything you need to get started and go deep is in the docs.


🚀 Quickstart	Up and running in 5 minutes
🗂 RAG	Pipelines, loaders, retrieval, vector stores
🤖 Agents	ReAct, function calling, tools, executor
🔀 Graph Workflows	DAG pipelines, conditional routing, parallel execution
🧠 LLM Providers	All 33 providers + ReasoningLLM with examples
🧪 EvalCI	LLM quality gates on every PR — GitHub Action
📖 API Reference	Full class and method reference

Development

git clone https://github.com/SynapseKit/SynapseKit
cd SynapseKit
uv sync --group dev
uv run pytest tests/ -q

Contributing

Contributions are welcome — bug reports, documentation fixes, new providers, new features.

Read CONTRIBUTING.md to get started. Look for issues tagged good first issue if you're new.

Community

💬 Discord — chat, help, show and tell
💬 Discussions — ask questions, share ideas
🧭 Discord roles draft — proposed roles and permissions for issue #389
🧭 Discord release webhook draft — automate release announcements for issue #390
🐛 Bug reports
💡 Feature requests
🔒 Security policy

Contributors

Nautiverse	💻 📖 🚧
Gordienko Andrey	💻
Deepak singh	💻
by22Jy	💻
Arjun Kundapur	💻
Harshit Gupta	📖
Dhruv Garg	💻
Adam Silva	💻
qorex	💻
Abhay Krishna	💻
AYUSH BHATT	💻
HARSH	📖
mikemolinet	💻 🐛
Alessandro Mecca	💻 🐛

License

Apache 2.0