Helicone AI Gateway
The fastest, lightest, and easiest-to-integrate AI Gateway on the market.
Built by the team at Helicone, open-sourced for the community.
Quick Start • Docs • Discord • Website
1 API. 100+ models.
Open-source, lightweight, and built in Rust.
Handle hundreds of models and millions of LLM requests with minimal latency and maximum reliability.
The NGINX of LLMs.
Set up in seconds
With the cloud-hosted AI Gateway:
from openai import OpenAI
client = OpenAI(
api_key="YOUR_HELICONE_API_KEY",
base_url="https://ai-gateway.helicone.ai/ai",
)
completion = client.chat.completions.create(
model="openai/gpt-4o-mini", # or 100+ models
messages=[
{
"role": "user",
"content": "Hello, how are you?"
}
]
)
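The gateway speaks the OpenAI API, so you are not tied to the OpenAI SDK. The sketch below builds the same request the Python SDK would make for the example above, without sending it: a POST to the base URL plus `/chat/completions` with a Bearer token. It assumes the cloud gateway accepts the standard OpenAI chat-completions JSON body; `build_chat_request` is a hypothetical helper, not part of any Helicone library.

```python
import json
import urllib.request

# Cloud gateway base URL from the quick-start example above.
GATEWAY_BASE_URL = "https://ai-gateway.helicone.ai/ai"


def build_chat_request(api_key: str, model: str, prompt: str,
                       base_url: str = GATEWAY_BASE_URL) -> urllib.request.Request:
    """Build, but do not send, an OpenAI-style chat completion request (hypothetical helper)."""
    body = {
        "model": model,  # provider-prefixed id, e.g. "openai/gpt-4o-mini"
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/chat/completions",  # same path the OpenAI SDK appends
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


if __name__ == "__main__":
    req = build_chat_request("YOUR_HELICONE_API_KEY", "openai/gpt-4o-mini",
                             "Hello, how are you?")
    print(req.get_method(), req.full_url)
    # Sending it would be: json.load(urllib.request.urlopen(req))
```

Actually sending the request needs a valid Helicone API key and network access; assuming the gateway returns the usual OpenAI response shape, the reply text would be under `choices[0].message.content`.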
For custom configuration, see our configuration guide and the list of providers we support.
Why Helicone AI Gateway?
Unified interface
Call any LLM provider using familiar OpenAI syntax. Stop rewriting integrations: use one API for OpenAI, Anthropic, Google, AWS Bedrock, and 20+ more providers.
Smart provider selection
Smart routing always targets the fastest, cheapest, or most reliable option while staying aware of provider uptime and your rate limits. Built-in strategies include model-based latency routing (fastest model), provider latency-based P2C + PeakEWMA (fastest provider), weighted distribution (based on model weight), and cost optimization (cheapest option).
Control your spending
Rate limit to prevent runaway costs and usage abuse. Set limits per user, team, or globally with support for request counts, token usage, and dollar amounts.
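A common way to implement limits like these is a token bucket per key. The sketch below is an illustration under that assumption, not the gateway's actual limiter: each key (API key, user, or team) gets a bucket of `capacity` units that refills continuously over `refill_s` seconds. The `capacity: 1000` / `refill-frequency: 1m` pair in the self-hosted sample config reads naturally as `PerKeyLimiter(1000, 60.0)`, but that mapping is also an assumption. Passing a `cost` other than 1 shows how the same bucket could meter tokens or cents instead of requests. `TokenBucket` and `PerKeyLimiter` are hypothetical names.

```python
from __future__ import annotations

import time


class TokenBucket:
    """Allows up to `capacity` units, refilled continuously over `refill_s` seconds."""

    def __init__(self, capacity: float, refill_s: float) -> None:
        self.capacity = capacity
        self.rate = capacity / refill_s  # units added back per second
        self.tokens = float(capacity)    # start full
        self.last: float | None = None   # time of the previous call

    def try_acquire(self, cost: float = 1.0, now: float | None = None) -> bool:
        now = time.monotonic() if now is None else now
        if self.last is not None:
            elapsed = max(now - self.last, 0.0)
            # Refill for the time that passed, never beyond capacity.
            self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False


class PerKeyLimiter:
    """One independent bucket per key (API key, user id, or team id), created lazily."""

    def __init__(self, capacity: float, refill_s: float) -> None:
        self.capacity = capacity
        self.refill_s = refill_s
        self.buckets: dict[str, TokenBucket] = {}

    def allow(self, key: str, cost: float = 1.0, now: float | None = None) -> bool:
        if key not in self.buckets:
            self.buckets[key] = TokenBucket(self.capacity, self.refill_s)
        return self.buckets[key].try_acquire(cost, now)


if __name__ == "__main__":
    # Roughly "1000 requests per minute, per API key".
    limiter = PerKeyLimiter(capacity=1000, refill_s=60.0)
    print(limiter.allow("team-a-key"))
```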
Improve performance
Cache responses to reduce costs and latency by up to 95%. Supports Redis and S3 backends with intelligent cache invalidation.
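The self-hosted sample config later in this README sets a cache directive of `"max-age=3600, max-stale=1800"`. The sketch below shows one plausible reading, borrowed from HTTP Cache-Control semantics: an entry is fresh for `max-age` seconds and may still be served for up to `max-stale` more. How the gateway really keys and expires entries is not specified here, so the SHA-256 over canonical JSON in `cache_key` and the three-state classification are assumptions; all function names are hypothetical.

```python
from __future__ import annotations

import hashlib
import json


def parse_directive(directive: str) -> dict[str, int]:
    """Parse 'max-age=3600, max-stale=1800' into {'max-age': 3600, 'max-stale': 1800}."""
    out: dict[str, int] = {}
    for part in directive.split(","):
        name, sep, value = part.strip().partition("=")
        if sep and value.strip().isdigit():
            out[name.strip().lower()] = int(value)
    return out


def cache_key(request_body: dict) -> str:
    """Identical request bodies (regardless of key order) map to the same key."""
    canonical = json.dumps(request_body, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def cache_state(age_s: float, directive: str) -> str:
    """Classify a cached entry of the given age as 'fresh', 'stale-ok', or 'expired'."""
    d = parse_directive(directive)
    max_age = d.get("max-age", 0)
    max_stale = d.get("max-stale", 0)
    if age_s <= max_age:
        return "fresh"     # serve from cache
    if age_s <= max_age + max_stale:
        return "stale-ok"  # assumed: still servable, but due for refresh
    return "expired"       # go to the provider
```

With the sample directive, a 4,000-second-old entry falls in the stale window (3,600 < 4,000 ≤ 5,400), while a 6,000-second-old entry is expired.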
Simplified tracing
Monitor performance and debug issues with built-in Helicone integration, plus OpenTelemetry support for logs, metrics, and traces.
One-click deployment
Use our cloud-hosted AI Gateway, or deploy it to your own infrastructure in seconds with Docker or by following any of our deployment guides.
https://github.com/user-attachments/assets/ed3a9bbe-1c4a-47c8-98ec-2bb4ff16be1f
Scalable for production
| Metric | Helicone AI Gateway | Typical Setup |
|---|---|---|
| P95 Latency | <5ms | ~60-100ms |
| Memory Usage | ~64MB | ~512MB |
| Requests/sec | ~3,000 | ~500 |
| Binary Size | ~30MB | ~200MB |
| Cold Start | ~100ms | ~2s |
Note: See benchmarks/README.md for detailed benchmarking methodology and results.
Demo
https://github.com/user-attachments/assets/dd6b6df1-0f5c-43d4-93b6-3cc751efb5e1
How it works
┌─────────────────┐    ┌─────────────────┐    ┌─────────────────┐
│   Your App      │───▶│  Helicone AI    │───▶│  LLM Providers  │
│                 │    │    Gateway      │    │                 │
│  OpenAI SDK     │    │                 │    │ • OpenAI        │
│  (any language) │    │ • Load Balance  │    │ • Anthropic     │
│                 │    │ • Rate Limit    │    │ • AWS Bedrock   │
│                 │    │ • Cache         │    │ • Google Vertex │
│                 │    │ • Trace         │    │ • 20+ more      │
│                 │    │ • Fallbacks     │    │                 │
└─────────────────┘    └─────────────────┘    └─────────────────┘
                                │
                                ▼
                       ┌─────────────────┐
                       │    Helicone     │
                       │  Observability  │
                       │                 │
                       │ • Dashboard     │
                       │ • Observability │
                       │ • Monitoring    │
                       │ • Debugging     │
                       └─────────────────┘
Custom configuration
Cloud-hosted router configuration
For the cloud-hosted router, we provide a configuration wizard in the UI that helps you set up your router without writing any YAML.
For a complete reference of our configuration options, see our configuration reference and the providers we support.
Migration guide
From OpenAI (Python)
from openai import OpenAI
client = OpenAI(
- api_key=os.getenv("OPENAI_API_KEY")
+ api_key="placeholder-api-key",  # Gateway handles API keys
+ base_url="http://localhost:8080/router/your-router-name"
)
response = client.chat.completions.create(
- model="gpt-4o-mini",
+ model="openai/gpt-4o-mini", # or 100+ models
messages=[{"role": "user", "content": "Hello!"}]
)
From OpenAI (TypeScript)
import { OpenAI } from "openai";
const client = new OpenAI({
- apiKey: process.env.OPENAI_API_KEY,
+ apiKey: "placeholder-api-key", // Gateway handles API keys
+ baseURL: "http://localhost:8080/router/your-router-name",
});
const response = await client.chat.completions.create({
- model: "gpt-4o",
+ model: "openai/gpt-4o",
messages: [{ role: "user", content: "Hello from Helicone AI Gateway!" }],
});
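The main code change in both migrations is the model id: bare OpenAI ids such as `gpt-4o-mini` become provider-prefixed ids such as `openai/gpt-4o-mini`. If your codebase passes model names around in many places, a small helper can do the rewrite in one spot. The sketch below is hypothetical (neither function ships with the gateway or the OpenAI SDK) and assumes the provider name never contains a slash, so it splits on the first `/` only.

```python
from __future__ import annotations


def to_gateway_model(model: str, default_provider: str = "openai") -> str:
    """Prefix a bare model id with a provider; leave already-prefixed ids unchanged."""
    return model if "/" in model else f"{default_provider}/{model}"


def split_model(model: str) -> tuple[str, str]:
    """Split 'provider/model' into (provider, model) on the first slash."""
    provider, sep, name = model.partition("/")
    if not sep or not provider or not name:
        raise ValueError(f"expected 'provider/model', got {model!r}")
    return provider, name


if __name__ == "__main__":
    for m in ["gpt-4o-mini", "anthropic/claude-3-5-sonnet"]:
        print(to_gateway_model(m), split_model(to_gateway_model(m)))
```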
Self-host the AI Gateway
This option may suit you best if you are extremely latency-sensitive, or if you would rather self-host the gateway than use a cloud offering.
Run the AI Gateway locally
- Set up your .env file with your PROVIDER_API_KEYs:
OPENAI_API_KEY=your_openai_key
ANTHROPIC_API_KEY=your_anthropic_key
- Run locally in your terminal
npx @helicone/ai-gateway@latest
- Make your requests using any OpenAI SDK:
from openai import OpenAI
client = OpenAI(
base_url="http://localhost:8080/ai",
# Gateway handles API keys, so this only needs to be
# set to a valid Helicone API key if authentication is enabled.
api_key="placeholder-api-key"
)
# Route to any LLM provider through the same interface; we handle the rest.
response = client.chat.completions.create(
model="anthropic/claude-3-5-sonnet", # or any of 100+ other models
messages=[{"role": "user", "content": "Hello from Helicone AI Gateway!"}]
)
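If the gateway starts but provider calls fail, a missing key in `.env` is a likely culprit. The sketch below is a hypothetical preflight check you could run before `npx @helicone/ai-gateway@latest`: it parses simple `KEY=value` lines from `.env` and reports required keys that are empty or unset in both that file and the process environment. It assumes the gateway is launched from the directory containing `.env` and handles only plain `KEY=value` lines, not the full dotenv syntax (no `export`, multi-line values, or escapes).

```python
from __future__ import annotations

import os
from pathlib import Path
from typing import Mapping


def parse_env_text(text: str) -> dict[str, str]:
    """Parse KEY=value lines, skipping blank lines and # comments."""
    values: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip("\"'")  # drop surrounding quotes
    return values


def missing_keys(required: list[str], file_values: Mapping[str, str],
                 environ: Mapping[str, str] = os.environ) -> list[str]:
    """Keys that are empty or unset in both the process environment and the .env values."""
    return [k for k in required if not (environ.get(k) or file_values.get(k))]


if __name__ == "__main__":
    env_path = Path(".env")
    file_values = parse_env_text(env_path.read_text()) if env_path.exists() else {}
    missing = missing_keys(["OPENAI_API_KEY", "ANTHROPIC_API_KEY"], file_values)
    if missing:
        raise SystemExit(f"Missing provider keys: {', '.join(missing)}")
    print("Provider keys found.")
```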
That's it. No new SDKs to learn, no integrations to maintain. Fully featured and open source.
For custom configuration, see our configuration guide and the list of providers we support.
Self-hosted configuration customization
If you are self-hosting the gateway and want to configure different routing strategies, follow the steps below:
1. Set up your environment variables
Include your PROVIDER_API_KEYs in your .env file. If you want to enable authentication, also set the HELICONE_CONTROL_PLANE_API_KEY variable.
OPENAI_API_KEY=sk-...
ANTHROPIC_API_KEY=sk-ant-...
HELICONE_CONTROL_PLANE_API_KEY=sk-...
2. Customize your config file
Note: This is a sample config.yaml file. Please refer to our configuration guide for the full list of options, examples, and defaults.
See our full provider list here.
helicone: # Include your HELICONE_API_KEY in your .env file
  features: all
cache-store:
  type: in-memory
global: # Global settings for all routers
  cache:
    directive: "max-age=3600, max-stale=1800"
routers:
  your-router-name: # Single router configuration
    load-balance:
      chat:
        strategy: model-latency
        models:
          - openai/gpt-4o-mini
          - anthropic/claude-3-7-sonnet
    rate-limit:
      per-api-key:
        capacity: 1000
        refill-frequency: 1m # 1000 requests per minute
3. Run with your custom configuration
npx @helicone/ai-gateway@latest --config config.yaml
4. Make your requests
from openai import OpenAI
import os
helicone_api_key = os.getenv("HELICONE_API_KEY")
client = OpenAI(
base_url="http://localhost:8080/router/your-router-name",
api_key=helicone_api_key
)
# Route to any LLM provider through the same interface; we handle the rest.
response = client.chat.completions.create(
model="anthropic/claude-3-5-sonnet", # or any of 100+ other models
messages=[{"role": "user", "content": "Hello from Helicone AI Gateway!"}]
)
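The sample config routes `your-router-name` with `strategy: model-latency` across `openai/gpt-4o-mini` and `anthropic/claude-3-7-sonnet`. As a rough mental model only (the gateway's actual algorithm, smoothing, and warm-up behavior are not documented here), the hypothetical sketch below tries each unmeasured model once, then always picks the model with the lowest exponentially smoothed latency. The smoothing factor `alpha=0.3` is an arbitrary assumption.

```python
from __future__ import annotations


class ModelLatencyRouter:
    """Pick the model with the lowest smoothed latency (assumed model-latency semantics)."""

    def __init__(self, models: list[str], alpha: float = 0.3) -> None:
        self.models = list(models)
        self.alpha = alpha                # weight given to the newest latency sample
        self.avg: dict[str, float] = {}   # model -> smoothed latency in seconds

    def record(self, model: str, latency_s: float) -> None:
        prev = self.avg.get(model)
        # First sample seeds the average; later samples are blended in.
        self.avg[model] = latency_s if prev is None else (
            self.alpha * latency_s + (1 - self.alpha) * prev)

    def choose(self) -> str:
        # Try unmeasured models first so every model gets at least one sample.
        for m in self.models:
            if m not in self.avg:
                return m
        return min(self.models, key=lambda m: self.avg[m])


if __name__ == "__main__":
    router = ModelLatencyRouter(["openai/gpt-4o-mini", "anthropic/claude-3-7-sonnet"])
    router.record("openai/gpt-4o-mini", 1.2)
    router.record("anthropic/claude-3-7-sonnet", 0.4)
    print(router.choose())
```

Smoothing keeps one slow response from flipping the choice, while a sustained slowdown still moves traffic to the other model.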
For a complete guide on self-hosting options, including Docker deployment, Kubernetes, and cloud platforms, see our deployment guides.
Resources
Documentation
- Full Documentation - Complete guides and API reference
- Quickstart Guide - Get up and running in 1 minute
- Advanced Configurations - Configuration reference & examples
Community
- Discord Server - Our community of passionate AI engineers
- GitHub Discussions - Q&A and feature requests
- Twitter - Latest updates and announcements
- Newsletter - Tips and tricks for deploying AI applications
Support
- Report bugs: GitHub issues
- Enterprise Support: Book a discovery call with our team
License
The Helicone AI Gateway is licensed under the Apache License; see the license file in this repository for details.
Made with ❤️ by Helicone.