About OllamaR

OllamaR is a high-performance, easy-to-configure open-source load balancing server designed specifically for Ollama workloads. It distributes requests efficiently across multiple backend Ollama servers to improve availability, response speed, and resource utilization. Key features include high-performance request distribution algorithms, straightforward configuration, dynamic scaling through adding or removing backend nodes without restart, and enhanced security by keeping the native Ollama port unexposed and blocking dangerous endpoints related to model and data manipulation. OllamaR acts as a proxy in front of Ollama servers, storing the backend server list in a local database populated via the /api/tags interface, and supports both self-hosted and public Ollama servers. Supported endpoints include root status, /api/tags and /v1/models for listing enabled models, /api/version for version info, /api/chat for LLM inference, and /api/embed and /api/embeddings for embedding generation. Dangerous administrative

a

Published by

adysec

Visit View Profile

README.md

View on GitHub

Ollama 负载均衡服务器

Ollama 负载均衡服务器是一款高性能、易配置的开源负载均衡服务器，优化Ollama负载。它能够帮助您提高应用程序的可用性和响应速度，同时确保系统资源的有效利用。

特性

高性能 - 采用先进的算法和技术来实现高效的请求分发。
易配置 - 简单直观的配置文件使得部署和调整变得轻而易举。
可扩展性 - 支持动态添加或移除后端服务节点，无需重启服务器。
安全性 - 无原有Ollama漏洞利用路径相关路由，无法通过该负载均衡服务器删除源服务器模型及数据。

使用场景

提升Web应用和服务的可用性
均衡分布式系统的负载
快速故障转移和恢复
作为代理服务器，不使原生ollama端口对外暴露

说明

ollama.db: 存储源服务器列表，通过/api/tags接口获取，可使用自建Ollama或公网开放的Ollama服务器

支持接口

/: Ollama is running

/api/tags: 已启用的模型列表

/v1/models: 已启用的模型列表(OpenAI接口)

/api/version: Ollama版本号

/api/chat：LLM模型调用

/api/embed：embedding模型调用

/api/embeddings：embedding模型调用

不支持/api/show、/api/copy、/api/create、/api/push、/api/delete、/api/pull、/api/ps等有一定危险性的接口

OllamaR

About OllamaR

Platforms

Links

README.md

Ollama 负载均衡服务器

特性

使用场景

说明

支持接口