bentoml

Open Source

OpenLLM

<div align="center">

<h1>🦾 OpenLLM: Self-Hosting LLMs Made Easy</h1>

[![License: Apache-2.0](https://img.shields.io/badge/License-Apache%202-green.svg)](https://github.com/bentoml/OpenLLM/blob/main/LICENSE)
[![Releases](https://img.shields.io/pypi/v/openllm.svg?logo=pypi&label=PyPI&logoColor=gold)](https://pypi.org/project/openllm)
[![CI](https://results.pre-commit.ci/badge/github/bentoml/OpenLLM/main.svg)](https://results.pre-commit.ci/latest/github/bentoml/OpenLLM/main)
[![X](https://badgen.net/badge/icon/@bentomlai/000000?icon=twitter&label=Follow)](https://twitter.com/bentomlai)
[![Community](https://badgen.net/badge/icon/Community/562f5d?icon=slack&label=Join)](https://l.bentoml.com/join-slack)

</div>

OpenLLM allows developers to run **any open-source LLMs** (Llama 3.3, Qwen2.5, Phi3 and [more](#supported-models)) or **custom models** as **OpenAI-compatible APIs** with a single command. It features a [built-in chat UI](#chat-ui), state-of-the-art inference backends, and a simplified workflow for creating enterprise-grade cloud deployment with Docker, Kubernetes, and [BentoCloud](#deploy-to-bentocloud).

Understand the [design philosophy of OpenLLM](https://www.bentoml.com/blog/from-ollama-to-openllm-running-llms-in-the-cloud).

## Get Started

Run the following commands to install OpenLLM and explore it interactively.

```bash
pip install openllm  # or pip3 install openllm
openllm hello
```

![hello](https://github.com/user-attachments/assets/5af19f23-1b34-4c45-b1e0-a6798b4586d1)

## Supported models

OpenLLM supports a wide range of state-of-the-art open-source LLMs. You can also add a [model repository to run custom models](#set-up-a-custom-repository) with OpenLLM.
<table>
<tr><th>Model</th><th>Parameters</th><th>Required GPU</th><th>Start a Server</th></tr>
<tr><td>deepseek</td><td>r1-671b</td><td>80Gx16</td><td><code>openllm serve deepseek:r1-671b</code></td></tr>
<tr><td>gemma2</td><td>2b</td><td>12G</td><td><code>openllm serve gemma2:2b</code></td></tr>
<tr><td>gemma3</td><td>3b</td><td>12G</td><td><code>openllm serve gemma3:3b</code></td></tr>
<tr><td>jamba1.5</td><td>mini-ff0a</td><td>80Gx2</td><td><code>openllm serve jamba1.5:mini-ff0a</code></td></tr>
<tr><td>llama3.1</td><td>8b</td><td>24G</td><td><code>openllm serve llama3.1:8b</code></td></tr>
<tr><td>llama3.2</td><td>1b</td><td>24G</td><td><code>openllm serve llama3.2:1b</code></td></tr>
<tr><td>llama3.3</td><td>70b</td><td>80Gx2</td><td><code>openllm serve llama3.3:70b</code></td></tr>
<tr><td>llama4</td><td>17b16e</td><td>80Gx8</td><td><code>openllm serve llama4:17b16e</code></td></tr>
<tr><td>mistral</td><td>8b-2410</td><td>24G</td><td><code>openllm serve mistral:8b-2410</code></td></tr>
<tr><td>mistral-large</td><td>123b-2407</td><td>80Gx4</td><td><code>openllm serve mistral-large:123b-2407</code></td></tr>
<tr><td>phi4</td><td>14b</td><td>80G</td><td><code>openllm serve phi4:14b</code></td></tr>
<tr><td>pixtral</td><td>12b-2409</td><td>80G</td><td><code>openllm serve pixtral:12b-2409</code></td></tr>
<tr><td>qwen2.5</td><td>7b</td><td>24G</td><td><code>openllm serve qwen2.5:7b</code></td></tr>
<tr><td>qwen2.5-coder</td><td>3b</td><td>24G</td><td><code>openllm serve qwen2.5-coder:3b</code></td></tr>
<tr><td>qwq</td><td>32b</td><td>80G</td><td><code>openllm serve qwq:32b</code></td></tr>
</table>

For the full model list, see the [OpenLLM models repository](https://github.com/bentoml/openllm-models).

## Start an LLM server

To start an LLM server locally, use the `openllm serve` command and specify the model version.

> [!NOTE]
> OpenLLM does not store model weights. A Hugging Face token (HF_TOKEN) is required for gated models.
>
> 1. Create your Hugging Face token [here](https://huggingface.co/settings/tokens).
> 2. Request access to the gated model, such as [meta-llama/Llama-3.2-1B-Instruct](https://huggingface.co/meta-llama/Llama-3.2-1B-Instruct).
> 3. Set your token as an environment variable by running:
>    ```bash
>    export HF_TOKEN=<your token>
>    ```

```bash
openllm serve llama3.2:1b
```

The server will be accessible at [http://localhost:3000](http://localhost:3000/), providing OpenAI-compatible APIs for interaction. You can call the endpoints with different frameworks and tools that support OpenAI-compatible APIs. Typically, you may need to specify the following:

- **The API host address**: By default, the LLM is hosted at [http://localhost:3000](http://localhost:3000/).
- **The model name**: The name can be different depending on the tool you use.
- **The API key**: The API key used for client authentication. This is optional.

Here are some examples:

<details>

<summary>OpenAI Python client</summary>

```python
from openai import OpenAI

client = OpenAI(base_url='http://localhost:3000/v1', api_key='na')

# Use the following func to get the available models
# model_list = client.models.list()
# print(model_list)

chat_completion = client.chat.completions.create(
    model="meta-llama/Llama-3.2-1B-Instruct",
    messages=[
        {
            "role": "user",
            "content": "Explain superconductors like I'm five years old"
        }
    ],
    stream=True,
)
for chunk in chat_completion:
    print(chunk.choices[0].delta.content or "", end="")
```

</details>

<details>

<summary>LlamaIndex</summary>

```python
from llama_index.llms.openai import OpenAI

llm = OpenAI(api_base="http://localhost:3000/v1", model="meta-llama/Llama-3.2-1B-Instruct", api_key="dummy")
...
```

</details>

## Chat UI

OpenLLM provides a chat UI at the `/chat` endpoint for the launched LLM server at http://localhost:3000/chat.
<img width="800" alt="openllm_ui" src="https://github.com/bentoml/OpenLLM/assets/5886138/8b426b2b-67da-4545-8b09-2dc96ff8a707">

## Chat with a model in the CLI

To start a chat conversation in the CLI, use the `openllm run` command and specify the model version.

```bash
openllm run llama3:8b
```

## Model repository

A model repository in OpenLLM represents a catalog of available LLMs that you can run. OpenLLM provides a default model repository that includes the latest open-source LLMs like Llama 3, Mistral, and Qwen2, hosted at [this GitHub repository](https://github.com/bentoml/openllm-models). To see all available models from the default and any added repository, use:

```bash
openllm model list
```

To ensure your local list of models is synchronized with the latest updates from all connected repositories, run:

```bash
openllm repo update
```

To review a model’s information, run:

```bash
openllm model get llama3.2:1b
```

### Add a model to the default model repository

You can contribute to the default model repository by adding new models that others can use. This involves creating and submitting a Bento of the LLM. For more information, check out this [example pull request](https://github.com/bentoml/openllm-models/pull/1).

### Set up a custom repository

You can add your own repository to OpenLLM with custom models. To do so, follow the format in the default OpenLLM model repository with a `bentos` directory to store custom LLMs. You need to [build your Bentos with BentoML](https://docs.bentoml.com/en/latest/guides/build-options.html) and submit them to your model repository.

First, prepare your custom models in a `bentos` directory following the guidelines provided by [BentoML to build Bentos](https://docs.bentoml.com/en/latest/guides/build-options.html). Check out the [default model repository](https://github.com/bentoml/openllm-repo) for an example and read the [Developer Guide](https://github.com/bentoml/OpenLLM/blob/main/DEVELOPMENT.md) for details.
Then, register your custom model repository with OpenLLM:

```bash
openllm repo add <repo-name> <repo-url>
```

**Note**: Currently, OpenLLM only supports adding public repositories.

## Deploy to BentoCloud

OpenLLM supports LLM cloud deployment via BentoML, the unified model serving framework, and BentoCloud, an AI inference platform for enterprise AI teams. BentoCloud provides fully-managed infrastructure optimized for LLM inference with autoscaling, model orchestration, observability, and more, allowing you to run any AI model in the cloud.

[Sign up for BentoCloud](https://www.bentoml.com/) for free and [log in](https://docs.bentoml.com/en/latest/bentocloud/how-tos/manage-access-token.html). Then, run `openllm deploy` to deploy a model to BentoCloud:

```bash
openllm deploy llama3.2:1b --env HF_TOKEN
```

> [!NOTE]
> If you are deploying a gated model, make sure HF_TOKEN is set as an environment variable.

Once the deployment is complete, you can run model inference on the BentoCloud console:

<img width="800" alt="bentocloud_ui" src="https://github.com/bentoml/OpenLLM/assets/65327072/4f7819d9-73ea-488a-a66c-f724e5d063e6">

## Community

OpenLLM is actively maintained by the BentoML team. Feel free to reach out and join us in our pursuit to make LLMs more accessible and easy to use 👉 [Join our Slack community!](https://l.bentoml.com/join-slack)

## Contributing

As an open-source project, we welcome contributions of all kinds, such as new features, bug fixes, and documentation.

Here are some of the ways to contribute:

- Report a bug by [creating a GitHub issue](https://github.com/bentoml/OpenLLM/issues/new/choose).
- [Submit a pull request](https://github.com/bentoml/OpenLLM/compare) or help review other developers’ [pull requests](https://github.com/bentoml/OpenLLM/pulls).
- Add an LLM to the OpenLLM default model repository so that other users can run your model. See the [pull request template](https://github.com/bentoml/openllm-models/pull/1).
- Check out the [Developer Guide](https://github.com/bentoml/OpenLLM/blob/main/DEVELOPMENT.md) to learn more.

## Acknowledgements

This project uses the following open-source projects:

- [bentoml/bentoml](https://github.com/bentoml/bentoml) for production-level model serving
- [vllm-project/vllm](https://github.com/vllm-project/vllm) for the production-level LLM backend
- [blrchen/chatgpt-lite](https://github.com/blrchen/chatgpt-lite) for the web chat UI
- [astral-sh/uv](https://github.com/astral-sh/uv) for fast installation of model requirements

We are grateful to the developers and contributors of these projects for their hard work and dedication.

LLM Tools & Chat UIs ML Frameworks

12.4K GitHub Stars

Open Source

<picture>
  <source media="(prefers-color-scheme: dark)" srcset="https://github.com/bentoml/BentoML/assets/489344/d3e6c95d-d224-49a5-9cff-0789f094e127">
  <source media="(prefers-color-scheme: light)" srcset="https://github.com/bentoml/BentoML/assets/489344/de4da660-6aeb-4e5a-bf76-b7177435444d">
  <img alt="BentoML: Unified Model Serving Framework" src="https://github.com/bentoml/BentoML/assets/489344/de4da660-6aeb-4e5a-bf76-b7177435444d" width="370" style="max-width: 100%;">
</picture>

## Unified Model Serving Framework

🍱 Build model inference APIs and multi-model serving systems with any open-source or custom AI models. 👉 [Join our forum](https://forum.modular.com/c/bento/31)!

[![License: Apache-2.0](https://img.shields.io/badge/License-Apache%202-green.svg)](https://github.com/bentoml/BentoML?tab=Apache-2.0-1-ov-file)
[![Releases](https://img.shields.io/github/v/release/bentoml/bentoml.svg)](https://github.com/bentoml/bentoml/releases)
[![CI](https://github.com/bentoml/bentoml/actions/workflows/ci.yml/badge.svg?branch=main)](https://github.com/bentoml/BentoML/actions/workflows/ci.yml?query=branch%3Amain)
[![Twitter](https://badgen.net/badge/icon/@bentomlai/1DA1F2?icon=twitter&label=Follow)](https://twitter.com/bentomlai)

## What is BentoML?

BentoML is a Python library for building online serving systems optimized for AI apps and model inference.

- **🍱 Easily build APIs for Any AI/ML Model.** Turn any model inference script into a REST API server with just a few lines of code and standard Python type hints.
- **🐳 Docker Containers made simple.** No more dependency hell! Manage your environments, dependencies and model versions with a simple config file. BentoML automatically generates Docker images, ensures reproducibility, and simplifies how you deploy to different environments.
- **🧭 Maximize CPU/GPU utilization.** Build high performance inference APIs leveraging built-in serving optimization features like dynamic batching, model parallelism, multi-stage pipeline and multi-model inference-graph orchestration.
- **👩‍💻 Fully customizable.** Easily implement your own APIs or task queues, with custom business logic, model inference and multi-model composition. Supports any ML framework, modality, and inference runtime.
- **🚀 Ready for Production.** Develop, run and debug locally. Seamlessly deploy to production with Docker containers or [BentoCloud](https://www.bentoml.com/).

## Getting started

Install BentoML:

```
# Requires Python≥3.9
pip install -U bentoml
```

Define APIs in a `service.py` file.

```python
import bentoml

@bentoml.service(
    image=bentoml.images.Image(python_version="3.11").python_packages("torch", "transformers"),
)
class Summarization:
    def __init__(self) -> None:
        import torch
        from transformers import pipeline

        device = "cuda" if torch.cuda.is_available() else "cpu"
        self.pipeline = pipeline('summarization', device=device)

    @bentoml.api(batchable=True)
    def summarize(self, texts: list[str]) -> list[str]:
        results = self.pipeline(texts)
        return [item['summary_text'] for item in results]
```

### 💻 Run locally

Install PyTorch and Transformers packages to your Python virtual environment.

```bash
pip install torch transformers  # additional dependencies for local run
```

Run the service code locally (serving at http://localhost:3000 by default):

```bash
bentoml serve
```

You should expect to see the following output.
```
[INFO] [cli] Starting production HTTP BentoServer from "service:Summarization" listening on http://localhost:3000 (Press CTRL+C to quit)
[INFO] [entry_service:Summarization:1] Service Summarization initialized
```

Now you can run inference from your browser at http://localhost:3000 or with a Python script:

```python
import bentoml

with bentoml.SyncHTTPClient('http://localhost:3000') as client:
    summarized_text: str = client.summarize([bentoml.__doc__])[0]
    print(f"Result: {summarized_text}")
```

### 🐳 Deploy using Docker

Run `bentoml build` to package necessary code, models, dependency configs into a Bento - the standardized deployable artifact in BentoML:

```bash
bentoml build
```

Ensure [Docker](https://docs.docker.com/) is running. Generate a Docker container image for deployment:

```bash
bentoml containerize summarization:latest
```

Run the generated image:

```bash
docker run --rm -p 3000:3000 summarization:latest
```

### ☁️ Deploy on BentoCloud

[BentoCloud](https://www.bentoml.com) provides compute infrastructure for rapid and reliable GenAI adoption. It speeds up your BentoML development process by leveraging cloud compute resources, and simplifies how you deploy, scale and operate BentoML in production.

[Sign up for BentoCloud](https://cloud.bentoml.com/signup) for personal access; for enterprise use cases, [contact our team](https://www.bentoml.com/contact).

```bash
# After signup, run the following command to create an API token:
bentoml cloud login

# Deploy from current directory:
bentoml deploy
```

![bentocloud-ui](./docs/source/_static/img/get-started/cloud-deployment/first-bento-on-bentocloud.png)

For detailed explanations, read the [Hello World example](https://docs.bentoml.com/en/latest/get-started/hello-world.html).
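The `batchable=True` flag on `summarize` in the `service.py` example lets the server combine inputs from several concurrent requests into one call to the method. The sketch below is a conceptual illustration of that merge-and-split idea only; it is not BentoML's implementation, and `run_batched` and `fake_summarize` are hypothetical names introduced for this example.

```python
from typing import Callable


def run_batched(
    requests: list[list[str]],
    handler: Callable[[list[str]], list[str]],
) -> list[list[str]]:
    """Merge several requests into one handler call, then split the results back."""
    # Flatten all inputs into a single batch and remember each request's size.
    merged = [text for request in requests for text in request]
    sizes = [len(request) for request in requests]

    results = handler(merged)  # one call instead of len(requests) calls
    if len(results) != len(merged):
        raise ValueError("handler must return exactly one result per input")

    # Slice the flat result list back into per-request chunks.
    responses, start = [], 0
    for size in sizes:
        responses.append(results[start:start + size])
        start += size
    return responses


def fake_summarize(texts: list[str]) -> list[str]:
    """Stand-in for the model: keep the first three words of each text."""
    return [" ".join(text.split()[:3]) for text in texts]


print(run_batched([["a b c d", "e f"], ["g h i j k"]], fake_summarize))
# → [['a b c', 'e f'], ['g h i']]
```

This is why a batchable API takes and returns lists of equal length: the caller's slice of the output can only be recovered if each input maps to exactly one result.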
## Examples

- LLMs: [Llama 3.2](https://github.com/bentoml/BentoVLLM/tree/main/llama3.2-11b-vision-instruct), [Mistral](https://github.com/bentoml/BentoVLLM/tree/main/ministral-8b-instruct-2410), [DeepSeek Distil](https://github.com/bentoml/BentoVLLM/tree/main/deepseek-r1-distill-llama3.1-8b-tool-calling), and more.
- Image Generation: [Stable Diffusion 3 Medium](https://github.com/bentoml/BentoDiffusion/tree/main/sd3-medium), [Stable Video Diffusion](https://github.com/bentoml/BentoDiffusion/tree/main/svd), [Stable Diffusion XL Turbo](https://github.com/bentoml/BentoDiffusion/tree/main/sdxl-turbo), [ControlNet](https://github.com/bentoml/BentoDiffusion/tree/main/controlnet), and [LCM LoRAs](https://github.com/bentoml/BentoDiffusion/tree/main/lcm).
- Embeddings: [SentenceTransformers](https://github.com/bentoml/BentoSentenceTransformers) and [ColPali](https://github.com/bentoml/BentoColPali)
- Audio: [ChatTTS](https://github.com/bentoml/BentoChatTTS), [XTTS](https://github.com/bentoml/BentoXTTS), [WhisperX](https://github.com/bentoml/BentoWhisperX), [Bark](https://github.com/bentoml/BentoBark)
- Computer Vision: [YOLO](https://github.com/bentoml/BentoYolo) and [ResNet](https://github.com/bentoml/BentoResnet)
- Advanced examples: [Function calling](https://github.com/bentoml/BentoFunctionCalling), [LangGraph](https://github.com/bentoml/BentoLangGraph), [CrewAI](https://github.com/bentoml/BentoCrewAI)

Check out the [full list](https://docs.bentoml.com/en/latest/examples/overview.html) for more sample code and usage.
## Advanced topics

- [Model composition](https://docs.bentoml.com/en/latest/get-started/model-composition.html)
- [Workers and model parallelization](https://docs.bentoml.com/en/latest/build-with-bentoml/parallelize-requests.html)
- [Adaptive batching](https://docs.bentoml.com/en/latest/get-started/adaptive-batching.html)
- [GPU inference](https://docs.bentoml.com/en/latest/build-with-bentoml/gpu-inference.html)
- [Distributed serving systems](https://docs.bentoml.com/en/latest/build-with-bentoml/distributed-services.html)
- [Concurrency and autoscaling](https://docs.bentoml.com/en/latest/scale-with-bentocloud/scaling/autoscaling.html)
- [Model loading and Model Store](https://docs.bentoml.com/en/latest/build-with-bentoml/model-loading-and-management.html)
- [Observability](https://docs.bentoml.com/en/latest/build-with-bentoml/observability/index.html)
- [BentoCloud deployment](https://docs.bentoml.com/en/latest/get-started/cloud-deployment.html)

See [Documentation](https://docs.bentoml.com) for more tutorials and guides.

## Community

Get involved and join our [Community Forum 💬](https://forum.modular.com/c/bento/31), where thousands of AI/ML engineers help each other, contribute to the project, and talk about building AI products.

To report a bug or suggest a feature request, use [GitHub Issues](https://github.com/bentoml/BentoML/issues/new/choose).

### Contributing

There are many ways to contribute to the project:

- Report bugs and "Thumbs up" on [issues](https://github.com/bentoml/BentoML/issues) that are relevant to you.
- Investigate [issues](https://github.com/bentoml/BentoML/issues) and review other developers' [pull requests](https://github.com/bentoml/BentoML/pulls).
- Contribute code or [documentation](https://docs.bentoml.com/en/latest/index.html) to the project by submitting a GitHub pull request.
- Check out the [Contributing Guide](https://github.com/bentoml/BentoML/blob/main/CONTRIBUTING.md) and [Development Guide](https://github.com/bentoml/BentoML/blob/main/DEVELOPMENT.md) to learn more.
- Share your feedback and discuss roadmap plans in our [forum](https://forum.modular.com/c/bento/31).

Thanks to all of our amazing contributors!

<a href="https://github.com/bentoml/BentoML/graphs/contributors">
  <img src="https://contrib.rocks/image?repo=bentoml/BentoML" />
</a>

### Usage tracking and feedback

The BentoML framework collects anonymous usage data that helps our community improve the product. Only BentoML's internal API calls are being reported. This excludes any sensitive information, such as user code, model data, model names, or stack traces. Here's the [code](https://github.com/bentoml/BentoML/blob/main/src/bentoml/_internal/utils/analytics/usage_stats.py) used for usage tracking. You can opt out of usage tracking with the `--do-not-track` CLI option:

```bash
bentoml [command] --do-not-track
```

Or by setting the environment variable:

```bash
export BENTOML_DO_NOT_TRACK=True
```

### License

[Apache License 2.0](https://github.com/bentoml/BentoML/blob/main/LICENSE)

AI & Machine Learning

8.7K GitHub Stars

Open Source

BentoDiffusion

<div align="center">
    <h1 align="center">Self-host Diffusion Models with BentoML</h1>
</div>

This repository contains a series of BentoML example projects, demonstrating how to deploy different models in [the Stable Diffusion (SD) family](https://huggingface.co/models?other=stable-diffusion), which is specialized in generating and manipulating images or video clips based on text prompts.

See [here](https://docs.bentoml.com/en/latest/examples/overview.html) for a full list of BentoML example projects.

The following guide uses SDXL Turbo as an example.

## Prerequisites

If you want to test the Service locally, we recommend you use an Nvidia GPU with at least 12GB VRAM.

## Install dependencies

```bash
git clone https://github.com/bentoml/BentoDiffusion.git
cd BentoDiffusion/sdxl-turbo

# Recommend Python 3.11
pip install -r requirements.txt
```

## Run the BentoML Service

We have defined a BentoML Service in `service.py`. Run `bentoml serve` in your project directory to start the Service.

```bash
$ bentoml serve

2024-01-18T18:31:49+0800 [INFO] [cli] Starting production HTTP BentoServer from "service:SDXLTurboService" listening on http://localhost:3000 (Press CTRL+C to quit)
Loading pipeline components...: 100%
```

The server is now active at [http://localhost:3000](http://localhost:3000/). You can interact with it using the Swagger UI or in other different ways.

CURL

```bash
curl -X 'POST' \
  'http://localhost:3000/txt2img' \
  -H 'accept: image/*' \
  -H 'Content-Type: application/json' \
  -d '{
  "prompt": "A cinematic shot of a baby racoon wearing an intricate italian priest robe.",
  "num_inference_steps": 1,
  "guidance_scale": 0
}'
```

Python client

```python
import bentoml

with bentoml.SyncHTTPClient("http://localhost:3000") as client:
    result = client.txt2img(
        prompt="A cinematic shot of a baby racoon wearing an intricate italian priest robe.",
        num_inference_steps=1,
        guidance_scale=0.0
    )
```

For detailed explanations of the Service code, see [Stable Diffusion XL Turbo](https://docs.bentoml.com/en/latest/use-cases/diffusion-models/sdxl-turbo.html).

## Deploy to BentoCloud

After the Service is ready, you can deploy the application to BentoCloud for better management and scalability. [Sign up](https://www.bentoml.com/) if you haven't got a BentoCloud account.

Make sure you have [logged in to BentoCloud](https://docs.bentoml.com/en/latest/scale-with-bentocloud/manage-api-tokens.html).

```bash
bentoml cloud login
```

Deploy it to BentoCloud.

```bash
bentoml deploy
```

Once the application is up and running on BentoCloud, you can access it via the exposed URL.

**Note**: For custom deployment in your own infrastructure, use [BentoML to generate an OCI-compliant image](https://docs.bentoml.com/en/latest/get-started/packaging-for-deployment.html).

## Choose another diffusion model

To deploy a different diffusion model, go to the corresponding subdirectories of this repository.

- [FLUX.1](flux-timestep-distilled/)
- [Stable Diffusion 3 Medium](sd3-medium/)
- [Stable Diffusion 3.5 Large Turbo](sd3.5-large-turbo/)
- [Stable Diffusion 3.5 Large](sd3.5-large/)
- [Stable Diffusion XL Lightning](sdxl-lightning/)
- [Stable Diffusion XL Turbo](sdxl-turbo/)
- [ControlNet](controlnet/)
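The curl request in the "Run the BentoML Service" section prints the image bytes to the terminal unless you redirect them. As a rough sketch, the same request can be sent with the Python standard library and the response written to a file. The endpoint, headers, and JSON fields are copied from the curl example; the helper names and the `output.png` file name are assumptions made for illustration, and the actual image format depends on what the Service returns.

```python
import json
import urllib.request
from pathlib import Path


def build_txt2img_request(prompt: str, base_url: str = "http://localhost:3000") -> urllib.request.Request:
    """Build the same POST request as the curl example, without sending it."""
    payload = {"prompt": prompt, "num_inference_steps": 1, "guidance_scale": 0}
    return urllib.request.Request(
        url=f"{base_url}/txt2img",
        data=json.dumps(payload).encode("utf-8"),
        headers={"accept": "image/*", "Content-Type": "application/json"},
        method="POST",
    )


def save_image(prompt: str, output_path: str = "output.png") -> Path:
    """Send the request and write the returned bytes to disk.

    Requires the Service to be running locally via `bentoml serve`.
    The `.png` extension is an assumption about the response format.
    """
    path = Path(output_path)
    with urllib.request.urlopen(build_txt2img_request(prompt)) as response:
        path.write_bytes(response.read())
    return path


# Building the request does not touch the network, so this line is safe to run anywhere.
request = build_txt2img_request("A cinematic shot of a baby racoon wearing an intricate italian priest robe.")
print(request.get_method(), request.full_url)
```

With the server running, `save_image("...")` would perform the call and return the path of the written file.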

AI & Machine Learning DevOps & Infrastructure

387 GitHub Stars

Open Source

Yatai

# 🦄️ Yatai: Model Deployment at Scale on Kubernetes

[![actions_status](https://github.com/bentoml/yatai/workflows/Release/badge.svg)](https://github.com/bentoml/yatai/actions)
[![join_slack](https://badgen.net/badge/Join/Community%20Slack/cyan?icon=slack&style=flat-square)](https://join.slack.bentoml.org)

⚠️ Yatai for [BentoML 1.2](https://github.com/bentoml/BentoML/releases/tag/v1.2.0) is currently under construction. See [Yatai 2.0 Proposal](https://github.com/bentoml/Yatai/issues/504) for more details.

---

Yatai (屋台, food cart) is the Kubernetes deployment operator for [BentoML](https://github.com/bentoml/bentoml).

It lets DevOps teams seamlessly integrate BentoML into their GitOps workflow, for deploying and scaling Machine Learning services on any Kubernetes cluster.

👉 [Join our Slack community today!](https://l.bentoml.com/join-slack)

---

## Why Yatai?

Yatai empowers developers to deploy [BentoML](https://github.com/bentoml) on Kubernetes, optimized for CI/CD and DevOps workflows.

Yatai is cloud native and DevOps friendly. Via its Kubernetes-native workflow, specifically the [BentoDeployment CRD](https://docs.yatai.io/en/latest/concepts/bentodeployment_crd.html) (Custom Resource Definition), DevOps teams can easily fit BentoML-powered services into their existing workflow.

## Getting Started

- 📖 [Documentation](https://docs.yatai.io/) - Overview of the Yatai docs and related resources
- ⚙️ [Installation](https://docs.yatai.io/en/latest/installation/index.html) - Hands-on instruction on how to install Yatai for production use
- 👉 [Join Community Slack](https://l.linklyhq.com/l/ktPW) - Get help from our community and maintainers

## Quick Tour

Let's try out Yatai locally in a minikube cluster!
### ⚙️ Prerequisites:

* Install latest minikube: https://minikube.sigs.k8s.io/docs/start/
* Install latest Helm: https://helm.sh/docs/intro/install/
* Start a minikube Kubernetes cluster: `minikube start --cpus 4 --memory 4096`. If you are using macOS, use the [hyperkit](https://minikube.sigs.k8s.io/docs/drivers/hyperkit/) driver to avoid the macOS Docker Desktop [network limitation](https://docs.docker.com/desktop/networking/#i-cannot-ping-my-containers)
* Check that minikube cluster status is "running": `minikube status`
* Make sure your `kubectl` is configured with `minikube` context: `kubectl config current-context`
* Enable ingress controller: `minikube addons enable ingress`

### 🚧 Install Yatai

Install Yatai with the following script:

```bash
bash <(curl -s "https://raw.githubusercontent.com/bentoml/yatai/main/scripts/quick-install-yatai.sh")
```

This script will install Yatai along with its dependencies (PostgreSQL and MinIO) on your minikube cluster.

Note that this installation script is made for development and testing use only. For production deployment, check out the [Installation Guide](https://docs.yatai.io/en/latest/installation/index.html).

To access Yatai web UI, run the following command and keep the terminal open:

```bash
kubectl --namespace yatai-system port-forward svc/yatai 8080:80
```

In a separate terminal, run:

```bash
YATAI_INITIALIZATION_TOKEN=$(kubectl get secret yatai-env --namespace yatai-system -o jsonpath="{.data.YATAI_INITIALIZATION_TOKEN}" | base64 --decode)
echo "Open in browser: http://127.0.0.1:8080/setup?token=$YATAI_INITIALIZATION_TOKEN"
```

Open the URL printed above from your browser to finish admin account setup.
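Kubernetes stores Secret values base64-encoded, which is why the command above pipes the `jsonpath` output through `base64 --decode` before building the URL. A minimal Python sketch of that decode step follows; `setup_url` is a hypothetical helper name and the sample token is made up for illustration.

```python
import base64


def setup_url(encoded_token: str, host: str = "http://127.0.0.1:8080") -> str:
    """Decode a base64-encoded token and build the Yatai setup URL."""
    token = base64.b64decode(encoded_token).decode("utf-8")
    return f"{host}/setup?token={token}"


# Made-up value standing in for the `.data.YATAI_INITIALIZATION_TOKEN` field of the Secret.
sample = base64.b64encode(b"example-token").decode("ascii")
print(setup_url(sample))
# → http://127.0.0.1:8080/setup?token=example-token
```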
### 🍱 Push Bento to Yatai

First, get an API token and log in to the BentoML CLI:

* Keep the `kubectl port-forward` command in the step above running
* Go to Yatai's API tokens page: http://127.0.0.1:8080/api_tokens
* Create a new API token from the UI, making sure to assign "API" access under "Scopes"
* Copy the login command upon token creation and run it as a shell command, e.g.:

```bash
bentoml yatai login --api-token {YOUR_TOKEN} --endpoint http://127.0.0.1:8080
```

If you don't already have a Bento built, run the following commands from the [BentoML Quickstart Project](https://github.com/bentoml/BentoML/tree/main/examples/quickstart) to build a sample Bento:

```bash
git clone https://github.com/bentoml/bentoml.git && cd ./bentoml/examples/quickstart
pip install -r ./requirements.txt
python train.py
bentoml build
```

Push your newly built Bento to Yatai:

```bash
bentoml push iris_classifier:latest
```

### 🔧 Install yatai-image-builder component

Yatai's image builder feature ships as a separate component that you can install with the following script:

```bash
bash <(curl -s "https://raw.githubusercontent.com/bentoml/yatai-image-builder/main/scripts/quick-install-yatai-image-builder.sh")
```

This will install the `BentoRequest` CRD ([Custom Resource Definition](https://kubernetes.io/docs/concepts/extend-kubernetes/api-extension/custom-resources/)) and `Bento` CRD in your cluster. Similarly, this script is made for development and testing purposes only.

### 🔧 Install yatai-deployment component

Yatai's Deployment feature ships as a separate component that you can install with the following script:

```bash
bash <(curl -s "https://raw.githubusercontent.com/bentoml/yatai-deployment/main/scripts/quick-install-yatai-deployment.sh")
```

This will install the `BentoDeployment` CRD ([Custom Resource Definition](https://kubernetes.io/docs/concepts/extend-kubernetes/api-extension/custom-resources/)) in your cluster and enable the deployment UI on Yatai.
Similarly, this script is made for development and testing purposes only.

### 🚢 Deploy Bento!

Once the `yatai-deployment` component is installed, Bentos pushed to Yatai can be deployed to your Kubernetes cluster and exposed via a Service endpoint.

A Bento Deployment is created by applying a BentoDeployment resource.

Define your Bento deployment in a `my_deployment.yaml` file:

```yaml
apiVersion: resources.yatai.ai/v1alpha1
kind: BentoRequest
metadata:
  name: iris-classifier
  namespace: yatai
spec:
  bentoTag: iris_classifier:3oevmqfvnkvwvuqj  # check the tag by `bentoml list iris_classifier`
---
apiVersion: serving.yatai.ai/v2alpha1
kind: BentoDeployment
metadata:
  name: my-bento-deployment
  namespace: yatai
spec:
  bento: iris-classifier
  ingress:
    enabled: true
  resources:
    limits:
      cpu: "500m"
      memory: "512Mi"
    requests:
      cpu: "250m"
      memory: "128Mi"
  autoscaling:
    maxReplicas: 10
    minReplicas: 2
  runners:
    - name: iris_clf
      resources:
        limits:
          cpu: "1000m"
          memory: "1Gi"
        requests:
          cpu: "500m"
          memory: "512Mi"
      autoscaling:
        maxReplicas: 4
        minReplicas: 1
```

Apply the deployment to your minikube cluster:

```bash
kubectl apply -f my_deployment.yaml
```

Now you can check the deployment status in the namespace set in the manifest via `kubectl get BentoDeployment -n yatai`

## Community

- To report a bug or suggest a feature request, use [GitHub Issues](https://github.com/bentoml/yatai/issues/new/choose).
- For other discussions, use [GitHub Discussions](https://github.com/bentoml/BentoML/discussions) under the [BentoML repo](https://github.com/bentoml/BentoML/)
- To receive release announcements and get support, join us on [Slack](https://join.slack.bentoml.org).

## Contributing

There are many ways to contribute to the project:

- If you have any feedback on the project, share it with the community in [GitHub Discussions](https://github.com/bentoml/BentoML/discussions) under the [BentoML repo](https://github.com/bentoml/BentoML/).
- Report issues you're facing and "Thumbs up" on issues and feature requests that are relevant to you.
- Investigate bugs and review other developers' pull requests.
- Contribute code or documentation to the project by submitting a GitHub pull request. See the [development guide](https://github.com/bentoml/yatai/blob/main/DEVELOPMENT.md).

## Licence

[Elastic License 2.0 (ELv2)](https://github.com/bentoml/yatai/blob/main/LICENSE.md)
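The `my_deployment.yaml` manifest in the "Deploy Bento" step sets CPU and memory `requests` and `limits` for both the API server and the `iris_clf` runner. Kubernetes rejects a container whose request exceeds its limit, so it can help to sanity-check the numbers before applying. The sketch below is an illustration under stated assumptions: `parse_quantity` and `requests_within_limits` are hypothetical helpers that handle only the quantity suffixes appearing in that manifest (`m`, `Ki`, `Mi`, `Gi`), not the full Kubernetes quantity grammar.

```python
# Multipliers for the small subset of Kubernetes quantity suffixes used in the manifest.
SUFFIXES = {"Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "m": 0.001}


def parse_quantity(value: str) -> float:
    """Convert strings such as "500m" or "512Mi" to a plain number."""
    for suffix, factor in SUFFIXES.items():
        if value.endswith(suffix):
            return float(value[: -len(suffix)]) * factor
    return float(value)  # no suffix, e.g. "2" CPUs


def requests_within_limits(resources: dict) -> bool:
    """Return True when every request is less than or equal to its limit."""
    return all(
        parse_quantity(resources["requests"][key]) <= parse_quantity(resources["limits"][key])
        for key in resources["requests"]
    )


# Values copied from the manifest's API server and runner sections.
api_server = {
    "limits": {"cpu": "500m", "memory": "512Mi"},
    "requests": {"cpu": "250m", "memory": "128Mi"},
}
runner = {
    "limits": {"cpu": "1000m", "memory": "1Gi"},
    "requests": {"cpu": "500m", "memory": "512Mi"},
}
print(requests_within_limits(api_server), requests_within_limits(runner))
# → True True
```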

CMS & Blogging DevOps & Infrastructure

844 GitHub Stars

Open Source

CLIP-API-service

<div align="center">
    <h1 align="center">CLIP API Service</h1>
    <br>
    <strong>Discover the effortless integration of OpenAI's innovative CLIP model with our streamlined API service. <br></strong>
    <i>Powered by BentoML 🍱</i>
    <br>
</div>
<br>

> [!CAUTION]
> This repo is deprecated and won't receive updates in the future. Please go to [BentoCLIP](https://github.com/bentoml/BentoCLIP) for the latest usage.

## 📖 Introduction 📖

[CLIP](https://openai.com/research/clip), or Contrastive Language-Image Pretraining, is a cutting-edge AI model that comprehends and connects text and images, revolutionizing how we interpret online data.

This library provides you with an instant, easy-to-use interface for CLIP, allowing you to harness its capabilities without any setup hassles. BentoML takes care of all the complexity of serving the model!

## 🔧 Installation 🔧

Ensure that you have Python 3.8 or newer and `pip` installed on your system. We highly recommend using a Virtual Environment to avoid any potential package conflicts.

To install the service, enter the following command:

```bash
pip install clip-api-service
```

## 🏃 Quick start 🏃

Once the installation process is complete, you can start the service by running:

```bash
clip-api-service serve --model-name=ViT-B-32:openai
```

Your service is now running! Interact with it via the Swagger UI at `localhost:3000`

![SwaggerUI](images/swagger-ui.png)

Or try this tutorial in Google Colab: [CLIP demo](https://colab.research.google.com/github/bentoml/CLIP-API-service/blob/main/example/clip_demo.ipynb).

## 🎯 Use cases 🎯

Harness the capabilities of the CLIP API service across a range of applications:

### Encode

1. Text and Image Embedding
   - Use `encode` to transform text or images into meaningful embeddings. This makes it possible to perform tasks such as:
     1. **Neural Search**: Utilize encoded embeddings to power a search engine capable of understanding and indexing images based on their textual descriptions, and vice versa.
     2. **Custom Ranking**: Design a ranking system based on embeddings, providing unique ways to sort and categorize data according to your context.

### Rank

2. Zero-Shot Image Classification
   - Use `rank` to perform image classification without any training. For example:
     1. Given a set of images, classify an image as being "a picture of a dog" or "a picture of a cat".
     2. More complex classifications such as recognizing different breeds of dogs can also be performed, illustrating the versatility of the CLIP API service.

3. Visual Reasoning
   - The `rank` function can also be used to provide reasoning about visual scenarios. For instance:

| Visual Scenario | Query Image | Candidates | Output |
|-----------------|-------|---------------|--------|
| Counting Objects | ![Three Dog](images/three-dog.jpg) | This is a picture of 1 dog<br>This is a picture of 2 dogs<br>This is a picture of 3 dogs | Image matched with "3 dogs" |
| Identifying Colors | ![Blue Car](images/bluecar.jpeg) | The car is red<br>The car is blue<br>The car is green | Image matched with "blue car" |
| Understanding Motion | ![Parked Car](images/parkedcar.jpeg) | The car is parked<br>The car is moving<br>The car is turning| Image matched with "parked car" |
| Recognizing Location | ![Suburb Car](images/car-suburb.jpeg) | The car is in the suburb<br>The car is on the highway<br>The car is in the street| Image matched with "car in the street" |
| Relative Positioning | ![Big Small car](images/big-small-car.jpg) | The big car is on the left, the small car is on the right<br>The small car is on the left, the big car is on the right| Image matched with the provided description |

## 🚀 Deploying to Production 🚀

Effortlessly transition your project into a production-ready application using [BentoCloud](https://www.bentoml.com/bento-cloud/), the production-ready platform for managing and deploying machine learning models.

Start by creating a BentoCloud account.
Once you've signed up, log in to your BentoCloud account using the command:

```bash
bentoml cloud login --api-token <your-api-token> --endpoint <bento-cloud-endpoint>
```

> Note: Replace `<your-api-token>` and `<bento-cloud-endpoint>` with your specific API token and the BentoCloud endpoint respectively.

Next, build your BentoML service using the `build` command:

```bash
clip-api-service build --model-name=ViT-B-32:openai
```

Then, push your freshly-built Bento service to BentoCloud using the `push` command:

```bash
bentoml push <name:version>
```

Lastly, deploy this application to BentoCloud with a single `bentoml deployment create` command following the [deployment instructions](https://docs.bentoml.org/en/latest/reference/cli.html#bentoml-deployment-create).

BentoML offers a number of options for deploying and hosting online ML services in production. Learn more at [Deploying a Bento](https://docs.bentoml.org/en/latest/concepts/deploy.html).

## 📚 Reference 📚

### API reference

#### `/encode`

Accepts a list of inputs, each containing one of:

* `img_uri` : An image URI, e.g. `https://hips.hearstapps.com/hmg-prod/images/dog-puppy-on-garden-royalty-free-image-1586966191.jpg`
* `text` : A string
* `img_blob` : A Base64-encoded image string

Returns a vector of embeddings of length 768.
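
As a rough sketch of how a client might assemble these inputs in Python: the helper names `encode_item` and `post_json` are illustrative and not part of `clip-api-service`, and the URL in the usage note assumes the quick-start server on `localhost:3000`. Only the three keys documented for `/encode` are used.

```python
import base64
import json
import urllib.request
from typing import Any, Dict, List, Optional


def encode_item(
    text: Optional[str] = None,
    img_uri: Optional[str] = None,
    img_bytes: Optional[bytes] = None,
) -> Dict[str, str]:
    """Build one input item holding exactly one of the documented keys."""
    provided = [value is not None for value in (text, img_uri, img_bytes)]
    if sum(provided) != 1:
        raise ValueError("pass exactly one of text, img_uri, img_bytes")
    if text is not None:
        return {"text": text}
    if img_uri is not None:
        return {"img_uri": img_uri}
    # Raw image bytes are sent as a Base64 string under `img_blob`.
    return {"img_blob": base64.b64encode(img_bytes).decode("ascii")}


def post_json(url: str, payload: Any) -> Any:
    """POST a JSON payload and decode the JSON reply (hypothetical helper)."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"accept": "application/json", "Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


if __name__ == "__main__":
    items: List[Dict[str, str]] = [
        encode_item(text="picture of a dog"),
        encode_item(img_bytes=b"\x89PNG"),  # placeholder bytes, not a real image
    ]
    # Print the request body only; no network call is made here.
    print(json.dumps(items, indent=2))
```

With a server running, `post_json("http://localhost:3000/encode", items)` would send the same kind of JSON array that the `curl` example in this section posts.
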

**Example:**

```
curl -X 'POST' \
  'http://localhost:3000/encode' \
  -H 'accept: application/json' \
  -H 'Content-Type: application/json' \
  -d '[
  {
    "img_uri": "https://hips.hearstapps.com/hmg-prod/images/dog-puppy-on-garden-royalty-free-image-1586966191.jpg"
  },
  {
    "text": "picture of a dog"
  },
  {
    "img_blob": "iVBORw0KGgoAAAANSUhEUgAAABIAAAAPCAYAAADphp8SAAAABHNCSVQICAgIfAhkiAAAABl0RVh0U29mdHdhcmUAZ25vbWUtc2NyZWVuc2hvdO8Dvz4AAAApdEVYdENyZWF0aW9uIFRpbWUARnJpIDI2IE1heSAyMDIzIDA0OjE2OjIxIEFNCXIaIQAAAnhJREFUOI1Fk82OY0UMhT+7qu6dJPfmZ3p6EhYR3QSGnk0P6QV/4rlYsOIBeKrhIQAJtZCgNQsmEWiQSG5V2SwSeixZtiydYx9blpsJPpv2fPP1V3zx5ee8vHnJbDFlNl/Q9GMQBwLUAM7JcXADN9wN3Im1wOaja7afbfnk4xcsVytS25DaFsyptSDiaIyAnMn8kVTOMc6mI+62d2w2G5bLFZOuBxUkJsyNYoIGRVXfTyMCLmByIseJm+trbl58ynL5nK7v0dTgOGgENyQqsWlxdxxH4FEW7og7mBHvtq+4vHxG381IqQF3qjvBnOrgKBIiOR/fA89gMUPdoRpxe/uKbjym6zton4AZ6oZoRNxQUQiJOhzO3R1xeyTiXItXV1csLi4gRjADOIHdEEBEoGSCCOaOmUGtKI4CQQMIxL7vmIwmcBygAdoWAKuV6o5GwUwQB/HzjkTOufC/xZRaSi6IOjHFU1UVrRkrmTwcERViiohDQFANiAmYYbUgBnE8miAiDDlT/j0gQ0aj4qoklZMkHBsKIShBFJXTJLVWai64GbGUQnrSEori7ljO5Gy4gMZIahMpRA7DABZAwXGsVnIulOGIuxFfv/6RUdtwcfmM2XTKfDEnNYlcCjkP1BJIqYUChuNU3J1aM+Us382Jv97f8/d+x3w+YzbtWa5WLD94ztOnCyZdR0QRM8TkdACr1FqoJeNuuFUAwg/fffu9m7Hf73j4/Q9+/uUn3jw8YG6MRyNijFgtJG0wM8owMBwPlJKppZ6+RiA2TWK9XvPhes27f96x3+04Hg/s/vyLe/2N29ueyWxCzQe8Gvu3b0GUXCu7/Y5ijgblP3zyX4rqQyp1AAAAAElFTkSuQmCC"
  }
]'
```

#### `/rank`

Accepts a list of `queries` and a list of `candidates`. As with `/encode`, each item in `queries` and `candidates` contains one of:

* `img_uri` : An image URI, e.g. `https://hips.hearstapps.com/hmg-prod/images/dog-puppy-on-garden-royalty-free-image-1586966191.jpg`
* `text` : A string
* `img_blob` : A Base64-encoded image string

Returns a list of probabilities and cosine similarities of each candidate with respect to the query.

**Example:**

```
curl -X 'POST' \
  'http://localhost:3000/rank' \
  -H 'accept: application/json' \
  -H 'Content-Type: application/json' \
  -d '{
  "queries": [
    {
      "img_uri": "https://hips.hearstapps.com/hmg-prod/images/dog-puppy-on-garden-royalty-free-image-1586966191.jpg"
    }
  ],
  "candidates": [
    {
      "text": "picture of a dog"
    },
    {
      "text": "picture of a cat"
    },
    {
      "text": "picture of a bird"
    },
    {
      "text": "picture of a car"
    },
    {
      "text": "picture of a plane"
    },
    {
      "text": "picture of a boat"
    }
  ]
}'
```

And the response looks like:

```
{
  "probabilities": [
    [
      0.9958375692367554,
      0.0022114247549325228,
      0.001514736912213266,
      0.00011969256593147293,
      0.00019143625104334205,
      0.0001251235808013007
    ]
  ],
  "cosine_similarities": [
    [
      0.2297772467136383,
      0.16867777705192566,
      0.16489382088184357,
      0.13951312005519867,
      0.14420939981937408,
      0.13995687663555145
    ]
  ]
}
```

### CLI reference

#### `serve`

Spins up an HTTP server with the model of your choice.

Arguments:

* `--model-name` : Name of the CLIP model. Use `list_models` to see the list of available models. Default: `openai/clip-vit-large-patch14`

#### `build`

Builds a Bento with the model of your choice.

Arguments:

* `--model-name` : Name of the CLIP model. Use `list_models` to see the list of available models. Default: `openai/clip-vit-large-patch14`

#### `list_models`

Lists all available CLIP models.
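
Returning to the `/rank` response shown in the API reference, the sketch below illustrates one way a client might read it: pick the candidate with the highest probability for a query. The helper names `best_match` and `scaled_softmax` are illustrative, not part of `clip-api-service`. The second helper rests on an assumption: the sample numbers are consistent with a softmax over cosine similarities multiplied by 100 (a logit scale commonly used by CLIP models), but this document does not state how the service computes its probabilities.

```python
import math
from typing import Dict, List, Sequence, Tuple

# Candidate texts and response values copied from the /rank example.
SAMPLE_CANDIDATES = [
    "picture of a dog", "picture of a cat", "picture of a bird",
    "picture of a car", "picture of a plane", "picture of a boat",
]
SAMPLE_RESPONSE: Dict[str, List[List[float]]] = {
    "probabilities": [[
        0.9958375692367554, 0.0022114247549325228, 0.001514736912213266,
        0.00011969256593147293, 0.00019143625104334205, 0.0001251235808013007,
    ]],
    "cosine_similarities": [[
        0.2297772467136383, 0.16867777705192566, 0.16489382088184357,
        0.13951312005519867, 0.14420939981937408, 0.13995687663555145,
    ]],
}


def best_match(
    response: Dict[str, List[List[float]]],
    candidates: Sequence[str],
    query_index: int = 0,
) -> Tuple[str, float]:
    """Return the highest-probability candidate for one query."""
    probs = response["probabilities"][query_index]
    top = max(range(len(probs)), key=probs.__getitem__)
    return candidates[top], probs[top]


def scaled_softmax(similarities: Sequence[float], scale: float) -> List[float]:
    """Softmax over `scale * similarity`; the scale value is an assumption."""
    logits = [scale * s for s in similarities]
    peak = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


if __name__ == "__main__":
    label, prob = best_match(SAMPLE_RESPONSE, SAMPLE_CANDIDATES)
    print(f"best candidate: {label!r} (probability {prob:.4f})")
    recomputed = scaled_softmax(SAMPLE_RESPONSE["cosine_similarities"][0], 100.0)
    for reported, estimate in zip(SAMPLE_RESPONSE["probabilities"][0], recomputed):
        print(f"reported={reported:.6f} recomputed={estimate:.6f}")
```

Because the raw cosine similarities sit in a narrow band, the probabilities are usually the easier field to threshold on, while the similarities are more useful when comparing scores across separate requests.
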

