
huggingface

Professional software vendor delivering innovative solutions on the Softono platform. Specialized in both open-source and proprietary software development.

Total Products
20

Software by huggingface

chat-ui
Open Source


# Chat UI

![Chat UI repository thumbnail](https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/chat-ui/chat-ui-2026.png)

A chat interface for LLMs. It is a SvelteKit app and it powers the [HuggingChat app on hf.co/chat](https://huggingface.co/chat).

0. [Quickstart](#quickstart)
1. [Database Options](#database-options)
2. [Launch](#launch)
3. [Optional Docker Image](#optional-docker-image)
4. [Extra parameters](#extra-parameters)
5. [Building](#building)

> [!NOTE]
> Chat UI only supports OpenAI-compatible APIs via `OPENAI_BASE_URL` and the `/models` endpoint. Provider-specific integrations (legacy `MODELS` env var, GGUF discovery, embeddings, web-search helpers, etc.) are removed, but any service that speaks the OpenAI protocol (llama.cpp server, Ollama, OpenRouter, etc.) will work by default.

> [!NOTE]
> The old version is still available on the [legacy branch](https://github.com/huggingface/chat-ui/tree/legacy)

## Quickstart

Chat UI speaks to OpenAI-compatible APIs only. The fastest way to get running is with the Hugging Face Inference Providers router plus your personal Hugging Face access token.

**Step 1 – Create `.env.local`:**

```env
OPENAI_BASE_URL=https://router.huggingface.co/v1
OPENAI_API_KEY=hf_************************
```

`OPENAI_API_KEY` can come from any OpenAI-compatible endpoint you plan to call.
Pick the combo that matches your setup and drop the values into `.env.local`:

| Provider                                      | Example `OPENAI_BASE_URL`          | Example key env                                                         |
| --------------------------------------------- | ---------------------------------- | ----------------------------------------------------------------------- |
| Hugging Face Inference Providers router       | `https://router.huggingface.co/v1` | `OPENAI_API_KEY=hf_xxx` (or `HF_TOKEN` legacy alias)                    |
| llama.cpp server (`llama.cpp --server --api`) | `http://127.0.0.1:8080/v1`         | `OPENAI_API_KEY=sk-local-demo` (any string works; llama.cpp ignores it) |
| Ollama (with OpenAI-compatible bridge)        | `http://127.0.0.1:11434/v1`        | `OPENAI_API_KEY=ollama`                                                 |
| OpenRouter                                    | `https://openrouter.ai/api/v1`     | `OPENAI_API_KEY=sk-or-v1-...`                                           |
| Poe                                           | `https://api.poe.com/v1`           | `OPENAI_API_KEY=pk_...`                                                 |

Check the root [`.env` template](./.env) for the full list of optional variables you can override.

**Step 2 – Install and launch the dev server:**

```bash
git clone https://github.com/huggingface/chat-ui
cd chat-ui
npm install
npm run dev -- --open
```

You now have Chat UI running locally. Open the browser and start chatting.

## Database Options

Chat history, users, settings, files, and stats all live in MongoDB. You can point Chat UI at any MongoDB 6/7 deployment.

> [!TIP]
> For quick local development, you can skip this section. When `MONGODB_URL` is not set, Chat UI falls back to an embedded MongoDB that persists to `./db`.

### MongoDB Atlas (managed)

1. Create a free cluster at [mongodb.com](https://www.mongodb.com/pricing).
2. Add your IP (or `0.0.0.0/0` for development) to the network access list.
3. Create a database user and copy the connection string.
4. Paste that string into `MONGODB_URL` in `.env.local`. Keep the default `MONGODB_DB_NAME=chat-ui` or change it per environment.

Atlas keeps MongoDB off your laptop, which is ideal for teams or cloud deployments.
### Local MongoDB (container)

If you prefer to run MongoDB in a container:

```bash
docker run -d -p 27017:27017 --name mongo-chatui mongo:latest
```

Then set `MONGODB_URL=mongodb://localhost:27017` in `.env.local`.

## Launch

After configuring your environment variables, start Chat UI with:

```bash
npm install
npm run dev
```

The dev server listens on `http://localhost:5173` by default. Use `npm run build` / `npm run preview` for production builds.

## Optional Docker Image

The `chat-ui-db` image bundles MongoDB inside the container:

```bash
docker run \
  -p 3000:3000 \
  -e OPENAI_BASE_URL=https://router.huggingface.co/v1 \
  -e OPENAI_API_KEY=hf_*** \
  -v chat-ui-data:/data \
  ghcr.io/huggingface/chat-ui-db:latest
```

All environment variables accepted in `.env.local` can be provided as `-e` flags.

## Extra parameters

### Theming

You can use a few environment variables to customize the look and feel of chat-ui. The defaults are:

```env
PUBLIC_APP_NAME=ChatUI
PUBLIC_APP_ASSETS=chatui
PUBLIC_APP_DESCRIPTION="Making the community's best AI chat models available to everyone."
PUBLIC_APP_DATA_SHARING=
```

- `PUBLIC_APP_NAME` The name used as a title throughout the app.
- `PUBLIC_APP_ASSETS` Used to find logos and favicons in `static/$PUBLIC_APP_ASSETS`; current options are `chatui` and `huggingchat`.
- `PUBLIC_APP_DATA_SHARING` Can be set to 1 to add a toggle in the user settings that lets your users opt in to data sharing with model creators.

### Models

Models are discovered from `${OPENAI_BASE_URL}/models`, and you can optionally override their metadata via the `MODELS` env var (JSON5). Legacy provider-specific integrations and GGUF discovery are removed. Authorization uses `OPENAI_API_KEY` (preferred). `HF_TOKEN` remains a legacy alias.

### LLM Router (Optional)

Chat UI can perform server-side smart routing using a local heuristic; no separate router service or selection model is called.
The UI exposes a virtual model alias called "Omni" (configurable) that, when selected, chooses the best route/model for each message: image inputs go to a `multimodal` route, MCP-tool-enabled requests go to an `agentic` route, and everything else goes to a `default` route.

- Provide a routes policy JSON via `LLM_ROUTER_ROUTES_PATH`. No sample file ships with this branch, so you must point the variable to a JSON array you create yourself (for example, commit one in your project like `config/routes.chat.json`). Each route entry needs `name`, `description`, `primary_model`, and optional `fallback_models`. The router recognizes the route names `default`, `multimodal`, and `agentic`.
- The default route name is configurable via `LLM_ROUTER_DEFAULT_ROUTE` (default: `default`). If the selected route's models all fail, calls fall back to `LLM_ROUTER_FALLBACK_MODEL`.
- Omni alias configuration: `PUBLIC_LLM_ROUTER_ALIAS_ID` (default `omni`), `PUBLIC_LLM_ROUTER_DISPLAY_NAME` (default `Omni`), and optional `PUBLIC_LLM_ROUTER_LOGO_URL`.

When you select Omni in the UI, Chat UI will:

- Pick a route locally based on the request signals (image attached, MCP server enabled, or default).
- Emit RouterMetadata immediately (route and actual model used) so the UI can display it.
- Stream from the selected model via your configured `OPENAI_BASE_URL`. On errors, it tries route fallbacks in order, then `LLM_ROUTER_FALLBACK_MODEL`.

Tool and multimodal shortcuts:

- Multimodal: If `LLM_ROUTER_ENABLE_MULTIMODAL=true` and the user sends an image, the router bypasses the policy file and uses the model specified in `LLM_ROUTER_MULTIMODAL_MODEL`. Route name: `multimodal`.
- Tools: If `LLM_ROUTER_ENABLE_TOOLS=true` and the user has at least one MCP server enabled, the router bypasses the policy file and uses `LLM_ROUTER_TOOLS_MODEL`. If that model is missing or misconfigured, it falls back to the heuristic route. Route name: `agentic`.
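The routing rules described above can be sketched in a few lines of Python. This is only an illustration of the documented behavior, not Chat UI's actual implementation (which is a SvelteKit app): the `Route` class, the function names, and the model ids in the sample policy are hypothetical, and the choice to check for an image before checking for MCP servers is an assumption, since the text does not say which signal wins when both are present.

```python
import json
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Route:
    """One entry of the routes policy JSON array (field names from the text above)."""
    name: str
    description: str
    primary_model: str
    fallback_models: list = field(default_factory=list)


def load_routes(policy_json: str) -> dict:
    """Parse a routes policy (a JSON array) into a name -> Route mapping."""
    return {entry["name"]: Route(**entry) for entry in json.loads(policy_json)}


def pick_route(has_image: bool, mcp_enabled: bool, default_route: str = "default") -> str:
    """Choose a route name from request signals.

    Assumption: an attached image takes precedence over enabled MCP servers.
    """
    if has_image:
        return "multimodal"
    if mcp_enabled:
        return "agentic"
    return default_route


def candidate_models(route: Route, fallback_model: Optional[str]) -> list:
    """Models to try in order: primary, route fallbacks, then the global fallback."""
    models = [route.primary_model, *route.fallback_models]
    if fallback_model and fallback_model not in models:
        models.append(fallback_model)
    return models


# Hypothetical policy file contents; the model ids are placeholders.
SAMPLE_POLICY = """
[
  {"name": "default", "description": "General chat",
   "primary_model": "example-org/text-model",
   "fallback_models": ["example-org/small-text-model"]},
  {"name": "multimodal", "description": "Image inputs",
   "primary_model": "example-org/vision-model"},
  {"name": "agentic", "description": "Tool calling",
   "primary_model": "example-org/tools-model"}
]
"""

routes = load_routes(SAMPLE_POLICY)
chosen = pick_route(has_image=False, mcp_enabled=True)
print(chosen, candidate_models(routes[chosen], "example-org/last-resort-model"))
```

Keeping the route choice separate from the list of models to try mirrors the two steps in the text: the route is picked locally from request signals, and fallbacks only come into play when a call fails.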
### MCP Tools (Optional)

Chat UI can call tools exposed by Model Context Protocol (MCP) servers and feed results back to the model using OpenAI function calling. You can preconfigure trusted servers via env, let users add their own, and optionally have the Omni router auto-select a tools-capable model.

Configure servers (base list for all users):

```env
# JSON array of servers: name, url, optional headers
MCP_SERVERS=[
  {"name": "Web Search (Exa)", "url": "https://mcp.exa.ai/mcp"},
  {"name": "Hugging Face MCP Login", "url": "https://hf.co/mcp?login"}
]

# Forward the signed-in user's Hugging Face token to the official HF MCP login endpoint
# when no Authorization header is set on that server entry.
MCP_FORWARD_HF_USER_TOKEN=true
```

Enable router tool path (Omni):

- Set `LLM_ROUTER_ENABLE_TOOLS=true` and choose a tools-capable target with `LLM_ROUTER_TOOLS_MODEL=<model id or name>`.
- The target must support OpenAI tools/function calling. Chat UI surfaces a "tools" badge on models that advertise this; you can also force-enable it per-model in settings (see below).

Use tools in the UI:

- Open "MCP Servers" from the top-right menu or from the `+` menu in the chat input to add servers, toggle them on, and run Health Check. The server card lists available tools.
- When a model calls a tool, the message shows a compact "tool" block with parameters, a progress bar while running, and the result (or error). Results are also provided back to the model for follow-up.

Per-model overrides:

- In Settings → Model, you can toggle "Tool calling (functions)" and "Multimodal input" per model. These overrides apply even if the provider metadata doesn't advertise the capability.

## Building

To create a production version of your app:

```bash
npm run build
```

You can preview the production build with `npm run preview`.

> To deploy your app, you may need to install an [adapter](https://kit.svelte.dev/docs/adapters) for your target environment.
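Because Chat UI discovers models from `${OPENAI_BASE_URL}/models`, it can help to query that endpoint yourself before launching the app. The following is a minimal sketch using only the Python standard library; the helper names are hypothetical, and it assumes the endpoint returns the usual OpenAI-style list response with a `data` array of objects that each carry an `id`.

```python
import json
import urllib.request
from typing import Optional


def build_models_request(base_url: str, api_key: Optional[str] = None) -> urllib.request.Request:
    """Build a GET request for `<base_url>/models` with an optional bearer token."""
    url = base_url.rstrip("/") + "/models"
    headers = {"Accept": "application/json"}
    if api_key:
        headers["Authorization"] = f"Bearer {api_key}"
    return urllib.request.Request(url, headers=headers)


def list_model_ids(base_url: str, api_key: Optional[str] = None, timeout: float = 10.0) -> list:
    """Return the model ids advertised by an OpenAI-compatible server.

    Assumes a response shaped like {"data": [{"id": "..."}, ...]}.
    """
    request = build_models_request(base_url, api_key)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        payload = json.load(response)
    return [model["id"] for model in payload.get("data", [])]


# Example call (performs a network request, so it is left commented out):
# import os
# print(list_model_ids(os.environ["OPENAI_BASE_URL"], os.environ.get("OPENAI_API_KEY")))
```

If the list comes back empty or the call fails, the same base URL and key are unlikely to work in `.env.local`.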

LLM Tools & Chat UIs Live Chat & Chatbots
10.8K Github Stars
transformers
Open Source


<!--- Copyright 2020 The HuggingFace Team. All rights reserved. Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0 Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License. --> <p align="center"> <picture> <source media="(prefers-color-scheme: dark)" srcset="https://huggingface.co/datasets/huggingface/documentation-images/raw/main/transformers-logo-dark.svg"> <source media="(prefers-color-scheme: light)" srcset="https://huggingface.co/datasets/huggingface/documentation-images/raw/main/transformers-logo-light.svg"> <img alt="Hugging Face Transformers Library" src="https://huggingface.co/datasets/huggingface/documentation-images/raw/main/transformers-logo-light.svg" width="352" height="59" style="max-width: 100%;"> </picture> <br/> <br/> </p> <p align="center"> <a href="https://huggingface.com/models"><img alt="Checkpoints on Hub" src="https://img.shields.io/endpoint?url=https://huggingface.co/api/shields/models&color=brightgreen"></a> <a href="https://circleci.com/gh/huggingface/transformers"><img alt="Build" src="https://img.shields.io/circleci/build/github/huggingface/transformers/main"></a> <a href="https://github.com/huggingface/transformers/blob/main/LICENSE"><img alt="GitHub" src="https://img.shields.io/github/license/huggingface/transformers.svg?color=blue"></a> <a href="https://huggingface.co/docs/transformers/index"><img alt="Documentation" src="https://img.shields.io/website/http/huggingface.co/docs/transformers/index.svg?down_color=red&down_message=offline&up_message=online"></a> <a href="https://github.com/huggingface/transformers/releases"><img 
alt="GitHub release" src="https://img.shields.io/github/release/huggingface/transformers.svg"></a> <a href="https://github.com/huggingface/transformers/blob/main/CODE_OF_CONDUCT.md"><img alt="Contributor Covenant" src="https://img.shields.io/badge/Contributor%20Covenant-v2.0%20adopted-ff69b4.svg"></a> <a href="https://zenodo.org/badge/latestdoi/155220641"><img src="https://zenodo.org/badge/155220641.svg" alt="DOI"></a> </p> <h4 align="center"> <p> <b>English</b> | <a href="https://github.com/huggingface/transformers/blob/main/i18n/README_zh-hans.md">简体中文</a> | <a href="https://github.com/huggingface/transformers/blob/main/i18n/README_zh-hant.md">繁體中文</a> | <a href="https://github.com/huggingface/transformers/blob/main/i18n/README_ko.md">한국어</a> | <a href="https://github.com/huggingface/transformers/blob/main/i18n/README_es.md">Español</a> | <a href="https://github.com/huggingface/transformers/blob/main/i18n/README_ja.md">日本語</a> | <a href="https://github.com/huggingface/transformers/blob/main/i18n/README_hd.md">हिन्दी</a> | <a href="https://github.com/huggingface/transformers/blob/main/i18n/README_ru.md">Русский</a> | <a href="https://github.com/huggingface/transformers/blob/main/i18n/README_pt-br.md">Português</a> | <a href="https://github.com/huggingface/transformers/blob/main/i18n/README_te.md">తెలుగు</a> | <a href="https://github.com/huggingface/transformers/blob/main/i18n/README_fr.md">Français</a> | <a href="https://github.com/huggingface/transformers/blob/main/i18n/README_de.md">Deutsch</a> | <a href="https://github.com/huggingface/transformers/blob/main/i18n/README_it.md">Italiano</a> | <a href="https://github.com/huggingface/transformers/blob/main/i18n/README_vi.md">Tiếng Việt</a> | <a href="https://github.com/huggingface/transformers/blob/main/i18n/README_ar.md">العربية</a> | <a href="https://github.com/huggingface/transformers/blob/main/i18n/README_ur.md">اردو</a> | <a 
href="https://github.com/huggingface/transformers/blob/main/i18n/README_bn.md">বাংলা</a> |
        <a href="https://github.com/huggingface/transformers/blob/main/i18n/README_fa.md">فارسی</a> |
    </p>
</h4>

<h3 align="center">
    <p>State-of-the-art pretrained models for inference and training</p>
</h3>

<h3 align="center">
    <img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/transformers_as_a_model_definition.png"/>
</h3>

Transformers acts as the model-definition framework for state-of-the-art machine learning with text, computer vision, audio, video, and multimodal models, for both inference and training.

It centralizes the model definition so that this definition is agreed upon across the ecosystem. `transformers` is the pivot across frameworks: if a model definition is supported, it will be compatible with the majority of training frameworks (Axolotl, Unsloth, DeepSpeed, FSDP, PyTorch-Lightning, ...), inference engines (vLLM, SGLang, TGI, ...), and adjacent modeling libraries (llama.cpp, mlx, ...) which leverage the model definition from `transformers`.

We pledge to help support new state-of-the-art models and democratize their usage by having their model definition be simple, customizable, and efficient.

There are over 1M Transformers [model checkpoints](https://huggingface.co/models?library=transformers&sort=trending) on the [Hugging Face Hub](https://huggingface.co/models) you can use.

Explore the [Hub](https://huggingface.co/) today to find a model and use Transformers to help you get started right away.

## Installation

Transformers works with Python 3.10+ and [PyTorch](https://pytorch.org/get-started/locally/) 2.4+.

Create and activate a virtual environment with [venv](https://docs.python.org/3/library/venv.html) or [uv](https://docs.astral.sh/uv/), a fast Rust-based Python package and project manager.
```shell
# venv
python -m venv .my-env
source .my-env/bin/activate
# uv
uv venv .my-env
source .my-env/bin/activate
```

Install Transformers in your virtual environment.

```shell
# pip
pip install "transformers[torch]"

# uv
uv pip install "transformers[torch]"
```

Install Transformers from source if you want the latest changes in the library or are interested in contributing. However, the *latest* version may not be stable. Feel free to open an [issue](https://github.com/huggingface/transformers/issues) if you encounter an error.

```shell
git clone https://github.com/huggingface/transformers.git
cd transformers

# pip
pip install '.[torch]'

# uv
uv pip install '.[torch]'
```

## Quickstart

Get started with Transformers right away with the [Pipeline](https://huggingface.co/docs/transformers/pipeline_tutorial) API. The `Pipeline` is a high-level inference class that supports text, audio, vision, and multimodal tasks. It handles preprocessing the input and returns the appropriate output.

Instantiate a pipeline and specify the model to use for text generation. The model is downloaded and cached so you can easily reuse it. Finally, pass some text to prompt the model.

```py
from transformers import pipeline

pipeline = pipeline(task="text-generation", model="Qwen/Qwen2.5-1.5B")
pipeline("the secret to baking a really good cake is ")
[{'generated_text': 'the secret to baking a really good cake is 1) to use the right ingredients and 2) to follow the recipe exactly. the recipe for the cake is as follows: 1 cup of sugar, 1 cup of flour, 1 cup of milk, 1 cup of butter, 1 cup of eggs, 1 cup of chocolate chips. if you want to make 2 cakes, how much sugar do you need? To make 2 cakes, you will need 2 cups of sugar.'}]
```

To chat with a model, the usage pattern is the same. The only difference is you need to construct a chat history (the input to `Pipeline`) between you and the system.
> [!TIP]
> You can also chat with a model directly from the command line, as long as [`transformers serve` is running](https://huggingface.co/docs/transformers/main/en/serving).
> ```shell
> transformers chat Qwen/Qwen2.5-0.5B-Instruct
> ```

```py
import torch
from transformers import pipeline

chat = [
    {"role": "system", "content": "You are a sassy, wise-cracking robot as imagined by Hollywood circa 1986."},
    {"role": "user", "content": "Hey, can you tell me any fun things to do in New York?"}
]

pipeline = pipeline(task="text-generation", model="meta-llama/Meta-Llama-3-8B-Instruct", dtype=torch.bfloat16, device_map="auto")
response = pipeline(chat, max_new_tokens=512)
print(response[0]["generated_text"][-1]["content"])
```

Expand the examples below to see how `Pipeline` works for different modalities and tasks.

<details>
<summary>Automatic speech recognition</summary>

```py
from transformers import pipeline

pipeline = pipeline(task="automatic-speech-recognition", model="openai/whisper-large-v3")
pipeline("https://huggingface.co/datasets/Narsil/asr_dummy/resolve/main/mlk.flac")
{'text': ' I have a dream that one day this nation will rise up and live out the true meaning of its creed.'}
```

</details>

<details>
<summary>Image classification</summary>

<h3 align="center">
    <a><img src="https://huggingface.co/datasets/Narsil/image_dummy/raw/main/parrots.png"></a>
</h3>

```py
from transformers import pipeline

pipeline = pipeline(task="image-classification", model="facebook/dinov2-small-imagenet1k-1-layer")
pipeline("https://huggingface.co/datasets/Narsil/image_dummy/raw/main/parrots.png")
[{'label': 'macaw', 'score': 0.997848391532898},
 {'label': 'sulphur-crested cockatoo, Kakatoe galerita, Cacatua galerita',
  'score': 0.0016551691805943847},
 {'label': 'lorikeet', 'score': 0.00018523589824326336},
 {'label': 'African grey, African gray, Psittacus erithacus',
  'score': 7.85409429227002e-05},
 {'label': 'quail', 'score': 5.502637941390276e-05}]
```

</details>

<details>
<summary>Visual question answering</summary>

<h3 align="center">
    <a><img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/tasks/idefics-few-shot.jpg"></a>
</h3>

```py
from transformers import pipeline

pipeline = pipeline(task="visual-question-answering", model="Salesforce/blip-vqa-base")
pipeline(
    image="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/tasks/idefics-few-shot.jpg",
    question="What is in the image?",
)
[{'answer': 'statue of liberty'}]
```

</details>

## Why should I use Transformers?

1. Easy-to-use state-of-the-art models:
    - High performance on natural language understanding & generation, computer vision, audio, video, and multimodal tasks.
    - Low barrier to entry for researchers, engineers, and developers.
    - Few user-facing abstractions with just three classes to learn.
    - A unified API for using all our pretrained models.

1. Lower compute costs, smaller carbon footprint:
    - Share trained models instead of training from scratch.
    - Reduce compute time and production costs.
    - Hundreds of model architectures with 1M+ pretrained checkpoints across all modalities.

1. Choose the right framework for every part of a model's lifetime:
    - Train state-of-the-art models in 3 lines of code.
    - Move a single model between PyTorch/JAX/TF2.0 frameworks at will.
    - Pick the right framework for training, evaluation, and production.

1. Easily customize a model or an example to your needs:
    - We provide examples for each architecture to reproduce the results published by its original authors.
    - Model internals are exposed as consistently as possible.
    - Model files can be used independently of the library for quick experiments.

<a target="_blank" href="https://huggingface.co/enterprise">
    <img alt="Hugging Face Enterprise Hub" src="https://github.com/user-attachments/assets/247fb16d-d251-4583-96c4-d3d76dda4925">
</a><br>

## When shouldn't I use Transformers?
- This library is not a modular toolbox of building blocks for neural nets. The code in the model files is deliberately not refactored with additional abstractions, so that researchers can quickly iterate on each of the models without diving into additional abstractions/files.
- The training API is optimized to work with PyTorch models provided by Transformers. For generic machine learning loops, you should use another library like [Accelerate](https://huggingface.co/docs/accelerate).
- The [example scripts](https://github.com/huggingface/transformers/tree/main/examples) are only *examples*. They may not work out-of-the-box on your specific use case, and you may need to adapt the code for it to work.

## 100 projects using Transformers

Transformers is more than a toolkit to use pretrained models; it's a community of projects built around it and the Hugging Face Hub. We want Transformers to enable developers, researchers, students, professors, engineers, and anyone else to build their dream projects.

To celebrate Transformers' 100,000 stars, we wanted to put the spotlight on the community with the [awesome-transformers](./awesome-transformers.md) page, which lists 100 incredible projects built with Transformers.

If you own or use a project that you believe should be part of the list, please open a PR to add it!

## Example models

You can test most of our models directly on their [Hub model pages](https://huggingface.co/models).

Expand each modality below to see a few example models for various use cases.
<details>
<summary>Audio</summary>

- Audio classification with [CLAP](https://huggingface.co/laion/clap-htsat-fused)
- Automatic speech recognition with [Parakeet](https://huggingface.co/nvidia/parakeet-ctc-1.1b#transcribing-using-transformers-%F0%9F%A4%97), [Whisper](https://huggingface.co/openai/whisper-large-v3-turbo), [GLM-ASR](https://huggingface.co/zai-org/GLM-ASR-Nano-2512) and [Moonshine-Streaming](https://huggingface.co/UsefulSensors/moonshine-streaming-medium)
- Keyword spotting with [Wav2Vec2](https://huggingface.co/superb/wav2vec2-base-superb-ks)
- Speech to speech generation with [Moshi](https://huggingface.co/kyutai/moshiko-pytorch-bf16)
- Text to audio with [MusicGen](https://huggingface.co/facebook/musicgen-large)
- Text to speech with [CSM](https://huggingface.co/sesame/csm-1b)

</details>

<details>
<summary>Computer vision</summary>

- Automatic mask generation with [SAM](https://huggingface.co/facebook/sam-vit-base)
- Depth estimation with [DepthPro](https://huggingface.co/apple/DepthPro-hf)
- Image classification with [DINO v2](https://huggingface.co/facebook/dinov2-base)
- Keypoint detection with [SuperPoint](https://huggingface.co/magic-leap-community/superpoint)
- Keypoint matching with [SuperGlue](https://huggingface.co/magic-leap-community/superglue_outdoor)
- Object detection with [RT-DETRv2](https://huggingface.co/PekingU/rtdetr_v2_r50vd)
- Pose Estimation with [VitPose](https://huggingface.co/usyd-community/vitpose-base-simple)
- Universal segmentation with [OneFormer](https://huggingface.co/shi-labs/oneformer_ade20k_swin_large)
- Video classification with [VideoMAE](https://huggingface.co/MCG-NJU/videomae-large)

</details>

<details>
<summary>Multimodal</summary>

- Audio or text to text with [Voxtral](https://huggingface.co/mistralai/Voxtral-Mini-3B-2507), [Audio Flamingo](https://huggingface.co/nvidia/audio-flamingo-3-hf)
- Document question answering with [LayoutLMv3](https://huggingface.co/microsoft/layoutlmv3-base)
- Image or text to text with [Qwen-VL](https://huggingface.co/Qwen/Qwen2.5-VL-3B-Instruct)
- Image captioning with [BLIP-2](https://huggingface.co/Salesforce/blip2-opt-2.7b)
- OCR-based document understanding with [GOT-OCR2](https://huggingface.co/stepfun-ai/GOT-OCR-2.0-hf)
- Table question answering with [TAPAS](https://huggingface.co/google/tapas-base)
- Unified multimodal understanding and generation with [Emu3](https://huggingface.co/BAAI/Emu3-Gen)
- Vision to text with [Llava-OneVision](https://huggingface.co/llava-hf/llava-onevision-qwen2-0.5b-ov-hf)
- Visual question answering with [Llava](https://huggingface.co/llava-hf/llava-1.5-7b-hf)
- Visual referring expression segmentation with [Kosmos-2](https://huggingface.co/microsoft/kosmos-2-patch14-224)

</details>

<details>
<summary>NLP</summary>

- Masked word completion with [ModernBERT](https://huggingface.co/answerdotai/ModernBERT-base)
- Named entity recognition with [Gemma](https://huggingface.co/google/gemma-2-2b)
- Question answering with [Mixtral](https://huggingface.co/mistralai/Mixtral-8x7B-v0.1)
- Summarization with [BART](https://huggingface.co/facebook/bart-large-cnn)
- Translation with [T5](https://huggingface.co/google-t5/t5-base)
- Text generation with [Llama](https://huggingface.co/meta-llama/Llama-3.2-1B)
- Text classification with [Qwen](https://huggingface.co/Qwen/Qwen2.5-0.5B)

</details>

## Citation

We now have a [paper](https://aclanthology.org/2020.emnlp-demos.6/) you can cite for the 🤗 Transformers library:

```bibtex
@inproceedings{wolf-etal-2020-transformers,
    title = "Transformers: State-of-the-Art Natural Language Processing",
    author = "Thomas Wolf and Lysandre Debut and Victor Sanh and Julien Chaumond and Clement Delangue and Anthony Moi and Pierric Cistac and Tim Rault and Rémi Louf and Morgan Funtowicz and Joe Davison and Sam Shleifer and Patrick von Platen and Clara Ma and Yacine Jernite and Julien Plu and Canwen Xu and Teven Le Scao and Sylvain Gugger and Mariama Drame and Quentin Lhoest and Alexander M. Rush",
    booktitle = "Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: System Demonstrations",
    month = oct,
    year = "2020",
    address = "Online",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2020.emnlp-demos.6/",
    pages = "38--45"
}
```

AI & Machine Learning ML Frameworks
161.4K Github Stars
datasets
Open Source


TFDS is a collection of datasets ready to use with TensorFlow, Jax, ...

ML Frameworks Data Labeling
4.6K Github Stars
peft
Open Source


<!---
Copyright 2023 The HuggingFace Team. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->

<h1 align="center"> <p>🤗 PEFT</p></h1>
<h3 align="center">
    <p>State-of-the-art Parameter-Efficient Fine-Tuning (PEFT) methods</p>
</h3>

Fine-tuning large pretrained models is often prohibitively costly due to their scale. Parameter-Efficient Fine-Tuning (PEFT) methods enable efficient adaptation of large pretrained models to various downstream applications by only fine-tuning a small number of (extra) model parameters instead of all the model's parameters. This significantly decreases the computational and storage costs. Recent state-of-the-art PEFT techniques achieve performance comparable to fully fine-tuned models.

PEFT is integrated with Transformers for easy model training and inference, Diffusers for conveniently managing different adapters, and Accelerate for distributed training and inference for really big models.

> [!TIP]
> Visit the [PEFT](https://huggingface.co/PEFT) organization to read about the PEFT methods implemented in the library and to see notebooks demonstrating how to apply these methods to a variety of downstream tasks. Click the "Watch repos" button on the organization page to be notified of newly implemented methods and notebooks!
Check the PEFT Adapters API Reference section for a list of supported PEFT methods, and read the [Adapters](https://huggingface.co/docs/peft/en/conceptual_guides/adapter), [Soft prompts](https://huggingface.co/docs/peft/en/conceptual_guides/prompting), and [IA3](https://huggingface.co/docs/peft/en/conceptual_guides/ia3) conceptual guides to learn more about how these methods work.

## Quickstart

Install PEFT from pip:

```bash
pip install peft
```

Prepare a model for training with a PEFT method such as LoRA by wrapping the base model and PEFT configuration with `get_peft_model`. For the Qwen/Qwen2.5-3B-Instruct model used below, you're only training about 0.12% of the parameters!

```python
import torch
from transformers import AutoModelForCausalLM
from peft import LoraConfig, TaskType, get_peft_model

# Use the current accelerator if this PyTorch version exposes the API, else fall back to CUDA
device = torch.accelerator.current_accelerator().type if hasattr(torch, "accelerator") else "cuda"
model_id = "Qwen/Qwen2.5-3B-Instruct"
model = AutoModelForCausalLM.from_pretrained(model_id, device_map=device)
peft_config = LoraConfig(
    r=16,
    lora_alpha=32,
    task_type=TaskType.CAUSAL_LM,
    # target_modules=["q_proj", "v_proj", ...]  # optionally indicate target modules
)
model = get_peft_model(model, peft_config)
model.print_trainable_parameters()
# prints: trainable params: 3,686,400 || all params: 3,089,625,088 || trainable%: 0.1193

# now perform training on your dataset, e.g. using transformers Trainer, then save the model
model.save_pretrained("qwen2.5-3b-lora")
```

To load a PEFT model for inference:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

# Use the current accelerator if this PyTorch version exposes the API, else fall back to CUDA
device = torch.accelerator.current_accelerator().type if hasattr(torch, "accelerator") else "cuda"
model_id = "Qwen/Qwen2.5-3B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map=device)
# Attach the LoRA adapter saved in the training step above
model = PeftModel.from_pretrained(model, "qwen2.5-3b-lora")

inputs = tokenizer("Preheat the oven to 350 degrees and place the cookie dough", return_tensors="pt")
outputs = model.generate(**inputs.to(device), max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))

# prints something like: Preheat the oven to 350 degrees and place the cookie dough in a baking dish [...]
```

## Why you should use PEFT

There are many benefits of using PEFT but the main one is the huge savings in compute and storage, making PEFT applicable to many different use cases.

### High performance on consumer hardware

Consider the memory requirements for training the following models on the [ought/raft/twitter_complaints](https://huggingface.co/datasets/ought/raft/viewer/twitter_complaints) dataset with an A100 80GB GPU with more than 64GB of CPU RAM.

| Model | Full Finetuning | PEFT-LoRA PyTorch | PEFT-LoRA DeepSpeed with CPU Offloading |
| --------- | ---- | ---- | ---- |
| bigscience/T0_3B (3B params) | 47.14GB GPU / 2.96GB CPU | 14.4GB GPU / 2.96GB CPU | 9.8GB GPU / 17.8GB CPU |
| bigscience/mt0-xxl (12B params) | OOM GPU | 56GB GPU / 3GB CPU | 22GB GPU / 52GB CPU |
| bigscience/bloomz-7b1 (7B params) | OOM GPU | 32GB GPU / 3.8GB CPU | 18.1GB GPU / 35GB CPU |

With LoRA you can finetune a 12B parameter model that would've otherwise run out of memory on the 80GB GPU, and comfortably fit and train a 3B parameter model.
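The memory savings come from how few parameters LoRA actually trains. As a rough sketch, the trainable-parameter figure printed in the Quickstart can be reproduced by hand: a rank-`r` adapter on a linear layer adds two small matrices with `r * (in_features + out_features)` weights in total. The layer shapes below (hidden size 2048, key/value projection width 256, 36 decoder layers) and the assumption that only `q_proj` and `v_proj` are adapted when `target_modules` is left unset are not stated in the text above; treat them as assumptions that happen to reproduce the printed numbers.

```python
def lora_params(in_features: int, out_features: int, r: int) -> int:
    """Weights added by one LoRA adapter: A is (r x in_features), B is (out_features x r)."""
    return r * (in_features + out_features)


# Assumed Qwen2.5-3B shapes (not given in the text above)
HIDDEN_SIZE = 2048   # input width of q_proj and v_proj, output width of q_proj
KV_WIDTH = 256       # output width of v_proj under grouped-query attention
NUM_LAYERS = 36
RANK = 16            # r=16 from the Quickstart LoraConfig

per_layer = (
    lora_params(HIDDEN_SIZE, HIDDEN_SIZE, RANK)   # q_proj adapter
    + lora_params(HIDDEN_SIZE, KV_WIDTH, RANK)    # v_proj adapter
)
trainable = per_layer * NUM_LAYERS

# "all params" as printed in the Quickstart (base model plus adapters)
ALL_PARAMS = 3_089_625_088
percent = 100 * trainable / ALL_PARAMS

print(f"trainable params: {trainable:,} || all params: {ALL_PARAMS:,} || trainable%: {percent:.4f}")
```

Under these assumptions the count comes out to 3,686,400 trainable parameters, matching the Quickstart output, which is why optimizer state and gradients for the adapter occupy only a small share of GPU memory.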
When you look at the 3B parameter model's performance, it is comparable to a fully finetuned model at a fraction of the GPU memory.

| Submission Name | Accuracy |
| --------- | ---- |
| Human baseline (crowdsourced) | 0.897 |
| Flan-T5 | 0.892 |
| lora-t0-3b | 0.863 |

> [!TIP]
> The bigscience/T0_3B model performance isn't optimized in the table above. You can squeeze even more performance out of it by playing around with the input instruction templates, LoRA hyperparameters, and other training related hyperparameters. The final checkpoint size of this model is just 19MB compared to 11GB of the full bigscience/T0_3B model. Learn more about the advantages of finetuning with PEFT in this [blog post](https://www.philschmid.de/fine-tune-flan-t5-peft).

### Quantization

Quantization is another method for reducing the memory requirements of a model by representing the data in a lower precision. It can be combined with PEFT methods to make it even easier to train and load LLMs for inference.

* Learn how to finetune [meta-llama/Llama-2-7b-hf](https://huggingface.co/meta-llama/Llama-2-7b-hf) with QLoRA and the [TRL](https://huggingface.co/docs/trl/index) library on a 16GB GPU in the [Finetune LLMs on your own consumer hardware using tools from PyTorch and Hugging Face ecosystem](https://pytorch.org/blog/finetune-llms/) blog post.
* Learn how to finetune an [openai/whisper-large-v2](https://huggingface.co/openai/whisper-large-v2) model for multilingual automatic speech recognition with LoRA and 8-bit quantization in this [notebook](https://colab.research.google.com/drive/1DOkD_5OUjFa0r5Ik3SgywJLJtEo2qLxO?usp=sharing) (see this [notebook](https://colab.research.google.com/drive/1vhF8yueFqha3Y3CpTHN6q9EVcII9EYzs?usp=sharing) instead for an example of streaming a dataset).

### Save compute and storage

PEFT can help you save storage by avoiding full finetuning of models on each downstream task or dataset.
In many cases, you're only finetuning a very small fraction of a model's parameters and each checkpoint is only a few MBs in size (instead of GBs). These smaller PEFT adapters demonstrate performance comparable to a fully finetuned model. If you have many datasets, you can save a lot of storage with a PEFT model and not have to worry about catastrophic forgetting or overfitting the backbone or base model.

## PEFT integrations

PEFT is widely supported across the Hugging Face ecosystem because of the massive efficiency it brings to training and inference.

### Diffusers

The iterative diffusion process consumes a lot of memory which can make it difficult to train. PEFT can help reduce the memory requirements and reduce the storage size of the final model checkpoint. For example, consider the memory required for training a Stable Diffusion model with LoRA on an A100 80GB GPU with more than 64GB of CPU RAM. The final model checkpoint size is only 8.8MB!

| Model | Full Finetuning | PEFT-LoRA | PEFT-LoRA with Gradient Checkpointing |
| --------- | ---- | ---- | ---- |
| CompVis/stable-diffusion-v1-4 | 27.5GB GPU / 3.97GB CPU | 15.5GB GPU / 3.84GB CPU | 8.12GB GPU / 3.77GB CPU |

> [!TIP]
> Take a look at the [examples/lora_dreambooth/train_dreambooth.py](examples/lora_dreambooth/train_dreambooth.py) training script to try training your own Stable Diffusion model with LoRA, and play around with the [smangrul/peft-lora-sd-dreambooth](https://huggingface.co/spaces/smangrul/peft-lora-sd-dreambooth) Space which is running on a T4 instance. Learn more about the PEFT integration in Diffusers in this [tutorial](https://huggingface.co/docs/peft/main/en/tutorial/peft_integrations#diffusers).

### Transformers

PEFT is directly integrated with [Transformers](https://huggingface.co/docs/transformers/main/en/peft). After loading a model, call `add_adapter` to add a new PEFT adapter to the model:

```python
from peft import LoraConfig

model = ...  # transformers model
peft_config = LoraConfig(...)
# pass the config defined above (the variable must match)
model.add_adapter(peft_config, adapter_name="lora_1")
```

To load a trained PEFT adapter, call `load_adapter`:

```python
model = ...  # transformers model
model.load_adapter("<path-to-adapter>", adapter_name="lora_1")  # replace with your adapter path
```

And to switch between different adapters, call `set_adapter`:

```python
model.set_adapter("lora_2")
```

The Transformers integration doesn't include all the functionalities offered in PEFT, such as methods for merging the adapter into the base model.

### Accelerate

[Accelerate](https://huggingface.co/docs/accelerate/index) is a library for distributed training and inference on various training setups and hardware (GPUs, TPUs, Apple Silicon, etc.). PEFT models work with Accelerate out of the box, making it convenient to train very large models or use them for inference on consumer hardware with limited resources.

### TRL

PEFT can also be applied to training LLMs with RLHF components such as the ranker and policy. Get started by reading:

* [Fine-tune a Mistral-7b model with Direct Preference Optimization](https://towardsdatascience.com/fine-tune-a-mistral-7b-model-with-direct-preference-optimization-708042745aac) with PEFT and the [TRL](https://huggingface.co/docs/trl/index) library to learn more about the Direct Preference Optimization (DPO) method and how to apply it to an LLM.
* [Fine-tuning 20B LLMs with RLHF on a 24GB consumer GPU](https://huggingface.co/blog/trl-peft) with PEFT and the [TRL](https://huggingface.co/docs/trl/index) library, and then try out the [gpt2-sentiment_peft.ipynb](https://github.com/huggingface/trl/blob/main/examples/notebooks/gpt2-sentiment.ipynb) notebook to optimize GPT2 to generate positive movie reviews.
* [StackLLaMA: A hands-on guide to train LLaMA with RLHF](https://huggingface.co/blog/stackllama) with PEFT, and then try out the [stack_llama/scripts](https://github.com/huggingface/trl/tree/main/examples/research_projects/stack_llama/scripts) for supervised finetuning, reward modeling, and RL finetuning.

## Model support

Use this [Space](https://stevhliu-peft-methods.hf.space) or check out the [docs](https://huggingface.co/docs/peft/main/en/index) to find which models officially support a PEFT method out of the box. Even if you don't see a model listed, you can manually configure the model config to enable PEFT for a model. Read the [New transformers architecture](https://huggingface.co/docs/peft/main/en/developer_guides/custom_models#new-transformers-architectures) guide to learn how.

## Contribute

If you would like to contribute to PEFT, please check out our [contribution guide](https://huggingface.co/docs/peft/developer_guides/contributing).

## Citing 🤗 PEFT

To use 🤗 PEFT in your publication, please cite it by using the following BibTeX entry.

```bibtex
@Misc{peft,
  title =        {{PEFT}: State-of-the-art Parameter-Efficient Fine-Tuning methods},
  author =       {Sourab Mangrulkar and Sylvain Gugger and Lysandre Debut and Younes Belkada and Sayak Paul and Benjamin Bossan and Marian Tietz},
  howpublished = {\url{https://github.com/huggingface/peft}},
  year =         {2022}
}
```
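
As a worked check of the Quickstart's `print_trainable_parameters()` output, the trainable-parameter count can be reproduced by hand. This is only a sketch under assumptions that the text above does not state: that LoRA targets `q_proj` and `v_proj` in every layer, and that Qwen2.5-3B-Instruct has 36 layers, a hidden size of 2048, and a 256-wide value projection. Each adapted linear layer of shape `d_in x d_out` gains `r * (d_in + d_out)` parameters.

```python
def lora_params(d_in: int, d_out: int, r: int) -> int:
    """Parameters added by one LoRA adapter: A is (r x d_in), B is (d_out x r)."""
    return r * (d_in + d_out)


# Assumed model dimensions (illustrative, not taken from the text above).
NUM_LAYERS = 36
HIDDEN = 2048
V_PROJ_OUT = 256
R = 16  # same rank as the Quickstart's LoraConfig

# Assumed targets: q_proj (HIDDEN -> HIDDEN) and v_proj (HIDDEN -> V_PROJ_OUT).
per_layer = lora_params(HIDDEN, HIDDEN, R) + lora_params(HIDDEN, V_PROJ_OUT, R)
trainable = NUM_LAYERS * per_layer

all_params = 3_089_625_088  # "all params" as printed in the Quickstart
percent = 100 * trainable / all_params

print(f"trainable params: {trainable:,} || trainable%: {percent:.4f}")
```

Under these assumptions the result matches the Quickstart's printed count of 3,686,400 trainable parameters, which is why the adapter checkpoint is measured in MBs rather than GBs.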

ML Frameworks
21.3K Github Stars
alignment-handbook
Open Source

alignment-handbook

<p align="center"> <img src="https://raw.githubusercontent.com/huggingface/alignment-handbook/main/assets/handbook.png"> </p> <p align="center"> 🤗 <a href="https://huggingface.co/collections/alignment-handbook/handbook-v01-models-and-datasets-654e424d22e6880da5ebc015" target="_blank">Models & Datasets</a> | 📃 <a href="https://arxiv.org/abs/2310.16944" target="_blank">Technical Report</a> </p> # The Alignment Handbook Robust recipes to continue pretraining and to align language models with human and AI preferences. ## What is this? Just one year ago, chatbots were out of fashion and most people hadn't heard about techniques like Reinforcement Learning from Human Feedback (RLHF) to align language models with human preferences. Then, OpenAI broke the internet with ChatGPT and Meta followed suit by releasing the Llama series of language models which enabled the ML community to build their very own capable chatbots. This has led to a rich ecosystem of datasets and models that have mostly focused on teaching language models to follow instructions through supervised fine-tuning (SFT). However, we know from the [InstructGPT](https://huggingface.co/papers/2203.02155) and [Llama2](https://huggingface.co/papers/2307.09288) papers that significant gains in helpfulness and safety can be had by augmenting SFT with human (or AI) preferences. At the same time, aligning language models to a set of preferences is a fairly novel idea and there are few public resources available on how to train these models, what data to collect, and what metrics to measure for best downstream performance. The Alignment Handbook aims to fill that gap by providing the community with a series of robust training recipes that span the whole pipeline. 
## News 🗞️

* **July 24, 2025**: We release the full [post-training recipe](recipes/smollm3/README.md) behind SmolLM3-3B: a state-of-the-art hybrid reasoning model 💭
* **November 21, 2024**: We release the [recipe](recipes/smollm2/README.md) for fine-tuning SmolLM2-Instruct.
* **August 18, 2024**: We release SmolLM-Instruct v0.2, along with the [recipe](recipes/smollm/README.md) to fine-tune small LLMs 💻
* **April 12, 2024**: We release Zephyr 141B (A35B), in collaboration with Argilla and Kaist AI, along with the recipe to fine-tune Mixtral 8x22B with ORPO 🪁
* **March 12, 2024:** We release StarChat2 15B, along with the recipe to train capable coding assistants 🌟
* **March 1, 2024:** We release Zephyr 7B Gemma, which is a new recipe to align Gemma 7B with RLAIF 🔥
* **February 1, 2024:** We release a recipe to align open LLMs with Constitutional AI 📜! See the [recipe](https://github.com/huggingface/alignment-handbook/tree/main/recipes/constitutional-ai) and the [blog post](https://huggingface.co/blog/constitutional_ai) for details.
* **January 18, 2024:** We release a suite of evaluations of DPO vs KTO vs IPO, see the [recipe](recipes/pref_align_scan/README.md) and the [blog post](https://huggingface.co/blog/pref-tuning) for details.
* **November 10, 2023:** We release all the training code to replicate Zephyr-7b-β 🪁! We also release [No Robots](https://huggingface.co/datasets/HuggingFaceH4/no_robots), a brand new dataset of 10,000 instructions and demonstrations written entirely by skilled human annotators.

## Links 🔗

* [Zephyr 7B models, datasets, and demos](https://huggingface.co/collections/HuggingFaceH4/zephyr-7b-6538c6d6d5ddd1cbb1744a66)

## How to navigate this project 🧭

This project is simple by design and mostly consists of:

* [`scripts`](./scripts/) to train and evaluate models. Four steps are included: continued pretraining, supervised-finetuning (SFT) for chat, preference alignment with DPO, and supervised-finetuning with preference alignment with ORPO. Each script supports distributed training of the full model weights with DeepSpeed ZeRO-3, or LoRA/QLoRA for parameter-efficient fine-tuning.
* [`recipes`](./recipes/) to reproduce models like Zephyr 7B. Each recipe takes the form of a YAML file which contains all the parameters associated with a single training run. A `gpt2-nl` recipe is also given to illustrate how this handbook can be used for language or domain adaptation, e.g. by continuing to pretrain on a different language, and then SFT and DPO tuning the result.

We are also working on a series of guides to explain how methods like direct preference optimization (DPO) work, along with lessons learned from gathering human preferences in practice. To get started, we recommend the following:

1. Follow the [installation instructions](#installation-instructions) to set up your environment etc.
2. Replicate Zephyr-7b-β by following the [recipe instructions](./recipes/zephyr-7b-beta/README.md).

If you would like to train chat models on your own datasets, we recommend following the dataset formatting instructions [here](./scripts/README.md#fine-tuning-on-your-datasets).

## Contents

The initial release of the handbook will focus on the following techniques:

* **Continued pretraining:** adapt language models to a new language or domain, or simply improve them by continued pretraining (causal language modeling) on a new dataset.
* **Supervised fine-tuning:** teach language models to follow instructions, with tips on how to collect and curate your training dataset.
* **Reward modeling:** teach language models to distinguish model responses according to human or AI preferences.
* **Rejection sampling:** a simple, but powerful technique to boost the performance of your SFT model.
* **Direct preference optimisation (DPO):** a powerful and promising alternative to PPO.
* **Odds Ratio Preference Optimisation (ORPO)**: a technique to fine-tune language models with human preferences, combining SFT and DPO in a single stage.
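
To make the DPO entry in the list above concrete, here is a minimal sketch of the per-example DPO loss in plain Python. It is not the handbook's training code: the function name, the `beta` default, and the log-probability values are illustrative assumptions. The loss is `-log(sigmoid(beta * margin))`, where the margin measures how much more strongly the policy prefers the chosen response over the rejected one than the frozen reference model does.

```python
import math


def dpo_loss(policy_chosen_logp: float, policy_rejected_logp: float,
             ref_chosen_logp: float, ref_rejected_logp: float,
             beta: float = 0.1) -> float:
    """Per-example DPO loss from summed sequence log-probabilities (hypothetical helper)."""
    # Log-ratio of policy to reference for each response.
    chosen_logratio = policy_chosen_logp - ref_chosen_logp
    rejected_logratio = policy_rejected_logp - ref_rejected_logp
    x = beta * (chosen_logratio - rejected_logratio)
    # -log(sigmoid(x)) == log(1 + exp(-x)), computed in a numerically stable way.
    if x >= 0:
        return math.log1p(math.exp(-x))
    return -x + math.log1p(math.exp(x))


# When the policy equals the reference, the margin is zero and the loss is log(2).
baseline = dpo_loss(-10.0, -12.0, -10.0, -12.0)
# When the policy raises the chosen response and lowers the rejected one, the loss drops.
improved = dpo_loss(-10.0, -14.0, -11.0, -12.0)
print(baseline, improved)
```

In practice the log-probabilities come from the policy and reference models scored on the same prompt and response pair, and the loss is averaged over a batch.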
## Installation instructions

To run the code in this project, first, create a Python virtual environment using e.g. `uv`:

```shell
uv venv handbook --python 3.11 && source handbook/bin/activate && uv pip install --upgrade pip
```

> [!TIP]
> To install `uv`, follow the [UV Installation Guide](https://docs.astral.sh/uv/getting-started/installation/).

Next, install PyTorch `v2.6.0`:

```shell
uv pip install torch==2.6.0 --index-url https://download.pytorch.org/whl/cu126
```

Note that the precise version is important for reproducibility! Since this is hardware-dependent, we also direct you to the [PyTorch Installation Page](https://pytorch.org/get-started/locally/).

You can then install the remaining package dependencies as follows:

```shell
uv pip install .
```

You will also need Flash Attention 2 installed, which can be done by running:

```shell
uv pip install "flash-attn==2.7.4.post1" --no-build-isolation
```

Next, log into your Hugging Face account as follows:

```shell
huggingface-cli login
```

Finally, install Git LFS so that you can push models to the Hugging Face Hub:

```shell
sudo apt-get install git-lfs
```

You can now check out the `scripts` and `recipes` directories for instructions on how to train some models 🪁!

## Project structure

```
├── LICENSE
├── Makefile                    <- Makefile with commands like `make style`
├── README.md                   <- The top-level README for developers using this project
├── recipes                     <- Recipe configs, accelerate configs, slurm scripts
├── scripts                     <- Scripts to train and evaluate chat models
├── setup.cfg                   <- Installation config (mostly used for configuring code quality & tests)
├── setup.py                    <- Makes project pip installable (pip install -e .) so `alignment` can be imported
├── src                         <- Source code for use in this project
└── tests                       <- Unit tests
```

## Citation

If you find the content of this repo useful in your work, please cite it as follows via `\usepackage{biblatex}`:

```bibtex
@software{Tunstall_The_Alignment_Handbook,
  author = {Tunstall, Lewis and Beeching, Edward and Lambert, Nathan and Rajani, Nazneen and Huang, Shengyi and Rasul, Kashif and Bartolome, Alvaro and M. Patiño, Carlos and M. Rush, Alexander and Wolf, Thomas},
  license = {Apache-2.0},
  title = {{The Alignment Handbook}},
  url = {https://github.com/huggingface/alignment-handbook},
  version = {0.4.0.dev0}
}
```

Education & Learning ML Frameworks
5.6K Github Stars
text-embeddings-inference
Open Source

text-embeddings-inference

<div align="center"> # Text Embeddings Inference <a href="https://github.com/huggingface/text-embeddings-inference"> <img alt="GitHub Repo stars" src="https://img.shields.io/github/stars/huggingface/text-embeddings-inference?style=social"> </a> <a href="https://huggingface.github.io/text-embeddings-inference"> <img alt="Swagger API documentation" src="https://img.shields.io/badge/API-Swagger-informational"> </a> A blazing fast inference solution for text embeddings models. Benchmark for [BAAI/bge-base-en-v1.5](https://huggingface.co/BAAI/bge-base-en-v1.5) on an NVIDIA A10 with a sequence length of 512 tokens: <p> <img src="assets/bs1-lat.png" width="400" /> <img src="assets/bs1-tp.png" width="400" /> </p> <p> <img src="assets/bs32-lat.png" width="400" /> <img src="assets/bs32-tp.png" width="400" /> </p> </div> ## Table of contents - [Get Started](#get-started) - [Supported Models](#supported-models) - [Docker](#docker) - [Docker Images](#docker-images) - [API Documentation](#api-documentation) - [Using a private or gated model](#using-a-private-or-gated-model) - [Air gapped deployment](#air-gapped-deployment) - [Using Re-rankers models](#using-re-rankers-models) - [Using Sequence Classification models](#using-sequence-classification-models) - [Using SPLADE pooling](#using-splade-pooling) - [Distributed Tracing](#distributed-tracing) - [gRPC](#grpc) - [Local Install](#local-install) - [Apple Silicon (Homebrew)](#apple-silicon-homebrew) - [Docker Build](#docker-build) - [ARM64 / aarch64](#arm64--aarch64) - [AMD Instinct GPUs (ROCm)](#amd-instinct-gpus-rocm-experimental) - [Examples](#examples) Text Embeddings Inference (TEI) is a toolkit for deploying and serving open source text embeddings and sequence classification models. TEI enables high-performance extraction for the most popular models, including FlagEmbedding, Ember, GTE and E5. 
TEI implements many features such as:

* No model graph compilation step
* Metal support for local execution on Macs
* Small docker images and fast boot times. Get ready for true serverless!
* Token based dynamic batching
* Optimized transformers code for inference using [Flash Attention](https://github.com/HazyResearch/flash-attention), [Candle](https://github.com/huggingface/candle) and [cuBLASLt](https://docs.nvidia.com/cuda/cublas/#using-the-cublaslt-api)
* [Safetensors](https://github.com/huggingface/safetensors) weight loading
* [ONNX](https://github.com/onnx/onnx) weight loading
* Production ready (distributed tracing with Open Telemetry, Prometheus metrics)

## Get Started

### Supported Models

#### Text Embeddings

Text Embeddings Inference currently supports Nomic, BERT, CamemBERT, and XLM-RoBERTa models with absolute positions; the JinaBERT model with ALiBi positions; Mistral, Alibaba GTE, and Qwen2 models with RoPE positions; and MPNet, ModernBERT, Qwen3, and Gemma3.

Below are some examples of the currently supported models:

| MTEB Rank | Model Size | Model Type | Model ID |
|-----------|------------------------|-------------|----------|
| 2 | 7.57B (Very Expensive) | Qwen3 | [Qwen/Qwen3-Embedding-8B](https://hf.co/Qwen/Qwen3-Embedding-8B) |
| 3 | 4.02B (Very Expensive) | Qwen3 | [Qwen/Qwen3-Embedding-4B](https://hf.co/Qwen/Qwen3-Embedding-4B) |
| 4 | 509M | Qwen3 | [Qwen/Qwen3-Embedding-0.6B](https://hf.co/Qwen/Qwen3-Embedding-0.6B) |
| 6 | 7.61B (Very Expensive) | Qwen2 | [Alibaba-NLP/gte-Qwen2-7B-instruct](https://hf.co/Alibaba-NLP/gte-Qwen2-7B-instruct) |
| 7 | 560M | XLM-RoBERTa | [intfloat/multilingual-e5-large-instruct](https://hf.co/intfloat/multilingual-e5-large-instruct) |
| 8 | 308M | Gemma3 | [google/embeddinggemma-300m](https://hf.co/google/embeddinggemma-300m) (gated) |
| 15 | 1.78B (Expensive) | Qwen2 | [Alibaba-NLP/gte-Qwen2-1.5B-instruct](https://hf.co/Alibaba-NLP/gte-Qwen2-1.5B-instruct) |
| 18 | 7.11B (Very Expensive) | Mistral | [Salesforce/SFR-Embedding-2_R](https://hf.co/Salesforce/SFR-Embedding-2_R) |
| 35 | 568M | XLM-RoBERTa | [Snowflake/snowflake-arctic-embed-l-v2.0](https://hf.co/Snowflake/snowflake-arctic-embed-l-v2.0) |
| 41 | 305M | Alibaba GTE | [Snowflake/snowflake-arctic-embed-m-v2.0](https://hf.co/Snowflake/snowflake-arctic-embed-m-v2.0) |
| 52 | 335M | BERT | [WhereIsAI/UAE-Large-V1](https://hf.co/WhereIsAI/UAE-Large-V1) |
| 58 | 137M | NomicBERT | [nomic-ai/nomic-embed-text-v1](https://hf.co/nomic-ai/nomic-embed-text-v1) |
| 79 | 137M | NomicBERT | [nomic-ai/nomic-embed-text-v1.5](https://hf.co/nomic-ai/nomic-embed-text-v1.5) |
| 103 | 109M | MPNet | [sentence-transformers/all-mpnet-base-v2](https://hf.co/sentence-transformers/all-mpnet-base-v2) |
| N/A | 475M-A305M | NomicBERT | [nomic-ai/nomic-embed-text-v2-moe](https://hf.co/nomic-ai/nomic-embed-text-v2-moe) |
| N/A | 434M | Alibaba GTE | [Alibaba-NLP/gte-large-en-v1.5](https://hf.co/Alibaba-NLP/gte-large-en-v1.5) |
| N/A | 396M | ModernBERT | [answerdotai/ModernBERT-large](https://hf.co/answerdotai/ModernBERT-large) |
| N/A | 340M | Qwen3 | [voyageai/voyage-4-nano](https://hf.co/voyageai/voyage-4-nano) |
| N/A | 137M | JinaBERT | [jinaai/jina-embeddings-v2-base-en](https://hf.co/jinaai/jina-embeddings-v2-base-en) |
| N/A | 137M | JinaBERT | [jinaai/jina-embeddings-v2-base-code](https://hf.co/jinaai/jina-embeddings-v2-base-code) |

To explore the list of best performing text embeddings models, visit the [Massive Text Embedding Benchmark (MTEB) Leaderboard](https://huggingface.co/spaces/mteb/leaderboard).

#### Sequence Classification and Re-Ranking

Text Embeddings Inference currently supports CamemBERT and XLM-RoBERTa Sequence Classification models with absolute positions.
Below are some examples of the currently supported models: | Task | Model Type | Model ID | |--------------------|-------------|-----------------------------------------------------------------------------------------------------------------| | Re-Ranking | XLM-RoBERTa | [BAAI/bge-reranker-large](https://huggingface.co/BAAI/bge-reranker-large) | | Re-Ranking | XLM-RoBERTa | [BAAI/bge-reranker-base](https://huggingface.co/BAAI/bge-reranker-base) | | Re-Ranking | GTE | [Alibaba-NLP/gte-multilingual-reranker-base](https://huggingface.co/Alibaba-NLP/gte-multilingual-reranker-base) | | Re-Ranking | ModernBert | [Alibaba-NLP/gte-reranker-modernbert-base](https://huggingface.co/Alibaba-NLP/gte-reranker-modernbert-base) | | Sentiment Analysis | RoBERTa | [SamLowe/roberta-base-go_emotions](https://huggingface.co/SamLowe/roberta-base-go_emotions) | ### Docker ```shell model=Qwen/Qwen3-Embedding-0.6B volume=$PWD/data # share a volume with the Docker container to avoid downloading weights every run docker run --gpus all -p 8080:80 -v $volume:/data --pull always ghcr.io/huggingface/text-embeddings-inference:cuda-1.9 --model-id $model ``` And then you can make requests like ```bash curl 127.0.0.1:8080/embed \ -X POST \ -d '{"inputs":"What is Deep Learning?"}' \ -H 'Content-Type: application/json' ``` **Note:** To use GPUs, you need to install the [NVIDIA Container Toolkit](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/install-guide.html). NVIDIA drivers on your machine need to be compatible with CUDA version 12.2 or higher. To see all options to serve your models: ```console $ text-embeddings-router --help Text Embedding Webserver Usage: text-embeddings-router [OPTIONS] --model-id <MODEL_ID> Options: --model-id <MODEL_ID> The Hugging Face model ID, can be any model listed on <https://huggingface.co/models> with the `text-embeddings-inference` tag (meaning it's compatible with Text Embeddings Inference). 
Alternatively, the specified ID can also be a path to a local directory containing the necessary model files saved by the `save_pretrained(...)` methods of either Transformers or Sentence Transformers. [env: MODEL_ID=] --revision <REVISION> The actual revision of the model if you're referring to a model on the hub. You can use a specific commit id or a branch like `refs/pr/2` [env: REVISION=] --tokenization-workers <TOKENIZATION_WORKERS> Optionally control the number of tokenizer workers used for payload tokenization, validation and truncation. Default to the number of CPU cores on the machine [env: TOKENIZATION_WORKERS=] --dtype <DTYPE> The dtype to be forced upon the model [env: DTYPE=] [possible values: float16, float32] --served-model-name <SERVED_MODEL_NAME> The name of the model that is being served. If not specified, defaults to `--model-id`. It is only used for the OpenAI-compatible endpoints via HTTP [env: SERVED_MODEL_NAME=] --pooling <POOLING> Optionally control the pooling method for embedding models. If `pooling` is not set, the pooling configuration will be parsed from the model `1_Pooling/config.json` configuration. If `pooling` is set, it will override the model pooling configuration [env: POOLING=] Possible values: - cls: Select the CLS token as embedding - mean: Apply Mean pooling to the model embeddings - splade: Apply SPLADE (Sparse Lexical and Expansion) to the model embeddings. This option is only available if the loaded model is a `ForMaskedLM` Transformer model - last-token: Select the last token as embedding --max-concurrent-requests <MAX_CONCURRENT_REQUESTS> The maximum amount of concurrent requests for this particular deployment. 
Having a low limit will refuse clients requests instead of having them wait for too long and is usually good to handle backpressure correctly [env: MAX_CONCURRENT_REQUESTS=] [default: 512] --max-batch-tokens <MAX_BATCH_TOKENS> **IMPORTANT** This is one critical control to allow maximum usage of the available hardware. This represents the total amount of potential tokens within a batch. For `max_batch_tokens=1000`, you could fit `10` queries of `total_tokens=100` or a single query of `1000` tokens. Overall this number should be the largest possible until the model is compute bound. Since the actual memory overhead depends on the model implementation, text-embeddings-inference cannot infer this number automatically. [env: MAX_BATCH_TOKENS=] [default: 16384] --max-batch-requests <MAX_BATCH_REQUESTS> Optionally control the maximum number of individual requests in a batch [env: MAX_BATCH_REQUESTS=] --max-client-batch-size <MAX_CLIENT_BATCH_SIZE> Control the maximum number of inputs that a client can send in a single request [env: MAX_CLIENT_BATCH_SIZE=] [default: 32] --auto-truncate Control automatic truncation of inputs that exceed the model's maximum supported size. Defaults to `true` (truncation enabled). Set to `false` to disable truncation; when disabled and the model's maximum input length exceeds `--max-batch-tokens`, the server will refuse to start with an error instead of silently truncating sequences. Unused for gRPC servers [env: AUTO_TRUNCATE=] --default-prompt-name <DEFAULT_PROMPT_NAME> The name of the prompt that should be used by default for encoding. If not set, no prompt will be applied. Must be a key in the `sentence-transformers` configuration `prompts` dictionary. For example if ``default_prompt_name`` is "query" and the ``prompts`` is {"query": "query: ", ...}, then the sentence "What is the capital of France?" will be encoded as "query: What is the capital of France?" because the prompt text will be prepended before any text to encode. 
The argument '--default-prompt-name <DEFAULT_PROMPT_NAME>' cannot be used with '--default-prompt <DEFAULT_PROMPT>` [env: DEFAULT_PROMPT_NAME=] --default-prompt <DEFAULT_PROMPT> The prompt that should be used by default for encoding. If not set, no prompt will be applied. For example if ``default_prompt`` is "query: " then the sentence "What is the capital of France?" will be encoded as "query: What is the capital of France?" because the prompt text will be prepended before any text to encode. The argument '--default-prompt <DEFAULT_PROMPT>' cannot be used with '--default-prompt-name <DEFAULT_PROMPT_NAME>` [env: DEFAULT_PROMPT=] --dense-path <DENSE_PATH> Optionally, define the path to the Dense module required for some embedding models. Some embedding models require an extra `Dense` module which contains a single Linear layer and an activation function. By default, those `Dense` modules are stored under the `2_Dense` directory, but there might be cases where different `Dense` modules are provided, to convert the pooled embeddings into different dimensions, available as `2_Dense_<dims>` e.g. https://huggingface.co/NovaSearch/stella_en_400M_v5. Note that this argument is optional, only required to be set if there is no `modules.json` file or when you want to override a single Dense module path, only when running with the `candle` backend. [env: DENSE_PATH=] --hf-token <HF_TOKEN> Your Hugging Face Hub token. If neither `--hf-token` nor `HF_TOKEN` are set, the token will be read from the `$HF_HOME/token` path, if it exists. 
This ensures access to private or gated models, and allows for a more permissive rate limiting [env: HF_TOKEN=] --hostname <HOSTNAME> The IP address to listen on [env: HOSTNAME=] [default: 0.0.0.0] -p, --port <PORT> The port to listen on [env: PORT=] [default: 3000] --uds-path <UDS_PATH> The name of the unix socket some text-embeddings-inference backends will use as they communicate internally with gRPC [env: UDS_PATH=] [default: /tmp/text-embeddings-inference-server] --huggingface-hub-cache <HUGGINGFACE_HUB_CACHE> The location of the huggingface hub cache. Used to override the location if you want to provide a mounted disk for instance [env: HUGGINGFACE_HUB_CACHE=] --payload-limit <PAYLOAD_LIMIT> Payload size limit in bytes Default is 2MB [env: PAYLOAD_LIMIT=] [default: 2000000] --api-key <API_KEY> Set an api key for request authorization. By default the server responds to every request. With an api key set, the requests must have the Authorization header set with the api key as Bearer token. [env: API_KEY=] --json-output Outputs the logs in JSON format (useful for telemetry) [env: JSON_OUTPUT=] --disable-spans Whether or not to include the log trace through spans [env: DISABLE_SPANS=] --otlp-endpoint <OTLP_ENDPOINT> The grpc endpoint for opentelemetry. Telemetry is sent to this endpoint as OTLP over gRPC. e.g. `http://localhost:4317` [env: OTLP_ENDPOINT=] --otlp-service-name <OTLP_SERVICE_NAME> The service name for opentelemetry. e.g. 
`text-embeddings-inference.server` [env: OTLP_SERVICE_NAME=] [default: text-embeddings-inference.server] --prometheus-port <PROMETHEUS_PORT> The Prometheus port to listen on [env: PROMETHEUS_PORT=] [default: 9000] --cors-allow-origin <CORS_ALLOW_ORIGIN> Unused for gRPC servers [env: CORS_ALLOW_ORIGIN=] -h, --help Print help (see a summary with '-h') -V, --version Print version ``` ### Docker Images Text Embeddings Inference ships with multiple Docker images that you can use to target a specific backend: | Architecture | Platform | Image | |----------------------------------------|----------|-------------------------------------------------------------------------| | CPU | x86_64 | ghcr.io/huggingface/text-embeddings-inference:cpu-1.9 | | CPU | aarch64 | ghcr.io/huggingface/text-embeddings-inference:cpu-arm64-1.9 | | Volta | x86_64 | NOT SUPPORTED | | Turing (T4, RTX 2000 series, ...) | x86_64 | ghcr.io/huggingface/text-embeddings-inference:turing-1.9 (experimental) | | Ampere 8.0 (A100, A30) | x86_64 | ghcr.io/huggingface/text-embeddings-inference:1.9 | | Ampere 8.6 (A10, A40, ...) | x86_64 | ghcr.io/huggingface/text-embeddings-inference:86-1.9 | | Ada Lovelace (RTX 4000 series, ...) | x86_64 | ghcr.io/huggingface/text-embeddings-inference:89-1.9 | | Hopper (H100) | x86_64 | ghcr.io/huggingface/text-embeddings-inference:hopper-1.9 | | Blackwell 10.0 (B200, GB200, ...) | x86_64 | ghcr.io/huggingface/text-embeddings-inference:100-1.9 (experimental) | | Blackwell 12.0 (GeForce RTX 50X0, ...) | x86_64 | ghcr.io/huggingface/text-embeddings-inference:120-1.9 (experimental) | | Blackwell 12.1 (DGX Spark GB10, ...) | multi | ghcr.io/huggingface/text-embeddings-inference:121-1.9 (experimental) | **Warning**: Flash Attention is turned off by default for the Turing image as it suffers from precision issues. You can turn Flash Attention v1 ON by using the `USE_FLASH_ATTENTION=True` environment variable. 
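
When deployments are scripted, the Docker image table above can be turned into a small lookup. The sketch below maps a CUDA compute capability to the image listed for it. The function and dictionary names are hypothetical and not part of TEI, and the compute capabilities used for the Turing (7.5), Ada Lovelace (8.9), and Hopper (9.0) rows are an assumption about what those rows mean, since the table names those architectures without numbers.

```python
from typing import Optional, Tuple

REGISTRY = "ghcr.io/huggingface/text-embeddings-inference"

# (major, minor) compute capability -> image tag, transcribed from the table above.
GPU_TAGS = {
    (7, 5): "turing-1.9",   # Turing (experimental)
    (8, 0): "1.9",          # Ampere 8.0
    (8, 6): "86-1.9",       # Ampere 8.6
    (8, 9): "89-1.9",       # Ada Lovelace
    (9, 0): "hopper-1.9",   # Hopper
    (10, 0): "100-1.9",     # Blackwell 10.0 (experimental)
    (12, 0): "120-1.9",     # Blackwell 12.0 (experimental)
    (12, 1): "121-1.9",     # Blackwell 12.1 (experimental)
}


def tei_image(compute_capability: Optional[Tuple[int, int]] = None, arm64: bool = False) -> str:
    """Return the image reference for a GPU compute capability, or a CPU image if None."""
    if compute_capability is None:
        tag = "cpu-arm64-1.9" if arm64 else "cpu-1.9"
    elif compute_capability in GPU_TAGS:
        tag = GPU_TAGS[compute_capability]
    else:
        # Covers Volta (7.0), which the table marks as not supported.
        raise ValueError(f"no TEI image listed for compute capability {compute_capability}")
    return f"{REGISTRY}:{tag}"


print(tei_image((8, 6)))
print(tei_image(arm64=True))
```

On a machine with PyTorch installed, `torch.cuda.get_device_capability()` returns a `(major, minor)` tuple that can be passed straight to this helper.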
### API documentation

You can consult the OpenAPI documentation of the `text-embeddings-inference` REST API using the `/docs` route. The Swagger UI is also available at: [https://huggingface.github.io/text-embeddings-inference](https://huggingface.github.io/text-embeddings-inference).

### Using a private or gated model

You can use the `HF_TOKEN` environment variable to configure the token used by `text-embeddings-inference`. This gives you access to protected resources.

For example:

1. Go to https://huggingface.co/settings/tokens
2. Copy your CLI READ token
3. Export `HF_TOKEN=<your CLI READ token>`

or with Docker:

```shell
model=<your private model>
volume=$PWD/data # share a volume with the Docker container to avoid downloading weights every run
token=<your CLI READ token>

docker run --gpus all -e HF_TOKEN=$token -p 8080:80 -v $volume:/data --pull always ghcr.io/huggingface/text-embeddings-inference:cuda-1.9 --model-id $model
```

### Air gapped deployment

To deploy Text Embeddings Inference in an air-gapped environment, first download the weights and then mount them inside the container using a volume.

For example:

```shell
# (Optional) create a `models` directory
mkdir models
cd models

# Make sure you have git-lfs installed (https://git-lfs.com)
git lfs install
git clone https://huggingface.co/Qwen/Qwen3-Embedding-0.6B

# Set the models directory as the volume path
volume=$PWD

# Mount the models directory inside the container with a volume and set the model ID
docker run --gpus all -p 8080:80 -v $volume:/data --pull always ghcr.io/huggingface/text-embeddings-inference:cuda-1.9 --model-id /data/Qwen3-Embedding-0.6B
```

### Using Re-rankers models

`text-embeddings-inference` v0.4.0 added support for CamemBERT, RoBERTa, XLM-RoBERTa, and GTE Sequence Classification models. Re-ranker models are Sequence Classification cross-encoder models with a single class that scores the similarity between a query and a text.
See [this blogpost](https://blog.llamaindex.ai/boosting-rag-picking-the-best-embedding-reranker-models-42d079022e83) by the LlamaIndex team to understand how you can use re-rankers models in your RAG pipeline to improve downstream performance. ```shell model=BAAI/bge-reranker-large volume=$PWD/data # share a volume with the Docker container to avoid downloading weights every run docker run --gpus all -p 8080:80 -v $volume:/data --pull always ghcr.io/huggingface/text-embeddings-inference:cuda-1.9 --model-id $model ``` And then you can rank the similarity between a query and a list of texts with: ```bash curl 127.0.0.1:8080/rerank \ -X POST \ -d '{"query": "What is Deep Learning?", "texts": ["Deep Learning is not...", "Deep learning is..."]}' \ -H 'Content-Type: application/json' ``` ### Using Sequence Classification models You can also use classic Sequence Classification models like `SamLowe/roberta-base-go_emotions`: ```shell model=SamLowe/roberta-base-go_emotions volume=$PWD/data # share a volume with the Docker container to avoid downloading weights every run docker run --gpus all -p 8080:80 -v $volume:/data --pull always ghcr.io/huggingface/text-embeddings-inference:cuda-1.9 --model-id $model ``` Once you have deployed the model you can use the `predict` endpoint to get the emotions most associated with an input: ```bash curl 127.0.0.1:8080/predict \ -X POST \ -d '{"inputs":"I like you."}' \ -H 'Content-Type: application/json' ``` ### Using SPLADE pooling You can choose to activate SPLADE pooling for Bert and Distilbert MaskedLM architectures: ```shell model=naver/efficient-splade-VI-BT-large-query volume=$PWD/data # share a volume with the Docker container to avoid downloading weights every run docker run --gpus all -p 8080:80 -v $volume:/data --pull always ghcr.io/huggingface/text-embeddings-inference:cuda-1.9 --model-id $model --pooling splade ``` Once you have deployed the model you can use the `/embed_sparse` endpoint to get the sparse embedding: ```bash 
curl 127.0.0.1:8080/embed_sparse \
    -X POST \
    -d '{"inputs":"I like you."}' \
    -H 'Content-Type: application/json'
```

### Distributed Tracing

`text-embeddings-inference` is instrumented with distributed tracing using OpenTelemetry. You can use this feature by setting the address to an OTLP collector with the `--otlp-endpoint` argument.

### gRPC

`text-embeddings-inference` offers a gRPC API as an alternative to the default HTTP API for high performance deployments. The API protobuf definition can be found [here](https://github.com/huggingface/text-embeddings-inference/blob/main/proto/tei.proto).

You can use the gRPC API by adding the `-grpc` tag to any TEI Docker image. For example:

```shell
model=Qwen/Qwen3-Embedding-0.6B
volume=$PWD/data # share a volume with the Docker container to avoid downloading weights every run

docker run --gpus all -p 8080:80 -v $volume:/data --pull always ghcr.io/huggingface/text-embeddings-inference:cuda-1.9-grpc --model-id $model
```

```shell
grpcurl -d '{"inputs": "What is Deep Learning"}' -plaintext 0.0.0.0:8080 tei.v1.Embed/Embed
```

## Local install

### Apple Silicon (Homebrew)

On Apple Silicon (M1/M2/M3/M4), you can install a prebuilt binary via Homebrew:

```shell
brew install text-embeddings-inference
```

Then launch Text Embeddings Inference with Metal acceleration:

```shell
model=Qwen/Qwen3-Embedding-0.6B

text-embeddings-router --model-id $model --port 8080
```

### CPU

You can also opt to install `text-embeddings-inference` locally.
First [install Rust](https://rustup.rs/): ```shell curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh ``` Then run: ```shell # On x86 with ONNX backend (recommended) cargo install --path router -F ort # On x86 with Intel backend cargo install --path router -F mkl # On M1 or M2 cargo install --path router -F metal ``` You can now launch Text Embeddings Inference on CPU with: ```shell model=Qwen/Qwen3-Embedding-0.6B text-embeddings-router --model-id $model --port 8080 ``` **Note:** on some machines, you may also need the OpenSSL libraries and gcc. On Linux machines, run: ```shell sudo apt-get install libssl-dev gcc -y ``` ### CUDA GPUs with CUDA compute capabilities < 7.5 are not supported (V100, Titan V, GTX 1000 series, ...). Make sure you have CUDA and the NVIDIA drivers installed. NVIDIA drivers on your device need to be compatible with CUDA version 12.2 or higher. You also need to add the NVIDIA binaries to your path: ```shell export PATH=$PATH:/usr/local/cuda/bin ``` Then run the following (might take a while as it needs to compile the CUDA kernels): ```shell # On Turing GPUs (T4, RTX 2000 series ... ) cargo install --path router -F candle-cuda-turing # On Ampere, Ada Lovelace, Hopper and Blackwell cargo install --path router -F candle-cuda ``` You can now launch Text Embeddings Inference on GPU as follows: ```shell model=Qwen/Qwen3-Embedding-0.6B text-embeddings-router --model-id $model --port 8080 ``` ## Docker You can build the CPU container with Docker as: ```shell docker build -f Dockerfile . ``` To build the CUDA containers, you need to know the compute cap of the GPU you will be using at runtime, to build the image accordingly: ```shell # Get submodule dependencies git submodule update --init # Example for Turing (T4, RTX 2000 series, ...) runtime_compute_cap=75 # Example for Ampere (A100, ...) runtime_compute_cap=80 # Example for Ampere (A10, ...) runtime_compute_cap=86 # Example for Ada Lovelace (RTX 4000 series, ...) 
runtime_compute_cap=89 # Example for Hopper (H100, ...) runtime_compute_cap=90 # Example for Blackwell (B200, GB200, ...) runtime_compute_cap=100 # Example for Blackwell (GeForce RTX 50X0, RTX PRO 6000, ...) runtime_compute_cap=120 # Example for Blackwell GB10 (DGX Spark) runtime_compute_cap=121 docker build . -f Dockerfile-cuda --build-arg CUDA_COMPUTE_CAP=$runtime_compute_cap ``` ### ARM64 / aarch64 #### CPU-only (Apple Silicon, Ampere, Graviton) For ARM64 hosts without NVIDIA GPUs, use the CPU Dockerfile. Inference runs on CPU cores only (no Metal/MPS support via Docker). ```shell docker build . -f Dockerfile-arm64 --platform=linux/arm64 ``` #### CUDA on ARM64 (DGX Spark, Jetson) For ARM64 hosts with NVIDIA GPUs, build `Dockerfile-cuda` with the appropriate compute capability and `--platform linux/arm64`: ```shell # DGX Spark (GB10, sm_121) docker build . -f Dockerfile-cuda \ --build-arg CUDA_COMPUTE_CAP=121 \ --platform linux/arm64 # Future ARM64 + Blackwell devices (sm_120) docker build . -f Dockerfile-cuda \ --build-arg CUDA_COMPUTE_CAP=120 \ --platform linux/arm64 ``` ## AMD Instinct GPUs (ROCm) — experimental TEI has experimental support for AMD Instinct GPUs (MI200, MI300 series) via ROCm. You can use the `rocm/pytorch:latest` Docker image or a bare-metal ROCm installation. TEI will auto-detect the GPU at startup. For full setup instructions, see the **[AMD Instinct GPU guide](https://huggingface.github.io/text-embeddings-inference/amd_gpu)**. ## Examples - [Set up an Inference Endpoint with TEI](https://huggingface.co/learn/cookbook/automatic_embedding_tei_inference_endpoints) - [RAG containers with TEI](https://github.com/plaggy/rag-containers)
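As one more example, the `/rerank` call shown earlier with curl can be prepared from Python using only the standard library. This is a sketch under the assumption that a TEI re-ranker is listening on `127.0.0.1:8080` as in the Docker commands above. The helper name `build_rerank_request` is hypothetical, and the response shape is not assumed here.

```python
import json
import urllib.request
from typing import List


def build_rerank_request(base_url: str, query: str, texts: List[str]) -> urllib.request.Request:
    """Build (but do not send) a POST request for the /rerank route."""
    payload = json.dumps({"query": query, "texts": texts}).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/rerank",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_rerank_request(
    "http://127.0.0.1:8080",
    "What is Deep Learning?",
    ["Deep Learning is not...", "Deep learning is..."],
)

# To send it against a running server, uncomment the lines below:
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp))
```

Separating request construction from sending makes the payload easy to inspect or unit test without a running container.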

ML Frameworks Vector Databases
4.9K Github Stars
speech-to-speech
Open Source

speech-to-speech

<div align="center"> <div>&nbsp;</div> <img src="https://raw.githubusercontent.com/huggingface/speech-to-speech/main/logo.png" width="600"/> </div> # Speech To Speech: Build local voice agents with open-source models ## 📖 Quick Index * [Approach](#approach) - [Structure](#structure) - [Modularity](#modularity) * [Setup](#setup) * [Usage](#usage) - [Realtime approach](#realtime-approach) - [Server/Client approach](#serverclient-approach) - [WebSocket approach](#websocket-approach) - [Local approach](#local-approach-running-on-mac) - [LLM Backend](#llm-backend) - [Realtime mode](#realtime-mode) - [Docker Server approach](#docker-server) * [Command-line usage](#command-line-usage) - [Model parameters](#model-parameters) - [Generation parameters](#generation-parameters) - [Notable parameters](#notable-parameters) ## Approach ### Structure This repository implements a speech-to-speech cascaded pipeline consisting of the following parts: 1. **Voice Activity Detection (VAD)** 2. **Speech to Text (STT)** 3. **Language Model (LM)** 4. **Text to Speech (TTS)** ### Modularity The pipeline provides a fully open and modular approach, with a focus on leveraging models available through the Transformers library on the Hugging Face hub. 
The code is designed for easy modification, and we already support device-specific and external library implementations: **VAD** - [Silero VAD v5](https://github.com/snakers4/silero-vad) **STT** - Any [Whisper](https://huggingface.co/docs/transformers/en/model_doc/whisper) model checkpoint on the Hugging Face Hub through Transformers 🤗, including [whisper-large-v3](https://huggingface.co/openai/whisper-large-v3) and [distil-large-v3](https://huggingface.co/distil-whisper/distil-large-v3) - [Lightning Whisper MLX](https://github.com/mustafaaljadery/lightning-whisper-mlx?tab=readme-ov-file#lightning-whisper-mlx) - [MLX Audio Whisper](https://github.com/huggingface/mlx-audio) - Fast Whisper inference on Apple Silicon - [Parakeet TDT](https://huggingface.co/nvidia/parakeet-tdt-1.1b) - Real-time streaming STT with sub-100ms latency on Apple Silicon (CUDA/CPU via nano-parakeet, no NeMo) - [Paraformer - FunASR](https://github.com/modelscope/FunASR) **LLM** - Any instruction-following model on the [Hugging Face Hub](https://huggingface.co/models?pipeline_tag=text-generation&sort=trending) via Transformers 🤗 - [mlx-lm](https://github.com/ml-explore/mlx-examples/blob/main/llms/README.md) - [OpenAI API](https://platform.openai.com/docs/quickstart) **TTS** - [ChatTTS](https://github.com/2noise/ChatTTS?tab=readme-ov-file) - [Pocket TTS](https://github.com/kyutai-labs/pocket-tts) - Streaming TTS with voice cloning from Kyutai Labs - [Kokoro-82M](https://huggingface.co/hexgrad/Kokoro-82M) - Fast and high-quality TTS optimized for Apple Silicon - [Qwen3-TTS](https://huggingface.co/Qwen/Qwen3-TTS-12Hz-1.7B-CustomVoice) ## Setup Install the default package from PyPI: ```bash pip install speech-to-speech ``` The default install is scoped to the standard realtime voice-agent path: - Parakeet TDT for STT - OpenAI-compatible API for the language model - Qwen3-TTS for speech output - local audio and realtime server modes Optional backends are installed with extras: ```bash pip install 
"speech-to-speech[kokoro]" pip install "speech-to-speech[pocket]" pip install "speech-to-speech[faster-whisper]" pip install "speech-to-speech[paraformer]" pip install "speech-to-speech[mlx-lm]" pip install "speech-to-speech[websocket]" ``` Deprecated model implementations, including MeloTTS, live in [`archive/`](./archive) and are no longer wired into the CLI. For development from source: ```bash git clone https://github.com/huggingface/speech-to-speech.git cd speech-to-speech uv sync ``` This installs the `speech_to_speech` package in editable mode and makes the `speech-to-speech` CLI command available. The project uses a single `pyproject.toml` with platform markers, so macOS and non-macOS dependencies are resolved automatically from one file. **Note on DeepFilterNet:** DeepFilterNet (used for optional audio enhancement in VAD) requires `numpy<2` and conflicts with Pocket TTS, which requires `numpy>=2`. Install DeepFilterNet manually only in environments where you are not using Pocket TTS. ## Usage The default CLI is equivalent to a realtime Parakeet + OpenAI-compatible LLM + Qwen3-TTS setup. It uses `OPENAI_API_KEY` from the environment unless `--responses_api_api_key` is provided: ```bash speech-to-speech ``` The pipeline can be run in four ways: - **Realtime approach**: Models run locally or on a server, and an OpenAI Realtime-compatible WebSocket API is exposed for another app. - **Server/Client approach**: Models run on a server, and audio input/output are streamed from a client using TCP sockets. - **WebSocket approach**: Models run on a server, and audio input/output are streamed from a client using WebSockets. - **Local approach**: Runs locally. ### Recommended setup ### Realtime Approach The default realtime setup uses `--llm_backend responses-api`, which works with any provider supporting the OpenAI Responses API protocol. Export `OPENAI_API_KEY` with your provider's key before launching, or pass it explicitly with `--responses_api_api_key`. 
For a non-OpenAI provider, also set `--responses_api_base_url`. ```bash export OPENAI_API_KEY=... ``` The default mode starts the OpenAI Realtime-compatible server: ```bash speech-to-speech ``` This is equivalent to: ```bash speech-to-speech \ --thresh 0.6 \ --stt parakeet-tdt \ --llm_backend responses-api \ --tts qwen3 \ --qwen3_tts_model_name Qwen/Qwen3-TTS-12Hz-1.7B-CustomVoice \ --qwen3_tts_speaker Aiden \ --qwen3_tts_language auto \ --qwen3_tts_non_streaming_mode True \ --qwen3_tts_mlx_quantization 6bit \ --model_name gpt-5.4-mini \ --chat_size 30 \ --responses_api_stream \ --enable_live_transcription \ --mode realtime ``` ### Server/Client Approach 1. Run the pipeline on the server: ```bash speech-to-speech --recv_host 0.0.0.0 --send_host 0.0.0.0 ``` 2. Run the client locally to handle microphone input and receive generated audio: ```bash python scripts/listen_and_play.py --host <IP address of your server> ``` ### WebSocket Approach 1. Run the pipeline with WebSocket mode: ```bash speech-to-speech --mode websocket --ws_host 0.0.0.0 --ws_port 8765 ``` 2. Connect to the WebSocket server from your client application at `ws://<server-ip>:8765`. The server handles bidirectional audio streaming: - Send raw audio bytes to the server (16kHz, int16, mono) - Receive generated audio bytes from the server ### Local Approach (Mac) 1. For optimal settings on Mac: ```bash speech-to-speech --local_mac_optimal_settings ``` You can also specify a particular LLM model: ```bash speech-to-speech \ --local_mac_optimal_settings \ --model_name mlx-community/Qwen3-4B-Instruct-2507-bf16 ``` This setting: - Adds `--device mps` to use MPS for all models. - Sets [Parakeet TDT](https://huggingface.co/nvidia/parakeet-tdt-1.1b) for STT (fast streaming ASR on Apple Silicon) - Sets MLX LM as the LLM backend - Sets Qwen3-TTS for TTS - `--tts pocket` and `--tts kokoro` are also valid TTS options on macOS. 
- Qwen3 on Apple Silicon uses `mlx-audio` and defaults to the `6bit` MLX variant unless you explicitly select a different quantization or model suffix. - To compare the MLX variants locally, run: ```bash python scripts/benchmark_tts.py \ --handlers qwen3 \ --iterations 3 \ --qwen3_mlx_quantizations bf16 4bit 6bit 8bit ``` ### Realtime mode Realtime mode (`--mode realtime`) streams audio over a WebSocket using the OpenAI Realtime protocol, with live transcription and low-latency turn-taking. The server exposes a WebSocket endpoint at `/v1/realtime` that any OpenAI Realtime-compatible client can connect to. #### Connecting with the OpenAI Realtime client ```python from openai import OpenAI client = OpenAI(base_url="http://localhost:8765/v1", api_key="not-needed") with client.beta.realtime.connect(model="model_name") as conn: conn.session.update( session={ "instructions": "You are a helpful assistant.", "turn_detection": {"type": "server_vad", "interrupt_response": True}, } ) # send audio, receive events, etc. for event in conn: print(event.type) ``` #### Supported events **Client -> Server** | Event | Description | |---|---| | `input_audio_buffer.append` | Stream base64 PCM audio. Decoded, resampled to 16 kHz, and chunked for the VAD. | | `session.update` | Deep-merge session config (instructions, tools, voice, turn detection, audio format). | | `conversation.item.create` | Inject `input_text` or `function_call_output` into the LLM context without triggering generation. | | `response.create` | Trigger LLM generation. Supports per-response `instructions` and `tool_choice` overrides. | | `response.cancel` | Cancel the in-progress response and re-enable listening. | **Server -> Client** | Event | Description | |---|---| | `session.created` | Sent on connection with current session config. | | `error` | Protocol errors (`session_limit_reached`, `unknown_or_invalid_event`, `invalid_session_type`, `conversation_already_has_active_response`, etc.) 
| | `input_audio_buffer.speech_started` | VAD detected user speech. | | `input_audio_buffer.speech_stopped` | End of user speech segment. | | `conversation.item.created` | Acknowledges injected `input_text` from `conversation.item.create`. | | `conversation.item.input_audio_transcription.delta` | Streaming partial transcript (when live transcription is enabled). | | `conversation.item.input_audio_transcription.completed` | Final transcript for the user turn (with duration usage). | | `response.created` | Emitted on the first outbound audio chunk (response is `in_progress`). | | `response.output_audio.delta` | Base64 PCM audio chunk from TTS. | | `response.output_audio.done` | Audio stream complete for the current output item. | | `response.output_audio_transcript.done` | Full assistant text transcript for the turn. | | `response.function_call_arguments.done` | Tool call with `call_id`, `name`, and JSON `arguments`. | | `response.done` | Response finished (`completed`, `cancelled` with reason `turn_detected` or `client_cancelled`). | For the full architecture and design details, see the [Realtime Engine README](./src/speech_to_speech/api/openai_realtime/README.md). ### LLM Backend The LLM is the most compute-intensive and highest-latency component in the pipeline. A single forward pass through a large model can easily dominate the end-to-end response time, so choosing the right backend for your hardware and latency budget matters. To give users the most flexibility, we support the full spectrum of inference solutions: - **Local inference** — `transformers` (CUDA / CPU) and `mlx-lm` (Apple Silicon) run the model entirely on your machine with no external dependency. - **Self-hosted servers** — `--llm_backend responses-api` can point at a local [vLLM](https://github.com/vllm-project/vllm) or [llama.cpp](https://github.com/ggerganov/llama.cpp) server, giving you control over quantization, batching, and hardware while keeping traffic on-premise. 
- **Provider APIs** — the same `responses-api` backend works with OpenAI, [HuggingFace Inference Providers](https://huggingface.co/inference-providers), [OpenRouter](https://openrouter.ai), and any other provider that implements the OpenAI Responses API. Select a backend with `--llm_backend` (`responses-api` by default) and pair it with `--model_name`. Backend-specific options (`--responses_api_base_url`, `--responses_api_api_key`, `--responses_api_stream`, etc.) are only needed for the `responses-api` backend. > The examples below pair Parakeet TDT (local STT) and Qwen3-TTS (local TTS) with different LLM backends. #### OpenAI-compatible backends (`--llm_backend responses-api`) `--llm_backend responses-api` works with any server that implements the OpenAI Chat Completions API — point `--responses_api_base_url` at the right endpoint and set `--model_name` accordingly: | Backend | `--responses_api_base_url` | `--responses_api_api_key` | |---|---|---| | OpenAI | *(omit, uses OpenAI default)* | `$OPENAI_API_KEY` | | HF Inference Providers | `https://router.huggingface.co/v1` | `$HF_TOKEN` | | OpenRouter | `https://openrouter.ai/api/v1` | `$OPENROUTER_API_KEY` | | vLLM (local) | `http://localhost:8000/v1` | *(omit or any string)* | | llama.cpp (local) | `http://localhost:8080/v1` | *(omit or any string)* | ```bash # OpenAI speech-to-speech \ --mode local \ --stt parakeet-tdt \ --llm_backend responses-api \ --tts qwen3 \ --qwen3_tts_mlx_quantization 6bit \ --model_name "gpt-4o-mini" \ --responses_api_api_key "$OPENAI_API_KEY" \ --responses_api_stream \ --enable_live_transcription ``` ```bash # HF Inference Providers — Qwen3.5-9B via Together speech-to-speech \ --mode local \ --stt parakeet-tdt \ --llm_backend responses-api \ --tts qwen3 \ --qwen3_tts_mlx_quantization 6bit \ --model_name "Qwen/Qwen3.5-9B:together" \ --responses_api_base_url "https://router.huggingface.co/v1" \ --responses_api_api_key "$HF_TOKEN" \ --responses_api_stream \ --enable_live_transcription ``` 
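The backend table above can also be expressed as a small lookup that assembles the CLI arguments. This is a minimal sketch, not part of the project: the `BACKENDS` dictionary and the `build_command` helper are hypothetical, while the flags, base URLs, and environment variable names are the ones listed in the table.

```python
import os
from typing import List, Mapping, Optional, Tuple

# backend -> (base URL or None to omit the flag, env var holding the key or None)
BACKENDS: Mapping[str, Tuple[Optional[str], Optional[str]]] = {
    "openai": (None, "OPENAI_API_KEY"),
    "hf": ("https://router.huggingface.co/v1", "HF_TOKEN"),
    "openrouter": ("https://openrouter.ai/api/v1", "OPENROUTER_API_KEY"),
    "vllm": ("http://localhost:8000/v1", None),
    "llamacpp": ("http://localhost:8080/v1", None),
}


def build_command(
    backend: str,
    model_name: str,
    env: Optional[Mapping[str, str]] = None,
) -> List[str]:
    """Return an argv list for the responses-api backend."""
    env = os.environ if env is None else env
    base_url, key_var = BACKENDS[backend]
    cmd = [
        "speech-to-speech",
        "--llm_backend", "responses-api",
        "--model_name", model_name,
    ]
    if base_url is not None:
        cmd += ["--responses_api_base_url", base_url]
    # Only pass the key explicitly when the variable is actually set.
    if key_var is not None and env.get(key_var):
        cmd += ["--responses_api_api_key", env[key_var]]
    return cmd


if __name__ == "__main__":
    # "my-local-model" is a placeholder name, not a real checkpoint.
    print(" ".join(build_command("vllm", "my-local-model", env={})))
```

The resulting list can be passed to `subprocess.run` without shell quoting concerns.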
```bash # HF Inference Providers — GPT-oss-20B via Groq speech-to-speech \ --stt parakeet-tdt \ --llm_backend responses-api \ --tts qwen3 \ --qwen3_tts_mlx_quantization 6bit \ --model_name "openai/gpt-oss-20b:groq" \ --responses_api_base_url "https://router.huggingface.co/v1" \ --responses_api_api_key "$HF_TOKEN" \ --responses_api_stream \ --enable_live_transcription ``` #### Fully local (Apple Silicon) ```bash # MLX backend (Apple Silicon) speech-to-speech \ --mode local \ --stt parakeet-tdt \ --llm_backend mlx-lm \ --tts qwen3 \ --qwen3_tts_mlx_quantization 6bit \ --model_name "mlx-community/Qwen3-4B-Instruct-2507-bf16" \ --enable_live_transcription ``` ```bash # Transformers backend speech-to-speech \ --mode local \ --stt parakeet-tdt \ --llm_backend transformers \ --tts qwen3 \ --model_name "Qwen/Qwen3-4B-Instruct-2507" \ --enable_live_transcription ``` ### Docker Server #### Install the NVIDIA Container Toolkit https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/install-guide.html #### Start the docker container ```docker compose up``` ### Recommended usage with Cuda Leverage Torch Compile for Whisper with Pocket TTS for a simple low-latency setup: ```bash speech-to-speech \ --stt parakeet-tdt \ --llm_backend transformers \ --tts qwen3 \ --model_name "Qwen/Qwen3-4B-Instruct-2507" \ --enable_live_transcription ``` ### Multi-language Support The pipeline currently supports English, French, Spanish, Chinese, Japanese, and Korean. Two use cases are considered: - **Single-language conversation**: Enforce the language setting using the `--language` flag, specifying the target language code (default is 'en'). - **Language switching**: Set `--language` to 'auto'. The STT detects the language of each spoken prompt and forwards it to the LLM. Optionally, opt in with `--enable_lang_prompt` to also append a "`Please reply to my message in ...`" instruction so the LLM replies in the detected language. 
This flag defaults to `False`: large LLMs usually pick up the language from context on their own, but the explicit instruction can help smaller models stay in the right language.

Please note that you must use STT and LLM checkpoints compatible with the target language(s). For multilingual TTS, use ChatTTS or another backend that supports the target language.

#### With the server version:

For automatic language detection:

```bash
speech-to-speech \
    --stt parakeet-tdt \
    --language auto \
    --llm_backend mlx-lm \
    --model_name "mlx-community/Qwen3-4B-Instruct-2507-bf16"
```

Or for one language in particular, Chinese in this example:

```bash
speech-to-speech \
    --stt whisper-mlx \
    --stt_model_name large-v3 \
    --language zh \
    --llm_backend mlx-lm \
    --model_name mlx-community/Qwen3-4B-Instruct-2507-bf16
```

#### Local Mac Setup

For automatic language detection (note: `--stt whisper-mlx` overrides the default parakeet-tdt from optimal settings, since Whisper `large-v3` has broader language coverage):

```bash
speech-to-speech \
    --local_mac_optimal_settings \
    --stt parakeet-tdt \
    --language auto \
    --model_name mlx-community/Qwen3-4B-Instruct-2507-bf16
```

Or for one language in particular, Chinese in this example:

```bash
speech-to-speech \
    --local_mac_optimal_settings \
    --stt whisper-mlx \
    --stt_model_name large-v3 \
    --language zh \
    --model_name mlx-community/Qwen3-4B-Instruct-2507-bf16
```

### Using Pocket TTS

Pocket TTS from Kyutai Labs provides streaming TTS with voice cloning capabilities. To use it:

```bash
speech-to-speech \
    --tts pocket \
    --pocket_tts_voice jean \
    --pocket_tts_device cpu
```

Available voice presets: `alba`, `marius`, `javert`, `jean`, `fantine`, `cosette`, `eponine`, `azelma`. You can also use custom voice files or HuggingFace paths.

## Command-line Usage

> **_NOTE:_** References for all the CLI arguments can be found directly in the [arguments classes](./src/speech_to_speech/arguments_classes) or by running `speech-to-speech -h`.
### Module level Parameters

See the [ModuleArguments](./src/speech_to_speech/arguments_classes/module_arguments.py) class. It lets you set:

- a common `--device` (if you want every part to run on the same device)
- `--mode`: `realtime` (default), `local`, `socket`, or `websocket`
- the STT implementation (`--stt`)
- the LLM backend (`--llm_backend`: `transformers`, `mlx-lm`, or `responses-api`)
- the TTS implementation (`--tts`)
- the logging level

### VAD parameters

See the [VADHandlerArguments](./src/speech_to_speech/arguments_classes/vad_arguments.py) class. Notably:

- `--thresh`: Threshold value to trigger voice activity detection.
- `--min_speech_ms`: Minimum duration of detected voice activity to be considered speech.
- `--min_silence_ms`: Minimum length of silence intervals for segmenting speech, balancing sentence cutting and latency reduction.

### STT, LLM and TTS parameters

`model_name`, `torch_dtype`, and `device` are exposed for each implementation of the Speech to Text, Language Model, and Text to Speech. STT and TTS parameters use the handler prefix (e.g. `--stt_model_name`, `--llm_device`). LLM model selection and chat settings are shared across backends via unprefixed flags (e.g. `--model_name`, `--chat_size`); backend-specific flags use the `responses_api_` prefix for the `responses-api` backend and the `llm_` prefix for local backends. See the [arguments classes](./src/speech_to_speech/arguments_classes) for the full list.

For example:

```bash
# Local transformers/mlx-lm backend
--model_name google/gemma-2b-it

# OpenAI-compatible backend
--llm_backend responses-api --model_name deepseek-chat --responses_api_base_url https://api.deepseek.com
```

### Generation parameters

Other generation parameters can be set using the handler prefix + `_gen_`, e.g., `--stt_gen_max_new_tokens 128` or `--llm_gen_temperature 0.7`. These parameters can be added to the pipeline part's arguments class if not already exposed.
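To illustrate the `<handler>_gen_<parameter>` naming convention described above, here is a small sketch that groups already-parsed arguments by handler. It is an illustration only: `split_gen_kwargs` is a hypothetical helper and does not reflect how the project's own argument classes are implemented.

```python
from collections import defaultdict
from typing import Any, Dict


def split_gen_kwargs(args: Dict[str, Any]) -> Dict[str, Dict[str, Any]]:
    """Group `<handler>_gen_<param>` entries into one dict per handler."""
    grouped: Dict[str, Dict[str, Any]] = defaultdict(dict)
    for name, value in args.items():
        handler, sep, param = name.partition("_gen_")
        # Keep only names that really contain the `_gen_` infix.
        if sep and handler and param:
            grouped[handler][param] = value
    return dict(grouped)


parsed = {
    "stt_gen_max_new_tokens": 128,
    "llm_gen_temperature": 0.7,
    "model_name": "google/gemma-2b-it",  # not a generation parameter, ignored
}
print(split_gen_kwargs(parsed))
# {'stt': {'max_new_tokens': 128}, 'llm': {'temperature': 0.7}}
```

Each inner dictionary could then be forwarded as keyword arguments to the matching handler's generation call.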
## Citations ### Silero VAD ```bibtex @misc{Silero VAD, author = {Silero Team}, title = {Silero VAD: pre-trained enterprise-grade Voice Activity Detector (VAD), Number Detector and Language Classifier}, year = {2021}, publisher = {GitHub}, journal = {GitHub repository}, howpublished = {\url{https://github.com/snakers4/silero-vad}}, commit = {insert_some_commit_here}, email = {[email protected]} } ``` ### Distil-Whisper ```bibtex @misc{gandhi2023distilwhisper, title={Distil-Whisper: Robust Knowledge Distillation via Large-Scale Pseudo Labelling}, author={Sanchit Gandhi and Patrick von Platen and Alexander M. Rush}, year={2023}, eprint={2311.00430}, archivePrefix={arXiv}, primaryClass={cs.CL} } ``` ### Parler-TTS ```bibtex @misc{lacombe-etal-2024-parler-tts, author = {Yoach Lacombe and Vaibhav Srivastav and Sanchit Gandhi}, title = {Parler-TTS}, year = {2024}, publisher = {GitHub}, journal = {GitHub repository}, howpublished = {\url{https://github.com/huggingface/parler-tts}} } ```

AI Agents LLM Tools & Chat UIs
4.9K Github Stars
autotrain-advanced
Open Source

autotrain-advanced

# 🤗 AutoTrain Advanced > [!WARNING] > **This project is no longer maintained.** No new features will be added and bugs will not be fixed. We recommend using [Axolotl](https://github.com/axolotl-ai-cloud/axolotl), [TRL](https://github.com/huggingface/trl), or [transformers.Trainer](https://huggingface.co/docs/transformers/main_classes/trainer). AutoTrain Advanced: faster and easier training and deployments of state-of-the-art machine learning models. AutoTrain Advanced is a no-code solution that allows you to train machine learning models in just a few clicks. Please note that you must upload data in correct format for project to be created. For help regarding proper data format and pricing, check out the documentation. NOTE: AutoTrain is free! You only pay for the resources you use in case you decide to run AutoTrain on Hugging Face Spaces. When running locally, you only pay for the resources you use on your own infrastructure. ## Supported Tasks | Task | Status | Python Notebook | Example Configs | | --- | --- | --- | --- | | LLM SFT Finetuning | ✅ | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/huggingface/autotrain-advanced/blob/main/notebooks/llm_finetuning.ipynb) | [llm_sft_finetune.yaml](https://github.com/huggingface/autotrain-advanced/blob/main/configs/llm_finetuning/smollm2.yml) | | LLM ORPO Finetuning | ✅ | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/huggingface/autotrain-advanced/blob/main/notebooks/llm_finetuning.ipynb) | [llm_orpo_finetune.yaml](https://github.com/huggingface/autotrain-advanced/blob/main/configs/llm_finetuning/llama3-8b-orpo.yml) | | LLM DPO Finetuning | ✅ | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/huggingface/autotrain-advanced/blob/main/notebooks/llm_finetuning.ipynb) | 
[llm_dpo_finetune.yaml](https://github.com/huggingface/autotrain-advanced/blob/main/configs/llm_finetuning/llama3-8b-dpo-qlora.yml) | | LLM Reward Finetuning | ✅ | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/huggingface/autotrain-advanced/blob/main/notebooks/llm_finetuning.ipynb) | [llm_reward_finetune.yaml](https://github.com/huggingface/autotrain-advanced/blob/main/configs/llm_finetuning/llama32-1b-sft.yml) | | LLM Generic/Default Finetuning | ✅ | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/huggingface/autotrain-advanced/blob/main/notebooks/llm_finetuning.ipynb) | [llm_generic_finetune.yaml](https://github.com/huggingface/autotrain-advanced/blob/main/configs/llm_finetuning/gpt2_sft.yml) | | Text Classification | ✅ | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/huggingface/autotrain-advanced/blob/main/notebooks/text_classification.ipynb) | [text_classification.yaml](https://github.com/huggingface/autotrain-advanced/tree/main/configs/text_classification) | | Text Regression | ✅ | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/huggingface/autotrain-advanced/blob/main/notebooks/text_regression.ipynb) | [text_regression.yaml](https://github.com/huggingface/autotrain-advanced/tree/main/configs/text_regression) | | Token Classification | ✅ | Coming Soon | [token_classification.yaml](https://github.com/huggingface/autotrain-advanced/tree/main/configs/token_classification) | | Seq2Seq | ✅ | Coming Soon | [seq2seq.yaml](https://github.com/huggingface/autotrain-advanced/tree/main/configs/seq2seq) | | Extractive Question Answering | ✅ | Coming Soon | [extractive_qa.yaml](https://github.com/huggingface/autotrain-advanced/tree/main/configs/extractive_question_answering) | | Image 
Classification | ✅ | Coming Soon | [image_classification.yaml](https://github.com/huggingface/autotrain-advanced/tree/main/configs/image_classification) |
| Image Scoring/Regression | ✅ | Coming Soon | [image_regression.yaml](https://github.com/huggingface/autotrain-advanced/tree/main/configs/image_scoring) |
| VLM | 🟥 | Coming Soon | [vlm.yaml](https://github.com/huggingface/autotrain-advanced/tree/main/configs/vlm) |

## Running UI on Colab or Hugging Face Spaces

- Deploy AutoTrain on Hugging Face Spaces: [![Deploy on Spaces](https://huggingface.co/datasets/huggingface/badges/resolve/main/deploy-on-spaces-md.svg)](https://huggingface.co/login?next=%2Fspaces%2Fautotrain-projects%2Fautotrain-advanced%3Fduplicate%3Dtrue)
- Run AutoTrain UI on Colab via ngrok: [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/huggingface/autotrain-advanced/blob/main/colabs/AutoTrain_ngrok.ipynb)

## Local Installation

You can install the AutoTrain Advanced Python package via pip. Note that AutoTrain Advanced requires Python >= 3.10.

    pip install autotrain-advanced

Make sure that you have git lfs installed. Check out the instructions here: https://github.com/git-lfs/git-lfs/wiki/Installation

You also need to install torch, torchaudio and torchvision.

The best way to run autotrain is in a conda environment. You can create a new conda environment with the following commands:

    conda create -n autotrain python=3.10
    conda activate autotrain
    pip install autotrain-advanced
    conda install pytorch torchvision torchaudio pytorch-cuda=12.1 -c pytorch -c nvidia
    conda install -c "nvidia/label/cuda-12.1.0" cuda-nvcc

Once done, you can start the application using:

    autotrain app --port 8080 --host 127.0.0.1

If you would rather not use the UI, you can train from the command line with AutoTrain configs or the AutoTrain CLI.
To use a config file for training, you can use the following command:

    autotrain --config <path_to_config_file>

You can find sample config files in the `configs` directory of this repository.

Example config file for finetuning SmolLM2:

```yaml
task: llm-sft
base_model: HuggingFaceTB/SmolLM2-1.7B-Instruct
project_name: autotrain-smollm2-finetune
log: tensorboard
backend: local

data:
  path: HuggingFaceH4/no_robots
  train_split: train
  valid_split: null
  chat_template: tokenizer
  column_mapping:
    text_column: messages

params:
  block_size: 2048
  model_max_length: 4096
  epochs: 2
  batch_size: 1
  lr: 1e-5
  peft: true
  quantization: int4
  target_modules: all-linear
  padding: right
  optimizer: paged_adamw_8bit
  scheduler: linear
  gradient_accumulation: 8
  mixed_precision: bf16
  merge_adapter: true

hub:
  username: ${HF_USERNAME}
  token: ${HF_TOKEN}
  push_to_hub: true
```

To fine-tune a model using the config file above, you can use the following command:

```bash
$ export HF_USERNAME=<your_hugging_face_username>
$ export HF_TOKEN=<your_hugging_face_write_token>
$ autotrain --config <path_to_config_file>
```

## Documentation

Documentation is available at https://hf.co/docs/autotrain/

## Citation

```
@inproceedings{thakur-2024-autotrain,
    title = "{A}uto{T}rain: No-code training for state-of-the-art models",
    author = "Thakur, Abhishek",
    booktitle = "Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing: System Demonstrations",
    month = nov,
    year = "2024",
    address = "Miami, Florida, USA",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2024.emnlp-demo.44",
    pages = "419--423",
    abstract = "With the advancements in open-source models, training(or finetuning) models on custom datasets has become a crucial part of developing solutions which are tailored to specific industrial or open-source applications.
Yet, there is no single tool which simplifies the process of training across different types of modalities or tasks.We introduce AutoTrain(aka AutoTrain Advanced){---}an open-source, no code tool/library which can be used to train (or finetune) models for different kinds of tasks such as: large language model (LLM) finetuning, text classification/regression, token classification, sequence-to-sequence task, finetuning of sentence transformers, visual language model (VLM) finetuning, image classification/regression and even classification and regression tasks on tabular data. AutoTrain Advanced is an open-source library providing best practices for training models on custom datasets. The library is available at https://github.com/huggingface/autotrain-advanced. AutoTrain can be used in fully local mode or on cloud machines and works with tens of thousands of models shared on Hugging Face Hub and their variations.", } ```

ML Frameworks
4.6K Github Stars
diffusers
Open Source

diffusers

<!--- Copyright 2022 - The HuggingFace Team. All rights reserved. Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0 Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License. --> <p align="center"> <br> <img src="https://raw.githubusercontent.com/huggingface/diffusers/main/docs/source/en/imgs/diffusers_library.jpg" width="400"/> <br> <p> <p align="center"> <a href="https://github.com/huggingface/diffusers/blob/main/LICENSE"><img alt="GitHub" src="https://img.shields.io/github/license/huggingface/datasets.svg?color=blue"></a> <a href="https://github.com/huggingface/diffusers/releases"><img alt="GitHub release" src="https://img.shields.io/github/release/huggingface/diffusers.svg"></a> <a href="https://pepy.tech/project/diffusers"><img alt="GitHub release" src="https://static.pepy.tech/badge/diffusers/month"></a> <a href="CODE_OF_CONDUCT.md"><img alt="Contributor Covenant" src="https://img.shields.io/badge/Contributor%20Covenant-2.1-4baaaa.svg"></a> <a href="https://twitter.com/diffuserslib"><img alt="X account" src="https://img.shields.io/twitter/url/https/twitter.com/diffuserslib.svg?style=social&label=Follow%20%40diffuserslib"></a> </p> 🤗 Diffusers is the go-to library for state-of-the-art pretrained diffusion models for generating images, audio, and even 3D structures of molecules. Whether you're looking for a simple inference solution or training your own diffusion models, 🤗 Diffusers is a modular toolbox that supports both. 
Our library is designed with a focus on [usability over performance](https://huggingface.co/docs/diffusers/conceptual/philosophy#usability-over-performance), [simple over easy](https://huggingface.co/docs/diffusers/conceptual/philosophy#simple-over-easy), and [customizability over abstractions](https://huggingface.co/docs/diffusers/conceptual/philosophy#tweakable-contributorfriendly-over-abstraction). 🤗 Diffusers offers three core components: - State-of-the-art [diffusion pipelines](https://huggingface.co/docs/diffusers/api/pipelines/overview) that can be run in inference with just a few lines of code. - Interchangeable noise [schedulers](https://huggingface.co/docs/diffusers/api/schedulers/overview) for different diffusion speeds and output quality. - Pretrained [models](https://huggingface.co/docs/diffusers/api/models/overview) that can be used as building blocks, and combined with schedulers, for creating your own end-to-end diffusion systems. ## Installation We recommend installing 🤗 Diffusers in a virtual environment from PyPI or Conda. For more details about installing [PyTorch](https://pytorch.org/get-started/locally/), please refer to their official documentation. ### PyTorch With `pip` (official package): ```bash pip install --upgrade diffusers[torch] ``` With `conda` (maintained by the community): ```sh conda install -c conda-forge diffusers ``` ### Apple Silicon (M1/M2) support Please refer to the [How to use Stable Diffusion in Apple Silicon](https://huggingface.co/docs/diffusers/optimization/mps) guide. ## Quickstart Generating outputs is super easy with 🤗 Diffusers. 
To generate an image from text, use the `from_pretrained` method to load any pretrained diffusion model (browse the [Hub](https://huggingface.co/models?library=diffusers&sort=downloads) for 30,000+ checkpoints):

```python
from diffusers import DiffusionPipeline
import torch

pipeline = DiffusionPipeline.from_pretrained("stable-diffusion-v1-5/stable-diffusion-v1-5", torch_dtype=torch.float16)
pipeline.to("cuda")
pipeline("An image of a squirrel in Picasso style").images[0]
```

You can also dig into the models and schedulers toolbox to build your own diffusion system:

```python
from diffusers import DDPMScheduler, UNet2DModel
from PIL import Image
import torch

scheduler = DDPMScheduler.from_pretrained("google/ddpm-cat-256")
model = UNet2DModel.from_pretrained("google/ddpm-cat-256").to("cuda")
scheduler.set_timesteps(50)

sample_size = model.config.sample_size
noise = torch.randn((1, 3, sample_size, sample_size), device="cuda")
input = noise

for t in scheduler.timesteps:
    with torch.no_grad():
        noisy_residual = model(input, t).sample
    prev_noisy_sample = scheduler.step(noisy_residual, t, input).prev_sample
    input = prev_noisy_sample

image = (input / 2 + 0.5).clamp(0, 1)
image = image.cpu().permute(0, 2, 3, 1).numpy()[0]
image = Image.fromarray((image * 255).round().astype("uint8"))
image
```

Check out the [Quickstart](https://huggingface.co/docs/diffusers/quicktour) to launch your diffusion journey today!
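The last three lines of the second example convert the denoised tensor into an 8-bit image. As a rough illustration of that arithmetic only, the plain-Python sketch below applies the same steps to single values. `to_pixel` is a hypothetical helper name, not part of Diffusers, and the sketch assumes the model output lies nominally in [-1, 1].

```python
def to_pixel(value: float) -> int:
    """Map one model output value (nominally in [-1, 1]) to an 8-bit pixel."""
    # Same steps as `(input / 2 + 0.5).clamp(0, 1)` in the example above.
    scaled = value / 2 + 0.5
    clamped = min(max(scaled, 0.0), 1.0)
    # Same idea as `(image * 255).round().astype("uint8")`.
    return int(round(clamped * 255))


# Values outside [-1, 1] are clamped rather than wrapped around.
for v in (-1.0, -0.5, 0.5, 1.0, 1.7):
    print(v, "->", to_pixel(v))
```

The clamp matters because casting an out-of-range value straight to `uint8` would wrap around instead of saturating.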
## How to navigate the documentation | **Documentation** | **What can I learn?** | |---------------------------------------------------------------------|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| | [Tutorial](https://huggingface.co/docs/diffusers/tutorials/tutorial_overview) | A basic crash course for learning how to use the library's most important features like using models and schedulers to build your own diffusion system, and training your own diffusion model. | | [Loading](https://huggingface.co/docs/diffusers/using-diffusers/loading) | Guides for how to load and configure all the components (pipelines, models, and schedulers) of the library, as well as how to use different schedulers. | | [Pipelines for inference](https://huggingface.co/docs/diffusers/using-diffusers/overview_techniques) | Guides for how to use pipelines for different inference tasks, batched generation, controlling generated outputs and randomness, and how to contribute a pipeline to the library. | | [Optimization](https://huggingface.co/docs/diffusers/optimization/fp16) | Guides for how to optimize your diffusion model to run faster and consume less memory. | | [Training](https://huggingface.co/docs/diffusers/training/overview) | Guides for how to train a diffusion model for different tasks with different training techniques. | ## Contribution We ❤️ contributions from the open-source community! If you want to contribute to this library, please check out our [Contribution guide](https://github.com/huggingface/diffusers/blob/main/CONTRIBUTING.md). You can look out for [issues](https://github.com/huggingface/diffusers/issues) you'd like to tackle to contribute to the library. 
- See [Good first issues](https://github.com/huggingface/diffusers/issues?q=is%3Aopen+is%3Aissue+label%3A%22good+first+issue%22) for general opportunities to contribute - See [New model/pipeline](https://github.com/huggingface/diffusers/issues?q=is%3Aopen+is%3Aissue+label%3A%22New+pipeline%2Fmodel%22) to contribute exciting new diffusion models / diffusion pipelines - See [New scheduler](https://github.com/huggingface/diffusers/issues?q=is%3Aopen+is%3Aissue+label%3A%22New+scheduler%22) Also, say 👋 in our public Discord channel <a href="https://discord.gg/G7tWnz98XR"><img alt="Join us on Discord" src="https://img.shields.io/discord/823813159592001537?color=5865F2&logo=discord&logoColor=white"></a>. We discuss the hottest trends about diffusion models, help each other with contributions, personal projects or just hang out ☕. ## Popular Tasks & Pipelines <table> <tr> <th>Task</th> <th>Pipeline</th> <th>🤗 Hub</th> </tr> <tr style="border-top: 2px solid black"> <td>Unconditional Image Generation</td> <td><a href="https://huggingface.co/docs/diffusers/api/pipelines/ddpm"> DDPM </a></td> <td><a href="https://huggingface.co/google/ddpm-ema-church-256"> google/ddpm-ema-church-256 </a></td> </tr> <tr style="border-top: 2px solid black"> <td>Text-to-Image</td> <td><a href="https://huggingface.co/docs/diffusers/api/pipelines/stable_diffusion/text2img">Stable Diffusion Text-to-Image</a></td> <td><a href="https://huggingface.co/stable-diffusion-v1-5/stable-diffusion-v1-5"> stable-diffusion-v1-5/stable-diffusion-v1-5 </a></td> </tr> <tr> <td>Text-to-Image</td> <td><a href="https://huggingface.co/docs/diffusers/api/pipelines/unclip">unCLIP</a></td> <td><a href="https://huggingface.co/kakaobrain/karlo-v1-alpha"> kakaobrain/karlo-v1-alpha </a></td> </tr> <tr> <td>Text-to-Image</td> <td><a href="https://huggingface.co/docs/diffusers/api/pipelines/deepfloyd_if">DeepFloyd IF</a></td> <td><a href="https://huggingface.co/DeepFloyd/IF-I-XL-v1.0"> DeepFloyd/IF-I-XL-v1.0 </a></td> </tr> 
<tr> <td>Text-to-Image</td> <td><a href="https://huggingface.co/docs/diffusers/api/pipelines/kandinsky">Kandinsky</a></td> <td><a href="https://huggingface.co/kandinsky-community/kandinsky-2-2-decoder"> kandinsky-community/kandinsky-2-2-decoder </a></td> </tr> <tr style="border-top: 2px solid black"> <td>Text-guided Image-to-Image</td> <td><a href="https://huggingface.co/docs/diffusers/api/pipelines/controlnet">ControlNet</a></td> <td><a href="https://huggingface.co/lllyasviel/sd-controlnet-canny"> lllyasviel/sd-controlnet-canny </a></td> </tr> <tr> <td>Text-guided Image-to-Image</td> <td><a href="https://huggingface.co/docs/diffusers/api/pipelines/pix2pix">InstructPix2Pix</a></td> <td><a href="https://huggingface.co/timbrooks/instruct-pix2pix"> timbrooks/instruct-pix2pix </a></td> </tr> <tr> <td>Text-guided Image-to-Image</td> <td><a href="https://huggingface.co/docs/diffusers/api/pipelines/stable_diffusion/img2img">Stable Diffusion Image-to-Image</a></td> <td><a href="https://huggingface.co/stable-diffusion-v1-5/stable-diffusion-v1-5"> stable-diffusion-v1-5/stable-diffusion-v1-5 </a></td> </tr> <tr style="border-top: 2px solid black"> <td>Text-guided Image Inpainting</td> <td><a href="https://huggingface.co/docs/diffusers/api/pipelines/stable_diffusion/inpaint">Stable Diffusion Inpainting</a></td> <td><a href="https://huggingface.co/stable-diffusion-v1-5/stable-diffusion-inpainting"> stable-diffusion-v1-5/stable-diffusion-inpainting </a></td> </tr> <tr style="border-top: 2px solid black"> <td>Image Variation</td> <td><a href="https://huggingface.co/docs/diffusers/api/pipelines/stable_diffusion/image_variation">Stable Diffusion Image Variation</a></td> <td><a href="https://huggingface.co/lambdalabs/sd-image-variations-diffusers"> lambdalabs/sd-image-variations-diffusers </a></td> </tr> <tr style="border-top: 2px solid black"> <td>Super Resolution</td> <td><a href="https://huggingface.co/docs/diffusers/api/pipelines/stable_diffusion/upscale">Stable Diffusion 
Upscale</a></td> <td><a href="https://huggingface.co/stabilityai/stable-diffusion-x4-upscaler"> stabilityai/stable-diffusion-x4-upscaler </a></td> </tr> <tr> <td>Super Resolution</td> <td><a href="https://huggingface.co/docs/diffusers/api/pipelines/stable_diffusion/latent_upscale">Stable Diffusion Latent Upscale</a></td> <td><a href="https://huggingface.co/stabilityai/sd-x2-latent-upscaler"> stabilityai/sd-x2-latent-upscaler </a></td> </tr> </table> ## Popular libraries using 🧨 Diffusers - https://github.com/microsoft/TaskMatrix - https://github.com/invoke-ai/InvokeAI - https://github.com/InstantID/InstantID - https://github.com/apple/ml-stable-diffusion - https://github.com/Sanster/lama-cleaner - https://github.com/IDEA-Research/Grounded-Segment-Anything - https://github.com/ashawkey/stable-dreamfusion - https://github.com/deep-floyd/IF - https://github.com/bentoml/BentoML - https://github.com/bmaltais/kohya_ss - +14,000 other amazing GitHub repositories 💪 Thank you for using us ❤️. ## Credits This library concretizes previous work by many different authors and would not have been possible without their great research and implementations. 
We'd like to thank, in particular, the following implementations which have helped us in our development and without which the API could not have been as polished today: - @CompVis' latent diffusion models library, available [here](https://github.com/CompVis/latent-diffusion) - @hojonathanho original DDPM implementation, available [here](https://github.com/hojonathanho/diffusion) as well as the extremely useful translation into PyTorch by @pesser, available [here](https://github.com/pesser/pytorch_diffusion) - @ermongroup's DDIM implementation, available [here](https://github.com/ermongroup/ddim) - @yang-song's Score-VE and Score-VP implementations, available [here](https://github.com/yang-song/score_sde_pytorch) We also want to thank @heejkoo for the very helpful overview of papers, code and resources on diffusion models, available [here](https://github.com/heejkoo/Awesome-Diffusion-Models) as well as @crowsonkb and @rromb for useful discussions and insights. ## Citation ```bibtex @misc{von-platen-etal-2022-diffusers, author = {Patrick von Platen and Suraj Patil and Anton Lozhkov and Pedro Cuenca and Nathan Lambert and Kashif Rasul and Mishig Davaadorj and Dhruv Nair and Sayak Paul and William Berman and Yiyi Xu and Steven Liu and Thomas Wolf}, title = {Diffusers: State-of-the-art diffusion models}, year = {2022}, publisher = {GitHub}, journal = {GitHub repository}, howpublished = {\url{https://github.com/huggingface/diffusers}} } ```

ML Frameworks Image Editing
33.8K Github Stars
deep-rl-class
Open Source

deep-rl-class

This repo contains the Hugging Face Deep Reinforcement Learning Course.

Education & Learning
4.9K Github Stars
tokenizers
Open Source

tokenizers

<p align="center"> <br> <img src="https://huggingface.co/landing/assets/tokenizers/tokenizers-logo.png" width="600"/> <br> <p> <p align="center"> <img alt="Build" src="https://github.com/huggingface/tokenizers/workflows/Rust/badge.svg"> <a href="https://github.com/huggingface/tokenizers/blob/main/LICENSE"> <img alt="GitHub" src="https://img.shields.io/github/license/huggingface/tokenizers.svg?color=blue&cachedrop"> </a> <a href="https://pepy.tech/project/tokenizers"> <img src="https://pepy.tech/badge/tokenizers/week" /> </a> </p> Provides an implementation of today's most used tokenizers, with a focus on performance and versatility. ## Main features: - Train new vocabularies and tokenize, using today's most used tokenizers. - Extremely fast (both training and tokenization), thanks to the Rust implementation. Takes less than 20 seconds to tokenize a GB of text on a server's CPU. - Easy to use, but also extremely versatile. - Designed for research and production. - Normalization comes with alignments tracking. It's always possible to get the part of the original sentence that corresponds to a given token. - Does all the pre-processing: Truncate, Pad, add the special tokens your model needs. 
## Performance

Performance can vary depending on hardware, but running [~/bindings/python/benches/test_tiktoken.py](bindings/python/benches/test_tiktoken.py) should give the following on a g6 AWS instance:

![image](https://github.com/user-attachments/assets/2b913d4b-e488-4cbc-b542-f90a6c40643d)

## Bindings

We provide bindings to the following languages (more to come!):

- [Rust](https://github.com/huggingface/tokenizers/tree/main/tokenizers) (Original implementation)
- [Python](https://github.com/huggingface/tokenizers/tree/main/bindings/python)
- [Node.js](https://github.com/huggingface/tokenizers/tree/main/bindings/node)
- [Ruby](https://github.com/ankane/tokenizers-ruby) (Contributed by @ankane, external repo)

## Installation

You can install from source using:

```bash
pip install git+https://github.com/huggingface/tokenizers.git#subdirectory=bindings/python
```

or install the released versions with

```bash
pip install tokenizers
```

## Quick example using Python:

Choose your model between Byte-Pair Encoding, WordPiece or Unigram and instantiate a tokenizer:

```python
from tokenizers import Tokenizer
from tokenizers.models import BPE

tokenizer = Tokenizer(BPE())
```

You can customize how pre-tokenization (e.g., splitting into words) is done:

```python
from tokenizers.pre_tokenizers import Whitespace

tokenizer.pre_tokenizer = Whitespace()
```

Then training your tokenizer on a set of files just takes two lines of code:

```python
from tokenizers.trainers import BpeTrainer

trainer = BpeTrainer(special_tokens=["[UNK]", "[CLS]", "[SEP]", "[PAD]", "[MASK]"])
tokenizer.train(files=["wiki.train.raw", "wiki.valid.raw", "wiki.test.raw"], trainer=trainer)
```

Once your tokenizer is trained, encode any text with just one line:

```python
output = tokenizer.encode("Hello, y'all! How are you 😁 ?")
print(output.tokens)
# ["Hello", ",", "y", "'", "all", "!", "How", "are", "you", "[UNK]", "?"]
```

Check the [documentation](https://huggingface.co/docs/tokenizers/index) or the [quicktour](https://huggingface.co/docs/tokenizers/quicktour) to learn more!
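To give a feel for what a BPE model learns during training, here is a toy, pure-Python sketch of byte-pair-encoding merges. It illustrates the general algorithm under simplifying assumptions (a plain list of words, character-level starting symbols, no special tokens, ties broken lexicographically) and says nothing about how the Rust implementation works internally. `train_bpe` and `apply_merges` are hypothetical names, not part of the `tokenizers` API.

```python
from collections import Counter


def train_bpe(words: list[str], num_merges: int) -> list[tuple[str, str]]:
    """Learn up to `num_merges` merge rules from a list of words."""
    # Each distinct word is a tuple of symbols (characters at first) with a count.
    vocab = Counter(tuple(word) for word in words)
    merges = []
    for _ in range(num_merges):
        # Count adjacent symbol pairs, weighted by word frequency.
        pairs = Counter()
        for symbols, count in vocab.items():
            for pair in zip(symbols, symbols[1:]):
                pairs[pair] += count
        if not pairs:
            break
        # Most frequent pair wins; ties are broken lexicographically.
        best = max(pairs, key=lambda p: (pairs[p], p))
        merges.append(best)
        # Rewrite every word with the chosen pair fused into one symbol.
        new_vocab = Counter()
        for symbols, count in vocab.items():
            new_vocab[apply_merges(symbols, [best])] += count
        vocab = new_vocab
    return merges


def apply_merges(symbols, merges):
    """Apply merge rules, in order, to a sequence of symbols."""
    symbols = list(symbols)
    for left, right in merges:
        merged, i = [], 0
        while i < len(symbols):
            if i + 1 < len(symbols) and symbols[i] == left and symbols[i + 1] == right:
                merged.append(left + right)
                i += 2
            else:
                merged.append(symbols[i])
                i += 1
        symbols = merged
    return tuple(symbols)


corpus = ["low", "low", "lower", "lowest", "newer", "newest"]
merges = train_bpe(corpus, num_merges=3)
print(merges)
print(apply_merges("lowest", merges))
```

Each learned merge fuses the most frequent adjacent pair into a new symbol, so frequent fragments end up as single tokens while rare words stay split into smaller pieces.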

JavaScript Libraries & Components ML Frameworks
10.8K Github Stars
knockknock
Open Source

knockknock

# Knock Knock

[![made-with-python](https://img.shields.io/badge/Made%20with-Python-red.svg)](#python) [![Downloads](https://pepy.tech/badge/knockknock)](https://pepy.tech/project/knockknock) [![Downloads](https://pepy.tech/badge/knockknock/month)](https://pepy.tech/project/knockknock/month) [![GitHub stars](https://img.shields.io/github/stars/huggingface/knockknock.svg?style=social&label=Star&maxAge=1000)](https://github.com/huggingface/knockknock/stargazers/)

A small library to get a notification when your training is complete or when it crashes during the process with two additional lines of code.

When training deep learning models, it is common to use early stopping. Apart from a rough estimate, it is difficult to predict when the training will finish. Thus, it can be useful to set up automatic notifications for your training. It is also useful to be notified when your training crashes in the middle of the process for unexpected reasons.

## Installation

Install with `pip` or equivalent.

```bash
pip install knockknock
```

This code has only been tested with Python >= 3.6.

## Usage

The library is designed to be used in a seamless way, with minimal code modification: you only need to add a decorator on top of your main function. The return value (if there is one) is also reported in the notification.
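To make the decorator idea concrete, here is a minimal sketch of how such a wrapper can report the start, the return value, and crashes. It is not knockknock's actual implementation: `notify_on_completion` and the `send` callback are hypothetical names, and `send` stands in for whatever channel delivers the message.

```python
import functools
import traceback


def notify_on_completion(send):
    """Build a decorator that reports start, finish and crashes through `send`."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            send(f"{func.__name__} started")
            try:
                result = func(*args, **kwargs)
            except Exception:
                # Report the traceback, then let the error propagate as usual.
                send(f"{func.__name__} crashed:\n{traceback.format_exc()}")
                raise
            send(f"{func.__name__} finished, returned {result!r}")
            return result
        return wrapper
    return decorator


messages = []  # stand-in for a real notification channel


@notify_on_completion(messages.append)
def train_your_nicest_model():
    return {"loss": 0.9}


train_your_nicest_model()
print(messages)
```

Re-raising after the crash message keeps the decorated function's behaviour unchanged for the caller, which is what makes this kind of wrapper safe to bolt onto an existing training script.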
There are currently *twelve* ways to set up notifications:

| Platform | External Contributors |
| :-----------------------------------: | :---------------------------------------------------------------------------------------: |
| [email](#email) | - |
| [Slack](#slack) | - |
| [Telegram](#telegram) | - |
| [Microsoft Teams](#microsoft-teams) | [@noklam](https://github.com/noklam) |
| [Text Message](<#text-message-(sms)>) | [@abhishekkrthakur](https://github.com/abhishekkrthakur) |
| [Discord](#discord) | [@watkinsm](https://github.com/watkinsm) |
| [Desktop](#desktop-notification) | [@atakanyenel](https://github.com/atakanyenel) [@eyalmazuz](https://github.com/eyalmazuz) |
| [Matrix](#matrix) | [@jcklie](https://github.com/jcklie) |
| [Amazon Chime](#amazon-chime) | [@prabhakar267](https://github.com/prabhakar267) |
| [DingTalk](#dingtalk) | [@wuutiing](https://github.com/wuutiing) |
| [RocketChat](#rocketchat) | [@radao](https://github.com/radao) |
| [WeChat Work](#wechat-work) | [@jcyk](https://github.com/jcyk) |

### Email

The service relies on [Yagmail](https://github.com/kootenpv/yagmail), a Gmail/SMTP client. You'll need a Gmail address to use it (you can set one up [here](https://accounts.google.com), it's free). I recommend creating a new one (rather than your usual one) since you'll have to modify the account's security settings to allow the Python library to access it by [Turning on less secure apps](https://devanswers.co/allow-less-secure-apps-access-gmail-account/).
#### Python

```python
from knockknock import email_sender

@email_sender(recipient_emails=["<your_email@address.com>", "<grandma's_email@address.com>"], sender_email="<grandma's_email@gmail.com>")
def train_your_nicest_model(your_nicest_parameters):
    import time
    time.sleep(10000)
    return {'loss': 0.9} # Optional return value
```

#### Command-line

```bash
knockknock email \
    --recipient-emails <your_email@address.com>,<grandma's_email@address.com> \
    --sender-email <grandma's_email@gmail.com> \
    sleep 10
```

If `sender_email` is not specified, then the first email in `recipient_emails` will be used as the sender's email.

Note that launching this will ask you for the sender's email password. It will be safely stored in the system keyring service through the [`keyring` Python library](https://pypi.org/project/keyring/).

### Slack

Similarly, you can also use Slack to get notifications. You'll have to get your Slack room [webhook URL](https://api.slack.com/incoming-webhooks#create_a_webhook) and optionally your [user id](https://api.slack.com/methods/users.identity) (if you want to tag yourself or someone else).

#### Python

```python
from knockknock import slack_sender

webhook_url = "<webhook_url_to_your_slack_room>"
@slack_sender(webhook_url=webhook_url, channel="<your_favorite_slack_channel>")
def train_your_nicest_model(your_nicest_parameters):
    import time
    time.sleep(10000)
    return {'loss': 0.9} # Optional return value
```

You can also specify an optional argument to tag specific people: `user_mentions=[<your_slack_id>, <grandma's_slack_id>]`.

#### Command-line

```bash
knockknock slack \
    --webhook-url <webhook_url_to_your_slack_room> \
    --channel <your_favorite_slack_channel> \
    sleep 10
```

You can also specify an optional argument to tag specific people: `--user-mentions <your_slack_id>,<grandma's_slack_id>`.

### Telegram

You can also use Telegram Messenger to get notifications.
You'll first have to create your own notification bot by following the three steps provided by Telegram [here](https://core.telegram.org/bots#6-botfather) and save your API access `TOKEN`. Telegram bots are shy and can't send the first message so you'll have to do the first step. By sending the first message, you'll be able to get the `chat_id` required (identification of your messaging room) by visiting `https://api.telegram.org/bot<YourBOTToken>/getUpdates` and get the `int` under the key `message['chat']['id']`. #### Python ```python from knockknock import telegram_sender CHAT_ID: int = <your_messaging_room_id> @telegram_sender(token="<your_api_token>", chat_id=CHAT_ID) def train_your_nicest_model(your_nicest_parameters): import time time.sleep(10000) return {'loss': 0.9} # Optional return value ``` #### Command-line ```bash knockknock telegram \ --token <your_api_token> \ --chat-id <your_messaging_room_id> \ sleep 10 ``` ### Microsoft Teams Thanks to [@noklam](https://github.com/noklam), you can also use Microsoft Teams to get notifications. You'll have to get your Team Channel [webhook URL](https://docs.microsoft.com/en-us/microsoftteams/platform/concepts/connectors/connectors-using). #### Python ```python from knockknock import teams_sender @teams_sender(token="<webhook_url_to_your_teams_channel>") def train_your_nicest_model(your_nicest_parameters): import time time.sleep(10) return {'loss': 0.9} # Optional return value ``` #### Command-line ```bash knockknock teams \ --webhook-url <webhook_url_to_your_teams_channel> \ sleep 10 ``` You can also specify an optional argument to tag specific people: `user_mentions=[<your_teams_id>, <grandma's_teams_id>]`. ### Text Message (SMS) Thanks to [@abhishekkrthakur](https://github.com/abhishekkrthakur), you can use Twilio to send text message notifications. 
You'll have to set up a [Twilio](https://www.twilio.com) account [here](https://www.twilio.com/try-twilio), which is a paid service with competitive prices: for instance in the US, getting a new number and sending one text message through this service cost $1.00 and $0.0075, respectively. You'll need to get (a) a phone number, (b) your [account SID](https://www.twilio.com/docs/glossary/what-is-a-sid) and (c) your [authentication token](https://www.twilio.com/docs/iam/access-tokens). Some detail [here](https://www.twilio.com/docs/iam/api/account).

#### Python

```python
from knockknock import sms_sender

ACCOUNT_SID: str = "<your_account_sid>"
AUTH_TOKEN: str = "<your_auth_token>"
@sms_sender(account_sid=ACCOUNT_SID, auth_token=AUTH_TOKEN, recipient_number="<recipient's_number>", sender_number="<sender's_number>")
def train_your_nicest_model(your_nicest_parameters):
    import time
    time.sleep(10)
    return {'loss': 0.9} # Optional return value
```

#### Command-line

```bash
knockknock sms \
    --account-sid <your_account_sid> \
    --auth-token <your_account_auth_token> \
    --recipient-number <recipient_number> \
    --sender-number <sender_number> \
    sleep 10
```

### Discord

Thanks to [@watkinsm](https://github.com/watkinsm), you can also use Discord to get notifications. You'll just have to get your Discord channel's [webhook URL](https://support.discordapp.com/hc/en-us/articles/228383668-Intro-to-Webhooks).

#### Python

```python
from knockknock import discord_sender

webhook_url = "<webhook_url_to_your_discord_channel>"
@discord_sender(webhook_url=webhook_url)
def train_your_nicest_model(your_nicest_parameters):
    import time
    time.sleep(10000)
    return {'loss': 0.9} # Optional return value
```

#### Command-line

```bash
knockknock discord \
    --webhook-url <webhook_url_to_your_discord_channel> \
    sleep 10
```

### Desktop Notification

You can also get notified from a desktop notification. It is currently only available for macOS, Linux and Windows 10.
For Linux it uses the `notify-send` command, which relies on libnotify. In order to use libnotify, you have to install a notification server. Cinnamon, Deepin, Enlightenment, GNOME, GNOME Flashback and KDE Plasma use their own implementations to display notifications. In other desktop environments, the notification server needs to be launched using your WM's/DE's "autostart" option.

#### Python

```python
from knockknock import desktop_sender

@desktop_sender(title="Knockknock Desktop Notifier")
def train_your_nicest_model(your_nicest_parameters):
    import time
    time.sleep(10000)
    return {"loss": 0.9}
```

#### Command Line

```bash
knockknock desktop \
    --title 'Knockknock Desktop Notifier' \
    sleep 2
```

### Matrix

Thanks to [@jcklie](https://github.com/jcklie), you can send notifications via [Matrix](https://matrix.org/). The homeserver is the server on which the user that will send messages is registered. Do not forget the scheme of the URL (`http` or `https`). You'll have to get the access token for a bot or your own user. The easiest way to obtain it is in the Riot settings: under `Help & About`, at the bottom, is `Access Token:<click to reveal>`. You also need to specify a room alias to which messages are sent. To obtain the alias in Riot, create a room you want to use, then open the room settings under `Room Addresses` and add an alias.

#### Python

```python
from knockknock import matrix_sender

HOMESERVER = "<url_to_your_home_server>" # e.g. https://matrix.org
TOKEN = "<your_auth_token>" # e.g. WiTyGizlr8ntvBXdFfZLctyY
ROOM = "<room_alias>" # e.g. #knockknock:matrix.org

@matrix_sender(homeserver=HOMESERVER, token=TOKEN, room=ROOM)
def train_your_nicest_model(your_nicest_parameters):
    import time
    time.sleep(10000)
    return {'loss': 0.9} # Optional return value
```

#### Command-line

```bash
knockknock matrix \
    --homeserver <homeserver> \
    --token <token> \
    --room <room> \
    sleep 10
```

### Amazon Chime

Thanks to [@prabhakar267](https://github.com/prabhakar267), you can also use Amazon Chime to get notifications. You'll have to get your Chime room [webhook URL](https://docs.aws.amazon.com/chime/latest/dg/webhooks.html).

#### Python

```python
from knockknock import chime_sender

@chime_sender(webhook_url="<webhook_url_to_your_chime_room>")
def train_your_nicest_model(your_nicest_parameters):
    import time
    time.sleep(10)
    return {'loss': 0.9} # Optional return value
```

#### Command-line

```bash
knockknock chime \
    --webhook-url <webhook_url_to_your_chime_room> \
    sleep 10
```

You can also specify an optional argument to tag specific people: `user_mentions=[<your_alias>, <grandma's_alias>]`.

### DingTalk

DingTalk is now supported thanks to [@wuutiing](https://github.com/wuutiing). Given a DingTalk chatroom robot's webhook URL and secret/keywords (at least one of them is set when creating a chatroom robot), your notifications will reach anyone in that chatroom.

#### Python

```python
from knockknock import dingtalk_sender

webhook_url = "<webhook_url_to_your_dingtalk_chatroom_robot>"
@dingtalk_sender(webhook_url=webhook_url, secret="<your_robot_secret_if_set>", keywords=["<list_of_keywords_if_set>"])
def train_your_nicest_model(your_nicest_parameters):
    import time
    time.sleep(10000)
    return {'loss': 0.9} # Optional return value
```

#### Command-line

```bash
knockknock dingtalk \
    --webhook-url <webhook_url_to_your_dingtalk_chatroom_robot> \
    --secret <your_robot_secret_if_set> \
    sleep 10
```

You can also specify an optional argument to @-mention specific people: `user_mentions=["<list_of_phonenumbers_who_you_want_to_tag>"]`.
### RocketChat You can use [RocketChat](https://rocket.chat/) to get notifications. You'll need the following before you can post notifications: - a RocketChat server e.g. rocketchat.yourcompany.com - a RocketChat user id (you'll be able to view your user id when you create a personal access token in the next step) - a RocketChat personal access token ([create one as per this guide](https://rocket.chat/docs/developer-guides/rest-api/personal-access-tokens/)) - a RocketChat channel #### Python ```python from knockknock import rocketchat_sender @rocketchat_sender( rocketchat_server_url="<url_to_your_rocketchat_server>", rocketchat_user_id="<your_rocketchat_user_id>", rocketchat_auth_token="<your_rocketchat_auth_token>", channel="<channel_name>") def train_your_nicest_model(your_nicest_parameters): import time time.sleep(10000) return {'loss': 0.9} # Optional return value ``` You can also specify two optional arguments: - to tag specific users: `user_mentions=[<your_user_name>, <grandma's_user_name>]` - to use an alias for the notification: `alias="My Alias"` #### Command-line ```bash knockknock rocketchat \ --rocketchat-server-url <url_to_your_rocketchat_server> \ --rocketchat-user-id <your_rocketchat_user_id> \ --rocketchat-auth-token <your_rocketchat_auth_token> \ --channel <channel_name> \ sleep 10 ``` ### WeChat Work WeChat Work is now supported thanks to [@jcyk](https://github.com/jcyk). Given WeChat Work chatroom robot's webhook url, your notifications will be sent to reach anyone in that chatroom. 
#### Python ```python from knockknock import wechat_sender webhook_url = "<webhook_url_to_your_wechat_work_chatroom_robot>" @wechat_sender(webhook_url=webhook_url) def train_your_nicest_model(your_nicest_parameters): import time time.sleep(10000) return {'loss': 0.9} # Optional return value ``` #### Command-line ```bash knockknock wechat \ --webhook-url <webhook_url_to_your_wechat_work_chatroom_robot> \ sleep 10 ``` You can also specify an optional argument to tag specific people: `user-mentions=["<list_of_userids_you_want_to_tag>"]` and/or `user-mentions-mobile=["<list_of_phonenumbers_you_want_to_tag>"]`. ## Note on distributed training When using distributed training, a GPU is bound to its process using the local rank variable. Since knockknock works at the process level, if you are using 8 GPUs, you would get 8 notifications at the beginning and 8 notifications at the end... To circumvent that, except for errors, only the master process is allowed to send notifications so that you receive only one notification at the beginning and one notification at the end. **Note:** _In PyTorch, the launch of `torch.distributed.launch` sets up a RANK environment variable for each process (see [here](https://github.com/pytorch/pytorch/blob/master/torch/distributed/launch.py#L211)). This is used to detect the master process, and for now, the only simple way I came up with. Unfortunately, this is not intended to be general for all platforms but I would happily discuss smarter/better ways to handle distributed training in an issue/PR._
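As a rough illustration of the master-process check described in the note on distributed training, the sketch below reads the `RANK` environment variable and treats rank 0 (or a missing variable) as the process allowed to send start and end notifications. The helper name `is_master_process` and its `environ` parameter are hypothetical and introduced only for this example; this is not knockknock's actual code.

```python
import os


def is_master_process(environ=None):
    """Return True if this process should send start/end notifications.

    Hypothetical helper: assumes the launcher exports RANK for each process,
    as the note above says torch.distributed.launch does.
    """
    env = os.environ if environ is None else environ
    rank = env.get("RANK")
    if rank is None:
        # No RANK variable: assume an ordinary single-process run.
        return True
    # Only the process with rank 0 is treated as the master.
    return int(rank) == 0


if __name__ == "__main__":
    if is_master_process():
        print("this process would send the notification")
```

Passing a plain dictionary as `environ` makes the check easy to try without launching a distributed job, for example `is_master_process({"RANK": "3"})`.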

ML Frameworks
2.8K Github Stars
setfit
Open Source

setfit

Efficient few-shot learning with Sentence Transformers

AI & Machine Learning ML Frameworks
2.7K Github Stars
transfer-learning-conv-ai
Open Source

transfer-learning-conv-ai

# 🦄 Building a State-of-the-Art Conversational AI with Transfer Learning The present repo contains the code accompanying the blog post [🦄 How to build a State-of-the-Art Conversational AI with Transfer Learning](https://medium.com/@Thomwolf/how-to-build-a-state-of-the-art-conversational-ai-with-transfer-learning-2d818ac26313). This code is a clean and commented code base with training and testing scripts that can be used to train a dialog agent leveraging transfer Learning from an OpenAI GPT and GPT-2 Transformer language model. This codebase can be used to reproduce the results of HuggingFace's participation to NeurIPS 2018 dialog competition [ConvAI2](http://convai.io/) which was state-of-the-art on the automatic metrics. The 3k+ lines of competition code was distilled in about 250 lines of training code with distributed & FP16 options to form the present repository. This model can be trained in about one hour on a 8 V100 cloud instance (currently costs about $25) and a pre-trained model is also made available. ## Installation To install and use the training and inference scripts please clone the repo and install the requirements: ```bash git clone https://github.com/huggingface/transfer-learning-conv-ai cd transfer-learning-conv-ai pip install -r requirements.txt python -m spacy download en ``` ## Installation with Docker To install using docker please build the self-contained image: ```bash docker build -t convai . ``` _Note: Make sure your Docker setup allocates enough memory to building the container. 
Building with the default of 1.75GB will fail due to the large PyTorch wheel._ You can then enter the container: ```bash ip-192-168-22-157:transfer-learning-conv-ai loretoparisi$ docker run --rm -it convai bash root@91e241bb823e:/# ls Dockerfile README.md boot dev home lib media models proc root sbin sys train.py utils.py LICENCE bin convai_evaluation.py etc interact.py lib64 mnt opt requirements.txt run srv tmp usr var ``` You can then run the `interact.py` script on the pretrained model: ```bash python3 interact.py --model models/ ``` ## Pretrained model We make a pretrained and fine-tuned model available on our S3 [here](https://s3.amazonaws.com/models.huggingface.co/transfer-learning-chatbot/finetuned_chatbot_gpt.tar.gz). The easiest way to download and use this model is just to run the `interact.py` script to talk with the model. Without any argument, this script will automatically download and cache our model. ## Using the training script The training script can be used in single GPU or multi GPU settings: ```bash python ./train.py # Single GPU training python -m torch.distributed.launch --nproc_per_node=8 ./train.py # Training on 8 GPUs ``` The training script accepts several arguments to tweak the training: Argument | Type | Default value | Description ---------|------|---------------|------------ dataset_path | `str` | `""` | Path or url of the dataset. If empty, download from S3.
dataset_cache | `str` | `'./dataset_cache.bin'` | Path or url of the dataset cache model | `str` | `"openai-gpt"` | Path, url or short name of the model num_candidates | `int` | `2` | Number of candidates for training max_history | `int` | `2` | Number of previous exchanges to keep in history train_batch_size | `int` | `4` | Batch size for training valid_batch_size | `int` | `4` | Batch size for validation gradient_accumulation_steps | `int` | `8` | Accumulate gradients on several steps lr | `float` | `6.25e-5` | Learning rate lm_coef | `float` | `1.0` | LM loss coefficient mc_coef | `float` | `1.0` | Multiple-choice loss coefficient max_norm | `float` | `1.0` | Clipping gradient norm n_epochs | `int` | `3` | Number of training epochs personality_permutations | `int` | `1` | Number of permutations of personality sentences device | `str` | `"cuda" if torch.cuda.is_available() else "cpu"` | Device (cuda or cpu) fp16 | `str` | `""` | Set to O0, O1, O2 or O3 for fp16 training (see apex documentation) local_rank | `int` | `-1` | Local rank for distributed training (-1: not distributed) Here is how to reproduce our results on a server with 8 V100 GPUs (adapt number of nodes and batch sizes to your configuration): ```bash python -m torch.distributed.launch --nproc_per_node=8 ./train.py --gradient_accumulation_steps=4 --lm_coef=2.0 --max_history=2 --n_epochs=1 --num_candidates=4 --personality_permutations=2 --train_batch_size=2 --valid_batch_size=2 ``` This model should give a Hits@1 over 79, perplexity of 20.5 and F1 of 16.5 using the convai2 evaluation script (see below). These numbers are slightly lower than the numbers we obtained in the ConvAI2 competition. Here is what you can tweak to reach the same results: - in the ConvAI2 competition we also used tweaked position embeddings so that the history of the dialog always starts with the same embeddings. This is easy to add with pytorch-transformers and should improve the hits@1 metric.
- in the ConvAI2 competition we used a beam search decoder. While the results are better in terms of the f1 metric, our feeling is that the human experience is less compelling with beam search versus the nucleus sampling decoder which is provided in the present repository. ## Using the interaction script The training script saves all the experiments and checkpoints in a sub-folder named with the timestamp of the experiment in the `./runs` folder of the repository base folder. You can then use the interactive script to interact with the model simply by pointing to this folder. Here is an example command line to run the interactive script: ```bash python ./interact.py --model_checkpoint ./data/Apr17_13-31-38_thunder/ # run the interactive script with a training checkpoint python ./interact.py # run the interactive script with the finetuned model on our S3 ``` The fine-tuned model gives FINAL Hits@1: 0.715. The interactive script accepts a few arguments to tweak the decoding algorithm: Argument | Type | Default value | Description ---------|------|---------------|------------ dataset_path | `str` | `""` | Path or url of the dataset. If empty, download from S3.
dataset_cache | `str` | `'./dataset_cache.bin'` | Path or url of the dataset cache model | `str` | `"openai-gpt"` | Path, url or short name of the model max_history | `int` | `2` | Number of previous utterances to keep in history device | `str` | `cuda` if `torch.cuda.is_available()` else `cpu` | Device (cuda or cpu) no_sample | action `store_true` | Set to use greedy decoding instead of sampling max_length | `int` | `20` | Maximum length of the output utterances min_length | `int` | `1` | Minimum length of the output utterances seed | `int` | `42` | Seed temperature | `float` | `0.7` | Sampling softmax temperature top_k | `int` | `0` | Filter top-k tokens before sampling (`<=0`: no filtering) top_p | `float` | `0.9` | Nucleus filtering (top-p) before sampling (`<=0.0`: no filtering) ## Running ConvAI2 evaluation scripts To run the evaluation scripts of the ConvAI2 challenge, you first need to install `ParlAI` in the repo base folder like this: ```bash git clone https://github.com/facebookresearch/ParlAI.git cd ParlAI python setup.py develop ``` You can then run the evaluation script from the `ParlAI` base folder: ```bash cd ParlAI python ../convai_evaluation.py --eval_type hits@1 # to download and evaluate our fine-tuned model on hits@1 metric python ../convai_evaluation.py --eval_type hits@1 --model_checkpoint ./data/Apr17_13-31-38_thunder/ # to evaluate a training checkpoint on hits@1 metric ``` The evaluation script accepts a few arguments to select the evaluation metric and tweak the decoding algorithm: Argument | Type | Default value | Description ---------|------|---------------|------------ eval_type | `str` | `"hits@1"` | Evaluate the model on `hits@1`, `ppl` or `f1` metric on the ConvAI2 validation dataset model | `str` | `"openai-gpt"` | Path, url or short name of the model max_history | `int` | `2` | Number of previous utterances to keep in history device | `str` | `cuda` if `torch.cuda.is_available()` else `cpu` | Device (cuda or cpu) no_sample | action
`store_true` | Set to use greedy decoding instead of sampling max_length | `int` | `20` | Maximum length of the output utterances min_length | `int` | `1` | Minimum length of the output utterances seed | `int` | `42` | Seed temperature | `float` | `0.7` | Sampling softmax temperature top_k | `int` | `0` | Filter top-k tokens before sampling (`<=0`: no filtering) top_p | `float` | `0.9` | Nucleus filtering (top-p) before sampling (`<=0.0`: no filtering) ## Data Format See `example_entry.py`, and the comment at the top. ## Citation If you use this code in your research, you can cite our NeurIPS CAI workshop [paper](http://arxiv.org/abs/1901.08149): ```bibtex @article{DBLP:journals/corr/abs-1901-08149, author = {Thomas Wolf and Victor Sanh and Julien Chaumond and Clement Delangue}, title = {TransferTransfo: {A} Transfer Learning Approach for Neural Network Based Conversational Agents}, journal = {CoRR}, volume = {abs/1901.08149}, year = {2019}, url = {http://arxiv.org/abs/1901.08149}, archivePrefix = {arXiv}, eprint = {1901.08149}, timestamp = {Sat, 02 Feb 2019 16:56:00 +0100}, biburl = {https://dblp.org/rec/bib/journals/corr/abs-1901-08149}, bibsource = {dblp computer science bibliography, https://dblp.org} } ```
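To make the `top_k` and `top_p` decoding arguments listed in the tables above more concrete, here is a minimal pure-Python sketch of top-k and nucleus (top-p) filtering over a list of logits. The function name `filter_top_k_top_p` is hypothetical, and the repository's own scripts may implement the filtering differently; this only illustrates the idea under the documented conventions (`top_k <= 0` and `top_p <= 0.0` mean no filtering).

```python
import math


def filter_top_k_top_p(logits, top_k=0, top_p=0.9):
    """Return a copy of `logits` where filtered-out entries are set to -inf."""
    filtered = list(logits)
    if not filtered:
        return filtered
    if top_k > 0:
        # Keep only the top_k largest logits (ties with the k-th value are kept).
        kth = sorted(filtered, reverse=True)[min(top_k, len(filtered)) - 1]
        filtered = [x if x >= kth else -math.inf for x in filtered]
    if top_p > 0.0:
        # Softmax over the remaining logits, shifted by the max for stability.
        peak = max(filtered)
        exps = [math.exp(x - peak) for x in filtered]
        total = sum(exps)
        probs = [e / total for e in exps]
        # Keep the smallest set of tokens whose cumulative probability reaches top_p.
        order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
        keep, cumulative = set(), 0.0
        for i in order:
            keep.add(i)
            cumulative += probs[i]
            if cumulative >= top_p:
                break
        filtered = [x if i in keep else -math.inf for i, x in enumerate(filtered)]
    return filtered


if __name__ == "__main__":
    print(filter_top_k_top_p([2.0, 1.0, 0.0, -1.0], top_k=0, top_p=0.9))
```

Sampling would then draw from the softmax of the filtered logits, typically after dividing them by the `temperature` value.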

AI Agents ML Frameworks Live Chat & Chatbots
1.8K Github Stars
hmtl
Open Source

hmtl

# HMTL (Hierarchical Multi-Task Learning model) **\*\*\*\*\* New November 20th, 2018: Online web demo is available \*\*\*\*\*** We released an [online demo](https://huggingface.co/hmtl/) (along with pre-trained weights) so that you can play with the model yourself. The code for the web interface is also available in the `demo` folder. To download the pre-trained models, please install [git lfs](https://git-lfs.github.com/) and do a `git lfs pull`. The weights of the model will be saved in the `model_dumps` folder. [__A Hierarchical Multi-Task Approach for Learning Embeddings from Semantic Tasks__](https://arxiv.org/abs/1811.06031)\ Victor SANH, Thomas WOLF, Sebastian RUDER\ Accepted at AAAI 2019 <img src="https://github.com/huggingface/hmtl/blob/master/HMTL_architecture.png" alt="HMTL Architecture" width="350"/> ## About HMTL is a Hierarchical Multi-Task Learning model which combines a set of four carefully selected semantic tasks (namely Named Entity Recognition, Entity Mention Detection, Relation Extraction and Coreference Resolution). The model achieves state-of-the-art results on Named Entity Recognition, Entity Mention Detection and Relation Extraction. Using [SentEval](https://github.com/facebookresearch/SentEval), we show that as we move from the bottom to the top layers of the model, the model tends to learn more complex semantic representations. For further details on the results, please refer to our [paper](https://arxiv.org/abs/1811.06031). We released the code for _training_, _fine tuning_ and _evaluating_ HMTL. We hope that this code will be useful for building your own Multi-Task models (hierarchical or not). The code is written in __Python__ and powered by __PyTorch__. ## Dependencies and installation The main dependencies are: - [AllenNLP](https://github.com/allenai/allennlp) - [PyTorch](https://pytorch.org/) - [SentEval](https://github.com/facebookresearch/SentEval) (only for evaluating the embeddings) The code works with __Python 3.6__.
A stable version of the dependencies is listed in `requirements.txt`. You can quickly set up a working environment by calling the script `./script/machine_setup.sh`. It installs Python 3.6, creates a clean virtual environment, and installs all the required dependencies (listed in `requirements.txt`). Please adapt the script depending on your needs. ## Example usage We based our implementation on the [AllenNLP library](https://github.com/allenai/allennlp). For an introduction to this library, you should check [these tutorials](https://allennlp.org/tutorials). An experiment is defined in a _json_ configuration file (see `configs/*.json` for examples). The configuration file mainly describes the datasets to load and the model to create, along with all the hyper-parameters of the model. Once you have set up your configuration file (and defined custom classes such as `DatasetReaders` if needed), you can simply launch a training with the following command and arguments: ```bash python train.py --config_file_path configs/hmtl_coref_conll.json --serialization_dir my_first_training ``` Once the training has started, you can simply follow the training in the terminal or open a [Tensorboard](https://www.tensorflow.org/guide/summaries_and_tensorboard) (please make sure you have installed Tensorboard and its Tensorflow dependency before): ```bash tensorboard --logdir my_first_training/log ``` ## Evaluating the embeddings with SentEval We used [SentEval](https://github.com/facebookresearch/SentEval) to assess the linguistic properties learned by the model. `hmtl_senteval.py` gives an example of how we can create an interface between SentEval and HMTL. It evaluates the linguistic properties learned by every layer of the hierarchy (shared based word embeddings and encoders). ## Data To download the pre-trained embeddings we used in HMTL, you can simply launch the script `./script/data_setup.sh`.
We did not attach the datasets used to train HMTL for licensing reasons, but we invite you to collect them by yourself: [OntoNotes 5.0](https://catalog.ldc.upenn.edu/LDC2013T19), [CoNLL2003](https://www.clips.uantwerpen.be/conll2003/ner/), and [ACE2005](https://catalog.ldc.upenn.edu/LDC2006T06). The configuration files expect the datasets to be placed in the `data/` folder. ## References Please consider citing the following paper if you find this repository useful. ``` @article{sanh2018hmtl, title={A Hierarchical Multi-task Approach for Learning Embeddings from Semantic Tasks}, author={Sanh, Victor and Wolf, Thomas and Ruder, Sebastian}, journal={arXiv preprint arXiv:1811.06031}, year={2018} } ```

AI & Machine Learning ML Frameworks
1.2K Github Stars
dataset-viewer
Open Source

dataset-viewer

# Dataset viewer The dataset page includes a table with the dataset's contents, arranged by pages of 100 rows. You can navigate between pages using the buttons at the bottom of the table, filter, search, look at basic statistics, and more. <img width="1015" alt="screenshot of the dataset viewer, for the dataset 'AI-MO/NuminaMath-CoT' hosted on Hugging Face" src="https://github.com/user-attachments/assets/f4299ba4-8582-4b64-b3f5-eb1feb7b9731"> This repository is the backend that provides the dataset viewer with pre-computed data through an API, for all the datasets on the Hub. The frontend viewer component is not part of this repository and is not open-source, as the rest of the Hub. Documentation: - dataset viewer: https://huggingface.co/docs/hub/datasets-viewer - configuration of the datasets: https://huggingface.co/docs/hub/datasets-data-files-configuration - backend's API: https://huggingface.co/docs/dataset-viewer ## You saw a bug 🪲 or want a new feature 🎁 If the dataset viewer is showing an error on your dataset page, please [open a discussion](https://huggingface.co/docs/hub/repositories-pull-requests-discussions) there, it's the most efficient way to fix it. Tag [`@lhoestq`](https://huggingface.co/lhoestq) in the discussion to reach the team directly. If you identify a bigger error and think the dataset viewer has a bug, or if you want to ask for a new feature, please [open a new issue](https://github.com/huggingface/dataset-viewer/issues/new) here. ## Contribute 🤝 You can help by giving ideas, answering questions, reporting bugs, proposing enhancements, improving the documentation, and fixing bugs. See [CONTRIBUTING.md](./CONTRIBUTING.md) for more details. To install this backend and start contributing to the code, see [DEVELOPER_GUIDE.md](./DEVELOPER_GUIDE.md) ## Community 🤗 You can star and watch this [GitHub repository](https://github.com/huggingface/dataset-viewer) to follow the updates. 
You can ask for help or answer questions on the [Forum](https://discuss.huggingface.co/c/datasets/10) and [Discord](https://discord.com/channels/879548962464493619/1019883044724822016). You can also report bugs and propose enhancements on the code, or the documentation, in the [GitHub issues](https://github.com/huggingface/dataset-viewer/issues).

AI Agents API Tools
869 Github Stars
instruction-tuned-sd
Open Source

instruction-tuned-sd

# Instruction-tuning Stable Diffusion **TL;DR**: Motivated partly by [FLAN](https://arxiv.org/abs/2109.01652) and partly by [InstructPix2Pix](https://arxiv.org/abs/2211.09800), we explore a way to instruction-tune [Stable Diffusion](https://stability.ai/blog/stable-diffusion-public-release). This allows us to prompt our model using an input image and an “instruction”, such as - *Apply a cartoon filter to the natural image*. You can read [our blog post](https://hf.co/blog/instruction-tuning-sd) to know more details. ## Table of contents 🐶 [Motivation](#motivation) <br> 📷 [Data preparation](#data-preparation) <br> 💺 [Training](#training) <br> 🎛 [Models, datasets, demo](#models-datasets-demo) <br> ⭐️ [Inference](#inference) <br> 🧭 [Results](#results) <br> 🤝 [Acknowledgements](#acknowledgements) <br> ## Motivation Instruction-tuning is a supervised way of teaching language models to follow instructions to solve a task. It was introduced in [Fine-tuned Language Models Are Zero-Shot Learners](https://arxiv.org/abs/2109.01652) (FLAN) by Google. From recent times, you might recall works like [Alpaca](https://crfm.stanford.edu/2023/03/13/alpaca.html) and [FLAN V2](https://arxiv.org/abs/2210.11416), which are good examples of how beneficial instruction-tuning can be for various tasks. On the other hand, the idea of teaching Stable Diffusion to follow user instructions to perform edits on input images was introduced in [InstructPix2Pix: Learning to Follow Image Editing Instructions](https://arxiv.org/abs/2211.09800). Our motivation behind this work comes partly from the FLAN line of works and partly from InstructPix2Pix. We wanted to explore if it’s possible to prompt Stable Diffusion with specific instructions and input images to process them as per our needs. 
<p align="center"> <img src="https://huggingface.co/datasets/sayakpaul/sample-datasets/resolve/main/instruction-tuning-sd.png" width=600/> </p> Our main idea is to first create an instruction-prompted dataset (as described in [our blog](https://hf.co/blog/instruction-tuning-sd)) and then conduct InstructPix2Pix-style training. The end objective is to make Stable Diffusion better at following specific instructions that entail image transformation related operations. ## Data preparation Our data preparation process is inspired by FLAN. Refer to the sections below for more details. * **Cartoonization**: Refer to the `data_preparation` directory. * **Low-level image processing**: Refer to the [dataset card](https://huggingface.co/datasets/instruction-tuning-sd/low-level-image-proc). ## Training > [!TIP] > When using custom datasets, you can configure the dataset however you like, as long as you maintain the format presented here. You might have to configure your dataloader and dataset class in case you don't want to make use of the `datasets` library. If you do so, you might have to adjust the training scripts accordingly. ### Dev env setup We recommend using a Python virtual environment for this. Feel free to use your favorite one here. We conducted our experiments with PyTorch 1.13.1 (CUDA 11.6) and a single A100 GPU. Since PyTorch installation can be hardware-dependent, we refer you to the [official docs](https://pytorch.org/) for installing PyTorch. Once PyTorch is installed, we can install the rest of the dependencies: ```bash pip install -r requirements.txt ``` Additionally, we recommend installing [xformers](https://github.com/facebookresearch/xformers) as well for enabling memory-efficient training. > 💡 **Note**: If you're using PyTorch 2.0 then you don't need to additionally install xformers. This is because we default to a memory-efficient attention processor in Diffusers when PyTorch 2.0 is being used.
### Launching training Our training code leverages [🧨 diffusers](https://github.com/huggingface/diffusers), [🤗 accelerate](https://github.com/huggingface/accelerate), and [🤗 transformers](https://github.com/huggingface/transformers). In particular, we extend [this training example](https://github.com/huggingface/diffusers/blob/main/examples/instruct_pix2pix/train_instruct_pix2pix.py) to fit our needs. ### Cartoonization #### Training from scratch using the InstructPix2Pix methodology ```bash export MODEL_ID="runwayml/stable-diffusion-v1-5" export DATASET_ID="instruction-tuning-sd/cartoonization" export OUTPUT_DIR="cartoonization-scratch" accelerate launch --mixed_precision="fp16" train_instruct_pix2pix.py \ --pretrained_model_name_or_path=$MODEL_ID \ --dataset_name=$DATASET_ID \ --use_ema \ --enable_xformers_memory_efficient_attention \ --resolution=256 --random_flip \ --train_batch_size=2 --gradient_accumulation_steps=4 --gradient_checkpointing \ --max_train_steps=15000 \ --checkpointing_steps=5000 --checkpoints_total_limit=1 \ --learning_rate=5e-05 --lr_warmup_steps=0 \ --mixed_precision=fp16 \ --val_image_url="https://hf.co/datasets/diffusers/diffusers-images-docs/resolve/main/mountain.png" \ --validation_prompt="Generate a cartoonized version of the natural image" \ --seed=42 \ --output_dir=$OUTPUT_DIR \ --report_to=wandb \ --push_to_hub ``` > 💡 **Note**: Following InstructPix2Pix, we train on the 256x256 resolution and that doesn't seem to affect the end quality too much when we perform inference with the 512x512 resolution. Once the training successfully launched, the logs will be automatically tracked using Weights and Biases. Depending on how you specified the `checkpointing_steps` and the `max_train_steps`, there will be intermediate checkpoints too. At the end of training, you can expect a directory (namely `OUTPUT_DIR`) that contains the intermediate checkpoints and the final pipeline artifacts. 
If `--push_to_hub` is specified, the contents of `OUTPUT_DIR` will be pushed to a repository on the Hugging Face Hub. [Here](https://wandb.ai/sayakpaul/instruction-tuning-sd/runs/wszjpb1b) is an example run page on Weights and Biases. [Here](https://huggingface.co/instruction-tuning-sd/scratch-cartoonizer) is an example of how the pipeline repository would look like on the Hugging Face Hub. #### Fine-tuning from InstructPix2Pix ```bash export MODEL_ID="timbrooks/instruct-pix2pix" export DATASET_ID="instruction-tuning-sd/cartoonization" export OUTPUT_DIR="cartoonization-finetuned" accelerate launch --mixed_precision="fp16" finetune_instruct_pix2pix.py \ --pretrained_model_name_or_path=$MODEL_ID \ --dataset_name=$DATASET_ID \ --use_ema \ --enable_xformers_memory_efficient_attention \ --resolution=256 --random_flip \ --train_batch_size=2 --gradient_accumulation_steps=4 --gradient_checkpointing \ --max_train_steps=15000 \ --checkpointing_steps=5000 --checkpoints_total_limit=1 \ --learning_rate=5e-05 --lr_warmup_steps=0 \ --mixed_precision=fp16 \ --val_image_url="https://hf.co/datasets/diffusers/diffusers-images-docs/resolve/main/mountain.png" \ --validation_prompt="Generate a cartoonized version of the natural image" \ --seed=42 \ --output_dir=$OUTPUT_DIR \ --report_to=wandb \ --push_to_hub ``` ### Low-level image processing #### Training from scratch using the InstructPix2Pix methodology ```bash export MODEL_ID="runwayml/stable-diffusion-v1-5" export DATASET_ID="instruction-tuning-sd/low-level-image-proc" export OUTPUT_DIR="low-level-img-proc-scratch" accelerate launch --mixed_precision="fp16" train_instruct_pix2pix.py \ --pretrained_model_name_or_path=$MODEL_ID \ --dataset_name=$DATASET_ID \ --original_image_column="input_image" \ --edit_prompt_column="instruction" \ --edited_image_column="ground_truth_image" \ --use_ema \ --enable_xformers_memory_efficient_attention \ --resolution=256 --random_flip \ --train_batch_size=2 --gradient_accumulation_steps=4 
--gradient_checkpointing \ --max_train_steps=15000 \ --checkpointing_steps=5000 --checkpoints_total_limit=1 \ --learning_rate=5e-05 --lr_warmup_steps=0 \ --mixed_precision=fp16 \ --val_image_url="https://hf.co/datasets/sayakpaul/sample-datasets/resolve/main/derain_the_image_1.png" \ --validation_prompt="Derain the image" \ --seed=42 \ --output_dir=$OUTPUT_DIR \ --report_to=wandb \ --push_to_hub ``` #### Fine-tuning from InstructPix2Pix ```bash export MODEL_ID="timbrooks/instruct-pix2pix" export DATASET_ID="instruction-tuning-sd/low-level-image-proc" export OUTPUT_DIR="low-level-img-proc-finetuned" accelerate launch --mixed_precision="fp16" finetune_instruct_pix2pix.py \ --pretrained_model_name_or_path=$MODEL_ID \ --dataset_name=$DATASET_ID \ --original_image_column="input_image" \ --edit_prompt_column="instruction" \ --edited_image_column="ground_truth_image" \ --use_ema \ --enable_xformers_memory_efficient_attention \ --resolution=256 --random_flip \ --train_batch_size=2 --gradient_accumulation_steps=4 --gradient_checkpointing \ --max_train_steps=15000 \ --checkpointing_steps=5000 --checkpoints_total_limit=1 \ --learning_rate=5e-05 --lr_warmup_steps=0 \ --mixed_precision=fp16 \ --val_image_url="https://hf.co/datasets/sayakpaul/sample-datasets/resolve/main/derain_the_image_1.png" \ --validation_prompt="Derain the image" \ --seed=42 \ --output_dir=$OUTPUT_DIR \ --report_to=wandb \ --push_to_hub ``` ## Models, datasets, demo ### **Models**: * [instruction-tuning-sd/scratch-low-level-img-proc](https://huggingface.co/instruction-tuning-sd/scratch-low-level-img-proc) * [instruction-tuning-sd/scratch-cartoonizer](https://huggingface.co/instruction-tuning-sd/scratch-cartoonizer) * [instruction-tuning-sd/cartoonizer](https://huggingface.co/instruction-tuning-sd/cartoonizer) * [instruction-tuning-sd/low-level-img-proc](https://huggingface.co/instruction-tuning-sd/low-level-img-proc) ### **Datasets**: * [Instruction-prompted 
cartoonization](https://huggingface.co/datasets/instruction-tuning-sd/cartoonization) * [Instruction-prompted low-level image processing](https://huggingface.co/datasets/instruction-tuning-sd/low-level-image-proc) ### Demo on 🤗 Spaces Try out the models interactively WITHOUT any setup: [Demo](https://huggingface.co/spaces/instruction-tuning-sd/instruction-tuned-sd) ## Inference ### Cartoonization ```python import torch from diffusers import StableDiffusionInstructPix2PixPipeline from diffusers.utils import load_image model_id = "instruction-tuning-sd/cartoonizer" pipeline = StableDiffusionInstructPix2PixPipeline.from_pretrained( model_id, torch_dtype=torch.float16, use_auth_token=True ).to("cuda") image_path = "https://hf.co/datasets/diffusers/diffusers-images-docs/resolve/main/mountain.png" image = load_image(image_path) image = pipeline("Cartoonize the following image", image=image).images[0] image.save("image.png") ``` ### Low-level image processing ```python import torch from diffusers import StableDiffusionInstructPix2PixPipeline from diffusers.utils import load_image model_id = "instruction-tuning-sd/low-level-img-proc" pipeline = StableDiffusionInstructPix2PixPipeline.from_pretrained( model_id, torch_dtype=torch.float16, use_auth_token=True ).to("cuda") image_path = "https://hf.co/datasets/sayakpaul/sample-datasets/resolve/main/derain%20the%20image_1.png" image = load_image(image_path) image = pipeline("derain the image", image=image).images[0] image.save("image.png") ``` > 💡 **Note**: Since the above pipelines are essentially of type `StableDiffusionInstructPix2PixPipeline`, you can customize several arguments that the pipeline exposes. Refer to the [official docs](https://huggingface.co/docs/diffusers/main/en/api/pipelines/stable_diffusion/pix2pix) for more details. 
## Results ### Cartoonization <p align="center"> <img src="https://i.imgur.com/wOCjpdI.jpg"/> </p> --- <p align="center"> <img src="https://i.imgur.com/RhTG8Lf.jpg"/> </p> ### Low-level image processing <p align="center"> <img src="https://i.imgur.com/LOhcJLv.jpg"/> </p> --- <p align="center"> <img src="https://i.imgur.com/uhTqIpY.png"/> </p> Refer to our [blog post](https://hf.co/blog/instruction-tuning-sd) for more discussions on results and open questions. ## Acknowledgements Thanks to [Alara Dirik](https://www.linkedin.com/in/alaradirik/) and [Zhengzhong Tu](https://www.linkedin.com/in/zhengzhongtu) for the helpful discussions. ## Citation ```bibtex @article{ Paul2023instruction-tuning-sd, author = {Paul, Sayak}, title = {Instruction-tuning Stable Diffusion with InstructPix2Pix}, journal = {Hugging Face Blog}, year = {2023}, note = {https://huggingface.co/blog/instruction-tuning-sd}, } ```
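As a small worked example of the training commands in the Training section, the sketch below computes the effective batch size implied by `--train_batch_size=2` and `--gradient_accumulation_steps=4`. It assumes a single process (matching the single A100 mentioned in the setup notes) and that `--max_train_steps` counts optimizer updates; both are assumptions, and the helper name `effective_batch_size` is hypothetical.

```python
def effective_batch_size(train_batch_size, gradient_accumulation_steps, num_processes=1):
    """Number of samples contributing to one optimizer update."""
    return train_batch_size * gradient_accumulation_steps * num_processes


# Values taken from the example commands; num_processes=1 is an assumption.
per_update = effective_batch_size(train_batch_size=2, gradient_accumulation_steps=4)
# Assuming max_train_steps=15000 counts optimizer updates.
samples_seen = per_update * 15000

if __name__ == "__main__":
    print(per_update, samples_seen)
```

Under these assumptions, changing the number of GPUs or the accumulation steps changes the effective batch size, so the learning rate may need adjusting to match.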

ML Frameworks Data Labeling
249 Github Stars
agents-course
Open Source

agents-course

# <a href="https://hf.co/learn/agents-course" target="_blank">The Hugging Face Agents Course</a>

If you like the course, **don't hesitate to ⭐ star this repository**. This helps us to **make the course more visible 🤗**.

<img src="https://huggingface.co/datasets/agents-course/course-images/resolve/main/en/communication/please_star.gif" alt="Star the repo" />

## Content

The course is divided into 4 units. These will take you from **the basics of agents to a final assignment with a benchmark**.

Sign up here (it's free) 👉 <a href="https://bit.ly/hf-learn-agents" target="_blank">https://bit.ly/hf-learn-agents</a>

You can access the course here 👉 <a href="https://hf.co/learn/agents-course" target="_blank">https://hf.co/learn/agents-course</a>

| Unit | Topic | Description |
|---------|-------|-------------|
| 0 | [Welcome to the Course](https://huggingface.co/learn/agents-course/en/unit0/introduction) | Welcome, guidelines, necessary tools, and course overview. |
| 1 | [Introduction to Agents](https://huggingface.co/learn/agents-course/en/unit1/introduction) | Definition of agents, LLMs, model family tree, and special tokens. |
| 1 Bonus | [Fine-tuning an LLM for Function-calling](https://huggingface.co/learn/agents-course/bonus-unit1/introduction) | Learn how to fine-tune an LLM for function-calling. |
| 2 | [Frameworks for AI Agents](https://huggingface.co/learn/agents-course/unit2/introduction) | Overview of `smolagents`, `LangGraph` and `LlamaIndex`. |
| 2.1 | [The Smolagents Framework](https://huggingface.co/learn/agents-course/unit2/smolagents/introduction) | Learn how to build effective agents using the `smolagents` library, a lightweight framework for creating capable AI agents. |
| 2.2 | [The LlamaIndex Framework](https://huggingface.co/learn/agents-course/unit2/llama-index/introduction) | Learn how to build LLM-powered agents over your data using indexes and workflows with the `LlamaIndex` toolkit. |
| 2.3 | [The LangGraph Framework](https://huggingface.co/learn/agents-course/unit2/langgraph/introduction) | Learn how to build production-ready applications using the `LangGraph` framework, which gives you tools to control the flow of your agent. |
| 2 Bonus | [Observability and Evaluation](https://huggingface.co/learn/agents-course/bonus-unit2/introduction) | Learn how to trace and evaluate your agents. |
| 3 | [Use Case for Agentic RAG](https://huggingface.co/learn/agents-course/unit3/agentic-rag/introduction) | Learn how to use Agentic RAG to help agents respond to different use cases using various frameworks. |
| 3 Bonus | [Agents in Games with Pokemon](https://huggingface.co/learn/agents-course/bonus-unit3/introduction) | Explore the exciting intersection of AI Agents and games. |
| 4 | [Final Project - Create, Test and Certify Your Agent](https://huggingface.co/learn/agents-course/unit4/introduction) | Automated evaluation of agents and leaderboard with student results. |

## Prerequisites

- Basic knowledge of Python
- Basic knowledge of LLMs

## Contribution Guidelines

If you want to contribute to this course, you're welcome to do so. Feel free to open an issue or join the discussion in the [Discord](https://discord.gg/UrrTSsSyjb). For specific contributions, here are some guidelines:

### Small typo and grammar fixes

If you find a small typo or grammar mistake, please fix it yourself and submit a pull request. This is very helpful for students.

### New unit

If you want to add a new unit, **please create an issue in the repository, describe the unit, and explain why it should be added**. We will discuss it, and if it's a good addition, we can collaborate on it.

## Citing the project

To cite this repository in publications:

```bibtex
@misc{agents-course,
  author = {Burtenshaw, Ben and Thomas, Joffrey and Simonini, Thomas and Paniego, Sergio},
  title = {The Hugging Face Agents Course},
  year = {2025},
  howpublished = {\url{https://github.com/huggingface/agents-course}},
  note = {GitHub repository},
}
```

AI & Machine Learning Education & Learning
29.2K Github Stars
pytorch-pretrained-BigGAN
Open Source

pytorch-pretrained-BigGAN

# PyTorch pretrained BigGAN

An op-for-op PyTorch reimplementation of DeepMind's BigGAN model with the pre-trained weights from DeepMind.

## Introduction

This repository contains an op-for-op PyTorch reimplementation of DeepMind's BigGAN that was released with the paper [Large Scale GAN Training for High Fidelity Natural Image Synthesis](https://openreview.net/forum?id=B1xsqj09Fm) by Andrew Brock, Jeff Donahue and Karen Simonyan.

This PyTorch implementation of BigGAN is provided with the [pretrained 128x128, 256x256 and 512x512 models by DeepMind](https://tfhub.dev/deepmind/biggan-deep-128/1). We also provide the scripts used to download and convert these models from the TensorFlow Hub models.

This reimplementation was done from the raw computation graph of the TensorFlow version and behaves similarly to it (the variance of the output difference is of the order of 1e-5).

This implementation currently only contains the generator, as the weights of the discriminator were not released. The structure of the discriminator is very similar to the generator, so it could be added fairly easily. Tell me if you want to do a PR on that; I would be happy to help.

## Installation

This repo was tested on Python 3.6 and PyTorch 1.0.1.

PyTorch pretrained BigGAN can be installed from pip as follows:

```bash
pip install pytorch-pretrained-biggan
```

If you simply want to play with the GAN, this should be enough.

If you want to use the conversion scripts and the imagenet utilities, additional requirements are needed, in particular TensorFlow and NLTK. To install all the requirements, please use the `full_requirements.txt` file:

```bash
git clone https://github.com/huggingface/pytorch-pretrained-BigGAN.git
cd pytorch-pretrained-BigGAN
pip install -r full_requirements.txt
```

## Models

This repository provides direct and simple access to the pretrained "deep" versions of BigGAN for 128, 256 and 512 pixel resolutions as described in the [associated publication](https://openreview.net/forum?id=B1xsqj09Fm). Here are some details on the models:

- `BigGAN-deep-128`: a 50.4M-parameter model generating 128x128 pixel images; the model dump weighs 201 MB,
- `BigGAN-deep-256`: a 55.9M-parameter model generating 256x256 pixel images; the model dump weighs 224 MB,
- `BigGAN-deep-512`: a 56.2M-parameter model generating 512x512 pixel images; the model dump weighs 225 MB.

Please refer to Appendix B of the paper for details on the architectures.

All models include pre-computed batch norm statistics for 51 truncation values between 0 and 1 (see Appendix C.1 in the paper for details).

## Usage

Here is a quick-start example using `BigGAN` with a pre-trained model.

See the [doc section](#doc) below for details on these classes and methods.

```python
import logging

import torch
from pytorch_pretrained_biggan import (BigGAN, one_hot_from_names, truncated_noise_sample,
                                       save_as_images, display_in_terminal)

# OPTIONAL: activate the logger for more information on what's happening
logging.basicConfig(level=logging.INFO)

# Load the pre-trained model
model = BigGAN.from_pretrained('biggan-deep-256')

# Prepare an input
truncation = 0.4
class_vector = one_hot_from_names(['soap bubble', 'coffee', 'mushroom'], batch_size=3)
noise_vector = truncated_noise_sample(truncation=truncation, batch_size=3)

# Convert everything to tensors
noise_vector = torch.from_numpy(noise_vector)
class_vector = torch.from_numpy(class_vector)

# Use the GPU if one is available, otherwise stay on the CPU
device = 'cuda' if torch.cuda.is_available() else 'cpu'
noise_vector = noise_vector.to(device)
class_vector = class_vector.to(device)
model.to(device)

# Generate the images
with torch.no_grad():
    output = model(noise_vector, class_vector, truncation)

# Move the output back to the CPU
output = output.to('cpu')

# If you have a sixel-compatible terminal you can display the images in the terminal
# (see https://github.com/saitoha/libsixel for details)
display_in_terminal(output)

# Save results as png images
save_as_images(output)
```

![output_0](assets/output_0.png)
![output_1](assets/output_1.png)
![output_2](assets/output_2.png)

## Doc

### Loading DeepMind's pre-trained weights

To load one of DeepMind's pre-trained models, instantiate a `BigGAN` model with `from_pretrained()` as:

```python
model = BigGAN.from_pretrained(PRE_TRAINED_MODEL_NAME_OR_PATH, cache_dir=None)
```

where

- `PRE_TRAINED_MODEL_NAME_OR_PATH` is either:

  - the shortcut name of one of DeepMind's pre-trained models, selected from the list:

    - `biggan-deep-128`: a 50.4M-parameter model generating 128x128 pixel images
    - `biggan-deep-256`: a 55.9M-parameter model generating 256x256 pixel images
    - `biggan-deep-512`: a 56.2M-parameter model generating 512x512 pixel images

  - a path or url to a pretrained model archive containing:

    - `config.json`: a configuration file for the model, and
    - `pytorch_model.bin`: a PyTorch dump of a pre-trained instance of `BigGAN` (saved with the usual `torch.save()`).

  If `PRE_TRAINED_MODEL_NAME_OR_PATH` is a shortcut name, the pre-trained weights will be downloaded from AWS S3 (see the links [here](pytorch_pretrained_biggan/model.py)) and stored in a cache folder to avoid downloading them again (the cache folder can be found at `~/.pytorch_pretrained_biggan/`).

- `cache_dir` is an optional path to a specific directory in which to download and cache the pre-trained model weights.

### Configuration

`BigGANConfig` is a class to store and load BigGAN configurations. It's defined in [`config.py`](./pytorch_pretrained_biggan/config.py).

Here are some details on the attributes:

- `output_dim`: output resolution of the GAN (128, 256 or 512 for the pre-trained models).
- `z_dim`: size of the noise vector (128 for the pre-trained models).
- `class_embed_dim`: size of the class embedding vectors (128 for the pre-trained models).
- `channel_width`: size of each channel (128 for the pre-trained models).
- `num_classes`: number of classes in the training dataset, like imagenet (1000 for the pre-trained models).
- `layers`: a list of layer definitions. Each layer definition is a triple of [whether the layer up-samples (bool), number of input channels (int), number of output channels (int)].
- `attention_layer_position`: position of the self-attention layer in the layer hierarchy (8 for the pre-trained models).
- `eps`: epsilon value to use for spectral and batch normalization layers (1e-4 for the pre-trained models).
- `n_stats`: number of pre-computed statistics for the batch normalization layers associated with various truncation values between 0 and 1 (51 for the pre-trained models).

### Model

`BigGAN` is a PyTorch model (`torch.nn.Module`) of BigGAN defined in [`model.py`](./pytorch_pretrained_biggan/model.py).
This model comprises the class embeddings (a linear layer) and the generator with a series of convolutions and conditional batch norms. The discriminator is currently not implemented since pre-trained weights have not been released for it.

The inputs and output are **identical to the TensorFlow model inputs and outputs**.

We detail them here.

`BigGAN` takes as *inputs*:

- `z`: a torch.FloatTensor of shape [batch_size, config.z_dim] with noise sampled from a truncated normal distribution,
- `class_label`: a torch.FloatTensor of shape [batch_size, config.num_classes] with one-hot class vectors (as produced by `one_hot_from_int` or `one_hot_from_names`), and
- `truncation`: a float between 0 (excluded) and 1. The truncation of the truncated normal used for creating the noise vector. This truncation value is used to select between a set of pre-computed statistics (means and variances) for the batch norm layers.

`BigGAN` *outputs* a tensor of shape [batch_size, 3, resolution, resolution] where resolution is 128, 256 or 512 depending on the model.

### Utilities: Images, Noise, Imagenet classes

We provide a few utility methods to use the model. They are defined in [`utils.py`](./pytorch_pretrained_biggan/utils.py).

Here are some details on these methods:

- `truncated_noise_sample(batch_size=1, dim_z=128, truncation=1., seed=None)`:

  Create a truncated noise vector.
  - Params:
    - batch_size: batch size.
    - dim_z: dimension of z
    - truncation: truncation value to use
    - seed: seed for the random generator
  - Output:
    - array of shape (batch_size, dim_z)

- `convert_to_images(obj)`:

  Convert an output tensor from BigGAN into a list of images.
  - Params:
    - obj: tensor or numpy array of shape (batch_size, channels, height, width)
  - Output:
    - list of Pillow Images of size (height, width)

- `save_as_images(obj, file_name='output')`:

  Convert an output tensor from BigGAN into images and save them.
  - Params:
    - obj: tensor or numpy array of shape (batch_size, channels, height, width)
    - file_name: path and beginning of the filename to save. Images will be saved as `file_name_{image_number}.png`

- `display_in_terminal(obj)`:

  Convert and display an output tensor from BigGAN in the terminal. This function uses `libsixel` and will only work in a libsixel-compatible terminal. Please refer to https://github.com/saitoha/libsixel for more details.
  - Params:
    - obj: tensor or numpy array of shape (batch_size, channels, height, width)

- `one_hot_from_int(int_or_list, batch_size=1)`:

  Create a one-hot vector from a class index or a list of class indices.
  - Params:
    - int_or_list: int, or list of int, of the imagenet classes (between 0 and 999)
    - batch_size: batch size.
      - If int_or_list is an int, create a batch of identical classes.
      - If int_or_list is a list, we should have `len(int_or_list) == batch_size`
  - Output:
    - array of shape (batch_size, 1000)

- `one_hot_from_names(class_name, batch_size=1)`:

  Create a one-hot vector from the name of an imagenet class ('tennis ball', 'daisy', ...). We use NLTK's wordnet search to try to find the relevant synset of ImageNet and take the first one. If we can't find it directly, we look at the hyponyms and hypernyms of the class name.
  - Params:
    - class_name: string containing the name of an imagenet object.
    - batch_size: batch size.
  - Output:
    - array of shape (batch_size, 1000)

## Download and conversion scripts

Scripts to download and convert the TensorFlow models from TensorFlow Hub are provided in [./scripts](./scripts/).

The scripts can be used directly as:

```bash
./scripts/download_tf_hub_models.sh
./scripts/convert_tf_hub_models.sh
```
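
To make the shapes described in the utilities section more concrete, here is a minimal, dependency-free sketch of what a truncated noise batch and a one-hot class batch look like. It is not the library's implementation: `sketch_truncated_noise` and `sketch_one_hot` are hypothetical names, they return plain Python lists rather than arrays, and the sampling scheme (a standard normal resampled into [-2, 2], then scaled by `truncation`) is an assumption made for illustration.

```python
import random


def sketch_truncated_noise(batch_size=1, dim_z=128, truncation=1.0, seed=None):
    """Hypothetical helper: noise rows with every value in [-2*truncation, 2*truncation]."""
    rng = random.Random(seed)

    def draw():
        # Assumed scheme: resample a standard normal until it lands in [-2, 2],
        # then scale the accepted value by the truncation factor.
        while True:
            value = rng.gauss(0.0, 1.0)
            if -2.0 <= value <= 2.0:
                return value * truncation

    return [[draw() for _ in range(dim_z)] for _ in range(batch_size)]


def sketch_one_hot(int_or_list, batch_size=1, num_classes=1000):
    """Hypothetical helper: one-hot rows for class indices in 0..num_classes-1."""
    if isinstance(int_or_list, int):
        indices = [int_or_list] * batch_size  # a batch of identical classes
    else:
        indices = list(int_or_list)
    if len(indices) != batch_size:
        raise ValueError("len(int_or_list) must equal batch_size")
    rows = []
    for index in indices:
        if not 0 <= index < num_classes:
            raise ValueError("class index out of range")
        row = [0.0] * num_classes
        row[index] = 1.0
        rows.append(row)
    return rows


noise = sketch_truncated_noise(batch_size=3, dim_z=128, truncation=0.4, seed=0)
classes = sketch_one_hot([1, 2, 3], batch_size=3)
print(len(noise), len(noise[0]))      # (batch_size, dim_z)
print(len(classes), len(classes[0]))  # (batch_size, 1000)
```

With the real package you would call `truncated_noise_sample` and `one_hot_from_int` instead and convert the results with `torch.from_numpy` before passing them to the model.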

ML Frameworks
1K Github Stars