ComfyUI Whisper
Transcribe audio and add subtitles to videos using Whisper in ComfyUI. Supports multiple languages, prompt guidance and multiple Whisper models.
Last tested: 07 June 2026 (ComfyUI v0.23.0 | Torch 2.12.0 | Triton 3.7.0 | Python 3.12.3 | L40S | CUDA 13.0 | Ubuntu 24.04)

⭐ Support
If you like my projects and wish to see updates and new features, please consider supporting me. It helps a lot!
Installation
Install via ComfyUI Manager
Usage
Load this workflow into ComfyUI
Models are auto-downloaded to /ComfyUI/models/stt/whisper
Supported Models
'tiny.en', 'tiny', 'base.en', 'base', 'small.en', 'small', 'medium.en', 'medium', 'large-v1', 'large-v2', 'large-v3', 'large', 'large-v3-turbo', 'turbo'
Nodes
Apply Whisper
Transcribe audio and get timestamps for each segment and word.
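The sketch below illustrates what segment-level and word-level timestamps look like. It assumes the result layout of openai-whisper's `model.transcribe(audio, word_timestamps=True)`, where `result["segments"]` is a list of segments and each segment carries a `"words"` list. The helper names `flatten_words` and `segment_alignments` and the `"text"`/`"start"`/`"end"` output keys are hypothetical and are not necessarily what the node emits internally.

```python
from typing import Any


def segment_alignments(result: dict[str, Any]) -> list[dict[str, Any]]:
    """One alignment per segment (assumes an openai-whisper style result dict)."""
    return [
        {"text": s["text"].strip(), "start": float(s["start"]), "end": float(s["end"])}
        for s in result.get("segments", [])
    ]


def flatten_words(result: dict[str, Any]) -> list[dict[str, Any]]:
    """One alignment per word; segments without a "words" list are skipped."""
    words = []
    for segment in result.get("segments", []):
        for w in segment.get("words", []):
            words.append(
                {"text": w["word"].strip(), "start": float(w["start"]), "end": float(w["end"])}
            )
    return words


# Hand-written sample in the assumed layout (not real model output).
sample = {
    "segments": [
        {
            "start": 0.0, "end": 1.2, "text": " Hello world",
            "words": [
                {"word": " Hello", "start": 0.0, "end": 0.5},
                {"word": " world", "start": 0.6, "end": 1.2},
            ],
        }
    ]
}
print(segment_alignments(sample))
print(flatten_words(sample))
```

Whisper prefixes words with a leading space, which is why both helpers strip the text.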
Add Subtitles To Frames
Add subtitles to the video frames. You can specify the font family, font color and x/y position.
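To burn subtitles into frames, each frame index has to be mapped to the alignment active at that moment. The following is a minimal sketch of that mapping, assuming alignments are sorted by start time, do not overlap, and use hypothetical `"text"`/`"start"`/`"end"` keys in seconds; it is not the node's actual implementation. A binary search (`bisect`) finds the last alignment starting at or before the frame's timestamp.

```python
import bisect
from typing import Optional


def subtitles_per_frame(num_frames: int, fps: float, alignments: list[dict]) -> list[Optional[str]]:
    """Return the subtitle text for each frame, or None when nothing is spoken.

    Assumes alignments are sorted by "start" and do not overlap; an alignment
    covers the half-open interval [start, end).
    """
    starts = [a["start"] for a in alignments]
    out: list[Optional[str]] = []
    for frame_index in range(num_frames):
        t = frame_index / fps  # timestamp of this frame in seconds
        i = bisect.bisect_right(starts, t) - 1  # last alignment starting at or before t
        if i >= 0 and t < alignments[i]["end"]:
            out.append(alignments[i]["text"])
        else:
            out.append(None)
    return out


alignments = [
    {"text": "Hello", "start": 0.0, "end": 0.5},
    {"text": "world", "start": 0.6, "end": 1.2},
]
print(subtitles_per_frame(13, 10.0, alignments))
```

Frames that fall in a gap between words (here 0.5 s) or after the last word get `None`, so no text would be drawn on them.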
Add Subtitles To Background (Experimental)
Add subtitles to blank frames, laid out like a word cloud.
Save SRT
Export alignments as SRT files to the /ComfyUI/output/srt directory.
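An SRT file is a sequence of numbered cues, each with a `HH:MM:SS,mmm --> HH:MM:SS,mmm` time line, the text, and a blank line. The sketch below shows that conversion, assuming alignments with hypothetical `"text"`/`"start"`/`"end"` keys in seconds; `format_timestamp` and `to_srt` are illustrative names, not functions exported by this node.

```python
def format_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp (HH:MM:SS,mmm), rounded to the nearest ms."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(alignments: list[dict]) -> str:
    """Build SRT text: numbered cues separated by a blank line."""
    cues = []
    for index, a in enumerate(alignments, start=1):
        cues.append(
            f"{index}\n"
            f"{format_timestamp(a['start'])} --> {format_timestamp(a['end'])}\n"
            f"{a['text']}\n"
        )
    return "\n".join(cues)


alignments = [
    {"text": "Hello", "start": 0.0, "end": 0.5},
    {"text": "world", "start": 0.6, "end": 1.2},
]
print(to_srt(alignments))
```

Rounding to whole milliseconds avoids float artifacts such as `1.2 * 1000` landing just below 1200. The resulting string can be written with UTF-8 encoding to any `.srt` path.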
Updates
7 June 2026
- Use soundfile to save audio and fix torchcodec issues
6 June 2026
- Merge https://github.com/yuvraj108c/ComfyUI-Whisper/pull/39 by @alastaira for alternative model download folders
- Merge https://github.com/yuvraj108c/ComfyUI-Whisper/pull/34 by @MMaximuss for subtitle text outline
2 January 2026
- Export alignments as SRT
- Add torchcodec to requirements
27 August 2025
- Merge https://github.com/yuvraj108c/ComfyUI-Whisper/pull/22 by @francislabountyjr for model patcher, more whisper models support, comfyui model directory support
- Merge https://github.com/yuvraj108c/ComfyUI-Whisper/pull/18 by @qy8502 for Prompt Guidance support
- Support YRDZST Semibold Font
2 May 2025
- Merge https://github.com/yuvraj108c/ComfyUI-Whisper/pull/15 by @niknah for language selection
Credits
License
Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0)

