Godot Whisper
Features
| Realtime audio transcription | Offline audio transcription |
|---|---|
| GPU acceleration | Flash Attention |
| Voice Activity Detection (VAD) | Quantized models |
| 99 languages | Model downloader |
Platforms
| Platform | GPU Backend |
|---|---|
| macOS | Metal + Accelerate |
| iOS | Metal + Accelerate |
| Windows | OpenCL + Vulkan |
| Linux | OpenCL + Vulkan |
| Android | OpenCL |
| Web | CPU (WebGPU disabled until Godot supports it) |
Video Tutorial
How to install
GitHub Release
Download a GitHub release and copy its addons folder into the samples folder.
Godot Assets
Download directly from the Godot Asset Library.
Afterwards:
Activate the extension in Project -> Project Settings -> Godot Whisper. Restart the Godot editor.
Models
Manual model download: Hugging Face.
| Model | Size |
|---|---|
| tiny | 78 MB |
| base | 148 MB |
| small | 244M parameters |
| medium | 769M parameters |
| large-v1 | 1550M parameters |
| large-v2 | 1550M parameters |
| large-v3 | 1550M parameters |
| large-v3-turbo | 809M parameters |
Global settings
Go to Project -> Project Settings -> General -> Audio -> Input (enable Advanced Settings) to find the audio input settings.
Microphone transcription feeds Whisper at 16000 Hz. The addon resamples captured audio from the actual runtime mix rate reported by AudioServer.get_mix_rate().
Optional: set Project Settings -> Audio -> Driver -> Mix Rate (audio/driver/mix_rate) to 16000 to avoid resampling overhead. This may reduce overall game audio quality, so only use it if speech transcription is the main audio workload. Godot may still use a different runtime mix rate on some platforms or devices; verify with AudioServer.get_mix_rate(). If the runtime mix rate is not 16000, the addon will resample.
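To illustrate what resampling to 16000 Hz involves, here is a minimal sketch in Python using linear interpolation. It is not the addon's actual resampler, which may use a different method. The names `resample_linear` and `WHISPER_RATE` are hypothetical and exist only for this example.

```python
# Illustrative only: linear-interpolation resampling to Whisper's 16000 Hz input rate.
# `resample_linear` and `WHISPER_RATE` are hypothetical names, not part of the addon.

WHISPER_RATE = 16000


def resample_linear(samples, src_rate, dst_rate=WHISPER_RATE):
    """Resample a list of mono float samples from src_rate to dst_rate."""
    if src_rate <= 0 or dst_rate <= 0:
        raise ValueError("sample rates must be positive")
    if src_rate == dst_rate or not samples:
        # Rates already match (or there is no audio): no resampling needed.
        return list(samples)

    last = len(samples) - 1
    # Number of output samples that cover the same duration.
    out_len = max(1, int(len(samples) * dst_rate / src_rate))
    step = src_rate / dst_rate  # source samples advanced per output sample
    out = []
    for i in range(out_len):
        pos = i * step
        left = min(int(pos), last)
        right = min(left + 1, last)
        frac = pos - left
        # Blend the two nearest source samples.
        out.append(samples[left] * (1.0 - frac) + samples[right] * frac)
    return out


# One second of audio at 48000 Hz becomes one second at 16000 Hz.
one_second = [i / 48000 for i in range(48000)]
print(len(resample_linear(one_second, 48000)))  # prints 16000
```

When the source rate already equals 16000, the function returns the samples unchanged, which is the overhead the optional mix rate setting avoids. Plain linear interpolation applies no low-pass filter before downsampling, so it can alias. Treat it as a demonstration of the idea, not a quality reference.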
