Podvoice
Podvoice is a local-first AI podcast generator that converts simple Markdown scripts into multi-speaker audio.
Originally built as a CLI tool, Podvoice now includes PodVoice Studio, a modern web-based GUI for creating, previewing, and generating AI audio visually.
No cloud APIs. No subscriptions. Fully offline after the first-run model download.
Runs on Linux, Windows, macOS, and FreeBSD.
Why Podvoice?
Most AI audio tools:
- Require paid APIs
- Depend on cloud services
Podvoice is:
- Local-first
- Fully offline
- Developer-friendly
- Now with a visual GUI (PodVoice Studio)
Features
- Markdown-based scripts
- Multiple logical speakers
- Deterministic voice assignment
- Single stitched output file
- WAV or MP3 export
- Local-only inference
- CPU-first (GPU optional)
- Cross-platform support
- Studio Web UI: modern single-page interface for voice selection, preview, and generation
- Built-in multi-speaker models: VCTK vits and others, with cached voice demos
- AJAX-based generation: no page reloads, instant audio playback
- Modern dark theme: clean sidebar layout with zero scrolling
- Profile management: YAML-based speaker profiles with reference audio support
- Multi-reference audio: concatenate multiple clips for better voice conditioning
Supported platforms
| Platform | Status | Notes |
|---|---|---|
| Linux | Fully supported | Primary dev platform |
| macOS | Fully supported | Intel + Apple Silicon |
| Windows | Fully supported | PowerShell |
| FreeBSD | Supported | Requires ffmpeg |
| WSL2 | Supported | Recommended on Windows |
Input format
Podvoice consumes Markdown files with speaker blocks:
[Host | calm]
Welcome to the show.
[Guest | warm]
If this sounds useful, try writing your own script
and see how easily Markdown becomes audio.
Rules:
- Speaker name is required
- Emotion tag is optional
- Text continues until the next speaker block
- Blank lines are allowed
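The rules above can be sketched as a small parser. This is an illustration only, not the code in Podvoice's own parser.py; the names `Segment`, `HEADER_RE`, and `parse_script` are hypothetical, and joining wrapped lines with a single space is an assumption.

```python
import re
from dataclasses import dataclass
from typing import List, Optional

# Matches a header line such as "[Host | calm]" or "[Guest]".
HEADER_RE = re.compile(r"^\[\s*([^\]|]+?)\s*(?:\|\s*([^\]]*?)\s*)?\]$")


@dataclass
class Segment:
    speaker: str
    emotion: Optional[str]
    text: str


def _finish(segments: List[Segment], speaker, emotion, lines: List[str]) -> None:
    """Append the block collected so far, skipping blank lines."""
    if speaker is None:
        return
    text = " ".join(line.strip() for line in lines if line.strip())
    if text:
        segments.append(Segment(speaker, emotion, text))


def parse_script(markdown: str) -> List[Segment]:
    segments: List[Segment] = []
    speaker: Optional[str] = None
    emotion: Optional[str] = None
    lines: List[str] = []
    for raw in markdown.splitlines():
        match = HEADER_RE.match(raw.strip())
        if match:
            # A new speaker block closes the previous one.
            _finish(segments, speaker, emotion, lines)
            speaker = match.group(1).strip()
            if not speaker:
                raise ValueError("Speaker name is required")
            emotion = match.group(2) or None  # emotion tag is optional
            lines = []
        elif speaker is not None:
            lines.append(raw)  # text continues until the next header
    _finish(segments, speaker, emotion, lines)
    return segments


if __name__ == "__main__":
    demo = "[Host | calm]\nWelcome to the show.\n\n[Guest]\nThanks for having me.\n"
    for segment in parse_script(demo):
        print(segment)
```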
Demo video of Podvoice Studio (GUI usage)
Demo audio
Demo video of Podvoice (CLI usage)
Quick start (all operating systems)
1. System requirements (common)
Required everywhere:
- Python 3.10.x
- ffmpeg
- espeak or espeak-ng (required for Studio with built-in multi-speaker models)
- Internet access only for first run
- ~5-8 GB free disk space (model cache)
2. Install system dependencies
Linux (Ubuntu / Debian)
sudo apt update
sudo apt install -y python3.10 python3.10-venv ffmpeg git espeak
macOS (Homebrew)
brew install python@3.10 ffmpeg git
Windows (PowerShell)
winget install Python.Python.3.10
winget install ffmpeg
winget install Git.Git
Restart the terminal after installing Python.
FreeBSD
pkg install python310 ffmpeg git
3. Clone the repository
git clone https://github.com/aman179102/podvoice.git
cd podvoice
Setup (recommended path)
Linux / macOS / FreeBSD
chmod +x bootstrap.sh
./bootstrap.sh
This script will:
- Verify Python 3.10
- Create a local .venv
- Install fully pinned dependencies from requirements.lock
- Install podvoice in editable mode
Windows (PowerShell)
One-time: allow local scripts
Set-ExecutionPolicy -Scope CurrentUser -ExecutionPolicy RemoteSigned
Run bootstrap
.\bootstrap.ps1
Activate the environment
Linux / macOS / FreeBSD
source .venv/bin/activate
Windows
.venv\Scripts\Activate.ps1
Run the demo
podvoice examples/demo.md --out demo.wav
Or export MP3:
podvoice examples/demo.md --out demo.mp3
On first run, Coqui XTTS v2 model weights will be downloaded and cached locally. Subsequent runs reuse the cache.
Studio Web UI
Podvoice includes a modern, single-page web interface for interactive voice generation.
Launch Studio
podvoice studio --host 127.0.0.1 --port 8000
Then open: http://127.0.0.1:8000
Studio Features
| Feature | Description |
|---|---|
| Sidebar Voice Gallery | All built-in speakers displayed with human-friendly labels |
| Instant Preview | Click any voice to hear a demo instantly (cached after first play) |
| Single TTS | Type text, select a voice, and generate audio with no page reloads |
| Multi TTS (Podcast) | Paste Markdown scripts with speaker mapping |
| AJAX Generation | Audio generates and plays without leaving the page |
| Modern Dark Theme | Clean aesthetic with CSS variables, no scrolling |
Studio Endpoints
- / or /single: Single TTS page
- /multi: Multi-speaker podcast page
- /demo_wav?voice=p240: Get cached demo audio for a voice
- /health: Health check endpoint
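As a rough sketch, the endpoints can be exercised from a script using only the Python standard library. The helper names below are hypothetical, the base URL assumes the default launch command shown earlier, and the only assumption made about /health is that it answers with HTTP 200 when Studio is running.

```python
import urllib.request
from urllib.parse import urlencode

# Assumes Studio was started with --host 127.0.0.1 --port 8000.
BASE_URL = "http://127.0.0.1:8000"


def demo_wav_url(voice: str, base_url: str = BASE_URL) -> str:
    """Build the URL for a voice's cached demo clip."""
    return f"{base_url}/demo_wav?{urlencode({'voice': voice})}"


def studio_is_up(base_url: str = BASE_URL, timeout: float = 2.0) -> bool:
    """Return True if /health answers with HTTP 200 (assumed behaviour)."""
    try:
        with urllib.request.urlopen(f"{base_url}/health", timeout=timeout) as resp:
            return resp.status == 200
    except OSError:  # connection refused, timeout, HTTP errors
        return False


def save_demo(voice: str, path: str, base_url: str = BASE_URL) -> None:
    """Download the demo audio for a voice and write it to disk."""
    with urllib.request.urlopen(demo_wav_url(voice, base_url)) as resp:
        data = resp.read()
    with open(path, "wb") as handle:
        handle.write(data)


if __name__ == "__main__":
    if studio_is_up():
        save_demo("p240", "p240_demo.wav")
    else:
        print("Studio is not reachable at", BASE_URL)
```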
Using a Different Model
Studio defaults to tts_models/en/vctk/vits (built-in multi-speaker). To use XTTS v2 instead:
podvoice studio --model-name tts_models/multilingual/multi-dataset/xtts_v2
CLI usage
podvoice SCRIPT.md --out OUTPUT
Examples:
podvoice examples/demo.md --out output.wav
podvoice examples/demo.md --out podcast.mp3 --language en --device cpu
Options
| Option | Description |
|---|---|
| SCRIPT | Input Markdown file |
| --out, -o | Output .wav or .mp3 |
| --language, -l | XTTS language code |
| --device, -d | cpu (default) or cuda |
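For batch jobs, the options in the table can be driven from Python. The wrapper below is a sketch that uses only the documented flags; `build_command` and `render` are hypothetical helper names, and it assumes the podvoice executable is on PATH (for example, inside the activated .venv).

```python
import subprocess
from typing import List, Optional


def build_command(
    script: str,
    out: str,
    language: Optional[str] = None,
    device: Optional[str] = None,
) -> List[str]:
    """Assemble a podvoice invocation from the documented options."""
    cmd = ["podvoice", script, "--out", out]
    if language:
        cmd += ["--language", language]
    if device:
        cmd += ["--device", device]
    return cmd


def render(script: str, out: str, **options: Optional[str]) -> None:
    """Run podvoice and raise CalledProcessError if it exits non-zero."""
    subprocess.run(build_command(script, out, **options), check=True)


if __name__ == "__main__":
    # Prints the command instead of running it, so this is safe to try anywhere.
    print(" ".join(build_command("examples/demo.md", "podcast.mp3", language="en", device="cpu")))
```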
GPU usage (optional)
If you have a compatible NVIDIA GPU:
podvoice examples/demo.md --device cuda
If CUDA is unavailable, Podvoice safely falls back to CPU.
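A fallback like this is commonly written as below. This is a sketch of the general pattern, not Podvoice's actual code; `resolve_device` is a hypothetical name, and it assumes PyTorch is the backend that reports CUDA availability.

```python
def resolve_device(requested: str = "cpu") -> str:
    """Return "cuda" only when it was requested and is actually usable."""
    if requested != "cuda":
        return "cpu"
    try:
        import torch  # imported lazily so the check also works without PyTorch
    except ImportError:
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


if __name__ == "__main__":
    print(resolve_device("cuda"))
```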
Profile Management
Podvoice supports YAML-based speaker profiles for advanced use cases.
Profile Directory
Default: ./podvoice_profiles/profiles.yaml
Profile Format
profiles:
  my_custom_voice:
    builtin_speaker: p240
  cloned_voice:
    reference_audio: ./samples/voice.wav
  multi_sample_voice:
    reference_audios:
      - ./samples/clip1.wav
      - ./samples/clip2.wav
      - ./samples/clip3.wav
Using Profiles
Profiles are automatically loaded and can be referenced in your Markdown scripts by speaker name.
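To make the three profile shapes concrete, here is a sketch of how a loaded profile file could be resolved for a speaker name. It is not the logic in profiles.py: `resolve_profile` and the returned dictionary layout are hypothetical, and `PROFILES` stands in for the result of parsing the YAML (for instance with PyYAML's `yaml.safe_load`).

```python
from typing import Any, Dict

# Stand-in for the parsed contents of podvoice_profiles/profiles.yaml.
PROFILES: Dict[str, Any] = {
    "profiles": {
        "my_custom_voice": {"builtin_speaker": "p240"},
        "cloned_voice": {"reference_audio": "./samples/voice.wav"},
        "multi_sample_voice": {
            "reference_audios": [
                "./samples/clip1.wav",
                "./samples/clip2.wav",
                "./samples/clip3.wav",
            ]
        },
    }
}


def resolve_profile(name: str, data: Dict[str, Any]) -> Dict[str, Any]:
    """Normalise one profile into either a built-in speaker or a clip list."""
    entry = data.get("profiles", {}).get(name)
    if entry is None:
        raise KeyError(f"No profile named {name!r}")
    if "builtin_speaker" in entry:
        return {"kind": "builtin", "speaker": entry["builtin_speaker"]}
    if "reference_audios" in entry:
        clips = list(entry["reference_audios"])
    elif "reference_audio" in entry:
        clips = [entry["reference_audio"]]  # single clip becomes a one-item list
    else:
        clips = []
    if not clips:
        raise ValueError(f"Profile {name!r} defines no voice source")
    return {"kind": "reference", "clips": clips}


if __name__ == "__main__":
    for profile_name in PROFILES["profiles"]:
        print(profile_name, resolve_profile(profile_name, PROFILES))
```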
Performance notes
You may see warnings like:
Could not initialize NNPACK! Reason: Unsupported hardware.
- These are harmless
- Audio generation will still complete
- No action required
How voices are assigned
Podvoice does not train voices.
Instead:
- Uses built-in XTTS v2 speakers
- Hashes speaker names deterministically
- Maps each logical speaker to a stable voice
Implications:
- Same speaker name → same voice
- Rename speaker → possibly different voice
- XTTS update → mapping may change
Fallback: default XTTS voice.
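The idea can be sketched as follows. The hash function and voice list Podvoice actually uses are not documented here, so SHA-256, `pick_voice`, and the voice names below are assumptions for illustration. A cryptographic digest is used instead of Python's built-in `hash()` because string hashes are salted per process and would not be stable between runs.

```python
import hashlib
from typing import Sequence


def pick_voice(speaker_name: str, voices: Sequence[str]) -> str:
    """Map a logical speaker name to one entry of a fixed voice list."""
    if not voices:
        raise ValueError("At least one voice is required")
    digest = hashlib.sha256(speaker_name.encode("utf-8")).digest()
    # Interpret the first 8 bytes as an integer and wrap it onto the list.
    index = int.from_bytes(digest[:8], "big") % len(voices)
    return voices[index]


if __name__ == "__main__":
    # Hypothetical voice names; the real list comes from the loaded model.
    available = ["voice_a", "voice_b", "voice_c", "voice_d"]
    for name in ("Host", "Guest", "Host"):
        print(name, "->", pick_voice(name, available))
```

Because the index depends on the length and order of the voice list, a model update that adds, removes, or reorders speakers can change the mapping, which matches the implications listed above.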
Project structure
podvoice/
├── podvoice/
│   ├── cli.py            # CLI entrypoint
│   ├── parser.py         # Markdown parser
│   ├── tts.py            # XTTS inference
│   ├── audio.py          # Audio stitching
│   ├── studio.py         # FastAPI web UI
│   ├── profiles.py       # YAML profile management
│   ├── preprocessing.py  # Audio preprocessing
│   └── utils.py
│
├── examples/
│   └── demo.md
│
├── podvoice_profiles/    # Voice profiles directory
│
├── bootstrap.sh
├── bootstrap.ps1
├── pyproject.toml
└── README.md
Responsible use
Podvoice generates natural-sounding speech.
Do not:
- Impersonate real people without consent
- Use generated audio for fraud or deception
Always disclose synthesized content where appropriate.
You are responsible for compliance with all applicable laws and licenses, including those of Coqui XTTS v2.
Contributing
Podvoice is intentionally simple.
Good contributions:
- Bug reports with minimal reproduction scripts
- CLI UX improvements
- Documentation clarity
- Cross-platform fixes
Non-goals:
- Cloud dependencies
- Training pipelines
- Over-engineering
Goal: local, boring, reliable software.