yujxx

Professional software vendor delivering innovative solutions on the Softono platform. Specialized in both open-source and proprietary software development.

Software by yujxx

Open Source

PodAgent

# 🎧 PodAgent: A Comprehensive Framework for Podcast Generation

[![arXiv](https://img.shields.io/badge/arXiv-Paper-<COLOR>.svg)](https://arxiv.org/abs/2503.00455) [![githubio](https://img.shields.io/badge/GitHub.io-Demo_Page-blue?logo=Github&style=flat-square)](https://podcast-agent.github.io/demo/)

This repository contains the official implementation of ["PodAgent: A Comprehensive Framework for Podcast Generation"](https://arxiv.org/abs/2503.00455).

Given a topic to discuss, PodAgent simulates human behavior to create podcast-like audio presented as a talk show, featuring one host and several guests. The show includes diverse and insightful viewpoints, delivered in appropriate voices, along with structured sound effects and background music to enrich the listening experience.

<img align="middle" width="800" src="assets/PodAgent.png"/>

<hr>

## News

- 🥂 2025.03: PodAgent is released! We currently support podcast generation in two languages: English and Chinese.
- 🥂 2025.05: PodAgent is accepted by ACL 2025 Findings!

## Download Codes

1. Download PodAgent

   ```bash
   git clone https://github.com/yujxx/PodAgent.git
   ```

2. Download CosyVoice

   ```bash
   cd PodAgent
   mkdir TTS
   cd TTS
   git clone https://github.com/FunAudioLLM/CosyVoice.git
   cd CosyVoice
   git submodule update --init --recursive
   cd ../..
   ```

## Environment Setup

1. Install the environment (might take some time)

   ```bash
   bash ./scripts/EnvsSetup.sh
   ```

   - Or, set up the environment step by step (recommended):

   ```bash
   conda create -n podcast -y python=3.10
   conda activate podcast
   conda install -y -c conda-forge pynini==2.1.5
   pip install -r TTS/CosyVoice/requirements.txt -i https://mirrors.aliyun.com/pypi/simple/ --trusted-host=mirrors.aliyun.com
   pip install -U git+https://git@github.com/facebookresearch/audiocraft@c5157b5bf14bf83449c17ea1eeb66c19fb4bc7f0#egg=audiocraft
   pip install pip==23.2.1
   pip install -r requirements.txt
   ```

2. Activate the conda environment

   ```bash
   conda activate podcast
   ```

## Download Models

Pre-download the models (might take some time)

```bash
python scripts/download_models.py
```

## Services Setup

- Set environment variables for using API services ([GPT-4 API](https://platform.openai.com/account/api-keys))

  ```bash
  export OPENAI_BASE_URL=your_openai_url_here
  export PODAGENT_OPENAI_KEY=your_openai_key_here
  export PODAGENT_SERVICE_PORT=8021
  export PODAGENT_SERVICE_URL=127.0.0.1
  export PODAGENT_MAX_SCRIPT_LINES=999
  ```

- Start Python API services (e.g., Text-to-Speech, Text-to-Audio)

  ```bash
  bash scripts/start_services.sh
  ```

- After that, wait a moment and check the log in services_logs/service.out. When you see the following output, the services are ready to be called.

  ```bash
  * Running on http://127.0.0.1:8021
  ```

- (Optional) Kill the running services when you are finished.

  ```bash
  python scripts/kill_services.py
  ```

## Usage

```bash
python podagent.py --topic "What are the primary factors that influence consumer behavior?" --guest-number "2" --session-id "test"
```

- (Optional) If you want to reuse responses for repeated requests (e.g., during debugging), you can enable caching:

  ```bash
  export USE_OPENAI_CACHE=True
  ```

## Citation

If you find this work useful, you can cite the paper below:

    @misc{xiao2025podagentcomprehensiveframeworkpodcast,
      title={PodAgent: A Comprehensive Framework for Podcast Generation},
      author={Yujia Xiao and Lei He and Haohan Guo and Fenglong Xie and Tan Lee},
      year={2025},
      eprint={2503.00455},
      archivePrefix={arXiv},
      primaryClass={cs.SD},
      url={https://arxiv.org/abs/2503.00455},
    }

## Appreciation

- [WavJourney](https://github.com/Audio-AGI/WavJourney) for providing an extensive audio generation workflow.
- [CosyVoice2](https://github.com/FunAudioLLM/CosyVoice) for a zero-shot text-to-speech synthesis model.
- [AudioCraft](https://github.com/facebookresearch/audiocraft) for state-of-the-art audio generation models.

## Disclaimer

1. We are not liable for any audio generated using the semantics produced by this model. Please ensure that it is not used for any illegal purposes.
2. We provide voice libraries under data/voice_presets_cv_* for quick usage. The .wav files under voice_presets_cv_en and voice_presets_cv_zh are sourced from [LibriTTS-R](https://openslr.org/141/) and [AISHELL-3](https://openslr.org/93/), respectively. Please ensure their usage complies with the respective licenses.
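
## Example: Scripted Service Check

The Services Setup and Usage steps of PodAgent can be wrapped in a small script. The following is a minimal sketch and is not part of the PodAgent repository: the helper names `service_is_up` and `build_podagent_command` are hypothetical, and the sketch assumes only the environment variables (`PODAGENT_SERVICE_URL`, `PODAGENT_SERVICE_PORT`) and the `podagent.py` flags documented in that README. It checks whether something is listening on the service port, which is a weaker signal than the log line in services_logs/service.out.

```python
import os
import shlex
import socket


def service_is_up(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds (hypothetical helper)."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def build_podagent_command(topic: str, guest_number: int, session_id: str) -> list[str]:
    """Assemble the podagent.py invocation using the flags from the Usage section."""
    return [
        "python", "podagent.py",
        "--topic", topic,
        "--guest-number", str(guest_number),
        "--session-id", session_id,
    ]


if __name__ == "__main__":
    # Defaults mirror the values exported in the Services Setup section.
    host = os.environ.get("PODAGENT_SERVICE_URL", "127.0.0.1")
    port = int(os.environ.get("PODAGENT_SERVICE_PORT", "8021"))
    cmd = build_podagent_command(
        "What are the primary factors that influence consumer behavior?", 2, "test"
    )
    if service_is_up(host, port):
        print("Services reachable; run:", shlex.join(cmd))
    else:
        print(f"Nothing is listening on {host}:{port}; start the services first.")
```

The command is built as an argument list so that a topic containing spaces or quotes can be passed to `subprocess.run` without manual shell escaping.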

AI Agents Podcast Tools

121 GitHub Stars

Open Source

PodEval

# PodEval: Comprehensive Podcast Evaluation Toolkit

A comprehensive toolkit for podcast evaluation across multiple dimensions, including audio, speech, and text, using both objective metrics and subjective evaluation methods.

[![arXiv](https://img.shields.io/badge/arXiv-Paper-<COLOR>.svg)](https://arxiv.org/abs/2510.00485)

<img align="middle" width="800" src="PodEval-icon.png"/>

## Overview

PodEval provides a complete evaluation pipeline for podcast generation systems, supporting:

- **Real-world Dataset** - Curated dataset of real podcast episodes for benchmarking
- **Text Quality Evaluation** - Both quantitative linguistic metrics and LLM-based subjective evaluation
- **Speech/Audio Assessment** - Objective speech and audio evaluation metrics and subjective listening tests

## Directory Structure

### 📁 [Real_Pod/](./Real_Pod/)

**Real-Pod Dataset** - A curated dataset of real-world podcast episodes serving as a reference for human-level creative quality.

- **Content**: 51 topics across 17 categories with diverse audio scenarios
- **Usage**: Download the Real-Pod dataset; process and prepare any podcast dataset into a unified evaluation format.
- **Documentation**: [Real_Pod/README.md](./Real_Pod/README.md)

<img align="middle" width="800" src="Real_Pod/Figure_dataset.png"/>

### 📁 [Text_Eval/](./Text_Eval/)

**Text Evaluation Tools** - Evaluate conversation scripts using quantitative metrics and LLM-as-a-Judge methods.

- **Methods**:
  - **Quantitative Metrics**: distinct-2, information density, semantic diversity, MATTR
  - **LLM-as-a-Judge**: GPT-based evaluation for dialogue, including metrics like coherence, engagingness, diversity, informativeness, overall quality, speaker diversity
- **Documentation**: [Text_Eval/README.md](./Text_Eval/README.md)

<img align="middle" width="800" src="Text_Eval/text-eval.png"/>

### 📁 [Speech_Audio_Objective_Evaluation/](./Speech_Audio_Obj_Eval/)

**Objective Speech/Audio Evaluation Toolkit** - Evaluate objective quality metrics of podcast audio.

- **Metrics**: DNSMOS, Loudness, WER, Speaker Similarity, Speaker Timbre Difference, Speech-to-Music Ratio, Music-Speech Harmony.
- **Documentation**: [Speech_Audio_Obj_Eval/README.md](./Speech_Audio_Obj_Eval/README.md)

<img align="middle" width="800" src="Speech_Audio_Obj_Eval/audio-objective-metrics-workflow.png"/>

### 📁 [Subjective_Listening_Tests/](./Subjective_Listening_Tests/)

**Subjective Listening Tests** - Human evaluation framework for podcast speech/audio assessment.

- **Dialogue Naturalness Evaluation**: Evaluate the naturalness and authenticity of dialogue speech in podcasts.
- **Questionnaire-based MOS Test**: Comprehensive evaluation of long-form podcast content through structured questionnaires.
- **Documentation**: [Subjective_Listening_Tests/README.md](./Subjective_Listening_Tests/README.md)

<img align="middle" width="1200" src="Subjective_Listening_Tests/sub-intro.png"/>

## Environment

```bash
conda create --name podeval python=3.10
conda activate podeval
pip install -r requirements.txt
```

### More

- Pyannote: Please follow the `Requirements` [here](https://huggingface.co/pyannote/speaker-diarization-3.0) to create access tokens, and replace the `use_auth_token` in `./Speech_Audio_Obj_Eval/models.py` and `./Real_Pod/data_process.py`.

```python
from pyannote.audio import Pipeline

# Replace "hf_xxx" with your own Hugging Face access token.
pipeline = Pipeline.from_pretrained(
    "pyannote/speaker-diarization-3.0",
    use_auth_token="hf_xxx"
)
```

---

## Citation

If you use PodEval in your research, please cite:

```bibtex
@misc{xiao2025podeval,
  title={PodEval: A Multimodal Evaluation Framework for Podcast Audio Generation},
  author={Yujia Xiao and Liumeng Xue and Lei He and Xinyi Chen and Aemon Yat Fei Chiu and Wenjie Tian and Shaofei Zhang and Qiuqiang Kong and Xinfa Zhu and Wei Xue and Tan Lee},
  year={2025},
  eprint={2510.00485},
  archivePrefix={arXiv},
  primaryClass={cs.SD},
  url={https://arxiv.org/abs/2510.00485},
}
```

## Disclaimer

1. The Real-Pod dataset provides publicly accessible download links instead of direct audio files. Users must comply with relevant legal and ethical regulations when using the dataset.
2. Users conducting subjective evaluations via crowdsourcing platforms should ensure fair compensation, exceeding minimum wage requirements, to maintain ethical standards.
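
## Example: Distinct-2 and MATTR

Two of the quantitative text metrics named under Text_Eval, distinct-2 and MATTR, have widely used textbook definitions that can be sketched in a few lines. The code below is an illustration only and is not taken from the PodEval repository: the function names, the whitespace tokenization, and the window size are assumptions, and PodEval's own implementation may differ in all three.

```python
def distinct_n(tokens: list[str], n: int = 2) -> float:
    """Ratio of unique n-grams to total n-grams; 0.0 if there are no n-grams."""
    ngrams = [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    return len(set(ngrams)) / len(ngrams) if ngrams else 0.0


def mattr(tokens: list[str], window: int = 50) -> float:
    """Moving-average type-token ratio: mean TTR over all sliding windows."""
    if not tokens:
        return 0.0
    if len(tokens) <= window:
        # Text shorter than one window: fall back to the plain type-token ratio.
        return len(set(tokens)) / len(tokens)
    ratios = [
        len(set(tokens[i:i + window])) / window
        for i in range(len(tokens) - window + 1)
    ]
    return sum(ratios) / len(ratios)


if __name__ == "__main__":
    # Whitespace tokenization is an assumption made for this sketch.
    tokens = "the cat sat on the mat the cat".split()
    print(round(distinct_n(tokens, n=2), 3))   # 6 unique of 7 bigrams -> 0.857
    print(round(mattr(tokens, window=4), 3))   # mean of 1, 1, 1, 0.75, 0.75 -> 0.9
```

Both scores lie in [0, 1], with higher values indicating less repetition. MATTR averages over fixed-size windows so that scripts of different lengths remain comparable, which a plain type-token ratio does not guarantee.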

ML Frameworks Testing & QA

20 GitHub Stars