I created an API server wrapper with web UI for Chatterbox TTS

#16, opened by devnen

The Chatterbox model is truly impressive work by the Resemble AI team. The quality and capabilities are outstanding.

Seeing discussions about running Chatterbox locally, I wanted to share a project I built that might make it easier to get started: https://github.com/devnen/Chatterbox-TTS-Server

It's an enhanced FastAPI server that wraps the Chatterbox model with several useful features. Setup is a standard pip install and works on both Windows and Linux.

The goal was to create a simple way to run and experiment with Chatterbox without having to piece the setup together yourself, while adding helpful features such as chunking for long texts and voice consistency controls.
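The post doesn't describe how the chunking works internally. As a rough illustration only, one common approach is to split the text at sentence boundaries and then pack sentences greedily into chunks up to a length limit. The function name `chunk_text` and the `max_chars` limit below are hypothetical and are not taken from the server's code.

```python
import re


def chunk_text(text: str, max_chars: int = 300) -> list[str]:
    """Hypothetical sentence-based chunker (not the server's actual code).

    Splits text after '.', '!' or '?' followed by whitespace, then greedily
    packs whole sentences into chunks of at most max_chars characters.
    A single sentence longer than max_chars is kept whole as its own chunk.
    """
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    chunks: list[str] = []
    current = ""
    for sentence in sentences:
        # Try appending the next sentence to the chunk being built.
        candidate = f"{current} {sentence}" if current else sentence
        if len(candidate) <= max_chars:
            current = candidate
        else:
            # Limit exceeded: close the current chunk and start a new one.
            if current:
                chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks


print(chunk_text("One. Two. Three.", max_chars=9))
```

Each chunk would then be synthesized separately and the resulting audio concatenated. Keeping the seed and the voice fixed across chunks is one plausible way to keep long outputs sounding consistent.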

The server automatically downloads the model from the Hugging Face Hub and provides a modern web UI for parameter tuning and voice management, plus automatic text chunking for long documents. It includes predefined voices, voice cloning from reference audio files, and seed control for consistent results. It also offers both OpenAI-compatible and custom APIs, GPU and CPU support, and Docker deployment.
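Since the server advertises an OpenAI-compatible API, a client would presumably send requests shaped like OpenAI's `POST /v1/audio/speech` call. The sketch below assumes that route and JSON body shape. `BASE_URL`, the model name and the voice value are hypothetical placeholders, so check the repository's README for the actual endpoints and defaults.

```python
import json
import urllib.request

# Hypothetical address; check the server's configuration for the real host and port.
BASE_URL = "http://localhost:8000"


def build_speech_request(text: str, voice: str, response_format: str = "wav") -> urllib.request.Request:
    """Build a POST request modeled on OpenAI's /v1/audio/speech call.

    Assumes the server mirrors that route and body; field values are placeholders.
    """
    payload = {
        "model": "chatterbox",  # hypothetical model name
        "input": text,
        "voice": voice,  # e.g. a predefined voice or reference file name (assumption)
        "response_format": response_format,
    }
    return urllib.request.Request(
        f"{BASE_URL}/v1/audio/speech",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def synthesize(text: str, voice: str, out_path: str) -> None:
    """Send the request and write the returned audio bytes to out_path."""
    req = build_speech_request(text, voice)
    with urllib.request.urlopen(req) as resp, open(out_path, "wb") as f:
        f.write(resp.read())
```

With the server running at `BASE_URL`, a call such as `synthesize("Hello there.", "my_voice.wav", "out.wav")` would save the generated audio to disk, provided the assumed route and fields match the server's implementation.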

This builds on the architecture from my previous Dia-TTS-Server project but is specifically designed for the Chatterbox engine's capabilities:
https://github.com/devnen/Dia-TTS-Server

Hope you find it useful!

I usually see Gradio apps in this sort of space (an AI/TTS model with a dedicated GUI), but it appears the whole thing is done in JS on the frontend? That's the dream team for me: Python for its enjoyability and JS for its mature perfection (as far as display goes). So this wasn't NiceGUI; you just routed it up and freeballed it?
And yes, so far it is the best model. I've used Resemble's other tools: Resemble Enhance is often paired with Coqui's XTTS, and the two come bundled by default in Daswer's XTTS GUI, which I contribute to as an app creator myself (as a hobbyist, anyway). I'm hoping they stick with this one for a while and that it gains traction, because it would be a perfect TTS for something like LM Studio.

Thank you for your kind words. I went with vanilla JS instead of Gradio for better control over the UI and the chunking workflow. It's just a FastAPI backend with an HTML/JS frontend, which keeps it simple and lightweight.

I believe this is the most useful local TTS model since Kokoro. Chatterbox being open-source at this quality level could make a real difference.
