January 31, 2026
Recently, I started playing with audio. Locally I’m using Whisper and WhisperX for speech-to-text, while for an upcoming project I’m using ElevenLabs for text-to-speech synthesis.
ElevenLabs provides a high-quality Text-to-Speech (TTS) service based on neural voice synthesis, designed to generate natural-sounding speech with precise control over voice characteristics, timing, and prosody.
At its core, the TTS workflow consists of:
- Supplying a text input (plain text or SSML-like structured text)
- Selecting a voice ID (predefined or custom)
- Configuring voice parameters
- Receiving an audio output (e.g. MP3, WAV)
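To make those four steps concrete, here is a minimal sketch using only the Python standard library. It reflects my understanding of the public REST API rather than anything authoritative: the endpoint path, the `xi-api-key` header, the `voice_settings` fields (`stability`, `similarity_boost`), the `model_id` value and the `output_format` query parameter are all assumptions to check against the current API reference. The helper names `build_tts_request` and `synthesize` are made up for this post.

```python
import json
import urllib.parse
import urllib.request

# Assumed base URL of the REST API; verify against the official docs.
API_BASE = "https://api.elevenlabs.io/v1"


def build_tts_request(text, voice_id, api_key, *,
                      model_id="eleven_multilingual_v2",  # assumed model name
                      stability=0.5,
                      similarity_boost=0.75,
                      output_format="mp3_44100_128"):
    """Build (but do not send) a text-to-speech HTTP request."""
    # Steps 2 and 4: the voice ID goes in the path, the audio format in the query.
    query = urllib.parse.urlencode({"output_format": output_format})
    url = f"{API_BASE}/text-to-speech/{urllib.parse.quote(voice_id, safe='')}?{query}"

    # Steps 1 and 3: the text input plus the voice parameters.
    payload = {
        "text": text,
        "model_id": model_id,
        "voice_settings": {
            "stability": stability,
            "similarity_boost": similarity_boost,
        },
    }
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"xi-api-key": api_key, "Content-Type": "application/json"},
        method="POST",
    )


def synthesize(text, voice_id, api_key, out_path="speech.mp3", **options):
    """Send the request and write the returned audio bytes to out_path."""
    request = build_tts_request(text, voice_id, api_key, **options)
    with urllib.request.urlopen(request, timeout=60) as response:
        audio = response.read()
    with open(out_path, "wb") as f:
        f.write(audio)
    return out_path
```

Calling `synthesize("Hello there", "YOUR_VOICE_ID", "YOUR_API_KEY")` would then write `speech.mp3` to the current directory, assuming the key is valid and the account is allowed to use the API. Keeping the request construction separate from the network call makes it easy to inspect what is being sent before spending any credits.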
The only weird thing is that they blocked my free account as soon as I started using it, so I had to upgrade. Well, that’s probably their business model.