The journey continues. We are officially starting work on our text-to-speech service. Following the success of our STT platform, we're bringing the same focus on quality, API simplicity, and MCP support to the world of voice generation.
sovavoice STT: speaker diarization is now live. The service can identify who said what within a recording and label each segment with a speaker ID. It works alongside word-level timestamps and all existing response formats. See the full roadmap or try it at sovavoice.com.
sovavoice STT update: the OpenAI-compatible endpoint is now live. POST /v1/audio/transcriptions works as a drop-in replacement for the OpenAI Whisper API. It supports all response formats: json, text, verbose_json, srt, vtt.
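For anyone wiring this up by hand, here is a minimal sketch of what a request to that endpoint might look like, using only the Python standard library. The path and the response formats come from the update above. Everything else is an assumption: the base URL is a placeholder (the post does not state the API host), bearer-token auth and the `model` field are carried over from the OpenAI convention, and `build_transcription_request` is a hypothetical helper name.

```python
import uuid
import urllib.request

# Assumption: placeholder host. The real API base URL is not given in the post.
BASE_URL = "https://api.example.invalid"

# Response formats listed in the update.
RESPONSE_FORMATS = {"json", "text", "verbose_json", "srt", "vtt"}


def build_transcription_request(audio_bytes, filename, api_key,
                                response_format="json", model="whisper-1"):
    """Build (but do not send) a multipart POST to /v1/audio/transcriptions.

    Hypothetical helper. The "model" value and Bearer auth follow the OpenAI
    convention and are assumptions, not confirmed sovavoice behavior.
    """
    if response_format not in RESPONSE_FORMATS:
        raise ValueError(f"unsupported response_format: {response_format}")

    boundary = uuid.uuid4().hex
    parts = []

    # Plain text form fields.
    for name, value in {"model": model, "response_format": response_format}.items():
        parts.append(
            f'--{boundary}\r\n'
            f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
            f'{value}\r\n'.encode("utf-8")
        )

    # The audio file itself.
    parts.append(
        f'--{boundary}\r\n'
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        f'Content-Type: application/octet-stream\r\n\r\n'.encode("utf-8")
        + audio_bytes + b"\r\n"
    )

    # Closing boundary.
    parts.append(f"--{boundary}--\r\n".encode("utf-8"))

    return urllib.request.Request(
        BASE_URL + "/v1/audio/transcriptions",
        data=b"".join(parts),
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
    )
```

To actually send it, pass the returned object to `urllib.request.urlopen` once the base URL and key are real values. If the endpoint behaves as a drop-in replacement, an existing OpenAI client pointed at a different base URL should also work, but that is worth verifying against the service's own docs.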
New project initialized: TTS. Our goal is to create a natural-sounding text-to-speech engine that's as easy to use as our STT service. Researching models and infrastructure to ensure low latency and high fidelity.
STT is officially live at sovavoice.com. We've crossed the finish line for our speech-to-text service, overcoming latency and scaling hurdles.
A few months back we started building a speech-to-text service. It is designed primarily for developers, with a clean API and straightforward integration. It is also built with AI tooling in mind, including MCP support for seamless use with AI assistants. And it is simple enough for regular users who just want things to work.