Zeligate Voice API
A self-hosted, low-latency text-to-speech service that speaks the
ZeliSpeech API. The request/response shapes follow mainstream TTS-API conventions,
so most integrations move by changing only two things: the base_url and the API key.
Data stays sovereign: your text and audio never leave your own infrastructure.
This is Sovereign Voice, built for enterprise.
Why it exists
Products that already integrate a mainstream TTS API shouldn't need a rewrite to
run on your own hardware. The Zeligate Voice API exposes the ZeliSpeech
/v1 surface on top of the Zeli Turbo engine, so an existing integration — the
ZeliSpeech Python/JS SDKs, a raw-HTTP call, or a third-party product with an
"API key + base URL" field — migrates by swapping the endpoint.
Same shapes, your box. Change base_url to your server and the API
key to your ZELI_API_KEY — nothing else in the calling code
changes.
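As a rough illustration of the swap, the sketch below builds (but does not send) a full-clip synthesis request with only the Python standard library. Only the path `POST /v1/text-to-speech/{voice_id}` and the `ZELI_API_KEY` name come from this page. The `Authorization: Bearer` header, the `{"text": ...}` body, and the example host are assumptions made for illustration; check the endpoint pages for the real auth header and request fields.

```python
import json
import urllib.request
from urllib.parse import quote


def build_tts_request(base_url: str, api_key: str, voice_id: str, text: str) -> urllib.request.Request:
    """Build a full-clip synthesis request without sending it."""
    # Path taken from the capability table on this page.
    url = f"{base_url.rstrip('/')}/v1/text-to-speech/{quote(voice_id, safe='')}"
    # ASSUMPTION: a JSON body with a single "text" field.
    body = json.dumps({"text": text}).encode("utf-8")
    headers = {
        "Content-Type": "application/json",
        # ASSUMPTION: bearer-token auth; the real header name may differ.
        "Authorization": f"Bearer {api_key}",
    }
    return urllib.request.Request(url, data=body, headers=headers, method="POST")
```

To send it, read the key from the `ZELI_API_KEY` environment variable, pass the returned request to `urllib.request.urlopen`, and write the response bytes to a file. The only values that differ from a hosted integration are `base_url` and `api_key`.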
Start here
- Your first synthesized clip in under a minute: SDK, cURL, or a browser.
- The core capability: turn text into natural, expressive speech.
- Low-TTFB HTTP streaming and a realtime input-streaming WebSocket.
- A checklist for pointing an existing TTS integration at your box.
What you can do today
| Capability | Endpoint | Status |
|---|---|---|
| Synthesize a full clip | POST /v1/text-to-speech/{voice_id} | Ready |
| Stream audio (low TTFB) | POST /v1/text-to-speech/{voice_id}/stream | Ready |
| Realtime input streaming | WSS /v1/text-to-speech/{voice_id}/stream-input | Ready |
| Character timestamps | .../with-timestamps | Approximate |
| List / get voices | GET /v1/voices · /v2/voices · /v1/voices/{id} | Ready |
| List models | GET /v1/models | Ready |
| Account / quota | GET /v1/user · /v1/user/subscription | Stub |
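The three text-to-speech paths in the table above differ only in their suffix and, for the WebSocket, in the URL scheme. A minimal sketch of a helper that derives each URL from one base URL follows. The function name `tts_url` and the `mode` labels are hypothetical, and the timestamps variant is left out because its full path is not spelled out on this page.

```python
from urllib.parse import quote, urlsplit, urlunsplit

# Suffix appended to /v1/text-to-speech/{voice_id} for each capability.
_SUFFIXES = {"clip": "", "stream": "/stream", "stream-input": "/stream-input"}


def tts_url(base_url: str, voice_id: str, mode: str = "clip") -> str:
    """Return the endpoint URL for a capability listed in the table."""
    if mode not in _SUFFIXES:
        raise ValueError(f"unknown mode: {mode!r}")
    parts = urlsplit(base_url)
    scheme = parts.scheme
    if mode == "stream-input":
        # The realtime endpoint is a WebSocket: http -> ws, https -> wss.
        scheme = {"http": "ws", "https": "wss"}.get(scheme, scheme)
    path = (
        f"{parts.path.rstrip('/')}/v1/text-to-speech/"
        f"{quote(voice_id, safe='')}{_SUFFIXES[mode]}"
    )
    return urlunsplit((scheme, parts.netloc, path, "", ""))
```

Keeping the scheme switch in one place means a single configured base URL can serve both the HTTP and the WebSocket clients.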
This documentation starts with the shipped /v1 swap-out surface and expands
as the platform grows. Per-key concurrency, real character alignment,
voice CRUD, and standalone-platform IaC are on the roadmap. See Errors and
the individual endpoint pages for exact behavior.
How it's built
One FastAPI codebase serves two topologies from the same engine:
- Embedded: a loopback, on-instance TTS for the AI Avatar (native /tts contract), so nothing leaves the box.
- Standalone: the public ZeliSpeech /v1 surface documented here.
The engine renders 24 kHz mono audio and is batch-1 (one stream per GPU worker), so scaling means more workers behind a router, not a bigger box.
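To make the batch-1 constraint concrete, here is a minimal sketch of the bookkeeping a router in front of the workers might do: each worker is leased to exactly one stream at a time, and capacity grows only by adding workers. This is an illustration under stated assumptions, not the platform's actual router. The class name `WorkerPool`, the worker URLs, and the reject-on-timeout policy are all hypothetical.

```python
import queue
from contextlib import contextmanager


class WorkerPool:
    """Leases each batch-1 worker to at most one stream at a time."""

    def __init__(self, worker_urls):
        # Every worker starts idle; the queue holds only idle workers.
        self._idle = queue.Queue()
        for url in worker_urls:
            self._idle.put(url)

    @property
    def idle_count(self) -> int:
        return self._idle.qsize()

    @contextmanager
    def lease(self, timeout=None):
        """Yield an idle worker, waiting up to `timeout` seconds (None waits forever)."""
        try:
            worker = self._idle.get(timeout=timeout)
        except queue.Empty:
            # ASSUMPTION: callers are rejected when no worker frees up in time.
            raise RuntimeError("all workers busy") from None
        try:
            yield worker
        finally:
            # Return the worker to the idle set once the stream ends.
            self._idle.put(worker)
```

A caller would wrap one synthesis stream in `with pool.lease(timeout=...) as worker:` and forward the request to that worker. With N workers the pool serves at most N concurrent streams, which is why adding workers, rather than a larger GPU, raises throughput.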