Create speech
Synthesize a whole clip and return it as binary audio. This is the ZeliSpeech
text_to_speech.convert endpoint.
Related: .../stream (chunked, low time to first byte),
.../with-timestamps (clip + approximate alignment), and
.../stream/with-timestamps (streamed NDJSON alignment).
Path parameters
voice_id: A Zeli voice id or a custom-… id. Unknown
ids alias to the default voice unless ZELI_EL_STRICT_VOICES=1.
See Voices.
Query parameters
output_format: The container, codec, and sample rate of the returned audio.
This is a query parameter, not a body field. See
Output formats for the full menu.
The Accept header is ignored.
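Because output_format travels in the query string, it has to be attached to the URL rather than placed in the JSON body. A minimal sketch of building that URL with the Python standard library, assuming the /v1/text-to-speech/{voice_id} path shape seen in the curl example on this page. build_speech_url is a hypothetical helper, not part of the ZeliSpeech SDK.

```python
from urllib.parse import quote, urlencode


def build_speech_url(base_url: str, voice_id: str, output_format: str) -> str:
    """Return the create-speech URL with output_format as a query parameter."""
    # Assumed path shape, taken from the curl example: /v1/text-to-speech/{voice_id}
    path = f"/v1/text-to-speech/{quote(voice_id, safe='')}"
    # output_format belongs in the query string, never in the JSON body.
    query = urlencode({"output_format": output_format})
    return f"{base_url.rstrip('/')}{path}?{query}"


print(build_speech_url("https://voice.your-domain.com", "zeli-voice-1", "mp3_44100_128"))
```

Percent-encoding the voice id keeps the path valid even if a custom id ever contains characters that are not URL-safe.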
Request body
JSON. Only text is required; everything else is optional. Unknown fields are
accepted and ignored (never a 422).
text: The text to synthesize. Required.
model_id: Accepted and ignored; every request runs on Zeli Turbo. Kept so hard-coded model strings resolve. See Models.
voice_settings: Voice-settings sliders (stability, style,
speed, …) mapped onto Turbo delivery. See
Voice settings.
Accepted; the engine is multilingual by default.
Accepted where supported for reproducibility.
Accepted for continuity context.
Accepted for continuity context.
Fields the engine doesn't use — pronunciation_dictionary_locators,
apply_text_normalization, optimize_streaming_latency,
and the like — are accepted and ignored so an existing request body never
triggers a validation error.
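Since unknown body fields are ignored rather than rejected, an output_format left in the body of an older request would presumably be dropped without any error. The sketch below lifts it into the query parameters before sending. split_request is a hypothetical helper, and the silent-drop behaviour is an inference from the rules stated on this page, not something the page says outright.

```python
import json
from typing import Any, Dict, Tuple


def split_request(legacy_body: Dict[str, Any]) -> Tuple[Dict[str, str], str]:
    """Split a request dict into (query params, JSON body string)."""
    body = dict(legacy_body)  # copy so the caller's dict is left untouched
    if not body.get("text"):
        raise ValueError("text is the only required body field")
    query: Dict[str, str] = {}
    # output_format is a query parameter, so move it out of the body.
    if "output_format" in body:
        query["output_format"] = str(body.pop("output_format"))
    # Everything else is sent unchanged; fields the engine does not use are ignored.
    return query, json.dumps(body)


query, body = split_request({
    "text": "A finished clip in one call.",
    "output_format": "mp3_44100_128",
    "optimize_streaming_latency": 2,
})
print(query)
print(body)
```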
Response
200 OK with the binary audio as the body. The Content-Type matches the
output_format (e.g. audio/mpeg for MP3). The resolved voice is echoed in the
x-zeli-voice header.
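One use for the x-zeli-voice header is spotting a silent fallback: with strict voices off, a mistyped voice id is aliased to the default voice instead of failing. A minimal sketch, assuming the header carries a voice id in the same form as the one requested. was_aliased is a hypothetical helper, and zeli-default is a made-up id used only for illustration.

```python
from typing import Mapping, Optional


def resolved_voice(headers: Mapping[str, str]) -> Optional[str]:
    """Return the x-zeli-voice value, matching the header name case-insensitively."""
    for name, value in headers.items():
        if name.lower() == "x-zeli-voice":
            return value
    return None


def was_aliased(requested_voice_id: str, headers: Mapping[str, str]) -> bool:
    """True when the server reports a different voice than the one requested."""
    resolved = resolved_voice(headers)
    return resolved is not None and resolved != requested_voice_id


# "zeli-default" is a hypothetical default voice id, for illustration only.
print(was_aliased("zeli-voise-1", {"X-Zeli-Voice": "zeli-default"}))
```

A missing header is treated as "not aliased" here so the check never raises on responses that omit it.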
Examples
Python:

from zeli_tts import ZeliSpeech

client = ZeliSpeech(api_key="sk-zeli-...", base_url="https://voice.your-domain.com")
audio = client.text_to_speech.convert(
    voice_id="zeli-voice-1",
    text="A finished clip in one call.",
    model_id="zeli-turbo",
    output_format="mp3_44100_128",
)
# The result is iterated in chunks; write them out in order.
with open("out.mp3", "wb") as f:
    for chunk in audio:
        f.write(chunk)

curl:

curl -X POST \
  "https://voice.your-domain.com/v1/text-to-speech/zeli-voice-1?output_format=mp3_44100_128" \
  -H "Authorization: Bearer sk-zeli-..." \
  -H "Content-Type: application/json" \
  -d '{"text":"A finished clip in one call.","voice_settings":{"stability":0.5,"style":0.3}}' \
  --output out.mp3

Node.js:

import { ZeliSpeech } from "zelispeech";
import { writeFileSync } from "node:fs";

const client = new ZeliSpeech({
  apiKey: "sk-zeli-...",
  baseUrl: "https://voice.your-domain.com",
});
const audio = await client.textToSpeech.convert("zeli-voice-1", {
  text: "A finished clip in one call.",
  outputFormat: "mp3_44100_128",
});
// Collect the streamed chunks, then write them as one file.
const chunks = [];
for await (const c of audio) chunks.push(c);
writeFileSync("out.mp3", Buffer.concat(chunks));

Errors
| HTTP status | status | When |
|---|---|---|
| 400 | invalid_output_format | Unknown output_format value |
| 401 | missing_api_key / invalid_api_key | No API key supplied / API key not valid |
| 404 | voice_not_found | Unknown voice with ZELI_EL_STRICT_VOICES=1 |
| 422 | (none) | Request-validation error (pydantic detail array) |
| 503 | model_loading | Engine still warming up |
See Errors for envelope shapes.
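Of the statuses listed, 503 model_loading is the one that is plausibly worth retrying, since the engine is only warming up. A minimal sketch of a retry loop with exponential backoff. convert_with_retry and its send callback are hypothetical, and the attempt count and delays are arbitrary choices rather than documented recommendations.

```python
import time
from typing import Callable, Tuple


def convert_with_retry(
    send: Callable[[], Tuple[int, bytes]],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bytes:
    """Call send() until it returns 200, retrying only on 503.

    send is any zero-argument callable returning (http_status, body_bytes).
    """
    for attempt in range(max_attempts):
        status, body = send()
        if status == 200:
            return body
        if status == 503 and attempt < max_attempts - 1:
            # Engine still warming up: wait 1x, 2x, 4x, ... base_delay, then retry.
            sleep(base_delay * 2 ** attempt)
            continue
        # 400 / 401 / 404 / 422 will not succeed on retry, so fail fast.
        raise RuntimeError(f"create speech failed with HTTP {status}")
    raise RuntimeError("max_attempts must be at least 1")
```

Passing sleep as a parameter keeps the loop easy to exercise in tests without real delays; in production the default time.sleep applies.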