Create speech

Synthesize a whole clip and return it as binary audio. This is the ZeliSpeech text_to_speech.convert endpoint.

POST /v1/text-to-speech/{voice_id}

Related: /stream (chunked, low time to first byte), .../with-timestamps (clip plus approximate alignment), and .../stream/with-timestamps (streamed NDJSON alignment).

Path parameters

voice_id (string, required)

A Zeli voice id or a custom-… id. Unknown ids fall back to the default voice unless ZELI_EL_STRICT_VOICES=1 is set, in which case the request fails with 404 voice_not_found. See Voices.

Query parameters

output_format (string, optional, default: mp3_44100_128)

The container, codec, and sample rate of the returned audio. This is a query parameter, not a body field, and the Accept header is ignored. An unknown value returns 400 invalid_output_format. See Output formats for the full list.
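As a rough illustration of where output_format goes, the sketch below builds (but does not send) the request with only the Python standard library. The helper name build_speech_request is hypothetical, the host is the placeholder used in the examples on this page, and the Bearer header follows the cURL example.

```python
import json
from urllib.parse import quote, urlencode
from urllib.request import Request

BASE_URL = "https://voice.your-domain.com"  # placeholder host, as in the examples


def build_speech_request(voice_id, text, api_key, output_format="mp3_44100_128"):
    """Build (without sending) a POST request for the create-speech endpoint.

    Hypothetical helper for illustration; not part of any SDK.
    """
    # output_format travels in the query string, never in the JSON body.
    query = urlencode({"output_format": output_format})
    url = f"{BASE_URL}/v1/text-to-speech/{quote(voice_id, safe='')}?{query}"
    # text is the only required body field.
    body = json.dumps({"text": text}).encode("utf-8")
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
    return Request(url, data=body, headers=headers, method="POST")


req = build_speech_request("zeli-voice-1", "A finished clip in one call.", "sk-zeli-...")
print(req.get_method(), req.full_url)
```

Passing the returned object to urllib.request.urlopen would send it; that step is left out so the sketch runs without a server.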

Request body

JSON. Only text is required; everything else is optional. Unknown fields are accepted and ignored (never a 422).

text (string, required)

The text to synthesize.

model_id (string, optional)

Accepted and ignored: every request runs on Zeli Turbo. The field is kept so that clients with hard-coded model strings keep working. See Models.

voice_settings (object, optional)

Voice-settings sliders (stability, style, speed, …) mapped onto Turbo delivery. See Voice settings.

language_code (string, optional)

Accepted; the engine is multilingual by default.

seed (integer, optional)

Accepted where supported for reproducibility.

previous_text (string, optional)

Accepted for continuity context.

next_text (string, optional)

Accepted for continuity context.

Tolerant by design

Fields the engine doesn't use — pronunciation_dictionary_locators, apply_text_normalization, optimize_streaming_latency, and the like — are accepted and ignored so an existing request body never triggers a validation error.
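In practice this means a client can forward an existing request body without stripping anything. The sketch below is one way to do that, assuming a hypothetical helper named prepare_body; the field values are illustrative only, and the local check covers just the one rule this page states (text is required). How the server treats an empty string is not specified here, so the helper does not test for it.

```python
import json


def prepare_body(legacy_body):
    """Serialize an existing request body for a create-speech call.

    Hypothetical helper for illustration; it performs one local check
    and otherwise passes every field through untouched.
    """
    # text is the only required field, so fail early if it is missing.
    if not isinstance(legacy_body.get("text"), str):
        raise ValueError("request body needs a string 'text' field")
    # No stripping: fields the engine ignores may stay in the payload.
    return json.dumps(legacy_body)


payload = prepare_body({
    "text": "A finished clip in one call.",
    "model_id": "some-hard-coded-model",   # accepted and ignored
    "optimize_streaming_latency": 3,        # accepted and ignored (illustrative value)
    "apply_text_normalization": "auto",     # accepted and ignored (illustrative value)
})
print(payload)
```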

Response

200 OK with the binary audio as the body. The Content-Type matches the output_format (e.g. audio/mpeg for MP3). The resolved voice is echoed in the x-zeli-voice header.
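Because unknown voice ids can resolve to the default voice, it may be worth comparing the x-zeli-voice header with the id that was requested. The following is a minimal sketch with a hypothetical helper, summarize_response. It assumes the header echoes the id in the same form the client sent; if the server normalizes ids, the comparison would need adjusting.

```python
def summarize_response(headers, requested_voice):
    """Summarize the headers of a 200 response from create speech.

    Hypothetical helper; `headers` is any mapping of header names to values.
    """
    # HTTP header names are case-insensitive, so normalize before lookup.
    lowered = {name.lower(): value for name, value in headers.items()}
    resolved = lowered.get("x-zeli-voice")
    return {
        "content_type": lowered.get("content-type"),
        "resolved_voice": resolved,
        # True when the server reports a different voice than the one requested.
        "voice_was_aliased": resolved is not None and resolved != requested_voice,
    }


info = summarize_response(
    {"Content-Type": "audio/mpeg", "X-Zeli-Voice": "zeli-voice-1"},
    requested_voice="zeli-voice-1",
)
print(info)
```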

Examples

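# Python: synthesize a clip and save it as out.mp3.
from zeli_tts import ZeliSpeech

client = ZeliSpeech(api_key="sk-zeli-...", base_url="https://voice.your-domain.com")

# model_id is accepted and ignored by the server.
audio = client.text_to_speech.convert(
    voice_id="zeli-voice-1",
    text="A finished clip in one call.",
    model_id="zeli-turbo",
    output_format="mp3_44100_128",
)
# convert() yields the audio as byte chunks; write them out in order.
with open("out.mp3", "wb") as f:
    for chunk in audio:
        f.write(chunk)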

# cURL: the same request over raw HTTP; output_format goes in the query string.
curl -X POST \
  "https://voice.your-domain.com/v1/text-to-speech/zeli-voice-1?output_format=mp3_44100_128" \
  -H "Authorization: Bearer sk-zeli-..." \
  -H "Content-Type: application/json" \
  -d '{"text":"A finished clip in one call.","voice_settings":{"stability":0.5,"style":0.3}}' \
  --output out.mp3

// JavaScript (Node.js, ES module): synthesize a clip and save it as out.mp3.
import { ZeliSpeech } from "zelispeech";
import { writeFileSync } from "node:fs";

const client = new ZeliSpeech({
  apiKey: "sk-zeli-...",
  baseUrl: "https://voice.your-domain.com",
});

const audio = await client.textToSpeech.convert("zeli-voice-1", {
  text: "A finished clip in one call.",
  outputFormat: "mp3_44100_128",
});
// Collect the chunks, then write the whole clip in one call.
const chunks = [];
for await (const c of audio) chunks.push(c);
writeFileSync("out.mp3", Buffer.concat(chunks));

Errors

| HTTP status | `status` | When |
| --- | --- | --- |
| `400` | `invalid_output_format` | Unknown `output_format` value |
| `401` | `missing_api_key` / `invalid_api_key` | Auth required / bad key |
| `404` | `voice_not_found` | Unknown voice with `ZELI_EL_STRICT_VOICES=1` |
| `422` | (none) | Request-validation error (pydantic detail array) |
| `503` | `model_loading` | Engine still warming up |

See Errors for envelope shapes.
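Of the errors above, 503 model_loading is the one most likely to clear on its own, so a client might retry it. The sketch below shows one possible approach using exponential backoff. The function name, the attempt count, and the delay schedule are all assumptions for illustration; this page does not specify a recommended retry policy. The send argument stands in for whatever function performs the HTTP call and returns a (status, body) pair.

```python
import time


def convert_with_retry(send, max_attempts=5, base_delay=0.5, sleep=time.sleep):
    """Call send() until it returns a non-503 status or attempts run out.

    Hypothetical helper; `send` must return a (http_status, body) tuple.
    """
    for attempt in range(max_attempts):
        status, body = send()
        # Only 503 (engine still warming up) is treated as retryable here.
        if status != 503:
            return status, body
        # Back off 0.5 s, 1 s, 2 s, ... but do not sleep after the last attempt.
        if attempt < max_attempts - 1:
            sleep(base_delay * (2 ** attempt))
    return status, body


# Demo with a fake sender: two warm-up responses, then success.
fake_responses = [(503, b""), (503, b""), (200, b"fake-audio-bytes")]
status, body = convert_with_retry(
    lambda: fake_responses.pop(0),
    sleep=lambda seconds: None,  # skip real waiting in the demo
)
print(status, len(body))
```

Other statuses in the table (400, 401, 404, 422) point at the request itself, so the sketch returns them immediately rather than retrying.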
