Quickstart
Turn text into speech in a couple of lines. Every example below points at a
placeholder host — swap https://voice.your-domain.com for your box
and sk-zeli-... for your API key.
Before you start
Base URL: Your Zeligate Voice box, e.g. https://voice.your-domain.com. Do
not include a trailing /v1; the SDK adds it.
API key: Sent as an Authorization: Bearer token.
Required only if the box sets ZELI_API_KEY; an embedded/loopback
box may run open. See Authentication.
Voice ID: A Zeli voice (zeli-voice-1, zeli-voice-2,
zeli-voice-3, zeli-voice-4, zeli-voice-5,
zeli-voice-6…) or any other voice id; unknown ids alias to your
default voice. See Voices.
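The base-URL and API-key rules above can be captured in a small helper. This is a minimal sketch, not part of the Zeli SDK: normalize_base_url and auth_headers are hypothetical names, and the code assumes only what is stated above (no trailing /v1 on the base URL, and a bearer token sent only when a key is configured).

```python
from typing import Optional


def normalize_base_url(url: str) -> str:
    """Strip trailing slashes and a trailing /v1 so the SDK can append its own."""
    url = url.strip().rstrip("/")
    if url.endswith("/v1"):
        url = url[: -len("/v1")]
    return url.rstrip("/")


def auth_headers(api_key: Optional[str]) -> dict:
    """Return the bearer header, or no headers for an open (keyless) box."""
    if not api_key:
        return {}
    return {"Authorization": f"Bearer {api_key}"}
```

Running a configured URL through a check like this before constructing the client would guard against the doubled /v1/v1 path that a trailing /v1 produces.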
1. Synthesize a clip
The wire contract follows mainstream TTS-API conventions, so the ZeliSpeech SDKs synthesize in a couple of lines.
from zeli_tts import ZeliSpeech
client = ZeliSpeech(
    api_key="sk-zeli-...",  # your ZELI_API_KEY
    base_url="https://voice.your-domain.com",  # your Zeli box; NO trailing /v1
)
audio = client.text_to_speech.convert(
    voice_id="zeli-voice-1",
    text="G'day — this is running on our own hardware.",
    model_id="zeli-turbo",  # accepted; the engine is always Zeli Turbo
    output_format="mp3_44100_128",
)
with open("out.mp3", "wb") as f:
    for chunk in audio:
        f.write(chunk)

curl -X POST \
  "https://voice.your-domain.com/v1/text-to-speech/zeli-voice-1?output_format=mp3_44100_128" \
  -H "Authorization: Bearer sk-zeli-..." \
  -H "Content-Type: application/json" \
  -d '{"text":"Hello from Zeli."}' \
  --output out.mp3

import { ZeliSpeech } from "zelispeech";
const client = new ZeliSpeech({
  apiKey: "sk-zeli-...",
  baseUrl: "https://voice.your-domain.com", // NO trailing /v1
});
const audio = await client.textToSpeech.convert("zeli-voice-1", {
  text: "Hello from Zeli.",
  modelId: "zeli-turbo",
  outputFormat: "mp3_44100_128",
});
// `audio` is a stream of MP3 bytes; pipe it to a file or a player.

There's also a first-party Zeli Python SDK for the self-hosted streaming contract: a couple of lines to stream and play audio as it generates.
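For environments without an SDK, the curl call above can be reproduced with Python's standard library alone. The following is a rough sketch under stated assumptions: build_tts_request is a hypothetical helper name, and the path, query parameter, headers, and body are copied from the curl example rather than from a formal API reference. It only builds the request; nothing is sent.

```python
import json
import urllib.parse
import urllib.request
from typing import Optional


def build_tts_request(
    base_url: str,
    voice_id: str,
    text: str,
    api_key: Optional[str] = None,
    output_format: str = "mp3_44100_128",
) -> urllib.request.Request:
    """Build (but do not send) the POST request shown in the curl example."""
    # The voice id goes in the path; output_format goes in the query string.
    path = f"/v1/text-to-speech/{urllib.parse.quote(voice_id, safe='')}"
    query = urllib.parse.urlencode({"output_format": output_format})
    headers = {"Content-Type": "application/json"}
    if api_key:  # an open box needs no Authorization header
        headers["Authorization"] = f"Bearer {api_key}"
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}{path}?{query}",
        data=json.dumps({"text": text}).encode("utf-8"),
        headers=headers,
        method="POST",
    )
```

Passing the result to urllib.request.urlopen and writing the response bytes to out.mp3 would mirror the curl command; the network call is left out here so the sketch runs without a box.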
2. Stream for low latency
For conversational or long-form use, stream so playback can start after the first sentence instead of waiting for the whole clip.
from zeli_tts import ZeliSpeech, stream
client = ZeliSpeech(api_key="sk-zeli-...", base_url="https://voice.your-domain.com")
audio = client.text_to_speech.stream(
    voice_id="zeli-voice-1",
    text="Streaming means the first audio arrives after the first sentence.",
    output_format="mp3_44100_128",
)
stream(audio)  # plays as it arrives (needs ffplay / mpv)

curl -N -X POST \
  "https://voice.your-domain.com/v1/text-to-speech/zeli-voice-1/stream?output_format=mp3_44100_128" \
  -H "Authorization: Bearer sk-zeli-..." \
  -H "Content-Type: application/json" \
  -d '{"text":"Streaming keeps time-to-first-byte low."}' \
  --output stream.mp3

3. Pick a format and a voice
output_format is a query parameter, not a body field. The engine renders
at 24 kHz, so pcm_24000 and wav_24000 need no resampling and are the
cheapest to produce; ulaw_8000 is Twilio-ready.
# telephony-ready μ-law for a Twilio Media Stream
curl -X POST \
  "https://voice.your-domain.com/v1/text-to-speech/zeli-voice-1?output_format=ulaw_8000" \
  -H "Authorization: Bearer sk-zeli-..." -H "Content-Type: application/json" \
  -d '{"text":"Ready for the phone."}' --output out.ulaw

See Output formats for the full menu and Voices for the voice list and aliasing rules.
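The format names used on this page appear to follow a codec_samplerate[_bitrate] pattern. The sketch below splits a name into those parts and flags formats that differ from the engine's 24 kHz render rate. It is an illustration only: the naming layout and the reading of the third field as kbps are assumptions inferred from the examples here (mp3_44100_128, pcm_24000, wav_24000, ulaw_8000), and OutputFormat, parse_output_format, and needs_resampling are hypothetical names.

```python
from typing import NamedTuple, Optional

ENGINE_SAMPLE_RATE = 24_000  # the engine's render rate stated above


class OutputFormat(NamedTuple):
    codec: str
    sample_rate: int
    bitrate_kbps: Optional[int]  # assumed meaning of the third field, e.g. mp3_44100_128


def parse_output_format(name: str) -> OutputFormat:
    """Split a name such as 'mp3_44100_128' into its parts (assumed layout)."""
    parts = name.split("_")
    if len(parts) not in (2, 3):
        raise ValueError(f"unexpected output_format: {name!r}")
    codec, rate = parts[0], int(parts[1])
    bitrate = int(parts[2]) if len(parts) == 3 else None
    return OutputFormat(codec, rate, bitrate)


def needs_resampling(name: str) -> bool:
    """True when the requested rate differs from the engine's 24 kHz."""
    return parse_output_format(name).sample_rate != ENGINE_SAMPLE_RATE
```

Under these assumptions, needs_resampling is False for pcm_24000 and wav_24000 and True for ulaw_8000 and mp3_44100_128, which matches the note that the 24 kHz formats are the cheapest.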
Next steps
Craft natural, expressive reads and steer pacing with punctuation.
Map voice-settings sliders (stability, style, speed) onto Zeli delivery.
Feed text as it's generated and receive audio frames live.
Error envelopes, status codes, and how to handle them.