Python SDK
The ZeliSpeech Python SDK is a small, dependency-light client
for the ZeliSpeech API. Point it at your server and get audio back in a couple
of lines: text_to_speech.convert(...) and text_to_speech.stream(...) for
synthesis, plus play / save / stream helpers for playback and saving.
The SDK talks to the same /v1 API documented throughout the
API reference. The request/response
shapes follow mainstream TTS-API conventions, so the calls should already look
familiar.
Install
pip install zeli-tts

Runtime deps: requests (HTTP) and websocket-client (streaming). Live playback
(play / stream) shells out to ffplay (ffmpeg), mpv, or afplay.
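Because live playback depends on an external player, it can help to check for one before calling play or stream. The sketch below is not part of the SDK: find_player is a hypothetical helper, and the lookup order is an assumption. It relies only on the standard library's shutil.which.

```python
import shutil

# Player binaries named in the install notes. The order is an assumption,
# not necessarily the order the SDK itself uses.
CANDIDATE_PLAYERS = ("ffplay", "mpv", "afplay")


def find_player(candidates=CANDIDATE_PLAYERS):
    """Return the name of the first candidate found on PATH, or None."""
    for name in candidates:
        if shutil.which(name) is not None:
            return name
    return None


if __name__ == "__main__":
    player = find_player()
    print(player or "no player found; install ffmpeg (for ffplay) or mpv")
```

If this returns None, convert and save should still work, since only live playback shells out to a player.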
Quickstart
from zeli_tts import ZeliSpeech, play, save, stream

client = ZeliSpeech(
    api_key="sk-zeli-...",  # optional if the box runs open
    base_url="https://voice.your-domain.com",
)

# 1) Stream: first audio arrives after the first sentence, not the whole passage
audio = client.text_to_speech.stream(
    voice_id="zeli-voice-1",
    text="Hey there, this is Zeli.",
    output_format="mp3_44100_128",
)
for chunk in audio:
    ...  # audio bytes as they generate

# 2) Stream + play live (needs ffplay or mpv)
stream(client.text_to_speech.stream(voice_id="zeli-voice-1", text="Playing as I generate."))

# 3) One-shot: a finished clip
audio = client.text_to_speech.convert(
    voice_id="zeli-voice-1",
    text="Hello world.",
    output_format="mp3_44100_128",
)
save(audio, "hello.mp3")

Client
ZeliSpeech(base_url, *, api_key=None, timeout=60.0)

base_url: http://host:8000 or https://voice.your-domain.com;
the ws(s):// URL is derived for streaming. Do not
include a trailing /v1.
api_key: Sent as an Authorization: Bearer token on every request.
Required only when the box sets ZELI_API_KEY.
timeout: Per-request timeout in seconds.
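To make the base_url rules concrete, here is one way the streaming URL could be derived from it. This is an illustrative sketch, not the SDK's actual code: derive_ws_url is a hypothetical name, and raising ValueError (rather than the SDK's ConfigurationError) on a bad URL is an assumption.

```python
from urllib.parse import urlsplit, urlunsplit


def derive_ws_url(base_url: str) -> str:
    """Map an http(s) base URL to its ws(s) counterpart.

    Hypothetical helper for illustration; the SDK's own derivation
    may differ in details.
    """
    parts = urlsplit(base_url.rstrip("/"))
    if parts.scheme not in ("http", "https"):
        raise ValueError(f"base_url must start with http:// or https://, got {base_url!r}")
    if parts.path.endswith("/v1"):
        # The docs ask callers not to include the API version prefix.
        raise ValueError("base_url must not include a trailing /v1")
    ws_scheme = "wss" if parts.scheme == "https" else "ws"
    return urlunsplit((ws_scheme, parts.netloc, parts.path, "", ""))


if __name__ == "__main__":
    print(derive_ws_url("http://host:8000"))
    print(derive_ws_url("https://voice.your-domain.com"))
```

The scheme swap keeps TLS consistent: an https base URL yields wss, and a plain http one yields ws.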
Synthesis — client.text_to_speech
convert(voice_id, text, *, model_id="zeli-turbo", voice_settings=None,
        output_format="mp3_44100_128") -> bytes
stream(voice_id, text, *, model_id="zeli-turbo", voice_settings=None,
       output_format="mp3_44100_128") -> AudioStream

- convert(...) returns the finished audio as bytes in the requested output_format.
- stream(...) returns an AudioStream; iterate it for audio bytes as they generate. .read() drains it to a single blob.
- output_format accepts the full menu; see Output formats.
- voice_settings maps onto Turbo delivery; see Voice settings.
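Since an AudioStream is described as something you iterate for bytes, a streamed result can be written to disk chunk by chunk instead of being buffered with .read(). The helper below is a sketch with a hypothetical name (write_chunks); it accepts any iterable of bytes, so it makes no assumptions about AudioStream beyond iteration.

```python
from typing import Iterable


def write_chunks(chunks: Iterable[bytes], path: str) -> int:
    """Write each chunk to `path` as it arrives; return total bytes written."""
    total = 0
    with open(path, "wb") as f:
        for chunk in chunks:
            f.write(chunk)
            total += len(chunk)
    return total


# Intended use with the SDK (not executed here; needs a running server):
#   audio = client.text_to_speech.stream(voice_id="zeli-voice-1", text="Hello.")
#   write_chunks(audio, "hello.mp3")
```

Writing incrementally keeps memory use flat for long passages, at the cost of leaving a partial file behind if generation fails midway.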
Voices — client.voices
client.voices.list()  # -> [Voice(id, label, description, custom), ...]
client.voices.add(name="My voice", file="me.wav", description="")  # -> Voice
client.voices.delete("custom-abc123")

add clones a voice zero-shot from a short reference clip (~10–20 s of clean
single-speaker audio; file may be a path, a file object, or bytes).
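Before uploading a reference clip, it may be worth checking that its length falls in the suggested range. The sketch below uses the standard library's wave module, so it only handles uncompressed WAV files. Both function names are hypothetical, and the 10 to 20 second bounds are taken from the guidance above, not enforced by the SDK as far as this page states.

```python
import wave


def wav_duration_seconds(source) -> float:
    """Duration of a WAV clip; `source` is a path string or a binary file object."""
    with wave.open(source, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def is_within_reference_range(duration: float, min_s: float = 10.0, max_s: float = 20.0) -> bool:
    """True if the clip length falls inside the suggested reference range."""
    return min_s <= duration <= max_s


# Intended use (not executed here):
#   if is_within_reference_range(wav_duration_seconds("me.wav")):
#       client.voices.add(name="My voice", file="me.wav")
```

This checks length only; whether the audio is clean and single-speaker still has to be judged by ear.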
Helpers
from zeli_tts import play, save, stream

play(audio)             # play a finished clip
stream(audio_stream)    # play chunks live; returns the collected audio
save(audio, "out.mp3")  # write bytes to a file

Errors
All derive from ZeliSpeechError:
| Exception | When |
|---|---|
| ConfigurationError | bad base_url, or a missing dep / player |
| ConnectionError | server unreachable (refused / DNS / timeout) |
| APIError | non-2xx from an HTTP endpoint (.status_code, .body) |
| GenerationError | server sent an error event mid-synthesis |
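One way to use this hierarchy is to retry only the failures that are likely to be transient, such as an unreachable server, and let the rest propagate. The helper below is a generic sketch, not an SDK feature: call_with_retries is a hypothetical name, and the exception types to retry are passed in by the caller.

```python
import time


def call_with_retries(fn, retry_on, attempts=3, delay=0.5):
    """Call fn(); retry on the given exception type(s) with a linearly growing pause.

    Exceptions outside `retry_on` propagate immediately. The last failure
    is re-raised once `attempts` is exhausted.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return fn()
        except retry_on:
            if attempt == attempts:
                raise
            time.sleep(delay * attempt)


# Intended use with the SDK (not executed here; the import path is an assumption):
#   from zeli_tts import ConnectionError as ZeliConnectionError
#   audio = call_with_retries(
#       lambda: client.text_to_speech.convert(voice_id="zeli-voice-1", text="Hi."),
#       retry_on=ZeliConnectionError,
#   )
```

Note that the SDK's ConnectionError shares its name with Python's built-in ConnectionError, so importing it under an alias, as in the comment above, avoids shadowing the built-in.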
The sync client ships first; an async variant (built on websockets and httpx)
will follow. A published zeli-tts package (Python) and a TypeScript/npm
zelispeech SDK are on the near-term roadmap.