
Voice settings

The ZeliSpeech voice_settings sliders are accepted and mapped onto the Zeli Turbo delivery knobs. Send the same request body your integration already sends: voice_settings is translated, not rejected.

The mapping

voice_settings	Range	Maps to	Effect
`style`	0–1	`exaggeration`	expressiveness (0 ≈ neutral, 1 ≈ animated)
`stability`	0–1	`cfg_weight` ↑ + `temperature` ↓	steadier / more monotone as it rises
`speed`	~0.7–1.2	`speed` (WSOLA)	pitch-preserving pace
`similarity_boost`	0–1	(no-op)	timbre is fixed by the reference clip
`use_speaker_boost`	bool	(no-op)	timbre is fixed by the reference clip
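The table can be read as a small translation function. Below is a minimal sketch of that idea, assuming simple linear interpolation and made-up knob ranges; the engine's real curves and coefficients are not documented here, and `map_voice_settings`, `_lerp`, `_clamp01`, and the `*_RANGE` constants are hypothetical names, not part of any Zeli SDK.

```python
# Hypothetical sketch: the engine's real curves and coefficients are not
# documented, so these (low, high) knob ranges are made up for illustration.
EXAGGERATION_RANGE = (0.25, 1.0)   # style 0 -> 0.25, style 1 -> 1.0
CFG_WEIGHT_RANGE = (0.3, 0.7)      # rises with stability
TEMPERATURE_RANGE = (1.0, 0.5)     # falls as stability rises


def _clamp01(x: float) -> float:
    """Clamp a slider value into the documented 0-1 range."""
    return max(0.0, min(1.0, float(x)))


def _lerp(lo: float, hi: float, t: float) -> float:
    """Linear interpolation: returns lo at t=0 and hi at t=1."""
    return lo + (hi - lo) * t


def map_voice_settings(settings: dict) -> dict:
    """Translate voice_settings sliders into Turbo knobs.

    Knobs whose sliders are absent are left out, on the assumption that
    the engine then applies its own default for them.
    """
    knobs = {}
    if "style" in settings:
        # style -> exaggeration (0 ~ neutral, 1 ~ animated)
        knobs["exaggeration"] = _lerp(*EXAGGERATION_RANGE, _clamp01(settings["style"]))
    if "stability" in settings:
        s = _clamp01(settings["stability"])
        # One slider drives two knobs: cfg_weight up, temperature down
        knobs["cfg_weight"] = _lerp(*CFG_WEIGHT_RANGE, s)
        knobs["temperature"] = _lerp(*TEMPERATURE_RANGE, s)
    if "speed" in settings:
        # Passed straight through to the pitch-preserving WSOLA stretch
        knobs["speed"] = float(settings["speed"])
    # similarity_boost and use_speaker_boost are accepted but ignored:
    # timbre is fixed by the reference clip.
    return knobs


knobs = map_voice_settings({"stability": 0.6, "style": 0.35, "speed": 1.0})
```

Note that stability is the only slider that moves two knobs at once, which is why raising it makes delivery both steadier and more monotone.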

A deliberate approximation

This mapping is an intentional approximation: the voice-settings sliders and the Turbo knobs don't measure the same axes, so the translation is tuned to feel familiar rather than to be identical. Omit voice_settings entirely to get the engine's lively-neutral default.

Sending voice settings

from zeli_tts import ZeliSpeech, VoiceSettings

client = ZeliSpeech(api_key="sk-zeli-...", base_url="https://voice.your-domain.com")

audio = client.text_to_speech.convert(
    voice_id="zeli-voice-1",
    text="Steadier and a little more expressive.",
    voice_settings=VoiceSettings(stability=0.6, style=0.35, speed=1.0),
)

curl -X POST "https://voice.your-domain.com/v1/text-to-speech/zeli-voice-1" \
  -H "Authorization: Bearer sk-zeli-..." -H "Content-Type: application/json" \
  -d '{
        "text": "Steadier and a little more expressive.",
        "voice_settings": { "stability": 0.6, "style": 0.35, "speed": 1.0 }
      }' --output out.mp3
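For integrations that can't take the SDK as a dependency, the curl call translates directly to Python's standard library. This is a minimal sketch: `build_tts_request` is a hypothetical helper name, and `BASE_URL` and `API_KEY` are the same placeholders used above. It only builds the request; the commented lines show one way to send it and save the audio.

```python
import json
import urllib.request
from typing import Optional

BASE_URL = "https://voice.your-domain.com"  # placeholder from the examples above
API_KEY = "sk-zeli-..."                     # placeholder key


def build_tts_request(voice_id: str, text: str,
                      voice_settings: Optional[dict] = None) -> urllib.request.Request:
    """Build (but do not send) the same POST the curl example makes."""
    body = {"text": text}
    if voice_settings is not None:
        # Leaving voice_settings out entirely yields the engine's default delivery.
        body["voice_settings"] = voice_settings
    return urllib.request.Request(
        url=f"{BASE_URL}/v1/text-to-speech/{voice_id}",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {API_KEY}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


req = build_tts_request(
    "zeli-voice-1",
    "Steadier and a little more expressive.",
    {"stability": 0.6, "style": 0.35, "speed": 1.0},
)
# To send it and write the audio to disk, as curl's --output out.mp3 does:
# with urllib.request.urlopen(req) as resp, open("out.mp3", "wb") as f:
#     f.write(resp.read())
```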

Reading defaults

Two endpoints return voice-settings shapes for a UI:

GET /v1/voices/settings/default

GET /v1/voices/{voice_id}/settings

Both require authentication when an API key is configured, and both return the ZeliSpeech voice-settings shape, so a client that reads defaults before rendering its sliders works unchanged.
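A UI might fetch one of these endpoints and keep only the slider fields before rendering. The sketch below uses only the standard library and assumes two things not stated on this page: that these endpoints accept the same Bearer header as the text-to-speech call, and that the response is a JSON object keyed by the voice_settings names from the mapping table. `settings_request`, `slider_state`, and `SLIDER_KEYS` are hypothetical names.

```python
import json
import urllib.request
from typing import Optional

BASE_URL = "https://voice.your-domain.com"  # placeholder
API_KEY = "sk-zeli-..."                     # placeholder; assumed Bearer auth

# Field names taken from the mapping table; assumed to match the response keys.
SLIDER_KEYS = ("stability", "similarity_boost", "style", "use_speaker_boost", "speed")


def settings_request(voice_id: Optional[str] = None) -> urllib.request.Request:
    """GET request for the global defaults, or for one voice if voice_id is given."""
    if voice_id is None:
        path = "/v1/voices/settings/default"
    else:
        path = f"/v1/voices/{voice_id}/settings"
    return urllib.request.Request(
        f"{BASE_URL}{path}",
        headers={"Authorization": f"Bearer {API_KEY}"},
        method="GET",
    )


def slider_state(payload: bytes) -> dict:
    """Keep only the slider fields a UI needs from a JSON response body."""
    data = json.loads(payload)
    return {k: data[k] for k in SLIDER_KEYS if k in data}


# with urllib.request.urlopen(settings_request("zeli-voice-1")) as resp:
#     sliders = slider_state(resp.read())
```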

The underlying knobs

If you use the native Zeli contract directly (or the Zeli SDK), you can set the Turbo knobs without going through the voice_settings mapping:

exaggeration (float, optional)

Expressiveness. Higher is more animated.

cfg_weight (float, optional)

Guidance weight. Higher is steadier / more faithful, lower is more dynamic.

temperature (float, optional)

Sampling randomness. Lower is more monotone and stable.

speed (float, optional, default: 1.0)

Pitch-preserving time-stretch (WSOLA).
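Because every knob is optional, they map naturally onto a record whose unset fields are simply omitted. This is a sketch only: `TurboKnobs` and `to_body` are hypothetical names, and how this dict is placed inside a native request is not covered here, so check the native contract before wiring it in.

```python
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class TurboKnobs:
    """The four delivery knobs from the native Zeli contract.

    None means "not sent", leaving that knob to the engine's default.
    """
    exaggeration: Optional[float] = None  # higher = more animated
    cfg_weight: Optional[float] = None    # higher = steadier / more faithful
    temperature: Optional[float] = None   # lower = more monotone and stable
    speed: Optional[float] = None         # WSOLA time-stretch; engine default is 1.0

    def to_body(self) -> dict:
        """Return only the knobs that were explicitly set."""
        return {k: v for k, v in asdict(self).items() if v is not None}


knobs = TurboKnobs(cfg_weight=0.7, temperature=0.5).to_body()
```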

A per-emotion library on the native contract maps named tones to all four knobs at once, so presets like excited, calm, and serious sound distinct. The voice_settings mapping above is a convenient shorthand for the same delivery system.
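A per-emotion library is essentially a lookup table from a tone name to all four knobs. The sketch below shows only that structure: the preset names come from this page, but every number is invented for illustration and is not the engine's actual library. `EMOTION_PRESETS` and `knobs_for` are hypothetical names.

```python
# Illustrative values only: NOT the engine's real per-emotion library.
EMOTION_PRESETS = {
    "excited": {"exaggeration": 0.9, "cfg_weight": 0.3, "temperature": 0.9, "speed": 1.1},
    "calm":    {"exaggeration": 0.3, "cfg_weight": 0.6, "temperature": 0.6, "speed": 0.9},
    "serious": {"exaggeration": 0.4, "cfg_weight": 0.7, "temperature": 0.5, "speed": 1.0},
}


def knobs_for(emotion: str, **overrides: float) -> dict:
    """Look up a named tone, then let the caller override individual knobs."""
    try:
        preset = EMOTION_PRESETS[emotion]
    except KeyError:
        raise ValueError(
            f"unknown emotion {emotion!r}; known: {sorted(EMOTION_PRESETS)}"
        ) from None
    # Later keys win, so overrides replace the preset's values
    return {**preset, **overrides}


calm_at_normal_pace = knobs_for("calm", speed=1.0)
```

Merging overrides on top of a preset keeps the named tone as the starting point while letting a caller nudge a single knob, and because all four knobs are set together, presets stay distinct from one another.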
