Text To Speech

Users are billed for text-to-speech (TTS) based on the duration of the generated audio, measured in seconds. Each request is billed for at least one second, which covers fixed per-request system overhead. Beyond that minimum, users pay only for the audio they generate, so costs scale directly with usage and stay predictable.

Model          Audio Second Cost
Chatterbox     $0.0030 / Second
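
The billing rule above can be sketched as a small cost estimator. This is a minimal illustration, not an official calculator: the function name `estimate_tts_cost` is hypothetical, and the sketch assumes that fractional seconds beyond the one-second minimum are billed as-is rather than rounded up, which the pricing text does not specify.

```python
from decimal import Decimal

# Rate taken from the pricing table (Chatterbox).
CHATTERBOX_RATE_PER_SECOND = Decimal("0.0030")
# Minimum billable duration per request, per the billing description.
MIN_BILLED_SECONDS = Decimal("1")


def estimate_tts_cost(audio_seconds, rate=CHATTERBOX_RATE_PER_SECOND):
    """Estimate the USD charge for one TTS request (hypothetical helper)."""
    duration = Decimal(str(audio_seconds))
    if duration < 0:
        raise ValueError("audio_seconds must be non-negative")
    # Assumption: fractional seconds are billed as-is, not rounded up.
    billed_seconds = max(duration, MIN_BILLED_SECONDS)
    return billed_seconds * rate


# A 0.4-second clip is billed as one full second.
print(estimate_tts_cost(0.4))
# A 12.5-second clip is billed for its actual duration.
print(estimate_tts_cost(12.5))
```

Using `Decimal` instead of floats keeps the sub-cent rate exact when many requests are summed. If billing turns out to round durations up to whole seconds, only the `billed_seconds` line would need to change.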

Stay tuned for more speech models.

Pricing subject to change.
