Worder API
Generate speech from real, verified human voices programmatically. Pay per second of generated audio.
Why build with Worder?
Ethically sourced voices
Every voice is uploaded and verified on camera by the actual voice actor. No scraped data.
Per-second pricing
Each actor sets their per-second rate. Pay only for the audio you generate, or save with subscription tiers.
Actors keep 90%
Voice actors receive 90% of the amount charged for every generation. Payouts are processed through Stripe Connect.
Quickstart
1. Get your API key + add credits
Create an account, then visit Dashboard → Developer API to create a key. Add credits ($5 minimum) — every generation is charged against your balance or a subscription you hold on a voice actor.
2. Find a voice + sample
curl https://worder.com/api/v1/voices \
-H "Authorization: Bearer wdr_YOUR_API_KEY"
Returns voices with IDs, samples, languages, per-second pricing, and subscription tiers. Or browse the marketplace and copy the voice / sample ID from any sample page → "Generate via API".
3. Generate speech
curl -X POST https://worder.com/api/v1/generate \
-H "Authorization: Bearer wdr_YOUR_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"voice_id": "VOICE_ID",
"sample_id": "SAMPLE_ID",
"text": "[friendly] Hello, this is a test of the Worder API. [pause 1] [serious] Now let us discuss the details."
}'
Returns a signed URL to the generated WAV, the number of seconds produced, the amount charged, and whether the charge came from a subscription or from your credits.
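The same call can be made from Python. This is a minimal sketch using only the standard library; the helper names build_generate_request and generate are illustrative and not part of any official Worder client, and the response is assumed to be a JSON object.

```python
import json
import urllib.request

API_BASE = "https://worder.com/api/v1"


def build_generate_request(api_key, voice_id, text, sample_id=None):
    """Build (but do not send) a POST request for /api/v1/generate."""
    body = {"voice_id": voice_id, "text": text}
    if sample_id is not None:
        # sample_id is optional; it is omitted when not supplied.
        body["sample_id"] = sample_id
    return urllib.request.Request(
        f"{API_BASE}/generate",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def generate(api_key, voice_id, text, sample_id=None):
    """Send the request and return the decoded JSON response (assumed shape)."""
    request = build_generate_request(api_key, voice_id, text, sample_id)
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

Keeping request construction separate from sending makes it possible to inspect the payload before spending credits.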
Voice Types
Single-emotion
One recording sample. The entire script is generated in a uniform tone. Direction tags are not available. Best for narration, announcements, and consistent-style content.
Multi-emotion
13 emotional styles recorded by the voice actor. Buyers use direction tags like [happy], [angry], [calm] to switch emotions mid-script. Best for ads, storytelling, characters, and dynamic content.
Available tags: normal, excited, happy, confident, fast, calm, surprised, angry, thoughtful, sad, annoyed, scared, nervous.
The voiceType field in the API response tells you which type a voice is. Use descriptiveTags (up to 5 keywords like “deep”, “warm”, “professional”) to search for voices that match a specific quality.
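One way to use these two fields is to filter the voice list on the client. The sketch below assumes each voice is a dict with voiceType and descriptiveTags keys, as in the API response; the function name find_voices and the case-insensitive tag matching are assumptions made for illustration.

```python
def find_voices(voices, voice_type=None, tags=()):
    """Return voices of the given type that carry every requested tag."""
    wanted = {tag.lower() for tag in tags}
    matches = []
    for voice in voices:
        if voice_type and voice.get("voiceType") != voice_type:
            continue
        # descriptiveTags may be missing or null; treat both as "no tags".
        have = {tag.lower() for tag in voice.get("descriptiveTags") or []}
        if wanted <= have:
            matches.append(voice)
    return matches
```

For example, find_voices(voices, voice_type="MULTI_EMOTION", tags=["deep", "warm"]) keeps only multi-emotion voices tagged both "deep" and "warm".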
API Reference
GET /api/v1/voices
List voices that have been verified on camera and have at least one sample.
Query Parameters
language — Filter by language (e.g. "English")
search — Case-insensitive name search
limit — Results per page (default 20, max 100)
offset — Pagination offset
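The query parameters above can be assembled as follows. This is a sketch: the helper names voices_url and page_offsets are hypothetical, and clamping limit into the 1 to 100 range on the client is an assumption about sensible usage, not documented server behavior.

```python
from urllib.parse import urlencode

VOICES_URL = "https://worder.com/api/v1/voices"


def voices_url(language=None, search=None, limit=20, offset=0):
    """Build the list-voices URL, clamping limit to the documented maximum."""
    params = {"limit": max(1, min(limit, 100)), "offset": max(0, offset)}
    if language:
        params["language"] = language
    if search:
        params["search"] = search
    return f"{VOICES_URL}?{urlencode(params)}"


def page_offsets(total, limit=20):
    """Offsets needed to walk every page, given the total from a response."""
    return list(range(0, total, limit))
```

With a total of 42 and the default limit of 20, page_offsets(42) yields [0, 20, 40], so three requests cover the full list.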
Response
{
"voices": [
{
"id": "...",
"name": "Professional Narrator",
"language": "English (United States)",
"accent": null,
"voiceType": "MULTI_EMOTION",
"descriptiveTags": ["deep", "warm", "professional"],
"samples": [
{ "id": "...", "label": "normal" },
{ "id": "...", "label": "happy" },
{ "id": "...", "label": "angry" }
],
"user": {
"id": "...",
"name": "Sarah Mitchell",
"pricePerSecondCents": 5,
"volumeDiscounts": [
{ "threshold": 3600, "discountPercent": 20 }
],
"storefront": { "slug": "sarahmitchell" }
}
}
],
"total": 42,
"limit": 20,
"offset": 0
}
POST /api/v1/generate
Generate speech from a voice. If you have an active subscription for that voice actor, seconds are deducted from the subscription. Otherwise, your credits are charged at the actor's per-second rate.
Request Body (JSON)
voice_id (required) — Voice ID
sample_id — Default sample to use for untagged text (defaults to first sample)
text (required) — Text to synthesize (min 30, max 5,000 chars). Supports direction tags and pause tags (see below).
Direction Tags
Prefix each part of your script with a tag name in square brackets that matches one of the voice's sample labels. Worder generates each section with the matching sample, then stitches the audio together.
"text": "[happy] Welcome to the show! [serious] Now let's talk business."
Available tags depend on the voice — check the samples[].label field from /api/v1/voices. Untagged text (or tags that don't match any sample) uses the sample_id you specified, or the first sample.
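The fallback rule can be previewed on the client before sending a script. The sketch below is an illustration of the rule as described, not Worder's actual parser: the function names are hypothetical, label matching is assumed to be case-insensitive, and pause and emphasize tags are not given special treatment here.

```python
import re

TAG_RE = re.compile(r"\[([^\[\]]+)\]")


def split_sections(text):
    """Return (tag, section_text) pairs; tag is None for leading untagged text."""
    sections = []
    tag = None
    position = 0
    for match in TAG_RE.finditer(text):
        chunk = text[position:match.start()].strip()
        if chunk:
            sections.append((tag, chunk))
        tag = match.group(1).strip().lower()
        position = match.end()
    tail = text[position:].strip()
    if tail:
        sections.append((tag, tail))
    return sections


def resolve_samples(text, samples, sample_id=None):
    """Map each section to a sample ID, falling back to sample_id or the first sample."""
    by_label = {sample["label"].lower(): sample["id"] for sample in samples}
    fallback = sample_id or samples[0]["id"]
    return [(by_label.get(tag, fallback), chunk) for tag, chunk in split_sections(text)]
```

Running resolve_samples against the samples array of a voice shows which sections would silently fall back because their tag does not match any label.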
Pause Tag
Insert silence between sections. Specify the duration in seconds (integer or decimal, max 30). Text after a pause continues in the previous tag's style.
"text": "[excited] Big news! [pause 2] [whisper] But keep it between us."
"text": "[calm] Take a breath. [pause 1.5] Now continue."
"text": "[happy] Hola! [pausa 3] [serious] Ahora hablemos en serio."
Both [pause N] and [pausa N] are supported, and either form works in any language.
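Pause durations can be checked locally against the 30-second limit before a request is sent. This is a sketch under assumptions: the regular expression is a guess at the accepted syntax (case-insensitive, whitespace before the number), and find_pauses is a hypothetical helper name.

```python
import re

# Matches [pause N] and [pausa N] with an integer or decimal N.
PAUSE_RE = re.compile(r"\[(?:pause|pausa)\s+(\d+(?:\.\d+)?)\]", re.IGNORECASE)
MAX_PAUSE_SECONDS = 30


def find_pauses(text):
    """Return every pause duration in the script, in order of appearance."""
    durations = []
    for match in PAUSE_RE.finditer(text):
        seconds = float(match.group(1))
        if seconds > MAX_PAUSE_SECONDS:
            raise ValueError(
                f"pause of {seconds}s exceeds the {MAX_PAUSE_SECONDS}s maximum"
            )
        durations.append(seconds)
    return durations
```

Summing the returned list gives the total silence a script requests.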
Emphasize Tag
Use [emphasize] to boost the volume of the next section. It does not change the voice style — it uses the same sample as the previous tag. Text after [emphasize] continues in the previous tag's style.
"text": "[calm] Take a deep breath. [emphasize] Now give it everything you've got!"
Also supported: [emphasis], [enfatizar], [énfasis].
Pronunciation Overrides
Use {written|spoken} to control how specific words are pronounced. The TTS engine receives the “spoken” version; the quality check compares against the “written” version.
"text": "{Pulse|Pols} es la energía que necesitas. {Nike|Naiki} también lo sabe."
Useful for brand names, foreign words, or technical terms that the TTS mispronounces.
Segment Requirements
Each tagged section must be at least 30 characters. Shorter segments produce unreliable audio. If multiple sections are too short, the API returns all errors at once.
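A client-side check can catch short sections before the request is made and, like the API, report every problem at once. This sketch rests on assumptions: length is measured on the text between tags after trimming whitespace, every bracketed tag (including pause tags) is treated as a section boundary, and the overall 30 to 5,000 character limit is applied to the raw string. The name validate_script is hypothetical.

```python
import re

TAG_RE = re.compile(r"\[[^\[\]]+\]")
MIN_TEXT_CHARS = 30
MAX_TEXT_CHARS = 5000
MIN_SECTION_CHARS = 30


def validate_script(text):
    """Return a list of error strings; an empty list means the script looks valid."""
    errors = []
    if len(text) < MIN_TEXT_CHARS:
        errors.append(f"text is {len(text)} chars; minimum is {MIN_TEXT_CHARS}")
    if len(text) > MAX_TEXT_CHARS:
        errors.append(f"text is {len(text)} chars; maximum is {MAX_TEXT_CHARS}")
    # Split on tags and drop the empty pieces left between adjacent tags.
    sections = [chunk.strip() for chunk in TAG_RE.split(text) if chunk.strip()]
    for number, section in enumerate(sections, start=1):
        if len(section) < MIN_SECTION_CHARS:
            errors.append(
                f"section {number} is {len(section)} chars; minimum is {MIN_SECTION_CHARS}"
            )
    return errors
```

Collecting errors in a list rather than raising on the first one mirrors the API's behavior of returning all segment errors together.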
After generation, Worder transcribes the output with Whisper and compares it to the input. If the match is below 90%, the generation is rejected (422) — you are not charged.
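The exact similarity metric is not documented here. As a rough local stand-in, the sketch below compares word sequences with Python's difflib.SequenceMatcher after stripping tags and punctuation. Both the metric and the function names are assumptions, so scores will not match Worder's; the sketch only illustrates how a 90% threshold behaves.

```python
import re
from difflib import SequenceMatcher

TAG_RE = re.compile(r"\[[^\[\]]+\]")
QUALITY_THRESHOLD = 0.90  # the 90% threshold described above


def words(text):
    """Lowercased words of a script, with bracketed tags removed."""
    return re.findall(r"\w+", TAG_RE.sub(" ", text).lower())


def similarity(script, transcript):
    """Word-level similarity between 0.0 and 1.0 (stand-in metric)."""
    return SequenceMatcher(None, words(script), words(transcript)).ratio()


def passes_quality_check(script, transcript):
    """True when the stand-in similarity meets the threshold."""
    return similarity(script, transcript) >= QUALITY_THRESHOLD
```

If a script uses pronunciation overrides, the written form is the one to pass as script, since that is what the quality check compares against.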
Response
{
"audio_url": "https://...",
"voice_id": "...",
"sample_id": "...",
"seconds_generated": 12,
"cost_cents": 60,
"charged_via": "credits",
"subscription_id": null,
"balance_cents": 9940
}
Error Codes
400 — Validation error (text too short, language mismatch, segment too short). May contain multiple errors separated by newlines.
401 — Invalid or missing API key
402 — Insufficient credits (response includes required_cents, balance_cents)
404 — Voice or sample not found
422 — Quality check failed (response includes similarity, transcript, quality_failed: true). You are not charged.
502 — Generation failed (TTS service error)
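A client might translate these status codes into next steps along the following lines. This is a sketch: plan_for_error and its action names are hypothetical, the 400 message is passed in as a plain string because the name of the field that carries it is not specified here, and treating 502 as retryable is an assumption.

```python
def plan_for_error(status, payload=None, message=""):
    """Map an error status (plus any decoded JSON payload) to a suggested action."""
    payload = payload or {}
    if status == 400:
        # Validation errors may arrive as several newline-separated messages.
        lines = [line.strip() for line in message.split("\n") if line.strip()]
        return {"action": "fix_request", "errors": lines}
    if status == 401:
        return {"action": "check_api_key"}
    if status == 402:
        shortfall = payload["required_cents"] - payload["balance_cents"]
        return {"action": "add_credits", "shortfall_cents": shortfall}
    if status == 404:
        return {"action": "check_ids"}
    if status == 422:
        # Quality check failed; no charge was made.
        return {
            "action": "revise_text",
            "charged": False,
            "similarity": payload.get("similarity"),
        }
    if status == 502:
        return {"action": "retry"}
    return {"action": "unknown"}
```

For a 402 with required_cents of 60 and balance_cents of 10, the suggested top-up shortfall is 50 cents, although any top-up is still subject to the $5 minimum.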
Pricing
Per-second pricing — each voice actor sets their own rate (minimum $0.01/sec). You pay only for the seconds of audio actually produced.
Subscriptions — commit to a monthly bucket of seconds (1h, 10h, or 100h) and save 20-60% off the per-second rate. Buy subscriptions in the UI; they apply automatically when you call the API.
Credits — one-time top-ups to your account, $5 minimum. Credits are charged when no subscription covers a generation.
Revenue split — voice actors receive 90% of every generation (paid via Stripe Connect). The remaining 10% covers payment processing, hosting, AI compute, and Worder's fee.
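The arithmetic behind these rules is sketched below. The function names are hypothetical, rounding down to whole cents is an assumption, and pricing a subscription bucket as the undiscounted cost minus a percentage is a guess at how the discount is applied, based on the volumeDiscounts entries in the voices response.

```python
ACTOR_SHARE_PERCENT = 90


def generation_cost_cents(seconds_generated, price_per_second_cents):
    """Credits charged for one generation at the actor's per-second rate."""
    return seconds_generated * price_per_second_cents


def split_revenue(cost_cents):
    """Return (actor_cents, remainder_cents) for a 90/10 split, rounding down."""
    actor = cost_cents * ACTOR_SHARE_PERCENT // 100
    return actor, cost_cents - actor


def discounted_bucket_cents(bucket_seconds, price_per_second_cents, discount_percent):
    """Assumed price of a subscription bucket after its percentage discount."""
    full_price = bucket_seconds * price_per_second_cents
    return full_price * (100 - discount_percent) // 100
```

With the figures from the sample responses, 12 seconds at 5 cents per second costs 60 cents, of which 54 cents go to the actor. Under the stated assumption, a 3,600-second bucket at the same rate with a 20% discount would come to 14,400 cents.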