Grok TTS is xAI's text-to-speech model, priced from $15.00 per 1M input tokens. Part of xAI's Grok line, it generates expressive speech across five voices, with speech-tag control and telephony codec support.
Specifications
Canonical ID: xai-grok-tts
Type: Text to Speech
Status: Active
Creator: xAI
Providers: Vercel AI Gateway
Input Modalities: Text
Output Modalities: Audio
Release Date: 3 months ago

Capabilities

Input (1/5)
Supported: Text
Not supported: Image, Audio, Video, PDF

Output (1/5)
Supported: Audio
Not supported: Text, Image, Video, Embedding

Capabilities (0/13)
Not supported: Reasoning, Adaptive Reasoning, Function Calling, Parallel Function Calling, Structured Outputs, Native JSON Schema, Web Search, URL Context, Computer Use, Code Execution, File Search, Prompt Caching, Assistant Prefill

Pricing by Provider

Prices in US Dollars ($), per 1M tokens.

Provider | Standard Input ($ / 1M)
Vercel AI Gateway (xai/grok-tts) | $15.00
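
As a rough illustration of the listed rate, the sketch below estimates input cost in Python. It assumes billing is linear in input tokens at $15.00 per 1M, as the pricing table states, and ignores any minimums, rounding rules, or provider fees, none of which this page specifies. The function name `estimate_input_cost` and the four-decimal rounding are illustrative choices, not part of any xAI or Vercel API.

```python
from decimal import Decimal

# Listed rate from the pricing table: USD per 1M input tokens.
PRICE_PER_MILLION_INPUT = Decimal("15.00")


def estimate_input_cost(
    input_tokens: int,
    price_per_million: Decimal = PRICE_PER_MILLION_INPUT,
) -> Decimal:
    """Estimate the USD cost of a request, assuming purely linear pricing.

    Hypothetical helper: real invoices may apply rounding or minimums
    that are not described on this page.
    """
    if input_tokens < 0:
        raise ValueError("input_tokens must be non-negative")
    cost = Decimal(input_tokens) * price_per_million / Decimal(1_000_000)
    # Round to four decimal places for display purposes only.
    return cost.quantize(Decimal("0.0001"))


if __name__ == "__main__":
    for tokens in (1_000, 250_000, 1_000_000):
        print(f"{tokens:>9} input tokens -> ${estimate_input_cost(tokens)}")
```

Using `Decimal` rather than `float` keeps the arithmetic exact for currency values; the default rate can be overridden if another provider lists a different price.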

Versions

Version | Released | Context | Input / 1M | Output / 1M | Status
Grok 4.3 | - | 1.0M | $1.25 | $2.50 | Available
Grok 4.20 | - | 2.0M | $1.25 | $2.50 | Available
Grok 4.20 Multi-Agent | - | 2.0M | $1.25 | $2.50 | Available
Grok 4.20 Multi-Agent Beta | - | 2.0M | $1.25 | $2.50 | Available
Grok 4.20 Non-Reasoning | - | 2.0M | $1.25 | $2.50 | Available
Grok 4.20 Reasoning | - | 2.0M | $1.25 | $2.50 | Available
Grok 4.1 Fast | - | 2.0M | $0.200 | $0.500 | Deprecated
Grok 4 Fast | - | 131K | $0.200 | $0.500 | Deprecated
Grok 4 Fast Non-Reasoning | - | 2.0M | $0.200 | $0.500 | Deprecated
Grok 4 | - | 256K | $3.00 | $15.00 | Deprecated
Grok TTS | - | - | $15.00 | - | Current

Model IDs

xai-grok-tts (canonical ID)
xai/grok-tts (Vercel AI Gateway)