Qwen TTS Realtime is Alibaba's text-to-speech model, with an 8K context window and up to 8K output tokens. Pricing starts at $345.00 / 1M input and $1721.00 / 1M output. It is a real-time model from Alibaba's Qwen ecosystem, optimized for low-latency streaming speech synthesis.
Specifications
Canonical ID: alibaba-qwen-tts-realtime
Type: Text to Speech
Status: Active
Creator: Alibaba
Providers
Context Window: 8K tokens
Max Output: 8K tokens
Input Modalities: Text
Output Modalities: Audio

Capabilities

Input (1/5)
Supported: Text
Not supported: Image, Audio, Video, PDF

Output (1/5)
Supported: Audio
Not supported: Text, Image, Video, Embedding

Capabilities (0/13)
Supported: none
Not supported: Reasoning, Adaptive Reasoning, Function Calling, Parallel Function Calling, Structured Outputs, Native JSON Schema, Web Search, URL Context, Computer Use, Code Execution, File Search, Prompt Caching, Assistant Prefill

Pricing by Provider

Provider      Model ID           Standard Input ($ / 1M)  Standard Output ($ / 1M)  Batch Input ($ / 1M)  Batch Output ($ / 1M)
Alibaba Qwen  qwen-tts-realtime  $345.00                  $1721.00                  $172.50               $860.50
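
To make the per-1M rates concrete, the sketch below estimates the cost of a single request at the standard and batch rates listed above. It assumes the "1M" unit is tokens (the listing does not state the unit next to the prices), and the names `RATES` and `estimate_cost` are hypothetical helpers, not part of any Alibaba SDK.

```python
from decimal import Decimal

# Rates copied from the pricing row above, in USD per 1M units.
# Assumption: the "1M" unit is tokens, matching the token-based context limits.
RATES = {
    "standard": {"input": Decimal("345.00"), "output": Decimal("1721.00")},
    "batch": {"input": Decimal("172.50"), "output": Decimal("860.50")},
}
PER_MILLION = Decimal(1_000_000)


def estimate_cost(input_tokens: int, output_tokens: int, tier: str = "standard") -> Decimal:
    """Return the estimated USD cost of one request at the given pricing tier."""
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    rate = RATES[tier]  # raises KeyError for an unknown tier
    total = input_tokens * rate["input"] + output_tokens * rate["output"]
    return total / PER_MILLION


if __name__ == "__main__":
    # Example: 2,000 input tokens and the full 8,000-token output budget.
    print("standard:", estimate_cost(2_000, 8_000))
    print("batch:   ", estimate_cost(2_000, 8_000, tier="batch"))
```

With these rates the batch price is exactly half the standard price for both input and output, so any request costs half as much in batch mode. `Decimal` is used instead of `float` to keep the currency arithmetic exact.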

Versions

Version                     Released  Context  Input / 1M  Output / 1M  Status
EAGLE Qwen 2.5 3B Instruct  -         -        -           -            Available
Qwen3.7 Max                 -         1.0M     $1.25       $3.75        Available
Qwen3.6 Max Preview         -         262K     $1.04       $6.24        Available
Qwen3.6 27B                 -         262K     $0.290      $3.20        Available
Qwen3.6 35B A3B             -         262K     $0.140      $1.00        Available
Qwen3.6 Plus                -         1.0M     $0.325      $1.95        Available
Qwen3 Max Thinking          -         262K     $0.780      $3.90        Available
Qwen3 Next 80B A3B          -         128K     $0.140      $1.20        Available
Qwen3 Max                   -         262K     $0.359      $1.43        Available
Qwen3 Max Preview           -         262K     $1.20       $6.00        Available
Qwen TTS Realtime           -         8K       $345.00     $1721.00     Current

Model IDs