Qwen3 TTS VD Realtime is Alibaba's text-to-speech model, priced from $0.143 per 1M input tokens. It is a real-time Qwen3 TTS model with voice design support for streaming custom-voice speech synthesis.
Specifications
Canonical ID: alibaba-qwen3-tts-vd-realtime
Type: Text to Speech
Status: Active
Creator: Alibaba
Providers: Alibaba Qwen
Input Modalities: Text
Output Modalities: Audio

Capabilities

Input (1 of 5 supported)
Supported: Text
Not supported: Image, Audio, Video, PDF

Output (1 of 5 supported)
Supported: Audio
Not supported: Text, Image, Video, Embedding

Capabilities (0 of 13 supported)
Not supported: Reasoning, Adaptive Reasoning, Function Calling, Parallel Function Calling, Structured Outputs, Native JSON Schema, Web Search, URL Context, Computer Use, Code Execution, File Search, Prompt Caching, Assistant Prefill

Pricing by Provider

Prices are in US dollars ($) per 1M input tokens.

Provider | Model ID | Standard input | Batch input
Alibaba Qwen | qwen3-tts-vd-realtime-2025-12-16 | $0.143 | $0.0717
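
As a rough illustration of how the Alibaba Qwen rates listed above ($0.143 standard, $0.0717 batch, per 1M input tokens) translate into a bill, the following Python sketch estimates input cost for a given volume. It assumes billing is linear in the token count and that "1M" means exactly 1,000,000 tokens; how the provider counts billable units for speech input is not stated on this page. The function name `input_cost` and the sample volume are hypothetical and chosen only for illustration.

```python
from decimal import Decimal

# Rates copied from the pricing listed on this page, in USD per 1M input tokens.
STANDARD_RATE = Decimal("0.143")
BATCH_RATE = Decimal("0.0717")

# Assumption: "1M" means exactly 1,000,000 tokens.
TOKENS_PER_MILLION = Decimal(1_000_000)


def input_cost(tokens: int, rate_per_million: Decimal) -> Decimal:
    """Estimate the input cost in USD, assuming linear per-token billing."""
    if tokens < 0:
        raise ValueError("tokens must be non-negative")
    return Decimal(tokens) * rate_per_million / TOKENS_PER_MILLION


if __name__ == "__main__":
    tokens = 2_500_000  # hypothetical monthly volume, for illustration only
    print(f"standard: ${input_cost(tokens, STANDARD_RATE)}")
    print(f"batch:    ${input_cost(tokens, BATCH_RATE)}")
```

Decimal arithmetic is used here so that small per-token amounts do not pick up binary floating-point rounding noise; actual invoices may round or bucket usage differently.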

Versions

Version | Released | Context | Input / 1M | Output / 1M | Status
EAGLE Qwen 2.5 3B Instruct | - | - | - | - | Available
Qwen3.7 Plus | - | 1.0M | $0.320 | $1.28 | Available
Qwen3.7 Max | - | 1.0M | $1.25 | $3.75 | Available
Qwen3.6 Max Preview | - | 262K | $1.04 | $6.24 | Available
Qwen3.6 27B | - | 262K | $0.150 | $0.500 | Available
Qwen3.6 35B A3B | - | 262K | $0.140 | $0.450 | Available
Qwen3.6 Plus | - | 1.0M | $0.325 | $1.95 | Available
Qwen3 Max Thinking | - | 262K | $0.780 | $3.90 | Available
Qwen3 Max | - | 262K | $0.780 | $3.90 | Available
Qwen3 Coder 30B A3B | - | 262K | $0.150 | $0.600 | Available
Qwen3 TTS VD Realtime | - | - | $0.143 | - | Current

Model IDs

alibaba-qwen3-tts-vd-realtime
qwen3-tts-vd-realtime-2025-12-16
qwen3-tts-vd-realtime-2026-01-15