CosyVoice 2 is Alibaba's second-generation text-to-speech synthesis model, featuring natural, expressive voice generation. It is listed here under the Language type, with pricing starting at $0.287 per 1M input tokens.
Specifications
Canonical ID: alibaba-cosyvoice-2
Type: Language
Status: Active
Creator: Alibaba
Providers
Input Modalities: Text
Output Modalities: Text

Capabilities

Input (1 of 5 supported)
Supported: Text
Not supported: Image, Audio, Video, PDF

Output (1 of 5 supported)
Supported: Text
Not supported: Image, Audio, Video, Embedding

Capabilities (0 of 13 supported)
Not supported: Reasoning, Adaptive Reasoning, Function Calling, Parallel Function Calling, Structured Outputs, Native JSON Schema, Web Search, URL Context, Computer Use, Code Execution, File Search, Prompt Caching, Assistant Prefill

Pricing by Provider

Prices in US dollars ($) per 1M tokens.

Provider      Model ID      Standard input ($ / 1M)  Batch input ($ / 1M)
Alibaba Qwen  cosyvoice-v2  $0.287                   $0.143
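
As a rough illustration of how the per-1M prices listed above turn into a bill, the sketch below multiplies a token count by the standard or batch input rate. The function name `estimate_cost` and the tier labels `"standard"` and `"batch"` are hypothetical names chosen for this example, not part of any Alibaba API. It assumes usage is billed linearly per input token, as the "per 1M tokens" unit suggests.

```python
from decimal import Decimal

# USD per 1M input tokens for cosyvoice-v2, copied from the pricing listed above.
# The tier keys are illustrative labels, not provider identifiers.
PRICE_PER_MILLION = {
    "standard": Decimal("0.287"),
    "batch": Decimal("0.143"),
}


def estimate_cost(input_tokens: int, tier: str = "standard") -> Decimal:
    """Estimate the USD cost of `input_tokens`, assuming linear per-token billing."""
    if input_tokens < 0:
        raise ValueError("input_tokens must be non-negative")
    if tier not in PRICE_PER_MILLION:
        raise ValueError(f"unknown tier: {tier!r}")
    return PRICE_PER_MILLION[tier] * Decimal(input_tokens) / Decimal(1_000_000)


if __name__ == "__main__":
    tokens = 2_500_000
    for tier in PRICE_PER_MILLION:
        print(f"{tier}: ${estimate_cost(tokens, tier):.4f} for {tokens:,} input tokens")
```

For 2.5M input tokens this works out to roughly $0.72 at the standard rate and $0.36 at the batch rate. `Decimal` is used instead of floats so that small per-token amounts do not pick up binary rounding error.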

Versions

Version              Released  Context  Input / 1M  Output / 1M  Status
CosyVoice 3 Flash    -         -        $0.130      -            Available
CosyVoice 3 Plus     -         -        $0.260      -            Available
CosyVoice 3.5 Flash  -         -        $0.116      -            Available
CosyVoice 3.5 Plus   -         -        $0.220      -            Available
CosyVoice 2          -         -        $0.287      -            Current

Model IDs

alibaba-cosyvoice-2
cosyvoice-v2