Qwen2.5 VL 32B Instruct is a 32-billion-parameter vision-language model from Alibaba's Qwen2.5-VL series that understands and reasons over both images and text. It has a 131K-token context window, produces up to 8K output tokens, and is available from 3 providers, starting at $0.20 / 1M input tokens and $0.60 / 1M output tokens.
Specifications
Canonical ID: alibaba-qwen2-5-vl-32b-instruct
Type: Language
Status: Active
Creator: Alibaba
Providers: 3
Context Window: 131K tokens
Max Output: 8K tokens
Input Modalities: Image
Output Modalities: Text
Parameters: 32B
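
The context window and maximum output together bound how large a request can be. The sketch below shows one way to check a request against those limits. It rests on two assumptions that are not stated on this page: that "131K" and "8K" mean 131,072 and 8,192 tokens, and that output tokens count against the same context window as the prompt. The function name `max_output_budget` is hypothetical.

```python
# Assumption: "131K" and "8K" are read as powers of two (131,072 and 8,192).
CONTEXT_WINDOW = 131_072
MAX_OUTPUT = 8_192


def max_output_budget(prompt_tokens: int, requested_output: int = MAX_OUTPUT) -> int:
    """Return how many output tokens can be requested for a given prompt size.

    Assumes prompt and output tokens share one context window, which is
    common but may differ by provider.
    """
    if prompt_tokens < 0 or requested_output < 0:
        raise ValueError("token counts must be non-negative")
    remaining = CONTEXT_WINDOW - prompt_tokens
    if remaining <= 0:
        raise ValueError("prompt alone fills or exceeds the context window")
    # The request is capped by the model's output limit and the space left.
    return min(requested_output, MAX_OUTPUT, remaining)
```

For a short prompt the output cap is the binding limit; only prompts within 8,192 tokens of the window size reduce the available output further.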

Capabilities

Input (1 of 5)
Supported: Image
Not marked as supported: Text, Audio, Video, PDF

Output (1 of 5)
Supported: Text
Not marked as supported: Image, Audio, Video, Embedding

Capabilities (1 of 13)
Supported: Function Calling
Not marked as supported: Reasoning, Adaptive Reasoning, Parallel Function Calling, Structured Outputs, Native JSON Schema, Web Search, URL Context, Computer Use, Code Execution, File Search, Prompt Caching, Assistant Prefill

Pricing by Provider

Prices are in US dollars ($) per 1M tokens. A dash means no batch price is listed.

Provider | Model ID | Standard Input | Standard Output | Batch Input | Batch Output
Alibaba Qwen | qwen2.5-vl-32b-instruct | $1.40 | $4.20 | $0.70 | $2.10
DeepInfra | deepinfra/Qwen/Qwen2.5-VL-32B-Instruct | $0.20 | $0.60 | - | -
Fireworks AI | fireworks_ai/accounts/fireworks/models/qwen2p5-vl-32b-instruct | $0.90 | $0.90 | - | -
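
As a worked example of the per-1M-token arithmetic, the following sketch estimates the cost of a request from the standard prices listed above. The names `Price`, `estimate_cost`, and `cheapest_provider` are hypothetical, and the token counts in the usage section are made up for illustration. How many tokens a given image consumes is not stated on this page, so image inputs must already be converted to a token count.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class Price:
    input_per_m: Decimal   # USD per 1M input tokens
    output_per_m: Decimal  # USD per 1M output tokens


# Standard (non-batch) prices copied from the pricing section.
STANDARD_PRICES = {
    "Alibaba Qwen": Price(Decimal("1.40"), Decimal("4.20")),
    "DeepInfra": Price(Decimal("0.20"), Decimal("0.60")),
    "Fireworks AI": Price(Decimal("0.90"), Decimal("0.90")),
}

TOKENS_PER_UNIT = Decimal(1_000_000)


def estimate_cost(provider: str, input_tokens: int, output_tokens: int) -> Decimal:
    """Cost in USD: tokens times the per-1M price, divided by 1M."""
    price = STANDARD_PRICES[provider]
    total = (Decimal(input_tokens) * price.input_per_m
             + Decimal(output_tokens) * price.output_per_m)
    return total / TOKENS_PER_UNIT


def cheapest_provider(input_tokens: int, output_tokens: int) -> str:
    """Provider with the lowest estimated cost for this token mix."""
    return min(STANDARD_PRICES,
               key=lambda name: estimate_cost(name, input_tokens, output_tokens))


if __name__ == "__main__":
    # Hypothetical request: 100,000 input tokens and 2,000 output tokens.
    for name in STANDARD_PRICES:
        print(f"{name}: ${estimate_cost(name, 100_000, 2_000)}")
```

`Decimal` is used instead of `float` so that small per-request amounts do not pick up binary rounding error when summed over many requests.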

Versions

Version | Released | Context | Input / 1M | Output / 1M | Status
Voyage Multimodal 3.5 | - | - | - | - | Available
Qwen2.5 VL 72B Instruct | - | 131K | $0.130 | $0.400 | Available
Qwen2.5 VL 32B Instruct | - | 131K | $0.200 | $0.600 | Current
Qwen2.5 VL 3B Instruct | - | 131K | $0.200 | $0.200 | Available
Qwen2.5 VL 7B Instruct | - | 131K | $0.200 | $0.200 | Available
Rolm OCR | - | 128K | $0.200 | $0.200 | Available

Model IDs

accounts/fireworks/models/qwen2p5-vl-32b-instruct
alibaba-qwen2-5-vl-32b-instruct
deepinfra/Qwen/Qwen2.5-VL-32B-Instruct
fireworks_ai/accounts/fireworks/models/qwen2p5-vl-32b-instruct
qwen2.5-vl-32b-instruct
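
Several of the IDs above appear to be the same provider-native ID with a routing prefix in front (`deepinfra/`, `fireworks_ai/`). That reading is an inference from the list, not something this page states. Under that assumption, a minimal sketch for separating the prefix from the native ID could look like this; `KNOWN_PREFIXES` and `split_provider` are hypothetical names.

```python
from typing import Optional, Tuple

# Assumption: these leading segments are routing prefixes, inferred from the IDs listed.
KNOWN_PREFIXES = {"deepinfra", "fireworks_ai"}


def split_provider(model_id: str) -> Tuple[Optional[str], str]:
    """Split 'prefix/native-id' into (prefix, native-id).

    Returns (None, model_id) unchanged when the first path segment is not
    a known routing prefix, so bare IDs pass through untouched.
    """
    head, sep, rest = model_id.partition("/")
    if sep and head in KNOWN_PREFIXES:
        return head, rest
    return None, model_id
```

With this split, `fireworks_ai/accounts/fireworks/models/qwen2p5-vl-32b-instruct` and `accounts/fireworks/models/qwen2p5-vl-32b-instruct` resolve to the same native ID, which is consistent with both appearing in the list.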