Name: Qwen2.5 VL 72B Instruct
Brand: Alibaba

Qwen2.5 VL 72B Instruct is Alibaba's language model with a 131K context window and up to 128K output tokens, available from 5 providers, starting at $0.13 / 1M input and $0.4 / 1M output. A 72-billion-parameter multimodal vision-language LLM from Alibaba's Qwen2.5-VL series, delivering high-capacity image understanding and visual reasoning.

Specifications
Canonical ID	`alibaba-qwen2-5-vl-72b-instruct`
Type	Language
Status	Active
Creator	Alibaba
Providers	Fireworks AI Nebius Novita OpenRouter OVHcloud
Context Window	131K tokens
Max Output	128K tokens
Input Modalities	ImageText
Output Modalities	Text
Parameters	72B
HuggingFace Likes	609
HuggingFace Downloads (30d)	103,451
HuggingFace Downloads (all-time)	5,812,114
Release Date	2025-02-01 · 1 year ago
Knowledge Cutoff	2024-06-30 · 2 years ago

Capabilities

Input2/5

Text✓

Image✓

Audio·

Video·

PDF·

Output1/5

Text✓

Image·

Audio·

Video·

Embedding·

Capabilities3/13

Reasoning·

Adaptive Reasoning·

Function Calling✓

Parallel Function Calling·

Structured Outputs✓

Native JSON Schema✓

Web Search·

URL Context·

Computer Use·

Code Execution·

File Search·

Prompt Caching·

Assistant Prefill·

Pricing by Provider

US Dollar ($)

Per 1M tokens

Provider	Standard
Provider	Input $ / 1M	Output $ / 1M	Cache Read $ / 1M
Fireworks AI `fireworks_ai/accounts/fireworks/models/qwen2p5-vl-72b-instruct`	$0.9	$0.9	N/A
Nebius `nebius/Qwen/Qwen2.5-VL-72B-Instruct`	$0.13	$0.4	N/A
Novita `novita/qwen/qwen2.5-vl-72b-instruct`	$0.8	$0.8	N/A
OpenRouter `qwen/qwen2.5-vl-72b-instruct`	$0.8	$1.00	$0.4
OVHcloud `ovhcloud/Qwen2.5-VL-72B-Instruct`	$0.91	$0.91	N/A

Cost Calculator

US Dollar ($)

Preset:

Input tokens

Output tokens

Cache write tokens

Cache read tokens

Number of calls

Cheapest Instances to Run It

Cloud GPU instances that can host Qwen2.5 VL 72B Instruct, ranked by cheapest on-demand price. The model needs about 173 GB of GPU memory at FP16 precision (estimated from its parameter count), so treat the fit as guidance rather than a guarantee.

All clouds

FP16 (full precision)

US Dollar ($)

Instance	Cloud	GPU	VRAM	Price	Cheapest region
Standard_NP40s	Azure	4× AMD Alveo U250 FPGA (64GB)	256 GB	$6.60/hr	westus2
g2-standard-96	GCP	8× nvidia-l4	192 GB	$7.98/hr	us-east4
g7e.12xlarge	AWS	2× RTX PRO Server 6000	192 GB	$8.29/hr	us-east-1
7 more instances can run Qwen2.5 VL 72B Instruct Unlock the full ranked list and FP8 / INT4 quantization with a CloudPrice subscription.

Versions

Version	Released	Context	Input / 1M	Output / 1M	Status
Voyage Multimodal 3.5	—	—	—	—	Available
Qwen2.5 VL 72B Instruct	2025-02-01	131K	$0.130	$0.400	Current
Qwen2.5 VL 32B Instruct	—	128K	$0.200	$0.600	Available
Qwen2.5 VL 3B Instruct	—	128K	$0.200	$0.200	Available
Qwen2.5 VL 7B Instruct	—	128K	$0.200	$0.200	Available
Rolm OCR	—	128K	$0.200	$0.200	Available

Model IDs

accounts/fireworks/models/qwen2p5-vl-72b-instruct

alibaba-qwen2-5-vl-72b-instruct

fireworks_ai/accounts/fireworks/models/qwen2p5-vl-72b-instruct

nebius/Qwen/Qwen2.5-VL-72B-Instruct

novita/qwen/qwen2.5-vl-72b-instruct

ovhcloud/Qwen2.5-VL-72B-Instruct

qwen/qwen2.5-vl-72b-instruct

qwen2.5-vl-72b-instruct

Qwen2.5 VL 72B Instruct

CapabilitiesAPIGET/api/v1/models/alibaba-qwen2-5-vl-72b-instruct

Pricing by ProviderAPIGET/api/v1/models/alibaba-qwen2-5-vl-72b-instruct/pricing

Cost CalculatorAPIGET/api/v1/models/alibaba-qwen2-5-vl-72b-instruct/pricing/calculate?input_tokens=1000000&output_tokens=500000

Cheapest Instances to Run ItAPIGET/api/v1/models/alibaba-qwen2-5-vl-72b-instruct/instances

VersionsAPIGET/api/v1/models?family=qwen2_5_vl

Model IDsAPIGET/api/v1/models/alibaba-qwen2-5-vl-72b-instruct

Capabilities

Pricing by Provider

Cost Calculator

Cheapest Instances to Run It

Versions

Model IDs