Name: Qwen2.5 32B Instruct
Brand: Alibaba

Qwen2.5 32B Instruct is a 32-billion-parameter instruction-tuned language model from Alibaba's Qwen2.5 series, optimized for following complex instructions and for text generation. It has a 128K-token context window, produces up to 8K output tokens, and is available from 2 providers, with prices starting at $0.06 per 1M input tokens and $0.20 per 1M output tokens.

Specifications
Canonical ID	`alibaba-qwen2-5-32b-instruct`
Type	Language
Status	Active
Creator	Alibaba
Providers	Fireworks AI, Nebius
Context Window	128K tokens
Max Output	8K tokens
Input Modalities	Text
Output Modalities	Text
Parameters	32B

Benchmarks
Benchmark	Score	Rank
Intelligence Index	7.5	#356
MMLU-Pro	0.7	#206
GPQA	0.5	#355
HLE	0.0	#423
LiveCodeBench	0.2	#235
AIME	0.1	#110
Time to First Token	0.00s	#8
SciCode	0.2	#334
MATH-500	0.8	#96
Output TPS	0.0	#264

Capabilities

Input (1 of 5 supported)
Text	Yes
Image	No
Audio	No
Video	No
PDF	No

Output (1 of 5 supported)
Text	Yes
Image	No
Audio	No
Video	No
Embedding	No

Capabilities (1 of 13 supported)
Reasoning	No
Adaptive Reasoning	No
Function Calling	Yes
Parallel Function Calling	No
Structured Outputs	No
Native JSON Schema	No
Web Search	No
URL Context	No
Computer Use	No
Code Execution	No
File Search	No
Prompt Caching	No
Assistant Prefill	No

Pricing by Provider

Standard pricing in US dollars ($) per 1M tokens.

Provider	Input $ / 1M	Output $ / 1M
Fireworks AI `fireworks_ai/accounts/fireworks/models/qwen2p5-32b-instruct`	$0.9	$0.9
Nebius `nebius/Qwen/Qwen2.5-32B-Instruct`	$0.06	$0.2
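
The per-token prices above translate into a request cost by simple proportion: tokens times price, divided by one million. The following is a minimal sketch of that arithmetic, not the site's own calculator; the names `PRICES`, `estimate_cost`, and `cheapest_provider` are hypothetical, and the prices are copied from the table above.

```python
# Prices in USD per 1M tokens, copied from the provider table above.
PRICES = {
    "Fireworks AI": {"input": 0.90, "output": 0.90},
    "Nebius": {"input": 0.06, "output": 0.20},
}


def estimate_cost(provider, input_tokens, output_tokens, calls=1):
    """Return the estimated USD cost of `calls` identical requests."""
    price = PRICES[provider]
    per_call = (
        input_tokens * price["input"] + output_tokens * price["output"]
    ) / 1_000_000
    return per_call * calls


def cheapest_provider(input_tokens, output_tokens):
    """Return the provider name with the lowest cost for this token mix."""
    return min(
        PRICES,
        key=lambda name: estimate_cost(name, input_tokens, output_tokens),
    )


if __name__ == "__main__":
    for name in PRICES:
        print(name, round(estimate_cost(name, 1_000_000, 500_000), 4))
```

For 1M input tokens and 500K output tokens, this works out to about $0.16 on Nebius and $1.35 on Fireworks AI.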

Cheapest Instances to Run It

Cloud GPU instances that can host Qwen2.5 32B Instruct, ranked by cheapest on-demand price. The model needs about 77 GB of GPU memory at FP16 precision (estimated from its parameter count), so treat the fit as guidance rather than a guarantee.

Instance	Cloud	GPU	VRAM	Price	Cheapest region
Standard_NP20s	Azure	2× AMD Alveo U250 FPGA (64GB)	128 GB	$3.30/hr	westus2
g7e.2xlarge	AWS	RTX PRO Server 6000	96 GB	$3.36/hr	us-east-1
Standard_NC24ads_A100_v4	Azure	NVIDIA A100	80 GB	$3.67/hr	westus2
7 more instances can run Qwen2.5 32B Instruct. Unlock the full ranked list and FP8 / INT4 quantization with a CloudPrice subscription.
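
The page does not say how the 77 GB figure is derived from the parameter count. One common rule of thumb, assumed here, is 2 bytes per parameter at FP16 plus roughly 20% headroom for activations and the KV cache, which lands close to that number. The sketch below applies that assumption and checks it against the three instances listed above; the function names, the overhead factor, and the FP8 / INT4 byte sizes are illustrative choices, not values taken from the page.

```python
# Bytes per parameter for each precision (assumed, standard sizes).
BYTES_PER_PARAM = {"fp16": 2.0, "fp8": 1.0, "int4": 0.5}

# (instance, VRAM in GB, on-demand USD per hour), copied from the table above.
INSTANCES = [
    ("Standard_NP20s", 128, 3.30),
    ("g7e.2xlarge", 96, 3.36),
    ("Standard_NC24ads_A100_v4", 80, 3.67),
]


def estimate_vram_gb(params_billions, precision="fp16", overhead=0.20):
    """Estimate GPU memory in GB: weight size plus a fractional overhead.

    The 20% default overhead is an assumption, not a figure from the page.
    """
    weights_gb = params_billions * BYTES_PER_PARAM[precision]
    return weights_gb * (1.0 + overhead)


def instances_that_fit(required_gb):
    """Return instances with enough VRAM, cheapest first."""
    fits = [row for row in INSTANCES if row[1] >= required_gb]
    return sorted(fits, key=lambda row: row[2])


if __name__ == "__main__":
    need = estimate_vram_gb(32)  # 32B parameters at FP16
    print(round(need, 1), [name for name, _, _ in instances_that_fit(need)])
```

Under these assumptions the FP16 estimate is 76.8 GB, so all three listed instances fit, the 80 GB A100 only narrowly.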

Versions

Version	Released	Context	Input / 1M	Output / 1M	Status
Dolphin 2.9.2 Qwen2 72B	—	131K	$0.900	$0.900	Available
Cogito V1 Preview Qwen 14B	—	131K	$0.200	$0.200	Available
Cogito V1 Preview Qwen 32B	—	131K	$0.900	$0.900	Available
QwQ 32B	2025-03-05	131K	$0.150	$0.200	Deprecated
Qwen2.5 Coder 32B Instruct	2024-11-11	131K	$0.050	$0.100	Available
Qwen2.5 7B Instruct	2024-10-16	131K	$0.040	$0.070	Available
Qwen2.5 72B Instruct	2024-09-19	131K	$0.120	$0.300	Available
Qwen2.5 32B Instruct	—	128K	$0.060	$0.200	Current
QwQ 32B Preview	—	33K	$0.900	$0.900	Available
Qwen2 72B Instruct	—	33K	$0.900	$0.900	Available
Qwen2.5 Coder 7B Instruct	—	33K	$0.010	$0.030	Available

Model IDs

accounts/fireworks/models/qwen2p5-32b-instruct

alibaba-qwen2-5-32b-instruct

fireworks_ai/accounts/fireworks/models/qwen2p5-32b-instruct

huggingface-llm-qwen2-5-32b-instruct

nebius/Qwen/Qwen2.5-32B-Instruct

qwen2.5-32b-instruct

Qwen2.5 32B Instruct

Capabilities API	GET /api/v1/models/alibaba-qwen2-5-32b-instruct

Pricing by Provider API	GET /api/v1/models/alibaba-qwen2-5-32b-instruct/pricing

Cost Calculator API	GET /api/v1/models/alibaba-qwen2-5-32b-instruct/pricing/calculate?input_tokens=1000000&output_tokens=500000

Cheapest Instances to Run It API	GET /api/v1/models/alibaba-qwen2-5-32b-instruct/instances

Versions API	GET /api/v1/models?family=qwen2

Model IDs API	GET /api/v1/models/alibaba-qwen2-5-32b-instruct
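
The endpoint paths above are relative, and the page does not state which host serves them. As a hedged illustration of how the cost calculator path and its `input_tokens` / `output_tokens` query parameters fit together, the sketch below builds the full URL with the standard library. `BASE_URL` is a placeholder and `calculate_url` is a hypothetical helper; no request is sent, and the response format is not documented here.

```python
from urllib.parse import urlencode, urljoin

# Placeholder host: the page does not name the API's base URL.
BASE_URL = "https://example.invalid"
MODEL_ID = "alibaba-qwen2-5-32b-instruct"


def calculate_url(input_tokens, output_tokens,
                  base_url=BASE_URL, model_id=MODEL_ID):
    """Build the cost calculator URL for the given token counts."""
    path = f"/api/v1/models/{model_id}/pricing/calculate"
    query = urlencode({
        "input_tokens": input_tokens,
        "output_tokens": output_tokens,
    })
    return f"{urljoin(base_url, path)}?{query}"


if __name__ == "__main__":
    print(calculate_url(1_000_000, 500_000))
```

Swapping `BASE_URL` for the real host yields a URL that can be passed to any HTTP client as a GET request.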
