Qwen2.5 VL 3B Instruct


Qwen2.5 VL 3B Instruct is Alibaba's language model with a 131K context window and up to 8K output tokens. It is available from 2 providers, starting at $0.200 per 1M input tokens and $0.200 per 1M output tokens. It is a compact 3-billion-parameter multimodal vision-language model from Alibaba's Qwen2.5-VL series, suited for image-text tasks in resource-constrained settings.
Spec
Canonical ID: alibaba-qwen2-5-vl-3b-instruct
Type: Language
Status: Active
Creator: Alibaba
Providers: 2 (Fireworks AI, Alibaba Qwen)
Context Window: 131K tokens
Max Output: 8K tokens
Input Modalities: Text
Output Modalities: Text
Parameters: 3B
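
The context window and max output limits above can be turned into a simple token-budget check. The following is a minimal sketch, not a provider API: the function name `max_output_budget` is hypothetical, "131K" and "8K" are read as 131,000 and 8,000 tokens (the exact values may differ), and it assumes that prompt and output tokens share the same context window.

```python
# Assumption: "131K" and "8K" from the spec are read as round thousands.
CONTEXT_WINDOW = 131_000
MAX_OUTPUT = 8_000


def max_output_budget(prompt_tokens: int) -> int:
    """Return how many output tokens can still be requested for a prompt.

    Assumes prompt and output tokens count against the same context window.
    """
    if prompt_tokens < 0:
        raise ValueError("prompt_tokens must be non-negative")
    remaining = CONTEXT_WINDOW - prompt_tokens
    # The output is capped by both the per-request limit and the space left.
    return max(0, min(MAX_OUTPUT, remaining))
```

For example, a 125,000-token prompt would leave room for at most 6,000 output tokens under these assumptions, while a short prompt is limited only by the 8K output cap.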

Capabilities

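Input (1/5 supported)
Text: supported
Image: not supported
Audio: not supported
Video: not supported
PDF: not supported

Output (1/5 supported)
Text: supported
Image: not supported
Audio: not supported
Video: not supported
Embedding: not supported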
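
Capabilities (0/13 supported)
Reasoning: not supported
Adaptive Reasoning: not supported
Function Calling: not supported
Parallel Function Calling: not supported
Structured Outputs: not supported
Native JSON Schema: not supported
Web Search: not supported
URL Context: not supported
Computer Use: not supported
Code Execution: not supported
File Search: not supported
Prompt Caching: not supported
Assistant Prefill: not supported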

Pricing by Provider

All prices are in USD per 1M tokens.

Provider | Model ID | Standard Input | Standard Output | Batch Input | Batch Output
Fireworks AI | fireworks_ai/accounts/fireworks/models/qwen2p5-vl-3b-instruct | $0.200 | $0.200 | not listed | not listed
Alibaba Qwen | qwen2.5-vl-3b-instruct | $0.210 | $0.630 | $0.105 | $0.315
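
To compare the provider prices above for a given workload, the per-1M-token rates can be applied to token counts directly. This is a minimal sketch: the names `Tier`, `PRICES`, `cost_usd`, and `compare` are hypothetical, and the rates are copied from the pricing figures on this page, which may change.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tier:
    """Price of one provider tier, in USD per 1M tokens."""
    input_per_million: float
    output_per_million: float


# Rates taken from the pricing table on this page (assumed current).
PRICES = {
    ("Fireworks AI", "standard"): Tier(0.200, 0.200),
    ("Alibaba Qwen", "standard"): Tier(0.210, 0.630),
    ("Alibaba Qwen", "batch"): Tier(0.105, 0.315),
}


def cost_usd(tier: Tier, input_tokens: int, output_tokens: int) -> float:
    """Cost in USD for the given token counts at one tier."""
    return (
        input_tokens * tier.input_per_million
        + output_tokens * tier.output_per_million
    ) / 1_000_000


def compare(input_tokens: int, output_tokens: int) -> list[tuple[float, str, str]]:
    """Return (cost, provider, tier) for every tier, cheapest first."""
    rows = [
        (cost_usd(tier, input_tokens, output_tokens), provider, name)
        for (provider, name), tier in PRICES.items()
    ]
    return sorted(rows)


if __name__ == "__main__":
    for cost, provider, name in compare(1_000_000, 1_000_000):
        print(f"{provider} ({name}): ${cost:.3f}")
```

Which tier is cheapest depends on the input-to-output ratio: Alibaba Qwen's batch tier has the lowest input rate, while Fireworks AI has the lowest output rate.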

Versions

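Version | Released | Context | Input / 1M | Output / 1M | Status
Qwen2.5 VL 72B Instruct | not listed | 131K | $0.130 | $0.400 | Available
Qwen2.5 VL 3B Instruct | not listed | 131K | $0.200 | $0.200 | Current
Qwen2.5 VL 32B Instruct | not listed | 131K | $0.200 | $0.600 | Available
Qwen2.5 VL 7B Instruct | not listed | 131K | $0.200 | $0.200 | Available
Rolm OCR | not listed | 128K | $0.200 | $0.200 | Available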

Model IDs