Qwen VL OCR


Qwen VL OCR is Alibaba's image-to-text model with a 38K context window and up to 8K output tokens, starting at $0.070 / 1M input tokens and $0.160 / 1M output tokens. It is a visual language model from Alibaba's Qwen VL series, specialized for optical character recognition: extracting and understanding text from images and documents.
Spec
Canonical ID: alibaba-qwen-vl-ocr
Type: Image to Text
Status: Active
Creator: Alibaba
Providers
Context Window: 38K tokens
Max Output: 8K tokens
Input Modalities: Text
Output Modalities: Text

Capabilities

Input: 1/5
Text
Image·
Audio·
Video·
PDF·
Output: 1/5
Text
Image·
Audio·
Video·
Embedding·
Capabilities: 0/13
Reasoning·
Adaptive Reasoning·
Function Calling·
Parallel Function Calling·
Structured Outputs·
Native JSON Schema·
Web Search·
URL Context·
Computer Use·
Code Execution·
File Search·
Prompt Caching·
Assistant Prefill·

Pricing by Provider

Provider | Model ID | Standard Input ($ / 1M) | Standard Output ($ / 1M) | Batch Input ($ / 1M) | Batch Output ($ / 1M)
Alibaba Qwen | qwen-vl-ocr | $0.070 | $0.160 | $0.035 | $0.080

Cost Calculator

Compares every provider & tier in USD
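
The calculator's arithmetic can be approximated offline. The following is a minimal sketch, assuming cost scales linearly with token count at the per-1M rates listed in the pricing table; the names PRICES, estimate_cost, and compare_tiers are hypothetical and not part of any provider API.

```python
# Prices in USD per 1M tokens, copied from the pricing table on this page.
# The dictionary layout and function names are illustrative assumptions.
PRICES = {
    ("Alibaba Qwen", "standard"): {"input": 0.070, "output": 0.160},
    ("Alibaba Qwen", "batch"): {"input": 0.035, "output": 0.080},
}


def estimate_cost(input_tokens, output_tokens,
                  provider="Alibaba Qwen", tier="standard"):
    """Return the estimated cost in USD for one request."""
    rate = PRICES[(provider, tier)]
    total = input_tokens * rate["input"] + output_tokens * rate["output"]
    return total / 1_000_000


def compare_tiers(input_tokens, output_tokens):
    """Return the estimated cost for every (provider, tier) pair."""
    return {
        key: estimate_cost(input_tokens, output_tokens, *key)
        for key in PRICES
    }


if __name__ == "__main__":
    # Example workload: 30,000 input tokens and 2,000 output tokens.
    for (provider, tier), cost in compare_tiers(30_000, 2_000).items():
        print(f"{provider} ({tier}): ${cost:.5f}")
```

For that example workload, the standard tier comes to roughly $0.0024 and the batch tier to roughly half of that, since both batch rates are half the standard rates.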

Versions

Version | Released | Context | Input / 1M | Output / 1M | Status
DeepSeek R1 0528 Qwen3 8B | - | 128K | $0.060 | $0.090 | Available
Qwen3 9.23 Max | - | - | - | - | Available
Qwen 7 28 Flash | - | 998K | - | - | Available
Qwen 4 28 Plus | - | 129K | - | - | Available
Qwen 3 32B | - | 128K | - | - | Available
Qwen3.5-Flash | - | 1.0M | $0.065 | $0.260 | Available
Qwen3.5 Plus 2026-02-15 | - | 1.0M | $0.260 | $1.56 | Available
Qwen 1 25 Plus | - | 129K | - | - | Available
Qwen3.5 Max | - | 258K | - | - | Available
Qwen3.6 Plus | - | 1.0M | $0.325 | $1.95 | Available
Qwen VL OCR | - | 38K | $0.070 | $0.160 | Current

Model IDs