Qwen3 VL 30B A3B Instruct is Alibaba's instruction-tuned vision-language mixture-of-experts (MoE) model, with 30B total and 3B activated parameters. It accepts text and image input and produces text output, with a 262K-token context window and up to 33K output tokens. It is available from 5 providers, starting at $0.130 per 1M input tokens and $0.520 per 1M output tokens.
Specifications
Canonical ID: alibaba-qwen3-vl-30b-a3b-instruct
Type: Language
Status: Active
Creator: Alibaba
Providers
Context Window: 262K tokens
Max Output: 33K tokens
Input Modalities: Image, Text
Output Modalities: Text
Parameters: 30B
HuggingFace Likes: 562
HuggingFace Downloads (30d): 2,219,395
HuggingFace Downloads (all-time): 14,070,852
Release Date: 8 months ago
Knowledge Cutoff
Benchmarks
Intelligence Index: 16.0 (rank #273)
Coding Index: 14.3 (rank #234)
Math Index: 72.3 (rank #85)
MMLU-Pro: 0.8 (rank #146)
GPQA: 0.7 (rank #191)
HLE: 0.1 (rank #206)
LiveCodeBench: 0.5 (rank #143)
IFBench: 0.3 (rank #295)
Time to First Token: 1.07s (rank #337)
SciCode: 0.3 (rank #214)
AIME 2025: 0.7 (rank #85)
LCR: 0.2 (rank #231)
TerminalBench Hard: 0.1 (rank #237)
TAU2: 0.2 (rank #311)
Output TPS: 125.1 (rank #100)

Capabilities

Input (2/5)
Supported: Text, Image
Not supported: Audio, Video, PDF
Output (1/5)
Supported: Text
Not supported: Image, Audio, Video, Embedding
Capabilities (4/13)
Supported: Function Calling, Parallel Function Calling, Structured Outputs, Native JSON Schema
Not supported: Reasoning, Adaptive Reasoning, Web Search, URL Context, Computer Use, Code Execution, File Search, Prompt Caching, Assistant Prefill

Pricing by Provider

Versions

Version | Released | Context | Input / 1M | Output / 1M | Status
Qwen3 VL 30B A3B Instruct | | 262K | $0.130 | $0.520 | Current
Qwen3 VL 30B A3B Thinking | | 262K | $0.130 | $0.600 | Available
Qwen3 VL 235B A22B Instruct | | 262K | $0.200 | $0.880 | Available
Qwen3 VL 235B A22B Thinking | | 262K | $0.220 | $0.880 | Available
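
The per-1M-token prices listed above translate into a per-request cost with simple arithmetic: multiply each token count by its price and divide by one million. The following is a minimal sketch of that calculation, not any provider's billing logic. The function name `estimate_cost` and the `PRICES_PER_1M` table are hypothetical names introduced for illustration, and the figures are the starting prices from the Versions list, so an individual provider may charge differently.

```python
from decimal import Decimal

# Starting prices in USD per 1M tokens, copied from the Versions list above.
# Format: version name -> (input price, output price).
PRICES_PER_1M = {
    "Qwen3 VL 30B A3B Instruct": (Decimal("0.130"), Decimal("0.520")),
    "Qwen3 VL 30B A3B Thinking": (Decimal("0.130"), Decimal("0.600")),
    "Qwen3 VL 235B A22B Instruct": (Decimal("0.200"), Decimal("0.880")),
    "Qwen3 VL 235B A22B Thinking": (Decimal("0.220"), Decimal("0.880")),
}

TOKENS_PER_PRICE_UNIT = Decimal(1_000_000)


def estimate_cost(version: str, input_tokens: int, output_tokens: int) -> Decimal:
    """Estimate the USD cost of one request at the listed starting prices.

    This is an illustrative helper (an assumption, not a provider API).
    """
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    input_price, output_price = PRICES_PER_1M[version]
    total = input_price * input_tokens + output_price * output_tokens
    return total / TOKENS_PER_PRICE_UNIT


if __name__ == "__main__":
    # Example: a 100,000-token prompt with a 2,000-token reply.
    cost = estimate_cost("Qwen3 VL 30B A3B Instruct", 100_000, 2_000)
    print(f"Estimated cost: ${cost:.5f}")
```

`Decimal` is used rather than `float` so that small per-request amounts sum without binary rounding drift when many requests are aggregated.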

Model IDs