ERNIE 4.5 VL 424B A47B is Baidu's vision-language model with a 131K-token context window and up to 16K output tokens. It is available from 3 providers, with prices starting at $0.420 per 1M input tokens and $1.25 per 1M output tokens. It is a large-scale multimodal mixture-of-experts (MoE) model with 424B total parameters, of which 47B are activated per token, designed for cross-modal knowledge fusion.
Specifications
Canonical ID: baidu-ernie-vl-4-5-424b-a47b
Type: Language
Status: Active
Creator: Baidu
Providers
Context Window: 131K tokens
Max Output: 16K tokens
Input Modalities: Image, Text
Output Modalities: Text
Reasoning Efforts: default
Parameters: 424B
Release Date: 11 months ago
Knowledge Cutoff
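
The context window and max output limits above jointly bound how large a response can be requested for a given prompt. The following is a minimal sketch of that arithmetic, not a provider API. It assumes that prompt and output tokens share the same 131K window and that "131K" and "16K" mean 131,000 and 16,000 tokens (the listing does not give exact values). The helper name `output_budget` is hypothetical.

```python
# Limits taken from the Specifications list; the exact token counts are assumptions.
CONTEXT_WINDOW = 131_000  # "131K tokens"
MAX_OUTPUT = 16_000       # "16K tokens"


def output_budget(prompt_tokens: int,
                  context_window: int = CONTEXT_WINDOW,
                  max_output: int = MAX_OUTPUT) -> int:
    """Return the largest output-token count that can be requested.

    Assumes the prompt and the generated output share one context window.
    """
    if prompt_tokens < 0:
        raise ValueError("prompt_tokens must be non-negative")
    remaining = context_window - prompt_tokens
    # The output is capped by both the model's max output and the space left.
    return max(0, min(max_output, remaining))
```

Under these assumptions, a 120,000-token prompt leaves room for 11,000 output tokens, while a short prompt is capped at the 16,000-token output limit. How image inputs count toward the window is not stated in the listing, so image tokens are left to the caller to estimate.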

Capabilities

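Input: 2/5
Text: supported
Image: supported
Audio: not supported
Video: not supported
PDF: not supported
Output: 1/5
Text: supported
Image: not supported
Audio: not supported
Video: not supported
Embedding: not supported
Capabilities: 1/13
Reasoning: supported
Adaptive Reasoning: not supported
Function Calling: not supported
Parallel Function Calling: not supported
Structured Outputs: not supported
Native JSON Schema: not supported
Web Search: not supported
URL Context: not supported
Computer Use: not supported
Code Execution: not supported
File Search: not supported
Prompt Caching: not supported
Assistant Prefill: not supported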

Pricing by Provider

Versions

Version | Released | Context | Input / 1M | Output / 1M | Status
ERNIE 5 Thinking Preview | | | | | Available
ERNIE 4.5 21B A3B Thinking | | 131K | $0.070 | $0.280 | Available
ERNIE 4.5 VL 424B A47B | | 131K | $0.420 | $1.25 | Current
ERNIE 4.5 300B A47B | | 131K | $0.280 | $1.10 | Available
ERNIE 4.5 300B A47B Paddle | | 123K | $0.280 | $1.10 | Available
ERNIE 4.5 VL 28B A3B Thinking | | 131K | $0.390 | $0.390 | Available
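
The per-1M-token prices in the Versions table translate into a request cost with simple arithmetic. The sketch below illustrates this under stated assumptions: the prices are the listed starting prices (individual providers may charge more), billing is strictly linear in token count, and the `PRICES` table and `estimate_cost` helper are hypothetical names, not part of any provider's SDK. ERNIE 5 Thinking Preview is omitted because the table lists no price for it.

```python
from decimal import Decimal

# (input USD per 1M tokens, output USD per 1M tokens), copied from the Versions table.
PRICES = {
    "ERNIE 4.5 21B A3B Thinking": (Decimal("0.070"), Decimal("0.280")),
    "ERNIE 4.5 VL 424B A47B": (Decimal("0.420"), Decimal("1.25")),
    "ERNIE 4.5 300B A47B": (Decimal("0.280"), Decimal("1.10")),
    "ERNIE 4.5 300B A47B Paddle": (Decimal("0.280"), Decimal("1.10")),
    "ERNIE 4.5 VL 28B A3B Thinking": (Decimal("0.390"), Decimal("0.390")),
}

TOKENS_PER_UNIT = Decimal(1_000_000)  # prices are quoted per 1M tokens


def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> Decimal:
    """Estimate the USD cost of one request at the listed starting prices."""
    input_price, output_price = PRICES[model]
    total = input_price * input_tokens + output_price * output_tokens
    return total / TOKENS_PER_UNIT


if __name__ == "__main__":
    # Compare one workload (100,000 input tokens, 4,000 output tokens) across versions.
    for name in PRICES:
        print(f"{name}: ${estimate_cost(name, 100_000, 4_000)}")
```

For example, 100,000 input tokens and 4,000 output tokens on ERNIE 4.5 VL 424B A47B come to $0.042 + $0.005 = $0.047 at these prices. `Decimal` is used instead of floats so that small per-token amounts do not accumulate rounding error.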

Model IDs