ERNIE 4.5 VL 28B A3B is Baidu's language model with a 30K context window and up to 8K output tokens, starting at $0.14 / 1M input and $0.56 / 1M output. A 28B multimodal MoE vision-language model from Baidu with 3B active parameters per token, enabling cross-modal understanding and generation.
Specifications
Canonical IDbaidu-ernie-vl-4-5-28b-a3b
TypeLanguage
StatusActive
CreatorBaiduBaidu
Providers
Context Window30K tokens
Max Output8K tokens
Input ModalitiesImageText
Output ModalitiesText
Reasoning Effortsdefault
Parameters28B
HuggingFace Likes101
HuggingFace Downloads (30d)70,108
HuggingFace Downloads (all-time)802,331
Release Date · 10 months ago
Knowledge Cutoff · 1 year ago

Capabilities

Input2/5
Text
Image
Audio·
Video·
PDF·
Output1/5
Text
Image·
Audio·
Video·
Embedding·
Capabilities3/13
Reasoning
Adaptive Reasoning·
Function Calling
Parallel Function Calling
Structured Outputs·
Native JSON Schema·
Web Search·
URL Context·
Computer Use·
Code Execution·
File Search·
Prompt Caching·
Assistant Prefill·

Pricing by Provider

US Dollar ($)
Per 1M tokens
ProviderStandard
Input
$ / 1M
Output
$ / 1M
Novita logo
Novita
novita/baidu/ernie-4.5-vl-28b-a3b
$0.14$0.56

Cost Calculator

US Dollar ($)
Preset:

Model IDs

baidu-ernie-vl-4-5-28b-a3b
baidu/ernie-4.5-vl-28b-a3b
novita/baidu/ernie-4.5-vl-28b-a3b