Grok Vision Beta is xAI's language model with a 8K context window and up to 8K output tokens, starting at $5.00 / 1M input and $15.00 / 1M output. A multimodal LLM from xAI with vision capabilities for understanding and reasoning about images, released in beta.
Specifications
Canonical IDxai-grok-vision-beta
TypeLanguage
StatusActive
CreatorxAIxAI
Providers
Context Window8K tokens
Max Output8K tokens
Input ModalitiesImage
Output ModalitiesText

Capabilities

Input1/5
Text·
Image
Audio·
Video·
PDF·
Output1/5
Text
Image·
Audio·
Video·
Embedding·
Capabilities2/13
Reasoning·
Adaptive Reasoning·
Function Calling
Parallel Function Calling·
Structured Outputs·
Native JSON Schema·
Web Search
URL Context·
Computer Use·
Code Execution·
File Search·
Prompt Caching·
Assistant Prefill·

Pricing by Provider

US Dollar ($)
Per 1M tokens
ProviderStandard
Input
$ / 1M
Output
$ / 1M
Image In
$ / 1M
xAI logo
xAI
xai/grok-vision-beta
$5.00$15.00$5.00

Cost Calculator

US Dollar ($)
Preset:

Versions

VersionReleasedContextInput / 1MOutput / 1MStatus
Grok 4.31.0M$1.25$2.50Available
Grok 4.202.0M$1.25$2.50Available
Grok 4.20 Multi-Agent2.0M$1.25$2.50Available
Grok 4.20 Multi-Agent Beta2.0M$1.25$2.50Available
Grok 4.20 Non-Reasoning2.0M$1.25$2.50Available
Grok 4.20 Reasoning2.0M$1.25$2.50Available
Grok 4.1 Fast2.0M$0.200$0.500Deprecated
Grok 4 Fast131K$0.200$0.500Deprecated
Grok 4 Fast Non-Reasoning2.0M$0.200$0.500Deprecated
Grok 4256K$3.00$15.00Deprecated
Grok Vision Beta8K$5.00$15.00Current

Model IDs

xai-grok-vision-beta
xai/grok-vision-beta