
Llama 3.2 90B Vision Instruct


Llama 3.2 90B Vision Instruct is Meta's language model with a 128K-token context window and up to 16K output tokens. It is available from 4 providers, with prices starting at $0.900 per 1M input tokens and $0.900 per 1M output tokens. It is a 90B-parameter, instruction-tuned multimodal model in the Llama 3.2 family, intended for image captioning, visual reasoning, and general visual question answering.
Spec
Canonical ID: meta-llama-3-2-90b-vision-instruct
Type: Language
Status: Active
Creator: Meta
Providers: 4
Context Window: 128K tokens
Max Output: 16K tokens
Input Modalities: Image
Output Modalities: Text
Parameters: 90B

Capabilities

Input (1 of 5 supported)
Supported: Image
Not supported: Text, Audio, Video, PDF

Output (1 of 5 supported)
Supported: Text
Not supported: Image, Audio, Video, Embedding

Capabilities (3 of 13 supported)
Supported: Function Calling, Parallel Function Calling, Structured Outputs
Not supported: Reasoning, Adaptive Reasoning, Native JSON Schema, Web Search, URL Context, Computer Use, Code Execution, File Search, Prompt Caching, Assistant Prefill

Pricing by Provider

Standard tier, USD per 1M tokens

Provider             Model ID                                                  Input     Output
Fireworks AI         accounts/fireworks/models/llama-v3p2-90b-vision-instruct  $0.900    $0.900
IBM watsonx          meta-llama/llama-3-2-90b-vision-instruct                  $2.00     $2.00
Oracle Cloud (OCI)   meta.llama-3.2-90b-vision-instruct                        $2.00     $2.00
Azure AI Foundry     Llama-3.2-90B-Vision-Instruct                             $2.04     $2.04
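
The per-million-token prices above can be turned into a per-request estimate with simple arithmetic. The following is a minimal sketch, not any provider's billing logic: the `PRICES` table is copied from the listing above, while the `estimate_cost` helper and the token counts in the usage example are hypothetical names and values chosen for illustration. Actual invoices may differ, for example in how a provider counts image tokens.

```python
from decimal import Decimal

# Standard-tier prices in USD per 1M tokens, copied from the listing above:
# provider -> (input price, output price).
PRICES = {
    "Fireworks AI": (Decimal("0.900"), Decimal("0.900")),
    "IBM watsonx": (Decimal("2.00"), Decimal("2.00")),
    "Oracle Cloud (OCI)": (Decimal("2.00"), Decimal("2.00")),
    "Azure AI Foundry": (Decimal("2.04"), Decimal("2.04")),
}

TOKENS_PER_MILLION = Decimal(1_000_000)


def estimate_cost(provider: str, input_tokens: int, output_tokens: int) -> Decimal:
    """Hypothetical helper: estimated USD cost of one request at list price."""
    input_price, output_price = PRICES[provider]
    total = input_price * input_tokens + output_price * output_tokens
    return total / TOKENS_PER_MILLION


if __name__ == "__main__":
    # Example workload (assumed values): 100,000 input and 10,000 output tokens.
    for name in PRICES:
        print(f"{name}: ${estimate_cost(name, 100_000, 10_000)}")
```

`Decimal` is used instead of floats so that prices such as $0.900 are represented exactly and the results do not pick up binary rounding noise.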

Versions

Version                         Released   Context   Input / 1M   Output / 1M   Status
Llama 3.2 11B Vision Instruct   -          131K      $0.015       $0.025        Available
Llama 3.2 90B Vision Instruct   -          128K      $0.900       $0.900        Current

Model IDs