Meta logo

Llama 3.2 11B Vision Instruct


Llama 3.2 11B Vision Instruct is Meta logoMeta's language model with a 131K context window and up to 16K output tokens, available from 7 providers, starting at $0.015 / 1M input and $0.025 / 1M output. An 11B instruction-tuned multimodal Llama 3.2 model excelling at image captioning, visual question answering, and tasks requiring combined visual and textual reasoning.
Spec
Canonical IDmeta-llama-3-2-11b-vision-instruct
TypeLanguage
StatusActive
CreatorMetaMeta
Providers
Context Window131K tokens
Max Output16K tokens
Input ModalitiesImageText
Output ModalitiesText
Parameters11B
Release Date · 2 years ago

Capabilities

Input2/5
Text
Image
Audio·
Video·
PDF·
Output1/5
Text
Image·
Audio·
Video·
Embedding·
Capabilities3/13
Reasoning·
Adaptive Reasoning·
Function Calling
Parallel Function Calling
Structured Outputs
Native JSON Schema·
Web Search·
URL Context·
Computer Use·
Code Execution·
File Search·
Prompt Caching·
Assistant Prefill·

Pricing by Provider

ProviderStandard
Input
$ / 1M
Output
$ / 1M
Lambda logo
Lambda
llama3.2-11b-vision-instruct
$0.015$0.025
DeepInfra logo
DeepInfra
meta-llama/Llama-3.2-11B-Vision-Instruct
$0.049$0.049
Fireworks AI logo
Fireworks AI
accounts/fireworks/models/llama-v3p2-11b-vision-instruct
$0.200$0.200
OpenRouter logo
OpenRouter
meta-llama/llama-3.2-11b-vision-instruct
$0.245$0.245
IBM watsonx logo
IBM watsonx
meta-llama/llama-3-2-11b-vision-instruct
$0.350$0.350
Azure AI Foundry logo
Azure AI Foundry
Llama-3.2-11B-Vision-Instruct
$0.370$0.370
Oracle Cloud (OCI) logo
Oracle Cloud (OCI)
meta.llama-3.2-11b-vision-instruct
$2.00$2.00

Cost Calculator

Preset:
Compares every provider & tier in USD

Versions

VersionReleasedContextInput / 1MOutput / 1MStatus
Llama 3.2 11B Vision Instruct131K$0.015$0.025Current
Llama 3.2 90B Vision Instruct128K$0.900$0.900Available

Model IDs