
Llama 3.2 11B Vision Instruct


Llama 3.2 11B Vision Instruct is Meta's vision-language model with a 128K context window and up to 8K output tokens, starting at $0.160 / 1M input tokens and $0.160 / 1M output tokens. It is an 11B instruction-tuned Llama 3.2 model optimized for visual recognition, image reasoning, and captioning tasks that combine text and image inputs.
Spec
Canonical ID: meta-llama-3-2-11b
Type: Language
Status: Active
Creator: Meta
Providers: Vercel AI Gateway
Context Window: 128K tokens
Max Output: 8K tokens
Input Modalities: Image
Output Modalities: Text
Parameters: 11B
Release Date: 2 years ago
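
The context window and maximum output figures above can be turned into a simple request-size check. The following is a minimal sketch, not provider documentation: it assumes "128K" and "8K" mean 128,000 and 8,000 tokens, and that prompt and output tokens share the same context window. Neither assumption is stated in the listing, and `output_budget` is a hypothetical helper name.

```python
# Limits taken from the spec list; the exact token counts are assumptions
# ("128K" might mean 131,072 at some providers).
CONTEXT_WINDOW = 128_000
MAX_OUTPUT = 8_000


def output_budget(prompt_tokens: int, requested_output: int) -> int:
    """Return how many output tokens can be requested for a given prompt size.

    Assumes the prompt and the generated output share one context window.
    """
    if prompt_tokens < 0 or requested_output < 0:
        raise ValueError("token counts must be non-negative")
    # Space left in the window after the prompt, never below zero.
    remaining = max(CONTEXT_WINDOW - prompt_tokens, 0)
    # The request is capped by both the output limit and the remaining space.
    return min(requested_output, MAX_OUTPUT, remaining)
```

Under these assumptions, a 125,000-token prompt leaves room for at most 3,000 output tokens, even though the model's output limit is higher.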

Capabilities

Input (1/5)
Supported: Image
Not marked as supported: Text, Audio, Video, PDF

Output (1/5)
Supported: Text
Not marked as supported: Image, Audio, Video, Embedding

Capabilities (1/13)
Supported: Function Calling
Not marked as supported: Reasoning, Adaptive Reasoning, Parallel Function Calling, Structured Outputs, Native JSON Schema, Web Search, URL Context, Computer Use, Code Execution, File Search, Prompt Caching, Assistant Prefill

Pricing by Provider

Provider            Model ID             Tier       Input ($ / 1M)   Output ($ / 1M)
Vercel AI Gateway   meta/llama-3.2-11b   Standard   $0.160           $0.160
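
The per-million-token rates above translate into a request cost as follows. This is a rough sketch that uses only the listed Standard-tier prices; `estimate_cost_usd` is a hypothetical helper, and the listing does not say how image inputs are counted as tokens, so the token counts are assumed to be known already.

```python
from decimal import Decimal

# Standard-tier rates from the pricing table, in USD per 1M tokens.
INPUT_USD_PER_M = Decimal("0.160")
OUTPUT_USD_PER_M = Decimal("0.160")
TOKENS_PER_M = Decimal(1_000_000)


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> Decimal:
    """Estimate the USD cost of one request from its token counts."""
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    total = (
        Decimal(input_tokens) * INPUT_USD_PER_M
        + Decimal(output_tokens) * OUTPUT_USD_PER_M
    )
    return total / TOKENS_PER_M


if __name__ == "__main__":
    # Example: a 100,000-token prompt with an 8,000-token reply.
    print(estimate_cost_usd(100_000, 8_000))
```

`Decimal` is used instead of `float` so that small per-request amounts add up without binary rounding drift.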

Versions

Version                         Released  Context  Input / 1M  Output / 1M  Status
Llama 3.3 70B Instruct          -         131K     $0.720      $0.720       Available
Llama 3.3 70B Instruct          -         131K     $0.100      $0.300       Available
Llama 3.3                       -         -        -           -            Available
Llama 3.3 70B Instruct Turbo    -         131K     $0.130      $0.390       Available
Llama 3.3 70B Versatile         -         128K     $0.590      $0.790       Available
Llama 3.3 8B Instruct           -         128K     -           -            Available
Llama 3.2 11B Vision Instruct   -         128K     $0.160      $0.160       Current
Llama 3.2 1B Instruct           -         128K     $0.027      $0.080       Deprecated
Llama 3.2 3B Instruct           -         131K     $0.015      $0.020       Deprecated
Llama 3.2 90B Vision Instruct   -         128K     $0.720      $0.720       Available
Llama 3.2 1B                    -         131K     $0.100      $0.100       Available

Model IDs