Llama 3.2 11B Instruct Vision


Llama 3.2 11B Instruct Vision is Meta's 11B instruction-tuned vision-language model from the Llama 3.2 family. It combines image understanding with instruction following for multimodal applications.
Specifications
Canonical ID: meta-llama-3-2-11b-instruct-vision
Status: Active
Creator: Meta
Benchmarks
Intelligence Index: 8.7 (#414)
Coding Index: 4.3 (#345)
Math Index: 1.7 (#253)
MMLU-Pro: 0.5 (#284)
GPQA: 0.2 (#441)
HLE: 0.1 (#254)
LiveCodeBench: 0.1 (#293)
AIME: 0.1 (#123)
IFBench: 0.3 (#319)
Time to First Token: 0.46s (#228)
SciCode: 0.1 (#382)
MATH-500: 0.5 (#154)
AIME 2025: 0.0 (#253)
LCR: 0.1 (#279)
TerminalBench Hard: 0.0 (#323)
TAU2: 0.1 (#325)
Output TPS: 85.6 (#144)

Capabilities

Input: 0/5
Text·
Image·
Audio·
Video·
PDF·
Output: 0/5
Text·
Image·
Audio·
Video·
Embedding·
Capabilities: 0/13
Reasoning·
Adaptive Reasoning·
Function Calling·
Parallel Function Calling·
Structured Outputs·
Native JSON Schema·
Web Search·
URL Context·
Computer Use·
Code Execution·
File Search·
Prompt Caching·
Assistant Prefill·

Versions

Version | Released | Context | Input / 1M | Output / 1M | Status
Llama 3.3 70B Instruct | | 131K | $0.100 | $0.200 | Available
Llama 3.2 3B Instruct | | 131K | $0.015 | $0.020 | Deprecated
Llama 3.2 1B Instruct | | 128K | $0.027 | $0.080 | Deprecated
Llama 3.1 405B Instruct | | 131K | $0.120 | $0.300 | Deprecating
Llama 3.1 70B Instruct | | 131K | $0.100 | $0.100 | Available
Llama 3.1 8B Instruct | | 200K | $0.020 | $0.030 | Available
Llama 3.1 70B | | 128K | $0.600 | $0.600 | Available
Llama 3.1 8B | | 131K | $0.030 | $0.050 | Available
Llama 3 70B Instruct | | 131K | $0.120 | $0.300 | Available
Llama 3 8B Instruct | | 32K | $0.030 | $0.040 | Available
Llama 3.2 11B Instruct Vision | | | | | Current

Model IDs