Nemotron Nano 2 12B VL is NVIDIA's language model with a 131K context window and up to 4K output tokens, available from 2 providers, starting at $0.100 / 1M input and $0.100 / 1M output. A 12B-parameter vision-language model from NVIDIA's Nemotron Nano v2 series, enabling multi-image reasoning, video understanding, and document intelligence.
Specifications
Canonical IDnvidia-nemotron-nano-2-12b-vl
TypeLanguage
StatusActive
CreatorNVIDIANVIDIA
Providers
Context Window131K tokens
Max Output4K tokens
Input ModalitiesImageTextVideo
Output ModalitiesText
Reasoning Effortsdefault
Parameters12B
Release Date · 7 months ago
Deprecation Date
Benchmarks
Intelligence Index
10.1
#398
Coding Index
5.9
#343
Math Index
26.7
#185
MMLU-Pro
0.6
#233
GPQA
0.4
#344
HLE
0.0
#342
LiveCodeBench
0.3
#189
IFBench
0.3
#360
Time to First Token
0.69s
#297
SciCode
0.2
#359
AIME 2025
0.3
#185
LCR
0.2
#268
TerminalBench Hard
0.0
#380
TAU2
0.2
#313
Output TPS

Capabilities

Input3/5
Text
Image
Audio·
Video
PDF·
Output1/5
Text
Image·
Audio·
Video·
Embedding·
Capabilities3/13
Reasoning
Adaptive Reasoning·
Function Calling
Parallel Function Calling·
Structured Outputs
Native JSON Schema·
Web Search·
URL Context·
Computer Use·
Code Execution·
File Search·
Prompt Caching·
Assistant Prefill·

Pricing by Provider

Cost Calculator

Preset:

Model IDs