Nemotron Nano 2 12B VL is a 12B-parameter vision-language model from NVIDIA's Nemotron Nano v2 series, enabling multi-image reasoning, video understanding, and document intelligence. It has a 131K-token context window and produces up to 4K output tokens. It is available from 2 providers, with pricing starting at $0.100 per 1M input tokens and $0.100 per 1M output tokens.
Specifications
Canonical ID: nvidia-nemotron-nano-2-12b-vl
Type: Language
Status: Active
Creator: NVIDIA
Providers:
Context Window: 131K tokens
Max Output: 4K tokens
Input Modalities: Image, Text, Video
Output Modalities: Text
Reasoning Efforts: default
Parameters: 12B
Release Date: 7 months ago
Deprecation Date:
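The context window and max output figures above can be turned into a simple pre-flight check before sending a request. The following is a minimal sketch, not a provider API: the function name `fits_budget` is hypothetical, "131K" and "4K" are read as 131,000 and 4,000 tokens (the exact limits may differ), and it assumes output tokens count against the context window.

```python
# Limits taken from the specification list above.
# Assumption: "131K" and "4K" are read as round thousands; exact limits may differ.
CONTEXT_WINDOW_TOKENS = 131_000
MAX_OUTPUT_TOKENS = 4_000


def fits_budget(prompt_tokens: int, requested_output_tokens: int) -> bool:
    """Return True if a request stays within the listed limits.

    Assumption: prompt and output tokens share the same context window.
    """
    if prompt_tokens < 0 or requested_output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    if requested_output_tokens > MAX_OUTPUT_TOKENS:
        return False
    return prompt_tokens + requested_output_tokens <= CONTEXT_WINDOW_TOKENS


if __name__ == "__main__":
    print(fits_budget(100_000, 4_000))
    print(fits_budget(130_000, 4_000))
```

Token counts depend on the tokenizer, so the prompt size passed in should come from the provider's own token counting where available.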
Benchmarks
Intelligence Index: 10.1 (#391)
Coding Index: 5.9 (#338)
Math Index: 26.7 (#185)
MMLU-Pro: 0.6 (#233)
GPQA: 0.4 (#339)
HLE: 0.0 (#335)
LiveCodeBench: 0.3 (#189)
IFBench: 0.3 (#353)
Time to First Token: 0.72s (#295)
SciCode: 0.2 (#354)
AIME 2025: 0.3 (#185)
LCR: 0.2 (#263)
TerminalBench Hard: 0.0 (#375)
TAU2: 0.2 (#306)
Output TPS

Capabilities

Input (3/5)
Supported: Text, Image, Video
Not supported: Audio, PDF
Output (1/5)
Supported: Text
Not supported: Image, Audio, Video, Embedding
Capabilities (3/13)
Supported: Reasoning, Function Calling, Structured Outputs
Not supported: Adaptive Reasoning, Parallel Function Calling, Native JSON Schema, Web Search, URL Context, Computer Use, Code Execution, File Search, Prompt Caching, Assistant Prefill

Pricing by Provider

Cost Calculator
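As a rough illustration of how the starting prices translate into per-request cost, here is a minimal sketch. It uses the lowest listed rates of $0.100 per 1M input tokens and $0.100 per 1M output tokens; the function name `estimate_cost_usd` is hypothetical, and actual provider rates may be higher.

```python
# Starting prices from the listing, in USD per 1M tokens.
# Assumption: these are the lowest listed rates; a given provider may charge more.
INPUT_USD_PER_MILLION = 0.100
OUTPUT_USD_PER_MILLION = 0.100


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate the cost of one request in USD at the starting rates."""
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    total = (
        input_tokens * INPUT_USD_PER_MILLION
        + output_tokens * OUTPUT_USD_PER_MILLION
    )
    return total / 1_000_000


if __name__ == "__main__":
    # A 100,000-token prompt with a 4,000-token reply.
    print(f"${estimate_cost_usd(100_000, 4_000):.4f}")
```

Because the input and output rates are equal at the starting price, cost here scales with the total token count: 104,000 tokens come to about $0.0104.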

Model IDs