Nemotron Nano 2 12B VL is a 12B-parameter vision-language model from NVIDIA's Nemotron Nano v2 series, with a 131K-token context window and up to 4K output tokens. It supports multi-image reasoning, video understanding, and document intelligence. It is available from 2 providers, with pricing starting at $0.100 / 1M input tokens and $0.100 / 1M output tokens.
Specifications
Canonical ID: nvidia-nemotron-nano-2-12b-vl
Type: Language
Status: Active
Creator: NVIDIA
Providers: Fireworks AI, Vercel AI Gateway
Context Window: 131K tokens
Max Output: 4K tokens
Input Modalities: Image, Text, Video
Output Modalities: Text
Reasoning Efforts: default
Parameters: 12B
Release Date: 7 months ago
Deprecation Date
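
The context window and output cap listed above can be turned into a simple pre-flight check before sending a request. The sketch below is illustrative only: the exact limits are assumptions (the page rounds them to "131K" and "4K", so conservative values of 131,000 and 4,000 are used), the helper name `max_allowed_output` is hypothetical, and it assumes prompt and output tokens share the same context window.

```python
# Assumed limits: the page rounds to "131K" and "4K"; exact values may differ.
CONTEXT_WINDOW_TOKENS = 131_000
MAX_OUTPUT_TOKENS = 4_000


def max_allowed_output(prompt_tokens: int) -> int:
    """Return how many output tokens can be requested for a given prompt size.

    Assumes prompt and output tokens count against the same context window.
    """
    if prompt_tokens < 0:
        raise ValueError("prompt_tokens must be non-negative")
    remaining = CONTEXT_WINDOW_TOKENS - prompt_tokens
    # Never exceed the output cap, and never return a negative budget.
    return max(0, min(MAX_OUTPUT_TOKENS, remaining))
```

A prompt of 129,000 tokens, for example, leaves room for only 2,000 output tokens under these assumptions, while a short prompt is limited by the 4K output cap instead.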
Benchmarks
Intelligence Index: 10.1 (rank #387)
Coding Index: 5.9 (rank #334)
Math Index: 26.7 (rank #185)
MMLU-Pro: 0.6 (rank #233)
GPQA: 0.4 (rank #335)
HLE: 0.0 (rank #331)
LiveCodeBench: 0.3 (rank #189)
IFBench: 0.3 (rank #349)
Time to First Token: 0.71s (rank #292)
SciCode: 0.2 (rank #350)
AIME 2025: 0.3 (rank #185)
LCR: 0.2 (rank #259)
TerminalBench Hard: 0.0 (rank #371)
TAU2: 0.2 (rank #302)
Output TPS: no value listed

Capabilities

Input (3 of 5 supported)
Supported: Text, Image, Video
Not supported: Audio, PDF
Output (1 of 5 supported)
Supported: Text
Not supported: Image, Audio, Video, Embedding
Capabilities (3 of 13 supported)
Supported: Reasoning, Function Calling, Structured Outputs
Not supported: Adaptive Reasoning, Parallel Function Calling, Native JSON Schema, Web Search, URL Context, Computer Use, Code Execution, File Search, Prompt Caching, Assistant Prefill

Pricing by Provider

Standard pricing, in $ per 1M tokens:
Fireworks AI (fireworks_ai/accounts/fireworks/models/nemotron-nano-v2-12b-vl): $0.100 input, $0.100 output
Vercel AI Gateway (nvidia/nemotron-nano-12b-v2-vl): $0.200 input, $0.600 output
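
The per-provider prices above translate into a request cost with simple arithmetic: tokens times the per-1M price, divided by one million. The following is a minimal sketch of that calculation. The prices are copied from the listing above, while the function name `estimate_cost` and the example token counts are hypothetical and chosen only for illustration.

```python
from decimal import Decimal

# Per-1M-token prices in dollars, copied from the pricing listed above.
PRICES = {
    "Fireworks AI": {"input": Decimal("0.100"), "output": Decimal("0.100")},
    "Vercel AI Gateway": {"input": Decimal("0.200"), "output": Decimal("0.600")},
}

TOKENS_PER_PRICE_UNIT = Decimal(1_000_000)


def estimate_cost(provider: str, input_tokens: int, output_tokens: int) -> Decimal:
    """Estimate the dollar cost of one request at a provider's standard rates."""
    rates = PRICES[provider]
    total = rates["input"] * input_tokens + rates["output"] * output_tokens
    return total / TOKENS_PER_PRICE_UNIT


if __name__ == "__main__":
    # Hypothetical workload: a 100,000-token prompt with a 4,000-token reply.
    for name in PRICES:
        print(f"{name}: ${estimate_cost(name, 100_000, 4_000)}")
```

Because the two providers price output tokens very differently ($0.100 versus $0.600 per 1M), the gap between them widens as the share of output tokens in a workload grows.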

Model IDs