Llama 3.1 Nemotron 70B Instruct is a 70B-parameter, instruction-tuned language model that NVIDIA fine-tuned from Llama 3.1 to improve helpfulness and response quality on user queries. It has a 131K-token context window and up to 16K output tokens, and is available from 2 providers, starting at $0.600 per 1M input tokens and $0.600 per 1M output tokens.
Specifications
Canonical ID: nvidia-llama-3-1-nemotron-70b-instruct
Type: Language
Status: Deprecated
Creator: NVIDIA
Providers
Context Window: 131K tokens
Max Output: 16K tokens
Input Modalities: Text
Output Modalities: Text
Parameters: 70B
HuggingFace Likes: 2,064
HuggingFace Downloads (30d): 9,963
HuggingFace Downloads (all-time): 1,815,075
Release Date: 2 years ago
Knowledge Cutoff
Deprecation Date
Benchmarks
Intelligence Index: 13.4 (#328)
Coding Index: 10.8 (#285)
Math Index: 11.0 (#222)
MMLU-Pro: 0.7 (#215)
GPQA: 0.5 (#333)
HLE: 0.0 (#324)
LiveCodeBench: 0.2 (#264)
AIME: 0.2 (#85)
IFBench: 0.3 (#323)
Time to First Token: 0.26s (#212)
SciCode: 0.2 (#309)
MATH-500: 0.7 (#118)
AIME 2025: 0.1 (#222)
LCR: 0.1 (#302)
TerminalBench Hard: 0.0 (#257)
TAU2: 0.2 (#272)
Output TPS

Capabilities

Input (1/5)
Supported: Text
Not supported: Image, Audio, Video, PDF

Output (1/5)
Supported: Text
Not supported: Image, Audio, Video, Embedding

Capabilities (2/13)
Supported: Function Calling, Structured Outputs
Not supported: Reasoning, Adaptive Reasoning, Parallel Function Calling, Native JSON Schema, Web Search, URL Context, Computer Use, Code Execution, File Search, Prompt Caching, Assistant Prefill

Pricing by Provider

Versions

| Version | Released | Context | Input / 1M | Output / 1M | Status |
| --- | --- | --- | --- | --- | --- |
| Llama 3.3 70B Instruct | | 131K | $0.100 | $0.200 | Available |
| Llama 3.2 3B Instruct | | 131K | $0.015 | $0.020 | Deprecated |
| Llama 3.2 1B Instruct | | 131K | $0.027 | $0.080 | Deprecated |
| Llama 3.1 405B Instruct | | 131K | $0.120 | $0.300 | Deprecating |
| Llama 3.1 70B Instruct | | 131K | $0.100 | $0.100 | Available |
| Llama 3.1 8B Instruct | | 200K | $0.020 | $0.030 | Available |
| Llama 3.1 70B | | 128K | $0.600 | $0.600 | Available |
| Llama 3.1 8B | | 131K | $0.030 | $0.050 | Available |
| Llama 3 70B Instruct | | 131K | $0.120 | $0.300 | Available |
| Llama 3 8B Instruct | | 32K | $0.030 | $0.040 | Available |
| Llama 3.1 Nemotron 70B Instruct | | 131K | $0.600 | $0.600 | Current |
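
The per-1M-token prices in the Versions table translate into a per-request cost by scaling each price by the number of tokens used. The Python sketch below illustrates that arithmetic for two rows of the table. The `PRICES` dictionary and the `request_cost` helper are hypothetical names introduced here for illustration, the token counts are made-up example values, and the calculation assumes providers bill linearly per token with no minimums, caching discounts, or other fees.

```python
from decimal import Decimal

# USD per 1M tokens as (input, output), copied from the Versions table.
# PRICES and request_cost are illustrative names, not part of any provider API.
PRICES = {
    "Llama 3.1 Nemotron 70B Instruct": (Decimal("0.600"), Decimal("0.600")),
    "Llama 3.3 70B Instruct": (Decimal("0.100"), Decimal("0.200")),
}

TOKENS_PER_PRICE_UNIT = Decimal(1_000_000)


def request_cost(model: str, input_tokens: int, output_tokens: int) -> Decimal:
    """Return the estimated USD cost of one request, assuming linear per-token billing."""
    input_price, output_price = PRICES[model]
    total = input_price * input_tokens + output_price * output_tokens
    return total / TOKENS_PER_PRICE_UNIT


if __name__ == "__main__":
    # Example request: 100,000 input tokens and 16,000 output tokens (made-up values).
    for model in PRICES:
        print(model, request_cost(model, 100_000, 16_000))
```

Using `Decimal` rather than `float` keeps small per-request amounts exact when many of them are summed, which is one reasonable choice for billing-style arithmetic.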

Model IDs