Llama 3.1 Nemotron 70B Instruct

Llama 3.1 Nemotron 70B Instruct is a text model served by DeepInfra, with a context window of 131K tokens and a maximum output of 131K tokens. On DeepInfra it costs $0.60 per million input tokens and $0.60 per million output tokens. The cheapest listed provider is Lambda, at $0.12 per million input tokens and $0.30 per million output tokens.

Capabilities

Vision, Function Calling, Reasoning, JSON Schema, System Messages, Web Search, Prompt Caching, Audio Input, Audio Output

Specifications

Model Key: deepinfra/nvidia/Llama-3.1-Nemotron-70B-Instruct
Provider: DeepInfra
Provider ID: deepinfra
Mode: Text
Canonical Name: llama-nemotron-3.1-70b
Context Window: 131K tokens
Max Output: 131K tokens

Pricing

| Type | Per 1K Tokens ($) | Per 1M Tokens ($) |
|---|---|---|
| Input Tokens | 0.000600 | 0.600 |
| Output Tokens | 0.000600 | 0.600 |
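The per-token prices above translate into a request cost by simple arithmetic. The sketch below is illustrative only: the function name `estimate_cost` and the example token counts are assumptions, and the prices are copied from the DeepInfra row of the table.

```python
# USD per 1M tokens, copied from the Pricing table (DeepInfra listing).
INPUT_PRICE_PER_M = 0.600
OUTPUT_PRICE_PER_M = 0.600


def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the estimated cost in USD for one request (hypothetical helper)."""
    total = input_tokens * INPUT_PRICE_PER_M + output_tokens * OUTPUT_PRICE_PER_M
    return total / 1_000_000


# Example workload (assumed numbers): 10,000 prompt tokens, 2,000 completion tokens.
cost = estimate_cost(input_tokens=10_000, output_tokens=2_000)
print(f"${cost:.4f}")  # prints "$0.0072"
```

Actual billing may differ from this estimate, for example if a provider rounds token counts or applies minimum charges.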

Benchmarks

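| Benchmark | Score | Rank |
|---|---|---|
| Intelligence Index | 13.4 | #147 |
| Coding Index | 10.8 | #128 |
| Math Index | 11.0 | #117 |
| MMLU-Pro | 0.7 | #118 |
| GPQA | 0.5 | #149 |
| HLE | 0.0 | #128 |
| LiveCodeBench | 0.2 | #153 |
| AIME | 0.2 | #56 |
| IFBench | 0.3 | #139 |
| Time to First Token | 0.57s | #137 |
| SciCode | 0.2 | #142 |
| MATH-500 | 0.7 | #86 |
| AIME 2025 | 0.1 | #117 |
| LCR | 0.1 | #136 |
| TerminalBench Hard | 0.0 | #107 |
| TAU2 | 0.2 | #113 |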

Price Comparison by Provider

Compare prices for Llama 3.1 Nemotron 70B Instruct across different providers. The same model may be available through multiple providers at different price points.

| Provider | Model Key | Input Price, $ | Output Price, $ |
|---|---|---|---|
| Lambda | lambda_ai/llama3.1-nemotron-70b-instruct-fp8 | 0.120 | 0.300 |
| Fireworks AI | fireworks_ai/accounts/fireworks/models/llama-v3p1-nemotron-70b-instruct | 0.900 | 0.900 |
| DeepInfra | deepinfra/nvidia/Llama-3.1-Nemotron-70B-Instruct | 0.600 | 0.600 |
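To compare providers for a specific workload rather than by list price alone, the figures above can be ranked programmatically. This is a minimal sketch that assumes the listed prices are in USD per 1M tokens (consistent with the DeepInfra row in the Pricing section). The names `PROVIDERS` and `rank_providers` are hypothetical.

```python
# (provider, input price, output price), copied from the comparison table.
# Assumption: prices are USD per 1M tokens.
PROVIDERS = [
    ("Lambda", 0.120, 0.300),
    ("Fireworks AI", 0.900, 0.900),
    ("DeepInfra", 0.600, 0.600),
]


def rank_providers(input_tokens: int, output_tokens: int) -> list[tuple[str, float]]:
    """Return (provider, estimated USD cost) pairs, cheapest first."""
    costs = [
        (name, (input_tokens * in_price + output_tokens * out_price) / 1_000_000)
        for name, in_price, out_price in PROVIDERS
    ]
    return sorted(costs, key=lambda pair: pair[1])


# Example workload (assumed numbers): 1M input tokens and 500K output tokens.
for name, cost in rank_providers(1_000_000, 500_000):
    print(f"{name}: ${cost:.2f}")
```

Because Lambda is cheapest on both input and output, it ranks first for any token mix. The ranking only becomes workload-dependent when providers trade off input price against output price.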

All Variants

All available versions, regions, and API endpoints for Llama 3.1 Nemotron 70B Instruct.

| Model Key | Provider | Mode | Input Price, $ | Output Price, $ | Context | Max Output | Vision | Functions |
|---|---|---|---|---|---|---|---|---|
| deepinfra/nvidia/Llama-3.1-Nemotron-70B-Instruct | DeepInfra | Text | 0.600 | 0.600 | 131K | 131K | no | yes |
| fireworks_ai/accounts/fireworks/models/llama-v3p1-nemotron-70b-instruct | Fireworks AI | Text | 0.900 | 0.900 | 131K | 131K | no | no |
| lambda_ai/llama3.1-nemotron-70b-instruct-fp8 | Lambda | Text | 0.120 | 0.300 | 131K | 131K | no | yes |
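Since capability flags differ between variants, it can help to filter on them before choosing a model key. The sketch below encodes the Vision and Functions columns from the listing above. The `Variant` class and the `with_function_calling` helper are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    model_key: str
    provider: str
    vision: bool
    functions: bool


# Rows copied from the All Variants listing.
VARIANTS = [
    Variant("deepinfra/nvidia/Llama-3.1-Nemotron-70B-Instruct", "DeepInfra", False, True),
    Variant(
        "fireworks_ai/accounts/fireworks/models/llama-v3p1-nemotron-70b-instruct",
        "Fireworks AI", False, False,
    ),
    Variant("lambda_ai/llama3.1-nemotron-70b-instruct-fp8", "Lambda", False, True),
]


def with_function_calling(variants: list[Variant]) -> list[str]:
    """Return the model keys of variants that list function calling support."""
    return [v.model_key for v in variants if v.functions]


print(with_function_calling(VARIANTS))
```

With the data above, this selects the DeepInfra and Lambda keys and excludes the Fireworks AI variant, which lists function calling as unsupported.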