Llama3.1 Nemotron 70B Instruct Fp8

Llama3.1 Nemotron 70B Instruct Fp8 is a text model served by Lambda, with a context window of 131K tokens and a maximum output of 131K tokens. Pricing starts at $0.12 per million input tokens and $0.30 per million output tokens; Lambda is the cheapest listed provider.

Capabilities

Vision, Function Calling, Reasoning, JSON Schema, System Messages, Web Search, Prompt Caching, Audio Input, Audio Output

Specifications

Model Key: lambda_ai/llama3.1-nemotron-70b-instruct-fp8
Provider: Lambda
Provider ID: lambda_ai
Mode: Text
Canonical Name: llama-nemotron-3.1-70b
Context Window: 131K tokens
Max Output: 131K tokens

Pricing

Type            Per 1K Tokens   Per 1M Tokens
Input Tokens    0.000120        0.120
Output Tokens   0.000300        0.300
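
As a worked example of the rates above, the following sketch estimates the cost of a single request. The helper name `estimate_cost` and the token counts are illustrative assumptions, not part of any provider API, and the rates are assumed to be US dollars per 1M tokens.

```python
# Per-1M-token rates copied from the pricing table (assumed to be US dollars).
INPUT_PRICE_PER_1M = 0.120
OUTPUT_PRICE_PER_1M = 0.300


def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimate the cost of one request (hypothetical helper, not a provider API)."""
    input_cost = input_tokens / 1_000_000 * INPUT_PRICE_PER_1M
    output_cost = output_tokens / 1_000_000 * OUTPUT_PRICE_PER_1M
    return input_cost + output_cost


if __name__ == "__main__":
    # Illustrative workload: 10,000 prompt tokens and 2,000 completion tokens.
    print(f"{estimate_cost(10_000, 2_000):.6f}")
```

Because output tokens cost 2.5 times as much as input tokens at these rates, long completions dominate the bill faster than long prompts do.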

Benchmarks

Intelligence Index: 13.4 (#147)
Coding Index: 10.8 (#128)
Math Index: 11.0 (#117)
MMLU-Pro: 0.7 (#118)
GPQA: 0.5 (#149)
HLE: 0.0 (#128)
LiveCodeBench: 0.2 (#153)
AIME: 0.2 (#56)
IFBench: 0.3 (#139)
Time to First Token: 0.57s (#137)
SciCode: 0.2 (#142)
MATH-500: 0.7 (#86)
AIME 2025: 0.1 (#117)
LCR: 0.1 (#136)
TerminalBench Hard: 0.0 (#107)
TAU2: 0.2 (#113)

Price Comparison by Provider

Compare prices for Llama3.1 Nemotron 70B Instruct Fp8 across different providers. The same model may be available through multiple providers at different price points.

Provider      Model Key                                                                Input Price, $  Output Price, $
Lambda        lambda_ai/llama3.1-nemotron-70b-instruct-fp8                             0.120           0.300
Fireworks AI  fireworks_ai/accounts/fireworks/models/llama-v3p1-nemotron-70b-instruct  0.900           0.900
DeepInfra     deepinfra/nvidia/Llama-3.1-Nemotron-70B-Instruct                         0.600           0.600
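
To see how the listed prices play out for a concrete workload, the sketch below ranks the three providers by total cost. It assumes the prices in the comparison are US dollars per 1M tokens, which matches the Lambda row of the Pricing section; the function names and the workload size are hypothetical.

```python
from typing import Dict, List, Tuple

# (input price, output price) copied from the comparison above.
# Assumption: both values are US dollars per 1M tokens.
PROVIDER_PRICES: Dict[str, Tuple[float, float]] = {
    "Lambda": (0.120, 0.300),
    "Fireworks AI": (0.900, 0.900),
    "DeepInfra": (0.600, 0.600),
}


def workload_cost(provider: str, input_tokens: int, output_tokens: int) -> float:
    """Total cost of a workload at one provider (hypothetical helper)."""
    input_price, output_price = PROVIDER_PRICES[provider]
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


def rank_providers(input_tokens: int, output_tokens: int) -> List[Tuple[str, float]]:
    """Return (provider, cost) pairs sorted from cheapest to most expensive."""
    costs = [
        (name, workload_cost(name, input_tokens, output_tokens))
        for name in PROVIDER_PRICES
    ]
    return sorted(costs, key=lambda pair: pair[1])


if __name__ == "__main__":
    # Illustrative monthly workload: 5M input tokens and 1M output tokens.
    for name, cost in rank_providers(5_000_000, 1_000_000):
        print(f"{name}: {cost:.2f}")
```

Price is only one axis of comparison: the listing gives no indication that the providers serve the same quantization, so a cost ranking like this says nothing about output quality or latency.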

All Variants

All available versions, regions, and API endpoints for Llama3.1 Nemotron 70B Instruct Fp8.

Model Key                                                                Provider      Mode  Input Price, $  Output Price, $  Context  Max Output  Vision  Functions
deepinfra/nvidia/Llama-3.1-Nemotron-70B-Instruct                         DeepInfra     Text  0.600           0.600            131K     131K        no      yes
fireworks_ai/accounts/fireworks/models/llama-v3p1-nemotron-70b-instruct  Fireworks AI  Text  0.900           0.900            131K     131K        no      no
lambda_ai/llama3.1-nemotron-70b-instruct-fp8                             Lambda        Text  0.120           0.300            131K     131K        no      yes
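
The variants differ in function-calling support as well as price, so a selection step usually filters on capability before comparing cost. The sketch below shows one way to do that. The `Variant` class and `cheapest_with_functions` helper are hypothetical, and ranking by the sum of input and output price is an arbitrary simplification rather than a recommended metric.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Variant:
    """One row of the variants listing (hypothetical container type)."""
    model_key: str
    provider: str
    input_price: float   # as listed; assumed US dollars per 1M tokens
    output_price: float  # as listed; assumed US dollars per 1M tokens
    vision: bool
    functions: bool


VARIANTS: List[Variant] = [
    Variant("deepinfra/nvidia/Llama-3.1-Nemotron-70B-Instruct",
            "DeepInfra", 0.600, 0.600, False, True),
    Variant("fireworks_ai/accounts/fireworks/models/llama-v3p1-nemotron-70b-instruct",
            "Fireworks AI", 0.900, 0.900, False, False),
    Variant("lambda_ai/llama3.1-nemotron-70b-instruct-fp8",
            "Lambda", 0.120, 0.300, False, True),
]


def cheapest_with_functions(variants: List[Variant]) -> Optional[Variant]:
    """Return the cheapest variant that supports function calling, if any."""
    candidates = [v for v in variants if v.functions]
    if not candidates:
        return None
    # Simplification: compare by input price plus output price.
    return min(candidates, key=lambda v: v.input_price + v.output_price)


if __name__ == "__main__":
    choice = cheapest_with_functions(VARIANTS)
    print(choice.model_key if choice else "no variant supports function calling")
```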