Name: Llama 3.1 Nemotron 1 Ultra 253B
Brand: NVIDIA

Llama 3.1 Nemotron 1 Ultra 253B is NVIDIA's 253B-parameter language model with a 128K-token context window, priced from $0.60 / 1M input tokens and $1.80 / 1M output tokens. NVIDIA fine-tuned it from Llama 3.1 and optimized it for advanced reasoning and high-accuracy agentic tasks.

Specifications
Canonical ID	`nvidia-llama-3-1-nemotron-1-ultra-253b`
Type	Language
Status	Active
Creator	NVIDIA
Providers	Nebius
Context Window	128K tokens
Input Modalities	Text
Output Modalities	Text
Parameters	253B

Capabilities
Group	Supported	Not supported
Input (1/5)	Text	Image, Audio, Video, PDF
Output (1/5)	Text	Image, Audio, Video, Embedding
Capabilities (1/13)	Function Calling	Reasoning, Adaptive Reasoning, Parallel Function Calling, Structured Outputs, Native JSON Schema, Web Search, URL Context, Computer Use, Code Execution, File Search, Prompt Caching, Assistant Prefill

Pricing by Provider

Prices in US Dollars ($) per 1M tokens, standard tier.

Provider	Provider model ID	Input $ / 1M	Output $ / 1M
Nebius	`nebius/nvidia/Llama-3.1-Nemotron-Ultra-253B-v1`	$0.60	$1.80

Cost Calculator

The calculator takes three inputs: the number of input tokens, the number of output tokens, and the number of calls. Results are in US Dollars ($).
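The same arithmetic can be sketched offline. This is a minimal illustration, not the site's calculator: it assumes cost scales linearly with token counts at the Nebius prices listed on this page, with no rounding, minimums, or discounts. The function name `estimate_cost` and its parameters are hypothetical.

```python
# Nebius prices from this page, in USD per 1M tokens.
INPUT_USD_PER_M = 0.60
OUTPUT_USD_PER_M = 1.80


def estimate_cost(input_tokens: int, output_tokens: int, calls: int = 1) -> float:
    """Estimate the USD cost of `calls` identical requests.

    Assumes purely linear per-token billing (an assumption, not a
    documented billing rule).
    """
    per_call = (
        input_tokens * INPUT_USD_PER_M + output_tokens * OUTPUT_USD_PER_M
    ) / 1_000_000
    return per_call * calls


# 1M input tokens and 500K output tokens in a single call.
print(f"${estimate_cost(1_000_000, 500_000):.2f}")  # prints $1.50
```

Passing `calls=1000` with per-request token counts gives a rough batch budget under the same assumptions.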

Cheapest Instances to Run It

Cloud GPU instances that can host Llama 3.1 Nemotron 1 Ultra 253B, ranked by cheapest on-demand price. The model needs an estimated 607 GB of GPU memory at FP16 precision. Because that figure is derived from the parameter count, treat the fit as guidance rather than a guarantee.
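One way to arrive at a figure of this size is shown below. The page does not state its formula. The 20% overhead factor is an assumption, chosen because 253B parameters at 2 bytes each is 506 GB and 506 GB × 1.2 ≈ 607 GB. The function name `estimate_vram_gb` is hypothetical.

```python
def estimate_vram_gb(
    params_billion: float, bytes_per_param: float, overhead: float = 0.20
) -> float:
    """Rough GPU memory estimate in GB.

    Weights: 1B parameters at 1 byte each is about 1 GB.
    `overhead` is an assumed allowance for activations, KV cache, and
    runtime buffers; the 0.20 default is inferred, not documented.
    """
    weights_gb = params_billion * bytes_per_param
    return weights_gb * (1 + overhead)


# FP16 stores each parameter in 2 bytes.
print(round(estimate_vram_gb(253, 2)))  # prints 607
```

Actual usage depends on batch size, sequence length, and the serving stack, so a measured figure can differ from this estimate.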

Filters: all clouds, FP16 (unquantized), US Dollar ($).

Instance	Cloud	GPU	VRAM	Price	Cheapest region
p4de.24xlarge	AWS	8× A100	640 GB	$27.45/hr	us-east-1
Standard_ND96amsr_A100_v4	Azure	8× NVIDIA A100 (80GB)	640 GB	$32.77/hr	westus2
g7e.48xlarge	AWS	8× RTX PRO Server 6000	768 GB	$33.14/hr	us-east-1
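The ranking can be reproduced from the three listed rows. This is a sketch only: it assumes an instance fits when its total VRAM is at least the 607 GB estimate, and it ignores region availability and interconnect. The `Instance` type and `cheapest_fitting` function are hypothetical names.

```python
from typing import NamedTuple


class Instance(NamedTuple):
    name: str
    cloud: str
    vram_gb: int
    usd_per_hour: float


# Rows copied from the instance table on this page.
INSTANCES = [
    Instance("p4de.24xlarge", "AWS", 640, 27.45),
    Instance("Standard_ND96amsr_A100_v4", "Azure", 640, 32.77),
    Instance("g7e.48xlarge", "AWS", 768, 33.14),
]


def cheapest_fitting(instances, required_gb):
    """Return instances with at least `required_gb` of VRAM, cheapest first."""
    fitting = [i for i in instances if i.vram_gb >= required_gb]
    return sorted(fitting, key=lambda i: i.usd_per_hour)


REQUIRED_GB = 607  # FP16 estimate quoted on this page
for inst in cheapest_fitting(INSTANCES, REQUIRED_GB):
    headroom = inst.vram_gb - REQUIRED_GB
    print(f"{inst.name} ({inst.cloud}): ${inst.usd_per_hour:.2f}/hr, {headroom} GB headroom")
```

The 640 GB instances leave only 33 GB of headroom over the estimate, which is why the fit is best treated as guidance.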

Versions

Version	Released	Context	Input / 1M	Output / 1M	Status
Llama 3.1 Nemotron 1 Ultra 253B	—	128K	$0.600	$1.80	Current
Llama 3.1 Nemotron 1 Ultra 253B Reasoning	—	—	—	—	Available

Other Models

Model	Tier	Released	Context	Input / 1M	Output / 1M
Llama 3.3 70B Instruct	—	2024-12-06	131K	$0.120	$0.200
Llama 3.2 3B Instruct	—	2024-09-25	131K	$0.015	$0.020
Llama 3.2 1B Instruct	—	2024-09-25	131K	$0.027	$0.080
Llama 3.2 11B	—	2024-09-25	128K	$0.160	$0.160
Llama 3.1 405B Instruct	—	2024-07-23	131K	$0.120	$0.300
Llama 3.1 8B Instruct	—	2024-07-23	200K	$0.020	$0.030
Llama 3.1 70B Instruct	—	2024-07-23	131K	$0.120	$0.300
Llama 3.1 70B	—	2024-07-23	128K	$0.360	$0.360
Llama 3.1 8B	—	2024-07-23	131K	$0.030	$0.050
Llama 3 70B Instruct	—	2024-04-23	131K	$0.120	$0.300

Model IDs

llama-3-1-nemotron-ultra-253b-v1-reasoning

nebius/nvidia/Llama-3.1-Nemotron-Ultra-253B-v1

nvidia-llama-3-1-nemotron-1-ultra-253b

Llama 3.1 Nemotron 1 Ultra 253B

Section	API method	Endpoint
Capabilities	GET	`/api/v1/models/nvidia-llama-3-1-nemotron-1-ultra-253b`
Pricing by Provider	GET	`/api/v1/models/nvidia-llama-3-1-nemotron-1-ultra-253b/pricing`
Cost Calculator	GET	`/api/v1/models/nvidia-llama-3-1-nemotron-1-ultra-253b/pricing/calculate?input_tokens=1000000&output_tokens=500000`
Cheapest Instances to Run It	GET	`/api/v1/models/nvidia-llama-3-1-nemotron-1-ultra-253b/instances`
Versions	GET	`/api/v1/models?family=llama`
Other Models	GET	`/api/v1/models/nvidia-llama-3-1-nemotron-1-ultra-253b/similar`
Model IDs	GET	`/api/v1/models/nvidia-llama-3-1-nemotron-1-ultra-253b`
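The Cost Calculator path above can be assembled programmatically. The page lists only paths, so the host below is a placeholder, and authentication and the response format are not shown because they are not documented here. `BASE_URL` and `calculate_url` are hypothetical names.

```python
from urllib.parse import urlencode, urljoin

# Placeholder host: the page gives endpoint paths but no base URL.
BASE_URL = "https://api.example.invalid"
MODEL_ID = "nvidia-llama-3-1-nemotron-1-ultra-253b"


def calculate_url(input_tokens: int, output_tokens: int) -> str:
    """Build the Cost Calculator URL using the query parameters shown on this page."""
    path = f"/api/v1/models/{MODEL_ID}/pricing/calculate"
    query = urlencode({"input_tokens": input_tokens, "output_tokens": output_tokens})
    return f"{urljoin(BASE_URL, path)}?{query}"


print(calculate_url(1_000_000, 500_000))
```

Substituting the real host yields a URL that can be passed to any HTTP client as a GET request.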
