Name: Hermes 3 Llama 3.1 405B
Brand: Nous Research

Hermes 3 Llama 3.1 405B is a 405B-parameter, Llama 3.1-based language model from Nous Research, fine-tuned for advanced roleplaying, reasoning, and agentic multi-turn conversation. It has a 131K-token context window and supports up to 16K output tokens. It is available from 3 providers, starting at $1.00 / 1M input tokens and $1.00 / 1M output tokens.

Specifications
Canonical ID	`nousresearch-hermes-3-llama-3-1-405b`
Type	Language
Status	Deprecated
Creator	Nous Research
Providers	DeepInfra, Nebius, OpenRouter
Context Window	131K tokens
Max Output	16K tokens
Input Modalities	Text
Output Modalities	Text
Parameters	405B
Release Date	2024-08-16 · 2 years ago
Knowledge Cutoff	2023-12-31 · 3 years ago
Deprecation Date	2026-07-19

Capabilities

Input (1/5)
Text	✓
Image	·
Audio	·
Video	·
PDF	·

Output (1/5)
Text	✓
Image	·
Audio	·
Video	·
Embedding	·

Capabilities (3/13)
Reasoning	·
Adaptive Reasoning	·
Function Calling	✓
Parallel Function Calling	·
Structured Outputs	✓
Native JSON Schema	✓
Web Search	·
URL Context	·
Computer Use	·
Code Execution	·
File Search	·
Prompt Caching	·
Assistant Prefill	·

Pricing by Provider

Standard pricing, in US dollars ($) per 1M tokens:

Provider	Input $ / 1M	Output $ / 1M
DeepInfra `deepinfra/NousResearch/Hermes-3-Llama-3.1-405B`	$1.00	$1.00
Nebius `nebius/NousResearch/Hermes-3-Llama-3.1-405B`	$1.00	$3.00
OpenRouter `nousresearch/hermes-3-llama-3.1-405b`	$1.00	$1.00

Cost Calculator

Inputs: input tokens, output tokens, and number of calls. Results are in US dollars ($).
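The calculator's inputs map onto a simple linear estimate. The sketch below is not the site's implementation; it assumes cost scales linearly with token counts and call count, and uses the per-1M prices from the provider table above. The names `PRICES_PER_1M` and `estimate_cost` are hypothetical.

```python
# Per-1M-token prices (USD) copied from the "Pricing by Provider" table:
# provider -> (input price, output price).
PRICES_PER_1M = {
    "DeepInfra": (1.00, 1.00),
    "Nebius": (1.00, 3.00),
    "OpenRouter": (1.00, 1.00),
}


def estimate_cost(provider, input_tokens, output_tokens, calls=1):
    """Estimate USD cost for `calls` identical requests (assumed linear pricing)."""
    input_price, output_price = PRICES_PER_1M[provider]
    per_call = (input_tokens * input_price + output_tokens * output_price) / 1_000_000
    return per_call * calls


if __name__ == "__main__":
    # 1M input tokens and 500K output tokens, one call per provider.
    for name in PRICES_PER_1M:
        print(name, round(estimate_cost(name, 1_000_000, 500_000), 2))
```

With 1,000,000 input tokens and 500,000 output tokens, this yields $1.50 on DeepInfra and OpenRouter and $2.50 on Nebius, which reflects Nebius's higher output price.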

Cheapest Instances to Run It

Cloud GPU instances that can host Hermes 3 Llama 3.1 405B, ranked by cheapest on-demand price. The model needs about 972 GB of GPU memory at FP16 precision (estimated from its parameter count), so treat the fit as guidance rather than a guarantee.
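The page does not say how the 972 GB figure is derived. One way to reproduce it, offered here as an assumption, is 2 bytes per parameter at FP16 plus a 20% allowance for runtime overhead. The function name `estimate_vram_gb` and the `overhead` factor are hypothetical.

```python
def estimate_vram_gb(params_billion, bytes_per_param=2.0, overhead=1.2):
    """Rough GPU memory estimate in GB (1 GB = 1e9 bytes).

    params_billion: parameter count in billions (405 for this model).
    bytes_per_param: 2.0 for FP16 weights.
    overhead: assumed multiplier for KV cache, activations, and buffers.
    """
    return round(params_billion * bytes_per_param * overhead, 1)


if __name__ == "__main__":
    print(estimate_vram_gb(405, overhead=1.0))  # weights only: 810.0
    print(estimate_vram_gb(405))                # with assumed 20% overhead: 972.0
```

Actual memory use depends on batch size, sequence length, and the serving stack, so the result is a planning number only.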

Showing all clouds at FP16 (unquantized) precision; prices in US dollars ($).

Instance	Cloud	GPU	VRAM	Price	Cheapest region
Standard_ND96isr_MI300X_v5	Azure	8× AMD Instinct MI300X	1536 GB	$48.00/hr	eastus2
p5en.48xlarge	AWS	8× NVIDIA H200 (141 GB)	1128 GB	$63.30/hr	us-east-1
Standard_ND96isr_H200_v5	Azure	8× NVIDIA H200 (141 GB)	1128 GB	$84.80/hr	eastus2
3 more instances can run Hermes 3 Llama 3.1 405B.

Versions

Version	Released	Context	Input / 1M	Output / 1M	Status
Llama 3.3 70B Instruct	2024-12-06	131K	$0.120	$0.200	Deprecated
Llama 3.2 3B Instruct	2024-09-25	131K	$0.015	$0.020	Deprecated
Llama 3.2 1B Instruct	2024-09-25	128K	$0.027	$0.080	Deprecated
Llama 3.2 11B	2024-09-25	128K	$0.160	$0.160	Available
Llama 3.1 405B Instruct	2024-07-23	131K	$0.120	$0.300	Deprecated
Llama 3.1 8B Instruct	2024-07-23	200K	$0.020	$0.030	Deprecated
Llama 3.1 70B Instruct	2024-07-23	131K	$0.120	$0.300	Available
Llama 3.1 70B	2024-07-23	128K	$0.360	$0.360	Available
Llama 3.1 8B	2024-07-23	131K	$0.030	$0.050	Available
Llama 3 70B Instruct	2024-04-23	131K	$0.120	$0.300	Deprecated
Hermes 3 Llama 3.1 405B	2024-08-16	131K	$1.00	$1.00	Current

Model IDs

deepinfra/NousResearch/Hermes-3-Llama-3.1-405B

nebius/NousResearch/Hermes-3-Llama-3.1-405B

nousresearch-hermes-3-llama-3-1-405b

nousresearch/hermes-3-llama-3.1-405b

NousResearch/Hermes-3-Llama-3.1-405B

nousresearch/hermes-3-llama-3.1-405b:free

Hermes 3 Llama 3.1 405B

Capabilities API	GET /api/v1/models/nousresearch-hermes-3-llama-3-1-405b
Pricing by Provider API	GET /api/v1/models/nousresearch-hermes-3-llama-3-1-405b/pricing
Cost Calculator API	GET /api/v1/models/nousresearch-hermes-3-llama-3-1-405b/pricing/calculate?input_tokens=1000000&output_tokens=500000
Cheapest Instances to Run It API	GET /api/v1/models/nousresearch-hermes-3-llama-3-1-405b/instances
Versions API	GET /api/v1/models?family=llama
Model IDs API	GET /api/v1/models/nousresearch-hermes-3-llama-3-1-405b
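The endpoint paths above can be assembled programmatically. The following is a minimal sketch that only builds the Cost Calculator URL and does not send a request. The host in `BASE_URL` is a placeholder, since the page lists paths but no host, and the helper name `calculate_url` is hypothetical.

```python
from urllib.parse import urlencode

# Placeholder host: the page gives only paths, so substitute the real API host.
BASE_URL = "https://api.example.invalid"
MODEL_ID = "nousresearch-hermes-3-llama-3-1-405b"


def calculate_url(input_tokens, output_tokens, base_url=BASE_URL, model_id=MODEL_ID):
    """Build the Cost Calculator URL with its two query parameters."""
    query = urlencode({"input_tokens": input_tokens, "output_tokens": output_tokens})
    return f"{base_url}/api/v1/models/{model_id}/pricing/calculate?{query}"


if __name__ == "__main__":
    print(calculate_url(1_000_000, 500_000))
```

Authentication requirements and the response format are not described on this page, so they are left out of the sketch.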
