FriendliAI AI models — pricing & benchmarks

FriendliAI is the Frontier AI Inference Cloud. Built by the researchers who invented the continuous batching technique that is now industry standard, FriendliAI provides AI engineers with a highly optimized engine that constantly evolves to efficiently run state-of-the-art open-weight and custom models at production scale. By maximizing GPU utilization, FriendliAI delivers speeds up to 3x faster than vLLM, and 50% to 90% cost savings relative to closed model APIs. FriendliAI empowers engineers to deploy frontier AI with uncompromising speed, model ownership, and enterprise-grade reliability.

Inference platform · OpenAI-compatible API · High Throughput · Low Latency · Open Source

Intelligence vs Price

Best value among FriendliAI models on this chart: Llama 3.1 8B Instruct. Prices use each model's lowest available FriendliAI price across regions.
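The page does not say how "best value" is computed. As a minimal sketch, assuming value means intelligence score per dollar of blended price (the mean of input and output price), the ranking can be reproduced from the figures in the model table on this page. The names `MODELS` and `value_score` are hypothetical, and the metric itself is an assumption rather than the chart's documented method.

```python
# (model, intelligence score, input $/1M tokens, output $/1M tokens)
# Values copied from the FriendliAI model table on this page.
MODELS = [
    ("Llama 3.1 8B Instruct", 7.6, 0.1, 0.1),
    ("Llama 3.1 70B Instruct", 6.8, 0.6, 0.6),
]


def value_score(intelligence: float, input_price: float, output_price: float) -> float:
    """Assumed metric: intelligence points per dollar of blended price."""
    blended = (input_price + output_price) / 2  # simple mean, an assumption
    return intelligence / blended


# Sort models from best to worst value under the assumed metric.
ranked = sorted(MODELS, key=lambda m: value_score(m[1], m[2], m[3]), reverse=True)

for name, intelligence, price_in, price_out in ranked:
    print(f"{name}: {value_score(intelligence, price_in, price_out):.1f} points per $")
```

With these two models the choice of metric hardly matters: Llama 3.1 8B Instruct has both the higher listed intelligence score and the lower price, so it ranks first under any reasonable definition.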

FriendliAI models

2 models in Global, 2 with pricing

Prices are in US dollars ($) per 1M tokens.

Model	Creator	Input Price ($/1M tokens)	Output Price ($/1M tokens)	Context	Max Output	Inference Providers	Intelligence (rank)	Coding (rank)
Llama 3.1 8B Instruct	Meta	0.1	0.1	200K	128K	21	7.6 (#1)	5.4 (#1)
Llama 3.1 70B Instruct	Meta	0.6	0.6	131K	16K	13	6.8 (#2)	N/A
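
Because prices are quoted per 1M tokens, the cost of a single request is the token count times the price, divided by one million. The sketch below shows that arithmetic using the listed prices; `PRICES` and `estimate_cost` are hypothetical names, and the token counts in the usage lines are illustrative only.

```python
from decimal import Decimal

# USD per 1M tokens, copied from the table above.
PRICES = {
    "Llama 3.1 8B Instruct": {"input": Decimal("0.1"), "output": Decimal("0.1")},
    "Llama 3.1 70B Instruct": {"input": Decimal("0.6"), "output": Decimal("0.6")},
}

TOKENS_PER_PRICE_UNIT = Decimal(1_000_000)


def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> Decimal:
    """Return the estimated USD cost of one request at the listed prices."""
    price = PRICES[model]
    total = price["input"] * input_tokens + price["output"] * output_tokens
    return total / TOKENS_PER_PRICE_UNIT


# Illustrative request: 2,000 prompt tokens and 500 completion tokens.
for model in PRICES:
    print(model, estimate_cost(model, 2_000, 500))
```

`Decimal` is used instead of `float` so that small per-request amounts add up without binary rounding drift when summed over many requests.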

FriendliAI models API: GET /api/v1/providers/friendliai/models