FriendliAI is The Frontier AI Inference Cloud. Built by the researchers who invented the continuous batching technique that is now industry standard, FriendliAI provides AI engineers with a highly optimized engine that constantly evolves to efficiently run state-of-the-art open-weight and custom models at production scale. By maximizing GPU utilization, FriendliAI delivers speeds up to 3x faster than vLLM, and 50% to 90% cost savings relative to closed model APIs. FriendliAI empowers engineers to deploy frontier AI with uncompromising speed, model ownership, and enterprise-grade reliability.

Inference platform · OpenAI-compatible API · High Throughput · Low Latency · Open Source

Intelligence vs Price

Best value among FriendliAI models on this chart: Llama 3.1 70B Instruct · Llama 3.1 8B Instruct.

[Chart: language models plotted by Intelligence (y-axis) against Blended Price, $ (x-axis)]

FriendliAI models

2 models, 2 with pricing
Prices in US dollars ($) per 1M tokens.
Model                   Creator  Input Price, $  Output Price, $  Context  Max Output  Inference Providers  Intelligence  Coding
Llama 3.1 70B Instruct  Meta     0.12            0.31             31K      16K         13                   12.5 (#1)     10.9 (#1)
Llama 3.1 8B Instruct   Meta     0.02            0.03             200K     128K        21                   11.8 (#2)     4.9 (#2)
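
The chart plots a single "blended" price while the table lists input and output prices separately. The page does not say how the blend is computed, so the sketch below assumes a weighted average with a 3:1 input-to-output token ratio. That ratio is only an illustrative assumption, and the function names `blended_price` and `request_cost` are hypothetical helpers, not part of any FriendliAI API. The example figures are the Llama 3.1 8B Instruct prices listed above.

```python
def blended_price(input_per_m: float, output_per_m: float,
                  input_weight: float = 3.0, output_weight: float = 1.0) -> float:
    """Weighted average of input and output prices, in $ per 1M tokens.

    The default 3:1 weighting is an assumption; the source page does not
    state which ratio its chart uses.
    """
    total = input_weight + output_weight
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return (input_per_m * input_weight + output_per_m * output_weight) / total


def request_cost(input_tokens: int, output_tokens: int,
                 input_per_m: float, output_per_m: float) -> float:
    """Dollar cost of one request, given prices quoted per 1M tokens."""
    return (input_tokens * input_per_m + output_tokens * output_per_m) / 1_000_000


# Llama 3.1 8B Instruct: $0.02 input, $0.03 output per 1M tokens.
llama_8b_blended = blended_price(0.02, 0.03)
# A hypothetical request with 2,000 prompt tokens and 500 completion tokens.
llama_8b_request = request_cost(2_000, 500, 0.02, 0.03)

print(f"blended: ${llama_8b_blended:.4f} per 1M tokens")
print(f"one request: ${llama_8b_request:.6f}")
```

Under the assumed 3:1 ratio, the 8B model blends to roughly $0.0225 per 1M tokens. A different weighting shifts the blended figure toward whichever of the two prices carries more weight.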