FriendliAI is The Frontier AI Inference Cloud. Built by the researchers who invented the continuous batching technique that is now industry standard, FriendliAI provides AI engineers with a highly optimized engine that constantly evolves to efficiently run state-of-the-art open-weight and custom models at production scale. By maximizing GPU utilization, FriendliAI delivers speeds up to 3x faster than vLLM and 50% to 90% cost savings relative to closed model APIs. FriendliAI empowers engineers to deploy frontier AI with uncompromising speed, model ownership, and enterprise-grade reliability.

Inference platform · OpenAI-compatible API · High Throughput · Low Latency · Open Source

Intelligence vs Price

Best value among FriendliAI models on this chart: Llama 3.1 70B Instruct · Llama 3.1 8B Instruct.

FriendliAI models

2 models, 2 with pricing
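| Model | Creator | Input Price ($/1M tokens) | Output Price ($/1M tokens) | Context | Max Output | Inference Providers | Intelligence | Coding |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Llama 3.1 70B Instruct | Meta | 0.100 | 0.100 | 131K | 16K | 13 | 12.5 (#1) | 10.9 (#1) |
| Llama 3.1 8B Instruct | Meta | 0.020 | 0.030 | 200K | 128K | 20 | 11.8 (#2) | 4.9 (#2) |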
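
To make the listed prices concrete, here is a minimal sketch of how a per-request cost could be estimated from them. It assumes the prices are in USD per 1M tokens (as the "Input/1M" and "Output/1M" labels suggest) and that billing is linear in token count. The `PRICES` dictionary and the `request_cost` helper are hypothetical names introduced for illustration, not part of any FriendliAI API.

```python
# Prices copied from the listing, assumed to be USD per 1M tokens.
PRICES = {
    "Llama 3.1 70B Instruct": {"input": 0.100, "output": 0.100},
    "Llama 3.1 8B Instruct": {"input": 0.020, "output": 0.030},
}


def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Estimate the USD cost of one request (hypothetical helper).

    Assumes cost scales linearly with token counts and ignores any
    minimum charges, discounts, or rounding a real bill might apply.
    """
    price = PRICES[model]
    total = input_tokens * price["input"] + output_tokens * price["output"]
    return total / 1_000_000


if __name__ == "__main__":
    # Compare both listed models on the same illustrative workload.
    for name in PRICES:
        cost = request_cost(name, input_tokens=2_000, output_tokens=500)
        print(f"{name}: ${cost:.6f}")
```

Under these assumptions, the 8B model is several times cheaper per request than the 70B model, which is the trade-off the Intelligence and Coding columns are meant to be weighed against.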