Llama 3.1 8B Instruct FP8 Fast is a language model from Meta, starting at $0.045 per 1M input tokens and $0.384 per 1M output tokens. It is an FP8-quantized variant of Llama 3.1 8B Instruct, tuned for high-speed inference with minimal accuracy loss on instruction-following tasks.
Capabilities
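| Input (1/5) | Supported |
|---|---|
| Text | Yes |
| Image | No |
| Audio | No |
| Video | No |
| PDF | No |

| Output (1/5) | Supported |
|---|---|
| Text | Yes |
| Image | No |
| Audio | No |
| Video | No |
| Embedding | No |

| Capabilities (0/13) | Supported |
|---|---|
| Reasoning | No |
| Adaptive Reasoning | No |
| Function Calling | No |
| Parallel Function Calling | No |
| Structured Outputs | No |
| Native JSON Schema | No |
| Web Search | No |
| URL Context | No |
| Computer Use | No |
| Code Execution | No |
| File Search | No |
| Prompt Caching | No |
| Assistant Prefill | No |
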
Pricing by Provider
US Dollar ($), per 1M tokens, Standard tier
| Input $ / 1M | Output $ / 1M |
|---|---|
| $0.045 | $0.384 |
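
The per-1M-token rates above translate into a per-request cost by scaling each rate by the token count. The following is a minimal sketch of that arithmetic, assuming cost scales linearly with tokens and that no minimum charges, discounts, or rounding rules apply (none are stated on this page). The function name `estimate_cost` and the constant names are hypothetical, chosen for illustration.

```python
from decimal import Decimal

# Rates taken from the pricing table, in US dollars per 1M tokens.
INPUT_USD_PER_1M = Decimal("0.045")
OUTPUT_USD_PER_1M = Decimal("0.384")
TOKENS_PER_UNIT = Decimal(1_000_000)


def estimate_cost(input_tokens: int, output_tokens: int) -> Decimal:
    """Estimate the USD cost of a request (hypothetical helper).

    Assumes purely linear per-token pricing; real billing may differ.
    """
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    input_cost = Decimal(input_tokens) * INPUT_USD_PER_1M / TOKENS_PER_UNIT
    output_cost = Decimal(output_tokens) * OUTPUT_USD_PER_1M / TOKENS_PER_UNIT
    return input_cost + output_cost


if __name__ == "__main__":
    # Example request: 2,000 input tokens and 500 output tokens.
    print(f"Estimated cost: ${estimate_cost(2_000, 500)}")
```

`Decimal` is used instead of `float` so that very small per-request amounts add up without binary rounding error when summed over many requests.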
Versions
| Version | Released | Context | Input / 1M | Output / 1M | Status |
|---|---|---|---|---|---|
| Llama 3.3 70B Instruct | | 131K | $0.100 | $0.200 | Available |
| Llama 3.2 3B Instruct | | 131K | $0.015 | $0.020 | Deprecated |
| Llama 3.2 1B Instruct | | 131K | $0.027 | $0.080 | Deprecated |
| Llama 3.2 11B | | 128K | $0.160 | $0.160 | Available |
| Llama 3.1 405B Instruct | | 131K | $0.120 | $0.300 | Deprecating |
| Llama 3.1 70B Instruct | | 131K | $0.120 | $0.300 | Available |
| Llama 3.1 8B Instruct | | 200K | $0.020 | $0.030 | Available |
| Llama 3.1 70B | | 128K | $0.360 | $0.360 | Available |
| Llama 3.1 8B | | 131K | $0.030 | $0.050 | Available |
| Llama 3 70B Instruct | | 131K | $0.120 | $0.300 | Deprecating |
| Llama 3.1 8B Instruct FP8 Fast | — | — | $0.045 | $0.384 | Current |