Phi-3 Vision Instruct is a Microsoft multimodal language model with a 32K-token context window, priced from $0.2 per 1M input tokens and $0.2 per 1M output tokens. It is a lightweight Phi-3 model with vision capabilities, fine-tuned for instruction following on image and text inputs.
Capabilities
Input (2/5)
Text ✓
Image ✓
Audio ·
Video ·
PDF ·
Output (1/5)
Text ✓
Image ·
Audio ·
Video ·
Embedding ·
Capabilities (0/13)
Reasoning ·
Adaptive Reasoning ·
Function Calling ·
Parallel Function Calling ·
Structured Outputs ·
Native JSON Schema ·
Web Search ·
URL Context ·
Computer Use ·
Code Execution ·
File Search ·
Prompt Caching ·
Assistant Prefill ·
Pricing by Provider
Prices in US dollars ($) per 1M tokens.
| Tier | Input $ / 1M | Output $ / 1M |
|---|---|---|
| Standard | $0.2 | $0.2 |
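
The listed rates can be turned into a per-request estimate with simple arithmetic. The sketch below is illustrative only: the function name `estimate_cost` is hypothetical, and it assumes the caller already knows the token counts. How an image input is converted into tokens is not stated on this page, so image tokens must be supplied as part of `input_tokens`.

```python
from decimal import Decimal

# Rates from the pricing table, in US dollars per 1M tokens.
INPUT_RATE_PER_1M = Decimal("0.2")
OUTPUT_RATE_PER_1M = Decimal("0.2")
TOKENS_PER_UNIT = Decimal(1_000_000)


def estimate_cost(input_tokens: int, output_tokens: int) -> Decimal:
    """Return the estimated cost in USD for one request.

    Hypothetical helper: token counts (including any tokens an image
    contributes) are assumed to be known by the caller.
    """
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    input_cost = Decimal(input_tokens) / TOKENS_PER_UNIT * INPUT_RATE_PER_1M
    output_cost = Decimal(output_tokens) / TOKENS_PER_UNIT * OUTPUT_RATE_PER_1M
    return input_cost + output_cost


if __name__ == "__main__":
    # 20,000 input tokens and 1,000 output tokens.
    print(estimate_cost(20_000, 1_000))  # prints 0.0042
```

`Decimal` is used instead of `float` so that small per-request amounts add up without binary rounding noise when many requests are summed.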
Versions
| Version | Released | Context | Input / 1M | Output / 1M | Status |
|---|---|---|---|---|---|
| Phi-4 Mini Instruct | — | 131K | $0.075 | $0.300 | Available |
| Phi-4 | — | 16K | $0.065 | $0.140 | Available |
| Phi-4 Multimodal | — | — | — | — | Available |
| Phi-4 Mini | — | — | — | — | Available |
| Phi-4 Eagle | — | — | — | — | Available |
| Phi-4 Mini MM | — | — | — | — | Available |
| Phi-4 Mini Reasoning | — | 131K | $0.080 | $0.320 | Available |
| Phi-4 Multimodal Instruct | — | 131K | $0.080 | $0.320 | Available |
| Phi-4 Reasoning | — | 33K | $0.125 | $0.500 | Available |
| Phi-4 Reasoning Plus | — | — | — | — | Available |
| Phi-3 Vision Instruct | — | 32K | $0.200 | $0.200 | Current |
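
To compare the priced versions for a given workload, one option is to filter by context window and sort by estimated cost. The following is a rough sketch under stated assumptions: the names `Version`, `cost`, and `cheapest_fitting` are hypothetical; "131K" is read as 131,000 tokens (the exact limit may differ); the context window is assumed to cover input plus output tokens; and rows without listed pricing are omitted. It compares price only and ignores capabilities, so a text-only version may rank above Phi-3 Vision Instruct even when image input is required.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Version:
    name: str
    context_tokens: int   # assumption: "K" read as thousands of tokens
    input_per_1m: float   # USD per 1M input tokens
    output_per_1m: float  # USD per 1M output tokens


# Only the rows of the Versions table that list context and pricing.
VERSIONS = [
    Version("Phi-4 Mini Instruct", 131_000, 0.075, 0.300),
    Version("Phi-4", 16_000, 0.065, 0.140),
    Version("Phi-4 Mini Reasoning", 131_000, 0.080, 0.320),
    Version("Phi-4 Multimodal Instruct", 131_000, 0.080, 0.320),
    Version("Phi-4 Reasoning", 33_000, 0.125, 0.500),
    Version("Phi-3 Vision Instruct", 32_000, 0.200, 0.200),
]


def cost(v: Version, input_tokens: int, output_tokens: int) -> float:
    """Estimated USD cost of one request on version v."""
    return (input_tokens * v.input_per_1m
            + output_tokens * v.output_per_1m) / 1_000_000


def cheapest_fitting(input_tokens: int, output_tokens: int) -> list[Version]:
    """Versions whose context fits the request, cheapest first."""
    total = input_tokens + output_tokens
    fitting = [v for v in VERSIONS if v.context_tokens >= total]
    return sorted(fitting, key=lambda v: cost(v, input_tokens, output_tokens))


if __name__ == "__main__":
    for v in cheapest_fitting(20_000, 1_000):
        print(f"{v.name}: ${cost(v, 20_000, 1_000):.5f}")
```

For a 20,000-token input with a 1,000-token output, Phi-4 is excluded because the request exceeds its 16K context, and Phi-4 Mini Instruct comes out cheapest among the remaining rows.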