Phi-3.5 Vision Instruct is Microsoft's instruction-tuned multimodal model in the Phi-3.5 family, with vision capabilities for image understanding and visual question answering. It has a 128K context window and produces up to 4K output tokens, with pricing starting at $0.13 / 1M input tokens and $0.52 / 1M output tokens.
Capabilities
Input (1/5)
Text ·
Image ✓
Audio ·
Video ·
PDF ·
Output (1/5)
Text ✓
Image ·
Audio ·
Video ·
Embedding ·
Capabilities (0/13)
Reasoning ·
Adaptive Reasoning ·
Function Calling ·
Parallel Function Calling ·
Structured Outputs ·
Native JSON Schema ·
Web Search ·
URL Context ·
Computer Use ·
Code Execution ·
File Search ·
Prompt Caching ·
Assistant Prefill ·
Pricing by Provider
Prices in US dollars ($) per 1M tokens.

| Tier | Input $ / 1M | Output $ / 1M |
|---|---|---|
| Standard | $0.13 | $0.52 |
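As a rough illustration of how the per-1M-token prices above translate into a per-request cost, here is a minimal sketch. The helper name `estimate_cost` and the example token counts are assumptions made for illustration, not part of any provider's API. The page also does not say how image inputs are converted to tokens, so the sketch takes token counts as given.

```python
from decimal import Decimal

# Standard-tier prices from the table above, in US dollars per 1M tokens.
INPUT_PRICE_PER_M = Decimal("0.13")
OUTPUT_PRICE_PER_M = Decimal("0.52")
TOKENS_PER_M = Decimal(1_000_000)


def estimate_cost(input_tokens: int, output_tokens: int) -> Decimal:
    """Estimate the USD cost of one request (hypothetical helper)."""
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    # Each side is billed proportionally to its own per-1M price.
    input_cost = INPUT_PRICE_PER_M * input_tokens / TOKENS_PER_M
    output_cost = OUTPUT_PRICE_PER_M * output_tokens / TOKENS_PER_M
    return input_cost + output_cost


if __name__ == "__main__":
    # Example workload (assumed): 100,000 input tokens, 2,000 output tokens.
    print(estimate_cost(100_000, 2_000))
```

`Decimal` is used instead of `float` so that small per-request amounts add up without binary rounding drift when many requests are summed.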
Versions
| Version | Released | Context | Input / 1M | Output / 1M | Status |
|---|---|---|---|---|---|
| Phi-4 Mini Instruct | — | 131K | $0.075 | $0.300 | Available |
| Phi-4 | — | 16K | $0.065 | $0.140 | Available |
| Phi-4 Multimodal | — | — | — | — | Available |
| Phi-4 Mini | — | — | — | — | Available |
| Phi-4 Eagle | — | — | — | — | Available |
| Phi-4 Mini MM | — | — | — | — | Available |
| Phi-4 Mini Reasoning | — | 131K | $0.080 | $0.320 | Available |
| Phi-4 Multimodal Instruct | — | 131K | $0.080 | $0.320 | Available |
| Phi-4 Reasoning | — | 33K | $0.125 | $0.500 | Available |
| Phi-4 Reasoning Plus | — | — | — | — | Available |
| Phi-3.5 Vision Instruct | — | 128K | $0.130 | $0.520 | Current |
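One way to use the versions table is to pick the cheapest listed version that meets a minimum context requirement. The sketch below is illustrative only: the `Version` class and `cheapest_with_context` function are hypothetical names, only rows that list both a context window and prices are included, and the comparison ignores capability differences such as image input.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Version:
    name: str
    context_k: int       # context window, in thousands of tokens
    input_per_m: float   # USD per 1M input tokens
    output_per_m: float  # USD per 1M output tokens


# Rows from the table that list both a context window and prices.
VERSIONS = [
    Version("Phi-4 Mini Instruct", 131, 0.075, 0.300),
    Version("Phi-4", 16, 0.065, 0.140),
    Version("Phi-4 Mini Reasoning", 131, 0.080, 0.320),
    Version("Phi-4 Multimodal Instruct", 131, 0.080, 0.320),
    Version("Phi-4 Reasoning", 33, 0.125, 0.500),
    Version("Phi-3.5 Vision Instruct", 128, 0.130, 0.520),
]


def cheapest_with_context(min_context_k: int) -> Optional[Version]:
    """Return the lowest-priced version with at least the given context."""
    candidates = [v for v in VERSIONS if v.context_k >= min_context_k]
    if not candidates:
        return None
    # Rank by input price first, then output price as a tie-breaker.
    return min(candidates, key=lambda v: (v.input_per_m, v.output_per_m))
```

Ranking by input price first is an arbitrary choice for this sketch; a workload with long outputs might weight the output price more heavily.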