Phi 4 Multimodal Instruct
Phi 4 Multimodal Instruct is a text-output model from Azure AI that also accepts image and audio input. It has a context window of 131K tokens and a maximum output of 4K tokens. Pricing starts at 0.08 per million input tokens and 0.32 per million output tokens.
Capabilities
✓ Vision · ✓ Function Calling · ✗ Reasoning · ✗ JSON Schema · ✗ System Messages · ✗ Web Search · ✗ Prompt Caching · ✓ Audio Input · ✗ Audio Output
Specifications
| Field | Value |
|---|---|
| Model Key | azure_ai/Phi-4-multimodal-instruct |
| Provider | Azure AI |
| Provider ID | azure_ai |
| Mode | Text |
| Canonical Name | phi-multimodal-4 |
| Context Window | 131K tokens |
| Max Output | 4K tokens |
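The two limits above can be turned into a quick pre-flight check before sending a request. The following is a minimal sketch, not an official client: `fits_budget` is a hypothetical helper, the constants round "131K" and "4K" down to 131,000 and 4,000 (the exact token counts are not given here), and it assumes the context window is shared between input and output tokens.

```python
# Assumed limits: conservative readings of "131K" and "4K" from the table above.
CONTEXT_WINDOW = 131_000  # assumption: input and output tokens share this window
MAX_OUTPUT = 4_000        # assumption: upper bound on generated tokens


def fits_budget(input_tokens: int, requested_output: int) -> bool:
    """Return True if a request stays within both assumed limits."""
    if input_tokens < 0 or requested_output < 0:
        raise ValueError("token counts must be non-negative")
    within_output_cap = requested_output <= MAX_OUTPUT
    within_window = input_tokens + requested_output <= CONTEXT_WINDOW
    return within_output_cap and within_window


if __name__ == "__main__":
    # A 100,000-token prompt asking for the full 4,000-token output.
    print(fits_budget(100_000, 4_000))
```

Token counts depend on the model's tokenizer, so the inputs to such a check are themselves estimates unless they come from the provider.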
Pricing
| Type | Per 1K Tokens | Per 1M Tokens |
|---|---|---|
| Input Tokens | 0.000080 | 0.080 |
| Output Tokens | 0.000320 | 0.320 |
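As an illustration of how the per-token rates combine, the sketch below estimates the cost of a single request from the per-1M rates in the table. `estimate_cost` is a hypothetical helper, not part of any provider SDK, and the result is in whatever currency the listed prices use, which this page does not state.

```python
from decimal import Decimal

# Rates copied from the pricing table (per 1M tokens).
INPUT_RATE_PER_M = Decimal("0.080")
OUTPUT_RATE_PER_M = Decimal("0.320")
TOKENS_PER_M = Decimal(1_000_000)


def estimate_cost(input_tokens: int, output_tokens: int) -> Decimal:
    """Estimate the cost of one request from its token counts."""
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    total = (
        Decimal(input_tokens) * INPUT_RATE_PER_M
        + Decimal(output_tokens) * OUTPUT_RATE_PER_M
    )
    return total / TOKENS_PER_M


if __name__ == "__main__":
    # 100,000 input tokens plus 4,000 output tokens.
    print(estimate_cost(100_000, 4_000))
```

`Decimal` is used instead of `float` so that very small per-token amounts add up without binary rounding error.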