Phi 4 Multimodal Instruct
Azure AIText
Phi 4 Multimodal Instruct is a text model from Azure AI with a context window of 131K tokens and max output of 4K tokens. Pricing starts at $0.08 per million input tokens and $0.32 per million output tokens.
Specifications
| Model Key | azure_ai/Phi-4-multimodal-instruct |
| Provider | Azure AI |
| LiteLLM Provider | azure_ai |
| Mode | Text |
| Canonical Name | phi-multimodal-4 |
| Context Window | 131K tokens |
| Max Output | 4K tokens |
Capabilities
✓ Vision✓ Function Calling✗ Reasoning✗ JSON Schema✗ System Messages✗ Web Search✗ Prompt Caching✓ Audio Input✗ Audio Output
Pricing
| Type | Per 1K Tokens | Per 1M Tokens |
|---|---|---|
| Input Tokens | $0.000080 | $0.080 |
| Output Tokens | $0.000320 | $0.320 |
Similar Models
Models with similar capabilities and context window size.
Model | Provider | Mode | Input Price | Output Price | Context | Max Output | Vision | Functions |
|---|---|---|---|---|---|---|---|---|
| Gemma 3 27B It | Google Gemini | Text | N/A | N/A | 131K | 8K | yes | yes |
| GPT-oss-120b-mxfp-GGUF | Lemonade | Text | N/A | N/A | 131K | 33K | no | yes |
| GPT-oss-20b-mxfp4-GGUF | Lemonade | Text | N/A | N/A | 131K | 33K | no | yes |
| GPT-oss:120b-cloud | Ollama | Text | N/A | N/A | 131K | 131K | no | yes |
| GPT-oss:20b-cloud | Ollama | Text | N/A | N/A | 131K | 131K | no | yes |
| Llama 3.2 3B Instruct | Deepinfra | Text | $0.020 | $0.020 | 131K | 131K | no | no |
| Llama3.2 11B Vision Instruct | Lambda Ai | Text | $0.015 | $0.025 | 131K | 131K | yes | yes |
| Llama3.2 3B Instruct | Lambda Ai | Text | $0.015 | $0.025 | 131K | 131K | no | yes |
| Meta Llama 3.1 8B Instruct Turbo | Deepinfra | Text | $0.020 | $0.030 | 131K | 131K | no | no |
| Mistral Nemo Instruct 2407 | Deepinfra | Text | $0.020 | $0.040 | 131K | 131K | no | no |