Phi 4 Multimodal Instruct

Azure AIText

Phi 4 Multimodal Instruct is a text model from Azure AI with a context window of 131K tokens and max output of 4K tokens. Pricing starts at $0.08 per million input tokens and $0.32 per million output tokens.

Specifications

Model Keyazure_ai/Phi-4-multimodal-instruct
ProviderAzure AI
LiteLLM Providerazure_ai
ModeText
Canonical Namephi-multimodal-4
Context Window131K tokens
Max Output4K tokens

Capabilities

Vision Function Calling Reasoning JSON Schema System Messages Web Search Prompt Caching Audio Input Audio Output

Pricing

TypePer 1K TokensPer 1M Tokens
Input Tokens$0.000080$0.080
Output Tokens$0.000320$0.320

Similar Models

Models with similar capabilities and context window size.

Model
Provider
Mode
Input Price
Output Price
Context
Max Output
Vision
Functions
Gemma 3 27B ItGoogle GeminiTextN/AN/A131K8Kyesyes
GPT-oss-120b-mxfp-GGUFLemonadeTextN/AN/A131K33Knoyes
GPT-oss-20b-mxfp4-GGUFLemonadeTextN/AN/A131K33Knoyes
GPT-oss:120b-cloudOllamaTextN/AN/A131K131Knoyes
GPT-oss:20b-cloudOllamaTextN/AN/A131K131Knoyes
Llama 3.2 3B InstructDeepinfraText$0.020$0.020131K131Knono
Llama3.2 11B Vision InstructLambda AiText$0.015$0.025131K131Kyesyes
Llama3.2 3B InstructLambda AiText$0.015$0.025131K131Knoyes
Meta Llama 3.1 8B Instruct TurboDeepinfraText$0.020$0.030131K131Knono
Mistral Nemo Instruct 2407DeepinfraText$0.020$0.040131K131Knono