Phi 4 Multimodal Instruct

Phi 4 Multimodal Instruct is a text-mode model from Azure AI with a 131K-token context window and a maximum output of 4K tokens. Pricing starts at 0.08 per million input tokens and 0.32 per million output tokens.

Capabilities

Vision, Function Calling, Reasoning, JSON Schema, System Messages, Web Search, Prompt Caching, Audio Input, Audio Output

Specifications

Model Key: azure_ai/Phi-4-multimodal-instruct
Provider: Azure AI
Provider ID: azure_ai
Mode: Text
Canonical Name: phi-multimodal-4
Context Window: 131K tokens
Max Output: 4K tokens
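
The two limits above determine how much prompt a single request can carry. The following is a minimal sketch of that arithmetic, not an official client: it assumes the context window covers prompt and output together, and it treats the rounded "131K" and "4K" figures as 131,000 and 4,000 tokens, so the exact limits may differ. The function name `max_prompt_tokens` is hypothetical.

```python
# Nominal limits from the specifications above. "131K" and "4K" are rounded
# figures on the page, so the exact token limits may differ (assumption).
CONTEXT_WINDOW_TOKENS = 131_000
MAX_OUTPUT_TOKENS = 4_000


def max_prompt_tokens(requested_output_tokens: int = MAX_OUTPUT_TOKENS) -> int:
    """Return how many prompt tokens remain after reserving room for output.

    Assumes the context window is shared by the prompt and the output.
    """
    if not 0 <= requested_output_tokens <= MAX_OUTPUT_TOKENS:
        raise ValueError("requested output must be between 0 and the listed maximum")
    return CONTEXT_WINDOW_TOKENS - requested_output_tokens
```

Under these assumptions, reserving the full 4K output leaves roughly 127K tokens for the prompt.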

Pricing

Type             Per 1K Tokens    Per 1M Tokens
Input Tokens     0.000080         0.080
Output Tokens    0.000320         0.320
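
To show how the listed rates translate into a per-request cost, here is a small sketch. The rates are copied from the table above; the page does not state a currency, so the result is in whatever unit the rates use. The function name `estimate_cost` and the example token counts are hypothetical.

```python
# Rates copied from the pricing table above (per 1M tokens).
# The currency is not stated on the page.
INPUT_RATE_PER_MILLION = 0.080
OUTPUT_RATE_PER_MILLION = 0.320


def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the estimated cost of one request at the listed rates."""
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    return (
        input_tokens * INPUT_RATE_PER_MILLION
        + output_tokens * OUTPUT_RATE_PER_MILLION
    ) / 1_000_000


if __name__ == "__main__":
    # Hypothetical request: 10,000 input tokens and 1,000 output tokens.
    print(f"{estimate_cost(10_000, 1_000):.6f}")  # prints 0.001120
```

Because output tokens are priced at four times the input rate, long completions affect the total more than long prompts of the same length.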

Benchmarks

Intelligence Index     10.0     #191
MMLU-Pro               0.5      #164
GPQA                   0.3      #200
HLE                    0.0      #144
LiveCodeBench          0.1      #164
AIME                   0.1      #88
Time to First Token    0.34s    #89
SciCode                0.1      #194
MATH-500               0.7      #96