Phi-4 Multimodal is a multimodal Phi-4 model from Microsoft. It accepts text, image, and audio inputs and is built for instruction following across those modalities.
Specifications
Canonical ID: microsoft-phi-4-multimodal
Status: Active
Creator: Microsoft
Benchmarks
Intelligence Index: 4.5 (#407)
MMLU-Pro: 0.5 (#278)
GPQA: 0.3 (#420)
HLE: 0.0 (#362)
LiveCodeBench: 0.1 (#279)
AIME: 0.1 (#124)
Time to First Token: 1.10 s (#361)
SciCode: 0.1 (#401)
MATH-500: 0.7 (#130)
Output TPS: 17.0 (#267)
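
To put the two speed figures in context, the sketch below estimates how long a response might take from the listed Time to First Token (1.10 s) and Output TPS (17.0). It assumes "Output TPS" means output tokens per second and that generation proceeds at a constant rate after the first token. The function name and parameters are illustrative, not part of any published API.

```python
def estimate_response_seconds(
    output_tokens: int,
    ttft_s: float = 1.10,        # Time to First Token from the listing
    tokens_per_s: float = 17.0,  # Output TPS from the listing (assumed tokens/second)
) -> float:
    """Rough latency estimate: wait for the first token, then stream the rest."""
    if output_tokens < 0:
        raise ValueError("output_tokens must be non-negative")
    return ttft_s + output_tokens / tokens_per_s


# A 170-token reply: 1.10 s to start plus 170 / 17.0 = 10 s of generation.
print(f"{estimate_response_seconds(170):.2f} s")
```

Real latency depends on provider load, prompt length, and network conditions, so this is a back-of-the-envelope figure only.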

Capabilities

Input (0/5): Text, Image, Audio, Video, PDF
Output (0/5): Text, Image, Audio, Video, Embedding
Capabilities (0/13): Reasoning, Adaptive Reasoning, Function Calling, Parallel Function Calling, Structured Outputs, Native JSON Schema, Web Search, URL Context, Computer Use, Code Execution, File Search, Prompt Caching, Assistant Prefill

Versions

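Version | Released | Context | Input / 1M | Output / 1M | Status
Phi-4 Mini Instruct | - | 131K | $0.075 | $0.300 | Available
Phi-4 | - | 16K | $0.070 | $0.140 | Available
Phi-4 Multimodal | - | - | - | - | Current
Phi-4 Mini | - | - | - | - | Available
Phi-4 Eagle | - | - | - | - | Available
Phi-4 Mini MM | - | - | - | - | Available
Phi-4 Mini Reasoning | - | 131K | $0.080 | $0.320 | Available
Phi-4 Multimodal Instruct | - | 131K | $0.080 | $0.320 | Available
Phi-4 Reasoning | - | 33K | $0.125 | $0.500 | Available
Phi-4 Reasoning Plus | - | - | - | - | Available
Phi-3 Mini | - | - | - | - | Available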
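
As a worked example of the pricing columns, the sketch below estimates the cost of a single request. It assumes "Input / 1M" and "Output / 1M" are USD prices per one million tokens. Because the Phi-4 Multimodal row lists no price, the defaults use the Phi-4 Multimodal Instruct rates ($0.080 input, $0.320 output). The function name is hypothetical.

```python
def estimate_cost_usd(
    input_tokens: int,
    output_tokens: int,
    input_price_per_m: float = 0.080,   # Phi-4 Multimodal Instruct, USD per 1M input tokens (assumed unit)
    output_price_per_m: float = 0.320,  # Phi-4 Multimodal Instruct, USD per 1M output tokens (assumed unit)
) -> float:
    """Estimate request cost from token counts and per-million-token prices."""
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    total = input_tokens * input_price_per_m + output_tokens * output_price_per_m
    return total / 1_000_000


# 10,000 input tokens and 2,000 output tokens:
# (10,000 * 0.080 + 2,000 * 0.320) / 1,000,000 = 0.00144 USD
print(f"${estimate_cost_usd(10_000, 2_000):.5f}")
```

Passing other rows' prices as arguments (for example 0.125 and 0.500 for Phi-4 Reasoning) gives a like-for-like comparison across versions.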

Model IDs

microsoft-phi-4-multimodal
phi-4-multimodal