Phi-4 Multimodal


Phi-4 Multimodal is an AI model from Microsoft. It is a multimodal Phi-4 model that accepts text, image, and audio inputs and follows instructions across these modalities.
Spec
Canonical ID: microsoft-phi-4-multimodal
Status: Active
Creator: Microsoft
Intelligence Index: 10.0 (#376)
MMLU-Pro: 0.5 (#278)
GPQA: 0.3 (#391)
HLE: 0.0 (#332)
LiveCodeBench: 0.1 (#279)
AIME: 0.1 (#124)
Time to First Token: 0.37 s (#204)
SciCode: 0.1 (#372)
MATH-500: 0.7 (#130)
Output speed: 16.8 tokens/s (#264)
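Time to First Token and output speed can be combined into a rough end-to-end latency estimate: total time ≈ TTFT + output tokens / output speed. The following is a minimal sketch, assuming generation speed stays constant over the whole response and ignoring network and queuing overhead. `estimated_latency` is a hypothetical helper, and its defaults are the 0.37 s and 16.8 tokens/s figures listed above.

```python
# Rough latency estimate from the listed speed figures (hypothetical helper).
# Assumes a constant decoding speed and no network/queuing overhead.

TTFT_S = 0.37      # Time to First Token, seconds (from the listing)
OUTPUT_TPS = 16.8  # output tokens per second (from the listing)


def estimated_latency(output_tokens: int,
                      ttft_s: float = TTFT_S,
                      tps: float = OUTPUT_TPS) -> float:
    """Return the approximate seconds until the last output token arrives."""
    if output_tokens < 0:
        raise ValueError("output_tokens must be non-negative")
    return ttft_s + output_tokens / tps


if __name__ == "__main__":
    print(f"{estimated_latency(500):.1f} s")
```

For a 500-token answer this gives roughly 0.37 + 500 / 16.8 ≈ 30.1 s.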

Capabilities

Input (0/5): Text, Image, Audio, Video, PDF
Output (0/5): Text, Image, Audio, Video, Embedding
Capabilities (0/13): Reasoning, Adaptive Reasoning, Function Calling, Parallel Function Calling, Structured Outputs, Native JSON Schema, Web Search, URL Context, Computer Use, Code Execution, File Search, Prompt Caching, Assistant Prefill

Versions

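| Version | Context | Input price / 1M tokens | Output price / 1M tokens | Status |
|---|---|---|---|---|
| Phi-4 | 16K | $0.065 | $0.140 | Available |
| Phi-4 Multimodal | - | - | - | Current |
| Phi-4 Eagle | - | - | - | Available |
| Phi-4 Mini | - | - | - | Available |
| Phi-4 Mini Instruct | 131K | $0.075 | $0.300 | Available |
| Phi-4 Mini MM | - | - | - | Available |
| Phi-4 Mini Reasoning | 131K | $0.080 | $0.320 | Available |
| Phi-4 Multimodal Instruct | 131K | $0.080 | $0.320 | Available |
| Phi-4 Reasoning | 33K | $0.125 | $0.500 | Available |
| Phi-4 Reasoning Plus | - | - | - | Available |
| Phi-3 | - | - | - | Available |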
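Because the Versions table quotes prices per 1M tokens, the cost of one request is (input tokens × input price + output tokens × output price) / 1,000,000. The following is a minimal sketch, assuming billing is strictly linear in token count; this page does not describe rounding, minimum charges, or how image and audio inputs are converted to tokens. `PRICE_PER_1M` and `request_cost` are hypothetical names, and the prices are copied from the versions that list them. No price is given for Phi-4 Multimodal itself, so the example uses Phi-4 Multimodal Instruct.

```python
# Hypothetical cost estimator built from the per-1M-token prices in the Versions table.
# Assumes linear per-token billing; real billing rules are not stated on this page.

PRICE_PER_1M = {
    # version: (input USD per 1M tokens, output USD per 1M tokens)
    "Phi-4": (0.065, 0.140),
    "Phi-4 Mini Instruct": (0.075, 0.300),
    "Phi-4 Mini Reasoning": (0.080, 0.320),
    "Phi-4 Multimodal Instruct": (0.080, 0.320),
    "Phi-4 Reasoning": (0.125, 0.500),
}


def request_cost(version: str, input_tokens: int, output_tokens: int) -> float:
    """Return the estimated USD cost of a single request."""
    in_price, out_price = PRICE_PER_1M[version]  # KeyError if no price is listed
    return (input_tokens * in_price + output_tokens * out_price) / 1_000_000


if __name__ == "__main__":
    cost = request_cost("Phi-4 Multimodal Instruct", 10_000, 2_000)
    print(f"${cost:.6f}")
```

With 10,000 input tokens and 2,000 output tokens, the estimate is (10,000 × 0.080 + 2,000 × 0.320) / 1,000,000 = $0.00144.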

Model IDs