
Phi-4 Multimodal


Phi-4 Multimodal is a multimodal AI model from Microsoft in the Phi-4 family. It accepts text, image, and audio inputs and is built for instruction following across those modalities.
Specifications
Canonical ID: microsoft-phi-4-multimodal
Status: Active
Creator: Microsoft
Benchmarks
Intelligence Index: 10.0 (#389)
MMLU-Pro: 0.5 (#278)
GPQA: 0.3 (#402)
HLE: 0.0 (#342)
LiveCodeBench: 0.1 (#279)
AIME: 0.1 (#124)
Time to First Token: 0.42s (#225)
SciCode: 0.1 (#384)
MATH-500: 0.7 (#130)
Output TPS: 16.2 (#261)
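
The two speed figures above can be combined into a rough end-to-end response time. The sketch below is illustrative only: it assumes "Output TPS" means output tokens per second, that throughput stays constant for the whole response, and that network overhead is ignored. The function name and the 500-token response length are hypothetical.

```python
def estimate_response_seconds(output_tokens, ttft_seconds, tokens_per_second):
    """Rough response time: wait for the first token, then stream the rest.

    Assumes constant throughput, which real deployments rarely guarantee.
    """
    return ttft_seconds + output_tokens / tokens_per_second


# Figures listed above: 0.42 s time to first token, 16.2 output tokens per second.
# The 500-token response length is a hypothetical example value.
latency = estimate_response_seconds(500, ttft_seconds=0.42, tokens_per_second=16.2)
print(f"Estimated time for 500 output tokens: {latency:.1f} s")
```

At the listed throughput, generation time dominates: the time to first token is a small fraction of the total for any response longer than a few dozen tokens.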

Capabilities

Input: 0/5
Text·
Image·
Audio·
Video·
PDF·
Output: 0/5
Text·
Image·
Audio·
Video·
Embedding·
Capabilities: 0/13
Reasoning·
Adaptive Reasoning·
Function Calling·
Parallel Function Calling·
Structured Outputs·
Native JSON Schema·
Web Search·
URL Context·
Computer Use·
Code Execution·
File Search·
Prompt Caching·
Assistant Prefill·

Versions

Version                     Released  Context  Input / 1M  Output / 1M  Status
Phi-4 Mini Instruct         -         131K     $0.075      $0.300       Available
Phi-4                       -         16K      $0.065      $0.140       Available
Phi-4 Multimodal            -         -        -           -            Current
Phi-4 Mini                  -         -        -           -            Available
Phi-4 Eagle                 -         -        -           -            Available
Phi-4 Mini MM               -         -        -           -            Available
Phi-4 Mini Reasoning        -         131K     $0.080      $0.320       Available
Phi-4 Multimodal Instruct   -         131K     $0.080      $0.320       Available
Phi-4 Reasoning             -         33K      $0.125      $0.500       Available
Phi-4 Reasoning Plus        -         -        -           -            Available
Phi-3 Mini                  -         -        -           -            Available
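
The per-token prices in the table above can be turned into a cost estimate for a given workload. This is a minimal sketch, assuming "Input / 1M" and "Output / 1M" are prices in US dollars per one million tokens. The function name and the token counts are hypothetical. No price is listed for the Phi-4 Multimodal row itself, so the example uses the Phi-4 Multimodal Instruct row.

```python
def estimate_cost_usd(input_tokens, output_tokens,
                      input_price_per_1m, output_price_per_1m):
    """Estimated cost in USD, given prices quoted per one million tokens."""
    input_cost = (input_tokens / 1_000_000) * input_price_per_1m
    output_cost = (output_tokens / 1_000_000) * output_price_per_1m
    return input_cost + output_cost


# Listed prices for Phi-4 Multimodal Instruct: $0.080 input, $0.320 output per 1M.
# The workload of 200,000 input and 50,000 output tokens is a hypothetical example.
cost = estimate_cost_usd(200_000, 50_000,
                         input_price_per_1m=0.080,
                         output_price_per_1m=0.320)
print(f"Estimated cost: ${cost:.4f}")
```

Because output tokens are priced at four times the input rate for this version, a workload with a quarter as many output tokens as input tokens splits its cost evenly between the two.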

Model IDs