Phi-4 Mini MM is a compact multimodal model in Microsoft's Phi-4 Mini family, capable of processing both text and image inputs for vision-language tasks.
Specifications
Canonical ID: microsoft-phi-4-mini-mm
Type: Language
Status: Active
Creator: Microsoft
Providers
Input Modalities: Text
Output Modalities: Text

Capabilities

Input (1 of 5 supported)
Supported: Text
Not supported: Image, Audio, Video, PDF

Output (1 of 5 supported)
Supported: Text
Not supported: Image, Audio, Video, Embedding

Capabilities (0 of 13 supported)
Not supported: Reasoning, Adaptive Reasoning, Function Calling, Parallel Function Calling, Structured Outputs, Native JSON Schema, Web Search, URL Context, Computer Use, Code Execution, File Search, Prompt Caching, Assistant Prefill

Pricing by Provider

Standard pricing, in US dollars ($) per 1M tokens.

Provider          Model ID              Input / 1M  Output / 1M
Azure AI Foundry  microsoft:phi4minimm  $0.08       $0.32
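
As a rough illustration of how the per-1M-token rates above translate into a bill, the following sketch computes a linear cost estimate from token counts. The function name `estimate_cost_usd` and the constants are hypothetical names introduced here, not part of any provider API, and the calculation assumes simple per-token billing with no minimums, discounts, or separate image-input charges.

```python
from decimal import Decimal

# Listed Azure AI Foundry rates for microsoft:phi4minimm, in USD per 1M tokens.
INPUT_USD_PER_1M = Decimal("0.08")
OUTPUT_USD_PER_1M = Decimal("0.32")
TOKENS_PER_PRICING_UNIT = Decimal(1_000_000)


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> Decimal:
    """Hypothetical helper: linear cost estimate from token counts.

    Assumes billing is strictly proportional to tokens (an assumption,
    not something the listing confirms).
    """
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    input_cost = Decimal(input_tokens) * INPUT_USD_PER_1M / TOKENS_PER_PRICING_UNIT
    output_cost = Decimal(output_tokens) * OUTPUT_USD_PER_1M / TOKENS_PER_PRICING_UNIT
    return input_cost + output_cost


if __name__ == "__main__":
    # Example workload: 2M input tokens and 500K output tokens.
    print(estimate_cost_usd(2_000_000, 500_000))
```

`Decimal` is used instead of floats so that small per-token amounts do not pick up binary rounding error when summed across many requests.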

Versions

Version                 Released  Context  Input / 1M  Output / 1M  Status
Phi-4 Mini Instruct     -         131K     $0.075      $0.300       Available
Phi-4 Mini MM           -         -        -           -            Current
Phi-4 Mini              -         -        -           -            Available
Phi-4 Mini Reasoning    -         131K     $0.080      $0.320       Available
Phi-3 Mini              -         -        -           -            Available
Phi-3 Mini Instruct     -         131K     $0.100      $0.100       Available
Phi-3.5 Mini Instruct   -         128K     $0.130      $0.520       Available

Other Models

Model                      Tier  Released  Context  Input / 1M  Output / 1M
Phi-4                      -     -         16K      $0.065      $0.140
Phi-4 Multimodal           -     -         -        -           -
Phi-4 Eagle                -     -         -        -           -
Phi-4 Multimodal Instruct  -     -         131K     $0.080      $0.320
Phi-4 Reasoning            -     -         33K      $0.125      $0.500
Phi-4 Reasoning Plus       -     -         -        -           -
Phi-3                      -     -         -        -           -
Phi-3 Medium Instruct      -     -         128K     $0.170      $0.680
Phi-3 Small Instruct       -     -         128K     $0.150      $0.600
Phi-3 Vision Instruct      -     -         32K      $0.200      $0.200

Model IDs

microsoft-phi-4-mini-mm