Microsoft logo

Phi-3 Vision 128K Instruct


Phi-3 Vision 128K Instruct is Microsoft logoMicrosoft's language model with a 32K context window, starting at $0.200 / 1M input and $0.200 / 1M output. A lightweight multimodal model from the Phi-3 family with a 128K context window, combining vision and language capabilities for document understanding and image-based reasoning.
Spec
Canonical IDmicrosoft-phi-3-vision-128k-instruct
TypeLanguage
StatusActive
CreatorMicrosoftMicrosoft
Providers
Context Window32K tokens
Input ModalitiesImageText
Output ModalitiesText

Capabilities

Input2/5
Text
Image
Audio·
Video·
PDF·
Output1/5
Text
Image·
Audio·
Video·
Embedding·
Capabilities0/13
Reasoning·
Adaptive Reasoning·
Function Calling·
Parallel Function Calling·
Structured Outputs·
Native JSON Schema·
Web Search·
URL Context·
Computer Use·
Code Execution·
File Search·
Prompt Caching·
Assistant Prefill·

Pricing by Provider

ProviderStandard
Input
$ / 1M
Output
$ / 1M
Fireworks AI logo
Fireworks AI
accounts/fireworks/models/phi-3-vision-128k-instruct
$0.200$0.200

Cost Calculator

Preset:
Compares every provider & tier in USD

Versions

VersionReleasedContextInput / 1MOutput / 1MStatus
Phi-416K$0.065$0.140Available
Phi-4 Mini Instruct131K$0.075$0.300Available
Phi-4 Mini Reasoning131K$0.080$0.320Available
Phi-4 Multimodal Instruct131K$0.080$0.320Available
Phi-4 Reasoning33K$0.125$0.500Available
Phi-3.5 Mini Instruct128K$0.130$0.520Available
Phi-3.5 MoE Instruct128K$0.160$0.640Available
Phi-3.5 Vision Instruct128K$0.130$0.520Available
Phi-3 Vision 128K Instruct32K$0.200$0.200Current
Phi-3Available
Phi-3 Medium Instruct128K$0.170$0.680Available

Model IDs