
MiMo-V2-Omni


MiMo-V2-Omni is Xiaomi's frontier omni-modal language model. It natively processes image, video, and audio inputs within a unified architecture for agentic tasks. It has a 262K-token context window and supports up to 66K output tokens, with pricing starting at $0.400 / 1M input tokens and $2.00 / 1M output tokens.
Spec
Canonical ID: xiaomi-mimo-2-omni
Type: Language
Status: Active
Creator: Xiaomi
Providers: OpenRouter
Context Window: 262K tokens
Max Output: 66K tokens
Input Modalities: Audio, Image, Text, Video
Output Modalities: Text
Reasoning Efforts: default
Release Date: 1 month ago
Intelligence Index: 43.4 (rank #14)
Coding Index: 35.5 (rank #29)
GPQA: 0.8 (rank #27)
HLE: 0.2 (rank #22)
IFBench: 0.5 (rank #50)
Time to First Token: 0.00s (rank #135)
SciCode: 0.4 (rank #77)
LCR: 0.6 (rank #23)
TerminalBench Hard: 0.3 (rank #23)
TAU2: 0.9 (rank #21)
Output TPS: 0.0 (rank #347)

Capabilities

Input (4/5)
Supported: Text, Image, Audio, Video
Not supported: PDF

Output (1/5)
Supported: Text
Not supported: Image, Audio, Video, Embedding

Capabilities (1/13)
Supported: Reasoning
Not supported: Adaptive Reasoning, Function Calling, Parallel Function Calling, Structured Outputs, Native JSON Schema, Web Search, URL Context, Computer Use, Code Execution, File Search, Prompt Caching, Assistant Prefill

Pricing by Provider

Provider | Model ID | Tier | Input ($ / 1M) | Output ($ / 1M) | Cache Read ($ / 1M)
OpenRouter | xiaomi/mimo-v2-omni | Standard | $0.400 | $2.00 | $0.080
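
To show how the per-million-token rates listed for OpenRouter turn into a request cost, here is a minimal Python sketch. The function name `estimate_cost` and its parameters are hypothetical, chosen for illustration. The sketch also assumes that cached input tokens are billed at the cache-read rate instead of the standard input rate; actual provider billing rules may differ.

```python
from decimal import Decimal

# Rates in USD per 1M tokens, taken from the OpenRouter pricing listed above.
INPUT_RATE = Decimal("0.400")
OUTPUT_RATE = Decimal("2.00")
CACHE_READ_RATE = Decimal("0.080")
TOKENS_PER_UNIT = Decimal(1_000_000)


def estimate_cost(input_tokens: int, output_tokens: int,
                  cached_input_tokens: int = 0) -> Decimal:
    """Estimate the USD cost of one request (hypothetical helper).

    Assumption: cached input tokens are charged at CACHE_READ_RATE
    and the remaining input tokens at INPUT_RATE.
    """
    if min(input_tokens, output_tokens, cached_input_tokens) < 0:
        raise ValueError("token counts must be non-negative")
    if cached_input_tokens > input_tokens:
        raise ValueError("cached tokens cannot exceed total input tokens")

    uncached_tokens = input_tokens - cached_input_tokens
    total = (
        uncached_tokens * INPUT_RATE
        + cached_input_tokens * CACHE_READ_RATE
        + output_tokens * OUTPUT_RATE
    )
    return total / TOKENS_PER_UNIT


if __name__ == "__main__":
    # 100K input tokens and 2K output tokens, nothing cached.
    print(estimate_cost(100_000, 2_000))
```

Under these assumptions, a request with 100,000 input tokens and 2,000 output tokens comes to $0.044: $0.040 for input plus $0.004 for output. `Decimal` is used instead of floats so that small per-token amounts do not pick up rounding noise.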

Versions

Version | Released | Context | Input / 1M | Output / 1M | Status
MiMo-V2-Omni | | 262K | $0.400 | $2.00 | Current
MiMo V2 Pro | | 1.0M | $1.00 | $3.00 | Available

Model IDs