MiMo V2 Omni is a frontier omni-modal LLM from Xiaomi with a 262K-token context window and up to 66K output tokens. It natively processes image, video, and audio inputs within a unified architecture for agentic tasks.
Specifications
Canonical ID: xiaomi-mimo-2-omni
Type: Language
Status: Deprecated
Creator: Xiaomi
Providers
Context Window: 262K tokens
Max Output: 66K tokens
Input Modalities: Audio, Image, Text, Video
Output Modalities: Text
Reasoning Efforts: default
Release Date: 3 months ago
Deprecation Date
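
The context window and max output figures above interact when sizing a request. The following is a minimal sketch of that arithmetic, assuming the prompt and the generated output share the same context window and treating "262K" and "66K" as the rounded values 262,000 and 66,000 (the exact token limits are not stated on this page). The function name `max_output_budget` is hypothetical.

```python
# Rounded limits from the specifications above; exact values are an assumption.
CONTEXT_WINDOW = 262_000
MAX_OUTPUT = 66_000


def max_output_budget(prompt_tokens: int) -> int:
    """Return how many output tokens can still be requested for a prompt.

    Assumes prompt and output tokens share one context window.
    """
    if prompt_tokens < 0:
        raise ValueError("prompt_tokens must be non-negative")
    remaining = CONTEXT_WINDOW - prompt_tokens
    # The budget is capped by the model's max output and never goes below zero.
    return max(0, min(MAX_OUTPUT, remaining))


print(max_output_budget(10_000))   # short prompt: capped by MAX_OUTPUT
print(max_output_budget(250_000))  # long prompt: capped by remaining context
```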
Benchmarks
Intelligence Index: 35.0 (#51)
Coding Index: 35.5 (#74)
GPQA: 0.8 (#78)
HLE: 0.2 (#69)
IFBench: 0.5 (#137)
Time to First Token: 2.11s (#428)
SciCode: 0.4 (#153)
LCR: 0.7 (#42)
TerminalBench Hard: 0.3 (#65)
TAU2: 0.9 (#52)
Output TPS: 73.8 (#161)

Capabilities

Input (4/5)
Supported: Text, Image, Audio, Video
Not supported: PDF
Output (1/5)
Supported: Text
Not supported: Image, Audio, Video, Embedding
Capabilities (3/13)
Supported: Reasoning, Function Calling, Structured Outputs
Not supported: Adaptive Reasoning, Parallel Function Calling, Native JSON Schema, Web Search, URL Context, Computer Use, Code Execution, File Search, Prompt Caching, Assistant Prefill

Pricing by Provider

US Dollar ($), per 1M tokens, Standard tier
Provider | Input | Output | Cache Read
OpenRouter (xiaomi/mimo-v2-omni) | $0.40 | $2.00 | $0.08
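
To make the per-1M-token prices concrete, here is a minimal sketch of a cost estimate using the OpenRouter figures listed above. It assumes cached prompt tokens are billed at the cache-read rate instead of the input rate, which is a common convention but is not confirmed on this page. The function name `estimate_cost` is hypothetical.

```python
from decimal import Decimal

# USD per 1M tokens, copied from the OpenRouter row above.
PRICE_PER_MILLION = {
    "input": Decimal("0.40"),
    "output": Decimal("2.00"),
    "cache_read": Decimal("0.08"),
}


def estimate_cost(input_tokens: int, output_tokens: int, cached_tokens: int = 0) -> Decimal:
    """Estimate the USD cost of one request.

    Assumption: cached_tokens is the part of input_tokens served from cache
    and is billed at the cache-read rate rather than the input rate.
    """
    if min(input_tokens, output_tokens, cached_tokens) < 0:
        raise ValueError("token counts must be non-negative")
    if cached_tokens > input_tokens:
        raise ValueError("cached_tokens cannot exceed input_tokens")
    uncached = input_tokens - cached_tokens
    total = (
        uncached * PRICE_PER_MILLION["input"]
        + cached_tokens * PRICE_PER_MILLION["cache_read"]
        + output_tokens * PRICE_PER_MILLION["output"]
    )
    return total / Decimal(1_000_000)


# 100K prompt tokens (20K of them cached) and 5K output tokens.
print(estimate_cost(100_000, 5_000, cached_tokens=20_000))
```

`Decimal` is used rather than `float` so that small per-request amounts add up without binary rounding drift.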

Versions

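Version | Released | Context | Input / 1M | Output / 1M | Status
MiMo V2.5 Pro | - | 1.1M | $0.435 | $0.870 | Available
MiMo V2.5 | - | 1.1M | $0.140 | $0.280 | Available
MiMo V2 Omni | - | 262K | - | - | Current
MiMo V2 Pro | - | 1.0M | $1.00 | $3.00 | Deprecated
MiMo V2 Flash | - | 262K | $0.100 | $0.300 | Deprecated
MiMo V2.5 424B | - | - | - | - | Available
MiMo V2 Omni | - | - | - | - | Available
MiMo V2 | - | - | - | - | Available
MiMo V2 Flash Reasoning | - | - | - | - | Available
MiMo V2 TTS | - | - | - | - | Available
MiMo V2.5 TTS VoiceDesign | - | - | - | - | Available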

Model IDs

mimo-v2-omni
xiaomi-mimo-2-omni
xiaomi/mimo-v2-omni