MiMo V2 Omni is Xiaomi's language model with a 262K context window and up to 66K output tokens, starting at $0.400 / 1M input and $2.00 / 1M output. A frontier omni-modal LLM from Xiaomi that natively processes image, video, and audio inputs within a unified architecture for agentic tasks.
Specifications
Canonical IDxiaomi-mimo-2-omni
TypeLanguage
StatusActive
CreatorXiaomiXiaomi
Providers
Context Window262K tokens
Max Output66K tokens
Input ModalitiesAudioImageTextVideo
Output ModalitiesText
Reasoning Effortsdefault
Release Date · 2 months ago
Benchmarks
Intelligence Index
43.4
#45
Coding Index
35.5
#66
GPQA
0.8
#71
HLE
0.2
#61
IFBench
0.5
#127
Time to First Token
1.41s
#383
SciCode
0.4
#144
LCR
0.7
#37
TerminalBench Hard
0.3
#57
TAU2
0.9
#47
Output TPS
99.0
#115

Capabilities

Input4/5
Text
Image
Audio
Video
PDF·
Output1/5
Text
Image·
Audio·
Video·
Embedding·
Capabilities3/13
Reasoning
Adaptive Reasoning·
Function Calling
Parallel Function Calling·
Structured Outputs
Native JSON Schema·
Web Search·
URL Context·
Computer Use·
Code Execution·
File Search·
Prompt Caching·
Assistant Prefill·

Pricing by Provider

ProviderStandard
Input
$ / 1M
Output
$ / 1M
Cache Read
$ / 1M
OpenRouter logo
OpenRouter
xiaomi/mimo-v2-omni
$0.400$2.00$0.080

Cost Calculator

Preset:

Versions

VersionReleasedContextInput / 1MOutput / 1MStatus
MiMo V2.5 Pro1.1M$1.00$3.00Available
MiMo V2.51.1M$0.400$2.00Available
MiMo V2 Omni262K$0.400$2.00Current
MiMo V2 Pro1.0M$1.00$3.00Available
MiMo V2 Flash262K$0.100$0.300Available
MiMo V2.5 424BAvailable
MiMo V2 OmniAvailable
MiMo V2Available
MiMo V2 Flash ReasoningAvailable
MiMo V2 TTSAvailable
MiMo V2.5 TTS VoiceDesignAvailable

Model IDs