MiMo V2 Omni is Xiaomi's language model with a 262K context window and up to 66K output tokens, starting at $0.400 / 1M input and $2.00 / 1M output. A frontier omni-modal LLM from Xiaomi that natively processes image, video, and audio inputs within a unified architecture for agentic tasks.
Specifications
Canonical IDxiaomi-mimo-2-omni
TypeLanguage
StatusActive
CreatorXiaomiXiaomi
Providers
Context Window262K tokens
Max Output66K tokens
Input ModalitiesAudioImageTextVideo
Output ModalitiesText
Reasoning Effortsdefault
Release Date · 2 months ago
Benchmarks
Intelligence Index
43.4
#43
Coding Index
35.5
#64
GPQA
0.8
#68
HLE
0.2
#59
IFBench
0.5
#124
Time to First Token
1.42s
#392
SciCode
0.4
#140
LCR
0.7
#35
TerminalBench Hard
0.3
#55
TAU2
0.9
#44
Output TPS
111.4
#104

Capabilities

Input4/5
Text
Image
Audio
Video
PDF·
Output1/5
Text
Image·
Audio·
Video·
Embedding·
Capabilities3/13
Reasoning
Adaptive Reasoning·
Function Calling
Parallel Function Calling·
Structured Outputs
Native JSON Schema·
Web Search·
URL Context·
Computer Use·
Code Execution·
File Search·
Prompt Caching·
Assistant Prefill·

Pricing by Provider

ProviderStandard
Input
$ / 1M
Output
$ / 1M
Cache Read
$ / 1M
OpenRouter logo
OpenRouter
xiaomi/mimo-v2-omni
$0.400$2.00$0.080

Cost Calculator

Preset:

Versions

VersionReleasedContextInput / 1MOutput / 1MStatus
MiMo V2.5 Pro1.1M$1.00$3.00Available
MiMo V2.51.1M$0.400$2.00Available
MiMo V2 Omni262K$0.400$2.00Current
MiMo V2 Pro1.0M$1.00$3.00Available
MiMo V2 Flash262K$0.100$0.300Available
MiMo V2.5 424BAvailable
MiMo V2 OmniAvailable
MiMo V2Available
MiMo V2 Flash ReasoningAvailable
MiMo V2 TTSAvailable
MiMo V2.5 TTS VoiceDesignAvailable

Model IDs