MiMo-V2-Omni is Xiaomi's frontier omni-modal model. It natively processes image, video, and audio inputs within a unified architecture for agentic tasks. It has a 262K-token context window and generates up to 66K output tokens, with pricing starting at $0.400 per 1M input tokens and $2.00 per 1M output tokens.
xiaomi-mimo-2-omni |
| Language |
| Active |
| 262K tokens |
| 66K tokens |
| Audio, Image, Text, Video |
| Text |
| default |
| · 1 month ago |
43.4 (#14) |
35.5 (#29) |
0.8 (#27) |
0.2 (#22) |
0.5 (#50) |
0.00s (#135) |
0.4 (#77) |
0.6 (#23) |
0.3 (#23) |
0.9 (#21) |
0.0 (#347) |
Capabilities
Input: 4/5 (Audio, Image, Text, Video)
Output: 1/5 (Text)
Capabilities: 1/13
Pricing by Provider
| Provider | Input $ / 1M (Standard) | Output $ / 1M (Standard) | Cache Read $ / 1M (Standard) |
|---|---|---|---|
| OpenRouter | $0.400 | $2.00 | $0.080 |
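
The per-token rates above can be turned into a rough per-request estimate. The sketch below is illustrative only: the `Pricing` class and `estimate_cost` function are hypothetical names, and it assumes that cached input tokens are billed at the cache-read rate instead of the regular input rate, which is a common scheme but is not confirmed by this page.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pricing:
    """USD prices per 1M tokens (hypothetical helper type)."""
    input_per_m: float
    output_per_m: float
    cache_read_per_m: float


# Rates copied from the OpenRouter row of the pricing table.
MIMO_V2_OMNI = Pricing(input_per_m=0.40, output_per_m=2.00, cache_read_per_m=0.08)


def estimate_cost(pricing: Pricing, input_tokens: int, output_tokens: int,
                  cached_input_tokens: int = 0) -> float:
    """Estimate the USD cost of one request.

    Assumption: cached input tokens are charged at the cache-read rate
    and the remaining input tokens at the standard input rate.
    """
    if min(input_tokens, output_tokens, cached_input_tokens) < 0:
        raise ValueError("token counts must be non-negative")
    if cached_input_tokens > input_tokens:
        raise ValueError("cached_input_tokens cannot exceed input_tokens")
    fresh_input = input_tokens - cached_input_tokens
    total_per_million = (
        fresh_input * pricing.input_per_m
        + cached_input_tokens * pricing.cache_read_per_m
        + output_tokens * pricing.output_per_m
    )
    return total_per_million / 1_000_000


if __name__ == "__main__":
    # 100K input tokens and 2K output tokens, no cache hits.
    print(f"{estimate_cost(MIMO_V2_OMNI, 100_000, 2_000):.4f}")          # 0.0440
    # Same request with 80K of the input served from cache.
    print(f"{estimate_cost(MIMO_V2_OMNI, 100_000, 2_000, 80_000):.4f}")  # 0.0184
```

Under these assumptions, a 100K-token prompt with a 2K-token reply costs about $0.044, and serving 80K of that prompt from cache lowers the estimate to about $0.0184.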
Versions
| Version | Released | Context | Input / 1M | Output / 1M | Status |
|---|---|---|---|---|---|
| MiMo-V2-Omni | | 262K | $0.400 | $2.00 | Current |
| MiMo V2 Pro | | 1.0M | $1.00 | $3.00 | Available |