Name: MiMo V2 Flash
Brand: Xiaomi

MiMo V2 Flash is Xiaomi's text-only language model: a highly efficient mixture-of-experts (MoE) LLM with 309B total parameters and only 15B active, designed for extreme inference speed with hybrid attention. It offers a 262K-token context window and up to 32K output tokens, and is available from 2 providers, starting at $0.1 per 1M input tokens and $0.3 per 1M output tokens.

Specifications
Canonical ID	`xiaomi-mimo-2-flash`
Type	Language
Status	Deprecated
Creator	Xiaomi
Providers	Novita OpenRouter
Context Window	262K tokens
Max Output	32K tokens
Input Modalities	Text
Output Modalities	Text
Reasoning Efforts	default
Parameters	310B
HuggingFace Likes	716
HuggingFace Downloads (30d)	67,865
HuggingFace Downloads (all-time)	728,735
Release Date	2025-12-16
Deprecation Date	2026-06-17

Benchmarks
Intelligence Index	24.7 #140
Coding Index	49.8 #43
Math Index	67.7 #95
MMLU-Pro	0.7 #169
GPQA	0.7 #249
HLE	0.1 #197
LiveCodeBench	0.4 #168
IFBench	0.4 #237
Time to First Token	0.00s #361
SciCode	0.3 #312
AIME 2025	0.7 #95
LCR	0.3 #226
TerminalBench Hard	0.3 #117
TAU2	0.8 #97
Output TPS	0.0 #486

Capabilities

Input (1/5)
Text	✓
Image	·
Audio	·
Video	·
PDF	·

Output (1/5)
Text	✓
Image	·
Audio	·
Video	·
Embedding	·

Capabilities (6/13)
Reasoning	✓
Adaptive Reasoning	·
Function Calling	✓
Parallel Function Calling	✓
Structured Outputs	✓
Native JSON Schema	✓
Web Search	·
URL Context	·
Computer Use	·
Code Execution	·
File Search	·
Prompt Caching	✓
Assistant Prefill	·

Pricing by Provider

Standard-tier prices in US dollars ($) per 1M tokens.

Provider	Input $ / 1M	Output $ / 1M	Cache Read $ / 1M
Novita `novita/xiaomimimo/mimo-v2-flash`	$0.1	$0.3	$0.02
OpenRouter `openrouter/xiaomi/mimo-v2-flash`	$0.1	$0.3	$0.01
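
The per-token prices above translate into a per-request cost with simple arithmetic. The following is a minimal sketch, not the site's own calculator: the function name `estimate_cost` and the `PRICES` dictionary are hypothetical, and it assumes that cache-read tokens are billed at the cache-read rate and are not also counted in `input_tokens`. Cache-write pricing is not listed on this page, so it is left out.

```python
# Prices in USD per 1M tokens, copied from the provider table above.
PRICES = {
    "novita": {"input": 0.1, "output": 0.3, "cache_read": 0.02},
    "openrouter": {"input": 0.1, "output": 0.3, "cache_read": 0.01},
}


def estimate_cost(provider, input_tokens, output_tokens,
                  cache_read_tokens=0, calls=1):
    """Estimate the USD cost of `calls` identical requests.

    Assumption: cache_read_tokens are billed separately from input_tokens.
    """
    p = PRICES[provider]
    per_call = (
        input_tokens * p["input"]
        + output_tokens * p["output"]
        + cache_read_tokens * p["cache_read"]
    ) / 1_000_000
    return per_call * calls


if __name__ == "__main__":
    # 1M input tokens and 500K output tokens in a single call.
    cost = estimate_cost("novita", 1_000_000, 500_000)
    print(f"${cost:.2f}")
```

With identical input and output prices, the two providers differ only when cached tokens are involved, where OpenRouter's listed cache-read rate is half of Novita's.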

Cheapest Instances to Run It

Cloud GPU instances that can host MiMo V2 Flash, ranked by cheapest on-demand price. The model needs about 743 GB of GPU memory at FP16 precision (estimated from its parameter count), so treat the fit as guidance rather than a guarantee.

Instance	Cloud	GPU	VRAM	Price	Cheapest region
g7e.48xlarge	AWS	8× RTX PRO Server 6000	768 GB	$33.14/hr	us-east-1
Standard_ND96isr_MI300X_v5	Azure	8× AMD Instinct MI300X	1536 GB	$48.00/hr	westus3
p5en.48xlarge	AWS	8× H200	1128 GB	$63.30/hr	us-east-1
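
The 743 GB requirement can be sanity-checked against the parameter count. FP16 stores each parameter in 2 bytes, so 310B parameters need about 620 GB for weights alone. The page does not say how the remaining headroom is derived, so the sketch below only reports the implied ratio rather than assuming a specific overhead formula. The instance data is copied from the table above, and the helper name `cheapest_fits` is hypothetical.

```python
PARAMS_BILLIONS = 310        # "Parameters" row in Specifications
BYTES_PER_PARAM_FP16 = 2     # FP16 uses 2 bytes per parameter
REQUIRED_GB = 743            # requirement quoted on this page

# Weights-only footprint: billions of params * bytes per param = GB.
weights_gb = PARAMS_BILLIONS * BYTES_PER_PARAM_FP16
# Ratio between the quoted requirement and the raw weights (roughly 1.2).
implied_overhead = REQUIRED_GB / weights_gb

# (instance, total VRAM in GB, on-demand USD per hour) from the table above.
INSTANCES = [
    ("g7e.48xlarge", 768, 33.14),
    ("Standard_ND96isr_MI300X_v5", 1536, 48.00),
    ("p5en.48xlarge", 1128, 63.30),
]


def cheapest_fits(required_gb):
    """Return instances with enough VRAM, cheapest first."""
    fits = [inst for inst in INSTANCES if inst[1] >= required_gb]
    return sorted(fits, key=lambda inst: inst[2])


if __name__ == "__main__":
    print(f"weights only: {weights_gb} GB, implied overhead: {implied_overhead:.2f}x")
    for name, vram, price in cheapest_fits(REQUIRED_GB):
        print(f"{name}: {vram} GB at ${price:.2f}/hr")
```

The cheapest listed instance clears the requirement by only 25 GB (768 GB against 743 GB), which is why the fit is best read as guidance: long contexts or large batches could exceed that margin.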

Versions

Version	Released	Context	Input / 1M	Output / 1M	Status
MiMo V2.5 Pro	2026-04-22	1.1M	$0.435	$0.870	Available
MiMo V2.5	2026-04-22	1.1M	$0.140	$0.280	Available
MiMo V2 Pro	2026-03-18	1.0M	—	—	Deprecated
MiMo V2 Omni	2026-03-18	262K	—	—	Deprecated
MiMo V2 Flash	2025-12-16	262K	$0.100	$0.300	Current
MiMo V2.5 424B	—	—	—	—	Available
MiMo V2 Omni	—	—	—	—	Available
MiMo V2	—	—	—	—	Available
MiMo V2 Flash Reasoning	—	—	—	—	Available
MiMo V2 TTS	—	—	—	—	Available
MiMo V2.5 TTS	—	—	—	—	Available

Model IDs

mimo-v2-flash

mimo-v2-flash-reasoning

novita/xiaomimimo/mimo-v2-flash

openrouter/xiaomi/mimo-v2-flash

xiaomi-mimo-2-flash

xiaomi/mimo-v2-flash

xiaomimimo/mimo-v2-flash

API Endpoints

Section	Method	Endpoint
Capabilities	GET	/api/v1/models/xiaomi-mimo-2-flash
Pricing by Provider	GET	/api/v1/models/xiaomi-mimo-2-flash/pricing
Cost Calculator	GET	/api/v1/models/xiaomi-mimo-2-flash/pricing/calculate?input_tokens=1000000&output_tokens=500000
Cheapest Instances to Run It	GET	/api/v1/models/xiaomi-mimo-2-flash/instances
Versions	GET	/api/v1/models?family=mimo
Model IDs	GET	/api/v1/models/xiaomi-mimo-2-flash
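
The endpoint paths listed above can be assembled programmatically. The sketch below only builds URLs and makes no network request, because the page lists paths but not the API host or any authentication scheme. `BASE_URL` is a placeholder and `model_url` is a hypothetical helper, not part of any published client.

```python
from urllib.parse import urlencode

# Placeholder host: the page gives only paths, so this value is an assumption.
BASE_URL = "https://api.example.invalid"
MODEL_ID = "xiaomi-mimo-2-flash"  # canonical ID from Specifications


def model_url(suffix="", **query):
    """Build a URL under /api/v1/models/<MODEL_ID>, with optional query params."""
    url = f"{BASE_URL}/api/v1/models/{MODEL_ID}{suffix}"
    if query:
        url += "?" + urlencode(query)
    return url


if __name__ == "__main__":
    print(model_url())                 # capabilities / model IDs
    print(model_url("/pricing"))       # pricing by provider
    print(model_url("/instances"))     # cheapest instances
    # Cost calculator with the same token counts as the listed example.
    print(model_url("/pricing/calculate",
                    input_tokens=1_000_000, output_tokens=500_000))
```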
