Voxtral Mini 3B is a compact 3B-parameter audio-language model from Mistral AI, capable of speech transcription and audio understanding in a small footprint. It has a 128K-token context window and generates up to 8K output tokens, with standard pricing of $0.040 per 1M input tokens and $0.040 per 1M output tokens.
Specifications
Canonical ID: mistral-voxtral-mini
Type: Language
Status: Active
Creator: Mistral AI
Providers
Context Window: 128K tokens
Max Output: 8K tokens
Input Modalities: Audio, Text
Output Modalities: Text
Release Date: 6 months ago
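
The context and output limits above can be turned into a simple token-budget check. The following is a minimal sketch, not an official API: it assumes "128K" and "8K" mean 128,000 and 8,000 tokens, and that input and output tokens share the same context window. Neither assumption is confirmed by this page, and the function name `max_output_budget` is hypothetical.

```python
# Assumption: "128K" and "8K" are read as round thousands of tokens.
CONTEXT_WINDOW = 128_000
MAX_OUTPUT = 8_000


def max_output_budget(input_tokens: int) -> int:
    """Return how many output tokens can still be requested.

    Assumes input and output tokens count against one shared context
    window, capped by the model's maximum output length.
    """
    if input_tokens < 0:
        raise ValueError("input_tokens must be non-negative")
    remaining = CONTEXT_WINDOW - input_tokens
    # Never exceed the output cap, and never go below zero.
    return max(0, min(MAX_OUTPUT, remaining))


if __name__ == "__main__":
    for n in (1_000, 125_000, 130_000):
        print(n, max_output_budget(n))
```

If the provider counts the two limits separately, only the `MAX_OUTPUT` cap applies and the subtraction can be dropped.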

Capabilities

Input (2/5)
Supported: Text, Audio
Not supported: Image, Video, PDF
Output (1/5)
Supported: Text
Not supported: Image, Audio, Video, Embedding
Capabilities (1/13)
Supported: Native JSON Schema
Not supported: Reasoning, Adaptive Reasoning, Function Calling, Parallel Function Calling, Structured Outputs, Web Search, URL Context, Computer Use, Code Execution, File Search, Prompt Caching, Assistant Prefill

Pricing by Provider

Provider: Amazon Bedrock
Model ID: mistral.voxtral-mini-3b-2507
Standard: Input $0.040 / 1M, Output $0.040 / 1M
Batch: Input $0.020 / 1M, Output $0.020 / 1M
Flex: Input $0.020 / 1M, Output $0.020 / 1M
Priority: Input $0.070 / 1M, Output $0.070 / 1M
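
The per-tier prices listed above translate into a request cost with simple arithmetic: tokens times the price per 1M tokens, divided by one million. The sketch below illustrates that calculation using the Amazon Bedrock figures from this page. The dictionary layout and the function name `estimate_cost` are illustrative assumptions, not part of any provider SDK.

```python
# USD per 1M tokens as (input, output), copied from the pricing listed above.
PRICES_PER_1M = {
    "standard": (0.040, 0.040),
    "batch": (0.020, 0.020),
    "flex": (0.020, 0.020),
    "priority": (0.070, 0.070),
}


def estimate_cost(input_tokens: int, output_tokens: int, tier: str = "standard") -> float:
    """Estimate the USD cost of one request at the given pricing tier."""
    input_price, output_price = PRICES_PER_1M[tier]
    total = input_tokens * input_price + output_tokens * output_price
    return total / 1_000_000


if __name__ == "__main__":
    # Example workload: 100,000 input tokens and 2,000 output tokens.
    for tier in PRICES_PER_1M:
        print(f"{tier:>8}: ${estimate_cost(100_000, 2_000, tier):.6f}")
```

Because input and output are priced identically within each tier for this model, the cost depends only on the total token count and the tier chosen.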

Versions

Voxtral Mini 3B: Context 128K; Input $0.040 / 1M; Output $0.040 / 1M; Status: Current
Voxtral Mini Transcribe Realtime: $0.006; Status: Available
Voxtral Mini TTS: Input $0.000 / 1M; Output $16.00 / 1M; Status: Available

Other models

Voxtral Small 24B: Tier Small; Context 128K; Input $0.100 / 1M; Output $0.300 / 1M
Voxtral TTS

Model IDs