STT is Microsoft's speech-to-text transcription model, which converts spoken audio into written text.
Specifications
Canonical ID: microsoft-stt
Type: Speech to Text
Status: Active
Creator: Microsoft
Providers
Input Modalities: Audio
Output Modalities: Text

Capabilities

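Input (1/5 supported)
Text: not supported
Image: not supported
Audio: supported
Video: not supported
PDF: not supported

Output (1/5 supported)
Text: supported
Image: not supported
Audio: not supported
Video: not supported
Embedding: not supported

Capabilities (0/13 supported)
Reasoning: not supported
Adaptive Reasoning: not supported
Function Calling: not supported
Parallel Function Calling: not supported
Structured Outputs: not supported
Native JSON Schema: not supported
Web Search: not supported
URL Context: not supported
Computer Use: not supported
Code Execution: not supported
File Search: not supported
Prompt Caching: not supported
Assistant Prefill: not supported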

Pricing by Provider

Currency: US Dollar ($)
Provider: Azure AI Foundry
Model ID: azure/speech/azure-stt
Tier: Standard
Audio In: $0.000278 / sec
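
The per-second rate can be turned into a cost estimate for a given audio length. The sketch below is a minimal illustration and not an official calculator: the function name `transcription_cost` is hypothetical, and it assumes billing is strictly linear in audio duration, with no rounding increments, minimum charges, or volume discounts. At the listed rate, one minute of audio works out to about $0.0167 and one hour to about $1.00.

```python
from decimal import Decimal

# Listed Azure AI Foundry Standard rate for azure/speech/azure-stt,
# in US dollars per second of input audio.
RATE_USD_PER_SECOND = Decimal("0.000278")


def transcription_cost(audio_seconds):
    """Estimate the USD cost of transcribing `audio_seconds` of audio.

    Assumption: cost scales linearly with duration. Real billing may
    round durations or apply minimums that this sketch ignores.
    """
    seconds = Decimal(str(audio_seconds))
    if seconds < 0:
        raise ValueError("audio_seconds must be non-negative")
    return RATE_USD_PER_SECOND * seconds


if __name__ == "__main__":
    for label, seconds in [("1 minute", 60), ("1 hour", 3600)]:
        print(f"{label}: ${transcription_cost(seconds)}")
```

`Decimal` is used instead of `float` so that small per-second amounts accumulate without binary floating-point error.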

Versions

Columns: Version, Released, Context, Input / 1M, Output / 1M, Status
STT: Current
Model Router: $0.140, Available

Model IDs

azure/speech/azure-stt
microsoft-stt