GPT Realtime Whisper is OpenAI's language model with a 16K context window and up to 2K output tokens. It is a streaming speech-to-text model optimized for low-latency transcript delivery from live audio, and is tailored for real-time applications that require fine-grained control over transcription.
Specifications
Canonical ID: openai-gpt-realtime-whisper
Type: Language
Status: Active
Creator: OpenAI
Context Window: 16K tokens
Max Output: 2K tokens
Input Modalities: Audio, Text
Output Modalities: Text
Knowledge Cutoff
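
The context window and max output figures above can be turned into a simple pre-flight check. The following is a minimal sketch, not an official API: it assumes "16K" and "2K" mean 16,000 and 2,000 tokens, and that input and output tokens share the same context window. The constant and function names are hypothetical.

```python
# Assumption: "16K" and "2K" from the specification are read as 16,000 and
# 2,000 tokens. The exact limits may differ (for example 16,384 and 2,048).
CONTEXT_WINDOW_TOKENS = 16_000
MAX_OUTPUT_TOKENS = 2_000


def fits_limits(input_tokens: int, requested_output_tokens: int) -> bool:
    """Return True if a request stays within the listed limits.

    Assumption: input and output tokens count against one shared window.
    """
    if input_tokens < 0 or requested_output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    if requested_output_tokens > MAX_OUTPUT_TOKENS:
        return False
    return input_tokens + requested_output_tokens <= CONTEXT_WINDOW_TOKENS


if __name__ == "__main__":
    print(fits_limits(14_000, 2_000))
    print(fits_limits(14_001, 2_000))
```

Under these assumptions, a request that asks for the full 2,000 output tokens leaves at most 14,000 tokens for input.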

Capabilities

Input (2/5 supported)
Text: yes
Image: no
Audio: yes
Video: no
PDF: no
Output (1/5 supported)
Text: yes
Image: no
Audio: no
Video: no
Embedding: no
Capabilities (0/13 supported)
Reasoning: no
Adaptive Reasoning: no
Function Calling: no
Parallel Function Calling: no
Structured Outputs: no
Native JSON Schema: no
Web Search: no
URL Context: no
Computer Use: no
Code Execution: no
File Search: no
Prompt Caching: no
Assistant Prefill: no

Versions

Version               Released  Context  Input / 1M  Output / 1M  Status
GPT-5.5               -         1.1M     $5.00       $30.00       Available
GPT-5.4 Mini          -         1.1M     $0.750      $4.50        Available
GPT-5.4 Nano          -         1.1M     $0.200      $1.25        Available
GPT-5.4               -         1.1M     $2.50       $15.00       Available
GPT-5.3 Codex         -         400K     $1.75       $14.00       Available
GPT-5.2 Codex         -         400K     $1.75       $14.00       Available
GPT-5.2               -         410K     $1.75       $14.00       Available
GPT-5.1               -         410K     $1.25       $10.00       Available
GPT-5.1 Codex         -         400K     $1.25       $10.00       Available
GPT-5.1 Codex Mini    -         400K     $0.250      $2.00        Available
GPT Realtime Whisper  -         16K      -           -            Current
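
The per-million prices in the table translate into a per-request cost with simple arithmetic. The sketch below assumes the "Input / 1M" and "Output / 1M" columns are USD per one million tokens; the dictionary and function names are hypothetical, and only a few rows are copied in for illustration. GPT Realtime Whisper is left out because the table lists no prices for it.

```python
from decimal import Decimal

# USD per 1M tokens as (input, output), copied from the Versions table.
# Assumption: the "/ 1M" columns mean price per one million tokens.
PRICES_PER_MILLION = {
    "GPT-5.5": (Decimal("5.00"), Decimal("30.00")),
    "GPT-5.4": (Decimal("2.50"), Decimal("15.00")),
    "GPT-5.4 Mini": (Decimal("0.750"), Decimal("4.50")),
    "GPT-5.4 Nano": (Decimal("0.200"), Decimal("1.25")),
}

TOKENS_PER_MILLION = Decimal(1_000_000)


def estimate_cost(version: str, input_tokens: int, output_tokens: int) -> Decimal:
    """Estimate the USD cost of one request for a listed version.

    Raises KeyError for versions without listed prices.
    """
    input_price, output_price = PRICES_PER_MILLION[version]
    total = input_price * input_tokens + output_price * output_tokens
    return total / TOKENS_PER_MILLION


if __name__ == "__main__":
    # 200,000 input tokens and 50,000 output tokens on GPT-5.4 Mini.
    print(estimate_cost("GPT-5.4 Mini", 200_000, 50_000))
```

For example, 200,000 input tokens and 50,000 output tokens on GPT-5.4 Mini come to (0.750 × 200,000 + 4.50 × 50,000) / 1,000,000 = 0.375 USD. Decimal is used instead of float so that the currency arithmetic stays exact.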

Model IDs