GPT Realtime Whisper is OpenAI's language model with a 16K-token context window and up to 2K output tokens, priced from $0.017 per 1M output tokens. It is a streaming speech-to-text model optimized for low-latency transcript delivery from live audio, intended for real-time applications that need fine-grained control over transcription.
Specifications
Canonical ID: openai-gpt-realtime-whisper
Type: Language
Status: Active
Creator: OpenAI
Providers:
Context Window: 16K tokens
Max Output: 2K tokens
Input Modalities: Audio, Text
Output Modalities: Text
Knowledge Cutoff:

Capabilities

Input (2/5)
Supported: Text, Audio
Not supported: Image, Video, PDF

Output (1/5)
Supported: Text
Not supported: Image, Audio, Video, Embedding

Capabilities (0/13)
Supported: none
Not supported: Reasoning, Adaptive Reasoning, Function Calling, Parallel Function Calling, Structured Outputs, Native JSON Schema, Web Search, URL Context, Computer Use, Code Execution, File Search, Prompt Caching, Assistant Prefill

Pricing by Provider

All prices are for output, in $ / 1M tokens.

| Provider | Model ID | Standard | Batch | Flex | Priority |
|---|---|---|---|---|---|
| OpenAI | gpt-realtime-whisper | $0.017 | $0.017 | $0.017 | $0.017 |
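
As a rough illustration of the arithmetic behind the listed price, the sketch below estimates output cost from a token count. It assumes billing is strictly linear in output tokens at $0.017 per 1M, as listed for every tier above. The function name `output_cost_usd` and the constant names are hypothetical and not part of any OpenAI API.

```python
from decimal import Decimal

# Listed price: USD per 1M output tokens (same for Standard, Batch, Flex, Priority).
PRICE_PER_MILLION_OUTPUT = Decimal("0.017")
TOKENS_PER_MILLION = Decimal(1_000_000)


def output_cost_usd(output_tokens: int) -> Decimal:
    """Estimate cost in USD for a number of output tokens.

    Assumption: cost scales linearly with output tokens and nothing else
    is billed; actual invoices may differ.
    """
    if output_tokens < 0:
        raise ValueError("output_tokens must be non-negative")
    return PRICE_PER_MILLION_OUTPUT * Decimal(output_tokens) / TOKENS_PER_MILLION


if __name__ == "__main__":
    # Cost of one response at the 2K max output, and of 1M output tokens.
    for tokens in (2_000, 1_000_000):
        print(tokens, "output tokens ->", output_cost_usd(tokens), "USD")
```

Using `Decimal` rather than floats keeps very small per-request amounts exact, which matters when summing many short transcripts.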

Versions

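| Version | Released | Context | Input / 1M | Output / 1M | Status |
|---|---|---|---|---|---|
| GPT-5.5 | | 1.1M | $5.00 | $30.00 | Available |
| GPT-5.4 Mini | | 1.1M | $0.750 | $4.50 | Available |
| GPT-5.4 Nano | | 1.1M | $0.200 | $1.25 | Available |
| GPT-5.4 | | 1.1M | $2.50 | $15.00 | Available |
| GPT-5.3 Codex | | 400K | $1.75 | $14.00 | Available |
| GPT-5.2 Codex | | 400K | $1.75 | $14.00 | Available |
| GPT-5.2 | | 410K | $1.75 | $14.00 | Available |
| GPT-5.1 | | 410K | $1.25 | $10.00 | Available |
| GPT-5.1 Codex | | 400K | $1.25 | $10.00 | Available |
| GPT-5.1 Codex Mini | | 400K | $0.250 | $2.00 | Available |
| GPT Realtime Whisper | | 16K | | $0.017 | Current |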

Model IDs