OpenAI logo

GPT-4o Audio


GPT-4o Audio is OpenAI logoOpenAI's language model with a 128K context window and up to 16K output tokens, available from 2 providers. A multimodal GPT-4o variant that accepts and produces audio inputs and outputs alongside text, enabling voice-capable conversational applications.
Spec
Canonical IDopenai-gpt-4o-audio
TypeLanguage
StatusActive
CreatorOpenAIOpenAI
Providers
Context Window128K tokens
Max Output16K tokens
Input ModalitiesAudio
Output ModalitiesAudio

Capabilities

Input1/5
Text·
Image·
Audio
Video·
PDF·
Output1/5
Text·
Image·
Audio
Video·
Embedding·
Capabilities2/13
Reasoning·
Adaptive Reasoning·
Function Calling
Parallel Function Calling
Structured Outputs·
Native JSON Schema·
Web Search·
URL Context·
Computer Use·
Code Execution·
File Search·
Prompt Caching·
Assistant Prefill·

Pricing by Provider

ProviderStandard
Input
$ / 1M
Output
$ / 1M
Audio In
$ / 1M
Audio Out
$ / 1M
Azure AI Foundry logo
Azure AI Foundry
$2.50$10.00$2.50$80.00
OpenAI logo
OpenAI
$2.50$10.00$40.00$80.00

Cost Calculator

Preset:
Compares every provider & tier in USD

Versions

VersionReleasedContextInput / 1MOutput / 1MStatus
GPT-5.4 Mini1.1M$0.750$4.50Available
GPT-5.4 Nano1.1M$0.200$1.25Available
GPT-5.41.1M$2.50$15.00Available
GPT-5.4 Pro1.1M$30.00$180.00Available
GPT-5.4 3.51.1MAvailable
GPT-5.4 Pro 3.51.1MAvailable
GPT-5.3 Chat128K$1.75$14.00Available
GPT-5.3 Codex400K$1.75$14.00Available
GPT-5.3 Codex Spark128KAvailable
GPT-5.3 Instant128KAvailable
GPT-4o Audio128KCurrent

Model IDs