Name: GPT Audio
Brand: OpenAI

GPT Audio is OpenAI's language model with a 128K context window and up to 16K output tokens, available from 3 providers, starting at $2.50 / 1M input and $10.00 / 1M output. OpenAI's first generally available audio model that accepts and produces audio inputs and outputs via the Chat Completions API.

Specifications
Canonical ID	`openai-gpt-audio`
Type	Language
Status	Active
Creator	OpenAI
Providers	Microsoft Azure AI Foundry OpenAI OpenRouter
Context Window	128K tokens
Max Output	16K tokens
Input Modalities	AudioText
Output Modalities	AudioText
Release Date	2025-08-28 · 9 months ago
Knowledge Cutoff	2023-10

Capabilities

Input2/5

Text✓

Image·

Audio✓

Video·

PDF·

Output2/5

Text✓

Image·

Audio✓

Video·

Embedding·

Capabilities4/13

Reasoning·

Adaptive Reasoning·

Function Calling✓

Parallel Function Calling✓

Structured Outputs✓

Native JSON Schema✓

Web Search·

URL Context·

Computer Use·

Code Execution·

File Search·

Prompt Caching·

Assistant Prefill·

Pricing by Provider

Provider	Standard
Provider	Input $ / 1M	Output $ / 1M	Audio In $ / 1M	Audio Out $ / 1M
Azure AI Foundry azure/gpt-audio-2025-08-28	$2.50	$10.00	$40.00	$80.00
OpenAI gpt-audio	$2.50	$10.00	$32.00	$64.00
OpenRouter openai/gpt-audio	$2.50	$10.00	$32.00	N/A

Cost Calculator

Preset:

Input tokens

Output tokens

Number of calls

Versions

Version	Released	Context	Input / 1M	Output / 1M	Status
GPT Audio 1.5	—	128K	$2.50	$10.00	Available
GPT Audio Mini	2025-10-06	128K	$0.600	$2.40	Available
GPT Audio	2025-08-28	128K	$2.50	$10.00	Current
GPT Realtime 2 Image	—	—	—	—	Available
GPT Realtime 2 Text	—	—	—	—	Available

GPT Audio

Capabilities

Pricing by Provider

Cost Calculator

Versions

Model IDs