Base Video is Deepgram's base-tier speech-to-text (ASR) model, designed for transcribing speech from video content.
Specifications
Canonical ID: deepgram-base-video
Type: Speech to Text
Status: Active
Creator: Deepgram
Providers: Deepgram
Input Modalities: Audio
Output Modalities: Text

Capabilities

Input (1/5 supported)
Text: no
Image: no
Audio: yes
Video: no
PDF: no

Output (1/5 supported)
Text: yes
Image: no
Audio: no
Video: no
Embedding: no

Capabilities (0/13 supported)
Reasoning: no
Adaptive Reasoning: no
Function Calling: no
Parallel Function Calling: no
Structured Outputs: no
Native JSON Schema: no
Web Search: no
URL Context: no
Computer Use: no
Code Execution: no
File Search: no
Prompt Caching: no
Assistant Prefill: no

Pricing by Provider

Currency: US Dollar ($)
Provider: Deepgram (deepgram/base-video)
Standard, Audio In: $0.000208 / sec
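
As a rough illustration of what the listed rate of $0.000208 per second of audio input means in practice, the sketch below multiplies that rate by a duration. It assumes billing is strictly linear in audio length, with no minimum charge, rounding, or volume discount; none of that is stated on this page. The function name `estimate_cost_usd` is hypothetical and chosen only for this example.

```python
from decimal import Decimal

# Per-second audio input price for deepgram/base-video, copied from the listing.
PRICE_PER_SECOND_USD = Decimal("0.000208")


def estimate_cost_usd(audio_seconds):
    """Estimate the transcription cost in USD for a given audio duration.

    Assumption: cost scales linearly with duration (no minimums or rounding).
    """
    if audio_seconds < 0:
        raise ValueError("audio_seconds must be non-negative")
    # Go through str() so float inputs such as 90.5 convert to the
    # decimal value the caller wrote, not its binary approximation.
    return PRICE_PER_SECOND_USD * Decimal(str(audio_seconds))


if __name__ == "__main__":
    for label, seconds in [("1 minute", 60), ("1 hour", 3600)]:
        print(f"{label}: ${estimate_cost_usd(seconds)}")
```

Under that linear assumption, one hour of audio comes to a little under $0.75. `Decimal` is used instead of floats so that very small per-second prices do not accumulate binary rounding error.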

Versions

Base Video: Current
Base ConversationalAI: Available
Base Finance: Available
Base General: Available
Base Meeting: Available
Base Phonecall: Available
Base Voicemail: Available
Deepgram Base: Available
Deepgram Enhanced: Available
Enhanced Finance: Available
Enhanced General: Available

Model IDs

deepgram-base-video
deepgram/base-video