DeepSeek-OCR is DeepSeek's image-to-text model with an 8K context window and up to 8K output tokens, available from 4 providers, starting at $0.03 / 1M input tokens and $0.03 / 1M output tokens. It is a multimodal OCR model that compresses long document contexts via optical 2D mapping, combining a DeepEncoder with a compact MoE language model.
Specifications
Canonical ID: deepseek-ocr
Type: Image to Text
Status: Active
Creator: DeepSeek
Providers: 4
Context Window: 8K tokens
Max Output: 8K tokens
Input Modalities: Image, PDF, Text
Output Modalities: Text
Parameters: 3.34B
HuggingFace Likes: 3,218
HuggingFace Downloads (30d): 2,082,348
HuggingFace Downloads (all-time): 22,155,059

Capabilities

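Input (3/5)
Text: supported
Image: supported
Audio: not supported
Video: not supported
PDF: supported

Output (1/5)
Text: supported
Image: not supported
Audio: not supported
Video: not supported
Embedding: not supported

Capabilities (1/13)
Reasoning: not supported
Adaptive Reasoning: not supported
Function Calling: not supported
Parallel Function Calling: not supported
Structured Outputs: supported
Native JSON Schema: not supported
Web Search: not supported
URL Context: not supported
Computer Use: not supported
Code Execution: not supported
File Search: not supported
Prompt Caching: not supported
Assistant Prefill: not supported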

Pricing by Provider

Standard pricing in US dollars ($). Input and output prices are per 1M tokens; Page In is per page.

Provider | Model ID | Input ($ / 1M) | Output ($ / 1M) | Page In ($ / page)
Google Gemini | deepseek-ocr | $0.30 | $1.20 | N/A
Google Vertex AI | deepseek-ocr | $0.30 | $1.20 | $0.000300
Hugging Face | novita:deepseek/deepseek-ocr | $0.03 | $0.03 | N/A
Novita | novita/deepseek/deepseek-ocr | $0.03 | $0.03 | N/A
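
The per-token prices above can be turned into a rough request cost with a few lines of arithmetic. The following is a minimal sketch, not any provider's billing logic: the `Price` class, the `PRICES` table, and `estimate_cost` are hypothetical names introduced here, and the sketch assumes that a listed per-page fee is charged in addition to token charges, which this page does not confirm.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Price:
    """Listed prices in USD (hypothetical helper type for this sketch)."""
    input_per_million: float           # USD per 1M input tokens
    output_per_million: float          # USD per 1M output tokens
    per_page: Optional[float] = None   # USD per input page; None where listed as N/A


# Values copied from the pricing table on this page.
PRICES = {
    "Google Gemini": Price(0.30, 1.20),
    "Google Vertex AI": Price(0.30, 1.20, 0.000300),
    "Hugging Face": Price(0.03, 0.03),
    "Novita": Price(0.03, 0.03),
}


def estimate_cost(provider: str, input_tokens: int, output_tokens: int, pages: int = 0) -> float:
    """Estimate the USD cost of one request from the listed prices.

    Assumption: a per-page fee, where listed, is added on top of token charges.
    """
    price = PRICES[provider]
    cost = input_tokens / 1_000_000 * price.input_per_million
    cost += output_tokens / 1_000_000 * price.output_per_million
    if price.per_page is not None:
        cost += pages * price.per_page
    return cost


if __name__ == "__main__":
    # Compare providers for the same illustrative workload.
    for name in PRICES:
        print(f"{name}: ${estimate_cost(name, 6_000, 2_000, pages=3):.6f}")
```

Keeping the prices in a single table makes it easy to compare providers for the same workload; actual invoices may differ depending on how each provider counts image and PDF input.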


Model IDs

deepseek-ai/DeepSeek-OCR
deepseek-ocr
deepseek-ocr-maas
deepseek/deepseek-ocr
novita/deepseek/deepseek-ocr
publishers/google/models/deepseek-ocr-maas
vertex_ai/deepseek-ai/deepseek-ocr-maas
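
Because the same model appears under several identifiers, client code may want to map any of them back to the canonical ID. The sketch below is one way to do that, assuming every ID in the list above refers to this model; `KNOWN_IDS` and `to_canonical` are hypothetical names, and matching is exact and case-sensitive because the list mixes cases.

```python
CANONICAL_ID = "deepseek-ocr"

# Identifiers copied from the Model IDs list on this page.
KNOWN_IDS = frozenset({
    "deepseek-ai/DeepSeek-OCR",
    "deepseek-ocr",
    "deepseek-ocr-maas",
    "deepseek/deepseek-ocr",
    "novita/deepseek/deepseek-ocr",
    "publishers/google/models/deepseek-ocr-maas",
    "vertex_ai/deepseek-ai/deepseek-ocr-maas",
})


def to_canonical(model_id: str) -> str:
    """Return the canonical ID for a listed identifier, or raise ValueError."""
    if model_id.strip() in KNOWN_IDS:
        return CANONICAL_ID
    raise ValueError(f"unknown model ID: {model_id!r}")


if __name__ == "__main__":
    print(to_canonical("vertex_ai/deepseek-ai/deepseek-ocr-maas"))
```

Rejecting unknown IDs with an error, rather than guessing, avoids silently routing a request to the wrong model.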