DeepSeek-OCR is DeepSeek's image-to-text model with an 8K context window and up to 8K output tokens. It is available from 4 providers, with pricing starting at $0.030 per 1M input tokens and $0.030 per 1M output tokens. It is a multimodal OCR model that compresses long document contexts via optical 2D mapping, combining a DeepEncoder with a compact MoE language model.
Specifications
Canonical ID: deepseek-ocr
Type: Image to Text
Status: Active
Creator: DeepSeek
Providers: 4
Context Window: 8K tokens
Max Output: 8K tokens
Input Modalities: Image, PDF, Text
Output Modalities: Text
Parameters: 3.34B
HuggingFace Likes: 3,218
HuggingFace Downloads (30d): 2,082,348
HuggingFace Downloads (all-time): 22,155,059

Capabilities

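Input (3 of 5 supported)
Text: supported
Image: supported
Audio: not supported
Video: not supported
PDF: supported

Output (1 of 5 supported)
Text: supported
Image: not supported
Audio: not supported
Video: not supported
Embedding: not supported

Capabilities (1 of 13 supported)
Reasoning: not supported
Adaptive Reasoning: not supported
Function Calling: not supported
Parallel Function Calling: not supported
Structured Outputs: supported
Native JSON Schema: not supported
Web Search: not supported
URL Context: not supported
Computer Use: not supported
Code Execution: not supported
File Search: not supported
Prompt Caching: not supported
Assistant Prefill: not supported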

Pricing by Provider

Cost Calculator
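
The listed rates translate into a per-request cost with simple arithmetic. The following is a minimal sketch in Python, assuming the starting rates quoted on this page ($0.030 per 1M input tokens and $0.030 per 1M output tokens). The function name estimate_cost and the token counts in the example are hypothetical, chosen only for illustration; actual rates may differ by provider.

```python
# Rates taken from this page's starting prices, in USD per 1M tokens.
# Other providers may charge different rates.
INPUT_PRICE_PER_M = 0.030
OUTPUT_PRICE_PER_M = 0.030


def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimate the USD cost of one request (hypothetical helper)."""
    total = input_tokens * INPUT_PRICE_PER_M + output_tokens * OUTPUT_PRICE_PER_M
    return total / 1_000_000


if __name__ == "__main__":
    # Illustrative request: 6,000 input tokens and 2,000 output tokens.
    print(f"${estimate_cost(6000, 2000):.6f}")  # prints $0.000240
```

At these rates, one million input tokens plus one million output tokens would come to about $0.06.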

Versions

Version            Released  Context  Input / 1M  Output / 1M  Status
DeepSeek-OCR 2     -         -        -           -            Available
Qianfan OCR Fast   -         66K      -           -            Deprecated
DeepSeek-OCR       -         8K       $0.030      $0.030       Current
DeepSeek-OCR       -         -        -           -            Available
Document OCR       -         -        -           -            Available
Mistral OCR        -         -        $2.00       $3.00        Available
OCR                -         -        -           -            Available
Prebuilt Document  -         -        -           -            Available
Prebuilt Layout    -         -        -           -            Available
Prebuilt Read      -         -        -           -            Available

Model IDs