DeepSeek-OCR is DeepSeek's image-to-text model with an 8K context window and up to 8K output tokens. It is available from 4 providers, with pricing starting at $0.030 per 1M input tokens and $0.030 per 1M output tokens. It is a multimodal OCR model that compresses long document contexts via optical 2D mapping, combining a DeepEncoder with a compact MoE language model.
| deepseek-ocr |
| Image to Text |
| Active |
| 8K tokens |
| 8K tokens |
| Image, PDF, Text |
| Text |
| 3.34B |
| 3,218 |
| 2,082,348 |
| 22,155,059 |
Capabilities
Input: 3/5
Text ✓
Image ✓
Audio ·
Video ·
PDF ✓
Output: 1/5
Text ✓
Image ·
Audio ·
Video ·
Embedding ·
Capabilities: 1/13
Reasoning ·
Adaptive Reasoning ·
Function Calling ·
Parallel Function Calling ·
Structured Outputs ✓
Native JSON Schema ·
Web Search ·
URL Context ·
Computer Use ·
Code Execution ·
File Search ·
Prompt Caching ·
Assistant Prefill ·
Pricing by Provider
| Provider | Input $ / 1M (Standard) | Output $ / 1M (Standard) |
|---|---|---|
| Hugging Face | $0.030 | $0.030 |
| Novita | $0.030 | $0.030 |
| Google Gemini | $0.300 | $1.20 |
| Google Vertex AI | $0.300 | $1.20 |
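
The per-provider prices above can be turned into a rough cost estimate. The following is a minimal sketch, not an official calculator: the `PRICES` mapping and the `estimate_cost` helper are hypothetical names introduced here, the rates are copied from the standard-tier table, and the sketch assumes billing is strictly linear in token count, with no minimums, tiers, or discounts.

```python
from decimal import Decimal

# Standard-tier prices in USD per 1M tokens, copied from the table above.
# Each value is (input price, output price).
PRICES = {
    "Hugging Face": (Decimal("0.030"), Decimal("0.030")),
    "Novita": (Decimal("0.030"), Decimal("0.030")),
    "Google Gemini": (Decimal("0.300"), Decimal("1.20")),
    "Google Vertex AI": (Decimal("0.300"), Decimal("1.20")),
}

MILLION = Decimal(1_000_000)


def estimate_cost(provider: str, input_tokens: int, output_tokens: int) -> Decimal:
    """Return the estimated USD cost for one workload on one provider.

    Assumes purely linear per-token pricing (an assumption of this sketch).
    """
    input_price, output_price = PRICES[provider]
    total = input_price * input_tokens + output_price * output_tokens
    return total / MILLION


if __name__ == "__main__":
    # Compare all providers for the same hypothetical workload.
    for name in PRICES:
        print(name, estimate_cost(name, 2_000_000, 500_000))
```

`Decimal` is used instead of floats so that small per-token rates add up without binary rounding noise.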
Versions
| Version | Released | Context | Input / 1M | Output / 1M | Status |
|---|---|---|---|---|---|
| DeepSeek-OCR 2 | — | — | — | — | Available |
| Qianfan OCR Fast | — | 66K | — | — | Deprecated |
| DeepSeek-OCR | — | 8K | $0.030 | $0.030 | Current |
| DeepSeek-OCR | — | — | — | — | Available |
| Document OCR | — | — | — | — | Available |
| Mistral OCR | — | — | $2.00 | $3.00 | Available |
| OCR | — | — | — | — | Available |
| Prebuilt Document | — | — | — | — | Available |
| Prebuilt Layout | — | — | — | — | Available |
| Prebuilt Read | — | — | — | — | Available |