DeepSeek-OCR is DeepSeek's image-to-text model with an 8K context window and up to 8K output tokens, available from 4 providers, starting at $0.03 / 1M input tokens and $0.03 / 1M output tokens. It is a multimodal OCR model that compresses long document contexts via optical 2D mapping, combining a DeepEncoder with a compact MoE language model.
| Specifications | |
|---|---|
| deepseek-ocr | |
| Image to Text | |
| Active | |
| 8K tokens | |
| 8K tokens | |
| Image, PDF, Text | |
| Text | |
| 3.34B | |
| 3,218 | |
| 2,082,348 | |
| 22,155,059 | |
Capabilities
Input 3/5
Text ✓
Image ✓
Audio ·
Video ·
PDF ✓
Output 1/5
Text ✓
Image ·
Audio ·
Video ·
Embedding ·
Capabilities 1/13
Reasoning ·
Adaptive Reasoning ·
Function Calling ·
Parallel Function Calling ·
Structured Outputs ✓
Native JSON Schema ·
Web Search ·
URL Context ·
Computer Use ·
Code Execution ·
File Search ·
Prompt Caching ·
Assistant Prefill ·
Pricing by Provider
US Dollar ($)
Per 1M tokens
| Provider | Standard Input $ / 1M | Standard Output $ / 1M | Standard Page In $ / page |
|---|---|---|---|
| | $0.30 | $1.20 | N/A |
| | $0.30 | $1.20 | $0.000300 |
| | $0.03 | $0.03 | N/A |
| | $0.03 | $0.03 | N/A |
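
The per-token rates above translate into a request cost with simple arithmetic. The following is a minimal Python sketch of that calculation, not any provider's billing logic. The function name `estimate_cost` and its parameters are hypothetical, the prices are copied from the table, and adding the per-page fee on top of the token charges is an assumption that may not match how a given provider bills.

```python
def estimate_cost(
    input_tokens: int,
    output_tokens: int,
    input_price_per_m: float,
    output_price_per_m: float,
    pages: int = 0,
    page_price: float = 0.0,
) -> float:
    """Estimate the USD cost of a workload from per-1M-token prices.

    Hypothetical helper for illustration only. The optional per-page fee
    is simply added to the token cost, which is an assumption.
    """
    token_cost = (
        input_tokens / 1_000_000 * input_price_per_m
        + output_tokens / 1_000_000 * output_price_per_m
    )
    return token_cost + pages * page_price


# Prices taken from the pricing table (USD per 1M tokens).
CHEAPEST = {"input_price_per_m": 0.03, "output_price_per_m": 0.03}
HIGHEST = {"input_price_per_m": 0.30, "output_price_per_m": 1.20}

# Example workload: 2M input tokens and 500K output tokens.
low = estimate_cost(2_000_000, 500_000, **CHEAPEST)
high = estimate_cost(2_000_000, 500_000, **HIGHEST)
print(f"lowest-priced rate:  ${low:.4f}")
print(f"highest-priced rate: ${high:.4f}")
```

For the same workload, the spread between the lowest and highest listed rates is driven mostly by the output price, which differs by a factor of 40 ($0.03 versus $1.20 per 1M tokens).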