Rolm OCR


Rolm OCR is Rolm's image-to-text model with a 128K-token context window, priced from $0.200 per 1M input tokens and $0.200 per 1M output tokens. Developed by Reducto AI, it is an open-source document OCR model built on Qwen2.5-VL-7B-Instruct, offering faster performance and reduced memory usage as a drop-in alternative to olmOCR.
Spec
Canonical ID: rolm-ocr
Type: Image to Text
Status: Active
Creator: Rolm
Providers: Fireworks AI
Context Window: 128K tokens
Input Modalities: Text
Output Modalities: Text

Capabilities

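Input (1/5 supported)
Text: supported
Image: not supported
Audio: not supported
Video: not supported
PDF: not supported

Output (1/5 supported)
Text: supported
Image: not supported
Audio: not supported
Video: not supported
Embedding: not supported

Capabilities (0/13 supported)
Reasoning: not supported
Adaptive Reasoning: not supported
Function Calling: not supported
Parallel Function Calling: not supported
Structured Outputs: not supported
Native JSON Schema: not supported
Web Search: not supported
URL Context: not supported
Computer Use: not supported
Code Execution: not supported
File Search: not supported
Prompt Caching: not supported
Assistant Prefill: not supported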

Pricing by Provider

Provider      Model ID                                         Tier      Input ($ / 1M)  Output ($ / 1M)
Fireworks AI  fireworks_ai/accounts/fireworks/models/rolm-ocr  Standard  $0.200          $0.200
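
The per-token pricing above can be turned into a per-request estimate with simple arithmetic. The following is a minimal sketch, assuming the Fireworks AI Standard prices listed in the table; the function name `estimate_cost` and the example token counts are hypothetical and chosen only for illustration.

```python
# Prices copied from the provider table (USD per 1M tokens, Standard tier).
INPUT_PRICE_PER_M = 0.200
OUTPUT_PRICE_PER_M = 0.200


def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the estimated USD cost of a request (hypothetical helper)."""
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    total = input_tokens * INPUT_PRICE_PER_M + output_tokens * OUTPUT_PRICE_PER_M
    return total / 1_000_000


if __name__ == "__main__":
    # Assumed workload: one page costing 3,000 input and 1,000 output tokens.
    print(f"${estimate_cost(3_000, 1_000):.6f}")  # prints "$0.000800"
```

Because input and output are priced identically for this model, the estimate depends only on the total token count.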

Versions

Version                  Released  Context  Input / 1M  Output / 1M  Status
Qwen2.5 VL 72B Instruct  -         131K     $0.130      $0.400       Available
Rolm OCR                 -         128K     $0.200      $0.200       Current
Qwen2.5 VL 32B Instruct  -         131K     $0.200      $0.600       Available
Qwen2.5 VL 3B Instruct   -         131K     $0.200      $0.200       Available
Qwen2.5 VL 7B Instruct   -         131K     $0.200      $0.200       Available
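
Because the listed versions differ in their input and output prices, which one is cheapest depends on the input-to-output ratio of the workload. The sketch below ranks the versions for a given token mix; the prices are copied from the table above, while the `rank_by_cost` helper and the sample workloads are assumptions made for illustration.

```python
# (input, output) prices in USD per 1M tokens, copied from the Versions table.
VERSIONS = {
    "Qwen2.5 VL 72B Instruct": (0.130, 0.400),
    "Rolm OCR": (0.200, 0.200),
    "Qwen2.5 VL 32B Instruct": (0.200, 0.600),
    "Qwen2.5 VL 3B Instruct": (0.200, 0.200),
    "Qwen2.5 VL 7B Instruct": (0.200, 0.200),
}


def rank_by_cost(input_tokens: int, output_tokens: int) -> list[tuple[str, float]]:
    """Return (version, USD cost) pairs sorted from cheapest to most expensive."""
    costs = [
        (name, (input_tokens * p_in + output_tokens * p_out) / 1_000_000)
        for name, (p_in, p_out) in VERSIONS.items()
    ]
    # sorted() is stable, so versions with equal cost keep their table order.
    return sorted(costs, key=lambda pair: pair[1])


if __name__ == "__main__":
    # Assumed input-heavy workload: 10M input tokens, 1M output tokens.
    for name, cost in rank_by_cost(10_000_000, 1_000_000):
        print(f"{name}: ${cost:.2f}")
```

At these prices, Qwen2.5 VL 72B Instruct is the cheapest option only when the workload is strongly input-heavy (roughly more than 2.9 input tokens per output token); for balanced or output-heavy workloads, Rolm OCR and the other $0.200 / $0.200 versions cost less.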

Model IDs