Rolm OCR is an image-to-text model with a 128K-token context window, priced from $0.20 per 1M input tokens and $0.20 per 1M output tokens. It is an open-source document OCR model built by Reducto AI on Qwen2.5-VL-7B-Instruct. It is offered as a drop-in alternative to olmOCR, with faster performance and lower memory usage.
Specifications
Canonical ID: rolm-ocr
Type: Image to Text
Status: Active
Creator: Rolm
Providers: Fireworks AI
Context Window: 128K tokens
Input Modalities: Text
Output Modalities: Text

Capabilities

Input (1/5 supported): Text
Not supported: Image, Audio, Video, PDF

Output (1/5 supported): Text
Not supported: Image, Audio, Video, Embedding

Other capabilities (0/13 supported): none
Not supported: Reasoning, Adaptive Reasoning, Function Calling, Parallel Function Calling, Structured Outputs, Native JSON Schema, Web Search, URL Context, Computer Use, Code Execution, File Search, Prompt Caching, Assistant Prefill

Pricing by Provider

Standard prices in US dollars ($) per 1M tokens.

Fireworks AI (fireworks_ai/accounts/fireworks/models/rolm-ocr): Input $0.20 / 1M, Output $0.20 / 1M
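Cost at these rates is linear in token count. You can estimate spend by multiplying each token count by its per-million price and dividing by 1,000,000. The sketch below hard-codes the Fireworks AI standard prices listed above. The function name `estimate_cost` and the example workload (1,000 pages at roughly 1,500 input and 700 output tokens per page) are hypothetical, chosen only for illustration. Real per-page token counts depend on the documents and the request format.

```python
"""Estimate Rolm OCR cost from the listed Fireworks AI standard prices (illustrative sketch)."""

# Prices from the pricing listing, in US dollars per 1M tokens.
INPUT_PRICE_PER_M = 0.20
OUTPUT_PRICE_PER_M = 0.20


def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the estimated cost in US dollars (hypothetical helper)."""
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    # Per-million pricing: tokens * price / 1,000,000 for each direction.
    return (input_tokens * INPUT_PRICE_PER_M + output_tokens * OUTPUT_PRICE_PER_M) / 1_000_000


if __name__ == "__main__":
    # Hypothetical workload: 1,000 pages, ~1,500 input and ~700 output tokens per page.
    pages = 1_000
    cost = estimate_cost(pages * 1_500, pages * 700)
    print(f"${cost:.2f}")
```

With identical input and output prices, only the total token count matters. The two prices are kept as separate constants so the helper still works if a provider prices the two directions differently, as several of the Qwen2.5 VL versions do.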


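Versions

Voyage Multimodal 3.5: Available (no context or pricing listed)
Qwen2.5 VL 72B Instruct: 131K context, $0.130 input / $0.400 output per 1M tokens, Available
Rolm OCR: 128K context, $0.200 input / $0.200 output per 1M tokens, Current
Qwen2.5 VL 32B Instruct: 131K context, $0.200 input / $0.600 output per 1M tokens, Available
Qwen2.5 VL 3B Instruct: 131K context, $0.200 input / $0.200 output per 1M tokens, Available
Qwen2.5 VL 7B Instruct: 131K context, $0.200 input / $0.200 output per 1M tokens, Available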

Model IDs

accounts/fireworks/models/rolm-ocr
fireworks_ai/accounts/fireworks/models/rolm-ocr
rolm-ocr
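The three IDs above refer to the same model in different naming forms:

- `accounts/fireworks/models/rolm-ocr` is the Fireworks AI model path.
- `fireworks_ai/accounts/fireworks/models/rolm-ocr` adds the `fireworks_ai/` provider prefix used by LiteLLM-style routing clients.
- `rolm-ocr` is the catalog's canonical short name.

The sketch below converts between these forms. The helper names `to_fireworks_id` and `to_canonical_id` are hypothetical. It assumes every Fireworks path for this model sits under `accounts/fireworks/models/`, which holds for the IDs listed here but is not guaranteed for other accounts.

```python
"""Convert between the listed Rolm OCR model ID forms (illustrative sketch)."""

LITELLM_PREFIX = "fireworks_ai/"           # assumed provider-routing prefix
FIREWORKS_PATH = "accounts/fireworks/models/"  # Fireworks model path prefix from the listing


def to_fireworks_id(model_id: str) -> str:
    """Return the Fireworks path form, e.g. accounts/fireworks/models/rolm-ocr (hypothetical helper)."""
    # Drop the routing prefix if present.
    if model_id.startswith(LITELLM_PREFIX):
        model_id = model_id[len(LITELLM_PREFIX):]
    # Already a full Fireworks path.
    if model_id.startswith(FIREWORKS_PATH):
        return model_id
    # Any other slash means a form this sketch does not understand.
    if "/" in model_id:
        raise ValueError(f"unrecognized model ID: {model_id!r}")
    # Bare canonical name: prepend the Fireworks path.
    return FIREWORKS_PATH + model_id


def to_canonical_id(model_id: str) -> str:
    """Return the short canonical name, e.g. rolm-ocr (hypothetical helper)."""
    return to_fireworks_id(model_id)[len(FIREWORKS_PATH):]


if __name__ == "__main__":
    for mid in (
        "accounts/fireworks/models/rolm-ocr",
        "fireworks_ai/accounts/fireworks/models/rolm-ocr",
        "rolm-ocr",
    ):
        print(mid, "->", to_fireworks_id(mid), "/", to_canonical_id(mid))
```

Normalizing to one internal form before lookups avoids treating the three spellings as different models, for example when keying a price table or usage log.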