RolmOCR


RolmOCR is IBM's image-to-text model with a 128K-token context window, priced from $0.200 per 1M input tokens and $0.200 per 1M output tokens. It is an open-source document OCR model built on Qwen2.5-VL-7B-Instruct, offering fast, memory-efficient document text extraction as an alternative to olmOCR.
Spec
Canonical ID: ibm-rolm-ocr
Type: Image to Text
Status: Active
Creator: IBM
Providers: Fireworks AI
Context Window: 128K tokens
Input Modalities: Text
Output Modalities: Text

Capabilities

Input (1/5 supported): Text. Not supported: Image, Audio, Video, PDF.
Output (1/5 supported): Text. Not supported: Image, Audio, Video, Embedding.
Capabilities (0/13 supported): none. Not supported: Reasoning, Adaptive Reasoning, Function Calling, Parallel Function Calling, Structured Outputs, Native JSON Schema, Web Search, URL Context, Computer Use, Code Execution, File Search, Prompt Caching, Assistant Prefill.

Pricing by Provider

Provider | Provider model ID | Input, standard ($ / 1M tokens) | Output, standard ($ / 1M tokens)
Fireworks AI | accounts/fireworks/models/rolm-ocr | $0.200 | $0.200
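As a rough illustration of how the per-token rates translate into a bill, the sketch below estimates the cost of an OCR job from its token counts. It assumes billing is linear in tokens at the Fireworks AI standard rates listed for RolmOCR ($0.200 per 1M input tokens and $0.200 per 1M output tokens); the function name and the per-page token counts in the example are hypothetical, and real counts depend on the provider's tokenization of each page.

```python
# Minimal cost sketch, ASSUMING linear per-token billing at the standard
# Fireworks AI rates listed for RolmOCR ($0.200 per 1M tokens, input and output).
INPUT_USD_PER_M = 0.200
OUTPUT_USD_PER_M = 0.200


def estimate_cost_usd(input_tokens: int, output_tokens: int,
                      input_rate: float = INPUT_USD_PER_M,
                      output_rate: float = OUTPUT_USD_PER_M) -> float:
    """Return the estimated cost in USD; rates are USD per 1M tokens (hypothetical helper)."""
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    return (input_tokens * input_rate + output_tokens * output_rate) / 1_000_000


if __name__ == "__main__":
    # Hypothetical workload: 1,000 pages at ~1,500 input and ~700 output tokens each.
    pages = 1_000
    cost = estimate_cost_usd(pages * 1_500, pages * 700)
    print(f"${cost:.2f}")  # prints $0.44
```

Because input and output are priced identically for RolmOCR, only the total token count matters here; the separate rate parameters let the same helper be reused for the other listed versions, whose input and output prices differ.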

Versions

Version | Context | Input ($ / 1M tokens) | Output ($ / 1M tokens) | Status
Qwen2.5 VL 32B Instruct | 131K | $0.200 | $0.600 | Available
Qwen2.5 VL 72B Instruct | 131K | $0.130 | $0.400 | Available
RolmOCR | 128K | $0.200 | $0.200 | Current
Qwen2.5 VL 3B Instruct | 131K | $0.200 | $0.200 | Available
Qwen2.5 VL 7B Instruct | 131K | $0.200 | $0.200 | Available

Model IDs