PaddleOCR VL


PaddleOCR VL is PaddlePaddle (Baidu)'s image-to-text model, with a 16K-token context window and up to 16K output tokens, starting at $0.020 per 1M input tokens and $0.020 per 1M output tokens. It is a vision-language OCR model that combines visual and textual understanding for document recognition and text extraction.
Spec
Canonical ID: paddlepaddle-paddleocr-vl
Type: Image to Text
Status: Active
Creator: PaddlePaddle (Baidu)
Providers: Novita
Context Window: 16K tokens
Max Output: 16K tokens
Input Modalities: Image
Output Modalities: Text
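The spec lists a 16K-token context window and up to 16K output tokens. The sketch below shows one way to size the output budget for a request under two assumptions: that "16K" means 16,384 tokens (the page does not say whether it means 16,000), and that prompt and output tokens share the context window, a common convention this page does not confirm for PaddleOCR VL. `max_output_budget` is a hypothetical helper, not part of any provider SDK.

```python
# Assumption: "16K" read as 16,384 tokens; the listing does not specify.
CONTEXT_WINDOW_TOKENS = 16_384
MAX_OUTPUT_TOKENS = 16_384


def max_output_budget(prompt_tokens: int) -> int:
    """Return the largest output allowance that fits the listed limits.

    Hypothetical helper. Assumes prompt (image) tokens and output tokens
    share the context window, which this page does not confirm.
    """
    if prompt_tokens < 0:
        raise ValueError("prompt_tokens must be non-negative")
    remaining = CONTEXT_WINDOW_TOKENS - prompt_tokens
    # Cap at the max-output limit; never return a negative budget.
    return max(0, min(MAX_OUTPUT_TOKENS, remaining))


print(max_output_budget(2_000))  # 14384
```

If the prompt alone exceeds the window, the helper returns 0, signalling that the request would need to be split (for example, one page image per call).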

Capabilities

Input modalities (1 of 5 supported): Image. Not supported: Text, Audio, Video, PDF.
Output modalities (1 of 5 supported): Text. Not supported: Image, Audio, Video, Embedding.
Features (0 of 13 supported): Reasoning, Adaptive Reasoning, Function Calling, Parallel Function Calling, Structured Outputs, Native JSON Schema, Web Search, URL Context, Computer Use, Code Execution, File Search, Prompt Caching, and Assistant Prefill are all not supported.

Pricing by Provider

Provider | Model ID | Standard input ($ / 1M tokens) | Standard output ($ / 1M tokens)
Novita | paddlepaddle/paddleocr-vl | $0.020 | $0.020
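At Novita's listed standard rate of $0.020 per 1M tokens for both input and output, cost scales linearly with token counts. A minimal sketch of that arithmetic follows. `estimate_cost_usd` is a hypothetical helper, and the per-page token counts in the usage line are illustrative placeholders: how many tokens a page image actually consumes depends on the provider's tokenization.

```python
from decimal import Decimal

# Novita's listed standard rates for paddlepaddle/paddleocr-vl (USD per 1M tokens).
INPUT_USD_PER_M = Decimal("0.020")
OUTPUT_USD_PER_M = Decimal("0.020")
TOKENS_PER_MILLION = Decimal(1_000_000)


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> Decimal:
    """Estimate standard-tier cost in USD (hypothetical helper)."""
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    input_cost = Decimal(input_tokens) * INPUT_USD_PER_M / TOKENS_PER_MILLION
    output_cost = Decimal(output_tokens) * OUTPUT_USD_PER_M / TOKENS_PER_MILLION
    return input_cost + output_cost


# Illustrative workload: 1,000 pages, assuming ~1,500 input and ~800 output tokens per page.
print(estimate_cost_usd(1_000 * 1_500, 1_000 * 800))  # 0.046
```

Under those assumed counts, 1.5M input tokens cost $0.030 and 0.8M output tokens cost $0.016, for $0.046 in total. `Decimal` is used instead of `float` so small currency amounts are not distorted by binary rounding.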

Model IDs