UI-TARS 1.5 7B is ByteDance's language model with a 131K context window and up to 2K output tokens, starting at $0.100 / 1M input and $0.200 / 1M output. A 7B multimodal vision-language agent optimized for GUI-based environments including desktop, web, mobile, and game interfaces.
Specifications
Canonical IDbytedance-ui-tars-1-5-7b
TypeLanguage
StatusActive
CreatorByteDanceByteDance
Providers
Context Window131K tokens
Max Output2K tokens
Input ModalitiesImageText
Output ModalitiesText
Parameters7B
Release Date · 10 months ago
Knowledge Cutoff

Capabilities

Input2/5
Text
Image
Audio·
Video·
PDF·
Output1/5
Text
Image·
Audio·
Video·
Embedding·
Capabilities0/13
Reasoning·
Adaptive Reasoning·
Function Calling·
Parallel Function Calling·
Structured Outputs·
Native JSON Schema·
Web Search·
URL Context·
Computer Use·
Code Execution·
File Search·
Prompt Caching·
Assistant Prefill·

Pricing by Provider

ProviderStandard
Input
$ / 1M
Output
$ / 1M
Cache Read
$ / 1M
OpenRouter logo
OpenRouter
bytedance/ui-tars-1.5-7b
$0.100$0.200$0.100

Cost Calculator

Preset:

Model IDs