
UI-TARS 7B


UI-TARS 7B is ByteDance's language model with a 131K-token context window and up to 2K output tokens, priced from $0.100 per 1M input tokens and $0.200 per 1M output tokens. It is a 7-billion-parameter multimodal vision-language agent optimized for GUI automation across desktop, web, mobile, and gaming environments.
Spec
Canonical ID: bytedance-ui-tars-1-5-7b
Type: Language
Status: Active
Creator: ByteDance
Providers: OpenRouter
Context Window: 131K tokens
Max Output: 2K tokens
Input Modalities: Image, Text
Output Modalities: Text
Parameters: 7B
Release Date: 9 months ago
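
The context window and output cap above can be turned into a simple pre-flight check before sending a request. The sketch below is illustrative only: it rounds 131K and 2K down to 131,000 and 2,000 tokens (the exact limits are not stated here), and it assumes the context window covers the prompt and the generated output together. The function name `fits_budget` is hypothetical.

```python
# Limits taken from the spec list, rounded down conservatively (assumption:
# the exact token counts behind "131K" and "2K" are not given).
CONTEXT_WINDOW_TOKENS = 131_000
MAX_OUTPUT_TOKENS = 2_000


def fits_budget(prompt_tokens: int, requested_output_tokens: int) -> bool:
    """Return True if a request stays within the listed limits.

    Assumes the context window is shared by prompt and output tokens.
    """
    if prompt_tokens < 0 or requested_output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    if requested_output_tokens > MAX_OUTPUT_TOKENS:
        return False
    return prompt_tokens + requested_output_tokens <= CONTEXT_WINDOW_TOKENS
```

For instance, a 100,000-token prompt with a 2,000-token reply passes this check, while a 130,000-token prompt with the same reply does not.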

Capabilities

Input (2/5)
Supported: Text, Image
Not supported: Audio, Video, PDF

Output (1/5)
Supported: Text
Not supported: Image, Audio, Video, Embedding

Capabilities (0/13)
Not supported: Reasoning, Adaptive Reasoning, Function Calling, Parallel Function Calling, Structured Outputs, Native JSON Schema, Web Search, URL Context, Computer Use, Code Execution, File Search, Prompt Caching, Assistant Prefill

Pricing by Provider

Provider: OpenRouter (Standard tier)
Model ID: bytedance/ui-tars-1.5-7b
Input: $0.100 / 1M tokens
Output: $0.200 / 1M tokens
Cache Read: $0.100 / 1M tokens
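
The per-million rates above translate into a per-request cost with simple arithmetic. The following is a minimal sketch using the listed OpenRouter rates. The function name `estimate_cost` is hypothetical, and it assumes that cached input tokens are billed at the cache-read rate instead of the regular input rate, which the listing does not spell out.

```python
from decimal import Decimal

# Rates from the pricing listing, in USD per 1M tokens.
INPUT_PER_MILLION = Decimal("0.100")
OUTPUT_PER_MILLION = Decimal("0.200")
CACHE_READ_PER_MILLION = Decimal("0.100")
MILLION = Decimal(1_000_000)


def estimate_cost(input_tokens: int, output_tokens: int,
                  cached_input_tokens: int = 0) -> Decimal:
    """Estimate the USD cost of one request.

    Assumption: cached_input_tokens is the part of input_tokens served
    from cache and billed at the cache-read rate.
    """
    if min(input_tokens, output_tokens, cached_input_tokens) < 0:
        raise ValueError("token counts must be non-negative")
    if cached_input_tokens > input_tokens:
        raise ValueError("cached tokens cannot exceed input tokens")
    fresh_input_tokens = input_tokens - cached_input_tokens
    total = (fresh_input_tokens * INPUT_PER_MILLION
             + cached_input_tokens * CACHE_READ_PER_MILLION
             + output_tokens * OUTPUT_PER_MILLION)
    return total / MILLION
```

Under these assumptions, a request with 10,000 input tokens and 500 output tokens costs about $0.0011. Because the cache-read rate equals the input rate in this listing, caching would not change the estimate.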

Model IDs