Tongyi Flash Embedding Vision is Alibaba's fast-tier Tongyi multimodal embedding model, supporting vision and text for cross-modal retrieval. Pricing starts at $0.03 per 1M input tokens.
Specifications
Canonical ID: alibaba-tongyi-flash-embedding-vision
Type: Embedding
Status: Active
Creator: Alibaba
Providers: Alibaba Qwen
Input Modalities: Text
Output Modalities: Embedding

Capabilities

Input (1/5 supported): Text
Input not supported: Image, Audio, Video, PDF
Output (1/5 supported): Embedding
Output not supported: Text, Image, Audio, Video
Capabilities (0/13 supported): none
Capabilities not supported: Reasoning, Adaptive Reasoning, Function Calling, Parallel Function Calling, Structured Outputs, Native JSON Schema, Web Search, URL Context, Computer Use, Code Execution, File Search, Prompt Caching, Assistant Prefill

Pricing by Provider

Prices in US Dollar ($), per 1M tokens

Provider | Provider model ID | Standard input ($ / 1M) | Batch input ($ / 1M)
Alibaba Qwen | tongyi-embedding-vision-flash | $0.03 | $0.015
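
As a rough illustration of the per-token pricing above, the following minimal sketch estimates the cost of an embedding job. The function name `embedding_cost` and the 50M-token workload are hypothetical, chosen only for this example. The two rates are the standard and batch input prices listed for Alibaba Qwen, and actual billing may differ (for instance, in how image inputs are counted).

```python
# Rates copied from the pricing listed above (USD per 1M input tokens).
STANDARD_PER_M = 0.03
BATCH_PER_M = 0.015


def embedding_cost(input_tokens: int, batch: bool = False) -> float:
    """Estimate the USD cost of embedding `input_tokens` tokens.

    Hypothetical helper for illustration; not part of any provider SDK.
    """
    if input_tokens < 0:
        raise ValueError("input_tokens must be non-negative")
    rate = BATCH_PER_M if batch else STANDARD_PER_M
    return input_tokens / 1_000_000 * rate


# Example workload (assumed): 50 million input tokens.
standard = embedding_cost(50_000_000)
batched = embedding_cost(50_000_000, batch=True)
print(f"standard: ${standard:.2f}, batch: ${batched:.2f}")
```

Under these rates, batch pricing is half the standard price, so the same workload costs half as much when it can be submitted as a batch.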

Other Models

Model | Tier | Released | Context | Input / 1M | Output / 1M
Tongyi Intent Detect 3 | - | - | 8K | $0.058 | $0.144
Tongyi DeepResearch 30B A3B | - | - | 131K | - | -
Tongyi Plus Embedding Vision | Plus | - | - | $0.090 | -

Model IDs

alibaba-tongyi-flash-embedding-vision
tongyi-embedding-vision-flash