Qwen3.7 Plus VL Instruct is Alibaba's language model. A Qwen3 mixture-of-experts vision-language model designed for multimodal instruction-following, combining image understanding with strong language capabilities.
Specifications
Canonical IDalibaba-qwen3-7-plus-vl-instruct
TypeLanguage
StatusActive
CreatorAlibabaAlibaba
Input ModalitiesText
Output ModalitiesText

Capabilities

Input1/5
Text
Image·
Audio·
Video·
PDF·
Output1/5
Text
Image·
Audio·
Video·
Embedding·
Capabilities0/13
Reasoning·
Adaptive Reasoning·
Function Calling·
Parallel Function Calling·
Structured Outputs·
Native JSON Schema·
Web Search·
URL Context·
Computer Use·
Code Execution·
File Search·
Prompt Caching·
Assistant Prefill·

Other Models

ModelTierReleasedContextInput / 1MOutput / 1M
Qwen3.5 122B A10B262K$0.260$2.08
Qwen3.5 35B A3B262K$0.140$1.00
Qwen3.5 397B A17B262K$0.390$2.34

Model IDs

accounts/fireworks/models/qwen3p7-plus-vl-instruct
alibaba-qwen3-7-plus-vl-instruct