
Aya Vision


Aya Vision is Cohere's 32B multimodal language model, with a 16K-token context window and up to 4K output tokens. It accepts image and text input across 23 languages and excels at image understanding, language, and cross-modal benchmarks.
Spec
Canonical ID: cohere-aya-vision-32b
Type: Language
Status: Active
Creator: Cohere
Context Window: 16K tokens
Max Output: 4K tokens
Input Modalities: Image, Text
Output Modalities: Text
Parameters: 32B

Capabilities

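Input: 2/5
Text: supported
Image: supported
Audio: not supported
Video: not supported
PDF: not supported
Output: 1/5
Text: supported
Image: not supported
Audio: not supported
Video: not supported
Embedding: not supported
Capabilities: 0/13
Reasoning: not supported
Adaptive Reasoning: not supported
Function Calling: not supported
Parallel Function Calling: not supported
Structured Outputs: not supported
Native JSON Schema: not supported
Web Search: not supported
URL Context: not supported
Computer Use: not supported
Code Execution: not supported
File Search: not supported
Prompt Caching: not supported
Assistant Prefill: not supported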

Versions

Version      Released  Context  Input / 1M  Output / 1M  Status
Aya Vision   -         16K      -           -            Current
Aya 101      -         -        -           -            Available
Aya 101      -         -        -           -            Available
Aya Expanse  -         128K     $0.500      $1.50        Available
Aya Expanse  -         8K       $0.500      $1.50        Available
Aya Vision   -         16K      -           -            Available

Model IDs