Aya Vision 32B is a 32B-parameter multimodal vision-language model from Cohere with a 16K-token context window and up to 4K output tokens. It supports 23 languages and excels at image understanding, text, and language benchmarks.
Specifications
Canonical ID: cohere-aya-vision-32b
Type: Language
Status: Active
Creator: Cohere
Context Window: 16K tokens
Max Output: 4K tokens
Input Modalities: Image, Text
Output Modalities: Text
Parameters: 32B
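
The context and output limits above imply a simple token budget. The sketch below is illustrative only: it assumes "16K" and "4K" mean 16 × 1024 and 4 × 1024 tokens, and that output tokens count against the same context window, neither of which is stated on this page. The function names are hypothetical, and the token cost of images is not modeled.

```python
# Assumption: "16K" / "4K" are binary thousands, and reserved output
# tokens are subtracted from the shared context window.
CONTEXT_WINDOW = 16 * 1024
MAX_OUTPUT = 4 * 1024


def max_input_tokens(reserved_output: int = MAX_OUTPUT) -> int:
    """Return the input tokens left after reserving room for the reply."""
    if not 0 <= reserved_output <= MAX_OUTPUT:
        raise ValueError(f"reserved_output must be between 0 and {MAX_OUTPUT}")
    return CONTEXT_WINDOW - reserved_output


def fits(prompt_tokens: int, reserved_output: int = MAX_OUTPUT) -> bool:
    """Check whether a prompt of the given size fits within the budget."""
    return prompt_tokens <= max_input_tokens(reserved_output)
```

Reserving less than the full 4K for output leaves more room for the prompt, which may matter when an image already takes up part of the window.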

Capabilities

Input (2/5 supported)
Supported: Text, Image
Not supported: Audio, Video, PDF

Output (1/5 supported)
Supported: Text
Not supported: Image, Audio, Video, Embedding

Capabilities (0/13 supported)
Not supported: Reasoning, Adaptive Reasoning, Function Calling, Parallel Function Calling, Structured Outputs, Native JSON Schema, Web Search, URL Context, Computer Use, Code Execution, File Search, Prompt Caching, Assistant Prefill

Versions

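Version | Released | Context | Input / 1M | Output / 1M | Status
Aya Vision 32B | - | 16K | - | - | Current
Aya Tiny Global | - | 8K | - | - | Available
Aya 101 | - | - | - | - | Available
Aya Expanse 32B | - | 128K | $0.500 | $1.50 | Available
Aya Expanse 8B | - | 8K | $0.500 | $1.50 | Available
Aya Vision 8B | - | 16K | - | - | Available
Tiny Aya Earth | - | 8K | - | - | Available
Tiny Aya Fire | - | 8K | - | - | Available
Tiny Aya Water | - | 8K | - | - | Available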
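
For the versions that list prices, a per-request cost can be estimated from the per-1M-token rates. The sketch below uses the Aya Expanse figures from the table ($0.500 input, $1.50 output per 1M tokens); no price is listed for Aya Vision 32B, so it does not apply to that model. The function name `estimate_cost` is hypothetical, and actual billing rules (rounding, minimums) are not covered here.

```python
from decimal import Decimal

# Listed rates for Aya Expanse 32B / 8B, in USD per 1M tokens.
INPUT_PER_MILLION = Decimal("0.500")
OUTPUT_PER_MILLION = Decimal("1.50")


def estimate_cost(input_tokens: int, output_tokens: int) -> Decimal:
    """Estimate the USD cost of one request from its token counts."""
    million = Decimal(1_000_000)
    total = INPUT_PER_MILLION * input_tokens + OUTPUT_PER_MILLION * output_tokens
    return total / million
```

`Decimal` is used instead of floats so that small per-request amounts add up without binary rounding error.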

Model IDs