Aya Vision is Cohere's 32B multimodal model with a 16K context window and up to 4K output tokens. It supports vision and text across 23 languages and performs strongly on image understanding, language, and cross-modal benchmarks.
Capabilities
Input: 2/5
Output: 1/5
Capabilities: 0/13
Versions
| Version | Released | Context | Input (USD / 1M tokens) | Output (USD / 1M tokens) | Status |
|---|---|---|---|---|---|
| Aya Vision | — | 16K | — | — | Current |
| Aya 101 | — | — | — | — | Available |
| Aya Expanse | — | 128K | $0.500 | $1.50 | Available |
| Aya Expanse | — | 8K | $0.500 | $1.50 | Available |
| Aya Vision | — | 16K | — | — | Available |
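The per-million-token prices in the table translate into a per-request cost by simple proportion. The sketch below is only an illustration: `estimate_cost_usd` is a hypothetical helper, not part of any Cohere SDK, and it assumes billing is linear in token count, using the Aya Expanse prices listed above. Aya Vision has no listed price, so it is not covered.

```python
from decimal import Decimal

# Prices copied from the Aya Expanse rows of the table (USD per 1M tokens).
INPUT_PRICE_PER_M = Decimal("0.500")
OUTPUT_PRICE_PER_M = Decimal("1.50")
TOKENS_PER_M = Decimal(1_000_000)


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> Decimal:
    """Hypothetical helper: linear cost estimate from token counts.

    Assumes cost scales linearly with tokens; actual billing may differ.
    """
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    total = (
        Decimal(input_tokens) * INPUT_PRICE_PER_M
        + Decimal(output_tokens) * OUTPUT_PRICE_PER_M
    )
    return total / TOKENS_PER_M


if __name__ == "__main__":
    # Example request: 8,000 input tokens and 1,000 output tokens.
    print(estimate_cost_usd(8_000, 1_000))
```

`Decimal` is used instead of `float` so that small per-request amounts do not pick up binary rounding error when summed over many requests.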