DiTo (Distillation of Text-to-Object) is Google's image-to-text model for grounded object detection and vision-language understanding tasks.
| google-dito | Image to Text | Active | Image | Text |
Capabilities
Input: 1/5
Output: 1/5
Capabilities: 0/13