OWL-ViT Base Patch32 is Google's image-to-text model: a base-scale open-vocabulary object detection model that uses a ViT-B/32 backbone for text-conditioned zero-shot detection.
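To make "text-conditioned zero-shot detection" concrete, here is a minimal sketch of the matching step only, written with the Python standard library. It assumes each image patch and each text query has already been mapped to an embedding vector, and it scores patches against queries by cosine similarity. The function names (`cosine`, `detect`), the 3-dimensional toy vectors, and the 0.5 threshold are all hypothetical and chosen for illustration; this is not the model's actual code, and it leaves out the learned encoders and the bounding-box prediction entirely.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def detect(patch_embeddings, query_embeddings, threshold=0.5):
    """Return (patch_index, label, score) for each patch whose best-matching
    text query scores at or above the threshold."""
    detections = []
    for i, patch in enumerate(patch_embeddings):
        # Pick the text query most similar to this patch.
        label, score = max(
            ((name, cosine(patch, vec)) for name, vec in query_embeddings.items()),
            key=lambda pair: pair[1],
        )
        if score >= threshold:
            detections.append((i, label, score))
    return detections


# Toy embeddings (assumed values; a real model would produce these with its encoders).
queries = {"cat": [1.0, 0.0, 0.0], "dog": [0.0, 1.0, 0.0]}
patches = [[0.9, 0.1, 0.0], [0.0, 0.0, 1.0], [0.2, 0.8, 0.1]]

results = detect(patches, queries, threshold=0.5)
for index, label, score in results:
    print(f"patch {index}: {label} ({score:.2f})")
```

Because the labels are supplied as free-form text queries at inference time rather than fixed at training time, swapping `"cat"` and `"dog"` for other query embeddings requires no retraining, which is the sense in which the detection is open-vocabulary.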
| google-owlvit-base-patch32 | Image to Text | Active | Image | Text |
Capabilities
Input: 1/5
Output: 1/5
Capabilities: 0/13