BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation
Model card for image captioning, pretrained on the COCO dataset - base architecture with a ViT large backbone.
[Figure: BLIP framework overview (BLIP.gif), from the official BLIP repository]
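A minimal captioning sketch using the Hugging Face `transformers` library. The checkpoint id `Salesforce/blip-image-captioning-large` and the helper name `caption_image` are assumptions for illustration; substitute the actual model repository for this card if it differs.

```python
def caption_image(image_path: str,
                  model_id: str = "Salesforce/blip-image-captioning-large") -> str:
    """Generate a caption for a local image file with a BLIP checkpoint.

    Note: the model id above is an assumption; weights are downloaded on
    first call, so this requires network access.
    """
    from PIL import Image
    from transformers import BlipProcessor, BlipForConditionalGeneration

    # Load the processor (image preprocessing + tokenizer) and the model.
    processor = BlipProcessor.from_pretrained(model_id)
    model = BlipForConditionalGeneration.from_pretrained(model_id)

    # Preprocess the image into pixel-value tensors.
    image = Image.open(image_path).convert("RGB")
    inputs = processor(images=image, return_tensors="pt")

    # Autoregressively generate a caption and decode it to text.
    output_ids = model.generate(**inputs, max_new_tokens=30)
    return processor.decode(output_ids[0], skip_special_tokens=True)
```

Conditional (prompted) captioning works the same way, passing a text prefix to the processor alongside the image.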