PaliGemma – Google's Cutting-Edge Open Vision Language Mo...

PaliGemma – Google's Cutting-Edge Open Vision Language Model
Updated on 23-05-2024: We have introduced a few changes to the transformers PaliGemma implementation around fine-tuning, which you can find in this notebook.

PaliGemma is a new family of vision language models from Google. PaliGemma can take in an image and a text and output text.

The team at Google has released three types of models: the pretrained (pt) models, the mix models, and the fine-tuned (ft) models, each with different resolutions and available in multiple precisions for convenience.

All models are released in the Hugging Face Hub model repositories with their model cards and licenses and have transformers integration.https://github.com/huggingface/blog/blob/main/paligemma.md

GitHub

blog/paligemma.md at main · huggingface/blog

Public repo for HF blog posts. Contribute to huggingface/blog development by creating an account on GitHub.