Kakao Brain’s Open Source ViT, ALIGN, and the New COYO Text-Image Dataset
Kakao Brain and Hugging Face are excited to release COYO, a new open-source image-text dataset of 700 million pairs, along with two new visual language models trained on it, ViT and ALIGN. This is the first time the ALIGN model has been made public for free and open-source use, and the first release of ViT and ALIGN models that come with their training dataset.

Kakao Brain’s ViT and ALIGN models follow the same architectures and hyperparameters as the respective original Google models but are trained on the open-source COYO dataset. Google’s ViT and ALIGN models were trained on huge datasets (300 million images for ViT and 1.8 billion image-text pairs for ALIGN), but they cannot be replicated because those datasets are not public. This contribution is particularly valuable to researchers who want to reproduce visual language modeling work and need access to the training data as well as the models.