The Imp project aims to provide a family of strong multimodal small language models (MSLMs). Our imp-v1-3b is a strong MSLM with only 3B parameters. It is built upon Phi-2 (2.7B), a small yet powerful SLM, and SigLIP (0.4B), a powerful visual encoder, and is trained on the LLaVA-v1.5 training set.
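As a rough sanity check on the "3B" figure, the component sizes quoted above can be summed. This is only a back-of-the-envelope sketch: the dictionary and the helper `total_params_b` are hypothetical names introduced here, the counts are the rounded values from the text, and any projector or connector layers between the two components are ignored.

```python
# Approximate parameter counts in billions, as quoted in the text above.
# Rounded figures; any connector/projector layers are ignored (assumption).
COMPONENT_PARAMS_B = {
    "Phi-2 (language model)": 2.7,
    "SigLIP (visual encoder)": 0.4,
}


def total_params_b(components):
    """Sum the component sizes (in billions), rounded to one decimal place."""
    return round(sum(components.values()), 1)


total = total_params_b(COMPONENT_PARAMS_B)
print(f"Approximate total: {total}B parameters")
```

The sum comes to roughly 3.1B, consistent with the model being described as a 3B-parameter MSLM.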

As shown in the image below, imp-v1-3b significantly outperforms counterparts of similar size, and even achieves slightly better performance than the strong LLaVA-7B model on various multimodal benchmarks.
https://huggingface.co/MILVLG/imp-v1-3b