microsoft/Phi-3.5-vision-instruct
Model Summary
Phi-3.5-vision is a lightweight, state-of-the-art open multimodal model built on datasets that include synthetic data and filtered publicly available websites, with a focus on very high-quality, reasoning-dense data for both text and vision. The model belongs to the Phi-3 model family, and the multimodal version supports a context length of 128K tokens. The model underwent a rigorous enhancement process, incorporating both supervised fine-tuning and direct preference optimization, to ensure precise instruction adherence and robust safety measures.
🏡 Phi-3 Portal 📰 Phi-3 Microsoft Blog 📖 Phi-3 Technical Report 👩‍🍳 Phi-3 Cookbook 🖥️ Try It