HF-hub - Share and discover more about AI with social posts from the community.
[2024.05.25] MiniCPM-Llama3-V 2.5 now supports streaming outputs and customized system prompts. Try it here.
[2024.06.03] You can now run MiniCPM-Llama3-V 2.5 on multiple low-VRAM GPUs (12 GB or 16 GB) by distributing the model's layers across them. For more details, check this link.
[2024.05.23] 🔥🔥🔥 MiniCPM-V tops GitHub Trending and HuggingFace Trending! Our demo, recommended by Hugging Face Gradio’s official account, is available here. Come and try it out!
[2024.05.28] 🚀🚀🚀 MiniCPM-Llama3-V 2.5 now runs in llama.cpp and ollama! Please pull the latest code from our provided forks (llama.cpp, ollama). GGUF models in various sizes are available here. The MiniCPM-Llama3-V 2.5 series is not supported by the official repositories yet, and we are working hard to merge the PRs. Please stay tuned! You can visit our GitHub repository for more information!
[2024.05.28] 💫 We now support LoRA fine-tuning for MiniCPM-Llama3-V 2.5, using only 2 V100 GPUs! See more statistics here.
A GPT-4V Level Multimodal LLM on Your Phone
[2024.08.10] 🚀🚀🚀 MiniCPM-Llama3-V 2.5 is now fully supported by the official llama.cpp! GGUF models of various sizes are available here.
[2024.08.06] 🔥🔥🔥 We open-source MiniCPM-V 2.6, which outperforms GPT-4V on single image, multi-image and video understanding. It advances popular features of MiniCPM-Llama3-V 2.5, and can support real-time video understanding on iPad. Try it now!
This is the int4 quantized version of MiniCPM-V 2.6.
Running the int4 version uses less GPU memory (about 7 GB).

Usage
Inference uses Hugging Face transformers on NVIDIA GPUs; requirements were tested on Python 3.10. A minimal sketch is shown below.
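The snippet below is a minimal, hedged inference sketch following the MiniCPM-V model-card pattern. The repo id openbmb/MiniCPM-V-2_6-int4, the image file name, and the chat() arguments are assumptions and may differ by release; any dependencies from the requirements list referenced above still apply.

```python
# Hedged sketch of int4 inference with Hugging Face transformers.
# Repo id and chat() arguments follow the MiniCPM-V model-card pattern (assumed).
from PIL import Image
from transformers import AutoModel, AutoTokenizer

model_id = "openbmb/MiniCPM-V-2_6-int4"  # assumed repo id for the int4 checkpoint
model = AutoModel.from_pretrained(model_id, trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model.eval()

image = Image.open("example.jpg").convert("RGB")  # illustrative input image
msgs = [{"role": "user", "content": [image, "What is in this image?"]}]

answer = model.chat(image=None, msgs=msgs, tokenizer=tokenizer)
print(answer)
```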
MiniCPM-V 2.6 is the latest and most capable model in the MiniCPM-V series. The model is built on SigLip-400M and Qwen2-7B with a total of 8B parameters. It exhibits a significant performance improvement over MiniCPM-Llama3-V 2.5, and introduces new features for multi-image and video understanding. Notable features of MiniCPM-V 2.6 include:

🔥 Leading Performance. MiniCPM-V 2.6 achieves an average score of 65.2 on the latest version of OpenCompass, a comprehensive evaluation over 8 popular benchmarks. With only 8B parameters, it surpasses widely used proprietary models like GPT-4o mini, GPT-4V, Gemini 1.5 Pro, and Claude 3.5 Sonnet for single image understanding.

🖼 Multi Image Understanding and In-context Learning. MiniCPM-V 2.6 can also perform conversation and reasoning over multiple images. It achieves state-of-the-art performance on popular multi-image benchmarks such as Mantis-Eval, BLINK, Mathverse mv and Sciverse mv, and also shows promising in-context learning capability (see the multi-image sketch below).

🎬 Video Understanding. MiniCPM-V 2.6 can also accept video inputs, performing conversation and providing dense captions for spatial-temporal information. It outperforms GPT-4V, Claude 3.5 Sonnet and LLaVA-NeXT-Video-34B on Video-MME with/without subtitles.

💪 Strong OCR Capability and Others. MiniCPM-V 2.6 can process images with any aspect ratio and up to 1.8 million pixels (e.g., 1344x1344). It achieves state-of-the-art performance on OCRBench, surpassing proprietary models such as GPT-4o, GPT-4V, and Gemini 1.5 Pro. Based on the latest RLAIF-V and VisCPM techniques, it features trustworthy behaviors, with significantly lower hallucination rates than GPT-4o and GPT-4V on Object HalBench, and supports multilingual capabilities in English, Chinese, German, French, Italian, Korean, etc.
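As a hedged illustration of the multi-image conversation mentioned above, the sketch below reuses the model and tokenizer loaded in the Usage sketch; passing several PIL images in the content list follows the model-card pattern, and the file names and question are illustrative.

```python
# Hedged multi-image sketch; reuses `model` and `tokenizer` from the Usage example above.
from PIL import Image

image1 = Image.open("receipt_page1.jpg").convert("RGB")  # illustrative inputs
image2 = Image.open("receipt_page2.jpg").convert("RGB")

msgs = [{"role": "user", "content": [image1, image2, "Compare the totals on these two receipts."]}]
answer = model.chat(image=None, msgs=msgs, tokenizer=tokenizer)
print(answer)
```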
⚡️ My PhD thesis, “Scalable Nested Optimization for Deep Learning,” is now on arXiv! ⚡️

tl;dr: We develop various optimization tools; highlights include:
· Making the momentum coefficient complex for adversarial games like GANs.
· Optimizing millions of hyperparameters using implicit differentiation.
· Tuning hyperparameters using hypernetworks.
· Differentiably finding bifurcations in optimization for diverse solutions.

https://arxiv.org/abs/2407.01526
What is the best LLM for RAG systems? 🤔

In a business setting, it will be the one that gives the best performance at a great price! 💼💰

And maybe it should be easy to fine-tune, cheap to fine-tune... FREE to fine-tune? 😲

That's @Google Gemini 1.5 Flash! 🚀🌟

It now supports fine-tuning, and the inference cost is the same as the base model! <coughs LoRA adapters> 🤭🤖
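For concreteness, here is a minimal, hedged RAG-style sketch using the google-generativeai SDK; the retrieved passages are hard-coded stand-ins for a real retriever, and the prompt wording is illustrative.

```python
# Minimal RAG-style sketch: the "retrieval" step is stubbed with hard-coded passages;
# in a real system they would come from a vector store.
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")
model = genai.GenerativeModel("gemini-1.5-flash")

retrieved_passages = [
    "Gemini 1.5 Flash supports fine-tuning.",
    "Inference on a tuned Gemini 1.5 Flash model costs the same as the base model.",
]
question = "Does fine-tuning Gemini 1.5 Flash change its inference price?"

prompt = (
    "Answer using only the context below.\n\n"
    "Context:\n"
    + "\n".join(f"- {p}" for p in retrieved_passages)
    + f"\n\nQuestion: {question}"
)

response = model.generate_content(prompt)
print(response.text)
```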
🚀 Introducing TraVisionLM: Turkish Visual Language Model - The First of Its Kind! 🇹🇷🖼

I'm thrilled to share TraVisionLM on Hugging Face! With 875M parameters, this lightweight, efficient model handles Turkish instructions for image inputs. Fully compatible with the Transformers library, it’s easy to load, fine-tune, and use—no external libraries needed!

Developed solo, TraVisionLM is a strong foundation for low-resource language research. While still improving, it's a key step for Turkish-language AI. Your feedback is welcome as I refine the model.

🎉 Explore it now:

- Model:
ucsahin/TraVisionLM-base

- Demo: https://huggingface.co/spaces/ucsahin/TraVisionLM-Turkish_Visual_Language_Model
- Object Detection Finetune:
ucsahin/TraVisionLM-Object-Detection-ft


Let’s push Turkish visual language processing forward!
Remember when Claude 3.5 Sonnet by @AnthropicAI took the world by storm with Claude Artifacts? 🌍

Now we have LlamaCoder, an open-source Claude Artifacts-style app that can generate full React apps and components with Meta-Llama 3.1 405B. 💻 100% free and open source. 🆓

I like how Llama has now started becoming a placeholder denoting open-source work! 🔓
Originally, Llama was an acronym for Large Language Model Meta AI. 🤖

GitHub: https://github.com/Nutlope/llamacoder
Demo (by togetherAI): https://llamacoder.together.ai
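To give a sense of how an app like this drives the model, here is a hedged sketch using Together's Python client; the model id, system prompt, and user prompt are illustrative assumptions, and the real LlamaCoder prompts live in the GitHub repo above.

```python
# Hedged sketch: asking Llama 3.1 405B for a React component via Together's Python client.
from together import Together

client = Together(api_key="YOUR_TOGETHER_API_KEY")

response = client.chat.completions.create(
    model="meta-llama/Meta-Llama-3.1-405B-Instruct-Turbo",  # assumed Together model id
    messages=[
        {"role": "system", "content": "You generate a single self-contained React component in TypeScript."},
        {"role": "user", "content": "Build a counter component with increment and reset buttons."},
    ],
)
print(response.choices[0].message.content)
```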
Alright y'all,

I know it's a Saturday, but I decided to release my first Flux Dev LoRA.

It's a retrain of my "Frosting Lane" model, and I'm sure the styles will just keep improving.

Have fun! Link below. Thanks again to @ostris for the trainer and to Black Forest Labs for the awesome model!

alvdansen/frosting_lane_flux
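For anyone who wants to try it, here is a hedged diffusers sketch loading FLUX.1-dev and applying the LoRA; the prompt and generation settings are illustrative, not the author's recommended settings.

```python
# Hedged sketch: load FLUX.1-dev and the Frosting Lane LoRA with diffusers.
import torch
from diffusers import FluxPipeline

pipe = FluxPipeline.from_pretrained("black-forest-labs/FLUX.1-dev", torch_dtype=torch.bfloat16)
pipe.load_lora_weights("alvdansen/frosting_lane_flux")
pipe.to("cuda")

image = pipe(
    "a cozy cottage in a pastel meadow",  # illustrative prompt
    num_inference_steps=28,
    guidance_scale=3.5,
).images[0]
image.save("frosting_lane.png")
```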
NEW MODEL + DATASET! :)

Check out Enigma, our new code-instruct model:
- trained on synthetic code-instruct data created with Llama 3.1 405B
- high-quality code-instruct data in the Llama 3.1 Instruct format

The model:
ValiantLabs/Llama3.1-8B-Enigma

The dataset:
sequelbox/Tachibana


Enjoy! We've got more new datasets and models to follow soon.
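A minimal, hedged sketch for trying the model and dataset with the transformers and datasets libraries; the example prompt and generation settings are illustrative.

```python
# Hedged sketch: query Enigma via the transformers text-generation pipeline, which
# applies the model's Llama 3.1 Instruct chat template to the messages below.
import torch
from datasets import load_dataset
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="ValiantLabs/Llama3.1-8B-Enigma",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

messages = [
    {"role": "user", "content": "Write a Python function that checks whether a string is a palindrome."},
]
out = generator(messages, max_new_tokens=256)
print(out[0]["generated_text"][-1]["content"])  # assistant reply appended to the chat

# The companion dataset loads the usual way.
dataset = load_dataset("sequelbox/Tachibana")
print(dataset)
```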