HF Hub - Share and discover more about AI with social posts from the community.
MiniCPM-V 2.6 is the latest and most capable model in the MiniCPM-V series. The model is built on SigLip-400M and Qwen2-7B with a total of 8B parameters. It exhibits a significant performance improvement over MiniCPM-Llama3-V 2.5, and introduces new features for multi-image and video understanding. Notable features of MiniCPM-V 2.6 include:

🔥 Leading Performance. MiniCPM-V 2.6 achieves an average score of 65.2 on the latest version of OpenCompass, a comprehensive evaluation over 8 popular benchmarks. With only 8B parameters, it surpasses widely used proprietary models like GPT-4o mini, GPT-4V, Gemini 1.5 Pro, and Claude 3.5 Sonnet for single image understanding.

🖼 Multi Image Understanding and In-context Learning. MiniCPM-V 2.6 can also perform conversation and reasoning over multiple images. It achieves state-of-the-art performance on popular multi-image benchmarks such as Mantis-Eval, BLINK, Mathverse mv and Sciverse mv, and also shows promising in-context learning capability.

🎬 Video Understanding. MiniCPM-V 2.6 can also accept video inputs, performing conversation and providing dense captions for spatial-temporal information. It outperforms GPT-4V, Claude 3.5 Sonnet and LLaVA-NeXT-Video-34B on Video-MME with/without subtitles.

💪 Strong OCR Capability and Others. MiniCPM-V 2.6 can process images with any aspect ratio and up to 1.8 million pixels (e.g., 1344x1344). It achieves state-of-the-art performance on OCRBench, surpassing proprietary models such as GPT-4o, GPT-4V, and Gemini 1.5 Pro. Based on the latest RLAIF-V and VisCPM techniques, it features trustworthy behaviors, with significantly lower hallucination rates than GPT-4o and GPT-4V on Object HalBench, and supports multilingual capabilities in English, Chinese, German, French, Italian, Korean, etc.
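
If you want to try it, here is a minimal single-image chat sketch. It assumes the checkpoint id openbmb/MiniCPM-V-2_6 and the custom `.chat()` interface shipped via `trust_remote_code`; both come from my reading of the model card rather than this post, so double-check the card for the exact arguments and the multi-image/video usage.

```python
# Minimal single-image sketch; the .chat() signature comes from the model's
# remote code, so treat the exact arguments as an assumption.
import torch
from PIL import Image
from transformers import AutoModel, AutoTokenizer

model_id = "openbmb/MiniCPM-V-2_6"  # assumed repo id, not stated in the post
model = AutoModel.from_pretrained(model_id, trust_remote_code=True,
                                  torch_dtype=torch.bfloat16).eval().cuda()
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)

image = Image.open("example.jpg").convert("RGB")
msgs = [{"role": "user", "content": [image, "Describe this image in detail."]}]

answer = model.chat(image=None, msgs=msgs, tokenizer=tokenizer)
print(answer)
```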
⚡️ My PhD thesis, “Scalable Nested Optimization for Deep Learning,” is now on arXiv! ⚡️

tl;dr: We develop various optimization tools; highlights include:
· Making the momentum coefficient complex for adversarial games like GANs.
· Optimizing millions of hyperparameters using implicit differentiation (see the toy sketch below).
· Tuning hyperparameters using hypernetworks.
· Differentiably finding bifurcations in optimization for diverse solutions.

https://arxiv.org/abs/2407.01526
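
To make the hyperparameter-optimization highlight concrete, here is a toy sketch of the implicit-function-theorem (IFT) hypergradient on a one-dimensional quadratic. This is my own illustration rather than code from the thesis; JAX is used only to take the derivatives.

```python
# Toy sketch of the IFT hypergradient for hyperparameter optimization:
#   dL_val/dlam = -(dL_val/dw) * (d2L_tr/dw2)^-1 * (d2L_tr/dw dlam), evaluated at w = w*(lam).
import jax

def train_loss(w, lam):          # inner problem; its minimizer is w*(lam) = lam
    return (w - lam) ** 2

def val_loss(w):                 # outer objective, depends only on the inner solution
    return (w - 1.0) ** 2

lam = 3.0
w_star = lam                     # inner optimum, known in closed form for this toy problem

dLval_dw     = jax.grad(val_loss)(w_star)
d2Ltr_dw2    = jax.grad(jax.grad(train_loss, argnums=0), argnums=0)(w_star, lam)
d2Ltr_dwdlam = jax.grad(jax.grad(train_loss, argnums=0), argnums=1)(w_star, lam)

hypergrad = -dLval_dw * d2Ltr_dwdlam / d2Ltr_dw2
print(hypergrad)                 # 4.0, matching d/dlam of val_loss(w*(lam)) = 2*(lam - 1)
```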
What is the best LLM for RAG systems? 🤔

In a business setting, it will be the one that gives the best performance at a great price! 💼💰

And maybe it should be easy to fine-tune, cheap to fine-tune... FREE to fine-tune? 😲

That's @Google Gemini 1.5 Flash! 🚀🌟

It now supports fine-tuning, and the inference cost is the same as the base model! (*cough* LoRA adapters *cough*) 🤭🤖
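
For context, here is a rough sketch with the google-generativeai Python SDK; the tuned-model name "tunedModels/rag-flash-demo" is a hypothetical placeholder, the tuning job itself is omitted, and Google's docs are the reference for the current fine-tuning API and pricing.

```python
# Rough sketch; "tunedModels/rag-flash-demo" is a hypothetical placeholder.
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")

# Base-model inference.
base = genai.GenerativeModel("gemini-1.5-flash")
print(base.generate_content("Answer using this retrieved context: ...").text)

# A fine-tuned variant is called the same way; per the post, inference on the
# tuned model is priced the same as the base model.
tuned = genai.GenerativeModel("tunedModels/rag-flash-demo")  # hypothetical tuned model
print(tuned.generate_content("Answer using this retrieved context: ...").text)
```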
🚀 Introducing TraVisionLM: Turkish Visual Language Model - The First of Its Kind! 🇹🇷🖼

I'm thrilled to share TraVisionLM on Hugging Face! With 875M parameters, this lightweight, efficient model handles Turkish instructions for image inputs. Fully compatible with the Transformers library, it’s easy to load, fine-tune, and use—no external libraries needed!

Developed solo, TraVisionLM is a strong foundation for low-resource language research. While still improving, it's a key step for Turkish-language AI. Your feedback is welcome as I refine the model.

🎉 Explore it now:

- Model:
ucsahin/TraVisionLM-base

- Demo: https://huggingface.co/spaces/ucsahin/TraVisionLM-Turkish_Visual_Language_Model
- Object Detection Finetune:
ucsahin/TraVisionLM-Object-Detection-ft


Let’s push Turkish visual language processing forward!
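
A minimal usage sketch, assuming the model follows the usual AutoProcessor / AutoModelForCausalLM remote-code pattern; the prompt and generation settings are illustrative, so see the model card for the exact format.

```python
# Minimal sketch; the processor/generate interface is assumed from the standard
# trust_remote_code pattern, not copied from the model card.
import torch
from PIL import Image
from transformers import AutoModelForCausalLM, AutoProcessor

model_id = "ucsahin/TraVisionLM-base"
processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True,
                                             torch_dtype=torch.bfloat16)

image = Image.open("example.jpg").convert("RGB")
prompt = "Açıkla"  # a short Turkish instruction ("Describe")
inputs = processor(text=prompt, images=image, return_tensors="pt")

output_ids = model.generate(**inputs, max_new_tokens=128)
print(processor.batch_decode(output_ids, skip_special_tokens=True)[0])
```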
Remember when Claude 3.5 Sonnet by @AnthropicAI took the world by storm with Claude Artifacts? 🌍

Now we have LlamaCoder, an open-source Claude Artifacts alternative that can generate full React apps and components with Meta-Llama 3.1 405B. 💻 100% free and open source. 🆓

I like how "Llama" is now becoming shorthand for open-source work! 🔓
Originally, Llama was an acronym for Large Language Model Meta AI. 🤖

GitHub: https://github.com/Nutlope/llamacoder
Demo (by togetherAI): https://llamacoder.together.ai
Alright, y'all,

I know it's a Saturday, but I decided to release my first Flux Dev LoRA.

It's a retrain of my "Frosting Lane" model, and I'm sure the styles will just keep improving.

Have fun! Link Below - Thanks again to @ostris for the trainer and Black Forest Labs for the awesome model!

alvdansen/frosting_lane_flux
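
A quick sketch of how I'd expect the LoRA to be used with diffusers; the base repo id black-forest-labs/FLUX.1-dev, the prompt, and the settings are my assumptions, while the LoRA repo id comes from the post.

```python
# Sketch of loading the LoRA on top of FLUX.1-dev with diffusers.
import torch
from diffusers import FluxPipeline

pipe = FluxPipeline.from_pretrained("black-forest-labs/FLUX.1-dev",  # assumed base repo id
                                    torch_dtype=torch.bfloat16)
pipe.load_lora_weights("alvdansen/frosting_lane_flux")
pipe.enable_model_cpu_offload()  # helps fit on 24 GB cards

image = pipe("a whimsical cottage in a frosted forest, storybook style",
             num_inference_steps=28, guidance_scale=3.5).images[0]
image.save("frosting_lane.png")
```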
NEW MODEL + DATASET! :)

Check out Enigma, our new code-instruct model:
- trained on synthetic code-instruct data created with Llama 3.1 405B
- high-quality code-instruct data in the Llama 3.1 Instruct format

The model:
ValiantLabs/Llama3.1-8B-Enigma

The dataset:
sequelbox/Tachibana


Enjoy! We've got more new datasets and models to follow soon.
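
Since the model follows the Llama 3.1 Instruct format, standard transformers chat-template usage should work; here is a hedged sketch (the prompt and generation settings are illustrative).

```python
# Sketch of chat-style inference with the Enigma code-instruct model.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "ValiantLabs/Llama3.1-8B-Enigma"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16,
                                             device_map="auto")

messages = [{"role": "user",
             "content": "Write a Python function that checks whether a string is a palindrome."}]
inputs = tokenizer.apply_chat_template(messages, add_generation_prompt=True,
                                       return_tensors="pt").to(model.device)

output = model.generate(inputs, max_new_tokens=256)
print(tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True))
```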
I've been testing a new open-source AI image generation model, FLUX.1 [dev], today. I'm quite impressed! This could mark a goodbye to Midjourney and Stable Diffusion. The images were generated locally on an RTX 3090, at around 40 seconds per image.
Thank you, Robin & Black Forest Labs!
Databricks ❤️ Hugging Face: up to 40% faster training and tuning of Large Language Models
Generative AI has been taking the world by storm. As the data and AI company, we have been on this journey with the release of the open source large language model Dolly, as well as the internally crowdsourced dataset licensed for research and commercial use that we used to fine-tune it, the databricks-dolly-15k. Both the model and dataset are available on Hugging Face. We’ve learned a lot throughout this process, and today we’re excited to announce our first of many official commits to the Hugging Face codebase that allows users to easily create a Hugging Face Dataset from an Apache Spark dataframe.
https://huggingface.co/blog/databricks-case-study
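
The contributed API is datasets.Dataset.from_spark; here is a minimal sketch, with a toy Spark DataFrame standing in for real training data.

```python
# Minimal sketch: turn a Spark DataFrame into a Hugging Face Dataset.
from datasets import Dataset
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame(
    [("What is Dolly?", "An open-source LLM released by Databricks.")],
    ["instruction", "response"],
)

ds = Dataset.from_spark(df)  # materializes the Spark DataFrame as a HF Dataset
print(ds)
```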