HF-hub - Share and discover more about AI with social posts from the community.
MiniCPM-V 2.6
A GPT-4V Level MLLM for Single Image, Multi-Image and Video on Your Phone
MiniCPM-V 2.6 is the latest and most capable model in the MiniCPM-V series. The model is built on SigLip-400M and Qwen2-7B with a total of 8B parameters. It exhibits a significant performance improvement over MiniCPM-Llama3-V 2.5, and introduces new features for multi-image and video understanding. Notable features of MiniCPM-V 2.6 include:

🔥 Leading Performance. MiniCPM-V 2.6 achieves an average score of 65.2 on the latest version of OpenCompass, a comprehensive evaluation over 8 popular benchmarks. With only 8B parameters, it surpasses widely used proprietary models like GPT-4o mini, GPT-4V, Gemini 1.5 Pro, and Claude 3.5 Sonnet for single image understanding.

🖼 Multi Image Understanding and In-context Learning. MiniCPM-V 2.6 can also perform conversation and reasoning over multiple images. It achieves state-of-the-art performance on popular multi-image benchmarks such as Mantis-Eval, BLINK, Mathverse mv and Sciverse mv, and also shows promising in-context learning capability.

🎬 Video Understanding. MiniCPM-V 2.6 can also accept video inputs, performing conversation and providing dense captions for spatial-temporal information. It outperforms GPT-4V, Claude 3.5 Sonnet and LLaVA-NeXT-Video-34B on Video-MME with/without subtitles.

💪 Strong OCR Capability and Others. MiniCPM-V 2.6 can process images with any aspect ratio and up to 1.8 million pixels (e.g., 1344x1344). It achieves state-of-the-art performance on OCRBench, surpassing proprietary models such as GPT-4o, GPT-4V, and Gemini 1.5 Pro. Based on the latest RLAIF-V and VisCPM techniques, it features trustworthy behaviors, with significantly lower hallucination rates than GPT-4o and GPT-4V on Object HalBench, and supports multilingual capabilities in English, Chinese, German, French, Italian, Korean, etc.

🚀 Superior Efficiency. In addition to its friendly size, MiniCPM-V 2.6 also shows state-of-the-art token density (i.e., number of pixels encoded into each visual token). It produces only 640 tokens when processing a 1.8M pixel image, which is 75% fewer than most models. This directly improves the inference speed, first-token latency, memory usage, and power consumption. As a result, MiniCPM-V 2.6 can efficiently support real-time video understanding on end-side devices such as iPad.
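
A quick back-of-the-envelope check of that token-density figure (all numbers are from the paragraph above):

```python
# Token density = pixels encoded per visual token (figures from the post above).
pixels = 1344 * 1344            # a ~1.8M-pixel image
visual_tokens = 640             # tokens MiniCPM-V 2.6 produces for it
print(pixels / visual_tokens)   # ~2822 pixels per visual token
```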

💫 Easy Usage. MiniCPM-V 2.6 can be easily used in various ways: (1) llama.cpp and ollama support for efficient CPU inference on local devices, (2) int4 and GGUF format quantized models in 16 sizes, (3) vLLM support for high-throughput and memory-efficient inference, (4) fine-tuning on new domains and tasks, (5) quick local WebUI demo setup with Gradio, and (6) an online web demo: https://huggingface.co/openbmb/MiniCPM-V-2_6
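
For reference, here is a minimal single-image chat sketch with 🤗 Transformers, loosely following the model card; since the model uses custom remote code, the exact chat() arguments may differ between revisions:

```python
import torch
from PIL import Image
from transformers import AutoModel, AutoTokenizer

model_id = "openbmb/MiniCPM-V-2_6"
model = AutoModel.from_pretrained(
    model_id, trust_remote_code=True, torch_dtype=torch.bfloat16
).eval().cuda()  # requires a GPU
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)

image = Image.open("example.jpg").convert("RGB")  # placeholder image path
msgs = [{"role": "user", "content": [image, "Describe this image."]}]
print(model.chat(image=None, msgs=msgs, tokenizer=tokenizer))
```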
Kling AI Video is FINALLY Public (All Countries), Free to Use and MIND BLOWING - Full Tutorial > https://youtu.be/zcpqAxYV1_w

You have probably seen those mind-blowing AI-made videos. And the day has arrived: the famous Kling AI is now available worldwide for free. In this tutorial video I will show you how to register for Kling AI for free with just an email, and how to use its mind-blowing text-to-video animation, image-to-video animation, text-to-image, and image-to-image capabilities. This video shows non-cherry-picked results so you will know the actual quality and capability of the model, unlike those extremely cherry-picked example demos. Still, #KlingAI is the only #AI model that competes with OpenAI's #SORA, and it is actually available to use.

🔗 Kling AI Official Website ⤵️
▶️ https://www.klingai.com/



🔗 Our GitHub Repository ⤵️
▶️ https://github.com/FurkanGozukara/Stable-Diffusion
I just had a masterclass in open-source collaboration with the release of Llama 3.1 🦙🤗

Meta dropped Llama 3.1, and seeing firsthand the Hugging Face team working to integrate it is nothing short of impressive. Their swift integration, comprehensive documentation, and innovative tools showcase the power of open-source teamwork.

For the curious minds:

📊 Check out independent evaluations:
open-llm-leaderboard/open_llm_leaderboard


🧠 Deep dive into the tech: https://huggingface.co/blog/llama31

👨‍🍳 Try different recipes (including running 8B on free Colab; a minimal 4-bit loading sketch follows this list!): https://github.com/huggingface/huggingface-llama-recipes

📈 Visualize open vs. closed LLM progress:
andrewrreed/closed-vs-open-arena-elo


🤖 Generate synthetic data with distilabel, thanks to the new license allowing the use of outputs to train other LLMs https://huggingface.co/blog/llama31#synthetic-data-generation-with-distilabel

💡 Pro tip: Experience the 405B version for free on HuggingChat, now with tool-calling capabilities! https://huggingface.co/chat/
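
For a taste of the recipes, here is a minimal sketch of loading the 8B Instruct model in 4-bit so it fits on a free Colab T4 (assumes you have accepted the license and logged in with your HF token):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "meta-llama/Meta-Llama-3.1-8B-Instruct"
bnb = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_compute_dtype=torch.float16)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, quantization_config=bnb, device_map="auto"
)

messages = [{"role": "user", "content": "Why does open-source AI matter?"}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
out = model.generate(inputs, max_new_tokens=128)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```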

#OpenSourceAI #AIInnovation
New Arabic Matryoshka sentence-embedding models 🚀 Highlights:
📚 Trained on a large dataset of 558k Arabic triplets translated from the AllNLI triplet dataset: Omartificial-Intelligence-Space/Arabic-NLi-Triplet

6️⃣ 6 different base models: AraBERT, MarBERT, LaBSE, MiniLM, paraphrase-multilingual-mpnet-base, mpnet-base, ranging from 109M to 471M parameters.
🪆 Trained with a Matryoshka loss, allowing you to truncate embeddings with minimal performance loss: smaller embeddings are faster to compare (see the sketch after this list).
📈 Outperforms all commonly used multilingual models like intfloat/multilingual-e5-large, sentence-transformers/paraphrase-multilingual-mpnet-base-v2, and sentence-transformers/LaBSE.
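
Here is the truncation sketch mentioned above: encode with one of the Matryoshka models, keep only the first dimensions, renormalize, and compare (the sentences are hypothetical placeholders):

```python
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("Omartificial-Intelligence-Space/Arabert-all-nli-triplet-Matryoshka")
emb = model.encode(["الطقس جميل اليوم", "الجو رائع هذا اليوم"])  # placeholder sentences

small = emb[:, :256]  # keep only the first 256 dimensions (Matryoshka truncation)
small = small / np.linalg.norm(small, axis=1, keepdims=True)
print(small[0] @ small[1])  # cosine similarity on the truncated embeddings
```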

Check them out here:
- Omartificial-Intelligence-Space/Arabic-mpnet-base-all-nli-triplet
- Omartificial-Intelligence-Space/Arabic-all-nli-triplet-Matryoshka
- Omartificial-Intelligence-Space/Arabert-all-nli-triplet-Matryoshka
- Omartificial-Intelligence-Space/Arabic-labse-Matryoshka
- Omartificial-Intelligence-Space/Marbert-all-nli-triplet-Matryoshka
- Omartificial-Intelligence-Space/Arabic-MiniLM-L12-v2-all-nli-triplet

Or the collection with all: Omartificial-Intelligence-Space/arabic-matryoshka-embedding-models-666f764d3b570f44d7f77d4e


My personal favourite is likely Omartificial-Intelligence-Space/Arabert-all-nli-triplet-Matryoshka: a very efficient 135M parameters & scores #1 on mteb/leaderboard.
Dataset: https://huggingface.co/datasets/Omartificial-Intelligence-Space/Arabic-NLi-Triplet
New smol-vision tutorial dropped: QLoRA fine-tuning IDEFICS3-Llama 8B on VQAv2 🐶

Learn how to efficiently fine-tune the latest IDEFICS3-Llama on visual question answering in this notebook 📖
Fine-tuning notebook: https://github.com/merveenoyan/smol-vision/blob/main/Idefics_FT.ipynb
Resulting model: merve/idefics3llama-vqav2
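
The core of the QLoRA setup looks roughly like the sketch below; treat the target modules and hyperparameters as placeholders and see the notebook for the actual values:

```python
import torch
from transformers import AutoModelForVision2Seq, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

bnb = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForVision2Seq.from_pretrained(
    "HuggingFaceM4/Idefics3-8B-Llama3", quantization_config=bnb, device_map="auto"
)

# Placeholder LoRA config; the notebook has the exact modules/ranks.
lora = LoraConfig(
    r=8, lora_alpha=8, lora_dropout=0.1,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
)
model = get_peft_model(model, lora)
model.print_trainable_parameters()  # only the small adapter weights are trained
```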
Built a space for creating prompts for FLUX:

gokaygokay/FLUX-Prompt-Generator


You can create long prompts from images or simple words, and enhance your short prompts with the prompt enhancer. You can configure various settings such as artform, photo type, character details, scene details, style, and artist to create tailored prompts.

And you can combine all of them with custom prompts using LLMs (Mixtral, Mistral, Llama 3, and Mistral-Nemo).

The UI is a bit complex, but it includes almost everything you need. Choosing the random option is the most fun!

And I've created some other Spaces for using FLUX models with captioners and enhancers:

- gokaygokay/FLUX.1-dev-with-Captioner
- gokaygokay/FLUX.1-Schnell-with-Captioner
New feature 🔥
Image models and LoRAs now have little previews 🤏

If you don't know where to start to find them, I invite you to browse cool LoRAs in the profiles of some amazing fine-tuners: @artificialguybr, @alvdansen, @DoctorDiffusion, @e-n-v-y, @KappaNeuro, @ostris
The first open Stable Diffusion 3-like architecture model is JUST out 💣 - but it is not SD3! 🤔

It is Tencent-Hunyuan/HunyuanDiT by Tencent, a 1.5B parameter DiT (diffusion transformer) text-to-image model 🖼, trained with multilingual CLIP + multilingual T5 text encoders for English 🤝 Chinese understanding.

Try it out by yourself here ▶️ https://huggingface.co/spaces/multimodalart/HunyuanDiT
(a bit too slow as the model is chunky and the research code isn't super optimized for inference speed yet)
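
If you'd rather run it locally, here is a minimal sketch assuming the diffusers-format checkpoint (Tencent-Hunyuan/HunyuanDiT-Diffusers) and a recent diffusers release:

```python
import torch
from diffusers import HunyuanDiTPipeline

pipe = HunyuanDiTPipeline.from_pretrained(
    "Tencent-Hunyuan/HunyuanDiT-Diffusers", torch_dtype=torch.float16
).to("cuda")

# The text encoders are bilingual, so English and Chinese prompts both work.
image = pipe(prompt="一个宇航员在骑马").images[0]  # "an astronaut riding a horse"
image.save("astronaut.png")
```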

In the paper they claim SOTA among open-source models based on human preference evaluation!
🥳 Celebrating 5K readers on one of my blog posts 🥳
I came back with another one this time 🤓
In this blog you will learn 📖:
* How to train custom AI models with the Trainer API 🚀
* How to integrate your AI models with HF using the mixin classes 🔥

Happy reading, everyone 🤗
🔗 Link: https://huggingface.co/blog/not-lain/trainer-api-and-mixin-classes
I will be delivering an introductory coding session about Hugging Face this Sunday at 7 PM GMT+1. If you are new to HF and don't know where to begin, you are welcome to join us 🤗
📌 Place: the Hugging Face Discord server
🔗 Link: https://discord.gg/hugging-face-879548962464493619?event=1245406127668203541
It is with great pleasure that I inform you that Hugging Face's ModelHubMixin has reached 200+ models on the Hub 🥳

ModelHubMixin is a class developed by HF to integrate AI models with the Hub with ease, and it comes with 3 methods (a short sketch follows the list):
* save_pretrained
* from_pretrained
* push_to_hub
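
For anyone curious what that looks like in practice, here is a minimal sketch using the PyTorch flavour of the mixin (the model and repo names are hypothetical):

```python
import torch.nn as nn
from huggingface_hub import PyTorchModelHubMixin

class TinyModel(nn.Module, PyTorchModelHubMixin):
    def __init__(self, hidden_size: int = 32):
        super().__init__()
        self.linear = nn.Linear(hidden_size, hidden_size)

    def forward(self, x):
        return self.linear(x)

model = TinyModel(hidden_size=64)
model.save_pretrained("tiny-model")              # weights + config saved locally
model.push_to_hub("your-username/tiny-model")    # hypothetical Hub repo
reloaded = TinyModel.from_pretrained("your-username/tiny-model")
```

The mixin serializes the (JSON-serializable) init kwargs as the config, so from_pretrained can rebuild the exact same architecture.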

Shoutout to @nielsr , @Wauplin and everyone else on HF for their awesome work 🤗

If you are not familiar with ModelHubMixin and are looking for extra resources, you might consider:
🔗 docs: https://huggingface.co/docs/huggingface_hub/main/en/package_reference/mixins
🔗blog about training models with the trainer API and using ModelHubMixin: https://huggingface.co/blog/not-lain/trainer-api-and-mixin-classes
🔗GitHub repo with pip integration: https://github.com/not-lain/PyTorchModelHubMixin-template
🔗basic guide: https://huggingface.co/posts/not-lain/884273241241808
I have finished writing a blog post about building an image-based retrieval system. This is one of the first-ever approaches to building such a pipeline using only open-source models/libraries 🤗

You can check out the blog post at https://huggingface.co/blog/not-lain/image-retriever and the associated Space at not-lain/image-retriever.
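
The blog post has the full pipeline; below is a minimal sketch of the core idea with a CLIP-style embedder (the model choice and file paths here are illustrative, not necessarily what the post uses):

```python
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

# Index: embed a small gallery of images (placeholder paths).
gallery = [Image.open(p) for p in ["cat.jpg", "dog.jpg", "car.jpg"]]
with torch.no_grad():
    gallery_emb = model.get_image_features(**processor(images=gallery, return_tensors="pt"))
gallery_emb = gallery_emb / gallery_emb.norm(dim=-1, keepdim=True)

# Query: embed a new image and retrieve the nearest neighbour by cosine similarity.
with torch.no_grad():
    q = model.get_image_features(**processor(images=Image.open("query.jpg"), return_tensors="pt"))
q = q / q.norm(dim=-1, keepdim=True)
print((q @ gallery_emb.T).argmax().item())  # index of the best-matching gallery image
```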

If you want to request another blog post, let me know down below, or reach out to me through any of my social media.

📖 Happy reading!
AI Comic Factory
Last release: AI Comic Factory 1.2

The AI Comic Factory will soon have an official website: aicomicfactory.app

For more information about my other projects, please check linktr.ee/FLNGR.

Running the project at home
First, I would like to highlight that everything is open-source (see here, here, here, here).

However, the project isn't a monolithic Space that can be duplicated and run immediately: it requires various components to run for the frontend, backend, LLM, SDXL, etc.

If you try to duplicate the project and open the .env file, you will see that it requires some variables.
distilabel 1.3.0 is out! This release contains many core improvements and new tasks that helped us build argilla/magpie-ultra-v0.1!

Distributed pipeline execution with Ray, new Magpie tasks, reward models, components for dataset diversity based on sentence embeddings, Argilla 2.0 compatibility and many more features!

Check out the new release on GitHub: https://github.com/argilla-io/distilabel
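
For a flavour of the API, here is a minimal text-generation pipeline sketch modeled on the distilabel quickstart; double-check the exact step and LLM signatures against the current docs:

```python
from distilabel.llms import InferenceEndpointsLLM
from distilabel.pipeline import Pipeline
from distilabel.steps import LoadDataFromHub
from distilabel.steps.tasks import TextGeneration

with Pipeline(name="simple-generation") as pipeline:
    # The example dataset has a "prompt" column; TextGeneration expects "instruction".
    load = LoadDataFromHub(output_mappings={"prompt": "instruction"})
    generate = TextGeneration(
        llm=InferenceEndpointsLLM(model_id="meta-llama/Meta-Llama-3.1-8B-Instruct")
    )
    load >> generate

if __name__ == "__main__":
    distiset = pipeline.run(
        parameters={
            load.name: {
                "repo_id": "distilabel-internal-testing/instruction-dataset-mini",
                "split": "test",
            },
        }
    )
    distiset.push_to_hub("your-username/synthetic-demo")  # hypothetical repo
```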

Remember when @mistralAI said "large enough" and casually dropped Mistral-Large-Instruct-2407? 🤯🚀

It's now on http://lmsys.org! 🌐 It works amazingly well for instruction following, hard prompts, coding, and longer queries, with only 123 billion parameters. 💡💻

It outperforms GPT-4 Turbo and Claude 3 Opus in the Coding, Hard Prompts, Math, and Longer Query categories. 📈🔢

It also outperforms Llama 3.1 405B on Instruction Following while being 3x smaller. 🐎🔍

It also does exceedingly well on the Ai2 ZebraLogic logical reasoning benchmark despite being much smaller than the other models. 🦓🤔

Mistral is not here to take part but to take over! 🏆🌟

Announcement: https://mistral.ai/news/mistral-large-2407/