RT-DETR - probably the best combination of speed, accuracy, and license for real-time object detection.

I just released a blog post tutorial on fine-tuning RT-DETR on a custom dataset.

shoutout to
@NielsRogge
for all the help!

link: https://blog.roboflow.com/train-rt-detr-custom-dataset-transformers/

↓ key takeaways
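Fine-tuning a detector like RT-DETR on a custom dataset starts with getting annotations into the COCO detection format that the Transformers image processors consume. A minimal sketch of one such record — the image ID, boxes, and labels here are made up for illustration:

```python
# Minimal sketch: one COCO-style annotation record, the shape the
# object-detection image processors in 🤗 Transformers expect when
# preprocessing a custom dataset. Boxes are [x, y, width, height].

def make_coco_annotation(image_id, boxes, labels):
    """Build a COCO-style annotation dict for a single image."""
    return {
        "image_id": image_id,
        "annotations": [
            {
                "image_id": image_id,
                "category_id": label,
                "bbox": box,              # [x, y, w, h] in pixels
                "area": box[2] * box[3],  # w * h
                "iscrowd": 0,
            }
            for box, label in zip(boxes, labels)
        ],
    }

ann = make_coco_annotation(0, boxes=[[10, 20, 100, 50]], labels=[3])
print(ann["annotations"][0]["area"])  # 5000
```

The blog post covers the full pipeline (processor, model, trainer); this only shows the data shape feeding into it.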
Can we train a VLM to 𝐩𝐫𝐞𝐟𝐞𝐫?

This is now possible, thanks to the new TRL/DPO support for VLMs! 🎉

As an example, we've trained a model to reduce hallucinations.

Check out:
📰 Blog post: https://huggingface.co/blog/dpo_vlm
🐙 TRL: https://github.com/huggingface/trl

Thanks to
@mervenoyann
,
@vwxyzjn
and
@krasul
who helped me with this work!
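DPO trains on preference pairs: the same prompt with a "chosen" and a "rejected" response. For a VLM, each example additionally carries the image(s) the prompt refers to. A hedged sketch of one record — the field names follow the common TRL convention, but check the blog post for the exact schema:

```python
# One preference example for VLM DPO, as an illustration (paths and
# texts are made up). A DPO trainer raises the likelihood of "chosen"
# relative to "rejected", which is how hallucinations get penalized.

example = {
    "images": ["photo_of_a_cat.jpg"],  # placeholder path
    "prompt": "How many animals are in the image?",
    "chosen": "There is one cat in the image.",         # grounded
    "rejected": "There are three dogs playing fetch.",  # hallucinated
}

assert {"prompt", "chosen", "rejected"} <= set(example)
```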
Agent for self-correcting Text-to-SQL 🧑‍💻

What if the query generated by your Text-to-SQL pipeline is correct SQL but returns wrong results?
👉 We need to add a critique step

That's very simple with an agent!
Check out the notebook: https://huggingface.co/learn/cookbook/agent_text_to_sql
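The critique loop is simple to sketch: execute the generated SQL, inspect the result, and ask for a corrected query when it looks wrong. A runnable toy version (not the cookbook's actual code) with a hard-coded stand-in for the LLM:

```python
import sqlite3

# Toy self-correction loop: the first "generated" query is valid SQL
# but returns nothing (SQLite's = on TEXT is case-sensitive), so the
# critique step triggers a retry with a corrected query.

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE receipts (name TEXT, price REAL)")
conn.executemany("INSERT INTO receipts VALUES (?, ?)",
                 [("Alice", 12.0), ("Bob", 8.5)])

attempts = [
    "SELECT price FROM receipts WHERE name = 'alice'",          # wrong case
    "SELECT price FROM receipts WHERE lower(name) = 'alice'",   # corrected
]

result = None
for sql in attempts:
    rows = conn.execute(sql).fetchall()
    if rows:  # critique: an empty result means the query needs fixing
        result = rows
        break
    print(f"Critique: {sql!r} returned no rows, retrying...")

print(result)  # [(12.0,)]
```

In the real agent the "attempts" come from the LLM, and the critique prompt includes the failing query plus its (empty or erroneous) result.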
The vision language model in this video is 0.5B and can take in images, video and 3D! 🤯
Llava-NeXT-Interleave is a new vision language model trained on interleaved image, video and 3D data

keep reading ⥥⥥
Mistral 7B running on Mac, powered by CoreML! ⚡️

Heavily optimised with the latest updates from WWDC like stateful buffers, ML Tensors and 4-bit palettization!

Try it today with swift-transformers and chat-ui! 🔥
Llama 405B is here, and it comes with more than expected! 🚨
@AIatMeta
Llama 3.1 comes in 3 sizes, 8B, 70B, and 405B, and speaks 8 languages! 🌍 Llama 3.1 405B matches or beats OpenAI's GPT-4o across many text benchmarks.

What's new and improved in 3.1:
🧮 8B, 70B & 405B
Meta Llama 3.1 405B, 70B & 8B are here - multilingual, with 128K context and tool-use + agents! Competitive with (and often beating) GPT-4o & Claude 3.5 Sonnet - unequivocally the best open LLM out there! 🐐

Bonus: It comes with a more permissive license, which allows one to train other LLMs on its high-quality outputs 🔥
Llama-405B runs on CPU.
Getting 1.67 tokens/s output,
about 10 tokens/words per second input, without a GPU.
Slow but usable - summarizing a 2-hour-long medtech discussion with it. Will upload 2-bit optimized quants etc. here:
https://huggingface.co/nisten/meta-405b-instruct-cpu-optimized-gguf/tree/main
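A quick back-of-the-envelope check on "slow but usable": at the quoted 1.67 tokens/s, a summary of a few hundred tokens takes on the order of minutes (the 500-token length below is an assumption, not from the post):

```python
# How long does a ~500-token summary take at 1.67 tokens/s on CPU?
output_tokens = 500       # assumed summary length
tokens_per_second = 1.67  # figure quoted in the post
minutes = output_tokens / tokens_per_second / 60
print(round(minutes, 1))  # ~5.0 minutes
```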
With larger and larger diffusion transformers coming up, it's becoming increasingly important to have some good quantization tools for them.

We present our findings from a series of experiments on quantizing different diffusion pipelines based on diffusion transformers.

We demonstrate excellent memory savings with a small sacrifice in inference latency, which is expected to improve in the coming days.
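The memory savings are easy to reason about: weight memory scales linearly with bits per parameter. A worked example for a hypothetical 8B-parameter diffusion transformer (the parameter count is illustrative, not from the experiments):

```python
# Weight memory vs. quantization width for a hypothetical 8B-parameter
# transformer: memory = params * bits / 8 bytes.
params = 8e9

def weight_gb(bits):
    return params * bits / 8 / 1e9  # bytes -> GB (decimal)

print(weight_gb(16))  # fp16/bf16: 16.0 GB
print(weight_gb(8))   # int8/fp8:   8.0 GB
print(weight_gb(4))   # 4-bit:      4.0 GB
```

Activations, the text encoder, and the VAE add overhead on top, which is why measured savings differ from this ideal ratio.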
Really nice development by
@nvidia
and
@HuggingFace


Launch of Hugging Face Inference-as-a-Service powered by NVIDIA NIM, a new service on the Hugging Face Hub

So we can use open models with the accelerated compute platform of NVIDIA DGX Cloud for inference serving.

Code is fully compatible with the OpenAI API, allowing you to use the openai SDK for inference.

Note: You need access to an Organization with a Hugging Face Enterprise subscription to run Inference.

------

📌 NVIDIA NIM is a set of inference microservices that provide models as optimized containers to deploy on clouds, data centers or workstations, letting developers build generative AI applications for copilots, chatbots and more in minutes rather than weeks.

📌 Maximizes infrastructure investments and compute efficiency. For example, running Meta Llama 3-8B in a NIM produces up to 3x more generative AI tokens on accelerated infrastructure than without NIM.
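Since the service speaks the OpenAI chat-completions protocol, the request body is the familiar one. A sketch of the payload — the base URL below is a placeholder (use the endpoint from your Hugging Face organization settings), and the model ID is an assumption:

```python
import json

# OpenAI-compatible chat-completions request body. BASE_URL is a
# placeholder, not the real endpoint; an Enterprise subscription is
# required per the note above.
BASE_URL = "https://example-nim-endpoint/v1"  # placeholder

payload = {
    "model": "meta-llama/Meta-Llama-3.1-8B-Instruct",  # assumed model id
    "messages": [{"role": "user", "content": "Say hello!"}],
    "max_tokens": 64,
}

# The same dict plugs into the openai SDK:
#   client = OpenAI(base_url=BASE_URL, api_key=HF_TOKEN)
#   client.chat.completions.create(**payload)
body = json.dumps(payload)
print(json.loads(body)["messages"][0]["role"])  # user
```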
Google just dropped Gemma 2 2B! 🔥

> Scores higher than GPT-3.5 and Mixtral 8x7B on the LMSYS arena
> MMLU: 56.1 & MBPP: 36.6
> Beats previous (Gemma 1 2B) by more than 10% in benchmarks
> 2.6B parameters, Multilingual
> 2 Trillion tokens (training set)
> Distilled from Gemma 2 27B (?)
> Trained on 512 TPU v5e

Smaller models beat orders of magnitude bigger models! 🤗
Very cool direction and so many cool ablations for distillation, too!

Kudos to Google & Deepmind for continuing their belief in open source and science! ⚡️
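If the 2B model was indeed distilled from the 27B, the core idea is that the student is trained to match the teacher's next-token distribution rather than just the hard labels. A toy sketch of that signal with a 3-token vocabulary (pure illustration, not Google's training code):

```python
import math

# Distillation signal sketch: minimize KL(teacher || student) over the
# next-token distribution. Toy 3-token vocabulary, made-up logits.

def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

def kl(p, q):
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

teacher = softmax([2.0, 1.0, 0.1])
student_bad = softmax([0.1, 1.0, 2.0])   # disagrees with the teacher
student_good = softmax([1.9, 1.1, 0.2])  # close to the teacher

# The closer the student matches the teacher, the smaller the loss.
assert kl(teacher, student_good) < kl(teacher, student_bad)
```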
Gemma 2 2B running in a browser, powered by WebLLM & WebGPU! 🔥

100% local & on-device

In less than 24 hours, we've already got the model to the edge! ⚡️

Try it out on an HF space below:
NEW ARENA: Text to Speech Arena for Japanese by
@kotoba_tech
🔥

🔉Sound on

Outside of English, TTS evaluation is quite scarce. The Arena allows one to test open-source models against the closed-source giants.

On the leaderboard you can compare open models like Bark, MOE-VITS and Kotoba Speech with closed-source models like Google TTS, OpenAI TTS and so on.

If you're a Japanese speaker then go check it out and help us find the best Japanese TTS model out there! 👀
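Arena leaderboards are typically built from pairwise votes with an Elo-style rating (the exact scheme this Arena uses may differ). A minimal sketch of one rating update:

```python
# Elo-style update from a single pairwise vote: the winner gains and
# the loser loses the same amount, scaled by how surprising the win was.

def elo_update(r_winner, r_loser, k=32):
    expected_win = 1 / (1 + 10 ** ((r_loser - r_winner) / 400))
    delta = k * (1 - expected_win)
    return r_winner + delta, r_loser - delta

a, b = elo_update(1000, 1000)  # equal ratings: a coin-flip matchup
print(round(a), round(b))  # 1016 984
```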
Hugging News #114 :ilovepython: @everyone
Gemma
:google: Google releases Gemma 2 2B, ShieldGemma and Gemma Scope
:google: Gemma 2 2B in your browser thanks to MLC, 100% local and super fast with WebLLM + WebGPU!
:google: Gemma 2 2B running in a free Google Colab, powered by Transformers!
:google: Simple instructions to get started with the latest Gemma 2 models + llama.cpp!
Llama 3.1 405B released. 🎏 MagPie-Ultra is the first open dataset using Llama 3.1 405B-Instruct FP8 to generate 50,000 synthetic instruction pairs using the MagPie recipe and
@argilla_io
distilabel. It includes challenging instructions for coding, math, data analysis, creative writing, advice-seeking, and brainstorming. ⚗️

MagPie datasets are created by prompting LLMs with "empty" prompts that consist only of starting special tokens, allowing the model to auto-regressively generate user queries and corresponding responses, which are then filtered to select high-quality data. 👨‍🎓

Note: The dataset is unfiltered but includes quality & difficulty scores, embeddings, topics, and safety scores from ArmoRM and LlamaGuard. 🛡

⚗️ Pipeline: https://huggingface.co/datasets/argilla/magpie-ultra-v0.1/blob/main/pipeline.py
🤗 Dataset: https://huggingface.co/datasets/argilla/magpie-ultra-v0.1
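The "empty" prompt trick is concrete: you send only the chat template's pre-query tokens, everything up to where the user message would normally start, so the instruct model autocompletes a plausible user query itself. A sketch using the Llama 3 chat-template token strings:

```python
# MagPie "empty" prompt: only the special tokens that precede a user
# message in the Llama 3 chat template. Given this prefix, the instruct
# model auto-regressively generates a user query; appending the
# assistant header afterwards elicits the matching response.

PRE_QUERY = (
    "<|begin_of_text|>"
    "<|start_header_id|>user<|end_header_id|>\n\n"
)

assert PRE_QUERY.endswith("\n\n")  # the model continues from here
print(PRE_QUERY.count("header_id"))  # 2
```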
Dropping magpie-ultra-v0.1, the first open synthetic dataset built with Llama 3.1 405B.

Created with distilabel, it's our most advanced and compute-intensive pipeline to date.

https://huggingface.co/datasets/argilla/magpie-ultra-v0.1

Let's dig into the details!
Today we release our first foundation model. OCRonos-Vintage is a 124-million-parameter model pretrained end-to-end by
@pleiasfr
on 18 billion tokens of cultural heritage archives, with nearly SOTA results for OCR correction in English
SF3D from Stability claims state-of-the-art mesh reconstruction.

let's see if it's true

⚔️ Added to 3D Arena https://huggingface.co/spaces/dylanebert/3d-arena
ROAM Challenge 2: LATAM Out-of-Distribution Few-shot Challenge
Develop models that can classify unusual or specific vehicle types from minimal training data, a crucial skill in environments with unique vehicular regulations.
https://huggingface.co/spaces/Artificio/ROAM2FewShotChallenge
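A classic few-shot baseline for a challenge like this is nearest-centroid classification in an embedding space. A toy sketch — the 2-D vectors and class names below are made up, standing in for real image embeddings:

```python
# Few-shot baseline sketch: average the few support embeddings per
# class into a centroid, then assign a query to the nearest centroid.

def centroid(vectors):
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]

def classify(x, support):
    # support: {label: [embedding, ...]} with only a few shots per class
    def dist2(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    cents = {label: centroid(vs) for label, vs in support.items()}
    return min(cents, key=lambda label: dist2(x, cents[label]))

support = {
    "mototaxi": [[0.0, 0.1], [0.2, 0.0]],  # hypothetical vehicle class
    "truck":    [[1.0, 1.1], [0.9, 1.0]],
}
print(classify([0.1, 0.0], support))  # mototaxi
```

With a strong pretrained image encoder producing the embeddings, this simple rule is a surprisingly competitive starting point for out-of-distribution few-shot tasks.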