HF-hub - Share and discover more about AI with social posts from the community.huggingface/OpenAi
Share and discover more about AI with social posts from the community.huggingface/OpenAi
🌟 Argilla v2.1.0 goes multi-modal: Image Field, Dark Mode, Enhanched Hugging Face Hub imports and more!

🖼 Image Field: Seamlessly work with multimodal datasets
🌓 Dark Mode: Reduce eye strain with our sleek new look
🤗 Enhanced Hugging Face Hub import with the SDK
🇪🇸 Spanish UI: Breaking language barriers

Plus more improvements to supercharge your model curation workflow!

Check out the full announcement for details and code examples: https://github.com/argilla-io/argilla/compare/v2.0.1...v2.1.0 Comparing v2.0.1...v2.1.0 · argilla-io/argilla
Wanted to train a FLUX model using out-of-copyright images, so I curated concept art images from NASA.

Model: https://huggingface.co/davanstrien/nasa_concept_art
Dataset:
davanstrien/nasa_concept_art


So far, training was done without captions, but I'm experimenting with using VLLMs to generate captions to see if that improves the model. davanstrien/nasa_concept_art-flux-lora · Hugging Face
💾🧠How much VRAM will you need for training your AI model? 💾🧠
Check out this app where you convert:
Pytorch/tensorflow summary -> required VRAM
or
Parameter count -> required VRAM

Use it in: http://howmuchvram.com

And everything is open source! Ask for new functionalities or contribute in:
https://github.com/AlexBodner/How_Much_VRAM
If it's useful to you leave a star 🌟and share it to someone that will find the tool useful!
More discussion in: https://x.com/AlexBodner_/status/1832054850294812679
Yesterday @mattshumer released
mattshumer/Reflection-Llama-3.1-70B
, an impressive model that achieved incredible results in benchmarks like MMLU. The model was fine-tuned using Reflection-Tuning and the dataset used wasn't released, but I created a small recipe with distilabel that allows generating a dataset with a similar output format:

1. We use MagPie 🐦 in combination with
meta-llama/Meta-Llama-3.1-70B-Instruct
to generate reasoning instructions.
2. We generate a response again using
meta-llama/Meta-Llama-3.1-70B-Instruct
, but we steer the LLM to generate an specific output format using a custom system prompt. In the system prompt, we instruct the LLM that it will have first to think 💭 and have reflections that will help resolving ambiguities. After that, we instruct the LLM to generate an output based on the previous thinking

In this dataset
gabrielmbmb/distilabel-reflection-tuning
you can found 5 rows that I generated with this recipe. You can also found the code of the pipeline in the file called reflection.py.
FLUX Prompt Generator Updates

-
gokaygokay/FLUX-Prompt-Generator


- There are now hundreds of new selections across diverse categories, each offering a lot of choices:

Architecture, Art, Artist, Brands, Character, Cinematic, Fashion, Feelings, Geography, Human, Interaction, Keywords, Objects, People, Photography, Plots, Poses, Scene, Science, Stuff, Time, Typography, Vehicle, Video Game

- In addition to Hugging Face, I've integrated new LLM providers: Groq, OpenAI, and Claude.

- Upgraded Vision Language Models (VLMs): We now feature Qwen2-VL and Florence-2-large.

- New specialized system prompts for various styles and themes, including Happy, Simple, Poster, Only Objects, No Figure, Landscape, Fantasy.https://cdn-uploads.huggingface.co/production/uploads/630899601dd1e3075d975785/u_IZ43q0247UaH2_LK07W.png
Reposting from twitter:

Just so you all know, I'll be on vacation for the following two weeks and away from home! I'm hoping to get on at least once a day to load up some quants, but I won't be as bleeding edge and on the ball :) feel free to shoot me a message if you see one I should make!

In the meantime if you need something bleeding edge make sure to check out @MaziyarPanahi or @bullerwins who both put out great work!
Flux actually has deforum (a "classical" method of generating videos using a text graph model)! ? It feels like a renaissance, and I'm dreaming back to 2022🥹 (By the way, the flux ecosystem is developing really fast! 🤔)

GitHub - XLabs-AI/deforum-x-flux: Deforum based on flux-dev by XLabs-AI

🧐Deforum-x-flux is a project based on flux-dev, mainly used to create high-quality animations and video generation, especially in combination with Stable Diffusion technology for image-to-video conversion. It provides two running modes: CLI and Jupyter Notebook, and supports complex 3D animation modes and interpolation functions.

➡️Link: Web link
Did you see the new coding model from @01-ai ?

collection :
01-ai/yi-coder-66bdb00f5bdd611f9a008f30

demo :
Tonic/Yi-Coder-9B https://huggingface.co/spaces/Tonic/Yi-Coder-9B


achieves SOTA on benchmarks , 125K context window , 55 languages including Docker, Js and many more 🚀 Yi Coder 9B - a Hugging Face Space by Tonic
🌐 Introducing PPT Online Dataset -
nyuuzyou/pptonline


Dataset highlights:
- Metadata for 1,418,349 PowerPoint (.ppt) files from ppt-online.org
- Multilingual content: Russian, Ukrainian, Belarusian, Kazakh, English, and others
- Each entry includes: Unique ID, title, category, download link, file size, and content snippet
- Data reflects presentations accessible through the PPT Online platform
- Licensed under Creative Commons Zero (CC0) for unrestricted use

This dataset offers a unique window into online educational resources, particularly in Eastern European and Central Asian contexts. It provides opportunities for analyzing presentation trends, topic distributions, and language patterns in educational materials.
🚀 𝗪𝗵𝗲𝗿𝗲 𝘀𝗰𝗮𝗹𝗶𝗻𝗴 𝗹𝗮𝘄𝘀 𝗮𝗿𝗲 𝘁𝗮𝗸𝗶𝗻𝗴 𝘂𝘀 : 𝗯𝘆 𝟮𝟬𝟮𝟴, 𝗔𝗜 𝗖𝗹𝘂𝘀𝘁𝗲𝗿𝘀 𝘄𝗶𝗹𝗹 𝗿𝗲𝗮𝗰𝗵 𝘁𝗵𝗲 𝗽𝗼𝘄𝗲𝗿 𝗰𝗼𝗻𝘀𝘂𝗺𝗽𝘁𝗶𝗼𝗻 𝗼𝗳 𝗲𝗻𝘁𝗶𝗿𝗲 𝗰𝗼𝘂𝗻𝘁𝗿𝗶𝗲𝘀

Reminder : “Scaling laws” are empirical laws saying that if you keep multiplying your compute by x10, your models will mechanically keep getting better and better.

To give you an idea, GPT-3 can barely write sentences, and GPT-4, which only used x15 its amount of compute, already sounds much smarter than some of my friends (although it's not really - or at least I haven't tested them side-by side). So you can imagine how far a x100 over GPT-4 can take us.

🏎 As a result, tech titans are racing to build the biggest models, and for this they need gigantic training clusters.

The picture below shows the growth of training compute: it is increasing at a steady exponential rate of a x10 every 2 years. So let’s take this progress a bit further:
- 2022: starting training for GPT-4 : 10^26 FLOPs, cost of $100M
- 2024: today, companies start training on much larger clusters like the “super AI cluster” of Elon Musk’s xAI, 10^27 FLOPS, $1B
- 2026 : by then clusters will require 1GW, i.e. around the full power generated by a nuclear reactor
- 2028: we reach cluster prices in the 100 billion dollars, using 10GW, more than the most powerful power stations currently in use in the US. This last size seems crazy, but Microsoft and OpenAI already are planning one.

Will AI clusters effectively reach these crazy sizes where the consume as much as entire countries?
➡️ Three key ingredients of training might be a roadblock to scaling up :
💸 Money: but it’s very unlikely, given the potential market size for AGI, that investors lose interest.
⚡️ Energy supply at a specific location
📚 Training data: we’re already using 15 trillion tokens for Llama-3.1 when Internet has something like 60 trillion.

🤔 I’d be curious to hear your thoughts: do you think we’ll race all the way there?
How do i access llama 3.1 70b in my space ?

this doesn't seem to work, can someone help me with a working code


from transformers import AutoConfig

config = AutoConfig.from_pretrained("meta-llama/Meta-Llama-3.1-70B", revision="main")
config.rope_scaling = {"type": "llama3", "factor": 8.0}

model = AutoModelForCausalLM.from_pretrained("meta-llama/Meta-Llama-3.1-70B", config=config, use_auth_token=True)
I have put together a notebook on Multimodal RAG, where we do not process the documents with hefty pipelines but natively use:
-
vidore/colpali
for retrieval 📖 it doesn't need indexing with image-text pairs but just images!
-
Qwen/Qwen2-VL-2B-Instruct
for generation 💬 directly feed images as is to a vision language model with no processing to text!
I used ColPali implementation of the new 🐭 Byaldi library by @bclavie 🤗
https://github.com/answerdotai/byaldi
Link to notebook: https://github.com/merveenoyan/smol-vision/blob/main/ColPali_%2B_Qwen2_VL.ipynb GitHub - AnswerDotAI/byaldi: Use late-interaction multi-modal models such as ColPali in just a few lines of code.
The timm leaderboard
timm/leaderboard
has been updated with the ability to select different hardware benchmark sets: RTX4090, RTX3090, two different CPUs along with some NCHW / NHWC layout and torch.compile (dynamo) variations.

Also worth pointing out, there are three rather newish 'test' models that you'll see at the top of any samples/sec comparison:
* test_vit (
timm/test_vit.r160_in1k
)
* test_efficientnet (
timm/test_efficientnet.r160_in1k
)
* test_byobnet (
timm/test_byobnet.r160_in1k
, a mix of resnet, darknet, effnet/regnet like blocks)

They are < 0.5M params, insanely fast and originally intended for unit testing w/ real weights. They have awful ImageNet top-1, it's rare to have anyone bother to train a model this small on ImageNet (the classifier is roughly 30-70% of the param count!). However, they are FAST on very limited hadware and you can fine-tune them well on small data. Could be the model you're looking for?
Decided to try to check how many weights in a 70b F32 model would be squashed when converted to F16 (spoiler, it's shockingly few)

The reason for this comparison is that it should represent the same percentage of squishing as bf16 to fp16

Had claude make me a script, using the new Reflection-70B, and these are the results:

Total weights: 70553706496
Fully representable: 70530215524
Squashed: 23490972
Percentage squashed: 0.03%

0.03%!!!!

A couple things to note, this uses a roundtrip of F32 -> F16 -> F32 and then torch.isclose to account for rounding errors that come up by the very nature of extremely accurate numbers, but it uses VERY small tolerances (rtol=1e-5, atol=1e-8)

This is also examining EVERY weight that was stored at F32, and for most layers I was somewhere between 0% and 0.03% of weights being squashed, no major outliers.

Overall, I feel even safer converting to F16 for llama.cpp, the extremely small number of weights that fall outside the range are likely so small that they don't actually play a role in the final output of the model at inference anyways.
🚀 Introducing Hugging Face's Multilingual Speech-to-Speech! 🎤
💬Our modular, cross-platform pipeline to run GPT4o-like experiences on device can now seamlessly switch languages mid-conversation with an imperceptible 100ms delay.

🌟 Building on an amazing early reception with 2600 stars on GitHub 🌟
🚀 We are expanding the library to support multiple languages
🔥 Try it out with a flag: --language fr
🤯 Or don't set the flag and let the system detect the language

💡 What feature should we add next? https://cdn-uploads.huggingface.co/production/uploads/65d66b494bbd0d92b641cdbb/WbpkWi8OlJGXnL1kzmcqK.mp4
@victor Sorry for the repetitiveness.

I'm not sure if Post is the right place to report such an error, but it seems to be a server error unrelated to the Zero GPU space error the other day, so I don't know where else to report it.

Since this morning, I have been getting a strange error when running inference from space in Gradio 3.x.
Yntec (https://huggingface.co/Yntec) discovered it, but he is not in the Pro subscription, so I am reporting it on behalf of him.

The error message is as follows: 1girl and other prompts will show cached output, so experiment with unusual prompts.

Thank you in advance.

John6666/blitz_diffusion_error

John6666/GPU-stresser-t2i-error Yntec (Yn Tec)
A few weeks ago, we uploaded the MERIT Dataset 🎒📃🏆 into Hugging Face 🤗!

Now, we are excited to share the Merit Dataset paper via arXiv! 📃💫
The MERIT Dataset: Modelling and Efficiently Rendering Interpretable Transcripts (2409.00447)


The MERIT Dataset is a fully synthetic, labeled dataset created for training and benchmarking LLMs on Visually Rich Document Understanding tasks. It is also designed to help detect biases and improve interpretability in LLMs, where we are actively working. 🔧🔨

MERIT contains synthetically rendered students' transcripts of records from different schools in English and Spanish. We plan to expand the dataset into different contexts (synth medical/insurance documents, synth IDS, etc.) Want to collaborate? Do you have any feedback? 🧐

Resources:

- Dataset:
de-Rodrigo/merit

- Code and generation pipeline: https://github.com/nachoDRT/MERIT-Dataset

PD: We are grateful to Hugging Face 🤗 for providing the fantastic tools and resources we find in the platform and, more specifically, to @nielsr for sharing the fine-tuning/inference scripts we have used in our benchmark. GitHub - nachoDRT/MERIT-Dataset: The MERIT Dataset is a fully synthetic, labeled dataset created for training and benchmarking…
I am integrating Azure Cosmos DB, the database system that backs GPT conversations into my workflow, and experimenting with new patterns to accelerate dataset evolution for evaluation and training of AI.

While initially using it for research prompts and research outputs using my GPT-4o client here which can interface and search ArXiv, I am excited to try out some new features specifically for AI at scale. Research on memory augmentation is shown.
awacke1/GPT-4o-omni-text-audio-image-video


awacke1/AzureCosmosDBUI https://huggingface.co/spaces/awacke1/GPT-4o-omni-text-audio-image-video 📝🔊GPT4O🖼️🎥 - a Hugging Face Space by awacke1
A few weeks ago, we uploaded the MERIT Dataset 🎒📃🏆 into Hugging Face 🤗!

Now, we are excited to share the Merit Dataset paper via arXiv! 📃💫
The MERIT Dataset: Modelling and Efficiently Rendering Interpretable Transcripts (2409.00447)


The MERIT Dataset is a fully synthetic, labeled dataset created for training and benchmarking LLMs on Visually Rich Document Understanding tasks. It is also designed to help detect biases and improve interpretability in LLMs, where we are actively working. 🔧🔨

MERIT contains synthetically rendered students' transcripts of records from different schools in English and Spanish. We plan to expand the dataset into different contexts (synth medical/insurance documents, synth IDS, etc.) Want to collaborate? Do you have any feedback? 🧐

Resources:

- Dataset:
de-Rodrigo/merit

- Code and generation pipeline: https://github.com/nachoDRT/MERIT-Dataset

PD: We are grateful to Hugging Face 🤗 for providing the fantastic tools and resources we find in the platform and, more specifically, to @nielsr for sharing the fine-tuning/inference scripts we have used in our benchmark. GitHub - nachoDRT/MERIT-Dataset: The MERIT Dataset is a fully synthetic, labeled dataset created for training and benchmarking…