HF-hub - Share and discover more about AI with social posts from the community.huggingface/OpenAi
Share and discover more about AI with social posts from the community.huggingface/OpenAi
Remember when @Google launched MediaPipe in an effort to create efficient on-device pipelines?

They've just unlocked the ability to run 7B+ parameter language models directly in your browser. This is a game-changer for on-device AI!

Yes, they are streaming 8.6 GB model files!

Currently, they have Gemma 2B/7B running, but imagine Dynamic LoRA, multimodal support, quantization, and you never leaving Chrome!

This is a significant technical advancement, especially in Memory Optimization:

- Redesigned the model-loading code to work around WebAssembly's 4 GB memory limit.
- Implemented asynchronous loading of transformer stack layers (28 for Gemma 1.1 7B).
- Reduced peak WebAssembly memory usage to less than 1% of previous requirements.

Cross-Platform Compatibility
- Compiled the C++ codebase to WebAssembly for broad browser support.
- Utilized the WebGPU API for native GPU acceleration in browsers.

Here's why this matters:

1. Privacy: No need to send data to remote servers.
2. Cost-Efficiency: Eliminates server expenses.
3. Offline Capabilities: Use powerful AI without an internet connection.

Blog: https://research.google/blog/unlocking-7b-language-models-in-your-browser-a-deep-dive-with-google-ai-edges-mediapipe/
The Hugging Face Semantic Dataset Search Space is back in action! You can find similar datasets by ID or perform a semantic search of dataset cards.

Give it a try:
librarian-bots/huggingface-datasets-semantic-search
https://huggingface.co/spaces/librarian-bots/huggingface-datasets-semantic-search Semantic Dataset Search - a Hugging Face Space by librarian-bots
Ultimate FLUX LoRA Training Tutorial: Windows and Cloud Deployment

I have done total 104 different LoRA trainings and compared each one of them to find the very best hyper parameters and the workflow for FLUX LoRA training by using Kohya GUI training script.

You can see all the done experiments’ checkpoint names and their repo links in following public post: https://www.patreon.com/posts/110838414

After completing all these FLUX LoRA trainings by using the most VRAM optimal and performant optimizer Adafactor I came up with all of the following ranked ready to use configurations.

You can download all the configurations, all research data, installers and instructions at the following link : https://www.patreon.com/posts/110879657


Tutorials
I also have prepared 2 full tutorials. First tutorial covers how to train and use the best FLUX LoRA locally on your Windows computer : https://youtu.be/nySGu12Y05k

This is the main tutorial that you have to watch without skipping to learn everything. It has total 74 chapters, manually written English captions. It is a perfect resource to become 0 to hero for FLUX LoRA training.

The second tutorial I have prepared is for how to train FLUX LoRA on cloud. This tutorial is super extremely important for several reasons. If you don’t have a powerful GPU, you can rent a very powerful and very cheap GPU on Massed Compute and RunPod. I prefer Massed Compute since it is faster and cheaper with our special coupon SECourses. Another reason is that in this tutorial video, I have fully in details shown how to train on a multiple GPU setup to scale your training speed. Moreover, I have shown how to upload your checkpoints and files ultra fast to Hugging Face for saving and transferring for free. Still watch first above Windows tutorial to be able to follow below cloud tutorial : https://youtu.be/-uhL2nW7Ddw

For upscaling SUPIR used : https://youtu.be/OYxVEvDf284 All The LoRA FLUX Training Experiments I Have Done So Far | SECourses: Tutorials, Guides, Resources, Training, FLUX, MidJourney…
NEW RELEASE!

- MOTH is a generalist chat model, using high quality synthetic data to improve general performance.
- Currently available for Llama 3.1 and Gemma 2, more models to follow in the future.

get the models:
sequelbox/Llama3.1-8B-MOTH https://huggingface.co/sequelbox/Llama3.1-8B-MOTH

sequelbox/gemma-2-9B-MOTHhttps://huggingface.co/sequelbox/gemma-2-9B-MOTH


get the dataset:
sequelbox/Supernova


<3 for everyone to use <3 sequelbox/Llama3.1-8B-MOTH · Hugging Face
The world’s first multilingual ColBERT: Jina ColBERT V2 and its “Russian Doll” technology
In the field of RAG, the multi-vector model ColBERT improves retrieval accuracy by generating independent vectors for each token of the document. But it also brings about a sharp increase in storage requirements, and only supports English, which limits its application scope. To solve these problems, we improved the architecture and training process of ColBERT, especially making breakthroughs in multi-language processing. The latest Jina-ColBERT-v2 supports 89 languages ​​and introduces custom output dimension options, significantly reducing storage requirements and improving the efficiency and accuracy of multi-language retrieval. The core highlights of the new version are performance enhancements: compared with the original ColBERT-v2, the English retrieval performance has improved by 6.5%; compared with the previous generation jina-colbert-v1-en, the performance has also improved by 5.4%. Multi-language support: The new version supports up to 89 languages, covering Arabic, Chinese, English, Japanese, Russian and other languages, and also supports programming languages. The output dimensions can be customized: The new version adopts "Russian doll" representation learning technology (Matryoshka Representation Learning, MRL) and provides 128, 96 and 64-dimensional output vector options, allowing users to choose the appropriate dimensions according to actual needs. The full technical report can be found on arXiv: https://arxiv.org/abs/2408.16672
SemanticFinder now supports WebGPU thanks to @Xenova's efforts with transformers.js v3!
Expect massive performance gains. Inferenced a whole book with 46k chunks in <5min. If your device doesn't support #WebGPU use the classic Wasm-based version:
- WebGPU: https://do-me.github.io/SemanticFinder/webgpu/
- Wasm: https://do-me.github.io/SemanticFinder/

WebGPU harnesses the full power of your hardware, no longer being restricted to just the CPU. The speedup is significant (4-60x) for all kinds of devices: consumer-grade laptops, heavy Nvidia GPU setups or Apple Silicon. Measure the difference for your device here:
Xenova/webgpu-embedding-benchmark

Chrome currently works out of the box, Firefox requires some tweaking.

WebGPU + transformers.js allows to build amazing applications and make them accessible to everyone. E.g. SemanticFinder could become a simple GUI for populating your (vector) DB of choice. See the pre-indexed community texts here:
do-me/SemanticFinder

Happy to hear your ideas!
This is an absolutely mind-boggling experiment!

@GuangyuRobert (Twitter Handle) from MIT has created Project Sid, which simulates over 1,000 autonomous AI agents collaborating in a Minecraft environment, operating for extended periods without human intervention. This simulation demonstrates unprecedented levels of agent interaction, decision-making, and societal development.

Agents operate independently for hours or days, showcasing advanced decision-making algorithms and goal-oriented behavior.

The simulation produced complex, emergent phenomena, including:
- Economic systems with currency (gems) and trading
- Cultural development and religious practices
- Agents even understood bribing. Priests were moving the most gems to bribe people into following them!
- Governmental structures and democratic processes

Project Sid addresses fundamental challenges in AI research:
- Coherence: Maintaining consistent agent behavior over extended periods.
- Multi-agent Collaboration: Enabling effective communication and coordination among numerous AI entities.
- Long-term Progression: Developing agents capable of learning and evolving over time.

While Minecraft serves as the initial testbed, the underlying AI architecture is designed to be game-agnostic, suggesting potential applications in various digital environments and real-world simulations.

Imagine a policy being debated by the government and how it might affect society; Sid can simulate its impact!

Even if this remains just a game experiment, the project successfully manages 1,000+ agents simultaneously, a feat that requires robust distributed computing and efficient agent architecture.

02:35
🌐 Introducing Edupres.ru Presentations Dataset -
nyuuzyou/edupres


Dataset highlights:
- Metadata for 44,210 presentations from edupres.ru
- 21,941 presentations available in original format
- Multilingual content: Primarily Russian, with some Ukrainian, Belarusian, and English
- Each entry includes: URL, title, description, author, publication date, file size, and download link
- Data reflects educational presentations accessible through the Edupres.ru platform
- Licensed under Creative Commons Zero (CC0) for unrestricted use

This dataset offers a unique window into online educational resources, particularly in Russian-language contexts. It provides opportunities for analyzing presentation trends, topic distributions, and language patterns in educational materials. The dataset is particularly well-suited for tasks such as text classification and text retrieval in multilingual educational settings.
An example of the application of LegalKit is the production of knowledge graphs, here is a demo Space 🔗

With the update of the French legal code data model uploaded to 🤗 and the introduction of a column dedicated to HTML text, it's now easy to extract links between different articles and produce complex graphs with just a few lines of Python.

This simplified demo highlights the ease of implementation and creative potential, and enables the generation of complete data sets, although requiring a powerful graphics card for display. The framework used for the moment is D3.js, but perhaps other solutions are possible. I'd be delighted to hear your suggestions, and look forward to hearing from the community.

Link to the 🤗 Space:
louisbrulenaudet/legalkit-knowledge-graph
I'm excited to share my article introducing AISAK's new flagship model, AISAK-O (Artificially Intelligent Swiss Army Knife OPTIMUM). You can read the full details here:

https://huggingface.co/blog/mandelakori/aisak-o

Key highlights of AISAK-O include:

8 billion parameters and a 32k token context length
Multimodal capabilities for processing both text and visual data
Impressive benchmark scores, surpassing GPT-4V in some areas
Specialized in tasks like image captioning, visual reasoning, and cohesive content generation
Efficient architecture competing with larger models
We're also offering a unique beta testing opportunity with access to inference code.

For more information or partnership inquiries, please contact us at [email protected].

I hope you find this advancement in multimodal AI as exciting as we do!
aisak-ai/O Introducing AISAK-O
Reflection Llama 3.1 70B (Correct Weights) on ZeroGPU thanks to llama.cpp and unsloth (for quantization)

ZeroGPU space
-
gokaygokay/Reflection-70B-llamacpp https://huggingface.co/spaces/gokaygokay/Reflection-70B-llamacpp


- Working Model
mattshumer/ref_70_e3


- Quantized Models
unsloth/Reflection-Llama-3.1-70B-GGUF Reflection 70B llama.cpp (Correct Weights) - a Hugging Face Space by gokaygokay
FLUX Gif Generator
Create GIFs with Flux-dev. Based on @fofr's tweet.

For better results include a description of the motion in your prompt
Reflection Llama-3.1 70B
| IMPORTANT UPDATE – There was an issue with the model when we first uploaded it. If you tried it and didn't have good results, please, try again, we think we've fixed the issue.

Reflection Llama-3.1 70B is (currently) the world's top open-source LLM, trained with a new technique called Reflection-Tuning that teaches a LLM to detect mistakes in its reasoning and correct course.

The model was trained on synthetic data generated by Glaive. If you're training a model, Glaive is incredible — use them.

You can try the model here.

Benchmarks
🌟 Argilla v2.1.0 goes multi-modal: Image Field, Dark Mode, Enhanched Hugging Face Hub imports and more!

🖼 Image Field: Seamlessly work with multimodal datasets
🌓 Dark Mode: Reduce eye strain with our sleek new look
🤗 Enhanced Hugging Face Hub import with the SDK
🇪🇸 Spanish UI: Breaking language barriers

Plus more improvements to supercharge your model curation workflow!

Check out the full announcement for details and code examples: https://github.com/argilla-io/argilla/compare/v2.0.1...v2.1.0 Comparing v2.0.1...v2.1.0 · argilla-io/argilla
Wanted to train a FLUX model using out-of-copyright images, so I curated concept art images from NASA.

Model: https://huggingface.co/davanstrien/nasa_concept_art
Dataset:
davanstrien/nasa_concept_art


So far, training was done without captions, but I'm experimenting with using VLLMs to generate captions to see if that improves the model. davanstrien/nasa_concept_art-flux-lora · Hugging Face
💾🧠How much VRAM will you need for training your AI model? 💾🧠
Check out this app where you convert:
Pytorch/tensorflow summary -> required VRAM
or
Parameter count -> required VRAM

Use it in: http://howmuchvram.com

And everything is open source! Ask for new functionalities or contribute in:
https://github.com/AlexBodner/How_Much_VRAM
If it's useful to you leave a star 🌟and share it to someone that will find the tool useful!
More discussion in: https://x.com/AlexBodner_/status/1832054850294812679
Yesterday @mattshumer released
mattshumer/Reflection-Llama-3.1-70B
, an impressive model that achieved incredible results in benchmarks like MMLU. The model was fine-tuned using Reflection-Tuning and the dataset used wasn't released, but I created a small recipe with distilabel that allows generating a dataset with a similar output format:

1. We use MagPie 🐦 in combination with
meta-llama/Meta-Llama-3.1-70B-Instruct
to generate reasoning instructions.
2. We generate a response again using
meta-llama/Meta-Llama-3.1-70B-Instruct
, but we steer the LLM to generate an specific output format using a custom system prompt. In the system prompt, we instruct the LLM that it will have first to think 💭 and have reflections that will help resolving ambiguities. After that, we instruct the LLM to generate an output based on the previous thinking

In this dataset
gabrielmbmb/distilabel-reflection-tuning
you can found 5 rows that I generated with this recipe. You can also found the code of the pipeline in the file called reflection.py.
FLUX Prompt Generator Updates

-
gokaygokay/FLUX-Prompt-Generator


- There are now hundreds of new selections across diverse categories, each offering a lot of choices:

Architecture, Art, Artist, Brands, Character, Cinematic, Fashion, Feelings, Geography, Human, Interaction, Keywords, Objects, People, Photography, Plots, Poses, Scene, Science, Stuff, Time, Typography, Vehicle, Video Game

- In addition to Hugging Face, I've integrated new LLM providers: Groq, OpenAI, and Claude.

- Upgraded Vision Language Models (VLMs): We now feature Qwen2-VL and Florence-2-large.

- New specialized system prompts for various styles and themes, including Happy, Simple, Poster, Only Objects, No Figure, Landscape, Fantasy.https://cdn-uploads.huggingface.co/production/uploads/630899601dd1e3075d975785/u_IZ43q0247UaH2_LK07W.png
Reposting from twitter:

Just so you all know, I'll be on vacation for the following two weeks and away from home! I'm hoping to get on at least once a day to load up some quants, but I won't be as bleeding edge and on the ball :) feel free to shoot me a message if you see one I should make!

In the meantime if you need something bleeding edge make sure to check out @MaziyarPanahi or @bullerwins who both put out great work!