Share and discover more about AI with social posts from the community.
Introducing Storage Regions on the Hub
As part of our Enterprise Hub plan, we recently released support for Storage Regions.

Regions let you decide where your org's models and datasets will be stored. This has two main benefits, which we'll briefly go over in this blog post:

Regulatory and legal compliance, and more generally, better digital sovereignty
Performance (improved download and upload speeds and latency)
Currently we support the following regions:

US 🇺🇸
EU 🇪🇺
coming soon: Asia-Pacific 🌏
But first, let's see how to set up this feature in your organization's settings 🔥
Creating open machine learning datasets? Share them on the Hugging Face Hub!
Who is this blog post for?
Are you a researcher doing data-intensive research or using machine learning as a research tool? As part of this research, you have likely created datasets for training and evaluating machine learning models, and like many researchers, you may be sharing these datasets via Google Drive, OneDrive, or your own personal server. In this post, we’ll outline why you might want to consider sharing these datasets on the Hugging Face Hub instead.

This post outlines:

Why researchers should openly share their data (feel free to skip this section if you are already convinced about this!)
What the Hugging Face Hub offers for researchers who want to share their datasets.
Resources for getting started with sharing your datasets on the Hugging Face Hub.
https://github.com/huggingface/blog/blob/main/researcher-dataset-sharing.md
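To give a sense of how little effort this takes, here is a minimal sketch of uploading a dataset with the datasets library; the CSV path and repository name are placeholders, not from the post, and you need to be authenticated (e.g. via huggingface-cli login) before pushing.

```python
# A minimal sketch of sharing a dataset on the Hub; file path and repo name are placeholders.
from datasets import load_dataset

# Load a local file (CSV, JSON, Parquet, ...) into a Dataset object.
dataset = load_dataset("csv", data_files="my_research_data.csv")

# Push it to a public or private repository under your account or organization.
dataset.push_to_hub("my-username/my-research-dataset")
```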
Illustrating Reinforcement Learning from Human Feedback (RLHF)
Language models have shown impressive capabilities in the past few years by generating diverse and compelling text from human input prompts. However, what makes a "good" text is inherently hard to define, as it is subjective and context-dependent. For example, when writing stories you want creativity, informative text should be truthful, and code snippets should be executable.
https://github.com/huggingface/blog/blob/main/rlhf.md
Rocket Money x Hugging Face: Scaling Volatile ML Models in Production
"We discovered that they were not just service providers, but partners who were invested in our goals and outcomes” - Nicolas Kuzak, Senior ML Engineer at Rocket Money.
Scaling and Maintaining ML Models in Production Without an MLOps Team
We created Rocket Money (a personal finance app formerly known as Truebill) to help users improve their financial wellbeing. Users link their bank accounts to the app, which then classifies and categorizes their transactions, identifying recurring patterns to provide a consolidated, comprehensive view of their personal financial life. A critical stage of transaction processing is detecting known merchants and services, some of which Rocket Money can cancel and negotiate the cost of for members. This detection starts with the transformation of short, often truncated and cryptically formatted transaction strings into classes we can use to enrich our product experience.
Deploy MusicGen in no time with Inference Endpoints
MusicGen is a powerful music generation model that takes in a text prompt and an optional melody and outputs music. This blog post will guide you through generating music with MusicGen using Inference Endpoints.

Inference Endpoints allow us to write custom inference functions called custom handlers. These are particularly useful when a model is not supported out-of-the-box by the high-level pipeline abstraction in transformers.

transformers pipelines offer powerful abstractions to run inference with transformers-based models. Inference Endpoints leverage the pipeline API to easily deploy models with only a few clicks. However, Inference Endpoints can also be used to deploy models that don't have a pipeline, or even non-transformer models! This is achieved using a custom inference function that we call a custom handler.

Let's demonstrate this process using MusicGen as an example. To implement a custom handler function for MusicGen and deploy it, we will need to:

Duplicate the MusicGen repository we want to serve,
Write a custom handler in handler.py and any dependencies in requirements.txt and add them to the duplicated repository,
Create Inference Endpoint for that repository.
Or simply use the final result and deploy our custom MusicGen model repo, where we have already followed the steps above :)
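As an illustration, a custom handler for MusicGen could look roughly like the sketch below; the pre- and post-processing details are assumptions for this example, not the exact handler from the repo above.

```python
# handler.py - a minimal sketch of a custom Inference Endpoints handler for MusicGen.
from typing import Any, Dict, List

import torch
from transformers import AutoProcessor, MusicgenForConditionalGeneration


class EndpointHandler:
    def __init__(self, path: str = ""):
        # Load the processor and model from the duplicated repository.
        self.device = "cuda" if torch.cuda.is_available() else "cpu"
        self.processor = AutoProcessor.from_pretrained(path)
        self.model = MusicgenForConditionalGeneration.from_pretrained(path).to(self.device)

    def __call__(self, data: Dict[str, Any]) -> List[Dict[str, Any]]:
        # "inputs" carries the text prompt sent to the endpoint.
        prompt = data.get("inputs", data)
        inputs = self.processor(text=[prompt], padding=True, return_tensors="pt").to(self.device)
        with torch.no_grad():
            audio = self.model.generate(**inputs, max_new_tokens=256)
        # Return raw audio samples plus the sampling rate so the client can decode them.
        return [{
            "generated_audio": audio[0].cpu().numpy().tolist(),
            "sampling_rate": self.model.config.audio_encoder.sampling_rate,
        }]
```

Any extra packages the handler needs would go into requirements.txt in the same repository.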
Introducing RWKV - An RNN with the advantages of a transformer
ChatGPT and chatbot-powered applications have captured significant attention in the Natural Language Processing (NLP) domain. The community is constantly seeking strong, reliable and open-source models for their applications and use cases. The rise of these powerful models stems from the democratization and widespread adoption of transformer-based models, first introduced by Vaswani et al. in 2017. These models significantly outperformed previous SoTA NLP models based on Recurrent Neural Networks (RNNs), which many considered obsolete after that paper. In this blog post, we introduce the integration of a new architecture, RWKV, which combines the advantages of both RNNs and transformers and has recently been integrated into the Hugging Face transformers library.
https://github.com/huggingface/blog/blob/main/rwkv.md
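Since the integration, RWKV checkpoints can be loaded through the standard transformers API; a quick sketch, where the checkpoint id is assumed to be one of the converted RWKV-4 models:

```python
# A minimal sketch of running an RWKV checkpoint through transformers.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "RWKV/rwkv-4-169m-pile"  # assumed checkpoint id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

inputs = tokenizer("In a shocking finding, scientists discovered", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```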
Ryght’s Journey to Empower Healthcare and Life Sciences with Expert Support from Hugging Face
[!NOTE] This is a guest blog post by the Ryght team.

Who is Ryght?
Ryght is building an enterprise-grade generative AI platform tailored for the healthcare and life sciences sectors. Today marks the official launch of Ryght Preview, now publicly available to all.

Life science companies are amassing a wealth of data from diverse sources (lab data, EMR, genomics, claims, pharmacy, clinical, etc.), but analysis of that data is archaic, requiring large teams for everything from simple queries to developing useful ML models. There is huge demand for actionable knowledge to drive drug development, clinical trials, and commercial activity, and the rise of precision medicine is only accelerating this demand.
SafeCoder vs. Closed-source Code Assistants
For decades, software developers have designed methodologies, processes, and tools that help them improve code quality and increase productivity. For instance, agile, test-driven development, code reviews, and CI/CD are now staples in the software industry.

In "How Google Tests Software" (Addison-Wesley, 2012), Google reports that fixing a bug during system tests - the final testing stage - is 1000x more expensive than fixing it at the unit testing stage. This puts much pressure on developers - the first link in the chain - to write quality code from the get-go.

For all the hype surrounding generative AI, code generation seems a promising way to help developers deliver better code fast. Indeed, early studies show that managed services like GitHub Copilot or Amazon CodeWhisperer help developers be more productive.

However, these services rely on closed-source models that can't be customized to your technical culture and processes. Hugging Face released SafeCoder a few weeks ago to fix this. SafeCoder is a code assistant solution built for the enterprise that gives you state-of-the-art models, transparency, customizability, IT flexibility, and privacy.

In this post, we'll compare SafeCoder to closed-source services and highlight the benefits you can expect from our solution.
https://github.com/huggingface/blog/blob/main/safecoder-vs-closed-source-code-assistants.md
Audit shows that safetensors is safe and ready to become the default
Hugging Face, in close collaboration with EleutherAI and Stability AI, has ordered an external security audit of the safetensors library, the results of which allow all three organizations to move toward making the library the default format for saved models.

The full results of the security audit, performed by Trail of Bits, can be found here: Report.

The following blog post explains the origins of the library, why these audit results are important, and the next steps.
https://github.com/huggingface/blog/blob/main/safetensors-security-audit.md
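For readers new to the library, the core API is a simple save/load pair for dictionaries of tensors; a minimal sketch with PyTorch tensors (the tensor names and shapes are illustrative):

```python
# A minimal sketch of the safetensors workflow: save a dict of tensors to a file
# and load it back without executing arbitrary code (unlike pickle-based formats).
import torch
from safetensors.torch import load_file, save_file

weights = {
    "embedding.weight": torch.randn(100, 16),
    "lm_head.weight": torch.randn(16, 100),
}
save_file(weights, "model.safetensors")

restored = load_file("model.safetensors")
print(restored["embedding.weight"].shape)  # torch.Size([100, 16])
```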
Distributed Training: Train BART/T5 for Summarization using 🤗 Transformers and Amazon SageMaker
Open on Github
In case you missed it: on March 25th we announced a collaboration with Amazon SageMaker to make it easier to create State-of-the-Art Machine Learning models, and ship cutting-edge NLP features faster.

Together with the SageMaker team, we built 🤗 Transformers-optimized Deep Learning Containers to accelerate training of Transformers-based models. Thanks AWS friends! 🤗 🚀

With the new HuggingFace estimator in the SageMaker Python SDK, you can start training with a single line of code.
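To give a flavor of what that looks like, here is a hedged sketch of configuring and launching a training job with the HuggingFace estimator; the script name, instance type, framework versions, and hyperparameters below are placeholders, so check the supported Deep Learning Container versions before running.

```python
# A minimal sketch of launching a SageMaker training job with the HuggingFace estimator.
import sagemaker
from sagemaker.huggingface import HuggingFace

role = sagemaker.get_execution_role()

huggingface_estimator = HuggingFace(
    entry_point="train.py",            # your training script (assumed name)
    source_dir="./scripts",
    instance_type="ml.p3.2xlarge",     # illustrative instance type
    instance_count=1,
    role=role,
    transformers_version="4.6",        # illustrative versions; use a supported combination
    pytorch_version="1.7",
    py_version="py36",
    hyperparameters={"epochs": 3, "model_name_or_path": "facebook/bart-large-cnn"},
)

# The "single line of code" that starts training on data already uploaded to S3.
huggingface_estimator.fit({"train": "s3://<bucket>/train", "test": "s3://<bucket>/test"})
```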
Introducing the Hugging Face Embedding Container for Amazon SageMaker
We are excited to announce that the new Hugging Face Embedding Container for Amazon SageMaker is now generally available (GA). AWS customers can now efficiently deploy embedding models on SageMaker to build Generative AI applications, including Retrieval-Augmented Generation (RAG) applications.

In this blog post, we will show you how to deploy open embedding models, such as Snowflake/snowflake-arctic-embed-l, BAAI/bge-large-en-v1.5, or sentence-transformers/all-MiniLM-L6-v2, to Amazon SageMaker for inference using the new Hugging Face Embedding Container. We will deploy Snowflake/snowflake-arctic-embed-m-v1.5, one of the best open embedding models for retrieval - you can check its rankings on the MTEB Leaderboard.

The example covers:

1. Setup development environment
2. Retrieve the new Hugging Face Embedding Container
3. Deploy Snowflake Arctic to Amazon SageMaker
4. Run and evaluate Inference performance
5. Delete model and endpoint
https://github.com/huggingface/blog/blob/main/sagemaker-huggingface-embedding.md
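The deployment step boils down to a few lines with the SageMaker Python SDK; a hedged sketch, where the container backend string and instance type are assumptions, not taken from the post:

```python
# A minimal sketch of deploying an embedding model with the Hugging Face Embedding
# Container on Amazon SageMaker.
import sagemaker
from sagemaker.huggingface import HuggingFaceModel, get_huggingface_llm_image_uri

role = sagemaker.get_execution_role()

# Retrieve the embedding container image (backend name assumed).
image_uri = get_huggingface_llm_image_uri("huggingface-tei")

model = HuggingFaceModel(
    role=role,
    image_uri=image_uri,
    env={"HF_MODEL_ID": "Snowflake/snowflake-arctic-embed-m-v1.5"},
)

# Deploy to a real-time endpoint (instance type is illustrative).
predictor = model.deploy(initial_instance_count=1, instance_type="ml.g5.xlarge")

# Embed a sentence and inspect the returned vector.
embedding = predictor.predict({"inputs": "What is Retrieval-Augmented Generation?"})
print(len(embedding[0]))
```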
Introducing the Hugging Face LLM Inference Container for Amazon SageMaker
This is an example of how to deploy open-source LLMs, like BLOOM, to Amazon SageMaker for inference using the new Hugging Face LLM Inference Container. We will deploy the 12B Pythia Open Assistant model, an open-source chat LLM trained with the Open Assistant dataset.

The example covers:

Setup development environment
Retrieve the new Hugging Face LLM DLC
Deploy Open Assistant 12B to Amazon SageMaker
Run inference and chat with our model
Create Gradio Chatbot backed by Amazon SageMaker
You can also find the code for the example in the notebooks repository.
https://github.com/huggingface/blog/blob/main/sagemaker-huggingface-llm.md
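In outline, the deployment looks like the sketch below; the model id, instance type, and environment values are assumptions for illustration rather than the exact settings from the post.

```python
# A minimal sketch of deploying Open Assistant 12B with the Hugging Face LLM
# Inference Container (TGI) on Amazon SageMaker.
import sagemaker
from sagemaker.huggingface import HuggingFaceModel, get_huggingface_llm_image_uri

role = sagemaker.get_execution_role()

# Retrieve the LLM Deep Learning Container image URI.
llm_image = get_huggingface_llm_image_uri("huggingface")

llm_model = HuggingFaceModel(
    role=role,
    image_uri=llm_image,
    env={
        "HF_MODEL_ID": "OpenAssistant/pythia-12b-sft-v8-7k-steps",  # assumed model id
        "SM_NUM_GPUS": "4",            # ml.g5.12xlarge has 4 GPUs
        "MAX_INPUT_LENGTH": "1024",
        "MAX_TOTAL_TOKENS": "2048",
    },
)

llm = llm_model.deploy(initial_instance_count=1, instance_type="ml.g5.12xlarge")

# Chat-style prompt using the Open Assistant role tokens.
prompt = "<|prompter|>What is Amazon SageMaker?<|endoftext|><|assistant|>"
response = llm.predict({"inputs": prompt, "parameters": {"max_new_tokens": 256}})
print(response[0]["generated_text"])
```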
Machine Learning Experts - Sasha Luccioni
🤗 Welcome to Machine Learning Experts - Sasha Luccioni
🚀 If you're interested in learning how ML Experts, like Sasha, can help accelerate your ML roadmap, visit: hf.co/support.

Hey friends! Welcome to Machine Learning Experts. I'm your host, Britney Muller, and today's guest is Sasha Luccioni. Sasha is a Research Scientist at Hugging Face where she works on the ethical and societal impacts of Machine Learning models and datasets.

Sasha is also a co-chair of the Carbon Footprint WG of the Big Science Workshop, on the Board of WiML, and a founding member of the Climate Change AI (CCAI) organization, which catalyzes impactful work applying machine learning to the climate crisis.
https://github.com/huggingface/blog/blob/main/sasha-luccioni-interview.md
Welcome Stable-baselines3 to the Hugging Face Hub 🤗
At Hugging Face, we are contributing to the ecosystem for Deep Reinforcement Learning researchers and enthusiasts. That’s why we’re happy to announce that we have integrated Stable-Baselines3 with the Hugging Face Hub.

Stable-Baselines3 is one of the most popular PyTorch Deep Reinforcement Learning libraries, making it easy to train and test your agents in a variety of environments (Gym, Atari, MuJoCo, Procgen...). With this integration, you can now host your saved models 💾 and load powerful models from the community.

In this article, we’re going to show how you can do it.
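As a preview, loading a community model from the Hub can look roughly like this; the repo id and filename are illustrative, and the sketch assumes a recent Stable-Baselines3 version that uses Gymnasium environments.

```python
# A minimal sketch of loading a Stable-Baselines3 agent from the Hub with huggingface_sb3.
import gymnasium as gym
from huggingface_sb3 import load_from_hub
from stable_baselines3 import PPO

# Download a trained agent checkpoint from the Hub (repo id and filename assumed).
checkpoint = load_from_hub(
    repo_id="sb3/ppo-CartPole-v1",
    filename="ppo-CartPole-v1.zip",
)
model = PPO.load(checkpoint)

# Run the loaded policy for one episode.
env = gym.make("CartPole-v1")
obs, _ = env.reset()
done = False
while not done:
    action, _ = model.predict(obs, deterministic=True)
    obs, reward, terminated, truncated, _ = env.step(action)
    done = terminated or truncated
```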
StarCoder2-Instruct: Fully Transparent and Permissive Self-Alignment for Code Generation
Instruction tuning is an approach to fine-tuning that gives large language models (LLMs) the capability to follow natural, human-written instructions. However, for programming tasks, most models are tuned on either human-written instructions (which are very expensive) or instructions generated by huge and proprietary LLMs (which may not be permitted). We introduce StarCoder2-15B-Instruct-v0.1, the very first entirely self-aligned code LLM trained with a fully permissive and transparent pipeline. Our open-source pipeline uses StarCoder2-15B to generate thousands of instruction-response pairs, which are then used to fine-tune StarCoder2-15B itself without any human annotations or distilled data from huge and proprietary LLMs.
Interactively explore your Hugging Face dataset with one line of code
The Hugging Face datasets library not only provides access to more than 70k publicly available datasets, but also offers very convenient data preparation pipelines for custom datasets.

Renumics Spotlight allows you to create interactive visualizations to identify critical clusters in your data. Because Spotlight understands the data semantics within Hugging Face datasets, you can get started with just one line of code:
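A hedged sketch of that one-liner is shown below; the dataset name is just an example.

```python
# A minimal sketch: load a Hub dataset and open it in Spotlight's interactive view.
import datasets
from renumics import spotlight

ds = datasets.load_dataset("rotten_tomatoes", split="train")
spotlight.show(ds)  # opens the interactive visualization in your browser
```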
Diffusers welcomes Stable Diffusion 3
Stable Diffusion 3 (SD3), Stability AI’s latest iteration of the Stable Diffusion family of models, is now available on the Hugging Face Hub and can be used with 🧨 Diffusers.

The model released today is Stable Diffusion 3 Medium, with 2B parameters.

As part of this release, we have provided:

Models on the Hub
Diffusers Integration
SD3 Dreambooth and LoRA training scripts
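Running SD3 Medium with Diffusers looks roughly like the sketch below; it assumes you have accepted the model license, are logged in, and have a GPU with enough memory, and the prompt and sampler settings are illustrative.

```python
# A minimal sketch of text-to-image generation with Stable Diffusion 3 Medium in Diffusers.
import torch
from diffusers import StableDiffusion3Pipeline

pipe = StableDiffusion3Pipeline.from_pretrained(
    "stabilityai/stable-diffusion-3-medium-diffusers", torch_dtype=torch.float16
).to("cuda")

image = pipe(
    prompt="a photo of a cat holding a sign that says hello world",
    num_inference_steps=28,
    guidance_scale=7.0,
).images[0]
image.save("sd3_cat.png")
```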
Open-sourcing Knowledge Distillation Code and Weights of SD-Small and SD-Tiny
In recent times, the AI community has witnessed a remarkable surge in the development of larger and more performant language models, such as Falcon 40B, LLaMa-2 70B, and MPT 30B, and in the imaging domain with models like SD2.1 and SDXL. These advancements have undoubtedly pushed the boundaries of what AI can achieve, enabling highly versatile and state-of-the-art image generation and language understanding capabilities. However, as we marvel at the power and complexity of these models, it is essential to recognize a growing need to make AI models smaller, efficient, and more accessible, particularly by open-sourcing them.
Accelerating Stable Diffusion XL Inference with JAX on Cloud TPU v5e
Generative AI models, such as Stable Diffusion XL (SDXL), enable the creation of high-quality, realistic content with wide-ranging applications. However, harnessing the power of such models presents significant challenges and computational costs. SDXL is a large image generation model whose UNet component is about three times as large as the one in the previous version of the model. Deploying a model like this in production is challenging due to the increased memory requirements, as well as increased inference times. Today, we are thrilled to announce that Hugging Face Diffusers now supports serving SDXL using JAX on Cloud TPUs, enabling high-performance, cost-efficient inference.

Google Cloud TPUs are custom-designed AI accelerators, which are optimized for training and inference of large AI models, including state-of-the-art LLMs and generative AI models such as SDXL. The new Cloud TPU v5e is purpose-built to bring the cost-efficiency and performance required for large-scale AI training and inference. At less than half the cost of TPU v4, TPU v5e makes it possible for more organizations to train and deploy AI models.

🧨 Diffusers JAX integration offers a convenient way to run SDXL on TPU via XLA, and we built a demo to showcase it. You can try it out in this Space.

<script type="module" src="https://gradio.s3-us-west-2.amazonaws.com/3.45.1/gradio.js"> </script>
Under the hood, this demo runs on several TPU v5e-4 instances (each instance has 4 TPU chips) and takes advantage of parallelization to serve four large 1024×1024 images in about 4 seconds. This time includes format conversions, communication time, and frontend processing; the actual generation time is about 2.3s, as we'll see below!
LoRA training scripts of the world, unite!
A community-derived guide to some of the SOTA practices for SDXL Dreambooth LoRA fine-tuning

TL;DR

We combined the Pivotal Tuning technique used on Replicate's SDXL Cog trainer with the Prodigy optimizer used in the Kohya trainer (plus a bunch of other optimizations) to achieve very good results on training Dreambooth LoRAs for SDXL. Check out the training script on diffusers🧨. Try it out on Colab.

If you want to skip the technical talk, you can use all the techniques in this blog and train on Hugging Face Spaces with a simple UI and curated parameters (that you can meddle with).