Patch Time Series Transformer in Hugging Face - Getting Started
Open In Colab
In this blog, we provide examples of how to get started with PatchTST. We first demonstrate the forecasting capability of PatchTST on the Electricity dataset. We will then demonstrate the transfer learning capability of PatchTST by using the previously trained model to do zero-shot forecasting on the electrical transformer (ETTh1) dataset. The zero-shot forecasting performance denotes the test performance of the model in the target domain, without any training on the target domain. Subsequently, we will do linear probing and then finetuning of the pretrained model on the train part of the target data, and validate the forecasting performance on the test part of the target data.

The PatchTST model was proposed in A Time Series is Worth 64 Words: Long-term Forecasting with Transformers by Yuqi Nie, Nam H. Nguyen, Phanwadee Sinthong, and Jayant Kalagnanam, and was presented at ICLR 2023.
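To give a feel for the API, here is a minimal sketch of building a PatchTST forecaster with the transformers library. The configuration values are illustrative rather than the blog's exact settings, and the dummy tensor stands in for a real batch from the Electricity dataset (which has 321 series).

```python
# Minimal sketch of PatchTST forecasting with 🤗 Transformers.
# Config values are illustrative, not the blog's exact hyperparameters.
import torch
from transformers import PatchTSTConfig, PatchTSTForPrediction

config = PatchTSTConfig(
    num_input_channels=321,  # number of series in the Electricity dataset
    context_length=512,      # look-back window
    prediction_length=96,    # forecast horizon
    patch_length=16,
    patch_stride=16,
)
model = PatchTSTForPrediction(config)

# Dummy batch with shape (batch_size, context_length, num_input_channels).
past_values = torch.randn(4, 512, 321)
outputs = model(past_values=past_values)
print(outputs.prediction_outputs.shape)  # forecasts for each channel over the horizon
```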
PEFT: Parameter-Efficient Fine-Tuning of Billion-Scale Models on Low-Resource Hardware
Motivation
Large Language Models (LLMs) based on the transformer architecture, like GPT, T5, and BERT, have achieved state-of-the-art results in various Natural Language Processing (NLP) tasks. They have also started foraying into other domains, such as Computer Vision (CV) (ViT, Stable Diffusion, LayoutLM) and Audio (Whisper, XLS-R). The conventional paradigm is large-scale pretraining on generic web-scale data, followed by fine-tuning on downstream tasks. Fine-tuning these pretrained LLMs on downstream datasets results in huge performance gains when compared to using the pretrained LLMs out-of-the-box (zero-shot inference, for example).

However, as models get larger and larger, full fine-tuning becomes infeasible to train on consumer hardware. In addition, storing and deploying fine-tuned models independently for each downstream task becomes very expensive, because fine-tuned models are the same size as the original pretrained model. Parameter-Efficient Fine-tuning (PEFT) approaches are meant to address both problems!
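As a concrete illustration of the parameter savings, here is a minimal sketch of wrapping a pretrained model with a LoRA adapter using the peft library; the base checkpoint and hyperparameters are illustrative.

```python
# Minimal sketch: adding a LoRA adapter with 🤗 PEFT so only a small fraction
# of parameters is trained. Base model and hyperparameters are illustrative.
from transformers import AutoModelForSeq2SeqLM
from peft import LoraConfig, TaskType, get_peft_model

model = AutoModelForSeq2SeqLM.from_pretrained("bigscience/mt0-large")

peft_config = LoraConfig(
    task_type=TaskType.SEQ_2_SEQ_LM,
    r=8,            # rank of the low-rank update matrices
    lora_alpha=32,
    lora_dropout=0.1,
)
model = get_peft_model(model, peft_config)

# Reports how few parameters are actually trainable (typically well under 1%).
model.print_trainable_parameters()
```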
PEFT welcomes new merging methods
Model merging has quickly become the de facto standard for pushing the performance limits of large language models. On the Open LLM Leaderboard, we continue to notice merged models topping the charts. Our very own Omar Sanseviero did a little sprint on model merging and discovered some interesting findings.

The typical way of model merging, so far, has been to take a set of models and merge them. This post gives a nice primer on the topic. Generally, to merge multiple models, we first download their checkpoints and then perform the merge. Depending on the merge algorithm and the sizes of the underlying models, this process can be quite memory-intensive. The mergekit library provides optimized ways of handling this, making the process manageable on limited memory.

Read the full post: https://github.com/huggingface/blog/blob/main/peft_merging.md
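The post centers on PEFT's adapter-level merging. Below is a hedged sketch of merging two LoRA adapters into a new one with add_weighted_adapter; the adapter repositories are hypothetical placeholders, and the combination type is just one of several supported options (e.g. linear, svd, ties, dare_ties).

```python
# Hedged sketch: merging two LoRA adapters with PEFT's add_weighted_adapter.
# The adapter repo names below are hypothetical placeholders.
from transformers import AutoModelForCausalLM
from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained("mistralai/Mistral-7B-v0.1")
model = PeftModel.from_pretrained(base, "user/lora-adapter-a", adapter_name="task_a")
model.load_adapter("user/lora-adapter-b", adapter_name="task_b")

# TIES-style merge; other combination types include "linear", "svd", "dare_ties".
model.add_weighted_adapter(
    adapters=["task_a", "task_b"],
    weights=[1.0, 1.0],
    adapter_name="merged",
    combination_type="ties",
    density=0.5,  # fraction of weights kept when trimming, as in TIES
)
model.set_adapter("merged")
```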
Perceiver IO: a scalable, fully-attentional model that works on any modality
TLDR
We've added Perceiver IO to Transformers, the first Transformer-based neural network that works on all kinds of modalities (text, images, audio, video, point clouds,...) and combinations thereof. Take a look at the following Spaces to view some examples:

predicting optical flow between images
classifying images.
We also provide several notebooks.
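If you prefer code over Spaces, here is a hedged sketch of running the image-classification checkpoint through transformers; the class and checkpoint names follow the Perceiver IO integration, but treat the exact snippet as illustrative.

```python
# Hedged sketch: image classification with Perceiver IO via 🤗 Transformers.
import requests
from PIL import Image
from transformers import AutoImageProcessor, PerceiverForImageClassificationLearned

url = "http://images.cocodataset.org/val2017/000000039769.jpg"  # standard demo image
image = Image.open(requests.get(url, stream=True).raw)

processor = AutoImageProcessor.from_pretrained("deepmind/vision-perceiver-learned")
model = PerceiverForImageClassificationLearned.from_pretrained("deepmind/vision-perceiver-learned")

pixel_values = processor(image, return_tensors="pt").pixel_values
logits = model(inputs=pixel_values).logits  # Perceiver models take a generic `inputs` argument
print(model.config.id2label[logits.argmax(-1).item()])
```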

Below, you can find a technical explanation of the model.
Personal Copilot: Train Your Own Coding Assistant
In the ever-evolving landscape of programming and software development, the quest for efficiency and productivity has led to remarkable innovations. One such innovation is the emergence of code generation models such as Codex, StarCoder and Code Llama. These models have demonstrated remarkable capabilities in generating human-like code snippets, thereby showing immense potential as coding assistants.

However, while these pre-trained models can perform impressively across a range of tasks, there's an exciting possibility lying just beyond the horizon: the ability to tailor a code generation model to your specific needs. Think of personalized coding assistants which could be leveraged at an enterprise scale.

In this blog post we show how we created HugCoder 🤗, a code LLM fine-tuned on the code contents from the public repositories of the huggingface GitHub organization. We will discuss our data collection workflow.
Let’s begin 🚀
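As a hedged sketch of what such a data-collection step might look like (not the post's exact pipeline), the snippet below lists an organization's public repositories via the GitHub REST API and shallow-clones them so their files can later be turned into a training dataset.

```python
# Hedged sketch of a data-collection step: enumerate an org's public GitHub
# repositories and shallow-clone them locally. Not the post's exact pipeline.
import subprocess
from pathlib import Path

import requests

ORG = "huggingface"
OUT = Path("hf_repos")
OUT.mkdir(exist_ok=True)

page = 1
while True:
    resp = requests.get(
        f"https://api.github.com/orgs/{ORG}/repos",
        params={"per_page": 100, "page": page},
        timeout=30,
    )
    repos = resp.json()
    if not repos:
        break
    for repo in repos:
        target = OUT / repo["name"]
        if not target.exists():
            subprocess.run(
                ["git", "clone", "--depth", "1", repo["clone_url"], str(target)],
                check=False,
            )
    page += 1
```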
A Chatbot on your Laptop: Phi-2 on Intel Meteor Lake
Because of their impressive abilities, large language models (LLMs) require significant computing power, which is seldom available on personal computers. Consequently, we often have no choice but to deploy them on powerful bespoke AI servers, hosted on-premises or in the cloud.

Why local LLM inference is desirable
What if we could run state-of-the-art open-source LLMs on a typical personal computer? Wouldn't we enjoy benefits like:

Increased privacy: our data would not be sent to an external API for inference.
Lower latency: we would save network round trips.
Offline work: we could work without network connectivity (a frequent flyer's dream!).
Lower cost: we wouldn't spend any money on API calls or model hosting.
Customizability: each user could find the models that best fit the tasks they work on daily, and they could even fine-tune them or use local Retrieval-Augmented Generation (RAG) to increase relevance.
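The post targets Intel Meteor Lake hardware through Intel's tooling. As a hedged sketch of what local inference can look like, the snippet below loads microsoft/phi-2 with the OpenVINO backend of Optimum Intel and runs it through a standard transformers pipeline; it illustrates the idea rather than reproducing the post's exact setup.

```python
# Hedged sketch: local inference of microsoft/phi-2 with the OpenVINO backend
# of Optimum Intel. `export=True` converts the PyTorch checkpoint on the fly.
from transformers import AutoTokenizer, pipeline
from optimum.intel import OVModelForCausalLM

model_id = "microsoft/phi-2"
model = OVModelForCausalLM.from_pretrained(model_id, export=True)
tokenizer = AutoTokenizer.from_pretrained(model_id)

pipe = pipeline("text-generation", model=model, tokenizer=tokenizer)
prompt = "Explain why running an LLM locally helps with privacy."
print(pipe(prompt, max_new_tokens=64)[0]["generated_text"])
```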
Building a Playlist Generator with Sentence Transformers
A short while ago I published a playlist generator that I’d built using Sentence Transformers and Gradio, and I followed that up with a reflection on how I try to use my projects as effective learning experiences. But how did I actually build the playlist generator? In this post we’ll break down that project and look at two technical details: how the embeddings were generated, and how the multi-step Gradio demo was built.

<iframe src="https://nimaboscarino-playlist-generator.hf.space" frameBorder="0" width="1400" height="690" title="Gradio app" class="p-0 flex-grow space-iframe" allow="accelerometer; ambient-light-sensor; autoplay; battery; camera; document-domain; encrypted-media; fullscreen; geolocation; gyroscope; layout-animations; legacy-image-formats; magnetometer; microphone; midi; oversized-images; payment; picture-in-picture; publickey-credentials-get; sync-xhr; usb; vr ; wake-lock; xr-spatial-tracking" sandbox="allow-forms allow-modals allow-popups allow-popups-to-escape-sandbox allow-same-origin allow-scripts allow-downloads"></iframe>
As we’ve explored in previous posts on the Hugging Face blog, Sentence Transformers (ST) is a library that gives us tools to generate sentence embeddings, which have a variety of uses. Since I had access to a dataset of song lyrics, I decided to leverage ST’s semantic search functionality to generate playlists from a given text prompt. Specifically, the goal was to create an embedding from the prompt and use it for a semantic search across a set of pre-generated lyrics embeddings, retrieving a relevant set of songs. This would all be wrapped up in a Gradio app using the new Blocks API, hosted on Hugging Face Spaces.

We’ll be looking at a slightly advanced use of Gradio, so if you’re a beginner to the library I recommend reading the Introduction to Blocks before tackling the Gradio-specific parts of this post. Also, note that while I won’t be releasing the lyrics dataset, the lyrics embeddings are available on the Hugging Face Hub for you to play around with. Let’s jump in! 🪂
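For context, here is a minimal sketch of the core embed-and-search step with Sentence Transformers; the model name and the tiny in-memory corpus are stand-ins (the real project uses pre-generated lyrics embeddings from the Hub).

```python
# Minimal sketch: embed a prompt and run semantic search over a corpus of
# lyric embeddings. Model name and corpus are illustrative stand-ins.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")

lyrics = [
    "dancing all night under neon lights",
    "rainy mornings and black coffee",
    "driving fast on an empty highway",
]
corpus_embeddings = model.encode(lyrics, convert_to_tensor=True)

prompt_embedding = model.encode("a melancholic rainy day", convert_to_tensor=True)
hits = util.semantic_search(prompt_embedding, corpus_embeddings, top_k=2)[0]
for hit in hits:
    print(lyrics[hit["corpus_id"]], round(hit["score"], 3))
```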
Public Policy at Hugging Face
AI Policy at Hugging Face is a multidisciplinary and cross-organizational workstream. Instead of being part of a vertical communications or global affairs organization, our policy work is rooted in the expertise of our many researchers and developers, from Ethics and Society Regulars and the legal team to machine learning engineers working on healthcare, art, and evaluations.

What we work on is informed by our Hugging Face community's needs and experiences on the Hub. We champion responsible openness, investing heavily in ethics-forward research, transparency mechanisms, and platform safeguards, and we translate the lessons we learn into policy.

So what have we shared with policymakers?
Pollen-Vision: Unified interface for Zero-Shot vision models in robotics
Note: This is a guest blog post by the Pollen Robotics team. We are the creators of Reachy, an open-source humanoid robot designed for manipulation in the real world.

In the context of autonomous behaviors, the essence of a robot's usability lies in its ability to understand and interact with its environment. This understanding primarily comes from visual perception, which enables robots to identify objects, recognize people, navigate spaces, and much more.

We're excited to share the initial launch of our open-source pollen-vision library, a first step towards empowering our robots with the autonomy to grasp unknown objects. This library is a carefully curated collection of vision models chosen for their direct applicability to robotics. Pollen-vision is designed for ease of installation and use, and is composed of independent modules that can be combined into a 3D object detection pipeline that returns the position of objects in 3D space (x, y, z).

We focused on selecting zero-shot models, eliminating the need for any training, and making these tools instantly usable right out of the box.

Our initial release is focused on 3D object detection—laying the groundwork for tasks like robotic grasping by providing a reliable estimate of objects' spatial coordinates. Currently limited to positioning within a 3D space (not extending to full 6D pose estimation), this functionality establishes a solid foundation for basic robotic manipulation tasks.
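To illustrate the kind of zero-shot model such a pipeline builds on (this is not the pollen-vision API itself), here is a hedged sketch of zero-shot 2D object detection with OWL-ViT through the transformers pipeline; in a robotics setting, the resulting boxes would then be combined with depth to obtain (x, y, z).

```python
# Hedged sketch: zero-shot object detection with OWL-ViT via 🤗 Transformers.
# This shows the kind of model a robotics vision pipeline can build on; it is
# not the pollen-vision API itself.
import requests
from PIL import Image
from transformers import pipeline

detector = pipeline("zero-shot-object-detection", model="google/owlvit-base-patch32")

url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)

predictions = detector(image, candidate_labels=["a cat", "a remote control"])
for p in predictions:
    print(p["label"], round(p["score"], 3), p["box"])
```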
Porting fairseq wmt19 translation system to transformers
A guest blog post by Stas Bekman
This article is an attempt to document how the fairseq wmt19 translation system was ported to transformers.

I was looking for some interesting project to work on and Sam Shleifer suggested I work on porting a high quality translator.

I read the short paper: Facebook FAIR's WMT19 News Translation Task Submission that describes the original system and decided to give it a try.

Initially, I had no idea how to approach this complex project and Sam helped me break it down into smaller tasks, which was a great help.

I chose to work with the pre-trained en-ru/ru-en models during porting as I speak both languages. It'd have been much more difficult to work with de-en/en-de pairs as I don't speak German, and being able to evaluate the translation quality by just reading and making sense of the outputs at the advanced stages of the porting process saved me a lot of time.

Also, as I did the initial porting with the en-ru/ru-en models, I was totally unaware that the de-en/en-de models used a merged vocabulary, whereas the former used 2 separate vocabularies of different sizes. So once I did the more complicated work of supporting 2 separate vocabularies, it was trivial to get the merged vocabulary to work.
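The ported models live in transformers as the FSMT architecture. A short usage sketch with the en-ru checkpoint:

```python
# Using one of the ported WMT19 checkpoints (FSMT) from transformers.
from transformers import FSMTForConditionalGeneration, FSMTTokenizer

mname = "facebook/wmt19-en-ru"
tokenizer = FSMTTokenizer.from_pretrained(mname)
model = FSMTForConditionalGeneration.from_pretrained(mname)

input_ids = tokenizer("Machine learning is great, isn't it?", return_tensors="pt").input_ids
outputs = model.generate(input_ids, num_beams=5)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```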
Preference Tuning LLMs with Direct Preference Optimization Methods
Addendum

After consulting with the authors of the IPO paper, we discovered that the implementation of IPO in TRL was incorrect; in particular, the loss over the log-likelihoods of the completions needs to be averaged instead of summed. We have added a fix in this PR and re-run the experiments. The results are now consistent with the paper, with IPO on par with DPO and performing better than KTO in the paired preference setting. We have updated the post to reflect these new results.

TL;DR

We evaluate three promising methods to align language models without reinforcement learning (or preference tuning) on a number of models and hyperparameter settings. In particular, we train using different hyperparameters and evaluate on:

Direct Preference Optimization (DPO)
Identity Preference Optimisation (IPO)
Kahneman-Tversky Optimisation (KTO)
Read the full post: https://github.com/huggingface/blog/blob/main/pref-tuning.md
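For orientation, here is a hedged sketch of how these losses are switched in TRL. Argument names have changed across TRL releases, so treat this as illustrative only: it assumes a recent version where the loss is selected via DPOConfig(loss_type=...), an illustrative small instruct model, and a paired-preference dataset with prompt/chosen/rejected columns.

```python
# Hedged sketch: selecting DPO / IPO / paired-KTO losses in TRL.
# Names and availability of options vary across TRL versions.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

model_name = "Qwen/Qwen2-0.5B-Instruct"  # illustrative small model
model = AutoModelForCausalLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# A paired-preference dataset with "prompt", "chosen" and "rejected" columns.
dataset = load_dataset("trl-lib/ultrafeedback_binarized", split="train")

args = DPOConfig(
    output_dir="dpo-out",
    beta=0.1,
    loss_type="sigmoid",  # "sigmoid" = DPO; "ipo" and "kto_pair" select the other losses
)
trainer = DPOTrainer(model=model, args=args, train_dataset=dataset, processing_class=tokenizer)
trainer.train()
```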
Experimenting with Automatic PII Detection on the Hub using Presidio
At Hugging Face, we've noticed a concerning trend in machine learning (ML) datasets hosted on our Hub: Undocumented private information about individuals. This poses some unique challenges for ML practitioners. In this blog post, we'll explore different types of datasets containing a type of private information known as Personally Identifying Information (PII), the issues they present, and a new feature we're experimenting with on the Dataset Hub to help address these challenges.

Types of Datasets with PII
We noticed two types of datasets that contain PII:

Annotated PII datasets: Datasets like PII-Masking-300k by Ai4Privacy are specifically designed to train PII Detection Models, which are used to detect and mask PII. For example, these models can help with online content moderation or provide anonymized databases.
Pre-training datasets: These are large-scale datasets, often terabytes in size, that are typically obtained through web crawls. While these datasets are generally filtered to remove certain types of PII, small amounts of sensitive information can still slip through the cracks due to the sheer volume of data and the imperfections of PII Detection Models.

Read the full post: https://github.com/huggingface/blog/blob/main/presidio-pii-detection.md
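For a sense of what Presidio does under the hood, here is a minimal sketch of analyzing a single string with its AnalyzerEngine; the example text is made up.

```python
# Minimal sketch: detecting PII entities in a string with Presidio.
from presidio_analyzer import AnalyzerEngine

analyzer = AnalyzerEngine()
text = "Contact Jane Doe at jane.doe@example.com or +1-202-555-0173."

results = analyzer.analyze(text=text, language="en")
for r in results:
    print(r.entity_type, text[r.start:r.end], round(r.score, 2))
```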
Pre-Training BERT with Hugging Face Transformers and Habana Gaudi
In this tutorial, you will learn how to pre-train BERT-base from scratch using a Habana Gaudi-based DL1 instance on AWS to take advantage of the cost-performance benefits of Gaudi. We will use the Hugging Face Transformers, Optimum Habana, and Datasets libraries to pre-train a BERT-base model using masked language modeling, one of the two original BERT pre-training tasks. Before we get started, we need to set up the deep learning environment.

View Code
You will learn how to:

Prepare the dataset
Train a Tokenizer
Preprocess the dataset
Pre-train BERT on Habana Gaudi
Note: Steps 1 to 3 can/should be run on a different instance size since those are CPU intensive tasks.
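As a hedged sketch of steps 2 and 3 (not the tutorial's exact code), the snippet below trains a WordPiece tokenizer with the tokenizers library and builds the masked-language-modeling collator used for pre-training; the corpus path and vocabulary size are placeholders.

```python
# Hedged sketch of steps 2-3: train a WordPiece tokenizer and set up the
# masked-LM data collator. Paths and sizes are placeholders.
from tokenizers import BertWordPieceTokenizer
from transformers import BertTokenizerFast, DataCollatorForLanguageModeling

# Step 2: train a WordPiece tokenizer on raw text files.
tokenizer = BertWordPieceTokenizer(lowercase=False)
tokenizer.train(files=["data/corpus.txt"], vocab_size=32_000)  # hypothetical corpus path
tokenizer.save_model("tokenizer")

# Step 3: wrap the tokenizer for transformers and create the MLM collator
# that randomly masks 15% of tokens, as in the original BERT pre-training.
hf_tokenizer = BertTokenizerFast.from_pretrained("tokenizer")
data_collator = DataCollatorForLanguageModeling(
    tokenizer=hf_tokenizer, mlm=True, mlm_probability=0.15
)
```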
Going multimodal: How Prezi is leveraging the Hub and the Expert Support Program to accelerate their ML roadmap
Everybody knows that a great visual is worth a thousand words. The team at Prezi, a visual communications software company, is putting this insight into practice with Prezi presentations, which combine images and text in a highly dynamic format.

Prezi has joined the Hugging Face Expert Support Program to fully leverage modern machine learning's potential. Over the past months, Hugging Face has supported Prezi in integrating smaller, more efficient open-source models into their ML workflows. This cooperation started at a perfect time, as multimodal models are becoming increasingly capable.

We recently sat down with Máté Börcsök, a backend engineer at Prezi, to talk about their experience in the Expert Support Program. In this short video, Máté walks us through some of their machine learning work and shares their experience collaborating with our team via the Expert Support Program.

<iframe width="100%" style="aspect-ratio: 16 / 9;" src="https://www.youtube.com/embed/pM6D0tRoIbI" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen></iframe>
If you'd like to accelerate your machine learning roadmap with the help of our experts, as Máté and his team did, visit hf.co/support to learn more about our Expert Support Program and request a quote.
Introducing our new pricing
As you might have noticed, our pricing page has changed a lot recently.

First of all, we are sunsetting the Paid tier of the Inference API service. The Inference API will still be available for everyone to use for free. But if you're looking for fast, enterprise-grade inference as a service, we recommend checking out our brand new solution for this: Inference Endpoints.

Along with Inference Endpoints, we've recently introduced hardware upgrades for Spaces, which allow running ML demos with the hardware of your choice. No subscription is required to use these services; you only need to add a credit card to your account from your billing settings. You can also attach a payment method to any of your organizations.

Your billing settings centralize everything about our paid services. From there, you can manage your personal PRO subscription, update your payment method, and visualize your usage for the past three months. Usage for all our paid services and subscriptions will be charged at the start of each month, and a consolidated invoice will be available for your records.

TL;DR: At HF we monetize by providing simple access to compute for AI, with services like AutoTrain, Spaces and Inference Endpoints, directly accessible from the Hub. Read more about our pricing and billing system.

If you have any questions, feel free to reach out. We welcome your feedback 🔥
Introducing Prodigy-HF
Prodigy is an annotation tool made by Explosion, a company well known as the creators of spaCy. It's a fully scriptable product with a large community around it. It has many features, including tight integration with spaCy and active learning capabilities. But its main feature is that it is programmatically customizable with Python.

To foster this customizability, Explosion has started releasing plugins. These plugins integrate with third-party tools in an open way that encourages users to work on bespoke annotation workflows. However, one customization in particular deserves to be celebrated explicitly. Last week, Explosion introduced Prodigy-HF, which offers code recipes that directly integrate with the Hugging Face stack. It's been a much-requested feature on the Prodigy support forum, so we're super excited to have it out there.
Putting RL back in RLHF
We are excited to introduce the RLOO (REINFORCE Leave One-Out) Trainer in TRL. As an alternative to PPO, RLOO is a new online RLHF training algorithm designed to be more accessible and easier to implement. In particular, RLOO requires less GPU memory and takes less wall time to converge. As shown in the figures below:

🤑 RLOO uses approximately 50-70% less VRAM than PPO, depending on the model size
🚀 RLOO runs 2x faster than PPO with 1B models and up to 3x faster than PPO with 6.9B models.
🔥 RLOO performs competitively with PPO in terms of response win rate (judged by GPT-4) and consistently outperforms popular offline methods like DPO.
With RLOO, we bring Reinforcement Learning back into RLHF, enabling the community to explore online RL methods more easily. This is exciting because more and more studies have shown that online RL is more effective than offline methods such as DPO (https://arxiv.org/abs/2402.04792, https://arxiv.org/abs/2405.08448).
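The core trick behind RLOO is the REINFORCE leave-one-out baseline: for k sampled completions per prompt, each completion's reward is baselined against the mean reward of the other k - 1 completions. A small illustrative snippet (a toy computation, not TRL's implementation):

```python
# Illustrative sketch of the REINFORCE Leave-One-Out (RLOO) baseline.
# This is a toy computation, not the TRL RLOOTrainer implementation.
import torch

def rloo_advantages(rewards: torch.Tensor) -> torch.Tensor:
    """rewards: (num_prompts, k) scalar rewards for k sampled completions per prompt."""
    k = rewards.shape[1]
    # Leave-one-out mean: (sum of all k rewards - own reward) / (k - 1).
    baseline = (rewards.sum(dim=1, keepdim=True) - rewards) / (k - 1)
    return rewards - baseline

rewards = torch.tensor([[1.0, 0.0, 0.5, 0.5]])
print(rloo_advantages(rewards))  # tensor([[ 0.6667, -0.6667,  0.0000,  0.0000]])
```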
From PyTorch DDP to Accelerate to Trainer, mastery of distributed training with ease
General Overview
This tutorial assumes you have a basic understanding of PyTorch and how to train a simple model. It will showcase training on multiple GPUs through a process called Distributed Data Parallelism (DDP), at three different levels of increasing abstraction:

Native PyTorch DDP through the torch.distributed module
Utilizing 🤗 Accelerate's light wrapper around torch.distributed, which also ensures the same code can run on a single GPU or on TPUs with minimal changes to the original code (see the sketch after this list)
Utilizing 🤗 Transformers' high-level Trainer API, which abstracts away all the boilerplate code and supports various devices and distributed scenarios
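Here is a minimal sketch of the second level: a toy training loop written with 🤗 Accelerate. Launched with `accelerate launch script.py`, the same file runs on one GPU, several GPUs, or a TPU. The model and data are placeholders.

```python
# Minimal sketch of a training step with 🤗 Accelerate. The tiny model and
# random data are placeholders; launch with `accelerate launch script.py`.
import torch
from accelerate import Accelerator

accelerator = Accelerator()

model = torch.nn.Linear(10, 2)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)
dataset = torch.utils.data.TensorDataset(torch.randn(64, 10), torch.randint(0, 2, (64,)))
dataloader = torch.utils.data.DataLoader(dataset, batch_size=8)

# prepare() moves everything to the right device(s) and wraps the model for DDP.
model, optimizer, dataloader = accelerator.prepare(model, optimizer, dataloader)

model.train()
for inputs, targets in dataloader:
    optimizer.zero_grad()
    loss = torch.nn.functional.cross_entropy(model(inputs), targets)
    accelerator.backward(loss)  # replaces loss.backward()
    optimizer.step()
```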
Accelerate Large Model Training using PyTorch Fully Sharded Data Parallel
In this post we will look at how the 🤗 Accelerate library can be used to train large models, enabling users to leverage the latest features of PyTorch FullyShardedDataParallel (FSDP).
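For reference, this is the underlying PyTorch API that Accelerate integrates. Below is a hedged sketch of wrapping a model in FullyShardedDataParallel so that parameters, gradients, and optimizer state are sharded across ranks; it must be run under a distributed launcher (e.g. torchrun or accelerate launch), and the toy model is a placeholder.

```python
# Hedged sketch of native PyTorch FSDP (the feature Accelerate integrates).
# Run under a distributed launcher such as `torchrun`; the model is a toy placeholder.
import torch
import torch.distributed as dist
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

dist.init_process_group("nccl")
torch.cuda.set_device(dist.get_rank() % torch.cuda.device_count())

model = torch.nn.Sequential(
    torch.nn.Linear(1024, 4096),
    torch.nn.GELU(),
    torch.nn.Linear(4096, 1024),
).cuda()

# After wrapping, parameters, gradients and optimizer state are sharded across ranks.
model = FSDP(model)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
```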

Motivation 🤗
With the ever-increasing scale, size, and parameter counts of Machine Learning (ML) models, ML practitioners are finding it difficult to train, or even load, such large models on their hardware. On the one hand, it has been found that large models learn quickly (they are data- and compute-efficient) and are significantly more performant than smaller models [1]; on the other hand, it becomes prohibitive to train such models on most of the available hardware.

Distributed training is the key to enabling the training of such large ML models. There have been major recent advances in the field of Distributed Training at Scale. A few of the most notable advances are given below:
Hugging Face on PyTorch / XLA TPUs: Faster and cheaper training
Open In Colab

Training Your Favorite Transformers on Cloud TPUs using PyTorch / XLA
The PyTorch-TPU project originated as a collaborative effort between the Facebook PyTorch and Google TPU teams and officially launched at the 2019 PyTorch Developer Conference. Since then, we’ve worked with the Hugging Face team to bring first-class support for training on Cloud TPUs using PyTorch / XLA. This new integration enables PyTorch users to run and scale up their models on Cloud TPUs while maintaining the exact same Hugging Face Trainer interface.

This blog post provides an overview of changes made in the Hugging Face library, what the PyTorch / XLA library does, an example to get you started training your favorite transformers on Cloud TPUs, and some performance benchmarks. If you can’t wait to get started with TPUs, please skip ahead to the “Train Your Transformer on Cloud TPUs” section - we handle all the PyTorch / XLA mechanics for you within the Trainer module!
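For readers curious about what happens under the hood, here is a hedged sketch of the low-level PyTorch / XLA idiom the Trainer wraps for you: obtain the XLA device, run the forward/backward pass on it, and use the XLA-aware optimizer step to trigger graph execution. The toy model and batch are placeholders.

```python
# Hedged sketch of the raw PyTorch / XLA idiom that the Trainer handles for you.
# The tiny model and random batch are placeholders.
import torch
import torch_xla.core.xla_model as xm

device = xm.xla_device()  # the TPU core assigned to this process

model = torch.nn.Linear(128, 2).to(device)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

x = torch.randn(8, 128, device=device)
loss = model(x).sum()
loss.backward()
xm.optimizer_step(optimizer)  # steps the optimizer and triggers XLA graph execution
```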