HF Hub - Share and discover more about AI with social posts from the community
Multimodalart/FLUX.1-merged
https://huggingface.co/spaces/multimodalart/FLUX.1-merged
The Space's dependency list (its requirements.txt):
accelerate
git+https://github.com/huggingface/diffusers.git
torch
transformers==4.42.4
xformers
sentencepiece
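For context, a Space with these dependencies typically loads the model through diffusers' FluxPipeline. Below is a minimal sketch, assuming a FLUX checkpoint hosted on the Hub; the repo id, step count, and guidance value are illustrative rather than taken from the Space's actual code:

```python
# Minimal sketch of loading and running a FLUX pipeline with the dependencies above.
# The checkpoint id and generation settings are illustrative assumptions.
import torch
from diffusers import FluxPipeline

pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev",  # swap in the merged checkpoint the Space actually uses
    torch_dtype=torch.bfloat16,
)
pipe.enable_model_cpu_offload()  # keeps VRAM usage manageable on smaller GPUs (needs accelerate)

image = pipe(
    "a photo of a forest with mist swirling around the tree trunks",
    num_inference_steps=8,   # a merged/distilled checkpoint typically targets fewer steps
    guidance_scale=3.5,
).images[0]
image.save("flux_output.png")
```

enable_model_cpu_offload trades some speed for a much smaller VRAM footprint, which is why accelerate appears in the requirements.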
Model Cards
Introduction
Model cards are an important documentation framework for understanding, sharing, and improving machine learning models. When done well, a model card can serve as a boundary object, a single artefact that is accessible to people with different backgrounds and goals in understanding models - including developers, students, policymakers, ethicists, and those impacted by machine learning models.

Today, we launch a model card creation tool and a model card Guide Book, which details how to fill out model cards and presents user studies and the state of the art in ML documentation. This work, which builds on the work of many other people and organizations, focuses on the inclusion of people with different backgrounds and roles. We hope it serves as a stepping stone in the path toward improved ML documentation.

In sum, today we announce the release of:

A Model Card Creator Tool, to ease card creation without needing to program and to help teams share the work of writing different sections.

An updated model card template, released in the huggingface_hub library, drawing together model card work in academia and throughout the industry (a minimal usage sketch follows this list).

An Annotated Model Card Template, which details how to fill the card out.

A User Study on model card usage at Hugging Face.
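As a minimal sketch of what creating a card with the huggingface_hub template can look like (the metadata values and repo id below are illustrative, and the exact template fields may differ between library versions):

```python
# Sketch: programmatic model card creation with the template in huggingface_hub.
from huggingface_hub import ModelCard, ModelCardData

card_data = ModelCardData(
    language="en",
    license="apache-2.0",
    library_name="transformers",
)
card = ModelCard.from_template(
    card_data,
    model_id="my-org/my-model",  # hypothetical repo id
    model_description="A short description of what the model does and how it was trained.",
)
card.save("README.md")           # a model card is the repo's README.md
# card.push_to_hub("my-org/my-model")  # or push it straight to the Hub
```

The Model Card Creator Tool produces the same kind of README.md card through a form-based interface, so contributors who don't program can still fill in their sections.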
MTEB: Massive Text Embedding Benchmark
MTEB is a massive benchmark for measuring the performance of text embedding models on diverse embedding tasks.

The 🥇 leaderboard provides a holistic view of the best text embedding models out there on a variety of tasks.

The 📝 paper gives background on the tasks and datasets in MTEB and analyzes leaderboard results!

The 💻 Github repo contains the code for benchmarking and submitting any model of your choice to the leaderboard.
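As a hedged sketch of how benchmarking with the repo's mteb package typically looks (the model and task names are illustrative, and the API shown is the one documented at the time of writing):

```python
# Sketch: evaluating an embedding model on a subset of MTEB tasks.
from mteb import MTEB
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")
evaluation = MTEB(tasks=["Banking77Classification"])  # any subset of MTEB tasks
results = evaluation.run(model, output_folder="results/all-MiniLM-L6-v2")
print(results)
```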
TGI Multi-LoRA: Deploy Once, Serve 30 models
Are you tired of the complexity and expense of managing multiple AI models? What if you could deploy once and serve 30 models? In today's ML world, organizations looking to leverage the value of their data will likely end up in a fine-tuned world, building a multitude of models, each one highly specialized for a specific task. But how can you keep up with the hassle and cost of deploying a model for each use case? The answer is Multi-LoRA serving.

Motivation
As an organization, building a multitude of models via fine-tuning makes sense for multiple reasons.

Performance - There is compelling evidence that smaller, specialized models outperform their larger, general-purpose counterparts on the tasks that they were trained on. Predibase [5] showed that you can get better performance than GPT-4 using task-specific LoRAs with a base like mistralai/Mistral-7B-v0.1.

Adaptability - Models like Mistral or Llama are extremely versatile. You can pick one of them as your base model and build many specialized models, even when the downstream tasks are very different. Also, note that you aren't locked in: you can easily swap the base model and fine-tune your data on a different base (more on this later).

Independence - For each task that your organization cares about, different teams can work on different fine-tunes, allowing for independence in data preparation, configurations, evaluation criteria, and cadence of model updates.

Privacy - Specialized models offer flexibility with training data segregation and access restrictions to different users based on data privacy requirements. Additionally, in cases where running models locally is important, a small model can be made highly capable for a specific task while keeping its size small enough to run on device.
Full post: https://github.com/huggingface/blog/blob/main/multi-lora-serving.md
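To make "deploy once, serve 30 models" concrete, here is a rough sketch of what querying a single TGI deployment that preloads several LoRA adapters can look like. The endpoint, adapter ids, and launch option mentioned in the comments are illustrative assumptions, not a definitive recipe:

```python
# Sketch: one TGI deployment, many LoRA fine-tunes selected per request.
# Assumes TGI was launched with several adapters preloaded on top of one
# shared base model (e.g. via its LORA_ADAPTERS option).
from typing import Optional

import requests

TGI_URL = "http://127.0.0.1:3000/generate"  # hypothetical local endpoint


def generate(prompt: str, adapter_id: Optional[str] = None) -> str:
    payload = {"inputs": prompt, "parameters": {"max_new_tokens": 64}}
    if adapter_id is not None:
        # The fine-tune is chosen per request: same deployment, different LoRA.
        payload["parameters"]["adapter_id"] = adapter_id
    response = requests.post(TGI_URL, json=payload, timeout=60)
    response.raise_for_status()
    return response.json()["generated_text"]


# One deployment, two specialized behaviours (adapter ids are illustrative):
print(generate("Summarize this support ticket: ...", adapter_id="my-org/support-lora"))
print(generate("Write a SQL query that ...", adapter_id="my-org/sql-lora"))
```

The key point is that the adapter is picked at request time, so a single GPU deployment of the base model can serve many specialized fine-tunes.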
Total noob’s intro to Hugging Face Transformers
Welcome to "A Total Noob’s Introduction to Hugging Face Transformers," a guide designed specifically for those looking to understand the bare basics of using open-source ML. Our goal is to demystify what Hugging Face Transformers is and how it works, not to turn you into a machine learning practitioner, but to enable better understanding of and collaboration with those who are. That being said, the best way to learn is by doing, so we'll walk through a simple worked example of running Microsoft’s Phi-2 LLM in a notebook on a Hugging Face space.

You might wonder, with the abundance of tutorials on Hugging Face already available, why create another? The answer lies in accessibility: most existing resources assume some technical background, including Python proficiency, which can prevent non-technical individuals from grasping ML fundamentals. As someone who came from the business side of AI, I recognize that the learning curve presents a barrier and wanted to offer a more approachable path for like-minded learners.

Therefore, this guide is tailored for a non-technical audience keen to better understand open-source machine learning without having to learn Python from scratch. We assume no prior knowledge and will explain concepts from the ground up to ensure clarity. If you're an engineer, you’ll find this guide a bit basic, but for beginners, it's an ideal starting point.

Let’s get stuck in… but first some context.
Full post: https://github.com/huggingface/blog/blob/main/noob_intro_transformers.md
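As a flavor of the worked example, here is a minimal sketch of running Phi-2 with the transformers pipeline API; the generation settings are illustrative, and depending on your transformers version you may need trust_remote_code=True:

```python
# Sketch: the kind of "hello world" the guide walks through, running
# Microsoft's Phi-2 from a notebook with the pipeline API.
import torch
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="microsoft/phi-2",
    torch_dtype=torch.float16,  # fits comfortably on a small cloud GPU
    device_map="auto",          # needs accelerate; falls back to CPU if no GPU
)

output = generator("Explain what a language model is in one sentence.", max_new_tokens=60)
print(output[0]["generated_text"])
```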
Jupyter X Hugging Face
We’re excited to announce improved support for Jupyter notebooks hosted on the Hugging Face Hub!

From serving as an essential learning resource to being a key tool used for model development, Jupyter notebooks have become a staple across many areas of machine learning. Notebooks' interactive and visual nature lets you get feedback quickly as you develop models, datasets, and demos. For many, their first exposure to training machine learning models is via a Jupyter notebook, and many practitioners use notebooks as a critical tool for developing and communicating their work.

Hugging Face is a collaborative Machine Learning platform in which the community has shared over 150,000 models, 25,000 datasets, and 30,000 ML apps. The Hub has model and dataset versioning tools, including model cards and client-side libraries to automate the versioning process. However, only including a model card with hyperparameters is not enough to provide the best reproducibility; this is where notebooks can help. Alongside these models, datasets, and demos, the Hub hosts over 7,000 notebooks. These notebooks often document the development process of a model or a dataset and can provide guidance and tutorials showing how others can use these resources. We’re therefore excited about our improved support for notebook hosting on the Hub.
Full post: https://github.com/huggingface/blog/blob/main/notebooks-hub.md
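As a small sketch of how a notebook ends up hosted alongside a model (the repo id and filename below are illustrative):

```python
# Sketch: uploading a notebook to a Hub repo with huggingface_hub.
from huggingface_hub import HfApi

api = HfApi()
api.upload_file(
    path_or_fileobj="training_walkthrough.ipynb",  # local notebook documenting the training run
    path_in_repo="training_walkthrough.ipynb",
    repo_id="my-org/my-model",                     # hypothetical model repo
    repo_type="model",
)
# Once uploaded, the Hub renders the .ipynb directly in the repo's file viewer.
```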
Introducing NPC-Playground, a 3D playground to interact with LLM-powered NPCs

AI-powered NPCs (Non-Playable Characters) are one of the most important breakthroughs brought about by the use of LLMs in games.

LLMs, or Large Language Models, make it possible to design "intelligent" in-game characters that can engage in realistic conversations with the player, perform complex actions, and follow instructions, dramatically enhancing the player's experience. AI-powered NPCs represent a huge advancement over rule-based and heuristic systems.

Today, we are excited to introduce NPC-Playground, a demo created by Cubzh and Gigax where you can interact with LLM-powered NPCs and see for yourself what the future holds!

You can play with the demo directly in your browser 👉 here

In this 3D demo, you can interact with the NPCs and teach them new skills with just a few lines of Lua scripting!
Nyströmformer: Approximating self-attention in linear time and memory via the Nyström method
Introduction
Transformers have exhibited remarkable performance on various Natural Language Processing and Computer Vision tasks. Their success can be attributed to the self-attention mechanism, which captures the pairwise interactions between all the tokens in an input. However, the standard self-attention mechanism has a time and memory complexity of \(O(n^2)\) (where \(n\) is the length of the input sequence), making it expensive to train on long input sequences.

The Nyströmformer is one of many efficient Transformer models that approximates standard self-attention with \(O(n)\) complexity. Nyströmformer exhibits competitive performance on various downstream NLP and CV tasks while improving upon the efficiency of standard self-attention. The aim of this blog post is to give readers an overview of the Nyström method and how it can be adapted to approximate self-attention.
Full post: https://github.com/huggingface/blog/blob/main/nystromformer.md
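In a nutshell (sketching the idea, with notation following the paper): the Nyström method picks \(m \ll n\) landmark queries \(\tilde{Q}\) and keys \(\tilde{K}\), typically segment means of \(Q\) and \(K\), and approximates the full \(n \times n\) softmax attention matrix as a product of three much smaller matrices:

\[
\hat{S} = \mathrm{softmax}\left(\frac{Q\tilde{K}^{\top}}{\sqrt{d}}\right)\,
\mathrm{softmax}\left(\frac{\tilde{Q}\tilde{K}^{\top}}{\sqrt{d}}\right)^{+}\,
\mathrm{softmax}\left(\frac{\tilde{Q}K^{\top}}{\sqrt{d}}\right),
\]

where \(+\) denotes the Moore-Penrose pseudoinverse (computed approximately with an iterative method). The three factors have shapes \(n \times m\), \(m \times m\), and \(m \times n\), so with \(m\) fixed, forming \(\hat{S}V\) takes \(O(n)\) time and memory instead of \(O(n^2)\).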
Open LLM Leaderboard: DROP deep dive
Recently, three new benchmarks were added to the Open LLM Leaderboard: Winogrande, GSM8k and DROP, using the original implementations reproduced in the EleutherAI Harness. A cursory look at the scores for DROP revealed something strange was going on, with the overwhelming majority of models scoring less than 10 out of 100 on their f1-score! We did a deep dive to understand what was going on; come with us to see what we found out!

Initial observations
DROP (Discrete Reasoning Over Paragraphs) is an evaluation where models must extract relevant information from English-text paragraphs before executing discrete reasoning steps on them (for example, sorting or counting items to arrive at the correct answer, see the table below for examples). The metrics used are custom f1 and exact match scores.
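To give a feel for the custom f1 metric, here is a simplified sketch of a bag-of-words F1 in the spirit of DROP's scoring; the official implementation additionally normalizes numbers, strips articles and punctuation, and handles multi-span answers, all of which are omitted here:

```python
# Sketch: simplified token-level F1 between a predicted answer and a gold answer.
from collections import Counter


def token_f1(prediction: str, gold: str) -> float:
    pred_tokens = prediction.lower().split()
    gold_tokens = gold.lower().split()
    if not pred_tokens or not gold_tokens:
        return float(pred_tokens == gold_tokens)
    overlap = sum((Counter(pred_tokens) & Counter(gold_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)


print(token_f1("7 touchdowns", "7"))  # partial credit (~0.67) where exact match would give 0
```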
What's going on with the Open LLM Leaderboard?
Recently an interesting discussion arose on Twitter following the release of Falcon 🦅 and its addition to the Open LLM Leaderboard, a public leaderboard comparing open access large language models.

The discussion centered around one of the four evaluations displayed on the leaderboard: a benchmark for measuring Massive Multitask Language Understanding (shortname: MMLU).

The community was surprised that the MMLU evaluation numbers of the current top model on the leaderboard, the LLaMA model 🦙, were significantly lower than the numbers in the published LLaMA paper.

So we decided to dive into a rabbit hole to understand what was going on and how to fix it 🕳🐇

In our quest, we spoke with both the great @javier-m, who collaborated on the evaluations of LLaMA, and the amazing @slippylolo from the Falcon team. That being said, any errors in what follows should of course be attributed to us rather than to them!

Along the way, you’ll learn a lot about the different ways a model can be scored on a single evaluation, and about whether or not to believe the numbers you see online and in papers.

Ready? Then buckle up, we’re taking off 🚀.
Can foundation models label data like humans?
Since the advent of ChatGPT, we have seen unprecedented growth in the development of Large Language Models (LLMs), and particularly chatty models that are fine-tuned to follow instructions given in the form of prompts. However, how these models compare is unclear due to the lack of benchmarks designed to test their performance rigorously. Evaluating instruction-following and chatty models is intrinsically difficult because a large part of user preference centers on qualitative style, whereas past NLP evaluation relied on much more clearly defined criteria.

Along these lines, it’s a common story that a new large language model (LLM) is released to the tune of “our model is preferred to ChatGPT N% of the time,” and what is omitted from that sentence is that the model is preferred in some type of GPT-4-based evaluation scheme. What these numbers are really trying to show is a proxy for a different measurement: scores provided by human labelers.
Accelerate your models with 🤗 Optimum Intel and OpenVINO

Last July, we announced that Intel and Hugging Face would collaborate on building state-of-the-art yet simple hardware acceleration tools for Transformer models. Today, we are very happy to announce that we have added Intel OpenVINO to Optimum Intel. You can now easily perform inference with OpenVINO Runtime on a variety of Intel processors (see the full list of supported devices) using Transformers models which can be hosted either on the Hugging Face hub or locally. You can also quantize your model with the OpenVINO Neural Network Compression Framework (NNCF) and reduce its size and prediction latency in a matter of minutes.
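A minimal sketch of what inference through Optimum Intel's OpenVINO integration typically looks like; the checkpoint is illustrative, and the export flag has changed names across optimum-intel releases (older versions used from_transformers=True):

```python
# Sketch: OpenVINO Runtime inference through Optimum Intel.
from optimum.intel import OVModelForSequenceClassification
from transformers import AutoTokenizer, pipeline

model_id = "distilbert-base-uncased-finetuned-sst-2-english"
# export=True converts the PyTorch checkpoint to the OpenVINO IR on the fly.
model = OVModelForSequenceClassification.from_pretrained(model_id, export=True)
tokenizer = AutoTokenizer.from_pretrained(model_id)

classifier = pipeline("text-classification", model=model, tokenizer=tokenizer)
print(classifier("This integration makes CPU inference pleasantly fast."))
```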
Opinion Classification with Kili and HuggingFace AutoTrain
Introduction
Understanding your users’ needs is crucial in any user-related business. But it also requires a lot of hard work and analysis, which is quite expensive. So why not leverage Machine Learning, and do so with much less coding by using AutoML?

In this article, we will leverage HuggingFace AutoTrain and Kili to build an active learning pipeline for text classification. Kili is a platform that empowers a data-centric approach to Machine Learning through quality training data creation. It provides collaborative data annotation tools and APIs that enable quick iterations between reliable dataset building and model training. Active learning is a process in which you iteratively add labeled data to the dataset and retrain the model; it is an ongoing loop that requires humans to label the data.

As a concrete example use case for this article, we will build our pipeline by using user reviews of Medium from the Google Play Store. After that, we are going to categorize the reviews with the pipeline we built. Finally, we will apply sentiment analysis to the classified reviews. Then we will analyze the results, and understanding the users’ needs and satisfaction will be much easier.
Full post: https://github.com/huggingface/blog/blob/main/opinion-classification-with-kili.md
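Schematically, the active learning loop described above looks like the sketch below. It is deliberately generic and not tied to the Kili or AutoTrain APIs; label_fn, train_fn, and score_fn stand in for human annotation, an AutoTrain job, and model confidence scoring respectively:

```python
# Sketch: a generic active-learning loop (label a batch, retrain, pick the
# next most uncertain examples, repeat). Not the actual Kili/AutoTrain API.
def active_learning_loop(unlabeled_pool, label_fn, train_fn, score_fn, rounds=5, batch_size=100):
    labeled = []
    model = None
    for _ in range(rounds):
        # 1. Humans annotate a batch (in this article, via Kili's annotation tools).
        batch = unlabeled_pool[:batch_size]
        labeled.extend(label_fn(batch))
        unlabeled_pool = unlabeled_pool[batch_size:]

        # 2. Retrain on everything labeled so far (in this article, an AutoTrain job).
        model = train_fn(labeled)

        # 3. Re-rank the remaining pool so the least confident examples come first.
        unlabeled_pool = sorted(unlabeled_pool, key=lambda x: score_fn(model, x))
    return model, labeled
```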
Optimizing your LLM in production
Note: This blog post is also available as a documentation page on Transformers.

Large Language Models (LLMs) such as GPT-3/4, Falcon, and Llama are rapidly advancing in their ability to tackle human-centric tasks, establishing themselves as essential tools in modern knowledge-based industries. Deploying these models in real-world tasks remains challenging, however:

To exhibit near-human text understanding and generation capabilities, LLMs currently need to be composed of billions of parameters (see Kaplan et al., Wei et al.). This consequently amplifies the memory demands of inference.
In many real-world tasks, LLMs need to be given extensive contextual information. This necessitates the model's capability to manage very long input sequences during inference.
The crux of these challenges lies in augmenting the computational and memory capabilities of LLMs, especially when handling expansive input sequences.
Full post: https://github.com/huggingface/blog/blob/main/optimize-llm.md
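To put the first point in numbers, here is a quick back-of-the-envelope estimate of the GPU memory needed just to hold the weights (the parameter counts are generic examples):

```python
# Sketch: weights alone for a model with N parameters need roughly
# N * bytes_per_parameter of accelerator memory.
def weight_memory_gb(num_parameters: float, bytes_per_param: float) -> float:
    return num_parameters * bytes_per_param / 1e9


for name, params in [("7B", 7e9), ("70B", 70e9), ("175B", 175e9)]:
    fp16 = weight_memory_gb(params, 2)    # float16 / bfloat16
    int4 = weight_memory_gb(params, 0.5)  # 4-bit quantized
    print(f"{name}: ~{fp16:.0f} GB in fp16, ~{int4:.0f} GB in 4-bit")
# And this is before the KV cache, which grows with sequence length and batch size.
```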
Optimizing a Text-To-Speech model using 🤗 Transformers

🤗 Transformers provides many of the latest state-of-the-art (SoTA) models across domains and tasks. To get the best performance from these models, they need to be optimized for inference speed and memory usage.

The 🤗 Hugging Face ecosystem offers precisely such ready-to-use, easy-to-apply optimization tools that work across the board for all the models in the library. This makes it easy to reduce memory footprint and improve inference speed with just a few extra lines of code.

In this hands-on tutorial, I'll demonstrate how you can optimize Bark, a Text-To-Speech (TTS) model supported by 🤗 Transformers, based on three simple optimizations. These optimizations rely solely on the Transformers, Optimum and Accelerate libraries from the 🤗 ecosystem.

This tutorial is also a demonstration of how one can benchmark a non-optimized model against its optimized variants.

For a more streamlined version of the tutorial with fewer explanations but all the code, see the accompanying Google Colab.

This blog post is organized as follows:
Full post: https://github.com/huggingface/blog/blob/main/optimizing-bark.md
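As a taste of the kind of optimizations the tutorial benchmarks, here is a minimal sketch combining half-precision loading with CPU offload on the small Bark checkpoint. It assumes a CUDA GPU, and exact method availability depends on your transformers and optimum versions:

```python
# Sketch (assumes a CUDA GPU): two simple optimizations applied to Bark.
# Method names reflect the transformers Bark API at the time of writing.
import torch
from transformers import AutoProcessor, BarkModel

processor = AutoProcessor.from_pretrained("suno/bark-small")
model = BarkModel.from_pretrained(
    "suno/bark-small",
    torch_dtype=torch.float16,        # optimization 1: half-precision weights
).to("cuda")

model.enable_cpu_offload()            # optimization 2: keep idle Bark sub-models on the CPU
# model = model.to_bettertransformer()  # optional: fused attention kernels via Optimum

inputs = processor("Hello, this is a test of an optimized text-to-speech model.")
speech_values = model.generate(**inputs.to("cuda"))
```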
Accelerated Inference with Optimum and Transformers Pipelines
Inference has landed in Optimum with support for Hugging Face Transformers pipelines, including text-generation using ONNX Runtime.

The adoption of BERT and Transformers continues to grow. Transformer-based models are now not only achieving state-of-the-art performance in Natural Language Processing but also for Computer Vision, Speech, and Time-Series. 💬 🖼 🎤

Companies are now moving from the experimentation and research phase to the production phase in order to use Transformer models for large-scale workloads. But by default BERT and its friends are relatively slow, big, and complex models compared to traditional Machine Learning algorithms.

To solve this challenge, we created Optimum – an extension of Hugging Face Transformers to accelerate the training and inference of Transformer models like BERT.
Full post: https://github.com/huggingface/blog/blob/main/optimum-inference.md
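A minimal sketch of what an ONNX Runtime-accelerated text-generation pipeline looks like with Optimum; the checkpoint is illustrative, and older optimum releases used from_transformers=True instead of export=True:

```python
# Sketch: a Transformers pipeline backed by an ONNX Runtime model from Optimum.
from optimum.onnxruntime import ORTModelForCausalLM
from transformers import AutoTokenizer, pipeline

model_id = "gpt2"
model = ORTModelForCausalLM.from_pretrained(model_id, export=True)  # export to ONNX on the fly
tokenizer = AutoTokenizer.from_pretrained(model_id)

onnx_generator = pipeline("text-generation", model=model, tokenizer=tokenizer)
print(onnx_generator("ONNX Runtime makes inference", max_new_tokens=20)[0]["generated_text"])
```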
Optimum-NVIDIA on Hugging Face enables blazingly fast LLM inference in just 1 line of code
Large Language Models (LLMs) have revolutionized natural language processing and are increasingly deployed to solve complex problems at scale. Achieving optimal performance with these models is notoriously challenging due to their unique and intense computational demands. Optimized performance of LLMs is incredibly valuable for end users looking for a snappy and responsive experience, as well as for scaled deployments where improved throughput translates to dollars saved.

That's where the Optimum-NVIDIA inference library comes in. Available on Hugging Face, Optimum-NVIDIA dramatically accelerates LLM inference on the NVIDIA platform through an extremely simple API. By changing just a single line of code, you can unlock up to 28x faster inference and 1,200 tokens/second on the NVIDIA platform.

Optimum-NVIDIA is the first Hugging Face inference library to benefit from the new float8 format supported on the NVIDIA Ada Lovelace and Hopper architectures. FP8, in addition to the advanced compilation capabilities of NVIDIA TensorRT-LLM software, dramatically accelerates LLM inference.
Full post: https://github.com/huggingface/blog/blob/main/optimum-nvidia.md
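The "single line of code" is the import: swap the transformers pipeline for the optimum-nvidia one. A hedged sketch follows; the model id and generation arguments are illustrative, and the FP8 flag in the comment is as documented around the library's launch:

```python
# Sketch: the one-line swap from the transformers pipeline to the
# TensorRT-LLM-backed optimum-nvidia pipeline.
# from transformers import pipeline      # before
from optimum.nvidia import pipeline      # after

generator = pipeline(
    "text-generation",
    model="meta-llama/Llama-2-7b-chat-hf",
    # use_fp8=True,  # enable float8 on Ada Lovelace / Hopper GPUs, per the optimum-nvidia docs
)
print(generator("What makes FP8 inference fast?", max_new_tokens=64)[0]["generated_text"])
```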
Optimum + ONNX Runtime: Easier, Faster training for your Hugging Face models
Introduction
Transformer-based models in language, vision, and speech are getting larger to support complex multi-modal use cases for the end customer. Increasing model sizes directly impacts the resources needed to train these models and to scale them as they grow. Hugging Face and Microsoft’s ONNX Runtime teams are working together to build advancements in fine-tuning large language, speech, and vision models. Hugging Face’s Optimum library, through its integration with ONNX Runtime for training, provides an open solution to improve training times by 35% or more for many popular Hugging Face models. We present details of both Hugging Face Optimum and the ONNX Runtime Training ecosystem, with performance numbers highlighting the benefits of using the Optimum library.
Full post: https://github.com/huggingface/blog/blob/main/optimum-onnxruntime-training.md
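A minimal sketch of what switching a fine-tuning script to ONNX Runtime training looks like with Optimum: the Trainer and TrainingArguments classes are swapped for their ORT counterparts. The dataset, hyperparameters, and fused-optimizer option are illustrative; check the current Optimum documentation for exact names:

```python
# Sketch: fine-tuning with ONNX Runtime through Optimum. Swapping Trainer ->
# ORTTrainer (and TrainingArguments -> ORTTrainingArguments) is the core change.
from datasets import load_dataset
from optimum.onnxruntime import ORTTrainer, ORTTrainingArguments
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_id = "bert-base-uncased"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id, num_labels=2)

# Tiny illustrative dataset slice; any tokenized text-classification set works.
dataset = load_dataset("imdb", split="train[:1%]")
dataset = dataset.map(
    lambda batch: tokenizer(batch["text"], truncation=True, padding="max_length", max_length=128),
    batched=True,
)

training_args = ORTTrainingArguments(
    output_dir="ort-finetune",
    per_device_train_batch_size=16,
    num_train_epochs=1,
    optim="adamw_ort_fused",  # ORT's fused optimizer, per the Optimum docs
)

trainer = ORTTrainer(
    model=model,
    args=training_args,
    train_dataset=dataset,
    tokenizer=tokenizer,
)
trainer.train()
```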