Share and discover more about AI with social posts from the community.
Flux AI
AI image generation - create art at the click of a button.

Image
AI Image Generation
Text to Image
Flux AI is an advanced text-to-image AI model developed by Black Forest Labs that employs a transformer-based flow model to generate high-quality images. Key advantages of this technology include outstanding visual quality, strict adherence to prompts, diverse dimensions/aspect ratios, and varied typography and outputs. Flux AI offers three variants: FLUX.1 [pro], FLUX.1 [dev], and FLUX.1 [schnell], each designed for different use cases and performance levels. Flux AI aims to make cutting-edge AI technology accessible to everyone by providing FLUX.1 [schnell] as a free open-source model, ensuring that individuals, researchers, and small developers can benefit from advanced AI technology without financial barriers.
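For readers who want to try it locally, here is a minimal, hedged sketch of generating an image with the open FLUX.1 [schnell] variant through the diffusers library; the checkpoint id black-forest-labs/FLUX.1-schnell and the sampler settings are assumptions to adapt to your own setup.

# Minimal sketch (assumptions: the "black-forest-labs/FLUX.1-schnell" checkpoint
# on the Hub, a recent diffusers version, and a GPU with enough memory).
import torch
from diffusers import FluxPipeline

pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-schnell", torch_dtype=torch.bfloat16
)
pipe.enable_model_cpu_offload()  # optional: lowers VRAM use at the cost of speed

image = pipe(
    "a watercolor fox reading a newspaper in a cafe",
    guidance_scale=0.0,       # the distilled schnell variant ignores guidance
    num_inference_steps=4,    # schnell is designed for very few sampling steps
    max_sequence_length=256,
).images[0]
image.save("flux_schnell_sample.png")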
A Brief History of HuggingFace
Founded in 2016, HuggingFace (named after the popular 🤗 emoji) started as a chatbot company and later transformed into an open-source provider of NLP technologies. The chatbot, aimed at a teenage demographic, was focused on:
(...) building an AI so that you’re having fun talking with it. When you’re chatting with it, you’re going to laugh and smile — it’s going to be entertaining
- Clem Delangue, CEO & Co-founder
Like a Tamagotchi, the chatbot could talk coherently about a wide range of topics, detect emotions in text, and adapt its tone accordingly.
Underlying this chatbot, however, were HuggingFace's main strengths: in-house NLP models (one of which was called Hierarchical Multi-Task Learning, or HMTL) and a managed library of pre-trained NLP models. These would serve as the early backbone of the transformers library we know today.
The early pytorch-transformers library established compatibility between PyTorch and TensorFlow 2.0, enabling users to move easily from one framework to the other during the life of a model.
With the release of Google's "Attention Is All You Need" paper and the resulting shift to transformers in the NLP space, HuggingFace, which had already released parts of the powerful library powering its chatbot as an open-source project on GitHub, began to focus on bringing popular large language models such as BERT and GPT to PyTorch as open source.
With its most recent Series C funding round leading to a $2 billion valuation, HuggingFace now offers an ecosystem of models and datasets spread across its various tools, such as the HuggingFace Hub, transformers, diffusers, and more.
Advanced usage & custom training
Let's create our own logic for a more customized training.

✏️ Preparing a dataset
The dataset will vary based on the task you work on. Let's work on sequence classification!

Our dataset will be composed of sentences and their associated classes. For example, if you wanted to identify the subject of a conversation, you could create a dataset such as:

input | class
The team scored a goal in the last seconds | sports
The debate was heated between the 2 parties | politics
I've never tasted croissants so delicious! | food
The objective of our trained model will be to correctly identify the class associated with new sentences.

🔎 Finding a dataset
If you don't have the right dataset, you can always explore the Datasets Hub. The "topic classification" category contains many datasets suitable for prototyping this model.

We select "Yahoo! Answers Topic Classification" and visualize it with the Datasets viewer.
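Once the libraries below are installed, loading it takes a single call (a hedged sketch; the Hub id yahoo_answers_topics is an assumption, so confirm it on the dataset page):

from datasets import load_dataset

# Assumed Hub id for "Yahoo! Answers Topic Classification"; verify it on the Datasets Hub.
dataset = load_dataset("yahoo_answers_topics")
print(dataset["train"][0])        # inspect a single example
print(dataset["train"].features)  # text columns and the topic label names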
🛠 Installation and set-up
We need the following 🤗 Hugging Face libraries:

transformers contains an API for training models and many pre-trained models
tokenizers is installed automatically alongside transformers and tokenizes our data (i.e. converts text to sequences of numbers; see the short sketch after this list)
datasets contains a rich source of data and common metrics, perfect for prototyping
We also install wandb to automatically instrument our training.
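As a quick illustration of what tokenization does, here is a short sketch; the bert-base-uncased checkpoint is just an arbitrary example:

from transformers import AutoTokenizer

# Any standard checkpoint works here; bert-base-uncased is only an example.
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
encoded = tokenizer("The team scored a goal in the last seconds")
print(encoded["input_ids"])  # the sentence as a sequence of token ids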


!pip install datasets wandb evaluate accelerate -qU
!wget https://raw.githubusercontent.com/huggingface/transformers/master/examples/pytorch/text-classification/run_glue.py

# the run_glue.py script requires transformers dev
!pip install -q git+https://github.com/huggingface/transformers
Finally, we make sure we're logged into W&B so that our experiments can be associated with our account.


import wandb


wandb.login()
💡 Configuration tips
W&B integration with Hugging Face can be configured to add extra functionalities:

auto-logging of models as artifacts: just set the environment variable WANDB_LOG_MODEL to true
log histograms of gradients and parameters: gradients are logged by default; you can also log parameters by setting the environment variable WANDB_WATCH to all
set custom run names with the run_name argument in scripts or as part of TrainingArguments
organize runs by project with the WANDB_PROJECT environment variable
For more details, refer to the W&B + HF integration documentation.

Let's log every trained model.
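In a notebook, that could look like the sketch below; the script flags, model name, and CSV paths are assumptions (when you pass your own files, run_glue.py expects a label column plus text columns):

# Log the trained model to W&B as an artifact and group runs under one project.
%env WANDB_LOG_MODEL=true
%env WANDB_PROJECT=sequence-classification

# Hedged sketch of launching the fine-tuning script; model name and file paths are placeholders.
!python run_glue.py \
  --model_name_or_path distilbert-base-uncased \
  --train_file train.csv --validation_file valid.csv \
  --do_train --do_eval \
  --num_train_epochs 3 --per_device_train_batch_size 16 \
  --report_to wandb --run_name yahoo-topic-classification \
  --output_dir ./results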
Optimize 🤗 Hugging Face models with Weights & Biases
Hugging Face provides tools to quickly train neural networks for NLP (Natural Language Processing) on any task (classification, translation, question answering, etc) and any dataset with PyTorch and TensorFlow 2.0.

Coupled with the Weights & Biases integration, you can quickly train and monitor models for full traceability and reproducibility without writing any extra lines of code! Just install the library and sign in, and your experiments will automatically be logged:

pip install wandb
wandb login
Note: To enable logging to W&B, set report_to to wandb in your TrainingArguments or script.
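If you are using the Trainer API directly, that amounts to something like the following sketch; everything except report_to (and optionally run_name) is a placeholder:

from transformers import TrainingArguments

# Placeholder values; report_to="wandb" is what switches on the W&B integration.
args = TrainingArguments(
    output_dir="./results",
    report_to="wandb",          # send logs to Weights & Biases
    run_name="my-experiment",   # custom run name shown in the W&B UI
    logging_steps=50,
)
# Pass `args` to your Trainer as usual: Trainer(model=model, args=args, ...)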

W&B integration with 🤗 Hugging Face can automatically:

log your configuration parameters
log your losses and metrics
log gradients and parameter distributions
log your model
keep track of your code
log your system metrics (GPU, CPU, memory, temperature, etc)
Here's what the W&B interactive dashboard will look like:
Recommender Systems Using Hugging Face & NVIDIA
There's no question that recommender systems have changed the way we consume content, purchase consumer goods, and create or maintain relationships online. With a few clicks, and sometimes even without explicitly teaching a model our preferences, we can have a never-ending stream of personalized content delivered straight to our screens. But how exactly do these systems work, and what implications do they have for the future of content personalization, how we shop, and who we connect with online?

Here's What We'll Be Covering
https://wandb.ai/int_pb/recommendations/reports/Recommender-Systems-Using-Hugging-Face-NVIDIA--VmlldzoyOTczMzUy?galleryTag=hugging-face
xAI Enters the Chat with Grok 2
A new LLM from Elon Musk's xAI
Performance and Benchmarks
Grok-2 has already demonstrated its prowess by outperforming major competitors like Claude 3.5 Sonnet and GPT-4-Turbo in key benchmarks. An early version of Grok-2, tested under the alias "sus-column-r," topped the charts in the LMSYS chatbot arena, showcasing its superior Elo score. This evaluation was reinforced through internal testing, where Grok-2 excelled in tasks requiring instruction-following and accurate information retrieval, marking a noticeable improvement in reasoning and tool-use capabilities over its predecessors.
https://wandb.ai/byyoung3/ml-news/reports/xAI-Enters-the-Chat-with-Grok-2--Vmlldzo5MDM0Mzky
vit-base-xray-pneumonia
This model is a fine-tuned version of google/vit-base-patch16-224-in21k on the chest-xray-pneumonia dataset. It achieves the following results on the evaluation set:

Loss: 0.3387
Accuracy: 0.9006
Model description
More information needed

Intended uses & limitations
More information needed

Training and evaluation data
More information needed

Training procedure
Training hyperparameters
The following hyperparameters were used during training:

learning_rate: 0.0002
train_batch_size: 16
eval_batch_size: 8
seed: 42
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: linear
num_epochs: 10
https://huggingface.co/nickmuchi/vit-base-xray-pneumonia
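For reference, a hedged sketch of running inference with this checkpoint through the image-classification pipeline; the image path is a placeholder:

from transformers import pipeline

# Load the fine-tuned ViT checkpoint; "chest_xray.jpg" is a placeholder path.
classifier = pipeline("image-classification", model="nickmuchi/vit-base-xray-pneumonia")
print(classifier("chest_xray.jpg"))  # list of {'label': ..., 'score': ...} predictions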
SegFormer (b5-sized) encoder pre-trained-only
SegFormer encoder fine-tuned on ImageNet-1k. It was introduced in the paper SegFormer: Simple and Efficient Design for Semantic Segmentation with Transformers by Xie et al. and first released in this repository.

Disclaimer: The team releasing SegFormer did not write a model card for this model so this model card has been written by the Hugging Face team.

Model description
SegFormer consists of a hierarchical Transformer encoder and a lightweight all-MLP decode head to achieve great results on semantic segmentation benchmarks such as ADE20K and Cityscapes. The hierarchical Transformer is first pre-trained on ImageNet-1k, after which a decode head is added and fine-tuned altogether on a downstream dataset.

This repository only contains the pre-trained hierarchical Transformer, hence it can be used for fine-tuning purposes.

Intended uses & limitations
You can use the model for fine-tuning of semantic segmentation. See the model hub to look for fine-tuned versions on a task that interests you.
https://huggingface.co/nvidia/mit-b5
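As a starting point for such fine-tuning, here is a hedged sketch of attaching a fresh decode head to this encoder; the number of labels is a placeholder for your own dataset:

from transformers import SegformerImageProcessor, SegformerForSemanticSegmentation

# The decode head is randomly initialized on top of the pre-trained encoder;
# num_labels=150 is a placeholder (e.g. an ADE20K-sized label space).
processor = SegformerImageProcessor()
model = SegformerForSemanticSegmentation.from_pretrained("nvidia/mit-b5", num_labels=150)
# Fine-tune `model` on your segmentation dataset with the Trainer or a custom loop.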
dit-large-finetuned-rvlcdip
Document Image Transformer (large-sized model)
Document Image Transformer (DiT) model pre-trained on IIT-CDIP (Lewis et al., 2006), a dataset that includes 42 million document images, and fine-tuned on RVL-CDIP, a dataset consisting of 400,000 grayscale images in 16 classes, with 25,000 images per class. It was introduced in the paper DiT: Self-supervised Pre-training for Document Image Transformer by Li et al. and first released in this repository. Note that DiT is identical in architecture to BEiT.

Disclaimer: The team releasing DiT did not write a model card for this model so this model card has been written by the Hugging Face team.

Model description
The Document Image Transformer (DiT) is a transformer encoder model (BERT-like) pre-trained on a large collection of images in a self-supervised fashion. The pre-training objective for the model is to predict visual tokens from the encoder of a discrete VAE (dVAE), based on masked patches.

Images are presented to the model as a sequence of fixed-size patches (resolution 16x16), which are linearly embedded. One also adds absolute position embeddings before feeding the sequence to the layers of the Transformer encoder.

By pre-training the model, it learns an inner representation of images that can then be used to extract features useful for downstream tasks: if you have a dataset of labeled document images for instance, you can train a standard classifier by placing a linear layer on top of the pre-trained encoder.
https://huggingface.co/microsoft/dit-large-finetuned-rvlcdip
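A hedged sketch of classifying a scanned document with this checkpoint; the image path is a placeholder:

import torch
from PIL import Image
from transformers import AutoImageProcessor, AutoModelForImageClassification

processor = AutoImageProcessor.from_pretrained("microsoft/dit-large-finetuned-rvlcdip")
model = AutoModelForImageClassification.from_pretrained("microsoft/dit-large-finetuned-rvlcdip")

image = Image.open("document.png").convert("RGB")  # placeholder path to a document image
inputs = processor(images=image, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits
print(model.config.id2label[logits.argmax(-1).item()])  # one of the 16 RVL-CDIP classes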
rare-puppers
Autogenerated by HuggingPics🤗🖼

Create your own image classifier for anything by running the demo on Google Colab.

Report any issues with the demo at the github repo.

Example Images
corgi
https://huggingface.co/nateraw/rare-puppers
Turns out if you do a cute little hack, you can make
nateraw/musicgen-songstarter-v0.2
work on vocal inputs. 👀

Now, you can hum an idea for a song and get a music sample generated with AI 🔥🔥

Give it a try: ➡️
nateraw/singing-songstarter
⬅️

It'll take your voice and try to autotune it (because let's be real, you're no Michael Jackson), then pass it along to the model to condition on the melody. It works surprisingly well!

https://huggingface.co/spaces/nateraw/singing-songstarter
baseball-stadium-foods
Autogenerated by HuggingPics🤗🖼

Create your own image classifier for anything by running the demo.

Report any issues with the demo at the github repo.

Example Images
cotton candy
https://huggingface.co/nateraw/baseball-stadium-foods
Google didn't publish vit-tiny and vit-small model checkpoints on Hugging Face. I converted the weights from the timm repository. This model is used in the same way as ViT-base.

Note that the [safetensors] model requires a torch 2.0 environment.
https://huggingface.co/WinKawaks/vit-small-patch16-224
MobileCLIP: Fast Image-Text Models through Multi-Modal Reinforced Training
MobileCLIP was introduced in MobileCLIP: Fast Image-Text Models through Multi-Modal Reinforced Training (CVPR 2024), by Pavan Kumar Anasosalu Vasu, Hadi Pouransari, Fartash Faghri, Raviteja Vemulapalli, Oncel Tuzel.

This repository contains the MobileCLIP-B (LT) checkpoint for timm.

MobileCLIP Performance Figure

Highlights
Our smallest variant MobileCLIP-S0 obtains similar zero-shot performance as OpenAI's ViT-B/16 model while being 4.8x faster and 2.8x smaller.
MobileCLIP-S2 obtains better average zero-shot performance than SigLIP's ViT-B/16 model while being 2.3x faster, 2.1x smaller, and trained with 3x fewer seen samples.
MobileCLIP-B (LT) attains a zero-shot ImageNet performance of 77.2%, which is significantly better than recent works like DFN and SigLIP with similar architectures, or even OpenAI's ViT-L/14@336.
https://huggingface.co/apple/mobileclip_b_lt_timm
BEiT (base-sized model, fine-tuned on ImageNet-22k)
BEiT model pre-trained in a self-supervised fashion on ImageNet-22k - also called ImageNet-21k (14 million images, 21,841 classes) at resolution 224x224, and fine-tuned on the same dataset at resolution 224x224. It was introduced in the paper BEIT: BERT Pre-Training of Image Transformers by Hangbo Bao, Li Dong and Furu Wei and first released in this repository.

Disclaimer: The team releasing BEiT did not write a model card for this model so this model card has been written by the Hugging Face team.

Model description
The BEiT model is a Vision Transformer (ViT), which is a transformer encoder model (BERT-like). In contrast to the original ViT model, BEiT is pretrained on a large collection of images in a self-supervised fashion, namely ImageNet-21k, at a resolution of 224x224 pixels. The pre-training objective for the model is to predict visual tokens from the encoder of OpenAI's DALL-E's VQ-VAE, based on masked patches. Next, the model was fine-tuned in a supervised fashion on ImageNet (also referred to as ILSVRC2012), a dataset comprising 1 million images and 1,000 classes, also at resolution 224x224.

Images are presented to the model as a sequence of fixed-size patches (resolution 16x16), which are linearly embedded. Contrary to the original ViT models, BEiT models do use relative position embeddings (similar to T5) instead of absolute position embeddings, and perform classification of images by mean-pooling the final hidden states of the patches, instead of placing a linear layer on top of the final hidden state of the [CLS] token.

By pre-training the model, it learns an inner representation of images that can then be used to extract features useful for downstream tasks: if you have a dataset of labeled images for instance, you can train a standard classifier by placing a linear layer on top of the pre-trained encoder. One typically places a linear layer on top of the [CLS] token, as the last hidden state of this token can be seen as a representation of an entire image. Alternatively, one can mean-pool the final hidden states of the patch embeddings, and place a linear layer on top of that.

Intended uses & limitations
You can use the raw model for image classification. See the model hub to look for fine-tuned versions on a task that interests you.
https://huggingface.co/microsoft/beit-base-patch16-224-pt22k-ft22k
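To make the pooling discussion above concrete, here is a hedged sketch of extracting a mean-pooled image representation from the encoder; the image path is a placeholder:

import torch
from PIL import Image
from transformers import BeitImageProcessor, BeitModel

processor = BeitImageProcessor.from_pretrained("microsoft/beit-base-patch16-224-pt22k-ft22k")
model = BeitModel.from_pretrained("microsoft/beit-base-patch16-224-pt22k-ft22k")

image = Image.open("example.jpg").convert("RGB")  # placeholder image path
inputs = processor(images=image, return_tensors="pt")
with torch.no_grad():
    hidden = model(**inputs).last_hidden_state    # shape: (1, 1 + num_patches, hidden_size)
features = hidden[:, 1:, :].mean(dim=1)           # mean-pool the patch tokens, skipping [CLS]
# A linear classification layer can now be trained on top of `features`.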
📣 Introducing Dataset Viber: your chill repo for data collection, annotation and vibe checks! 🎉

I've cooked up Dataset Viber, a set of cool tools designed to make data preparation for AI models easier, more approachable and enjoyable for standalone AI engineers and enthusiasts.

🔧 What Dataset Viber offers:
- CollectorInterface: Lazily collect model interaction data without human annotation
- AnnotatorInterface: Annotate your data with models in the loop
- BulkInterface: Explore data distribution and annotate in bulk
- Embedder: Efficiently embed data with ONNX-optimized speeds

🎯 Key features:
- Supports various tasks for text, chat, and image modalities
- Runs in .ipynb notebooks
- Logs data to local CSV or directly to Hugging Face Hub
- Easy to install via pip: pip install dataset-viber

It's not designed for team collaboration or production use, but rather as a fun and efficient toolkit for individual projects.

Want to give it a try? Check out the repository: https://github.com/davidberenstein1957/dataset-viber/.

I'm excited to hear your feedback and learn how you vibe with your data. Feel free to open an issue or reach out if you have any questions or suggestions!

Some shoutouts:
- Gradio for the amazing backbone
- Daniel van Strien for some initial presentations I did on vibe checks
- Emily Omier for the workshop on structuring GitHub repo READMEs
- Hamel Husain for continually reminding people to look at their data
- Philipp Schmid for his code for ONNX feature-extractors
- Ben Burtenshaw for the first PR