HF-hub

Share and discover more about AI with social posts from the community.

14:27 · Aug 8, 2024 · Thu

PyTorch implementation of the self-compression and differentiable quantization algorithm introduced in the paper “Self-Compressing Neural Networks”.

The algorithm compresses the network dynamically during training: it reduces the size of the weight and activation tensors and the number of bits required to represent the weights.

In short, the network (weights and activations) shrinks as it is being trained without compromising performance, which helps reduce compute and inference cost.
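The quantization step can be sketched in plain Python. This is a minimal illustration, assuming a signed fixed-point quantizer with a bit depth `bits` and an exponent `exponent` per group of weights; the function names are hypothetical and this is not the linked repository's code. In actual training the rounding would need a straight-through estimator so that gradients reach `bits` and `exponent`, which is omitted here.

```python
def quantize(x: float, bits: float, exponent: float) -> float:
    """Snap x to a signed fixed-point grid with step 2**exponent.

    Assumed form: 2**e * round(clamp(x / 2**e, -2**(b-1), 2**(b-1) - 1)).
    """
    if bits <= 0:
        # Zero bits means the weight carries no information and can be dropped.
        return 0.0
    scale = 2.0 ** exponent
    lo = -(2.0 ** (bits - 1))
    hi = 2.0 ** (bits - 1) - 1
    clipped = min(max(x / scale, lo), hi)
    return scale * round(clipped)


def model_size_bits(groups):
    """Total storage in bits for a list of (bits_per_weight, weight_count) groups."""
    return sum(bits * count for bits, count in groups)


# A 4-bit grid with step 0.25 covers the range [-2.0, 1.75].
small = quantize(0.30, bits=4, exponent=-2)
saturated = quantize(10.0, bits=4, exponent=-2)
# Hypothetical layer groups: 100 weights at 4 bits, 50 weights pruned to 0 bits.
size = model_size_bits([(4, 100), (0, 50)])
```

Because the size term is a simple function of the bit depths, it can be added to the training loss so that the optimizer trades accuracy against storage.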

Code: https://github.com/Jaykef/ai-algorithms
Paper: https://arxiv.org/pdf/2301.13142

14:27 · Aug 8, 2024 · Thu

LivePortrait updated to V5

Animal live animation added

All of the main repo's changes and improvements have been added to our modified and improved app

Link: https://patreon.com/posts/107609670

14:27 · Aug 8, 2024 · Thu

✨The STABLE IMAGINE !!✨
🍺 Space: prithivMLmods/STABLE-IMAGINE

↗️ The LoRAs in the Space give good results when used with their appropriate trigger words.
📒 Articles: https://huggingface.co/blog/prithivMLmods/lora-adp-01

Description and Utility Functions
✅ Most likely image generation

14:26 · Aug 8, 2024 · Thu

New smol-vision tutorial dropped: QLoRA fine-tuning IDEFICS3-Llama 8B on VQAv2 🐶

Learn how to efficiently fine-tune the latest IDEFICS3-Llama on visual question answering in this notebook 📖
Fine-tuning notebook: https://github.com/merveenoyan/smol-vision/blob/main/Idefics_FT.ipynb
Resulting model: merve/idefics3llama-vqav2

13:34 · Aug 8, 2024 · Thu

MobileViT (small-sized model)

MobileViT model pre-trained on ImageNet-1k at resolution 256x256. It was introduced in MobileViT: Light-weight, General-purpose, and Mobile-friendly Vision Transformer by Sachin Mehta and Mohammad Rastegari, and first released in this repository. The license used is Apple sample code license.

Disclaimer: The team releasing MobileViT did not write a model card for this model so this model card has been written by the Hugging Face team.

Model description
MobileViT is a light-weight, low latency convolutional neural network that combines MobileNetV2-style layers with a new block that replaces local processing in convolutions with global processing using transformers. As with ViT (Vision Transformer), the image data is converted into flattened patches before it is processed by the transformer layers. Afterwards, the patches are "unflattened" back into feature maps. This allows the MobileViT-block to be placed anywhere inside a CNN. MobileViT does not require any positional embeddings.
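The flatten and "unflatten" steps described above can be sketched as a reversible reshaping. This is a minimal illustration on a single-channel feature map stored as nested lists; the function names are hypothetical, and it shows only the patch bookkeeping, not the transformer layers or the exact tensor layout used by MobileViT.

```python
def to_patches(fmap, p):
    """Split an H x W feature map into non-overlapping p x p patches, each flattened."""
    h, w = len(fmap), len(fmap[0])
    patches = []
    for top in range(0, h, p):
        for left in range(0, w, p):
            patches.append(
                [fmap[top + i][left + j] for i in range(p) for j in range(p)]
            )
    return patches


def from_patches(patches, h, w, p):
    """Inverse of to_patches: place each flattened patch back into an H x W map."""
    fmap = [[None] * w for _ in range(h)]
    per_row = w // p
    for idx, patch in enumerate(patches):
        top, left = (idx // per_row) * p, (idx % per_row) * p
        for k, value in enumerate(patch):
            fmap[top + k // p][left + k % p] = value
    return fmap


# A 4 x 4 map with values 0..15, split into four 2 x 2 patches.
feature_map = [[r * 4 + c for c in range(4)] for r in range(4)]
patches = to_patches(feature_map, 2)
restored = from_patches(patches, 4, 4, 2)
```

Since the round trip restores the original spatial layout, whatever processes the patches in between can sit between ordinary convolutional layers.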

Intended uses & limitations
You can use the raw model for image classification. See the model hub to look for fine-tuned versions on a task that interests you.
https://huggingface.co/apple/mobilevit-small

13:33 · Aug 8, 2024 · Thu

Vision Transformer (base-sized model)
Vision Transformer (ViT) model pre-trained on ImageNet-21k (14 million images, 21,843 classes) at resolution 224x224, and fine-tuned on ImageNet 2012 (1 million images, 1,000 classes) at resolution 224x224. It was introduced in the paper An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale by Dosovitskiy et al. and first released in this repository. However, the weights were converted from the timm repository by Ross Wightman, who already converted the weights from JAX to PyTorch. Credits go to him.
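The "16x16 words" in the title refer to image patches. A small sketch of the resulting arithmetic, assuming square RGB inputs and non-overlapping patches (the function name is hypothetical):

```python
def vit_patch_grid(image_size: int, patch_size: int, channels: int = 3):
    """Return (patches per side, total patches, values per flattened patch)."""
    if image_size % patch_size != 0:
        raise ValueError("image_size must be divisible by patch_size")
    per_side = image_size // patch_size
    return per_side, per_side * per_side, patch_size * patch_size * channels


# 224 x 224 input with 16 x 16 patches, as in vit-base-patch16-224.
per_side, num_patches, patch_dim = vit_patch_grid(224, 16)
# ViT prepends one classification token to the patch sequence.
sequence_length = num_patches + 1
```

For this model that gives a 14 x 14 grid, so the transformer sees 196 patch tokens plus the classification token.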

Disclaimer: The team releasing ViT did not write a model card for this model so this model card has been written by the Hugging Face team.
https://huggingface.co/google/vit-base-patch16-224

13:32 · Aug 8, 2024 · Thu

Returns whether a face belongs to a man or a woman, based on a face image.

See https://www.kaggle.com/code/dima806/man-woman-face-image-detection-vit for more details.

Classification report:

              precision    recall  f1-score   support

         man     0.9898    0.9908    0.9903      7071
       woman     0.9908    0.9898    0.9903      7072

    accuracy                         0.9903     14143
   macro avg     0.9903    0.9903    0.9903     14143
weighted avg     0.9903    0.9903    0.9903     14143
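The summary rows follow from the per-class rows. A short sketch, using only the numbers reported above, of how F1, the macro average and the support-weighted average are computed (the helper names are illustrative):

```python
report = {
    "man": {"precision": 0.9898, "recall": 0.9908, "support": 7071},
    "woman": {"precision": 0.9908, "recall": 0.9898, "support": 7072},
}


def f1(precision: float, recall: float) -> float:
    """F1 is the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


def macro_avg(values):
    """Unweighted mean over classes."""
    return sum(values) / len(values)


def weighted_avg(values, supports):
    """Mean over classes weighted by the number of samples in each class."""
    return sum(v * s for v, s in zip(values, supports)) / sum(supports)


supports = [row["support"] for row in report.values()]
f1_man = f1(report["man"]["precision"], report["man"]["recall"])
macro_precision = macro_avg([row["precision"] for row in report.values()])
weighted_recall = weighted_avg([row["recall"] for row in report.values()], supports)
```

The support-weighted recall is the same quantity as overall accuracy, which is why both rows show 0.9903.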

Kaggle

Man/woman face image detection ViT

Explore and run machine learning code with Kaggle Notebooks | Using data from Biggest gender/face recognition dataset.

13:32 · Aug 8, 2024 · Thu

fashion-images-gender-age-vit-large-patch16-224-in21k-v3
This model is a fine-tuned version of google/vit-large-patch16-224-in21k on the touchtech/fashion-images-gender-age dataset. It achieves the following results on the evaluation set:

Loss: 0.0223
Accuracy: 0.9960

Model description
More information needed

Intended uses & limitations
More information needed

Training and evaluation data
More information needed

Training procedure
Training hyperparameters
The following hyperparameters were used during training:

learning_rate: 2e-05
train_batch_size: 8
eval_batch_size: 8
seed: 1337
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: linear
num_epochs: 5.0
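The `linear` scheduler type decays the learning rate linearly from its initial value to zero over training. A minimal sketch, assuming no warmup (none is listed above) and a hypothetical total of 1,000 optimizer steps, since the card does not give the dataset size:

```python
def linear_lr(step: int, total_steps: int, base_lr: float = 2e-5,
              warmup_steps: int = 0) -> float:
    """Learning rate at a given step: optional linear warmup, then linear decay to 0."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


# total_steps=1000 is an assumption for illustration only.
start_lr = linear_lr(0, total_steps=1000)
midpoint_lr = linear_lr(500, total_steps=1000)
final_lr = linear_lr(1000, total_steps=1000)
```

With these assumptions the rate starts at 2e-05, is halved at the midpoint, and reaches zero at the last step.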

13:32 · Aug 8, 2024 · Thu

Model card for eca_botnext26ts_256.c1_in1k
A BotNet image classification model (with Efficient channel attention, based on ResNeXt architecture). Trained on ImageNet-1k in timm by Ross Wightman.

NOTE: this model did not adhere to any specific paper configuration; it was tuned for reasonable training times and reduced frequency of self-attention blocks.

Recipe details:

Based on ResNet Strikes Back C recipes
SGD (w/ Nesterov) optimizer and AGC (adaptive gradient clipping).
Cosine LR schedule with warmup
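The AGC step in the recipe can be sketched as follows. This is a minimal pure-Python illustration of unit-wise adaptive gradient clipping, treating each row of a weight matrix as one unit; the function names and the `clip_factor` and `eps` values are illustrative assumptions, not the settings used for this model or timm's implementation.

```python
import math


def _norm(vec):
    """Euclidean norm of a list of floats."""
    return math.sqrt(sum(v * v for v in vec))


def agc_clip(weights, grads, clip_factor=0.01, eps=1e-3):
    """Rescale each unit's gradient so its norm is at most clip_factor * weight norm."""
    clipped = []
    for w_row, g_row in zip(weights, grads):
        # eps keeps zero-initialized weights from freezing their own gradients.
        max_norm = clip_factor * max(_norm(w_row), eps)
        g_norm = _norm(g_row)
        if g_norm > max_norm:
            scale = max_norm / g_norm
            clipped.append([g * scale for g in g_row])
        else:
            clipped.append(list(g_row))
    return clipped


# First row: gradient norm 1.0 exceeds 0.01 * 5.0, so it is scaled down.
# Second row: gradient norm 0.001 is within 0.01 * 1.0, so it is unchanged.
clipped = agc_clip([[3.0, 4.0], [1.0, 0.0]], [[0.6, 0.8], [0.001, 0.0]])
```

Unlike a fixed clipping threshold, the limit here scales with the size of the weights being updated.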

This model architecture is implemented using timm's flexible BYOBNet (Bring-Your-Own-Blocks Network).

BYOB (with BYOANet attention specific blocks) allows configuration of:

block / stage layout
block-type interleaving
stem layout
output stride (dilation)
activation and norm layers
channel and spatial / self-attention layers

...and also includes timm features common to many other architectures, including:

stochastic depth
gradient checkpointing
layer-wise LR decay
per-stage feature extraction

https://huggingface.co/timm/eca_botnext26ts_256.c1_in1k

13:31 · Aug 8, 2024 · Thu

Model card for poolformer_m36.sail_in1k
A PoolFormer (a MetaFormer) image classification model. Trained on ImageNet-1k by paper authors.

Model Details
Model Type: Image classification / feature backbone
Model Stats:
Params (M): 56.2
GMACs: 8.8
Activations (M): 22.0
Image size: 224 x 224
Papers:
MetaFormer Is Actually What You Need for Vision: https://arxiv.org/abs/2111.11418
Original: https://github.com/sail-sg/poolformer
Dataset: ImageNet-1k

13:31 · Aug 8, 2024 · Thu

Model card for res2net101_26w_4s.in1k
A Res2Net (Multi-Scale ResNet) image classification model. Trained on ImageNet-1k by paper authors.

Model Details
Model Type: Image classification / feature backbone
Model Stats:
Params (M): 45.2
GMACs: 8.1
Activations (M): 18.4
Image size: 224 x 224
Papers:
Res2Net: A New Multi-scale Backbone Architecture: https://arxiv.org/abs/1904.01169
Dataset: ImageNet-1k
Original: https://github.com/gasvn/Res2Net/

13:30 · Aug 8, 2024 · Thu

Model card for resnet18.fb_swsl_ig1b_ft_in1k
A ResNet-B image classification model.

This model features:

ReLU activations
single layer 7x7 convolution with pooling
1x1 convolution shortcut downsample

Pretrained on Instagram-1B hashtags dataset using semi-weakly supervised learning and fine-tuned on ImageNet-1k by paper authors.

Model Details
Model Type: Image classification / feature backbone
Model Stats:
Params (M): 11.7
GMACs: 1.8
Activations (M): 2.5
Image size: 224 x 224
Papers:
Billion-scale semi-supervised learning for image classification: https://arxiv.org/abs/1905.00546
Deep Residual Learning for Image Recognition: https://arxiv.org/abs/1512.03385
Original: https://github.com/facebookresearch/semi-supervised-ImageNet1K-models

13:30 · Aug 8, 2024 · Thu

Nuno-Tome/simple_image_classifier
Image Classification
This is a simple web app to test and compare different image classifier models using Hugging Face's image-classification pipeline.

From time to time more models will be added to the list. If you want to add a model, please open an issue on the GitHub repository.
https://huggingface.co/spaces/Nuno-Tome/simple_image_classifier

13:27 · Aug 8, 2024 · Thu

ONNX version of MoritzLaurer/deberta-v3-base-zeroshot-v1
This model is a conversion of MoritzLaurer/deberta-v3-base-zeroshot-v1 to ONNX format using the 🤗 Optimum library.

MoritzLaurer/deberta-v3-base-zeroshot-v1 is designed for zero-shot classification: it determines whether a hypothesis is true or not_true given a text, a format based on Natural Language Inference (NLI).
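The NLI-based format can be sketched as follows: each candidate label is turned into a hypothesis, the model scores whether the text makes that hypothesis true, and the best-scoring label wins. This is a minimal illustration; the hypothesis template, the function names and `toy_true_prob` (a keyword-matching stand-in for the real model) are all assumptions, not part of the model card.

```python
def build_hypotheses(labels, template="This example is about {}."):
    """Turn each candidate label into a natural-language hypothesis."""
    return {label: template.format(label) for label in labels}


def zero_shot_classify(text, labels, true_prob):
    """Pick the label whose hypothesis the scorer rates most likely to be true."""
    hypotheses = build_hypotheses(labels)
    scores = {label: true_prob(text, hyp) for label, hyp in hypotheses.items()}
    best = max(scores, key=scores.get)
    return best, scores


def toy_true_prob(text, hypothesis):
    """Hypothetical stand-in for the model: high score if the topic word appears."""
    topic = hypothesis.rstrip(".").split()[-1].lower()
    return 0.9 if topic in text.lower() else 0.1


label, scores = zero_shot_classify(
    "The match ended with a late football goal.",
    ["politics", "football"],
    toy_true_prob,
)
```

In real use, `toy_true_prob` would be replaced by a call that runs the text and hypothesis through the model and returns the probability of the true class.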

https://huggingface.co/protectai/deberta-v3-base-zeroshot-v1-onnx
