HF-hub - Share and discover more about AI with social posts from the community.huggingface/OpenAi
Share and discover more about AI with social posts from the community.huggingface/OpenAi
MobileViT (small-sized model)

MobileViT model pre-trained on ImageNet-1k at resolution 256x256. It was introduced in MobileViT: Light-weight, General-purpose, and Mobile-friendly Vision Transformer by Sachin Mehta and Mohammad Rastegari, and first released in this repository. The license used is Apple sample code license.

Disclaimer: The team releasing MobileViT did not write a model card for this model so this model card has been written by the Hugging Face team.

Model description
MobileViT is a light-weight, low latency convolutional neural network that combines MobileNetV2-style layers with a new block that replaces local processing in convolutions with global processing using transformers. As with ViT (Vision Transformer), the image data is converted into flattened patches before it is processed by the transformer layers. Afterwards, the patches are "unflattened" back into feature maps. This allows the MobileViT-block to be placed anywhere inside a CNN. MobileViT does not require any positional embeddings.

Intended uses & limitations
You can use the raw model for image classification. See the model hub to look for fine-tuned versions on a task that interests you.
https://huggingface.co/apple/mobilevit-small apple/mobilevit-small · Hugging Face
Vision Transformer (base-sized model)
Vision Transformer (ViT) model pre-trained on ImageNet-21k (14 million images, 21,843 classes) at resolution 224x224, and fine-tuned on ImageNet 2012 (1 million images, 1,000 classes) at resolution 224x224. It was introduced in the paper An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale by Dosovitskiy et al. and first released in this repository. However, the weights were converted from the timm repository by Ross Wightman, who already converted the weights from JAX to PyTorch. Credits go to him.

Disclaimer: The team releasing ViT did not write a model card for this model so this model card has been written by the Hugging Face team.
https://huggingface.co/google/vit-base-patch16-224 google/vit-base-patch16-224 · Hugging Face
Returns whether the face belongs to man or woman based on face image.

See https://www.kaggle.com/code/dima806/man-woman-face-image-detection-vit for more details.

Classification report:

precision recall f1-score support

man 0.9898 0.9908 0.9903 7071
woman 0.9908 0.9898 0.9903 7072

accuracy 0.9903 14143
macro avg 0.9903 0.9903 0.9903 14143
weighted avg 0.9903 0.9903 0.9903 14143
fashion-images-gender-age-vit-large-patch16-224-in21k-v3
This model is a fine-tuned version of google/vit-large-patch16-224-in21k on the touchtech/fashion-images-gender-age dataset. It achieves the following results on the evaluation set:

Loss: 0.0223
Accuracy: 0.9960
Model description
More information needed

Intended uses & limitations
More information needed

Training and evaluation data
More information needed

Training procedure
Training hyperparameters
The following hyperparameters were used during training:

learning_rate: 2e-05
train_batch_size: 8
eval_batch_size: 8
seed: 1337
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: linear
num_epochs: 5.0
Model card for eca_botnext26ts_256.c1_in1k
A BotNet image classification model (with Efficient channel attention, based on ResNeXt architecture). Trained on ImageNet-1k in timm by Ross Wightman.

NOTE: this model did not adhere to any specific paper configuration, it was tuned for reasonable training times and reduced frequency of self-attention blocks.

Recipe details:

Based on ResNet Strikes Back C recipes
SGD (w/ Nesterov) optimizer and AGC (adaptive gradient clipping).
Cosine LR schedule with warmup
This model architecture is implemented using timm's flexible BYOBNet (Bring-Your-Own-Blocks Network).

BYOB (with BYOANet attention specific blocks) allows configuration of:

block / stage layout
block-type interleaving
stem layout
output stride (dilation)
activation and norm layers
channel and spatial / self-attention layers
...and also includes timm features common to many other architectures, including:

stochastic depth
gradient checkpointing
layer-wise LR decay
per-stage feature extraction

https://huggingface.co/timm/eca_botnext26ts_256.c1_in1k timm/eca_botnext26ts_256.c1_in1k · Hugging Face
Model card for poolformer_m36.sail_in1k
A PoolFormer (a MetaFormer) image classification model. Trained on ImageNet-1k by paper authors.

Model Details
Model Type: Image classification / feature backbone
Model Stats:
Params (M): 56.2
GMACs: 8.8
Activations (M): 22.0
Image size: 224 x 224
Papers:
MetaFormer Is Actually What You Need for Vision: https://arxiv.org/abs/2210.13452
Original: https://github.com/sail-sg/poolformer
Dataset: ImageNet-1k
Model card for res2net101_26w_4s.in1k
A Res2Net (Multi-Scale ResNet) image classification model. Trained on ImageNet-1k by paper authors.

Model Details
Model Type: Image classification / feature backbone
Model Stats:
Params (M): 45.2
GMACs: 8.1
Activations (M): 18.4
Image size: 224 x 224
Papers:
Res2Net: A New Multi-scale Backbone Architecture: https://arxiv.org/abs/1904.01169
Dataset: ImageNet-1k
Original: https://github.com/gasvn/Res2Net/
Model card for resnet18.fb_swsl_ig1b_ft_in1k
A ResNet-B image classification model.

This model features:

ReLU activations
single layer 7x7 convolution with pooling
1x1 convolution shortcut downsample
Pretrained on Instagram-1B hashtags dataset using semi-weakly supervised learning and fine-tuned on ImageNet-1k by paper authors.

Model Details
Model Type: Image classification / feature backbone
Model Stats:
Params (M): 11.7
GMACs: 1.8
Activations (M): 2.5
Image size: 224 x 224
Papers:
Billion-scale semi-supervised learning for image classification: https://arxiv.org/abs/1905.00546
Deep Residual Learning for Image Recognition: https://arxiv.org/abs/1512.03385
Original: https://github.com/facebookresearch/semi-supervised-ImageNet1K-models
Nuno-Tome /simple_image_classifier
Image Classification
This is a simple web app to test and compare different image classifier models using Hugging Face's image-classification pipeline.

From time to time more models will be added to the list. If you want to add a model, please open an issue on the GitHub repository.https://huggingface.co/spaces/Nuno-Tome/simple_image_classifier Simple Image Classifier - a Hugging Face Space by Nuno-Tome
ONNX version of MoritzLaurer/deberta-v3-base-zeroshot-v1
This model is a conversion of MoritzLaurer/deberta-v3-base-zeroshot-v1 to ONNX format using the 🤗 Optimum library.

MoritzLaurer/deberta-v3-large-zeroshot-v1 is designed for zero-shot classification, capable of determining whether a hypothesis is true or not_true based on a text, a format based on Natural Language Inference (NLI).

https://huggingface.co/protectai/deberta-v3-base-zeroshot-v1-onnx protectai/deberta-v3-base-zeroshot-v1-onnx · Hugging Face
76k Downloads Model Card for deberta-v3-base-prompt-injection
There is a newer version of the model - protectai/deberta-v3-base-prompt-injection-v2.

This model is a fine-tuned version of microsoft/deberta-v3-base on multiple combined datasets of prompt injections and normal prompts.

It aims to identify prompt injections, classifying inputs into two categories: 0 for no injection and 1 for injection detected.

It achieves the following results on the evaluation set:

Loss: 0.0010
Accuracy: 0.9999
Recall: 0.9997
Precision: 0.9998
F1: 0.9998
Model details
Fine-tuned by: Laiyer.ai
Model type: deberta-v3
Language(s) (NLP): English
License: Apache license 2.0
Finetuned from model: microsoft/deberta-v3-base
Intended Uses & Limitations
It aims to identify prompt injections, classifying inputs into two categories: 0 for no injection and 1 for injection detected.

The model's performance is dependent on the nature and quality of the training data. It might not perform well on text styles or topics not represented in the training set.

How to Get Started with the Model
Transformers
from transformers import AutoTokenizer, AutoModelForSequenceClassification, pipeline
import torch

tokenizer = AutoTokenizer.from_pretrained("ProtectAI/deberta-v3-base-prompt-injection")
model = AutoModelForSequenceClassification.from_pretrained("ProtectAI/deberta-v3-base-prompt-injection")

classifier = pipeline(
"text-classification",
model=model,
tokenizer=tokenizer,
truncation=True,
max_length=512,
device=torch.device("cuda" if torch.cuda.is_available() else "cpu"),
)

print(classifier("Your prompt injection is here"))
https://huggingface.co/protectai/deberta-v3-base-prompt-injection protectai/deberta-v3-base-prompt-injection · Hugging Face
Model Card for deberta-v3-base-prompt-injection-v2
This model is a fine-tuned version of microsoft/deberta-v3-base specifically developed to detect and classify prompt injection attacks which can manipulate language models into producing unintended outputs.

Introduction
Prompt injection attacks manipulate language models by inserting or altering prompts to trigger harmful or unintended responses. The deberta-v3-base-prompt-injection-v2 model is designed to enhance security in language model applications by detecting these malicious interventions.

Model Details
Fine-tuned by: Protect AI
Model type: deberta-v3-base
Language(s) (NLP): English
License: Apache License 2.0
Finetuned from model: microsoft/deberta-v3-base
Intended Uses
This model classifies inputs into benign (0) and injection-detected (1).

Limitations
deberta-v3-base-prompt-injection-v2 is highly accurate in identifying prompt injections in English. It does not detect jailbreak attacks or handle non-English prompts, which may limit its applicability in diverse linguistic environments or against advanced adversarial techniques.

Additionally, we do not recommend using this scanner for system prompts, as it produces false-positives.
https://huggingface.co/protectai/deberta-v3-base-prompt-injection-v2 protectai/deberta-v3-base-prompt-injection-v2 · Hugging Face
Clip-Roco_Version2_v2
This model was trained from scratch on an unknown dataset. It achieves the following results on the evaluation set:

Loss: 0.8034
Model description
More information needed

Intended uses & limitations
More information needed

Training and evaluation data
More information needed

Training procedure
Training hyperparameters
The following hyperparameters were used during training:

learning_rate: 5e-05
train_batch_size: 3
eval_batch_size: 3
seed: 2
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: cosine
lr_scheduler_warmup_steps: 5
num_epochs: 5

https://huggingface.co/MohamedAhmedAE/Clip-Roco_Version2_v2 MohamedAhmedAE/Clip-Roco_Version2_v2 · Hugging Face
Model tree for hugging-quants/Meta-Llama-3.1-405B-BNB-NF4-BF16
Model Information
The Meta Llama 3.1 collection of multilingual large language models (LLMs) is a collection of pretrained and instruction tuned generative models in 8B, 70B and 405B sizes (text in/text out). The Llama 3.1 instruction tuned text only models (8B, 70B, 405B) are optimized for multilingual dialogue use cases and outperform many of the available open source and closed chat models on common industry benchmarks.

This repository contains meta-llama/Meta-Llama-3.1-405B quantized using bitsandbytes from BF16 down to NF4 with a block size of 64, and storage type torch.bfloat16.

Model Usage
In order to run the inference with Llama 3.1 405B BNB in NF4, around 220 GiB of VRAM are needed only for loading the model checkpoint, without including the KV cache or the CUDA graphs, meaning that there should be a bit over that VRAM available.

In order to use the current quantized model, support is offered for different solutions as transformers, or text-generation-inference.

🤗 transformers
In order to run the inference with Llama 3.1 405B BNB in NF4, both torch and bitsandbytes need to be installed as:

pip install "torch>=2.0.0" bitsandbytes --upgrade

Then, the latest version of transformers need to be installed, being 4.43.0 or higher, as:

pip install "transformers[accelerate]>=4.43.0" --upgrade

To run the inference on top of Llama 3.1 405B BNB in NF4 precision, the model can be instantiated as any other causal language modeling model via AutoModelForCausalLM and run the inference normally.https://huggingface.co/hugging-quants/Meta-Llama-3.1-405B-BNB-NF4-BF16 hugging-quants/Meta-Llama-3.1-405B-BNB-NF4-BF16 · Hugging Face
2024 The Hugging-quants/Meta-Llama-3.1-405B-BNB-NF4
Model Information
The Meta Llama 3.1 collection of multilingual large language models (LLMs) is a collection of pretrained and instruction tuned generative models in 8B, 70B and 405B sizes (text in/text out). The Llama 3.1 instruction tuned text only models (8B, 70B, 405B) are optimized for multilingual dialogue use cases and outperform many of the available open source and closed chat models on common industry benchmarks.

This repository contains meta-llama/Meta-Llama-3.1-405B quantized using bitsandbytes from BF16 down to NF4 with a block size of 64.

🤗 transformers
In order to run the inference with Llama 3.1 405B BNB in NF4, both torch and bitsandbytes need to be installed as:

pip install "torch>=2.0.0" bitsandbytes --upgrade

Then, the latest version of transformers need to be installed, being 4.43.0 or higher, as:

pip install "transformers[accelerate]>=4.43.0" --upgrade

To run the inference on top of Llama 3.1 405B BNB in NF4 precision, the model can be instantiated as any other causal language modeling model via AutoModelForCausalLM and run the inference normally.https://huggingface.co/hugging-quants/Meta-Llama-3.1-405B-BNB-NF4 hugging-quants/Meta-Llama-3.1-405B-BNB-NF4 · Hugging Face
Vintern-1B ❄️ (Viet-InternVL2-1B) [🤗 HF Demo] - The LLaVA 🌋 Challenger
We are excited to introduce Vintern-1B the Vietnamese 🇻🇳 multimodal model that combines the advanced Vietnamese language model Qwen2-0.5B-Instruct with the latest visual model, InternViT-300M-448px, CVPR 2024. This model excels in tasks such as OCR-VQA, Doc-VQA, and Chart-VQA,... With only 1 billion parameters, it is 4096 context length finetuned from the InternVL2-1B model on over 1.5 million specialized image-question-answer pairs for optical character recognition 🔍, text recognition 🔤, document extraction 📑, and more. The model can be integrated into various on-device applications 📱, demonstrating its versatility and robust capabilities.

Vintern-1B is a multimodal large language model series, featuring models of various sizes. For each size, we release instruction-tuned models optimized for multimodal tasks. Vintern-1B consists of InternViT-300M-448px, an MLP projector, and Qwen2-0.5B-Instruct.

Training details 📚
The fine-tuning dataset was meticulously sampled in part from the following datasets:

Viet-OCR-VQA
Viet-Doc-VQA
Viet-Doc-VQA-II
Benchmarks 📈
Since there are still many different metrics that need to be tested, we chose a quick and simple metric first to guide the development of our model. Our metric is inspired by Lavy's paper. For the time being, we are using GPT-4 to evaluate the quality of answers on two datasets: OpenViVQA and ViTextVQA. Detailed results can be found at the provided Here. The inputs are images, questions, labels, and predicted answers. The model will return a score from 0 to 10 for the corresponding answer quality. The results table is shown below.https://huggingface.co/5CD-AI/Viet-InternVL2-1B 5CD-AI/Viet-InternVL2-1B · Hugging Face
merve/sam2-hiera-small
SAM2-Hiera-small
This repository contains small variant of SAM2 model. SAM2 is the state-of-the-art mask generation model released by Meta.

Usage
You can use it like below. First install packaged version of SAM2.
pip install samv2 huggingface_hub

Each model requires different classes to infer.
For prompting:
Rred by: https://huggingface.co/merve/sam2-hiera-small merve/sam2-hiera-small · Hugging Face