HF-hub - Share and discover more about AI with social posts from the community.
baseball-stadium-foods
Autogenerated by HuggingPics🤗🖼

Create your own image classifier for anything by running the demo.

Report any issues with the demo at the GitHub repo.

Example Images
cotton candy
https://huggingface.co/nateraw/baseball-stadium-foods nateraw/baseball-stadium-foods · Hugging Face
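For anyone who wants to try the classifier outside the demo, here is a minimal sketch using the transformers image-classification pipeline; the image path is a hypothetical local file, any photo will do:

```python
from transformers import pipeline

# Load the fine-tuned HuggingPics classifier from the Hub
classifier = pipeline("image-classification", model="nateraw/baseball-stadium-foods")

# "cotton_candy.jpg" is a hypothetical local image path
print(classifier("cotton_candy.jpg"))
```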
Google didn't publish vit-tiny and vit-small model checkpoints on Hugging Face. I converted the weights from the timm repository. This model is used in the same way as ViT-base.

Note that the safetensors model requires a torch 2.0 environment.
https://huggingface.co/WinKawaks/vit-small-patch16-224 WinKawaks/vit-small-patch16-224 · Hugging Face
MobileCLIP: Fast Image-Text Models through Multi-Modal Reinforced Training
MobileCLIP was introduced in MobileCLIP: Fast Image-Text Models through Multi-Modal Reinforced Training (CVPR 2024), by Pavan Kumar Anasosalu Vasu, Hadi Pouransari, Fartash Faghri, Raviteja Vemulapalli, Oncel Tuzel.

This repository contains the MobileCLIP-B (LT) checkpoint for timm.

MobileCLIP Performance Figure

Highlights
Our smallest variant MobileCLIP-S0 obtains similar zero-shot performance as OpenAI's ViT-B/16 model while being 4.8x faster and 2.8x smaller.
MobileCLIP-S2 obtains better avg zero-shot performance than SigLIP's ViT-B/16 model while being 2.3x faster and 2.1x smaller, and trained on 3x fewer seen samples.
MobileCLIP-B (LT) attains zero-shot ImageNet performance of 77.2%, which is significantly better than recent works like DFN and SigLIP with similar architectures, or even OpenAI's ViT-L/14@336.
https://huggingface.co/apple/mobileclip_b_lt_timm apple/mobileclip_b_lt_timm · Hugging Face
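A minimal zero-shot classification sketch, assuming the checkpoint loads through open_clip's hf-hub integration like other MobileCLIP checkpoints; the image path and label set are placeholders:

```python
import torch
import open_clip
from PIL import Image

# Assumption: the repo is consumable via open_clip's hf-hub scheme
model, preprocess = open_clip.create_model_from_pretrained("hf-hub:apple/mobileclip_b_lt_timm")
tokenizer = open_clip.get_tokenizer("hf-hub:apple/mobileclip_b_lt_timm")

image = preprocess(Image.open("example.jpg")).unsqueeze(0)  # hypothetical local image
text = tokenizer(["a photo of a dog", "a photo of a cat"])   # placeholder labels

with torch.no_grad():
    image_features = model.encode_image(image)
    text_features = model.encode_text(text)
    image_features /= image_features.norm(dim=-1, keepdim=True)
    text_features /= text_features.norm(dim=-1, keepdim=True)
    probs = (100.0 * image_features @ text_features.T).softmax(dim=-1)

print(probs)  # zero-shot label probabilities
```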
BEiT (base-sized model, fine-tuned on ImageNet-22k)
BEiT model pre-trained in a self-supervised fashion on ImageNet-22k - also called ImageNet-21k (14 million images, 21,841 classes) at resolution 224x224, and fine-tuned on the same dataset at resolution 224x224. It was introduced in the paper BEIT: BERT Pre-Training of Image Transformers by Hangbo Bao, Li Dong and Furu Wei and first released in this repository.

Disclaimer: The team releasing BEiT did not write a model card for this model so this model card has been written by the Hugging Face team.

Model description
The BEiT model is a Vision Transformer (ViT), which is a transformer encoder model (BERT-like). In contrast to the original ViT model, BEiT is pretrained on a large collection of images in a self-supervised fashion, namely ImageNet-21k, at a resolution of 224x224 pixels. The pre-training objective for the model is to predict visual tokens from the encoder of OpenAI's DALL-E's VQ-VAE, based on masked patches. Next, the model was fine-tuned in a supervised fashion on ImageNet (also referred to as ILSVRC2012), a dataset comprising 1 million images and 1,000 classes, also at resolution 224x224.

Images are presented to the model as a sequence of fixed-size patches (resolution 16x16), which are linearly embedded. Contrary to the original ViT models, BEiT models do use relative position embeddings (similar to T5) instead of absolute position embeddings, and perform classification of images by mean-pooling the final hidden states of the patches, instead of placing a linear layer on top of the final hidden state of the [CLS] token.

By pre-training the model, it learns an inner representation of images that can then be used to extract features useful for downstream tasks: if you have a dataset of labeled images for instance, you can train a standard classifier by placing a linear layer on top of the pre-trained encoder. One typically places a linear layer on top of the [CLS] token, as the last hidden state of this token can be seen as a representation of an entire image. Alternatively, one can mean-pool the final hidden states of the patch embeddings, and place a linear layer on top of that.

Intended uses & limitations
You can use the raw model for image classification. See the model hub to look for fine-tuned versions on a task that interests you.
https://huggingface.co/microsoft/beit-base-patch16-224-pt22k-ft22k microsoft/beit-base-patch16-224-pt22k-ft22k · Hugging Face
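As a sketch of that use case, the checkpoint can be loaded with the standard transformers image-classification classes; the COCO image URL is just a convenient example input:

```python
from transformers import BeitImageProcessor, BeitForImageClassification
from PIL import Image
import requests

# Any image works; this is the usual COCO example picture
url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)

processor = BeitImageProcessor.from_pretrained("microsoft/beit-base-patch16-224-pt22k-ft22k")
model = BeitForImageClassification.from_pretrained("microsoft/beit-base-patch16-224-pt22k-ft22k")

inputs = processor(images=image, return_tensors="pt")
logits = model(**inputs).logits

# The model predicts one of the 21,841 ImageNet-22k classes
predicted_class_idx = logits.argmax(-1).item()
print("Predicted class:", model.config.id2label[predicted_class_idx])
```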
📣 Introducing Dataset Viber: your chill repo for data collection, annotation and vibe checks! 🎉

I've cooked up Dataset Viber, a set of cool tools designed to make data preparation for AI models easier, more approachable and enjoyable for standalone AI engineers and enthusiasts.

🔧 What Dataset Viber offers:
- CollectorInterface: Lazily collect model interaction data without human annotation
- AnnotatorInterface: Annotate your data with models in the loop
- BulkInterface: Explore data distribution and annotate in bulk
- Embedder: Efficiently embed data with ONNX-optimized speeds

🎯 Key features:
- Supports various tasks for text, chat, and image modalities
- Runs in .ipynb notebooks
- Logs data to local CSV or directly to Hugging Face Hub
- Easy to install via pip: pip install dataset-viber

It's not designed for team collaboration or production use, but rather as a fun and efficient toolkit for individual projects.

Want to give it a try? Check out the repository link https://github.com/davidberenstein1957/dataset-viber/.

I'm excited to hear your feedback and learn how you vibe with your data. Feel free to open an issue or reach out if you have any questions or suggestions!

Some shoutouts:
- Gradio for the amazing backbone
- Daniel van Strien for some initial presentations I did on vibe checks
- Emily Omier for the workshop on structuring GitHub repo READMEs
- Hamel Husain for repeatedly reminding people that they should look at their data
- Philipp Schmid for his code for ONNX feature-extractors
- Ben Burtenshaw for the first PR
Blane187/animalese-py


or you can convert your voice to Animalese with it:
Blane187/animalese_RVC


I'm just bored, so I made the project, lol
Everchanging Quest is out!

It is an LLM-controlled rogue-like in which the LLM gets a Markdown representation of the map and must generate a JSON with the objective to fulfill on the map, as well as the necessary objects and their placements.

Come test it on the Space:
Jofthomas/Everchanging-Quest
Some personal and professional news

I'm writing a book on ML metrics.

Together with Wojtek Kuberski, we’re creating the missing piece of every ML university program and online course: a book solely dedicated to Machine Learning metrics!

The book will cover the following types of metrics:
• Regression
• Classification
• Clustering
• Ranking
• Vision
• Text
• GenAI
• Bias and Fairness

👉 check out the book: https://www.nannyml.com/metrics
𝗭𝗲𝗿𝗼-𝗺𝗮𝘁𝗵 𝗶𝗻𝘁𝗿𝗼 𝘁𝗼 𝗔𝗜 𝗵𝗶𝘀𝘁𝗼𝗿𝘆: 𝗳𝗿𝗼𝗺 𝘁𝗵𝗲 𝟭𝟵𝟱𝟬𝘀 𝘁𝗼 𝘁𝗼𝗱𝗮𝘆'𝘀 𝗟𝗟𝗠𝘀 📖

I wanted to structure my thinking about LLMs by going through their history since the 50s. This history is captivating, with the opposition between Connectionists (Rosenblatt, LeCun) and Symbolists, the first victories of "deep" neural networks, the revolution of Attention...

So I might have gone a bit too far! 😅

📝 I've made a long post summarizing the main stages of building LLMs: neural networks, optimization, backpropagation, attention layers...

And I've made sure to keep it 100% horrible-latex-math-free: the technical stuff is conveyed in graphs only, so it should be accessible to really anyone, even your grandfather (I'm sending it to mine right now).

Read it here in English 👉 https://aymeric-roucher.github.io/brief-history-of-ai/
For the post in French 👉 https://aymeric-roucher.github.io/breve-histoire-de-l-ia/
Fal/AuraFlow-v0.3
is now here with support for different aspect ratios and resolutions (w/h up to 1536px!) and much nicer aesthetics! Make sure to install the latest diffusers to get support for it.
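For reference, a minimal text-to-image sketch, assuming a recent diffusers release that ships AuraFlowPipeline; the prompt and output size below are placeholders:

```python
import torch
from diffusers import AuraFlowPipeline

# Assumption: the latest diffusers exposes AuraFlowPipeline for this checkpoint
pipe = AuraFlowPipeline.from_pretrained("fal/AuraFlow-v0.3", torch_dtype=torch.float16).to("cuda")

image = pipe(
    prompt="a hot air balloon drifting over snowy mountains at sunrise",  # placeholder prompt
    width=1536,            # non-square outputs are the headline feature of v0.3
    height=1024,
    num_inference_steps=50,
).images[0]
image.save("auraflow_v03.png")
```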
As some of you know, I try to convert models to either fp32 or bf16 depending on their size before doing imatrix and quantization.

Today I decided to see if that matters, and the results have me... for lack of a better word, perplexed.

My setup:

Mistral Nemo Instruct 2407
- convert to FP32, calculate imatrix, quantize to Q8_0 and Q4_K_M
- convert to FP16, calculate imatrix, quantize to Q8_0 and Q4_K_M

I calculated the kld base from the FP32 model:
./llama-perplexity -m /models/Mistral-Nemo-Instruct-2407-f32.gguf -f /training_data/wikitext-2-raw/wiki.test.raw --kl-divergence-base /training_data/mistral-nemo-f32.kld -ngl 35 -fa -sm row

then calculated the divergence itself for each like so:
./llama-perplexity -m /models/Mistral-Nemo-Instruct-2407-Q8_0.gguf -f /training_data/wikitext-2-raw/wiki.test.raw --kl-divergence-base /training_data/mistral-nemo-f32.kld --kl-divergence -ngl 50 -fa -sm row

Q4_K_M from fp16 and fp32 were similar, trading blows across statistics, which is odd since I expected fp32 to be strictly better, but it's not.

Q8_0 is where things get weird. Despite each file being a slightly different size, and the sha256sum of course being different, they each get *completely identical* scores, down to 6 decimal places of precision on the statistics.

How is this possible? Is there something I don't understand about llama.cpp that makes it always convert to fp16 before it does quantization? Am I wasting time using FP32/BF16??
https://huggingface.co/posts/bartowski/608656345183499 @bartowski on Hugging Face:
Improved ControlNet!
Now supports dynamic resolution for perfect landscape and portrait outputs. Generate stunning images without distortion—optimized for any aspect ratio!
...
https://huggingface.co/spaces/DamarJati/FLUX.1-DEV-Canny FLUX.1-DEV Canny - a Hugging Face Space by DamarJati
SAM2 Video Predictor
This is a simple demo for video segmentation with SAM2.

Instructions: (read the instructions)

1. Upload your video [MP4, 24 fps]
2. With the 'include' point type selected, click on the object to mask on the first frame
3. Switch to the 'exclude' point type if you want to specify an area to avoid
4. Get Mask!
5. Check the propagation every 15 frames
6. Add a point on the corresponding frame number if any mask needs to be refined
7. If the propagation looks OK at every 15 frames, propagate with "render" to render the final masked video!
8. Hit the Reset button if you want to refresh and start again.
The input video will be processed over 10 seconds only, for demo purposes :)
https://huggingface.co/spaces/fffiloni/SAM2-Video-Predictor SAM2 Video Predictor - a Hugging Face Space by fffiloni
Second day of Composer use
genecyber
First of all, I’m loving composer, I use it in combination with live preview and can see real time rendering of edits also.

Second of all, we really need some sort of internal versioning; being unable to undo is pretty scary. Also, if I add my own versions, composer gets confused about which files it's editing.

I had a filename change work tonight, that was cool.

Mostly I’m struggling to add files to composer context that stick, but it might be that my composer disconnects? If composer gets in a weird state I end up having to restart cursor completely to recover.

Why is the UI not following standards? When minimized, the + makes a new one instead of expanding, and the only way to expand floating is keys+i? At least I can now find my composer instances that I might accidentally lose by clicking plus.

The direction is amazing, and far surpasses any other experiences I’ve tried with collaborative editing multiple files.

Also love the ability to modularize the code when it gets just too much for context windows. I'll create a styles.css, add it to composer context, and ask it to move styles to the CSS file, and so on; this is great. Otherwise Cursor suffers like everyone else with attention when it comes to large files.

These are all over the place I know.
https://forum.cursor.com/t/second-day-of-composer-use/7584
Make Google part of your security team. Join Mandiant and Google Cloud experts online for Google Cloud Security Summit, Thursday, August 22, at 11:30 AM CST, to discover how you can defend your organisation against evolving cyberthreats with intel-driven security operations, a secure-by-design foundation, and AI innovations across the security life cycle.


Register to dive deep into key security topics and new technologies:
https://cloudonair.withgoogle.com/events/summit-apac-security-24?utm_content=invite2&utm_source=cloud_sfdc&utm_medium=email&utm_campaign=FY24-Q3-apac-EXP120-onlineevent-er-dgcsm-security-summit-2024-mc&pref=K&mkt_tok=ODA4LUdKVy0zMTQAAAGU6OHWBoF2Y5Hwwfb2QFOtyxtgmixuo-CGF6NRTRFIkjwshtRL-iLyPcZqVgrMOI8bqjtZOditNJpP6QJl-PDITmFSR8L1dvNKb2vJEg3zPxovd3Vaavw

Start with the opening keynote.

Join Sunil Potti, VP and GM of Google Cloud Security, to explore how AI is enhancing security and helping organisations boost their resilience.


Get to know Gemini for Security.

Check out the latest ways Google AI is transforming cloud security, security operations, and threat intelligence with robust Gemini-powered capabilities.


Gain valuable insights from the 2024 M-Trends report.

Learn about the evolving cyber-threat landscape from Steve D’sa, Regional Leader, Mandiant Consulting. Featuring APAC perspectives and best practice you can apply directly to your security program.

Register yourself and your team today, join five or more sessions and receive a Google Cloud collectible digital badge in recognition of your participation.
Metric Card for CharacTER
CharacTer is a character-level metric inspired by the commonly applied translation edit rate (TER). It is defined as the minimum number of character edits required to adjust a hypothesis, until it completely matches the reference, normalized by the length of the hypothesis sentence. CharacTer calculates the character level edit distance while performing the shift edit on word level. Unlike the strict matching criterion in TER, a hypothesis word is considered to match a reference word and could be shifted, if the edit distance between them is below a threshold value. The Levenshtein distance between the reference and the shifted hypothesis sequence is computed on the character level. In addition, the lengths of hypothesis sequences instead of reference sequences are used for normalizing the edit distance, which effectively counters the issue that shorter translations normally achieve lower TER.
https://huggingface.co/spaces/evaluate-metric/character CharacTER - a Hugging Face Space by evaluate-metric
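A quick usage sketch with the evaluate library, assuming the metric id on the Hub matches the Space name ("character"); the example strings are made up:

```python
import evaluate

# Assumption: the metric is loadable as "character"
character = evaluate.load("character")

results = character.compute(
    predictions=["the cat sat on the mat"],
    references=["the cat sat on the hat"],
)
print(results)  # character-level edit-rate statistics
```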
Metric Card for BLEU
BLEU (Bilingual Evaluation Understudy) is an algorithm for evaluating the quality of text which has been machine-translated from one natural language to another. Quality is considered to be the correspondence between a machine’s output and that of a human: “the closer a machine translation is to a professional human translation, the better it is” – this is the central idea behind BLEU. BLEU was one of the first metrics to claim a high correlation with human judgements of quality, and remains one of the most popular automated and inexpensive metrics.

Scores are calculated for individual translated segments—generally sentences—by comparing them with a set of good quality reference translations. Those scores are then averaged over the whole corpus to reach an estimate of the translation’s overall quality. Neither intelligibility nor grammatical correctness is taken into account.

https://huggingface.co/spaces/evaluate-metric/bleu BLEU - a Hugging Face Space by evaluate-metric
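And a similarly minimal sketch for BLEU via the evaluate library; the example sentences are made up, and each prediction may be scored against several reference translations:

```python
import evaluate

bleu = evaluate.load("bleu")

results = bleu.compute(
    predictions=["the cat is on the mat"],
    references=[["the cat is on the mat", "there is a cat on the mat"]],
)
print(results["bleu"])  # corpus-level BLEU score between 0 and 1
```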