Share and discover more about AI with social posts from the community.
A GPT-4V Level MLLM for Single Image, Multi Image and Video on Your Phone
GitHub | Demo

MiniCPM-V 2.6
MiniCPM-V 2.6 is the latest and most capable model in the MiniCPM-V series. The model is built on SigLip-400M and Qwen2-7B with a total of 8B parameters. It exhibits a significant performance improvement over MiniCPM-Llama3-V 2.5, and introduces new features for multi-image and video understanding. Notable features of MiniCPM-V 2.6 are described in the GitHub repo: https://github.com/OpenBMB/MiniCPM-V
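If you want to try it locally, a minimal sketch along the lines of the model card's quickstart might look like this (the repo id, the trust_remote_code chat API, and the message format are assumptions; check the model card for the authoritative snippet):

python

# Minimal sketch, assuming the trust_remote_code chat API from the model card.
import torch
from PIL import Image
from transformers import AutoModel, AutoTokenizer

model_id = "openbmb/MiniCPM-V-2_6"  # repo id assumed from the release
model = AutoModel.from_pretrained(
    model_id, trust_remote_code=True, torch_dtype=torch.bfloat16
).eval().cuda()
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)

image = Image.open("example.jpg").convert("RGB")
msgs = [{"role": "user", "content": [image, "Describe this image."]}]

# model.chat is a custom method exposed via trust_remote_code (assumption).
answer = model.chat(image=None, msgs=msgs, tokenizer=tokenizer)
print(answer)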
If you are interested in Knowledge Graphs, I invented all of this a year ago. It is an Encoder/Decoder that works with Knowledge Graphs. I am glad the world finally realizes this is useful a year later. I tried to tell you. I have not licensed any of the math. I own all of it. I do not have any plans to ever enforce the licensing, but I like holding onto it.

https://huggingface.co/blog/TuringsSolutions/pfafresearch Probabilistic Fractal Activation Function (P-FAF) and Its Advantages Over Traditional Word Vectorization
FalconMamba 7B - a new model from TII (Technology Innovation Institute) is out!

- Blogpost: https://huggingface.co/blog/falconmamba
- Link to collection:
tiiuae/falconmamba-7b-66b9a580324dd1598b0f6d4a

- Link to playground:
tiiuae/falcon-mamba-playground
Announcements and New Features
- 🥳 We just launched the Open Source survey: https://hf.co/oss-survey. Feel free to provide feedback and shape the future of our Open Source ecosystem. You can even get a special GitHub badge!
- New 🤗 Transformers documentation! It has dark mode, new style, quick search and more https://hf.co/docs/transformers
- 🤖 RL at HF!? Last week we published Snowball Fight, a Unity ML-Agents DRL environment hosted in the Hub! Try it here: https://bit.ly/3FYhchD
- we also have two new channels! 🎉
- and an accompanying blog post https://huggingface.co/blog/snowball-fight
- New blog post! Getting Started with Hugging Face Transformers for IPUs with Optimum https://hf.co/blog/graphcore-getting-started

Events
Many events coming soon!
- Dec 8: On Sentiments and Biases. @Merve Noyan and Vincent from Rasa will talk about various challenges in NLP.
https://discord.gg/2ajRMS9N?event=917383152144637982
- Dec 8 & 9: HF will be at The AI Summit in NY. If you're around you should visit! https://newyork.theaisummit.com/
- Dec 10: @lewtun and @Merve Noyan will be answering your questions in a Question Answering task special workshop! https://discord.gg/aKQTbg8d?event=917368738616057896
-🔊 Dec 14: An audio study group is forming at ⁠audio-discuss! The first meetup will happen next week! Join if you want to learn about Automatic Speech Recognition, TTS, and more.
Announcements and New Features
- Training's energy consumption and CO2 emissions can now be included in model repos https://twitter.com/julien_c/status/1461701986886336516 🌏
- New improved token system https://twitter.com/SimonBrandeis/status/1461389412621819908?s=20 🔒
- Summary of talks from the course event https://huggingface.co/course/event/1?fw=pt with some very nice visuals 🎨
- If you missed it, Facebook XLS-R is an audio model pretrained on 128 spoken languages. We wrote a guide about fine-tuning it for multilingual ASR https://huggingface.co/blog/fine-tune-xlsr-wav2vec2 🎙
- New 3 part video series on how to train a vision transformer with SageMaker https://twitter.com/julsimon/status/1463934344293236741
- Building a dataset from images stored in S3 https://youtu.be/jalopOoBL5M
- Train with Transformers https://youtu.be/iiw9dNG7JcU
- Train with PyTorch Lightning https://youtu.be/rjYV0kKHjBA
- New blog post about accelerating distributed training with Intel technologies 🔥 https://huggingface.co/blog/accelerating-pytorch

Upcoming Events: 🥳
- Nov 30: Implementing DietNeRF with JAX and Flax. Learn about NeRF, JAX+Flax, 3D reconstruction, HF Spaces, and more. See you there: https://www.youtube.com/watch?v=A9iefUXkvQU
- Dec 4: We have a nice talk about Spaces at PyCon Indonesia https://pycon.id/.
- Dec 8: On Sentiments & Biases with @Merve and Vincent Warmerdam from Rasa. https://twitter.com/mervenoyann/status/1464162219357360135
- Dec 8: Accelerating Transformers Down to 1ms - To Infinity and Beyond! by Julien Chaumond at The AI Summit https://newyork.theaisummit.com/
Our course community event is going full speed right now! Check out the course channels to see some very cool projects going on 😎

Announcements and New Features
- New Activity Feed! If you log in, your home screen will show an activity feed and trending repos. Check it out 🔥
- Facebook/Meta just released XLS-R, a model pretrained on 128 spoken languages 🌍. Try out "All-to-All" Speech Translation https://huggingface.co/spaces/facebook/XLS-R-2B-22-16 and read about fine-tuning it in https://huggingface.co/blog/fine-tune-xlsr-wav2vec2.
- Korean GPT is now in the Hub https://huggingface.co/kakaobrain/kogpt 🇰🇷

Upcoming Events: 🥳
- Nov 30: Implementing DietNeRF with JAX and Flax www.youtube.com/watch?v=A9iefUXkvQU

Previous events:
- All course talks playlist: https://www.youtube.com/playlist?list=PLo2EIpI_JMQvcXKx5RFReyg6Qd2UICAif
- Search Like You Mean It: Semantic Search with NLP and a Vector Database https://hubs.ly/H0_r91Q0
Releasing HQQ Llama-3.1-70b 4-bit quantized version! Check it out at
mobiuslabsgmbh/Llama-3.1-70b-instruct_4bitgs64_hqq.

Achieves 99% of the base model performance across various benchmarks! Details in the model card.
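For reference, HQQ model cards usually show loading the pre-quantized weights with the hqq library; a rough sketch under that assumption (class names and arguments may differ, so follow the model card's own snippet):

python

# Rough sketch, assuming the hqq library's from_quantized loader.
from hqq.engine.hf import HQQModelForCausalLM, AutoTokenizer

model_id = "mobiuslabsgmbh/Llama-3.1-70b-instruct_4bitgs64_hqq"
model = HQQModelForCausalLM.from_quantized(model_id, device="cuda")
tokenizer = AutoTokenizer.from_pretrained(model_id)

inputs = tokenizer("What is 2 + 2?", return_tensors="pt").to("cuda")
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))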
A new reward modelling dataset:
Avelina/UltraSteer-v0


UltraSteer-V0 is a massive collection of single- and multi-turn dialogue with fine-grained reward labels produced by Nvidia's
nvidia/Llama2-13B-SteerLM-RM
reward model. We have a total of 2.3M labelled sequences taken from high-quality datasets, comprising 2.8M labelled turns, each annotated with 9 attributes produced as-is by the reward model.

This is still very much an early version of the dataset (but it's fully usable!) and an updated version will be on the way with a full paper.

I would really appreciate it if people could take a look at the dataset and suggest any improvements (e.g. more data sources, different cleaning approaches, different label schema, etc.) in the community section.
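If you want to have a quick look before giving feedback, here is a minimal sketch with the datasets library (the split name is an assumption; check the dataset card for the actual configurations):

python

# Quick inspection sketch; the "train" split name is an assumption.
from datasets import load_dataset

ds = load_dataset("Avelina/UltraSteer-v0", split="train")
print(ds)        # features and number of rows
print(ds[0])     # one labelled sequence with its attribute scores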
I'm excited to announce that Transformers.js V3 is finally available on NPM! 🔥 State-of-the-art Machine Learning for the web, now with WebGPU support! 🤯⚡️

Install it from NPM with:
npm i @huggingface/transformers

or via CDN, for example: https://v2.scrimba.com/s0lmm0qh1q

Segment Anything demo:
webml-community/segment-anything-webgpu
Welcome FalconMamba 7B - permissively licensed, State Space Model trained on 5.5T tokens, near infinite sequential pre-fill! 🔥

> Base model only - scores higher than Llama 3.1 8B on ARC, GSM8K & TruthfulQA
> Also, beats L3.1 8B on MUSR and GPQA
> Trained on 256 8x H100s on AWS using 3D parallelism w/ ZeRO (took 2 months)
> Trained on 5.5 T tokens (4096 -> 8192 ctx len)
> Uses the same tokenizer as Falcon 7B & 11B
> Released under the Apache 2.0 license w/ an acceptable use policy
> Integrated w/ transformers 🤗

Kudos to TII for a brilliant base model! Now go fine-tune it y'all!
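A minimal sketch for trying the base model with transformers (the repo id tiiuae/falcon-mamba-7b is assumed from the release; prompt and generation settings are illustrative):

python

# Minimal sketch for sampling from the base model with transformers.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "tiiuae/falcon-mamba-7b"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

inputs = tokenizer("The Technology Innovation Institute is", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))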
How to set up and use the HuggingFace Transformers library
Setting up and using the HuggingFace Transformers library involves several steps. Below is a detailed guide to help you get started:

Step 1: Install the Library
First, you need to install the HuggingFace Transformers library. You can do this using pip:

bash

pip install transformers
Step 2: Import the Necessary Modules
Once the library is installed, you can import the necessary modules in your Python script or Jupyter notebook:

python

from transformers import pipeline
Step 3: Initialize a Pipeline
HuggingFace Transformers provides a high-level API called pipeline that simplifies the process of using pre-trained models for various tasks such as text classification, token classification, question answering, etc.

Here’s how you can initialize a pipeline for a specific task:

python

# Example for text classification
classifier = pipeline('text-classification')

# Example for question answering
question_answerer = pipeline('question-answering')
Step 4: Use the Pipeline
Once the pipeline is initialized, you can use it to perform the desired task. Here are examples for text classification and question answering:

Text Classification
python

result = classifier("This is an example sentence for classification.")
print(result)
Question Answering
python

question = "What is the capital of France?"
context = "The capital of France is Paris."
result = question_answerer(question=question, context=context)
print(result)
Step 5: Fine-Tuning a Model (Optional)
If you need to fine-tune a pre-trained model on your own dataset, you can use the Trainer API provided by the Transformers library. Here’s a simplified example:

Load a Pre-trained Model and Tokenizer:

python

from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_name = "distilbert-base-uncased"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)
Prepare Your Dataset:

You need to prepare your dataset in a format that the Trainer can use. This typically involves tokenizing your text data.
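For example, here is a minimal sketch using the datasets library and the tokenizer loaded in the previous step; the IMDB dataset, the "text" column name, and the subset sizes are illustrative choices, not requirements. The resulting train_dataset and eval_dataset are used by the Trainer below.

python

from datasets import load_dataset

# Illustrative example: tokenize a small slice of the IMDB reviews dataset.
raw_datasets = load_dataset("imdb")

def tokenize_function(examples):
    return tokenizer(examples["text"], truncation=True, padding="max_length", max_length=256)

tokenized = raw_datasets.map(tokenize_function, batched=True)

# Small subsets keep the example fast; use the full splits for real training.
train_dataset = tokenized["train"].shuffle(seed=42).select(range(2000))
eval_dataset = tokenized["test"].shuffle(seed=42).select(range(500))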

Initialize the Trainer:

python

from transformers import Trainer, TrainingArguments

training_args = TrainingArguments(
    output_dir="./results",
    evaluation_strategy="epoch",
    learning_rate=2e-5,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    num_train_epochs=3,
    weight_decay=0.01,
)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
)
Train the Model:

python

trainer.train()
Conclusion
The HuggingFace Transformers library provides a powerful and flexible way to work with pre-trained models for a variety of NLP tasks. By following the steps above, you can set up and use the library effectively. For more detailed information and advanced usage, refer to the official documentation.
Fuck yeah! Moshi by
@kyutai_labs
just owned the stage! 🇪🇺/acc.

Architecture
1. 7B Multimodal LM (speech in, speech out)
2. 2 channel I/O - Streaming LM constantly generates text tokens as well as audio codecs (tunable)
3. Achieves 160ms latency (with a Real-Time Factor of 2)
4. The base text language model is a 7B (trained from scratch) - Helium 7B
5. Helium 7B is then jointly trained w/ text and audio codecs
6. The speech codec is based on Mimi (their in-house audio compression model)
7. Mimi is a VQ-VAE capable of a 300x compression factor - trained on both semantic and acoustic information
8. The Text-to-Speech engine supports 70 different emotions and styles like whispering, accents, personas, etc.

Training/ RLHF
1. The model is fine-tuned on 100K transcripts generated by Helium itself.
2. These transcripts are highly detailed, heavily annotated with emotion and style, and conversational.
3. Text to Speech Engine is further fine-tuned on 20 hours of audio recorded by Alice and licensed.
4. The model can be fine-tuned with less than 30 minutes of audio.
5. Safety: generated audio is watermarked (possibly w/ audioseal) and indexed in a database
6. Trained on Scaleway cluster of 1000 H100 GPUs

Inference
1. The deployed demo model is capable of bs=2 at 24GB VRAM (hosted on Scaleway and Hugging Face)
2. Model is capable of 4-bit and 8-bit quantisation
3. Works across backends - CUDA, Metal, CPU
4. Inference code optimised with Rust
5. Further savings to be made with better KV Caching, prompt caching, etc.

Future plans
1. Short-term: technical report and open model releases.
2. Open model releases would include the inference codebase, the 7B model, the audio codec and the full optimised stack.
3. Scale the model / refine based on feedback; expect Moshi 1.1, 1.2, 2.0
4. License as permissive as they can be (yet to be decided)

Just 8 team members put all of this together! 🔥

After using it IRL, it feels magical to have such a quick response. It opens so many avenues: research assistance, brainstorming/Steelman discussion points, language learning, and more importantly, it's on-device with the flexibility to use it however you want!

Hats off to Kyutai and the team for shipping a version that *just* works and is out in public 🫡

Your turn, Open AI! ;)
A 7B LLM, Code Execution, and synthetic data are all you need!

NuminaMath 7B is a task-specific LLM that can solve complex math problems better than most high school students! It uses tool-integrated reasoning to solve problems by applying Chain-of-Thought reasoning and Python REPLs in an agentic flow with self-healing.

NuminaMath 7B TIR solves math problems by:
1. Generating a Chain of Thought reasoning on how to approach the problem.
2. Translating the CoT into Python Code.
3. Executing the Python code in a REPL.
4. If it fails, it tries to self-heal, repeating steps 1-3 using the wrong output.
5. If it succeeds, it generates a nice response with the result.

Model TL;DR:
> Fine-tuned from deepseek-math-7b-base
> Won the first progress prize in the AI Math Olympiad (AIMO)
> Built a large synthetic dataset following ToRA paper
> Trained in two stages using Supervised Fine-Tuning on the Hugging Face cluster
> Utilizes tool-integrated reasoning with Python REPL
> Available on
@huggingface
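A rough sketch for sampling a tool-integrated-reasoning solution with transformers is below; the repo id AI-MO/NuminaMath-7B-TIR is an assumption, and the REPL execution / self-healing loop from steps 3-5 above is not shown.

python

# Rough sketch; only generates the CoT + code, without executing it in a REPL.
import torch
from transformers import pipeline

pipe = pipeline(
    "text-generation",
    model="AI-MO/NuminaMath-7B-TIR",  # repo id is an assumption
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

messages = [{"role": "user", "content": "What is the remainder when 2^10 is divided by 7?"}]
out = pipe(messages, max_new_tokens=512)
print(out[0]["generated_text"])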
Stitching / Blending / Sharpening

(I have created 3 spaces, might be useful for some people)

Stitching -
gokaygokay/Stitching

Blending -
gokaygokay/Blending

Sharpening -
gokaygokay/Sharpening
https://cdn-uploads.huggingface.co/production/uploads/630899601dd1e3075d975785/0KYk2Av5ZRyEYioqJW4oj.png
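The Spaces can also be called programmatically with gradio_client; the endpoint names and parameters differ per Space, so inspect them first, as in the sketch below.

python

# Sketch: connect to one of the Spaces and list its API endpoints.
from gradio_client import Client

client = Client("gokaygokay/Sharpening")
client.view_api()  # prints the available endpoints and their parameters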
Here is an AI Puzzle!
When you solve it just use a 😎 emoji.
NO SPOILERS
A similar puzzle might have pictures with hidden meanings of summer, winter, fall, and spring, where the answer would be "seasons".

It's a little dated now (almost a year old), so the bottom right might be tough.

Thanks to @johko for the encouragement to post!
Build an agent with tool-calling superpowers 🦸 using Transformers Agents
Authored by: Aymeric Roucher

This notebook demonstrates how you can use Transformers Agents to build awesome agents!

What are agents? Agents are systems that are powered by an LLM and enable the LLM (with careful prompting and output parsing) to use specific tools to solve problems.

These tools are basically functions for things the LLM can't do well by itself: for instance, for a text-generation LLM like Llama-3-70B, this could be an image generation tool, a web search tool, a calculator…

What is Transformers Agents? It's an extension of our transformers library that provides building blocks to build your own agents! Learn more about it in the documentation: https://huggingface.co/learn/cookbook/agents
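As a taste of what the cookbook walks through, here is a rough sketch of a minimal agent. The ReactCodeAgent and HfApiEngine names follow the cookbook-era transformers.agents API and may differ across transformers versions; the model id is illustrative.

python

# Rough sketch, assuming the cookbook-era transformers.agents API.
from transformers.agents import ReactCodeAgent, HfApiEngine

llm_engine = HfApiEngine(model="meta-llama/Meta-Llama-3-70B-Instruct")  # illustrative model id
agent = ReactCodeAgent(tools=[], llm_engine=llm_engine)

# The agent writes and executes Python code step by step to reach an answer.
print(agent.run("How many seconds are there in a leap year?"))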