Share and discover more about AI with social posts from the community.
MiniCPM3-4B is the 3rd generation of the MiniCPM series. The overall performance of MiniCPM3-4B surpasses Phi-3.5-mini-Instruct and GPT-3.5-Turbo-0125, and it is comparable with many recent 7B-9B models.

Compared to MiniCPM1.0/MiniCPM2.0, MiniCPM3-4B has a more powerful and versatile skill set to enable more general usage. MiniCPM3-4B supports function calling and a code interpreter. Please refer to Advanced Features for usage guidelines.

MiniCPM3-4B has a 32k context window. Equipped with LLMxMapReduce, MiniCPM3-4B can theoretically handle infinite context without requiring a huge amount of memory.
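For reference, here is a minimal sketch of chatting with the model via transformers; it assumes the openbmb/MiniCPM3-4B Hub id and the checkpoint's custom modeling code (hence trust_remote_code), and is not an official snippet from the model card.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed Hub id; the checkpoint ships custom code, hence trust_remote_code=True.
model_id = "openbmb/MiniCPM3-4B"
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,
)

messages = [{"role": "user", "content": "Give me a one-sentence summary of MiniCPM3-4B."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=128)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```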
FLUX.1 [dev] is a 12 billion parameter rectified flow transformer capable of generating images from text descriptions. For more information, please read our blog post.

Key Features
Cutting-edge output quality, second only to our state-of-the-art model FLUX.1 [pro].
Competitive prompt following, matching the performance of closed-source alternatives.
Trained using guidance distillation, making FLUX.1 [dev] more efficient.
Open weights to drive new scientific research, and empower artists to develop innovative workflows.
Generated outputs can be used for personal, scientific, and commercial purposes as described in the FLUX.1 [dev] Non-Commercial License.
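As a rough sketch of how you might try the open weights with diffusers' FluxPipeline (assuming the black-forest-labs/FLUX.1-dev checkpoint and enough VRAM or CPU offload), not an official snippet:

```python
import torch
from diffusers import FluxPipeline

# Assumed Hub id for the open-weight dev checkpoint.
pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev", torch_dtype=torch.bfloat16
)
pipe.enable_model_cpu_offload()  # trade speed for VRAM on smaller GPUs

image = pipe(
    prompt="a watercolor fox reading a newspaper in a rainy cafe",
    guidance_scale=3.5,        # value commonly used with the distilled dev model; tune to taste
    num_inference_steps=50,
    height=1024,
    width=1024,
).images[0]
image.save("flux_dev_sample.png")
```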
When the three AI Godfathers join hands to write a paper, you know it's nothing short of classic genius! This was an excellent read; I hope they write one on generative AI.

Read: https://www.cs.toronto.edu/~hinton/absps/NatureDeepReview.pdf
🎓 Introducing the конспекты-уроков.рф Lesson Plans Dataset -
nyuuzyou/classnotes


Dataset highlights:
- Metadata for 65,068 lesson plans from конспекты-уроков.рф
- 58,433 lesson plans available in original format
- Multilingual content: Primarily Russian, with some Kazakh, Ukrainian, Belarusian, and English
- Each entry includes: URL, title, description, author, publication date, file size, and download link
- Data reflects educational materials accessible through the конспекты-уроков.рф platform
- Licensed under Creative Commons (https://creativecommons.org/licenses/by-nc/3.0/deed.en)

This dataset offers a unique window into online educational resources, particularly in Russian-language contexts. It provides opportunities for analyzing lesson plan trends, topic distributions, and language patterns in educational materials. The dataset is particularly well-suited for tasks such as text classification and text retrieval in multilingual educational settings.
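For a quick look at the metadata with the datasets library, here is a minimal sketch; the split name and column names such as "title" are assumptions, so check the dataset viewer first:

```python
from datasets import load_dataset

# Assumed split name; inspect the dataset viewer for the actual configuration.
ds = load_dataset("nyuuzyou/classnotes", split="train")

print(ds)        # number of rows and column names
print(ds[0])     # one lesson-plan record (URL, title, description, ...)

# Example filter, assuming a "title" column exists:
math_plans = ds.filter(lambda row: "математика" in (row.get("title") or "").lower())
print(len(math_plans))
```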
> Article read: a simple guide to LLM inference and TGI

I've just read the article "LLM inference at scale with TGI" by @martinigoyanes. It's really good content, a must-read if you want a good low-level intro to LLM inference with TGI!

My takeaways:

How does inference work?
🧠 Prefill: the input prompt is tokenized on CPU, then transferred to GPU. Then one single forward pass generates the initial token.
🔄 Decode: the model generates ("decodes") tokens one by one, each time appending the new token to the current input of size N to then generate a new token again with this augmented input of length N+1. This loop ends either when a specific token called "End-of-sequence" is generated or when the completion reaches a pre-specified maximum length. Then the sequence is de-tokenized on CPU to yield text again.
This step's speed determines the Time Per Output Token, which directly translates to the key metric: throughput.

🤔 How was the separation between the two steps decided? Like, why does prefill include this strange generation of only one token at the end?
➡️ The cost of attention scales quadratically with the number of tokens, so it can really explode quickly.
To compensate for that, a really important technique called KV caching was devised. When generating token N+1, the Key and Value (K and V) matrices computed inside the transformer layers are a simple extension of the K and V from the previous step, so the model caches the K and V matrices between steps. Hence the separation: prefill is the part that builds this KV cache, while decoding is the part that leverages it and extends it by one token at each step.
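To make the prefill/decode split concrete, here is a small sketch of a greedy decode loop with transformers' KV cache, using GPT-2 purely as a stand-in; it illustrates the idea, not TGI's actual implementation:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Small stand-in model for illustration.
tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

prompt = "LLM inference works in two phases:"
inputs = tokenizer(prompt, return_tensors="pt")

# Prefill: one forward pass over the full prompt builds the KV cache
# and yields the logits used to pick the first generated token.
with torch.no_grad():
    out = model(**inputs, use_cache=True)
past_key_values = out.past_key_values
next_token = out.logits[:, -1, :].argmax(dim=-1, keepdim=True)
generated = [next_token]

# Decode: each step feeds only the newest token plus the cached K/V,
# extending the cache by one position per step, until EOS or a max length.
for _ in range(20):
    with torch.no_grad():
        out = model(input_ids=next_token, past_key_values=past_key_values, use_cache=True)
    past_key_values = out.past_key_values
    next_token = out.logits[:, -1, :].argmax(dim=-1, keepdim=True)
    if next_token.item() == tokenizer.eos_token_id:
        break
    generated.append(next_token)

print(tokenizer.decode(torch.cat(generated, dim=-1)[0]))
```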

TGI-specific takeaways:
⚙️ TGI has many SOTA techniques for decoding: Paged Attention, KV Caching and Flash Attention…
🔀 TGI's router handles generations finishing early because of an EOS token: instead of static batching, it continuously batches requests to the inference engine & filters away finished requests.
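To see this from the client side, here is a minimal sketch using huggingface_hub's InferenceClient against a TGI server; the localhost endpoint is an assumption, so point it at wherever your server actually runs:

```python
from huggingface_hub import InferenceClient

# Assumes a TGI server is already running at this address.
client = InferenceClient("http://localhost:8080")

# Each request joins TGI's continuously batched queue; with stream=True the
# client receives tokens as the decode loop produces them and stops at EOS.
for token in client.text_generation(
    "Explain KV caching in one sentence.",
    max_new_tokens=64,
    stream=True,
):
    print(token, end="", flush=True)
print()
```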
Help me to upgrade my model.

Hi all, I am a complete beginner in coding; however, with the help of Claude (similar to Matt :P) and GPT-4o, I have been able to develop this RAG PDF summarizer/Q&A plus a web search tool.

The application is specifically built for summarization tasks, including summarizing financial documents, news articles, resumes, research documents, call transcripts, etc.

The Space can be found here:
Shreyas094/SearchGPT


The news tool simply uses DuckDuckGo chat to generate the search results using the Llama 3.1 70B model.

I would like your support in fine-tuning the retrieval task to handle more unstructured documents.
A lot of coverage of the Apple event! I’ve selected a few unique angles and distinctive takes.

The NYT
- "The iPhone’s limited feature set is emblematic of how Apple is taking a cautious approach to generative A.I."
- "Wall Street is enthusiastic about the artificially intelligent phones, with analysts predicting the features could help Apple sell a record 240 million iPhones next year."

The Guardian
- "Despite the bells and whistles, and being a tech-adopting lot, I bet many of you won’t be lining up to buy it."
- One reason is the simple cost of the iPhone 16, which starts at $799.
- "The adoption of AI into the iPhone could be considered a step change in how the iPhone works. But there may not be a huge hankering to use ChatGPT on your phone."

The WSJ
- Apple didn’t say when the AI services would be available in China, its second-largest market after the U.S.
- The delay puts the iPhone maker at a disadvantage against rivals offering AI services
- Huawei held its own announcement in China to release the Mate XT, a three-way foldable smartphone with AI features.
- Apple said that the launch of Apple Intelligence was subject to regulatory approval. In China, any generative AI models that could influence public opinion need government approval.

CNN
- "For an event built around unveiling Apple’s first AI-powered iPhone, there was one striking absence over the two-hour presentation: the words 'artificial intelligence.'"
- "But Apple understands something that often gets lost in the bot-pilled bubble of Silicon Valley: Regular people don’t trust AI."

Links:
https://www.nytimes.com/2024/09/09/technology/apple-event-iphone-16-watch.html
https://www.theguardian.com/technology/article/2024/sep/10/techscape-iphone-16-cost-features
https://www.wsj.com/tech/apples-challenge-in-china-rises-with-new-rival-phones-and-ai-delay-8cf871fb?mod=rss_Technology
https://www.cnn.com/2024/09/10/business/apple-iphone-ai-nightcap/
Almost ready: search for a Hugging Face dataset on the Hub from information in the datasets viewer preview!

Soon, you'll be able to find deep-cut datasets even if they don't have a full dataset card (you should still document your datasets!)

You can help improve this project by rating synthetic user search queries for hub datasets.

If you have a Hub login, you can start annotating in Argilla
in < 5 seconds here: https://davanstrien-my-argilla.hf.space/dataset/1100a091-7f3f-4a6e-ad51-4e859abab58f/annotation-mode

I need to do some tidying, but I'll share all the code and in-progress datasets for this soon!
Reflection Llama 3.1 70B (Correct Weights) on ZeroGPU thanks to llama.cpp and unsloth (for quantization)

- ZeroGPU Space: gokaygokay/Reflection-70B-llamacpp
- Working Model: mattshumer/ref_70_e3
- Quantized Models: unsloth/Reflection-Llama-3.1-70B-GGUF
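If you want to try the quantized weights locally rather than on ZeroGPU, here is a rough sketch with llama-cpp-python; the exact GGUF filename is an assumption, so check the unsloth repo's file list first:

```python
from llama_cpp import Llama

llm = Llama.from_pretrained(
    repo_id="unsloth/Reflection-Llama-3.1-70B-GGUF",
    filename="*Q4_K_M.gguf",  # assumed quant; verify the actual filename on the Hub
    n_ctx=4096,
    n_gpu_layers=-1,          # offload everything to GPU if there is room
)

out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Think carefully: what is 17 * 24?"}],
    max_tokens=256,
)
print(out["choices"][0]["message"]["content"])
```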
Interested in learning about everything Image?

With the recent rise of interest in Vision Language Models (VLMs), we decided to make a push to include an ImageField within Argilla! This means any open-source developer can now work on better models for vision ML tasks too, and we would like to show you how.

We would love to introduce this new feature to you, so we've prepared a set of notebooks to go over some common image scenarios:
- Fine-tune a CLIP retrieval model with Sentence Transformers
- Use ColPali + Qwen VL for RAG and log the results to Argilla
- Image-generation preference: create multi-modal preference datasets for free using Hugging Face Inference Endpoints
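As a taste of what the ImageField enables, here is a rough sketch of an image-annotation dataset with the Argilla Python SDK (v2-style API; the URL, API key, and field/question names are placeholders):

```python
import argilla as rg

# Placeholder connection details for your Argilla instance.
client = rg.Argilla(api_url="https://<your-argilla-space>.hf.space", api_key="<api-key>")

settings = rg.Settings(
    fields=[rg.ImageField(name="image")],
    questions=[rg.TextQuestion(name="caption", title="Describe the image")],
)

dataset = rg.Dataset(name="image-captioning-demo", settings=settings, client=client)
dataset.create()

# Records can point at image URLs (or pass PIL images / data URIs).
dataset.records.log([{"image": "https://example.com/cat.png"}])
```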

​See you on Thursday!

https://lu.ma/x7id1jqu (Everything image: from fine-tuning CLIP models to synthetic image datasets)
The New York Times did a fun quiz to test your ability to detect whether a video is AI-generated or real. They put Runway, Kling, and Sora to the test.

I got 10/10 🤓 —how about you?

https://www.nytimes.com/interactive/2024/09/09/technology/ai-video-deepfake-runway-kling-quiz.html

⚖️ 𝐀𝐈 𝐓𝐫𝐚𝐢𝐧𝐢𝐧𝐠 𝐢𝐬 𝐂𝐨𝐩𝐲𝐫𝐢𝐠𝐡𝐭 𝐈𝐧𝐟𝐫𝐢𝐧𝐠𝐞𝐦𝐞𝐧𝐭

This bold claim is not my opinion; it was made in a recent "report" by a group whose stance is recognizable in its name, roughly translated as "Authors' Rights Initiative". According to the LinkedIn post below, the report was also presented before the EU Parliament.

I am not really interested in politics, but as an EU citizen I am of course somewhat interested in a reasonable and practical version of the EU AI Act. Not saying there should not be rules around data and AI, but this report is obviously very biased towards one side.

While I think the report itself does not deserve attention, I am posting it in the hope that you will find more examples where they did not address the issue adequately. Feel free to add to my LinkedIn posts (where the original authors will see it) or here.

[en] Executive summary: https://urheber.info/media/pages/diskurs/ai-training-is-copyright-infringement/3b900058e6-1725460935/executive-summary_engl_final_29-08-2024.pdf
[de] Full report: https://papers.ssrn.com/sol3/papers.cfm?abstract_id=4946214

LinkedIn: https://www.linkedin.com/posts/activity-7238912869268959232-6cFx
Remember when @Google launched MediaPipe in an effort to create efficient on-device pipelines?

They've just unlocked the ability to run 7B+ parameter language models directly in your browser. This is a game-changer for on-device AI!

Yes, they are streaming 8.6 GB model files!

Currently, they have Gemma 2B/7B running, but imagine Dynamic LoRA, multimodal support, quantization, and you never leaving Chrome!

This is a significant technical advancement, especially in Memory Optimization:

- Redesigned the model-loading code to work around WebAssembly's 4 GB memory limit.
- Implemented asynchronous loading of transformer stack layers (28 for Gemma 1.1 7B).
- Reduced peak WebAssembly memory usage to less than 1% of previous requirements.

Cross-Platform Compatibility
- Compiled the C++ codebase to WebAssembly for broad browser support.
- Utilized the WebGPU API for native GPU acceleration in browsers.

Here's why this matters:

1. Privacy: No need to send data to remote servers.
2. Cost-Efficiency: Eliminates server expenses.
3. Offline Capabilities: Use powerful AI without an internet connection.

Blog: https://research.google/blog/unlocking-7b-language-models-in-your-browser-a-deep-dive-with-google-ai-edges-mediapipe/
The Hugging Face Semantic Dataset Search Space is back in action! You can find similar datasets by ID or perform a semantic search of dataset cards.

Give it a try:
librarian-bots/huggingface-datasets-semantic-search
https://huggingface.co/spaces/librarian-bots/huggingface-datasets-semantic-search
Ultimate FLUX LoRA Training Tutorial: Windows and Cloud Deployment

I have done a total of 104 different LoRA trainings and compared each of them to find the very best hyperparameters and workflow for FLUX LoRA training using the Kohya GUI training script.

You can see all the completed experiments' checkpoint names and their repo links in the following public post: https://www.patreon.com/posts/110838414

After completing all these FLUX LoRA trainings using Adafactor, the most VRAM-optimal and performant optimizer, I came up with the following ranked, ready-to-use configurations.

You can download all the configurations, all research data, installers, and instructions at the following link: https://www.patreon.com/posts/110879657


Tutorials
I have also prepared 2 full tutorials. The first tutorial covers how to train and use the best FLUX LoRA locally on your Windows computer: https://youtu.be/nySGu12Y05k

This is the main tutorial that you have to watch without skipping to learn everything. It has a total of 74 chapters and manually written English captions. It is a perfect resource to go from zero to hero in FLUX LoRA training.

The second tutorial I have prepared covers how to train FLUX LoRA in the cloud. This tutorial is extremely important for several reasons. If you don't have a powerful GPU, you can rent a very powerful and very cheap GPU on Massed Compute or RunPod. I prefer Massed Compute since it is faster and cheaper with our special coupon SECourses. Another reason is that in this tutorial video, I show in full detail how to train on a multi-GPU setup to scale your training speed. Moreover, I show how to upload your checkpoints and files ultra fast to Hugging Face for free saving and transferring. Still, watch the Windows tutorial above first to be able to follow the cloud tutorial below: https://youtu.be/-uhL2nW7Ddw

For upscaling, SUPIR was used: https://youtu.be/OYxVEvDf284
NEW RELEASE!

- MOTH is a generalist chat model, using high-quality synthetic data to improve general performance.
- Currently available for Llama 3.1 and Gemma 2, more models to follow in the future.

get the models:
sequelbox/Llama3.1-8B-MOTH https://huggingface.co/sequelbox/Llama3.1-8B-MOTH

sequelbox/gemma-2-9B-MOTH https://huggingface.co/sequelbox/gemma-2-9B-MOTH


get the dataset:
sequelbox/Supernova


<3 for everyone to use <3
The world’s first multilingual ColBERT: Jina ColBERT V2 and its “Russian Doll” technology
In the field of RAG, the multi-vector model ColBERT improves retrieval accuracy by generating independent vectors for each token of the document. But it also brings a sharp increase in storage requirements, and it only supports English, which limits its application scope. To solve these problems, we improved the architecture and training process of ColBERT, with particular breakthroughs in multilingual processing. The latest Jina-ColBERT-v2 supports 89 languages and introduces custom output-dimension options, significantly reducing storage requirements while improving the efficiency and accuracy of multilingual retrieval.

The core highlights of the new version:
- Performance enhancements: compared with the original ColBERT-v2, English retrieval performance has improved by 6.5%; compared with the previous generation jina-colbert-v1-en, performance has improved by 5.4%.
- Multilingual support: the new version supports up to 89 languages, covering Arabic, Chinese, English, Japanese, Russian, and others, and it also supports programming languages.
- Customizable output dimensions: the new version adopts "Russian doll" representation learning (Matryoshka Representation Learning, MRL) and provides 128-, 96-, and 64-dimensional output vector options, allowing users to choose the dimension that fits their actual needs.

The full technical report can be found on arXiv: https://arxiv.org/abs/2408.16672
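To make the late-interaction idea concrete, here is a tiny self-contained sketch of ColBERT-style MaxSim scoring on dummy token embeddings (illustrative only, not Jina's code): every query token is matched to its most similar document token and the similarities are summed, which is why per-token vectors and their dimensionality drive storage costs.

```python
import torch
import torch.nn.functional as F

def maxsim_score(query_vecs: torch.Tensor, doc_vecs: torch.Tensor) -> torch.Tensor:
    """ColBERT-style late interaction: best-matching document token per query token,
    summed over query tokens (cosine similarity via L2-normalized vectors)."""
    q = F.normalize(query_vecs, dim=-1)   # (num_query_tokens, dim)
    d = F.normalize(doc_vecs, dim=-1)     # (num_doc_tokens, dim)
    sim = q @ d.T                         # (num_query_tokens, num_doc_tokens)
    return sim.max(dim=-1).values.sum()

# Dummy embeddings standing in for jina-colbert-v2 outputs; with MRL the
# dimension could be 128, 96, or 64, shrinking the index accordingly.
query = torch.randn(8, 128)      # 8 query tokens
doc_a = torch.randn(300, 128)    # 300 document tokens
doc_b = torch.randn(120, 128)

for name, doc in [("doc_a", doc_a), ("doc_b", doc_b)]:
    print(name, maxsim_score(query, doc).item())
```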
SemanticFinder now supports WebGPU thanks to @Xenova's efforts with transformers.js v3!
Expect massive performance gains. I ran inference on a whole book with 46k chunks in under 5 minutes. If your device doesn't support #WebGPU, use the classic Wasm-based version:
- WebGPU: https://do-me.github.io/SemanticFinder/webgpu/
- Wasm: https://do-me.github.io/SemanticFinder/

WebGPU harnesses the full power of your hardware, no longer being restricted to just the CPU. The speedup is significant (4-60x) for all kinds of devices: consumer-grade laptops, heavy Nvidia GPU setups or Apple Silicon. Measure the difference for your device here:
Xenova/webgpu-embedding-benchmark

Chrome currently works out of the box; Firefox requires some tweaking.

WebGPU + transformers.js makes it possible to build amazing applications and make them accessible to everyone. For example, SemanticFinder could become a simple GUI for populating your (vector) DB of choice. See the pre-indexed community texts here:
do-me/SemanticFinder

Happy to hear your ideas!