Share and discover more about AI with social posts from the community.
Excited to share the latest update to the Notebook Creator Tool!

Now with basic fine-tuning support using Supervised Fine-Tuning! 🎯

How it works:
1️⃣ Choose your Hugging Face dataset and notebook type (SFT)
2️⃣ Automatically generate your training notebook
3️⃣ Start fine-tuning with your data!
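For a sense of what the generated SFT notebooks boil down to, here is a minimal sketch using trl's SFTTrainer; the model and dataset names below are placeholders, not necessarily what the tool emits:

```python
# Minimal SFT sketch with trl; model and dataset names are placeholders,
# not necessarily what the Notebook Creator Tool generates.
from datasets import load_dataset
from trl import SFTConfig, SFTTrainer

dataset = load_dataset("stanfordnlp/imdb", split="train")  # any dataset with a text column

trainer = SFTTrainer(
    model="facebook/opt-350m",  # small model just for illustration
    train_dataset=dataset,
    args=SFTConfig(output_dir="./sft-output", max_steps=100, dataset_text_field="text"),
)
trainer.train()
```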

Link to the app 👉 https://lnkd.in/e_3nmWrB
💡 Want to contribute new notebooks? 👉 https://lnkd.in/eWcZ92dS

https://huggingface.co/posts/asoria/316708748461696
Good folks from VILA Lab at Mohamed bin Zayed University of AI have introduced 26 guiding principles for optimizing prompts when interacting with large language models (LLMs) like LLaMA and GPT.

These principles aim to enhance LLM response quality, accuracy, and task alignment across various scales of models.

1. Be direct and concise, avoiding unnecessary politeness.
2. Specify the intended audience.
3. Break complex tasks into simpler steps.
4. Use affirmative directives instead of negative language.
5. Request explanations in simple terms for clarity.
6. Mention a potential reward for better solutions.
7. Provide examples to guide responses.
8. Use consistent formatting and structure.
9. Clearly state tasks and requirements.
10. Mention potential penalties for incorrect responses.
11. Request natural, human-like answers.
12. Encourage step-by-step thinking.
13. Ask for unbiased responses without stereotypes.
14. Allow the model to ask clarifying questions.
15. Request explanations with self-tests.
16. Assign specific roles to the model.
17. Use delimiters to separate sections.
18. Repeat key words or phrases for emphasis.
19. Combine chain-of-thought with few-shot prompts (an example combining several of these principles follows this list).
20. Use output primers to guide responses.
21. Request detailed responses on specific topics.
22. Specify how to revise or improve text.
23. Provide instructions for generating multi-file code.
24. Give specific starting points for text generation.
25. Clearly state content requirements and guidelines.
26. Request responses similar to provided examples.
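To make a few of these concrete, here is a hedged sketch that combines principles 16 (role), 17 (delimiters), 19 (chain-of-thought plus few-shot), and 20 (output primer); the model name and prompt wording are illustrative, not from the paper:

```python
# Hedged illustration of principles 16 (role), 17 (delimiters),
# 19 (chain-of-thought + few-shot), and 20 (output primer).
# Model name and prompt wording are illustrative, not from the paper.
from huggingface_hub import InferenceClient

client = InferenceClient("meta-llama/Meta-Llama-3-8B-Instruct")  # assumed model

prompt = (
    "You are a careful math tutor.\n"                            # 16: assign a role
    "###Example###\n"                                            # 17: delimiters
    "Q: 12 * 5 = ?\n"
    "A: Let's think step by step. 12 * 5 = 60. Answer: 60\n"     # 19: CoT few-shot
    "###Question###\n"
    "Q: 17 * 23 = ?\n"
    "A: Let's think step by step."                               # 20: output primer
)

response = client.chat_completion(
    messages=[{"role": "user", "content": prompt}],
    max_tokens=128,
)
print(response.choices[0].message.content)
```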

Results show significant improvements in both "boosting" (response quality enhancement) and "correctness" across different model scales. Using the ATLAS benchmark, specialized prompts improved response quality and accuracy by an average of 57.7% and 67.3%, respectively, when applied to GPT-4.
Today's release: the updated Supernova general chat dataset!

- The new Supernova has 2x the rows, continuing to provide high-quality general synthetic data generated with Llama 405B Instruct.

Find it at
sequelbox/Supernova


Enjoy! There's also a new version of
sequelbox/Llama3.1-8B-MOTH
available using the new dataset. (new and better MOTHs for other models will come as well, but the Build Tools and Shining Valiant take priority.)
Single Block / Layer FLUX LoRA Training Research Results and LoRA Network Alpha Change Impact With LoRA Network Rank Dimension

Full article posted here : https://medium.com/@furkangozukara/single-block-layer-flux-lora-training-research-results-and-lora-network-alpha-change-impact-with-e713cc89c567

Conclusions
As expected, when you train fewer parameters (e.g. LoRA vs. full fine-tuning, or single-block LoRA vs. all-blocks LoRA), quality is reduced.
Of course, you gain some extra VRAM savings and a smaller footprint on disk.
Moreover, fewer parameters reduce overfitting and the realism of the FLUX model, so if you are into stylized outputs like comics, it may work better.
Furthermore, when you reduce the LoRA Network Rank, keep the original Network Alpha unless you are going to do new Learning Rate research (see the rank/alpha sketch further down).
Finally, the very best quality and least overfitting are achieved with full fine-tuning.
Full fine tuning configs and instructions > https://www.patreon.com/posts/112099700
The second best option is extracting a LoRA from the fine-tuned model, if you need a LoRA.
Check the last columns of figures 3 and 4: I set the extracted LoRA strength / weight to 1.1 instead of 1.0.
Extract LoRA guide (public article) : https://www.patreon.com/posts/112335162
Third is doing an all-layers regular LoRA training.
Full guide, configs and instructions > https://www.patreon.com/posts/110879657
And the worst quality comes from training fewer blocks / layers with LoRA.
Full configs are included in > https://www.patreon.com/posts/110879657
So how much VRAM does single-block LoRA training save, and how much faster is it?
All layers at 16-bit: 27700 MB (4.85 s/it); a single block: 25800 MB (3.7 s/it).
All layers at 8-bit: 17250 MB (4.85 s/it); a single block: 15700 MB (3.8 s/it).
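On the rank/alpha point above: in standard LoRA implementations the low-rank update is scaled by alpha / rank, so rank and alpha together set the effective update magnitude, which is why changing alpha usually calls for a fresh learning-rate search. A generic sketch with peft (not the exact Kohya/FLUX training setup used in the article):

```python
# Generic LoRA config sketch with peft; not the exact FLUX/Kohya setup from the article.
from peft import LoraConfig

def lora_scale(alpha: float, rank: int) -> float:
    """Standard LoRA scaling applied to the low-rank update: alpha / rank."""
    return alpha / rank

base = LoraConfig(r=128, lora_alpha=128)     # scale = 1.0
reduced = LoraConfig(r=32, lora_alpha=128)   # scale = 4.0 -> much stronger updates

print(lora_scale(base.lora_alpha, base.r))        # 1.0
print(lora_scale(reduced.lora_alpha, reduced.r))  # 4.0
```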
Image raw links:
Figure 0: MonsterMMORPG/FLUX-Fine-Tuning-Grid-Tests
Excited to announce the release of InfiMM-WebMath-40B — the largest open-source multimodal pretraining dataset designed to advance mathematical reasoning in AI! 🧮

With 40 billion tokens, this dataset aims to enhance the reasoning capabilities of multimodal large language models in the domain of mathematics.

If you're interested in MLLMs, AI, and math reasoning, check out our work and dataset:

🤗 HF:
InfiMM-WebMath-40B: Advancing Multimodal Pre-Training for Enhanced Mathematical Reasoning (2409.12568)

📂 Dataset:
Infi-MM/InfiMM-WebMath-40B
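If you want to peek at the data without downloading 40B tokens' worth of files, streaming with the datasets library should work; the split name is an assumption, so check the dataset card:

```python
# Stream a few records without a full download.
# The split name and record fields are assumptions; see the dataset card for the actual schema.
from datasets import load_dataset

ds = load_dataset("Infi-MM/InfiMM-WebMath-40B", split="train", streaming=True)
for i, example in enumerate(ds):
    print(example.keys())
    if i >= 2:
        break
```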
OpenAI's latest model, "o1", has demonstrated remarkable performance on the Norway Mensa IQ test, scoring an estimated IQ of 120.

Everyone should think before answering!

Key findings:

• o1 correctly answered 25 out of 35 IQ questions, surpassing average human performance
• The model excelled at pattern recognition and logical reasoning tasks
• Performance was validated on both public and private test sets to rule out training data bias

Technical details:

• o1 utilizes advanced natural language processing and visual reasoning capabilities
• The model likely employs transformer architecture with billions of parameters
• Improved few-shot learning allows o1 to tackle novel problem types

Implications:

• This represents a significant leap in AI reasoning abilities
• We may see AIs surpassing 140 IQ by 2026 if the trend continues
• Raises important questions about the nature of intelligence and cognition
Nvidia just released a small 4B Nemotron-Mini model, and it works surprisingly well!

you can check it out here :

base :
nvidia/Minitron-4B-Base https://huggingface.co/nvidia/Minitron-4B-Base

instruct :
nvidia/Nemotron-Mini-4B-Instruct

demo :
Tonic/Nemotron-Mini-4B
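A quick, hedged way to try the instruct model locally with transformers (generation settings are just one reasonable choice):

```python
# Minimal chat sketch for the instruct model; generation settings are illustrative.
from transformers import pipeline

pipe = pipeline(
    "text-generation",
    model="nvidia/Nemotron-Mini-4B-Instruct",
    torch_dtype="auto",
    device_map="auto",
)

messages = [{"role": "user", "content": "Give me one fun fact about GPUs."}]
out = pipe(messages, max_new_tokens=128)
print(out[0]["generated_text"][-1]["content"])
```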
Chat as a way to query SQL! The Airtrain AI team is happy to share a new Hugging Face Space that lets you interact with Hugging Face Hub datasets using a natural language chatbot. 🤗

Start Exploring 👉
airtrain-ai/hf-dataset-chat-to-sql


This Space is forked from
davidberenstein1957/text-to-sql-hub-datasets
by @davidberenstein1957 and features chat capability with improved table naming. The tool works with Hugging Face’s recently released in-browser DuckDB-based SQL query engine for datasets.
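Under the hood this is SQL over Hub-hosted Parquet files; if you want to run similar queries locally, a recent DuckDB build can read hf:// paths directly. A hedged sketch, with a placeholder dataset path:

```python
# Query a Hub dataset's Parquet files directly with DuckDB.
# Requires a DuckDB version with hf:// support; the dataset path is a placeholder.
import duckdb

duckdb.sql("""
    SELECT *
    FROM 'hf://datasets/username/my_dataset/**/*.parquet'
    LIMIT 5
""").show()
```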
2024 CVPR Videos Are Now Available! 🎥

CVPR conference keynotes, panels, posters, workshops, and other content are now available.

⬇️
https://cvpr.thecvf.com/Conferences/2024/Videos
𝗨𝗽𝗹𝗼𝗮𝗱 𝗹𝗮𝗿𝗴𝗲 𝗳𝗼𝗹𝗱𝗲𝗿𝘀 with ease using huggingface-cli upload-large-folder. Designed for your massive models and datasets. Much recommended if you struggle to upload your Llama 70B fine-tuned model 🤡
🔎 𝗦𝗲𝗮𝗿𝗰𝗵 𝗔𝗣𝗜: new search filters (gated status, inference status) and fetch trending score.
⚡️𝗜𝗻𝗳𝗲𝗿𝗲𝗻𝗰𝗲𝗖𝗹𝗶𝗲𝗻𝘁: major improvements simplifying chat completions and handling async tasks better.

We’ve also introduced tons of bug fixes and quality-of-life improvements - thanks to the awesome contributions from our community! 💪

💡 Check out the release notes:
Wauplin/huggingface_hub#8


Want to try it out? Install the release with:
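A plain pip install -U huggingface_hub should pull the latest release. And as a hedged sketch, the Python counterpart of the new upload-large-folder CLI looks roughly like this (repo ID and folder path are placeholders):

```python
# Hedged sketch of the Python counterpart to `huggingface-cli upload-large-folder`.
# Repo ID and folder path are placeholders.
from huggingface_hub import HfApi

api = HfApi()
api.upload_large_folder(
    repo_id="my-username/llama-70b-finetune",
    repo_type="model",
    folder_path="./checkpoints",
)
```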
Solar Pro Preview: The most intelligent LLM on a single GPU
Summary
We introduce Solar Pro Preview, an advanced large language model (LLM) with 22 billion parameters designed to fit on a single GPU. Solar Pro Preview shows superior performance compared to LLMs with fewer than 30 billion parameters and delivers performance comparable to models over three times its size, such as Llama 3.1 with 70 billion parameters.

Solar Pro Preview is developed using an enhanced version of our previous depth up-scaling method, which scales a Phi-3-medium model with 14 billion parameters to 22 billion parameters, intended to run on a GPU with 80GB of VRAM. Our carefully curated training strategy and dataset have significantly enhanced performance from Phi-3-medium, particularly on the MMLU-Pro and IFEval benchmarks, both respected for evaluating a model’s knowledge and instruction-following abilities.

Solar Pro Preview is a pre-release version of the official Solar Pro, with limitations on language coverage and a maximum context length of 4K. However, we believe Solar Pro Preview not only stands out as a highly efficient and capable model, but also has the potential to be further extended to cover more languages and capabilities. The official version of Solar Pro will be released in November 2024 with expanded language support beyond English and longer context windows. To stay informed about the latest updates, please sign up for our mailing list. If you have any feedback or questions about the model, please visit our model discussion board and connect with us directly. https://huggingface.co/upstage/solar-pro-preview-instruct
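A hedged loading sketch with transformers; the custom architecture presumably requires trust_remote_code=True, so check the model card for the exact requirements:

```python
# Hedged loading sketch; see the model card for exact version and memory requirements.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "upstage/solar-pro-preview-instruct"
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,   # 22B params, sized for a single 80GB GPU
    device_map="auto",
    trust_remote_code=True,
)

messages = [{"role": "user", "content": "Summarize what depth up-scaling means."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
outputs = model.generate(inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```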
Bringing Open-Source Text-to-Speech to French! 🗣🇫🇷

Hugging Face's Parler TTS mini can now speak French! 🇫🇷🎉
You can try it here:
PHBJT/french_parler_tts


Key highlights:
Transform the English TTS model to speak French 🇬🇧➡️🇫🇷
Fully open source (code, weights, and datasets) 🛠
It can be replicated for every language 🌍

Read more about it in this article: https://huggingface.co/blog/PHBJT/french-parler-tts

Special thanks to FlexAI and their dedicated team for providing the computing power that made this possible, and of course to all of the Parler TTS community 🤗
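Inference follows the usual Parler TTS pattern; here is a hedged sketch where the checkpoint name and the French prompts are placeholders (grab the real repo id from the Space above):

```python
# Hedged Parler TTS inference sketch; checkpoint name and prompts are placeholders.
import torch
import soundfile as sf
from parler_tts import ParlerTTSForConditionalGeneration
from transformers import AutoTokenizer

device = "cuda" if torch.cuda.is_available() else "cpu"
checkpoint = "PHBJT/french-parler-tts-mini"  # placeholder; see the Space for the real repo id

model = ParlerTTSForConditionalGeneration.from_pretrained(checkpoint).to(device)
tokenizer = AutoTokenizer.from_pretrained(checkpoint)

description = "Un locuteur masculin parle calmement, avec une voix claire."
prompt = "Bonjour, ceci est un test de synthèse vocale en français."

input_ids = tokenizer(description, return_tensors="pt").input_ids.to(device)
prompt_input_ids = tokenizer(prompt, return_tensors="pt").input_ids.to(device)

audio = model.generate(input_ids=input_ids, prompt_input_ids=prompt_input_ids)
sf.write("parler_fr.wav", audio.cpu().numpy().squeeze(), model.config.sampling_rate)
```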
𝗔𝗿𝗲 𝗔𝗴𝗲𝗻𝘁𝘀 𝗰𝗮𝗽𝗮𝗯𝗹𝗲 𝗲𝗻𝗼𝘂𝗴𝗵 𝗳𝗼𝗿 𝗗𝗮𝘁𝗮 𝗦𝗰𝗶𝗲𝗻𝗰𝗲? ⇒ 𝗠𝗲𝗮𝘀𝘂𝗿𝗲 𝘁𝗵𝗲𝗶𝗿 𝗽𝗲𝗿𝗳𝗼𝗿𝗺𝗮𝗻𝗰𝗲 𝘄𝗶𝘁𝗵 𝗗𝗦𝗕𝗲𝗻𝗰𝗵 📊

A team from Tencent AI wanted to evaluate agentic systems on data science (DS) tasks, but they noticed that existing agentic benchmarks were severely limited in several aspects: they were limited to text and did not include tables or images, were specific to certain packages, only performed exact-match evaluation…

➡️ So they set out to take a much more exhaustive approach and finally build the definitive DS agent benchmark.

𝗧𝗵𝗲 𝗗𝗦𝗕𝗲𝗻𝗰𝗵 𝗱𝗮𝘁𝗮𝘀𝗲𝘁
▪️DSBench has 466 data analysis tasks and 74 data modelling tasks
▪️The tasks are sourced from ModelOff and Kaggle, the platforms hosting the most popular data science competitions
▪️Difference with previous DS benchmarks:
❶ This benchmark leverages various modalities on top of text: images, Excel files, tables
❷ Complex tables: sometimes several tables must be combined to answer one question
❸ The context is richer, with longer descriptions.
▪️ Evaluation metrics: the benchmark is scored with an LLM as a judge, using a specific prompt (a generic sketch of this pattern follows the list).
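The exact judge prompt isn't reproduced here, but the general LLM-as-a-judge pattern looks roughly like this sketch; the prompt wording and the call_llm hook are illustrative, not DSBench's:

```python
# Generic LLM-as-a-judge sketch; prompt wording and the `call_llm` hook are
# illustrative, not the actual DSBench evaluation prompt.
from typing import Callable

JUDGE_TEMPLATE = (
    "You are grading a data-analysis answer.\n"
    "Question:\n{question}\n\n"
    "Reference answer:\n{reference}\n\n"
    "Candidate answer:\n{candidate}\n\n"
    "Reply with exactly 'CORRECT' or 'INCORRECT'."
)

def judge(question: str, reference: str, candidate: str,
          call_llm: Callable[[str], str]) -> bool:
    """Return True if the judge model deems the candidate answer correct."""
    verdict = call_llm(JUDGE_TEMPLATE.format(
        question=question, reference=reference, candidate=candidate
    ))
    return verdict.strip().upper().startswith("CORRECT")
```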

𝗜𝗻𝘀𝗶𝗴𝗵𝘁𝘀 𝗳𝗿𝗼𝗺 𝗲𝘃𝗮𝗹𝘂𝗮𝘁𝗶𝗻𝗴 𝗮𝗴𝗲𝗻𝘁𝘀
▪️ Their evaluation confirms that using LLMs in an agent setup, for instance by allowing them to run a single step of code execution, is more costly (especially with multi-turn frameworks like autogen) but also much more performant than the vanilla LLM.
▪️ The sets of tasks solved by different models (like GPT-3.5 vs Llama-3-8B) have quite low overlap, which suggests that different models tend to try very different approaches.

This new benchmark is really welcome, can't wait to try transformers agents on it! 🤗

Read their full paper 👉
DSBench: How Far Are Data Science Agents to Becoming Data Science Experts? (2409.07703) https://huggingface.co/papers/2409.07703
Could someone please give me a screenshot of their fine-tuning/training Space form before they initiate the training? I have no idea what format the column mapping field expects.
Column1,column2,column3
"Column1","column2","column3"
🤷
For all the Muslims out there who are interested in the Quran and its tafsir (explanations): this humble dataset consists of 84 different books of tafsir for nearly all the ayat in the Quran:
MohamedRashad/Quran-Tafseer


I hope it helps someone to build something nice and useful with it ^_^
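If you want to poke around before building something, a reasonable first step is loading it with the datasets library and inspecting the splits and columns:

```python
# Load the dataset and inspect its splits and columns before building on it.
from datasets import load_dataset

ds = load_dataset("MohamedRashad/Quran-Tafseer")
print(ds)                        # available splits and row counts
first_split = next(iter(ds.values()))
print(first_split.features)      # column names and types
print(first_split[0])            # one example record
```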