HF-hub - Share and discover more about AI with social posts from the community.
Excited to share the latest update to the Notebook Creator Tool!

Now with basic fine-tuning support using Supervised Fine-Tuning! 🎯

How it works:
1️⃣ Choose your Hugging Face dataset and notebook type (SFT)
2️⃣ Automatically generate your training notebook
3️⃣ Start fine-tuning with your data!

Link to the app 👉 https://lnkd.in/e_3nmWrB
💡 Want to contribute with new notebooks? 👉 https://lnkd.in/eWcZ92dS
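
For reference, the SFT step that the generated notebook runs boils down to something like this minimal TRL sketch (my own example, not the tool's actual output; the model and dataset names are placeholders):

```python
# Minimal supervised fine-tuning sketch with TRL.
# Placeholder model/dataset; exact SFTConfig fields vary between TRL versions.
from datasets import load_dataset
from trl import SFTConfig, SFTTrainer

# 1) Choose a Hugging Face dataset (here: a small slice of plain text with a "text" column).
train_dataset = load_dataset("stanfordnlp/imdb", split="train[:1%]")

# 2) Build the trainer around a base model checkpoint.
trainer = SFTTrainer(
    model="facebook/opt-350m",
    train_dataset=train_dataset,
    args=SFTConfig(output_dir="./sft-output"),
)

# 3) Start fine-tuning with your data.
trainer.train()
```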

https://huggingface.co/posts/asoria/316708748461696
Good folks from VILA Lab at Mohamed bin Zayed University of AI have introduced 26 guiding principles for optimizing prompts when interacting with large language models (LLMs) like LLaMA and GPT.

These principles aim to enhance LLM response quality, accuracy, and task alignment across various scales of models.

1. Be direct and concise, avoiding unnecessary politeness.
2. Specify the intended audience.
3. Break complex tasks into simpler steps.
4. Use affirmative directives instead of negative language.
5. Request explanations in simple terms for clarity.
6. Mention a potential reward for better solutions.
7. Provide examples to guide responses.
8. Use consistent formatting and structure.
9. Clearly state tasks and requirements.
10. Mention potential penalties for incorrect responses.
11. Request natural, human-like answers.
12. Encourage step-by-step thinking.
13. Ask for unbiased responses without stereotypes.
14. Allow the model to ask clarifying questions.
15. Request explanations with self-tests.
16. Assign specific roles to the model.
17. Use delimiters to separate sections.
18. Repeat key words or phrases for emphasis.
19. Combine chain-of-thought with few-shot prompts.
20. Use output primers to guide responses.
21. Request detailed responses on specific topics.
22. Specify how to revise or improve text.
23. Provide instructions for generating multi-file code.
24. Give specific starting points for text generation.
25. Clearly state content requirements and guidelines.
26. Request responses similar to provided examples.
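
To make a few of these concrete, here is a small illustrative prompt that combines principles 2, 16, 17, 19, and 20 (audience, role assignment, delimiters, chain-of-thought with few-shot examples, and an output primer); the wording is mine, not taken from the paper:

```python
# Build a prompt that applies several of the 26 principles at once.
few_shot = (
    "###Examples###\n"
    "Q: 12 + 7 = ?\n"
    "A: Let's think step by step. 12 plus 7 is 19. Answer: 19\n"
)

prompt = (
    "You are a careful math tutor explaining to a 12-year-old.\n"  # 16: role, 2: audience
    "###Instruction###\n"                                          # 17: delimiters
    "Solve the question and show your reasoning.\n"
    + few_shot                                                      # 19: CoT + few-shot
    + "Q: 23 + 48 = ?\n"
    "A: Let's think step by step."                                  # 20: output primer
)
print(prompt)
```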

Results show significant improvements in both "boosting" (response quality enhancement) and "correctness" across different model scales. Using the ATLAS benchmark, specialized prompts improved response quality and accuracy by an average of 57.7% and 67.3%, respectively, when applied to GPT-4.
Today's release: the updated Supernova general chat dataset!

- The new Supernova has 2x the rows, continuing to provide high-quality general synthetic data generated with Llama 405B Instruct.

Find it at sequelbox/Supernova

Enjoy! There's also a new version of sequelbox/Llama3.1-8B-MOTH available using the new dataset. (New and better MOTHs for other models will come as well, but the Build Tools and Shining Valiant take priority.)
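
If you want to pull it programmatically, loading it with the datasets library should look roughly like this (the split name is an assumption; check the dataset card):

```python
from datasets import load_dataset

# Load the updated Supernova general chat dataset ("train" split assumed).
ds = load_dataset("sequelbox/Supernova", split="train")
print(ds)       # row count and columns
print(ds[0])    # one synthetic chat example
```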
Single Block / Layer FLUX LoRA Training Research Results and LoRA Network Alpha Change Impact With LoRA Network Rank Dimension

Full article posted here : https://medium.com/@furkangozukara/single-block-layer-flux-lora-training-research-results-and-lora-network-alpha-change-impact-with-e713cc89c567

Conclusions
As expected, when you train fewer parameters, e.g. LoRA vs. full fine-tuning or single-block LoRA vs. all-blocks LoRA, quality is reduced
Of course, you gain some extra VRAM savings and a smaller size on disk
Moreover, fewer parameters reduce both the overfitting and the realism of the FLUX model, so if you are after stylized outputs like comics, it may work better
Furthermore, when you reduce the LoRA Network Rank, keep the original Network Alpha unless you are going to do new learning-rate research (see the rank/alpha sketch after these conclusions)
Finally, the best quality and least overfitting are achieved with full fine-tuning
Full fine tuning configs and instructions > https://www.patreon.com/posts/112099700
The second best option is extracting a LoRA from the fine-tuned model if you need a LoRA
Check the last columns of figure 3 and figure 4; I set the extracted LoRA Strength / Weight to 1.1 instead of 1.0
Extract LoRA guide (public article) : https://www.patreon.com/posts/112335162
Third best is doing a regular all-layers LoRA training
Full guide, configs and instructions > https://www.patreon.com/posts/110879657
And the worst quality comes from training fewer blocks / layers with LoRA
Full configs are included in > https://www.patreon.com/posts/110879657
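As a point of reference for the rank/alpha note above, this is how those two knobs map onto a generic PEFT LoRA config (a sketch of the general relationship only, not the article's Kohya FLUX training config):

```python
from peft import LoraConfig

# In most LoRA implementations the update is scaled by lora_alpha / r,
# so lowering the rank while holding alpha fixed raises the effective scale,
# which interacts with the learning rate you tuned earlier.
high_rank = LoraConfig(r=128, lora_alpha=128)   # scale = 128 / 128 = 1.0
low_rank  = LoraConfig(r=32,  lora_alpha=128)   # scale = 128 / 32  = 4.0

for cfg in (high_rank, low_rank):
    print(f"r={cfg.r}, alpha={cfg.lora_alpha}, scale={cfg.lora_alpha / cfg.r}")
```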
So how much VRAM and speed savings does single-block LoRA training bring?
All layers, 16-bit: 27,700 MB at 4.85 seconds / it; a single block: 25,800 MB at 3.7 seconds / it
All layers, 8-bit: 17,250 MB at 4.85 seconds / it; a single block: 15,700 MB at 3.8 seconds / it
Image raw links (figures): MonsterMMORPG/FLUX-Fine-Tuning-Grid-Tests
Excited to announce the release of InfiMM-WebMath-40B — the largest open-source multimodal pretraining dataset designed to advance mathematical reasoning in AI! 🧮

With 40 billion tokens, this dataset aims to enhance the reasoning capabilities of multimodal large language models in the domain of mathematics.

If you're interested in MLLMs, AI, and math reasoning, check out our work and dataset:

🤗 HF:
InfiMM-WebMath-40B: Advancing Multimodal Pre-Training for Enhanced Mathematical Reasoning (2409.12568)

📂 Dataset:
Infi-MM/InfiMM-WebMath-40B
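
If you want to peek at the data without downloading the full 40B-token corpus, streaming mode in the datasets library should work (split name and schema are assumptions; check the dataset card):

```python
from datasets import load_dataset

# Stream a few examples instead of downloading everything.
ds = load_dataset("Infi-MM/InfiMM-WebMath-40B", split="train", streaming=True)
for i, example in enumerate(ds):
    print(example.keys())   # inspect the interleaved text/image fields
    if i == 2:
        break
```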
OpenAI's latest model, "o1", has demonstrated remarkable performance on the Norway Mensa IQ test, scoring an estimated IQ of 120.

Everyone should think before answering!

Key findings:

• o1 correctly answered 25 out of 35 IQ questions, surpassing average human performance
• The model excelled at pattern recognition and logical reasoning tasks
• Performance was validated on both public and private test sets to rule out training data bias

Technical details:

• o1 utilizes advanced natural language processing and visual reasoning capabilities
• The model likely employs transformer architecture with billions of parameters
• Improved few-shot learning allows o1 to tackle novel problem types

Implications:

• This represents a significant leap in AI reasoning abilities
• We may see AIs surpassing 140 IQ by 2026 if the trend continues
• Raises important questions about the nature of intelligence and cognition