HF Hub - Share and discover more about AI with social posts from the community.
๐—”๐—ฟ๐—ฒ ๐—”๐—ด๐—ฒ๐—ป๐˜๐˜€ ๐—ฐ๐—ฎ๐—ฝ๐—ฎ๐—ฏ๐—น๐—ฒ ๐—ฒ๐—ป๐—ผ๐˜‚๐—ด๐—ต ๐—ณ๐—ผ๐—ฟ ๐——๐—ฎ๐˜๐—ฎ ๐—ฆ๐—ฐ๐—ถ๐—ฒ๐—ป๐—ฐ๐—ฒ? โ‡’ ๐— ๐—ฒ๐—ฎ๐˜€๐˜‚๐—ฟ๐—ฒ ๐˜๐—ต๐—ฒ๐—ถ๐—ฟ ๐—ฝ๐—ฒ๐—ฟ๐—ณ๐—ผ๐—ฟ๐—บ๐—ฎ๐—ป๐—ฐ๐—ฒ ๐˜„๐—ถ๐˜๐—ต ๐——๐—ฆ๐—•๐—ฒ๐—ป๐—ฐ๐—ต ๐Ÿ“Š

A team from Tencent AI wanted to evaluate agentic systems on data science (DS) tasks, but they noticed that existing agentic benchmarks were severely limited in several ways: they covered only text without tables or images, were specific to certain packages, performed only exact-match evaluation…

โžก๏ธ So they set out to build a much more exhaustive approach, to finally make the definitive DS agent benchmark.

๐—ง๐—ต๐—ฒ ๐——๐—ฆ๐—•๐—ฒ๐—ป๐—ฐ๐—ต ๐—ฑ๐—ฎ๐˜๐—ฎ๐˜€๐—ฒ๐˜
โ–ช๏ธDS bench has 466 data analysis tasks and 74 data modelling tasks
โ–ช๏ธThe tasks are sourced from ModelOff and Kaggle, the platforms hosting the most popular data science competitions
โ–ช๏ธDifference with previous DS benchmarks:
โถ This benchmark leverages various modalities on top of text: images, Excel files, tables
โท Complex tables: sometimes several tables should be leveraged to answer one question
โธ The context is richer, with longer descriptions.
โ–ช๏ธ Evaluation metrics : the benchmark is scored with an LLM as a judge, using a specific prompt.

๐—œ๐—ป๐˜€๐—ถ๐—ด๐—ต๐˜๐˜€ ๐—ณ๐—ฟ๐—ผ๐—บ ๐—ฒ๐˜ƒ๐—ฎ๐—น๐˜‚๐—ฎ๐˜๐—ถ๐—ป๐—ด ๐—ฎ๐—ด๐—ฒ๐—ป๐˜๐˜€
โ–ช๏ธ Their evaluation confirms that using LLMs in an agent setup, for instance by allowing them to run a single step of code execution, is more costly (especially with multi-turn frameworks like autogen) but also much more performant than the vanilla LLM.
โ–ช๏ธ The sets of tasks solved by different models (like GPT-3.5 vs Llama-3-8B) has quite low overlap, which suggests that different models tend to try very different approches.

This new benchmark is really welcome, can't wait to try transformers agents on it! 🤗

Read their full paper 👉
DSBench: How Far Are Data Science Agents to Becoming Data Science Experts? (2409.07703) https://huggingface.co/papers/2409.07703
Bringing Open-Source Text-to-Speech to French! 🗣🇫🇷

Hugging Face's Parler TTS mini can now speak French! 🇫🇷🎉
You can try it here:
PHBJT/french_parler_tts


Key highlights:
Transforms the English TTS model to speak French 🇬🇧➡️🇫🇷
Fully open source (code, weights, and datasets) 🛠
It can be replicated for every language 🌍

Read more about it in this article: https://huggingface.co/blog/PHBJT/french-parler-tts
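
If you'd rather call it from Python than the Space, here is a minimal sketch following the standard Parler TTS API; the French checkpoint id below is a guess, check the Space or article for the exact repo name:

```python
# Minimal Parler TTS sketch. The French repo id is an assumed placeholder;
# see the Space / blog post for the actual checkpoint.
import torch
import soundfile as sf
from parler_tts import ParlerTTSForConditionalGeneration
from transformers import AutoTokenizer

device = "cuda:0" if torch.cuda.is_available() else "cpu"
repo = "PHBJT/french_parler_tts_mini"  # hypothetical checkpoint id

model = ParlerTTSForConditionalGeneration.from_pretrained(repo).to(device)
tokenizer = AutoTokenizer.from_pretrained(repo)

prompt = "Bonjour, comment allez-vous aujourd'hui ?"
description = "Une voix féminine claire, au débit modéré."  # voice description

input_ids = tokenizer(description, return_tensors="pt").input_ids.to(device)
prompt_ids = tokenizer(prompt, return_tensors="pt").input_ids.to(device)

audio = model.generate(input_ids=input_ids, prompt_input_ids=prompt_ids)
sf.write("out.wav", audio.cpu().numpy().squeeze(), model.config.sampling_rate)
```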

Special thanks to FlexAI and their dedicated team for providing the computing power that made this possible, and of course to all of the Parler TTS community 🤗
OpenAI's latest model, "o1", has demonstrated remarkable performance on the Norway Mensa IQ test, scoring an estimated IQ of 120.

Everyone should think before answering!

Key findings:

• o1 correctly answered 25 out of 35 IQ questions, surpassing average human performance
• The model excelled at pattern recognition and logical reasoning tasks
• Performance was validated on both public and private test sets to rule out training data bias

Technical details:

• o1 utilizes advanced natural language processing and visual reasoning capabilities
• The model likely employs transformer architecture with billions of parameters
• Improved few-shot learning allows o1 to tackle novel problem types

Implications:

• This represents a significant leap in AI reasoning abilities
• We may see AIs surpassing 140 IQ by 2026 if the trend continues
• Raises important questions about the nature of intelligence and cognition
๐Ÿ™‹๐Ÿปโ€โ™‚๏ธHey there folks,

Nvidia just released a small 4B Nemotron-Mini model, and it works surprisingly well!

you can check it out here:

base:
nvidia/Minitron-4B-Base

instruct:
nvidia/Nemotron-Mini-4B-Instruct

demo:
Tonic/Nemotron-Mini-4B


hope you like it 🤗🤗
💬 Chat as a way to query SQL! The Airtrain AI team is happy to share a new Hugging Face Space that lets you interact with Hugging Face Hub datasets using a natural language chatbot. 🤗

Start Exploring 👉
airtrain-ai/hf-dataset-chat-to-sql


This Space is forked from
davidberenstein1957/text-to-sql-hub-datasets
by @davidberenstein1957 and features chat capability with improved table naming. The tool works with Hugging Face's recently released in-browser DuckDB-based SQL query engine for datasets.
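
As a taste of what the underlying engine can do, recent DuckDB versions can query Hub datasets directly through hf:// paths; a minimal sketch with a placeholder dataset path:

```python
# Minimal DuckDB sketch: query a Hub dataset via an hf:// path.
# The dataset/file path below is a placeholder; requires a recent DuckDB.
import duckdb

df = duckdb.sql(
    "SELECT COUNT(*) AS n FROM 'hf://datasets/some-user/some-dataset/data.parquet'"
).df()
print(df)
```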
Could someone please give me a screenshot of their fine-tuning/training Space form before they initiate the training? I have no idea what format the column mapping field expects.
Column1,column2,column3
"Column1","column2","column3"
🤷
For all the Muslims out there who are interested in the Quran and its tafsir (explanations). This humble dataset consists of 84 different books of tafsir for nearly all the ayat in the Quran:
MohamedRashad/Quran-Tafseer


I hope it helps someone to build something nice and useful with it ^_^
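
A minimal way to start exploring it (the split and field names are assumptions; check the dataset card):

```python
# Minimal sketch for loading the tafsir dataset; split/column names
# are assumptions, see the dataset card for the actual schema.
from datasets import load_dataset

ds = load_dataset("MohamedRashad/Quran-Tafseer", split="train")
print(ds)     # inspect features
print(ds[0])  # first record
```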
Anybody ever play Final Fantasy: Crystal Chronicles?
Like, *really* play it?

Mag Mell has been in my head recently. What a place that was.

Those cocoons looked like I could lay down inside of one, and it would be the most powerful sleep of a lifetime, with dreams that would last one thousand years, and I'd wake up with the wisdom of generations.

...Hey, anybody like text adventures?
Last Week in Medical AI: Top Research Papers/Models
๐Ÿ…(September 7 - September 14, 2024)

๐Ÿ… Medical AI Paper of the week
Chai-1 Foundation model molecular structure prediction

Medical LLMs & Benchmarks
- BrainWave: A Brain Signal Foundation Model
- DS-ViT: Vision Transformer for Alzheimer's Diagnosis
- EyeCLIP: Visual-language model for ophthalmic imaging
- Segment Anything Model for Tumor Segmentation
- MEDIC: Evaluating LLMs in Clinical Applications

Medical LLM Applications
- KARGEN: Radiology Report Generation LLMs
- DrugAgent: Explainable Drug Repurposing Agents
- Improving RAG in Medicine with Follow-up Questions

Frameworks and Methodologies
- Infrastructure for Automatic Cell Segmentation
- Data Alignment for Dermatology AI
- Diagnostic Reasoning in Natural Language
- Two-Stage Instruction Fine-tuning Approach for Med

AI in Healthcare Ethics
- Concerns and Choices of Using LLMs for Healthcare
- Understanding Fairness in Recommender Systems
- Towards Fairer Health Recommendations

Check the full thread: https://x.com/OpenlifesciAI/status/1832476252260712788

Thank you for your continued support and love for this series! Stay up-to-date with weekly updates on Medical LLMs, datasets, and top research papers by following @aaditya 🤗
Trained Myself With 256 Images on FLUX - Results Mind Blowing

Detailed Full Workflow

Medium article : https://medium.com/@furkangozukara/ultimate-flux-lora-training-tutorial-windows-and-cloud-deployment-abb72f21cbf8

Windows main tutorial : https://youtu.be/nySGu12Y05k

Cloud tutorial for GPU poor or scaling : https://youtu.be/-uhL2nW7Ddw

Full detailed results and conclusions : https://www.patreon.com/posts/111891669

Full config files and details to train : https://www.patreon.com/posts/110879657

SUPIR Upscaling (default settings are now perfect) : https://youtu.be/OYxVEvDf284

I used my Poco X6 camera phone and self-taken solo images

My dataset is far from ready; because of that, it contains many repeated and near-identical images, but this was rather experimental

Hopefully I will continue taking more shots to improve the dataset and reduce its size in the future

I trained the CLIP-L and T5-XXL text encoders as well

Since there was a lot of pushback from the community claiming my workflow wouldn't work with expressions, I had to take a break from research and use whatever I had

I used my own researched workflow for training with Kohya GUI, plus my own self-developed SUPIR app for batch upscaling with face upscaling and automatic LLaVA captioning improvement

Download the images to see them in full size; the last provided grid is 50% downscaled

Workflow

Gather a dataset that has the expressions and perspectives you want after training; this is crucial, since whatever you include, the model can learn to generate well

Follow one of the LoRA training tutorials / guides

After training your LoRA, use your favorite UI to generate images

I prefer SwarmUI; here are the prompts I used (you can add specific expressions to the prompts), including face inpainting:

https://gist.github.com/FurkanGozukara/ce72861e52806c5ea4e8b9c7f4409672

After generating images, use SUPIR to upscale 2x with maximum resemblance

Short Conclusions

Using 256 images certainly caused more overfitting than necessary
Researchers from Tencent have developed DepthCrafter, a novel method for generating temporally consistent long depth sequences for open-world videos using video diffusion models.

It leverages a pre-trained image-to-video diffusion model (SVD) as the foundation and uses a 3-stage training strategy on paired video-depth datasets:
1. Train on a large realistic dataset (1-25 frames)
2. Fine-tune temporal layers on realistic data (1-110 frames)
3. Fine-tune spatial layers on synthetic data (45 frames)

It adapts SVD's conditioning mechanism for frame-by-frame video input and employs latent diffusion in VAE space for efficiency.
Sprinkle some intelligent inference strategy for extremely long videos (stitching sketched below):
- Segment-wise processing (up to 110 frames)
- Noise initialization to anchor depth distributions
- Latent interpolation for seamless stitching
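
Here is a rough, hypothetical sketch of the stitching idea (illustrative only, not the authors' code): consecutive segments share a few frames, which are cross-faded so depth stays consistent across boundaries.

```python
# Hypothetical sketch of segment-wise stitching with overlap blending;
# illustrative only, not DepthCrafter's actual implementation.
import numpy as np

def stitch_depth_segments(segments: list, overlap: int) -> np.ndarray:
    """Each segment: array of shape (frames, H, W); consecutive segments
    share `overlap` frames, which are linearly cross-faded."""
    out = segments[0]
    for seg in segments[1:]:
        w = np.linspace(1.0, 0.0, overlap)[:, None, None]  # fade-out weights
        blended = w * out[-overlap:] + (1.0 - w) * seg[:overlap]
        out = np.concatenate([out[:-overlap], blended, seg[overlap:]], axis=0)
    return out
```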

And it outperforms SOTA methods on multiple datasets (Sintel, ScanNet, KITTI, Bonn).

Read here: https://depthcrafter.github.io
nanoGPT with Sigmoid Self-Attention
I couldn't resist, had to give it a try :)

Some observations on M2:
SSA was ~5-10% faster in training with similar final loss values, slightly less coherent text generation, marginally higher perplexity, and lower memory usage compared to softmax.
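
For reference, the core swap is tiny; a minimal sketch of the attention change (the -log(seq_len) bias follows the sigmoid-attention paper; causal masking omitted for brevity):

```python
# Minimal sketch: sigmoid self-attention replaces the softmax over keys
# with an elementwise sigmoid plus a -log(n) bias, so attention weights
# are independent rather than normalized to sum to 1.
import math
import torch

def sigmoid_attention(q, k, v):
    # q, k, v: (batch, heads, seq_len, head_dim); causal mask omitted
    n = q.size(-2)
    scores = q @ k.transpose(-2, -1) / math.sqrt(q.size(-1))
    attn = torch.sigmoid(scores - math.log(n))  # vs torch.softmax(scores, -1)
    return attn @ v
```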

Code: https://github.com/Jaykef/ai-algorithms/blob/main/sigmoid_attn.ipynb
How much VRAM will you need for training your AI model? 💾🧠
Check out this app where you convert:
PyTorch/TensorFlow summary -> required VRAM
or
Parameter count -> required VRAM

Use it in: http://howmuchvram.com
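
For intuition, here's a back-of-the-envelope version of the parameter-count conversion; this is my own rough rule of thumb, not necessarily the app's exact formula:

```python
# Rough VRAM estimate from parameter count: weights + gradients + Adam
# moments. Activations and framework overhead are ignored, so treat this
# as a lower bound, not the app's exact formula.
def training_vram_gb(n_params: float, bytes_per_param: int = 4) -> float:
    weights = n_params * bytes_per_param
    grads = n_params * bytes_per_param
    adam_moments = n_params * 8  # two fp32 moments per parameter
    return (weights + grads + adam_moments) / 1e9

print(training_vram_gb(7e9))  # ~112 GB to fully fine-tune a 7B model in fp32
```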

And everything is open source! Ask for new functionalities or contribute in:
https://github.com/AlexBodner/How_Much_VRAM
If it's useful to you, leave a star 🌟 and share it with someone who will find the tool useful!
More discussion in: https://x.com/AlexBodner_/status/1832054850294812679
inflatebot/MN-12B-Mag-Mell-R1
https://huggingface.co/inflatebot/MN-12B-Mag-Mell-R1
MN-12B-Mag-Mell is a multi-stage merge, inspired by hypermerges like Tiefighter and Umbral Mind, intended for use as a general-purpose "Best of Nemo" model for co-writing, roleplay, and text adventures.

Consistently, Mag Mell produced prose that shocked testers, with a minimum of "slop". It also exhibited a unique sense of humor, and a propensity for inserting bespoke details into adventuring scenarios.
Have you tried the new SQL Console yet?

Would love to know any queries you've tried or general feedback! If you haven't, go try it out and let us know 🤗

If you have some interesting queries feel free to share the URLs as well!
๐—˜๐˜…๐˜๐—ฟ๐—ฎ๐—ฐ๐˜๐—ถ๐—ป๐—ด ๐˜†๐—ผ๐˜‚๐—ฟ ๐—›๐—ง๐— ๐—Ÿ ๐˜„๐—ฒ๐—ฏ๐—ฝ๐—ฎ๐—ด๐—ฒ๐˜€ ๐˜๐—ผ ๐—บ๐—ฎ๐—ฟ๐—ธ๐—ฑ๐—ผ๐˜„๐—ป ๐—ถ๐˜€ ๐—ป๐—ผ๐˜„ ๐—ฝ๐—ผ๐˜€๐˜€๐—ถ๐—ฏ๐—น๐—ฒ ๐—ฒ๐—ป๐—ฑ-๐˜๐—ผ-๐—ฒ๐—ป๐—ฑ ๐˜„๐—ถ๐˜๐—ต ๐—ฎ ๐˜€๐—ถ๐—บ๐—ฝ๐—น๐—ฒ ๐—Ÿ๐—Ÿ๐— ! ๐Ÿ‘

Jina just released Reader-LM, which handles the whole pipeline of extracting markdown from HTML webpages.

A while ago, Jina had released a completely code-based deterministic program to do this extraction, based on some heuristics: e.g., "if the text is in a <p> tag, keep it, but if it's hidden behind another, remove it".

🤔 But they received complaints from readers: some found it too detailed, others not enough, depending on the pages.

โžก๏ธ So they decided, ๐—บ๐—ฎ๐˜†๐—ฏ๐—ฒ ๐—ต๐—ฒ๐˜‚๐—ฟ๐—ถ๐˜€๐˜๐—ถ๐—ฐ๐˜€ ๐˜„๐—ฒ๐—ฟ๐—ฒ ๐—ป๐—ผ๐˜ ๐—ฒ๐—ป๐—ผ๐˜‚๐—ด๐—ต: ๐—ถ๐—ป๐˜€๐˜๐—ฒ๐—ฎ๐—ฑ, ๐˜๐—ต๐—ฒ๐˜† ๐˜๐—ฟ๐—ถ๐—ฒ๐—ฑ ๐˜๐—ผ ๐˜๐—ฟ๐—ฎ๐—ถ๐—ป ๐—ฎ ๐—Ÿ๐—Ÿ๐—  ๐˜๐—ผ ๐—ฑ๐—ผ ๐˜๐—ต๐—ฒ ๐—ฐ๐—ผ๐—บ๐—ฝ๐—น๐—ฒ๐˜๐—ฒ ๐—ฒ๐˜…๐˜๐—ฟ๐—ฎ๐—ฐ๐˜๐—ถ๐—ผ๐—ป. This LLM does not need to be very strong,but it should handle a very long context: itโ€™s a challenging, โ€œshallow-but-wideโ€ architecture.

๐—ง๐—ฒ๐—ฐ๐—ต๐—ป๐—ถ๐—ฐ๐—ฎ๐—น ๐—ถ๐—ป๐˜€๐—ถ๐—ด๐—ต๐˜๐˜€:
2๏ธโƒฃ models: Reader-LM-0.5B and 1.5B
โš™๏ธ Two stages of training: first, short and simple HTML to get the basics, then ramp up to longer and harder HTML up to 128k tokens
๐Ÿ”Ž Use contrastive search for decoding: this empirically reduces โ€œrepeating outputโ€ issues
โžก๏ธ Their models beat much larger models at HTML extraction ๐Ÿ”ฅ
๐Ÿค— Weights available on HF (sadly cc-by-nc license):
jinaai/reader-lm-1.5b
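
A minimal sketch for trying it with transformers: raw HTML goes in as the user message, and penalty_alpha/top_k enables the contrastive search decoding mentioned above; the generation settings here are illustrative:

```python
# Minimal Reader-LM sketch; generation settings are illustrative.
from transformers import AutoModelForCausalLM, AutoTokenizer

repo = "jinaai/reader-lm-1.5b"
tokenizer = AutoTokenizer.from_pretrained(repo)
model = AutoModelForCausalLM.from_pretrained(repo)

html = "<html><body><h1>Title</h1><p>Hello <b>world</b>!</p></body></html>"
inputs = tokenizer.apply_chat_template(
    [{"role": "user", "content": html}],
    add_generation_prompt=True, return_tensors="pt",
)
# penalty_alpha + top_k switches transformers to contrastive search
out = model.generate(inputs, max_new_tokens=256, penalty_alpha=0.6, top_k=4)
print(tokenizer.decode(out[0][inputs.shape[-1]:], skip_special_tokens=True))
```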
Hugging Face presents FineVideo 🎥! Unlocking the next generation of video understanding 🚀

🤯 3400 hours of annotated Creative Commons videos with rich character descriptions, scene splits, mood, and content descriptions per scene, as well as QA pairs.
🔥 @mfarre processed over 2M YouTube-CC videos to make this incredibly powerful selection.

Very psyched to fine-tune Idefics on this dataset. ⚡️
Explore the videos:
HuggingFaceFV/FineVideo-Explorer
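
To peek at it from Python without downloading all 3400 hours, streaming works; the dataset id is assumed from the Space name, check the dataset card (it may be gated):

```python
# Minimal sketch: stream one sample instead of downloading everything.
# Dataset id assumed from the Space name; may require accepting gating terms.
from datasets import load_dataset

ds = load_dataset("HuggingFaceFV/finevideo", split="train", streaming=True)
sample = next(iter(ds))
print(sample.keys())
```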
๐Ž๐ฉ๐ž๐ง๐€๐ˆ ๐Ÿ๐ข๐ง๐š๐ฅ๐ฅ๐ฒ ๐ซ๐ž๐ฏ๐ž๐š๐ฅ๐ฌ โ€œ๐Ÿ“โ€: ๐œ๐ซ๐š๐ณ๐ฒ ๐œ๐ก๐š๐ข๐ง-๐จ๐Ÿ-๐ญ๐ก๐จ๐ฎ๐ ๐ก๐ญ-๐ญ๐ฎ๐ง๐ž๐ ๐ฆ๐จ๐๐ž๐ฅ >> ๐†๐๐“-๐Ÿ’๐จ ๐Ÿ’ฅ

OpenAI had hinted at a mysterious "project strawberry" for a long time: they published this new model called "o1" 1 hour ago, and the performance is just mind-blowing.

🤯 Ranks among the top 500 students in the US in a qualifier for the USA Math Olympiad
🤯 Beats human-PhD-level accuracy by 8% on GPQA, a benchmark of hard science problems where the previous best was Claude 3.5 Sonnet with 59.4%
🤯 Scores 78.2% on the vision benchmark MMMU, making it the first model competitive with human experts
🤯 GPT-4o scored 60% on MATH ⇒ o1 scores 95%

How did they pull this off? Sadly, OpenAI keeps getting better at "making cryptic AF reports to not reveal any real info", so here are some excerpts:

๐Ÿ’ฌ โ€œ๐—ผ๐Ÿญ ๐˜‚๐˜€๐—ฒ๐˜€ ๐—ฎ ๐—ฐ๐—ต๐—ฎ๐—ถ๐—ป ๐—ผ๐—ณ ๐˜๐—ต๐—ผ๐˜‚๐—ด๐—ต๐˜ ๐˜„๐—ต๐—ฒ๐—ป ๐—ฎ๐˜๐˜๐—ฒ๐—บ๐—ฝ๐˜๐—ถ๐—ป๐—ด ๐˜๐—ผ ๐˜€๐—ผ๐—น๐˜ƒ๐—ฒ ๐—ฎ ๐—ฝ๐—ฟ๐—ผ๐—ฏ๐—น๐—ฒ๐—บ. ๐—ง๐—ต๐—ฟ๐—ผ๐˜‚๐—ด๐—ต ๐—ฟ๐—ฒ๐—ถ๐—ป๐—ณ๐—ผ๐—ฟ๐—ฐ๐—ฒ๐—บ๐—ฒ๐—ป๐˜ ๐—น๐—ฒ๐—ฎ๐—ฟ๐—ป๐—ถ๐—ป๐—ด, ๐—ผ๐Ÿญ ๐—น๐—ฒ๐—ฎ๐—ฟ๐—ป๐˜€ ๐˜๐—ผ ๐—ต๐—ผ๐—ป๐—ฒ ๐—ถ๐˜๐˜€ ๐—ฐ๐—ต๐—ฎ๐—ถ๐—ป ๐—ผ๐—ณ ๐˜๐—ต๐—ผ๐˜‚๐—ด๐—ต๐˜ ๐—ฎ๐—ป๐—ฑ ๐—ฟ๐—ฒ๐—ณ๐—ถ๐—ป๐—ฒ ๐˜๐—ต๐—ฒ ๐˜€๐˜๐—ฟ๐—ฎ๐˜๐—ฒ๐—ด๐—ถ๐—ฒ๐˜€ ๐—ถ๐˜ ๐˜‚๐˜€๐—ฒ๐˜€. It learns to recognize and correct its mistakes.โ€

And of course, they decided to hide the content of this precious Chain-of-Thought. Would it be for maximum profit? Of course not, you awful capitalist, it's to protect users:

💬 "We also do not want to make an unaligned chain of thought directly visible to users."

They're right, it would certainly have hurt my feelings to see the internals of this model tearing apart math problems.

🤔 I suspect it could be not only CoT, but also some agentic behaviour where the model can just call a code executor. The kind of score improvements they show certainly look like the ones you see with agents.

This model will be immediately released for ChatGPT and some โ€œtrusted API usersโ€.

Let's start cooking to release the same thing in 6 months! 🚀
I believe Hugging Face should have something similar to Hacktoberfest. I miss the days when there were events like this every 3 months for audio, deep reinforcement learning, Gradio themes, but it turns out everything has slowed down. There are no more Hugging Face events.