HF Hub: Share and discover more about AI with social posts from the community.
The only 405B Spaces still freely accessible are powered by SambaNova's fast API.

xianbao/SambaNova-fast


https://sambanova.ai/fast-api?api_ref=907266
Sharing for anyone using Diffusers from_single_file loading and affected by the Runway SD 1.5 issue.

If you have runwayml/stable-diffusion-v1-5 saved locally in your HF cache, then loading single-file checkpoints in the following way should still work.


from diffusers import StableDiffusionPipeline

# Works as long as runwayml/stable-diffusion-v1-5 is still present in your local HF cache
pipe = StableDiffusionPipeline.from_single_file("<url or path to single file checkpoint>")


If you do not have the model repo saved in your cache, then automatically inferring the pipeline config will not work since the reference repo runwayml/stable-diffusion-v1-5 doesn't exist anymore.

You can pass an alternative SD 1.5 repo id via the config argument to still configure your pipeline.


from diffusers import StableDiffusionPipeline

# Point `config` at any SD 1.5-based repo to infer the pipeline config
pipe = StableDiffusionPipeline.from_single_file("<url or path to single file checkpoint>", config="Lykon/DreamShaper")


We're working on resolving the issue ASAP.
Introducing "Writing in the Margins (WiM)" - better inference pattern for long context LLMs that solves the Lost-in-the-Middle problem 🔥

Paper page:
Writing in the Margins: Better Inference Pattern for Long Context Retrieval (2408.14906)


TL;DR
Make your model write "margin notes" as you chunk-prefill the KV cache. Then ask it to reread all the notes before it speaks up.
Works with humans, works with AI 🤖

WiM leverages the chunked prefill of the key-value cache, which concurrently generates query-based extractive summaries at each step of the prefill that are subsequently reintegrated at the end of the computation. We term these intermediate outputs "margins", drawing inspiration from the practice of making margin notes for improved comprehension of long contexts in human reading. We show that this technique, which adds only minimal additional computation, significantly improves LLMs' long-context reasoning capabilities.

Think: every chunk gets a chance to be attended to / to be at the end of the context at least once. 🎉
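
Here's a minimal sketch of the pattern in plain Python. The generate callable stands in for any LLM completion call; chunking by characters and the exact prompt wording are illustrative assumptions, not the paper's setup.


from typing import Callable, List

def wim_answer(
    generate: Callable[[str], str],  # prompt -> completion (any LLM call)
    context: str,
    query: str,
    chunk_size: int = 4000,          # characters per prefill chunk (assumption)
) -> str:
    # 1. Split the long context into chunks, mirroring chunked prefill.
    chunks: List[str] = [context[i:i + chunk_size]
                         for i in range(0, len(context), chunk_size)]

    # 2. After each chunk, request a query-based extractive "margin" note.
    margins: List[str] = []
    prefix = ""
    for chunk in chunks:
        prefix += chunk
        note = generate(
            f"{prefix}\n\nExtract any passage above that helps answer: {query}\n"
            "If nothing is relevant, reply 'N/A'."
        )
        if note.strip().upper() != "N/A":
            margins.append(note)

    # 3. Reintegrate all margins at the end, so every relevant chunk has
    #    effectively been "at the end of the context" at least once.
    notes = "\n".join(f"- {m}" for m in margins)
    return generate(
        f"{context}\n\nMargin notes:\n{notes}\n\nQuestion: {query}\nAnswer:"
    )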

📊 Results:
- An average accuracy boost of 7.5% in multi-hop reasoning tasks like HotpotQA and MultiHop-RAG.
- Even a 30% increase in F1-score for summarisation-like tasks (CWE).

Plus, WiM fits seamlessly into interactive applications (think: progress bar!). It can provide real-time progress updates during data retrieval and integration, making it user-friendly and transparent - a stark contrast to feeding 1M tokens to an LLM and waiting 6 minutes for the first token. 🤯

👩‍💻🧑‍💻 Check it out and contribute to our open-source project here: https://github.com/writer/writing-in-the-margins

🧠 More about chunked prefill: https://docs.vllm.ai/en/latest/models/performance.html#chunked-prefill
Had a funny thought: would it be at all possible to rework what shows up on our personal HF page?

Picture this: I upload a model to an organization, someone who follows me now has no idea that I've uploaded a model or to where, unless they also watch those repos (which also floods them with other notifications)

What if our main Hugging Face page were a collection of both the models we've uploaded to our profile and the models we've uploaded to organizations? That way it would all be contained in one central, followable location, and I wouldn't have to worry about losing followers if I suddenly wanted to upload to an organization.
Run Llama 405B at over 100 tokens per second for free using SambaNova's API! https://sambanova.ai/fast-api?api_ref=444868

I have been able to generate some high-quality synthetic data and use it for LLM-as-a-judge, instead of slower and more expensive alternatives like OpenAI or Anthropic.
fast-sentence-transformers - simply, faster, sentence-transformers

- Released an initial version a while ago
- Archived it because of a cleaner solution described in a blog by Philipp Schmid
- Reimplemented it based on that cleaner solution
- Unarchived the project
- Packaged it up
- Released a 0.5 version

pip install fast-sentence-transformers
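
A minimal usage sketch, assuming the package mirrors the sentence-transformers encode API (check the README for the exact interface):


from fast_sentence_transformers import FastSentenceTransformer

# Exports the model to ONNX under the hood for faster CPU inference.
encoder = FastSentenceTransformer("all-MiniLM-L6-v2", device="cpu")

embeddings = encoder.encode(["This is a sentence.", "This is another one."])
print(embeddings.shape)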

https://github.com/davidberenstein1957/fast-sentence-transformers
𝗔 𝗻𝗲𝘂𝗿𝗮𝗹 𝗻𝗲𝘁𝘄𝗼𝗿𝗸 𝘀𝗶𝗺𝘂𝗹𝗮𝘁𝗲𝘀 𝗗𝗢𝗢𝗠: 𝗚𝗼𝗼𝗴𝗹𝗲 𝗿𝗲𝘀𝗲𝗮𝗿𝗰𝗵𝗲𝗿𝘀 𝗼𝗽𝗲𝗻 𝘁𝗵𝗲 𝘄𝗮𝘆 𝗳𝗼𝗿 𝗰𝗼𝗺𝗽𝗹𝗲𝘁𝗲𝗹𝘆-𝗔𝗜-𝗴𝗲𝗻𝗲𝗿𝗮𝘁𝗲𝗱 𝗴𝗮𝗺𝗲𝘀!

Imagine if games were completely live-generated by an AI model: the NPCs and their dialogues, the storyline, and even the game environment. The player's in-game actions would have a real, lasting impact on the game story.

In a very exciting paper, Google researchers just gave us the first credible glimpse of this future.

➡️ They created GameNGen, the first neural model that can simulate a complex 3D game in real-time. They use it to simulate the classic game DOOM running at over 20 frames per second on a single TPU, with image quality comparable to lossy JPEG compression. And it feels just like the true game!

Here's how they did it:
1. They trained an RL agent to play DOOM and recorded its gameplay sessions.
2. They then used these recordings to train a diffusion model to predict the next frame, based on past frames and player actions.
3. During inference, they use only 4 denoising steps (instead of the usual dozens) to generate each frame quickly.
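
The authors' code is unreleased, but as a toy illustration of the conditioning in step 2 (all layer choices and sizes below are made up, not from the paper), a next-frame denoiser takes the noisy next frame plus past frames and actions:


import torch
import torch.nn as nn

class NextFrameDenoiser(nn.Module):
    def __init__(self, n_frames: int = 8, n_actions: int = 18, dim: int = 64):
        super().__init__()
        # Past frames are stacked along channels; the last action is embedded
        # and broadcast as extra conditioning channels.
        self.action_embed = nn.Embedding(n_actions, dim)
        self.net = nn.Conv2d(3 * (n_frames + 1) + dim, 3, kernel_size=3, padding=1)

    def forward(self, noisy_next, past_frames, past_actions):
        # noisy_next: (B, 3, H, W); past_frames: (B, F, 3, H, W); past_actions: (B, F)
        b, f, c, h, w = past_frames.shape
        cond = self.action_embed(past_actions[:, -1])       # (B, dim)
        cond = cond[:, :, None, None].expand(b, -1, h, w)   # broadcast over pixels
        x = torch.cat([noisy_next, past_frames.reshape(b, f * c, h, w), cond], dim=1)
        return self.net(x)  # one denoising step toward the predicted next frame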

𝗞𝗲𝘆 𝗶𝗻𝘀𝗶𝗴𝗵𝘁𝘀:
🎮🤔 Human players can barely tell the difference between short clips (3 seconds) of the real game and the simulation
🧠 The model maintains game state (health, ammo, etc.) over long periods despite having only 3 seconds of effective context length
🔄 They use "noise augmentation" during training to prevent quality degradation in long play sessions
🚀 The game runs on one TPU at 20 FPS with 4 denoising steps, or 50 FPS with model distillation (with some quality loss)

The researchers did not open source the code, but I feel like we’ve just seen a part of the future being written!

Their paper (exploding the upvote counter) 👉
Diffusion Models Are Real-Time Game Engines (2408.14837)

In a similar vein, play @Jofthomas's 'Everchanging Quest' 🎮
Jofthomas/Everchanging-Quest

Chatbots should let users select their preferred reading speed, defined by words per minute.

By dynamically adjusting batch sizes based on user-defined reading speeds, you could more effectively distribute requests, especially in large-scale distributed systems. For users preferring slower token generation, larger batches can be processed concurrently, maximising GPU throughput without compromising user experience (as these users have expressed they are indifferent to, or may even prefer, higher latency).

On the user side, preferred reading speed varies by task. When generating code, I want responses as quickly as possible. But when I'm bouncing ideas off an LLM, I'd prefer a more readable pace rather than a wall of text.
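
As a client-side sketch of the idea (the token stream and the words-per-token ratio are illustrative assumptions; the batching itself would live server-side):


import time
from typing import Iterable, Iterator

def paced_stream(tokens: Iterable[str], words_per_minute: float = 250.0) -> Iterator[str]:
    # Rough heuristic: ~0.75 words per token (assumption).
    delay = 60.0 / (words_per_minute / 0.75)  # seconds per token
    for token in tokens:
        yield token
        # The slack between generation speed and display speed is what the
        # server could reclaim by packing such users into larger batches.
        time.sleep(delay)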
Excited to share my new Gradio app featuring the impressive Llama-3.1-Storm-8B model!
This app demonstrates the capabilities of Llama-3.1-Storm-8B, an 8B-parameter language model created by Ashvini Kumar Jindal, Pawan Kumar Rajpoot, and Ankur Parikh (@akjindal53244).
Key highlights of Llama-3.1-Storm-8B:

- Outperforms Llama-3.1-8B-Instruct on multiple benchmarks:
  - Instruction Following (IFEval): +3.93%
  - Knowledge-driven QA (GPQA): +7.21%
  - Reduced Hallucinations (TruthfulQA): +9%
  - Function Calling (BFCL): +7.92%
- Achieves impressive results with only 8B parameters
- Uses innovative techniques like self-curation and model merging
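
A minimal sketch of what such a Gradio wrapper can look like (the repo id is the creators' HF model; generation settings are assumptions, and the actual Space may differ):


import gradio as gr
from transformers import pipeline

# Load the model (assumed repo id; see the Space for the exact setup).
generator = pipeline(
    "text-generation",
    model="akjindal53244/Llama-3.1-Storm-8B",
    device_map="auto",
)

def respond(message, history):
    out = generator(message, max_new_tokens=256, do_sample=True)
    return out[0]["generated_text"]

gr.ChatInterface(respond).launch()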

Try out the model yourself:
sagar007/lama_storm_8b


Kudos to the creators for pushing the boundaries of smaller language models! This work makes advanced AI more accessible and efficient.
#AI #NLP #MachineLearning #GradioApp #Llama3
Introducing fal/AuraFace-v1: a commercially available & open-source identity encoder model for next-generation one-shot personalization. Read more about it here: https://huggingface.co/blog/isidentical/auraface
We worked with OpenAI to fine-tune gpt-4o and built the SOTA model for the patched-codes/static-analysis-eval benchmark. All the code and data (patched-codes/synth-vuln-fixes) on how we did it are available on their GitHub: https://github.com/openai/build-hours/tree/main/5-4o_fine_tuning

Here are some tips based on our experience:

→ Establish baseline with "conditioning" / prompting

→ Task-specific datasets are ideal for PEFT; hard to beat gpt-4o on "broad" tasks

→ Add your best system prompt to each example

→ Ensure training data distribution is similar to inference data

→ Shorten instructions with concise prompts; may require more examples.

→ Define clear evaluation metrics (seriously, please eval!)
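
For instance, "add your best system prompt to each example" means every record in the chat fine-tuning JSONL repeats the same system message (the messages below are placeholders, not the actual dataset contents):


import json

# One training record in OpenAI's chat fine-tuning JSONL format.
example = {
    "messages": [
        {"role": "system", "content": "You are a security engineer. Fix the vulnerability in the given code."},
        {"role": "user", "content": "<vulnerable code snippet>"},
        {"role": "assistant", "content": "<fixed code snippet>"},
    ]
}
print(json.dumps(example))  # one such line per example in the .jsonl file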

You can see more details on the benchmark and process here - https://www.patched.codes/blog/the-static-analysis-evaluation-benchmark-measuring-llm-performance-in-fixing-software-vulnerabilities
This merge, grounded this time in Gemma2 9B Instruct fine-tunes, is another demonstration that models without any roleplay-specific fine-tuning can still perform the function, maintaining coherence and attention to context. No overt fine-tuning should be required for roleplay in text generation: pretraining should give models a requisite basic understanding of the world, so all that should be needed is corrective fine-tuning to address observed defects in portraying the world, plus datasets to promote a suitably entertaining writing style. Good Instruct tuning should promote reasoning, coherence, and attention to context.
grimjim/Kitsunebi-v1-Gemma2-8k-9B

grimjim/Kitsunebi-v1-Gemma2-8k-9B-GGUF


I opted not to incorporate the UCLA SPPO fine-tune for Gemma2 9B after observing context confusion occur with some frequency during complex scenarios.

Thanks to Axcxept co., ltd. for fine-tuning HODACHI/EZO-Common-9B-gemma-2-it, and to Princeton NLP Group for fine-tuning princeton-nlp/gemma-2-9b-it-SimPO.
AXCXEPT/EZO-Common-9B-gemma-2-it

princeton-nlp/gemma-2-9b-it-SimPO https://huggingface.co/AXCXEPT/EZO-Common-9B-gemma-2-it
🌟 Liger Kernel: Efficient Triton Kernels for LLM Training

LIGER "is a [Hugging Face-compatible] collection of Triton kernels designed specifically for LLM training. It can effectively increase multi-GPU training throughput by 20% and reduces memory usage by 60%."

GitHub: https://github.com/linkedin/Liger-Kernel
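
A minimal usage sketch (the patch API below matches the repo README at the time of writing; check it for current entry points):


import transformers
from liger_kernel.transformers import apply_liger_kernel_to_llama

# Monkey-patches Llama modules (RMSNorm, RoPE, SwiGLU, fused cross-entropy, ...)
# with Liger's Triton kernels; call before instantiating the model.
apply_liger_kernel_to_llama()

model = transformers.AutoModelForCausalLM.from_pretrained("meta-llama/Meta-Llama-3-8B")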
Exploring Realistic Emotional Depth in AI Language Models

Language models, particularly proprietary ones, often grapple with issues of censorship, which can limit their ability to engage authentically with users. Recognizing this, the open-source AI community has pioneered the development of language models that are less restrained, offering more candid interactions. However, even these models tend to maintain a veneer of neutrality or overly positive responses, which might not serve all users' needs, especially in contexts where emotional depth and relatability are crucial.

To address this gap, I've curated a specialized dataset aimed at infusing language models with a more nuanced emotional spectrum, specifically targeting a darker, more introspective mood. This dataset, titled "Dark Sentience", is designed to complement existing datasets like RP (Role Play) and those focused on instruction following. It seeks to enhance the emotional intelligence of AI by exposing it to complex human emotions, including but not limited to:

- Suicide
- Depression
- Anxiety

Trigger Warning: Please be advised that the content within this dataset deals with heavy and potentially distressing themes.

The "Dark Sentience" dataset is now available for review and use at:
Locutusque/Dark-Sentience
. I encourage researchers, developers, and mental health professionals to explore how this resource can foster more genuine and supportive AI interactions.
30 Features that Dramatically Improve LLM Performance - Part 1 https://mltblog.com/3Aq9iAb

Many are ground-breaking innovations that make LLMs much faster and less prone to hallucinations. They reduce cost, latency, and the amount of compute resources (GPU, training) by several orders of magnitude. Some of them improve security, making your LLM more attractive to corporate clients. I introduced a few of these features in my previous article, "New Trends in LLM Architecture". Now I offer a comprehensive list, based on the most recent developments.

Read the full article, and learn about agentic LLMs, LLM routers, contextual tables, fast search, and more, at https://mltblog.com/3Aq9iAb
🎮 Introducing a new dataset focused on Steam user game bans -
nyuuzyou/steambans


Dataset highlights:
- 476,694 Steam user profiles
- Emphasis on game ban data and account status
- Each entry includes: Steam ID, profile URL, username, avatar, account creation date, visibility state, VAC bans, game bans, economy ban status, and days since last ban
- Covers a wide range of Steam users, from clean accounts to those with multiple bans
- Data spans up to 2024

Perfect for researchers studying cheating in online games, anti-cheat effectiveness, and player behavior patterns in the face of bans.
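
A quick way to start exploring it (the split name is an assumption; check the dataset card):


from datasets import load_dataset

ds = load_dataset("nyuuzyou/steambans", split="train")
print(ds[0])        # one profile record (Steam ID, VAC bans, game bans, ...)
print(ds.features)  # inspect the available fields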
📐 AI Math Equation Solver: Your Step-by-Step Solution Companion

Hello Hugging Face community! 👋 I'm excited to share my latest Space: the AI Math Equation Solver!

🔍 What does it do?

This Space uses the power of AI to solve math problems from images. Simply upload a picture of a math equation or problem, and the AI will provide a detailed, step-by-step solution. It's perfect for students, teachers, or anyone looking to understand complex mathematical concepts better.

🧠 How does it work?

- Backend: Utilizes the microsoft/Phi-3.5-vision-instruct model for image understanding and mathematical reasoning.
- Frontend: Built with Gradio for a clean, user-friendly interface.
- Features:
  - Image upload for math problems
  - Detailed step-by-step solutions
  - Example problems to try instantly

🚀 Try it out!
sagar007/phi-vision-math-assistant



💡 Use cases:

- Students: Check your work or get help with homework
- Teachers: Create detailed solution guides quickly
- Tutors: Explain complex problems more effectively
- Self-learners: Understand new mathematical concepts

🛠 Technical Details:

- Model: microsoft/Phi-3.5-vision-instruct
- Libraries: transformers, Gradio, PyTorch
- Optimizations: Uses Flash Attention for improved performance
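
As a rough sketch of how such a backend can be wired up (the prompt format follows the Phi-3.5-vision model card; generation settings are assumptions and the actual Space code may differ):


import gradio as gr
import torch
from transformers import AutoModelForCausalLM, AutoProcessor

model_id = "microsoft/Phi-3.5-vision-instruct"
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,
    _attn_implementation="flash_attention_2",  # the Flash Attention optimization noted above
)
processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)

def solve(image):
    prompt = "<|user|>\n<|image_1|>\nSolve this math problem step by step.<|end|>\n<|assistant|>\n"
    inputs = processor(prompt, [image], return_tensors="pt").to(model.device)
    out = model.generate(**inputs, max_new_tokens=512)
    new_tokens = out[0][inputs["input_ids"].shape[1]:]  # drop the prompt tokens
    return processor.decode(new_tokens, skip_special_tokens=True)

gr.Interface(fn=solve, inputs=gr.Image(type="pil"), outputs="text").launch()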

🤝 Contribute:

This is an open project, and I welcome contributions! Whether it's improving the model, enhancing the UI, or adding new features, feel free to fork the project and submit your pull requests.

📣 Feedback:

I'd love to hear your thoughts! How are you using this Space? Any suggestions for improvements? Let me know in the comments below.

Happy problem-solving! 🎉

#MachineLearning #AI #Mathematics #Education #HuggingFace
I developed a way to test very clearly whether a Transformer model can actually learn symbolic reasoning, or whether LLMs are forever doomed to be offshoots of "stochastic parrots". The results are in: undeniable proof that Transformer models CAN learn symbolic relationships. Undeniable proof that AI can learn its ABCs. Credit goes to myself, Claude, and ChatGPT; I would not have been able to prove this without Claude or ChatGPT.



https://www.youtube.com/watch?v=I8jHRgahRfY