HF-hub - Share and discover more about AI with social posts from the community.
Fal/AuraFlow-v0.3
is now here with support for different aspect ratios (width/height up to 1536px!) and much nicer aesthetics! Make sure to install the latest diffusers to get support for it.
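A minimal sketch of loading it with diffusers (this assumes a recent diffusers release that ships an AuraFlowPipeline class and the fal/AuraFlow-v0.3 checkpoint id; class name, repo id, and supported resolutions may differ):

# pip install -U diffusers transformers accelerate
import torch
from diffusers import AuraFlowPipeline  # assumed pipeline class for AuraFlow

# Load the v0.3 checkpoint (assumed repo id) in half precision on a GPU
pipe = AuraFlowPipeline.from_pretrained(
    "fal/AuraFlow-v0.3",
    torch_dtype=torch.float16,
).to("cuda")

# Non-square generation, taking advantage of the new aspect-ratio support
image = pipe(
    "a watercolor lighthouse at dusk, soft golden light",
    width=1536,
    height=864,
    num_inference_steps=50,
).images[0]
image.save("auraflow_v03.png")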
As some of you know, I try to convert models to either fp32 or bf16, depending on their size, before doing imatrix and quantization

Today I decided to see if that matters, and the results have me... for lack of a better word, perplexed

My setup:

Mistral Nemo Instruct 2407
- convert to FP32, calculate imatrix, quantize to Q8_0 and Q4_K_M
- convert to FP16, calculate imatrix, quantize to Q8_0 and Q4_K_M

I calculated the KLD base from the FP32 model:
./llama-perplexity -m /models/Mistral-Nemo-Instruct-2407-f32.gguf -f /training_data/wikitext-2-raw/wiki.test.raw --kl-divergence-base /training_data/mistral-nemo-f32.kld -ngl 35 -fa -sm row

then calculated the divergence itself for each like so:
./llama-perplexity -m /models/Mistral-Nemo-Instruct-2407-Q8_0.gguf -f /training_data/wikitext-2-raw/wiki.test.raw --kl-divergence-base /training_data/mistral-nemo-f32.kld --kl-divergence -ngl 50 -fa -sm row
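For reference, the per-token statistic these runs average is the standard KL divergence between the fp32 model's next-token distribution P and the quantized model's distribution Q (the textbook definition, not anything llama.cpp-specific):

$$D_{\mathrm{KL}}(P \,\|\, Q) = \sum_{v \in \mathcal{V}} P(v)\,\log \frac{P(v)}{Q(v)}$$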

The Q4_K_M quants from fp16 and fp32 were similar, trading blows across statistics. That's odd, since I expected fp32 to be strictly better, but it's not.

Q8_0 is where things get weird. Despite each file being a slightly different size, and the sha256sums of course differing, they get *completely identical* scores, down to 6 decimal places of precision on every statistic.

How is this possible? Is there something I don't understand about llama.cpp that makes it always convert to fp16 before it does quantization? Am I wasting time using FP32/BF16??
https://huggingface.co/posts/bartowski/608656345183499 @bartowski on Hugging Face:
Improved ControlNet!
Now supports dynamic resolution for perfect landscape and portrait outputs. Generate stunning images without distortion—optimized for any aspect ratio!
...
https://huggingface.co/spaces/DamarJati/FLUX.1-DEV-Canny FLUX.1-DEV Canny - a Hugging Face Space by DamarJati
SAM2 Video Predictor
This is a simple demo for video segmentation with SAM2.

Instructions: (read the instructions)

Upload your video [MP4, 24 fps]
With the 'include' point type selected, click on the object to mask on the first frame
Switch to the 'exclude' point type if you want to specify an area to avoid
Get Mask!
Check propagation every 15 frames
Add a point on the corresponding frame number if any mask needs to be refined
If propagation looks OK at every 15 frames, propagate with "render" to render the final masked video!
Hit the Reset button if you want to refresh and start again.
The input video will be processed for 10 seconds only, for demo purposes :)
https://huggingface.co/spaces/fffiloni/SAM2-Video-Predictor SAM2 Video Predictor - a Hugging Face Space by fffiloni
Second day of Composer use
genecyber
1h
First of all, I’m loving composer, I use it in combination with live preview and can see real time rendering of edits also.

Second of all, we really need some sort of internal versioning; being unable to undo is pretty scary. Also, if I add my own versions, Composer gets confused about which files it's editing.

I had a filename change work tonight, that was cool.

Mostly I'm struggling to add files to the Composer context that stick, but it might be that my Composer disconnects? If Composer gets into a weird state, I end up having to restart Cursor completely to recover.

Why is the UI not following standards? When minimized, the + makes a new instance instead of expanding, and the only way to expand the floating window is the keyboard shortcut (keys+I)? At least I can now find the Composer instances I might accidentally lose by clicking plus.

The direction is amazing, and far surpasses any other experience I've tried for collaborative editing of multiple files.

Also love the ability to modularize the code when it gets to be too much for context windows. I'll create a styles.css, add it to the Composer context, and ask it to move the styles into the CSS file, and so on; this is great. Otherwise Cursor suffers, like everyone else, from attention problems when it comes to large files.

These are all over the place I know.
https://forum.cursor.com/t/second-day-of-composer-use/7584
Make Google part of your security team. Join Mandiant and Google Cloud experts online for Google Cloud Security Summit, Thursday, August 22, at 11:30 AM CST, to discover how you can defend your organisation against evolving cyberthreats with intel-driven security operations, a secure-by-design foundation, and AI innovations across the security life cycle.


Register to dive deep into key security topics and new technologies:
https://cloudonair.withgoogle.com/events/summit-apac-security-24?utm_content=invite2&utm_source=cloud_sfdc&utm_medium=email&utm_campaign=FY24-Q3-apac-EXP120-onlineevent-er-dgcsm-security-summit-2024-mc&pref=K&mkt_tok=ODA4LUdKVy0zMTQAAAGU6OHWBoF2Y5Hwwfb2QFOtyxtgmixuo-CGF6NRTRFIkjwshtRL-iLyPcZqVgrMOI8bqjtZOditNJpP6QJl-PDITmFSR8L1dvNKb2vJEg3zPxovd3Vaavw

Start with the opening keynote.

Join Sunil Potti, VP and GM of Google Cloud Security, to explore how AI is enhancing security and helping organisations boost their resilience.


Get to know Gemini for Security.

Check out the latest ways Google AI is transforming cloud security, security operations, and threat intelligence with robust Gemini-powered capabilities.


Gain valuable insights from the 2024 M-Trends report.

Learn about the evolving cyber-threat landscape from Steve D'sa, Regional Leader, Mandiant Consulting. Featuring APAC perspectives and best practices you can apply directly to your security program.

Register yourself and your team today, join five or more sessions and receive a Google Cloud collectible digital badge in recognition of your participation.
Metric Card for CharacTER
CharacTer is a character-level metric inspired by the commonly applied translation edit rate (TER). It is defined as the minimum number of character edits required to adjust a hypothesis until it completely matches the reference, normalized by the length of the hypothesis sentence. CharacTer calculates the character-level edit distance while performing the shift edit at the word level. Unlike the strict matching criterion in TER, a hypothesis word is considered to match a reference word and could be shifted, if the edit distance between them is below a threshold value. The Levenshtein distance between the reference and the shifted hypothesis sequence is computed on the character level. In addition, the lengths of hypothesis sequences instead of reference sequences are used for normalizing the edit distance, which effectively counters the issue that shorter translations normally achieve lower TER. If this is a text-based metric, make sure to wrap your input in double quotes. Alternatively you can use a JSON-formatted list as input.
https://huggingface.co/spaces/evaluate-metric/character CharacTER - a Hugging Face Space by evaluate-metric
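A minimal sketch of computing it with the evaluate library (assuming the metric is loadable under the id "character" and accepts plain lists of strings; exact argument names, extra dependencies, and output keys may differ):

# pip install evaluate  (the CharacTER metric may pull in its own backend package)
import evaluate

character = evaluate.load("character")  # assumed metric id on the Hub

results = character.compute(
    predictions=["the cat sat on the mat"],
    references=["the cat sat on a mat"],
)
print(results)  # lower values mean fewer character edits were needed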
Metric Card for BLEU
BLEU (Bilingual Evaluation Understudy) is an algorithm for evaluating the quality of text which has been machine-translated from one natural language to another. Quality is considered to be the correspondence between a machine’s output and that of a human: “the closer a machine translation is to a professional human translation, the better it is” – this is the central idea behind BLEU. BLEU was one of the first metrics to claim a high correlation with human judgements of quality, and remains one of the most popular automated and inexpensive metrics.

Scores are calculated for individual translated segments (generally sentences) by comparing them with a set of good-quality reference translations. Those scores are then averaged over the whole corpus to reach an estimate of the translation's overall quality. Neither intelligibility nor grammatical correctness is taken into account.

If this is a text-based metric, make sure to wrap your input in double quotes. Alternatively you can use a JSON-formatted list as input. https://huggingface.co/spaces/evaluate-metric/bleu BLEU - a Hugging Face Space by evaluate-metric
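A minimal sketch with the evaluate library (assuming the standard "bleu" metric id; note that references is a list of lists, one list of acceptable references per prediction):

import evaluate

bleu = evaluate.load("bleu")

predictions = ["the cat is on the mat"]
references = [["the cat is on the mat", "there is a cat on the mat"]]

results = bleu.compute(predictions=predictions, references=references)
print(results["bleu"])  # corpus-level BLEU score between 0 and 1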
Metric Card for BERT Score
Metric description
BERTScore is an automatic evaluation metric for text generation that computes a similarity score for each token in the candidate sentence with each token in the reference sentence. It leverages the pre-trained contextual embeddings from BERT models and matches words in candidate and reference sentences by cosine similarity.

Moreover, BERTScore computes precision, recall, and F1 measure, which can be useful for evaluating different language generation tasks.

How to use
BERTScore takes 3 mandatory arguments: predictions (a list of strings of candidate sentences), references (a list of strings or a list of lists of strings of reference sentences), and either lang (a string of two letters indicating the language of the sentences, in ISO 639-1 format) or model_type (a string specifying which model to use, according to the BERT specification). The default behavior of the metric is to use the suggested model for the target language when one is specified, otherwise to use the model_type indicated. https://huggingface.co/spaces/evaluate-metric/bertscore BERT Score - a Hugging Face Space by evaluate-metric
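A minimal sketch of those arguments with the evaluate library (assuming the "bertscore" metric id; the underlying model download can be large):

# pip install evaluate bert_score
import evaluate

bertscore = evaluate.load("bertscore")

predictions = ["hello there", "general kenobi"]
references = ["hello there", "general kenobi"]

# lang="en" lets the metric pick a suggested English BERT-style model;
# pass model_type=... instead to pin a specific checkpoint.
results = bertscore.compute(predictions=predictions, references=references, lang="en")
print(results["precision"], results["recall"], results["f1"])  # one value per sentence pair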
Supabase Realtime: Broadcast and Presence Authorization

Today we're releasing Authorization for Realtime's Broadcast and Presence.

For context, Supabase includes three useful extensions for building real-time applications.

Broadcast: Send ephemeral, low-latency messages between users.
Presence: Show when users are online and share state between users.
Postgres Changes: Listen to Postgres database changes.
This release introduces authorization for Broadcast and Presence using Row Level Security policies:
https://youtu.be/IXRrU9MpA8Q

https://supabase.com/blog/supabase-realtime-broadcast-and-presence-authorization
New phone, new era. The new #Pixel9 is built for and with Gemini. It has…
- Tools using Gemini to spark creativity
- Pixel Camera features for great photos *and* videos
- AI that improves phone calls
- Smart, elevated design
#MadeByGoogle
TTS Arena: Benchmarking Text-to-Speech Models in the Wild
Automated measurement of the quality of text-to-speech (TTS) models is very difficult. Assessing the naturalness and inflection of a voice is a trivial task for humans, but it is much more difficult for AI. This is why today, we're thrilled to announce the TTS Arena. Inspired by LMSys's Chatbot Arena for LLMs, we developed a tool that allows anyone to easily compare TTS models side-by-side. Just submit some text, listen to two different models speak it out, and vote on which model you think is the best. The results will be organized into a leaderboard that displays the community's highest-rated models. https://huggingface.co/blog/arena-tts TTS Arena: Benchmarking Text-to-Speech Models in the Wild
Yesterday, we released Parler-TTS and Data-Speech, a fully open-source reproduction of the work from the paper:
Natural language guidance of high-fidelity text-to-speech with synthetic annotations (2402.01912)


Parler-TTS is a lightweight text-to-speech (TTS) model that can generate high-quality, natural sounding speech in the style of a given speaker (gender, pitch, speaking style, etc).

https://huggingface.co/collections/parler-tts/parler-tts-fully-open-source-high-quality-tts-models-66164ad285ba03e8ffde214c

Parler-TTS Mini v0.1 is the first iteration of the Parler-TTS model, trained on 10k hours of narrated audiobooks. It generates high-quality speech with features that can be controlled using a simple text prompt (e.g. gender, background noise, speaking rate, pitch and reverberation).

To further improve the prosody and naturalness of the speech, we're scaling up the amount of training data to 50k hours of speech. The v1 release of the model will be trained on this data and will also include inference optimisations, such as flash attention and torch compile.

parler-tts/parler_tts_mini_v0.1
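A minimal sketch of generating speech with that checkpoint (assuming the parler_tts package from the Parler-TTS GitHub repo and its ParlerTTSForConditionalGeneration class, following the project's README; argument names may change between releases):

# pip install git+https://github.com/huggingface/parler-tts.git soundfile
import soundfile as sf
from transformers import AutoTokenizer
from parler_tts import ParlerTTSForConditionalGeneration  # assumed class name

model = ParlerTTSForConditionalGeneration.from_pretrained("parler-tts/parler_tts_mini_v0.1")
tokenizer = AutoTokenizer.from_pretrained("parler-tts/parler_tts_mini_v0.1")

prompt = "Hey, how are you doing today?"
# The description controls speaker characteristics (gender, pitch, pace, noise, ...)
description = "A female speaker delivers her words expressively, with clear audio quality."

input_ids = tokenizer(description, return_tensors="pt").input_ids
prompt_input_ids = tokenizer(prompt, return_tensors="pt").input_ids

audio = model.generate(input_ids=input_ids, prompt_input_ids=prompt_input_ids)
sf.write("parler_out.wav", audio.cpu().numpy().squeeze(), model.config.sampling_rate)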


Data-Speech can be used for annotating speech characteristics in a large-scale setting.

parler-tts/open-source-speech-datasets-annotated-using-data-speech-661648ffa0d3d76bfa23d534


This work is both scalable and easily modifiable and will hopefully help the TTS research community explore new ways of conditioning speech synthesis.

All of the datasets, pre-processing, training code and weights are released publicly under a permissive license, enabling the community to build on our work and develop their own powerful TTS models. Parler-TTS: fully open-source high-quality TTS - a parler-tts Collection
I'm excited to announce that Transformers.js V3 is finally available on NPM! 🔥 State-of-the-art Machine Learning for the web, now with WebGPU support! 🤯⚡️

Install it from NPM with:
npm i @huggingface/transformers

or via CDN, for example: https://v2.scrimba.com/s0lmm0qh1q

Segment Anything demo:
webml-community/segment-anything-webgpu
Imagen 3
Published on Aug 14 · Submitted by akhaliq on Aug 14 · #2 Paper of the day
Authors: Imagen-Team-Google, Jason Baldridge, Jakob Bauer, Mukul Bhutani, Nicole Brichtova, Andrew Bunner, Kelvin Chan, Yichang Chen, Sander Dieleman, Yuqing Du, Zach Eaton-Rosen, Hongliang Fei, Nando de Freitas, Yilin Gao, Evgeny Gladchenko, Sergio Gómez Colmenarejo, Mandy Guo, Alex Haig, Will Hawkins, Hexiang Hu, Huilian Huang, Tobenna Peter Igwe, +229 authors
Abstract
We introduce Imagen 3, a latent diffusion model that generates high quality images from text prompts. We describe our quality and responsibility evaluations. Imagen 3 is preferred over other state-of-the-art (SOTA) models at the time of evaluation. In addition, we discuss issues around safety and representation, as well as methods we used to minimize the potential harm of our models.
LongWriter: Unleashing 10,000+ Word Generation from Long Context LLMs
Published on Aug 14 · Submitted by akhaliq on Aug 14 · #1 Paper of the day
Authors: Yushi Bai, Jiajie Zhang, Xin Lv, Linzhi Zheng, Siqi Zhu, Lei Hou, Yuxiao Dong, Jie Tang, Juanzi Li
Abstract
Current long context large language models (LLMs) can process inputs up to 100,000 tokens, yet struggle to generate outputs exceeding even a modest length of 2,000 words. Through controlled experiments, we find that the model's effective generation length is inherently bounded by the samples it has seen during supervised fine-tuning (SFT). In other words, their output limitation is due to the scarcity of long-output examples in existing SFT datasets. To address this, we introduce AgentWrite, an agent-based pipeline that decomposes ultra-long generation tasks into subtasks, enabling off-the-shelf LLMs to generate coherent outputs exceeding 20,000 words. Leveraging AgentWrite, we construct LongWriter-6k, a dataset containing 6,000 SFT examples with output lengths ranging from 2k to 32k words. By incorporating this dataset into model training, we successfully scale the output length of existing models to over 10,000 words while maintaining output quality. We also develop LongBench-Write, a comprehensive benchmark for evaluating ultra-long generation capabilities. Our 9B parameter model, further improved through DPO, achieves state-of-the-art performance on this benchmark, surpassing even much larger proprietary models. In general, our work demonstrates that existing long-context LLMs already possess the potential for a larger output window; all you need is data with extended output during model alignment to unlock this capability. Our code & models are at: https://github.com/THUDM/LongWriter. GitHub - THUDM/LongWriter: LongWriter: Unleashing 10,000+ Word Generation from Long Context LLMs
InfinityMATH: A Scalable Instruction Tuning Dataset in Programmatic Mathematical Reasoning
Published on Aug 9 · Submitted by akhaliq on Aug 15
Authors: Bo-Wen Zhang, Yan Yan, Lin Li, Guang Liu
Abstract
Recent advancements in Chain-of-Thoughts (CoT) and Program-of-Thoughts (PoT) methods have greatly enhanced language models' mathematical reasoning capabilities, facilitating their integration into instruction tuning datasets with LLMs. However, existing methods for large-scale dataset creation require substantial seed data and high computational costs for data synthesis, posing significant challenges for scalability. We introduce InfinityMATH, a scalable instruction tuning dataset for programmatic mathematical reasoning. The construction pipeline emphasizes decoupling numbers from mathematical problems to synthesize number-independent programs, enabling efficient and flexible scaling while minimizing dependency on specific numerical values. Fine-tuning experiments with open-source language and code models, such as Llama2 and CodeLlama, demonstrate the practical benefits of InfinityMATH. These fine-tuned models showed significant relative improvements on both in-domain and out-of-domain benchmarks, ranging from 184.7% to 514.3% on average. Additionally, these models exhibited high robustness on the GSM8K+ and MATH+ benchmarks, which are enhanced versions of the test sets with simple number variations. InfinityMATH ensures that models are more versatile and effective across a broader range of mathematical problems. The data is available at https://huggingface.co/datasets/flagopen/InfinityMATH. flagopen/InfinityMATH · Datasets at Hugging Face
Generative Photomontage
Published on Aug 14 · Submitted by akhaliq on Aug 15
Authors: Sean J. Liu, Nupur Kumari, Ariel Shamir, Jun-Yan Zhu
Abstract
Text-to-image models are powerful tools for image creation. However, the generation process is akin to a dice roll and makes it difficult to achieve a single image that captures everything a user wants. In this paper, we propose a framework for creating the desired image by compositing it from various parts of generated images, in essence forming a Generative Photomontage. Given a stack of images generated by ControlNet using the same input condition and different seeds, we let users select desired parts from the generated results using a brush stroke interface. We introduce a novel technique that takes in the user's brush strokes, segments the generated images using a graph-based optimization in diffusion feature space, and then composites the segmented regions via a new feature-space blending method. Our method faithfully preserves the user-selected regions while compositing them harmoniously. We demonstrate that our flexible framework can be used for many applications, including generating new appearance combinations, fixing incorrect shapes and artifacts, and improving prompt alignment. We show compelling results for each application and demonstrate that our method outperforms existing image blending methods and various baselines.
https://huggingface.co/papers/2408.07116 Paper page - Generative Photomontage
LGM Full
This custom pipeline encapsulates the full LGM pipeline, including multi-view diffusion.

It is provided as a resource for the ML for 3D Course.

Original LGM paper: LGM: Large Multi-View Gaussian Model for High-Resolution 3D Content Creation. https://huggingface.co/Thever/LGM-Thever Thever/LGM-Thever · Hugging Face