HF-hub - Share and discover more about AI with social posts from the community.
Blane187/animalese-py


or you can convert your own voice to Animalese with it:
Blane187/animalese_RVC


i was just bored, so i made the project, lol
Everchanging Quest is out!

It is an LLM-controlled roguelike in which the LLM gets a markdown representation of the map and generates a JSON describing the objective to fulfill on the map, as well as the necessary objects and their placements.
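
For illustration, here's a hypothetical sketch of what such a generated payload could look like (the field names are invented for this example, not the game's actual schema):

quest = {
    "objective": "Retrieve the lost amulet from the crypt",
    "objects": [
        {"name": "amulet", "position": [12, 4]},
        {"name": "locked_door", "position": [10, 4]},
        {"name": "rusty_key", "position": [3, 9]},
    ],
}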

Come test it on the space:
Jofthomas/Everchanging-Quest
Some personal and professional news

I'm writing a book on ML metrics.

Together with Wojtek Kuberski, we’re creating the missing piece of every ML university program and online course: a book solely dedicated to Machine Learning metrics!

The book will cover the following types of metrics:
• Regression
• Classification
• Clustering
• Ranking
• Vision
• Text
• GenAI
• Bias and Fairness

👉 check out the book: https://www.nannyml.com/metrics
Zero-math intro to AI history: from the 1950s to today's LLMs 📖

I wanted to structure my thinking about LLMs by going through their history since the 1950s. This history is captivating, with the opposition between Connectionists (Rosenblatt, LeCun) and Symbolists, the first victories of "deep" neural networks, the revolution of Attention...

So I might have gone a bit too far! 😅

📝 I've made a long post summarizing the main stages of building LLMs: neural networks, optimization, backpropagation, attention layers...

And I've made sure to keep it 100% horrible-latex-math-free: the technical stuff is conveyed in graphs only, so it should be accessible to really anyone, even your grandfather (I'm sending it to mine right now).

Read it here in english 👉 https://aymeric-roucher.github.io/brief-history-of-ai/
For the post in French 👉 https://aymeric-roucher.github.io/breve-histoire-de-l-ia/
Fal/AuraFlow-v0.3
is now here with support for different aspect ratios (width/height up to 1536px!) and much nicer aesthetics! Make sure to install the latest diffusers to get support for it.
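
A minimal sketch of running it with diffusers; this assumes the AuraFlowPipeline class and the fal/AuraFlow-v0.3 checkpoint id, so double-check the model card for the exact usage:

import torch
from diffusers import AuraFlowPipeline

# load the pipeline in half precision and move it to the GPU
pipe = AuraFlowPipeline.from_pretrained(
    "fal/AuraFlow-v0.3", torch_dtype=torch.float16
).to("cuda")

# non-square resolutions up to 1536px are now supported
image = pipe(
    prompt="a red panda lounging in a bamboo forest",
    width=1536,
    height=768,
).images[0]
image.save("auraflow.png")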
As some of you know, I try to convert models to either FP32 or BF16, depending on their size, before doing imatrix and quantization

Today I decided to see if that matters, and the results have me... for lack of a better word, perplexed

My setup:

Mistral Nemo Instruct 2407
- convert to FP32, calculate imatrix, quantize to Q8_0 and Q4_K_M
- convert to FP16, calculate imatrix, quantize to Q8_0 and Q4_K_M

I calculated the kld base from the FP32 model:
./llama-perplexity -m /models/Mistral-Nemo-Instruct-2407-f32.gguf -f /training_data/wikitext-2-raw/wiki.test.raw --kl-divergence-base /training_data/mistral-nemo-f32.kld -ngl 35 -fa -sm row

then calculated the divergence itself for each like so:
./llama-perplexity -m /models/Mistral-Nemo-Instruct-2407-Q8_0.gguf -f /training_data/wikitext-2-raw/wiki.test.raw --kl-divergence-base /training_data/mistral-nemo-f32.kld --kl-divergence -ngl 50 -fa -sm row
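
For intuition, here's a minimal sketch of what the reported KL divergence measures; this is a toy illustration, not llama.cpp's implementation:

import numpy as np

def kl_divergence(p, q, eps=1e-12):
    # KL(P || Q): information lost when the quantized model's next-token
    # distribution Q is used to approximate the full-precision model's P
    p = np.asarray(p, dtype=np.float64) + eps
    q = np.asarray(q, dtype=np.float64) + eps
    p /= p.sum()
    q /= q.sum()
    return float(np.sum(p * np.log(p / q)))

# toy next-token distributions over a 4-token vocabulary
p_fp32 = [0.70, 0.20, 0.08, 0.02]   # full-precision reference
q_quant = [0.69, 0.21, 0.08, 0.02]  # quantized model
print(kl_divergence(p_fp32, q_quant))  # near zero => outputs barely changed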

Q4_K_M from FP16 and FP32 were similar, trading blows across statistics. Odd, since I expected FP32 to be strictly better, but it's not.

Q8_0 is where things get weird. Despite each file being a slightly different size (and their sha256sums of course differing), they get *completely identical* scores, down to six decimal places of precision on the statistics.

How is this possible? Is there something I don't understand about llama.cpp that makes it always convert to fp16 before it does quantization? Am I wasting time using FP32/BF16??
https://huggingface.co/posts/bartowski/608656345183499 @bartowski on Hugging Face:
Improved ControlNet!
Now supports dynamic resolution for perfect landscape and portrait outputs. Generate stunning images without distortion—optimized for any aspect ratio!
...
https://huggingface.co/spaces/DamarJati/FLUX.1-DEV-Canny FLUX.1-DEV Canny - a Hugging Face Space by DamarJati
SAM2 Video Predictor
This is a simple demo for video segmentation with SAM2.

Instructions: (read the instructions)

1. Upload your video [MP4, 24fps]
2. With the 'include' point type selected, click on the object to mask on the first frame
3. Switch to the 'exclude' point type if you want to specify an area to avoid
4. Get Mask!
5. Check the propagation every 15 frames
6. Add a point on the corresponding frame number if any mask needs to be refined
7. If the propagation looks OK at every 15 frames, propagate with "render" to render the final masked video!
8. Hit the Reset button if you want to refresh and start again.

Only the first 10 seconds of the input video will be processed, for demo purposes :)
https://huggingface.co/spaces/fffiloni/SAM2-Video-Predictor SAM2 Video Predictor - a Hugging Face Space by fffiloni
Second day of Composer use
genecyber
First of all, I'm loving Composer. I use it in combination with live preview and can see real-time rendering of edits as well.

Second of all, we really need some sort of internal versioning; being unable to undo is pretty scary. Also, if I add my own versions, Composer gets confused about which files it's editing.

I had a filename change work tonight, that was cool.

Mostly I'm struggling to add files to Composer context in a way that sticks, but it might be that my Composer disconnects? If Composer gets into a weird state, I end up having to restart Cursor completely to recover.

Why is the UI not following standards? When minimized, the + makes a new instance instead of expanding, and the only way to expand the floating window is keys+i? At least I can now find the Composer instances I might accidentally lose by clicking plus.

The direction is amazing, and far surpasses any other experiences I’ve tried with collaborative editing multiple files.

Also love the ability to modularize the code when it gets just too much for context windows. I'll create a styles.css, add it to Composer context, and ask it to move the styles to the CSS file, and so on. This is great; otherwise Cursor suffers, like everyone else, from attention issues when it comes to large files.

These are all over the place, I know.
https://forum.cursor.com/t/second-day-of-composer-use/7584
Make Google part of your security team. Join Mandiant and Google Cloud experts online for Google Cloud Security Summit, Thursday, August 22, at 11:30 AM CST, to discover how you can defend your organisation against evolving cyberthreats with intel-driven security operations, a secure-by-design foundation, and AI innovations across the security life cycle.


Register to dive deep into key security topics and new technologies:
https://cloudonair.withgoogle.com/events/summit-apac-security-24

Start with the opening keynote.

Join Sunil Potti, VP and GM of Google Cloud Security, to explore how AI is enhancing security and helping organisations boost their resilience.


Get to know Gemini for Security.

Check out the latest ways Google AI is transforming cloud security, security operations, and threat intelligence with robust Gemini-powered capabilities.


Gain valuable insights from the 2024 M-Trends report.

Learn about the evolving cyber-threat landscape from Steve D'sa, Regional Leader, Mandiant Consulting, featuring APAC perspectives and best practices you can apply directly to your security program.

Register yourself and your team today; join five or more sessions and receive a Google Cloud collectible digital badge in recognition of your participation.
Metric Card for CharacTER
CharacTer is a character-level metric inspired by the commonly applied translation edit rate (TER). It is defined as the minimum number of character edits required to adjust a hypothesis until it completely matches the reference, normalized by the length of the hypothesis sentence.

CharacTer calculates the character-level edit distance while performing the shift edit on the word level. Unlike the strict matching criterion in TER, a hypothesis word is considered to match a reference word, and could be shifted, if the edit distance between them is below a threshold value. The Levenshtein distance between the reference and the shifted hypothesis sequence is then computed on the character level. In addition, the lengths of hypothesis sequences instead of reference sequences are used for normalizing the edit distance, which effectively counters the issue that shorter translations normally achieve lower TER.
https://huggingface.co/spaces/evaluate-metric/character CharacTER - a Hugging Face Space by evaluate-metric
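
To try CharacTER locally with the evaluate library, a minimal sketch (assuming the metric id "character"; the exact keys of the returned dict may differ):

import evaluate

character = evaluate.load("character")
results = character.compute(
    predictions=["the cat sat on the mat"],
    references=["the cat sat on the hat"],
)
print(results)  # CharacTER score: lower is better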
Metric Card for BLEU
BLEU (Bilingual Evaluation Understudy) is an algorithm for evaluating the quality of text which has been machine-translated from one natural language to another. Quality is considered to be the correspondence between a machine’s output and that of a human: “the closer a machine translation is to a professional human translation, the better it is” – this is the central idea behind BLEU. BLEU was one of the first metrics to claim a high correlation with human judgements of quality, and remains one of the most popular automated and inexpensive metrics.

Scores are calculated for individual translated segments (generally sentences) by comparing them with a set of good-quality reference translations. Those scores are then averaged over the whole corpus to reach an estimate of the translation's overall quality. Neither intelligibility nor grammatical correctness is taken into account.

https://huggingface.co/spaces/evaluate-metric/bleu BLEU - a Hugging Face Space by evaluate-metric
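
A minimal sketch of computing BLEU with the evaluate library; note that each prediction is scored against a list of references:

import evaluate

bleu = evaluate.load("bleu")
results = bleu.compute(
    predictions=["the cat sat on the mat"],
    references=[["the cat sat on the mat"]],  # one list of references per prediction
)
print(results["bleu"])  # 1.0 for an exact match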
Metric Card for BERT Score
Metric description
BERTScore is an automatic evaluation metric for text generation that computes a similarity score for each token in the candidate sentence with each token in the reference sentence. It leverages the pre-trained contextual embeddings from BERT models and matches words in candidate and reference sentences by cosine similarity.

Moreover, BERTScore computes precision, recall, and F1 measure, which can be useful for evaluating different language generation tasks.

How to use
BERTScore takes 3 mandatory arguments: predictions (a list of strings of candidate sentences), references (a list of strings or list of lists of strings of reference sentences), and either lang (a string of two letters indicating the language of the sentences, in ISO 639-1 format) or model_type (a string specifying which model to use, according to the BERT specification). The default behavior of the metric is to use the suggested model for the target language when one is specified, otherwise to use the model_type indicated.
https://huggingface.co/spaces/evaluate-metric/bertscore BERT Score - a Hugging Face Space by evaluate-metric
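
A minimal usage sketch with the evaluate library, using lang to let the metric pick the suggested model:

import evaluate

bertscore = evaluate.load("bertscore")
results = bertscore.compute(
    predictions=["hello there, general kenobi"],
    references=["hello there, general kenobi"],
    lang="en",  # or pass model_type="..." instead
)
print(results["precision"], results["recall"], results["f1"])  # one score per sentence pair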
Supabase Realtime: Broadcast and Presence Authorization

Today we're releasing Authorization for Realtime's Broadcast and Presence.

For context, Supabase includes three useful extensions for building real-time applications.

Broadcast: Send ephemeral, low-latency messages between users.
Presence: Show when users are online and share state between users.
Postgres Changes: Listen to Postgres database changes.
This release introduces authorization for Broadcast and Presence using Row Level Security policies:
https://youtu.be/IXRrU9MpA8Q

https://supabase.com/blog/supabase-realtime-broadcast-and-presence-authorization
New phone, new era. The new #Pixel9 is built for and with Gemini. It has…
- Tools using Gemini to spark creativity
- Pixel Camera features for great photos *and* videos
- AI that improves phone calls
- Smart, elevated design
#MadeByGoogle
TTS Arena: Benchmarking Text-to-Speech Models in the Wild
Automated measurement of the quality of text-to-speech (TTS) models is very difficult. Assessing the naturalness and inflection of a voice is a trivial task for humans, but it is much more difficult for AI. This is why today, we're thrilled to announce the TTS Arena. Inspired by LMSys's Chatbot Arena for LLMs, we developed a tool that allows anyone to easily compare TTS models side-by-side. Just submit some text, listen to two different models speak it out, and vote on which model you think is the best. The results will be organized into a leaderboard that displays the community's highest-rated models.
https://huggingface.co/blog/arena-tts TTS Arena: Benchmarking Text-to-Speech Models in the Wild
Yesterday, we released Parler-TTS and Data-Speech, a fully open-source reproduction of the work from the paper:
Natural language guidance of high-fidelity text-to-speech with synthetic annotations (2402.01912)


Parler-TTS is a lightweight text-to-speech (TTS) model that can generate high-quality, natural-sounding speech in the style of a given speaker (gender, pitch, speaking style, etc.).

https://huggingface.co/collections/parler-tts/parler-tts-fully-open-source-high-quality-tts-models-66164ad285ba03e8ffde214c

Parler-TTS Mini v0.1 is the first iteration of the Parler-TTS model, trained on 10k hours of narrated audiobooks. It generates high-quality speech with features that can be controlled using a simple text prompt (e.g. gender, background noise, speaking rate, pitch and reverberation).

To further improve the prosody and naturalness of the speech, we're scaling up the amount of training data to 50k hours of speech. The v1 release of the model will be trained on this data and will also ship with inference optimisations, such as flash attention and torch compile.

parler-tts/parler_tts_mini_v0.1
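
A minimal generation sketch, adapted from the model card (double-check it for the current API):

import torch
import soundfile as sf
from parler_tts import ParlerTTSForConditionalGeneration
from transformers import AutoTokenizer

model = ParlerTTSForConditionalGeneration.from_pretrained("parler-tts/parler_tts_mini_v0.1")
tokenizer = AutoTokenizer.from_pretrained("parler-tts/parler_tts_mini_v0.1")

prompt = "Hey, how are you doing today?"
# the description prompt controls speaker characteristics (gender, pitch, etc.)
description = "A female speaker with a slightly low-pitched voice delivers her words expressively, with clear audio quality."

input_ids = tokenizer(description, return_tensors="pt").input_ids
prompt_input_ids = tokenizer(prompt, return_tensors="pt").input_ids

# generate the waveform and write it to disk
generation = model.generate(input_ids=input_ids, prompt_input_ids=prompt_input_ids)
audio = generation.cpu().numpy().squeeze()
sf.write("parler_tts_out.wav", audio, model.config.sampling_rate)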


Data-Speech can be used for annotating speech characteristics in a large-scale setting.

parler-tts/open-source-speech-datasets-annotated-using-data-speech-661648ffa0d3d76bfa23d534


This work is both scalable and easily modifiable and will hopefully help the TTS research community explore new ways of conditioning speech synthesis.

All of the datasets, pre-processing, training code and weights are released publicly under a permissive license, enabling the community to build on our work and develop their own powerful TTS models.