HF Hub - Share and discover more about AI with social posts from the community.
Mozilla/Whisperfile: Local OpenAI Whisper Alternative is Here?
Wanna try out FLUX.1, the next-generation AI image generator? 🚀🚀🚀
Look no further: Anakin AI offers a whole universe of AI tools, including FLUX.1, DALL·E 3, Stable Diffusion 3, and hundreds more. So don't waste any more time jumping from website to website. 🔥🔥🔥
Experience FLUX, DALL·E, and Stable Diffusion 3 now at Anakin AI 👇👇👇
Anakin.ai — One-Stop AI App Platform
Generate Content, Images, Videos, and Voice; Craft Automated Workflows, Custom AI Apps, and Intelligent Agents. Your…
app.anakin.ai
OpenAI GLIDE: Guided Language to Image Diffusion for Generation and Editing
Diffusion Models
Diffusion models work by gradually transforming a noisy image into a clear, detailed one. The process starts with a random noise image, and the model iteratively reduces the noise, guided by the input data, until it produces a realistic image.
Example: Think of it as sculpting a statue from a block of marble. The initial noise represents the unformed block, and each step of the diffusion process chisels away the noise, revealing the final image.
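To make the loop concrete, here is a minimal, purely illustrative sketch of that iterative denoising process in Python; `model` and `noise_schedule` are placeholders rather than any specific library's API.

```python
import torch

def denoise(model, noise_schedule, steps=50, shape=(1, 3, 64, 64)):
    # Start from pure random noise -- the "unformed block of marble".
    x = torch.randn(shape)
    # Walk from the noisiest step back to the cleanest one.
    for t in reversed(range(steps)):
        predicted_noise = model(x, t)                            # estimate the noise present at step t
        x = noise_schedule.remove_noise(x, predicted_noise, t)   # chisel a little noise away
    return x  # the final, denoised image
```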
Doc to Dialogue in Hugging Face
Project

Transform any PDF document (research report, market analysis, manual, or user guide) into an audio interview with two AI-generated voices to make complex content more engaging. I used the Gemini API for document processing, OpenAI Whisper TTS for voice generation, and Gradio for the interface, and uploaded the project to Hugging Face.
Any feedback will be welcome!
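This is not the author's code, but a minimal sketch of how such a pipeline could be wired together; the model names ("gemini-1.5-flash", "tts-1"), the single voice, and the prompt are all assumptions for illustration.

```python
import google.generativeai as genai
import gradio as gr
from openai import OpenAI

genai.configure(api_key="YOUR_GEMINI_KEY")
openai_client = OpenAI()  # reads OPENAI_API_KEY from the environment

def pdf_to_dialogue(pdf_path):
    # 1. Ask Gemini to rewrite the document as a two-host interview script.
    doc = genai.upload_file(pdf_path)
    script = genai.GenerativeModel("gemini-1.5-flash").generate_content(
        ["Rewrite this document as a short interview between two hosts.", doc]
    ).text
    # 2. Synthesize the script with OpenAI's TTS endpoint
    #    (one voice here for brevity; long scripts may need to be chunked).
    speech = openai_client.audio.speech.create(model="tts-1", voice="alloy", input=script)
    out_path = "interview.mp3"
    speech.write_to_file(out_path)
    return out_path

demo = gr.Interface(fn=pdf_to_dialogue, inputs=gr.File(label="PDF", type="filepath"), outputs=gr.Audio())
demo.launch()
```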
State-of-the-art Machine Learning for JAX, PyTorch and TensorFlow



🤗 Transformers provides thousands of pretrained models to perform tasks on different modalities such as text, vision, and audio.
These models can be applied on:
📝 Text, for tasks like text classification, information extraction, question answering, summarization, translation, and text generation, in over 100 languages.
🖼️ Images, for tasks like image classification, object detection, and segmentation.
🗣️ Audio, for tasks like speech recognition and audio classification.
Transformer models can also perform tasks on several modalities combined, such as table question answering, optical character recognition, information extraction from scanned documents, video classification, and visual question answering.
🤗 Transformers provides APIs to quickly download and use those pretrained models on a given text, fine-tune them on your own datasets, and then share them with the community on our model hub. At the same time, each Python module defining an architecture is fully standalone and can be modified to enable quick research experiments.
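For example, the pipeline API covers the download-and-use step described above in a couple of lines:

```python
from transformers import pipeline

# Downloads a default pretrained text-classification model from the Hub and runs it.
classifier = pipeline("text-classification")
print(classifier("Hugging Face makes sharing pretrained models easy."))
# e.g. [{'label': 'POSITIVE', 'score': 0.99...}]
```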
Everything you need to know about using the tools, libraries, and models at Hugging Face—from transformers, to RAG, LangChain, and Gradio.

Hugging Face is the ultimate resource for machine learning engineers and AI developers. It provides hundreds of pretrained and open-source models for dozens of different domains—from natural language processing to computer vision. Plus, you’ll find a popular platform for hosting your models and datasets. Hugging Face in Action reveals how to get the absolute best out of everything Hugging Face, from accessing state-of-the-art models to building intuitive frontends for AI apps.
microsoft/Phi-3.5-MoE-instruct
Model Summary
Phi-3.5-MoE is a lightweight, state-of-the-art open model built upon the datasets used for Phi-3 (synthetic data and filtered publicly available documents), with a focus on very high-quality, reasoning-dense data. The model supports multilingual input and comes with a 128K-token context length. The model underwent a rigorous enhancement process, incorporating supervised fine-tuning, proximal policy optimization, and direct preference optimization to ensure precise instruction adherence and robust safety measures.
🏡 Phi-3 Portal 📰 Phi-3 Microsoft Blog 📖 Phi-3 Technical Report 👩‍🍳 Phi-3 Cookbook 🖥️ Try It
microsoft/Phi-3.5-vision-instruct
Model Summary
Phi-3.5-vision is a lightweight, state-of-the-art open multimodal model built upon datasets that include synthetic data and filtered publicly available websites, with a focus on very high-quality, reasoning-dense data for both text and vision. The model belongs to the Phi-3 model family, and this multimodal version supports a 128K-token context length. The model underwent a rigorous enhancement process, incorporating both supervised fine-tuning and direct preference optimization to ensure precise instruction adherence and robust safety measures.
🏡 Phi-3 Portal 📰 Phi-3 Microsoft Blog 📖 Phi-3 Technical Report 👩‍🍳 Phi-3 Cookbook 🖥️ Try It
microsoft/Phi-3.5-mini-instruct
Model Summary
Phi-3.5-mini is a lightweight, state-of-the-art open model built upon the datasets used for Phi-3 (synthetic data and filtered publicly available websites), with a focus on very high-quality, reasoning-dense data. The model belongs to the Phi-3 model family and supports a 128K-token context length. The model underwent a rigorous enhancement process, incorporating supervised fine-tuning, proximal policy optimization, and direct preference optimization to ensure precise instruction adherence and robust safety measures.
🏡 Phi-3 Portal 📰 Phi-3 Microsoft Blog 📖 Phi-3 Technical Report 👩‍🍳 Phi-3 Cookbook 🖥️ Try It
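As a hedged sketch (following the usual Hub loading pattern rather than the model card's exact snippet), Phi-3.5-mini-instruct can be loaded and queried with transformers roughly like this:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline

model_id = "microsoft/Phi-3.5-mini-instruct"
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto", trust_remote_code=True
)
tokenizer = AutoTokenizer.from_pretrained(model_id)

pipe = pipeline("text-generation", model=model, tokenizer=tokenizer)
messages = [{"role": "user", "content": "Explain what a 128K context window lets a model do."}]
print(pipe(messages, max_new_tokens=128)[0]["generated_text"])
```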
lllyasviel/flux1-dev-bnb-nf4 on Hugging Face
Main page: https://github.com/lllyasviel/stable-diffusion-webui-forge/discussions/981

Update:

Always use V2 by default.

V2 is quantized in a better way: the second stage of double quantization is turned off.

V2 is 0.5 GB larger than the previous version, since the chunk-64 norms are now stored in full-precision float32, making it much more precise. Also, because V2 does not have the second compression stage, on-the-fly decompression has less computational overhead, making inference a bit faster.

The only drawback of V2 is being 0.5 GB larger.

Main model in bnb-nf4 (V1 with chunk-64 norms in nf4, V2 with chunk-64 norms in float32)

T5xxl in fp8e4m3fn

CLIP-L in fp16

VAE in bf16
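To make the quantization terms concrete, here is an illustration (not how Forge loads this checkpoint) of the nf4 and double-quant knobs, expressed with bitsandbytes via transformers' BitsAndBytesConfig; V2's behavior corresponds to leaving the second quantization stage off:

```python
import torch
from transformers import BitsAndBytesConfig

# V2-style setup: 4-bit NormalFloat weights, no second (double) quantization stage,
# so the per-chunk scaling constants stay in full precision.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=False,
    bnb_4bit_compute_dtype=torch.bfloat16,
)
```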
TurboEdit: Instant text-based image editing

We address the challenges of precise image inversion and disentangled image editing in the context of few-step diffusion models. We introduce an encoder-based iterative inversion technique. The inversion network is conditioned on the input image and the reconstructed image from the previous step, allowing for correction of the next reconstruction towards the input image. We demonstrate that disentangled controls can be easily achieved in the few-step diffusion model by conditioning on an (automatically generated) detailed text prompt. To manipulate the inverted image, we freeze the noise maps and modify one attribute in the text prompt (either manually or via instruction-based editing driven by an LLM), resulting in the generation of a new image similar to the input image with only one attribute changed. The method can further control the editing strength and accepts instructive text prompts. Our approach facilitates realistic text-guided image edits in real time, requiring only 8 function evaluations (NFEs) for inversion (a one-time cost) and 4 NFEs per edit. Our method is not only fast but also significantly outperforms state-of-the-art multi-step diffusion editing techniques.
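A hedged pseudocode view of that procedure; `invert`, `edit_prompt`, and `generate` are passed-in placeholders standing in for the paper's components, not the authors' released API.

```python
def turbo_edit(image, detailed_prompt, attribute_change, invert, edit_prompt, generate):
    # Encoder-based iterative inversion: a one-time cost (~8 NFEs) that yields
    # noise maps reproducing the input image under the detailed prompt.
    noise_maps = invert(image, detailed_prompt)

    # Freeze the noise maps and change a single attribute in the prompt
    # (manually or via an LLM-driven instruction edit).
    edited_prompt = edit_prompt(detailed_prompt, attribute_change)

    # Regenerate with the few-step model (~4 NFEs per edit): the result matches
    # the input image except for the edited attribute.
    return generate(noise_maps, edited_prompt)
```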

Related Links
Few-step diffusion model: SDXL-Turbo.

StyleGAN-based iterative image-inversion method: ReStyle.

Concurrent few-step diffusion image-editing works: ReNoise and another method also called TurboEdit.


Project Page: https://betterze.github.io/TurboEdit/
alvdansen/plushy-world-flux
Model description
This style features charming, whimsical characters with exaggerated proportions and soft, rounded forms. The 3D renders have a high level of detail, with textures that appear tactile and inviting, from fluffy fur to smooth, glossy surfaces. The overall aesthetic combines cute, cartoon-like designs with realistic lighting and materials, creating a magical and endearing world that feels both fantastical and tangible.

Sponsored by Glif!
https://huggingface.co/alvdansen/plushy-world-flux
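A hedged loading sketch, assuming the repo ships FLUX LoRA weights and that you have access to the base FLUX.1-dev model; check the model card for the recommended trigger words and settings.

```python
import torch
from diffusers import FluxPipeline

pipe = FluxPipeline.from_pretrained("black-forest-labs/FLUX.1-dev", torch_dtype=torch.bfloat16)
pipe.load_lora_weights("alvdansen/plushy-world-flux")  # apply the style LoRA
pipe.to("cuda")

image = pipe("a round plush fox in a cozy forest clearing, 3d render",
             num_inference_steps=28).images[0]
image.save("plushy_fox.png")
```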
fofr/flux-80s-cyberpunk
https://replicate.com/fofr/flux-80s-cyberpunk Run time and cost
This model costs approximately $0.059 to run on Replicate, or 16 runs per $1, but this varies depending on your inputs. It is also open source and you can run it on your own computer with Docker.

To see how much you've spent, go to your dashboard.

This model runs on Nvidia H100 GPU hardware. Predictions typically complete within 39 seconds. The predict time for this model varies significantly based on the inputs.
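A minimal sketch of running it via Replicate's Python client (the prompt is illustrative, and you may need to pin a specific version hash; see the model page):

```python
import replicate  # requires REPLICATE_API_TOKEN in the environment

output = replicate.run(
    "fofr/flux-80s-cyberpunk",
    input={"prompt": "a rain-soaked neon city street, 80s cyberpunk style"},
)
print(output)  # URL(s) of the generated image(s)
```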
Subtle Highlights, Wild Fractals, and Tricky Tic-Tac-Toe
This week's CodePen community highlights include a CSS trick for creating marker-like text highlights from Cassidy Williams, a collection of truly Wild Fractals from week two of the August #CodePenChallenge, and a game of tic-tac-toe with a twist from Stefan Matei.

Plus, Amit Sheen takes us to the park with a robot pal, and Vincent Van Gogh drops in on a Pen from Josetxu.
HuggingFaceM4/Idefics3-8B-Llama3
Transformers version: until the next Transformers PyPI release, please install Transformers from source and use this PR to be able to use Idefics3. TODO: change when the new version is released.

Idefics3
Idefics3 is an open multimodal model that accepts arbitrary sequences of image and text inputs and produces text outputs. The model can answer questions about images, describe visual content, create stories grounded on multiple images, or simply behave as a pure language model without visual inputs. It improves upon Idefics1 and Idefics2, significantly enhancing capabilities around OCR, document understanding and visual reasoning.

We release the checkpoints under the Apache 2.0 license.

Model Summary
Developed by: Hugging Face
Model type: Multi-modal model (image+text)
Language(s) (NLP): en
License: Apache 2.0
Parent Models: google/siglip-so400m-patch14-384 and meta-llama/Meta-Llama-3.1-8B-Instruct
Resources for more information:
Idefics1 paper: OBELICS: An Open Web-Scale Filtered Dataset of Interleaved Image-Text Documents
Idefics2 paper: What matters when building vision-language models?
Idefics3 paper: Coming soon (TODO)
https://huggingface.co/HuggingFaceM4/Idefics3-8B-Llama3
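A hedged usage sketch with transformers installed from source as noted above; the chat-template structure follows the usual Idefics pattern, and the image path is a placeholder.

```python
import torch
from transformers import AutoProcessor, AutoModelForVision2Seq
from transformers.image_utils import load_image

model_id = "HuggingFaceM4/Idefics3-8B-Llama3"
processor = AutoProcessor.from_pretrained(model_id)
model = AutoModelForVision2Seq.from_pretrained(model_id, torch_dtype=torch.bfloat16, device_map="auto")

image = load_image("your_image.png")  # placeholder: any local path or URL
messages = [{"role": "user",
             "content": [{"type": "image"}, {"type": "text", "text": "Describe this image."}]}]
prompt = processor.apply_chat_template(messages, add_generation_prompt=True)
inputs = processor(text=prompt, images=[image], return_tensors="pt").to(model.device)

generated_ids = model.generate(**inputs, max_new_tokens=128)
print(processor.batch_decode(generated_ids, skip_special_tokens=True)[0])
```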
AI Tools of the week
🎥 Guidde - Magically create video documentation that explains the most complex things through simple step-by-step guides.

💡 MuckBrass - Uncover market-validated startup ideas with AI-powered analysis of search trends and competition.

🤖 Splutter AI - Launch custom AI chatbots for websites, supercharging support, marketing, and sales with 24/7 automation.

📞 LangCall - Skip long waits with AI agents that navigate phone menus, handle conversations, and connect you only when needed.

🔍 MiniPerplx - Streamline web search with advanced functions like weather updates, event tracking, and literary analysis.

Sourcer AI - Combat online misinformation with real-time AI-powered fact-checking for instant credibility assessment.

📊 PPT GPTSci - Convert images into fully editable PowerPoint slides through a user-friendly interface with high-quality output.

📖 Narrative Nooks - Unlock personalized learning with interactive stories, 24/7 tutoring support, and engaging lessons.

💬 AI Chat Bot - Skyrocket sales with multilingual AI chatbots, offering easy setup, customizable features, and seamless integration.

🎶 Songifier - Identify songs instantly using lyric matching, bridging the gap between fragmentary lyric recall and full song access.

📊 Shortimize - Analyze and optimize shorts with cross-platform tracking, AI-powered viral video search, and in-depth analytics.

📚 GetQuiz - Turn reading into lasting knowledge with an AI companion that generates quizzes directly in Telegram.

🎵 AudioStack - Revolutionize audio production with AI-powered tools that create professional-quality content 10,000x faster.

🐦 BioIt - Craft your perfect Twitter bio by answering a few simple questions, generating unique, catchy intros.

🎭 Immersim AI - Create and explore infinite interactive universes with seamless storytelling and dynamic character interactions.
ResShift 1-Click Windows, RunPod, Massed Compute, Kaggle Installers with Amazing Gradio App and Batch Image Processing. ResShift is an Efficient Diffusion Model for Image Super-resolution by Residual Shifting (NeurIPS 2023, Spotlight).


Official Repo : https://github.com/zsyOAOA/ResShift

I have developed a very advanced Gradio app.

Developed APP Scripts and Installers : https://www.patreon.com/posts/110331752

Features

It supports the following tasks:

Real-world image super-resolution

Bicubic (resize by Matlab) image super-resolution

Blind Face Restoration

Automatically saves all generated images with the same name + numbering if necessary

Randomize seed feature for each generation

Batch image processing: give input and output folder paths and it batch-processes all images and saves them (see the sketch after this list)

1-click install on Windows, RunPod, Massed Compute, and Kaggle (free account)
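A hypothetical sketch of the batch-processing behavior listed above (not the app's actual code): walk the input folder, run the model on each image, and save under the same name, adding numbering only when needed.

```python
from pathlib import Path

def batch_process(input_dir, output_dir, upscale):
    """`upscale` stands in for a call to the ResShift model on a single image."""
    out = Path(output_dir)
    out.mkdir(parents=True, exist_ok=True)
    for src in sorted(Path(input_dir).glob("*.*")):
        result = upscale(src)          # returns something with a .save(path) method, e.g. a PIL image
        dst = out / src.name
        counter = 1
        while dst.exists():            # same name + numbering instead of overwriting
            dst = out / f"{src.stem}_{counter}{src.suffix}"
            counter += 1
        result.save(dst)
```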

Windows Requirements

Python 3.10, FFmpeg, Cuda 11.8, C++ tools and Git

If it doesn't work, make sure to follow the tutorial below and install everything exactly as shown:

https://youtu.be/-NjNy7afOQ0

How to Install on Windows

Make sure that you have the above requirements

Extract files into a folder like c:/reshift_v1

Double click Windows_Install.bat and it will automatically install everything for you with an isolated virtual environment folder (VENV)

After that double click Windows_Start_app.bat and start the app

The first time you use a task, it will download the necessary models (all under 500 MB) into the appropriate folders

If the download fails, the file gets corrupted; sadly, it doesn't verify this, so delete the files inside the weights folder and restart

How to Install on RunPod, Massed Compute, Kaggle

Follow the Massed_Compute_Instructions_READ.txt and Runpod_Instructions_READ.txt

For Kaggle follow the notebook written steps

An example video of how to use my RunPod and Massed Compute scripts and Kaggle notebook can be seen here:

https://youtu.be/wG7oPp01COg
Introducing Hugging Face Similar: a Chrome extension to find relevant datasets!

Adds a "Similar Datasets" section to Hugging Face dataset pages
🔍 Recommendations based on dataset READMEs
🏗 Powered by https://huggingface.co/chromadb and https://huggingface.co/Snowflake embeddings.

You can try it here: https://chromewebstore.google.com/detail/hugging-face-similar/aijelnjllajooinkcpkpbhckbghghpnl?authuser=0&hl=en.

I would be very happy to get feedback on whether this could be useful or not 🤗
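The recommendation idea can be sketched roughly like this (placeholder READMEs and the default embedding function, not the extension's actual Snowflake-embedding setup):

```python
import chromadb

client = chromadb.Client()
collection = client.create_collection("dataset-readmes")

# Index a few (placeholder) dataset READMEs.
collection.add(
    ids=["squad", "imdb", "xsum"],
    documents=[
        "A reading-comprehension question answering dataset...",
        "A movie-review sentiment classification dataset...",
        "An abstractive news summarization dataset...",
    ],
)

# Find the READMEs most similar to a query (or to another dataset's README).
results = collection.query(query_texts=["question answering benchmark"], n_results=2)
print(results["ids"])
```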
Serving Meta Llama 3.1 405B on Google Cloud is now possible via the Hugging Face Deep Learning Containers (DLCs) for Text Generation Inference (TGI)

In this post, we showcase how to deploy
meta-llama/Meta-Llama-3.1-405B-Instruct-FP8
on an A3 instance with 8 x H100 GPUs on Vertex AI

Thanks to the Hugging Face DLCs for TGI and Google Cloud Vertex AI, deploying a high-performance text generation container for serving Large Language Models (LLMs) has never been easier. And we’re not going to stop here – stay tuned as we enable more experiences to build AI with open models on Google Cloud!

Read the full post, "Deploy Meta Llama 3.1 405B on Google Cloud Vertex AI," at https://huggingface.co/blog/llama31-on-vertex-ai
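In rough outline, the deployment with the Vertex AI SDK looks like this; the container URI, machine type, and environment variables below are illustrative assumptions, and the blog post has the exact, tested configuration.

```python
from google.cloud import aiplatform

aiplatform.init(project="your-gcp-project", location="us-central1")

model = aiplatform.Model.upload(
    display_name="llama-3-1-405b-instruct-fp8",
    # Illustrative tag only; use the Hugging Face TGI DLC URI from the blog post.
    serving_container_image_uri="us-docker.pkg.dev/deeplearning-platform-release/gcr.io/huggingface-text-generation-inference-cu121.2-2.ubuntu2204.py310",
    serving_container_environment_variables={
        "MODEL_ID": "meta-llama/Meta-Llama-3.1-405B-Instruct-FP8",
        "NUM_SHARD": "8",
        "HUGGING_FACE_HUB_TOKEN": "your-hf-token",
    },
)

endpoint = model.deploy(
    machine_type="a3-highgpu-8g",
    accelerator_type="NVIDIA_H100_80GB",
    accelerator_count=8,
)
```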
How The Washington Post Uses AI to Empower Journalists 🔍📰

An exciting new example in the world of AI-assisted journalism! The Post has developed an internal tool called "Haystacker" that's enhancing in-depth reporting. Here's why it matters:

🎥 What it does:
• Extracts stills from video files
• Processes on-screen text
• Labels objects in images
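Purely as an illustration of the first step in that list (not the Post's internal tool), extracting stills from a video with OpenCV could look like this:

```python
import cv2

def extract_stills(video_path, every_n_frames=30):
    """Grab one frame every `every_n_frames` frames from a video file."""
    cap = cv2.VideoCapture(video_path)
    stills, index = [], 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if index % every_n_frames == 0:
            stills.append(frame)
        index += 1
    cap.release()
    return stills
```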

🗳 First big project:
Analyzed 745 Republican campaign ads on immigration (Jan-Jun 2024)

🤝 Human-AI collaboration:
• AI extracts and organizes data
• Reporters verify and analyze findings

🔎 Thorough approach:
• Manual review of all 745 ads
• Reverse image searches when context is lacking
• Cross-referencing with AdImpact transcripts

💡 Key insight from WaPo's Senior Editor for AI strategy Phoebe Connelly:
"The more exciting choice is putting AI in the hands of reporters early on in the process."

This tool showcases how AI can augment journalistic capabilities without replacing human insight and verification. It's a powerful example of technology enhancing, not replacing, traditional reporting skills.

👉 Read the full article and the methodology: https://www.washingtonpost.com/elections/interactive/2024/republican-campaign-ads-immigration-border-security/