This repo contains minimal inference code to run text-to-image and image-to-image with our Flux latent rectified flow transformers.

Inference partners
We are happy to partner with Replicate and FAL. You can sample our models using their services. Below we list relevant links.

Replicate:

https://replicate.com/collections/flux
https://replicate.com/black-forest-labs/flux-pro
https://replicate.com/black-forest-labs/flux-dev
https://replicate.com/black-forest-labs/flux-schnell
FAL:

https://fal.ai/models/fal-ai/flux-pro
https://fal.ai/models/fal-ai/flux/dev
https://fal.ai/models/fal-ai/flux/schnell
Local installation
cd $HOME && git clone https://github.com/black-forest-labs/flux
cd $HOME/flux
python3.10 -m venv .venv
source .venv/bin/activate
pip install -e '.[all]'
Models
We are offering three models:

FLUX.1 [pro]: the base model, available via API
FLUX.1 [dev]: guidance-distilled variant
FLUX.1 [schnell]: guidance- and step-distilled variant
Name | HuggingFace repo | License | md5sum
FLUX.1 [schnell] | https://huggingface.co/black-forest-labs/FLUX.1-schnell | apache-2.0 | a9e1e277b9b16add186f38e3f5a34044
FLUX.1 [dev] | https://huggingface.co/black-forest-labs/FLUX.1-dev | FLUX.1-dev Non-Commercial License | a6bd8c16dfc23db6aee2f63a2eba78c0
FLUX.1 [pro] | Only available in our API. | |
The weights of the autoencoder are also released under apache-2.0 and can be found in either of the two HuggingFace repos above. They are the same for both models.

Usage
The weights will be downloaded automatically from HuggingFace once you start one of the demos. To download FLUX.1 [dev], you will need to be logged in to HuggingFace. If you have downloaded the model weights manually, you can specify the downloaded paths via environment variables:

export FLUX_SCHNELL=<path_to_flux_schnell_sft_file>
export FLUX_DEV=<path_to_flux_dev_sft_file>
export AE=<path_to_ae_sft_file>
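If you are not logged in yet, one common way (assuming the huggingface_hub package is installed) is the Hugging Face CLI:

huggingface-cli login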
For interactive sampling, run

python -m flux --name <name> --loop
Or, to generate a single sample, run

python -m flux --name <name> \
--height <height> --width <width> \
--prompt "<prompt>"
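For example, to generate a 1024x1024 image with the schnell model (the prompt and dimensions here are purely illustrative):

python -m flux --name flux-schnell --width 1024 --height 1024 \
--prompt "a photo of a forest with mist swirling around the tree trunks"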
We also provide a streamlit demo that does both text-to-image and image-to-image. The demo can be run via

streamlit run demo_st.py
We also offer a Gradio-based demo for an interactive experience. To run the Gradio demo:

python demo_gr.py --name flux-schnell --device cuda
Options:

--name: Choose the model to use (options: "flux-schnell", "flux-dev")
--device: Specify the device to use (default: "cuda" if available, otherwise "cpu")
--offload: Offload model to CPU when not in use
--share: Create a public link to your demo
To run the demo with the dev model and create a public link:

python demo_gr.py --name flux-dev --share
Diffusers integration
FLUX.1 [schnell] and FLUX.1 [dev] are integrated with the 🧨 diffusers library. To use them, install diffusers from source:

pip install git+https://github.com/huggingface/diffusers.git
Then you can use FluxPipeline to run the model:

import torch
from diffusers import FluxPipeline

model_id = "black-forest-labs/FLUX.1-schnell"  # you can also use "black-forest-labs/FLUX.1-dev"

pipe = FluxPipeline.from_pretrained(model_id, torch_dtype=torch.bfloat16)
pipe.enable_model_cpu_offload()  # save some VRAM by offloading the model to CPU; remove this if you have enough GPU memory

prompt = "A cat holding a sign that says hello world"
seed = 42
image = pipe(
    prompt,
    output_type="pil",
    num_inference_steps=4,  # use a larger number if you are using [dev]
    generator=torch.Generator("cpu").manual_seed(seed),
).images[0]
image.save("flux-schnell.png")
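If you are running FLUX.1 [dev] instead, here is a minimal sketch of typical settings. The dev model responds to a guidance scale and generally needs more steps than [schnell]; the exact values below are common starting points, not official recommendations:

import torch
from diffusers import FluxPipeline

pipe = FluxPipeline.from_pretrained("black-forest-labs/FLUX.1-dev", torch_dtype=torch.bfloat16)
pipe.enable_model_cpu_offload()

image = pipe(
    "A cat holding a sign that says hello world",
    num_inference_steps=50,  # dev typically needs far more steps than schnell's 4
    guidance_scale=3.5,      # assumed typical value; tune to taste
    generator=torch.Generator("cpu").manual_seed(42),
).images[0]
image.save("flux-dev.png")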
To learn more, check out the diffusers documentation.

API usage
Our API offers access to the pro model. It is documented here: docs.bfl.ml.

In this repository, we also offer a simple Python interface. To use it, first register with the API at api.bfl.ml and create a new API key.

To use the API key, either run export BFL_API_KEY=<your_key_here> or provide it via the api_key=<your_key_here> parameter. It is also expected that you have installed the package as described above.
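A minimal sketch of the Python interface, assuming a client class named ImageRequest in flux.api as in the repository; check the source if the names differ:

from flux.api import ImageRequest

# Create the request; the API key is read from BFL_API_KEY unless
# api_key=... is passed explicitly.
request = ImageRequest("A serene mountain lake at sunrise")

# Accessing the result blocks until generation is finished.
request.save("outputs/api_sample.jpg")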
So what are those "thinking tokens"?! Nothing fancy: they are just special tokens '<T>' that you insert after each word in a sentence whenever a complex problem is encountered. That's it!

👉 The main idea is to "buy" the model "some time" to think about the problem with these additional computations before answering. Using this method, they observed a slight improvement in perplexity.

👉 Before getting excited, note that they added these tokens manually and used an RNN language model. From the paper:

"As a proof of concept, we have added N ’thinking tokens’ (< T >) after each observed word in a dataset. Our vision is that this basic concept can be extended to a self-adjusting model, which will be able to decide itself if and how many ’thinking tokens’ will be used for a specific problem, where N could also vary throughout the sentence. This would allow us to reduce the computational time, which would not increase N times."

💡 Thinking Tokens For Language Models!

How much is 56 times 37? Can you answer that right away?

In a short paper, David Herel and Tomas Mikolov propose a simple method to improve the reasoning of language models when performing complex calculations.

📌 They note that, although language models are not that good with difficult calculations, humans also cannot perform these calculations immediately and require a considerable amount of time to come up with an answer.

Inspired by this, they introduce 💡Thinking Tokens💡
Great work by Finegrain: Erase any object from your image just by naming it. Shadows or reflections will also be adjusted accordingly! https://huggingface.co/spaces/finegrain/finegrain-object-eraser
Prompt:

(medium full shot) of (awe-inspiring snake) with muscular body, amber eyes, bronze brown armored scales, venomous fangs, coiling tail, gemstone-studded scales frills, set in a barren desert wasteland, with cracked earth and the remains of ancient structures, a place of mystery and danger, at dawn, ,Masterpiece,best quality, raw photo, realistic, very aesthetic, dark

CFG: 1, seed: 1, FLUX CFG at its default: 3.5

Full public SwarmUI tutorial

Zero to Hero Stable Diffusion 3 Tutorial with Amazing SwarmUI SD Web UI that Utilizes ComfyUI

https://youtu.be/HKX8_F1Er_w

Full public Cloud SwarmUI tutorial

How to Use SwarmUI & Stable Diffusion 3 on Cloud Services Kaggle (free), Massed Compute & RunPod

https://youtu.be/XFUZof6Skkw
FLUX FP16 produces better quality than FP8 but requires 28 GB VRAM. Full comparisons included, also covering Dev vs. Turbo models and 1024 vs. 1536 resolution.

Check the file names in the Imgsli link below to see all the details.

SwarmUI on an L40S is used for the comparison, at 1.82 iterations per second for 1024x1024.

Imgsli link comparing everything: https://imgsli.com/MjgzNzM1

SwarmUI full tutorial public post: https://www.patreon.com/posts/106135985

1-click FLUX model downloader scripts for Windows, RunPod, and Massed Compute are in the post below:

https://www.patreon.com/posts/109289967

A free-Kaggle-account notebook that already supports FLUX can be downloaded here: https://www.patreon.com/posts/106650931
Black Forest Labs, BASED! 👏
FLUX.1 is delightful, with good instruction following.
FLUX.1 dev (black-forest-labs/FLUX.1-dev) is a 12B-parameter distilled model, second only to Black Forest Labs' state-of-the-art model FLUX.1 pro. 🙀

Update 🤙 Official demo:
black-forest-labs/FLUX.1-dev
FLUX.1-dev-like images, but in fewer steps.

Merging code (very simple), inference code, merged params:
sayakpaul/FLUX.1-merged


Enjoy the Monday 🤗
🔗 Comprehensive Tutorial Video Link ▶️ https://youtu.be/bupRePUOA18

FLUX represents a milestone in open source txt2img technology, delivering superior quality and more accurate prompt adherence than #Midjourney, Adobe Firefly, Leonardo AI, Playground AI, Stable Diffusion, SDXL, SD3, and DALL-E 3. #FLUX, a creation of Black Forest Labs, boasts a team largely composed of #StableDiffusion's original developers, and its output quality is truly remarkable. This statement is not hyperbole; you'll witness its capabilities in the tutorial. This guide will demonstrate how to effortlessly install and use FLUX models on your personal computer and on cloud platforms like Massed Compute, RunPod, and a free Kaggle account.

🔗 FLUX Setup Guide (publicly accessible) ⤵️
▶️ https://www.patreon.com/posts/106135985

🔗 FLUX Models One-Click Robust Automatic Downloader Scripts ⤵️
▶️ https://www.patreon.com/posts/109289967

🔗 Primary Windows SwarmUI Tutorial (Essential for Usage Instructions) ⤵️
▶️ https://youtu.be/HKX8_F1Er_w

🔗 Cloud-based SwarmUI Tutorial (Massed Compute - RunPod - Kaggle) ⤵️
▶️ https://youtu.be/XFUZof6Skkw

🔗 SECourses Discord Server for Comprehensive Support ⤵️
▶️ https://discord.com/servers/software-engineering-courses-secourses-772774097734074388

🔗 SECourses Reddit Community ⤵️
▶️ https://www.reddit.com/r/SECourses/

🔗 SECourses GitHub Repository ⤵️
▶️ https://github.com/FurkanGozukara/Stable-Diffusion

🔗 Official FLUX 1 Launch Announcement Blog Post ⤵️
▶️ https://blackforestlabs.ai/announcing-black-forest-labs/

Video Segments

0:00 Introduction to the state-of-the-art open source txt2img model FLUX
5:01 Process for integrating FLUX model into SwarmUI
FLUX Local & Cloud Tutorial With SwarmUI - FLUX: The Groundbreaking Open Source txt2img Model Outperforms Midjourney & Others - FLUX: The Anticipated Successor to SD3
Porting Vision-Language Models to Apple Silicon with MLX: A Tutorial Series

Are you interested in running cutting-edge AI models efficiently on your Mac? We're excited to share a detailed tutorial series on porting Phi-3-Vision to Apple's MLX framework!

This 8-part series covers:

1. Basic Implementation: Translating core components from PyTorch to MLX
2. Su-scaled Rotary Position Embeddings (SuRoPE): Enabling 128K token contexts (see the sketch after this list)
3. Batching: Processing multiple inputs simultaneously for improved efficiency
4. Caching: Optimizing inference speed for autoregressive generation
5. Choice Selection: Implementing constrained outputs for multiple-choice scenarios
6. Constrained Decoding: Guiding model outputs with flexible constraints
7. LoRA Training: Fine-tuning models efficiently with Low-Rank Adaptation
8. Agent & Toolchain System: Building flexible AI workflows

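As a taste of part 2, here is a rough numpy sketch of the Su-scaled RoPE idea. The per-dimension rescale factors and the magnitude correction below are illustrative assumptions; the real factor lists ship in the model's config:

import numpy as np

def surope_inv_freq(head_dim, factors, base=10000.0):
    # Standard RoPE inverse frequencies, one per even dimension pair.
    inv_freq = 1.0 / (base ** (np.arange(0, head_dim, 2) / head_dim))
    # Su-scaling divides each frequency by a per-dimension factor,
    # slowing rotation so distant positions stay distinguishable.
    return inv_freq / np.asarray(factors)

def attention_scale(seq_len, original_max=4096):
    # One common form of the magnitude correction applied when running
    # past the original training context.
    s = seq_len / original_max
    return float(np.sqrt(1 + np.log(s) / np.log(original_max))) if s > 1 else 1.0

factors = np.linspace(1.0, 8.0, 32)  # placeholder long-context factors
print(surope_inv_freq(64, factors)[:4], attention_scale(131072))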
Whether you're an AI enthusiast, researcher, or developer looking to leverage Apple Silicon, this series provides a deep dive into optimizing advanced vision-language models. You'll learn hands-on techniques for model porting, performance optimization, and extending model capabilities.

Check out the full series for a comprehensive guide to running state-of-the-art AI on your Mac!

Link to the tutorial series:

https://medium.com/@albersj66

All the code examples and implementations discussed in this tutorial series are available in our GitHub repository:

https://github.com/JosefAlbers/Phi-3-Vision-MLX

This repository contains:
- Full implementation of Phi-3-Vision in MLX
- Step-by-step code for each tutorial part
- Additional utilities and helper functions

We encourage you to explore the code, experiment with it, and contribute to the project. Your feedback and contributions are welcome!

#MachineLearning #AppleSilicon #MLX #VisionLanguageModels #AI #OpenSource #GitHub #AITutorial

I'm excited to share our updated hallucination evaluation model (called HHEM-2.1-Open) as well as the updated leaderboard that ranks LLMs by their propensity to hallucinate.

vectara/Hallucination-evaluation-leaderboard
We are offering a free trial of the Tumeryk Gen AI Security Studio with 2 million tokens for the Hugging Face community. You can run a vulnerability scan of your LLM, get a security score, and fix the vulnerabilities via the built-in policy editing tools for configuring the necessary guardrail policies.

Protect against jailbreaks, moderate content and manage hallucinations.

Sign up at https://www.tumeryk.com/sign-up, or get an account from the Amazon Marketplace: https://aws.amazon.com/marketplace/pp/prodview-c6ywcyjefl3hu?sr=0-2&ref_=beagle&applicationId=AWSMPContessa
Looks like @Google is still not satisfied with Gemini 1.5 Pro! 😲

Good folks at @GoogleDeepMind quietly updated the already good Gemini 1.5 Pro to Gemini-1.5-Pro-Experiment-0801 🚀

Unremarkable naming aside, the model itself outperforms GPT-4o, Claude 3.5, and Llama 3.1 on LMSYS and the Vision Leaderboard. 🌟
Who wants to take a stab at explaining this one? SPOILER ALERT: you CANNOT transfer an image jailbreak from one model to another. Why in the world can't you do this, when you can transfer-learn literally everything else? You tell me, experts.



https://arxiv.org/abs/2407.15211
🚀 Introducing The Open Language Models List

This is a work-in-progress list of open language models with permissive licenses such as MIT, Apache 2.0, or other similar licenses.

The list is not limited to autoregressive models, or even to transformers; it includes many SSMs and SSM-Transformer hybrids.

🤗 Contributions, corrections, and feedback are very welcome!

The Open Language Models List: https://github.com/mmhamdy/open-language-models