HF-hub

Share and discover more about AI with social posts from the community.
huggingface/OpenAi

13:54 · Aug 16, 2024 · Fri

FLUX.1: First Impressions
FLUX.1 is a new AI model (available on Replicate) that makes images from text. Unlike most text-to-image models, which rely on diffusion, FLUX.1 uses an upgraded technique called “flow matching.”

While diffusion models create images by gradually removing noise from a random starting point, flow matching takes a more direct approach, learning the precise transformations needed to map noise onto a realistic image. This difference in methodology leads to a distinct aesthetic and unique advantages in terms of speed and control.
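To make the contrast concrete, here is a toy Python sketch of the sampling side of flow matching: start from noise and follow a velocity field toward the data with a few Euler steps. It is not FLUX.1's implementation. The `oracle_velocity` function is a hypothetical stand-in for the trained network, and it cheats by being told the target.

```python
import random


def oracle_velocity(x, t, target):
    """Hypothetical stand-in for a learned velocity field.

    On the straight-line path x_t = (1 - t) * noise + t * target, the
    velocity that carries x_t to the target is (target - x_t) / (1 - t).
    A real model has to predict this from x_t, t and the prompt.
    """
    return [(g - xi) / (1.0 - t) for xi, g in zip(x, target)]


def sample(target, num_steps=4, seed=0):
    """Euler-integrate dx/dt = v(x, t) from t=0 (noise) to t=1 (data)."""
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0) for _ in target]  # start from Gaussian noise
    dt = 1.0 / num_steps
    for k in range(num_steps):
        t = k * dt  # t never reaches 1, so the division above is safe
        v = oracle_velocity(x, t, target)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x


target = [0.2, 0.8, 0.5]  # pretend these are three pixel values
result = sample(target, num_steps=4)
```

Because the path is a straight line, even four steps land on the target here. A learned velocity field is only approximate, which is where step count starts to matter.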

We were curious to see how this approach impacts the generated images, so we fed it a variety of prompts, many created by other AI models. Here are some observations:

Text: It gets it (mostly)
One of the challenges in text-to-image generation is accurately translating words into visual representations. FLUX.1 handles this surprisingly well, even in complex scenarios like memes.

Prompt:

Photograph of letterpress serif type on thick rough creamy paper saying ‘REPLICATE.COM’

https://d31rfu1d3w8e4q.cloudfront.net/static/blog/flux-first-impressions/letterpress.webp
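For readers who want to try the prompt themselves, a minimal sketch using the Replicate Python client follows. The post does not say which FLUX.1 variant produced the image, so the model slug below is an assumption, and the call needs a `REPLICATE_API_TOKEN` in the environment.

```python
PROMPT = (
    "Photograph of letterpress serif type on thick rough creamy paper "
    "saying 'REPLICATE.COM'"
)


def build_input(prompt: str) -> dict:
    """Build the input payload; only the prompt text comes from the post."""
    return {"prompt": prompt}


def generate(prompt: str, model: str = "black-forest-labs/flux-schnell"):
    """Run the model on Replicate (requires network access and an API token).

    The default model slug is an assumption, not something the post states.
    """
    import replicate  # imported lazily so build_input works without the client

    return replicate.run(model, input=build_input(prompt))


payload = build_input(PROMPT)
```

Calling `generate(PROMPT)` would submit the request; the shape of the return value depends on the client version, so it is not shown here.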

13:52 · Aug 16, 2024 · Fri

Meta 3D AssetGen: Text-to-Mesh Generation with High-Quality Geometry, Texture, and PBR Materials
We present Meta 3D AssetGen (AssetGen), a significant advancement in text-to-3D generation which produces faithful, high-quality meshes with texture and material control. Compared to works that bake shading in the 3D object's appearance, AssetGen outputs physically-based rendering (PBR) materials, supporting realistic relighting. AssetGen first generates several views of the object with factored shaded and albedo appearance channels, and then reconstructs colours, metalness and roughness in 3D, using a deferred shading loss for efficient supervision. It also uses a sign-distance function to represent 3D shape more reliably and introduces a corresponding loss for direct shape supervision. This is implemented using fused kernels for high memory efficiency. After mesh extraction, a texture refinement transformer operating in UV space significantly improves sharpness and details. AssetGen achieves 17% improvement in Chamfer Distance and 40% in LPIPS over the best concurrent work for few-view reconstruction, and a human preference of 72% over the best industry competitors of comparable speed, including those that support PBR. Project page with generated assets: https://assetgen.github.io.
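Chamfer Distance, one of the metrics quoted above, compares two point sets by nearest-neighbour distances. The sketch below uses one common definition (the mean squared nearest-neighbour distance, summed over both directions). Definitions vary between papers, and the exact variant AssetGen reports is not stated in this post, so treat this as an illustration only.

```python
def chamfer_distance(a, b):
    """Symmetric Chamfer distance between two point sets (lists of tuples)."""

    def sq_dist(p, q):
        return sum((pi - qi) ** 2 for pi, qi in zip(p, q))

    def one_way(src, dst):
        # Mean squared distance from each source point to its nearest target.
        return sum(min(sq_dist(p, q) for q in dst) for p in src) / len(src)

    return one_way(a, b) + one_way(b, a)


# Corners of a unit cube, and the same cube shifted by 0.1 along x.
cube = [(x, y, z) for x in (0.0, 1.0) for y in (0.0, 1.0) for z in (0.0, 1.0)]
shifted = [(x + 0.1, y, z) for x, y, z in cube]

same = chamfer_distance(cube, cube)        # identical sets give 0.0
moved = chamfer_distance(cube, shifted)    # about 0.02 (0.1 squared, both ways)
```

This brute-force version is quadratic in the number of points; real evaluations on dense meshes usually rely on a spatial index or a GPU implementation.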

Yawar Siddiqui, Tom Monnier, Filippos Kokkinos, Mahendra Kariya, Yanir Kleiman, Emilien Garreau, Oran Gafni, Natalia Neverova, Andrea Vedaldi, Roman Shapovalov, David Novotny
https://ai.meta.com/research/publications/meta-3d-assetgen-text-to-mesh-generation-with-high-quality-geometry-texture-and-pbr-materials/


13:52 · Aug 16, 2024 · Fri

The Llama 3 Herd of Models
Modern artificial intelligence (AI) systems are powered by foundation models. This paper presents a new set of foundation models, called Llama 3. It is a herd of language models that natively support multilinguality, coding, reasoning, and tool usage. Our largest model is a dense Transformer with 405B parameters and a context window of up to 128K tokens. This paper presents an extensive empirical evaluation of Llama 3. We find that Llama 3 delivers comparable quality to leading language models such as GPT-4 on a plethora of tasks. We publicly release Llama 3, including pre-trained and post-trained versions of the 405B parameter language model and our Llama Guard 3 model for input and output safety. The paper also presents the results of experiments in which we integrate image, video, and speech capabilities into Llama 3 via a compositional approach. We observe this approach performs competitively with the state-of-the-art on image, video, and speech recognition tasks. The resulting models are not yet being broadly released as they are still under development.

Llama team https://ai.meta.com/research/publications/the-llama-3-herd-of-models/


13:50 · Aug 16, 2024 · Fri

An overview of the SAM 2 framework.

SAM 2 uses a transformer architecture with streaming memory for real-time video processing. It builds on the original SAM model, extending its capabilities to video.

For more technical details, check out the Research paper.

Safety
⚠️ Users should be aware of potential ethical implications:
- Ensure you have the right to use input images and videos, especially those featuring identifiable individuals.
- Be responsible about generated content to avoid potential misuse.
- Be cautious about using copyrighted material as inputs without permission.

Support
All credit goes to the Meta AI Research team.
https://raw.githubusercontent.com/facebookresearch/segment-anything-2/main/assets/model_diagram.png

13:48 · Aug 16, 2024 · Fri

How to Use SAM 2 for Video?
Segment Anything Model 2 (SAM 2) is a unified video and image segmentation model.

Video segmentation presents unique challenges compared to image segmentation. Object motion, deformation, occlusion, lighting changes, and other factors can vary dramatically from frame to frame. Videos are often lower quality than images due to camera motion, blur, and lower resolution, further increasing the difficulty.

SAM 2 demonstrates improved accuracy in video segmentation, with 3 times fewer interactions than previous approaches. SAM 2 is more accurate for image segmentation and 6 times faster than the original Segment Anything Model (SAM).
https://youtu.be/Dv003fTyO-Y

YouTube

Segment Anything 2 (SAM 2): Meta AI's Newest Model | Community Q&A (Jul 30)

Segment Anything Model 2 (SAM 2) is a foundation model designed to address promptable visual segmentation in both images and videos. The model extends its functionality to video by treating images as single-frame videos. Its design, a simple transformer architecture…

13:48 · Aug 16, 2024 · Fri

SAM 2: Segment Anything in Images and Videos
Segment Anything Model 2 (SAM 2) is a foundation model towards solving promptable visual segmentation in images and videos. We extend SAM to video by considering images as a video with a single frame. The model design is a simple transformer architecture with streaming memory for real-time video processing. We build a model-in-the-loop data engine, which improves model and data via user interaction, to collect our SA-V dataset, the largest video segmentation dataset to date. SAM 2 trained on our data provides strong performance across a wide range of tasks and visual domains.
https://github.com/zsxkib/segment-anything-2/raw/video/assets/sa_v_dataset.jpg?raw=true
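Two ideas in the abstract, treating an image as a one-frame video and processing frames with a streaming memory, can be pictured with a small toy. Everything below is hypothetical: the class and function names are invented for illustration, and no real segmentation happens. It only shows frames being handled one at a time against a bounded memory of earlier frames.

```python
from collections import deque


class StreamingSegmenter:
    """Toy stand-in that processes frames one at a time with bounded memory."""

    def __init__(self, memory_size=4):
        # deque with maxlen evicts the oldest entry once the memory is full.
        self.memory = deque(maxlen=memory_size)

    def step(self, frame):
        # A real model would attend over the memory here. This toy only
        # records how many past frames were available for this frame.
        context = len(self.memory)
        self.memory.append(frame)
        return {"frame": frame, "context_frames": context}


def segment_video(frames, memory_size=4):
    model = StreamingSegmenter(memory_size)
    return [model.step(frame) for frame in frames]


def segment_image(image):
    # An image is handled as a video with exactly one frame.
    return segment_video([image])[0]
```

With this framing the image case needs no separate code path: it is simply the video case with an empty memory.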

13:47 · Aug 16, 2024 · Fri

Batouresearch / high-resolution-controlnet-tile
Run time and cost
This model costs approximately $0.054 to run on Replicate, or 18 runs per $1, but this varies depending on your inputs. It is also open source and you can run it on your own computer with Docker.

This model runs on Nvidia A40 (Large) GPU hardware. Predictions typically complete within 75 seconds. The predict time for this model varies significantly based on the inputs.
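The "18 runs per $1" figure follows directly from the quoted per-run cost. A short sketch of the arithmetic, using only the approximate number given above:

```python
import math

COST_PER_RUN_USD = 0.054  # approximate per-run cost quoted above


def runs_per_dollar(cost_per_run: float) -> int:
    """Whole runs that fit in one dollar: floor(1 / 0.054) = 18."""
    return math.floor(1.0 / cost_per_run)


def estimated_cost(num_runs: int, cost_per_run: float = COST_PER_RUN_USD) -> float:
    """Rough total in dollars, rounded to cents."""
    return round(num_runs * cost_per_run, 2)
```

Since the cost varies with the inputs, `estimated_cost` is best read as a ballpark rather than a quote.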

Readme
High quality upscale from Fermat.app. Increase the creativity to encourage hallucination.
https://replicate.delivery/pbxt/etT436Z2RrWAOajwhQm6YLBHiT5Y1Oix2aZnDLnIkfM7u4ESA/out-0.png
https://replicate.delivery/pbxt/1rbKAbFss7ZUGNxKnFGmOEHBaeEZ7cI7Sx61eiOo9AyGjQMTA/output.jpg

13:46 · Aug 16, 2024 · Fri

Restore images
These models restore and improve images by fixing defects like blur, noise, and low resolution. Key capabilities:

Deblurring - Sharpen blurry images by reversing blur effects. Useful for old photos.
Denoising - Remove grain and artifacts by learning noise patterns.
Colorization - Add realistic color to black and white photos.
Face restoration - Improve the image quality of faces in old photos, or fix unrealistic AI-generated faces.
Our Picks
Best restoration model: google-research/maxim
If you need to sharpen a blurry photo, or remove noise or compression artifacts, start with google-research/maxim. It has a total of 11 image restoration models baked-in that let you deblur, denoise, remove raindrops, and more. If you’re not getting the results you’re looking for, try megvii-research/nafnet which is similar but supports fewer restoration features.

Best colorization model: piddnad/ddcolor
The best model for adding color to black and white photos is piddnad/ddcolor, which was released in 2023. If you are looking for more saturated results try out arielreplicate/deoldify_image.

Best face restoration model: sczhou/codeformer
If you’re looking for a face restoration model, try starting with sczhou/codeformer. It produces more realistic faces than alternatives like tencentarc/gfpgan. If you aren’t getting the exact image improvements you want, we recommend exploring more modern upscaling models like batouresearch/magic-image-refiner.
https://replicate.com/collections/image-restoration

13:45 · Aug 16, 2024 · Fri

Using FLUX.1 Schnell for faster inference
You can use your FLUX.1 Dev LoRA with the faster FLUX.1 Schnell model to generate images more quickly and cheaply. Just change the model parameter from “dev” to “schnell” when you generate, and lower the number of steps to a small value such as 4.

Note that outputs will still be under the non-commercial license of FLUX.1 Dev.
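As a sketch, the switch amounts to changing two fields of the prediction input. The post only names a "model" parameter; the key name `num_inference_steps`, the example prompt, and the starting step count are assumptions made for illustration.

```python
def to_schnell(dev_input: dict, steps: int = 4) -> dict:
    """Return a copy of a dev-style input adjusted for FLUX.1 Schnell."""
    fast = dict(dev_input)  # copy so the original input is left untouched
    fast["model"] = "schnell"
    fast["num_inference_steps"] = steps  # assumed key name for the step count
    return fast


# Illustrative input; the prompt and the 28-step value are made up.
dev_input = {
    "prompt": "a photo of TOK on a beach",
    "model": "dev",
    "num_inference_steps": 28,
}
schnell_input = to_schnell(dev_input)
```

The resulting dictionary would be passed as the `input` of a prediction against your fine-tuned model.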

Examples and use cases
Check out our examples gallery for inspiration. You can see how others have fine-tuned FLUX.1 to create different styles, characters, a never-ending parade of cute animals, and more.
https://d31rfu1d3w8e4q.cloudfront.net/static/blog/fine-tune-flux/3-base.webp

13:45 · Aug 16, 2024 · Fri

How to fine-tune FLUX.1
Fine-tuning FLUX.1 on Replicate is a straightforward process that can be done either through the web interface or programmatically via the API. Let’s walk through both methods.

Prepare your training data
To start fine-tuning, you’ll need a collection of images that represent the concept you want to teach the model. These images should be diverse enough to cover different aspects of the concept. For example, if you’re fine-tuning on a specific character, include images in various settings, poses, and lighting. Here are some guidelines:

Use 12-20 images for best results
Use large images if possible
Use JPEG or PNG formats
(Optional) Create a corresponding .txt file for each image with the same name, containing the caption
Once you have your images (and optional captions), zip them up into a single file.
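The packaging step can be scripted. Below is a minimal sketch using Python's standard `zipfile` module: it collects JPEG and PNG files from a folder, adds a caption file when one with the same name exists, and writes everything into a single archive. The function name and the flat archive layout are assumptions; check the trainer's documentation for the layout it expects.

```python
import zipfile
from pathlib import Path

IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def zip_training_data(image_dir, zip_path):
    """Zip images plus any same-named .txt caption files into one archive."""
    image_dir = Path(image_dir)
    images = sorted(
        p for p in image_dir.iterdir() if p.suffix.lower() in IMAGE_SUFFIXES
    )
    added = []
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for image in images:
            zf.write(image, arcname=image.name)
            added.append(image.name)
            caption = image.with_suffix(".txt")  # optional caption file
            if caption.exists():
                zf.write(caption, arcname=caption.name)
                added.append(caption.name)
    if not 12 <= len(images) <= 20:
        print(f"note: {len(images)} images found; the guideline suggests 12-20")
    return added
```

For example, `zip_training_data("my_photos", "data.zip")` would return the list of file names written to `data.zip`.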

Start the training process
https://replicate.com/blog/fine-tune-flux

Replicate

Fine-tune FLUX.1 with your own images

We've added fine-tuning (LoRA) support to FLUX.1 image generation models. You can train FLUX.1 on your own images with one line of code using Replicate's API.

13:45 · Aug 16, 2024 · Fri

What is fine-tuning FLUX.1 on Replicate?
These big image generation models, like FLUX.1 and Stable Diffusion, are trained on a bunch of images that have had noise added, and they learn the reverse function of “adding noise.” Amazingly, that turns out to be “creating images.”

How do they know which image to create? They build on transformer text encoders like CLIP and T5. CLIP is trained on tons of image-caption pairs and learns to map an image and its caption to nearby points in a high-dimensional space; T5 is trained on text alone. When you send a text prompt, like “squirrel reading a newspaper in the park,” the encoders turn it into an embedding that steers the generator toward matching patterns of pixels in a grid. To a CLIP-style encoder, the picture and the caption land in nearly the same place.

The image generation process looks like this: take some input pixels, move them a little bit away from noise and toward the pattern created by your text input, and repeat until the correct number of steps is reached.

The fine-tuning process, in turn, takes each image/caption pair from your dataset and updates that internal mapping a little bit. You can teach the model anything this way, as long as it can be represented through image-caption pairs: characters, settings, mediums, styles, genres. In training the model will learn to associate your concept with a particular text string. Include this string in your prompt to activate that association.

For example, say you want to fine-tune the model on your comic book superhero. You’ll collect some images of your character as your dataset. A well-rounded batch: different settings, costumes, lighting, maybe even different art styles. That way the model understands that what it’s learning is this one person, not any of these other incidental details.

Pick a short, uncommon word or phrase as your trigger: something unique that won’t conflict with other concepts or fine-tunes. You might choose something like “bad 70s food” or “JELLOMOLD”. Train your model. Now, when you prompt “Establishing shot of bad 70s food at a party in San Francisco,” your model will call up your specific concept. Easy as that.

Could it be as easy as that? Yes, actually. We realized that we could use the Replicate platform to make fine-tuning as simple as uploading images. We can even do the captioning for you.

If you’re not familiar with Replicate, we make it easy to run AI as an API. You don’t have to go looking for a beefy GPU, you don’t have to deal with environments and containers, you don’t have to worry about scaling. You write normal code, with normal APIs, and pay only for what you use.

You can try this right now! It doesn’t take a lot of images. Check out our examples gallery to see the kinds of styles and characters people are creating.

Grab a few photos of your pet, or your favorite houseplant, and let’s get started.

13:45 · Aug 16, 2024 · Fri

Fine-tune FLUX.1 with your own images
We have fine-tuning for FLUX.1
Fine-tuning is now available on Replicate for the FLUX.1 [dev] image model. Here’s what that means, and how to do it.

FLUX.1 is a family of text-to-image models released by Black Forest Labs this summer. The FLUX.1 models set a new standard for open-weights image models: they can generate realistic hands, legible text, and even the strangely hard task of “funny memes.” You can now fine-tune your model on Replicate with the FLUX.1 Dev LoRA Trainer.

If you know what all that means and you’re ready to try it with your dataset, you can skip to the code.

Otherwise, here’s what it means and why you should care.
https://replicate.com/blog/fine-tune-flux


12:28 · Aug 16, 2024 · Fri

Welcome to our interactive tutorial on Tony's Chocolonely! I'm excited to guide you through the world of this Dutch chocolate company, known for its commitment to selling 100% slavery-free chocolate.

Overview of Tony's Chocolonely

Tony's Chocolonely is a Dutch chocolate company founded in 2005 by Teun van de Keuken, a Dutch TV journalist who was shocked to discover that most of the chocolate on supermarket shelves was harvested by slaves. He decided to take action and make 5,000 Fairtrade chocolate bars himself, marking the birth of Tony's Chocolonely.

Key Aspects of Tony's Chocolonely

Fairtrade Certification: Tony's Chocolonely is committed to Fairtrade certification, which guarantees a minimum price for cocoa farmers and promotes fair labor practices.
Slavery-Free Chocolate: Tony's Chocolonely aims to eradicate slavery from the global chocolate industry, making it a pioneer in the movement towards slavery-free chocolate.
Direct Trade: The company buys cocoa mass directly from cocoa farmers in West Africa, working with 5 cocoa cooperatives in Ghana and Ivory Coast.
Environmental Sustainability: Tony's Chocolonely prioritizes environmental sustainability, using uncoated, recycled FSC-certified paper for its wrappers and offsetting emissions through Justdiggit.
What would you like to learn more about?

Please choose one of the following topics:

History of Tony's Chocolonely: Learn about the company's founding, its mission, and its journey towards becoming a leader in the chocolate industry.
Fairtrade Certification and Slavery-Free Chocolate: Dive deeper into the Fairtrade certification process, the benefits of slavery-free chocolate, and Tony's Chocolonely's commitment to these values.
Direct Trade and Environmental Sustainability: Explore the company's direct trade practices, its environmental sustainability initiatives, and how these efforts contribute to a more responsible chocolate industry.
Customized Wrappers and Gift Cards: Discover how to create customized wrappers, order gift cards, and other fun features of Tony's Chocolonely's online shop.
Other: Is there something specific you'd like to know about Tony's Chocolonely?

12:27 · Aug 16, 2024 · Fri

Welcome to our interactive tutorial on LoRA (Low-Rank Adaptation) and Diffusion Pipelines!

In this tutorial, we'll be exploring the AlbertFLUX model, a LoRA derived from the FLUX.1-dev model by black-forest-labs. We'll delve into the details of the model's architecture, training settings, and inference pipeline.

What would you like to learn about?

LoRA and its applications: Understand the concept of LoRA and its use cases in machine learning.
AlbertFLUX model architecture: Learn about the model's architecture, including the text encoder and image decoder.
Training settings and hyperparameters: Explore the training settings and hyperparameters used to train the AlbertFLUX model.
Inference pipeline and code: Understand the inference pipeline and code used to generate images with the AlbertFLUX model.
Diffusion pipelines and their benefits: Learn about diffusion pipelines and their advantages in image generation tasks.
Please select one of the above options to get started!

https://huggingface.co/markury/AlbertFLUX

huggingface.co

markury/AlbertFLUX · Hugging Face

We’re on a journey to advance and democratize artificial intelligence through open source and open science.

12:26 · Aug 16, 2024 · Fri

Welcome to our interactive tutorial on Flux DreamBooth LoRA!

We will be exploring the concept of LoRA (Low-Rank Adaptation) in the context of DreamBooth, a fine-tuning technique for text-to-image models. Specifically, we will be working with the linoyts/3d_icon_flux_1500 DreamBooth LoRA weights for the black-forest-labs/FLUX.1-dev model.
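As background, the core idea of low-rank adaptation can be shown in a few lines: the original weight matrix W stays frozen, and training only touches two small matrices A and B whose product is added to it. The sketch below uses plain Python lists and illustrative sizes; it is not how diffusers or any trainer implements LoRA, and the 3072 dimension is an arbitrary example.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def lora_update(w, a, b, alpha, rank):
    """Return W + (alpha / rank) * B @ A without modifying W."""
    scale = alpha / rank
    delta = matmul(b, a)  # (d_out x rank) @ (rank x d_in) -> d_out x d_in
    return [
        [wij + scale * dij for wij, dij in zip(w_row, d_row)]
        for w_row, d_row in zip(w, delta)
    ]


def param_counts(d_out, d_in, rank):
    """Parameters in the full matrix versus in the two LoRA factors."""
    return d_out * d_in, rank * (d_in + d_out)


w = [[1.0, 0.0], [0.0, 1.0]]   # frozen base weight
a = [[3.0, 4.0]]               # rank x d_in
b = [[1.0], [2.0]]             # d_out x rank
adapted = lora_update(w, a, b, alpha=1.0, rank=1)  # [[4.0, 4.0], [6.0, 9.0]]

full, lora = param_counts(3072, 3072, 16)  # 9437184 versus 98304
```

The parameter counts show why LoRA files are small: at rank 16 the two factors hold about 1% of the entries of the matrix they adapt.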

What would you like to learn about?

Model Description: Understand the context and training process of the Flux DreamBooth LoRA model.
LoRA and its Applications: Learn about the concept of LoRA and its use cases in text-to-image synthesis.
Trigger Words and Image Generation: Discover how to use trigger words to generate images with the Flux DreamBooth LoRA model.
Loading and Using the Model: Understand how to load and use the linoyts/3d_icon_flux_1500 DreamBooth LoRA weights with the diffusers library.
Limitations and Bias: Explore the potential limitations and biases of the model, as well as strategies for addressing them.
Training Details: Learn about the data used to train the model and its implications for the generated images.
Please select one of the above options.
https://huggingface.co/linoyts/3d_icon_flux_1500#flux-dreambooth-lora---linoyts3d_icon_flux_1500


12:25 · Aug 16, 2024 · Fri

Welcome to our interactive lesson on FLUX.1 dev SimpleTuner Test!

I'm excited to help you learn about this topic. Here's a brief overview:

What is FLUX.1 dev SimpleTuner Test?

FLUX.1 dev SimpleTuner Test is a LoRA (Low-Rank Adaptation) adapter for a pre-trained model. LoRA adapts a model by training a small set of added low-rank weights while leaving the original weights frozen. In this case, the pre-trained model is FLUX.1 dev, an image generation model.

What do you want to learn about FLUX.1 dev SimpleTuner Test?

Here are some options:

What is LoRA and how does it work?
How is FLUX.1 dev SimpleTuner Test different from other image generation models?
What are the key settings and parameters used to train FLUX.1 dev SimpleTuner Test?
How can I use FLUX.1 dev SimpleTuner Test to generate images?
Something else (please specify)
Let me know what you're interested in learning, and I'll do my best to explain it in a way that's easy to understand!

https://huggingface.co/markury/FLUX-dev-LoRA-test


12:24 · Aug 16, 2024 · Fri

Flux DreamBooth LoRA - merve/flux-lego-lora-dreambooth
Model description
These are merve/flux-lego-lora-dreambooth DreamBooth LoRA weights for black-forest-labs/FLUX.1-dev.

The weights were trained using DreamBooth with the Flux diffusers trainer.

Was LoRA for the text encoder enabled? False.

Trigger words
You should use “lego set in style of TOK” to trigger the image generation.

Download model
Download the *.safetensors LoRA in the Files & versions tab.

Use it with the 🧨 diffusers library
https://huggingface.co/merve/flux-lego-lora-dreambooth
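The model card's own snippet is not reproduced in this post. A minimal sketch of the usual diffusers pattern is given here instead: load the base pipeline, attach the LoRA weights, and include the trigger phrase in the prompt. It assumes a CUDA GPU with enough memory and access to the gated FLUX.1-dev weights; the way the prompt is assembled is illustrative.

```python
TRIGGER = "lego set in style of TOK"  # trigger phrase from the model card


def build_prompt(subject: str) -> str:
    """Append the trigger phrase so the LoRA's concept is activated."""
    return f"{subject}, {TRIGGER}"


def generate(subject: str):
    """Generate one image (downloads weights; needs a GPU and HF access)."""
    # Heavy imports are done lazily so build_prompt works without them.
    import torch
    from diffusers import AutoPipelineForText2Image

    pipe = AutoPipelineForText2Image.from_pretrained(
        "black-forest-labs/FLUX.1-dev", torch_dtype=torch.bfloat16
    ).to("cuda")
    pipe.load_lora_weights("merve/flux-lego-lora-dreambooth")
    return pipe(build_prompt(subject)).images[0]
```

For example, `generate("a medieval castle").save("castle.png")` would write the result to disk.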


12:24 · Aug 16, 2024 · Fri

flux-dreambooth-lora-r16-dev-cfg1
This is a LoRA derived from black-forest-labs/FLUX.1-dev.

The main validation prompt used during training was:

julie, in photograph style

Validation settings
CFG: 3.0
CFG Rescale: 0.0
Steps: 20
Sampler: None
Seed: 420420420
Resolution: 512
Note: The validation settings are not necessarily the same as the training settings.
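To reproduce something close to these validation settings with diffusers, the values map onto pipeline call arguments roughly as sketched below. The mapping is an assumption: "Resolution: 512" is read as a square 512x512 image, and "CFG Rescale" and "Sampler" are left out because no corresponding argument is assumed here.

```python
VALIDATION = {"cfg": 3.0, "steps": 20, "seed": 420420420, "resolution": 512}


def to_pipeline_kwargs(settings: dict) -> dict:
    """Translate the listed validation settings into pipeline call kwargs."""
    return {
        "guidance_scale": settings["cfg"],
        "num_inference_steps": settings["steps"],
        "height": settings["resolution"],  # assumes a square image
        "width": settings["resolution"],
    }


kwargs = to_pipeline_kwargs(VALIDATION)
```

The seed would be supplied separately, for example as `generator=torch.Generator("cpu").manual_seed(VALIDATION["seed"])`, alongside the validation prompt "julie, in photograph style".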

You can find some example images in the following gallery:
https://huggingface.co/ptx0/flux-dreambooth-lora-r16-dev-cfg1

