HF-hub

Share and discover more about AI with social posts from the community. huggingface/OpenAi

14:08 · Aug 16, 2024 · Fri

Demo: https://huggingface.co/spaces/Gradio-Community/Text-guided-Flux-Inpainting

Shout out to @Gothos03 and @skalskip92 for enabling and showcasing Inpainting with Flux 🙌

huggingface.co

Text Guided Flux Inpainting - a Hugging Face Space by Gradio-Community

Discover amazing ML apps made by the community

14:08 · Aug 16, 2024 · Fri

UniPortrait is exclusively released with a Gradio demo on @huggingface Spaces! 🤩

Code not released yet. Learn more from their Project page: https://aigcdesigngroup.github.io/UniPortrait-Page/

Access the demo at: https://huggingface.co/spaces/Junjie96/UniPortrait

aigcdesigngroup.github.io

UniPortrait

Official website of 'UniPortrait: A Unified Framework for Identity-Preserving Single- and Multi-Human Image Personalization'

14:07 · Aug 16, 2024 · Fri

Model on 🤗 Hub: https://huggingface.co/THUDM/LongWriter-glm4-9b

A Gradio demo is included in the repo for local use and is linked from the project README: https://github.com/THUDM/LongWriter?tab=readme-ov-file#%EF%B8%8F-longwriter-deployment

Clone the repo and launch the gradio demo: python trans_web_demo.py 🤠

Demo releasing soon on 🤗 Spaces, stay tuned!

huggingface.co

THUDM/LongWriter-glm4-9b · Hugging Face

We’re on a journey to advance and democratize artificial intelligence through open source and open science.

14:07 · Aug 16, 2024 · Fri

We're grateful for our amazing community! Harry C (hoveychen on GitHub) has created "gradio-i18n", a package to easily localize Gradio apps!

This solves a real developer need. Gradio developers, don't hesitate to take your apps global.

Learn more: https://github.com/gradio-app/gradio/issues/2465#issuecomment-2290969699

GitHub

Allow users to define their own i18n (e.g. for button value text) · Issue #2465 · gradio-app/gradio

Describe the bug See Button.svelte#L14, the value of the Button components has always try to translated by svelte-i18n, but I have not found the way to custom the locale dictionaries, that caused a...

14:06 · Aug 16, 2024 · Fri

Online demos for BiRefNet on @huggingface Spaces!

Is this the best background removal model out there? 🤯
MIT licensed. 5.5 GB of GPU memory is needed for inference on 1024x1024 images. 🤩
BiRefNet

🔥 Gradio Demo 1 with ImageSlider output: https://huggingface.co/spaces/not-lain/background-removal

Gradio demo 2 by the author 🙌
https://huggingface.co/spaces/ZhengPeng7/BiRefNet_demo

huggingface.co

Background Removal - a Hugging Face Space by not-lain


14:05 · Aug 16, 2024 · Fri

NEW and Hot: AuraSR Upscaler

- 600M parameters
- Based on GigaGAN paper from Adobe
- GANs are much faster than diffusion upscaling
- Upscaling to 1024px in 1/4th of a second

Model and demo are up on the Hugging Face Hub. Great work by Fal AI and Gokay Aydogan, respectively.
🔥 AuraSR is a GAN-based Super-Res upscaler for generated images, a variation of the GigaGAN paper for image-conditioned upscaling.

Demo by @NONDA30: https://huggingface.co/spaces/gokaygokay/AuraSR

🔥 Torch implementation is based on the unofficial lucidrains/gigagan-pytorch repository: https://github.com/lucidrains/gigagan-pytorch?ref=blog.fal.ai

huggingface.co

AuraSR-v2 - a Hugging Face Space by gokaygokay


13:59 · Aug 16, 2024 · Fri

How to run Yi chat models with an API
Posted November 23, 2023 by @nateraw

The Yi series models are large language models trained from scratch by developers at 01.AI. Today, they’ve released two new models: Yi-6B-Chat and Yi-34B-Chat. These models extend the base models, Yi-6B and Yi-34B, and are fine-tuned for chat completion.

Yi-34B currently holds the state of the art on most benchmarks, beating larger models like Llama-70B.

13:58 · Aug 16, 2024 · Fri

Run Code Llama 70B with an API
Posted January 30, 2024 by @cbh123

Code Llama is a code generation model built on top of Llama 2. It can generate code and natural language about code in many programming languages, including Python, JavaScript, TypeScript, C++, Java, PHP, C#, Bash and more.

Today, Meta announced a more powerful new version of Code Llama with 70 billion parameters. It’s one of the highest performing open models. Meta reports a 67.8 on HumanEval, which beats zero-shot GPT-4.

With Replicate, you can run Code Llama 70B in the cloud with one line of code.

Code Llama 70B variants
There are three variants of Code Llama 70B. The code snippets in this guide use codellama-70b-instruct, but all three variants are available on Replicate:

- Code Llama 70B Base is the foundation model.
- Code Llama 70B Python is trained on Python code.
- Code Llama 70B Instruct is fine-tuned for understanding natural language instructions.

13:58 · Aug 16, 2024 · Fri

Run Snowflake Arctic with an API
Posted April 23, 2024 by @cbh123

Snowflake Arctic is a new open-source language model from Snowflake. Arctic is on par with or better than both Llama 3 8B and Llama 2 70B on all metrics while using less than half of the training compute budget.

It's massive. At 480B, Arctic is the biggest open-source model to date. As expected from a model from Snowflake, it's good at SQL and other coding tasks, and it has a liberal Apache 2.0 license.

With Replicate, you can run Arctic in the cloud with one line of code.

13:58 · Aug 16, 2024 · Fri

Picking an SD3 version
Stability AI have packaged up SD3 Medium in different ways to make sure it can run on as many devices as possible.

SD3 uses three different text encoders. (The text encoder is the part that takes your prompt and puts it into a format the model can understand). One of these new text encoders is really big – meaning it uses a lot of memory. If you’re looking at the SD3 Hugging Face weights, you’ll see four options with different text encoder configurations. You should choose which one to use based on your available VRAM.

sd3_medium_incl_clips_t5xxlfp8.safetensors
This file contains the model weights, the two CLIP text encoders and the large T5-XXL model in a compressed fp8 format. We recommend these weights for simplicity and best results.

sd3_medium_incl_clips_t5xxlfp16.safetensors
The same as sd3_medium_incl_clips_t5xxlfp8.safetensors, except the T5 part isn’t compressed as much. By using fp16 instead of fp8, you’ll get a slight improvement in your image quality. This improvement comes at the cost of higher memory usage.

sd3_medium_incl_clips.safetensors
This version does away with the T5 element altogether. It includes the weights with just the two CLIP text encoders. This is a good option if you do not have much VRAM, but your results might be very different from the full version. You might notice that this version doesn’t follow your prompts as closely, and it may also reduce the quality of text in images.

sd3_medium.safetensors
This model is just the base weights without any text encoders. If you use these weights, make sure you’re loading the text encoders separately. Stability AI have provided an example ComfyUI workflow for this.
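
The four options above amount to a small decision tree, which can be sketched in Python. This is a minimal sketch: the function name and its parameters are hypothetical, and only the filenames come from the text. It deliberately takes yes/no choices rather than VRAM numbers, because the post gives no thresholds.

```python
def pick_sd3_weights(bundle_text_encoders=True, include_t5=True, t5_precision="fp8"):
    """Return the SD3 Medium checkpoint filename matching the options described above.

    All parameter names are illustrative assumptions; only the filenames
    come from the text.
    """
    if not bundle_text_encoders:
        # Base weights only: the text encoders must be loaded separately.
        return "sd3_medium.safetensors"
    if not include_t5:
        # Two CLIP encoders only: lowest memory use, weaker prompt following.
        return "sd3_medium_incl_clips.safetensors"
    if t5_precision not in ("fp8", "fp16"):
        raise ValueError("t5_precision must be 'fp8' or 'fp16'")
    # fp8 is the recommended default; fp16 trades memory for slightly better quality.
    return f"sd3_medium_incl_clips_t5xxl{t5_precision}.safetensors"


print(pick_sd3_weights())                  # recommended default
print(pick_sd3_weights(include_t5=False))  # low-VRAM option
```

Calling it with no arguments returns the fp8 bundle that the post recommends for simplicity.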

13:57 · Aug 16, 2024 · Fri

How to get the best results from Stable Diffusion 3?
Stability AI recently released the weights for Stable Diffusion 3 Medium, a 2 billion parameter text-to-image model that excels at photorealism, typography, and prompt following.

You can run the official Stable Diffusion 3 model on Replicate, and it is available for commercial use. We have also open-sourced our Diffusers and ComfyUI implementations (read our guide to ComfyUI).

In this blog post we’ll show you how to use Stable Diffusion 3 (SD3) to get the best images, including how to prompt SD3, which is a bit different from previous Stable Diffusion models.

To help you experiment, we’ve created an SD3 explorer model that exposes all of the settings we discuss here.
https://d31rfu1d3w8e4q.cloudfront.net/static/blog/get-the-best-from-stable-diffusion-3/explorer-screenshot.png

13:57 · Aug 16, 2024 · Fri

What makes FLUX.1 special?
FLUX.1 models have state-of-the-art performance in prompt following, visual quality, image detail, and output diversity. Here are some particular areas where we’ve been impressed:

Text! Unlike older models that often messed up similar-looking letters, Flux can handle tricky words with repeated letters. This makes it great for designs where text needs to be accurate. Check out this Black Forest Flux Schnell gateau:
https://d31rfu1d3w8e4q.cloudfront.net/static/blog/flux/cake-text.png

13:56 · Aug 16, 2024 · Fri

How to fine-tune: Focus on effective datasets
This is the third blog post in a series about adapting open source large language models (LLMs). In this post, we explore some rules of thumb for curating a good training dataset.

In Part 1, we took a look at prevalent approaches for adapting language models to domain data.
In Part 2, we discussed how to determine if fine-tuning is the right approach for your use case.

Introduction

Fine-tuning LLMs is a mix of art and science, with best practices in the field still emerging. In this blog post, we’ll highlight design variables for fine-tuning and give directional guidance on best practices we’ve seen so far to fine-tune models with resource constraints. We recommend using the information below as a starting point to strategize your fine-tuning experiments.

Full fine-tuning vs. parameter-efficient fine-tuning (PEFT)

Both full fine-tuning and PEFT have shown improvements in downstream performance when applied to new domains in both academic and practical settings. Choosing one boils down to compute available (in GPU hours and GPU memory), performance on tasks other than the target downstream task (the learning-forgetting tradeoff) and human annotation costs.

Full fine-tuning is more prone to two problems: model collapse and catastrophic forgetting. Model collapse is where the model's output converges to a limited set of outputs and the tail of the original content distribution disappears. Catastrophic forgetting, as discussed in Part 1 of this series, leads to the model losing its abilities. Some early empirical studies show that full fine-tuning techniques are more prone to the above-mentioned issues than PEFT techniques, though more research needs to be done.

PEFT techniques serve as natural regularizers for fine-tuning by design. PEFT often costs relatively less compute to train a downstream model and is much more accessible for a resource-constrained scenario with limited dataset sizes. In some cases, full fine-tuning has shown better performance at the specific task of interest, often at the cost of forgetting some of the capabilities of the original model. This “learning-forgetting” tradeoff between the specific downstream task performance and performance on other tasks is explored deeply in the comparison of LoRA and full fine-tuning in this paper.
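
To make the compute difference concrete, here is a rough parameter count for LoRA, the PEFT method named above, against full fine-tuning of a single weight matrix. This is a minimal sketch: the layer size and rank are hypothetical values chosen for illustration, and real savings depend on which layers are adapted.

```python
def full_params(d_out, d_in):
    """Trainable parameters when the whole d_out x d_in weight matrix is updated."""
    return d_out * d_in


def lora_params(d_out, d_in, rank):
    """Trainable parameters for a rank-`rank` LoRA update W + B @ A.

    B has shape (d_out, rank) and A has shape (rank, d_in); W stays frozen.
    """
    return rank * (d_out + d_in)


# Hypothetical layer size and rank, chosen only for illustration.
d_out, d_in, rank = 4096, 4096, 8
full = full_params(d_out, d_in)
lora = lora_params(d_out, d_in, rank)
print(f"full: {full:,}  LoRA: {lora:,}  reduction: {full // lora}x")
```

The low-rank update is also why PEFT acts as a regularizer: the change to each weight matrix is constrained to rank `rank`.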

Given resource constraints, PEFT techniques will likely give a better performance boost/cost ratio as compared to full fine-tuning. If downstream performance is of paramount importance with resource constraints, full fine-tuning will be the most effective. In either scenario, the key is to create a high-quality dataset keeping the following key principles in mind.

13:56 · Aug 16, 2024 · Fri

How NVIDIA is using structured weight pruning and knowledge distillation to build new Llama models
Large language models like Llama can move with impressive speed and precision to handle a variety of challenging tasks, such as generating code, solving math problems, and helping doctors make life-saving medical decisions. Open source models are already leading to incredible breakthroughs across disciplines—however, they’re resource-intensive to deploy. It’s important that we work collaboratively across the industry to make it even easier for people to tap into the game-changing potential of LLMs.

Last month, we announced Llama 3.1, which includes our largest model yet, the 405B, as well as two smaller models with 70 billion and 8 billion parameters, respectively. Smaller models from a larger relative are typically cheaper to deploy to the masses and perform well across many language tasks. In a new research paper, our partners at NVIDIA explore how various large models can be made smaller using structured weight pruning and knowledge distillation—without having to train a new model from scratch. Working with Llama 3.1 8B, the team shares how it created Llama-Minitron 3.1 4B, its first work within the Llama 3.1 open source family.

Learn more about this work, and get the pruning and distillation strategy and additional resources by reading NVIDIA’s blog post.
https://ai.meta.com/blog/nvidia-llama/
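
For readers new to knowledge distillation, the core idea is to train the small (student) model to match the large (teacher) model's output distribution. Below is a minimal sketch of the generic logit-distillation loss in plain Python. It is an illustration of the general technique, not necessarily the exact objective NVIDIA used; the function names and the temperature value are assumptions.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of logits."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """KL(teacher || student) on temperature-softened distributions.

    The T^2 factor follows the common convention that keeps gradient
    magnitudes comparable across temperatures.
    """
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return temperature ** 2 * kl


# The loss is zero when the student reproduces the teacher's logits
# and grows as the two distributions diverge.
print(distillation_loss([1.0, 2.0, 3.0], [1.0, 2.0, 3.0]))
print(distillation_loss([1.0, 2.0, 3.0], [3.0, 2.0, 1.0]))
```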

Meta AI


Our partners at NVIDIA explain how they used structured weight pruning and model distillation to create Llama-Minitron 3.1 4B—their first work within the Llama 3.1 open source collection of models.

13:54 · Aug 16, 2024 · Fri

FLUX.1: First Impressions
FLUX.1 is a new AI model (available on Replicate) that makes images from text. Unlike most text-to-image models, which rely on diffusion, FLUX.1 uses an upgraded technique called “flow matching.”

While diffusion models create images by gradually removing noise from a random starting point, flow matching takes a more direct approach, learning the precise transformations needed to map noise onto a realistic image. This difference in methodology leads to a distinct aesthetic and unique advantages in terms of speed and control.
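
The idea of learning a direct mapping from noise to image can be illustrated with a toy example. The sketch below assumes a straight-line path between a noise vector and a data vector, which is one common choice in flow matching; the post does not say which path FLUX.1 uses, and all function names here are hypothetical. A real model would replace the exact velocity with a trained network.

```python
def interpolate(x0, x1, t):
    """Point on the straight-line path from noise x0 (t=0) to data x1 (t=1)."""
    return [(1 - t) * a + t * b for a, b in zip(x0, x1)]


def target_velocity(x0, x1):
    """Regression target for the velocity network on the straight-line path."""
    return [b - a for a, b in zip(x0, x1)]


def euler_sample(x0, velocity_fn, steps=10):
    """Integrate dx/dt = velocity_fn(x, t) from t=0 to t=1 with Euler steps."""
    x, dt = list(x0), 1.0 / steps
    for i in range(steps):
        v = velocity_fn(x, i * dt)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x


# Toy check: with the exact velocity in place of a trained network,
# sampling carries the noise vector onto the data vector.
noise, data = [0.5, -1.0, 2.0], [1.0, 0.0, -1.0]
v_true = target_velocity(noise, data)
sample = euler_sample(noise, lambda x, t: v_true, steps=4)
print(sample)
```

Because the path is a straight line, only a few integration steps are needed in this toy case, which hints at where the speed advantage comes from.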

We were curious to see how this approach impacts the generated images, so we fed it a variety of prompts, many created by other AI models. Here are some observations:

Text: It gets it (mostly)
One of the challenges in text-to-image generation is accurately translating words into visual representations. FLUX.1 handles this surprisingly well, even in complex scenarios like memes.

Prompt:

Photograph of letterpress serif type on thick rough creamy paper saying ‘REPLICATE.COM’

https://d31rfu1d3w8e4q.cloudfront.net/static/blog/flux-first-impressions/letterpress.webp

13:52 · Aug 16, 2024 · Fri

Meta 3D AssetGen: Text-to-Mesh Generation with High-Quality Geometry, Texture, and PBR Materials
We present Meta 3D AssetGen (AssetGen), a significant advancement in text-to-3D generation which produces faithful, high-quality meshes with texture and material control. Compared to works that bake shading in the 3D object's appearance, AssetGen outputs physically-based rendering (PBR) materials, supporting realistic relighting. AssetGen first generates several views of the object with factored shaded and albedo appearance channels, and then reconstructs colours, metalness and roughness in 3D, using a deferred shading loss for efficient supervision. It also uses a signed distance function to represent 3D shape more reliably and introduces a corresponding loss for direct shape supervision. This is implemented using fused kernels for high memory efficiency. After mesh extraction, a texture refinement transformer operating in UV space significantly improves sharpness and details. AssetGen achieves a 17% improvement in Chamfer Distance and 40% in LPIPS over the best concurrent work for few-view reconstruction, and a human preference of 72% over the best industry competitors of comparable speed, including those that support PBR. Project page with generated assets: https://assetgen.github.io.

Yawar Siddiqui, Tom Monnier, Filippos Kokkinos, Mahendra Kariya, Yanir Kleiman, Emilien Garreau, Oran Gafni, Natalia Neverova, Andrea Vedaldi, Roman Shapovalov, David Novotny
https://ai.meta.com/research/publications/meta-3d-assetgen-text-to-mesh-generation-with-high-quality-geometry-texture-and-pbr-materials/
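
Chamfer Distance, the geometry metric quoted in the abstract, measures how far two point sets are from each other. The sketch below shows one common definition in plain Python. The paper's exact variant is not stated here, so treat the averaging and the use of unsquared distances as assumptions.

```python
import math


def chamfer_distance(points_a, points_b):
    """Symmetric Chamfer distance between two non-empty 3D point sets.

    Uses the mean nearest-neighbour Euclidean distance in each direction;
    other conventions (squared distances, sums) are also common.
    """
    def one_way(src, dst):
        return sum(min(math.dist(p, q) for q in dst) for p in src) / len(src)

    return one_way(points_a, points_b) + one_way(points_b, points_a)


# Identical sets are at distance zero; a missing point is penalised.
square = [(0, 0, 0), (1, 0, 0), (1, 1, 0), (0, 1, 0)]
print(chamfer_distance(square, square))
print(chamfer_distance(square, square[:3]))
```

Lower is better, so the 17% improvement reported above means reconstructed surfaces sit closer to the ground-truth geometry.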

assetgen.github.io

Meta 3D AssetGen


13:52 · Aug 16, 2024 · Fri

The Llama 3 Herd of Models
Modern artificial intelligence (AI) systems are powered by foundation models. This paper presents a new set of foundation models, called Llama 3. It is a herd of language models that natively support multilinguality, coding, reasoning, and tool usage. Our largest model is a dense Transformer with 405B parameters and a context window of up to 128K tokens. This paper presents an extensive empirical evaluation of Llama 3. We find that Llama 3 delivers comparable quality to leading language models such as GPT-4 on a plethora of tasks. We publicly release Llama 3, including pre-trained and post-trained versions of the 405B parameter language model and our Llama Guard 3 model for input and output safety. The paper also presents the results of experiments in which we integrate image, video, and speech capabilities into Llama 3 via a compositional approach. We observe this approach performs competitively with the state-of-the-art on image, video, and speech recognition tasks. The resulting models are not yet being broadly released as they are still under development.

Llama team https://ai.meta.com/research/publications/the-llama-3-herd-of-models/