Share and discover more about AI with social posts from the community.
Hugging Face Text to Image (Prompt)

Detailed Configuration Options
model_endpoint: (Required) Specifies the endpoint of the model used for image generation. The default base URL is https://api-inference.huggingface.co; providing just a model name resolves against it, while a full URL such as 'https://api-inference.huggingface.co/models/stabilityai/stable-diffusion-2-1' overrides the default. API requests for image generation are sent to this endpoint.

asset_folder: (Required) Designates the folder where generated images are saved, for example '/AI Generated'.

prompt_template: (Optional) A Twig template that builds the prompt sent to the image generation model, combining the input fields into a coherent description. The selected element is passed as "subject". If left empty, the user must enter the initial prompt manually.

filename_template: (Optional) A Twig template to generate the filename dynamically.

parameters: (Optional) Contains additional parameters for the image generation process:

height: (Optional) Specifies the height of the generated image in pixels.
width: (Optional) Specifies the width of the generated image in pixels.
negative_prompt: (Optional) A Twig template specifying descriptions to avoid in the generated images.
guidance_scale: (Optional) Determines how closely the generated image should adhere to the prompt as opposed to the model's own creativity.
num_inference_steps: (Optional) Sets the number of steps the model undergoes to refine the generated image. Higher values can lead to more detailed images.
options: (Optional) Contains additional options for the image generation process:

use_cache: (Optional, default: true) Reuses previously generated images for similar requests to accelerate response times. Setting this to false ensures a new image is generated for each request, enhancing uniqueness but potentially increasing wait times.

These parameters and options are passed through to the Serverless Inference API; a request sketch follows this list.
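To make the mapping concrete, here is a minimal Python sketch of the raw Serverless Inference API request these settings translate into. The prompt, negative prompt, and parameter values are illustrative placeholders; the Twig templating and asset storage are handled by the integration itself.

import os
import requests

API_URL = "https://api-inference.huggingface.co/models/stabilityai/stable-diffusion-2-1"
headers = {"Authorization": "Bearer hf_..."}  # your Hugging Face API token

payload = {
    "inputs": "a lighthouse on a cliff at sunset, oil painting",  # the rendered prompt_template
    "parameters": {
        "height": 768,
        "width": 768,
        "negative_prompt": "blurry, low quality",  # the rendered negative_prompt template
        "guidance_scale": 7.5,
        "num_inference_steps": 50,
    },
    "options": {"use_cache": False},  # force a fresh image per request
}

response = requests.post(API_URL, headers=headers, json=payload)
response.raise_for_status()

# The API returns raw image bytes; store them in the configured asset folder.
os.makedirs("AI Generated", exist_ok=True)
with open("AI Generated/lighthouse.png", "wb") as f:
    f.write(response.content)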
Use HuggingFace Stable Diffusion Model to Generate Images from Text

I typed "Generate a picture illustrating AI for drawing a picture" into Bing's Copilot, which generated the picture above. Have you ever wondered how to build a model that generates pictures from a text prompt, like Copilot or DALL-E? In this article, I will show you step by step how to use a Hugging Face pre-trained Stable Diffusion model to generate images from text.

Install Hugging Face Transformers and Diffusers
At the beginning of a notebook (I used Google Colab free version with T4 GPU runtime), type the following code to install the necessary libraries:

!pip install --upgrade diffusers transformers -q
Import the Necessary Libraries
import torch

from diffusers import StableDiffusionPipeline  # text-to-image diffusion pipeline
from transformers import pipeline, set_seed    # set_seed makes generation reproducible
Set up an Attribute Class TTI
The TTI class specifies the Hugging Face model ID, the generative model type, and some related attributes.
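The article's full class isn't reproduced here, but continuing from the imports above, a minimal sketch consistent with that description might look like the following (attribute names and values are illustrative, not the author's exact code):

class TTI:
    # Bundles the settings for a text-to-image (TTI) run.
    model_id = "stabilityai/stable-diffusion-2-1"  # Hugging Face model ID
    model_type = "text-to-image"                   # generative model type
    dtype = torch.float16                          # half precision fits a free Colab T4
    device = "cuda"

set_seed(42)  # make the generation reproducible

pipe = StableDiffusionPipeline.from_pretrained(TTI.model_id, torch_dtype=TTI.dtype)
pipe = pipe.to(TTI.device)

image = pipe("a picture illustrating AI drawing a picture").images[0]
image.save("generated.png")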
FLUX Tarot v1
Model description
A tarot card LoRA trained on the public domain Rider-Waite 1920 card set. Dataset

Trained with the fal-ai trainer, which builds on ostris's open-source AI Toolkit.

Trigger words
You should include "in the style of TOK a trtcrd, tarot style" in your prompt to trigger the image style.

Download model
Weights for this model are available in Safetensors format.

Download them in the Files & versions tab.

Use it with the 🧨 diffusers library
multimodalart/flux-tarot-v1 · Hugging Face: https://huggingface.co/multimodalart/flux-tarot-v1
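A minimal sketch of loading the LoRA with diffusers, assuming access to the gated black-forest-labs/FLUX.1-dev base weights and a large-memory GPU (the prompt and settings are illustrative):

import torch
from diffusers import FluxPipeline

# Load the FLUX.1-dev base model, then apply the tarot LoRA on top.
pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev", torch_dtype=torch.bfloat16
)
pipe.load_lora_weights("multimodalart/flux-tarot-v1")
pipe.to("cuda")

# Include the trigger phrase so the tarot style kicks in.
prompt = "a knight holding a laptop, in the style of TOK a trtcrd, tarot style"
image = pipe(prompt, num_inference_steps=28, guidance_scale=3.5).images[0]
image.save("tarot_card.png")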
Breaking news!

You can now browse from a @huggingface model page to its:
- fine-tunes
- adapters
- merges
- quantized versions

and browse through the models' genealogy tree 🌲
uPass
uPass: AI tool for students to humanize academic work and bypass AI detectors with three tailored modes

uPass Introduction
uPass is an AI tool designed specifically for students, aiming to humanize academic work and bypass AI detectors. It offers three modes: Basic, Advanced, and Aggressive, allowing users to tailor the rewriting process to their needs. By using uPass, students can ensure their assignments and essays appear error-free and maintain integrity, while also bypassing Turnitin's AI detector and plagiarism checker. This tool is particularly useful for those looking to present their work in a more human-like manner, ensuring it meets academic standards without being flagged by AI detection systems.

uPass Features
uPass assists students in bypassing AI detectors and humanizing their academic work through its three rewriting modes.
GigaGAN: Large-scale GAN for Text-to-Image Synthesis
Can GANs also be trained on a large dataset for a general text-to-image synthesis task? We present our 1B-parameter GigaGAN, achieving lower FID than Stable Diffusion v1.5, DALL·E 2, and Parti-750M. It generates 512px outputs at 0.13s, orders of magnitude faster than diffusion and autoregressive models, and inherits the disentangled, continuous, and controllable latent space of GANs. We also train a fast upsampler that can generate 4K images from the low-res outputs of text-to-image models.
Disentangled Prompt Interpolation
GigaGAN comes with a disentangled, continuous, and controllable latent space.
In particular, it can achieve layout-preserving fine style control by applying a different prompt at fine scales.

Abstract
The recent success of text-to-image synthesis has taken the world by storm and captured the general public's imagination. From a technical standpoint, it also marked a drastic change in the favored architecture for generative image models. GANs used to be the de facto choice, with techniques like StyleGAN. With DALL·E 2, autoregressive and diffusion models became the new standard for large-scale generative models overnight. This rapid shift raises a fundamental question: can we scale up GANs to benefit from large datasets like LAION? We find that naïvely increasing the capacity of the StyleGAN architecture quickly becomes unstable. We introduce GigaGAN, a new GAN architecture that far exceeds this limit, demonstrating GANs as a viable option for text-to-image synthesis. GigaGAN offers three major advantages. First, it is orders of magnitude faster at inference time, taking only 0.13 seconds to synthesize a 512px image. Second, it can synthesize high-resolution images, for example, 16-megapixel images in 3.66 seconds. Finally, GigaGAN supports various latent space editing applications such as latent interpolation, style mixing, and vector arithmetic operations.
Improved ControlNet!
Now supports dynamic resolution for perfect landscape and portrait outputs. Generate stunning images without distortion—optimized for any aspect ratio!
FLUX.1-DEV Canny - a Hugging Face Space by DamarJati: https://huggingface.co/spaces/DamarJati/FLUX.1-DEV-Canny
Announcing another BIG data drop! This time it's ~275M images from Flickr
bigdata-pw/Flickr


Data acquisition for this project is still in progress, get ready for an update soon™

In case you missed them, other BIG data drops include Diffusion1B
bigdata-pw/Diffusion1B
~1.23B images and generation parameters from a variety of diffusion models. And if you fancy practicing diffusion model training, check out Dataception
bigdata-pw/Dataception
a dataset of over 5,000 datasets in WebDataset format!

Requests are always welcome so reach out if there's a dataset you'd like to see!
Build with real-time digital twins that speak, see, and hear

We're Quinn and Hassaan, co-founders of Tavus.

At Tavus, we build AI models and APIs that empower product development teams to build digital twin experiences with video generation and, officially as of today, real-time conversational video.

With the Conversational Video Interface, developers can now build with real-time digital twins that speak, see, & hear. It's the world's only end-to-end conversational pipeline with less than a second of latency.

» You can try talking to Carter, our Digital Twin, at www.tavus.io 🤖🤓
YC S24's @SimplexData creates photorealistic vision datasets rendered from 3D scenes for AI model training.

Submit a 30-second form, provide feedback on a few sample images, and receive gigabytes of labeled vision data.

https://www.ycombinator.com/launches/Lbx-simplex-on-demand-photorealistic-vision-datasets
Groq: Models Quality, Performance & Price
Analysis of Groq's models across key metrics including quality, price, output speed, latency, context window & more. This analysis is intended to support you in choosing the best model provided by Groq for your use case. For more details, including our methodology, see our FAQs. Models analyzed: Gemma 2 9B, Gemma 7B, Llama 3.1 70B, Llama 3 70B, Llama 3.1 8B, Llama 3 8B, and Mixtral 8x7B.
GPT-4o mini: Quality, Performance & Price Analysis
Analysis of OpenAI's GPT-4o mini and comparison to other AI models across key metrics including quality, price, performance (tokens per second & time to first token), context window & more. Click on any model to compare API providers for that model. For more details, including our methodology, see our FAQs.
For analysis of API providers, see the GPT-4o mini API Providers comparison.


Comparison Summary

Quality:
GPT-4o mini is higher quality than average, with an MMLU score of 0.82 and a Quality Index across evaluations of 88.
Price:
GPT-4o mini is cheaper than average, with a price of $0.26 per 1M tokens (blended 3:1; a worked check follows this summary).
GPT-4o mini input token price: $0.15 per 1M tokens; output token price: $0.60 per 1M tokens.
Speed:
GPT-4o mini is faster than average, with an output speed of 107.6 tokens per second.
Latency:
GPT-4o mini has lower latency than average, taking 0.54s to receive the first token (TTFT).
Context Window:
GPT-4o mini has a smaller context window than average, at 130k tokens.
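The "blended 3:1" figure presumably weights input tokens three to one against output tokens; under that assumption, the listed prices reproduce it:

# Blended price assuming a 3:1 input-to-output token weighting.
input_price = 0.15   # $ per 1M input tokens
output_price = 0.60  # $ per 1M output tokens
blended = (3 * input_price + 1 * output_price) / 4
print(f"${blended:.4f} per 1M tokens")  # $0.2625, reported rounded to $0.26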

GPT-4o mini - Quality, Performance & Price Analysis | Artificial Analysis: https://artificialanalysis.ai/models/gpt-4o-mini
Sonar 3.1 Large: Quality, Performance & Price Analysis
Analysis of Perplexity's Sonar 3.1 Large and comparison to other AI models across key metrics including quality, price, performance (tokens per second & time to first token), context window & more. Click on any model to compare API providers for that model. For more details, including our methodology, see our FAQs.
For analysis of API providers, see the Sonar 3.1 Large API Providers comparison.

Sonar 3.1 Large - Quality, Performance & Price Analysis | Artificial Analysis: https://artificialanalysis.ai/models/sonar-3-1-large-chat
Getty Images Partners with NVIDIA to Upgrade AI Image Generation Tool: Generate 4 Images in 6 Seconds

Global image repository giant Getty Images has partnered with tech titan NVIDIA to introduce a cutting-edge AI image generation tool. This is no ordinary upgrade; it represents a significant leap in speed, quality, and accuracy!

The new AI model can generate four images in approximately 6 seconds, doubling the speed of its predecessor. Imagine pressing the shutter and, in the blink of an eye, four beautiful high-definition images appear before you; the speed is almost unbelievable.

Key Points:

🚀 Ultra-fast Experience: New AI model generates 4 images in 6 seconds, doubling the speed!

🎨 Quality Leap: Adopts NVIDIA Edify architecture, significantly improving image quality and output speed.

🛠 Unlimited Creativity: Introduces AI image modification features, allowing for one-click element changes, canvas expansion, and more creative freedom.

These are the exciting upgrades brought by Getty Images and NVIDIA's collaboration on AI image generation tools. Let's look forward to how it will transform our creative world!

#GettyImages #NVIDIA #AIImageGeneration #Midjourney
Lenovo's Profits Increase for the Second Consecutive Quarter Driven by AI Demand

Lenovo's profits have grown for the second consecutive quarter, with the world's largest personal computer manufacturer emerging from an industry slump that has lasted for years and seizing the opportunity of artificial intelligence to drive growth.

According to financial reports, Lenovo's net profit reached $243 million in the three months ending June 30, a year-on-year increase of 38%. This achievement not only exceeded analysts' expectations but also breathed new life into Lenovo after years of industry downturn.

Key Points:

1. 🚀 Lenovo's second-quarter net profit reached $243 million, a 38% year-on-year increase, exceeding analyst expectations.

2. 💻 Total revenue of $15.45 billion, a 20% increase, mainly driven by computers and AI technology.

3. 🌍 Lenovo is actively expanding non-PC business, with non-PC revenue accounting for 47% of total sales, and infrastructure business sales growing by 65%.
#Lenovo #ArtificialIntelligence #Copilot #Snapdragon
OpenAI Launches SWE-bench Verified: Enhancing AI Software Engineering Capability Assessment
OpenAI announced the launch of SWE-bench Verified, a code generation evaluation benchmark, on August 13th. This new benchmark aims to more accurately assess the performance of AI models in software engineering tasks, addressing several limitations of the previous SWE-bench.

SWE-bench is an evaluation dataset based on real software issues from GitHub, containing 2,294 Issue-Pull Request pairs from 12 popular Python repositories. However, the original SWE-bench had three main issues: overly strict unit tests that could reject correct solutions, unclear problem descriptions, and unreliable development environment setup.
SoftBank and Intel Discuss Cooperation to Challenge Nvidia

Recently, it has been reported that SoftBank is in talks with Intel to explore a collaboration in the field of artificial intelligence (AI) chips, aiming to compete with the market leader NVIDIA. With the rapid advancement of AI technology, the demand for high-performance computing chips is growing stronger, and NVIDIA's dominant position in this sector is undoubtedly putting pressure on other companies.
Can Llama 8B Outsmart GPT-4o Using Search Engines?
Recently, a new study has generated excitement by demonstrating that Large Language Models (LLMs) can significantly enhance their performance through search. Notably, the Llama 3.1 model with only 8 billion parameters, after 100 search attempts, performed on par with GPT-4o on Python code generation tasks.

This idea is reminiscent of Rich Sutton's pioneering work in reinforcement learning, particularly his classic 2019 essay "The Bitter Lesson", which emphasized the power of general methods as computational capability grows, singling out "search" and "learning" as the two approaches that continue to scale.
Disrupting Tradition! Lumina-mGPT Can Create Realistic and High-Resolution Images from Text
Multimodal generative models are at the forefront of artificial intelligence, integrating visual and textual data to create systems capable of handling a variety of tasks. These tasks range from generating highly detailed images from textual descriptions to understanding and reasoning across different data types, driving the emergence of more interactive and intelligent AI systems that seamlessly combine vision and language.

A key challenge in this field is developing autoregressive (AR) models that can generate realistic images based on textual descriptions. While diffusion models have made significant strides in this area, AR models have lagged behind, particularly in terms of image quality, resolution flexibility, and the ability to handle various visual tasks. This gap has prompted researchers to seek innovative methods to enhance the capabilities of AR models.