HF Hub - Share and discover more about AI with social posts from the community.
GigaGAN: Large-scale GAN for Text-to-Image Synthesis
Can GANs also be trained on a large dataset for a general text-to-image synthesis task? We present our 1B-parameter GigaGAN, achieving lower FID than Stable Diffusion v1.5, DALL·E 2, and Parti-750M. It generates 512px outputs in 0.13s, orders of magnitude faster than diffusion and autoregressive models, and inherits the disentangled, continuous, and controllable latent space of GANs. We also train a fast upsampler that can generate 4K images from the low-res outputs of text-to-image models.
Disentangled Prompt Interpolation
GigaGAN comes with a disentangled, continuous, and controllable latent space.
In particular, it can achieve layout-preserving fine style control by applying a different prompt at fine scales.

Abstract
The recent success of text-to-image synthesis has taken the world by storm and captured the general public's imagination. From a technical standpoint, it also marked a drastic change in the favored architecture for designing generative image models. GANs used to be the de facto choice, with techniques like StyleGAN. With DALL·E 2, auto-regressive and diffusion models became the new standard for large-scale generative models overnight. This rapid shift raises a fundamental question: can we scale up GANs to benefit from large datasets like LAION? We find that naïvely increasing the capacity of the StyleGAN architecture quickly becomes unstable. We introduce GigaGAN, a new GAN architecture that far exceeds this limit, demonstrating GANs as a viable option for text-to-image synthesis. GigaGAN offers three major advantages. First, it is orders of magnitude faster at inference time, taking only 0.13 seconds to synthesize a 512px image. Second, it can synthesize high-resolution images, for example, 16-megapixel images in 3.66 seconds. Finally, GigaGAN supports various latent space editing applications such as latent interpolation, style mixing, and vector arithmetic operations.
Improved ControlNet!
Now supports dynamic resolution for perfect landscape and portrait outputs. Generate stunning images without distortion—optimized for any aspect ratio!
https://huggingface.co/spaces/DamarJati/FLUX.1-DEV-Canny
Announcing another BIG data drop! This time it's ~275M images from Flickr
bigdata-pw/Flickr


Data acquisition for this project is still in progress; get ready for an update soon™

In case you missed them, other BIG data drops include Diffusion1B
bigdata-pw/Diffusion1B
- ~1.23B images and generation parameters from a variety of diffusion models. And if you fancy practicing diffusion model training, check out Dataception
bigdata-pw/Dataception
- a dataset of over 5000 datasets in WebDataset format!

Requests are always welcome so reach out if there's a dataset you'd like to see!
Build with real-time digital twins that speak, see, and hear

We're Quinn and Hassaan, co-founders of Tavus.

At Tavus, we build AI models and APIs that empower product development teams to build digital twin experiences with video generation and, officially as of today, real-time conversational video.

With the Conversational Video Interface, developers can now build with real-time digital twins that speak, see, & hear. It's the world's only end-to-end conversational pipeline with less than a second of latency.

» You can try talking to Carter, our Digital Twin, at www.tavus.io 🤖🤓
YC S24's @SimplexData creates photorealistic vision datasets rendered from 3D scenes for AI model training.

Submit a 30-second form, provide feedback on a few sample images, and receive gigabytes of labeled vision data.

https://www.ycombinator.com/launches/Lbx-simplex-on-demand-photorealistic-vision-datasets
Groq: Models Quality, Performance & Price
Analysis of Groq's models across key metrics including quality, price, output speed, latency, context window & more. This analysis is intended to support you in choosing the best model provided by Groq for your use case. For more details, including our methodology, see our FAQs. Models analyzed: Gemma 2 9B, Gemma 7B, Llama 3.1 70B, Llama 3 70B, Llama 3.1 8B, Llama 3 8B, and Mixtral 8x7B.
Groq Model Comparison Summary
GPT-4o mini: Quality, Performance & Price Analysis
Analysis of OpenAI's GPT-4o mini and comparison to other AI models across key metrics including quality, price, performance (tokens per second & time to first token), context window & more. Click on any model to compare API providers for that model. For more details, including our methodology, see our FAQs.
For analysis of API providers, see the GPT-4o mini API Providers comparison.


Comparison Summary

Quality:
GPT-4o mini is of higher quality than average, with an MMLU score of 0.82 and a Quality Index across evaluations of 88.
Price:
GPT-4o mini is cheaper than average, with a price of $0.26 per 1M tokens (blended 3:1).
GPT-4o mini input token price: $0.15 per 1M tokens; output token price: $0.60 per 1M tokens.
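(A "blended 3:1" price assumes three input tokens for every output token: (3 × $0.15 + 1 × $0.60) / 4 = $0.2625, i.e. roughly $0.26 per 1M tokens.)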
Speed:
GPT-4o mini is faster than average, with an output speed of 107.6 tokens per second.
Latency:
GPT-4o mini has lower latency than average, taking 0.54s to receive the first token (time to first token, TTFT).
Context Window:
GPT-4o mini has a smaller context window than average, with a context window of 130k tokens.

https://artificialanalysis.ai/models/gpt-4o-mini
Sonar 3.1 Large: Quality, Performance & Price Analysis
Analysis of Perplexity's Sonar 3.1 Large and comparison to other AI models across key metrics including quality, price, performance (tokens per second & time to first token), context window & more. Click on any model to compare API providers for that model. For more details, including our methodology, see our FAQs.
For analysis of API providers, see the Sonar 3.1 Large API Providers comparison.

https://artificialanalysis.ai/models/sonar-3-1-large-chat
Getty Images Partners with NVIDIA to Upgrade AI Image Generation Tool: Generate 4 Images in 6 Seconds

Global image repository giant Getty Images has partnered with tech titan NVIDIA to introduce a cutting-edge AI image generation tool. This is no ordinary upgrade; it represents a significant leap in speed, quality, and accuracy!

The new AI model can generate four images in approximately 6 seconds, doubling the speed of its predecessor! Imagine pressing the shutter and, in the blink of an eye, four beautiful high-definition images appear before you; the speed is almost unbelievable.

Key Points:

🚀 Ultra-fast Experience: New AI model generates 4 images in 6 seconds, doubling the speed!

🎨 Quality Leap: Adopts NVIDIA Edify architecture, significantly improving image quality and output speed.

🛠 Unlimited Creativity: Introduces AI image modification features, allowing for one-click element changes, canvas expansion, and more creative freedom.

These are the exciting upgrades brought by Getty Images and NVIDIA's collaboration on AI image generation tools. Let's look forward to how it will transform our creative world!

#GettyImages #NVIDIA #AIImageGeneration #Midjourney
Lenovo's Profits Increase for the Second Consecutive Quarter Driven by AI Demand

Lenovo's profits have grown for the second consecutive quarter, with the world's largest personal computer manufacturer emerging from an industry slump that has lasted for years and seizing the opportunity of artificial intelligence to drive growth.

According to financial reports, Lenovo's net profit reached $243 million in the three months ending June 30, a year-on-year increase of 38%. This achievement not only exceeded analysts' expectations but also breathed new life into Lenovo after years of industry downturn.

Key Points:

1. 🚀 Lenovo's second-quarter net profit reached $243 million, a 38% year-on-year increase, exceeding analyst expectations.

2. 💻 Total revenue of $15.45 billion, a 20% increase, mainly driven by computers and AI technology.

3. 🌍 Lenovo is actively expanding non-PC business, with non-PC revenue accounting for 47% of total sales, and infrastructure business sales growing by 65%.
#Lenovo #ArtificialIntelligence #Copilot #Snapdragon
OpenAI Launches SWE-bench Verified: Enhancing AI Software Engineering Capability Assessment
OpenAI announced the launch of SWE-bench Verified, a code generation evaluation benchmark, on August 13th. This new benchmark aims to more accurately assess the performance of AI models in software engineering tasks, addressing several limitations of the previous SWE-bench.

SWE-bench is an evaluation dataset based on real software issues from GitHub, containing 2,294 Issue-Pull Request pairs from 12 popular Python repositories. However, the original SWE-bench had three main issues: overly strict unit tests that could reject correct solutions, unclear problem descriptions, and unreliable development environment setup.
SoftBank and Intel Discuss Cooperation to Challenge Nvidia

Recently, it has been reported that SoftBank is in talks with Intel to explore a collaboration in the field of artificial intelligence (AI) chips, aiming to compete with the market leader NVIDIA. With the rapid advancement of AI technology, the demand for high-performance computing chips is growing stronger, and NVIDIA's dominant position in this sector is undoubtedly putting pressure on other companies.
Can Llama 8B Outsmart GPT-4o Using Search Engines?
Recently, a new study has generated excitement by demonstrating that Large Language Models (LLMs) can significantly enhance their performance through search. Notably, the Llama 3.1 model with only 8 billion parameters, after 100 searches, performed on par with GPT-4o in Python code generation tasks.

This idea seems reminiscent of Rich Sutton's pioneering work in reinforcement learning, particularly his 2019 classic blog post, "The Bitter Lesson." He emphasized the power of general methods as computational capabilities improve, highlighting "search" and "learning" as excellent choices that can continue to scale.
Disrupting Tradition! Lumina-mGPT Can Create Realistic and High-Resolution Images from Text
Multimodal generative models are at the forefront of the latest trends in artificial intelligence, dedicated to integrating visual and textual data to create systems capable of handling a variety of tasks. These tasks range from generating high-detail images from textual descriptions to understanding and reasoning across different data types, driving the emergence of more interactive and intelligent AI systems that seamlessly combine vision and language.

A key challenge in this field is developing autoregressive (AR) models that can generate realistic images based on textual descriptions. While diffusion models have made significant strides in this area, AR models have lagged behind, particularly in terms of image quality, resolution flexibility, and the ability to handle various visual tasks. This gap has prompted researchers to seek innovative methods to enhance the capabilities of AR models.
Flux AI
AI image generation - create art at the click of a button.

Flux AI is an advanced text-to-image AI model developed by Black Forest Labs that employs a transformer-based flow model to generate high-quality images. Key advantages of this technology include outstanding visual quality, strict adherence to prompts, diverse dimensions/aspect ratios, and varied typography and outputs. Flux AI offers three variants: FLUX.1 [pro], FLUX.1 [dev], and FLUX.1 [schnell], each designed for different use cases and performance levels. Flux AI aims to make cutting-edge AI technology accessible to everyone by providing FLUX.1 [schnell] as a free open-source model, ensuring that individuals, researchers, and small developers can benefit from advanced AI technology without financial barriers.
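As an illustration, here is a minimal sketch of generating an image with FLUX.1 [schnell] through the 🤗 diffusers library; the Hub id black-forest-labs/FLUX.1-schnell, the prompt, and the sampler settings are assumptions for this example:

import torch
from diffusers import FluxPipeline

# Load the open-source schnell variant (assumed Hub id)
pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-schnell", torch_dtype=torch.bfloat16
)
pipe.enable_model_cpu_offload()  # offload idle components to CPU to save VRAM

# schnell is a timestep-distilled model: guidance is disabled and only a
# few inference steps are needed
image = pipe(
    "a photo of a red fox in a snowy forest",
    guidance_scale=0.0,
    num_inference_steps=4,
).images[0]
image.save("fox.png")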
A Brief History of HuggingFace
Founded in 2016, HuggingFace (named after the popular emoji 🤗) started as a chatbot company and later transformed into an open-source provider of NLP technologies. The chatbot, aimed at a teenage demographic, was focused on:
(...) building an AI so that you’re having fun talking with it. When you’re chatting with it, you’re going to laugh and smile — it’s going to be entertaining
- Clem Delangue, CEO & Co-founder
Like a Tamagotchi, the chatbot could talk coherently about a wide range of topics, detect emotions in text, and adapt its tone accordingly.
Underlying this chatbot, however, were HuggingFace's main strengths: in-house NLP models (one of which was called Hierarchical Multi-Task Learning, or HMTL) and a managed library of pre-trained NLP models. These would serve as the early backbone of the transformers library we know today.
The early PyTorch transformers library established compatibility between PyTorch and TensorFlow 2.0, enabling users to move easily from one framework to the other over the life of a model. Following the release of Google's "Attention Is All You Need" paper and the resulting shift to transformers in the NLP space, HuggingFace, which had already released parts of the powerful library powering its chatbot as an open-source project on GitHub, began to focus on open-sourcing popular large language models such as BERT and GPT in PyTorch.
With its most recent Series C funding round resulting in a $2 billion valuation, HuggingFace currently offers an ecosystem of models and datasets spread across its various tools, such as the HuggingFace Hub, transformers, diffusers, and more.
Advanced usage & custom training
Let's create our own logic for a more customized training.

✏️ Preparing a dataset
The dataset will vary based on the task you work on. Let's work on sequence classification!

Our dataset will be composed of sentences and their associated classes. For example, if you wanted to identify the subject of a conversation, you could create a dataset such as:

input                                       | class
The team scored a goal in the last seconds  | sports
The debate was heated between the 2 parties | politics
I've never tasted croissants so delicious!  | food
The objective of our trained model will be to correctly identify the class associated with new sentences.
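As a minimal sketch, such a toy dataset can be built directly with the 🤗 datasets library (the column names input and class mirror the table above):

from datasets import Dataset

# Build a tiny sequence classification dataset from the examples above
data = {
    "input": [
        "The team scored a goal in the last seconds",
        "The debate was heated between the 2 parties",
        "I've never tasted croissants so delicious!",
    ],
    "class": ["sports", "politics", "food"],
}
dataset = Dataset.from_dict(data)
print(dataset)  # Dataset({features: ['input', 'class'], num_rows: 3})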

🔎 Finding a dataset
If you don't have the right dataset, you can always explore the Datasets Hub. The "topic classification" category contains many datasets suitable for prototyping this model.

We select "Yahoo! Answers Topic Classification" and visualize it with the Datasets viewer.
🛠 Installation and set-up
We need the following 🤗 Hugging Face libraries:

transformers contains an API for training models and many pre-trained models
tokenizers is automatically installed by transformers and tokenizes our data (i.e. it converts text into sequences of numbers; see the short sketch below)
datasets contains a rich source of data and common metrics, perfect for prototyping
We also install wandb to automatically instrument our training.
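To make tokenization concrete, here is a quick sketch using a pre-trained tokenizer (the bert-base-uncased checkpoint is just an example choice):

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
# Convert a sentence into the sequence of token ids the model consumes
encoding = tokenizer("The team scored a goal in the last seconds")
print(encoding["input_ids"])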


!pip install datasets wandb evaluate accelerate -qU
!wget https://raw.githubusercontent.com/huggingface/transformers/master/examples/pytorch/text-classification/run_glue.py

# the run_glue.py script requires the development version of transformers
!pip install -q git+https://github.com/huggingface/transformers
Finally, we make sure we're logged into W&B so that our experiments can be associated with our account.


import wandb


wandb.login()
💡 Configuration tips
W&B integration with Hugging Face can be configured to add extra functionalities:

auto-logging of models as artifacts: just set the environment variable WANDB_LOG_MODEL to true
log histograms of gradients and parameters: gradients are logged by default; you can also log parameters by setting the environment variable WANDB_WATCH to all
set custom run names with the run_name argument, available in scripts or as part of TrainingArguments
organize runs by project with the WANDB_PROJECT environment variable
For more details refer to W&B + HF integration documentation.

Let's log every trained model.
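A minimal sketch, assuming the environment variables described above (the project name is a placeholder):

import os

# Upload each trained model to W&B Artifacts at the end of training
os.environ["WANDB_LOG_MODEL"] = "true"
# Group our runs under a single W&B project (placeholder name)
os.environ["WANDB_PROJECT"] = "sequence-classification"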
Optimize 🤗 Hugging Face models with Weights & Biases
Hugging Face provides tools to quickly train neural networks for NLP (Natural Language Processing) on any task (classification, translation, question answering, etc.) and any dataset with PyTorch and TensorFlow 2.0.

Coupled with the Weights & Biases integration, you can quickly train and monitor models for full traceability and reproducibility without any extra lines of code! You just need to install the library, sign in, and your experiments will automatically be logged:

pip install wandb
wandb login
Note: To enable logging to W&B, set report_to to wandb in your TrainingArguments or script.
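For instance, a minimal sketch of the relevant arguments (output_dir and run_name are placeholders, and the rest of the training setup is omitted):

from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="my-model",     # where checkpoints are written
    report_to="wandb",         # enable Weights & Biases logging
    run_name="bert-baseline",  # custom name for the W&B run
    logging_steps=50,          # log training metrics every 50 steps
    num_train_epochs=3,
)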

W&B integration with 🤗 Hugging Face can automatically:

log your configuration parameters
log your losses and metrics
log gradients and parameter distributions
log your model
keep track of your code
log your system metrics (GPU, CPU, memory, temperature, etc.)
Here's what the W&B interactive dashboard will look like.