HF-hub - Share and discover more about AI with social posts from the community. huggingface/OpenAi
I'm excited to announce that Transformers.js V3 is finally available on NPM! 🔥 State-of-the-art Machine Learning for the web, now with WebGPU support! 🤯⚡️

Install it from NPM with:
npm i @huggingface/transformers

or via CDN, for example: https://v2.scrimba.com/s0lmm0qh1q

Segment Anything demo:
webml-community/segment-anything-webgpu
Imagen 3
Published on Aug 14 · Submitted by akhaliq on Aug 14 · #2 Paper of the day
Authors: Imagen-Team-Google, Jason Baldridge, Jakob Bauer, Mukul Bhutani, Nicole Brichtova, Andrew Bunner, Kelvin Chan, Yichang Chen, Sander Dieleman, Yuqing Du, Zach Eaton-Rosen, Hongliang Fei, Nando de Freitas, Yilin Gao, Evgeny Gladchenko, Sergio Gómez Colmenarejo, Mandy Guo, Alex Haig, Will Hawkins, Hexiang Hu, Huilian Huang, Tobenna Peter Igwe, +229 authors
Abstract
We introduce Imagen 3, a latent diffusion model that generates high quality images from text prompts. We describe our quality and responsibility evaluations. Imagen 3 is preferred over other state-of-the-art (SOTA) models at the time of evaluation. In addition, we discuss issues around safety and representation, as well as methods we used to minimize the potential harm of our models.
LongWriter: Unleashing 10,000+ Word Generation from Long Context LLMs
Published on Aug 14 · Submitted by akhaliq on Aug 14 · #1 Paper of the day
Authors: Yushi Bai, Jiajie Zhang, Xin Lv, Linzhi Zheng, Siqi Zhu, Lei Hou, Yuxiao Dong, Jie Tang, Juanzi Li
Abstract
Current long context large language models (LLMs) can process inputs up to 100,000 tokens, yet struggle to generate outputs exceeding even a modest length of 2,000 words. Through controlled experiments, we find that the model's effective generation length is inherently bounded by the samples it has seen during supervised fine-tuning (SFT). In other words, their output limitation is due to the scarcity of long-output examples in existing SFT datasets. To address this, we introduce AgentWrite, an agent-based pipeline that decomposes ultra-long generation tasks into subtasks, enabling off-the-shelf LLMs to generate coherent outputs exceeding 20,000 words. Leveraging AgentWrite, we construct LongWriter-6k, a dataset containing 6,000 SFT examples with output lengths ranging from 2k to 32k words. By incorporating this dataset into model training, we successfully scale the output length of existing models to over 10,000 words while maintaining output quality. We also develop LongBench-Write, a comprehensive benchmark for evaluating ultra-long generation capabilities. Our 9B parameter model, further improved through DPO, achieves state-of-the-art performance on this benchmark, surpassing even much larger proprietary models. In general, our work demonstrates that existing long context LLMs already possess the potential for a larger output window -- all you need is data with extended output during model alignment to unlock this capability. Our code & models are at: https://github.com/THUDM/LongWriter
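The AgentWrite idea is easy to prototype. Below is a minimal plan-then-write sketch in Python; call_llm, the prompts, and the section count are hypothetical stand-ins, not the authors' actual pipeline.

# Illustrative AgentWrite-style loop: plan an outline, then generate each
# section with the tail of the running draft as context, letting an
# off-the-shelf LLM produce far longer coherent output than a single call.
def call_llm(prompt: str) -> str:
    raise NotImplementedError("plug in any chat-completion client here")

def agent_write(task: str, n_sections: int = 10) -> str:
    outline = call_llm(
        f"Write a {n_sections}-point outline for the task: {task}"
    ).splitlines()[:n_sections]

    draft: list[str] = []
    for point in outline:
        tail = "\n\n".join(draft)[-2000:]  # recent context only, to fit the window
        draft.append(call_llm(
            f"Task: {task}\nOutline point: {point}\n"
            f"Draft so far (tail):\n{tail}\n"
            "Write this section so it continues the draft coherently."
        ))
    return "\n\n".join(draft)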
InfinityMATH: A Scalable Instruction Tuning Dataset in Programmatic Mathematical Reasoning
Published on Aug 9 · Submitted by akhaliq on Aug 15
Authors: Bo-Wen Zhang, Yan Yan, Lin Li, Guang Liu
Abstract
Recent advancements in Chain-of-Thought (CoT) and Program-of-Thought (PoT) methods have greatly enhanced language models' mathematical reasoning capabilities, facilitating their integration into instruction tuning datasets with LLMs. However, existing methods for large-scale dataset creation require substantial seed data and high computational costs for data synthesis, posing significant challenges for scalability. We introduce InfinityMATH, a scalable instruction tuning dataset for programmatic mathematical reasoning. The construction pipeline emphasizes decoupling numbers from mathematical problems to synthesize number-independent programs, enabling efficient and flexible scaling while minimizing dependency on specific numerical values. Fine-tuning experiments with open-source language and code models, such as Llama2 and CodeLlama, demonstrate the practical benefits of InfinityMATH. These fine-tuned models showed significant relative improvements on both in-domain and out-of-domain benchmarks, ranging from 184.7% to 514.3% on average. Additionally, these models exhibited high robustness on the GSM8K+ and MATH+ benchmarks, which are enhanced versions of the test sets with simple number variations. InfinityMATH ensures that models are more versatile and effective across a broader range of mathematical problems. The data is available at https://huggingface.co/datasets/flagopen/InfinityMATH
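To make "number-independent programs" concrete, here is a toy illustration (mine, not from the dataset): the solution is a program parameterized over the numbers, so fresh training instances come from swapping values rather than re-synthesizing the reasoning.

# A toy number-independent program in the spirit of InfinityMATH (illustrative).
def solve(apples_per_basket: int, baskets: int, eaten: int) -> int:
    """Apples remaining after `eaten` are taken from `baskets` baskets
    holding `apples_per_basket` apples each."""
    return apples_per_basket * baskets - eaten

# One program, arbitrarily many instances by varying only the numbers:
for args in [(12, 5, 7), (30, 4, 25)]:
    print(solve(*args))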
Generative Photomontage
Published on Aug 14 · Submitted by akhaliq on Aug 15
Authors: Sean J. Liu, Nupur Kumari, Ariel Shamir, Jun-Yan Zhu
Abstract
Text-to-image models are powerful tools for image creation. However, the generation process is akin to a dice roll and makes it difficult to achieve a single image that captures everything a user wants. In this paper, we propose a framework for creating the desired image by compositing it from various parts of generated images, in essence forming a Generative Photomontage. Given a stack of images generated by ControlNet using the same input condition and different seeds, we let users select desired parts from the generated results using a brush stroke interface. We introduce a novel technique that takes in the user's brush strokes, segments the generated images using a graph-based optimization in diffusion feature space, and then composites the segmented regions via a new feature-space blending method. Our method faithfully preserves the user-selected regions while compositing them harmoniously. We demonstrate that our flexible framework can be used for many applications, including generating new appearance combinations, fixing incorrect shapes and artifacts, and improving prompt alignment. We show compelling results for each application and demonstrate that our method outperforms existing image blending methods and various baselines.
https://huggingface.co/papers/2408.07116
LGM Full
This custom pipeline encapsulates the full LGM pipeline, including multi-view diffusion.

It is provided as a resource for the ML for 3D Course.

Original LGM paper: LGM: Large Multi-View Gaussian Model for High-Resolution 3D Content Creation.
https://huggingface.co/Thever/LGM-Thever
[ECCV 2024] VFusion3D: Learning Scalable 3D Generative Models from Video Diffusion Models
Project page, Paper link

VFusion3D is a large, feed-forward 3D generative model trained with a small amount of 3D data and a large volume of synthetic multi-view data. It is the first work exploring scalable 3D generative/reconstruction models as a step towards a 3D foundation model.

VFusion3D: Learning Scalable 3D Generative Models from Video Diffusion Models
Junlin Han, Filippos Kokkinos, Philip Torr
GenAI, Meta and TVG, University of Oxford
European Conference on Computer Vision (ECCV), 2024

News
[08.08.2024] HF Demo is available; big thanks to Jade Choghari for making it possible.
[25.07.2024] Released weights and inference code for VFusion3D.
Quick Start
Getting started with VFusion3D is super easy! 🤗 Here’s how you can use the model with Hugging Face: https://huggingface.co/facebook/vfusion3d
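For example, something along these lines - a hedged sketch: it assumes the checkpoint ships custom modeling code loadable with trust_remote_code, and the input shape is a guess; see the model card for the exact entry point and pre/post-processing.

import torch
from transformers import AutoModel

# Assumed entry point: custom modeling code on the Hub (check the model card).
model = AutoModel.from_pretrained("facebook/vfusion3d", trust_remote_code=True)
model.eval()

# A single RGB image tensor as input (the shape here is an assumption).
image = torch.rand(1, 3, 512, 512)
with torch.no_grad():
    output = model(image)  # feed-forward 3D generation from one image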
Let’s see JEPA in action 🤖
A simplified image-based implementation that trains on a CPU with live preview support - very satisfying to watch :)

I-JEPA is the image-based version of JEPA (Joint-Embedding Predictive Architecture - an alternative to autoregressive LLM architectures) pioneered by Professor Yann LeCun.

At a high level, I-JEPA predicts image segment representations (targets) based on representations of other segments within the same image (context). It consists of three key components: a context encoder, a target encoder, and a predictor.

Code: https://github.com/Jaykef/ai-algorithms/blob/main/mnist_ijepa.ipynb
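For intuition, here is a minimal PyTorch sketch of the three components (my toy illustration, not the notebook's code): the patch dimensions, context/target split, and EMA momentum are arbitrary choices.

import torch
import torch.nn as nn

# Toy I-JEPA: a context encoder, an EMA target encoder, and a predictor.
class Encoder(nn.Module):
    def __init__(self, patch_dim=196, embed_dim=128):
        super().__init__()
        self.proj = nn.Linear(patch_dim, embed_dim)
        self.block = nn.TransformerEncoderLayer(embed_dim, nhead=4, batch_first=True)

    def forward(self, patches):                    # (B, N, patch_dim)
        return self.block(self.proj(patches))      # (B, N, embed_dim)

context_enc = Encoder()
target_enc = Encoder()
target_enc.load_state_dict(context_enc.state_dict())  # start as a copy
predictor = nn.Linear(128, 128)

patches = torch.randn(8, 16, 196)                 # e.g. 16 patches per image
ctx_idx, tgt_idx = torch.arange(0, 12), torch.arange(12, 16)

ctx_repr = context_enc(patches[:, ctx_idx])
with torch.no_grad():                              # targets give no gradient
    tgt_repr = target_enc(patches[:, tgt_idx])

# Predict target representations from (crudely pooled) context representations;
# the real model predicts per-position using mask tokens.
pred = predictor(ctx_repr.mean(dim=1, keepdim=True)).expand_as(tgt_repr)
loss = nn.functional.mse_loss(pred, tgt_repr)
loss.backward()

with torch.no_grad():                              # EMA update of target encoder
    for p_t, p_c in zip(target_enc.parameters(), context_enc.parameters()):
        p_t.mul_(0.996).add_(p_c, alpha=0.004)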
Introducing Fineweb-Edu-Fortified: An enhanced Fineweb-Edu dataset. 📚

This dataset is tailored for NLP tasks and helps streamline model training by offering a more refined, deduplicated dataset. Perfect for startups and researchers looking for high-quality educational content to train, evaluate, or fine-tune AI models. The dataset is based on the Fineweb-Edu subset of the large Fineweb dataset and includes:

- Exact-match deduplication across all crawls
- Embeddings for each row using the TaylorAI/bge-micro model
- Count column indicating duplication frequency
- Includes data from 95 Common Crawl crawls (2013-2024)
- Rows have been reduced from 1.279B to 0.324B after deduplication
- It comprises ~375B tokens (down from 1,320B in Fineweb-Edu)

Access the entire Fineweb-Edu-Fortified dataset on Hugging Face →
airtrain-ai/fineweb-edu-fortified
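To peek at the data programmatically, here is a minimal sketch with the 🤗 datasets library (streaming, so nothing is fully downloaded). The per-crawl config name and the column names are assumptions based on the description above; check the dataset card.

from datasets import load_dataset

# Stream one crawl's subset; "CC-MAIN-2024-10" is an assumed example config
# based on the per-crawl layout (95 crawls, 2013-2024) described above.
ds = load_dataset(
    "airtrain-ai/fineweb-edu-fortified",
    name="CC-MAIN-2024-10",
    split="train",
    streaming=True,
)

# "count" is the duplication-frequency column mentioned above; "text" is the
# usual Fineweb text column (assumed).
for row in ds.take(3):
    print(row["count"], row["text"][:80])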


Try a semantic search demo via this Hugging Face Space →
airtrain-ai/fineweb-edu-fortified-search-demo


Many thanks to the amazing @josh-sematic for his work on this project, the Fineweb/Fineweb-Edu team at Hugging Face for producing the original datasets and for their support during our work on Fineweb-Edu-Fortified, and also thanks to @underspirit for pointing out the reduction in dataset size that could be achieved via deduplication. 🤗
AuraSR Giga Upscaler V1 by SECourses - Upscales to 4x

AuraSR is a 600M parameter upsampler model derived from the GigaGAN paper. It runs very fast and uses limited VRAM (below 5 GB). It is a deterministic upscaler. It works perfectly on some images but fails on others, so it is worth giving it a shot.

GitHub official repo : https://github.com/fal-ai/aura-sr
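If you would rather skip the installers and call the library directly, here is a minimal sketch mirroring the README of the repo above (pip install aura-sr); the exact API may shift between versions.

from PIL import Image
from aura_sr import AuraSR

# Downloads the ~600M-parameter weights from the Hugging Face Hub on first run.
aura_sr = AuraSR.from_pretrained("fal/AuraSR")

image = Image.open("input.png").convert("RGB")
upscaled = aura_sr.upscale_4x(image)  # deterministic 4x upscale
upscaled.save("output_4x.png")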

I have developed 1-click installers and a batch upscaler App.

You can download the installers and the advanced batch app from the link below:
https://www.patreon.com/posts/110060645

Check the screenshots and examples below

Windows Requirements

Python 3.10, FFmpeg, Cuda 11.8, C++ tools and Git

If it doesn't work, make sure to follow the tutorial below and install everything exactly as shown:

https://youtu.be/-NjNy7afOQ0

How to Install and Use on Windows

Extract the attached GigaGAN_Upscaler_v1.zip into a folder like c:/giga_upscale

Then double click and install with Windows_Install.bat file

It will generate an isolated virtual environment (venv) folder and install the requirements

Then double click and start the Gradio App with Windows_Start_App.bat file

On the first run, it will download models into your Hugging Face cache folder

Hugging Face cache folder setup explained below

https://www.patreon.com/posts/108419878

All upscaled images will be saved into the outputs folder automatically, with the same name plus numbering if necessary

You can also batch upscale a folder

How to Install and Use on Cloud

Follow the Massed Compute and RunPod instructions

Usage is the same as on Windows

For Kaggle, start a Kaggle notebook, import our Kaggle notebook, and follow the instructions

App screenshots and examples are below
📸Photo LoRA Drop📸

I've been working on this one for a few days, but really I've had this dataset for a few years! I collected a bunch of open access photos online back in late 2022, but I was never happy enough with how they played with the base model!

I am so thrilled that they look so nice with Flux!

This for me is version one of this model - I still see room for improvement and possibly expansion of its 40-image dataset. For those who are curious:

40 images
3,200 steps
Dim 32
LR 3e-4

Enjoy! Create! Big thank you to Glif for sponsoring the model creation! :D

alvdansen/flux_film_foto
🚀 Introducing ChemVLM, the first open-source multimodal large language model dedicated to chemistry!
🌟 Performance comparable to commercial models and task-specific OCR models, but with dialogue capabilities!
2B/26B Models Here!
AI4Chem/ChemVLM-26B

Seeing and Understanding: Bridging Vision with Chemical Knowledge Via ChemVLM (2408.07246)
Came across this awesome interactive website today - an open-source project that explains everything about LLM Transformer models, providing a detailed, visual explanation of how they work.

A great resource for anyone looking to gain a deeper understanding of how Transformer-based AI models like GPT work, including:

- Self-attention mechanisms
- Encoder-decoder architecture
- Positional encoding
- Multi-head attention
https://poloclub.github.io/transformer-explainer/
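As a taste of what the site visualizes, here is a few-line PyTorch sketch of single-head scaled dot-product self-attention (illustrative only; the site itself walks through the full GPT-2 architecture).

import math
import torch

def self_attention(x, w_q, w_k, w_v):
    # Project tokens to queries, keys, values; mix values by softmaxed scores.
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    scores = q @ k.transpose(-2, -1) / math.sqrt(k.size(-1))
    return torch.softmax(scores, dim=-1) @ v

x = torch.randn(1, 5, 16)                       # 5 tokens, embedding dim 16
w_q, w_k, w_v = (torch.randn(16, 16) for _ in range(3))
out = self_attention(x, w_q, w_k, w_v)          # shape (1, 5, 16)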
📌 Golden-Retriever enhances Retrieval Augmented Generation (RAG) for industrial knowledge bases. Addresses challenges with domain-specific jargon and context interpretation.

📌 Results: Golden-Retriever improves the total score of Meta-Llama-3-70B by 79.2% over the vanilla LLM and 40.7% over RAG. Average improvement across three LLMs: 57.3% over the vanilla LLM and 35.0% over RAG.

📌 Introduces reflection-based question augmentation before document retrieval. Identifies jargon, clarifies meaning based on context, augments question accordingly.

📌 Offline process: OCR extracts text from various document formats. LLMs summarize and contextualize to enhance document database.

📌 Online process: LLM identifies jargon and context in user query. Queries jargon dictionary for accurate definitions. Augments original question with clear context and resolved ambiguities.

📌 Jargon identification uses LLM instead of string-exact-match. Adapts to new terms, misspellings. Outputs structured list of identified terms.

📌 Context identification uses pre-specified context names and descriptions. LLM identifies context using few-shot examples with Chain-of-Thought prompting.

📌 Jargon dictionary queried using SQL. Retrieves extended definitions, descriptions, notes about identified terms.

📌 Augmented question integrates original query, context information, detailed jargon definitions. Explicitly states context, clarifies ambiguous terms.

📌 Fallback mechanism for unidentified jargon. Synthesizes response indicating missing information, instructs user to check spelling or contact knowledge base manager.

📌 Evaluation: Question-answering experiment using multiple-choice questions from new-hire training documents. Covers six domains, 9-10 questions each. Compared with vanilla LLM and RAG.
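Pulling the online steps together, here is a sketch of the query-augmentation flow. This is my illustration: call_llm, the prompts, and the jargon table schema are hypothetical stand-ins for the paper's components.

import sqlite3

def call_llm(prompt: str) -> str:
    raise NotImplementedError("plug in any chat-completion client here")

def augment_question(question: str, db_path: str = "jargon.db") -> str:
    # 1. LLM-based jargon identification (robust to new terms and misspellings,
    #    unlike exact string matching).
    jargon = call_llm(f"List domain jargon terms in: {question}").splitlines()

    # 2. Context identification via few-shot Chain-of-Thought prompting.
    context = call_llm(f"Which document context does this belong to? {question}")

    # 3. Query the jargon dictionary with SQL for extended definitions.
    conn = sqlite3.connect(db_path)
    definitions = []
    for term in jargon:
        row = conn.execute(
            "SELECT definition FROM jargon WHERE term = ?", (term,)
        ).fetchone()
        if row is None:
            # Fallback: report the missing term instead of answering blindly.
            return (f"Term '{term}' was not found in the knowledge base; "
                    "please check the spelling or contact the KB manager.")
        definitions.append(f"{term}: {row[0]}")

    # 4. Augment the original question with context and resolved jargon,
    #    then hand it to the usual RAG retriever.
    return (f"Context: {context}\n" + "\n".join(definitions) +
            f"\nQuestion: {question}")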
Prompt caching with @AnthropicAI

Production-ready LLM applications often involve long, static instructions in every prompt. Anthropic's new prompt caching feature reduces latency by up to 80% and cost by up to 90% on such prompts.

Try it out in LangChain today!

Python: langchain-anthropic==0.1.23
JS: @langchain/anthropic 0.2.15

Anthropic announcement: https://anthropic.com/news/prompt-caching
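Under the hood, the cacheable prefix is marked with a cache_control block. A minimal sketch with the anthropic Python SDK (at launch this sat behind the beta header shown; the model name and prompts are placeholders):

import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment
LONG_INSTRUCTIONS = "..."       # e.g. a multi-thousand-token system prompt

response = client.messages.create(
    model="claude-3-5-sonnet-20240620",
    max_tokens=512,
    extra_headers={"anthropic-beta": "prompt-caching-2024-07-31"},
    system=[
        {
            "type": "text",
            "text": LONG_INSTRUCTIONS,
            "cache_control": {"type": "ephemeral"},  # cache this prefix
        }
    ],
    messages=[{"role": "user", "content": "Summarize our refund policy."}],
)
print(response.content[0].text)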
Even with preference alignment, LLMs can be enticed into harmful behavior via adversarial prompts 😈.

🚨 Breaking: our theoretical findings confirm:
LLM alignment is fundamentally limited!

More details on the framework, statistical bounds, and phenomenal defense results 👇🏻
mistral-7b-instruct-v0.1-awq
Beta
Model ID: @hf/thebloke/mistral-7b-instruct-v0.1-awq

Mistral 7B Instruct v0.1 AWQ is an efficient, accurate and blazing-fast low-bit weight quantized Mistral variant.

Properties
Task Type: Text Generation

Use the Playground
Try out this model with the Workers AI Model Playground. It does not require any setup or authentication and is an instant way to preview and test a model directly in the browser.
https://playground.ai.cloudflare.com/?model=@hf/thebloke/mistral-7b-instruct-v0.1-awq
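Outside the Playground, the same model can be called over the Workers AI REST API; the identical pattern works for the Hermes model below. A minimal sketch with Python requests, where the account ID and API token are placeholders from your Cloudflare dashboard:

import os
import requests

ACCOUNT_ID = os.environ["CF_ACCOUNT_ID"]   # placeholder credentials
API_TOKEN = os.environ["CF_API_TOKEN"]
MODEL = "@hf/thebloke/mistral-7b-instruct-v0.1-awq"

resp = requests.post(
    f"https://api.cloudflare.com/client/v4/accounts/{ACCOUNT_ID}/ai/run/{MODEL}",
    headers={"Authorization": f"Bearer {API_TOKEN}"},
    json={"messages": [{"role": "user", "content": "Explain AWQ in one line."}]},
    timeout=60,
)
print(resp.json())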
hermes-2-pro-mistral-7b
Beta Function calling
Model ID: @hf/nousresearch/hermes-2-pro-mistral-7b

Hermes 2 Pro on Mistral 7B is the new flagship 7B Hermes! Hermes 2 Pro is an upgraded, retrained version of Nous Hermes 2, consisting of an updated and cleaned version of the OpenHermes 2.5 Dataset, as well as a newly introduced Function Calling and JSON Mode dataset developed in-house.

Properties
Task Type: Text Generation

Use the Playground
Try out this model with the Workers AI Model Playground. It does not require any setup or authentication and is an instant way to preview and test a model directly in the browser.

Launch the Model Playground
https://playground.ai.cloudflare.com/?model=@hf/nousresearch/hermes-2-pro-mistral-7b
Cloudflare WARP client allows you to protect corporate devices by securely and privately sending traffic from those devices to Cloudflare’s global network, where Cloudflare Gateway can apply advanced web filtering. The WARP client also makes it possible to apply advanced Zero Trust policies that check for a device’s health before it connects to corporate applications.

Downloading and deploying the WARP client to your devices enhances the protection Cloudflare Zero Trust can provide to your users and data, wherever they are.

Here are a few ways in which the WARP client provides in-depth protection for your organization:

WARP lets you enforce security policies anywhere.
With the WARP client deployed in the Gateway with WARP mode, Gateway policies are not location-dependent — they can be enforced anywhere.

WARP lets you enforce HTTP filtering and user-based policies.
Download and install the WARP client to enable Gateway features such as Anti-Virus scanning, HTTP filtering, Browser Isolation, and identity-based policies.

WARP lets you have in-depth, application-specific insights.
With WARP installed on your corporate devices, you can populate the Zero Trust Shadow IT Discovery page with visibility down to the application and user level. This makes it easy to discover, analyze, and take action on any shadow IT your users may be using every day.

WARP allows you to build rich device posture rules.
The WARP client provides advanced Zero Trust protection by making it possible to check for device posture. By setting up device posture checks, you can build Zero Trust policies that check for a device’s location, disk encryption status, OS version, and more.
https://developers.cloudflare.com/cloudflare-one/connections/connect-devices/warp/
Build serverless applications and deploy instantly across the globe for exceptional performance, reliability, and scale.

Available on all plans
Cloudflare Workers provides a serverless execution environment that allows you to create new applications or augment existing ones without configuring or maintaining infrastructure.

Cloudflare Workers runs on Cloudflare’s global network in hundreds of cities worldwide, offering both Free and Paid plans.
https://developers.cloudflare.com/workers/