Understanding the JSON format response with HF's Serverless Inference API 🤗

As it stands, there seems to be an inconsistency with the OpenAI documentation when it comes to implementing the JSON response format with the InferenceClient chat completion API.

After investigating the InferenceClient source code, I'm sharing the official solution, which uses a JSON Schema. This makes the structure of the response consistent and simplifies parsing as part of an automated pipeline for extracting metadata and other information:

from huggingface_hub import InferenceClient

client = InferenceClient("meta-llama/Meta-Llama-3-70B-Instruct")

messages = [
    {
        "role": "user",
        "content": "I saw a puppy a cat and a raccoon during my bike ride in the park. What did I see and when?",
    },
]

response_format = {
    "type": "json",
    "value": {
        "properties": {
            "location": {"type": "string"},
            "activity": {"type": "string"},
            "animals_seen": {"type": "integer", "minimum": 1, "maximum": 5},
            "animals": {"type": "array", "items": {"type": "string"}},
        },
        "required": ["location", "activity", "animals_seen", "animals"],
    },
}

response = client.chat_completion(
    messages=messages,
    response_format=response_format,
    max_tokens=500,
)

print(response.choices[0].message.content)
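
Since the schema constrains the output, the content field holds a JSON string that can be parsed directly; a minimal sketch, assuming the model honours the schema:

import json

# message.content is a JSON string matching the schema above
data = json.loads(response.choices[0].message.content)
print(data["location"], data["animals_seen"], data["animals"])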

As a reminder, JSON mode is activated with the OpenAI client as follows:

response = client.chat.completions.create(
    model="gpt-3.5-turbo-0125",
    messages=[...],
    response_format={"type": "json_object"},
)

One question remains open, however, and the community may be able to answer it: an incompatibility seems to persist when generating lists of dictionaries; for now, generating flat dictionaries appears to be the only working option.
Amazing day. AWPortrait-FL finally here!
🦖 AWPortrait-FL is finetuned on FLUX.1-dev using the training set of AWPortrait-XL and nearly 2,000 fashion photography photos with extremely high aesthetic quality.

🤗Model:
Shakker-Labs/AWPortrait-FL


🙇Demo:
vilarin/AWPortrait-FL
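
Since it is a fine-tune of FLUX.1-dev, it should load with diffusers' FluxPipeline; a minimal sketch (the prompt and sampling settings below are only illustrative):

import torch
from diffusers import FluxPipeline

pipe = FluxPipeline.from_pretrained("Shakker-Labs/AWPortrait-FL", torch_dtype=torch.bfloat16)
pipe.enable_model_cpu_offload()  # keeps VRAM usage manageable on consumer GPUs

image = pipe(
    "close-up portrait of a woman, fashion photography, soft window light",
    num_inference_steps=28,
    guidance_scale=3.5,
).images[0]
image.save("portrait.png")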
Check Your Redirects and Status Codes
Quickly analyze 301, 302 redirects and other HTTP status codes to optimize your website's performance and SEO.
https://redirect-checker.girff.com/
The best-selling SaaS affiliate
programs that make you money.
Monetize your content through affiliate programs from top SaaS companies and generate a side income, fast.

https://saasinfopro.com/
AI Video THUDM/CogVideoX-5b
CogVideoX is an open-source version of the video generation model originating from QingYing. The model card includes a table listing the video generation models currently offered, along with their basic specifications.


When testing with the diffusers library, all of the optimizations it provides were enabled. This setup has not been tested for actual VRAM/memory usage on devices other than the NVIDIA A100 / H100, but it should generally be adaptable to all devices with the NVIDIA Ampere architecture and above. If the optimizations are disabled, VRAM usage increases significantly, with peak usage about 3 times higher than the table in the model card shows; however, speed increases by 3-4 times. You can selectively disable some optimizations, including:
pipe.enable_model_cpu_offload()
pipe.enable_sequential_cpu_offload()
pipe.vae.enable_slicing()
pipe.vae.enable_tiling()
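
For reference, a minimal sketch of how these optimizations are typically enabled with the diffusers CogVideoXPipeline (prompt and settings are illustrative only):

import torch
from diffusers import CogVideoXPipeline
from diffusers.utils import export_to_video

pipe = CogVideoXPipeline.from_pretrained("THUDM/CogVideoX-5b", torch_dtype=torch.bfloat16)
pipe.enable_model_cpu_offload()   # offload submodules to CPU between forward passes
pipe.vae.enable_slicing()         # decode the latent video in slices
pipe.vae.enable_tiling()          # decode large frames in tiles

video = pipe(
    prompt="A panda playing guitar in a bamboo forest",
    num_inference_steps=50,
    guidance_scale=6.0,
    num_frames=49,
).frames[0]
export_to_video(video, "output.mp4", fps=8)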

When performing multi-GPU inference, the enable_model_cpu_offload() optimization needs to be disabled.
Using INT8 models lets GPUs with lower VRAM run inference normally with minimal loss in video quality, though inference speed decreases significantly.
The 2B model is trained with FP16 precision, and the 5B model is trained with BF16 precision. We recommend using the precision the model was trained with for inference.
PytorchAO and Optimum-quanto can be used to quantize the text encoder, Transformer, and VAE modules to reduce CogVideoX's memory requirements. This makes it possible to run the model on a free T4 Colab or GPUs with smaller VRAM! It is also worth noting that TorchAO quantization is fully compatible with torch.compile, which can significantly improve inference speed. FP8 precision must be used on devices with NVIDIA H100 or above, which requires installing the torch, torchao, diffusers, and accelerate Python packages from source. CUDA 12.4 is recommended.
The inference speed test also used the above VRAM optimization scheme. Without VRAM optimization, inference speed increases by about 10%. Only the diffusers version of the model supports quantization.
The model only supports English input; other languages can be translated into English during refinement by a large model.
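
As a rough illustration of the quantization route mentioned above, TorchAO's in-place quantize_ helper can be applied to the heavy submodules before offloading. This is a sketch, not the official recipe; exact memory savings depend on your setup:

import torch
from torchao.quantization import quantize_, int8_weight_only
from diffusers import CogVideoXPipeline

pipe = CogVideoXPipeline.from_pretrained("THUDM/CogVideoX-5b", torch_dtype=torch.bfloat16)
# quantize the memory-heavy modules in place to int8 weights
quantize_(pipe.text_encoder, int8_weight_only())
quantize_(pipe.transformer, int8_weight_only())
quantize_(pipe.vae, int8_weight_only())
pipe.enable_model_cpu_offload()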
https://huggingface.co/THUDM/CogVideoX-5b
Automated web scraping with Playwright is becoming easier by the day. Now, using Ollama tool calling, it's possible to perform very high-accuracy web scraping (in some cases 100% accurate) just by asking an LLM to scrape the content for you.

This can be completed in a multistep process similar to the one on Cohere's platform. If you have tried the Cohere playground with web scraping, this will feel very similar. In my experience, the Llama 3.1 version is much better due to the larger context window. Both tools are great, but the difference is that the Ollama + Playwright version is completely controlled by you.

All you need to do is wrap your scraper in a function:


async def query_web_scraper(url: str) -> dict:
    scraper = WebScraper(headless=False)
    return await scraper.query_page_content(url)


and then make your request:


# First API call: Send the query and function description to the model
response = ollama.chat(
    model=model,
    messages=messages,
    tools=[
        {
            'type': 'function',
            'function': {
                'name': 'query_web_scraper',
                'description': 'Scrapes the content of a web page and returns the structured JSON object with titles, articles, and associated links.',
                'parameters': {
                    'type': 'object',
                    'properties': {
                        'url': {
                            'type': 'string',
                            'description': 'The URL of the web page to scrape.',
                        },
                    },
                    'required': ['url'],
                },
            },
        },
    ]
)
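
From there, the returned tool call can be dispatched back to the scraper and the result handed to the model for a final pass. A rough sketch, assuming the ollama-python response layout with a tool_calls list on the message:

import asyncio

tool_calls = response['message'].get('tool_calls', [])
for call in tool_calls:
    if call['function']['name'] == 'query_web_scraper':
        args = call['function']['arguments']
        scraped = asyncio.run(query_web_scraper(args['url']))
        # hand the tool output back so the model can structure/summarise it
        messages.append(response['message'])
        messages.append({'role': 'tool', 'content': str(scraped)})

final = ollama.chat(model=model, messages=messages)
print(final['message']['content'])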


To learn more:
Github w/ Playground: https://github.com/tdolan21/tool-calling-playground/blob/main/notebooks/ollama-playwright-web-scraping.ipynb
Complete Guide: https://medium.com/@tdolan21/building-an-llm-powered-web-scraper-with-ollama-and-playwright-6274d5d938b5
Kwai-Kolors/Kolors-Virtual-Try-On
Thought this was an interesting graphic from the EAGLE blog post. It made me wonder if certain sampling methods have been shown to work better for certain tasks.

Does anyone know of any work looking at trends in the output token probability distribution by task type? (or similar)

Source: https://sites.google.com/view/eagle-llm
Continuing my streak by releasing the Wikireading dataset: a large collection of scraped non-fiction books, predominantly in Russian.
its5Q/wikireading


Here are the highlights:
- ~7B tokens, or ~28B characters, making it a great candidate for use in pretraining
- Contains non-fiction works from many knowledge domains
- Includes both the original HTML and extracted text of book chapters
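
If you want to poke at it, a quick way to stream a few records without downloading everything (the split and field names below are assumptions; check the dataset card for the exact schema):

from datasets import load_dataset

ds = load_dataset("its5Q/wikireading", split="train", streaming=True)
for example in ds:
    print(example)  # inspect the first record to see the available fields
    break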
The word 'Lead' has three definitions. When an LLM tokenizes this word, it is always the same token. Imagine being able to put any particular embedding at any particular time into a 'Quantum State'. When an embedding is in a Quantum State, the word token could have up to 3 different meanings (x1, x2, x3). The Quantum State gets collapsed based on the individual context surrounding the word. 'Jill lead Joy to the store' would collapse to x1. 'Jill and Joy stumbled upon a pile of lead' would collapse to x3. Very simple, right? This method produces OFF THE CHARTS results:


https://www.youtube.com/watch?v=tuQI6A-EOqE
The only 405B Spaces still freely accessible are powered by the SambaNova fast API.

xianbao/SambaNova-fast


https://sambanova.ai/fast-api?api_ref=907266
Sharing for anyone using Diffusers from_single_file loading and affected by the Runway SD 1.5 issue.

If you have runwayml/stable-diffusion-v1-5 saved locally in your HF cache, then loading single-file checkpoints in the following way should still work.


from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_single_file("<url or path to single file checkpoint>")


If you do not have the model repo saved in your cache, then automatically inferring the pipeline config will not work since the reference repo runwayml/stable-diffusion-v1-5 doesn't exist anymore.

You can use an alternative SD1.5 repo id to still configure your pipeline.


from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_single_file("<url or path to single file checkpoint>", config="Lykon/DreamShaper")


We're working on resolving the issue ASAP.
Introducing "Writing in the Margins (WiM)" - better inference pattern for long context LLMs that solves the Lost-in-the-Middle problem 🔥

Paper page:
Writing in the Margins: Better Inference Pattern for Long Context Retrieval (2408.14906)


TL;DR
Make your model write "margin notes" as you chunk-prefill the KV cache. Then ask it to reread all the notes before it speaks up.
Works with humans, works with AI 🤖

WiM leverages the chunked prefill of the key-value cache, concurrently generating query-based extractive summaries at each step of the prefill that are subsequently reintegrated at the end of the computation. We term these intermediate outputs "margins", drawing inspiration from the practice of making margin notes for improved comprehension of long contexts in human reading. We show that this technique, which adds only minimal additional computation, significantly improves LLMs' long-context reasoning capabilities.

Think: Every chunk has a chance to be attended to/ be at the end of the context at least once. 🎉
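
A conceptual sketch of the pattern (not the authors' implementation; generate() here stands in for any call to an instruction-tuned LLM):

def writing_in_the_margins(query: str, context: str, generate, chunk_size: int = 4000) -> str:
    chunks = [context[i:i + chunk_size] for i in range(0, len(context), chunk_size)]
    prefix, margins = "", []
    for chunk in chunks:
        prefix += chunk  # in a real system this is a chunked prefill of the KV cache
        # extractive, query-based "margin note" for the chunk just prefilled
        margins.append(generate(f"{prefix}\n\nExtract the facts above that help answer: {query}"))
    notes = "\n".join(f"- {m}" for m in margins)
    # final pass: the model rereads all margin notes before answering
    return generate(f"{prefix}\n\nMargin notes:\n{notes}\n\nUsing the notes, answer: {query}")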

📊 Results:
- An average accuracy boost of 7.5% in multi-hop reasoning tasks like HotpotQA and MultiHop-RAG.
- Even a 30% increase in F1-score for summarisation-like tasks (CWE).

Plus, WiM fits seamlessly into interactive applications (think: progress bar!). It can provide real-time progress updates during data retrieval and integration, making it user-friendly and transparent - a stark contrast to feeding 1M tokens to an LLM and waiting 6 minutes for the first token. 🤯

👩‍💻🧑‍💻 Check it out and contribute to our open-source project here: https://github.com/writer/writing-in-the-margins

🧠 More about chunked prefill: https://docs.vllm.ai/en/latest/models/performance.html#chunked-prefill
Had a funny thought, would it be at all possible to rework what shows up on our personal HF page?

Picture this: I upload a model to an organization, someone who follows me now has no idea that I've uploaded a model or to where, unless they also watch those repos (which also floods them with other notifications)

What if our main Huggingface page was a collection of both models that we've uploaded specifically to our profile, as well as models we've uploaded to organizations? That way it would all be contained in one central followable location, and I wouldn't have concerns about losing followership if I wanted to upload to an organization all of a sudden.
Run Llama 405B at over 100 tokens per second for free using SambaNova's API! https://sambanova.ai/fast-api?api_ref=444868

I have been able to generate some high-quality synthetic data and use it as an LLM-as-a-judge instead of slower and more expensive alternatives like OpenAI or Anthropic.
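
For anyone curious, SambaNova's fast API is, to my knowledge, OpenAI-compatible, so a call could look roughly like the sketch below. The base_url and model id are assumptions; check the SambaNova docs for the exact values.

from openai import OpenAI

client = OpenAI(
    api_key="YOUR_SAMBANOVA_API_KEY",        # assumption: key from the SambaNova console
    base_url="https://api.sambanova.ai/v1",  # assumption: OpenAI-compatible endpoint
)

response = client.chat.completions.create(
    model="Meta-Llama-3.1-405B-Instruct",    # assumption: the exact model id may differ
    messages=[{"role": "user", "content": "Rate this answer from 1 to 5 and explain why: ..."}],
)
print(response.choices[0].message.content)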
fast-sentence-transformers - simply, faster, sentence-transformers

- Released an initial version a while ago
- Archived it because of a cleaner solution described in a blog by Philipp Schmid
- Reimplemented it based on that cleaner solution
- Unarchived the project
- Packaged it up
- Released a 0.5 version

pip install fast-sentence-transformers

https://github.com/davidberenstein1957/fast-sentence-transformers
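
A minimal usage sketch, assuming the package mirrors the sentence-transformers encode() API (the model name is just an example):

from fast_sentence_transformers import FastSentenceTransformer

encoder = FastSentenceTransformer("all-MiniLM-L6-v2", device="cpu")
embeddings = encoder.encode(["Hello world", "Fast sentence embeddings"])
print(len(embeddings), len(embeddings[0]))  # number of sentences, embedding dimension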