HF-hub

Share and discover more about AI with social posts from the community.

huggingface/OpenAi

06:37 · Aug 22, 2024 · Thu

🔔 Release: small-text v1.4.1

The new release contains some smaller bugfixes. Check it out!

GitHub: https://github.com/webis-de/small-text
Paper: Small-Text: Active Learning for Text Classification in Python (2107.10314)

06:37 · Aug 22, 2024 · Thu

Put together a small repo showing how to go from making your own fine-tuning dataset w/ services like Groq & Together to publishing that model on ollama.

In my case I fine-tuned SmolLM-360M to be a better assistant for my Pi-Card (previous post) project.

Check it out!
https://github.com/nkasmanoff/ft-flow

06:37 · Aug 22, 2024 · Thu

1-click installers for ResShift on Windows, RunPod, Massed Compute, and Kaggle, with an amazing Gradio app and batch image processing. ResShift is an efficient diffusion model for image super-resolution by residual shifting (NeurIPS 2023, Spotlight).

Official repo: https://github.com/zsyOAOA/ResShift

I have developed a very advanced Gradio app.

06:37 · Aug 22, 2024 · Thu

🚀 Introducing Hugging Face Similar: a Chrome extension to find relevant datasets!

✨ Adds a "Similar Datasets" section to Hugging Face dataset pages
🔍 Recommendations based on dataset READMEs
🏗 Powered by https://huggingface.co/chromadb and https://huggingface.co/Snowflake embeddings.

You can try it here: https://chromewebstore.google.com/detail/hugging-face-similar/aijelnjllajooinkcpkpbhckbghghpnl?authuser=0&hl=en.

I am very happy to get feedback on whether this could be useful or not 🤗

06:37 · Aug 22, 2024 · Thu

🤗 Serving Meta Llama 3.1 405B on Google Cloud is now possible via the Hugging Face Deep Learning Containers (DLCs) for Text Generation Inference (TGI)

In this post, we showcase how to deploy meta-llama/Meta-Llama-3.1-405B-Instruct-FP8 on an A3 instance with 8 x H100 GPUs on Vertex AI.

Thanks to the Hugging Face DLCs for TGI and Google Cloud Vertex AI, deploying a high-performance text generation container for serving Large Language Models (LLMs) has never been easier. And we’re not going to stop here – stay tuned as

06:37 · Aug 22, 2024 · Thu

🚀 How The Washington Post Uses AI to Empower Journalists 🔍📰

An exciting new example in the world of AI-assisted journalism! The Post has developed an internal tool called "Haystacker" that's enhancing in-depth reporting. Here's why it matters:

🎥 What it does:
• Extracts stills from video files
• Processes on-screen text

06:36 · Aug 22, 2024 · Thu

🚀 We will be generating a preference dataset for DPO/ORPO and cleaning it with AI feedback during our upcoming meetup!

In this session, we'll walk you through the essentials of building a distilabel pipeline by exploring two key use cases: cleaning an existing dataset and generating a preference dataset for DPO/ORPO. You’ll also learn how to make the most of AI feedback, integrating Argilla to gather human feedback and improve the overall data quality.

06:36 · Aug 22, 2024 · Thu

𝗚𝗼𝗼𝗴𝗹𝗲 𝗽𝗮𝗽𝗲𝗿 : 𝘀𝗰𝗮𝗹𝗶𝗻𝗴 𝘂𝗽 𝗶𝗻𝗳𝗲𝗿𝗲𝗻𝗰𝗲 𝗰𝗼𝗺𝗽𝘂𝘁𝗲 𝗯𝗲𝗮𝘁𝘀 𝟭𝟰𝘅 𝗹𝗮𝗿𝗴𝗲𝗿 𝗺𝗼𝗱𝗲𝗹𝘀 🚀

Remember scaling laws? These are empirical laws that say "the bigger your model, the better it gets". More precisely, "as your compute increases exponentially, loss decreases in a linear fashion". They have wild implications, suggesting that spending 100x more training compute would get you super-LLMs. That's why companies are racing to build the biggest AI superclusters ever, and Meta bought 350k H100 GPUs, which probably cost on the order of $1B.
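The log-linear relationship quoted above can be sketched numerically. This is only a toy illustration of the statement as worded in the post: the coefficients `A` and `B` are hypothetical values picked for readability, not fitted to any real model, and published scaling laws are usually stated in a somewhat different functional form.

```python
import math

# Hypothetical coefficients, chosen only so the numbers are easy to read.
A = 12.0  # loss offset
B = 0.5   # loss reduction for every 10x increase in compute


def loss(compute_flops: float) -> float:
    """Toy log-linear law: loss drops by B each time compute grows 10x."""
    return A - B * math.log10(compute_flops)


# Each step multiplies compute by 10, and the loss falls by the same amount.
for compute in (1e20, 1e21, 1e22):
    print(f"{compute:.0e} FLOPs -> loss {loss(compute):.2f}")
```

Under this toy law, 100x more compute lowers the loss by `2 * B`: exponential growth in spend buys only a linear improvement.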

But think of this: we're building huge reasoning machines, but only ask them to do one pass through the mod

06:36 · Aug 22, 2024 · Thu

🚀 Meet the new GLiNER architecture 🚀
GLiNER revolutionized zero-shot NER by demonstrating that lightweight encoders can achieve excellent results. We're excited to continue R&D with this spirit 🔥. Our new bi-encoder and poly-encoder architectures were developed to address the main limitations of the original GLiNER architecture and bring the following new possibilities:

🔹 An unlimited number of entities can be recognized at once.
🔹 Faster inference when entity embeddings are preprocessed.
🔹 Better generalization to unseen entities.
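
To make the first two points concrete, here is a minimal sketch of the general bi-encoder idea: entity-type labels are embedded once, cached, and then compared against span embeddings with a dot product. It does not use the GLiNER library or its API. The `embed` function is a hypothetical stand-in for a trained encoder (it only hashes character trigrams), so the scores are not meaningful predictions.

```python
import math


def embed(text: str, dim: int = 16) -> list[float]:
    """Hypothetical stand-in for a trained encoder: trigram counts, L2-normalized."""
    vec = [0.0] * dim
    padded = f"  {text.lower()} "
    for i in range(len(padded) - 2):
        bucket = sum(ord(ch) for ch in padded[i:i + 3]) % dim
        vec[bucket] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def dot(a: list[float], b: list[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


# Label side: encoded once, then reused for every input text.
# The list can grow freely because labels never share a forward pass with the text.
ENTITY_TYPES = ["person", "organization", "location"]
LABEL_EMBEDDINGS = {label: embed(label) for label in ENTITY_TYPES}


def best_label(span: str) -> tuple[str, float]:
    """Score one candidate span against all cached label embeddings."""
    span_vec = embed(span)
    scores = {label: dot(span_vec, vec) for label, vec in LABEL_EMBEDDINGS.items()}
    label = max(scores, key=scores.get)
    return label, scores[label]
```

Because the label embeddings are computed ahead of time, adding a new entity type costs one extra `embed` call rather than a re-run over the input text.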

06:36 · Aug 22, 2024 · Thu

'Legal Dictionary GPT' is now completely trained and ready for open-source release to the world! Trained on 10,000 rows of legal definitions, Legal Dictionary GPT is your go-to resource for the first step in understanding the law: defining it. The model is free and publicly available for anyone to use.

Model Link: https://platform.openai.com/playground/chat?preset=eCrKdaPe9cnMnyTETqWDCQAU

Knowledge Base Bots are internal-facing, as opposed to external-facing, LLMs that are either fine-tuned or RAG-tuned, generally on data about systems and processes.

06:36 · Aug 22, 2024 · Thu

BIG update dropped for bigdata-pw/Flickr - now ~515M images! Target for the next update: 1B

In case you missed them, other recent drops include bigdata-pw/Dinosaurs - a small set of BIG creatures 🦕🦖 and the first in a series of articles about the art of web scraping! https://huggingface.co/blog/hlky/web-scraping-101 https://huggingface.co/blog/hlky/web-scraping-102

Stay tuned for exciting datasets and models coming soon:
- PC and Console game screenshots
- TV/Film actors biographies and photos (thin

06:36 · Aug 22, 2024 · Thu

We are proud to release our latest suite of three image(s)-to-3D Gradio demos and two new papers.

SpaRP (Unposed sparse views to 3D):
sudo-ai/SpaRP

SpaRP: Fast 3D Object Reconstruction and Pose Estimation from Sparse Views (2408.10195)

MeshFormer (@minghua @NCJ ):
sudo-ai/MeshFormer

MeshFormer: High-Quality Mesh Generation with 3D-Guided Reconstruction Model (2408.10198)

MeshLRM-reproduced (@sarahwei0210 ):
sudo-ai/MeshLRM
https://huggingface.co/spaces/sudo-ai/MeshLRM

06:35 · Aug 22, 2024 · Thu

Cooked up a cool & much faster AI voice assistant space that also supports speech translation (with seamless-expressive). To activate speech translation mode, start with the phrase "Please translate" followed by the speech you'd like to translate. It uses open-source LLMs (Llama 3, Mistral, etc.) with Edge TTS for the voice assistant and seamless-expressive for speech translation.

Give it a try:
Jaward/optimus
https://huggingface.co/spaces/Jaward/optimus
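
The trigger-phrase behavior described in this post could be routed along the following lines. This is a guess at the general pattern, not the Space's actual code: it assumes the speech has already been transcribed to text, and the function name `route` and the mode names are hypothetical.

```python
TRIGGER = "please translate"


def route(transcript: str) -> tuple[str, str]:
    """Return (mode, payload) for a transcribed utterance.

    Utterances that start with the trigger phrase go to translation mode,
    with the trigger stripped off; everything else goes to the assistant.
    """
    text = transcript.strip()
    if text.lower().startswith(TRIGGER):
        payload = text[len(TRIGGER):].lstrip(" ,:.-")
        return "translate", payload
    return "assistant", text
```

For example, `route("Please translate good morning")` selects translation mode with `"good morning"` as the text to translate, while any other utterance is passed to the assistant unchanged.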

06:35 · Aug 22, 2024 · Thu

Woman.ru Forum Posts Dataset - nyuuzyou/womanru-posts

📊 Dataset highlights:

- 1,308,238 forum posts extracted from Woman.ru
- Includes original posts and replies from various threads
- Each entry contains URL, title, original post, date, and replies
- Primarily in Russian

06:34 · Aug 22, 2024 · Thu

Minimalist Spaces that may be helpful!
Grab Doc | Type Byte | SD3 CLI

- Grab Doc:
prithivMLmods/GRAB-DOC

- Type Byte:
prithivMLmods/Type-Byte

- SD3 CLI:
prithivMLmods/SD3-CLI

06:34 · Aug 22, 2024 · Thu

Falcon Mamba is now available in llama.cpp!
Check out GGUF files uploaded here:
tiiuae/falconmamba-7b-66b9a580324dd1598b0f6d4a

06:34 · Aug 22, 2024 · Thu

This isn’t a goal of ours because we have plenty of money in the bank, but quite excited to see that @huggingface is profitable these days, with 220 team members and most of our platform being free (like model hosting) and open-source for the community!

Especially noteworthy at a time when most AI startups wouldn’t survive a year or two without VC money. Yay!

06:34 · Aug 22, 2024 · Thu

Calling all Hugging Face users! We want to hear from YOU!

What feature or improvement would make the biggest impact on Hugging Face?

Whether it's the Hub, better documentation, new integrations, or something completely different – we're all ears!

Your feedback shapes the future of Hugging Face. Drop your ideas in the comments below! 👇

06:34 · Aug 22, 2024 · Thu

NEW TASK ALERT 🚨
Extractive Question Answering: because sometimes generative is not all you need 😉
AutoTrain is the only open-source, no-code solution to offer so many tasks across different modalities. Current task count: 23 🚀
Check out the blog post on getting started with this task: https://huggingface.co/blog/abhishek/extractive-qa-autotrain
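
For readers new to the task, extractive question answering returns a span copied from the context rather than generated text. The sketch below uses the SQuAD-style record layout (`answers.text` plus `answers.answer_start`); whether AutoTrain expects exactly these column names is an assumption here, so check the linked blog post. The sample record and the helper `extract_answer` are made up for illustration.

```python
context = (
    "AutoTrain is an open-source, no-code tool. "
    "It supports extractive question answering."
)
answer = "extractive question answering"

# SQuAD-style record: the answer is stored with its character offset in the context.
example = {
    "question": "Which task does AutoTrain support?",
    "context": context,
    "answers": {"text": [answer], "answer_start": [context.index(answer)]},
}


def extract_answer(record: dict, which: int = 0) -> str:
    """Slice the answer span out of the context and check it matches the label."""
    start = record["answers"]["answer_start"][which]
    text = record["answers"]["text"][which]
    span = record["context"][start:start + len(text)]
    if span != text:
        raise ValueError("answer_start does not point at the answer text")
    return span


print(extract_answer(example))
```

The consistency check is worth running over a dataset before training, since a misaligned `answer_start` would point the model at the wrong span.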
