HF Hub: Share and discover more about AI with social posts from the community.
Transformers.js V3 is finally available on NPM! 🔥 State-of-the-art Machine Learning for the web, now with WebGPU support! 🤯⚡️

Install it from NPM with:
npm i @huggingface/transformers

or via CDN, for example: https://v2.scrimba.com/s0lmm0qh1q

Segment Anything demo:
webml-community/segment-anything-webgpu
Just wanted to share something cool I've been working on for the past 6 months! I wrote over 200 chat examples on my own (it takes a lot longer than you think) to emulate female characters from popular television shows, movies, and comic books. Plus it runs on 8 GB of VRAM! Feel free to check out my model or provide feedback on what I can improve!

https://huggingface.co/rwitz/Femme-v0.1
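If you want to try it on an 8 GB card, here is a minimal loading sketch. It assumes the repo works with the standard transformers chat-template path and bitsandbytes 4-bit quantization; the persona/message layout below is a placeholder, so check the model card for the intended prompt format.

# Hypothetical loading sketch: assumes a standard chat template and
# bitsandbytes 4-bit quantization to fit within ~8 GB of VRAM.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "rwitz/Femme-v0.1"
quant = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_compute_dtype=torch.bfloat16)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, quantization_config=quant, device_map="auto"
)

messages = [{"role": "user", "content": "Hi! Introduce yourself."}]  # placeholder prompt
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(inputs, max_new_tokens=128, do_sample=True, temperature=0.8)
print(tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True))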
Stable Diffusion 3 available now, ComfyUI workflows
Greetings friends of fal!

At fal, we are building the fastest and most reliable inference cloud for generative AI. We are thrilled to announce some major updates. Here's what's new:

Stable Diffusion 3 Now Available
https://blog.fal.ai/stable-diffusion-3-on-fal-comfyui-workflows-and-more/
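As a quick illustration of calling a hosted model from Python, here is a minimal sketch with the fal_client package. The endpoint id and the argument names are assumptions based on fal's usual request format; check the fal docs for the exact schema.

# Minimal sketch of calling a fal-hosted endpoint from Python (pip install fal-client).
# The endpoint id and argument names below are assumptions; consult fal's docs.
import fal_client

result = fal_client.subscribe(
    "fal-ai/stable-diffusion-v3-medium",  # assumed endpoint id
    arguments={
        "prompt": "a watercolor painting of a lighthouse at dawn",
        "image_size": "square_hd",
    },
)
print(result)  # the response carries URLs of the generated images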
Introducing AuraSR - An open reproduction of the GigaGAN Upscaler
Today we are releasing AuraSR, a 600M parameter upsampler model derived from the GigaGAN paper. This model can upscale low-res images to 4x the resolution, and can be applied repeatedly. We are publishing this model under a truly open source license.

AuraSR excels in upscaling images generated by text-to-image models. This model does not have any limitations on resolution or upscaling factor.
https://blog.fal.ai/introducing-aurasr-an-open-reproduction-of-the-gigagan-upscaler-2/
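Below is a small usage sketch with the aura-sr Python package; the AuraSR.from_pretrained entry point and the upscale_4x method follow the pattern documented for this model, but treat the exact repo id as an assumption.

# Sketch of 4x upscaling with the aura-sr package (pip install aura-sr).
# The repo id below is an assumption; check the AuraSR model card for the exact name.
from aura_sr import AuraSR
from PIL import Image

aura_sr = AuraSR.from_pretrained("fal-ai/AuraSR")  # assumed repo id
low_res = Image.open("generated_256px.png").convert("RGB")

upscaled = aura_sr.upscale_4x(low_res)  # 4x the input resolution
upscaled.save("generated_1024px.png")

# Because the upscaler can be applied repeatedly, a second pass gives 16x overall.
upscaled_16x = aura_sr.upscale_4x(upscaled)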
AuraSR V2
Today we released the second version of our single-step GAN upscaler: AuraSR.

We released AuraSR v1 last month and were encouraged by the community response, so we immediately started training a new version.
https://blog.fal.ai/aurasr-v2/
Announcing Flux by Black Forest Labs: The Next Leap in Text-to-Image Models
Flux, the largest state-of-the-art open-source text-to-image model to date, developed by Black Forest Labs, the original team behind Stable Diffusion, is now available on fal. Flux pushes the boundaries of creativity and performance with an impressive 12B parameters, delivering aesthetics reminiscent of Midjourney.

To play around with the model now, check out the demo page here on fal.
https://blog.fal.ai/flux-the-largest-open-sourced-text2img-model-now-available-on-fal/
InternVL2-Llama3-76B
We are excited to announce the release of InternVL 2.0, the latest addition to the InternVL series of multimodal large language models. InternVL 2.0 features a variety of instruction-tuned models, ranging from 1 billion to 108 billion parameters. This repository contains the instruction-tuned InternVL2-Llama3-76B model.

Compared to the state-of-the-art open-source multimodal large language models, InternVL 2.0 surpasses most open-source models. It demonstrates competitive performance on par with proprietary commercial models across various capabilities, including document and chart comprehension, infographics QA, scene text understanding and OCR tasks, scientific and mathematical problem solving, as well as cultural understanding and integrated multimodal capabilities.

InternVL 2.0 is trained with an 8k context window and utilizes training data consisting of long texts, multiple images, and videos, significantly improving its ability to handle these types of inputs compared to InternVL 1.5. For more details, please refer to our blog and GitHub.
https://huggingface.co/OpenGVLab/InternVL2-Llama3-76B
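For reference, a minimal text-only chat sketch against the Hugging Face repo is shown below. The model runs remote code, so the model.chat signature mirrors the pattern from the model card and should be treated as an assumption; image inputs additionally need the preprocessing helpers shipped with the card.

# Sketch of loading InternVL2-Llama3-76B via trust_remote_code.
# The model.chat(...) call follows the model card's remote code and is an assumption here.
import torch
from transformers import AutoModel, AutoTokenizer

path = "OpenGVLab/InternVL2-Llama3-76B"
model = AutoModel.from_pretrained(
    path,
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
    device_map="auto",  # the 76B model needs multiple GPUs
).eval()
tokenizer = AutoTokenizer.from_pretrained(path, trust_remote_code=True)

generation_config = dict(max_new_tokens=512, do_sample=False)

# Pure-text turn; pass pixel_values from the card's image helpers for multimodal input.
question = "Summarize the advantages of an 8k context window for document understanding."
response = model.chat(tokenizer, None, question, generation_config)
print(response)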
Parler-TTS Mini v1 is a lightweight text-to-speech (TTS) model, trained on 45K hours of audio data, that can generate high-quality, natural-sounding speech with features that can be controlled using a simple text prompt (e.g., gender, background noise, speaking rate, pitch, and reverberation).

Together with Parler-TTS Large v1, this is the second set of models published as part of the Parler-TTS project, which aims to provide the community with TTS training resources and dataset pre-processing code.
https://huggingface.co/parler-tts/parler-tts-mini-v1
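A short generation sketch is included below; it follows the usage pattern of the parler_tts package (pip install parler-tts), with the description prompt controlling the speaker attributes and the text prompt carrying what is spoken.

# Sketch of generating speech with Parler-TTS Mini v1 via the parler_tts package.
import torch
import soundfile as sf
from parler_tts import ParlerTTSForConditionalGeneration
from transformers import AutoTokenizer

device = "cuda:0" if torch.cuda.is_available() else "cpu"
repo = "parler-tts/parler-tts-mini-v1"

model = ParlerTTSForConditionalGeneration.from_pretrained(repo).to(device)
tokenizer = AutoTokenizer.from_pretrained(repo)

prompt = "Hey, how are you doing today?"
description = "A female speaker delivers a slightly expressive speech at a moderate pace, with very clear audio."

input_ids = tokenizer(description, return_tensors="pt").input_ids.to(device)
prompt_input_ids = tokenizer(prompt, return_tensors="pt").input_ids.to(device)

generation = model.generate(input_ids=input_ids, prompt_input_ids=prompt_input_ids)
audio = generation.cpu().numpy().squeeze()
sf.write("parler_tts_out.wav", audio, model.config.sampling_rate)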
Maybe the Best LLM under 34B Parameters
Peach-9B-8k-Roleplay
Peach-9B-8k-Roleplay is a chat large language model obtained by fine-tuning the 01-ai/Yi-1.5-9B model on more than 100K conversations created through our data synthesis approach.
How to start
The package versions we are using are listed below, but newer versions may also work.

torch==1.13.1
gradio==3.50.2
transformers==4.37.2
https://huggingface.co/ClosedCharacter/Peach-9B-8k-Roleplay
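With those pinned versions, a minimal chat sketch might look like the following. The role-play persona and the use of the tokenizer's chat template are assumptions here; the model card defines the exact prompt format.

# Sketch of chatting with Peach-9B-8k-Roleplay via transformers.
# Assumes the repo ships a chat template; see the model card for the exact prompt format.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "ClosedCharacter/Peach-9B-8k-Roleplay"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name, torch_dtype=torch.bfloat16, device_map="auto"
)

messages = [
    {"role": "system", "content": "You are a cheerful tavern keeper in a fantasy town."},  # placeholder persona
    {"role": "user", "content": "Good evening! What's on the menu tonight?"},
]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(input_ids, max_new_tokens=256, do_sample=True, temperature=0.7)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))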
XLabs AI is part of an international company, a product laboratory where we strive to become leaders in machine learning and neural networks. The company develops and implements revolutionary solutions, setting new standards and inspiring others to achieve the impossible in the field of information technology. Our team is an open, energetic, and young collective that welcomes innovative ideas and supports the initiative and creativity of our employees.
https://huggingface.co/XLabs-AI
FLUX.1 Merged Models - Several different Schnell:Dev ratios
This repository includes merged models from black-forest-labs/FLUX.1-dev and black-forest-labs/FLUX.1-schnell, in different ratios. The licenses of both models apply to these merged models.

Inspired by: https://huggingface.co/sayakpaul/FLUX.1-merged

Motivation
The goal is to create models which balance generation quality and speed, allowing near-Dev quality in more like 4-16 steps.

Results
FLUX.1 [dev] is a 12 billion parameter rectified flow transformer capable of generating images from text descriptions. For more information, please read our blog post.

Key Features
Cutting-edge output quality, second only to our state-of-the-art model FLUX.1 [pro].
Competitive prompt following, matching the performance of closed-source alternatives.
Trained using guidance distillation, making FLUX.1 [dev] more efficient.
Open weights to drive new scientific research, and empower artists to develop innovative workflows.
Generated outputs can be used for personal, scientific, and commercial purposes as described in the flux-1-dev-non-commercial-license.
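For local experimentation, here is a minimal diffusers sketch for FLUX.1 [dev]; it follows the standard FluxPipeline usage against the base black-forest-labs/FLUX.1-dev repo (loading the merged checkpoints from their own repo may differ), and the CPU-offload call is there only to keep VRAM requirements manageable.

# Sketch of running FLUX.1 [dev] locally with diffusers' FluxPipeline.
import torch
from diffusers import FluxPipeline

pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev", torch_dtype=torch.bfloat16
)
pipe.enable_model_cpu_offload()  # trade some speed for lower VRAM usage

image = pipe(
    "a misty forest at sunrise, volumetric light through the trees",
    guidance_scale=3.5,
    num_inference_steps=50,
    height=1024,
    width=1024,
).images[0]
image.save("flux_dev_sample.png")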
SAM2 Video Predictor
This is a simple demo for video segmentation with SAM2.

Instructions:

Upload your video [MP4, 24 fps]
With the 'include' point type selected, click on the object to mask in the first frame
Switch to the 'exclude' point type if you want to specify an area to avoid
Hit "Get Mask!"
Check the propagation preview every 15 frames
Add a point on the corresponding frame number if any mask needs to be refined
If the propagation looks good at every 15-frame checkpoint, use "render" to produce the final masked video!
Hit the Reset button if you want to refresh and start again.
Only the first 10 seconds of the input video are processed, for demo purposes :)
https://huggingface.co/spaces/fffiloni/SAM2-Video-Predictor
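For anyone who wants to script the same workflow outside the Space, here is a rough sketch of the underlying SAM2 video-predictor API. The config/checkpoint names, the add_new_points signature, and the include/exclude label convention are taken from the SAM2 repository as released and should be treated as assumptions that may shift between versions.

# Rough sketch of the SAM2 video predictor API that the demo builds on.
# Config/checkpoint names and call signatures are assumptions from the SAM2 repo.
import numpy as np
import torch
from sam2.build_sam import build_sam2_video_predictor

predictor = build_sam2_video_predictor("sam2_hiera_l.yaml", "checkpoints/sam2_hiera_large.pt")

with torch.inference_mode(), torch.autocast("cuda", dtype=torch.bfloat16):
    state = predictor.init_state(video_path="input_24fps.mp4")

    # "include" click on the object in the first frame (label 1 = include, 0 = exclude)
    predictor.add_new_points(
        inference_state=state,
        frame_idx=0,
        obj_id=1,
        points=np.array([[210, 350]], dtype=np.float32),
        labels=np.array([1], dtype=np.int32),
    )

    # Propagate the mask through the rest of the video and collect per-frame results
    masks_per_frame = {}
    for frame_idx, obj_ids, mask_logits in predictor.propagate_in_video(state):
        masks_per_frame[frame_idx] = (mask_logits[0] > 0.0).cpu().numpy()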
MiniCPM-V and OmniLMM are a family of open-source large multimodal models (LMMs) adept at vision & language modeling. The models process images and text inputs and deliver high-quality text outputs. We release two featured versions that are targeted at strong performance and efficient deployment:

MiniCPM-V 2.8B: A state-of-the-art end-side large multimodal model. Our latest MiniCPM-V 2.0 can accept images of up to 1.8 million pixels (e.g., 1344x1344) at any aspect ratio and has strong OCR capability. It achieves performance comparable to Gemini Pro in understanding scene text and matches GPT-4V in preventing hallucinations.

OmniLMM 12B: The most capable version with leading performance among comparable-sized models on multiple benchmarks. The model also achieves state-of-the-art performance in trustworthy behaviors, with even less hallucination than GPT-4V.
https://github.com/OpenBMB/MiniCPM-V/blob/8a1f766b85595a8095651eed9a44a83a965b305b/README_en.md#minicpm-v-
MiniCPM-V 2.8B is a strong multimodal large language model for efficient end-side deployment. The model is built on SigLip-400M and MiniCPM-2.4B, connected by a perceiver resampler. Our latest version, MiniCPM-V 2.0, has several notable features.

🔥 State-of-the-art Performance.

MiniCPM-V 2.0 achieves state-of-the-art performance on multiple benchmarks (including OCRBench, TextVQA, MME, MMB, MathVista, etc.) among models under 7B parameters. It even outperforms the strong Qwen-VL-Chat 9.6B, CogVLM-Chat 17.4B, and Yi-VL 34B on OpenCompass, a comprehensive evaluation over 11 popular benchmarks. Notably, MiniCPM-V 2.0 shows strong OCR capability, achieving performance comparable to Gemini Pro in scene-text understanding, and state-of-the-art performance on OCRBench among open-source models.
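To ground this, a minimal single-image chat sketch for MiniCPM-V 2.0 via transformers is shown below, assuming the openbmb/MiniCPM-V-2 repo id for the 2.0 model. The model executes remote code, so the model.chat argument names and return value follow the pattern in the model card and are assumptions here.

# Sketch of single-image chat with MiniCPM-V 2.0 via trust_remote_code.
# The model.chat(...) argument names follow the model card's remote code (assumption).
import torch
from PIL import Image
from transformers import AutoModel, AutoTokenizer

repo = "openbmb/MiniCPM-V-2"  # assumed repo id for MiniCPM-V 2.0
model = AutoModel.from_pretrained(repo, trust_remote_code=True, torch_dtype=torch.bfloat16)
model = model.to(device="cuda", dtype=torch.bfloat16).eval()
tokenizer = AutoTokenizer.from_pretrained(repo, trust_remote_code=True)

image = Image.open("receipt.jpg").convert("RGB")
msgs = [{"role": "user", "content": "Read the total amount on this receipt."}]

answer = model.chat(
    image=image,
    msgs=msgs,
    context=None,
    tokenizer=tokenizer,
    sampling=True,
    temperature=0.7,
)
print(answer)  # depending on the remote code version, this may be a string or a tuple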