# Cos Stable Diffusion XL 1.0 and Cos Stable Diffusion XL 1.0 Edit

Cos Stable Diffusion XL 1.0 Base is tuned to use a Cosine-Continuous EDM VPred schedule. The most notable feature of this schedule change is its capacity to produce the full color range, from pitch black to pure white, alongside more subtle improvements to how much the image changes at each denoising step.

Cos Stable Diffusion XL 1.0 Edit is tuned to use a Cosine-Continuous EDM VPred schedule, and then upgraded to perform instructed image editing. This model takes a source image as input alongside a prompt, and interprets the prompt as an instruction for how to alter the image.
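A minimal sketch of trying these checkpoints with diffusers follows. The file names are placeholders, and the exact sigma values for the EDM v-prediction scheduler are assumptions based on typical CosXL-style configurations, not taken from this post:

```python
import torch
from diffusers import (
    EDMEulerScheduler,
    StableDiffusionXLInstructPix2PixPipeline,
    StableDiffusionXLPipeline,
)
from diffusers.utils import load_image

# Cosine-Continuous EDM v-prediction schedule; sigma values are assumed.
scheduler = EDMEulerScheduler(
    sigma_min=0.002, sigma_max=120.0, sigma_data=1.0,
    prediction_type="v_prediction",
)

# Base text-to-image model (placeholder checkpoint path).
pipe = StableDiffusionXLPipeline.from_single_file(
    "cosxl.safetensors", torch_dtype=torch.float16
).to("cuda")
pipe.scheduler = scheduler
image = pipe("a single candle in a pitch-black room").images[0]

# Edit model: source image plus an instruction prompt (placeholder path;
# the edit UNet may need extra loading config depending on diffusers version).
edit_pipe = StableDiffusionXLInstructPix2PixPipeline.from_single_file(
    "cosxl_edit.safetensors", torch_dtype=torch.float16
).to("cuda")
edit_pipe.scheduler = EDMEulerScheduler(
    sigma_min=0.002, sigma_max=120.0, sigma_data=1.0,
    prediction_type="v_prediction",
)
source = load_image("input.png")
edited = edit_pipe(prompt="make it daytime", image=source).images[0]
```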

## Usage
These were all made using the SwarmUI LoRA extract tool (a sketch of the general rank-r extraction idea follows after the entries below).

## 4thTail-v045-SDXL.safetensors
LoRA of 4th_tail_v0.4.5 extracted from Stable Diffusion XL 1.0 Base at rank 32.
note: broken

[4thTail-v045-SDXL.safetensors](https://huggingface.co/AshtakaOOf/lora-extract/resolve/main/sdxl/4thTail-v045-SDXL.safetensors?download=true)

## aidxl052-extract.safetensors
LoRA of animeIllustDiffusion_v052 extracted from Stable Diffusion XL 1.0 Base at rank 24.
note: Doesn't work
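
For context, extracting a LoRA from a fine-tune at a given rank generally means taking a truncated SVD of each layer's weight difference against the base model. A minimal sketch of that idea (not SwarmUI's actual implementation):

```python
# Rank-r LoRA extraction via truncated SVD of the weight delta between a
# fine-tuned checkpoint and its base model, shown for one weight matrix.
import torch

def extract_lora(base_w: torch.Tensor, tuned_w: torch.Tensor, rank: int = 32):
    """Return (up, down) factors such that up @ down approximates tuned_w - base_w."""
    delta = (tuned_w - base_w).float()        # difference the LoRA must reproduce
    u, s, vh = torch.linalg.svd(delta, full_matrices=False)
    u, s, vh = u[:, :rank], s[:rank], vh[:rank, :]  # keep top-`rank` directions
    up = u * s.sqrt()                         # split singular values across factors
    down = s.sqrt().unsqueeze(1) * vh
    return up, down                           # lora_up (out x r), lora_down (r x in)

# Toy example on a single 2D weight matrix.
base = torch.randn(640, 320)
tuned = base + 0.01 * torch.randn(640, 320)
up, down = extract_lora(base, tuned, rank=32)
print((up @ down - (tuned - base)).abs().max())  # approximation error
```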
Better Alignment with Instruction Back-and-Forth Translation

Abstract
We propose a new method, instruction back-and-forth translation, to construct high-quality synthetic data grounded in world knowledge for aligning large language models (LLMs).

https://arxiv.org/abs/2408.04614 #LLM
Radxa Launches New Single-Board Computers Featuring Rockchip RK3588S2 and RK3582 Chips, Starting at $30
Radxa has announced the launch of its latest single-board computers (SBCs), the Radxa ROCK 5C and the Radxa ROCK 5C Lite. These credit card-sized devices are designed to cater to various computing needs, with prices starting at just $30 for the Lite version and $50 for the standard ROCK 5C. Both models are currently available for pre-order and are set to begin shipping on April 10, 2024. #RK3588S2 #Radxa
Characteristics and capabilities of Figure 02:

Hardware:
- The exterior adopts an exoskeleton structure, integrating the power and compute wiring inside the body, improving reliability and packaging compactness.
- It is equipped with fourth-generation hands with 16 degrees of freedom and strength comparable to a human's, capable of carrying up to 25 kilograms and flexibly performing various human-like tasks.
- It has 6 RGB cameras (on the head, chest, and back) and "superhuman" vision.
- The internal battery pack capacity has increased to 2.25 kWh. The founder hopes it can achieve more than 20 hours of effective working time per day (the official site currently lists only 5 hours of battery life; the 20-hour figure is presumably the inferred limit with charging while working).
- It is driven by motors, stands 5 feet 6 inches tall, and weighs 70 kilograms.

Software and intelligence:
- It carries an on-board vision-language model (VLM), enabling rapid common-sense visual reasoning.
- On-board compute and AI inference capability have tripled compared to the previous generation, allowing many real-world AI tasks to be executed fully independently.
- It runs a speech-to-speech reasoning model custom-built by the company's investor OpenAI. The default UI is speech, and it communicates with humans through the on-board microphone and speaker. #AI
I've built a space for creating prompts for FLUX

gokaygokay/FLUX-Prompt-Generator


You can create long prompts from images or simple words, and enhance your short prompts with the prompt enhancer. You can configure various settings such as artform, photo type, character details, scene details, style, and artist to create tailored prompts.

And you can combine all of them with custom prompts using LLMs (Mixtral, Mistral, Llama 3, and Mistral-Nemo).
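If you want to reproduce the LLM-enhancement step outside the Space, a minimal sketch with huggingface_hub's InferenceClient follows. The model ID and instruction wording are illustrative choices, not the Space's internal settings:

```python
# Enhance a short image prompt with a hosted LLM via the HF Inference API.
from huggingface_hub import InferenceClient

client = InferenceClient("mistralai/Mistral-7B-Instruct-v0.3")

short_prompt = "a knight in a misty forest"
messages = [
    {"role": "system", "content": (
        "Expand short image prompts into detailed FLUX prompts covering "
        "subject, scene details, lighting, style, and artist influences."
    )},
    {"role": "user", "content": short_prompt},
]
response = client.chat_completion(messages=messages, max_tokens=256)
print(response.choices[0].message.content)  # the enhanced prompt
```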

The UI is a bit complex, but it includes almost everything you need. Choosing the random option is the most fun!

And I've created some other Spaces for using FLUX models with captioners and enhancers:

- gokaygokay/FLUX.1-dev-with-Captioner
- gokaygokay/FLUX.1-Schnell-with-Captioner
MiniCPM-Llama3-V 2.5 is the latest model in the MiniCPM-V series. The model is built on SigLip-400M and Llama3-8B-Instruct with a total of 8B parameters. It exhibits a significant performance improvement over MiniCPM-V 2.0. Notable features of MiniCPM-Llama3-V 2.5 include:

🔥 Leading Performance. MiniCPM-Llama3-V 2.5 has achieved an average score of 65.1 on OpenCompass, a comprehensive evaluation over 11 popular benchmarks. With only 8B parameters, it surpasses widely used proprietary models like GPT-4V-1106, Gemini Pro, Claude 3 and Qwen-VL-Max and greatly outperforms other Llama 3-based MLLMs.

💪 Strong OCR Capabilities. MiniCPM-Llama3-V 2.5 can process images with any aspect ratio and up to 1.8 million pixels (e.g., 1344x1344), achieving a 700+ score on OCRBench and surpassing proprietary models such as GPT-4o, GPT-4V-0409, Qwen-VL-Max and Gemini Pro. Based on recent user feedback, MiniCPM-Llama3-V 2.5 has now enhanced full-text OCR extraction, table-to-markdown conversion, and other high-utility capabilities, and has further strengthened its instruction-following and complex reasoning abilities, enhancing multimodal interaction experiences.

🏆 Trustworthy Behavior. Leveraging the latest RLAIF-V method (the newest technique in the RLHF-V [CVPR'24] series), MiniCPM-Llama3-V 2.5 exhibits more trustworthy behavior. It achieves a 10.3% hallucination rate on Object HalBench, lower than GPT-4V-1106 (13.6%), achieving the best-level performance within the open-source community. Data released.

🌏 Multilingual Support. Thanks to the strong multilingual capabilities of Llama 3 and the cross-lingual generalization technique from VisCPM, MiniCPM-Llama3-V 2.5 extends its bilingual (Chinese-English) multimodal capabilities to over 30 languages including German, French, Spanish, Italian, Korean and Japanese. All supported languages.

🚀 Efficient Deployment. MiniCPM-Llama3-V 2.5 systematically employs model quantization, CPU optimizations, NPU optimizations and compilation optimizations, achieving high-efficiency deployment on edge devices. For mobile phones with Qualcomm chips, we have integrated the NPU acceleration framework QNN into llama.cpp for the first time. After systematic optimization, MiniCPM-Llama3-V 2.5 has realized a 150-fold acceleration in multimodal large model end-side image encoding and a 3-fold increase in language decoding speed.

💫 Easy Usage. MiniCPM-Llama3-V 2.5 can be easily used in various ways: (1) llama.cpp and ollama support for efficient CPU inference on local devices, (2) GGUF format quantized models in 16 sizes, (3) efficient LoRA fine-tuning with only 2 V100 GPUs, (4) streaming output, (5) quick local WebUI demo setup with Gradio and Streamlit, and (6) interactive demos on HuggingFace Spaces.

Results are reported on TextVQA, DocVQA, OCRBench, OpenCompass MultiModal Avg, MME, MMBench, MMMU, MathVista, LLaVA Bench, RealWorld QA, and Object HalBench.

[2024.05.24] We release the MiniCPM-Llama3-V 2.5 GGUF, which supports llama.cpp inference and provides smooth decoding at 6-8 tokens/s on mobile phones. Try it now!
[2024.05.23] 🔍 We've released a comprehensive comparison between Phi-3-vision-128k-instruct and MiniCPM-Llama3-V 2.5, including benchmark evaluations, multilingual capabilities, and inference efficiency 🌟📊🌍🚀. Click here to view more details.
[2024.05.20] We open-source MiniCPM-Llama3-V 2.5. It has improved OCR capability and supports 30+ languages, representing the first end-side MLLM to achieve GPT-4V-level performance. We provide efficient inference and simple fine-tuning. Try it now!
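
For reference, the model card's trust_remote_code chat interface looks roughly like this (a sketch; treat the exact signature and the sample image path as subject to change):

```python
# Single-image chat with MiniCPM-Llama3-V 2.5 via transformers.
import torch
from PIL import Image
from transformers import AutoModel, AutoTokenizer

model_id = "openbmb/MiniCPM-Llama3-V-2_5"
model = AutoModel.from_pretrained(
    model_id, trust_remote_code=True, torch_dtype=torch.float16
).to("cuda").eval()
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)

image = Image.open("receipt.png").convert("RGB")  # placeholder image
msgs = [{"role": "user", "content": "Extract all text from this image as markdown."}]

# `chat` is provided by the model's remote code, per the model card.
answer = model.chat(image=image, msgs=msgs, tokenizer=tokenizer,
                    sampling=True, temperature=0.7)
print(answer)
```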