# Cos Stable Diffusion XL 1.0 and Cos Stable Diffusion XL 1.0 Edit

Cos Stable Diffusion XL 1.0 Base is tuned to use a Cosine-Continuous EDM VPred schedule. The most notable feature of this schedule change is its capacity to produce the full color range, from pitch black to pure white, alongside more subtle improvements to how much the image changes at each denoising step.

Cos Stable Diffusion XL 1.0 Edit is tuned to use a Cosine-Continuous EDM VPred schedule, and then upgraded to perform instructed image editing. This model takes a source image as input alongside a prompt, and interprets the prompt as an instruction for how to alter the image.
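A minimal sketch of trying these checkpoints with diffusers follows. The file names are placeholders, and the exact sigma values for the EDM v-prediction scheduler are assumptions based on typical CosXL-style configurations, not taken from this post:

```python
import torch
from diffusers import (
    EDMEulerScheduler,
    StableDiffusionXLInstructPix2PixPipeline,
    StableDiffusionXLPipeline,
)
from diffusers.utils import load_image

# Cosine-Continuous EDM v-prediction schedule; sigma values are assumed.
scheduler = EDMEulerScheduler(
    sigma_min=0.002, sigma_max=120.0, sigma_data=1.0,
    prediction_type="v_prediction",
)

# Base text-to-image model (placeholder checkpoint path).
pipe = StableDiffusionXLPipeline.from_single_file(
    "cosxl.safetensors", torch_dtype=torch.float16
).to("cuda")
pipe.scheduler = scheduler
image = pipe("a single candle in a pitch-black room").images[0]

# Edit model: source image plus an instruction prompt (placeholder path;
# the edit UNet may need extra loading config depending on diffusers version).
edit_pipe = StableDiffusionXLInstructPix2PixPipeline.from_single_file(
    "cosxl_edit.safetensors", torch_dtype=torch.float16
).to("cuda")
edit_pipe.scheduler = EDMEulerScheduler(
    sigma_min=0.002, sigma_max=120.0, sigma_data=1.0,
    prediction_type="v_prediction",
)
source = load_image("input.png")
edited = edit_pipe(prompt="make it daytime", image=source).images[0]
```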

## Usage
These were all made using the SwarmUI LoRA extract tool (a sketch of the general rank-r extraction idea follows after the entries below).

## 4thTail-v045-SDXL.safetensors
LoRA of 4th_tail_v0.4.5 extracted from Stable Diffusion XL 1.0 Base at rank 32.
note: broken

[4thTail-v045-SDXL.safetensors](https://huggingface.co/AshtakaOOf/lora-extract/resolve/main/sdxl/4thTail-v045-SDXL.safetensors?download=true)

## aidxl052-extract.safetensors
LoRA of animeIllustDiffusion_v052 extracted from Stable Diffusion XL 1.0 Base at rank 24.
note: Doesn't work
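
For context, extracting a LoRA from a fine-tune at a given rank generally means taking a truncated SVD of each layer's weight difference against the base model. A minimal sketch of that idea (not SwarmUI's actual implementation):

```python
# Rank-r LoRA extraction via truncated SVD of the weight delta between a
# fine-tuned checkpoint and its base model, shown for one weight matrix.
import torch

def extract_lora(base_w: torch.Tensor, tuned_w: torch.Tensor, rank: int = 32):
    """Return (up, down) factors such that up @ down approximates tuned_w - base_w."""
    delta = (tuned_w - base_w).float()        # difference the LoRA must reproduce
    u, s, vh = torch.linalg.svd(delta, full_matrices=False)
    u, s, vh = u[:, :rank], s[:rank], vh[:rank, :]  # keep top-`rank` directions
    up = u * s.sqrt()                         # split singular values across factors
    down = s.sqrt().unsqueeze(1) * vh
    return up, down                           # lora_up (out x r), lora_down (r x in)

# Toy example on a single 2D weight matrix.
base = torch.randn(640, 320)
tuned = base + 0.01 * torch.randn(640, 320)
up, down = extract_lora(base, tuned, rank=32)
print((up @ down - (tuned - base)).abs().max())  # approximation error
```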
Better Alignment with Instruction Back-and-Forth Translation

Abstract
We propose a new method, instruction back-and-forth translation, to construct high-quality synthetic data grounded in world knowledge for aligning large language models (LLMs).

https://arxiv.org/abs/2408.04614 #LLM
Radxa Launches New Single-Board Computers Featuring Rockchip RK3588S2 and RK3582 Chips, Starting at $30
Radxa has announced the launch of its latest single-board computers (SBCs), the Radxa ROCK 5C and the Radxa ROCK 5C Lite. These credit card-sized devices are designed to cater to various computing needs, with prices starting at just $30 for the Lite version and $50 for the standard ROCK 5C. Both models are currently available for pre-order and are set to begin shipping on April 10, 2024. #RK3588S2 #Radxa
Characteristics and capabilities of Figure 02:

Hardware:
- The exterior adopts an exoskeleton structure, integrating the power and compute wiring inside the body, improving reliability and packaging compactness.
- It is equipped with fourth-generation hands with 16 degrees of freedom and strength comparable to a human's, capable of carrying up to 25 kilograms and flexibly performing various human-like tasks.
- It has 6 RGB cameras (on the head, chest, and back) and "superhuman" vision.
- The internal battery pack capacity has increased to 2.25 kWh. The founder hopes it can achieve more than 20 hours of effective working time per day (the official site currently lists only 5 hours of battery life; the 20-hour figure is presumably the inferred limit with charging while working).
- It is driven by motors, stands 5 feet 6 inches tall, and weighs 70 kilograms.

Software and intelligence:
- It carries an on-board vision-language model (VLM), enabling rapid common-sense visual reasoning.
- On-board compute and AI inference capability have tripled compared to the previous generation, allowing many real-world AI tasks to be executed fully independently.
- It runs a speech-to-speech reasoning model custom-built by the company's investor OpenAI. The default UI is speech, and it communicates with humans through the on-board microphone and speaker. #AI
I've built a space for creating prompts for FLUX

gokaygokay/FLUX-Prompt-Generator


You can create long prompts from images or simple words, and enhance your short prompts with the prompt enhancer. You can configure various settings such as artform, photo type, character details, scene details, style, and artist to create tailored prompts.

And you can combine all of them with custom prompts using LLMs (Mixtral, Mistral, Llama 3, and Mistral-Nemo).
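If you want to reproduce the LLM-enhancement step outside the Space, a minimal sketch with huggingface_hub's InferenceClient follows. The model ID and instruction wording are illustrative choices, not the Space's internal settings:

```python
# Enhance a short image prompt with a hosted LLM via the HF Inference API.
from huggingface_hub import InferenceClient

client = InferenceClient("mistralai/Mistral-7B-Instruct-v0.3")

short_prompt = "a knight in a misty forest"
messages = [
    {"role": "system", "content": (
        "Expand short image prompts into detailed FLUX prompts covering "
        "subject, scene details, lighting, style, and artist influences."
    )},
    {"role": "user", "content": short_prompt},
]
response = client.chat_completion(messages=messages, max_tokens=256)
print(response.choices[0].message.content)  # the enhanced prompt
```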

The UI is a bit complex, but it includes almost everything you need. Choosing the random option is the most fun!

And I've created some other Spaces for using FLUX models with captioners and enhancers:

- gokaygokay/FLUX.1-dev-with-Captioner
- gokaygokay/FLUX.1-Schnell-with-Captioner
MiniCPM-Llama3-V 2.5 is the latest model in the MiniCPM-V series. The model is built on SigLip-400M and Llama3-8B-Instruct with a total of 8B parameters. It exhibits a significant performance improvement over MiniCPM-V 2.0. Notable features of MiniCPM-Llama3-V 2.5 include:

🔥 Leading Performance. MiniCPM-Llama3-V 2.5 has achieved an average score of 65.1 on OpenCompass, a comprehensive evaluation over 11 popular benchmarks. With only 8B parameters, it surpasses widely used proprietary models like GPT-4V-1106, Gemini Pro, Claude 3 and Qwen-VL-Max and greatly outperforms other Llama 3-based MLLMs.

💪 Strong OCR Capabilities. MiniCPM-Llama3-V 2.5 can process images with any aspect ratio and up to 1.8 million pixels (e.g., 1344x1344), achieving a 700+ score on OCRBench and surpassing proprietary models such as GPT-4o, GPT-4V-0409, Qwen-VL-Max and Gemini Pro. Based on recent user feedback, MiniCPM-Llama3-V 2.5 has now enhanced full-text OCR extraction, table-to-markdown conversion, and other high-utility capabilities, and has further strengthened its instruction-following and complex reasoning abilities, enhancing multimodal interaction experiences.

🏆 Trustworthy Behavior. Leveraging the latest RLAIF-V method (the newest technique in the RLHF-V [CVPR'24] series), MiniCPM-Llama3-V 2.5 exhibits more trustworthy behavior. It achieves a 10.3% hallucination rate on Object HalBench, lower than GPT-4V-1106 (13.6%), achieving the best-level performance within the open-source community. Data released.

🌏 Multilingual Support. Thanks to the strong multilingual capabilities of Llama 3 and the cross-lingual generalization technique from VisCPM, MiniCPM-Llama3-V 2.5 extends its bilingual (Chinese-English) multimodal capabilities to over 30 languages including German, French, Spanish, Italian, Korean and Japanese. All supported languages.

🚀 Efficient Deployment. MiniCPM-Llama3-V 2.5 systematically employs model quantization, CPU optimizations, NPU optimizations and compilation optimizations, achieving high-efficiency deployment on edge devices. For mobile phones with Qualcomm chips, we have integrated the NPU acceleration framework QNN into llama.cpp for the first time. After systematic optimization, MiniCPM-Llama3-V 2.5 has realized a 150-fold acceleration in multimodal large model end-side image encoding and a 3-fold increase in language decoding speed.

💫 Easy Usage. MiniCPM-Llama3-V 2.5 can be easily used in various ways: (1) llama.cpp and ollama support for efficient CPU inference on local devices, (2) GGUF format quantized models in 16 sizes, (3) efficient LoRA fine-tuning with only 2 V100 GPUs, (4) streaming output, (5) quick local WebUI demo setup with Gradio and Streamlit, and (6) interactive demos on HuggingFace Spaces.

Results are reported on TextVQA, DocVQA, OCRBench, OpenCompass MultiModal Avg, MME, MMBench, MMMU, MathVista, LLaVA Bench, RealWorld QA, and Object HalBench.

[2024.05.24] We release the MiniCPM-Llama3-V 2.5 GGUF, which supports llama.cpp inference and provides smooth decoding at 6-8 tokens/s on mobile phones. Try it now!
[2024.05.23] 🔍 We've released a comprehensive comparison between Phi-3-vision-128k-instruct and MiniCPM-Llama3-V 2.5, including benchmark evaluations, multilingual capabilities, and inference efficiency 🌟📊🌍🚀. Click here to view more details.
[2024.05.20] We open-source MiniCPM-Llama3-V 2.5. It has improved OCR capability and supports 30+ languages, representing the first end-side MLLM to achieve GPT-4V-level performance. We provide efficient inference and simple fine-tuning. Try it now!
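
For reference, the model card's trust_remote_code chat interface looks roughly like this (a sketch; treat the exact signature and the sample image path as subject to change):

```python
# Single-image chat with MiniCPM-Llama3-V 2.5 via transformers.
import torch
from PIL import Image
from transformers import AutoModel, AutoTokenizer

model_id = "openbmb/MiniCPM-Llama3-V-2_5"
model = AutoModel.from_pretrained(
    model_id, trust_remote_code=True, torch_dtype=torch.float16
).to("cuda").eval()
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)

image = Image.open("receipt.png").convert("RGB")  # placeholder image
msgs = [{"role": "user", "content": "Extract all text from this image as markdown."}]

# `chat` is provided by the model's remote code, per the model card.
answer = model.chat(image=image, msgs=msgs, tokenizer=tokenizer,
                    sampling=True, temperature=0.7)
print(answer)
```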