Comprehensive set of quantization algorithms for weight-only and activation quantization
🤗 Seamless integration with Hugging Face models
📁 Safetensors-based format for vLLM compatibility
🦙 Large model support via HF accelerate, including Llama 405B
https://github.com/vllm-project/llm-compressor