Llama-405b runs on CPU.
Getting 1.67 tokens/s output
and 10 tokens (words) per second input, without a GPU.
Slow but usable: I'm summarizing a 2-hour medtech discussion with it. Will upload the 2-bit optimized versions etc. here:
https://huggingface.co/nisten/meta-405b-instruct-cpu-optimized-gguf/tree/main
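A rough back-of-the-envelope sketch of what those two rates mean in wall-clock time. It assumes prompt processing and generation run one after the other at a constant speed (10 tok/s in, 1.67 tok/s out) and ignores model load time. The token counts in the example are hypothetical, not measured from the medtech transcript.

```python
def estimate_seconds(prompt_tokens: int, output_tokens: int,
                     input_tps: float = 10.0, output_tps: float = 1.67) -> float:
    """Estimate total time as prompt-processing time plus generation time.

    Assumes constant throughput at the quoted rates; model load time and
    any slowdown at long context lengths are ignored.
    """
    if input_tps <= 0 or output_tps <= 0:
        raise ValueError("throughput must be positive")
    return prompt_tokens / input_tps + output_tokens / output_tps


# Hypothetical example: a 20,000-token transcript and a 500-token summary.
total = estimate_seconds(20_000, 500)
print(f"{total / 60:.1f} minutes")  # prints "38.3 minutes"
```

Most of that time goes to reading the prompt (2,000 s) rather than writing the summary (about 300 s), so at these speeds a long input is the main cost.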