Supercool Weekend Read🤖
Nvidia researchers achieved state-of-the-art LLM compression results using pruning and knowledge distillation techniques.

Details on Techniques (Simplified):
They started off with a large pre-trained language model (15B params), then:

1. Estimated the importance of different parts of the model (neurons, attention heads, layers) using activation-based metrics on a small calibration dataset (see the sketch after this list).
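
To make step 1 concrete, here is a minimal sketch of activation-based importance scoring in PyTorch: run a small calibration set through the model and rank each neuron by its mean absolute activation. The hook targets, metric, and data shapes are illustrative assumptions, not the paper's exact recipe.

```python
import torch
import torch.nn as nn

def neuron_importance(model: nn.Module, calib_loader, device="cpu"):
    """Score each hidden neuron by its mean absolute activation
    over a calibration set (one common activation-based metric)."""
    scores = {}   # layer name -> running per-neuron importance
    counts = {}   # layer name -> number of batches seen
    hooks = []

    def make_hook(name):
        def hook(module, inputs, output):
            # Average |activation| over batch/sequence dims,
            # leaving one score per neuron (last dimension).
            act = output.detach().abs().float()
            per_neuron = act.mean(dim=tuple(range(act.dim() - 1)))
            scores[name] = scores.get(name, 0) + per_neuron
            counts[name] = counts.get(name, 0) + 1
        return hook

    # Hook every linear layer for simplicity; a real run would
    # target specific MLP/attention projections instead.
    for name, module in model.named_modules():
        if isinstance(module, nn.Linear):
            hooks.append(module.register_forward_hook(make_hook(name)))

    model.eval()
    with torch.no_grad():
        for batch in calib_loader:   # assumed to yield input tensors
            model(batch.to(device))

    for h in hooks:
        h.remove()
    # Higher score = more important; low-scoring neurons are
    # candidates for pruning.
    return {n: s / counts[n] for n, s in scores.items()}
```

Because only forward passes over a small calibration set are needed, this kind of scoring is cheap compared to retraining, which is what makes pruning-then-distillation practical at the 15B scale.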