Supercool Weekend Read🤖
Nvidia researchers achieved SOTA LLM compression results using structured pruning and knowledge distillation techniques.
Details on Techniques (Simplified):
They started off with a large pre-trained language model (15B params), then:
1. Estimated the importance of different parts of the model (neurons, attention heads, layers) using activation-based metrics on a small calibration dataset (a minimal sketch of this idea follows below).
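To make step 1 concrete, here is a minimal PyTorch sketch of activation-based importance scoring for a single MLP block. The hook point (the hidden activation), the mean-absolute-activation aggregation, and all shapes are illustrative assumptions, not the exact recipe from the paper:

```python
# Sketch: score each hidden neuron by its average activation magnitude
# over a small calibration set, then keep the highest-scoring neurons.
# The model, shapes, and aggregation here are hypothetical placeholders.
import torch
import torch.nn as nn

torch.manual_seed(0)

# Toy stand-in for one transformer MLP block.
mlp = nn.Sequential(nn.Linear(64, 256), nn.ReLU(), nn.Linear(256, 64))

scores = torch.zeros(256)  # one importance score per hidden neuron
n_batches = 0

def accumulate(module, inputs, output):
    # Aggregate |activation| per neuron over the batch and sequence dims.
    global n_batches
    scores.add_(output.detach().abs().mean(dim=(0, 1)))
    n_batches += 1

handle = mlp[1].register_forward_hook(accumulate)

# Tiny calibration pass (random tensors here; real use would feed text batches).
for _ in range(8):
    calib = torch.randn(4, 32, 64)  # (batch, seq_len, hidden)
    mlp(calib)
handle.remove()

importance = scores / n_batches
# Prune the lowest-scoring half, keeping the top 128 neurons.
keep = importance.topk(128).indices.sort().values
print("neurons kept:", keep[:10], "...")
```

The same scoring idea extends to attention heads and whole layers by hooking their outputs instead; because the scores come from a forward pass only, no gradients or retraining are needed at this stage.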