PyTorch implementation of the self-compression and differentiable quantization algorithm introduced in the paper “Self-Compressing Neural Networks”.
The algorithm compresses the network dynamically during training: it reduces the size of the weight and activation tensors along with the number of bits needed to represent each weight.
In short, the network shrinks as it trains, without compromising performance, which reduces both compute and inference cost.
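
Below is a minimal sketch (not the code from this repo) of how such a differentiable quantizer can be written in PyTorch. It assumes a single per-tensor bit depth `b` and exponent `e` (the paper uses per-channel parameters) and a straight-through estimator for the non-differentiable round; the `gamma` coefficient and the exact form of the size penalty are illustrative assumptions.

```python
import torch
import torch.nn as nn

class DiffQuant(nn.Module):
    """Differentiable quantizer with learnable bit depth `b` and exponent `e`.

    Rounding uses a straight-through estimator so gradients flow back to
    `b`, `e`, and the weights, letting training trade accuracy for size.
    """

    def __init__(self):
        super().__init__()
        self.b = nn.Parameter(torch.tensor(8.0))  # bit depth (relaxed to a float)
        self.e = nn.Parameter(torch.tensor(0.0))  # exponent of the scale 2^e

    def forward(self, w: torch.Tensor) -> torch.Tensor:
        # q(w) = 2^e * clamp(round(w / 2^e), -2^(b-1), 2^(b-1) - 1)
        scale = 2.0 ** self.e
        x = w / scale
        # Straight-through round: identity gradient through round().
        x = x + (torch.round(x) - x).detach()
        x = torch.clamp(x, -(2.0 ** (self.b - 1)), 2.0 ** (self.b - 1) - 1)
        return x * scale

# Usage sketch: quantize a layer's weights on the forward pass and add a
# size penalty (here simply the relaxed bit depth, weighted by a
# hypothetical gamma) to the task loss.
quant = DiffQuant()
layer = nn.Linear(64, 32)
w_q = quant(layer.weight)
out = torch.randn(8, 64) @ w_q.t() + layer.bias
gamma = 0.01  # illustrative compression/accuracy trade-off coefficient
loss = out.pow(2).mean() + gamma * torch.relu(quant.b)
loss.backward()  # gradients reach the weights, b, and e
```

Because `b` receives gradient from the size penalty, training can push bit depths toward zero where precision is not needed, which is what drives the compression described above.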