SmolLM - blazingly fast and remarkably powerful
TL;DR
This blog post introduces SmolLM, a family of state-of-the-art small models with 135M, 360M, and 1.7B parameters, trained on a new high-quality dataset. It covers data curation, model evaluation, and usage.

Introduction
There is increasing interest in small language models that can operate on local devices. This trend involves techniques such as distillation or quantization to compress large models, as well as training small models from scratch on large datasets. These approaches enable novel applications while dramatically reducing inference costs and improving user privacy.

Microsoft's Phi series, Alibaba's Qwen2 (less than 2B), and Meta's MobileLLM demonstrate that small models can achieve impressive results when designed and trained thoughtfully. However, most of the details about the data curation and training of these models are not publicly available.