๐—”๐—œ๐Ÿฎ๐Ÿญ ๐—ถ๐˜๐—ฒ๐—ฟ๐—ฎ๐˜๐—ฒ๐˜€ ๐˜„๐—ถ๐˜๐—ต ๐—ป๐—ฒ๐˜„ ๐—๐—ฎ๐—บ๐—ฏ๐—ฎ ๐Ÿญ.๏ฟฝ... | ๐—”๐—œ๐Ÿฎ๐Ÿญ ๐—ถ๐˜๐—ฒ๐—ฟ๐—ฎ๐˜๐—ฒ๐˜€ ๐˜„๐—ถ๐˜๐—ต ๐—ป๐—ฒ๐˜„ ๐—๐—ฎ๐—บ๐—ฏ๐—ฎ ๐Ÿญ.๏ฟฝ...
๐—”๐—œ๐Ÿฎ๐Ÿญ ๐—ถ๐˜๐—ฒ๐—ฟ๐—ฎ๐˜๐—ฒ๐˜€ ๐˜„๐—ถ๐˜๐—ต ๐—ป๐—ฒ๐˜„ ๐—๐—ฎ๐—บ๐—ฏ๐—ฎ ๐Ÿญ.๐Ÿฑ ๐—ฟ๐—ฒ๐—น๐—ฒ๐—ฎ๐˜€๐—ฒ: ๐—ก๐—ฒ๐˜„ ๐˜€๐˜๐—ฎ๐—ป๐—ฑ๐—ฎ๐—ฟ๐—ฑ ๐—ณ๐—ผ๐—ฟ ๐—น๐—ผ๐—ป๐—ด-๐—ฐ๐—ผ๐—ป๐˜๐—ฒ๐˜…๐˜ ๐˜‚๐˜€๐—ฒ-๐—ฐ๐—ฎ๐˜€๐—ฒ๐˜€!๐Ÿ…

@ai21labs used a different architecture to beat the status-quo Transformer models: the Jamba architecture combines classic Transformer (attention) layers with the newer Mamba layers, whose cost grows linearly (instead of quadratically) with context length.
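To make the linear-vs-quadratic point concrete, here is a toy comparison that only counts work. It is a minimal sketch, not Jamba or Mamba code: the "attention" runs on scalar tokens, and the "Mamba-like" layer is a plain linear recurrence with fixed scalar parameters `a`, `b`, `c` (real Mamba layers use input-dependent, vector-valued parameters). The function names are hypothetical.

```python
import math
from typing import List, Tuple

def causal_attention(xs: List[float]) -> Tuple[List[float], int]:
    """Toy causal self-attention over scalar tokens.

    Token t is compared with every token 0..t, so the number of
    query-key scores grows quadratically with sequence length.
    """
    out: List[float] = []
    n_scores = 0
    for t, q in enumerate(xs):
        keys = xs[: t + 1]                  # causal mask: only past and current tokens
        scores = [q * k for k in keys]      # query-key products (scalars in this toy)
        n_scores += len(scores)
        m = max(scores)
        weights = [math.exp(s - m) for s in scores]  # numerically stable softmax
        z = sum(weights)
        out.append(sum(w * v for w, v in zip(weights, keys)) / z)
    return out, n_scores

def linear_recurrence(xs: List[float], a: float = 0.9, b: float = 1.0,
                      c: float = 1.0) -> Tuple[List[float], int]:
    """Toy state-space-style scan: h_t = a*h_{t-1} + b*x_t, y_t = c*h_t.

    Each token costs one fixed-size state update, so work grows linearly
    and the state does not grow with sequence length.
    """
    h = 0.0
    out: List[float] = []
    n_updates = 0
    for x in xs:
        h = a * h + b * x
        n_updates += 1
        out.append(c * h)
    return out, n_updates

if __name__ == "__main__":
    for n in (100, 200, 400):
        xs = [math.sin(i) for i in range(n)]
        _, attn_ops = causal_attention(xs)
        _, scan_ops = linear_recurrence(xs)
        print(n, attn_ops, scan_ops)  # attention: n*(n+1)/2 scores, scan: n updates
```

Doubling the sequence length roughly quadruples the attention scores (5050, 20100, 80200) but only doubles the recurrent updates (100, 200, 400), which is why keeping most layers Mamba-style pays off on long prompts.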

What does this imply?

โžก๏ธ Jamba models are much more efficient for long contexts: faster (up to 2.5x faster for long context), takes less memory, and also performs better to recall everything in the prompt.

That means it's a new go-to model for RAG or agentic applications!

And the performance is not too shabby: Jamba 1.5 models perform comparably to similar-sized Llama-3.1 models! The largest model even outperforms Llama-3.1 405B on Arena-Hard.

โœŒ๏ธ Comes in 2 sizes: Mini (12B active/52B) and Large (94B active/399B)
๐Ÿ“ Both deliver 256k context length, for low memory: Jamba-1.5 mini fits 140k context length on one single A100.
โš™๏ธ New quanttization method: Experts Int8 quantizes only the weights parts of the MoE layers, which account for 85% of weights
๐Ÿค– Natively supports JSON format generation & function calling.
๐Ÿ”“ Permissive license *if your org makes <$50M revenue*
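The idea behind Experts Int8, storing only the MoE expert weights in int8 while every other tensor keeps its original precision, can be sketched as below. This is a toy illustration, not AI21's implementation: the per-row symmetric int8 scheme, the `"experts"` naming convention used to select tensors, and all helper names are hypothetical.

```python
from typing import Dict, List, Tuple

Matrix = List[List[float]]
QTensor = Tuple[List[List[int]], List[float]]  # (int8 values, one scale per row)

def quantize_int8_per_row(w: Matrix) -> QTensor:
    """Symmetric per-row int8 quantization: w is approx. q * scale, q in [-127, 127]."""
    q_rows, scales = [], []
    for row in w:
        max_abs = max(abs(v) for v in row)
        scale = max_abs / 127 if max_abs > 0 else 1.0  # avoid division by zero on all-zero rows
        q_rows.append([max(-127, min(127, round(v / scale))) for v in row])
        scales.append(scale)
    return q_rows, scales

def dequantize(qt: QTensor) -> Matrix:
    """Recover approximate float weights from int8 values and per-row scales."""
    q_rows, scales = qt
    return [[v * s for v in row] for row, s in zip(q_rows, scales)]

def quantize_experts_only(
    weights: Dict[str, Matrix], expert_tag: str = "experts"
) -> Tuple[Dict[str, QTensor], Dict[str, Matrix]]:
    """Quantize tensors whose name contains `expert_tag`; keep all others unchanged."""
    quantized: Dict[str, QTensor] = {}
    kept: Dict[str, Matrix] = {}
    for name, w in weights.items():
        if expert_tag in name:
            quantized[name] = quantize_int8_per_row(w)
        else:
            kept[name] = w
    return quantized, kept

if __name__ == "__main__":
    # Hypothetical parameter names, loosely mimicking a hybrid MoE checkpoint.
    weights = {
        "layers.0.mamba.in_proj": [[0.5, -0.25], [0.1, 0.2]],
        "layers.1.moe.experts.0.w1": [[1.0, -0.5, 0.25], [0.03, -0.02, 0.01]],
    }
    quantized, kept = quantize_experts_only(weights)
    print(sorted(quantized), sorted(kept))
    print(dequantize(quantized["layers.1.moe.experts.0.w1"]))
```

Since the expert weights dominate the parameter count in an MoE model, shrinking only those tensors captures most of the memory saving while leaving the attention and Mamba layers untouched.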

Available on the Hub 👉
ai21labs/jamba-15-66c44befa474a917fcf55251

Read their release blog post 👉 https://www.ai21.com/blog/announcing-jamba-model-family