Google just dropped Gemma 2 2B!
> Scores higher than GPT-3.5 and Mixtral 8x7B on the LMSYS Chatbot Arena
> MMLU: 56.1 & MBPP: 36.6
> Beats the previous Gemma 1 2B by more than 10% on benchmarks
> 2.6B parameters, multilingual
> 2 trillion training tokens
> Distilled from Gemma 2 27B (?)
> Trained on 512 TPU v5e
Smaller models beating models that are orders of magnitude bigger!
Very cool direction, and so many nice distillation ablations in the report, too!
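For context, the core idea behind distillation is that instead of training only on one-hot next tokens, the small student model is trained to match the token distribution predicted by a larger teacher. Below is a minimal, hypothetical PyTorch sketch of that standard objective; the exact recipe used for Gemma 2 2B (teacher size, temperature, data mix) may differ from this.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, temperature=1.0):
    """Token-level knowledge distillation loss (generic sketch, not the
    exact Gemma 2 objective).

    Both logit tensors have shape (batch, seq_len, vocab_size): the student
    is trained to match the teacher's softened next-token distribution.
    """
    # Softened log-probabilities for teacher and student
    t_log_probs = F.log_softmax(teacher_logits / temperature, dim=-1)
    s_log_probs = F.log_softmax(student_logits / temperature, dim=-1)
    # KL(teacher || student), averaged over the batch dimension
    kl = F.kl_div(s_log_probs, t_log_probs, log_target=True, reduction="batchmean")
    # Standard T^2 scaling keeps gradient magnitudes comparable across temperatures
    return kl * (temperature ** 2)
```

In practice this term is typically combined with (or replaces) the usual cross-entropy loss on the ground-truth tokens, and the teacher's logits are computed with no gradient tracking.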
Kudos to Google & DeepMind for their continued commitment to open models and open science!