Welcome FalconMamba 7B - a permissively licensed State Space Model trained on 5.5T tokens, with near-infinite sequence length for pre-fill! 🔥
> Base model only - scores higher than Llama 3.1 8B on ARC, GSM8K & TruthfulQA
> Also, beats L3.1 8B on MUSR and GPQA
> Trained on 256 H100 GPUs (32 nodes of 8x H100) on AWS using 3D parallelism w/ ZeRO (took 2 months)
> Trained on 5.5T tokens (ctx len scaled from 4096 to 8192 during training)
> Uses the same tokenizer as Falcon 7B & 11B
> Released under the Apache 2.0 license w/ an acceptable use policy
> Integrated w/ transformers 🤗 (quick-start sketch below)
Kudos to TII for a brilliant base model! Now go fine-tune it y'all!
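Since it's already integrated with transformers, getting generations out takes the standard few lines. A minimal sketch, assuming the Hugging Face model id `tiiuae/falcon-mamba-7b` and a transformers version recent enough to include FalconMamba support:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "tiiuae/falcon-mamba-7b"  # assumed HF model id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # bf16 keeps the 7B weights at roughly 14 GB
    device_map="auto",           # requires accelerate; places layers on available GPUs
)

# SSMs carry a fixed-size state per step, so long prompts pre-fill
# without the growing KV cache an attention model would need.
prompt = "The advantage of state space models over attention is"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```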
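And to act on that fine-tuning nudge: a rough parameter-efficient sketch with peft. The `target_modules` names below are an assumption (typical Mamba-block projections), so inspect the checkpoint's actual module names before reusing this:

```python
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("tiiuae/falcon-mamba-7b")

lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["in_proj", "out_proj"],  # assumption: Mamba-style projection layers
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # sanity check: only the LoRA adapters are trainable
```

LoRA keeps the 7B base frozen, so you only train (and ship) a few million adapter weights on top of the Apache 2.0 base.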