Welcome FalconMamba 7B - permissively licensed, State Space Model trained on 5.5T tokens, near infinite sequential pre-fill! 🔥

> Base model only - scores higher than Llama 3.1 8B on ARC, GSM8K & TruthfulQA
> Also, beats L3.1 8B on MUSR and GPQA
> Trained on 256 8x H100s on AWS using 3D parallelism w/ ZeRO (took 2 months)
> Trained on 5.5 T tokens (4096 -> 8192 ctx len)
> Uses the same tokenizer as Falcon 7B & 11B
> Released under the Apache 2.0 license w/ acceptable use policy
> Integrated w/ transformers 🤗
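Since it ships integrated with transformers, getting a generation out of it is the usual AutoModel two-liner. A minimal sketch, assuming the checkpoint lives at `tiiuae/falcon-mamba-7b` on the Hub (the exact repo id and your hardware setup may differ; first call downloads the full weights):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer


def generate(prompt: str,
             model_id: str = "tiiuae/falcon-mamba-7b",  # assumed Hub repo id
             max_new_tokens: int = 64) -> str:
    """Load FalconMamba via transformers and generate a continuation."""
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id)
    inputs = tokenizer(prompt, return_tensors="pt")
    # Standard greedy generation; swap in do_sample=True etc. as needed.
    outputs = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(outputs[0], skip_special_tokens=True)


if __name__ == "__main__":
    print(generate("The key advantage of state space models is"))
```

Because it's a base model (no chat template), prompt it as a plain text-completion model and fine-tune for instruction following.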

Kudos to TII for a brilliant base model! Now go fine-tune it y'all!