Llama 3.1 405B released. 🎏 MagPie-Ultra is the first ope... | Llama 3.1 405B released. 🎏 MagPie-Ultra is the first ope...
Llama 3.1 405B released. 🎏 MagPie-Ultra is the first open dataset using Llama 3.1 405B-Instruct FP8 to generate 50,000 synthetic instruction pairs using the MagPie recipe and
@argilla_io
distilabel. It includes challenging instructions for coding math, data analysis, creative writing, advice seeking, or Brainstorming. βš—οΈ

MagPie datasets are created by prompting LLMs with "empty" prompts that consist only of starting special tokens, allowing the model to auto-regressively generate user queries and corresponding responses, which are then filtered to select high-quality data. πŸ‘¨β€πŸŽ“

Note: The dataset is unfiltered but includes quality & difficulty scores, embeddings, topics, and safety scores from ArmorRM and LlamaGuard. πŸ›‘

βš—οΈ Pipeline: https://huggingface.co/datasets/argilla/magpie-ultra-v0.1/blob/main/pipeline.py
πŸ€— Dataset: https://huggingface.co/datasets/argilla/magpie-ultra-v0.1