Google paper: Scaling up inference compute beats 14x larger models
Remember scaling laws? These are empirical laws that say "the bigger your model, the better it gets". More precisely, "as your compute increases exponentially, loss decreases linearly": loss follows a power law in training compute, so each constant multiple of compute cuts the loss by a roughly constant factor (a straight line on a log-log plot). They have wild implications, suggesting that spending 100x more training compute could get you super-LLMs. That's why companies are racing to build the biggest AI superclusters ever: Meta, for instance, bought 350k H100 GPUs, which probably cost on the order of $10B.
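To see what that shape means in numbers, here is a minimal sketch of a power-law scaling curve. The constants a and alpha below are illustrative placeholders, not the fitted values from any scaling-law paper:

```python
import numpy as np

# Illustrative power law: loss(C) = a * C**(-alpha).
# a and alpha are made-up constants for this sketch, not fitted values.
a, alpha = 10.0, 0.05

compute = np.array([1e20, 1e22, 1e24])   # training FLOPs, 100x jumps
loss = a * compute ** (-alpha)

for c, l in zip(compute, loss):
    print(f"{c:.0e} FLOPs -> loss {l:.3f}")
# 1e+20 -> 1.000, 1e+22 -> 0.794, 1e+24 -> 0.631
# Each 100x of compute multiplies loss by the same factor (100**-0.05 ≈ 0.794):
# exponential growth in compute, steady decrease in loss on a log-log plot.
```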
But think of this: we're building huge reasoning machines, but we only ask them to do a single pass through the model.
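To make the contrast concrete, here is a minimal sketch of the simplest way to spend more inference compute instead: sample several answers and keep the majority. The generate() function is a hypothetical stand-in for a sampled LLM call, not an API from the paper:

```python
import random
from collections import Counter

def generate(question: str) -> str:
    # Hypothetical stand-in for one sampled forward pass through an LLM;
    # a real call would decode with temperature > 0 so samples can differ.
    return random.choice(["4", "4", "4", "5"])

def majority_vote(question: str, n_samples: int = 16) -> str:
    # Spend n_samples passes instead of one, then keep the most common answer.
    answers = [generate(question) for _ in range(n_samples)]
    return Counter(answers).most_common(1)[0][0]

print(majority_vote("What is 2 + 2?"))  # usually "4"
```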