๐—š๐—ผ๐—ผ๐—ด๐—น๐—ฒ ๐—ฝ๐—ฎ๐—ฝ๐—ฒ๐—ฟ : ๐˜€๐—ฐ๐—ฎ๐—น๐—ถ๐—ป๐—ด ๐˜‚๐—ฝ ๐—ถ๐—ป๐—ณ๐—ฒ๐—ฟ๏ฟฝ... | ๐—š๐—ผ๐—ผ๐—ด๐—น๐—ฒ ๐—ฝ๐—ฎ๐—ฝ๐—ฒ๐—ฟ : ๐˜€๐—ฐ๐—ฎ๐—น๐—ถ๐—ป๐—ด ๐˜‚๐—ฝ ๐—ถ๐—ป๐—ณ๐—ฒ๐—ฟ๏ฟฝ...
๐—š๐—ผ๐—ผ๐—ด๐—น๐—ฒ ๐—ฝ๐—ฎ๐—ฝ๐—ฒ๐—ฟ : ๐˜€๐—ฐ๐—ฎ๐—น๐—ถ๐—ป๐—ด ๐˜‚๐—ฝ ๐—ถ๐—ป๐—ณ๐—ฒ๐—ฟ๐—ฒ๐—ป๐—ฐ๐—ฒ ๐—ฐ๐—ผ๐—บ๐—ฝ๐˜‚๐˜๐—ฒ ๐—ฏ๐—ฒ๐—ฎ๐˜๐˜€ ๐Ÿญ๐Ÿฐ๐˜… ๐—น๐—ฎ๐—ฟ๐—ด๐—ฒ๐—ฟ ๐—บ๐—ผ๐—ฑ๐—ฒ๐—น๐˜€ ๐Ÿš€

Remember scaling laws? These are empirical laws that say "the bigger your model, the better it gets". More precisely: as you scale training compute exponentially, the loss falls off as a power law, which traces a straight line on a log-log plot. They have wild implications, suggesting that spending 100x more training compute would hand you super-LLMs. That's why companies are racing to build the biggest AI superclusters ever, and why Meta bought 350k H100 GPUs, which probably cost on the order of $10B.
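To pin down what "exponential compute, linear loss" means, here is a minimal Python sketch of a power-law scaling curve, loss = a * C^(-b). The constants a and b are made up for illustration, not taken from the paper:

import numpy as np

# Toy power-law scaling curve: loss = a * C**(-b).
# a and b are illustrative constants, not fitted values from the paper.
a, b = 10.0, 0.05

compute = np.logspace(20, 26, 7)   # training compute in FLOPs, 10x per step
loss = a * compute ** (-b)

for C, L in zip(compute, loss):
    print(f"1e{int(round(np.log10(C)))} FLOPs -> loss {L:.3f}")

# Every 10x jump in compute multiplies the loss by the same factor (10**-b),
# so the curve is a straight line when both axes are logarithmic.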

But think of this: we're building huge reasoning machines, yet we only ask them to do one pass through the model.