๐Ÿš€ ๐—ช๐—ต๐—ฒ๐—ฟ๐—ฒ ๐˜€๐—ฐ๐—ฎ๐—น๐—ถ๐—ป๐—ด ๐—น๐—ฎ๐˜„๐˜€ ๐—ฎ๐—ฟ๐—ฒ ๐˜๐—ฎ๐—ธ๐—ถ๐—ป๐—ด... | ๐Ÿš€ ๐—ช๐—ต๐—ฒ๐—ฟ๐—ฒ ๐˜€๐—ฐ๐—ฎ๐—น๐—ถ๐—ป๐—ด ๐—น๐—ฎ๐˜„๐˜€ ๐—ฎ๐—ฟ๐—ฒ ๐˜๐—ฎ๐—ธ๐—ถ๐—ป๐—ด...
๐Ÿš€ ๐—ช๐—ต๐—ฒ๐—ฟ๐—ฒ ๐˜€๐—ฐ๐—ฎ๐—น๐—ถ๐—ป๐—ด ๐—น๐—ฎ๐˜„๐˜€ ๐—ฎ๐—ฟ๐—ฒ ๐˜๐—ฎ๐—ธ๐—ถ๐—ป๐—ด ๐˜‚๐˜€ : ๐—ฏ๐˜† ๐Ÿฎ๐Ÿฌ๐Ÿฎ๐Ÿด, ๐—”๐—œ ๐—–๐—น๐˜‚๐˜€๐˜๐—ฒ๐—ฟ๐˜€ ๐˜„๐—ถ๐—น๐—น ๐—ฟ๐—ฒ๐—ฎ๐—ฐ๐—ต ๐˜๐—ต๐—ฒ ๐—ฝ๐—ผ๐˜„๐—ฒ๐—ฟ ๐—ฐ๐—ผ๐—ป๐˜€๐˜‚๐—บ๐—ฝ๐˜๐—ถ๐—ผ๐—ป ๐—ผ๐—ณ ๐—ฒ๐—ป๐˜๐—ถ๐—ฟ๐—ฒ ๐—ฐ๐—ผ๐˜‚๐—ป๐˜๐—ฟ๐—ถ๐—ฒ๐˜€

Reminder: "scaling laws" are empirical laws saying that if you keep multiplying your compute by 10x, your models will mechanically keep getting better and better.
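The shape of these laws is a power law: loss keeps falling as compute grows, just with diminishing returns. A minimal sketch (the coefficients below are made up for illustration, not fitted values):

```python
def loss(compute_flops, l0=1.7, a=20.0, b=0.05):
    # Toy power-law scaling curve: loss = l0 + a * C^(-b).
    # l0, a and b are illustrative placeholders, NOT real fitted constants.
    return l0 + a * compute_flops ** (-b)

# Each 10x of compute shaves a bit more off the loss:
for c in (1e24, 1e25, 1e26):
    print(f"{c:.0e} FLOPs -> loss {loss(c):.2f}")
```

The key property is monotonicity: more compute, lower loss, with no plateau in sight so far.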

To give you an idea, GPT-3 can barely write coherent sentences, while GPT-4, trained with only ~15x more compute, already sounds much smarter than some of my friends (although it's not really, or at least I haven't tested them side by side). So you can imagine how far a 100x over GPT-4 can take us.

๐ŸŽ As a result, tech titans are racing to build the biggest models, and for this they need gigantic training clusters.

The picture below shows the growth of training compute: it is increasing at a steady exponential rate of 10x every 2 years. So let's take this progress a bit further:
- 2022: GPT-4 starts training: 10^26 FLOPs, at a cost of ~$100M
- 2024: today, companies start training on much larger clusters, like the "super AI cluster" of Elon Musk's xAI: 10^27 FLOPs, ~$1B
- 2026: by then, clusters will require 1GW, i.e. around the full output of a nuclear reactor
- 2028: we reach cluster prices around $100B, using 10GW, more than the most powerful power stations currently in use in the US. This last size seems crazy, but Microsoft and OpenAI are already planning one.

Will AI clusters really reach these crazy sizes, where they consume as much as entire countries?
โžก๏ธ Three key ingredients of training might be a roadblock to scaling up :
๐Ÿ’ธ Money: but itโ€™s very unlikely, given the potential market size for AGI, that investors lose interest.
โšก๏ธ Energy supply at a specific location
๐Ÿ“š Training data: weโ€™re already using 15 trillion tokens for Llama-3.1 when Internet has something like 60 trillion.

🤔 I'd be curious to hear your thoughts: do you think we'll race all the way there?