A neural network simulates DOOM: Google researchers open the way for completely-AI-generated games!
Imagine if games were completely live-generated by an AI model: the NPCs and their dialogues, the storyline, and even the game environment. The player's in-game actions would have a real, lasting impact on the game story.
In a very exciting paper, Google researchers just gave us the first credible glimpse of this future.
⚡️ They created GameNGen, the first neural model that can simulate a complex 3D game in real time. They use it to simulate the classic game DOOM running at over 20 frames per second on a single TPU, with image quality comparable to lossy JPEG compression. And it feels just like the real game!
Here's how they did it:
1. They trained an RL agent to play DOOM and recorded its gameplay sessions.
2. They then used these recordings to train a diffusion model to predict the next frame, conditioned on past frames and player actions (see the first sketch after this list).
3. During inference, they use only 4 denoising steps (instead of the usual dozens) to generate each frame quickly (see the second sketch after this list).
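To make step 2 concrete, here is a minimal sketch in PyTorch of training a diffusion model to predict the next frame conditioned on past frames and player actions. This is not the authors' code: GameNGen fine-tunes Stable Diffusion v1.4, while the tiny convolutional denoiser, context length, action count, and noise schedule below are all illustrative assumptions.

```python
# Toy sketch of action-conditioned next-frame diffusion training.
# All sizes (CONTEXT, NUM_ACTIONS, T_STEPS, the network) are assumptions,
# not the paper's architecture (which fine-tunes Stable Diffusion v1.4).
import torch
import torch.nn as nn
import torch.nn.functional as F

CONTEXT, NUM_ACTIONS, T_STEPS = 8, 18, 1000  # assumed values

class Denoiser(nn.Module):
    """Stand-in for the paper's U-Net: past frames are stacked with the
    noisy target frame on the channel axis; the latest action and the
    diffusion step are injected as a per-channel bias."""
    def __init__(self, ch=64):
        super().__init__()
        self.inp = nn.Conv2d(3 * (CONTEXT + 1), ch, 3, padding=1)
        self.out = nn.Sequential(
            nn.SiLU(), nn.Conv2d(ch, ch, 3, padding=1),
            nn.SiLU(), nn.Conv2d(ch, 3, 3, padding=1),
        )
        self.act_emb = nn.Embedding(NUM_ACTIONS, ch)  # player-action embedding
        self.t_emb = nn.Embedding(T_STEPS, ch)        # diffusion-step embedding

    def forward(self, noisy_next, past_frames, actions, t):
        b, k, c, h, w = past_frames.shape
        x = torch.cat([past_frames.reshape(b, k * c, h, w), noisy_next], dim=1)
        hid = self.inp(x)
        hid = hid + (self.act_emb(actions[:, -1]) + self.t_emb(t)).view(b, -1, 1, 1)
        return self.out(hid)  # predicted noise

# Standard DDPM linear noise schedule.
betas = torch.linspace(1e-4, 0.02, T_STEPS)
alphas_bar = torch.cumprod(1.0 - betas, dim=0)

def training_step(model, past_frames, actions, next_frame):
    """One DDPM-style step: noise the true next frame, predict the noise."""
    b = next_frame.shape[0]
    t = torch.randint(0, T_STEPS, (b,))
    noise = torch.randn_like(next_frame)
    a = alphas_bar[t].view(b, 1, 1, 1)
    noisy = a.sqrt() * next_frame + (1 - a).sqrt() * noise
    return F.mse_loss(model(noisy, past_frames, actions, t), noise)

# Usage on random stand-in data shaped like (agent frames, actions):
model = Denoiser()
past = torch.randn(2, CONTEXT, 3, 64, 64)
acts = torch.randint(0, NUM_ACTIONS, (2, CONTEXT))
loss = training_step(model, past, acts, torch.randn(2, 3, 64, 64))
loss.backward()
```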
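And step 3 might look like the following, assuming a deterministic DDIM-style sampler (the paper's exact sampler configuration may differ). It reuses `Denoiser`, `alphas_bar`, and `T_STEPS` from the sketch above; only `steps=4` mirrors the paper's setting.

```python
# Few-step frame generation (DDIM-style, eta=0). Reuses Denoiser, alphas_bar,
# and T_STEPS from the training sketch above.
import torch

@torch.no_grad()
def sample_next_frame(model, past_frames, actions, steps=4, shape=(1, 3, 64, 64)):
    ts = torch.linspace(T_STEPS - 1, 0, steps).long()  # e.g. [999, 666, 333, 0]
    x = torch.randn(shape)                             # start from pure noise
    for i, t in enumerate(ts):
        a_t = alphas_bar[t]
        eps = model(x, past_frames, actions, t.repeat(shape[0]))
        x0 = (x - (1 - a_t).sqrt() * eps) / a_t.sqrt()      # predicted clean frame
        a_prev = alphas_bar[ts[i + 1]] if i + 1 < steps else torch.tensor(1.0)
        x = a_prev.sqrt() * x0 + (1 - a_prev).sqrt() * eps  # deterministic step
    return x.clamp(-1, 1)

# At play time this runs in a loop: append the generated frame and the
# player's latest action to the context, then sample the next frame again.
```

Cutting the sampler from dozens of steps to 4 is the key latency lever that makes real-time play on a single TPU plausible.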
Key insights:
- Human players can barely tell the difference between short clips (3 seconds) of the real game and the simulation
- The model maintains game state (health, ammo, etc.) over long periods despite having only about 3 seconds of effective context
- They use "noise augmentation" during training to prevent quality degradation in long play sessions (sketched after this list)
- The game runs on one TPU at 20 FPS with 4 denoising steps, or 50 FPS with model distillation (at some cost in quality)
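Here is what that noise augmentation could look like: a minimal sketch, assuming Gaussian corruption of the context frames with a discretized noise level that is also fed to the model as conditioning (the paper's exact parameterization may differ). The point is that the model learns to tolerate the imperfect frames it will see when running on its own outputs.

```python
# Sketch of noise augmentation on the conditioning frames. NOISE_LEVELS and
# max_sigma are assumed values, not taken from the paper.
import torch

NOISE_LEVELS = 10  # assumed number of discrete corruption buckets

def augment_context(past_frames, max_sigma=0.7):
    """Corrupt the context frames with a random amount of Gaussian noise,
    so the model learns to correct autoregressive drift at inference time."""
    b = past_frames.shape[0]
    level = torch.randint(0, NOISE_LEVELS, (b,))           # per-sample level
    sigma = max_sigma * level.float() / (NOISE_LEVELS - 1)
    noisy = past_frames + sigma.view(b, 1, 1, 1, 1) * torch.randn_like(past_frames)
    return noisy, level  # `level` would also be embedded and fed to the model
```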
The researchers did not open-source the code, but I feel like we've just seen a part of the future being written!
Their paper (exploding the upvote counter) 👇
Diffusion Models Are Real-Time Game Engines (2408.14837)
In a similar vein, play @Jofthomas's 'Everchanging Quest' 🎮
Jofthomas/Everchanging-Quest