Serving Meta Llama 3.1 405B on Google Cloud is now possible via the Hugging Face Deep Learning Containers (DLCs) for Text Generation Inference (TGI)

In this post, we showcase how to deploy meta-llama/Meta-Llama-3.1-405B-Instruct-FP8 on an A3 instance with 8 x H100 GPUs on Vertex AI.
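At a high level, the deployment comes down to registering the TGI DLC as a Vertex AI model and deploying it onto an A3 machine. The snippet below is a minimal sketch using the google-cloud-aiplatform Python SDK; the project, region, container image URI, and TGI environment variables are illustrative assumptions, and the full post walks through the exact values to use.

```python
# Minimal sketch, assuming the google-cloud-aiplatform SDK and a Hugging Face
# DLC for TGI. Project, region, image tag, and env vars are placeholders.
from google.cloud import aiplatform

aiplatform.init(project="your-gcp-project", location="us-central1")

model = aiplatform.Model.upload(
    display_name="llama-3-1-405b-instruct-fp8",
    # Hugging Face DLC for TGI (example tag; check the available DLC tags)
    serving_container_image_uri=(
        "us-docker.pkg.dev/deeplearning-platform-release/gcr.io/"
        "huggingface-text-generation-inference-cu121.2-2.ubuntu2204.py310"
    ),
    serving_container_environment_variables={
        "MODEL_ID": "meta-llama/Meta-Llama-3.1-405B-Instruct-FP8",
        "NUM_SHARD": "8",  # shard the model across the 8 H100 GPUs
        "HUGGING_FACE_HUB_TOKEN": "<your-hf-token>",  # required for gated models
    },
    serving_container_ports=[8080],
)

endpoint = model.deploy(
    machine_type="a3-highgpu-8g",         # A3 instance
    accelerator_type="NVIDIA_H100_80GB",  # 8 x H100 80GB
    accelerator_count=8,
)
```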

Thanks to the Hugging Face DLCs for TGI and Google Cloud Vertex AI, deploying a high-performance text generation container for serving Large Language Models (LLMs) has never been easier. And we’re not going to stop here – stay tuned as we enable more experiences to build AI with open models on Google Cloud!
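Once the endpoint is up, serving a request is a single predict call. A quick usage sketch, assuming the payload mirrors TGI's generate schema ("inputs" plus "parameters"):

```python
# Hypothetical request against the deployed endpoint from the sketch above.
response = endpoint.predict(
    instances=[{
        "inputs": "What is Vertex AI?",
        "parameters": {"max_new_tokens": 256, "temperature": 0.7},
    }]
)
print(response.predictions[0])
```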

Read the full post, "Deploy Meta Llama 3.1 405B on Google Cloud Vertex AI", at https://huggingface.co/blog/llama31-on-vertex-ai.