Really nice development by @nvidia and @HuggingFace
Launch of Hugging Face Inference-as-a-Service powered by NVIDIA NIM, a new service on the Hugging Face Hub
So we can use open models with NVIDIA DGX Cloud's accelerated compute platform for inference serving.
The service is fully compatible with the OpenAI API, allowing you to use the openai SDK for inference (see the sketch below).
Note: You need access to an Organization with a Hugging Face Enterprise subscription to run Inference.
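Here's a minimal sketch of what the client code could look like, assuming the DGX Cloud integration endpoint from the launch announcement; the base_url and model name below are illustrative, so check the Hub docs for the exact values:

```python
from openai import OpenAI

# Point the standard openai SDK at the Hugging Face <> NVIDIA DGX Cloud
# integration endpoint (illustrative; verify the URL in the Hub docs).
client = OpenAI(
    base_url="https://huggingface.co/api/integrations/dgx/v1",
    api_key="hf_...",  # your Hugging Face token (Enterprise org required)
)

# Standard OpenAI-style chat completion against an open model.
completion = client.chat.completions.create(
    model="meta-llama/Meta-Llama-3-8B-Instruct",
    messages=[{"role": "user", "content": "What is NVIDIA NIM?"}],
    max_tokens=256,
)
print(completion.choices[0].message.content)
```

Because the endpoint speaks the OpenAI protocol, switching existing OpenAI-based code over is mostly a matter of changing base_url and the API key.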
------
• NVIDIA NIM is a set of inference microservices that package models as optimized containers for deployment on clouds, data centers, or workstations, so teams can build generative AI applications such as copilots and chatbots in minutes rather than weeks.
• Maximizes infrastructure investment and compute efficiency: for example, running Meta Llama 3-8B in a NIM delivers up to 3x more generative AI token throughput on accelerated infrastructure than serving the same model without NIM.
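Since each NIM container exposes an OpenAI-compatible endpoint, the same client pattern also works against a self-hosted NIM. This is a sketch assuming the default local port 8000 and /v1 path (check your container's docs):

```python
from openai import OpenAI

# A locally deployed NIM container serves an OpenAI-compatible API;
# port 8000 and the /v1 path are the usual defaults, but verify them
# against the docs for your specific NIM container.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")

completion = client.chat.completions.create(
    model="meta-llama/meta-llama-3-8b-instruct",
    messages=[{"role": "user", "content": "Hello!"}],
)
print(completion.choices[0].message.content)
```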