NVIDIA AI Foundry throws its full weight behind the adoption of Meta's Llama 3.1 AI models
Alongside the official public announcement of the NVIDIA AI Foundry service, NVIDIA revealed that the service, together with NVIDIA NIM inference microservices, will enhance enterprises' generative AI capabilities with the newly introduced Llama 3.1 model collection.
Through the power of NVIDIA NIM, rapid deployment of Llama 3.1 models is achievable with up to 2.5x higher throughput, and the models can even be paired with the new NVIDIA NeMo Retriever NIM microservices to build advanced retrieval pipelines for AI applications.
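For a sense of what deployment looks like in practice, NIM containers expose an OpenAI-compatible REST API. The sketch below builds a chat-completion request for such an endpoint; the local URL and the exact model identifier are assumptions for illustration, not details from the announcement.

```python
import json

# Assumed local endpoint for a deployed NIM container; NIM exposes an
# OpenAI-compatible REST API, but this host/port is only an example.
NIM_URL = "http://localhost:8000/v1/chat/completions"

def build_chat_request(prompt: str,
                       model: str = "meta/llama-3.1-8b-instruct") -> dict:
    """Build an OpenAI-style chat-completion payload for a NIM endpoint."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 128,
        "temperature": 0.2,
    }

payload = build_chat_request("Summarize the Llama 3.1 release in one sentence.")
print(json.dumps(payload, indent=2))

# To actually send the request (requires a running NIM container):
# import requests
# resp = requests.post(NIM_URL, json=payload, timeout=60)
# print(resp.json()["choices"][0]["message"]["content"])
```

Because the API follows the OpenAI schema, existing client code can usually be pointed at a NIM deployment by swapping the base URL.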
Although the "full version" of Llama 3.1 requires data center-class GPU clusters to operate, the collaboration between NVIDIA and Meta also produced a distillation recipe for Llama 3.1, enabling the creation of smaller, efficient models suitable for a wide range of AI infrastructures.
Hardware demands aside, leading companies across sectors like healthcare, energy, and telecommunications are already leveraging NVIDIA NIM microservices for Llama. The Llama 3.1 collection, optimized for NVIDIA's hardware, includes models with up to 405 billion parameters, supporting a variety of generative AI applications.
The new NeMo Retriever RAG microservices improve the accuracy and performance of AI responses when combined with Llama 3.1 NIM microservices, boosting the efficiency of retrieval-augmented generation pipelines.
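To make the retrieval-augmented generation idea concrete, here is a toy sketch of the flow: retrieve relevant passages, then prepend them to the prompt before it reaches the model. The keyword-overlap scoring and the tiny corpus are stand-ins for the embedding and reranking that NeMo Retriever NIM microservices would actually provide.

```python
# Toy corpus standing in for a real document store.
CORPUS = [
    "Llama 3.1 ships in 8B, 70B, and 405B parameter sizes.",
    "NVIDIA NIM packages models as optimized inference microservices.",
    "NeMo Retriever provides embedding and reranking microservices.",
]

def retrieve(query: str, k: int = 2) -> list[str]:
    """Rank passages by naive word overlap with the query (toy retriever;
    a real pipeline would use embedding similarity plus reranking)."""
    q = set(query.lower().split())
    scored = sorted(CORPUS, key=lambda p: -len(q & set(p.lower().split())))
    return scored[:k]

def build_rag_prompt(query: str) -> str:
    """Prepend retrieved context to the question (the 'augmentation' step)."""
    context = "\n".join(retrieve(query))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"

prompt = build_rag_prompt("What sizes does Llama 3.1 come in?")
print(prompt)
```

The augmented prompt would then be sent to a Llama 3.1 NIM endpoint, which is where the accuracy gains from grounding the model in retrieved documents come from.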
Hundreds of partners in NVIDIA's enterprise ecosystem are set to integrate these microservices, bolstering generative AI development. Production support for these services is available through NVIDIA AI Enterprise, with free access for research and testing provided to NVIDIA Developer Program members.
More details can be found in NVIDIA's announcement post.