Learn how NVIDIA AI Foundry provides businesses with the easiest way to develop their own custom generative AI models

There's no denying that generative AI technology has transformed the way how humans interact with a digital product or platform. However, those behind those services are run by business folks so where do they source them? Look no further as NVIDIA themselves are one of the main drivers behind the trend through the NVIDIA AI Foundry service.

It is a service that helps enterprises use data, accelerated computing, and software tools to create and deploy custom models and enhances their generative AI initiatives. Through its expansive line of infrastructure and tools, NVIDIA AI Foundry usually revolves around DGX Cloud, foundation models, NVIDIA NeMo software, NVIDIA expertise, ecosystem tools, and support.

Its software includes AI foundation models from NVIDIA and the AI community, along with the complete NVIDIA NeMo software platform for rapid model development.

The computing power behind NVIDIA AI Foundry is NVIDIA DGX Cloud, a network of accelerated compute resources co-engineered with leading public clouds — Amazon Web Services, Google Cloud, and Oracle Cloud Infrastructure. With DGX Cloud, AI Foundry customers can develop and fine-tune custom generative AI applications with ease and efficiency, scaling their AI initiatives as needed without significant upfront investments in hardware. This flexibility is crucial for businesses looking to stay agile in a rapidly changing market.

NVIDIA AI Enterprise experts are available to assist customers, guiding them through building, fine-tuning, and deploying models with proprietary data to ensure they meet business requirements.

NVIDIA AI Foundry customers have access to a global ecosystem of partners providing a full range of support. Partners such as Accenture, Deloitte, Infosys, Tata Consultancy Services, and Wipro offer AI Foundry consulting services encompassing design, implementation, and management of AI-driven digital transformation projects. Accenture is the first to offer its AI Foundry-based offering for custom model development, the Accenture AI Refinery framework.

Service delivery partners like Data Monsters, Quantiphi, Slalom, and SoftServe help enterprises integrate AI into their existing IT landscapes, ensuring scalability, security, and alignment with business objectives.

Customers can develop NVIDIA AI Foundry models for production using AIOps and MLOps platforms from partners such as ActiveFence, AutoAlign, Cleanlab, DataDog, Dataiku, Dataloop, DataRobot, Deepchecks, Domino Data Lab, Fiddler AI, Giskard, New Relic, Scale, Tumeryk, and Weights & Biases.

Customers can output their AI Foundry models as NVIDIA NIM inference microservices — which include the custom model, optimized engines, and a standard API — to run on their preferred accelerated infrastructure.

Inference solutions like NVIDIA TensorRT-LLM improve efficiency for Llama 3.1 models, minimizing latency and maximizing throughput. This enables enterprises to generate tokens faster while reducing the total cost of running the models in production. Enterprise-grade support and security are provided by the NVIDIA AI Enterprise software suite.

NVIDIA NIM and TensorRT-LLM minimize inference latency and maximize throughput for Llama 3.1 models to generate tokens faster. Deployment options include NVIDIA-certified systems from global server manufacturers like Cisco, Dell Technologies, Hewlett Packard Enterprise, Lenovo, and Supermicro, as well as cloud instances from Amazon Web Services, Google Cloud, and Oracle Cloud Infrastructure.

Additionally, Together AI, a leading AI acceleration cloud, announced it will enable its ecosystem of over 100,000 developers and enterprises to use its NVIDIA GPU-accelerated inference stack to deploy Llama 3.1 endpoints and other open models on DGX Cloud.

Other than that, community and partner collaboration are also key players here since Team Green also supports Meta Llama, CodeGemma by Google DeepMind, CodeLlama, Gemma by Google DeepMind, Mistral, Mixtral, Phi-3, StarCoder2, and more.

As for NVIDIA NeMo, it comes with various tools and guideline features to help the refinement of AI models which include:

NeMo Curator: A GPU-accelerated data-curation library that improves generative AI model performance by preparing large-scale, high-quality datasets for pretraining and fine-tuning.
NeMo Customizer: A high-performance, scalable microservice that simplifies fine-tuning and alignment of LLMs for domain-specific use cases.
NeMo Evaluator: Provides automatic assessment of generative AI models across academic and custom benchmarks on any accelerated cloud or data center.
NeMo Guardrails: Orchestrates dialog management, supporting accuracy, appropriateness, and security in smart applications with large language models to provide safeguards for generative AI applications.

Using the NeMo platform in NVIDIA AI Foundry, businesses can create custom AI models precisely tailored to their needs. This customization allows for better alignment with strategic objectives, improved accuracy in decision-making, and enhanced operational efficiency. For instance, companies can develop models that understand industry-specific jargon, comply with regulatory requirements, and integrate seamlessly with existing workflows.

Enterprises can deploy their custom AI models in production with NVIDIA NeMo Retriever NIM inference microservices, helping developers fetch proprietary data to generate knowledgeable responses for their AI applications with retrieval-augmented generation (RAG).

All in all, NVIDIA AI Foundry champions itself in helping businesses deal with unique challenges when adopting AI technology.