Decoding the NVIDIA Team's Agentic AI Practices

Written by
Clara Bennett
Updated on: June 24, 2025
Recommendation

How the NVIDIA team uses agentic AI to drive business innovation.

Core content:
1. The technical path and advantages of an integrated AI sales assistant
2. Practical strategies for efficient code review optimization
3. The key role of agentic AI in enterprise digital transformation



In the wave of digital transformation, agentic AI is becoming a key tool for enterprises to improve efficiency and optimize decision-making. The NVIDIA team has achieved innovative breakthroughs with agentic AI across multiple scenarios. Drawing on NVIDIA's official technical blog, this article distills the core technical paths and results of four practices, giving developers reference implementation templates.

01

Practice 1: AI Sales Assistant - Enterprise-level Data Hub 

At NVIDIA, the sales team relied on both internal and external documents and had to search multiple repositories to find information, which was time-consuming and made it hard to keep data consistent across systems. In addition, NVIDIA's product portfolio is broad, so the sales team must stay current in a fast-moving AI market.

To address these challenges, NVIDIA used large language models (LLMs) and retrieval-augmented generation (RAG) to develop an AI sales assistant integrated into the workflow. It provides instant access to both proprietary and external data, simplifying sales workflows.

Advantages of a Sales Assistant

Unified information access: Combines internal NVIDIA data with broader insights through the Perplexity API and web search.

Enterprise chat: Uses models such as Llama-3.1-405B-instruct to handle varied queries, including spell checking, summarization, coding, and analysis.

Simplified CRM integration: Aggregates sales data directly in the CRM system using a Text2SQL approach that automatically generates SQL queries, enhancing reporting capabilities.
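
The blog does not publish the assistant's implementation, but the Text2SQL idea can be illustrated with a minimal sketch: give a model the CRM schema and ask it to emit SQL only. The schema, question, and model choice below are assumptions; the endpoint is NVIDIA's OpenAI-compatible API catalog.

import os
from openai import OpenAI

client = OpenAI(base_url="https://integrate.api.nvidia.com/v1",
                api_key=os.environ["NVIDIA_API_KEY"])

# Hypothetical CRM table; NVIDIA's actual schema is not public.
schema = """CREATE TABLE opportunities (
    id INT, account_name TEXT, product TEXT,
    stage TEXT, amount DECIMAL, close_date DATE);"""

completion = client.chat.completions.create(
    model="meta/llama-3.1-405b-instruct",
    temperature=0.0,  # keep query generation deterministic
    messages=[
        {"role": "system",
         "content": "Translate the user's question into one SQL query for "
                    "this schema. Return only SQL.\n" + schema},
        {"role": "user",
         "content": "Total pipeline amount per product closing this quarter?"},
    ],
)
print(completion.choices[0].message.content)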

Architecture and workflow

LLM-assisted document extraction workflow: All text is processed with LLMs and converted to a standardized Markdown format for extraction. The steps include parsing PDFs with the NVIDIA Multimodal PDF Ingestion Blueprint, transcribing audio files with NVIDIA Parakeet NIM, editing and translating with Llama 3.1 70B, and storing the results in a Milvus database.
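
As a rough sketch of the storage end of this pipeline (assuming Milvus Lite via pymilvus; the collection name and embedding dimension are placeholders, and the upstream parsing, transcription, and embedding calls stand in for the Blueprint and NIM services named above):

from pymilvus import MilvusClient

client = MilvusClient("sales_assistant.db")  # Milvus Lite, local file
client.create_collection(collection_name="sales_docs", dimension=1024)

def ingest(doc_id: int, source: str, markdown_text: str, embedding: list):
    # Store the normalized Markdown plus its embedding for later RAG retrieval.
    # Upstream, markdown_text would come from the PDF Blueprint or Parakeet NIM,
    # and embedding from an embedding model (hypothetical here).
    client.insert(
        collection_name="sales_docs",
        data=[{"id": doc_id, "vector": embedding,
               "source": source, "text": markdown_text}],
    )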

Wide RAG integration: During text generation, the prompt instructs the model to cite sources with concise alphanumeric keys. A subsequent postprocessing step replaces these keys with full reference details, significantly improving the reliability and accuracy of inline references.
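
A minimal sketch of that postprocessing step, assuming a hypothetical [ref:KEY] convention for the model's citation keys (the actual key format is not published):

import re

# Hypothetical mapping from short keys to full reference details.
references = {"AB12": "NVIDIA H100 Datasheet, 2024", "CD34": "Q3 Sales Playbook"}

def expand_citations(text: str) -> str:
    # Replace each short key the model emitted with its full reference.
    return re.sub(r"\[ref:([A-Z0-9]+)\]",
                  lambda m: f"[{references.get(m.group(1), 'unknown source')}]",
                  text)

print(expand_citations("Revenue grew 20% [ref:AB12] per the playbook [ref:CD34]."))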

Event-driven chat architecture: LlamaIndex Workflows manage the generation process, and Chainlit context managers surface progress to the user. For tasks that require complex reasoning, structured generation combined with chain-of-thought reasoning significantly improves the quality of queries generated for CRM data.
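
A minimal event-driven sketch, assuming Chainlit as the UI layer and LlamaIndex Workflows for orchestration; the step name and placeholder retrieval logic are illustrative, not NVIDIA's actual workflow:

import chainlit as cl
from llama_index.core.workflow import StartEvent, StopEvent, Workflow, step

class ChatFlow(Workflow):
    @step
    async def answer(self, ev: StartEvent) -> StopEvent:
        # Chainlit's context manager shows live progress while this step runs.
        async with cl.Step(name="retrieving context"):
            context = f"(retrieved context for: {ev.query})"  # placeholder
        return StopEvent(result=context)

@cl.on_message
async def on_message(message: cl.Message):
    result = await ChatFlow().run(query=message.content)
    await cl.Message(content=str(result)).send()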

Early Progress Indicators: Citation cards provide real-time feedback during third-party API calls.

Value of results

  • The AI sales assistant optimizes query processing, ensuring high performance and accuracy in dynamic, data-intensive environments.

  • It delivers instant, customized insights while dramatically improving workflow efficiency and user engagement.

02

Practice 2: Code review optimization - efficient use of small models

Fine-tuning small language models (SLMs), often with techniques such as knowledge distillation, can address several challenges posed by LLMs, such as high cost, data-privacy concerns, and the extensive prompt engineering needed to reach high accuracy on specific use cases. These smaller models can approach the performance of larger ones while being faster and more cost-effective. However, fine-tuning them requires high-quality labeled data, which is time-consuming and expensive to create.

NVIDIA has built an automatic fine-tuning approach that addresses these challenges by using a data flywheel strategy. By using a large “teacher” model to generate and structure synthetic training data, this approach optimizes the fine-tuning process, enabling smaller models to more effectively handle complex tasks with minimal human intervention. 

NVIDIA's automatic fine-tuning method draws inspiration from how teachers adjust lessons to address a student's specific areas for improvement. It adopts the teacher-student paradigm and incorporates the principles of knowledge distillation; the detailed fine-tuning method is described in the official blog.
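
A hedged sketch of the data-flywheel idea, assuming a code-review severity-labeling task. The teacher model, endpoint, and prompt are illustrative; NVIDIA's actual pipeline is described in the official blog.

import os
from openai import OpenAI

client = OpenAI(base_url="https://integrate.api.nvidia.com/v1",
                api_key=os.environ["NVIDIA_API_KEY"])

def teacher_label(review_comment: str) -> str:
    # The large "teacher" model generates a structured label plus a short
    # rationale, producing synthetic training data for the student SLM.
    resp = client.chat.completions.create(
        model="meta/llama-3.1-405b-instruct",  # illustrative teacher choice
        messages=[{"role": "system",
                   "content": "Label the code-review comment's severity as "
                              "critical/major/minor and justify briefly."},
                  {"role": "user", "content": review_comment}],
    )
    return resp.choices[0].message.content

# Accumulate (comment, teacher_label) pairs as synthetic training data, then
# fine-tune the student SLM on them with your tuning tooling of choice.
dataset = [(c, teacher_label(c)) for c in ["Possible null deref in parser.c"]]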

Practical Application of SLM in Code Review Automation

Code reviews are critical to ensuring software quality and performance and are traditionally performed by human reviewers.

Fine-tuned SLMs enhance NVIDIA's automated code review in two ways:

  • Improved accuracy when assigning severity to review findings.

  • Improved clarity and quality of the model's reasoning.
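
For illustration, a fine-tuned reviewer model might return a structured record like the following; the exact schema is an assumption, not NVIDIA's published format:

# Hypothetical structured output from a fine-tuned reviewer SLM.
finding = {
    "finding": "Unchecked return value of malloc() in buffer_init()",
    "severity": "major",                      # critical / major / minor
    "reasoning": "An allocation failure would cause a null-pointer "
                 "dereference on low-memory systems.",
}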

03

Practice 3: Slackbot Intelligent Assistant - Quickly Build a Practical App

Slackbot acts as a valuable virtual assistant that can handle a wide range of tasks. This not only saves time and resources, but also creates a more collaborative and efficient work environment. How can you quickly build an intelligent Slackbot that goes beyond simple automation?

Use NVIDIA NIM and LangChain to create custom Slackbot agents for specific use cases.  

The initial implementation of Slackbot supports interaction through Slack channels, threads, and chatbot personal messages. The primary model supporting this interaction is llama-3_1-405b-instruct, which can access external tools to enhance responses. These tools involve calling and preprocessing external endpoints.

Before you start building your Slackbot, make sure:

  • Set up Slack.

  • Get familiar with LangChain and agents.


The libraries required for installation include the following:

openai

boto3

slack_bolt

slack-sdk

langchain

python-dotenv

langchain-community

langchain-nvidia-ai-endpoints

langchainhub

You will also need the following resources:

  • An API key from the NVIDIA API Catalog.

  • An AWS account (for Amazon EC2, Amazon Aurora, Amazon DynamoDB, Amazon ElastiCache, etc.) or a similar cloud service.

  • A JupyterLab notebook for initial testing.

Here are the steps to deploy the Slackbot on AWS (a condensed sketch follows the list):

  • Install required libraries: Before setting up the agent, make sure the necessary libraries are installed, such as LangChain, the LangChain NVIDIA AI Endpoints package, and the Slack SDK.

  • Define the main agent: Define the main Slack functions for user interaction and integrate the NIM model as the main agent.

  • Set up DynamoDB for memory management: To track agent interactions, initialize a DynamoDB table and configure session memory.

  • Configure conversational memory: Integrate chat message history into the agent's conversational memory.

  • Define keyword-based tool usage: Add keyword-based triggers that prompt the bot to use specific tools.

  • Complete the agent: ReAct is a framework in which an LLM combines reasoning with action, solving tasks based on provided examples. Create the ReAct agent and agent executor using the predefined variables.

  • Save interactions in Amazon Aurora PostgreSQL: Persist interactions through predefined functions to an Amazon Aurora PostgreSQL database.
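
A condensed sketch tying these steps together. The libraries and classes (slack_bolt, LangChain's ReAct helpers, DynamoDBChatMessageHistory, ChatNVIDIA) are real; the tool, table name, and wiring are illustrative assumptions rather than the blog's exact code.

import os
from slack_bolt import App
from langchain import hub
from langchain.agents import AgentExecutor, create_react_agent
from langchain.tools import tool
from langchain_community.chat_message_histories import DynamoDBChatMessageHistory
from langchain_nvidia_ai_endpoints import ChatNVIDIA

# NIM-hosted model as the main agent LLM (API catalog naming).
llm = ChatNVIDIA(model="meta/llama-3.1-405b-instruct",
                 api_key=os.environ["NVIDIA_API_KEY"])

@tool
def product_lookup(query: str) -> str:
    """Look up product information (hypothetical placeholder tool)."""
    return f"(results for {query})"

# ReAct agent: the LLM interleaves reasoning steps with tool calls.
prompt = hub.pull("hwchase17/react")
agent = create_react_agent(llm, [product_lookup], prompt)
executor = AgentExecutor(agent=agent, tools=[product_lookup],
                         handle_parsing_errors=True)

app = App(token=os.environ["SLACK_BOT_TOKEN"])

@app.event("app_mention")
def handle_mention(event, say):
    # Session memory keyed by Slack thread, stored in DynamoDB.
    history = DynamoDBChatMessageHistory(
        table_name="slackbot-sessions",              # hypothetical table name
        session_id=event.get("thread_ts", event["ts"]),
    )
    result = executor.invoke({"input": event["text"]})
    history.add_user_message(event["text"])
    history.add_ai_message(result["output"])
    say(text=result["output"], thread_ts=event.get("thread_ts", event["ts"]))

if __name__ == "__main__":
    app.start(port=3000)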

AI agents are transforming enterprise applications by automating tasks, optimizing processes and improving productivity. NVIDIA NIM microservices provide a way to seamlessly integrate multiple agents and tools, enabling enterprises to create customized AI-driven solutions.

This exercise shows how to use the NIM AI endpoint to create an end-to-end Slackbot agent with custom tools. This solution enhances the simple Slack interface to handle more complex tasks and solve unique challenges.

For more examples, check out the NVIDIA/GenerativeAIExamples GitHub repository.

04

Practice 4: Automated test generation - Hephaestus framework

In software development, testing is critical to ensuring the quality and reliability of the final product. However, creating test plans and specifications can be time-consuming and labor-intensive, especially when managing multiple requirements and different test types in complex systems. Many of these tasks are often performed manually by test engineers.  

To simplify this process, NVIDIA’s DriveOS team developed Hephaestus (HEPH), an in-house generative AI framework for automated test generation. HEPH automates the design and implementation of various tests, including integration tests and unit tests.

HEPH uses LLM agents at every step of the test-generation process, from document tracing to code generation, automating the entire test workflow and saving the engineering team significant time.

  • Time savings: HEPH significantly speeds up the test creation process. In trials with multiple NVIDIA pilot teams, teams reported saving up to 10 weeks of development time.

  • Context-aware test generation: HEPH generates test specifications and implementations from project documentation and interface specifications. Each test is compiled, executed, and verified to ensure correctness, and test coverage data is fed back into the model to further optimize generation (see the sketch after this list).

  • Multi-format support and modularity: HEPH supports a variety of input formats, including PDF, RST, RSTI, and HTML, and integrates with internal tools such as Confluence and JIRA.
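
HEPH is an internal NVIDIA framework, so the following is only a schematic of the generate-compile-verify loop with coverage feedback described above; every helper is a hypothetical placeholder.

import pathlib
import subprocess
import tempfile

def generate_test(spec: str, feedback: str) -> str:
    # Stand-in for an LLM agent that drafts a C test from the spec plus
    # feedback (coverage gaps, compile errors) from the previous round.
    return "int main(void) { return 0; }  /* generated test body */"

def compile_and_run(source: str) -> bool:
    # Each generated test is compiled and executed to verify correctness.
    workdir = pathlib.Path(tempfile.mkdtemp())
    src = workdir / "test.c"
    src.write_text(source)
    exe = workdir / "test"
    built = subprocess.run(["gcc", str(src), "-o", str(exe)])
    return built.returncode == 0 and subprocess.run([str(exe)]).returncode == 0

feedback = ""
for round_num in range(3):
    candidate = generate_test("parser rejects malformed input", feedback)
    if compile_and_run(candidate):
        feedback = "coverage gap: error-handling branches"  # placeholder data
    else:
        feedback = "previous test failed to compile or run"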

Conclusion

NVIDIA's four practices demonstrate key technical paths for deploying agentic AI in enterprise scenarios:

1. RAG + LLM enables dynamic data integration

2. Small-model fine-tuning breaks through cost and privacy bottlenecks

3. NVIDIA NIM + LangChain builds lightweight intelligent agents

4. Demand-driven test generation improves quality-engineering efficiency