NVIDIA CEO Jensen Huang's March 2025 GTC Keynote Transcript

Written by
Audrey Miles
Updated on: July 11, 2025
Recommendation

This transcript of NVIDIA CEO Jensen Huang's March 2025 GTC keynote shows the innovative power of AI and the technology trends ahead.

Core content:
1. How AI revolutionizes traditional factories and opens up new areas
2. NVIDIA's latest progress and product releases in the field of artificial intelligence
3. Application examples of AI technology in computer graphics, medical care, transportation and other industries

This is how intelligence is created - a new kind of factory, a generator of tokens, the building blocks of AI. Tokens open up new frontiers and mark the first step into an extraordinary world of endless possibilities.

Tokens transform images into scientific data, mapping alien atmospheres and guiding future explorers. They transform raw data into foresight, ensuring we are prepared for the future. Tokens decode the laws of physics, allowing us to get to our destinations faster and go further. They detect disease before it manifests, helping us unlock the language of life and understand how we operate.

Tokens connect the dots, allowing us to protect our most precious creatures. They turn potential into abundance, helping us reap the fruits of our labor. Tokens not only teach robots how to move, they also bring joy, lend a hand, and make life easier.

Together, we are taking the next great leap and boldly going where no one has gone before. It all starts here. Welcome to the stage, NVIDIA founder and CEO, Jen-Hsun Huang.

What an incredible year it’s been. At NVIDIA, we’re committed to making the extraordinary possible. Using the power of artificial intelligence, we’re virtually transporting you to NVIDIA headquarters. This is where we work and innovate.

It's been a remarkable year, and we have many exciting developments to share. I want you to know that I'm fully in the moment, present without reservation.

There's no script, no teleprompter, and I have a lot to cover, so let's get started. First, I want to thank all the sponsors and participants of this conference. Almost every industry is represented - healthcare, transportation, retail, and, of course, the computer industry. It's amazing to see such a diverse gathering. Thank you for your support.

GTC started with GeForce, and today I introduce you to the GeForce RTX 5090. Remarkably, 25 years after we began developing GeForce, it's still a global bestseller. This is the Blackwell generation, the GeForce RTX 5090. It's 30% smaller in volume, dissipates energy 30% better, and delivers incredible performance compared to the 4090. The driving force behind this advancement is AI. GeForce brought CUDA to the world, CUDA made AI possible, and now AI is coming back to revolutionize computer graphics. What you're looking at here is real-time computer graphics. For every pixel rendered, the AI predicts the other 15. Think about this: for every pixel mathematically rendered, the AI infers the other 15 with such precision that the image appears accurate and temporally stable. This means that from frame to frame, whether moving forward or backward in computer graphics, the image remains temporally consistent. It's pretty amazing.
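
One plausible way to account for a 1-in-16 ratio (an illustrative decomposition, not NVIDIA's published breakdown) is 4x resolution upscaling combined with three AI-generated frames for every rendered frame:

```python
# Back-of-the-envelope sketch (not NVIDIA's published breakdown): one way to
# arrive at "1 pixel rendered, 15 inferred" is 4x super resolution combined
# with 3 AI-generated frames per rendered frame.
upscale_factor = 4          # assume: render 1/4 of the pixels, AI upscales the rest
frames_per_rendered = 4     # assume: 1 rendered frame + 3 AI-generated frames

total_pixels = upscale_factor * frames_per_rendered   # 16 pixels displayed
rendered_pixels = 1                                    # per group of 16
inferred_pixels = total_pixels - rendered_pixels       # 15 inferred by the AI

print(f"For every {rendered_pixels} rendered pixel, {inferred_pixels} are inferred")
```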

Artificial intelligence has made extraordinary progress in the past decade. Although discussions about AI have been going on for much longer, it really entered the world's consciousness about ten years ago. The journey began with perception AI, covering computer vision and speech recognition, followed by generative AI, which has dominated the past five years. Generative AI focuses on teaching AI to translate between different modalities - text to image, image to text, text to video, amino acids to proteins, properties to chemicals. This technology fundamentally transforms computing from a retrieval-based model to a generative model. Before, content was created in advance, stored in multiple versions, and retrieved when needed. Now, AI understands the context, interprets the request, and generates a response, retrieving information when necessary to augment its understanding. This transformation has revolutionized every layer of computing.

In recent years, a major breakthrough has occurred with the emergence of agent AI. Agent AI is autonomous and can perceive and understand context, reason through questions, plan actions, and use tools. It can navigate multimodal information, such as websites, interpret their content, and apply newly acquired knowledge to its tasks. At the core of agent AI is reasoning, a relatively new capability. The next wave, already underway, is robotics, driven by physical AI. Physical AI understands the physical world, including concepts such as friction, inertia, causality, and object constancy. This understanding of the three-dimensional world will lead AI into a new era, making advanced robotics possible.


Each stage of AI development - perception AI, generative AI, agentic AI, and physical AI - has opened up new market opportunities and attracted more partners to GTC, which has become a hub for innovation and collaboration.

The only way to accommodate more attendees at GTC is to grow San Jose, and we are working on it. There is plenty of land around for development, and growing San Jose will let GTC keep expanding. Standing here, I want you all to see what I see - a stadium full of activity. Last year marked our return to in-person events, and it felt like a rock concert. GTC has been described as the Woodstock of AI, and this year it's being called the Super Bowl of AI. The key difference is that at this Super Bowl, everyone wins. Every year, more people participate as AI solves more complex problems for more industries and companies. This year, we'll focus on agentic AI and physical AI.

At its core, every wave and phase of AI is driven by three fundamental elements. First, how do we solve the data problem? This is critical because AI is a data-driven approach to computer science. It needs data to learn, to acquire knowledge through digital experience, and to build its understanding. Second, how do we solve the training problem without human intervention? The challenge is that humans have limited time, while AI needs to learn at superhuman speeds, in real time, and at a scale beyond human capability. The third element is scaling - how do we create algorithms such that the more resources we provide, the smarter the AI becomes? This is the scaling law.

Last year, the world largely misunderstood the computational requirements of AI. The scaling laws for AI are actually much more tenacious and are hyper-accelerating. The amount of compute required now, driven by agent AI and inference, easily exceeds what we expected last year by a factor of a hundred. Let’s explore why this is the case.

First, let's consider what AI can do now. As I mentioned, agentic AI is fundamentally about reasoning. The AI we have today can break a problem down step by step. It might approach a problem in several ways and choose the best solution, or solve the same problem through different methods to check that the answers are consistent. Once it has an answer, it might plug it back into the equation - a quadratic equation, say - to confirm its accuracy.

It used to take just one shot. Remember when we started working with ChatGPT two years ago? As much of a wonder as it was, it still struggled with many complex and even simple problems. It relied on pre-training data and past experience to generate its output in one go. Today, we have AIs that can reason step by step using chain-of-thought, consistency checking, and a variety of path-planning techniques. These AIs can break down problems and reason through them sequentially.

The basic technique of AI is still to predict the next token. However, instead of generating one token at a time, AI now generates a sequence of tokens that represent the steps of reasoning. This results in a huge increase in the number of tokens generated - easily a hundred times more. To remain responsive and interactive, we have to compute faster, resulting in a hundred-fold increase in computing requirements.

The question then becomes: how do we teach AI to perform this chain of thought? One approach involves teaching AI to reason. During the training process, we face two fundamental challenges: obtaining data and avoiding limitations imposed by human intervention. Available data and human demonstrations are limited.

One of the big breakthroughs of recent years is reinforcement learning with verifiable results - an approach that lets AI solve problems step by step, drawing on a vast library of problems that humans have already solved and whose answers can be checked.

We know the rules for solving quadratic equations, the Pythagorean theorem, and right triangles. We understand countless principles in math, geometry, logic, and science. Puzzles like Sudoku are constrained problems, and we have hundreds of such problem spaces in which we can generate millions of examples. Using reinforcement learning, we reward the AI for progress, giving it hundreds of attempts to solve each problem step by step. Hundreds of subjects, millions of examples, hundreds of attempts each, with each attempt generating tens of thousands of tokens - put together, that is trillions of tokens to train the model. With reinforcement learning, we can generate enormous amounts of synthetic data, essentially a robotic approach to teaching the AI. The combination of these methods presents significant computational challenges for the industry.
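
As a toy illustration of reinforcement-style training against verifiable problems (quadratic equations here), the following sketch generates problems with known answers, samples candidate solutions, and keeps only the attempts a checker can verify. The random-guess "policy" is a stand-in for a model; none of this reflects NVIDIA's actual training pipeline.

```python
import random

def make_problem():
    """Generate a quadratic x^2 + bx + c = 0 with known integer roots (verifiable)."""
    r1, r2 = random.randint(-9, 9), random.randint(-9, 9)
    return {"b": -(r1 + r2), "c": r1 * r2}

def reward(problem, proposed_roots):
    """Verifiable reward: 1.0 only if the proposed roots actually solve the equation."""
    b, c = problem["b"], problem["c"]
    return float(all(abs(x * x + b * x + c) < 1e-9 for x in proposed_roots))

def sample_policy(problem):
    """Stand-in for a model sampling an answer; a real policy would emit a chain
    of reasoning tokens before proposing roots."""
    return [random.randint(-9, 9), random.randint(-9, 9)]

# Many problems x many attempts: keep only attempts the verifier rewards,
# which then become synthetic training data for the next round.
synthetic_data = []
for _ in range(1000):                      # "millions of examples" in practice
    problem = make_problem()
    for attempt in range(100):             # "hundreds of attempts" per problem
        roots = sample_policy(problem)
        if reward(problem, roots) == 1.0:
            synthetic_data.append((problem, roots))
            break

print(f"collected {len(synthetic_data)} verified examples")
```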

The industry is rising to this challenge. For example, shipments of Hopper by the four largest CSPs—Amazon, Azure, GCP, and OCI—reflect this growth. Excluding AI companies, startups, and enterprises, these four largest CSPs alone demonstrate the rapid expansion of AI infrastructure. Comparing Hopper’s peak year to Blackwell’s first year shows significant growth in just one year. This growth is reflected across the computing landscape. Analysts predict that global data center capital spending (both CSP and enterprise) will increase significantly by the end of this decade. I previously predicted that data center construction will reach a trillion dollars, and I believe we will reach this milestone soon.

Two dynamics are driving this growth. First, the majority of this build-out will be accelerated computing. General-purpose computing has run out of steam, and new approaches are needed. The world is moving from hand-coded software running on general-purpose computers to machine-learning software running on accelerators and GPUs.

This approach to computing has passed a tipping point, and we are now witnessing an inflection point in the construction of data centers around the world. The first major change is a shift in the way we perform computing. The second is a growing recognition that the future of software requires a large capital investment. This is a profound change.

In the past, we wrote software and executed it on computers. In the future, computers will generate tokens for software, transforming them from retrieval-based systems to generation-based systems. This marks a shift from traditional data center operations to a new paradigm for building infrastructure, which I call AI Factories. These AI Factories have a single purpose: to generate tokens that we can reassemble into music, text, video, research, chemicals, proteins, and a variety of other forms of information.

The world is shifting not only in the scale of data center construction, but also in its design. Every component within the data center will be accelerated, although not all of them will be AI-driven. I want to emphasize this point. This slide is really my favorite because for those of you who have attended GTC over the years, you have heard me discuss these libraries in detail. This slide encapsulates the essence of GTC.

In fact, 20 years ago this was our only slide - library after library. Just as we needed an AI framework to create AI and then accelerated those frameworks, we need frameworks for fields of science such as physics, biology, multiphysics, and quantum physics. We call these CUDA-X libraries - acceleration frameworks tailored to each discipline.

The first of these is cuPyNumeric, a drop-in accelerated replacement for NumPy - the world's most downloaded and most widely used Python library, with 400 million downloads last year. cuLitho is our computational lithography library, which has revolutionized the lithography process over the past four years. Computational lithography is the second factory inside the fab: one factory makes the wafers, while the other generates the information needed to make them. In the future, every industry and company with a factory will operate two: one that builds the product and one that does the mathematics behind it.

A factory for the car, and a factory for the AI of the car; a factory for the smart speaker, and a factory for the AI of the smart speaker. cuLitho is our computational lithography technology, backed by partners like TSMC, Samsung, ASML, Synopsys, and Mentor. This technology is at a tipping point: within five years, every mask and every lithography process will be processed on NVIDIA CUDA. Aerial is our library for 5G, which turns a GPU into a 5G radio. Signal processing is our strength, and we can layer AI on top of it to create an AI-RAN for the next generation of radio networks. AI will be deeply integrated into these networks, pushing past the limits of what information theory alone can extract from the available spectrum.

cuOpt is our library for numerical and mathematical optimization, used across industries for planning seats and flights, inventory and customers, workers and factories, drivers and passengers, and more. It handles many constraints and variables to optimize for time, profit, service quality, and resource use. NVIDIA uses cuOpt for supply chain management, cutting compute time from hours to seconds so we can explore a much larger solution space. We announced that cuOpt will be open sourced, and we are working with Gurobi, IBM CPLEX, and FICO to accelerate innovation across the industry.
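
cuOpt itself is a GPU-accelerated library; as a stand-in, here is a minimal sketch of the kind of assignment problem it addresses - matching drivers to passengers at minimum total cost - using SciPy's generic solver. The cost matrix and names are invented for illustration.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

# Illustrative cost matrix: cost[i][j] = minutes for driver i to reach passenger j.
# (cuOpt solves far larger routing/assignment problems with many constraints on
#  the GPU; this toy example only shows the shape of the problem.)
cost = np.array([
    [12,  7, 25],
    [ 9, 14,  8],
    [20,  6, 11],
])

drivers, passengers = linear_sum_assignment(cost)   # minimize total pickup time
for d, p in zip(drivers, passengers):
    print(f"driver {d} -> passenger {p} ({cost[d, p]} min)")
print("total minutes:", cost[drivers, passengers].sum())
```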

MONAI is the world's leading medical imaging library, while Earth-2 focuses on multiphysics for high-resolution local weather forecasting. cuQuantum and CUDA-Q are part of our quantum computing efforts, and we held our first Quantum Day at GTC, working with the ecosystem on quantum architectures, algorithms, and quantum-classical heterogeneous architectures. cuEquivariance and cuTensor handle tensor contraction and quantum chemistry.

The CUDA stack is well known, and many of these libraries have been integrated across the ecosystem to drive AI progress. Today, I am announcing cuDSS, our sparse solver, which is essential for CAE. This is one of the most important developments of the past year, done in collaboration with companies such as Cadence, Synopsys, Ansys, and Dassault Systèmes.

We have now accelerated almost all of the major EDA and CAE libraries. It is worth noting that until recently, NVIDIA relied on general-purpose computers to run software at slower speeds to design accelerated computers for others. The reason for this is that until recently, there was no software optimized for CUDA. Now, as we move to accelerated computing, the entire industry is poised for a significant boost.

We launched cuDF, a dataframe library for structured data, which now provides seamless acceleration for Spark and pandas. In addition, we have Warp, a CUDA-accelerated physics library that runs in Python. We have a big announcement about this that I will share later. These libraries are just the tip of the iceberg of what makes accelerated computing possible.
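
For reference, this is roughly what zero-code-change pandas acceleration looks like, as I understand the RAPIDS cudf.pandas accelerator mode (it requires a CUDA GPU and the cudf package; operations cuDF does not support fall back to CPU pandas):

```python
# Sketch of cuDF's accelerator mode for pandas (per RAPIDS documentation as of
# this writing; assumes a CUDA GPU and cudf installed). install() patches pandas
# so supported operations run on the GPU and the rest fall back to CPU pandas.
import cudf.pandas
cudf.pandas.install()

import pandas as pd  # same pandas API, now GPU-accelerated where possible

df = pd.DataFrame({"store": ["a", "b", "a", "c"] * 250_000,
                   "sales": range(1_000_000)})
print(df.groupby("store")["sales"].mean())
```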

While we are incredibly proud of CUDA, it is important to recognize that without CUDA and its widespread adoption, these libraries would be far less valuable to developers. Developers use these tools because they provide amazing speed and scalability, and because CUDA is now ubiquitous—found on every cloud platform, in every data center, and at every major computer company in the world.

By leveraging these libraries, your software can reach a global audience. We have reached a tipping point in accelerated computing, and CUDA has played a key role in this achievement. This is what GTC is all about - the ecosystem, and it is all of you who make this possible.

CUDA is designed for the creators, pioneers, and builders of tomorrow. Since 2006, six million developers in more than 200 countries have used CUDA to revolutionize computing.

With more than 900 CUDA libraries and AI models, NVIDIA is accelerating scientific advancements, transforming industries, and empowering machine vision, learning, and reasoning. The NVIDIA Blackwell architecture is now 50,000 times faster than the first CUDA GPU. These dramatic increases in speed and scale are bridging the gap between simulation and real-time digital twins. This is just the beginning, and we can’t wait to see the innovations that will follow.

I feel deeply about the value of our work, and especially the impact you create. One moment in my 33-year career that really stood out to me was when a scientist said to me, “Jensen, because of your work, I’m able to accomplish my life’s work before I die.”

If that doesn't touch you, nothing will. This is really about all of you. Thank you. Now, let's talk about artificial intelligence (AI). AI first emerged in the cloud for a good reason: it requires infrastructure. Machine learning needs machines to do the science. So machine learning requires infrastructure, and cloud data centers provide that infrastructure, along with exceptional computer science and research. Those conditions made the cloud and the cloud service providers (CSPs) the perfect place for AI to take off. However, AI is not limited to the cloud; it will permeate every sector.

We will explore how AI is applied in various settings. CSPs appreciate our cutting-edge technology and our full-stack approach. As I mentioned before, accelerated computing is not just about the chip. It is the chip, the programming model, and a whole series of software layers on top. That entire stack is extraordinarily complex - each layer, each library is a revolution on the scale of SQL, which IBM drove. Now imagine that complexity applied to AI; the stack is even greater.

CSPs also value the fact that NVIDIA CUDA developers are their customers, because they are building infrastructure for the world to use. A strong developer ecosystem is deeply valued. As we take AI into the wider world, we encounter a diversity of system configurations, operating environments, domain-specific libraries, and usage patterns. AI is being applied to enterprise IT, manufacturing, robotics, self-driving cars, and even GPU cloud startups. About 20 companies, including our esteemed partner CoreWeave, have sprung up to host and manage GPUs, and they call themselves GPU clouds. CoreWeave is in the process of going public, and we are incredibly proud of them.

One area I'm particularly excited about is edge computing. Today, we announced that Cisco, NVIDIA, T-Mobile - the largest telecommunications company in the world - and Cerberus ODC will develop a full-stack solution for radio networks in the United States. This initiative will bring AI to the edge and marks a major advancement in our technology landscape.

Keep in mind that hundreds of billions of dollars are invested globally each year in radio networks and the data centers that support communications. In the future, I firmly believe this will be an area where accelerated computing and AI will converge. AI will excel at adapting radio signals, especially massive MIMO systems, to changing environmental and traffic conditions. We will naturally employ reinforcement learning to do this. MIMO is essentially a giant radio robot, and we will provide the necessary capabilities to support it. AI has the potential to revolutionize communications. For example, when I call home, I don’t need to say much because my wife is already familiar with my work and what’s going on. Our conversation often picks up where we left off, and she knows my preferences. This efficiency comes from context and prior knowledge. Combining these capabilities can transform communications, just as it has already done for video processing and 3D graphics. We are also committed to advancing AI in edge computing.

I'm very excited to announce the collaboration between T-Mobile, Cisco, NVIDIA, Cerberus, and ODC to build a full-stack radio network solution. AI is about to permeate every industry, and one of the earliest adopters is self-driving cars. After working in computer vision for many years, I was inspired the moment I first saw AlexNet, and that led us to go all-in on self-driving cars. We have been working on this technology for more than a decade, and now almost every self-driving car company uses our technology in some way. That includes Tesla, which uses NVIDIA GPUs in its data centers, and Waymo and Wayve, which use NVIDIA computers in both their data centers and their vehicles. In some rarer cases, our technology is used only in the vehicle. Many companies also use our full software stack.

We build three types of computers: training computers, simulation computers, and robotic computers for self-driving cars. We also develop the software stacks, models, and algorithms that support these systems, just as we do for other industries. Today, I’m excited to announce that General Motors (GM) has selected NVIDIA as a partner to build their future fleet of self-driving cars. The era of self-driving cars is here, and we look forward to collaborating with GM in three key areas: AI for manufacturing, AI for infrastructure, and AI for autonomy.

Enterprise AI enables companies to revolutionize their workflows, design and simulate vehicles, and integrate AI in the vehicle itself. We are working with General Motors to develop their AI infrastructure, and I am particularly excited about this collaboration. Although this area is often overlooked, I am deeply proud of the progress we have made in this area.

Safety is critical, especially automotive safety. We call this effort Halos. Safety requires technology from the silicon to the systems to the system software.

The algorithms and methodologies cover everything from ensuring diversity to monitoring and transparency. Explainability is a fundamental principle that must be deeply integrated into every aspect of system and software development. We are the first company in the world to have every line of code safety-assessed - seven million lines of code in total.

Our chips, system software, and algorithms undergo rigorous safety assessment by third parties who scrutinize every line of code to ensure diversity, transparency, and explainability. We have also filed more than a thousand patents related to this work.

During this GTC, I strongly recommend that you attend the Halos workshop to see first-hand the integration of the various components that ensure the safety and autonomy of future vehicles. This is an achievement that I am particularly proud of, even though it is often overlooked. Therefore, I decided to spend extra time discussing it.

NVIDIA has made significant progress in self-driving car technology. You may have seen self-driving cars in action, such as the impressive Waymo driverless taxis. We created a video showing some of the advanced technologies we are using in this area.

To address the challenges around data, training, and diversity, we can harness the power of AI to advance AI itself. NVIDIA is using Omniverse and Cosmos to accelerate AI development for autonomous vehicles.

Cosmos prediction and inference capabilities support AI-first autonomous driving systems that can be trained end-to-end and leverage new development methods such as model distillation, closed-loop training, and synthetic data generation.

First, model distillation.

With model distillation, Cosmos's driving knowledge is transferred from a slower, smarter teacher model to a smaller, faster student model that is deployed in the car. The teacher policy model demonstrates the optimal trajectory.

The student model learns through iteration until its performance nearly matches the teacher's. The distillation process bootstraps the policy model, but further tuning is still required for complex scenarios. Closed-loop training helps fine-tune the policy model. Log data is converted into 3D scenes, enabling closed-loop driving simulation in a physics-based environment using Omniverse neural reconstruction. Variants of these scenes are created to test the model's trajectory-generation capabilities. The Cosmos behavior evaluator then scores the generated driving behaviors to measure model performance. The newly generated scenes and their evaluations form a comprehensive dataset for closed-loop training, strengthening autonomous vehicles' ability to navigate complex scenarios.
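
As a minimal illustration of the teacher-student idea (not the actual Cosmos or AV policy models), here is a NumPy sketch in which a small linear student is trained to imitate a larger, fixed teacher network:

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in "slower, smarter" teacher: a fixed random 2-layer network. In the
# AV setting this would be the large policy model demonstrating trajectories.
w1 = rng.standard_normal((4, 16))
w2 = rng.standard_normal((16, 2))
def teacher(x):
    return np.tanh(x @ w1) @ w2

X = rng.standard_normal((2048, 4))   # illustrative observations/features
Y_teacher = teacher(X)               # the teacher's demonstrated outputs

# "Smaller, faster" student: a single linear layer trained to imitate the teacher.
W = np.zeros((4, 2))
lr = 0.05
for step in range(500):
    residual = X @ W - Y_teacher
    W -= lr * (2 * X.T @ residual / len(X))   # gradient of mean-squared error

print(f"student-vs-teacher MSE after distillation: {np.mean((X @ W - Y_teacher) ** 2):.4f}")
```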

Finally, 3D synthetic data generation improves AVs' adaptability to diverse environments. Omniverse builds detailed 4D driving environments by integrating maps and images from log data, generating a digital twin of the real world, including segmentation. Guided by this per-pixel classification, Cosmos expands the training data by generating accurate and diverse scenarios, effectively narrowing the gap between simulation and the real world. Omniverse and Cosmos enable autonomous vehicles to learn, adapt, and drive intelligently. To advance safer mobility, NVIDIA is the ideal company to lead this effort.

That is our destiny: to use AI to reinvent AI.

The technology we have been demonstrating is the same one you are experiencing right now: our goal has been to bring you inside a digital twin of NVIDIA. Now, let's talk about the data center.

Blackwell is now in full production, and this is its current state. It's been an incredible experience. It's a beautiful sight for us. Do you agree?

This is a significant milestone because we have achieved a fundamental shift in computer architecture. About three years ago, I proposed a version of this concept called Grace Hopper, with a system called Ranger. The Ranger system was about half the width of the screen and was the world's first NVLink 32. At the time, Ranger was oversized, but it was proven to be feasible and it represented the right direction.

Our goal is to solve the challenge of scaling up. Distributed computing uses many computers to solve a large problem by dividing the work, but before you scale out, you have to scale up. Both strategies are critical, but scale-up must come before scale-out. Scaling up is extremely challenging, and there is no easy solution.

Unlike Hadoop, which connects a large number of commodity computers into a large network for in-memory computing, our approach requires a different strategy. Hadoop is revolutionary, enabling hyperscale data centers to solve huge problems using off-the-shelf computers. However, the complexity of our problem makes this approach prohibitively expensive in terms of power and energy consumption.

Deep learning wouldn't have been feasible without scaling up first. Here's how we got there. The previous generation system architecture is called HGX. It weighs 70 pounds and it revolutionized computing and AI. It is made up of eight GPUs - eight of these. Each one is a Blackwell package with two Blackwell GPUs, and eight of them are integrated underneath this board.

This connects to what we call an NVLink 8, which in turn connects to the CPU rack. The setup consists of dual CPUs placed on top, connected via PCI Express.

Many of these are then connected via InfiniBand to form an AI supercomputer. This is the way it was done in the past, and it's how we started. We scaled up as far as we could, and then we scaled out. But our goal is to scale up much further. Ranger took this system and scaled it up by another factor of four.

We originally built NVLink 32, but the system became too large. So we had to do a lot of re-engineering of how NVLink works and how scale-up works. The first step was to disaggregate the NVLink switches that were embedded on the motherboard and take them out. This is the NVLink system.

This is an NVLink switch, the highest performance switch ever built, which enables each GPU to communicate with every other GPU simultaneously at full bandwidth.

This is the NVLink switch. We disaggregated it, took it out, and placed it in the center of the chassis. There are 18 of these switches spread across nine switch trays. With the switches disaggregated, the compute now sits here - the equivalent of those two earlier elements of computing.

What's remarkable is that this system is completely liquid-cooled. By leveraging liquid cooling, we've been able to squeeze all of these compute nodes into a single rack. This represents a significant shift in the industry.

I want to express my gratitude to everyone here for embracing this fundamental shift - from integrated NVLink to disaggregated NVLink, from air cooling to liquid cooling, and from roughly 60,000 components per computer to 600,000 components per rack.

This 120 kW, fully liquid-cooled system delivers an exascale computer in a single rack. Isn't that amazing? This is a compute node.

The system now sits in a single rack that weighs 3,000 pounds and carries 5,000 cables roughly two miles long in total. It is made up of 600,000 parts - the equivalent of 20 cars packed into one supercomputer. Our goal was to scale up, and this is the result. We would have liked to build this as a single chip, but no reticle limit or process technology could make it: it contains 130 trillion transistors, 20 trillion of which are used for computing. To solve this, we disaggregated it into the Grace Blackwell NVLink 72 rack and achieved the ultimate scale-up. This is the most extreme scale-up ever built, with unprecedented computing power. The memory bandwidth is 570 terabytes per second; every metric of this machine is measured in trillions. It is capable of an exaflops - a million trillion floating-point operations per second. We pursued this to solve an extreme problem.

Many people mistakenly believe that the extreme problem of inference is simple. In reality, it represents the ultimate challenge in extreme computing. Inference involves the factory generating tokens, which directly affects revenue and profitability. Therefore, this factory must be built with extreme efficiency and performance, as every aspect of it affects service quality, revenue, and profitability.

To understand this better, let's look at a graph with two axes. The X-axis represents tokens per second. As you interact with ChatGPT, the output consists of tokens, which are then reassembled into words. For example, the token "THE" can stand for "the", "them", "theory", or "theatrics", etc. These tokens are the building blocks of the AI-generated responses.

To increase the intelligence of AI, it is crucial to generate a large number of tokens. These tokens include reasoning tokens, consistency check tokens, and idea generation tokens, which help AI choose the best response. AI may question itself, asking, "Is this the best you can do?" This internal dialogue reflects the human thought process. The more tokens generated, the smarter the AI. However, if the response time is too long, users may abandon the service. This dynamic is similar to web search, where there is a practical limit to how long users are willing to wait for intelligent answers.

Therefore, there are two competing dimensions: generating as many tokens as possible while ensuring fast response times. Token rate is critical because faster token generation per user improves the experience. However, in computer science and factory operations, there is a fundamental tension between latency (response time) and throughput. In high-volume businesses, batching is a common practice, where customer requests are aggregated and processed in batches. This approach can introduce latency between the time the batch is processed and the time it is consumed.

The same principle applies to the AI factory that generates tokens. On the one hand, the goal is to provide the best possible service to customers - fast-responding intelligent AI. On the other hand, the data center must generate tokens for as many users as possible. Balancing these two goals is a key challenge in AI inference.

To maximize your revenue, the ideal solution is in the upper right quadrant, where the curve resembles a square. This would allow tokens to be generated for everyone at maximum rate, until the factory's limits are reached. However, no factory can achieve this, so the curve may be more nuanced. Your goal is to maximize the area under the curve, which is the product of X and Y. The further out you push it, the better the factory you are building.
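
For intuition, here is a toy model of that trade-off; the per-step costs are invented numbers, not measurements of any NVIDIA system.

```python
# Toy model of the throughput-vs-responsiveness trade-off (numbers are made up).
# Larger batches amortize weight reads and raise factory throughput, but each
# user then sees tokens arrive more slowly.
fixed_ms = 20.0      # assumed per-step cost of streaming weights, batch-independent
per_seq_ms = 0.5     # assumed incremental cost per sequence in the batch

for batch in (1, 8, 32, 128, 512):
    step_ms = fixed_ms + per_seq_ms * batch
    per_user_tps = 1000.0 / step_ms           # tokens/sec each user experiences
    factory_tps = per_user_tps * batch        # tokens/sec the whole factory emits
    print(f"batch={batch:4d}  user={per_user_tps:6.1f} tok/s  "
          f"factory={factory_tps:9.1f} tok/s")
```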

One dimension - tokens per second per user, the responsiveness - requires enormous amounts of compute (FLOPS), while the other - total tokens per second across the factory - requires enormous bandwidth as well as FLOPS. This is a genuinely hard problem. The ideal is to have an abundance of FLOPS, bandwidth, memory, and everything else, and that's why this computer is so good: it starts with the maximum possible FLOPS, memory, and bandwidth, the best architecture, and the highest energy efficiency. On top of that, it needs a programming model that lets software run efficiently within these demanding constraints.

Now, let's understand this concept concretely with a demonstration. Traditional large language models (LLMs) capture basic knowledge, while reasoning models assist in solving complex problems by using thought tokens. For example, consider a prompt that asks to arrange the seating at a wedding table under the constraints of following traditions, photogenic angles, and quarreling family members.

The traditional LLM gave an answer quickly, using less than 500 tokens, but made mistakes when seating guests. In contrast, the inference model processed more than 8,000 tokens to arrive at the correct solution, ensuring a harmonious event.

As you know, organizing a wedding party for 300 guests and finding the optimal seating arrangement is a problem that only AI or your mother-in-law can solve. It is a task that traditional methods cannot handle effectively. Here, we pose a problem that requires reasoning, and R1 responds by exploring various scenarios, testing its own answers, and verifying their correctness. In contrast, the previous generation language model took a one-shot approach and quickly consumed 439 tokens, but the results were inaccurate, causing these tokens to be wasted. To reason about this relatively simple problem, R1 requires nearly 9,000 tokens and requires significantly more computing resources due to the complexity of the model.

Before I dive into the results, let me explain one more aspect. Looking at the Blackwell system, now scaled up with NVLink 72, the first issue is the size of the model. While R1 is considered a relatively small model, it has 671 billion parameters, and future models may scale to trillions of parameters. To manage this, the workload is distributed across the GPUs through techniques such as tensor parallelism, pipeline parallelism, and expert parallelism. The combinatorial space of these approaches is enormous, and the configuration must be tuned to the model, the workload, and the circumstances to maximize throughput or to optimize for low latency. It also requires advanced techniques such as in-flight batching and disaggregation, making the operating system of the AI factory extremely complex.
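
To get a feel for how large that configuration space is, here is a small sketch that simply enumerates ways to factor 72 GPUs into tensor-, expert-, and pipeline-parallel groups. It is only an illustration of the search space, not how Dynamo or any NVIDIA software actually chooses a configuration.

```python
from itertools import product

# Enumerate ways to split 72 GPUs into tensor-parallel (TP), expert-parallel (EP)
# and pipeline-parallel (PP) groups such that TP * EP * PP == 72. Real serving
# stacks also choose batch size, precision, and prefill/decode splits per workload.
N_GPUS = 72
configs = [(tp, ep, pp)
           for tp, ep, pp in product(range(1, N_GPUS + 1), repeat=3)
           if tp * ep * pp == N_GPUS]

print(f"{len(configs)} ordered (TP, EP, PP) factorizations of {N_GPUS} GPUs")
for tp, ep, pp in configs[:5]:
    print(f"  TP={tp:2d}  EP={ep:2d}  PP={pp:2d}")
```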

A key advantage of a homogeneous architecture like NVLink 72 is that each GPU is capable of performing all of the tasks described. We observed that the inference model goes through multiple computational stages, one of which is the “thinking” stage. When you think, you are not generating a lot of tokens. Instead, you are consuming tokens internally, perhaps in the process of reading or digesting information. This information may come from a PDF, a website, or even a video, and you process it at a superlinear rate. You then synthesize this information to formulate a planned response. This stage of information digestion and context processing is highly FLOPs intensive.

The next stage, called decoding, requires a lot of floating point operations and huge bandwidth. For example, a model with trillions of parameters requires terabytes per second to fetch the model from HBM memory and generate a token. This is because large language models predict the next token, not every token. Techniques like speculative decoding aim to speed up this process, but fundamentally the model predicts one token at a time.

During this process, the entire model is pulled in - along with the context, known as the KV cache - to generate a single token. That token is then fed back into the system to generate the next one. Each iteration involves processing trillions of parameters to produce one token. In the recent demonstration, for example, 8,600 tokens were generated, which means the GPUs streamed trillions of bytes of information over and over, one token at a time.
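
Here is a back-of-the-envelope sketch of why decode ends up bandwidth-bound; every number below is an assumption chosen only to illustrate the shape of the calculation, not a specification of any model or GPU.

```python
# Back-of-the-envelope decode estimate (illustrative assumptions, not specs):
# every generated token must stream the active model weights plus the KV cache
# from HBM, so the memory system bounds tokens per second.
params_active = 40e9          # assume ~40B active parameters (e.g. an MoE model)
bytes_per_param = 1           # assume 8-bit weights
kv_cache_bytes = 20e9         # assumed KV cache for a long context, in bytes
hbm_bandwidth = 8e12          # assume ~8 TB/s of aggregate HBM bandwidth

bytes_per_token = params_active * bytes_per_param + kv_cache_bytes
tokens_per_sec = hbm_bandwidth / bytes_per_token

print(f"~{bytes_per_token/1e9:.0f} GB moved per token "
      f"-> roughly {tokens_per_sec:,.0f} tokens/sec per replica")
```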

That's why NVLink is so important. NVLink lets multiple GPUs operate as one giant GPU - the ultimate scale-up. It also allows the prefill and decode stages to be disaggregated, so more GPUs can be allocated to prefill and fewer to decode, or vice versa, depending on the workload.

For example, deep research - which involves extensive reading and synthesis of information - is prefill-heavy, so more GPUs can be allocated to prefill. The process is agentic: it researches, formulates an answer, and often produces a comprehensive report. This capability is a testament to the power of these GPUs and the joy of using them to their full potential.

This is quite remarkable. In that case prefill is busy the entire time while relatively few tokens are generated. In contrast, when you are chatting with a chatbot - something millions of users do - the work is dominated by token generation, by decode. Depending on the workload, we may allocate more GPUs to decode or to prefill. This dynamic operation is extremely complex.

I have outlined pipeline parallelism, tensor parallelism, expert parallelism, in-flight batching, disaggregated inference, and workload management. On top of that, managing the KV cache, routing it to the right GPU, and moving it through all the memory hierarchies is a complicated job. The software that handles all of this is extremely complex.

Today, we are launching NVIDIA Dynamo, which manages all of these functions. It is, in effect, the operating system of the AI factory. In the past, data centers ran systems like VMware to orchestrate enterprise applications. In the future, the application will be agents rather than enterprise IT, and the operating system will be something like Dynamo rather than VMware - an operating system not for a data center, but for an AI factory.

The name Dynamo is significant: the dynamo was the instrument that started the last industrial revolution, the industrial revolution of energy. Water came in, was lit on fire and turned into steam, and out came something invisible yet extremely valuable. It took another 80 years to get to alternating current, but the dynamo was where it began. So we named this enormously complex software NVIDIA Dynamo.

It's open source, and we're really excited that so many partners are working on it with us. One of my favorite partners is Perplexity - not only because of the revolutionary work they do, but because Aravind is such a wonderful person. They're a great partner in this endeavor.

Now, we have to wait for all of this infrastructure to be built out. In the meantime, we have run extensive simulations - it is only fitting that we simulate our supercomputers with supercomputers. Next, I'll show you the benefits of everything I've just explained.

Think back to the diagram of the factory. The x-axis is the token throughput per second of the factory, and the y-axis is the token throughput per second of the user experience. The goal is to mass produce highly intelligent AI.

This is Hopper: eight GPUs connected via InfiniBand, able to generate about a hundred tokens per second per user. Everything is normalized to tokens per second per megawatt, based on a one-megawatt data center - which is relatively small for an AI factory.

Hopper can generate a hundred thousand tokens per second for that one-megawatt data center, or about two and a half million tokens per second if the system is super batched and customers are willing to wait a while.

This is the essence of every GTC event, where the price of admission involves a deep dive into complex mathematics. This is a unique experience that only NVIDIA can provide.

So, Hopper, you got 2.5 million. How do you interpret that number? Remember, ChatGPT costs about $10 per million tokens.

$10 per million tokens. Let’s assume for a moment that $10 per million tokens is probably where the price is going to be.

Let me illustrate this. At 2.5 million tokens per second and $10 per million tokens, the factory earns about $25 per second; with roughly 30 million seconds in a year, that is on the order of $750 million a year from a one-megawatt data center. At the more responsive operating point of 100,000 tokens per second, it is only about $1 per second - roughly $30 million a year.
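
Here is the same arithmetic written out; the $10-per-million-token price is the assumption quoted above.

```python
# Worked version of the revenue arithmetic above (illustrative pricing).
factory_tokens_per_sec = 2.5e6     # Hopper, fully batched, 1 MW data center
price_per_million_tokens = 10.0    # assumed $10 per million tokens
seconds_per_year = 30e6            # "about 30 million seconds in a year"

dollars_per_sec = factory_tokens_per_sec / 1e6 * price_per_million_tokens
dollars_per_year = dollars_per_sec * seconds_per_year
print(f"${dollars_per_sec:.0f}/second  ->  ~${dollars_per_year/1e6:.0f}M per year")
```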

The goal is to maximize the token rate to develop highly intelligent AI, because smarter AI has higher value. However, there is a trade-off here: the more advanced the AI, the lower the production volume. This balance is critical, and this is the curve we seek to optimize.

What I am showing you now is the fastest computer in the world.

Computers have revolutionized everything. So how do we improve on this? The first step was Blackwell with NVLink 8 - the same kind of compute node, running FP8. Blackwell is faster, bigger, and packs more transistors. But we went beyond that and introduced a new precision. It's not quite as simple as 4-bit floating point alone, but FP4 lets us quantize models and spend less energy on the same work - and less energy per task means more work overall.
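
For the mechanics of what 4-bit quantization means, here is a minimal sketch using a simple integer-scaled scheme. Blackwell's actual FP4 formats differ; this only illustrates the general idea of trading precision for memory and energy.

```python
import numpy as np

def quantize_int4_block(w, block=32):
    """Toy 4-bit quantization with a per-block scale (a simplified stand-in for
    hardware FP4 formats): 16 levels per weight, so 4 bits instead of 16/32."""
    w = w.reshape(-1, block)
    scale = np.abs(w).max(axis=1, keepdims=True) / 7.0        # map block to [-7, 7]
    q = np.clip(np.round(w / scale), -8, 7).astype(np.int8)   # 4-bit integer codes
    return q, scale

def dequantize(q, scale):
    return (q * scale).reshape(-1)

rng = np.random.default_rng(0)
weights = rng.standard_normal(1024).astype(np.float32)
q, scale = quantize_int4_block(weights)
err = np.abs(dequantize(q, scale) - weights).mean()
print(f"mean absolute quantization error: {err:.4f} (~4 bits/weight instead of 32)")
```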

Remember, every data center in the future will be power constrained, which means your revenue is tied to the available power. This is similar to many other industries. We are now a power constrained industry, and our revenue will reflect this. Therefore, it is critical to have the most energy-efficient computing architecture.

Next, we scaled up with NVLink 72 and FP4. Look at the difference that makes. Our architecture is tightly integrated, and with Dynamo added, performance stretches even further. Dynamo helps Hopper too, but its impact on Blackwell is especially significant. Only at GTC would that kind of progress be applauded.

Now, look at those two shiny spots - that's roughly your max-Q, and most likely where you'd run your factory: balancing maximum throughput against the highest-quality AI. The smartest AI at the greatest volume - that's what you're optimizing for. At those two points, Blackwell significantly outperforms Hopper.

Remember, this isn't just about ISO chips; this is about ISO power. This is the ultimate expression of Moore's Law, and at ISO power, there's a 25x improvement between generations. This isn't about ISO chips, transistors, or anything else -- this is about ISO power, which is the ultimate limiting factor. We can only allocate so much energy to the data center. At ISO power, Blackwell is 25x more efficient.

Now, look at that rainbow — that’s the fun part. Each configuration under the Pareto front represents millions of points where we could configure our data center. We could split and shard the work in many ways, but we found the optimal solution — the Pareto front. Each configuration, represented by a different color, shows a unique setup.

This chart makes clear why you need a programmable architecture that is as homogeneously fungible as possible: the workload varies wildly across the Pareto frontier. At the top is expert-parallel 8 with a batch size of 3,000, disaggregation off, and Dynamo off. In the middle is expert-parallel 64 using 26% of the context (leaving 74% unused), with Dynamo on. Here, at a batch size of 64, it is expert-parallel 64 on one side and expert-parallel 4 on the other. At the bottom, tensor-parallel 16 is paired with expert-parallel 4 at a batch size of 2 and 1% context usage. The configuration of the machine varies dramatically across the frontier.

This test case, with an input sequence of 1,000 tokens and an output of 2,000 tokens, serves as the baseline. Earlier, we showed an output of 8,000 to 9,000 tokens; this baseline is not that single reasoning chat but a broader scenario. The goal is to build the next generation of computers for the next generation of workloads, and on this reasoning model, Blackwell delivers 40 times the performance of Hopper.

These advances are truly remarkable. I said earlier that once Blackwell starts shipping in volume, you won't be able to give Hoppers away - and that's how it is. If you are still considering buying a Hopper, don't worry: it is still viable in certain circumstances. But I'm the chief revenue destroyer. My sales team was concerned when I said that, so let me assure them that Hopper is fine in certain scenarios. That is the most positive thing I can say about Hopper - it is adequate in certain situations.

If I were to make an estimate, the rapid pace of technological advancement, combined with the intensity of the workloads and the size of these factory-like systems, highlights the importance of investing in the right version. To put it in perspective, a 100 MW factory based on the Hopper architecture consists of 45,000 chips and 1,400 racks, generating 300 million tokens per second. In contrast, a Blackwell-based system has 86 chips, which may seem counterintuitive at first glance.

We are not trying to offer you less. Our sales team is concerned that I'm costing them commission, but this approach works better. The more you buy, the more you save. In fact, it's even better than that - the more you buy, the more you earn.

You know what? Everything now sits in the context of the AI factory. We often talk about chips, but it always starts with scale - full scale-up. The AI factory is extremely complex. A single rack contains 600,000 parts and weighs 3,000 pounds, and these racks must be interconnected with many other racks.

We are developing a digital twin for each data center, which must be done before the physical data center is built. The world is rapidly advancing to build the most advanced, large-scale AI factories.

Building an AI Gigafactory is a remarkable engineering achievement that requires the collaboration of tens of thousands of workers from suppliers, architects, contractors, and engineers. The work involves building, transporting, and assembling nearly 5 billion parts and more than 200,000 miles of optical fiber, a distance almost equivalent to the distance from the Earth to the Moon.

The NVIDIA Omniverse AI Factory digital twin blueprint enables us to design and optimize these AI factories before physical construction begins. NVIDIA engineers used the blueprint to plan a 1-Gigawatt AI factory, incorporating 3D and layout data from the latest NVIDIA DGX SuperPods, advanced power and cooling systems from Vertiv and Schneider Electric, and optimized topology from NVIDIA AIR, a framework for simulating network logic, layout, and protocols.

Traditionally, this work has been done in silos. Omniverse Blueprints lets our engineering teams work in parallel and collaboratively, allowing us to explore various configurations to optimize total cost of ownership (TCO) and power usage effectiveness.

NVIDIA uses Cadence’s realistic digital twin technology, accelerated by CUDA and Omniverse libraries, to simulate air and liquid cooling systems. Schneider Electric uses ETAP, an application designed to simulate power module efficiency and reliability.

Real-time simulation enables us to iterate and execute large-scale what-if scenarios in seconds rather than hours. Digital twin technology facilitates the communication of instructions to a wide network of teams and suppliers, minimizing execution errors and accelerating the setup process.

Additionally, when planning a retrofit or upgrade, we can efficiently test and simulate costs and downtime to ensure the AI Factory remains future-proof.

That may be the first time anyone has called a data center beautiful. I have a lot to cover today, so if I move quickly it is not from lack of care but because of the sheer volume of information that needs to be shared.

Let's start with our roadmap. We are now in full production of Blackwell, and computer companies around the world are building these remarkable machines at scale. I deeply appreciate your efforts in transitioning to this new architecture. In the second half of this year, we will seamlessly transition to the upgrade, Blackwell Ultra NVLink 72: 1.5 times the FLOPS, a new attention instruction, 1.5 times the memory (useful for things like the KV cache), and double the network bandwidth. Because it is the same architecture, the transition to Blackwell Ultra in the second half of this year will be smooth.

This product launch is unique because of the anticipation it generates. That’s because we are building an AI factory and infrastructure, which requires years of planning. This is not a discretionary decision like buying a laptop, but a strategic investment that requires careful preparation, including securing land, power, and capital expenditures, and building an engineering team. We plan years in advance, which is why I’m sharing our roadmap with you early to ensure there are no surprises. For example, next month we will launch an incredible new system.

Looking ahead, the next milestone, a year later, is named after Vera Rubin, the astronomer who discovered dark matter. Her grandchildren are here today. The Vera CPU has twice the performance of Grace, with more memory and more bandwidth, yet it consumes only 50 watts - which is really remarkable.

The Rubin generation introduces a new GPU, the CX9 SmartNIC, NVLink 6, and HBM4 memory. Basically everything is new except the chassis. This lets us take significant risk in one direction without putting the infrastructure at risk. Vera Rubin NVLink 144 will be available in the second half of next year.

I need to clarify a mistake I made. Blackwell is actually two GPUs in one chip, but we referred to it as a single GPU, which caused confusion in the NVLink naming. Going forward, when I refer to NVLink 144, it means it connects to 144 GPUs, each of which is a GPU die. These dies can be assembled in various ways and may change over time. Each GPU die is a GPU, and each NVLink connects to a GPU.

That covers the second half of this year and the second half of next year. After that comes Rubin Ultra - this is the one to aim for. Rubin Ultra NVLink 576, scheduled for the second half of 2027, delivers extreme scale-up. Each rack consumes 600 kilowatts and contains 2.5 million components.

The system packs in far more GPUs, which greatly improves every performance metric: 15 exaflops, a significant increase from the 1 exaflop mentioned earlier, and 4.6 petabytes per second of scale-up bandwidth - 4,600 terabytes per second. It also includes new NVLink switches and CX9 components.

The configuration consists of 16 sites, each containing four GPUs in one package, interconnected by a very large NVLink. To provide context, this is the overall structure of the system.

It's going to be very exciting. Right now, we're bringing up Grace Blackwell - and just to be clear, it's not a laptop. This is what Grace Blackwell looks like, and this is what Rubin looks like at the same scale.

Essentially, before you can scale out, you must first scale up. Once you have scaled up, you can then use the amazing techniques I'll show you shortly to scale out.

First, we scale up. This gives some insight into how fast we are progressing. Here is scale-up FLOPS: Hopper is 1x, Blackwell is 68x, and Rubin is 900x. Power consumption is a key factor when considering total cost of ownership (TCO).

The area under the curve, as I mentioned before - that square - is essentially the number of floating-point operations multiplied by the bandwidth.

A straightforward way to gauge your AI factory is to divide those numbers by the watts it takes. It's clear that Rubin will drive the cost down dramatically.

NVIDIA's roadmap is updated every year like clockwork. For scale-up we built NVLink; our scale-out network relies on InfiniBand and Spectrum-X. Many people were surprised by our entry into Ethernet. Our goal was to give Ethernet properties comparable to InfiniBand, making the network easier to use and manage. That led us to invest in Spectrum-X, which builds congestion control, very low latency, and deep software integration into our computing fabric.

"So SpectrumX has performed exceptionally well, enabling the largest single-GPU cluster, Colossus, to run as a unified system. SpectrumX has been a huge success for us. I'm particularly excited that it has been integrated into the product lines of major enterprise networking companies, helping global enterprises transform into AI-driven organizations."

We are now at 100,000 GPUs with CX-7 and CX-8, with CX-9 coming soon. Our goal is to scale to many hundreds of thousands of GPUs in the Rubin timeframe. The challenge is connectivity at that scale-out: these data centers are the size of stadiums.

Copper is ideal for short-reach connections thanks to its reliability, energy efficiency, and cost-effectiveness, but silicon photonics is essential for connectivity over longer distances. The main challenge with silicon photonics has been the energy consumed by the transceivers that convert electrical signals into optical signals, a conversion that involves multiple stages.

First, we announced NVIDIA's first co-packaged silicon photonics system, the world's first 1.6 terabit per second CPO system. The system is based on a technology called micro-ring resonator modulators and is manufactured using advanced process technology from TSMC, with whom we have been working for many years. We worked with a large ecosystem of technology providers to develop this breakthrough innovation.

The decision to invest in MRMs (microring resonator modulators) stems from their higher density and power efficiency compared to the Mach-Zehnder modulators used in traditional telecom data center interconnects. To date, Mach-Zehnder modulators have been sufficient due to lower density requirements. However, as we scale, MRMs are showing significant advantages.

To illustrate, consider a transceiver. The transceiver I'm going to show you consumes 30 watts of power in high volume and costs $1,000. It has an electrical interface on one side and an optical interface on the other. The optical interface uses Mach-Zehnder technology, and it plays an important role in connecting the GPU to the switch and between switches.

In a system with 100,000 GPUs, we would need 100,000 transceivers on one side and another 100,000 on the other to connect the switches. At 250,000 GPUs, an additional layer of switches is needed, and each GPU ends up with six transceivers - 180 watts and $6,000 per GPU just for optics. Scaling to a million GPUs would require 6 million transceivers consuming 180 megawatts. That raises a critical question: how do we handle such an enormous energy demand when energy is our most precious resource? Those 180 megawatts come straight out of our customers' revenue, because it is power that cannot be spent generating tokens.
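
Written out, the transceiver arithmetic from this paragraph looks like the following; the dollar total at the end is simply the implied product of the stated unit cost and count.

```python
# The transceiver arithmetic from the paragraph above, written out.
watts_per_transceiver = 30
dollars_per_transceiver = 1_000
transceivers_per_gpu = 6          # GPU-to-switch plus switch-to-switch hops

num_gpus = 1_000_000
transceivers = num_gpus * transceivers_per_gpu
power_mw = transceivers * watts_per_transceiver / 1e6
cost_b = transceivers * dollars_per_transceiver / 1e9

print(f"per GPU: {transceivers_per_gpu * watts_per_transceiver} W, "
      f"${transceivers_per_gpu * dollars_per_transceiver:,}")
print(f"{num_gpus:,} GPUs: {transceivers/1e6:.0f}M transceivers, "
      f"{power_mw:.0f} MW, ${cost_b:.0f}B just for optics")
```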

We have achieved something extraordinary by building the world's first MRM, the micro-ring resonator modulator. Here's how it works: a waveguide directs light into a resonant ring, which controls the waveguide's reflectivity and modulates how much light passes through. It can absorb the light or let it pass, effectively turning a continuous laser beam into binary ones and zeros.

This photonic integrated circuit is then integrated with electronic integrated circuits, microlenses, and fiber arrays. These components are manufactured using TSMC's CoWoS technology and packaged using 3D CoWoS technology in collaboration with multiple technology providers. The result is a remarkable machine. Now, let's watch the video.

It's really a technological marvel. Those photonic engines go into these switches, our InfiniBand switches, and the silicon is performing really well. In the second half of this year we will ship the silicon photonic InfiniBand switches, and in the second half of next year the Spectrum-X versions. This is all made possible by the MRM choice and the significant technology risks we have taken over the last five years, during which we have filed hundreds of patents and licensed the technology to our partners.

Now, we are able to combine silicon photonics with a co-packaging option that eliminates the need for transceivers and allows direct fiber input into our switches with a 512-port radix. This achievement would not be feasible any other way. This allows us to scale to systems with hundreds of thousands and millions of GPUs.

The benefits are huge: in a data center, we can save tens of megawatts of power. To put that in perspective, six megawatts is the power of roughly ten Rubin Ultra racks.

Sixty megawatts is a significant number: that is the power for a hundred Rubin Ultra racks, which we can now deploy instead. This is our roadmap: a new architecture every two years and a new product line every year, taking risks in silicon, networking, or system chassis in stages so we can push the industry forward while pursuing these incredible technologies.

I am very grateful that Vera Rubin's grandchildren could be here today. It is an opportunity for us to recognize and honor her outstanding contributions. Our next generation of products will be named after Feynman.

Let me talk about enterprise computing, a critical topic. Before we bring artificial intelligence to the global enterprise, let me first show you the other side of NVIDIA: the elegance of Gaussian splatting.

To bring AI to the enterprise, let’s take a step back and reflect: AI and machine learning have fundamentally reshaped the entire computing stack. Processors, operating systems, and the applications built on top of them have all changed. So have the ways in which applications are developed, orchestrated, and executed.

For example, the way we access data will be fundamentally different than in the past. In the future, instead of retrieving specific data and analyzing it, we will interact with systems like Perplexity. Instead of using traditional retrieval methods, we will directly ask Perplexity questions and it will provide answers.

This is how enterprise IT will work in the future. We will have AI agents as part of our digital workforce. There are 1 billion knowledge workers in the world and potentially 10 billion digital workers working in tandem with them.

In the future, 100% of software engineers - 30 million worldwide - will be assisted by AI. I have no doubt about this. By the end of this year, 100% of NVIDIA software engineers will be assisted by AI. AI agents will be everywhere and fundamentally change the way businesses operate and how we manage.

This requires a new generation of computers, and this is what a personal computer should look like: 20 petaflops of AI performance, incredible, with 72 CPU cores and chip-to-chip interfaces.

Additionally, some PCI Express slots are available for your GeForce. This is called DGX Station. Both DGX Spark and DGX Station will be available from all OEMs including HP, Dell, Lenovo and ASUS. Designed for data scientists and researchers around the world, these systems represent the future of AI-driven computing. This is what computers should look like in the age of AI. We now offer a comprehensive product line for enterprises, from compact workstations to servers and supercomputers, all available through our partners.

We are also revolutionizing the compute stack, which consists of three pillars: compute, networking, and storage. Spectrum-X is transforming enterprise networks into AI networks. Storage, the third pillar, is moving from retrieval-based systems to semantics-based systems that continuously embed raw data into knowledge in the background. Users no longer interact with data by retrieving it; they simply ask it questions. For example, Aaron Levie at Box worked with us to deploy a super-intelligent storage system in the cloud. In the future, such systems will become standard for every enterprise.
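To make the idea concrete, here is a minimal sketch of a semantics-based store: documents are embedded into vectors as they arrive, and questions are answered by similarity search over those embeddings rather than by path- or keyword-based retrieval. The `embed` function is a stand-in so the sketch runs on its own; a real system would call an actual embedding model, and this is not Box's or NVIDIA's implementation.

```python
import numpy as np

def embed(text: str, dim: int = 64) -> np.ndarray:
    """Stand-in embedding: a pseudo-random unit vector derived from the text.
    A real system would call an embedding model here to get semantic vectors."""
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    v = rng.standard_normal(dim)
    return v / np.linalg.norm(v)

class SemanticStore:
    """Embed documents in the background; answer questions by similarity search."""
    def __init__(self) -> None:
        self.docs: list[str] = []
        self.vectors: list[np.ndarray] = []

    def ingest(self, doc: str) -> None:
        self.docs.append(doc)
        self.vectors.append(embed(doc))

    def ask(self, question: str, k: int = 3) -> list[str]:
        q = embed(question)
        scores = np.array([float(q @ v) for v in self.vectors])
        return [self.docs[i] for i in np.argsort(-scores)[:k]]

store = SemanticStore()
for d in ["Q3 revenue report", "GPU cluster runbook", "HR onboarding guide"]:
    store.ingest(d)
print(store.ask("how do I operate the GPU cluster?", k=1))
```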

We are working with leading storage industry partners, including DDN, Dell, Hewlett Packard Enterprise, Hitachi, IBM, NetApp, Nutanix, Pure Storage, VAST Data, and WEKA, to bring GPU acceleration to storage systems for the first time.

Michael was concerned about the number of slides and sent me an extra one. This slide highlights Dell's upcoming NVIDIA Enterprise IT AI Infrastructure systems and the software running on them. It highlights our continued efforts to innovate enterprise technology.

Today, we also announced an incredible model that is fully open source and enterprise-ready. Earlier, I compared R1, a reasoning model, with Llama 3, a non-reasoning model, to demonstrate the superior intelligence of reasoning models. We are committed to making advanced models like this accessible to every company.

This is part of our system of NIMs, which you can download and run anywhere, on DGX Spark, DGX Station, any OEM server, or in the cloud, and integrate seamlessly into any agentic AI framework. We work with companies around the world, and I want to thank some of our key partners in the room.
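As a rough illustration of "download and run anywhere": NIM microservices expose an OpenAI-compatible endpoint, so a locally deployed model can be queried with a standard client. The port, model identifier, and prompt below are placeholder assumptions, not a specific NVIDIA configuration.

```python
# Query a locally running NIM microservice through its OpenAI-compatible API.
# The base_url, model name, and API key are placeholders for illustration.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed-locally")

response = client.chat.completions.create(
    model="example/reasoning-model",  # placeholder model identifier
    messages=[
        {"role": "system", "content": "You are a helpful enterprise assistant."},
        {"role": "user", "content": "Summarize yesterday's build failures."},
    ],
    temperature=0.2,
)
print(response.choices[0].message.content)
```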

Accenture, under Julie Sweet, is building out their AI factory and framework. Amdocs, the largest telecom software company, is also leveraging our technology. AT&T, under John Stankey, is building agentic AI systems. Larry Fink and the BlackRock team are advancing their own initiatives.

In the future, we will not only hire ASIC designers but also digital ASIC designers from Cadence to assist in chip design. Cadence is integrating NVIDIA models, NIMs, and libraries into its AI framework, allowing deployment on premises or in any cloud. Capital One, a leader in financial services technology, uses NVIDIA solutions extensively.

Deloitte, led by Jason; EY, led by Janet; Nasdaq, led by Adena; and SAP, led by Christian, are all integrating NVIDIA technology into their AI frameworks. ServiceNow, led by Bill McDermott, is also making significant progress.

The keynote began with a 30-minute introduction followed by an equally detailed slideshow. Now, let’s turn our focus to robotics.

The age of robots has arrived. Robots have the unique advantage of interacting with the physical world and can perform tasks that digital information alone cannot. The world is clearly facing a serious labor shortage: by the end of this decade, the shortfall is expected to reach at least 50 million workers. We would happily pay each of those workers $50,000 a year, and we will likely be just as willing to pay robots a similar amount to do that work. This will undoubtedly become a huge industry.

Robotic systems will transform infrastructure. Billions of cameras are already deployed in warehouses and factories around the world, across some 10 to 20 million facilities. As mentioned earlier, every car is already a robot, and now we are moving toward general-purpose robots: everything that moves will be autonomous. Physical AI will bring robots into every industry. NVIDIA has built three computers that support a continuous cycle of robot AI simulation, training, testing, and real-world experience. Training robots requires enormous amounts of data, and Internet-scale data provides the foundation for common sense and reasoning.

However, robots require motion and control data, which is expensive to obtain.

With a blueprint built on NVIDIA Omniverse and Cosmos, developers can generate large amounts of diverse synthetic data for training robot policies. First, in Omniverse, developers aggregate real-world sensor or demonstration data for their specific domain, robot, and task. Then they use Omniverse to condition Cosmos, multiplying the original captures into large volumes of realistic, diverse data. Finally, developers use Isaac Lab to post-train the robot policy on the augmented dataset.
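The flow described above, capture, conditioned generation, then policy post-training, can be summarized in a short sketch. The helper functions below are hypothetical stand-ins for illustration only; they are not the Omniverse, Cosmos, or Isaac Lab APIs.

```python
from dataclasses import dataclass
import random

@dataclass
class Episode:
    observations: list[list[float]]  # e.g., proprioception or image features
    actions: list[list[float]]       # commanded joint/gripper actions

def capture_demonstrations(n: int) -> list[Episode]:
    """Stand-in for real-world sensor/teleoperation capture."""
    return [Episode([[0.0, 0.0]], [[0.1, -0.1]]) for _ in range(n)]

def generate_variations(seed_episodes: list[Episode], factor: int) -> list[Episode]:
    """Stand-in for a conditioned generative model: jitter the seed data
    to mimic producing diverse-but-grounded synthetic episodes."""
    synthetic = []
    for ep in seed_episodes:
        for _ in range(factor):
            obs = [[x + random.gauss(0, 0.05) for x in o] for o in ep.observations]
            synthetic.append(Episode(obs, ep.actions))
    return synthetic

def post_train_policy(dataset: list[Episode]) -> str:
    """Stand-in for imitation-learning post-training on the augmented dataset."""
    return f"policy trained on {len(dataset)} episodes"

seeds = capture_demonstrations(n=10)
augmented = seeds + generate_variations(seeds, factor=100)
print(post_train_policy(augmented))  # policy trained on 1010 episodes
```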

Robots can learn by cloning behaviors through imitation, or acquire new skills through trial and error using reinforcement learning with AI feedback. Practicing in the lab is different from real-world conditions, so new policies need to be tested before deployment. Developers use NVIDIA Omniverse for software- and hardware-in-the-loop testing, simulating the policies in a digital twin that incorporates real-world environmental dynamics, domain randomization, physical feedback, and high-fidelity sensor simulation. Real-world operations often require multiple robots to collaborate effectively.
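Domain randomization, mentioned above, means perturbing the simulator's physical parameters every episode so a policy cannot overfit to one idealized world. Here is a minimal, self-contained sketch on a toy 1-D point-mass task; the parameter ranges and the hand-written proportional controller are illustrative assumptions, not an NVIDIA workflow.

```python
import random

def rollout(mass: float, friction: float, gain: float, steps: int = 200) -> float:
    """Drive a 1-D point mass toward x=1 with a proportional controller;
    return the final distance to the target."""
    x, v, dt = 0.0, 0.0, 0.02
    for _ in range(steps):
        force = gain * (1.0 - x) - friction * v   # P-control plus viscous friction
        v += (force / mass) * dt
        x += v * dt
    return abs(1.0 - x)

def evaluate_with_domain_randomization(gain: float, episodes: int = 100) -> float:
    """Average error across randomized physics so the controller must be robust."""
    errors = []
    for _ in range(episodes):
        mass = random.uniform(0.5, 2.0)        # randomized per episode
        friction = random.uniform(0.1, 1.0)    # randomized per episode
        errors.append(rollout(mass, friction, gain))
    return sum(errors) / len(errors)

# Pick the gain that works best across the randomized worlds, not just one of them.
best_gain = min((evaluate_with_domain_randomization(g), g) for g in (1.0, 2.0, 4.0, 8.0))[1]
print("robust gain:", best_gain)
```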

Mega, an Omniverse Blueprint, enables developers to test fleets of post-trained policies at scale. In this case, Foxconn is evaluating heterogeneous robots in a virtual NVIDIA Blackwell production facility. As the robot brains perform their tasks, they perceive the results of their actions through sensor simulations and subsequently plan their next steps. Mega allows developers to test numerous robot policies, facilitating the ability of robots to operate as a coordinated system, whether for spatial reasoning, navigation, mobility, or dexterity.

Amazing innovations are born in simulation. Today, we're introducing NVIDIA Isaac GR00T N1, a generalist foundation model for humanoid robots. It's built on synthetic data generation and learning in simulation. GR00T N1 uses a dual-system architecture for fast and slow thinking, inspired by how human cognition works. The slow-thinking system lets the robot perceive and reason about its environment and instructions, and plan the appropriate actions.

The fast-thinking system translates those plans into precise, continuous robot actions. GR00T N1's generalization lets the robot manipulate common objects with ease and execute multi-step sequences collaboratively. Through this full pipeline of synthetic data generation and robot learning, humanoid robot developers can post-train GR00T N1 across multiple embodiments, tasks, and environments. Around the world, developers in every industry are using NVIDIA's three computers to build the next generation of embodied AI.
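A minimal way to picture the dual-system idea: a slow planner runs at a low rate and emits a short plan, while a fast controller runs at a much higher rate and turns the current plan step into continuous actions. The rates, plan format, and controller below are illustrative assumptions, not GR00T N1's actual architecture.

```python
import math

def slow_planner(observation: dict) -> list[str]:
    """System 2 (slow): reason over the scene and produce a step-by-step plan."""
    return ["reach_cup", "grasp", "move_to_tray", "release"]

def fast_controller(step: str, t: float) -> list[float]:
    """System 1 (fast): emit a continuous action for the current plan step."""
    phase = math.sin(2 * math.pi * t)          # placeholder smooth trajectory
    return [phase, 0.5 * phase, 1.0 if step == "grasp" else 0.0]

PLAN_HZ, CONTROL_HZ, DURATION_S = 1, 50, 4     # illustrative loop rates
plan: list[str] = []
step_idx = 0
for tick in range(DURATION_S * CONTROL_HZ):
    t = tick / CONTROL_HZ
    if tick % (CONTROL_HZ // PLAN_HZ) == 0:    # replan at the slow (1 Hz) rate
        plan = slow_planner({"camera": "...", "proprioception": "..."})
    action = fast_controller(plan[step_idx], t)  # act at the fast (50 Hz) rate
    if tick > 0 and tick % CONTROL_HZ == 0:    # advance one plan step per second
        step_idx = min(step_idx + 1, len(plan) - 1)

print("finished on plan step:", plan[step_idx])  # -> "release"
```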

Physical AI and robotics are advancing rapidly. It is vital to stay up to date with the latest developments in this field. This is quite possibly the biggest of all industries. At its core, we face the same challenges.

As mentioned before, we focus on the same three questions, and we approach them very systematically.

How do we address the data problem? Where and how do we generate the data needed to train the AI? What is the model architecture? And what are the scaling laws: how do we scale data, compute, or both so the AI keeps getting smarter? These fundamental questions apply to robotics as well.

In robotics, we've developed a system called Omniverse, which is our operating system for physical AI. You've heard me talk about Omniverse in detail. Today, I'm going to talk about two new technologies that are integrated into it.

The first enables us to scale AI with generative capabilities: a generative model that understands the physical world, which we call Cosmos. By using Omniverse to condition Cosmos, and using Cosmos to generate an infinite number of environments, we can create data that is grounded and controlled by us, yet systematically infinite. In the example, the Omniverse scene is rendered in candy colors to show how we precisely control the robot and the scenario, while Cosmos generates the diverse, photoreal virtual environments around it.

The second capability, as mentioned earlier, is reinforcement learning with verifiable rewards, which has the potential to dramatically scale language models. In robotics, the verifiable reward is governed by the laws of physics: verifiable physical rewards.
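A toy illustration of a physics-verifiable reward: simulate a throw under gravity and reward the policy only if the projectile actually lands in the target zone, something the simulator can check mechanically rather than relying on a human or another model to judge. The dynamics, target, and tolerance below are illustrative assumptions.

```python
import math

GRAVITY = 9.81  # m/s^2

def landing_distance(speed: float, angle_deg: float) -> float:
    """Ideal projectile range on flat ground (no drag)."""
    angle = math.radians(angle_deg)
    return speed ** 2 * math.sin(2 * angle) / GRAVITY

def verifiable_reward(speed: float, angle_deg: float,
                      target_m: float = 10.0, tolerance_m: float = 0.25) -> float:
    """Reward 1.0 only if physics says the throw lands within tolerance of the target."""
    return 1.0 if abs(landing_distance(speed, angle_deg) - target_m) <= tolerance_m else 0.0

# A policy proposes actions; the physics check verifies them.
for speed, angle in [(9.9, 45.0), (12.0, 30.0), (15.0, 20.0)]:
    print(speed, angle, "->", verifiable_reward(speed, angle))
```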

We needed an advanced physics engine tailored for precise applications. While most physics engines are designed for various purposes, such as large-scale machinery or virtual worlds like video games, we needed an engine specifically designed for detailed rigid and soft body simulations.

This engine must support training for haptic feedback, fine motor skills, and actuator control. It should be GPU-accelerated so that these virtual worlds can run faster than real time, enabling very fast AI model training. And it must integrate seamlessly with MuJoCo, a framework widely used by roboticists worldwide.

Today, we’re excited to announce a groundbreaking collaboration between DeepMind, Disney Research, and NVIDIA called Newton. Let’s explore Newton in detail.

Let's start over and make sure we don't ruin their experience.

Please provide feedback. I need your input. What's going on? I need to talk to a real person. Please, that's a good joke. Find me a real person to talk to. Jenny, I know this isn't your fault, but talk to me. We only have two minutes left. They're reviewing it. Are they relisting it? I'm not sure what that means. That's not remarkable. Hey, Blue. How do you like your new physics engine? Haptic feedback, rigid body and soft body simulation are all beyond real time. What you just witnessed was a full real time simulation. This is how we'll train robots in the future. Blue has two NVIDIA computers inside. You're really smart.

Hey, Blue. Let's take them home and wrap up this keynote. It's time for lunch. Are you ready? Let's wrap up. We have another announcement. You did a great job. Please stand here. Very good. Right there. Okay, stand up.

We have one more exciting announcement. Advances in robotics have been phenomenal, and today we are proud to announce that GR00T N1 is now open source. Thank you all for attending GTC. Let's review the key highlights.

First, Blackwell is in full production and is ramping at an incredible pace thanks to strong customer demand. This reflects an inflection point in AI: the amount of computation needed for inference, for training AI systems, and for agentic systems has increased dramatically.

Second, Blackwell NVLink 72 with Dynamo delivers 40x better AI factory performance than Hopper. As we scale AI, inference will be one of the most critical workloads over the next decade.

Third, we have established an annual roadmap cadence to help you plan your AI infrastructure. We are building three kinds of AI infrastructure: for cloud, enterprise, and robotics.

Finally, we have a special announcement for you all. Play it. Thanks to everyone who worked on this video. Have a great GTC. Thank you.

Hey, Blue.

Let's go home.

Well done. Thank you.