On March 18, 2025, NVIDIA unveiled two groundbreaking innovations at its GTC conference: the Blackwell Ultra platform and the open-source Dynamo inference framework. These advancements are set to redefine AI infrastructure and inference capabilities, catering to the escalating demands of reasoning and agentic AI applications.
Blackwell Ultra: A Leap in AI Performance
Building upon the Blackwell architecture introduced a year prior, Blackwell Ultra comprises the NVIDIA GB300 NVL72 rack-scale solution and the NVIDIA HGX™ B300 NVL16 system. The GB300 NVL72 connects 72 Blackwell Ultra GPUs and 36 Arm Neoverse-based NVIDIA Grace™ CPUs in a rack-scale design, delivering 1.5 times more AI performance than its predecessor, the GB200 NVL72. NVIDIA says the platform boosts the revenue opportunity for AI factories 50-fold compared with those built on its earlier Hopper™ architecture.
Blackwell Ultra is engineered to handle the rigorous demands of advanced AI applications, including:
- Agentic AI: Systems capable of sophisticated reasoning and iterative planning to autonomously solve complex, multistep problems.
- Physical AI: Enabling the generation of synthetic, photorealistic videos in real time for training applications such as robotics and autonomous vehicles at scale.
To complement its hardware advancements, Blackwell Ultra integrates seamlessly with NVIDIA Spectrum-X™ Ethernet and NVIDIA Quantum-X800 InfiniBand platforms, offering 800 Gb/s of data throughput per GPU. This integration reduces latency and jitter, optimising performance for AI infrastructure.
Dynamo: Revolutionising AI Inference
Alongside Blackwell Ultra, NVIDIA introduced Dynamo, an open-source inference framework designed to accelerate and scale AI reasoning models efficiently. As the successor to NVIDIA Triton Inference Server™, Dynamo orchestrates and accelerates inference communication across thousands of GPUs. It employs disaggregated serving to separate the context-processing (prefill) and token-generation (decode) phases of large language models (LLMs) onto different GPUs, allowing each phase to be optimised independently and ensuring maximum GPU resource utilisation.
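To make the disaggregated-serving idea concrete, the minimal Python sketch below separates a prefill pool from a decode pool and hands the KV cache between them. The class and function names here are illustrative assumptions for this article, not Dynamo’s actual API.

```python
# Conceptual sketch of disaggregated serving: the compute-heavy prefill
# (prompt processing) phase and the memory-bound decode (token generation)
# phase run on separate GPU pools, with the KV cache handed off between
# them. All names are hypothetical, not Dynamo's real interfaces.
from dataclasses import dataclass
from typing import Iterator, List


@dataclass
class KVCache:
    """Opaque handle to the attention key/value state built during prefill."""
    request_id: str
    blocks: List[str]  # stand-ins for GPU memory blocks holding the cache


class PrefillWorker:
    """Runs on GPUs tuned for high-throughput prompt (context) processing."""

    def prefill(self, request_id: str, prompt: str) -> KVCache:
        # One batched forward pass over the full prompt; model execution
        # is elided in this sketch.
        return KVCache(request_id=request_id, blocks=["block-0", "block-1"])


class DecodeWorker:
    """Runs on GPUs tuned for low-latency autoregressive token generation."""

    def decode(self, cache: KVCache, max_tokens: int) -> Iterator[str]:
        # Generate one token at a time, reusing the transferred KV cache
        # instead of recomputing the prompt.
        for step in range(max_tokens):
            yield f"<token {step} for {cache.request_id}>"


def serve(prompt: str, prefill_pool: List[PrefillWorker],
          decode_pool: List[DecodeWorker]) -> List[str]:
    """Route one request through separate prefill and decode GPU pools."""
    cache = prefill_pool[0].prefill("req-1", prompt)  # phase 1: prefill GPU
    # In a real deployment the KV cache would move here over NVLink/RDMA.
    return list(decode_pool[0].decode(cache, max_tokens=4))  # phase 2: decode GPU


print(serve("Explain disaggregated serving.", [PrefillWorker()], [DecodeWorker()]))
```

Because the two phases stress hardware differently (prefill is compute-bound, decode is memory-bandwidth-bound), separating them lets each pool be sized and tuned independently.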
Key innovations within Dynamo include:
- GPU Planner: Dynamically adds and removes GPUs to adjust to fluctuating user demand, avoiding over- or under-provisioning.
- Smart Router: An LLM-aware router that directs requests across large GPU fleets to minimise costly recomputation of the KV cache for repeat or overlapping requests, freeing up resources for new incoming requests (see the routing sketch after this list).
- Low-Latency Communication Library: Supports state-of-the-art GPU-to-GPU communication, accelerating data transfer and reducing inference response time.
- Memory Manager: Intelligently offloads and reloads inference data to and from lower-cost memory and storage devices without impacting user experience.
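As a rough illustration of what LLM-aware routing can look like, the sketch below hashes each prompt’s leading tokens and prefers a worker that already holds a KV cache for that prefix, falling back to the least-loaded worker otherwise. The hashing scheme and Worker class are assumptions made for this sketch, not Dynamo’s implementation.

```python
# Hedged sketch in the spirit of Dynamo's Smart Router: steer requests
# toward workers that already cache a matching prompt prefix, so the
# prefill work is not redone. Illustrative only, not NVIDIA's code.
import hashlib
from typing import List, Set


def prefix_key(prompt: str, block_tokens: int = 64) -> str:
    """Hash the leading tokens of the prompt to identify shared prefixes."""
    head = " ".join(prompt.split()[:block_tokens])
    return hashlib.sha256(head.encode()).hexdigest()


class Worker:
    def __init__(self, name: str):
        self.name = name
        self.cached_prefixes: Set[str] = set()  # prefixes with a live KV cache
        self.load = 0  # in-flight requests

    def admit(self, key: str) -> None:
        self.cached_prefixes.add(key)
        self.load += 1


def route(prompt: str, workers: List[Worker]) -> Worker:
    """Prefer a worker with a KV cache hit; otherwise pick the least loaded."""
    key = prefix_key(prompt)
    hits = [w for w in workers if key in w.cached_prefixes]
    target = min(hits or workers, key=lambda w: w.load)
    target.admit(key)
    return target


workers = [Worker("gpu-0"), Worker("gpu-1")]
print(route("Summarise this quarterly report ...", workers).name)  # least loaded: gpu-0
print(route("Summarise this quarterly report ...", workers).name)  # cache hit: gpu-0 again
```

A production router would also have to track cache eviction and capacity, but even this simple policy shows how prefix-aware placement avoids redoing prefill work for overlapping requests.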
When running the DeepSeek-R1 model on a large cluster of GB200 NVL72 racks, Dynamo’s intelligent inference optimisations have been shown to increase the number of tokens generated per GPU by more than 30 times. This substantial improvement underscores Dynamo’s capability to boost throughput while reducing response times and model serving costs, providing an efficient solution for scaling test-time compute.
Global Adoption and Future Outlook
NVIDIA’s Blackwell Ultra and Dynamo have garnered support from leading technology companies. Major server manufacturers such as Cisco, Dell Technologies, Hewlett Packard Enterprise, Lenovo, and Supermicro are expected to deliver a range of servers based on Blackwell Ultra products. Additionally, cloud service providers including Amazon Web Services, Google Cloud, Microsoft Azure, and Oracle Cloud Infrastructure, along with GPU cloud providers like CoreWeave and Lambda, will be among the first to offer Blackwell Ultra-powered instances.
These developments signify NVIDIA’s commitment to advancing AI infrastructure and inference capabilities, addressing the growing computational demands of sophisticated AI models. As AI continues to evolve, platforms like Blackwell Ultra and frameworks like Dynamo will play pivotal roles in shaping the future of AI applications across various industries.