Nvidia and Microsoft accelerate AI processing on PCs

May 19, 2025

Nvidia and Microsoft announced work to accelerate the performance of AI processing on Nvidia RTX-based AI PCs.

Generative AI is transforming PC software into breakthrough experiences — from digital humans to writing assistants, intelligent agents and creative tools.

Nvidia RTX AI PCs are powering this transformation with technology that makes it simpler to get started experimenting with generative AI and unlocks greater performance on Windows 11.

TensorRT for RTX AI PCs

TensorRT has been reimagined for RTX AI PCs, combining industry-leading TensorRT performance with just-in-time, on-device engine building and an 8x smaller package size for fast AI deployment to the more than 100 million RTX AI PCs.

Announced at Microsoft Build, TensorRT for RTX is natively supported by Windows ML — a new inference stack that provides app developers with both broad hardware compatibility and state-of-the-art performance.

Gerardo Delgado, director of product for AI PC at Nvidia, said in a press briefing that the AI PCs start with Nvidia’s RTX hardware, CUDA programming and an array of AI models. He noted that at a high level, an AI model is basically a set of mathematical operations along with a way to run them. And the combination of operations and how to run them is what is normally known as a graph in machine learning.

He added, “Our GPUs are going to execute these operations with Tensor cores. But Tensor cores change from generation to generation. We have been improving them over time, and then within a generation of GPUs, you also have different Tensor core counts depending on the SKU. Being able to match the right Tensor core to each mathematical operation is the key to achieving performance. So TensorRT does this in a two-step approach.”
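Delgado's description of a model as "a set of mathematical operations along with a way to run them" can be made concrete with a toy sketch. This is not Nvidia code; it just illustrates the graph idea, with each node naming an operation and a dispatch table playing the role of kernel selection:

```python
# Toy illustration of "model = operations + how to run them" (a graph).
# The op names, kernels, and graph format are invented for illustration.

graph = [
    {"op": "scale", "inputs": ["x"], "out": "h", "factor": 2.0},
    {"op": "relu",  "inputs": ["h"], "out": "y"},
]

# Stand-in for picking the right kernel (Tensor core path) per operation.
kernels = {
    "scale": lambda vec, node: [v * node["factor"] for v in vec],
    "relu":  lambda vec, node: [max(0.0, v) for v in vec],
}

def run(graph, feeds):
    env = dict(feeds)
    for node in graph:
        (inp,) = node["inputs"]
        env[node["out"]] = kernels[node["op"]](env[inp], node)
    return env

result = run(graph, {"x": [-1.0, 3.0]})
print(result["y"])  # relu(2 * x)
```

In a real runtime the "kernel" chosen for each node would differ per GPU generation, which is exactly the matching problem TensorRT solves.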

First, Nvidia has to optimize the AI model: it quantizes the model, reducing the precision of parts of the model or some of its layers. Once the model is optimized, TensorRT consumes it and prepares a plan with a pre-selection of kernels.
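The first step, quantization, means storing weights at lower precision. The sketch below shows generic symmetric int8 quantization — a common scheme, but not necessarily the one TensorRT uses internally:

```python
# Hedged sketch of weight quantization: map float weights to int8 plus one
# shared scale factor. Generic technique, not Nvidia's actual quantizer.

def quantize_int8(weights):
    """Symmetric quantization: q = round(w / scale), scale from the max |w|."""
    scale = max(abs(w) for w in weights) / 127.0
    q = [round(w / scale) for w in weights]
    return q, scale

def dequantize(q, scale):
    return [v * scale for v in q]

weights = [0.01, -0.5, 0.25, 1.0]
q, scale = quantize_int8(weights)
approx = dequantize(q, scale)  # each value is recovered to within ~scale/2
print(q)
```

Each weight now occupies one byte instead of four, which is where the memory and bandwidth savings that speed up inference come from.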

Compared to a standard way of running AI on Windows, Nvidia can achieve about 1.6 times the performance on average.

A new version, TensorRT for RTX, improves on this experience. Designed specifically for RTX AI PCs, it provides the same TensorRT performance, but instead of requiring developers to pre-generate TensorRT engines per GPU, it focuses on optimizing the model and ships a generic TensorRT engine.

“Then once the application is installed, TensorRT for RTX will generate the right TensorRT engine for your specific GPU in just seconds. This greatly simplifies the developer workflow,” he said.
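The pattern Delgado describes — build a GPU-specific engine once, on first use, then reuse it — is a classic just-in-time cache. A minimal sketch, in which the GPU probe and the build step are stand-ins rather than the real TensorRT for RTX API:

```python
# Hedged sketch of JIT, on-device engine building with a cache.
# detect_gpu() and the engine string are hypothetical placeholders.

def detect_gpu():
    return "RTX-5090"  # stand-in for a driver/hardware query

_engine_cache = {}

def get_engine(model_id):
    gpu = detect_gpu()
    key = (model_id, gpu)
    if key not in _engine_cache:
        # Stand-in for the seconds-long, GPU-specific kernel selection/build.
        _engine_cache[key] = f"engine[{model_id}@{gpu}]"
    return _engine_cache[key]

e1 = get_engine("flux.1-schnell")
e2 = get_engine("flux.1-schnell")  # cache hit: same engine, no rebuild
print(e1, e1 is e2)
```

The payoff is that the app ships one generic artifact, yet every user ends up with an engine tuned to their exact GPU.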

Among the results are a reduction in library size, better performance for video generation, and better-quality livestreams, Delgado said.

Nvidia SDKs make it easier for app developers to integrate AI features and accelerate their apps on GeForce RTX GPUs. This month top software applications from Autodesk, Bilibili, Chaos, LM Studio and Topaz are releasing updates to unlock RTX AI features and acceleration.

AI enthusiasts and developers can easily get started with AI using Nvidia NIM, pre-packaged, optimized AI models that run in popular apps like AnythingLLM, Microsoft VS Code and ComfyUI. The FLUX.1-schnell image generation model is now available as a NIM, and the popular FLUX.1-dev NIM has been updated to support more RTX GPUs.

For a no-code option to dive into AI development, Project G-Assist — the RTX PC AI assistant in the Nvidia app — has enabled a simple way to build plug-ins to create assistant workflows. New community plug-ins are now available including Google Gemini web search, Spotify, Twitch, IFTTT and SignalRGB.

Accelerated AI inference with TensorRT for RTX

Today’s AI PC software stack requires developers to choose between frameworks that have broad hardware support but lower performance, or optimized paths that only cover certain hardware or model types and require the developer to maintain multiple paths.

The new Windows ML inference framework was built to solve these challenges. Windows ML is built on top of ONNX Runtime and seamlessly connects to an optimized AI execution layer provided and maintained by each hardware manufacturer. For GeForce RTX GPUs, Windows ML automatically uses TensorRT for RTX — an inference library optimized for high performance and rapid deployment. Compared to DirectML, TensorRT delivers over 50% faster performance for AI workloads on PCs.

Windows ML also delivers quality of life benefits for the developer. It can automatically select the right hardware to run each AI feature, and download the execution provider for that hardware, removing the need to package those files into their app. This allows Nvidia to provide the latest TensorRT performance optimizations to users as soon as they are ready. And because it’s built on ONNX Runtime, Windows ML works with any ONNX model.
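The automatic hardware selection described above amounts to a preference-ordered fallback chain: try the most specific execution provider first, then fall back to a generic one. A minimal sketch — the provider names echo the article's terminology, but this is not Windows ML code:

```python
# Hedged sketch of execution-provider selection with fallback.
# AVAILABLE is a hypothetical probe of what this machine supports.

AVAILABLE = {"TensorRTforRTX", "CPU"}

# Most-preferred first; Windows ML would populate this per vendor.
PREFERENCE = ["TensorRTforRTX", "DirectML", "CPU"]

def pick_provider():
    for name in PREFERENCE:
        if name in AVAILABLE:
            return name
    raise RuntimeError("no execution provider available")

print(pick_provider())  # on an RTX machine, the TensorRT path wins
```

Because the chain always terminates at a universally available provider, any ONNX model runs somewhere, and RTX owners transparently get the fastest path.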

To further enhance the experience for developers, TensorRT has been reimagined for RTX. Instead of having to pre-generate TensorRT engines and package them with the app, TensorRT for RTX uses just-in-time, on-device engine building to optimize how the AI model is run for the user’s specific RTX GPU in mere seconds. And the library has been streamlined, reducing its file size by a massive eight times. TensorRT for RTX is available to developers through the Windows ML preview today, and will be available directly as a standalone SDK at Nvidia Developer, targeting a June release.

Developers can learn more in Nvidia’s Microsoft Build Developer Blog, the TensorRT for RTX launch blog, and Microsoft’s Windows ML blog.

Expanding the AI ecosystem on Windows PCs

Developers looking to add AI features or boost app performance can tap into a broad range of Nvidia SDKs. These include CUDA and TensorRT for GPU acceleration; DLSS and OptiX for 3D graphics; RTX Video and Maxine for multimedia; and Riva, Nemotron or ACE for generative AI.

Top applications are releasing updates this month to enable unique Nvidia features using these SDKs. Topaz is releasing a generative AI video model, accelerated by CUDA, to enhance video quality. Chaos Enscape and Autodesk VRED are adding DLSS 4 for faster performance and better image quality. Bilibili is integrating Nvidia Broadcast features, enabling streamers to activate Nvidia Virtual Background directly within Bilibili Livehime to enhance the quality of livestreams.

Local AI made easy with NIM Microservices and AI blueprints

Getting started with developing AI on PCs can be daunting. AI developers and enthusiasts have to select a model from the more than 1.2 million on Hugging Face, quantize it into a format that runs well on PC, find and install all the dependencies to run it, and more. Nvidia NIM makes it easy to get started by providing a curated list of AI models, pre-packaged with all the files needed to run them, and optimized to achieve full performance on RTX GPUs. And as containerized microservices, the same NIM can be run seamlessly across PC or cloud.

A NIM is a package — a generative AI model that’s been prepackaged with everything you need to run it.

It’s already optimized with TensorRT for RTX GPUs, and it comes with an easy-to-use, OpenAI API-compatible interface, which makes it work with all of the top AI applications users rely on today.
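In practice, "OpenAI API-compatible" means a client can send a standard chat-completions request body to a local NIM endpoint. The sketch below only builds such a payload; the URL and model name are illustrative, and no request is actually sent:

```python
# Hedged sketch of an OpenAI-style chat-completions payload for a local,
# OpenAI-compatible endpoint. Endpoint URL and model id are hypothetical.

import json

endpoint = "http://localhost:8000/v1/chat/completions"  # illustrative
payload = {
    "model": "flux-or-llm-nim",  # placeholder model id
    "messages": [
        {"role": "user", "content": "Hello from an RTX AI PC"},
    ],
    "max_tokens": 64,
}

body = json.dumps(payload)
print(body)
```

Because the payload shape is the same one existing AI apps already emit, pointing them at a local NIM is typically just a base-URL change.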

At Computex, Nvidia is releasing the FLUX.1-schnell NIM — an image generation model from Black Forest Labs for fast image generation — and updating the FLUX.1-dev NIM to add compatibility for a wide range of GeForce RTX 50 and 40 Series GPUs. These NIMs enable faster performance with TensorRT, plus additional performance thanks to quantized models. On Blackwell GPUs, these run over twice as fast as running them natively, thanks to FP4 and RTX optimizations.

AI developers can also jumpstart their work with Nvidia AI Blueprints — sample workflows and projects using NIM.

Last month Nvidia released the 3D Guided Generative AI Blueprint, a powerful way to control composition and camera angles of generated images by using a 3D scene as a reference. Developers can modify the open source blueprint for their needs or extend it with additional functionality.

New Project G-Assist plug-ins and sample projects now available

Nvidia recently released Project G-Assist as an experimental AI assistant integrated into the Nvidia app. G-Assist enables users to control their GeForce RTX system using simple voice and text commands, offering a more convenient interface compared to manual controls spread across multiple legacy control panels.

Developers can also use Project G-Assist to easily build plug-ins, test assistant use cases and publish them through Nvidia’s Discord and GitHub.

To make it easier to get started creating plug-ins, Nvidia has made available the easy-to-use Plug-in Builder — a ChatGPT-based app that allows no-code/low-code development with natural language commands. These lightweight, community-driven add-ons leverage straightforward JSON definitions and Python logic.
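The "JSON definition plus Python logic" shape can be sketched as a manifest that declares functions and a Python dispatcher that implements them. The manifest fields and handler signature here are invented for illustration; the real G-Assist plug-in schema may differ:

```python
# Hedged sketch of a plug-in: a JSON manifest declaring capabilities,
# plus Python handlers. Field names are hypothetical, not the real schema.

import json

manifest = json.loads("""
{
  "name": "hello-plugin",
  "description": "Replies to a greeting command",
  "functions": [{"name": "say_hello", "params": ["user"]}]
}
""")

def say_hello(user):
    return f"Hello, {user}!"

HANDLERS = {"say_hello": say_hello}

def dispatch(function_name, **kwargs):
    # Only functions declared in the manifest may be invoked.
    declared = {f["name"] for f in manifest["functions"]}
    if function_name not in declared:
        raise ValueError(f"unknown function: {function_name}")
    return HANDLERS[function_name](**kwargs)

print(dispatch("say_hello", user="RTX"))
```

Keeping the declaration in JSON lets an assistant (or a no-code builder) discover a plug-in's capabilities without importing its Python code.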

New open-source samples are available now on GitHub, showcasing diverse ways on-device AI can enhance PC and gaming workflows.

● Gemini: The existing Gemini plug-in that uses Google’s cloud-based free-to-use LLM has been updated to include real-time web search capabilities.

● IFTTT: Enable automations from the hundreds of endpoints that work with IFTTT, such as IoT and home automation systems, enabling routines spanning digital setups and physical surroundings.

● Discord: Easily share game highlights, or messages directly to Discord servers without disrupting gameplay.

Explore the GitHub repository for additional examples — including hands-free music control via Spotify, livestream status checks with Twitch, and more.


Companies are also adopting AI as the new PC interface. For example, SignalRGB is developing a G-Assist plugin that enables unified lighting control across multiple manufacturers. SignalRGB users will soon be able to install this plug-in directly from the SignalRGB app.

Enthusiasts interested in developing and experimenting with Project G-Assist plug-ins are invited to join the Nvidia Developer Discord channel to collaborate, share creations and receive support during development.

Each week, the RTX AI Garage blog series features community-driven AI innovations and content for those looking to learn more about NIM microservices and AI Blueprints, as well as building AI agents, creative workflows, digital humans, productivity apps and more on AI PCs and workstations.

The post Nvidia and Microsoft accelerate AI processing on PCs appeared first on VentureBeat.
