Cost remains a primary concern in enterprise AI usage, and it’s a challenge that AWS is tackling head-on.
At the AWS re:Invent 2024 conference today, the cloud giant announced HyperPod Task Governance, a sophisticated solution targeting one of the most expensive inefficiencies in enterprise AI operations: underutilized GPU resources.
According to AWS, HyperPod Task Governance can increase AI accelerator utilization, helping enterprises to optimize AI costs and producing potentially significant savings.
“This innovation helps you maximize compute resource utilization by automating the prioritization and management of these gen AI tasks, reducing the cost by up to 40%,” said Swami Sivasubramanian, VP of AI and Data at AWS.
End GPU idle time
As organizations rapidly scale their AI initiatives, many are discovering a costly paradox. Despite heavy investments in GPU infrastructure to power various AI workloads, including training, fine-tuning and inference, these expensive computing resources frequently sit idle.
Enterprise leaders report surprisingly low utilization rates across their AI projects, even as teams compete for computing resources. As it turns out, it’s actually a challenge that AWS itself faced.
“Internally, we had this kind of problem as we were scaling up more than a year ago, and we built a system that takes into account the consumption needs of these accelerators,” Sivasubramanian told VentureBeat. “I talked to many of our customers, CIOs and CEOs, they said we want exactly that; we want it as part of Sagemaker and that’s what we are launching.”
Sivasubramanian said that once the system was deployed, AWS’ AI accelerator utilization rose dramatically, with utilization rates climbing to over 90%.
How HyperPod Task Governance works
The SageMaker HyperPod technology was first announced at the re:Invent 2023 conference.
SageMaker HyperPod is built to handle the complexity of training large models with billions or tens of billions of parameters, which requires managing large clusters of machine learning accelerators.
HyperPod Task Governance adds a new layer of control to SageMaker HyperPod by introducing intelligent resource allocation across different AI workloads.
The system recognizes that different AI tasks have varying demand patterns throughout the day. For instance, inference workloads typically peak during business hours when applications see the most use, while training and experimentation can be scheduled during off-peak hours.
The system provides enterprises with real-time insights into project utilization, team resource consumption, and compute needs. It enables organizations to effectively load balance their GPU resources across different teams and projects, ensuring that expensive AI infrastructure never sits idle.
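The core idea described above, giving latency-sensitive inference first claim on accelerators and backfilling the remainder with training or experimentation, can be illustrated with a toy sketch. This is not AWS’s implementation and the function and numbers are purely hypothetical; it only shows the demand-aware allocation pattern the article describes.

```python
# Illustrative sketch only (not AWS's HyperPod implementation): reallocate a
# fixed pool of accelerators between inference and training based on current
# inference demand, so idle capacity is never left unused.

def allocate(total_gpus: int, inference_demand: int) -> dict:
    """Serve inference demand first; backfill leftover GPUs with training."""
    inference = min(inference_demand, total_gpus)
    training = total_gpus - inference  # idle capacity becomes training capacity
    return {"inference": inference, "training": training}

# Daytime: heavy inference demand leaves little room for training.
print(allocate(1000, 900))   # {'inference': 900, 'training': 100}
# Overnight: low demand frees most of the cluster for training jobs.
print(allocate(1000, 150))   # {'inference': 150, 'training': 850}
```

In a real governance system the reallocation would be driven by live demand telemetry and team-level priorities rather than a single demand number, but the day/night swing is the same mechanism.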
AWS wants to make sure enterprises don’t leave money on the table
Sivasubramanian highlighted the critical importance of AI cost management during his keynote address.
As an example, he said that if an organization has a thousand AI accelerators deployed, not all of them are utilized consistently over a 24-hour period. During the day, they are heavily used for inference, but at night, a large portion of these costly resources sit idle when inference demand might be very low.
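To make the scale of that waste concrete, here is a back-of-the-envelope calculation. Every number (the hourly rate, the day/night split, the utilization figures) is an assumption for illustration, not a figure from AWS.

```python
# Hypothetical numbers (not from AWS): estimate what idle accelerators cost
# when a 1,000-GPU fleet runs near-full during the day but mostly idles at night.

HOURLY_RATE = 2.0         # assumed $/GPU-hour
FLEET = 1000              # accelerators in the example fleet

day_utilization = 0.90    # 12 daytime hours, heavy inference load
night_utilization = 0.20  # 12 overnight hours, little inference demand

idle_gpu_hours = (FLEET * 12 * (1 - day_utilization)
                  + FLEET * 12 * (1 - night_utilization))
daily_idle_cost = idle_gpu_hours * HOURLY_RATE

print(idle_gpu_hours)   # 10800.0 idle GPU-hours per day
print(daily_idle_cost)  # 21600.0 dollars of paid-for idle capacity daily
```

Under these assumed numbers, nearly half the fleet’s paid capacity goes unused each day, which is the gap that demand-aware scheduling aims to close.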
“We live in a world where compute resources are finite and expensive and it can be difficult to maximize utilization and efficiently allocate resources, which is typically done through spreadsheets and calendars,” he said. “Now, without a strategic approach to resource allocation, you’re not only missing opportunities, but you’re also leaving money on the table.”