Today at re:Invent 2024, its annual flagship conference, Amazon Web Services (AWS) announced the next generation of its cloud-based machine learning (ML) development platform, SageMaker, transforming it into a unified hub that lets enterprises bring together not only all their data assets (spanning different data lakes and sources in a lakehouse architecture) but also a comprehensive set of AWS analytics services and formerly disparate ML tools.
In other words, SageMaker will no longer just be a place to build AI and machine learning apps; now you can link your data and derive analytics from it, too.
The move responds to a broader convergence of analytics and AI: enterprises increasingly use the same data in interconnected ways, from powering historical analytics to training ML models and building generative AI applications for different use cases.
“Many customers already use combinations of our purpose-built analytics and ML tools (in isolation), such as Amazon SageMaker—the de facto standard for working with data and building ML models—Amazon EMR, Amazon Redshift, Amazon S3 data lakes and AWS Glue. The next generation of SageMaker brings together these capabilities—along with some exciting new features—to give customers all the tools they need for data processing, SQL analytics, ML model development and training, and generative AI, directly within SageMaker,” Swami Sivasubramanian, the vice president of Data and AI at AWS, said in a statement.
SageMaker Unified Studio and Lakehouse at the heart
Amazon SageMaker has long been a critical tool for developers and data scientists, providing them with a fully managed service to deploy production-grade ML models.
The platform’s integrated development environment, SageMaker Studio, gives teams a single, web-based visual interface to perform all machine learning development steps, from data preparation and model building to training, tuning and deployment.
However, as enterprise needs evolve, AWS has realized that restricting SageMaker to ML development alone no longer makes sense. Enterprises also need purpose-built analytics services (supporting workloads like SQL analytics, search analytics, big data processing and streaming analytics) alongside SageMaker’s existing ML capabilities, plus easy access to all their data, to drive insights and power new experiences for their downstream users.
Two new capabilities: SageMaker Lakehouse and Unified Studio
To bridge this gap, the company has now upgraded SageMaker with two key capabilities: Amazon SageMaker Lakehouse and Unified Studio.
The lakehouse offering, as the company explains, provides unified access to all data stored in data lakes built on Amazon Simple Storage Service (S3), Amazon Redshift data warehouses and other federated data sources, breaking down silos and making the data easily queryable regardless of where it is originally stored.
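To make the idea concrete, here is a hypothetical sketch of the kind of federated SQL query a lakehouse enables: joining a table that lives in an S3 data lake with one in a Redshift warehouse in a single statement. The catalog, schema and table names below are purely illustrative, not real identifiers from the announcement.

```python
# Hypothetical sketch (illustrative names, not real AWS identifiers): with a
# lakehouse, one SQL statement can join a table living in an S3 data lake
# with a table living in a Redshift warehouse, with no ETL copy in between.
FEDERATED_QUERY = """
SELECT c.customer_id,
       c.region,
       SUM(o.order_total) AS lifetime_value
FROM   s3_lake.sales.orders      AS o  -- table in the S3 data lake
JOIN   redshift_dw.crm.customers AS c  -- table in the Redshift warehouse
    ON o.customer_id = c.customer_id
GROUP  BY c.customer_id, c.region;
"""

# Both sources are addressed through one catalog namespace; the query engine,
# not a data pipeline, resolves where each table physically lives.
print(FEDERATED_QUERY.strip())
```

Without unified access, the same analysis would first require an ETL job to copy data from one system into the other.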
“Today, more than one million data lakes are built on Amazon Simple Storage Service… allowing customers to centralize their data assets and derive value with AWS analytics, AI, and ML tools… Customers may have data spread across multiple data lakes, as well as a data warehouse, and would benefit from a simple way to unify all of this data,” the company noted in a press release.
Once all the data is unified with the lakehouse offering, enterprises can access it and put it to work with the other key capability — SageMaker Unified Studio.
At its core, the studio acts as a unified environment that strings together all existing AI and analytics capabilities from Amazon’s standalone studios, query editors and visual tools, spanning Amazon Bedrock, Amazon EMR, Amazon Redshift, AWS Glue and the existing SageMaker Studio.
This avoids the time-consuming hassle of using separate tools in isolation and gives users one place to discover and prepare their data, author queries or code, process the data and build ML models. They can even pull up the Amazon Q Developer assistant and ask it to handle tasks like data integration, discovery, coding or SQL generation, all in the same environment.
So, in a nutshell, users get one place with all their data and all their analytics and ML tools to power downstream applications, ranging from data engineering, SQL analytics and ad-hoc querying to data science, ML and generative AI.
Bedrock in SageMaker
For instance, with Bedrock capabilities available in Unified Studio, users can connect their preferred high-performing foundation models and tools like Agents, Guardrails and Knowledge Bases to their lakehouse data assets to quickly build and deploy gen AI applications.
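As a rough illustration of what connecting a foundation model looks like in practice, invoking a model through Bedrock’s runtime API comes down to sending a JSON payload to a chosen model ID. The sketch below assembles such a request; the model ID and the prompt are illustrative, and the field names follow Anthropic’s published Bedrock message format.

```python
import json

def build_invoke_request(prompt: str, max_tokens: int = 256) -> dict:
    """Assemble an InvokeModel request for an Anthropic model on Bedrock.

    The model ID is illustrative; any Bedrock-hosted model could be used.
    """
    return {
        "modelId": "anthropic.claude-3-haiku-20240307-v1:0",
        "contentType": "application/json",
        "body": json.dumps({
            "anthropic_version": "bedrock-2023-05-31",
            "max_tokens": max_tokens,
            "messages": [{"role": "user", "content": prompt}],
        }),
    }

request = build_invoke_request("Summarize last quarter's sales from the lakehouse.")
# In a live AWS account, this request would be sent with:
#   boto3.client("bedrock-runtime").invoke_model(**request)
print(request["modelId"])
```

The point of the sketch is that once the lakehouse exposes the data, the gen AI step is just another API call made from the same environment.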
Once projects are complete, the lakehouse and studio offerings also allow teams to publish and share their data, models, applications and other artifacts with team members, while maintaining consistent access policies through a single permission model with granular security controls. This accelerates the discoverability and reuse of resources, preventing duplication of effort.
Compatible with open standards
Notably, SageMaker Lakehouse is compatible with the Apache Iceberg open table format, meaning it works with familiar AI and ML tools and any query engine that supports the Iceberg standard. It also includes zero-ETL integrations for Amazon Aurora MySQL and PostgreSQL, Amazon RDS for MySQL, and Amazon DynamoDB with Amazon Redshift, as well as SaaS applications like Zendesk and SAP.
“SageMaker offerings underscore AWS’ strategy of exposing its advanced, comprehensive capabilities in a governed and unified way, so it is quick to build, test and consume ML and AI workloads. AWS pioneered the term Zero-ETL, and it has now become a standard in the industry. It is exciting to see that Zero-ETL has gone beyond databases and into apps. With governance control and support for both structured and unstructured data, data scientists can now easily build ML applications,” industry analyst Sanjeev Mohan told VentureBeat.
New SageMaker is now available
The new SageMaker is available to AWS customers starting today. However, Unified Studio remains in preview; AWS has not shared a specific timeline but said it expects the studio to become generally available soon.
Companies like Roche and NatWest Group will be among the first users of the new capabilities; the latter anticipates that Unified Studio will cut the time its data users need to access analytics and AI capabilities by 50%. Roche, meanwhile, expects a 40% reduction in data processing time with SageMaker Lakehouse.
AWS re:Invent runs from December 2 to 6, 2024.
The post AWS SageMaker is transforming into a combined data and AI hub appeared first on VentureBeat.