GTC 2023: Nvidia shares how Rapids can future-proof Apache Spark

March 23, 2023
Following the initial rise of Hadoop, data teams across industries have adopted Apache Spark as the go-to framework for distributed big data processing. The open-source platform has largely replaced Hadoop’s MapReduce by enabling faster in-memory processing of datasets and by handling use cases that Hadoop could not manage. Spark also offers more accessible APIs and is backed by adequate fault tolerance.

However, with the amount of data in the world predicted to grow to 221 zettabytes by 2026, it’s difficult for organizations to get a grip on the information they have. At current processing speeds, companies will face latencies in business applications like analytics. And if they move to increase speeds, the costs rise.

That’s why teams should look at the option of accelerating Spark with GPUs, via Rapids, said Sameer Raheja, senior director of engineering at Nvidia, at the ongoing GTC 2023 conference. 


GPU-accelerated Apache Spark 

To handle future data demands with Spark, Raheja suggested running the framework with Nvidia GPUs. A plugin jar like Rapids Accelerator for Apache Spark, he said, can allow Spark batch processing to run on GPUs without any code changes.

This, he said, will not only enable teams to run massive data jobs faster and at a lower cost than is possible with CPUs, but will also drive power savings.
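As a rough illustration of what "without any code changes" can look like in practice, the sketch below assembles a spark-submit command that loads the plugin purely through configuration. The settings `spark.plugins=com.nvidia.spark.SQLPlugin` and `spark.rapids.sql.enabled=true` are the plugin's documented switches; the jar path, the application name and the helper function are hypothetical and only serve the example.

```python
import shlex

# Hypothetical location: the real jar name depends on the plugin release you download.
RAPIDS_JAR = "/opt/sparkRapidsPlugin/rapids-4-spark.jar"


def rapids_submit_command(app_path, jar_path=RAPIDS_JAR):
    """Build a spark-submit argument list that loads the Rapids Accelerator plugin.

    The application itself (app_path) is passed through untouched: GPU
    acceleration is switched on by configuration only.
    """
    conf = {
        # Registers the Rapids Accelerator as a Spark plugin.
        "spark.plugins": "com.nvidia.spark.SQLPlugin",
        # Turns GPU execution of supported SQL/DataFrame operations on.
        "spark.rapids.sql.enabled": "true",
    }
    args = ["spark-submit", "--jars", jar_path]
    for key, value in conf.items():
        args += ["--conf", f"{key}={value}"]
    args.append(app_path)
    return args


if __name__ == "__main__":
    # "etl_job.py" is a placeholder for an existing, unmodified Spark application.
    print(shlex.join(rapids_submit_command("etl_job.py")))
```

Setting `spark.rapids.sql.enabled` back to `false` would return the same job to CPU execution, which makes side-by-side comparisons straightforward.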

Rapids Accelerator for Apache Spark combines the power of the Rapids cuDF library and the scale of the Spark distributed computing framework. The Rapids Accelerator library also has a built-in accelerated shuffle based on UCX that can be configured to leverage GPU-to-GPU communication and remote direct memory access capabilities.

Using the Nvidia decision support benchmark (an adaptation of the industry-standard TPC-DS benchmark, with 100 modified queries), the company compared a Rapids-based, GPU-accelerated Google Cloud Dataproc Spark distribution with one based on CPUs. The GPU nodes completed a power run of all 100 queries in just 31 minutes, versus 176 minutes for the CPU nodes.

Because the GPU run took less time, it also proved more affordable than the CPU run, costing just $7.20 compared with $32.52. The GPU run was also five times more power-efficient.
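The reported figures can be turned into ratios with a few lines of arithmetic. The sketch below uses only the runtimes and costs quoted above; the implied hourly rates are a derived estimate that assumes cost scales linearly with runtime, which the talk did not state.

```python
# Figures reported for the 100-query power run.
GPU_MINUTES, CPU_MINUTES = 31, 176
GPU_COST_USD, CPU_COST_USD = 7.20, 32.52


def summarize_benchmark():
    """Derive speedup, cost ratio and savings from the reported numbers."""
    return {
        # How many times faster the GPU run finished.
        "speedup": CPU_MINUTES / GPU_MINUTES,
        # How many times more the CPU run cost.
        "cost_ratio": CPU_COST_USD / GPU_COST_USD,
        # Cost savings of the GPU run relative to the CPU run, in percent.
        "savings_pct": (1 - GPU_COST_USD / CPU_COST_USD) * 100,
        # ASSUMPTION: cost = hourly rate x runtime, so rate = cost / hours.
        "gpu_usd_per_hour": GPU_COST_USD / (GPU_MINUTES / 60),
        "cpu_usd_per_hour": CPU_COST_USD / (CPU_MINUTES / 60),
    }


if __name__ == "__main__":
    for name, value in summarize_benchmark().items():
        print(f"{name}: {value:.2f}")
```

Under that linear-cost assumption, the GPU cluster works out to roughly $13.94 per hour against about $11.09 per hour for the CPU cluster, so the savings of about 78% come from the roughly 5.7x shorter runtime rather than from cheaper hardware.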

“For anyone who’s running big data workloads and managing a budget … performance, cost and efficiency are key factors, and Rapids Accelerator for Spark addresses all three,” Raheja emphasized.

He added that similar benchmark results were observed on other clouds and Spark distributions with configurations closely matching that of Dataproc. For example, a Rapids-accelerated AWS EMR distribution saw 42% cost savings, while AWS Databricks Photon and Azure Databricks Photon delivered 39% and 34% cost savings, respectively.

How it works

The key to these benefits is Apache Spark 3, which brings columnar processing and resource-aware scheduling of custom resources. This allows teams to schedule tasks on accelerator resources like GPUs.
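To make resource-aware scheduling more concrete, the sketch below estimates how many tasks one executor can run at once when both CPU cores and GPUs are declared as resources. The four Spark 3 settings named in the comments are real configuration keys; the helper function and the sample values are illustrative assumptions, not figures from the talk.

```python
from fractions import Fraction


def concurrent_tasks_per_executor(executor_cores, task_cpus,
                                  executor_gpus, task_gpu_amount):
    """Estimate task slots per executor under Spark 3 resource scheduling.

    Parameters mirror these Spark settings:
      executor_cores   -> spark.executor.cores
      task_cpus        -> spark.task.cpus
      executor_gpus    -> spark.executor.resource.gpu.amount
      task_gpu_amount  -> spark.task.resource.gpu.amount (may be fractional)
    """
    cpu_slots = executor_cores // task_cpus
    # Fractions avoid floating-point surprises with amounts such as 0.25.
    gpu_slots = int(Fraction(str(executor_gpus)) / Fraction(str(task_gpu_amount)))
    # The scarcer resource limits how many tasks run concurrently.
    return min(cpu_slots, gpu_slots)


if __name__ == "__main__":
    # Hypothetical executor: 8 cores, 1 GPU, each task asking for a quarter GPU.
    print(concurrent_tasks_per_executor(8, 1, 1, 0.25))
```

With one GPU per executor and a task GPU amount of 1, only a single task would run at a time regardless of core count, which is why fractional task amounts are commonly used to let several tasks share a GPU.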

“You can continue to write your application in the APIs you’re familiar with — SQL, Python, R, Java and Scala. Spark provides distributed and scale-up compute power; Spark 3.x provides resource-aware scheduling; and the Rapids Accelerator for Apache Spark plugin provides transparency for applications to run on Nvidia GPUs, enabling acceleration in cooperation with [the] Spark core engine’s built-in processor,” Raheja said.

Currently, the Rapids Spark accelerator is available on and built into Amazon EMR, Cloudera CDP, Databricks ML runtime, Azure Synapse Analytics, Google Cloud Dataproc, and open-source Apache Spark 3.x distributions, either on-premises or in the cloud.

The 2023 Nvidia GTC event runs through March 23.

The post GTC 2023: Nvidia shares how Rapids can future-proof Apache Spark appeared first on VentureBeat.
