Real-Time Insights by Leveraging Spark with Aerospike
Aerospike Spark Connector
Zohar Elkayam, Aerospike
2 Proprietary & Confidential | All rights reserved. © 2020 Aerospike Inc.
Agenda
▪ Where the Aerospike Spark Connector sits in the ecosystem
▪ A quick overview of the Aerospike Spark Connector
▪ Some code examples
▪ Scaling up: a customer story
Aerospike Simplifies Real-time Architecture at any Scale
[Architecture diagram. Labels: Data Warehouse (Legacy RDBMS), Data Lake (HDFS Based), Aerospike Database, SoE Locations 1-3, XDR, Transactional Systems, Enterprise Environment, Legacy Database (Mainframe), RDBMS Database, Real-time System of Record, Query & Reporting Store]
Delivering Extreme Scalability:
✓ Simplicity
✓ Maintainability
✓ Durability
✓ Strong Consistency
✓ Scalability
✓ Low Cost ($)
✓ Less Data Drag
Aerospike Connect for Spark
Example Use Cases
✓ Fraud prevention: transaction data arrives via streaming and must be analyzed against historical data in real time
✓ Recommendation engines: real-time recommendations and targeting based on user behavior
✓ Ad tech: ad fraud detection and real-time retargeting based on user behavior
✓ Digital identity management
✓ Industrial Internet of Things (IIoT): real-time, closed-loop business decisions
Aerospike Connect for Spark
• Spark connector for Aerospike: load the data and use it as a DataFrame (i.e. Spark SQL) or as streamed data
• Supports Scala (spark-shell) for all Aerospike Spark operations
• Supports Python (pyspark) for some operations; Dataset operations are not supported
• Guide: https://www.aerospike.com/docs/connectors/enterprise/spark/index.html
Aerospike Spark Operations
• Use Spark SQL to fetch data from Aerospike
  • Aerospike Connect for Spark lets you use Spark SQL to query records from an Aerospike cluster.
• Load Aerospike data into Spark for processing
  • Load data from Aerospike into DataFrames for processing.
  • The connector supports scans and queries (secondary indexes).
• Save data from a DataFrame back into Aerospike
  • A DataFrame can be saved to Aerospike by specifying one of its columns as the Primary Key or the Digest.
• Join data using Aerospike [Scala only]
  • Provides an AeroJoin function that reads records from Aerospike given a Dataset containing the keys of the records of interest.
  • This operation takes advantage of Aerospike's batch read functionality.
Aerospike Spark Example: Spark SQL
Save DataFrame to Aerospike (by Key, with schema)
Aerospike Spark Example: AeroJoin
Customer Story: Is Scaling an Issue?
• Spark partitions data for workers, supervised by an executor (one per Spark node)
• An Aerospike scan (pre-4.9) scans data by Aerospike node (one per Aerospike node)
• That means there is a mismatch in parallelism between the number of cores on the Spark side and the number of nodes on the Aerospike side
Aerospike Partitions: Even Data Distribution
Data is distributed evenly across nodes in a cluster using the Aerospike Smart Partitions™ algorithm.
▪ Automatic sharding
▪ 4096 data partitions
▪ Even distribution of:
  ▪ Partitions across nodes
  ▪ Records across partitions
  ▪ Data across Flash devices
▪ Primary and replica partitions
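To make the "even distribution" point concrete, here is a minimal Python sketch that divides the 4096 partitions into near-equal contiguous ranges among a number of owners. It is only an illustration of the arithmetic: `split_evenly` is a hypothetical helper, and contiguous ranges are an assumption, not Aerospike's actual partition-assignment algorithm. The owner count of 33 is taken from the customer story later in the deck.

```python
from collections import Counter

TOTAL_PARTITIONS = 4096  # fixed partition count quoted on the slide


def split_evenly(num_owners, total=TOTAL_PARTITIONS):
    """Split `total` partitions into contiguous (begin, count) ranges.

    Range sizes differ by at most one. This is an illustrative scheme,
    not the algorithm Aerospike uses internally.
    """
    if num_owners < 1:
        raise ValueError("num_owners must be at least 1")
    num_owners = min(num_owners, total)  # no owner gets an empty range
    base, extra = divmod(total, num_owners)
    ranges = []
    begin = 0
    for i in range(num_owners):
        # The first `extra` owners take one additional partition.
        count = base + (1 if i < extra else 0)
        ranges.append((begin, count))
        begin += count
    return ranges


ranges = split_evenly(33)
sizes = Counter(count for _, count in ranges)
print(sizes)  # 4 owners hold 125 partitions, 29 owners hold 124
```

The same split could serve equally well for handing partition ranges to parallel scan tasks, which is why a fixed, fine-grained partition count is convenient for parallel readers.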
Customer Story: Scaling Things Up (With 4.9 RC Access)
• Customer environment:
  • 33 Aerospike nodes
  • Over 10B objects, over 125TB of unique data
  • ~200 Spark nodes with 36 cores each (~7200 total cores/workers)
• The problem: less than 1 percent utilization on the Spark side during data load operations.
• The change: Aerospike 4.9 will allow scanning by partition instead of by node, so a scan can be split across the 4096 partitions; Aerospike Spark Connector 2.0 supports partition scans.
• The result:
  • The customer got a release candidate (RC) of Aerospike 4.9 + Spark Connector 2.0
  • Over 10B unique records (125TB of unique data) were scanned, loaded and filtered in ~45 minutes.
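The "less than 1 percent" figure can be sanity-checked with a rough calculation in Python. This is a sketch under two stated assumptions that the slides do not spell out: each scan unit (an Aerospike node before 4.9, a partition from 4.9 on) feeds exactly one Spark task, and each task occupies one core. `max_core_utilization` is a hypothetical helper name.

```python
# Figures quoted on the customer-story slide.
AEROSPIKE_NODES = 33
SPARK_NODES = 200
CORES_PER_SPARK_NODE = 36
AEROSPIKE_PARTITIONS = 4096


def max_core_utilization(scan_units, total_cores):
    """Upper bound on the fraction of cores kept busy.

    Assumes one Spark task per scan unit and one core per task,
    which is a simplification of real scheduling.
    """
    return min(scan_units, total_cores) / total_cores


total_cores = SPARK_NODES * CORES_PER_SPARK_NODE  # 200 * 36 = 7200

per_node = max_core_utilization(AEROSPIKE_NODES, total_cores)
per_partition = max_core_utilization(AEROSPIKE_PARTITIONS, total_cores)

print(f"scan by node:      {per_node:.2%}")       # 0.46%
print(f"scan by partition: {per_partition:.2%}")  # 56.89%
```

Under these assumptions a node-level scan can keep at most 33 of ~7200 cores busy, which is consistent with the sub-1-percent utilization reported, while a partition-level scan raises the ceiling to 4096 concurrent tasks.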
Time for Q&A!
Thank You!
zelkayam@aerospike.com

Monitoring and Managing Anomaly Detection on OpenShift.pdf
 
Freshworks Rethinks NoSQL for Rapid Scaling & Cost-Efficiency
Freshworks Rethinks NoSQL for Rapid Scaling & Cost-EfficiencyFreshworks Rethinks NoSQL for Rapid Scaling & Cost-Efficiency
Freshworks Rethinks NoSQL for Rapid Scaling & Cost-Efficiency
 
TrustArc Webinar - 2024 Global Privacy Survey
TrustArc Webinar - 2024 Global Privacy SurveyTrustArc Webinar - 2024 Global Privacy Survey
TrustArc Webinar - 2024 Global Privacy Survey
 
5th LF Energy Power Grid Model Meet-up Slides
5th LF Energy Power Grid Model Meet-up Slides5th LF Energy Power Grid Model Meet-up Slides
5th LF Energy Power Grid Model Meet-up Slides
 
How to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdf
How to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdfHow to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdf
How to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdf
 
zkStudyClub - LatticeFold: A Lattice-based Folding Scheme and its Application...
zkStudyClub - LatticeFold: A Lattice-based Folding Scheme and its Application...zkStudyClub - LatticeFold: A Lattice-based Folding Scheme and its Application...
zkStudyClub - LatticeFold: A Lattice-based Folding Scheme and its Application...
 
Azure API Management to expose backend services securely
Azure API Management to expose backend services securelyAzure API Management to expose backend services securely
Azure API Management to expose backend services securely
 

Aerospike Meetup - Real Time Insights using Spark with Aerospike - Zohar - 04 March 2020

  • 1. Real-Time Insights by Leveraging Spark with Aerospike: Aerospike Spark Connector. Zohar Elkayam, Aerospike
  • 2. Agenda
      ▪ Where the Aerospike Spark Connector sits in the ecosystem
      ▪ A quick overview of the Aerospike Spark Connector
      ▪ Some code examples
      ▪ Scaling up: a customer story
      Proprietary & Confidential | All rights reserved. © 2020 Aerospike Inc.
  • 3. Aerospike Simplifies Real-time Architecture at any Scale
      [Architecture diagram. Labels: Transactional Systems; SoE Location 1, 2, 3; Aerospike Database; XDR; Enterprise Environment; Legacy Database (Mainframe); RDBMS Database; Data Warehouse (Legacy RDBMS); Data Lake (HDFS Based); System of Record; Real-time Data Warehouse; Query & Reporting Store]
      Delivering Extreme Scalability: ✓ Simplicity ✓ Maintainability ✓ Durability ✓ Strong Consistency ✓ Scalability ✓ Low Cost ($) ✓ Less Data Drag
  • 4. Aerospike Connect for Spark
  • 5. Aerospike Connect for Spark: Example Use Cases
      ✓ Fraud prevention: transaction data arrives via streaming and must be analyzed against historical data in real time
      ✓ Recommendation engines: real-time recommendations and targeting based on user behavior
      ✓ Ad tech: ad fraud detection and real-time retargeting based on user behavior
      ✓ Digital identity management
      ✓ Industrial Internet of Things (IIoT): real-time, closed-loop business decisions
  • 6. Aerospike Connect for Spark
      • Spark connection for Aerospike: load the data and use it as a DataFrame (i.e. Spark SQL), or use it as streamed data
      • Supports Scala (spark-shell) for all of Aerospike's Spark operations
      • Supports Python (pyspark) for some operations; Dataset operations are not supported
      • Guide: https://www.aerospike.com/docs/connectors/enterprise/spark/index.html
  • 7. Aerospike Spark Operations
      • Use Spark SQL to fetch data from Aerospike
          • Aerospike Connect for Spark lets you use Spark SQL to query records from an Aerospike cluster.
      • Load Aerospike data into Spark for processing
          • Load data from Aerospike into DataFrames for processing.
          • The connector supports scans and queries (secondary indexes).
      • Save data from a DataFrame back into Aerospike
          • A DataFrame can be saved in Aerospike by specifying a column in the DataFrame as the Primary Key or the Digest.
      • Join data using Aerospike [Scala only]
          • Provides an AeroJoin function, which reads records from Aerospike given a Dataset that contains the keys of the records of interest.
          • This operation takes advantage of Aerospike's batch read functionality.
  • 8. Aerospike Spark Example: Spark SQL
  • 9. Save DataFrame to Aerospike (by Key, with schema)
  • 10. Aerospike Spark Example: AeroJoin
  • 11. Customer Story: Is Scaling an Issue?
      • Spark partitions data for workers, supervised by an executor (one per Spark node)
      • An Aerospike scan (pre-4.9) scans data by Aerospike node (one per Aerospike node)
      • That means there is a mismatch in parallelization between the number of cores on the Spark side and the number of nodes on the Aerospike side
  • 12. Aerospike Partitions: Even Data Distribution
      Data is distributed evenly across nodes in a cluster using the Aerospike Smart Partitions™ algorithm.
      ▪ Automatic sharding
      ▪ 4096 data partitions
      ▪ Even distribution of:
          ▪ Partitions across nodes
          ▪ Records across partitions
          ▪ Data across flash devices
      ▪ Primary and replica partitions
  • 13. Customer Story: Scaling Things Up (With 4.9 RC Access)
      • Customer environment:
          • 33 Aerospike nodes
          • Over 10B objects, over 125TB of unique data
          • ~200 Spark nodes with 36 cores each (~7,200 total cores/workers)
      • The problem: less than 1 percent utilization on the Spark side during data load operations.
      • The change: Aerospike 4.9 will allow scanning by partition instead of by node, so a scan can be split across 4096 partitions; Aerospike Spark Connector 2.0 supports partition scans.
      • The result:
          • The customer got a release candidate (RC) of Aerospike 4.9 and Spark Connector 2.0
          • Over 10B unique records (125TB of unique data) were scanned, loaded, and filtered in ~45 minutes.
  • 14. Time for Q&A!
  • 15. Thank You! zelkayam@aerospike.com
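
As a rough check on the customer story in slides 11 and 13, the parallelism mismatch can be worked out from the numbers on the slides. The sketch below is not from the deck: it assumes, as a simplification, that each scan unit (an Aerospike node before 4.9, a partition from 4.9 on) keeps exactly one Spark core busy at a time. The function name `busy_fraction` is hypothetical.

```python
# Back-of-the-envelope parallelism model for the customer story.
# Assumption (not from the slides): each scan unit (node or partition)
# keeps exactly one Spark core busy at a time.

AEROSPIKE_NODES = 33          # from slide 13
AEROSPIKE_PARTITIONS = 4096   # from slide 12
SPARK_NODES = 200             # from slide 13 (approximate)
CORES_PER_SPARK_NODE = 36     # from slide 13


def busy_fraction(scan_units: int, total_cores: int) -> float:
    """Fraction of Spark cores that can be busy given the available scan units."""
    return min(scan_units, total_cores) / total_cores


total_cores = SPARK_NODES * CORES_PER_SPARK_NODE
node_scan = busy_fraction(AEROSPIKE_NODES, total_cores)
partition_scan = busy_fraction(AEROSPIKE_PARTITIONS, total_cores)

print(f"total Spark cores: {total_cores}")
print(f"node-level scan: {node_scan:.2%} of cores busy")
print(f"partition-level scan: {partition_scan:.2%} of cores busy")
```

Under this simplified model, a node-level scan can feed only 33 of 7,200 cores, which is consistent with the "less than 1 percent utilization" reported on the slide, while a partition-level scan raises the ceiling to 4096 concurrent scan units.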