© 2018 GridGain Systems, Inc.
Improving Apache Spark™ In-Memory
Computing with Apache Ignite™
Valentin Kulichenko
GridGain Systems
Apache Ignite: a memory-centric distributed
database, caching, and processing platform
for transactional, analytical, and streaming workloads,
delivering in-memory speeds at petabyte scale
Apache Ignite Database and Caching Platform
[Architecture diagram, top to bottom]
Use cases: IoT, Financial Services, Pharma & Healthcare, E-Commerce, Travel & Logistics, Telco
APIs: SQL, Transactions, Compute, Services, ML, Streaming, Key/Value
Memory-Centric Storage
Ignite Native Persistence (Flash, SSD, Intel 3D XPoint)
Third-Party Persistence (RDBMS, HDFS, NoSQL)
Ignite:
• Distributed memory-centric database
• Fully fledged compute platform: SQL, transactions, key-value, collocated processing, ML/DL
• OLAP and OLTP
Spark:
• Ingests data from HDFS or another storage
• Streaming and compute engine
• Inclined towards OLAP and focused on MR payloads
Comparing Ignite and Spark
Ignite is a memory-centric store for Spark
• No data movement from Ignite to Spark
• In-place query execution
• Boost DataFrame and SQL performance
• Share state and data among Spark jobs
• Faster data and streaming analytics
Ignite and Spark Together
Ignite and Spark Integration
[Diagram: a Spark application with three Spark workers, each running two Spark jobs; all jobs access one in-memory shared RDD or DataFrame stored across three GridGain nodes. Side labels: Yarn, Mesos, Docker, HDFS.]
Callouts:
• Share state and data among Spark jobs
• No data movement
• Boost DataFrame and SQL performance
• SQL on top of RDDs
• In-place query execution
• Spark RDD abstraction
• Shared view over Ignite cache/table
• Mutable
• Ignite SQL on top of RDD APIs
• Indexes and in-place execution
Ignite Shared RDDs
• Standard RDD APIs + Ignite SQL
• No rip-and-replace
• Switch to Ignite as the storage layer
Write to and Read from Ignite
// ic is an IgniteContext and sc is the SparkContext (their setup is not shown on the slide)
val sharedRDD: IgniteRDD[Int, Int] = ic.fromCache("sharedRDD")
// Write: store the pairs (1, 1) through (100000, 100000) in the Ignite cache
sharedRDD.savePairs(sc.parallelize(1 to 100000, 10).map(i => (i, i)))
// Read with the standard RDD API
val greaterThanFiftyThousand = sharedRDD.filter(_._2 > 50000)
// Read with Ignite SQL
val df = sharedRDD.sql("select _val from Integer where _key > 50000")
• Optimizes Spark’s Catalyst engine
• In-place execution on the Ignite side
• No data movement
• Applies to most scenarios
Ignite DataFrames
1. Initial query
2. Query execution over local data
3. Reduce multiple results into one
[Diagram: two Ignite nodes. One holds Canada (Toronto, Ottawa, Montreal, Calgary); the other holds India (Mumbai, New Delhi). Arrows labeled 1, 2, and 3 mark the steps above.]
SQL Queries Execution Flow
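The three steps above can be sketched in plain Java, without any Ignite dependency. This is only an illustration of the scatter-and-reduce idea, not Ignite's implementation: the `Node` class, the `query` and `slideCluster` helpers, and the prefix filter standing in for a SQL predicate are all hypothetical names introduced here. The city layout is taken from the slide.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class QueryFlowSketch {
    /** Hypothetical stand-in for an Ignite node: it only sees its own partition. */
    public static final class Node {
        private final List<String> localCities;

        public Node(String... localCities) {
            this.localCities = Arrays.asList(localCities);
        }

        /** Step 2: execute the query over local data only. */
        public List<String> executeLocally(String prefix) {
            List<String> partial = new ArrayList<>();
            for (String city : localCities) {
                if (city.startsWith(prefix)) {
                    partial.add(city);
                }
            }
            return partial;
        }
    }

    /** Steps 1 and 3: send the query to every node, then reduce the partial results into one. */
    public static List<String> query(List<Node> cluster, String prefix) {
        List<String> reduced = new ArrayList<>();
        for (Node node : cluster) {
            reduced.addAll(node.executeLocally(prefix));
        }
        Collections.sort(reduced);
        return reduced;
    }

    /** The two-node layout from the slide: one node holds Canada, the other India. */
    public static List<Node> slideCluster() {
        return Arrays.asList(
                new Node("Toronto", "Ottawa", "Montreal", "Calgary"),
                new Node("Mumbai", "New Delhi"));
    }

    public static void main(String[] args) {
        // Cities starting with "M" live on both nodes, so the result is a true reduce.
        System.out.println(query(slideCluster(), "M")); // prints [Montreal, Mumbai]
    }
}
```

Each node filters only its own rows, so no raw data leaves a node; only the partial results travel to the reducer.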
• Store DataFrames in Ignite
• Save modes
• Append
• Overwrite
• ErrorIfExists
• Ignore
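SparkSession spark = ...; // existing session (elided on the slide)
String cfgPath = "path/to/config/file";
Dataset<Row> jsonDataFrame = spark.read().json("path/to/file.json");
jsonDataFrame.write()
    .format(IgniteDataFrameSettings.FORMAT_IGNITE()) // Ignite data source
    .option(IgniteDataFrameSettings.OPTION_CONFIG_FILE(), cfgPath) // Ignite config
    .mode(SaveMode.Append) // or Overwrite, ErrorIfExists, Ignore
    //... other options
    .save();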
Saving DataFrames
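The four save modes differ only in what happens when the target table already exists. The plain-Java sketch below models that behavior as Spark documents it for `SaveMode`; it is not Ignite or Spark code. The `Mode` enum, the in-memory `tables` map, and the `save` and `rows` methods are hypothetical names used only for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class SaveModeSketch {
    /** Mirrors the four mode names on the slide; this is not Spark's SaveMode class. */
    public enum Mode { APPEND, OVERWRITE, ERROR_IF_EXISTS, IGNORE }

    /** Hypothetical in-memory store: table name -> rows. */
    private final Map<String, List<String>> tables = new HashMap<>();

    public void save(String table, List<String> rows, Mode mode) {
        List<String> existing = tables.get(table);
        if (existing == null) {
            // No table yet: every mode simply creates it.
            tables.put(table, new ArrayList<>(rows));
            return;
        }
        switch (mode) {
            case APPEND:
                existing.addAll(rows); // keep old rows, add new ones
                break;
            case OVERWRITE:
                tables.put(table, new ArrayList<>(rows)); // replace old rows
                break;
            case ERROR_IF_EXISTS:
                throw new IllegalStateException("Table already exists: " + table);
            case IGNORE:
                break; // leave the existing table untouched
        }
    }

    public List<String> rows(String table) {
        return tables.get(table);
    }

    public static void main(String[] args) {
        SaveModeSketch db = new SaveModeSketch();
        db.save("person", Arrays.asList("Mary Major"), Mode.ERROR_IF_EXISTS); // creates the table
        db.save("person", Arrays.asList("John Doe"), Mode.APPEND);            // appends a row
        db.save("person", Arrays.asList("Jane Roe"), Mode.IGNORE);            // no-op: table exists
        System.out.println(db.rows("person")); // prints [Mary Major, John Doe]
    }
}
```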
• Read from Ignite
• Specify format
• Specify config file
SparkSession spark = ...; // existing session (elided on the slide)
String cfgPath = "path/to/config/file";
Dataset<Row> df = spark.read()
    .format(IgniteDataFrameSettings.FORMAT_IGNITE()) // Data source
    .option(IgniteDataFrameSettings.OPTION_TABLE(), "person") // Table to read
    .option(IgniteDataFrameSettings.OPTION_CONFIG_FILE(), cfgPath) // Ignite config
    .load();
// Register the DataFrame as a temporary view so it can be queried with Spark SQL
df.createOrReplaceTempView("person");
Dataset<Row> igniteDF = spark.sql(
    "SELECT * FROM person WHERE name = 'Mary Major'");
Reading DataFrames
• 1 Ignite Server Node
• SensorDataGenerator
  • Writes random data to a socket
• Stream
  • Connects to the socket, reads sensor data, and streams it via Spark; for each streamed RDD, it creates a DataFrame and saves it into Ignite
• Query
  • Creates another Spark application that uses the DataFrames integration to query data from Ignite
DataFrames Demo Setup
Any Questions?
Thank you for joining us. Follow the conversation.
http://ignite.apache.org
#apacheignite

4th Modern Marketing Reckoner by MMA Global India & Group M: 60+ experts on W...
 
原版一比一多伦多大学毕业证(UofT毕业证书)如何办理
原版一比一多伦多大学毕业证(UofT毕业证书)如何办理原版一比一多伦多大学毕业证(UofT毕业证书)如何办理
原版一比一多伦多大学毕业证(UofT毕业证书)如何办理
 
一比一原版(UCSF文凭证书)旧金山分校毕业证如何办理
一比一原版(UCSF文凭证书)旧金山分校毕业证如何办理一比一原版(UCSF文凭证书)旧金山分校毕业证如何办理
一比一原版(UCSF文凭证书)旧金山分校毕业证如何办理
 
ViewShift: Hassle-free Dynamic Policy Enforcement for Every Data Lake
ViewShift: Hassle-free Dynamic Policy Enforcement for Every Data LakeViewShift: Hassle-free Dynamic Policy Enforcement for Every Data Lake
ViewShift: Hassle-free Dynamic Policy Enforcement for Every Data Lake
 
Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You...
Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You...Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You...
Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You...
 
Experts live - Improving user adoption with AI
Experts live - Improving user adoption with AIExperts live - Improving user adoption with AI
Experts live - Improving user adoption with AI
 
一比一原版兰加拉学院毕业证(Langara毕业证书)学历如何办理
一比一原版兰加拉学院毕业证(Langara毕业证书)学历如何办理一比一原版兰加拉学院毕业证(Langara毕业证书)学历如何办理
一比一原版兰加拉学院毕业证(Langara毕业证书)学历如何办理
 
Intelligence supported media monitoring in veterinary medicine
Intelligence supported media monitoring in veterinary medicineIntelligence supported media monitoring in veterinary medicine
Intelligence supported media monitoring in veterinary medicine
 
University of New South Wales degree offer diploma Transcript
University of New South Wales degree offer diploma TranscriptUniversity of New South Wales degree offer diploma Transcript
University of New South Wales degree offer diploma Transcript
 
DATA COMMS-NETWORKS YR2 lecture 08 NAT & CLOUD.docx
DATA COMMS-NETWORKS YR2 lecture 08 NAT & CLOUD.docxDATA COMMS-NETWORKS YR2 lecture 08 NAT & CLOUD.docx
DATA COMMS-NETWORKS YR2 lecture 08 NAT & CLOUD.docx
 
STATATHON: Unleashing the Power of Statistics in a 48-Hour Knowledge Extravag...
STATATHON: Unleashing the Power of Statistics in a 48-Hour Knowledge Extravag...STATATHON: Unleashing the Power of Statistics in a 48-Hour Knowledge Extravag...
STATATHON: Unleashing the Power of Statistics in a 48-Hour Knowledge Extravag...
 
原版制作(unimelb毕业证书)墨尔本大学毕业证Offer一模一样
原版制作(unimelb毕业证书)墨尔本大学毕业证Offer一模一样原版制作(unimelb毕业证书)墨尔本大学毕业证Offer一模一样
原版制作(unimelb毕业证书)墨尔本大学毕业证Offer一模一样
 
Predictably Improve Your B2B Tech Company's Performance by Leveraging Data
Predictably Improve Your B2B Tech Company's Performance by Leveraging DataPredictably Improve Your B2B Tech Company's Performance by Leveraging Data
Predictably Improve Your B2B Tech Company's Performance by Leveraging Data
 
Global Situational Awareness of A.I. and where its headed
Global Situational Awareness of A.I. and where its headedGlobal Situational Awareness of A.I. and where its headed
Global Situational Awareness of A.I. and where its headed
 
06-12-2024-BudapestDataForum-BuildingReal-timePipelineswithFLaNK AIM
06-12-2024-BudapestDataForum-BuildingReal-timePipelineswithFLaNK AIM06-12-2024-BudapestDataForum-BuildingReal-timePipelineswithFLaNK AIM
06-12-2024-BudapestDataForum-BuildingReal-timePipelineswithFLaNK AIM
 
DSSML24_tspann_CodelessGenerativeAIPipelines
DSSML24_tspann_CodelessGenerativeAIPipelinesDSSML24_tspann_CodelessGenerativeAIPipelines
DSSML24_tspann_CodelessGenerativeAIPipelines
 
一比一原版巴斯大学毕业证(Bath毕业证书)学历如何办理
一比一原版巴斯大学毕业证(Bath毕业证书)学历如何办理一比一原版巴斯大学毕业证(Bath毕业证书)学历如何办理
一比一原版巴斯大学毕业证(Bath毕业证书)学历如何办理
 

Improving Apache Spark™ In-Memory Computing with Apache Ignite™

  • 1. © 2018 GridGain Systems, Inc. Improving Apache Spark™ In-Memory Computing with Apache Ignite™ Valentin Kulichenko GridGain Systems
  • 2. Apache Ignite is a memory-centric distributed database, caching, and processing platform for transactional, analytical, and streaming workloads, delivering in-memory speeds at petabyte scale.
  • 3. Apache Ignite Database and Caching Platform • SQL, Transactions, Compute, Services, ML, Streaming, Key/Value • Memory-Centric Storage • Ignite Native Persistence (Flash, SSD, Intel 3D XPoint) • Third-Party Persistence (RDBMS, HDFS, NoSQL) • IoT, Financial Services, Pharma & Healthcare, E-Commerce, Travel & Logistics, Telco
  • 4. Comparing Ignite and Spark
      Ignite: • Distributed memory-centric database • Fully fledged compute platform: SQL, transactions, key-value, collocated processing, ML/DL • OLAP and OLTP
      Spark: • Ingests data from HDFS or another storage • Streaming and compute engine • Inclined towards OLAP and focused on MR payloads
  • 5. Ignite and Spark Together: Ignite is a memory-centric store for Spark • No data movement from Ignite to Spark • In-place query execution • Boost DataFrame and SQL performance • Share state and data among Spark jobs • Faster data and streaming analytics
  • 6. Ignite and Spark Integration (architecture diagram): a Spark application runs Spark jobs on several Spark workers (Yarn, Mesos, Docker, HDFS); the workers share an in-memory RDD or DataFrame layer backed by GridGain nodes. Callouts: • Share state and data among Spark jobs • No data movement • Boost DataFrame and SQL performance • SQL on top of RDDs • In-place query execution
  • 7. Ignite Shared RDDs • Spark RDD abstraction • Shared view over an Ignite cache/table • Mutable • Ignite SQL on top of the RDD APIs • Indexes and in-place execution
  • 8. Write to and Read from Ignite • Standard RDD APIs + Ignite SQL • No rip-and-replace • Switch to Ignite as a storage
      import org.apache.ignite.spark.{IgniteContext, IgniteRDD}

      // Write (one Spark job): ic is an IgniteContext and sc a SparkContext, both created elsewhere
      val sharedRDD: IgniteRDD[Int, Int] = ic.fromCache("sharedRDD")
      sharedRDD.savePairs(sc.parallelize(1 to 100000, 10).map(i => (i, i)))

      // Read (another Spark job): the standard RDD API and Ignite SQL over the same cache
      val sharedRDD: IgniteRDD[Int, Int] = ic.fromCache("sharedRDD")
      val greaterThanFiftyThousand = sharedRDD.filter(_._2 > 50000)
      val df = sharedRDD.sql("select _val from Integer where _key > 50000")
  • 9. Ignite DataFrames • Optimizes Spark's Catalyst engine • In-place execution on the Ignite side • No data movement • Applies to most scenarios
  • 10. SQL Queries Execution Flow: 1. Initial query 2. Query execution over local data 3. Reduce multiple results into one. (Diagram: one Ignite node holds Canada with Toronto, Ottawa, Montreal, and Calgary; another Ignite node holds India with Mumbai and New Delhi.)
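The three-step flow on this slide can be illustrated in plain Java with no Ignite dependency. This is only a sketch under stated assumptions: the `Node` class, `queryLocal`, `runQuery`, and the "city name starts with a prefix" query are hypothetical stand-ins and not Ignite APIs. The node layout follows the slide's diagram.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;

public class QueryFlowSketch {
    /** Hypothetical stand-in for an Ignite node that holds part of the data. */
    public static final class Node {
        final String country;
        final List<String> cities;

        public Node(String country, List<String> cities) {
            this.country = country;
            this.cities = cities;
        }

        /** Step 2: evaluate the query against this node's local data only. */
        List<String> queryLocal(String prefix) {
            return cities.stream()
                    .filter(city -> city.startsWith(prefix))
                    .collect(Collectors.toList());
        }
    }

    /** Steps 1 and 3: send the query to every node, then reduce the partial results into one. */
    public static List<String> runQuery(List<Node> nodes, String prefix) {
        List<String> merged = new ArrayList<>();
        for (Node node : nodes) {
            merged.addAll(node.queryLocal(prefix)); // only matching rows leave a node
        }
        merged.sort(String::compareTo);
        return merged;
    }

    public static void main(String[] args) {
        List<Node> nodes = List.of(
                new Node("Canada", List.of("Toronto", "Ottawa", "Montreal", "Calgary")),
                new Node("India", List.of("Mumbai", "New Delhi")));
        System.out.println(runQuery(nodes, "M")); // prints [Montreal, Mumbai]
    }
}
```

Each node filters only its own rows, so the reducer receives just the matches instead of the full data set.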
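  • 11. Saving DataFrames • Store DataFrames in Ignite • Save modes: Append, Overwrite, ErrorIfExists, Ignore
      import org.apache.ignite.spark.IgniteDataFrameSettings;
      import org.apache.spark.sql.Dataset;
      import org.apache.spark.sql.Row;
      import org.apache.spark.sql.SaveMode;
      import org.apache.spark.sql.SparkSession;

      SparkSession spark = _                    // placeholder on the slide: an existing SparkSession
      String cfgPath = "path/to/config/file";   // Ignite configuration file
      Dataset<Row> jsonDataFrame = spark.read().json("path/to/file.json");
      jsonDataFrame.write()
          .format(IgniteDataFrameSettings.FORMAT_IGNITE())  // Ignite data source
          .mode(SaveMode.Append)                            // one of the save modes listed above
          //... other options
          .save();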
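The four save modes differ only in what happens when the target table already exists. The plain-Java sketch below illustrates the general Spark semantics with an in-memory map. It is not Ignite or Spark code: the `Mode` enum, the `save` and `read` helpers, and the `sensor` table name are hypothetical and only mirror the mode names on the slide.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class SaveModeSketch {
    /** Mirrors the four mode names on the slide; this is not Spark's own SaveMode enum. */
    public enum Mode { APPEND, OVERWRITE, ERROR_IF_EXISTS, IGNORE }

    /** Hypothetical in-memory stand-in for the tables of a store. */
    private final Map<String, List<String>> tables = new HashMap<>();

    public void save(String table, List<String> rows, Mode mode) {
        boolean exists = tables.containsKey(table);
        switch (mode) {
            case APPEND:          // add rows to the existing table, creating it if needed
                tables.computeIfAbsent(table, t -> new ArrayList<>()).addAll(rows);
                break;
            case OVERWRITE:       // replace whatever was stored before
                tables.put(table, new ArrayList<>(rows));
                break;
            case ERROR_IF_EXISTS: // refuse to touch an existing table
                if (exists) {
                    throw new IllegalStateException("Table already exists: " + table);
                }
                tables.put(table, new ArrayList<>(rows));
                break;
            case IGNORE:          // silently skip the write if the table exists
                if (!exists) {
                    tables.put(table, new ArrayList<>(rows));
                }
                break;
        }
    }

    public List<String> read(String table) {
        return tables.getOrDefault(table, List.of());
    }

    public static void main(String[] args) {
        SaveModeSketch store = new SaveModeSketch();
        store.save("sensor", List.of("a"), Mode.ERROR_IF_EXISTS); // table is new, so this succeeds
        store.save("sensor", List.of("b"), Mode.APPEND);
        store.save("sensor", List.of("c"), Mode.IGNORE);          // table exists, so nothing changes
        System.out.println(store.read("sensor"));                 // prints [a, b]
        store.save("sensor", List.of("d"), Mode.OVERWRITE);
        System.out.println(store.read("sensor"));                 // prints [d]
    }
}
```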
  • 12. Reading DataFrames • Read from Ignite • Specify format • Specify config file
      import org.apache.ignite.spark.IgniteDataFrameSettings;
      import org.apache.spark.sql.Dataset;
      import org.apache.spark.sql.Row;
      import org.apache.spark.sql.SparkSession;

      SparkSession spark = _                    // placeholder on the slide: an existing SparkSession
      String cfgPath = "path/to/config/file";
      Dataset<Row> df = spark.read()
          .format(IgniteDataFrameSettings.FORMAT_IGNITE())               // Data source
          .option(IgniteDataFrameSettings.OPTION_TABLE(), "person")      // Table to read
          .option(IgniteDataFrameSettings.OPTION_CONFIG_FILE(), cfgPath) // Ignite config
          .load();
      df.createOrReplaceTempView("person");
      Dataset<Row> igniteDF = spark.sql("SELECT * FROM person WHERE name = 'Mary Major'");
  • 13. DataFrames Demo Setup • 1 Ignite server node • SensorDataGenerator: writes random data to a socket • Stream: connects to the socket, reads sensor data, and streams it via Spark; for each streamed RDD, it creates a DataFrame and saves it into Ignite • Query: another Spark application that uses the DataFrames integration to query data from Ignite
  • 14. Any Questions? Thank you for joining us. Follow the conversation: http://ignite.apache.org #apacheignite