Submit Search
Upload
Spark Summit EU talk by Christos Erotocritou
•
2 likes
•
1,115 views
Spark Summit
Follow
Better Together: Fast Data with Apache Spark and Apache Ignite
Read less
Read more
Data & Analytics
Report
Share
Report
Share
1 of 25
Download now
Download to read offline
Recommended
Spark Summit San Francisco 2016 - Ali Ghodsi Keynote
Spark Summit San Francisco 2016 - Ali Ghodsi Keynote
Databricks
Spark Summit EU talk by Kaarthik Sivashanmugam
Spark Summit EU talk by Kaarthik Sivashanmugam
Spark Summit
Spark Summit EU talk by Bas Geerdink
Spark Summit EU talk by Bas Geerdink
Spark Summit
Big Telco - Yousun Jeong
Big Telco - Yousun Jeong
Spark Summit
End-to-End Data Pipelines with Apache Spark
End-to-End Data Pipelines with Apache Spark
Burak Yavuz
Simplifying Big Data Applications with Apache Spark 2.0
Simplifying Big Data Applications with Apache Spark 2.0
Spark Summit
Spark Summit EU talk by Elena Lazovik
Spark Summit EU talk by Elena Lazovik
Spark Summit
Conquering the Lambda architecture in LinkedIn metrics platform with Apache C...
Conquering the Lambda architecture in LinkedIn metrics platform with Apache C...
Khai Tran
Recommended
Spark Summit San Francisco 2016 - Ali Ghodsi Keynote
Spark Summit San Francisco 2016 - Ali Ghodsi Keynote
Databricks
Spark Summit EU talk by Kaarthik Sivashanmugam
Spark Summit EU talk by Kaarthik Sivashanmugam
Spark Summit
Spark Summit EU talk by Bas Geerdink
Spark Summit EU talk by Bas Geerdink
Spark Summit
Big Telco - Yousun Jeong
Big Telco - Yousun Jeong
Spark Summit
End-to-End Data Pipelines with Apache Spark
End-to-End Data Pipelines with Apache Spark
Burak Yavuz
Simplifying Big Data Applications with Apache Spark 2.0
Simplifying Big Data Applications with Apache Spark 2.0
Spark Summit
Spark Summit EU talk by Elena Lazovik
Spark Summit EU talk by Elena Lazovik
Spark Summit
Conquering the Lambda architecture in LinkedIn metrics platform with Apache C...
Conquering the Lambda architecture in LinkedIn metrics platform with Apache C...
Khai Tran
Spark Summit EU talk by Stephan Kessler
Spark Summit EU talk by Stephan Kessler
Spark Summit
Spark Summit EU talk by Rolf Jagerman
Spark Summit EU talk by Rolf Jagerman
Spark Summit
From R Script to Production Using rsparkling with Navdeep Gill
From R Script to Production Using rsparkling with Navdeep Gill
Databricks
Using SparkR to Scale Data Science Applications in Production. Lessons from t...
Using SparkR to Scale Data Science Applications in Production. Lessons from t...
Spark Summit
Vectorized Deep Learning Acceleration from Preprocessing to Inference and Tra...
Vectorized Deep Learning Acceleration from Preprocessing to Inference and Tra...
Databricks
Dynamic DDL: Adding Structure to Streaming Data on the Fly with David Winters...
Dynamic DDL: Adding Structure to Streaming Data on the Fly with David Winters...
Databricks
A Predictive Analytics Workflow on DICOM Images using Apache Spark with Anahi...
A Predictive Analytics Workflow on DICOM Images using Apache Spark with Anahi...
Databricks
Parallelizing Large Simulations with Apache SparkR with Daniel Jeavons and Wa...
Parallelizing Large Simulations with Apache SparkR with Daniel Jeavons and Wa...
Spark Summit
Building the Foundations of an Intelligent, Event-Driven Data Platform at EFSA
Building the Foundations of an Intelligent, Event-Driven Data Platform at EFSA
Databricks
Trends for Big Data and Apache Spark in 2017 by Matei Zaharia
Trends for Big Data and Apache Spark in 2017 by Matei Zaharia
Spark Summit
Spark Summit San Francisco 2016 - Matei Zaharia Keynote: Apache Spark 2.0
Spark Summit San Francisco 2016 - Matei Zaharia Keynote: Apache Spark 2.0
Databricks
From Pandas to Koalas: Reducing Time-To-Insight for Virgin Hyperloop's Data
From Pandas to Koalas: Reducing Time-To-Insight for Virgin Hyperloop's Data
Databricks
Improving the Life of Data Scientists: Automating ML Lifecycle through MLflow
Improving the Life of Data Scientists: Automating ML Lifecycle through MLflow
Databricks
Spark Summit EU talk by John Musser
Spark Summit EU talk by John Musser
Spark Summit
H2O World - H2O Rains with Databricks Cloud
H2O World - H2O Rains with Databricks Cloud
Sri Ambati
Spark Summit EU talk by Michael Nitschinger
Spark Summit EU talk by Michael Nitschinger
Spark Summit
Top 5 mistakes when writing Streaming applications
Top 5 mistakes when writing Streaming applications
hadooparchbook
Dr. Elephant for Monitoring and Tuning Apache Spark Jobs on Hadoop with Carl ...
Dr. Elephant for Monitoring and Tuning Apache Spark Jobs on Hadoop with Carl ...
Databricks
OAP: Optimized Analytics Package for Spark Platform with Daoyuan Wang and Yua...
OAP: Optimized Analytics Package for Spark Platform with Daoyuan Wang and Yua...
Databricks
Apache® Spark™ MLlib: From Quick Start to Scikit-Learn
Apache® Spark™ MLlib: From Quick Start to Scikit-Learn
Databricks
Kafka for data scientists
Kafka for data scientists
Jenn Rawlins
Streaming datasets for personalization
Streaming datasets for personalization
Shriya Arora
More Related Content
What's hot
Spark Summit EU talk by Stephan Kessler
Spark Summit EU talk by Stephan Kessler
Spark Summit
Spark Summit EU talk by Rolf Jagerman
Spark Summit EU talk by Rolf Jagerman
Spark Summit
From R Script to Production Using rsparkling with Navdeep Gill
From R Script to Production Using rsparkling with Navdeep Gill
Databricks
Using SparkR to Scale Data Science Applications in Production. Lessons from t...
Using SparkR to Scale Data Science Applications in Production. Lessons from t...
Spark Summit
Vectorized Deep Learning Acceleration from Preprocessing to Inference and Tra...
Vectorized Deep Learning Acceleration from Preprocessing to Inference and Tra...
Databricks
Dynamic DDL: Adding Structure to Streaming Data on the Fly with David Winters...
Dynamic DDL: Adding Structure to Streaming Data on the Fly with David Winters...
Databricks
A Predictive Analytics Workflow on DICOM Images using Apache Spark with Anahi...
A Predictive Analytics Workflow on DICOM Images using Apache Spark with Anahi...
Databricks
Parallelizing Large Simulations with Apache SparkR with Daniel Jeavons and Wa...
Parallelizing Large Simulations with Apache SparkR with Daniel Jeavons and Wa...
Spark Summit
Building the Foundations of an Intelligent, Event-Driven Data Platform at EFSA
Building the Foundations of an Intelligent, Event-Driven Data Platform at EFSA
Databricks
Trends for Big Data and Apache Spark in 2017 by Matei Zaharia
Trends for Big Data and Apache Spark in 2017 by Matei Zaharia
Spark Summit
Spark Summit San Francisco 2016 - Matei Zaharia Keynote: Apache Spark 2.0
Spark Summit San Francisco 2016 - Matei Zaharia Keynote: Apache Spark 2.0
Databricks
From Pandas to Koalas: Reducing Time-To-Insight for Virgin Hyperloop's Data
From Pandas to Koalas: Reducing Time-To-Insight for Virgin Hyperloop's Data
Databricks
Improving the Life of Data Scientists: Automating ML Lifecycle through MLflow
Improving the Life of Data Scientists: Automating ML Lifecycle through MLflow
Databricks
Spark Summit EU talk by John Musser
Spark Summit EU talk by John Musser
Spark Summit
H2O World - H2O Rains with Databricks Cloud
H2O World - H2O Rains with Databricks Cloud
Sri Ambati
Spark Summit EU talk by Michael Nitschinger
Spark Summit EU talk by Michael Nitschinger
Spark Summit
Top 5 mistakes when writing Streaming applications
Top 5 mistakes when writing Streaming applications
hadooparchbook
Dr. Elephant for Monitoring and Tuning Apache Spark Jobs on Hadoop with Carl ...
Dr. Elephant for Monitoring and Tuning Apache Spark Jobs on Hadoop with Carl ...
Databricks
OAP: Optimized Analytics Package for Spark Platform with Daoyuan Wang and Yua...
OAP: Optimized Analytics Package for Spark Platform with Daoyuan Wang and Yua...
Databricks
Apache® Spark™ MLlib: From Quick Start to Scikit-Learn
Apache® Spark™ MLlib: From Quick Start to Scikit-Learn
Databricks
What's hot
(20)
Spark Summit EU talk by Stephan Kessler
Spark Summit EU talk by Stephan Kessler
Spark Summit EU talk by Rolf Jagerman
Spark Summit EU talk by Rolf Jagerman
From R Script to Production Using rsparkling with Navdeep Gill
From R Script to Production Using rsparkling with Navdeep Gill
Using SparkR to Scale Data Science Applications in Production. Lessons from t...
Using SparkR to Scale Data Science Applications in Production. Lessons from t...
Vectorized Deep Learning Acceleration from Preprocessing to Inference and Tra...
Vectorized Deep Learning Acceleration from Preprocessing to Inference and Tra...
Dynamic DDL: Adding Structure to Streaming Data on the Fly with David Winters...
Dynamic DDL: Adding Structure to Streaming Data on the Fly with David Winters...
A Predictive Analytics Workflow on DICOM Images using Apache Spark with Anahi...
A Predictive Analytics Workflow on DICOM Images using Apache Spark with Anahi...
Parallelizing Large Simulations with Apache SparkR with Daniel Jeavons and Wa...
Parallelizing Large Simulations with Apache SparkR with Daniel Jeavons and Wa...
Building the Foundations of an Intelligent, Event-Driven Data Platform at EFSA
Building the Foundations of an Intelligent, Event-Driven Data Platform at EFSA
Trends for Big Data and Apache Spark in 2017 by Matei Zaharia
Trends for Big Data and Apache Spark in 2017 by Matei Zaharia
Spark Summit San Francisco 2016 - Matei Zaharia Keynote: Apache Spark 2.0
Spark Summit San Francisco 2016 - Matei Zaharia Keynote: Apache Spark 2.0
From Pandas to Koalas: Reducing Time-To-Insight for Virgin Hyperloop's Data
From Pandas to Koalas: Reducing Time-To-Insight for Virgin Hyperloop's Data
Improving the Life of Data Scientists: Automating ML Lifecycle through MLflow
Improving the Life of Data Scientists: Automating ML Lifecycle through MLflow
Spark Summit EU talk by John Musser
Spark Summit EU talk by John Musser
H2O World - H2O Rains with Databricks Cloud
H2O World - H2O Rains with Databricks Cloud
Spark Summit EU talk by Michael Nitschinger
Spark Summit EU talk by Michael Nitschinger
Top 5 mistakes when writing Streaming applications
Top 5 mistakes when writing Streaming applications
Dr. Elephant for Monitoring and Tuning Apache Spark Jobs on Hadoop with Carl ...
Dr. Elephant for Monitoring and Tuning Apache Spark Jobs on Hadoop with Carl ...
OAP: Optimized Analytics Package for Spark Platform with Daoyuan Wang and Yua...
OAP: Optimized Analytics Package for Spark Platform with Daoyuan Wang and Yua...
Apache® Spark™ MLlib: From Quick Start to Scikit-Learn
Apache® Spark™ MLlib: From Quick Start to Scikit-Learn
Viewers also liked
Kafka for data scientists
Kafka for data scientists
Jenn Rawlins
Streaming datasets for personalization
Streaming datasets for personalization
Shriya Arora
Wrangling Big Data in a Small Tech Ecosystem
Wrangling Big Data in a Small Tech Ecosystem
Shalin Hai-Jew
Online learning with structured streaming, spark summit brussels 2016
Online learning with structured streaming, spark summit brussels 2016
Ram Sriharsha
Kafka Streams: The Stream Processing Engine of Apache Kafka
Kafka Streams: The Stream Processing Engine of Apache Kafka
Eno Thereska
A little bit of clojure
A little bit of clojure
Ben Stopford
Best Practices for testing of SOA-based systems - with examples of SOA Suite 11g
Best Practices for testing of SOA-based systems - with examples of SOA Suite 11g
Guido Schmutz
Spark Summit EU talk by Ram Sriharsha and Vlad Feinberg
Spark Summit EU talk by Ram Sriharsha and Vlad Feinberg
Spark Summit
Big Data & the Enterprise
Big Data & the Enterprise
Ben Stopford
Sessionization with Spark streaming
Sessionization with Spark streaming
Ramūnas Urbonas
Continuous Application with Structured Streaming 2.0
Continuous Application with Structured Streaming 2.0
Anyscale
Building a Real-Time Forecasting Engine with Scala and Akka
Building a Real-Time Forecasting Engine with Scala and Akka
Lightbend
Data Stream Analytics - Why they are important
Data Stream Analytics - Why they are important
Paris Carbone
Introduction to Kafka Cruise Control
Introduction to Kafka Cruise Control
Jiangjie Qin
Apache Storm Tutorial
Apache Storm Tutorial
Davide Mazza
Voxxed Days Thesaloniki 2016 - Streaming Engines for Big Data
Voxxed Days Thesaloniki 2016 - Streaming Engines for Big Data
Voxxed Days Thessaloniki
React Fast by Processing Streaming Data in Real-Time
React Fast by Processing Streaming Data in Real-Time
Amazon Web Services
Lightbend Fast Data Platform
Lightbend Fast Data Platform
Lightbend
The return of big iron?
The return of big iron?
Ben Stopford
Big iron 2 (published)
Big iron 2 (published)
Ben Stopford
Viewers also liked
(20)
Kafka for data scientists
Kafka for data scientists
Streaming datasets for personalization
Streaming datasets for personalization
Wrangling Big Data in a Small Tech Ecosystem
Wrangling Big Data in a Small Tech Ecosystem
Online learning with structured streaming, spark summit brussels 2016
Online learning with structured streaming, spark summit brussels 2016
Kafka Streams: The Stream Processing Engine of Apache Kafka
Kafka Streams: The Stream Processing Engine of Apache Kafka
A little bit of clojure
A little bit of clojure
Best Practices for testing of SOA-based systems - with examples of SOA Suite 11g
Best Practices for testing of SOA-based systems - with examples of SOA Suite 11g
Spark Summit EU talk by Ram Sriharsha and Vlad Feinberg
Spark Summit EU talk by Ram Sriharsha and Vlad Feinberg
Big Data & the Enterprise
Big Data & the Enterprise
Sessionization with Spark streaming
Sessionization with Spark streaming
Continuous Application with Structured Streaming 2.0
Continuous Application with Structured Streaming 2.0
Building a Real-Time Forecasting Engine with Scala and Akka
Building a Real-Time Forecasting Engine with Scala and Akka
Data Stream Analytics - Why they are important
Data Stream Analytics - Why they are important
Introduction to Kafka Cruise Control
Introduction to Kafka Cruise Control
Apache Storm Tutorial
Apache Storm Tutorial
Voxxed Days Thesaloniki 2016 - Streaming Engines for Big Data
Voxxed Days Thesaloniki 2016 - Streaming Engines for Big Data
React Fast by Processing Streaming Data in Real-Time
React Fast by Processing Streaming Data in Real-Time
Lightbend Fast Data Platform
Lightbend Fast Data Platform
The return of big iron?
The return of big iron?
Big iron 2 (published)
Big iron 2 (published)
Similar to Spark Summit EU talk by Christos Erotocritou
OSDC 2017 - Christos Erotocritou - Apache ignite in-memory data fabric
OSDC 2017 - Christos Erotocritou - Apache ignite in-memory data fabric
NETWAYS
An Introduction to Apache Ignite - Mandhir Gidda - Codemotion Rome 2017
An Introduction to Apache Ignite - Mandhir Gidda - Codemotion Rome 2017
Codemotion
Improving Apache Spark™ In-Memory Computing with Apache Ignite™
Improving Apache Spark™ In-Memory Computing with Apache Ignite™
Tom Diederich
Data Summer Conf 2018, “Apache Ignite + Apache Spark RDDs and DataFrames inte...
Data Summer Conf 2018, “Apache Ignite + Apache Spark RDDs and DataFrames inte...
Provectus
A Big Data Lake Based on Spark for BBVA Bank-(Oscar Mendez, STRATIO)
A Big Data Lake Based on Spark for BBVA Bank-(Oscar Mendez, STRATIO)
Spark Summit
Operational Intelligence Using Hadoop
Operational Intelligence Using Hadoop
DataWorks Summit
Pivotal Real Time Data Stream Analytics
Pivotal Real Time Data Stream Analytics
kgshukla
Fast, In-Memory SQL on Apache Cassandra with Apache Ignite (Rachel Pedreschi,...
Fast, In-Memory SQL on Apache Cassandra with Apache Ignite (Rachel Pedreschi,...
DataStax
Giga Spaces Data Grid / Data Caching Overview
Giga Spaces Data Grid / Data Caching Overview
jimliddle
Apache Ignite - Distributed Database Orchestration
Apache Ignite - Distributed Database Orchestration
Ariel Jatib
Spark Development Lifecycle at Workday - ApacheCon 2020
Spark Development Lifecycle at Workday - ApacheCon 2020
Pavel Hardak
Apache Spark Development Lifecycle @ Workday - ApacheCon 2020
Apache Spark Development Lifecycle @ Workday - ApacheCon 2020
Eren Avşaroğulları
Making Hadoop Realtime by Dr. William Bain of Scaleout Software
Making Hadoop Realtime by Dr. William Bain of Scaleout Software
Data Con LA
Cloud Native Data Pipelines (DataEngConf SF 2017)
Cloud Native Data Pipelines (DataEngConf SF 2017)
Sid Anand
Azure + DataStax Enterprise Powers Office 365 Per User Store
Azure + DataStax Enterprise Powers Office 365 Per User Store
DataStax Academy
The need for more agile analytics platforms.
The need for more agile analytics platforms.
Amy Hodler
Apache Ignite: In-Memory Hammer for Your Data Science Toolkit
Apache Ignite: In-Memory Hammer for Your Data Science Toolkit
Denis Magda
What to Expect for Big Data and Apache Spark in 2017
What to Expect for Big Data and Apache Spark in 2017
Databricks
Hybrid Transactional/Analytics Processing with Spark and IMDGs
Hybrid Transactional/Analytics Processing with Spark and IMDGs
Ali Hodroj
Petabytes of Data and No Servers: Corteva Scales DNA Analysis to Meet Increas...
Petabytes of Data and No Servers: Corteva Scales DNA Analysis to Meet Increas...
Capgemini
Similar to Spark Summit EU talk by Christos Erotocritou
(20)
OSDC 2017 - Christos Erotocritou - Apache ignite in-memory data fabric
OSDC 2017 - Christos Erotocritou - Apache ignite in-memory data fabric
An Introduction to Apache Ignite - Mandhir Gidda - Codemotion Rome 2017
An Introduction to Apache Ignite - Mandhir Gidda - Codemotion Rome 2017
Improving Apache Spark™ In-Memory Computing with Apache Ignite™
Improving Apache Spark™ In-Memory Computing with Apache Ignite™
Data Summer Conf 2018, “Apache Ignite + Apache Spark RDDs and DataFrames inte...
Data Summer Conf 2018, “Apache Ignite + Apache Spark RDDs and DataFrames inte...
A Big Data Lake Based on Spark for BBVA Bank-(Oscar Mendez, STRATIO)
A Big Data Lake Based on Spark for BBVA Bank-(Oscar Mendez, STRATIO)
Operational Intelligence Using Hadoop
Operational Intelligence Using Hadoop
Pivotal Real Time Data Stream Analytics
Pivotal Real Time Data Stream Analytics
Fast, In-Memory SQL on Apache Cassandra with Apache Ignite (Rachel Pedreschi,...
Fast, In-Memory SQL on Apache Cassandra with Apache Ignite (Rachel Pedreschi,...
Giga Spaces Data Grid / Data Caching Overview
Giga Spaces Data Grid / Data Caching Overview
Apache Ignite - Distributed Database Orchestration
Apache Ignite - Distributed Database Orchestration
Spark Development Lifecycle at Workday - ApacheCon 2020
Spark Development Lifecycle at Workday - ApacheCon 2020
Apache Spark Development Lifecycle @ Workday - ApacheCon 2020
Apache Spark Development Lifecycle @ Workday - ApacheCon 2020
Making Hadoop Realtime by Dr. William Bain of Scaleout Software
Making Hadoop Realtime by Dr. William Bain of Scaleout Software
Cloud Native Data Pipelines (DataEngConf SF 2017)
Cloud Native Data Pipelines (DataEngConf SF 2017)
Azure + DataStax Enterprise Powers Office 365 Per User Store
Azure + DataStax Enterprise Powers Office 365 Per User Store
The need for more agile analytics platforms.
The need for more agile analytics platforms.
Apache Ignite: In-Memory Hammer for Your Data Science Toolkit
Apache Ignite: In-Memory Hammer for Your Data Science Toolkit
What to Expect for Big Data and Apache Spark in 2017
What to Expect for Big Data and Apache Spark in 2017
Hybrid Transactional/Analytics Processing with Spark and IMDGs
Hybrid Transactional/Analytics Processing with Spark and IMDGs
Petabytes of Data and No Servers: Corteva Scales DNA Analysis to Meet Increas...
Petabytes of Data and No Servers: Corteva Scales DNA Analysis to Meet Increas...
More from Spark Summit
FPGA-Based Acceleration Architecture for Spark SQL Qi Xie and Quanfu Wang
FPGA-Based Acceleration Architecture for Spark SQL Qi Xie and Quanfu Wang
Spark Summit
VEGAS: The Missing Matplotlib for Scala/Apache Spark with DB Tsai and Roger M...
VEGAS: The Missing Matplotlib for Scala/Apache Spark with DB Tsai and Roger M...
Spark Summit
Apache Spark Structured Streaming Helps Smart Manufacturing with Xiaochang Wu
Apache Spark Structured Streaming Helps Smart Manufacturing with Xiaochang Wu
Spark Summit
Improving Traffic Prediction Using Weather Data with Ramya Raghavendra
Improving Traffic Prediction Using Weather Data with Ramya Raghavendra
Spark Summit
A Tale of Two Graph Frameworks on Spark: GraphFrames and Tinkerpop OLAP Artem...
A Tale of Two Graph Frameworks on Spark: GraphFrames and Tinkerpop OLAP Artem...
Spark Summit
No More Cumbersomeness: Automatic Predictive Modeling on Apache Spark Marcin ...
No More Cumbersomeness: Automatic Predictive Modeling on Apache Spark Marcin ...
Spark Summit
Apache Spark and Tensorflow as a Service with Jim Dowling
Apache Spark and Tensorflow as a Service with Jim Dowling
Spark Summit
Apache Spark and Tensorflow as a Service with Jim Dowling
Apache Spark and Tensorflow as a Service with Jim Dowling
Spark Summit
MMLSpark: Lessons from Building a SparkML-Compatible Machine Learning Library...
MMLSpark: Lessons from Building a SparkML-Compatible Machine Learning Library...
Spark Summit
Next CERN Accelerator Logging Service with Jakub Wozniak
Next CERN Accelerator Logging Service with Jakub Wozniak
Spark Summit
Powering a Startup with Apache Spark with Kevin Kim
Powering a Startup with Apache Spark with Kevin Kim
Spark Summit
Improving Traffic Prediction Using Weather Datawith Ramya Raghavendra
Improving Traffic Prediction Using Weather Datawith Ramya Raghavendra
Spark Summit
Hiding Apache Spark Complexity for Fast Prototyping of Big Data Applications—...
Hiding Apache Spark Complexity for Fast Prototyping of Big Data Applications—...
Spark Summit
How Nielsen Utilized Databricks for Large-Scale Research and Development with...
How Nielsen Utilized Databricks for Large-Scale Research and Development with...
Spark Summit
Spline: Apache Spark Lineage not Only for the Banking Industry with Marek Nov...
Spline: Apache Spark Lineage not Only for the Banking Industry with Marek Nov...
Spark Summit
Goal Based Data Production with Sim Simeonov
Goal Based Data Production with Sim Simeonov
Spark Summit
Preventing Revenue Leakage and Monitoring Distributed Systems with Machine Le...
Preventing Revenue Leakage and Monitoring Distributed Systems with Machine Le...
Spark Summit
Getting Ready to Use Redis with Apache Spark with Dvir Volk
Getting Ready to Use Redis with Apache Spark with Dvir Volk
Spark Summit
Deduplication and Author-Disambiguation of Streaming Records via Supervised M...
Deduplication and Author-Disambiguation of Streaming Records via Supervised M...
Spark Summit
MatFast: In-Memory Distributed Matrix Computation Processing and Optimization...
MatFast: In-Memory Distributed Matrix Computation Processing and Optimization...
Spark Summit
More from Spark Summit
(20)
FPGA-Based Acceleration Architecture for Spark SQL Qi Xie and Quanfu Wang
FPGA-Based Acceleration Architecture for Spark SQL Qi Xie and Quanfu Wang
VEGAS: The Missing Matplotlib for Scala/Apache Spark with DB Tsai and Roger M...
VEGAS: The Missing Matplotlib for Scala/Apache Spark with DB Tsai and Roger M...
Apache Spark Structured Streaming Helps Smart Manufacturing with Xiaochang Wu
Apache Spark Structured Streaming Helps Smart Manufacturing with Xiaochang Wu
Improving Traffic Prediction Using Weather Data with Ramya Raghavendra
Improving Traffic Prediction Using Weather Data with Ramya Raghavendra
A Tale of Two Graph Frameworks on Spark: GraphFrames and Tinkerpop OLAP Artem...
A Tale of Two Graph Frameworks on Spark: GraphFrames and Tinkerpop OLAP Artem...
No More Cumbersomeness: Automatic Predictive Modeling on Apache Spark Marcin ...
No More Cumbersomeness: Automatic Predictive Modeling on Apache Spark Marcin ...
Apache Spark and Tensorflow as a Service with Jim Dowling
Apache Spark and Tensorflow as a Service with Jim Dowling
Apache Spark and Tensorflow as a Service with Jim Dowling
Apache Spark and Tensorflow as a Service with Jim Dowling
MMLSpark: Lessons from Building a SparkML-Compatible Machine Learning Library...
MMLSpark: Lessons from Building a SparkML-Compatible Machine Learning Library...
Next CERN Accelerator Logging Service with Jakub Wozniak
Next CERN Accelerator Logging Service with Jakub Wozniak
Powering a Startup with Apache Spark with Kevin Kim
Powering a Startup with Apache Spark with Kevin Kim
Improving Traffic Prediction Using Weather Datawith Ramya Raghavendra
Improving Traffic Prediction Using Weather Datawith Ramya Raghavendra
Hiding Apache Spark Complexity for Fast Prototyping of Big Data Applications—...
Hiding Apache Spark Complexity for Fast Prototyping of Big Data Applications—...
How Nielsen Utilized Databricks for Large-Scale Research and Development with...
How Nielsen Utilized Databricks for Large-Scale Research and Development with...
Spline: Apache Spark Lineage not Only for the Banking Industry with Marek Nov...
Spline: Apache Spark Lineage not Only for the Banking Industry with Marek Nov...
Goal Based Data Production with Sim Simeonov
Goal Based Data Production with Sim Simeonov
Preventing Revenue Leakage and Monitoring Distributed Systems with Machine Le...
Preventing Revenue Leakage and Monitoring Distributed Systems with Machine Le...
Getting Ready to Use Redis with Apache Spark with Dvir Volk
Getting Ready to Use Redis with Apache Spark with Dvir Volk
Deduplication and Author-Disambiguation of Streaming Records via Supervised M...
Deduplication and Author-Disambiguation of Streaming Records via Supervised M...
MatFast: In-Memory Distributed Matrix Computation Processing and Optimization...
MatFast: In-Memory Distributed Matrix Computation Processing and Optimization...
Recently uploaded
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service Bhilai
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service Bhilai
Suhani Kapoor
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM TRACKING WITH GOOGLE ANALYTICS.pptx
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM TRACKING WITH GOOGLE ANALYTICS.pptx
thyngster
VidaXL dropshipping via API with DroFx.pptx
VidaXL dropshipping via API with DroFx.pptx
olyaivanovalion
Ukraine War presentation: KNOW THE BASICS
Ukraine War presentation: KNOW THE BASICS
Aishani27
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
AroojKhan71
Unveiling Insights: The Role of a Data Analyst
Unveiling Insights: The Role of a Data Analyst
Samantha Rae Coolbeth
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
Suhani Kapoor
April 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's Analysis
manisha194592
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
shivangimorya083
Introduction-to-Machine-Learning (1).pptx
Introduction-to-Machine-Learning (1).pptx
firstjob4
꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...
꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...
Call Girls In Delhi Whatsup 9873940964 Enjoy Unlimited Pleasure
Schema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdf
Lars Albertsson
Brighton SEO | April 2024 | Data Storytelling
Brighton SEO | April 2024 | Data Storytelling
Neil Barnes
Midocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFx
olyaivanovalion
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
shivangimorya083
04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationships
ccctableauusergroup
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
Sonatrach
VIP Call Girls Service Charbagh { Lucknow Call Girls Service 9548273370 } Boo...
VIP Call Girls Service Charbagh { Lucknow Call Girls Service 9548273370 } Boo...
Call Girls In Delhi Whatsup 9873940964 Enjoy Unlimited Pleasure
Customer Service Analytics - Make Sense of All Your Data.pptx
Customer Service Analytics - Make Sense of All Your Data.pptx
Emmanuel Dauda
定制英国白金汉大学毕业证(UCB毕业证书) 成绩单原版一比一
定制英国白金汉大学毕业证(UCB毕业证书) 成绩单原版一比一
ffjhghh
Recently uploaded
(20)
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service Bhilai
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service Bhilai
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM TRACKING WITH GOOGLE ANALYTICS.pptx
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM TRACKING WITH GOOGLE ANALYTICS.pptx
VidaXL dropshipping via API with DroFx.pptx
VidaXL dropshipping via API with DroFx.pptx
Ukraine War presentation: KNOW THE BASICS
Ukraine War presentation: KNOW THE BASICS
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Unveiling Insights: The Role of a Data Analyst
Unveiling Insights: The Role of a Data Analyst
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
April 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's Analysis
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
Introduction-to-Machine-Learning (1).pptx
Introduction-to-Machine-Learning (1).pptx
꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...
꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...
Schema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdf
Brighton SEO | April 2024 | Data Storytelling
Brighton SEO | April 2024 | Data Storytelling
Midocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFx
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationships
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
VIP Call Girls Service Charbagh { Lucknow Call Girls Service 9548273370 } Boo...
VIP Call Girls Service Charbagh { Lucknow Call Girls Service 9548273370 } Boo...
Customer Service Analytics - Make Sense of All Your Data.pptx
Customer Service Analytics - Make Sense of All Your Data.pptx
定制英国白金汉大学毕业证(UCB毕业证书) 成绩单原版一比一
定制英国白金汉大学毕业证(UCB毕业证书) 成绩单原版一比一
Spark Summit EU talk by Christos Erotocritou
1.
© 2016 The
Apache Software Foundation. Apache, Apache Ignite, the Apache feather and the Apache Ignite logo are trademarks of The Apache Software Foundation. Better Together: Fast Data with Ignite & Spark Christos Erotocritou - Spark Summit EU 2016
2.
© 2016 GridGain
Systems, Inc. Agenda • GridGain & Apache Ignite Project • Ignite In-Memory Data Fabric • Apache Ignite vs. Apache Spark • Hadoop & Spark Integration • Q & A
3.
© 2016 GridGain
Systems, Inc. Apache Ignite Project • 2007: First version of GridGain • Oct. 2014: GridGain contributes Ignite to ASF • Aug. 2015: Ignite is the second fastest project to graduate after Spark • Today: • 82+ contributors and growing rapidly • Huge development momentum - Estimated 233 years of effort since the first commit in February, 2014 [Openhub] • Mature codebase: 840k+ SLOC & more than 17k commits January 2016
4.
© 2016 GridGain
Systems, Inc. • GridGain Enterprise Edition • Is a binary build of Apache Ignite™ created by GridGain • Added enterprise features for enterprise deployments • Earlier features and bug fixes by a few weeks • Heavily tested
5.
© 2016 GridGain
Systems, Inc. Customer Use Cases Automated Trading Systems Real time analysis of trading positions & market risk. High volume transactions, ultra low latencies. Financial Services Fraud Detection, Risk Analysis, Insurance rating and modelling. Online & Mobile Advertising Real time decisions, geo-targeting & retail traffic information. Big Data Analytics Customer 360 view, real-time analysis of KPIs, up-to- the-second operational BI. Online Gaming Real-time back-ends for mobile and massively parallel games. SaaS Platforms & Apps High performance next-generation architectures for Software as a Service Application vendors. Travel & E-Commerce High performance next-generation architectures for online hotel booking.
6.
© 2016 GridGain
Systems, Inc. What is an IMDF? High-performance distributed in-memory platform for computing and transacting on large-scale data sets in near real-time.
7.
© 2016 GridGain
Systems, Inc. What is an IMDF? ‣ HPC ‣ Machine learning ‣ Risk analysis ‣ Grid computing ‣ HA API Services ‣ Scalable Middleware ‣ Web-session clustering ‣ Distributed caching ‣ In-Memory SQL ‣ Real-time Analytics ‣ Big Data ‣ Monitoring tools ‣ Big Data ‣ Realtime Analytics ‣ Batch processing ‣ Distributed In- Memory File System ‣ Node2Node & Topic-based Messaging ‣ Fault Tolerance ‣ Multiple backups ‣ Cluster groups ‣ Auto Rebalancing ‣ Complex event processing ‣ Event driven design ‣ Distributed queues ‣ Atomic variables ‣ Dist. Semaphore
8.
© 2016 GridGain
Systems, Inc. In-Memory Computing Platform Data Grid Batch Data Compute Grid Transactional & Analytical Workloads Transactional & Analytical workloads Streaming Data External Persistency External APIs Back-end users, third-party clients and downstream systems Downstream Systems Clients accessing a high-speed distributed multi-facet service
9.
© 2016 GridGain
Systems, Inc. Scalability & Resilience with Ignite Data Grid Batch Data Compute Grid Transactional & Analytical Workloads Transactional & Analytical workloads Streaming Data External Persistency External APIs Back-end users, third-party clients and downstream systems Downstream Systems Clients accessing a high-speed distributed multi-facet service
10.
© 2016 GridGain
Systems, Inc. Fault Tolerance & Horizontal Scalability Replicated Cache Partitioned Cache
11.
© 2016 GridGain
Systems, Inc. Local Store & Vertical Scale • Tiered Memory • On-Heap -> Off-Heap -> Disk • Persistent On-Disk Store • Fast Recovery • Local Data Reload • Eliminate Network and Db impacts when reloading in-memory store
12.
© 2016 GridGain
Systems, Inc. Storage and Caching using Ignite Data Grid Batch Data Compute Grid Transactional & Analytical Workloads Transactional & Analytical workloads Streaming Data External Persistency External APIs Back-end users, third-party clients and downstream systems Downstream Systems Clients accessing a high-speed distributed multi-facet service
13.
© 2016 GridGain
Systems, Inc. • 100% JCache Compliant (JSR 107) – Basic Cache Operations – Concurrent Map APIs – Collocated Processing (EntryProcessor) – Events and Metrics – Pluggable Persistence • Ignite Data Grid – Fault Tolerance and Scalability – Distributed Key-Value Store – SQL Queries (ANSI 99) – ACID Transactions – In-Memory Indexes – RDBMS / NoSQL Integration In-Memory Data Grid
14.
© 2016 GridGain
Systems, Inc. Distributed Computing with Ignite Data Grid Batch Data Compute Grid Transactional & Analytical Workloads Transactional & Analytical workloads Streaming Data External Persistency External APIs Back-end users, third-party clients and downstream systems Downstream Systems Clients accessing a high-speed distributed multi-facet service
15.
© 2016 GridGain
Systems, Inc. Client-Server vs. Affinity Colocation 1 2 4 3 Data 1 Job 1 2 3 Data 2 Job 2 Processing Node 1 Processing Node 2 Client Node Data Node 1 Data Node 2 Processing Node 1 1 3 4 Data 1 Data 2 2 2 1. Initial Request 2. Fetch data from remote nodes 3. Process entire data-set 4. Return to client 1. Initial Request 2. Co-locating processing with data 3. Return partial result 4. Reduce & return to client
16.
© 2016 GridGain
Systems, Inc. • Direct API for MapReduce • Cron-like Task Scheduling • State Checkpoints • Load Balancing • Round-robin • Random & weighted • Automatic Failover • Per-node Shared State • Zero Deployment • Distributed class loading In-Memory Compute Grid
17.
© 2016 GridGain
Systems, Inc. Hadoop & Spark Integration
18.
© 2016 GridGain
Systems, Inc. Apache Ignite Apache Spark – Ingests data from HDFS or another distributed file system – Inclined towards analytics (OLAP) and focused on MR-specific payloads – Requires the creation of RDD and data and processing operations are governed by it – Basic disk-based SQL support – Strong ML libraries – Big community – Data source agnostic – Fully fledged compute engine and resilient data storage in-memory for OLAP & OLTP – Zero-deployment – In-Memory SQL support – Fully ACID transactions across memory and disk – Broader in-memory system that is less focused on Hadoop – Off-heap memory to avoid GC pauses – In production since 2007
19.
© 2016 GridGain
Systems, Inc. • IgniteRDD – Share RDD across jobs on the host – Share RDD across jobs in the application – Share RDD globally • Faster SQL – In-Memory Indexes – SQL on top of Shared RDD Spark & Ignite Integration Spark Application Spark Worker Spark Job Spark Job Ignite Node Yarn Mesos Docker Cloud Server Spark Worker Spark Job Spark Job Ignite Node Server Spark Worker Spark Job Spark Job Ignite Node Server In-Memory Shared RDDs
20.
© 2016 GridGain
Systems, Inc. • Reading values from Ignite: • IgniteContext is the main entry point to Spark-Ignite integration: val igniteContext = new IgniteContext[Integer, Integer] (sparkContext, () => new IgniteConfiguration()) val cache = igniteContext.fromCache("myRdd") val result = cache.filter(_._2.contains("Ignite")).collect() val cacheRdd = igniteContext.fromCache("myRdd") cacheRdd.savePairs(sparkContext.parallelize(1 to 10000, 10).map(i => (i, i))) • Saving values to Ignite: • Running SQL queries against Ignite Cache: val cacheRdd = igniteContext.fromCache("myRdd") val result = cacheRdd.sql ("select _val from Integer where val > ? and val < ?", 10, 100) Spark & Ignite Integration: IgniteRDD
21.
© 2016 GridGain
Systems, Inc. Spark Integration: Using Dataframes from IgniteRDDs // Create an IgniteRDD val companyCacheIgnite = new IgniteContext[Int, String](sc, () => new IgniteConfiguration()).fromCache("CompanyCache") // Create company DataFrame val dfCompany = sqlContext.createDataFrame(companyCacheIgnite.map(p => Company(p._1, p._2))) // Register DataFrame as a table dfCompany.registerTempTable("company")
22.
© 2016 GridGain
Systems, Inc. • Ignite In-Memory File System (IGFS) – Hadoop-compliant – Easy to Install – On-Heap and Off-Heap – Caching Layer for HDFS – Write-through and Read-through HDFS – Performance Boost IGFS: In-Memory File System MR HIVE PIG In-Memory MapReduce IGFS HDFS IGFS YARN }Any Hadoop Distro
23.
© 2016 GridGain
Systems, Inc. Hadoop Accelerator: Map Reduce • In-Memory Performance • Zero Code Change • Use existing MR code • Use existing Hive queries • No Name Node • No Network Noise • In-Process Data Colocation • Eager Push Scheduling User Application Hadoop Client Ignite Client Hadoop Jobtracker Hadoop Name Node Hadoop Tasktracker Hadoop Tasktracker Ignite Data Node (IGFS) Ignite Data Node (IGFS) Hadoop Data Node (HDFS) Hadoop Data Node (HDFS) Ignite Path Hadoop Path
24.
© 2016 GridGain
Systems, Inc. • Docker • Amazon AWS • Azure Marketplace • Google Cloud • Apache JClouds • Mesos • YARN • Apache Karaf (OSGi) Deployment
25.
© 2016 GridGain
Systems, Inc. Thank You! www.gridgain.com @gridgain @ApacheIgnite #gridgain #ApacheIgnite Thank you for joining us. Follow the conversation. Author: Christos Erotocritou github.com/kemiz/SparkIgniteSimpleExample
Download now