Submit Search
Upload
Spark Summit EU talk by Shaun Klopfenstein and Neelesh Shastry
•
1 like
•
1,062 views
Spark Summit
Follow
Spark Streaming in a Multitenant World
Read less
Read more
Data & Analytics
Report
Share
Report
Share
1 of 29
Download now
Download to read offline
Recommended
Spark as a Platform to Support Multi-Tenancy and Many Kinds of Data Applicati...
Spark as a Platform to Support Multi-Tenancy and Many Kinds of Data Applicati...
Spark Summit
Effective Spark on Multi-Tenant Clusters
Effective Spark on Multi-Tenant Clusters
DataWorks Summit/Hadoop Summit
Scaling your Data Pipelines with Apache Spark on Kubernetes
Scaling your Data Pipelines with Apache Spark on Kubernetes
Databricks
Demystifying Data Warehouse as a Service
Demystifying Data Warehouse as a Service
Snowflake Computing
The architecture of data analytics PaaS on AWS
The architecture of data analytics PaaS on AWS
Treasure Data, Inc.
Oracle Goldengate for Big Data - LendingClub Implementation
Oracle Goldengate for Big Data - LendingClub Implementation
Vengata Guruswamy
data warehouse vs data lake
data warehouse vs data lake
Polestarsolutions
Admission Control in Impala
Admission Control in Impala
Cloudera, Inc.
Recommended
Spark as a Platform to Support Multi-Tenancy and Many Kinds of Data Applicati...
Spark as a Platform to Support Multi-Tenancy and Many Kinds of Data Applicati...
Spark Summit
Effective Spark on Multi-Tenant Clusters
Effective Spark on Multi-Tenant Clusters
DataWorks Summit/Hadoop Summit
Scaling your Data Pipelines with Apache Spark on Kubernetes
Scaling your Data Pipelines with Apache Spark on Kubernetes
Databricks
Demystifying Data Warehouse as a Service
Demystifying Data Warehouse as a Service
Snowflake Computing
The architecture of data analytics PaaS on AWS
The architecture of data analytics PaaS on AWS
Treasure Data, Inc.
Oracle Goldengate for Big Data - LendingClub Implementation
Oracle Goldengate for Big Data - LendingClub Implementation
Vengata Guruswamy
data warehouse vs data lake
data warehouse vs data lake
Polestarsolutions
Admission Control in Impala
Admission Control in Impala
Cloudera, Inc.
Oracle Real Application Clusters (RAC) 12c Rel. 2 - Operational Best Practices
Oracle Real Application Clusters (RAC) 12c Rel. 2 - Operational Best Practices
Markus Michalewicz
Transactional operations in Apache Hive: present and future
Transactional operations in Apache Hive: present and future
DataWorks Summit
Data platform modernization with Databricks.pptx
Data platform modernization with Databricks.pptx
CalvinSim10
Free Training: How to Build a Lakehouse
Free Training: How to Build a Lakehouse
Databricks
Deep Dive into Apache Kafka
Deep Dive into Apache Kafka
confluent
Improving Data Locality for Spark Jobs on Kubernetes Using Alluxio
Improving Data Locality for Spark Jobs on Kubernetes Using Alluxio
Alluxio, Inc.
Less11 auditing
Less11 auditing
Amit Bhalla
Some Iceberg Basics for Beginners (CDP).pdf
Some Iceberg Basics for Beginners (CDP).pdf
Michael Kogan
Apache Kafka
Apache Kafka
emreakis
Simplifying Real-Time Architectures for IoT with Apache Kudu
Simplifying Real-Time Architectures for IoT with Apache Kudu
Cloudera, Inc.
Exadata master series_asm_2020
Exadata master series_asm_2020
Anil Nair
A Thorough Comparison of Delta Lake, Iceberg and Hudi
A Thorough Comparison of Delta Lake, Iceberg and Hudi
Databricks
Spark + Parquet In Depth: Spark Summit East Talk by Emily Curtin and Robbie S...
Spark + Parquet In Depth: Spark Summit East Talk by Emily Curtin and Robbie S...
Spark Summit
Effective Data Lakes: Challenges and Design Patterns (ANT316) - AWS re:Invent...
Effective Data Lakes: Challenges and Design Patterns (ANT316) - AWS re:Invent...
Amazon Web Services
Efficient Spark Analytics on Encrypted Data with Gidon Gershinsky
Efficient Spark Analytics on Encrypted Data with Gidon Gershinsky
Databricks
Simplify CDC Pipeline with Spark Streaming SQL and Delta Lake
Simplify CDC Pipeline with Spark Streaming SQL and Delta Lake
Databricks
Streaming Data Lakes using Kafka Connect + Apache Hudi | Vinoth Chandar, Apac...
Streaming Data Lakes using Kafka Connect + Apache Hudi | Vinoth Chandar, Apac...
HostedbyConfluent
Size Matters-Best Practices for Trillion Row Datasets on ClickHouse-2202-08-1...
Size Matters-Best Practices for Trillion Row Datasets on ClickHouse-2202-08-1...
Altinity Ltd
Introduction to Apache Kafka
Introduction to Apache Kafka
Shiao-An Yuan
Hello, kafka! (an introduction to apache kafka)
Hello, kafka! (an introduction to apache kafka)
Timothy Spann
Spark Summit EU talk by Tug Grall
Spark Summit EU talk by Tug Grall
Spark Summit
Spark Summit EU talk by Kaarthik Sivashanmugam
Spark Summit EU talk by Kaarthik Sivashanmugam
Spark Summit
More Related Content
What's hot
Oracle Real Application Clusters (RAC) 12c Rel. 2 - Operational Best Practices
Oracle Real Application Clusters (RAC) 12c Rel. 2 - Operational Best Practices
Markus Michalewicz
Transactional operations in Apache Hive: present and future
Transactional operations in Apache Hive: present and future
DataWorks Summit
Data platform modernization with Databricks.pptx
Data platform modernization with Databricks.pptx
CalvinSim10
Free Training: How to Build a Lakehouse
Free Training: How to Build a Lakehouse
Databricks
Deep Dive into Apache Kafka
Deep Dive into Apache Kafka
confluent
Improving Data Locality for Spark Jobs on Kubernetes Using Alluxio
Improving Data Locality for Spark Jobs on Kubernetes Using Alluxio
Alluxio, Inc.
Less11 auditing
Less11 auditing
Amit Bhalla
Some Iceberg Basics for Beginners (CDP).pdf
Some Iceberg Basics for Beginners (CDP).pdf
Michael Kogan
Apache Kafka
Apache Kafka
emreakis
Simplifying Real-Time Architectures for IoT with Apache Kudu
Simplifying Real-Time Architectures for IoT with Apache Kudu
Cloudera, Inc.
Exadata master series_asm_2020
Exadata master series_asm_2020
Anil Nair
A Thorough Comparison of Delta Lake, Iceberg and Hudi
A Thorough Comparison of Delta Lake, Iceberg and Hudi
Databricks
Spark + Parquet In Depth: Spark Summit East Talk by Emily Curtin and Robbie S...
Spark + Parquet In Depth: Spark Summit East Talk by Emily Curtin and Robbie S...
Spark Summit
Effective Data Lakes: Challenges and Design Patterns (ANT316) - AWS re:Invent...
Effective Data Lakes: Challenges and Design Patterns (ANT316) - AWS re:Invent...
Amazon Web Services
Efficient Spark Analytics on Encrypted Data with Gidon Gershinsky
Efficient Spark Analytics on Encrypted Data with Gidon Gershinsky
Databricks
Simplify CDC Pipeline with Spark Streaming SQL and Delta Lake
Simplify CDC Pipeline with Spark Streaming SQL and Delta Lake
Databricks
Streaming Data Lakes using Kafka Connect + Apache Hudi | Vinoth Chandar, Apac...
Streaming Data Lakes using Kafka Connect + Apache Hudi | Vinoth Chandar, Apac...
HostedbyConfluent
Size Matters-Best Practices for Trillion Row Datasets on ClickHouse-2202-08-1...
Size Matters-Best Practices for Trillion Row Datasets on ClickHouse-2202-08-1...
Altinity Ltd
Introduction to Apache Kafka
Introduction to Apache Kafka
Shiao-An Yuan
Hello, kafka! (an introduction to apache kafka)
Hello, kafka! (an introduction to apache kafka)
Timothy Spann
What's hot
(20)
Oracle Real Application Clusters (RAC) 12c Rel. 2 - Operational Best Practices
Oracle Real Application Clusters (RAC) 12c Rel. 2 - Operational Best Practices
Transactional operations in Apache Hive: present and future
Transactional operations in Apache Hive: present and future
Data platform modernization with Databricks.pptx
Data platform modernization with Databricks.pptx
Free Training: How to Build a Lakehouse
Free Training: How to Build a Lakehouse
Deep Dive into Apache Kafka
Deep Dive into Apache Kafka
Improving Data Locality for Spark Jobs on Kubernetes Using Alluxio
Improving Data Locality for Spark Jobs on Kubernetes Using Alluxio
Less11 auditing
Less11 auditing
Some Iceberg Basics for Beginners (CDP).pdf
Some Iceberg Basics for Beginners (CDP).pdf
Apache Kafka
Apache Kafka
Simplifying Real-Time Architectures for IoT with Apache Kudu
Simplifying Real-Time Architectures for IoT with Apache Kudu
Exadata master series_asm_2020
Exadata master series_asm_2020
A Thorough Comparison of Delta Lake, Iceberg and Hudi
A Thorough Comparison of Delta Lake, Iceberg and Hudi
Spark + Parquet In Depth: Spark Summit East Talk by Emily Curtin and Robbie S...
Spark + Parquet In Depth: Spark Summit East Talk by Emily Curtin and Robbie S...
Effective Data Lakes: Challenges and Design Patterns (ANT316) - AWS re:Invent...
Effective Data Lakes: Challenges and Design Patterns (ANT316) - AWS re:Invent...
Efficient Spark Analytics on Encrypted Data with Gidon Gershinsky
Efficient Spark Analytics on Encrypted Data with Gidon Gershinsky
Simplify CDC Pipeline with Spark Streaming SQL and Delta Lake
Simplify CDC Pipeline with Spark Streaming SQL and Delta Lake
Streaming Data Lakes using Kafka Connect + Apache Hudi | Vinoth Chandar, Apac...
Streaming Data Lakes using Kafka Connect + Apache Hudi | Vinoth Chandar, Apac...
Size Matters-Best Practices for Trillion Row Datasets on ClickHouse-2202-08-1...
Size Matters-Best Practices for Trillion Row Datasets on ClickHouse-2202-08-1...
Introduction to Apache Kafka
Introduction to Apache Kafka
Hello, kafka! (an introduction to apache kafka)
Hello, kafka! (an introduction to apache kafka)
Viewers also liked
Spark Summit EU talk by Tug Grall
Spark Summit EU talk by Tug Grall
Spark Summit
Spark Summit EU talk by Kaarthik Sivashanmugam
Spark Summit EU talk by Kaarthik Sivashanmugam
Spark Summit
Spark Summit EU talk by Miklos Christine paddling up the stream
Spark Summit EU talk by Miklos Christine paddling up the stream
Spark Summit
Integrating Elastic and Apache Spark - Elastic London Meetup (2015-09-24)
Integrating Elastic and Apache Spark - Elastic London Meetup (2015-09-24)
Neil Andrassy
2015-06-15 Large-Scale Elastic-Net Regularized Generalized Linear Models at S...
2015-06-15 Large-Scale Elastic-Net Regularized Generalized Linear Models at S...
DB Tsai
2014 spark with elastic search
2014 spark with elastic search
Henry Saputra
Spark Summit EU talk by Ram Sriharsha and Vlad Feinberg
Spark Summit EU talk by Ram Sriharsha and Vlad Feinberg
Spark Summit
Spark Summit EU talk by Sebastian Schroeder and Ralf Sigmund
Spark Summit EU talk by Sebastian Schroeder and Ralf Sigmund
Spark Summit
Building a Dataset Search Engine with Spark and Elasticsearch: Spark Summit E...
Building a Dataset Search Engine with Spark and Elasticsearch: Spark Summit E...
Spark Summit
IoT and the Autonomous Vehicle in the Clouds: Simultaneous Localization and M...
IoT and the Autonomous Vehicle in the Clouds: Simultaneous Localization and M...
Spark Summit
Elasticsearch and Spark
Elasticsearch and Spark
Audible, Inc.
Elasticsearch PHP UG BG
Elasticsearch PHP UG BG
Nikolay Ignatov
Spark Summit EU talk by Ahsan Javed Awan
Spark Summit EU talk by Ahsan Javed Awan
Spark Summit
Spark Summit EU talk by Bas Geerdink
Spark Summit EU talk by Bas Geerdink
Spark Summit
Spark Summit EU talk by Berni Schiefer
Spark Summit EU talk by Berni Schiefer
Spark Summit
Spark Summit EU talk by Luca Canali
Spark Summit EU talk by Luca Canali
Spark Summit
963
963
Annu Ahmed
Using Apache Spark for Intelligent Services: Keynote at Spark Summit East by ...
Using Apache Spark for Intelligent Services: Keynote at Spark Summit East by ...
Spark Summit
Spark Summit EU talk by Emlyn Whittick
Spark Summit EU talk by Emlyn Whittick
Spark Summit
Spark Summit EU talk by Debasish Das and Pramod Narasimha
Spark Summit EU talk by Debasish Das and Pramod Narasimha
Spark Summit
Viewers also liked
(20)
Spark Summit EU talk by Tug Grall
Spark Summit EU talk by Tug Grall
Spark Summit EU talk by Kaarthik Sivashanmugam
Spark Summit EU talk by Kaarthik Sivashanmugam
Spark Summit EU talk by Miklos Christine paddling up the stream
Spark Summit EU talk by Miklos Christine paddling up the stream
Integrating Elastic and Apache Spark - Elastic London Meetup (2015-09-24)
Integrating Elastic and Apache Spark - Elastic London Meetup (2015-09-24)
2015-06-15 Large-Scale Elastic-Net Regularized Generalized Linear Models at S...
2015-06-15 Large-Scale Elastic-Net Regularized Generalized Linear Models at S...
2014 spark with elastic search
2014 spark with elastic search
Spark Summit EU talk by Ram Sriharsha and Vlad Feinberg
Spark Summit EU talk by Ram Sriharsha and Vlad Feinberg
Spark Summit EU talk by Sebastian Schroeder and Ralf Sigmund
Spark Summit EU talk by Sebastian Schroeder and Ralf Sigmund
Building a Dataset Search Engine with Spark and Elasticsearch: Spark Summit E...
Building a Dataset Search Engine with Spark and Elasticsearch: Spark Summit E...
IoT and the Autonomous Vehicle in the Clouds: Simultaneous Localization and M...
IoT and the Autonomous Vehicle in the Clouds: Simultaneous Localization and M...
Elasticsearch and Spark
Elasticsearch and Spark
Elasticsearch PHP UG BG
Elasticsearch PHP UG BG
Spark Summit EU talk by Ahsan Javed Awan
Spark Summit EU talk by Ahsan Javed Awan
Spark Summit EU talk by Bas Geerdink
Spark Summit EU talk by Bas Geerdink
Spark Summit EU talk by Berni Schiefer
Spark Summit EU talk by Berni Schiefer
Spark Summit EU talk by Luca Canali
Spark Summit EU talk by Luca Canali
963
963
Using Apache Spark for Intelligent Services: Keynote at Spark Summit East by ...
Using Apache Spark for Intelligent Services: Keynote at Spark Summit East by ...
Spark Summit EU talk by Emlyn Whittick
Spark Summit EU talk by Emlyn Whittick
Spark Summit EU talk by Debasish Das and Pramod Narasimha
Spark Summit EU talk by Debasish Das and Pramod Narasimha
Similar to Spark Summit EU talk by Shaun Klopfenstein and Neelesh Shastry
Accelerating the Data to Value Journey
Accelerating the Data to Value Journey
Denodo
Business driven IT design
Business driven IT design
Chris Haddad
Using Web Data to Drive Revenue and Reduce Costs
Using Web Data to Drive Revenue and Reduce Costs
Connotate
Using Web Data to Drive Revenue and Reduce Costs
Using Web Data to Drive Revenue and Reduce Costs
Connotate
Business Driven IT Design
Business Driven IT Design
WSO2
Analytics in the Cloud and the ROI for B2B
Analytics in the Cloud and the ROI for B2B
Veronica Kirn
Overcoming Barriers to the Cloud
Overcoming Barriers to the Cloud
Andy Milsark
Cio by request full proposal
Cio by request full proposal
CIO_By_Request
Rebuilding Web Tracking Infrastructure for Scale
Rebuilding Web Tracking Infrastructure for Scale
DataWorks Summit/Hadoop Summit
Assessing New Databases– Translytical Use Cases
Assessing New Databases– Translytical Use Cases
DATAVERSITY
How To Position Cloud
How To Position Cloud
Vishal Sharma
2016 DSG Webinar Azure HDInsight 2 V4
2016 DSG Webinar Azure HDInsight 2 V4
Janani Eshwaran
2016 DSG Webinar Azure HDInsight 2 V4
2016 DSG Webinar Azure HDInsight 2 V4
Janani Eshwaran
Cloud Computing for CPAs: What Your Client Will Ask You
Cloud Computing for CPAs: What Your Client Will Ask You
Wipfli LLP/Brittenford Systems Inc.
Cloud Technology and Your Printing Business
Cloud Technology and Your Printing Business
CPLLC2016
Cloud Services Brokerage Demystified
Cloud Services Brokerage Demystified
Zach Gardner
Accelerate Digital Transformation with Data Virtualization in Banking, Financ...
Accelerate Digital Transformation with Data Virtualization in Banking, Financ...
Denodo
THT10839_OpenWorldSF2015 CSP Location Data Monetization V1.0
THT10839_OpenWorldSF2015 CSP Location Data Monetization V1.0
Srini Alavala
2018-09-20 Accounting Systems Comparison Seminar
2018-09-20 Accounting Systems Comparison Seminar
Raffa Learning Community
eFolder Expert Series Webinar — Profiting As Your Clients Move to the Cloud F...
eFolder Expert Series Webinar — Profiting As Your Clients Move to the Cloud F...
eFolder
Similar to Spark Summit EU talk by Shaun Klopfenstein and Neelesh Shastry
(20)
Accelerating the Data to Value Journey
Accelerating the Data to Value Journey
Business driven IT design
Business driven IT design
Using Web Data to Drive Revenue and Reduce Costs
Using Web Data to Drive Revenue and Reduce Costs
Using Web Data to Drive Revenue and Reduce Costs
Using Web Data to Drive Revenue and Reduce Costs
Business Driven IT Design
Business Driven IT Design
Analytics in the Cloud and the ROI for B2B
Analytics in the Cloud and the ROI for B2B
Overcoming Barriers to the Cloud
Overcoming Barriers to the Cloud
Cio by request full proposal
Cio by request full proposal
Rebuilding Web Tracking Infrastructure for Scale
Rebuilding Web Tracking Infrastructure for Scale
Assessing New Databases– Translytical Use Cases
Assessing New Databases– Translytical Use Cases
How To Position Cloud
How To Position Cloud
2016 DSG Webinar Azure HDInsight 2 V4
2016 DSG Webinar Azure HDInsight 2 V4
2016 DSG Webinar Azure HDInsight 2 V4
2016 DSG Webinar Azure HDInsight 2 V4
Cloud Computing for CPAs: What Your Client Will Ask You
Cloud Computing for CPAs: What Your Client Will Ask You
Cloud Technology and Your Printing Business
Cloud Technology and Your Printing Business
Cloud Services Brokerage Demystified
Cloud Services Brokerage Demystified
Accelerate Digital Transformation with Data Virtualization in Banking, Financ...
Accelerate Digital Transformation with Data Virtualization in Banking, Financ...
THT10839_OpenWorldSF2015 CSP Location Data Monetization V1.0
THT10839_OpenWorldSF2015 CSP Location Data Monetization V1.0
2018-09-20 Accounting Systems Comparison Seminar
2018-09-20 Accounting Systems Comparison Seminar
eFolder Expert Series Webinar — Profiting As Your Clients Move to the Cloud F...
eFolder Expert Series Webinar — Profiting As Your Clients Move to the Cloud F...
More from Spark Summit
FPGA-Based Acceleration Architecture for Spark SQL Qi Xie and Quanfu Wang
FPGA-Based Acceleration Architecture for Spark SQL Qi Xie and Quanfu Wang
Spark Summit
VEGAS: The Missing Matplotlib for Scala/Apache Spark with DB Tsai and Roger M...
VEGAS: The Missing Matplotlib for Scala/Apache Spark with DB Tsai and Roger M...
Spark Summit
Apache Spark Structured Streaming Helps Smart Manufacturing with Xiaochang Wu
Apache Spark Structured Streaming Helps Smart Manufacturing with Xiaochang Wu
Spark Summit
Improving Traffic Prediction Using Weather Data with Ramya Raghavendra
Improving Traffic Prediction Using Weather Data with Ramya Raghavendra
Spark Summit
A Tale of Two Graph Frameworks on Spark: GraphFrames and Tinkerpop OLAP Artem...
A Tale of Two Graph Frameworks on Spark: GraphFrames and Tinkerpop OLAP Artem...
Spark Summit
No More Cumbersomeness: Automatic Predictive Modeling on Apache Spark Marcin ...
No More Cumbersomeness: Automatic Predictive Modeling on Apache Spark Marcin ...
Spark Summit
Apache Spark and Tensorflow as a Service with Jim Dowling
Apache Spark and Tensorflow as a Service with Jim Dowling
Spark Summit
Apache Spark and Tensorflow as a Service with Jim Dowling
Apache Spark and Tensorflow as a Service with Jim Dowling
Spark Summit
MMLSpark: Lessons from Building a SparkML-Compatible Machine Learning Library...
MMLSpark: Lessons from Building a SparkML-Compatible Machine Learning Library...
Spark Summit
Next CERN Accelerator Logging Service with Jakub Wozniak
Next CERN Accelerator Logging Service with Jakub Wozniak
Spark Summit
Powering a Startup with Apache Spark with Kevin Kim
Powering a Startup with Apache Spark with Kevin Kim
Spark Summit
Improving Traffic Prediction Using Weather Datawith Ramya Raghavendra
Improving Traffic Prediction Using Weather Datawith Ramya Raghavendra
Spark Summit
Hiding Apache Spark Complexity for Fast Prototyping of Big Data Applications—...
Hiding Apache Spark Complexity for Fast Prototyping of Big Data Applications—...
Spark Summit
How Nielsen Utilized Databricks for Large-Scale Research and Development with...
How Nielsen Utilized Databricks for Large-Scale Research and Development with...
Spark Summit
Spline: Apache Spark Lineage not Only for the Banking Industry with Marek Nov...
Spline: Apache Spark Lineage not Only for the Banking Industry with Marek Nov...
Spark Summit
Goal Based Data Production with Sim Simeonov
Goal Based Data Production with Sim Simeonov
Spark Summit
Preventing Revenue Leakage and Monitoring Distributed Systems with Machine Le...
Preventing Revenue Leakage and Monitoring Distributed Systems with Machine Le...
Spark Summit
Getting Ready to Use Redis with Apache Spark with Dvir Volk
Getting Ready to Use Redis with Apache Spark with Dvir Volk
Spark Summit
Deduplication and Author-Disambiguation of Streaming Records via Supervised M...
Deduplication and Author-Disambiguation of Streaming Records via Supervised M...
Spark Summit
MatFast: In-Memory Distributed Matrix Computation Processing and Optimization...
MatFast: In-Memory Distributed Matrix Computation Processing and Optimization...
Spark Summit
More from Spark Summit
(20)
FPGA-Based Acceleration Architecture for Spark SQL Qi Xie and Quanfu Wang
FPGA-Based Acceleration Architecture for Spark SQL Qi Xie and Quanfu Wang
VEGAS: The Missing Matplotlib for Scala/Apache Spark with DB Tsai and Roger M...
VEGAS: The Missing Matplotlib for Scala/Apache Spark with DB Tsai and Roger M...
Apache Spark Structured Streaming Helps Smart Manufacturing with Xiaochang Wu
Apache Spark Structured Streaming Helps Smart Manufacturing with Xiaochang Wu
Improving Traffic Prediction Using Weather Data with Ramya Raghavendra
Improving Traffic Prediction Using Weather Data with Ramya Raghavendra
A Tale of Two Graph Frameworks on Spark: GraphFrames and Tinkerpop OLAP Artem...
A Tale of Two Graph Frameworks on Spark: GraphFrames and Tinkerpop OLAP Artem...
No More Cumbersomeness: Automatic Predictive Modeling on Apache Spark Marcin ...
No More Cumbersomeness: Automatic Predictive Modeling on Apache Spark Marcin ...
Apache Spark and Tensorflow as a Service with Jim Dowling
Apache Spark and Tensorflow as a Service with Jim Dowling
Apache Spark and Tensorflow as a Service with Jim Dowling
Apache Spark and Tensorflow as a Service with Jim Dowling
MMLSpark: Lessons from Building a SparkML-Compatible Machine Learning Library...
MMLSpark: Lessons from Building a SparkML-Compatible Machine Learning Library...
Next CERN Accelerator Logging Service with Jakub Wozniak
Next CERN Accelerator Logging Service with Jakub Wozniak
Powering a Startup with Apache Spark with Kevin Kim
Powering a Startup with Apache Spark with Kevin Kim
Improving Traffic Prediction Using Weather Datawith Ramya Raghavendra
Improving Traffic Prediction Using Weather Datawith Ramya Raghavendra
Hiding Apache Spark Complexity for Fast Prototyping of Big Data Applications—...
Hiding Apache Spark Complexity for Fast Prototyping of Big Data Applications—...
How Nielsen Utilized Databricks for Large-Scale Research and Development with...
How Nielsen Utilized Databricks for Large-Scale Research and Development with...
Spline: Apache Spark Lineage not Only for the Banking Industry with Marek Nov...
Spline: Apache Spark Lineage not Only for the Banking Industry with Marek Nov...
Goal Based Data Production with Sim Simeonov
Goal Based Data Production with Sim Simeonov
Preventing Revenue Leakage and Monitoring Distributed Systems with Machine Le...
Preventing Revenue Leakage and Monitoring Distributed Systems with Machine Le...
Getting Ready to Use Redis with Apache Spark with Dvir Volk
Getting Ready to Use Redis with Apache Spark with Dvir Volk
Deduplication and Author-Disambiguation of Streaming Records via Supervised M...
Deduplication and Author-Disambiguation of Streaming Records via Supervised M...
MatFast: In-Memory Distributed Matrix Computation Processing and Optimization...
MatFast: In-Memory Distributed Matrix Computation Processing and Optimization...
Recently uploaded
AI Imagen for data-storytelling Infographics.pdf
AI Imagen for data-storytelling Infographics.pdf
MichaelSenkow
Machine Learning for Accident Severity Prediction
Machine Learning for Accident Severity Prediction
Boston Institute of Analytics
Supply chain analytics to combat the effects of Ukraine-Russia-conflict
Supply chain analytics to combat the effects of Ukraine-Russia-conflict
Jack Cole
一比一原版加利福尼亚大学尔湾分校毕业证成绩单如何办理
一比一原版加利福尼亚大学尔湾分校毕业证成绩单如何办理
pyhepag
一比一原版西悉尼大学毕业证成绩单如何办理
一比一原版西悉尼大学毕业证成绩单如何办理
pyhepag
2024 Q2 Orange County (CA) Tableau User Group Meeting
2024 Q2 Orange County (CA) Tableau User Group Meeting
Alison Pitt
一比一原版纽卡斯尔大学毕业证成绩单如何办理
一比一原版纽卡斯尔大学毕业证成绩单如何办理
cyebo
一比一原版阿德莱德大学毕业证成绩单如何办理
一比一原版阿德莱德大学毕业证成绩单如何办理
pyhepag
Atlantic Grupa Case Study (Mintec Data AI)
Atlantic Grupa Case Study (Mintec Data AI)
Jon Hansen
一比一原版(Monash毕业证书)莫纳什大学毕业证成绩单如何办理
一比一原版(Monash毕业证书)莫纳什大学毕业证成绩单如何办理
pyhepag
Webinar One View, Multiple Systems No-Code Integration of Salesforce and ERPs
Webinar One View, Multiple Systems No-Code Integration of Salesforce and ERPs
CEPTES Software Inc
2024 Q1 Tableau User Group Leader Quarterly Call
2024 Q1 Tableau User Group Leader Quarterly Call
lward7
Exploratory Data Analysis - Dilip S.pptx
Exploratory Data Analysis - Dilip S.pptx
DilipVasan
how can i exchange pi coins for others currency like Bitcoin
how can i exchange pi coins for others currency like Bitcoin
DOT TECH
basics of data science with application areas.pdf
basics of data science with application areas.pdf
vyankatesh1
How I opened a fake bank account and didn't go to prison
How I opened a fake bank account and didn't go to prison
Payment Village
How can I successfully sell my pi coins in Philippines?
How can I successfully sell my pi coins in Philippines?
DOT TECH
Pre-ProductionImproveddsfjgndflghtgg.pptx
Pre-ProductionImproveddsfjgndflghtgg.pptx
Stephen266013
Easy and simple project file on mp online
Easy and simple project file on mp online
balibahu1313
Slip-and-fall Injuries: Top Workers' Comp Claims
Slip-and-fall Injuries: Top Workers' Comp Claims
Bisnar Chase Personal Injury Attorneys
Recently uploaded
(20)
AI Imagen for data-storytelling Infographics.pdf
AI Imagen for data-storytelling Infographics.pdf
Machine Learning for Accident Severity Prediction
Machine Learning for Accident Severity Prediction
Supply chain analytics to combat the effects of Ukraine-Russia-conflict
Supply chain analytics to combat the effects of Ukraine-Russia-conflict
一比一原版加利福尼亚大学尔湾分校毕业证成绩单如何办理
一比一原版加利福尼亚大学尔湾分校毕业证成绩单如何办理
一比一原版西悉尼大学毕业证成绩单如何办理
一比一原版西悉尼大学毕业证成绩单如何办理
2024 Q2 Orange County (CA) Tableau User Group Meeting
2024 Q2 Orange County (CA) Tableau User Group Meeting
一比一原版纽卡斯尔大学毕业证成绩单如何办理
一比一原版纽卡斯尔大学毕业证成绩单如何办理
一比一原版阿德莱德大学毕业证成绩单如何办理
一比一原版阿德莱德大学毕业证成绩单如何办理
Atlantic Grupa Case Study (Mintec Data AI)
Atlantic Grupa Case Study (Mintec Data AI)
一比一原版(Monash毕业证书)莫纳什大学毕业证成绩单如何办理
一比一原版(Monash毕业证书)莫纳什大学毕业证成绩单如何办理
Webinar One View, Multiple Systems No-Code Integration of Salesforce and ERPs
Webinar One View, Multiple Systems No-Code Integration of Salesforce and ERPs
2024 Q1 Tableau User Group Leader Quarterly Call
2024 Q1 Tableau User Group Leader Quarterly Call
Exploratory Data Analysis - Dilip S.pptx
Exploratory Data Analysis - Dilip S.pptx
how can i exchange pi coins for others currency like Bitcoin
how can i exchange pi coins for others currency like Bitcoin
basics of data science with application areas.pdf
basics of data science with application areas.pdf
How I opened a fake bank account and didn't go to prison
How I opened a fake bank account and didn't go to prison
How can I successfully sell my pi coins in Philippines?
How can I successfully sell my pi coins in Philippines?
Pre-ProductionImproveddsfjgndflghtgg.pptx
Pre-ProductionImproveddsfjgndflghtgg.pptx
Easy and simple project file on mp online
Easy and simple project file on mp online
Slip-and-fall Injuries: Top Workers' Comp Claims
Slip-and-fall Injuries: Top Workers' Comp Claims
Spark Summit EU talk by Shaun Klopfenstein and Neelesh Shastry
1.
Elastic Streaming Spark Streaming
+ Dynamic Provisioning + Dynamic Allocation Neelesh Shastry, Architect Shaun Klopfenstein, CTO
2.
The Vision
3.
Requirements
4.
Page 4 Marketo Proprietary
and Confidential | © Marketo, Inc. 10/30/16 Business Requirements • Near real-time activity processing • Billons activities per customer per day • Improve cost efficiency of operations while scaling up • Global enterprise grade security and governance
5.
Page 5 Marketo Proprietary
and Confidential | © Marketo, Inc. 10/30/16 SAAS Requirements • Customers are added and removed • Fairness and throttling per customer • Strict sequential event processing for some applications • Temporarily suspend a customer, when errors occur
6.
Technology Selection
7.
Page 7 Marketo Proprietary
and Confidential | © Marketo, Inc. 10/30/16 Use Cases • React to activities • Send an email when someone visits a web page • Change the score when someone fills a form • Replicate data • Build Solr Indexes, near real-time • Update DataXChange – an internal lead cache • Sync to/from CRM Systems • Analytics • Incrementally update email reports • Enrich activities and feed to Druid for advanced email/web reports
8.
Page 8 Marketo Proprietary
and Confidential | © Marketo, Inc. 10/30/16 Why Spark Streaming? • Micro-batching provides sink-side efficiencies • Great integration with Kafka • No strict real time processing requirements • Great community, industry adoption
9.
Page 9 Marketo Proprietary
and Confidential | © Marketo, Inc. 10/30/16 Challenges with Spark + Kafka • No way to add/remove topics on the fly • No out of the box support for sequencing RDDs • No support for turning off topics under errors • Does not play well with scaling Kafka partitions up/down, when ordering is required
10.
Page 10 Marketo Proprietary
and Confidential | © Marketo, Inc. 10/30/16 Challenges - Stragglers • A batch can’t complete until the slowest operation finishes • Many of our batches include slow operations • Sometimes don’t complete within the batch time • Batches are multitenant • one customers operation can delay processing for other customers in the same batch • Severe impact on utilization & batch delay
11.
Architecture & Design
12.
Page 12 Marketo Proprietary
and Confidential | © Marketo, Inc. 10/30/16 Marketo Activity Architecture
13.
Page 13 Marketo Proprietary
and Confidential | © Marketo, Inc. 10/30/16 Kafka Topics Organization • One topic per use case, data from all customers • Easy to manage • A single customer can create backlogs for others during activity storms • Fairness/throttling is hard to implement • One topic per use case, per customer • Storms are isolated to the customer • Fairness/throttling is easy to control, by tweaking the topic • Pressure on Kafka ZK – so far not a problem
14.
Solutions
15.
Page 15 Marketo Proprietary
and Confidential | © Marketo, Inc. 10/30/16 Dynamic provisioning capacity Job Generator DAG Scheduler Executor 1 Executor 2 Multitenant Kafka DStream Offset Manager Provisioning Framework Customer Registry Add/Remove Check & Pull Changes compute#Get new offsets Generate RDD Submit Job Schedule Tasks
16.
Page 16 Marketo Proprietary
and Confidential | © Marketo, Inc. 10/30/16 Marketo Offset Manager • Tracks multitenancy • Streaming Jobs process data for many customers • Accessing multiple Kafka topics and partitions • Adds new topics • Remove/Deactivate/Suspend topics
17.
Page 17 Marketo Proprietary
and Confidential | © Marketo, Inc. 10/30/16 • Enables efficient multitenant RDDs • Controlled sequencing of RDDs • Coalesce Kafka partitions • Bin-packing for efficiency • Maintains partition lineage for offset management Multitenant DStream
18.
Page 18 Marketo Proprietary
and Confidential | © Marketo, Inc. 10/30/16 Provisioning • Manages allocating customers to a spark streaming application • round robin + resource affinity • Enables rebalancing of customers across spark streaming jobs • Oozie based framework
19.
Page 19 Marketo Proprietary
and Confidential | © Marketo, Inc. 10/30/16 Dynamic Resource Allocation • SPARK-12133 • Goal – “make processing time infinitely close to duration” • Assumes tasks are roughly similar • Stragglers throw this goal off • What we really want : • DRA + Safe concurrent job execution
20.
Page 20 Marketo Proprietary
and Confidential | © Marketo, Inc. 10/30/16 Results so far • ~ 10 different use cases • > 100 Spark Executors • >1000 Kafka Partitions • Processing latencies < 5s (99th %) • Rolled out to ~20% customers
21.
Future Work
22.
Page 22 Marketo Proprietary
and Confidential | © Marketo, Inc. 10/30/16 Application Scheduling • Scheduling within an application to handle stragglers • spark.streaming.concurrentJobs • Exploring scheduler pools • Changes to Streaming Job Scheduler, to execute multiple RDDs safely
23.
Page 23 Marketo Proprietary
and Confidential | © Marketo, Inc. 10/30/16 Scaling Up Kafka Partitions • Our customers grow in size over a period of time • Ordering requirements mean we cannot alter topic on the fly • Coordination required on both producer & consumer fronts • Enhance provisionerto manage partition up/down scaling
24.
Page 24 Marketo Proprietary
and Confidential | © Marketo, Inc. 10/30/16 Move to 2.x and Open Source!
25.
We’re Hiring! Http://Marketo.Jobs Q &
A
26.
Q & A
27.
Page 27 Marketo Proprietary
and Confidential | © Marketo, Inc. 10/30/16 Architecture Requirements • Maximize utilization of hardware • Multitenancy support with fairness • Encryption, Authorization & Authentication • Applications must scale horizontally
28.
Deploying It
29.
Running It
Download now