SlideShare a Scribd company logo

Building a Streaming Microservices Architecture - Data + AI Summit EU 2020

Databricks
Databricks
DatabricksDeveloper Marketing and Relations at MuleSoft

Jules Damji and Denny Lee from Databricks Developer Relations will recap some keynote highlights, and each will briefly present personal picks from sessions that resonated well with them. Next, Jacek Laskowski, an independent consultant, will speak about Spark 3.0 internals, and Scott Haines from Twilio, Inc. will give a talk about structured streaming microservice architectures. This live coding session and technical deep dive are not to be missed!

Building a Streaming Microservices Architecture - Data + AI Summit EU 2020

1 of 34
Download to read offline
© 2019 TWILIO INC. ALL RIGHTS RESERVED.
Building a Streaming
Microservices
Architecture
With Apache Spark Structured Streaming & Friends
Scott Haines
Senior Principal Software Engineer, Twilio
@newfront
© 2019 TWILIO INC. ALL RIGHTS RESERVED.
A little about me.
• I work at Twilio building massive data systems
• I run a bi-weekly internal Spark Office Hours where I offer
training and guidance to teams at the company
• >12 years working on large distributed analytics systems
@newfront
© 2019 TWILIO INC. ALL RIGHTS RESERVED.
A little about me.
• I work at Twilio building massive data systems
• I run a bi-weekly internal Spark Office Hours where I offer
training and guidance to teams at the company
• >12 years working on large distributed analytics systems
• Published work on Distributed Analytics Systems
@newfront
© 2019 TWILIO INC. ALL RIGHTS RESERVED.
Voice Insights
Accountable observability data and
interactive analytics and insights for
the voice business and customers.
VIRGINIA, USA
DUBLIN, IRELAND
SINGAPORE
SYDNEY, AUSTRALIA
TOKYO, JAPAN
SAO PAULO, BRAZIL
DATA CENT ERS
© 2019 TWILIO INC. ALL RIGHTS RESERVED.
Events per Second
>1MIL
© 2019 TWILIO INC. ALL RIGHTS RESERVED.
Let’s Build a Reliable Data Architecture

Recommended

Building a Cross Cloud Data Protection Engine
Building a Cross Cloud Data Protection EngineBuilding a Cross Cloud Data Protection Engine
Building a Cross Cloud Data Protection EngineDatabricks
 
Scale and Optimize Data Engineering Pipelines with Software Engineering Best ...
Scale and Optimize Data Engineering Pipelines with Software Engineering Best ...Scale and Optimize Data Engineering Pipelines with Software Engineering Best ...
Scale and Optimize Data Engineering Pipelines with Software Engineering Best ...Databricks
 
Data Driven Decisions at Scale
Data Driven Decisions at ScaleData Driven Decisions at Scale
Data Driven Decisions at ScaleDatabricks
 
MaaS (Model as a Service): Modern Streaming Data Science with Apache Metron (...
MaaS (Model as a Service): Modern Streaming Data Science with Apache Metron (...MaaS (Model as a Service): Modern Streaming Data Science with Apache Metron (...
MaaS (Model as a Service): Modern Streaming Data Science with Apache Metron (...DataWorks Summit
 
Real-Time Robot Predictive Maintenance in Action
Real-Time Robot Predictive Maintenance in ActionReal-Time Robot Predictive Maintenance in Action
Real-Time Robot Predictive Maintenance in ActionDataWorks Summit
 
Automated Testing For Protecting Data Pipelines from Undocumented Assumptions
Automated Testing For Protecting Data Pipelines from Undocumented AssumptionsAutomated Testing For Protecting Data Pipelines from Undocumented Assumptions
Automated Testing For Protecting Data Pipelines from Undocumented AssumptionsDatabricks
 
How to leverage Kafka data streams with Neo4j
How to leverage Kafka data streams with Neo4jHow to leverage Kafka data streams with Neo4j
How to leverage Kafka data streams with Neo4jGraphRM
 
Building Robust Production Data Pipelines with Databricks Delta
Building Robust Production Data Pipelines with Databricks DeltaBuilding Robust Production Data Pipelines with Databricks Delta
Building Robust Production Data Pipelines with Databricks DeltaDatabricks
 

More Related Content

What's hot

Qlik and Confluent Success Stories with Kafka - How Generali and Skechers Kee...
Qlik and Confluent Success Stories with Kafka - How Generali and Skechers Kee...Qlik and Confluent Success Stories with Kafka - How Generali and Skechers Kee...
Qlik and Confluent Success Stories with Kafka - How Generali and Skechers Kee...HostedbyConfluent
 
Democratizing data science Using spark, hive and druid
Democratizing data science Using spark, hive and druidDemocratizing data science Using spark, hive and druid
Democratizing data science Using spark, hive and druidDataWorks Summit
 
Delta Lake: Open Source Reliability w/ Apache Spark
Delta Lake: Open Source Reliability w/ Apache SparkDelta Lake: Open Source Reliability w/ Apache Spark
Delta Lake: Open Source Reliability w/ Apache SparkGeorge Chow
 
Big Data Day LA 2016/ Hadoop/ Spark/ Kafka track - Building an Event-oriented...
Big Data Day LA 2016/ Hadoop/ Spark/ Kafka track - Building an Event-oriented...Big Data Day LA 2016/ Hadoop/ Spark/ Kafka track - Building an Event-oriented...
Big Data Day LA 2016/ Hadoop/ Spark/ Kafka track - Building an Event-oriented...Data Con LA
 
Building Sessionization Pipeline at Scale with Databricks Delta
Building Sessionization Pipeline at Scale with Databricks DeltaBuilding Sessionization Pipeline at Scale with Databricks Delta
Building Sessionization Pipeline at Scale with Databricks DeltaDatabricks
 
Data Privacy with Apache Spark: Defensive and Offensive Approaches
Data Privacy with Apache Spark: Defensive and Offensive ApproachesData Privacy with Apache Spark: Defensive and Offensive Approaches
Data Privacy with Apache Spark: Defensive and Offensive ApproachesDatabricks
 
Unifying Streaming and Historical Telemetry Data For Real-time Performance Re...
Unifying Streaming and Historical Telemetry Data For Real-time Performance Re...Unifying Streaming and Historical Telemetry Data For Real-time Performance Re...
Unifying Streaming and Historical Telemetry Data For Real-time Performance Re...Databricks
 
Successful AI/ML Projects with End-to-End Cloud Data Engineering
Successful AI/ML Projects with End-to-End Cloud Data EngineeringSuccessful AI/ML Projects with End-to-End Cloud Data Engineering
Successful AI/ML Projects with End-to-End Cloud Data EngineeringDatabricks
 
Reliable Data Intestion in BigData / IoT
Reliable Data Intestion in BigData / IoTReliable Data Intestion in BigData / IoT
Reliable Data Intestion in BigData / IoTGuido Schmutz
 
Apache Spark in Scientific Applciations
Apache Spark in Scientific ApplciationsApache Spark in Scientific Applciations
Apache Spark in Scientific ApplciationsDr. Mirko Kämpf
 
How Workato creates robust data pipelines and automations for you?
How Workato creates robust data pipelines and automations for you?How Workato creates robust data pipelines and automations for you?
How Workato creates robust data pipelines and automations for you?Jeraldine Phneah
 
ML, Statistics, and Spark with Databricks for Maximizing Revenue in a Delayed...
ML, Statistics, and Spark with Databricks for Maximizing Revenue in a Delayed...ML, Statistics, and Spark with Databricks for Maximizing Revenue in a Delayed...
ML, Statistics, and Spark with Databricks for Maximizing Revenue in a Delayed...Databricks
 
Redash: Open Source SQL Analytics on Data Lakes
Redash: Open Source SQL Analytics on Data LakesRedash: Open Source SQL Analytics on Data Lakes
Redash: Open Source SQL Analytics on Data LakesDatabricks
 
Reltio: Powering Enterprise Data-driven Applications with Cassandra
Reltio: Powering Enterprise Data-driven Applications with CassandraReltio: Powering Enterprise Data-driven Applications with Cassandra
Reltio: Powering Enterprise Data-driven Applications with CassandraDataStax Academy
 
How to design and implement a data ops architecture with sdc and gcp
How to design and implement a data ops architecture with sdc and gcpHow to design and implement a data ops architecture with sdc and gcp
How to design and implement a data ops architecture with sdc and gcpJoseph Arriola
 
Northwestern Mutual Journey – Transform BI Space to Cloud
Northwestern Mutual Journey – Transform BI Space to CloudNorthwestern Mutual Journey – Transform BI Space to Cloud
Northwestern Mutual Journey – Transform BI Space to CloudDatabricks
 
Building Custom Big Data Integrations
Building Custom Big Data IntegrationsBuilding Custom Big Data Integrations
Building Custom Big Data IntegrationsPat Patterson
 

What's hot (20)

Qlik and Confluent Success Stories with Kafka - How Generali and Skechers Kee...
Qlik and Confluent Success Stories with Kafka - How Generali and Skechers Kee...Qlik and Confluent Success Stories with Kafka - How Generali and Skechers Kee...
Qlik and Confluent Success Stories with Kafka - How Generali and Skechers Kee...
 
Democratizing data science Using spark, hive and druid
Democratizing data science Using spark, hive and druidDemocratizing data science Using spark, hive and druid
Democratizing data science Using spark, hive and druid
 
Delta Lake: Open Source Reliability w/ Apache Spark
Delta Lake: Open Source Reliability w/ Apache SparkDelta Lake: Open Source Reliability w/ Apache Spark
Delta Lake: Open Source Reliability w/ Apache Spark
 
Intuit Analytics Cloud 101
Intuit Analytics Cloud 101Intuit Analytics Cloud 101
Intuit Analytics Cloud 101
 
Big Data Day LA 2016/ Hadoop/ Spark/ Kafka track - Building an Event-oriented...
Big Data Day LA 2016/ Hadoop/ Spark/ Kafka track - Building an Event-oriented...Big Data Day LA 2016/ Hadoop/ Spark/ Kafka track - Building an Event-oriented...
Big Data Day LA 2016/ Hadoop/ Spark/ Kafka track - Building an Event-oriented...
 
Building Sessionization Pipeline at Scale with Databricks Delta
Building Sessionization Pipeline at Scale with Databricks DeltaBuilding Sessionization Pipeline at Scale with Databricks Delta
Building Sessionization Pipeline at Scale with Databricks Delta
 
Data Privacy with Apache Spark: Defensive and Offensive Approaches
Data Privacy with Apache Spark: Defensive and Offensive ApproachesData Privacy with Apache Spark: Defensive and Offensive Approaches
Data Privacy with Apache Spark: Defensive and Offensive Approaches
 
Unifying Streaming and Historical Telemetry Data For Real-time Performance Re...
Unifying Streaming and Historical Telemetry Data For Real-time Performance Re...Unifying Streaming and Historical Telemetry Data For Real-time Performance Re...
Unifying Streaming and Historical Telemetry Data For Real-time Performance Re...
 
Successful AI/ML Projects with End-to-End Cloud Data Engineering
Successful AI/ML Projects with End-to-End Cloud Data EngineeringSuccessful AI/ML Projects with End-to-End Cloud Data Engineering
Successful AI/ML Projects with End-to-End Cloud Data Engineering
 
Reliable Data Intestion in BigData / IoT
Reliable Data Intestion in BigData / IoTReliable Data Intestion in BigData / IoT
Reliable Data Intestion in BigData / IoT
 
Apache Spark in Scientific Applciations
Apache Spark in Scientific ApplciationsApache Spark in Scientific Applciations
Apache Spark in Scientific Applciations
 
How Workato creates robust data pipelines and automations for you?
How Workato creates robust data pipelines and automations for you?How Workato creates robust data pipelines and automations for you?
How Workato creates robust data pipelines and automations for you?
 
LinkedIn2
LinkedIn2LinkedIn2
LinkedIn2
 
ML, Statistics, and Spark with Databricks for Maximizing Revenue in a Delayed...
ML, Statistics, and Spark with Databricks for Maximizing Revenue in a Delayed...ML, Statistics, and Spark with Databricks for Maximizing Revenue in a Delayed...
ML, Statistics, and Spark with Databricks for Maximizing Revenue in a Delayed...
 
Redash: Open Source SQL Analytics on Data Lakes
Redash: Open Source SQL Analytics on Data LakesRedash: Open Source SQL Analytics on Data Lakes
Redash: Open Source SQL Analytics on Data Lakes
 
Reltio: Powering Enterprise Data-driven Applications with Cassandra
Reltio: Powering Enterprise Data-driven Applications with CassandraReltio: Powering Enterprise Data-driven Applications with Cassandra
Reltio: Powering Enterprise Data-driven Applications with Cassandra
 
How to design and implement a data ops architecture with sdc and gcp
How to design and implement a data ops architecture with sdc and gcpHow to design and implement a data ops architecture with sdc and gcp
How to design and implement a data ops architecture with sdc and gcp
 
Instrumenting your Instruments
Instrumenting your Instruments Instrumenting your Instruments
Instrumenting your Instruments
 
Northwestern Mutual Journey – Transform BI Space to Cloud
Northwestern Mutual Journey – Transform BI Space to CloudNorthwestern Mutual Journey – Transform BI Space to Cloud
Northwestern Mutual Journey – Transform BI Space to Cloud
 
Building Custom Big Data Integrations
Building Custom Big Data IntegrationsBuilding Custom Big Data Integrations
Building Custom Big Data Integrations
 

Similar to Building a Streaming Microservices Architecture - Data + AI Summit EU 2020

Integrating Postgres with ActiveMQ and Camel
Integrating Postgres with ActiveMQ and CamelIntegrating Postgres with ActiveMQ and Camel
Integrating Postgres with ActiveMQ and CamelJustin Reock
 
Oracle Modern AppDev Approach to Cloud & Container Native App
Oracle Modern AppDev Approach to Cloud & Container Native AppOracle Modern AppDev Approach to Cloud & Container Native App
Oracle Modern AppDev Approach to Cloud & Container Native AppPaulo Alberto Simoes ∴
 
Why Splunk Chose Pulsar_Karthik Ramasamy
Why Splunk Chose Pulsar_Karthik RamasamyWhy Splunk Chose Pulsar_Karthik Ramasamy
Why Splunk Chose Pulsar_Karthik RamasamyStreamNative
 
Pulsar summit-keynote-final
Pulsar summit-keynote-finalPulsar summit-keynote-final
Pulsar summit-keynote-finalKarthik Ramasamy
 
CICDforModernApplications-Oslo.pdf
CICDforModernApplications-Oslo.pdfCICDforModernApplications-Oslo.pdf
CICDforModernApplications-Oslo.pdfAmazon Web Services
 
CI/CD for Containers: A Way Forward for Your DevOps Pipeline
CI/CD for Containers: A Way Forward for Your DevOps PipelineCI/CD for Containers: A Way Forward for Your DevOps Pipeline
CI/CD for Containers: A Way Forward for Your DevOps PipelineAmazon Web Services
 
[2015-11월 정기 세미나] Cloud Native Platform - Pivotal
[2015-11월 정기 세미나] Cloud Native Platform - Pivotal[2015-11월 정기 세미나] Cloud Native Platform - Pivotal
[2015-11월 정기 세미나] Cloud Native Platform - PivotalOpenStack Korea Community
 
Kamailio practice Quobis-University of Vigo Laboratory of Commutation 2012-2...
Kamailio practice Quobis-University of Vigo Laboratory of Commutation  2012-2...Kamailio practice Quobis-University of Vigo Laboratory of Commutation  2012-2...
Kamailio practice Quobis-University of Vigo Laboratory of Commutation 2012-2...Quobis
 
The DevOps Promise: Helping Management Realise the Quality, Velocity & Effici...
The DevOps Promise: Helping Management Realise the Quality, Velocity & Effici...The DevOps Promise: Helping Management Realise the Quality, Velocity & Effici...
The DevOps Promise: Helping Management Realise the Quality, Velocity & Effici...Splunk
 
SplDevOps: Making Splunk Development a Breeze With a Deep Dive on DevOps' Con...
SplDevOps: Making Splunk Development a Breeze With a Deep Dive on DevOps' Con...SplDevOps: Making Splunk Development a Breeze With a Deep Dive on DevOps' Con...
SplDevOps: Making Splunk Development a Breeze With a Deep Dive on DevOps' Con...Harry McLaren
 
Cisco Connect Ottawa 2018 dev net
Cisco Connect Ottawa 2018 dev netCisco Connect Ottawa 2018 dev net
Cisco Connect Ottawa 2018 dev netCisco Canada
 
Continuous Integration and Continuous Delivery Best Practices for Building Mo...
Continuous Integration and Continuous Delivery Best Practices for Building Mo...Continuous Integration and Continuous Delivery Best Practices for Building Mo...
Continuous Integration and Continuous Delivery Best Practices for Building Mo...Amazon Web Services
 
Analytics im DevOps Lebenszyklus
Analytics im DevOps LebenszyklusAnalytics im DevOps Lebenszyklus
Analytics im DevOps LebenszyklusSplunk
 
Emulators as an Emerging Best Practice for API Providers
Emulators as an Emerging Best Practice for API ProvidersEmulators as an Emerging Best Practice for API Providers
Emulators as an Emerging Best Practice for API ProvidersCisco DevNet
 
AWS DevDay Cologne - CI/CD for modern applications
AWS DevDay Cologne - CI/CD for modern applicationsAWS DevDay Cologne - CI/CD for modern applications
AWS DevDay Cologne - CI/CD for modern applicationsCobus Bernard
 
Leveraging Splunk Enterprise Security with the MITRE’s ATT&CK Framework
Leveraging Splunk Enterprise Security with the MITRE’s ATT&CK FrameworkLeveraging Splunk Enterprise Security with the MITRE’s ATT&CK Framework
Leveraging Splunk Enterprise Security with the MITRE’s ATT&CK FrameworkSplunk
 
TechWiseTV Workshop: Cisco Hybrid Cloud Platform for Google Cloud
TechWiseTV Workshop:  Cisco Hybrid Cloud Platform for Google CloudTechWiseTV Workshop:  Cisco Hybrid Cloud Platform for Google Cloud
TechWiseTV Workshop: Cisco Hybrid Cloud Platform for Google CloudRobb Boyd
 

Similar to Building a Streaming Microservices Architecture - Data + AI Summit EU 2020 (20)

Apache Pulsar @Splunk
Apache Pulsar @SplunkApache Pulsar @Splunk
Apache Pulsar @Splunk
 
Integrating Postgres with ActiveMQ and Camel
Integrating Postgres with ActiveMQ and CamelIntegrating Postgres with ActiveMQ and Camel
Integrating Postgres with ActiveMQ and Camel
 
Oracle Modern AppDev Approach to Cloud & Container Native App
Oracle Modern AppDev Approach to Cloud & Container Native AppOracle Modern AppDev Approach to Cloud & Container Native App
Oracle Modern AppDev Approach to Cloud & Container Native App
 
Why Splunk Chose Pulsar_Karthik Ramasamy
Why Splunk Chose Pulsar_Karthik RamasamyWhy Splunk Chose Pulsar_Karthik Ramasamy
Why Splunk Chose Pulsar_Karthik Ramasamy
 
Pulsar summit-keynote-final
Pulsar summit-keynote-finalPulsar summit-keynote-final
Pulsar summit-keynote-final
 
CICDforModernApplications-Oslo.pdf
CICDforModernApplications-Oslo.pdfCICDforModernApplications-Oslo.pdf
CICDforModernApplications-Oslo.pdf
 
Netflix MSA and Pivotal
Netflix MSA and PivotalNetflix MSA and Pivotal
Netflix MSA and Pivotal
 
CI/CD for Containers: A Way Forward for Your DevOps Pipeline
CI/CD for Containers: A Way Forward for Your DevOps PipelineCI/CD for Containers: A Way Forward for Your DevOps Pipeline
CI/CD for Containers: A Way Forward for Your DevOps Pipeline
 
CI/CD for Modern Applications
CI/CD for Modern ApplicationsCI/CD for Modern Applications
CI/CD for Modern Applications
 
[2015-11월 정기 세미나] Cloud Native Platform - Pivotal
[2015-11월 정기 세미나] Cloud Native Platform - Pivotal[2015-11월 정기 세미나] Cloud Native Platform - Pivotal
[2015-11월 정기 세미나] Cloud Native Platform - Pivotal
 
Kamailio practice Quobis-University of Vigo Laboratory of Commutation 2012-2...
Kamailio practice Quobis-University of Vigo Laboratory of Commutation  2012-2...Kamailio practice Quobis-University of Vigo Laboratory of Commutation  2012-2...
Kamailio practice Quobis-University of Vigo Laboratory of Commutation 2012-2...
 
The DevOps Promise: Helping Management Realise the Quality, Velocity & Effici...
The DevOps Promise: Helping Management Realise the Quality, Velocity & Effici...The DevOps Promise: Helping Management Realise the Quality, Velocity & Effici...
The DevOps Promise: Helping Management Realise the Quality, Velocity & Effici...
 
SplDevOps: Making Splunk Development a Breeze With a Deep Dive on DevOps' Con...
SplDevOps: Making Splunk Development a Breeze With a Deep Dive on DevOps' Con...SplDevOps: Making Splunk Development a Breeze With a Deep Dive on DevOps' Con...
SplDevOps: Making Splunk Development a Breeze With a Deep Dive on DevOps' Con...
 
Cisco Connect Ottawa 2018 dev net
Cisco Connect Ottawa 2018 dev netCisco Connect Ottawa 2018 dev net
Cisco Connect Ottawa 2018 dev net
 
Continuous Integration and Continuous Delivery Best Practices for Building Mo...
Continuous Integration and Continuous Delivery Best Practices for Building Mo...Continuous Integration and Continuous Delivery Best Practices for Building Mo...
Continuous Integration and Continuous Delivery Best Practices for Building Mo...
 
Analytics im DevOps Lebenszyklus
Analytics im DevOps LebenszyklusAnalytics im DevOps Lebenszyklus
Analytics im DevOps Lebenszyklus
 
Emulators as an Emerging Best Practice for API Providers
Emulators as an Emerging Best Practice for API ProvidersEmulators as an Emerging Best Practice for API Providers
Emulators as an Emerging Best Practice for API Providers
 
AWS DevDay Cologne - CI/CD for modern applications
AWS DevDay Cologne - CI/CD for modern applicationsAWS DevDay Cologne - CI/CD for modern applications
AWS DevDay Cologne - CI/CD for modern applications
 
Leveraging Splunk Enterprise Security with the MITRE’s ATT&CK Framework
Leveraging Splunk Enterprise Security with the MITRE’s ATT&CK FrameworkLeveraging Splunk Enterprise Security with the MITRE’s ATT&CK Framework
Leveraging Splunk Enterprise Security with the MITRE’s ATT&CK Framework
 
TechWiseTV Workshop: Cisco Hybrid Cloud Platform for Google Cloud
TechWiseTV Workshop:  Cisco Hybrid Cloud Platform for Google CloudTechWiseTV Workshop:  Cisco Hybrid Cloud Platform for Google Cloud
TechWiseTV Workshop: Cisco Hybrid Cloud Platform for Google Cloud
 

More from Databricks

DW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptxDW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptxDatabricks
 
Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1Databricks
 
Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2Databricks
 
Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4Databricks
 
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of HadoopDatabricks
 
Democratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized PlatformDemocratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized PlatformDatabricks
 
Learn to Use Databricks for Data Science
Learn to Use Databricks for Data ScienceLearn to Use Databricks for Data Science
Learn to Use Databricks for Data ScienceDatabricks
 
Why APM Is Not the Same As ML Monitoring
Why APM Is Not the Same As ML MonitoringWhy APM Is Not the Same As ML Monitoring
Why APM Is Not the Same As ML MonitoringDatabricks
 
The Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
The Function, the Context, and the Data—Enabling ML Ops at Stitch FixThe Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
The Function, the Context, and the Data—Enabling ML Ops at Stitch FixDatabricks
 
Stage Level Scheduling Improving Big Data and AI Integration
Stage Level Scheduling Improving Big Data and AI IntegrationStage Level Scheduling Improving Big Data and AI Integration
Stage Level Scheduling Improving Big Data and AI IntegrationDatabricks
 
Simplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorchSimplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorchDatabricks
 
Scaling your Data Pipelines with Apache Spark on Kubernetes
Scaling your Data Pipelines with Apache Spark on KubernetesScaling your Data Pipelines with Apache Spark on Kubernetes
Scaling your Data Pipelines with Apache Spark on KubernetesDatabricks
 
Scaling and Unifying SciKit Learn and Apache Spark Pipelines
Scaling and Unifying SciKit Learn and Apache Spark PipelinesScaling and Unifying SciKit Learn and Apache Spark Pipelines
Scaling and Unifying SciKit Learn and Apache Spark PipelinesDatabricks
 
Sawtooth Windows for Feature Aggregations
Sawtooth Windows for Feature AggregationsSawtooth Windows for Feature Aggregations
Sawtooth Windows for Feature AggregationsDatabricks
 
Redis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen SinkRedis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen SinkDatabricks
 
Re-imagine Data Monitoring with whylogs and Spark
Re-imagine Data Monitoring with whylogs and SparkRe-imagine Data Monitoring with whylogs and Spark
Re-imagine Data Monitoring with whylogs and SparkDatabricks
 
Raven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction QueriesRaven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction QueriesDatabricks
 
Processing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache SparkProcessing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache SparkDatabricks
 
Massive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta LakeMassive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta LakeDatabricks
 
Machine Learning CI/CD for Email Attack Detection
Machine Learning CI/CD for Email Attack DetectionMachine Learning CI/CD for Email Attack Detection
Machine Learning CI/CD for Email Attack DetectionDatabricks
 

More from Databricks (20)

DW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptxDW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptx
 
Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1
 
Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2
 
Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4
 
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
 
Democratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized PlatformDemocratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized Platform
 
Learn to Use Databricks for Data Science
Learn to Use Databricks for Data ScienceLearn to Use Databricks for Data Science
Learn to Use Databricks for Data Science
 
Why APM Is Not the Same As ML Monitoring
Why APM Is Not the Same As ML MonitoringWhy APM Is Not the Same As ML Monitoring
Why APM Is Not the Same As ML Monitoring
 
The Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
The Function, the Context, and the Data—Enabling ML Ops at Stitch FixThe Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
The Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
 
Stage Level Scheduling Improving Big Data and AI Integration
Stage Level Scheduling Improving Big Data and AI IntegrationStage Level Scheduling Improving Big Data and AI Integration
Stage Level Scheduling Improving Big Data and AI Integration
 
Simplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorchSimplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorch
 
Scaling your Data Pipelines with Apache Spark on Kubernetes
Scaling your Data Pipelines with Apache Spark on KubernetesScaling your Data Pipelines with Apache Spark on Kubernetes
Scaling your Data Pipelines with Apache Spark on Kubernetes
 
Scaling and Unifying SciKit Learn and Apache Spark Pipelines
Scaling and Unifying SciKit Learn and Apache Spark PipelinesScaling and Unifying SciKit Learn and Apache Spark Pipelines
Scaling and Unifying SciKit Learn and Apache Spark Pipelines
 
Sawtooth Windows for Feature Aggregations
Sawtooth Windows for Feature AggregationsSawtooth Windows for Feature Aggregations
Sawtooth Windows for Feature Aggregations
 
Redis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen SinkRedis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
 
Re-imagine Data Monitoring with whylogs and Spark
Re-imagine Data Monitoring with whylogs and SparkRe-imagine Data Monitoring with whylogs and Spark
Re-imagine Data Monitoring with whylogs and Spark
 
Raven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction QueriesRaven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction Queries
 
Processing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache SparkProcessing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache Spark
 
Massive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta LakeMassive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta Lake
 
Machine Learning CI/CD for Email Attack Detection
Machine Learning CI/CD for Email Attack DetectionMachine Learning CI/CD for Email Attack Detection
Machine Learning CI/CD for Email Attack Detection
 

Recently uploaded

Lies and Myths in InfoSec - 2023 Usenix Enigma
Lies and Myths in InfoSec - 2023 Usenix EnigmaLies and Myths in InfoSec - 2023 Usenix Enigma
Lies and Myths in InfoSec - 2023 Usenix EnigmaAdrian Sanabria
 
Web 3.0 in Data Privacy and Security | Data Privacy |Blockchain Security| Cyb...
Web 3.0 in Data Privacy and Security | Data Privacy |Blockchain Security| Cyb...Web 3.0 in Data Privacy and Security | Data Privacy |Blockchain Security| Cyb...
Web 3.0 in Data Privacy and Security | Data Privacy |Blockchain Security| Cyb...Cyber Security Experts
 
IIBA Adl - Being Effective on Day 1 - Slide Deck.pdf
IIBA Adl - Being Effective on Day 1 - Slide Deck.pdfIIBA Adl - Being Effective on Day 1 - Slide Deck.pdf
IIBA Adl - Being Effective on Day 1 - Slide Deck.pdfAustraliaChapterIIBA
 
A Gentle Introduction to Text Analysis :)
A Gentle Introduction to Text Analysis :)A Gentle Introduction to Text Analysis :)
A Gentle Introduction to Text Analysis :)UNCResearchHub
 
Oppotus - Malaysians on Malaysia 4Q 2023.pdf
Oppotus - Malaysians on Malaysia 4Q 2023.pdfOppotus - Malaysians on Malaysia 4Q 2023.pdf
Oppotus - Malaysians on Malaysia 4Q 2023.pdfOppotus
 
Big Data - large Scale data (Amazon, FB)
Big Data - large Scale data (Amazon, FB)Big Data - large Scale data (Amazon, FB)
Big Data - large Scale data (Amazon, FB)CUO VEERANAN VEERANAN
 
SABARI PRIYAN's self introduction as a reference
SABARI PRIYAN's self introduction as a referenceSABARI PRIYAN's self introduction as a reference
SABARI PRIYAN's self introduction as a referencepriyansabari355
 
Artificial Intelligence and its Impact on Society.pptx
Artificial Intelligence and its Impact on Society.pptxArtificial Intelligence and its Impact on Society.pptx
Artificial Intelligence and its Impact on Society.pptxVighnesh Shashtri
 
data analytics and tools from in2inglobal.pdf
data analytics  and tools from in2inglobal.pdfdata analytics  and tools from in2inglobal.pdf
data analytics and tools from in2inglobal.pdfdigimartfamily
 
Soil Health Policy Map Years 2020 to 2023
Soil Health Policy Map Years 2020 to 2023Soil Health Policy Map Years 2020 to 2023
Soil Health Policy Map Years 2020 to 2023stephizcoolio
 
Industry 4.0 in IoT Transforming the Future.pptx
Industry 4.0 in IoT Transforming the Future.pptxIndustry 4.0 in IoT Transforming the Future.pptx
Industry 4.0 in IoT Transforming the Future.pptxMdRafiqulIslam403212
 
AWS Identity and access management for users
AWS Identity and access management for usersAWS Identity and access management for users
AWS Identity and access management for usersStephenEfange3
 
SABARI PRIYAN's self introduction as reference
SABARI PRIYAN's self introduction as referenceSABARI PRIYAN's self introduction as reference
SABARI PRIYAN's self introduction as referencepriyansabari355
 
Operations Data On Mobile - inSis Mobile App - Sample Screens
Operations Data On Mobile - inSis Mobile App - Sample ScreensOperations Data On Mobile - inSis Mobile App - Sample Screens
Operations Data On Mobile - inSis Mobile App - Sample ScreensKondapi V Siva Rama Brahmam
 
Generative AI Rennes Meetup with OVHcloud - WAICF highlights & how to deploy ...
Generative AI Rennes Meetup with OVHcloud - WAICF highlights & how to deploy ...Generative AI Rennes Meetup with OVHcloud - WAICF highlights & how to deploy ...
Generative AI Rennes Meetup with OVHcloud - WAICF highlights & how to deploy ...Thibaud Le Douarin
 

Recently uploaded (17)

Lies and Myths in InfoSec - 2023 Usenix Enigma
Lies and Myths in InfoSec - 2023 Usenix EnigmaLies and Myths in InfoSec - 2023 Usenix Enigma
Lies and Myths in InfoSec - 2023 Usenix Enigma
 
Web 3.0 in Data Privacy and Security | Data Privacy |Blockchain Security| Cyb...
Web 3.0 in Data Privacy and Security | Data Privacy |Blockchain Security| Cyb...Web 3.0 in Data Privacy and Security | Data Privacy |Blockchain Security| Cyb...
Web 3.0 in Data Privacy and Security | Data Privacy |Blockchain Security| Cyb...
 
IIBA Adl - Being Effective on Day 1 - Slide Deck.pdf
IIBA Adl - Being Effective on Day 1 - Slide Deck.pdfIIBA Adl - Being Effective on Day 1 - Slide Deck.pdf
IIBA Adl - Being Effective on Day 1 - Slide Deck.pdf
 
A Gentle Introduction to Text Analysis :)
A Gentle Introduction to Text Analysis :)A Gentle Introduction to Text Analysis :)
A Gentle Introduction to Text Analysis :)
 
Oppotus - Malaysians on Malaysia 4Q 2023.pdf
Oppotus - Malaysians on Malaysia 4Q 2023.pdfOppotus - Malaysians on Malaysia 4Q 2023.pdf
Oppotus - Malaysians on Malaysia 4Q 2023.pdf
 
Big Data - large Scale data (Amazon, FB)
Big Data - large Scale data (Amazon, FB)Big Data - large Scale data (Amazon, FB)
Big Data - large Scale data (Amazon, FB)
 
SABARI PRIYAN's self introduction as a reference
SABARI PRIYAN's self introduction as a referenceSABARI PRIYAN's self introduction as a reference
SABARI PRIYAN's self introduction as a reference
 
2.pptx
2.pptx2.pptx
2.pptx
 
Artificial Intelligence and its Impact on Society.pptx
Artificial Intelligence and its Impact on Society.pptxArtificial Intelligence and its Impact on Society.pptx
Artificial Intelligence and its Impact on Society.pptx
 
data analytics and tools from in2inglobal.pdf
data analytics  and tools from in2inglobal.pdfdata analytics  and tools from in2inglobal.pdf
data analytics and tools from in2inglobal.pdf
 
Soil Health Policy Map Years 2020 to 2023
Soil Health Policy Map Years 2020 to 2023Soil Health Policy Map Years 2020 to 2023
Soil Health Policy Map Years 2020 to 2023
 
Industry 4.0 in IoT Transforming the Future.pptx
Industry 4.0 in IoT Transforming the Future.pptxIndustry 4.0 in IoT Transforming the Future.pptx
Industry 4.0 in IoT Transforming the Future.pptx
 
AWS Identity and access management for users
AWS Identity and access management for usersAWS Identity and access management for users
AWS Identity and access management for users
 
SABARI PRIYAN's self introduction as reference
SABARI PRIYAN's self introduction as referenceSABARI PRIYAN's self introduction as reference
SABARI PRIYAN's self introduction as reference
 
Operations Data On Mobile - inSis Mobile App - Sample Screens
Operations Data On Mobile - inSis Mobile App - Sample ScreensOperations Data On Mobile - inSis Mobile App - Sample Screens
Operations Data On Mobile - inSis Mobile App - Sample Screens
 
Generative AI Rennes Meetup with OVHcloud - WAICF highlights & how to deploy ...
Generative AI Rennes Meetup with OVHcloud - WAICF highlights & how to deploy ...Generative AI Rennes Meetup with OVHcloud - WAICF highlights & how to deploy ...
Generative AI Rennes Meetup with OVHcloud - WAICF highlights & how to deploy ...
 
Electricity Year 2023_updated_22022024.pptx
Electricity Year 2023_updated_22022024.pptxElectricity Year 2023_updated_22022024.pptx
Electricity Year 2023_updated_22022024.pptx
 

Building a Streaming Microservices Architecture - Data + AI Summit EU 2020

  • 1. © 2019 TWILIO INC. ALL RIGHTS RESERVED. Building a Streaming Microservices Architecture With Apache Spark Structured Streaming & Friends Scott Haines Senior Principal Software Engineer, Twilio @newfront
  • 2. © 2019 TWILIO INC. ALL RIGHTS RESERVED. A little about me. • I work at Twilio building massive data systems • I run a bi-weekly internal Spark Office Hours where I offer training and guidance to teams at the company • >12 years working on large distributed analytics systems @newfront
  • 3. © 2019 TWILIO INC. ALL RIGHTS RESERVED. A little about me. • I work at Twilio building massive data systems • I run a bi-weekly internal Spark Office Hours where I offer training and guidance to teams at the company • >12 years working on large distributed analytics systems • Published work on Distributed Analytics Systems @newfront
  • 4. © 2019 TWILIO INC. ALL RIGHTS RESERVED. Voice Insights Accountable observability data and interactive analytics and insights for the voice business and customers.
  • 5. VIRGINIA, USA DUBLIN, IRELAND SINGAPORE SYDNEY, AUSTRALIA TOKYO, JAPAN SAO PAULO, BRAZIL DATA CENT ERS © 2019 TWILIO INC. ALL RIGHTS RESERVED. Events per Second >1MIL
  • 6. © 2019 TWILIO INC. ALL RIGHTS RESERVED. Let’s Build a Reliable Data Architecture
  • 7. © 2019 TWILIO INC. ALL RIGHTS RESERVED. Goal: Reliable E2E Streaming Data Pipeline GRPC Client GRPC Server GRPC Server GRPC Server 1 2 3 Kafka Broker 4 Kafka Broker 5 6 Spark Application 7 8 HDFS S39 HTTP /2 @newfront
  • 8. Strong and Reliable Data starts at Ingest © 2019 TWILIO INC. ALL RIGHTS RESERVED. @newfront
  • 9. © 2019 TWILIO INC. ALL RIGHTS RESERVED. Of people love JSON*. >95% { "type": "CallEvent", “call_sid”: "CA123", "attributes": [ { “account_sid”: “AC123” “start_ms": 123, “end_ms": "435" } ] } • JSON has Structure. • But JSON isn't strictly Structured Data. Structured Data @newfront @twilio
  • 10. © 2019 TWILIO INC. ALL RIGHTS RESERVED. Of people saddened by bad data* 100% { "type": "CallEvent", “call_sid": “CA234”, “attributes":"oops" } • JSON has poor runtime guarantees due to its flexible nature. Optimize for compile time guarantees. • Debugging corrupt data in a large distributed system ruins hopes and dreams. Structured Data @newfront @twilio
  • 11. © 2019 TWILIO INC. ALL RIGHTS RESERVED. Protocol Buffers message CallEvent { uint64 created_ms = 1; string call_sid = 2; uint64 account_sid = 3; EventType event_type = 4; Region region = 5; } • Well Defined Events tell their own Story • Type-Safety Rules • Versioning your API / Pipeline / Data now just means sticking to a version of your schema • Rely on Releases for versioning • Interoperable with most major languages (java/scala/c++/go/obj-c/node-js/ python/...)
  • 12. © 2019 TWILIO INC. ALL RIGHTS RESERVED. Protocol Buffers message CallEvent { uint64 created_ms = 1; string call_sid = 2; uint64 account_sid = 3; EventType event_type = 4; Region region = 5; } • Data Accountability • Lightning Fast Serialization / Deserialization • Plays with nicely gRPC • Interoperable with Spark SQL • Like “JSON with Guard Rails” Of people like when things work between releases! 100% Take Aways
  • 13. Data Engineers love gRPC © 2019 TWILIO INC. ALL RIGHTS RESERVED. *technically not universally accepted @newfront
  • 14. GRPC Client GRPC Server GRPC Server GRPC Server HTTP /2 gRPC | saves time © 2019 TWILIO INC. ALL RIGHTS RESERVED.
  • 15. GRPC Client GRPC Server GRPC Server GRPC Server HTTP /2 gRPC | saves time © 2019 TWILIO INC. ALL RIGHTS RESERVED. val time = “$$$”
  • 16. GRPC Client GRPC Server GRPC Server GRPC Server HTTP /2 © 2019 TWILIO INC. ALL RIGHTS RESERVED. GRPC. GRPC // nutshell • RPC = remote procedure call. “G” stands for generic or Google • Great for Internal Services • High Performance • Compact Binary Exchange Format • Compile Idiomatic API Definitions • Capable of Bi-Directional Streaming • Pluggable HTTP/2 transport
  • 17. © 2019 TWILIO INC. ALL RIGHTS RESERVED. GRPC. • Building a CallEvent Service. • 1: Define your messages (call.proto)
  • 18. © 2019 TWILIO INC. ALL RIGHTS RESERVED. GRPC. • Building a CallEvent Service. • 1: Define your messages (call.proto) • 2: Define your services (service.proto) @newfront
  • 19. © 2019 TWILIO INC. ALL RIGHTS RESERVED. GRPC. • Building a CallEvent Service. • 1: Define your messages (call.proto) • 2: Define your services (service.proto) • 3: Compile your messages and service stubs. sbt clean compile package publishLocal
  • 20. © 2019 TWILIO INC. ALL RIGHTS RESERVED. GRPC. • Building a CallEvent Service. • 1: Define your messages (call.proto) • 2: Define your services (service.proto) • 3: Compile your messages and service stubs. • 4: Implement Traits and Run!
  • 21. © 2019 TWILIO INC. ALL RIGHTS RESERVED. GRPC. • Building a CallEvent Service. • Client SDKs essentially write themselves. • JSON <-> Protobuf is still possible for maintaining customer facing APIs
  • 22. © 2019 TWILIO INC. ALL RIGHTS RESERVED. Protocol Gatewaymessage T<:Event { … } protobuf @ version Kafka Broker gRPC Common Pattern Emerges Protocol Streams
  • 23. © 2019 TWILIO INC. ALL RIGHTS RESERVED. Still Building… GRPC Client GRPC Server GRPC Server GRPC Server 1 2 3 Kafka Broker 4 Kafka Broker 5 6 Spark Application 7 8 HDFS S39 HTTP /2 @newfront
  • 24. GRPC Client GRPC Server GRPC Server GRPC Server 1 2 3 Kafka Broker 4 Kafka Broker 5 6 Spark Application 7 8 9 HTTP /2 © 2019 TWILIO INC. ALL RIGHTS RESERVED. Kafka + Protobuf + Spark
  • 25. © 2019 TWILIO INC. ALL RIGHTS RESERVED. Kafka. • Solve for common Data Pipeline Problems • Partition Keys are important. Use them to your advantage • Spark AQE - adaptive query execution can handle hot spots. Non-spark services not- so-much. Topic: CallEvents key: call_sid partitions*: n * number of partitions: factor of producer records/s * avg(record.bytesize)
  • 26. © 2019 TWILIO INC. ALL RIGHTS RESERVED. Spark. • Use ScalaPB’s ExpressionEncoders to natively convert protobuf to Catalyst Optimizable DataFrames • Marry this with Sparks Kafka DataSource Reader/Writer
  • 27. © 2019 TWILIO INC. ALL RIGHTS RESERVED. Spark. • Behind the Scenes… • Spark is doing magic
  • 28. © 2019 TWILIO INC. ALL RIGHTS RESERVED. Spark. • End to End Tests ensure you can press the release button anywhere in the pipeline. • Can be automated to ensure your Spark Apps can continue working with any changes to the upstream gRPC • Can use for Canary testing updates
  • 29. © 2019 TWILIO INC. ALL RIGHTS RESERVED. Spark. • Use ScalaPB to read local streams • Ensure your Spark Apps can continue working with any new protobuf dependencies • Test simple reads and complex aggregations locally, deploy globally @newfront
  • 30. GRPC Client GRPC Server GRPC Server GRPC Server 1 Kafka Broker 4 Kafka Broker 5 6 Spark Application 7 8 HDFS S39 TP /2 © 2019 TWILIO INC. ALL RIGHTS RESERVED. Spark & Beyond @newfront
  • 31. © 2019 TWILIO INC. ALL RIGHTS RESERVED. Spark + Friends. • Proto to Catalyst DataFrame • Conversion to Parquet in Delta Table • Partitioned by Date @newfront
  • 32. © 2019 TWILIO INC. ALL RIGHTS RESERVED. Spark + Friends. • Now it is easy for Downstream applications to pick up using Parquet Protocol Streams @newfront
  • 33. © 2019 TWILIO INC. ALL RIGHTS RESERVED. Rinse and Repeat @newfront Just keep adding new Flows. 1. Define your Structured Data 2. Define your service definitions 3. Emit Data / Enqueue to Kafka 4. Read and Drop in HDFS (delta) or pass along to a new Topic Kafka Topic Kafka Topic Spark Application Spark Application Spark Application Kafka Topic Data Table Data Table Spark Application GRPC Server