SlideShare a Scribd company logo
1 © 2018 PURE STORAGE INC. PURE PROPRIETARY
Efficiently Triaging CI/CD Pipelines
with Apache Spark
Ivan Jibaja
Software Engineer
Streaming 80 Billion Events/Day and
Batching 40 TB/Hour
2 © 2018 PURE STORAGE INC. PURE PROPRIETARY
Log Analytics Pipeline in Numbers
ü900k - 1M events / second
ü5 seconds SLA
ü0.5 - 1 PB of data / day
3 © 2018 PURE STORAGE INC. PURE PROPRIETARY
Continuous Integration &
Continuous Deployment
Source Build
Functional
Test
Stress
Test
Deploy
4 © 2018 PURE STORAGE INC. PURE PROPRIETARY
< 5
1 Test
coordinator
(Jenkins)
< 10
< 10
CI/CD works!
100s
tests / day
< 5
failures
Email
developer
5 © 2018 PURE STORAGE INC. PURE PROPRIETARY
700
failures
x
15 min
70,000+
tests / day
20 Triage Engineers
2x in the next 12 months
1500+
VMs
250+
FBs
20+
Jenkins
700+
clients
100+
Engineers
Scale Problems
6 © 2018 PURE STORAGE INC. PURE PROPRIETARY
Log Analysis Dream
1. Automate triaging of failures
2. Extract performance metrics
3. Save our logs for future use
4. Do all of this in a scalable system
5. Real-time results!
7 © 2018 PURE STORAGE INC. PURE PROPRIETARY
Log Analysis
Volume
Value
8 © 2018 PURE STORAGE INC. PURE PROPRIETARY
Log Analysis v1
Volume
Value
Save
Alert / Take action
9 © 2018 PURE STORAGE INC. PURE PROPRIETARY
Log Analysis v2
Volume
Value
Save
ETL / Add Structure
Alert / Take action
10 © 2018 PURE STORAGE INC. PURE PROPRIETARY
Log Analysis v3
Volume
Value
Save
Aggregate / Search
ETL / Add Structure
Alert / Take action
11 © 2018 PURE STORAGE INC. PURE PROPRIETARY
Log Analysis v10
Volume
Value
Save
Aggregate / Search
ETL / Add Structure
Alert / Take action
12 © 2018 PURE STORAGE INC. PURE PROPRIETARY
Log Analysis Pipeline
Augment &
Centralize
LogSources
Index
Aggregate
Transform
Logic
Timeseries
DB
AlertStore
Visualize
13 © 2018 PURE STORAGE INC. PURE PROPRIETARY
Log Analysis Pipeline
Augment &
Centralize
LogSources
Aggregate
Transform
Logic
Timeseries
DB
AlertStore
Visualize
Index
14 © 2018 PURE STORAGE INC. PURE PROPRIETARY
Log Analysis Pipeline
Augment &
Centralize
LogSources
Streaming
Buffer
Filter
Store
Aggregate
Transform
Logic
Timeseries
DB
Alert
Visualize
Index
15 © 2018 PURE STORAGE INC. PURE PROPRIETARY
Log Analysis Pipeline
Augment &
Centralize
LogSources
Streaming
Buffer
Filter
Store Re-Filter
Aggregate
Transform
Logic
Timeseries
DB
Alert
Visualize
Index
16 © 2018 PURE STORAGE INC. PURE PROPRIETARY
Log Analysis Pipeline
Augment &
Centralize
LogSources
Streaming
Buffer
Filter
Store
Aggregate
Transform
Logic
Timeseries
DB
Alert
Visualize
Index
Re-Filter
17 © 2018 PURE STORAGE INC. PURE PROPRIETARY
Log Analysis Pipeline
rsyslog
LogSources
Streaming
Buffer
Filter
Store Re-Filter
Aggregate
Transform
Logic
Timeseries
DB
Alert
Visualize
Index
18 © 2018 PURE STORAGE INC. PURE PROPRIETARY
Log Analysis Pipeline
rsyslog
LogSources
Filter
Re-Filter
Timeseries
DB
Alert
Aggregate
Transform
Logic
Visualize
Index
19 © 2018 PURE STORAGE INC. PURE PROPRIETARY
Log Analysis Pipeline
rsyslog
LogSources
Timeseries
DB
Alert
Aggregate
Transform
Logic
Visualize
Index
20 © 2018 PURE STORAGE INC. PURE PROPRIETARY
Log Analysis Pipeline
rsyslog
LogSources
Timeseries
DB
Alert
Aggregate
Transform
Logic
Visualize
21 © 2018 PURE STORAGE INC. PURE PROPRIETARY
Log Analysis Pipeline
rsyslog
LogSources
Timeseries
DB
Alert
Visualize
22 © 2018 PURE STORAGE INC. PURE PROPRIETARY
Log Analysis Pipeline
rsyslog
LogSources
23 © 2018 PURE STORAGE INC. PURE PROPRIETARY
FULL PIPELINE
1,500+
VMs
250+
FBs
20+
Jenkins
700+
clients
27T
12
12
12
12
12
12
12
12
12
12
27T 12
12
12
12
12
12
12
12
12
12
12
12
12
12
70,000+
tests / day
9T
rsyslog
16
16
16
16
16
16
300G 12
12
12
12
12
12
ü Duplicate bug
ü Infrastructure failure
ü Performance regression
24 © 2018 PURE STORAGE INC. PURE PROPRIETARY
FULL PIPELINE
1,500+
VMs
250+
FBs
20+
Jenkins
700+
clients
27T
12
12
12
12
12
12
12
12
12
12
27T 12
12
12
12
12
12
12
12
12
12
12
12
12
12
70,000+
tests / day
9T
rsyslog
16
16
16
16
16
16
300G
12
12
12
12
12
12
ü Duplicate bug
ü Infrastructure failure
ü Performance regression81T
12
12
12
12
12
12
30G
25 © 2018 PURE STORAGE INC. PURE PROPRIETARY
FULL PIPELINE
1,500+
VMs
250+
FBs
20+
Jenkins
700+
clients
27T
12
12
12
12
12
12
12
12
12
12
27T 12
12
12
12
12
12
12
12
12
12
12
12
12
12
70,000+
tests / day
9T
rsyslog
16
16
16
16
16
16
300G
12
12
12
12
12
12
ü Duplicate bug
ü Infrastructure failure
ü Performance regression81T
12
12
12
12
12
12
30G
50G
12
12
12
12189T ü Low level details
ü Easy to read graphs
26 © 2018 PURE STORAGE INC. PURE PROPRIETARY
Takeaways
ü Index only what you need, store the rest
(in a storage layer that scales in throughput and to billions of files/objects)
ü Disaggregation of compute and storage for
scalability of subsystems
27 © 2018 PURE STORAGE INC. PURE PROPRIETARY
QUESTIONS?

More Related Content

What's hot

Metrics-Driven Tuning of Apache Spark at Scale with Edwina Lu and Ye Zhou
Metrics-Driven Tuning of Apache Spark at Scale with Edwina Lu and Ye ZhouMetrics-Driven Tuning of Apache Spark at Scale with Edwina Lu and Ye Zhou
Metrics-Driven Tuning of Apache Spark at Scale with Edwina Lu and Ye Zhou
Databricks
 
Pinot: Near Realtime Analytics @ Uber
Pinot: Near Realtime Analytics @ UberPinot: Near Realtime Analytics @ Uber
Pinot: Near Realtime Analytics @ Uber
Xiang Fu
 
Real-time Analytics with Trino and Apache Pinot
Real-time Analytics with Trino and Apache PinotReal-time Analytics with Trino and Apache Pinot
Real-time Analytics with Trino and Apache Pinot
Xiang Fu
 
KSQL-ops! Running ksqlDB in the Wild (Simon Aubury, ThoughtWorks) Kafka Summi...
KSQL-ops! Running ksqlDB in the Wild (Simon Aubury, ThoughtWorks) Kafka Summi...KSQL-ops! Running ksqlDB in the Wild (Simon Aubury, ThoughtWorks) Kafka Summi...
KSQL-ops! Running ksqlDB in the Wild (Simon Aubury, ThoughtWorks) Kafka Summi...
confluent
 
OAP: Optimized Analytics Package for Spark Platform with Daoyuan Wang and Yua...
OAP: Optimized Analytics Package for Spark Platform with Daoyuan Wang and Yua...OAP: Optimized Analytics Package for Spark Platform with Daoyuan Wang and Yua...
OAP: Optimized Analytics Package for Spark Platform with Daoyuan Wang and Yua...
Databricks
 
Uber Real Time Data Analytics
Uber Real Time Data AnalyticsUber Real Time Data Analytics
Uber Real Time Data Analytics
Ankur Bansal
 
TuneIn: How to Get Your Hadoop/Spark Jobs Tuned While You’re Sleeping with Ma...
TuneIn: How to Get Your Hadoop/Spark Jobs Tuned While You’re Sleeping with Ma...TuneIn: How to Get Your Hadoop/Spark Jobs Tuned While You’re Sleeping with Ma...
TuneIn: How to Get Your Hadoop/Spark Jobs Tuned While You’re Sleeping with Ma...
Databricks
 
Kafka Summit SF 2017 - Fast Data in Supply Chain Planning
Kafka Summit SF 2017 - Fast Data in Supply Chain PlanningKafka Summit SF 2017 - Fast Data in Supply Chain Planning
Kafka Summit SF 2017 - Fast Data in Supply Chain Planning
confluent
 
Enabling Insight to Support World-Class Supercomputing (Stefan Ceballos, Oak ...
Enabling Insight to Support World-Class Supercomputing (Stefan Ceballos, Oak ...Enabling Insight to Support World-Class Supercomputing (Stefan Ceballos, Oak ...
Enabling Insight to Support World-Class Supercomputing (Stefan Ceballos, Oak ...
confluent
 
Stsg17 speaker yousunjeong
Stsg17 speaker yousunjeongStsg17 speaker yousunjeong
Stsg17 speaker yousunjeong
Yousun Jeong
 
Bullet: A Real Time Data Query Engine
Bullet: A Real Time Data Query EngineBullet: A Real Time Data Query Engine
Bullet: A Real Time Data Query Engine
DataWorks Summit
 
Analyzing IOT Data in Apache Spark Across Data Centers and Cloud with NetApp ...
Analyzing IOT Data in Apache Spark Across Data Centers and Cloud with NetApp ...Analyzing IOT Data in Apache Spark Across Data Centers and Cloud with NetApp ...
Analyzing IOT Data in Apache Spark Across Data Centers and Cloud with NetApp ...
Databricks
 
Apache Pulsar: The Next Generation Messaging and Queuing System
Apache Pulsar: The Next Generation Messaging and Queuing SystemApache Pulsar: The Next Generation Messaging and Queuing System
Apache Pulsar: The Next Generation Messaging and Queuing System
Databricks
 
Spark, Tachyon and Mesos internals
Spark, Tachyon and Mesos internalsSpark, Tachyon and Mesos internals
Spark, Tachyon and Mesos internals
Claudiu Barbura
 
Performant Streaming in Production: Preventing Common Pitfalls when Productio...
Performant Streaming in Production: Preventing Common Pitfalls when Productio...Performant Streaming in Production: Preventing Common Pitfalls when Productio...
Performant Streaming in Production: Preventing Common Pitfalls when Productio...
Databricks
 
Real-time Analytics with Presto and Apache Pinot
Real-time Analytics with Presto and Apache PinotReal-time Analytics with Presto and Apache Pinot
Real-time Analytics with Presto and Apache Pinot
Xiang Fu
 
Next CERN Accelerator Logging Service with Jakub Wozniak
Next CERN Accelerator Logging Service with Jakub WozniakNext CERN Accelerator Logging Service with Jakub Wozniak
Next CERN Accelerator Logging Service with Jakub Wozniak
Spark Summit
 
The Revolution Will be Streamed
The Revolution Will be StreamedThe Revolution Will be Streamed
The Revolution Will be Streamed
Databricks
 
Scaling up uber's real time data analytics
Scaling up uber's real time data analyticsScaling up uber's real time data analytics
Scaling up uber's real time data analytics
Xiang Fu
 
Using Apache Spark in the Cloud—A Devops Perspective with Telmo Oliveira
Using Apache Spark in the Cloud—A Devops Perspective with Telmo OliveiraUsing Apache Spark in the Cloud—A Devops Perspective with Telmo Oliveira
Using Apache Spark in the Cloud—A Devops Perspective with Telmo Oliveira
Spark Summit
 

What's hot (20)

Metrics-Driven Tuning of Apache Spark at Scale with Edwina Lu and Ye Zhou
Metrics-Driven Tuning of Apache Spark at Scale with Edwina Lu and Ye ZhouMetrics-Driven Tuning of Apache Spark at Scale with Edwina Lu and Ye Zhou
Metrics-Driven Tuning of Apache Spark at Scale with Edwina Lu and Ye Zhou
 
Pinot: Near Realtime Analytics @ Uber
Pinot: Near Realtime Analytics @ UberPinot: Near Realtime Analytics @ Uber
Pinot: Near Realtime Analytics @ Uber
 
Real-time Analytics with Trino and Apache Pinot
Real-time Analytics with Trino and Apache PinotReal-time Analytics with Trino and Apache Pinot
Real-time Analytics with Trino and Apache Pinot
 
KSQL-ops! Running ksqlDB in the Wild (Simon Aubury, ThoughtWorks) Kafka Summi...
KSQL-ops! Running ksqlDB in the Wild (Simon Aubury, ThoughtWorks) Kafka Summi...KSQL-ops! Running ksqlDB in the Wild (Simon Aubury, ThoughtWorks) Kafka Summi...
KSQL-ops! Running ksqlDB in the Wild (Simon Aubury, ThoughtWorks) Kafka Summi...
 
OAP: Optimized Analytics Package for Spark Platform with Daoyuan Wang and Yua...
OAP: Optimized Analytics Package for Spark Platform with Daoyuan Wang and Yua...OAP: Optimized Analytics Package for Spark Platform with Daoyuan Wang and Yua...
OAP: Optimized Analytics Package for Spark Platform with Daoyuan Wang and Yua...
 
Uber Real Time Data Analytics
Uber Real Time Data AnalyticsUber Real Time Data Analytics
Uber Real Time Data Analytics
 
TuneIn: How to Get Your Hadoop/Spark Jobs Tuned While You’re Sleeping with Ma...
TuneIn: How to Get Your Hadoop/Spark Jobs Tuned While You’re Sleeping with Ma...TuneIn: How to Get Your Hadoop/Spark Jobs Tuned While You’re Sleeping with Ma...
TuneIn: How to Get Your Hadoop/Spark Jobs Tuned While You’re Sleeping with Ma...
 
Kafka Summit SF 2017 - Fast Data in Supply Chain Planning
Kafka Summit SF 2017 - Fast Data in Supply Chain PlanningKafka Summit SF 2017 - Fast Data in Supply Chain Planning
Kafka Summit SF 2017 - Fast Data in Supply Chain Planning
 
Enabling Insight to Support World-Class Supercomputing (Stefan Ceballos, Oak ...
Enabling Insight to Support World-Class Supercomputing (Stefan Ceballos, Oak ...Enabling Insight to Support World-Class Supercomputing (Stefan Ceballos, Oak ...
Enabling Insight to Support World-Class Supercomputing (Stefan Ceballos, Oak ...
 
Stsg17 speaker yousunjeong
Stsg17 speaker yousunjeongStsg17 speaker yousunjeong
Stsg17 speaker yousunjeong
 
Bullet: A Real Time Data Query Engine
Bullet: A Real Time Data Query EngineBullet: A Real Time Data Query Engine
Bullet: A Real Time Data Query Engine
 
Analyzing IOT Data in Apache Spark Across Data Centers and Cloud with NetApp ...
Analyzing IOT Data in Apache Spark Across Data Centers and Cloud with NetApp ...Analyzing IOT Data in Apache Spark Across Data Centers and Cloud with NetApp ...
Analyzing IOT Data in Apache Spark Across Data Centers and Cloud with NetApp ...
 
Apache Pulsar: The Next Generation Messaging and Queuing System
Apache Pulsar: The Next Generation Messaging and Queuing SystemApache Pulsar: The Next Generation Messaging and Queuing System
Apache Pulsar: The Next Generation Messaging and Queuing System
 
Spark, Tachyon and Mesos internals
Spark, Tachyon and Mesos internalsSpark, Tachyon and Mesos internals
Spark, Tachyon and Mesos internals
 
Performant Streaming in Production: Preventing Common Pitfalls when Productio...
Performant Streaming in Production: Preventing Common Pitfalls when Productio...Performant Streaming in Production: Preventing Common Pitfalls when Productio...
Performant Streaming in Production: Preventing Common Pitfalls when Productio...
 
Real-time Analytics with Presto and Apache Pinot
Real-time Analytics with Presto and Apache PinotReal-time Analytics with Presto and Apache Pinot
Real-time Analytics with Presto and Apache Pinot
 
Next CERN Accelerator Logging Service with Jakub Wozniak
Next CERN Accelerator Logging Service with Jakub WozniakNext CERN Accelerator Logging Service with Jakub Wozniak
Next CERN Accelerator Logging Service with Jakub Wozniak
 
The Revolution Will be Streamed
The Revolution Will be StreamedThe Revolution Will be Streamed
The Revolution Will be Streamed
 
Scaling up uber's real time data analytics
Scaling up uber's real time data analyticsScaling up uber's real time data analytics
Scaling up uber's real time data analytics
 
Using Apache Spark in the Cloud—A Devops Perspective with Telmo Oliveira
Using Apache Spark in the Cloud—A Devops Perspective with Telmo OliveiraUsing Apache Spark in the Cloud—A Devops Perspective with Telmo Oliveira
Using Apache Spark in the Cloud—A Devops Perspective with Telmo Oliveira
 

Similar to Efficiently Triaging CI Pipelines with Apache Spark: Mixing 52 Billion Events/Day of Streaming with 40 TB/Hour of Batch Processing with Ivan Jibaja

Avoiding Log Data Overload in a CI/CD System: Streaming 190 Billion Events an...
Avoiding Log Data Overload in a CI/CD System: Streaming 190 Billion Events an...Avoiding Log Data Overload in a CI/CD System: Streaming 190 Billion Events an...
Avoiding Log Data Overload in a CI/CD System: Streaming 190 Billion Events an...
Databricks
 
Avoiding Log Data Overload in a CI/CD System While Streaming 190 Billion Even...
Avoiding Log Data Overload in a CI/CD System While Streaming 190 Billion Even...Avoiding Log Data Overload in a CI/CD System While Streaming 190 Billion Even...
Avoiding Log Data Overload in a CI/CD System While Streaming 190 Billion Even...
DataWorks Summit
 
Building Resilient and Scalable Data Pipelines by Decoupling Compute and Storage
Building Resilient and Scalable Data Pipelines by Decoupling Compute and StorageBuilding Resilient and Scalable Data Pipelines by Decoupling Compute and Storage
Building Resilient and Scalable Data Pipelines by Decoupling Compute and Storage
Databricks
 
Syngenta's Predictive Analytics Platform for Seeds R&D
Syngenta's Predictive Analytics Platform for Seeds R&DSyngenta's Predictive Analytics Platform for Seeds R&D
Syngenta's Predictive Analytics Platform for Seeds R&D
Michael Swanson
 
Gimel at Dataworks Summit San Jose 2018
Gimel at Dataworks Summit San Jose 2018Gimel at Dataworks Summit San Jose 2018
Gimel at Dataworks Summit San Jose 2018
Romit Mehta
 
Dataworks | 2018-06-20 | Gimel data platform
Dataworks | 2018-06-20 | Gimel data platformDataworks | 2018-06-20 | Gimel data platform
Dataworks | 2018-06-20 | Gimel data platform
Deepak Chandramouli
 
Top 5 Lessons Learned in Deploying AI in the Real World
Top 5 Lessons Learned in Deploying AI in the Real WorldTop 5 Lessons Learned in Deploying AI in the Real World
Top 5 Lessons Learned in Deploying AI in the Real World
Digital Transformation EXPO Event Series
 
Storage for big-data by Joshua Robinson
Storage for big-data by Joshua RobinsonStorage for big-data by Joshua Robinson
Storage for big-data by Joshua Robinson
Data Con LA
 
Why You Need Manageability Now More than Ever and How to Get It
Why You Need Manageability Now More than Ever and How to Get ItWhy You Need Manageability Now More than Ever and How to Get It
Why You Need Manageability Now More than Ever and How to Get It
Gustavo Rene Antunez
 
How Financial Services can Save On File Storage
How Financial Services can Save On File Storage How Financial Services can Save On File Storage
How Financial Services can Save On File Storage
Charly Mostert
 
Architecting for Real-Time Insights with Amazon Kinesis (ANT310) - AWS re:Inv...
Architecting for Real-Time Insights with Amazon Kinesis (ANT310) - AWS re:Inv...Architecting for Real-Time Insights with Amazon Kinesis (ANT310) - AWS re:Inv...
Architecting for Real-Time Insights with Amazon Kinesis (ANT310) - AWS re:Inv...
Amazon Web Services
 
Webinar slides: How to Measure Database Availability?
Webinar slides: How to Measure Database Availability?Webinar slides: How to Measure Database Availability?
Webinar slides: How to Measure Database Availability?
Severalnines
 
Tibco Augmented Intelligence - Analytics, IoT, Big Data, Streaming 20161025
Tibco Augmented Intelligence - Analytics, IoT, Big Data, Streaming 20161025Tibco Augmented Intelligence - Analytics, IoT, Big Data, Streaming 20161025
Tibco Augmented Intelligence - Analytics, IoT, Big Data, Streaming 20161025
Nicola Sandoli
 
Oracle Project Analytics
Oracle Project Analytics Oracle Project Analytics
Oracle Project Analytics
Nitai Partners Inc
 
Building Serverless Analytics Pipelines with AWS Glue (ANT308) - AWS re:Inven...
Building Serverless Analytics Pipelines with AWS Glue (ANT308) - AWS re:Inven...Building Serverless Analytics Pipelines with AWS Glue (ANT308) - AWS re:Inven...
Building Serverless Analytics Pipelines with AWS Glue (ANT308) - AWS re:Inven...
Amazon Web Services
 
Get to Know Your Customers - Build and Innovate with a Modern Data Architecture
Get to Know Your Customers - Build and Innovate with a Modern Data ArchitectureGet to Know Your Customers - Build and Innovate with a Modern Data Architecture
Get to Know Your Customers - Build and Innovate with a Modern Data Architecture
Amazon Web Services
 
Oracle BI Publisher to Transform Cloud ERP Reports
Oracle BI Publisher to Transform Cloud ERP ReportsOracle BI Publisher to Transform Cloud ERP Reports
Oracle BI Publisher to Transform Cloud ERP Reports
Mahesh Vallampati
 
Accelerate AI/ML Adoption with Intel Processors and C3IoT on AWS (AIM386-S) -...
Accelerate AI/ML Adoption with Intel Processors and C3IoT on AWS (AIM386-S) -...Accelerate AI/ML Adoption with Intel Processors and C3IoT on AWS (AIM386-S) -...
Accelerate AI/ML Adoption with Intel Processors and C3IoT on AWS (AIM386-S) -...
Amazon Web Services
 
Learning to Rank: From Theory to Production - Malvina Josephidou & Diego Cecc...
Learning to Rank: From Theory to Production - Malvina Josephidou & Diego Cecc...Learning to Rank: From Theory to Production - Malvina Josephidou & Diego Cecc...
Learning to Rank: From Theory to Production - Malvina Josephidou & Diego Cecc...
Lucidworks
 
Case Study: Tableau Integration with CA Project & Portfolio Management
Case Study: Tableau Integration with CA Project & Portfolio ManagementCase Study: Tableau Integration with CA Project & Portfolio Management
Case Study: Tableau Integration with CA Project & Portfolio Management
CA Technologies
 

Similar to Efficiently Triaging CI Pipelines with Apache Spark: Mixing 52 Billion Events/Day of Streaming with 40 TB/Hour of Batch Processing with Ivan Jibaja (20)

Avoiding Log Data Overload in a CI/CD System: Streaming 190 Billion Events an...
Avoiding Log Data Overload in a CI/CD System: Streaming 190 Billion Events an...Avoiding Log Data Overload in a CI/CD System: Streaming 190 Billion Events an...
Avoiding Log Data Overload in a CI/CD System: Streaming 190 Billion Events an...
 
Avoiding Log Data Overload in a CI/CD System While Streaming 190 Billion Even...
Avoiding Log Data Overload in a CI/CD System While Streaming 190 Billion Even...Avoiding Log Data Overload in a CI/CD System While Streaming 190 Billion Even...
Avoiding Log Data Overload in a CI/CD System While Streaming 190 Billion Even...
 
Building Resilient and Scalable Data Pipelines by Decoupling Compute and Storage
Building Resilient and Scalable Data Pipelines by Decoupling Compute and StorageBuilding Resilient and Scalable Data Pipelines by Decoupling Compute and Storage
Building Resilient and Scalable Data Pipelines by Decoupling Compute and Storage
 
Syngenta's Predictive Analytics Platform for Seeds R&D
Syngenta's Predictive Analytics Platform for Seeds R&DSyngenta's Predictive Analytics Platform for Seeds R&D
Syngenta's Predictive Analytics Platform for Seeds R&D
 
Gimel at Dataworks Summit San Jose 2018
Gimel at Dataworks Summit San Jose 2018Gimel at Dataworks Summit San Jose 2018
Gimel at Dataworks Summit San Jose 2018
 
Dataworks | 2018-06-20 | Gimel data platform
Dataworks | 2018-06-20 | Gimel data platformDataworks | 2018-06-20 | Gimel data platform
Dataworks | 2018-06-20 | Gimel data platform
 
Top 5 Lessons Learned in Deploying AI in the Real World
Top 5 Lessons Learned in Deploying AI in the Real WorldTop 5 Lessons Learned in Deploying AI in the Real World
Top 5 Lessons Learned in Deploying AI in the Real World
 
Storage for big-data by Joshua Robinson
Storage for big-data by Joshua RobinsonStorage for big-data by Joshua Robinson
Storage for big-data by Joshua Robinson
 
Why You Need Manageability Now More than Ever and How to Get It
Why You Need Manageability Now More than Ever and How to Get ItWhy You Need Manageability Now More than Ever and How to Get It
Why You Need Manageability Now More than Ever and How to Get It
 
How Financial Services can Save On File Storage
How Financial Services can Save On File Storage How Financial Services can Save On File Storage
How Financial Services can Save On File Storage
 
Architecting for Real-Time Insights with Amazon Kinesis (ANT310) - AWS re:Inv...
Architecting for Real-Time Insights with Amazon Kinesis (ANT310) - AWS re:Inv...Architecting for Real-Time Insights with Amazon Kinesis (ANT310) - AWS re:Inv...
Architecting for Real-Time Insights with Amazon Kinesis (ANT310) - AWS re:Inv...
 
Webinar slides: How to Measure Database Availability?
Webinar slides: How to Measure Database Availability?Webinar slides: How to Measure Database Availability?
Webinar slides: How to Measure Database Availability?
 
Tibco Augmented Intelligence - Analytics, IoT, Big Data, Streaming 20161025
Tibco Augmented Intelligence - Analytics, IoT, Big Data, Streaming 20161025Tibco Augmented Intelligence - Analytics, IoT, Big Data, Streaming 20161025
Tibco Augmented Intelligence - Analytics, IoT, Big Data, Streaming 20161025
 
Oracle Project Analytics
Oracle Project Analytics Oracle Project Analytics
Oracle Project Analytics
 
Building Serverless Analytics Pipelines with AWS Glue (ANT308) - AWS re:Inven...
Building Serverless Analytics Pipelines with AWS Glue (ANT308) - AWS re:Inven...Building Serverless Analytics Pipelines with AWS Glue (ANT308) - AWS re:Inven...
Building Serverless Analytics Pipelines with AWS Glue (ANT308) - AWS re:Inven...
 
Get to Know Your Customers - Build and Innovate with a Modern Data Architecture
Get to Know Your Customers - Build and Innovate with a Modern Data ArchitectureGet to Know Your Customers - Build and Innovate with a Modern Data Architecture
Get to Know Your Customers - Build and Innovate with a Modern Data Architecture
 
Oracle BI Publisher to Transform Cloud ERP Reports
Oracle BI Publisher to Transform Cloud ERP ReportsOracle BI Publisher to Transform Cloud ERP Reports
Oracle BI Publisher to Transform Cloud ERP Reports
 
Accelerate AI/ML Adoption with Intel Processors and C3IoT on AWS (AIM386-S) -...
Accelerate AI/ML Adoption with Intel Processors and C3IoT on AWS (AIM386-S) -...Accelerate AI/ML Adoption with Intel Processors and C3IoT on AWS (AIM386-S) -...
Accelerate AI/ML Adoption with Intel Processors and C3IoT on AWS (AIM386-S) -...
 
Learning to Rank: From Theory to Production - Malvina Josephidou & Diego Cecc...
Learning to Rank: From Theory to Production - Malvina Josephidou & Diego Cecc...Learning to Rank: From Theory to Production - Malvina Josephidou & Diego Cecc...
Learning to Rank: From Theory to Production - Malvina Josephidou & Diego Cecc...
 
Case Study: Tableau Integration with CA Project & Portfolio Management
Case Study: Tableau Integration with CA Project & Portfolio ManagementCase Study: Tableau Integration with CA Project & Portfolio Management
Case Study: Tableau Integration with CA Project & Portfolio Management
 

More from Databricks

DW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptxDW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptx
Databricks
 
Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1
Databricks
 
Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2
Databricks
 
Data Lakehouse Symposium | Day 2
Data Lakehouse Symposium | Day 2Data Lakehouse Symposium | Day 2
Data Lakehouse Symposium | Day 2
Databricks
 
Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4
Databricks
 
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
Databricks
 
Democratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized PlatformDemocratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized Platform
Databricks
 
Learn to Use Databricks for Data Science
Learn to Use Databricks for Data ScienceLearn to Use Databricks for Data Science
Learn to Use Databricks for Data Science
Databricks
 
Why APM Is Not the Same As ML Monitoring
Why APM Is Not the Same As ML MonitoringWhy APM Is Not the Same As ML Monitoring
Why APM Is Not the Same As ML Monitoring
Databricks
 
The Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
The Function, the Context, and the Data—Enabling ML Ops at Stitch FixThe Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
The Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
Databricks
 
Stage Level Scheduling Improving Big Data and AI Integration
Stage Level Scheduling Improving Big Data and AI IntegrationStage Level Scheduling Improving Big Data and AI Integration
Stage Level Scheduling Improving Big Data and AI Integration
Databricks
 
Simplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorchSimplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorch
Databricks
 
Scaling your Data Pipelines with Apache Spark on Kubernetes
Scaling your Data Pipelines with Apache Spark on KubernetesScaling your Data Pipelines with Apache Spark on Kubernetes
Scaling your Data Pipelines with Apache Spark on Kubernetes
Databricks
 
Scaling and Unifying SciKit Learn and Apache Spark Pipelines
Scaling and Unifying SciKit Learn and Apache Spark PipelinesScaling and Unifying SciKit Learn and Apache Spark Pipelines
Scaling and Unifying SciKit Learn and Apache Spark Pipelines
Databricks
 
Sawtooth Windows for Feature Aggregations
Sawtooth Windows for Feature AggregationsSawtooth Windows for Feature Aggregations
Sawtooth Windows for Feature Aggregations
Databricks
 
Redis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen SinkRedis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Databricks
 
Re-imagine Data Monitoring with whylogs and Spark
Re-imagine Data Monitoring with whylogs and SparkRe-imagine Data Monitoring with whylogs and Spark
Re-imagine Data Monitoring with whylogs and Spark
Databricks
 
Raven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction QueriesRaven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction Queries
Databricks
 
Processing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache SparkProcessing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache Spark
Databricks
 
Massive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta LakeMassive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta Lake
Databricks
 

More from Databricks (20)

DW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptxDW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptx
 
Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1
 
Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2
 
Data Lakehouse Symposium | Day 2
Data Lakehouse Symposium | Day 2Data Lakehouse Symposium | Day 2
Data Lakehouse Symposium | Day 2
 
Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4
 
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
 
Democratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized PlatformDemocratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized Platform
 
Learn to Use Databricks for Data Science
Learn to Use Databricks for Data ScienceLearn to Use Databricks for Data Science
Learn to Use Databricks for Data Science
 
Why APM Is Not the Same As ML Monitoring
Why APM Is Not the Same As ML MonitoringWhy APM Is Not the Same As ML Monitoring
Why APM Is Not the Same As ML Monitoring
 
The Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
The Function, the Context, and the Data—Enabling ML Ops at Stitch FixThe Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
The Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
 
Stage Level Scheduling Improving Big Data and AI Integration
Stage Level Scheduling Improving Big Data and AI IntegrationStage Level Scheduling Improving Big Data and AI Integration
Stage Level Scheduling Improving Big Data and AI Integration
 
Simplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorchSimplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorch
 
Scaling your Data Pipelines with Apache Spark on Kubernetes
Scaling your Data Pipelines with Apache Spark on KubernetesScaling your Data Pipelines with Apache Spark on Kubernetes
Scaling your Data Pipelines with Apache Spark on Kubernetes
 
Scaling and Unifying SciKit Learn and Apache Spark Pipelines
Scaling and Unifying SciKit Learn and Apache Spark PipelinesScaling and Unifying SciKit Learn and Apache Spark Pipelines
Scaling and Unifying SciKit Learn and Apache Spark Pipelines
 
Sawtooth Windows for Feature Aggregations
Sawtooth Windows for Feature AggregationsSawtooth Windows for Feature Aggregations
Sawtooth Windows for Feature Aggregations
 
Redis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen SinkRedis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
 
Re-imagine Data Monitoring with whylogs and Spark
Re-imagine Data Monitoring with whylogs and SparkRe-imagine Data Monitoring with whylogs and Spark
Re-imagine Data Monitoring with whylogs and Spark
 
Raven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction QueriesRaven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction Queries
 
Processing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache SparkProcessing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache Spark
 
Massive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta LakeMassive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta Lake
 

Recently uploaded

一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单
一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单
一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单
ewymefz
 
原版制作(swinburne毕业证书)斯威本科技大学毕业证毕业完成信一模一样
原版制作(swinburne毕业证书)斯威本科技大学毕业证毕业完成信一模一样原版制作(swinburne毕业证书)斯威本科技大学毕业证毕业完成信一模一样
原版制作(swinburne毕业证书)斯威本科技大学毕业证毕业完成信一模一样
u86oixdj
 
Everything you wanted to know about LIHTC
Everything you wanted to know about LIHTCEverything you wanted to know about LIHTC
Everything you wanted to know about LIHTC
Roger Valdez
 
一比一原版(爱大毕业证书)爱丁堡大学毕业证如何办理
一比一原版(爱大毕业证书)爱丁堡大学毕业证如何办理一比一原版(爱大毕业证书)爱丁堡大学毕业证如何办理
一比一原版(爱大毕业证书)爱丁堡大学毕业证如何办理
g4dpvqap0
 
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
Timothy Spann
 
一比一原版(BCU毕业证书)伯明翰城市大学毕业证如何办理
一比一原版(BCU毕业证书)伯明翰城市大学毕业证如何办理一比一原版(BCU毕业证书)伯明翰城市大学毕业证如何办理
一比一原版(BCU毕业证书)伯明翰城市大学毕业证如何办理
dwreak4tg
 
Best best suvichar in gujarati english meaning of this sentence as Silk road ...
Best best suvichar in gujarati english meaning of this sentence as Silk road ...Best best suvichar in gujarati english meaning of this sentence as Silk road ...
Best best suvichar in gujarati english meaning of this sentence as Silk road ...
AbhimanyuSinha9
 
Criminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdfCriminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdf
Criminal IP
 
一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理
一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理
一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理
oz8q3jxlp
 
做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样
做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样
做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样
axoqas
 
一比一原版(CBU毕业证)卡普顿大学毕业证如何办理
一比一原版(CBU毕业证)卡普顿大学毕业证如何办理一比一原版(CBU毕业证)卡普顿大学毕业证如何办理
一比一原版(CBU毕业证)卡普顿大学毕业证如何办理
ahzuo
 
一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理
一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理
一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理
mbawufebxi
 
Data Centers - Striving Within A Narrow Range - Research Report - MCG - May 2...
Data Centers - Striving Within A Narrow Range - Research Report - MCG - May 2...Data Centers - Striving Within A Narrow Range - Research Report - MCG - May 2...
Data Centers - Striving Within A Narrow Range - Research Report - MCG - May 2...
pchutichetpong
 
哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样
哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样
哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样
axoqas
 
Quantitative Data AnalysisReliability Analysis (Cronbach Alpha) Common Method...
Quantitative Data AnalysisReliability Analysis (Cronbach Alpha) Common Method...Quantitative Data AnalysisReliability Analysis (Cronbach Alpha) Common Method...
Quantitative Data AnalysisReliability Analysis (Cronbach Alpha) Common Method...
2023240532
 
一比一原版(NYU毕业证)纽约大学毕业证成绩单
一比一原版(NYU毕业证)纽约大学毕业证成绩单一比一原版(NYU毕业证)纽约大学毕业证成绩单
一比一原版(NYU毕业证)纽约大学毕业证成绩单
ewymefz
 
The affect of service quality and online reviews on customer loyalty in the E...
The affect of service quality and online reviews on customer loyalty in the E...The affect of service quality and online reviews on customer loyalty in the E...
The affect of service quality and online reviews on customer loyalty in the E...
jerlynmaetalle
 
一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理
一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理
一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理
ahzuo
 
Adjusting OpenMP PageRank : SHORT REPORT / NOTES
Adjusting OpenMP PageRank : SHORT REPORT / NOTESAdjusting OpenMP PageRank : SHORT REPORT / NOTES
Adjusting OpenMP PageRank : SHORT REPORT / NOTES
Subhajit Sahu
 
Q1’2024 Update: MYCI’s Leap Year Rebound
Q1’2024 Update: MYCI’s Leap Year ReboundQ1’2024 Update: MYCI’s Leap Year Rebound
Q1’2024 Update: MYCI’s Leap Year Rebound
Oppotus
 

Recently uploaded (20)

一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单
一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单
一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单
 
原版制作(swinburne毕业证书)斯威本科技大学毕业证毕业完成信一模一样
原版制作(swinburne毕业证书)斯威本科技大学毕业证毕业完成信一模一样原版制作(swinburne毕业证书)斯威本科技大学毕业证毕业完成信一模一样
原版制作(swinburne毕业证书)斯威本科技大学毕业证毕业完成信一模一样
 
Everything you wanted to know about LIHTC
Everything you wanted to know about LIHTCEverything you wanted to know about LIHTC
Everything you wanted to know about LIHTC
 
一比一原版(爱大毕业证书)爱丁堡大学毕业证如何办理
一比一原版(爱大毕业证书)爱丁堡大学毕业证如何办理一比一原版(爱大毕业证书)爱丁堡大学毕业证如何办理
一比一原版(爱大毕业证书)爱丁堡大学毕业证如何办理
 
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
 
一比一原版(BCU毕业证书)伯明翰城市大学毕业证如何办理
一比一原版(BCU毕业证书)伯明翰城市大学毕业证如何办理一比一原版(BCU毕业证书)伯明翰城市大学毕业证如何办理
一比一原版(BCU毕业证书)伯明翰城市大学毕业证如何办理
 
Best best suvichar in gujarati english meaning of this sentence as Silk road ...
Best best suvichar in gujarati english meaning of this sentence as Silk road ...Best best suvichar in gujarati english meaning of this sentence as Silk road ...
Best best suvichar in gujarati english meaning of this sentence as Silk road ...
 
Criminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdfCriminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdf
 
一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理
一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理
一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理
 
做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样
做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样
做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样
 
一比一原版(CBU毕业证)卡普顿大学毕业证如何办理
一比一原版(CBU毕业证)卡普顿大学毕业证如何办理一比一原版(CBU毕业证)卡普顿大学毕业证如何办理
一比一原版(CBU毕业证)卡普顿大学毕业证如何办理
 
一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理
一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理
一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理
 
Data Centers - Striving Within A Narrow Range - Research Report - MCG - May 2...
Data Centers - Striving Within A Narrow Range - Research Report - MCG - May 2...Data Centers - Striving Within A Narrow Range - Research Report - MCG - May 2...
Data Centers - Striving Within A Narrow Range - Research Report - MCG - May 2...
 
哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样
哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样
哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样
 
Quantitative Data AnalysisReliability Analysis (Cronbach Alpha) Common Method...
Quantitative Data AnalysisReliability Analysis (Cronbach Alpha) Common Method...Quantitative Data AnalysisReliability Analysis (Cronbach Alpha) Common Method...
Quantitative Data AnalysisReliability Analysis (Cronbach Alpha) Common Method...
 
一比一原版(NYU毕业证)纽约大学毕业证成绩单
一比一原版(NYU毕业证)纽约大学毕业证成绩单一比一原版(NYU毕业证)纽约大学毕业证成绩单
一比一原版(NYU毕业证)纽约大学毕业证成绩单
 
The affect of service quality and online reviews on customer loyalty in the E...
The affect of service quality and online reviews on customer loyalty in the E...The affect of service quality and online reviews on customer loyalty in the E...
The affect of service quality and online reviews on customer loyalty in the E...
 
一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理
一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理
一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理
 
Adjusting OpenMP PageRank : SHORT REPORT / NOTES
Adjusting OpenMP PageRank : SHORT REPORT / NOTESAdjusting OpenMP PageRank : SHORT REPORT / NOTES
Adjusting OpenMP PageRank : SHORT REPORT / NOTES
 
Q1’2024 Update: MYCI’s Leap Year Rebound
Q1’2024 Update: MYCI’s Leap Year ReboundQ1’2024 Update: MYCI’s Leap Year Rebound
Q1’2024 Update: MYCI’s Leap Year Rebound
 

Efficiently Triaging CI Pipelines with Apache Spark: Mixing 52 Billion Events/Day of Streaming with 40 TB/Hour of Batch Processing with Ivan Jibaja

  • 1. 1 © 2018 PURE STORAGE INC. PURE PROPRIETARY Efficiently Triaging CI/CD Pipelines with Apache Spark Ivan Jibaja Software Engineer Streaming 80 Billion Events/Day and Batching 40 TB/Hour
  • 2. 2 © 2018 PURE STORAGE INC. PURE PROPRIETARY Log Analytics Pipeline in Numbers ü900k - 1M events / second ü5 seconds SLA ü0.5 - 1 PB of data / day
  • 3. 3 © 2018 PURE STORAGE INC. PURE PROPRIETARY Continuous Integration & Continuous Deployment Source Build Functional Test Stress Test Deploy
  • 4. 4 © 2018 PURE STORAGE INC. PURE PROPRIETARY < 5 1 Test coordinator (Jenkins) < 10 < 10 CI/CD works! 100s tests / day < 5 failures Email developer
  • 5. 5 © 2018 PURE STORAGE INC. PURE PROPRIETARY 700 failures x 15 min 70,000+ tests / day 20 Triage Engineers 2x in the next 12 months 1500+ VMs 250+ FBs 20+ Jenkins 700+ clients 100+ Engineers Scale Problems
  • 6. 6 © 2018 PURE STORAGE INC. PURE PROPRIETARY Log Analysis Dream 1. Automate triaging of failures 2. Extract performance metrics 3. Save our logs for future use 4. Do all of this in a scalable system 5. Real-time results!
  • 7. 7 © 2018 PURE STORAGE INC. PURE PROPRIETARY Log Analysis Volume Value
  • 8. 8 © 2018 PURE STORAGE INC. PURE PROPRIETARY Log Analysis v1 Volume Value Save Alert / Take action
  • 9. 9 © 2018 PURE STORAGE INC. PURE PROPRIETARY Log Analysis v2 Volume Value Save ETL / Add Structure Alert / Take action
  • 10. 10 © 2018 PURE STORAGE INC. PURE PROPRIETARY Log Analysis v3 Volume Value Save Aggregate / Search ETL / Add Structure Alert / Take action
  • 11. 11 © 2018 PURE STORAGE INC. PURE PROPRIETARY Log Analysis v10 Volume Value Save Aggregate / Search ETL / Add Structure Alert / Take action
  • 12. 12 © 2018 PURE STORAGE INC. PURE PROPRIETARY Log Analysis Pipeline Augment & Centralize LogSources Index Aggregate Transform Logic Timeseries DB AlertStore Visualize
  • 13. 13 © 2018 PURE STORAGE INC. PURE PROPRIETARY Log Analysis Pipeline Augment & Centralize LogSources Aggregate Transform Logic Timeseries DB AlertStore Visualize Index
  • 14. 14 © 2018 PURE STORAGE INC. PURE PROPRIETARY Log Analysis Pipeline Augment & Centralize LogSources Streaming Buffer Filter Store Aggregate Transform Logic Timeseries DB Alert Visualize Index
  • 15. 15 © 2018 PURE STORAGE INC. PURE PROPRIETARY Log Analysis Pipeline Augment & Centralize LogSources Streaming Buffer Filter Store Re-Filter Aggregate Transform Logic Timeseries DB Alert Visualize Index
  • 16. 16 © 2018 PURE STORAGE INC. PURE PROPRIETARY Log Analysis Pipeline Augment & Centralize LogSources Streaming Buffer Filter Store Aggregate Transform Logic Timeseries DB Alert Visualize Index Re-Filter
  • 17. 17 © 2018 PURE STORAGE INC. PURE PROPRIETARY Log Analysis Pipeline rsyslog LogSources Streaming Buffer Filter Store Re-Filter Aggregate Transform Logic Timeseries DB Alert Visualize Index
  • 18. 18 © 2018 PURE STORAGE INC. PURE PROPRIETARY Log Analysis Pipeline rsyslog LogSources Filter Re-Filter Timeseries DB Alert Aggregate Transform Logic Visualize Index
  • 19. 19 © 2018 PURE STORAGE INC. PURE PROPRIETARY Log Analysis Pipeline rsyslog LogSources Timeseries DB Alert Aggregate Transform Logic Visualize Index
  • 20. 20 © 2018 PURE STORAGE INC. PURE PROPRIETARY Log Analysis Pipeline rsyslog LogSources Timeseries DB Alert Aggregate Transform Logic Visualize
  • 21. 21 © 2018 PURE STORAGE INC. PURE PROPRIETARY Log Analysis Pipeline rsyslog LogSources Timeseries DB Alert Visualize
  • 22. 22 © 2018 PURE STORAGE INC. PURE PROPRIETARY Log Analysis Pipeline rsyslog LogSources
  • 23. 23 © 2018 PURE STORAGE INC. PURE PROPRIETARY FULL PIPELINE 1,500+ VMs 250+ FBs 20+ Jenkins 700+ clients 27T 12 12 12 12 12 12 12 12 12 12 27T 12 12 12 12 12 12 12 12 12 12 12 12 12 12 70,000+ tests / day 9T rsyslog 16 16 16 16 16 16 300G 12 12 12 12 12 12 ü Duplicate bug ü Infrastructure failure ü Performance regression
  • 24. 24 © 2018 PURE STORAGE INC. PURE PROPRIETARY FULL PIPELINE 1,500+ VMs 250+ FBs 20+ Jenkins 700+ clients 27T 12 12 12 12 12 12 12 12 12 12 27T 12 12 12 12 12 12 12 12 12 12 12 12 12 12 70,000+ tests / day 9T rsyslog 16 16 16 16 16 16 300G 12 12 12 12 12 12 ü Duplicate bug ü Infrastructure failure ü Performance regression81T 12 12 12 12 12 12 30G
  • 25. 25 © 2018 PURE STORAGE INC. PURE PROPRIETARY FULL PIPELINE 1,500+ VMs 250+ FBs 20+ Jenkins 700+ clients 27T 12 12 12 12 12 12 12 12 12 12 27T 12 12 12 12 12 12 12 12 12 12 12 12 12 12 70,000+ tests / day 9T rsyslog 16 16 16 16 16 16 300G 12 12 12 12 12 12 ü Duplicate bug ü Infrastructure failure ü Performance regression81T 12 12 12 12 12 12 30G 50G 12 12 12 12189T ü Low level details ü Easy to read graphs
  • 26. 26 © 2018 PURE STORAGE INC. PURE PROPRIETARY Takeaways ü Index only what you need, store the rest (in a storage layer that scales in throughput and to billions of files/objects) ü Disaggregation of compute and storage for scalability of subsystems
  • 27. 27 © 2018 PURE STORAGE INC. PURE PROPRIETARY QUESTIONS?