JOSHUA ROBINSON
FLASHBLADE ENGINEERING, PURE STORAGE
From Big Data To Big Intelligence:
Spark Meets FlashBlade
A Pure Engineering Use Case
© 2017 PURE STORAGE INC.
ALL-FLASH STORAGE FOR DATA-INTENSIVE COMPUTING
FLASHBLADE FOR BIG DATA ANALYTICS
FAST DATA X BIG DATA X AGILE DATA = DATA ADVANTAGE
TRADITIONAL DATA ANALYTICS EXAMPLE
[Diagram labels: SALES FEED/PIPELINE, CRM, ENGINEERING TICKETS; EXTRACT, AGGREGATE; ANALYTICS PARAMETERS; PRODUCT LOGS; RAW LOG STORAGE; GREP, AWK, ETC; STORAGE; COMPUTE]
MODERN BIG DATA ANALYTICS EXAMPLE
[Diagram labels: SALES FEED/PIPELINE, CRM, ENGINEERING TICKETS, PRODUCT LOGS; EXTRACT, AGGREGATE]
INFRASTRUCTURE OF BIG DATA WAREHOUSES
>6 PB ACROSS 100s OF HETEROGENEOUS DATA SILOS
BIG DATA WAREHOUSES

INFRASTRUCTURE
Compute
Storage
BIG FAST SIMPLE
INTRODUCING FLASHBLADE™
ALL-FLASH FILE AND OBJECT STORAGE
BIG: Up to 8 PB
FAST: 75 GB/s, 8M IOPS
SIMPLE: Seamlessly scalable
BLADE, PURITY, FABRIC
Automating triage of test failures in SW development
A Pure Engineering Use Case
THE PROBLEM
[Diagram labels: Handful (three groups); 1 Test coordinator (Jenkins); 100s of tests]
THE PROBLEM
1,000 test failures
20,000+ tests / day
20 Engineers
2x in the next 12 months
1,000+ VMs
120+ FBs
20+ Jenkins
400+ clients
100+ Engineers
THE DREAM
1.  Automate triaging of failures as much as possible
2.  Extract performance metrics from the logs
3.  Save our logs for future use
4.  Do all of this in a scalable system
5.  Real-time results!
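As a rough illustration of goal 2, the sketch below pulls numeric metrics out of log lines and averages them per test. It is plain Python rather than Spark, and the log layout (timestamp, test name, then `key=value` pairs), the metric names, and the sample lines are all invented for this example; the deck does not show the real log format or the pipeline code.

```python
import re
from collections import defaultdict
from statistics import mean

# Hypothetical log layout, assumed for illustration only:
#   <timestamp> <test_name> <metric>=<value> [<metric>=<value> ...]
METRIC_RE = re.compile(r"(?P<key>[a-z_]+)=(?P<value>\d+(?:\.\d+)?)")


def extract_metrics(line):
    """Return (test_name, {metric: value}), or None if the line has no metrics."""
    parts = line.split(maxsplit=2)
    if len(parts) < 3:
        return None
    _, test_name, rest = parts
    metrics = {m.group("key"): float(m.group("value"))
               for m in METRIC_RE.finditer(rest)}
    return (test_name, metrics) if metrics else None


def summarize(lines):
    """Average every metric per test across all log lines."""
    samples = defaultdict(lambda: defaultdict(list))
    for line in lines:
        parsed = extract_metrics(line)
        if parsed is None:
            continue  # line carries no metrics, skip it
        test_name, metrics = parsed
        for key, value in metrics.items():
            samples[test_name][key].append(value)
    return {test: {key: mean(values) for key, values in per_test.items()}
            for test, per_test in samples.items()}


# Made-up sample lines.
SAMPLE_LOG = [
    "2017-06-01T10:00:00 test_write latency_ms=4.0 iops=1000",
    "2017-06-01T10:00:01 test_write latency_ms=6.0 iops=3000",
    "2017-06-01T10:00:02 test_read starting up",
]

summary = summarize(SAMPLE_LOG)
print(summary)
```

The per-line function has no shared state, so the same logic could in principle be applied as a map step over a distributed collection of log lines.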
OUR DATA ANALYTICS PIPELINE
[Pipeline diagram labels: 10 FB, 20 clients, 100+ tests; rsyslog; node labels 12]
OUR DATA ANALYTICS PIPELINE
[Pipeline diagram labels: 100 FB, 200 clients, 1,000+ tests; rsyslog; node labels 12]
OUR DATA ANALYTICS PIPELINE
[Pipeline diagram labels: 120+ FB, 400+ clients, 4,000+ tests; rsyslog; node labels 12]
OUR DATA ANALYTICS PIPELINE
[Pipeline diagram labels: 1,000+ VMs, 120+ FBs, 20+ Jenkins, 400+ clients, 20,000+ tests; rsyslog; node labels 16, 12, 40; data volumes 18T, 18T, 6T, 6G, 6G]
Custom code
✓ Duplicate bug
✓ Infrastructure failure
✓ Performance regression
Processed: 18 TB, 30 billion events per day
Extracted: 6 GB, 8 million events per day
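To put the daily figures in perspective, the short calculation below converts them into average rates and reduction ratios. It assumes decimal units (1 TB = 10^12 bytes, 1 GB = 10^9 bytes) and a load spread evenly over 24 hours; real traffic is presumably burstier, so these are averages only.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

# Figures from the slide; decimal units are an assumption.
processed_bytes = 18 * 10**12   # 18 TB per day
processed_events = 30 * 10**9   # 30 billion events per day
extracted_bytes = 6 * 10**9     # 6 GB per day
extracted_events = 8 * 10**6    # 8 million events per day

avg_ingest_mb_per_s = processed_bytes / SECONDS_PER_DAY / 10**6
avg_events_per_s = processed_events / SECONDS_PER_DAY
avg_event_size_bytes = processed_bytes / processed_events
byte_reduction = processed_bytes / extracted_bytes
event_reduction = processed_events / extracted_events

print(f"average ingest rate:  {avg_ingest_mb_per_s:.0f} MB/s")
print(f"average event rate:   {avg_events_per_s:,.0f} events/s")
print(f"average event size:   {avg_event_size_bytes:.0f} bytes")
print(f"byte reduction:       {byte_reduction:,.0f}x")
print(f"event reduction:      {event_reduction:,.0f}x")
```

Under these assumptions the pipeline averages roughly 208 MB/s and about 347,000 events per second, with an average event of 600 bytes, and the extraction step keeps about 1 byte in 3,000 and 1 event in 3,750.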
THE POWER OF DATA ANALYTICS
20,000+ tests: 1,000 test failures
20,000+ tests: ~30 distinct test failures
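One way a large number of raw failures can collapse into a small number of distinct ones is to normalize each failure message into a signature and group by it. The sketch below shows that idea in plain Python. The normalization rules and the sample messages are assumptions made for illustration; the deck does not describe how the actual custom code identifies duplicates.

```python
import re
from collections import Counter


def signature(message):
    """Collapse run-specific details so equivalent failures compare equal."""
    sig = message.lower()
    sig = re.sub(r"0x[0-9a-f]+", "<hex>", sig)  # addresses and handles
    sig = re.sub(r"\d+", "<n>", sig)            # ids, ports, durations
    sig = re.sub(r"\s+", " ", sig).strip()      # whitespace differences
    return sig


def distinct_failures(messages):
    """Count how many raw failures fall under each signature."""
    return Counter(signature(m) for m in messages)


# Made-up failure messages.
FAILURES = [
    "Timeout after 30s waiting for blade 7",
    "Timeout after 45s waiting for blade 12",
    "Assertion failed at 0x7ffe12ab in worker 3",
    "assertion failed at 0x1c00beef in worker 9",
    "Connection refused by host 10.0.0.5",
]

groups = distinct_failures(FAILURES)
for sig, count in groups.most_common():
    print(count, sig)
```

Here five raw failures reduce to three signatures. An engineer then triages one representative per signature instead of every failing test.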
Data Analytics Pipeline
Shared Storage Benefits
SCALING STORAGE
FLASHBLADE GUI
SCALING COMPUTE
OUR DATA ANALYTICS PIPELINE
[Pipeline diagram labels: 1,000+ VMs, 120+ FBs, 20+ Jenkins, 400+ clients, 20,000+ tests; rsyslog; node labels 16, 12, 40; data volumes 18T, 18T, 6T, 4G, 4G; Custom code]
ANALYTICS PIPELINE
SCALING COMPUTE

1.  Download Docker image
2.  Mount FlashBlade on container
3.  Hot-add to Spark cluster
ALL OF THIS CAN BE DONE IN A SINGLE COMMAND WITHOUT DISRUPTING YOUR SPARK JOBS!
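The three steps above could be chained into one shell command along the following lines. This is a sketch under several assumptions, not the command used by the presenters: it assumes a standalone-mode Spark cluster, a FlashBlade file system already mounted on the host over NFS at a known path, and Spark installed under `/opt/spark` inside the image. The image name, master URL, and mount paths are hypothetical. The helper only builds the command string and does not run it.

```python
import shlex


def hot_add_worker_command(image, master_url, host_mount, container_mount):
    """Build one shell command that pulls the image and starts a Spark worker.

    The worker runs in a container with the shared storage bind-mounted,
    then registers itself with the standalone master at master_url.
    """
    pull = ["docker", "pull", image]
    run = [
        "docker", "run", "--detach",
        "--volume", f"{host_mount}:{container_mount}",
        image,
        "/opt/spark/bin/spark-class",               # assumed install path
        "org.apache.spark.deploy.worker.Worker",
        master_url,
    ]
    return " && ".join(shlex.join(cmd) for cmd in (pull, run))


# All values below are hypothetical placeholders.
command = hot_add_worker_command(
    image="example/spark-worker:2.1",
    master_url="spark://spark-master:7077",
    host_mount="/mnt/flashblade",
    container_mount="/mnt/flashblade",
)
print(command)
```

Because every worker sees the same shared mount, a new container needs no local copy of the data before it can take tasks, which is what makes adding compute independent of storage in this design.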
OUR DATA ANALYTICS PIPELINE
[Pipeline diagram labels: 1,000+ VMs, 120+ FBs, 20+ Jenkins, 400+ clients, 20,000+ tests; rsyslog; node labels 16, 12, 40; data volumes 18T, 18T, 6T, 6G, 6G; Custom code]
INFRASTRUCTURE AGILITY
AD-HOC AND BURSTY ANALYTICS
rsyslog
1000-CORE SPARK CLUSTER ON FLASHBLADE
INFRASTRUCTURE SIMPLICITY
⎯  Physical Consolidation
    ⎯  Density: Multiple racks to a single FlashBlade
⎯  Management Consolidation
    ⎯  Non-disruptive upgrades
    ⎯  Storage capacity planning, data access, security
    ⎯  Backups and Restores
FLASHBLADE

File & Object AND 2.5 PBs (1:1)
N+2 REDUNDANCY
Purity PLUS Pure1
BLADES: 17TB, 52TB
Power: 1150 Watt/PB
PERFORMANCE: 8M IOPs AND 75 GB/s
RESOURCES

Apache Spark White Papers:
1) Guide to Supporting On-Premise Spark Deployments with a Cloud-Scale Data Platform
2) Engineering Unplugged: A Discussion with Pure Storage's Brian Gold on Big Data Analytics for Apache Spark
Big Data Analytics: purestorage.com/analytics
FlashBlade Product Info: purestorage.com/flashblade
Storage for big-data by Joshua Robinson

More Related Content

What's hot

RedisConf17- Zettaset + Redis - Protecting Redis Enterprise while Maintaining...
RedisConf17- Zettaset + Redis - Protecting Redis Enterprise while Maintaining...RedisConf17- Zettaset + Redis - Protecting Redis Enterprise while Maintaining...
RedisConf17- Zettaset + Redis - Protecting Redis Enterprise while Maintaining...Redis Labs
 
Serverless data lake architecture
Serverless data lake architectureServerless data lake architecture
Serverless data lake architectureMaik Wiesmüller
 
Intel: How to Use Alluxio to Accelerate BigData Analytics on the Cloud and Ne...
Intel: How to Use Alluxio to Accelerate BigData Analytics on the Cloud and Ne...Intel: How to Use Alluxio to Accelerate BigData Analytics on the Cloud and Ne...
Intel: How to Use Alluxio to Accelerate BigData Analytics on the Cloud and Ne...Alluxio, Inc.
 
WekaIO: Making Machine Learning Compute Bound Again
WekaIO: Making Machine Learning Compute Bound AgainWekaIO: Making Machine Learning Compute Bound Again
WekaIO: Making Machine Learning Compute Bound Againinside-BigData.com
 
Presto + Alluxio on steroids a romantic drama on Production with happy end
Presto + Alluxio on steroids a romantic drama on Production with happy endPresto + Alluxio on steroids a romantic drama on Production with happy end
Presto + Alluxio on steroids a romantic drama on Production with happy endAlluxio, Inc.
 
Kickstart your data strategy for 2018: Getting started with Amazon Redshift
Kickstart your data strategy for 2018: Getting started with Amazon RedshiftKickstart your data strategy for 2018: Getting started with Amazon Redshift
Kickstart your data strategy for 2018: Getting started with Amazon RedshiftMatillion
 
Advancing Open Software Defined Storage
Advancing Open Software Defined Storage Advancing Open Software Defined Storage
Advancing Open Software Defined Storage Red_Hat_Storage
 
Presto: Fast SQL-on-Anything Across Data Lakes, DBMS, and NoSQL Data Stores
Presto: Fast SQL-on-Anything Across Data Lakes, DBMS, and NoSQL Data StoresPresto: Fast SQL-on-Anything Across Data Lakes, DBMS, and NoSQL Data Stores
Presto: Fast SQL-on-Anything Across Data Lakes, DBMS, and NoSQL Data StoresAlluxio, Inc.
 
Cornami Accelerates Performance on SPARK: Spark Summit East talk by Paul Master
Cornami Accelerates Performance on SPARK: Spark Summit East talk by Paul MasterCornami Accelerates Performance on SPARK: Spark Summit East talk by Paul Master
Cornami Accelerates Performance on SPARK: Spark Summit East talk by Paul MasterSpark Summit
 
ALLUXIO (formerly Tachyon): Unify Data at Memory Speed - Effective using Spar...
ALLUXIO (formerly Tachyon): Unify Data at Memory Speed - Effective using Spar...ALLUXIO (formerly Tachyon): Unify Data at Memory Speed - Effective using Spar...
ALLUXIO (formerly Tachyon): Unify Data at Memory Speed - Effective using Spar...Alluxio, Inc.
 
IoT Architectural Overview - 3 use case studies from InfluxData
IoT Architectural Overview - 3 use case studies from InfluxData IoT Architectural Overview - 3 use case studies from InfluxData
IoT Architectural Overview - 3 use case studies from InfluxData InfluxData
 
Apache Spark and Apache Ignite: Where Fast Data Meets IoT
Apache Spark and Apache Ignite: Where Fast Data Meets IoTApache Spark and Apache Ignite: Where Fast Data Meets IoT
Apache Spark and Apache Ignite: Where Fast Data Meets IoTDenis Magda
 
What's Next for Google's BigTable
What's Next for Google's BigTableWhat's Next for Google's BigTable
What's Next for Google's BigTableSqrrl
 
RedisConf17 - Turbo-charge your apps with Amazon Elasticache for Redis
RedisConf17 - Turbo-charge your apps with Amazon Elasticache for RedisRedisConf17 - Turbo-charge your apps with Amazon Elasticache for Redis
RedisConf17 - Turbo-charge your apps with Amazon Elasticache for RedisRedis Labs
 
Best Practices for Using Alluxio with Spark
Best Practices for Using Alluxio with SparkBest Practices for Using Alluxio with Spark
Best Practices for Using Alluxio with SparkAlluxio, Inc.
 
Data Orchestration for AI, Big Data, and Cloud
Data Orchestration for AI, Big Data, and CloudData Orchestration for AI, Big Data, and Cloud
Data Orchestration for AI, Big Data, and CloudAlluxio, Inc.
 
Lenovo: Elastic Stack Practices in Enterprise Integration
Lenovo: Elastic Stack Practices in Enterprise IntegrationLenovo: Elastic Stack Practices in Enterprise Integration
Lenovo: Elastic Stack Practices in Enterprise IntegrationElasticsearch
 
Performance Models for Apache Accumulo
Performance Models for Apache AccumuloPerformance Models for Apache Accumulo
Performance Models for Apache AccumuloSqrrl
 
Meetup at AI NextCon 2019: In-Stream data process, Data Orchestration & More
Meetup at AI NextCon 2019: In-Stream data process, Data Orchestration & MoreMeetup at AI NextCon 2019: In-Stream data process, Data Orchestration & More
Meetup at AI NextCon 2019: In-Stream data process, Data Orchestration & MoreAlluxio, Inc.
 
Ceph used in Cancer Research at OICR
Ceph used in Cancer Research at OICRCeph used in Cancer Research at OICR
Ceph used in Cancer Research at OICRCeph Community
 

What's hot (20)

RedisConf17- Zettaset + Redis - Protecting Redis Enterprise while Maintaining...
RedisConf17- Zettaset + Redis - Protecting Redis Enterprise while Maintaining...RedisConf17- Zettaset + Redis - Protecting Redis Enterprise while Maintaining...
RedisConf17- Zettaset + Redis - Protecting Redis Enterprise while Maintaining...
 
Serverless data lake architecture
Serverless data lake architectureServerless data lake architecture
Serverless data lake architecture
 
Intel: How to Use Alluxio to Accelerate BigData Analytics on the Cloud and Ne...
Intel: How to Use Alluxio to Accelerate BigData Analytics on the Cloud and Ne...Intel: How to Use Alluxio to Accelerate BigData Analytics on the Cloud and Ne...
Intel: How to Use Alluxio to Accelerate BigData Analytics on the Cloud and Ne...
 
WekaIO: Making Machine Learning Compute Bound Again
WekaIO: Making Machine Learning Compute Bound AgainWekaIO: Making Machine Learning Compute Bound Again
WekaIO: Making Machine Learning Compute Bound Again
 
Presto + Alluxio on steroids a romantic drama on Production with happy end
Presto + Alluxio on steroids a romantic drama on Production with happy endPresto + Alluxio on steroids a romantic drama on Production with happy end
Presto + Alluxio on steroids a romantic drama on Production with happy end
 
Kickstart your data strategy for 2018: Getting started with Amazon Redshift
Kickstart your data strategy for 2018: Getting started with Amazon RedshiftKickstart your data strategy for 2018: Getting started with Amazon Redshift
Kickstart your data strategy for 2018: Getting started with Amazon Redshift
 
Advancing Open Software Defined Storage
Advancing Open Software Defined Storage Advancing Open Software Defined Storage
Advancing Open Software Defined Storage
 
Presto: Fast SQL-on-Anything Across Data Lakes, DBMS, and NoSQL Data Stores
Presto: Fast SQL-on-Anything Across Data Lakes, DBMS, and NoSQL Data StoresPresto: Fast SQL-on-Anything Across Data Lakes, DBMS, and NoSQL Data Stores
Presto: Fast SQL-on-Anything Across Data Lakes, DBMS, and NoSQL Data Stores
 
Cornami Accelerates Performance on SPARK: Spark Summit East talk by Paul Master
Cornami Accelerates Performance on SPARK: Spark Summit East talk by Paul MasterCornami Accelerates Performance on SPARK: Spark Summit East talk by Paul Master
Cornami Accelerates Performance on SPARK: Spark Summit East talk by Paul Master
 
ALLUXIO (formerly Tachyon): Unify Data at Memory Speed - Effective using Spar...
ALLUXIO (formerly Tachyon): Unify Data at Memory Speed - Effective using Spar...ALLUXIO (formerly Tachyon): Unify Data at Memory Speed - Effective using Spar...
ALLUXIO (formerly Tachyon): Unify Data at Memory Speed - Effective using Spar...
 
IoT Architectural Overview - 3 use case studies from InfluxData
IoT Architectural Overview - 3 use case studies from InfluxData IoT Architectural Overview - 3 use case studies from InfluxData
IoT Architectural Overview - 3 use case studies from InfluxData
 
Apache Spark and Apache Ignite: Where Fast Data Meets IoT
Apache Spark and Apache Ignite: Where Fast Data Meets IoTApache Spark and Apache Ignite: Where Fast Data Meets IoT
Apache Spark and Apache Ignite: Where Fast Data Meets IoT
 
What's Next for Google's BigTable
What's Next for Google's BigTableWhat's Next for Google's BigTable
What's Next for Google's BigTable
 
RedisConf17 - Turbo-charge your apps with Amazon Elasticache for Redis
RedisConf17 - Turbo-charge your apps with Amazon Elasticache for RedisRedisConf17 - Turbo-charge your apps with Amazon Elasticache for Redis
RedisConf17 - Turbo-charge your apps with Amazon Elasticache for Redis
 
Best Practices for Using Alluxio with Spark
Best Practices for Using Alluxio with SparkBest Practices for Using Alluxio with Spark
Best Practices for Using Alluxio with Spark
 
Data Orchestration for AI, Big Data, and Cloud
Data Orchestration for AI, Big Data, and CloudData Orchestration for AI, Big Data, and Cloud
Data Orchestration for AI, Big Data, and Cloud
 
Lenovo: Elastic Stack Practices in Enterprise Integration
Lenovo: Elastic Stack Practices in Enterprise IntegrationLenovo: Elastic Stack Practices in Enterprise Integration
Lenovo: Elastic Stack Practices in Enterprise Integration
 
Performance Models for Apache Accumulo
Performance Models for Apache AccumuloPerformance Models for Apache Accumulo
Performance Models for Apache Accumulo
 
Meetup at AI NextCon 2019: In-Stream data process, Data Orchestration & More
Meetup at AI NextCon 2019: In-Stream data process, Data Orchestration & MoreMeetup at AI NextCon 2019: In-Stream data process, Data Orchestration & More
Meetup at AI NextCon 2019: In-Stream data process, Data Orchestration & More
 
Ceph used in Cancer Research at OICR
Ceph used in Cancer Research at OICRCeph used in Cancer Research at OICR
Ceph used in Cancer Research at OICR
 

Similar to Storage for big-data by Joshua Robinson

GTC Taiwan 2017 如何在充滿未知的巨量數據時代中建構一個數據中心
GTC Taiwan 2017 如何在充滿未知的巨量數據時代中建構一個數據中心GTC Taiwan 2017 如何在充滿未知的巨量數據時代中建構一個數據中心
GTC Taiwan 2017 如何在充滿未知的巨量數據時代中建構一個數據中心NVIDIA Taiwan
 
Building Resilient and Scalable Data Pipelines by Decoupling Compute and Storage
Building Resilient and Scalable Data Pipelines by Decoupling Compute and StorageBuilding Resilient and Scalable Data Pipelines by Decoupling Compute and Storage
Building Resilient and Scalable Data Pipelines by Decoupling Compute and StorageDatabricks
 
Avoiding Log Data Overload in a CI/CD System: Streaming 190 Billion Events an...
Avoiding Log Data Overload in a CI/CD System: Streaming 190 Billion Events an...Avoiding Log Data Overload in a CI/CD System: Streaming 190 Billion Events an...
Avoiding Log Data Overload in a CI/CD System: Streaming 190 Billion Events an...Databricks
 
Data at the corner of SAP and AWS
Data at the corner of SAP and AWSData at the corner of SAP and AWS
Data at the corner of SAP and AWSOcean9, Inc.
 
Efficiently Triaging CI Pipelines with Apache Spark: Mixing 52 Billion Events...
Efficiently Triaging CI Pipelines with Apache Spark: Mixing 52 Billion Events...Efficiently Triaging CI Pipelines with Apache Spark: Mixing 52 Billion Events...
Efficiently Triaging CI Pipelines with Apache Spark: Mixing 52 Billion Events...Databricks
 
Avoiding Log Data Overload in a CI/CD System While Streaming 190 Billion Even...
Avoiding Log Data Overload in a CI/CD System While Streaming 190 Billion Even...Avoiding Log Data Overload in a CI/CD System While Streaming 190 Billion Even...
Avoiding Log Data Overload in a CI/CD System While Streaming 190 Billion Even...DataWorks Summit
 
TiVo: How to Scale New Products with a Data Lake on AWS and Qubole
 TiVo: How to Scale New Products with a Data Lake on AWS and Qubole TiVo: How to Scale New Products with a Data Lake on AWS and Qubole
TiVo: How to Scale New Products with a Data Lake on AWS and QuboleAmazon Web Services
 
TiVo: How to Scale New Products with a Data Lake on AWS and Qubole
 TiVo: How to Scale New Products with a Data Lake on AWS and Qubole TiVo: How to Scale New Products with a Data Lake on AWS and Qubole
TiVo: How to Scale New Products with a Data Lake on AWS and QuboleAmazon Web Services
 
Flink Forward Berlin 2017: Gyula Fora - Building and operating large-scale st...
Flink Forward Berlin 2017: Gyula Fora - Building and operating large-scale st...Flink Forward Berlin 2017: Gyula Fora - Building and operating large-scale st...
Flink Forward Berlin 2017: Gyula Fora - Building and operating large-scale st...Flink Forward
 
Recipe for Success: The Right Ingredients for Enterprise-Class Cloud Data Man...
Recipe for Success: The Right Ingredients for Enterprise-Class Cloud Data Man...Recipe for Success: The Right Ingredients for Enterprise-Class Cloud Data Man...
Recipe for Success: The Right Ingredients for Enterprise-Class Cloud Data Man...Amazon Web Services
 
Aerospike Meetup - Nielsen Customer Story - Alex - 04 March 2020
Aerospike Meetup - Nielsen Customer Story - Alex - 04 March 2020Aerospike Meetup - Nielsen Customer Story - Alex - 04 March 2020
Aerospike Meetup - Nielsen Customer Story - Alex - 04 March 2020Aerospike
 
Postgres Vision 2018: Taking Postgres Everywhere
Postgres Vision 2018: Taking Postgres EverywherePostgres Vision 2018: Taking Postgres Everywhere
Postgres Vision 2018: Taking Postgres EverywhereEDB
 
Bridging Your Business Across the Enterprise and Cloud with MongoDB and NetApp
Bridging Your Business Across the Enterprise and Cloud with MongoDB and NetAppBridging Your Business Across the Enterprise and Cloud with MongoDB and NetApp
Bridging Your Business Across the Enterprise and Cloud with MongoDB and NetAppMongoDB
 
Big Data LDN 2018: LESSONS LEARNED FROM DEPLOYING REAL-WORLD AI SYSTEMS
Big Data LDN 2018: LESSONS LEARNED FROM DEPLOYING REAL-WORLD AI SYSTEMSBig Data LDN 2018: LESSONS LEARNED FROM DEPLOYING REAL-WORLD AI SYSTEMS
Big Data LDN 2018: LESSONS LEARNED FROM DEPLOYING REAL-WORLD AI SYSTEMSMatt Stubbs
 
Demystifying Data Warehouse as a Service
Demystifying Data Warehouse as a ServiceDemystifying Data Warehouse as a Service
Demystifying Data Warehouse as a ServiceSnowflake Computing
 
Demystifying Data Warehousing as a Service - DFW
Demystifying Data Warehousing as a Service - DFWDemystifying Data Warehousing as a Service - DFW
Demystifying Data Warehousing as a Service - DFWKent Graziano
 
Cisco Kinetic. Раскрывая ценность данных
Cisco Kinetic. Раскрывая ценность данныхCisco Kinetic. Раскрывая ценность данных
Cisco Kinetic. Раскрывая ценность данныхCisco Russia
 
Aerospike Meetup - Introduction - Ami - 04 March 2020
Aerospike Meetup - Introduction - Ami - 04 March 2020Aerospike Meetup - Introduction - Ami - 04 March 2020
Aerospike Meetup - Introduction - Ami - 04 March 2020Aerospike
 
Architecting a Modern Data Warehouse: Enterprise Must-Haves
Architecting a Modern Data Warehouse: Enterprise Must-HavesArchitecting a Modern Data Warehouse: Enterprise Must-Haves
Architecting a Modern Data Warehouse: Enterprise Must-HavesYellowbrick Data
 

Similar to Storage for big-data by Joshua Robinson (20)

GTC Taiwan 2017 如何在充滿未知的巨量數據時代中建構一個數據中心
GTC Taiwan 2017 如何在充滿未知的巨量數據時代中建構一個數據中心GTC Taiwan 2017 如何在充滿未知的巨量數據時代中建構一個數據中心
GTC Taiwan 2017 如何在充滿未知的巨量數據時代中建構一個數據中心
 
Building Resilient and Scalable Data Pipelines by Decoupling Compute and Storage
Building Resilient and Scalable Data Pipelines by Decoupling Compute and StorageBuilding Resilient and Scalable Data Pipelines by Decoupling Compute and Storage
Building Resilient and Scalable Data Pipelines by Decoupling Compute and Storage
 
Avoiding Log Data Overload in a CI/CD System: Streaming 190 Billion Events an...
Avoiding Log Data Overload in a CI/CD System: Streaming 190 Billion Events an...Avoiding Log Data Overload in a CI/CD System: Streaming 190 Billion Events an...
Avoiding Log Data Overload in a CI/CD System: Streaming 190 Billion Events an...
 
Data at the corner of SAP and AWS
Data at the corner of SAP and AWSData at the corner of SAP and AWS
Data at the corner of SAP and AWS
 
Efficiently Triaging CI Pipelines with Apache Spark: Mixing 52 Billion Events...
Efficiently Triaging CI Pipelines with Apache Spark: Mixing 52 Billion Events...Efficiently Triaging CI Pipelines with Apache Spark: Mixing 52 Billion Events...
Efficiently Triaging CI Pipelines with Apache Spark: Mixing 52 Billion Events...
 
Avoiding Log Data Overload in a CI/CD System While Streaming 190 Billion Even...
Avoiding Log Data Overload in a CI/CD System While Streaming 190 Billion Even...Avoiding Log Data Overload in a CI/CD System While Streaming 190 Billion Even...
Avoiding Log Data Overload in a CI/CD System While Streaming 190 Billion Even...
 
TiVo: How to Scale New Products with a Data Lake on AWS and Qubole
 TiVo: How to Scale New Products with a Data Lake on AWS and Qubole TiVo: How to Scale New Products with a Data Lake on AWS and Qubole
TiVo: How to Scale New Products with a Data Lake on AWS and Qubole
 
TiVo: How to Scale New Products with a Data Lake on AWS and Qubole
 TiVo: How to Scale New Products with a Data Lake on AWS and Qubole TiVo: How to Scale New Products with a Data Lake on AWS and Qubole
TiVo: How to Scale New Products with a Data Lake on AWS and Qubole
 
Top 5 Lessons Learned in Deploying AI in the Real World
Top 5 Lessons Learned in Deploying AI in the Real WorldTop 5 Lessons Learned in Deploying AI in the Real World
Top 5 Lessons Learned in Deploying AI in the Real World
 
Flink Forward Berlin 2017: Gyula Fora - Building and operating large-scale st...
Flink Forward Berlin 2017: Gyula Fora - Building and operating large-scale st...Flink Forward Berlin 2017: Gyula Fora - Building and operating large-scale st...
Flink Forward Berlin 2017: Gyula Fora - Building and operating large-scale st...
 
Recipe for Success: The Right Ingredients for Enterprise-Class Cloud Data Man...
Recipe for Success: The Right Ingredients for Enterprise-Class Cloud Data Man...Recipe for Success: The Right Ingredients for Enterprise-Class Cloud Data Man...
Recipe for Success: The Right Ingredients for Enterprise-Class Cloud Data Man...
 
Aerospike Meetup - Nielsen Customer Story - Alex - 04 March 2020
Aerospike Meetup - Nielsen Customer Story - Alex - 04 March 2020Aerospike Meetup - Nielsen Customer Story - Alex - 04 March 2020
Aerospike Meetup - Nielsen Customer Story - Alex - 04 March 2020
 
Postgres Vision 2018: Taking Postgres Everywhere
Postgres Vision 2018: Taking Postgres EverywherePostgres Vision 2018: Taking Postgres Everywhere
Postgres Vision 2018: Taking Postgres Everywhere
 
Bridging Your Business Across the Enterprise and Cloud with MongoDB and NetApp
Bridging Your Business Across the Enterprise and Cloud with MongoDB and NetAppBridging Your Business Across the Enterprise and Cloud with MongoDB and NetApp
Bridging Your Business Across the Enterprise and Cloud with MongoDB and NetApp
 
Big Data LDN 2018: LESSONS LEARNED FROM DEPLOYING REAL-WORLD AI SYSTEMS
Big Data LDN 2018: LESSONS LEARNED FROM DEPLOYING REAL-WORLD AI SYSTEMSBig Data LDN 2018: LESSONS LEARNED FROM DEPLOYING REAL-WORLD AI SYSTEMS
Big Data LDN 2018: LESSONS LEARNED FROM DEPLOYING REAL-WORLD AI SYSTEMS
 
Demystifying Data Warehouse as a Service
Demystifying Data Warehouse as a ServiceDemystifying Data Warehouse as a Service
Demystifying Data Warehouse as a Service
 
Demystifying Data Warehousing as a Service - DFW
Demystifying Data Warehousing as a Service - DFWDemystifying Data Warehousing as a Service - DFW
Demystifying Data Warehousing as a Service - DFW
 
Cisco Kinetic. Раскрывая ценность данных
Cisco Kinetic. Раскрывая ценность данныхCisco Kinetic. Раскрывая ценность данных
Cisco Kinetic. Раскрывая ценность данных
 
Aerospike Meetup - Introduction - Ami - 04 March 2020
Aerospike Meetup - Introduction - Ami - 04 March 2020Aerospike Meetup - Introduction - Ami - 04 March 2020
Aerospike Meetup - Introduction - Ami - 04 March 2020
 
Architecting a Modern Data Warehouse: Enterprise Must-Haves
Architecting a Modern Data Warehouse: Enterprise Must-HavesArchitecting a Modern Data Warehouse: Enterprise Must-Haves
Architecting a Modern Data Warehouse: Enterprise Must-Haves
 

More from Data Con LA

Data Con LA 2022 Keynotes
Data Con LA 2022 KeynotesData Con LA 2022 Keynotes
Data Con LA 2022 KeynotesData Con LA
 
Data Con LA 2022 Keynotes
Data Con LA 2022 KeynotesData Con LA 2022 Keynotes
Data Con LA 2022 KeynotesData Con LA
 
Data Con LA 2022 Keynote
Data Con LA 2022 KeynoteData Con LA 2022 Keynote
Data Con LA 2022 KeynoteData Con LA
 
Data Con LA 2022 - Startup Showcase
Data Con LA 2022 - Startup ShowcaseData Con LA 2022 - Startup Showcase
Data Con LA 2022 - Startup ShowcaseData Con LA
 
Data Con LA 2022 Keynote
Data Con LA 2022 KeynoteData Con LA 2022 Keynote
Data Con LA 2022 KeynoteData Con LA
 
Data Con LA 2022 - Using Google trends data to build product recommendations
Data Con LA 2022 - Using Google trends data to build product recommendationsData Con LA 2022 - Using Google trends data to build product recommendations
Data Con LA 2022 - Using Google trends data to build product recommendationsData Con LA
 
Data Con LA 2022 - AI Ethics
Data Con LA 2022 - AI EthicsData Con LA 2022 - AI Ethics
Data Con LA 2022 - AI EthicsData Con LA
 
Data Con LA 2022 - Improving disaster response with machine learning
Data Con LA 2022 - Improving disaster response with machine learningData Con LA 2022 - Improving disaster response with machine learning
Data Con LA 2022 - Improving disaster response with machine learningData Con LA
 
Data Con LA 2022 - What's new with MongoDB 6.0 and Atlas
Data Con LA 2022 - What's new with MongoDB 6.0 and AtlasData Con LA 2022 - What's new with MongoDB 6.0 and Atlas
Data Con LA 2022 - What's new with MongoDB 6.0 and AtlasData Con LA
 
Data Con LA 2022 - Real world consumer segmentation
Data Con LA 2022 - Real world consumer segmentationData Con LA 2022 - Real world consumer segmentation
Data Con LA 2022 - Real world consumer segmentationData Con LA
 
Data Con LA 2022 - Modernizing Analytics & AI for today's needs: Intuit Turbo...
Data Con LA 2022 - Modernizing Analytics & AI for today's needs: Intuit Turbo...Data Con LA 2022 - Modernizing Analytics & AI for today's needs: Intuit Turbo...
Data Con LA 2022 - Modernizing Analytics & AI for today's needs: Intuit Turbo...Data Con LA
 
Data Con LA 2022 - Moving Data at Scale to AWS
Data Con LA 2022 - Moving Data at Scale to AWSData Con LA 2022 - Moving Data at Scale to AWS
Data Con LA 2022 - Moving Data at Scale to AWSData Con LA
 
Data Con LA 2022 - Collaborative Data Exploration using Conversational AI
Data Con LA 2022 - Collaborative Data Exploration using Conversational AIData Con LA 2022 - Collaborative Data Exploration using Conversational AI
Data Con LA 2022 - Collaborative Data Exploration using Conversational AIData Con LA
 
Data Con LA 2022 - Why Database Modernization Makes Your Data Decisions More ...
Data Con LA 2022 - Why Database Modernization Makes Your Data Decisions More ...Data Con LA 2022 - Why Database Modernization Makes Your Data Decisions More ...
Data Con LA 2022 - Why Database Modernization Makes Your Data Decisions More ...Data Con LA
 
Data Con LA 2022 - Intro to Data Science
Data Con LA 2022 - Intro to Data ScienceData Con LA 2022 - Intro to Data Science
Data Con LA 2022 - Intro to Data ScienceData Con LA
 
Data Con LA 2022 - How are NFTs and DeFi Changing Entertainment
Data Con LA 2022 - How are NFTs and DeFi Changing EntertainmentData Con LA 2022 - How are NFTs and DeFi Changing Entertainment
Data Con LA 2022 - How are NFTs and DeFi Changing EntertainmentData Con LA
 
Data Con LA 2022 - Why Data Quality vigilance requires an End-to-End, Automat...
Data Con LA 2022 - Why Data Quality vigilance requires an End-to-End, Automat...Data Con LA 2022 - Why Data Quality vigilance requires an End-to-End, Automat...
Data Con LA 2022 - Why Data Quality vigilance requires an End-to-End, Automat...Data Con LA
 
Data Con LA 2022-Perfect Viral Ad prediction of Superbowl 2022 using Tease, T...
Data Con LA 2022-Perfect Viral Ad prediction of Superbowl 2022 using Tease, T...Data Con LA 2022-Perfect Viral Ad prediction of Superbowl 2022 using Tease, T...
Data Con LA 2022-Perfect Viral Ad prediction of Superbowl 2022 using Tease, T...Data Con LA
 
Data Con LA 2022- Embedding medical journeys with machine learning to improve...
Data Con LA 2022- Embedding medical journeys with machine learning to improve...Data Con LA 2022- Embedding medical journeys with machine learning to improve...
Data Con LA 2022- Embedding medical journeys with machine learning to improve...Data Con LA
 
Data Con LA 2022 - Data Streaming with Kafka
Data Con LA 2022 - Data Streaming with KafkaData Con LA 2022 - Data Streaming with Kafka
Data Con LA 2022 - Data Streaming with KafkaData Con LA
 

More from Data Con LA (20)

Data Con LA 2022 Keynotes
Data Con LA 2022 KeynotesData Con LA 2022 Keynotes
Data Con LA 2022 Keynotes
 
Data Con LA 2022 Keynotes
Data Con LA 2022 KeynotesData Con LA 2022 Keynotes
Data Con LA 2022 Keynotes
 
Data Con LA 2022 Keynote
Data Con LA 2022 KeynoteData Con LA 2022 Keynote
Data Con LA 2022 Keynote
 
Data Con LA 2022 - Startup Showcase
Data Con LA 2022 - Startup ShowcaseData Con LA 2022 - Startup Showcase
Data Con LA 2022 - Startup Showcase
 
Data Con LA 2022 Keynote
Storage for big-data by Joshua Robinson

  • 1. From Big Data To Big Intelligence: Spark Meets FlashBlade. A Pure Engineering Use Case. JOSHUA ROBINSON, FLASHBLADE ENGINEERING, PURE STORAGE
  • 2. ALL-FLASH STORAGE FOR DATA-INTENSIVE COMPUTING
  • 3. FLASHBLADE FOR BIG DATA ANALYTICS: FAST DATA x BIG DATA x AGILE DATA = DATA ADVANTAGE
  • 4. TRADITIONAL DATA ANALYTICS EXAMPLE: SALES FEED/PIPELINE; CRM; ENGINEERING TICKETS; EXTRACT; AGGREGATE; ANALYTICS PARAMETERS; PRODUCT LOGS; RAW LOG STORAGE; GREP, AWK, ETC.; STORAGE; COMPUTE
  • 5. MODERN BIG DATA ANALYTICS EXAMPLE: SALES FEED/PIPELINE; CRM; ENGINEERING TICKETS; PRODUCT LOGS; EXTRACT; AGGREGATE
  • 6. INFRASTRUCTURE OF BIG DATA WAREHOUSES: >6PBs ACROSS 100s OF HETEROGENEOUS DATA SILOS
  • 7. BIG DATA WAREHOUSES INFRASTRUCTURE: Compute; Storage; BIG, FAST, SIMPLE
  • 8. INTRODUCING FLASHBLADE™, ALL-FLASH FILE AND OBJECT STORAGE. BIG: Up to 8 PBs. FAST: 75 GBps / 8M IOPS. SIMPLE: Seamlessly scalable. BLADE, PURITY, FABRIC
  • 9. Automating triage of test failures in SW development: A Pure Engineering Use Case
  • 10. THE PROBLEM (diagram): 1 Test coordinator (Jenkins); three groups each labeled "Handful"; 100s of tests
  • 11. THE PROBLEM: 1,000 test failures; 20,000+ tests / day; 20 Engineers; 2x in the next 12 months; 1000+ VMs; 120+ FBs; 20+ Jenkins; 400+ clients; 100+ Engineers
  • 12. THE DREAM: 1. Automate triaging of failures as much as possible; 2. Extract performance metrics from the logs; 3. Save our logs for future use; 4. Do all of this in a scalable system; 5. Real-time results!
  • 13. OUR DATA ANALYTICS PIPELINE (diagram): 10 FB; 20 clients; 100+ tests; rsyslog
  • 14. OUR DATA ANALYTICS PIPELINE (diagram): 100 FB; 200 clients; 1,000+ tests; rsyslog
  • 15. OUR DATA ANALYTICS PIPELINE (diagram): 100 FB; 200 clients; 1,000+ tests; rsyslog
  • 16. OUR DATA ANALYTICS PIPELINE (diagram): 120+ FB; 400+ clients; 4,000+ tests; rsyslog
  • 17. OUR DATA ANALYTICS PIPELINE (diagram): 1,000+ VMs; 120+ FBs; 20+ Jenkins; 400+ clients; 20,000+ tests; rsyslog; labels 18T, 18T, 6T, 6G; Custom code: ✓ Duplicate bug ✓ Infrastructure failure ✓ Performance regression
  • 19. Processed: 18 TB, 30 billion events per day. Extracted: 6 GB, 8 million events per day.
  • 20. THE POWER OF DATA ANALYTICS: 20,000+ tests, 1,000 test failures; Data Analytics Pipeline; 20,000+ tests, ~30 distinct test failures
  • 22. SCALING STORAGE
  • 23. FLASHBLADE GUI
  • 24. SCALING COMPUTE
  • 25. OUR DATA ANALYTICS PIPELINE (diagram): 1,000+ VMs; 120+ FBs; 20+ Jenkins; 400+ clients; 20,000+ tests; rsyslog; labels 18T, 18T, 6T, 4G; Custom code
  • 26. ANALYTICS PIPELINE SCALING COMPUTE: 1. Download Docker image; 2. Mount FlashBlade on container; 3. Hot-add to Spark cluster. ALL OF THIS CAN BE DONE IN A SINGLE COMMAND WITHOUT DISRUPTING YOUR SPARK JOBS!
  • 27. OUR DATA ANALYTICS PIPELINE (diagram): 1000+ VMs; 120+ FBs; 20+ Jenkins; 400+ clients; 20,000+ tests; rsyslog; labels 18T, 18T, 6T, 6G; Custom code
  • 28. INFRASTRUCTURE AGILITY: AD-HOC AND BURSTY ANALYTICS; rsyslog; 1000-CORE SPARK CLUSTER ON FLASHBLADE
  • 29. INFRASTRUCTURE SIMPLICITY: Physical Consolidation; Density: Multiple racks to a single FlashBlade; Management Consolidation; Non-disruptive upgrades; Storage capacity planning, data access, security; Backups and Restores
  • 30. FLASHBLADE: File & Object AND 2.5 PBs (1:1); N+2 REDUNDANCY; Purity PLUS Pure1; 17TB, 52TB BLADES; Power 1150Watt/PB; 8M IOPs AND 75 GB/s PERFORMANCE
  • 31. RESOURCES. Apache Spark White Papers: 1) Guide to Supporting On-Premise Spark Deployments with a Cloud-Scale Data Platform; 2) Engineering Unplugged: A Discussion with Pure Storage's Brian Gold on Big Data Analytics for Apache Spark. Big Data Analytics: purestorage.com/analytics. FlashBlade Product Info: purestorage.com/flashblade
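
Slides 17 and 20 describe custom code that reduces roughly 1,000 daily test failures to about 30 distinct ones, but the deck does not show that code. As a minimal sketch of one way such grouping could work, assuming failure messages differ mainly in run-specific numbers and addresses, the Python below normalizes each message into a signature and groups tests by it. The regex patterns, function names, and sample messages are all hypothetical and are not taken from the Pure Storage pipeline.

```python
import re
from collections import defaultdict

# Hypothetical normalization rules; the slides do not show the real ones.
# Order matters: hex addresses are replaced before bare digit runs.
_VOLATILE = [
    (re.compile(r"0x[0-9a-fA-F]+"), "<HEX>"),
    (re.compile(r"\d+"), "<N>"),
]


def signature(message: str) -> str:
    """Replace run-specific tokens so similar failures share one key."""
    for pattern, token in _VOLATILE:
        message = pattern.sub(token, message)
    return message.strip().lower()


def group_failures(failures):
    """Map each signature to the list of test names that produced it."""
    groups = defaultdict(list)
    for test_name, message in failures:
        groups[signature(message)].append(test_name)
    return dict(groups)


# Made-up sample data for illustration only.
failures = [
    ("test_a", "Timeout after 30 s waiting for blade 7"),
    ("test_b", "Timeout after 45 s waiting for blade 12"),
    ("test_c", "Assertion failed at 0x7ffe12ab"),
]
groups = group_failures(failures)
print(len(failures), "failures ->", len(groups), "distinct signatures")
```

In a Spark job the same `signature` function could serve as the grouping key over extracted log events; a real triage system would likely also need rules for matching known bugs and infrastructure failures, which this sketch does not attempt.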