Lessons learned from designing
a QA automation event streaming
platform (IoT & Big Data)
Omid Vahdaty, Big Data ninja.
Agenda
● Introduction to Big Data Testing
● Challenges of Big Data Testing.
● Methodological approach for QA Automation road map
● True Story
● The Solution?
● Performance and Maintenance
Introduction to Big Data Testing
How Big is Big Data?
● 100M?
● 1B?
● 10B?
● 100B?
● 1Tr?
How Hard Can it be (aggregation)?
● Select Count(*) from t;
○ Assume
■ 1 Trillion records
■ ad-hoc query (no indexes)
■ Full Scan (no cache)
○ Challenges
■ Time it takes to compute?
■ IO bottleneck? What is IO pattern?
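A rough way to reason about the "time it takes to compute" question is a back-of-envelope lower bound for an IO-bound full scan. The sketch below is only an illustration: the row width, the per-node read throughput and the node counts are hypothetical numbers, not figures from the talk.

```python
def full_scan_seconds(rows, bytes_per_row, node_throughput_bytes_per_sec, nodes=1):
    """Lower bound on full-scan time, assuming the scan is purely IO bound
    and the data is spread evenly across the nodes."""
    total_bytes = rows * bytes_per_row
    return total_bytes / (node_throughput_bytes_per_sec * nodes)


# Assumed numbers, for illustration only.
ROWS = 10**12                    # 1 trillion records, as on the slide
BYTES_PER_ROW = 100              # hypothetical average row width
DISK_THROUGHPUT = 500 * 10**6    # hypothetical 500 MB/s sequential read per node

single = full_scan_seconds(ROWS, BYTES_PER_ROW, DISK_THROUGHPUT)
cluster = full_scan_seconds(ROWS, BYTES_PER_ROW, DISK_THROUGHPUT, nodes=100)
print(f"1 node: {single / 3600:.1f} h, 100 nodes: {cluster / 60:.1f} min")
```

With these assumed numbers the sketch prints `1 node: 55.6 h, 100 nodes: 33.3 min`, which is why the IO pattern and the degree of scale-out matter before any CPU cost is considered.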
How Hard Can it be (join)?
● Select * from t1 join t2 on t1.id = t2.id
○ Assume
■ 1 Trillion records both tables.
■ ad-hoc query (no indexes)
■ Full Scan (no cache)
○ Challenges
■ Time it takes to compute?
■ IO bottleneck? What is IO pattern?
Big Data Challenges
3 Pain Points in Big Data DBMS
● Ingress data rate (into the Big Data system)
● Egress data rate (out of the Big Data system)
● Maintenance rate
Big Data Ingress Challenges
● Parsing
● Rate and Amount of data coming into the system
● ACID: Atomicity, Consistency, Isolation, Durability
● Compression on the fly (non unique values)
● On the fly analytics
● Time constraints (target: x rows per sec/hour)
● High Availability Scenarios
○ Can you afford to lose events/data?
Big Data egress challenges
● Sort
● Group by
● Reduce
● Join (sortMergeJoin)
● Data distribution
● Compression
● Theoretical bottlenecks of hardware.
● What is your engine algorithm doing?
● What is the impact on the system? CPU? IO? RAM? Network?
● What is the impact on the cluster?
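To make the "what is your engine algorithm doing?" question concrete, here is a minimal in-memory sketch of a sort-merge join, the join strategy named in the list above. It is a teaching illustration, not the implementation used by Spark or any other engine; the function name and the sample rows are made up. The two sorts stand in for the CPU, RAM and (at scale) disk spill cost, and the merge is a single sequential pass.

```python
def sort_merge_join(left, right, key=lambda row: row[0]):
    """Inner-join two lists of rows on `key` by sorting both sides and merging."""
    left = sorted(left, key=key)
    right = sorted(right, key=key)
    out = []
    i = j = 0
    while i < len(left) and j < len(right):
        lk, rk = key(left[i]), key(right[j])
        if lk < rk:
            i += 1
        elif lk > rk:
            j += 1
        else:
            # Find the run of right rows sharing this key ...
            j_end = j
            while j_end < len(right) and key(right[j_end]) == lk:
                j_end += 1
            # ... and pair it with every left row that has the same key.
            while i < len(left) and key(left[i]) == lk:
                for r in right[j:j_end]:
                    out.append((left[i], r))
                i += 1
            j = j_end
    return out


t1 = [(2, "b"), (1, "a"), (2, "c")]
t2 = [(2, "x"), (3, "y"), (2, "z")]
joined = sort_merge_join(t1, t2)
```

Only key 2 appears on both sides, so `joined` holds four pairs (two left rows times two right rows). Duplicate keys multiply the output, which is one reason data distribution shows up in the list of egress challenges.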
Egress while Ingress Challenges
● Input never stops
● Input rate oscillations
● Bottlenecks
● System gets stuck at different places in a complex cluster
● Ingress performance impact on egress.
● High Availability Scenarios
○ Can you afford to lose data?
○ How much downtime can you sustain?
Methodological Approach
Event Streaming Platform testing
Business constraints?
● Budget?
● Time to market?
● Resources available?
● Skill set available ?
Product requirements?
● What are the product’s supported use cases?
● Supported Scale?
● Supported rate of ingress?
● Supported rate on egress?
● Complexity of Cluster?
● High availability?
● Cloud or onsite?
The Automation: Must have requirements?
● Scale out/up?
● Fault tolerance!
● Reproducibility from scratch!
● Reporting!
● “Views” to avoid duplication for testing?
● Orchestration of same environment per Developer!
● Debugging options
The method
● Phase 0: get ready
○ Product requirements & business constraints
○ Automation requirements
○ Creating a testing matrix based on insight and product features.
● Phase 1: Get insights (some baselines)
○ Test Ingress separately, find your baseline.
○ Test Egress separately, find your baseline.
○ Test Ingress while Egress in progress, find your baseline.
○ Stability and Performance
● Phase 2: Agile: Design an automation system that satisfies the requirements
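One possible way to turn "find your baseline" into something a test can assert on is to summarize repeated rate measurements and flag runs that fall outside a tolerance band. This is a sketch under assumptions: the sample rates, the 3-sigma tolerance and the function names are all hypothetical.

```python
from statistics import mean, pstdev


def baseline(samples):
    """Summarize repeated measurements (e.g. ingress rows/sec) into a baseline."""
    return {"mean": mean(samples), "stdev": pstdev(samples)}


def within_baseline(rate, base, sigmas=3.0):
    """True if a new measurement is within `sigmas` standard deviations of the baseline."""
    return abs(rate - base["mean"]) <= sigmas * base["stdev"]


# Hypothetical ingress rates (rows/sec) from five runs on the same cluster.
ingress_samples = [1000, 1020, 980, 1010, 990]
ingress_base = baseline(ingress_samples)

ok_run = within_baseline(1030, ingress_base)
slow_run = within_baseline(900, ingress_base)
```

Here `ok_run` is true and `slow_run` is false. The same two functions can be applied separately to the ingress-only, egress-only and ingress-while-egress measurements described in Phase 1.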
True Story
Hadoop based Event Streaming Platform
(Peta Scale)
True story
● Product: Hadoop based event streaming platform. [peta scale]
● Technical Constraints
○ What is the expected impact of big data on the platform?
○ How to debug scenario issues?
○ QA???
○ Running time - how long does it take to run all the tests?
● Company Challenges
○ Cost of Hardware? High end 6U Server
○ Expertise? Skill set?
Technical Flow
1. Start by testing the simplest flow, confirm that it behaves as expected, and then add another layer of complexity. For this step you must understand the product's state machine from A to Z.
2. Start with a single thread (one of each component).
3. Continue with small data and gradual increments. Understand the machine metrics of each component and all the different fine-tuning options in the configuration.
4. Get to the point where you are convinced you have big data (what does your product support?)
a. 0 data, 1M, 10M, 100M, 1GB, 10GB, 100GB, 1TB, 10TB, 100TB, 1PB
5. Start over, this time using multiple threads/components.
6. Start over, this time with negative testing (failover scenarios, data node failures, Flume failures, NameNode failures, Spark failures, etc.).
7. Make sure the logs reflect the accurate status of the product (error/warn/info/debug).
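The "gradual increments" idea in the flow above can be sketched as a ladder that is climbed until the first failing step, so the largest passing size becomes the known supported scale. Everything here is illustrative: the ladder reads the slide's sizes as powers of ten in bytes (an assumption), and `fake_run` stands in for a real test that would load data and check the result.

```python
def climb(ladder, run_step):
    """Run steps in increasing size; return the largest size that passed, or None."""
    last_ok = None
    for size in ladder:
        if not run_step(size):
            break  # stop at the first failure; larger sizes are not attempted
        last_ok = size
    return last_ok


# 0 bytes, then 1 MB up to 1 PB in powers of ten (assumed reading of the slide).
SCALE_LADDER = [0] + [10**exp for exp in range(6, 16)]


def fake_run(size_bytes):
    """Hypothetical stand-in: pretend the product copes with up to 10 GB."""
    return size_bytes <= 10**10


supported = climb(SCALE_LADDER, fake_run)
```

With the stand-in above, `supported` is `10**10` (10 GB). The same ladder can be reused for the single-thread pass, the multi-component pass and the negative-testing pass.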
Testing matrix dimensions
● Multiple components: scale-out of the same component
● Data
○ Different data rates: slow/fast/bursts (delay between packets: slow/fast)
○ What happens if we change the data distribution?
○ Different input sizes (small, large, changing)
● Performance & Metrics
○ What are the machine metrics of collector/Flume/HDFS over time? Get a rule of thumb for processing power.
○ Ingress rate?
○ Heap allocations per Hadoop ecosystem?
● Stability and persistence:
a. What happens if Flume fails? Data loss/stuck on Flume?
b. Recovery from HA
c. Data node loss?
d. NameNode recovery? Persistence of Flume
Ingress Testing matrix
Columns (topology): single collector, single Flume | multi collector, single Flume | multi collector, multi Flume | HA | HDFS
Rows (cluster state):
● New cluster
● Existing data, small cluster
● Existing cluster, big data
● Stability (failure of components)
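A testing matrix like this one can be generated rather than maintained by hand. The sketch below reads the matrix as cluster states crossed with topologies, which is an assumption about the slide's layout; the dictionary keys and the function name are hypothetical.

```python
from itertools import product

TOPOLOGIES = [
    "single collector, single Flume",
    "multi collector, single Flume",
    "multi collector, multi Flume",
    "HA",
    "HDFS",
]
CLUSTER_STATES = [
    "new cluster",
    "existing data, small cluster",
    "existing cluster, big data",
    "stability (failure of components)",
]


def ingress_cases():
    """Yield one test-case descriptor per cell of the ingress matrix."""
    for state, topology in product(CLUSTER_STATES, TOPOLOGIES):
        yield {"cluster_state": state, "topology": topology}


cases = list(ingress_cases())
print(len(cases))  # 4 cluster states x 5 topologies = 20 cells
```

Adding a dimension (for example data rate: slow, fast, bursts) is one more argument to `product`, which also makes the growth in running time easy to see before committing to it.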
Egress Testing Matrix
1. Hadoop
a. HDFS performance/failover/data replication/block size
b. Yarn availability/metrics of jobs
c. spark availability/ metrics of jobs/ performance tuning
2. Assuming some sort of Engine
a. State (marks which files were already processed)
b. Poller (polling of new events)
Egress Testing Matrix
Columns (layer): HDFS | Yarn | Spark | Engine state & poller | Any other layers
Rows (cluster state):
● New cluster
● Existing data, small cluster
● Existing cluster, big data
● Stability (failure of components)
The baseline concept
● Requirements
○ Events data (simulated or real)
○ Cluster
○ Metrics - to understand if behaviour is expected.
○ Do I need to QA the Hadoop ecosystem?
● Steps
○ Create a golden standard - per event type - per building block
○ Permute
The solution
The Solution: Event Flow testing framework.
● TBD
Challenges with Ingress
● Number of collectors
● Events streaming cluster
○ Kafka VS. Flume
○ Hadoop tuning and maintenance
○ Data loss - how to handle
○ Performance of HDFS sink
○ Channel persistency
○ Channel performance
○ Downtime + recovery time
Challenges with Egress
● Hadoop tuning
● Spark Tuning
● Engine Tuning
○ Size of file on HDFS - bigger is better?
○ Processing
■ Recovery from crashes
■ Delayed data
● In order /out of order
Challenges with Big Data testing.
● Generation time of events → genUtil with JSON for user input
● Reproducibility → genUtil again.
● Disk space for testing data → peta scale product → peta scale testing data
● Result comparison mechanism
● Network bottlenecks on a scale out system...
● ORDER IS NOT GUARANTEED!
● Replications
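Two of the points above, reproducibility and the lack of ordering guarantees, can be illustrated together. The sketch below is not the genUtil tool mentioned on the slide; it is a hypothetical stand-in showing a generator driven by a JSON config with a fixed seed, plus a comparison that treats results as a multiset so that arrival order does not matter. All config field names are assumptions.

```python
import json
import random
from collections import Counter


def generate_events(config_json):
    """Generate a deterministic list of events from a JSON config."""
    cfg = json.loads(config_json)
    rng = random.Random(cfg["seed"])  # fixed seed: same events on every run
    events = []
    for i in range(cfg["count"]):
        events.append({
            "id": i,
            "type": rng.choice(cfg["event_types"]),
            "ts": cfg["start_ts"] + i * cfg["interval_sec"],
        })
    return events


def same_events(expected, actual):
    """Compare as multisets, because the pipeline does not guarantee order."""
    def canon(event):
        return json.dumps(event, sort_keys=True)
    return Counter(map(canon, expected)) == Counter(map(canon, actual))


CONFIG = ('{"seed": 7, "count": 5, "event_types": ["click", "view"], '
          '"start_ts": 1000, "interval_sec": 2}')
first = generate_events(CONFIG)
second = generate_events(CONFIG)
shuffled = list(reversed(second))
```

Because the seed is part of the config, `first == second` holds on every run, and `same_events(first, shuffled)` is true even though the order differs. Only the config has to be stored to recreate a data set, which helps with the disk-space point above.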
The solution: Continuous Testing
1. Extra large Sanity Testing per commit (!)
2. Hourly testing on latest commit (24 X 7)
3. Weekly testing
4. Monthly testing
The solution: Hardware perspective
● Cluster utilization? (mesos, dockers)?
● Developer sandbox: Dockers
● Uniform OS
● Uniform Hardware
● Get rid of weak links
Performance Challenges: app perspective
● Architecture bottlenecks [ count(*), group by, compression, sort Merge Join]
● What is IO pattern?
○ OLTP VS OLAP.
○ Columnar or Row based?
○ How big is your data?
○ READ % vs WRITE %.
○ Sequential? random?
○ Temporary VS permanent
Performance and Maintenance
Performance Challenges: OPS perspective
● Metrics
○ What is Expected Total RAM required per Query?
○ OS RAM cache , CPU Cache hits ?
○ SWAP ? yes/no how much?
○ OS metrics - open files, stack size, realtime priority?
● Theoretical limits?
○ Disk type selection -limitation, expected throughput
○ Raid controller, RAID type? Configuration? caching?
○ File system used? DFS? PFS? Ext4? XFS?
Maintenance challenges: OPS Perspective
How to Optimize your hardware selection ? ( Analytics on your testing data)
a. Compute intensive
i. GPU CORE intensive
ii. CPU cores intensive
1. Frequency
2. Number of threads for concurrency
b. IO intensive?
i. SSD?
ii. NearLine sata?
Maintenance challenges: DevOps Perspective
● Lightweight - python code
○ Advantage → focus on generic skill set (python coding)
○ Disadvantage → reinventing the wheel
● Fault tolerance - Scale out topology
○ Cheap “desktops” (weak machines)
■ Advantage →
● quick recovery from Image
● Cheap, low risk, pay as you go.
● Gets 90 % of the job DONE!
Maintenance challenges: DevOps Perspective
● Infrastructure
○ Continuous integration: continuous build, continuous packaging, installer.
○ Continuous Deployment: pre flight check, remote machine (on site, private cloud, cloud)
○ Continuous testing: Sanity, Hourly, Daily, weekly
● Reporting and Monitoring:
○ Extensive report server and analytics on current testing.
○ Monitoring: Green/Red and system real time metrics.
Maintenance challenges: Innovation Perspective
1. Data generation (using GPU to generate data)
2. (hashing)
3. Workload (one dispatcher, many compute nodes)
4. saving running time: data/compute time
a. Hashing
b. Caching
c. Smart data creation - no need to create 100B events to run a 100B-event test.
d. Logs: Test results were managed on a DB
5. Dockers
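The slides do not say what exactly was hashed, so the following is only one plausible reading of "hashing" as a way to save running time: instead of storing and sorting every output record to compare results, keep a small order-independent digest that can be computed while streaming. The function names and sample records are hypothetical.

```python
import hashlib

MASK = (1 << 64) - 1  # keep the running sum within 64 bits


def record_hash(record: bytes) -> int:
    """64-bit hash of one record, taken from the first 8 bytes of its SHA-256."""
    return int.from_bytes(hashlib.sha256(record).digest()[:8], "big")


def stream_digest(records):
    """Order-independent digest: (record count, sum of record hashes mod 2**64).

    Addition is commutative, so any arrival order yields the same digest,
    and memory use stays constant regardless of how many records pass through.
    """
    count = 0
    total = 0
    for rec in records:
        count += 1
        total = (total + record_hash(rec)) & MASK
    return count, total


expected = [b"event-1", b"event-2", b"event-3"]
in_order = stream_digest(expected)
out_of_order = stream_digest(reversed(expected))
truncated = stream_digest(expected[:2])
```

`in_order` equals `out_of_order`, while `truncated` differs because a record is missing. An additive digest like this is meant for catching accidental differences in test runs, not for resisting deliberately crafted collisions.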
Lesson learned (insights)
● Very hard to guarantee 100% coverage
● Quality at a reasonable cost is a huge challenge
● Very easy to miss milestones with QA of big data
● Each mistake creates a huge ripple effect in timelines
● Main costs:
○ Employees: team of 2 people working 220 hours per month over 1 year.
○ Running Time: Coding of innovative algorithms inside testing framework
○ Maintenance: Automation, Monitoring, Analytics
○ Hardware: innovative DevOps , innovative OPS, research
Stay in touch...
● Omid Vahdaty
● +972-54-2384178

 
AWS Big Data Demystified #1: Big data architecture lessons learned
AWS Big Data Demystified #1: Big data architecture lessons learned AWS Big Data Demystified #1: Big data architecture lessons learned
AWS Big Data Demystified #1: Big data architecture lessons learned
 
Emr spark tuning demystified
Emr spark tuning demystifiedEmr spark tuning demystified
Emr spark tuning demystified
 
Emr zeppelin & Livy demystified
Emr zeppelin & Livy demystifiedEmr zeppelin & Livy demystified
Emr zeppelin & Livy demystified
 
Zeppelin and spark sql demystified
Zeppelin and spark sql demystifiedZeppelin and spark sql demystified
Zeppelin and spark sql demystified
 

Lessons learned from designing a QA automation event streaming platform (IoT & Big Data)

  • 1. Lessons learned from designing a QA automation event streaming platform (IoT & Big Data) Omid Vahdaty, Big Data ninja.
  • 2. Agenda ● Introduction to Big Data Testing ● Challenges of Big Data Testing. ● Methodological approach for QA Automation road map ● True Story ● The Solution? ● Performance and Maintenance
  • 3. Introduction to Big Data Testing
  • 4. ● 100M ? ● 1B ? ● 10B? ● 100B? ● 1Tr? How Big is Big Data?
  • 5. How Hard Can it be (aggregation)? ● Select Count(*) from t; ○ Assume ■ 1 Trillion records ■ ad-hoc query (no indexes) ■ Full Scan (no cache) ○ Challenges ■ Time it takes to compute? ■ IO bottleneck? What is IO pattern?
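A back-of-the-envelope sketch of the "time it takes to compute" question for a full scan. The row width (100 bytes) and per-node sequential throughput (500 MB/s) are assumptions chosen for illustration, not figures from the talk, and the model assumes the scan is purely IO bound and parallelizes perfectly across nodes.

```python
def full_scan_seconds(rows: int, bytes_per_row: int,
                      throughput_bytes_per_s: int, nodes: int = 1) -> float:
    """Lower bound on full-scan time for an IO-bound scan with no cache."""
    total_bytes = rows * bytes_per_row
    return total_bytes / (throughput_bytes_per_s * nodes)


ROWS = 10**12              # 1 trillion records, as on the slide
BYTES_PER_ROW = 100        # assumed average row width
DISK_BPS = 500 * 10**6     # assumed 500 MB/s sequential read per node

single_node = full_scan_seconds(ROWS, BYTES_PER_ROW, DISK_BPS)
hundred_nodes = full_scan_seconds(ROWS, BYTES_PER_ROW, DISK_BPS, nodes=100)

print(f"1 node:    {single_node / 3600:.1f} hours")
print(f"100 nodes: {hundred_nodes / 60:.1f} minutes")
```

Under these assumptions the table is about 100 TB, so even a trivial `count(*)` is dominated by how fast the cluster can read from disk.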
  • 6. How Hard Can it be (join)? ● Select * from t1 join t2 on t1.id = t2.id ○ Assume ■ 1 Trillion records in both tables. ■ ad-hoc query (no indexes) ■ Full Scan (no cache) ○ Challenges ■ Time it takes to compute? ■ IO bottleneck? What is IO pattern?
  • 8. 3 Pain Points in Big Data DBMS Big Data System Ingress data rate Egress data rate Maintenance rate
  • 9. Big Data Ingress Challenges ● Parsing ● Rate and Amount of data coming into the system ● ACID: Atomicity, Consistency, Isolation, Durability ● Compression on the fly (non unique values) ● On the fly analytics ● Time constraints (target: x rows per sec/hour) ● High Availability Scenarios ○ Can you afford to lose events/data?
  • 10. Big Data egress challenges ● Sort ● Group by ● Reduce ● Join (sortMergeJoin) ● Data distribution ● Compression ● Theoretical Bottlenecks of hardware. What is your engine algorithm doing? What is the impact on the system? CPU? IO? RAM? Network? What is the impact on the cluster?
  • 11. Egress while Ingress Challenges ● Input never stops ● Input rate oscillations ● Bottlenecks ● System gets stuck at different places in a complex cluster ● Ingress performance impact on egress. ● High Availability Scenarios ○ Can you afford to lose data? ○ How much downtime can you sustain?
  • 13. Business constraints? ● Budget? ● Time to market? ● Resources available? ● Skill set available ?
  • 14. Product requirements? ● What are the product’s supported use cases? ● Supported Scale? ● Supported rate of ingress? ● Supported rate on egress? ● Complexity of Cluster? ● High availability? ● Cloud or onsite?
  • 15. The Automation: Must have requirements? ● Scale out/up? ● Fault tolerance! ● Reproducibility from scratch! ● Reporting! ● “Views” to avoid duplication for testing? ● Orchestration of same environment per Developer! ● Debugging options
  • 16. The method ● Phase 0: get ready ○ Product requirements & business constraints ○ Automation requirements ○ Creating a testing matrix based on insight and product features. ● Phase 1: Get insights (some baselines) ○ Test Ingress separately, find your baseline. ○ Test Egress separately, find your baseline. ○ Test Ingress while Egress in progress, find your baseline. ○ Stability and Performance ● Phase 2: Agile: Design an automation system that satisfies the requirements
  • 17. True Story Hadoop based Event Streaming Platform (Peta Scale)
  • 18. True story ● Product: Hadoop based event streaming platform. [peta scale] ● Technical Constraints ○ What is the expected impact of big data on the platform? ○ How to debug scenario issues? ○ QA??? ○ Running time - how long does it take to run all the tests? ● Company Challenges ○ Cost of Hardware? High end 6U Server ○ Expertise? Skill set?
  • 19. Technical Flow 1. Start by testing the simplest flow, verify that it behaves as expected, and then add another layer of complexity. For this step you must understand the product's state machine from A to Z. 2. Start with a single thread (one of each component). 3. Continue with small data and gradual increments. Understand the machine metrics of each component and all the fine-tuning options in the configuration. 4. Get to the point where you are convinced you have big data (what does your product support?) a. 0 data, 1M, 10M, 100M, 1GB, 10GB, 100GB, 1TB, 10TB, 100TB, 1PB 5. Start over, this time using multiple threads/components. 6. Start over, this time with negative testing (failover scenarios: data node failures, Flume failures, NameNode failures, Spark failures, etc.). 7. Make sure the logs reflect the accurate status of the product (error/warn/info/debug).
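The gradual-increment idea in steps 3 and 4 can be sketched as a loop that climbs a size ladder and stops at the first step that fails, so the smallest failing scale is known before larger runs are attempted. The ladder below is expressed as event counts (an assumption, since the slide mixes record counts and byte sizes), and `run_step` is a hypothetical callback standing in for a real test run.

```python
from typing import Callable, Optional, Sequence

# Assumed ladder of event counts; adapt to whatever your product supports.
LADDER = [0, 10**6, 10**7, 10**8, 10**9, 10**10, 10**11, 10**12]


def first_failing_step(ladder: Sequence[int],
                       run_step: Callable[[int], bool]) -> Optional[int]:
    """Run steps in increasing order; return the first size that fails, or None."""
    for size in ladder:
        if not run_step(size):
            return size
    return None


def fake_run(size: int) -> bool:
    """Hypothetical stand-in: pretend the system copes with up to 1B events."""
    return size <= 10**9


print(first_failing_step(LADDER, fake_run))
```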
  • 20. Testing matrix dimensions ● Multiple components: scale out of the same component ● Data: different data rates: slow/fast/bursts (delay between packets slow/fast); what happens if we change the data distribution? Different input sizes (small, large, changing) ● Performance & metrics: what are the machine metrics of collector/Flume/HDFS over time? Get a rule of thumb of processing power vs. ingress rate. Heap allocations per Hadoop ecosystem component? ● Stability and persistence: a. What happens if Flume fails? Data loss / stuck on Flume? b. Recovery from HA c. Data node loss? d. NameNode recovery? Persistence of Flume
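One way to exercise the slow/fast/bursts dimension is to drive the event sender from a delay profile. This is a minimal sketch: the profile names, delay values, and burst length are all assumptions, and the random profile is seeded so a run can be reproduced.

```python
import random
from typing import Iterator


def inter_event_delays(profile: str, count: int, seed: int = 0) -> Iterator[float]:
    """Yield the delay in seconds to wait before sending each event."""
    rng = random.Random(seed)  # seeded so the same profile replays identically
    for i in range(count):
        if profile == "slow":
            yield 1.0
        elif profile == "fast":
            yield 0.001
        elif profile == "burst":
            # 9 back-to-back events, then a 2 second pause on every 10th
            yield 2.0 if (i + 1) % 10 == 0 else 0.0
        elif profile == "jitter":
            yield rng.uniform(0.0, 0.1)
        else:
            raise ValueError(f"unknown profile: {profile}")


burst = list(inter_event_delays("burst", 20))
print(burst)
```

A sender would call `time.sleep(delay)` between events; keeping the delays in a generator lets the same profile be reused across collectors.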
  • 21. Ingress Testing Matrix ● Setups: single collector + single Flume; multi collector + single Flume; multi collector + multi Flume; HA HDFS ● Scenarios: New cluster; Existing data small cluster; Existing cluster Big data; Stability (failure of components)
  • 22. Egress Testing Matrix 1. Hadoop a. HDFS performance/failover/data replication/block size b. Yarn availability/metrics of jobs c. Spark availability/metrics of jobs/performance tuning 2. Assuming some sort of Engine a. State (marks which files were already processed) b. Poller (polling of new events)
  • 23. Egress Testing Matrix ● Layers: HDFS; Yarn; Spark; Engine state & poller; Any other layers ● Scenarios: New cluster; Existing data small cluster; Existing cluster Big data; Stability (failure of components)
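The ingress and egress matrices above can be enumerated mechanically so that no cell is forgotten. A minimal sketch, assuming the matrix is the cross product of the setups (or layers) and the cluster scenarios named on the slides; the case naming is illustrative only.

```python
from itertools import product

SCENARIOS = [
    "new cluster",
    "existing data small cluster",
    "existing cluster big data",
    "stability (failure of components)",
]

INGRESS_SETUPS = [
    "single collector, single Flume",
    "multi collector, single Flume",
    "multi collector, multi Flume",
    "HA HDFS",
]

EGRESS_LAYERS = ["HDFS", "Yarn", "Spark", "Engine state & poller"]


def build_matrix(direction, columns, scenarios):
    """Return one test-case description per (column, scenario) cell."""
    return [f"{direction} | {col} | {scn}" for col, scn in product(columns, scenarios)]


ingress_cases = build_matrix("ingress", INGRESS_SETUPS, SCENARIOS)
egress_cases = build_matrix("egress", EGRESS_LAYERS, SCENARIOS)

for case in ingress_cases[:4]:
    print(case)
print(len(ingress_cases) + len(egress_cases), "cases in total")
```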
  • 24. The baseline concept ● Requirements ○ Events data (simulated or real) ○ Cluster ○ Metrics - to understand if behaviour is expected. ○ Do I need to QA the Hadoop ecosystem? ● Steps ○ Create a golden standard - per event type - per building block ○ Permute
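A golden standard is only useful if later runs are compared against it automatically. The sketch below flags metrics that drift from the baseline by more than a relative tolerance; the metric names and the 10% tolerance are assumptions for illustration.

```python
from typing import Dict


def regressions(baseline: Dict[str, float], measured: Dict[str, float],
                tolerance: float = 0.10) -> Dict[str, float]:
    """Return metrics whose relative deviation from the baseline exceeds tolerance."""
    flagged = {}
    for name, expected in baseline.items():
        actual = measured.get(name)
        if actual is None:
            flagged[name] = float("inf")  # a missing metric counts as a failure
            continue
        # Fall back to absolute deviation when the baseline value is zero.
        deviation = abs(actual - expected) / expected if expected else abs(actual)
        if deviation > tolerance:
            flagged[name] = deviation
    return flagged


golden = {"events_per_sec": 1000.0, "heap_mb": 512.0}     # hypothetical baseline
latest = {"events_per_sec": 850.0, "heap_mb": 520.0}      # hypothetical new run

print(regressions(golden, latest))
```

Keeping one baseline per event type and per building block, as the slide suggests, means a regression points at a specific component rather than at the cluster as a whole.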
  • 26. The Solution: Event Flow testing framework. ● TBD
  • 27. Challenges with Ingress ● Number of collectors ● Events streaming cluster ○ Kafka vs. Flume ○ Hadoop tuning and maintenance ○ Data loss - how to handle ○ Performance of HDFS sink ○ Channel persistence ○ Channel performance ○ Downtime + recovery time
  • 28. Challenges with Egress ● Hadoop tuning ● Spark Tuning ● Engine Tuning ○ Size of file on HDFS - bigger is better? ○ Processing ■ Recovery from Crashes ■ Delayed data ● In order / out of order
  • 29. Challenges with Big Data testing. ● Generating Time of events → genUtil with json for user input ● Reproducibility → genUtil again. ● Disk space for testing data → peta scale product → peta scale testing data ● Results comparison mechanism ● Network bottlenecks on a scale out system... ● ORDER IS NOT GUARANTEED! ● Replications
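Two of these challenges, reproducibility and unordered results, can be sketched together. The generator below is a hypothetical stand-in (it is not the genUtil tool from the talk): a fixed seed makes the same events reappear on every run. The comparison uses an order-independent digest, summing per-record SHA-256 values modulo 2^256 and tracking the count, so expected and actual output can be compared in a streaming fashion without sorting. The event fields are assumptions, and an additive digest is adequate for test comparison but is not meant to resist deliberate collisions.

```python
import hashlib
import json
import random
from typing import Iterable, List, Tuple

MOD = 1 << 256


def generate_events(seed: int, count: int) -> List[str]:
    """Deterministically generate JSON events; same seed gives the same events."""
    rng = random.Random(seed)
    events = []
    for i in range(count):
        event = {
            "id": i,
            "ts": 1_500_000_000 + rng.randint(0, 3600),   # assumed epoch range
            "type": rng.choice(["click", "view", "purchase"]),
        }
        events.append(json.dumps(event, sort_keys=True))
    return events


def multiset_digest(records: Iterable[str]) -> Tuple[int, int]:
    """Order-independent digest: (record count, sum of SHA-256 values mod 2^256)."""
    count, total = 0, 0
    for record in records:
        h = int.from_bytes(hashlib.sha256(record.encode("utf-8")).digest(), "big")
        total = (total + h) % MOD
        count += 1
    return count, total


expected = generate_events(seed=42, count=1000)
arrived = expected[:]
random.Random(7).shuffle(arrived)          # simulate out-of-order delivery

print(multiset_digest(expected) == multiset_digest(arrived))
```

Unlike an XOR of hashes, the sum plus count also detects duplicated events, which matters when replication or retries can deliver a record twice.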
  • 30. The solution: Continuous Testing 1. Extra large Sanity Testing per commit (!) 2. Hourly testing on latest commit (24x7) 3. Weekly testing 4. Monthly testing
  • 31. The solution: Hardware perspective ● Cluster utilization? (Mesos, Docker)? ● Developer sandbox: Docker ● Uniform OS ● Uniform Hardware ● Get rid of weak links
  • 32. Performance Challenges: app perspective ● Architecture bottlenecks [ count(*), group by, compression, sort Merge Join] ● What is IO pattern? ○ OLTP VS OLAP. ○ Columnar or Row based? ○ How big is your data? ○ READ % vs WRITE %. ○ Sequential? random? ○ Temporary VS permanent
  • 34. Performance Challenges: OPS perspective ● Metrics ○ What is Expected Total RAM required per Query? ○ OS RAM cache , CPU Cache hits ? ○ SWAP ? yes/no how much? ○ OS metrics - open files, stack size, realtime priority? ● Theoretical limits? ○ Disk type selection -limitation, expected throughput ○ Raid controller, RAID type? Configuration? caching? ○ File system used? DFS? PFS? Ext4? XFS?
  • 35. Maintenance challenges: OPS Perspective How to optimize your hardware selection? (Analytics on your testing data) a. Compute intensive i. GPU core intensive ii. CPU core intensive 1. Frequency 2. Number of threads for concurrency b. IO intensive? i. SSD? ii. Nearline SATA?
  • 36. Maintenance challenges: DevOps Perspective ● Lightweight - python code ○ Advantage → focus on generic skill set (python coding) ○ Disadvantage → reinventing the wheel ● Fault tolerance - Scale out topology ○ Cheap “desktops” (weak machines) ■ Advantage → ● quick recovery from Image ● Cheap, low risk, pay as you go. ● Gets 90% of the job DONE!
  • 37. Maintenance challenges: DevOps Perspective ● Infrastructure ○ Continuous integration: continuous build, continuous packaging, installer. ○ Continuous Deployment: pre flight check, remote machine (on site, private cloud, cloud) ○ Continuous testing: Sanity, Hourly, Daily, weekly ● Reporting and Monitoring: ○ Extensive report server and analytics on current testing. ○ Monitoring: Green/Red and system real time metrics.
  • 38. Maintenance challenges: Innovation Perspective 1. Data generation (using GPU to generate data) 2. Hashing 3. Workload (one dispatcher, many compute nodes) 4. Saving running time: data/compute time a. Hashing b. Caching c. Smart data creation - no need to create 100B events to run a 100B events test. d. Logs: Test results were managed in a DB 5. Docker
  • 39. Lessons learned (insights) ● Very hard to guarantee 100% coverage ● Quality at a reasonable cost is a huge challenge ● Very easy to miss milestones with QA of big data ● Each mistake creates a huge ripple effect in timelines ● Main costs: ○ Employees: team of 2 people working 220 hours per month over 1 year. ○ Running Time: Coding of innovative algorithms inside testing framework ○ Maintenance: Automation, Monitoring, Analytics ○ Hardware: innovative DevOps, innovative OPS, research
  • 40. Stay in touch... ● Omid Vahdaty ● +972-54-2384178