SlideShare a Scribd company logo
1 of 21
Download to read offline
Testing Processing Frameworks
Streaming
and
Gabriela Choy
Spark / Storm
Spark Storm
Implemented in Scala Clojure, Java
Delivery Semantics
Exactly once At least once. Exactly
once with Trident
API
Python, Java, Scala Java, Scala, Clojure,
Python, etc. Trident:
Java, Scala, Clojure.
Processing Model
Batch. Micro-batches
with Spark Streaming.
~ 500ms
Record at a time/
Trident allows for
micro-batches.
Latency 1 - 2 seconds sub-seconds
Pipeline
Streaming
MySQL
MySQL
Producer/s
Producer/s
Each pipeline is run
independently
Node Specifications
Spark Streaming Storm
4 AWS nodes m3.
medium
Zookeeper 3.4.6
Kafka 0.8.2.1
Spark (streaming) 1.3
4 AWS nodes m3.
medium
Zookeeper 3.4.6
Kafka 0.8.2.1
Storm 0.9.5
Spark Streaming: 1 master node, 3 workers
Cluster Configuration
Master node
Worker 1
Worker 2
Worker 3
Storm : 1 nimbus, 3 Supervisors
Cluster Configuration
Nimbus
Supervisor 1
Supervisor 2
Supervisor 3
Metric
Throughput: amount of data that is being
processed.
● By changing batch size
● By changing load (i.e. Scaling up)
● Programs used for benchmarking will be wordcount.
# Producers Batch Interval
1 1s, 2s, 3s, 4s, 6s
4
1s, 2s, 3s, 4s, 6s
8
1s, 2s, 3s, 4s, 6s
Tests for Spark Streaming
Throughput for 1 producer with 95% CI
Throughput for 1 producer
Throughput for 4 producers with 95% CI
Throughput for 4 producers
Throughput for 8 producers
Throughput for 8 producers
# Producers Tuples Emitted-Acked
1 10 min
4 10 min
8 10 min
Tests for Storm
Preliminary results for storm
Tuples Emitted per Second
Tuples Acked per Second
Spout Latency
Takeaways
● Setting the batch interval in spark streaming
should be done by monitoring processing times
and load size
● For Storm as numbers of producers increase so
does throughput and spout latency.
Would like to add:
● Increase number of producers. Use real data.
● Add a graph as a second use case.
● Dashboard to monitor live streaming.
Gabriela Choy
Bsc. in Chem. Engineering. ULA, Vnzla.
Msc in Statistics. UT Dallas
Previously: Worked in Device Reliability Engineering at View, Inc.
About Me

More Related Content

What's hot

What's hot (20)

Benchmarking, Load Testing, and Preventing Terrible Disasters
Benchmarking, Load Testing, and Preventing Terrible DisastersBenchmarking, Load Testing, and Preventing Terrible Disasters
Benchmarking, Load Testing, and Preventing Terrible Disasters
 
Webinar patterns anti patterns
Webinar patterns anti patternsWebinar patterns anti patterns
Webinar patterns anti patterns
 
Openzipkin conf: Zipkin at Yelp
Openzipkin conf: Zipkin at YelpOpenzipkin conf: Zipkin at Yelp
Openzipkin conf: Zipkin at Yelp
 
Speeding up your testflow
Speeding up your testflowSpeeding up your testflow
Speeding up your testflow
 
.NET Fest 2018. Владимир Крамар. Многопоточное и асинхронное программирование...
.NET Fest 2018. Владимир Крамар. Многопоточное и асинхронное программирование....NET Fest 2018. Владимир Крамар. Многопоточное и асинхронное программирование...
.NET Fest 2018. Владимир Крамар. Многопоточное и асинхронное программирование...
 
Get Going With RVM and Rails 3
Get Going With RVM and Rails 3Get Going With RVM and Rails 3
Get Going With RVM and Rails 3
 
ReactiveX
ReactiveXReactiveX
ReactiveX
 
Evented Ruby VS Node.js
Evented Ruby VS Node.jsEvented Ruby VS Node.js
Evented Ruby VS Node.js
 
Pulsarctl & Pulsar Manager
Pulsarctl & Pulsar ManagerPulsarctl & Pulsar Manager
Pulsarctl & Pulsar Manager
 
Rust kafka-5-2019-unskip
Rust kafka-5-2019-unskipRust kafka-5-2019-unskip
Rust kafka-5-2019-unskip
 
Samza portable runner for beam
Samza portable runner for beamSamza portable runner for beam
Samza portable runner for beam
 
Colorado OpenStack 5th Birthday Monasca Operations
Colorado OpenStack 5th Birthday Monasca OperationsColorado OpenStack 5th Birthday Monasca Operations
Colorado OpenStack 5th Birthday Monasca Operations
 
Apache Bookkeeper and Apache Zookeeper for Apache Pulsar
Apache Bookkeeper and Apache Zookeeper for Apache PulsarApache Bookkeeper and Apache Zookeeper for Apache Pulsar
Apache Bookkeeper and Apache Zookeeper for Apache Pulsar
 
Clouds presentation, aws meetup v2
Clouds presentation, aws meetup   v2Clouds presentation, aws meetup   v2
Clouds presentation, aws meetup v2
 
Data analysis with Galaxy on the Cloud
Data analysis with Galaxy on the CloudData analysis with Galaxy on the Cloud
Data analysis with Galaxy on the Cloud
 
DOD 2016 - Kamil Szczygieł - Patching 100 OpenStack Compute Nodes with Zero-d...
DOD 2016 - Kamil Szczygieł - Patching 100 OpenStack Compute Nodes with Zero-d...DOD 2016 - Kamil Szczygieł - Patching 100 OpenStack Compute Nodes with Zero-d...
DOD 2016 - Kamil Szczygieł - Patching 100 OpenStack Compute Nodes with Zero-d...
 
(WEB401) Optimizing Your Web Server on AWS | AWS re:Invent 2014
(WEB401) Optimizing Your Web Server on AWS | AWS re:Invent 2014(WEB401) Optimizing Your Web Server on AWS | AWS re:Invent 2014
(WEB401) Optimizing Your Web Server on AWS | AWS re:Invent 2014
 
Liferay OpenShift base concepts
Liferay OpenShift base conceptsLiferay OpenShift base concepts
Liferay OpenShift base concepts
 
Rust with-kafka-07-02-2019
Rust with-kafka-07-02-2019Rust with-kafka-07-02-2019
Rust with-kafka-07-02-2019
 
Keeping Latency Low and Throughput High with Application-level Priority Manag...
Keeping Latency Low and Throughput High with Application-level Priority Manag...Keeping Latency Low and Throughput High with Application-level Priority Manag...
Keeping Latency Low and Throughput High with Application-level Priority Manag...
 

Similar to Comparing processing frameworks v7

Similar to Comparing processing frameworks v7 (20)

Drizzle—Low Latency Execution for Apache Spark: Spark Summit East talk by Shi...
Drizzle—Low Latency Execution for Apache Spark: Spark Summit East talk by Shi...Drizzle—Low Latency Execution for Apache Spark: Spark Summit East talk by Shi...
Drizzle—Low Latency Execution for Apache Spark: Spark Summit East talk by Shi...
 
Microservices with Micronaut
Microservices with MicronautMicroservices with Micronaut
Microservices with Micronaut
 
Microservices with Micronaut
Microservices with MicronautMicroservices with Micronaut
Microservices with Micronaut
 
Apache samza past, present and future
Apache samza  past, present and futureApache samza  past, present and future
Apache samza past, present and future
 
Apache Samza Past, Present and Future
Apache Samza  Past, Present and FutureApache Samza  Past, Present and Future
Apache Samza Past, Present and Future
 
Apache storm vs. Spark Streaming
Apache storm vs. Spark StreamingApache storm vs. Spark Streaming
Apache storm vs. Spark Streaming
 
Low Latency Execution For Apache Spark
Low Latency Execution For Apache SparkLow Latency Execution For Apache Spark
Low Latency Execution For Apache Spark
 
(BDT318) How Netflix Handles Up To 8 Million Events Per Second
(BDT318) How Netflix Handles Up To 8 Million Events Per Second(BDT318) How Netflix Handles Up To 8 Million Events Per Second
(BDT318) How Netflix Handles Up To 8 Million Events Per Second
 
Netflix Keystone—Cloud scale event processing pipeline
Netflix Keystone—Cloud scale event processing pipelineNetflix Keystone—Cloud scale event processing pipeline
Netflix Keystone—Cloud scale event processing pipeline
 
Netflix Keystone Pipeline at Big Data Bootcamp, Santa Clara, Nov 2015
Netflix Keystone Pipeline at Big Data Bootcamp, Santa Clara, Nov 2015Netflix Keystone Pipeline at Big Data Bootcamp, Santa Clara, Nov 2015
Netflix Keystone Pipeline at Big Data Bootcamp, Santa Clara, Nov 2015
 
Netflix Keystone Pipeline at Samza Meetup 10-13-2015
Netflix Keystone Pipeline at Samza Meetup 10-13-2015Netflix Keystone Pipeline at Samza Meetup 10-13-2015
Netflix Keystone Pipeline at Samza Meetup 10-13-2015
 
Microservices with Micronaut
Microservices with MicronautMicroservices with Micronaut
Microservices with Micronaut
 
AKKA and Scala @ Inneractive
AKKA and Scala @ InneractiveAKKA and Scala @ Inneractive
AKKA and Scala @ Inneractive
 
Real-Time Analytics with Kafka, Cassandra and Storm
Real-Time Analytics with Kafka, Cassandra and StormReal-Time Analytics with Kafka, Cassandra and Storm
Real-Time Analytics with Kafka, Cassandra and Storm
 
End to End Processing of 3.7 Million Telemetry Events per Second using Lambda...
End to End Processing of 3.7 Million Telemetry Events per Second using Lambda...End to End Processing of 3.7 Million Telemetry Events per Second using Lambda...
End to End Processing of 3.7 Million Telemetry Events per Second using Lambda...
 
Apache Storm
Apache StormApache Storm
Apache Storm
 
Troubleshooting common oslo.messaging and RabbitMQ issues
Troubleshooting common oslo.messaging and RabbitMQ issuesTroubleshooting common oslo.messaging and RabbitMQ issues
Troubleshooting common oslo.messaging and RabbitMQ issues
 
Streaming data analytics (Kinesis, EMR/Spark) - Pop-up Loft Tel Aviv
Streaming data analytics (Kinesis, EMR/Spark) - Pop-up Loft Tel Aviv Streaming data analytics (Kinesis, EMR/Spark) - Pop-up Loft Tel Aviv
Streaming data analytics (Kinesis, EMR/Spark) - Pop-up Loft Tel Aviv
 
Benchmarking Apache Samza: 1.2 million messages per sec per node
Benchmarking Apache Samza: 1.2 million messages per sec per nodeBenchmarking Apache Samza: 1.2 million messages per sec per node
Benchmarking Apache Samza: 1.2 million messages per sec per node
 
(SEC403) Diving into AWS CloudTrail Events w/ Apache Spark on EMR
(SEC403) Diving into AWS CloudTrail Events w/ Apache Spark on EMR(SEC403) Diving into AWS CloudTrail Events w/ Apache Spark on EMR
(SEC403) Diving into AWS CloudTrail Events w/ Apache Spark on EMR
 

Recently uploaded

Recently uploaded (20)

Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
ICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesICT role in 21st century education and its challenges
ICT role in 21st century education and its challenges
 
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ..."I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CV
 
DBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor Presentation
 
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
 
Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...
Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...
Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...
 
Navi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Navi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot ModelNavi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Navi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot Model
 
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWEREMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 

Comparing processing frameworks v7